Privacy-Preserving CPU Tracker

Let's build a privacy-preserving CPU monitoring system. This API collects local CPU usage data and protects it with differential privacy. Aggregated statistics are shared with the network, but the raw CPU measurements never leave the datasite.

In this tutorial, we will create a system that monitors CPU usage and applies differential privacy to protect sensitive information. Although we use CPU metrics for demonstration, the principles apply to any system metrics you choose to monitor.

Source Code

Find the complete cpu_tracker_member API code in our GitHub repository.

Project Structure

cpu_tracker_member/
├── .gitignore           # Git ignore rules
├── .python-version      # Python version specification
├── main.py              # Main application logic
├── requirements.txt     # Python package dependencies
└── run.sh               # Setup and run script

Required Dependencies

Python (version 3.12)
SyftBox
diffprivlib
psutil

The local virtual environment used by the API is created automatically by the run script (run.sh).

Running the Application

Installation

Install the application in your SyftBox environment:

# Clone the repository
git clone https://github.com/openmined/cpu_tracker_member.git

# Copy to your SyftBox installation
cp -r cpu_tracker_member <SYFTBOX_DATADIR>/apis

Note: <SYFTBOX_DATADIR> refers to the SyftBox data directory, which defaults to $HOME/SyftBox.

Data Organization

The API automatically organizes and shares data in two main folders:

Public API Data Folder: <SYFTBOX_DATADIR>/<your-datasite>/api_data/cpu_tracker/

This directory serves as the secure channel for sharing privacy-protected CPU data:

Stores the differentially-private CPU metrics calculated by the API.
Automatically synchronizes with the SyftBox server.
Implements strict access control:
- Only the designated aggregator has read permissions (i.e. data remains invisible to all other network participants)
- Synchronization occurs exclusively with the authorized aggregator's datasite.

Private Datasite Folder: <SYFTBOX_DATADIR>/<your-datasite>/private/cpu_tracker/

This directory stores the raw CPU measurements. It remains completely isolated from the network, so the raw data stays exclusively within the datasite.

Verification

Once the API is running, protected CPU data will be available in the public API data folder, and raw CPU data will be stored in the private folder.
Processing logs will indicate successful data collection and protection.
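The folder check described above can also be scripted. The following is a minimal sketch, assuming the folder layout from the Data Organization section; the helper names `cpu_tracker_folders` and `verify_outputs` are hypothetical and are not part of the cpu_tracker_member code.

```python
from pathlib import Path


def cpu_tracker_folders(data_dir: Path, datasite: str) -> dict[str, Path]:
    """Build the two folder paths described in the Data Organization section."""
    base = data_dir / datasite
    return {
        "public": base / "api_data" / "cpu_tracker",
        "private": base / "private" / "cpu_tracker",
    }


def verify_outputs(data_dir: Path, datasite: str) -> dict[str, list[str]]:
    """Return the file names found in each folder (empty list if the folder is missing)."""
    found: dict[str, list[str]] = {}
    for label, folder in cpu_tracker_folders(data_dir, datasite).items():
        if folder.is_dir():
            found[label] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[label] = []
    return found
```

For example, calling `verify_outputs(Path.home() / "SyftBox", "<your-datasite>")` returns a dictionary with one list of file names per folder. An empty list suggests the API has not written anything to that folder yet.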

API Workflow

The execution of the cpu_tracker_member API follows these main steps:

Directory Initialization

Set up shared directories with appropriate permissions.

CPU Data Collection

Sample CPU usage every 20 seconds and calculate the mean usage.
Apply differential privacy to protect the data.
Store both raw and protected versions in the appropriate directories.
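The data collection step above can be sketched as a single cycle. This is an illustration under stated assumptions, not the actual main.py: the function name `run_cycle`, the JSON layout, and both file names are hypothetical. The CPU reader and the privacy mechanism are passed in as callables so the flow is visible on its own. The sketch also omits the pause between readings and the permission setup from the first step.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Callable


def run_cycle(
    read_cpu: Callable[[], float],
    protect: Callable[[list[float]], float],
    private_dir: Path,
    public_dir: Path,
    n_samples: int = 50,
) -> tuple[float, float]:
    """Collect samples, keep the raw mean private, and publish the protected mean."""
    samples = [read_cpu() for _ in range(n_samples)]
    raw_mean = mean(samples)
    protected_mean = protect(samples)

    private_dir.mkdir(parents=True, exist_ok=True)
    public_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical file names, chosen for this sketch only.
    (private_dir / "raw_cpu_mean.json").write_text(json.dumps({"cpu": raw_mean}))
    (public_dir / "protected_cpu_mean.json").write_text(
        json.dumps({"cpu": protected_mean})
    )
    return raw_mean, protected_mean
```

In a real run, `read_cpu` could be `psutil.cpu_percent` and `protect` a function that wraps the differentially private mean. Only the value returned by `protect` is written to the public folder.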

Implementation Details

In this section we'll explore the key code components of each step. While we'll focus on the most important snippets here, you can find the complete implementation in the GitHub repository.

1. Directory Initialization

First, we set up the directories that enable secure data storage and sharing:

import os
from pathlib import Path

# `client`, `SyftPermission`, and `AGGREGATOR_DATASITE` come from the rest of main.py (not shown here).

def create_restricted_public_folder(cpu_tracker_path: Path) -> None:
    os.makedirs(cpu_tracker_path, exist_ok=True)

    # Start from the datasite's default permissions, then grant read
    # access to the aggregator
    permissions = SyftPermission.datasite_default(email=client.email)
    permissions.read.append(AGGREGATOR_DATASITE)
    permissions.save(cpu_tracker_path)

def create_private_folder(path: Path) -> Path:
    cpu_tracker_path: Path = path / "private" / "cpu_tracker"
    os.makedirs(cpu_tracker_path, exist_ok=True)

    # Default permissions only: no additional readers are added
    permissions = SyftPermission.datasite_default(email=client.email)
    permissions.save(cpu_tracker_path)

    return cpu_tracker_path

2. CPU Data Collection

The API collects CPU usage samples at regular intervals:

import time

import psutil

def get_cpu_usage_samples():
    """Collect 50 CPU usage samples, pausing 0.1 seconds between readings."""
    cpu_usage_values = []

    while len(cpu_usage_values) < 50:
        # System-wide CPU utilization (percent) since the previous call
        cpu_usage = psutil.cpu_percent()
        cpu_usage_values.append(cpu_usage)
        time.sleep(0.1)

    return cpu_usage_values

The API then calculates the raw mean and the differentially private mean, which are stored in the private folder and the public API data folder, respectively:

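# [...] omitted
from statistics import mean

# Assumption: `dp` is an alias for diffprivlib's tools module
import diffprivlib.tools as dp

# [...] omitted

# Raw mean, stored only in the private folder
raw_cpu_mean = mean(cpu_usage)

# In main: differentially private mean, shared through the public folder
mean_with_noise = round(
    dp.mean(
        cpu_usage_samples,
        epsilon=0.5,  # Privacy parameter: smaller values add more noise
        bounds=(0, 100),  # CPU usage is a percentage
    ),
    2,
)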
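To build intuition for what a differentially private mean does, here is a minimal sketch of the Laplace mechanism using only the standard library. It is not diffprivlib's implementation. It assumes the number of samples is public and that neighboring datasets differ in one value, so the sensitivity of the mean is (upper - lower) / n. The function names `laplace_scale` and `noisy_mean` are hypothetical.

```python
import random
from statistics import mean


def laplace_scale(n: int, epsilon: float, lower: float, upper: float) -> float:
    """Noise scale for the mean of n values clipped to [lower, upper].

    Changing one value moves the mean by at most (upper - lower) / n,
    so the Laplace scale is that sensitivity divided by epsilon.
    """
    if n <= 0 or epsilon <= 0 or upper <= lower:
        raise ValueError("need n > 0, epsilon > 0 and upper > lower")
    return (upper - lower) / (n * epsilon)


def noisy_mean(values, epsilon, lower, upper, rng=None) -> float:
    """Return the clipped mean plus Laplace noise, clamped to the bounds."""
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    scale = laplace_scale(len(clipped), epsilon, lower, upper)
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    # Clamping the output is post-processing and does not weaken the guarantee.
    return min(max(mean(clipped) + noise, lower), upper)
```

With the values used above (50 samples, epsilon=0.5, bounds 0 to 100), the noise scale is 100 / (50 × 0.5) = 4, so the published mean typically differs from the raw mean by a few percentage points. Lowering epsilon to 0.1 raises the scale to about 20, which gives stronger privacy at the cost of accuracy.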

View full implementation

Privacy Considerations

The system implements several privacy-preserving techniques:

Differential Privacy: Protects individual measurements by adding calibrated noise
Access Control: Restricts access to protected data through SyftBox permissions
Local Processing: Ensures raw measurements never leave the local machine
Data Separation: Maintains strict separation between raw and protected data

Additional privacy enhancements could include:

Adjusting the epsilon parameter for stronger privacy guarantees
Implementing additional noise mechanisms
Adding secure aggregation protocols¹²³
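As a rough illustration of the idea behind secure aggregation, the toy sketch below uses pairwise additive masks that cancel in the sum, in the spirit of the protocols referenced here. It is a single-process simulation under strong assumptions: no key agreement, no dropout handling, and no cryptographic randomness. All names are hypothetical, and values are integers (for example, a CPU mean multiplied by 100).

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def pairwise_masks(n_parties: int, seed: int = 0) -> list[int]:
    """Return one net mask per party; the masks sum to zero modulo MODULUS.

    For each pair (i, j) with i < j, a shared random value is added to
    party i's mask and subtracted from party j's mask.
    """
    rng = random.Random(seed)  # stand-in for pairwise key agreement
    masks = [0] * n_parties
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            shared = rng.randrange(MODULUS)
            masks[i] = (masks[i] + shared) % MODULUS
            masks[j] = (masks[j] - shared) % MODULUS
    return masks


def secure_sum(values: list[int], seed: int = 0) -> int:
    """Sum the masked values; the masks cancel, leaving the true sum."""
    masks = pairwise_masks(len(values), seed)
    # Each masked value is what a party would upload instead of its raw value.
    masked = [(v + m) % MODULUS for v, m in zip(values, masks)]
    return sum(masked) % MODULUS
```

The aggregator only sees the masked values, yet their sum equals the sum of the raw values as long as that sum stays below `MODULUS`.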

References:

1. Bonawitz et al. "Practical Secure Aggregation for Privacy-Preserving Machine Learning" (2017)
2. So et al. "SecAgg+: Learning improved secure aggregation for Federated Learning" (2023)
3. So et al. "Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning" (2020)
