Privacy-Preserving CPU Tracker
Let's build a privacy-preserving CPU monitoring system. This API collects local CPU usage data and protects it with differential privacy. Differentially private statistics are shared with a designated aggregator, while the raw CPU measurements never leave the datasite and remain private.
In this tutorial, we will create a system that monitors CPU usage and applies differential privacy to protect sensitive information. Although we use CPU metrics for demonstration, the principles apply to any system metrics you choose to monitor.
Source Code
Find the complete cpu_tracker_member API code in our GitHub repository.
Project Structure
cpu_tracker_member/
├── .gitignore # Git ignore rules
├── .python-version # Python version specification
├── main.py # Main application logic
├── requirements.txt # Python package dependencies
└── run.sh # Setup and run script
Required Dependencies
- Python (version 3.12)
- SyftBox
- diffprivlib
- psutil
The run script (run.sh) automatically creates the local virtual environment used by the API.
Running the Application
Installation
Install the application in your SyftBox environment:
```bash
# Clone the repository
git clone https://github.com/openmined/cpu_tracker_member.git
# Copy it into your SyftBox installation
cp -r cpu_tracker_member <SYFTBOX_DATADIR>/apis
```
Note: <SYFTBOX_DATADIR> refers to the SyftBox data directory, which defaults to $HOME/SyftBox.
Data Organization
The API automatically organizes and shares data across two main folders:
- Public API Data Folder:
  <SYFTBOX_DATADIR>/<your-datasite>/api_data/cpu_tracker/
  This directory serves as the secure channel for sharing privacy-protected CPU data:
  - Stores the differentially private CPU metrics calculated by the API.
  - Automatically synchronizes with the SyftBox server.
  - Implements strict access control:
    - Only the designated aggregator has read permission, so the data remains invisible to all other network participants.
    - Synchronization occurs exclusively with the authorized aggregator's datasite.
- Private Datasite Folder:
  <SYFTBOX_DATADIR>/<your-datasite>/private/cpu_tracker/
  This directory stores the raw CPU measurements. It remains completely isolated from the network, so the data stays exclusively within the datasite.
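The folder layout above can be expressed as a small path helper. This is a sketch only: `cpu_tracker_folders` is a hypothetical helper, not part of the API, and `alice@example.org` is a placeholder datasite name. It simply mirrors the two paths listed above with `pathlib`.

```python
from pathlib import Path


def cpu_tracker_folders(datadir: Path, datasite: str) -> dict[str, Path]:
    """Hypothetical helper: build the two cpu_tracker folders for one datasite."""
    datasite_root = datadir / datasite
    return {
        # Shared folder: readable by the designated aggregator only
        "public": datasite_root / "api_data" / "cpu_tracker",
        # Private folder: raw measurements, isolated from the network
        "private": datasite_root / "private" / "cpu_tracker",
    }


# Default data directory ($HOME/SyftBox) and a placeholder datasite name
folders = cpu_tracker_folders(Path.home() / "SyftBox", "alice@example.org")
print(folders["public"])
print(folders["private"])
```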
Verification
- Protected CPU data will be available in the public API data folder, and raw CPU data will be stored in the private folder.
- Processing logs will indicate successful data collection and protection.
API Workflow
The execution of the cpu_tracker_member API follows these main steps:
- Directory Initialization
  - Set up shared directories with appropriate permissions.
- CPU Data Collection
  - Sample CPU usage every 20 seconds and calculate the mean usage.
  - Apply differential privacy to protect the data.
  - Store both raw and protected versions in the appropriate directories.
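The final step, storing both versions, can be sketched as follows. This is a minimal illustration, not the repository's code: the function `store_results`, the file name `cpu_tracker.json`, and the JSON fields `cpu` and `timestamp` are all hypothetical. The point is the separation rule: the raw mean goes only to the private folder, and only the noisy mean goes to the shared folder.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def store_results(private_dir: Path, public_dir: Path,
                  raw_mean: float, dp_mean: float) -> tuple[Path, Path]:
    """Write the raw mean privately and only the noisy mean to the shared folder.

    Hypothetical file name and JSON schema, for illustration only.
    """
    timestamp = datetime.now(timezone.utc).isoformat()
    private_dir.mkdir(parents=True, exist_ok=True)
    public_dir.mkdir(parents=True, exist_ok=True)

    private_file = private_dir / "cpu_tracker.json"
    public_file = public_dir / "cpu_tracker.json"
    # Raw value: stays in the private folder and is never shared
    private_file.write_text(json.dumps({"cpu": raw_mean, "timestamp": timestamp}))
    # Differentially private value: the only number the aggregator can read
    public_file.write_text(json.dumps({"cpu": dp_mean, "timestamp": timestamp}))
    return private_file, public_file


if __name__ == "__main__":
    # Demo in a throwaway directory instead of a real SyftBox data directory
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        priv, pub = store_results(root / "private" / "cpu_tracker",
                                  root / "api_data" / "cpu_tracker",
                                  raw_mean=37.42, dp_mean=35.9)
        print(pub.read_text())
```

Writing the two values in one function keeps the decision about what leaves the datasite in a single place, which makes the raw/protected split easy to audit.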
Implementation Details
In this section we'll explore the key code components of each step. While we'll focus on the most important snippets here, you can find the complete implementation in the GitHub repository.
1. Directory Initialization
First, we set up the directories that enable secure data storage and sharing:
```python
import os
from pathlib import Path

from syftbox.lib import SyftPermission

# `client` and AGGREGATOR_DATASITE are module-level globals defined elsewhere in main.py

def create_restricted_public_folder(cpu_tracker_path: Path) -> None:
    os.makedirs(cpu_tracker_path, exist_ok=True)
    # Start from the datasite's default permissions, then grant the aggregator read access
    permissions = SyftPermission.datasite_default(email=client.email)
    permissions.read.append(AGGREGATOR_DATASITE)
    permissions.save(cpu_tracker_path)

def create_private_folder(path: Path) -> Path:
    cpu_tracker_path: Path = path / "private" / "cpu_tracker"
    os.makedirs(cpu_tracker_path, exist_ok=True)
    # Default permissions only: no additional readers are granted
    permissions = SyftPermission.datasite_default(email=client.email)
    permissions.save(cpu_tracker_path)
    return cpu_tracker_path
```
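To make the access rules concrete, here is a toy, pure-Python model of the two permission setups. It is an assumption-laden sketch, not SyftBox's `SyftPermission` class: `ToyPermission`, `owner_only`, `can_read`, and the two `*_folder_permission` helpers are hypothetical, and the model assumes that a datasite's default permissions grant access to the owner only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPermission:
    """Hypothetical stand-in for a folder's permission settings."""
    read: list[str] = field(default_factory=list)
    write: list[str] = field(default_factory=list)

    @classmethod
    def owner_only(cls, owner: str) -> "ToyPermission":
        # Assumed equivalent of a datasite default: only the owner reads and writes
        return cls(read=[owner], write=[owner])

    def can_read(self, user: str) -> bool:
        return user in self.read


def public_folder_permission(owner: str, aggregator: str) -> ToyPermission:
    # Mirrors create_restricted_public_folder: owner-only, plus aggregator read access
    perm = ToyPermission.owner_only(owner)
    perm.read.append(aggregator)
    return perm


def private_folder_permission(owner: str) -> ToyPermission:
    # Mirrors create_private_folder: no extra readers are added
    return ToyPermission.owner_only(owner)


public = public_folder_permission("alice@example.org", "aggregator@example.org")
private = private_folder_permission("alice@example.org")
print(public.can_read("aggregator@example.org"))   # True
print(public.can_read("bob@example.org"))          # False
print(private.can_read("aggregator@example.org"))  # False
```

Note that the aggregator gains read access only; it never appears in the write list, so it cannot alter a member's published values.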
2. CPU Data Collection
The API collects CPU usage samples at regular intervals:
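```python
import psutil


def get_cpu_usage_samples() -> list[float]:
    """Collect 50 CPU usage samples, each measured over a 0.1-second interval."""
    cpu_usage_values = []
    while len(cpu_usage_values) < 50:
        # A blocking interval makes each sample reflect usage during that 0.1 s window;
        # cpu_percent() with no interval returns a meaningless 0.0 on its first call.
        cpu_usage = psutil.cpu_percent(interval=0.1)
        cpu_usage_values.append(cpu_usage)
    return cpu_usage_values
```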
The API then calculates two means from the samples: a raw mean, stored in the private folder, and a differentially private mean, shared through the protected public folder:
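```python
from statistics import mean

import diffprivlib.tools as dp

# [...] omitted: cpu_usage_samples holds the list returned by get_cpu_usage_samples()

# Raw mean: written only to the private folder
raw_cpu_mean = mean(cpu_usage_samples)

# Differentially private mean: the only value shared with the aggregator
mean_with_noise = round(
    dp.mean(
        cpu_usage_samples,
        epsilon=0.5,  # Privacy budget: smaller values add more noise
        bounds=(0, 100),  # CPU usage is a percentage in [0, 100]
    ),
    2,  # Round to two decimal places
)
```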
Privacy Considerations
The system implements several privacy-preserving techniques:
- Differential Privacy: Protects individual measurements by adding calibrated noise
- Access Control: Restricts access to protected data through SyftBox permissions
- Local Processing: Ensures raw measurements never leave the local machine
- Data Separation: Maintains strict separation between raw and protected data
Additional privacy enhancements could include:
- Lowering the epsilon parameter for stronger privacy guarantees, at the cost of noisier results
- Implementing additional noise mechanisms
- Adding secure aggregation protocols¹²³
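The core idea behind the secure aggregation protocols cited below can be illustrated with a toy version of pairwise masking, in the spirit of Bonawitz et al. This is a simplified sketch, not any of the cited protocols: masks come from a single random generator instead of pairwise key agreement, dropped-out clients are not handled, and the fixed-point encoding (`SCALE`, `MODULUS`) and all function names are hypothetical. It shows how an aggregator can recover the sum of the members' noisy means without seeing any individual value.

```python
import random

MODULUS = 2**32  # all arithmetic is modulo this value, so masks wrap around
SCALE = 100      # fixed-point encoding with two decimals, like round(..., 2)


def encode(value: float) -> int:
    """Map a non-negative value with two decimals to an integer mod MODULUS."""
    return round(value * SCALE) % MODULUS


def pairwise_masks(clients: list[str], rng: random.Random) -> dict[str, int]:
    """For every pair (a, b), a adds a shared random mask and b subtracts it."""
    net = {c: 0 for c in clients}
    for i, a in enumerate(clients):
        for b in clients[i + 1:]:
            r = rng.randrange(MODULUS)  # toy: real protocols derive r via key agreement
            net[a] = (net[a] + r) % MODULUS
            net[b] = (net[b] - r) % MODULUS
    return net


def mask_submissions(values: dict[str, float], rng: random.Random) -> dict[str, int]:
    """What each client would send: its encoded value plus its net mask."""
    masks = pairwise_masks(sorted(values), rng)
    return {c: (encode(v) + masks[c]) % MODULUS for c, v in values.items()}


def aggregate_sum(submissions: dict[str, int]) -> float:
    """Sum masked values; every pairwise mask cancels out.

    Assumes non-negative inputs whose encoded total stays below MODULUS.
    """
    total = sum(submissions.values()) % MODULUS
    return total / SCALE


noisy_means = {"alice": 12.34, "bob": 56.78, "carol": 9.01}  # hypothetical DP means
submissions = mask_submissions(noisy_means, random.Random(42))
print(aggregate_sum(submissions))  # 78.13
```

To report an average rather than a sum, the aggregator divides the result by the number of participating members.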
References:
1. Bonawitz et al. "Practical Secure Aggregation for Privacy-Preserving Machine Learning" (2017)
2. So et al. "SecAgg+: Learning improved secure aggregation for Federated Learning" (2023)
3. So et al. "Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning" (2020)