Privacy-Preserving Aggregator API
The Aggregator API collects and aggregates privacy-protected CPU metrics across distributed clients, calculating network-wide statistics and presenting them through an interactive dashboard.
All aggregation preserves the differential privacy guarantees provided by member nodes, ensuring meaningful network insights while protecting individual node privacy.
Source Code
Find the complete implementation in our cpu_tracker_leader GitHub repository.
Project Structure
cpu_tracker_leader/
├── run.sh             # Entry point script
├── main.py            # Main application logic
├── .gitignore         # Git ignore file
└── assets/            # Web dashboard assets
    ├── index.html     # Dashboard HTML
    ├── index.js       # Dashboard JavaScript
    └── syftbox-sdk.js # SyftBox integration SDK
Required Dependencies
- Python 3.12
- SyftBox
The run.sh script automatically creates and manages the virtual environment.
Running the Application
Installation
# Clone the repository
git clone https://github.com/openmined/cpu_tracker_leader.git
# Copy to your SyftBox installation
cp -r cpu_tracker_leader <SYFTBOX_DATADIR>/apis
System Operation
The aggregator performs five core functions:
- Scans for active CPU tracker members in the SyftBox directory (<datasite>/api_data/cpu_tracker/active_peers).
- Validates each peer's data for freshness (within the last minute) and proper privacy protections.
- Calculates network-wide statistics using only differentially private metrics.
- Maintains a rolling window of historical data for time-series analysis.
- Serves the real-time metrics dashboard at https://syftbox.openmined.org/datasites/<your-datasite>/cpu_tracker/, where you can view network CPU usage, historical trends, and active peer status; the dashboard also provides tools for new peers to join the network.
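The shape of each peer's published data can be inferred from the aggregation code shown later in this document: a cpu_tracker.json file containing at least a differentially private cpu value and a timestamp. The snippet below is a minimal sketch of such a payload with made-up field values, not an exhaustive schema.
# Hypothetical cpu_tracker.json payload, limited to the fields the aggregator reads
# ("cpu" and "timestamp"); real member files may carry additional keys.
import json
from datetime import datetime

peer_payload = {
    "cpu": 23.7,  # differentially private CPU usage reported by the member node
    "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),  # freshness marker
}
print(json.dumps(peer_payload, indent=2))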
API Workflow
The API operates in three phases (a minimal end-to-end sketch in code follows the list):
- Discovery: The system identifies and validates active participants in the network, establishing secure channels for data collection.
- Data Collection: Protected CPU metrics are gathered from validated peers, ensuring data freshness and correct formatting before processing.
- Aggregation and Visualization: Network-wide statistics are calculated from collected metrics and displayed through an interactive dashboard that enables peer participation.
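The sketch below shows how the three phases might fit together in a single pass. It is not the actual main loop from main.py; it reuses the network_participants and get_network_cpu_mean helpers listed under Implementation Details, and datasites_path is an assumed local SyftBox path.
# Minimal sketch of the three-phase flow (discovery -> collection -> aggregation).
# The real main.py may differ; datasites_path is an assumed local SyftBox location.
from pathlib import Path

def aggregate_once(datasites_path: Path) -> None:
    # Phase 1: Discovery - list candidate peer datasites.
    peers = network_participants(datasites_path)

    # Phases 2 and 3: Collection and Aggregation - get_network_cpu_mean reads each
    # peer's cpu_tracker.json, keeps only fresh entries, and averages them.
    mean_cpu, active_peers = get_network_cpu_mean(datasites_path, peers)

    if mean_cpu >= 0:
        print(f"Network mean CPU: {mean_cpu:.1f}% across {len(active_peers)} active peers")
    else:
        print("No fresh peer data available yet")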
Implementation Details
1. Network Participant Discovery
import os
from pathlib import Path

def network_participants(datasite_path: Path):
    """
    Retrieves a list of user directories (participants) in a given datasite path.
    [...]
    """
    entries = os.listdir(datasite_path)
    users = []
    for entry in entries:
        # Each top-level directory in the datasites folder corresponds to one peer.
        if Path(datasite_path / entry).is_dir():
            users.append(entry)
    return users
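A brief usage example; the datasites path below is an assumption and depends on where your SyftBox installation keeps its data.
# Hypothetical usage; adjust the path to match your SyftBox data directory.
from pathlib import Path

datasites_path = Path.home() / "SyftBox" / "datasites"  # assumed location
peers = network_participants(datasites_path)
print(f"Found {len(peers)} candidate peers: {peers}")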
2. Data Collection and Validation
from datetime import datetime, timedelta

def is_updated(timestamp: str) -> bool:
    """Verifies data freshness (within last minute)."""
    data_timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return datetime.now() - data_timestamp < timedelta(minutes=1)
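A quick illustration of the freshness check, using timestamps generated on the spot for the example:
# Illustrative only: a just-generated timestamp passes, a ten-minute-old one does not.
from datetime import datetime, timedelta

fresh = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
stale = (datetime.now() - timedelta(minutes=10)).strftime("%Y-%m-%d %H:%M:%S")
print(is_updated(fresh))  # True
print(is_updated(stale))  # False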
3. Aggregation Process
import json
from pathlib import Path
from typing import Tuple

def get_network_cpu_mean(datasites_path: Path, peers: list[str]) -> Tuple[float, list[str]]:
    """Calculates mean CPU usage across network peers."""
    aggregated_usage = aggregated_peers = 0
    active_peers = []
    for peer in peers:
        tracker_file = datasites_path / peer / "api_data" / "cpu_tracker" / "cpu_tracker.json"
        if not tracker_file.exists():
            continue  # peer has not published any CPU data yet
        with open(tracker_file, "r") as f:
            peer_data = json.load(f)
        # Count only peers whose data is fresh (reported within the last minute).
        if "timestamp" in peer_data and is_updated(peer_data["timestamp"]):
            aggregated_usage += float(peer_data["cpu"])
            aggregated_peers += 1
            active_peers.append(peer)
    return (aggregated_usage / aggregated_peers if aggregated_peers else -1), active_peers
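Note the sentinel: when no peer has fresh data the function returns -1 instead of raising, so callers should check for it before using the mean. A hedged example of handling it follows; the output filename is illustrative, not necessarily what main.py writes for the dashboard.
# Hypothetical handling of the -1 "no fresh data" sentinel; the output file name
# is an assumption for illustration.
import json
from pathlib import Path

datasites_path = Path.home() / "SyftBox" / "datasites"  # assumed location
peers = network_participants(datasites_path)
mean_cpu, active_peers = get_network_cpu_mean(datasites_path, peers)

if mean_cpu < 0:
    print("No peers reported fresh data in the last minute")
else:
    summary = {"mean_cpu": round(mean_cpu, 2), "active_peers": active_peers}
    Path("network_mean.json").write_text(json.dumps(summary))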
Privacy & Security Considerations
Data Privacy
The aggregator preserves privacy through:
- Exclusive use of differentially private metrics
- Time-limited data retention
- Aggregation of recent measurements only
- No exposure of individual peer data
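The aggregator only consumes metrics that members have already protected on their side. For intuition, here is a hedged sketch of the kind of Laplace noise a member node might apply before publishing; the epsilon, sensitivity, and use of NumPy are illustrative assumptions, not the members' actual implementation.
# Illustrative only: a Laplace-noised CPU reading, as a member node *might* produce it.
# Epsilon and sensitivity are assumed values; see the member-side CPU tracker for
# the real mechanism.
import numpy as np

def privatize_cpu(raw_cpu_percent: float, epsilon: float = 0.5, sensitivity: float = 100.0) -> float:
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)  # Laplace mechanism
    # Clamp to a displayable CPU percentage range.
    return float(min(100.0, max(0.0, raw_cpu_percent + noise)))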
Next Steps
Future enhancements could incorporate: