Privacy-Preserving Aggregator API

The Aggregator API collects and aggregates privacy-protected CPU metrics across distributed clients, calculating network-wide statistics and presenting them through an interactive dashboard.

info

All aggregation preserves the differential privacy guarantees provided by member nodes, ensuring meaningful network insights while protecting individual node privacy.

Source Code

Find the complete implementation in our cpu_tracker_leader GitHub repository.

Project Structure

cpu_tracker_leader/
├── run.sh              # Entry point script
├── main.py             # Main application logic
├── .gitignore          # Git ignore file
└── assets/             # Web dashboard assets
    ├── index.html      # Dashboard HTML
    ├── index.js        # Dashboard JavaScript
    └── syftbox-sdk.js  # SyftBox integration SDK

Required Dependencies

  • Python 3.12
  • SyftBox

The run.sh script automatically creates and manages the virtual environment.

Running the Application

Installation

# Clone the repository
git clone https://github.com/openmined/cpu_tracker_leader.git

# Copy to your SyftBox installation
cp -r cpu_tracker_leader <SYFTBOX_DATADIR>/apis

System Operation

The aggregator performs five core functions:

  • Scans the SyftBox directory (<datasite>/api_data/cpu_tracker/active_peers) for active CPU tracker members
  • Validates each peer's data for freshness (updated within the last minute) and proper privacy protections
  • Calculates network-wide statistics using only differentially private metrics
  • Maintains a rolling window of historical data for time-series analysis
  • Serves an interactive dashboard that visualizes the aggregated results

Access the real-time metrics dashboard at https://syftbox.openmined.org/datasites/<your-datasite>/cpu_tracker/, where you can view network CPU usage, historical trends, and active peer status. The dashboard also provides tools for new peers to join the network.
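Each member publishes a small JSON record that the aggregator reads. The exact schema is defined by the member app; based on the fields accessed in the aggregation code below ("cpu" and "timestamp"), a representative record might look like this (illustrative values only):

import json

# Illustrative cpu_tracker.json record shape (field names taken from the
# aggregation code below; the member app defines the real schema).
sample_record = {
    "cpu": 23.7,                         # differentially private CPU usage (%)
    "timestamp": "2024-01-15 10:32:05",  # format expected by is_updated()
}
print(json.dumps(sample_record, indent=2))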

API Workflow

The API operates in three phases:

  1. Discovery: The system identifies and validates active participants in the network, establishing secure channels for data collection.

  2. Data Collection: Protected CPU metrics are gathered from validated peers, ensuring data freshness and correct formatting before processing.

  3. Aggregation and Visualization: Network-wide statistics are calculated from collected metrics and displayed through an interactive dashboard that enables peer participation.
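A minimal sketch of how these phases could be tied together in a polling loop, using the helper functions shown in the next section. The output path and polling interval here are illustrative assumptions, not the repository's actual values:

import json
import time
from pathlib import Path

def run_aggregator(datasites_path: Path, output_file: Path, interval: int = 10):
    """Illustrative polling loop: discover peers, collect metrics, aggregate."""
    while True:
        # Phase 1: discovery
        peers = network_participants(datasites_path)

        # Phases 2 & 3: collection, validation, and aggregation
        mean_cpu, active_peers = get_network_cpu_mean(datasites_path, peers)

        # Persist the aggregate for the dashboard to read (path is an assumption)
        output_file.write_text(json.dumps({
            "cpu_mean": mean_cpu,
            "active_peers": active_peers,
        }))
        time.sleep(interval)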

Implementation Details

1. Network Participant Discovery


import os
from pathlib import Path


def network_participants(datasite_path: Path):
    """
    Retrieves a list of user directories (participants) in a given datasite path.
    [...]
    """
    entries = os.listdir(datasite_path)
    users = []
    for entry in entries:
        # Each top-level directory corresponds to one datasite (participant)
        if Path(datasite_path / entry).is_dir():
            users.append(entry)
    return users

View full implementation
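For example, assuming SyftBox syncs datasites under a local directory (the path below is illustrative), discovery is simply a listing of peer directories:

from pathlib import Path

# Assumed location for illustration; use your own SyftBox datasites directory.
datasites = Path.home() / "SyftBox" / "datasites"
peers = network_participants(datasites)
print(peers)  # e.g. ['alice@example.org', 'bob@example.org']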

2. Data Collection and Validation


from datetime import datetime, timedelta


def is_updated(timestamp: str) -> bool:
    """Verifies data freshness (within last minute)."""
    data_timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return datetime.now() - data_timestamp < timedelta(minutes=1)

View full implementation
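A quick check with a freshly generated timestamp versus a stale one illustrates the one-minute freshness window:

from datetime import datetime, timedelta

now = datetime.now()
fresh = now.strftime("%Y-%m-%d %H:%M:%S")
stale = (now - timedelta(minutes=5)).strftime("%Y-%m-%d %H:%M:%S")

print(is_updated(fresh))  # True  - within the one-minute window
print(is_updated(stale))  # False - older than one minute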

3. Aggregation Process


import json
from pathlib import Path
from typing import Tuple


def get_network_cpu_mean(datasites_path: Path, peers: list[str]) -> Tuple[float, list[str]]:
    """Calculates mean CPU usage across network peers."""
    aggregated_usage = aggregated_peers = 0
    active_peers = []

    for peer in peers:
        tracker_file = datasites_path / peer / "api_data" / "cpu_tracker" / "cpu_tracker.json"
        if not tracker_file.exists():
            continue  # peer has not published metrics yet

        with open(tracker_file, "r") as f:
            peer_data = json.load(f)
        if "timestamp" in peer_data and is_updated(peer_data["timestamp"]):
            aggregated_usage += float(peer_data["cpu"])
            aggregated_peers += 1
            active_peers.append(peer)

    # Returns -1 as the mean if no fresh peer data was found
    return (aggregated_usage / aggregated_peers if aggregated_peers else -1), active_peers

View full implementation
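Note that the function returns -1 as the mean when no peer has fresh data, so callers should treat that sentinel separately. Reusing the datasites and peers values from the earlier discovery example:

mean_cpu, active = get_network_cpu_mean(datasites, peers)
if mean_cpu < 0:
    print("No fresh peer data available yet")
else:
    print(f"Network mean CPU: {mean_cpu:.1f}% across {len(active)} active peers")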

Privacy & Security Considerations

Data Privacy

The aggregator preserves privacy through:

  • Exclusive use of differentially private metrics
  • Time-limited data retention, keeping only a rolling window of recent samples (see the sketch below)
  • Aggregation of recent measurements only
  • No exposure of individual peer data
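The rolling window mentioned above can be kept with a simple bounded buffer. A minimal sketch, in which the window size and sample structure are illustrative assumptions rather than the repository's actual values:

from collections import deque
from datetime import datetime

WINDOW_SIZE = 60  # number of recent samples to retain (assumed value)
history: deque = deque(maxlen=WINDOW_SIZE)  # older samples are dropped automatically

def record_sample(mean_cpu: float, active_peers: list[str]) -> None:
    """Append the latest aggregate; the deque discards anything beyond the window."""
    history.append({
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "cpu_mean": mean_cpu,
        "peer_count": len(active_peers),
    })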

Next Steps

Future enhancements could incorporate: