An Interactive Reading of

FedACT
Concurrent Federated Intelligence
across Heterogeneous Data Sources

The paper, in plain English

Your phone learns to predict your next word. Your smartwatch tracks your health. Your home assistant understands your voice. All of these run federated learning — model training that keeps your data on your device. But here's the problem: when all these tasks run at the same time across a shared pool of devices, the system bogs down. A powerful phone might get assigned to an easy task while a weak one struggles with a heavy model. Worse, some devices never get picked at all, starving the system of diverse data. FedACT tackles this exact problem.

FedACT introduces an alignment scoring mechanism that evaluates every possible device-job pairing across two dimensions. First, resource alignment: does the device actually have the compute, memory, and bandwidth the job needs? Second, participation fairness: has this device been left out too long? The scheduler then assigns the best-matched devices to each job, round after round, dynamically updating scores as the system runs. Think of it like an air traffic controller that routes planes not just to any runway, but to the one best suited for each aircraft's size and speed — while making sure every gate gets its fair share of traffic.

The result: FedACT cuts average job completion time by up to 8.3× compared to existing multi-job FL schedulers. Under non-IID data (the realistic case where each device holds a skewed, non-representative slice of data), FedACT improves final model accuracy by up to 44.5% — because fair participation ensures training data diversity that naive schedulers miss. For the hardest job in the test suite (VGG-16 on CIFAR-10 with heterogeneous devices), FedACT reaches the target accuracy 8.7× faster than sequential single-job training.

I

Resource Alignment

Match devices to jobs by computing a compatibility score across compute, memory, and bandwidth — ensuring no device is assigned a job it can't handle efficiently.

II

Participation Fairness

Balance device selection across jobs so that no data source is over- or under-represented, critical when data distributions are non-IID.

III

Concurrent Scheduling

Dynamically re-evaluate device-job pairings every training round — no static assignments, no blocking — maximizing resource utilization across all simultaneous jobs.

Chapter 1

Ships Passing in the Night

Federated learning is already hard with one model. What happens when three — or thirty — models all need to train at once across the same pool of smartphones?

In standard federated learning, a single global model is trained collaboratively across K devices. Each device k holds a local dataset $D_k$ of size $n_k$, and the goal is to minimize:

$$\min_{w} F(w) = \sum_{k=1}^{K} \frac{n_k}{N} F_k(w), \qquad F_k(w) = \frac{1}{n_k}\sum_{j=1}^{n_k} f(w; x_j, y_j)$$

This is the classic FedAvg objective: a weighted average of per-device empirical losses, where $N = \sum_{k=1}^{K} n_k$ is the total number of samples across all devices. It works, for one model. But in the real world, phones need to train next-word predictors, image classifiers, and speech recognizers all at the same time.
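To make the weighting concrete, here is a minimal sketch of the objective as plain arithmetic. The function name and the three example devices are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def fedavg_objective(local_losses: Sequence[float], local_sizes: Sequence[int]) -> float:
    """Weighted average of per-device losses F_k(w), each weighted by n_k / N."""
    if len(local_losses) != len(local_sizes):
        raise ValueError("need exactly one loss and one size per device")
    total = sum(local_sizes)  # N, the total number of samples
    if total == 0:
        raise ValueError("at least one device must hold data")
    return sum(n * loss for n, loss in zip(local_sizes, local_losses)) / total


# Three hypothetical devices: the largest one dominates the global loss.
losses = [0.5, 1.0, 2.0]
sizes = [100, 300, 600]
global_loss = fedavg_objective(losses, sizes)
print(global_loss)
```

The device holding 600 of the 1,000 samples contributes 60% of the objective, which is why skewed participation can skew the model.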

With M concurrent jobs, the problem becomes:

$$\min_{\mathbf{W}} \sum_{m=1}^{M} L^m, \quad L^m = \sum_{k=1}^{K} \frac{|D_k^m|}{|D^m|} F_k^m(w^m)$$

where $\mathbf{W} = \{w^1, w^2, \dots, w^M\}$ collects the model parameters for all jobs. Each job m has its own distributed dataset $D^m = \cup_k D_k^m$ and its own loss landscape. The naive approach — random device assignment per job — produces the blue and gray curves below. They work, but they're slow.
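The multi-job objective is the same weighted average, computed once per job and then summed. A small sketch, assuming each job reports one (loss, local data size) pair per device; the job names and numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

# For one job: a (local loss, local data size) pair for every device.
JobReport = List[Tuple[float, int]]


def job_loss(report: JobReport) -> float:
    """L^m: per-device losses weighted by |D_k^m| / |D^m|."""
    total = sum(size for _, size in report)
    if total == 0:
        raise ValueError("job has no data on any device")
    return sum(loss * size for loss, size in report) / total


def multi_job_objective(reports: Dict[str, JobReport]) -> float:
    """Sum of the per-job losses L^m over all M jobs."""
    return sum(job_loss(report) for report in reports.values())


# Two hypothetical jobs sharing two devices, each with its own local data.
reports = {
    "lenet": [(0.2, 50), (0.4, 150)],
    "vgg": [(1.0, 100), (3.0, 100)],
}
total_loss = multi_job_objective(reports)
print(total_loss)
```

Note that the weights differ per job: the same device can matter a lot for one job and very little for another.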

Each curve shows a different scheduling strategy reaching the same target accuracy. FedACT (red) reaches it in 43 minutes. Random (gray) takes 134 minutes — over 3× longer.

Random assignment wastes up to 70% of available training time in multi-job FL — not because the math is wrong, but because the scheduler is blind. It doesn't know which device is good at what.
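The 43-minute and 134-minute figures quoted for the chart are enough to sanity-check both claims. A quick calculation using only those two numbers:

```python
fedact_minutes = 43
random_minutes = 134

# How many times longer random assignment takes.
slowdown = random_minutes / fedact_minutes

# Fraction of random assignment's time that FedACT avoids.
saved_fraction = 1 - fedact_minutes / random_minutes

print(f"{slowdown:.2f}x slower, {saved_fraction:.1%} of the time avoidable")
```

For this pair of curves, roughly 68% of the random scheduler's time is avoidable, which is in line with the "up to 70%" figure.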

Chapter 2

A System That Actually Looks at Its Devices

FedACT replaces blind scheduling with a six-step round that continuously re-evaluates which device should train which model.

Figure: The six-step FedACT round.

FedACT's key innovation is Step 2 — the alignment scoring and scheduling plan. This is where resource compatibility and participation fairness are jointly evaluated. Steps 3–6 follow standard FL protocols, but Step 2 is what makes multi-job FL fast.

Chapter 3

Does This Device Fit This Job?

Every job has different resource appetites. A VGG network devours GPU memory; a simple LeNet sips it. FedACT computes a match score before making any assignment.

FedACT considers three resource types: computational power, memory, and network bandwidth. The execution time of job m on device k follows a shifted exponential distribution:

$$P[t_k^m < t] = \begin{cases} 1 - \exp\!\left(-\frac{\mu_k}{\tau^m \alpha_k |D_k^m|}\,(t - \tau^m\alpha_k|D_k^m|)\right), & t \ge \tau^m\alpha_k|D_k^m| \\[4pt] 0, & \text{otherwise} \end{cases}$$

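where $\alpha_k$ captures the device's maximum capability, $\mu_k$ captures its variability, $\tau^m$ is the number of local epochs for job m, and $|D_k^m|$ is the local data size. The shift $\tau^m\alpha_k|D_k^m|$ is the fastest possible execution time, so faster devices have smaller $\alpha_k$; a larger $\mu_k$ means the random delay on top of that minimum is shorter on average.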
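One way to build intuition for this distribution is to sample from it. The sketch below assumes only the formula above: the execution time is a fixed shift plus an exponential delay with rate $\mu_k$ divided by the shift. The function names and parameter values are hypothetical.

```python
import random


def min_execution_time(alpha_k: float, tau_m: int, data_size: int) -> float:
    """The shift tau^m * alpha_k * |D_k^m|: no run can finish faster than this."""
    return tau_m * alpha_k * data_size


def sample_execution_time(alpha_k: float, mu_k: float, tau_m: int,
                          data_size: int, rng: random.Random) -> float:
    """Draw one execution time from the shifted exponential distribution."""
    shift = min_execution_time(alpha_k, tau_m, data_size)
    rate = mu_k / shift  # rate of the exponential part
    return shift + rng.expovariate(rate)


def expected_execution_time(alpha_k: float, mu_k: float, tau_m: int,
                            data_size: int) -> float:
    """Mean of the distribution: shift plus the mean delay, shift / mu_k."""
    shift = min_execution_time(alpha_k, tau_m, data_size)
    return shift * (1 + 1 / mu_k)


# Hypothetical device: alpha_k = 0.01, mu_k = 2, running 5 epochs on 200 samples.
rng = random.Random(0)
samples = [sample_execution_time(0.01, 2.0, 5, 200, rng) for _ in range(20000)]
print(min(samples), sum(samples) / len(samples))
```

With these values the minimum time is 10 and the expected time is 15, in whatever time unit $\alpha_k$ is measured in. Doubling $\mu_k$ halves the average delay but leaves the minimum unchanged.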


A job is only considered for a device if every resource demand can be met. This prevents resource oversubscription. In the baseline systems, a VGG job might land on a weak device and stall everyone else waiting for it — the classic straggler problem.
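The "every demand must be met" rule can be sketched as a simple filter. The resource names, units, and example numbers below are assumptions for illustration; the paper's actual resource model may be richer.

```python
from typing import Dict, List

RESOURCES = ("compute", "memory", "bandwidth")


def can_host(device: Dict[str, float], demand: Dict[str, float]) -> bool:
    """True only if the device meets the job's demand on every resource."""
    return all(device[r] >= demand[r] for r in RESOURCES)


def feasible_jobs(device: Dict[str, float],
                  jobs: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of the jobs this device could run without oversubscription."""
    return [name for name, demand in jobs.items() if can_host(device, demand)]


# A hypothetical weak phone and two jobs with very different appetites.
weak_phone = {"compute": 1.0, "memory": 2.0, "bandwidth": 5.0}
jobs = {
    "vgg": {"compute": 4.0, "memory": 6.0, "bandwidth": 5.0},
    "lenet": {"compute": 0.5, "memory": 1.0, "bandwidth": 1.0},
}
print(feasible_jobs(weak_phone, jobs))
```

The weak phone has enough bandwidth for the VGG job, but it falls short on compute and memory, so only the LeNet job is considered for it.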

Chapter 4

Every Voice in the Room

Resource alignment picks the strongest devices. But when data is non-IID, always picking the strongest devices starves the model of diversity — and kills accuracy.

The fairness score of device k for job m in round r is one minus the squared deviation of its participation count from the average count across all K devices:

$$F_{k,m}^{r}(V_m^r) = 1 - \left(s_{k,m}^{r} - \frac{1}{K}\sum_{k'=1}^{K} s_{k',m}^{r}\right)^{\!2}$$

where $V_m^r$ is the set of devices scheduled for job m in round r, and $s_{k,m}^{r}$ counts how many times device k has been assigned to job m across all prior scheduling rounds. The participation count updates incrementally:

$$s_{k,m}^{r+1} = \begin{cases} s_{k,m}^{r} + 1, & \text{if device } k \in V_m^{r} \\[4pt] s_{k,m}^{r}, & \text{otherwise} \end{cases}$$

The score is highest (1.0) when a device's participation count exactly matches the average, and drops as it deviates — either by being picked too often (over-representation) or too rarely (under-representation).
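The fairness score and the count update translate almost directly into code. This is a sketch for a single job, with hypothetical device names; it follows the formula exactly as written above.

```python
from typing import Dict, Set


def fairness_scores(counts: Dict[str, int]) -> Dict[str, float]:
    """1 minus the squared deviation of each count from the mean count."""
    mean = sum(counts.values()) / len(counts)
    return {device: 1.0 - (s - mean) ** 2 for device, s in counts.items()}


def update_counts(counts: Dict[str, int], selected: Set[str]) -> Dict[str, int]:
    """Add one to the count of every device scheduled this round."""
    return {device: s + 1 if device in selected else s
            for device, s in counts.items()}


# Participation counts for one job after a few rounds (hypothetical).
counts = {"a": 3, "b": 1, "c": 2}
scores = fairness_scores(counts)
print(scores)

# Scheduling the under-used device "b" pulls the counts closer together.
next_counts = update_counts(counts, {"b"})
print(next_counts)
```

Device "c" sits exactly on the average and scores 1.0, while the over-used "a" and the under-used "b" are penalized equally. As written, the score is not bounded below, so a deviation larger than one round gives a negative value; whether the paper normalizes the counts first is not stated here.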


Under non-IID data, participation fairness improves accuracy by up to 44.5%. Without it, resource-preferred devices dominate training, and the global model never sees diverse data. With it, every device's data contributes to convergence.

Chapter 5

Balancing Speed and Fairness

Resource alignment speeds up training. Participation fairness improves accuracy. FedACT combines them with two tunable weights.

The alignment score for device k and job m in round r is:

$$\text{Score}_{k,m}^{r}(V_m^r) = \alpha \cdot R_{k,m}^{r}(V_m^r) \;+\; \beta \cdot F_{k,m}^{r}(V_m^r)$$

where $R_{k,m}^r$ is the resource alignment score (a normalized dot product of the device's available resources and the job's demands) and $F_{k,m}^r$ is the participation fairness score. After computing all scores, FedACT selects the highest-scoring devices for each job. Here $C_m$ is the fraction of the K devices assigned to job m, so each job receives its top $C_m \cdot K$ devices.
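Putting the two scores together might look like the sketch below. Several details are assumptions rather than the paper's definitions: the normalized dot product is taken to be cosine similarity, the number of selected devices is the fraction times the device count rounded up, and the weights 0.7 and 0.3 are arbitrary example values.

```python
import math
from typing import Dict, List, Sequence


def resource_alignment(resources: Sequence[float], demands: Sequence[float]) -> float:
    """Cosine similarity between a device's resources and a job's demands."""
    dot = sum(r * d for r, d in zip(resources, demands))
    norm = math.sqrt(sum(r * r for r in resources)) * math.sqrt(sum(d * d for d in demands))
    return dot / norm if norm else 0.0


def alignment_score(r_score: float, f_score: float, alpha: float, beta: float) -> float:
    """Weighted sum of resource alignment and participation fairness."""
    return alpha * r_score + beta * f_score


def select_devices(scores: Dict[str, float], fraction: float) -> List[str]:
    """Pick the highest-scoring devices for one job, best first."""
    count = max(1, math.ceil(fraction * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:count]


# Hypothetical (resource score, fairness score) pairs for four devices on one job.
components = {"a": (0.9, 0.0), "b": (0.6, 1.0), "c": (0.8, 0.75), "d": (0.3, 1.0)}
scores = {k: alignment_score(r, f, alpha=0.7, beta=0.3) for k, (r, f) in components.items()}
chosen = select_devices(scores, fraction=0.5)
print(chosen)
```

Device "a" has the best resource match but has been over-used, so it loses its slot to "c" and "b". This sketch scores one job in isolation; a full scheduler also has to resolve the case where two jobs want the same device.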

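The weight $\alpha$ favors fast convergence by emphasizing resource alignment, and the weight $\beta$ favors high accuracy by emphasizing fairness. (These weights are unrelated to the per-device parameter $\alpha_k$ from Chapter 3.) In experiments, $\alpha$ and $\beta$ are empirically tuned from short training runs before full deployment.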


The sweet spot depends on your data distribution. Under IID data, crank up $\alpha$ and race to the finish. Under non-IID, $\beta$ becomes critical — without it, the model converges to a biased solution that misses entire data patterns.

Chapter 6

Testing Across the Map

FedACT was evaluated on 5 benchmark datasets, 5 model architectures, and 2 concurrent job groups — against 4 baselines, under both IID and non-IID data.

Baselines compared against:

Hardware: 4× NVIDIA RTX A4000, Intel i9-10900X, 64 GB RAM. Training setup: SGD optimizer, 5 local epochs, and 100 devices, with 10% selected per job per round.
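For readers who want to reproduce a similar setup, the stated settings can be captured in a small configuration object. The class and field names are hypothetical; only the four values come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Training settings quoted in the text; the field names are illustrative."""
    num_devices: int = 100
    selection_fraction: float = 0.10  # share of devices picked per job per round
    local_epochs: int = 5
    optimizer: str = "sgd"

    def devices_per_job_per_round(self) -> int:
        """Number of devices each job trains on in a single round."""
        return round(self.num_devices * self.selection_fraction)


config = ExperimentConfig()
print(config.devices_per_job_per_round())
```

With 100 devices and a 10% selection rate, each job trains on 10 devices per round, so most devices sit idle for any given job in any given round.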

The experiments span 5 datasets, 5 models, and 2 distinct job groups — covering a 420× range in parameter count (62K to 26M) and both IID and non-IID data distributions. This isn't a narrow benchmark; it's a stress test.

Chapter 7

8.3× Faster, 44.5% More Accurate

FedACT doesn't just edge out the competition — it reshapes the curve. The benefits are largest exactly where they matter most: complex models on heterogeneous, non-IID data.

Avg. JCT Reduction (FedACT): 8.3×
Max Accuracy Gain: 44.5%
Max Speedup (Single Job): 8.7×

Under non-IID settings, FedACT cuts average job completion time (JCT) by up to 8.3× versus baselines. Even under IID (the "easy" case), it's up to 3.9× faster. The gains are system-wide, not cherry-picked: every job in every group finishes faster with FedACT.

Chapter 8

From Your Phone to the Factory Floor

Multi-job federated learning isn't a hypothetical — it's already happening. Your phone already runs multiple on-device ML models. FedACT shows us how to run them efficiently.

FedACT's contributions, summarized:

The paper also lays out concrete future directions: extending FedACT to support asynchronous model updates (so a device can train multiple jobs without server-side blocking), and incorporating privacy-enhancing techniques like homomorphic encryption, secure aggregation, and differential privacy into the scheduling pipeline.

The scheduler is not a plumbing detail — it's a multiplier. In federated systems, where every device counts and every round matters, smart scheduling delivers compound gains: faster convergence, higher accuracy, and better resource utilization — all at once.

Paper: arXiv:2605.00011 · Accepted at IPDPS 2026, New Orleans · Built for geepity.com