An Interactive Reading of

FedACT
Concurrent Federated Intelligence
across Heterogeneous Data Sources

Md Sirajul Islam · Isabelle G Chapman · N I Md Ashafuddula · Xu Yuan · Li Chen · Nian-Feng Tzeng · Klara Nahrstedt
University of Louisiana at Lafayette · University of Delaware · UIUC
IPDPS 2026, New Orleans · arXiv:2605.00011

The paper, in plain English

Your phone learns to predict your next word. Your smartwatch tracks your health. Your home assistant understands your voice. All of these can be trained with federated learning, a form of model training that keeps your data on your device. But here's the problem: when all these tasks run at the same time across a shared pool of devices, the system bogs down. A powerful phone might get assigned to an easy task while a weak one struggles with a heavy model. Worse, some devices never get picked at all, starving the system of diverse data. FedACT tackles this exact problem.

FedACT introduces an alignment scoring mechanism that evaluates every possible device-job pairing across two dimensions. First, resource alignment: does the device actually have the compute, memory, and bandwidth the job needs? Second, participation fairness: has this device been left out too long? The scheduler then assigns the best-matched devices to each job, round after round, dynamically updating scores as the system runs. Think of it like an air traffic controller that routes planes not just to any runway, but to the one best suited for each aircraft's size and speed — while making sure every gate gets its fair share of traffic.

The result: FedACT cuts average job completion time by up to 8.3× compared to existing multi-job FL schedulers. Under non-IID data (the realistic case where each device holds a skewed, non-representative slice of data), FedACT improves final model accuracy by up to 44.5% — because fair participation ensures training data diversity that naive schedulers miss. For the hardest job in the test suite (VGG-16 on CIFAR-10 with heterogeneous devices), FedACT reaches the target accuracy 8.7× faster than sequential single-job training.

I

Resource Alignment

Match devices to jobs by computing a compatibility score across compute, memory, and bandwidth — ensuring no device is assigned a job it can't handle efficiently.

II

Participation Fairness

Balance each job's device selection so that no data source is over- or under-represented, which is critical when data distributions are non-IID.

III

Concurrent Scheduling

Dynamically re-evaluate device-job pairings every training round — no static assignments, no blocking — maximizing resource utilization across all simultaneous jobs.

Chapter 1

Ships Passing in the Night

Federated learning is already hard with one model. What happens when three — or thirty — models all need to train at once across the same pool of smartphones?

In plain English

Picture a rideshare fleet operating in a city. Each car has a data connection, an onboard computer, and varying amounts of downtime. The fleet needs to simultaneously improve three algorithms: an arrival-time predictor, a surge-pricing model, and a driver-routing engine. If you naively assign cars using round-robin or random selection, the beefiest onboard computers might get stuck on the simplest model while underpowered units choke on the routing engine. Some cars sit idle while others are overworked. Now add the twist that each car only sees a skewed slice of the city's ride patterns — that's non-IID data — and you've got the multi-job FL problem in a nutshell.

Existing FL algorithms such as FedAvg were designed for a single job. Naively extended to multiple jobs, they break down: they either run jobs sequentially (leaving devices idle) or assign devices at random (wasting compute). Neither approach considers which device is actually good at which job.

In standard federated learning, a single global model is trained collaboratively across K devices. Each device k holds a local dataset $D_k$ of size $n_k$, with $N = \sum_{k=1}^{K} n_k$ samples in total, and the goal is to minimize:

$$\min_{w} F(w) = \sum_{k=1}^{K} \frac{n_k}{N} F_k(w), \qquad F_k(w) = \frac{1}{n_k}\sum_{j=1}^{n_k} f(w; x_j, y_j)$$

This is the classic FedAvg objective: a weighted average of per-device empirical losses. It works, but only for one model. In the real world, phones need to train next-word predictors, image classifiers, and speech recognizers all at the same time.
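The weighted objective is easy to evaluate once each device's local loss is known. Below is a minimal Python sketch, assuming the per-device losses $F_k(w)$ have already been computed; the function names are hypothetical.

```python
def fedavg_objective(local_losses, local_sizes):
    """F(w) = sum_k (n_k / N) * F_k(w), given each device's local loss F_k(w)."""
    total = sum(local_sizes)  # N = sum_k n_k
    return sum(n / total * loss for loss, n in zip(local_losses, local_sizes))


def multi_job_objective(jobs):
    """Sum of per-job weighted losses; `jobs` maps job name -> (losses, sizes)."""
    return sum(fedavg_objective(losses, sizes) for losses, sizes in jobs.values())


# Three devices: the device holding 300 of the 500 samples dominates the average.
print(fedavg_objective([0.5, 1.0, 2.0], [100, 300, 100]))  # ≈ 1.1
```

The multi-job objective that follows simply sums one such weighted term per job, each with its own per-device losses and dataset sizes $|D_k^m|$.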

With M concurrent jobs, the problem becomes:

$$\min_{\mathbf{W}} \sum_{m=1}^{M} L^m, \quad L^m = \sum_{k=1}^{K} \frac{|D_k^m|}{|D^m|} F_k^m(w^m)$$

where $\mathbf{W} = \{w^1, w^2, \dots, w^M\}$ collects the model parameters for all jobs. Each job m has its own distributed dataset $D^m = \cup_k D_k^m$ and its own loss landscape. The naive approach, random device assignment per job, does converge, but slowly.

Figure: time to reach the same target accuracy under different scheduling strategies. FedACT (red) reaches it in 43 minutes; Random (gray) takes 134 minutes, over 3× longer.

Random assignment wastes up to 70% of available training time in multi-job FL — not because the math is wrong, but because the scheduler is blind. It doesn't know which device is good at what.


Chapter 2

A System That Actually Looks at Its Devices

FedACT replaces blind scheduling with a six-step round that continuously re-evaluates which device should train which model.

Figure: The six-step FedACT round.

FedACT's key innovation is Step 2 — the alignment scoring and scheduling plan. This is where resource compatibility and participation fairness are jointly evaluated. Steps 3–6 follow standard FL protocols, but Step 2 is what makes multi-job FL fast.


Chapter 3

Does This Device Fit This Job?

Every job has different resource appetites. A VGG network devours GPU memory; a simple LeNet sips it. FedACT computes a match score before making any assignment.

FedACT considers three resource types: computational power, memory, and network bandwidth. It models the local execution time of job m on device k with a shifted exponential distribution:

$$P[t_k^m < t] = \begin{cases} 1 - \exp\!\left(-\frac{\mu_k}{\tau^m \alpha_k |D_k^m|}\,(t - \tau^m\alpha_k|D_k^m|)\right), & t \ge \tau^m\alpha_k|D_k^m| \\[4pt] 0, & \text{otherwise} \end{cases}$$

where $\alpha_k$ captures the device's max capability, $\mu_k$ captures its variability, $\tau^m$ is the number of local epochs for job m, and $|D_k^m|$ is the local data size. Faster devices have larger $\mu_k$ and smaller $\alpha_k$.
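The model is easy to simulate. The Python sketch below evaluates the CDF above and draws samples as the fixed shift $\tau^m\alpha_k|D_k^m|$ plus an exponential delay with rate $\mu_k/(\tau^m\alpha_k|D_k^m|)$. The device parameters are made-up values, and `alpha_k` here is the per-device capability from this formula, not the score weight $\alpha$ used later.

```python
import math
import random


def exec_time_cdf(t, alpha_k, mu_k, tau_m, data_size):
    """P[t_k^m < t] under the shifted exponential model above."""
    shift = tau_m * alpha_k * data_size  # minimum possible execution time
    if t < shift:
        return 0.0
    rate = mu_k / shift  # equals mu_k / (tau^m * alpha_k * |D_k^m|)
    return 1.0 - math.exp(-rate * (t - shift))


def sample_exec_time(alpha_k, mu_k, tau_m, data_size, rng=random):
    """One execution time: the fixed shift plus an exponentially distributed delay."""
    shift = tau_m * alpha_k * data_size
    return shift + rng.expovariate(mu_k / shift)


# Hypothetical devices: alpha_k in seconds per sample, 5 local epochs, 500 samples each.
rng = random.Random(0)
for name, alpha_k, mu_k in [("fast", 0.001, 5.0), ("slow", 0.004, 1.0)]:
    times = [sample_exec_time(alpha_k, mu_k, 5, 500, rng) for _ in range(10_000)]
    print(name, round(sum(times) / len(times), 2))
```

The mean of this distribution is $\tau^m\alpha_k|D_k^m|(1 + 1/\mu_k)$, so the sample means should land near 3 s for the hypothetical fast device and 20 s for the slow one. The slow device pays twice: a larger fixed cost and a longer random tail.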

Example resource levels:

Device A: compute 0.7, memory 0.5, bandwidth 0.8
Device B: compute 0.3, memory 0.4, bandwidth 0.4
Device C: compute 0.9, memory 0.9, bandwidth 0.3

Job 1 (LeNet): required compute 0.2
Job 2 (CNN): required compute 0.5
Job 3 (VGG): required compute 0.8


A job is only considered for a device if every one of its resource demands can be met. This prevents resource oversubscription. Without this check, a VGG job might land on a weak device and leave the rest of the round waiting for it: the classic straggler problem.
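The eligibility rule and a resource alignment score can be sketched in Python using the example device and job values above. Only compute demands are given for the jobs, so the memory and bandwidth demands below are hypothetical, and cosine similarity is just one plausible reading of the "normalized dot product" that defines $R_{k,m}^r$ in Chapter 5.

```python
import math

# Device resources from the example above: (compute, memory, bandwidth).
DEVICES = {"A": (0.7, 0.5, 0.8), "B": (0.3, 0.4, 0.4), "C": (0.9, 0.9, 0.3)}

# Compute demands come from the example; memory and bandwidth demands
# are hypothetical values chosen for illustration.
JOBS = {"LeNet": (0.2, 0.1, 0.1), "CNN": (0.5, 0.3, 0.3), "VGG": (0.8, 0.7, 0.3)}


def feasible(device, demand):
    """Eligible only if the device meets every resource demand of the job."""
    return all(have >= need for have, need in zip(device, demand))


def resource_alignment(device, demand):
    """Cosine similarity of resource and demand vectors, or None if infeasible.

    Cosine similarity is one possible reading of "normalized dot product";
    the paper's exact normalization may differ.
    """
    if not feasible(device, demand):
        return None
    dot = sum(have * need for have, need in zip(device, demand))
    return dot / (math.hypot(*device) * math.hypot(*demand))


for job, demand in JOBS.items():
    eligible = {
        dev: round(resource_alignment(res, demand), 3)
        for dev, res in DEVICES.items()
        if feasible(res, demand)
    }
    print(job, eligible)
```

With these numbers only device C can host the VGG job and device B can only take LeNet, so a scheduler that respects the check never hands VGG to B. Cosine similarity rewards a device whose resource profile has the same shape as the job's demands; a scheme that also rewarded raw headroom would need a different normalization.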


Chapter 4

Every Voice in the Room

Resource alignment picks the strongest devices. But when data is non-IID, always picking the strongest devices starves the model of diversity — and kills accuracy.

The fairness score of device k for job m in round r is one minus the squared deviation of the device's participation count from the average:

$$F_{k,m}^{r}(V_m^r) = 1 - \left(s_{k,m}^{r} - \frac{1}{|K|}\sum_{k' \in K} s_{k',m}^{r}\right)^{\!2}$$

where $s_{k,m}^{r}$ counts how many times device k has been assigned to job m across all prior scheduling rounds, and $V_m^r$ is the set of devices assigned to job m in round r. The participation count updates incrementally:

$$s_{k,m}^{r+1} = \begin{cases} s_{k,m}^{r} + 1, & \text{if device } k \in V_m^{r} \\[4pt] s_{k,m}^{r}, & \text{otherwise} \end{cases}$$

The score is highest (1.0) when a device's participation count exactly matches the average, and drops quadratically as the count deviates in either direction: picked too often (over-representation) or too rarely (under-representation). Once the gap exceeds one assignment, the score turns negative.
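Both formulas fit in a few lines. This is a minimal Python sketch for a single job m, with participation counts kept in a dictionary keyed by device; the function names and example counts are hypothetical.

```python
def fairness_scores(counts):
    """F_{k,m} = 1 - (s_{k,m} - mean over k' of s_{k',m})^2 for every device k of one job."""
    mean = sum(counts.values()) / len(counts)
    return {k: 1.0 - (s - mean) ** 2 for k, s in counts.items()}


def update_counts(counts, selected):
    """s^{r+1}_{k,m} = s^r_{k,m} + 1 if device k was assigned to job m, else unchanged."""
    return {k: s + (1 if k in selected else 0) for k, s in counts.items()}


# Hypothetical participation counts for one job after a few rounds.
counts = {"A": 2, "B": 1, "C": 0}
print(fairness_scores(counts))  # {'A': 0.0, 'B': 1.0, 'C': 0.0}

counts = update_counts(counts, selected={"C"})
print(counts)  # {'A': 2, 'B': 1, 'C': 1}
```

Device A (picked too often) and device C (picked too rarely) receive the same score of 0.0, which is the symmetric penalty described above.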


Under non-IID data, FedACT improves accuracy by up to 44.5% over the baselines, largely thanks to participation fairness. Without it, resource-preferred devices dominate training, and the global model never sees diverse data. With it, every device's data contributes to convergence.


Chapter 5

Balancing Speed and Fairness

Resource alignment speeds up training. Participation fairness improves accuracy. FedACT combines them with two tunable weights.

The alignment score for device k and job m in round r is:

$$\text{Score}_{k,m}^{r}(V_m^r) = \alpha \cdot R_{k,m}^{r}(V_m^r) \;+\; \beta \cdot F_{k,m}^{r}(V_m^r)$$

where $R_{k,m}^r$ is the resource alignment score (a normalized dot product of the device's resource vector and the job's demand vector) and $F_{k,m}^r$ is the participation fairness score. After computing all scores, FedACT selects the top-scoring $C_m |K|$ devices for each job, where $C_m$ is the fraction of the device pool assigned to job m (10% in the experiments).

Raising $\alpha$ favors fast convergence; raising $\beta$ favors high accuracy. In the experiments, $\alpha$ and $\beta$ are tuned empirically with short training runs before full deployment.
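To see how the two weights interact, here is a small greedy Python sketch that combines the scores and fills each job's quota of $C_m |K|$ devices. The function names, the example scores, the job-by-job greedy order, and the rule that a device trains at most one job per round are assumptions made for this illustration, not the paper's published algorithm.

```python
def alignment_score(r, f, alpha, beta):
    """Score = alpha * R + beta * F for one device-job pair."""
    return alpha * r + beta * f


def schedule_round(R, F, fractions, num_devices, alpha, beta):
    """Greedy sketch: give each job its top round(C_m * K) eligible devices.

    R[m][k] is None when device k cannot host job m. Hypothetical policy:
    jobs are served in dict order and a device trains at most one job per round.
    """
    taken = set()
    plan = {}
    for m, r_scores in R.items():
        quota = round(fractions[m] * num_devices)
        ranked = sorted(
            (
                (alignment_score(r, F[m][k], alpha, beta), k)
                for k, r in r_scores.items()
                if r is not None and k not in taken
            ),
            reverse=True,
        )
        plan[m] = [k for _, k in ranked[:quota]]
        taken.update(plan[m])
    return plan


# Hypothetical scores: two jobs, four devices, each job gets half the pool.
R = {"j1": {"A": 0.9, "B": 0.8, "C": None, "D": 0.5},
     "j2": {"A": 0.7, "B": 0.9, "C": 0.6, "D": 0.4}}
F = {"j1": {"A": 0.2, "B": 1.0, "C": 1.0, "D": 1.0},
     "j2": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}}
fractions = {"j1": 0.5, "j2": 0.5}

print(schedule_round(R, F, fractions, 4, alpha=1.0, beta=0.0))
# {'j1': ['A', 'B'], 'j2': ['C', 'D']}
print(schedule_round(R, F, fractions, 4, alpha=0.7, beta=0.3))
# {'j1': ['B', 'A'], 'j2': ['C', 'D']}
```

With $\beta = 0$ the ranking for job j1 follows resource fit alone; once fairness carries weight, device B, whose participation is better balanced, overtakes device A. Serving jobs one at a time gives earlier jobs first pick, so a production scheduler might prefer a joint assignment.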


The sweet spot depends on your data distribution. Under IID data, crank up $\alpha$ and race to the finish. Under non-IID, $\beta$ becomes critical — without it, the model converges to a biased solution that misses entire data patterns.


Chapter 6

Testing Across the Map

FedACT was evaluated on 5 benchmark datasets, 5 model architectures, and 2 concurrent job groups — against 4 baselines, under both IID and non-IID data.

Baselines compared against:

Random — extended from single-job FedAvg; picks devices uniformly at random per round
Greedy — iteratively maximizes a combined resource+penalty score for device selection
Genetic — uses heuristic initialization + evolutionary strategies to optimize assignments
MJFL — applies Bayesian Optimization to assign devices considering communication & computation

Hardware: 4× NVIDIA RTX A4000 GPUs, Intel i9-10900X CPU, 64 GB RAM. Training setup: SGD optimizer, 5 local epochs, 100 devices, with 10% selected per job per round.

The experiments span 5 datasets, 5 models, and 2 distinct job groups — covering a 420× range in parameter count (62K to 26M) and both IID and non-IID data distributions. This isn't a narrow benchmark; it's a stress test.


Chapter 7

8.3× Faster, 44.5% More Accurate

FedACT doesn't just edge out the competition — it reshapes the curve. The benefits are largest exactly where they matter most: complex models on heterogeneous, non-IID data.

Average job completion time reduction: up to 8.3×

Maximum accuracy gain: 44.5%

Maximum single-job speedup: 8.7×

Under non-IID settings, FedACT cuts average job completion time (JCT) by up to 8.3× versus the baselines. Even under IID data (the "easy" case), it is up to 3.9× faster. The gains are system-wide, not cherry-picked: every job in every group finishes faster with FedACT.


Chapter 8

From Your Phone to the Factory Floor

Multi-job federated learning isn't a hypothetical — it's already happening. Your phone already runs multiple on-device ML models. FedACT shows us how to run them efficiently.

In plain English

The paper frames multi-job FL as a scheduling problem, but the real insight is deeper. Right now, every on-device ML model (your keyboard's next-word predictor, your photos app's face detector, your voice assistant's speech recognizer) trains in isolation. They fight over the same device resources without coordination. FedACT shows that just by being smarter about who trains what and when, you can finish jobs up to 8× faster, without changing the models, upgrading hardware, or compromising privacy. That's a pure scheduling dividend.

The next step, as the authors note, is asynchronous multi-job FL: letting a single device train multiple models simultaneously without waiting on server aggregation. Combined with FedACT's alignment scoring, that could unlock further speedups. For now, though, FedACT demonstrates a key architectural principle: in federated systems, the scheduler is a first-class optimization target, not an afterthought.

FedACT's contributions, summarized:

A resource alignment scoring mechanism that matches devices to jobs based on compute, memory, and bandwidth compatibility
A participation fairness module that prevents over-representation of resource-rich devices, critical for non-IID data
A dynamic, online scheduling algorithm that recomputes device-job pairings every round
Experimental validation across 5 datasets, 5 models, and 4 baselines, under both IID and non-IID distributions
Up to 8.3× faster average job completion and up to 44.5% higher accuracy than state-of-the-art baselines

The paper also lays out concrete future directions: extending FedACT to support asynchronous model updates (so a device can train multiple jobs without server-side blocking), and incorporating privacy-enhancing techniques like homomorphic encryption, secure aggregation, and differential privacy into the scheduling pipeline.

The scheduler is not a plumbing detail — it's a multiplier. In federated systems, where every device counts and every round matters, smart scheduling delivers compound gains: faster convergence, higher accuracy, and better resource utilization — all at once.

Paper: arXiv:2605.00011 · Accepted at IPDPS 2026, New Orleans
