ASSC 2026 · Association for the Scientific Study of Consciousness
We implement the mechanism Global Workspace Theory describes inside a pretrained LLM, and connect it to IIT through synergy. Because the workspace runs in software and its parts can be switched on and off, the same system serves as a testbed for both theories, permitting the clean interventions that the brain rarely allows. We use it to study how the workspace changes reasoning, integration, and generalisation.
A capacity-limited spotlight selects salient components, writes them sparsely to a shared workspace, and broadcasts the result back to all components; this cycle is iterated across layers. Unlike prior workspace networks trained from scratch, ours runs on a pretrained model's KV cache. It is derived as an information bottleneck: trained on the model's own loss, it keeps what the latent carries about the output while compressing what it carries about the input. Compress the input, keep the output.
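The select, write, broadcast cycle described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used here: it works on a plain array of component vectors, not on a KV cache, and the names `workspace_step`, `w_salience`, `w_write`, `w_read` and the top-k gating rule are all hypothetical.

```python
import numpy as np

def workspace_step(components, w_salience, w_write, w_read, k):
    """One select -> write -> broadcast step (illustrative sketch only).

    components: (n, d) array, one row per component (e.g. one per head).
    w_salience: (d,)   hypothetical scoring vector for the spotlight.
    w_write:    (d, m) hypothetical projection into an m-dim workspace.
    w_read:     (m, d) hypothetical projection from workspace to components.
    k:          capacity limit, the number of components admitted.
    """
    scores = components @ w_salience                # salience per component, (n,)
    selected = np.argsort(scores)[-k:]              # spotlight: indices of the top-k
    gate = np.exp(scores[selected] - scores[selected].max())
    gate /= gate.sum()                              # softmax over the winners only
    workspace = gate @ (components[selected] @ w_write)  # sparse write, (m,)
    updated = components + workspace @ w_read       # broadcast to every component
    return updated, workspace, selected

# Toy run with random values; sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
components = rng.normal(size=(10, 8))
updated, workspace, selected = workspace_step(
    components,
    w_salience=rng.normal(size=8),
    w_write=rng.normal(size=(8, 4)),
    w_read=rng.normal(size=(4, 8)),
    k=2,
)
```

Only `k` rows contribute to the workspace, but every row receives the same broadcast vector. Iterating across layers would amount to calling such a step once per layer with that layer's own weights, which is again an assumption of this sketch.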
| Method | GSM8K | SVAMP | GSM-Hd | LogiQA | Gaokao | AVG |
|---|---|---|---|---|---|---|
| Llama-3.2 1B | | | | | | |
| SFT | 33.00 | 42.00 | 8.00 | 28.90 | 23.90 | 27.16 |
| BT | 35.30 | 43.70 | 8.20 | 28.60 | 25.60 | 28.28 |
| +GW | 35.60 | 46.70 | 7.70 | 29.50 | 25.40 | 28.98 |
| Llama-3.2 3B | | | | | | |
| SFT | 53.98 | 64.67 | 14.33 | 30.26 | 31.05 | 38.86 |
| BT | 57.09 | 71.33 | 14.63 | 30.26 | 30.20 | 40.70 |
| +GW | 58.30 | 69.67 | 15.85 | 31.49 | 30.48 | 41.16 |
| Llama-3.1 8B | | | | | | |
| SFT | 13.12 | 23.33 | 3.11 | 27.19 | 28.21 | 18.99 |
| BT | 20.85 | 39.33 | 4.93 | 27.04 | 26.50 | 23.73 |
| +GW | 21.76 | 40.00 | 5.38 | 26.57 | 25.07 | 23.76 |
| Qwen3-0.6B | | | | | | |
| SFT | 53.70 | 68.30 | 20.30 | 27.50 | 26.80 | 39.32 |
| BT | 54.80 | 68.70 | 21.20 | 27.20 | 26.50 | 39.68 |
| +GW | 55.00 | 70.00 | 21.10 | 27.50 | 27.10 | 39.94 |
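Across four backbones, broadcasting selected information improves average multi-step reasoning: +GW has the highest AVG in every group, by 0.03 to 0.70 points over BT, though results on individual benchmarks are mixed. This is access, the consequence GNWT names: the broadcast makes information globally available and usable.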
A selective workspace using about 20% of heads matches the full dense version, at roughly 10× fewer parameters and lower compute. The capacity limit is not a sacrifice. This result is preliminary; the width sweep against the full-width model is in progress.
Synergy is information in the whole that no sum of parts carries, and broadcasting shifts representations toward it. This is integration, the consequence IIT names, produced by the mechanism GWT describes. Representations remain redundancy-dominated, so this is a relative shift.
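A textbook case makes "information in the whole that no sum of parts carries" concrete: for Y = X1 XOR X2 with fair, independent inputs, each input alone tells us nothing about Y, while the pair determines it. The sketch below computes a whole-minus-sum quantity for that case. This is only one of several synergy measures and is used here as an assumption for illustration; it is not necessarily the measure behind the results above.

```python
import itertools
import math
from collections import Counter

def mutual_information(pairs):
    """I(A;B) in bits, from a list of equally likely (a, b) outcomes."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in p_ab.items()
    )

# All four equally likely input pairs, with Y = X1 XOR X2.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in itertools.product([0, 1], repeat=2)]

i_x1 = mutual_information([(x1, y) for x1, _, y in outcomes])           # 0 bits
i_x2 = mutual_information([(x2, y) for _, x2, y in outcomes])           # 0 bits
i_joint = mutual_information([((x1, x2), y) for x1, x2, y in outcomes])  # 1 bit

# Whole-minus-sum: information carried jointly beyond the individual parts.
whole_minus_sum = i_joint - (i_x1 + i_x2)                               # 1 bit
```

Whole-minus-sum can go negative when the parts mostly repeat each other, which is one way to read a redundancy-dominated regime.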
Iterating the workspace lifts synergy and performance to a peak, then over-integration reduces both. This predicts an over-integration limit and a natural stopping rule.
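A rise-then-fall curve suggests a simple stopping rule: keep iterating while the tracked quantity improves, and stop once it has failed to improve for a set number of steps. The sketch below is one hedged way to write that down. The function `stop_at_peak`, the `patience` parameter and the example values are hypothetical and purely illustrative; the values are not measurements.

```python
def stop_at_peak(scores, patience=1):
    """Return the index of the best score seen before stopping.

    scores:   non-empty sequence, one value per workspace iteration
              (e.g. a synergy estimate or a validation score).
    patience: consecutive non-improving iterations tolerated before stopping.
    """
    best_idx, best = 0, scores[0]
    stale = 0
    for i, s in enumerate(scores[1:], start=1):
        if s > best:
            best_idx, best, stale = i, s, 0
        else:
            stale += 1
            if stale >= patience:
                break  # treat further iterations as over-integration
    return best_idx

# Made-up curve that rises and then falls.
example_curve = [0.10, 0.14, 0.17, 0.15, 0.12]
best_iteration = stop_at_peak(example_curve)
```

A larger `patience` tolerates noisy curves at the price of a few extra iterations.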
Re-coupling. Specialised modules approximate a factorised, mean-field posterior that drops the dependencies between them. Synergy is the part no subset captures. The workspace re-couples a few modules to recover it and to escape the local optima that factorised inference gets stuck in, at an energy cost.
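In symbols, and as a sketch under assumed notation (latent variables $z_1, \dots, z_n$, one per module, which the text above does not define), a mean-field posterior has the form

$$
q(z_1, \dots, z_n) = \prod_{i=1}^{n} q_i(z_i).
$$

One standard way to quantify the dependencies such a product cannot represent in a joint distribution $p$ with marginals $p_i$ is the total correlation,

$$
\mathrm{TC}(p) = \sum_{i=1}^{n} H(p_i) - H(p) = \mathrm{KL}\Big(p \,\Big\|\, \prod_{i=1}^{n} p_i\Big) \ge 0,
$$

which is zero exactly when $p$ itself factorises. Whether this is the quantity the workspace recovers is an assumption of this note.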
Compression. Trained on the model's own loss, the workspace keeps what the latent tells us about the output while discarding input detail (data processing inequality). Compress the input, keep the output: the condition for generalisation.
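For reference, the standard information bottleneck objective can be written as follows. The symbols are assumptions of this note: $X$ for the input, $Y$ for the output, $Z$ for the workspace latent, and $\beta > 0$ for the trade-off weight. The exact objective trained here may differ.

$$
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
$$

The first term compresses the input and the second keeps the output. Because $Z$ is computed from $X$ alone, $Y \to X \to Z$ forms a Markov chain, and the data processing inequality gives

$$
I(Z; Y) \le I(X; Y),
$$

so the latent can at best retain the output-relevant information already present in the input.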
Compression and synergy are orthogonal: how much of the input survives, versus how it is organised across modules. The workspace does both. We see synergy rise where broadcast helps; we have not yet shown it is the cause.
We have the testbed. Help us design the experiment.