The Global Key-Value Workspace

The idea

A testbed for consciousness theories

Theories of consciousness are normally tested in brains, where clean intervention is hard. An LLM with an explicit workspace gives us a system into which we can build a theory's commitments and then intervene on them directly.

We use it as a testbed for Global Workspace Theory (GWT), Integrated Information Theory (IIT), and how the two relate.

Results · reasoning

The workspace improves math and logic reasoning

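Across four backbones, broadcasting selected information through the workspace raises average accuracy on multi-step math and logic reasoning over both supervised fine-tuning (SFT) and the Bottlenecked Transformer (BT). The margin over BT is small, from +0.03 points (Llama-3.1 8B) to +0.70 points (Llama-3.2 1B), and not every task improves. The workspace does functional work: GWT's access claim made concrete.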

Method	GSM8K	SVAMP	GSM-Hd	LogiQA	Gaokao	AVG
Llama-3.2 1B
SFT	33.00	42.00	8.00	28.90	23.90	27.16
BT	35.30	43.70	8.20	28.60	25.60	28.28
+GW	35.60	46.70	7.70	29.50	25.40	28.98
Llama-3.2 3B
SFT	53.98	64.67	14.33	30.26	31.05	38.86
BT	57.09	71.33	14.63	30.26	30.20	40.70
+GW	58.30	69.67	15.85	31.49	30.48	41.16
Llama-3.1 8B
SFT	13.12	23.33	3.11	27.19	28.21	18.99
BT	20.85	39.33	4.93	27.04	26.50	23.73
+GW	21.76	40.00	5.38	26.57	25.07	23.76
Qwen3-0.6B
SFT	53.70	68.30	20.30	27.50	26.80	39.32
BT	54.80	68.70	21.20	27.20	26.50	39.68
+GW	55.00	70.00	21.10	27.50	27.10	39.94

Accuracy (%). SFT: supervised fine-tuning · BT: Bottlenecked Transformer · +GW: global workspace (dense broadcast). AVG is the unweighted mean over the five tasks; best per column shaded.
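As a quick check on the size of the effect, the short Python sketch below recomputes the AVG column for two backbones from the per-task scores above and counts how many tasks +GW wins against BT. Only the numbers come from the table; the dictionary layout and helper name are ours, for illustration.

```python
# Per-task accuracy (%) copied from the table: GSM8K, SVAMP, GSM-Hd, LogiQA, Gaokao.
rows = {
    ("Llama-3.2 3B", "BT"): [57.09, 71.33, 14.63, 30.26, 30.20],
    ("Llama-3.2 3B", "+GW"): [58.30, 69.67, 15.85, 31.49, 30.48],
    ("Llama-3.1 8B", "BT"): [20.85, 39.33, 4.93, 27.04, 26.50],
    ("Llama-3.1 8B", "+GW"): [21.76, 40.00, 5.38, 26.57, 25.07],
}

def avg(scores):
    """Unweighted mean over the five tasks, as in the AVG column."""
    return sum(scores) / len(scores)

for backbone in ("Llama-3.2 3B", "Llama-3.1 8B"):
    gw, bt = rows[(backbone, "+GW")], rows[(backbone, "BT")]
    gain = avg(gw) - avg(bt)                     # percentage points
    wins = sum(g > b for g, b in zip(gw, bt))    # tasks where +GW beats BT
    print(f"{backbone}: +GW vs BT {gain:+.2f} points, better on {wins}/5 tasks")
```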

Results · efficiency

Better, and cheaper

Average accuracy (higher is better): 48.0% for BGT vs 46.8% for TF
Validation NLL (lower is better): 2.574 for BGT vs 2.603 for TF
Compute (lower is better): 56.6 exaFLOPs for BGT vs 58.5 exaFLOPs for TF

Accuracy is averaged over standard LM benchmarks. All three models (TF, the Transformer baseline; BT; and BGT, our model) are matched at 355–356M parameters and 20B training tokens.
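The headline numbers can be put in relative terms with a little arithmetic. This sketch assumes validation NLL is reported in nats per token, so that perplexity = exp(NLL); the page does not state the unit.

```python
import math

# Headline numbers from the comparison above (BGT = our model, TF = baseline).
acc = {"TF": 46.8, "BGT": 48.0}        # average accuracy, %
nll = {"TF": 2.603, "BGT": 2.574}      # validation NLL per token
compute = {"TF": 58.5, "BGT": 56.6}    # exaFLOPs

def perplexity(nll_per_token):
    # Assumption: NLL is in nats per token, so perplexity = exp(NLL).
    return math.exp(nll_per_token)

acc_gain = acc["BGT"] - acc["TF"]                             # percentage points
ppl_ratio = perplexity(nll["BGT"]) / perplexity(nll["TF"])    # below 1 means BGT is better
compute_saving = 1 - compute["BGT"] / compute["TF"]           # fraction of TF compute saved

print(f"accuracy {acc_gain:+.1f} pts, perplexity x{ppl_ratio:.3f}, compute -{compute_saving:.1%}")
```

A perplexity ratio is independent of the NLL's absolute scale, which makes it a convenient single number for comparing the two models.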

Results · performance vs cost

A narrow spotlight, at a fraction of the cost

We observe that accuracy rises with workspace width and is highest at full broadcast, while a narrow spotlight recovers most of that accuracy at far lower compute.
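To make the cost side of this trade-off concrete, here is a minimal, stdlib-only Python sketch of a top-k key-value workspace. It is an illustration under stated assumptions, not the authors' implementation: entries are admitted by a hypothetical learned salience vector (`score_vec`), and every module reads the workspace with scaled dot-product attention. Setting k = n gives full broadcast. The FLOP count covers only the broadcast read, which scales linearly with k.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_topk(keys, score_vec, k):
    """Admit the k entries with the highest salience to the workspace.

    score_vec is a hypothetical learned scoring vector; salience costs O(n*d).
    """
    salience = [dot(key, score_vec) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: salience[i], reverse=True)
    return order[:k]

def broadcast(queries, keys, values, selected):
    """Every module reads the selected entries with scaled dot-product attention."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, keys[j]) / math.sqrt(d) for j in selected])
        out.append([sum(w * values[j][c] for w, j in zip(weights, selected))
                    for c in range(d)])
    return out

def broadcast_flops(n, k, d):
    """Scores (n*k*d multiply-adds) plus read-out (n*k*d), at 2 FLOPs per multiply-add."""
    return 4 * n * k * d

def random_vectors(m, d, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

rng = random.Random(0)
n, d = 16, 8
keys, values, queries = (random_vectors(n, d, rng) for _ in range(3))
score_vec = random_vectors(1, d, rng)[0]

narrow_sel = select_topk(keys, score_vec, k=2)   # narrow spotlight
dense_sel = select_topk(keys, score_vec, k=n)    # full broadcast
narrow = broadcast(queries, keys, values, narrow_sel)
dense = broadcast(queries, keys, values, dense_sel)
print(broadcast_flops(n, 2, d) / broadcast_flops(n, n, d))  # 0.125
```

With 16 modules, a 2-entry spotlight does one eighth of the broadcast work of full broadcast; whether it also keeps most of the accuracy is an empirical matter the sketch does not address.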

Results · integration

The workspace shifts components toward synergy

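Synergy is measured as the information carried by the whole minus the sum of the information carried by its parts (whole-minus-sum), computed over balanced partitions of the attention heads. More positive values mean more synergy, and our model (BGT) shifts toward them.

Figure: synergy for the SFT baseline, BT and BGT, with 95% CIs (axis: less separable →). All three remain redundancy-dominated, so the effect is a relative shift, not a move into net synergy.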
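The whole-minus-sum (WMS) measure can be sketched in a few lines of Python for discrete variables. Assumptions: samples are equally weighted, information is estimated with a plug-in (counting) estimator, and there is an explicit target variable Y. The target and estimator used for the model's head activations are not given here, and continuous activations would need a different estimator. Positive WMS means synergy dominates; negative means redundancy dominates.

```python
import math
from collections import Counter
from itertools import combinations, product

def mutual_information(samples, xs_idx, y_idx):
    """Plug-in estimate of I(X_S; Y) in bits; samples are equally weighted tuples."""
    n = len(samples)
    joint = Counter((tuple(s[i] for i in xs_idx), s[y_idx]) for s in samples)
    px = Counter(tuple(s[i] for i in xs_idx) for s in samples)
    py = Counter(s[y_idx] for s in samples)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def wms_synergy(samples, part_a, part_b, y_idx):
    """Whole-minus-sum: I(A,B; Y) - I(A; Y) - I(B; Y) for one bipartition."""
    whole = mutual_information(samples, part_a + part_b, y_idx)
    parts = (mutual_information(samples, part_a, y_idx)
             + mutual_information(samples, part_b, y_idx))
    return whole - parts

def balanced_wms(samples, components, y_idx):
    """Average WMS over all balanced bipartitions of an even number of components."""
    first, rest = components[0], components[1:]
    scores = []
    # Fix the first component in part A so each bipartition is counted once.
    for others in combinations(rest, len(components) // 2 - 1):
        part_a = (first,) + others
        part_b = tuple(c for c in components if c not in part_a)
        scores.append(wms_synergy(samples, part_a, part_b, y_idx))
    return sum(scores) / len(scores)

# Samples are (x1, x2, y) with (x1, x2) uniform.
xor = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]  # y needs both inputs
copy = [(a, a, a) for a in (0, 1)]                           # either input alone gives y
print(balanced_wms(xor, (0, 1), 2))   # 1.0: purely synergistic
print(balanced_wms(copy, (0, 1), 2))  # -1.0: purely redundant
```

XOR is the textbook synergistic case and copying the textbook redundant one. A redundancy-dominated system sits on the negative side, so a shift toward synergy moves it toward zero even while it stays negative.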

Results · performance vs integration

Performance peaks, then falls as integration keeps rising

We observe that synergy and performance are not monotonically related. Accuracy peaks within a few processing steps, then collapses if the processor keeps iterating, even as raw synergy continues to climb.

Discussion

Why the workspace helps

Re-coupling

Specialised modules approximate a factorised, mean-field posterior that drops the dependencies between them. Synergy is the information that no subset of modules captures on its own. The workspace re-couples a few modules to recover some of it, helping inference escape the local optima that factorised inference gets stuck in, at an energy cost.
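A toy illustration of what a factorised approximation loses, assuming two binary "modules" whose states are perfectly coupled (the distribution is illustrative, not taken from the model). For a fixed joint p, the product of its marginals minimises KL(p || q) over factorised q, and that minimum equals the mutual information between the modules. Standard mean-field variational inference minimises the reverse KL(q || p) instead, but no factorised q can represent the coupling in either direction.

```python
import math
from itertools import product

def kl_bits(p, q):
    """KL(p || q) in bits for distributions given as dicts over the same states."""
    return sum(pv * math.log2(pv / q[x]) for x, pv in p.items() if pv > 0)

def product_of_marginals(p):
    """Factorised q(x1) q(x2) built from p's marginals: the forward-KL-optimal factorisation."""
    p1, p2 = {}, {}
    for (a, b), v in p.items():
        p1[a] = p1.get(a, 0.0) + v
        p2[b] = p2.get(b, 0.0) + v
    return {(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}

# Two binary modules whose states always agree.
coupled = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
q = product_of_marginals(coupled)
print(q[(0, 1)])             # 0.25: the factorised model puts mass on states p forbids
print(kl_bits(coupled, q))   # 1.0: the dependency I(X1; X2) the factorisation drops
```

Re-coupling the two modules, as the workspace does for a few of them, is what it takes to represent the joint and close that 1-bit gap.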

Compression

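Because it is trained on the model's own loss, the workspace keeps what its latent state tells us about the output while discarding input detail. Since that state is computed from the input, the data processing inequality means it cannot carry more information about the output than the input does; training only decides what to discard. Compress the input, keep the output: a condition for generalisation.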
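The argument can be written in standard information-theoretic terms. This is a sketch under the assumption that the workspace state Z is computed from the input X alone, so that Y → X → Z is a Markov chain. The second line is the textbook information-bottleneck objective, with a trade-off parameter β introduced here; it is not necessarily the loss the workspace is trained on.

```latex
\begin{aligned}
&Y \to X \to Z \;\Longrightarrow\; I(Z;Y) \le I(X;Y) \\
&\min_{p(z \mid x)} \; I(X;Z) - \beta\, I(Z;Y), \qquad \beta > 0
\end{aligned}
```

The first term rewards compressing the input; the second rewards keeping what predicts the output.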

Compression and synergy are orthogonal: compression is about how much of the input survives, synergy about how what survives is organised across modules. The workspace affects both. We see synergy rise where broadcast helps, but we have not yet shown that synergy causes the improvement.

Open questions

Help us design the experiment

Are Global Workspace Theory and IIT actually rival accounts, or two views of one mechanism?
Why is the workspace capacity-limited, and when should a bottleneck help beyond efficiency (distractors, interference, task-switching)?
Does the workspace ignite, all-or-none, as the Global Neuronal Workspace Theory (GNWT) predicts?
What is a principled halting rule for the iterating processor?
Is the mean-field and bottleneck account of why the workspace helps correct, trivial, or new?

We have the testbed. Tell us what experiment would convince you.


Get the preprint & code
