Tuning the m68k Cost Model for the 68000
A Practical Guide to Optimizing GCC’s Instruction Heuristics
The Motorola 68000 is a beautifully simple CPU — but its instruction timings, addressing modes, and encoding sizes vary enough that GCC’s default heuristics often produce sub‑optimal code.
To address this, amiga‑gcc introduces a new cost‑table system that allows precise tuning of the optimizer without recompiling the compiler itself.
While the tuning scripts are running, I update the chart this and then, it can be viewed here.
This article describes:
- the new unified cost table
- how external cost files work
- how mutation‑based search finds better values
- what the logs shows
- why this matters specifically for the 68000
The New Unified Cost Table
Traditionally, GCC hard‑codes instruction costs inside C functions.
For amiga‑gcc, this has been replaced with a single table‑driven cost model:
- one table for speed
- one table for size
To achieve this, the cost values have been replaced with lookups into the cost table.
This makes the cost model:
- easier to tune
- easier to compare
- easier to export/import
External Cost Files
To avoid recompiling GCC for every experiment, the cost tables can be loaded from a text file at compiler startup.
This feature will be removed once a good enough setting was found.
But for now this enables:
- rapid iteration
- automated search
- easy sharing of tuned profiles
A typical file looks like:
cost_shift_bit=5
cost_shift_rest=9
cost_logic_reg_const=14
...
The compiler reads these values and overrides the built‑in defaults.
Why Tuning Matters for the 68000
The 68000 has:
- slow multi‑bit shifts
- slow memory accesses
- cheap quick‑immediates
- cheap addq/subq
- expensive longword operations
- no barrel shifter
- no 32‑bit multiply/divide unit
This means the optimizer must be guided differnetly for speed and size:
- discourage multi‑bit shifts or force these
- discourage memory‑heavy sequences or make them cheap
- encourage register arithmetic if speed matters
- encourage quick immediates
- avoid unnecessary `ext.w` / `ext.l` chains
A tuned cost model can reduce code size by 10% and improve speed by 5% on real hardware.
Right now no setting was found where all benchmarks benefit, but the overall looks quite good.
Mutation‑Based Search
To find optimal values, a mutation‑driven search process is used:
1. Load the current best cost table.
2. Randomly mutate a small number of parameters.
3. Rebuild benchmarks using the new table.
4. Measure total runtime or binary size.
5. If the score improves → keep the new table.
6. Otherwise → discard it.
7. Repeat tens of thousands of times.
This evolutionary approach converges surprisingly well.
Example Log Snippet
Here is a real mutation event from the search where I started with all values at 0:
=== Iteration 749 / 99999 ===
Ändere 6 Parameter: cost_shift_reg_w=5 cost_mem_plus_reg_disp=4 cost_set_clr_reg=0 cost_const_int_l=17 cost_shift_reg_w=2 cost_const_int_q=5
Neuer Score: 1672439452 (beste: 1672439456)
+++ Übernehme (neue bester Score 1672439452) +++
I started this several times, and you get different results which are close together but not the same.
Maybe I run it really long and check what happens.
Interpretation:
- six parameters mutated
- new score is better (lower is better)
- the new table becomes the current best
This process repeats continuously.
Stable vs. Unstable Parameters
During tuning, some parameters quickly prove stable:
- changing them never improves results
- they belong in the “excluded” list
- they form the backbone of the cost model
Other parameters are highly sensitive:
- shifts
- memory costs
- addressing modes
- logic operations
- subregs
These are the ones the search mutates most often.
Current Status
My searches are still running, and the score continues to improve.
Once the score plateaus for several thousand iterations, the search can be stopped and the best table extracted.
Conclusion
The new cost‑table system transforms the way amiga‑gcc optimizes code for the 68000:
- no recompilation needed
- fast iteration
- automated tuning
- reproducible results
- architecture‑aware heuristics
Combined with mutation‑based search, this approach produces:
- smaller binaries
- faster execution
- more predictable code generation
- better use of 68000‑specific strengths
After all the updated costs will be comitted into the code base.
rev: 1.6