Tuning the m68k Cost Model for the 68000
A Practical Guide to Optimizing GCC’s Instruction Heuristics
The Motorola 68000 is a beautifully simple CPU — but its instruction timings, addressing modes, and encoding sizes vary enough that GCC’s default heuristics often produce sub‑optimal code.
To address this, amiga‑gcc introduces a new cost‑table system that allows precise tuning of the optimizer without recompiling the compiler itself.
While the tuning scripts are running, I update the chart this and then, it can be viewed here.
Current gains:
- size 6.3% less using -Os
- speed 2.2% faster using -O2
Fun facts
- Setting default values to all equal values performs almost as well as the original cost method.
- Many of the initially modelled values are now set to 0 since its value does not affect the result.
- Unexpectedly the current best cost for speed/size have a register cost of 6/7 which is way more than expected.
- On the other side: memory costs are unexpectedly low.
The naiv approach to use the intruction cycles yields not the best since there are always side effects as register pressure.
It's looks nice to load a small number in a register (moveq) and then use the register, which looks faster on the first glance,
but it uses a register that must be free at that point. So that moveq might require a register spill to the stack, which is slow.
Content
- the new unified cost table
- how external cost files work
- how mutation‑based search finds better values
- what the logs shows
- why this matters specifically for the 68000
The New Unified Cost Table
Traditionally, GCC hard‑codes instruction costs inside C functions.
For amiga‑gcc, this has been replaced with a single table‑driven cost model:
- one table for speed
- one table for size
static int m68k_costs_speed[cost_max] = {
7, // cost_reg
1, // cost_mem_reg
2, // cost_mem_plus_reg_disp
3, // cost_mem_plus_reg_reg
0, // cost_mem_other
9, // cost_const_int_q
1, // cost_const_int_w
5, // cost_const_int_l
0, // cost_const_double
7, // cost_symbol
3, // cost_plus_reg_reg
1, // cost_plus_reg_constq
0, // cost_plus_reg_constw
0, // cost_plus_reg_constl
6, // cost_plus_other
12, // cost_logic_reg_const
10, // cost_shift_const
8, // cost_shift_const_w
3, // cost_neg_not_l
1, // cost_neg_not_w
0, // cost_branch
1, // cost_set_reg_reg
12, // cost_set_clr_mem_l
15, // cost_call_reg
16 // cost_call_other
};
To achieve this, the cost values have been replaced with lookups into the cost table.
case CONST_INT:
{
HOST_WIDE_INT v = INTVAL (x);
if (USE_MOVQ (v))
*total = thecosts[cost_const_int_q];
else if (v >= -32768 && v <= 32767)
*total = thecosts[cost_const_int_w];
else
*total = thecosts[cost_const_int_l];
return true;
}
This makes the cost model:
- easier to tune
- easier to compare
- easier to export/import
External Cost Files
To avoid recompiling GCC for every experiment, the cost tables can be loaded from a text file at compiler startup.
This feature will be removed once a good enough setting was found.
But for now this enables:
- rapid iteration
- automated search
- easy sharing of tuned profiles
A typical file looks like:
cost_shift_bit=5
cost_shift_rest=9
cost_logic_reg_const=14
...
The compiler reads these values and overrides the built‑in defaults.
Why Tuning Matters for the 68000
The 68000 has:
- slow multi‑bit shifts
- slow memory accesses
- cheap quick‑immediates
- cheap addq/subq
- expensive longword operations
- no barrel shifter
- no 32‑bit multiply/divide unit
This means the optimizer must be guided differently for speed and size:
- discourage multi‑bit shifts or force these
- discourage memory‑heavy sequences or make them cheap
- encourage register arithmetic if speed matters
- encourage quick immediates
- avoid unnecessary `ext.w` / `ext.l` chains
A tuned cost model can reduce code size by 10% and improve speed by 5% on real hardware.
Right now no setting was found where all benchmarks benefit, but the overall looks quite good.
Mutation‑Based Search
To find optimal values, a mutation‑driven search process is used:
1. Load the current best cost table.
2. Randomly mutate a random number of parameters.
3. Rebuild benchmarks using the new table.
4. Measure total runtime or binary size. To compare the geometric mean is used compared to the original costs.
5. If the score improves → keep the new table.
6. Otherwise → discard it.
7. Repeat tens of thousands of times.
This evolutionary approach converges surprisingly well.
I also created a CMA based python program, that performs a search. This has the advantage to guess correlations of the parameters.
The current best result was achieved using both vice versa, since both can be stuck at a plateau.
Example Log Snippet
Here is a real mutation event from the search where I started with all values at 0:
=== Iteration 749 / 99999 ===
Ändere 6 Parameter: cost_shift_reg_w=5 cost_mem_plus_reg_disp=4 cost_set_clr_reg=0 cost_const_int_l=17 cost_shift_reg_w=2 cost_const_int_q=5
Neuer Score: 1672439452 (beste: 1672439456)
+++ Übernehme (neue bester Score 1672439452) +++
I started this several times, and you get different results which are close together but not the same.
Maybe I run it really long and check what happens.
Interpretation:
- six parameters mutated
- new score is better (lower is better)
- the new table becomes the current best
This process repeats continuously.
Stable vs. Unstable Parameters
During tuning, some parameters quickly prove stable:
- changing them never improves results
- they belong in the “excluded” list
- they form the backbone of the cost model
Other parameters are highly sensitive:
- shifts
- memory costs
- addressing modes
- logic operations
- subregs
These are the ones the search mutates most often.
Current Status
My searches are still running, and the score continues to improve.
Once the score plateaus for several thousand iterations, the search can be stopped and the best table extracted.
Conclusion
The new cost‑table system transforms the way amiga‑gcc optimizes code for the 68000:
- no recompilation needed
- fast iteration
- automated tuning
- reproducible results
- architecture‑aware heuristics
Combined with mutation‑based search, this approach produces:
- smaller binaries
- faster execution
- more predictable code generation
- better use of 68000‑specific strengths
After all the updated costs will be comitted into the code base.
rev: 1.6