Tuning the m68k Cost Model for the 68000

A Practical Guide to Optimizing GCC’s Instruction Heuristics

The Motorola 68000 is a beautifully simple CPU — but its instruction timings, addressing modes, and encoding sizes vary enough that GCC’s default heuristics often produce sub‑optimal code. To address this, amiga‑gcc introduces a new cost‑table system that allows precise tuning of the optimizer without recompiling the compiler itself.
While the tuning scripts are running, I update the chart this and then, it can be viewed here.

Current gains:

Fun facts

The naiv approach to use the intruction cycles yields not the best since there are always side effects as register pressure. It's looks nice to load a small number in a register (moveq) and then use the register, which looks faster on the first glance, but it uses a register that must be free at that point. So that moveq might require a register spill to the stack, which is slow.

Content

The New Unified Cost Table

Traditionally, GCC hard‑codes instruction costs inside C functions. For amiga‑gcc, this has been replaced with a single table‑driven cost model:
static int m68k_costs_speed[cost_max] = {
	    7,   // cost_reg

	    1,   // cost_mem_reg
	    2,   // cost_mem_plus_reg_disp
	    3,   // cost_mem_plus_reg_reg
	    0,   // cost_mem_other

	    9,   // cost_const_int_q
	    1,   // cost_const_int_w
	    5,   // cost_const_int_l
	    0,   // cost_const_double
	    7,   // cost_symbol

	    3,   // cost_plus_reg_reg
	    1,   // cost_plus_reg_constq
	    0,   // cost_plus_reg_constw
	    0,   // cost_plus_reg_constl
	    6,   // cost_plus_other

	    12,  // cost_logic_reg_const

	    10,  // cost_shift_const
	    8,   // cost_shift_const_w

	    3,   // cost_neg_not_l
	    1,   // cost_neg_not_w

	    0,   // cost_branch

	    1,   // cost_set_reg_reg
	    12,  // cost_set_clr_mem_l

	    15,  // cost_call_reg
	    16   // cost_call_other
	};

To achieve this, the cost values have been replaced with lookups into the cost table.
    case CONST_INT:
      {
        HOST_WIDE_INT v = INTVAL (x);

        if (USE_MOVQ (v))
          *total = thecosts[cost_const_int_q];
        else if (v >= -32768 && v <= 32767)
          *total = thecosts[cost_const_int_w];
        else
          *total = thecosts[cost_const_int_l];

        return true;
      }
This makes the cost model:

External Cost Files

To avoid recompiling GCC for every experiment, the cost tables can be loaded from a text file at compiler startup. This feature will be removed once a good enough setting was found.
But for now this enables:
A typical file looks like:
cost_shift_bit=5
cost_shift_rest=9
cost_logic_reg_const=14
...

The compiler reads these values and overrides the built‑in defaults.

Why Tuning Matters for the 68000

The 68000 has:
This means the optimizer must be guided differently for speed and size:
A tuned cost model can reduce code size by 10% and improve speed by 5% on real hardware.
Right now no setting was found where all benchmarks benefit, but the overall looks quite good.

Mutation‑Based Search

To find optimal values, a mutation‑driven search process is used:
1. Load the current best cost table. 2. Randomly mutate a random number of parameters. 3. Rebuild benchmarks using the new table. 4. Measure total runtime or binary size. To compare the geometric mean is used compared to the original costs. 5. If the score improves → keep the new table. 6. Otherwise → discard it. 7. Repeat tens of thousands of times.
This evolutionary approach converges surprisingly well.
I also created a CMA based python program, that performs a search. This has the advantage to guess correlations of the parameters. The current best result was achieved using both vice versa, since both can be stuck at a plateau.

Example Log Snippet

Here is a real mutation event from the search where I started with all values at 0:
=== Iteration 749 / 99999 ===
Ändere 6 Parameter: cost_shift_reg_w=5 cost_mem_plus_reg_disp=4 cost_set_clr_reg=0 cost_const_int_l=17 cost_shift_reg_w=2 cost_const_int_q=5
Neuer Score: 1672439452 (beste: 1672439456)
+++ Übernehme (neue bester Score 1672439452) +++

I started this several times, and you get different results which are close together but not the same. Maybe I run it really long and check what happens.

Interpretation:


This process repeats continuously.

Stable vs. Unstable Parameters

During tuning, some parameters quickly prove stable:
Other parameters are highly sensitive:
These are the ones the search mutates most often.

Current Status

My searches are still running, and the score continues to improve.
Once the score plateaus for several thousand iterations, the search can be stopped and the best table extracted.

Conclusion

The new cost‑table system transforms the way amiga‑gcc optimizes code for the 68000:
Combined with mutation‑based search, this approach produces:
After all the updated costs will be comitted into the code base.
rev: 1.6