After re-adding (and partly re-inventing) the AmigaOS support, I am starting to add further optimizations for the M68k.

The generated code for the M68k is not bad, and the compiler is often the best one available. But there are snippets where the code could be better.

    _strcpy:
            move.l a0,d0
    .L2:
            move.b (a1)+,d1
            move.b d1,(a0)+
            ; tst.b d1 -- not visible in output
            jeq .L7
            move.b (a1)+,d1
            move.b d1,(a0)+
            ; tst.b d1 -- not visible in output
            jne .L2
    .L7:
            rts

The code is ok but not the best. Can't we spare the d1 register?

    _strcpy:
            move.l a0,d0
    .L2:
            move.b (a1)+,(a0)+
            jeq .L7
            move.b (a1)+,(a0)+
            jne .L2
    .L7:
            rts

To achieve this, the d1 register and the invisible tst instruction were removed.
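For orientation, the loop being compiled here is presumably something like the C below. This is only a sketch: the name `my_strcpy` and the exact source are assumptions, not taken from the library that produced the listings. It shows why the separate test is redundant: the loop condition is the byte that was just stored, and a memory-to-memory `move.b` already sets the condition codes from that byte.

```c
/* Hypothetical source for the strcpy listings; the name my_strcpy is an
   assumption, chosen to avoid clashing with the C library. */
char *my_strcpy(char *dst, const char *src)
{
    char *ret = dst;    /* corresponds to move.l a0,d0: keep the return value */

    /* Copy every byte including the terminating NUL. The loop condition is
       the value just stored, which is what the hidden tst.b d1 checked. */
    while ((*dst++ = *src++) != '\0')
        ;

    return ret;
}
```

Calling `my_strcpy(buf, "abc")` returns `buf` and leaves the four bytes `a`, `b`, `c`, NUL in it.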

But in complex programs with an inlined strcpy, the combine stage may end up with illegal registers (here a data register used as an address), and you might get code like:

    ...
            move.l a0,d0
    .L2:
            move.b (d2)+,(a0)+
            jne .L2
    .L7:
    ...

The "reload" pass, which runs later, will fix this in a way that breaks the code:

    ...
            move.l a0,d0
    .L2:
            move.l d2,a1
            move.b (a1)+,(a0)+
            move.l a1,d2
            jne .L2
    .L7:
    ...

And the condition is killed, because the inserted move.l a1,d2 now sets the condition codes that jne tests. Since the d1 register and the test were omitted, there is no way to handle it during "reload".
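Why the loop breaks can be illustrated with a toy model of the Z flag. On the 68k a move into a data register updates the condition codes, while a move into an address register leaves them alone, so only the second instruction reload inserted does damage. The C below is a minimal sketch under stated assumptions: the `Cpu` struct, the helper names and `copy_loop` are all hypothetical, it models nothing but the Z flag, and `max_iter` exists only so the broken variant terminates in the model.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical): only the Z flag of the condition codes. */
typedef struct {
    int z;
} Cpu;

/* move.b (src)+,(dst)+ : copies one byte and sets Z from that byte. */
static void move_b_postinc(Cpu *cpu, const char **src, char **dst)
{
    char c = *(*src)++;
    *(*dst)++ = c;
    cpu->z = (c == 0);
}

/* move.l aN,dM : a move into a data register sets Z from the value moved. */
static void move_l_to_dreg(Cpu *cpu, uintptr_t value, uintptr_t *dreg)
{
    *dreg = value;
    cpu->z = (value == 0);
}

/* Runs the copy loop and returns the number of iterations executed.
   Both buffers must hold at least max_iter bytes, because the broken
   variant keeps copying past the NUL until the cap is reached. */
size_t copy_loop(char *dst, const char *src, int with_reload_fixup,
                 size_t max_iter)
{
    Cpu cpu = {0};
    uintptr_t d2 = 0;
    size_t n = 0;

    do {
        move_b_postinc(&cpu, &src, &dst);
        if (with_reload_fixup)
            move_l_to_dreg(&cpu, (uintptr_t)src, &d2); /* move.l a1,d2 */
        n++;
    } while (!cpu.z && n < max_iter);                  /* jne .L2 */

    (void)d2;
    return n;
}
```

With an 8-byte source buffer holding "hi" and a cap of 8, the intact loop stops after 3 iterations, while the variant with the fix-up runs until the cap because Z now reflects the (non-zero) pointer instead of the copied byte.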

    ...
            move.l a0,d0
    .L2:
            move.l d2,a1
            move.b (a1)+,d1
            move.b d1,(a0)+
            move.l a1,d2
            ; tst.b d1 -- not visible in output
            jne .L2
    .L7:
    ...

    ...
            move.l a0,d0
            move.l d2,a1
    .L2:
            move.b (a1)+,d1
            move.b d1,(a0)+
            ; tst.b d1 -- not visible in output
            jne .L2
    .L7:
            sub.l #1,a1
            move.l a1,d2
    ...

Now the strcpy_opt can be applied again:

    ...
            move.l a0,d0
            move.l d2,a1
    .L2:
            move.b (a1)+,(a0)+
            jne .L2
    .L7:
            sub.l #1,a1
            move.l a1,d2
    ...

And if d2 is not used afterwards, the superfluous instructions can be discarded.

Now strcpy is ok and also inlined strcpy seems to be ok - my test cases (from reported bugs) are all ok.

    _strncpy:
            move.l a2,-(sp)
            move.l d2,-(sp)
            move.l 12(sp),d0
            move.l 16(sp),a2
            move.l 20(sp),d1
            jeq .L11
            move.l d0,a1
    .L6:
            lea (1,a1),a0
            move.b (a2)+,d2
            move.b d2,(a1)
            jeq .L15
            subq.l #1,d1
            move.l a0,a1
            jne .L6
    .L11:
            move.l (sp)+,d2
            move.l (sp)+,a2
            rts
    .L15:
            add.l d1,a1
            moveq #1,d2
            cmp.l d1,d2
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11

So let's apply strcpy_opt here as well:

    _strncpy:
            move.l a2,-(sp)
            move.l d2,-(sp)
            move.l 12(sp),d0
            move.l 16(sp),a2
            move.l 20(sp),d1
            jeq .L11
            move.l d0,a1
    .L6:
            lea (1,a1),a0
            move.b (a2)+,(a1)
            jeq .L15
            subq.l #1,d1
            move.l a0,a1
            jne .L6
    .L11:
            move.l (sp)+,d2
            move.l (sp)+,a2
            rts
    .L15:
            add.l d1,a1
            moveq #1,d2
            cmp.l d1,d2
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11

The next step changes

            lea (1,a1),a0
            move.b (a2)+,(a1)

to

            move.l a1,a0
            move.b (a2)+,(a0)+

Altogether the optimizations now yield:

    _strncpy:
            move.l a2,-(sp)
            move.l d2,-(sp)
            move.l 12(sp),d0
            move.l 16(sp),a2
            move.l 20(sp),d1
            jeq .L11
            move.l d0,a1
            move.l d0,a0
    .L6:
            move.b (a2)+,(a0)+
            jeq .L15
            subq.l #1,d1
            jne .L6
            move.l a0,a1
    .L11:
            move.l (sp)+,d2
            move.l (sp)+,a2
            rts
    .L15:
            lea (-1,a0),a1
            add.l d1,a1
            moveq #1,d2
            cmp.l d1,d2
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11

Now the code after label .L15 could be better.

    _strncpy:
            move.l a2,-(sp)
            move.l d2,-(sp)
            move.l 12(sp),d0
            move.l 16(sp),a2
            move.l 20(sp),d1
            jeq .L11
            move.l d0,a1
            move.l d0,a0
    .L6:
            move.b (a2)+,(a0)+
            jeq .L15
            subq.l #1,d1
            jne .L6
            move.l a0,a1
    .L11:
            move.l (sp)+,d2
            move.l (sp)+,a2
            rts
    .L15:
            lea (-1,a0),a1
            add.l d1,a1
            subq.l #1,d1
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11

    _strncpy:
            move.l a2,-(sp)
            move.l d2,-(sp)
            move.l 12(sp),d0
            move.l 16(sp),a2
            move.l 20(sp),d1
            jeq .L11
            move.l d0,a1
            move.l d0,a0
    .L6:
            move.b (a2)+,(a0)+
            jeq .L15
            subq.l #1,d1
            jne .L6
            move.l a0,a1
    .L11:
            move.l (sp)+,d2
            move.l (sp)+,a2
            rts
    .L15:
            move.l a0,a1
            subq.l #1,d1
            add.l d1,a1
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11

    _strncpy:
            move.l a2,-(sp)
            move.l d2,-(sp)
            move.l 12(sp),d0
            move.l 16(sp),a2
            move.l 20(sp),d1
            jeq .L11
            move.l d0,a0
    .L6:
            move.b (a2)+,(a0)+
            jeq .L15
            subq.l #1,d1
            jne .L6
    .L11:
            move.l (sp)+,d2
            move.l (sp)+,a2
            rts
    .L15:
            move.l a0,a1
            subq.l #1,d1
            add.l d1,a1
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11

    _strncpy:
            move.l 4(sp),d0
            move.l 8(sp),a1
            move.l 12(sp),d1
            jeq .L11
            move.l d0,a0
    .L6:
            move.b (a1)+,(a0)+
            jeq .L15
            subq.l #1,d1
            jne .L6
    .L11:
            rts
    .L15:
            move.l a0,a1
            subq.l #1,d1
            add.l d1,a1
            jeq .L11
    .L8:
            clr.b (a0)+
            cmp.l a1,a0
            jeq .L11
            clr.b (a0)+
            cmp.l a1,a0
            jne .L8
            jra .L11
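To make the control flow of the strncpy listings easier to follow, here is a C sketch with the same structure: a copy loop that counts down, and a padding loop (the part after .L15) that clears the rest of the destination once the NUL has been copied. The name `my_strncpy` and the source itself are assumptions; the code that was actually compiled may look different.

```c
#include <stddef.h>

/* Hypothetical source for the strncpy listings; the name my_strncpy is an
   assumption, chosen to avoid clashing with the C library. */
char *my_strncpy(char *dst, const char *src, size_t n)
{
    char *ret = dst;

    while (n != 0) {
        n--;                               /* one byte of the budget is used */
        if ((*dst++ = *src++) == '\0') {
            /* NUL copied: zero-fill the remaining n bytes (the .L8 loop). */
            while (n != 0) {
                *dst++ = '\0';
                n--;
            }
            break;
        }
    }

    return ret;
}
```

As with the standard function, the destination is not NUL-terminated when the source has `n` or more characters before its terminator.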

- work with an abstract CPU
- use an infinite set of pseudo registers
- combine instructions to create instructions for the real CPU and use hard (real) registers
- fix registers to conform to the CPU constraints

This might be improvable. As far as I understand it now, it requires some fundamental changes, e.g. inventing a register mode which identifies a pointer. Then the combine stage could use that information to choose correct registers. But this is a bigger modification.

rev: 1.11