Hacking GCC and BINUTILS

I guess it would be easier to ask 'what already works with gcc for the Amiga?' ... but even then a correct answer is hard. The current issue I'm talking about: using C++ you can end up with non-working executables. And I am not sure if I'll succeed ...
... so where am I at?

Native Amiga-GCC

A few weeks ago I managed to build a first version of a native Amiga-GCC which really creates assembly, object and executable files.
And it's kind of slow.
Well, it won't get very much faster, but what if all these executables were resident? Then, with enough memory, one could preload the executables and should gain some time...

simple resident executable in C with libnix

The libnix library is the only one I'm using. There is a comparison page for the libraries: https://github.com/bebbo/amiga-gcc/wiki/Libraries Even the old data shows that libnix is not the worst choice. I guess I should run the tests again and update that page, but that's a different topic.
To create a resident program you simply have to specify -resident or -resident32.
No sooner said than done: and say hello to the guru.
What happened? The autostart feature hit it hard, since the code that initializes the libraries references the directly linked library bases:
extern __far struct lib {
	struct Library *base;
	char const *name;
};
But now the code must refer to the data segment of the current instance.
So instead of referencing the memory directly, the struct provides a pointer to a getter which returns the address from the running instance:
	struct Library ** (*get)();
The code needs adjusting to work with this change, and sfdc also has to generate matching auto-open code. I also added
#ifdef __baserel__
for that new behaviour and ensured that libstubs.a is also generated for each multilib variant.
And - hurray - the simple resident program is starting.
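A minimal sketch of the getter idea that compiles on any host. The names lib_entry, get_dos_base, fake_open and open_all are hypothetical; the real libnix structures and the code generated by sfdc are not reproduced here, and fake_open merely stands in for the OS call that opens a library.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the AmigaOS library node (the real type lives in the OS headers).
struct Library { const char *name; };

// Per-instance data: every running instance of a resident program has its own
// copy, so shared code must not reference this slot by a fixed address.
static Library *DOSBase = nullptr;

// Getter returning the address of the base pointer for the running instance.
static Library **get_dos_base() { return &DOSBase; }

// Hypothetical table entry: a getter replaces the direct base pointer slot.
struct lib_entry {
    Library **(*get)();
    const char *name;
};

// Stand-in for opening a library: knows exactly one name.
static Library *fake_open(const char *name) {
    static Library dos = {"dos.library"};
    return std::strcmp(name, dos.name) == 0 ? &dos : nullptr;
}

// Auto-open loop: store every opened base through its getter.
// Returns false as soon as one library cannot be opened.
static bool open_all(const lib_entry *table, int count) {
    for (int i = 0; i < count; ++i) {
        assert(table[i].get != nullptr);
        Library *base = fake_open(table[i].name);
        if (base == nullptr) return false;
        *table[i].get() = base;
    }
    return true;
}
```

The point of the indirection is that the table itself can stay read-only and shared, while the getter resolves to the data of whichever instance is currently running.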

simple resident executable in C++ with libnix

So how could this be different? Applying g++ to a hello-world now still yields a working program, but other programs fail badly...
DUH
After some experiments I managed to create a smaller program that fails. The important part:
struct TT x{1, 2};
An object with constructor and destructor. And after searching for the culprit: the linker is not able to handle the constructors properly. They are handled with the old stabs mechanism.
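The failing program itself is not shown here. A hypothetical program of the same shape could look like the sketch below; only the line `struct TT x{1, 2};` is taken from the text, the members and counters are made up for illustration.

```cpp
#include <cassert>

// Counters are zero-initialized before any constructor runs.
static int ctor_calls = 0;
static int dtor_calls = 0;

// Hypothetical type: a constructor and a destructor are all that matters here.
struct TT {
    int a, b;
    TT(int a_, int b_) : a(a_), b(b_) { ++ctor_calls; }
    ~TT() { ++dtor_calls; }
};

// Global object: the startup code has to call the constructor before main()
// and the destructor at exit. This is the part that depends on the linker
// collecting the constructor and destructor entries correctly.
struct TT x{1, 2};

// If the startup code skipped the constructor, the assert fires and the
// members would still be zero.
int sum_of_x() {
    assert(ctor_calls == 1);
    return x.a + x.b;
}
```

A hello-world never notices a broken constructor list because it has no such global objects, which fits the observation that it kept working.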
Since there is no quick way to fix the binutils linker, I decided to switch from stabs to sections. This requires an end file that terminates each list section with a zero, but with that the more complex version is working.
The objdump of the generated file shows that the lists are linked properly now:
00002124  ___INIT_LIST__:
        .long 0x00000000,0x0000023c,0x00000058,0x0000053c,0x00000031,0x00000598,0x0000007b,0x000005f4
        .long 0x00000030,0x00000894,0x0000004e,0x00001c5e,0x00000062

00002158  __term0__INIT_LIST__:
        .long 0x00000000

0000215c  ___EXIT_LIST__:
        .long 0x00000000,0x00000466,0x00000058,0x000005c0,0x0000007b,0x000006b4,0x00000030,0x000008de
        .long 0x0000004e,0x00001e74,0x00000062

00002188  __term0__EXIT_LIST__:
        .long 0x00000000

0000218c  ___CTOR_LIST__:
        .long 0x00000598,0x000020f4

00002194  __term0__CTOR_LIST__:
        .long 0x00000000

00002198  ___DTOR_LIST__:
        .long 0x00000000,0x0000210c

000021a0  __term0__DTOR_LIST__:
        .long 0x00000000

000021a4  ___LIB_LIST__:
        .long 0x000005f4,0x00000100

000021ac  __term0__LIB_LIST__:
        .long 0x00000000
Since these lists all moved into the .text segment, the startup code needed some care too, as did the linker script, to put it all together.
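How startup code can consume such a zero-terminated list, as a minimal sketch. It models only a plain array of function pointers ended by a null entry; the exact layout of the lists in the dump above is not reproduced, and run_list, demo_list, init_a and init_b are hypothetical names.

```cpp
#include <cassert>

typedef void (*init_fn)();

// Records the order in which the entries were called.
static int order[4];
static int calls = 0;

static void init_a() { order[calls++] = 1; }
static void init_b() { order[calls++] = 2; }

// Two entries as they might be contributed by object files, followed by the
// terminating zero that the end file supplies.
static const init_fn demo_list[] = { init_a, init_b, nullptr };

// Call every entry up to (not including) the null terminator.
// Returns the number of functions called.
static int run_list(const init_fn *list) {
    assert(list != nullptr);
    int n = 0;
    for (; *list != nullptr; ++list, ++n)
        (*list)();
    return n;
}
```

Without the terminator the loop would run into whatever follows the list, which is why the end file has to be linked last.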

and gcc still fails

The crash happens no longer during startup. It's ... hold the line ... investigating ...
... ok, it's a bsr somewhere.

With resident or baserel, the compiler also decides to emit jbsr, which may result in a PC-relative call. While that's 1 tick slower (I guess), it needs fewer relocations. But why does it fail?

Different architectures take different approaches to what gets written into the field that is adjusted during linking. The Amiga wants zeroes in there. But sometimes there aren't zeroes. I considered this already fixed (I modified it a while ago), but that workaround does not work reliably. This time I hacked the linker to poke it to zero:
	  int off = 0;
	  if (r->howto->type == H_PC32)
	    off = bfd_getb_signed_32 (((bfd_byte *)data)+r->address);
	  else if (r->howto->type == H_PC16)
	    off = bfd_getb_signed_16 (((bfd_byte *)data)+r->address);
...
          relocation = ... - off;
Now if there is an offset, it gets eliminated. And gcc is starting:
vamos -Hdisable -s100  -m32000 -C40 --  gcc/xgcc -v
Using built-in specs.
COLLECT_GCC=xgcc
Target: m68k-amigaos
Configured with: /home/stefan/amiga-gcc/projects/gcc/configure --host=m68k-amigaos --prefix=/opt/amiga68 --enable-languages=c,c++,objc --enable-version-specific-runtime-libs --disable-libssp --disable-shared --disable-lto --disable-host-shared --disable-libstdcxx --disable-nls --with-gmp-include=/home/stefan/amiga-gcc/amiga/gcc/../gmp --with-gmp-lib=/home/stefan/amiga-gcc/amiga/gcc/../gmp/.libs --with-mpfr-include=/home/stefan/amiga-gcc/amiga/gcc/../mpfr-2.4.2 --with-mpfr-lib=/home/stefan/amiga-gcc/amiga/gcc/../mpfr/.libs --with-mpc-include=/home/stefan/amiga-gcc/amiga/gcc/../mpc-0.8.1/src --with-mpc-lib=/home/stefan/amiga-gcc/amiga/gcc/../mpc/src/.libs
Thread model: single
gcc version 6.5.0b 220317174649 (GCC) 
Resident also works as expected: once loaded, the startup time is reduced.
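The linker hack from this section, modeled as a standalone sketch. It assumes that a later step adds the relocation to whatever the field already holds, so subtracting the stale contents first makes them cancel out. getb_signed_32 here is a local stand-in for a big-endian signed read, not the BFD function itself, and putb_32 and apply_pc32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in: read a big-endian signed 32-bit value from a byte buffer.
static int32_t getb_signed_32(const unsigned char *p) {
    uint32_t v = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                 (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
    return static_cast<int32_t>(v);
}

// Store a 32-bit value in big-endian byte order.
static void putb_32(unsigned char *p, uint32_t v) {
    p[0] = static_cast<unsigned char>(v >> 24);
    p[1] = static_cast<unsigned char>(v >> 16);
    p[2] = static_cast<unsigned char>(v >> 8);
    p[3] = static_cast<unsigned char>(v);
}

// Hypothetical model of the fix for a 32-bit PC-relative field.
static int32_t apply_pc32(unsigned char *field, int32_t relocation) {
    int32_t off = getb_signed_32(field);   // stale contents of the field
    int32_t adjusted = relocation - off;   // the hack: subtract them up front
    int32_t result = off + adjusted;       // assumed later step: add to the field
    putb_32(field, static_cast<uint32_t>(result));
    return result;
}
```

As the cc1 section shows, wiping the field unconditionally is too aggressive, so this only illustrates the first attempt.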

and cc1 still fails

I'm on it, and I know it's a linker issue again: accessing a static function in a .text section from a different section (created via template) results in the wrong offset.

After following some wrong paths, the current guess is: This is related to local symbols not visible in .o files...
... another wrong guess...
start over: the object contains
Disassembly of section .text._ZNK10hash_tableI14int_cst_hasher11xcallocatorE13alloc_entriesEj:
...
  5a:	2f02           	move.l d2,-(sp)
  5c:	61ff 0001 36ee 	bsr.l 1374c 1374c __Z21ggc_cleared_vec_allocIP9tree_nodeEPT_j
and inside of the text section
Sections:
Idx Name          Size 
  0 .text         00019334 
...
0001374c 0001374c __Z21ggc_cleared_vec_allocIP9tree_nodeEPT_j:
   1374c:	2f0d           	move.l a5,-(sp)
and the final file
  8e2b76:	2f02           	move.l d2,-(sp)
  8e2b78:	61ff ffc7 79b6 	bsr.l 55a530 55a530 __Z13make_pass_vrpPN3gcc7contextE+0x34bc
which points to a wrong location. The object file was placed here:
.text          0x000000000055a530    0x19334 libbackend.a(tree.o)
adding the offset 0001374c yields 56DC7C and the code is really there:
  56dc7c:	2f0d           	move.l a5,-(sp)
=> the emitted jump offset 000136ee plus the own address 5c plus 2 bytes for the bsr opcode yields 1374C, which **is** the correct offset into the .text segment.
Erasing the offset is not always valid.
Next idea: keep the offset, if the symbol is a section symbol...
  8e2b78:	61ff ffc8 b102 	bsr.l 56dc7c 56dc7c __Z25gt_clear_caches_gt_tree_hv+0x228
that address is now correct - and the rest? I restricted that approach to references within the same object file, since only there is the offset correct.
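The address arithmetic from the dumps above, written out so it can be re-checked. bsr_target and check_values_from_post are hypothetical helper names; all numbers are copied from the listings.

```cpp
#include <cassert>
#include <cstdint>

// Target of a bsr.l: the 32-bit displacement is relative to the address of
// the instruction plus 2. Unsigned wrap-around models the two's-complement
// (possibly negative) displacement.
static uint32_t bsr_target(uint32_t insn_addr, uint32_t disp32) {
    return insn_addr + 2u + disp32;
}

static void check_values_from_post() {
    // Inside tree.o: call at 0x5c with displacement 0x000136ee.
    assert(bsr_target(0x5c, 0x000136ee) == 0x1374c);

    // tree.o's .text was placed at 0x55a530, so the function ends up here.
    assert(0x55a530u + 0x1374cu == 0x56dc7cu);

    // Final file with the offset erased: the call lands on the start of the
    // object's .text instead of on the function.
    assert(bsr_target(0x8e2b78, 0xffc779b6) == 0x55a530);

    // Final file with the offset kept: the call reaches the function.
    assert(bsr_target(0x8e2b78, 0xffc8b102) == 0x56dc7c);
}
```

The two final displacements differ by exactly 0x1374c, the offset of the function inside the object's .text section, which is the part that got erased.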

cc1 is starting now, but not yet working. But it's resident^^

... just a little insight into how I botch my way around...

illegal addresses

Next stop: bogus code...
00011990 00011990 .L4379:
   11990:       307c 0005       movea.w #5,a0
   11994:       d1e8 0008       adda.l 8(a0),a0
   11998:       e9d0 4001       bfextu (a0),0,1,d4
accessing the memory at address 00000005 isn't correct. It seems that this code
     shorten = (((((orig_op0)->typed.type))->base.u.bits.unsigned_flag)
         || (((enum tree_code) (op1)->base.code) == INTEGER_CST
      && !integer_all_onesp (op1)));
translates into
...
	move.w #5,a0
	add.l (8,a0),a0
	bfextu (a0){#0:#1},d6
	cmp.l d6,d6
...
before reload the insn looks like (if this is the right one)
(insn 1128 1127 1129 91 
(set (subreg:SI (reg:QI 551) 0)
     (zero_extract:SI (mem:QI (plus:SI (mem/f:SI (plus:SI (reg/v/f:SI 470 [ orig_op0 ])
                                                          (const_int 8 [0x8])) [0 orig_op0_965(D)->typed.type+0 S4 A16])
                                       (const_int 5 [0x5])) [13 *_535+5 S1 A8])
            (const_int 1 [0x1])
            (const_int 0 [0]))) gcc/c/c-typeck.c:10753 359 {*extzv_bfextu_mem2}
     (nil))
and reload converts it to
(insn 5322 1127 5324 91 (set (reg:SI 9 a1)
        (mem/f/c:SI (plus:SI (reg/f:SI 13 a5)
                (const_int 16 [0x10])) [22 orig_op0+0 S4 A16])) gcc/c/c-typeck.c:10753 39 {*movsi_m68k}
     (nil))
(insn 5324 5322 5325 91 (set (reg:SI 9 a1)
        (const_int 5 [0x5])) gcc/c/c-typeck.c:10753 39 {*movsi_m68k}
     (nil))
(insn 5325 5324 1128 91 (set (reg:SI 9 a1)
        (plus:SI (reg:SI 9 a1)
            (mem/f:SI (plus:SI (reg:SI 9 a1)
                    (const_int 8 [0x8])) [0 orig_op0_965(D)->typed.type+0 S4 A16]))) gcc/c/c-typeck.c:10753 135 {*addsi3_internal}
     (expr_list:REG_EQUIV (plus:SI (mem/f:SI (plus:SI (reg:SI 9 a1)
                    (const_int 8 [0x8])) [0 orig_op0_965(D)->typed.type+0 S4 A16])
            (const_int 5 [0x5]))
        (nil)))
(insn 1128 5325 1129 91 (set (reg:SI 4 d4 [orig:551+-3 ] [551])
        (zero_extract:SI (mem:QI (reg:SI 9 a1) [13 *_535+5 S1 A8])
            (const_int 1 [0x1])
            (const_int 0 [0]))) gcc/c/c-typeck.c:10753 359 {*extzv_bfextu_mem2}
     (nil))
The insns would be OK if a1 weren't used for everything. It's reload again...
(insn 350 349 351 76 (set (reg/v:SI 83 [ unsigned_op1 ])
        (zero_extract:SI (mem:QI (plus:SI (mem/f:SI (plus:SI (reg/v/f:SI 302 [ op1 ])
                            (const_int 8 [0x8])) [0 op1_460(D)->typed.type+0 S4 A16])
                    (const_int 5 [0x5])) [32 *_123+5 S1 A8])
            (const_int 1 [0x1])
            (const_int 0 [0]))) /home/stefan/amiga-gcc/projects/gcc/gcc/c/c-typeck.c:4815 374 {*extzv_bfextu_mem2}
     (nil))
... change this and that ...
	move.l (16,a5),a1
	move.l (8,a1),a0
	lea (5,a0),a1
	bfextu (a1){#0:#1},d4
which looks correct now, but the result should be
	move.l (16,a5),a1
	bfextu ([8,a1],5){#0:#1},d4
OK, these insns aren't well defined in the machine description... extending the usable memory addresses (plus undoing the reload change) and voilà:
	move.l (20,a5),a0
	bfextu ([8,a0],5){#0:#1},d6
	move.l (28,a5),a0
	bfextu ([8,a0],5){#0:#1},d1
	cmp.l d6,d1
this asm makes more sense.
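The difference between the intended address and the one the bad sequence computed, modeled on a flat byte array. This is a sketch under assumptions: target memory is simulated with big-endian 32-bit loads, bfextu8 only handles bit fields that fit in one byte, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Big-endian 32-bit load from a byte array that models target memory.
static uint32_t load32(const uint8_t *mem, uint32_t addr) {
    return (uint32_t(mem[addr]) << 24) | (uint32_t(mem[addr + 1]) << 16) |
           (uint32_t(mem[addr + 2]) << 8) | uint32_t(mem[addr + 3]);
}

// Model of bfextu <ea>{#offset:#width} for fields inside a single byte.
// Bit offset 0 is the most significant bit of the byte at <ea>.
static uint32_t bfextu8(const uint8_t *mem, uint32_t addr,
                        unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 8);
    return (uint32_t(mem[addr]) >> (8 - offset - width)) & ((1u << width) - 1u);
}

// Intended address: load the pointer stored at op + 8, then add 5.
// This corresponds to bfextu ([8,a0],5){#0:#1}.
static uint32_t flag_addr_intended(const uint8_t *mem, uint32_t op) {
    return load32(mem, op + 8) + 5;
}

// The bad sequence: the base register was overwritten with the constant 5
// before being used as the base of the load.
static uint32_t flag_addr_broken(const uint8_t *mem) {
    uint32_t a = 5;                    // movea.w #5,a0
    return a + load32(mem, a + 8);     // adda.l 8(a0),a0
}
```

With an object at address 16 whose pointer field at offset 8 holds 32, the intended flag byte is at 37, while the broken sequence reads a pointer from address 13 and ends up somewhere unrelated.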

rinse and repeat

After rebuilding everything, I start testing gcc and
vamos -Hdisable -s100  -m32000 -C40 --  gcc/xgcc 
xgcc: fatal error: no input files
compilation terminated.
18:36:13.500       exec:  ERROR:  deallocate: block outside of mem header!
18:36:13.503       exec:  ERROR:  deallocate: block outside of mem header!
18:36:13.511       exec:  ERROR:  deallocate: block outside of mem header!
18:36:13.551       exec:  ERROR:  deallocate: block outside of mem header!
18:36:13.554       exec:  ERROR:  deallocate: block outside of mem header!
18:36:13.563       exec:  ERROR:  deallocate: block outside of mem header!
and the program hangs.

After searching a while, I tracked that one down to an issue in vamos and fixed it. But the output is the same, only now the program exits properly. Let's use some more debug output:
13:28:54.176      instr:   INFO:  PC=00001626  SR=-----    USP=000af330 ISP=00000700 MSP=00000780
13:28:54.176      instr:   INFO:  D0=6763632f  D1=0010222d  D2=000af368  D3=0007b024  D4=0000007b  D5=0000007b  D6=00000000  D7=00000067
13:28:54.176      instr:   INFO:  A0=00100574  A1=00102229  A2=00100574  A3=0010222d  A4=000b75f2  A5=000af370  A6=0000133c  A7=000af330
13:28:54.176      instr:   INFO:  F0=0  F1=0  F2=0  F3=0  F4=0  F5=0  F6=0  F7=0  N=1  Z=0  I=0  NAN=0
13:28:54.176      instr:   INFO:  @0015e8 +00003e exec.library(Traps)       001626    PyTrap  #$021 ; base_func  
13:28:54.176       exec:  DEBUG:  DEALLOC: mh_addr=100574, blk_addr=102229, num_bytes=1734566704
13:28:54.176       exec:  DEBUG:  read: [MH@100574:(100594+003fe0=104574):free=001118,MC=100594]
13:28:54.176       exec:  ERROR:  deallocate: block outside of mem header!
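The memory gets trashed somewhere. Note that d0 contains 6763632f, which is the ASCII text 'gcc/' (the start of the command name gcc/xgcc), so a string got written where a size was expected.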
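A small sketch for reading such a register dump: it renders a 32-bit value as the four ASCII bytes it holds and runs a generic range check on the values from the log. The check is written in the spirit of the error message and is not the actual vamos code; as_ascii and block_inside are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Render a 32-bit value as four characters, most significant byte first.
// Non-printable bytes are shown as '.'.
static std::string as_ascii(uint32_t v) {
    std::string s;
    for (int shift = 24; shift >= 0; shift -= 8) {
        unsigned char c = static_cast<unsigned char>((v >> shift) & 0xff);
        s += (c >= 0x20 && c < 0x7f) ? static_cast<char>(c) : '.';
    }
    return s;
}

// True if the block [blk, blk + size) lies completely inside [lower, upper).
// Written without additions that could overflow 32 bits.
static bool block_inside(uint32_t lower, uint32_t upper,
                         uint32_t blk, uint32_t size) {
    assert(lower <= upper);
    return blk >= lower && blk <= upper && size <= upper - blk;
}
```

With the logged values, the block start 0x102229 lies inside the region 0x100594 to 0x104574, but the size of 1734566704 bytes does not, and as_ascii(0x6763632f) shows where that size came from.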

So switch to the more informative debug malloc...

... and there is more info
18:47:54.430      instr:   INFO:                                          b'__ZN10hash_tableIN8hash_mapIN21mem_alloc_descriptionI9vec_usageE17mem_location_hashEPS2_21simple_hashmap_traitsI19default_ha':
18:47:54.430      instr:   INFO:  @0020b0 +078ed0 xgcc_0:code               07af88    move.l  A5, -(A7)     
...
18:47:54.432      instr:   INFO:  @0020b0 +078ede xgcc_0:code               07af96    move.l  (A0), D0      
18:47:54.432      instr:   INFO:  PC=0007af98  SR=--Z--    USP=000af270 ISP=00000700 MSP=00000780
18:47:54.432      instr:   INFO:  D0=00000000  D1=00000000  D2=000af2ac  D3=00000062  D4=0000007b  D5=0000007b  D6=00000000  D7=00000067
there is 00000000 in d0 which is used as pointer later...
grep "ZN10hash_tableIN8hash_mapIN21mem_alloc_descriptionI9vec_usageE17mem_location_hashEPS2_21simple_hashmap_traitsI19default_ha" gcc/*.o
grep: gcc/vec.o: binary file matches
the truncated name plus the code leads to
00000000 00000000 __ZN10hash_tableIN8hash_mapIN21mem_alloc_descriptionI9vec_usageE17mem_location_hashEPS2_21simple_hashmap_traitsI19default_hash_traitsIS4_ES5_EE10hash_entryE11xcallocatorE8iterator5slideEv:
   0:   2f0d            move.l a5,-(sp)

00000002 00000002 .LCFI108:
   2:   2a4f            movea.l sp,a5

00000004 00000004 .LCFI109:
   4:   2f0a            move.l a2,-(sp)

00000006 00000006 .LCFI110:
   6:   206d 0008       movea.l 8(a5),a0

0000000a 0000000a .LBB2022:
   a:   2268 0004       movea.l 4(a0),a1

0000000e 0000000e .L308:
   e:   2010            move.l (a0),d0
  10:   b3c0            cmpa.l d0,a1
which is called shortly after __exitcpp

maybe initialization isn't done properly?
uhm, yes, __CTOR_LIST__ is in the data segment:
m68k-amigaos-objdump -D xgcc | less
...
000178b4 000178b4 ___CTOR_LIST__:
   178b4:       0007 0b88       ori.b #-120,d7
   178b8:       0000 0000       ori.b #0,d0
... and bogus. Why? Ask the map file:
 .data          0x0000000000091fe4        0x8 /opt/amiga/m68k-amigaos/libnix/lib/libb32/libm020/libm881/libstubs.a(__ctor_list__.o)
                0x0000000000091fe4                __CTOR_LIST__
ok, had to remove some old files and fix a typo:
vamos -Hdisable -s100  -m32000 -C40  --  gcc/xgcc 
xgcc: fatal error: no input files
compilation terminated.
smooth. With cc1 there is some weird output and winuaeenforcer reports hits...


... took some effort, but I hunted it down as well: it was a use-after-free in the original gcc code. The same code is still present in recent gcc versions...

Now cc1 runs and can be made resident which saves some time (I guess)