CPU Detection Routine

The Sorbus Computer can run any kind of 6502 variant that shares the same pinout. The small test and learn environment "Monitor Command Prompt", or MCP for short (pun very intended), can be used to learn the details of each CPU up to a certain degree. (Not all pins are connected, because the RP2040 provides only 30 GPIO pins.)

However, it is interesting to know which CPU the system is running on. The "JAM Core" is the one with the most features, but it uses opcodes and features that are missing or buggy on an (old) NMOS 6502, so a 65C02 or another CMOS variant is required. It is therefore a good idea to detect an NMOS 6502 and print an error message saying that the system won't work, instead of crashing randomly. (After the error message is acknowledged, it even drops you to WozMon, so that some very basic things can still be done.)

Also, there is another point where CPU detection is very handy. If you search for cheap 65C02 processors, you typically find offerings on eBay or AliExpress for WDC W65C02S processors. However, none of those are original ones: all of them were pulled out of machines and then relabeled. (At least the ones I've seen so far.) For this project that would not be too bad, if sellers did not throw NMOS 6502s into the mix as well. Those, as explained above, don't fit the requirements.

Finally, there is another reason to try this: the challenge. After all, this is a problem for which the internet does not already offer several ready-made solutions.

And the solution presented here is not free of side effects either, as it relies on special features of the runtime environment. This exact code would only work partially on an Apple II series machine, for example.

Partial Detection

A very easy way to tell an NMOS 6502 CPU apart from its CMOS successors is used in the Sorbus JAM kernel:

;  65C02       ; NMOS 6502
   LDA   #$00  ; LDA   #$00
   DEC         ; .byte $3a  ; "illegal" NOP
   BNE   CMOS  ; BNE   CMOS ; NMOS will not take the branch

In the above case DEC will be assembled to $3A, which is an undocumented (also called "illegal") NOP opcode on the NMOS 6502. So, while the LDA #$00 sets the zero flag, it will be cleared by the DEC opcode, but not by the "illegal" NOP opcode. The same trick works with the INC opcode ($1A). This is a short and efficient way to tell NMOS and CMOS variants apart, as all CMOS variants share the same INC/DEC instructions. This is the detection the "JAM Core" kernel uses for locking out the NMOS 6502.
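
The flag logic behind this test can be modelled in a few lines. The following Python sketch is only an illustration: the function name and the two family labels are made up for this example and are not part of the Sorbus sources.

```python
def branch_taken(cpu: str) -> bool:
    """Model LDA #$00 / .byte $3A / BNE for one CPU family (illustrative only)."""
    a = 0x00                  # LDA #$00
    zero = (a == 0)           # LDA sets the zero flag
    if cpu == "cmos":
        a = (a - 1) & 0xFF    # $3A is DEC A: A becomes $FF
        zero = (a == 0)       # ... which clears the zero flag
    elif cpu == "nmos":
        pass                  # $3A is an undocumented NOP: flags stay untouched
    else:
        raise ValueError(f"unknown cpu family: {cpu}")
    return not zero           # BNE branches when the zero flag is clear
```

Calling `branch_taken("cmos")` models the branch to the CMOS label, while `branch_taken("nmos")` models the fall-through on the NMOS 6502.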

On the internet, I found a solution on how to detect the three major CPUs: NMOS 6502, CMOS 65C02 and the 65816, the variant with 16 bit extensions. This was most probably used to tell apart an Apple IIgs (using a 65816) from an Apple IIc/IIe enhanced (using a 65C02) and the original Apple II/II+/IIe (using an NMOS 6502).

;  65816       ; 65C02            ; 6502
   LDA   #$01  ; LDA   #$01       ; LDA   #$01
   XBA         ; .byte $EB ; NOP  ; .byte $EB, $EA ; "illegal" SBC #$EA
   NOP         ; NOP              ;
   LDA   #$EA  ; LDA   #$EA       ; LDA   #$EA
   XBA         ; .byte $EB ; NOP  ; .byte $EB, $EA ; "illegal" SBC #$EA
   NOP         ; NOP              ;
;  A=$01       ; A=$EA            ; A=$00

;  ALL
   BEQ   is6502   ; needs to be first
   BPL   is65816
   BMI   is65c02  ; is obsolete if 65c02 specific code continues here

As you can see, just detecting three CPUs makes things significantly more complex. Still, this routine is quite clever in how it uses the CPU flags for easy branching to the CPU specific routines.

However, the opcode $EB on a 65CE02 is totally incompatible as it is a read-modify-write opcode that even operates on two bytes (ROW). This renders this routine useless when a 65CE02 is encountered.

Now there are two ways forward: either check for the 65CE02 in advance, or start from scratch.

The Goal

The goal is to tell 6502 variants apart by their instruction sets. Five different instruction sets have been seen in chips with a pin layout similar to the original NMOS 6502 and the CMOS 65C02, plus one early revision of the NMOS chip:

  • NMOS 6502 (Rev.D)
  • NMOS 6502 Rev.A, the first 6502, which was still missing the ROR opcodes
  • 65C02, the base CMOS variant: compared to the NMOS version, a couple of bugs were fixed and new instructions were introduced
  • 65SC02, a 65C02 with the bit-related opcodes removed ($x7 and $xF are just 1 byte, 1 cycle NOPs), STP and WAI are missing as well
  • 65816, a 16 bit capable variant of the 65SC02, with all 256 opcodes defined now
  • 65CE02, a CMOS reimplementation of the 65C02 by Commodore with also 256 opcodes defined, but totally different compared to the 65816

The pin layout is not 100% the same on all chips, but it is close enough to run all of these CPUs with the Sorbus Computer, while omitting some of the features provided by pins on the chip. Variants like the HuC6280 are not part of the considerations, as those chips have totally different pin layouts.

Side note: the NMOS 6502 has 3510 transistors, the Rev.A should have a few less. The 65C02 and 65SC02 are said to have around 4000 transistors. The 65CE02 doubles that amount with about 8000, while the 65816 has a whopping 22000 transistors. (If you can, please provide more exact numbers and also references.)

The Runtime Environment

To make things easier, a runtime environment was defined with the sole purpose to detect the CPU. This is the advantage of this software defined computer: you can use a special environment for detection and then switch over to a generic one.

The "runtime environment" is - as all other runtime environments like the previously mentioned MCP and JAM Cores - written in C. It provides everything up to the memory and clock signal. The code running to detect the CPU should write an identifier to the last byte of memory. If this value changes from the default one, the processing will be stopped.

Since the code will be rather simple, the environment provided should also be rather simple. Except for the "return value", no I/O is required. Memory use can also be kept very low: 32 bytes have proven to be enough, if the memory is "wrapped around" by not fully decoding the address. The code size is just 30 bytes, but the amount of memory provided needs to be a power of 2, as implementing this differently does not make sense.

The "result-byte" written to the end of memory will then be evaluated. If it is in the defined range, the processor type id will be returned by the runtime environment, otherwise a zero indicates a failure.
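
To make the wrap-around and the evaluation of the result byte more concrete, here is a minimal Python model. It is not the actual C code of the runtime environment; the class name, the method names and the upper limit of the valid id range (`MAX_CPU_ID`) are assumptions made for this sketch.

```python
MEM_SIZE = 32                    # must be a power of two
ADDR_MASK = MEM_SIZE - 1         # only the low five address bits are decoded
RESULT_ADDR = 0xFF & ADDR_MASK   # $FF lands on the last byte of memory
MAX_CPU_ID = 6                   # assumption: valid ids are 1..MAX_CPU_ID


class DetectMemory:
    """Tiny wrapped-around RAM with a 'result byte' in its last cell."""

    def __init__(self, image: bytes):
        if len(image) != MEM_SIZE:
            raise ValueError("image must be exactly MEM_SIZE bytes")
        self.ram = bytearray(image)
        self.default = self.ram[RESULT_ADDR]   # value before the test ran
        self.stopped = False

    def read(self, addr: int) -> int:
        return self.ram[addr & ADDR_MASK]

    def write(self, addr: int, value: int) -> None:
        self.ram[addr & ADDR_MASK] = value & 0xFF
        # processing stops as soon as the result byte differs from its default
        if self.ram[RESULT_ADDR] != self.default:
            self.stopped = True

    def result(self) -> int:
        """Return the CPU id, or 0 if the result byte is out of range."""
        value = self.ram[RESULT_ADDR]
        return value if 1 <= value <= MAX_CPU_ID else 0
```

With this model a read from $0025 or from $FFFC returns the bytes at offsets $05 and $1C, which is exactly the effect of not fully decoding the address.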

The "runtime environment" also comes with another feature. Since it's well encapsulated, it can be used just like a subroutine in different cores. As of now, it is used within the MCP and the JAM Core.

Every Byte Is Sacred

One of the fun things about coding for a 6502 processor is trying to reach a maximum of efficiency. This typically means: either to use a minimum amount of CPU clock cycles or a minimum amount of bytes of code. This code focuses on the latter.

The basic idea for supplying the return code is a slide of INX opcodes, which then writes the result to the exit code address ($FF). But for this to work, X has to be initialized to $00.

So the code would practically look something like this:

   LDX   #$00 ; clear out X
   ; run test code
   ; [...]
   ; more return codes before those three
is65816:      ; exit for 16 bit 65816 (3)
   INX
is65C02:      ; exit for CMOS 65C02 (2)
   INX
is6502:       ; exit for NMOS 6502 (1)
   INX
   STX   $FF  ; stops runtime environment

This way, if a specific CPU was detected, jumping to the matching label will then return the proper code.
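
The arithmetic of the slide is simple: the return code equals the number of INX opcodes between the entry label and the STX. A short Python sketch of that idea follows; the function and the list are illustrative names, not part of the Sorbus sources.

```python
# Labels in memory order; each label is followed by exactly one INX.
SLIDE = ["is65816", "is65C02", "is6502"]


def return_code(entry_label: str, slide=SLIDE) -> int:
    """Count the INX opcodes executed when entering the slide at entry_label."""
    x = 0                                     # LDX #$00
    for _ in slide[slide.index(entry_label):]:
        x += 1                                # INX
    return x                                  # value written by STX $FF
```

Adding a new return code only means prepending one more label (and one more INX byte) to the slide; the existing codes stay unchanged.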

The last six bytes in memory are the vectors for NMI, reset and IRQ (in this order). However, as we are using neither the NMI nor the IRQ, those memory addresses can be used for code as well. So our code starts at the IRQ vector, and then wraps around as described above to address $0020.

The Detection Routine

The source code is complex, as it was written for all architectures at the same time.

   ; should start at $0000
   ; whole memory is just $20 (=32) bytes
   ; -> code also needs to be 32 bytes
   ; runtime environment is also not capable of generating IRQ or NMI
start:
   ;ldx   #$00 ; removed due to reset vector now at irq and code is wrapping
   clc
   ; $5c is evaluated by ($xxxx = address of is65816):
   ; 6502:     NOP $xxxx,X ("illegal" opcode)
   ; 65(S)C02: NOP #$xxxx (reserved)
   ; 65816:    JMP $38xxxx ($38 taken from SEC)
   ; 65CE02:   AUG #$38xxxx ($38 taken from SEC)
   .byte $5c
   .word is65816
   ; 6502 and 65(S)C02 continue here from $5c
   sec
   ; 65CE02 continues here from $5c
   bcc   is65CE02
   txa           ; A=$00
   ; $1a is evaluated by:
   ; 6502:   NOP ("illegal" opcode)
   ; 65C02:  INC
   .byte $1a
   bne   check65sc02  ; 6502: A=$00, 65(S)C02: A=$01
   ror
   bcs   is6502noror
   bcc   is6502
check65sc02:
   ; $97 is evaluated by:
   ; 65C02:  SMB1 $FF ; will set retval to $02 = 65C02
   ; 65SC02: NOP(reserved) : NOP(reserved, $FF)
   .byte $97
   .byte $FF

is65SC02:
   inx            ; X=$06
is6502noror:
   inx            ; X=$05
is65CE02:
   inx            ; X=$04
is65816:
   ; 65816 continues here from $5c
   inx            ; X=$03
is65C02:
   inx            ; X=$02 ; will be set using SMB1 $FF above
is6502:
   inx            ; X=$01
   stx   $ff      ; will stop CPU

   .byte $ea,$4c  ; spare bytes, unused, evaluate to NOP : JMP irq
                  ; also NMI vector, which is also unused
reset:
   .word irq      ; reset vector, start of ram
irq:
   ldx   #$00     ; argument needs to be $00
   ;slip through to start

Because the source code is very hard to read, since a single listing cannot show how each variant interprets the same bytes, let's use a simple flow chart of how the detection is implemented. Red bubbles show a successful detection of a CPU variant.

[Figure: flowchart of the detection routine]
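
The same decision flow can also be written down as a small Python sketch. It does not emulate any CPU; it only replays the branches of the routine, driven by four traits per CPU. The `Traits` fields, the trait values in `CPUS` and the function name are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traits:
    op_5c: str        # "nop3" (3-byte NOP), "jmp_long" or "nop4" (4-byte NOP)
    has_inc_a: bool   # $1A is INC A instead of a NOP
    has_ror: bool     # $6A really rotates through carry
    has_smb: bool     # $97 is SMB1 instead of a NOP


def detect(t: Traits) -> int:
    carry = False                          # CLC
    if t.op_5c == "jmp_long":              # 65816: JMP long lands on is65816
        return 3
    if t.op_5c != "nop4":                  # all but the 65CE02 execute SEC
        carry = True
    if not carry:                          # BCC is65CE02
        return 4
    a = 0x00                               # TXA, X is still $00
    if t.has_inc_a:                        # $1A
        a = (a + 1) & 0xFF
    if a != 0:                             # BNE check65sc02
        return 2 if t.has_smb else 6       # SMB1 $FF writes $02 directly
    if t.has_ror:                          # ROR: carry -> bit 7, bit 0 -> carry
        a, carry = (0x80 if carry else 0) | (a >> 1), bool(a & 1)
    return 5 if carry else 1               # BCS is6502noror / BCC is6502


CPUS = {
    "6502":       Traits("nop3", False, True, False),
    "6502 Rev.A": Traits("nop3", False, False, False),
    "65C02":      Traits("nop3", True, True, True),
    "65SC02":     Traits("nop3", True, True, False),
    "65816":      Traits("jmp_long", True, True, False),
    "65CE02":     Traits("nop4", True, True, True),
}
```

For example, `detect(CPUS["65SC02"])` walks the same path as the 65SC02 and ends up with the return code 6.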

Detection Routine As Processed By The Different CPUs

Now, we can also take a look at the traces collected from running all those CPUs.

Hexdump

This is the initial memory configuration when starting the detection environment.

0000: 18 5c 15 00 38 90 0d 8a 1a d0 05 6a b0 05 90 07  .\..8......j....
0010: 97 ff e8 e8 e8 e8 e8 e8 86 ff ea 4c 1e 00 a2 00  ...........L....

Let's start by taking a look at the last six bytes, the vectors described above. The reset vector (at $001C) points to the last two bytes of memory ($001E), which hold the first instruction. The two bytes before the NMI vector ($0018) hold the STX $FF instruction that will stop the runtime environment. Even though the bytes of the NMI vector ($001A) shouldn't be executed, they contain some sane data: a NOP opcode and the opcode for JMP. The JMP uses the reset vector that follows it as its address and thereby restarts the code. This ensures that when something does go wrong, the system does not drift into undefined behaviour.
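
These addresses can be checked against the dump with a few lines of Python. The sketch below parses the hex bytes (the ASCII column is left out to keep the parser short), reads the three vectors and computes the wrapped targets of the relative branches. All function names are made up for this example.

```python
HEXDUMP = """\
0000: 18 5c 15 00 38 90 0d 8a 1a d0 05 6a b0 05 90 07
0010: 97 ff e8 e8 e8 e8 e8 e8 86 ff ea 4c 1e 00 a2 00
"""


def parse_hexdump(text: str) -> bytes:
    """Collect the data bytes of every 'addr: xx xx ...' line."""
    data = bytearray()
    for line in text.splitlines():
        _, _, payload = line.partition(":")
        data.extend(int(token, 16) for token in payload.split())
    return bytes(data)


def word(mem: bytes, addr: int) -> int:
    """Read a little-endian 16-bit value."""
    return mem[addr] | (mem[addr + 1] << 8)


def branch_target(mem: bytes, addr: int) -> int:
    """Wrapped target of the relative branch whose opcode sits at addr."""
    offset = mem[addr + 1]
    if offset >= 0x80:                 # offsets are signed 8-bit values
        offset -= 0x100
    return (addr + 2 + offset) & (len(mem) - 1)


MEM = parse_hexdump(HEXDUMP)
NMI_VECTOR, RESET_VECTOR, IRQ_VECTOR = (word(MEM, a) for a in (0x1A, 0x1C, 0x1E))
```

`RESET_VECTOR` evaluates to $001E, and `branch_target(MEM, 0x05)` gives $14, the wrapped form of the address $0034 that shows up in the traces.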

Before we dive into the disassemblies, first let's explain the format using the first executed instruction as our example.

 10:001e r a2    :LDX  #$00

The 10 is just a line number. Counting starts when the reset line is pulled high, so the CPU leaves reset state. Code execution starts with the instruction after reading the reset vector at $FFFC/$FFFD, which is also the first line with disassembly.

This is followed by an overview of the bus state at that time. 001e is the address bus, r shows that the CPU is reading, and a2 is the data bus, showing what was read (or written). If lines like Reset, NMI or IRQ are triggered by pulling them to GND, that would also be shown by a corresponding letter.

LDX #$00 is the disassembly of the instruction starting at this memory address. (In this case "LoaD the X register with the value of $00".) A dot behind the opcode (like: NOP.) indicates that the opcode is a reserved (CMOS) or undocumented (NMOS) one. Not every line has a disassembly, because opcodes typically use more than a single cycle to execute.

Every trace should end with a 00ff w XX, which indicates the writing of the CPU id at the "end of memory". This also stops the runtime environment.

The disassemblies were created with the command cold debug of the MCP core. They have been slightly modified for better readability. (As of writing this document, the disassembler can't always tell whether the byte read is an opcode or a parameter.)
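
For anyone who wants to process such traces with a script, the line format described above can be parsed with a short regular expression. This is a sketch based only on the lines shown in this article; the `Cycle` type and the function names are made up, and any extra letters for Reset, NMI or IRQ are simply skipped.

```python
import re
from typing import NamedTuple, Optional


class Cycle(NamedTuple):
    number: int     # line number, counted from the release of reset
    address: int    # address bus
    write: bool     # True for 'w', False for 'r'
    data: int       # data bus
    disasm: str     # disassembly, empty if the line has none


_LINE = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]{4})\s+([rw])\s+([0-9a-fA-F]{2})[^:]*:(.*)$"
)


def parse_cycle(line: str) -> Optional[Cycle]:
    """Parse one trace line, or return None if it does not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    number, address, rw, data, disasm = m.groups()
    return Cycle(int(number), int(address, 16), rw == "w",
                 int(data, 16), disasm.strip())


def cpu_id(trace: str) -> Optional[int]:
    """Return the value of the first write to $00FF, if there is one."""
    for line in trace.splitlines():
        cycle = parse_cycle(line)
        if cycle and cycle.write and cycle.address == 0x00FF:
            return cycle.data
    return None
```

Feeding a complete trace into `cpu_id()` yields the id that the runtime environment would report for that CPU.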

Disassembly as seen on a 6502

  1:8aff r 00    :
  2:0089 r d0    :
  3:d0ff r 00    :
  4:d0ff r 00    :
  5:0100 r 18    :
  6:01ff r 00    :
  7:01fe r a2    :
  8:fffc r 1e    :
  9:fffd r 00    :
 10:001e r a2    :LDX  #$00
 11:001f r 00    :
 12:0020 r 18    :CLC
 13:0021 r 5c    :
 14:0021 r 5c    :NOP. $0015,X
 15:0022 r 15    :
 16:0023 r 00    :
 17:0015 r e8    :
 18:0024 r 38    :SEC
 19:0025 r 90    :
 20:0025 r 90    :BCC  $0034
 21:0026 r 0d    :
 22:0027 r 8a    :TXA
 23:0028 r 1a    :
 24:0028 r 1a    :NOP.
 25:0029 r d0    :
 26:0029 r d0    :BNE  $0030
 27:002a r 05    :
 28:002b r 6a    :ROR
 29:002c r b0    :
 30:002c r b0    :BCS  $0033
 31:002d r 05    :
 32:002e r 90    :BCC  $0037
 33:002f r 07    :
 34:0030 r 97    :
 35:0037 r e8    :INX
 36:0038 r 86    :
 37:0038 r 86    :STX  $FF
 38:0039 r ff    :
 39:00ff w 01    :

This is the exact CPU variant used in most machines in the late 1970s and early 1980s. The $5c used in line 14 to detect the 65816 and 65CE02 is skipped, and the INC test in line 24 for any CMOS variant does not succeed either. The third and final test in line 28 for a working ROR instruction does succeed, though.

Disassembly as seen on a 6502 Rev.A

This trace is only an estimate, as such a processor is hard to find for a decent price.

  1:8aff r 00    :
  2:0089 r d0    :
  3:d0ff r 00    :
  4:d0ff r 00    :
  5:0100 r 18    :
  6:01ff r 00    :
  7:01fe r a2    :
  8:fffc r 1e    :
  9:fffd r 00    :
 10:001e r a2    :LDX  #$00
 11:001f r 00    :
 12:0020 r 18    :CLC
 13:0021 r 5c    :
 14:0021 r 5c    :NOP. $0015,X
 15:0022 r 15    :
 16:0023 r 00    :
 17:0015 r e8    :
 18:0024 r 38    :SEC
 19:0025 r 90    :
 20:0025 r 90    :BCC  $0034
 21:0026 r 0d    :
 22:0027 r 8a    :TXA
 23:0028 r 1a    :
 24:0028 r 1a    :NOP.
 25:0029 r d0    :
 26:0029 r d0    :BNE  $0030
 27:002a r 05    :
 28:002b r 6a    :ROR.
 29:002c r b0    :
 30:002c r b0    :BCS  $0033
 31:002d r 05    :
 32:002e r 90    :
 33:0033 r e8    :INX
 34:0034 r e8    :
 35:0034 r e8    :INX
 36:0035 r e8    :
 37:0035 r e8    :INX
 38:0036 r e8    :
 39:0036 r e8    :INX
 40:0037 r e8    :
 41:0037 r e8    :INX
 42:0038 r 86    :
 43:0038 r 86    :STX  $FF
 44:0039 r ff    :
 45:00ff w 05    :

This is almost the same as before (6502), except that the ROR instruction in line 28 does not modify the carry flag, like it was supposed to. So this has to be one of those early and rare 6502s with the ROR-instruction missing. (Note the dot after the ROR.)

For more details on this topic, I recommend watching the video "The 6502 Rotate Right Myth" by Eric Schlaepfer, who also built the MOnSter 6502.

Disassembly as seen on a 65C02

  1:0032 r e8    :
  2:0032 r e8    :
  3:ffff r 00    :
  4:0033 r e8    :
  5:01f7 r e8    :
  6:01f6 r e8    :
  7:01f5 r e8    :
  8:fffc r 1e    :
  9:fffd r 00    :
 10:001e r a2    :LDX  #$00
 11:001f r 00    :
 12:0020 r 18    :CLC
 13:0021 r 5c    :
 14:0021 r 5c    :NOP. #$0015
 15:0022 r 15    :
 16:0023 r 00    :
 17:ff15 r e8    :
 18:ffff r 00    :
 19:ffff r 00    :
 20:ffff r 00    :
 21:ffff r 00    :
 22:0024 r 38    :SEC
 23:0025 r 90    :
 24:0025 r 90    :BCC  $0034
 25:0026 r 0d    :
 26:0027 r 8a    :TXA
 27:0028 r 1a    :
 28:0028 r 1a    :INC
 29:0029 r d0    :
 30:0029 r d0    :BNE  $0030
 31:002a r 05    :
 32:002b r 6a    :
 33:0030 r 97    :SMB1 $FF
 34:0031 r ff    :
 35:00ff r 00    :
 36:00ff r 00    :
 37:00ff w 02    :

The way the $5c opcode at line 14 is processed here looks a bit strange. For a couple of cycles (lines 17-21), the CPU seems to have stopped working, like when an NMOS 6502 executes a KIL (illegal) opcode. However, at some point the CPU just continues working at line 22. So this behaviour is very different from the way the NMOS 6502 processes this opcode. The result is the same, though: a three-byte NOP.

But when testing for an implemented INC instruction at line 28, it succeeds this time. Then one of the bit manipulation instructions in line 33, which is not present on the 65SC02, sets bit 1 of the return value at line 37. This is the only case in which the INX-slide is not used for the return value.

Disassembly as seen on a 65SC02

  1:0001 r 5c    :
  2:0001 r 5c    :
  3:0001 r 5c    :
  4:0100 w 00    :
  5:01ff w 01    :
  6:01fe w 62    :
  7:fffc r 1e    :
  8:fffd r 00    :
  9:001e r a2    :LDX  #$00
 10:001f r 00    :
 11:0020 r 18    :CLC
 12:0021 r 5c    :
 13:0021 r 5c    :NOP. #$0015
 14:0022 r 15    :
 15:0023 r 00    :
 16:ff15 r e8    :
 17:ffff r 00    :
 18:ffff r 00    :
 19:ffff r 00    :
 20:ffff r 00    :
 21:0024 r 38    :SEC
 22:0025 r 90    :
 23:0025 r 90    :BCC  $0034
 24:0026 r 0d    :
 25:0027 r 8a    :TXA
 26:0028 r 1a    :
 27:0028 r 1a    :INC
 28:0029 r d0    :
 29:0029 r d0    :BNE  $0030
 30:002a r 05    :
 31:002b r 6a    :
 32:0030 r 97    :NOP.
 33:0031 r ff    :NOP.
 34:0032 r e8    :INX
 35:0033 r e8    :
 36:0033 r e8    :INX
 37:0034 r e8    :
 38:0034 r e8    :INX
 39:0035 r e8    :
 40:0035 r e8    :INX
 41:0036 r e8    :
 42:0036 r e8    :INX
 43:0037 r e8    :
 44:0037 r e8    :INX
 45:0038 r 86    :
 46:0038 r 86    :STX  $FF
 47:0039 r ff    :
 48:00ff w 06    :

This is the same as before (65C02), except that the bit set instruction is interpreted as a reserved NOP. opcode in line 32. The same happens with the address byte of that instruction in line 33. So execution continues with the INX-slide.

Disassembly as seen on a 65816

  1:001b r 4c    :
  2:001b r 4c    :
  3:001b r 4c    :
  4:01ee r 90    :
  5:01ed r 05    :
  6:01ec r b0    :
  7:fffc r 1e    :
  8:fffd r 00    :
  9:001e r a2    :LDX  #$00
 10:001f r 00    :
 11:0020 r 18    :CLC
 12:0021 r 5c    :
 13:0021 r 5c    :JMP  $380015
 14:0022 r 15    :
 15:0023 r 00    :
 16:0024 r 38    :
 17:0015 r e8    :INX
 18:0016 r e8    :
 19:0016 r e8    :INX
 20:0017 r e8    :
 21:0017 r e8    :INX
 22:0018 r 86    :
 23:0018 r 86    :STX  $FF
 24:0019 r ff    :
 25:00ff w 03    :

This is a straightforward one. The 65816 interprets the $5c opcode in line 13 as a JMP to a 24 bit address. Since we're wrapping around, most of the address does not matter much; only the least significant byte is required here.
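
As a quick worked example of that wrap-around (a sketch, assuming the 32 byte memory is selected by the lowest five address bits only):

```python
ADDR_MASK = 0x1F   # 32 bytes of memory: only address bits 0-4 are decoded


def wrapped(addr: int) -> int:
    """Map any address the CPU puts on the bus to a memory offset."""
    return addr & ADDR_MASK
```

So the jump target $380015, the plain $0015 and the $FF15 seen in the 65(S)C02 traces all end up at the same offset $15, the INX at the label is65816.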

Disassembly as seen on a 65CE02

  1:01f8 r 86    :
  2:01f8 r 86    :
  3:01f8 r 86    :
  4:01f7 r e8    :
  5:01f6 r e8    :
  6:fffc r 1e    :
  7:fffd r 00    :
  8:001e r a2    :LDX  #$00
  9:001f r 00    :
 10:0020 r 18    :CLC
 11:0021 r 5c    :AUG  $380015
 12:0022 r 15    :
 13:0023 r 00    :
 14:0024 r 38    :
 15:0025 r 90    :BCC  $0034
 16:0026 r 0d    :
 17:0034 r e8    :INX
 18:0035 r e8    :INX
 19:0036 r e8    :INX
 20:0037 r e8    :INX
 21:0038 r 86    :STX  $FF
 22:0039 r ff    :
 23:00ff w 04    :

This CPU is very interesting. Notice how the INX opcodes in lines 17-20 are all processed within a single clock cycle each. No other 6502 variant can do this; they all require two clock cycles.

The rest of the detection is rather plain. The $5c opcode in line 11 is evaluated as a 4-byte instruction, the only 4-byte opcode this CPU has. Apart from it, only the 65816 has 4-byte opcodes.

This opcode, however, even though it is called AUG, behaves like a 4-byte NOP. So, in this case the SEC, which is evaluated by all other CPUs (except for the 65816), is skipped, and the BCC branch is taken by this CPU only. (The 4510 chip uses the $5c opcode, now called MAP and shrunk down to 1 byte, to set up memory mapping. Also note that branching requires only two clock cycles, not three like with all other 6502 variants.)

The Mysterious Problem

Ever since CPU detection was introduced in the Sorbus Computer, a CPU would sometimes not be detected, but after running the test a couple of times, it then magically worked.

What has happened? Typically when processing the reset, the CPU does some dummy reads from the stack. However, take a look at lines 4-6 of the 65SC02 disassembly. On some CPUs, these dummy reads are actually dummy writes. This destroys three subsequent bytes in memory. The start must be considered random, as the stack pointer is not reset yet. But after that the stack pointer seems to just stay where it was, moving by three bytes with each reset. So during the retries, the stack pointer gets moved to a position where the writes go to a part of memory that was not used during the test.
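
The effect on the 32 byte memory can be estimated with a little arithmetic. The following Python sketch assumes that the three writes go to page 1 at SP, SP-1 and SP-2 (as in lines 4-6 of the 65SC02 trace) and that the stack pointer moves down by three with each reset; both function names are made up.

```python
def reset_write_addresses(sp: int, mem_size: int = 32) -> list:
    """Wrapped memory offsets hit by the three stack writes during reset."""
    return [(0x0100 | ((sp - i) & 0xFF)) & (mem_size - 1) for i in range(3)]


def sp_after_resets(sp: int, resets: int) -> int:
    """Stack pointer after a number of resets (assumed: minus three each)."""
    return (sp - 3 * resets) & 0xFF
```

With a stack pointer of $00, as in the 65SC02 trace, the writes land on the offsets $00, $1F and $1E: the first opcode, the result byte and the LDX opcode. One reset later they hit $1D, $1C and $1B instead.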

Interestingly, this is described as an enhancement to the original NMOS 6502 CPU according to the CMD G65SC02 datasheet. However, even later CPUs do not have this "feature". It is also not possible to cleanly "return" from a reset, since the program counter might not be pointing to an instruction. This is not the case with interrupts.

How was it fixed? The writes happen at a very early stage, even before the reset vector is read to determine where in memory to start executing code. So, writes are now discarded if they happen before the reset vector is read. Or one can say: before the reset vector is read, the memory is read-only. Problem solved with an elegant solution.
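
In code, such a write gate could look like the following Python model. It is not the actual C implementation; the class name is made up, and unlocking on the first read of $FFFC (rather than after $FFFD) is an assumption of this sketch.

```python
RESET_VECTOR_LO = 0xFFFC   # low byte of the reset vector


class GuardedMemory:
    """Wrapped RAM that ignores writes until the reset vector was read."""

    def __init__(self, image: bytes):
        self.ram = bytearray(image)
        self.mask = len(self.ram) - 1    # size must be a power of two
        self.writable = False            # read-only while the CPU resets

    def read(self, addr: int) -> int:
        if addr == RESET_VECTOR_LO:      # assumption: first vector read unlocks
            self.writable = True
        return self.ram[addr & self.mask]

    def write(self, addr: int, value: int) -> bool:
        """Store value; return False if the write was discarded."""
        if not self.writable:
            return False
        self.ram[addr & self.mask] = value & 0xFF
        return True
```

The dummy stack writes of the reset sequence are dropped this way, while the final write of the CPU id to $FF goes through as before.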

There Is Always Someone Better

After finishing the CPU detection for the Sorbus, I found this: getcpu.s of the cc65 compiler suite. It can tell apart nine different 6502 based CPUs.

While it knows more CPUs than the six detected with the method described here, it does not make sense to add any of the others to this routine for a simple reason: they don't fit in a 40-pin socket compatible with the 65C02. The table has a slightly more detailed description:

CPU       | Description                             | Supported in Sorbus
NMOS 6502 | original 6502                           | yes
65C02     | "current" 6502, still being sold        | yes
65816     | 16-bit variant, still being sold        | yes
65SC02    | 65C02 without bit manipulation opcodes  | yes
65CE02    | Commodore CMOS, used in Amiga A2232     | yes
4510      | 65CE02 based microcontroller with MMU   | no, different package
45GS02    | MEGA65, huge expansion of 4510          | no, FPGA
HuC6280   | PC Engine, CMOS, adds MMU and sound     | no, different package
2a03/2a07 | NES/Famicom, NMOS, adds sound, no BCD   | no, different pinout

So, there is no other CPU to be detected.

However, getcpu.s does not support the 6502 without the ROR opcode. As the output of the C compiler relies on this opcode, it does not make sense to add this detection to a library function. The code will most probably crash before running this function. The Sorbus JAM on the other hand will fall back to a machine language monitor that is implemented without using any ROR opcodes, so the Rev.A CPU can be experimented with. If you can find one.