Intel 80286 emulator for Raspberry Pico

30 points by fleeks 2 days ago

The code has some 386 things implemented but not protected mode 286 stuff.

Seems to be written by somebody who writes fairly neat and clean code, but hasn't discovered structs yet.

Definitely not cycle accurate. Video emulation only covers simple standard modes, not the undocumented ones, not proper hardware programming.

Does have a decent handling of (real-mode) interrupts. The AT keyboard interface (+ keyboard itself) is handled poorly on Windows/Linux -- the code for handling bytes sent to the keyboard is missing + there is no emulation of the receive queue in the keyboard controller. The code for the Raspberry Pi PICO is different: it actually bit bangs some pins to support a PS/2 interface (basically an AT interface). Still doesn't handle any buffering as far as I can see.
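
For illustration only, here is a minimal sketch of the kind of receive queue the paragraph above says is missing. The names (`kbd_queue`, `kbd_queue_push`, `kbd_queue_pop`) and the 16-byte depth are assumptions made for this example, not code from pico-286.

```c
#include <stdint.h>
#include <stdbool.h>

#define KBD_QUEUE_SIZE 16u  /* assumed depth, chosen for the example */

/* Hypothetical FIFO of scancodes waiting to be read from port 0x60. */
typedef struct {
    uint8_t data[KBD_QUEUE_SIZE];
    uint8_t head;   /* index of the oldest byte */
    uint8_t tail;   /* index where the next byte is stored */
    uint8_t count;  /* number of bytes currently queued */
} kbd_queue;

/* Queue a byte coming from the keyboard; returns false if the queue is full. */
static bool kbd_queue_push(kbd_queue *q, uint8_t scancode)
{
    if (q->count == KBD_QUEUE_SIZE)
        return false;               /* full: the byte is dropped */
    q->data[q->tail] = scancode;
    q->tail = (uint8_t)((q->tail + 1u) % KBD_QUEUE_SIZE);
    q->count++;
    return true;
}

/* Take the oldest byte; returns false if nothing is queued. */
static bool kbd_queue_pop(kbd_queue *q, uint8_t *out)
{
    if (q->count == 0)
        return false;
    *out = q->data[q->head];
    q->head = (uint8_t)((q->head + 1u) % KBD_QUEUE_SIZE);
    q->count--;
    return true;
}
```

The host key handler would push, and the emulated port read would pop and raise IRQ1 again while bytes remain.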

There is an 8K PC XT(?) BIOS included with no mention of where it comes from. The code there handles hardware interrupts. The software interrupt BIOS API is handled in C code (intcall86() in src/emulator/cpu.c).

JdeBP 2 days ago

What's the likelihood that it's a hex dump of one of these?
* https://github.com/virtualxt/virtualxt/tree/develop/bios
Because if you look back far enough you'll find that this is based upon something called fake86 written by a Mike Chambers in 2010, of which there are many half-written derivatives including this:
* https://github.com/lgblgblgb/fake86/tree/master
obviously replicated here:
* https://github.com/xrip/pico-286/blob/0d2c76ba51572addaeed8a...
and which loads a pcxtbios.bin file in several flavours, including but not limited to:
* https://github.com/lgblgblgb/fake86/tree/master/bin/data
- xrip 2 days ago
  
  The BIOS is a lightly modified https://github.com/virtualxt/pcxtbios (https://www.phatcode.net/downloads.php?id=101)
  
  peterfirefly 2 days ago
  
  Does the source code still exist? Probably a good idea to have that in the repo + a tool to generate the constant byte array (or maybe use the newfangled C23 #embed feature).
BearOso 2 days ago
I noticed this:
```
  320×200×256 Colors: Mode 13h - the famous "Mode X" used by many DOS games
```
Mode 13h and Mode X are very different things. Glancing at the code, I see lots of stuff apparently cobbled together from various parts of the web, which kind of says to me "LLM data set". I'm going to guess that this was vibe-coded.
magicalhippo 2 days ago

> Seems to be written by somebody who writes fairly neat and clean code, but hasn't discovered structs yet.
So Claude, given the CLAUDE.md file in the repo?
- peterfirefly 2 days ago
  
  Pretty sure Claude knows about structs.
ForOldHack 2 days ago

Well, then Xenix 286 is out of the question ...
- peterfirefly 2 days ago
  
  Any code that treats the machine like an opponent in a full contact sport is out, not just code that relies on a 286. There should still be lots of tame code that kinda, sorta almost works.
  It's useless as a real emulator, it's impressive as a hobby project.

mschuster91 2 days ago

The idea reminds me of the fact that truly old-ass Intel clones are the foundation of many a microcontroller design [1]. Searching for the 8051, the most popular family, here on HN will lead you to some very deep rabbit holes [2].

[1] https://en.wikipedia.org/wiki/Intel_MCS-51

[2] https://hn.algolia.com/?q=8051&utm_source=opensearch&utm_med...

jshaqaw 2 days ago

My first neural net code (very very bad) was on a 286/287 back in the early 90s. The 286 is kind of a forgotten chip since the 32-bit 386 changed the game, but give a kid (i.e. teenage me) a 286, a 40 MB hard drive, and Turbo Pascal and I felt like I could build anything!

ForOldHack 2 days ago

Did you run TP in the '87 mode?

janice1999 2 days ago

Anyone else instantly turned away by the LLM emoji bullet points?

This project could be the work of an enthusiastic developer with a deep understanding/love of the 80286 or LLM slop based on regurgitated code stolen from years of hard work by dedicated retro emulator developers. Unfair or not, emoji bullet points do not give confidence it's the first.

tab8000 2 days ago

Looking at the commits, for me it seems to be LLM generated. Claude is even mentioned in one of them [0].
[0] https://github.com/xrip/pico-286/commit/d7fd03193fa3f406758a...
rasz 2 days ago

Code is also LLMed + no screenshots, no videos, no software compatibility listed, no github issues/error reports and nobody on the googlable internet ever ran this. No idea what's going on.
- actionfromafar 2 days ago
  
  Russian site, fwiw.
- xrip 2 days ago
  
  [dead]
peterfirefly 2 days ago

At a first glance, it looks cleaner than a lot of real emulator code I've seen.
What would you do if you only spoke English and Russian was the global language?
- dpoloncsak 2 days ago
  
  Probably release it in English, and not blindly trust an LLM to accurately translate something I am about to post online if I am unable to verify it for accuracy.
  
  peterfirefly 2 days ago
  
  If this is LLM translated then I'm very impressed!
  https://github.com/xrip/pico-286/issues/28
  
  xrip 2 days ago
  
  Check the commit chain. The LLM writes a wonderful stub but not really working code, and then I cleaned it up and fixed it. Also, mapdrive.asm was written by me. In other words, the LLM was a good helper.
  
  dpoloncsak 2 days ago
  
  Yeah, I had no doubt you wrote this code. This is way above what I think an LLM could currently handle by itself, in my opinion
rep_lodsb 2 days ago

Not sure if it's 100% slop, but as someone knowledgeable about older x86 processors, I can say after a casual look through "src/emulator/cpu.c" that the code is pretty terrible and often wrong.
For example, "subtract with carry" simply adds the carry to the second operand before doing the subtraction, which will wrap around to zero if it's 0xffff; this doesn't affect the result, but the carry out will be incorrectly cleared. Shift with a count of zero updates the flags, which it doesn't on real hardware. Probably many more subtle bugs with the flags as well.
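A minimal sketch of the two behaviours described above, written for illustration and not taken from pico-286 or fake86. The names `sbb16` and `shl16` are hypothetical, and only the carry flag is modelled. The subtraction is done in 32 bits so that `src + carry_in` cannot wrap to zero, and a shift by zero returns before touching the flag.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result type: the 16-bit result plus the carry (borrow) out. */
typedef struct {
    uint16_t result;
    bool     carry;
} sbb16_result;

/* 16-bit subtract with borrow: dst - src - carry_in. */
static inline sbb16_result sbb16(uint16_t dst, uint16_t src, bool carry_in)
{
    /* Kept in 32 bits: 0xFFFF + 1 becomes 0x10000 instead of wrapping to 0. */
    uint32_t subtrahend = (uint32_t)src + (carry_in ? 1u : 0u);
    sbb16_result r;
    r.result = (uint16_t)((uint32_t)dst - subtrahend);
    r.carry  = subtrahend > (uint32_t)dst;   /* a borrow was needed */
    return r;
}

/* 16-bit shift left. With an effective count of zero the flag is untouched. */
static inline uint16_t shl16(uint16_t value, uint8_t count, bool *carry)
{
    count &= 0x1F;                  /* the 286 masks the count to 5 bits */
    if (count == 0)
        return value;               /* no flag update */
    uint32_t wide = (uint32_t)value << count;
    *carry = ((wide >> 16) & 1u) != 0;   /* last bit shifted out of bit 15 */
    return (uint16_t)wide;
}
```

With `dst = 0x1234`, `src = 0xFFFF` and the carry set, the result is still `0x1234`, but the carry out is set rather than cleared.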
It can't really be called a 286 emulator either, because it only runs in real mode! For some reason there is already code in there to handle the 32-bit addressing of the 386, but it can't have been written by someone who actually understands how the address size override works - it changes both the size and the encoding of the address register(s), "AX+BX" is never a thing, nor is "EBX+ESI" (etc) used if there is only an operand size override. Also what looks like a (human?) copy-paste mistake with "EBX" in a place where it should be "EAX". At least all that code is #ifdef'd out.
And rather than running a real BIOS, it appears to handle its functions in the INT emulation code, but what is there looks too incomplete to support any real software.
- peterfirefly 2 days ago
  
  There is emulation code for the PIC (interrupt controller) and PIT (timer) + various other stuff. The intcall86() filters out and handles INT 10h/13h/etc but everything else is emulated "for real": ip/cs/flags are pushed, new cs:ip set, etc.
  There is an 8KB BIOS of some sort defined as a big array which should handle boot and hardware interrupts.
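  The "for real" path mentioned above is the standard real-mode interrupt sequence. A minimal sketch follows; `cpu_state`, `push16`, `read16` and `real_mode_int` are hypothetical names for this example, not the ones used in cpu.c, and addresses are simply wrapped at 1 MB (A20 handling is ignored).

```c
#include <stdint.h>

enum { FLAG_TF = 0x0100, FLAG_IF = 0x0200 };

/* Hypothetical CPU state; mem points at 1 MB of guest RAM. */
typedef struct {
    uint16_t ip, cs, ss, sp, flags;
    uint8_t *mem;
} cpu_state;

/* Real-mode physical address: segment * 16 + offset, wrapped at 1 MB. */
static uint32_t phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

static uint16_t read16(const cpu_state *c, uint16_t seg, uint16_t off)
{
    return (uint16_t)(c->mem[phys(seg, off)] |
                      (c->mem[phys(seg, (uint16_t)(off + 1))] << 8));
}

static void push16(cpu_state *c, uint16_t value)
{
    c->sp = (uint16_t)(c->sp - 2);
    c->mem[phys(c->ss, c->sp)] = (uint8_t)(value & 0xFF);
    c->mem[phys(c->ss, (uint16_t)(c->sp + 1))] = (uint8_t)(value >> 8);
}

/* Push FLAGS, CS, IP; clear IF and TF; load CS:IP from the vector table. */
static void real_mode_int(cpu_state *c, uint8_t vector)
{
    push16(c, c->flags);
    push16(c, c->cs);
    push16(c, c->ip);
    c->flags &= (uint16_t)~(FLAG_IF | FLAG_TF);
    uint16_t entry = (uint16_t)(vector * 4u);   /* 4 bytes per vector at 0000:0000 */
    c->ip = read16(c, 0, entry);
    c->cs = read16(c, 0, (uint16_t)(entry + 2));
}
```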
  > Probably many more subtle bugs with the flags as well.
  Sure. And lots of subtle bugs in general. Which is fine for someone's personal fun side project.
  
  ForOldHack 2 days ago
  
  Since it's 286, can it run the cursed Xenix 286? Do hacked GDTs work?

glonq 2 days ago

The page calls 320×200×256 "Mode X" but IIRC that was actually 320×240×256 wasn't it?

peterfirefly 2 days ago

Yes. And it's got them new cool square pixels!
There's also an undocumented version of the 320x200x256 mode that uses different access/addressing (similar to Mode X) which sometimes makes it faster + it allows for page flipping. Some people called that one Mode X too. Others called it Mode Y.
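To make the addressing difference concrete, here is a small sketch of how a pixel coordinate maps to video memory in each case. The function and type names are made up for this example. In Mode 13h every pixel is one byte in a linear 320-byte row; in the unchained modes, four neighbouring pixels share one address and live in four different planes, so a row is 80 bytes per plane.

```c
#include <stdint.h>

/* Mode 13h (chained): linear framebuffer, 320 bytes per row. */
static inline uint32_t mode13h_offset(unsigned x, unsigned y)
{
    return (uint32_t)y * 320u + x;
}

/* Hypothetical pair describing a pixel in an unchained 256-colour mode. */
typedef struct {
    uint8_t  plane;   /* 0..3, selected via the sequencer map mask */
    uint32_t offset;  /* byte offset within that plane */
} planar_addr;

/* Unchained 320-wide modes (Mode X and the 320x200 variant): 80 bytes per row. */
static inline planar_addr unchained_addr(unsigned x, unsigned y)
{
    planar_addr a;
    a.plane  = (uint8_t)(x & 3u);
    a.offset = (uint32_t)y * 80u + (x >> 2);
    return a;
}
```

A 320x200 page then takes 16000 bytes per plane instead of 64000 linear bytes, which is what leaves room for several pages and page flipping.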

peterfirefly 2 days ago

The 8253 (timer) implementation seems downright weird. The timers only count when the low byte of their current counter value is read!? And they are only read when ports 0x40-0x42 are read? So there is no proper timer interrupt?!

... no, wait...

The 8253 calculates a 'timer_period' which is used in the {linux/win/pico}-main files. They read the platform time and call 'doirq(0)' to signal IRQ0 whenever the time is right. The actual counting in the emulated 8253 isn't used and has absolutely no relation to the IRQ0 signals. Quirky.
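
A rough sketch of that arrangement, for illustration only: the period is derived from the programmed reload value, and the main loop compares host time against the next deadline. `pit_period_ns`, `irq0_scheduler`, `irq0_init` and `irq0_poll` are hypothetical names, not the project's; the real code uses 'timer_period' and 'doirq(0)' as described above.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u   /* standard PC timer input clock */

/* Period of one channel-0 tick in nanoseconds. A reload value of 0 means 65536. */
static uint64_t pit_period_ns(uint16_t reload)
{
    uint32_t divisor = reload ? reload : 65536u;
    return ((uint64_t)divisor * 1000000000ull) / PIT_INPUT_HZ;
}

/* Hypothetical host-time scheduler for IRQ0. */
typedef struct {
    uint64_t period_ns;
    uint64_t next_due_ns;
} irq0_scheduler;

static void irq0_init(irq0_scheduler *s, uint16_t reload, uint64_t now_ns)
{
    s->period_ns   = pit_period_ns(reload);
    s->next_due_ns = now_ns + s->period_ns;
}

/* Returns how many IRQ0 ticks have become due at host time now_ns.
   The caller would signal IRQ0 for each of them (or collapse them into one). */
static unsigned irq0_poll(irq0_scheduler *s, uint64_t now_ns)
{
    unsigned ticks = 0;
    while (now_ns >= s->next_due_ns) {
        s->next_due_ns += s->period_ns;
        ticks++;
    }
    return ticks;
}
```

With the default reload value of 0 the period comes out at roughly 54.9 ms, the familiar 18.2 Hz tick.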

peterfirefly 2 days ago

xrip (Ilya), if you read this -- please post a write up of your emulator(s) on vogons.org!

xrip 2 days ago

My English is pretty bad; it takes time to write a post that isn't some sort of 'pigeon English' :)
Regarding the other commenters and LLMs: sure, they were used; nowadays it's stupid to pass up the opportunities that LLMs provide. But around 60% of the code was written by me, and the CPU emulation was written by the famous Mike Chambers in the fake86 emulator. Sometimes, when the code looks rough or stupid (why no struct, folks?) -- keep in mind it targets the very limited CPU power and MUCH more limited RAM of the RP2040. Also, the code mess of .inl.c is because we need to maximize function inlining; each function call on the Pico takes an abnormal number of CPU cycles.
While the RP2350 was something of a game changer, resources are still limited.
- peterfirefly 4 hours ago
  
  Another trick that might work:
  https://gcc.gnu.org/onlinedocs/gcc/Global-Register-Variables...
  As the docs say, it isn't easy to use it in a safe way but it might speed up your CPU emulator.
  This trick used to be common in VMs that were compiler targets. It was also sometimes used by compilers that compiled to C (for portability). There would be some preprocessor magic that enabled it on certain common targets so the C code would compile to faster code.
  The implementation strategy here would be something like:
  - keep the x86 registers (8 GPRs + PC + flags) in a struct somewhere when executing outside the CPU code
  - on entry to the CPU core, copy from the struct into global register variables
  - on exit from the CPU core, copy them back
  - everything inside the CPU core then uses the global register variables => no load/store code needed, no code for calculating memory addresses of the x86 registers needed
  - the way operand read/write works in the CPU emulator would have to be changed (probably)
  - the entire structure of the CPU emulator would have to be changed (probably), so each opcode value would have its own C code, and each modrm byte value would also have its own C code
  - you might need to use some compiler magic to force the CPU core to use tail calls and to avoid tail merging
  You can always prototype it for a few x86 instructions and see what it would look like. No need to attempt to switch the entire thing over unless you like the prototype.
  Computed gotos to help with the tail calls: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
  Disable tail merging: -fno-tree-tail-merge
  If you want to go really crazy, you can have a faster handling of prefixes:
  - instruction interpretation doesn't look for prefixes, it just uses a jump table
  - in case of REP/REPNE/CS/ES/SS, set a flag + use a jump table on the next byte. The code jumped to is general enough to read the override flags and access memory properly
  - the normal code does not look at the segment override flags and does not have that flexibility
  So, two versions of each opcode implementation with two versions of the memory access code: with and without segment override handling.
  You can use the same C code for both, you just need to put it in an include file and include it twice (and set some #define's before each include).
  There is zero reason to have a slow CPU emulation, especially as you are not doing cycle accuracy.
  Again, this is something you can play around with and prototype for a few x86 instructions first.
  Even if you don't want to change your emulator in these directions, you could still learn some practical C tricks by writing the prototype code.
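  As a starting point for such a prototype, here is a minimal sketch of jump-table dispatch with computed gotos. It relies on the GCC/Clang labels-as-values extension and is not standard C. `run_tiny_core` is a made-up function that handles only four one-byte opcodes and no flags; it shows just the dispatch shape, not global register variables.

```c
#include <stdint.h>

/* Runs until HLT (0xF4) or an unhandled opcode and returns the final AX.
   Requires GCC or Clang (labels as values, range designators). */
static uint16_t run_tiny_core(const uint8_t *code, uint16_t ax)
{
    /* One entry per opcode byte; everything not listed falls to op_unknown. */
    static void *const dispatch[256] = {
        [0 ... 255] = &&op_unknown,
        [0x40]      = &&op_inc_ax,
        [0x48]      = &&op_dec_ax,
        [0x90]      = &&op_nop,
        [0xF4]      = &&op_hlt,
    };

/* Fetch the next opcode byte and jump straight to its handler. */
#define NEXT() goto *dispatch[*code++]

    NEXT();

op_inc_ax:
    ax++;
    NEXT();

op_dec_ax:
    ax--;
    NEXT();

op_nop:
    NEXT();

op_hlt:
op_unknown:
    return ax;

#undef NEXT
}
```

  Each handler ends with its own indirect jump rather than returning to a central switch, which is the property that tail merging would otherwise undo.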
- JdeBP 2 days ago
  
  If it helps, the word is "pidgin".
  * https://ru.wiktionary.org/wiki/pidgin#%D0%90%D0%BD%D0%B3%D0%...
  
  xrip 2 days ago
  
  [dead]
- peterfirefly 2 days ago
  
  > My English is pretty bad; it takes time to write a post that isn't some sort of 'pigeon English' :)
  Why not work on it? Isn't there a fairly substantial return on investment for someone in Russia who does that? This is not meant as a put down or an insult. Just general befuddlement because it seems like a no-brainer to me.
  ---
  So cpu.[ch] are from fake86 -- via faux86 or straight from fake86? What version did you fork? That is something that would be good to put in the readme.
  You wrote the files in the drivers/ directory? And LinuxMiniFB.c, WinMiniFB.c, {linux|win}-main.cpp, pico-main.c? And the audio code?
  Tell people what you wrote. You get better feedback that way.
  > Sometimes, when the code looks rough or stupid (why no struct, folks?) -- keep in mind it targets the very limited CPU power and MUCH more limited RAM of the RP2040.
  This is what Wikipedia says about the RP2040:
  - Dual ARM Cortex-M0+ cores (ARMv6-M instruction set), originally run at 133 MHz but later certified at 200 MHz
  - Each core has an integer divider peripheral and two interpolators
  - 264 KB SRAM in six independent banks (four 64 KB, two 4 KB)
  That is indeed tiny...
  It has never been the structs that slowed my code down. structs are free. Passing them into functions with or without pointers is usually free (or cheaper than free!) with modern C compilers -- but that was not the case until 5-10 years ago, I think.
  I haven't checked if Mike Chambers's original CPU emulator was "struct free". Maybe it was and you just inherited that "feature" ;)
  > Also, the code mess of .inl.c is because we need to maximize function inlining; each function call on the Pico takes an abnormal number of CPU cycles.
  Doesn't link-time optimization (LTO) solve that for you? It looks like you use it in CMakeLists.txt:
    add_link_options(-flto -fwhole-program) # -frename-registers -fno-tree-vectorize
    add_compile_options(-flto=auto -frename-registers -fomit-frame-pointer -fwhole-program -ffreestanding -ffast-math -ffunction-sections -fdata-sections -fms-extensions -O2)
  You could also try 'inline __attribute__((always_inline))'.
  How do you profile code on a Raspberry Pi Pico?
  Btw, the redirector interface comes in slightly different versions (different struct sizes in different DOS versions). Which DOS versions does your redirector work with?
  
  xrip 2 days ago
  
  > Why not work on it? Isn't there a fairly substantial return on investment for someone in Russia who does that? This is not meant as a put down or an insult. Just general befuddlement because it seems like a no-brainer to me.
  I have no problems reading and listening in English, but my writing is bad because of a lack of practice.
  > So cpu.[ch] are from fake86 -- via faux86 or straight from fake86? What version did you fork? That is something that would be good to put in the readme.
  I think it's a mixture of all available versions, but the base is from https://github.com/lgblgblgb/fake86
  The RP2040 drivers are mostly by the murmulator community, with the initial code from https://github.com/AlexEkb4ever, the creator of the murmulator devboard hardware. They were also rewritten and improved by me, but the initial version is from Alex.
  My code, or often a deep rewrite of others' code, is in ./src/ except emu8950 and cpu.(c|h). MiniFB for win32 is a stripped-down version of a minifb lying around on GitHub; linuxminifb is my implementation for Linux.
  About structs and so on: I'm not a C coder at all; I started with C from scratch two years ago in my spare time :) So this project, if we trace it from the initial commit until now, is a mirror of my growth as a C coder :) Also, about the Linux/Win32 versions: the Win32 version is used for overall algorithm debugging. The Linux version is just 'because why not?'
  TypeScript is my everyday toy and tool (memory economy? cpu cycles waste... huh!).
  The network redirector uses DOS >= 4 structs; the main difference between DOS versions is the CDS struct and the SDA, which can easily be changed.
  There is a pre-configured boot FDD0 image, which should be used to get the best emulator configuration and performance.
  
  peterfirefly a day ago
  
  > I have no problems reading and listening in English, but my writing is bad because of a lack of practice.
  It's worse than my German, and I have the excuse of German having lots of cases and inflections (but fewer than Russian). English is like a toddler language in comparison ;)
  Best of luck with your English practice.
  Put some text about the "source code sources" in your docs. Just take what you wrote in the comments here.
  Same goes for the redirector/DOS versions.
  Structs are essential to good, clean coding in C (and most other languages). Modern compilers are good at handling structs being passed in and returned from functions, especially for small inlined functions. They are also good at handling pointers to structs being passed into functions, especially for small inlined functions. Play around with gcc/clang + either the -S option (to generate assembly output and then stop) or with objdump or some other disassembler. Or use godbolt (Compiler Explorer). You'll be amazed at how efficient the code is.
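  A small example to experiment with, under the assumption that a register file laid out like this would suit the emulator. `x86_regs`, `far_ptr`, `ea_bx_si` and `linear` are illustrative names, not ones from the project. One helper takes a pointer to a struct, the other takes a small struct by value.

```c
#include <stdint.h>

/* Hypothetical register file for a 16-bit x86 core. */
typedef struct {
    uint16_t ax, bx, cx, dx, si, di, bp, sp;
    uint16_t ip, flags;
} x86_regs;

/* Effective address for [BX+SI+disp16]; the sum wraps at 16 bits. */
static inline uint16_t ea_bx_si(const x86_regs *r, uint16_t disp)
{
    return (uint16_t)(r->bx + r->si + disp);
}

/* A segment:offset pair small enough to pass by value. */
typedef struct {
    uint16_t seg, off;
} far_ptr;

/* Real-mode linear address: segment * 16 + offset. */
static inline uint32_t linear(far_ptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}
```

  Compiling a file like this with -O2 -S and reading the output for a caller of these helpers is a quick way to check what the struct accesses actually cost on a given target.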
  It's probably a good idea to create a number of short instruction traces, maybe just a thousand instructions each, and figure out a way to build a program that runs them and times each of them. If you can also enable profiling on your Raspberry Pi Pico target so you can see where each trace spent most of its time, it would likely be very useful.
  What's your roadmap for the project? Just tinkering? Becoming a better C programmer? Becoming better at embedded programming? Better at ARM32? "Quality of life" improvements that make it easier to use the emulator? Better emulation? Specific games/apps you want to work well?
  
  duskwuff 2 days ago
  
  > That is indeed tiny...
  Even RP2040 is fairly large as far as microcontrollers go. The widely used STM32F103 is a single 72 MHz Cortex-M3 core with 20 KB SRAM and 64 KB flash, for example. Even smaller parts aren't uncommon.
  
  peterfirefly a day ago
  
  I know. But they won't run PC emulators unless the programmer is truly heroic.
  I have worked on a slower Cortex-M3 than that. I've also worked on 8051 variants and on ST-62 (called "ST6 architecture" in the link below). My first computer was a ZX81 with a 16KB RAM pack.
  https://en.wikipedia.org/wiki/ST6_and_ST7
xrip 2 days ago

Btw, vogons may be more interested in another project of mine: https://github.com/xrip/retro-sound It can also be used on real hardware via a COM port, but someone would need to write a DOS TSR like OPL2LPT.
The Win32 version of pico-286 can also use retro-sound :)

dmitrygr 2 days ago

be warned: LLM slop with mucho bugs (i, immediately, see a few serious correctness issues)

rasz 2 days ago

It does seem to run things. video with no sound from 2023 of pico-xt (pico-286 was based on it) https://www.youtube.com/watch?v=9F0lXviwLE8 and one review from 2024 https://habr-com.translate.goog/ru/articles/842292/?_x_tr_sl... with actual useful info:
-performance ~8MHz 286 on rp2040
-“VGA and EGA modes on pico-xt are supported to a very limited extent (there is little memory in the microcontroller) and 90% of games will not work in them” but that was 2 years ago
-Prince of Persia and Monkey Island run
-“King’s Bounty just doesn’t start”
- dmitrygr 2 days ago
  
  Getting "some things" to run is 0.001% of work of writing an emulator. Reaching "50% of the software corpus for the system" is 1% of the work, reaching "90% of the software corpus" is 10% of the work, reaching "99% of the software corpus" is 20% of the work, and so on...
  The long tail is long, and for older, more esoteric systems, it is veeeeeeeeery long. You'd be surprised how far a 286 emulator will get with broken implementation of carry out sometimes.
  Source: wrote more emulators than I can count