Cameron Kaiser: Another weekend on the new computer (or, making the Talos II into the world's biggest Power Mac)
Your eyes do not deceive you -- this is QEMU running Tiger with full virtualization on the Talos II. For proof, look at my QEMU command line in the Terminal window. I've just turned my POWER9 into a G4.
Recall last entry that there was a problem using virtualization to run Power Mac operating systems on the T2 because the necessary KVM module, KVM-PR, doesn't load on bare-metal POWER9 systems (the T2 is PowerNV, so it's bare-metal). That means you'd have to run your Mac operating systems under pure emulation, which eked out something equivalent to a 1GHz G4 in System Profiler but was still a drag. With some minimal tweaks to KVM-PR, I was able to coax Tiger to start up under virtualization, increasing the apparent CPU speed to over 2GHz. Hardly a Quad G5, but that's the fastest Power Mac G4 you'll ever see with the fastest front-side bus on a G4 you'll ever see. Ever. Maximum effort.
The issue on POWER9 is actually a little more complex than I described it (thanks to Paul Mackerras at IBM OzLabs for pointing me in the right direction), so let me give you a little background first. To turn a virtual address into an actual real address, PowerPC and POWER processors prior to POWER9 exclusively used a hash table of page table entries (PTEs or HPTEs, depending on who's writing) to find the correct location in memory. The process in a simplified fashion is thus: given a virtual address, the processor translates it into a key for that block of memory using the segment lookaside buffer (SLB), and then hashes that key and part of the address to narrow it down to two page table entry groups (PTEGs), each containing eight PTEs. The processor then checks those 16 entries for a match. If it's there, it continues, or else it sends a page fault to the operating system to map the memory.
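To make that lookup concrete, here is a toy Python model of it. Treat it as a sketch under assumptions, not the real hardware layout: the class and field names are made up for illustration, the hash (XOR the key with the page number, then take the complement for the second group) only loosely follows the classic 32-bit PowerPC scheme, and the SLB step is assumed to have already produced the key (called `vsid` here).

```python
# Toy model of a hashed page table (HPT) lookup. Table size, hash function
# and field names are simplifying assumptions, not the exact POWER layout.
from dataclasses import dataclass
from typing import Optional

PTES_PER_PTEG = 8   # eight PTEs per group, as described in the text
PAGE_SHIFT = 12     # assumption: 4 KiB pages


@dataclass
class PTE:
    valid: bool = False
    vsid: int = 0            # the "key" for the block of memory
    page_index: int = 0      # page number within that block
    secondary: bool = False  # True if stored via the secondary hash
    real_page: int = 0


class HashedPageTable:
    def __init__(self, num_ptegs: int = 1024):
        assert num_ptegs & (num_ptegs - 1) == 0, "must be a power of two"
        self.mask = num_ptegs - 1
        self.ptegs = [[PTE() for _ in range(PTES_PER_PTEG)]
                      for _ in range(num_ptegs)]

    def _hashes(self, vsid: int, page_index: int):
        primary = (vsid ^ page_index) & self.mask
        secondary = ~primary & self.mask   # complement picks the second PTEG
        return primary, secondary

    def insert(self, vsid: int, page_index: int, real_page: int) -> bool:
        primary, secondary = self._hashes(vsid, page_index)
        for is_secondary, group in ((False, primary), (True, secondary)):
            for slot, pte in enumerate(self.ptegs[group]):
                if not pte.valid:
                    self.ptegs[group][slot] = PTE(
                        True, vsid, page_index, is_secondary, real_page)
                    return True
        return False  # all 16 slots taken; a real OS would evict something

    def translate(self, vsid: int, offset_in_segment: int) -> Optional[int]:
        page_index = offset_in_segment >> PAGE_SHIFT
        byte_offset = offset_in_segment & ((1 << PAGE_SHIFT) - 1)
        primary, secondary = self._hashes(vsid, page_index)
        # Check both groups: at most 2 * 8 = 16 entries.
        for is_secondary, group in ((False, primary), (True, secondary)):
            for pte in self.ptegs[group]:
                if (pte.valid and pte.vsid == vsid
                        and pte.page_index == page_index
                        and pte.secondary == is_secondary):
                    return (pte.real_page << PAGE_SHIFT) | byte_offset
        return None  # no match: this is where the page fault would happen
```

A `None` from `translate` corresponds to the page fault, and a `False` from `insert` is the case where both groups are already full.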
The first problem is that the format of HPTEs changed slightly in POWER9, so this needs to be accommodated if the host CPU does lookups of its own (it does in KVM-HV, but this was already converted for the new POWER9 and thus works already).
The bigger problem, though, is that hash tables can be complex to manage and in the worst case could require a lot of searches to map a page. POWER8 and earlier reduce this cost with the translation lookaside buffer (TLB), used to cache a PTE once it's found. However, the POWER9 has another option called the radix MMU. In this scheme (read the patent if you're bored), the SLB entry for that block of memory now has a radix page table pointer, or RPTP. The RPTP in turn points to a chain of hierarchical translation tables ("radix tree") that through a series of cascading lookups build the real address for that page of RAM. This is fast and flexible, and particularly well-suited to discontinuous tracts of addressing space. However, as an implementation detail, a guest operating system running in user mode (i.e., KVM-PR) on a radix host has limitations on so-called quadrant 3 (memory in the 0xc... range). This isn't a problem for a VM that can execute supervisor instructions (i.e., KVM-HV) because it can just remap as necessary, but KVM-HV can't emulate a G3 or G4 on a POWER9; only KVM-PR can do that.
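For a feel of what those cascading lookups look like, here is a toy radix walk in Python. It is only a sketch: nested dicts stand in for the translation tables, `root` stands in for whatever the RPTP points at, and the three 4-bit levels, the 4 KiB page size and the function names are all assumptions picked for readability, not the actual POWER9 layout.

```python
# Toy radix-tree page table walk. Level widths and page size are
# illustrative assumptions, not the real POWER9 radix format.
LEVEL_BITS = (4, 4, 4)   # index bits consumed at each level, top down
PAGE_SHIFT = 12          # assumption: 4 KiB pages


def split_address(vaddr):
    """Return ([index per level, top down], byte offset within the page)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    rest = vaddr >> PAGE_SHIFT
    indices = []
    for bits in reversed(LEVEL_BITS):
        indices.append(rest & ((1 << bits) - 1))
        rest >>= bits
    indices.reverse()
    return indices, offset


def map_page(root, vaddr, real_page):
    """Create any missing intermediate tables, then store the leaf entry."""
    indices, _ = split_address(vaddr)
    node = root
    for index in indices[:-1]:
        node = node.setdefault(index, {})
    node[indices[-1]] = real_page


def translate(root, vaddr):
    """Walk one table per level; return the real address or None."""
    indices, offset = split_address(vaddr)
    node = root
    for index in indices:
        node = node.get(index)
        if node is None:
            return None  # hole in the tree: nothing mapped here
    return (node << PAGE_SHIFT) | offset
```

Tables only exist along paths that are actually mapped, which is why this kind of structure copes well with scattered, discontinuous address ranges.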
Fortunately, the POWER9 still can support the HPT and turn the radix MMU off by booting the kernel with disable_radix. That gets around the second problem. As it turns out, the first problem actually isn't a problem for booting OS X on KVM once radix mode is off, assuming you hack the KVM-PR kernel module to handle a couple extra interrupt types and remove the lockout on POWER9. And here we are.(*)
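If you want to confirm the option actually made it onto the running kernel's command line, a check along these lines should do. It is a minimal sketch: it only looks for the bare `disable_radix` token as written above, the function name is made up, and it assumes a Linux host where `/proc/cmdline` is readable.

```python
# Check whether the bare `disable_radix` token was passed to the kernel.
# Assumption: only the exact token counts; other spellings are ignored.
from pathlib import Path


def radix_disabled_on_cmdline(cmdline: str) -> bool:
    """True if `disable_radix` appears as its own word in the command line."""
    return "disable_radix" in cmdline.split()


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print("disable_radix present:",
              radix_disabled_on_cmdline(path.read_text()))
```

This only tells you what was requested at boot, not which MMU mode the kernel ended up using.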
Anyway, you lot will be wanting the Geekbench numbers, won't you? Such a competitive bunch, always demanding to know the score. Let's set two baselines. First, my trusty backup workstation, the 1GHz iMac G4: It's not very fast and it has no L3 cache, which makes it worse, but the arm is great, the form-factor has never been equaled, I love the screen and it fits very well on a desk. That gets a fairly weak 580 Geekbench (Geekbench 2.2 on 10.4, integer 693, floating point 581, memory 500, stream 347). For the second baseline, I'll use my trusty Quad G5, but I left it in Reduced power mode since that's how I normally run it. In Reduced, it gets a decent 1700 Geekbench (1907/2040/1002/1190).
First up, Geekbench with pure emulation (using the TCG JIT):
... aah, forget it. I wasn't going to wait all night for that. How about hacked KVM-PR?
Well, damn, son: 1733 (1849/2343/976/536). That's into the G5's range, at least with math performance, and the G5 did it with four threads while this poor thing only has one (QEMU's Power Mac emulation does not yet support SMP, even with KVM). Again, do remember that the G5 was intentionally being run gimped here: if it were going full blast, it would have blown the World's Baddest Power Mac G4 out of the water. But still, this is a decent showing for the T2 in "Mac mode" given all the other overhead that's going on, and the T2 is doing that while running Firefox with a buttload of tabs and lots of Terminal sessions and I think I was playing a movie or something in the background. I will note for the record that some of the numbers seem a bit suspect; although there may well be a performance delta between image compression and decompression, it shouldn't be this different and it would more likely be in the other direction. Likewise, the poor showing for the standard library memory work might be syscall overhead, which is plausible, but that doesn't explain why a copy is faster than a simple write. Regardless, that's heaps better than the emulated CPU which wouldn't have finished even by the time I went to dinner.
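For anyone who wants the ratios rather than the raw scores, this little Python sketch computes them from the numbers quoted in this post. It assumes the four slash-separated figures for the G5 and the T2 are in the same order as the iMac's breakdown (integer, floating point, memory, stream); the labels and the function name are only for illustration.

```python
# Geekbench 2.2 scores quoted in the post. Assumption: the slash-separated
# figures follow the iMac's order (integer / floating point / memory / stream).
SCORES = {
    "iMac G4 1GHz": {
        "overall": 580, "integer": 693, "floating point": 581,
        "memory": 500, "stream": 347,
    },
    "Quad G5 (Reduced)": {
        "overall": 1700, "integer": 1907, "floating point": 2040,
        "memory": 1002, "stream": 1190,
    },
    "T2 hacked KVM-PR": {
        "overall": 1733, "integer": 1849, "floating point": 2343,
        "memory": 976, "stream": 536,
    },
}


def ratios(subject, baseline):
    """Per-section score of `subject` divided by `baseline`, to 2 places."""
    return {
        section: round(SCORES[subject][section] / SCORES[baseline][section], 2)
        for section in SCORES[subject]
    }


if __name__ == "__main__":
    for baseline in ("iMac G4 1GHz", "Quad G5 (Reduced)"):
        print(baseline, ratios("T2 hacked KVM-PR", baseline))
```

Against the throttled G5 the virtualized G4 comes out slightly ahead overall and on floating point, close on integer and memory, and well behind on stream.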
The other nice thing is that KVM-PR-Hacky-McHackface doesn't require any changes to QEMU to work, though the hack is pretty hacky. It is not sufficient to boot Mac OS 9; that causes the kernel module to err out with a failure in memory mapped I/O, which is probably because it actually does need the first problem to be fixed, and similarly I would expect Linux and NetBSD won't be happy either for the same reason (let alone nesting KVM-PR within them, which is allowed and even supported). Also, I/O performance in QEMU regardless of KVM is dismal. Even with my hacked KVM-PR, a raw disk image and rebuilding a stripped down QEMU with -O3 -mcpu=power9, disk and network throughput are quite slow and it's even worse if there are lots of graphics updates occurring simultaneously, such as installing Mac OS X with the on-screen Aqua progress bar. Minimizing such windows helps, but only when you're able to do so, of course. More ominously I'll get occasional soft lockups in the kernel (though everything keeps running), usually if it's doing heavy disk access, and it acts very strangely with stuff that messes with the hardware such as system updates. For that reason I let Software Update run in emulated mode so that if a bug occurred during the installation, it wouldn't completely hose everything and make the disk image unbootable (which did, in fact, happen the first time I tried to upgrade to 10.4.11). Another unrelated annoyance is that QEMU's emulated video card doesn't offer 16:9 resolutions, which is inconvenient on this 1920x1080 display. I could probably hack that in later.
QEMU also has its own bugs, of course; support for running OS 9/OS X is very much still a work in progress. For example, you'll notice there are no screenshots of the T2 running TenFourFox. That's because it can't. I installed the G3 version and tried running it in QEMU+KVM-PR, and TenFourFox crashed with an illegal instruction fault. So I tried starting it in safe mode on the assumption the JIT was making it unsteady, which seemed to work when it gave me the safe mode window, but then when I tried to start the full browser, it still crashed with an illegal instruction fault (in a different place). At that point I assumed it was a bug in KVM-PR and tried starting it in pure emulation. This time, TenFourFox crashed the entire emulator (which exited with an illegal instruction fault). I think we can safely conclude that this is a bug in QEMU. I haven't even tried running Classic on it yet; I'm almost afraid to.
Still, this means my T2 is a lot further along at being able to run my Power Mac software. It also means I need to go through and reprogram all my AutoKey remappings to not remap the Command-key combinations when I'm actually in QEMU. That's a pain, but worth it. If enough people are interested in playing with this, I'll go post the diff in a gist on new Microsoft Visual GitHub, but remember it will rock your socks, taint your kernel, (possibly) crash your computer and (definitely) slap yo mama. You'll also need to apply it as a patch to the source code for your current kernel, whatever it is, as I will not post binaries to make you do it your own bad and irresponsible self, but you won't have to wait long as the T2 will build the Linux kernel from scratch and all its relevant modules in about 20 minutes at -j24. Now we're playing with POWER!
What else did I learn this weekend?
ls | fgrep -vi '.ttf' | fgrep -vi '.otf' | fgrep -vi '.dfont' | fgrep -vi '.bmap' | perl -ne 'chomp;print"\"$_\"\n"' | xargs -n 1 -I '{}' macbinconv -mac '{}' -mb '/home/spectre/rfont/{}.bin'
The little snippet of Perl there preserves embedded spaces in the filenames. When I ran it, it turned all the font resources into MacBinary, I transferred the resulting files to the T2, and Fondu converted them as well. Now I have my fonts.
SUBSYSTEM=="usb", GROUP="usb", MODE="0660"
Now shark can control it, and I can listen in VLC (using a URL like pulse://alsa_input.usb-Griffin_Technology__Inc._RadioSHARK-00.analog-stereo).
(*) Given that no changes were made to the HPTE format to get KVM-PR to work for OS X, it may not even be necessary to run the kernel with radix mode off. I'll try that this coming weekend at some point.
http://tenfourfox.blogspot.com/2018/06/another-weekend-on-new-computer-or.html