Mini vMac v3.2.1 (2010-08-04) Changelog:
Today's Development source snapshot contains an experiment in CPU emulation. The CPU emulation method introduced in 3.0.1, and made the default in 3.1.0, involves a table that classifies each of the 65536 primary opcodes, saving the work of decoding opcodes bit by bit. It has occurred to me that the reason it is not much faster than the previous approach is that it is a poor fit for the caching scheme of modern processors. On each instruction it loads a byte from an effectively random position in this table, which can cause the CPU to load an entire cache line, perhaps 32 bytes or more depending on the CPU, the rest of which likely won't be used.
One alternative would be to go back to the previous bit by bit decoding, making the program a bit smaller and use a bit less memory, and so making it more "mini". But instead I've experimented with going in the opposite direction: as long as an entire cache line is being read in anyway, make each element of the table larger, saving additional information that can help with emulation. That's the basic idea, but the pros and cons are complex, and to see what would really happen I needed to try it.
Each element is now 8 bytes and, depending on the opcode, holds information about the instruction arguments. The main advantage is that there are now fewer paths for decoding arguments, so those that remain can be better optimized, at the expense of making them larger. Parts of the routines they call are inlined into them, and then rearranged for better parallelism. Another advantage is that there are no longer separate classes of, for example, the ADD instruction for each style of arguments. Instead there can be separate classes for each of the argument sizes (byte, word, and long), avoiding nasty conditional branches that modern processors will likely mispredict.
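To make the idea concrete, here is a minimal sketch in C of an 8-byte table element with one instruction class per argument size. It is not the actual Mini vMac code: the names (DecodeEntry, build_table, execute, the KIND_ values) and the field layout are assumptions made up for illustration. It handles only ADD between two data registers and ignores condition codes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction classes: one per argument size, so the
   execution path never tests the size with a conditional branch. */
enum { KIND_ILLEGAL = 0, KIND_ADD_B, KIND_ADD_W, KIND_ADD_L };

/* Hypothetical 8-byte element, one per primary opcode. */
typedef struct {
    uint16_t kind;      /* instruction class */
    uint8_t  src_reg;   /* predecoded source data register */
    uint8_t  dst_reg;   /* predecoded destination data register */
    uint8_t  unused[4]; /* room for more predecoded argument info */
} DecodeEntry;

static_assert(sizeof(DecodeEntry) == 8, "element should fill 8 bytes");

static DecodeEntry decode_table[65536];
static uint32_t d_regs[8]; /* data registers D0..D7 */

/* Setup: classify every opcode once. This sketch recognizes only
   ADD Dy,Dx (pattern 1101 xxx 0ss 000 yyy); all else is illegal. */
static void build_table(void)
{
    for (uint32_t op = 0; op < 65536; ++op) {
        DecodeEntry *e = &decode_table[op];
        uint32_t opmode = (op >> 6) & 7;  /* 0 = byte, 1 = word, 2 = long */
        uint32_t ea_mode = (op >> 3) & 7; /* 0 = data register direct */
        e->kind = KIND_ILLEGAL;
        if ((op & 0xF000) == 0xD000 && opmode <= 2 && ea_mode == 0) {
            e->kind = (uint16_t)(KIND_ADD_B + opmode);
            e->src_reg = (uint8_t)(op & 7);
            e->dst_reg = (uint8_t)((op >> 9) & 7);
        }
    }
}

/* Execute one opcode using only the predecoded element.
   Returns 1 if handled, 0 otherwise. Condition codes are not modelled. */
static int execute(uint16_t opcode)
{
    const DecodeEntry *e = &decode_table[opcode];
    uint32_t src = d_regs[e->src_reg];
    uint32_t *dst = &d_regs[e->dst_reg];

    switch (e->kind) {
    case KIND_ADD_B: /* only the low byte of the destination changes */
        *dst = (*dst & 0xFFFFFF00u) | ((*dst + src) & 0xFFu);
        return 1;
    case KIND_ADD_W: /* only the low word of the destination changes */
        *dst = (*dst & 0xFFFF0000u) | ((*dst + src) & 0xFFFFu);
        return 1;
    case KIND_ADD_L:
        *dst += src;
        return 1;
    default:
        return 0;
    }
}
```

After calling build_table() once, each execute() call reads a single 8-byte element and gets the class and both register numbers from it, so no bits of the opcode are decoded at run time. Whether this is faster in practice depends on the CPU and compiler.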
Another benefit is that more of the logic of the emulator is moved into the code that sets up the table (in M68KITAB), simplifying the multiple versions of MINEM68K (C code, and assembly language for each processor). The new approach so far averages around 5 percent faster for x86 and PowerPC assembly language, and I feel more improvement is possible. The C version can be slower than before. The new approach makes careful assembly language optimization more feasible, but that generally doesn't help a C compiler.
Download: Mini vMac v3.2.1 (2010-08-04)
Source: Here