I don't have much free time lately, unfortunately, and I have too many other things to do to keep working on M2/M3 compression right now. I also don't want this to turn into another nightmare that VMU BIOS dumping turned out to be, so I figured I'll just post what I know and you guys pick up from there. After all there are always cries that I don't release any sources and so many of you would love to help. Well, now you can :)
Small disclaimer: this is all based on my tests run on hardware and/or applying brain gray matter to it - there might be errors. Also, Andreas changed byte ordering in the decoder code he released into public (compared to earlier versions) so, depending on how you implemented it, you might need to swap it again after decompression step.
Compressed M2/M3 is a bit-stream based on variable length codes. The decompression step is done transparently after decryption, and bit 1 of the control word decides if it's used or not.
As far as compression goes this system is rather inefficient and also not really used to its full potential. Perhaps this is just another layer of protection, by obfuscation, with the data size reduction being a useful side effect.
There are only two types of codes being used:
1) repeat code <8-bit value>
2) fetch code
To simplify things I will use capital R to indicate repeat code and F for fetch code.
Every R is followed by 8-bit value that will be repeated N-times in the output string. The N depends on the code.
F codes are standalone and will emit data that was already decoded earlier (which implies there is a buffer somewhere that stores the decoded output). Just how many bytes will be output and the fetch depth depends on the code.
This is how it works (must be viewed with proportional font):
IN: 78 09 80 20 52 00 0C 01 5E 5F D1 3C
OUT: 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 BC FE 27 3D
7 8 0 9 8 0 2 0 5 2 0 0 0 C
0111 1000 0000 1001 1000 0000 0010 0000 0101 0010 0000 0000 0000 1100
.... .000 0000 1... ..00 0000 00.. 0000 0101 .... ..00 0000 00.. ...0
0 1 0 0 0 5 0 0 0
0111 1 001 10 10 0010 00 00 110
^^x3^^ ^^x8^^ ^
0 1 5 E 5 F D 1 3 C
0000 0001 0101 1110 0101 1111 1101 0001 0011 1100
0000 000. .101 1110 0..1 1111 110. .000 0011 1..0
0 B C F E 2 7
1 0 10 1 0 10
^x3^^
By now you should've ralized that "10" is R1 code, meaning the following byte will be output exactly once. This code is used pretty often and bloats the stream, the compression barely manages to suppress this effect and work out a small gain.
Well, that's the easy part. Now for the quirks.
First, every stream must start with what I call start code (S). It works just like R1 but apparenly also resets/setups the internal state machine. Not much of an issue for the decoder but because it's not fully understood there will be some problems later on.
Second, the data is split into 512-byte long chunks, counting the output bytes (not the compressed ones). This is important, the counters follow the output address and not the actual stream, and unlike anything M2/M3 related so far it's bytes and not words. Anyway, at 0x1F9-0x1FF of each chunk there is a magic/wacky region :)
In the wacky region there are different codes being used. Not only that, these are different depending on the position of the byte:
IN: DE 60 7C E6 83 31 3E
OUT: CC CC 3E CD CC CC 3E
0x1f 9 a b c d e f
D E 6 0 7 C E 6 8 3 3 1 3 E
1101 1110 0110 0000 0111 1100 1110 0110 1000 0011 0011 0001 0011 1110
.... .110 0110 0..0 0111 110. .110 0110 1... ..11 0011 00.. 0011 1110
C C 3 E C D C C 3 E
1101 1 00 0 1 000 00 01
^^x2^^ ^^x2^^
Even the most basic R1 code is "00", or "01" here, and second R2 code is nothing like the first one. Not to mention it's all zeros so it would clash with "00" being R1.
The idea I had was that there are XOR masks for each of these locations and the code goes through the mask first and then it's decoded. By doing so one can normalize all codes to form a coherent second set - or so I though but I hit a wall with this approach as well. One workaround would be to keep a separate set for each location I guess...
There is more though :)
Every wacky zone must end with another start code. If you don't do it that way, the resulting output will start to break apart soon, seemingly requiring magic codes to appear in normal region as well. This behaviour is so weird I'm not even going to bother with it anymore, as long as there are start codes used properly. The real problem is the start code will move the output offset to next 512-byte boundary. In other words, if the wacky zone doesn't have enough data to fill all bytes in 0x1F9-0x1FF, the remainder will be filled with something and the start will pick up from 0x200. And there also seems to be a trick when you can use the first bits of the start code as part of the previous code (only F since an R needs a following byte). That or 1-bit codes are possible, which I don't think is the case.
Alas, that is still only the tip of the iceberg. It seems there are either codes longer than 8 bits or there is a preffix possible before the start code. Also, I worked mostly on SEGA Tetris data but there are other games, like Guilty Gear X, that use this system. In GGX the control word has the bit 0 cleared and it seems that the wacky region is now at 0x0FF and not 0x1FF, and possibly the fetch depths of F codes are also shorter accordingly. Haven't really tested it yet.
1-bit code/start code combo in wacky zone example:
IN: 69 6C 1D A0 97 E6 A6 65 7D 66 A6
OUT: 96 00 00 00 04 00 00 00 CD CC
0x1f 9 a b c d e f
6 9 6 C 1 D A 0 9 7 E 6
0110 1001 0110 1100 0001 1101 1010 0000 1001 0111 1110 0110
.. 1001 0110 .... ..,, ,,,, ,..0 0000 100, .... .110 0110
9 6 0 4 C D
("." and "," denote different codes)
The trivial way to implement this system is to build a tree having a leaf for each 0/1 bit value. Table-based version would be faster once it's fully understood though. Since F codes need some previous data to exits, these are not used (and possibly not allowed?) in first data chunk. Also, I only spotted some R codes in later chunks but I'm pretty sure those will work in the first one. Here's what I got so far (the format is: bit code, type, N, fetch depth):
("10", C_REPEAT, 1, 0);
("0100", C_REPEAT, 2, 0);
("00110", C_REPEAT, 3, 0);
("010100", C_REPEAT, 4, 0);
("0110001", C_REPEAT, 5, 0);
("01100101", C_REPEAT, 6, 0);
("00011110", C_REPEAT, 7, 0);
("001000", C_REPEAT, 8, 0);
("01101", C_FETCH, 1, 0x1FF);
("1101", C_FETCH, 1, 0x200);
("0111", C_FETCH, 1, 0x201);
("00000", C_FETCH, 2, 0x1FF);
("11001", C_FETCH, 2, 0x200);
("01011", C_FETCH, 2, 0x201);
("001010", C_FETCH, 3, 0x1FF);
("00111", C_FETCH, 3, 0x200);
("110000", C_FETCH, 3, 0x201);
("0110011", C_FETCH, 4, 0x1FF);
("00001", C_FETCH, 4, 0x200);
("010101", C_FETCH, 5, 0x200);
//("0110000", C_FETCH, 5, 0x201); - not seen, found by testing
("000110", C_FETCH, 6, 0x200);
("1100011", C_FETCH, 7, 0x200);
//("01100100", C_FETCH, 7, 0x201); - not seen, found by testing
("00010", C_FETCH, 8, 0x1FF);
("111", C_FETCH, 8, 0x200);
Wacky codes:
("10", C_REPEAT, 1, 0);
("11011", C_REPEAT, 2, 0);
("11101", C_REPEAT, 4, 0);
("0111001", C_FETCH, 1, 0x1FF);
("01111", C_FETCH, 1, 0x200);
("11110", C_FETCH, 1, 0x200); // ???
("110000", C_FETCH, 2, 0x201);
These assume XOR code bitmasks for 0x1F9-0x1FF:
s_xor [0] = 0x00; // 9 ..000000
s_xor [1] = 0x02; // A ......10
s_xor [2] = 0x02; // B .0000010
s_xor [3] = 0x1B; // C ..011011
s_xor [4] = 0x1B; // D ...11011
s_xor [5] = 0x02; // E ......10
s_xor [6] = 0x03; // F ......11
In short: to fully crack the compression the wacky areas need to be better understood. Figure that out and come back here - the rest is piece of cake really :D
You can download ST files I worked on here. There is the intermediate post-decryption, prior to decompression data stream in "input.bin", the final result in "result.bin" and a small log from my app that might help you writing your own code. Have fun!
Source:dknute.livejournal.com
0 Comments
Post a Comment