Author Topic: Front Mission 2 (PSX) Character Portrait Compression  (Read 9312 times)

FaustWolf

  • *
  • Posts: 60
    • View Profile
I've got this over at Xentax too, but I figured I'd port it over here since it deals with how the PSX handles compressed data. Figuring this out will be a useful exercise for me, and PSX enthusiasts here may also be interested (or already know what's going on).

While the vast majority of Front Mission 2's graphics are uncompressed on the game CD, it appears that the character portraits are compressed in a manner functionally similar to RLE compression, i.e., runs of identical bytes or repeating byte patterns are collapsed into just a few. However, it doesn't seem to be simple RLE and I can't find any sort of identifying marker at the beginning of the compressed graphics data that would indicate what type of compression is being used. I'm interested in the opinions and observations of this forum's veteran game resource investigators.

The following pics are three comparisons between the compressed data on the game CD (top) and uncompressed form as it appears in VRAM (bottom). Green bytes in the upper section of the examples are the compressed form of their expanded counterparts in the lower section of each example. Red bytes are identical in both the data on the game CD and in VRAM, meaning they are not compressed on the game CD.




This comparison is particularly interesting. Notice the compressed code 41 E9 04 from the game CD. This corresponds to four E9 bytes of uncompressed data, as if the compressed code specified that there would be four E9 bytes in a row.


Ah, you can see another instance of what appears to be near-straight RLE compression here. There's a compressed run that reads 40 08 03 in the upper portion of this pic, and it corresponds to an expanded run of 08 08 08 in the lower portion.

Another thing I've noticed is that the first byte in many runs of compressed data may have something to do with the uncompressed length of that run. For example, if a compressed run begins with the byte 9D, the uncompressed run has 31 bytes. Here's a list of equivalencies that appear in these examples:

9A: uncompressed run has 28 bytes
9B: uncompressed run has 29 bytes
9C: uncompressed run has 30 bytes
9D: " " 31 bytes.
A0: " " 34 bytes.

But this pattern doesn't hold all the time. I observe one compressed run beginning with 82 that has an uncompressed length of 10 bytes, and a compressed run beginning with C2 that has an uncompressed length of 30 bytes. Is it possible that the compression scheme being used here involves a mixture of compression techniques?
« Last Edit: 2008-02-25 19:45:09 by FaustWolf »

FaustWolf

  • *
  • Posts: 60
    • View Profile
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #1 on: 2008-02-25 22:01:45 »
There are other instances in the data where a run beginning with 0x41 is straight-RLE compressed. For example, there's a 41 00 04, which represents 00 00 00 00. This specific example isn't pictured, but it's interesting. Given that there's a 40 08 03 in one of the posted examples, and that uncompresses to 08 08 08, we have...

40: Tells whatever device is responsible for decompression that the next two bytes represent RLE compression with an uncompressed run of 3 bytes.

41: " " the next two bytes represent RLE compression with an uncompressed run of 4 bytes.

Could the first byte of any compressed run be an opcode of sorts, perhaps? The general magnitude of its value might determine the type of compression used, and its exact value indicates the uncompressed length as well. The bytes after the opcode determine the content of the uncompressed run. Anyone ever see a compression method like this before?

EDIT: Okay, I've found a file identifier for the compressed files. In ASCII, it's PB.., or in hex, 50 42 00 00. This header does not exist in the uncompressed equivalent.
« Last Edit: 2008-02-25 23:39:42 by FaustWolf »

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #2 on: 2008-02-26 02:20:36 »
We have some really funny rules here...

One is that we make frowny faces on double-posts.

:(

Things to take from this... Square often uses LZS type compression. The RLE thing is new to my knowledge.


FaustWolf

  • *
  • Posts: 60
    • View Profile
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #3 on: 2008-02-27 01:36:52 »
Yikes, I've developed a bad habit of double-postedness. :-D

I shall do more research on LZS compression techniques; I'm aware of Ficedula's notes on the wiki now, thanks for pointing this possibility out halkun. Front Mission 2 and FF7 were released in the same year, but given the *apparent* presence of RLE runs seemingly at random, this is perhaps a variation on compression schemes used in other Square games.

For one thing, the compressed portraits each begin with the same bytes (PB.. in ASCII), which might just represent the uncompressed file length. I find it strange (probably just given my inexperience in compression) that all the uncompressed portrait files would have the same length in bytes. The portraits are all the same dimensions, but I figure each portrait will be unique in the amount of "blank space" surrounding the character's face, which would determine the ease with which the file can be compressed in the first place.

I can confirm that the *compressed* portrait files are all the same length, for what it's worth.

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #4 on: 2008-02-27 04:17:50 »
Waaaait!!!

Aren't you supposed to be working on Chrono Cross?

GET BACK TO WORK!!!!!

:P

FaustWolf

  • *
  • Posts: 60
    • View Profile
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #5 on: 2008-02-27 05:07:40 »
Shh, don't tell anyone! :lol:

This is merely a diversion I can chew on while I compile the necessary updates to the Chrono Cross File Structure wiki at the Compendium. I've never dealt with compression before, and the very nicely-drawn Front Mission 2 portraits give me great incentive to investigate this sort of thing.

According to the LZS format wiki, http://wiki.qhimm.com/FF7/LZS_format, a control byte precedes a block of data composed of literal bytes and two-byte references, and you can determine the composition of a block by decomposing the control byte into its bits. Thus a control byte 03 (binary 00000011) tells us that the following block of data is (2 pairs of references + 6 single literal) = 10 bytes long, or two pairs of reference bytes followed by six literal bytes.

Using this analogy with the second comparison example pic in my first post, the top portion (compressed data) starts with a control byte A0. Decomposed into bits, this would read 10100000, or six literal bytes, one reference pair, one literal byte, and one reference pair in order before the next control byte comes into play. However, it would appear that the compressed data is actually set up as follows:

Control byte 0xA0 -> one reference pair -> sixteen literal bytes -> next control byte 0xA0. Literal bytes would be the ones in red in the examples, and the controls and compression references would have to be the bytes in green in the sample data.

Since the analogy doesn't describe what I'm seeing, I'm either misinterpreting the control byte scheme presented in the LZS wiki or Front Mission 2 character portraits are compressed in some variation of LZS that is not identical to the compression scheme used in FF7. While I get back to work on the Cross model project, I'll be thinking about how/if the basic principles of LZS compression can be adapted to reproduce what's in the sample data I lifted from the FM2 game CD and its associated uncompressed content in VRAM.
« Last Edit: 2008-02-27 05:13:20 by FaustWolf »

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #6 on: 2008-02-27 21:16:37 »
Bounce that off of Cyb, he knows the math behind LZS more than I

FaustWolf

  • *
  • Posts: 60
    • View Profile
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #7 on: 2008-02-28 01:49:33 »
I'd definitely be interested in Cyb's observations. It does appear that the "sliding window" typical of LZS compression is coming into play with this graphics data. I don't think Ficedula referred to the method as a "sliding window," but I think that's what's described in his notes -- when the decompression algorithm reaches a section of compressed reference data, that reference data tells the algorithm to fish around in the uncompressed file up to that point and regurgitate a specific run of bytes that fall within the "window".

While tinkering with the compressed data and its expanded VRAM equivalent, I cobbled together a mock "freeze frame" of one of the portrait files in mid-decompression. Whatever decompression algorithm is used would encounter the reference bytes D4 03 08 05 at offset 0x1066 (0x1066 bytes had been decompressed up to this point), and these reference bytes magically transform into a BF followed by twenty-one 00s in the uncompressed file. Lo and behold, there are two instances earlier in the decompressed data of one BF followed by twenty-one 00s! Thus it is extremely likely that the reference bytes instruct the decompressor to re-create that previous run of bytes. It's a fascinating compression scheme.

For now I'll presume that the size of the "sliding window" is 4096 (0x1000) bytes as in Ficedula's FF7 notes. The next step will be to do some thought experiments to determine which earlier BF 00 00... is being targeted (both instances of these runs are within the sliding window, at addresses 0x326 and 0x863) and explore the possible ways in which bitwise operations could be used to get those four reference bytes to point at the target. How I enjoy logic games when there's a treasure at the end!  :lol:

EDIT: Oooh, I just realized I should do the thought experiment with the BF 00 00... run that occurs just prior to the one I've been looking at. That way I'll know that it's definitely a duplicate of the first one starting out, so I can cut the theoretical work in half. This is why typing-while-thinking is important.

I will report my findings for everyone's amusement. Hopefully compression won't be as hard to work with as I had originally feared. The key is probably just having the "before decompression" and "after decompression" examples on hand to look at, although really experienced hackers can just trace the decompression algorithms in an emulator and be done with it.

SUPER EDIT: Ooh! Ooh! I've got part of it now! And more imporantly, I didn't even accidently double-post in my excitement!  :mrgreen:

Here's two situations to illustrate some of what's going on with the decompression algorithm.

Situation 1
The decompression algorithm has worked its way to offset 0x1066, where it encounters the reference instruction D4 03 08 05. Now the algorithm reads those middle two bytes in little endian and goes 0x803 bytes back into the sliding window. Guess what's at offset (x1066 - x803) = x863? Yes! It's the BF 00 00... run we're looking for.

Situation 2
Earlier, the decompression algorithm worked its way to offset 0x863, where it encountered the reference instruction D7 3D 05 29. 0x863 - 0x53D = 0x326, which is -- yee-haw! -- the offset of the first BF 00 00... run!

Wow, this is fun. Now to figure out what the first and last bytes in the reference instructions do, and why reference instructions may be odd or even lengths, and why they may even appear like straight-out RLE instructions at times. But I guess the use of a sliding window proves that this is an LZS derivative, eh?

Mega Edit: The last byte in a four-byte reference specifies the number of literal bytes following the reference, in essence pointing the algorithm to the next control byte. For example, 0x5 literal bytes follow the reference in Situation 1 above, and 0x29 literal bytes follow the reference in Situation 2.

This end-byte rule applies to all compression references, regardless of their length. The last byte always tells the algorithm how many literal bytes come after the reference. Now to figure out what the first (the "control") byte does, exactly...

For one thing, I've discovered that a three-byte reference can act as follows: Control; NegativeOffset; #LiteralsFollowing. In other words, at times only one byte is needed to specify the negative offset for copying because that offset is less than x100. The control must serve the following functions, at the very least...

*Specify the number of bytes to regurgitate from earlier in the uncompressing file.
*Specify the number of bytes needed for the NegativeOffset.

This is where bitwise operations will probably come into play. Time to look at some 1s and 0s!

Another EDIT:

Decomposing some control bytes into bits reveals the following pattern.

81: 10000001 The following backward reference is single (value less than or equal to 0xFF).
D7: 11010111 The following backward reference is a double (value greater than 0xFF).
D4: 11010100 The following backward reference is a double.
A0: 10100000 The following backward reference is single.
C1: 11000001 The following backward reference is a double.
9D: 10011101 The following backward reference is single.
41: 01000001 The following reference is a straight-out RLE reference
42: 01000010 The following reference is a straight-out RLE reference

The only thing that really stands out to me across the examples are the leftmost two bits. Question: I think these would actually be the *final* two bits fed into the decompression algorithm if the byte is being read in Little Endian order. Is that correct? 

If so, we might have the following as decompression "opcodes" of sorts, if this sort of thing is possible:
01: LZS decompression, next backward reference is a single byte.
11: LZS decompression, next bakcward reference is a pair of bytes.
10: RLE decompression, next reference is the byte to be repeated.

The number of bytes to copy, either for LZS or RLE decompression, must also be specified by the control byte. There's definitely a pattern across the control bytes for this as well, but the fact that 0xD7 and 0xD4 both specify a copied run of (Dec 22, x16) bytes throws things off kilter for the time being. Other than the relative positioning of the copied runs, nothing differs between the runs copied by these control bytes (these are the BF 00... runs). The control bytes should be identical, and yet they aren't.

Ha! Scratch that. The control byte 0xD7 in Situation 2 above corresponds to a run of one BF and dec 24 zero bytes. The control byte 0xD4 in Situation 1 above corresponds to a run of one BF and dec 21 zero bytes. Looks like the pattern holds!

I think I'm seriously close to fully cracking this decompression algorithm. Anyone happen to know of any open-source LZS decompressors? I'm wondering if I could simply modify existing code to handle the decompression at hand. One problem is that I've never seen LZS decompression being used side-by-side with RLE compression, so I'll have to figure out how to add that. Or maybe I'll just research python decompression, since python is the first programming language I intend to learn. Wish me luck!
« Last Edit: 2008-02-29 16:48:41 by FaustWolf »

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #8 on: 2008-02-29 22:49:33 »
Q-gears has one it it. I can see if I also have an implementation at home.

dziugo

  • *
  • Posts: 1470
    • View Profile
    • A new copy of FF7 thanks to Salk. Pack (zip/rar/etc) your saved game before sending it to me.
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #9 on: 2008-03-01 08:25:24 »
There is one in the package that can be downloaded here. File is named "unlzs.cpp".

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage

FaustWolf

  • *
  • Posts: 60
    • View Profile
Re: Front Mission 2 (PSX) Character Portrait Compression
« Reply #11 on: 2008-03-01 15:28:07 »
Wow, thanks guys! I'll take a look at those LZS examples. When I get some time I'll post the full decompression algorithm as I understand it in English. The way the algorithm is set up, it would seem to involve a ton of "if, else" logic. The value of the control byte apparently specifies whether the following reference byte(s) are to be interpreted as RLE, LZS with one sliding-window reference byte, or LZS with two sliding-window reference bytes.

In short, here's what happens with the control bytes:

If the control byte is greater than or equal to 0x3D but less than 0x7E, the reference bytes represent RLE compression. The length of the uncompressed run will be (Control Byte - 0x3D) bytes long.

If the control byte is greater than or equal to 0x7E but less than 0xBE, only the next byte is a reference that points backward into the sliding window. The number of bytes to copy from the window equals (Control Byte - 0x7E).

If the control byte is greater than or equal to 0xBE, the next two bytes are references that point backward into the sliding window. The number of bytes to copy from the window equals (Control Byte - 0xBE).

I don't believe the control bytes specify where the compressed run ends because, for the middle type above, the length of the compressed runs is completely variable even if the beginning control byte is consistent among compressed runs. Some more "if, else" logic may be needed to identify the last byte, which should always be less than 0x3D. That last byte instructs the algorithm to simply write the next (last byte value) bytes, then it encounters another control byte, rinses, and repeats.

From what I've seen of Ficedula's notes, the LZS compression in FF7 was "consistent," whereas I would describe FM2's compression as "situational." This might mean that available LZS code would have to be modified to such an extent that I'll need to just code something from scratch. I've got a bunch of Python tutorials on hand now, and it appears I would be able to make use of that language's input-output functions. I've always wanted to learn a programming language, so this project will give me the incentive to buck up and do it.

EDIT: Heeey, look at this! Turns out a patent was filed in the US for a new compression scheme that combined single-pass LZSS and RLE compression in 1997 -- the very same year Front Mission 2 was released. Interesting. FM2 wasn't released in the US of course, but I wonder if the LZSS/RLE combination might appear in other Square games.
http://www.patentstorm.us/patents/5701125/description.html
« Last Edit: 2008-05-16 00:56:20 by FaustWolf »