Author Topic: Having Problems with a LZS-Compression Algorithm  (Read 4819 times)

Chrisu

  • *
  • Posts: 363
  • Simply there
    • View Profile
    • Christian's Programs
Having Problems with a LZS-Compression Algorithm
« on: 2007-07-11 20:51:44 »
Hi, everyone,
i'm writing a lzs-compression algorithm for Highwind at the moment, but there just a simple thing i couldnt figure out until now.
I mentioned this in my Highwind-Thread, but apparently noone who knows an answer read it, so i'll ask it here once more ;)
By the way, my lzs-Decompression is working fine and included in Highwind.

The problem is: (My instruction was from the qhimm-wiki page for lzs-compression where i have the variable names from)
When decoding the lzs-Files i get the real_offset with this formula: real_offset = tail - ((tail - 18 - raw_offset) mod 4096)
But how do i reverse that when compressing, i've got the real_offset and want to get the raw_offset?

Maybe someone knows the answer because thats the only wrong thing in my algorithm, i've checked the compressed files.
I would be happy if you could help me,
Christian


Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: Having Problems with a LZS-Compression Algorithm
« Reply #1 on: 2007-07-12 00:15:05 »
When decoding the lzs-Files i get the real_offset with this formula: real_offset = tail - ((tail - 18 - raw_offset) mod 4096)
But how do i reverse that when compressing, i've got the real_offset and want to get the raw_offset?

Maybe someone knows the answer because thats the only wrong thing in my algorithm, i've checked the compressed files.
I would be happy if you could help me,
Christian
REAL_off = (tail - 18 - raw_offset) % 4096
REAL_OFF = (tail - 18 - raw_offset) & 0x0FFF
ignoring the mode completely because that is a truncation calculation
raw_offset + real_offset = tail - 18
raw_offset = (tail - 18 - real_offset) & 0xFFFF
My guess.
tail is the current position correct?

To be more helpful however I'm going to quote the wiki
Quote
Here, 'tail' is your current output position (eg. 10,000), 'raw_offset' is the 12-bit data value you've retrieved from the compressed reference, and 'real_offset' is the position in your output buffer you can begin reading from. This is a bit complex because it's not exactly the way LZSS traditionally does (de)compression; it uses a 4K circular buffer; if you do that, the offset is more or less usable directly.
This means your compressor needs to use a 4K buffer in which to reference data.  You should also be certain to use the 2 special tricks of accessing the absolute location < 0 (this means write 0's since the buffer is CLEARED before you begin adding data into it to compress). It looks to me that they scan said 4K buffer for a pattern that matches the input data being looked at.  The offset is the pretty much the offset into the buffer (wrapped of course).  So that's the only thought I have on it.

Cyb

Chrisu

  • *
  • Posts: 363
  • Simply there
    • View Profile
    • Christian's Programs
Re: Having Problems with a LZS-Compression Algorithm
« Reply #2 on: 2007-07-13 05:11:39 »
And what about the buffer?
When do they clear/flush the buffer, as writing the data to the output file?
Or how do they handle it anyways?

Quote
raw_offset = (tail - 18 - real_offset) & 0xFFFF
Is this formula right (i think it should be "& 0x0FFF, does it?) ?
I'll try it on weekend.

EDIT: I recognized you forgot something in your calculation, it's
REAL_OFF = tail - (tail - 18 - raw_offset) & 0x0FFF
Does this change anything about the final formula?

EDIT: Come on, can noone help me here?
I just need the formula to calculate the index to write in the compressed file, when i have the index of the data to refer to in the uncompressed input file.
I couldnt get it to work properly with your way, Cyberman :(
« Last Edit: 2007-07-14 21:50:35 by Christian »

Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: Having Problems with a LZS-Compression Algorithm
« Reply #3 on: 2007-07-16 23:49:20 »
Sorry I've got work to do :)

Anyhow the buffer is relative to itself. Thus you need to scan the buffer that's filled.
You also need to zero the buffer before you begin you encoding also.
So the number you see is the offset into the buffer from the current position it's reading from wrapped around inside the buffer.
That about sums it up. The pointer 'wraps' within the buffer is all :D Other than that well hmmm can't really tell you more.

Cyb

Chrisu

  • *
  • Posts: 363
  • Simply there
    • View Profile
    • Christian's Programs
Re: Having Problems with a LZS-Compression Algorithm
« Reply #4 on: 2007-07-18 16:23:42 »
Anyways thanks for your answer, obviously your the only one who cares...
Ill see if i can make something of that.

Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: Having Problems with a LZS-Compression Algorithm
« Reply #5 on: 2007-07-19 03:13:44 »
Anyways thanks for your answer, obviously your the only one who cares...
Ill see if i can make something of that.
I don't think it's that I'm the only one who cares it's likely other people aren't able or don't have the time to give you an answer.
As I said the buffer must be cleared before you begin putting data in it from the file as you compress it.  If you don't do this some of the special types of encoding will be lost.  These help getting the compression ratio up as well as they used them in FF7's compression scheme.

Cyb

Chrisu

  • *
  • Posts: 363
  • Simply there
    • View Profile
    • Christian's Programs
Re: Having Problems with a LZS-Compression Algorithm
« Reply #6 on: 2007-07-19 14:13:54 »
And what to do when the buffer is full?
Start writing data into it from the beginning?
Isn't there a simple formula to calculate the index to write into the compressed file?

Thanks for your help, Cyberman, or someone else who knows anything about this and has the kindness to help me ;)

Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: Having Problems with a LZS-Compression Algorithm
« Reply #7 on: 2007-07-22 16:03:28 »
And what to do when the buffer is full?
The buffer data that has already been compressed is tossed because you don't need it anymore, the buffer never gets full as a consequence.  As you add data (at the head of the buffer) you move the tail forward each time you add new data. Since the buffer is circular it reuses the same memory locations over and over again.
Start writing data into it from the beginning?
IE the head, the tail is virtual in this case because it's always 4095 locations away from the head in the queue. The buffer is a Fixed Queue (FIFO) as you cram stuff in you scan the Queue while compressing.  The Queue is initialized with 0's (cleared) to start with. SO when you begin scanning the queue for data sets this is where you get those strange offsets to negative areas near the beginning LZS files you decompress.
Isn't there a simple formula to calculate the index to write into the compressed file?

Thanks for your help, Cyberman, or someone else who knows anything about this and has the kindness to help me ;)
The simple formulea is your offset is relative to the current head of your queue.
If you are using numerical offsets and not pointers this is easier to calculate (but potentially slower).

The offset is always relative to the head of the queue.
However to properly answer you I'll need to think about 'what's going on' to give a better description.  It's been a while sadly.

Cyb