The PC version used normal .wav (PCM) sound samples that were sequenced by number. Because the scripts were not changed between the PSX and the PC version, one of the AKAO opcode arguments could be a call to a particular sound effect.
I only have Psy-Q notes, and that format doesn't match anything I've got. What I have on the midi-like SEQ data and related audio is sketchy. Are you saying that the "midi" sequence data is actually in the Akao block?
Psy-Q had a bunch of converters that you used to make the data for the PSX. For example, there was one that turned WAV files into VAG files. There was another that transformed MIDI data into SEQ files. I'm guessing that the MIDI files in the PC version are the uncompiled SEQ files. Same goes for the audio samples (VAG) and soundfonts (VAB).
Here's what I have on how Psy-Q did sound.
You have a VAG file, which is a single ADPCM sample. Its format was like this:
4 bytes - "VAGp" (I don't know why the lowercase "p", but the other sound formats have those too.)
8 bytes - ????? (version?)
4 bytes - data size
4 bytes - frequency
12 bytes - ???? (scratchpad?)
16 bytes - Original file name.... sometimes
after this was the actual ADPCM sample
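A minimal Python sketch of reading that header, following the field widths listed above (48 bytes in total). The big-endian byte order for the size and frequency fields is an assumption, since my notes don't state it. The function and key names are made up for illustration.

```python
import struct

# Field widths taken from the list above: 4 + 8 + 4 + 4 + 12 + 16 = 48 bytes.
# ">" (big-endian) is an assumption; the notes do not give the byte order.
VAG_HEADER = struct.Struct(">4s8sII12s16s")

def parse_vag_header(data: bytes) -> dict:
    """Split the first 48 bytes of a VAG file into the fields listed above."""
    if len(data) < VAG_HEADER.size:
        raise ValueError("too short to hold a VAG header")
    magic, unknown_a, data_size, frequency, unknown_b, raw_name = \
        VAG_HEADER.unpack_from(data)
    if magic != b"VAGp":
        raise ValueError("missing 'VAGp' tag")
    # The name field is only filled in sometimes; cut it at the first NUL.
    name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return {
        "unknown_a": unknown_a,      # 8 bytes, possibly a version
        "data_size": data_size,
        "frequency": frequency,
        "unknown_b": unknown_b,      # 12 bytes, possibly scratchpad
        "name": name,
        "sample_offset": VAG_HEADER.size,  # ADPCM data starts right after
    }
```

Everything from sample_offset onward would be the ADPCM sample itself.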
You could group a bunch of VAGs into a VAB file. This is kind of like a soundfont on a PC:
4 bytes - "VABp" (again with the "p")
8 bytes - ???? (version I think, same as VAG)
4 bytes - VAB file number
4 bytes - wave size
2 bytes - ????
2 bytes - number of overall sound setting presets (?)(found later, I will call this "preset A")
2 bytes - number of individual sound setting presets (?)(also found later, I call this "preset B")
2 bytes - number of VAGs in the file
1 byte - volume
1 byte - balance
6 bytes - ??? (Sometimes had text)
(16*preset A) bytes - overall sound table
(32*preset B) bytes - individual sound table
512 bytes - VAG offsets
varies - VAG files all sequenced together.
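To make the table arithmetic concrete, here is a small Python sketch that works out where each section of a VAB would start, given the two preset counts from the header. It uses the field widths exactly as listed above, which add up to a 36-byte header; those widths come from my notes and are unverified, so the header size is left as a parameter. The function and key names are hypothetical.

```python
# Sum of the header field widths as listed above (unverified):
# 4 + 8 + 4 + 4 + 2 + 2 + 2 + 2 + 1 + 1 + 6 = 36 bytes.
VAB_HEADER_SIZE = 4 + 8 + 4 + 4 + 2 + 2 + 2 + 2 + 1 + 1 + 6

def vab_layout(preset_a: int, preset_b: int,
               header_size: int = VAB_HEADER_SIZE) -> dict:
    """Return the byte offset at which each VAB section begins."""
    overall_table = header_size
    individual_table = overall_table + 16 * preset_a   # 16 bytes per preset A
    vag_offsets = individual_table + 32 * preset_b     # 32 bytes per preset B
    vag_data = vag_offsets + 512                       # fixed 512-byte table
    return {
        "overall_table": overall_table,
        "individual_table": individual_table,
        "vag_offsets": vag_offsets,
        "vag_data": vag_data,
    }
```

If the header turns out to be a different length, passing another header_size shifts every section by the same amount.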
Building on the soundfont, you had a Sequence file, which is like a midi...
4 bytes - "SEQp"
4 bytes - version(?) (shorter than above)
2 bytes - resolution
3 bytes - tempo
2 bytes - rhythm
varies - Actual sequence data.
3 bytes - footer
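A rough Python sketch of pulling a SEQ apart along those lines: a 15-byte header, the sequence data, then the 3-byte footer. Big-endian byte order is an assumption, and so is accepting the tag spelled backwards (in case it is stored as a little-endian word). All names here are made up for illustration.

```python
import struct

SEQ_HEADER_SIZE = 4 + 4 + 2 + 3 + 2   # 15 bytes, per the list above
SEQ_FOOTER_SIZE = 3

def parse_seq(data: bytes) -> dict:
    """Split a SEQ file into header fields, sequence data and footer."""
    if len(data) < SEQ_HEADER_SIZE + SEQ_FOOTER_SIZE:
        raise ValueError("too short to be a SEQ file")
    magic = data[0:4]
    # Assumption: the tag may appear byte-reversed in a raw dump.
    if magic not in (b"SEQp", b"pQES"):
        raise ValueError("missing 'SEQp' tag")
    (resolution,) = struct.unpack_from(">H", data, 8)   # assumed big-endian
    tempo = int.from_bytes(data[10:13], "big")          # 3-byte value
    return {
        "version": data[4:8],
        "resolution": resolution,
        "tempo": tempo,
        "rhythm": data[13:15],
        "sequence": data[SEQ_HEADER_SIZE:-SEQ_FOOTER_SIZE],
        "footer": data[-SEQ_FOOTER_SIZE:],
    }
```

The 3-byte tempo has no matching struct format code, which is why it goes through int.from_bytes instead.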
Then, you can take the SEQ files and put them together into one big file called a SEP:
4 bytes - "SEPp"
2 bytes - version(?)
2 bytes - SEQ number
varies - (stripped SEQ, missing the first 8 bytes)
2 bytes - next SEQ number
varies - (stripped SEQ, missing the first 8 bytes)
2 bytes - next SEQ number
varies - (stripped SEQ, missing the first 8 bytes)
....and on.
That's what I have... (Watch, I probably helped by accident like I did with the movies ^_^)