Yeah, I agree, having some PSP hardware/GPU documentation would make this a lot easier. Unfortunately, Googling is not turning up many resources, though I may not be searching for the right things. This thread keeps coming back near the top of my search results.
I know that homebrew libraries have been able to use the 3D features for some time. Unfortunately, as far as I know those libraries work by loading official Sony kernel modules, so just looking at the homebrew devkit probably will not reveal low-level register info or GPU formats. I guess I'll keep Googling. If anyone else has helpful links in this area, please feel free to chime in.
I'll report back if I find any applicable documentation.
Edit: I think I have just found almost everything we need to know in the source code for "Potemkin", a PSP emulator. It has vertex decoding/pipeline modules and lists of all the vertex type values and flags, and it even shows how to handle the vertex weights within the vertex stream!
The PSP appears to support up to 8 bones per command list, so meshes/command lists will no doubt be broken up for characters that use more than 8 bones. Now we just have to figure out the bone data. Each bone should be a 3x4 (or 4x3) matrix. From the emulator source, I believe it has to be floating point, but I may be missing a part where it does conversion.
Edit 2: I've verified that these vertex blocks do match the PSP GPU formats and are decoded the same way the hardware decodes them. For "type 1" vertex blocks, there is a DWORD with the value "0x00000723", usually 1 DWORD before the actual number of vertices. This value breaks down as:
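int tc = 3;         // texture coords: 2 floats
int col = 0;        // no color
int nrm = 1;        // normal: x, y, z as bytes
int pos = 2;        // position: 3 shorts
int weighttype = 3; // weights: float
int idx = 0;        // no index buffer
int nweights = 0;   // as far as I can tell this field stores (count - 1), so 0 means one weight
int morphcount = 0; // no morph targets
DWORD fmt = 0;
fmt |= tc;                // bits 0-1
fmt |= (col << 2);        // bits 2-4
fmt |= (nrm << 5);        // bits 5-6
fmt |= (pos << 7);        // bits 7-8
fmt |= (weighttype << 9); // bits 9-10
fmt |= (idx << 11);       // bits 11 and up
fmt |= (nweights << 14);  // bits 14 and up
fmt |= (morphcount << 18);
// fmt == 0x00000723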
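Going the other way (DWORD back to fields) is what a parser actually needs, so here is a minimal sketch of that. The bit offsets come straight from the packing code above; the field widths (2 bits for idx, 3 bits each for nweights and morphcount) are my assumption, and the names VertexType, decode_vtype and encode_vtype are made up for illustration, not taken from any SDK or emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of a vertex type DWORD.
   Bit offsets follow the packing code in this post; widths are assumed. */
typedef struct {
    unsigned tc;         /* bits 0-1:   texture coordinate format */
    unsigned col;        /* bits 2-4:   color format */
    unsigned nrm;        /* bits 5-6:   normal format */
    unsigned pos;        /* bits 7-8:   position format */
    unsigned weighttype; /* bits 9-10:  weight format */
    unsigned idx;        /* bits 11-12: index format (width assumed) */
    unsigned nweights;   /* bits 14-16: raw field value (width assumed) */
    unsigned morphcount; /* bits 18-20: raw field value (width assumed) */
} VertexType;

/* Split a vertex type DWORD into its fields. */
static VertexType decode_vtype(uint32_t fmt)
{
    VertexType v;
    v.tc         = fmt & 3u;
    v.col        = (fmt >> 2) & 7u;
    v.nrm        = (fmt >> 5) & 3u;
    v.pos        = (fmt >> 7) & 3u;
    v.weighttype = (fmt >> 9) & 3u;
    v.idx        = (fmt >> 11) & 3u;
    v.nweights   = (fmt >> 14) & 7u;
    v.morphcount = (fmt >> 18) & 7u;
    return v;
}

/* Pack the fields back into a DWORD (inverse of decode_vtype). */
static uint32_t encode_vtype(VertexType v)
{
    /* Each field must fit in the width assumed above. */
    assert(v.tc <= 3u && v.col <= 7u && v.nrm <= 3u && v.pos <= 3u);
    assert(v.weighttype <= 3u && v.idx <= 3u);
    assert(v.nweights <= 7u && v.morphcount <= 7u);
    return (uint32_t)v.tc
         | ((uint32_t)v.col << 2)
         | ((uint32_t)v.nrm << 5)
         | ((uint32_t)v.pos << 7)
         | ((uint32_t)v.weighttype << 9)
         | ((uint32_t)v.idx << 11)
         | ((uint32_t)v.nweights << 14)
         | ((uint32_t)v.morphcount << 18);
}
```

Calling decode_vtype(0x00000723) should give back tc = 3, nrm = 1, pos = 2, weighttype = 3 and zero for everything else, which matches the breakdown above.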
When you compute the sizes and offsets of the components from that vertex type DWORD, padding and aligning along the way according to the hardware spec, you get 24-byte vertex blocks with correct offsets to UV, position, and normal. With this knowledge, no matter what "type" a vertex block is, we know how it should be indexed (if at all) and where to find the essentials for rendering, such as position and UV. The only thing left is to figure out a reliable way to pull the GPU control codes out from between vertex batches. From what I can find, there is no fixed offset to the next control code/vertex count, and I'm not sure yet what the other bytes between chunks mean.
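For anyone who wants to reproduce the 24-byte figure, here is a rough sketch of the size/offset calculation. It rests on several assumptions of mine rather than on official docs: components are stored in the order weights, UV, color, normal, position; format values 1/2/3 mean 1/2/4-byte elements; each component is aligned to its own element size; the stride is rounded up to the largest element size; and the weight count field stores (count - 1). Morph targets are ignored. VertexLayout, vertex_layout and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPONENT_ABSENT ((size_t)-1) /* offset value for a missing component */

typedef struct {
    size_t weight_off, tc_off, col_off, nrm_off, pos_off;
    size_t stride; /* bytes per vertex */
} VertexLayout;

/* Round off up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Element size for weight/UV/normal/position formats (assumed: 0 = absent). */
static size_t elem_size(unsigned f)
{
    static const size_t sizes[4] = { 0, 1, 2, 4 };
    return sizes[f & 3u];
}

/* Color size (assumed: formats 4-6 are 16-bit, 7 is 32-bit, others absent). */
static size_t color_size(unsigned f)
{
    static const size_t sizes[8] = { 0, 0, 0, 0, 2, 2, 2, 4 };
    return sizes[f & 7u];
}

/* Place one component at the current offset, aligned to its element size. */
static size_t place(size_t *off, size_t *max_align, size_t esize, size_t count)
{
    size_t at;
    if (esize == 0)
        return COMPONENT_ABSENT;
    *off = align_up(*off, esize);
    at = *off;
    *off += esize * count;
    if (esize > *max_align)
        *max_align = esize;
    return at;
}

/* Compute component offsets and stride for one vertex type DWORD. */
static VertexLayout vertex_layout(uint32_t fmt)
{
    unsigned tc  = fmt & 3u;
    unsigned col = (fmt >> 2) & 7u;
    unsigned nrm = (fmt >> 5) & 3u;
    unsigned pos = (fmt >> 7) & 3u;
    unsigned wt  = (fmt >> 9) & 3u;
    size_t nweights = ((fmt >> 14) & 7u) + 1u; /* assumed: field is count - 1 */
    size_t off = 0, max_align = 1;
    VertexLayout l;

    l.weight_off = place(&off, &max_align, elem_size(wt), nweights);
    l.tc_off     = place(&off, &max_align, elem_size(tc), 2);
    l.col_off    = place(&off, &max_align, color_size(col), 1);
    l.nrm_off    = place(&off, &max_align, elem_size(nrm), 3);
    l.pos_off    = place(&off, &max_align, elem_size(pos), 3);
    l.stride     = align_up(off, max_align);
    return l;
}
```

Under those assumptions, vertex_layout(0x00000723) gives weight at offset 0, UV at 4, normal at 12, position at 16 (padded up from 15), and a stride of 24, which agrees with the block size seen in the files. If the real hardware rules differ for other vertex types, the tables and alignment step are the places to adjust.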