This is only a guess, but I suspect the GPU driver is uploading an entire texture for each CLUT. It can't rely on native hardware support for paletted textures, so it presumably has to expand the indexed data into full color, and it doesn't know which pixel region is correct for which CLUT. That mapping is determined by the mesh (which UV segment is drawn with which CLUT index), and it's not something the driver can really work out by itself.
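To make the guess concrete, here is a minimal Python sketch of what I imagine is happening. The 2x2 index data, the CLUT names and the `expand` helper are all made up for illustration; I'm not claiming this is how any particular driver or emulator actually does it.

```python
# Hypothetical 2x2 indexed texture: each texel is a palette index, not a color.
indexed = [
    [0, 1],
    [1, 0],
]

# Two hypothetical CLUTs (palettes) that are used with the same index data.
cluts = {
    "clut_a": [(255, 0, 0), (0, 255, 0)],
    "clut_b": [(0, 0, 255), (255, 255, 0)],
}


def expand(indexed, clut):
    """Resolve every palette index to an RGB color using one CLUT."""
    return [[clut[i] for i in row] for row in indexed]


# Without knowing which region belongs to which CLUT, the only safe option
# is one full-size RGB texture per CLUT.
uploads = {name: expand(indexed, clut) for name, clut in cluts.items()}
```

Each entry in `uploads` is a complete copy of the texture, but only the regions that the mesh actually draws with that CLUT would have the right colors.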
If I'm right, then to end up with your final desired texture map you would have to find the texture maps for all of the CLUTs, figure out which one contains the correct colors for which region, and combine them by hand into one. That task would be much easier if you also extract the corresponding model, since the ripper program should associate each draw call (mesh chunk) with the appropriate in-memory texture.
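If you do get the model out, the combining step could be roughly automated along these lines. This is only a sketch under my own assumptions: the `combine` function and the draw call format are hypothetical, and I'm treating each draw call's UV coverage as a simple texel rectangle, whereas a real mesh would give you triangles in UV space that you'd have to rasterize.

```python
def combine(variants, draw_calls, width, height, fill=(0, 0, 0)):
    """Build one texture by copying, for each draw call, the region it
    covers from the texture variant that was expanded with its CLUT.

    variants:   dict of CLUT name -> 2D list of RGB tuples (full-size texture)
    draw_calls: list of (clut_name, (x0, y0, x1, y1)), x1/y1 exclusive
    """
    out = [[fill for _ in range(width)] for _ in range(height)]
    for clut_name, (x0, y0, x1, y1) in draw_calls:
        src = variants[clut_name]
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = src[y][x]
    return out


# Hypothetical data: two ripped 2x2 variants of the same texture.
RED, BLUE = (255, 0, 0), (0, 0, 255)
variants = {
    "clut_a": [[RED, RED], [RED, RED]],
    "clut_b": [[BLUE, BLUE], [BLUE, BLUE]],
}

# Left column is drawn with clut_a, right column with clut_b.
draw_calls = [
    ("clut_a", (0, 0, 1, 2)),
    ("clut_b", (1, 0, 2, 2)),
]

result = combine(variants, draw_calls, width=2, height=2)
```

Texels that no draw call touches keep the `fill` color, which also makes it easy to spot regions that the model never uses.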