Hahaha.
Nice idea.
I'd also be happy to volunteer as a voice, given a timetable to work with. My voice isn't a deep one either and I'm by no means a professional, but I'd gladly do NPCs and such, or whatever you guys see fit.
Warning though, I speak English perfectly but I do have an accent as I'm not a native English speaker.
But before getting into all of this, I think we should ask ourselves this question: would an amateur speech patch (even if some professionals are on board, not everyone would be) actually improve the game or make it worse? As for myself, wrong deliveries, wrong emotions in a given scene and wrong casting totally take me out of a game. The first Grandia and Tales of Eternia are good examples of this. I wonder whether those two games would have been better off without voice-overs.
As for the technical side of it, MP3s would be a logical choice to keep the size down, and ff7music could most likely handle the playback if Ficedula agrees to it. From what I understand, it can already intercept the calls the game makes and start its own sound engine on cue, so I think that would be the easiest way to make this work. It would obviously need some modifying though.
Going out on a limb here as I'm no programmer, but I would think the game engine has to make a call referencing a certain part of the game script in order to pull the right text (in scene.bin if I'm not mistaken). If so, those calls could likely be intercepted to start the right sound bite through whatever program is used for playback, without modifying ff7.exe at all (à la ff7music). The trick is going to be finding out what calls the program actually makes for text and tying them to actual events (in other words, finding out how the game identifies an event to show the right text, so the speech player can intercept that and start the proper MP3). A mapping of the text might already exist, since I know people have been playing with the dialog before for various patches. There might also be info somewhere on how the game fetches text.
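To make the lookup idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the (field name, dialog index) key, the file names, and the `on_dialog_shown` hook are hypothetical, and nothing here reflects how ff7.exe or ff7music actually identify text.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Hypothetical key: (field name, dialog index) -> voice clip file.
# How the game really identifies a piece of text is exactly what
# would have to be discovered first.
DialogKey = Tuple[str, int]

VOICE_MAP: Dict[DialogKey, str] = {
    ("field_a", 0): "voice/field_a_000.mp3",
    ("field_a", 1): "voice/field_a_001.mp3",
}


def clip_for_dialog(field: str, dialog_index: int) -> Optional[Path]:
    """Return the clip for a dialog box, or None if no clip is mapped."""
    name = VOICE_MAP.get((field, dialog_index))
    return Path(name) if name is not None else None


def on_dialog_shown(field: str, dialog_index: int,
                    play: Callable[[Path], None]) -> Optional[Path]:
    """Hypothetical hook the interception layer would call.

    `play` is any callable that takes a file path, so the playback
    program stays separate from the lookup.
    """
    clip = clip_for_dialog(field, dialog_index)
    if clip is not None:
        play(clip)
    return clip
```

Unvoiced lines simply return None and play nothing, so the patch could be built up a few scenes at a time.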
As such, ff7music could be a good base, as it already starts ff7 in a way that lets it intercept its calls, already has a list system to associate sound files with events in the game, and already has a sound output system. Obviously, this is Ficedula's program, so he would have to agree (and most likely lend a hand to the programmers in understanding his code), so it might not be possible to use it. But that would be the #1 solution in my book (or a similar program if ff7music isn't an option).
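A list system like that could be as simple as a text file read at startup. The sketch below, in Python, assumes a made-up format of one `field:index = file` entry per line with `#` comments. This is not ff7music's real list format, which I have not checked.

```python
from typing import Dict, Tuple


def parse_voice_list(text: str) -> Dict[Tuple[str, int], str]:
    """Parse a hypothetical voice list into {(field, index): filename}.

    Assumed line format (an invention for this sketch):
        field_a:0 = voice/field_a_000.mp3   # optional comment
    Blank lines and comment-only lines are skipped.
    """
    mapping: Dict[Tuple[str, int], str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        key, _, filename = line.partition("=")
        field, _, index = key.strip().partition(":")
        filename = filename.strip()
        if not field.strip() or not filename:
            raise ValueError(f"malformed entry: {raw!r}")
        # int() raises ValueError if the index is missing or not a number
        mapping[(field.strip(), int(index))] = filename
    return mapping
```

Keeping the mapping in a plain file would let voice contributors add clips without touching any code.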
Another issue is that the text is hardcoded into the FMVs, so for these it would actually be necessary to code the calls for the speech into the game. That's the only part I see that actually requires modifying the game. Unless, again, the calls the game makes to start an FMV can be intercepted to determine which FMV is starting, and a timer used to play the required speech file. That implementation could easily be off-cue though, so it might not be the best way to go about it.
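The timer approach could be sketched like this in Python, assuming each FMV gets a hand-made list of (seconds from start, clip) cues. The class name and cue values are hypothetical. The elapsed time would come from whatever clock starts when the FMV call is intercepted, which is where the off-cue risk comes in: if the movie starts late or drops frames, the clock and the picture drift apart.

```python
import bisect
from typing import List, Tuple


class FmvCueTrack:
    """Timed voice cues for one FMV (hypothetical structure)."""

    def __init__(self, cues: List[Tuple[float, str]]):
        # Sort by offset so due cues can be found with a binary search.
        self.cues = sorted(cues)
        self.offsets = [offset for offset, _ in self.cues]
        self.next_index = 0  # first cue that has not been returned yet

    def due(self, elapsed: float) -> List[str]:
        """Return clips whose offset has passed since the last call."""
        end = bisect.bisect_right(self.offsets, elapsed)
        clips = [clip for _, clip in self.cues[self.next_index:end]]
        self.next_index = max(self.next_index, end)
        return clips
```

A playback loop would call `due()` every frame or so with the seconds elapsed since the FMV started and play whatever comes back. Each cue is returned only once.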
Overall, I'm thinking the big parts could just be added as a layer on top, so it should not be impossible. Obviously, I'm no programmer and I could be totally out in left field on this, as these are just educated guesses from what I've read over the years, so if someone more informed reads this, feel free to correct me.