Author Topic: That #@&!! linux segfault is still there ;_;  (Read 9524 times)

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
That #@&!! linux segfault is still there ;_;
« on: 2006-05-10 02:48:17 »
Updated code and still no joy with linux. It still segfaults on the draw command.

Here's my log info on my version of OpenGL

[03:00:37] SDL init begin
[03:00:37] SDL_INIT_VIDEO initialized
[03:00:37] SDL video mode set 640x480, 16 bit SDL_OPENGLBLIT | SDL_HWSURFACE
[03:00:37] Got 32 bpp (8888), 24 depth, 8 stencil
[03:00:37] SDL init complete
[03:00:37] OGL Vendor: Tungsten Graphics, Inc
[03:00:37] OGL Renderer: Mesa DRI Intel(R) 852GM/855GM 20021115 x86/MMX+/SSE2
[03:00:37] OGL Version: 1.2 Mesa 5.0.2

Mind you I'm using the default DRI mesa, which is not thread safe. Is this an issue? What pointers can you give me to track down why (*i) is going into la-la-land after the splash screen is cleared?

Code: [Select]
void ScreenManager::Draw()
{
    DISPLAY->BeginFrame();
/*
    for (int i = 0; i < mScreens.size(); ++i)
    {
        mScreens[i]->Draw();
    }
*/
    for (std::vector<Screen*>::iterator i = mScreens.begin(); i != mScreens.end(); ++i)
    {
        (*i)->Draw();    //Segfaults here!
    }

    DISPLAY->EndFrame();
}

What is *i pointing to? Where did it get it from? I can't trace back becuase I don't know where in the code *i was created and manipulated. Maybe some comments will help. is *I supposed to be the SDL surface? if you ++i, isn't that just moving up one memory address and not a whole structure, (object).. I don't get this.

Also, could you put the increment operater after the variable if it's not critical to have? That just looks wierd.


+++++++++++++
EDIT
+++++++++++++
On closer look it would seem that std::vector<Screen*> is going to lunch. Is there a way to test to see if this is holding a rational value (or *i for that matter) eariler on in the code?
« Last Edit: 2006-05-10 03:01:54 by halkun »

Akari

  • Moderator
  • *
  • Posts: 766
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #1 on: 2006-05-10 16:51:14 »
Quote
Mind you I'm using the default DRI mesa, which is not thread safe. Is this an issue? What pointers can you give me to track down why (*i) is going into la-la-land after the splash screen is cleared?

I dont think that this is a reason, because we use single thread right now.
Try check if (*i) is NULL. If this is a reason (this is the most possible way) then we must understand where it set to NULL.

How Screen classes work.

First, when we create ScreenManager object constructor add SplashScreen to vector of Screens

    ScreenSplash* screen = new ScreenSplash();
    PushScreen(screen);

This vector is drawing correctly

Next thing happens after we press space. In the input method of SplashScreen we asq screen manager to delete SplashScreen and add new Screen.

            SCREENMAN->PopTopScreen();
            ScreenDisplayTest* screen = new ScreenDisplayTest();
            SCREENMAN->PushScreen(screen);

I dont see any errors here. Pointer to screen just added to vector. It must be creared otherwise it crushed.
PopTopScreen add pointer of SplashScreen to mScreensToDelete vector - vector of screens that should be deleted at the next Update of screenmanager. It also remove pointer from usual screens vector. (Posible error - mScreens.pop_back() doesn't remove pointer)

    if (mScreens.size() > 0)
    {
        mScreensToDelete.push_back(mScreens[mScreens.size() - 1]);
        mScreens.pop_back();
    }

At the Update we jusr delete all pointers and clear the vector of screens that should be deleted.

    for (int i = 0; i < mScreensToDelete.size(); ++i)
    {
        delete mScreensToDelete;
    }
    mScreensToDelete.clear();

Thats all. I see only one posible error here, but still don't know why this happened.

Quote
Also, could you put the increment operater after the variable if it's not critical to have? That just looks wierd.

I can't and there is reason. postfix increment and decrement orerators create temporary object that should be returned, but prefix operators not, so we all must use prefix operators if there is no reason. Look for example this http://www.devx.com/tips/Tip/12634, this is the first article that I found in google.

Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #2 on: 2006-05-11 05:13:18 »
At the Update we jusr delete all pointers and clear the vector of screens that should be deleted.

    for (int i = 0; i < mScreensToDelete.size(); ++i)
    {
        delete mScreensToDelete;
    }
    mScreensToDelete.clear();

Thats all. I see only one posible error here, but still don't know why this happened.
Akari ++i is a possible source of VERY bad things.
What will happen is I will reach mScreensToDelete.size() and I'm POSITIVE that will be a problem :)
Actually this whole bit of code doesn't look right.
I would check the behavior when you delete a screen in mScreensToDelete.
Generally what will happen here (and you should NEVER use a function for the termination this way really), is as it shrinks and I grows you'll end up reaching a point you are deleting things off in Limbo. (another likely source of the error).

A potentially safe way to do this ( it also eliminates an integer variable ;) )
while(mScreensToDelete.size() > 0)
{
  delete mScreensToDelete[0];
}

That should delete the first element and shift all the screens unless mScreensToDelete is a semi static array that does not automatically shift everything down one position.  At the very least change ++i (which is a preincrement and in this case a definate source for errors) to i++

If this doesn't help.. well DANG!  :cry:

Cyb

Akari

  • Moderator
  • *
  • Posts: 766
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #3 on: 2006-05-11 07:50:47 »
Akari ++i is a possible source of VERY bad things.
What will happen is I will reach mScreensToDelete.size() and I'm POSITIVE that will be a problem :)
Actually this whole bit of code doesn't look right.
I would check the behavior when you delete a screen in mScreensToDelete.
Generally what will happen here (and you should NEVER use a function for the termination this way really), is as it shrinks and I grows you'll end up reaching a point you are deleting things off in Limbo. (another likely source of the error).

A potentially safe way to do this ( it also eliminates an integer variable ;) )
while(mScreensToDelete.size() > 0)
{
  delete mScreensToDelete[0];
}

That should delete the first element and shift all the screens unless mScreensToDelete is a semi static array that does not automatically shift everything down one position.  At the very least change ++i (which is a preincrement and in this case a definate source for errors) to i++

If this doesn't help.. well DANG!  :cry:

Cyb

There is three recomended way to handles vector
1) for (int i = 0; i < mScreens.size(); ++i) - array like
2) for (std::vector<Screen*>::iterator i = mScreens.begin(); i != mScreens.end(); ++i) - through iterators (safe way to all standart containers)
3) using algoritms, for example for_each(InputIterator first,InputIterator last, UnaryFunction f);

This won't do anything, just delete first pointer BUT doesn't remove element from std::vector
while(mScreensToDelete.size() > 0)
{
  delete mScreensToDelete[0];
}

Read standart library documentation after all!!!

mScreensToDelete.clear(); - is the fastest way to clear vector.

++i and i++ are identical in this case, only difference is i++ creates temporary object and then deletes it but ++i does not. Why do you need temporary object???


First of all read this http://www.sgi.com/tech/stl/
This is vector docs http://www.sgi.com/tech/stl/Vector.html

Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #4 on: 2006-05-12 05:42:17 »
First you need to be careful about using some STL objects.
I don't believe all STL objects port well to embeded platforms.
Think PDA's and winPC's.

I'm going to be working on a Palm Cobalt OS port (regular palm OS is not going to be worth it due to platform limitations).

I due believe there is a difference between a post and a preincrement operator there :D
a post increment operator used in a for loop increments after the loop is executed.
a pre increment operator increments BEFORE the loop is executed this may be where the error is coming from.
in psuedo code think of it this way

for(i = 0; i < end; i++)
{ D=0;
}

becomes (Psuedo code wise)
DEF I,EBX
 XOR I, I
 XOR BL,BL
LOOP:
 mov [SI + I],BL
 INC I
 CMP I, 124
 JL LOOP


for(i = 0; i < end; ++i)
{ D=0;
}

becomes (Psuedo code wise)
DEF I,EBX
 XOR I, I
 XOR BL,BL
LOOP:
 INC I
 mov [SI + I],BL
 CMP I, 124
 JL LOOP

These are quite different as I will always be 1 greater if a preincrement operator is used then if a post increment operator is used  as it iterates through the array.


Hmmm although probably it won't cause it to have an out of bounds error likely that I was worried about :D

Perhaps using the iterator or the clear() if that allows all objects referenced from the container class to be properly deleted.
The iterator should be safe no matter what. :D

Cyb

dziugo

  • *
  • Posts: 1470
    • View Profile
    • A new copy of FF7 thanks to Salk. Pack (zip/rar/etc) your saved game before sending it to me.
Re: That #@&!! linux segfault is still there ;_;
« Reply #5 on: 2006-05-12 06:02:19 »
Code: [Select]
    int i;
    for(i = 0; i < 10; i++){
        cout << i << endl;
    }
    cout << endl;
    for(i = 0; i < 10; ++i){
        cout << i << endl;
    }
Run it...

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: That #@&!! linux segfault is still there ;_;
« Reply #6 on: 2006-05-12 06:06:48 »
Here's something I got in my email today....

Quote
First off, just wanted to say that I really hope this project gets off the ground. I'm a big fan of tearing into things to check out the inner workings and I can tell you are as well.

On the forum, you posted:

    void ScreenManager::Draw()
    {
        DISPLAY->BeginFrame();
    /*
        for (int i = 0; i < mScreens.size(); ++i)
        {
            mScreens->Draw();
        }
    */
        for (std::vector<Screen*>::iterator i = mScreens.begin(); i != mScreens.end(); ++i)
        {
            (*i)->Draw();    //Segfaults here!
        }

        DISPLAY->EndFrame();
    }


There are a couple things that could cause that line to crash. I don't have the full code in front of me so I'll just list them and leave it to you to ignore anything that isn't applicable.

- if "ScreenManager::Draw()" is a non-virtual function, and "DISPLAY" is not a member variable, the "this" pointer for the function call could be null or otherwise invalid. I'd consider this unlikely because then I would expect it to crash initializing "i" during the "mScreens.begin()" call.

- *i evaluates to null or an otherwise invalid pointer. The iterator itself can be entirely valid but iterating over contents that are invalid for your purposes. How are you adding and removing Screen* objects from the vector? Is it possible that you're deleting or corrupting a Screen object somewhere but never removing it from the vector? Something like the following should definitely crash:

{
  // Allocate a new object.
  Screen *screenObject = new Screen(...);

  // Insert the new object into the list of screens.
  mScreens.push_back( screenObject );

  // Delete the memory that we just allocated. Note that after we do this
  // there is still one element in mScreens, pointing to where there *used*
  // to be a Screen.
  delete screenObject;

  // This will then crash as we call a function on a dead pointer.
  for (std::vector<Screen*>::iterator i = mScreens.begin(); i != mScreens.end(); ++i)
    (*i)->Draw();
}

- Draw() is a virtual function on the Screen class and the virtual table on the object in question is hosed somehow. I'd consider this unlikely unless you're doing strange pointer arithmetic or using C-style memset() in funny ways, but it's possible.

I'd consider the above the most likely possibilities. Any help?

    Also, could you put the increment operater after the variable if it's not critical to have? That just looks wierd.


Yeah. The only difference is that the code internally has to store off a temporary value if you do a post-increment (i++). The difference boils down to something like this, in made-up pseudocode:

function ++i ( int i )
{
  return i + 1;
}

function i++ ( int i )
{
  int old_i = i;
  i = i + 1;
  return old_i;
}

Neither you nor the expressions you're using it in care about the return value and so the compiler is almost definitely smart enough to optimize them out to be equivalent.


I still think the display code is overly complex for what we need it to do. Can someone tell me why a simple opengl double-buffer isn't good enough?

Akari

  • Moderator
  • *
  • Posts: 766
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #7 on: 2006-05-12 16:59:43 »
Halkun, try this.
http://server.titansoft.ru/akari/screenmanager.cpp
and check log file after executing.

Quote
Can someone tell me why a simple opengl double-buffer isn't good enough?

We use simple opengl double-buffer. We just incapsulate it into Display class. If you check functions content - there are usual glDrawArrays in there =)

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: That #@&!! linux segfault is still there ;_;
« Reply #8 on: 2006-05-13 03:36:54 »
from game.log

Quote
[03:36:37] GLU Version: 1.3
[03:36:37] Set SDL input handler.
[03:36:38] Size of mScreens 0

Then a segfault

Does that help?

##############
EDIT
##############

I made a teeny change to the code...
Here's what I added.

Quote
ScreenManager::Draw()
{
    DISPLAY->BeginFrame();
                                                                               
    for (int i = 0; i < mScreens.size(); ++i)
    {
         LOGGER->Log("ERROR: Loop in"); //added this to see if simply refrencing the pointer causes it to nuke
        if (mScreens != NULL)
        {
            LOGGER->Log("ERROR: ready to draw");  //Added this too to, just to see if I enter the loop.
                                                                               
            mScreens->Draw();
        }
        else
        {
            LOGGER->Log("ERROR: Here it is... - NULL pointer.");
        }
    }
                                                                               
    DISPLAY->EndFrame();
}



Quote
[03:40:35] ERROR: ready to draw
[03:40:35] ERROR: Loop in
[03:40:35] ERROR: ready to draw
[03:40:35] ERROR: Loop in
[03:40:35] ERROR: ready to draw
[03:40:35] Size of mScreens 0
[03:40:35] ERROR: Loop in
[03:40:35] ERROR: ready to draw
*SEGFAULT*
« Last Edit: 2006-05-13 03:44:04 by halkun »

Akari

  • Moderator
  • *
  • Posts: 766
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #9 on: 2006-05-13 06:09:32 »
Then we just have easy error in

void
ScreenDisplayTest::Draw()
{
    DISPLAY->SetPointSize(3);
    DISPLAY->DrawPoints(mPoints);
    DISPLAY->SetLineWidth(1);
    DISPLAY->DrawLines(mPoints);

    DISPLAY->PushMatrix();
    DISPLAY->RotateX(mRotation);
    DISPLAY->DrawTotalGeometry(mPyramid);
    DISPLAY->PopMatrix();
    ++mRotation;

    DISPLAY->TextureTranslate(0.01f, 0.0f);
    DISPLAY->SetTexture(mTexId);
    DISPLAY->DrawQuads(mQuadsTex);
    DISPLAY->UnsetTexture();
}

I has the same error but I fix it for me, as I see not compleatly. Try to comment some of rows in this method in DisplayTest.cpp

I has error here, check this first.
    DISPLAY->PushMatrix();
    DISPLAY->RotateX(mRotation);
    DISPLAY->DrawTotalGeometry(mPyramid);
    DISPLAY->PopMatrix();
    ++mRotation;

Just as I thought its not a matter of ScreenManager after all.

Qhimm

  • Founder
  • *
  • Posts: 1996
    • View Profile
    • Qhimm.com
Re: That #@&!! linux segfault is still there ;_;
« Reply #10 on: 2006-05-13 14:42:03 »
Just to make it perfectly clear to anyone else reading this thread, Cyberman's theory about pre- vs. post-increment operators inside for loops is dead wrong, as is his description of how for loops work. Let no one be confused about this, run dziugo's code if you like.

Refresher course: A for loop like
Code: [Select]
for (int i = 0; i < 10; i++) { D[i] = 0; }is equal to
Code: [Select]
   int i = 0;
loop_begin:
   if (!(i < 10))
      goto loop_end;
   D[i] = 0;
   i++;
   goto loop_begin;
loop_end:
This is perfectly defined by the standard, and obviously it does not matter the tiniest bit whether you use i++ or ++i, as they are conceptually executed as a standalone statement, not as part of some "loop expression". The only difference you need to consider is whether i++ is less efficient, as complex types need to create temporary objects when using post-increment.

Side note: Actually, most x86 compilers are likely to compile the for loop as
Code: [Select]
   int i = 0;
   goto loop_predicate;
loop_increment:
   i++;
loop_predicate:
   if (!(i < 10))
      goto loop_end;
   D[i] = 0;
   goto loop_increment;
loop_end:
This rearrangement is, IIRC, faster for an x86 CPU to execute due to some caching/branch prediction heuristics. But the same applies, i++ and ++i are equal.

If you use post-increment in the loop predicate, however, it's of course a different matter. Like this:
Code: [Select]
for (int i = 0; i++ < 10; ) { D[i] = 0; }But you don't, and you suddenly begin to see why there is even a third sub-statement in the for loop declaration in the first place. Both are executed once every iteration, but one is evaluated before the loop contents and the other is executed after the contents, thus you ensure that the loop predicate is valid throughout the entire iteration, and only modify things after all inside code is done. Easy as π.

Cyberman

  • *
  • Posts: 1572
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #11 on: 2006-05-13 19:23:24 »
Good to hear Qhimm  :lol:

Anyhow I chalk it up to staying up to late .. lately :D

At least it's now known. (and it explains the ; partitioning in the for loop construct).

Cyb

andrewdodd13

  • Guest
Re: That #@&!! linux segfault is still there ;_;
« Reply #12 on: 2006-05-16 15:03:32 »
This problem may not be code related at all. I got it yesterday and was kind of perplexed, but then I realised I was running it as root (yes, dangerous, but its in a VM so it doesn't matter much).

Basically, the desktop was logged in as "andrew", but the console doing a ./qgears was root, so I got:
Code: [Select]
Xlib: connection ":0.0" refused by server
Xlib: Invalid MIT-MAGIC-COOKIE-1 key
Segmentation Fault

So, logging out as root and opening the console as "andrew" works fine...

To tie it up, basically I don't think that your X-Server is allowing connection to display anything.

Edit: You could also try sudo ./qgears ...

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: That #@&!! linux segfault is still there ;_;
« Reply #13 on: 2006-05-18 17:42:58 »
Good news...

Well relly not good news, but anyway.

In the latest commits to the source, I'm getting something new. It seems that the code is now throwing exceptions and pumping them to stderr. I have about 15 invalid pointer warnings before the spash screen. Pressing space causes two more and then the segfault.

The bad code is most definatly in the display manager. I'll post the output when I get home.

halkun

  • Global moderator
  • *
  • Posts: 2097
  • NicoNico :)
    • View Profile
    • Q-Gears Homepage
Re: That #@&!! linux segfault is still there ;_;
« Reply #14 on: 2006-05-19 03:18:18 »
Double post, but I have some good news! I found the segfault! I now can see the little triangle thing spin around the line thingys......

Ok, here's the problem. First I had to mod the makefile to include debug information. (-g switch) It's a good idea to probably put that into the windows makefile too.

I grabbed ddd and now have a real debugging platfrom. The execptions that were being thrown were a false alarm. It was my OpenGL testing for sse and throwing a math execption when it failed. (This is caught by mesa, nothing to see here, move along)

Now, I stepped through the display code and now understand it more seeing in action and in slow motion. I also found the null pointer, and when I comment out the offending code, it runs beautifuly. Well, save for the part that's missing.

The bad pointer is located in displaytest.cpp, line 108-125. More percisely, line 112

Quote
//    Surface* texture = SurfaceUtils::LoadFile("data/test_tex.bmp");
    BinGZipFile* file = new BinGZipFile("data/WINDOW.BIN");
    File* data1 = file->ExtractGZip(0);
    delete file;
    TimFile* image = new TimFile(data1);        //<------------Right here! This returns a null pointer.... Reason below!!!
    delete data1;
    Surface* texture = image->GetSurface(7);
                                                                               
    if (texture != NULL)
    {
        mTexId = DISPLAY->CreateTexture(texture);
        delete texture;
    }
    else
    {
        LOGGER->Log("Can't load texture.");
    }

When Timfile is "newed" the following is executed

Quote
TimFile::TimFile(File *file):   //<-----BAD POINTER   *file==NULL
    File(file)     //<---- It's null here too! Guess what happens. Here's a hint, rhymes with 'fegfault'
{
    InnerGetNumberOfClut();
}

Here's my debug output

Quote
(gdb) print *file
Cannot access memory at address 0x0
(gdb)

I don't know how to fix this but commenting out lines 108-125 in displaytest.cpp results in a good compile and execution, but no textures show up (of course)

looking a bit further.... The file method....

Quote
File::File(File *file):
    mBufferSize(file->GetFileSize()),      //<--- where is the sanity check? How do we know *file is good?
    mFileName(file->GetFileName())
{
    mBuffer = (u8 *)malloc(sizeof(u8) * mBufferSize);
    file->GetFileBuffer(mBuffer, 0, mBufferSize);
}


Can someone fix this?  ^_^
« Last Edit: 2006-05-19 03:21:16 by halkun »

Akari

  • Moderator
  • *
  • Posts: 766
    • View Profile
Re: That #@&!! linux segfault is still there ;_;
« Reply #15 on: 2006-05-20 20:29:57 »
I add "assert(file != NULL);".
This will terminate application as soon as it find this =)
I don't know if this is good idea, maybe it's better to throw exeption....  :? But it needs to be handled somewhere later.