14 Aug, 2010, ATT_Turan wrote in the 1st comment:
Votes: 0
So, this is most likely going to end up with me having done something obviously stupid, but as a musician I'm used to publicly humiliating myself - so here goes!

I recently installed Davion's event handler snippet into my MUD:
Davion's event handler

The basic engine seems to work fine, I can use the included testevent command as often as I like and it works perfectly as expected.

The first event of my own I tried to make is one that will cause a player to move through a series of rooms - I figured this would be useful to make things that can no longer fly fall out of the air, have people uppercutting each other into the sky, smashing folks through rooms, good mayhem. The associated code for all that is here:

Moving event code

The problem is rather specific: Anytime I use testknock to move someone out of the room, it's causing some sort of memory error and segfaulting the MUD (the cores show it crashing in alloc_mem). I can use testknock fine, just so long as the character does not leave the room - that is, if there's no door to the west, I can knock them west and they will correctly slam into the wall and life goes on.

If, however, I knock them east, the command and events will properly execute - they soar off to the east before either hitting a wall or landing, as they should. At some point (could be in the middle of that, could be a few moments later), the MUD will crash. I'm really not sure what I might be doing wrong…the string manipulation inside the movement event is fairly simple, so I don't think I'm borking that, but something is bad in there and I don't see it. Any opinions are gratefully accepted.

P.S. I know there's a way to make the links for the code repository and pastebin have nice description windows, but I don't see a tag for that…
14 Aug, 2010, JohnnyStarr wrote in the 2nd comment:
Votes: 0
Have you run gdb and set a breakpoint on the line that calls alloc_mem? It could be that an argument is passing a bad pointer or something of that sort.
14 Aug, 2010, ATT_Turan wrote in the 3rd comment:
Votes: 0
It's not consistently from any one point in the code, it seems to be the next time an object/mob/some such thing is created after this code is executed, indicating a more general memory corruption of some sort. I have tried running the MUD through gdb but was unenlightened…I will tool around with that more this evening.
14 Aug, 2010, David Haley wrote in the 4th comment:
Votes: 0
valgrind is much better than gdb for diagnosing memory problems, although it's sometimes a little hard to parse its output.
14 Aug, 2010, ATT_Turan wrote in the 5th comment:
Votes: 0
I've read this, but unfortunately the server I use appears to not have it installed (or not given me permissions or something). I'll hold out some small hope that something will pop visually to someone on here, but I'll also e-mail them about valgrind and find a tutorial someplace.
15 Aug, 2010, David Haley wrote in the 6th comment:
Votes: 0
Hmm, that's too bad. A very quick cursory reading didn't reveal any obvious issues. What server do you use? As far as I know valgrind is fairly standard, so if you could convince the admin to install it that would be ideal. Or, perhaps you could convince Zeno (or another free host) to give you an account temporarily so that you can run valgrind from there.
15 Aug, 2010, Oliver wrote in the 7th comment:
Votes: 0
Haha.

I had this problem with Davion's Event Handler just yesterday, actually, and I solved the problem.

It isn't actually a problem with Davion's code itself, though. I don't have the time to look over your code, unfortunately; if you can't solve it in a while, I'll try to pick through it to see if the same problem is being created somewhere in yours that was in mine.

In the meantime, I'll explain why it was happening on my game.

I created an event called EVENT_PURGE. It purges/extracts the object or character. I was using it to put in a little bit of code involving players being able to burn things in campfires; I had it set so that in three minutes, the object would echo that it had burned completely and then extract itself.

When the code ran, it extracted the object but I didn't see an echo. The next time I tried to initiate an event, the game segfaulted showing an error in alloc_mem. After a little digging in GDB, though, I found that the problem was actually this:

The event handler was purging the object before the echo. It ended up meaning that all sorts of things were pointing to all sorts of null data and by the next time the event_free() list was called, Mister Code wasn't happy.

I'd take a look at how you're doing things and then make sure that you're not purging anything that shouldn't be purged. Or maybe it has something to do with the fact that you're moving people out of rooms (and possibly something is pointing to a null room later).
15 Aug, 2010, Davion wrote in the 8th comment:
Votes: 0
It kinda looks like that no matter what happens, even if they hit a wall, the event to move them is still applied. Maybe after they hit the wall, you should return?

I see that sneaky return :P. Is the event getting removed correctly? Next time it crashes, try to grab the character it's attached to and examine the event list.
15 Aug, 2010, ATT_Turan wrote in the 9th comment:
Votes: 0
I'm not removing any characters or changing any room pointers, so there shouldn't be any possibility of pointing to invalid memory that way. I see no reason for the free_event function to not work as properly as it does with the testevent command, but I'll put some logging in there to make sure it's getting called. I'm also going to try commenting out all of the code that creates any strings to see if I can narrow the problem down at all.
15 Aug, 2010, Davion wrote in the 10th comment:
Votes: 0
Wanna show your modifications to execute_char_event?
15 Aug, 2010, ATT_Turan wrote in the 11th comment:
Votes: 0
Added to the pastefile, and here:

void execute_char_event(CHAR_DATA *ch, EVENT_DATA *event)
{
    switch (event->event)
    {
        case EVENT_PRINT:
            send_to_char(event->string, ch);
            break;
        case EVENT_MOVE:
            event_move(ch, NULL, event->string);
            break;
        default:
            LOG("Execute_event: Bad event type");
            break;
    }
    free_event(ch, NULL, event);
    return;
}
15 Aug, 2010, ATT_Turan wrote in the 12th comment:
Votes: 0
So the folks installed valgrind for me, and I see what looks like it could be a bad pointer in there, but I don't see where it tells me where it was declared in the code. Are there other flags I should use, or commands within the software I can use? Or is there an awesome valgrind tutorial anyone recommends? (Googling for such comes up with things that seem either very basic, which allowed me to run my game with it, or fairly complex.)

valgrind --tool=memcheck ../src/merc
==19521== Memcheck, a memory error detector
==19521== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==19521== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==19521== Command: ../src/merc
==19521==
Sun Aug 15 11:53:05 2010 :: IMC: Loading all the IMC stuff we don't care about
==19521== Syscall param socketcall.setsockopt(optval) points to uninitialised byte(s)
==19521== at 0x40008D2: ??? (in /lib/ld-2.10.1.so)
==19521== by 0x809D57C: main (comm.c:461)
==19521== Address 0xbedc821c is on thread 1's stack
==19521==
Sun Aug 15 11:53:04 2010 :: MOBProgs/407.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/600.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/601.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/602.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/604.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/605.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/606.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: MOBProgs/607.prg : db.c, 3266
Sun Aug 15 11:53:04 2010 :: God Wars is ready to rock on port 3040. : comm.c, 466
Sun Aug 15 11:53:07 2010 :: IMC: Standard Authentication completed.
Sun Aug 15 11:53:14 2010 :: Sock.sinaddr: blah : comm.c, 1088
Sun Aug 15 11:53:16 2010 :: Turan trying to connect. : comm.c, 1763
Sun Aug 15 11:53:18 2010 :: Turan@blah has connected. : comm.c, 1899
Sun Aug 15 11:53:25 2010 :: Log Turan: mload 608 : interp.c, 3464
Sun Aug 15 11:53:52 2010 :: Log Turan: resetarea : interp.c, 3464
Sun Aug 15 11:54:27 2010 :: Log Turan: force imp testknock turan east : interp.c, 3464
==19521== Invalid read of size 4
==19521== at 0x809EBC6: alloc_mem (db.c:2664)
==19521== by 0x8070D35: show_list_to_char (act_info.c:277)
==19521== by 0x8071BCB: do_look (act_info.c:1053)
==19521== by 0x80BE531: interpret (interp.c:3547)
==19521== by 0x809D01B: game_loop_unix (comm.c:899)
==19521== by 0x809D5D4: main (comm.c:467)
==19521== Address 0x61727554 is not stack'd, malloc'd or (recently) free'd
==19521==
==19521==
==19521== Process terminating with default action of signal 11 (SIGSEGV)
==19521== Access not within mapped region at address 0x61727554
==19521== at 0x809EBC6: alloc_mem (db.c:2664)
==19521== by 0x8070D35: show_list_to_char (act_info.c:277)
==19521== by 0x8071BCB: do_look (act_info.c:1053)
==19521== by 0x80BE531: interpret (interp.c:3547)
==19521== by 0x809D01B: game_loop_unix (comm.c:899)
==19521== by 0x809D5D4: main (comm.c:467)
==19521== If you believe this happened as a result of a stack
==19521== overflow in your program's main thread (unlikely but
==19521== possible), you can try to increase the size of the
==19521== main thread stack using the --main-stacksize= flag.
==19521== The main thread stack size used in this run was 8388608.
==19521==
==19521== HEAP SUMMARY:
==19521== in use at exit: 2,713,689 bytes in 1,899 blocks
==19521== total heap usage: 2,997 allocs, 1,098 frees, 4,423,117 bytes allocated
==19521==
==19521== LEAK SUMMARY:
==19521== definitely lost: 4 bytes in 4 blocks
==19521== indirectly lost: 0 bytes in 0 blocks
==19521== possibly lost: 131,072 bytes in 1 blocks
==19521== still reachable: 2,582,613 bytes in 1,894 blocks
==19521== suppressed: 0 bytes in 0 blocks
==19521== Rerun with --leak-check=full to see details of leaked memory
==19521==
==19521== For counts of detected and suppressed errors, rerun with: -v
==19521== Use --track-origins=yes to see where uninitialised values come from
==19521== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 20)
Segmentation fault
15 Aug, 2010, David Haley wrote in the 13th comment:
Votes: 0
I wonder if this might actually be a stack overflow like valgrind thinks… it is unlikely, but if the stack was only 8MB big, the 5 MSL strings might be enough, especially since you're calling this function recursively.

Try setting your strings to be of size 512 and see what happens.
15 Aug, 2010, ATT_Turan wrote in the 14th comment:
Votes: 0
I reduced the size of all the strings to 200 or less, got a fairly identical report from valgrind. Will running it with any of those suggested flags help? I'll try the leak-check thing to see what it says… *grumble*

Also, it's not quite recursively - the function can re-apply another event of the same type to the character/object, but the function itself will never be running twice at the same time.
16 Aug, 2010, David Haley wrote in the 15th comment:
Votes: 0
Well, it's also telling you exactly which piece of memory it's unable to read, and that would give a hint as to what you need to look at. What is the code like around the read failure?
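One way to read a hint out of the failing address, offered as a sketch rather than a diagnosis: on a little-endian machine (an assumption here, though the 32-bit addresses in the log suggest x86), an "address" that is really overwritten string data will spell something when its bytes are printed lowest byte first. The helper name addr_as_text is made up for illustration.

```c
#include <ctype.h>
#include <stdint.h>

/* Write the four bytes of addr into out as a string, lowest byte first,
 * which is the order they occupy in memory on a little-endian machine.
 * Bytes that are not printable ASCII are shown as '.'. */
static void addr_as_text(uint32_t addr, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((addr >> (8 * i)) & 0xFF);
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[4] = '\0';
}
```

Feeding it the address from the report, 0x61727554, yields the bytes 0x54 0x75 0x72 0x61, which read as "Tura". That looks a lot like the start of the character name in the log, so it may be that a string was written into a block of memory that had already been handed back to the allocator.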
16 Aug, 2010, ATT_Turan wrote in the 16th comment:
Votes: 0
You mean around the line from alloc_mem? I've never touched that, so I presume it exists much as it does in stock Rom/Diku/whatever.
    if ( rgFreeList[iList] == NULL )
    {
        pMem = alloc_perm( rgSizeList[iList] );
    }
    else
    {
        pMem = rgFreeList[iList];
        rgFreeList[iList] = * ((void **) rgFreeList[iList]); // line 2664
    }


How does the memory address it's unable to read help me? Is there some command I can use to see what variable used to inhabit that memory? *scurries back to valgrind tutorials*
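For anyone following along, here is a toy version of the free-list trick that the alloc_mem snippet above relies on. It is only a sketch: the names toy_free and toy_alloc are invented, there is a single list instead of one per size class, and an empty list simply returns NULL rather than calling something like alloc_perm. The point it illustrates is that a recycled block stores the "next" pointer in its own first bytes, so anything that writes into a block after it was freed silently rewrites the list.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64

/* Head of the list of recycled blocks (hypothetical, single size class). */
static void *free_list = NULL;

/* Return a block to the list: its first bytes now hold the old head. */
static void toy_free(void *block)
{
    memcpy(block, &free_list, sizeof free_list);
    free_list = block;
}

/* Take a block off the list: the new head is read out of the block itself,
 * so whatever bytes are sitting there are trusted to be a valid pointer. */
static void *toy_alloc(void)
{
    void *block = free_list;
    if (block != NULL)
        memcpy(&free_list, block, sizeof free_list);
    return block;
}
```

If some code keeps a stale pointer to a freed block and copies a string into it, the next toy_alloc succeeds but leaves free_list holding the first few characters of that string. The allocation after that is the one that dereferences garbage, which would fit a crash that shows up "the next time something is created" rather than at the line that did the damage.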
16 Aug, 2010, David Haley wrote in the 17th comment:
Votes: 0
Well, show_list_to_char would be more useful; we can treat the memory module as a "system library" and hope that it actually works. My guess is that something has been incorrectly freed, or freed twice somehow. That is why the memory address it's unable to read is helpful: it will tell you which part of memory got corrupted.

Normally valgrind can tell you that you're trying to access a pointer that was freed somewhere else. In this case, since it looks like something is managing memory for you (what an unfortunate thing to do – I wish codebases stopped doing that :sad:) valgrind is probably not able to tell where the memory was freed.
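A common workaround for that, sketched here under assumptions: while debugging, route the codebase's allocation calls straight to the C library so that valgrind sees every allocation and free. The names dbg_alloc_mem and dbg_free_mem are hypothetical, and the real alloc_mem and its freeing counterpart in a given codebase may take different arguments.

```c
#include <stdlib.h>

/* Debug-only stand-in for a pooled allocator: every request becomes a real
 * heap allocation, zero-filled, that valgrind can track individually. */
static void *dbg_alloc_mem(size_t size)
{
    return calloc(1, size);
}

/* Debug-only stand-in for the matching release call. The size argument is
 * kept only so call sites that pass one do not have to change. */
static void dbg_free_mem(void *mem, size_t size)
{
    (void)size;
    free(mem);
}
```

With something like this swapped in, a write into freed memory or a double free should be reported at the offending line, along with where the block was freed, instead of surfacing later inside the pool code.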
16 Aug, 2010, ATT_Turan wrote in the 18th comment:
Votes: 0
Well, I have isolated the source of the problem, although I am still working on understanding the cause.

As an experiment, I separated the movement function entirely from the event handler - I had do_testknock call event_move directly, and I changed event_move to call itself recursively for movement through multiple rooms. I smacked my mob back and forth across the MUD for five minutes, no crash.

I then replaced the event handler only inside of do_testknock - so the original command uses an event to start the character moving, then event_move still called itself recursively. This also works with no crash, I can bash people around all I want.

This means the problem was specifically with me creating new events on the character inside of event_move. I do not know why this should be the case, but I'll spend some time tomorrow…later today…whenever, looking carefully through the functions that came with the event handler to create and free the event structures and see if I can find some sort of logical error in there.
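As a point of comparison while reading through those functions, here is one generic way to let a handler queue a follow-up event safely. This is not Davion's code; every name in it (EVENT, ACTOR, add_event, on_fire, update_events) is invented for the sketch. The idea is to unlink each due event from the owner's list before its handler runs, so the handler can add new events without disturbing the one being executed, and to free the event exactly once afterwards.

```c
#include <stdlib.h>

typedef struct event EVENT;
struct event {
    EVENT *next;
    int    delay;   /* ticks until the event fires */
    int    hops;    /* hypothetical payload: rooms left to travel */
};

typedef struct { EVENT *events; } ACTOR;

/* Queue a new event at the head of the actor's list. */
static void add_event(ACTOR *a, int delay, int hops)
{
    EVENT *e = malloc(sizeof *e);
    if (e == NULL)
        return;
    e->delay = delay;
    e->hops = hops;
    e->next = a->events;
    a->events = e;
}

/* Handler that re-queues a follow-up event, one hop shorter. */
static void on_fire(ACTOR *a, const EVENT *e)
{
    if (e->hops > 1)
        add_event(a, 1, e->hops - 1);
}

/* Run one tick. Returns how many events fired. */
static int update_events(ACTOR *a)
{
    int fired = 0;
    EVENT *due = NULL;
    EVENT **link = &a->events;

    /* Pass 1: move every due event off the actor's list. */
    while (*link != NULL) {
        EVENT *e = *link;
        if (--e->delay <= 0) {
            *link = e->next;
            e->next = due;
            due = e;
        } else {
            link = &e->next;
        }
    }

    /* Pass 2: run handlers. They may call add_event freely, because the
     * events being executed are no longer on the actor's list. */
    while (due != NULL) {
        EVENT *e = due;
        due = e->next;
        on_fire(a, e);
        free(e);
        fired++;
    }
    return fired;
}
```

An event queued with three hops fires once per tick for three ticks and then the list is empty. Whether the snippet's own loop and free_event follow a pattern like this is exactly the thing worth checking.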


David Haley said:
My guess is that something has been incorrectly freed, or freed twice somehow. That is why the memory address it's unable to read is helpful: it will tell you which part of memory got corrupted.


If you don't mind me asking: how? Should I use gdb inside of valgrind and step through everything, trying to see when that particular memory address is first referenced? Or is there some way to tell valgrind to scan back for other references to it within the program?
03 Sep, 2010, David Haley wrote in the 19th comment:
Votes: 0
Did you ever get this resolved?

Sorry that I missed the question:
Quote
If you don't mind me asking: how? Should I use gdb inside of valgrind and step through everything, trying to see when that particular memory address is first referenced? Or is there some way to tell valgrind to scan back for other references to it within the program?

Valgrind doesn't "step" like gdb does; you just run it and it reports stuff. Usually when it sees that you're freeing something twice, it can give you even the exact line where that piece of memory was freed.
03 Sep, 2010, Rudha wrote in the 20th comment:
Votes: 0
Try putting a breakpoint in the affected code and stepping through it with gdb; this will at least give you a better understanding of the program flow and where the error may be occurring within the code that you've isolated.

Maya/Rudha