Short: Crash in walk_mapping()
Date: Tue, 22 Aug 2000 11:47:19 -0300
From: Ron Dawson <rondawson@syd.eastlink.ca>
Type: Bug
State: Done - fixed in 3.2.9-dev.353
See also: b-000218, b-000207
Driver: 3.2.8


I've been working with the new 3.2.8 release and I've run into some
crashes that I thought I'd pass back to you:

(1)  I was fixing the simul_efun.c file (adding varargs to several
functions).   If I afterwards updated simul_efun.c the driver would
crash.  Normally it would be right away, but a few times the crash
could be delayed up to 10 minutes.

(2) Once I got things mostly straightened out, I left the test copy of
the mud running overnight.   Overnight the driver had crashed again.
The last part of the log was:

2000.08.22 07:33:03 Ref count in freed hash mapping: 1
No program to trace.
Exit status:  139
read: EOF
Read 0, should be 9!

Shortly before the above segment were some log messages indicating that
one of our virtual objects was having troubles being cleaned up.
Unfortunately there is no time on the entry so I'm not sure how close it
was time wise to the driver crash.

Now I hadn't remembered to install the utilities (i.e. make
install-utils) last night (the older 3.2.7 release utilities would have
been installed).  I've since corrected that and am trying to reproduce
the problem.

- Ron  (Caper @ PixieMud)


Date: Tue, 22 Aug 2000 23:20:32 -0300
From: Ron Dawson <rondawson@syd.eastlink.ca>

Lars Duening wrote:

> On 22 Aug 00, at 11:47, Ron Dawson wrote:
> > I've been working with the new 3.2.8 release and I've run into some
> > crashes that I thought I'd pass
> > back to you:
> >
> > (1)  I was fixing the simul_efun.c file (adding varargs to several
> > functions).   If I afterwards updated simul_efun.c the driver would
> > crash.  Normally it would be right away, but a few times the crash
> > could be delayed up to 10 minutes.
> My usual questions at this point are: What system? Which mudlib? Can
> you narrow down the code causing the crash (not always possible these
> days)? And most important: do you have core dumps you can analyze?

I'm running the test mud on RedHat Linux 6.2.  The mudlib is
custom.  You could say it's a 2.4.5 lib that's been heavily hacked
since '91.

I don't have the core files from the simul_efun crashes, but I do have the
files from the more recent crashes.  I'll take a look at them, but they
were compiled with the -G flag so I might not find much.  I'll recompile
with debug options set.

> > (2) Once I got things mostly straightened out, I left the test copy of
> > the mud running overnight.   Overnight the driver had crashed again.
> > The last part of the log was:
> >
> > 2000.08.22 07:33:03 Ref count in freed hash mapping: 1
> > No program to trace.
> > Exit status:  139
> > read: EOF
> > Read 0, should be 9!
> >
> > Shortly before the above segment were some log messages indicating that
> > one of our virtual objects was having troubles being cleaned up.
> > Unfortunately there is no time on the entry so I'm not sure how close it
> > was time wise to the driver crash.
> Hmm, that could have been related. Was there user activity in the
> testmud? If not, it would be worth trying to repeat the crash.

In one of the crashes, a user was moving about.  In the other, a user was
logged on but had been idle for a couple of hours by the time of the crash.

It looks like it may be related.  I did another test where I tried to
reproduce the problem, and the driver crashed just after the exact same
clean_up problem in the virtual object.

I did another test where I kept the virtual objects unloaded, and I've been
up and running for hours now with no problems.

The error seems to be happening when a function called save_me()
returns 1.   In the base inheritable (/virtual/std/basic.c):

int save_me(string str) {
  if (!str) {
    if (!save_file_name) return 0;
    str = save_file_name;
  }
  save_file_name = str;
  return unguarded(1, #'save_object, str);
}

This is then redefined in the inheriting program
(/virtual/rooms/daemons/terrain_d) where the error is logged:

save_set(string set_name, mapping set_data) {
  string save_file;
  if (!mappingp(set_data)) return 0;
  if (set_data["modified"]) {
    m_delete(set_data, "modified");
    saving_data = set_data;
    save_file = "/virtual/save/rooms/terrain_d/" + set_name;
    if (unguarded(1, #'save_object, save_file)) debug("Terrain set saved.");
    else raise_error("Terrain set not saved!\n");
  }
}

save_me() {
  if (mappingp(all_data))
    walk_mapping(all_data, #'save_set);
  return 1;
}

The error happens at "return 1".

The more complete log of the error is:

'       clean_up' in ' virtual/std/basic.c'
('virtual/rooms/daemons/terrain_d')line 143
'         remove' in 'virtual/rooms/daemons/terrain_d.c'
('virtual/rooms/daemons/terrain_d')line 32
'        save_me' in 'virtual/rooms/daemons/terrain_d.c'
('virtual/rooms/daemons/terrain_d')line 28
2000.08.22 14:00:30 Ref count in freed hash mapping: 1
No program to trace.
2000.08.22 14:00:33 [erq] read: EOF
2000.08.22 14:00:34 [erq] Read 0, should be 9!
2000.08.22 14:00:34 [erq] Giving up.
Exit status:  139

I haven't been able to duplicate the crash manually.

> > Now I hadn't remembered to install the utilities (i.e. make
> > install-utils) last night (the older 3.2.7 release utilities would have
> > been installed).
> That should not make any difference, since the interface to the ERQ
> didn't change. But one never knows...

It didn't seem to make a difference as you said.   The crash still happened
when the virtual object clean_up choked.

- Ron

Date: Wed, 23 Aug 2000 09:13:51 -0300
From: Ron Dawson <rondawson@syd.eastlink.ca>

Lars Duening wrote:

> My usual questions at this point are: What system? Which mudlib? Can
> you narrow down the code causing the crash (not always possible these
> days)? And most important: do you have core dumps you can analyze?

Okay, I have something more definite now in terms of data from the
core dump:


This GDB was configured as "i386-redhat-linux"...
Core was generated by `/home/pixie/mud/bin/parse 6969 -E500000
-m/home/pixie/mud/mudlib -Mkernel/maste'.
Program terminated with signal 8, Floating point exception.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
#0  0x80d0d1f in fatal (fmt=0x80f6e20 "Ref count in freed hash mapping:
    at simulate.c:435
435	        *((char*)0) = 0/a;
(gdb) bt
#0  0x80d0d1f in fatal (fmt=0x80f6e20 "Ref count in freed hash mapping:
    at simulate.c:435
#1  0x80c0466 in _free_mapping (m=0x856b504) at mapping.c:478
#2  0x8074810 in free_svalue (v=0x85b6854) at interpret.c:932
#3  0x80d2f6c in remove_object (ob=0x83f0e60) at simulate.c:2006
#4  0x80d30e9 in remove_destructed_objects () at simulate.c:2060
#5  0x805264d in backend () at backend.c:368
#6  0x80be99c in main (argc=5, argv=0xbffffa34) at main.c:315


Date: 24 Aug 2000 17:17:27 -0300 (ADT)
From: Ron Dawson <rdawson@cgc.ns.ca>


I've been playing around as you suggested with the code in terrain_d.c to
isolate what causes the crash.  It's pretty mysterious.  The problem
doesn't seem to be save_object.  I've completely commented out all the
code that does the saving and it still crashes.

Here's the current function:

mapping saving_data;
static mapping all_data = ([ ]);

save_set(string set_name, mapping set_data) {
  string save_file;
  log_file("VGRID_CRASH","Entered: "+ctime()+"\n");
  if (!mappingp(set_data)) return 0;
  if (set_data["modified"]) {
    m_delete(set_data, "modified");
    saving_data = set_data;
    save_file = "/virtual/save/rooms/terrain_d/" + set_name;
    if (unguarded(1, #'save_object, save_file)) debug("Terrain set saved.");
    else raise_error("Terrain set not saved!\n");
  }
}

And here is what calls it:

save_me() {
  if (mappingp(all_data)) {
    log_file("VGRID_CRASH",
      sprintf("all data map: %O\n",all_data) );
    walk_mapping(all_data, #'save_set);
  }
  return 1;
}

If I comment out walk_mapping, the driver won't crash.

As you can see, I've added some traces just to see what is in the mapping.

The contents of the mapping according to the log file are:

all data map: ([ /* #1 */
  "basic": ([ /* #2 */
   "water": ([ /* #3 */
   "moveflags": ({ /* #4, size: 1 */
   "terrain_desc": "You are swimming underwater.
   "plains": ([ /* #5 */
   "moveflags": ({ }),
   "terrain_desc": "The plains are flowing with long grasses and bright
   "air_water": ([ /* #6 */
   "moveflags": ({ /* #7, size: 1 */
   "item_desc": ([ /* #8 */
   "water": "The water is clear and wet.",
   "terrain_desc": "You are swimming along the surface of a body of water.
   "water_solid": ([ /* #9 */
   "moveflags": ({ /* #10, size: 1 */
   "terrain_desc": "You are on the sea floor.
   "air_solid": ([ /* #11 */
   "moveflags": ({ }),
   "item_desc": ([ /* #12 */
   "bedrock": "The bedrock is a reddish color",
   "terrain_desc": "Bare bedrock shows through at this point.
   "solid": ([ /* #13 */
   "moveflags": ({ }),
   "terrain_desc": "Bare bedrock shows through at this point.
   "rock": ([ /* #14 */
   "moveflags": ({ /* #15, size: 1 */
   "terrain_desc": "You are encased in rock!
   "air": ([ /* #16 */
   "moveflags": ({ /* #17, size: 1 */
   "terrain_desc": "Nothing but air surrounds you!
   "modified": 0,

I've tried manipulating an identical mapping in a separate object
and have had no problems with walk_mapping() there.

I'll keep playing with it, but thought I'd send this in case you had any
ideas.

- Ron

P.S.  In answer to your question about the debug data, yes, I always
      get the same backtrace each time the driver dumps core and it
      always happens immediately after the weirdness with terrain_d.c

Ron Dawson
CANSARP Support,                       Search and Rescue
Canadian Coast Guard College,                Sydney N.S.
Phone: (902) 564-3660 x1345          Fax: (902) 562-6113
Email: rdawson@cgc.ns.ca  Pager Email: pageron@cgc.ns.ca

Date sent:      	Fri, 25 Aug 2000 20:21:20 -0300
From:           	Ron Dawson <rondawson@syd.eastlink.ca>
To:             	Lars Duening <lars@bearnip.com>
Copies to:      	Ron Dawson <rdawson@cgc.ns.ca>
Subject:        	Re: ldmud 3.2.8 crash continued

Lars Duening wrote:

> How does all_data get its data - with a restore_object()? And if yes,
> are there any changes to the data between the restore and save?

Hi again.  Right now, the mapping gets its data via a restore_object().
There is the possibility that the mapping could change, but not in any
of the tests I ran.  The data only changes when someone adds a new
terrain type (or modifies one).

> If the data comes indeed from a restore, make a backup copy of the
> savefile (just in case), comment out the restore_object() call and
> replace it with a manual setup of the mapping with the exact same
> data usually restored from the savefile.
> If this with walk_mapping() and save_object() in place causes the
> crashes to vanish, the culprit is most likely restore_object() in
> combination with the restored data and the data in all_data at the
> time of the restore.

I tried this and I had the same results.

Another thing I did was switch to using map() instead of using
walk_mapping().  Using map() would not cause the driver to shut down
(but it did still log an error).  I'll include those errors below.

> > P.S.  In answer to your question about the debug data, yes, I always
> >       get the same backtrace each time the driver dumps core and it
> >       always happens immediately after the weirdness with terrain_d.c
> >       cleanup.
> Do you already have a clue what exactly goes wrong during this
> 'weirdness'?

What I think is happening is that the object is being pulled out from
under itself and dested before/during the walk_mapping and the driver
can't deal with the closure/mapping once that's happened.  This virtual
grid implementation is a bit overly convoluted with regard to handling
clean_up()'s and I suspect something isn't quite right with this
object (terrain_d).  Looking at the driver files called, backend
is calling remove_destructed_objects() and so on up to where the
mappings are supposed to be deallocated - only they can't.
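
The suspected failure mode can be sketched as a toy model in C.  This is
not the driver's code: toy_mapping, walk_ref, walk_leaky() and
failing_callback() are hypothetical names, and the sketch assumes that a
walk takes a protective count on the mapping which an aborted callback
never gets the chance to release.  Under that assumption, a later free
of the mapping would find a stale count, much like the "Ref count in
freed hash mapping: 1" message.

```c
#include <setjmp.h>

/* Toy stand-in for a mapping (hypothetical, not the driver's struct).
 * walk_ref counts walks currently in progress over the mapping. */
struct toy_mapping {
    int walk_ref;
    int size;
};

/* Where a simulated runtime error unwinds to. */
static jmp_buf error_ctx;

/* Simulates the callback failing because its object was destructed:
 * control jumps straight back to the error handler. */
static void failing_callback(int index)
{
    (void)index;
    longjmp(error_ctx, 1);
}

/* Walks the mapping, holding a protective count for the duration.
 * If the callback unwinds via longjmp, the decrement is skipped. */
static void walk_leaky(struct toy_mapping *m, void (*cb)(int))
{
    int i;

    m->walk_ref++;
    for (i = 0; i < m->size; i++)
        cb(i);
    m->walk_ref--;  /* never reached when cb() errors out */
}

/* Runs one failing walk and returns the count left on the mapping.
 * A free-time sanity check would expect this to be 0. */
int leaked_count_after_failed_walk(void)
{
    /* static: its value stays well defined across the longjmp */
    static struct toy_mapping m;

    m.walk_ref = 0;
    m.size = 3;
    if (setjmp(error_ctx) == 0)
        walk_leaky(&m, failing_callback);
    return m.walk_ref;
}
```

Calling leaked_count_after_failed_walk() returns 1 rather than 0,
because the first callback invocation unwinds before the count is
released.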

Just a few minutes ago, I switched terrain_d to inherit our "regular"
/std/Object file and now it works fine with no errors (because it's no
longer doing clean_up() the vgrid way).  This isn't really a problem
because terrain_d is a daemon that configures terrain types.  I don't
see why it ever was inheriting the virtual rooms/container objects.

Getting back to the crash, here are a few more logs.  

On the 3.2.7 driver, I get the following error messages for the
original terrain_d in the run-time log.  The driver doesn't
dump core.

Object 'virtual/rooms/daemons/terrain_d' the closure was bound to has
been destructed
program: virtual/rooms/daemons/terrain_d.c, object:
virtual/rooms/daemons/terrain_d line 28
'       clean_up' in ' virtual/std/basic.c'
('virtual/rooms/daemons/terrain_d')line 143
'         remove' in 'virtual/rooms/daemons/terrain_d.c'
('virtual/rooms/daemons/terrain_d')line 32
'        save_me' in 'virtual/rooms/daemons/terrain_d.c'
('virtual/rooms/daemons/terrain_d')line 28
Error in process_objects().

Line 28 is the call to walk_mapping() that I've been talking about.

In 3.2.8, I'll get the following in the run-time log (and the driver will
dump core):

2000.08.25 19:08:25 Object used by walk_mapping destructed2000.08.25
19:08:25 pr
ogram: virtual/rooms/daemons/terrain_d.c, object:
d line 36
'       clean_up' in ' virtual/std/basic.c'
ine 144
'         remove' in 'virtual/rooms/daemons/terrain_d.c'
/terrain_d')line 45
'        save_me' in 'virtual/rooms/daemons/terrain_d.c'
/terrain_d')line 36
2000.08.25 19:08:25 Error in process_objects().
2000.08.25 19:08:25 Ref count in freed hash mapping: 1
2000.08.25 19:08:25 Dump of the call chain:
No program to trace.

Now if I change things so that I can use map() instead
of walk_mapping(), I'll get something like:

2000.08.25 20:10:08 Object used by map destructed2000.08.25 20:10:08
program: vi
rtual/rooms/daemons/terrain_d.c, object: virtual/rooms/daemons/terrain_d
line 37
'       clean_up' in ' virtual/std/basic.c'
ine 144
'         remove' in 'virtual/rooms/daemons/terrain_d.c'
/terrain_d')line 43
'        save_me' in 'virtual/rooms/daemons/terrain_d.c'
/terrain_d')line 37
2000.08.25 20:10:08 Error in process_objects().

Again this seems to be suggesting that the object is being destroyed.
In the case of map(), the driver won't crash.

As I said, I've found a way to fix the problem by staying away from the
grid inheritables for the terrain_d daemon.  

- Ron