function_backtrace_1.0/
Function Backtracing v1.0
by Steve Fairbairn (balloon@caves.tf) & Richard Lawrence (silver@ewtoo.org)


Abstract
--------

This package provides function backtracing in your dump log when the 
talker crashes. Its been specifically written for PG+ although since the 
original code came from a Summink talker it should be very easy to stick 
in. Look, I know a guy who is thinking about sticking it into Amnuts (hi 
Andy!) - if he can do it, so can you :o)


Disclaimer
----------

Use this code at your own risk. Make backups of everything and test 
thoroughly before you use this on a live talker with live player files.

I CAN'T STRESS THIS ENOUGH. The code makes modifications to EVERY 
SINGLE function in your code and may cause undesirable results if not 
properly checked. Test, test, test and test even more.


How it works
------------

There are two main macros, ENTERFUNCTION and EXITFUNCTION, which are 
called whenever you enter or exit a function. When the code crashes, the 
dump log contains a list of the functions you entered up to the point it 
died.

There is also another macro called MARKFUNCTION which allows you to leave
a tag at various points throughout a function with extra information.
Note that only one mark is held per function, so as you leave a new one,
it will overwrite the one that is already there, but if you're processing
connected players or something....

for (scan = flatlist_start; scan; scan = scan->flatlist_next)
{
   MARKFUNCTION("Processing %s.", scan->name);

   <whatever processing is actually needed>
}
MARKFUNCTION("Finished processing players.");

Then if it crashes while processing balloon, you'll be told about it in
trace log.


Installation
------------

1. Set up a test server with pfiles and code. Don't use the LIVE system!
2. Backup everything!!! (This is important you will need files later!)
3. Copy the add_debug_info and debug.pl files into your src directory
4. Run the following command:

   for zz in *.c ; do ./add_debug_info $zz ; done

   This will run add_debug_info on every .c file in your src directory. If 
   you didn't back up your code before you did this, then you should 
   consider not running a talker :o)

5. Overwrite the following files with their originals (pre debug):

     angel.c
     intercom.c
     ident_server.c

6. In include/player.h add the following:

  #define UPDATE_INT_DATA         (1<<23)
  #define SCREENING_NEWBIES       (1<<24)
+ #define DONT_TRACE              (1<<25)
+ #define IN_ACTUAL_TIMER         (1<<26)

7. In parse.c - the actual timer function you need to do the following:

  a. At the beginning of the function, insert:

     sys_flags |= IN_ACTUAL_TIMER;

  b. Just before it leaves the function, insert:

     sys_flags &= ~IN_ACTUAL_TIMER;

  For example:

void actual_timer(int c)
{
  static int pling = TIMER_CLICK;
  player *scan;
  time_t t;
  int timeon;

  sys_flags |= IN_ACTUAL_TIMER;    << see step (a)

  ENTERFUNCTION;
  if (sys_flags & PANIC)
  {
    EXITFUNCTION;
    sys_flags &= ~IN_ACTUAL_TIMER;    << see step (b)
    return;
  }

  ...etc...etc...

8. Copy backtrace.c into your src directory
9. Edit include/proto.h and add the following:

#define ENTERFUNCTION EnterFunction(__LINE__, __FILE__, __FUNCTION__);
#define EXITFUNCTION  ExitFunction(__LINE__, __FILE__, __FUNCTION__);
#define MARKFUNCTION(format, args...)\
                      MarkFunction(__LINE__, __FILE__, __FUNCTION__, \
                                   format, ## args);
void MarkFunction(int Line, char *File, char *Function, char *Format, ...);

extern void EnterFunction(int, char *, char *);
extern void ExitFunction(int, char *, char *);
extern void DumpTraceInfo(char *);

10. In glue.c in the "handle_error" function find and add the following:

   sprintf(stack_start, "action %s", action);
   stack = end_string(stack_start);
   log("dump", stack_start);
+  DumpTraceInfo("dump");
   log("dump", "---------- End of dump info");

10. Do "make depend", then "make install"


Errors and code audits
----------------------

You will get warnings and errors! This is because the perl script is not 
perfect and will sometimes try to insert an ENTERFUNCTION; in the wrong 
place within a function. It should be placed just after any declarations 
within that function.

Once you've got it compiling you really should do a code audit - that is 
look at every occurance of EXITFUNCTION; within the code and see that its 
called at the right time. For example, in the command_prescan function in 
parse.c, you'll find something like this:

  if (!strcasecmp(str, "desc") || !strcasecmp(str, "personalize")
      || !strcasecmp(str, "personalise"))
    EXITFUNCTION;
  return DESCc;
  if ((!strcasecmp(str, "soc") || !strcasecmp(str, "social"))
      && social_index >= 0)

As you can see here, the code will always return DESCc when it gets to 
this point because the perl script couldn't identify that the if was 
broken into two lines. You'll need to fix this manually as it should be:

  if (!strcasecmp(str, "desc") || !strcasecmp(str, "personalize")
      || !strcasecmp(str, "personalise"))
  {
    EXITFUNCTION;
    return DESCc;
  }

Thankfully to help you there is an additional program called "check.pl" 
included in the archive which you use in the following way:

  check.pl <filename>

This will look through the c file you specified and tell you if there are 
any possible places where EXITFUNCTION is wrong. A couple of points:

 1. You need to have run "make_pretty" first before you can use this
 2. You need to have the code compiling fine before you use this
 3. Its not perfect and won't find everything

Sample output is:

 $ check.pl plists.c
plists.c:1660: possibly incorrect location of EXITFUNCTION
1660:     EXITFUNCTION;
1661:   return 0;

Here you can visit line 1661 in parse.c and find out if it really is 
wrong. Note that this is an example and might not appear in your own code!

You might want to use this line to scan all files in your src directory 
for errors and report them to a file.

  rm -f report.txt ; for zz in *.c ; do check.pl $id >> report.txt ; done

then you can read report.txt listing all the potential problems.


Running
-------

Okay, so now you've got it in and compiling, now is time to run it. Don't 
run it live! There is still some work to be done. Once you have the code 
up and running, log in and at some point you'll probably see an 
error similar to this:

16:23:57 - 30/11/2001 - Leaving  : socket.c : set_talker_addy : Line 1260
16:23:57 - 30/11/2001 - Not left : get_config_msg : softmsg.c : Line 31

What does this tell you? Well basically, the code encountered an 
EXITFUNCTION; call in set_talker_addy but didn't encounter one in 
get_config_msg. To illustrate this:

set_talker_addy
ENTERFUNCTION(set_talker_addy)
code
call to get_config_msg
ENTERFUNCTION(get_config_msg)
code
exit and return to set_talker_addy
code
EXITFUNCTION(set_talker_addy)

So the codes gone "hang on, i've left set_talker_addy, but i still haven't 
left get_config_msg - this isn't possible". Hence the error. To fix, just 
go and look at the function 'set_talker_addy' and insert the 
EXITFUNCTIONS; in the right places.


Crashing
--------

When the code crashes, you'll get something like this:

16:39:44 - 30/11/2001 - ------------ Starting dump
16:39:44 - 30/11/2001 - Errno set to 2, No such file or directory
16:39:44 - 30/11/2001 - admin
16:39:44 - 30/11/2001 - player main.room
16:39:44 - 30/11/2001 - flags 2186 sys 1572864 tag 53 cus 30722 mis 266240 res 1859968973
16:39:44 - 30/11/2001 - f coder
16:39:44 - 30/11/2001 - current main.room
16:39:44 - 30/11/2001 - global flags 2049, players 3
16:39:44 - 30/11/2001 - action doing command
16:39:44 - 30/11/2001 - ==== Showing last function calls
16:39:44 - 30/11/2001 -   main : glue.c : Line 1749
16:39:44 - 30/11/2001 -   process_players : parse.c : Line 1053
16:39:44 - 30/11/2001 -   input_for_one : parse.c : Line 950
16:39:44 - 30/11/2001 -   match_commands : parse.c : Line 911
16:39:44 - 30/11/2001 -   old_match_commands : parse.c : Line 752
16:39:44 - 30/11/2001 -   execute_command : parse.c : Line 634
16:39:44 - 30/11/2001 -   newfinger : examine.c : Line 892
16:39:44 - 30/11/2001 -   sigsegv : glue.c : Line 1378
16:39:44 - 30/11/2001 -   handle_error : glue.c : Line 1115
16:39:44 - 30/11/2001 - ==== End of function calls
16:39:44 - 30/11/2001 - ---------- End of dump info

Here you can see that newfinger was called, that caused a segmentation 
fault which called "handle_error". If you like, you could remove the 
ENTERFUNCTION and EXITFUNCTION calls from handle_error and sigsegv 
functions so that the code stops logging at the error. Its your call.


Coding
------

Now everything is set up, you need to remember the following:

1. At the beginning of every function, make sure there is a call to 
ENTERFUNCTION;
2. Where control leaves that function (be it at the end or through a 
return) make sure there is a call to EXITFUNCTION;

If you forget one or the other, an error will be logged to the error log 
at some point. Even so, try to remember!

** It is not recommended to program "normally" and then rely on the perl
** script to add the right calls. It is only there to save you 2 weeks
** worth of work implementing the code in the first place!

You may notice on your travels that some functions aren't logged (such as 
get_int_safe, get_string_safe and stack handling). This is mainly because 
these functions work well, are called a lot (spam in the error log) and 
will cause a hit to the talker. Also, some of the functions are a bit 
scary, so we'd like to leave them alone :o)


Finally
-------

Its horribly beta, i've not tested it myself that well, so tread with 
caution and all that. Steve's had the code running on Caves for a while 
now and it works well, however he put all the calls in manually.

Don't roll the code out live, until you've thoroughly tested it.


Documentation last updated
3rd December 2001
silver@ewtoo.org