Function Backtracing v1.0 by Steve Fairbairn (balloon@caves.tf) & Richard Lawrence (silver@ewtoo.org) Abstract -------- This package provides function backtracing in your dump log when the talker crashes. Its been specifically written for PG+ although since the original code came from a Summink talker it should be very easy to stick in. Look, I know a guy who is thinking about sticking it into Amnuts (hi Andy!) - if he can do it, so can you :o) Disclaimer ---------- Use this code at your own risk. Make backups of everything and test thoroughly before you use this on a live talker with live player files. I CAN'T STRESS THIS ENOUGH. The code makes modifications to EVERY SINGLE function in your code and may cause undesirable results if not properly checked. Test, test, test and test even more. How it works ------------ There are two main macros, ENTERFUNCTION and EXITFUNCTION, which are called whenever you enter or exit a function. When the code crashes, the dump log contains a list of the functions you entered up to the point it died. There is also another macro called MARKFUNCTION which allows you to leave a tag at various points throughout a function with extra information. Note that only one mark is held per function, so as you leave a new one, it will overwrite the one that is already there, but if you're processing connected players or something.... for (scan = flatlist_start; scan; scan = scan->flatlist_next) { MARKFUNCTION("Processing %s.", scan->name); <whatever processing is actually needed> } MARKFUNCTION("Finished processing players."); Then if it crashes while processing balloon, you'll be told about it in trace log. Installation ------------ 1. Set up a test server with pfiles and code. Don't use the LIVE system! 2. Backup everything!!! (This is important you will need files later!) 3. Copy the add_debug_info and debug.pl files into your src directory 4. Run the following command: for zz in *.c ; do ./add_debug_info $zz ; done This will run add_debug_info on every .c file in your src directory. If you didn't back up your code before you did this, then you should consider not running a talker :o) 5. Overwrite the following files with their originals (pre debug): angel.c intercom.c ident_server.c 6. In include/player.h add the following: #define UPDATE_INT_DATA (1<<23) #define SCREENING_NEWBIES (1<<24) + #define DONT_TRACE (1<<25) + #define IN_ACTUAL_TIMER (1<<26) 7. In parse.c - the actual timer function you need to do the following: a. At the beginning of the function, insert: sys_flags |= IN_ACTUAL_TIMER; b. Just before it leaves the function, insert: sys_flags &= ~IN_ACTUAL_TIMER; For example: void actual_timer(int c) { static int pling = TIMER_CLICK; player *scan; time_t t; int timeon; sys_flags |= IN_ACTUAL_TIMER; << see step (a) ENTERFUNCTION; if (sys_flags & PANIC) { EXITFUNCTION; sys_flags &= ~IN_ACTUAL_TIMER; << see step (b) return; } ...etc...etc... 8. Copy backtrace.c into your src directory 9. Edit include/proto.h and add the following: #define ENTERFUNCTION EnterFunction(__LINE__, __FILE__, __FUNCTION__); #define EXITFUNCTION ExitFunction(__LINE__, __FILE__, __FUNCTION__); #define MARKFUNCTION(format, args...)\ MarkFunction(__LINE__, __FILE__, __FUNCTION__, \ format, ## args); void MarkFunction(int Line, char *File, char *Function, char *Format, ...); extern void EnterFunction(int, char *, char *); extern void ExitFunction(int, char *, char *); extern void DumpTraceInfo(char *); 10. In glue.c in the "handle_error" function find and add the following: sprintf(stack_start, "action %s", action); stack = end_string(stack_start); log("dump", stack_start); + DumpTraceInfo("dump"); log("dump", "---------- End of dump info"); 10. Do "make depend", then "make install" Errors and code audits ---------------------- You will get warnings and errors! This is because the perl script is not perfect and will sometimes try to insert an ENTERFUNCTION; in the wrong place within a function. It should be placed just after any declarations within that function. Once you've got it compiling you really should do a code audit - that is look at every occurance of EXITFUNCTION; within the code and see that its called at the right time. For example, in the command_prescan function in parse.c, you'll find something like this: if (!strcasecmp(str, "desc") || !strcasecmp(str, "personalize") || !strcasecmp(str, "personalise")) EXITFUNCTION; return DESCc; if ((!strcasecmp(str, "soc") || !strcasecmp(str, "social")) && social_index >= 0) As you can see here, the code will always return DESCc when it gets to this point because the perl script couldn't identify that the if was broken into two lines. You'll need to fix this manually as it should be: if (!strcasecmp(str, "desc") || !strcasecmp(str, "personalize") || !strcasecmp(str, "personalise")) { EXITFUNCTION; return DESCc; } Thankfully to help you there is an additional program called "check.pl" included in the archive which you use in the following way: check.pl <filename> This will look through the c file you specified and tell you if there are any possible places where EXITFUNCTION is wrong. A couple of points: 1. You need to have run "make_pretty" first before you can use this 2. You need to have the code compiling fine before you use this 3. Its not perfect and won't find everything Sample output is: $ check.pl plists.c plists.c:1660: possibly incorrect location of EXITFUNCTION 1660: EXITFUNCTION; 1661: return 0; Here you can visit line 1661 in parse.c and find out if it really is wrong. Note that this is an example and might not appear in your own code! You might want to use this line to scan all files in your src directory for errors and report them to a file. rm -f report.txt ; for zz in *.c ; do check.pl $id >> report.txt ; done then you can read report.txt listing all the potential problems. Running ------- Okay, so now you've got it in and compiling, now is time to run it. Don't run it live! There is still some work to be done. Once you have the code up and running, log in and at some point you'll probably see an error similar to this: 16:23:57 - 30/11/2001 - Leaving : socket.c : set_talker_addy : Line 1260 16:23:57 - 30/11/2001 - Not left : get_config_msg : softmsg.c : Line 31 What does this tell you? Well basically, the code encountered an EXITFUNCTION; call in set_talker_addy but didn't encounter one in get_config_msg. To illustrate this: set_talker_addy ENTERFUNCTION(set_talker_addy) code call to get_config_msg ENTERFUNCTION(get_config_msg) code exit and return to set_talker_addy code EXITFUNCTION(set_talker_addy) So the codes gone "hang on, i've left set_talker_addy, but i still haven't left get_config_msg - this isn't possible". Hence the error. To fix, just go and look at the function 'set_talker_addy' and insert the EXITFUNCTIONS; in the right places. Crashing -------- When the code crashes, you'll get something like this: 16:39:44 - 30/11/2001 - ------------ Starting dump 16:39:44 - 30/11/2001 - Errno set to 2, No such file or directory 16:39:44 - 30/11/2001 - admin 16:39:44 - 30/11/2001 - player main.room 16:39:44 - 30/11/2001 - flags 2186 sys 1572864 tag 53 cus 30722 mis 266240 res 1859968973 16:39:44 - 30/11/2001 - f coder 16:39:44 - 30/11/2001 - current main.room 16:39:44 - 30/11/2001 - global flags 2049, players 3 16:39:44 - 30/11/2001 - action doing command 16:39:44 - 30/11/2001 - ==== Showing last function calls 16:39:44 - 30/11/2001 - main : glue.c : Line 1749 16:39:44 - 30/11/2001 - process_players : parse.c : Line 1053 16:39:44 - 30/11/2001 - input_for_one : parse.c : Line 950 16:39:44 - 30/11/2001 - match_commands : parse.c : Line 911 16:39:44 - 30/11/2001 - old_match_commands : parse.c : Line 752 16:39:44 - 30/11/2001 - execute_command : parse.c : Line 634 16:39:44 - 30/11/2001 - newfinger : examine.c : Line 892 16:39:44 - 30/11/2001 - sigsegv : glue.c : Line 1378 16:39:44 - 30/11/2001 - handle_error : glue.c : Line 1115 16:39:44 - 30/11/2001 - ==== End of function calls 16:39:44 - 30/11/2001 - ---------- End of dump info Here you can see that newfinger was called, that caused a segmentation fault which called "handle_error". If you like, you could remove the ENTERFUNCTION and EXITFUNCTION calls from handle_error and sigsegv functions so that the code stops logging at the error. Its your call. Coding ------ Now everything is set up, you need to remember the following: 1. At the beginning of every function, make sure there is a call to ENTERFUNCTION; 2. Where control leaves that function (be it at the end or through a return) make sure there is a call to EXITFUNCTION; If you forget one or the other, an error will be logged to the error log at some point. Even so, try to remember! ** It is not recommended to program "normally" and then rely on the perl ** script to add the right calls. It is only there to save you 2 weeks ** worth of work implementing the code in the first place! You may notice on your travels that some functions aren't logged (such as get_int_safe, get_string_safe and stack handling). This is mainly because these functions work well, are called a lot (spam in the error log) and will cause a hit to the talker. Also, some of the functions are a bit scary, so we'd like to leave them alone :o) Finally ------- Its horribly beta, i've not tested it myself that well, so tread with caution and all that. Steve's had the code running on Caves for a while now and it works well, however he put all the calls in manually. Don't roll the code out live, until you've thoroughly tested it. Documentation last updated 3rd December 2001 silver@ewtoo.org