24 Jul, 2009, Hyper_Eye wrote in the 1st comment:
Votes: 0
I have mentioned this on IMC but I wanted to discuss this here because this is a common programming error and it can be found in almost any MUD codebase.

Problem

Quiz Question: What is the error in this code?

char a;

while( (a = fgetc(fp)) != EOF )


You might say, "There is nothing wrong with that. I've seen it a million times and it works fine." Well you would be right some of the time but that code has a bug in it and it can bite you in the rear. To expose this bug we need to know some rules. If you don't care to understand exactly why this bug occurs skip the rules and see the problem that results from the bug and the solution to it.

Rule 1: (char == signed char) except when it doesn't or… (char == signed char) except when (char == unsigned char).

The sign of char is not defined by C. It is compiler dependent. While it might be signed on the compiler you are using at the moment it may be unsigned when you use a different one.

Rule 2: fgetc() does not return a char. It returns an int. Here is the declaration of fgetc() according to the GETS(3) man page:

Quote
int fgetc(FILE *stream);


Obviously there is an important difference between an int and a char. An int is usually 2 or 4 bytes and a char is a byte.

Rule 3: EOF is always defined as a negative integer (usually -1).

Rule 4: Storing an int as a byte requires the value be demoted because of the size difference.

Rule 5: A comparison between int and char results in the char value being promoted to an int to account for the size difference. When this promotion occurs on a signed char it will be correctly promoted to the original value (assuming it was a value that could be stored in a byte which is going to be true in the case of the return from fgetc().) When this promotion occurs on an unsigned char a negative integer value will not be produced. If the original value was a negative integer value the unsigned char promotion is going to produce a positive integer value and the comparison will produce a false result regardless of the fact that the two values were originally comparable.

End of Rules

So with those rules the problem becomes obvious. The error is that the returned value can never be equal to EOF on a system that uses an unsigned char as char. On these systems that comparison will result in an infinite loop and your MUD becoming hopelessly unresponsive.

The following code makes this problem absolutely clear:

#include <stdio.h>

int main()
{
unsigned char a = EOF;
signed char b = EOF;
char c = EOF;

printf("EOF\t\t= %d\n", EOF);
printf("char\t\t= %d\n", c);
printf("signed char\t= %d\n", b);
printf("unsigned char\t= %d\n", a);

return 0;
}


Output on Linux X86 - GCC:
$ ./test
EOF = -1
char = -1
signed char = -1
unsigned char = 255


Output on Linux PPC - GCC:
$ ./test
EOF = -1
char = 255
signed char = -1
unsigned char = 255


In the second output you can see that the char value is the same as the unsigned char value when promoted to an int. Both of those values were assigned the value of EOF which was demoted due to the size difference. 255 and -1 will never be equal and any loop that requires that condition to break will loop endlessly.

Solutions

So now that you are aware of why the original statement is a bug (whether it effects your system or not) here are some solutions to this bug:

Solution 1:
signed char a;

while( (a = fgetc(fp)) != EOF )


Explicitly using a signed char puts the sign of your char in your hands. This will work on any compiler and any architecture.

Solution 2:
int a;

while( (a = fgetc(fp)) != EOF )
printf( "%c", char(a) );


Here there is no demotion when storing the value and no promotion for the comparison. You will need to typecast it during use.

Solution 3:
char a;

while( fread(&a, 1, 1, fp) > 0 )


This statement does not depend on sign because fread() always returns 0 on error. Here is the declaration of fread() according to the FREAD(3) man page:

Quote
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);


If you would like to test for EOF upon getting a 0 back you can do it like this:

if( feof( fp ) )


feof() will return true if the end-of-file indicator is set on the stream.

Conclusion

Should you care about this bug? As a MUD developer you should. Most codebases are very portable but all it takes is one bug like this to eliminate your MUD from running on many systems.

Does your MUD contain this bug? It is quite likely that it does. Just using a random MUD I have on my system as an example the released version of ResortMUD6 contains 16 instances of this bug. SmaugFUSS 1.9 contains 14 instances of this bug.

Here is something a lot of you are using. IMC2 Freedom CL-2.2 contains 8 instances of this bug.

I know I wrote a lot more here than was necessary to make people aware of this bug but some people don't know everything I pointed out and this knowledge is useful. So I hope somebody learned something here and I hope you will eliminate this bug from your code.

Happy coding guys!
24 Jul, 2009, Igabod wrote in the 2nd comment:
Votes: 0
I thank you for educating me, I didn't know any of that (self-taught coder) so it will come in handy in the future when I decide to actively develop my mud again.
24 Jul, 2009, Hyper_Eye wrote in the 3rd comment:
Votes: 0
Igabod said:
I thank you for educating me, I didn't know any of that (self-taught coder) so it will come in handy in the future when I decide to actively develop my mud again.


If I educated one person it was worth my time to write it. Good luck on your future coding!
24 Jul, 2009, Lobotomy wrote in the 4th comment:
Votes: 0
I was already aware of that information, but it is a great post nonetheless. Also, welcome to Mudbytes, Hyper_Eye.

Edit: Your post would be a good addition for the articles section as well.
24 Jul, 2009, Igabod wrote in the 5th comment:
Votes: 0
Ah yes, sorry for the rudeness. Welcome to the forums.

I agree with lobotomy, you should put this information in the articles section.
24 Jul, 2009, David Haley wrote in the 6th comment:
Votes: 0
Never have been very fond of platforms that decide that chars should be unsigned. The usual practice appears to be that things are signed by default, and unsigned if you ask for it. On these platforms, if you declare an int, is it unsigned by default? If it's signed, I would think that horribly inconsistent. :thinking:

Of course, this doesn't change that if you want to get an int back or compare against ints, you should be using ints, not making assumptions about the signedness of the data type you're using.
24 Jul, 2009, Hyper_Eye wrote in the 7th comment:
Votes: 0
David Haley said:
Never have been very fond of platforms that decide that chars should be unsigned. The usual practice appears to be that things are signed by default, and unsigned if you ask for it. On these platforms, if you declare an int, is it unsigned by default? If it's signed, I would think that horribly inconsistent. :thinking:

Of course, this doesn't change that if you want to get an int back or compare against ints, you should be using ints, not making assumptions about the signedness of the data type you're using.


Of course an int is not automatically unsigned on those platforms. This only applies to a char as it is not specified in the C standard whether that is signed or not. When using a char for its most common purpose it doesn't matter what its sign is. Of course it is not generally used to represent an integral value and it is in this case that the sign of char matters. When you request an int you will get a signed int on any compiler. I don't frown upon a compiler that uses an unsigned char to represent char as it is up to their discretion which to use and they are not breaking the C standard. When a developer is using a char to store an integral value or they are comparing a char to an integral value they should know that the sign of char is platform dependent.
24 Jul, 2009, David Haley wrote in the 8th comment:
Votes: 0
There's a difference between not breaking a standard and having consistent behavior. I think it's not appropriate to get different default signedness with one type vs. another. It's cause for unnecessary surprise.

BTW, I'm not disagreeing that people should be more careful with their types – I already said that. :wink:
24 Jul, 2009, Hyper_Eye wrote in the 9th comment:
Votes: 0
David Haley said:
There's a difference between not breaking a standard and having consistent behavior. I think it's not appropriate to get different default signedness with one type vs. another. It's cause for unnecessary surprise.

BTW, I'm not disagreeing that people should be more careful with their types – I already said that. :wink:


I don't disagree that it would be nice for all systems to use a consistent sign for char or even that defaulting char to unsigned is inconsistent with the default behavior of other types. In a perfect world there wouldn't be a discrepancy there. But in a perfect world the two systems I used as an example wouldn't use a different endianness but they do (ppc is big endian and x86 is little endian.) The point simply being that as long as the C spec remains so lax these kinds of issues are going to exist. As programmers we should try to be aware of these kinds of issues and consider that our applications could one day be used on systems with different rules from the system we are currently developing on. At some point x86 could be put to rest and a new arch could become the PC standard. In my professional life I have to always develop with a portable mindset and produce portable code. This is required due to the wide array of systems we support. In my hobby life I choose to maintain that mindset because I like portable code… and that is the way my programming brain works.

I don't think that we disagree on anything here. I think I am simply more forgiving of platform differences than you. I don't blame you. These sorts of things bothered me at first but I have been exposed to them so consistently that I have become content with the way it is. :thinking:
24 Jul, 2009, Hyper_Eye wrote in the 10th comment:
Votes: 0
Lobotomy said:
I was already aware of that information, but it is a great post nonetheless. Also, welcome to Mudbytes, Hyper_Eye.

Edit: Your post would be a good addition for the articles section as well.


Well… I must be missing something. I don't see where I would submit something to the articles section. Maybe I don't have enough posts or something? I would be happy to submit it if someone can point me in the right direction.
24 Jul, 2009, David Haley wrote in the 11th comment:
Votes: 0
That's funny, it's broken for me as well. Might be something that was broken during the migrations. Normally you just go to the Articles section and click "Add Article" or something like that.
24 Jul, 2009, Davion wrote in the 12th comment:
Votes: 0
Hyper_Eye said:
Lobotomy said:
I was already aware of that information, but it is a great post nonetheless. Also, welcome to Mudbytes, Hyper_Eye.

Edit: Your post would be a good addition for the articles section as well.


Well… I must be missing something. I don't see where I would submit something to the articles section. Maybe I don't have enough posts or something? I would be happy to submit it if someone can point me in the right direction.


Ya know… you go to length to document stuff and people just ignore it! :P -> http://www.mudbytes.net/index.php?a=arti...
24 Jul, 2009, David Haley wrote in the 13th comment:
Votes: 0
Huh. I was sure there used to be a creation link. Oh well. :tongue:
25 Jul, 2009, Hyper_Eye wrote in the 14th comment:
Votes: 0
I did some reading and I found that there is another solution to this in the case of gcc. You can add the cflag -fsigned-char and gcc will use a 'signed char' for 'char' regardless of platform. This doesn't make the application compiler portable but it does make it platform portable when gcc is used. So I wouldn't use it as an excuse to write code with the described bug in it but if you have code with many instances of this you can use the flag to get around the problem. Adding this flag to stock SmaugFUSS1.9 code made it work on ppc without modification (including MCCP working correctly) except for a single byte-swapping issue (big-endian vs. small-endian issue.) And in case you were wondering, gcc will allow you to force char the other way with -funsigned-char. I just doubt it is used near as often.
0.0/14