29 Jun, 2011, KaVir wrote in the 21st comment:
Votes: 0
Rarva.Riendf said:
KaVir said:
Rarva.Riendf said:
So yes you are wrong. Though I dont know the internal on how to protect memory, your program may segfault.

No, the example Vigud posted is perfectly valid and safe code.

Really, it is so simple to access whatever memory you want without ever being told no ? Somehow I doubted it.

I'm talking specifically about Vigud's example - the one he posted in a code block. You replied to him with the comment "your program may segfault". However that is not true, his example is perfectly safe.

Rarva.Riendf said:
while (*pti)
{
switch (*pti)
{
case '^':
if (isalpha(pti[1]) && pti[2] == '^' && isalpha(pti[3]))

No it is not fine as you go over pti[1] ^ could very well be the last char in your string. pt[1] could be '\0' pti[2] & pti[3] coudl be pretty much anything outside of your string.

I'm afraid Scandum is correct. The above block of code is a perfectly valid way of dealing with NUL-terminated strings.
29 Jun, 2011, Rarva.Riendf wrote in the 22nd comment:
Votes: 0
@to Kavir I said may segfault because as I said I do not know enough about memory protection to know what happens when you actually go looking in protected memory your process did not allocated.
Second code is exactly the same problem to me, so I really do not understand how you can both say it is correct.
I you could explain me why there is no problem I would be grateful.
29 Jun, 2011, KaVir wrote in the 23rd comment:
Votes: 0
Rarva.Riendf said:
@to Kavir I said may segfault because as I said I do not know enough about memory protection to know what happens when you actually go looking in protected memory your process did not allocated.

The process did allocate it; 100 bytes sitting on the stack. His example initialises the first 9 elements with the characters 'a', ' ', 's', 't', 'r', 'i', 'n', 'g' and '\0', then he sets element 50 to 'h' and reads element 55 - but those are all within the 100 bytes allocated for the string at the beginning of the function.

Rarva.Riendf said:
Second code is exactly the same problem to me, so I really do not understand how you can both say it is correct. I you could explain me why there is no problem I would be grateful.

The "while (*pti)" ensures that the first/current byte is not NUL.

The "isalpha(pti[1])" checks if the second byte is a letter. If not, the if statement immediately returns false - and NUL isn't a letter, so if pti[1] is NUL the statement will just return false.

The "pti[2] == '^'" checks if the third byte is the character '^'. If it's anything else (including NUL) the if statement immediately returns false.

Finally, "isalpha(pti[3]))" checks if the fourth byte is a letter. If it's NUL (or indeed any other non-letter) the if statement returns false.
29 Jun, 2011, Rarva.Riendf wrote in the 24th comment:
Votes: 0
1: the red color is so dim I missed it was allocated to 100 and assumed he was accessing a character outside the scope of its allocated memory especially since it was what we are talking about.

2:I still do not get how you can assume you will have access to the memory even though the string has been ended. It is not a matter of what the code does when it can, but what about when he cannot ? pti[2] and pti[3] could have been allocated by a totally another program. I REALLY do not get it.

What if Vigur only allocated 50 chars ? (if I did I know one thing:Valgrind would really whine about it for sure at least.)

And I am pretty sure Valgrind could whine about Scandum snippet in some cases because of what I am talking about, I will write an example with a string ending by a ^ just to test it and show you the result.

And as far as experience go, everytime valgrind whines about a C program it WILL crash or have unexpected behaviour.
29 Jun, 2011, David Haley wrote in the 25th comment:
Votes: 0
Vigud said:
What I'm asking for is clarification on what the reason of such segfault would be.

As I said,
"Let's say str[x] is non-zero and str[x+1] is zero. Then, checking str[x+2] means you are (if it's a normal 0-terminated string) going out of bounds – and could potentially cause a segfault. "

Scandum's code is iterating over a string, making sure that the current position is non-zero before continuing. But at several places, it goes on to check characters beyond the current position – without first checking that it's remaining within string bounds.

So, as you access memory outside the string, you could be at best accessing random data, or at worst accessing data outside of your program's memory block.

Vigud said:
Because it's not reading 50th byte from an array of 100 bytes that is unsafe.

Your example is fine. You know where you are in the string, and you know that the byte you are testing is within bounds.

Scandum's code, however, does not know these things.

It's worth noting however that you said:
"Reading "it is not safe to check a string beyond its bounds" I'm under the impression that David Haley thinks the above program would be unsafe. Please correct me if I'm wrong."

You did not actually check a string beyond its bounds. You were within the string bounds, so I'm not sure why you think I was making a statement about the sort of code you posted.

It isn't safe to read a string outside of its bounds, unless you know something special about that string and where it sits in memory. It's just that in your example code (which, yes, is fine) you weren't reading outside of bounds.

So, my statement stands… it is, in general, not a good idea to read a string outside of its bounds, and your example doesn't address the case of reading a string outside of its bounds.

I'll note however that you might run into a different problem, namely getting garbage data. Being within the bounds of allocated memory doesn't mean you'll be getting useful data: you can still get garbage if you access a string beyond its zero-terminating character. (But, you won't get crashes due to being outside of allocated memory.)

Rarva said:
Or because they lacked good String librairies in the first place, and thought they could do without while a mud is mainly all about that:String Manipulation.

I sympathize with people writing string code without good string libraries. It's annoying, it's confusing, and it's hard – and people get it wrong a lot. Been there, done that.
That's why it's particularly important to be diligent with your string code and make sure you're being proper with bounds and allocations.
29 Jun, 2011, KaVir wrote in the 26th comment:
Votes: 0
Rarva.Riendf said:
1: the red color is so dim I missed it was allocated to 100 and assumed he was accessing a character outside the scope of its allocated memory especially since it was what we are talking about.

Actually the discussion was about checking a string beyond it's NUL terminator. The point is that the real problem is accessing unallocated memory - that results in undefined behaviour. But simply checking beyond a NUL terminator is not always a problem, and in some cases can actually be necessary (TTYPE and ZMP use NUL separators for example).

Rarva.Riendf said:
2:I still do not get how you can assume you will have access to the memory even though the string has been ended. It is not a matter of what the code does when it can, but what about when he cannot ? pti[2] and pti[3] could have been allocated by a totally another program. I REALLY do not get it.

The line of code you posted cannot go beyond the NUL terminator. If pti[2] is NUL then the program will not check pti[3]. If pti[1] is NUL then it won't check pti[2]. And of course if pti[0] is NUL it won't even reach the if statement.

Rarva.Riendf said:
What if Vigur only allocated 50 chars ?

Then of course his program would have undefined behaviour, but that would miss the point of the example.
29 Jun, 2011, Rarva.Riendf wrote in the 27th comment:
Votes: 0
Quote
The line of code you posted cannot go beyond the NUL terminator. If pti[2] is NUL then the program will not check pti[3]. If pti[1] is NUL then it won't check pti[2]. And of course if pti[0] is NUL it won't even reach the if statement.

Bleh I should sleep, thx…got it. I totally occulted all those access were in if….and thus never actually done.

This discussion made me check why it crashed when I implemented Vigud(err Scandum….shoudl sleep, really) code though

char buf[3] = "01^";
char whocares[3];
whocares[0] = '\0';
substitute_color(buf, whocares, 1, 1);

it breaks cause whocares is too small (actually not the problem…I am out of it today, will go rest..)
default:
*pto++ = *pti++; /boom
break;
but it breaks…will check the real why later.
29 Jun, 2011, David Haley wrote in the 28th comment:
Votes: 0
One of the reasons to make what might appear to be merely stylistic changes to a program is to make it easier to read, and therefore easier to understand in your head. We're seeing right now that that big block of code requires a lot of concentration to keep in your head, but if it were broken out into smaller chunks, or similar units grouped into helper functions, it would be that much easier to deal with.
29 Jun, 2011, Rarva.Riendf wrote in the 29th comment:
Votes: 0
David Haley said:
One of the reasons to make what might appear to be merely stylistic changes to a program is to make it easier to read, and therefore easier to understand in your head. We're seeing right now that that big block of code requires a lot of concentration to keep in your head, but if it were broken out into smaller chunks, or similar units grouped into helper functions, it would be that much easier to deal with.

I totally second that, clever code and optimisation is fine, but only if you are sure to never touch it again, and even that is unlikely.
I do not thing mud need such optimisation anymore.

I alwas hated the
if ((xx = yy) == zz) {
}
if (xx == ww) {
}
i prefer to break code in
xx = yy
if (xx == zz) etc…

Even tired i can follow the code easylier this way.
29 Jun, 2011, Vigud wrote in the 30th comment:
Votes: 0
Quote
It's worth noting however that you said:
"Reading "it is not safe to check a string beyond its bounds" I'm under the impression that David Haley thinks the above program would be unsafe. Please correct me if I'm wrong."

You did not actually check a string beyond its bounds. You were within the string bounds, so I'm not sure why you think I was making a statement about the sort of code you posted.

It isn't safe to read a string outside of its bounds, unless you know something special about that string and where it sits in memory. It's just that in your example code (which, yes, is fine) you weren't reading outside of bounds.

So, my statement stands… it is, in general, not a good idea to read a string outside of its bounds, and your example doesn't address the case of reading a string outside of its bounds.
Now I see where this misunderstanding came from. You confuse arrays of chars with strings. But they are very different things.

Quote
A string is a contiguous sequence of characters terminated by and including the first null character. […] A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.


So the length of the string in my example is not 100, it's 8. And reading anything after the 9th character is reading data outside the string.
29 Jun, 2011, KaVir wrote in the 31st comment:
Votes: 0
Rarva.Riendf said:
char buf[3] = "01^";

That needs to be buf[4], or just buf[]. Otherwise there's no space for the NUL terminator.
29 Jun, 2011, Rarva.Riendf wrote in the 32nd comment:
Votes: 0
KaVir said:
Rarva.Riendf said:
char buf[3] = "01^";

That needs to be buf[4], or just buf[]. Otherwise there's no space for the NUL terminator.

BLah…I give up being ridiculous, you are right again…..No one will ever take me seriously now…but still my OP was about a REAL problem that really can happen….
I really hate C…why a why did I not swapped the engine for a Java one (or whatever langage that does not piss you off to just add two string together…)
And really thank you for your patience I am really out of it today.
29 Jun, 2011, David Haley wrote in the 33rd comment:
Votes: 0
Quote
So the length of the string in my example is not 100, it's 8. And reading anything after the 9th character is reading data outside the string.

You tell me I'm confusing strings and arrays of characters, and yet we were talking about char*, and you come in with a char array example that doesn't even apply to what we were talking about. So, uh, yeah. :thinking:

The point here is simple and, frankly, not really up for debate. As a general principle, it is not safe to read a buffer outside of its bounds, unless you know something very special about that buffer. Reading a string beyond its zero byte is, in general, not safe, unless you know something special about that string not actually being null terminated.

There is no discussion to be had around the fact that reading a buffer outside its bounds is dangerous unless you have an extremely good reason for doing so based on knowing special circumstances. I'm not really sure what point you are trying to make here and to be honest this is getting uninteresting. :sad:
30 Jun, 2011, Vigud wrote in the 34th comment:
Votes: 0
You didn't and apparently still don't understand the problem, but I agree there's nothing more to talk about since any future reader will understand that you meant reading data outside an array (which is indeed wrong, as you said) and not reading data outside a string (which is fine, as I proved).
30 Jun, 2011, David Haley wrote in the 35th comment:
Votes: 0
A string can point to any block of memory – not necessarily a character array.

For instance, a string can point to some location inside a block of memory that was malloc'ed.

It is not fine, in general, to read data outside a string. You should never go beyond your string bounds unless you know it is safe to do so, and that is not a common circumstance.

Your example is contrived and was constructed such that it is clear that the memory is safe.
Most actual code in the actual world is not.

I don't think it's necessary or appropriate to insult my understanding here. I'm pretty sure I kindasorta have a handle on how this stuff works. :sad:


EDIT:
And just since we're being pedantic, you "proved" nothing about the general case – you merely showed a single example where it is ok by construction. So what you truly did prove is that it is not always bad – but then, nobody was saying that it was always bad.
30 Jun, 2011, Runter wrote in the 36th comment:
Votes: 0
void lol(char *str) {
str[strlen(str)+1];
}


I'm overrunning the string. It's not safe. Nobody said it's going to crash it every time, but that has nothing to do with that since it's potentially (as DH has said over and over) going to be invalid. Or even worse, maybe it'll point to something valid that wasn't part of the string. Nope, not mentioning anything about overrunning the array, because I don't know the length of the array. That is precisely why it's unsafe.
30 Jun, 2011, Scandum wrote in the 37th comment:
Votes: 0
Vigud said:
You didn't and apparently still don't understand the problem, but I agree there's nothing more to talk about since any future reader will understand that you meant reading data outside an array (which is indeed wrong, as you said) and not reading data outside a string (which is fine, as I proved).

Haley's having a classic case of tunnel vision and fails to grasp that if the maximum possible input increase is less than the output buffer size there's no need for buffer checks.

For those who are confused it's comparable to building a prison, either you build the prison in a city in which case you need strong walls, or you build the prison on a desert island in the middle of the ocean, in which case you don't need any walls at all. Haley beliefs that all prisons should have 15 feet high walls regardless of the environment.
30 Jun, 2011, Tyche wrote in the 38th comment:
Votes: 0
Scandum said:
Here's a modified version of my color snippet with color compression for the 32 color codes and the 256 color foreground codes.


It doesn't really remove unneeded color codes. The map…
^B~^B~^B~^B~^B~^B~^B~^B~
^B~^B~^YM^YM^YM^YM^B~^B~
^B~^YM^YM^B~^B~^GF^B~^B~
^GF^GF^YM^GF^GF^GF^GF^B~


Gives…

~~~~~~~~
~~MMMM~~
~MM~~F~~
FFMFFFF~
[/code]

It seems it compresses adjacent color codes like in the following..
[code]
^cHello ^cWorld ^gtoday^^s before^D ^Z^R^R^R^R tomorr^yow^
Hello World today^ before^D  tomorrow^
[/code]

It doesn't handle the 'today^^s' above correctly. I was expecting ^^ to be an escape for ^ and generate no color code.
Also it keeps the invalid color code ^D in the string, but not the invalid color code ^Z.

Also if I overwrite the NUL character in..
[code]
^cHello ^cWorld ^gtoday^^s before^D ^Z^R^R^R^R tomorr^yow^
[/code]
with the letter Q it creates the following..
[code]
Hello World today^ before^D  tomorrow^Q
[/code]
30 Jun, 2011, Runter wrote in the 39th comment:
Votes: 0
Scandum said:
Vigud said:
You didn't and apparently still don't understand the problem, but I agree there's nothing more to talk about since any future reader will understand that you meant reading data outside an array (which is indeed wrong, as you said) and not reading data outside a string (which is fine, as I proved).

Haley's having a classic case of tunnel vision and fails to grasp that if the maximum possible input increase is less than the output buffer size there's no need for buffer checks.

For those who are confused it's comparable to building a prison, either you build the prison in a city in which case you need strong walls, or you build the prison on a desert island in the middle of the ocean, in which case you don't need any walls at all. Haley beliefs that all prisons should have 15 feet high walls regardless of the environment.


Your function makes no assumption of the size of the array passed to it. I'm going to say it's probably some constant like MAX_STRING_LENGTH (what, like 1000 characters?)

This really sounds like you're making the claim that as long as the bucket is large enough you can put any arbitrary amount of water in it without fear, to use your previous justification/analogy. Isn't it true that to be safe you must check the known array size?

Also, isn't it true that the constant buffer size of MSL is more like deciding arbitrarily that you only need 15 foot walls when it turns out you may have needed 16 foot, or only 5?
30 Jun, 2011, Tyche wrote in the 40th comment:
Votes: 0
Runter said:
Your function makes no assumption of the size of the array passed to it. I'm going to say it's probably some constant like MAX_STRING_LENGTH (what, like 1000 characters?)

This really sounds like you're making the claim that as long as the bucket is large enough you can put any arbitrary amount of water in it without fear, to use your previous justification/analogy. Isn't it true that to be safe you must check the known array size?

Also, isn't it true that the constant buffer size of MSL is more like deciding arbitrarily that you only need 15 foot walls when it turns out you may have needed 16 foot, or only 5?


Well using a constant buffer size is fine as long as you actually check it.
When using protocols designed for stream processing, use a damn state machine.
This look ahead crap will ALWAYS bite you in the ass.
That applies to Telnet, VT100, and yes, even these ROM color codes.
I think functions ought to perform a given operation, not several.
You immediately make the trivial, complex.
Maps often use the ^ character for trees or mountains. It ought to be reliably escaped.
It doesn't even do what the poster asked, they haven't figured that out yet.

This does exactly one thing and does it safely and correctly: It removes unnecessary ROM color codes.
char *cc(char *s,size_t sz) {
char lc=0,ef=0,*rp=s,*wp=s,esc=0136,*c="abcegjlmoprstvwyABCEGJLMOPRSTVWY";
search:
if (!*rp || (rp == s+sz)) goto done;
if (ef) goto escaped;
if (*rp==esc) ef=1;
else *wp++=*rp;
rp++;
goto search;
escaped:
if (strchr(c,*rp)) { if (lc != *rp) {lc=*rp;*wp++=esc;*wp++=*rp; }}
else {*wp++=esc;*wp++=*rp;}
ef=0;
rp++;
goto search;
done:
*wp=0;
return s;
}
20.0/52