2010-05-03

C/C++: undefined behavior explained

Figure #1:
char *s = "Hello?";
s[5] = '!';
printf("%s\n",s);

Figure #2:
char *s = "Hello?";
char s2[] = "Jumbo?";
s = s2;
s[5] = '!';
printf("%s\n",s);

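Looking only at the line s[5] = '!'; it is impossible to tell whether it contains an error. In Figure #1 s still points to a string literal when the write happens; in Figure #2 it has been redirected to the writable array s2. Static code analysis can give you an educated guess, but in general it cannot tell for sure. That is why the error, if there is one, shows up only at run time.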
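Here is a minimal sketch of the same point in plain C. The function names end_with_bang and bang_on_copy are made up for this illustration and are not part of the figures. Inside the helper the compiler sees only a char *, so it has no way of knowing what kind of memory it will get.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: all the compiler sees here is a char *.
   Whether s refers to a string literal or to a writable array
   depends on the caller, and is only known at run time. */
void end_with_bang(char *s)
{
    assert(s != NULL);
    s[5] = '!';
}

/* Copies "Jumbo?" into a caller-provided writable buffer of at
   least 7 bytes and modifies the copy. This is well defined. */
const char *bang_on_copy(char *buf)
{
    strcpy(buf, "Jumbo?");
    end_with_bang(buf);
    return buf;
}

/* end_with_bang("Hello?");   compiles in C, but the write goes to a
                              string literal: undefined behavior.   */
```

Calling bang_on_copy with a char buf[7] leaves "Jumbo!" in buf. The commented-out call goes through exactly the same helper, and nothing in the helper's body can tell the two apart.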

Writing to a string literal leads to undefined behavior. If the standard said that the compiler must generate code that crashes in this case, that would be a feature, a requirement imposed on every compiler vendor. Every write through a char * would then need a check for whether the target is a literal, which obviously has a very large overhead. So the writers of the standard decided not to impose any behavioral requirement in such cases. With this in mind you can imagine other cases of undefined behavior, or you can look them up and think about each of them individually.

Back to the example! In some cases the write will change the literal; in other cases the program will crash with an "access violation" or a "segmentation fault", because we are trying to write to a read-only area of memory. ... To avoid accidents, I (the const fetishist) recommend this:

const char *s = "Hello?";

You will get a compile-time error instead of undefined behavior, and you can fix your code and/or your idea.
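If you really do need a modified version of the text, one possible fix is to keep the literal behind a const char * and change a writable copy instead. The sketch below assumes that approach; copy_with_bang and greeting are hypothetical names, not something from the figures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The literal is only reachable through a pointer to const. */
const char *greeting = "Hello?";

/* greeting[5] = '!';   rejected at compile time: the target is const */

/* Hypothetical helper: copies the read-only string src into the
   writable buffer dst (dst_size bytes) and replaces its last
   character with '!'. Returns 0 on success, -1 if src is empty
   or does not fit into dst. */
int copy_with_bang(const char *src, char *dst, size_t dst_size)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    if (len == 0 || len + 1 > dst_size)
        return -1;
    memcpy(dst, src, len + 1);   /* includes the terminating '\0' */
    dst[len - 1] = '!';
    return 0;
}
```

With a char buf[8], the call copy_with_bang(greeting, buf, sizeof buf) returns 0 and leaves "Hello!" in buf, while the literal itself is never touched.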
