G+: Does `unsigned int u = UINT_MAX; signed int …

David Coles 12 Apr 2012

Does `unsigned int u = UINT_MAX; signed int i = u;` have undefined behaviour in C? I know it's fine if `u` can fit in `i` and also fine to do the reverse (signed to unsigned behaves like modular arithmetic).

Now kind of curious to see what happens with bitwise operations. Surely that means logical bitwise operations only make sense using unsigned integers - otherwise you start getting into all sorts of fun with arithmetic shifts and logical operators on negative values.

I'm sure I recall +Matt Giuca having a discussion about this (and arithmetic shifting) at some point.

(The thing that kicked this off was me wondering if the approach in "The byte order fallacy" (http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html) is safe for reading a signed 32-bit integer from a byte stream.)
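For context, a minimal sketch of the kind of byte-by-byte read being asked about, assuming a little-endian stream. The function names `read_u32_le` and `read_i32_le` are hypothetical and not taken from the linked post. All of the shifting happens on unsigned values; only the final cast to `int32_t` is the step in question.

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four bytes using only
 * unsigned arithmetic, so no shift ever touches a sign bit. */
uint32_t read_u32_le(const unsigned char *data)
{
    return ((uint32_t)data[0] << 0)
         | ((uint32_t)data[1] << 8)
         | ((uint32_t)data[2] << 16)
         | ((uint32_t)data[3] << 24);
}

/* Reinterpret the assembled bits as a signed value. When the value does
 * not fit in int32_t, C leaves the result of this cast to the
 * implementation; on two's complement compilers it keeps the bit pattern. */
int32_t read_i32_le(const unsigned char *data)
{
    return (int32_t)read_u32_le(data);
}
```

On a typical two's complement compiler, four `0xFF` bytes read back as `-1` through `read_i32_le`.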

Matt Giuca 12 Apr 2012

Yes; behaviour of signed integer overflow is undefined. Unsigned integer overflow is well-defined (truncate the bits). Signed integers are technically not stored in any specific representation (although everybody uses two's complement), so the behaviour of overflow is undefined.

Bit shift operations are defined in terms of arithmetic. A left shift is defined as multiplication by 2^n, and if it overflows on a signed integer, that is undefined. You raise a good point about logical bitwise operations (&, | and ^). Because of the undefined bit representation, I can't imagine they would ever be well-defined for signed integers. They probably aren't, but I would have to check. (There's a possibility that they may be well defined for positive signed integers.)
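To make the shift point concrete, here is a small sketch. Both function names are hypothetical. The first shows the unsigned case, where the result simply wraps; the second is one possible way to guard a signed left shift so that it never overflows. It is deliberately conservative and rejects negative inputs and large shift counts outright.

```c
#include <limits.h>

/* Left shift on an unsigned value: multiplication by 2^n, reduced modulo
 * UINT_MAX + 1. Well-defined as long as n is less than the type's width. */
unsigned int shl_unsigned(unsigned int x, unsigned int n)
{
    return x << n;
}

/* Left shift for a non-negative int. Returns 1 and stores the result in
 * *out on success; returns 0 (leaving *out alone) if the multiplication
 * by 2^n would not fit in an int. */
int shl_checked(int x, unsigned int n, int *out)
{
    if (x < 0 || n >= sizeof(int) * CHAR_BIT - 1 || x > (INT_MAX >> n)) {
        return 0;
    }
    *out = x << n;
    return 1;
}
```

For example, `shl_checked(3, 2, &r)` succeeds with `r == 12`, while `shl_checked(INT_MAX, 1, &r)` reports failure instead of overflowing.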

David Coles 12 Apr 2012

I guess the correct way of handling signed to unsigned is something like this:

    uint8_t u = 255;
    int8_t i;
    if (u > INT8_MAX) {
        // Assume 2's complement
        i = -2*(INT8_MAX+1) + u;
    } else {
        // Unsigned value fits in signed one
        i = u;
    }

Matt Giuca 12 Apr 2012

Why are you doing that? What does "// Assume 2's complement" mean? If you assume the hardware is 2's complement, then you can just assign and it will be converted. (Technically undefined, but if you are making that assumption, then it will work.)

If you are manually turning it into the number it would be if it was using 2's complement, then ... what is the point of having it in that representation? Basically, if you want to be able to do bit-level things on it, you should keep it unsigned. If you want it to be arithmetic, then the fact that the large positive number turned negative is meaningless anyway.

From my point of view, if you really want to convert unsigned to signed in a meaningful way, you should be making an error if it is > INT8_MAX. Or better still, just don't use unsigned. Never use unsigned int unless it represents some data that will never need to be compatible with a signed int ever.
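The "make it an error" option could look something like the sketch below. The function name `u8_to_i8_checked` and the choice to report failure through a `bool` return value are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store u in *out only if it is representable as int8_t; otherwise
 * report failure and leave *out untouched. */
bool u8_to_i8_checked(uint8_t u, int8_t *out)
{
    if (u > INT8_MAX) {
        return false;   /* would not fit: treat as an error */
    }
    *out = (int8_t)u;   /* value-preserving, since u <= INT8_MAX */
    return true;
}
```

The caller then has to decide what an out-of-range value means, rather than silently getting a negative number.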

In other words, it is okay to use unsigned int for an array index, because array indexes are never negative. But you should not use unsigned int for a width or height -- even though widths and heights cannot be negative, they are frequently used in arithmetic operations with signed ints, so they must be signed quantities.
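A short sketch of the width problem, using hypothetical function names. The explicit cast in the first function spells out the conversion that C would otherwise apply implicitly when an `unsigned int` and an `int` meet in one expression.

```c
/* Pitfall: with an unsigned width, the int operand is converted to
 * unsigned and the subtraction wraps around instead of going negative. */
unsigned int remaining_unsigned(unsigned int width, int used)
{
    return width - (unsigned int)used;
}

/* With a signed width, the same expression goes negative as expected. */
int remaining_signed(int width, int used)
{
    return width - used;
}
```

With a width of 10 and 20 units used, `remaining_signed` gives `-10`, while `remaining_unsigned` gives a very large positive number.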

David Coles 13 Apr 2012

In this case I was trying to construct a 2's complement integer (such as one you might read byte-by-byte from a network buffer) with bitwise operations (hence initially storing it in an unsigned int) and then turn it into the corresponding C integer type without risking "undefined behaviour".

Having a chat with our resident C/C++ expert at work, the conclusion was that while technically undefined, it just copies the bits directly across, and that there are virtually no systems that use anything but 2's complement to represent signed integers.

Basically mixed mode arithmetic is fraught with problems. Avoid it if you can, but if you must do it, you want to explicitly cast them to the same types before arithmetic, rather than letting C's rules do something you might not expect.
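As a small illustration of why the explicit approach helps, here is a sketch of a signed/unsigned comparison. Both function names are hypothetical. The first writes out what the usual arithmetic conversions do to `a < b` when `a` is an `int` and `b` is an `unsigned int`; the second handles the negative case first and only then compares in a single type.

```c
#include <stdbool.h>

/* What C's conversion rules do for `a < b`, written out: a is converted
 * to unsigned, so a negative a becomes a very large value. */
bool less_as_c_converts(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* Handle the negative case explicitly, then compare in one type. */
bool less_safe(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

With `a = -1` and `b = 1`, the first function says "not less" and the second says "less".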

Matt Giuca 14 Apr 2012

Oh so you were assuming that someone gave you an unsigned int with the bit pattern of a two's complement integer, and you were then converting it into a machine-independent int value without undefined behaviour. Yeah, I guess that works. And I guess if you want to be strictly correct, you have to do something like that.

Looking at the code -- it seems like it will work for 8-bit ints, but only because the calculations are done as normal ints which don't overflow. If you apply that technique to normal ints (remove the 8_ts), your algorithm overflows.

First, "u > INT_MAX" will technically work because INT_MAX is a macro, but it kind of is a signed/unsigned comparison. So I'd use "u > (unsigned int) INT_MAX".

Second, "-2*(INT_MAX+1) + u" overflows a few times! First, INT_MAX + 1 overflows an int, then -2* makes it way too small. It "happens" to give the right answer because the machine int is two's complement.

The best strictly correct solution I have is "(int) (u - INT_MAX - 1) - INT_MAX - 1". It converts to a signed int at precisely the right time -- it subtracts INT_MAX+1 to bring the number into the positive signed int range (valid as both a signed and unsigned int), then casts to signed, then subtracts INT_MAX+1 again to make it negative. Interestingly, a smart compiler could make this a no-op, because the two subtractions together remove exactly 2^32 (assuming a 32-bit int).
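Wrapped up as a function, that expression might look like the sketch below. The name `uint_to_int_twos` is hypothetical, and the code assumes `UINT_MAX == 2 * INT_MAX + 1` and `INT_MIN == -INT_MAX - 1`, which holds on the usual two's complement platforms.

```c
#include <limits.h>

/* Map an unsigned bit pattern to the int it denotes in two's complement,
 * without any out-of-range conversion or signed overflow.
 * Assumes UINT_MAX == 2 * INT_MAX + 1 and INT_MIN == -INT_MAX - 1. */
int uint_to_int_twos(unsigned int u)
{
    if (u <= (unsigned int)INT_MAX) {
        return (int)u;  /* value fits as-is */
    }
    /* u - INT_MAX - 1 is computed in unsigned arithmetic and lands in
     * [0, INT_MAX], so the cast is value-preserving; the remaining
     * subtractions then move it into [INT_MIN, -1]. */
    return (int)(u - INT_MAX - 1) - INT_MAX - 1;
}
```

For example, `uint_to_int_twos(UINT_MAX)` gives `-1`, and the smallest input above `INT_MAX` gives `INT_MIN`.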

But yeah, I kind of agree with your C++ guy -- it's probably not worth it, given that you can likely assume that all machines represent int in two's complement.