G+: So here's a challenge

David Coles 02 Aug 2014

So here's a challenge. Given a bitarray represented by a Python integer (e.g 0x12345678) work what it's two's complement value is (assuming a 32-bit bitarray).

e.g. 0x00000001 => 1
0xffffffff => -1
0x00000001 => -2
0x7fffffff => 2147483647
0x80000000 > -2147483648

Despite all of Python's integers being based on 2s complement form, it turns out to be surprisingly tricky.

One hack is just to use ctypes.c_int32(0x12345678).value, but this feels a little dishonest.

I worked out a way using `value |= (-1 ^ 0xffffffff)` to sign-extend when the high-bit is set, but it's still pretty long and clunky and relies on a conditional statement.

(+1's) 1

Matt Giuca 02 Aug 2014

Using ctypes is technically system dependent (can't guarantee two's complement), so I like the direction of your final solution more. Note that in the other branch of your condition, you need to & it with 0x7fffffff.

Here's a decent solution which doesn't have a condition:

value = -(value & 0x80000000) | (value & 0x7fffffff)

David Coles 02 Aug 2014

Yup. That's what really urked me about it. My "safe solution" ended up using the struct module to convert to a bytestring with ">I" and then back again with ">i".

I think I can get away without the mask since we've already tested that the high-bit is 1, but you do need something to strip off any sign bit.

Your solution definitely feels much cleaner, I guess "the trick" being it uses the identity -0x80000000 = 0x80000000 (assuming 32-bit bitarrays). Then it's just nice bit-wise operations.

I'm actually surprised how sensible Python's bitwise operations are, even though they're done using signed integers. Sure there's a few gotchas (e.g. needing to mask off the sign-bit), but on the whole "infinite bits" integer representation is quite nice.

David Coles 02 Aug 2014

+Matt Giuca You might want to add your answer to https://stackoverflow.com/questions/1375897/how-to-get-the-signed-integer-value-of-a-long-in-python . I'd certainly give it a +1 over the current answers.

Matt Giuca 02 Aug 2014

"I guess "the trick" being it uses the identity -0x80000000 = 0x80000000 (assuming 32-bit bitarrays)."

That's not really it.

The trick is that for any value x with a single 1 bit, -x is a sign extension of x. For example -0x00008000 = 0xffff8000.

David Coles 03 Aug 2014

That's a much better way of putting it (I didn't mean trick in the negative sense, just "the tricky part" I couldn't get my head around). At a low level the "-" operator just flips all the bits and then adds 1, so for one-bit set values the set bit becomes the only unset bit, but the add 1 step bubbles up to set this bit again, while also clearing all the lower bits, right?

Usually when really stuck, I go and take a look at https://graphics.stanford.edu/~seander/bithacks.html, but I can't find this version here. I guess it's because it takes advantage of Python's "bit operations are treated as 2s complement representation" behavior (technically would be undefined behavior in C).

For arbitrary number of bits:

# b: Number of bits
lambda value, b: -(value & (1<<(b-1))) | (value & ((1<<(b-1)) - 1))

Seems like lots of people find this tricky: https://stackoverflow.com/questions/1604464/twos-complement-in-python

David Coles 03 Aug 2014

If you're wondering why I'm trying to do this, it's to deal with horrible Microsoft HRESULT style error codes (e.g. ones that look like 0x80000001 but stored in a signed int).

All sorts of platform specific assumptions (platform uses 32-bit ints, platform uses 2s complement representation, etc.). To Microsoft's credit, they do try and hide this with lots of typedefs and macros (e.g. FAILED, IS_ERROR), but I'm sure many people expect to be able to test errors by checking for a negative value.

Matt Giuca 03 Aug 2014

Maybe it would be best to just treat as a signed int and say >=0x80000000 is an error?

David Coles 03 Aug 2014

That was my original plan, but it breaks if people try and use the -12345 form and felt it's nicer to match what I'd get if I called said C API directly via ctypes (currently get error as a hexstring from another component on another host).