Friday, July 15, 2005

Fore-warned is fore-armed

Many networking and related applications have to marshal and unmarshal packets to and from structures. And it's not uncommon to have more than one field within a byte of a packet. So, what's the best way to construct a packet before sending it over the wire? Say, a SCSI CDB (command descriptor block - similar to an SNMP packet) has to be sent to a device target. Let's simplify the whole thing and say that just 1 byte consisting of eight 1-bit flags is to be sent over the wire. What's the best way to represent such a byte? The first solution that comes to anyone's mind would be bitfields. So, let's go ahead with that.

typedef struct CDB_Byte {
  unsigned char a:1;
  unsigned char b:1;
  unsigned char c:1;
  unsigned char d:1;
  unsigned char e:1;
  unsigned char f:1;
  unsigned char g:1;
  unsigned char h:1;
} CDB_Byte;

Now I want to set only the LSB (least significant bit) and the one next to it. Okay, that's simple.

CDB_Byte byte = {};    // zero all eight flags first
byte.h = 1;
byte.g = 1;

But I have a question. Is this the correct way to do it? The answer is NO. Let me illustrate why this isn't the correct way with a simple example.

#include <cstring>     // std::memcpy
#include <iostream>    // std::cout

unsigned char c = 0x01;    // LSB set
CDB_Byte byte;
std::memcpy(&byte, &c, 1);    // copy the bits into the bitfield struct
// to change the value of 'c' to 0x03
byte.g = 1;    // set the bit next to the LSB (or so we hope)
std::memcpy(&c, &byte, 1);     // copy the bits back
std::cout << (unsigned)c;

Will this print '3'? The answer is 'it depends'. The language standard leaves the allocation order of bitfields implementation-defined, so compilers are free to lay them out as they see fit. On Linux, the gcc compiler allocates the bitfields starting from the LSB. This can be verified: the above piece of code prints '65' (0x41), because 'a' lands on the LSB and 'g' lands on bit 6.

hgfedcba    (MSB on the left, LSB on the right)

However, on HP-UX, the aCC compiler allocates bitfields starting from the MSB. There, the above piece of code prints '3' (as one might have expected in the first place).

abcdefgh    (MSB on the left, LSB on the right)
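
If you are curious which way a given compiler goes, a small probe along these lines should tell you. This is only a sketch: the struct OneBitProbe and the function probe_first_bitfield are names I made up for illustration, and the code assumes the compiler packs the two fields into a single 8-bit byte.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical probe struct: one 1-bit field declared first, then 7 filler bits.
struct OneBitProbe {
    unsigned char first : 1;
    unsigned char rest  : 7;
};

// Returns the raw byte value produced by setting only the first declared field.
// 0x01 suggests allocation starts at the LSB; 0x80 suggests it starts at the MSB.
inline unsigned probe_first_bitfield() {
    static_assert(sizeof(OneBitProbe) == 1,
                  "assumption: the probe struct occupies exactly one byte");
    OneBitProbe p{};          // value-initialise so every bit starts at zero
    p.first = 1;
    unsigned char raw = 0;
    std::memcpy(&raw, &p, 1); // look at the underlying byte
    // Exactly one bit should be set, whichever bit the compiler picked.
    assert(raw != 0 && (raw & (raw - 1)) == 0);
    return raw;
}
```

The point is only that the answer comes from the compiler, not from the language, so the same source can legitimately give different results elsewhere.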

The moral of the story is this: never use bitfields to construct structures containing sub-byte fields. Always treat the packet as a raw chunk of memory (a void * buffer), extract the byte you need, and use masks and bitwise operators to set or extract the sub-byte fields. At the very least, if you expect your application to be portable, don't use bitfields! I would say that bitfields should never be used anyway, because there's no guarantee that the same application will behave the same way on the same platform if the compiler vendor rolls out a newer version.
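
Here is a minimal sketch of the mask approach for the one-byte example. The mask names (FLAG_A, FLAG_G, FLAG_H) and the helper functions are hypothetical; I'm assuming flag 'a' is meant to be the MSB and flag 'h' the LSB of the wire byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical masks: flag 'a' is the MSB and flag 'h' the LSB of the wire byte.
constexpr std::uint8_t FLAG_A = 0x80;
constexpr std::uint8_t FLAG_G = 0x02;
constexpr std::uint8_t FLAG_H = 0x01;

// Set the bits selected by mask.
inline void set_flag(std::uint8_t& byte, std::uint8_t mask) {
    byte = static_cast<std::uint8_t>(byte | mask);
}

// Clear the bits selected by mask.
inline void clear_flag(std::uint8_t& byte, std::uint8_t mask) {
    byte = static_cast<std::uint8_t>(byte & ~mask);
}

// Report whether any bit selected by mask is set.
inline bool test_flag(std::uint8_t byte, std::uint8_t mask) {
    return (byte & mask) != 0;
}

// Rebuild the byte from the example: start with only the LSB set,
// then set the bit next to it.
inline std::uint8_t build_example_byte() {
    std::uint8_t byte = FLAG_H;   // 0x01
    set_flag(byte, FLAG_G);       // now 0x03, regardless of bitfield layout
    assert(test_flag(byte, FLAG_G));
    return byte;
}
```

Because the bit positions are spelled out in the masks, the result no longer depends on how a compiler chooses to lay out bitfields.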

Fore-warned is fore-armed!

8 comments:

  1. sorry i tried i got one thing though:
    never use bitfields!!
    hehe did i do good?lol

  2. It's pretty clear that the compilers are allocating the bit-fields corresponding to the endian-ness of the system. Most likely your gcc compiler is generating code for Linux on x86 (little-endian) and the aCC for HP-UX on PA-RISC (big-endian). And bit-field operations are not portable across different-endian machines...

    Though there seems to be no written standard, the compilers seem to be sticking to an unwritten standard and following the endianness of the architecture. I think the code generated by both compilers would be the same if the target had been the same? Don't you think so?

  3. morning rose - very good for a beginner!

    sub0 - i don't think it's got anything to do with the architecture. my feeling is that it's just a coincidence that it matches the endianness in this case. think about it - endianness relates to "byte" ordering, NOT the ordering of bits within a byte. if that were the case, all the 1's complement and 2's complement rules would have gone for a toss, wouldn't they? and my post here follows the general theme that's advocated - "using things unspecified by standard causes undefined behavior"

  4. You are right... The byte ordering should not dictate how the bits are interpreted. For that moment, I thought endian-ness referred to the bit order as well. Thanks for clarifying...

  5. This is a good tutorial! Thanks for fore-arming others!!
    I already had this problem once during my coding!

    I couldn't figure out any other method than the void * one as you suggested! Anything different that can be done?

  6. sub0 - ya. for a moment, even i started to think if i figured out all this completely wrong...

    keshav - the void *, masks are the only correct and reliable way of doing such stuff. but, you can use bitfields if and only if...
    1) your application runs only on one platform
    2) your application is built using only a particular version of a particular compiler
    3) you prefer the easier handling of bitfields compared to masks and bitwise ops.
    Note that i've placed the ease of handling as the last point. ideally, this must not affect anyone's decision...

  7. Hey nice blog : I especially liked the way the comments expand in the same window instead of changing the window. Much more organized this way :-)

  8. anonymous - thanks, whoever you are :) and about the inline expansion of comments - well, it's nothing that a lil' javascript code can't do ;)
