Thursday, May 07, 2009

Math or Common Sense?

This is an offshoot of an interesting debate (albeit one that ended without a conclusion). We've been encountering quite a few out-of-memory conditions leading to crashes. We had not seen them before, but over the last week we've seen at least half a dozen of them. I'd always been surprised that most places in the code just assume that a dynamic allocation works... all the time. I was always told that the system had been sized so that it would never run into out-of-memory conditions. I thought it would just be a matter of time, and I guess I may have been right. I say "may" because there could be memory leaks that are leading to this situation. Fixing ALL the leaks may ensure that we do not run out of memory... not forever, but perhaps for now.

Now, on to the topic of the debate: Math or Common Sense. It's about how you go about designing a memory-usage model for your application.

Math
You know what your application is supposed to do and you know the algorithm that does the job. You work out the numbers using the algorithm's complexity and fix a peak memory usage. Then you ensure that you stay below the architecture-specified maximum memory. Simple? Well, hardly, I'd say! Let's list the pros and cons of this approach (a rough back-of-envelope sizing sketch follows the list):

Pros
  1. Simplifies coding - no ugly allocation failure handling anywhere in the code base! 
  2. Some people like reading the above line, so this makes point 2! 
Cons
  1. Freezes design and makes extensibility a complicated affair - with every major release of your product, one has to revisit the memory sizing to make sure that it's still enough for the new code that's going to be put in. If the existing sizing falls short, it's back to the drawing board!
  2. Memory leaks can wreak havoc - while it's a cardinal sin to leave memory leaks in the code, this approach gives no leeway whatsoever for leaks! That means more pressure to fix those leaks, fast!
  3. The end user's application suffers due to an erroneous sizing or a limitation of the underlying hardware
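
To make the "math" step concrete, here is the kind of back-of-envelope arithmetic it boils down to. The session limits, record sizes and the 64 MB budget below are all invented for illustration - a sketch of the exercise, not numbers from any real system.

    #include <stdio.h>

    /* Hypothetical sizing: at most MAX_SESSIONS sessions, each with one
     * fixed-size record plus a bounded number of buffers.  The "math"
     * approach fixes these limits up front and checks the total against
     * the memory budget the architecture hands you. */
    #define MAX_SESSIONS      5000
    #define SESSION_RECORD_SZ 512                    /* bytes per session record */
    #define BUFS_PER_SESSION  4
    #define BUF_SZ            2048                   /* bytes per buffer         */
    #define MEMORY_BUDGET     (64UL * 1024 * 1024)   /* 64 MB, architecture-given */

    int main(void)
    {
        unsigned long per_session = SESSION_RECORD_SZ + BUFS_PER_SESSION * BUF_SZ;
        unsigned long peak        = (unsigned long)MAX_SESSIONS * per_session;

        printf("peak usage   : %lu bytes (%.1f MB)\n", peak, peak / (1024.0 * 1024.0));
        printf("budget       : %lu bytes\n", MEMORY_BUDGET);
        printf("within budget: %s\n",
               peak <= MEMORY_BUDGET ? "yes" : "NO - back to the drawing board");
        return 0;
    }

The catch, as the cons above point out, is that every new feature means redoing this arithmetic.
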
Common Sense
You design and write code knowing that out-of-memory conditions are a fact of life, and you are prepared to handle such conditions gracefully. At worst, a sub-task in your application may fail - at least it does not bring the whole application down! You may still need to do some sizing - but that would just be to ensure that you do not hit out-of-memory conditions very frequently, and nothing more than that. This overcomes the cons listed above, but it may make the coding more complicated. Well, I'm a firm believer that it's the design that should be simple, not the code!
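
As a minimal sketch of what that graceful handling could look like (the sub-task and the sizes here are hypothetical, not taken from our codebase): a failed allocation is reported and the sub-task bows out, while the application carries on.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical sub-task: grab a scratch buffer, do some work, clean up.
     * If the allocation fails, the sub-task reports failure and bows out -
     * the caller, and the application as a whole, keep running. */
    static int run_subtask(size_t nbytes)
    {
        char *scratch = malloc(nbytes);
        if (scratch == NULL) {
            fprintf(stderr, "subtask: could not get %zu bytes, skipping\n", nbytes);
            return -1;                    /* graceful failure, not a crash */
        }

        memset(scratch, 0, nbytes);       /* ...the real work would go here... */

        free(scratch);
        return 0;
    }

    int main(void)
    {
        if (run_subtask(64UL * 1024 * 1024) != 0)
            puts("sub-task failed, application continues");
        else
            puts("sub-task succeeded");
        return 0;
    }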

There is one way to make the math design work: your model should ensure that memory allocation ALWAYS succeeds. Either make allocations a blocking call, or implement some sort of resource wait that suspends the task until enough memory becomes available. Crashing or aborting an application just because an allocation fails is plain cruelty!
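
A rough sketch of that blocking-allocation idea, assuming a POSIX threads environment (blocking_malloc and blocking_free are made-up names for illustration):

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t mem_lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  mem_freed = PTHREAD_COND_INITIALIZER;

    /* Illustrative only: an allocator wrapper that never returns NULL.  On
     * failure the calling task is suspended until some other task frees
     * memory, then the allocation is retried. */
    void *blocking_malloc(size_t nbytes)
    {
        void *p;

        pthread_mutex_lock(&mem_lock);
        while ((p = malloc(nbytes)) == NULL) {
            /* Out of memory: wait for blocking_free to signal, then retry.
             * This can wait forever, so a real version would want a
             * timeout or a watchdog. */
            pthread_cond_wait(&mem_freed, &mem_lock);
        }
        pthread_mutex_unlock(&mem_lock);
        return p;
    }

    void blocking_free(void *p)
    {
        pthread_mutex_lock(&mem_lock);
        free(p);
        pthread_cond_broadcast(&mem_freed);   /* wake tasks waiting for memory */
        pthread_mutex_unlock(&mem_lock);
    }

Whether an unbounded wait is acceptable is another matter - one of the comments below points out that in a leaking system such a wait may never end.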

Feel free to speak your mind if you have an opinion on this.

5 comments:

  1. I think I'm missing something here, but shouldn't the ideal situation be "Math + Common Sense"?

    Lemme elaborate...
    To sum up your "common sense" section, an application/algorithm CANNOT afford to ignore out-of-memory situations and should do sufficient error checking. If I missed something that you wanted to say here, tell me...

    So, my point is, when you're writing something you end up doing both math and common sense, right? You make a rough estimate that these are going to be my memory requirements, based on the data structures you choose (keeping in mind the algorithm's complexity, an excellent point that you made)... And then you go about coding in a way that doesn't break the application, something like

    ptr = calloc(nmemb, size);
    if (ptr == NULL) {
        /* handle gracefully */
    }

    Am I making sense?

  2. Quoting from my "Common Sense" paragraph:

    "You still may need to do some sizing - but this would just be to ensure that you do not hit out of memory conditions very frequently and nothing more than that"

  3. Do you just want me to re-read it, or did you want to make a point there? I didn't get your last comment... Maybe you didn't get my point then :)

  4. This comment has been removed by the author.

  5. Ah! The joys of memory leaks :)

    Well, in addition to math and common sense, may I suggest another option? Engineering.

    1. Custom memory management: Having your own memory manager that, for example, uses an arena allocation scheme could ensure that you can gracefully handle OOM cases per 'sub-task', as you said (by giving each task its own arena). Also, there is the benefit that you don't have to rewrite your entire codebase to handle OOM. You could just have your manager override libc's *alloc calls and put all your graceful handling in there.

    2. A really brave (difficult but most effective?) solution would be to implement garbage collection. It really makes sense in memory-constrained environments. And no - it's not as performance-expensive as it used to be. There has been a tonne of research on making GC faster and there are practical solutions that work very well.

    Admittedly, No. 2 is very ambitious and may not be feasible - but I think No. 1 is just a couple of man-weeks of work at worst.

    As for the case you mentioned where allocation always succeeds - in a really constrained and leaking environment, it's likely that your blocking call will block indefinitely, which is as bad as, if not worse than, crashing - don't you think?

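To make the arena suggestion in comment 5 concrete, here is a minimal bump-pointer arena sketch. The names, the sizes and the alignment choice are illustrative, not from any real memory manager.

    #include <stdlib.h>

    /* A trivial bump-pointer arena: one malloc up front, then cheap
     * sub-allocations out of it.  If the arena runs dry, only the owning
     * sub-task sees the failure; cleanup is a single arena_destroy. */
    struct arena {
        char  *base;
        size_t size;
        size_t used;
    };

    int arena_init(struct arena *a, size_t size)
    {
        a->base = malloc(size);
        a->size = size;
        a->used = 0;
        return a->base != NULL ? 0 : -1;
    }

    void *arena_alloc(struct arena *a, size_t nbytes)
    {
        /* round up so later allocations stay reasonably aligned */
        nbytes = (nbytes + 15) & ~(size_t)15;
        if (a->base == NULL || a->size - a->used < nbytes)
            return NULL;                 /* only this sub-task is out of memory */
        void *p = a->base + a->used;
        a->used += nbytes;
        return p;
    }

    void arena_destroy(struct arena *a)
    {
        free(a->base);
        a->base = NULL;
        a->size = a->used = 0;
    }

A sub-task would call arena_init when it starts, use arena_alloc in place of malloc, and call arena_destroy on exit; an exhausted arena fails only that sub-task, and the single teardown also keeps leaks from piling up.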
