Monday, February 23, 2009

A Dearth of Debugging

There seem to be two schools of thought on getting to the bottom of a piece of buggy software: those who log everything and those who use the debugger. I've been surprised to learn how large the former camp is, at least among those developing web-based applications. I'm not saying that one should use either approach exclusively, just that the debugger is underused in web programming.

I suppose one reason for this is the perceived difficulty of debugging server-side code from a client machine, but this is more a lack of knowledge than an actual technical obstacle. Most application servers support some facility for remote debugging. Similarly, in Java at least, the virtual machine itself can be configured to accept connections from remote debugger clients. If, for example, you wanted the JVM to listen for debugger connections on port 8000, you'd include this in the command-line arguments: -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 (on Java 5 and later, the equivalent -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 is the preferred form). Once the JVM is started this way, a debugger can connect to it remotely and do everything it could do if it were running on the same machine: you can set breakpoints, inspect the contents of memory, evaluate statements, and see the full stack trace for every running thread, all of which are quite powerful when hunting an elusive bug.
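Before pointing a debugger at a server, it can help to confirm that the JVM really was started with the debug agent. Here is a minimal sketch of such a check; the class name DebugAgentCheck and the helper hasDebugAgent are my own inventions for illustration, and the only library call involved is the standard RuntimeMXBean.getInputArguments().

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether this JVM was launched with the JDWP debug agent.
class DebugAgentCheck {

    // Returns true if any JVM argument enables JDWP, using either the
    // legacy -Xrunjdwp: form or the newer -agentlib:jdwp form.
    static boolean hasDebugAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp:") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Debug agent enabled: " + hasDebugAgent(jvmArgs));
    }
}
```

Run with the flags shown above, this should report that the agent is enabled; run without them, it should report that it is not.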

The heuristic I use for deciding whether to use the debugger or a logger to find the root of an issue is as follows:

If the bug occurs infrequently and it is not clear how to duplicate it, I use a logger to see if I can detect a pattern. If I can, then I'll use the debugger to look deeper into things. Conversely, if it's clear how to duplicate the bug, I'll usually jump right in with the debugger.


Here are some resources for remote debugging:

Sunday, February 22, 2009

Volatility

In the last few months, I've found myself writing a lot more multi-threaded programs than I usually do. This made me reflect upon some of the constructs Java provides for multi-threading. Two such constructs are the volatile and synchronized keywords. In my years working with a wide range of developers, I've found that most know (at least at a cursory level) what synchronized means, but very few know what volatile means or when it should be used.


The semantics of the volatile keyword are similar to those of the synchronized keyword, but they're not equivalent. Correct multi-threaded programs depend on three primary properties: visibility (the ability of one thread to see the work of another), atomicity (treating a set of actions as indivisible), and ordering (enforcing that some operations in one thread happen before others in another). Using synchronized, a developer can enforce all three. volatile, however, provides visibility and ordering but not atomicity: a single read or write of a volatile field is indivisible, but a compound action such as count++ is not.
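To make the atomicity gap concrete, here is a small sketch that increments two counters from several threads: one merely volatile, the other guarded by synchronized. The class CounterDemo and its method names are made up for this illustration.

```java
// Hypothetical demo: a volatile counter versus a synchronized one.
class CounterDemo {
    private volatile int volatileCount = 0; // visible to all threads, but ++ is not atomic
    private int lockedCount = 0;            // only touched while holding this object's lock

    // Read-modify-write on a volatile field: two threads can read the same
    // value and both write back value + 1, losing an update.
    void incrementVolatile() { volatileCount++; }

    // The lock makes the whole read-modify-write indivisible.
    synchronized void incrementLocked() { lockedCount++; }

    int volatileCount() { return volatileCount; }
    synchronized int lockedCount() { return lockedCount; }

    // Starts the given number of threads, each incrementing both counters perThread times.
    static CounterDemo run(int threads, int perThread) {
        CounterDemo counter = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementVolatile();
                    counter.incrementLocked();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker before reading the totals
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        CounterDemo counter = run(4, 100_000);
        System.out.println("synchronized: " + counter.lockedCount());
        System.out.println("volatile:     " + counter.volatileCount());
    }
}
```

The synchronized total always equals threads times perThread. The volatile total can come up short, by an amount that varies from run to run, because some increments overwrite each other.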


Why would one want to use volatile over synchronized, since it seems to be a less powerful construct? Well, for one thing, it typically incurs less overhead than acquiring a lock with synchronized. Additionally, volatile can be applied to primitive fields and to references that may be null (you can't synchronize on a primitive at all, and synchronizing on a null reference throws a NullPointerException). One common use case is a flag that is updated by one thread and read by another. Declaring it volatile ensures that the reading thread sees the current value rather than a stale cached copy.
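The flag use case might look something like the sketch below. The names StopFlagDemo, running, and stopsWithin are hypothetical, chosen only for this example.

```java
// Hypothetical demo: one thread spins on a flag that another thread clears.
class StopFlagDemo {
    // Without volatile, the worker may never observe the write made by
    // another thread, since the read can be hoisted out of the loop or
    // served from a stale cached copy.
    private volatile boolean running = true;
    private long iterations = 0; // written only by the worker thread

    void stop() { running = false; }

    void work() {
        while (running) {
            iterations++;
        }
    }

    // Starts a worker, clears the flag from the calling thread, and reports
    // whether the worker finished within the given timeout.
    static boolean stopsWithin(long timeoutMillis) {
        StopFlagDemo demo = new StopFlagDemo();
        Thread worker = new Thread(demo::work);
        worker.setDaemon(true); // never keep the JVM alive on the worker's account
        worker.start();
        demo.stop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Worker stopped: " + stopsWithin(5000));
    }
}
```

Note that the flag is written by exactly one thread and only read by the other, so no compound action is involved and the lack of atomicity never comes into play.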


Java is an interesting programming language for many reasons, not the least of which is the way it insulates the developer from so many concerns of the low-level machine. The difference between volatile and synchronized, however, illustrates that, despite this abstraction, developers are obligated to understand the inner workings of the JVM if they are to write correct and efficient code.