Saturday, July 12, 2014

Kafka, Voldemort, oh my!



The quintessential American writer Mark Twain observed, and he may well have been thinking of coincidences when he did so, that
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't.

And so it was this past Friday that an esteemed coworker referred in passing to a recurring theme from the works of Franz Kafka. Then, later the same day, over the lunch hour to be precise, as I checked my personal email, I read a new message from a former coworker and friend, telling me about his use of Kafka in his latest software project... No, not that Kafka :-)

I'm not referring to the author Kafka, whom several of you will remember, as I did, from undergraduate English Literature classes as the author of that eerily surreal story, entitled Metamorphosis, in which the protagonist wakes up one morning to discover that he has, overnight, metamorphosed into a giant cockroach; good grief, some authors do get carried away with their imagination, at times, don't they!

The Kafka I have in mind, and which my friend is currently using in his software project, is Apache Kafka, a high-throughput, distributed messaging system. In the words of the Apache Kafka Coding Guide:
Kafka is system software, and certain things are appropriate in system software that are not appropriate elsewhere. Sockets, bytes, concurrency, and distribution are our core competency which means we will have a more "from scratch" implementation of some of these things then would be appropriate for software elsewhere in the stack. This is because we need to be exceptionally good at these things. This does not excuse fiddly low-level code, but it does excuse spending a little extra time to make sure that our file-system structures, networking code, threading model, are all done perfectly right for our application rather than just trying to glue together ill-fitting off-the-shelf pieces (well-fitting off-the-shelf pieces are great though)...
OK, referring one more time to the title of this essay (Kafka, Voldemort, oh my!), it's not that Voldemort either :-)

I do confess, though, that the Harry Potter movies (whose characters include that devious and dastardly Voldemort, aka he-who-must-not-be-named) are aesthetically and technologically among the best-made in cinematic history, almost right up there with Blade Runner, the classic, futuristic movie starring Harrison Ford (based on the book Do Androids Dream of Electric Sheep?).

The Voldemort project which I have in mind here is a software project that provides a distributed key-value storage system. Data is automatically replicated over multiple servers, and each node is independent of other nodes, with no central point of failure or coordination. Concurrency in Voldemort, as I understand it from having briefly glanced at its API, makes use of the usual advisory annotations for thread safety.
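For readers who have not met such annotations before: they are markers that document a class's threading contract for the benefit of other programmers (and of static-analysis tools), and neither the compiler nor the JVM enforces them. Here is a minimal sketch of the idea in plain Java. The annotation names (Threadsafe, NotThreadsafe) and the two example classes are my own, chosen for illustration, and are not taken from the Voldemort code-base.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical marker: instances may be shared freely across threads. Advisory only. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Threadsafe {}

/** Hypothetical marker: callers must confine each instance to one thread. Advisory only. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface NotThreadsafe {}

/** The annotation states the contract; the AtomicLong is what actually honors it. */
@Threadsafe
final class HitCounter {
    private final AtomicLong hits = new AtomicLong();

    long record() {
        return hits.incrementAndGet(); // atomic read-modify-write
    }

    long total() {
        return hits.get();
    }
}

/** StringBuilder is unsynchronized, so the class warns its callers accordingly. */
@NotThreadsafe
final class RequestBuffer {
    private final StringBuilder buffer = new StringBuilder();

    void append(String chunk) {
        buffer.append(chunk);
    }

    String contents() {
        return buffer.toString();
    }
}
```

Note that removing the annotations would change nothing about how either class behaves at run time; their value lies entirely in telling the next reader which of the two is safe to hand to a thread pool.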

At any rate, having laid out very briefly the concurrency aspect of each of Kafka and Voldemort, the case can be made, as I hear it, that using the Scala programming language would make a given code-base less cluttered, and hence that much easier to grok than, say, the equivalent version written in Java. But neither the shepherds nor the users of Java have been sitting idly, and we fortunately have first-class resources on which to rely.

And speaking of concurrency, consider this essay a more specialized follow-up to the fuller, albeit general, discussion of concurrent programming (for the JVM) in my previous essay, entitled Java Concurrency in Retrospect. Heading the list of first-class resources on which we Java programmers can rely is of course the amazingly rich and highly regarded book Java Concurrency in Practice, by Brian Goetz, et al. (Addison-Wesley). Among the other excellent resources, I would recommend these without any reservation:
  • Concurrent Programming in Java: Design Principles and Patterns, 2nd Edition, by Doug Lea (Addison-Wesley)
  • Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce (Addison-Wesley)
  • Programming Concurrency on the JVM: Mastering Synchronization, STM, and Actors, by Venkat Subramaniam (Pragmatic Bookshelf)
Of these, Growing Object-Oriented Software, Guided by Tests is a gem of a book. The authors get to the heart of the matter by noting that
There’s no getting away from it: concurrency complicates matters. It is a challenge when doing test-driven development. Unit tests cannot give you as much confidence in system quality because concurrency and synchronization are system-wide concerns. When writing tests, you have to worry about getting the synchronization right within the system and between the test and the system. Test failures are harder to diagnose because exceptions may be swallowed by background threads or tests may just time out with no clear explanation. 
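To make the point about swallowed exceptions concrete, here is a minimal sketch, in Java, of one common way to bring a background thread's failure back to the thread running the test. It is not taken from the book; the class name ConcurrentTestHarness and the method runConcurrently are hypothetical names of my own. The idea is to submit the work to an ExecutorService and call get() with a timeout on each Future, so that a worker's exception comes back wrapped in an ExecutionException and a hung worker shows up as a TimeoutException rather than an unexplained stall.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper; a sketch, not a replacement for a real testing library. */
final class ConcurrentTestHarness {

    /**
     * Runs the task once on each of `threads` workers.
     * Returns an empty Optional if every worker finished normally, or a
     * description of the first failure seen. The timeout applies to each
     * worker's Future in turn, not to the run as a whole.
     */
    static Optional<String> runConcurrently(int threads, Runnable task, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (ExecutionException e) {
                    // The worker threw; its exception is the cause.
                    return Optional.of("worker failed: " + e.getCause());
                } catch (TimeoutException e) {
                    return Optional.of("worker timed out after " + timeoutMillis + " ms");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.of("test thread was interrupted");
                }
            }
            return Optional.empty();
        } finally {
            pool.shutdownNow(); // do not leak threads between tests
        }
    }
}
```

A test would then assert that the returned Optional is empty, and print its contents when it is not. Returning a description rather than throwing is merely a choice made to keep the sketch small; a real test helper might prefer to rethrow the worker's exception so that the stack trace survives.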
Meanwhile, our conceptualization of concurrency (i.e. how we see concurrency) will surely continue to evolve, accelerated and refined by the emergence of the polyglot languages targeting the JVM...
We do not see things as they are, we see them as we are.
~ Anais Nin, in Cities of the Interior
