Serialization Round Four

Serialization is the conversion of an object into a magical stream of data that can later be deserialized back into an object.  At first glance you could probably be convinced that this is a totally unnecessary thing for objects to be able to do.  For most simple projects it never even comes up, so you'd be right to figure it's useless there.  For QubeKwest, however, it isn't just needed, it's needed a lot.

Consider for a moment all the data in QubeKwest.  Everything from the world you are walking around in to chat messages between players is just data at the end of the day.  If the objects in the game can be easily converted to data and then easily restored, you suddenly have lots of fun new options for how they can be used.  Pure data versions can be shuttled down a network wire or stored away in files on the disk.  Later, when you need the objects again, they can be easily loaded from the disk or reconstructed by the client on the other end of the network.

There are loads of different ways to serialize data.  Some are crazy flexible, some are practically automatic, some involve elaborate frameworks, and others are hard-coded.  Some produce serialized data that is large and fluffy, and others produce data that is small and concise.  Due to the vastness of the worlds I'm trying to build with QubeKwest, it makes sense to ensure my serializer produces as little data as possible.  That costs me some flexibility, but it means I can store larger worlds in less space and use less bandwidth while you're playing.

I set out to make my own serialization patterns, and to say it's been an evolutionary process doesn't give it proper credit.  My first idea involved saving things into 32-bit integers.  I built a collection of functions for aligning data within those integers and started laying down the basic patterns of how my serialization would work.  Almost immediately I realized that serializing things into integers was a total pain in the butt and really not worth the trouble.  With that, version 1 died a horrible death.
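To give a feel for why this got painful, here's a rough sketch of packing several fields into one 32-bit int. The layout (a 16-bit type id, an 8-bit height, and two 4-bit coordinates) and the `PackedBlock` name are hypothetical, not the real QubeKwest format. Every field needs its own shift and mask, and anything that doesn't fit is silently truncated.

```java
// Hypothetical layout, high bits to low bits: [type:16][y:8][z:4][x:4]
final class PackedBlock {
    private PackedBlock() {}

    // Mask each field to its width, then shift it into position.
    static int pack(int type, int x, int y, int z) {
        return ((type & 0xFFFF) << 16)
             | ((y & 0xFF) << 8)
             | ((z & 0xF) << 4)
             | (x & 0xF);
    }

    // Unsigned shift (>>>) so a type id with the top bit set doesn't sign-extend.
    static int type(int packed) { return (packed >>> 16) & 0xFFFF; }
    static int y(int packed)    { return (packed >>> 8) & 0xFF; }
    static int z(int packed)    { return (packed >>> 4) & 0xF; }
    static int x(int packed)    { return packed & 0xF; }
}
```

For example, `PackedBlock.pack(1, 16, 0, 0)` stores an x of 0, because 16 needs five bits and only four were reserved. Bugs like that are easy to write and hard to spot.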

Later, while learning the finer points of NIO and LWJGL, I realized that both make rather heavy use of ByteBuffers.  My integer pattern could be adapted fairly easily to use bytes instead, which let me throw away all the issues with aligning data in integers and all the masking and shifting that involves.  Even more conveniently, ByteBuffers already have methods for storing and retrieving every primitive type I could possibly need.  They even let you put one ByteBuffer into another, so objects that contain other serializable objects are easier to deal with.  This was version 2, and it died when I tried to deserialize things without any direct knowledge of what I was deserializing.
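As a sketch of that ByteBuffer approach, assuming two made-up types (`BlockPos`, and a `Marker` that contains one), the nested object's bytes can be copied straight into the parent with `ByteBuffer.put(ByteBuffer)`. There's no header here, which mirrors version 2: the reader has to already know it's looking at a `Marker`.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size type: three ints, 12 bytes total.
record BlockPos(int x, int y, int z) {
    static final int SIZE = 3 * Integer.BYTES;

    ByteBuffer serialize() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putInt(x).putInt(y).putInt(z);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    static BlockPos deserialize(ByteBuffer buf) {
        // Java evaluates arguments left to right, so the reads happen in write order.
        return new BlockPos(buf.getInt(), buf.getInt(), buf.getInt());
    }
}

// Hypothetical type that contains another serializable object.
record Marker(short id, float brightness, BlockPos pos) {
    static final int SIZE = Short.BYTES + Float.BYTES + BlockPos.SIZE;

    ByteBuffer serialize() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putShort(id).putFloat(brightness);
        buf.put(pos.serialize()); // copy the nested object's bytes in directly
        buf.flip();
        return buf;
    }

    static Marker deserialize(ByteBuffer buf) {
        short id = buf.getShort();
        float brightness = buf.getFloat();
        return new Marker(id, brightness, BlockPos.deserialize(buf));
    }
}
```

Nothing in those 18 bytes says "this is a Marker", so `Marker.deserialize` only works when the caller already knows what it's holding.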

The next version added the idea of a header to the serialized data.  I could use that to identify what type of data I'm looking at while it's still in its serialized form.  The header itself was defined only by a collection of constants and the way several helper methods dealt with it.  This wasn't a great solution, but it lined up with the mess I'd been crafting.  This version also introduced the idea of variable-sized data objects.  I freely admit that it was wishful thinking to pretend that everything could be a fixed size, but I knew that variable-sized things were going to add a lot of complexity.
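Here's a minimal sketch of a version-3-style header, assuming a one-byte type id followed by a four-byte payload length. The `HeaderV3` name, the `TYPE_CHAT_MESSAGE` value, and the layout are all made up for illustration. The length prefix is what makes variable-sized data workable: a reader can size its array exactly, or skip a record it doesn't care about.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical version-3-style header: loose constants plus static helpers.
// Layout: [type:byte][payloadLength:int][payload bytes...]
final class HeaderV3 {
    private HeaderV3() {}

    static final byte TYPE_CHAT_MESSAGE = 7; // hypothetical type id
    static final int HEADER_SIZE = Byte.BYTES + Integer.BYTES;

    // Prefix a payload with its type and length.
    static ByteBuffer wrap(byte type, ByteBuffer payload) {
        int length = payload.remaining();
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + length);
        buf.put(type).putInt(length).put(payload);
        buf.flip();
        return buf;
    }

    // Look at the type without consuming it (an absolute get leaves position alone).
    static byte peekType(ByteBuffer buf) {
        return buf.get(buf.position());
    }

    // Skip a whole record, even one whose type we don't understand.
    static void skip(ByteBuffer buf) {
        buf.get();                 // type
        int length = buf.getInt(); // payload length
        buf.position(buf.position() + length);
    }

    // Chat text is variable-sized; the header's length field records how big it is.
    static ByteBuffer serializeChat(String text) {
        return wrap(TYPE_CHAT_MESSAGE, ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8)));
    }

    static String deserializeChat(ByteBuffer buf) {
        byte type = buf.get();
        if (type != TYPE_CHAT_MESSAGE) {
            throw new IllegalArgumentException("not a chat message: type " + type);
        }
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

Note that the header's shape lives only in the order of calls inside these static helpers, which is exactly the kind of scattered knowledge that becomes hard to maintain.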

Before long, I realized that I would probably need multiple versions of a particular type of object over time.  I know I saw this coming, but for some reason I didn't include it in the header data I was using.  The problem is that if I couldn't identify which version I'd serialized, I wouldn't be able to load it properly.  With that, version 3 also passed on.
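A sketch of what carrying a version number buys you, assuming a hypothetical `SignData` type whose format gained a color field in its second version (the type id is left out to keep it short). New data is always written as the latest version, and the reader switches on the stored version so old saves still load.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical type whose format changed between versions.
// v1: [version:byte][textLength:int][utf8 text]
// v2: [version:byte][color:int][textLength:int][utf8 text]
record SignData(int color, String text) {
    static final byte VERSION_1 = 1;
    static final byte VERSION_2 = 2;
    static final int DEFAULT_COLOR = 0x000000; // assumed color for signs saved as v1

    // Always write the newest version.
    ByteBuffer serialize() {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Byte.BYTES + 2 * Integer.BYTES + utf8.length);
        buf.put(VERSION_2).putInt(color).putInt(utf8.length).put(utf8);
        buf.flip();
        return buf;
    }

    // Read whichever version was stored.
    static SignData deserialize(ByteBuffer buf) {
        byte version = buf.get();
        return switch (version) {
            case VERSION_1 -> new SignData(DEFAULT_COLOR, readText(buf));
            case VERSION_2 -> {
                int color = buf.getInt();
                yield new SignData(color, readText(buf));
            }
            default -> throw new IllegalArgumentException("unknown SignData version " + version);
        };
    }

    private static String readText(ByteBuffer buf) {
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

Without that leading version byte, a v2 reader would interpret a v1 sign's text length as a color and then misread everything after it.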

The evolution continued when I realized that I didn't just want versions; I'd also created an honest-to-goodness mess of things for serialization.  My interface had static methods on it to avoid creating dummy objects just for the sake of asking them questions about serialization.  The problem with that is that the static methods lived on the interface itself, and while each implementing class could declare its own matching static methods, nothing enforced it (static methods in Java can't actually be overridden).  Every time I implemented a new object, I had to remember to create those too.  As mentioned before, my header data was a twisted jumble of constants and helper methods.  To fix that, I created actual serialization header classes.  That meant a bunch of my helper methods finally had a proper place to live, and all the constants that described where things go in the header could simply be removed.
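Roughly, that shape might look like the sketch below. The `SerialHeader` layout (a two-byte type id, a one-byte version, and a four-byte payload length), the `QSerializable` interface, and the `Ping` example are hypothetical stand-ins rather than the actual QubeKwest classes. The key difference from the static-method approach is that `typeId()`, `version()`, `payloadSize()`, and `writePayload()` are abstract instance methods, so the compiler refuses to build a class that forgets one.

```java
import java.nio.ByteBuffer;

// Hypothetical header class: the helpers and offset constants that used to
// float around now live in one place.
record SerialHeader(short typeId, byte version, int payloadLength) {
    static final int SIZE = Short.BYTES + Byte.BYTES + Integer.BYTES; // 7 bytes

    void write(ByteBuffer buf) {
        buf.putShort(typeId).put(version).putInt(payloadLength);
    }

    static SerialHeader read(ByteBuffer buf) {
        // Arguments are evaluated left to right, matching the order in write().
        return new SerialHeader(buf.getShort(), buf.get(), buf.getInt());
    }
}

// Hypothetical interface: abstract instance methods are enforced by the compiler.
interface QSerializable {
    short typeId();
    byte version();
    int payloadSize();
    void writePayload(ByteBuffer buf);

    // Shared logic: header first, then the implementation's payload.
    default ByteBuffer serialize() {
        int size = payloadSize();
        ByteBuffer buf = ByteBuffer.allocate(SerialHeader.SIZE + size);
        new SerialHeader(typeId(), version(), size).write(buf);
        writePayload(buf);
        buf.flip();
        return buf;
    }
}

// Hypothetical implementation: a network ping carrying a timestamp.
record Ping(long timestamp) implements QSerializable {
    static final short TYPE_ID = 3;
    static final byte VERSION = 1;

    public short typeId() { return TYPE_ID; }
    public byte version() { return VERSION; }
    public int payloadSize() { return Long.BYTES; }
    public void writePayload(ByteBuffer buf) { buf.putLong(timestamp); }

    static Ping deserialize(ByteBuffer buf) {
        SerialHeader header = SerialHeader.read(buf);
        if (header.typeId() != TYPE_ID || header.version() != VERSION) {
            throw new IllegalArgumentException("unexpected header: " + header);
        }
        return new Ping(buf.getLong());
    }
}
```

In this sketch, reading is still a static factory on each type, since there's no object to call a method on until the bytes are decoded, so that one piece remains a convention rather than something the compiler checks.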

Attempt number 4 was born as I cleaned up all of the serialization patterns that were broken and sad.  It was a total refactor.  To make the serialization more logical to use from other packages, I also pulled it into its own package.  All these changes were just the tip of a pretty serious iceberg, because after I started the refactor, every object I had already written serialization functions for had to be revised to follow the new patterns I'd just established.  Now, two days later, I think I've finally finished, and I'm actually really happy with how it all works.  Hopefully this version will be the last one, but if I'm being honest, I sort of doubt it.