Streaming Wholeness

A while back I had a brief moment of horror spawned from the idea that I was doing my networking stuff all wrong.  The good news is that I wasn’t totally wrong; the bad news is that I needed to make more stuff to do things correctly.  The premise is actually pretty simple.  I had assumed that when I sent a serialized 100 byte object across the network, it would arrive at the other end as a convenient 100 byte read of data from the network.  I’m not entirely sure why I thought this would be the case, but I have some theories.

First, every single tutorial I’d seen online or in books had no mention of this problem.  Second, I’m pretty sure that on localhost, this would be pretty close to how it actually works.  Third, the tutorials were sending things that didn’t actually have any logic behind them.  All these things settled into my brain as a simple “well, this must just work.”

The horror was at the theory that I was somehow mistaken.  I’m not really sure where that horror came from either.  It just sort of popped into my head and left shortly afterwards, only to appear again later on.  I think perhaps I realized that the way I had been coding it felt a bit like I was getting away with something.  As it turns out, I was right about that, unfortunately.

Let’s jump back to that 100 byte object.  If I send it, followed by 3 more of them, I’ve now sent 400 bytes of data, and all this happened as 4 simple writes on the sending end.  On the receiving end, there is the whole interwebs in the way.  So, while the 4 blobs of 100 bytes are guaranteed by the protocol to arrive in the correct order, and in their entirety, there is nothing whatsoever that guarantees that they will arrive as 4 convenient 100 byte deliveries.  In fact, the worst-case scenarios are vastly more likely.

Before we delve into the example above, let’s take the same approach most tutorials use.  We will send “Hello World” as our data, all at once.  If it all arrives at once, and we show it on the screen, we will see “Hello World” and all will be just great.  If it arrives in 3 pieces containing “Hel”, “lo Wo”, and “rld”, and we show them on the screen as they arrive, we actually end up with the exact same result.  “Hello World” is on the screen, and everyone is happy about it.

The problem actually comes from the fact that I’m not just sending arbitrary strings of text across the network.  I’m sending carefully crafted collections of bytes that represent the various data structures in the game.  Everything from a chat message to a chunk full of blocks in the world just boils down to a pile of bytes.  The problem is of course that if I don’t have all the bytes for a data structure to repopulate all its fields, then I can’t reconstruct it properly.

Armed with this new wrinkle in the data I’m sending, let’s return to the example from above.  We’ll send 4 pieces of 100 bytes.  Imagine that they arrive in the following deliveries of data:  1 byte, 72 bytes, 285 bytes, 42 bytes.   Due to the structure of my serialized data, that first 1 byte delivery isn’t even enough information for me to figure out what kind of thing it was that I received.  The next 72 bytes allows me to determine the type of data I’m dealing with, which in turn allows me to determine the amount of data I need to recreate something of that type.  With 73 bytes delivered so far, you’ll probably notice that I still don’t have enough information to actually do that (since our hypothetical data blobs are 100 bytes each).  Next, we receive 285 bytes all at once, thanks to the magic of how networks work.  That means we are up to 358 bytes, which also means we can now reconstruct our first transmission successfully.  But wait, we can also reconstruct the second and the third objects as well, so we should definitely do that.  We now receive 42 more bytes, which brings us up to 400 bytes received, and we finally have the ability to finish the process of dealing with the data that was sent to us.

To help handle this situation, I have now created a class I call the DeserializationStreamer.  This new structure buffers incoming data, determines when enough bytes have arrived to deserialize something, and performs the actual deserialization, since it has to manage the buffers it’s using when that happens.  This was a long and complicated story, but the moral is probably something like “things are seldom as easy as they seem, and they are always way worse when networking is involved.”  Or something like that.
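
The gist of the buffering can be sketched like this.  To keep the sketch self-contained I’m assuming a simple 4-byte length prefix in front of every message; my real class figures out the size from the type header instead, but the accumulate-and-peel logic is the same.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class DeserializationStreamer
{
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Called with whatever the network hands us, no matter how it was split.
    // Returns every complete message we can now peel off the front.
    public List<byte[]> receive( byte[] incoming )
    {
        buffer.write( incoming, 0, incoming.length );

        List<byte[]> complete = new ArrayList<byte[]>();
        byte[] data = buffer.toByteArray();
        int offset = 0;

        while( data.length - offset >= 4 )
        {
            int length = ByteBuffer.wrap( data, offset, 4 ).getInt();
            if( data.length - offset - 4 < length )
                break;  // The rest of this message hasn't arrived yet.

            byte[] message = new byte[ length ];
            System.arraycopy( data, offset + 4, message, 0, length );
            complete.add( message );
            offset += 4 + length;
        }

        // Keep only the leftover partial bytes for next time.
        buffer.reset();
        buffer.write( data, offset, data.length - offset );
        return complete;
    }
}
```

Feeding this thing the 1, 72, 285, 42 byte deliveries from the example would yield nothing, nothing, three complete messages, and then the fourth, which is exactly the behavior we need.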


Silly Threads

It’s actually become quite entertaining to me how multiple threads can make silly things happen.  Like when a piece of code that you would never assume could go wrong suddenly develops amazing ways to surprise you by breaking code that is absolutely nowhere near it, and in ways that are almost perfectly random.  This is actually a little related to networking code in QubeKwest because it is the first piece of the software that is actually multi-threaded, but I’m sure it won’t be the last as I try to squeeze every little bit of power out of whatever system I end up on.

In an attempt to prevent multi-threaded problems I spent a solid week walking around with printed-out pages of source code and a pen.  I was marking every line of code with little letters that indicated which threads I thought could get to it, and “playing computer” in an attempt to figure out how something could go wrong.  The end result of this effort was that I found and likely fixed several potential problems and then located and purchased a tool called ThreadSafe by Contemplate.  It was a very big hit on my meager indie development tool budget, but it found a problem I hadn’t considered.  Actually, I’m fairly proud that it only found one issue considering how many things it looks for and how big my codebase has become.

A new rule I made up and call “the core rule of multithreaded code” can be summarized as “If multiple threads can run the lines of code in a method in an inconvenient order, they will be run in the most inconvenient way possible.”

Consider this code for a moment, but pretend that all of the parts of it are actually using fancy or correctly named versions from the java.util.concurrent package as an attempt to make this work as well as possible without synchronization blocks.  I just didn’t want to use their longer names for this example because that would make things more complex for no reason.

private HashMap<Integer, List<String>> hash = 
    new HashMap<Integer, List<String>>();

public void addToList( Integer inKey, String inData )
{
    List<String> list = hash.get( inKey );
    if( list == null )
    {
        list = new ArrayList<String>();
        hash.put( inKey, list );
    }
    list.add( inData );
}

This code manages a collection of lists and was based on the idea of “a list of stuff to do for this thing” but was simplified to just a list of strings associated with an integer key.  First, I’m going to explain what this code does when you call it in a purely single threaded way.

Let’s say you call it with the parameters 1, “Taco”.  First it attempts to load the list of strings for key 1.  If there wasn’t one yet, it makes a new list and puts it into the hash map associated with key 1.  Afterwards, it adds “Taco” to the list and ends.  If you call it a second time later with the parameters 1, “Burrito” it tries to get the list associated with key 1 and succeeds.  That means the if block that sets up a new list never happens and it moves on to add “Burrito” to the list and ends.  This all flows nicely and covers your various edge cases and generally makes good sense.

Now, imagine you call it twice at the same time in two different threads.  (Which is actually possible in my network code.)  The first thread uses the parameters 1, “Taco” and the second thread uses 1, “Burrito” just like the two separate calls in the example above, except they happen at the same time.  Following the rule that inconvenient things must occur in the most inconvenient way possible, we’ll also assume that there is no list attached to key 1 yet.  Thus, both threads enter the method, both threads attempt to get the list for key 1, and neither thread gets a list (because they are both the first).  Next, both threads check for a null list (which they both have), both threads make a new list, and both threads put their list into the hash map for key 1.  Both puts succeed, but only one at a time, and this is where the problems really start.

Let’s say that the thread trying to add “Taco” goes first.  Now the hash map has a list that was successfully put into the spot for key 1.  Immediately afterwards, the thread trying to add “Burrito” puts its brand new list into the hash map for key 1 too.  This also succeeds, but replaces the list that was already in the spot from when the thread trying to add “Taco” put its list there.

Consider the state of the local variables in each thread after this has occurred.  Both threads have a variable called “list” that refers to a perfectly valid new list, but only one thread has its list set in the hash map.  (But they both think their list is the one in the hash map.)  Finally, both threads add their new value to their own reference to a list, and exit.  What really happened though?  The thread putting “Taco” into a list did so to a list that is only known to the local list variable, and the thread putting “Burrito” actually puts it into a list that the hash map knows about.  That means that both calls to the function completed successfully, but the result is that in the hash map, there is a list that only has “Burrito” in it.  The “Taco” data is simply gone the moment the local variable “list” goes out of scope when the function ends.

This demonstrates how multithreaded code can blow up, but now let’s think about the side effect of this event.  Later on, code is going to grab the list for key 1 out of the hash map to do things with the list of strings in it.  It will also probably just be missing values (like “Taco” for example) and have no idea that they are even missing.  The end user expects the results to be shown for multiple items of food, but some of the ones it figured should be there simply aren’t.  Now a developer attempts to figure out why.

The bug report talks about code misbehaving somewhere nowhere near the place where the problem actually occurred, and which data goes missing is random, so they can’t even reliably reproduce the problem.  There is nothing that requires the “Taco” thread put its list in first, so it could just as easily be “Burrito” that’s missing, completely at random.

Better still, attempting to walk through the code with a debugger shows the function completing successfully twice, and in a lot of cases, the use of the debugger itself is enough to cause the problem to not occur at all.  After all, a breakpoint that causes a delay long enough for one path through the function to already see the other list in the hash map would make the problem vanish.

The solution is to use a special method on the concurrent form of a hash map that atomically puts something into the hash map only if there isn’t already something there.  This would make the code above look like this.

private HashMap<Integer, List<String>> hash = 
    new HashMap<Integer, List<String>>();

public void addToList( Integer inKey, String inData )
{
    List<String> list = hash.get( inKey );
    if( list == null )
    {
        list = new ArrayList<String>();
        List<String> tempList = hash.putIfAbsent( inKey, list );
        if( tempList != null )
            list = tempList;
    }
    list.add( inData );
}

This should look really familiar, but the attempt to set the list into the hash map is now protected by the use of putIfAbsent() and there is a second if block to adjust things based on how that went.  The putIfAbsent() method internally protects against things happening at the exact same time, so they are forced to get in line and happen one after another.  Which order that happens in is still random, but at least there is a little bit of sanity instead of none at all.  Using the same multithreaded example as before, the new flow has been changed to this:

The first thread uses the parameters 1, “Taco” and the second thread uses 1, “Burrito”, and as before they happen at the same time.  We will also repeat the scenario where both threads are attempting to be the first to set a value associated with key 1.  Thus, both threads enter the method, both threads attempt to get the list for key 1, and neither thread gets a list (because they are both the first).  Next, both threads check for a null list (which they both have), both threads make a new list, and both threads attempt to put their list into the hash map for key 1.  This time however, each thread can only put its list into the hash map if there isn’t one already there.  The putIfAbsent() method returns null if there was no list already set for the key, or if there was a list, it returns the one that was there instead of actually putting the new one there.

This gives us the opportunity to behave in a way that makes more sense.  Now, when we try to insert a new list into the hash map and there wasn’t one there, we succeed and move on to add our string to the list without any trouble.  When we try to insert our new list and there is already one there, we get the one that was there back from the attempt.  If that happens, we simply replace our new list with the one we got back.  That means our freshly created list will simply be garbage collected, and that both threads actually have a reference to the exact same list.

As before, let’s say that the thread trying to add “Taco” goes first.  Now the hash map has a list that was successfully put into the spot for key 1.  Immediately afterwards, the thread trying to add “Burrito” tries to put its brand new list into the hash map for key 1 too.  Instead of blowing away the list put there by the thread trying to add “Taco”, it actually returns the list the “Taco” thread put there.  After that, we simply check to see if there was some list returned by the putIfAbsent() call, and if there was, we use it as our local list.  That means when “Taco” gets added to the list, that works fine.  It also means when “Burrito” is added to the list, it’s actually added to the same list the “Taco” thread used instead of the one it initially created for the job.  That means both strings are in the same list, and that list is correctly stored in the hash map.  That in turn means the code actually worked, even when the most inconvenient way possible was used, and that is a reason to be happy.  I like reasons to be happy.
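
As an aside, if you’re on Java 8 or newer, ConcurrentHashMap’s computeIfAbsent() collapses the whole get / null-check / putIfAbsent dance into a single atomic call.  Here’s a sketch of the same method using it; note that I’ve also swapped in a thread-safe list, since two threads calling add() on a plain ArrayList at the same time is its own can of worms.  This is an illustration of the approach, not my actual code.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class ListKeeper
{
    private ConcurrentHashMap<Integer, List<String>> hash =
        new ConcurrentHashMap<Integer, List<String>>();

    public void addToList( Integer inKey, String inData )
    {
        // computeIfAbsent atomically creates and inserts the list only when
        // the key has no value yet, and always returns the list that "won."
        hash.computeIfAbsent( inKey, key -> new CopyOnWriteArrayList<String>() )
            .add( inData );
    }

    public List<String> getList( Integer inKey )
    {
        return hash.get( inKey );
    }
}
```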

Network Round 5

Networking is still kicking my butt.  At the end of the previous post, I’d started what I refer to as “Network Round 4” and I’ve already retired that version.  It actually temporarily became the “com.qubekwest.net.old” package (referred to as just “old” later) as a place I can constantly refer back to as I write what I now call “Network Round 5.”  This has obviously taken a bit of work, but it isn’t really as horrible as it probably sounds, honest.

The cause of this new iteration of the network code is actually that I’d gotten pretty far with the previous version, and then realized that I sort of had no idea how to actually use it.  Meaning, it probably worked, but I didn’t have any clean patterns to follow to use it in something useful.  After piles of refactors to try to make it useful, I gave up.  This in turn inspired a collection of interfaces that I wanted to write and that occasionally conflicted with names already in my networking code.  The best solution I could come up with to make this as easy as possible was to strip the network code back to nothing as described above with the “old” package and start making new files in the “net” package that was now completely empty.

This creation of interfaces helped me to determine two things at the same time.  The first was the things that a “user” of the network engine (which is actually other pieces of code, not people) would need it to be able to provide, and the second was the things that the network engine does that don’t need to be exposed to anyone else.  Those results may be immediately obvious to other people, since that’s exactly what interfaces are supposed to do, but in this case it was actually supremely useful.

From the first category of “need to be able to provide” I knew that I needed a collection of methods that the server had to be able to do to actually be a server.  For example, it needed to be able to send data to clients.  What surprised me a little was the number of different ways I had to be able to send data to the client.  My primary pattern is that I’m sending data from a specific source to a specific target.  Other methods build on that concept for the ability to send updates to other users that are near me in the world.  Maxing out the idea of sending to groups, there are also methods for broadcasting a single piece of data from a specific source, but to all connected users.  This covers things a server needs to be able to do for other pieces of code.
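
To give a feel for the shape of it, the sending side of the server interface looks something like the sketch below.  The names and signatures are illustrative stand-ins, not my actual code.

```java
// A sketch of the sending half of a server interface.  Sources and targets
// are identified by plain ids here just to keep the sketch simple.
interface NetworkServer
{
    // Send one piece of data from a specific source to a specific target.
    void send( int sourceId, int targetId, byte[] data );

    // Send a piece of data to every connected user near the source's
    // position in the world.
    void sendToNearby( int sourceId, byte[] data );

    // Broadcast a single piece of data from a specific source to all
    // connected users.
    void broadcast( int sourceId, byte[] data );
}
```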

The second category of “don’t need to be exposed” included things like “accept a connection.”  I initially put a method to accept a connection in the interface for a server, but then realized that there is literally no code outside of the network itself that would ever call it.  If you think about it for a moment, you will probably realize that the method for accepting connections must be there, but that it will be called because of network traffic, not by some other piece of code.  That means that it can really be a private method on the network server code itself and building the interface helped me to realize that.

There are several other interfaces as well.  I’ve defined what it means to be a network worker thread, which is a piece of code that actually does the work of the server.  There can be any number of workers in a server, and they each get incoming data handed to them if they want it.  They can’t affect each other, but they all get a shot at doing something with the data as it comes in.  There are also handlers that create a sort of call back structure that allows clients to send data to the server where a response is expected.
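
The worker idea can be sketched like this: every worker gets a look at each piece of incoming data, and one worker taking it doesn’t stop the others from seeing it too.  Again, the names here are my stand-ins for illustration, not the real interfaces.

```java
import java.util.List;

class WorkerPool
{
    interface NetworkWorker
    {
        // Return true if this worker wants the given piece of data.
        boolean wantsData( byte[] data );

        // Do this worker's share of the work with the data.
        void handleData( byte[] data );
    }

    private final List<NetworkWorker> workers;

    WorkerPool( List<NetworkWorker> inWorkers )
    {
        workers = inWorkers;
    }

    // Offer each piece of incoming data to every worker; they all get a
    // shot at it and can't affect each other.  Returns how many took it.
    public int dispatch( byte[] data )
    {
        int handled = 0;
        for( NetworkWorker worker : workers )
        {
            if( worker.wantsData( data ) )
            {
                worker.handleData( data );
                handled++;
            }
        }
        return handled;
    }
}
```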

This whole ball of wax is being implemented using the patterns of Java’s NIO package.  I still feel it is vastly more complex than other ways of doing things, but I’m also pretty confident that the result will perform better with more people connected to the server than the standard socket approach would.  I’m hoping that makes this all worth it.  I’ve spent almost six months researching, writing and rewriting, and generally being frustrated by the network code, so I sure hope so.

Network Struggles Are Real

I was talking to a coworker about the trouble I’ve been having with coding the network engine for QubeKwest and his comment at the end was simply “The struggle is real.”  While the response was intended to be a joke, it turned out to be accurate too.  I’ve previously written about networking and how I’d utterly blown up what I’d written.  While that is true, there is actually a little more to the story and it took a very long time before I got back to trying again.

I have a deep and passionate dislike for Windows 8 (any of the various flavors) and I had a laptop that I was coding on that came with Windows 8.  The result of this was that I wanted almost nothing more than to have any other operating system on it.  When Microsoft presented the opportunity to preview Windows 10, I jumped at the chance the very day they allowed it and had it installed on that laptop that very evening.  Within fairly short order after that I’d gotten all my various development tools set up again and was ready to go.

The first project was coding a chat server and client that was written using the synchronous java.net package with Swing as the UI toolkit.  It actually worked quite well after about 2 days and never made it anywhere near source control before a Windows 10 preview release caused me to lose it just a couple of days later.  I wasn’t willing to let that be the end of the story, so once things were up and running again, I started over, this time using the asynchronous java.nio package with JavaFX as the UI toolkit.  It was like stepping into the future.  A vastly different and obscenely confusing future, but the future nonetheless.

To get this version of a chat server and client working took closer to 2 weeks and at the end it was only sort of working.  I was proud of what I had managed to pull off, and once again it never made it anywhere near source control.  Those of you out there that are picking up on a pattern here are highly accurate in your observation, but at the time I wasn’t really thinking about source control for a simple proof of concept side project.  Well, this version was also lost.  As it turns out, this time the Windows 10 preview patch actually bricked my laptop.  After 3 days of nearly constant attempts to boot the laptop at all, I finally managed to unbrick it and in the process was forced to fully wipe the drive.  Yep, lost that version too.

Then, with the hope of stopping this hideous pattern, I decided my next attempt would be directly inside the QubeKwest code base and would be checked in to source control over and over as I developed it.  This is the version that the other blog post referred to when I said “I managed to substantially break it.”  In the end, that version was turned into a single massive code comment just so I could get it to compile at all.  That was the final fate of the third version of the network code.  It lived on as a giant comment for a long time, and was eventually removed entirely from the code to make room for the fourth attempt.

The fourth version was started from scratch and has been through a real struggle of what I call refactoring and unrefactoring.  Yes, I’ve decided unrefactoring is an actual word.  I’m defining it as the action required to undo the effects of having refactored things.  In this case, several parts of the server and client code were refactored to share a common base class.  As it turns out, they aren’t quite as similar as I first imagined when I went down the refactoring path and I ended up fighting the results of that refactor almost constantly.  Thus, I unrefactored the code and eliminated the base class putting the necessary bits back into the server and client code where they probably belonged anyway.

After all the work involved in the unrefactoring, I am back to where I can try to continue the process of coding the network engine.  Fear not, the code is solidly and repeatedly checked in to source control so it’s not going anywhere.  I have quite a ways to go to get the networking code to work the way I need it to, but I’m working on it and making good progress.  The struggle is real.

Chunk Proxy

This post may have a bit of repeated information for those that follow along closely.  A whole lot of Chunk related refactoring happened because I had a new idea about how to do some things.  Thus, to make sure everyone is on the same page, I’m going to explain the way it was, and the way it ended up, and why.  Feel free to skip around if that suits you.

Chunks are 16x16x16 collections of blocks, and before the refactoring there were three kinds of them in QubeKwest under the covers.  The first is the EmptyChunk: a chunk where the whole thing is air.  It gives the ability to represent vast empty sections of sky with virtually no storage space.  The second is the UniformChunk: a chunk where the whole thing is made out of a single material.  For example, under a mountain where the whole thing is rock, or way underground where you might encounter all lava.  Technically, there is nothing stopping you from using a UniformChunk full of air instead of an EmptyChunk, but it will take a little more space, so it’s a bit of a waste.  The third kind is called a FullChunk (formerly called just Chunk).  It details every single block individually and can thus hold anything in any arrangement.  All of these types of chunks extend ChunkParent as their base class so they can be treated as the same thing.
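
A stripped-down sketch of that hierarchy looks something like this.  The real classes carry a lot more detail; in particular I’m using a short per block in the FullChunk stand-in, while the real one stores closer to 16 bytes per block, which is where the roughly 65,000 byte figure comes from.

```java
// A simplified sketch of the chunk hierarchy.  Block types are plain ints
// here; the real classes store full block descriptions.
abstract class ChunkParent
{
    public static final int BLOCKS = 16 * 16 * 16;  // 4096 blocks per chunk

    public abstract int getBlock( int x, int y, int z );
}

// All air, stored in effectively zero space.
class EmptyChunk extends ChunkParent
{
    public int getBlock( int x, int y, int z ) { return 0; }
}

// One material for the whole chunk, stored in just a few bytes.
class UniformChunk extends ChunkParent
{
    private final int material;

    UniformChunk( int inMaterial ) { material = inMaterial; }

    public int getBlock( int x, int y, int z ) { return material; }
}

// Every block stored individually; this is the expensive one.
class FullChunk extends ChunkParent
{
    private final short[] blocks = new short[ BLOCKS ];

    public int getBlock( int x, int y, int z )
    {
        return blocks[ ( x * 16 + y ) * 16 + z ];
    }
}
```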

The basic concept behind all these different kinds of chunks was that they could be promoted to other types based on blocks being changed inside them.  For example if you have an EmptyChunk or a UniformChunk and you change even a single block within it, you have to also change it into a FullChunk.  One of the problems with this was that the space it takes to store a chunk has no middle ground.  You go from just a few bytes for either an EmptyChunk or a UniformChunk to around 65,000 bytes for a FullChunk.  The other problem is, where does this promotion occur so it’s easy to deal with?

The first problem inspired thought about how to make a chunk that can store somewhere between a single type of block and one that has all 4096 individually detailed.  This got me thinking back to something I’d coded as a kid.  The original Doom saved screen shots in the PCX graphics file format.  This ancient format was both palettized and run-length encoded.  I learned the inner workings of the PCX file format and coded a PCX decoder so I could make little stop motion movies using Doom screen shots.  For QubeKwest, I’d originally considered run-length encoding for blocks in a chunk, but ruled it out because of the constant heavy updates to the encoding when changes are made.  This left the idea of using a block palette to reduce the space needed to save a chunk.  Basically it’s like an artist’s palette that has things like “1 = air” and “2 = granite” in it so I can reference all of the pieces of granite in a chunk just by using the number 2 instead of a whole block description.

The new PalettizedChunk was born.  First, I considered using a palette that used a short (16-bit) index.  This was a bad choice because it would actually allow enough entries in the palette to represent every single block in a chunk individually, and could thus result in chunks that take more storage than the original FullChunk did.  Next, I considered using a byte (8-bit) index.  This would limit the number of blocks in the palette pretty heavily, but it would drastically reduce the space needed to store a chunk.  Further, because Java uses signed bytes, I decided to limit the indexes to 0 through 127.  I could have applied a bit of math to the index to use all 256 possible spots in a byte, but I didn’t like all of the primitive type promotion it would involve, so I didn’t end up doing it that way.  This new version can store 128 kinds of blocks and only takes about 10% of the space of a FullChunk.
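
In sketch form, the storage trick looks like this.  The details (how the palette is built, what a block id actually is) are stand-ins for illustration; the point is the one-signed-byte-per-block index array.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of palettized storage: one signed byte per block pointing into
// a palette of at most 128 block types.
class PalettizedChunk
{
    private static final int BLOCKS = 16 * 16 * 16;  // 4096
    private static final int MAX_PALETTE = 128;      // 0..127 fit a signed byte

    private final List<Integer> palette = new ArrayList<Integer>();
    private final byte[] indexes = new byte[ BLOCKS ];  // ~4 KB vs ~65 KB

    PalettizedChunk( int fillMaterial )
    {
        palette.add( fillMaterial );  // Index 0 is the material we start full of.
    }

    // Returns false when the palette is full, which signals that a
    // promotion to a FullChunk is needed.
    public boolean setBlock( int x, int y, int z, int material )
    {
        int index = palette.indexOf( material );
        if( index < 0 )
        {
            if( palette.size() >= MAX_PALETTE )
                return false;
            index = palette.size();
            palette.add( material );
        }
        indexes[ ( x * 16 + y ) * 16 + z ] = (byte) index;
        return true;
    }

    public int getBlock( int x, int y, int z )
    {
        return palette.get( indexes[ ( x * 16 + y ) * 16 + z ] );
    }
}
```

The index array alone is 4096 bytes against roughly 65,000 for a FullChunk, which with the palette overhead is where the “about 10%” figure lands.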

The promotion can now go from either EmptyChunk or UniformChunk to a PalettizedChunk when a single block is changed, and then on to a FullChunk only when the chunk has more than 128 different types of blocks in it.  This still leaves the problem of where the promotions between chunk types occur.  And when they occur, how do the various references to chunks get updated without taking a lot of work?  The answer to both questions is the idea of a ChunkProxy object (later renamed simply “Chunk” so I didn’t have to type “Proxy” in a million places in the code).

This new object wraps any child of the ChunkParent class (which all 4 types of chunk are) and provides access to the various methods that are actually on the chunk inside.  This proxy knows which kind of chunk it is wrapping, and promotes it as needed automatically when an action performed on it requires it.  Now the rest of the codebase can just use ChunkProxy objects where they previously used ChunkParent objects.  If a chunk gets promoted, the proxy that wraps it stays the same.  That means that other things that refer to it don’t actually need to change, and in fact have no idea the promotion has occurred.  It’s the best of both worlds.
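
The mechanics of the proxy can be sketched like this, with tiny stand-in chunk types so the example runs on its own.  The real promotion rules are richer; what matters is that outside code holds the proxy forever, and the proxy quietly swaps the chunk it wraps.

```java
class ChunkProxy
{
    // Tiny stand-ins for the real chunk hierarchy, just for this sketch.
    interface ChunkKind { int getBlock( int i ); }

    static class Uniform implements ChunkKind
    {
        final int material;
        Uniform( int m ) { material = m; }
        public int getBlock( int i ) { return material; }
    }

    static class Full implements ChunkKind
    {
        final int[] blocks = new int[ 4096 ];
        Full( int fill ) { java.util.Arrays.fill( blocks, fill ); }
        public int getBlock( int i ) { return blocks[ i ]; }
    }

    private ChunkKind chunk;  // Whichever kind we currently wrap.

    ChunkProxy( ChunkKind inChunk ) { chunk = inChunk; }

    public int getBlock( int i ) { return chunk.getBlock( i ); }

    // Setting a block on a uniform chunk promotes it to a full chunk in
    // place.  Callers keep their ChunkProxy reference and never notice.
    public void setBlock( int i, int material )
    {
        if( chunk instanceof Uniform && ((Uniform) chunk).material != material )
            chunk = new Full( ((Uniform) chunk).material );
        if( chunk instanceof Full )
            ((Full) chunk).blocks[ i ] = material;
    }

    public boolean isFull() { return chunk instanceof Full; }
}
```

Because only the proxy’s private field changes during a promotion, every existing reference to the proxy keeps working untouched.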