A while back I had a brief moment of horror spawned from the idea that I was doing my networking stuff all wrong. The good news is that I wasn’t totally wrong; the bad news is that I needed to build more machinery to do things correctly. The premise is actually pretty simple. I had assumed that when I sent a serialized 100-byte object across the network, it would arrive at the other end as one convenient 100-byte read of data from the network. I’m not entirely sure why I thought this would be the case, but I have some theories.
First, every single tutorial I’d seen online or in books made no mention of this problem. Second, I’m pretty sure that on localhost, this is close to how it actually behaves. Third, the tutorials were sending things that didn’t actually have any logic behind them. All of this settled into my brain as a simple “well, this must just work.”
The horror came from the suspicion that I was somehow mistaken. I’m not really sure where that horror came from either. It just sort of popped into my head and left shortly afterwards, only to appear again later on. I think perhaps I realized that the way I had been coding it felt a bit like I was getting away with something. As it turns out, I was right about that, unfortunately.
Let’s jump back to that 100-byte object. If I send it, followed by 3 more of them, I’ve now sent 400 bytes of data, and all this happened as 4 simple writes on the sending end. On the receiving end, there is the whole interwebs in the way. So, while the 4 blobs of 100 bytes are guaranteed by the protocol to arrive in the correct order, and in their entirety, there is nothing whatsoever that guarantees they will arrive as 4 convenient 100-byte deliveries. In fact, the messier scenarios are vastly more likely.
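In practice this means the receiver can never trust a single read to return a whole message. Here’s a minimal Python sketch of the standard fix: a loop that keeps reading until the expected number of bytes has arrived (`recv_exact` is my own name for this helper, not something from a library):

```python
import socket

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP socket, looping over however
    many separate deliveries the network splits them into."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return bytes(buf)
```

This only works, of course, if you already know how many bytes you’re waiting for, which is exactly the wrinkle the rest of this post deals with.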
Before we delve into the example above, let’s take the same approach most tutorials use. We will send “Hello World” as our data, all at once. If it all arrives at once, and we show it on the screen, we will see “Hello World” and all will be just great. If it arrives in 3 pieces containing “Hel”, “lo Wo”, and “rld”, and we show them on the screen as they arrive, we actually end up with the exact same result. “Hello World” is on the screen, and everyone is happy about it.
The problem actually comes from the fact that I’m not just sending arbitrary strings of text across the network. I’m sending carefully crafted collections of bytes that represent the various data structures in the game. Everything from a chat message to a chunk full of blocks in the world just boils down to a pile of bytes. The problem, of course, is that if I don’t have all the bytes for a data structure, I can’t repopulate all its fields, and so I can’t reconstruct it properly.
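To make that “pile of bytes” idea concrete, here’s a hypothetical Python sketch of serializing a chat message into a fixed-size record. The field layout, names, and the 100-byte size are my own invention for illustration, not the game’s actual format:

```python
import struct

# Hypothetical format: a chat message as a fixed 100-byte record,
# a 4-byte little-endian sender id followed by 96 bytes of UTF-8
# text, zero-padded on the right.
def serialize_chat(sender_id: int, text: str) -> bytes:
    raw = text.encode("utf-8")[:96]
    return struct.pack("<I96s", sender_id, raw)

def deserialize_chat(blob: bytes) -> tuple[int, str]:
    sender_id, raw = struct.unpack("<I96s", blob)
    return sender_id, raw.rstrip(b"\x00").decode("utf-8")
```

The key point is that `deserialize_chat` needs all 100 bytes before it can do anything; handing it the first 73 bytes of the record is useless.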
Armed with this new wrinkle in the data I’m sending, let’s return to the example from above. We’ll send 4 pieces of 100 bytes. Imagine that they arrive in the following deliveries of data: 1 byte, 72 bytes, 285 bytes, 42 bytes. Due to the structure of my serialized data, that first 1-byte delivery isn’t even enough information for me to figure out what kind of thing it was that I received. The next 72 bytes allow me to determine the type of data I’m dealing with, which in turn allows me to determine the amount of data I need to recreate something of that type. With 73 bytes delivered so far, you’ll probably notice that I still don’t have enough information to actually do that (since our hypothetical data blobs are 100 bytes each). Next, we receive 285 bytes all at once, thanks to the magic of how networks work. That means we are up to 358 bytes, which also means we can now reconstruct our first transmission successfully. But wait, we can also reconstruct the second and the third objects as well, so we should definitely do that. We now receive 42 more bytes, which brings us up to 400 bytes received, and we finally have the ability to finish the process of dealing with the data that was sent to us.
To help handle this situation, I have now created a class I call the DeserializationStreamer. This new structure orchestrates the buffering of incoming data, determines when a complete object has arrived, and performs the actual deserialization itself, since it has to manage its internal buffers as the data is consumed. This was a long and complicated story, but the moral is probably something like “things are seldom as easy as they seem, and they are always way worse when networking is involved.” Or something like that.
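The post doesn’t show the DeserializationStreamer’s actual code, so here is a minimal Python sketch of the idea. Everything about the wire format here is an assumption for illustration: a 1-byte type tag followed by a fixed-size payload whose length depends on the type.

```python
class DeserializationStreamer:
    """Sketch of a streamer that buffers raw network deliveries and
    hands back complete messages once enough bytes have arrived.
    The wire format (1-byte type tag + type-dependent fixed payload)
    is a made-up example, not the game's actual protocol."""

    PAYLOAD_SIZE = {1: 99, 2: 15}  # hypothetical type registry

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list[tuple[int, bytes]]:
        """Append one network delivery, then return every message
        that is now complete, leaving any partial tail buffered."""
        self._buf += data
        out = []
        while self._buf:
            tag = self._buf[0]
            need = 1 + self.PAYLOAD_SIZE[tag]
            if len(self._buf) < need:
                break  # wait for more deliveries
            out.append((tag, bytes(self._buf[1:need])))
            del self._buf[:need]
        return out
```

Feeding it the deliveries from the example above (1, 72, 285, and then 42 bytes, with 100-byte messages) yields nothing, nothing, three complete messages, and then the fourth, which matches the walkthrough.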