First Ping

It’s been a long hard road with the networking engine, and I’m absolutely nowhere near the end…  but I have successfully pinged my server from my client!

Proof, because pictures make everything better, click it to see the animation.

There are loads of problems still, and a lot of those are simply because I haven’t even coded some of the bits that are needed to make things work correctly.  I’m very happy to see a client-server round trip working, and it gives me a lot of hope that this version of the network code will actually be the one I use without having to start over again.

Here is a list of some of the things I know don’t work correctly yet:

  • When a client disconnects from the server, there seems to be some weirdness around the server accepting new connections afterwards.  (Not sure if different ways of disconnecting cause different issues.)
  • When multiple clients are connected to the server, there is no way for the server to accurately identify which one is which, so things that are expected to return to the client that sent them (like a ping) actually end up going to the client that connected first.  (This one is easy:  the clients don’t yet have the ability to register themselves and get their “source ID” from the server.  That means they all use the same hard-coded ID.)
  • When the server sends a message that is intended as a broadcast, it is not actually broadcast to all connected clients.  (I think this is the same problem as the one above, but I listed it separately because I am not positive yet.)
  • There is no way yet to do directed messages from either the client or the server.  (There isn’t a way for clients to request the list of valid target IDs and their associated names yet.)

This list probably isn’t complete, but it does represent some of the more obvious things that still need to be added to the network engine.  Please don’t take this as negative.  I’m thrilled to have successfully gotten a round trip ping to work, and I’m very happy that the list above no longer includes things like “make the network engine work at all.”  I call this a total victory with a little bit of a TODO list.  :D

Ignoring Network Events

So as usual around here lately, I’ve been working a lot on the network code.  I suspect if you read this blog with any amount of consistency, or if you are reading all of the posts at once as a form of weird entertainment, you are probably getting a bit tired of hearing about it.  I am too, so I get it.  The problem is that networking is hard and complex and easy to screw up.  The other problem is that it’s such an important core piece of QubeKwest that I have to get it right to have any hope of building a game on top of it.

Today’s networking story is about network events.  These magical things fly around inside the various parts of the networking engine and change flags on selection keys and allow it to behave accordingly as it does its thing.  The start of the problem was that I noticed that every single time I connected to the server from the client, the client took almost exactly 3 seconds to do so.  To be fair, I could ignore this problem and say “well it’s slow, but it works,” at least for a little while.  That didn’t feel right to me, so I started to dig in and study the code to try to figure out both why it was happening, and why it was so consistent.

Thanks to 3 seconds being the selector’s select() method timeout, and a time that I chose when I realized that the network engine was never shutting down properly, I had a place to start in my search.  Unlike so many other issues with the networking code, this one jumped out immediately.

Connecting to the server from the client is a two-step process.  The first step is to set up some stuff and schedule the second step to occur once the selector gets to it.  The second step is for the selector to realize it has a connection to complete, and to, well, complete it.  On the surface, this all seems pretty straightforward, but these things are taking place in two different threads.  The client network engine is up and spinning in its own thread, and the UI has its own thread too, and that will be where the connection process is started.
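For anyone who hasn’t seen this pattern before, here is a minimal sketch of a two-step connect, assuming the engine is built on Java NIO (the selector and selection keys suggest it, but that is an assumption).  The class and method names are made up for illustration and are not from the QubeKwest code, and both steps run in a single thread here just to keep it short.  It connects to a throwaway listener on the loopback interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Hypothetical demo class; not part of the actual engine. */
final class TwoStepConnectDemo {
    /** Returns true once the non-blocking connect has been completed. */
    static boolean connectToLocalServer() {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel client = SocketChannel.open()) {
            // Throwaway listener on an ephemeral loopback port, so there is something to connect to.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            // Step 1: start the connection and ask the selector to report when it can be completed.
            client.configureBlocking(false);
            boolean connected = client.connect(server.getLocalAddress());
            if (!connected) {
                client.register(selector, SelectionKey.OP_CONNECT);
            }

            // Step 2: the selector notices the pending connection and completes it.
            while (!connected) {
                if (selector.select(3000) == 0) {
                    return false; // nothing became ready before the timeout
                }
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isConnectable()) {
                        connected = ((SocketChannel) key.channel()).finishConnect();
                    }
                }
                selector.selectedKeys().clear();
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling TwoStepConnectDemo.connectToLocalServer() should return true when the loopback connection completes.  In a real engine, step 1 would be started from the UI thread and step 2 would happen inside the engine’s own loop.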

If we look at the client network engine thread first, we see that it does something a bit like this:  process any events that came in, select() on our selector for things that are ready, and then process the stuff that’s ready.  Then it loops back to the start of that and does it all over again.  The problem is that select() is a blocking call.  That means that in a network that isn’t busy doing anything (like before it has connected) there is nothing to cause the select() to ever finish waiting for something to do.  This is the exact spot where I added the 3 second timeout, so it would have the chance to check if the loop is supposed to end because the engine is being shut down.

Do you see the problem?  My selector is sitting there waiting for something to have its flags changed by an event.  The step where events change things happens somewhere else in the loop that isn’t happening because select() is blocking the loop from running.  This causes network events to be effectively ignored for up to 3 seconds.  That means the pattern is really more like this: when select() times out after 3 seconds, it immediately loops back around, processes the events, tries to select() again, and that ends immediately since there is a flag set from the event.  Just like that, the second part of the connection process runs right away, but only after the 3 second timeout has already been wasted.
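The stall is easy to reproduce in isolation.  What follows is a minimal sketch, assuming Java NIO and a plain queue of Runnable events; all of the names are hypothetical stand-ins and not the real engine code.  The timeout is shortened from 3000 ms to 500 ms so the demo finishes quickly.  An event is posted while the loop is blocked in select(), and the method reports how long the event sat in the queue before the loop got around to it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the engine loop; not the actual QubeKwest code. */
final class StalledEventLoopDemo {
    /** The post uses 3000 ms; shortened here so the demo runs quickly. */
    static final long SELECT_TIMEOUT_MS = 500;

    /** Returns how many milliseconds a queued event waited before the loop processed it. */
    static long measureEventDelayMillis() {
        Queue<Runnable> events = new ConcurrentLinkedQueue<>();
        CountDownLatch aboutToSelect = new CountDownLatch(1);
        CountDownLatch eventHandled = new CountDownLatch(1);
        try (Selector selector = Selector.open()) {
            Thread loop = new Thread(() -> {
                try {
                    while (true) {
                        // Step 1: process any events that came in.
                        Runnable event;
                        while ((event = events.poll()) != null) {
                            event.run();
                        }
                        if (eventHandled.getCount() == 0) {
                            break; // the demo event has run, so stop looping
                        }
                        // Step 2: block until a channel is ready or the timeout expires.
                        aboutToSelect.countDown();
                        selector.select(SELECT_TIMEOUT_MS);
                        // Step 3: ready keys would be handled here (none are registered in this demo).
                        selector.selectedKeys().clear();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            loop.start();

            aboutToSelect.await();
            Thread.sleep(100); // give the loop time to block inside select()
            long postedAt = System.nanoTime();
            events.add(eventHandled::countDown); // queued, but nothing wakes the selector
            if (!eventHandled.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("event was never processed");
            }
            long delay = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - postedAt);
            loop.join();
            return delay;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The delay should come out as most of the timeout, since the event can’t be looked at until select() gives up and the loop comes back around to step 1.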

Now comes the hard part.  Determining what was happening was easy, but coming up with a way to fix it will be quite tricky.  I believe that to have any shot at working, the fix will need yet another thread.  This new one has only one job to do:  watch for events and process them immediately in a thread that isn’t being blocked by the select() method.  I don’t think the code involved in the new part will be terribly difficult to write, but it will require a bit of messing with the client and server engine codebases to integrate the proper use of it.
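As a rough idea of what such a watcher could look like, here is a minimal sketch, again assuming Java NIO.  The names are hypothetical and this is not the finished design.  One detail is an assumption on top of the plan above:  after applying an event, the watcher calls the selector’s wakeup() method, because flag changes made while select() is blocked are only seen by the next select() call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of an event watcher thread; not the actual QubeKwest code. */
final class EventWatcherDemo {
    static final long SELECT_TIMEOUT_MS = 3000; // the timeout described in the post

    /** Returns how many milliseconds select() stayed blocked after an event was posted. */
    static long measureWakeDelayMillis() {
        BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
        CountDownLatch aboutToSelect = new CountDownLatch(1);
        long[] returnedAt = new long[1];
        try (Selector selector = Selector.open()) {
            // The watcher blocks on the queue, not on select(), so it sees events right away.
            Thread watcher = new Thread(() -> {
                try {
                    while (true) {
                        Runnable event = events.take();
                        event.run();       // e.g. change the interest flags on a selection key
                        selector.wakeup(); // assumption: nudge the blocked select() so it returns now
                    }
                } catch (InterruptedException e) {
                    // interrupted by the demo to shut the watcher down
                }
            });
            // Stand-in for the engine thread: a single select() call with the usual timeout.
            Thread engine = new Thread(() -> {
                try {
                    aboutToSelect.countDown();
                    selector.select(SELECT_TIMEOUT_MS);
                    returnedAt[0] = System.nanoTime();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            watcher.start();
            engine.start();

            aboutToSelect.await();
            Thread.sleep(100); // give the engine time to block inside select()
            long postedAt = System.nanoTime();
            events.add(() -> { }); // placeholder event standing in for a real network event
            engine.join();
            watcher.interrupt();
            watcher.join();
            return TimeUnit.NANOSECONDS.toMillis(returnedAt[0] - postedAt);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With the watcher in place, select() should return well before its 3 second timeout.  A real version would still need care about which thread is allowed to touch the selection keys, which is probably where the messing with the engine codebases comes in.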