Intuitive object serialization
Object serialization, as I understand it, means writing an object's state into a byte array and then using that byte array to construct a new object that looks like the old one. It can be used to transfer an object over the network to a client, which then either constructs the same object or updates an existing object's state.
I created an object serialization library for this purpose. The basis of my design was that it should be very easy to use. Unfortunately this abstraction costs a lot of memory and speed, but more about that later.
The library lets you read an object's state and use that state to assign the variables of any object of the same type. These objects might even exist on different computers. The state is stored in an object called Memento. Here is a simple example:
#include <string>
using std::string;
// All objects that should be serializable need to derive
// from the base class 'Serializable', much like in Java
class SharedObject : public Serializable
{
public:
    // Initializers are listed in declaration order (m_position before m_name)
    SharedObject(const string& name)
        : m_position(0), m_name(name), m_someUselessVariableThatIsntShared(0) {}
    void setName(const string& name) { m_name = name; }
    void moveTo(float position) { m_position = position; }
protected:
    SharedVariable<float> m_position;
    SharedVariable<string> m_name;
    int m_someUselessVariableThatIsntShared;
};
OK, now we've defined an object that has a name and a position, plus some useless variable that isn't shared. This is all we have to do. No extra methods to write or anything. Notice how the shared variables are declared as SharedVariables, with the real type as the template argument. This doesn't make them any clumsier to use, because I've overloaded some operators. Here's how we get and set the object's state using Mementos:
SharedObject object("Sam");
Memento m1 = object.getFullState();
object.moveTo(1.5f);
object.setName("Max");
//now the object has the name "Max" and its position is 1.5f
object.setState(m1);
//now the object has the name "Sam" again and its position is 0.0f
object.moveTo(5.3f);
Memento m2 = object.getChangedState();
//only m_position was stored to 'm2' because only
//the position had changed
object.moveTo(2.0f);
object.setState(m2);
//now object's position is '5.3f' again
std::vector<char> bytes = m1.getBytes();
//Now 'bytes' contains the m1 data as a contiguous char array,
//ready to be saved to disk or sent over the net
//creates a memento from the byte data
Memento newMemento(&bytes[0], bytes.size());
object.setState(newMemento);
OK, if you could follow that, try downloading the real library by clicking here. It comes with an example application that shows how things are done, including how to make a shared object that has subclasses (well, it's just like making a normal subclass). Everything in the library lives in the namespace h_serialize.
Note that this serialization scheme isn't thread safe. If two threads try to create objects of the same type at the same time, the results are undefined. This is because the serialization system uses an internal static variable to initialize itself (I know, I know, static variables are ugly and should be avoided. But I couldn't think of an alternative that wouldn't make the clients do extra work, and we can't have that). Also note that you shouldn't have more than 16 shared variables within an object. This is asserted in the code, but I thought it would be handy for you to know. Changing the ContentMask typedef in memento.h from unsigned short to unsigned int lets you have 32 shared variables at the cost of 2 extra bytes per packet (hmm, so maybe I should have made the ContentMask type a template parameter too).
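To make the 16-variable limit a bit more concrete, here is a minimal sketch of how a mask with one bit per shared variable could work. It assumes the memento marks each stored variable with a single bit in the mask; the helper names markPresent, isPresent and kMaxSharedVariables are made up for this illustration and are not taken from the library.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Stand-in for the ContentMask typedef in memento.h (assumed: one bit per variable).
typedef unsigned short ContentMask;

// How many shared variables a single mask can describe.
const std::size_t kMaxSharedVariables = sizeof(ContentMask) * CHAR_BIT;

// Returns 'mask' with the bit for the variable at 'index' switched on.
inline ContentMask markPresent(ContentMask mask, std::size_t index)
{
    assert(index < kMaxSharedVariables);
    return static_cast<ContentMask>(mask | (1u << index));
}

// Tells whether the variable at 'index' is flagged as stored in the memento.
inline bool isPresent(ContentMask mask, std::size_t index)
{
    assert(index < kMaxSharedVariables);
    return (mask & (1u << index)) != 0;
}
```

With a 16-bit unsigned short there is room for 16 flags, so a 17th variable would have nowhere to go. Swapping the typedef to a 32-bit unsigned int doubles the number of flags and widens every mask by two bytes.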
About performance / memory usage:
Each shared object has 4 bytes of overhead. We can handle that easily. But each SharedVariable<> within the object has 12 bytes of overhead, assuming 4-byte pointers! (4 for the vtable pointer, 4 for the pointer to the next shared variable, 1 for the boolean that tells whether this variable has changed, and 3 for padding)
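If you want to check the per-variable overhead on your own compiler, a rough sketch like the following can help. VarBase and Var are simplified stand-ins that I made up for this illustration (they only mimic the three bookkeeping members listed above), so the numbers are indicative of the layout, not a measurement of the real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Simplified stand-in for the shared variable base class.
// The virtual destructor makes every object carry a vtable pointer.
struct VarBase
{
    virtual ~VarBase() {}
    VarBase* next;   // pointer to the next shared variable
    bool changed;    // has the value changed?
};

// Simplified stand-in for SharedVariable<T>: bookkeeping plus the payload.
template <typename T>
struct Var : VarBase
{
    T value;
};

// Bytes spent on bookkeeping and padding, on top of the payload itself.
template <typename T>
std::size_t overheadOf()
{
    return sizeof(Var<T>) - sizeof(T);
}

// Usage: print the overhead for a float on the current platform.
inline void demoOverhead()
{
    // Two pointers and one flag are the minimum; padding comes on top.
    assert(overheadOf<float>() >= 2 * sizeof(void*) + sizeof(bool));
    std::printf("overhead per float: %u bytes\n",
                static_cast<unsigned>(overheadOf<float>()));
}
```

On a build with 8-byte pointers the overhead will be larger than the 12 bytes quoted above, since both pointers double in size.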
Now I'll describe how the speed compares to the most optimized way of storing an object's state. For objects that have N shared variables, calling getFullState() or setState() has an overhead of:
- N pointer reads, N assignments, N compares, N jumps (all very fast operations)
- N virtual function calls
Now, if you're not concerned about a few pointer reads or function calls and you have enough memory, this library is perfectly valid (as long as it doesn't contain any bugs!). I, however, decided that I need some more speed, and wrote a clumsier, faster system to share variables. More about that later.
Behind the scenes
How does it all work? Well, it's open source, so you can check the details there, but I'll give a brief explanation here. The basic idea is that a Serializable object contains a pointer to the first SharedVariable inside that object, and that SharedVariable contains a pointer to the next SharedVariable. The last SharedVariable points to 0. This way we can iterate through all SharedVariables, read their values and store them in a byte array (Memento).
SharedVariable, however, isn't a single type, since it's a class template, and a linked list can only contain nodes of a single type. That's why I created the SharedVariableBase class, which is the base class of all SharedVariable template instances. SharedVariableBase has a pure virtual function that is used to write the variable into a memento. SharedVariable overrides this function to write its real contents (char, float, std::string, etc.) into the memento. SharedVariable also overloads the type conversion operator and the assignment operator, so that it can be used somewhat like the type it contains.
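The structure described above can be sketched roughly like this. It is a simplified illustration and not the library's actual code: VarBase, Var, write and writeAll are names I made up, the list is linked by hand, and the raw byte copy only works for simple types such as float or int (a std::string would need its own write logic).

```cpp
#include <cassert>
#include <vector>

// Stand-in for SharedVariableBase: one node type for the whole list.
class VarBase
{
public:
    VarBase() : m_next(0) {}
    virtual ~VarBase() {}
    // Appends this variable's value to 'out'.
    virtual void write(std::vector<char>& out) const = 0;
    VarBase* m_next;   // next variable in the list, 0 at the end
};

// Stand-in for SharedVariable<T>.
template <typename T>
class Var : public VarBase
{
public:
    explicit Var(const T& value = T()) : m_value(value) {}
    // Assignment and conversion let a Var<T> be used much like a plain T.
    Var& operator=(const T& value) { m_value = value; return *this; }
    operator const T&() const { return m_value; }
    // Raw byte copy; only valid for trivially copyable types.
    virtual void write(std::vector<char>& out) const
    {
        const char* p = reinterpret_cast<const char*>(&m_value);
        out.insert(out.end(), p, p + sizeof(T));
    }
private:
    T m_value;
};

// Walks the list from 'first' and collects every value into one byte array.
inline std::vector<char> writeAll(const VarBase* first)
{
    std::vector<char> bytes;
    for (const VarBase* v = first; v != 0; v = v->m_next)
        v->write(bytes);   // one virtual call per variable
    return bytes;
}

// Usage: chain two variables by hand and serialize them.
inline void demoWriteAll()
{
    Var<float> position(1.5f);
    Var<int> health(7);
    position.m_next = &health;
    std::vector<char> bytes = writeAll(&position);
    assert(bytes.size() == sizeof(float) + sizeof(int));
}
```

The loop in writeAll is where the N pointer reads and N virtual calls mentioned in the performance section would come from in a design like this.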
But how do we initialize the linked list between SharedVariables? With some constructor trickery. When you create a shared object (one that derives from Serializable), its base class's (Serializable's) constructor is called first. Serializable's constructor initializes some static variables that are used to keep track of the SharedVariable initialization process. After that, the first SharedVariable's constructor is called. It reads the static variables from Serializable and updates them for the next SharedVariable constructor to use. How exactly these static variables help the initialization process is probably best read from the source.
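One way such constructor trickery could look is sketched below. This is my own simplified guess at the general idea, not a copy of the library's source: Owner plays the role of Serializable, Var plays the role of a SharedVariable, and the statics s_current and s_tail are hypothetical names. It relies on the C++ rule that the base class is constructed before the members, and that members are constructed in declaration order.

```cpp
#include <cassert>
#include <cstddef>

class Var;

// Stand-in for Serializable.
class Owner
{
public:
    Owner() : m_first(0)
    {
        // The Var constructors that run next will attach to this object.
        s_current = this;
        s_tail = 0;
    }
    Var* first() const { return m_first; }
private:
    friend class Var;
    Var* m_first;
    static Owner* s_current;   // object currently being constructed
    static Var* s_tail;        // last variable registered so far
};

Owner* Owner::s_current = 0;
Var* Owner::s_tail = 0;

// Stand-in for a shared variable; 'id' is only there to tell them apart.
class Var
{
public:
    explicit Var(int id) : m_id(id), m_next(0)
    {
        assert(Owner::s_current != 0);
        if (Owner::s_tail == 0)
            Owner::s_current->m_first = this;   // first variable of the owner
        else
            Owner::s_tail->m_next = this;       // append after the previous one
        Owner::s_tail = this;
    }
    int id() const { return m_id; }
    Var* next() const { return m_next; }
private:
    int m_id;
    Var* m_next;
};

// A shared object: the Owner base is built first, then the members in order.
struct Player : Owner
{
    Player() : m_position(1), m_name(2) {}
    Var m_position;
    Var m_name;
};

// Counts the variables that registered themselves with 'owner'.
inline std::size_t countVars(const Owner& owner)
{
    std::size_t n = 0;
    for (const Var* v = owner.first(); v != 0; v = v->next())
        ++n;
    return n;
}
```

Because the bookkeeping lives in static variables, two threads constructing objects at the same time would trample each other's state, which matches the thread safety warning given earlier. The sketch also ignores copying: a copied Player would end up with pointers into the original.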