Many things change with the decision to work with purely object-oriented data in a specific situation. The outlook seems good: business processes and rules will be much easier to implement, completely typed data will be no problem at all, and there'll be no more structural problems trying to accommodate clumsy handling of records and rows in an otherwise OO application structure. An object/relational mapping tool will take care of all the persistence issues. One thing, though, will pose much greater problems than originally anticipated, and it's easy to overlook large parts of it when making the original decision: the wide topic of data integrity in the object world (OW). In this article I'm going to present some general questions and theories about data integrity in conjunction with OO data objects, and I'm planning to write further articles on the same topic later. Occasionally I may reference the technology I'm personally using at the moment, which is .NET 2, the C# language and XPO.
First of all, data integrity comes in two flavours: the technical side, enforced by structural means such as keys, constraints and indexes, and the logical side, enforced by business rules. Many aspects of both parts of data integrity are very different in the OW, compared to a "simple" relational database model.
People have been thinking about these issues in the relational database scenario for a long time. Concepts like referential integrity and unique indexes are very important in this domain, and normalisation provides for database configurations where automated referential integrity can be fully exploited. Databases have features that let the designer restrict values; together with modern database access layers like ADO.NET, these mechanisms cover the complete technical side of data integrity and possibly some of the logical part. In the OW, this is where the problems start.
Obviously, a good O/R mapping tool should be able to exploit the features of the database layer, but this isn't sufficient. As soon as a single object is mapped to more than one table (as it should be when inheritance is used), many of these mechanisms break. For example, it's impossible to define a multi-column index, unique or not, over values that don't reside in the same table. Depending on implementation details of the O/R mapper, even unique indexes over fields in the same table may present problems, for example if the mapper doesn't make sure that all necessary fields of an object are filled (correctly!) before the object is first saved to the database.
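To make the multi-table problem concrete, here is a sketch in the style of XPO persistent classes (the class and field names are invented for illustration, and the exact mapping details depend on the XPO version and configuration). With one table per class in the hierarchy, a constraint spanning a base-class field and a derived-class field has no single table to live in:

```csharp
using DevExpress.Xpo;

// Base class: mapped to its own table, e.g. "Person".
public class Person : XPObject
{
    public Person(Session session) : base(session) { }

    public string LastName;       // stored in the Person table
}

// Derived class: mapped to a second table, e.g. "Employee",
// joined to the Person table via the shared primary key.
public class Employee : Person
{
    public Employee(Session session) : base(session) { }

    public int EmployeeNumber;    // stored in the Employee table

    // A unique index over (LastName, EmployeeNumber) cannot be
    // expressed on the database level: the two values live in
    // different tables, and indexes can't span tables.
}
```

Any uniqueness rule of that kind has to be checked by the mapper or the application itself, with all the concurrency caveats that implies.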
When working directly with relational databases, the easy way to implement business logic is on the server side. Using triggers on the database level, consistency checks can be implemented, other processes executed just in time, and so on. Unfortunately, this approach has a lot of drawbacks; one of the worst is that there's no easy way to give useful user feedback when a check fails. In real-world applications, business logic implementations will more often than not be split, performing some kinds of actions on the database level while leaving other things to the client application. For the latter part, it's difficult to find the right "place" to implement it; in .NET, a typed dataset can provide part of a useful answer.
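As a contrast to a database trigger, which can usually only abort the operation with a fairly generic error, a client-side check can tell the user exactly what is wrong. A minimal plain C# sketch, with an invented Order class and invented rules:

```csharp
using System;

public class Order
{
    public decimal Total;
    public DateTime DeliveryDate;

    // A client-side consistency check that produces user-readable
    // feedback -- something a trigger can't easily provide.
    public bool Validate(out string message)
    {
        if (Total < 0m)
        {
            message = "The order total must not be negative.";
            return false;
        }
        if (DeliveryDate < DateTime.Today)
        {
            message = "The delivery date must not lie in the past.";
            return false;
        }
        message = null;
        return true;
    }
}
```

The difficulty the article describes is not writing such a check, but deciding where it belongs, and keeping it in sync with whatever the server still enforces on its own.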
As long as any consistency checking is implemented on the server side, there's always the problem that data which has already been loaded on the client, and changed there, may not adhere to the restrictions the server would enforce if the data were to be saved. The programmer has to keep an eye on the exact state of things and see to it that data is saved to the server in all the right places.
There are two aspects to this issue. First, an O/R mapping tool should be allowed to define its own database structure with as much freedom as possible. (I know that a lot of people think this should work the other way round, letting them define a structure and leaving the tool to deal with it. Apart from situations where one needs to work with legacy data structures, this seems like nonsense to me and contorts the purpose of such tools.) Obviously, I'd have to be very careful when writing database layer code that relies on the generated layout, and I'd risk breakage every time I update the tool.
Second, from the OW point of view, it seems intolerable to have a number of objects in memory at any given time that may not be in a consistent state. With relational data, this is often a situation that’s simply left to the developer of each distinct algorithm. But when objects are global to the application (or parts of it, at least) and there are intelligent caching and lifecycle management mechanisms in place, as implemented by a useful O/R mapper, one can’t live with the possibility of inconsistent states in in-memory data.
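One way to rule out inconsistent in-memory objects is to enforce invariants at the moment of change rather than only at save time, so that no instance can ever enter a bad state, no matter when, or whether, it is persisted. A plain C# sketch (the Account class and its invariant are invented for illustration):

```csharp
using System;

public class Account
{
    private decimal balance;

    // The invariant (balance >= 0) is checked when the value changes,
    // not when the object is saved. A cached, application-global
    // instance can therefore never hold an inconsistent state.
    public decimal Balance
    {
        get { return balance; }
        set
        {
            if (value < 0m)
                throw new ArgumentOutOfRangeException(
                    "value", "The balance must not become negative.");
            balance = value;
        }
    }
}
```

This works for invariants local to one object; cross-object rules, like the uniqueness constraints discussed above, still need cooperation from the mapper or the application.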
So, these are (some of) the specific issues we have to deal with in the OW: enforcing uniqueness and other constraints when a single object spans several tables, deciding where business logic should live when it can no longer simply go into triggers, keeping modified client-side data in line with server-enforced restrictions, and making sure that cached, application-global objects never linger in an inconsistent state. These issues and their solutions will be the subjects of future posts. Thanks for reading so far!