Relational Object Graphs

by Erich Ocean (onitunes)

Also see Cacheable JSON Results and Schemas.

A relational object graph is an object graph whose objects are related to each other in the relational sense used by the database management community.

There are two classes of objects in a relational object graph:

  1. the objects themselves
  2. the relations between the objects, which are pairs of object references

Objects are trivial to implement; they’re basically structs with dynamic (or positional, in the case of C++) method lookup.

Relations are also objects, but their implementation is much more complex. There are four variations:

  1. one-to-one
  2. one-to-many
  3. many-to-one
  4. many-to-many

Here’s what’s tricky: for each relation, you need two hashes: left-to-right, and right-to-left. Depending on the type of relation, the left or the right may allow multiple objects (or not).
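
As a rough illustration of the two hashes, here is a minimal sketch in JavaScript. The `Relation` class, its method names, and the choice to displace an existing pair when a "one" side is re-linked are all assumptions made for this example; they are not SproutCore API.

```javascript
// Hypothetical sketch: a relation stored as two hashes (Maps of Sets).
class Relation {
  constructor(type) {
    this.type = type;             // 'one-to-one' | 'one-to-many' | 'many-to-one' | 'many-to-many'
    this.leftToRight = new Map(); // left object  -> Set of right objects
    this.rightToLeft = new Map(); // right object -> Set of left objects
  }

  // True when one left object may be paired with several right objects.
  get manyRights() { return this.type.endsWith('to-many'); }
  // True when one right object may be paired with several left objects.
  get manyLefts() { return this.type.startsWith('many-to'); }

  link(left, right) {
    // Enforce cardinality by unlinking whatever the new pair displaces.
    if (!this.manyRights) for (const r of this.rightsOf(left)) this.unlink(left, r);
    if (!this.manyLefts) for (const l of this.leftsOf(right)) this.unlink(l, right);
    add(this.leftToRight, left, right);
    add(this.rightToLeft, right, left);
  }

  unlink(left, right) {
    remove(this.leftToRight, left, right);
    remove(this.rightToLeft, right, left);
  }

  // Both lookups return copies, so callers can unlink while iterating.
  rightsOf(left) { return [...(this.leftToRight.get(left) || [])]; }
  leftsOf(right) { return [...(this.rightToLeft.get(right) || [])]; }
}

function add(map, key, value) {
  if (!map.has(key)) map.set(key, new Set());
  map.get(key).add(value);
}

function remove(map, key, value) {
  const set = map.get(key);
  if (!set) return;
  set.delete(value);
  if (set.size === 0) map.delete(key);
}

const clientProjects = new Relation('one-to-many');
const acme = { name: 'Acme' };
const globex = { name: 'Globex' };
const site = { title: 'Website' };

clientProjects.link(acme, site);
clientProjects.link(globex, site); // moves the project: it can have only one client
```

Note that neither `acme` nor `site` is modified; only the relation changes.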

NOTE: Although I mention hashing, this is because I’m implementing an OODB. The concepts expressed are trivial to use in SQL. — Erich Ocean (aka onitunes).

When you “traverse” a relationship in a relational object graph, you are effectively using the relation to access one or more objects. To change how objects are related, you must modify the relation itself, not the objects.

In addition, and most importantly, there is only a single relation for both sides of the relationship. For example, in a one-to-many relationship between a client and its projects, there is only one relation in memory representing the relationship (think === equivalence). If you have a client object in memory, you can access its projects relation, which is an identifiable, addressable object in memory. If you were to get the list of projects and then ask one of those projects for its “client relation”, you would get the same client-projects relation in memory back to you.


KEY POINT

Uniquing relation objects is the key to making relational object graphs work (obviously, this implies that the objects themselves are also unique, i.e. the same object only has a single address in memory).


What makes this difficult to implement (the so-called “impedance mismatch between relational database and object technology”) is that databases do not expose relations explicitly. Instead, they compute them on-demand.

Things that are “computed-on-demand” are not “identifiable, addressable” objects. What can we do? How can we turn the computed-on-demand relationships of a relational database into identifiable, addressable objects that are only represented once in memory?

The solution is to name them. Relations should be named based on both sides, e.g. client-projects or blog-tags. They must also be of a certain type, e.g. one-to-one, one-to-many, or many-to-many. Finally, you must be able to instantiate them from either side of the relationship, and instantiating either side of the relationship multiple times must return the same relation in memory. This implies some kind of “factory” pattern.
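
One possible shape for such a factory is sketched below. `RelationFactory`, `define`, and `get` are hypothetical names, and the sketch assumes one relation object per name that holds every pair for that relationship.

```javascript
// Hypothetical relation factory: exactly one relation object per name.
const RELATION_TYPES = ['one-to-one', 'one-to-many', 'many-to-one', 'many-to-many'];

class RelationFactory {
  constructor() {
    this.definitions = new Map(); // name -> type
    this.relations = new Map();   // name -> the single relation object
  }

  // Declare a named relation and its type before it is first used.
  define(name, type) {
    if (!RELATION_TYPES.includes(type)) throw new Error(`Unknown relation type: ${type}`);
    this.definitions.set(name, type);
  }

  // Return the existing relation for `name`, creating it on first access.
  get(name) {
    if (!this.relations.has(name)) {
      if (!this.definitions.has(name)) throw new Error(`Unknown relation: ${name}`);
      this.relations.set(name, { name, type: this.definitions.get(name), pairs: [] });
    }
    return this.relations.get(name);
  }
}

const factory = new RelationFactory();
factory.define('client-projects', 'one-to-many');

// Either side asks for the relation by the same name and gets the same object.
const fromClientSide = factory.get('client-projects');
const fromProjectSide = factory.get('client-projects');
console.log(fromClientSide === fromProjectSide); // true
```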

When an object is first inserted into an object graph, all of its relations are “undefined”. When a relation is first accessed, the correct relation in memory is retrieved from the relation factory, where either the existing relation is returned, or a new relation is created on-the-fly. In this way, a new object can be “patched into” an existing object graph.
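
The lazy lookup might look something like the following sketch. `GraphObject`, `relationNamed`, and the property-to-name mapping are assumptions for illustration, and a plain `Map` stands in for the relation factory.

```javascript
// Hypothetical registry standing in for the relation factory.
const relations = new Map();
function relationNamed(name) {
  if (!relations.has(name)) relations.set(name, { name, pairs: [] });
  return relations.get(name);
}

class GraphObject {
  constructor(relationNames) {
    this.relationNames = relationNames; // property -> relation name
    this.resolved = {};                 // empty at first: every relation is "undefined"
  }

  // Resolve the relation on first access, then reuse the same object.
  relation(property) {
    if (this.resolved[property] === undefined) {
      const name = this.relationNames[property];
      if (name === undefined) throw new Error(`No relation for property: ${property}`);
      this.resolved[property] = relationNamed(name);
    }
    return this.resolved[property];
  }
}

const client = new GraphObject({ projects: 'client-projects' });
const project = new GraphObject({ client: 'client-projects' });

// Nothing is resolved until a relation is first accessed.
const resolvedBeforeAccess = Object.keys(project.resolved).length; // 0

// Accessing from either side patches the object into the same relation.
const sameRelation = client.relation('projects') === project.relation('client'); // true
```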

Objects are always singular; only a relation can contain multiple objects. Thus, array controllers always map to (one-side-of) a relation.

What does it mean to “save” an object in a relational database? Well, it means saving not only that object’s attributes, but also its relations, which may or may not be stored in that object’s row in the database. Thus, saving a single object could actually result in saving multiple “rows” (in multiple tables) in a SQL database.

To make this efficient, relations should mark themselves as needing to be saved, just like normal objects. You then save “the editing context” (e.g. SC.Store), just like you would in Core Data.
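
A minimal sketch of that idea follows. `EditingContext` and `Tracked` are hypothetical names; this is not the actual API of SC.Store or Core Data, and the `write` callback stands in for whatever persists a record.

```javascript
// Hypothetical editing context that collects anything needing a save.
class EditingContext {
  constructor() { this.dirty = new Set(); }

  markDirty(record) { this.dirty.add(record); }

  // `write` is a caller-supplied function that persists one record.
  save(write) {
    for (const record of this.dirty) write(record);
    this.dirty.clear();
  }
}

// Objects and relations are tracked in exactly the same way.
class Tracked {
  constructor(context, kind) {
    this.context = context;
    this.kind = kind; // 'object' or 'relation'
  }
  changed() { this.context.markDirty(this); }
}

const context = new EditingContext();
const project = new Tracked(context, 'object');
const clientProjects = new Tracked(context, 'relation');

project.changed();        // e.g. an attribute was edited
clientProjects.changed(); // e.g. the project moved to another client
clientProjects.changed(); // marking twice still saves only once

const written = [];
context.save(record => written.push(record.kind));
```

After the save, `written` holds one entry for the object and one for the relation, and the context has nothing left to save.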

Notes, Ideas…

The basic realization is that there are only two kinds of queries required to do remote observation:

  1. Is this object current as of transaction X?
  2. Does this relationship specify the correct set of object pairs as of transaction X?

And that’s it.
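
One way to read these two queries is as version checks: the server remembers the transaction that last changed each object and each relationship, and the client asks whether its copy, taken at transaction X, is still good. The sketch below assumes that reading; the ids, transaction numbers, and function names are made up for illustration.

```javascript
// Hypothetical server-side bookkeeping: the transaction that last changed each item.
const lastChanged = {
  objects: new Map([['client:1', 7], ['project:9', 12]]),
  relations: new Map([['client-projects', 12]]),
};

// Query 1: is a copy of this object taken at transaction x still current?
function isObjectCurrent(id, x) {
  return lastChanged.objects.get(id) <= x;
}

// Query 2: is a copy of this relationship's pairs taken at transaction x still correct?
function isRelationCurrent(name, x) {
  return lastChanged.relations.get(name) <= x;
}

const clientOk = isObjectCurrent('client:1', 10);           // true: last changed at 7
const projectOk = isObjectCurrent('project:9', 10);         // false: changed at 12
const relationOk = isRelationCurrent('client-projects', 12); // true
```

Unknown ids compare as `undefined <= x`, which is `false`, so an item the server has never seen is reported as not current.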

Keeping the server “up-to-date” involves sending back data to update a given object, and sending back data to update a given relationship. If all manipulation is otherwise performed on the client, then no code is required in the database, other than basic consistency/integrity code (to verify relationships are set up properly, etc.).


Relationships can all be represented by two-column tables (domain and codomain), and the various types of relationships can be enforced by appropriate “unique” indexes on the columns themselves and the pair of columns.
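
To make the index choice concrete, here is an in-memory sketch that mimics a two-column table and its “unique” indexes. It assumes the relation type is read left to right with the domain on the left (so in client-projects the client is the domain); `UNIQUE_INDEXES` and `RelationTable` are hypothetical names.

```javascript
// Which columns (or column pairs) must be unique for each relation type.
const UNIQUE_INDEXES = {
  'one-to-one':   [['domain'], ['codomain']],
  'one-to-many':  [['codomain']],            // each right-hand object has one left
  'many-to-one':  [['domain']],              // each left-hand object has one right
  'many-to-many': [['domain', 'codomain']],  // only duplicate pairs are forbidden
};

class RelationTable {
  constructor(type) {
    this.indexes = UNIQUE_INDEXES[type];
    this.rows = [];
  }

  // Reject a row that matches an existing row on every column of any unique index.
  insert(row) {
    for (const columns of this.indexes) {
      const clash = this.rows.some(r => columns.every(c => r[c] === row[c]));
      if (clash) throw new Error(`unique(${columns.join(', ')}) violated`);
    }
    this.rows.push(row);
  }
}

const clientProjects = new RelationTable('one-to-many');
clientProjects.insert({ domain: 'client:1', codomain: 'project:1' });
clientProjects.insert({ domain: 'client:1', codomain: 'project:2' }); // one client, many projects

let rejected = false;
try {
  // A second client for project:1 breaks the unique index on codomain.
  clientProjects.insert({ domain: 'client:2', codomain: 'project:1' });
} catch (e) {
  rejected = true;
}
```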

Should the columns be transaction-oriented? I guess I’d say yes. Setting a relationship for an object is different from updating that object’s attributes, so it would seem to make sense to use different transaction times for each. You can still determine if a given relationship applies for any two relatable objects at a given time. In most cases, a successful update will result in identical transaction times.

Last edited by erichocean, Mon Jul 28 17:33:06 -0700 2008