Tuesday, September 11, 2012

"Types, Sets, Tuples, Headings, Relations, a set mess ???"

That's a long title, and it's intended to illustrate how all of that stuff that actually underpins the Relational Model of Data, mathematically speaking, may come across as "one big tangled mess of almost-but-not-quite similar-or-equal concepts all thrown on a big pile".

And the body of this installment is about illustrating how it really is not that "perceived mess".

The following example has been used more than once already in this little corner of the internet.


I:INT X C:CHAR = {(I:1,C:'A') , (I:1,C:'B') , ...}


At this point, I need to take one more step back to basics.  The foregoing definition assumed it to be "known" or "agreed-on" what those domains INT and CHAR precisely were, respectively.  I.e. what its constituent members are.  The thing to note is that those "domains" are sets, and their members are values :


INT = {1 , 2 , ...}
CHAR = {'A' , 'B' , ...}

And that's really all it takes to be a type.  A type is a set of values.  And a value can be whatever we want it to be.  If I have a good use for the set of guitar chords, I can choose that to be a type.  If I have a good use for the set {cloud, oak, armchair} or {scissors, stone, net}, I can choose that to be a type.

(Aside : of course, in a scenario of information exchange, the parties participating in the exchange need to be in full and perfect agreement on what the types are (which values are in it and which are not).  Exchange without such a full agreement is unpleasant surprises in the air and accidents waiting to happen.)


Back to

I:INT X C:CHAR = {(I:1,C:'A') , (I:1,C:'B') , ...}


Obviously, by definition in fact, such Cartesian products are themselves a set.  Are they then also a type ?  The predictable answer is of course 'yes'.  A Cartesian Product is a set of tuples, and if we regard these tuples as being values, then any Cartesian Product of domains constitutes a tuple type.


One step further.  Relations were defined to be any subset of this cartesian product.  Any set is a subset of itself, thus any cartesian product is also a subset of itself, thus any cartesian product is a relation, and since each cartesian product constitutes a tuple type, at least this "maximal" relation ("maximal" subset of the cartesian product of the domains) can also be said to constitute a tuple type.  But all other relations derived from the same cartesian product are also "just sets of tuples".  Likewise, really just any relation can be thought of as "constituting a tuple type".  This holds true even of the empty relation, even though it's fairly counterintuitive to imagine a type that has no values ...  (What could we possibly be using that for ???)

Is all of that important ?  The view of "really just any relation" as itself being, or constituting, a tuple type ?  Not really.  Except perhaps for this.  One could "subdivide" the set of all possible subsets of our cartesian product in three distinct "classes".  The first class consisting of just the subset that is the original cartesian product itself.  The second class being those subsets that can be defined "predicatively", by stating a constraining additional predicate such as, e.g. "I equals 1".  And the third class being "the others".  Pragmatically speaking, this third class can only be defined "by enumeration".  (Formally, there is no real distinction between "by enumeration" and "by predicate", because stating the enumeration is, from the perspective of logic, the very same thing as stating a predicate.)

The third class of subsets is thus characterized by the fact that its definition "looks like", e.g. "It's the tuple type I:INT C:CHAR where either I equals 1 and C equals 'Z', or I equals 5 and C equals 'W'".  Which is a predicate, but one that looks suspiciously much like a full enumeration.

The second class of subsets is thus characterized by the fact that its definition "looks like", e.g. "It's the tuple type I:INT C:CHAR where I equals 1".  A reasonably concise way to define possibly useful subsets/types.

The first class is thus characterized by the fact that its definition is just simply, "It's the tuple type I:INT C:CHAR".  The 'I:INT X C:CHAR' portion is the heading.  The (only) tuple type in this class can thus be said to be characterized by its heading alone.  All other conceivable tuple types with the same heading need an additional constraining predicate to define which tuples are and are not member of the type.  This is the reason why in practice, this particular tuple type is often called "the" tuple type corresponding to some heading (as if there is only one such type).  Formally, this is rather incorrect.  Many tuple types have the same heading, but the one that has all possible tuples, is the most useful, in practice.

The fact that relations are themselves also tuple types, is the source of quite a bit of confusion. The "design dilemma", for example.  (That's the question of whether structures such as 'customer' and 'address' should be conveyed in their own type or not.)  It's not a dilemma at all.  Defining a Customer or Address class in an OO system constitutes defining a type.  The tuple type that corresponds to it is a type too.  Whether or not to give the type a name, is another question, and one that will mostly be moot at that.

On to relations.  If relations are themselves values, then what is their type ?  Well, a type is the set of all its contained values.  Hence, a relation type is a set of relations.  The construction of such a set of relations can be done using the mathematical concept of powersets.  A powerset is the set of all possible subsets of a given set.  So if we :
- start with a tuple type such as the one in our running example,
- enumerate all the possible subsets of that tuple type (that's equivalent to enumerating all the possible relations with that heading)
- and collect all those subsets in a new set,
then this new set is our relation type.

This relation type too is characterized by its heading (and its heading alone).  As with tuple types, this relation type is not the only relation type of this heading, though.  Any subset of the tuple type (e.g. all the tuples that have I>=1, thus ruling out negative numbers), when subjected to the same procedure, will in turn give rise to some relation type (relations with the specified heading and where all its tuples satisfy the predicate that (e.g.) I>=1).  Likewise, subsetting the relation type using some predicate (for example, retaining only "the relations that have at least three tuples") gives rise to some other new relation type.

Typically, however, relational systems do not exploit this property of relation and tuple types.  Instead they give the user the possibility to specify predicates that constrain which relations and tuples can validly appear in which places.  But a system that offered named tuple/relation types, with a constraining predicate, would fundamentally (mathematically) not really be different (in what it allows the user to achieve).

No comments:

Post a Comment