Monday, August 13, 2012

"What is a relational table"

It's the kind of question that always tempts me to respond with brief answers like "a contradiction of terms", or "a misnomer" (and then leave it at that).

But that's not very helpful of course.  Hence a more elaborate attempt here to write down how I would personally answer the question.

First of all, the word "table" is a bit of an overloaded term in the world of relational, or pseudo-relational (SQL) data management.  In SQL, the word stands for two things, that are fundamentally very very distinct, but where SQL, mainly for historical reasons, failed to recognise that distinction.  Those two things are :

(a) an updatable object managed by the database.  (Creating such an updatable object, in its quality as 'updatable object', is done by CREATE TABLE ...)  In terms of modern relational theory, this could better be called a "TABLE VARIABLE".  The qualifier "VARIABLE" refers to the quality of being an updatable object, i.e. a thing whose state (aka current value) can change.
(b) the actual content contained in such an updatable object.  Those are the things that are usually depicted in the relational (and pseudo-relational) textbooks as cells holding a value (but sometimes not) and separated by some lines :

+---------+----------+
! NAME    !    SCORE !
+---------+----------+
! JOHN    !        1 !
+---------+----------+

In relationland however, "tables" do not really exist.  What does exist is what relational theory calls relations.  Tables only enter the picture (pun intended) when relations need to be represented on paper.  And when they do, tables are a very convenient way to depict, to render, to display, a relation on some device, e.g. paper or LCD flatscreen.  (But note that they are not the only possible way !).  That is why Fabian Pascal always says that a table is a picture of a relation.
Also in relationland, the distinction [that SQL failed to make] between 'updatable object' and 'the contents thereof', is properly made.  The 'updatable objects' are referred to as "RELATION VARIABLES" (RELVARS for short), and the 'contents thereof' are referred to as "RELATION VALUES" (RELATIONS for short).

All of this just to illustrate that the "proper" term to use for denoting that concept of "relational table", is " relation".


Now onto "What is a relation", which was presumably the intent of the question originally asked.  I have some stuff on the very same subject on my website, but the treatment there might come across as overly pedantic, and if you can't stand that, well, then it's not helpful.  So I'll be a bit more brief and a bit "looser" in what follows.  Here goes.
Take some sets of values that you deem useful and assign them a name.  For example WE_DAYS is the set {saturday, sunday} and BOOLEAN is the set {true, false}.  From mathematics, you are supposed to know that we can form a cartesian product with those sets.  For example, the cartesian product WE_DAYS X WE_DAYS gives us a set of ordered pairs , where each element of the pair is a value that appears in WE_DAYS : { (saturday,saturday),(sunday,sunday),(saturday,sunday),(sunday,saturday) } .  The cartesian product BOOLEAN X WE_DAYS gives us : { (true,saturday),(true,sunday),(false,sunday),(false,saturday) } .

Two points to note : 
  1. Mathematics typically stays within the bounds of binary relations, i.e. based on the cartesian product of exactly two domains.  For purposes of data management, this is to be generalized to cartesian product of "any arbitrary number of domains", thus giving rise not only to pairs, but also to triplets, quadruplets, quintuplets, etc.  The "poshy" name for this is n-tuples, where n stands for any number such as 2, 3, 4, 5, ... but also possibly for the number 1, and even the number zero !
  2. There is a sense of "ordering" to those n-tuples, and for purposes of data management, this too is undesirable.  We don't want to fill our programs with references to "the second attribute" or some such.  We want to be able to use attribute names instead, in those references, and we do not want to have to bother with any alleged "meaningfulness" of any kind of "ordering".  Any such "meaningfulness" only brings about additional maintenance problems (if a column is removed, all subsequent columns would be "renumbered", and all references to them must be revised), the costs of which are not offset by any material benefit.  And certain laws of capitalist economy dictate that matters be organized so as not to incur those costs :-)
Back to mathematics.  After having defined what a cartesian product is, that discipline then goes on to define what a relation is : a relation is some subset of such a cartesian product.  The two examples [of a cartesian product] previously spelled out, both give rise to sixteen possible relations :
  1. First of all, the cartesian product itself (a cartesian product is itself also a set [of ordered pairs], and any set is by definition a subset of itself).
  2. Four possible relations that have exactly three of the ordered pairs
  3. Six possible relations that have exactly two of the ordered pairs
  4. Four possible relations that have exactly one of the ordered pairs
  5. And finally, the empty set, which is by definition a subset of all sets, thus also a subset of any cartesian product, thus by defintion also always a relation.
A relation in the sense of the relational model of data, is actually the very same thing : it is a subset of a cartesian product of domains, but always with the amendment in mind that the (so-called) "ordered pairs" of such a cartesian product, are not necessarily pairs, and are certainly not "ordered" (but its constituent components are addressable by attribute name instead).
A number of consequences of this definition deserve to be spelled out.
  1. Is there any way for a cartesian product to contain a "pair" (/n-tuple) whose number of constituent components is not equal to the number of domains that the cartesian product was taken with ?  No there isn't.  As a consequence, it there any way for there being any "pair" (/n-tuple) in a relation that does not have exactly n attributes ?  Once again, no, there isn't.  As a consequence, there is no room for such a thing as SQL's NULL in a relation, as defined by the relational model.  (There would be, if a thing such as NULL were a member of one of those sets that the cartesian product was taken over, but that would mean that that kind of thing would actually be a value, meaning that at the very least, such a thing would have to be considered equal to itself.  (Contrast that with SQL.)
  2. Is there a way for a cartesian product to contain "the same pair twice", so to speak ?  No there isn't.  As a consequence, is there any way for a relation to "contain the same n-tuple twice or more, so to speak" ?  Once again, no there isn't.  If a component of a relational engine produces relations as its output, then that means that there cannot possibly be "duplicate rows".

And that really is all there is to say on "what is a relation" ...  The following posts will be related to "what do relations mean", and that very curious possibility of "a subset of a cartesian product of no domains at all" ...

No comments:

Post a Comment