Friday, August 17, 2012

"Relations of nothing at all"

In a previous installment, it was explained how relations, in the sense of the Relational Model of Data, are a generalization of the binary relations that are typically studied in mathematics.  Obviously, that "generalization" implies that such relations consist of "pairs" that have more than two values (drawn from the domains that were used to form the cartesian product), but the possibility was also alluded at to have "pairs" with only one value for its members, and even the possibility of such "pairs" with zero values for its members.

How does that make sense ?

First, let's revisit the "trick" that the Relational Model applied in "replacing" the concept of "ordinal" pairs with that of "unordered" pairs, "decomposing" that trick and its consequences step-by-step.

Start with the cartesian product of domains INT and CHAR :

 +-------------+---------+---------------+--------
INT X CHAR = {(1,'A') , (1,'B') , ... , (2,'A') , ...}
        +---------+---------+---------------+-----

What those lines seek to express is that we "know" that the numbers in those pairs derive from domain INT, precisely because they are the first ones in the pairs, and likewise for the quoted characters.

(Aside : computer languages that support 'tuples' natively typically use this 'ordered' version.  Haskell is a case in point.  There is a "first" operator to "retrieve" the "first" value from a tuple.  I digress.)

But for reasons of practicality, we do not want "addressability" (of the values within the tuples) by ordinal position (well at least not in settings where we are dealing with "large shared databases"), we want addressibility by attribute name.  The way to do this is by prepending attribute names to each and every thingy in that equation :

 +------------------+-------------+--------------
I:INT X C:CHAR = {(I:1,C:'A') , (I:1,C:'B') , ... }
         +--------------+-------------+----------

Aside : The equation as given states the equality between (some kind of) "predicative" way of defining a set (left side of the equation) and the "enumerative" way of defining a set (right side of the equation).  An "intensional" versus an "extensional" way of defining a type if you will.  But pls don't proliferate that latter terminology as it may be grotesquely off-base with more common usage of those terms.  Anyways.  Going in detail on that would be digressing from my main point here.  Maybe something for another installment.  End-of-aside.

Do we now still need those "pairs" (n-tuples) to be ordered ?  Well obviously, no we don't.  As long as we "know of the association" of I to domain INT and of C to domain CHAR, there is no longer any meaningful difference between (writing down that first tuple as)   (I:1,C:'A')   and (writing down that first tuple as)   (C:'A',I:1).


As a consequence, under this notation, we can also think of a (single) tuple as itself being a set, namely a set of these things that are separated by the comma's in each tuple of the enumeration.  And what exactly are those "things that are separated by comma's" ?  Eurhm, well, they are ordered pairs in which the first thing is an attribute name and the second thing is "some value drawn from some domain".

Another aside : "Introduction to Database Systems" says that a tuple is a set of ordered triplets.  The difference between the two approaches is not germane to the main point being made in this post (which is tuples and relations of nothing at all).  Suffice it to say here that the difference plays on ( / is a consequence of ) that presumption that was spelled out "As long as we "know of the association" of I to domain INT and of C to domain CHAR".  Maybe something for another installment.  End-of-another-aside.

Okay let's see what happens if we explicitly re-cast that tuple as a set of ordered pairs.
That gives us   { (I,1) , (C,'A') }.   Braces in bold for emphasis.
It should be clear that if we take a cartesian product of n domains, we get tuples that are themselves a set of cardinality n.  And since set theory is not limited to cardinalities >=2, this implies that, at least theoretically, we could also consider tuples of cardinality one and tuples of cardinality zero.

(Side remark on terminology : the number of attributes in a tuple is more commonly referred to as its degree.  I use the term "cardinality" here because that term is more closely related to set theory, and I am stressing the concept of tuples-as-sets here.  It should be clear that my "cardinality" of a tuple and the literature's more common "degree" of a tuple are the very same thing.)

We'll look briefly at tuples of cardinality one, because there is little unintuitive about them, and they will show us how tuples (and relations) of nothing at all, indeed make sense, and how they even are useful.  A tuple of cardinality one could be something like   TUPLE{ (I,1) }.   They are useful whenever the information that we want to store or retrieve, is the logical counterpart of a unary predicate, i.e. a predicate in which there is exactly one "fill-in-the-gaps" place.  For example, "The average temperature tomorrow will be §I§ degrees centigrade.".  Or "the highest customer number assigned is currently §I§".  Or "A §make of car§ has passed through our street today.".  (This latter one merely to illustrate that one must beware of confusing unary predicates with "predicates of which only one single instantiation can be true.  The talk is of unary predicates.)

Okay.  One step further down the line of decreasing tuple cardinalities (and we're finally where the title suggested we'd end up).  A tuple of cardinality zero, is, then, "a set of attribute values that has zero members".  Iow, it is the empty set of attribute values.  Iow, it is the empty set.   TUPLE{ }, in our usual notation.

How many such tuples are there ?  That's pretty obvious : exactly as many as there are "empty sets" in mathematics ... exactly one, thus.  Because the zero-tuple is unique, we can give it a name.  TUPLE_DEE in what follows.  Now where can TUPLE_DEE be useful in data management ?  That should by now be obvious too : it will be useful in any place where the information we want to store or retrieve, is logically defined by a nilary predicate.  I.e. whenever we deal with information, the defining predicate of which has no fill-in-the-gaps spots at all.  These predicates are sometimes labeled "degenerate predicates", and actually they correspond to propositions.  The following examples are drawn from "Databases, Types and the Relational Model", appendix E : "The door is open", or "The alarm is set".  No §§ marks, no fill-in-the-gaps.

Suppose you'd have to design a database to record the current state of affairs for those two predicates.  In SQL.  Would you create a "type" that can represent open/closed ?  One that can represent set/not set ?  What would you be doing to prevent your tables from containing two distinct rows, one saying that the door is "open" and the other row saying that the door is "closed" ?  Or would you design a boolean attribute "true" for open and "false" for closed ?  Yet again, how would you interpret such a table if it is empty ?  How would you enforce the rule that there must always be exactly one row in your table ?  All of that involves much more complication than is needed or warranted.

All you really need for such a database, is a relvar (/table) with no attributes (columns) at all.  If I want to record the information that "the door is closed", then I just ensure the table is empty.  How does this work ?  Well, any tuple in a relvar represents a true instantiation of the relvar's predicate.  If that predicate is a niladic, "degenerate" one, then the only instantiation possible is, the predicate itself (which already is a proposition by and of itself).  If there are no tuples at all in such a relvar, then this means that there are no true instantiations (/no true instances) of the relvar's proposition, meaning the relvar's proposition is FALSE.  If there is indeed a tuple in such a relvar, then that means that there does exist a true instantiation of the relvar's predicate, and since the only possible instantiation of that predicate is the predicate (proposition) itself, it means that the relvar's proposition is true.

Observe that there are thus two distinct possible values that such a "niladic" relvar can take on : one possible value in which no tuple is present at all (that's just the empty relation), and a value in which the empty tuple is present.  The difference between these two is a bit akin to the difference (in mathematics) between   {} and {{}}.  These values (relation values) are often assigned a name too, most often "TABLE_DUM" and "TABLE_DEE", other times more briefly "DUM" and "DEE".

I coloured the "outer" braces red to illustrate that these are the braces "associated with" a relvar and its corresponding predicate (proposition in our niladic case) and the "inner" ones green to illustrate that these correspond to a tuple that makes an instantiation of a predicate (proposition in our niladic case) true.

When querying a database, it is often claimed that DUM and DEE represent "false and true, respectively", or "no and yes, respectively".  That is a little bit sloppy.  Here's why :

If a database relvar's predicate is "The door is open", and we query that relvar, then we get back either DUM or DEE.  If it is DEE, then TUPLE_DEE appears in it, but the 'meaning' that TUPLE_DEE carries in this case is "The door is open".  It is only because of human interpretation that we "know" that this means a "yes" answer to the question "is the door open".  If we ask a question "is at least one student enrolled on any course", then we could inquire an ENROLMENT relvar and project away all of its attributes.  Once again, the result we get is either DUM or DEE.  If it is DEE, then TUPLE_DEE within it carries the meaning of the projection, which is "there exists some student and there exists some course such that that student is enrolled on that course".  It is only because of human interpretation that we know we can answer the question asked with a firm "yes" ...

1 comment:

  1. Interestingly, I rarely see the acronym that I now take here to stand for "in other words" by ruling out as rather unlikely all the alternatives I saw here:
    https://acronyms.thefreedictionary.com/IOW

    Styling it as Iow (Initial Caps) made it even more confusing to me, i.e. I initially mistook it as "low" being a misspelling of "lo" from "low and behold".

    We agree on countless things that lack general consensus, mostly because people who should be thinking about those things aren't, so understand this wasn't in attack in any sense of the words--I genuinely threw an exception that had to be caught by a dictionary of some sort. If you use it frequently (and I am correct in my interpretation) it won't pose further problems for me.

    ReplyDelete