Friday, August 24, 2012

"A difference that doesn't matter"

This question was touched on lightly in an aside of a previous installment, leaving the question mostly unanswered.  The previous post mentioned this to illustrate the concept of a relation as a set :

 +------------------+-------------+--------------
I:INT X C:CHAR = {(I:1,C:'A') , (I:1,C:'B') , ... }
         +--------------+-------------+----------

And then it went on to conclude that for any of those individual tuples, e.g.  (I:1,C:'A')  ,

"We can also think of a (single) tuple as itself being a set, namely a set of ordered pairs in which the first thing is an attribute name and the second thing is "some value drawn from some domain".

And then a sidenote was made observing that other sources ("Introduction to Database Systems") "say that a tuple is a set of ordered triplets", and it was added that "The difference between the two approaches is not [so important]".

A little post to expand on that "difference" and to justify the claim that indeed this difference "doesn't matter".

Indeed, "Introduction to Database Systems" speaks of "the triple <Ai,Ti,vi> being a component of [some tuple]".  We can recognize the Ai and vi parts, that's the attribute name part and the value part, respectively, and the "additional" Ti part is the name of the corresponding domain for Ai.

Thus, the "difference" is essentially one between

 +------------------+----------------------+--------------
I:INT X C:CHAR = {(I:INT:1,C:CHAR:'A') , (I:INT:1,C:CHAR:'B') , }
         +------------------+----------------------+------

and

 +------------------+-------------+--------------
I:INT X C:CHAR = {(I:1,C:'A') , (I:1,C:'B') , ... }
         +--------------+-------------+----------

Is that "difference" of great significance ?  Not really.  In both "alternatives", the LHS of the equation is a statement of the heading that the resulting tuples conform to.  In both alternatives, there is a "mechanical" way to compute the RHS, given the LHS (and given the precise definition of the domains that are used in that LHS, of course).  That "mechanical" way consists of :
  • process all members of the "first" domain
  • for each member of the domain, process all members of the "second" domain
  • for each member of the second domain, process all members of the "third" domain
  • ...
  • for each member of the "next-to-last" domain, process all members of the "last" domain
  • for each member of the last domain, spell out the tuple consisting of those "current" domain members.
It should be obvious that the only difference between the two is in the details of the "spell out" part.  The first approach includes "spell out the domain name", the second example doesn't.

The "mechanical way" just described is, in a sense, the way to go from the LHS of the equations to the RHS.  One could also, somewhat philosophically, contemplate things that could happen if one ever wanted to "go from the RHS of those equations to the LHS", so to speak.  "Derive the heading from an enumerated cartesian product", so to speak.  In that case, there seems to be a more important difference.  The second example simply doesn't allow us to do that ... at first sight.  Indeed, no domain names are mentioned in the RHS of the second example.  So how do we obtain them ?  Rather simply : generate them.  Name them "1", "2", "3", ...  Those names don't have to be meaningful when dealing with questions that were only philosophical to begin with !

Anyway, it should be clear that both approaches are able to "cover the same ground", and that therefore whatever difference might be perceived between them is indeed only perception, superficial, and is indeed a difference that doesn't matter.

No comments:

Post a Comment