Monday, August 13, 2012

"What is a relational database"

Various sources (see, for example, "An Introduction to Relational Theory") define a "database" as an
  • organized
  • machine-readable
  • collection of data.
Many alternative formulations exist, differing in style or in the level of detail they cover, but the essence is almost always the same.

From such definitions, we can derive (in no particular order):
  1. A database is a collection of data.  Note in particular that a "collection of data" is nowhere near the same thing as "a software system that can be used to manage data"!  A mere collection of data is something notoriously passive: it does not ever actively do anything.  A software system, by contrast, is a component that can indeed be very active.  What I'm getting at here is of course the distinction between the terms "database" and "DBMS".  A database is a collection of data; a DBMS is a software system that is used to manage that collection of data.  Not respecting that distinction can irritate some people quite heavily, so anyone reading this can now, once and for all, solemnly swear that they "shall never ever again use the word 'database' when what they actually mean is 'DBMS'".  If the batteries of your camera have run down, then you don't say that it's the batteries of your photograph that need recharging either, do you?
    Should you think this is unimportant or needless nitpicking, then think again.  It's important to be precise.  Why is it important to be precise?  Because we deal with machines that do exactly as they're told, and we programmers and software developers in general are the ones who are supposed to formulate the instructions that those machines will be following.  To be skilled at that, it is an absolute and unavoidable requirement that the person doing it is, indeed, capable of being precise.  Lethally precise.  The more capable we are of achieving that "lethal degree of preciseness", the more skilled we will be at writing software that is actually correct.
  2. Back to what a database is.  It is an organized collection of data, meaning there must be some kind of structure to the database.  If there is no structure in a database, then what it contains is not data but just garbage, and nothing and no one will ever be able to make any sense of what it contains.  The terms "unstructured data", "unstructured databases", "schema-less databases", "schema-less data", "schema-less data model", and what have you of that ilk, are all contradictions in terms, and nothing more than utter nonsense.  There always is some kind of structure.  It may be a very high-level kind of structure, it may not express anything that comes even remotely close to the kind of facts that the end user is actually dealing with in his world, it may rise sky-high in its level of "abstraction", in some sense, but there always is at least some kind of structure, some kind of organization.
  3. And finally, it is machine-readable.  Some alternative formulations speak of a database being an "electronic registration" of some sort; the essential point is the same.  (And comparatively unimportant, I'm tempted to add.  Its sole purpose is to rule out the paper cards in a filing cabinet of olden times as a kind of database that should be taken into consideration when discussing database technology in computerized environments.)
That established, we can state what it means to be a relational database:
  1. First and foremost, it means of course being a database, meaning it satisfies the three criteria mentioned earlier.
  2. Second, it is a collection of data that is organized according to the relational model of data, meaning that it is an organized collection of relations.  More precisely, it is an organized collection of relation variables: variables that take on relations as values.  The "organization" takes the form of each such relation variable (relvar for short) being assigned a unique name, by which it is identifiable in the database.
    Also to be noted is that each relvar in the database is accompanied by an intended interpretation.  It is precisely this "intended interpretation" that allows us users to make any sense of what the database contains.  This intended interpretation is of no concern to the machinery (the DBMS) that manages the data, but it is all the more important for us users, who are "external" to that machinery.  That is why this intended interpretation is often called the "external predicate" associated with the relvar.
(And consequently, at this point it can also be stated what it means to be a relational DBMS: in the first place, to be a DBMS, i.e. a software system that can be used to manage data and/or databases, and in the second place, to be one whose purpose is to manage a relational database in particular.)
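
To make the distinction tangible, here is a minimal Python sketch of the definitions above.  It is an illustration under stated assumptions, not how any real DBMS is built: the names Relvar, Database, ToyDBMS and relation are all invented for this example, and a relation is simplified to a set of tuples over a set of attribute names (attribute types are left out).  The Database object is purely passive data, a collection of uniquely named relvars; only the ToyDBMS object ever does anything.

```python
from dataclasses import dataclass, field


def relation(*rows):
    """Build a relation value: a set of tuples, each tuple being a set of
    (attribute, value) pairs.  Being a set, it cannot hold duplicates."""
    return frozenset(frozenset(row.items()) for row in rows)


@dataclass
class Relvar:
    """A relation variable: a named variable whose value is a relation."""
    name: str
    heading: frozenset           # the attribute names (types omitted here)
    external_predicate: str      # intended interpretation, for humans only
    value: frozenset = frozenset()


@dataclass
class Database:
    """The database itself: a passive collection of uniquely named relvars."""
    relvars: dict = field(default_factory=dict)


class ToyDBMS:
    """The software that manages a database (hypothetical, for illustration)."""

    def __init__(self, database):
        self.database = database

    def define_relvar(self, name, heading, external_predicate):
        # Relvar names must be unique within the database.
        if name in self.database.relvars:
            raise ValueError(f"relvar {name!r} already exists")
        self.database.relvars[name] = Relvar(
            name, frozenset(heading), external_predicate)

    def assign(self, name, new_value):
        relvar = self.database.relvars[name]
        # Every tuple must have exactly the attributes of the heading.
        for tup in new_value:
            if frozenset(attr for attr, _ in tup) != relvar.heading:
                raise ValueError(f"tuple does not match heading of {name!r}")
        # Note: the external predicate plays no part in this check.
        relvar.value = new_value

    def value_of(self, name):
        return self.database.relvars[name].value
```

A possible use: create db = Database() and dbms = ToyDBMS(db), call dbms.define_relvar("EMP", {"emp_no", "name"}, "Employee EMP_NO is named NAME."), and then assign it a value built with relation(...).  The external predicate is stored as a plain string that the ToyDBMS never inspects, which mirrors the point that the intended interpretation matters to the users and not to the machinery.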
