Tuesday, January 8, 2013

A letter by Carl Hewitt

Recently, an article by Erik Meijer entitled "All Your Database Are Belong to Us" created a bit of commotion among Relationlanders.  One Carl Hewitt subsequently wrote a letter in support of Meijer's article.  What follows are a few observations and personal thoughts about Hewitt's letter.

Hewitt writes :

"Relational databases have been very useful in practice but are increasingly an obstacle to progress due to the following limitations:"

This clearly implies that mr. Hewitt believes that, at some point in time, a relational database has actually existed somewhere, and that he has been able to observe, directly or indirectly, that this database was "useful in practice".  Considering the overwhelming evidence that SQL is not relational, I wonder what "relational database" mr. Hewitt has so observed to be "useful in practice".  Anyway.  Is mr. Hewitt unaware of the difference between "relational technology" and "SQL technology" ?  Or does he consider the difference between the two (and its consequences) so meaningless and futile that it does no harm at all to gloss over it and speak of SQL as if it were indeed relational ?

Besides.  Looking at his arguments in support of his claim (inexpressiveness of the RA, for example), one cannot help but wonder if mr. Hewitt is even aware of the difference between a "database" (that's the word he used) and that thing that most knowledgeable people usually use the term "DBMS" for ...  Can "inexpressiveness of the RA" possibly be a shortcoming of "an organized collection of data" (if so, how ?), or could it only possibly be a shortcoming of "the software system, the foundations of which are in RA, used to manage the collection of data" ?

What does that tell you about how thorough, accurate and meticulously precise mr. Hewitt tries/bothers to be in his published writings ?

Hewitt writes :

"Inexpressiveness. Relational algebra cannot conveniently express negation or disjunction, much less the generalization/specialization connective required for ontologies;"

Codd's Relational Algebra was proven equivalent in expressive power to his relational calculus, which is an applied form of first-order predicate calculus, no ?  So that means that Codd's RA can express both negation and disjunction, right ?  And subsequent definitions of RA that emerged over time (think, for example, of the addition of a transitive closure operation) did not exactly remove the MINUS and UNION operators from the existing algebra, right ?  So that indicates that the key operative word in the claim is that vague qualification "conveniently", right ?  Using such a word without being precise about its intended meaning is just cheap handwaving.

Anyway, the RA has MINUS (and its nephew SEMIMINUS, aka antijoin), and Relationlanders have known for over 40 years that this operator perfectly suits the purpose of expressing any predicate like "... and it is not the case that ...".  It remains an open question what mr. Hewitt thinks is "inconvenient" about writing things such as "<xpr1> MINUS <xpr2>" in, say, Tutorial D code.
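
To make the point concrete, here is a minimal sketch in Python rather than Tutorial D, with relations modelled as plain sets of tuples.  The supplier and shipment data and all variable names are hypothetical illustrations, and a set of tuples is only a rough stand-in for a real relation with a named heading.

```python
# Relations modelled as sets of tuples; all data here is hypothetical.
# Heading of `suppliers`: (name, city)
suppliers = {("Smith", "London"), ("Jones", "Paris"), ("Blake", "Paris")}
in_paris = {(n, c) for (n, c) in suppliers if c == "Paris"}
in_london = {(n, c) for (n, c) in suppliers if c == "London"}

# Negation: "is a supplier AND it is not the case that the supplier is in Paris".
not_in_paris = suppliers - in_paris        # MINUS

# Disjunction: "is in Paris OR is in London".
paris_or_london = in_paris | in_london     # UNION

# SEMIMINUS (antijoin): suppliers for which no matching shipment exists.
# Heading of `shipments`: (name, part)
shipments = {("Smith", "P1"), ("Blake", "P2")}
shipping_names = {n for (n, _part) in shipments}
no_shipments = {(n, c) for (n, c) in suppliers if n not in shipping_names}

print(not_in_paris)
print(no_shipments)
```

Each of the three predicates is a single expression, which is about as much "inconvenience" as the Tutorial D equivalents involve.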

Also, there is nothing to stop anyone from defining an algebra that has a "complement" operation (well, so long as all the domains/types are finite, presumably).  This algebraic operation by itself is the exact relational counterpart of the logical operation of negation, taken by itself.  Having to actually compute complements will be contraindicated in most circumstances, as it will typically involve actual computation of what is known in relationland as the "universal relation" for a given heading.  All of that is probably exactly the reason why Codd did not want to include such a "complement" operation in his algebra.
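
A small sketch of what such a "complement" operation would have to do, assuming finite domains.  The domains NAMES and CITIES and the helper names are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical finite domains for the heading (name, city).
NAMES = {"Smith", "Jones"}
CITIES = {"London", "Paris"}

def universal_relation(*domains):
    """Every tuple that could possibly appear under the heading."""
    return set(product(*domains))

def complement(relation, *domains):
    """All tuples of the universal relation that are not in `relation`."""
    return universal_relation(*domains) - relation

located_in = {("Smith", "London"), ("Jones", "Paris")}
not_located_in = complement(located_in, NAMES, CITIES)

print(len(universal_relation(NAMES, CITIES)))
print(not_located_in)
```

The universal relation has as many tuples as the product of the domain sizes, so with realistic domains (all character strings up to some length, say) materializing it quickly becomes impractical, which is the cost referred to above.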

At any rate, I'm still left wondering what mr. Hewitt's problem is here.

Hewitt writes :

"Inconsistency non-robustness. Inconsistency robustness is information-system performance ..."

Note very carefully that Hewitt's complaint here is that the RM lacks "inconsistency robustness", which he then defines to be a performance characteristic.  Performance is not a characteristic of the model.  Anyway.  Once writers start going down this alley, readers can already suspect the kind of, erm, "talk" that is about to follow ...

"... in the face of continually pervasive inconsistencies, a shift from the once-dominant paradigms of inconsistency denial and inconsistency elimination attempting to sweep inconsistencies under the rug. In practice, it is impossible to meet the requirement of the Relational Model that all information must be consistent, but the Relational Model does not process inconsistent information correctly. ..."

"Inconsistent", in the context of data[base] management, means the presence of information that violates some stated rule that is supposed to apply to the database.  In other words, accepting the "inconsistent" information in the database makes the proposition stating that the violated rule holds a false one.  That is, the proposition stating that the rule holds in the database contradicts the "inconsistent" information.  So accepting inconsistent information in the database is tantamount to accepting contradictions.

And I've been told that it is a proven property of classical logic that anything at all can be proven from a contradiction.
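
That property is commonly known as the principle of explosion (ex falso quodlibet).  As a minimal sketch, it can be stated and checked in Lean 4 in a couple of lines; the theorem name `explosion` is just a label chosen here.

```lean
-- Principle of explosion: from P together with ¬P, any proposition Q follows.
theorem explosion (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```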

In RM, it is "possible" to consider "inconsistent information", just like in 2-valued propositional and/or predicate logic, it is "possible" to consider contradictory propositions/predicates.  Querying an RM system that holds "inconsistent information" is like applying the rules of logical reasoning to a set of contradictory axioms/premisses.  And blaming the RM for "not processing inconsistent information correctly", is like blaming logic for "not dealing with contradictions correctly" (where by 'correctly', it is implied that it should be something other than just 'false', of course).

"... Attempting to use transactions to remove contradictions from, say, relational medical information is tantamount to a distributed-denial-of-service attack due to the locking required to prevent introduction of new inconsistencies even as contradictions are being removed in the presence of numerous interdependencies;"

Anyone who is familiar with the RM, and also with how it is typically criticized, will recognize this immediately as criticizing the model on grounds of implementation issues (locking and transactions), which are orthogonal to the model.  Typical.

Hewitt continues :

"Information loss. Once information is known, it should be known thereafter;"

Should it really ?  I dispute that.  A requirement to never ever ERASE or DELETE or REMOVE anything at all will inevitably bring us to the point where planet earth does not have enough atoms to store all our stuff.

At any rate, there is absolutely nothing in the RM that prevents any database designer from defining structures that keep a record of "which information was known to the database owner during which period of time" and/or "at which point in time the database owner regarded this particular piece of information as no longer relevant and removed it from his operational system".  Even SQL has included features to support such stuff (system-versioned tables and application-time periods) in the SQL:2011 standard.
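
As a rough illustration of such a structure, here is a minimal sketch using Python's built-in sqlite3 module.  The table allergy_history, its columns, the END_OF_TIME sentinel and the helper functions are all hypothetical, and this is hand-rolled bookkeeping, not the SQL:2011 temporal syntax.

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # hypothetical sentinel meaning "still known"

conn = sqlite3.connect(":memory:")
# Each row states: "this fact was known from known_from until known_until".
conn.execute(
    """
    CREATE TABLE allergy_history (
        patient     TEXT NOT NULL,
        allergy     TEXT NOT NULL,
        known_from  TEXT NOT NULL,
        known_until TEXT NOT NULL,
        PRIMARY KEY (patient, allergy, known_from)
    )
    """
)

def record(patient, allergy, on):
    """Start knowing a fact as of ISO date `on`."""
    conn.execute(
        "INSERT INTO allergy_history VALUES (?, ?, ?, ?)",
        (patient, allergy, on, END_OF_TIME),
    )

def retract(patient, allergy, on):
    """'Delete' a fact by closing its period instead of erasing the row."""
    conn.execute(
        "UPDATE allergy_history SET known_until = ? "
        "WHERE patient = ? AND allergy = ? AND known_until = ?",
        (on, patient, allergy, END_OF_TIME),
    )

def known_as_of(on):
    """The facts the database owner held to be true on ISO date `on`."""
    rows = conn.execute(
        "SELECT patient, allergy FROM allergy_history "
        "WHERE known_from <= ? AND ? < known_until",
        (on, on),
    )
    return set(rows.fetchall())

record("Ann", "penicillin", "2012-01-10")
retract("Ann", "penicillin", "2012-06-01")
record("Ann", "latex", "2012-06-01")

print(known_as_of("2012-03-01"))
print(known_as_of("2013-01-08"))
```

Nothing is lost here: the retracted fact remains queryable for any date on which it was held to be true, and whether such rows are ever physically removed is a separate decision.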

And in the end, when to DELETE a piece of information should be at the user's discretion (within whatever regulatory bounds apply, if that wasn't obvious), not at the model's discretion.

Hewitt still hasn't finished :

"Lack of provenance. All information stored or derived should have provenance;"

There is absolutely nothing in the RM to stop a DBMS user from defining structures that record exactly the kind of "provenance" information that Hewitt is talking about (whatever that may be), there is nothing in the RM to stop a DBMS designer from building facilities to automagically populate such structures, and there is nothing in the RM to stop a DBMS user from using such facilities.
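
A minimal sketch of the idea, again with relations as Python sets of tuples.  The headings, the data and the choice to carry provenance as an ordinary "source" attribute are assumptions made for illustration only; Hewitt's letter does not say what form provenance should take.

```python
# Provenance recorded as an ordinary attribute; all data is hypothetical.
# Heading: (employee, department, source)
assignments = {
    ("Ann", "Sales", "HR form 17"),
    ("Bob", "Sales", "manager email"),
}
# Heading: (department, city, source)
locations = {("Sales", "Paris", "facilities register")}

# Join on department.  Each derived tuple carries the sources of both
# tuples it was derived from, as a set-valued attribute.
# Heading: (employee, city, sources)
employee_city = {
    (emp, city, frozenset({src_a, src_l}))
    for (emp, dept_a, src_a) in assignments
    for (dept_l, city, src_l) in locations
    if dept_a == dept_l
}

for row in sorted(employee_city, key=lambda r: r[0]):
    print(row[0], row[1], sorted(row[2]))
```

A DBMS could populate and propagate such attributes automatically, but that is a matter of what facilities are built on top of the model, not of the model itself.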

Nor should there be any such thing in the RM.

And on and on it goes :

"Inadequate performance and modularity. SQL lacks performance ..."

So once again he unambiguously states that he thinks that "The RM is obsolete" (that's what the title says) because "SQL lacks performance".  OMFG.

"... because it has parallelism but no concurrency abstraction. Needed therefore are languages based on the Actor Model (http://www.robust11.org) to achieve performance, operational expressiveness, and inconsistency robustness. To promote modularity, a programming language type should be an interface that does not name its implementations contra to SQL, which requires taking dependencies on internals."

Is the term "programming language type" used here with the same meaning as the term "data type" in relationland ?  If so, then the last sentence seems to demand nothing other than what relational advocates have been demanding for decades already : that the relational model should and must be "orthogonal to type", that is, that it is not up to the model to prescribe which data types should exist/be supported, and which shouldn't.

Of course, relationlanders have known for a long time already that SQL basically flouts that idea, and that its attempts at supporting -in full- the notion of abstract data types are quite crippled and half-baked.  But apparently it does indeed seem to be the case that mr. Hewitt mistakenly equates "the relational model" with SQL.

And Hewitt concludes :

"There is no practical way to repair the Relational Model to remove these limitations."

Well, this is the first time I can agree with something Hewitt says.  Sadly, as far as I can tell, "these limitations" are not limitations that derive from the Relational Model; rather, they seem to derive from his limited understanding thereof.  And indeed there is no repairing a piano when the problem is in its player.
