Monday, July 14, 2014

Conceptual vs. Logical modeling, part IV & conclusion

Other kinds of constraint

Still referring to the example model at

I now want to draw your attention to that very peculiar construct close to the center of the image.  Thing is, I have never seen such a symbol before in a conceptual data diagram, and I suspect the same will hold for most of you.

So what does it express ?  The question alone illustrates an important property of using (any) language to express oneself : if you use a word or a symbol that the audience doesn't know/understand, they won't immediately get your meaning, and they'll have to resort to either guessing your intended meaning from context, or else asking you to clarify explicitly.  And if your models and/or drawings end up just being stored in some documentation library, there's a good chance that the readers won't have you around anymore for directly asking further clarification from the original author.  Leaving "guessing the meaning from context" as the only remaining option.  (As said before, guesswork always has its chance of inducing errors, no matter how unlikely or improbable.)

So, since the original author isn't available for asking questions to, let's just do the guesswork.  I guess that this curious symbol intends to express something that might be termed "exclusive subtyping" (I'm not a great fan of the word "subtyping" in contexts of conceptual modeling but never mind).  It expresses that an asset can be a "Financial" asset, or a "Physical" asset, or an "Information" asset, but never two or more of those three simultaneously.  We already touched on this, slightly, in the discussion of referential integrity : the lines from the three subtypes can be seen as relationships, of which only a single one can exist.  I'm pretty sure at one point or other, you've already run into the following notation to denote this :
! ASSETS          !
!...              !
   |   |   |
   |   |   |
   |   |   +-----------------------+
   |   |                           |
   |   +---------------+           |
   |                   |           |
+------------------+ +-------+ +--------+
! FINANCIAL_ASSETS ! ! ...   ! ! ...    !
+------------------+ +-------+ +--------+
!...               ! ! ...   ! ! ...    !
+------------------+ +-------+ +--------+

And this more or less begs the question, "then what about the cardinalities of those relationships ?".  The point being, the example model doesn't seem to give us any information about this.  Can there be only one or can there be more FINANCIAL_ASSETS entries for each ASSSET ?  Can there be zero FINANCIAL_ASSETS associated with a given ASSET (even if the attributes at the ASSETS level tell us it is indeed a "financial" asset) ?  Can there be only one ASSETS associated with each FINANCIAL_ASSET, or can there be more, or even zero ?  Strictly speaking, no answer to be found in the drawing.  Guesswork needed !

Arbitrarily complex constraints

And beyond these, there are of course the "arbitrarily complex" constraints.

"The percentages of the ingredients for a recipe must sum up to 100 for each individual recipe".  No graphical notation I know of allows to specify that kind of thing.  Nevertheless, rules such as these are indeed also a part of a logical information model.


In general, for each information system that manages (and persists) data for its users, there exists some set of "metadata" (I'm not a big fan of that term, just using it for lack of better here and hence in scare quotes) that conveys ***every*** little bit of information about :

(a) the structure of the user's data being managed
(b) the integrity rules that apply to the user's data being managed

Such a "complete" set of information is what I call a "logical information model".

Anything less than that, i.e. any set of information that could have to leave unanswered any possible developer question concerning either structure of or integrity in some database, necessarily becomes a "conceptual information model" by that logic.  Note that such a definition makes the term "conceptual information model" cover the entire range of information models, from those that "leave out almost nothing compared to a fully specced logical one", to those that "leave out almost everything compared to a fully specced logical one".  Some of the existing notations are deliberately purposed for very high levels of abstraction, some of them are deliberately purposed for allowing to document much finer details.

Thing is, all of the existing notations are purposed for _some_ level of abstraction.  Even ORM with its rich set of symbols that allows for way more expressiveness than plain simple basic ER, has more than enough kinds of things it cannot express (perhaps more on that in a next post).  And hence any information drawn in any of the available notations, must then be regarded as a conceptual model.

Some of them will expose more of the nitty gritty details, and thus perhaps come somewhat closer to the "truly logical" kinds of model, others will expose less detail, thus deliberately staying at the higher abstraction levels, and thus staying closer to what might be considered "truly conceptual".  But none of them allow to specify every last little detail.  And the latter is what a true logical informatin model is all about.

No comments:

Post a Comment