|Eliot Kimber||Aug 19, 2004 12:27 pm|
|Subject:||DTD Implementation VS the DITA Abstraction|
|From:||Eliot Kimber (ekim...@innodata-isogen.com)|
|Date:||Aug 19, 2004 12:27:35 pm|
It occurred to me that in my recent discussions I may have given the impression of undervaluing the details of how the DITA DTDs and schemas are actually constructed. I may also appear to be callous with respect to the needs of authors.
Neither is the case; it's just that we have to be careful to distinguish the abstractions that are being standardized from the implementation expressions of those abstractions. We also have to be careful to understand when each concern applies: during standardization or during the implementation of standards-based systems.
The DITA DTDs and schemas are implementation expressions of the core DITA abstractions. This means that, in the abstract, there can be any number of implementations that are functionally equivalent and equally useful. In particular, the DITA *standard* needs to explicitly define a world in which there can be different but equivalent implementations.
However, for the purposes of providing useful reference implementations to the DITA community the implementation is very important. In this area, the work that the IBM DITA team has already done is of vital importance--it reflects lots of hard work, careful thought, and hard experience.
By the same token, while most of my focus so far has been on *processing* as opposed to authoring, authoring is of course vitally important to creating a complete information management system that will be used and used effectively. From an author's standpoint the key system features are clear element type names and appropriate content models.
Thus, when defining concrete DITA *applications* (to use the terminology I introduced in the Namespace resolution thread) the issues of element type names and content models are of vital importance. But these are the concerns of *applications*, not of the core DITA standard. Of course, because the DITA standard is also defining abstract types and our expectation is that those types will be used directly as element type names, we can't be literally arbitrary in our choice of DITA type names.
Also, because the element type names used in DITA applications are entirely up to the application designer and are not constrained by the DITA specification in any way, application designers have a lot of flexibility to do what will be best for their authors. This again means that details such as element type names, or whether applications use namespace qualification, need not be a direct concern of the DITA specification itself. They will be a concern for the reference DITA application, but I think we've already established that the only reasonable thing we can do in the 1.0 timeframe is to do essentially what IBM DITA does and use no namespace for element types.
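This freedom in naming works because DITA processing keys off the `class` attribute's ancestry tokens rather than literal tag names. A minimal sketch in Python of that idea (the element names and the `safety/warning` specialization here are hypothetical illustrations, not drawn from any particular DITA application):

```python
import xml.etree.ElementTree as ET

# Two "equivalent" fragments: different element type names chosen by
# different application designers, but the same DITA class ancestry.
doc_a = '<note class="- topic/note ">Check the line voltage first.</note>'
doc_b = '<warning class="- topic/note safety/warning ">Check the line voltage first.</warning>'

def is_a(elem, base_token):
    """True if the element's @class ancestry includes the given base token."""
    return base_token in elem.get("class", "").split()

for doc in (doc_a, doc_b):
    elem = ET.fromstring(doc)
    if is_a(elem, "topic/note"):
        # Generic note processing applies regardless of the tag name.
        print(f"<{elem.tag}> processed as topic/note: {elem.text}")
```

Both fragments trigger the same processing, which is the sense in which the standard can leave the concrete names to the application designer.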
Finally, many XML authoring tools provide ways to provide some form of "alias" for element type names, meaning that the user interface exposed to authors need not be directly constrained by the base element type names.
Of course, as an engineering principle, we want to implement things so that the simplest systems will be as effective as they can reasonably be, meaning that the default element type names should be well thought out and clear. But we don't have to worry overmuch about the implications for authoring, because real production systems will almost always involve a fairly large degree of customization anyway.
I think one thing that may be derailing these discussions is that the IBM DITA developers have, appropriately, been primarily focused on authoring because they were developing an authoring support system and the DTDs were (and are) a key part of that system.
But the DITA standard is *not* primarily an authoring support system. It is a generic standard that defines core types and processing semantics that in turn provides a solid basis from which task-specific authoring support systems can be built. That's a key difference and requires a sometimes subtle shift in emphasis of requirements and features.
I think it comes down to this:
DTDs and schemas are, primarily, system components that support authoring and are important primarily in the context of authoring support systems. Processing systems don't care at all about DTDs except to the degree that they need markup minimization (default attribute values) or require that documents pass a validation gate. But even for validation, DTDs and schemas are either only part of the solution or can be completely replaced by validation applications (which you have to have if you must support schema-less documents). So ultimately you come to the conclusion that DTDs and schemas primarily support authoring and are at best a convenience for the rest of the system (and at worst an impediment, because they have to be accounted for even when they aren't needed).
Because standardized DITA must, by the nature of standards, be primarily a processing and interchange standard (because authoring is always localized), it means that the focus of the standard will not be on the details of DTDs but on the abstract structures and business rules the DTDs are implementation expressions of.
-- W. Eliot Kimber Professional Services Innodata Isogen 9390 Research Blvd, #410 Austin, TX 78759 (512) 372-8122