|Eliot Kimber||Aug 4, 2004 1:46 pm|
|Erik Hennum||Aug 5, 2004 8:44 am|
|Eliot Kimber||Aug 5, 2004 8:54 am|
|Erik Hennum||Aug 9, 2004 6:42 pm|
|Eliot Kimber||Aug 10, 2004 9:02 am|
|Erik Hennum||Aug 10, 2004 8:04 pm|
|Eliot Kimber||Aug 19, 2004 11:24 am|
|Erik Hennum||Aug 24, 2004 12:23 am|
|Don Day||Aug 24, 2004 6:41 am|
|Eliot Kimber||Aug 24, 2004 9:05 am|
|Subject:||Re: [dita] Namespace resolution|
|From:||Erik Hennum (ehen...@us.ibm.com)|
|Date:||Aug 24, 2004 12:23:51 am|
Hi, Eliot and TC:
While this thread may be technical and detailed, it's important and useful for articulating key issues. Regardless of namespaces, we need to have an unambiguous understanding of the role of vocabularies and of element polymorphism.
Erik Hennum ehen...@us.ibm.com
1. Vocabularies as a construct in the DITA architecture
Some XML standards provide a single vocabulary. By contrast, a definitional strategy of the DITA architecture is to create vocabularies by combining specialization modules.
As a starter kit, the DITA distribution includes a few core vocabularies that integrate the core specialization modules:
* The 4 single topic type vocabularies (for instance, the concept topic type plus all of the core domains)
* The ditabase vocabulary (all of the core topic types plus all of the core domains)
The analogy might be to a salad. A salad is distinct from the lettuce, tomato, oil, vinegar, and so on from which it's made. That is, the combination and organization of the ingredients is a construct distinct from the ingredients themselves.
Let's use the name "the DITA core concept vocabulary" for the vocabulary that integrates the concept topic type and all of the core domains. The DITA core concept vocabulary is a vocabulary in the same way as SVG, MathML, and DocBook. DITA simply provides an architecture for assembling such vocabularies in a way that promotes reuse of design modules and relationships among them.
By the way, in retrospect, I think my use of the term "document type shell" suggested the declaration rather than the abstract vocabulary. A confusing choice of term on my part. Fortunately, others are more careful.
2. Vocabulary processing
Content is validated and processed against the vocabulary as a whole rather than against the individual specialization modules integrated by the vocabulary.
For instance, when resolving a conref, processes are obligated to check whether the vocabulary for the referencing document includes the specialization modules for the elements in the referenced content. Otherwise, the conref could include invalid elements.
Similarly, the nesting of topics can only be validated against a vocabulary. A topic type doesn't control which topics can be nested inside it. The vocabulary controls that.
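To make the conref obligation concrete, here is a minimal Python sketch (not a reference implementation) that collects the module names from the class attributes of the referenced content and reports any that the referencing vocabulary doesn't integrate. It assumes DITA 1.0 style class values such as "+ topic/ph ui-d/uicontrol "; the function names and the vocabulary sets are hypothetical.

```python
import xml.etree.ElementTree as ET


def modules_in(element):
    """Collect module names from the class attributes of an element and
    all of its descendants. Tokens are assumed to look like 'topic/ph'."""
    modules = set()
    for el in element.iter():
        # Drop the leading '-' (structural) or '+' (domain) marker.
        for token in el.get("class", "").split()[1:]:
            module, _, _ = token.partition("/")
            modules.add(module)
    return modules


def missing_modules(referenced, vocabulary_modules):
    """Return the modules used by the referenced content that the
    referencing vocabulary does not integrate. An empty set means the
    conref cannot pull in elements the vocabulary doesn't declare."""
    return modules_in(referenced) - set(vocabulary_modules)


# Hypothetical referenced fragment that uses the ui-d domain.
fragment = ET.fromstring(
    '<p class="- topic/p ">Press '
    '<uicontrol class="+ topic/ph ui-d/uicontrol ">Power</uicontrol></p>'
)

print(missing_modules(fragment, {"topic", "concept", "ui-d"}))  # set()
print(missing_modules(fragment, {"topic", "concept"}))          # {'ui-d'}
```

The check is deliberately against the vocabulary as a whole: the same fragment is safe to reuse in one vocabulary and unsafe in another.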
If every element can be processed in isolation, the specialization modules can provide complete processing. If the processing requires contextual sensitivity, however, the vocabulary has to be able to affect the processing. After all, the vocabulary controls the context.
For instance, in one domain, I've specialized section as backgroundSection so my topics can include background content. In another domain, I've specialized title as safetyInstructionTitle so I can include safety instructions as either a topic or section. I now create a vocabulary that integrates the two domains, so I can have background sections that provide safety instructions. In the same way that a term within a dlentry has processing expectations, a backgroundSection that contains a safetyInstructionTitle could have processing expectations (perhaps of isolation, implemented as a sidebar for some outputs). Only the vocabulary can specify the processing expectations for the combination of the two elements. After all, the background and safety modules might be supplied by designers who are completely unaware of one another's specializations.
Note that this processing expectation is part of the semantics of the vocabulary. Different applications may realize those processing expectations in different ways.
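As one possible way an application might realize such an expectation, here is a Python sketch in which the vocabulary, rather than either module, owns the rule for the combination. The element names come from the example above; the rule strings, the table names, and `processing_for` are hypothetical.

```python
# Rules each specialization module can supply on its own, keyed by
# element type. (Hypothetical values for illustration.)
MODULE_RULES = {
    "backgroundSection": "render as section",
    "safetyInstructionTitle": "render as title with safety icon",
}

# Rules only the integrating vocabulary can supply, keyed by
# (parent type, child type). Neither module knows the other exists.
VOCABULARY_RULES = {
    ("backgroundSection", "safetyInstructionTitle"): "isolate as sidebar",
}


def processing_for(parent_type, child_type):
    """Prefer the vocabulary's rule for the combination; otherwise fall
    back to the parent module's own rule, then to default processing."""
    combined = VOCABULARY_RULES.get((parent_type, child_type))
    if combined is not None:
        return combined
    return MODULE_RULES.get(parent_type, "default processing")


print(processing_for("backgroundSection", "safetyInstructionTitle"))  # isolate as sidebar
print(processing_for("backgroundSection", "title"))                   # render as section
```

Another application could map the same vocabulary rule to something other than a sidebar; the sketch only shows where the rule lives.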
3. Element polymorphism
While the value of the class attribute is crucial, the actual element name still has a role in processing. The element name must match one of the names from the value of the class attribute and is the declared type for the content.
Let's call an application that's sensitive to the DITA architecture a DITA-sensitive application. In DITA-sensitive applications, the element instance should be treated as the most specialized element for which processing is available. Thus, it's true that in a DITA-sensitive application, the actual element name is irrelevant. In this approach, DITA resembles object-oriented systems, which execute the methods of an object's actual class rather than the methods of its declared class.
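The dispatch rule can be sketched in Python as follows. It's an illustration under assumptions: class values use the DITA 1.0 "module/element" tokens, and the handler tables and `dispatch` function are hypothetical. Walking the tokens from most to least specialized is the analogue of an object-oriented runtime looking for a method on the actual class first.

```python
def class_ancestry(class_value):
    """Return the 'module/element' tokens of a class attribute, from most
    general to most specialized, without the leading '-' or '+' marker."""
    return class_value.split()[1:]


def dispatch(element_name, class_value, handlers):
    """Return the handler for the most specialized type that has one.
    The declared element name is consulted only if no class token has
    a handler."""
    for token in reversed(class_ancestry(class_value)):
        if token in handlers:
            return handlers[token]
    return handlers.get(element_name)


# Hypothetical handler tables for two applications.
basic = {"topic/keyword": "keyword handler"}
richer = {"topic/keyword": "keyword handler",
          "ui-d/wintitle": "window title handler"}

cls = "+ topic/keyword ui-d/wintitle "
print(dispatch("wintitle", cls, basic))   # keyword handler
print(dispatch("wintitle", cls, richer))  # window title handler
```

The same wintitle instance gets the most specialized processing each application can offer, without either application consulting the element name.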
We don't want to limit processing of DITA content, however, to DITA-sensitive applications -- especially where existing vocabularies are being retrofitted as DITA vocabularies. For DITA-insensitive applications, the declared element type is everything and the class attribute is nothing.
In addition, the declared element type is displayed to human readers of the content to guide their understanding of the semantics of the content.
Because the actual element name is important for these purposes, the DITA architecture mandates support for generalization and respecialization operations to change the declared element type.
With that background, I'd like to return to the lingering namespace issues.
----- Forwarded by Erik Hennum/Oakland/IBM on 08/21/2004 06:28 AM -----
Eliot Kimber <ekim...@innodata-isogen.com> wrote on 08/19/2004 11:27:31 AM:
Erik Hennum wrote:
That is, the XML world currently only gives us one way to unambiguously bind documents to their vocabularies and that is namespaces.
Given that then I agree that *if* a document needs to be recognized as being governed by a particular XML application then that application should have an associated namespace and the document should declare it.
It would seem, then, to make sense to apply the magic approach to vocabulary namespaces in DITA 1.0 -- the namespaces for DITA vocabularies are specified but not applied. Adopters who need to host DITA vocabularies within other markup languages or edit DITA content in environments that require namespaces can make use of the standard namespaces at their discretion.
5. DITA applications in which element type names are qualified with their corresponding package namespaces. This is possible for the same reason (4) is possible: element type names are arbitrary.
Would the root element for the DITA content have to declare both the namespace for the vocabulary and the namespace for the element's specialization module?
For instance, how would a specialized topic declare both the namespace for its specialization module and the namespace for the vocabulary that's combining it with other topic types and domains? As in the illegal:
<specializedTopic xmlns="http://some.org/dita/vocabulary/specializedVocabulary" xmlns="http://some.org/dita/module/specializedTopic" class="- topic/ph http://some.org/dita/module/specializedTopic#specializedTopic ">
2. The namespace prefixes for the core DITA packages are "magic" and must be used as-is in class attribute values in DITA 1.0. This avoids any requirement for DITA 1.0 processors to have to be prepared to dereference core package names to namespace URIs.
3. The DITA 1.0 spec can *discuss* the other ways in which namespaces _can_ be used in conforming DITA applications without actually requiring it or doing it in the OASIS-provided DTDs and schemas.
It's an inspired compromise for 1.0 to treat specialization module qualifiers as magically bound to namespaces that aren't actually declared on the element. I'd like to see it applied to both core DITA and non-core specialization modules so we don't have a two-tier typing scheme.
If someone wants to have namespace-qualified element types they'll need to create their own versions of the DTDs or schemas that add the appropriate namespace declarations and qualifications.
In passing, I think an adopter could implement a vocabulary namespace by providing a DTD or Schema wrapper around the DITA DTD or Schema for a core vocabulary. In the DTD wrapper, you'd attach an xmlns attribute to the root element to declare the default namespace. In the Schema wrapper, you'd declare a target namespace and include the core Schema.
6. User-defined specialization packages *must* be namespace qualified and DITA processors should expect to have to dereference non-core package names used in class attributes to namespace URIs. I don't see a way around this as the alternative is to accept the potential for unresolvable package name collision in class attribute values. I don't think this is a hardship in practice. It does suggest that perhaps there are at least two levels of conformance for DITA processors: those that only recognize core DITA packages and those that can handle all packages.
Agreed, without namespaces for specialization modules, there's a risk of naming collisions. In DITA 2.0, we will very likely want to find a way to have namespaces for specialization modules.
On the other hand, a two-tier typing scheme for specialization modules has its own costs in complexity and the attendant risk of errors. The processing has to handle special cases, and so on.
Can we live with the risk of naming collisions between specialization modules in the early phases of DITA adoption?
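The two levels of conformance in point 6 can be sketched in Python: core package prefixes are treated as magic and need no lookup, while any other prefix must be dereferenced through a prefix-to-URI map (for instance, one built from the namespace declarations in scope). The set of core names is an illustrative subset, and the "acme" prefix, its URI, and `resolve_class_tokens` are hypothetical. A processor at the first conformance level would simply stop at the unresolvable case.

```python
# Illustrative subset of core DITA 1.0 module names treated as "magic".
CORE_PACKAGES = {"topic", "concept", "task", "reference", "map",
                 "hi-d", "pr-d", "sw-d", "ui-d", "ut-d"}


def resolve_class_tokens(class_value, prefix_map):
    """Return (package, element) pairs for each class token. Core prefixes
    resolve to themselves; other prefixes are looked up in prefix_map and
    raise ValueError if unbound, since they could otherwise collide."""
    resolved = []
    for token in class_value.split()[1:]:
        prefix, _, element = token.partition("/")
        if prefix in CORE_PACKAGES:
            resolved.append((prefix, element))
        elif prefix in prefix_map:
            resolved.append((prefix_map[prefix], element))
        else:
            raise ValueError(f"unresolvable package prefix: {prefix!r}")
    return resolved


bindings = {"acme": "http://example.com/dita/acme"}  # hypothetical binding
print(resolve_class_tokens("- topic/ph acme/partNumber ", bindings))
# [('topic', 'ph'), ('http://example.com/dita/acme', 'partNumber')]
```

The special case for core prefixes is exactly the extra branch that makes the scheme two-tier.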
In principle, I agree strongly. In practice, my concern is that, to implement this approach, we have to solve problems like swapping namespaces in and out of the class attribute during generalization and respecialization.
I'm not sure I understand this comment: the value of the class attribute is (conceptually) just a list of namespace prefixes that map to the URIs for packages. The class attribute value need never change.
Sorry, I was obscure. The class attribute doesn't change, but the namespace on the element would have to change during generalization and respecialization.
For instance, here's the element before generalization
<specializedPh xmlns="http://some.org/dita/module/specializedDomain" class="- topic/ph http://some.org/dita/module/specializedDomain#specializedPh ">
and after generalization
<ph class="- topic/ph http://some.org/dita/module/specializedDomain#specializedPh ">
If the namespace isn't changed, the element will be in either no namespace or the wrong namespace and thus won't be valid.
DITA generalization and respecialization processes would need to be reworked for namespace rewriting. I agree that it's a good direction. My question is only about the timing.
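To make the rewriting concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which stores a namespaced element name as "{uri}local". It assumes the URI#element token form from the example above and that core modules stay in no namespace; `split_token`, `generalize`, and `respecialize` are hypothetical names. The class attribute is never touched: only the element's name and namespace change.

```python
import xml.etree.ElementTree as ET


def split_token(token):
    """Split 'topic/ph' or 'http://some.org/mod#ph' into (module, element)."""
    if "#" in token:
        module, _, name = token.rpartition("#")
    else:
        module, _, name = token.partition("/")
    return module, name


def generalize(root, target_module):
    """Rename every element whose class attribute names target_module to
    that module's element type, in no namespace."""
    for el in root.iter():
        for token in el.get("class", "").split()[1:]:
            module, name = split_token(token)
            if module == target_module:
                el.tag = name  # a plain name drops any namespace
                break


def respecialize(root):
    """Restore every element to the most specialized type in its class
    attribute; URI-qualified tokens put it back in that namespace."""
    for el in root.iter():
        tokens = el.get("class", "").split()[1:]
        if not tokens:
            continue
        module, name = split_token(tokens[-1])
        el.tag = f"{{{module}}}{name}" if "#" in tokens[-1] else name


URI = "http://some.org/dita/module/specializedDomain"
doc = ET.fromstring(
    f'<specializedPh xmlns="{URI}" class="- topic/ph {URI}#specializedPh ">'
    "text</specializedPh>"
)
generalize(doc, "topic")
print(doc.tag)  # ph
respecialize(doc)
print(doc.tag)  # {http://some.org/dita/module/specializedDomain}specializedPh
```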
As long as this is always the case then the element type name is simply irrelevant for the purpose of DITA-based processing. That is, from a DITA perspective, the element type name is, by definition, a synonym for the element's class name.
Agreed, for DITA-sensitive applications, the element name is irrelevant.
DITA content should also be processable, however, by DITA-insensitive applications. For those applications, as well as for human consumption, the DITA architecture needs to support changing the element name, effectively casting to a different declared type.
2. Is this DITA document governed by a DITA application I recognize?
This question can be answered unambiguously by looking for namespace declarations that name known DITA application namespace URIs on the root element. It can be answered with reasonable (but not 100%) certainty by looking at the external identifier of the document's DOCTYPE declaration or non-namespace schema or by taking the user's word that this is in fact a DITA document governed by a particular application [this is the implication when you apply an application-specific XSLT to a document for example or when you work in an environment that only supports one XML application.]
Agreed: to process DITA content with its full semantics (that is, with full contextual awareness for the most specialized form of every element), an application needs to recognize the DITA vocabulary. The application might recognize the DITA vocabulary:
* By the namespace on the root element, if the namespace matches that of a known DITA vocabulary
* By the public identifier associated with a document, if the DTD is declared and the public identifier matches that of a known DITA vocabulary
* By the Schema name associated with a document, if the Schema is declared and the name matches that of a known DITA vocabulary
* Via user input ("use the processor for this vocabulary on this document")
Regardless of whether the class attribute is namespaced, wouldn't these tests have to be performed anyway and in the same way?
That is, couldn't a content management system such as XIRUSS-T use the following approach?
1. Is a namespace declared on the root element? If so, match the known namespaced vocabularies including the known DITA vocabularies.
2. Is a DTD declared for the document? If so, match the known vocabularies with public identifiers including the DITA vocabularies.
3. Is a Schema declared for the document? If so, match the known vocabulary declarations including the DITA vocabulary declarations.
4. Prompt the user for known vocabularies including the DITA vocabularies.
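The four tests can be sketched in Python with the standard library. This is an illustration under stated assumptions: the registries of known namespaces, public identifiers, and schema names are hypothetical placeholders; the DOCTYPE public identifier is found with a simple regular expression because xml.etree.ElementTree doesn't expose it; and only xsi:noNamespaceSchemaLocation is checked for the Schema case. Nothing in the sketch consults the class attribute.

```python
import re
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Hypothetical registries; real entries would come from configuration.
KNOWN_NAMESPACES = {"http://example.org/dita/vocabulary/concept": "concept"}
KNOWN_PUBLIC_IDS = {"-//OASIS//DTD DITA Concept//EN": "concept"}
KNOWN_SCHEMAS = {"concept.xsd": "concept"}

# Simplified pattern for <!DOCTYPE name PUBLIC "public-id" ...>.
DOCTYPE_PUBLIC = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]+)"')


def identify_vocabulary(xml_text, ask_user=None):
    """Apply the four tests in order and return a vocabulary name or None."""
    root = ET.fromstring(xml_text)

    # 1. A namespace declared on (and used by) the root element.
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        if uri in KNOWN_NAMESPACES:
            return KNOWN_NAMESPACES[uri]

    # 2. The public identifier of a declared DTD.
    match = DOCTYPE_PUBLIC.search(xml_text)
    if match and match.group(1) in KNOWN_PUBLIC_IDS:
        return KNOWN_PUBLIC_IDS[match.group(1)]

    # 3. A schema declared on the root element.
    schema = root.get(XSI + "noNamespaceSchemaLocation")
    if schema in KNOWN_SCHEMAS:
        return KNOWN_SCHEMAS[schema]

    # 4. Fall back to asking the user, if a prompt callback was supplied.
    return ask_user() if ask_user else None


print(identify_vocabulary(
    '<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">'
    '<concept id="c1"/>'
))  # concept
```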
If a namespace on the class attribute doesn't reduce the number of tests needed to match content with a handler, would it make sense to defer namespacing the class attribute until the full namespace solution is specified? That way, we'd keep our options open in case something else in the solution makes namespacing the class attribute unnecessary.
For instance, if in 2.0, the namespace for the base DITA topic module ends up declared in the class attribute value, would declaring the namespace on the class attribute itself become redundant?
<ph class="- http://dita.oasis-open.org/modules/topic#ph ">
Eliot Kimber <ekim...@innodata-isogen.com> wrote on 08/19/2004 12:30:46 PM:
However, for the purposes of providing useful reference implementations to the DITA community the implementation is very important. In this area, the work that the IBM DITA team has already done is of vital importance: it reflects lots of hard work, careful thought, and hard experience.
Very generous remarks -- sincere thanks.
This again means that element type names or details such as whether or not applications use namespace qualification need not be a direct concern to the DITA specification itself.
If (as suggested above) vocabularies are a core construct for the DITA architecture, the namespaces used to identify vocabularies are a concern of the DITA architecture.
Also, in the future, there's a strong argument for DITA to incorporate namespaces into the typing system to identify specialization modules so we can have unambiguous element types.
Those reasons suggest that the DITA specification shouldn't leave namespaces entirely to the discretion of the application.
But the DITA standard is *not* primarily an authoring support system. It is a generic standard that defines core types and processing semantics that in turn provides a solid basis from which task-specific authoring support systems can be built. That's a key difference and requires a sometimes subtle shift in emphasis of requirements and features.
Maybe yes and no?
1. As an architecture, DITA is a typing system for specialization of elements, integration of design modules, and so on.
2. As a specific type hierarchy, DITA seeds the architecture with a base specialization module, derives core specialization modules, and assembles core vocabularies for the problem space of human-readable content.
The core declaration modules and DTDs are an attempt to conform to the DITA architecture within the limits of DTD syntax. For instance, the class and domains attributes exist exclusively to support processing. Similarly, the entity design patterns exist exclusively to support integration of modules as vocabularies.
As a specific type hierarchy, DITA has to be more concerned with authorability and readability than, say, SOAP because DITA content in the core problem space is, fundamentally, a communication from author to reader.
Are concerns with readability and authorability restricted to the declaration level? Couldn't those concerns be legitimate issues for abstract types?