TechiWarehouse.Com


Dated: Aug. 12, 2004

XML (Extensible)

Introduction

XML is all about metadata and the idea that certain groups of people have similar needs for describing and organizing the data they use. Like HTML, XML is a set of tags and declarations -- but rather than being concerned with formatting information on a page, XML focuses on providing information about the data itself and how it relates to other data.

Some data types are pretty much universal. Others are industry or even company-specific. Healthcare organizations, for example, have a whole set of data types and acronyms understandable (some would say penetrable) only to claims processors. XML allows each of these data types to be easily recognized and, for site developers, used to create sites optimized around both the data and the people using it.

XML differs from HTML in three major respects:

If you're designing data-hungry sites, especially for intranets, you should be getting excited about XML, because in XML you'll be able to create and respond to a much richer set of data elements. That will in turn let you build more individualized dynamic sites and pages. For example, your site's users could access information across databases and types of data without having to rely on a search engine.

XML is the Extensible Markup Language. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification.

It is called extensible because it is not a fixed format like HTML (a single, predefined markup language). Instead, XML is actually a "metalanguage" -- a language for describing other languages -- which lets you design your own customized markup languages for limitless different types of documents. XML can do this because it's written in SGML, the international standard metalanguage for text markup systems (ISO 8879).

History

In November 1996 the initial XML draft was presented at the SGML 96 Conference in Boston. Then in March 97, the 1st XML Conference was held in San Diego, by the Graphic Communications Association.

In April 97, we then got the initial XML Linking Working Draft. In July it got revised, and then in August 97, we also got the revised XML Syntax Working Draft, plus the XML Developers Day was held in Montreal Canada on August 21st.

In October the W3C published a note on 'W3C Data Formats on XML, SGML, HTML, and RDF'.
In December the XML 1.0 Proposed Recommendation arrived, and in February 1998 XML 1.0 became a W3C Recommendation.

Future Of XML

With all the activity surrounding XML, it's difficult to predict where it will be in six months. Tim Bray, coauthor of the XML and XLL specifications, says, "We have produced a tool that's designed to be general purpose, and the broad range of people leaping on board is evidence that we've succeeded."

In the short term, XML will probably surface first in metadata applications such as RDF. The next big impact will come with the approval of the Document Object Model specification. Bray claims that "the combination of XML and the DOM is really the magic bullet that will bring the Web alive."

XML should also help jump-start electronic commerce. XML will let e-commerce vendors tag products and the information associated with them (price, size, color, features) in a common way, making it easy for customers to comparison shop across the Web.

Meanwhile, Netscape and Microsoft can be counted on to continue expanding XML browser support to include both valid and well-formed XML documents, more XML applications, style sheet support designed for XML, and XML hyperlinking protocols. Watch both companies--as well as third-party software vendors--for XML authoring and publishing tool developments.

Why XML

What's the point of learning XML when you already know HTML? Well, the reality is that there are many applications that, while they can be done with HTML, DHTML, Java, ASP, or CGI, work much more easily as an XML application. For example: connecting databases and the Web, displaying the same data in multiple formats, tailoring information to fit the needs of the reader, and moving the processing load from the Web server to the client. One thing to keep in mind is that XML is not as hard as you might think. If you know HTML, XML is not difficult to learn. And there are, as I said above, many things that XML can do much better than HTML.

Connecting Databases

The way most people connect databases to the Web is with some external program such as a CGI script, PHP, or ASP. But the data itself is not defined as a database entry until it is actually in the database. If you use XML to connect your databases to the Web, once the data is entered into XML it is in a format compatible with the database.

The reason you can enter data that is compatible with the database and with your XML document is because of the second benefit of XML over HTML:

Multiple Data Formats

Often XML documents can be thought of as a structure for data. Each data element is labeled in such a way that a computer can recognize what it is. For example, if you enter an address book into XML, you would perhaps define the following data groups:

name - first, middle, and last
address - street, number, and apartment or suite
city
state or province
postal code
country

If you have an XML application that accepts data into each of those data groups, you can then create a form to print the data on an envelope, create another form to write all the entries in an address book, and place a final form to keep all the information in a database. With one entry of the data, it can be used for three different applications.
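
The "enter once, use three ways" idea can be sketched in a few lines of Python. This is only an illustration: the tag names, the sample person, and the three output functions are assumptions made up for the sketch, not part of any standard address-book format.

```python
import xml.etree.ElementTree as ET

# Hypothetical address-book entry; the tag names are illustrative only.
ENTRY_XML = """
<entry>
  <name><first>Ada</first><middle>M</middle><last>Jones</last></name>
  <address><number>12</number><street>Elm Street</street><suite>Apt 4</suite></address>
  <city>Springfield</city>
  <state>IL</state>
  <postal_code>62701</postal_code>
  <country>USA</country>
</entry>
"""

COLUMNS = ("first", "middle", "last", "number", "street", "suite",
           "city", "state", "postal_code", "country")


def load_entry(xml_text):
    """Flatten one <entry> into a dict keyed by leaf tag name."""
    root = ET.fromstring(xml_text)
    return {leaf.tag: leaf.text for leaf in root.iter() if len(leaf) == 0}


def envelope(e):
    """Format the entry for printing on an envelope."""
    return (f"{e['first']} {e['last']}\n"
            f"{e['number']} {e['street']}, {e['suite']}\n"
            f"{e['city']}, {e['state']} {e['postal_code']}\n"
            f"{e['country']}")


def address_book_line(e):
    """Format the entry as one line of a printed address book."""
    return f"{e['last']}, {e['first']} {e['middle']}. - {e['city']}, {e['country']}"


def database_row(e):
    """Return the entry as a tuple in a fixed column order, ready for an INSERT."""
    return tuple(e[column] for column in COLUMNS)


entry = load_entry(ENTRY_XML)
print(envelope(entry))
print(address_book_line(entry))
print(database_row(entry))
```

The data is typed in once as XML; each function is just a different view of the same labeled fields.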

Tailored Information

Documents written in XML can be more easily tailored to be viewable in various formats. To go back to HTML, if you create a document with headings <h1>, <h2>, and <h3>, you have a generally good idea of how they will look on a Web page. But if you were to try to view those headers on a WAP phone you might be surprised how they looked (if they displayed at all).

If we go back to our XML enabled address book, we can use that XML data to display it on the WAP phone and the Web page. We can use a scripting language to determine what is viewing the data, and if it's the WAP phone - display the address book using WML and style sheets, and if it's the Web page - display it using XHTML. But we don't have to create two separate documents. We simply create rules for how the data will be displayed in WML and XHTML and have the pages dynamically built from our XML enabled database.
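
As a rough sketch of "one data source, two display rules", the Python below builds either a WML card or an XHTML list from the same XML. The address-book tag names, the sample entries, and the "wap" client label are assumptions for illustration; a real site would more likely detect the client from the request and use style sheets.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; tag names and people are invented for the sketch.
ADDRESS_BOOK = """
<addressbook>
  <entry><name>Ada Jones</name><city>Springfield</city></entry>
  <entry><name>Bo Chen</name><city>Toronto</city></entry>
</addressbook>
"""


def render(xml_text, client):
    """Build WML for a "wap" client and XHTML for anything else."""
    book = ET.fromstring(xml_text)
    if client == "wap":
        root = ET.Element("wml")
        container = ET.SubElement(root, "card", {"id": "book", "title": "Addresses"})
        item_tag = "p"
    else:
        root = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
        body = ET.SubElement(root, "body")
        container = ET.SubElement(body, "ul")
        item_tag = "li"
    # The same loop feeds both outputs; only the wrapping markup differs.
    for entry in book.findall("entry"):
        item = ET.SubElement(container, item_tag)
        item.text = f"{entry.findtext('name')} ({entry.findtext('city')})"
    return ET.tostring(root, encoding="unicode")


print(render(ADDRESS_BOOK, "wap"))
print(render(ADDRESS_BOOK, "browser"))
```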

Load Distribution

One really good feature of XML is that it can be used along with Java and other tools to move most of the processing from the server to the client itself. For example, if you take your address book XML document, you can pass it to a Java applet that will search through the address book fields. But all the processing of both the data and the searching is done by the Java application on the client side rather than the server side. This means that an XML and Java enabled server can process more documents than one that must do the searching itself.

Learning XML or HTML?

Choosing to learn XML is a choice you need to make for yourself, but there are many things that XML can do much more effectively than HTML. If you learn XML, you will have the advantage of a powerful tool in your repertoire to create and manage data.

XML vs HTML

Will XML replace HTML? Doubtful, at least not in the near term. Initially, I expect we will see XML used as a storage format, and HTML used as the display format. Just run your XML document through a filter and out comes an HTML document. This will provide backward compatibility support for legacy HTML browsers. I think that we will continue to see this for at least 2-3 years, although use of native XML browsers will increase throughout that time, eventually eclipsing HTML and relegating it to purely legacy support. This all depends on the tools.

Initially, XML will be difficult to use, and expensive. Only large firms who have clear and distinct needs and the money needed to support it will use it.

HTML has the advantage of being very simple to use. XML is not difficult, but it's not that easy either. So HTML will probably continue to be used by the general public because it's so simple to use.

Software development efforts take time though. It takes 6 months to do a good new product revision, plus a beta-testing cycle, which means it could be a year or more before many of these products become available.

Furthermore, the XML standard isn't even all the way hammered out. The XML data, style and linking pieces have yet to be completed. Each is in various draft stages. We may not see a complete cohesive XML standard before December 31, 1998. On the other hand, the whole XML standards process has been moving along at quite a rapid pace, so we might be surprised and see something sooner.

Considering the time it takes for a technology to become truly mainstream, a 2-3 year adoption curve, with a couple of years tacked on to that until we see the really spectacular implementations, is probably not unrealistic, in my opinion.

XML and HTML complement each other. Browsers will be able to process both, and future HTML standards will likely allow mixing HTML and XML in the same document.

What about existing HTML documents? Am I going to have to re-code all of them in XML? Will XML-native browsers also support HTML documents as well? These are all open questions.

Basically, if your HTML document uses quotes around ALL of the attributes and closes ALL of the tags, then it's awfully close to being well formed.
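
A quick way to see the difference is to hand both styles of markup to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the two sample snippets are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical loose HTML: unquoted attribute, unclosed <img>, <br> and <b>.
sloppy = '<p>Prices <img src=logo.gif><br>in <b>USD</p>'

# The same content with every attribute quoted and every tag closed.
tidy = '<p>Prices <img src="logo.gif"/><br/>in <b>USD</b></p>'

print(is_well_formed(sloppy))
print(is_well_formed(tidy))
```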

I think that realistically, we will see browsers that support both XML and HTML, just as early web browsers built in support for FTP and Gopher in addition to HTML. These protocols continue to be supported. So you won't necessarily have to convert all of your documents. On the other hand, you may want to. In order to help facilitate that, we will probably see HTML-to-XML conversion utilities. Naturally, the quality of the resulting documents will vary. Some will be good. Some won't be. Automation can only take you so far.

The ways to use XML

Traditional data processing, where XML encodes the data for a program to process

Document-driven programming, where XML documents are containers that build interfaces and applications from existing components

Archiving -- the foundation for document-driven programming, where the customized version of a component is saved (archived) so it can be used later

Binding, where the DTD or schema that defines an XML data structure is used to automatically generate a significant portion of the application that will eventually process that data

Traditional Data Processing

XML is fast becoming the data representation of choice for the Web. It's terrific when used in conjunction with network-centric Java-platform programs that send and retrieve information. So a client/server application, for example, could transmit XML-encoded data back and forth between the client and the server.

In the future, XML is potentially the answer for data interchange in all sorts of transactions, as long as both sides agree on the markup to use. (For example, which tag names should an email program expect to see?) The need for common standards will generate a lot of industry-specific standardization efforts in the years ahead. In the meantime, mechanisms that let you "translate" the tags in an XML document will be important. Such mechanisms include projects like the RDF initiative, which defines "meta tags", and the XSL specification, which lets you translate XML tags into other XML tags.

Document-Driven Programming (DDP)

The newest approach to using XML is to construct a document that describes how an application page should look. The document, rather than simply being displayed, consists of references to user interface components and business-logic components that are "hooked together" to create an application on the fly.

Of course, it makes sense to utilize the Java platform for such components. Both JavaBeans for interfaces and Enterprise JavaBeans for business logic can be used to construct such applications. Although none of the efforts undertaken so far are ready for commercial use, much preliminary work has already been done.

Note: The Java programming language is also excellent for writing XML-processing tools that are as portable as XML. Several Visual XML editors have been written for the Java platform. For a listing of editors, processing tools, and other XML resources, see the "Software" section of Robin Cover's SGML/XML Web Page.

Binding

Once you have defined the structure of XML data using either a DTD or one of the schema standards, a large part of the processing you need to do has already been defined. For example, if the schema says that the text data in a <date> element must follow one of the recognized date formats, then one aspect of the validation criteria for the data has been defined -- it only remains to write the code. Although a DTD specification cannot go to the same level of detail, a DTD (like a schema) provides a grammar that tells which data structures can occur, in what sequences. That specification tells you how to write the high-level code that processes the data elements.

But when the data structure (and possibly format) is fully specified, the code you need to process it can just as easily be generated automatically. That process is known as binding -- creating classes that recognize and process different data elements by processing the specification that defines those elements. As time goes on, you should find that you are using the data specification to generate significant chunks of code, so you can focus on the programming that is unique to your application.
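
To make the idea of binding concrete, here is a toy sketch in Python. It assumes the data specification has already been boiled down to a simple dictionary (the SPEC table below is a hand-written stand-in, not the output of any real DTD or schema tool) and generates a class from it instead of writing that class by hand.

```python
from dataclasses import make_dataclass, field, fields
from typing import Optional
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a parsed DTD or schema:
# element name -> names of the child elements it may contain.
SPEC = {"product": ["product_id", "product_name", "unit_of_measure"]}


def bind(spec):
    """Generate one dataclass per element described in the spec."""
    return {
        element: make_dataclass(
            element.title(),
            [(child, Optional[str], field(default=None)) for child in children],
        )
        for element, children in spec.items()
    }


def from_xml(cls, xml_text):
    """Fill a generated class from the child elements of an XML fragment."""
    root = ET.fromstring(xml_text)
    return cls(**{f.name: root.findtext(f.name) for f in fields(cls)})


Product = bind(SPEC)["product"]
item = from_xml(
    Product,
    "<product><product_id>98756</product_id>"
    "<product_name>basket</product_name></product>",
)
print(item)
```

Real binding tools go much further (types, validation, nested structures), but the principle is the same: the class comes from the specification, so the hand-written code can concentrate on what is unique to the application.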

Archiving

The Holy Grail of programming is the construction of reusable, modular components. Ideally, you'd like to take them off the shelf, customize them, and plug them together to construct an application, with a bare minimum of additional coding and additional compilation.

The basic mechanism for saving information is called archiving. You archive a component by writing it to an output stream in a form that you can reuse later. You can then read it in and instantiate it using its saved parameters. (For example, if you saved a table component, its parameters might be the number of rows and columns to display.) Archived components can also be shuffled around the Web and used in a variety of ways.

When components are archived in binary form, however, there are some limitations on the kinds of changes you can make to the underlying classes if you want to retain compatibility with previously saved versions. If you could modify the archived version to reflect the change, that would solve the problem. But that's hard to do with a binary object. Such considerations have prompted a number of investigations into using XML for archiving. But if an object's state were archived in text form using XML, then anything and everything in it could be changed as easily as you can say, "search and replace".

XML's text-based format could also make it easier to transfer objects between applications written in different languages. For all of these reasons, XML-based archiving is likely to become an important force in the not-too-distant future.
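
A minimal sketch of text-based archiving, assuming a made-up Table component whose only saved parameters are its row and column counts:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical component; rows and columns are its saved parameters."""
    rows: int
    columns: int


def archive(table):
    """Write the component's state out as a small XML string."""
    node = ET.Element("table", rows=str(table.rows), columns=str(table.columns))
    return ET.tostring(node, encoding="unicode")


def restore(xml_text):
    """Read the XML back in and instantiate the component from it."""
    node = ET.fromstring(xml_text)
    return Table(rows=int(node.get("rows")), columns=int(node.get("columns")))


saved = archive(Table(rows=3, columns=4))

# Because the archive is plain text, changing it really is "search and replace".
edited = saved.replace('rows="3"', 'rows="10"')

print(restore(saved))
print(restore(edited))
```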

XML use

A tool for reading XML documents is popularly called an XML parser, though the more formal name is an XML processor. XML processors pass data to an application for authoring, publishing, searching, or displaying. XML doesn't provide an application programming interface (API) to an application; it just passes data to it. No XML processor will parse data that isn't well-formed. Both Netscape and Microsoft either already include or are planning to include XML parsers in their browsers.

The XML developer community makes available free XML readers and parsers for use in applications or XML authoring software.

Let the Game Begin

XML allows tags to be defined by users. This gives users tremendous power to describe the structure and nature of the information presented in a document. This means, however, that standard browsers will not be able to do anything with these extensions. This makes the software environment for XML more complex, as described below.

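Here is an example of XML used to describe a data record that might be presented in a document:

<PRODUCT>
  <product_id>98756</product_id>
  <product_name>basket</product_name>
  <unit_of_measure>each</unit_of_measure>
  <SPECIFICATION>
    <variable>color</variable>
    <value>blue</value>
  </SPECIFICATION>
  <SPECIFICATION>
    <variable>size</variable>
    <value>large</value>
  </SPECIFICATION>
</PRODUCT>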

Note a few interesting things about this example.

First of all, as with HTML, each tag is surrounded by less than and greater than brackets (<>), and is usually followed by text. The text is in turn followed by an end tag, in the form </tag>. A tag may have no content, in which case either the end tag follows immediately upon the tag (as in <tag></tag>), or the tag itself ends with a forward slash (as in <tag/>). Unlike with HTML, however, the end tag is always required.

A second thing to note is that, in this case, following the tag for product, a set of related tags follow, describing characteristics (columns, in this case) of product. In this particular case, the <PRODUCT> tag has been defined such that it must be followed by exactly one tag for <product_id> and one for <product_name>. You can't see this from the example, but <unit_of_measure> is optional. The <SPECIFICATION> tag is also optional, and there also may be one or more occurrences of it.

Although it is optional, all XML documents should begin with <?xml version="1.0"?> (or whatever version number is appropriate). Note that the structure is hierarchical, so that an element can be under only one other element, and there can be only one hierarchy in a document.

Comments are in the form <!-- comment text -->. Note that the double hyphens must be part of the comment. Note also that, unlike HTML, XML lets you use a comment to surround lines of code that you want to disable.
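
To see the declaration, the comment, and the single hierarchy at work, here is a small Python sketch that reads a record shaped like the product example described above and prints its tree. The exact tag spelling in RECORD is an assumption based on that description.

```python
import xml.etree.ElementTree as ET

# A record shaped like the product example; tag spelling is assumed.
RECORD = """<?xml version="1.0"?>
<!-- a comment: the parser skips this -->
<PRODUCT>
  <product_id>98756</product_id>
  <product_name>basket</product_name>
  <unit_of_measure>each</unit_of_measure>
  <SPECIFICATION>
    <variable>color</variable>
    <value>blue</value>
  </SPECIFICATION>
</PRODUCT>
"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(RECORD))
print("\n".join(lines))
```

Every element appears under exactly one parent, and the whole document hangs off a single top-level element.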

The meaning of a tag is defined in a "document type definition" (DTD). This is a body of code that defines tags through a set of "ELEMENTS".

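The DTD for the above example looks like this:

<!DOCTYPE PRODUCT [
<!ELEMENT PRODUCT (product_id, product_name, unit_of_measure?, SPECIFICATION*)>
<!ELEMENT product_id (#PCDATA)>
<!ELEMENT product_name (#PCDATA)>
<!ELEMENT unit_of_measure (#PCDATA)>
<!ELEMENT SPECIFICATION (variable, value)>
<!ELEMENT variable (#PCDATA)>
<!ELEMENT value (#PCDATA)>
]>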

The DTD for an XML document can be either part of the document or in an external file. If it is external, the DOCTYPE statement still occurs in the document, with the argument "SYSTEM -filename-", where "-filename-" is the name of the file containing the DTD. For example, if the above DTD were in an external file called "xxx.dtd", the DOCTYPE statement would read:

<!DOCTYPE PRODUCT SYSTEM "xxx.dtd">

The file xxx.dtd would then hold the ELEMENT declarations themselves, without the surrounding DOCTYPE statement. Note that the name specified in the DOCTYPE statement must be the same as the name of the highest level ELEMENT.

The definition for the element product includes a list of other elements that must follow - in this case, product_id, product_name, unit_of_measure, and specification. The "?" after unit_of_measure means that one occurrence may or may not follow. It's optional. The "*" after specification means that it is optional, but one or more occurrences may follow.

If there were a "+" after any element in the list, it would mean the element is not optional, and that there may be more than one occurrence of it.
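
One way to get a feel for the "?", "*" and "+" markers is to notice that they mean the same thing they do in regular expressions. The Python sketch below turns a flat, comma-separated element list into a regular expression and checks an element's children against it. It is a toy: it handles only simple sequences like the one above, not choices or nested groups, and it is not a substitute for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

MODEL = "product_id, product_name, unit_of_measure?, SPECIFICATION*"


def content_model_to_regex(model):
    """Translate a flat model like "a, b?, c*" into a regex over child tag names."""
    parts = []
    for item in model.split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "?*+" else ""
        name = item.rstrip("?*+")
        # Each child tag is followed by one space in the sequence string.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the order and count of an element's children against the model."""
    sequence = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None


ok = ET.fromstring(
    "<PRODUCT><product_id>98756</product_id>"
    "<product_name>basket</product_name></PRODUCT>"
)
missing_name = ET.fromstring("<PRODUCT><product_id>98756</product_id></PRODUCT>")

print(children_match(ok, MODEL))
print(children_match(missing_name, MODEL))
# With "+" instead of "*", at least one SPECIFICATION becomes mandatory.
print(children_match(ok, "product_id, product_name, SPECIFICATION+"))
```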

Each of the elements in the list is then defined in turn in one of the lines that follow. "#PCDATA" means that the tag will contain text that can be parsed by browsing software. Specification is further elaborated upon as being followed by variable and value.

Case

XML is case sensitive. XML keywords are in all uppercase. The case of a tag name must be the same as in its DTD definition. By convention, entity/table names in the above example are all in uppercase, while attribute/column names are all in lowercase. Conventions will vary.

Attributes

Tags can have attributes. For example, instead of listing associated tags in defining <SPECIFICATION>, above, a line such as the following could be added to the DTD:

<!ATTLIST SPECIFICATION variable CDATA #REQUIRED value CDATA #REQUIRED>

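This creates "variable" and "value" as two attributes of specification, so they do not have to appear as elements in their own right. The data from the above example would then look like this:

<PRODUCT>
  <product_id>98756</product_id>
  <product_name>basket</product_name>
  <unit_of_measure>each</unit_of_measure>
  <SPECIFICATION variable="color" value="blue"/>
  <SPECIFICATION variable="size" value="large"/>
</PRODUCT>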

Note that this places yet another design decision in the lap of the XML designer. There are advantages and disadvantages to each way of doing this.
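
From the point of view of the program reading the data, the two designs differ mainly in how a value is fetched. A small Python sketch, with the tag and attribute names assumed from the specification example above:

```python
import xml.etree.ElementTree as ET

# The same specification written both ways (names assumed from the example).
AS_ELEMENTS = (
    "<SPECIFICATION><variable>color</variable>"
    "<value>blue</value></SPECIFICATION>"
)
AS_ATTRIBUTES = '<SPECIFICATION variable="color" value="blue"/>'


def read_spec(xml_text):
    """Return (variable, value) whichever of the two designs was used."""
    node = ET.fromstring(xml_text)
    if node.attrib:
        # Attribute design: values live on the tag itself.
        return node.get("variable"), node.get("value")
    # Element design: values live in child elements.
    return node.findtext("variable"), node.findtext("value")


print(read_spec(AS_ELEMENTS))
print(read_spec(AS_ATTRIBUTES))
```

Child elements can be repeated and can carry structure of their own; attributes are more compact but hold only a single string each.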

Correctness

Three levels of correctness are associated with an XML document:

A "well-formed" XML document is one where the elements are properly structured as a tree, with the opening and closing tags correctly nested. Well-formed documents are essential for information exchange.

A "valid" XML document is well formed and has tags that correspond to the DTD. It contains only elements and attribute values that conform to the DTD. While an XML document can be prepared and read without a DTD, a DTD is essential for establishing validity.

A "semantically correct" XML document is beyond the control of XML. It is incumbent upon the preparer of the document to ensure that it is logically structured and makes sense.

Implications

The question remains, what does all this mean? The answer to that question is not obvious. Clearly web screens that display data from a database can be designed to do so more easily and with more control.

Not in the language, however, is the mechanism by which data will actually be retrieved from a database and placed in this page. If web pages are to be created with database data, software must be written to retrieve those data and create the pages. Presumably this would be in some combination of Java and SQL.

In addition, a standard browser, by definition, cannot properly interpret customized tags.

This can be addressed in one of three ways:

Software "applets" may be written and attached to the page. These would understand the data structure and respond accordingly to each tag.

Generic software may read the DTD and respond to tags accordingly. In this case, the response would be limited to what can be inferred from the DTD.

A community may define a set of tags for its purposes, agree to use them, and develop community-specific software to respond to them.

Presumably the first two options will be in Java or a similar language, but the standard tools for doing this remain to be written. The third option has already begun to take effect. For example, the chemical industry has set up an XML-based Chemical Markup Language, and astronomers, mathematicians and the like have similarly defined sets of tags for describing things in their respective fields.

Used to Describe Data

One feature of XML that has captured the industry's imagination is its ability to describe data structures and hold data. As was seen in the above example, with XML, you can define new tags specifically to describe the equivalent of tables and columns in a relational database structure. More significantly, the tags for a set of columns or attributes can be related to the tags for their parent table or entity.

While the tag structure does seem to be a good vehicle for describing and communicating database structure, the requirement for discipline in the way we organize data is more present than ever. XML doesn't care if we have repeating groups, monstrous data structures, or whatever. If we are to use XML to express a data structure, it is incumbent upon us to do as good a job with the tool as we can.

Following in the tradition of the chemists and astronomers described above, the Object Management Group (OMG) has settled on a set of XML tags they call the XML Metadata Interchange (XMI) as a way to describe in standard terms the structure of data about data ("meta data"). This is useful in communicating between CASE tools, and in describing a "meta data repository". Along the same lines, a group of companies is in the process of defining a Common Warehouse Metadata Interchange (CWMI) that comprises a subset of the XMI tags to support data warehouses.

This means that there are actually two ways that a database structure can be described in XML:

First, an application database can be described in the DTD of an XML document. In this case the operational data contained in the described database could be placed between sets of the described tags. The DTD could, for example, be generated by one CASE tool and read by another one as a way of communicating data structure from one to the other.

A second approach is to make the table and column definitions data that appear between tags of an XMI metamodel. (This is a little more arcane, since the XMI metamodel is very abstract, but using the XMI metamodel allows for description of much more than tables and columns.)

Note, however, that the issue in defining a meta data repository or communicating between CASE tools is not the use of XML or any other particular language. The issue is the database structure and its semantics. The important question is not how a universal meta data repository will be represented. It could as easily be represented by a set of relational tables or an entity/relationship diagram. The questions are, what's in it and what does it mean? XML by itself does not answer that question. Which objects are significant and should be described? That is the harder question. Having a new language for describing them doesn't seem to contribute to that conversation.

Indeed, in recognizing that XML is a good vehicle for describing database structure, the issue that seems most obvious is that this will put greater responsibility on data administrators to define data correctly. XML will not do that. XML will only record whatever data design (good or bad) human beings come up with.


Attributes and XML

Describing Your XML Elements

If you've written HTML, you've almost certainly used attributes without even realizing it. For example, the image tag requires the use of at least one attribute, the src attribute in order to display any images. But with HTML, if you include an attribute that is incorrect or invalid, the browsers will ignore it.

In XML, like HTML, an attribute is a part of an element that provides additional information about that element. You might think of an attribute as an adjective describing the element it is within. For example, if you have an element "dog", it might have an attribute color="white":

<dog color="white"/>

Attributes are formed in name=value pairs. Thus, in XML, you would never write <dog white> - that would be incorrect. One way to think about it is to think of the most generic instance of the adjective you are using. If you're describing your "dog" element as "big", "white", and "smart", then you should probably have three attributes: size, color, and intelligence.

Then you could have one element <dog size="big" color="white" intelligence="smart"/> and another dog element with different values for the same three attributes.

An XML attribute can be one of the following types:

CDATA
CDATA is character data. This means that any string of non-markup characters is legal as part of the attribute. So, if the color attribute uses CDATA for our dog element, your DTD might look like this:

<!ATTLIST dog color CDATA #IMPLIED>

This would allow any string as the color value, for example <dog color="white"/>.

ENTITY and ENTITIES

The ENTITY attribute type indicates that the attribute will represent an external entity in the document itself. In order for the parser to know what to do with the entity, you need to declare it with a notation element in your DTD.

Enumeration

Enumeration allows you to define a specific list of values that the attribute value must match. With your dog element, you might want to define the "intelligence" attribute as only "smart" or "stupid". Your DTD for this might look like:

<!ATTLIST dog intelligence (smart | stupid) #IMPLIED>

ID

Use the ID attribute type if you want to specify a unique identifier for each element. So, if I had a database write out XML of my dogs, I might use the ID type for their names, as all my dogs have a different name. But this is more often used with truly unique identifiers, like "d001", "d002", and "d003". Your DTD for this attribute type might look like this:

<!ATTLIST dog id ID #REQUIRED>

IDREF and IDREFS

You can use the IDREF type to reference an ID that has been named for another element. In my pets data, I might have another element describing the dog's bad habits. With the IDREFS attribute type, I can refer to the ids of all the dogs that did one particular bad habit.

NMTOKEN and NMTOKENS

The NMTOKEN attribute type is similar to CDATA with even more restrictions on what data can be part of the attribute. They are restricted to letters, numbers, periods, hyphens, underscores, and colons.

NOTATION

The NOTATION attribute type indicates that the attribute's value refers to a NOTATION declared somewhere else in the DTD. For example, if you wanted to include graphics in your document, you might include a notation declaration defining a JPEG.

And when you create your graphic element, you would refer to the type of image in the attributes list.

So, if I were creating elements to list my dogs in an XML document, the DTD might look something like this (this is not a complete DTD):

<!ATTLIST dog
name CDATA #REQUIRED
id ID #REQUIRED
size (small | medium | large) #IMPLIED
color CDATA #IMPLIED
intelligence (smart | stupid) #IMPLIED
photo IDREF #IMPLIED
>


src CDATA #REQUIRED
id ID #REQUIRED
filetype NOTATION (jpg | gif)
>


type (digging | barking | howling) #REQUIRED
dogs IDREFS
>

And this would result in a section of an XML document that might look like this:




<dog name="Calico" id="d002" intelligence="stupid" />
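
Python's standard library parser does not validate against a DTD, but the rules the ID, IDREFS and enumeration types express are easy to sketch by hand. In the example below the wrapper element "pets", the element name "badhabit", and the dog "Rex" are assumptions invented for the illustration; only Calico's attributes come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical pets document; "pets", "badhabit" and "Rex" are invented here.
PETS = """
<pets>
  <dog name="Rex" id="d001" intelligence="smart"/>
  <dog name="Calico" id="d002" intelligence="stupid"/>
  <badhabit type="digging" dogs="d001 d002"/>
  <badhabit type="barking" dogs="d003"/>
</pets>
"""

ALLOWED_INTELLIGENCE = {"smart", "stupid"}


def check(xml_text):
    """Return a list of problems a validating parser would complain about."""
    root = ET.fromstring(xml_text)
    problems = []
    ids = set()
    # ID: every id value must be unique in the document.
    for node in root.iter():
        node_id = node.get("id")
        if node_id is not None:
            if node_id in ids:
                problems.append(f"duplicate id {node_id}")
            ids.add(node_id)
    # Enumeration: the value must be one of the listed choices.
    for dog in root.iter("dog"):
        value = dog.get("intelligence")
        if value is not None and value not in ALLOWED_INTELLIGENCE:
            problems.append(f"bad intelligence value {value}")
    # IDREFS: each space-separated reference must match an existing id.
    for habit in root.iter("badhabit"):
        for ref in habit.get("dogs", "").split():
            if ref not in ids:
                problems.append(f"dangling reference {ref}")
    return problems


print(check(PETS))
```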


Using XML Namespaces

An XML namespace is a collection of names that can be used as element or attribute names in an XML document. The namespace qualifies element names uniquely on the Web in order to avoid conflicts between elements with the same name. The namespace is identified by some Uniform Resource Identifier (URI), either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN), but it doesn't matter what, if anything, it points to. URIs are used simply because they are globally unique across the Internet.

Namespaces can be declared either explicitly or by default. With an explicit declaration, you define a shorthand, or prefix, to substitute for the full name of the namespace. You use this prefix to qualify elements belonging to that namespace. Explicit declarations are useful when a node contains elements from different namespaces. A default declaration declares a namespace to be used for all elements within its scope, and a prefix is not used.

How do I declare an explicit namespace?

The following explicit declaration declares "bk" and "money" to be shorthand for the full names of their respective namespaces. The xmlns attribute is an XML keyword for a namespace declaration.


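<bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
         xmlns:money="urn:Finance:Money">
  <bk:TITLE>Creepy Crawlies</bk:TITLE>
  <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
</bk:BOOK>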

All elements beginning with "bk:" or "money:" are considered to be from the namespace "urn:BookLovers.org:BookInfo" or "urn:Finance:Money," respectively.

How do I declare a default namespace?

A namespace declared without a prefix becomes the default namespace for the element on which it is declared and for everything inside it. All elements in that scope that do not have a prefix then belong to the default namespace; unprefixed attributes such as currency are not placed in it and are simply interpreted as part of their element. The following example declares that the <BOOK> element and the elements within it (<TITLE>, <PRICE>) are from the namespace "urn:BookLovers.org:BookInfo."

<BOOK xmlns="urn:BookLovers.org:BookInfo">
  <TITLE>Creepy Crawlies</TITLE>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
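
To check how a parser actually resolves these names, the Python sketch below parses one document with explicit prefixes and one with a default namespace. The namespace URIs and element names are the ones used in this section; the currency value is an assumption for illustration. ElementTree reports each qualified name in "{namespace}local" form.

```python
import xml.etree.ElementTree as ET

BOOK_NS = "urn:BookLovers.org:BookInfo"
MONEY_NS = "urn:Finance:Money"

# Explicit declaration: prefixes stand in for the namespace names.
EXPLICIT = """
<bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo" xmlns:money="urn:Finance:Money">
  <bk:TITLE>Creepy Crawlies</bk:TITLE>
  <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
</bk:BOOK>
"""

# Default declaration: no prefix on the elements at all.
DEFAULT = """
<BOOK xmlns="urn:BookLovers.org:BookInfo">
  <TITLE>Creepy Crawlies</TITLE>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
"""

explicit_root = ET.fromstring(EXPLICIT)
default_root = ET.fromstring(DEFAULT)

# Both roots resolve to the same qualified element name.
print(explicit_root.tag)
print(default_root.tag)

# The prefixed attribute carries its namespace; the unprefixed one has none.
explicit_price = explicit_root.find(f"{{{BOOK_NS}}}PRICE")
default_price = default_root.find(f"{{{BOOK_NS}}}PRICE")
print(list(explicit_price.attrib))
print(list(default_price.attrib))
```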

Try it!

As an exercise, add an explicit namespace declaration to the <BOOK> element of the document above for the namespace "urn:BookLovers.org:AuthorInfo" with a prefix of "author", and check that the result still conforms to the XML specification.

Then add a new element to the same document, qualified with the namespace prefix "author", containing the book's author, "Stefan Knorr", and check again that the document is well-formed.

XML Terms

XML often sounds more confusing than it really is because there are a lot of jargon terms. Once you understand the following terms, you'll be able to read most basic XML tutorials and have a much better chance of understanding what they mean.

What You Need To Know
In order to best understand these terms, you should have a clear understanding of HTML. If you want to learn HTML, or just brush up, try the HTML Tutorial.

attribute
A part of an element that provides additional information about that element.

child
An XML element that is contained within another element.

DTD
Document Type Definition. A DTD declares the elements, attributes, entities, and notations that a document may contain, as well as their relationships to one another.

element
The central building block of any XML document: a start tag, an end tag, and the content between them, or a single empty-element tag.

markup
The characters and codes that change a text document into an XML or other markup language document. This includes the < and > characters as well as the elements and attributes of a document.

nesting
Placing one element inside another. When two tags are opened, they must be closed in the reverse order.

parent
An XML element that contains another element.

tag
The markup characters that indicate the start or end of an element, but not the element content itself.

valid
A well-formed XML document that has been verified as correct against a DTD or schema.

well-formed
An XML document that follows the syntax rules set forth by the XML specification: every tag is closed, elements are properly nested, all attribute values are quoted, comments are correctly formed, and the whole document sits inside one "container" (root) element. An XML declaration is recommended but not required.
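
The rules for a well-formed document are easy to test mechanically, because any conforming parser must reject a document that breaks them. The following is a minimal sketch, assuming Python's xml.etree.ElementTree; the function name is_well_formed and the sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical sample documents, one per rule.
samples = {
    "properly nested": "<book><title>Creepy Crawlies</title></book>",
    "tags closed in the wrong order": "<book><title>Creepy Crawlies</book></title>",
    "unquoted attribute value": "<price currency=USD>22.95</price>",
    "two container elements": "<title>One</title><title>Two</title>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

This checks well-formedness only. Checking validity needs a validating parser plus the DTD or schema, which the standard library module used here does not provide.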

XML and style sheets

Because XML is really about specifying the characteristics of data, and not simply presenting it, you will need to write style sheets to display it. Since DHTML, CSS, and CDF are all standards supported by both Netscape and Microsoft, you can start using XML today. New tools are also constantly emerging to evaluate your XML conventions and ensure that other parsers can use them as you intended.

The Extensible Style Language (XSL) represents an early attempt to create a more dynamic and powerful notation for defining document style, and to augment the capabilities of the Cascading Style Sheets work (CSS1 and CSS2) already in place at the W3C. Its objectives include a model that can dynamically resize itself completely around base font selections (which CSS cannot currently handle) and more powerful, interactive support for document styles and rendering. At present, this work is largely experimental, and most active development uses CSS1 or CSS2 style sheets for production. But just as XML represents a strict subset of SGML, the work on XSL derives in large part from the DSSSL style sheet language developed in the SGML community.

XSL can handle an unlimited number of tags, each in an unlimited number of ways, by virtue of its extensibility. It brings advanced layout features to the Web, such as rotated text, multiple columns, and independent regions. It supports international scripts, all the way to mixing left-to-right, right-to-left, and top-to-bottom scripts on a single page.

Designing an XML Data Structure

Saving Yourself Some Work

Whenever possible, use an existing DTD. It's usually a lot easier to ignore the things you don't need than to design your own from scratch. In addition, using a standard DTD makes data interchange possible, and may make it possible to use data-aware tools developed by others.

So, if an industry standard exists, consider referencing that DTD with an external parameter entity. One place to look for industry-standard DTDs is at the repository created by the Organization for the Advancement of Structured Information Standards (OASIS) at http://www.XML.org. Another place to check is CommerceOne's XML Exchange at http://www.xmlx.com, which is described as "a repository for creating and sharing document type definitions".

Attributes and Elements

One of the issues you will encounter frequently when designing an XML structure is whether to model a given data item as a subelement or as an attribute of an existing element. For example, you could model the title of a slide either as:


<slide>
  <title>This is the title</title>
</slide>

or as:

<slide title="This is the title">...</slide>

In some cases, the different characteristics of attributes and elements make it easy to choose. Let's consider those cases first, and then move on to the cases where the choice is more ambiguous.

Forced Choices

Sometimes, the choice between an attribute and an element is forced on you by the nature of attributes and elements. Let's look at a few of those considerations:

The data contains substructures
In this case, the data item must be modeled as an element. It can't be modeled as an attribute, because attributes take only simple strings. So if the title can contain emphasized text like this: The <em>Best</em> Choice, then the title must be an element.

The data contains multiple lines
Here, it also makes sense to use an element. Attributes need to be simple, short strings or else they become unreadable, if not unusable.

The data changes frequently
When the data will be frequently modified, especially by the end user, then it makes sense to model it as an element. XML-aware editors tend to make it very easy to find and modify element data. Attributes can be somewhat harder to get to, and therefore somewhat more difficult to modify.

The data is a small, simple string that rarely if ever changes
This is data that can be modeled as an attribute. However, just because you can does not mean that you should. Check the "Stylistic Choices" section below, to be sure.

The data is confined to a small number of fixed choices
Here is one time when it really makes sense to use an attribute. Using the DTD, the attribute can be prevented from taking on any value that is not in the pre-approved list. An XML-aware editor can even provide those choices in a drop-down list. Note, though, that the gain in validity restriction comes at a cost in extensibility. The author of the XML document cannot use any value that is not part of the DTD. If another value becomes useful in the future, the DTD will have to be modified before the document author can make use of it.

Stylistic Choices

As often as not, the choices are not as cut and dried as those shown above. When the choice is not forced, you need a sense of "style" to guide your thinking. The question to answer, then, is what makes good XML style, and why.

Defining a sense of style for XML is, unfortunately, as nebulous a business as defining "style" when it comes to art or music. There are a few ways to approach it, however. The goal of this section is to give you some useful thoughts on the subject of "XML style".

Visibility

The first heuristic for thinking about XML elements and attributes uses the concept of visibility. If the data is intended to be shown -- to be displayed to some end user -- then it should be modeled as an element. On the other hand, if the information guides XML processing but is never displayed, then it may be better to model it as an attribute. For example, in order-entry data for shoes, shoe size would definitely be an element. On the other hand, a manufacturer's code number would be reasonably modeled as an attribute.

Consumer / Provider

Another way of thinking about the visibility heuristic is to ask who is the consumer and/or provider of the information. The shoe size is entered by a human sales clerk, so it's an element. The manufacturer's code number for a given shoe model, on the other hand, may be wired into the application or stored in a database, so that would be an attribute. (If it were entered by the clerk, though, it should perhaps be an element.) You can also think in terms of who or what is processing the information. Things can get a bit murky at that end of the process, however. If the information "consumers" are order-filling clerks, will they need to see the manufacturer's code number? Or, if an order-filling program is doing all the processing, which data items should be elements in that case? Such philosophical distinctions leave a lot of room for differences in style.

Container vs. Contents

Another way of thinking about elements and attributes is to think of an element as a container. To reason by analogy, the contents of the container (water or milk) correspond to XML data modeled as elements. On the other hand, characteristics of the container (blue or white, pitcher or can) correspond to XML data modeled as attributes. Good XML style will, in some consistent way, separate each container's contents from its characteristics.

To show these heuristics at work: in a slideshow, the type of the slide (executive or technical) is best modeled as an attribute. It is a characteristic of the slide that lets it be selected or rejected for a particular audience. The title of the slide, on the other hand, is part of its contents. The visibility heuristic is also satisfied here. When the slide is displayed, the title is shown but the type of the slide isn't. Finally, in this example, the consumer of the title information is the presentation audience, while the consumer of the type information is the presentation program.
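
The slide example can be sketched in code to show why the two kinds of data end up in different places. This is an illustration only, assuming Python's xml.etree.ElementTree; the element and attribute names (slide, title, type, em) and the sample titles are assumptions based on the discussion above.

```python
import xml.etree.ElementTree as ET

def make_slide(slide_type, before, emphasized, after):
    """Build a slide whose type is an attribute and whose title is an element."""
    slide = ET.Element("slide", {"type": slide_type})
    title = ET.SubElement(slide, "title")
    title.text = before
    em = ET.SubElement(title, "em")   # substructure: only possible in an element
    em.text = emphasized
    em.tail = after
    return slide

slides = [
    make_slide("executive", "The ", "Best", " Choice"),
    make_slide("technical", "How It ", "Really", " Works"),
]

# The presentation program consumes the attribute to pick slides for an audience.
selected = [s for s in slides if s.get("type") == "executive"]

# The audience consumes the element content, which is what gets displayed.
shown_titles = ["".join(s.find("title").itertext()) for s in selected]

serialized = ET.tostring(slides[0], encoding="unicode")
print(serialized)
print(shown_titles)
```

The title could not be stored as an attribute here, because an attribute value is a plain string and has no room for the emphasized word.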

Normalizing Data

Normalizing means eliminating redundancy: content that is needed in several places is defined once, as an entity, and referenced wherever it is used, so that changing the one copy changes every document that references it. In an HTML file, the only way to achieve that kind of modularity is with HTML links -- but of course the document is then fragmented, rather than whole. XML entities, on the other hand, suffer no such fragmentation. The entity reference acts like a macro -- the entity's contents are expanded in place, producing a whole document, rather than a fragmented one. And when the entity is defined in an external file, multiple documents can reference it.

The considerations for defining an entity reference, then, are pretty much the same as those you would apply to modularize program code:

Whenever you find yourself writing the same thing more than once, think entity. That lets you write it in one place and reference it in multiple places.

If the information is likely to change, especially if it is used in more than one place, definitely think in terms of defining an entity. An example is defining productName as an entity so that you can easily change the documents when the product name changes.

If the entity will never be referenced anywhere except in the current file, define it in the local subset of the document's DTD, much as you would define a method or inner class in a program.

If the entity will be referenced from multiple documents, define it as an external entity, the same way that you would define any generally usable class as an external class.

External entities produce modular XML that is smaller and easier to update and maintain. They can also make the resulting document somewhat more difficult to visualize, much as a good OO design can be easy to change, once you understand it, but harder to wrap your head around at first.

You can also go overboard with entities. At an extreme, you could make an entity reference for the word "the" -- it wouldn't buy you much, but you could do it.

Note:

The larger an entity is, the less likely it is that changing it will have unintended effects. When you define an external entity that covers a whole section on installation instructions, for example, making changes to the section is unlikely to make any of the documents that depend on it come out wrong. Small inline substitutions can be more problematic, though. For example, if productName is defined as an entity, the name change can be to a different part of speech, and that can kill you! Suppose the product name is something like "HtmlEdit". That's a verb. So you write, "You can HtmlEdit your file...". Then, when the official name is decided, it's "Killer". After substitution, that becomes "You can Killer your file...". Argh. Still, even if such simple substitutions can sometimes get you in trouble, they can also save a lot of work. To be totally safe, though, you could set up entities named productNoun, productVerb, productAdj, and productAdverb!
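
The productName substitution described in the note can be reproduced with an entity defined in the local subset of a DTD. The following is a sketch, assuming Python's xml.etree.ElementTree, which expands entities declared inside the document itself. The document type name manual, the p elements, and the second sentence are made up for the example.

```python
import xml.etree.ElementTree as ET

def render(product_name):
    """Expand the productName entity everywhere it is referenced."""
    doc = (
        "<!DOCTYPE manual [\n"
        f'  <!ENTITY productName "{product_name}">\n'
        "]>\n"
        "<manual>"
        "<p>You can &productName; your file.</p>"
        "<p>&productName; saves automatically.</p>"
        "</manual>"
    )
    root = ET.fromstring(doc)
    return [p.text for p in root.findall("p")]

# One definition, several references: changing the entity changes every use.
as_verb = render("HtmlEdit")
as_noun = render("Killer")
print(as_verb)
print(as_noun)
```

The second call shows the hazard from the note: the substitution is purely textual, so a name that is a different part of speech produces a sentence that no longer reads correctly.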

Normalizing DTDs

Just as you can normalize your XML document, you can also normalize your DTD declarations by factoring out common pieces and referencing them with a parameter entity. Factoring out the DTDs (also known as modularizing or normalizing) gives the same advantages and disadvantages as normalized XML -- easier to change, somewhat more difficult to follow.

You can also set up conditionalized DTDs, as described in the SAX tutorial section Conditional Sections. If the number and size of the conditional sections are small relative to the size of the DTD as a whole, that can let you "single source" a DTD that you can use for multiple purposes. If the number of conditional sections gets large, though, the result can be a complex document that is difficult to edit.

DTDs and Markup Languages

When you write HTML with some editors, you'll notice a strange line written across the top:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

You may not know what it is, but it serves a purpose, and in XML a line like it is required if the document is to be validated against a DTD. (A document can be well-formed without one.)

This line is the "Document Type Declaration", which identifies the "Document Type Definition", or DTD, that the document follows. In the HTML declaration there are four parts:

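DOCTYPE
this tells the browser that this tag is a Document Type Declaration
HTML
the name of the document type, which is also the name of the document's root element
PUBLIC
the DTD is identified by a public identifier, meaning that it is publicly available
-//W3C//DTD HTML 4.0 Transitional//EN
the public identifier of the DTD used for the document, in this case HTML 4.0 Transitional, written in English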
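
The same parts can be read back from a parsed document. The following is a minimal sketch, assuming Python's xml.dom.minidom. An XML parser requires a system identifier (a URL) after the public identifier, so the sketch adds the W3C address for the HTML 4.0 Transitional DTD; that URL and the tiny document body are assumptions made for the example.

```python
from xml.dom import minidom

# A small document carrying the declaration discussed above, plus the
# system identifier that XML syntax requires after a public identifier.
DOC = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" '
    '"http://www.w3.org/TR/REC-html40/loose.dtd">\n'
    "<HTML><HEAD><TITLE>Example</TITLE></HEAD><BODY/></HTML>"
)

doctype = minidom.parseString(DOC).doctype

# name     -> the document type (root element) name
# publicId -> the public identifier of the DTD
# systemId -> where a copy of the DTD can be fetched
parts = (doctype.name, doctype.publicId, doctype.systemId)
print(parts)
```

The parser used here only records the declaration. It does not download the DTD or validate the document against it.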

A DTD for an XML or HTML document declares the elements, attributes, entities, and notations that the document may contain. It also indicates their relationship to one another within the document. In other words, a DTD is the grammar of an XML or HTML document.

Purpose of a DTD

When using a DTD for an XML document (including XHTML), the DTD is there to provide structure for your documents. It is easy to write an XML document, but without the DTD the computer has no description of which elements are allowed, in what order, or what they may contain. For example, if you come across this portion of an XML document:


123 Any Street
Jackson MI

While you would know it was an address, you wouldn't know whether it was an address the computer was supposed to create from a database table, whether it was meant for mailing, and so on. The computer, on the other hand, wouldn't see this as anything more than a string of text. If you viewed it in an XML browser, you might not get any errors, but you also wouldn't get a very interesting or useful page.

In other articles, we'll explore the inner workings of a DTD, and examine the declarations that make up a DTD. You'll learn what elements, attributes, entities, and notations are and how to use them within your own DTDs. Plus, you'll learn how to read other DTDs so that you can use them in your own XML documents.

Using the XML Object Model

The XML object model is a collection of objects that you use to access and manipulate the data stored in an XML document. The XML document is modeled after a tree, in which each element in the tree is considered a node. Objects with various properties and methods represent the tree and its nodes. Each node contains the actual data in the document.

How do I access the nodes in the tree?

You access nodes in the tree by scripting against their objects. These objects are created by the XML parser when it loads and parses the XML document. You reference the tree, or document object, by its ID value. In the following example, MyXMLDocument is the document object's ID value. The document object's properties and methods give you access to the root and child node objects of the tree. The root, or document element, is the top-level node from which its child nodes branch out to form the XML tree. The root node can appear in the document only once.

In the following data island, the root node is <class>, and its child node is <student>, which has child nodes of <name> and <GPA>.

<XML ID="MyXMLDocument">
  <class>
    <student studentID="13429">
      <name>Jane Smith</name>
      <GPA>3.8</GPA>
    </student>
  </class>
</XML>


The following list is a sample of the properties and methods that you use to access nodes in an XML document.

Property/Method    Description

XMLDocument        Returns a reference to the XML Document Object Model (DOM) exposed by the object.
documentElement    Returns the document root of the XML document.
childNodes         Returns a node list containing the children of a node (if any).
item               Accesses individual nodes within the list through an index. Index values are zero-based, so item(0) returns the first child node.
text               Returns the text content of the node.

The following code shows an HTML page containing an XML data island. The data island is contained within the <XML> element.

<HTML>
  <HEAD>
    <TITLE>HTML with XML Data Island</TITLE>
  </HEAD>
  <BODY>
    <P>Within this document is an XML data island.</P>
    <XML ID="resortXML">
      <resorts>
        <resort>Calinda Cabo Baja</resort>
        <resort>Na Balam Resort</resort>
      </resorts>
    </XML>
  </BODY>
</HTML>

You access the data island through the ID value, "resortXML", which becomes the name of the document object. In the preceding example, the root node is <resorts>, and the child nodes are <resort>.

The following code accesses the second child node of <resorts> and returns its text, "Na Balam Resort."

resortXML.XMLDocument.documentElement.childNodes.item(1).text
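
The same walk down the tree can be tried outside the browser. The following is a rough Python equivalent, assuming the standard library's xml.dom.minidom; the element names resorts and resort are assumptions made for the example. Unlike the expression above, minidom keeps the whitespace between elements as text nodes, so the sketch filters for element nodes before indexing.

```python
from xml.dom import minidom

RESORTS_XML = """\
<resorts>
  <resort>Calinda Cabo Baja</resort>
  <resort>Na Balam Resort</resort>
</resorts>
"""

doc = minidom.parseString(RESORTS_XML)
root = doc.documentElement          # the root, or document element

# Keep only element children; whitespace between tags shows up as text nodes.
children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

def text_of(node):
    """Concatenate the text nodes directly inside an element."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

second = text_of(children[1])       # zero-based, so index 1 is the second child
print(root.tagName, len(children), second)
```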

How do I persist XML DOM tree information?

Several methods and interfaces are available for persisting DOM information.

If you are using a script language, the DOMDocument object exposes the load, loadXML, and save methods, and the xml property.
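
For comparison, the same round trip can be sketched in Python. This is not the DOMDocument object described above, only a rough analogue that assumes the standard library's xml.dom.minidom: parseString stands in for loadXML, toxml for the xml property, writexml for save, and parse for load. The document content and file name are made up for the example.

```python
import os
import tempfile
from xml.dom import minidom

# Build a DOM tree from a string (comparable to loadXML).
doc = minidom.parseString("<resorts><resort>Na Balam Resort</resort></resorts>")

# Serialize the tree back to text (comparable to reading the xml property).
xml_text = doc.documentElement.toxml()

# Write the tree to a file in a temporary directory (comparable to save).
path = os.path.join(tempfile.mkdtemp(), "resorts.xml")
with open(path, "w", encoding="utf-8") as f:
    doc.writexml(f, encoding="utf-8")

# Read the file back into a new tree (comparable to load).
reloaded = minidom.parse(path)
print(reloaded.documentElement.toxml())
```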

Authoring XML Schemas

An XML Schema is an XML-based syntax for defining how an XML document is marked up. The schema language described here is the one that Microsoft recommends and implements in its parser, identified by the namespace "urn:schemas-microsoft-com:xml-data". It has many advantages over the document type definition (DTD), the initial schema specification for defining an XML model. DTDs have many drawbacks, including the use of non-XML syntax, no support for datatyping, and non-extensibility. For example, DTDs do not allow you to define element content as anything other than another element or a string.

For more information about DTDs, see the World Wide Web Consortium (W3C) XML Recommendation. XML Schema improves upon DTDs in several ways, including the use of XML syntax and support for datatyping and namespaces. For example, an XML Schema allows you to specify an element as an integer, a float, a Boolean, a URL, and so on.

The Microsoft® XML Parser (MSXML) in Microsoft Internet Explorer 5.0 and later can validate an XML document with both a DTD and an XML Schema.

How can I create an XML Schema?

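The following XML document is validated against a schema.

<class xmlns="x-schema:classSchema.xml">
  <student studentID="13429">
    <name>James Smith</name>
    <GPA>3.8</GPA>
  </student>
</class>

You'll notice in the preceding document that the default namespace is "x-schema:classSchema.xml". This tells the parser to validate the entire document against the schema (x-schema) at the following URL ("classSchema.xml").

The following is the entire schema for the preceding document. The schema begins with the <Schema> element, which contains the declaration of the schema namespace and, in this case, the declaration of the "datatypes" namespace as well. The first, xmlns="urn:schemas-microsoft-com:xml-data", indicates that this XML document is an XML Schema. The second, xmlns:dt="urn:schemas-microsoft-com:datatypes", allows you to type element and attribute content by using the dt prefix on the type attribute within their ElementType and AttributeType declarations.

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="studentID" dt:type="string" required="yes"/>
  <ElementType name="name" content="textOnly"/>
  <ElementType name="GPA" content="textOnly" dt:type="float"/>
  <ElementType name="student" content="mixed">
    <attribute type="studentID"/>
    <element type="name"/>
    <element type="GPA"/>
  </ElementType>
  <ElementType name="class" content="eltOnly">
    <element type="student"/>
  </ElementType>
</Schema>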

The declaration elements that you use to define elements and attributes are described as follows.

Element          Description

ElementType      Assigns a type and conditions to an element, and what, if any, child elements it can contain.
AttributeType    Assigns a type and conditions to an attribute.
attribute        Declares that a previously defined attribute type can appear within the scope of the named ElementType element.
element          Declares that a previously defined element type can appear within the scope of the named ElementType element.

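The content of the schema begins with the AttributeType and ElementType declarations of the innermost elements.

  <AttributeType name="studentID" dt:type="string" required="yes"/>
  <ElementType name="name" content="textOnly"/>
  <ElementType name="GPA" content="textOnly" dt:type="float"/>

The next ElementType declaration is followed by its attribute and child elements. When an element has attributes or child elements, they must be included this way in its ElementType declaration. They must also be previously declared in their own ElementType or AttributeType declaration.

  <ElementType name="student" content="mixed">
    <attribute type="studentID"/>
    <element type="name"/>
    <element type="GPA"/>
  </ElementType>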
This process is continued throughout the rest of the schema until every element and attribute has been declared.
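
To get a feel for what such declarations buy you, the following sketch writes two of the constraints by hand: a required attribute and an element whose content must be a float. It is not a schema validator and does not read a schema file. It assumes Python's xml.etree.ElementTree, and the names student, studentID, and GPA are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def check_student(student):
    """Return a list of problems found in one student element."""
    problems = []
    # Constraint 1: the studentID attribute is required.
    if student.get("studentID") is None:
        problems.append("missing required attribute studentID")
    # Constraint 2: the GPA element must be present and hold a float.
    gpa = student.findtext("GPA")
    if gpa is None:
        problems.append("missing GPA element")
    else:
        try:
            float(gpa)
        except ValueError:
            problems.append("GPA is not a float: " + repr(gpa))
    return problems

good = ET.fromstring(
    '<student studentID="13429"><name>James Smith</name><GPA>3.8</GPA></student>'
)
bad = ET.fromstring("<student><name>James Smith</name><GPA>high</GPA></student>")

good_problems = check_student(good)
bad_problems = check_student(bad)
print(good_problems)
print(bad_problems)
```

With a schema, rules like these live in one declarative file that the parser enforces, instead of being repeated in every program that reads the data.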

Unlike DTDs, XML Schemas allow you to have an open content model, allowing you to do such things as type elements and apply default values without necessarily restricting content.

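In the following schema, the <GPA> element is typed and has an attribute with a default value, but no other nodes are declared within the <student> element.

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="scale" default="4.0"/>
  <ElementType name="GPA" content="textOnly" dt:type="float">
    <attribute type="scale"/>
  </ElementType>
  <AttributeType name="studentID"/>
  <ElementType name="student" content="eltOnly" model="open" order="many">
    <attribute type="studentID"/>
    <element type="GPA"/>
  </ElementType>
</Schema>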
The preceding schema allows you to validate only the area with which you are concerned. This gives you more control over the level of validation for your document and allows you to use some of the features provided by the schema without having to employ strict validation.

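Try authoring a schema for the following XML document.

<order>
  <customer>
    <name>Fidelma McGinn</name>
    <phone_number>425-655-3393</phone_number>
  </customer>
  <item>
    <number>5523918</number>
    <description>shovel</description>
    <price>39.99</price>
  </item>
  <date_of_purchase>1998-10-23</date_of_purchase>
  <date_of_delivery>1998-11-03</date_of_delivery>
</order>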

After you have completed the schema, run it through the XML Validator.

What is an XML data island?

A data island is an XML document that exists within an HTML page. It allows you to script against the XML document without having to load it through script or through the <OBJECT> tag. Almost anything that can be in a well-formed XML document can be inside a data island.

The <XML> element marks the beginning of the data island, and its ID attribute provides a name that you can use to reference the data island.

The XML for a data island can be either inline:

<XML ID="XMLID">
  <customer>
    <name>Mark Hanson</name>
    <custID>81422</custID>
  </customer>
</XML>

or referenced through a SRC attribute on the <XML> tag:

<XML ID="XMLID" SRC="customer.xml"></XML>

You can also use the <SCRIPT> tag to create a data island.

Authoring guidelines

Simply author an XML document, place that XML document within an <XML> element, and give that element an ID attribute.


Conclusion

XML is pretty simple, and very flexible. It has many uses yet to be discovered -- we are just beginning to scratch the surface of its potential. It is the foundation for a great many standards yet to come, providing a common language that different computer systems can use to exchange data with one another. As each industry-group comes up with standards for what they want to say, computers will begin to link to each other in ways previously unimaginable.

If you want to be an early adopter, now is the time to start reading the standards, looking at the specs, and starting to think how you could use this technology. XML is not going to catch on all by itself. It takes people to support it, to build the tools and create the content using it. XML seems to have a lot of industry support behind it. It offers the potential to do a lot of things that people want to be able to do.

If you think it can work for you, look into it more. Then make an informed decision. Take a look at it. Find out how it's being used. Try it for yourself, just to play with, or in a small pilot project. If the tools aren't mature enough yet, then wait a few months and look again later. If XML turns out to be a good technology it will succeed. If not, people will pass it by.

Both Netscape and Microsoft are supporting XML. Netscape already uses it to some extent, mainly in its own internal output, and IE4 supports it partially, but not completely. Support will be strong in the fifth generation of both browsers. XML is extremely helpful as a means of communicating with databases and as a way to get PDF-type output while keeping the data accessible in the browser.



Xavier's Comment
The following code is an example of how you write the XML to a file:

f = open('test.xml', 'w')
doc.writexml(f)
f.close()
13 Mon Feb 2012
Admin's Reply:



Karthiga's Comment
There's something I don't get about this (maybe because of my lack of experience with Python). Once I'm done appending the children, how do I actually create the file, I mean the file with its location, written to the hard disk? Thanks!
13 Mon Feb 2012
Admin's Reply:

I think Xavier got it just right.




Jessie's Comment
Hey, that's a clever way of thinking about it.
07 Sat Jan 2012
Admin's Reply:

 Thank you Jessie :)




Satya's Comment
Fantastic........ Thank u ever so much.
12 Fri Aug 2011
Admin's Reply:

 thank you satya :)




kisho's Comment
superb....tx.
17 Fri Sep 2010
Admin's Reply:

Thanks