Chapter 16. XHTML
Despite its name, you don't use Extensible Markup Language (XML) to
directly create and mark up web documents. Instead you use XML
technology to define a new markup language, which you then use to
mark up web documents. This should come as no surprise to anyone who
has read the previous chapter in this book. Nor, then, should it
surprise you that one of the first languages defined using XML is an
XML-ized version of HTML, the most popular markup language ever. HTML
is now being disciplined and cleaned up by XML to bring it back into
line with the larger family of markup languages. This new standard is
XHTML 1.0.[80]
Because of HTML's legacy features and oddities, using XML to
describe HTML was not an easy job for the W3C. In fact, certain HTML
rules, as we'll discuss later, cannot be represented using XML.
Nonetheless, if the W3C has its way, XHTML will ultimately replace
the HTML we currently know and love. We agree that it should.
So much of XHTML is identical to HTML's current standard,
Version 4.01, that almost everything presented elsewhere in this book
may be applied to both HTML and XHTML. The differences, both good and
bad, are detailed in this chapter. To become fluent in XHTML,
you'll first need to absorb the rest of this book, and then
adjust your thinking to embrace what we present in this chapter.
16.1. Why XHTML?
HTML, as everyone should know by now, began as a simple markup
language similar in appearance and usage to other SGML-based markup
languages. In its early years, little effort was put into making HTML
perfectly SGML-compliant. As a result, odd features and a lax
attitude towards enforcing the rules became a standard part of both
HTML and the browsers that processed HTML documents.
As the Web grew from an experiment into an industry, the desire for a
standard version of HTML led to the creation of several official
versions, culminating most recently with Version 4.01. As HTML has
stabilized into this latest version, browsers have become more alike
in their support of various HTML features. In general, the world of
HTML has settled into a familiar set of constructs and usage rules.
Unfortunately, HTML offers only a limited set of
document-creation primitives, is incapable of handling nontraditional
content such as chemical formulae, musical notation, or mathematical
expressions, and fails to adequately support alternative display
media such as handheld computers or intelligent cellular phones. We
need new ways to deliver information that can be parsed, processed,
displayed, sliced, and diced by the many different communication
technologies that have emerged since the Web sparked the digital
communication revolution a decade ago.
Rather than trying to rein in another herd of maverick, nonstandard
markup languages, the W3C introduced XML as a standard way to create
new markup languages. XML is the framework upon which organizations
can develop their own markup languages to suit the needs of their
users. XML is a simplified subset of SGML, streamlined for today's
dynamic systems. And while the W3C originally intended it as a tool
to create document markup languages, XML is also becoming quite
useful as a standard way to define small languages that are used as
data exchange protocols between different applications.
Of course, we don't want to abandon the plethora of documents
already marked up with HTML or the infrastructure of knowledge,
tools, and technologies that currently support HTML and the Web. Yet
we do not want to miss the opportunities of XML, either. XHTML is the
bridge. It uses the features of XML to define a markup language that
is nearly identical to standard HTML 4.01 and gets us all started
down the XML road.
16.1.1. XHTML Document Type Definitions
HTML 4.01 comes in three variants, each defined by a separate SGML
DTD. Similarly, XHTML comes in three variants, with XML DTDs
corresponding to the three SGML DTDs that define HTML 4.01. To create
an XHTML document, you must choose one of these DTDs and then create
a document that uses its particular elements and rules.
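As a rough illustration of what choosing a DTD looks like in practice, the sketch below shows a minimal document that declares the strict DTD. The public identifier and DTD address are the ones the W3C publishes for XHTML 1.0; the title and paragraph text are placeholders of our own.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- Placeholder title; a title element is required -->
    <title>Placeholder title</title>
  </head>
  <body>
    <!-- Placeholder content; the strict DTD expects block-level
         elements such as p directly inside body -->
    <p>Placeholder paragraph.</p>
  </body>
</html>
```

The other two variants are declared the same way, with "Transitional" or "Frameset" in place of "Strict" in the public identifier, and xhtml1-transitional.dtd or xhtml1-frameset.dtd as the DTD file name.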
The first XHTML DTD corresponds to the "strict" HTML DTD.
The strict definition excludes all deprecated elements (tags and
attributes) in HTML 4.01 and forces authors to use only those
features that are fully supported in HTML. Many of the HTML elements
and attributes dealing with presentation and appearance, such as the
<font> tag and the align attribute, are missing from the strict XHTML
DTD, replaced by the equivalent properties in the Cascading Style
Sheet model.
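To make the difference concrete, here is a small sketch of the same paragraph written both ways. The text, color, and sizes are placeholders of our own, and the mapping from a font size of 5 to the x-large style value is only approximate, since browsers may render the two slightly differently.

```html
<!-- Transitional DTD: presentation carried by deprecated markup -->
<p align="center"><font color="red" size="5">Warning</font></p>

<!-- Strict DTD: the same intent expressed with style properties -->
<p style="text-align: center; color: red; font-size: x-large">Warning</p>
```

In a real document you would more likely move the style properties into a style sheet rule than repeat them in a style attribute on every paragraph; the inline form is used here only to keep the comparison side by side.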
Most HTML authors find the strict XHTML DTD too restrictive, since
many of the deprecated elements and attributes are still in
widespread use throughout the Web. More importantly, the popular
browsers, while fully supporting the deprecated elements, have yet
to fully implement the new standard ones. The
only real advantage in using the strict XHTML DTD is that compliant
documents are guaranteed to be fully supported in future versions of
XHTML.[81]
Most authors will probably choose to use the
"transitional" XHTML DTD. It's closest to the
current HTML standard and includes all those wonderful, but
deprecated, features that make life as an HTML author easier. With
the transitional XHTML DTD, you can ease into the XML family while
staying current with the browser industry.
The third DTD is for frames. It is identical to the transitional DTD
in all other respects; the only difference is the replacement of the
document body with appropriate frame elements. You might think that,
for completeness' sake, there would be strict and transitional
frame DTDs, but the W3C decided that if you use frames, you might as
well use all the deprecated elements, too.
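A minimal frame document under the third DTD might look like the following sketch. The DOCTYPE is the one the W3C publishes for the XHTML 1.0 frameset DTD; the file names menu.html and content.html, the column widths, and the text are hypothetical placeholders.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Placeholder frameset</title>
  </head>
  <!-- The frameset element takes the place of the document body -->
  <frameset cols="25%,*">
    <!-- Hypothetical frame sources; empty elements are closed with /> -->
    <frame src="menu.html" />
    <frame src="content.html" />
    <!-- Fallback content for browsers that do not display frames -->
    <noframes>
      <body>
        <p>This document uses frames.</p>
      </body>
    </noframes>
  </frameset>
</html>
```

Note that the body element appears only inside noframes, which reflects the one structural difference from the transitional DTD described above.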
Copyright © 2002 O'Reilly & Associates. All rights reserved.