This chapter provides a basic introduction to the background and
general syntax of HTML, including document structure, tags, and their
attributes. It also looks briefly at good HTML style and the pros and
cons of using WYSIWYG authoring tools.
8.1. The HTML Standard
The HTML standard and all other Web-related
standards are developed under the authority of the
World Wide Web Consortium (W3C).
Standards, specifications, and drafts of new proposals can be found
at http://www.w3.org. The most
recent standard for document markup is the HTML 4.01 specification.
The HTML standard traveled a long, difficult road to its current
state of relative stability. Early on, competition between the major
web browsers led to a mess of proprietary tags, HTML extensions, and
practices that muddied the original intent of HTML in favor of more
control over page display.
The W3C has reined things in with the HTML 4.0 specification
(which is further refined in the current 4.01 version). It
incorporates many of the tags introduced by the popular browsers that
improve web functionality. It also officially
"deprecates" tags that are used in common practice but
are not in keeping with the priorities of the markup language (such
as keeping style information out of content).
8.1.1. Keeping Style Separate from Content
Before HTML there was SGML
(Standard Generalized Markup Language), which established the system
of describing documents in terms of their structure, independent of
appearance. SGML is a vast set of rules for developing markup
languages such as HTML, but it is so all-encompassing that HTML uses
only a small subset of its capabilities.
Publishers began storing SGML versions of their documents so that
they could be translated into a variety of end uses. For example,
text that is tagged as a heading may be formatted one way if the end
product is a printed book, but another way for a CD-ROM. The
advantage is that a single source file can be used to create a
variety of end products. The way it is interpreted and displayed
(i.e., the way it looks) depends on the end use.
Because HTML is one application of an SGML tagging system, this
principle of keeping style information separate from the structure of
the document remains inherent to the HTML purpose. Over the past few
years, this ideal has been compromised by the creation of HTML tags
that contain explicit style instructions, such as the
<font> tag.
Cascading Style Sheets promise to keep style information out of the
content by storing all style instructions in a separate document (or
a separate section of the source document). With this system in
place, the W3C is working harder than ever to clean up the HTML
standard so that it works the way it was intended. For more
information, see Chapter 17, "Cascading Style Sheets".
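As a rough illustration of the difference, the following sketch marks up the same heading both ways. The heading text, font, and color are made up for the example; only the contrast between the two approaches matters.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Style sheet sketch</title>
<!-- Style instructions live here, apart from the content.
     The font and color values are illustrative assumptions. -->
<style type="text/css">
  h1 { font-family: Arial, sans-serif; color: #990000; }
</style>
</head>
<body>
<!-- Presentational approach (style mixed into the content):
     <h1><font face="Arial" color="#990000">Chapter Title</font></h1> -->

<!-- Structural approach: the tag says only that this is a heading -->
<h1>Chapter Title</h1>
</body>
</html>
```

With the style sheet approach, changing the look of every heading means editing one rule rather than every <font> tag in the document.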
8.1.2. Three Flavors of HTML 4.01
While the W3C has definite ideas on how HTML should work, it is
also aware that it will be a while before old browsers are
phased out and web authors begin to mark up documents properly. For
that reason, the HTML 4.01 specification actually encompasses three
slightly different specification documents: one "strict,"
one "transitional," and one just for framed documents.
These documents, called Document Type Definitions (or DTDs), define
every tag, attribute, and entity along with the rules for their use.
DTDs are written following the rules and conventions of SGML.
The HTML 4.01 Strict DTD excludes all deprecated tags and attributes
(those scheduled to be phased out). In an ideal world, all developers
would mark up the structure of their documents according to the
strict version of HTML, leaving all presentation to be handled by
style sheets.
The HTML 4.01 Transitional DTD is less restrictive, and it includes
many of the elements dedicated to appearance (such as the
<font> tag and the align
attribute) that are in common use today. Most developers today comply
with the transitional specification because it allows more control
over presentation while the industry waits for older browsers (those
that don't support new features such as style sheets) to fade
away.
The Frameset DTD is identical to the Transitional DTD, except that it
allows for the <frameset> element to be used
in place of the standard <body> element.
Frames are discussed in Chapter 14, "Frames".
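A document states which of the three DTDs it follows with a DOCTYPE declaration on its first line. The declarations below are the ones given in the HTML 4.01 specification. They are grouped here only for comparison; a real document would use exactly one of them.

```html
<!-- HTML 4.01 Strict: no deprecated tags or attributes -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional: deprecated tags and attributes allowed -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Frameset: Transitional, with <frameset> in place of <body> -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
    "http://www.w3.org/TR/html4/frameset.dtd">
```

Validation tools typically read this declaration to decide which set of rules to check the document against.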
8.1.3. The Web Standards Movement
After years of frustration coding for incompatible browsers,
the web development community finally said,
"Enough is enough!" and began putting pressure on the
browser developers to change their ways. The charge was led in part
by the Web Standards Project (WaSP, http://www.webstandards.org), an industry
watchdog group that works diligently to convince the browser
developers that it is in everyone's best interest to comply
with the established web standards.
Fortunately, the browser developers listened, and things have settled
considerably in the last three years. Microsoft Internet Explorer
began offering nearly complete support for HTML 4.01 in Version 5.5
for Windows (5.0 for Mac). Netscape's 4.x releases support most of
the tags in the HTML 4.0 specification, and its 6.0 release is fully
compliant with HTML 4.01 (with very few exceptions). Other browsers,
most notably Opera, have stuck to the specifications from the very
beginning.
But WaSP doesn't stop with the browser developers. If there is
to be a true set of web standards (including HTML, but also CSS,
JavaScript, and the Document Object Model), everybody needs to abide
by them. Web developers need to give up the convenience and habit of
sloppy HTML code and follow the HTML 4.01 mandates to keep style
separate from structure and content. Web authoring tool developers
must make it easy to generate standards-compliant code with their
tools. Furthermore, users must ditch their old
non-standards-compliant browsers and upgrade to current versions.
WaSP is diligent in its efforts, but there is still much work to be
done before all these pieces fall seamlessly into place.
8.1.4. Web Standards in This Book
The intention of this book is to be highly mindful of and compliant
with the standards effort. The tag information in the following
chapters reflects the current HTML 4.01 Transitional Specification.
However, it also represents HTML common practices and includes some
tags that are not necessarily part of the standard. In all cases
where a tag or attribute is proprietary (works with only one browser)
or deprecated by the W3C, it is clearly labeled as such. In this way,
I hope to paint a complete picture of HTML while endorsing the
standard.
8.1.5. The Future of HTML
According to the W3C, HTML 4.01 is the end of the line for HTML as we
know it. The next version of HTML is the XHTML Version 1.0
specification. XHTML is the same HTML specification as we know it
today, but rewritten using the new-and-improved rules of XML
(Extensible Markup Language). XHTML uses all the same HTML 4.01 tags,
but it enforces a set of rules (such as closing all tags, putting
attribute values in quotation marks, and writing all tag names in
lowercase) that make a document "well-formed." Well-formed XHTML will
work in next-generation XML-based browsers, where HTML will not. Our
current HTML coding standards are incredibly lax by comparison.
These topics are discussed further in Chapter 30, "Introduction to XML"
and Chapter 31, "XHTML".
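To give a feel for the XHTML rules mentioned above, here is a small fragment written first in the lax style that HTML 4.01 tolerates and then as well-formed XHTML. The class name and image filename are hypothetical, and the fragment is only a sketch, not a complete document.

```html
<!-- Lax HTML: uppercase tags, unquoted values, unclosed elements -->
<P CLASS=intro>First paragraph
<P>Second paragraph<BR>
<IMG SRC=logo.gif ALT="Logo">

<!-- The same content as well-formed XHTML: lowercase tags,
     quoted attribute values, and every element closed -->
<p class="intro">First paragraph</p>
<p>Second paragraph<br />
<img src="logo.gif" alt="Logo" /></p>
```

Empty elements such as <br> and <img> have no closing tag in HTML, so XHTML closes them with a trailing slash; the space before the slash is a common convention for compatibility with older browsers.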