Chapter 4. Text Basics
Any successful presentation, even a thoughtful tome, should have its
text organized into an attractive, effective document. Organizing
text into attractive documents is HTML and XHTML's forte. The
languages give you a number of tools that help you mold your text and
get your message across. They also help structure your document so
that your target audience has easy access to your words.
Always keep in mind while designing your documents (here we go
again!) that the markup tags, particularly in regard to text, only
advise -- they do not dictate -- how a browser will ultimately
render the document. Rendering varies from browser to browser.
Don't get too entangled with trying to get just the right look
and layout. Your attempts may and probably will be thwarted by the
browser.
4.1. Divisions and Paragraphs
Like most text processors, a browser wraps the words it finds to fit the
horizontal width of its viewing window. Widen the browser's
window and words automatically flow up to fill the wider lines.
Squeeze the window and words wrap downwards.
Unlike most text processors, however, HTML and XHTML use explicit
division (<div>), paragraph
(<p>), and line-break
(<br>) tags to control the alignment and
flow of text. Return characters, although quite useful for
readability of the source document, typically are ignored by the
browser -- authors must use the <br> tag
to explicitly force a common text line break. The
<p> tag, while also performing the task,
carries with it meaning and effects beyond a simple line break.
The <div> tag is a little different.
Originally codified in the HTML 3.2 standard,
<div> was included in the language to be a
simple organizational tool -- to divide the document into discrete
sections -- whose somewhat obscure purpose meant that few authors used
it. But recent innovations -- alignment, styles, and the
id attribute for document referencing and
automation -- now let you more distinctly label and thereby define
individual sections of your documents, as well as control the
alignment and appearance of those sections. These features breathe
real life and meaning into the <div> tag.
By associating an id and a class name
with the various sections of your document, each delimited by a
<div id=name
class=name> tag and attributes (you can do the
same with other tags like <p>, too), you not
only label those divisions for later reference by a hyperlink and for
automated processing and management (collect all the bibliography
divisions, for instance), but you may also define different, distinct
display styles for those portions of your document. For instance, you
might define one divisional class for your document's abstract
(<div class=abstract>,
for example), another for the body, a third for the conclusion, and a
fourth divisional class for the bibliography
(<div class=biblio>, for
example).
Each class, then, might be given a different display definition in a
document-level or externally related style sheet: the abstract
indented and in an italic typeface (such as
div.abstract {margin-left: 0.5in; font-style: italic});
the body in a left-justified roman
typeface; the conclusion similar to the abstract; and the
bibliography automatically numbered and formatted appropriately.
We provide a detailed description of style sheets, classes, and their
applications in Chapter 8, "Cascading Style Sheets".
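To make the idea concrete, here is a minimal sketch of a document
that labels its sections with classed divisions and styles them from
a document-level style sheet. The class names (abstract, body-text,
biblio), the id values, and the particular style values are our own
illustrative choices; nothing in the standards requires them.

```html
<html>
<head>
<title>Sample Report</title>
<style type="text/css">
  /* Illustrative rules only; the class names are hypothetical */
  div.abstract  { margin-left: 0.5in; font-style: italic }
  div.body-text { text-align: left; font-style: normal }
  div.biblio    { font-size: smaller }
</style>
</head>
<body>
<div class="abstract" id="abstract">
<p>A short summary of the report goes here.</p>
</div>
<div class="body-text" id="main">
<p>The main discussion goes here.</p>
</div>
<div class="biblio" id="references">
<p>Cited works go here.</p>
</div>
</body>
</html>
```

Because each division carries a class, changing the look of every
abstract or bibliography later means editing one style rule rather
than every division.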
4.1.1. The <div> Tag
As defined in the HTML 4.01 and XHTML 1.0 standards, the
<div> tag divides your document into
separate, distinct sections. It may be used strictly as an
organizational tool, without any sort of formatting associated with
it; it becomes more effective if you add the id
and class attributes to label the division. The
<div> tag may also be combined with the
align attribute to control the alignment of whole
sections of your document's content in the display and with the
many programmatic "on" attributes for user interaction.
<div>
Function: Defines a block of text
Attributes: ALIGN, CLASS, DIR, ID, LANG, NOWRAP, ONCLICK,
ONDBLCLICK, ONKEYDOWN, ONKEYPRESS, ONKEYUP, ONMOUSEDOWN,
ONMOUSEMOVE, ONMOUSEOUT, ONMOUSEOVER, ONMOUSEUP, STYLE, TITLE
End tag: </div>; required in both HTML and XHTML
Contains: body_content
Used in: block
4.1.1.1. The align attribute
The align attribute for <div> positions the enclosed content at the
left (default), center, or right of the display. In addition, you can
specify justify to
align both the left and right margins of the text. The
<div> tag may be nested, and the alignment
of the nested <div> tag takes precedence
over the containing <div> tag. Further,
other nested alignment tags, such as
<center>, aligned paragraphs (see
<p> in Section 4.1.2, "The <p> Tag"), or specially aligned table rows and cells,
override the effect of <div>. Like the
align attribute for other tags, it is deprecated
in the HTML and XHTML standards in deference to style sheet-based
layout controls.
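The precedence rules above can be sketched as follows. This is only
an illustration of the behavior described in this section; the text
content is made up, and the last division shows one style sheet-based
way to get the same centering without the deprecated attribute.

```html
<div align="center">
  <p>This paragraph is centered by the outer division.</p>
  <div align="right">
    <p>This paragraph is flush right; the nested division
    takes precedence.</p>
  </div>
  <p align="left">This paragraph's own align attribute
  overrides the outer division.</p>
</div>

<!-- A style-based equivalent of the outer division -->
<div style="text-align: center">
  <p>Centered with a style rather than the align attribute.</p>
</div>
```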
4.1.1.2. The nowrap attribute
Supported only by Internet Explorer, the
nowrap attribute suppresses automatic word
wrapping of the text within the division. Line breaks will only occur
where you have placed carriage returns in your source document.
While the nowrap attribute probably doesn't
make much sense for large sections of text that would otherwise be
flowed together on the page, it can make things a bit easier when
creating blocks of text with many explicit line breaks: poetry, for
example, or addresses. You don't have to insert all those
explicit <br> tags in a text flow within a
<div nowrap> tag. On the other hand, all
other browsers ignore the nowrap attribute and
merrily flow your text together anyway. If you are targeting only
Internet Explorer with your documents, consider using
nowrap where needed, but otherwise, we can't
recommend this attribute for general use.
4.1.1.4. The id attribute
Use the id attribute to label the document
division specially for later reference by a hyperlink, style sheet,
applet, or other automated process. An acceptable
id value is any quote-enclosed string that
uniquely identifies the division and that later can be used to
reference that document section unambiguously. Although we're
introducing it within the context of the
<div> tag, this attribute can be used with
almost any tag.
When used as an element label, the value of the id
attribute can be added to a URL to address the labelled element
uniquely within the document. You can label both large portions of
content (via a tag like <div>) or small
snippets of text (using a tag like <i> or
<span>). For example, you might label the
abstract of a technical report using <div
id="abstract">. A URL could jump right to that abstract
by referencing report.html#abstract. When used in
this manner, the value of the id attribute must be
unique with respect to all other id attributes
within the document, and all the names defined by any
<a> tags with the name
attribute. (See Section 6.3.3, "Linking Within a Document".)
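As a brief sketch of this usage, the fragment below labels a division
and then links to it, first from within the same document and then
from another one. The filename report.html and the id value abstract
follow the example in the text; the link text is our own.

```html
<!-- In report.html: label the division -->
<div id="abstract">
<p>This report summarizes our findings.</p>
</div>

<!-- Elsewhere in report.html: jump to the labelled division -->
<a href="#abstract">Back to the abstract</a>

<!-- From another document: address it with a URL fragment -->
<a href="report.html#abstract">Read the abstract</a>
```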
When used as a style-sheet selector, the value of the
id attribute is the name of a style rule that can
be associated with the current tag. This provides a second set of
definable style rules, similar to the various style classes you can
create. A tag can use both the class and
id attributes to apply two different rules to a
single tag. In this usage, the name associated with the
id attribute must be unique with respect to all
other style IDs within the current document. A more complete
description of style classes and IDs can be found in Chapter 8, "Cascading Style Sheets".
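A small sketch may help distinguish the two kinds of rules. The
class name note and the id value warning1 are hypothetical names
chosen for illustration, as are the style values themselves.

```html
<style type="text/css">
  /* Class rule: may apply to many paragraphs */
  p.note    { font-style: italic }
  /* ID rule: applies to the one element whose id is "warning1" */
  #warning1 { color: red }
</style>

<p class="note">An ordinary note, set in italics.</p>
<p class="note" id="warning1">A note that picks up both rules:
italic from its class and red from its id.</p>
```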
4.1.1.5. The title attribute
Use the optional title
attribute and quote-enclosed string value to associate a descriptive
phrase with the division. Like the id attribute,
the title attribute can be used with almost any
tag and behaves similarly for all tags.
There is no defined usage for the value of the
title attribute, and many browsers simply ignore
it. Internet Explorer, however, will display the title associated
with any element when the mouse pauses over that element. Nifty.
Applied thoughtfully, the title attribute can provide spot help for
the various elements within your document.
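For instance, a title might be attached to a whole division or to a
short run of text, as in this sketch. The wording of the titles is
our own, and whether and how a title is displayed depends on the
browser.

```html
<div title="Summary of the report's main findings">
<p>Our measurements agree with the earlier study.</p>
</div>

<p>Send the form to the
<span title="The office on the second floor">registrar</span>
before the deadline.</p>
```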
4.1.1.6. The class and style attributes
Use the style attribute
with the <div> tag to create an inline style
for the content enclosed by the tag. The class
attribute lets you apply the style of a predefined class of the
<div> tag to the contents of this division.
The value of the class attribute is the name of a
style defined in some document-level or externally defined style
sheet. In addition, class-identified divisions lend themselves
well to computer processing of your documents, such as extraction of
all divisions whose class name is "biblio" for
the automated assembly of a master bibliography. (See Section 8.1.1, "Inline Styles: The style Attribute", and Section 8.3, "Style Classes".)
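The two approaches can be compared side by side in a short sketch.
The class name biblio comes from the text above; the style values
are arbitrary choices made for illustration.

```html
<style type="text/css">
  div.biblio { margin-left: 1in; font-size: smaller }
</style>

<!-- Styled through a class defined in the style sheet above -->
<div class="biblio">
<p>First cited work.</p>
</div>

<!-- Styled inline, for this one division only -->
<div style="margin-left: 1in; font-size: smaller">
<p>A one-off block with the same appearance.</p>
</div>
```

The class version is usually easier to maintain, since the rule is
written once and the class name also labels the division for any
automated processing.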
4.1.1.7. Event attributes
The many user-related events that may happen in and around a division, such as
when a user clicks or double-clicks the mouse within its display
space, are recognized by the browser if it conforms to the current
HTML or XHTML standards. With the respective "on"
attribute and value, you may react to that event by displaying a user
dialog box, or activating some multimedia event. (See Section 12.3.3, "JavaScript Event Handlers".)
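As a minimal sketch, the division below reacts to mouse clicks by
displaying a dialog box. It assumes a browser with JavaScript
enabled; the id value and the message text are our own.

```html
<div id="abstract"
     onclick="alert('You clicked inside the abstract.')"
     ondblclick="alert('You double-clicked inside the abstract.')">
<p>Click or double-click anywhere in this division.</p>
</div>
```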
4.1.2. The <p> Tag
The <p> tag signals the start of a paragraph. That's not well-known even by
some veteran webmasters, because it runs counter to what
we've come to expect from experience. Most word processors
we're familiar with use just one special character, typically
the return character, to signal the end of a
paragraph. In HTML and XHTML, each paragraph should start with
<p> and end with the corresponding
</p> tag. And while a sequence of newline
characters in a text processor-displayed document creates an empty
paragraph for each one, browsers typically ignore all but the first
in a sequence of consecutive paragraph tags.
In practice, with HTML you can ignore the starting
<p> tag at the beginning of the first
paragraph, and the </p> tag at the end of
paragraphs: they can be implied from other tags that occur in the
document, and hence safely omitted.[20]
For example:
<body>
This is the first paragraph, at the very beginning of the
body of this document.
<p>
The tag above signals the start of this second paragraph.
When rendered by a browser, it will begin slightly below the
end of the first paragraph, with a bit of extra white space
between the two paragraphs.
<p>
This is the last paragraph in the example.
</body>
Notice that we haven't included the paragraph start tag
(<p>) for the first paragraph or any end
paragraph tags at all in the HTML example; they can be unambiguously
inferred by the browser and are therefore unnecessary.
<p>
Function: Defines a paragraph of text
Attributes: ALIGN, CLASS, DIR, ID, LANG, ONCLICK, ONDBLCLICK,
ONKEYDOWN, ONKEYPRESS, ONKEYUP, ONMOUSEDOWN, ONMOUSEMOVE,
ONMOUSEOUT, ONMOUSEOVER, ONMOUSEUP, STYLE, TITLE
End tag: </p>; often omitted in HTML
Contains: text
Used in: block
In general, you'll find that human document authors tend to
omit implied tags whenever possible, while automatic document
generators tend to insert them. That may be because the software
designers didn't want to run the risk of having their product
chided by competitors as not adhering to the HTML standard, even
though we're splitting letter-of-the-law hairs here. Go ahead
and be defiant: omit that first paragraph's
<p> tag and don't give a second
thought to paragraph ending </p> tags,
provided, of course, that your document's structure and clarity
are not compromised. That is, as long as you are aware that XHTML
frowns severely on such laxity.
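For comparison, here is a sketch of how the earlier three-paragraph
example might look with every paragraph tag written out, as XHTML
expects. The wording of the second paragraph is changed to match;
nothing else differs.

```html
<body>
<p>
This is the first paragraph, at the very beginning of the
body of this document.
</p>
<p>
This second paragraph has explicit start and end tags.
</p>
<p>
This is the last paragraph in the example.
</p>
</body>
```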
4.1.2.1. Paragraph rendering
When encountering a new paragraph
(<p>) tag, a browser typically inserts one
blank line plus some extra vertical space into the document before
starting the new paragraph. The browser then collects all the words
and, if present, inline images into the new paragraph, ignoring
leading and trailing spaces (not spaces between words, of course) and
return characters in the source text. The browser software then flows
the resulting sequence of words and images into a paragraph that fits
within the margins of its display window, automatically generating
line breaks as needed to wrap the text within the window. For
example, compare how a browser arranges the text into lines and
paragraphs (Figure 4-1) to how the preceding
example is printed on the page. The browser may also automatically
hyphenate long words, and the paragraph may be full-justified to
stretch the line of words out towards both margins.
Figure 4-1. Browsers ignore common return characters in the source HTML document
The net result is that you do not have to worry about line length,
word wrap, and line breaks when composing your documents. The browser
will take any arbitrary sequence of words and images and display a
nicely formatted paragraph.
If you want to control line length and breaks explicitly, consider
using a preformatted text block with the
<pre> tag. If you need to force a line
break, use the <br> tag. (See Section 4.7.5, "The <pre> Tag", and Section 4.7.1, "The <br> Tag".)
4.1.2.2. The align attribute
Most browsers automatically left-justify a new
paragraph. To change this behavior, HTML 4 and XHTML give you the
align attribute for the <p> tag and provide
four kinds of content justification: left,
right, center, or
justify.
Figure 4-2 shows you the effect of various
alignments as rendered from the following source:
<p align=right>
Right over here!
<br>
This is too.
<p align=left>
Slide back left.
<p align=center>
Smack in the middle.
</p>
Left is the default.
Figure 4-2. Effect of the align attribute on paragraph justification
Notice in the HTML example that the paragraph alignment remains in
effect until the browser encounters another
<p> tag or an ending
</p> tag. We deliberately left out a final
<p> tag in the example to illustrate the
effects of the </p> end tag on paragraph
justification. Other body elements may also disrupt the current
paragraph alignment and cause subsequent paragraphs to revert to the
default left alignment, including forms, headers, tables, and most
other body content-related tags.
Note that the align attribute is deprecated in
HTML 4 and XHTML in deference to style sheet-based alignments.
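One style sheet-based way to get the same alignments is the
text-align property, applied here through inline styles. This is a
sketch rather than the only approach; the same rules could just as
well live in a document-level or external style sheet.

```html
<p style="text-align: right">
Right over here!
<br>
This is too.
</p>
<p style="text-align: left">
Slide back left.
</p>
<p style="text-align: center">
Smack in the middle.
</p>
<p>
Left is the default.
</p>
```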
4.1.2.3. The dir and lang attributes
The dir attribute lets you advise the browser as to which
direction the text within the paragraph ought to be displayed, and
the lang attribute lets you specify the
language used within that paragraph. The dir and
lang attributes are supported by the popular
browsers, even though there are no behaviors defined for any specific
language. (See Section 3.6.1.1, "The dir attribute", and Section 3.6.1.2, "The lang attribute".)
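A short sketch of both attributes follows. The sample text is our
own: a French sentence, and the Hebrew word "shalom" written with
numeric character references so that the source stays plain ASCII.

```html
<!-- lang names the language of the paragraph's content -->
<p lang="fr">Ceci est un paragraphe en fran&ccedil;ais.</p>

<!-- dir="rtl" asks for right-to-left display -->
<p lang="he" dir="rtl">&#1513;&#1500;&#1493;&#1501;</p>
```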
4.1.2.4. The class, id, style, and title attributes
Use the id attribute to create a label for the
paragraph that can later be used to unambiguously reference that
paragraph in a hyperlink target, for automated searches, as a
style-sheet selector, and with a host of other applications. (See Section 4.1.1.4, "The id attribute".)
Use the optional title attribute and
quote-enclosed string value to provide a descriptive phrase for the
paragraph. (See Section 4.1.1.5, "The title attribute".)
Use the style attribute with the
<p> tag to create an inline style for the
paragraph's contents. The class attribute
lets you label the paragraph with a name that refers to a predefined
class of the <p> tag declared in some
document-level or externally defined style sheet. In addition,
class-identified paragraphs lend themselves well to computer
processing of your documents, such as extraction of all paragraphs
whose class name is "citation" for
automated assembly of a master list of citations.
(See Section 8.1.1, "Inline Styles: The style Attribute", and Section 8.3, "Style Classes".)
4.1.2.5. Event attributes
As with divisions, there are many user-initiated events, such as when a user
clicks or double-clicks within the paragraph's display space, that are recognized
by the browser if it conforms to the current HTML or XHTML standards.
With the respective "on" attribute and value, you may
react to that event by displaying a user dialog box or activating
some multimedia event. (See Section 12.3.3, "JavaScript Event Handlers".)
4.1.2.6. Allowed paragraph content
A paragraph may contain any element allowed in a text flow,
including conventional words and punctuation, links
(<a>), images
(<img>), line breaks
(<br>), font changes
(<b>, <i>,
<tt>, <u>,
<strike>, <big>,
<small>, <sup>,
<sub>, and <font>),
and content-based style changes (<acronym>,
<cite>, <code>,
<dfn>, <em>,
<kbd>, <samp>,
<strong>, and
<var>). If any other element occurs within
the paragraph, it implies that the paragraph has ended, and the
browser assumes that the closing </p> tag
was not specified.
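The following sketch illustrates the rule. The first part of the
paragraph holds only allowed content; the list that follows is not
allowed inside a paragraph, so the browser treats the paragraph as
ended just before it. The filename other.html is a placeholder.

```html
<p>
This paragraph contains <em>emphasized text</em>, a
<a href="other.html">link</a>, and a line break.<br>
All of that is allowed paragraph content.
<ul>
<li>This list cannot appear inside a paragraph, so the
    browser ends the paragraph just before it.</li>
</ul>
This text follows the list and is not part of the paragraph above.
```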
4.1.2.7. Allowed paragraph usage
You may specify a paragraph only within a block,
along with other paragraphs, lists, forms, and preformatted text. In
general, this means that paragraphs can appear where a flow of text
is appropriate, such as in the body of a document, an element in a
list, and so on. Technically, paragraphs cannot appear within a
header, anchor, or other element whose content is strictly text-only.
In practice, most browsers ignore this restriction and format the
paragraph as a part of the containing element.
Copyright © 2002 O'Reilly & Associates. All rights reserved.