NCSA -- A Beginner's Guide to HTML, Part 2

A Beginner's Guide to HTML

Part 2 contains the following sections:

Character Formatting
- Logical Versus Physical Styles
- Escape Sequences
Linking

You can return to Part 1 or link to Part 3.

Character Formatting

HTML has two types of styles for individual words or sentences: logical and physical. Logical styles tag text according to its meaning, while physical styles indicate the specific appearance of a section. For example, in the preceding sentence, the words "logical styles" were tagged as "emphasis." The same effect (formatting those words in italics) could have been achieved via a different tag that tells your browser to "put these words in italics."

Logical Versus Physical Styles

If physical and logical styles produce the same result on the screen, why are there both?

In the ideal SGML universe, content is divorced from presentation. Thus SGML tags a level-one heading as a level-one heading, but does not specify that the level-one heading should be, for instance, 24-point bold Times centered. The advantage of this approach (it's similar in concept to style sheets in many word processors) is that if you decide to change level-one headings to be 20-point left-justified Helvetica, all you have to do is change the definition of the level-one heading in your Web browser. Indeed, many browsers today let you define how you want the various HTML tags rendered on-screen using what are called cascading style sheets, or CSS. CSS is more advanced than HTML, though, and will not be covered in this Primer. (You can learn more about CSS at the World Wide Web Consortium CSS site.)

Another advantage of logical tags is that they help enforce consistency in your documents. It's easier to tag something as <H1> than to remember that level-one headings are 24-point bold Times centered or whatever. For example, consider the <STRONG> tag. Most browsers render it in bold text. However, it is possible that a reader would prefer that these sections be displayed in red instead. (This is possible using a local cascading style sheet on the reader's own computer.) Logical styles offer this flexibility.

Of course, if you want something to be displayed in italics (for example) and do not want a browser's setting to display it differently, you should use physical styles. Physical styles, therefore, offer consistency in that something you tag a certain way will always be displayed that way for readers of your document.

Try to be consistent about which type of style you use. If you tag with physical styles, do so throughout a document. If you use logical styles, stick with them within a document. Keep in mind that future releases of HTML might not support certain logical styles, which could mean that browsers will not display your logical-style coding. (For example, the <DFN> tag -- short for "definition", and typically displayed in italics -- is not widely supported and will be ignored if the reader's browser does not understand it.)

Logical Styles

<DFN>: for a word being defined. Typically displayed in italics. (NCSA Mosaic is a World Wide Web browser.)
<EM>: for emphasis. Typically displayed in italics. (Consultants cannot reset your password unless you call the help line.)
<CITE>: for titles of books, films, etc. Typically displayed in italics. (A Beginner's Guide to HTML)
<CODE>: for computer code. Displayed in a fixed-width font. (The <stdio.h> header file)
<KBD>: for user keyboard entry. Typically displayed in plain fixed-width font. (Enter passwd to change your password.)
<SAMP>: for a sequence of literal characters. Displayed in a fixed-width font. (Segmentation fault: Core dumped.)
<STRONG>: for strong emphasis. Typically displayed in bold. (NOTE: Always check your links.)
<VAR>: for a variable, where you will replace the variable with specific information. Typically displayed in italics. (rm filename deletes the file.)
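As an illustration, the logical styles above might appear in a document as follows. This is only a sketch: the sample sentences are adapted from the list, which words carry each tag is an assumption, and the exact rendering depends on the reader's browser.

```html
<!-- Logical styles: each tag describes what the text means,
     not how it should look. Rendering varies by browser. -->
<P><DFN>NCSA Mosaic</DFN> is a World Wide Web browser.</P>
<P>Consultants <EM>cannot</EM> reset your password unless you call the help line.</P>
<P>See <CITE>A Beginner's Guide to HTML</CITE> for details.</P>
<P>Include the <CODE>&lt;stdio.h&gt;</CODE> header file.</P>
<P>Enter <KBD>passwd</KBD> to change your password.</P>
<P>The program printed <SAMP>Segmentation fault: Core dumped.</SAMP></P>
<P><STRONG>NOTE:</STRONG> Always check your links.</P>
<P><CODE>rm <VAR>filename</VAR></CODE> deletes the file.</P>
```

Note that the angle brackets around stdio.h are written as escape sequences so the browser does not mistake them for a tag.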

Physical Styles

<B>: bold text
<I>: italic text
<TT>: typewriter text, i.e. fixed-width font.
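A short sketch of the physical styles in use, set next to a logical equivalent for comparison. The sample sentences are invented for illustration.

```html
<!-- Physical styles: each tag requests a specific appearance. -->
<P>This word is <B>bold</B>, this one is <I>italic</I>,
and this one is in <TT>typewriter</TT> text.</P>

<!-- The logical counterpart leaves the rendering to the browser;
     EM is typically, but not necessarily, shown in italics. -->
<P>This word is <EM>emphasized</EM>.</P>
```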

Escape Sequences (a.k.a. Character Entities)

Character entities have two functions:

escaping special characters
displaying other characters not available in the plain ASCII character set (primarily characters with diacritical marks)

Three ASCII characters--the left angle bracket (<), the right angle bracket (>), and the ampersand (&)--have special meanings in HTML and therefore cannot be used "as is" in text. (The angle brackets are used to indicate the beginning and end of HTML tags, and the ampersand is used to indicate the beginning of an escape sequence.) Double quote marks may be used as-is but a character entity may also be used (&quot;).

To use one of the three characters in an HTML document, you must enter its escape sequence instead:

&lt;: the escape sequence for <
&gt;: the escape sequence for >
&amp;: the escape sequence for &
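For instance, text that contains these characters might be coded as in the sketch below. The sample text is invented for illustration.

```html
<!-- Displays as: if (a < b && b > c) -->
<P>if (a &lt; b &amp;&amp; b &gt; c)</P>

<!-- Showing a tag name literally.
     Displays as: Use the <H1> tag for level-one headings. -->
<P>Use the &lt;H1&gt; tag for level-one headings.</P>
```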

Additional escape sequences support accented characters, such as:

&ouml;: a lowercase o with an umlaut: ö
&ntilde;: a lowercase n with a tilde: ñ
&Egrave;: an uppercase E with a grave accent: È

You can substitute other letters for the o, n, and E shown above. Visit the World Wide Web Consortium for a complete list of special characters.
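A brief sketch of accented characters in running text, including two substitutions that follow the same pattern. The sample words are chosen only for illustration.

```html
<!-- Displays as: Köln, España, È -->
<P>K&ouml;ln, Espa&ntilde;a, &Egrave;</P>

<!-- Substituting other letters in the same pattern:
     &uuml; is a lowercase u with an umlaut,
     &egrave; is a lowercase e with a grave accent.
     Displays as: München, crème -->
<P>M&uuml;nchen, cr&egrave;me</P>
```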

NOTE: Unlike the rest of HTML, the escape sequences are case sensitive. You cannot, for instance, use &LT; instead of &lt;.

Linking

The chief power of HTML comes from its ability to link text and/or an image to another document or section of a document. A browser highlights the identified text or image with color and/or underlines to indicate that it is a hypertext link (often shortened to hyperlink or just link).

HTML's single hypertext-related tag is <A>, which stands for anchor. To include an anchor in your document:

start the anchor with <A (include a space after the A)
specify the document you're linking to by entering the parameter HREF="filename" followed by a closing right angle bracket (>)
enter the text that will serve as the hypertext link in the current document
enter the ending anchor tag: </A> (no space is needed before the end anchor tag)

Here is a sample hypertext reference in a file called US.html:

    <A HREF="MaineStats.html">Maine</A>

This entry makes the word Maine the hyperlink to the document MaineStats.html, which is in the same directory as the first document.

Relative Pathnames Versus Absolute Pathnames

You can link to documents in other directories by specifying the relative path from the current document to the linked document. For example, a link to a file NYStats.html located in the subdirectory AtlanticStates would be:

    <A HREF="AtlanticStates/NYStats.html">New York</A>

These are called relative links because you are specifying the path to the linked file relative to the location of the current file. You can also use the absolute pathname (the complete URL) of the file, but relative links are more efficient in accessing a server. They also have the advantage of making your documents more "portable" -- for instance, you can create several web pages in a single folder on your local computer, using relative links to hyperlink one page to another, and then upload the entire folder of web pages to your web server. The pages on the server will then link to other pages on the server, and the copies on your hard drive will still point to the other pages stored there.

It is important to point out that UNIX is a case-sensitive operating system where filenames are concerned, while DOS and the MacOS are not. For instance, on a Macintosh, "DOCUMENT.HTML", "Document.HTML", and "document.html" are all the same name. If you make a relative hyperlink to "DOCUMENT.HTML", and the file is actually named "document.html", the link will still be valid. But if you upload all your pages to a UNIX web server, the link will no longer work. Be sure to check your filenames before uploading.

Pathnames use the standard UNIX syntax. The UNIX syntax for the parent directory (the directory that contains the current directory) is "..". (For more information consult a beginning UNIX reference text such as Learning the UNIX Operating System from O'Reilly and Associates, Inc.)

If you were in the NYStats.html file and were referring to the original document US.html, your link would look like this:

    <A HREF="../US.html">United States</A>

In general, you should use relative links whenever possible because:

it's easier to move a group of documents to another location (because the relative path names will still be valid)
it's more efficient connecting to the server
there is less to type

However, use absolute pathnames when linking to documents that are not directly related. For example, consider a group of documents that comprise a user manual. Links within this group should be relative links. Links to other documents (perhaps a reference to related software) should use absolute pathnames instead. This way if you move the user manual to a different directory, none of the links would have to be updated.
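The user-manual guideline might look like this in practice. The filenames and the example.com address are hypothetical, chosen only for illustration.

```html
<!-- In manual/intro.html: a relative link to another page
     of the same manual (hypothetical filename). -->
<A HREF="install.html">Installation</A>

<!-- A relative link up to the parent directory. -->
<A HREF="../index.html">Documentation index</A>

<!-- A link to an unrelated document elsewhere:
     use the complete URL (hypothetical address). -->
<A HREF="http://www.example.com/software/readme.html">Related software</A>
```

If the manual's directory is moved as a unit, the first link keeps working because install.html moves with it, and the third keeps working because it never depended on the manual's location.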

URLs

The World Wide Web uses Uniform Resource Locators (URLs) to specify the location of files on other servers. A URL includes the type of resource being accessed (e.g., Web, gopher, FTP), the address of the server, and the location of the file. The syntax is:

    scheme://host.domain[:port]/path/filename

where scheme is one of

file: a file on your local system
ftp: a file on an anonymous FTP server
http: a file on a World Wide Web server
gopher: a file on a Gopher server
WAIS: a file on a WAIS server
news: a Usenet newsgroup
telnet: a connection to a Telnet-based service

The port number can generally be omitted. (That means unless someone tells you otherwise, leave it out.)
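As a sketch of how the parts of the syntax fit together, here are three links that differ only in scheme and port. The hosts, port number, and paths are hypothetical (example.com is a domain reserved for examples).

```html
<!-- http scheme, port omitted (the usual case) -->
<A HREF="http://www.example.com/docs/guide.html">Guide</A>

<!-- http scheme with an explicit port; include one only if told to -->
<A HREF="http://www.example.com:8080/docs/guide.html">Guide (port 8080)</A>

<!-- ftp scheme: a file on an anonymous FTP server -->
<A HREF="ftp://ftp.example.com/pub/guide.txt">Guide by FTP</A>
```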

For example, to include a link to this primer in your document, enter:

<A HREF="http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html">
NCSA's Beginner's Guide to HTML</A>

This entry makes the text NCSA's Beginner's Guide to HTML a hyperlink to this document.

There is also a mailto scheme, used to hyperlink email addresses, but this scheme is unique in that it uses only a colon (:) instead of :// between the scheme and the address. You can read more about mailto below.

Links to Specific Sections

Anchors can also be used to move a reader to a particular section in a document (either the same or a different document) rather than to the top, which is the default. This type of anchor is commonly called a named anchor because, to create the links, you insert HTML names within the document.

This guide is a good example of using named anchors in one document. The guide is constructed as one document to make printing easier. But as one (long) document, it can be time-consuming to move through when all you really want to know about is one bit of information about HTML. Internal hyperlinks are used to create a "table of contents" at the top of this document. These hyperlinks move you from one location in the document to another location in the same document. (Go to the top of this document and then click on the Links to Specific Sections hyperlink in the table of contents. You will wind up back here.)

You can also link to a specific section in another document. That case is presented first because understanding it makes linking within a single document easier to follow.

Links Between Sections of Different Documents

Suppose you want to set a link from document A (documentA.html) to a specific section in another document (MaineStats.html).

Enter the HTML coding for a link to a named anchor:

     documentA.html:
    
     In addition to the many state parks, Maine is also home to 
     <a href="MaineStats.html#ANP">Acadia National Park</a>.

Think of the characters after the hash (#) mark as a tab within the MaineStats.html file. This tab tells your browser what should be displayed at the top of the window when the link is activated. In other words, the first line in your browser window should be the Acadia National Park heading.

Next, create the named anchor (in this example "ANP") in MaineStats.html:

  <H2><A NAME="ANP">Acadia National Park</a></H2>

With both of these elements in place, you can bring a reader directly to the Acadia reference in MaineStats.html.

NOTE: You cannot make links to specific sections within a different document unless either you have write permission to the coded source of that document or that document already contains in-document named anchors. For example, you could include named anchors to this primer in a document you are writing because there are named anchors in this guide (use View Source in your browser to see the coding). But if this document did not have named anchors, you could not make a link to a specific section because you cannot edit the original file on NCSA's server.

Links to Specific Sections within the Current Document

The technique is the same except the filename is omitted.

For example, to link to the ANP anchor from within MaineStats, enter:

  ...More information about
  <A HREF="#ANP">Acadia National Park</a>
  is available elsewhere in this document.

Be sure to include the <A NAME=> tag at the place in your document where you want the link to jump to (<A NAME="ANP">Acadia National Park</a>).

Named anchors are particularly useful when you think readers will print a document in its entirety or when you have a lot of short information you want to place online in one file.
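A single-file document with its own table of contents might be sketched as follows. It builds on the ANP example above; the second anchor name (BSP) and the placeholder text are hypothetical.

```html
<!-- Table of contents at the top of MaineStats.html.
     Each HREF is "#" plus the NAME of an anchor in this file. -->
<UL>
<LI><A HREF="#ANP">Acadia National Park</A>
<LI><A HREF="#BSP">Baxter State Park</A>
</UL>

<!-- ... other content ... -->

<!-- The named anchors the links above jump to. -->
<H2><A NAME="ANP">Acadia National Park</A></H2>
<P>Text about Acadia National Park.</P>

<H2><A NAME="BSP">Baxter State Park</A></H2>
<P>Text about Baxter State Park.</P>
```

Each anchor name should be used only once per file so that every link has a single, unambiguous destination.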

Mailto

You can make it easy for a reader to send electronic mail to a specific person or mail alias by using the mailto scheme in a hyperlink's HREF. The format is:

<A HREF="mailto:emailinfo@host">Name</a>

For example, enter:

 <A HREF="mailto:pubs@ncsa.uiuc.edu">
 NCSA Publications Group</a>

to create a link that opens a mail window already addressed to the NCSA Publications Group alias. (You, of course, will enter another mail address!)
