The CTDP XML Tutorial Version 0.2.1, September 21 2001

XML Tutorial Introduction

Markup Language

If you have some familiarity with HTML, you already have some concept of what a markup language is. A plain text file is composed of simple characters (ASCII, for example) and nothing more. When a program such as Notepad displays the file, every character appears in the same font, size, and weight. This type of file has no special display characteristics.

Markup languages, such as HTML or XML, allow special markup to be embedded in the text. The program that displays the file uses this markup to determine how to display the text. In this way, special text such as headers may be centered or given a larger, bolder font, and specific display colors may be set. Additional elements, such as bulleted or numbered lists and tables, may also be added to the file.
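As a rough illustration of markup embedded in text, the sketch below uses Python's standard xml.etree.ElementTree module to read a small marked-up document and list each element with its text. The element names (note, heading, list, item) are invented for this example and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are invented for illustration.
MARKED_UP = """<note>
  <heading>Shopping</heading>
  <list>
    <item>Milk</item>
    <item>Bread</item>
  </list>
</note>"""

def describe(xml_text):
    """Return (element name, stripped text) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]

pairs = describe(MARKED_UP)
for tag, text in pairs:
    print(tag, repr(text))
```

A display program could use the element names recovered here to decide, for example, that a heading is shown larger than an item.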

Specifying Display Style

Markup languages use elements to set one area of content apart from other content. The display characteristics of these elements (such as color, size, and font type) may be determined within the markup file itself or outside the file using a style sheet. Normally, each element has a predetermined (default) set of display characteristics, which may be modified locally or with style sheets. Authors are encouraged to separate the determination of display characteristics (style) from the markup file. This separation is not required, but it makes the management of display style much easier.
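The layering of default, style sheet, and local settings can be sketched in a few lines of Python. This is only a model of the idea, not how any real browser resolves style: the property names, the element names, and the order of precedence (local settings win over the style sheet, which wins over the defaults) are all assumptions made for this example.

```python
# Hypothetical element names and style properties, for illustration only.
DEFAULT_STYLE = {
    "heading": {"size": 12, "bold": False, "color": "black"},
    "item": {"size": 12, "bold": False, "color": "black"},
}

# Stands in for an external style sheet kept apart from the markup file.
STYLE_SHEET = {
    "heading": {"size": 18, "bold": True},
}

def resolve_style(tag, local=None):
    """Start from the defaults, then apply the style sheet, then local settings."""
    style = dict(DEFAULT_STYLE.get(tag, {}))
    style.update(STYLE_SHEET.get(tag, {}))
    style.update(local or {})
    return style

print(resolve_style("heading"))
print(resolve_style("heading", {"color": "red"}))
```

Because the style sheet is a separate table, changing the look of every heading means editing one entry rather than every document.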

DTD

Markup languages normally require a Document Type Definition (DTD), which defines the elements that are allowed in the document. The DTD also defines how these elements may be used in relation to each other: which elements, and how many of them, may be included inside another element. The DTD is a text file written in a specific format, and that format comes from the Standard Generalized Markup Language (SGML). SGML is the parent of both HTML and XML. Although XML may use a DTD, one is not required for documents that are considered "well-formed". A well-formed document follows a set of syntax rules for XML; this subject is addressed in more detail later.
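As a minimal sketch of what "well-formed" means in practice, the Python below asks the standard xml.etree.ElementTree parser to read three short documents, none of which has a DTD. The parser checks only the XML syntax rules, such as matching and properly nested tags. The helper name is_well_formed and the sample documents are invented for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = "<note><item>Milk</item></note>"
bad_nesting = "<note><item>Milk</note></item>"   # tags overlap instead of nesting
missing_end = "<note><item>Milk</note>"          # item is never closed

for sample in (good, bad_nesting, missing_end):
    print(is_well_formed(sample), sample)
```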

The DTD also defines other characteristics of each element, such as its possible attributes and, in SGML-based languages such as HTML, whether its beginning or ending tag is required. (XML itself does not allow tags to be omitted.)
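To give a feel for what a DTD looks like, the sketch below embeds a small one inside a document and reads the document with Python's standard xml.etree.ElementTree module. The element and attribute names are invented for this example. Note that this parser does not validate: it reads the declarations but does not check the document against them, so a validating parser would be needed to enforce the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD and document. The DTD says: a note holds one heading
# followed by one or more items, and every item must carry a quantity attribute.
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (heading, item+)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item quantity CDATA #REQUIRED>
]>
<note>
  <heading>Shopping</heading>
  <item quantity="2">Milk</item>
  <item quantity="1">Bread</item>
</note>"""

root = ET.fromstring(DOCUMENT)

# Names of the elements directly inside note, in document order.
children = [child.tag for child in root]

# Map each item's text to the value of its quantity attribute.
quantities = {item.text: item.get("quantity") for item in root.findall("item")}

print(children)
print(quantities)
```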

XML Definition

XML stands for Extensible Markup Language. XML was developed around 1996 and is a subset of SGML, so its documents conform to SGML. XML was made less complicated than SGML to enable its use on the web. XML uses the ISO 10646 (Unicode) standard for encoding characters.
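One consequence of the Unicode character model can be sketched as follows: the same character may reach the parser as UTF-8 bytes, as bytes in another declared encoding, or as a numeric character reference, and in each case it resolves to the same Unicode character. The sample word and variable names are chosen only for this example, which uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# The letter e with an acute accent is Unicode character U+00E9.
# Stored as UTF-8 it takes two bytes (C3 A9).
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><word>caf\xc3\xa9</word>'

# Stored as ISO-8859-1 it takes one byte (E9).
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'

# Written as a numeric character reference it needs only ASCII characters.
reference_doc = "<word>caf&#xE9;</word>"

words = [ET.fromstring(doc).text for doc in (utf8_doc, latin1_doc, reference_doc)]
print(words)
```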

Previous Knowledge

Although it is not required, the reader is strongly encouraged to learn HTML and how to read and write DTDs before reading this document, by reading the HTML Guide and DTD Reference in the appropriate sections of this website. I believe this document will be difficult to understand without prior knowledge of DTDs and HTML, but those who wish to proceed without it may attempt to do so. The HTML Guide also contains information about Cascading Style Sheets (CSS), along with a list of CSS properties and the types of elements they apply to.