Character Data

Within an HTML document, any characters between the HTML elements represent text. An HTML document (including elements and text) is encoded using the character set named by the "charset" parameter of the "text/html" MIME type. This parameter is restricted to "US-ASCII" or "ISO-8859-1". ISO-8859-1 encodes the set of characters known as Latin Alphabet No 1 (commonly abbreviated to Latin-1), which covers the characters of most Western European languages. It also covers a number of control characters, a non-breaking space, a soft hyphen indicator, 93 graphical characters and 8 unassigned characters.
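As an illustration of the two permitted character sets, the following minimal Python sketch checks whether a piece of text can be encoded in a given charset. The helper name fits_charset is hypothetical and is not part of HTML or of any library; the codec names are those understood by Python's standard library.

```python
def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


sample = "café"
print(fits_charset(sample, "us-ascii"))    # False: e-acute is outside US-ASCII
print(fits_charset(sample, "iso-8859-1"))  # True: e-acute is position 233 in Latin-1
print(sample.encode("iso-8859-1"))         # b'caf\xe9'
```

Each Latin-1 character occupies a single byte whose value is the character's numerical position, which is the position used by numerical references.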

The non-breaking space and soft hyphen indicator characters are not recognised and interpreted by all browsers, so their use is discouraged.

There are 58 character positions occupied by control characters. See Control Characters for details on the interpretation of control characters.

Because certain special characters are subject to interpretation and special processing, information providers and HTML user agent implementors should follow the guidelines in the Special Characters section.

In addition, HTML provides character entity references to facilitate the entry and interpretation of characters by name and by numerical position.
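For example, the character e-acute can be written by name as &eacute; or by numerical position as &#233;. The sketch below uses Python's standard html module to show that both forms resolve to the same character. Note that this module follows current HTML rules, which recognise more names than the Latin-1 set described here, so it is only an approximation of how a user agent of this era would behave.

```python
import html

# The same character entered by name and by numerical position.
by_name = html.unescape("caf&eacute;")
by_number = html.unescape("caf&#233;")

print(by_name)               # café
print(by_name == by_number)  # True
```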

Because certain characters would otherwise be interpreted as markup, they must be represented by entity references when they are meant as text, as described under character and numerical references.
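A minimal sketch of this substitution, assuming the characters in question are the less-than sign, the greater-than sign and the ampersand. The wrapper name escape_text is hypothetical; html.escape is part of Python's standard library, and quote=False leaves quotation marks untouched, which is sufficient for text outside attribute values.

```python
import html


def escape_text(text: str) -> str:
    """Replace &, < and > with entity references so they are not read as markup."""
    return html.escape(text, quote=False)


print(escape_text("if a < b & c > d"))  # if a &lt; b &amp; c &gt; d
```

The ampersand is replaced as well, since it would otherwise be read as the start of an entity reference.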