Digital Library Committee
Joint Committee on Metadata (BAER/DLC)

TEI Summary


Usage
TEI, named for the Text Encoding Initiative, is used to mark up and describe full-text materials. A TEI document generally has two main sections: the TEI header, which contains descriptive and administrative metadata, and a text section, which contains the text itself and is divided into front matter, body, and back matter. Within these two major divisions there is a wide variety of tags, very few of which are required.
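The two-part layout described above can be sketched as a small sample document. This is only an illustration: the sample content is invented, the markup is written as well-formed XML so that Python's standard library can read it, and it has not been validated against the TEI DTD. The helper name `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample content. <TEI.2> is the root element name used by TEI P3;
# everything else about this document is an assumption made for illustration.
SAMPLE_TEI = """\
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text: an electronic transcription</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a hypothetical printed edition.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <titlePage><docTitle><titlePart>Sample text</titlePart></docTitle></titlePage>
    </front>
    <body>
      <p>The bulk of the text goes here.</p>
    </body>
    <back>
      <div1 type="appendix"><p>Appendices and indexes go here.</p></div1>
    </back>
  </text>
</TEI.2>
"""


def outline(xml_source):
    """Map each top-level section to the names of its direct children."""
    root = ET.fromstring(xml_source)
    return {section.tag: [child.tag for child in section] for section in root}


if __name__ == "__main__":
    for section, children in outline(SAMPLE_TEI).items():
        print(section, "->", ", ".join(children))
```

Running the script lists the header and the text section, with the text section split into front, body, and back.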

Creator
The TEI consortium has been active in one form or another since 1989. TEI has an elected board of directors and many working groups. Its membership includes universities and colleges, research libraries, library consortia, academic and other non-profit publishers, as well as associations, scholarly societies, and a range of commercial entities concerned with the design, production, or delivery of structured electronic text. Northwestern joined the TEI consortium this year.

Revisions
TEI is in its third version (P3), issued in May 1999. Version P1 was issued in November 1990 and version P2 in April 1993. There is also a TEI Lite version, which is compatible with version P3 of the full TEI.

Ease of use
TEI is very flexible, with many tags that may be reused throughout the text and nested within other tags. Diving into the full TEI tag set is fairly daunting, but using TEI Lite as a starting point simplifies the task considerably. TEI is also divided into tag subsets for particular purposes (drama, verse, dictionaries, etc.), as defined in Part III: Base Tag Sets (http://www.tei-c.org/Guidelines/ST.htm#STBABA). The DLF sponsored a workshop in 1998 that defined 5 levels of TEI Lite encoding to further augment the guidelines (http://www.diglib.org/standards/tei.htm).

Documentation
The official TEI site (http://www.tei-c.org/) includes the TEI Guidelines, which provide an extensive description of the tag set and its sections, its application to various types of texts, and an introduction to SGML. Guides to local practice have been contributed and are listed on the TEI site: http://www.tei-c.org/Tutorials/index.html

Thesauri
There are no specific thesauri recommended by the TEI guidelines.

Projects
An extensive list of TEI encoding projects appears on the TEI consortium page: http://www.tei-c.org/Applications/index.html

Granularity
TEI may be used to mark up single works or text corpora. Within the body of an individual work, text divisions up to 7 layers deep are permissible using the TEI numbered <div> tags (<div0>, <div1>, <div2>, etc.). These divisions might correspond to volumes, chapters, acts, scenes, or other textual divisions, depending on rules established through practice. Smaller units for identifying paragraphs and lines of text are also included in the standard. As a rule, the TEI Header describes the text that follows it, so the descriptive and other metadata must apply either to the single work or to the corpus as a whole.
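As a rough illustration of nested numbered divisions, the sketch below walks a small invented body and reports each division with its depth. The `type` values ("volume", "chapter") follow the examples in the paragraph above; the sample text, the XML serialization, and the helper name `division_outline` are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: a volume (<div1>) holding two chapters (<div2>), with
# paragraphs (<p>) and a verse line (<l>) as the smaller units.
SAMPLE_BODY = """\
<body>
  <div1 type="volume" n="1">
    <div2 type="chapter" n="1">
      <p>A paragraph of prose.</p>
      <l>A single line of verse.</l>
    </div2>
    <div2 type="chapter" n="2">
      <p>Another paragraph.</p>
    </div2>
  </div1>
</body>
"""

# Matches the numbered division tags: div0, div1, div2, and so on.
NUMBERED_DIV = re.compile(r"div\d")


def division_outline(element, depth=0):
    """Return (depth, tag, type, n) for every numbered division under element."""
    rows = []
    for child in element:
        if NUMBERED_DIV.fullmatch(child.tag):
            rows.append((depth, child.tag, child.get("type"), child.get("n")))
            rows.extend(division_outline(child, depth + 1))
    return rows


if __name__ == "__main__":
    for depth, tag, kind, number in division_outline(ET.fromstring(SAMPLE_BODY)):
        print("  " * depth + f"<{tag}> {kind} {number}")
```

Paragraphs and lines are skipped by the walker on purpose, so the output shows only the division hierarchy.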

Data for original and surrogate
From the TEI guidelines: "The Guidelines have been written largely with a focus on text capture (i.e. the representation in electronic form of an already existing copy text in another medium) rather than text creation (where no such copy text exists). Hence the frequent use of terms like `transcription', `original', `copy text', etc. However, the Guidelines should be equally applicable to text creation, and the two terms text creation and text capture are often used interchangeably." Many tags in the TEI Header apply to both the print and electronic versions (e.g. title statement, edition statement), but some apply primarily to the electronic version (e.g. publication statement, extent) and some to the original print (e.g. source). There is a slight emphasis on the elements that describe the electronic version.
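The split between elements that describe the electronic version and those that describe the print original can be sketched against a sample file description. The classification in `DESCRIBES` simply restates the paragraph above; the sample values, the XML serialization, and the helper name `describe` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample values inside a <fileDesc>.
SAMPLE_FILEDESC = """\
<fileDesc>
  <titleStmt><title>Sample text: an electronic transcription</title></titleStmt>
  <editionStmt><edition>First electronic edition</edition></editionStmt>
  <extent>About 40 kilobytes</extent>
  <publicationStmt><p>Distributed by a hypothetical library.</p></publicationStmt>
  <sourceDesc><p>Transcribed from a hypothetical printed edition.</p></sourceDesc>
</fileDesc>
"""

# Which version each element mainly describes, as stated in the text above.
DESCRIBES = {
    "titleStmt": "print and electronic",
    "editionStmt": "print and electronic",
    "extent": "electronic",
    "publicationStmt": "electronic",
    "sourceDesc": "original print",
}


def describe(xml_source):
    """Pair each child of <fileDesc> with the version it mainly describes."""
    root = ET.fromstring(xml_source)
    return [(child.tag, DESCRIBES.get(child.tag, "unclassified")) for child in root]


if __name__ == "__main__":
    for tag, target in describe(SAMPLE_FILEDESC):
        print(f"<{tag}>: {target}")
```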

Metadata types
The TEI Header is broken into four major sections:
<fileDesc> contains a full bibliographic description of an electronic file. DESCRIPTIVE
<encodingDesc> documents the relationship between an electronic text and the source or sources from which it was derived. ADMINISTRATIVE AND TECHNICAL
<profileDesc> provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. ADMINISTRATIVE AND DESCRIPTIVE
<revisionDesc> summarizes the revision history for a file. ADMINISTRATIVE
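The four header sections listed above can be checked for in a sample header with a few lines of Python. This is a sketch only: the sample header is invented, it is written as well-formed XML rather than validated against the TEI DTD, and the helper name `header_sections` is hypothetical. The sample deliberately leaves out <revisionDesc> to show how a missing section is reported.

```python
import xml.etree.ElementTree as ET

# The four major sections of the TEI Header, in the order listed above.
HEADER_SECTIONS = ("fileDesc", "encodingDesc", "profileDesc", "revisionDesc")

# Invented sample header with no <revisionDesc>.
SAMPLE_HEADER = """\
<teiHeader>
  <fileDesc>
    <titleStmt><title>Sample text: an electronic transcription</title></titleStmt>
    <publicationStmt><p>Unpublished sample.</p></publicationStmt>
    <sourceDesc><p>Transcribed from a hypothetical printed edition.</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <projectDesc><p>Encoded as a teaching example.</p></projectDesc>
  </encodingDesc>
  <profileDesc>
    <langUsage><language id="en">English</language></langUsage>
  </profileDesc>
</teiHeader>
"""


def header_sections(xml_source):
    """Report which of the four major header sections are present."""
    header = ET.fromstring(xml_source)
    present = {child.tag for child in header}
    return {name: name in present for name in HEADER_SECTIONS}


if __name__ == "__main__":
    for name, found in header_sections(SAMPLE_HEADER).items():
        print(f"<{name}>: {'present' if found else 'absent'}")
```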
The body of the TEI document contains tags that define the structure of the document itself.