TEI Summary
Usage
TEI, named for the Text Encoding Initiative, is used to mark up and describe
full text materials. A TEI document generally has two main sections: the
TEI header, which contains descriptive and administrative metadata, and
a text section, which contains the bulk of the text itself and is divided
into front matter, body, and back matter. Within these two major divisions,
there is a wide variety of tags, very few of which are required.
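The two-part layout described above can be sketched in a few lines of Python. This is only an illustration: the <TEI.2> root element follows the P3-era convention, the title and paragraph text are invented, and the sample has not been validated against any TEI DTD.

```python
import xml.etree.ElementTree as ET

# Illustrative skeleton only: titles and paragraph text are made up,
# and the document is not validated against a TEI DTD.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample text</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>No print source.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front><p>Preface.</p></front>
    <body><p>Main text.</p></body>
    <back><p>Appendix.</p></back>
  </text>
</TEI.2>"""


def outline(xml_source):
    """Map each top-level section to the names of its child elements."""
    root = ET.fromstring(xml_source)
    return {child.tag: [grandchild.tag for grandchild in child] for child in root}


sections = outline(SAMPLE)
print(sections)
# {'teiHeader': ['fileDesc'], 'text': ['front', 'body', 'back']}
```

The outline shows the header and the text as siblings under one root, with front matter, body, and back matter nested inside the text.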
Creator
The TEI consortium has been active in one form or another since 1989.
TEI has an elected board of directors and many working groups. Its membership
includes universities and colleges, research libraries, library consortia, academic
and other non-profit publishers, as well as associations, scholarly societies,
and a range of commercial entities concerned with the design, production
or delivery of structured electronic text. Northwestern joined the TEI
consortium this year.
Revisions
TEI is in its third version (P3), issued in May 1999. Version P1 was issued
in November of 1990, and version P2 in April 1993. There is also a TEI
Lite version, which is compatible with version P3 of the full TEI.
Ease of use
TEI is very flexible, with many tags that may be reused throughout the
text and nested within other tags. Diving into the full TEI tag set is
fairly daunting, but using TEI Lite as a starting point simplifies the
task considerably. TEI is also divided into tag subsets for particular
purposes (drama, verse, dictionaries, etc.), as defined in Part III: Base
Tag Sets, http://www.tei-c.org/Guidelines/ST.htm#STBABA. The DLF sponsored
a workshop in 1998 that defined five levels of TEI Lite encoding to further
augment the guidelines: http://www.diglib.org/standards/tei.htm
Documentation
The official TEI site, http://www.tei-c.org/, includes the TEI Guidelines:
an extensive description of the tag set, its sections, and its application
to various types of texts, along with an introduction to SGML. Guides to
local practice have been contributed and are documented on the TEI site:
http://www.tei-c.org/Tutorials/index.html
Thesauri
There are no specific thesauri recommended by the TEI guidelines.
Projects
An extensive list of TEI encoding projects appears on the TEI consortium
page: http://www.tei-c.org/Applications/index.html
Granularity
TEI may be used to mark up single works or text corpora. Within the body
of individual works, text divisions up to seven levels deep are permissible
using the TEI numbered <div> tags (<div0>, <div1>, <div2>,
etc.). These divisions might correspond to volumes, chapters, acts, scenes,
or other textual divisions depending on rules established through practice.
Smaller divisions for identifying paragraphs and lines of text are also
included in the standard. The TEI Header as a rule describes the text
that follows it, so the descriptive and other metadata must correspond
to either the single work or the body of work as a whole.
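As a rough sketch of how the numbered divisions nest, the following Python walks a small sample body and records the level and type of each numbered <div>. The act/scene structure and the paragraph text are invented for illustration; the function name list_divisions is likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: one act containing two scenes.
SAMPLE_BODY = """<body>
  <div1 type="act">
    <div2 type="scene">
      <p>First scene.</p>
    </div2>
    <div2 type="scene">
      <p>Second scene.</p>
    </div2>
  </div1>
</body>"""

# Matches numbered division tags such as div0, div1, div2.
NUMBERED_DIV = re.compile(r"div(\d)$")


def list_divisions(element, found=None):
    """Collect (level, type) for each numbered div, in document order."""
    if found is None:
        found = []
    for child in element:
        match = NUMBERED_DIV.match(child.tag)
        if match:
            found.append((int(match.group(1)), child.get("type")))
        list_divisions(child, found)
    return found


divisions = list_divisions(ET.fromstring(SAMPLE_BODY))
print(divisions)
# [(1, 'act'), (2, 'scene'), (2, 'scene')]
```

The number in the tag name carries the nesting level, while the type attribute records what kind of textual division a local practice has assigned to that level.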
Data for original and surrogate
From the TEI guidelines: "The Guidelines have been written largely
with a focus on text capture (i.e. the representation in electronic form
of an already existing copy text in another medium) rather than
text creation (where no such copy text exists). Hence the frequent use
of terms like `transcription', `original', `copy text', etc. However,
the Guidelines should be equally applicable to text creation, and the
two terms text creation and text capture are often used interchangeably."
Many tags in the TEI Header apply to both the print and electronic versions
(e.g. title statement, edition statement), but some apply primarily to
the electronic (e.g. publication statement, extent) and some to the original
print (e.g. source). The header places slightly more emphasis on the elements
that describe the electronic version.
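The split between original and surrogate can be made concrete with a small lookup table. The sketch below assumes the statements named above correspond to the header elements titleStmt, editionStmt, extent, publicationStmt, and sourceDesc; the scope labels restate the characterization in this section rather than anything normative in the Guidelines, and the sample content is invented.

```python
import xml.etree.ElementTree as ET

# Scope labels follow the characterization in the text above;
# they are a reading aid, not part of the TEI Guidelines.
SCOPE = {
    "titleStmt": "both",
    "editionStmt": "both",
    "extent": "electronic",
    "publicationStmt": "electronic",
    "sourceDesc": "original",
}

# Invented sample content, not validated against a TEI DTD.
FILE_DESC = """<fileDesc>
  <titleStmt><title>Sample title</title></titleStmt>
  <editionStmt><p>First electronic edition.</p></editionStmt>
  <extent>ca. 20 kilobytes</extent>
  <publicationStmt><p>Sample publisher.</p></publicationStmt>
  <sourceDesc><p>Transcribed from a printed copy.</p></sourceDesc>
</fileDesc>"""


def describe_scope(xml_source):
    """Pair each child element with the version it mainly describes."""
    root = ET.fromstring(xml_source)
    return [(child.tag, SCOPE.get(child.tag, "unclassified")) for child in root]


scopes = describe_scope(FILE_DESC)
for tag, scope in scopes:
    print(f"{tag}: {scope}")
```

In this sample only sourceDesc describes the print original alone, which matches the slight emphasis on the electronic version noted above.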
Metadata types
The TEI Header is broken into four major sections:
<fileDesc> contains a full bibliographic description of an electronic
file. (DESCRIPTIVE)
<encodingDesc> documents the relationship between an electronic
text and the source or sources from which it was derived. (ADMINISTRATIVE
AND TECHNICAL)
<profileDesc> provides a detailed description of non-bibliographic
aspects of a text, specifically the languages and sublanguages used, the
situation in which it was produced, the participants and their setting.
(ADMINISTRATIVE AND DESCRIPTIVE)
<revisionDesc> summarizes the revision history for a file. (ADMINISTRATIVE)
The body of the TEI document contains tags which clearly define the structure
of the document itself.
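A minimal sketch of how the four header sections and their metadata types might be checked in practice is shown below. The sample header is invented and deliberately omits <profileDesc> and <revisionDesc>; the name sections_present is hypothetical, and the metadata-type labels are the ones assigned in this summary.

```python
import xml.etree.ElementTree as ET

# Metadata types as assigned in this summary.
HEADER_SECTIONS = {
    "fileDesc": "descriptive",
    "encodingDesc": "administrative and technical",
    "profileDesc": "administrative and descriptive",
    "revisionDesc": "administrative",
}

# Invented sample header with only two of the four sections.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt><title>Sample title</title></titleStmt>
    <publicationStmt><p>Sample publisher.</p></publicationStmt>
    <sourceDesc><p>Sample source.</p></sourceDesc>
  </fileDesc>
  <encodingDesc><p>Sample note on encoding practice.</p></encodingDesc>
</teiHeader>"""


def sections_present(xml_source):
    """Report which of the four header sections occur in a header."""
    root = ET.fromstring(xml_source)
    return {name: root.find(name) is not None for name in HEADER_SECTIONS}


present = sections_present(HEADER)
for name, found in present.items():
    status = "present" if found else "absent"
    print(f"<{name}> ({HEADER_SECTIONS[name]}): {status}")
```

A report like this gives a quick view of which kinds of metadata a given header actually carries.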