Diffstat (limited to 'tagset.txt')
-rw-r--r-- tagset.txt 2057
1 files changed, 1056 insertions, 1001 deletions
diff --git a/tagset.txt b/tagset.txt
index 9a7a501..0093d42 100644
--- a/tagset.txt
+++ b/tagset.txt
@@ -1,131 +1,163 @@
- FIELD MARKS FOR WEBSTER 1913 and CIDE
- =====================================
- Explanations of the tags used to mark the Webster 1913 dictionary
-and the CIDE (Collaborative International Dictionary of English).
-Note that the list of tags used to mark the public domain version
-of this dictionary is shorter than the full set described here.
- If any tag is not listed here, it is either (1) one of the
-"point" (font size) or "type" (font style) tags, which should be
-self-explanatory; or (2) is a functional field with no effect on the
-typography.
+FIELD MARKS FOR WEBSTER 1913 and CIDE
+=====================================
+
+* Overview
+
+This file describes the tags used to mark the Webster 1913 dictionary and
+the GCIDE (GNU Collaborative International Dictionary of English).
+
+If any tag is not listed here, it is either (1) one of the "point" (font
+size) or "type" (font style) tags, which should be self-explanatory; or (2)
+is a functional field with no effect on the typography.
Last modified March 12, 1999.
For questions, contact:
Patrick Cassidy cassidy@micra.com
735 Belvidere Ave.
Plainfield, NJ 07062
(908) 561-3416 or (908) 668-5252
--------------------------------------------------------------
-A separate file, webfont.txt, contains the list of the individual
+
+A separate file, webfont.txt, contains the list of the individual
non-ASCII characters represented by either higher-order hexadecimal
-character marks (e.g., \'94, for o-umlaut) or by entity tags
-(e.g., <root/, for the square root symbol.)
---------------------------------------------------------------
- Use of tags:
- In the MICRA electronic version of the 1913 Webster, each part of
-the entry headed by an entry word ("headword") is labeled so that no
-part of the entry except some punctuation marks should be found
-outside of all fields, i.e. every character should be within some tagged
-field. In the following description, the word "segment" usually refers to
-a major part of an entry such as an etymology or a definition or a
-collocation segment or a usage block, containing more than one field.
-The term "field" may also be used similarly to "segment", but may also
-denote single-word fields, such as an alternative spelling, labeled <asp>.
-
- Note: The tags on this list are similar in structure to SGML tags. Each
-tag on this list marks a field; each field opens with a tagname between
-angle brackets thus: <tagname>, and closes with a similar tag containing
-the forward slash thus: </tagname>. No tags are used without closing
-tags. Thus the HTML <BR> to indicate a line break is symbolized
-here as an entity, <br/, and every <p> has a corresponding </p>.
- The absence of an end-field tag, or the presence of an end-field tag
-without a prior begin-field tag constitutes a typographical error, of which
-there may be a significant number. Any errors detected should be brought
-to the attention of PJC or the appropriate editor.
- Most of the tagged fields are presented in the text in italic type,
-with a number of exceptions. Where a word is contained within more than
-one field, the innermost field determines the font to be used. Wherever
-recognizable functional fields were found, an attempt was made to tag the
-field with a functional mark, but in many cases, words were italicised only
-to represent the word itself as a discourse entity, and in some such cases,
-the "italic" mark <it> was used, implying nothing regarding functionality
-of the word. The base font is considered "plain". Where an italic field
-is indicated, parentheses or brackets within the field are not italicised.
- Where no font is specified for a tag, the tag is merely a functional
+character marks (e.g., \'94, for o-umlaut) or by entity tags (e.g.,
+<root/, for the square root symbol.)
+
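The hexadecimal character marks can be expanded mechanically. The following is a minimal sketch, not part of the dictionary tooling. The names HEX_MARKS and decode_hex_marks are hypothetical, and the table holds only the three codes this file itself mentions (\'94, \'bd, \'b8). The choice of Unicode curly quotes for the open-quote and close-quote marks is an assumption; webfont.txt remains the authoritative list.

```python
import re

# Hypothetical partial table: only the codes named in this file.
# The full, authoritative mapping is in webfont.txt.
HEX_MARKS = {
    "94": "\u00f6",  # o-umlaut
    "bd": "\u201c",  # open double quote (Unicode choice is an assumption)
    "b8": "\u201d",  # close double quote (Unicode choice is an assumption)
}

# A mark is a backslash, an apostrophe, and two hexadecimal digits.
HEX_MARK_RE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_marks(text, table=HEX_MARKS):
    """Replace \\'xx marks found in the table; leave unknown marks untouched."""
    def substitute(match):
        return table.get(match.group(1).lower(), match.group(0))
    return HEX_MARK_RE.sub(substitute, text)


print(decode_hex_marks(r"\'bdK\'94nig\'b8"))
```

Marks missing from the table are passed through unchanged, so an incomplete table never silently drops a character.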
+* Introduction
+
+In the MICRA electronic version of the 1913 Webster and in GCIDE, each part
+of the entry headed by an entry word ("headword") is labeled so that no part
+of the entry except some punctuation marks should be found outside of all
+fields, i.e. every character should be within some tagged field. In the
+following description, the word "segment" usually refers to a major part of
+an entry such as an etymology or a definition or a collocation segment or a
+usage block, containing more than one field. The term "field" may also be
+used similarly to "segment", but may also denote single-word fields, such as
+an alternative spelling, labeled <asp>.
+
+The tags on this list are similar in structure to SGML tags. Each tag on
+this list marks a field; each field opens with a tagname between angle
+brackets thus: <tagname>, and closes with a similar tag containing the
+forward slash thus: </tagname>. No tags are used without closing tags.
+Thus a line break (similar to the HTML <br> tag) is symbolized here as an
+entity, <br/, and every <p> has a corresponding </p>.
+
+The absence of an end-field tag, or the presence of an end-field tag without
+a prior begin-field tag constitutes a typographical error, of which there
+may be a significant number. Any errors detected should be brought to the
+attention of PJC or the appropriate editor.
+
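Errors of this kind (a missing end-field tag, or an end-field tag with no prior begin-field tag) can be located with a simple stack. The following is a minimal sketch under stated assumptions: the function name find_tag_errors and the tag-name pattern are hypothetical, fields are assumed to nest properly, and entity marks such as <br/ are skipped because they never contain a closing ">".

```python
import re

# Matches <name> (begin-field) and </name> (end-field). Entity marks such
# as <br/ or <root/ have no closing ">" and are deliberately not matched.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)>")


def find_tag_errors(text):
    """Return (kind, tagname, offset) tuples for unbalanced field tags."""
    errors = []
    stack = []  # currently open fields as (tagname, offset)
    for match in TAG_RE.finditer(text):
        is_close = match.group(1) == "/"
        name = match.group(2)
        pos = match.start()
        if not is_close:
            stack.append((name, pos))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            # End-field tag that does not close the innermost open field.
            errors.append(("unexpected-close", name, pos))
    # Anything still open at the end lacks its end-field tag.
    errors.extend(("unclosed", name, pos) for name, pos in stack)
    return errors


print(find_tag_errors("<p><def>A tree.<br/</def></p>"))
print(find_tag_errors("<def>A tree."))
```

A mismatched close is reported without popping the stack, so a single misplaced tag may produce more than one report. This keeps the sketch short at the cost of some noise.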
+Most of the tagged fields are presented in the text in italic type, with a
+number of exceptions. Where a word is contained within more than one field,
+the innermost field determines the font to be used. Wherever recognizable
+functional fields were found, an attempt was made to tag the field with a
+functional mark, but in many cases, words were italicised only to represent
+the word itself as a discourse entity, and in some such cases, the "italic"
+mark <it> was used, implying nothing regarding functionality of the word.
+The base font is considered "plain". Where an italic field is indicated,
+parentheses or brackets within the field are not italicised.
+
+Where no font is specified for a tag, the tag is merely a functional
division, and was printed in plain font unless otherwise tagged. This type
-of segment is marked by an asterisk (*) where the font name would be.
- The size of the "plain" font in the original text is about 1.6 mm for
-the height of capitalized letters.
-=============================================================
-Explicit typographical tags:
- These were used where the purpose of a different font was merely to
-distinguish a word from the body of the text, and no explicit functional
-tag seemed apropriate.
------------------------------------
-Tag Font
------------------------------------
-Explicit formatting tags:
-. . . . . . . . . . . . . . . . . .
-<plain> plain font (that used in the body of a definition) --
- normally not marked, except within fields of
- a different front.
-<it> italic (in master files)
-<i> italic (for use in HTML presentation)
-<bold> bold (in master files)
-<b> bold (for use in HTML presentation)
-<colf> bold, Collocation font. Same font as used in collocations.
- smaller This is used only in the list of "un-" words not
- by 1 point actually defined in the dictionary. Probably could be
- replaced by a segment mark for the entire list!
- The "un-" words should be indexed as headwords.
-
-<ct> bold Same as <colf>, a font similar to that used in
- collocations. However, this tag is used in a table
- and could be set to a different font.
-
-<h1> * HTML tag -- largest heading font.
-
-<h2> * HTML tag -- second largest heading font.
-
-<headrow> * Marks a Row title in a table.
-
-<hwf> Font the same as the headword <hw>, though the field is
- not a headword. Used only once.
-
-<mitem> * Multiple items, a set of items in a table.
-<point ...> A series of point size markers, many unique.
-<point1.5> * One of the tags of the form <point**> where **
-<point6> represents the typographic point size of the
- enclosed text.
-<pre> An HTML tag indicating that the enclosed text is
- of teletype form, preformatted in a uniform-spaced
- font.
-<sc> small caps (used mostly for "a. d.", "b. c.")
- This is the same font a <er>, but has no functional
- or semantic significance
-<str> group of table data elements in a table
-<sub> subscript, like <subs>
-<subs> subscript
-<sups> superscript
-<supr> superscript
-<sansserif> Sans-serif font
-<stypec> Bold (collocation font) and also a subtype.
-<tt> HTML tage -- teletype font
-<universbold> A squared bold font without serifs approximating the
- "universe bold" font on the HP Laserjet4, slightly
- larger than the capitals in a definition body. Used
- in expositions describing shapes, such as
- "Y", "T", "U", "X", "V", "F".
-<vertical> Vertically organized column.
-<column1> Vertically organized column -- only part of a table
- which needs to be completed. Used once.
-<...type> A series of tags, many unique, designating certain
- unusual fonts, such as "bourgeoistype" for
- "bourgeois type", in the section on typography.
- Most of these occur only once, in the section on fonts.
+of segment is marked by an asterisk (*) where the font name would be. The
+size of the "plain" font in the original text is about 1.6 mm for the height
+of capitalized letters.
+
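The two font rules above (the innermost field determines the font, and a tag with no font of its own prints in plain unless otherwise tagged) can be sketched as a lookup over the open fields. This is an illustration only: TAG_FONTS is a hypothetical excerpt of the tables below, effective_font is a hypothetical name, and treating a "*" tag as deferring to an enclosing font-bearing field is one reading of "unless otherwise tagged", not a confirmed rule.

```python
# Hypothetical excerpt of the tag tables. None stands for the "*" entries,
# i.e. functional tags that specify no font of their own.
TAG_FONTS = {
    "it": "italic",
    "bold": "bold",
    "au": "italic",
    "sc": "small caps",
    "plain": "plain",
    "def": None,
    "p": None,
}


def effective_font(open_fields, fonts=TAG_FONTS):
    """Return the font for text nested inside open_fields (outermost first)."""
    # Walk from the innermost field outward; the first field that names
    # a font wins. Assumption: "*" fields defer to the enclosing field.
    for name in reversed(open_fields):
        font = fonts.get(name)
        if font is not None:
            return font
    return "plain"  # the base font


print(effective_font(["p", "def", "it"]))
print(effective_font(["p", "def"]))
```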
+* Explicit typographical tags
+
+These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional tag
+seemed appropriate.
+
+-------------------------------------------------------------------------
+Tag Font Description
+-------------------------------------------------------------------------
+<plain> plain font that used in the body of a definition -- normally
+ not marked, except within fields of a different
+ font.
+
+<it> italic in master files
+
+<i> italic for use in HTML presentation
+
+<bold> bold in master files
+
+<b> bold for use in HTML presentation
+
+<colf> bold, Collocation font. Same font as used in
+ collocations.
+ smaller This is used only in the list of "un-"
+ by 1 point words not actually defined in the
+ dictionary.
+ Probably could be replaced by a segment mark
+ for the entire list! The "un-" words should
+ be indexed as headwords.
+
+<ct> bold Same as <colf>, a font similar to that used
+ in collocations. However, this tag is used
+ in a table and could be set to a different
+ font.
+
+<h1> * HTML tag -- largest heading font.
+
+<h2> * HTML tag -- second largest heading font.
+
+<headrow> * Marks a Row title in a table.
+
+<hwf> Font the same as the headword <hw>, though
+ the field is not a headword. Used only
+ once.
+
+<mitem> * Multiple items, a set of items in a table.
+<point ...> A series of point size markers, many
+ unique.
+
+<point1.5> * One of the tags of the form <point**> where **
+<point6> represents the typographic point size of the
+ enclosed text.
+
+<pre> An HTML tag indicating that the enclosed
+ text is of teletype form, preformatted in a
+ uniform-spaced font.
+
+<sc> small caps used mostly for "a. d.", "b. c."
+ This is the same font as in <er>, but has no
+ functional or semantic significance.
+
+<str> group of table data elements in a table.
+
+<sub> subscript
+
+<subs> subscript
+
+<sups> superscript
+
+<supr> superscript
+
+<sansserif> Sans-serif
+
+<stypec> Bold collocation font, and also a subtype.
+
+<tt> HTML tag -- teletype font
+
+<universbold> A squared bold font without serifs approximating
+ the "universe bold" font on the HP Laserjet4,
+ slightly larger than the capitals in a definition
+ body. Used in expositions describing shapes,
+ such as "Y", "T", "U", "X", "V", "F".
+
+<vertical> Vertically organized column.
+
+<column1> Vertically organized column -- only part of a table
+ which needs to be completed. Used once.
+
+<...type> A series of tags, many unique, designating
+ certain unusual fonts, such as "bourgeoistype"
+ for "bourgeois type", in the section on
+ typography. Most of these occur only once, in
+ the section on fonts. Some examples follow:
<antiquetype>
<blacklettertype>
<boldfacetype>
<bourgeoistype>
<boxtype>
<clarendontype>
@@ -143,938 +175,961 @@ Explicit formatting tags:
<pearltype>
<picatype>
<scripttype>
<smpicatype>
<typewritertype>
-=============================================================
-Tags with semantic content:
-. . . . . . . . . . . . . . . . . . . . . . . . . . .
-<altsp> * Alternative spelling segment. Almost always
- contained within square brackets after the main
- definition segment. Expository words
- such as "Spelled also" are in plain font;
- the actual alternative spelling is marked by
- <asp> ... </asp> tags within this segment.
-
-<ant> italic Antonym.
-
-<asp> italic Alternative spelling. The actual word which is an
- alternative spelling to the headword. These
- are functionally synonyms of the headword. In
- most cases these also occur as headwords, with
- reference to the word where the actual definition
- is found, but not all such words are listed
- separately, particularly if the spelling is
- close enough to the headword to be found at the
- same point in the dictionary. Whether listed
- separately or not, these words should
- be indexed at this location, also.
-
-<au> italic Authority or author. Used where an authority is
- (may be right- given for a definition, and also used for the
- justified. See author, where a quotation within double quotes
- in the section is given in the same paragraph as the
- on formatting). definition. The double quotes are indicated
- by the open-quote (\'bd) and close-quote
- (\'b8). In both cases, it is typically
- right-justified, almost always fitting on
- the same line with the last line of the
- definition or quotation.
- Within collocation segments, it is usually
- used only after quotations, and is not right-
- justified, except occasionally where it
+* Tags with semantic content:
+
+-------------------------------------------------------------------------
+Tag Font Meaning and Description
+-------------------------------------------------------------------------
+<altsp> * Alternative spelling segment. Almost always
+ contained within square brackets after the main
+ definition segment. Expository words such as
+ "Spelled also" are in plain font; the actual
+ alternative spelling is marked by <asp> ...
+ </asp> tags within this segment.
+
+<ant> italic Antonym.
+
+<asp> italic Alternative spelling. The actual word which is
+ an alternative spelling to the headword. These
+ are functionally synonyms of the headword. In
+ most cases these also occur as headwords, with
+ reference to the word where the actual definition
+ is found, but not all such words are listed
+ separately, particularly if the spelling is close
+ enough to the headword to be found at the same
+ point in the dictionary. Whether listed
+ separately or not, these words should be indexed
+ at this location, also.
+
+<au> italic Authority or author. Used where an authority is
+ given for a definition, and also used for the
+ author, where a quotation within double quotes is
+ given in the same paragraph as the definition.
+ The double quotes are indicated by the open-quote
+ (\'bd) and close-quote (\'b8). In both cases, it
+ is typically right-justified, almost always
+ fitting on the same line with the last line of
+ the definition or quotation.
+
+ Within collocation segments, it is usually used
+ only after quotations, and is not
+ right-justified, except occasionally where it
would be close to the right margin, and then
- apparently is is right-justified. We have
- not explicitly marked those which are
- right-justified, but they can be
- recognized because they are on a line by
- themselves, preceded by two carriage returns.
+ apparently it is right-justified. We have not
+ explicitly marked those which are
+ right-justified, but they can be recognized
+ because they are on a line by themselves,
+ preceded by two carriage returns.
-<bio> * Marks a biography. Should be longer than
- a short mention of who a person was, which
- is typically included as a definition.
+<bio> * Marks a biography. Should be longer than a short
+ mention of who a person was, which is typically
+ included as a definition.
-<biography> * Same as <bio>
+<biography> * Same as <bio>
-<booki> italic Marks the name of a book, pamphlet, or similar
- document.
+<booki> italic Marks the name of a book, pamphlet, or similar
+ document.
-<branchof> * A field of knowledge which of which the headword
+<branchof> * A field of knowledge of which the headword
is a division.
-<caption> * Caption of a figure or table.
-
-<cas> * tags the CAS (Chemical Abstracts Service) registry
- number for a chemical substance.
-
-<causes> italic tags the infectious disease caused by the headword.
- Implied type of the agent is a microorganism, and
- the tag must mark a disease.
+<caption> * Caption of a figure or table.
-<causesp> * Same as <causes> without the italic type.
-<causedbyp> * Same as <causedby> without the italic type.
+<cas> * tags the CAS (Chemical Abstracts Service)
+ registry number for a chemical substance.
-<causedby> italic inverse of causes: tags the causative agent of an
- infectious disease, which is the headword .
- the tag must mark a microorganism, virus, or
- prion, and the implied type of the headword is
- a disease.
+<causes> italic tags the infectious disease caused by the
+ headword. Implied type of the agent is a
+ microorganism, and the tag must mark a disease.
-<centered> Used only for The single letter in the headers to each
- letter of the alphabet.
+<causesp> * Same as <causes> without the italic type.
+<causedbyp> * Same as <causedby> without the italic type.
-<city> * marks the proper name of a city. Used only
- occasionally and not consistently at this stage.
+<causedby> italic inverse of <causes>: tags the causative agent of
+ an infectious disease, which is the headword.
+ The tag must mark a microorganism, virus, or
+ prion, and the implied type of the headword is a
+ disease.
-<cnvto> italic Converted to: used to tag substances which are
- products prepared by conversion from the
- headword. Usually chemicals or complex
- products from mnatuarl materials. Rarely used
- up to 1998.
+<centered> Used only for the single letter in the headers to
+ each letter of the alphabet.
-<colheads> * List of heads for the columns of a table.
+<city> * marks the proper name of a city. Used only
+ occasionally and not consistently at this stage.
-<coltitle> * Title of a column in a table.
+<cnvto> italic Converted to: used to tag substances which are
+ products prepared by conversion from the
+ headword. Usually chemicals or complex products
+ from natural materials. Rarely used up to 1998.
-<comm> * Comment -- differs from <note> in being in-line with
- the definition paragraph. Provides a little
- additional information.
+<colheads> * List of heads for the columns of a table.
-<company> * Name of a company (commercial firm). Compare <org>
+<coltitle> * Title of a column in a table.
-<compof> italic Composed of. Tags a substance of which the
- headword is at least partly composed. The
- substance may be particulate, such as
- diatoms composing diatomaceous earth.
+<comm> * Comment -- differs from <note> in being in-line
+ with the definition paragraph. Provides a little
+ additional information.
-<contains> * marks an object contained within the headword.
+<company> * Name of a company (commercial firm). Compare
+ <org>.
-<contr> italic Contrasting word. Not exactly an antonym, which
- is marked <ant>, but a contrasting word which is
- often introduced as "opposite to" or "contrasts
- with".
+<compof> italic Composed of. Tags a substance of which the
+ headword is at least partly composed. The
+ substance may be particulate, such as diatoms
+ composing diatomaceous earth.
-<country> * Name of a country (nation) of the world.
+<contains> * marks an object contained within the headword.
-<cref> italic Collocation reference. A reference to a collocation.
- Each such collocation should have its own entry,
- marked by <col> ... </col> tags, and these
- references should function as hypertext buttons
- to access that entry.
+<contr> italic Contrasting word. Not exactly an antonym, which
+ is marked <ant>, but a contrasting word which is
+ often introduced as "opposite to" or "contrasts
+ with".
-<date> * A Date, of any type, e.g. <date>Dec. 25</date>.
+<country> * Name of a country (nation) of the world.
-<datey> * Date-with-year tags a date containing a year.
-
-<def> * definition. The definition may have subfields,
- particularly <as> (an illustrative phrase
- starting with "as" or "thus" and containing
- the headword (or a morphological derivative).
- The <mark>, \'bd...\'b8 quotations (left and
- right double quotes) and <au> fields may be
- found within a definition field, but should
- and usually are located outside the definition
- proper. The marking macro was
- inconsistent in this placement, and the
- exclusion of the <mark>, <au> and quotations
- needs to be completed by the proof-readers.
- Certain definitions contain <pos>
- fields within them, where the headword is
- an irregular derivative of another headword.
- In these cases, the <pos> field follows
- immediately after the <def> tag, and these
- entries do not have a separate <pos> field.
- In such cases, the <pos> field is italic, as
- usual.
-
-<divof> * Division of the headword, usually an organization.
- E. g. a faculty or department of a university,
- or a United Nations agency.
+<cref> italic Collocation reference. A reference to a
+ collocation. Each such collocation should have
+ its own entry, marked by <col> ... </col> tags,
+ and these references should function as hypertext
+ buttons to access that entry.
-<edi> * Marks an education institution, a subtype of
+<date> * A Date, of any type, e.g. <date>Dec. 25</date>.
+
+<datey> * Date-with-year tags a date containing a year.
+
+<def> * A definition. The definition may have subfields,
+ particularly <as> (an illustrative phrase
+ starting with "as" or "thus" and containing the
+ headword (or a morphological derivative)). The
+ <mark>, \'bd...\'b8 quotations (left and right
+ double quotes) and <au> fields may be found
+ within a definition field, but should be, and usually
+ are, located outside the definition proper. The
+ marking macro was inconsistent in this placement,
+ and the exclusion of the <mark>, <au> and
+ quotations needs to be completed by the
+ proof-readers.
+
+ Certain definitions contain <pos> fields within
+ them, where the headword is an irregular
+ derivative of another headword. In these cases,
+ the <pos> field follows immediately after the
+ <def> tag, and these entries do not have a
+ separate <pos> field. In such cases, the <pos>
+ field is italic, as usual.
+
+<divof> * Division of the headword, usually an
+ organization. E.g. a faculty or department of a
+ university, or a United Nations agency.
+
+<edi> * Marks an education institution, a subtype of
organization.
-<emits> * tags a physical object or form of radiation
- emitted by the headword
+<emits> * Tags a physical object or form of radiation
+ emitted by the headword.
-<figure> Just a place-holder for illustrations, but seldom used.
+<figure> Just a place-holder for illustrations, but seldom
+ used.
-<film> italic Marks the name of a movie film.
+<film> italic Marks the name of a movie film.
-<fld> italic Field of specialization. Most often used for
+<fld> italic Field of specialization. Most often used for
Zoology and Botany, but many "fields of
- specialization" are marked for technical
- terms. The parentheses are usually within this
- field, but are not themselves in italics.
-
-<geog> * Name of a geograpahical region of any size;
- if applicable, the more specific <city>,
- <state>, or <country> are preferred.
-
-<hypen> * Hyperym. Points to the hypernym from WordNet 1.5
- Initially, used only for entries extracted
- from WordNet 1.5. Not present in the original
- 1913 version.
+ specialization" are marked for technical terms.
+ The parentheses are usually within this field,
+ but are not themselves in italics.
+
+<geog> * Name of a geographical region of any size; if
+ applicable, the more specific <city>, <state>, or
+ <country> are preferred.
+
+<hypen> * Hypernym. Points to the hypernym from WordNet 1.5.
+ Initially, used only for entries extracted from
+ WordNet 1.5. Not present in the original 1913
+ version.
-<illu> * Illustrative usage -- mostly from WordNet, and placed
- outside the definition, in contrast to <as> usage.
- These should be converted to <as>...</as> illustrative
- usage format for consistency.
-
-<illust> * Illustration place-holder. Seldom used.
-<img> * HTML usage -- points to an image file, usually
- .gif or .jpg. These have no closing tag, and
- will appear as errors in parsing.
-<intensi> * Points to a word whose meaning is an intensified
- form of the headword. Taken from WordNet
- tags, used with some adjectives from WordNet
-<item> * Designates one item in a row of a table. Used only when
- intervening spaces do not serve properly as natural
- field separaters.
-<itran> italic Translation into a foreign (non-English) language
- of the previous word in the text -- italic font.
- (<sig> is a translation into English)
-<itrans> italic Same as <itran>
-<jour> * Title of a journal (periodical).
-<matrix> * Always a filled rectangular array.
-<matrix2x5> * A 2x5 matrix (2 rows by 5 columns).
-<mstypec> * Multiple synonymous subtypes -- used in
- def. of "grass".
-<mtable> * Multiple table, encloses <table> figures.
-<musfig> * Music figure. Only in a note under the entry "Figure",
- the two numbers of each such field
- are bold, 20 point type, stacked as in a fraction with
- a bar between them, but also having a horizontal stroke
- midway through each numeral. Unique to this entry.
-<p> * paragraph tag, used always in pairs. Line breaks may
- be embedded inside the paragraphs.
-<person> * marks the proper name of a person. Used only
- occasionally, but should be used more frequently
- for cases where first names are abbreviated,
- to reduce ambiguity of the period for automatic
- analysis. Where a title is given, prefixed
- or postfixed, it is included in this tag.
-
-<persfn> * marks the name of a person, when only one name
- (usually the last name) is given. Not used
- consistently where it should be.
-
-<publ> * Marks the name of a publication other than book,
- which is marked by <booki>. It is often a
- magazine or journal.
-<qpers> * Tags the name of a person who is speaking,
- within a quotation.
-<qperson> Same as <qpers>
-<cp> * Collocation, plain text -- used to tag phrases that
- should be parsed as a unit, but has no typographical
- significance.
-<qau> italic Always right-justified, as described for <au>.
-<ref> * A reference to a word in the vocabulary.
-<refs> * Marks the set of references used for a longer article
- such as a biography.
-<river> * Marks the name of a river -- a proper name
-<rj> * Right justified
-<row> * Designates a row in a table.
-<state> * Name of a geopolitical state, the first subdivision of
- a country. Includes, e.g. Canadian provinces.
-<subtypes> * Lists subtypes of the headword.
-<sup> * superscript
-<supr> * Supra. The two parts of each such field
- are stacked, one over the other, *without* a
- horizontal bar between (as in a fraction).
- Used only in one entry, for a musical notation.
-<table> * Always a filled rectangular array, having <row> and <item>
- elements.
-<td> * Table datum - one cell in a table
-<th> * Table header
-<tradename> * Tags a commercial Trade name
-<ttitle> * Table title (Larger than normal font)
+<illu> * Illustrative usage -- mostly from WordNet, and
+ placed outside the definition, in contrast to
+ <as> usage. These should be converted to
+ <as>...</as> illustrative usage format for
+ consistency.
+
+<illust> * Illustration place-holder. Seldom used.
+
+<img> * HTML usage -- points to an image file, usually
+ .gif or .jpg. These have no closing tag, and
+ will appear as errors in parsing.
+
+<intensi> * Points to a word whose meaning is an intensified
+ form of the headword. Taken from WordNet tags,
+ used with some adjectives from WordNet.
+
+<item> * Designates one item in a row of a table. Used
+ only when intervening spaces do not serve
+ properly as natural field separators.
+
+<itran> italic Translation into a foreign (non-English) language
+ of the previous word in the text -- italic font.
+ (<sig> is a translation into English)
+
+<itrans> italic Same as <itran>
+
+<jour> * Title of a journal (periodical).
+
+<matrix> * Always a filled rectangular array.
+
+<matrix2x5> * A 2x5 matrix (2 rows by 5 columns).
+
+<mstypec> * Multiple synonymous subtypes -- used in def. of
+ "grass".
+
+<mtable> * Multiple table, encloses <table> figures.
+
+<musfig> * Music figure. Only in a note under the entry
+ "Figure", the two numbers of each such field are
+ bold, 20 point type, stacked as in a fraction
+ with a bar between them, but also having a
+ horizontal stroke midway through each
+ numeral. Unique to this entry.
+
+<p> * Paragraph tag, used always in pairs. Line breaks
+ may be embedded inside the paragraphs.
+
+<person> * Marks the proper name of a person. Used only
+ occasionally, but should be used more frequently
+ for cases where first names are abbreviated, to
+ reduce ambiguity of the period for automatic
+ analysis. Where a title is given, prefixed or
+ postfixed, it is included in this tag.
+
+<persfn> * Marks the name of a person, when only one name
+ (usually the last name) is given. Not used
+ consistently where it should be.
+
+<publ> * Marks the name of a publication other than book,
+ which is marked by <booki>. It is often a
+ magazine or journal.
+
+<qpers> * Tags the name of a person who is speaking, within
+ a quotation.
+
+<qperson> Same as <qpers>
+
+<cp> * Collocation, plain text -- used to tag phrases
+ that should be parsed as a unit, but has no
+ typographical significance.
+
+<qau> italic Always right-justified, as described for <au>.
+
+<ref> * A reference to a word in the vocabulary.
+
+<refs> * Marks the set of references used for a longer
+ article such as a biography.
+
+<river> * Marks the name of a river -- a proper name.
+
+<rj> * Right justified.
+
+<row> * Designates a row in a table.
+
+<state> * Name of a geopolitical state, the first
+ subdivision of a country. Includes, e.g. Canadian
+ provinces.
+
+<subtypes> * Lists subtypes of the headword.
+
+<sup> * Superscript
+
+<supr> * Supra. The two parts of each such field are
+ stacked, one over the other, *without* a
+ horizontal bar between (as in a fraction). Used
+ only in one entry, for a musical notation.
+
+<table> * Always a filled rectangular array, having <row>
+ and <item> elements.
+
+<td> * Table datum - one cell in a table.
+
+<th> * Table header.
+
+<tradename> * Tags a commercial Trade name.
+
+<ttitle> * Table title (Larger than normal font).
====================================================================
-Functional Tags
---------------------------------------------------------------------
-Tag Font Meaning
- (Comparatives are relative to the plain font.)
------------------------------------------------------------------------
-<-- --> * Comment, not a tag. These segments should be deleted
- from the written or printed text.
- Page numbers of the o