From 3d4fbac289846464491104b01bebe554da6758da Mon Sep 17 00:00:00 2001
From: Sergey Poznyakoff
Date: Thu, 2 Feb 2012 14:42:06 +0200
Subject: Reorganize the directory structure.
* .gitignore: New file.
* Makefile: Fix the list of distributed files.
* README.DIC: Rename to README and edit.
* WXXVII.JPG: Remove.
* abbrevn.lst: New file.
* authors.lst: New file.
* gcide.conf: New file.
* PRONUNC.JPG: Rename to pronunc.jpg.
* PRONUNC.WEB: Rename to pronunc.txt.
* SYMBOLS.JPG: Rename to symbols.jpg
* TAGSET.WEB: Rename to tagset.txt
* WEBFONT.ASC: Rename to webfont.txt.
* titlepage.png: New file.
---
tagset.txt | 1080 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1080 insertions(+)
create mode 100644 tagset.txt
(limited to 'tagset.txt')
diff --git a/tagset.txt b/tagset.txt
new file mode 100644
index 0000000..f0b9367
--- /dev/null
+++ b/tagset.txt
@@ -0,0 +1,1080 @@
+ FIELD MARKS FOR WEBSTER 1913 and CIDE
+ =====================================
+Tagset.web:
+ Explanations of the tags used to mark the Webster 1913 dictionary
+and the CIDE (Collaborative International Dictionary of English).
+Note that the list of tags used to mark the public domain version
+of this dictionary is shorter than the full set described here.
+ If any tag is not listed here, it is either (1) one of the
+"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
+ (2) a functional field with no effect on the typography.
+
+Last modified March 12, 1999.
+ For questions, contact:
+ Patrick Cassidy cassidy@micra.com
+ 735 Belvidere Ave.
+ Plainfield, NJ 07062
+ (908) 561-3416 or (908) 668-5252
+-------------------------------------------------------------
+A separate file, webfont.asc, contains the list of the individual
+non-ASCII characters represented by either higher-order hexadecimal
+character marks (e.g., \'94, for o-umlaut) or by entity tags
+(e.g., .
+
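As a rough illustration of the hexadecimal character marks described above, the following Python sketch replaces a few of them with Unicode characters. Only three marks are taken from this file (\'94 for o-umlaut, and \'bd and \'b8 for the open and close double quotes); the names HEX_MARKS and decode_hex_marks are hypothetical, and the authoritative list lives in webfont.asc.

```python
import re

# Partial table: only the marks that this file itself mentions.
# The complete list is in webfont.asc and is NOT reproduced here.
HEX_MARKS = {
    "94": "\u00f6",  # o-umlaut
    "bd": "\u201c",  # open double quote
    "b8": "\u201d",  # close double quote
}

# A mark is a backslash, an apostrophe, and exactly two hex digits.
_MARK_RE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_marks(text):
    """Replace known \\'xx marks; leave unknown marks untouched."""
    def repl(match):
        return HEX_MARKS.get(match.group(1).lower(), match.group(0))
    return _MARK_RE.sub(repl, text)
```

Unknown marks are deliberately passed through unchanged so that a partial table never silently drops characters.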
+ Note: The tags on this list are similar in structure to SGML tags. Each
+tag on this list marks a field; each field opens with a tagname between
+angle brackets thus: <tagname>, and closes with a similar tag containing
+the forward slash thus: </tagname>. No tags are used without closing
+tags. Thus the HTML <br> to indicate a line break is symbolized
+here as an entity, and the paragraph tag <p> has a corresponding </p>.
+ The absence of an end-field tag, or the presence of an end-field tag
+without a prior begin-field tag constitutes a typographical error, of which
+there may be a significant number. Any errors detected should be brought
+to the attention of PJC or the appropriate editor.
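The pairing rule above (every begin-field tag needs a matching end-field tag) can be checked mechanically. The following Python sketch is one possible approach, not the tooling used by the editors: it assumes tags have the plain form <name> and </name>, and the function name find_tag_errors is hypothetical.

```python
import re

# Matches <name> and </name>; anything else in angle brackets is ignored.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def find_tag_errors(text):
    """Return a sorted list of (offset, message) for unbalanced field tags."""
    errors = []
    stack = []  # (name, offset) of the fields that are currently open
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)
        if not closing:
            stack.append((name, match.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()  # innermost open field closed correctly
        else:
            errors.append(
                (match.start(), "end tag </%s> without matching begin tag" % name)
            )
    # Whatever is still open at the end was never closed.
    for name, offset in stack:
        errors.append((offset, "begin tag <%s> never closed" % name))
    return sorted(errors)
```

A stack is used because fields may nest; an end tag is accepted only when it closes the innermost open field, so mis-nested fields are reported as well.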
+ Most of the tagged fields are presented in the text in italic type,
+with a number of exceptions. Where a word is contained within more than
+one field, the innermost field determines the font to be used. Wherever
+recognizable functional fields were found, an attempt was made to tag the
+field with a functional mark, but in many cases, words were italicised only
+to represent the word itself as a discourse entity, and in some such cases,
+the "italic" mark was used, implying nothing regarding functionality
+of the word. The base font is considered "plain". Where an italic field
+is indicated, parentheses or brackets within the field are not italicised.
+ Where no font is specified for a tag, the tag is merely a functional
+division, and was printed in plain font unless otherwise tagged. This type
+of segment is marked by an asterisk (*) where the font name would be.
+ The size of the "plain" font in the original text is about 1.6 mm for
+the height of capitalized letters.
+=============================================================
+Explicit typographical tags:
+ These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional
+tag seemed appropriate.
+-----------------------------------
+Tag Font
+-----------------------------------
+Explicit formatting tags:
+. . . . . . . . . . . . . . . . . .
+ plain font (that used in the body of a definition) --
+ normally not marked, except within fields of
+ a different font.
+ italic (in master files)
+ italic (for use in HTML presentation)
+ bold (in master files)
+ bold (for use in HTML presentation)
+ bold, Collocation font. Same font as used in collocations.
+ smaller This is used only in the list of "un-" words not
+ by 1 point actually defined in the dictionary. Probably could be
+ replaced by a segment mark for the entire list!
+ The "un-" words should be indexed as headwords.
+
+ bold Same as , a font similar to that used in
+ collocations. However, this tag is used in a table
+ and could be set to a different font.
+
+<h1> * HTML tag -- largest heading font.
+
+<h2> * HTML tag -- second largest heading font.
+
+ * Marks a Row title in a table.
+
+ Font the same as the headword , though the field is
+ not a headword. Used only once.
+
+ * Multiple items, a set of items in a table.
+ A series of point size markers, many unique.
+ * One of the tags of the form where **
+ represents the typographic point size of the
+ enclosed text.
+<pre> An HTML tag indicating that the enclosed text is
+ of teletype form, preformatted in a uniform-spaced
+ font.
+ small caps (used mostly for "a. d.", "b. c.")
+ This is the same font as , but has no functional
+ or semantic significance
+ group of table data elements in a table
+ subscript, like
+ subscript
+ superscript
+ superscript
+ Sans-serif font
+ Bold (collocation font) and also a subtype.
+ HTML tag -- teletype font
+ A squared bold font without serifs approximating the
+ "universe bold" font on the HP Laserjet4, slightly
+ larger than the capitals in a definition body. Used
+ in expositions describing shapes, such as
+ "Y", "T", "U", "X", "V", "F".
+ Vertically organized column.
+ Vertically organized column -- only part of a table
+ which needs to be completed. Used once.
+<...type> A series of tags, many unique, designating certain
+ unusual fonts, such as "bourgeoistype" for
+ "bourgeois type", in the section on typography.
+ Most of these occur only once, in the section on fonts.
+
+
+=============================================================
+Tags with semantic content:
+. . . . . . . . . . . . . . . . . . . . . . . . . . .
+ * Alternative spelling segment. Almost always
+ contained within square brackets after the main
+ definition segment. Expository words
+ such as "Spelled also" are in plain font;
+ the actual alternative spelling is marked by
+ ... tags within this segment.
+
+ italic Antonym.
+
+ italic Alternative spelling. The actual word which is an
+ alternative spelling to the headword. These
+ are functionally synonyms of the headword. In
+ most cases these also occur as headwords, with
+ reference to the word where the actual definition
+ is found, but not all such words are listed
+ separately, particularly if the spelling is
+ close enough to the headword to be found at the
+ same point in the dictionary. Whether listed
+ separately or not, these words should
+ be indexed at this location, also.
+
+ italic            Authority or author. Used where an authority is
+ (may be right-    given for a definition, and also used for the
+ justified. See    author, where a quotation within double quotes
+ in the section    is given in the same paragraph as the
+ on formatting).   definition. The double quotes are indicated
+                   by the open-quote (\'bd) and close-quote
+                   (\'b8). In both cases, it is typically
+                   right-justified, almost always fitting on
+                   the same line with the last line of the
+                   definition or quotation.
+                   Within collocation segments, it is usually
+                   used only after quotations, and is not right-
+                   justified, except occasionally where it
+                   would be close to the right margin, and then
+                   apparently is right-justified. We have
+                   not explicitly marked those which are
+                   right-justified, but they can be
+                   recognized because they are on a line by
+                   themselves, preceded by two carriage returns.
+
+ * Marks a biography. Should be longer than
+ a short mention of who a person was, which
+ is typically included as a definition.
+
+ * Same as
+
+ italic Marks the name of a book, pamphlet, or similar
+ document.
+
+ * A field of knowledge of which the headword
+ is a division.
+
+<caption> * Caption of a figure or table.
+
+ * tags the CAS (Chemical Abstracts Service) registry
+ number for a chemical substance.
+
+ italic tags the infectious disease caused by the headword.
+ Implied type of the agent is a microorganism, and
+ the tag must mark a disease.
+
+ * Same as without the italic type.
+ * Same as without the italic type.
+
+ italic inverse of causes: tags the causative agent of an
+ infectious disease, which is the headword.
+ The tag must mark a microorganism, virus, or
+ prion, and the implied type of the headword is
+ a disease.
+
+ Used only for The single letter in the headers to each
+ letter of the alphabet.
+
+ * marks the proper name of a city. Used only
+ occasionally and not consistently at this stage.
+
+ italic Converted to: used to tag substances which are
+ products prepared by conversion from the
+ headword. Usually chemicals or complex
+ products from natural materials. Rarely used
+ up to 1998.
+
+ * List of heads for the columns of a table.
+
+ * Title of a column in a table.
+
+ * Comment -- differs from in being in-line with
+ the definition paragraph. Provides a little
+ additional information.
+
+ * Name of a company (commercial firm). Compare
+
+ italic Composed of. Tags a substance of which the
+ headword is at least partly composed. The
+ substance may be particulate, such as
+ diatoms composing diatomaceous earth.
+
+ * marks an object contained within the headword.
+
+ italic Contrasting word. Not exactly an antonym, which
+ is marked , but a contrasting word which is
+ often introduced as "opposite to" or "contrasts
+ with".
+
+ * Name of a country (nation) of the world.
+
+ italic Collocation reference. A reference to a collocation.
+ Each such collocation should have its own entry,
+ marked by <col>...</col> tags, and these
+ references should function as hypertext buttons
+ to access that entry.
+
+ * A Date, of any type, e.g. Dec. 25.
+
+ * Date-with-year tags a date containing a year.
+
+ * definition. The definition may have subfields,
+ particularly (an illustrative phrase
+ starting with "as" or "thus" and containing
+ the headword (or a morphological derivative)).
+ The , \'bd...\'b8 quotations (left and
+ right double quotes) and fields may be
+ found within a definition field, but should
+ be, and usually are, located outside the definition
+ proper. The marking macro was
+ inconsistent in this placement, and the
+ exclusion of the , and quotations
+ needs to be completed by the proof-readers.
+ Certain definitions contain
+ fields within them, where the headword is
+ an irregular derivative of another headword.
+ In these cases, the field follows
+ immediately after the tag, and these
+ entries do not have a separate field.
+ In such cases, the field is italic, as
+ usual.
+
+ * Division of the headword, usually an organization.
+ E. g. a faculty or department of a university,
+ or a United Nations agency.
+
+ * Marks an education institution, a subtype of
+ organization.
+
+ * tags a physical object or form of radiation
+ emitted by the headword
+
+ Just a place-holder for illustrations, but seldom used.
+
+ italic Marks the name of a movie film.
+
+ italic Field of specialization. Most often used for
+ Zoology and Botany, but many "fields of
+ specialization" are marked for technical
+ terms. The parentheses are usually within this
+ field, but are not themselves in italics.
+
+ * Name of a geographical region of any size;
+ if applicable, the more specific ,
+ , or are preferred.
+
+ * Hypernym. Points to the hypernym from WordNet 1.5.
+ Initially, used only for entries extracted
+ from WordNet 1.5. Not present in the original
+ 1913 version.
+
+ * Illustrative usage -- mostly from WordNet, and placed
+ outside the definition, in contrast to usage.
+ These should be converted to ... illustrative
+ usage format for consistency.
+
+ * Illustration place-holder. Seldom used.
+ * HTML usage -- points to an image file, usually
+ .gif or .jpg. These have no closing tag, and
+ will appear as errors in parsing.
+ * Points to a word whose meaning is an intensified
+ form of the headword. Taken from WordNet
+ tags, used with some adjectives from WordNet
+ * Designates one item in a row of a table. Used only when
+ intervening spaces do not serve properly as natural
+ field separators.
+ italic Translation into a foreign (non-English) language
+ of the previous word in the text -- italic font.
+ ( is a translation into English)
+ italic Same as
+ * Title of a journal (periodical).
+ * Always a filled rectangular array.
+ * A 2x5 matrix (2 rows by 5 columns).
+ * Multiple synonymous subtypes -- used in
+ def. of "grass".
+ * Multiple table, encloses
+ figures.
+ * Music figure. Only in a note under the entry "Figure",
+ the two numbers of each such field
+ are bold, 20 point type, stacked as in a fraction with
+ a bar between them, but also having a horizontal stroke
+ midway through each numeral. Unique to this entry.
+<p> * paragraph tag, used always in pairs. Line breaks may
+ be embedded inside the paragraphs.
+ * marks the proper name of a person. Used only
+ occasionally, but should be used more frequently
+ for cases where first names are abbreviated,
+ to reduce ambiguity of the period for automatic
+ analysis. Where a title is given, prefixed
+ or postfixed, it is included in this tag.
+
+ * marks the name of a person, when only one name
+ (usually the last name) is given. Not used
+ consistently where it should be.
+
+ * Marks the name of a publication other than book,
+ which is marked by . It is often a
+ magazine or journal.
+ * Tags the name of a person who is speaking,
+ within a quotation.
+ Same as
+ * Collocation, plain text -- used to tag phrases that
+ should be parsed as a unit, but has no typographical
+ significance.
+ italic Always right-justified, as described for .
+ * A reference to a word in the vocabulary.
+ * Marks the set of references used for a longer article
+ such as a biography.
+ * Marks the name of a river -- a proper name
+ * Right justified
+ * Designates a row in a table.
+ * Name of a geopolitical state, the first subdivision of
+ a country. Includes, e.g. Canadian provinces.
+ * Lists subtypes of the headword.
+ * superscript
+ * Supra. The two parts of each such field
+ are stacked, one over the other, *without* a
+ horizontal bar between (as in a fraction).
+ Used only in one entry, for a musical notation.
+<table> * Always a filled rectangular array, having and
+ elements.
+<td> * Table datum - one cell in a table
+<th> * Table header
+ * Tags a commercial Trade name
+ * Table title (Larger than normal font)
+====================================================================
+
+Functional Tags
+--------------------------------------------------------------------
+Tag Font Meaning
+ (Comparatives are relative to the plain font.)
+-----------------------------------------------------------------------
+<-- --> * Comment, not a tag. These segments should be deleted
+ from the written or printed text.
+ Page numbers of the original text are indicated
+ within such comments; these may be left in, if
+ desired.
+
+ * HTML-style comment. Used to indicate page numbers
+ in the public domain version.
+
+ italic Tag for abbreviations, when mentioned within
+ the definition text.
+
+ small caps Tags for the actual adjective or adverb
+ comparatives or superlatives. Should be
+ indexed. See also conjf (verbs) and
+ decf (nouns).
+
+ italic Alternative name. Usually for plants or animals,
+ but also used for other cases where words
+ are introduced by "also called", "called also",
+ "formerly called". These are functionally
+ *synonyms* for that word-sense.
+
+ italic Same as , but the marked word is a
+ plural form, whereas the headword is singular.
+
+ * Adjective morphological segment, primarily
+ the comparative and superlative forms.
+ The occasional adverb morphology is
+ also tagged this way.
+
+ * A segment occurring within the definitional
+ sentence, providing an example of usage of
+ the headword. Not conceptually a part of the
+ actual definition.
+
+ smaller spacing Collocation definition. Similar in structure
+ to headword definitions (the field). May
+ contain an field. Plain type, but with
+ closer spacing than main definitions.
+
+<col>  bold,            Collocation. A word combination containing the
+       smaller by       headword (or a morphological derivative).
+       1 point          The collocations do not have an explicitly
+                        marked part of speech.
+                        See also , tagging embedded collocations.
+
+ Collocation, no typographic significance.
+ Used to mark a word combination defined in
+ the dictionary without effect on font.
+
+ small caps The conjugated (non-infinitive) forms of
+ verbs. imp. & p. p. is common, as well as
+ p. pr. & vb. n. Irregular variants of
+ these are less common. Words in this
+ field perhaps should be indexed.
+
+ smaller           Collocation segment. The font and size are
+ vertical          normal in a cs, but the spacing between lines
+ spacing           is smaller (0.9 mm between lower-case letters,
+                   rather than 1.1 mm in the main body of the
+                   definition). For an on-line dictionary,
+                   reproducing this typography is probably
+                   pointless.
+
+ small caps Declension form. The actual morphological
+ variants of nouns or pronouns. Should
+ be indexed.
+
+ * Embedded Collocation. A word combination
+ containing the headword (or a morphological
+ derivative), embedded within a definition
+ without a separate definition of its own.
+ These collocations should be defined
+ implicitly by the text of the definition in
+ which they are embedded.
+ See also <col>, tagging explicitly defined
+ collocations.
+
+ Bold Entry field. Gives the headword without accent or
+ syllabication marks, and with special-character
+ symbols converted to their nearest ASCII
+ equivalents. Can be used without conversion
+ as the string that serves as the index word
+ for that entry.
+
+ Small Caps Entry reference. References to headwords
+ within the "etymology" section are in small
+ caps. Such references also occur
+ in the body of definitions, and in "usage"
+ segments.
+ Such entry references should function as hypertext
+ buttons to access that entry.
+
+ * Etymology. Always contained within square
+ brackets. Normal type is used for explanatory
+ comments, and italics for the actual words
+ (marked ) considered as etymological
+ sources.
+
+ italic Etymological source. Words from which the
+ headword was derived, or to which it is related.
+ The Greek words within an etymology segment
+ are invariably etymology sources, and should
+ be marked as such, but are not so marked,
+ even in the rare cases where the Greek word
+ transliteration has been written in.
+
+ italic Etymological source, being the name of a person
+ or geographical location which is the eponym
+ for the concept. This is used to distinguish
+ eponymous etymologies from others, and can also
+ be found in the body of a definition or note,
+ not only in the etymology field. Very few
+ of the names that should be marked this way
+ have actually been so marked, as of version
+ 0.42. In cases where such eponymous names
+ have not yet been thus marked, they will
+ usually be marked by , the non-semantic
+ italic-font marker, or, in etymologies, by
+ .
+
+ italic Example. An example of usage of the headword,
+ usually found within an or segment.
+
+ * Frequency of use, ordinal rank. This is used for
+ WordNet entries, in which the synonyms
+ were ranked in order of frequency of use.
+ 1 indicates that the headword is the
+ first word on the list of synonyms.
+
+ * First use. A date at or around which the first
+ use of this word in writing is recorded.
+ Not in the original 1913 Webster, and usu.
+ taken from a recent dictionary. Only a few
+ such fields have been entered as of version
+ 0.41
+
+ transliteration Greek. The Greek words have been transliterated
+ using the equivalents explained in the
+ file "webfont.asc". In most cases, the
+ transliterations are typical for Greek
+ letters, except for theta (transl. = q),
+ phi (transl. = f), eta (transl. = h), and
+ upsilon (transl. = y, whether pronounced
+ as y or u). This was to eliminate any
+ ambiguity. These words occur primarily
+ in etymologies, and to conform to the
+ usage of should also be marked
+ by , but as of version 0.41 they
+ are not usually thus marked.
+
+ bold,             headword. Each main entry begins with the
+ larger by         mark, and ends at the next mark. The
+ 2 points          main entries are not otherwise explicitly
+                   marked as a distinctive field.
+                   The same word may appear as a headword
+                   several times, usually as different parts
+                   of speech, but sometimes with different
+                   entries as the same part of speech, presumably
+                   to indicate a different etymology.
+                   Within the hw field the heavy accent is
+                   represented by double quote ("), the
+                   light accent by open-single-quote (`),
+                   and the short dash separating syllables by
+                   an asterisk (*). A hyphen (-) is used to
+                   represent the hyphen of hyphenated words.
+
+ italic,           Usage mark. Almost always within square
+                   brackets, occasionally in parentheses or
+                   without any bracketing.
+ but               The most common usage marks,
+ explanatory       "Obs." = obsolete, "R." = rare, "Colloq." =
+ may be plain.     colloquial, "Prov. Eng." = Provincial England,
+                   etc. are in italics. Some usage notes are also
+                   marked with , but are in plain. For
+                   simplicity, all words in this field may be
+                   italic, until additional explicit marks are
+                   added.
+
+ * A usage mark in plain type (not italic). Found
+ within a definition, when there are more than
+ one sense-number listed. "Fig." at the head
+ of an entry is the most common case.
+
+ * Multiple collocation. Similar to multiple
+ headword, when two or more collocations share
+ one definition; however, the two collocations
+ are in-line, rather than stacked or justified.
+ There may be "or" or "and" words
+ (italicised), or an "etc." (plain type)
+ within this field. In many cases, the
+ * Multiple headword. This field is used where
+ more than one headword shares a single
+ definition. In the dictionary, the
+ (usually) two headwords are left-justified
+ one below the other in the column, and are
+ tied together on the right side of the
+ headwords by a long right curly brace.
+ This division is strictly functional,
+ for analytical purposes, and does not
+ affect the typography.
+
+ * Noun morphology section. Rarely used, mostly
+ for irregular personal pronouns.
+
+ * Explanatory note. No explicit font is indicated.
+ These segments may be separate, as in the
+ separate paragraphs starting
+
+ * Plural. The "plural" segment starts with a
+ "pl." which is italicised, but in this
+ segment is not otherwise marked as
+ italicised. Other words occurring in this
+ segment are plain type. The "pl." can be
+ easily explicitly marked if necessary.
+
+ italic Part of speech. Always an abbreviation: e.g.,
+ n.; v. i.; v. t.; a.; adv.; pron.; prep.
+ Combinations may occur, as "a. & n.".
+
+ * Part of speech, referring to words in
+ etymologies, normal type. Always an
+ abbreviation, as in above
+ Combinations may occur, as "a. or n.".
+
+ small caps Plural word. The actual plural form of the word,
+ found within a segment.
+
+ * pronunciation. The default font is normal, but
+ many non-ASCII characters are used.
+ The pronunciation field may have more than
+ one pronunciation, separated by an "
+
+ smaller by        Quotation. No bracketing quotation marks,
+ two points,       though occasionally \'bd-\'b8 quotations occur
+ centered,         within these quotations. These quotations
+ Separate          tend to be more complete sentences, rather
+ paragraph         than just phrases, such as are contained
+                   within quotation marks within the definition
+                   paragraph.
+
+ italic,           Quotation author. Used only for the quotations
+ right justified   marked with that are centered in their
+                   own paragraphs.
+
+ italic Quotation example. An example of usage of
+ the headword, within quotations marked
+ by .. tags.
+
+ italic Subdefinition, marked (a), (b), (c), etc. These are
+ finer distinctions of word senses, used
+ within numbered word-sense (for main entries),
+ and also used for subdefinitions within
+ collocation segments, which have no numbering of
+ senses. The letter is italic, the parentheses
+ are not. This tag is also used to indicate the
+ lettered subdefinition when it is referred to
+ at another point in the text.
+
+ italic The name of a ship. Rarely used.
+
+ * Singular. Analogous to the segment, but more
+ rarely used, mostly for Indian tribes, which
+ are listed in the plural form.
+
+ small caps Singular word. The singular form of the
+ plural-form headword.
+
+ bold,             Sense number. A headword may have over 20
+ larger by         different sense numbers. Within each numbered
+ 2 points          sense there may be lettered sub-senses. See
+                   the (sub-definition) field.
+
+
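The hw field conventions described above (double quote for the heavy accent, open-single-quote for the light accent, asterisk for the syllable break, and a literal hyphen for hyphenated words) suggest a simple way to derive an index string from a headword. The Python sketch below is only an illustration under those stated conventions: the function name strip_headword_marks and the sample headwords are hypothetical, and the conversion of special-character symbols to their nearest ASCII equivalents, which the entry field also requires, is not attempted.

```python
# Characters that carry accent or syllabication information in the hw field.
_HW_MARKS = {
    ord('"'): None,  # heavy accent
    ord("`"): None,  # light accent
    ord("*"): None,  # syllable separator
}


def strip_headword_marks(hw):
    """Drop accent and syllable marks; real hyphens are kept as they are."""
    return hw.translate(_HW_MARKS)
```

For a hypothetical marked headword such as Dic"tion*a*ry this yields Dictionary, while a hyphen, which stands for itself in the hw field, survives unchanged.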