author Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
committer Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
commit d18a469b7a5a4d4b5da21eab37f34ab1e99a8dce
tree 7eb331e376e85287c25b6a9734dae58a4724da8a
parent 4a458db06b28492a7e48b1a0560b35778e476482
Revise tagset.txt

* tagset.txt: Review.
* README: Reformat.
* webfont.txt: Reformat. Document <and/ and <or/.

-rw-r--r-- README        363
-rw-r--r-- tagset.txt   2057
-rw-r--r-- webfont.txt   302
3 files changed, 1385 insertions, 1337 deletions
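
The README touched by this commit describes page-marker comments of the form
<-- p. 345 --> and, for proofread pages, <-- p. 345 pr=VOL -->. As a minimal
sketch only (the function name page_marks and the regular expression are
illustrative assumptions, not part of GCIDE or GNU Dico), such markers could
be pulled out of a dictionary file like this:

```python
import re

# Pattern for the page-marker comments as the README describes them:
# "<-- p. 345 -->" or "<-- p. 345 pr=VOL -->".
# The exact whitespace tolerance is an assumption, not a documented rule.
PAGE_MARK = re.compile(r"<--\s*p\.\s*(\d+)(?:\s+pr=(\w+))?\s*-->")


def page_marks(text):
    """Return (page_number, proofreader_initials_or_None) per marker."""
    return [(int(m.group(1)), m.group(2)) for m in PAGE_MARK.finditer(text)]


if __name__ == "__main__":
    sample = "<p>...</p><-- p. 345 pr=VOL --><p>...</p><-- p. 346 -->"
    for page, proofreader in page_marks(sample):
        status = f"proofread by {proofreader}" if proofreader else "not proofread"
        print(page, status)
```

Pages whose marker carries no pr= field are reported with None, which matches
the README's statement that most pages have not yet been proofread.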
diff --git a/README b/README
index b8d21ad..4a36e8b 100644
--- a/README
+++ b/README
@@ -10,25 +10,23 @@ The README file
* OVERVIEW
-==========
-This document describes the GNU version of the Collaborative
-International Dictionary of English. It is organized into a series of
-chapters, introduced by headings beginning with a single asterisk. A
-chapter may have sections, which are marked with two asterisks. For
-those readers who use Emacs, this structure corresponds to its
-"Outline mode", which will be enabled automatically upon loading this
-file.
-
-The chapter "INTRODUCTION" describes the structure of this package.
-The chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary
-structure in general. An overview of the markup tags is provided in
-the chapter "TAGS". A detailed information about dictionary markup
-can be obtained from a set of ancillary files included in this
-package, which are described in the chapter "ANCILLARY FILES".
-
-The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for
-reading this dictionary. Finally, other versions of the Webster
-dictionary are listed in the chapter "OTHER VERSIONS OF THE
-DICTIONARY".
+
+This document describes the GNU version of the Collaborative International
+Dictionary of English. It is organized into a series of chapters,
+introduced by headings beginning with a single asterisk. A chapter may have
+sections, which are marked with two asterisks. For those readers who use
+Emacs, this structure corresponds to its "Outline mode", which will be
+enabled automatically upon loading this file.
+
+The chapter "INTRODUCTION" describes the structure of this package. The
+chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary structure in
+general. An overview of the markup tags is provided in the chapter "TAGS".
+A detailed information about dictionary markup can be obtained from a set of
+ancillary files included in this package, which are described in the chapter
+"ANCILLARY FILES".
+
+The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for reading
+this dictionary. Finally, other versions of the Webster dictionary are
+listed in the chapter "OTHER VERSIONS OF THE DICTIONARY".
* INTRODUCTION
-==============
+
The dictionary was derived from the
@@ -50,14 +48,13 @@ and is being proof-read and supplemented by volunteers from around the
world. This is an unfunded project, and future enhancement of this
-dictionary will depend on the efforts of volunteers willing to help
-build this free resource into a comprehensive body of general
-information. New definitions for missing words or words senses and
-longer explanatory notes, as well as images to accompany the articles
-are needed. More modern illustrative quotations giving recent
-examples of usage of the words in their various senses will be very
-helpful, since most quotations in the original 1913 dictionary are now
-well over 100 years old.
-
-This electronic version is being maintained by World Soul, a
-non-profit organization in Plainfield, NJ. For additional information
-or if you are willing to assist construction of this data source, contact:
+dictionary will depend on the efforts of volunteers willing to help build
+this free resource into a comprehensive body of general information. New
+definitions for missing words or words senses and longer explanatory notes,
+as well as images to accompany the articles are needed. More modern
+illustrative quotations giving recent examples of usage of the words in
+their various senses will be very helpful, since most quotations in the
+original 1913 dictionary are now well over 100 years old.
+
+This electronic version is being maintained by World Soul, a non-profit
+organization in Plainfield, NJ. For additional information or if you are
+willing to assist construction of this data source, contact:
@@ -71,36 +68,34 @@ or if you are willing to assist construction of this data source, contact:
-GCIDE is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2, or (at your option)
-any later version.
+GCIDE is free software; you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the Free Software
+Foundation; either version 2, or (at your option) any later version.
-GCIDE is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
+GCIDE is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
-You should have received a copy of the GNU General Public License
-along with this copy of GCIDE; see the file COPYING. If not, write
-to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
-Boston, MA 02111-1307, USA.
+You should have received a copy of the GNU General Public License along with
+this copy of GCIDE; see the file COPYING. If not, write to the Free
+Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+02111-1307, USA.
* STRUCTURE OF THE DICTIONARY
-=============================
-When the archive is unpacked, the main dictionary text of the GCIDE
-will be found in 26 files named "CIDE.*", where the asterisk indicates
-which letter of the alphabet begins the words in each file. For
-example, file "CIDE.B" contains words beginning with the letter "B".
-Additional information about the tagging conventions and special
-character symbols are contained in ancillary files in this directory
-(see below the section entitled "ANCILLARY FILES"). The main body of
-the 1913 dictionary was essentially identical to the edition published
-in 1890, and was republished in 1913 with an appendix containing "New
-Words". The new words of that appendix have been integrated into the
-main file in this version. However, it is important to keep in mind
-that the definitions in this dictionary are in most cases over 100
+
+When the archive is unpacked, the main dictionary text of the GCIDE will be
+found in 26 files named "CIDE.*", where the asterisk indicates which letter
+of the alphabet begins the words in each file. For example, file "CIDE.B"
+contains words beginning with the letter "B". Additional information about
+the tagging conventions and special character symbols are contained in
+ancillary files in this directory (see below the section entitled "ANCILLARY
+FILES"). The main body of the 1913 dictionary was essentially identical to
+the edition published in 1890, and was republished in 1913 with an appendix
+containing "New Words". The new words of that appendix have been integrated
+into the main file in this version. However, it is important to keep in
+mind that the definitions in this dictionary are in most cases over 100
years old. Use them with caution!
-At the bottom of each paragraph in this dictionary, there is a
-bracketed and tagged "source" indicated. This tells from where the
-definition or other text in that paragraph came, as follows:
+At the bottom of each paragraph in this dictionary, there is a bracketed and
+tagged "source" indicated. This tells from where the definition or other
+text in that paragraph came, as follows:
@@ -119,6 +114,5 @@ definition or other text in that paragraph came, as follows:
-The original definitions have been tagged and in some cases
-reformatted or slightly rearranged. If substantive information is
-added from a second source, usually the additional source is also
-noted, as in:
+The original definitions have been tagged and in some cases reformatted or
+slightly rearranged. If substantive information is added from a second
+source, usually the additional source is also noted, as in:
@@ -126,35 +120,32 @@ noted, as in:
-This version is tagged with SGML-like tags of the form <pos>...</pos>
-so that the original typography (italics, bold, block quotes) can be
-reproduced. A list of the most important tags for fields in the
-dictionary is given below. The tags also serve the more important
-function of allowing the information content to be conveniently
-imported into computer programs or databases. The set of tags used is
-described in the accompanying file "tagset.txt". ***NOTE*** the
-paragraph tags <p>...</p> do *not* always nest properly with certain
-other tags, such as <note> and <cs> ("collocation section"), which in
-some cases span multiple paragraphs. If you are using a tag parser
-which detects improper nesting, you should first either delete the
-paragraph tags or convert them to non-tag symbols, or, if possible,
-set the parser to ignore the <p>...</p> tags.
-
-The unusual characters (such as Greek or the European accented
-characters, as well as special characters used in the pronunciations)
-are described in the accompanying file "webfont.txt". Some
-information on the pronunciation system used may be found by viewing
-the file "pronunc.jpg", and additional explanations of pronunciation
-are in the file "pronunc.txt".
-
-Each paragraph of the original text is enclosed within tags of the
-form <p> . . . </p>. Within these paragraphs there are no line
-breaks, and some of the paragraphs are over 12,000 characters long,
-which may prove too long to be handled by some editors. At some
-points, embedded line breaks within a "paragraph" are marked by a <br/
-"entity". The file can therefore be converted, if necessary, to a
-form with shorter lines, and subsequently reconverted back to the form
-having one line per paragraph.
-
-If additional line breaks are added, then in order to remove the line
-breaks and reconstruct the original paragraphs, so that the page width
-can be adjusted, perform the following manipulations:
+This version is tagged with SGML-like tags of the form <pos>...</pos> so
+that the original typography (italics, bold, block quotes) can be
+reproduced. A list of the most important tags for fields in the dictionary
+is given below. The tags also serve the more important function of allowing
+the information content to be conveniently imported into computer programs
+or databases. The set of tags used is described in the accompanying file
+"tagset.txt". ***NOTE*** the paragraph tags <p>...</p> do *not* always nest
+properly with certain other tags, such as <note> and <cs> ("collocation
+section"), which in some cases span multiple paragraphs. If you are using a
+tag parser which detects improper nesting, you should first either delete
+the paragraph tags or convert them to non-tag symbols, or, if possible, set
+the parser to ignore the <p>...</p> tags.
+
+The unusual characters (such as Greek or the European accented characters,
+as well as special characters used in the pronunciations) are described in
+the accompanying file "webfont.txt". Some information on the pronunciation
+system used may be found by viewing the file "pronunc.jpg", and additional
+explanations of pronunciation are in the file "pronunc.txt".
+
+Each paragraph of the original text is enclosed within tags of the form <p>
+. . . </p>. Within these paragraphs there are no line breaks, and some of
+the paragraphs are over 12,000 characters long, which may prove too long to
+be handled by some editors. At some points, embedded line breaks within a
+"paragraph" are marked by a <br/ "entity". The file can therefore be
+converted, if necessary, to a form with shorter lines, and subsequently
+reconverted back to the form having one line per paragraph.
+
+If additional line breaks are added, then in order to remove the line breaks
+and reconstruct the original paragraphs, so that the page width can be
+adjusted, perform the following manipulations:
@@ -166,16 +157,15 @@ can be adjusted, perform the following manipulations:
-A more sophisticated formatting of spaces within paragraphs may
-require the use of the fully-tagged master files. If you have a need
-for these files, contact Patrick Cassidy: cassidy@micra.com.
-
-The approximate beginning of each page is marked by an SGML comment of
-the form <-- p. 345 -->. (The exact beginning was in some cases in
-the middle of a paragraph, which we decided was not a good location
-for these page-number comments, so the page number was usually moved
-to the next paragraph break). Pages which have been proofread by
-volunteers (e.g., with initials VOL) will have a note within that page
-comment: <-- p. 345 pr=VOL -->. Pages which have not been proofread
-yet (most of them) will have varying numbers of typographical errors
-in them. We still (January 2012) need proofreaders to get the errors
-out of these dictionary files.
+A more sophisticated formatting of spaces within paragraphs may require the
+use of the fully-tagged master files. If you have a need for these files,
+contact Patrick Cassidy: cassidy@micra.com.
+
+The approximate beginning of each page is marked by an SGML comment of the
+form <-- p. 345 -->. (The exact beginning was in some cases in the middle
+of a paragraph, which we decided was not a good location for these
+page-number comments, so the page number was usually moved to the next
+paragraph break). Pages which have been proofread by volunteers (e.g., with
+initials VOL) will have a note within that page comment: <-- p. 345 pr=VOL
+-->. Pages which have not been proofread yet (most of them) will have
+varying numbers of typographical errors in them. We still (January 2012)
+need proofreaders to get the errors out of these dictionary files.
@@ -183,25 +173,23 @@ out of these dictionary files.
-This version is only a first typing, and has numerous typographic
-errors, including errors in the field-marks. In addition, the user
-must keep in mind that this text is very old and will contain numerous
-obsolete, inaccurate, and perhaps offensive statements, which are
-included solely because this work is intended to reproduce accurately
-this historically interesting classic reference work. This text should
-not be relied upon as an accurate source of information, as in many
-cases it represents the state of knowledge around 1890. The text is
-provided "as is", and the user must accept responsibility for all
-consequences of its use. Please refer to the header of each file and
-the GNU public license. If these conditions of use are unacceptable,
-please do not use these texts.
-
-This electronic dictionary is also made available as a potential
-starting point for development of a modern comprehensive encyclopedic
-dictionary, to be accessible freely on the internet, and developed by
-the efforts of all individuals willing to help build a large and
-freely available knowledge base. A large number of collaborators are
-needed to bring this dictionary to a more accurate, more modern, and
-more useful state. Anyone willing to assist in any way in constructing
-such a knowledge base should contact Patrick Cassidy (see above). All
-reports of errors will be gratefully received, and should also be
-transmitted to PC at: pc@worldsoul.org.
+This version is only a first typing, and has numerous typographic errors,
+including errors in the field-marks. In addition, the user must keep in
+mind that this text is very old and will contain numerous obsolete,
+inaccurate, and perhaps offensive statements, which are included solely
+because this work is intended to reproduce accurately this historically
+interesting classic reference work. This text should not be relied upon as
+an accurate source of information, as in many cases it represents the state
+of knowledge around 1890. The text is provided "as is", and the user must
+accept responsibility for all consequences of its use. Please refer to the
+header of each file and the GNU public license. If these conditions of use
+are unacceptable, please do not use these texts.
+
+This electronic dictionary is also made available as a potential starting
+point for development of a modern comprehensive encyclopedic dictionary, to
+be accessible freely on the internet, and developed by the efforts of all
+individuals willing to help build a large and freely available knowledge
+base. A large number of collaborators are needed to bring this dictionary
+to a more accurate, more modern, and more useful state. Anyone willing to
+assist in any way in constructing such a knowledge base should contact
+Patrick Cassidy (see above). All reports of errors will be gratefully
+received, and should also be transmitted to PC at: pc@worldsoul.org.
@@ -237,4 +225,4 @@ For other tags, see the file "tagset.txt"
In addition to the main text of the dictionary, additional explanatory
-material about this version of the dictionary is available in the
-ancillary files:
+material about this version of the dictionary is available in the ancillary
+files:
@@ -259,4 +247,4 @@ pronunciations.
-A copy of the dictionary page describing the pronunciation symbols used
-in the original work.
+A copy of the dictionary page describing the pronunciation symbols used in
+the original work.
@@ -264,4 +252,4 @@ in the original work.
-This file lists original pronunciation symbols with the corresponding
-markup entities used in this version.
+This file lists original pronunciation symbols with the corresponding markup
+entities used in this version.
@@ -277,22 +265,25 @@ A copy of the original title page.
-Description of the special escape sequences used in this dictionary.
-This file also explains the Greek transliteration syntax used in it.
+Description of the special escape sequences used in this dictionary. This
+file also explains the Greek transliteration syntax used in it.
* DICTIONARY LOOKUP
-===================
+
The GNU Dico project contains a module for reading GCIDE files. This
-distribution provides a configuration file "gcide.conf" which you can
-use with the "dicod" server in order to look up words in the
-dictionary. See http://www.gnu.org.ua/software/dico for a description
-of GNU Dico, including links to download.
+distribution provides a configuration file "gcide.conf" which you can use
+with the "dicod" server in order to look up words in the dictionary. See
+http://www.gnu.org.ua/software/dico for a description of GNU Dico, including
+links to download.
-The instructions below describe how to configure GNU Dico server
-(dicod) to access a copy of the GCIDE dictionary.
+The instructions below describe how to configure GNU Dico server (dicod) to
+access a copy of the GCIDE dictionary.
1. Unpack the GCIDE dictionary;
+
2. Copy the file "gcide.conf" to a directory where you keep your local
configuration files (/etc or /usr/local/etc are usual choices).
-3. Replace the word GCIDE_PATH in the "gcide.conf" statement with the
-path to the gcide-0.51 dicrectory. You can omit this step and use the
--D option instead:
+
+3. Replace the word GCIDE_PATH in the "gcide.conf" statement with the path
+to the gcide-0.51 dicrectory. You can omit this step and use the -D option
+instead:
+
4. Check the configuration file. Run:
@@ -305,23 +296,20 @@ If no errors are reported, then go to the step 5.
-5. Start "dicod". Run the same command as described in step 4, but
-without the "--lint" option. This will start the dictionary server
-which will be avaialble on localhost (127.0.0.1) port 2628. The
-server provides extensive searching facilities. It also parses the
-GCIDE markup and automatically reformats the articles before returning
-them.
+5. Start "dicod". Run the same command as described in step 4, but without
+the "--lint" option. This will start the dictionary server which will be
+avaialble on localhost (127.0.0.1) port 2628. The server provides extensive
+searching facilities. It also parses the GCIDE markup and automatically
+reformats the articles before returning them.
-Now you can access the dictionary using dico (a GNU dictionary command
-line utility), or another dictionary client program (such as Kdict or
-the like).
+Now you can access the dictionary using dico (a GNU dictionary command line
+utility), or another dictionary client program (such as Kdict or the like).
* OTHER VERSIONS OF THE DICTIONARY
-==================================
+
There are several other derivative versions of this dictionary on the
-internet, in some cases reformatted or provided with an interface.
-Those that I am aware of are:
+internet, in some cases reformatted or provided with an interface. Those
+that I am aware of are:
** Dicoweb
-----------
-This version of GCIDE is available online at the GNU Dico web
-site:
+
+This version of GCIDE is available online at the GNU Dico web site:
@@ -332,23 +320,23 @@ The site provides extensive search facilities.
** Project Gutenberg
----------------------
+
In the extext96 directory of Project Gutenberg
-(http://www.gutenberg.org/dirs/etext96), there is a version of the
-original 1913 dictionary, which is in the **public domain**. The main
-files are labeled pgw050*.*. The tags for that version are a subset
-of those used in this GNU version.
+(http://www.gutenberg.org/dirs/etext96), there is a version of the original
+1913 dictionary, which is in the **public domain**. The main files are
+labeled pgw050*.*. The tags for that version are a subset of those used in
+this GNU version.
** The DICT development group
-------------------------------
-This group has created a program to index and search this dictionary.
-The program can be downloaded and used locally, but at present is
-available only in a Unix-compatible executable version. See their web
-site at http://www.dict.org.
+
+This group has created a program to index and search this dictionary. The
+program can be downloaded and used locally, but at present is available only
+in a Unix-compatible executable version. See their web site at
+http://www.dict.org.
** The University of Chicago ARTFL project
-------------------------------------------
-Mark Olsen and Gavin LaRowe at the University of Chicago have
-converted the original 1913 dictionary to HTML and have provided an
-interface allowing search of the headwords. When the supplemented
-version has developed sufficiently to warrant the effort, a similar
-searchable version may be posted there as well. The search page is at:
+
+Mark Olsen and Gavin LaRowe at the University of Chicago have converted the
+original 1913 dictionary to HTML and have provided an interface allowing
+search of the headwords. When the supplemented version has developed
+sufficiently to warrant the effort, a similar searchable version may be
+posted there as well. The search page is at:
@@ -356,5 +344,5 @@ searchable version may be posted there as well. The search page is at:
-That page will provide links to other ARTFL projects and contact
-information for the ARTFL group, who alone can provide information
-about the HTML version or interface.
+That page will provide links to other ARTFL projects and contact information
+for the ARTFL group, who alone can provide information about the HTML
+version or interface.
@@ -366,2 +354,3 @@ paragraph-separate: "[ ]*$"
version-control: never
+fill-column: 76
End:
diff --git a/tagset.txt b/tagset.txt
index 9a7a501..0093d42 100644
--- a/tagset.txt
+++ b/tagset.txt
@@ -1,11 +1,12 @@
- FIELD MARKS FOR WEBSTER 1913 and CIDE
- =====================================
- Explanations of the tags used to mark the Webster 1913 dictionary
-and the CIDE (Collaborative International Dictionary of English).
-Note that the list of tags used to mark the public domain version
-of this dictionary is shorter than the full set described here.
- If any tag is not listed here, it is either (1) one of the
-"point" (font size) or "type" (font style) tags, which should be
-self-explanatory; or (2) is a functional field with no effect on the
-typography.
+FIELD MARKS FOR WEBSTER 1913 and CIDE
+=====================================
+
+* Overview
+
+This file describes the tags used to mark the Webster 1913 dictionary and
+the GCIDE (GNU Collaborative International Dictionary of English).
+
+If any tag is not listed here, it is either (1) one of the "point" (font
+size) or "type" (font style) tags, which should be self-explanatory; or (2)
+is a functional field with no effect on the typography.
@@ -17,110 +18,141 @@ Last modified March 12, 1999.
(908) 561-3416 or (908) 668-5252
--------------------------------------------------------------
-A separate file, webfont.txt, contains the list of the individual
+
+A separate file, webfont.txt, contains the list of the individual
non-ASCII characters represented by either higher-order hexadecimal
-character marks (e.g., \'94, for o-umlaut) or by entity tags
-(e.g., <root/, for the square root symbol.)
---------------------------------------------------------------
- Use of tags:
- In the MICRA electronic version of the 1913 Webster, each part of
-the entry headed by an entry word ("headword") is labeled so that no
-part of the entry except some punctuation marks should be found
-outside of all fields, i.e. every character should be within some tagged
-field. In the following description, the word "segment" usually refers to
-a major part of an entry such as an etymology or a definition or a
-collocation segment or a usage block, containing more than one field.
-The term "field" may also be used similarly to "segment", but may also
-denote single-word fields, such as an alternative spelling, labeled <asp>.
-
- Note: The tags on this list are similar in structure to SGML tags. Each
-tag on this list marks a field; each field opens with a tagname between
-angle brackets thus: <tagname>, and closes with a similar tag containing
-the forward slash thus: </tagname>. No tags are used without closing
-tags. Thus the HTML <BR> to indicate a line break is symbolized
-here as an entity, <br/, and every <p> has a corresponding </p>.
- The absence of an end-field tag, or the presence of an end-field tag
-without a prior begin-field tag constitutes a typographical error, of which
-there may be a significant number. Any errors detected should be brought
-to the attention of PJC or the appropriate editor.
- Most of the tagged fields are presented in the text in italic type,
-with a number of exceptions. Where a word is contained within more than
-one field, the innermost field determines the font to be used. Wherever
-recognizable functional fields were found, an attempt was made to tag the
-field with a functional mark, but in many cases, words were italicised only
-to represent the word itself as a discourse entity, and in some such cases,
-the "italic" mark <it> was used, implying nothing regarding functionality
-of the word. The base font is considered "plain". Where an italic field
-is indicated, parentheses or brackets within the field are not italicised.
- Where no font is specified for a tag, the tag is merely a functional
+character marks (e.g., \'94, for o-umlaut) or by entity tags (e.g.,
+<root/, for the square root symbol.)
+
+* Introduction
+
+In the MICRA electronic version of the 1913 Webster and in GCIDE, each part
+of the entry headed by an entry word ("headword") is labeled so that no part
+of the entry except some punctuation marks should be found outside of all
+fields, i.e. every character should be within some tagged field. In the
+following description, the word "segment" usually refers to a major part of
+an entry such as an etymology or a definition or a collocation segment or a
+usage block, containing more than one field. The term "field" may also be
+used similarly to "segment", but may also denote single-word fields, such as
+an alternative spelling, labeled <asp>.
+
+The tags on this list are similar in structure to SGML tags. Each tag on
+this list marks a field; each field opens with a tagname between angle
+brackets thus: <tagname>, and closes with a similar tag containing the
+forward slash thus: </tagname>. No tags are used without closing tags.
+Thus a line break (similar to HTML <br> tag) is symbolized here as an
+entity, <br/, and every <p> has a corresponding </p>.
+
+The absence of an end-field tag, or the presence of an end-field tag without
+a prior begin-field tag constitutes a typographical error, of which there
+may be a significant number. Any errors detected should be brought to the
+attention of PJC or the appropriate editor.
+
+Most of the tagged fields are presented in the text in italic type, with a
+number of exceptions. Where a word is contained within more than one field,
+the innermost field determines the font to be used. Wherever recognizable
+functional fields were found, an attempt was made to tag the field with a
+functional mark, but in many cases, words were italicised only to represent
+the word itself as a discourse entity, and in some such cases, the "italic"
+mark <it> was used, implying nothing regarding functionality of the word.
+The base font is considered "plain". Where an italic field is indicated,
+parentheses or brackets within the field are not italicised.
+
+Where no font is specified for a tag, the tag is merely a functional
division, and was printed in plain font unless otherwise tagged. This type
-of segment is marked by an asterisk (*) where the font name would be.
- The size of the "plain" font in the original text is about 1.6 mm for
-the height of capitalized letters.
-=============================================================
-Explicit typographical tags:
- These were used where the purpose of a different font was merely to
-distinguish a word from the body of the text, and no explicit functional
-tag seemed apropriate.
------------------------------------
-Tag Font
------------------------------------
-Explicit formatting tags:
-. . . . . . . . . . . . . . . . . .
-<plain> plain font (that used in the body of a definition) --
- normally not marked, except within fields of
- a different front.
-<it> italic (in master files)
-<i> italic (for use in HTML presentation)
-<bold> bold (in master files)
-<b> bold (for use in HTML presentation)
-<colf> bold, Collocation font. Same font as used in collocations.
- smaller This is used only in the list of "un-" words not
- by 1 point actually defined in the dictionary. Probably could be
- replaced by a segment mark for the entire list!
- The "un-" words should be indexed as headwords.
-
-<ct> bold Same as <colf>, a font similar to that used in
- collocations. However, this tag is used in a table
- and could be set to a different font.
-
-<h1> * HTML tag -- largest heading font.
-
-<h2> * HTML tag -- second largest heading font.
-
-<headrow> * Marks a Row title in a table.
-
-<hwf> Font the same as the headword <hw>, though the field is
- not a headword. Used only once.
-
-<mitem> * Multiple items, a set of items in a table.
-<point ...> A series of point size markers, many unique.
-<point1.5> * One of the tags of the form <point**> where **
-<point6> represents the typographic point size of the
- enclosed text.
-<pre> An HTML tag indicating that the enclosed text is
- of teletype form, preformatted in a uniform-spaced
- font.
-<sc> small caps (used mostly for "a. d.", "b. c.")
- This is the same font a <er>, but has no functional
- or semantic significance
-<str> group of table data elements in a table
-<sub> subscript, like <subs>
-<subs> subscript
-<sups> superscript
-<supr> superscript
-<sansserif> Sans-serif font
-<stypec> Bold (collocation font) and also a subtype.
-<tt> HTML tage -- teletype font
-<universbold> A squared bold font without serifs approximating the
- "universe bold" font on the HP Laserjet4, slightly
- larger than the capitals in a definition body. Used
- in expositions describing shapes, such as
- "Y", "T", "U", "X", "V", "F".
-<vertical> Vertically organized column.
-<column1> Vertically organized column -- only part of a table
- which needs to be completed. Used once.
-<...type> A series of tags, many unique, designating certain
- unusual fonts, such as "bourgeoistype" for
- "bourgeois type", in the section on typography.
- Most of these occur only once, in the section on fonts.
+of segment is marked by an asterisk (*) where the font name would be. The
+size of the "plain" font in the original text is about 1.6 mm for the height
+of capitalized letters.
+
+* Explicit typographical tags
+
+These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional tag
+seemed appropriate.
+
+-------------------------------------------------------------------------
+Tag Font Description
+-------------------------------------------------------------------------
+<plain> plain font that used in the body of a definition -- normally
+ not marked, except within fields of a different
+ font.
+
+<it> italic in master files
+
+<i> italic for use in HTML presentation
+
+<bold> bold in master files
+
+<b> bold for use in HTML presentation
+
+<colf> bold, Collocation font. Same font as used in
+ collocations.
+ smaller This is used only in the list of "un-"
+ by 1 point words not actually defined in the
+ dictionary.
+ Probably could be replaced by a segment mark
+ for the entire list! The "un-" words should
+ be indexed as headwords.
+
+<ct> bold Same as <colf>, a font similar to that used
+ in collocations. However, this tag is used
+ in a table and could be set to a different
+ font.
+
+<h1> * HTML tag -- largest heading font.
+
+<h2> * HTML tag -- second largest heading font.
+
+<headrow> * Marks a Row title in a table.
+
+<hwf> Font the same as the headword <hw>, though
+ the field is not a headword. Used only
+ once.
+
+<mitem> * Multiple items, a set of items in a table.
+<point ...> A series of point size markers, many
+ unique.
+
+<point1.5> * One of the tags of the form <point**> where **
+<point6> represents the typographic point size of the
+ enclosed text.
+
+<pre> An HTML tag indicating that the enclosed
+ text is of teletype form, preformatted in a
+ uniform-spaced font.
+
+<sc> small caps used mostly for "a. d.", "b. c."
+ This is the same font as in <er>, but has no
+ functional or semantic significance.
+
+<str> group of table data elements in a table.
+
+<sub> subscript
+
+<subs> subscript
+
+<sups> superscript
+
+<supr> superscript
+
+<sansserif> Sans-serif
+
+<stypec> Bold collocation font, and also a subtype.
+
+<tt> HTML tag -- teletype font
+
+<universbold> A squared bold font without serifs approximating
+ the "Univers Bold" font on the HP LaserJet 4,
+ slightly larger than the capitals in a definition
+ body. Used in expositions describing shapes,
+ such as "Y", "T", "U", "X", "V", "F".
+
+<vertical> Vertically organized column.
+
+<column1> Vertically organized column -- only part of a table
+ which needs to be completed. Used once.
+
+<...type> A series of tags, many unique, designating
+ certain unusual fonts, such as "bourgeoistype"
+ for "bourgeois type", in the section on
+ typography. Most of these occur only once, in
+ the section on fonts. Some examples follow:
<antiquetype>
@@ -148,347 +180,382 @@ Explicit formatting tags:
-=============================================================
-Tags with semantic content:
-. . . . . . . . . . . . . . . . . . . . . . . . . . .
-<altsp> * Alternative spelling segment. Almost always
- contained within square brackets after the main
- definition segment. Expository words
- such as "Spelled also" are in plain font;
- the actual alternative spelling is marked by
- <asp> ... </asp> tags within this segment.
-
-<ant> italic Antonym.
-
-<asp> italic Alternative spelling. The actual word which is an
- alternative spelling to the headword. These
- are functionally synonyms of the headword. In
- most cases these also occur as headwords, with
- reference to the word where the actual definition
- is found, but not all such words are listed
- separately, particularly if the spelling is
- close enough to the headword to be found at the
- same point in the dictionary. Whether listed
- separately or not, these words should
- be indexed at this location, also.
-
-<au> italic Authority or author. Used where an authority is
- (may be right- given for a definition, and also used for the
- justified. See author, where a quotation within double quotes
- in the section is given in the same paragraph as the
- on formatting). definition. The double quotes are indicated
- by the open-quote (\'bd) and close-quote
- (\'b8). In both cases, it is typically
- right-justified, almost always fitting on
- the same line with the last line of the
- definition or quotation.
- Within collocation segments, it is usually
- used only after quotations, and is not right-
- justified, except occasionally where it
+* Tags with semantic content:
+
+-------------------------------------------------------------------------
+Tag Font Meaning and Description
+-------------------------------------------------------------------------
+<altsp> * Alternative spelling segment. Almost always
+ contained within square brackets after the main
+ definition segment. Expository words such as
+ "Spelled also" are in plain font; the actual
+ alternative spelling is marked by <asp> ...
+ </asp> tags within this segment.
+
+<ant> italic Antonym.
+
+<asp> italic Alternative spelling. The actual word which is
+ an alternative spelling to the headword. These
+ are functionally synonyms of the headword. In
+ most cases these also occur as headwords, with
+ reference to the word where the actual definition
+ is found, but not all such words are listed
+ se