Revise tagset.txt

* tagset.txt: Review.
* README: Reformat.
* webfont.txt: Reformat.  Document <and/ and <or/.
author: Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
committer: Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
commit: d18a469b7a5a4d4b5da21eab37f34ab1e99a8dce
tree: 7eb331e376e85287c25b6a9734dae58a4724da8a
parent: 4a458db06b28492a7e48b1a0560b35778e476482
3 files changed, 1385 insertions, 1337 deletions
diff --git a/README b/README
index b8d21ad..4a36e8b 100644
--- a/README
+++ b/README
@@ -10,25 +10,23 @@ The README file
 * OVERVIEW
-==========
-This document describes the GNU version of the Collaborative
-International Dictionary of English.  It is organized into a series of
-chapters, introduced by headings beginning with a single asterisk.  A
-chapter may have sections, which are marked with two asterisks.  For
-those readers who use Emacs, this structure corresponds to its
-"Outline mode", which will be enabled automatically upon loading this
-file.
-
-The chapter "INTRODUCTION" describes the structure of this package.
-The chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary
-structure in general.  An overview of the markup tags is provided in
-the chapter "TAGS".  A detailed information about dictionary markup
-can be obtained from a set of ancillary files included in this
-package, which are described in the chapter "ANCILLARY FILES".
-
-The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for
-reading this dictionary.  Finally, other versions of the Webster
-dictionary are listed in the chapter "OTHER VERSIONS OF THE
-DICTIONARY".
+
+This document describes the GNU version of the Collaborative International
+Dictionary of English.  It is organized into a series of chapters,
+introduced by headings beginning with a single asterisk.  A chapter may have
+sections, which are marked with two asterisks.  For those readers who use
+Emacs, this structure corresponds to its "Outline mode", which will be
+enabled automatically upon loading this file.
+
+The chapter "INTRODUCTION" describes the structure of this package.  The
+chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary structure in
+general.  An overview of the markup tags is provided in the chapter "TAGS".
+Detailed information about dictionary markup can be obtained from a set of
+ancillary files included in this package, which are described in the chapter
+"ANCILLARY FILES".
+
+The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for reading
+this dictionary.  Finally, other versions of the Webster dictionary are
+listed in the chapter "OTHER VERSIONS OF THE DICTIONARY".
     
 * INTRODUCTION
-==============
+
 The dictionary was derived from the
@@ -50,14 +48,13 @@ and is being proof-read and supplemented by volunteers from around the
 world.  This is an unfunded project, and future enhancement of this
-dictionary will depend on the efforts of volunteers willing to help
-build this free resource into a comprehensive body of general
-information.  New definitions for missing words or words senses and
-longer explanatory notes, as well as images to accompany the articles
-are needed.  More modern illustrative quotations giving recent
-examples of usage of the words in their various senses will be very
-helpful, since most quotations in the original 1913 dictionary are now
-well over 100 years old.
-
-This electronic version is being maintained by World Soul, a
-non-profit organization in Plainfield, NJ.  For additional information
-or if you are willing to assist construction of this data source, contact:
+dictionary will depend on the efforts of volunteers willing to help build
+this free resource into a comprehensive body of general information.  New
+definitions for missing words or word senses and longer explanatory notes,
+as well as images to accompany the articles are needed.  More modern
+illustrative quotations giving recent examples of usage of the words in
+their various senses will be very helpful, since most quotations in the
+original 1913 dictionary are now well over 100 years old.
+
+This electronic version is being maintained by World Soul, a non-profit
+organization in Plainfield, NJ.  For additional information or if you are
+willing to assist construction of this data source, contact:
 
@@ -71,36 +68,34 @@ or if you are willing to assist construction of this data source, contact:
 
-GCIDE is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2, or (at your option)
-any later version.
+GCIDE is free software; you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the Free Software
+Foundation; either version 2, or (at your option) any later version.
 
-GCIDE is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
+GCIDE is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
+details.
 
-You should have received a copy of the GNU General Public License
-along with this copy of GCIDE; see the file COPYING.  If not, write 
-to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
-Boston, MA 02111-1307, USA.
+You should have received a copy of the GNU General Public License along with
+this copy of GCIDE; see the file COPYING.  If not, write to the Free
+Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+02111-1307, USA.
 
 * STRUCTURE OF THE DICTIONARY
-=============================
-When the archive is unpacked, the main dictionary text of the GCIDE
-will be found in 26 files named "CIDE.*", where the asterisk indicates
-which letter of the alphabet begins the words in each file.  For
-example, file "CIDE.B" contains words beginning with the letter "B".
-Additional information about the tagging conventions and special
-character symbols are contained in ancillary files in this directory
-(see below the section entitled "ANCILLARY FILES").  The main body of
-the 1913 dictionary was essentially identical to the edition published
-in 1890, and was republished in 1913 with an appendix containing "New
-Words".  The new words of that appendix have been integrated into the
-main file in this version.  However, it is important to keep in mind
-that the definitions in this dictionary are in most cases over 100
+
+When the archive is unpacked, the main dictionary text of the GCIDE will be
+found in 26 files named "CIDE.*", where the asterisk indicates which letter
+of the alphabet begins the words in each file.  For example, file "CIDE.B"
+contains words beginning with the letter "B".  Additional information about
+the tagging conventions and special character symbols are contained in
+ancillary files in this directory (see below the section entitled "ANCILLARY
+FILES").  The main body of the 1913 dictionary was essentially identical to
+the edition published in 1890, and was republished in 1913 with an appendix
+containing "New Words".  The new words of that appendix have been integrated
+into the main file in this version.  However, it is important to keep in
+mind that the definitions in this dictionary are in most cases over 100
 years old.  Use them with caution!
 
-At the bottom of each paragraph in this dictionary, there is a
-bracketed and tagged "source" indicated.  This tells from where the
-definition or other text in that paragraph came, as follows:
+At the bottom of each paragraph in this dictionary, there is a bracketed and
+tagged "source" indicated.  This tells from where the definition or other
+text in that paragraph came, as follows:
 
@@ -119,6 +114,5 @@ definition or other text in that paragraph came, as follows:
 
-The original definitions have been tagged and in some cases
-reformatted or slightly rearranged.  If substantive information is
-added from a second source, usually the additional source is also
-noted, as in:
+The original definitions have been tagged and in some cases reformatted or
+slightly rearranged.  If substantive information is added from a second
+source, usually the additional source is also noted, as in:
 
@@ -126,35 +120,32 @@ noted, as in:
 
-This version is tagged with SGML-like tags of the form <pos>...</pos> 
-so that the original typography (italics, bold, block quotes) can be
-reproduced.  A list of the most important tags for fields in the
-dictionary is given below.  The tags also serve the more important
-function of allowing the information content to be conveniently
-imported into computer programs or databases.  The set of tags used is
-described in the accompanying file "tagset.txt".  ***NOTE*** the
-paragraph tags <p>...</p> do *not* always nest properly with certain
-other tags, such as <note> and <cs> ("collocation section"), which in
-some cases span multiple paragraphs.  If you are using a tag parser
-which detects improper nesting, you should first either delete the
-paragraph tags or convert them to non-tag symbols, or, if possible,
-set the parser to ignore the <p>...</p> tags.
-
-The unusual characters (such as Greek or the European accented
-characters, as well as special characters used in the pronunciations)
-are described in the accompanying file "webfont.txt".  Some
-information on the pronunciation system used may be found by viewing
-the file "pronunc.jpg", and additional explanations of pronunciation
-are in the file "pronunc.txt".
-
-Each paragraph of the original text is enclosed within tags of the
-form <p> . . . </p>.  Within these paragraphs there are no line
-breaks, and some of the paragraphs are over 12,000 characters long,
-which may prove too long to be handled by some editors.  At some
-points, embedded line breaks within a "paragraph" are marked by a <br/
-"entity".  The file can therefore be converted, if necessary, to a
-form with shorter lines, and subsequently reconverted back to the form
-having one line per paragraph.
-
-If additional line breaks are added, then in order to remove the line
-breaks and reconstruct the original paragraphs, so that the page width
-can be adjusted, perform the following manipulations:
+This version is tagged with SGML-like tags of the form <pos>...</pos> so
+that the original typography (italics, bold, block quotes) can be
+reproduced.  A list of the most important tags for fields in the dictionary
+is given below.  The tags also serve the more important function of allowing
+the information content to be conveniently imported into computer programs
+or databases.  The set of tags used is described in the accompanying file
+"tagset.txt".  ***NOTE*** the paragraph tags <p>...</p> do *not* always nest
+properly with certain other tags, such as <note> and <cs> ("collocation
+section"), which in some cases span multiple paragraphs.  If you are using a
+tag parser which detects improper nesting, you should first either delete
+the paragraph tags or convert them to non-tag symbols, or, if possible, set
+the parser to ignore the <p>...</p> tags.
+
+The unusual characters (such as Greek or the European accented characters,
+as well as special characters used in the pronunciations) are described in
+the accompanying file "webfont.txt".  Some information on the pronunciation
+system used may be found by viewing the file "pronunc.jpg", and additional
+explanations of pronunciation are in the file "pronunc.txt".
+
+Each paragraph of the original text is enclosed within tags of the form <p>
+. . . </p>.  Within these paragraphs there are no line breaks, and some of
+the paragraphs are over 12,000 characters long, which may prove too long to
+be handled by some editors.  At some points, embedded line breaks within a
+"paragraph" are marked by a <br/ "entity".  The file can therefore be
+converted, if necessary, to a form with shorter lines, and subsequently
+reconverted back to the form having one line per paragraph.
+
+If additional line breaks are added, then in order to remove the line breaks
+and reconstruct the original paragraphs, so that the page width can be
+adjusted, perform the following manipulations:
 
@@ -166,16 +157,15 @@ can be adjusted, perform the following manipulations:
      
-A more sophisticated formatting of spaces within paragraphs may
-require the use of the fully-tagged master files.  If you have a need
-for these files, contact Patrick Cassidy: cassidy@micra.com. 
-
-The approximate beginning of each page is marked by an SGML comment of
-the form <-- p. 345 -->.  (The exact beginning was in some cases in
-the middle of a paragraph, which we decided was not a good location
-for these page-number comments, so the page number was usually moved
-to the next paragraph break).  Pages which have been proofread by
-volunteers (e.g., with initials VOL) will have a note within that page
-comment: <-- p. 345 pr=VOL -->.  Pages which have not been proofread
-yet (most of them) will have varying numbers of typographical errors
-in them.   We still (January 2012) need proofreaders to get the errors
-out of these dictionary files. 
+A more sophisticated formatting of spaces within paragraphs may require the
+use of the fully-tagged master files.  If you have a need for these files,
+contact Patrick Cassidy: cassidy@micra.com.
+
+The approximate beginning of each page is marked by an SGML comment of the
+form <-- p. 345 -->.  (The exact beginning was in some cases in the middle
+of a paragraph, which we decided was not a good location for these
+page-number comments, so the page number was usually moved to the next
+paragraph break).  Pages which have been proofread by volunteers (e.g., with
+initials VOL) will have a note within that page comment: <-- p. 345 pr=VOL
+-->.  Pages which have not been proofread yet (most of them) will have
+varying numbers of typographical errors in them.  We still (January 2012)
+need proofreaders to get the errors out of these dictionary files.
 
@@ -183,25 +173,23 @@ out of these dictionary files.
 
-This version is only a first typing, and has numerous typographic
-errors, including errors in the field-marks.  In addition, the user
-must keep in mind that this text is very old and will contain numerous 
-obsolete, inaccurate, and perhaps offensive statements, which are 
-included solely because this work is intended to reproduce accurately
-this historically interesting classic reference work.  This text should 
-not be relied upon as an accurate source of information, as in many
-cases it represents the state of knowledge around 1890.  The text is
-provided "as is", and the user must accept responsibility for all
-consequences  of its use. Please refer to the header of each file and
-the GNU public license.  If these conditions of use are unacceptable,
-please do not use these texts.
-
-This electronic dictionary is also made available as a potential
-starting point for development of a modern comprehensive encyclopedic
-dictionary, to be accessible freely on the internet, and developed by
-the efforts of all individuals willing to help build a large and
-freely available knowledge base.  A large number of collaborators are
-needed to bring this dictionary to a more accurate, more modern,  and
-more useful state. Anyone willing to assist in any way in constructing
-such a knowledge base should contact Patrick Cassidy (see above).  All
-reports of errors will be gratefully received, and should also be
-transmitted to PC at: pc@worldsoul.org.
+This version is only a first typing, and has numerous typographic errors,
+including errors in the field-marks.  In addition, the user must keep in
+mind that this text is very old and will contain numerous obsolete,
+inaccurate, and perhaps offensive statements, which are included solely
+because this work is intended to reproduce accurately this historically
+interesting classic reference work.  This text should not be relied upon as
+an accurate source of information, as in many cases it represents the state
+of knowledge around 1890.  The text is provided "as is", and the user must
+accept responsibility for all consequences of its use. Please refer to the
+header of each file and the GNU public license.  If these conditions of use
+are unacceptable, please do not use these texts.
+
+This electronic dictionary is also made available as a potential starting
+point for development of a modern comprehensive encyclopedic dictionary, to
+be accessible freely on the internet, and developed by the efforts of all
+individuals willing to help build a large and freely available knowledge
+base.  A large number of collaborators are needed to bring this dictionary
+to a more accurate, more modern, and more useful state. Anyone willing to
+assist in any way in constructing such a knowledge base should contact
+Patrick Cassidy (see above).  All reports of errors will be gratefully
+received, and should also be transmitted to PC at: pc@worldsoul.org.
 
@@ -237,4 +225,4 @@ For other tags, see the file "tagset.txt"
 In addition to the main text of the dictionary, additional explanatory
-material about this version of the dictionary is available in the
-ancillary files:
+material about this version of the dictionary is available in the ancillary
+files:
 
@@ -259,4 +247,4 @@ pronunciations.
 
-A copy of the dictionary page describing the pronunciation symbols used
-in the original work.
+A copy of the dictionary page describing the pronunciation symbols used in
+the original work.
 
@@ -264,4 +252,4 @@ in the original work.
 
-This file lists original pronunciation symbols with the corresponding
-markup entities used in this version.
+This file lists original pronunciation symbols with the corresponding markup
+entities used in this version.
 
@@ -277,22 +265,25 @@ A copy of the original title page.
 
-Description of the special escape sequences used in this dictionary.
-This file also explains the Greek transliteration syntax used in it.
+Description of the special escape sequences used in this dictionary.  This
+file also explains the Greek transliteration syntax used in it.
 
 * DICTIONARY LOOKUP
-===================
+
 The GNU Dico project contains a module for reading GCIDE files.  This
-distribution provides a configuration file "gcide.conf" which you can
-use with the "dicod" server in order to look up words in the
-dictionary.  See http://www.gnu.org.ua/software/dico for a description
-of GNU Dico, including links to download.
+distribution provides a configuration file "gcide.conf" which you can use
+with the "dicod" server in order to look up words in the dictionary.  See
+http://www.gnu.org.ua/software/dico for a description of GNU Dico, including
+links to download.
 
-The instructions below describe how to configure GNU Dico server
-(dicod) to access a copy of the GCIDE dictionary.
+The instructions below describe how to configure GNU Dico server (dicod) to
+access a copy of the GCIDE dictionary.
 
 1. Unpack the GCIDE dictionary;
+
 2. Copy the file "gcide.conf" to a directory where you keep your local
 configuration files (/etc or /usr/local/etc are usual choices).
-3. Replace the word GCIDE_PATH in the "gcide.conf" statement with the
-path to the gcide-0.51 dicrectory.  You can omit this step and use the
--D option instead:
+
+3. Replace the word GCIDE_PATH in the "gcide.conf" statement with the path
+to the gcide-0.51 directory.  You can omit this step and use the -D option
+instead:
+
 4. Check the configuration file.  Run:
@@ -305,23 +296,20 @@ If no errors are reported, then go to the step 5.
 
-5. Start "dicod".  Run the same command as described in step 4, but
-without the "--lint" option.  This will start the dictionary server
-which will be avaialble on localhost (127.0.0.1) port 2628.  The
-server provides extensive searching facilities.  It also parses the
-GCIDE markup and automatically reformats the articles before returning
-them.
+5. Start "dicod".  Run the same command as described in step 4, but without
+the "--lint" option.  This will start the dictionary server which will be
+available on localhost (127.0.0.1) port 2628.  The server provides extensive
+searching facilities.  It also parses the GCIDE markup and automatically
+reformats the articles before returning them.
 
-Now you can access the dictionary using dico (a GNU dictionary command
-line utility), or another dictionary client program (such as Kdict or
-the like).
+Now you can access the dictionary using dico (a GNU dictionary command line
+utility), or another dictionary client program (such as Kdict or the like).
 
 * OTHER VERSIONS OF THE DICTIONARY
-==================================
+
 There are several other derivative versions of this dictionary on the
-internet, in some cases reformatted or provided with an interface.
-Those that I am aware of are:
+internet, in some cases reformatted or provided with an interface.  Those
+that I am aware of are:
 
 ** Dicoweb 
-----------
-This version of GCIDE is available online at the GNU Dico web
-site:
+
+This version of GCIDE is available online at the GNU Dico web site:
 
@@ -332,23 +320,23 @@ The site provides extensive search facilities.
 ** Project Gutenberg
----------------------
+
 In the extext96 directory of Project Gutenberg
-(http://www.gutenberg.org/dirs/etext96), there is a version of the
-original 1913 dictionary, which is in the **public domain**.  The main
-files are labeled pgw050*.*.  The tags for that version are a subset
-of those used in this GNU version.
+(http://www.gutenberg.org/dirs/etext96), there is a version of the original
+1913 dictionary, which is in the **public domain**.  The main files are
+labeled pgw050*.*.  The tags for that version are a subset of those used in
+this GNU version.
 
 ** The DICT development group
-------------------------------
-This group has created a program to index and search this dictionary.
-The program can be downloaded and used locally, but at present is
-available only in a Unix-compatible executable version.  See their web
-site at http://www.dict.org.
+
+This group has created a program to index and search this dictionary.  The
+program can be downloaded and used locally, but at present is available only
+in a Unix-compatible executable version.  See their web site at
+http://www.dict.org.
 
 ** The University of Chicago ARTFL project
-------------------------------------------
-Mark Olsen and Gavin LaRowe at the University of Chicago have
-converted the original 1913 dictionary to HTML and have provided an
-interface allowing search of the headwords.  When the supplemented
-version has developed sufficiently to warrant the effort, a similar
-searchable version may be posted there as well.  The search page is at:
+
+Mark Olsen and Gavin LaRowe at the University of Chicago have converted the
+original 1913 dictionary to HTML and have provided an interface allowing
+search of the headwords.  When the supplemented version has developed
+sufficiently to warrant the effort, a similar searchable version may be
+posted there as well.  The search page is at:
 
@@ -356,5 +344,5 @@ searchable version may be posted there as well.  The search page is at:
 
-That page will provide links to other ARTFL projects and contact
-information for the ARTFL group, who alone can provide information  
-about the HTML version or interface.
+That page will provide links to other ARTFL projects and contact information
+for the ARTFL group, who alone can provide information about the HTML
+version or interface.
 
@@ -366,2 +354,3 @@ paragraph-separate: "[ 	]*$"
 version-control: never
+fill-column: 76
 End:
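
Reviewer note on the README text in the diff above.  The README describes
two conventions that a reader may want to handle programmatically: page
markers of the form <-- p. 345 --> (optionally carrying pr=VOL for the
proofreader's initials), and paired <tagname>...</tagname> fields in which
<p>...</p> does not always nest properly with tags such as <note>.  The
following is only a minimal sketch in Python of how those two conventions
could be read.  The function names page_marks and unbalanced_tags are
hypothetical and are not part of GCIDE or GNU Dico, and the regular
expressions assume the markers look exactly like the README's examples.

```python
import re

# Page markers as shown in the README: "<-- p. 345 -->" or
# "<-- p. 345 pr=VOL -->".  Whitespace may include a line break.
PAGE_MARK = re.compile(r"<--\s*p\.\s*(\d+)(?:\s+pr=(\w+))?\s*-->")

# Paired field tags: "<name>" or "</name>".  Entities such as "<br/"
# have no closing ">" right after the name, so they never match here.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def page_marks(text):
    """Return (page_number, proofreader_or_None) for each page marker."""
    return [(int(m.group(1)), m.group(2)) for m in PAGE_MARK.finditer(text)]


def unbalanced_tags(text, ignore=("p",)):
    """Return names of tags that are closed without a matching open tag,
    followed by names of tags left open at the end of the text.

    Tags listed in `ignore` are skipped, following the README's advice
    to disregard <p>...</p> when checking nesting.
    """
    stack = []      # currently open tag names, innermost last
    problems = []   # closing tags that did not match the innermost open tag
    for m in TAG.finditer(text):
        closing = m.group(1) == "/"
        name = m.group(2)
        if name in ignore:
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(name)
    return problems + stack
```

With <p> ignored, a note spanning two paragraphs, as in
"<p><note>a</p><p>b</note></p>", is reported as balanced, while a stray
"</note>" or an unclosed "<pos>" is reported by name.
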
diff --git a/tagset.txt b/tagset.txt
index 9a7a501..0093d42 100644
--- a/tagset.txt
+++ b/tagset.txt
@@ -1,11 +1,12 @@
-            FIELD MARKS FOR WEBSTER 1913 and CIDE
-            =====================================
-     Explanations of the tags used to mark the Webster 1913 dictionary
-and the CIDE (Collaborative International Dictionary of English).
-Note that the list of tags used to mark the public domain version
-of this dictionary is shorter than the full set described here.
-    If any tag is not listed here, it is either (1) one of the 
-"point" (font size) or "type" (font style) tags, which should be
-self-explanatory; or (2) is a functional field with no effect on the
-typography.
+FIELD MARKS FOR WEBSTER 1913 and CIDE
+=====================================
+
+* Overview
+
+This file describes the tags used to mark the Webster 1913 dictionary and
+the GCIDE (GNU Collaborative International Dictionary of English).
+
+If any tag is not listed here, it is either (1) one of the "point" (font
+size) or "type" (font style) tags, which should be self-explanatory; or (2)
+is a functional field with no effect on the typography.
 
@@ -17,110 +18,141 @@ Last modified March 12, 1999.
      (908) 561-3416   or (908) 668-5252
--------------------------------------------------------------
-A separate file, webfont.txt, contains the list of the individual 
+
+A separate file, webfont.txt, contains the list of the individual
 non-ASCII characters represented by either higher-order hexadecimal
-character marks (e.g., \'94, for o-umlaut) or by entity tags
-(e.g., <root/, for the square root symbol.)
---------------------------------------------------------------
-     Use of tags:
-     In the MICRA electronic version of the 1913 Webster, each part of
-the entry headed by an entry word ("headword") is labeled so that no
-part of the entry except some punctuation marks should be found
-outside of all fields, i.e. every character should be within some tagged
-field.  In the following description, the word "segment" usually refers to 
-a major part of an entry such as an etymology or a definition or a 
-collocation segment or a usage block, containing more than one field.
-The term "field" may also be used similarly to "segment", but may also
-denote single-word fields, such as an alternative spelling, labeled <asp>.
-
-   Note: The tags on this list are similar in structure to SGML tags.  Each
-tag on this list  marks a field; each field opens with a tagname between
-angle brackets thus: <tagname>, and closes with a similar tag containing
-the forward slash thus: </tagname>.  No tags are used without closing
-tags.  Thus the HTML <BR> to indicate a line break is symbolized
-here as an entity, <br/, and every <p> has a corresponding </p>.
-    The absence of an end-field tag, or the presence of an end-field tag
-without a prior begin-field tag constitutes a typographical error, of which
-there may be a significant number.  Any errors detected should be brought
-to the attention of PJC or the appropriate editor.
-   Most of the tagged fields are presented in the text in italic type, 
-with a number of exceptions.  Where a word is contained within more than 
-one field, the innermost field determines the font to be used.  Wherever
-recognizable functional fields were found, an attempt was made to tag the
-field with a functional mark, but in many cases, words were italicised only
-to represent the word itself as a discourse entity, and in some such cases,
-the "italic" mark <it> was used, implying nothing regarding functionality
-of the word.  The base font is considered "plain".  Where an italic field
-is indicated, parentheses or brackets within the field are not italicised.
-   Where no font is specified for a tag, the tag is merely a functional
+character marks (e.g., \'94, for o-umlaut) or by entity tags (e.g.,
+<root/, for the square root symbol.)
+
+* Introduction
+
+In the MICRA electronic version of the 1913 Webster and in GCIDE, each part
+of the entry headed by an entry word ("headword") is labeled so that no part
+of the entry except some punctuation marks should be found outside of all
+fields, i.e. every character should be within some tagged field.  In the
+following description, the word "segment" usually refers to a major part of
+an entry such as an etymology or a definition or a collocation segment or a
+usage block, containing more than one field.  The term "field" may also be
+used similarly to "segment", but may also denote single-word fields, such as
+an alternative spelling, labeled <asp>.
+
+The tags on this list are similar in structure to SGML tags.  Each tag on
+this list marks a field; each field opens with a tagname between angle
+brackets thus: <tagname>, and closes with a similar tag containing the
+forward slash thus: </tagname>.  No tags are used without closing tags.
+Thus a line break (similar to HTML <br> tag) is symbolized here as an
+entity, <br/, and every <p> has a corresponding </p>.
+
+The absence of an end-field tag, or the presence of an end-field tag without
+a prior begin-field tag constitutes a typographical error, of which there
+may be a significant number.  Any errors detected should be brought to the
+attention of PJC or the appropriate editor.
+
+Most of the tagged fields are presented in the text in italic type, with a
+number of exceptions.  Where a word is contained within more than one field,
+the innermost field determines the font to be used.  Wherever recognizable
+functional fields were found, an attempt was made to tag the field with a
+functional mark, but in many cases, words were italicised only to represent
+the word itself as a discourse entity, and in some such cases, the "italic"
+mark <it> was used, implying nothing regarding functionality of the word.
+The base font is considered "plain".  Where an italic field is indicated,
+parentheses or brackets within the field are not italicised.
+
+Where no font is specified for a tag, the tag is merely a functional
 division, and was printed in plain font unless otherwise tagged.  This type
-of segment is marked by an asterisk (*) where the font name would be.
-   The size of the "plain" font in the original text is about 1.6 mm for
-the height of capitalized letters.
-=============================================================
-Explicit typographical tags:
-   These were used where the purpose of a different font was merely to
-distinguish a word from the body of the text, and no explicit functional
-tag seemed apropriate.
------------------------------------
-Tag        Font
------------------------------------
-Explicit formatting tags:
-. . . . . . . . . . . . . . . . . . 
-<plain>    plain font (that used in the body of a definition) --
-              normally not marked, except within fields of
-              a different front.
-<it>       italic  (in master files)
-<i>        italic  (for use in HTML presentation)
-<bold>     bold    (in master files)
-<b>        bold    (for use in HTML presentation)
-<colf>   bold,    Collocation font.  Same font as used in collocations.
-        smaller      This is used only in the list of "un-" words not
-        by 1 point   actually defined in the dictionary.  Probably could be
-                     replaced by a segment mark for the entire list!
-                     The "un-" words should be indexed as headwords.
-
-<ct>   bold    Same as <colf>, a font similar to that used in 
-                 collocations.  However, this tag is used in a table
-                 and could be set to a different font.
-
-<h1>      *     HTML tag -- largest heading font.
-
-<h2>      *     HTML tag -- second largest heading font.
-
-<headrow> *    Marks a Row title in a table.
-
-<hwf>      Font the same as the headword <hw>, though the field is
-                 not a headword.  Used only once.
-
-<mitem>  *   Multiple items, a set of items in a table.
-<point ...> A series of point size markers, many unique.
-<point1.5> *  One of the tags of the form <point**> where **
-<point6>        represents the typographic point size of the 
-                enclosed text.
-<pre>     An HTML tag indicating that the enclosed text is
-             of teletype form, preformatted in a uniform-spaced
-             font.
-<sc>       small caps    (used mostly for "a. d.",  "b. c.")
-              This is the same font a <er>, but has no functional
-              or semantic significance
-<str>       group of table data elements in a table
-<sub>       subscript, like <subs>
-<subs>      subscript
-<sups>      superscript
-<supr>      superscript
-<sansserif> Sans-serif font
-<stypec>    Bold (collocation font) and also a subtype.
-<tt>         HTML tage -- teletype font
-<universbold>  A squared bold font without serifs approximating the
-               "universe bold" font on the HP Laserjet4, slightly 
-               larger than the capitals in a definition body.  Used
-               in expositions describing shapes, such as
-                  "Y", "T", "U", "X", "V", "F".
-<vertical>  Vertically organized column.
-<column1>   Vertically organized column -- only part of a table
-              which needs to be completed.  Used once.
-<...type>   A series of tags, many unique, designating certain
-              unusual fonts, such as "bourgeoistype" for
-              "bourgeois type", in the section on typography.
-           Most of these occur only once, in the section on fonts.
+of segment is marked by an asterisk (*) where the font name would be.  The
+size of the "plain" font in the original text is about 1.6 mm for the height
+of capitalized letters.
+
+* Explicit typographical tags
+
+These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional tag
+seemed appropriate.
+
+-------------------------------------------------------------------------
+Tag           Font         Description 
+-------------------------------------------------------------------------
+<plain>       plain font   that used in the body of a definition -- normally
+                           not marked, except within fields of a different
+                           font.
+			   
+<it>          italic       in master files
+
+<i>           italic       for use in HTML presentation
+
+<bold>        bold         in master files
+
+<b>           bold         for use in HTML presentation
+
+<colf>        bold,        Collocation font.  Same font as used in
+                           collocations. 
+              smaller      This is used only in the list of "un-"
+	      by 1 point   words not actually defined in the
+			   dictionary. 
+	                   Probably could be replaced by a segment mark
+			   for the entire list!  The "un-" words should
+			   be indexed as headwords.
+			   
+<ct>          bold         Same as <colf>, a font similar to that used 
+                           in collocations.  However, this tag is used
+			   in a table and could be set to a different
+			   font.
+			   
+<h1>          *            HTML tag -- largest heading font.
+
+<h2>          *            HTML tag -- second largest heading font.
+
+<headrow>     *            Marks a Row title in a table.
+
+<hwf>                      Font the same as the headword <hw>, though
+                           the field is not a headword.  Used only
+			   once.
+			   
+<mitem>       *            Multiple items, a set of items in a table.
+<point ...>                A series of point size markers, many
+                           unique.
+			   
+<point1.5>    *            One of the tags of the form <point**> where **
+<point6>                   represents the typographic point size of the 
+                           enclosed text.
+			   
+<pre>                      An HTML tag indicating that the enclosed
+                           text is of teletype form, preformatted in a
+                           uniform-spaced font.
+			   
+<sc>          small caps   used mostly for "a. d.",  "b. c."
+                           This is the same font as in <er>, but has no
+                           functional or semantic significance.
+			   
+<str>                      group of table data elements in a table.
+
+<sub>         subscript
+
+<subs>        subscript
+
+<sups>        superscript
+
+<supr>        superscript
+
+<sansserif>   Sans-serif
+
+<stypec>      Bold         collocation font, and also a subtype.
+
+<tt>                       HTML tag -- teletype font
+
+<universbold>              A squared bold font without serifs approximating
+                           the "universe bold" font on the HP Laserjet4,
+                           slightly larger than the capitals in a definition
+                           body.  Used in expositions describing shapes,
+                           such as "Y", "T", "U", "X", "V", "F".
+			   
+<vertical>                 Vertically organized column.
+
+<column1>                  Vertically organized column -- only part of a table
+                           which needs to be completed.  Used once.
+			   
+<...type>                  A series of tags, many unique, designating
+                           certain unusual fonts, such as "bourgeoistype"
+                           for "bourgeois type", in the section on
+                           typography.  Most of these occur only once, in
+                           the section on fonts.  Some examples follow:
 <antiquetype>
@@ -148,347 +180,382 @@ Explicit formatting tags:
 
-=============================================================
-Tags with semantic content:
-. . . . . . . . . . . . . . . . . . . . . . . . . . . 
-<altsp>    *          Alternative spelling segment.  Almost always
-                         contained within square brackets after the main
-                          definition segment.  Expository words
-                          such as "Spelled also" are in plain font;
-                          the actual alternative spelling is marked by
-                          <asp> ...  </asp> tags within this segment.
-
-<ant>     italic     Antonym.
-
-<asp>    italic      Alternative spelling.  The actual word which is an
-                          alternative spelling to the headword.  These
-                          are functionally synonyms of the headword.  In
-                          most cases these also occur as headwords, with
-                          reference to the word where the actual definition
-                          is found, but not all such words are listed
-                          separately, particularly if the spelling is
-                          close enough to the headword to be found at the
-                          same point in the dictionary.  Whether listed
-                          separately or not, these words should
-                          be indexed at this location, also.
-
-<au>    italic          Authority or author.  Used where an authority is
-      (may be right-       given for a definition, and also used for the
-       justified. See      author, where a quotation within double quotes
-       in the section      is given in  the same paragraph as the
-       on formatting).     definition.  The double quotes are indicated
-                           by the open-quote (\'bd) and close-quote
-                           (\'b8).   In both cases, it is typically
-                           right-justified, almost always fitting on
-                           the same line with the last line of the
-                           definition or quotation.
-                               Within collocation segments, it is usually
-                           used only after quotations, and is not right-
-                           justified, except occasionally where it
+* Tags with semantic content:
+
+-------------------------------------------------------------------------
+Tag           Font         Meaning and Description 
+-------------------------------------------------------------------------
+<altsp>       *            Alternative spelling segment.  Almost always
+                           contained within square brackets after the main
+                           definition segment.  Expository words such as
+                           "Spelled also" are in plain font; the actual
+                           alternative spelling is marked by <asp> ...
+                           </asp> tags within this segment.
+			   
+<ant>         italic       Antonym.
+
+<asp>         italic       Alternative spelling.  The actual word which is
+                           an alternative spelling to the headword.  These
+                           are functionally synonyms of the headword.  In
+                           most cases these also occur as headwords, with
+                           reference to the word where the actual definition
+                           is found, but not all such words are listed
+                           se