3 files changed, 534 insertions, 522 deletions
diff --git a/pronunc.txt b/pronunc.txt
index 5db6a9f..c2607c9 100644
--- a/pronunc.txt
+++ b/pronunc.txt
@@ -1,4 +1,4 @@
-file PRONUNC.WEB
+File pronunc.txt
 ================
    This file gives a number of examples of pronunciation, 
 using the entity symbols representing the pronunciations as
diff --git a/tagset.txt b/tagset.txt
index f0b9367..9a7a501 100644
--- a/tagset.txt
+++ b/tagset.txt
@@ -1,13 +1,13 @@
             FIELD MARKS FOR WEBSTER 1913 and CIDE
             =====================================
-Tagset.web:
      Explanations of the tags used to mark the Webster 1913 dictionary
 and the CIDE (Collaborative International Dictionary of English).
 Note that the list of tags used to mark the public domain version
 of this dictionary is shorter than the full set described here.
     If any tag is not listed here, it is either (1) one of the 
-"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
-        (2) Is a functional field with no effect on the typography.
+"point" (font size) or "type" (font style) tags, which should be
+self-explanatory; or (2) is a functional field with no effect on the
+typography.
 
 Last modified March 12, 1999.
      For questions, contact:
@@ -16,7 +16,7 @@ Last modified March 12, 1999.
      Plainfield, NJ 07062
      (908) 561-3416   or (908) 668-5252
 -------------------------------------------------------------
-A separate file, webfont.asc, contains the list of the individual 
+A separate file, webfont.txt, contains the list of the individual 
 non-ASCII characters represented by either higher-order hexadecimal
 character marks (e.g., \'94, for o-umlaut) or by entity tags
 (e.g., <root/, for the square root symbol.)
diff --git a/webfont.txt b/webfont.txt
index 591e980..d432fe5 100644
--- a/webfont.txt
+++ b/webfont.txt
@@ -1,88 +1,70 @@
                  WEBSTER FONTS
                  =============
 
-          Fonts for the Webster 1913 Dictionary.
-          For version 0.50
-          Last edit May 5, 2001
-          ______________________________________
- (This file contains some extended ASCII characters, and should be
-transmitted in binary mode)
----------------------------------------------------------------------- 
-
-    This file describes a modified font for use in visualizing the
-text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913),
-usable for the DOS operating system of IBM-compatible personal computers.
-The electronic version of that dictionary and this font were prepared by
-MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc.
-For details of permissions and restrictions on using these files, see
-the accompanying file "readme.web".
-    The special characters used in the electronic version of the Webster
+* Overview
+
+This file describes special symbols and markup entities used in the
+GNU Collaborative International Dictionary of English.
+
+* Introduction
+
+The special characters used in the electronic version of the Webster
 1913 are required for visualizing unusual characters used in the
 etymology and pronunciation fields of the dictionary, in a form
-comparable to the way they appear in the original.  Since there are
-more than 256 characters used in that dictionary, not all can be
-represented by single-byte codes, and are instead represented by
-SGML-style "short-form" symbols.  (rather than the "entity" format
-"&xx;"  The ampersand is used frequently, and we prefer to leave
-the "<" as the only "escape" character) of the type <x/ where x
-is a specific code for the symbol in the dictionary.
-See the "Short Form" section below for details about such characters.
-Note that the symbols used here are in some cases abbreviations
-(for compactness) of the ISO 8879 recommended symbols.  If necessary,
-the table below allows simple replacement by alternate encodings.
-    This symbol font can be loaded in IBM-compatible (x86) computers
-running the DOS operating system by using the "font.bat" command file
-in the "utils" directory.  The fonts files for 8x14 and 8x16 fonts are
-"web14.fnt" and "web16.fnt" respectively.
-    For those loading the Webster onto some machine other than an
-IBM-compatible running DOS, it will be necessary to provide a
-translation table, to convert these characters into a code that
-can be handled by that computer.  For this reason, I attach an
-"explanation" for each character, for those who cannot view
-the original DOS font.
-    The DOS-loadable font does not contain all of the characters needed
-to depict the etymologies or the pronunciations.  In addition to an
-absence of several characters used in the pronunciations, no Greek letters are
-included.  The Greek words appearing in the etymologies,
-when they are included, will be typed in a
-roman-letter transcription (See section on Greek transcription, below).
-Only a very few Greek words have been thus transcribed as of the
-present version (version 0.41).
-    Wherever the typists did not know the character to use, they
-usually inserted a reverse-video question mark (decimal 176).
-This appears in full-ASCII versions as <?/.  This mark was used both for
-characters in non-ASCII fonts, and for unreadable characters (i.e.,
-characters smeared in the original or distorted in the copies available
-to the typists. The type in the original was in many places smeared and
+comparable to the way they appear in the original.
+
+The GCIDE markup provides two ways for representing such characters:
+using special "escape sequences" and using special markup entities.
+Historically, "escape sequences" were used to indicate the
+character's ordinal position in a special font, prepared by MICRA,
+Inc. to represent it on screen.  Although nowadays this method is
+obsolete, the dictionary corpus still uses these sequences.  This file
+describes their mapping to Unicode characters.
+
+An escape sequence has the form \'xx, where "x" represent lowercase
+hexadecimal digits.  For example, \'94 stands for "o" with diaeresis.
+There are only 256 such sequences.
+
+Special markup entities are able to represent a wider range of
+characters.  A markup entity is similar to SGML one, but has a
+different format.  The traditional &xx; format was judged inconvenient
+because the ampersand is used frequently in the corpus.  Instead,
+GCIDE entities have the format <WORD/, where "<" and "/" represent the
+beginning and end of the entity and WORD represents the character
+itself.  Valid WORDs are in some cases abbreviations (for compactness)
+of the ISO 8879 recommended symbols.  Characters representable by
+escape sequences can also be represented by entities, but the reverse
+is not true, due to a limited range of the former.
+
+The Greek words appearing in the etymologies, when they are included,
+are typed in a roman-letter transcription, which is described below in
+chapter "Greek transliteration".
+
+* Unrecognized characters
+
+Wherever the typists did not know the character to use, they usually
+inserted a reverse-video question mark (decimal 176).  This appears in
+full-ASCII versions as <?/.  This mark was used both for characters in
+non-ASCII fonts, and for unreadable characters (i.e., characters
+smeared in the original or distorted in the copies available to the
+typists. The type in the original was in many places smeared and
 illegible at the left and right page margins; occasionally, small
 parts of words were blotted out by plain white space).
-    A character table for the high-order characters appears below.
-Under that is a list and description of most of the special characters
-used in the Webster files.
-     Note that there are yet some characters used in the etymologies,
-and some other symbols, which are not in this list.  For example, the
-vowels with a double dot *underneath*, e.g. a (as in all) have no representation
-in this character set, and, where explicitly entered in the dictionary,
-are represented by <xdd/ where "x" is the letter, as in "<add/".
-
-ITALICS
--------
-   In most places, italic font is represented by the tags <it>...</it>
+
+* Italics
+
+In most places, italic font is represented by the tags <it>...</it>
 surrounding the italic text, or by some other tag which also implies
-italic font.  In the pronunciations, however, where italicized vowels 
+italic font.  In the pronunciations, however, where italicized vowels
 are used among non-italic and other special characters to indicate
-pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/, 
+pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
 are also used to indicate the italicized vowel.
 
-DIACRITICS
--------------
-     The European grave and acute accents are represented by the
-standard (IBM PC) high-order codes.  Other characters with diacritics
-are represented by special "entity" codes, and in some cases also
-are found in this special WEB1913 font, described below.
-     Vowels with a circle above (as in Swedish) are coded <xring/
-(x with a ring, or "degrees" mark over it); vowels with tilde over them
-are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
+* Diacritics
+
+Vowels with a circle above (as in Swedish) are coded <xring/ (x with a
+ring, or "degrees" mark over it); vowels with tilde over them are
+represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/ 
 also has code 238); letters with a dot above are represented by <xdot/
 -- letter with a dot below are represented by <xsdot/ ("subdot");
 vowels with the semi-long mark (a macron with a short perpendicular
@@ -93,70 +75,57 @@ the "oo" with an unbroken macron above the two letters, <aemac/ = the
 ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature
 oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent
 (breve) above have codes in this list, but may also be represented by
-<xum/ and <xcr/ respectively.  There is an occasional hacek or caron mark
-(an inverted circumflex) in the original; such letters are coded <xcar/.
-The o with a caron has code 213, but no others are in this font list.
+<xum/ and <xcr/ respectively.  There is an occasional hacek or caron
+mark (an inverted circumflex) in the original; such letters are coded
+<xcar/.  The o with a caron has code 213, but no other letter with a
+caron is representable by an escape sequence.
+
 The diaeresis is treated typographically as identical to the umlaut.
-   A special modification, used only for poetry (see entry "saturnian verse"
-under "saturnian") is a vowel with a macron, in which the macron is lighter
-than the usual macron, signifying a stressed syllable which has a short
-vowel sound.  This is represented by <xsmac/ ("short mac").
-   Another special character used in pronunciations is an "n" with an underline (like
-a macron, but below the letter), used to represent the "ng" sound.  This is coded
-<nsm/ ("n sub-macron").  The ligated th used in pronunciations to depict the
-"th" sound of "the" is coded as <th/.
-    NOTE: the letter combinations "fi" and "fl" are invariably printed as the
+A special modification, used only for poetry (see entry "saturnian
+verse" under "saturnian") is a vowel with a macron, in which the
+macron is lighter than the usual macron, signifying a stressed
+syllable which has a short vowel sound.  This is represented by
+<xsmac/ ("short mac").
+
+Another special character used in pronunciations is an "n" with an
+underline (like a macron, but below the letter), used to represent the
+"ng" sound.  This is coded <nsm/ ("n sub-macron").  The ligated th
+used in pronunciations to depict the "th" sound of "the" is coded as
+<th/.
+
+NOTE: the letter combinations "fi" and "fl" are invariably printed as the
 ligatures &filig; and &fllig;, but these ligatures are not marked as such
 in this transcription, and the two letters are left as individuals.
 
-SPECIAL SYMBOLS
-   The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used.
-    The double prime, or "seconds" of a degree is sometimes represented by
-a double "light accent" (code 183 = \'b7).  In other places, and in later
-versions, it is represented by <sec/ = hex a9, in the webfont.
-   The symbols "greater than" <gt/ and "less than" are encountered only
-once, but are distinguished from the right- and left-angle brackets
-(> and <) because of possible typographical differences in some fonts.
-   The schwa is symbolized by <schwa/.  It is not used in the 
-pronunciations, but is mentioned as a symbol.
-   The right-pointing arrow is <rarr/, consistent with ISO 8879.
-
-----------------------------------
-Table 1     
-----------------------------------
-Numbers
-�������                   Hex codes
-1       
-11                    (12 is a hard page break, 13 CR, 14 sect break)
-21    
-31   !"#  $%&'(         
-121 yz{|}  ~���         79-7d 7e-82
-131 �����  �����         83-87 88-8c
-141 �����  �����         8d-91 92-96
-151 �����  �����         97-9b 9c-a0
-161 �����  �����         a1-a5 a6-aa
-171 �����  �����         ab-af b0-b4
-181 �����  �����         b5-b9 ba-be
-191 �����  �����         bf-c3 c4-c8
-201 �����  �����         c9-cd ce-d2
-211 �����  �����         d3-d7 d8-dc
-221 �����  �����         dd-e1 e2-e6
-231 �����  �����         e7-eb ec-f0
-241 �����  �����         f1-f5 f6-fa
-251 �����                fb-ff
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-Below is a complete list of the symbols used in the Webster ("webfont") 
-which are encoded in the special font listed above, together with
-corresponding symbols in ISO 8879 and Tex coding.  Much of this table was
-prepared by Rik Faith, to whom we express our appreciation.
-    The "nearest ASCII" equivalents are given for those who want to
-display the data as best one can in 7-bit simple ASCII symbols without
+* Special symbols
+
+The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are
+rarely used.
+
+The double prime, or "seconds" of a degree is sometimes represented by
+a double "light accent" (code 183 = \'b7).  In other places, and in
+later versions, it is represented by <sec/ = \'a9.
+
+The symbols "greater than" <gt/ and "less than" are encountered only
+once, but are distinguished from the right- and left-angle brackets (>
+and <) because of possible typographical differences in some fonts.
+
+The schwa is symbolized by <schwa/.  It is not used in the
+pronunciations, but is mentioned as a symbol.  The right-pointing
+arrow is <rarr/, consistent with ISO 8879.
+
+* Symbol summary
+
+Below is a complete list of the symbols used in the Webster, together
+with their "webfont" number (escape sequence), corresponding markup
+entity, and corresponding symbols in ISO 8879 and Tex coding.  Much of
+this table was prepared by Rik Faith, to whom we express our
+appreciation.
+
+The "Uc" column gives the Unicode representation of the character.
+The "nearest ASCII" equivalents are given for those who want to
+display the data as best one can in 7-bit simple ASCII symbols without 
 using the "entity" symbols.
-=========================================================================
-----------------------------------
-Table 2     
-----------------------------------
 
 Comments: 
   (1) The symbol in the "entity" column is the SGML-like symbol used in
@@ -164,19 +133,19 @@ Comments:
       the symbol for the same character given in "The user's guide to
       ISO 8879" by Smith and Stutely.
   (2) An asterisk "*" in the "entity" column means that this symbol and 
-code value is not used in any form in the Webster 1913 electronic version.
+code value is not used in any form in GCIDE.
   (3)  If no asterisk is in the "entity" column, and no other symbol is 
 there, this means that in the Webster, only the hexadecimal representation
 was used (e.g. for \'d8, \'bd, and  \'b8).
   (4) \'b6 and \'b7, the heavy and light "accents", are never above a
 letter (these are not diacritical marks), but in-between letters, as the
-stress accent used in the headwords and pronunciations.  The accent
-*follows* the syllable accented.  The light accent \'b7 is also used as
-the "prime" in mathematical expressions (e.g. a\'b7 = "a prime"), or as
- "minutes" in degrees-minutes-seconds, and when doubled (\'b7\'b7)
-serves as "double prime" in mathematical expressions, and as "seconds"
-in degrees-minutes-seconds.  The character \'a9 (<sec/ or &Prime;) is
-also used to represent the double prime.
+stress accent used in the headwords and pronunciations.  The
+accent *follows* the syllable accented.  The light accent \'b7 is
+also used as the "prime" in mathematical expressions (e.g. a\'b7 = "a
+prime"), or as "minutes" in degrees-minutes-seconds, and when doubled
+(\'b7\'b7) serves as "double prime" in mathematical expressions, and
+as "seconds" in degrees-minutes-seconds.  The character \'a9 (<sec/ or
+&Prime;) is also used to represent the double prime.
   (5) Although the semilong vowels are in the table (e.g. the "asl" 
 = "a semilong", most of the entries in the ASCII version dictionary
 use the <xsl/ symbol coding.  If you know of any printers' names for
@@ -186,419 +155,462 @@ Latin-2 table), but the other vowels don't, in the Smith & Stutely book.
 Is this a mistake?
   (7) The symbol <nsc/ is used for "N small capitals", used in
 pronunciations to represent the soun fo the nasal N in French words.
-  (8) A weak accent (when not in pronunciations) is symbolized by <prime/, the "minutes" (of a degree) symbol.  A strong accent is symbolized by <bprime/ ("bold prime", not an ISO entity).
+  (8) A weak accent (when not in pronunciations) is symbolized by
+<prime/, the "minutes" (of a degree) symbol.  A strong accent is
+symbolized by <bprime/ ("bold prime", not an ISO entity).
   (9) If you find any exceptions to these usage assertions, please
 let me know.
-----------------------------------------------------------------------------------------
-     webfont       ISO 8879    latin1/ascii    TeX    nearest     description
-------------------                                     ASCII
-oct dec hex entity             oct dec hex
---------------------------------------------------------------------------------
-025  21  15  *                                   \S       *     section symbol
-
-074  60  3c          lt         074  60  3c     $<$       <     less than
-076  62  3e          gt         076  62  3e     $>$       >     greater than
-
-200 128  80 <Cced/   Ccedil     307 199  c7     \c{C}     C     C cedilla
-201 129  81 <uum/    uuml       374 252  fc     \"u       ue    u umlaut (diaeresis)
-202 130  82 <eacute/ eacute     351 233  e9     \'e       e     e acute
-203 131  83 <acir/   acirc      342 226  e2     \^a       a     a circumflex
-204 132  84 <aum/    auml       344 228  e4     \"a       ae    a umlaut (diaeresis)
-205 133  85 <agrave/ agrave     340 224  e0     \`a       a     a grave
-206 134  86 <aring/  aring      345 229  e5     \aa       a     a ring above
-207 135  87 <cced/   ccedil     347 231  e7     \c{c}     c     c cedilla
-210 136  88 <ecir/   ecirc      352 234  ea     \^e       e     e circumflex
-211 137  89 <eum/    euml       353 235  eb     \"e       e     e umlaut (diaeresis)
-212 138  8a <egrave/ egrave     350 232  e8     \`e       e     e grave
-213 139  8b <ium/    iuml       357 239  ef     \"i       i     i umlaut (diaeresis)
-214 140  8c <icir/   icirc      356 238  ee     \^i       i     i circumflex
-215 141  8d <igrave/ igrave     354 236  ec     \`i       i     i grave
-216 142  8e <Aum/    Auml                                 A     A umlaut
-217 143  8f          Aring                                A     A ring above
-
-220 144  90 <Eacute/ Eacute     311 201  c9     \'E       e      E acute
-221 145  91 <ae/     aelig      346 230  e6     \ae       ae     ligature ae
-222 146  92 <AE/     AElig      306 198  c6     \AE       AE     ligature AE
-223 147  93 <ocir/   ocirc      364 244  f4     \^o       o      o circumflex
-224 148  94 <oum/    ouml       366 246  f6     \"o       oe     o umlaut (diaeresis)
-225 149  95 <ograve/ ograve     362 242  f2     \`o       o      o grave
-226 150  96 <ucir/   ucirc      373 251  fb     \^u       u      u circumflex
-227 151  97 <ugrave/ ugrave     371 249  f9     \`u       u      u grave
-230 152  98 <yum/    yuml                                 y      y umlaut
-231 153  99 <Oum/    Ouml                                 O      O umlaut
-232 154  9a <Uum/    Uuml       334 220  dc     \"U       U      U umlaut (diaeresis)
+----------------------------------------------------------------------------
+     webfont       ISO 8879   TeX             Uc  ASC   Description
+------------------                               
+oct dec hex entity             
+----------------------------------------------------------------------------
+025  21  15  *                 \S             §   *     section symbol
+
+074  60  3c          lt        $<$            <   <     less than
+076  62  3e          gt        $>$            >   >     greater than
+
+200 128  80 <Cced/   Ccedil    \c{C}          Ç   C     C cedilla
+201 129  81 <uum/    uuml      \"u            ü   ue    u umlaut (diaeresis)
+202 130  82 <eacute/ eacute    \'e            é   e     e acute
+203 131  83 <acir/   acirc     \^a            â   a     a circumflex
+204 132  84 <aum/    auml      \"a            ä   ae    a umlaut (diaeresis)
+205 133  85 <agrave/ agrave    \`a            à   a     a grave
+206 134  86 <aring/  aring     \aa            å   aa    a ring above
+207 135  87 <cced/   ccedil    \c{c}          ç   c     c cedilla
+210 136  88 <ecir/   ecirc     \^e            ê   e     e circumflex
+211 137  89 <eum/    euml      \"e            ë   e     e umlaut (diaeresis)
+212 138  8a <egrave/ egrave    \`e            è   e     e grave
+213 139  8b <ium/    iuml      \"i            ï   i     i umlaut (diaeresis)
+214 140  8c <icir/   icirc     \^i            î   i     i circumflex
+215 141  8d <igrave/ igrave    \`i            ì   i     i grave
+216 142  8e <Aum/    Auml                     Ä   A     A umlaut
+217 143  8f          Aring                    Å   Aa    A ring above
+
+220 144  90 <Eacute/ Eacute    \'E            É   E     E acute
+221 145  91 <ae/     aelig     \ae            æ   ae    ligature ae
+222 146  92 <AE/     AElig     \AE            Æ   AE    ligature AE
+223 147  93 <ocir/   ocirc     \^o            ô   o     o circumflex
+224 148  94 <oum/    ouml      \"o            ö   oe    o umlaut (diaeresis)
+225 149  95 <ograve/ ograve    \`o            ò   o     o grave
+226 150  96 <ucir/   ucirc     \^u            û   u     u circumflex
+227 151  97 <ugrave/ ugrave    \`u            ù   u     u grave
+230 152  98 <yum/    yuml                     ÿ   y     y umlaut
+231 153  99 <Oum/    Ouml                     Ö   O     O umlaut
+232 154  9a <Uum/    Uuml      \"U            Ü   U     U umlaut (diaeresis)
 233 155  9b
-234 156  9c <pound/  pound      243 163  a3     \pounds   *      pound sign (British)
+234 156  9c <pound/  pound     \pounds        £   *     pound sign (British)
 235 157  9d  *
 236 158  9e  *
 237 159  9f  *
-240 160  a0 <aacute/ aacute     341 225  e1     \'a        a     a acute
-241 161  a1 <iacute/ iacute     355 237  ed     \'i        i     i acute
-242 162  a2 <oacute/ oacute     363 243  f3     \'o        o     o acute
-243 163  a3 <uacute/ uacute     372 250  fa     \'u        u     u acute
-244 164  a4 <ntil/   ntilde     361 241  f1     \~n        ny    n tilde
-245 165  a5 <Ntil/   Ntilde                                NY    N tilde
-246 166  a6 <frac23/                      $\frac{2}{3}$    2/3   two-thirds
-247 167  a7 <frac13/                      $\frac{1}{3}$    1/3   one-third
+240 160  a0 <aacute/ aacute    \'a            á   a     a acute
+241 161  a1 <iacute/ iacute    \'i            í   i     i acute
+242 162  a2 <oacute/ oacute    \'o            ó   o     o acute
+243 163  a3 <uacute/ uacute    \'u            ú   u     u acute
+244 164  a4 <ntil/   ntilde    \~n            ñ   ny    n tilde
+245 165  a5 <Ntil/   Ntilde                   Ñ   NY    N tilde
+246 166  a6 <frac23/           $\frac{2}{3}$  ⅔   2/3   two-thirds
+247 167  a7 <frac13/           $\frac{1}{3}$  ⅓   1/3   one-third
 250 168  a8  *
-251 169  a9 <sec/    Prime                                       seconds (of degree or time)
-                                                                    Also, inches or double prime
+251 169  a9 <sec/    Prime                    ˝   ''    seconds (of
+                                                        degree or time)
+                                                        Also, inches
+							or double prime.
 252 170  aa  *
-253 171  ab <frac12/            275 189  bd  $\frac{1}{2}$  1/2  one-half
-254 172  ac <frac14/            274 188  bc  $\frac{1}{4}$  1/4  one-quarter
+253 171  ab <frac12/           $\frac{1}{2}$  ½   1/2   one-half
+254 172  ac <frac14/           $\frac{1}{4}$  ¼   1/4   one-quarter
 255 173  ad  *
 256 174  ae  *
 257 175  af  *
-260 176  b0 <?/                                              (?)  Place-holder
-                                            for unknown or illegible character.
+260 176  b0 <?/                                   (?)   Place-holder
+                                                        for unknown or
+							illegible
+							character.   
 261 177  b1  *
 262 178  b2  *
 263 179  b3  *
-264 180  b4  *                                $\updownarrow$  *   verticle arrow
-265 181  b5 <hand/                                            *   pointing hand
-                                                                 (printer's "fist")
-266 182  b6 <bprime/                           \"{}          ''   bold accent 
-                                                             (used in pronunciations)
-267 183  b7 <prime/  prime      264  180  b4     \'{}          '   light accent 
-                                                            (used in pronunciations)
-                                                            also minutes (of arc or time)
-270 184  b8 <rdquo/ rdquo                       ''            "   close double quote
+264 180  b4  *                 $\updownarrow$ ↑   *     vertical arrow
+265 181  b5 <hand/                            ☞   *     pointing hand
+                                                        (printer's "fist")
+266 182  b6 <bprime/           \"{}           ˝   ''    bold accent 
+                                                        (used in
+							pronunciations) 
+267 183  b7 <prime/  prime     \'{}           ´   '     light accent 
+                                                        (used in
+							pronunciations) 
+                                                        also minutes
+							(of arc or time) 
+270 184  b8 <rdquo/  rdquo     ''             ”   "     close double quote
 271 185  b9  *
-272 186  ba  *                                  $\parallel$   ||   verticle double bar (l)
+272 186  ba  *                 $\parallel$    ‖   ||    vertical double bar
+                                                        (l)
 273 187  bb  *
-274 188  bc <sect/  sect                        \S             *    section mark
-275 189  bd <ldquo/ ldquo                        ``            "    open double quotes
-276 190  be <amac/  amacr                       \=a           a    a macron
-277 191  bf <lsquo/ lsquo                       `             `    left single quote
-
-300 192  c0 <nsm/                                             ng   "n sub-macron"
-301 193  c1 <sharp/ sharp                       $\sharp$      #      musical sharp
-302 194  c2 <flat/  flat                        $\flat$       *     musical flat
-303 195  c3  *                                  --            --    long dash (en-dash? )
-304 196  c4  *                                  $-$            -    horizontal line
-305 197  c5 <th/ (part 1)                                           first part of th ligature
-                                                                   see 231 = e7 for part 2
-306 198  c6 <imac/  imacr                       \=i            i    i macron
-307 199  c7 <emac/  emacr                       \=e            e    e macron
-310 200  c8 <dsdot/                                            d    Sanskrit/Tamil d dot 
-311 201  c9 <nsdot/                                            n    Sanskrit/Tamil n dot
-312 202  ca <tsdot/                                            t    Sanskrit/Tamil t dot
-313 203  cb <ecr/                               \u{e}          e    e breve
-314 204  cc <icr/                               \u{i}          i    i breve
+274 188  bc <sect/  sect       \S             §   *     section mark
+275 189  bd <ldquo/ ldquo      ``             “   "     open double quotes
+276 190  be <amac/  amacr      \=a            ā   a     a macron
+277 191  bf <lsquo/ lsquo      `              ‘   `     left single quote
+
+300 192  c0 <nsm/                             ṉ   ng    "n sub-macron"
+301 193  c1 <sharp/ sharp      $\sharp$       ♯   #     musical sharp
+302 194  c2 <flat/  flat       $\flat$        ♭   *     musical flat
+303 195  c3  *                 --             –   --    long dash (en-dash? )
+304 196  c4  *                 $-$            ―   -     horizontal line
+305 197  c5 <th/ (part 1)                         t     first part of
+                                                        th ligature 
+                                                        see 231 = e7 for part 2
+306 198  c6 <imac/  imacr      \=i            ī   i     i macron
+307 199  c7 <emac/  emacr      \=e            ē   e     e macron
+310 200  c8 <dsdot/                           ḍ   d     Sanskrit/Tamil d dot 
+311 201  c9 <nsdot/                           ṇ   n     Sanskrit/Tamil n dot
+312 202  ca <tsdot/                           ṭ   t     Sanskrit/Tamil t dot
+313 203  cb <ecr/              \u{e}          ĕ   e     e breve
+314 204  cc <icr/              \u{i}          ĭ   i     i breve
 315 205  cd  *
-316 206  ce <ocr/                               \u{o}          o    o breve
-317 207  cf  -                                  --             -    short dash
-
-320 208  d0  --      mdash                      ---            --   long (em) dash
-321 209  d1 <OE/     OElig                      \OE            OE   OE ligature
-322 210  d2 <oe/     oelig                      \oe            oe   oe ligature
-323 211  d3 <omac/   omacr                      \=o            o    o macron
-324 212  d4 <umac/   umacr                      \=u            u    u macron
-325 213  d5 <ocar/                              \v{o}          o    o hacek
-326 214  d6 <aemac/                             \=\ae          ae   ae ligature macron
-327 215  d7 <oemac/                             \=\oe          oe   oe ligature macron
-330 216  d8          par                        $\parallel$    ||   double vertical
-                                                                    bar(s)
+316 206  ce <ocr/              \u{o}          ŏ   o     o breve
+317 207  cf  -                 --             ‐   -     short dash
+
+320 208  d0  --      mdash     ---            —   ---   long (em) dash
+321 209  d1 <OE/     OElig     \OE            Œ   OE    OE ligature
+322 210  d2 <oe/     oelig     \oe            œ   oe    oe ligature
+323 211  d3 <omac/   omacr     \=o            ō   o     o macron
+324 212  d4 <umac/   umacr     \=u            ū   u     u macron
+325 213  d5 <ocar/             \v{o}          ǒ   o     o hacek
+326 214  d6 <aemac/            \=\ae          ǣ   ae    ae ligature macron
+327 215  d7 <oemac/            \=\oe          ōē  oe    oe ligature macron
+330 216  d8          par       $\parallel$    ‖   ||    double vertical bar
+                                                        (s)
 331 217  d9  *
 332 218  da  *
 333 219  db  *
-334 220  dc <ucr/   ubreve                      \u{u}           u  u breve
-335 221  dd <acr/   abreve                      \u{a}           a  a breve
-336 222  de <cre/   ssmile                      \u{}            ~  crescent
-                                           (like a breve, but vertically centered --
-                                            represents the short accent in poetic meter)
-337 223  df <ymac/                              \=y             y   y macron
-
-340 224  e0  <asl/                                              a   a "semilong"
-                                          (has a macron above with a short vertical
-                                            bar on top the center of the macron)
-                                           Used in pronunciations.
-341 225  e1  <esl/                                                   e "semilong"
-342 226  e2  <isl/                                                   i "semilong"
-343 227  e3  <osl/                                                   o "semilong"
-344 228  e4  <usl/                                                   u "semilong"
-345 229  e5  <adot/                                             a    a with dot above
-346 230  e6  *                                                  mu  small Greek mu
-347 231  e7  <th/ (part 2)                                          second part of th ligature
-                                                                     see 197 = c5 for part 1
+334 220  dc <ucr/   ubreve     \u{u}          ŭ   u     u breve
+335 221  dd <acr/   abreve     \u{a}          ă   a     a breve
+336 222  de <cre/   ssmile     \u{}           ˘   ~     crescent
+                                                        (like a breve,
+							but vertically
+							centered -- 
+                                                        represents the
+							short accent
+							in poetic
+							meter) 
+337 223  df <ymac/             \=y            ȳ   y     y macron
+
+340 224  e0  <asl/                                a     a "semilong"
+                                                        (has a macron
+                                                        above with a
+                                                        short vertical
+                                                        bar on top the
+                                                        center of the
+                                                        macron) 
+                                                        Used in
+                                                        pronunciations. 
+341 225  e1  <esl/                                      e "semilong"
+342 226  e2  <isl/                                      i "semilong"
+343 227  e3  <osl/                                      o "semilong"
+344 228  e4  <usl/                                      u "semilong"
+345 229  e5  <adot/                           ȧ   a     a with dot above
+346 230  e6  *                                μ   mu    small Greek mu
+347 231  e7  <th/ (part 2)                              second part of
+                                                        th ligature; 
+                                                        see 197 = c5
+                                                        for part 1
 350 232  e8  *
 351 233  e9  *
 352 234  ea  *
-353 235  eb <edh/   edh        360 240 f0                       th  small eth
+353 235  eb <edh/   edh                       ð   th    small eth
 354 236  ec  *
-355 237  ed <thorn/ thorn      376 254 fe                       th  small thorn
-356 238  ee <atil/  atilde                      \~a              a   a tilde
-357 239  ef <ndot/                                               n   n with dot above
+355 237  ed <thorn/ thorn                     þ   th    small thorn
+356 238  ee <atil/  atilde     \~a            ã   a     a tilde
+357 239  ef <ndot/                            ṅ   n     n with dot above
 
-360 240  f0 <rsdot/                             \d{r}            r   r with a dot below
+360 240  f0 <rsdot/            \d{r}          ṛ   r     r with a dot below
 361 241  f1   *
 362 242  f2   *
 363 243  f3   *
-364 244  f4 <yogh/                                               y   small yogh
-365 245  f5 <mdash/ mdash                       ---              --  em dash
-366 246  f6 <divide/ divide    367 247 f7       $\div$           /   division sign
-367 247  f7         ap                          $\approx$        ~=  "double tilde"
-370 248  f8 <deg/   deg        260 176 b0       ${}^\circ$       *    degree sign
-371 249  f9 <middot/                            $\bullet$        *    bold middle dot
-372 250  fa   *                267 183 b7       $\cdot$          *    light middle dot
-373 251  fb <root/  radic                       $\surd$          *    root sign
+364 244  f4 <yogh/                            ȝ   y     small yogh
+365 245  f5 <mdash/ mdash      ---            —   ---   em dash
+366 246  f6 <divide/ divide    $\div$         ÷   /     division sign
+367 247  f7         ap         $\approx$      ≈   ~=    "double tilde"
+370 248  f8 <deg/   deg        ${}^\circ$     °   *     degree sign
+371 249  f9 <middot/           $\bullet$      •   *     bold middle dot
+372 250  fa   *                $\cdot$        ·   *     light middle dot
+373 251  fb <root/  radic      $\surd$        √   *     root sign
 374 252  fc   *
 375 253  fd   *
 376 254  fe   *
-377 255  ff   *
+377 255  ff   *                 
+----------------------------------------------------------------------------
 
-----------------------------------
-Table 3     
-----------------------------------
-
-====================================================================
 The table below gives some additional information about some of the 
 more commonly used entities
 -------------------------------------------------------------------
 Frequently used:
 decimal  hex    char  definition
-   21                  section symbol -- another section also at 197
-                       (so that 21 can be used as a normal control
-                         character)
-  126            ~     used by typists as a place-holder in word
-                         combinations where an uncapitalized headword
-                         should be.
-  128    80      �     <Cced/ c cedilla (uppercase)
-  129    81      �     <uum/ u umlaut
-  130    82      �     e acute
-  131    83      �     a circumflex
-  132    84      �     <aum/ a umlaut
-  133    85      �     a grave
-  134    86      �     <aring/ a with "ring" (circle) above (Swedish!)
-  135    87      �     <cced/ c cedilla
-  136 - 144            standard European set for IBM
-  136    88      �     <ecir/ e circumflex
-  137    89      �     <eum/ e umlaut (or e with dieresis above)
-  138    8a      �     e grave
-  145    91      �     <ae/ = "ae" fused ligature
-  146    92      �     <AE/ = upper-case "ae" fused ligature
-  147    93      �     <ocir/ o circumflex
-  148    94      �     <oum/ o "umlaut", used mostly in "co�peration,
-                        Zo�l." and in pronunciations
-  164    a4      �     <ntil/ Spanish "enye"
-  166    a6      �     <frac23/ two-thirds (fraction)
-  167    a7      �     <frac13/ one-third (fraction)
-  169    a9      �     <sec/  seconds of degree or time, or double-prime
-  171    ab      �     <frac12/ one-half, as in the original IBM set
-  172    ac      �     <frac14/ one-fourth (fraction)
-  176    b0      �     <?/ = (reverse-video question mark), used
-                        to represent an uncodable or illegible character
-  180    b4      �     long verticle double-headed arrow (a reference mark)
-  181    b5      �     <hand/ = (the typographer's "fist")
-                        Appearing as a "pointing hand" character
-                       (for explanatory notes)
-  182    b6      �    bold accent in headwords
-                       replaced in full ASCII version by double quote = "
-  183    b7      �     light accent in headwords
-                       replaced within headwords in the full ASCII version
-                       by an open-single-quote (` = ASCII 96, not the same
-                       as 191, \'bf).   This mark is used also
-                       for minutes of a degree, and for "prime"
-                       to modify variables in mathematical expressions.
-                         -- two of these in sequence represent seconds
-                        of a degree, or double prime.  The seconds
-                        symbol is also represented by <sec/ (hex a9).
-  184    b8      �     close double quotes (used with 189 [= \'bd], open quote)
-  186    ba      �     verticle double bar - represents the symbol used
-                       in the printed dictionary before a headword to
-                       signify that the word was adopted without
-                       anglicization from a foreign language
-                       but in the full-ASCII version this function
-                        uses \'d8 -- see 216
-  188    bc      �     <sect/ section mark
-                       - alternate to 21 (a control character)
-  189    bd      �     open double quotes (used with 184, close quote)
-  190    be      �     <amac/ a macron
-  191    bf      �     <lsquo/ "left single quote"
-                        single open quote mark (not same as ASCII 96)
-  192    c0      �     <nsm/ "n sub-macron", an n with a macron below --
-                        represents the "ng" sound in pronunciations
-  193    c1      �     <sharp/ sharp - music notation
-  194    c2      �     <flat/  flat  - music notation
-  195    c3      �     long dash, one pixel removed from left
-                       will fuse with left long dash, char 208
-  196    c4      �     graphic horizontal line
-  195+208        ��     combination for a very long dash.  In the
-                        original typing, the dash char 208 was used
-                        for both non-breaking hyphen (in hyphenated
-                        words), and for the em-dash used as an
-                        introductory mark for various segments.
-                        The em-dash should be distinguished from
-                        the hyphen, but that conversion hasn't yet
-                        been done.
-                         In the full ASCII version, a double hypen
-                        "--" represent the m-dash