-rw-r--r--  pronunc.txt     2
-rw-r--r--  tagset.txt      8
-rw-r--r--  webfont.txt  1046
3 files changed, 534 insertions, 522 deletions
diff --git a/pronunc.txt b/pronunc.txt
index 5db6a9f..c2607c9 100644
--- a/pronunc.txt
+++ b/pronunc.txt
@@ -1,4 +1,4 @@
-file PRONUNC.WEB
+File pronunc.txt
================
This file gives a number of examples of pronunciation,
using the entity symbols representing the pronunciations as
diff --git a/tagset.txt b/tagset.txt
index f0b9367..9a7a501 100644
--- a/tagset.txt
+++ b/tagset.txt
@@ -1,13 +1,13 @@
FIELD MARKS FOR WEBSTER 1913 and CIDE
=====================================
-Tagset.web:
Explanations of the tags used to mark the Webster 1913 dictionary
and the CIDE (Collaborative International Dictionary of English).
Note that the list of tags used to mark the public domain version
of this dictionary is shorter than the full set described here.
If any tag is not listed here, it is either (1) one of the
-"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
- (2) Is a functional field with no effect on the typography.
+"point" (font size) or "type" (font style) tags, which should be
+self-explanatory; or (2) is a functional field with no effect on the
+typography.
Last modified March 12, 1999.
For questions, contact:
@@ -16,7 +16,7 @@ Last modified March 12, 1999.
Plainfield, NJ 07062
(908) 561-3416 or (908) 668-5252
-------------------------------------------------------------
-A separate file, webfont.asc, contains the list of the individual
+A separate file, webfont.txt, contains the list of the individual
non-ASCII characters represented by either higher-order hexadecimal
character marks (e.g., \'94, for o-umlaut) or by entity tags
(e.g., <root/, for the square root symbol.)
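
Editorial illustration (not part of the commit): the two notations
mentioned above, hexadecimal character marks such as \'94 and entity
tags such as <root/, could be mapped to Unicode roughly as sketched
below. The function name decode_gcide and both lookup tables are
hypothetical; the tables hold only three sample entries, taken from
the symbol summary in webfont.txt, and a real converter would need the
full list.

```python
import re

# Partial, illustrative tables (assumption: a real tool would load the
# complete mapping from the symbol summary in webfont.txt).
ESCAPES = {
    0x82: "\u00e9",  # e acute
    0x94: "\u00f6",  # o umlaut (diaeresis)
    0xFB: "\u221a",  # root sign
}
ENTITIES = {
    "eacute": "\u00e9",  # e acute
    "oum": "\u00f6",     # o umlaut (diaeresis)
    "root": "\u221a",    # root sign
}

# \'xx where xx is two lowercase hexadecimal digits.
_ESCAPE_RE = re.compile(r"\\'([0-9a-f]{2})")
# <WORD/ entities; ordinary tags such as <it> or </it> do not match.
_ENTITY_RE = re.compile(r"<([A-Za-z0-9?]+)/")


def decode_gcide(text):
    """Replace known escapes and entities; leave unknown ones untouched."""

    def escape(match):
        return ESCAPES.get(int(match.group(1), 16), match.group(0))

    def entity(match):
        return ENTITIES.get(match.group(1), match.group(0))

    return _ENTITY_RE.sub(entity, _ESCAPE_RE.sub(escape, text))
```

Unknown sequences are passed through unchanged in this sketch, so
input that uses codes missing from the sample tables is not silently
altered.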
diff --git a/webfont.txt b/webfont.txt
index 591e980..d432fe5 100644
--- a/webfont.txt
+++ b/webfont.txt
@@ -1,88 +1,70 @@
WEBSTER FONTS
=============
- Fonts for the Webster 1913 Dictionary.
- For version 0.50
- Last edit May 5, 2001
- ______________________________________
- (This file contains some extended ASCII characters, and should be
-transmitted in binary mode)
-----------------------------------------------------------------------
-
- This file describes a modified font for use in visualizing the
-text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913),
-usable for the DOS operating system of IBM-compatible personal computers.
-The electronic version of that dictionary and this font were prepared by
-MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc.
-For details of permissions and restrictions on using these files, see
-the accompanying file "readme.web".
- The special characters used in the electronic version of the Webster
+* Overview
+
+This file describes special symbols and markup entities used in the
+GNU Collaborative International Dictionary of English.
+
+* Introduction
+
+The special characters used in the electronic version of the Webster
1913 are required for visualizing unusual characters used in the
etymology and pronunciation fields of the dictionary, in a form
-comparable to the way they appear in the original. Since there are
-more than 256 characters used in that dictionary, not all can be
-represented by single-byte codes, and are instead represented by
-SGML-style "short-form" symbols. (rather than the "entity" format
-"&xx;" The ampersand is used frequently, and we prefer to leave
-the "<" as the only "escape" character) of the type <x/ where x
-is a specific code for the symbol in the dictionary.
-See the "Short Form" section below for details about such characters.
-Note that the symbols used here are in some cases abbreviations
-(for compactness) of the ISO 8879 recommended symbols. If necessary,
-the table below allows simple replacement by alternate encodings.
- This symbol font can be loaded in IBM-compatible (x86) computers
-running the DOS operating system by using the "font.bat" command file
-in the "utils" directory. The fonts files for 8x14 and 8x16 fonts are
-"web14.fnt" and "web16.fnt" respectively.
- For those loading the Webster onto some machine other than an
-IBM-compatible running DOS, it will be necessary to provide a
-translation table, to convert these characters into a code that
-can be handled by that computer. For this reason, I attach an
-"explanation" for each character, for those who cannot view
-the original DOS font.
- The DOS-loadable font does not contain all of the characters needed
-to depict the etymologies or the pronunciations. In addition to an
-absence of several characters used in the pronunciations, no Greek letters are
-included. The Greek words appearing in the etymologies,
-when they are included, will be typed in a
-roman-letter transcription (See section on Greek transcription, below).
-Only a very few Greek words have been thus transcribed as of the
-present version (version 0.41).
- Wherever the typists did not know the character to use, they
-usually inserted a reverse-video question mark (decimal 176).
-This appears in full-ASCII versions as <?/. This mark was used both for
-characters in non-ASCII fonts, and for unreadable characters (i.e.,
-characters smeared in the original or distorted in the copies available
-to the typists. The type in the original was in many places smeared and
+comparable to the way they appear in the original.
+
+The GCIDE markup provides two ways for representing such characters:
+using special "escape sequences" and using special markup entities.
+Historically, "escape sequences" were used to indicate the
+character's ordinal position in a special font, prepared by MICRA,
+Inc. to represent it on screen. Although nowadays this method is
+obsolete, the dictionary corpus still uses these sequences. This file
+describes their mapping to Unicode characters.
+
+An escape sequence has the form \'xx, where "x" represent lowercase
+hexadecimal digits. For example, \'94 stands for "o" with diaeresis.
+There are only 256 such sequences.
+
+Special markup entities are able to represent a wider range of
+characters. A markup entity is similar to SGML one, but has a
+different format. The traditional &xx; format was judged inconvenient
+because the ampersand is used frequently in the corpus. Instead,
+GCIDE entities have the format <WORD/, where "<" and "/" represent the
+beginning and end of the entity and WORD represents the character
+itself. Valid WORDs are in some cases abbreviations (for compactness)
+of the ISO 8879 recommended symbols. Characters representable by
+escape sequences can also be represented by entities, but the reverse
+is not true, due to a limited range of the former.
+
+The Greek words appearing in the etymologies, when they are included,
+are typed in a roman-letter transcription, which is described below in
+chapter "Greek transliteration".
+
+* Unrecognized characters
+
+Wherever the typists did not know the character to use, they usually
+inserted a reverse-video question mark (decimal 176). This appears in
+full-ASCII versions as <?/. This mark was used both for characters in
+non-ASCII fonts, and for unreadable characters (i.e., characters
+smeared in the original or distorted in the copies available to the
+typists. The type in the original was in many places smeared and
illegible at the left and right page margins; occasionally, small
parts of words were blotted out by plain white space).
- A character table for the high-order characters appears below.
-Under that is a list and description of most of the special characters
-used in the Webster files.
- Note that there are yet some characters used in the etymologies,
-and some other symbols, which are not in this list. For example, the
-vowels with a double dot *underneath*, e.g. a (as in all) have no representation
-in this character set, and, where explicitly entered in the dictionary,
-are represented by <xdd/ where "x" is the letter, as in "<add/".
-
-ITALICS
--------
- In most places, italic font is represented by the tags <it>...</it>
+
+* Italics
+
+In most places, italic font is represented by the tags <it>...</it>
surrounding the italic text, or by some other tag which also implies
-italic font. In the pronunciations, however, where italicized vowels
+italic font. In the pronunciations, however, where italicized vowels
are used among non-italic and other special characters to indicate
-pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
+pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
are also used to indicate the italicized vowel.
-DIACRITICS
--------------
- The European grave and acute accents are represented by the
-standard (IBM PC) high-order codes. Other characters with diacritics
-are represented by special "entity" codes, and in some cases also
-are found in this special WEB1913 font, described below.
- Vowels with a circle above (as in Swedish) are coded <xring/
-(x with a ring, or "degrees" mark over it); vowels with tilde over them
-are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
+* Diacritics
+
+Vowels with a circle above (as in Swedish) are coded <xring/ (x with a
+ring, or "degrees" mark over it); vowels with tilde over them are
+represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
also has code 238); letters with a dot above are represented by <xdot/
-- letter with a dot below are represented by <xsdot/ ("subdot");
vowels with the semi-long mark (a macron with a short perpendicular
@@ -93,70 +75,57 @@ the "oo" with an unbroken macron above the two letters, <aemac/ = the
ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature
oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent
(breve) above have codes in this list, but may also be represented by
-<xum/ and <xcr/ respectively. There is an occasional hacek or caron mark
-(an inverted circumflex) in the original; such letters are coded <xcar/.
-The o with a caron has code 213, but no others are in this font list.
+<xum/ and <xcr/ respectively. There is an occasional hacek or caron
+mark (an inverted circumflex) in the original; such letters are coded
+<xcar/. The o with a caron has code 213, but no other letter with a
+caron is representable by an escape sequence.
+
The diaeresis is treated typographically as identical to the umlaut.
- A special modification, used only for poetry (see entry "saturnian verse"
-under "saturnian") is a vowel with a macron, in which the macron is lighter
-than the usual macron, signifying a stressed syllable which has a short
-vowel sound. This is represented by <xsmac/ ("short mac").
- Another special character used in pronunciations is an "n" with an underline (like
-a macron, but below the letter), used to represent the "ng" sound. This is coded
-<nsm/ ("n sub-macron"). The ligated th used in pronunciations to depict the
-"th" sound of "the" is coded as <th/.
- NOTE: the letter combinations "fi" and "fl" are invariably printed as the
+A special modification, used only for poetry (see entry "saturnian
+verse" under "saturnian") is a vowel with a macron, in which the
+macron is lighter than the usual macron, signifying a stressed
+syllable which has a short vowel sound. This is represented by
+<xsmac/ ("short mac").
+
+Another special character used in pronunciations is an "n" with an
+underline (like a macron, but below the letter), used to represent the
+"ng" sound. This is coded <nsm/ ("n sub-macron"). The ligated th
+used in pronunciations to depict the "th" sound of "the" is coded as
+<th/.
+
+NOTE: the letter combinations "fi" and "fl" are invariably printed as the
ligatures &filig; and &fllig;, but these ligatures are not marked as such
in this transcription, and the two letters are left as individuals.
-SPECIAL SYMBOLS
- The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used.
- The double prime, or "seconds" of a degree is sometimes represented by
-a double "light accent" (code 183 = \'b7). In other places, and in later
-versions, it is represented by <sec/ = hex a9, in the webfont.
- The symbols "greater than" <gt/ and "less than" are encountered only
-once, but are distinguished from the right- and left-angle brackets
-(> and <) because of possible typographical differences in some fonts.
- The schwa is symbolized by <schwa/. It is not used in the
-pronunciations, but is mentioned as a symbol.
- The right-pointing arrow is <rarr/, consistent with ISO 8879.
-
-----------------------------------
-Table 1
-----------------------------------
-Numbers
- Hex codes
-1  
-11   (12 is a hard page break, 13 CR, 14 sect break)
-21  
-31  !"# $%&'(
-121 yz{|} ~ 79-7d 7e-82
-131 83-87 88-8c
-141 8d-91 92-96
-151 97-9b 9c-a0
-161 a1-a5 a6-aa
-171 ab-af b0-b4
-181 b5-b9 ba-be
-191 bf-c3 c4-c8
-201 c9-cd ce-d2
-211 d3-d7 d8-dc
-221 dd-e1 e2-e6
-231 e7-eb ec-f0
-241 f1-f5 f6-fa
-251 fb-ff
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-Below is a complete list of the symbols used in the Webster ("webfont")
-which are encoded in the special font listed above, together with
-corresponding symbols in ISO 8879 and Tex coding. Much of this table was
-prepared by Rik Faith, to whom we express our appreciation.
- The "nearest ASCII" equivalents are given for those who want to
-display the data as best one can in 7-bit simple ASCII symbols without
+* Special symbols
+
+The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are
+rarely used.
+
+The double prime, or "seconds" of a degree is sometimes represented by
+a double "light accent" (code 183 = \'b7). In other places, and in
+later versions, it is represented by <sec/ = \'a9.
+
+The symbols "greater than" <gt/ and "less than" are encountered only
+once, but are distinguished from the right- and left-angle brackets (>
+and <) because of possible typographical differences in some fonts.
+
+The schwa is symbolized by <schwa/. It is not used in the
+pronunciations, but is mentioned as a symbol. The right-pointing
+arrow is <rarr/, consistent with ISO 8879.
+
+* Symbol summary
+
+Below is a complete list of the symbols used in the Webster, together
+with their "webfont" number (escape sequence), corresponding markup
+entity, and corresponding symbols in ISO 8879 and Tex coding. Much of
+this table was prepared by Rik Faith, to whom we express our
+appreciation.
+
+The "Uc" column gives the Unicode representation of the character.
+The "nearest ASCII" equivalents are given for those who want to
+display the data as best one can in 7-bit simple ASCII symbols without
using the "entity" symbols.
-=========================================================================
-----------------------------------
-Table 2
-----------------------------------
Comments:
(1) The symbol in the "entity" column is the SGML-like symbol used in
@@ -164,19 +133,19 @@ Comments:
the symbol for the same character given in "The user's guide to
ISO 8879" by Smith and Stutely.
(2) An asterisk "*" in the "entity" column means that this symbol and
-code value is not used in any form in the Webster 1913 electronic version.
+code value is not used in any form in GCIDE.
(3) If no asterisk is in the "entity" column, and no other symbol is
there, this means that in the Webster, only the hexadecimal representation
was used (e.g. for \'d8, \'bd, and \'b8).
(4) \'b6 and \'b7, the heavy and light "accents", are never above a
letter (these are not diacritical marks), but in-between letters, as the
-stress accent used in the headwords and pronunciations. The accent
-*follows* the syllable accented. The light accent \'b7 is also used as
-the "prime" in mathematical expressions (e.g. a\'b7 = "a prime"), or as
- "minutes" in degrees-minutes-seconds, and when doubled (\'b7\'b7)
-serves as "double prime" in mathematical expressions, and as "seconds"
-in degrees-minutes-seconds. The character \'a9 (<sec/ or &Prime;) is
-also used to represent the double prime.
+stress accent used in the headwords and pronunciations. The
+accent *follows* the syllable accented. The light accent \'b7 is
+also used as the "prime" in mathematical expressions (e.g. a\'b7 = "a
+prime"), or as "minutes" in degrees-minutes-seconds, and when doubled
+(\'b7\'b7) serves as "double prime" in mathematical expressions, and
+as "seconds" in degrees-minutes-seconds. The character \'a9 (<sec/ or
+&Prime;) is also used to represent the double prime.
(5) Although the semilong vowels are in the table (e.g. the "asl"
= "a semilong", most of the entries in the ASCII version dictionary
use the <xsl/ symbol coding. If you know of any printers' names for
@@ -186,419 +155,462 @@ Latin-2 table), but the other vowels don't, in the Smith & Stutely book.
Is this a mistake?
(7) The symbol <nsc/ is used for "N small capitals", used in
pronunciations to represent the soun fo the nasal N in French words.
- (8) A weak accent (when not in pronunciations) is symbolized by <prime/, the "minutes" (of a degree) symbol. A strong accent is symbolized by <bprime/ ("bold prime", not an ISO entity).
+ (8) A weak accent (when not in pronunciations) is symbolized by
+<prime/, the "minutes" (of a degree) symbol. A strong accent is
+symbolized by <bprime/ ("bold prime", not an ISO entity).
(9) If you find any exceptions to these usage assertions, please
let me know.
-----------------------------------------------------------------------------------------
- webfont ISO 8879 latin1/ascii TeX nearest description
------------------- ASCII
-oct dec hex entity oct dec hex
---------------------------------------------------------------------------------
-025 21 15 * \S * section symbol
-
-074 60 3c lt 074 60 3c $<$ < less than
-076 62 3e gt 076 62 3e $>$ > greater than
-
-200 128 80 <Cced/ Ccedil 307 199 c7 \c{C} C C cedilla
-201 129 81 <uum/ uuml 374 252 fc \"u ue u umlaut (diaeresis)
-202 130 82 <eacute/ eacute 351 233 e9 \'e e e acute
-203 131 83 <acir/ acirc 342 226 e2 \^a a a circumflex
-204 132 84 <aum/ auml 344 228 e4 \"a ae a umlaut (diaeresis)
-205 133 85 <agrave/ agrave 340 224 e0 \`a a a grave
-206 134 86 <aring/ aring 345 229 e5 \aa a a ring above
-207 135 87 <cced/ ccedil 347 231 e7 \c{c} c c cedilla
-210 136 88 <ecir/ ecirc 352 234 ea \^e e e circumflex
-211 137 89 <eum/ euml 353 235 eb \"e e e umlaut (diaeresis)
-212 138 8a <egrave/ egrave 350 232 e8 \`e e e grave
-213 139 8b <ium/ iuml 357 239 ef \"i i i umlaut (diaeresis)
-214 140 8c <icir/ icirc 356 238 ee \^i i i circumflex
-215 141 8d <igrave/ igrave 354 236 ec \`i i i grave
-216 142 8e <Aum/ Auml A A umlaut
-217 143 8f Aring A A ring above
-
-220 144 90 <Eacute/ Eacute 311 201 c9 \'E e E acute
-221 145 91 <ae/ aelig 346 230 e6 \ae ae ligature ae
-222 146 92 <AE/ AElig 306 198 c6 \AE AE ligature AE
-223 147 93 <ocir/ ocirc 364 244 f4 \^o o o circumflex
-224 148 94 <oum/ ouml 366 246 f6 \"o oe o umlaut (diaeresis)
-225 149 95 <ograve/ ograve 362 242 f2 \`o o o grave
-226 150 96 <ucir/ ucirc 373 251 fb \^u u u circumflex
-227 151 97 <ugrave/ ugrave 371 249 f9 \`u u u grave
-230 152 98 <yum/ yuml y y umlaut
-231 153 99 <Oum/ Ouml O O umlaut
-232 154 9a <Uum/ Uuml 334 220 dc \"U U U umlaut (diaeresis)
+----------------------------------------------------------------------------
+ webfont ISO 8879 TeX Uc ASC Description
+------------------
+oct dec hex entity
+----------------------------------------------------------------------------
+025 21 15 * \S § * section symbol
+
+074 60 3c lt $<$ < < less than
+076 62 3e gt $>$ > > greater than
+
+200 128 80 <Cced/ Ccedil \c{C} Ç C C cedilla
+201 129 81 <uum/ uuml \"u ü ue u umlaut (diaeresis)
+202 130 82 <eacute/ eacute \'e é e e acute
+203 131 83 <acir/ acirc \^a â a a circumflex
+204 132 84 <aum/ auml \"a ä ae a umlaut (diaeresis)
+205 133 85 <agrave/ agrave \`a à a a grave
+206 134 86 <aring/ aring \aa å aa a ring above
+207 135 87 <cced/ ccedil \c{c} ç c c cedilla
+210 136 88 <ecir/ ecirc \^e ê e e circumflex
+211 137 89 <eum/ euml \"e ë e e umlaut (diaeresis)
+212 138 8a <egrave/ egrave \`e è e e grave
+213 139 8b <ium/ iuml \"i ï i i umlaut (diaeresis)
+214 140 8c <icir/ icirc \^i î i i circumflex
+215 141 8d <igrave/ igrave \`i ì i i grave
+216 142 8e <Aum/ Auml Ä A A umlaut
+217 143 8f Aring Å Aa A ring above
+
+220 144 90 <Eacute/ Eacute \'E É E E acute
+221 145 91 <ae/ aelig \ae æ ae ligature ae
+222 146 92 <AE/ AElig \AE Æ AE ligature AE
+223 147 93 <ocir/ ocirc \^o ô o o circumflex
+224 148 94 <oum/ ouml \"o ö oe o umlaut (diaeresis)
+225 149 95 <ograve/ ograve \`o ò o o grave
+226 150 96 <ucir/ ucirc \^u û u u circumflex
+227 151 97 <ugrave/ ugrave \`u ù u u grave
+230 152 98 <yum/ yuml ÿ y y umlaut
+231 153 99 <Oum/ Ouml Ö O O umlaut
+232 154 9a <Uum/ Uuml \"U Ü U U umlaut (diaeresis)
233 155 9b
-234 156 9c <pound/ pound 243 163 a3 \pounds * pound sign (British)
+234 156 9c <pound/ pound \pounds £ * pound sign (British)
235 157 9d *
236 158 9e *
237 159 9f *
-240 160 a0 <aacute/ aacute 341 225 e1 \'a a a acute
-241 161 a1 <iacute/ iacute 355 237 ed \'i i i acute
-242 162 a2 <oacute/ oacute 363 243 f3 \'o o o acute
-243 163 a3 <uacute/ uacute 372 250 fa \'u u u acute
-244 164 a4 <ntil/ ntilde 361 241 f1 \~n ny n tilde
-245 165 a5 <Ntil/ Ntilde NY N tilde
-246 166 a6 <frac23/ $\frac{2}{3}$ 2/3 two-thirds
-247 167 a7 <frac13/ $\frac{1}{3}$ 1/3 one-third
+240 160 a0 <aacute/ aacute \'a á a a acute
+241 161 a1 <iacute/ iacute \'i í i i acute
+242 162 a2 <oacute/ oacute \'o ó o o acute
+243 163 a3 <uacute/ uacute \'u ú u u acute
+244 164 a4 <ntil/ ntilde \~n ñ ny n tilde
+245 165 a5 <Ntil/ Ntilde Ñ NY N tilde
+246 166 a6 <frac23/ $\frac{2}{3}$ ⅔ 2/3 two-thirds
+247 167 a7 <frac13/ $\frac{1}{3}$ ⅓ 1/3 one-third
250 168 a8 *
-251 169 a9 <sec/ Prime seconds (of degree or time)
- Also, inches or double prime
+251 169 a9 <sec/ Prime ˝ '' seconds (of
+ degree or time)
+ Also, inches
+ or double prime.
252 170 aa *
-253 171 ab <frac12/ 275 189 bd $\frac{1}{2}$ 1/2 one-half
-254 172 ac <frac14/ 274 188 bc $\frac{1}{4}$ 1/4 one-quarter
+253 171 ab <frac12/ $\frac{1}{2}$ ½ 1/2 one-half
+254 172 ac <frac14/ $\frac{1}{4}$ ¼ 1/4 one-quarter
255 173 ad *
256 174 ae *
257 175 af *
-260 176 b0 <?/ (?) Place-holder
- for unknown or illegible character.
+260 176 b0 <?/ (?) Place-holder
+ for unknown or
+ illegible
+ character.
261 177 b1 *
262 178 b2 *
263 179 b3 *
-264 180 b4 * $\updownarrow$ * verticle arrow
-265 181 b5 <hand/ * pointing hand
- (printer's "fist")
-266 182 b6 <bprime/ \"{} '' bold accent
- (used in pronunciations)
-267 183 b7 <prime/ prime 264 180 b4 \'{} ' light accent
- (used in pronunciations)
- also minutes (of arc or time)
-270 184 b8 <rdquo/ rdquo '' " close double quote
+264 180 b4 * $\updownarrow$ ↑ * vertical arrow
+265 181 b5 <hand/ ☞ * pointing hand
+ (printer's "fist")
+266 182 b6 <bprime/ \"{} ˝ '' bold accent
+ (used in
+ pronunciations)
+267 183 b7 <prime/ prime \'{} ´ ' light accent
+ (used in
+ pronunciations)
+ also minutes
+ (of arc or time)
+270 184 b8 <rdquo/ rdquo '' ” " close double quote
271 185 b9 *
-272 186 ba * $\parallel$ || verticle double bar (l)
+272 186 ba * $\parallel$ ‖ || vertical double bar
+ (l)
273 187 bb *
-274 188 bc <sect/ sect \S * section mark
-275 189 bd <ldquo/ ldquo `` " open double quotes
-276 190 be <amac/ amacr \=a a a macron
-277 191 bf <lsquo/ lsquo ` ` left single quote
-
-300 192 c0 <nsm/ ng "n sub-macron"
-301 193 c1 <sharp/ sharp $\sharp$ # musical sharp
-302 194 c2 <flat/ flat $\flat$ * musical flat
-303 195 c3 * -- -- long dash (en-dash? )
-304 196 c4 * $-$ - horizontal line
-305 197 c5 <th/ (part 1) first part of th ligature
- see 231 = e7 for part 2
-306 198 c6 <imac/ imacr \=i i i macron
-307 199 c7 <emac/ emacr \=e e e macron
-310 200 c8 <dsdot/ d Sanskrit/Tamil d dot
-311 201 c9 <nsdot/ n Sanskrit/Tamil n dot
-312 202 ca <tsdot/ t Sanskrit/Tamil t dot
-313 203 cb <ecr/ \u{e} e e breve
-314 204 cc <icr/ \u{i} i i breve
+274 188 bc <sect/ sect \S § * section mark
+275 189 bd <ldquo/ ldquo `` “ " open double quotes
+276 190 be <amac/ amacr \=a ā a a macron
+277 191 bf <lsquo/ lsquo ` ‘ ` left single quote
+
+300 192 c0 <nsm/ ṉ ng "n sub-macron"
+301 193 c1 <sharp/ sharp $\sharp$ ♯ # musical sharp
+302 194 c2 <flat/ flat $\flat$ ♭ * musical flat
+303 195 c3 * -- – -- long dash (en-dash? )
+304 196 c4 * $-$ ― - horizontal line
+305 197 c5 <th/ (part 1) t first part of
+ th ligature
+ see 231 = e7 for part 2
+306 198 c6 <imac/ imacr \=i ī i i macron
+307 199 c7 <emac/ emacr \=e ē e e macron
+310 200 c8 <dsdot/ ḍ d Sanskrit/Tamil d dot
+311 201 c9 <nsdot/ ṇ n Sanskrit/Tamil n dot
+312 202 ca <tsdot/ ṭ t Sanskrit/Tamil t dot
+313 203 cb <ecr/ \u{e} ĕ e e breve
+314 204 cc <icr/ \u{i} ĭ i i breve
315 205 cd *
-316 206 ce <ocr/ \u{o} o o breve
-317 207 cf - -- - short dash
-
-320 208 d0 -- mdash --- -- long (em) dash
-321 209 d1 <OE/ OElig \OE OE OE ligature
-322 210 d2 <oe/ oelig \oe oe oe ligature
-323 211 d3 <omac/ omacr \=o o o macron
-324 212 d4 <umac/ umacr \=u u u macron
-325 213 d5 <ocar/ \v{o} o o hacek
-326 214 d6 <aemac/ \=\ae ae ae ligature macron
-327 215 d7 <oemac/ \=\oe oe oe ligature macron
-330 216 d8 par $\parallel$ || double vertical
- bar(s)
+316 206 ce <ocr/ \u{o} ŏ o o breve
+317 207 cf - -- ‐ - short dash
+
+320 208 d0 -- mdash --- — --- long (em) dash
+321 209 d1 <OE/ OElig \OE ΠOE OE ligature
+322 210 d2 <oe/ oelig \oe œ oe oe ligature
+323 211 d3 <omac/ omacr \=o ō o o macron
+324 212 d4 <umac/ umacr \=u ū u u macron
+325 213 d5 <ocar/ \v{o} ǒ o o hacek
+326 214 d6 <aemac/ \=\ae ǣ ae ae ligature macron
+327 215 d7 <oemac/ \=\oe ōē oe oe ligature macron
+330 216 d8 par $\parallel$ ‖ || double vertical bar
+ (s)
331 217 d9 *
332 218 da *
333 219 db *
-334 220 dc <ucr/ ubreve \u{u} u u breve
-335 221 dd <acr/ abreve \u{a} a a breve
-336 222 de <cre/ ssmile \u{} ~ crescent
- (like a breve, but vertically centered --
- represents the short accent in poetic meter)
-337 223 df <ymac/ \=y y y macron
-
-340 224 e0 <asl/ a a "semilong"
- (has a macron above with a short vertical
- bar on top the center of the macron)
- Used in pronunciations.
-341 225 e1 <esl/ e "semilong"
-342 226 e2 <isl/ i "semilong"
-343 227 e3 <osl/ o "semilong"
-344 228 e4 <usl/ u "semilong"
-345 229 e5 <adot/ a a with dot above
-346 230 e6 * mu small Greek mu
-347 231 e7 <th/ (part 2) second part of th ligature
- see 197 = c5 for part 1
+334 220 dc <ucr/ ubreve \u{u} ŭ u u breve
+335 221 dd <acr/ abreve \u{a} ă a a breve
+336 222 de <cre/ ssmile \u{} ˘ ~ crescent
+ (like a breve,
+ but vertically
+ centered --
+ represents the
+ short accent
+ in poetic
+ meter)
+337 223 df <ymac/ \=y ȳ y y macron
+
+340 224 e0 <asl/ a a "semilong"
+ (has a macron
+ above with a
+ short vertical
+ bar on top the
+ center of the
+ macron)
+ Used in
+ pronunciations.
+341 225 e1 <esl/ e "semilong"
+342 226 e2 <isl/ i "semilong"
+343 227 e3 <osl/ o "semilong"
+344 228 e4 <usl/ u "semilong"
+345 229 e5 <adot/ ȧ a a with dot above
+346 230 e6 * μ mu small Greek mu
+347 231 e7 <th/ (part 2) second part of
+ th ligature;
+ see 197 = c5
+ for part 1
350 232 e8 *
351 233 e9 *
352 234 ea *
-353 235 eb <edh/ edh 360 240 f0 th small eth
+353 235 eb <edh/ edh ð th small eth
354 236 ec *
-355 237 ed <thorn/ thorn 376 254 fe th small thorn
-356 238 ee <atil/ atilde \~a a a tilde
-357 239 ef <ndot/ n n with dot above
+355 237 ed <thorn/ thorn þ th small thorn
+356 238 ee <atil/ atilde \~a ã a a tilde
+357 239 ef <ndot/ ṅ n n with dot above
-360 240 f0 <rsdot/ \d{r} r r with a dot below
+360 240 f0 <rsdot/ \d{r} ṛ r r with a dot below
361 241 f1 *
362 242 f2 *
363 243 f3 *
-364 244 f4 <yogh/ y small yogh
-365 245 f5 <mdash/ mdash --- -- em dash
-366 246 f6 <divide/ divide 367 247 f7 $\div$ / division sign
-367 247 f7 ap $\approx$ ~= "double tilde"
-370 248 f8 <deg/ deg 260 176 b0 ${}^\circ$ * degree sign
-371 249 f9 <middot/ $\bullet$ * bold middle dot
-372 250 fa * 267 183 b7 $\cdot$ * light middle dot
-373 251 fb <root/ radic $\surd$ * root sign
+364 244 f4 <yogh/ ȝ y small yogh
+365 245 f5 <mdash/ mdash --- — --- em dash
+366 246 f6 <divide/ divide $\div$ ÷ / division sign
+367 247 f7 ap $\approx$ ≈ ~= "double tilde"
+370 248 f8 <deg/ deg ${}^\circ$ ° * degree sign
+371 249 f9 <middot/ $\bullet$ • * bold middle dot
+372 250 fa * $\cdot$ · * light middle dot
+373 251 fb <root/ radic $\surd$ √ * root sign
374 252 fc *
375 253 fd *
376 254 fe *
-377 255 ff *
+377 255 ff *
+----------------------------------------------------------------------------
- ----------------------------------
-Table 3
-----------------------------------
-
-====================================================================
The table below gives some additional information about some of the
more commonly used entities
-------------------------------------------------------------------
Frequently used:
decimal hex char definition
- 21 section symbol -- another section also at 197
- (so that 21 can be used as a normal control
- character)
- 126 ~ used by typists as a place-holder in word
- combinations where an uncapitalized headword
- should be.
- 128 80 <Cced/ c cedilla (uppercase)
- 129 81 <uum/ u umlaut
- 130 82 e acute
- 131 83 a circumflex
- 132 84 <aum/ a umlaut
- 133 85 a grave
- 134 86 <aring/ a with "ring" (circle) above (Swedish!)
- 135 87 <cced/ c cedilla
- 136 - 144 standard European set for IBM
- 136 88 <ecir/ e circumflex
- 137 89 <eum/ e umlaut (or e with dieresis above)
- 138 8a e grave
- 145 91 <ae/ = "ae" fused ligature
- 146 92 <AE/ = upper-case "ae" fused ligature
- 147 93 <ocir/ o circumflex
- 148 94 <oum/ o "umlaut", used mostly in "coperation,
- Zol." and in pronunciations
- 164 a4 <ntil/ Spanish "enye"
- 166 a6 <frac23/ two-thirds (fraction)
- 167 a7 <frac13/ one-third (fraction)
- 169 a9 <sec/ seconds of degree or time, or double-prime
- 171 ab <frac12/ one-half, as in the original IBM set
- 172 ac <frac14/ one-fourth (fraction)
- 176 b0 <?/ = (reverse-video question mark), used
- to represent an uncodable or illegible character
- 180 b4 long verticle double-headed arrow (a reference mark)
- 181 b5 <hand/ = (the typographer's "fist")
- Appearing as a "pointing hand" character
- (for explanatory notes)
- 182 b6 bold accent in headwords
- replaced in full ASCII version by double quote = "
- 183 b7 light accent in headwords
- replaced within headwords in the full ASCII version
- by an open-single-quote (` = ASCII 96, not the same
- as 191, \'bf). This mark is used also
- for minutes of a degree, and for "prime"
- to modify variables in mathematical expressions.
- -- two of these in sequence represent seconds
- of a degree, or double prime. The seconds
- symbol is also represented by <sec/ (hex a9).
- 184 b8 close double quotes (used with 189 [= \'bd], open quote)
- 186 ba verticle double bar - represents the symbol used
- in the printed dictionary before a headword to
- signify that the word was adopted without
- anglicization from a foreign language
- but in the full-ASCII version this function
- uses \'d8 -- see 216
- 188 bc <sect/ section mark
- - alternate to 21 (a control character)
- 189 bd open double quotes (used with 184, close quote)
- 190 be <amac/ a macron
- 191 bf <lsquo/ "left single quote"
- single open quote mark (not same as ASCII 96)
- 192 c0 <nsm/ "n sub-macron", an n with a macron below --
- represents the "ng" sound in pronunciations
- 193 c1 <sharp/ sharp - music notation
- 194 c2 <flat/ flat - music notation
- 195 c3 long dash, one pixel removed from left
- will fuse with left long dash, char 208
- 196 c4 graphic horizontal line
- 195+208 combination for a very long dash. In the
- original typing, the dash char 208 was used
- for both non-breaking hyphen (in hyphenated
- words), and for the em-dash used as an
- introductory mark for various segments.
- The em-dash should be distinguished from
- the hyphen, but that conversion hasn't yet
- been done.
- In the full ASCII version, a double hypen
- "--" represent the m-dash