summaryrefslogtreecommitdiffabout
path: root/webfont.txt
blob: f7423e129edd4eff831f3c069d86a8d3f05e7ad0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
                 WEBSTER FONTS
                 =============

* Overview

This file describes special symbols and markup entities used in the GNU
Collaborative International Dictionary of English.

* Introduction

The special characters used in the electronic version of the Webster 1913
are required for visualizing unusual characters used in the etymology and
pronunciation fields of the dictionary, in a form comparable to the way they
appear in the original.

The GCIDE markup provides two ways for representing such characters: using
special "escape sequences" and using special markup entities.  Historically,
"escape sequences" were used to indicate the character's ordinal position in
a special font, prepared by MICRA, Inc. to represent it on screen.  Although
nowadays this method is obsolete, the dictionary corpus still uses these
sequences.  This file describes their mapping to Unicode characters.

An escape sequence has the form \'xx, where "x" represent lowercase
hexadecimal digits.  For example, \'94 stands for "o" with diaeresis.  There
are only 256 such sequences.

Special markup entities are able to represent a wider range of characters.
A markup entity is similar to SGML one, but has a different format.  The
traditional &xx; format was judged inconvenient because the ampersand is
used frequently in the corpus.  Instead, GCIDE entities have the format
<WORD/, where "<" and "/" represent the beginning and end of the entity and
WORD represents the character itself.  Valid WORDs are in some cases
abbreviations (for compactness) of the ISO 8879 recommended symbols.
Characters representable by escape sequences can also be represented by
entities, but the reverse is not true, due to a limited range of the former.

The Greek words appearing in the etymologies, when they are included, are
typed in a roman-letter transcription, which is described below in chapter
"Greek transliteration".

* Unrecognized characters

Wherever the typists did not know the character to use, they usually
inserted a reverse-video question mark (decimal 176).  This appears in
full-ASCII versions as <?/.  This mark was used both for characters in
non-ASCII fonts, and for unreadable characters (i.e., characters smeared in
the original or distorted in the copies available to the typists. The type
in the original was in many places smeared and illegible at the left and
right page margins; occasionally, small parts of words were blotted out by
plain white space).

* Italics

In most places, italic font is represented by the tags <it>...</it>
surrounding the italic text, or by some other tag which also implies italic
font.  In the pronunciations, however, where italicized vowels are used
among non-italic and other special characters to indicate pronunciation, the
special codes <ait/, <eit/, <iit/, <oit/, <uit/, are also used to indicate
the italicized vowel.

* Diacritics

Vowels with a circle above (as in Swedish) are coded <xring/ (x with a ring,
or "degrees" mark over it); vowels with tilde over them are represented by
<xtil/, where "x" is the vowel, as in <etil/ (<atil/ also has code 238);
letters with a dot above are represented by <xdot/ -- letter with a dot
below are represented by <xsdot/ ("subdot"); vowels with the semi-long mark
(a macron with a short perpendicular vertical stroke attached above) are
represented by <xsl/; the circumflex vowels have codes on this list, but may
also be represented as <xcir/; vowels with macrons above are <xmac/
(including <oomac/, the "oo" with an unbroken macron above the two letters,
<aemac/ = the ligature ae with a macron [also 214 = \'d6], and <oemac/ the
ligature oe with a macron [also 215 = \'d7]); vowels with umlauts or a
crescent (breve) above have codes in this list, but may also be represented
by <xum/ and <xcr/ respectively.  There is an occasional hacek or caron mark
(an inverted circumflex) in the original; such letters are coded <xcar/.
The o with a caron has code 213, but no other letter with a caron is
representable by an escape sequence.

The diaeresis is treated typographically as identical to the umlaut.  A
special modification, used only for poetry (see entry "saturnian verse"
under "saturnian") is a vowel with a macron, in which the macron is lighter
than the usual macron, signifying a stressed syllable which has a short
vowel sound.  This is represented by <xsmac/ ("short mac").

Another special character used in pronunciations is an "n" with an underline
(like a macron, but below the letter), used to represent the "ng" sound.
This is coded <nsm/ ("n sub-macron").  The ligated th used in pronunciations
to depict the "th" sound of "the" is coded as <th/.

NOTE: the letter combinations "fi" and "fl" are invariably printed as the
ligatures &filig; and &fllig;, but these ligatures are not marked as such in
this transcription, and the two letters are left as individuals.

* Special symbols

The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely
used.

The double prime, or "seconds" of a degree is sometimes represented by a
double "light accent" (code 183 = \'b7).  In other places, and in later
versions, it is represented by <sec/ = \'a9.

The symbols "greater than" <gt/ and "less than" are encountered only once,
but are distinguished from the right- and left-angle brackets (> and <)
because of possible typographical differences in some fonts.

The schwa is symbolized by <schwa/.  It is not used in the pronunciations,
but is mentioned as a symbol.  The right-pointing arrow is <rarr/,
consistent with ISO 8879.

Two special entities <and/ and <or/ represent words "and" and "or" in
italics font.

* Symbol summary

Below is a complete list of the symbols used in the Webster, together with
their "webfont" number (escape sequence), corresponding markup entity, and
corresponding symbols in ISO 8879 and Tex coding.  Much of this table was
prepared by Rik Faith, to whom we express our appreciation.

The "Uc" column gives the Unicode representation of the character.  The
"nearest ASCII" equivalents are given for those who want to display the data
as best one can in 7-bit simple ASCII symbols without using the "entity"
symbols.

Comments: 
  (1) The symbol in the "entity" column is the SGML-like symbol used in the
      present Webster files; the symbol in the "ISO 8879" column is the
      symbol for the same character given in "The user's guide to ISO 8879"
      by Smith and Stutely.
      
  (2) An asterisk "*" in the "entity" column means that this symbol and code
value is not used in any form in GCIDE.

  (3) If no asterisk is in the "entity" column, and no other symbol is
there, this means that in the Webster, only the hexadecimal representation
was used (e.g. for \'d8, \'bd, and \'b8).

  (4) \'b6 and \'b7, the heavy and light "accents", are never above a letter
(these are not diacritical marks), but in-between letters, as the stress
accent used in the headwords and pronunciations.  The accent *follows* the
syllable accented.  The light accent \'b7 is also used as the "prime" in
mathematical expressions (e.g. a\'b7 = "a prime"), or as "minutes" in
degrees-minutes-seconds, and when doubled (\'b7\'b7) serves as "double
prime" in mathematical expressions, and as "seconds" in
degrees-minutes-seconds.  The character \'a9 (<sec/ or &Prime;) is also used
to represent the double prime.

  (5) Although the semilong vowels are in the table (e.g. the "asl" = "a
semilong", most of the entries in the ASCII version dictionary use the <xsl/
symbol coding.  If you know of any printers' names for these, do let me
know.

  (6) For some reason, the a breve and u breve have ISO codes (in the
Latin-2 table), but the other vowels don't, in the Smith & Stutely book.  Is
this a mistake?

  (7) The symbol <nsc/ is used for "N small capitals", used in
pronunciations to represent the soun fo the nasal N in French words.

  (8) A weak accent (when not in pronunciations) is symbolized by <prime/,
the "minutes" (of a degree) symbol.  A strong accent is symbolized by
<bprime/ ("bold prime", not an ISO entity).

  (9) If you find any exceptions to these usage assertions, please
let me know.

----------------------------------------------------------------------------
     webfont       ISO 8879   TeX             Uc  ASC   Description
------------------                               
oct dec hex entity             
----------------------------------------------------------------------------
025  21  15  *                 \S             §   *     section symbol

074  60  3c          lt        $<$            <   <     less than
076  62  3e          gt        $>$            >   >     greater than

200 128  80 <Cced/   Ccedil    \c{C}          Ç   C     C cedilla
201 129  81 <uum/    uuml      \"u            ü   ue    u umlaut (diaeresis)
202 130  82 <eacute/ eacute    \'e            é   e     e acute
203 131  83 <acir/   acirc     \^a            â   a     a circumflex
204 132  84 <aum/    auml      \"a            ä   ae    a umlaut (diaeresis)
205 133  85 <agrave/ agrave    \`a            à   a     a grave
206 134  86 <aring/  aring     \aa            å   aa    a ring above
207 135  87 <cced/   ccedil    \c{c}          ç   c     c cedilla
210 136  88 <ecir/   ecirc     \^e            ê   e     e circumflex
211 137  89 <eum/    euml      \"e            ë   e     e umlaut (diaeresis)
212 138  8a <egrave/ egrave    \`e            è   e     e grave
213 139  8b <ium/    iuml      \"i            ï   i     i umlaut (diaeresis)
214 140  8c <icir/   icirc     \^i            î   i     i circumflex
215 141  8d <igrave/ igrave    \`i            ì   i     i grave
216 142  8e <Aum/    Auml                     Ä   A     A umlaut
217 143  8f          Aring                    Å   Aa    A ring above

220 144  90 <Eacute/ Eacute    \'E            É   E     E acute
221 145  91 <ae/     aelig     \ae            æ   ae    ligature ae
222 146  92 <AE/     AElig     \AE            Æ   AE    ligature AE
223 147  93 <ocir/   ocirc     \^o            ô   o     o circumflex
224 148  94 <oum/    ouml      \"o            ö   oe    o umlaut (diaeresis)
225 149  95 <ograve/ ograve    \`o            ò   o     o grave
226 150  96 <ucir/   ucirc     \^u            û   u     u circumflex
227 151  97 <ugrave/ ugrave    \`u            ù   u     u grave
230 152  98 <yum/    yuml                     ÿ   y     y umlaut
231 153  99 <Oum/    Ouml                     Ö   O     O umlaut
232 154  9a <Uum/    Uuml      \"U            Ü   U     U umlaut (diaeresis)
233 155  9b
234 156  9c <pound/  pound     \pounds        £   *     pound sign (British)
235 157  9d  *
236 158  9e  *
237 159  9f  *
240 160  a0 <aacute/ aacute    \'a            á   a     a acute
241 161  a1 <iacute/ iacute    \'i            í   i     i acute
242 162  a2 <oacute/ oacute    \'o            ó   o     o acute
243 163  a3 <uacute/ uacute    \'u            ú   u     u acute
244 164  a4 <ntil/   ntilde    \~n            ñ   ny    n tilde
245 165  a5 <Ntil/   Ntilde                   Ñ   NY    N tilde
246 166  a6 <frac23/           $\frac{2}{3}$  ⅔   2/3   two-thirds
247 167  a7 <frac13/           $\frac{1}{3}$  ⅓   1/3   one-third
250 168  a8  *
251 169  a9 <sec/    Prime                    ˝   ''    seconds (of
                                                        degree or time)
                                                        Also, inches
							or double prime.
252 170  aa  *
253 171  ab <frac12/           $\frac{1}{2}$  ½   1/2   one-half
254 172  ac <frac14/           $\frac{1}{4}$  ¼   1/4   one-quarter
255 173  ad  *
256 174  ae  *
257 175  af  *
260 176  b0 <?/                                   (?)   Place-holder
                                                        for unknown or
							illegible
							character.   
261 177  b1  *
262 178  b2  *
263 179  b3  *
264 180  b4  *                 $\updownarrow$ ↑   *     vertical arrow
265 181  b5 <hand/                            ☞   *     pointing hand
                                                        (printer's "fist")
266 182  b6 <bprime/           \"{}           ˝   ''    bold accent 
                                                        (used in
							pronunciations) 
267 183  b7 <prime/  prime     \'{}           ´   '     light accent 
                                                        (used in
							pronunciations) 
                                                        also minutes
							(of arc or time) 
270 184  b8 <rdquo/  rdquo     ''             ”   "     close double quote
271 185  b9  *
272 186  ba  *                 $\parallel$    ‖   ||    vertical double bar
                                                        (l)
273 187  bb  *
274 188  bc <sect/  sect       \S             §   *     section mark
275 189  bd <ldquo/ ldquo      ``             “   "     open double quotes
276 190  be <amac/  amacr      \=a            ā   a     a macron
277 191  bf <lsquo/ lsquo      `              ‘   `     left single quote

300 192  c0 <nsm/                             ṉ   ng    "n sub-macron"
301 193  c1 <sharp/ sharp      $\sharp$       ♯   #     musical sharp
302 194  c2 <flat/  flat       $\flat$        ♭   *     musical flat
303 195  c3  *                 --             –   --    long dash (en-dash? )
304 196  c4  *                 $-$            ―   -     horizontal line
305 197  c5 <th/ (part 1)                         t     first part of
                                                        th ligature 
                                                        see 231 = e7 for part 2
306 198  c6 <imac/  imacr      \=i            ī   i     i macron
307 199  c7 <emac/  emacr      \=e            ē   e     e macron
310 200  c8 <dsdot/                           ḍ   d     Sanskrit/Tamil d dot 
311 201  c9 <nsdot/                           ṇ   n     Sanskrit/Tamil n dot
312 202  ca <tsdot/                           ṭ   t     Sanskrit/Tamil t dot
313 203  cb <ecr/              \u{e}          ĕ   e     e breve
314 204  cc <icr/              \u{i}          ĭ   i     i breve
315 205  cd  *
316 206  ce <ocr/              \u{o}          ŏ   o     o breve
317 207  cf  -                 --             ‐   -     short dash

320 208  d0  --      mdash     ---            —   ---   long (em) dash
321 209  d1 <OE/     OElig     \OE            Π  OE    OE ligature
322 210  d2 <oe/     oelig     \oe            œ   oe    oe ligature
323 211  d3 <omac/   omacr     \=o            ō   o     o macron
324 212  d4 <umac/   umacr     \=u            ū   u     u macron
325 213  d5 <ocar/             \v{o}          ǒ   o     o hacek
326 214  d6 <aemac/            \=\ae          ǣ   ae    ae ligature macron
327 215  d7 <oemac/            \=\oe          ōē  oe    oe ligature macron
330 216  d8          par       $\parallel$    ‖   ||    double vertical bar
                                                        (s)
331 217  d9  *
332 218  da  *
333 219  db  *
334 220  dc <ucr/   ubreve     \u{u}          ŭ   u     u breve
335 221  dd <acr/   abreve     \u{a}          ă   a     a breve
336 222  de <cre/   ssmile     \u{}           ˘   ~     crescent
                                                        (like a breve,
							but vertically
							centered -- 
                                                        represents the
							short accent
							in poetic
							meter) 
337 223  df <ymac/             \=y            ȳ   y     y macron

340 224  e0  <asl/                                a     a "semilong"
                                                        (has a macron
                                                        above with a
                                                        short vertical
                                                        bar on top the
                                                        center of the
                                                        macron) 
                                                        Used in
                                                        pronunciations. 
341 225  e1  <esl/                                      e "semilong"
342 226  e2  <isl/                                      i "semilong"
343 227  e3  <osl/                                      o "semilong"
344 228  e4  <usl/                                      u "semilong"
345 229  e5  <adot/                           ȧ   a     a with dot above
346 230  e6  *                                μ   mu    small Greek mu
347 231  e7  <th/ (part 2)                              second part of
                                                        th ligature; 
                                                        see 197 = c5
                                                        for part 1
350 232  e8  *
351 233  e9  *
352 234  ea  *
353 235  eb <edh/   edh                       ð   th    small eth
354 236  ec  *
355 237  ed <thorn/ thorn                     þ   th    small thorn
356 238  ee <atil/  atilde     \~a            ã   a     a tilde
357 239  ef <ndot/                            ṅ   n     n with dot above

360 240  f0 <rsdot/            \d{r}          ṛ   r     r with a dot below
361 241  f1   *
362 242  f2   *
363 243  f3   *
364 244  f4 <yogh/                            ȝ   y     small yogh
365 245  f5 <mdash/ mdash      ---            —   ---   em dash
366 246  f6 <divide/ divide    $\div$         ÷   /     division sign
367 247  f7         ap         $\approx$      ≈   ~=    "double tilde"
370 248  f8 <deg/   deg        ${}^\circ$     °   *     degree sign
371 249  f9 <middot/           $\bullet$      •   *     bold middle dot
372 250  fa   *                $\cdot$        ·   *     light middle dot
373 251  fb <root/  radic      $\surd$        √   *     root sign
374 252  fc   *
375 253  fd   *
376 254  fe   *
377 255  ff   *                 
----------------------------------------------------------------------------

The table below gives some additional information about some of the more
commonly used entities:

-------------------------------------------------------------------
Frequently used:
decimal  hex    char  definition
  128    80     Ç     <Cced/ c cedilla (uppercase)
  129    81     ü     <uum/ u umlaut
  130    82     é     e acute
  131    83     â     a circumflex
  132    84     ä     <aum/ a umlaut
  133    85     à     a grave
  134    86     å     <aring/ a with "ring" (circle) above (Swedish!)
  135    87     ç     <cced/ c cedilla
  136    88     ê     <ecir/ e circumflex
  137    89     ë     <eum/ e umlaut (or e with dieresis above)
  138    8a     è     e grave
  145    91     æ     <ae/ = "ae" fused ligature
  146    92     Æ     <AE/ = upper-case "ae" fused ligature
  147    93     ô     <ocir/ o circumflex
  148    94     ö     <oum/ o "umlaut", used mostly in "coöperation,
                      Zoöl." and in pronunciations
  164    a4     ñ     <ntil/ Spanish "enye"
  166    a6     ⅔     <frac23/ two-thirds (fraction)
  167    a7     ⅓     <frac13/ one-third (fraction)
  169    a9     ˝     <sec/  seconds of degree or time, or double-prime
  171    ab     ½     <frac12/ one-half, as in the original IBM set
  172    ac     ¼     <frac14/ one-fourth (fraction)
  176    b0           <?/ = (reverse-video question mark), used
                      to represent an uncodable or illegible character
  180    b4     ´     long verticle double-headed arrow (a reference mark)
  181    b5     ☞     <hand/ = (the typographer's "fist")
                      Appearing as a "pointing hand" character
                      (for explanatory notes)
  182    b6     ˝     bold accent in headwords
                      replaced in full ASCII version by double quote = "
  183    b7     ´     light accent in headwords
                      replaced within headwords in the full ASCII version
                      by an open-single-quote (` = ASCII 96, not the same
                      as 191, \'bf).   This mark is used also for
		      minutes of a degree, and for "prime" to modify
		      variables in mathematical expressions.
		      -- two of these in sequence represent seconds of
		      a degree, or double prime.  The seconds symbol
		      is also represented by <sec/ (hex a9).
  184    b8     ”     close double quotes (used with 189 [= \'bd],
                      open quote) 
  186    ba     ‖     vertical double bar - represents the symbol used
                      in the printed dictionary before a headword to
		      signify that the word was adopted without
		      anglicization from a foreign language but in the
		      full-ASCII version this function uses \'d8 -- see 216
  188    bc     §     <sect/ section mark
  189    bd     “     open double quotes (used with 184, close quote)
  190    be     ā     <amac/ a macron
  191    bf     ‘     <lsquo/ "left single quote"
                      single open quote mark (not same as ASCII 96)
  192    c0     ṉ     <nsm/ "n sub-macron", an n with a macron below --
                      represents the "ng" sound in pronunciations
  193    c1     ♯     <sharp/ sharp - music notation
  194    c2     ♭     <flat/  flat  - music notation
  195    c3     –     long dash, one pixel removed from left
                      will fuse with left long dash, char 208
  196    c4     ―     graphic horizontal line
  195+208       –—    combination for a very long dash.  In the
                      original typing, the dash char 208 was used for
		      both non-breaking hyphen (in hyphenated words),
		      and for the em-dash used as an introductory mark
		      for various segments.  The em-dash should be
		      distinguished from the hyphen, but that
		      conversion hasn't yet been done.  In the full
		      ASCII version, a double hypen "--" represent the
		      m-dash.
  197    c5      t    <th/ (part 1) first of a pair of characters
  197+231 =      th   used to represent the th ligature --
                      <th/ represents the "th" sound of "mother"
                      see 231 (e7) for part 2
  198    c6      ī   <imac/ = i macron
  199    c7      ē   <emac/ = e macron
  200    c8      ḍ   <dsdot/ Sanskrit/Tamil d with dot underneath
  201    c9      ṇ   <nsdot/ Sanskrit/Tamil n with dot underneath
  202    ca      ṭ   <tsdot/ Sanskrit/Tamil t with dot underneath
  203    cb      ĕ   <ecr/ = e with crescent (breve) above.  Used
                     - in some etymologies and pronunciation
  204    cc      ĭ   <icr/ = i with crescent (breve) above - used
                     - in some etymologies and pronunciation
  206    ce      ŏ   <ocr/ = o with crescent (breve) above - used
                     - in some etymologies and pronunciation
  207    cf      ‐   short dash, used in hyphenated words, and in
                     breaking syllables where no accent is used. But
		     sometimes the typists used the normal hyphen
		     [45], or the long dash (decimal 208) for that
		     purpose.  The normal hyphen is the same length
		     as the long dash, but one pixel higher in the
		     character box. 
                     # In headwords, in the full ASCII version, this
                     short dash is represented by the asterisk "*".
  208    d0      —   <mdash/ = represents the long dash, used for the em 
                     dash which often precedes certain sections within a
                     definition, and which separates some sections,
                     such as wordforms or collocations within a
                     collocation segment.  This is replaced in the
                     full ASCII version by a double hyphen, "--".
  210    d2      œ   <oe/ = "oe" fused ligature
  211    d3      ō   <omac/ = o macron
  212    d4      ū   <umac/ = u macron
  213    d5      ǒ   <ocar/ o with caron (hacek) (inverted
                      circumflex) above 
  214    d6      ǣ   <aemac/ = "ae" ligature with a macron
  215    d7      ōē  <oemac/ = "oe" ligature with a macron
  216    d8      ‖   <par/ double vertical bar (short length; the long
                     length is the graphics character 186)
                     This precedes words marked with a double
		     vertical bar in the original dictionary,
		     signifying that the word was adopted directly
		     into English without modification of the spelling.
  220    dc      ŭ   <ucr/ = u with crescent above - used in some etymologies
  221    dd      ă   <acr/ = a with crescent above - used in some etymologies
  222    de      ˘   <cre/ = "crescent", an upward-curving crescent
                     used as a poetic meter mark
  223    df      ȳ   <ymac/ = y macron (used in Anglo-Saxon?) 
  229    e5      ȧ   <adot/ = a with a dot above (for pronunciations)
  231    e7      h   <th/ (part 2) second of a two-character combination
  197+231 = th       used to represent the th ligature in pronunciations
                     <th/ represents the "th" sound of "mother"
  235    eb      ð   <edh/ = Old English and Icelandic "edh", (or "eth")
                     like a Greek delta with a hatch mark through
		     the ascender. Used to represent the
		     Anglo-Saxon/Icelandic/Gothic character, in
		     etymologies, pronounced like "th"
  237    ed      þ   <thorn/ "thorn", an Old English and Icelandic
                     character, appears like a "p" with an extended
                     ascender.
                     Used to represent the
		     Anglo-Saxon/Icelandic/Gothic character, in
		     etymologies, pronounced like "th" in "thorn"
		     and also as in "brother"
  238    ee      ã   <atil/ a with tilde above - in some etymologies
  244    f4      ȝ   <yogh/ like a script "3" or "z". Used in Old English
                     etymologies, analogous to "y"
  247    f7      ≈   double tilde ("approximately equals").
                     used by typists as a place-holder in word
                     combinations where the capitalized headword
                     should be.
  248    f8      °   <deg/ degrees (temperature or angle).  Note: some
                     typists used a superscript "o" to signify
                     degrees.  This must be corrected!
  249    f9      •   middle dot (bold)
  250    fa      ·   middle dot (light)
  251    fb      √   <root/ "root" sign used in etymologies, as in original

* Greek transliteration

Stand-alone Greek letters are represented by entities <alpha/, <beta/,
<gamma/, <lambda/ etc.  Capitalized letters are <ALPHA/, etc.

Text appearing within the markers <grk></grk>, is a Greek transliteration
written in roman letters.  The following rules are used:

** Aspirants

Aspirants are represented by ' (apostrophe) and " (double quote) placed in
front of the letter modified.  Apostrophe stands for ψιλὸν πνεῦμα (ψιλή or
spiritus lenis), and double quote stands for δασὺ πνεῦμα (δασεία or spiritus
asper).

   'a  -- ἀ
   "a  -- ἁ

** Accents

Accents are placed after the accented letter.  The acute accent (ὀξεῖα) is
represented by ` (gravis).  The grave accent (βαρεῖα) is represented by ~
(tilde), and circumflex (περισπωμένη) is represented by circumflex.  Thus:

   a`  -- ά
   a~  -- ὰ
   a^  -- ᾶ
   
Some examples of the combined forms (aspirant + accent):

   'a` -- ἄ
   'a~ -- ἂ
   'a^ -- ἆ
   "a` -- ἄ
   "a~ -- ἂ,
   "a~ -- ἃ

   
** Iota subscriptum

Iota subscript is represented by comma placed after the affected vowel.  If
the vowel is accented, the comma is placed after the accent mark.  For
example:

    a`,  --  ᾴ
    'a`, --  ᾄ

** Diaeresis
    
Diaeresis is represented by a colon immediately after the affected vowel.
If the vowel is accented, the accent is placed after the colon, e.g.:

    i:  --  ϊ
    i:^ --  ῗ
    i:` --  ῒ
    
** Letters

The table below shows, for each Greek letter, the corresponding markup
entity and transliteration.  The capitalized Greek letters are represented
by the capitalized versions of the letters shown here.

-----------------------------------------
  Greek letter    transliteration
------------    ---------------
α   alpha           a
β   beta            b
γ   gamma           g
δ   delta           d
ε   epsilon         e
ζ   zeta            z
η   eta             h
θ   theta           q    [1]
ι   iota            i
κ   kappa           k
λ   lambda          l
μ   mu              m
ν   nu              n
ξ   xi              x
ο   omicron         o
π   pi              p
ρ   rho             r
σ,ς sigma           s    [2]
τ   tau             t
υ   upsilon         y,u  [3]
φ   phi             f
χ   chi             ch   [4]
ψ   psi             ps   [5]
ω   omega           w

---
[1] "th" was used in some earier sections, but was changed due to potential
confusion with the tau+eta combination, as in λυτήριος
(<grk>lyth`rios</grk>, at "lyterian") or ποιητής (<grk>poihth`s</grk>, at
"maker").
[2] Final sigma is not distinguished here from middle sigma, but when
isolated, use <sigmat/ ("terminal sigma") for the final form.
[3] Both y and u are used interchangeably in this edition.
[4] "c" is always followed by "h", so the "h" component is not confusable
with eta.  Applications must first convert "ch" before converting "h", or at
least verify that an "h" to be converted has no preceding "c".
[5] This usage is theoretically confusable with pi-sigma, but that
combination seems never to occur.

Roman "j" and "v" are unused.  Roman "u" is occasionally used instead of "y"
to represent upsilon.

Examples:

<grk>'archai:`zein</grk>   ἀρχαῒζειν
<grk>zw^,on</grk>          ζῷον
<grk>o'i^nos</grk>         οἶνος
<grk>"ydra`rgyros</grk>    ὑδράργυρος


Local Variables:
mode: Outline
coding: utf-8
fill-column: 76
End:

Return to:

Send suggestions and report system problems to the System administrator.