1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
|
WEBSTER FONTS
=============
* Overview
This file describes special symbols and markup entities used in the GNU
Collaborative International Dictionary of English.
* Introduction
The special characters used in the electronic version of the Webster 1913
are required for visualizing unusual characters used in the etymology and
pronunciation fields of the dictionary, in a form comparable to the way they
appear in the original.
The GCIDE markup provides two ways for representing such characters: using
special "escape sequences" and using special markup entities. Historically,
"escape sequences" were used to indicate the character's ordinal position in
a special font, prepared by MICRA, Inc. to represent it on screen. Although
nowadays this method is obsolete, the dictionary corpus still uses these
sequences. This file describes their mapping to Unicode characters.
An escape sequence has the form \'xx, where "x" represent lowercase
hexadecimal digits. For example, \'94 stands for "o" with diaeresis. There
are only 256 such sequences.
Special markup entities are able to represent a wider range of characters.
A markup entity is similar to SGML one, but has a different format. The
traditional &xx; format was judged inconvenient because the ampersand is
used frequently in the corpus. Instead, GCIDE entities have the format
<WORD/, where "<" and "/" represent the beginning and end of the entity and
WORD represents the character itself. Valid WORDs are in some cases
abbreviations (for compactness) of the ISO 8879 recommended symbols.
Characters representable by escape sequences can also be represented by
entities, but the reverse is not true, due to a limited range of the former.
The Greek words appearing in the etymologies, when they are included, are
typed in a roman-letter transcription, which is described below in chapter
"Greek transliteration".
* Unrecognized characters
Wherever the typists did not know the character to use, they usually
inserted a reverse-video question mark (decimal 176). This appears in
full-ASCII versions as <?/. This mark was used both for characters in
non-ASCII fonts, and for unreadable characters (i.e., characters smeared in
the original or distorted in the copies available to the typists. The type
in the original was in many places smeared and illegible at the left and
right page margins; occasionally, small parts of words were blotted out by
plain white space).
* Italics
In most places, italic font is represented by the tags <it>...</it>
surrounding the italic text, or by some other tag which also implies italic
font. In the pronunciations, however, where italicized vowels are used
among non-italic and other special characters to indicate pronunciation, the
special codes <ait/, <eit/, <iit/, <oit/, <uit/, are also used to indicate
the italicized vowel.
* Diacritics
Vowels with a circle above (as in Swedish) are coded <xring/ (x with a ring,
or "degrees" mark over it); vowels with tilde over them are represented by
<xtil/, where "x" is the vowel, as in <etil/ (<atil/ also has code 238);
letters with a dot above are represented by <xdot/ -- letter with a dot
below are represented by <xsdot/ ("subdot"); vowels with the semi-long mark
(a macron with a short perpendicular vertical stroke attached above) are
represented by <xsl/; the circumflex vowels have codes on this list, but may
also be represented as <xcir/; vowels with macrons above are <xmac/
(including <oomac/, the "oo" with an unbroken macron above the two letters,
<aemac/ = the ligature ae with a macron [also 214 = \'d6], and <oemac/ the
ligature oe with a macron [also 215 = \'d7]); vowels with umlauts or a
crescent (breve) above have codes in this list, but may also be represented
by <xum/ and <xcr/ respectively. There is an occasional hacek or caron mark
(an inverted circumflex) in the original; such letters are coded <xcar/.
The o with a caron has code 213, but no other letter with a caron is
representable by an escape sequence.
The diaeresis is treated typographically as identical to the umlaut. A
special modification, used only for poetry (see entry "saturnian verse"
under "saturnian") is a vowel with a macron, in which the macron is lighter
than the usual macron, signifying a stressed syllable which has a short
vowel sound. This is represented by <xsmac/ ("short mac").
Another special character used in pronunciations is an "n" with an underline
(like a macron, but below the letter), used to represent the "ng" sound.
This is coded <nsm/ ("n sub-macron"). The ligated th used in pronunciations
to depict the "th" sound of "the" is coded as <th/.
NOTE: the letter combinations "fi" and "fl" are invariably printed as the
ligatures fi and fl, but these ligatures are not marked as such in
this transcription, and the two letters are left as individuals.
* Special symbols
The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely
used.
The double prime, or "seconds" of a degree is sometimes represented by a
double "light accent" (code 183 = \'b7). In other places, and in later
versions, it is represented by <sec/ = \'a9.
The symbols "greater than" <gt/ and "less than" are encountered only once,
but are distinguished from the right- and left-angle brackets (> and <)
because of possible typographical differences in some fonts.
The schwa is symbolized by <schwa/. It is not used in the pronunciations,
but is mentioned as a symbol. The right-pointing arrow is <rarr/,
consistent with ISO 8879.
Two special entities <and/ and <or/ represent words "and" and "or" in
italics font.
* Symbol summary
Below is a complete list of the symbols used in the Webster, together with
their "webfont" number (escape sequence), corresponding markup entity, and
corresponding symbols in ISO 8879 and Tex coding. Much of this table was
prepared by Rik Faith, to whom we express our appreciation.
The "Uc" column gives the Unicode representation of the character. The
"nearest ASCII" equivalents are given for those who want to display the data
as best one can in 7-bit simple ASCII symbols without using the "entity"
symbols.
Comments:
(1) The symbol in the "entity" column is the SGML-like symbol used in the
present Webster files; the symbol in the "ISO 8879" column is the
symbol for the same character given in "The user's guide to ISO 8879"
by Smith and Stutely.
(2) An asterisk "*" in the "entity" column means that this symbol and code
value is not used in any form in GCIDE.
(3) If no asterisk is in the "entity" column, and no other symbol is
there, this means that in the Webster, only the hexadecimal representation
was used (e.g. for \'d8, \'bd, and \'b8).
(4) \'b6 and \'b7, the heavy and light "accents", are never above a letter
(these are not diacritical marks), but in-between letters, as the stress
accent used in the headwords and pronunciations. The accent *follows* the
syllable accented. The light accent \'b7 is also used as the "prime" in
mathematical expressions (e.g. a\'b7 = "a prime"), or as "minutes" in
degrees-minutes-seconds, and when doubled (\'b7\'b7) serves as "double
prime" in mathematical expressions, and as "seconds" in
degrees-minutes-seconds. The character \'a9 (<sec/ or ″) is also used
to represent the double prime.
(5) Although the semilong vowels are in the table (e.g. the "asl" = "a
semilong", most of the entries in the ASCII version dictionary use the <xsl/
symbol coding. If you know of any printers' names for these, do let me
know.
(6) For some reason, the a breve and u breve have ISO codes (in the
Latin-2 table), but the other vowels don't, in the Smith & Stutely book. Is
this a mistake?
(7) The symbol <nsc/ is used for "N small capitals", used in
pronunciations to represent the soun fo the nasal N in French words.
(8) A weak accent (when not in pronunciations) is symbolized by <prime/,
the "minutes" (of a degree) symbol. A strong accent is symbolized by
<bprime/ ("bold prime", not an ISO entity).
(9) If you find any exceptions to these usage assertions, please
let me know.
----------------------------------------------------------------------------
webfont ISO 8879 TeX Uc ASC Description
------------------
oct dec hex entity
----------------------------------------------------------------------------
025 21 15 * \S § * section symbol
074 60 3c lt $<$ < < less than
076 62 3e gt $>$ > > greater than
200 128 80 <Cced/ Ccedil \c{C} Ç C C cedilla
201 129 81 <uum/ uuml \"u ü ue u umlaut (diaeresis)
202 130 82 <eacute/ eacute \'e é e e acute
203 131 83 <acir/ acirc \^a â a a circumflex
204 132 84 <aum/ auml \"a ä ae a umlaut (diaeresis)
205 133 85 <agrave/ agrave \`a à a a grave
206 134 86 <aring/ aring \aa å aa a ring above
207 135 87 <cced/ ccedil \c{c} ç c c cedilla
210 136 88 <ecir/ ecirc \^e ê e e circumflex
211 137 89 <eum/ euml \"e ë e e umlaut (diaeresis)
212 138 8a <egrave/ egrave \`e è e e grave
213 139 8b <ium/ iuml \"i ï i i umlaut (diaeresis)
214 140 8c <icir/ icirc \^i î i i circumflex
215 141 8d <igrave/ igrave \`i ì i i grave
216 142 8e <Aum/ Auml Ä A A umlaut
217 143 8f Aring Å Aa A ring above
220 144 90 <Eacute/ Eacute \'E É E E acute
221 145 91 <ae/ aelig \ae æ ae ligature ae
222 146 92 <AE/ AElig \AE Æ AE ligature AE
223 147 93 <ocir/ ocirc \^o ô o o circumflex
224 148 94 <oum/ ouml \"o ö oe o umlaut (diaeresis)
225 149 95 <ograve/ ograve \`o ò o o grave
226 150 96 <ucir/ ucirc \^u û u u circumflex
227 151 97 <ugrave/ ugrave \`u ù u u grave
230 152 98 <yum/ yuml ÿ y y umlaut
231 153 99 <Oum/ Ouml Ö O O umlaut
232 154 9a <Uum/ Uuml \"U Ü U U umlaut (diaeresis)
233 155 9b
234 156 9c <pound/ pound \pounds £ * pound sign (British)
235 157 9d *
236 158 9e *
237 159 9f *
240 160 a0 <aacute/ aacute \'a á a a acute
241 161 a1 <iacute/ iacute \'i í i i acute
242 162 a2 <oacute/ oacute \'o ó o o acute
243 163 a3 <uacute/ uacute \'u ú u u acute
244 164 a4 <ntil/ ntilde \~n ñ ny n tilde
245 165 a5 <Ntil/ Ntilde Ñ NY N tilde
246 166 a6 <frac23/ $\frac{2}{3}$ ⅔ 2/3 two-thirds
247 167 a7 <frac13/ $\frac{1}{3}$ ⅓ 1/3 one-third
250 168 a8 *
251 169 a9 <sec/ Prime ˝ '' seconds (of
degree or time)
Also, inches
or double prime.
252 170 aa *
253 171 ab <frac12/ $\frac{1}{2}$ ½ 1/2 one-half
254 172 ac <frac14/ $\frac{1}{4}$ ¼ 1/4 one-quarter
255 173 ad *
256 174 ae *
257 175 af *
260 176 b0 <?/ (?) Place-holder
for unknown or
illegible
character.
261 177 b1 *
262 178 b2 *
263 179 b3 *
264 180 b4 * $\updownarrow$ ↑ * vertical arrow
265 181 b5 <hand/ ☞ * pointing hand
(printer's "fist")
266 182 b6 <bprime/ \"{} ˝ '' bold accent
(used in
pronunciations)
267 183 b7 <prime/ prime \'{} ´ ' light accent
(used in
pronunciations)
also minutes
(of arc or time)
270 184 b8 <rdquo/ rdquo '' ” " close double quote
271 185 b9 *
272 186 ba * $\parallel$ ‖ || vertical double bar
(l)
273 187 bb *
274 188 bc <sect/ sect \S § * section mark
275 189 bd <ldquo/ ldquo `` “ " open double quotes
276 190 be <amac/ amacr \=a ā a a macron
277 191 bf <lsquo/ lsquo ` ‘ ` left single quote
300 192 c0 <nsm/ ṉ ng "n sub-macron"
301 193 c1 <sharp/ sharp $\sharp$ ♯ # musical sharp
302 194 c2 <flat/ flat $\flat$ ♭ * musical flat
303 195 c3 * -- – -- long dash (en-dash? )
304 196 c4 * $-$ ― - horizontal line
305 197 c5 <th/ (part 1) t first part of
th ligature
see 231 = e7 for part 2
306 198 c6 <imac/ imacr \=i ī i i macron
307 199 c7 <emac/ emacr \=e ē e e macron
310 200 c8 <dsdot/ ḍ d Sanskrit/Tamil d dot
311 201 c9 <nsdot/ ṇ n Sanskrit/Tamil n dot
312 202 ca <tsdot/ ṭ t Sanskrit/Tamil t dot
313 203 cb <ecr/ \u{e} ĕ e e breve
314 204 cc <icr/ \u{i} ĭ i i breve
315 205 cd *
316 206 ce <ocr/ \u{o} ŏ o o breve
317 207 cf - -- ‐ - short dash
320 208 d0 -- mdash --- — --- long (em) dash
321 209 d1 <OE/ OElig \OE Œ OE OE ligature
322 210 d2 <oe/ oelig \oe œ oe oe ligature
323 211 d3 <omac/ omacr \=o ō o o macron
324 212 d4 <umac/ umacr \=u ū u u macron
325 213 d5 <ocar/ \v{o} ǒ o o hacek
326 214 d6 <aemac/ \=\ae ǣ ae ae ligature macron
327 215 d7 <oemac/ \=\oe ōē oe oe ligature macron
330 216 d8 par $\parallel$ ‖ || double vertical bar
(s)
331 217 d9 *
332 218 da *
333 219 db *
334 220 dc <ucr/ ubreve \u{u} ŭ u u breve
335 221 dd <acr/ abreve \u{a} ă a a breve
336 222 de <cre/ ssmile \u{} ˘ ~ crescent
(like a breve,
but vertically
centered --
represents the
short accent
in poetic
meter)
337 223 df <ymac/ \=y ȳ y y macron
340 224 e0 <asl/ a a "semilong"
(has a macron
above with a
short vertical
bar on top the
center of the
macron)
Used in
pronunciations.
341 225 e1 <esl/ e "semilong"
342 226 e2 <isl/ i "semilong"
343 227 e3 <osl/ o "semilong"
344 228 e4 <usl/ u "semilong"
345 229 e5 <adot/ ȧ a a with dot above
346 230 e6 * μ mu small Greek mu
347 231 e7 <th/ (part 2) second part of
th ligature;
see 197 = c5
for part 1
350 232 e8 *
351 233 e9 *
352 234 ea *
353 235 eb <edh/ edh ð th small eth
354 236 ec *
355 237 ed <thorn/ thorn þ th small thorn
356 238 ee <atil/ atilde \~a ã a a tilde
357 239 ef <ndot/ ṅ n n with dot above
360 240 f0 <rsdot/ \d{r} ṛ r r with a dot below
361 241 f1 *
362 242 f2 *
363 243 f3 *
364 244 f4 <yogh/ ȝ y small yogh
365 245 f5 <mdash/ mdash --- — --- em dash
366 246 f6 <divide/ divide $\div$ ÷ / division sign
367 247 f7 ap $\approx$ ≈ ~= "double tilde"
370 248 f8 <deg/ deg ${}^\circ$ ° * degree sign
371 249 f9 <middot/ $\bullet$ • * bold middle dot
372 250 fa * $\cdot$ · * light middle dot
373 251 fb <root/ radic $\surd$ √ * root sign
374 252 fc *
375 253 fd *
376 254 fe *
377 255 ff *
----------------------------------------------------------------------------
The table below gives some additional information about some of the more
commonly used entities:
-------------------------------------------------------------------
Frequently used:
decimal hex char definition
128 80 Ç <Cced/ c cedilla (uppercase)
129 81 ü <uum/ u umlaut
130 82 é e acute
131 83 â a circumflex
132 84 ä <aum/ a umlaut
133 85 à a grave
134 86 å <aring/ a with "ring" (circle) above (Swedish!)
135 87 ç <cced/ c cedilla
136 88 ê <ecir/ e circumflex
137 89 ë <eum/ e umlaut (or e with dieresis above)
138 8a è e grave
145 91 æ <ae/ = "ae" fused ligature
146 92 Æ <AE/ = upper-case "ae" fused ligature
147 93 ô <ocir/ o circumflex
148 94 ö <oum/ o "umlaut", used mostly in "coöperation,
Zoöl." and in pronunciations
164 a4 ñ <ntil/ Spanish "enye"
166 a6 ⅔ <frac23/ two-thirds (fraction)
167 a7 ⅓ <frac13/ one-third (fraction)
169 a9 ˝ <sec/ seconds of degree or time, or double-prime
171 ab ½ <frac12/ one-half, as in the original IBM set
172 ac ¼ <frac14/ one-fourth (fraction)
176 b0 <?/ = (reverse-video question mark), used
to represent an uncodable or illegible character
180 b4 ´ long verticle double-headed arrow (a reference mark)
181 b5 ☞ <hand/ = (the typographer's "fist")
Appearing as a "pointing hand" character
(for explanatory notes)
182 b6 ˝ bold accent in headwords
replaced in full ASCII version by double quote = "
183 b7 ´ light accent in headwords
replaced within headwords in the full ASCII version
by an open-single-quote (` = ASCII 96, not the same
as 191, \'bf). This mark is used also for
minutes of a degree, and for "prime" to modify
variables in mathematical expressions.
-- two of these in sequence represent seconds of
a degree, or double prime. The seconds symbol
is also represented by <sec/ (hex a9).
184 b8 ” close double quotes (used with 189 [= \'bd],
open quote)
186 ba ‖ vertical double bar - represents the symbol used
in the printed dictionary before a headword to
signify that the word was adopted without
anglicization from a foreign language but in the
full-ASCII version this function uses \'d8 -- see 216
188 bc § <sect/ section mark
189 bd “ open double quotes (used with 184, close quote)
190 be ā <amac/ a macron
191 bf ‘ <lsquo/ "left single quote"
single open quote mark (not same as ASCII 96)
192 c0 ṉ <nsm/ "n sub-macron", an n with a macron below --
represents the "ng" sound in pronunciations
193 c1 ♯ <sharp/ sharp - music notation
194 c2 ♭ <flat/ flat - music notation
195 c3 – long dash, one pixel removed from left
will fuse with left long dash, char 208
196 c4 ― graphic horizontal line
195+208 –— combination for a very long dash. In the
original typing, the dash char 208 was used for
both non-breaking hyphen (in hyphenated words),
and for the em-dash used as an introductory mark
for various segments. The em-dash should be
distinguished from the hyphen, but that
conversion hasn't yet been done. In the full
ASCII version, a double hypen "--" represent the
m-dash.
197 c5 t <th/ (part 1) first of a pair of characters
197+231 = th used to represent the th ligature --
<th/ represents the "th" sound of "mother"
see 231 (e7) for part 2
198 c6 ī <imac/ = i macron
199 c7 ē <emac/ = e macron
200 c8 ḍ <dsdot/ Sanskrit/Tamil d with dot underneath
201 c9 ṇ <nsdot/ Sanskrit/Tamil n with dot underneath
202 ca ṭ <tsdot/ Sanskrit/Tamil t with dot underneath
203 cb ĕ <ecr/ = e with crescent (breve) above. Used
- in some etymologies and pronunciation
204 cc ĭ <icr/ = i with crescent (breve) above - used
- in some etymologies and pronunciation
206 ce ŏ <ocr/ = o with crescent (breve) above - used
- in some etymologies and pronunciation
207 cf ‐ short dash, used in hyphenated words, and in
breaking syllables where no accent is used. But
sometimes the typists used the normal hyphen
[45], or the long dash (decimal 208) for that
purpose. The normal hyphen is the same length
as the long dash, but one pixel higher in the
character box.
# In headwords, in the full ASCII version, this
short dash is represented by the asterisk "*".
208 d0 — <mdash/ = represents the long dash, used for the em
dash which often precedes certain sections within a
definition, and which separates some sections,
such as wordforms or collocations within a
collocation segment. This is replaced in the
full ASCII version by a double hyphen, "--".
210 d2 œ <oe/ = "oe" fused ligature
211 d3 ō <omac/ = o macron
212 d4 ū <umac/ = u macron
213 d5 ǒ <ocar/ o with caron (hacek) (inverted
circumflex) above
214 d6 ǣ <aemac/ = "ae" ligature with a macron
215 d7 ōē <oemac/ = "oe" ligature with a macron
216 d8 ‖ <par/ double vertical bar (short length; the long
length is the graphics character 186)
This precedes words marked with a double
vertical bar in the original dictionary,
signifying that the word was adopted directly
into English without modification of the spelling.
220 dc ŭ <ucr/ = u with crescent above - used in some etymologies
221 dd ă <acr/ = a with crescent above - used in some etymologies
222 de ˘ <cre/ = "crescent", an upward-curving crescent
used as a poetic meter mark
223 df ȳ <ymac/ = y macron (used in Anglo-Saxon?)
229 e5 ȧ <adot/ = a with a dot above (for pronunciations)
231 e7 h <th/ (part 2) second of a two-character combination
197+231 = th used to represent the th ligature in pronunciations
<th/ represents the "th" sound of "mother"
235 eb ð <edh/ = Old English and Icelandic "edh", (or "eth")
like a Greek delta with a hatch mark through
the ascender. Used to represent the
Anglo-Saxon/Icelandic/Gothic character, in
etymologies, pronounced like "th"
237 ed þ <thorn/ "thorn", an Old English and Icelandic
character, appears like a "p" with an extended
ascender.
Used to represent the
Anglo-Saxon/Icelandic/Gothic character, in
etymologies, pronounced like "th" in "thorn"
and also as in "brother"
238 ee ã <atil/ a with tilde above - in some etymologies
244 f4 ȝ <yogh/ like a script "3" or "z". Used in Old English
etymologies, analogous to "y"
247 f7 ≈ double tilde ("approximately equals").
used by typists as a place-holder in word
combinations where the capitalized headword
should be.
248 f8 ° <deg/ degrees (temperature or angle). Note: some
typists used a superscript "o" to signify
degrees. This must be corrected!
249 f9 • middle dot (bold)
250 fa · middle dot (light)
251 fb √ <root/ "root" sign used in etymologies, as in original
* Greek transliteration
Stand-alone Greek letters are represented by entities <alpha/, <beta/,
<gamma/, <lambda/ etc. Capitalized letters are <ALPHA/, etc.
Text appearing within the markers <grk></grk>, is a Greek transliteration
written in roman letters. The following rules are used:
** Aspirants
Aspirants are represented by ' (apostrophe) and " (double quote) placed in
front of the letter modified. Apostrophe stands for ψιλὸν πνεῦμα (ψιλή or
spiritus lenis), and double quote stands for δασὺ πνεῦμα (δασεία or spiritus
asper).
'a -- ἀ
"a -- ἁ
** Accents
Accents are placed after the accented letter. The acute accent (ὀξεῖα) is
represented by ` (gravis). The grave accent (βαρεῖα) is represented by ~
(tilde), and circumflex (περισπωμένη) is represented by circumflex. Thus:
a` -- ά
a~ -- ὰ
a^ -- ᾶ
Some examples of the combined forms (aspirant + accent):
'a` -- ἄ
'a~ -- ἂ
'a^ -- ἆ
"a` -- ἄ
"a~ -- ἂ,
"a~ -- ἃ
** Iota subscriptum
Iota subscript is represented by comma placed after the affected vowel. If
the vowel is accented, the comma is placed after the accent mark. For
example:
a`, -- ᾴ
'a`, -- ᾄ
** Diaeresis
Diaeresis is represented by a colon immediately after the affected vowel.
If the vowel is accented, the accent is placed after the colon, e.g.:
i: -- ϊ
i:^ -- ῗ
i:` -- ῒ
** Letters
The table below shows, for each Greek letter, the corresponding markup
entity and transliteration. The capitalized Greek letters are represented
by the capitalized versions of the letters shown here.
-----------------------------------------
Greek letter transliteration
------------ ---------------
α alpha a
β beta b
γ gamma g
δ delta d
ε epsilon e
ζ zeta z
η eta h
θ theta q [1]
ι iota i
κ kappa k
λ lambda l
μ mu m
ν nu n
ξ xi x
ο omicron o
π pi p
ρ rho r
σ,ς sigma s [2]
τ tau t
υ upsilon y,u [3]
φ phi f
χ chi ch [4]
ψ psi ps [5]
ω omega w
---
[1] "th" was used in some earier sections, but was changed due to potential
confusion with the tau+eta combination, as in λυτήριος
(<grk>lyth`rios</grk>, at "lyterian") or ποιητής (<grk>poihth`s</grk>, at
"maker").
[2] Final sigma is not distinguished here from middle sigma, but when
isolated, use <sigmat/ ("terminal sigma") for the final form.
[3] Both y and u are used interchangeably in this edition.
[4] "c" is always followed by "h", so the "h" component is not confusable
with eta. Applications must first convert "ch" before converting "h", or at
least verify that an "h" to be converted has no preceding "c".
[5] This usage is theoretically confusable with pi-sigma, but that
combination seems never to occur.
Roman "j" and "v" are unused. Roman "u" is occasionally used instead of "y"
to represent upsilon.
Examples:
<grk>'archai:`zein</grk> ἀρχαῒζειν
<grk>zw^,on</grk> ζῷον
<grk>o'i^nos</grk> οἶνος
<grk>"ydra`rgyros</grk> ὑδράργυρος
Local Variables:
mode: Outline
coding: utf-8
fill-column: 76
End:
|