author Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 00:08:07 +0200
committer Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 00:08:07 +0200
commit 4a458db06b28492a7e48b1a0560b35778e476482
tree ef19ae1addbb291801482465d9b6a923ba2417ed /webfont.txt
parent 60c1ea4788f2702eeeba8453f158861091ed28b1
Further work on ancillary files.

* webfont.txt: Use Unicode, rewrite character table and Greek
transliteration sections.
* pronunc.txt: Update.
* tagset.txt: Update.

Diffstat (limited to 'webfont.txt')
-rw-r--r-- webfont.txt 1046
1 files changed, 529 insertions, 517 deletions
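
The rewritten Introduction in this patch describes two notations: escape sequences of the form \'xx (two lowercase hexadecimal digits) and markup entities of the form <WORD/. The following Python sketch shows one way a reader might recognize both. It is only an illustration: the function names, the partial escape table, the Unicode targets chosen for it, and the assumption that entity names are lowercase letters (or a lone "?") are not taken from webfont.txt, whose own "Uc" column remains the authority.

```python
import re

# Partial, illustrative table covering only codes named in the prose of
# webfont.txt. The Unicode targets are assumptions of this sketch.
ESCAPES = {
    "94": "\u00f6",  # "o" with diaeresis
    "ee": "\u00e3",  # "a" with tilde (code 238)
    "d5": "\u01d2",  # "o" with caron (code 213)
    "d6": "\u01e3",  # ligature "ae" with macron (code 214)
}

# Backslash, apostrophe, then exactly two lowercase hex digits.
ESCAPE_RE = re.compile(r"\\'([0-9a-f]{2})")

# Assumed entity shape: "<", a lowercase name or "?", then "/".
ENTITY_RE = re.compile(r"<([a-z]+|\?)/")


def decode_escapes(text):
    """Replace known \\'xx sequences; leave unknown ones untouched."""
    def repl(match):
        return ESCAPES.get(match.group(1), match.group(0))
    return ESCAPE_RE.sub(repl, text)


def find_entities(text):
    """Return the names of all <WORD/ entities, in order of appearance."""
    return ENTITY_RE.findall(text)
```

Because an ordinary tag such as <it> ends in ">" rather than "/", the entity pattern skips it, which is why the two notations can share "<" as the only escape character.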
diff --git a/webfont.txt b/webfont.txt
index 591e980..d432fe5 100644
--- a/webfont.txt
+++ b/webfont.txt
@@ -1,88 +1,70 @@
  WEBSTER FONTS
  =============
 
- Fonts for the Webster 1913 Dictionary.
- For version 0.50
- Last edit May 5, 2001
- ______________________________________
- (This file contains some extended ASCII characters, and should be
-transmitted in binary mode)
-----------------------------------------------------------------------
-
- This file describes a modified font for use in visualizing the
-text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913),
-usable for the DOS operating system of IBM-compatible personal computers.
-The electronic version of that dictionary and this font were prepared by
-MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc.
-For details of permissions and restrictions on using these files, see
-the accompanying file "readme.web".
- The special characters used in the electronic version of the Webster
+* Overview
+
+This file describes special symbols and markup entities used in the
+GNU Collaborative International Dictionary of English.
+
+* Introduction
+
+The special characters used in the electronic version of the Webster
 1913 are required for visualizing unusual characters used in the
 etymology and pronunciation fields of the dictionary, in a form
-comparable to the way they appear in the original. Since there are
-more than 256 characters used in that dictionary, not all can be
-represented by single-byte codes, and are instead represented by
-SGML-style "short-form" symbols. (rather than the "entity" format
-"&xx;" The ampersand is used frequently, and we prefer to leave
-the "<" as the only "escape" character) of the type <x/ where x
-is a specific code for the symbol in the dictionary.
-See the "Short Form" section below for details about such characters.
-Note that the symbols used here are in some cases abbreviations
-(for compactness) of the ISO 8879 recommended symbols. If necessary,
-the table below allows simple replacement by alternate encodings.
- This symbol font can be loaded in IBM-compatible (x86) computers
-running the DOS operating system by using the "font.bat" command file
-in the "utils" directory. The fonts files for 8x14 and 8x16 fonts are
-"web14.fnt" and "web16.fnt" respectively.
- For those loading the Webster onto some machine other than an
-IBM-compatible running DOS, it will be necessary to provide a
-translation table, to convert these characters into a code that
-can be handled by that computer. For this reason, I attach an
-"explanation" for each character, for those who cannot view
-the original DOS font.
- The DOS-loadable font does not contain all of the characters needed
-to depict the etymologies or the pronunciations. In addition to an
-absence of several characters used in the pronunciations, no Greek letters are
-included. The Greek words appearing in the etymologies,
-when they are included, will be typed in a
-roman-letter transcription (See section on Greek transcription, below).
-Only a very few Greek words have been thus transcribed as of the
-present version (version 0.41).
- Wherever the typists did not know the character to use, they
-usually inserted a reverse-video question mark (decimal 176).
-This appears in full-ASCII versions as <?/. This mark was used both for
-characters in non-ASCII fonts, and for unreadable characters (i.e.,
-characters smeared in the original or distorted in the copies available
-to the typists. The type in the original was in many places smeared and
+comparable to the way they appear in the original.
+
+The GCIDE markup provides two ways for representing such characters:
+using special "escape sequences" and using special markup entities.
+Historically, "escape sequences" were used to indicate the
+character's ordinal position in a special font, prepared by MICRA,
+Inc. to represent it on screen. Although nowadays this method is
+obsolete, the dictionary corpus still uses these sequences. This file
+describes their mapping to Unicode characters.
+
+An escape sequence has the form \'xx, where "x" represent lowercase
+hexadecimal digits. For example, \'94 stands for "o" with diaeresis.
+There are only 256 such sequences.
+
+Special markup entities are able to represent a wider range of
+characters. A markup entity is similar to SGML one, but has a
+different format. The traditional &xx; format was judged inconvenient
+because the ampersand is used frequently in the corpus. Instead,
+GCIDE entities have the format <WORD/, where "<" and "/" represent the
+beginning and end of the entity and WORD represents the character
+itself. Valid WORDs are in some cases abbreviations (for compactness)
+of the ISO 8879 recommended symbols. Characters representable by
+escape sequences can also be represented by entities, but the reverse
+is not true, due to a limited range of the former.
+
+The Greek words appearing in the etymologies, when they are included,
+are typed in a roman-letter transcription, which is described below in
+chapter "Greek transliteration".
+
+* Unrecognized characters
+
+Wherever the typists did not know the character to use, they usually
+inserted a reverse-video question mark (decimal 176). This appears in
+full-ASCII versions as <?/. This mark was used both for characters in
+non-ASCII fonts, and for unreadable characters (i.e., characters
+smeared in the original or distorted in the copies available to the
+typists. The type in the original was in many places smeared and
 illegible at the left and right page margins; occasionally, small
 parts of words were blotted out by plain white space).
- A character table for the high-order characters appears below.
-Under that is a list and description of most of the special characters
-used in the Webster files.
- Note that there are yet some characters used in the etymologies,
-and some other symbols, which are not in this list. For example, the
-vowels with a double dot *underneath*, e.g. a (as in all) have no representation
-in this character set, and, where explicitly entered in the dictionary,
-are represented by <xdd/ where "x" is the letter, as in "<add/".
-
-ITALICS
--------
- In most places, italic font is represented by the tags <it>...</it>
+
+* Italics
+
+In most places, italic font is represented by the tags <it>...</it>
 surrounding the italic text, or by some other tag which also implies
 italic font. In the pronunciations, however, where italicized vowels
 are used among non-italic and other special characters to indicate
 pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
 are also used to indicate the italicized vowel.
 
-DIACRITICS
--------------
- The European grave and acute accents are represented by the
-standard (IBM PC) high-order codes. Other characters with diacritics
-are represented by special "entity" codes, and in some cases also
-are found in this special WEB1913 font, described below.
- Vowels with a circle above (as in Swedish) are coded <xring/
-(x with a ring, or "degrees" mark over it); vowels with tilde over them
-are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
+* Diacritics
+
+Vowels with a circle above (as in Swedish) are coded <xring/ (x with a
+ring, or "degrees" mark over it); vowels with tilde over them are
+represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
 also has code 238); letters with a dot above are represented by <xdot/
 -- letter with a dot below are represented by <xsdot/ ("subdot");
 vowels with the semi-long mark (a macron with a short perpendicular
@@ -93,70 +75,57 @@ the "oo" with an unbroken macron above the two letters, <aemac/ = the
 ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature
 oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent
 (breve) above have codes in this list, but may also be represented by
-<xum/ and <xcr/ respectively. There is an occasional hacek or caron mark
-(an inverted circumflex) in the original; such letters are coded <xcar/.
-The o with a caron has code 213, but no others are in this font list.
+<xum/ and <xcr/ respectively. There is an occasional hacek or caron
+mark (an inverted circumflex) in the original; such letters are coded
+<xcar/. The o with a caron has code 213, but no other letter with a
+caron is representable by an escape sequence.
+
 The diaeresis is treated typographically as identical to the umlaut.
- A special modification, used only for poetry (see entry "saturnian verse"
-under "saturnian") is a vowel with a macron, in which the macron is lighter
-than the usual macron, signifying a stressed syllable which has a short
-vowel sound. This is represented by <xsmac/ ("short mac").
- Another special character used in pronunciations is an "n" with an underline (like
-a macron, but below the letter), used to represent the "ng" sound. This is coded
-<nsm/ ("n sub-macron"). The ligated th used in pronunciations to depict the
-"th" sound of "the" is coded as <th/.
- NOTE: the letter combinations "fi" and "fl" are invariably printed as the
+A special modification, used only for poetry (see entry "saturnian
+verse" under "saturnian") is a vowel with a macron, in which the
+macron is lighter than the usual macron, signifying a stressed
+syllable which has a short vowel sound. This is represented by
+<xsmac/ ("short mac").
+
+Another special character used in pronunciations is an "n" with an
+underline (like a macron, but below the letter), used to represent the
+"ng" sound. This is coded <nsm/ ("n sub-macron"). The ligated th
+used in pronunciations to depict the "th" sound of "the" is coded as
+<th/.
+
+NOTE: the letter combinations "fi" and "fl" are invariably printed as the
 ligatures &filig; and &fllig;, but these ligatures are not marked as such
 in this transcription, and the two letters are left as individuals.
 
-SPECIAL SYMBOLS
- The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used.
- The double prime, or "seconds" of a degree is sometimes represented by
-a double "light accent" (code 183 = \'b7). In other places, and in later
-versions, it is represented by <sec/ = hex a9, in the webfont.
- The symbols "greater than" <gt/ and "less than" are encountered only
-once, but are distinguished from the right- and left-angle brackets
-(> and <) because of possible typographical differences in some fonts.
- The schwa is symbolized by <schwa/. It is not used in the
-pronunciations, but is mentioned as a symbol.
- The right-pointing arrow is <rarr/, consistent with ISO 8879.
-
------------------------------------
-Table 1
------------------------------------
-Numbers
- Hex codes
-1
-11   (12 is a hard page break, 13 CR, 14 sect break)
-21
-31  !"# $%&'(
-121 yz{|} ~ 79-7d 7e-82
-131 83-87 88-8c
-141 8d-91 92-96
-151 97-9b 9c-a0
-161 a1-a5 a6-aa
-171 ab-af b0-b4
-181 b5-b9 ba-be
-191 bf-c3 c4-c8
-201 c9-cd ce-d2
-211 d3-d7 d8-dc
-221 dd-e1 e2-e6
-231 e7-eb ec-f0
-241 f1-f5 f6-fa
-251 fb-ff
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-Below is a complete list of the symbols used in the Webster ("webfont")
-which are encoded in the special font listed above, together with
-corresponding symbols in ISO 8879 and Tex coding. Much of this table was
+* Special symbols
+
+The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are
+rarely used.
+
+The double prime, or "seconds" of a degree is sometimes represented by
+a double "light accent" (code 183 = \'b7). In other places, and in
+later versions, it is represented by <sec/ = \'a9.
+
+The symbols "greater than" <gt/ and "less than" are encountered only
+once, but are distinguished from the right- and left-angle brackets (>
+and <) because of possible typographical differences in some fonts.
+
+The schwa is symbolized by <schwa/. It is not used in the
+pronunciations, but is mentioned as a symbol. The right-pointing
+arrow is <rarr/, consistent with ISO 8879.
+
+* Symbol summary
+
+Below is a complete list of the symbols used in the Webster, together
+with their "webfont" number (escape sequence), corresponding markup
+entity, and corresponding symbols in ISO 8879 and Tex coding. Much of
+this table was prepared by Rik Faith, to whom we express our
+appreciation.
+
+The "Uc" column gives the Unicode representation of the character.
+The "nearest ASCII" equivalents are given for those who want to
+display the data as best one can in 7-bit simple ASCII symbols without