author | Sergey Poznyakoff <gray@gnu.org.ua> | 2012-02-02 14:42:06 +0200 |
---|---|---|
committer | Sergey Poznyakoff <gray@gnu.org.ua> | 2012-02-02 14:42:06 +0200 |
commit | 3d4fbac289846464491104b01bebe554da6758da (patch) | |
tree | ef314e6d3f0c12d1879e43c4c0bb5753cc9e5f78 /tagset.txt | |
parent | b61268b9deea32b7d965808f47d1227e3197a83c (diff) | |
Reorganize the directory structure.
* .gitignore: New file.
* Makefile: Fix the list of distributed files.
* README.DIC: Rename to README and edit.
* WXXVII.JPG: Remove.
* abbrevn.lst: New file.
* authors.lst: New file.
* gcide.conf: New file.
* PRONUNC.JPG: Rename to pronunc.jpg.
* PRONUNC.WEB: Rename to pronunc.txt.
* SYMBOLS.JPG: Rename to symbols.jpg
* TAGSET.WEB: Rename to tagset.txt
* WEBFONT.ASC: Rename to webfont.txt.
* titlepage.png: New file.
Diffstat (limited to 'tagset.txt')
-rw-r--r-- | tagset.txt | 1080 |
1 files changed, 1080 insertions, 0 deletions
diff --git a/tagset.txt b/tagset.txt new file mode 100644 index 0000000..f0b9367 --- /dev/null +++ b/tagset.txt | |||
@@ -0,0 +1,1080 @@ | |||
1 | FIELD MARKS FOR WEBSTER 1913 and CIDE | ||
2 | ===================================== | ||
3 | Tagset.web: | ||
4 | Explanations of the tags used to mark the Webster 1913 dictionary | ||
5 | and the CIDE (Collaborative International Dictionary of English). | ||
6 | Note that the list of tags used to mark the public domain version | ||
7 | of this dictionary is shorter than the full set described here. | ||
8 | If any tag is not listed here, it is either (1) one of the | ||
9 | "point" (font size) or "type" (font style) tags, which should be self-explanatory; or | ||
10 | (2) a functional field with no effect on the typography. | |
11 | |||
12 | Last modified March 12, 1999. | ||
13 | For questions, contact: | ||
14 | Patrick Cassidy cassidy@micra.com | ||
15 | 735 Belvidere Ave. | ||
16 | Plainfield, NJ 07062 | ||
17 | (908) 561-3416 or (908) 668-5252 | ||
18 | ------------------------------------------------------------- | ||
19 | A separate file, webfont.asc, contains the list of the individual | ||
20 | non-ASCII characters represented by either higher-order hexadecimal | ||
21 | character marks (e.g., \'94, for o-umlaut) or by entity tags | ||
22 | (e.g., <root/, for the square root symbol.) | ||
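The two kinds of character marks described above can be decoded with a small lookup, sketched here in Python. This is only an illustration: the function name `decode_marks` and both tables are hypothetical, and the tables hold just the two examples quoted in this file (`\'94` and `<root/`). The full list lives in the separate font file and is not reproduced here, so any mark not in the tables is left unchanged.

```python
import re

# Only the two examples named in this file; the complete tables are
# in the separate font file and are NOT reproduced here.
HEX_MARKS = {"94": "\u00f6"}   # \'94  -> o-umlaut
ENTITIES = {"root": "\u221a"}  # <root/ -> square root symbol

# Matches either a hexadecimal character mark (\'xx) or an entity tag (<name/).
MARK = re.compile(r"\\'([0-9a-fA-F]{2})|<([A-Za-z][A-Za-z0-9]*)/")

def decode_marks(text):
    """Replace known character marks; leave unknown ones untouched."""
    def repl(m):
        hex_code, entity = m.groups()
        if hex_code is not None:
            return HEX_MARKS.get(hex_code.lower(), m.group(0))
        return ENTITIES.get(entity, m.group(0))
    return MARK.sub(repl, text)
```

Leaving unknown marks in place, rather than dropping them, keeps the text recoverable until the remaining entries are filled in from the font file.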
23 | -------------------------------------------------------------- | ||
24 | Use of tags: | ||
25 | In the MICRA electronic version of the 1913 Webster, each part of | ||
26 | the entry headed by an entry word ("headword") is labeled so that no | ||
27 | part of the entry except some punctuation marks should be found | ||
28 | outside of all fields, i.e. every character should be within some tagged | ||
29 | field. In the following description, the word "segment" usually refers to | ||
30 | a major part of an entry such as an etymology or a definition or a | ||
31 | collocation segment or a usage block, containing more than one field. | ||
32 | The term "field" may also be used similarly to "segment", but may also | ||
33 | denote single-word fields, such as an alternative spelling, labeled <asp>. | ||
34 | |||
35 | Note: The tags on this list are similar in structure to SGML tags. Each | ||
36 | tag on this list marks a field; each field opens with a tagname between | ||
37 | angle brackets thus: <tagname>, and closes with a similar tag containing | ||
38 | the forward slash thus: </tagname>. No tags are used without closing | ||
39 | tags. Thus the HTML <BR> to indicate a line break is symbolized | ||
40 | here as an entity, <br/, and every <p> has a corresponding </p>. | ||
41 | The absence of an end-field tag, or the presence of an end-field tag | ||
42 | without a prior begin-field tag constitutes a typographical error, of which | ||
43 | there may be a significant number. Any errors detected should be brought | ||
44 | to the attention of PJC or the appropriate editor. | ||
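The pairing rule in the note above (every begin-field tag needs a matching end-field tag, while entities such as `<br/` stand alone) lends itself to a mechanical check. The following Python sketch is one possible way to look for such errors; the name `check_fields` and the regular expression are assumptions made for illustration, not part of the MICRA tools.

```python
import re

# <name> opens a field, </name> closes it, <name/ is an entity (no end tag).
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)(/|>)")

def check_fields(text):
    """Return a list of (problem, tagname) pairs for unbalanced field tags."""
    stack = []      # names of fields that are currently open
    problems = []
    for m in TOKEN.finditer(text):
        closing, name, end = m.groups()
        if end == "/":
            continue                    # entity such as <br/ : nothing to pair
        if not closing:
            stack.append(name)          # begin-field tag
        elif stack and stack[-1] == name:
            stack.pop()                 # properly nested end-field tag
        elif name in stack:
            # Inner fields were left open; report them, then close this one.
            while stack[-1] != name:
                problems.append(("missing end tag", stack.pop()))
            stack.pop()
        else:
            problems.append(("end tag without begin tag", name))
    # Anything still open at the end of the text never got an end tag.
    problems.extend(("missing end tag", name) for name in reversed(stack))
    return problems
```

Run over one entry at a time, a non-empty result points at the kind of typographical error the note asks readers to report.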
45 | Most of the tagged fields are presented in the text in italic type, | ||
46 | with a number of exceptions. Where a word is contained within more than | ||
47 | one field, the innermost field determines the font to be used. Wherever | ||
48 | recognizable functional fields were found, an attempt was made to tag the | ||
49 | field with a functional mark, but in many cases, words were italicised only | ||
50 | to represent the word itself as a discourse entity, and in some such cases, | ||
51 | the "italic" mark <it> was used, implying nothing regarding functionality | ||
52 | of the word. The base font is considered "plain". Where an italic field | ||
53 | is indicated, parentheses or brackets within the field are not italicised. | ||
54 | Where no font is specified for a tag, the tag is merely a functional | ||
55 | division, and was printed in plain font unless otherwise tagged. This type | ||
56 | of segment is marked by an asterisk (*) where the font name would be. | ||
57 | The size of the "plain" font in the original text is about 1.6 mm for | ||
58 | the height of capitalized letters. | ||
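The rule that the innermost field determines the font can be sketched as a lookup over the stack of currently open fields. The Python below is a minimal illustration. The table lists only a few tags from this file, the names `TAG_FONTS` and `effective_font` are hypothetical, and treating asterisk (*) tags as transparent (they inherit the font of the enclosing field) is an assumption rather than something the text states outright.

```python
# Font per tag, taken from a few table entries in this file. None stands for
# the asterisk (*): a purely functional tag with no font of its own.
TAG_FONTS = {
    "plain": "plain",
    "it": "italic",
    "bold": "bold",
    "asp": "italic",
    "au": "italic",
    "altsp": None,
    "bio": None,
}

def effective_font(open_fields):
    """Font for text inside the given open fields, listed outermost first."""
    for name in reversed(open_fields):      # innermost field is checked first
        font = TAG_FONTS.get(name)          # unknown tags count as functional
        if font is not None:
            return font
    return "plain"                          # the base font
```

For example, text inside `<altsp>` but outside any `<asp>` comes out plain, while the alternative spelling itself comes out italic.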
59 | ============================================================= | ||
60 | Explicit typographical tags: | ||
61 | These were used where the purpose of a different font was merely to | ||
62 | distinguish a word from the body of the text, and no explicit functional | ||
63 | tag seemed appropriate. | |
64 | ----------------------------------- | ||
65 | Tag Font | ||
66 | ----------------------------------- | ||
67 | Explicit formatting tags: | ||
68 | . . . . . . . . . . . . . . . . . . | ||
69 | <plain> plain font (that used in the body of a definition) -- | ||
70 | normally not marked, except within fields of | ||
71 | a different font. | |
72 | <it> italic (in master files) | ||
73 | <i> italic (for use in HTML presentation) | ||
74 | <bold> bold (in master files) | ||
75 | <b> bold (for use in HTML presentation) | ||
76 | <colf> bold, smaller by 1 point. Collocation font. Same font as used in collocations. | |
77 | This is used only in the list of "un-" words not | |
78 | actually defined in the dictionary. Probably could be | |
79 | replaced by a segment mark for the entire list! | |
80 | The "un-" words should be indexed as headwords. | |
81 | |||
82 | <ct> bold Same as <colf>, a font similar to that used in | ||
83 | collocations. However, this tag is used in a table | ||
84 | and could be set to a different font. | ||
85 | |||
86 | <h1> * HTML tag -- largest heading font. | ||
87 | |||
88 | <h2> * HTML tag -- second largest heading font. | ||
89 | |||
90 | <headrow> * Marks a Row title in a table. | ||
91 | |||
92 | <hwf> Font the same as the headword <hw>, though the field is | ||
93 | not a headword. Used only once. | ||
94 | |||
95 | <mitem> * Multiple items, a set of items in a table. | ||
96 | <point ...> A series of point size markers, many unique. | ||
97 | <point1.5> * One of the tags of the form <point**> where ** | ||
98 | <point6> represents the typographic point size of the | ||
99 | enclosed text. | ||
100 | <pre> An HTML tag indicating that the enclosed text is | ||
101 | of teletype form, preformatted in a uniform-spaced | ||
102 | font. | ||
103 | <sc> small caps (used mostly for "a. d.", "b. c.") | ||
104 | This is the same font as <er>, but has no functional | |
105 | or semantic significance | ||
106 | <str> group of table data elements in a table | ||
107 | <sub> subscript, like <subs> | ||
108 | <subs> subscript | ||
109 | <sups> superscript | ||
110 | <supr> superscript | ||
111 | <sansserif> Sans-serif font | ||
112 | <stypec> Bold (collocation font) and also a subtype. | ||
113 | <tt> HTML tag -- teletype font | |
114 | <universbold> A squared bold font without serifs approximating the | ||
115 | "universe bold" font on the HP Laserjet4, slightly | ||
116 | larger than the capitals in a definition body. Used | ||
117 | in expositions describing shapes, such as | ||
118 | "Y", "T", "U", "X", "V", "F". | ||
119 | <vertical> Vertically organized column. | ||
120 | <column1> Vertically organized column -- only part of a table | ||
121 | which needs to be completed. Used once. | ||
122 | <...type> A series of tags, many unique, designating certain | ||
123 | unusual fonts, such as "bourgeoistype" for | ||
124 | "bourgeois type", in the section on typography. | ||
125 | Most of these occur only once, in the section on fonts. | ||
126 | <antiquetype> | ||
127 | <blacklettertype> | ||
128 | <boldfacetype> | ||
129 | <bourgeoistype> | ||
130 | <boxtype> | ||
131 | <clarendontype> | ||
132 | <englishtype> | ||
133 | <extendedtype> | ||
134 | <frenchelzevirtype> | ||
135 | <germantype> | ||
136 | <gothictype> | ||
137 | <greatprimertype> | ||
138 | <longprimertype> | ||
139 | <miniontype> | ||
140 | <nonpareiltype> | ||
141 | <oldenglishtype> | ||
142 | <oldstyletype> | ||
143 | <pearltype> | ||
144 | <picatype> | ||
145 | <scripttype> | ||
146 | <smpicatype> | ||
147 | <typewritertype> | ||
148 | |||
149 | ============================================================= | ||
150 | Tags with semantic content: | ||
151 | . . . . . . . . . . . . . . . . . . . . . . . . . . . | ||
152 | <altsp> * Alternative spelling segment. Almost always | ||
153 | contained within square brackets after the main | ||
154 | definition segment. Expository words | ||
155 | such as "Spelled also" are in plain font; | ||
156 | the actual alternative spelling is marked by | ||
157 | <asp> ... </asp> tags within this segment. | ||
158 | |||
159 | <ant> italic Antonym. | ||
160 | |||
161 | <asp> italic Alternative spelling. The actual word which is an | ||
162 | alternative spelling to the headword. These | ||
163 | are functionally synonyms of the headword. In | ||
164 | most cases these also occur as headwords, with | ||
165 | reference to the word where the actual definition | ||
166 | is found, but not all such words are listed | ||
167 | separately, particularly if the spelling is | ||
168 | close enough to the headword to be found at the | ||
169 | same point in the dictionary. Whether listed | ||
170 | separately or not, these words should | ||
171 | be indexed at this location, also. | ||
172 | |||
173 | <au> italic (may be right-justified; see the section on formatting). | |
174 | Authority or author. Used where an authority is given for a | |
175 | definition, and also used for the author, where a quotation | |
176 | within double quotes is given in the same paragraph as the | |
177 | definition. The double quotes are indicated | |
178 | by the open-quote (\'bd) and close-quote | ||
179 | (\'b8). In both cases, it is typically | ||
180 | right-justified, almost always fitting on | ||
181 | the same line with the last line of the | ||
182 | definition or quotation. | ||
183 | Within collocation segments, it is usually | ||
184 | used only after quotations, and is not right- | ||
185 | justified, except occasionally where it | ||
186 | would be close to the right margin, and then | ||
187 | apparently it is right-justified. We have | |
188 | not explicitly marked those which are | ||
189 | right-justified, but they can be | ||
190 | recognized because they are on a line by | ||
191 | themselves, preceded by two carriage returns. | ||
192 | |||
193 | <bio> * Marks a biography. Should be longer than | ||
194 | a short mention of who a person was, which | ||
195 | is typically included as a definition. | ||
196 | |||
197 | <biography> * Same as <bio> | ||
198 | |||
199 | <booki> italic Marks the name of a book, pamphlet, or similar | ||
200 | document. | ||
201 | |||
202 | <branchof> * A field of knowledge of which the headword | |
203 | is a division. | ||
204 | |||
205 | <caption> * Caption of a figure or table. | ||
206 | |||
207 | <cas> * tags the CAS (Chemical Abstracts Service) registry | ||
208 | number for a chemical substance. | ||
209 | |||
210 | <causes> italic tags the infectious disease caused by the headword. | ||
211 | Implied type of the agent is a microorganism, and | ||
212 | the tag must mark a disease. | ||
213 | |||
214 | <causesp> * Same as <causes> without the italic type. | ||
215 | <causedbyp> * Same as <causedby> without the italic type. | ||
216 | |||
217 | <causedby> italic inverse of causes: tags the causative agent of an | ||
218 | infectious disease, which is the headword. | |
219 | The tag must mark a microorganism, virus, or | |
220 | prion, and the implied type of the headword is | ||
221 | a disease. | ||
222 | |||
223 | <centered> Used only for the single letter in the headers to each | |
224 | letter of the alphabet. | ||
225 | |||
226 | <city> * |