author     Sergey Poznyakoff <gray@gnu.org.ua>  2012-02-02 14:42:06 +0200
committer  Sergey Poznyakoff <gray@gnu.org.ua>  2012-02-02 14:42:06 +0200
commit     3d4fbac289846464491104b01bebe554da6758da
tree       ef314e6d3f0c12d1879e43c4c0bb5753cc9e5f78 /tagset.txt
parent     b61268b9deea32b7d965808f47d1227e3197a83c
Reorganize the directory structure.

* .gitignore: New file.
* Makefile: Fix the list of distributed files.
* README.DIC: Rename to README and edit.
* WXXVII.JPG: Remove.
* abbrevn.lst: New file.
* authors.lst: New file.
* gcide.conf: New file.
* PRONUNC.JPG: Rename to pronunc.jpg.
* PRONUNC.WEB: Rename to pronunc.txt.
* SYMBOLS.JPG: Rename to symbols.jpg.
* TAGSET.WEB: Rename to tagset.txt.
* WEBFONT.ASC: Rename to webfont.txt.
* titlepage.png: New file.
Diffstat (limited to 'tagset.txt')
-rw-r--r--  tagset.txt  1080
1 file changed, 1080 insertions, 0 deletions
diff --git a/tagset.txt b/tagset.txt
new file mode 100644
index 0000000..f0b9367
--- /dev/null
+++ b/tagset.txt
@@ -0,0 +1,1080 @@
 FIELD MARKS FOR WEBSTER 1913 and CIDE
 =====================================
tagset.txt (formerly TAGSET.WEB):
 Explanations of the tags used to mark the Webster 1913 dictionary
and the CIDE (Collaborative International Dictionary of English).
Note that the list of tags used to mark the public domain version
of this dictionary is shorter than the full set described here.
 If a tag is not listed here, it is either (1) one of the "point"
(font size) or "type" (font style) tags, which should be
self-explanatory; or (2) a functional field with no effect on the
typography.

Last modified March 12, 1999.
 For questions, contact:
 Patrick Cassidy   cassidy@micra.com
 735 Belvidere Ave.
 Plainfield, NJ 07062
 (908) 561-3416 or (908) 668-5252
-------------------------------------------------------------
A separate file, webfont.txt (formerly WEBFONT.ASC), contains the list
of the individual non-ASCII characters, represented either by
higher-order hexadecimal character marks (e.g., \'94 for o-umlaut) or
by entity tags (e.g., <root/ for the square root symbol).
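As an illustration only, here is a minimal Python sketch of how a converter might expand these marks.
It assumes that a hex mark is a backslash, an apostrophe and two hex digits, and that an entity is "<", a name and "/" with no closing ">".
The tables HEX_MARKS and ENTITIES and the function decode are hypothetical.
The tables hold only the examples named in this file (\'94, \'bd, \'b8, <root/, <br/). The complete character list is in the separate character file and is not reproduced here.

```python
import re

# Hypothetical sample tables: only the examples mentioned in this file.
# The complete character list lives in a separate file (not reproduced).
HEX_MARKS = {
    "94": "\u00f6",  # \'94 -> o-umlaut
    "bd": "\u201c",  # \'bd -> open double quote
    "b8": "\u201d",  # \'b8 -> close double quote
}
ENTITIES = {
    "root": "\u221a",  # <root/ -> square root symbol
    "br": "\n",        # <br/ -> line break
}

# A hex mark: backslash, apostrophe, two hex digits.
HEX_RE = re.compile(r"\\'([0-9A-Fa-f]{2})")
# An entity: "<", a name, then "/" (no closing ">"), so neither <tag>
# nor </tag> can match.
ENTITY_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/")


def decode(text: str) -> str:
    """Expand known hex marks and entities; unknown ones are left unchanged."""
    text = HEX_RE.sub(lambda m: HEX_MARKS.get(m.group(1).lower(), m.group(0)), text)
    return ENTITY_RE.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), text)
```

Unknown marks are passed through untouched rather than dropped, so a partial table never loses text.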
--------------------------------------------------------------
 Use of tags:
 In the MICRA electronic version of the 1913 Webster, each part of
the entry headed by an entry word ("headword") is labeled, so that no
part of the entry except some punctuation marks should be found
outside all fields; i.e., every character should be within some tagged
field. In the following description, the word "segment" usually refers to
a major part of an entry, such as an etymology, a definition, a
collocation segment, or a usage block, containing more than one field.
The term "field" is sometimes used in the same sense as "segment", but it
may also denote a single-word field, such as an alternative spelling,
labeled <asp>.
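The "every character within some field" rule can be checked mechanically.
Below is a minimal Python sketch that assumes tags of the form <name> and </name>.
At the top level it accepts any run made only of whitespace and ASCII punctuation, which is an approximation of "some punctuation marks".
The function name text_outside_fields is hypothetical.

```python
import re
import string

# A begin-field tag <name> or end-field tag </name>; names may contain
# digits and dots (e.g. <point1.5>). Entities such as <br/ are not matched.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9.]*>")
ALLOWED = set(string.whitespace + string.punctuation)


def text_outside_fields(entry: str) -> list:
    """Return runs of text lying outside every field, ignoring runs that
    consist only of whitespace and ASCII punctuation."""
    stray = []
    depth = 0  # number of currently open fields
    pos = 0
    for m in TAG_RE.finditer(entry):
        run = entry[pos:m.start()]
        if depth == 0 and not set(run) <= ALLOWED:
            stray.append(run)
        if m.group(0).startswith("</"):
            depth = max(depth - 1, 0)  # tolerate stray end tags here
        else:
            depth += 1
        pos = m.end()
    tail = entry[pos:]
    if depth == 0 and not set(tail) <= ALLOWED:
        stray.append(tail)
    return stray
```

Stray end tags are deliberately tolerated here. Detecting those is a separate tag-balance check.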

 Note: The tags on this list are similar in structure to SGML tags. Each
tag on this list marks a field; each field opens with a tag name between
angle brackets, thus: <tagname>, and closes with a similar tag containing
a forward slash, thus: </tagname>. No tags are used without closing
tags. Thus the HTML <BR>, which indicates a line break, is represented
here as an entity, <br/, and every <p> has a corresponding </p>.
 The absence of an end-field tag, or the presence of an end-field tag
without a prior begin-field tag, constitutes a typographical error, of which
there may be a significant number. Any errors detected should be brought
to the attention of PJC or the appropriate editor.
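Because every field must close, the two error cases named above can be found with a stack.
This is a minimal Python sketch, assuming that fields nest properly as in SGML; a misnested end tag is reported as unmatched.
Entities such as <br/ are skipped because they have no ">" after the name.
The function check_tags is hypothetical.

```python
import re

# Matches <name> and </name>; entities such as <br/ are not matched.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)>")


def check_tags(text: str) -> list:
    """Report unmatched begin-field and end-field tags with their offsets."""
    errors = []
    stack = []  # (tag name, offset) of currently open fields
    for m in TAG_RE.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()  # properly closed field
        else:
            errors.append(f"end tag </{name}> at {m.start()} without matching begin tag")
    # Anything still open at the end lacks an end-field tag.
    for name, pos in stack:
        errors.append(f"begin tag <{name}> at {pos} never closed")
    return errors
```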
 Most of the tagged fields are presented in the text in italic type,
with a number of exceptions. Where a word is contained within more than
one field, the innermost field determines the font to be used. Wherever
recognizable functional fields were found, an attempt was made to tag the
field with a functional mark; but in many cases words were italicised only
to represent the word itself as a discourse entity, and in some such cases
the "italic" mark <it> was used, implying nothing about the function
of the word. The base font is considered "plain". Where an italic field
is indicated, parentheses or brackets within the field are not italicised.
 Where no font is specified for a tag, the tag is merely a functional
division, and its text was printed in plain font unless otherwise tagged.
Such segments are marked by an asterisk (*) where the font name would be.
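A rendering program needs a rule for picking the font of a character inside nested fields.
This is a minimal Python sketch of one reading of the rules above.
It assumes that functional fields ("*") and unknown tags are skipped, so an enclosing font tag still applies; the text above does not settle this case.
TAG_FONTS is a small hypothetical subset drawn from the tables in this file, and font_for is a hypothetical name.

```python
# Hypothetical subset of the tag tables in this file: tag -> font.
# None stands for a functional field ("*" in the font column).
TAG_FONTS = {
    "plain": "plain", "it": "italic", "i": "italic",
    "bold": "bold", "b": "bold",
    "asp": "italic", "ant": "italic", "au": "italic",
    "altsp": None, "bio": None,
}
BRACKETS = {"(", ")", "[", "]"}


def font_for(open_fields, char=""):
    """Font of a character enclosed by open_fields (listed outermost first)."""
    font = "plain"  # the base font
    # The innermost field that names a font wins; functional fields and
    # unknown tags are skipped (an assumption).
    for tag in reversed(open_fields):
        f = TAG_FONTS.get(tag)
        if f is not None:
            font = f
            break
    # Parentheses and brackets inside an italic field are not italicised.
    if font == "italic" and char in BRACKETS:
        return "plain"
    return font
```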
 The size of the "plain" font in the original text is about 1.6 mm,
measured as the height of capital letters.
=============================================================
Explicit typographical tags:
 These were used where the purpose of a different font was merely to
distinguish a word from the body of the text, and no explicit functional
tag seemed appropriate.
-----------------------------------
Tag           Font           Description
-----------------------------------
Explicit formatting tags:
. . . . . . . . . . . . . . . . . .
<plain>       plain          The font used in the body of a definition;
                             normally not marked, except within fields of
                             a different font.
<it>          italic         In master files.
<i>           italic         For use in HTML presentation.
<bold>        bold           In master files.
<b>           bold           For use in HTML presentation.
<colf>        bold, smaller  Collocation font: the same font as used in
              by 1 point     collocations. This is used only in the list of
                             "un-" words not actually defined in the
                             dictionary. Probably could be replaced by a
                             segment mark for the entire list! The "un-"
                             words should be indexed as headwords.

<ct>          bold           Same as <colf>, a font similar to that used in
                             collocations. However, this tag is used in a
                             table and could be set to a different font.

<h1>          *              HTML tag -- largest heading font.

<h2>          *              HTML tag -- second largest heading font.

<headrow>     *              Marks a row title in a table.

<hwf>                        The same font as the headword <hw>, though the
                             field is not a headword. Used only once.

<mitem>       *              Multiple items: a set of items in a table.
<point ...>                  A series of point-size markers, many unique.
<point1.5>    *              One of the tags of the form <point**>, where **
<point6>                     represents the typographic point size of the
                             enclosed text.
<pre>                        An HTML tag indicating that the enclosed text is
                             preformatted and set in a monospaced
                             (teletype) font.
<sc>          small caps     Used mostly for "a. d." and "b. c.". This is the
                             same font as <er>, but has no functional or
                             semantic significance.
<str>                        Group of table data elements in a table.
<sub>         subscript      Like <subs>.
<subs>        subscript
<sups>        superscript
<supr>        superscript
<sansserif>   sans-serif
<stypec>      bold           Collocation font; also a subtype.
<tt>                         HTML tag -- teletype font.
<universbold>                A squared bold font without serifs,
                             approximating the "Univers Bold" font on the
                             HP LaserJet 4, slightly larger than the
                             capitals in a definition body. Used in
                             expositions describing shapes, such as "Y",
                             "T", "U", "X", "V", "F".
<vertical>                   Vertically organized column.
<column1>                    Vertically organized column -- only part of a
                             table which needs to be completed. Used once.
<...type>                    A series of tags, many unique, designating
                             certain unusual fonts, such as "bourgeoistype"
                             for "bourgeois type", in the section on
                             typography. Most of these occur only once, in
                             the section on fonts.
<antiquetype>
<blacklettertype>
<boldfacetype>
<bourgeoistype>
<boxtype>
<clarendontype>
<englishtype>
<extendedtype>
<frenchelzevirtype>
<germantype>
<gothictype>
<greatprimertype>
<longprimertype>
<miniontype>
<nonpareiltype>
<oldenglishtype>
<oldstyletype>
<pearltype>
<picatype>
<scripttype>
<smpicatype>
<typewritertype>

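The table above pairs master-file tags with their HTML counterparts.
This is a minimal Python sketch of that conversion. It also extracts the size from a <point**> tag name and expands the <br/ entity.
Mapping <sups> and <supr> to HTML <sup> is an assumption, since this file does not name an HTML superscript tag.
The names MASTER_TO_HTML, to_html and point_size are hypothetical. Tags without an entry, such as <sc>, are left unchanged.

```python
import re
from typing import Optional

# Master-file tag -> HTML tag. <it>/<i> and <bold>/<b> come from the table
# above; mapping <sups>/<supr> to HTML <sup> is an assumption.
MASTER_TO_HTML = {"it": "i", "bold": "b", "subs": "sub", "sups": "sup", "supr": "sup"}

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)>")
POINT_RE = re.compile(r"point(\d+(?:\.\d+)?)")


def to_html(text: str) -> str:
    """Rename master-file formatting tags to HTML and expand the <br/ entity."""
    def rename(m):
        slash, name = m.group(1), m.group(2)
        return "<" + slash + MASTER_TO_HTML.get(name, name) + ">"
    # TAG_RE never matches "<br/", so the entity survives until replace().
    return TAG_RE.sub(rename, text).replace("<br/", "<br>")


def point_size(tag_name: str) -> Optional[float]:
    """Point size encoded in a <point**> tag name, e.g. 1.5 for "point1.5"."""
    m = POINT_RE.fullmatch(tag_name)
    return float(m.group(1)) if m else None
```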
=============================================================
Tags with semantic content:
. . . . . . . . . . . . . . . . . . . . . . . . . . .
<altsp>       *              Alternative spelling segment. Almost always
                             contained within square brackets after the
                             main definition segment. Expository words such
                             as "Spelled also" are in plain font; the actual
                             alternative spelling is marked by
                             <asp> ... </asp> tags within this segment.

<ant>         italic         Antonym.

<asp>         italic         Alternative spelling: the actual word which is
                             an alternative spelling of the headword. These
                             are functionally synonyms of the headword. In
                             most cases these also occur as headwords, with
                             a reference to the word where the actual
                             definition is found, but not all such words are
                             listed separately, particularly if the spelling
                             is close enough to the headword to be found at
                             the same point in the dictionary. Whether listed
                             separately or not, these words should also be
                             indexed at this location.

<au>          italic         Authority or author. Used where an authority is
                             given for a definition, and also for the author
                             where a quotation within double quotes is given
                             in the same paragraph as the definition. The
                             double quotes are indicated by the open-quote
                             (\'bd) and close-quote (\'b8). In both cases,
                             the field is typically right-justified (see the
                             section on formatting), almost always fitting
                             on the same line as the last line of the
                             definition or quotation.
                             Within collocation segments, it is usually used
                             only after quotations, and is not
                             right-justified, except occasionally where it
                             would be close to the right margin; it then
                             apparently is right-justified. We have not
                             explicitly marked those which are
                             right-justified, but they can be recognized
                             because they are on a line by themselves,
                             preceded by two carriage returns.

<bio>         *              Marks a biography. It should be longer than a
                             short mention of who a person was, which is
                             typically included as a definition.

<biography>   *              Same as <bio>.

<booki>       italic         Marks the name of a book, pamphlet, or similar
                             document.

<branchof>    *              A field of knowledge of which the headword is a
                             division.

<caption>     *              Caption of a figure or table.

<cas>         *              Tags the CAS (Chemical Abstracts Service)
                             registry number for a chemical substance.

<causes>      italic         Tags the infectious disease caused by the
                             headword. The headword (the agent) is implied
                             to be a microorganism, and the tag must mark a
                             disease.

<causesp>     *              Same as <causes>, without the italic type.

<causedby>    italic         Inverse of <causes>: tags the causative agent
                             of the infectious disease named by the
                             headword. The tag must mark a microorganism,
                             virus, or prion, and the headword is implied
                             to be a disease.

<causedbyp>   *              Same as <causedby>, without the italic type.

<centered>                   Used only for the single letter in the heading
                             of each letter of the alphabet.

<city>        *