author: Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
committer: Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
commit: d18a469b7a5a4d4b5da21eab37f34ab1e99a8dce
tree: 7eb331e376e85287c25b6a9734dae58a4724da8a
parent: 4a458db06b28492a7e48b1a0560b35778e476482
Revise tagset.txt
* tagset.txt: Review.
* README: Reformat.
* webfont.txt: Reformat. Document <and/ and <or/.
Diffstat (limited to 'tagset.txt')
-rw-r--r-- tagset.txt 2057
1 files changed, 1056 insertions, 1001 deletions
diff --git a/tagset.txt b/tagset.txt
index 9a7a501..0093d42 100644
--- a/tagset.txt
+++ b/tagset.txt
@@ -1,13 +1,14 @@
- FIELD MARKS FOR WEBSTER 1913 and CIDE
- =====================================
- Explanations of the tags used to mark the Webster 1913 dictionary
-and the CIDE (Collaborative International Dictionary of English).
-Note that the list of tags used to mark the public domain version
-of this dictionary is shorter than the full set described here.
- If any tag is not listed here, it is either (1) one of the
-"point" (font size) or "type" (font style) tags, which should be
-self-explanatory; or (2) is a functional field with no effect on the
-typography.
+FIELD MARKS FOR WEBSTER 1913 and CIDE
+=====================================
+
+* Overview
+
+This file describes the tags used to mark the Webster 1913 dictionary and
+the GCIDE (GNU Collaborative International Dictionary of English).
+
+If any tag is not listed here, it is either (1) one of the "point" (font
+size) or "type" (font style) tags, which should be self-explanatory; or (2)
+is a functional field with no effect on the typography.
 
 Last modified March 12, 1999.
  For questions, contact:
@@ -15,114 +16,145 @@ Last modified March 12, 1999.
  735 Belvidere Ave.
  Plainfield, NJ 07062
  (908) 561-3416 or (908) 668-5252
--------------------------------------------------------------
+
 A separate file, webfont.txt, contains the list of the individual
 non-ASCII characters represented by either higher-order hexadecimal
-character marks (e.g., \'94, for o-umlaut) or by entity tags
-(e.g., <root/, for the square root symbol.)
+character marks (e.g., \'94, for o-umlaut) or by entity tags (e.g.,
+<root/, for the square root symbol.)
---------------------------------------------------------------
+
- Use of tags:
+* Introduction
+
- In the MICRA electronic version of the 1913 Webster, each part of
-the entry headed by an entry word ("headword") is labeled so that no
-part of the entry except some punctuation marks should be found
-outside of all fields, i.e. every character should be within some tagged
-field. In the following description, the word "segment" usually refers to
-a major part of an entry such as an etymology or a definition or a
-collocation segment or a usage block, containing more than one field.
-The term "field" may also be used similarly to "segment", but may also
-denote single-word fields, such as an alternative spelling, labeled <asp>.
-
+In the MICRA electronic version of the 1913 Webster and in GCIDE, each part
+of the entry headed by an entry word ("headword") is labeled so that no part
+of the entry except some punctuation marks should be found outside of all
+fields, i.e. every character should be within some tagged field. In the
+following description, the word "segment" usually refers to a major part of
+an entry such as an etymology or a definition or a collocation segment or a
+usage block, containing more than one field. The term "field" may also be
+used similarly to "segment", but may also denote single-word fields, such as
+an alternative spelling, labeled <asp>.
+
- Note: The tags on this list are similar in structure to SGML tags. Each
-tag on this list marks a field; each field opens with a tagname between
-angle brackets thus: <tagname>, and closes with a similar tag containing
-the forward slash thus: </tagname>. No tags are used without closing
-tags. Thus the HTML <BR> to indicate a line break is symbolized
-here as an entity, <br/, and every <p> has a corresponding </p>.
+The tags on this list are similar in structure to SGML tags. Each tag on
+this list marks a field; each field opens with a tagname between angle
+brackets thus: <tagname>, and closes with a similar tag containing the
+forward slash thus: </tagname>. No tags are used without closing tags.
+Thus a line break (similar to HTML <br> tag) is symbolized here as an
+entity, <br/, and every <p> has a corresponding </p>.
+
- The absence of an end-field tag, or the presence of an end-field tag
-without a prior begin-field tag constitutes a typographical error, of which
-there may be a significant number. Any errors detected should be brought
-to the attention of PJC or the appropriate editor.
+The absence of an end-field tag, or the presence of an end-field tag without
+a prior begin-field tag constitutes a typographical error, of which there
+may be a significant number. Any errors detected should be brought to the
+attention of PJC or the appropriate editor.
+
- Most of the tagged fields are presented in the text in italic type,
-with a number of exceptions. Where a word is contained within more than
-one field, the innermost field determines the font to be used. Wherever
-recognizable functional fields were found, an attempt was made to tag the
-field with a functional mark, but in many cases, words were italicised only
-to represent the word itself as a discourse entity, and in some such cases,
-the "italic" mark <it> was used, implying nothing regarding functionality
-of the word. The base font is considered "plain". Where an italic field
-is indicated, parentheses or brackets within the field are not italicised.
+Most of the tagged fields are presented in the text in italic type, with a
+number of exceptions. Where a word is contained within more than one field,
+the innermost field determines the font to be used. Wherever recognizable
+functional fields were found, an attempt was made to tag the field with a
+functional mark, but in many cases, words were italicised only to represent
+the word itself as a discourse entity, and in some such cases, the "italic"
+mark <it> was used, implying nothing regarding functionality of the word.
+The base font is considered "plain". Where an italic field is indicated,
+parentheses or brackets within the field are not italicised.
+
- Where no font is specified for a tag, the tag is merely a functional
+Where no font is specified for a tag, the tag is merely a functional
 division, and was printed in plain font unless otherwise tagged. This type
-of segment is marked by an asterisk (*) where the font name would be.
- The size of the "plain" font in the original text is about 1.6 mm for
-the height of capitalized letters.
+of segment is marked by an asterisk (*) where the font name would be. The
+size of the "plain" font in the original text is about 1.6 mm for the height
+of capitalized letters.
-=============================================================
+
-Explicit typographical tags:
+* Explicit typographical tags
+
- These were used where the purpose of a different font was merely to
-distinguish a word from the body of the text, and no explicit functional
-tag seemed apropriate.
+These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional tag
+seemed apropriate.
+
------------------------------------
-Tag             Font
------------------------------------
-Explicit formatting tags:
-. . . . . . . . . . . . . . . . . .
+-------------------------------------------------------------------------
+Tag           Font       Description
+-------------------------------------------------------------------------
-<plain>         plain font (that used in the body of a definition) --
-                normally not marked, except within fields of
-                a different front.
+<plain>       plain      font that used in the body of a definition -- normally
+                         not marked, except within fields of a different
+                         front.
+
-<it>            italic (in master files)
+<it>          italic     in master files
+
-<i>             italic (for use in HTML presentation)
+<i>           italic     for use in HTML presentation
+
-<bold>          bold (in master files)
+<bold>        bold       in master files
+
-<b>             bold (for use in HTML presentation)
+<b>           bold       for use in HTML presentation
+
-<colf>          bold,       Collocation font. Same font as used in collocations.
-                smaller     This is used only in the list of "un-" words not
-                by 1 point  actually defined in the dictionary. Probably could be
-                            replaced by a segment mark for the entire list!
-                            The "un-" words should be indexed as headwords.
-
+<colf>        bold,      Collocation font. Same font as used in
+                         collocations.
+              smaller    This is used only in the list of "un-"
+              by 1 point words not actually defined in the
+                         dictionary.
+                         Probably could be replaced by a segment mark
+                         for the entire list! The "un-" words should
+                         be indexed as headwords.
+
-<ct>            bold        Same as <colf>, a font similar to that used in
-                            collocations. However, this tag is used in a table
-                            and could be set to a different font.
-
+<ct>          bold       Same as <colf>, a font similar to that used
+                         in collocations. However, this tag is used
+                         in a table and could be set to a different
+                         font.
+
-<h1>            *           HTML tag -- largest heading font.
-
+<h1>          *          HTML tag -- largest heading font.
+
-<h2>            *           HTML tag -- second largest heading font.
-
+<h2>          *          HTML tag -- second largest heading font.
+
-<headrow>       *           Marks a Row title in a table.
-
+<headrow>     *          Marks a Row title in a table.
+
-<hwf>                       Font the same as the headword <hw>, though the field is
-                            not a headword. Used only once.
-
+<hwf>                    Font the same as the headword <hw>, though
+                         the field is not a headword. Used only
+                         once.
+
-<mitem>         *           Multiple items, a set of items in a table.
+<mitem>       *          Multiple items, a set of items in a table.
-<point ...>                 A series of point size markers, many unique.
+<point ...>              A series of point size markers, many
+                         unique.
+
-<point1.5>      *           One of the tags of the form <point**> where **
-<point6>                    represents the typographic point size of the
-                            enclosed text.
+<point1.5>    *          One of the tags of the form <point**> where **
+<point6>                 represents the typographic point size of the
+                         enclosed text.
+
-<pre>                       An HTML tag indicating that the enclosed text is
-                            of teletype form, preformatted in a uniform-spaced
-                            font.
+<pre>                    An HTML tag indicating that the enclosed
+                         text is of teletype form, preformatted in a
+                         uniform-spaced font.
+
-<sc>            small caps  (used mostly for "a. d.", "b. c.")
-                            This is the same font a <er>, but has no functional
-                            or semantic significance
+<sc>          small caps used mostly for "a. d.", "b. c."
+                         This is the same font as in <er>, but has no
+                         functional or semantic significance.
+
-<str>                       group of table data elements in a table
+<str>                    group of table data elements in a table.
+
-<sub>           subscript, like <subs>
+<sub>         subscript
+
-<subs>          subscript
+<subs>        subscript
+
-<sups>          superscript
+<sups>        superscript
+
-<supr>          superscript
+<supr>        superscript
+
-<sansserif>     Sans-serif font
+<sansserif>   Sans-serif
+
-<stypec>        Bold (collocation font) and also a subtype.
+<stypec>      Bold collocation font, and also a subtype.
+
-<tt>                        HTML tage -- teletype font
+<tt>                     HTML tage -- teletype font
+
-<universbold>               A squared bold font without serifs approximating the
-                            "universe bold" font on the HP Laserjet4, slightly
-                            larger than the capitals in a definition body. Used
-                            in expositions describing shapes, such as
-                            "Y", "T", "U", "X", "V", "F".
+<universbold>            A squared bold font without serifs approximating
+                         the "universe bold" font on the HP Laserjet4,
+                         slightly larger than the capitals in a definition
+                         body. Used in expositions describing shapes,
+                         such as "Y", "T", "U", "X", "V", "F".
+
-<vertical>                  Vertically organized column.
+<vertical>               Vertically organized column.
+
-<column1>                   Vertically organized column -- only part of a table
-                            which needs to be completed. Used once.
+<column1>                Vertically organized column -- only part of a table
+                         which needs to be completed. Used once.
+
-<...type>                   A series of tags, many unique, designating certain
-                            unusual fonts, such as "bourgeoistype" for
-                            "bourgeois type", in the section on typography.
-                            Most of these occur only once, in the section on fonts.
+<...type>                A series of tags, many unique, designating