author      Sergey Poznyakoff <gray@gnu.org.ua>  2012-02-03 12:48:52 +0200
committer   Sergey Poznyakoff <gray@gnu.org.ua>  2012-02-03 12:48:52 +0200
commit      d18a469b7a5a4d4b5da21eab37f34ab1e99a8dce
tree        7eb331e376e85287c25b6a9734dae58a4724da8a /tagset.txt
parent      4a458db06b28492a7e48b1a0560b35778e476482
Revise tagset.txt
* tagset.txt: Review.
* README: Reformat.
* webfont.txt: Reformat. Document <and/ and <or/.
Diffstat (limited to 'tagset.txt')
-rw-r--r-- | tagset.txt | 2057 |
1 files changed, 1056 insertions, 1001 deletions
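The tagset.txt text in this diff states the well-formedness rule for the markup: every field opens with <tagname> and closes with </tagname>, entities such as <br/ and <root/ take no closing tag, and a missing end tag or an end tag without a prior begin tag is a typographical error. Below is a minimal Python sketch of a checker for that rule. It is not part of GCIDE. The function name check_balance, the tag-name pattern, and the assumption that fields nest strictly are all illustrative assumptions. Tags carrying attributes are not recognized.

```python
import re

# Assumed token shapes, following the conventions described in tagset.txt:
#   <name>   opens a field
#   </name>  closes a field
#   <name/   is a self-contained entity (e.g. <br/, <root/) with no closing tag
# The name pattern (a letter, then letters, digits, '_', '.', '-') is an
# assumption; it accepts names such as point1.5. Tags with attributes are
# not matched at all.
TAG = re.compile(r"<(?P<slash>/?)(?P<name>[A-Za-z][\w.-]*)(?P<end>[>/])")


def check_balance(text: str) -> list[str]:
    """Return error messages for unbalanced fields; an empty list means none found."""
    errors: list[str] = []
    stack: list[tuple[str, int]] = []  # (name, offset) of open fields, innermost last
    for m in TAG.finditer(text):
        slash, name, end = m.group("slash", "name", "end")
        if end == "/":
            continue  # entity such as <br/: it is never closed
        if not slash:
            stack.append((name, m.start()))
        elif any(open_name == name for open_name, _ in stack):
            # Close the innermost open field with this name; any field opened
            # inside it (assuming strict nesting) was left unclosed.
            while stack[-1][0] != name:
                inner, pos = stack.pop()
                errors.append(f"offset {pos}: <{inner}> is never closed")
            stack.pop()
        else:
            errors.append(f"offset {m.start()}: </{name}> has no matching <{name}>")
    # Whatever is still open at the end of the text was never closed.
    errors.extend(f"offset {pos}: <{name}> is never closed" for name, pos in stack)
    return errors


print(check_balance("<p><hw>A</hw> <it>root</it> <root/ sign<br/</p>"))  # → []
print(check_balance("<p><it>word</p>"))  # → ['offset 3: <it> is never closed']
```

The sketch reports character offsets rather than line numbers to stay short; a checker run over whole dictionary files would probably want to report file and line positions instead.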
@@ -1,13 +1,14 @@ | |||
1 | FIELD MARKS FOR WEBSTER 1913 and CIDE | 1 | FIELD MARKS FOR WEBSTER 1913 and CIDE |
2 | ===================================== | 2 | ===================================== |
3 | Explanations of the tags used to mark the Webster 1913 dictionary | 3 | |
4 | and the CIDE (Collaborative International Dictionary of English). | 4 | * Overview |
5 | Note that the list of tags used to mark the public domain version | 5 | |
6 | of this dictionary is shorter than the full set described here. | 6 | This file describes the tags used to mark the Webster 1913 dictionary and |
7 | If any tag is not listed here, it is either (1) one of the | 7 | the GCIDE (GNU Collaborative International Dictionary of English). |
8 | "point" (font size) or "type" (font style) tags, which should be | 8 | |
9 | self-explanatory; or (2) is a functional field with no effect on the | 9 | If any tag is not listed here, it is either (1) one of the "point" (font |
10 | typography. | 10 | size) or "type" (font style) tags, which should be self-explanatory; or (2) |
 | | 11 | is a functional field with no effect on the typography. |
11 | 12 | ||
12 | Last modified March 12, 1999. | 13 | Last modified March 12, 1999. |
13 | For questions, contact: | 14 | For questions, contact: |
@@ -15,114 +16,145 @@ Last modified March 12, 1999. | |||
15 | 735 Belvidere Ave. | 16 | 735 Belvidere Ave. |
16 | Plainfield, NJ 07062 | 17 | Plainfield, NJ 07062 |
17 | (908) 561-3416 or (908) 668-5252 | 18 | (908) 561-3416 or (908) 668-5252 |
18 | ------------------------------------------------------------- | 19 | |
19 | A separate file, webfont.txt, contains the list of the individual | 20 | A separate file, webfont.txt, contains the list of the individual |
20 | non-ASCII characters represented by either higher-order hexadecimal | 21 | non-ASCII characters represented by either higher-order hexadecimal |
21 | character marks (e.g., \'94, for o-umlaut) or by entity tags | 22 | character marks (e.g., \'94, for o-umlaut) or by entity tags (e.g., |
22 | (e.g., <root/, for the square root symbol.) | 23 | <root/, for the square root symbol.) |
23 | -------------------------------------------------------------- | 24 | |
24 | Use of tags: | 25 | * Introduction |
25 | In the MICRA electronic version of the 1913 Webster, each part of | 26 | |
26 | the entry headed by an entry word ("headword") is labeled so that no | 27 | In the MICRA electronic version of the 1913 Webster and in GCIDE, each part |
27 | part of the entry except some punctuation marks should be found | 28 | of the entry headed by an entry word ("headword") is labeled so that no part |
28 | outside of all fields, i.e. every character should be within some tagged | 29 | of the entry except some punctuation marks should be found outside of all |
29 | field. In the following description, the word "segment" usually refers to | 30 | fields, i.e. every character should be within some tagged field. In the |
30 | a major part of an entry such as an etymology or a definition or a | 31 | following description, the word "segment" usually refers to a major part of |
31 | collocation segment or a usage block, containing more than one field. | 32 | an entry such as an etymology or a definition or a collocation segment or a |
32 | The term "field" may also be used similarly to "segment", but may also | 33 | usage block, containing more than one field. The term "field" may also be |
33 | denote single-word fields, such as an alternative spelling, labeled <asp>. | 34 | used similarly to "segment", but may also denote single-word fields, such as |
34 | 35 | an alternative spelling, labeled <asp>. | |
35 | Note: The tags on this list are similar in structure to SGML tags. Each | 36 | |
36 | tag on this list marks a field; each field opens with a tagname between | 37 | The tags on this list are similar in structure to SGML tags. Each tag on |
37 | angle brackets thus: <tagname>, and closes with a similar tag containing | 38 | this list marks a field; each field opens with a tagname between angle |
38 | the forward slash thus: </tagname>. No tags are used without closing | 39 | brackets thus: <tagname>, and closes with a similar tag containing the |
39 | tags. Thus the HTML <BR> to indicate a line break is symbolized | 40 | forward slash thus: </tagname>. No tags are used without closing tags. |
40 | here as an entity, <br/, and every <p> has a corresponding </p>. | 41 | Thus a line break (similar to HTML <br> tag) is symbolized here as an |
41 | The absence of an end-field tag, or the presence of an end-field tag | 42 | entity, <br/, and every <p> has a corresponding </p>. |
42 | without a prior begin-field tag constitutes a typographical error, of which | 43 | |
43 | there may be a significant number. Any errors detected should be brought | 44 | The absence of an end-field tag, or the presence of an end-field tag without |
44 | to the attention of PJC or the appropriate editor. | 45 | a prior begin-field tag constitutes a typographical error, of which there |
45 | Most of the tagged fields are presented in the text in italic type, | 46 | may be a significant number. Any errors detected should be brought to the |
46 | with a number of exceptions. Where a word is contained within more than | 47 | attention of PJC or the appropriate editor. |
47 | one field, the innermost field determines the font to be used. Wherever | 48 | |
48 | recognizable functional fields were found, an attempt was made to tag the | 49 | Most of the tagged fields are presented in the text in italic type, with a |
49 | field with a functional mark, but in many cases, words were italicised only | 50 | number of exceptions. Where a word is contained within more than one field, |
50 | to represent the word itself as a discourse entity, and in some such cases, | 51 | the innermost field determines the font to be used. Wherever recognizable |
51 | the "italic" mark <it> was used, implying nothing regarding functionality | 52 | functional fields were found, an attempt was made to tag the field with a |
52 | of the word. The base font is considered "plain". Where an italic field | 53 | functional mark, but in many cases, words were italicised only to represent |
53 | is indicated, parentheses or brackets within the field are not italicised. | 54 | the word itself as a discourse entity, and in some such cases, the "italic" |
54 | Where no font is specified for a tag, the tag is merely a functional | 55 | mark <it> was used, implying nothing regarding functionality of the word. |
 | | 56 | The base font is considered "plain". Where an italic field is indicated, |
 | | 57 | parentheses or brackets within the field are not italicised. |
 | | 58 | |
 | | 59 | Where no font is specified for a tag, the tag is merely a functional |
55 | division, and was printed in plain font unless otherwise tagged. This type | 60 | division, and was printed in plain font unless otherwise tagged. This type |
56 | of segment is marked by an asterisk (*) where the font name would be. | 61 | of segment is marked by an asterisk (*) where the font name would be. The |
57 | The size of the "plain" font in the original text is about 1.6 mm for | 62 | size of the "plain" font in the original text is about 1.6 mm for the height |
58 | the height of capitalized letters. | 63 | of capitalized letters. |
59 | ============================================================= | 64 | |
60 | Explicit typographical tags: | 65 | * Explicit typographical tags |
61 | These were used where the purpose of a different font was merely to | 66 | |
62 | distinguish a word from the body of the text, and no explicit functional | 67 | These were used where the purpose of a different font was merely to |
63 | tag seemed apropriate. | 68 | distinguish a word from the body of the text, and no explicit functional tag |
64 | ----------------------------------- | 69 | seemed apropriate. |
65 | Tag Font | 70 | |
66 | ----------------------------------- | 71 | ------------------------------------------------------------------------- |
67 | Explicit formatting tags: | 72 | Tag Font Description |
68 | . . . . . . . . . . . . . . . . . . | 73 | ------------------------------------------------------------------------- |
69 | <plain> plain font (that used in the body of a definition) -- | 74 | <plain> plain font that used in the body of a definition -- normally |
70 | normally not marked, except within fields of | 75 | not marked, except within fields of a different |
71 | a different front. | 76 | front. |
72 | <it> italic (in master files) | 77 | |
73 | <i> italic (for use in HTML presentation) | 78 | <it> italic in master files |
74 | <bold> bold (in master files) | 79 | |
75 | <b> bold (for use in HTML presentation) | 80 | <i> italic for use in HTML presentation |
76 | <colf> bold, Collocation font. Same font as used in collocations. | 81 | |
77 | smaller This is used only in the list of "un-" words not | 82 | <bold> bold in master files |
78 | by 1 point actually defined in the dictionary. Probably could be | 83 | |
79 | replaced by a segment mark for the entire list! | 84 | <b> bold for use in HTML presentation |
80 | The "un-" words should be indexed as headwords. | 85 | |
81 | 86 | <colf> bold, Collocation font. Same font as used in | |
82 | <ct> bold Same as <colf>, a font similar to that used in | 87 | collocations. |
83 | collocations. However, this tag is used in a table | 88 | smaller This is used only in the list of "un-" |
84 | and could be set to a different font. | 89 | by 1 point words not actually defined in the |
85 | 90 | dictionary. | |
86 | <h1> * HTML tag -- largest heading font. | 91 | Probably could be replaced by a segment mark |
87 | 92 | for the entire list! The "un-" words should | |
88 | <h2> * HTML tag -- second largest heading font. | 93 | be indexed as headwords. |
89 | 94 | ||
90 | <headrow> * Marks a Row title in a table. | 95 | <ct> bold Same as <colf>, a font similar to that used |
91 | 96 | in collocations. However, this tag is used | |
92 | <hwf> Font the same as the headword <hw>, though the field is | 97 | in a table and could be set to a different |
93 | not a headword. Used only once. | 98 | font. |
94 | 99 | ||
95 | <mitem> * Multiple items, a set of items in a table. | 100 | <h1> * HTML tag -- largest heading font. |
96 | <point ...> A series of point size markers, many unique. | 101 | |
97 | <point1.5> * One of the tags of the form <point**> where ** | 102 | <h2> * HTML tag -- second largest heading font. |
98 | <point6> represents the typographic point size of the | 103 | |
99 | enclosed text. | 104 | <headrow> * Marks a Row title in a table. |
100 | <pre> An HTML tag indicating that the enclosed text is | 105 | |
101 | of teletype form, preformatted in a uniform-spaced | 106 | <hwf> Font the same as the headword <hw>, though |
102 | font. | 107 | the field is not a headword. Used only |
103 | <sc> small caps (used mostly for "a. d.", "b. c.") | 108 | once. |
104 | This is the same font a <er>, but has no functional | 109 | |
105 | or semantic significance | 110 | <mitem> * Multiple items, a set of items in a table. |
106 | <str> group of table data elements in a table | 111 | <point ...> A series of point size markers, many |
107 | <sub> subscript, like <subs> | 112 | unique. |
108 | <subs> subscript | 113 | |
109 | <sups> superscript | 114 | <point1.5> * One of the tags of the form <point**> where ** |
110 | <supr> superscript | 115 | <point6> represents the typographic point size of the |
111 | <sansserif> Sans-serif font | 116 | enclosed text. |
112 | <stypec> Bold (collocation font) and also a subtype. | 117 | |
113 | <tt> HTML tage -- teletype font | 118 | <pre> An HTML tag indicating that the enclosed |
114 | <universbold> A squared bold font without serifs approximating the | 119 | text is of teletype form, preformatted in a |
115 | "universe bold" font on the HP Laserjet4, slightly | 120 | uniform-spaced font. |
116 | larger than the capitals in a definition body. Used | 121 | |
117 | in expositions describing shapes, such as | 122 | <sc> small caps used mostly for "a. d.", "b. c." |
118 | "Y", "T", "U", "X", "V", "F". | 123 | This is the same font as in <er>, but has no |
119 | <vertical> Vertically organized column. | 124 | functional or semantic significance. |
120 | <column1> Vertically organized column -- only part of a table | 125 | |
121 | which needs to be completed. Used once. | 126 | <str> group of table data elements in a table. |
122 | <...type> A series of tags, many unique, designating certain | 127 | |
123 | unusual fonts, such as "bourgeoistype" for | 128 | <sub> subscript |
124 | "bourgeois type", in the section on typography. | 129 | |
125 | Most of these occur only once, in the section on fonts. | 130 | <subs> subscript |
 | | 131 | |
 | | 132 | <sups> superscript |
 | | 133 | |
 | | 134 | <supr> superscript |
 | | 135 | |
 | | 136 | <sansserif> Sans-serif |
 | | 137 | |
 | | 138 | <stypec> Bold collocation font, and also a subtype. |
 | | 139 | |
 | | 140 | <tt> HTML tage -- teletype font |
 | | 141 | |
 | | 142 | <universbold> A squared bold font without serifs approximating |
 | | 143 | the "universe bold" font on the HP Laserjet4, |
 | | 144 | slightly larger than the capitals in a definition |
 | | 145 | body. Used in expositions describing shapes, |
 | | 146 | such as "Y", "T", "U", "X", "V", "F". |
 | | 147 | |
 | | 148 | <vertical> Vertically organized column. |
 | | 149 | |
 | | 150 | <column1> Vertically organized column -- only part of a table |
 | | 151 | which needs to be completed. Used once. |
 | | 152 | |
 | | 153 | <...type> A series of tags, many unique, designating |