author | Sergey Poznyakoff <gray@gnu.org.ua> | 2012-02-03 12:48:52 +0200 |
---|---|---|
committer | Sergey Poznyakoff <gray@gnu.org.ua> | 2012-02-03 12:48:52 +0200 |
commit | d18a469b7a5a4d4b5da21eab37f34ab1e99a8dce (patch) | |
tree | 7eb331e376e85287c25b6a9734dae58a4724da8a | |
parent | 4a458db06b28492a7e48b1a0560b35778e476482 (diff) | |
Revise tagset.txt
* tagset.txt: Review.
* README: Reformat.
* webfont.txt: Reformat. Document <and/ and <or/.
-rw-r--r-- | README | 363 |
-rw-r--r-- | tagset.txt | 2057 |
-rw-r--r-- | webfont.txt | 302 |
3 files changed, 1385 insertions, 1337 deletions
@@ -8,29 +8,27 @@ The README file | |||
8 | * * * * * * * * * * * * * * * * * * * * * * * * * * * * | 8 | * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
9 | 9 | ||
10 | * OVERVIEW | 10 | * OVERVIEW |
11 | ========== | 11 | |
12 | This document describes the GNU version of the Collaborative | 12 | This document describes the GNU version of the Collaborative International |
13 | International Dictionary of English. It is organized into a series of | 13 | Dictionary of English. It is organized into a series of chapters, |
14 | chapters, introduced by headings beginning with a single asterisk. A | 14 | introduced by headings beginning with a single asterisk. A chapter may have |
15 | chapter may have sections, which are marked with two asterisks. For | 15 | sections, which are marked with two asterisks. For those readers who use |
16 | those readers who use Emacs, this structure corresponds to its | 16 | Emacs, this structure corresponds to its "Outline mode", which will be |
17 | "Outline mode", which will be enabled automatically upon loading this | 17 | enabled automatically upon loading this file. |
18 | file. | 18 | |
19 | 19 | The chapter "INTRODUCTION" describes the structure of this package. The | |
20 | The chapter "INTRODUCTION" describes the structure of this package. | 20 | chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary structure in |
21 | The chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary | 21 | general. An overview of the markup tags is provided in the chapter "TAGS". |
22 | structure in general. An overview of the markup tags is provided in | 22 | A detailed information about dictionary markup can be obtained from a set of |
23 | the chapter "TAGS". A detailed information about dictionary markup | 23 | ancillary files included in this package, which are described in the chapter |
24 | can be obtained from a set of ancillary files included in this | 24 | "ANCILLARY FILES". |
25 | package, which are described in the chapter "ANCILLARY FILES". | 25 | |
26 | 26 | The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for reading | |
27 | The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for | 27 | this dictionary. Finally, other versions of the Webster dictionary are |
28 | reading this dictionary. Finally, other versions of the Webster | 28 | listed in the chapter "OTHER VERSIONS OF THE DICTIONARY". |
29 | dictionary are listed in the chapter "OTHER VERSIONS OF THE | ||
30 | DICTIONARY". | ||
31 | 29 | ||
32 | * INTRODUCTION | 30 | * INTRODUCTION |
33 | ============== | 31 | |
34 | The dictionary was derived from the | 32 | The dictionary was derived from the |
35 | Webster's Revised Unabridged Dictionary | 33 | Webster's Revised Unabridged Dictionary |
36 | Version published 1913 | 34 | Version published 1913 |
@@ -48,18 +46,17 @@ and has been supplemented with some of the definitions from | |||
48 | 46 | ||
49 | and is being proof-read and supplemented by volunteers from around the | 47 | and is being proof-read and supplemented by volunteers from around the |
50 | world. This is an unfunded project, and future enhancement of this | 48 | world. This is an unfunded project, and future enhancement of this |
51 | dictionary will depend on the efforts of volunteers willing to help | 49 | dictionary will depend on the efforts of volunteers willing to help build |
52 | build this free resource into a comprehensive body of general | 50 | this free resource into a comprehensive body of general information. New |
53 | information. New definitions for missing words or words senses and | 51 | definitions for missing words or words senses and longer explanatory notes, |
54 | longer explanatory notes, as well as images to accompany the articles | 52 | as well as images to accompany the articles are needed. More modern |
55 | are needed. More modern illustrative quotations giving recent | 53 | illustrative quotations giving recent examples of usage of the words in |
56 | examples of usage of the words in their various senses will be very | 54 | their various senses will be very helpful, since most quotations in the |
57 | helpful, since most quotations in the original 1913 dictionary are now | 55 | original 1913 dictionary are now well over 100 years old. |
58 | well over 100 years old. | 56 | |
59 | 57 | This electronic version is being maintained by World Soul, a non-profit | |
60 | This electronic version is being maintained by World Soul, a | 58 | organization in Plainfield, NJ. For additional information or if you are |
61 | non-profit organization in Plainfield, NJ. For additional information | 59 | willing to assist construction of this data source, contact: |
62 | or if you are willing to assist construction of this data source, contact: | ||
63 | 60 | ||
64 | =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= | 61 | =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= |
65 | Patrick J. Cassidy | TEL: (908) 561-3416 | 62 | Patrick J. Cassidy | TEL: (908) 561-3416 |
@@ -69,40 +66,38 @@ or if you are willing to assist construction of this data source, contact: | |||
69 | pc@worldsoul.org or cassidy@micra.com | 66 | pc@worldsoul.org or cassidy@micra.com |
70 | =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= | 67 | =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= |
71 | 68 | ||
72 | GCIDE is free software; you can redistribute it and/or modify | 69 | GCIDE is free software; you can redistribute it and/or modify it under the |
73 | it under the terms of the GNU General Public License as published by | 70 | terms of the GNU General Public License as published by the Free Software |
74 | the Free Software Foundation; either version 2, or (at your option) | 71 | Foundation; either version 2, or (at your option) any later version. |
75 | any later version. | ||
76 | 72 | ||
77 | GCIDE is distributed in the hope that it will be useful, | 73 | GCIDE is distributed in the hope that it will be useful, but WITHOUT ANY |
78 | but WITHOUT ANY WARRANTY; without even the implied warranty of | 74 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
79 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 75 | FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
80 | GNU General Public License for more details. | 76 | details. |
81 | 77 | ||
82 | You should have received a copy of the GNU General Public License | 78 | You should have received a copy of the GNU General Public License along with |
83 | along with this copy of GCIDE; see the file COPYING. If not, write | 79 | this copy of GCIDE; see the file COPYING. If not, write to the Free |
84 | to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, | 80 | Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA |
85 | Boston, MA 02111-1307, USA. | 81 | 02111-1307, USA. |
86 | 82 | ||
87 | * STRUCTURE OF THE DICTIONARY | 83 | * STRUCTURE OF THE DICTIONARY |
88 | ============================= | 84 | |
89 | When the archive is unpacked, the main dictionary text of the GCIDE | 85 | When the archive is unpacked, the main dictionary text of the GCIDE will be |
90 | will be found in 26 files named "CIDE.*", where the asterisk indicates | 86 | found in 26 files named "CIDE.*", where the asterisk indicates which letter |
91 | which letter of the alphabet begins the words in each file. For | 87 | of the alphabet begins the words in each file. For example, file "CIDE.B" |
92 | example, file "CIDE.B" contains words beginning with the letter "B". | 88 | contains words beginning with the letter "B". Additional information about |
93 | Additional information about the tagging conventions and special | 89 | the tagging conventions and special character symbols are contained in |
94 | character symbols are contained in ancillary files in this directory | 90 | ancillary files in this directory (see below the section entitled "ANCILLARY |
95 | (see below the section entitled "ANCILLARY FILES"). The main body of | 91 | FILES"). The main body of the 1913 dictionary was essentially identical to |
96 | the 1913 dictionary was essentially identical to the edition published | 92 | the edition published in 1890, and was republished in 1913 with an appendix |
97 | in 1890, and was republished in 1913 with an appendix containing "New | 93 | containing "New Words". The new words of that appendix have been integrated |
98 | Words". The new words of that appendix have been integrated into the | 94 | into the main file in this version. However, it is important to keep in |
99 | main file in this version. However, it is important to keep in mind | 95 | mind that the definitions in this dictionary are in most cases over 100 |
100 | that the definitions in this dictionary are in most cases over 100 | ||
101 | years old. Use them with caution! | 96 | years old. Use them with caution! |
102 | 97 | ||
103 | At the bottom of each paragraph in this dictionary, there is a | 98 | At the bottom of each paragraph in this dictionary, there is a bracketed and |
104 | bracketed and tagged "source" indicated. This tells from where the | 99 | tagged "source" indicated. This tells from where the definition or other |
105 | definition or other text in that paragraph came, as follows: | 100 | text in that paragraph came, as follows: |
106 | 101 | ||
107 | [<source>1913 Webster</source>] | 102 | [<source>1913 Webster</source>] |
108 | = From the original 1890 dictionary. | 103 | = From the original 1890 dictionary. |
@@ -117,46 +112,42 @@ definition or other text in that paragraph came, as follows: | |||
117 | [<source>XXX</source>] | 112 | [<source>XXX</source>] |
118 | = Added by one of the volunteers. | 113 | = Added by one of the volunteers. |
119 | 114 | ||
120 | The original definitions have been tagged and in some cases | 115 | The original definitions have been tagged and in some cases reformatted or |
121 | reformatted or slightly rearranged. If substantive information is | 116 | slightly rearranged. If substantive information is added from a second |
122 | added from a second source, usually the additional source is also | 117 | source, usually the additional source is also noted, as in: |
123 | noted, as in: | ||
124 | 118 | ||
125 | [<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>] | 119 | [<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>] |
126 | 120 | ||
127 | This version is tagged with SGML-like tags of the form <pos>...</pos> | 121 | This version is tagged with SGML-like tags of the form <pos>...</pos> so |
128 | so that the original typography (italics, bold, block quotes) can be | 122 | that the original typography (italics, bold, block quotes) can be |
129 | reproduced. A list of the most important tags for fields in the | 123 | reproduced. A list of the most important tags for fields in the dictionary |
130 | dictionary is given below. The tags also serve the more important | 124 | is given below. The tags also serve the more important function of allowing |
131 | function of allowing the information content to be conveniently | 125 | the information content to be conveniently imported into computer programs |
132 | imported into computer programs or databases. The set of tags used is | 126 | or databases. The set of tags used is described in the accompanying file |
133 | described in the accompanying file "tagset.txt". ***NOTE*** the | 127 | "tagset.txt". ***NOTE*** the paragraph tags <p>...</p> do *not* always nest |
134 | paragraph tags <p>...</p> do *not* always nest properly with certain | 128 | properly with certain other tags, such as <note> and <cs> ("collocation |
135 | other tags, such as <note> and <cs> ("collocation section"), which in | 129 | section"), which in some cases span multiple paragraphs. If you are using a |
136 | some cases span multiple paragraphs. If you are using a tag parser | 130 | tag parser which detects improper nesting, you should first either delete |
137 | which detects improper nesting, you should first either delete the | 131 | the paragraph tags or convert them to non-tag symbols, or, if possible, set |
138 | paragraph tags or convert them to non-tag symbols, or, if possible, | 132 | the parser to ignore the <p>...</p> tags. |
139 | set the parser to ignore the <p>...</p> tags. | 133 | |
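
Reviewer note: the README's advice to delete the paragraph tags before running a nesting-sensitive parser could be sketched as below. This is only an illustration, not part of the commit; the function name `strip_paragraph_tags` and its `replacement` parameter are hypothetical. It matches only the literal `<p>` and `</p>` tags, so other tags that start with the letter p, such as `<pos>`, are left alone.

```python
import re

# Matches exactly "<p>" and "</p>"; tags such as <pos> are not affected.
P_TAG = re.compile(r"</?p>")


def strip_paragraph_tags(text: str, replacement: str = "") -> str:
    """Delete paragraph tags, or convert them to a non-tag symbol.

    `replacement` is a hypothetical knob: pass "" to delete the tags,
    or a marker string to keep paragraph boundaries visible.
    """
    return P_TAG.sub(replacement, text)


if __name__ == "__main__":
    sample = "<p><note>first part</p>\n<p>second part</note></p>"
    print(strip_paragraph_tags(sample))
```

After this step the `<note>` element in the sample no longer crosses a `<p>` boundary, so a parser that rejects improper nesting should accept it.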
140 | 134 | The unusual characters (such as Greek or the European accented characters, | |
141 | The unusual characters (such as Greek or the European accented | 135 | as well as special characters used in the pronunciations) are described in |
142 | characters, as well as special characters used in the pronunciations) | 136 | the accompanying file "webfont.txt". Some information on the pronunciation |
143 | are described in the accompanying file "webfont.txt". Some | 137 | system used may be found by viewing the file "pronunc.jpg", and additional |
144 | information on the pronunciation system used may be found by viewing | 138 | explanations of pronunciation are in the file "pronunc.txt". |
145 | the file "pronunc.jpg", and additional explanations of pronunciation | 139 | |
146 | are in the file "pronunc.txt". | 140 | Each paragraph of the original text is enclosed within tags of the form <p> |
147 | 141 | . . . </p>. Within these paragraphs there are no line breaks, and some of | |
148 | Each paragraph of the original text is enclosed within tags of the | 142 | the paragraphs are over 12,000 characters long, which may prove too long to |
149 | form <p> . . . </p>. Within these paragraphs there are no line | 143 | be handled by some editors. At some points, embedded line breaks within a |
150 | breaks, and some of the paragraphs are over 12,000 characters long, | 144 | "paragraph" are marked by a <br/ "entity". The file can therefore be |
151 | which may prove too long to be handled by some editors. At some | 145 | converted, if necessary, to a form with shorter lines, and subsequently |
152 | points, embedded line breaks within a "paragraph" are marked by a <br/ | 146 | reconverted back to the form having one line per paragraph. |
153 | "entity". The file can therefore be converted, if necessary, to a | 147 | |
154 | form with shorter lines, and subsequently reconverted back to the form | 148 | If additional line breaks are added, then in order to remove the line breaks |
155 | having one line per paragraph. | 149 | and reconstruct the original paragraphs, so that the page width can be |
156 | 150 | adjusted, perform the following manipulations: | |
157 | If additional line breaks are added, then in order to remove the line | ||
158 | breaks and reconstruct the original paragraphs, so that the page width | ||
159 | can be adjusted, perform the following manipulations: | ||
160 | 151 | ||
161 | (1) convert each line break to a space. | 152 | (1) convert each line break to a space. |
162 | (2) convert the string "</p> " (</p> followed by two spaces) | 153 | (2) convert the string "</p> " (</p> followed by two spaces) |
@@ -164,46 +155,43 @@ can be adjusted, perform the following manipulations: | |||
164 | (3) convert the string "<br/ " (<br/ followed by one space) | 155 | (3) convert the string "<br/ " (<br/ followed by one space) |
165 | to <br/ followed by one line break. | 156 | to <br/ followed by one line break. |
166 | 157 | ||
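
Reviewer note: the three manipulations listed above can be sketched as a small function. This is an illustration under stated assumptions, not code from the package. The target of step (2) falls outside the hunks shown here, so the `paragraph_break` default of two line breaks is an assumption; the function name `rejoin_paragraphs` is likewise hypothetical.

```python
def rejoin_paragraphs(text: str, paragraph_break: str = "\n\n") -> str:
    """Rebuild one-line-per-paragraph text after extra line breaks were added.

    `paragraph_break` is an ASSUMPTION: the replacement used in step (2)
    is not visible in this diff excerpt.
    """
    # Step (1): convert each line break to a space.
    joined = text.replace("\r\n", "\n").replace("\n", " ")
    # Step (2): "</p>" followed by two spaces marks a paragraph boundary.
    joined = joined.replace("</p>  ", "</p>" + paragraph_break)
    # Step (3): "<br/" followed by one space becomes "<br/" plus one line break.
    joined = joined.replace("<br/ ", "<br/\n")
    return joined


if __name__ == "__main__":
    wrapped = "<p>first line\nsecond<br/\nthird</p>\n\n<p>next</p>\n"
    print(rejoin_paragraphs(wrapped))
```

The order of the steps matters: steps (2) and (3) rely on the spaces that step (1) produces, so they have to run after it.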
167 | A more sophisticated formatting of spaces within paragraphs may | 158 | A more sophisticated formatting of spaces within paragraphs may require the |
168 | require the use of the fully-tagged master files. If you have a need | 159 | use of the fully-tagged master files. If you have a need for these files, |
169 | for these files, contact Patrick Cassidy: cassidy@micra.com. | 160 | contact Patrick Cassidy: cassidy@micra.com. |
170 | 161 | ||
171 | The approximate beginning of each page is marked by an SGML comment of | 162 | The approximate beginning of each page is marked by an SGML comment of the |
172 | the form <-- p. 345 -->. (The exact beginning was in some cases in | 163 | form <-- p. 345 -->. (The exact beginning was in some cases in the middle |
173 | the middle of a paragraph, which we decided was not a good location | 164 | of a paragraph, which we decided was not a good location for these |
174 | for these page-number comments, so the page number was usually moved | 165 | page-number comments, so the page number was usually moved to the next |
175 | to the next paragraph break). Pages which have been proofread |