author Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
committer Sergey Poznyakoff <gray@gnu.org.ua> 2012-02-03 12:48:52 +0200
commit d18a469b7a5a4d4b5da21eab37f34ab1e99a8dce (patch)
tree 7eb331e376e85287c25b6a9734dae58a4724da8a
parent 4a458db06b28492a7e48b1a0560b35778e476482 (diff)
Revise tagset.txt

* tagset.txt: Review.
* README: Reformat.
* webfont.txt: Reformat. Document <and/ and <or/.

-rw-r--r-- README 363
-rw-r--r-- tagset.txt 2057
-rw-r--r-- webfont.txt 302
3 files changed, 1385 insertions, 1337 deletions
diff --git a/README b/README
index b8d21ad..4a36e8b 100644
--- a/README
+++ b/README
@@ -8,29 +8,27 @@ The README file
8 * * * * * * * * * * * * * * * * * * * * * * * * * * * * 8 * * * * * * * * * * * * * * * * * * * * * * * * * * * *
9 9
10* OVERVIEW 10* OVERVIEW
11========== 11
12This document describes the GNU version of the Collaborative 12This document describes the GNU version of the Collaborative International
13International Dictionary of English. It is organized into a series of 13Dictionary of English. It is organized into a series of chapters,
14chapters, introduced by headings beginning with a single asterisk. A 14introduced by headings beginning with a single asterisk. A chapter may have
15chapter may have sections, which are marked with two asterisks. For 15sections, which are marked with two asterisks. For those readers who use
16those readers who use Emacs, this structure corresponds to its 16Emacs, this structure corresponds to its "Outline mode", which will be
17"Outline mode", which will be enabled automatically upon loading this 17enabled automatically upon loading this file.
18file. 18
19 19The chapter "INTRODUCTION" describes the structure of this package. The
20The chapter "INTRODUCTION" describes the structure of this package. 20chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary structure in
21The chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary 21general. An overview of the markup tags is provided in the chapter "TAGS".
22structure in general. An overview of the markup tags is provided in 22A detailed information about dictionary markup can be obtained from a set of
23the chapter "TAGS". A detailed information about dictionary markup 23ancillary files included in this package, which are described in the chapter
24can be obtained from a set of ancillary files included in this 24"ANCILLARY FILES".
25package, which are described in the chapter "ANCILLARY FILES". 25
26 26The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for reading
27The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for 27this dictionary. Finally, other versions of the Webster dictionary are
28reading this dictionary. Finally, other versions of the Webster 28listed in the chapter "OTHER VERSIONS OF THE DICTIONARY".
29dictionary are listed in the chapter "OTHER VERSIONS OF THE
30DICTIONARY".
31 29
32* INTRODUCTION 30* INTRODUCTION
33============== 31
34The dictionary was derived from the 32The dictionary was derived from the
35 Webster's Revised Unabridged Dictionary 33 Webster's Revised Unabridged Dictionary
36 Version published 1913 34 Version published 1913
@@ -48,18 +46,17 @@ and has been supplemented with some of the definitions from
48 46
49and is being proof-read and supplemented by volunteers from around the 47and is being proof-read and supplemented by volunteers from around the
50world. This is an unfunded project, and future enhancement of this 48world. This is an unfunded project, and future enhancement of this
51dictionary will depend on the efforts of volunteers willing to help 49dictionary will depend on the efforts of volunteers willing to help build
52build this free resource into a comprehensive body of general 50this free resource into a comprehensive body of general information. New
53information. New definitions for missing words or words senses and 51definitions for missing words or words senses and longer explanatory notes,
54longer explanatory notes, as well as images to accompany the articles 52as well as images to accompany the articles are needed. More modern
55are needed. More modern illustrative quotations giving recent 53illustrative quotations giving recent examples of usage of the words in
56examples of usage of the words in their various senses will be very 54their various senses will be very helpful, since most quotations in the
57helpful, since most quotations in the original 1913 dictionary are now 55original 1913 dictionary are now well over 100 years old.
58well over 100 years old. 56
59 57This electronic version is being maintained by World Soul, a non-profit
60This electronic version is being maintained by World Soul, a 58organization in Plainfield, NJ. For additional information or if you are
61non-profit organization in Plainfield, NJ. For additional information 59willing to assist construction of this data source, contact:
62or if you are willing to assist construction of this data source, contact:
63 60
64=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 61=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
65 Patrick J. Cassidy | TEL: (908) 561-3416 62 Patrick J. Cassidy | TEL: (908) 561-3416
@@ -69,40 +66,38 @@ or if you are willing to assist construction of this data source, contact:
69 pc@worldsoul.org or cassidy@micra.com 66 pc@worldsoul.org or cassidy@micra.com
70=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 67=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
71 68
72GCIDE is free software; you can redistribute it and/or modify 69GCIDE is free software; you can redistribute it and/or modify it under the
73it under the terms of the GNU General Public License as published by 70terms of the GNU General Public License as published by the Free Software
74the Free Software Foundation; either version 2, or (at your option) 71Foundation; either version 2, or (at your option) any later version.
75any later version.
76 72
77GCIDE is distributed in the hope that it will be useful, 73GCIDE is distributed in the hope that it will be useful, but WITHOUT ANY
78but WITHOUT ANY WARRANTY; without even the implied warranty of 74WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
79MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 75FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
80GNU General Public License for more details. 76details.
81 77
82You should have received a copy of the GNU General Public License 78You should have received a copy of the GNU General Public License along with
83along with this copy of GCIDE; see the file COPYING. If not, write 79this copy of GCIDE; see the file COPYING. If not, write to the Free
84to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, 80Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
85Boston, MA 02111-1307, USA. 8102111-1307, USA.
86 82
87* STRUCTURE OF THE DICTIONARY 83* STRUCTURE OF THE DICTIONARY
88============================= 84
89When the archive is unpacked, the main dictionary text of the GCIDE 85When the archive is unpacked, the main dictionary text of the GCIDE will be
90will be found in 26 files named "CIDE.*", where the asterisk indicates 86found in 26 files named "CIDE.*", where the asterisk indicates which letter
91which letter of the alphabet begins the words in each file. For 87of the alphabet begins the words in each file. For example, file "CIDE.B"
92example, file "CIDE.B" contains words beginning with the letter "B". 88contains words beginning with the letter "B". Additional information about
93Additional information about the tagging conventions and special 89the tagging conventions and special character symbols are contained in
94character symbols are contained in ancillary files in this directory 90ancillary files in this directory (see below the section entitled "ANCILLARY
95(see below the section entitled "ANCILLARY FILES"). The main body of 91FILES"). The main body of the 1913 dictionary was essentially identical to
96the 1913 dictionary was essentially identical to the edition published 92the edition published in 1890, and was republished in 1913 with an appendix
97in 1890, and was republished in 1913 with an appendix containing "New 93containing "New Words". The new words of that appendix have been integrated
98Words". The new words of that appendix have been integrated into the 94into the main file in this version. However, it is important to keep in
99main file in this version. However, it is important to keep in mind 95mind that the definitions in this dictionary are in most cases over 100
100that the definitions in this dictionary are in most cases over 100
101years old. Use them with caution! 96years old. Use them with caution!
102 97
103At the bottom of each paragraph in this dictionary, there is a 98At the bottom of each paragraph in this dictionary, there is a bracketed and
104bracketed and tagged "source" indicated. This tells from where the 99tagged "source" indicated. This tells from where the definition or other
105definition or other text in that paragraph came, as follows: 100text in that paragraph came, as follows:
106 101
107[<source>1913 Webster</source>] 102[<source>1913 Webster</source>]
108 = From the original 1890 dictionary. 103 = From the original 1890 dictionary.
@@ -117,46 +112,42 @@ definition or other text in that paragraph came, as follows:
117[<source>XXX</source>] 112[<source>XXX</source>]
118 = Added by one of the volunteers. 113 = Added by one of the volunteers.
119 114
120The original definitions have been tagged and in some cases 115The original definitions have been tagged and in some cases reformatted or
121reformatted or slightly rearranged. If substantive information is 116slightly rearranged. If substantive information is added from a second
122added from a second source, usually the additional source is also 117source, usually the additional source is also noted, as in:
123noted, as in:
124 118
125[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>] 119[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
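
The bracketed source notes described above can be read mechanically. The following is a minimal Python sketch, not part of the GCIDE package; the function name `paragraph_sources` is hypothetical, and it assumes that source names never contain nested tags.

```python
import re

# Matches one <source>...</source> pair; non-greedy so that several
# sources joined by " + " inside one bracket are returned separately.
SOURCE_RE = re.compile(r"<source>(.*?)</source>")


def paragraph_sources(paragraph: str) -> list[str]:
    """Return the source names tagged in a paragraph, in order of appearance."""
    return SOURCE_RE.findall(paragraph)


if __name__ == "__main__":
    note = "[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]"
    print(paragraph_sources(note))
```

A paragraph with no source note yields an empty list, so the caller can tell untagged paragraphs apart.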
126 120
127This version is tagged with SGML-like tags of the form <pos>...</pos> 121This version is tagged with SGML-like tags of the form <pos>...</pos> so
128so that the original typography (italics, bold, block quotes) can be 122that the original typography (italics, bold, block quotes) can be
129reproduced. A list of the most important tags for fields in the 123reproduced. A list of the most important tags for fields in the dictionary
130dictionary is given below. The tags also serve the more important 124is given below. The tags also serve the more important function of allowing
131function of allowing the information content to be conveniently 125the information content to be conveniently imported into computer programs
132imported into computer programs or databases. The set of tags used is 126or databases. The set of tags used is described in the accompanying file
133described in the accompanying file "tagset.txt". ***NOTE*** the 127"tagset.txt". ***NOTE*** the paragraph tags <p>...</p> do *not* always nest
134paragraph tags <p>...</p> do *not* always nest properly with certain 128properly with certain other tags, such as <note> and <cs> ("collocation
135other tags, such as <note> and <cs> ("collocation section"), which in 129section"), which in some cases span multiple paragraphs. If you are using a
136some cases span multiple paragraphs. If you are using a tag parser 130tag parser which detects improper nesting, you should first either delete
137which detects improper nesting, you should first either delete the 131the paragraph tags or convert them to non-tag symbols, or, if possible, set
138paragraph tags or convert them to non-tag symbols, or, if possible, 132the parser to ignore the <p>...</p> tags.
139set the parser to ignore the <p>...</p> tags. 133
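
One way to follow the advice above before running a strict parser is to strip or replace the paragraph tags first. This is a hedged Python sketch; `drop_paragraph_tags` and its `marker` parameter are illustrative names, not part of any GCIDE tool.

```python
import re

# Matches exactly <p> and </p>. Tags such as <pos> or <pr> are not
# affected, because the pattern requires ">" immediately after "p".
P_TAG_RE = re.compile(r"</?p>")


def drop_paragraph_tags(text: str, marker: str = "") -> str:
    """Delete paragraph tags, or replace them with a non-tag marker."""
    return P_TAG_RE.sub(marker, text)


if __name__ == "__main__":
    # A <note> that spans two paragraphs nests improperly with <p>...</p>.
    sample = "<p><note>first part</p> <p>second part</note></p>"
    print(drop_paragraph_tags(sample))
```

Passing a non-empty `marker` keeps the paragraph boundaries visible as plain text, which corresponds to the "convert them to non-tag symbols" option.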
140 134The unusual characters (such as Greek or the European accented characters,
141The unusual characters (such as Greek or the European accented 135as well as special characters used in the pronunciations) are described in
142characters, as well as special characters used in the pronunciations) 136the accompanying file "webfont.txt". Some information on the pronunciation
143are described in the accompanying file "webfont.txt". Some 137system used may be found by viewing the file "pronunc.jpg", and additional
144information on the pronunciation system used may be found by viewing 138explanations of pronunciation are in the file "pronunc.txt".
145the file "pronunc.jpg", and additional explanations of pronunciation 139
146are in the file "pronunc.txt". 140Each paragraph of the original text is enclosed within tags of the form <p>
147 141. . . </p>. Within these paragraphs there are no line breaks, and some of
148Each paragraph of the original text is enclosed within tags of the 142the paragraphs are over 12,000 characters long, which may prove too long to
149form <p> . . . </p>. Within these paragraphs there are no line 143be handled by some editors. At some points, embedded line breaks within a
150breaks, and some of the paragraphs are over 12,000 characters long, 144"paragraph" are marked by a <br/ "entity". The file can therefore be
151which may prove too long to be handled by some editors. At some 145converted, if necessary, to a form with shorter lines, and subsequently
152points, embedded line breaks within a "paragraph" are marked by a <br/ 146reconverted back to the form having one line per paragraph.
153"entity". The file can therefore be converted, if necessary, to a 147
154form with shorter lines, and subsequently reconverted back to the form 148If additional line breaks are added, then in order to remove the line breaks
155having one line per paragraph. 149and reconstruct the original paragraphs, so that the page width can be
156 150adjusted, perform the following manipulations:
157If additional line breaks are added, then in order to remove the line
158breaks and reconstruct the original paragraphs, so that the page width
159can be adjusted, perform the following manipulations:
160 151
161 (1) convert each line break to a space. 152 (1) convert each line break to a space.
162 (2) convert the string "</p> " (</p> followed by two spaces) 153 (2) convert the string "</p> " (</p> followed by two spaces)
@@ -164,46 +155,43 @@ can be adjusted, perform the following manipulations:
164 (3) convert the string "<br/ " (<br/ followed by one space) 155 (3) convert the string "<br/ " (<br/ followed by one space)
165 to <br/ followed by one line break. 156 to <br/ followed by one line break.
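
The three manipulations above can be sketched in Python as follows. The replacement text for step (2) falls outside the lines shown in this diff, so the `para_sep` default below ("</p>" followed by two line breaks) is an assumption, as is the function name `rejoin_paragraphs`.

```python
def rejoin_paragraphs(text: str, para_sep: str = "</p>\n\n") -> str:
    """Undo added line breaks and restore one line per paragraph."""
    # Step 1: convert each line break to a space.
    text = text.replace("\r\n", "\n").replace("\n", " ")
    # Step 2: "</p>" followed by two spaces marks a paragraph boundary.
    # ASSUMPTION: the replacement is not visible in this excerpt;
    # para_sep is a guess that callers can override.
    text = text.replace("</p>  ", para_sep)
    # Step 3: "<br/" followed by one space becomes "<br/" plus a line break.
    text = text.replace("<br/ ", "<br/\n")
    return text


if __name__ == "__main__":
    wrapped = "<p>one\ntwo</p>\n\n<p>a<br/\nb</p>"
    print(rejoin_paragraphs(wrapped))
```

The order of the steps matters: steps (2) and (3) look for the spaces that step (1) produces, so they must run after it.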
166 157
167A more sophisticated formatting of spaces within paragraphs may 158A more sophisticated formatting of spaces within paragraphs may require the
168require the use of the fully-tagged master files. If you have a need 159use of the fully-tagged master files. If you have a need for these files,
169for these files, contact Patrick Cassidy: cassidy@micra.com. 160contact Patrick Cassidy: cassidy@micra.com.
170 161
171The approximate beginning of each page is marked by an SGML comment of 162The approximate beginning of each page is marked by an SGML comment of the
172the form <-- p. 345 -->. (The exact beginning was in some cases in 163form <-- p. 345 -->. (The exact beginning was in some cases in the middle
173the middle of a paragraph, which we decided was not a good location 164of a paragraph, which we decided was not a good location for these
174for these page-number comments, so the page number was usually moved 165page-number comments, so the page number was usually moved to the next
175to the next paragraph break). Pages which have been proofread