aboutsummaryrefslogtreecommitdiff
path: root/README.DIC
blob: 780e0bb4227c6bd2cd5d4a0336be231bbcb57503 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
File  README.DIC
  To accompany the GNU version of the set of files (cide.*) containing 
                the electronic version of the
       Collaborative International Dictionary of English.
                   (called also GCIDE)
       These files contain Version 0.46 (January 2002)
    * * * * * * * * * * * * * * * * * * * * * * * * * * * *

The dictionary was derived from the
         Webster's Revised Unabridged Dictionary
                 Version published 1913
               by the  C. & G. Merriam Co.
                   Springfield, Mass.
                 Under the direction of
                Noah Porter, D.D., LL.D.

and has been supplemented with some of the definitions from
           WordNet, a semantic network created by
              the Cognitive Science Department
                 of Princeton University
                  under the direction of
                   Prof. George Miller

and is being proof-read and supplemented by volunteers from
around the world.  This is an unfunded project, and future
enhancement of this dictionary will depend on the efforts of
volunteers willing to help build this free resource into a
comprehensive body of general information.  New definitions
for missing words or words senses and longer explanatory notes, 
as well as images to accompany the articles are needed.  More
modern illustrative quotations giving recent examples of
usage of the words in their various senses will be very
helpful, since most quotations in the original 1913 dictionary
are now well over 100 years old.

   This electronic version is being maintained by World Soul,
a non-profit organization in Plainfield, NJ.  For additional
information or if you are willing to assist construction of this
data source, contact:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Patrick J. Cassidy              | TEL:          (908) 561-3416
 World Soul                      | if no answer, (908) 668-5252
 735 Belvidere Ave.              | FAX:          (908) 668-5904
 Plainfield, NJ  07062-2054
 pc@worldsoul.org   or  cassidy@micra.com
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 

GCIDE is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GCIDE is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this copy of GCIDE; see the file COPYING.  If not, write 
to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.
           * * * * * * * * * * * * * * * * * * * * * 

STRUCTURE OF THE DICTIONARY
---------------------------
   When the archives are unpacked, the main dictionary text of 
the GCIDE will be found in 26 files named "cide.*", where the
asterisk indicates which letter of the alphabet begins the
words in each file.  For example, file "cide.b" contains words 
beginning with the letter "B".  Additional information about the 
tagging conventions and special character symbols are contained in 
ancillary files in this directory more information below).  The main 
body of the 1913 dictionary was essentially identical to the edition
published in 1890, and was republished in 1913 with an appendix 
containing "New Words".  The new words of that appendix have been
integrated into the main file in this version.  However, it is important 
to keep in mind that the definitions in this dictionary are in most 
cases over 100 years old.  Use them with caution!
    At the bottom of each paragraph in this dictionary, there is a
bracketed and tagged "source" indicated.  This tells from where the
definition or other text in that paragraph came, as follows:

[<source>1913 Webster</source>]
  =  From the original 1890 dictionary.
[<source>Webster 1913 Suppl.</source>]
  =  From the 1913 "New Words" supplement to the Webster.
[<source>WordNet 1.5</source>]
  =  From the WordNet on-line semantic network.
[<source>Century Dict. 1906.</source>]
  =  From the Century Dictionary published in 1906, especially from
          the "proper Names" supplement (volume IX).
                                     published
[<source>XXX</source>]
   = Added by one of the volunteers.

    The original definitions have been tagged and in some cases 
reformatted or slightly rearranged.  If substantive information
is added from a second source, usually the additional source is
also noted, as in:
[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]

    A list of the ancillary files related to the GCIDE is appended at 
the bottom of this "readme.dic" file.
    This version is tagged with SGML-like tags of the form <pos>...</pos>
so that the original typography (italics, bold, block quotes) can be
reproduced.  A list of the most important tags for fields in the 
dictionary is given below.  The tags also serve the more important 
function of allowing the information content to be conveniently imported
into computer programs or databases.  The set of tags used is described 
in the accompanying file "tagset.web".  ***NOTE*** the paragraph tags
<p>...</p> do *not* always nest properly with certain other tags, such 
as <note> and <cs> ("collocation section"), which in some cases span
multiple paragraphs.  If you are using a tag parser which detects
improper nesting, you should first either delete the paragraph
tags or convert them to non-tag symbols, or, if possible, set the 
parser to ignore the <p>...</p> tags.
    The unusual characters (such as Greek or the European accented
characters, as well as special characters used in the pronunciations)
are described in the accompanying file "webfont.asc".  Some information
on the pronunciation system used may be found by viewing the files
"wxxvii.jpg" and "pronunc.jpg" with a GIF viewer (or any web browser),
and additional explanations of pronunciation are in the file 
"pronunc.web".
     Each paragraph of the original text is enclosed within tags of 
the form <p> . . . </p>.  Within these paragraphs are no line
breaks, and some of the paragraphs are over 12,000 characters long.
These lines are too long to be handled by the vi editor, and probably
by some other text editors.  At some points, embedded line breaks within 
a "paragraph" are marked by a <br/ "entity".  The file can therefore
be converted, if necessary, to a form with shorter lines, and subsequently
reconverted back to the form having one line per paragraph.

   If additional line breaks are added, then in order remove the 
line breaks and reconstruct the original paragraphs, so that the 
page width can be adjusted, perform the following manipulations:
  (1) convert each line break (cr-lf combination) to a space.
  (2) convert the string "</p>  " (</p> followed by two spaces)
     to </p> followed by two line breaks (cr-lf combinations)
  (3) convert the string "<br/ " (<br/ followed by one space)
     to <br/ followed by one line break (cr-lf).
There will be some "lines" (paragraphs) with over 12,000 characters,
which may give trouble to some simple text editors.
   A more sophisticated formatting of spaces within paragraphs may
require the use of the fully-tagged master files.  If you have
a need for these files, contact Patrick Cassidy: cassidy@micra.com.
   The approximate beginning of each page is marked by an SGML
comment of the form <-- p. 345 -->.  (The exact beginning was in some
cases in the middle of a paragraph, which we decided was not a
good location for these page-number comments, so the page number
was usually moved to the next paragraph break).  Pages which have 
been proofread by volunteers (e.g., with initials VOL) will have a 
note within that page comment: <-- p. 345 pr=VOL -->.  Pages which have 
not been proofread yet (most of them) will have varying numbers of 
typographical errors in them.   We still (January 2002) need 
proofreaders to get the errors out of these dictionary files.

***********************************************************************
**                        WARNING!!!                                 **
***********************************************************************

    This version is only a first typing, and has numerous typographic
errors, including errors in the field-marks.  In addition, the user must
keep in mind that this text is very old and will contain numerous 
obsolete, inaccurate, and perhaps offensive statements, which are 
included solely because this work is intended to reproduce accurately
this historically interesting classic reference work.  This text should 
not be relied upon as an accurate source of information, as in many
cases it represents the state of knowledge around 1890.  The text is
provided "as is", and the user must accept responsibility for all
consequences  of its use. Please refer to the header of each file and
the GNU public license.  If these conditions of use are unacceptable,
please do not use these texts.
************************************************************************
************************************************************************
    This electronic dictionary is also made available as a potential
starting point for development of a modern comprehensive encyclopedic
dictionary, to be accessible freely on the internet, and developed by the
efforts of all individuals willing to help build a large and freely
available knowledge base.  A large number of collaborators are needed to
bring this dictionary to a more accurate, more modern,  and more useful
state. Anyone willing to assist in any way in constructing such a 
knowledge base should contact Patrick Cassidy (see above).  All reports 
of errors will be gratefully received, and should also be transmitted to 
PC at: pc@worldsoul.org.

In addition to the main text of the dictionary, additional
explanatory material about this version of the dictionary is available
in the ancillary files:

=====================================================================
COPYING             18,321  11-03-99  1:13a COPYING
README   DIC        13,775  01-17-02 11:48p readme.dic
WEBFONT  ASC        35,234  12-12-01  3:27p WEBFONT.ASC
TAGSET   WEB        55,843  08-16-01  1:16p TAGSET.WEB
PRONUNC  WEB        14,312  06-18-00  3:02p PRONUNC.WEB
PRONUNC  JPG     2,569,796  06-18-00  3:11p PRONUNC.JPG
SYMBOLS  JPG       144,716  06-18-00  3:13p SYMBOLS.JPG
WXXVII   JPG     1,188,380  06-18-00  3:19p WXXVII.JPG
==================================================================


Most important tags used in the GCIDE:
<hw> tags the headword
<pr>          pronunciation
<pos>         part of speech
<ety>         etymology
<ets>         "source" word within an <ety> field, usually foreign words
<fld>         field of knowledge (e.g. Med. = medicine)
<def>         definition
<cs>          collocation section  (containing word combinations)
<col>         collocation entry (word combination)
<cd>          collocation definition
<as>          illustrations of usage (within a <def>. . . </def> field)
<au>          authority for a definition, or author of a quotation
<q>           illustrative quotation -- in block quote format
<au>          author of an illustrative <q> quotation
<altname>     alternative name for the headword -- essentially a synonym
<asp>         alternative spelling of the headword
<syn>         list of synonyms for the headword
<p>           paragraph
<b>           bold type
<it>          italic type

For other tags, see the file "tagset.web"


============================================================
            OTHER VERSIONS OF THE DICTIONARY
=============================================================

   There are several other derivative versions of this dictionary 
on the internet, in some cases reformatted or provided with an 
interface.  Those that I am aware of are:

(1) Project Gutenberg
---------------------
   In the extext96 directory of Project Gutenberg (www.prairienet.org)
there is a version of the original 1913 dictionary, which is in
the **public domain**.  The main files are in the directory etext96,
and sre labeled pgw050**.***.  The tags for that version are a subset
of those used in this GNU version.

(2) The DICT development group
------------------------------
This group has created a program to index and search this dictionary.
The program can be downloaded and used locally, but at present
is available only in a Unix-compatible executable version.
See their web site at http://www.dict.org.

(3)  The University of Chicago ARTFL project
---------------------------------------------
Mark Olsen and Gavin LaRowe at the University of Chicago have 
converted the original 1913 dictionary to HTML and have provided an
interface allowing search of the headwords.  When the supplemented
version has developed sufficiently to warrant the effort, a 
similar searchable version may be posted there as well.  The
search page is at:
  http://humanities.uchicago.edu/forms_unrest/webster.form.html

That page will provide links to other ARTFL projects and contact
information for the ARTFL group, who alone can provide information 
about the HTML version or interface.


 -- PJC

Return to:

Send suggestions and report system problems to the System administrator.