The README file

  To accompany the GNU version of the set of files (CIDE.*) containing 
                the electronic version of the
       Collaborative International Dictionary of English.
                   (called also GCIDE)
       These files contain Version 0.51 (January 2012)
    * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* OVERVIEW
==========
This document describes the GNU version of the Collaborative
International Dictionary of English.  It is organized into a series of
chapters, introduced by headings beginning with a single asterisk.  A
chapter may have sections, which are marked with two asterisks.  For
those readers who use Emacs, this structure corresponds to its
"Outline mode", which will be enabled automatically upon loading this
file.

The chapter "INTRODUCTION" describes the structure of this package.
The chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary
structure in general.  An overview of the markup tags is provided in
the chapter "TAGS".  Detailed information about dictionary markup
can be obtained from a set of ancillary files included in this
package, which are described in the chapter "ANCILLARY FILES".

The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for
reading this dictionary.  Finally, other versions of the Webster
dictionary are listed in the chapter "OTHER VERSIONS OF THE
DICTIONARY".
    
* INTRODUCTION
==============
The dictionary was derived from the
         Webster's Revised Unabridged Dictionary
                 Version published 1913
               by the  C. & G. Merriam Co.
                   Springfield, Mass.
                 Under the direction of
                Noah Porter, D.D., LL.D.

and has been supplemented with some of the definitions from
           WordNet, a semantic network created by
              the Cognitive Science Department
                 of Princeton University
                  under the direction of
                   Prof. George Miller

and is being proof-read and supplemented by volunteers from around the
world.  This is an unfunded project, and future enhancement of this
dictionary will depend on the efforts of volunteers willing to help
build this free resource into a comprehensive body of general
information.  New definitions for missing words or word senses and
longer explanatory notes, as well as images to accompany the articles
are needed.  More modern illustrative quotations giving recent
examples of usage of the words in their various senses will be very
helpful, since most quotations in the original 1913 dictionary are now
well over 100 years old.

This electronic version is being maintained by World Soul, a
non-profit organization in Plainfield, NJ.  For additional information,
or if you are willing to assist in the construction of this data
source, contact:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Patrick J. Cassidy              | TEL:          (908) 561-3416
 World Soul                      | if no answer, (908) 668-5252
 735 Belvidere Ave.              | FAX:          (908) 668-5904
 Plainfield, NJ  07062-2054
 pc@worldsoul.org   or  cassidy@micra.com
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

GCIDE is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GCIDE is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this copy of GCIDE; see the file COPYING.  If not, write 
to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.

* STRUCTURE OF THE DICTIONARY
=============================
When the archive is unpacked, the main dictionary text of the GCIDE
will be found in 26 files named "CIDE.*", where the asterisk indicates
which letter of the alphabet begins the words in each file.  For
example, file "CIDE.B" contains words beginning with the letter "B".
Additional information about the tagging conventions and special
character symbols is contained in ancillary files in this directory
(see the chapter "ANCILLARY FILES" below).  The main body of
the 1913 dictionary was essentially identical to the edition published
in 1890, and was republished in 1913 with an appendix containing "New
Words".  The new words of that appendix have been integrated into the
main file in this version.  However, it is important to keep in mind
that the definitions in this dictionary are in most cases over 100
years old.  Use them with caution!
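
The file naming rule above can be sketched in Python.  This is only an
illustration: the function name "cide_file_for" is hypothetical, and
the handling of headwords that start with a hyphen or an accented
letter is an assumption, not something this README specifies.

```python
def cide_file_for(headword):
    """Guess which "CIDE.*" file should hold a headword."""
    for ch in headword:
        # Assumption: skip leading characters that are not ASCII letters.
        if ch.isascii() and ch.isalpha():
            return "CIDE." + ch.upper()
    raise ValueError("no ASCII letter in headword: %r" % headword)


print(cide_file_for("bank"))
```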

At the bottom of each paragraph in this dictionary there is a
bracketed and tagged "source".  It indicates where the definition or
other text in that paragraph came from, as follows:

[<source>1913 Webster</source>]
  =  From the original 1890 dictionary.
[<source>Webster 1913 Suppl.</source>]
  =  From the 1913 "New Words" supplement to the Webster.
[<source>WordNet 1.5</source>]
  =  From the WordNet on-line semantic network.
[<source>Century Dict. 1906.</source>]
  =  From the Century Dictionary published in 1906, especially from
          the "proper Names" supplement (volume IX).
[<source>XXX</source>]
   = Added by one of the volunteers.

The original definitions have been tagged and in some cases
reformatted or slightly rearranged.  If substantive information is
added from a second source, usually the additional source is also
noted, as in:

[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
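
A program that needs the source names can pull them out of such a
bracketed note with a regular expression.  The following Python sketch
assumes that <source> elements are never nested; the function name
"extract_sources" is hypothetical.

```python
import re

# Matches one <source>...</source> element inside a bracketed source note.
SOURCE_RE = re.compile(r"<source>(.*?)</source>")


def extract_sources(note):
    """Return the source names found in a bracketed source note."""
    return [name.strip() for name in SOURCE_RE.findall(note)]


note = "[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]"
print(extract_sources(note))
```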

This version is tagged with SGML-like tags of the form <pos>...</pos> 
so that the original typography (italics, bold, block quotes) can be
reproduced.  A list of the most important tags for fields in the
dictionary is given below.  The tags also serve the more important
function of allowing the information content to be conveniently
imported into computer programs or databases.  The set of tags used is
described in the accompanying file "tagset.txt".  ***NOTE*** the
paragraph tags <p>...</p> do *not* always nest properly with certain
other tags, such as <note> and <cs> ("collocation section"), which in
some cases span multiple paragraphs.  If you are using a tag parser
which detects improper nesting, you should first either delete the
paragraph tags or convert them to non-tag symbols, or, if possible,
set the parser to ignore the <p>...</p> tags.
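
Both workarounds for the paragraph tags can be sketched in a few lines
of Python.  The sketch assumes that paragraph tags always appear
exactly as <p> and </p>, without attributes; the "{p}" and "{/p}"
markers are an arbitrary choice made for this example.

```python
import re

# Matches an opening or closing paragraph tag only: <p> or </p>.
# Tags such as <pos> or <pr> are not affected.
P_TAG_RE = re.compile(r"</?p>")


def drop_paragraph_tags(text):
    """Delete <p> and </p> so that the remaining tags can nest properly."""
    return P_TAG_RE.sub("", text)


def mask_paragraph_tags(text):
    """Turn <p> and </p> into non-tag markers that a parser treats as text."""
    return text.replace("<p>", "{p}").replace("</p>", "{/p}")


sample = "<p><note>first part</p>\n\n<p>second part</note></p>"
print(drop_paragraph_tags(sample))
print(mask_paragraph_tags(sample))
```

Masking keeps the paragraph boundaries recoverable after parsing,
whereas deleting the tags discards them.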

The unusual characters (such as Greek or the European accented
characters, as well as special characters used in the pronunciations)
are described in the accompanying file "webfont.txt".  Some
information on the pronunciation system used may be found by viewing
the file "pronunc.jpg", and additional explanations of pronunciation
are in the file "pronunc.txt".

Each paragraph of the original text is enclosed within tags of the
form <p> . . . </p>.  Within these paragraphs there are no line
breaks, and some of the paragraphs are over 12,000 characters long,
which may prove too long to be handled by some editors.  At some
points, embedded line breaks within a "paragraph" are marked by a <br/
"entity".  The file can therefore be converted, if necessary, to a
form with shorter lines, and subsequently reconverted back to the form
having one line per paragraph.

If additional line breaks are added, then in order to remove the line
breaks and reconstruct the original paragraphs, so that the page width
can be adjusted, perform the following manipulations:

  (1) convert each line break to a space.
  (2) convert the string "</p>  " (</p> followed by two spaces)
     to </p> followed by two line breaks.
  (3) convert the string "<br/ " (<br/ followed by one space)
     to <br/ followed by one line break.
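
The three steps above can be sketched in Python as follows.  The
sketch assumes that the text uses Unix line breaks ("\n"); the
function name "rejoin_paragraphs" is hypothetical.

```python
def rejoin_paragraphs(text):
    """Undo added line breaks, following steps (1)-(3) above."""
    text = text.replace("\n", " ")             # (1) line break -> space
    text = text.replace("</p>  ", "</p>\n\n")  # (2) restore paragraph breaks
    text = text.replace("<br/ ", "<br/\n")     # (3) restore embedded breaks
    return text


wrapped = "<p>one\ntwo</p>\n\n<p>three<br/\nfour</p>\n"
print(rejoin_paragraphs(wrapped))
```

The order of the steps matters: steps (2) and (3) rely on the spaces
produced by step (1).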
     
A more sophisticated formatting of spaces within paragraphs may
require the use of the fully-tagged master files.  If you have a need
for these files, contact Patrick Cassidy: cassidy@micra.com. 

The approximate beginning of each page is marked by an SGML comment of
the form <-- p. 345 -->.  (The exact beginning was in some cases in
the middle of a paragraph, which we decided was not a good location
for these page-number comments, so the page number was usually moved
to the next paragraph break).  Pages which have been proofread by
volunteers (e.g., with initials VOL) will have a note within that page
comment: <-- p. 345 pr=VOL -->.  Pages which have not been proofread
yet (most of them) will have varying numbers of typographical errors
in them.  We still (January 2012) need proofreaders to get the errors
out of these dictionary files. 
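
The page comments can be used, for example, to list which pages have
been proofread.  The Python sketch below assumes that the comments
follow exactly the two forms shown above and that the proofreader's
initials consist of letters or digits only; the function name
"page_markers" is hypothetical.

```python
import re

# Page comment: <-- p. 345 --> or <-- p. 345 pr=VOL -->
PAGE_RE = re.compile(r"<--\s*p\.\s*(\d+)(?:\s+pr=(\w+))?\s*-->")


def page_markers(text):
    """Return (page_number, proofreader_or_None) for each page comment."""
    return [(int(m.group(1)), m.group(2)) for m in PAGE_RE.finditer(text)]


sample = "<-- p. 345 --> some text <-- p. 346 pr=VOL --> more text"
print(page_markers(sample))
```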

** Warning

This version is only a first typing, and has numerous typographic
errors, including errors in the field-marks.  In addition, the user
must keep in mind that this text is very old and will contain numerous 
obsolete, inaccurate, and perhaps offensive statements, which are 
included solely because this work is intended to reproduce accurately
this historically interesting classic reference work.  This text should 
not be relied upon as an accurate source of information, as in many
cases it represents the state of knowledge around 1890.  The text is
provided "as is", and the user must accept responsibility for all
consequences of its use.  Please refer to the header of each file and
the GNU General Public License.  If these conditions of use are unacceptable,
please do not use these texts.

This electronic dictionary is also made available as a potential
starting point for development of a modern comprehensive encyclopedic
dictionary, to be accessible freely on the internet, and developed by
the efforts of all individuals willing to help build a large and
freely available knowledge base.  A large number of collaborators are
needed to bring this dictionary to a more accurate, more modern, and
more useful state. Anyone willing to assist in any way in constructing
such a knowledge base should contact Patrick Cassidy (see above).  All
reports of errors will be gratefully received, and should also be
transmitted to PC at: pc@worldsoul.org.

* TAGS

The most important tags used in the GCIDE:

<hw>          headword
<pr>          pronunciation
<pos>         part of speech
<ety>         etymology
<ets>         "source" word within an <ety> field, usually foreign words
<fld>         field of knowledge (e.g. Med. = medicine)
<def>         definition
<cs>          collocation section  (containing word combinations)
<col>         collocation entry (word combination)
<cd>          collocation definition
<as>          illustrations of usage (within a <def>. . . </def> field)
<au>          authority for a definition, or author of a quotation
<q>           illustrative quotation -- in block quote format
<au>          author of an illustrative <q> quotation
<altname>     alternative name for the headword -- essentially a synonym
<asp>         alternative spelling of the headword
<syn>         list of synonyms for the headword
<p>           paragraph
<b>           bold type
<it>          italic type

For other tags, see the file "tagset.txt"
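
As an illustration of how these tags let a program import the
dictionary content, the following Python sketch extracts fields from
one entry.  The sample entry is made up for this example and is not
taken from the dictionary; the function name "field" is hypothetical,
and the sketch does not handle a tag nested inside itself.

```python
import re


def field(entry, tag):
    """Return the text of every <tag>...</tag> field in an entry."""
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), re.S)
    return pattern.findall(entry)


# Hypothetical entry, for illustration only.
entry = ("<p><hw>Example</hw> <pos>n.</pos> "
         "<def>A made-up entry used only for illustration.</def></p>")

print(field(entry, "hw"))
print(field(entry, "pos"))
print(field(entry, "def"))
```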

* ANCILLARY FILES

In addition to the main text of the dictionary, additional explanatory
material about this version of the dictionary is available in the
ancillary files:

** COPYING

The license terms for distributing and modifying this dictionary.

** abbrevn.lst

List of the abbreviations used in the dictionary.

** authors.lst

List of authors whose works are quoted in the dictionary.

** pronunc.txt

Description of the special markup used in this dictionary to represent
pronunciations.

** pronunc.jpg

A copy of the dictionary page describing the pronunciation symbols used
in the original work.

** symbols.jpg

This file lists original pronunciation symbols with the corresponding
markup entities used in this version.

** tagset.txt

Description of the markup tags.

** titlepage.png

A copy of the original title page.

** webfont.txt

Description of the special escape sequences used in this dictionary.
This file also explains the Greek transliteration syntax used in it.

* DICTIONARY LOOKUP
===================
The GNU Dico project contains a module for reading GCIDE files.  This
distribution provides a configuration file "gcide.conf" which you can
use with the "dicod" server in order to look up words in the
dictionary.  See http://www.gnu.org.ua/software/dico for a description
of GNU Dico, including links to download.

The instructions below describe how to configure the GNU Dico server
(dicod) to access a copy of the GCIDE dictionary.

1. Unpack the GCIDE dictionary.
2. Copy the file "gcide.conf" to a directory where you keep your local
configuration files (/etc or /usr/local/etc are usual choices).
3. Replace the word GCIDE_PATH in the file "gcide.conf" with the path
to the gcide-0.51 directory.  You can omit this step and use the -D
option instead, as described in step 4.
4. Check the configuration file.  Run:
          dicod --config /path/to/gcide.conf --lint
If you skipped step 3, supply the -D option with the actual path to
the dictionary.  For example, if you copied "gcide.conf" to /etc and
unpacked GCIDE to /usr/local, then run:
          dicod --config /etc/gcide.conf -D GCIDE_PATH=/usr/local --lint
If no errors are reported, go to step 5.

5. Start "dicod".  Run the same command as described in step 4, but
without the "--lint" option.  This will start the dictionary server,
which will be available on localhost (127.0.0.1), port 2628.  The
server provides extensive searching facilities.  It also parses the
GCIDE markup and automatically reformats the articles before returning
them.

Now you can access the dictionary using dico (a GNU dictionary command
line utility), or another dictionary client program (such as Kdict or
the like).
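
For readers who want to query the server from a program, here is a
minimal Python sketch of a client that speaks the DICT protocol
(RFC 2229) directly.  It is not part of GNU Dico.  The database name
"gcide" is an assumption and must match the name set in your
"gcide.conf"; the function names are hypothetical.

```python
import socket


def define_command(word, database="gcide"):
    """Build a DICT protocol (RFC 2229) DEFINE command line."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return 'DEFINE {} "{}"\r\n'.format(database, escaped).encode("utf-8")


def lookup(word, host="127.0.0.1", port=2628, database="gcide"):
    """Send one DEFINE request and return the raw server reply as text."""
    lines = []
    with socket.create_connection((host, port), timeout=10) as conn:
        with conn.makefile("r", encoding="utf-8") as reader:
            reader.readline()  # greeting banner (status 220)
            conn.sendall(define_command(word, database))
            in_text = False
            for line in reader:
                line = line.rstrip("\n")
                lines.append(line)
                if in_text:
                    # A line holding a single dot ends one definition.
                    if line == ".":
                        in_text = False
                elif line.startswith("151"):
                    in_text = True  # definition text follows
                elif line.startswith(("250", "4", "5")):
                    break  # final status: success or error
            conn.sendall(b"QUIT\r\n")
    return "\n".join(lines)
```

Once dicod is running as described in step 5, calling
lookup("dictionary") should return the server's reply, including its
status lines.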

* OTHER VERSIONS OF THE DICTIONARY
==================================
There are several other derivative versions of this dictionary on the
internet, in some cases reformatted or provided with an interface.
Those that I am aware of are:

** Dicoweb 
----------
This version of GCIDE is available online at the GNU Dico web
site:

  http://dicoweb.gnu.org.ua/?db=gcide

The site provides extensive search facilities.  

** Project Gutenberg
---------------------
In the etext96 directory of Project Gutenberg
(http://www.gutenberg.org/dirs/etext96), there is a version of the
original 1913 dictionary, which is in the **public domain**.  The main
files are labeled pgw050*.*.  The tags for that version are a subset
of those used in this GNU version.

** The DICT development group
------------------------------
This group has created a program to index and search this dictionary.
The program can be downloaded and used locally, but at present is
available only in a Unix-compatible executable version.  See their web
site at http://www.dict.org.

** The University of Chicago ARTFL project
------------------------------------------
Mark Olsen and Gavin LaRowe at the University of Chicago have
converted the original 1913 dictionary to HTML and have provided an
interface allowing search of the headwords.  When the supplemented
version has developed sufficiently to warrant the effort, a similar
searchable version may be posted there as well.  The search page is at:

  http://humanities.uchicago.edu/forms_unrest/webster.form.html

That page will provide links to other ARTFL projects and contact
information for the ARTFL group, who alone can provide information  
about the HTML version or interface.



Local Variables:
mode: outline
paragraph-separate: "[ 	]*$"
version-control: never
End:
