summaryrefslogtreecommitdiffabout
path: root/README
blob: 6eeadf6ff17052027f250ac894352aaa8f7d7ab4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
The README file

  To accompany the GNU version of the set of files (CIDE.*) containing 
                the electronic version of the
       Collaborative International Dictionary of English.
                   (called also GCIDE)
       These files contain Version 0.51 (January 2012)
    * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* OVERVIEW

This document describes the GNU version of the Collaborative International
Dictionary of English.  It is organized into a series of chapters,
introduced by headings beginning with a single asterisk.  A chapter may have
sections, which are marked with two asterisks.  For those readers who use
Emacs, this structure corresponds to its "Outline mode", which will be
enabled automatically upon loading this file.

The chapter "INTRODUCTION" describes the structure of this package.  The
chapter "STRUCTURE OF THE DICTIONARY" describes the dictionary structure in
general.  An overview of the markup tags is provided in the chapter "TAGS".
A detailed information about dictionary markup can be obtained from a set of
ancillary files included in this package, which are described in the chapter
"ANCILLARY FILES".

The chapter "DICTIONARY LOOKUP" describes how to use GNU Dico for reading
this dictionary.  Finally, other versions of the Webster dictionary are
listed in the chapter "OTHER VERSIONS OF THE DICTIONARY".
    
* INTRODUCTION

The dictionary was derived from the
         Webster's Revised Unabridged Dictionary
                 Version published 1913
               by the  C. & G. Merriam Co.
                   Springfield, Mass.
                 Under the direction of
                Noah Porter, D.D., LL.D.

and has been supplemented with some of the definitions from
           WordNet, a semantic network created by
              the Cognitive Science Department
                 of Princeton University
                  under the direction of
                   Prof. George Miller

and is being proof-read and supplemented by volunteers from around the
world.  This is an unfunded project, and future enhancement of this
dictionary will depend on the efforts of volunteers willing to help build
this free resource into a comprehensive body of general information.  New
definitions for missing words or words senses and longer explanatory notes,
as well as images to accompany the articles are needed.  More modern
illustrative quotations giving recent examples of usage of the words in
their various senses will be very helpful, since most quotations in the
original 1913 dictionary are now well over 100 years old.

This electronic version is being maintained by World Soul, a non-profit
organization in Plainfield, NJ.  For additional information or if you are
willing to assist construction of this data source, contact:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Patrick J. Cassidy              | TEL:          (908) 561-3416
 World Soul                      | if no answer, (908) 668-5252
 735 Belvidere Ave.              | FAX:          (908) 668-5904
 Plainfield, NJ  07062-2054
 pc@worldsoul.org   or  cassidy@micra.com
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

GCIDE is free software; you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation; either version 3, or (at your option) any later version.

GCIDE is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
details.

You should have received a copy of the GNU General Public License along with
this copy of GCIDE; see the file COPYING.  If not, write to the Free
Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.

* STRUCTURE OF THE DICTIONARY

When the archive is unpacked, the main dictionary text of the GCIDE will be
found in 26 files named "CIDE.*", where the asterisk indicates which letter
of the alphabet begins the words in each file.  For example, file "CIDE.B"
contains words beginning with the letter "B".  Additional information about
the tagging conventions and special character symbols are contained in
ancillary files in this directory (see below the section entitled "ANCILLARY
FILES").  The main body of the 1913 dictionary was essentially identical to
the edition published in 1890, and was republished in 1913 with an appendix
containing "New Words".  The new words of that appendix have been integrated
into the main file in this version.  However, it is important to keep in
mind that the definitions in this dictionary are in most cases over 100
years old.  Use them with caution!

At the bottom of each paragraph in this dictionary, there is a bracketed and
tagged "source" indicated.  This tells from where the definition or other
text in that paragraph came, as follows:

[<source>1913 Webster</source>]
  =  From the original 1890 dictionary.
[<source>Webster 1913 Suppl.</source>]
  =  From the 1913 "New Words" supplement to the Webster.
[<source>WordNet 1.5</source>]
  =  From the WordNet on-line semantic network.
[<source>Century Dict. 1906.</source>]
  =  From the Century Dictionary published in 1906, especially from
          the "proper Names" supplement (volume IX).
                                     published
[<source>XXX</source>]
   = Added by one of the volunteers.

The original definitions have been tagged and in some cases reformatted or
slightly rearranged.  If substantive information is added from a second
source, usually the additional source is also noted, as in:

[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]

This version is tagged with SGML-like tags of the form <pos>...</pos> so
that the original typography (italics, bold, block quotes) can be
reproduced.  A list of the most important tags for fields in the dictionary
is given below.  The tags also serve the more important function of allowing
the information content to be conveniently imported into computer programs
or databases.  The set of tags used is described in the accompanying file
"tagset.txt".  ***NOTE*** the paragraph tags <p>...</p> do *not* always nest
properly with certain other tags, such as <note> and <cs> ("collocation
section"), which in some cases span multiple paragraphs.  If you are using a
tag parser which detects improper nesting, you should first either delete
the paragraph tags or convert them to non-tag symbols, or, if possible, set
the parser to ignore the <p>...</p> tags.

The unusual characters (such as Greek or the European accented characters,
as well as special characters used in the pronunciations) are described in
the accompanying file "webfont.txt".  Some information on the pronunciation
system used may be found by viewing the file "pronunc.jpg", and additional
explanations of pronunciation are in the file "pronunc.txt".

Each paragraph of the original text is enclosed within tags of the form <p>
. . . </p>.  Within these paragraphs there are no line breaks, and some of
the paragraphs are over 12,000 characters long, which may prove too long to
be handled by some editors.  At some points, embedded line breaks within a
"paragraph" are marked by a <br/ "entity".  The file can therefore be
converted, if necessary, to a form with shorter lines, and subsequently
reconverted back to the form having one line per paragraph.

If additional line breaks are added, then in order to remove the line breaks
and reconstruct the original paragraphs, so that the page width can be
adjusted, perform the following manipulations:

  (1) convert each line break to a space.
  (2) convert the string "</p>  " (</p> followed by two spaces)
     to </p> followed by two line breaks.
  (3) convert the string "<br/ " (<br/ followed by one space)
     to <br/ followed by one line break.
     
A more sophisticated formatting of spaces within paragraphs may require the
use of the fully-tagged master files.  If you have a need for these files,
contact Patrick Cassidy: cassidy@micra.com.

The approximate beginning of each page is marked by an SGML comment of the
form <-- p. 345 -->.  (The exact beginning was in some cases in the middle
of a paragraph, which we decided was not a good location for these
page-number comments, so the page number was usually moved to the next
paragraph break).  Pages which have been proofread by volunteers (e.g., with
initials VOL) will have a note within that page comment: <-- p. 345 pr=VOL
-->.  Pages which have not been proofread yet (most of them) will have
varying numbers of typographical errors in them.  We still (January 2012)
need proofreaders to get the errors out of these dictionary files.

** Warning

This version is only a first typing, and has numerous typographic errors,
including errors in the field-marks.  In addition, the user must keep in
mind that this text is very old and will contain numerous obsolete,
inaccurate, and perhaps offensive statements, which are included solely
because this work is intended to reproduce accurately this historically
interesting classic reference work.  This text should not be relied upon as
an accurate source of information, as in many cases it represents the state
of knowledge around 1890.  The text is provided "as is", and the user must
accept responsibility for all consequences of its use. Please refer to the
header of each file and the GNU public license.  If these conditions of use
are unacceptable, please do not use these texts.

This electronic dictionary is also made available as a potential starting
point for development of a modern comprehensive encyclopedic dictionary, to
be accessible freely on the internet, and developed by the efforts of all
individuals willing to help build a large and freely available knowledge
base.  A large number of collaborators are needed to bring this dictionary
to a more accurate, more modern, and more useful state. Anyone willing to
assist in any way in constructing such a knowledge base should contact
Patrick Cassidy (see above).  All reports of errors will be gratefully
received, and should also be transmitted to PC at: pc@worldsoul.org.

* TAGS

Most important tags used in the GCIDE:

<hw> tags the headword
<pr>          pronunciation
<pos>         part of speech
<ety>         etymology
<ets>         "source" word within an <ety> field, usually foreign words
<fld>         field of knowledge (e.g. Med. = medicine)
<def>         definition
<cs>          collocation section  (containing word combinations)
<col>         collocation entry (word combination)
<cd>          collocation definition
<as>          illustrations of usage (within a <def>. . . </def> field)
<au>          authority for a definition, or author of a quotation
<q>           illustrative quotation -- in block quote format
<au>          author of an illustrative <q> quotation
<altname>     alternative name for the headword -- essentially a synonym
<asp>         alternative spelling of the headword
<syn>         list of synonyms for the headword
<p>           paragraph
<b>           bold type
<it>          italic type

For other tags, see the file "tagset.txt"

* ANCILLARY FILES

In addition to the main text of the dictionary, additional explanatory
material about this version of the dictionary is available in the ancillary
files:

** COPYING

The license terms for distributing and modifying this dictionary.

** abbrevn.lst

List of the abbreviations used in the dictionary.

** authors.lst

List of authors whose works are quoted in the dictionary.

** pronunc.txt

Description of the special markup used in this dictionary to represent
pronunciations.

** pronunc.jpg

A copy of the dictionary page describing the pronunciation symbols used in
the original work.

** symbols.jpg

This file lists original pronunciation symbols with the corresponding markup
entities used in this version.

** tagset.txt

Description of the markup tags.

** titlepage.png

A copy of the original title page.

** webfont.txt

Description of the special escape sequences used in this dictionary.  This
file also explains the Greek transliteration syntax used in it.

* DICTIONARY LOOKUP

The GNU Dico project contains a browser for GCIDE, called "Gcider".  It
provides a windowing interface allowing user to search for matching
headwords using several match strategies and browse their definitions.

The package also provides a loadable module which allows to use GCIDE with
the dictionary server.

See http://www.gnu.org.ua/software/dico for more information, including
links to download and documentation.

* OTHER VERSIONS OF THE DICTIONARY

There are several other derivative versions of this dictionary on the
internet, in some cases reformatted or provided with an interface.  Those
that I am aware of are:

** Dicoweb 

This version of GCIDE is available online at the GNU Dico web site:

  http://dicoweb.gnu.org.ua/?db=gcide

The site provides extensive search facilities.  

** Project Gutenberg

In the extext96 directory of Project Gutenberg
(http://www.gutenberg.org/dirs/etext96), there is a version of the original
1913 dictionary, which is in the **public domain**.  The main files are
labeled pgw050*.*.  The tags for that version are a subset of those used in
this GNU version.

** The DICT development group

This group has created a program to index and search this dictionary.  The
program can be downloaded and used locally, but at present is available only
in a Unix-compatible executable version.  See their web site at
http://www.dict.org.

** The University of Chicago ARTFL project

Mark Olsen and Gavin LaRowe at the University of Chicago have converted the
original 1913 dictionary to HTML and have provided an interface allowing
search of the headwords.  When the supplemented version has developed
sufficiently to warrant the effort, a similar searchable version may be
posted there as well.  The search page is at:

  http://humanities.uchicago.edu/forms_unrest/webster.form.html

That page will provide links to other ARTFL projects and contact information
for the ARTFL group, who alone can provide information about the HTML
version or interface.



Local Variables:
mode: outline
paragraph-separate: "[ 	]*$"
version-control: never
fill-column: 76
End:

Return to:

Send suggestions and report system problems to the System administrator.