From 3d4fbac289846464491104b01bebe554da6758da Mon Sep 17 00:00:00 2001
From: Sergey Poznyakoff ... ... . . .
" (
illustrative quotation -- in block quote format +author of an illustrative quotation +alternative name for the headword -- essentially a synonym + alternative spelling of the headword + list of synonyms for the headword + paragraph + bold type +
followed by two spaces) - to followed by two line breaks (cr-lf combinations) - (3) convert the string "italic type
+
+For other tags, see the file "tagset.txt"
+
+* ANCILLARY FILES
+
+In addition to the main text of the dictionary, additional explanatory
+material about this version of the dictionary is available in the
+ancillary files:
+
+** COPYING
+
+The license terms for distributing and modifying this dictionary.
+
+** abbrevn.lst
+
+List of the abbreviations used in the dictionary.
+
+** authors.lst
+
+List of authors whose works are quoted in the dictionary.
+
+** pronunc.txt
+
+Description of the special markup used in this dictionary to represent
+pronunciations.
+
+** pronunc.jpg
+
+A copy of the dictionary page describing the pronunciation symbols used
+in the original work.
+
+** symbols.jpg
+
+This file lists original pronunciation symbols with the corresponding
+markup entities used in this version.
+
+** tagset.txt
+
+Description of the markup tags.
+
+** titlepage.png
+
+A copy of the original title page.
+
+** webfont.txt
+
+Description of the special escape sequences used in this dictionary.
+This file also explains the Greek transliteration syntax used in it.
+
+* DICTIONARY LOOKUP
+===================
+The GNU Dico project contains a module for reading GCIDE files. This
+distribution provides a configuration file "gcide.conf" which you can
+use with the "dicod" server in order to look up words in the
+dictionary. See http://www.gnu.org.ua/software/dico for a description
+of GNU Dico, including links to download.
+
+The instructions below describe how to configure GNU Dico server
+(dicod) to access a copy of the GCIDE dictionary.
+
+1. Unpack the GCIDE dictionary;
+2. Copy the file "gcide.conf" to a directory where you keep your local
+configuration files (/etc or /usr/local/etc are usual choices).
+3. Replace the word GCIDE_PATH in the "gcide.conf" statement with the
+path to the gcide-0.51 directory.
You can omit this step and use the
+-D option instead:
+4. Check the configuration file. Run:
+ dicod --config /path/to/gcide.conf --lint
+If you skipped step 3, supply the -D option with the actual path to
+the dictionary. For example, if you copied "gcide.conf" to /etc and
+unpacked GCIDE to /usr/local, then run:
+ dicod --config /etc/gcide.conf -D GCIDE_PATH=/usr/local --lint
+If no errors are reported, then go to step 5.
+
+5. Start "dicod". Run the same command as described in step 4, but
+without the "--lint" option. This will start the dictionary server
+which will be available on localhost (127.0.0.1) port 2628. The
+server provides extensive searching facilities. It also parses the
+GCIDE markup and automatically reformats the articles before returning
+them.
+
+Now you can access the dictionary using dico (a GNU dictionary command
+line utility), or another dictionary client program (such as Kdict or
+the like).
+
+* OTHER VERSIONS OF THE DICTIONARY
+==================================
+There are several other derivative versions of this dictionary on the
+internet, in some cases reformatted or provided with an interface.
+Those that I am aware of are:
+
+** Dicoweb
+----------
+This version of GCIDE is available online at the GNU Dico web
+site:
+
+ http://dicoweb.gnu.org.ua/?db=gcide
+
+The site provides extensive search facilities.
+
+** Project Gutenberg
+---------------------
+In the etext96 directory of Project Gutenberg
+(http://www.gutenberg.org/dirs/etext96), there is a version of the
+original 1913 dictionary, which is in the **public domain**. The main
+files are labeled pgw050*.*. The tags for that version are a subset
+of those used in this GNU version.
+
+** The DICT development group
+------------------------------
+This group has created a program to index and search this dictionary.
+The program can be downloaded and used locally, but at present is
+available only in a Unix-compatible executable version.
See their web +site at http://www.dict.org. + +** The University of Chicago ARTFL project +------------------------------------------ +Mark Olsen and Gavin LaRowe at the University of Chicago have +converted the original 1913 dictionary to HTML and have provided an +interface allowing search of the headwords. When the supplemented +version has developed sufficiently to warrant the effort, a similar +searchable version may be posted there as well. The search page is at: + + http://humanities.uchicago.edu/forms_unrest/webster.form.html + +That page will provide links to other ARTFL projects and contact +information for the ARTFL group, who alone can provide information +about the HTML version or interface. + + + +Local Variables: +mode: outline +paragraph-separate: "[ ]*$" +version-control: never +End: + diff --git a/README.DIC b/README.DIC deleted file mode 100644 index 6a7ea1c..0000000 --- a/README.DIC +++ /dev/null @@ -1,268 +0,0 @@ -File README.DIC - To accompany the GNU version of the set of files (cide.*) containing - the electronic version of the - Collaborative International Dictionary of English. - (called also GCIDE) - These files contain Version 0.51 (January 2012) - * * * * * * * * * * * * * * * * * * * * * * * * * * * * - -The dictionary was derived from the - Webster's Revised Unabridged Dictionary - Version published 1913 - by the C. & G. Merriam Co. - Springfield, Mass. - Under the direction of - Noah Porter, D.D., LL.D. - -and has been supplemented with some of the definitions from - WordNet, a semantic network created by - the Cognitive Science Department - of Princeton University - under the direction of - Prof. George Miller - -and is being proof-read and supplemented by volunteers from -around the world. This is an unfunded project, and future -enhancement of this dictionary will depend on the efforts of -volunteers willing to help build this free resource into a -comprehensive body of general information. 
New definitions -for missing words or words senses and longer explanatory notes, -as well as images to accompany the articles are needed. More -modern illustrative quotations giving recent examples of -usage of the words in their various senses will be very -helpful, since most quotations in the original 1913 dictionary -are now well over 100 years old. - - This electronic version is being maintained by World Soul, -a non-profit organization in Plainfield, NJ. For additional -information or if you are willing to assist construction of this -data source, contact: - -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= - Patrick J. Cassidy | TEL: (908) 561-3416 - World Soul | if no answer, (908) 668-5252 - 735 Belvidere Ave. | FAX: (908) 668-5904 - Plainfield, NJ 07062-2054 - pc@worldsoul.org or cassidy@micra.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= - - * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * - -GCIDE is free software; you can redistribute it and/or modify -it under the terms of the GNU General Public License as published by -the Free Software Foundation; either version 2, or (at your option) -any later version. - -GCIDE is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU General Public License for more details. - -You should have received a copy of the GNU General Public License -along with this copy of GCIDE; see the file COPYING. If not, write -to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, -Boston, MA 02111-1307, USA. - * * * * * * * * * * * * * * * * * * * * * - -STRUCTURE OF THE DICTIONARY ---------------------------- - When the archives are unpacked, the main dictionary text of -the GCIDE will be found in 26 files named "CIDE.*", where the -asterisk indicates which letter of the alphabet begins the -words in each file. 
For example, file "CIDE.B" contains words -beginning with the letter "B". Additional information about the -tagging conventions and special character symbols are contained in -ancillary files in this directory more information below). The main -body of the 1913 dictionary was essentially identical to the edition -published in 1890, and was republished in 1913 with an appendix -containing "New Words". The new words of that appendix have been -integrated into the main file in this version. However, it is important -to keep in mind that the definitions in this dictionary are in most -cases over 100 years old. Use them with caution! - At the bottom of each paragraph in this dictionary, there is a -bracketed and tagged "source" indicated. This tells from where the -definition or other text in that paragraph came, as follows: - -[] - = From the original 1890 dictionary. -[] - = From the 1913 "New Words" supplement to the Webster. -[] - = From the WordNet on-line semantic network. -[] - = From the Century Dictionary published in 1906, especially from - the "proper Names" supplement (volume IX). - published -[] - = Added by one of the volunteers. - - The original definitions have been tagged and in some cases -reformatted or slightly rearranged. If substantive information -is added from a second source, usually the additional source is -also noted, as in: -[ + ] - - A list of the ancillary files related to the GCIDE is appended at -the bottom of this "README.DIC" file. - This version is tagged with SGML-like tags of the form ... -so that the original typography (italics, bold, block quotes) can be -reproduced. A list of the most important tags for fields in the -dictionary is given below. The tags also serve the more important -function of allowing the information content to be conveniently imported -into computer programs or databases. The set of tags used is described -in the accompanying file "tagset.web". ***NOTE*** the paragraph tags -...
do *not* always nest properly with certain other tags, such -asand ("collocation section"), which in some cases span -multiple paragraphs. If you are using a tag parser which detects -improper nesting, you should first either delete the paragraph -tags or convert them to non-tag symbols, or, if possible, set the -parser to ignore the ...
tags. - The unusual characters (such as Greek or the European accented -characters, as well as special characters used in the pronunciations) -are described in the accompanying file "WEBFONT.ASC". Some information -on the pronunciation system used may be found by viewing the files -"WXXVII.JPG" and "PRONUNC.JPG" with a GIF viewer (or any web browser), -and additional explanations of pronunciation are in the file -"PRONUNC.WEB". - Each paragraph of the original text is enclosed within tags of -the form. . .
. Within these paragraphs are no line -breaks, and some of the paragraphs are over 12,000 characters long. -These lines are too long to be handled by the vi editor, and probably -by some other text editors. At some points, embedded line breaks within -a "paragraph" are marked by a
" (
. (The exact beginning was in some -cases in the middle of a paragraph, which we decided was not a -good location for these page-number comments, so the page number -was usually moved to the next paragraph break). Pages which have -been proofread by volunteers (e.g., with initials VOL) will have a -note within that page comment: <-- p. 345 pr=VOL -->. Pages which have -not been proofread yet (most of them) will have varying numbers of -typographical errors in them. We still (January 2012) need -proofreaders to get the errors out of these dictionary files. - -*********************************************************************** -** WARNING!!! ** -*********************************************************************** - - This version is only a first typing, and has numerous typographic -errors, including errors in the field-marks. In addition, the user must -keep in mind that this text is very old and will contain numerous -obsolete, inaccurate, and perhaps offensive statements, which are -included solely because this work is intended to reproduce accurately -this historically interesting classic reference work. This text should -not be relied upon as an accurate source of information, as in many -cases it represents the state of knowledge around 1890. The text is -provided "as is", and the user must accept responsibility for all -consequences of its use. Please refer to the header of each file and -the GNU public license. If these conditions of use are unacceptable, -please do not use these texts. -************************************************************************ -************************************************************************ - This electronic dictionary is also made available as a potential -starting point for development of a modern comprehensive encyclopedic -dictionary, to be accessible freely on the internet, and developed by the -efforts of all individuals willing to help build a large and freely -available knowledge base. 
A large number of collaborators are needed to -bring this dictionary to a more accurate, more modern, and more useful -state. Anyone willing to assist in any way in constructing such a -knowledge base should contact Patrick Cassidy (see above). All reports -of errors will be gratefully received, and should also be transmitted to -PC at: pc@worldsoul.org. - -In addition to the main text of the dictionary, additional -explanatory material about this version of the dictionary is available -in the ancillary files: - -===================================================================== --rw-r--r-- 1 18021 2012-01-30 00:24 COPYING --rw-r--r-- 1 2569796 2000-06-18 15:11 PRONUNC.JPG --rw-r--r-- 1 13994 2012-01-30 00:24 PRONUNC.WEB --rw-r--r-- 1 13507 2012-01-30 00:27 README.DIC --rw-r--r-- 1 144716 2000-06-18 15:13 SYMBOLS.JPG --rw-r--r-- 1 54783 2012-01-30 00:24 TAGSET.WEB --rw-r--r-- 1 34631 2012-01-30 00:24 WEBFONT.ASC --rw-r--r-- 1 1188380 2000-06-18 15:19 WXXVII.JPG -===================================================================== - - -Most important tags used in the GCIDE: -tags the headword - pronunciation - part of speech - etymology - "source" word within an field, usually foreign words - field of knowledge (e.g. Med. = medicine) - definition - collocation section (containing word combinations) - collocation entry (word combination) - collocation definition - illustrations of usage (within a . . . field) -authority for a definition, or author of a quotation - illustrative quotation -- in block quote format -author of an illustrative quotation -alternative name for the headword -- essentially a synonym - alternative spelling of the headword - list of synonyms for the headword - paragraph - bold type -
. - The absence of an end-field tag, or the presence of an end-field tag -without a prior begin-field tag constitutes a typographical error, of which -there may be a significant number. Any errors detected should be brought -to the attention of PJC or the appropriate editor. - Most of the tagged fields are presented in the text in italic type, -with a number of exceptions. Where a word is contained within more than -one field, the innermost field determines the font to be used. Wherever -recognizable functional fields were found, an attempt was made to tag the -field with a functional mark, but in many cases, words were italicised only -to represent the word itself as a discourse entity, and in some such cases, -the "italic" markitalic type - -For other tags, see the file "tagset.web" - - -============================================================ - OTHER VERSIONS OF THE DICTIONARY -============================================================= - - There are several other derivative versions of this dictionary -on the internet, in some cases reformatted or provided with an -interface. Those that I am aware of are: - -(1) Project Gutenberg ---------------------- - In the extext96 directory of Project Gutenberg (www.prairienet.org) -there is a version of the original 1913 dictionary, which is in -the **public domain**. The main files are in the directory etext96, -and sre labeled pgw050**.***. The tags for that version are a subset -of those used in this GNU version. - -(2) The DICT development group ------------------------------- -This group has created a program to index and search this dictionary. -The program can be downloaded and used locally, but at present -is available only in a Unix-compatible executable version. -See their web site at http://www.dict.org. 
- -(3) The University of Chicago ARTFL project ---------------------------------------------- -Mark Olsen and Gavin LaRowe at the University of Chicago have -converted the original 1913 dictionary to HTML and have provided an -interface allowing search of the headwords. When the supplemented -version has developed sufficiently to warrant the effort, a -similar searchable version may be posted there as well. The -search page is at: - http://humanities.uchicago.edu/forms_unrest/webster.form.html - -That page will provide links to other ARTFL projects and contact -information for the ARTFL group, who alone can provide information -about the HTML version or interface. - - - -- PJC diff --git a/SYMBOLS.JPG b/SYMBOLS.JPG deleted file mode 100644 index aa31caa..0000000 Binary files a/SYMBOLS.JPG and /dev/null differ diff --git a/TAGSET.WEB b/TAGSET.WEB deleted file mode 100644 index 1409569..0000000 --- a/TAGSET.WEB +++ /dev/null @@ -1,1060 +0,0 @@ - FIELD MARKS FOR WEBSTER 1913 and CIDE - ===================================== -Tagset.web: - Explanations of the tags used to mark the Webster 1913 dictionary -and the CIDE (Collaborative International Dictionary of English). -Note that the list of tags used to mark the public domain version -of this dictionary is shorter than the full set described here. - If any tag is not listed here, it is either (1) one of the -"point" (font size) or "type" (font style) tags, which should be self-explanatory; or - (2) Is a functional field with no effect on the typography. - -Last modified March 12, 1999. - For questions, contact: - Patrick Cassidy cassidy@micra.com - 735 Belvidere Ave. - Plainfield, NJ 07062 - (908) 561-3416 or (908) 668-5252 -------------------------------------------------------------- -A separate file, webfont.asc, contains the list of the individual -non-ASCII characters represented by either higher-order hexadecimal -character marks (e.g., \'94, for o-umlaut) or by entity tags -(e.g., . 
- - Note: The tags on this list are similar in structure to SGML tags. Each -tag on this list marks a field; each field opens with a tagname between -angle brackets thus: , and closes with a similar tag containing -the forward slash thus: . No tags are used without closing -tags. Thus the HTML
to indicate a line break is symbolized -here as an entity,
has a correspondingwas used, implying nothing regarding functionality -of the word. The base font is considered "plain". Where an italic field -is indicated, parentheses or brackets within the field are not italicised. - Where no font is specified for a tag, the tag is merely a functional -division, and was printed in plain font unless otherwise tagged. This type -of segment is marked by an asterisk (*) where the font name would be. - The size of the "plain" font in the original text is about 1.6 mm for -the height of capitalized letters. -============================================================= -Explicit typographical tags: - These were used where the purpose of a different font was merely to -distinguish a word from the body of the text, and no explicit functional -tag seemed apropriate. ------------------------------------ -Tag Font ------------------------------------ -Explicit formatting tags: -. . . . . . . . . . . . . . . . . . - plain font (that used in the body of a definition) -- - normally not marked, except within fields of - a different front. - italic (in master files) - italic (for use in HTML presentation) - bold (in master files) - bold (for use in HTML presentation) - bold, Collocation font. Same font as used in collocations. - smaller This is used only in the list of "un-" words not - by 1 point actually defined in the dictionary. Probably could be - replaced by a segment mark for the entire list! - The "un-" words should be indexed as headwords. - - bold Same as , a font similar to that used in - collocations. However, this tag is used in a table - and could be set to a different font. - - * HTML tag -- largest heading font. - -
* HTML tag -- second largest heading font. - -
* Marks a Row title in a table. - - Font the same as the headword , though the field is - not a headword. Used only once. - - * Multiple items, a set of items in a table. - A series of point size markers, many unique. - * One of the tags of the form where ** - represents the typographic point size of the - enclosed text. - An HTML tag indicating that the enclosed text is - of teletype form, preformatted in a uniform-spaced - font. -small caps (used mostly for "a. d.", "b. c.") - This is the same font a , but has no functional - or semantic significance - group of table data elements in a table - subscript, like - subscript - superscript - superscript - Sans-serif font - Bold (collocation font) and also a subtype. - HTML tage -- teletype font - A squared bold font without serifs approximating the - "universe bold" font on the HP Laserjet4, slightly - larger than the capitals in a definition body. Used - in expositions describing shapes, such as - "Y", "T", "U", "X", "V", "F". - Vertically organized column. - Vertically organized column -- only part of a table - which needs to be completed. Used once. -<...type> A series of tags, many unique, designating certain - unusual fonts, such as "bourgeoistype" for - "bourgeois type", in the section on typography. - Most of these occur only once, in the section on fonts. - - - - - - - - - - - - - - - - - - - - - - - -============================================================= -Tags with semantic content: -. . . . . . . . . . . . . . . . . . . . . . . . . . . - * Alternative spelling segment. Almost always - contained within square brackets after the main - definition segment. Expository words - such as "Spelled also" are in plain font; - the actual alternative spelling is marked by - ... tags within this segment. - -italic Antonym. - - italic Alternative spelling. The actual word which is an - alternative spelling to the headword. These - are functionally synonyms of the headword. 
In - most cases these also occur as headwords, with - reference to the word where the actual definition - is found, but not all such words are listed - separately, particularly if the spelling is - close enough to the headword to be found at the - same point in the dictionary. Whether listed - separately or not, these words should - be indexed at this location, also. - - italic Authority or author. Used where an authority is - (may be right- given for a definition, and also used for the - justified. See author, where a quotation within double quotes - in the section is given in the same paragraph as the - on formatting). definition. The double quotes are indicated - by the open-quote (\'bd) and close-quote - (\'b8). In both cases, it is typically - right-justified, almost always fitting on - the same line with the last line of the - definition or quotation. - Within collocation segments, it is usually - used only after quotations, and is not right- - justified, except occasionally where it - would be close to the right margin, and then - apparently is is right-justified. We have - not explicitly marked those which are - right-justified, but they can be - recognized because they are on a line by - themselves, preceded by two carriage returns. - - * Marks a biography. Should be longer than - a short mention of who a person was, which - is typically included as a definition. - - * Same as - - italic Marks the name of a book, pamphlet, or similar - document. - - * A field of knowledge which of which the headword - is a division. - - * Caption of a figure or table. - - * tags the CAS (Chemical Abstracts Service) registry - number for a chemical substance. - - italic tags the infectious disease caused by the headword. - Implied type of the agent is a microorganism, and - the tag must mark a disease. - - * Same as without the italic type. - * Same as without the italic type. 
- - italic inverse of causes: tags the causative agent of an - infectious disease, which is the headword . - the tag must mark a microorganism, virus, or - prion, and the implied type of the headword is - a disease. - - Used only for The single letter in the headers to each - letter of the alphabet. - - * marks the proper name of a city. Used only - occasionally and not consistently at this stage. - - italic Converted to: used to tag substances which are - products prepared by conversion from the - headword. Usually chemicals or complex - products from mnatuarl materials. Rarely used - up to 1998. - - * List of heads for the columns of a table. - - * Title of a column in a table. - - * Comment -- differs from in being in-line with - the definition paragraph. Provides a little - additional information. - - * Name of a company (commercial firm). Compare - - italic Composed of. Tags a substance of which the - headword is at least partly composed. The - substance may be particulate, such as - diatoms composing diatomaceous earth. - - * marks an object contained within the headword. - - italic Contrasting word. Not exactly an antonym, which - is marked , but a contrasting word which is - often introduced as "opposite to" or "contrasts - with". - - * Name of a country (nation) of the world. - - italic Collocation reference. A reference to a collocation. - Each such collocation should have its own entry, - marked by ... tags, and these - references should function as hypertext buttons - to access that entry. - - * A Date, of any type, e.g. Dec. 25 . - -* Date-with-year tags a date containing a year. - - * definition. The definition may have subfields, - particularly (an illustrative phrase - starting with "as" or "thus" and containing - the headword (or a morphological derivative). - The , \'bd...\'b8 quotations (left and - right double quotes) and fields may be - found within a definition field, but should - and usually are located outside the definition - proper. 
The marking macro was - inconsistent in this placement, and the - exclusion of the , and quotations - needs to be completed by the proof-readers. - Certain definitions contain - fields within them, where the headword is - an irregular derivative of another headword. - In these cases, the field follows - immediately after the tag, and these - entries do not have a separate field. - In such cases, the field is italic, as - usual. - - * Division of the headword, usually an organization. - E. g. a faculty or department of a university, - or a United Nations agency. - - * Marks an education institution, a subtype of - organization. - - * tags a physical object or form of radiation - emitted by the headword - - Just a place-holder for illustrations, but seldom used. - -italic Marks the name of a movie film. - - italic Field of specialization. Most often used for - Zoology and Botany, but many "fields of - specialization" are marked for technical - terms. The parentheses are usually within this - field, but are not themselves in italics. - - * Name of a geograpahical region of any size; - if applicable, the more specific , - , or are preferred. - - * Hyperym. Points to the hypernym from WordNet 1.5 - Initially, used only for entries extracted - from WordNet 1.5. Not present in the original - 1913 version. - - * Illustrative usage -- mostly from WordNet, and placed - outside the definition, in contrast to usage. - These should be converted to ... illustrative - usage format for consistency. - -* Illustration place-holder. Seldom used. - * HTML usage -- points to an image file, usually - .gif or .jpg. These have no closing tag, and - will appear as errors in parsing. - * Points to a word whose meaning is an intensified - form of the headword. Taken from WordNet - tags, used with some adjectives from WordNet - - * Designates one item in a row of a table. Used only when - intervening spaces do not serve properly as natural - field separaters. -
italic Translation into a foreign (non-English) language - of the previous word in the text -- italic font. - ( is a translation into English) - italic Same as - * Title of a journal (periodical). - * Always a filled rectangular array. - * A 2x5 matrix (2 rows by 5 columns). - * Multiple synonymous subtypes -- used in - def. of "grass". - * Multiple table, encloses figures. -
* Music figure. Only in a note under the entry "Figure", - the two numbers of each such field - are bold, 20 point type, stacked as in a fraction with - a bar between them, but also having a hori