author | Sergey Poznyakoff <gray@gnu.org.ua> | 2012-01-29 22:25:18 (GMT) |
---|---|---|
committer | Sergey Poznyakoff <gray@gnu.org.ua> | 2012-01-29 22:25:18 (GMT) |
commit | 96d405c0c71882883e63a2fb19baa8d4017a698f | |
tree | 3b0ed7d61b1d04747c03622cea63eaeaf11e8ee7 | |
parent | f942c67a2d47f609962f43182f60028f72673726 | |
Fix line terminations, remove GNUCIDE.DIR
-rw-r--r-- | COPYING | 680 |
-rw-r--r-- | GNUCIDE.DIR | 36 |
-rw-r--r-- | PRONUNC.WEB | 636 |
-rw-r--r-- | README.DIC | 536 |
-rw-r--r-- | TAGSET.WEB | 2120 |
-rw-r--r-- | WEBFONT.ASC | 1206 |
6 files changed, 2589 insertions, 2625 deletions
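The COPYING hunk below removes and re-adds the same 340 lines with identical text, which is what a terminator-only change looks like in a line-based diff. A minimal sketch of why that happens, assuming the fix converted CRLF terminators to LF (the commit message does not say which convention was replaced); the function names `normalize_line_endings` and `count_changed_lines` are hypothetical and are not part of this repository:

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF (and lone CR) line terminators to LF.

    Assumption: the original files used CRLF; the commit itself
    does not state which terminator was replaced.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def count_changed_lines(old: bytes, new: bytes) -> int:
    """Count line pairs that differ when terminators are compared too."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    return sum(1 for a, b in zip(old_lines, new_lines) if a != b)


# Two lines taken from the top of COPYING, with assumed CRLF terminators.
sample = b"GNU GENERAL PUBLIC LICENSE\r\nVersion 2, June 1991\r\n"
fixed = normalize_line_endings(sample)

# Every line differs byte-for-byte, although the visible text is unchanged.
changed = count_changed_lines(sample, fixed)
same_text = sample.splitlines() == fixed.splitlines()
```

Because a diff compares whole lines including their terminators, each converted line is reported once as a deletion and once as an insertion, even though the visible text is the same.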
@@ -1,340 +1,340 @@
- GNU GENERAL PUBLIC LICENSE
- Version 2, June 1991
-
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
- 675 Mass Ave, Cambridge, MA 02139, USA
- 617-542-5942
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
- Preamble
-
- The licenses for most software are designed to take away your
-freedom to share and change it. By contrast, the GNU General Public
-License is intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
-
- We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at all.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
- GNU GENERAL PUBLIC LICENSE
- TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
- 0. This License applies to any program or other work which contains
-a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The "Program", below,
-refers to any such program or work, and a "work based on the Program"
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term "modification".) Each licensee is addressed as "you".
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the
-Program (independent of having been made by running the Program).
-Whether that is true depends on what the Program does.
-
- 1. You may copy and distribute verbatim copies of the Program's
-source code as you receive it, in any medium, provided that you
-conspicuously and appropriately publish on each copy an appropriate
-copyright notice and disclaimer of warranty; keep intact all the
-notices that refer to this License and to the absence of any warranty;
-and give any other recipients of the Program a copy of this License
-along with the Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a fee.
-
- 2. You may modify your copy or copies of the Program or any portion
-of it, thus forming a work based on the Program, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
- a) You must cause the modified files to carry prominent notices
- stating that you changed the files and the date of any change.
-
- b) You must cause any work that you distribute or publish, that in
- whole or in part contains or is derived from the Program or any
- part thereof, to be licensed as a whole at no charge to all third
- parties under the terms of this License.
-
- c) If the modified program normally reads commands interactively
- when run, you must cause it, when started running for such
- interactive use in the most ordinary way, to print or display an
- announcement including an appropriate copyright notice and a
- notice that there is no warranty (or else, saying that you provide
- a warranty) and that users may redistribute the program under
- these conditions, and telling the user how to view a copy of this
- License. (Exception: if the Program itself is interactive but
- does not normally print such an announcement, your work based on
- the Program is not required to print an announcement.)
-
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
- 3. You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
- a) Accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of Sections
- 1 and 2 above on a medium customarily used for software interchange; or,
-
- b) Accompany it with a written offer, valid for at least three
- years, to give any third party, for a charge no more than your
- cost of physically performing source distribution, a complete
- machine-readable copy of the corresponding source code, to be
- distributed under the terms of Sections 1 and 2 above on a medium
- customarily used for software interchange; or,
-
- c) Accompany it with the information you received as to the offer
- to distribute corresponding source code. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form with such
- an offer, in accord with Subsection b above.)
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
- 4. You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt
-otherwise to copy, modify, sublicense or distribute the Program is
-void, and will automatically terminate your rights under this License.
-However, parties who have received copies, or rights, from you under
-this License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
- 5. You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
- 6. Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties to
-this License.
-
- 7. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
- 8. If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
- 9. The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and "any
-later version", you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-this License, you may choose any version ever published by the Free Software
-Foundation.
-
- 10. If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
- NO WARRANTY
-
- 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
- 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
-OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
-TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
-YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
-PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGES.
-
- END OF TERMS AND CONDITIONS
-
- Appendix: How to Apply These Terms to Your New Programs
-
- If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least
-the "copyright" line and a pointer to where the full notice is found.
-
- <one line to give the program's name and a brief idea of what it does.>
- Copyright (C) 19yy <name of author>
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
- Gnomovision version 69, Copyright (C) 19yy name of author
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
-The hypothetical commands `show w' and `show c' should show the appropriate
-parts of the General Public License. Of course, the commands you use may
-be called something other than `show w' and `show c'; they could even be
-mouse-clicks or menu items--whatever suits your program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a "copyright disclaimer" for the program, if
-necessary. Here is a sample; alter the names:
-
- Yoyodyne, Inc., hereby disclaims all copyright interest in the program
- `Gnomovision' (which makes passes at compilers) written by James Hacker.
-
- <signature of Ty Coon>, 1 April 1989
- Ty Coon, President of Vice
-
-This General Public License does not permit incorporating your program into
-proprietary programs. If your program is a subroutine library, you may
-consider it more useful to permit linking proprietary applications with the
-library. If this is what you want to do, use the GNU Library General
-Public License instead of this License.
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 675 Mass Ave, Cambridge, MA 02139, USA
+ 617-542-5942
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ Appendix: How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) 19yy <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) 19yy name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+ + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Library General +Public License instead of this License. diff --git a/GNUCIDE.DIR b/GNUCIDE.DIR deleted file mode 100644 index cc416a7..0000000 --- a/GNUCIDE.DIR +++ b/dev/null @@ -1,36 +0,0 @@ -CIDE A 3,680,387 04-10-02 9:29a CIDE.A
-CIDE B 3,154,243 04-10-02 9:32a CIDE.B
-CIDE C 5,525,332 04-10-02 9:33a CIDE.C
-CIDE D 3,370,374 04-10-02 9:35a cide.d
-CIDE E 2,289,630 01-17-02 10:58p CIDE.E
-CIDE F 2,453,360 01-17-02 11:00p CIDE.F
-CIDE G 1,795,200 04-10-02 9:37a CIDE.G
-CIDE H 2,086,911 01-17-02 11:04p CIDE.H
-CIDE I 2,390,954 01-17-02 11:06p CIDE.I
-CIDE J 497,623 01-17-02 11:07p CIDE.J
-CIDE K 460,759 01-17-02 11:08p CIDE.K
-CIDE L 2,001,288 01-17-02 11:10p CIDE.L
-CIDE M 2,977,038 01-17-02 11:12p CIDE.M
-CIDE N 1,054,928 01-17-02 11:16p CIDE.N
-CIDE O 1,404,418 01-17-02 11:18p CIDE.O
-CIDE P 4,645,196 04-10-02 9:41a CIDE.P
-CIDE Q 312,451 04-10-02 9:43a CIDE.Q
-CIDE R 2,673,840 01-17-02 11:25p CIDE.R
-CIDE S 6,331,172 04-10-02 9:46a CIDE.S
-CIDE T 2,985,967 01-17-02 11:30p CIDE.T
-CIDE U 963,375 01-17-02 11:32p CIDE.U
-CIDE V 962,468 01-17-02 11:33p CIDE.V
-CIDE W 1,569,184 01-17-02 11:35p CIDE.W
-CIDE X 48,494 01-17-02 11:36p CIDE.X
-CIDE Y 182,838 01-17-02 11:38p CIDE.Y
-CIDE Z 134,301 01-17-02 11:41p CIDE.Z
-README DIC 13,775 01-17-02 11:55p readme.dic
-GNUCIDE DIR 1,899 04-10-02 9:58a GNUCIDE.DIR
-PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG
-PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB
-SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG
-TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB
-WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC
-WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG
-COPYING 18,361 02-11-02 4:02p COPYING
- 35 file(s) 59,994,047 bytes
diff --git a/PRONUNC.WEB b/PRONUNC.WEB index 39ed073..325f8ce 100644 --- a/PRONUNC.WEB +++ b/PRONUNC.WEB @@ -1,318 +1,318 @@ -file PRONUNC.WEB
-================
- This file gives a number of examples of pronunciation,
-using the entity symbols representing the pronunciations as
-found in the 1913 Webster unabridged dictionary. Not all
-vowel sounds are given here, but the examples should allow one
-to recognize the characters and recall the symbols used to
-represent them. The set of symbols used for pronunciation
-is different from that used in most modern dictionaries,
-but a more worrisome problem is that the pronunciations themselves
-seem in many cases to differ from modern usage. The places of
-the strong and weak accent are, however, in every case
-examined the same as in modern dictionaries. Anyone who is
-willing to work at revising the pronunciations to reflect modern
-usage or modern symbols should contact PJC.
-
-
- Pronunciations in the 1913 Webster ASCII version
- =================================================
-
-Syllables:
-----------------
- in pronunciations, the short hyphen used in the printed version as a
-syllable-break is represented in the ASCII version by an asterisk (*).
- the main (heavy) accent is represented by a double-quote (").
- the secondary (light) accent is represented by a left-single-quote
-(grave accent) (`)
- the hyphen in hyphenated words is represented by the ASCII hyphen (-).
- where an accent occurs, no other syllable break is used.
- sometimes a hyphen occurs after an accent.
- ------------------------------------------------
-
-Consonants:
- Most consonants have their normal value in the pronunciations,
-but there are a few special characters, as the n-submacron and the
-"th" ligature. See the end of the "special characters" section.
-
-Special characters:
---------------------
- The special characters are represented by two different sets of
-symbols: (1) the RTF-format hexadecimal codes such as \'94 for
-o-umlaut, meaning that the byte code is hexadecimal 94. These
-are used only for those symbols which have been designed into a
-special font set for this dictionary. The font set can only be used
-in a DOS system; or
-(2) an "entity" symbol using "<" and "/" as opening and closing
-delimiters, with a mnemonic string between. In the case of o-umlaut
-the symbol is <oum/. For the vowels, the system is consistent,
-thus <aum/ is a-umlaut, and <ium/ is i-umlaut, etc.
- These delimiters are used in preference to the HTML-style
-(e.g. &auml;) delimiters because of the heavy use of ampersands in
-the dictionary, to minimize file length. For the same reason,
-the codes within the delimiters are generally shorter than the
-corresponding ISO 8879 codes ( <aum/ rather than &auml; ).
- For this discussion, I will use the "entity" coding. The
-equivalent hexadecimal codes, where they exist, will be found in
-the tables in the file "webfont.asc".
-
- The pronunciation system of the 1913 Webster has three peculiarities
-relative to systems used in recent dictionaries.
-(1) a more complex set of symbols is used. This is evident, for
- example, where the long vowels have different symbols whether
- they are used in stressed or unstressed syllables. Thus
- long a in "acre" or "chaos" is represented as a-macron (<amac/ in
- our notation). But in "chaotic" or "connate" or "comate" it is
- represented as a symbol looking like a-macron, but with a short
- ascender in the middle of the macron above the a. This is denoted
- <asl/ ("a semilong") in our notation.
-
- Also, some sounds have more than one symbol. Thus, there are several
- symbols using "y" with a diacritical mark above, representing
- identical sounds using "i" or "e", but used in those cases where the
- written word has a "y" in it. So words ending in "y" with
- pronunciations like the unaccented long "e" usually have
- a y-breve (<ycr/) in the pronunciation. Why? Apparently,
- just to look more like the spelling. In these cases its
- meaning is unambiguous.
-
-(2) The indicated pronunciations themselves are in some cases
- different from what one would find in a modern dictionary.
- In part this is due to differences among orthoepists with
- different notions of how a word should sound, and possibly
- it is due to differences in the pronunciation between 1890,
- when British pronunciations may have had more influence, and
- the present. Thus we see that words ending in -"ties"
- are given the pronunciation "-t<icr/z", which sounds
- like "tizz", whereas I have always heard such words pronounced
- with a long "e", as in "teez" (and most modern dictionaries
- give it the long-e pronunciation). In Webster's 10th collegiate,
- they mention that unstressed long e may be pronounced as i in
- southern British or southern US dialects, and perhaps it
- was more common in the US in 1890. The <icr/ is an unreliable
- indicator of modern standard American pronunciation. A long-e
- pronunciation on the antepenult is also sometimes given an
- <icr/ symbol in this dictionary.
-
-(3) The indefinite value, represented by an upside-down e (called
- the "schwa") is not used, the same sound being represented by
- symbols like short u <ucr/, or sometimes other vowels.
-
- So be warned, the pronunciations may not be quite what one would
- expect. But for the first phase of this effort, we are trying
- to reproduce exactly the pronunciations in the original work.
-
- Notice that in pronunciations, vowels that are obscured are often
- represented by the italicised vowel without any diacritical marks;
- these italicised vowels are represented as either <ait/, <eit/, etc.
- or with an <it> tag, as in m<it>e</it>nt
- Thus "Christian" is represented as kr<icr/s"ch<it>a</it>n
- communicant is represented as k<ocr/m*m<umac/"n<icr/*k<ait/nt
-
-
- Some examples of pronunciations follow:
- for further explanations of the entities, see the file "webfont.asc"
- ==============================================================
-
- <amac/ long a (stressed) (a with a macron above it)
- late = l<amac/t
- later = l<amac/t"<etil/r
- comb-shaped = k<omac/m"-sh<amac/pt`
- commemorate = k<ocr/m*m<ecr/m"<osl/*r<amac/t
- deign = d<amac/n
- deflate = d<esl/*fl<amac/t"
- defray = d<esl/*fr<amac/"
- defrayal = d<esl/*fr<amac/"<ait/l
-
-
- <asl/ long a (unstressed)
- commodate = k<ocr/m"m<osl/*d<asl/t
- cometary = k<ocr/m"<ecr/t*<asl/*r<ycr/
-
- <ait/ italic a
- communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt
- defeasance = d<esl/*f<emac/"z<ait/ns
- commercial = k<ocr/m*m<etil/r"sh<ait/l
- compass = k<ucr/m"p<ait/s
-
- <acr/ short a (a with a crescent [breve] above it)
- adipose = <acr/d"<icr/*p<omac/s
- absolve = <acr/b*s<ocr/lv"
- land = l<acr/nd
- lamp = l<acr/mp
-
- <adot/ short a (a with a dot above it)
- again = <adot/*g<ecr/n"
- carouse = k<adot/*rouz"
- coma = k<omac/"m<adot/
- comma = k<ocr/m"m<adot/ | *These sound different
- command = k<ocr/m*m<adot/nd" | to me
- mass = m<adot/s
- mash = m<adot/sh
- mat = m<adot/t
-
- <acir/ a-circumflex ("only in syllables closed by r")
- care = k<acir/r
- chair = ch<acir/r
- share = sh<acir/r
- compare = k<ocr/m*p<acir/r"
-
- <aum/ a-umlaut (in pronunciations not the same as in words)
- arsenic = <aum/r"s<esl/*n<icr/k
- arson = <aum/r"s'n
- arm = <aum/rm
- carp = k<aum/rp
- far = f<aum/r
- mar = m<aum/r
- compart = k<ocr/m*p<aum/rt"
- compartment = k<ocr/m*p<aum/rt"m<eit/nt
-
- <add/ a double dot ( with a double dot *below*)
- all = <add/l
- talk = t<add/k
- swarm = sw<add/rm [not aum??]
- water = w<add/"t<etil/r
- default = d<esl/*f<add/lt"
- defraud = d<esl/*fr<add/d"
- deerstalker = d<emac/r"st<add/k`<etil/r
-
-
- <ecr/ short e (e with a crescent [breve] above it)
- degenerate = d<esl/*j<ecr/n"<etil/r*<amac/t
- delve = d<ecr/lv
- end = <ecr/nd
- pet = p<ecr/t
- ten = t<ecr/n
-
- <esl/ long e (unstressed)
- committee = k<ocr/m*m<icr/t"t<esl/
- defame = d<esl/*f<amac/m"
- define = d<esl/*f<imac/n"
- comedy = k<ocr/m"<esl/*d<ycr/
-
- <eit/ e italic
- compartment = k<ocr/m*p<aum/rt"m<eit/nt
- -ment = -"m<eit/nt (for most -ment endings)
-
- <emac/ e macron (long e, stressed)
- compeer = k<ocr/m*p<emac/r"
- deer = d<emac/r"
-
- <etil/ e-tilde
- (representing the e before r in many words)
- (for the same sound in -ur words, <ucir/ is used!)
- fern = f<etil/rn
- commercial = k<ocr/m*m<etil/r"sh<ait/l
- commerce = k<ocr/m"m<etil/rs
-
- <icr/ short i (i with a crescent [breve] above it)
- Note: In most cases, this is used where the
- short i sound of "lip" is intended, but it is
- also used in the middle of words where Americans
- use an unstressed long "e" sound, (as the
- "i" in "serial" and "serious")!?
- and also in words ending in "ies",
- coded as "<icr/z" (as in liberties)
- lip = l<icr/p
- pin = p<icr/n
- commission = k<ocr/m*m<icr/sh"<ucr/n
- committal = k<ocr/m*m<icr/t"t<ait/l
- *serial = s<emac/"r<icr/*<ait/l
- *serious = s<emac/"r<icr/*<ucr/s
- liberty = l<icr/b"<etil/r*t<ycr/
- *but: liberties = l<icr/b"<etil/r*t<icr/z
-
- <imac/ i-macron (long i, stressed) (i with a macron above it)
- combine = k<ocr/m*b<imac/n"
- combined = k<ocr/m*b<imac/"nd
-
- <isl/ long i (unstressed)
- diameter = d<isl/*<acr/m"<esl/*t<etil/r
- diagonal = d<isl/*<acr/g"<osl/*n<ait/l
-
-
- <ocr/ short o (o with a crescent [breve] above it)
- colossus = k<osl/*l<ocr/s"s<ucr/s
- commute = k<ocr/m*m<umac/t"
-
- <omac/ o-macron (long o, stressed) (o with a macron above it)
- boat = b<omac/t
- colt = k<omac/lt
- comb = k<omac/m
- combing = k<omac/m"<icr/ng
- commode = k<ocr/m*m<omac/d"
- course = k<omac/rs
-
- <ocir/ o-circumflex ("only in syllables closed by r")
- orb = <ocir/rb
- lord = l<ocir/rd
- lordship = l<ocir/rd"sh<icr/p
- lorn = l<ocir/rn
- cord = k<ocir/rd
- commorse = k<ocr/m*m<ocir/rs"
- deform = d<esl/*f<ocir/rm"
- deformed = d<esl/*f<ocir/rmd"
- dehortative = d<esl/*h<ocir/rt"<adot*t<icr/v
-
- <osl/ "o semilong" (long o, unstressed)
- diagonal = d<isl/*<acr/g"<osl/*n<ait/l
- dejectory = d<esl/*j<ecr/k"t<osl/*r<ycr/
-
- <oomac/ oo-macron (an oo with a macron above both o's)
- boom = b<oomac/m
- boot = b<oomac/t
- boost = b<oomac/st
- commove = k<ocr/m*m<oomac/v"
-
- <oomcr/ oo-crescent (an oo with a crescent [breve] above both o's)
- foot = f<oocr/t
- cook = k<oocr/k
-
- <umac/ u macron (long u)
- commute = k<ocr/m*m<umac/t"
- definitude = d<esl/*f<icr/n"<icr/*t<umac/d
- communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt
- defuse = d<esl/*f<umac/z"
-
- <ucr/ short u (u with a crescent [breve] above it)
- come = k<ucr/m
- color = k<ucr/l"<etil/r
- colored = k<ucr/l"<etil/rd
- Columbia = k<osl/*l<ucr/m"b<icr/*<adot/
- up = <ucr/p
-
- <ycr/ y-crescent (y with a crescent [breve] above it)
- used mostly for y-endings (supposed to sound similar to <icr/!!)
- sounds to me like an unstressed long e
- comedy = k<ocr/m"<esl/*d<ycr/
- comely = k<ucr/m"l<ycr/
- liberty = l<icr/b"<etil/r*t<ycr/
-
- <ymac/ y-macron (y with a macron above it)
- used to represent the long i (stressed) sound, but
- examples in pronunciations seem to be absent. It is
- found in some foreign words in the etymologies.
-
- ou the common "ow" sound of "town", "browse"
- count = kount
-
- <nsm/ n-submacron (an n with a macron underneath)
- represents the "ng" sound when it occurs before a
- consonant
- defunct = d<esl/*f<ucr/<nsm/kt"
- commingle = k<ocr/m*m<icr/<nsm/"g'l
-
- <th/ the "th" sound in "mother"
- this is represented in the printed work by a th ligature
- carouse = k<adot/*rouz"
-
- zh not a special character, but used to represent the
- "si" sound in words like
-
- decision = d<esl/*s<icr/zh"<ucr/n
-
- th the usual sound as in thing and thorn
- sh the usual as in ship
- ch the usual as in chip
- N (capital N) represents the nasal "n" sound of the French language
-
+file PRONUNC.WEB +================ + This file gives a number of examples of pronunciation, +using the entity symbols representing the pronunciations as +found in the 1913 Webster unabridged dictionary. Not all +vowel sounds are given here, but the examples should allow one +to recognize the characters and recall the symbols used to +represent them. The set of symbols used for pronunciation +is different from that used in most modern dictionaries, +but a more worrisome problem is that the pronunciations themselves +seem in many cases to differ from modern usage. The places of +the strong and weak accent are, however, in every case +examined the same as in modern dictionaries. Anyone who is +willing to work at revising the pronunciations to reflect modern +usage or modern symbols should contact PJC. + + + Pronunciations in the 1913 Webster ASCII version + ================================================= + +Syllables: +---------------- + in pronunciations, the short hyphen used in the printed version as a +syllable-break is represented in the ASCII version by an asterisk (*). + the main (heavy) accent is represented by a double-quote ("). + the secondary (light) accent is represented by a left-single-quote +(grave accent) (`) + the hyphen in hyphenated words is represented by the ASCII hyphen (-). + where an accent occurs, no other syllable break is used. + sometimes a hyphen occurs after an accent. + ------------------------------------------------ + +Consonants: + Most consonants have their normal value in the pronunciations, +but there are a few special characters, as the n-submacron and the +"th" ligature. See the end of the "special characters" section. + +Special characters: +-------------------- + The special characters are represented by two different sets of +symbols: (1) the RTF-format hexadecimal codes such as \'94 for +o-umlaut, meaning that the byte code is hexadecimal 94.
These +are used only for those symbols which have been designed into a +special font set for this dictionary. The font set can only be used +in a DOS system; or +(2) an "entity" symbol using "<" and "/" as opening and closing +delimiters, with a mnemonic string between. In the case of o-umlaut +the symbol is <oum/. For the vowels, the system is consistent, +thus <aum/ is a-umlaut, and <ium/ is i-umlaut, etc. + These delimiters are used in preference to the HTML-style +(e.g. &auml;) delimiters because of the heavy use of ampersands in +the dictionary, to minimize file length. For the same reason, +the codes within the delimiters are generally shorter than the +corresponding ISO 8879 codes ( <aum/ rather than &auml; ). + For this discussion, I will use the "entity" coding. The +equivalent hexadecimal codes, where they exist, will be found in +the tables in the file "webfont.asc". + + The pronunciation system of the 1913 Webster has three peculiarities +relative to systems used in recent dictionaries. +(1) a more complex set of symbols is used. This is evident, for + example, where the long vowels have different symbols whether + they are used in stressed or unstressed syllables. Thus + long a in "acre" or "chaos" is represented as a-macron (<amac/ in + our notation). But in "chaotic" or "connate" or "comate" it is + represented as a symbol looking like a-macron, but with a short + ascender in the middle of the macron above the a. This is denoted + <asl/ ("a semilong") in our notation. + + Also, some sounds have more than one symbol. Thus, there are several + symbols using "y" with a diacritical mark above, representing + identical sounds using "i" or "e", but used in those cases where the + written word has a "y" in it. So words ending in "y" with + pronunciations like the unaccented long "e" usually have + a y-breve (<ycr/) in the pronunciation. Why? Apparently, + just to look more like the spelling. In these cases its + meaning is unambiguous.
+ +(2) The indicated pronunciations themselves are in some cases + different from what one would find in a modern dictionary. + In part this is due to differences among orthoepists with + different notions of how a word should sound, and possibly + it is due to differences in the pronunciation between 1890, + when British pronunciations may have had more influence, and + the present. Thus we see that words ending in -"ties" + are given the pronunciation "-t<icr/z", which sounds + like "tizz", whereas I have always heard such words pronounced + with a long "e", as in "teez" (and most modern dictionaries + give it the long-e pronunciation). In Webster's 10th collegiate, + they mention that unstressed long e may be pronounced as i in + southern British or southern US dialects, and perhaps it + was more common in the US in 1890. The <icr/ is an unreliable + indicator of modern standard American pronunciation. A long-e + pronunciation on the antepenult is also sometimes given an + <icr/ symbol in this dictionary. + +(3) The indefinite value, represented by an upside-down e (called + the "schwa") is not used, the same sound being represented by + symbols like short u <ucr/, or sometimes other vowels. + + So be warned, the pronunciations may not be quite what one would + expect. But for the first phase of this effort, we are trying + to reproduce exactly the pronunciations in the original work. + + Notice that in pronunciations, vowels that are obscured are often + represented by the italicised vowel without any diacritical marks; + these italicised vowels are represented as either <ait/, <eit/, etc.
+ or with an <it> tag, as in m<it>e</it>nt + Thus "Christian" is represented as kr<icr/s"ch<it>a</it>n + communicant is represented as k<ocr/m*m<umac/"n<icr/*k<ait/nt + + + Some examples of pronunciations follow: + for further explanations of the entities, see the file "webfont.asc" + ============================================================== + + <amac/ long a (stressed) (a with a macron above it) + late = l<amac/t + later = l<amac/t"<etil/r + comb-shaped = k<omac/m"-sh<amac/pt` + commemorate = k<ocr/m*m<ecr/m"<osl/*r<amac/t + deign = d<amac/n + deflate = d<esl/*fl<amac/t" + defray = d<esl/*fr<amac/" + defrayal = d<esl/*fr<amac/"<ait/l + + + <asl/ long a (unstressed) + commodate = k<ocr/m"m<osl/*d<asl/t + cometary = k<ocr/m"<ecr/t*<asl/*r<ycr/ + + <ait/ italic a + communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt + defeasance = d<esl/*f<emac/"z<ait/ns + commercial = k<ocr/m*m<etil/r"sh<ait/l + compass = k<ucr/m"p<ait/s + + <acr/ short a (a with a crescent [breve] above it) + adipose = <acr/d"<icr/*p<omac/s + absolve = <acr/b*s<ocr/lv" + land = l<acr/nd + lamp = l<acr/mp + + <adot/ short a (a with a dot above it) + again = <adot/*g<ecr/n" + carouse = k<adot/*rouz" + coma = k<omac/"m<adot/ + comma = k<ocr/m"m<adot/ | *These sound different + command = k<ocr/m*m<adot/nd" | to me + mass = m<adot/s + mash = m<adot/sh + mat = m<adot/t + + <acir/ a-circumflex ("only in syllables closed by r") + care = k<acir/r + chair = ch<acir/r + share = sh<acir/r + compare = k<ocr/m*p<acir/r" + + <aum/ a-umlaut (in pronunciations not the same as in words) + arsenic = <aum/r"s<esl/*n<icr/k + arson = <aum/r"s'n + arm = <aum/rm + carp = k<aum/rp + far = f<aum/r + mar = m<aum/r + compart = k<ocr/m*p<aum/rt" + compartment = k<ocr/m*p<aum/rt"m<eit/nt + + <add/ a double dot ( with a double dot *below*) + all = <add/l + talk = t<add/k + swarm = sw<add/rm [not aum??] 
+ water = w<add/"t<etil/r + default = d<esl/*f<add/lt" + defraud = d<esl/*fr<add/d" + deerstalker = d<emac/r"st<add/k`<etil/r + + + <ecr/ short e (e with a crescent [breve] above it) + degenerate = d<esl/*j<ecr/n"<etil/r*<amac/t + delve = d<ecr/lv + end = <ecr/nd + pet = p<ecr/t + ten = t<ecr/n + + <esl/ long e (unstressed) + committee = k<ocr/m*m<icr/t"t<esl/ + defame = d<esl/*f<amac/m" + define = d<esl/*f<imac/n" + comedy = k<ocr/m"<esl/*d<ycr/ + + <eit/ e italic + compartment = k<ocr/m*p<aum/rt"m<eit/nt + -ment = -"m<eit/nt (for most -ment endings) + + <emac/ e macron (long e, stressed) + compeer = k<ocr/m*p<emac/r" + deer = d<emac/r" + + <etil/ e-tilde + (representing the e before r in many words) + (for the same sound in -ur words, <ucir/ is used!) + fern = f<etil/rn + commercial = k<ocr/m*m<etil/r"sh<ait/l + commerce = k<ocr/m"m<etil/rs + + <icr/ short i (i with a crescent [breve] above it) + Note: In most cases, this is used where the + short i sound of "lip" is intended, but it is + also used in the middle of words where Americans + use an unstressed long "e" sound, (as the + "i" in "serial" and "serious")!? 
+ and also in words ending in "ies", + coded as "<icr/z" (as in liberties) + lip = l<icr/p + pin = p<icr/n + commission = k<ocr/m*m<icr/sh"<ucr/n + committal = k<ocr/m*m<icr/t"t<ait/l + *serial = s<emac/"r<icr/*<ait/l + *serious = s<emac/"r<icr/*<ucr/s + liberty = l<icr/b"<etil/r*t<ycr/ + *but: liberties = l<icr/b"<etil/r*t<icr/z + + <imac/ i-macron (long i, stressed) (i with a macron above it) + combine = k<ocr/m*b<imac/n" + combined = k<ocr/m*b<imac/"nd + + <isl/ long i (unstressed) + diameter = d<isl/*<acr/m"<esl/*t<etil/r + diagonal = d<isl/*<acr/g"<osl/*n<ait/l + + + <ocr/ short o (o with a crescent [breve] above it) + colossus = k<osl/*l<ocr/s"s<ucr/s + commute = k<ocr/m*m<umac/t" + + <omac/ o-macron (long o, stressed) (o with a macron above it) + boat = b<omac/t + colt = k<omac/lt + comb = k<omac/m + combing = k<omac/m"<icr/ng + commode = k<ocr/m*m<omac/d" + course = k<omac/rs + + <ocir/ o-circumflex ("only in syllables closed by r") + orb = <ocir/rb + lord = l<ocir/rd + lordship = l<ocir/rd"sh<icr/p + lorn = l<ocir/rn + cord = k<ocir/rd + commorse = k<ocr/m*m<ocir/rs" + deform = d<esl/*f<ocir/rm" + deformed = d<esl/*f<ocir/rmd" + dehortative = d<esl/*h<ocir/rt"<adot*t<icr/v + + <osl/ "o semilong" (long o, unstressed) + diagonal = d<isl/*<acr/g"<osl/*n<ait/l + dejectory = d<esl/*j<ecr/k"t<osl/*r<ycr/ + + <oomac/ oo-macron (an oo with a macron above both o's) + boom = b<oomac/m + boot = b<oomac/t + boost = b<oomac/st + commove = k<ocr/m*m<oomac/v" + + <oomcr/ oo-crescent (an oo with a crescent [breve] above both o's) + foot = f<oocr/t + cook = k<oocr/k + + <umac/ u macron (long u) + commute = k<ocr/m*m<umac/t" + definitude = d<esl/*f<icr/n"<icr/*t<umac/d + communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt + defuse = d<esl/*f<umac/z" + + <ucr/ short u (u with a crescent [breve] above it) + come = k<ucr/m + color = k<ucr/l"<etil/r + colored = k<ucr/l"<etil/rd + Columbia = k<osl/*l<ucr/m"b<icr/*<adot/ + up = <ucr/p + + <ycr/ y-crescent (y with a crescent [breve] 
above it) + used mostly for y-endings (supposed to sound similar to <icr/!!) + sounds to me like an unstressed long e + comedy = k<ocr/m"<esl/*d<ycr/ + comely = k<ucr/m"l<ycr/ + liberty = l<icr/b"<etil/r*t<ycr/ + + <ymac/ y-macron (y with a macron above it) + used to represent the long i (stressed) sound, but + examples in pronunciations seem to be absent. It is + found in some foreign words in the etymologies. + + ou the common "ow" sound of "town", "browse" + count = kount + + <nsm/ n-submacron (an n with a macron underneath) + represents the "ng" sound when it occurs before a + consonant + defunct = d<esl/*f<ucr/<nsm/kt" + commingle = k<ocr/m*m<icr/<nsm/"g'l + + <th/ the "th" sound in "mother" + this is represented in the printed work by a th ligature + carouse = k<adot/*rouz" + + zh not a special character, but used to represent the + "si" sound in words like + + decision = d<esl/*s<icr/zh"<ucr/n + + th the usual sound as in thing and thorn + sh the usual as in ship + ch the usual as in chip + N (capital N) represents the nasal "n" sound of the French language + @@ -1,268 +1,268 @@ -File README.DIC
- To accompany the GNU version of the set of files (cide.*) containing
- the electronic version of the
- Collaborative International Dictionary of English.
- (called also GCIDE)
- These files contain Version 0.46 (January 2002)
- * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-
-The dictionary was derived from the
- Webster's Revised Unabridged Dictionary
- Version published 1913
- by the C. & G. Merriam Co.
- Springfield, Mass.
- Under the direction of
- Noah Porter, D.D., LL.D.
-
-and has been supplemented with some of the definitions from
- WordNet, a semantic network created by
- the Cognitive Science Department
- of Princeton University
- under the direction of
- Prof. George Miller
-
-and is being proof-read and supplemented by volunteers from
-around the world. This is an unfunded project, and future
-enhancement of this dictionary will depend on the efforts of
-volunteers willing to help build this free resource into a
-comprehensive body of general information. New definitions
-for missing words or word senses and longer explanatory notes,
-as well as images to accompany the articles are needed. More
-modern illustrative quotations giving recent examples of
-usage of the words in their various senses will be very
-helpful, since most quotations in the original 1913 dictionary
-are now well over 100 years old.
-
- This electronic version is being maintained by World Soul,
-a non-profit organization in Plainfield, NJ. For additional
-information or if you are willing to assist in the construction of this
-data source, contact:
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
- Patrick J. Cassidy | TEL: (908) 561-3416
- World Soul | if no answer, (908) 668-5252
- 735 Belvidere Ave. | FAX: (908) 668-5904
- Plainfield, NJ 07062-2054
- pc@worldsoul.org or cassidy@micra.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-
-GCIDE is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2, or (at your option)
-any later version.
-
-GCIDE is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this copy of GCIDE; see the file COPYING. If not, write
-to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
-Boston, MA 02111-1307, USA.
- * * * * * * * * * * * * * * * * * * * * *
-
-STRUCTURE OF THE DICTIONARY
----------------------------
- When the archives are unpacked, the main dictionary text of
-the GCIDE will be found in 26 files named "cide.*", where the
-asterisk indicates which letter of the alphabet begins the
-words in each file. For example, file "cide.b" contains words
-beginning with the letter "B". Additional information about the
-tagging conventions and special character symbols is contained in
-ancillary files in this directory (more information below). The main
-body of the 1913 dictionary was essentially identical to the edition
-published in 1890, and was republished in 1913 with an appendix
-containing "New Words". The new words of that appendix have been
-integrated into the main file in this version. However, it is important
-to keep in mind that the definitions in this dictionary are in most
-cases over 100 years old. Use them with caution!
- At the bottom of each paragraph in this dictionary, there is a
-bracketed and tagged "source" indicated. This tells from where the
-definition or other text in that paragraph came, as follows:
-
-[<source>1913 Webster</source>]
- = From the original 1890 dictionary.
-[<source>Webster 1913 Suppl.</source>]
- = From the 1913 "New Words" supplement to the Webster.
-[<source>WordNet 1.5</source>]
- = From the WordNet on-line semantic network.
-[<source>Century Dict. 1906.</source>]
- = From the Century Dictionary published in 1906, especially from
- the "proper Names" supplement (volume IX).
-[<source>XXX</source>]
- = Added by one of the volunteers.
-
- The original definitions have been tagged and in some cases
-reformatted or slightly rearranged. If substantive information
-is added from a second source, usually the additional source is
-also noted, as in:
-[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
-
- A list of the ancillary files related to the GCIDE is appended at
-the bottom of this "readme.dic" file.
- This version is tagged with SGML-like tags of the form <pos>...</pos>
-so that the original typography (italics, bold, block quotes) can be
-reproduced. A list of the most important tags for fields in the
-dictionary is given below. The tags also serve the more important
-function of allowing the information content to be conveniently imported
-into computer programs or databases. The set of tags used is described
-in the accompanying file "tagset.web". ***NOTE*** the paragraph tags
-<p>...</p> do *not* always nest properly with certain other tags, such
-as <note> and <cs> ("collocation section"), which in some cases span
-multiple paragraphs. If you are using a tag parser which detects
-improper nesting, you should first either delete the paragraph
-tags or convert them to non-tag symbols, or, if possible, set the
-parser to ignore the <p>...</p> tags.
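Note: one way to follow the advice above is to delete the paragraph tags before handing the text to a strict parser. This is a sketch only; the function name drop_paragraph_tags is an assumption, and it presumes the tags appear exactly as <p> and </p> with no attributes.

```python
import re

# Matches only the bare paragraph tags <p> and </p>. Tags whose names
# merely begin with "p" (<pos>, <pr>, ...) are left untouched because
# the pattern requires ">" immediately after the "p".
P_TAG_RE = re.compile(r"</?p>")


def drop_paragraph_tags(text: str) -> str:
    """Remove <p> and </p> so that spanning tags such as <note> nest cleanly."""
    return P_TAG_RE.sub("", text)


# Hypothetical fragment in which <note> spans two paragraphs.
fragment = "<p><note>first</p>\n\n<p>second</note></p>"
print(drop_paragraph_tags(fragment))
```

The blank lines between paragraphs are kept, so paragraph boundaries can still be recovered from them if needed.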
- The unusual characters (such as Greek or the European accented
-characters, as well as special characters used in the pronunciations)
-are described in the accompanying file "webfont.asc". Some information
-on the pronunciation system used may be found by viewing the files
-"wxxvii.jpg" and "pronunc.jpg" with an image viewer (or any web browser),
-and additional explanations of pronunciation are in the file
-"pronunc.web".
- Each paragraph of the original text is enclosed within tags of
-the form <p> . . . </p>. Within these paragraphs are no line
-breaks, and some of the paragraphs are over 12,000 characters long.
-These lines are too long to be handled by the vi editor, and probably
-by some other text editors. At some points, embedded line breaks within
-a "paragraph" are marked by a <br/ "entity". The file can therefore
-be converted, if necessary, to a form with shorter lines, and subsequently
-reconverted back to the form having one line per paragraph.
-
- If additional line breaks are added, then in order to remove the
-line breaks and reconstruct the original paragraphs, so that the
-page width can be adjusted, perform the following manipulations:
- (1) convert each line break (cr-lf combination) to a space.
- (2) convert the string "</p> " (</p> followed by two spaces)
- to </p> followed by two line breaks (cr-lf combinations)
- (3) convert the string "<br/ " (<br/ followed by one space)
- to <br/ followed by one line break (cr-lf).
-There will be some "lines" (paragraphs) with over 12,000 characters,
-which may give trouble to some simple text editors.
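Note: the three manipulations above can be sketched in Python as plain string replacements. The function name rejoin_paragraphs and the eol parameter are assumptions made for illustration; the text speaks of cr-lf line breaks, so that is the default, but a file whose line terminations have been converted would need eol="\n".

```python
def rejoin_paragraphs(text: str, eol: str = "\r\n") -> str:
    """Undo added line breaks, restoring one line per paragraph."""
    # (1) Every line break becomes a single space.
    joined = text.replace(eol, " ")
    # (2) "</p>" followed by two spaces was a paragraph boundary:
    #     restore it as "</p>" plus two line breaks.
    joined = joined.replace("</p>  ", "</p>" + eol + eol)
    # (3) "<br/" followed by one space was an embedded line break:
    #     restore it as "<br/" plus one line break.
    joined = joined.replace("<br/ ", "<br/" + eol)
    return joined


# Hypothetical wrapped input: two paragraphs, one with an embedded <br/.
wrapped = (
    "<p>alpha\r\nbeta<br/\r\ngamma</p>\r\n\r\n"
    "<p>delta\r\nepsilon</p>\r\n\r\n"
)
print(repr(rejoin_paragraphs(wrapped)))
```

The order of the steps matters: step (1) must run first, because steps (2) and (3) look for the spaces that it produces.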
- A more sophisticated formatting of spaces within paragraphs may
-require the use of the fully-tagged master files. If you have
-a need for these files, contact Patrick Cassidy: cassidy@micra.com.
- The approximate beginning of each page is marked by an SGML
-comment of the form <-- p. 345 -->. (The exact beginning was in some
-cases in the middle of a paragraph, which we decided was not a
-good location for these page-number comments, so the page number
-was usually moved to the next paragraph break). Pages which have
-been proofread by volunteers (e.g., with initials VOL) will have a
-note within that page comment: <-- p. 345 pr=VOL -->. Pages which have
-not been proofread yet (most of them) will have varying numbers of
-typographical errors in them. We still (January 2002) need
-proofreaders to get the errors out of these dictionary files.
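Note: a sketch of how the page-number comments described above might be located, for example to list which pages still lack a proofreader. The names PAGE_RE and page_marks are assumptions; the pattern covers only the two forms shown in the text and assumes numeric page numbers.

```python
import re

# "<-- p. 345 -->" or "<-- p. 345 pr=VOL -->": group 1 is the page
# number, group 2 the proofreader's initials when present.
PAGE_RE = re.compile(r"<--\s*p\.\s*(\d+)(?:\s+pr=(\w+))?\s*-->")


def page_marks(text: str) -> list:
    """Return (page, proofreader-or-None) for each page comment found."""
    return [(int(page), who or None) for page, who in PAGE_RE.findall(text)]


# Hypothetical text containing one unproofed and one proofed page mark.
sample_pages = "<-- p. 345 -->\n<p>...</p>\n<-- p. 346 pr=VOL -->\n"
print(page_marks(sample_pages))
```

Pages whose second element is None are the ones not yet marked as proofread.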
-
-***********************************************************************
-** WARNING!!! **
-***********************************************************************
-
- This version is only a first typing, and has numerous typographic
-errors, including errors in the field-marks. In addition, the user must
-keep in mind that this text is very old and will contain numerous
-obsolete, inaccurate, and perhaps offensive statements, which are
-included solely because this work is intended to reproduce accurately
-this historically interesting classic reference work. This text should
-not be relied upon as an accurate source of information, as in many
-cases it represents the state of knowledge around 1890. The text is
-provided "as is", and the user must accept responsibility for all
-consequences of its use. Please refer to the header of each file and
-the GNU General Public License. If these conditions of use are unacceptable,
-please do not use these texts.
-************************************************************************
-************************************************************************
- This electronic dictionary is also made available as a potential
-starting point for development of a modern comprehensive encyclopedic
-dictionary, to be accessible freely on the internet, and developed by the
-efforts of all individuals willing to help build a large and freely
-available knowledge base. A large number of collaborators are needed to
-bring this dictionary to a more accurate, more modern, and more useful
-state. Anyone willing to assist in any way in constructing such a
-knowledge base should contact Patrick Cassidy (see above). All reports
-of errors will be gratefully received, and should also be transmitted to
-PC at: pc@worldsoul.org.
-
-In addition to the main text of the dictionary, additional
-explanatory material about this version of the dictionary is available
-in the ancillary files:
-
-=====================================================================
-COPYING 18,321 11-03-99 1:13a COPYING
-README DIC 13,775 01-17-02 11:48p readme.dic
-WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC
-TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB
-PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB
-PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG
-SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG
-WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG
-==================================================================
-
-
-Most important tags used in the GCIDE:
-<hw> tags the headword
-<pr> pronunciation
-<pos> part of speech
-<ety> etymology
-<ets> "source" word within an <ety> field, usually foreign words
-<fld> field of knowledge (e.g. Med. = medicine)
-<def> definition
-<cs> collocation section (containing word combinations)
-<col> collocation entry (word combination)
-<cd> collocation definition
-<as> illustrations of usage (within a <def>. . . </def> field)
-<au> authority for a definition, or author of a quotation
-<q> illustrative quotation -- in block quote format
-<au> author of an illustrative <q> quotation
-<altname> alternative name for the headword -- essentially a synonym
-<asp> alternative spelling of the headword
-<syn> list of synonyms for the headword
-<p> paragraph
-<b> bold type
-<it> italic type
-
-For other tags, see the file "tagset.web"
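Note: a minimal sketch of pulling tagged fields out of an entry using the tags listed above. The function name fields and the sample entry are made up for illustration and do not come from the dictionary files; the approach assumes a tag is never nested inside another tag of the same name.

```python
import re


def fields(entry: str, tag: str) -> list:
    """Return the contents of every <tag>...</tag> field in an entry."""
    # DOTALL lets a field span line breaks; the non-greedy group stops
    # at the first matching close tag.
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL)
    return pattern.findall(entry)


# Hypothetical entry, written only to exercise the function.
entry = (
    "<p><hw>Sample</hw> <pos>n.</pos> "
    "<def>A made-up entry, <as>as, a <it>sample</it> entry</as>.</def></p>"
)
print(fields(entry, "hw"))
print(fields(entry, "pos"))
```

Inner tags are returned as part of the field text, so a second pass (for example fields(text, "as")) can be applied to the result where needed.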
-
-
-============================================================
- OTHER VERSIONS OF THE DICTIONARY
-=============================================================
-
- There are several other derivative versions of this dictionary
-on the internet, in some cases reformatted or provided with an
-interface. Those that I am aware of are:
-
-(1) Project Gutenberg
----------------------
- In the etext96 directory of Project Gutenberg (www.prairienet.org)
-there is a version of the original 1913 dictionary, which is in
-the **public domain**. The main files are in the directory etext96,
-and are labeled pgw050**.***. The tags for that version are a subset
-of those used in this GNU version.
-
-(2) The DICT development group
-------------------------------
-This group has created a program to index and search this dictionary.
-The program can be downloaded and used locally, but at present
-is available only in a Unix-compatible executable version.
-See their web site at http://www.dict.org.
-
-(3) The University of Chicago ARTFL project
----------------------------------------------
-Mark Olsen and Gavin LaRowe at the University of Chicago have
-converted the original 1913 dictionary to HTML and have provided an
-interface allowing search of the headwords. When the supplemented
-version has developed sufficiently to warrant the effort, a
-similar searchable version may be posted there as well. The
-search page is at:
- http://humanities.uchicago.edu/forms_unrest/webster.form.html
-
-That page will provide links to other ARTFL projects and contact
-information for the ARTFL group, who alone can provide information
-about the HTML version or interface.
-
-
- -- PJC
+File README.DIC
+ To accompany the GNU version of the set of files (cide.*) containing
+ the electronic version of the
+ Collaborative International Dictionary of English.
+ (called also GCIDE)
+ These files contain Version 0.46 (January 2002)
+ * * * * * * * * * * * * * * * * * * * * * * * * * * * *
+
+The dictionary was derived from the
+ Webster's Revised Unabridged Dictionary
+ Version published 1913
+ by the C. & G. Merriam Co.
+ Springfield, Mass.
+ Under the direction of
+ Noah Porter, D.D., LL.D.
+
+and has been supplemented with some of the definitions from
+ WordNet, a semantic network created by
+ the Cognitive Science Department
+ of Princeton University
+ under the direction of
+ Prof. George Miller
+
+and is being proof-read and supplemented by volunteers from
+around the world. This is an unfunded project, and future
+enhancement of this dictionary will depend on the efforts of
+volunteers willing to help build this free resource into a
+comprehensive body of general information. New definitions
+for missing words or word senses and longer explanatory notes,
+as well as images to accompany the articles, are needed. More
+modern illustrative quotations giving recent examples of
+usage of the words in their various senses will be very
+helpful, since most quotations in the original 1913 dictionary
+are now well over 100 years old.
+
+ This electronic version is being maintained by World Soul,
+a non-profit organization in Plainfield, NJ. For additional
+information or if you are willing to assist construction of this
+data source, contact:
+
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
+ Patrick J. Cassidy | TEL: (908) 561-3416
+ World Soul | if no answer, (908) 668-5252
+ 735 Belvidere Ave. | FAX: (908) 668-5904
+ Plainfield, NJ 07062-2054
+ pc@worldsoul.org or cassidy@micra.com
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
+
+ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
+
+GCIDE is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GCIDE is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this copy of GCIDE; see the file COPYING. If not, write
+to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+Boston, MA 02111-1307, USA.
+ * * * * * * * * * * * * * * * * * * * * *
+
+STRUCTURE OF THE DICTIONARY
+---------------------------
+ When the archives are unpacked, the main dictionary text of
+the GCIDE will be found in 26 files named "cide.*", where the
+asterisk indicates which letter of the alphabet begins the
+words in each file. For example, file "cide.b" contains words
+beginning with the letter "B". Additional information about the
+tagging conventions and special character symbols is contained in
+ancillary files in this directory (more information below). The main
+body of the 1913 dictionary was essentially identical to the edition
+published in 1890, and was republished in 1913 with an appendix
+containing "New Words". The new words of that appendix have been
+integrated into the main file in this version. However, it is important
+to keep in mind that the definitions in this dictionary are in most
+cases over 100 years old. Use them with caution!
+ At the bottom of each paragraph in this dictionary, there is a
+bracketed and tagged "source" indicated. This tells from where the
+definition or other text in that paragraph came, as follows:
+
+[<source>1913 Webster</source>]
+ = From the original 1890 dictionary.
+[<source>Webster 1913 Suppl.</source>]
+ = From the 1913 "New Words" supplement to the Webster.
+[<source>WordNet 1.5</source>]
+ = From the WordNet on-line semantic network.
+[<source>Century Dict. 1906.</source>]
+ = From the Century Dictionary published in 1906, especially from
+ the "proper Names" supplement (volume IX).
+[<source>XXX</source>]
+ = Added by one of the volunteers.
+
+ The original definitions have been tagged and in some cases
+reformatted or slightly rearranged. If substantive information
+is added from a second source, usually the additional source is
+also noted, as in:
+[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
+
+ A list of the ancillary files related to the GCIDE is appended at
+the bottom of this "readme.dic" file.
+ This version is tagged with SGML-like tags of the form <pos>...</pos>
+so that the original typography (italics, bold, block quotes) can be
+reproduced. A list of the most important tags for fields in the
+dictionary is given below. The tags also serve the more important
+function of allowing the information content to be conveniently imported
+into computer programs or databases. The set of tags used is described
+in the accompanying file "tagset.web". ***NOTE*** the paragraph tags
+<p>...</p> do *not* always nest properly with certain other tags, such
+as <note> and <cs> ("collocation section"), which in some cases span
+multiple paragraphs. If you are using a tag parser which detects
+improper nesting, you should first either delete the paragraph
+tags or convert them to non-tag symbols, or, if possible, set the
+parser to ignore the <p>...</p> tags.
+ The unusual characters (such as Greek or the European accented
+characters, as well as special characters used in the pronunciations)
+are described in the accompanying file "webfont.asc". Some information
+on the pronunciation system used may be found by viewing the files
+"wxxvii.jpg" and "pronunc.jpg" with an image viewer (or any web browser),
+and additional explanations of pronunciation are in the file
+"pronunc.web".
+ Each paragraph of the original text is enclosed within tags of
+the form <p> . . . </p>. Within these paragraphs are no line
+breaks, and some of the paragraphs are over 12,000 characters long.
+These lines are too long to be handled by the vi editor, and probably
+by some other text editors. At some points, embedded line breaks within
+a "paragraph" are marked by a <br/ "entity". The file can therefore
+be converted, if necessary, to a form with shorter lines, and subsequently
+reconverted back to the form having one line per paragraph.
+
+ If additional line breaks are added, then in order to remove the
+line breaks and reconstruct the original paragraphs, so that the
+page width can be adjusted, perform the following manipulations:
+ (1) convert each line break (cr-lf combination) to a space.
+ (2) convert the string "</p> " (</p> followed by two spaces)
+ to </p> followed by two line breaks (cr-lf combinations)
+ (3) convert the string "<br/ " (<br/ followed by one space)
+ to <br/ followed by one line break (cr-lf).
+There will be some "lines" (paragraphs) with over 12,000 characters,
+which may give trouble to some simple text editors.
+ A more sophisticated formatting of spaces within paragraphs may
+require the use of the fully-tagged master files. If you have
+a need for these files, contact Patrick Cassidy: cassidy@micra.com.
+ The approximate beginning of each page is marked by an SGML
+comment of the form <-- p. 345 -->. (The exact beginning was in some
+cases in the middle of a paragraph, which we decided was not a
+good location for these page-number comments, so the page number
+was usually moved to the next paragraph break). Pages which have
+been proofread by volunteers (e.g., with initials VOL) will have a
+note within that page comment: <-- p. 345 pr=VOL -->. Pages which have
+not been proofread yet (most of them) will have varying numbers of
+typographical errors in them. We still (January 2002) need
+proofreaders to get the errors out of these dictionary files.
+
+***********************************************************************
+** WARNING!!! **
+***********************************************************************
+
+ This version is only a first typing, and has numerous typographic
+errors, including errors in the field-marks. In addition, the user must
+keep in mind that this text is very old and will contain numerous
+obsolete, inaccurate, and perhaps offensive statements, which are
+included solely because this work is intended to reproduce accurately
+this historically interesting classic reference work. This text should
+not be relied upon as an accurate source of information, as in many
+cases it represents the state of knowledge around 1890. The text is
+provided "as is", and the user must accept responsibility for all
+consequences of its use. Please refer to the header of each file and
+the GNU General Public License. If these conditions of use are unacceptable,
+please do not use these texts.
+************************************************************************
+************************************************************************
+ This electronic dictionary is also made available as a potential
+starting point for development of a modern comprehensive encyclopedic
+dictionary, to be accessible freely on the internet, and developed by the
+efforts of all individuals willing to help build a large and freely
+available knowledge base. A large number of collaborators are needed to
+bring this dictionary to a more accurate, more modern, and more useful
+state. Anyone willing to assist in any way in constructing such a
+knowledge base should contact Patrick Cassidy (see above). All reports
+of errors will be gratefully received, and should also be transmitted to
+PC at: pc@worldsoul.org.
+
+In addition to the main text of the dictionary, additional
+explanatory material about this version of the dictionary is available
+in the ancillary files:
+
+=====================================================================
+COPYING 18,321 11-03-99 1:13a COPYING
+README DIC 13,775 01-17-02 11:48p readme.dic
+WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC
+TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB
+PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB
+PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG
+SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG
+WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG
+==================================================================
+
+
+Most important tags used in the GCIDE:
+<hw> tags the headword
+<pr> pronunciation
+<pos> part of speech
+<ety> etymology
+<ets> "source" word within an <ety> field, usually foreign words
+<fld> field of knowledge (e.g. Med. = medicine)
+<def> definition
+<cs> collocation section (containing word combinations)
+<col> collocation entry (word combination)
+<cd> collocation definition
+<as> illustrations of usage (within a <def>. . . </def> field)
+<au> authority for a definition, or author of a quotation
+<q> illustrative quotation -- in block quote format
+<au> author of an illustrative <q> quotation
+<altname> alternative name for the headword -- essentially a synonym
+<asp> alternative spelling of the headword
+<syn> list of synonyms for the headword
+<p> paragraph
+<b> bold type
+<it> italic type
+
+For other tags, see the file "tagset.web"
+
+
+============================================================
+ OTHER VERSIONS OF THE DICTIONARY
+=============================================================
+
+ There are several other derivative versions of this dictionary
+on the internet, in some cases reformatted or provided with an
+interface. Those that I am aware of are:
+
+(1) Project Gutenberg
+---------------------
+ In the etext96 directory of Project Gutenberg (www.prairienet.org)
+there is a version of the original 1913 dictionary, which is in
+the **public domain**. The main files are in the directory etext96,
+and are labeled pgw050**.***. The tags for that version are a subset
+of those used in this GNU version.
+
+(2) The DICT development group
+------------------------------
+This group has created a program to index and search this dictionary.
+The program can be downloaded and used locally, but at present
+is available only in a Unix-compatible executable version.
+See their web site at http://www.dict.org.
+
+(3) The University of Chicago ARTFL project
+---------------------------------------------
+Mark Olsen and Gavin LaRowe at the University of Chicago have
+converted the original 1913 dictionary to HTML and have provided an
+interface allowing search of the headwords. When the supplemented
+version has developed sufficiently to warrant the effort, a
+similar searchable version may be posted there as well. The
+search page is at:
+ http://humanities.uchicago.edu/forms_unrest/webster.form.html
+
+That page will provide links to other ARTFL projects and contact
+information for the ARTFL group, who alone can provide information
+about the HTML version or interface.
+
+
+ -- PJC
@@ -1,1060 +1,1060 @@
- FIELD MARKS FOR WEBSTER 1913 and CIDE
- =====================================
-Tagset.web:
- Explanations of the tags used to mark the Webster 1913 dictionary
-and the CIDE (Collaborative International Dictionary of English).
-Note that the list of tags used to mark the public domain version
-of this dictionary is shorter than the full set described here.
- If any tag is not listed here, it is either (1) one of the
-"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
- (2) is a functional field with no effect on the typography.
-
-Last modified March 12, 1999.
- For questions, contact:
- Patrick Cassidy cassidy@micra.com
- 735 Belvidere Ave.
- Plainfield, NJ 07062
- (908) 561-3416 or (908) 668-5252
--------------------------------------------------------------
-A separate file, webfont.asc, contains the list of the individual
-non-ASCII characters represented by either higher-order hexadecimal
-character marks (e.g., \'94, for o-umlaut) or by entity tags
-(e.g., <root/, for the square root symbol.)
---------------------------------------------------------------
- Use of tags:
- In the MICRA electronic version of the 1913 Webster, each part of
-the entry headed by an entry word ("headword") is labeled so that no
-part of the entry except some punctuation marks should be found
-outside of all fields, i.e. every character should be within some tagged
-field. In the following description, the word "segment" usually refers to
-a major part of an entry such as an etymology or a definition or a
-collocation segment or a usage block, containing more than one field.
-The term "field" may also be used similarly to "segment", but may also
-denote single-word fields, such as an alternative spelling, labeled <asp>.
-
- Note: The tags on this list are similar in structure to SGML tags. Each
-tag on this list marks a field; each field opens with a tagname between
-angle brackets thus: <tagname>, and closes with a similar tag containing
-the forward slash thus: </tagname>. No tags are used without closing
-tags. Thus the HTML <BR> to indicate a line break is symbolized
-here as an entity, <br/, and every <p> has a corresponding </p>.
- The absence of an end-field tag, or the presence of an end-field tag
-without a prior begin-field tag constitutes a typographical error, of which
-there may be a significant number. Any errors detected should be brought
-to the attention of PJC or the appropriate editor.
- Most of the tagged fields are presented in the text in italic type,
-with a number of exceptions. Where a word is contained within more than
-one field, the innermost field determines the font to be used. Wherever
-recognizable functional fields were found, an attempt was made to tag the
-field with a functional mark, but in many cases, words were italicised only
-to represent the word itself as a discourse entity, and in some such cases,
-the "italic" mark <it> was used, implying nothing regarding functionality
-of the word. The base font is considered "plain". Where an italic field
-is indicated, parentheses or brackets within the field are not italicised.
- Where no font is specified for a tag, the tag is merely a functional
-division, and was printed in plain font unless otherwise tagged. This type
-of segment is marked by an asterisk (*) where the font name would be.
- The size of the "plain" font in the original text is about 1.6 mm for
-the height of capitalized letters.
-=============================================================
-Explicit typographical tags:
- These were used where the purpose of a different font was merely to
-distinguish a word from the body of the text, and no explicit functional
-tag seemed appropriate.
------------------------------------
-Tag Font
------------------------------------
-Explicit formatting tags:
-. . . . . . . . . . . . . . . . . .
-<plain> plain font (that used in the body of a definition) --
- normally not marked, except within fields of
- a different font.
-<it> italic (in master files)
-<i> italic (for use in HTML presentation)
-<bold> bold (in master files)
-<b> bold (for use in HTML presentation)
-<colf>     bold,        Collocation font. Same font as used in collocations.
-           smaller      This is used only in the list of "un-" words not
-           by 1 point   actually defined in the dictionary. Probably could be
-                        replaced by a segment mark for the entire list!
-                        The "un-" words should be indexed as headwords.
-
-<ct> bold Same as <colf>, a font similar to that used in
- collocations. However, this tag is used in a table
- and could be set to a different font.
-
-<h1> * HTML tag -- largest heading font.
-
-<h2> * HTML tag -- second largest heading font.
-
-<headrow> * Marks a Row title in a table.
-
-<hwf> Font the same as the headword <hw>, though the field is
- not a headword. Used only once.
-
-<mitem> * Multiple items, a set of items in a table.
-<point ...> A series of point size markers, many unique.
-<point1.5> * One of the tags of the form <point**> where **
-<point6> represents the typographic point size of the
- enclosed text.
-<pre> An HTML tag indicating that the enclosed text is
- of teletype form, preformatted in a uniform-spaced
- font.
-<sc> small caps (used mostly for "a. d.", "b. c.")
- This is the same font as <er>, but has no functional
- or semantic significance
-<str> group of table data elements in a table
-<sub> subscript, like <subs>
-<subs> subscript
-<sups> superscript
-<supr> superscript
-<sansserif> Sans-serif font
-<stypec> Bold (collocation font) and also a subtype.
-<tt> HTML tag -- teletype font
-<universbold> A squared bold font without serifs approximating the
- "universe bold" font on the HP Laserjet4, slightly
- larger than the capitals in a definition body. Used
- in expositions describing shapes, such as
- "Y", "T", "U", "X", "V", "F".
-<vertical> Vertically organized column.
-<column1> Vertically organized column -- only part of a table
- which needs to be completed. Used once.
-<...type> A series of tags, many unique, designating certain
- unusual fonts, such as "bourgeoistype" for
- "bourgeois type", in the section on typography.
- Most of these occur only once, in the section on fonts.
-<antiquetype>
-<blacklettertype>
-<boldfacetype>
-<bourgeoistype>
-<boxtype>
-<clarendontype>
-<englishtype>
-<extendedtype>
-<frenchelzevirtype>
-<germantype>
-<gothictype>
-<greatprimertype>
-<longprimertype>
-<miniontype>
-<nonpareiltype>
-<oldenglishtype>
-<oldstyletype>
-<pearltype>
-<picatype>
-<scripttype>
-<smpicatype>
-<typewritertype>
-
-=============================================================
-Tags with semantic content:
-. . . . . . . . . . . . . . . . . . . . . . . . . . .
-<altsp> * Alternative spelling segment. Almost always
- contained within square brackets after the main
- definition segment. Expository words
- such as "Spelled also" are in plain font;
- the actual alternative spelling is marked by
- <asp> ... </asp> tags within this segment.
-
-<ant> italic Antonym.
-
-<asp> italic Alternative spelling. The actual word which is an
- alternative spelling to the headword. These
- are functionally synonyms of the headword. In
- most cases these also occur as headwords, with
- reference to the word where the actual definition
- is found, but not all such words are listed
- separately, particularly if the spelling is
- close enough to the headword to be found at the
- same point in the dictionary. Whether listed
- separately or not, these words should
- be indexed at this location, also.
-
-<au>       italic           Authority or author. Used where an authority is
-           (may be right-   given for a definition, and also used for the
-           justified. See   author, where a quotation within double quotes
-           in the section   is given in the same paragraph as the
-           on formatting).  definition. The double quotes are indicated
-                            by the open-quote (\'bd) and close-quote
-                            (\'b8). In both cases, it is typically
-                            right-justified, almost always fitting on
-                            the same line with the last line of the
-                            definition or quotation.
-                            Within collocation segments, it is usually
-                            used only after quotations, and is not right-
-                            justified, except occasionally where it
-                            would be close to the right margin, and then
-                            apparently it is right-justified. We have
-                            not explicitly marked those which are
-                            right-justified, but they can be
-                            recognized because they are on a line by
-                            themselves, preceded by two carriage returns.
-
-<bio> * Marks a biography. Should be longer than
- a short mention of who a person was, which
- is typically included as a definition.
-
-<biography> * Same as <bio>
-
-<booki> italic Marks the name of a book, pamphlet, or similar
- document.
-
-<branchof> * A field of knowledge of which the headword
- is a division.
-
-<caption> * Caption of a figure or table.
-
-<cas> * tags the CAS (Chemical Abstracts Service) registry
- number for a chemical substance.
-
-<causes> italic tags the infectious disease caused by the headword.
- Implied type of the agent is a microorganism, and
- the tag must mark a disease.
-
-<causesp> * Same as <causes> without the italic type.
-<causedbyp> * Same as <causedby> without the italic type.
-
-<causedby> italic inverse of causes: tags the causative agent of an
- infectious disease, which is the headword.
- The tag must mark a microorganism, virus, or
- prion, and the implied type of the headword is
- a disease.
-
-<centered> Used only for the single letter in the headers to each
- letter of the alphabet.
-
-<city> * marks the proper name of a city. Used only
- occasionally and not consistently at this stage.
-
-<cnvto> italic Converted to: used to tag substances which are
- products prepared by conversion from the
- headword. Usually chemicals or complex
- products from natural materials. Rarely used
- up to 1998.
-
-<colheads> * List of heads for the columns of a table.
-
-<coltitle> * Title of a column in a table.
-
-<comm> * Comment -- differs from <note> in being in-line with
- the definition paragraph. Provides a little
- additional information.
-
-<company> * Name of a company (commercial firm). Compare <org>
-
-<compof> italic Composed of. Tags a substance of which the
- headword is at least partly composed. The
- substance may be particulate, such as
- diatoms composing diatomaceous earth.
-
-<contains> * marks an object contained within the headword.
-
-<contr> italic Contrasting word. Not exactly an antonym, which
- is marked <ant>, but a contrasting word which is
- often introduced as "opposite to" or "contrasts
- with".
-
-<country> * Name of a country (nation) of the world.
-
-<cref> italic Collocation reference. A reference to a collocation.
- Each such collocation should have its own entry,
- marked by <col> ... </col> tags, and these
- references should function as hypertext buttons
- to access that entry.
-
-<date> * A Date, of any type, e.g. <date>Dec. 25</date>.
-
-<datey> * Date-with-year tags a date containing a year.
-
-<def> * definition. The definition may have subfields,
- particularly <as> (an illustrative phrase
- starting with "as" or "thus" and containing
- the headword (or a morphological derivative)).
- The <mark>, \'bd...\'b8 quotations (left and
- right double quotes) and <au> fields may be
- found within a definition field, but should be,
- and usually are, located outside the definition
- proper. The marking macro was
- inconsistent in this placement, and the
- exclusion of the <mark>, <au> and quotations
- needs to be completed by the proof-readers.
- Certain definitions contain <pos>
- fields within them, where the headword is
- an irregular derivative of another headword.
- In these cases, the <pos> field follows
- immediately after the <def> tag, and these
- entries do not have a separate <pos> field.
- In such cases, the <pos> field is italic, as
- usual.
-
-<divof> * Division of the headword, usually an organization.
- E. g. a faculty or department of a university,
- or a United Nations agency.
-
-<edi> * Marks an education institution, a subtype of
- organization.
-
-<emits> * tags a physical object or form of radiation
- emitted by the headword
-
-<figure> Just a place-holder for illustrations, but seldom used.
-
-<film> italic Marks the name of a movie film.
-
-<fld> italic Field of specialization. Most often used for
- Zoology and Botany, but many "fields of
- specialization" are marked for technical
- terms. The parentheses are usually within this
- field, but are not themselves in italics.
-
-<geog> * Name of a geographical region of any size;
- if applicable, the more specific <city>,
- <state>, or <country> are preferred.
-
-<hypen> * Hypernym. Points to the hypernym from WordNet 1.5.
- Initially, used only for entries extracted
- from WordNet 1.5. Not present in the original
- 1913 version.
-
-<illu> * Illustrative usage -- mostly from WordNet, and placed
- outside the definition, in contrast to <as> usage.
- These should be converted to <as>...</as> illustrative
- usage format for consistency.
-
-<illust> * Illustration place-holder. Seldom used.
-<img> * HTML usage -- points to an image file, usually
- .gif or .jpg. These have no closing tag, and
- will appear as errors in parsing.
-<intensi> * Points to a word whose meaning is an intensified
- form of the headword. Taken from WordNet
- tags, used with some adjectives from WordNet
-<item> * Designates one item in a row of a table. Used only when
- intervening spaces do not serve properly as natural
- field separators.
-<itran> italic Translation into a foreign (non-English) language
- of the previous word in the text -- italic font.
- (<sig> is a translation into English)
-<itrans> italic Same as <itran>
-<jour> * Title of a journal (periodical).
-<matrix> * Always a filled rectangular array.
-<matrix2x5> * A 2x5 matrix (2 rows by 5 columns).
-<mstypec> * Multiple synonymous subtypes -- used in
- def. of "grass".
-<mtable> * Multiple table, encloses <table> figures.
-<musfig> * Music figure. Only in a note under the entry "Figure",
- the two numbers of each such field
- are bold, 20 point type, stacked as in a fraction with
- a bar between them, but also having a horizontal stroke
- midway through each numeral. Unique to this entry.
-<p> * paragraph tag, used always in pairs. Line breaks may
- be embedded inside the paragraphs.
-<person> * marks the proper name of a person. Used only
- occasionally, but should be used more frequently
- for cases where first names are abbreviated,
- to reduce ambiguity of the period for automatic
- analysis. Where a title is given, prefixed
- or postfixed, it is included in this tag.
-
-<persfn> * marks the name of a person, when only one name
- (usually the last name) is given. Not used
- consistently where it should be.
-
-<publ> * Marks the name of a publication other than book,
- which is marked by <booki>. It is often a
- magazine or journal.
-<qpers> * Tags the name of a person who is speaking,
- within a quotation.
-<qperson> Same as <qpers>
-<cp> * Collocation, plain text -- used to tag phrases that
- should be parsed as a unit, but has no typographical
- significance.
-<qau> italic Always right-justified, as described for <au>.
-<ref> * A reference to a word in the vocabulary.
-<refs> * Marks the set of references used for a longer article
- such as a biography.
-<river> * Marks the name of a river -- a proper name
-<rj> * Right justified
-<row> * Designates a row in a table.
-<state> * Name of a geopolitical state, the first subdivision of
- a country. Includes, e.g. Canadian provinces.
-<subtypes> * Lists subtypes of the headword.
-<sup> * superscript
-<supr> * Supra. The two parts of each such field
- are stacked, one over the other, *without* a
- horizontal bar between (as in a fraction).
- Used only in one entry, for a musical notation.
-<table> * Always a filled rectangular array, having <row> and <item>
- elements.
-<td> * Table datum - one cell in a table
-<th> * Table header
-<tradename> * Tags a commercial Trade name
-<ttitle> * Table title (Larger than normal font)
-====================================================================
-
-Functional Tags
---------------------------------------------------------------------
-Tag Font Meaning
- (Comparatives are relative to the plain font.)
------------------------------------------------------------------------
-<-- --> * Comment, not a tag. These segments should be deleted
- from the written or printed text.
- Page numbers of the original text are indicated
- within such comments; these may be left in, if
- desired.
-
-<! !> * HTML-style comment. Used to indicate page numbers
- in the public domain version.
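The two comment forms above can be removed before display with a single pattern. This is a minimal Python sketch, assuming the delimiters are exactly "<-- ... -->" and "<! ... !>" as written in the table, and that comments do not nest; the name strip_comments is illustrative and not part of any dictionary tool.

```python
import re

# Assumed delimiters, copied from the two table rows above:
#   <-- ... -->   and   <! ... !>
# DOTALL lets a comment span a line break; nesting is assumed not to occur.
COMMENT_RE = re.compile(r"<--.*?-->|<!.*?!>", re.DOTALL)


def strip_comments(text):
    """Return text with both comment styles removed; other markup is untouched."""
    return COMMENT_RE.sub("", text)
```

Since the page numbers of the original text live inside these comments, a reader who wants to keep them would capture the comment bodies first rather than discard them.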
-
-<adjf> small caps Tags for the actual adjective or adverb
- comparatives or superlatives. Should be
- indexed. See also conjf (verbs) and
- decf (nouns).
-
-<altname> italic Alternative name. Usually for plants or animals,
- but also used for other cases where words
- are introduced by "also called", "called also",
- "formerly called". These are functionally
- *synonyms* for that word-sense.
-
-<altnpluf> italic Same as <altname>, but the marked word is a
- plural form, whereas the headword is singular.
-
-<amorph> * Adjective morphological segment, primarily
- the comparative and superlative forms.
- The occasional adverb morphology is
- also tagged this way.
-
-<as> * A segment occurring within the definitional
- sentence, providing an example of usage of
- the headword. Not conceptually a part of the
- actual definition.
-
-<cd> smaller spacing Collocation definition. Similar in structure
- to headword definitions (the <def> field). May
- contain an <as> field. Plain type, but with
- closer spacing than main definitions.
-
-<col> bold, Collocation. A word combination containing the
- smaller by headword (or a morphological derivative).
- 1 point The collocations do not have an explicitly
- marked part of speech.
- See also <ecol>, tagging embedded collocations.
-
-<colp> Collocation, no typographic significance.
- Used to mark a word combination defined in
- the dictionary without effect on font.
-
-<conjf> small caps The conjugated (non-infinitive) forms of
- verbs. imp. & p. p. is common, as well as
- p. pr. & vb. n. Irregular variants of
- these are less common. Words in this
- field perhaps should be indexed.
-
-<cs> smaller Collocation segment. The font and size is
- vertical normal in a cs, but the spacing between lines
- spacing is smaller (0.9 mm between lower-case letters,
- rather than 1.1 mm in the main body of the
- definition). For an on-line dictionary,
- reproducing this typography is probably
- pointless.
-
-<decf> small caps The actual morphological variants of nouns or
- pronouns. Should be indexed.
-
-<ecol> * Embedded Collocation. A word combination
- containing the headword (or a morphological
- derivative), embedded within a definition
- without a separate definition of its own.
- These collocations should be defined
- implicitly by the text of the definition in
- which they are embedded.
- See also <col>, tagging explicitly defined
- collocations.
-<er> Small Caps Entry reference. References to headwords
- within the "etymology" section are in small
- caps. Such references also occur
- in the body of definitions, and in "usage"
- segments.
- Such entry references should function as hypertext
- buttons to access that entry.
-
-<ety> * Etymology. Always contained within square
- brackets. Normal type is used for explanatory
- comments, and italics for the actual words
- (marked <ets>) considered as etymological
- sources.
-
-<ets> italic Etymological source. Words from which the
- headword was derived, or to which it is related.
- The Greek words within an etymology segment
- are invariably etymology sources, and should
- be marked as such, but are not so marked,
- even in the rare cases where the Greek word
- transliteration has been written in.
-
-<etsep> italic Etymological source, being the name of a person
- or geographical location which is the eponym
- for the concept. This is used to distinguish
- eponymous etymologies from others, and can also
- be found in the body of a definition or note,
- not only in the etymology field. Very few
- of the names that should be marked this way
- have actually been so marked, as of version
- 0.42. In cases where such eponymous names
- have not yet been thus marked, they will
- usually be marked by <xex>, the non-semantic
- italic-font marker, or, in etymologies, by
- <ets>.
-
-<ex> italic Example. An example of usage of the headword,
- usually found within an <as> or <note> segment.
-
-<fr> * Frequency of use, ordinal rank. This is used for
- WordNet entries, in which the synonyms
- were ranked in order of frequency of use.
- <fr>1</fr> indicates that the headword is the
- first word on the list of synonyms.
-
-<fu> * First use. A date at or around which the first
- use of this word in writing is recorded.
- Not in the original 1913 Webster, and usu.
- taken from a recent dictionary. Only a few
- such fields have been entered as of version
- 0.41
-
-<grk> transliteration Greek. The Greek words have been transliterated
- using the equivalents explained in the
- file "webfont.asc". In most cases, the
- transliterations are typical for Greek
- letters, except for theta (transl. = q),
- phi (transl. = f), eta (transl. = h), and
- upsilon (transl. = y, whether pronounced
- as y or u). This was to eliminate any
- ambiguity. These words occur primarily
- in etymologies, and to conform to the
- usage of <ets> should also be marked
- by <ets>, but as of version 0.41 they
- are not usually thus marked.
-
-<hw> bold, headword. Each main entry begins with the <hw>
- larger by mark, and ends at the next <hw> mark. The
- 2 points main entries are not otherwise explicitly
- marked as a distinctive field.
- The same word may appear as a headword
- several times, usually as different parts
- of speech, but sometimes with different
- entries as the same part of speech, presumably
- to indicate a different etymology.
- Within the hw field the heavy accent is
- represented by double quote ("), the
- light accent by open-single-quote (`),
- and the short dash separating syllables by
- an asterisk (*). A hyphen (-) is used to
- represent the hyphen of hyphenated words.
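The headword markers described above can be stripped to recover a plain index key. The following Python sketch assumes that the heavy accent ("), the light accent (`) and the syllable dash (*) are the only markers to remove, and that an accent mark also closes a syllable; the function names are hypothetical.

```python
import re

# Markers used inside <hw>: '"' heavy accent, '`' light accent, '*' syllable break.
# A real hyphen (-) is part of the word and is kept.
HW_MARKERS = re.compile(r'[*"`]')


def plain_headword(hw):
    """Drop accent and syllable markers, e.g. for building an index key."""
    return HW_MARKERS.sub("", hw)


def syllables(hw):
    """Split on the markers; assumes an accent mark also ends a syllable."""
    return [part for part in HW_MARKERS.split(hw) if part]
```

For the headword Pa*ron"y*mous this gives the key "Paronymous" and the parts Pa, ron, y, mous.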
-
-<mark> italic, Usage mark. Almost always within square
- brackets, occasionally in parentheses or
- without any bracketing.
- but The most common usage marks,
- explanatory "Obs." = obsolete "R." = rare, "Colloq." =
- may be plain. colloquial, "Prov. Eng." = Provincial England,
- etc. are in italics. Some usage notes are also
- marked with <mark>, but are in plain. For
- simplicity, all words in this field may be
- italic, until additional explicit marks are
- added.
-
-<markp> * A usage mark in plain type (not italic). Found
- within a definition, when there is more than
- one sense-number listed. "Fig." at the head
- of an entry is the most common case.
-
-<mcol> * Multiple collocation. Similar to multiple
- headword, when two or more collocations share
- one definition; however, the two collocations
- are in-line, rather than stacked or justified.
- There may be "or" or "and" words
- (italicised), or an "etc." (plain type)
- within this field. In many cases, the
- <or/ and <and/ entities are used to
- signify the change of font for these words.
-
-<mhw> * Multiple headword. This field is used where
- more than one headword shares a single
- definition. In the dictionary, the
- (usually) two headwords are left-justified
- one below the other in the column, and are
- tied together on the right side of the
- headwords by a long right curly brace.
- This division is strictly functional,
- for analytical purposes, and does not
- affect the typography.
-
-<nmorph> * Noun morphology section. Rarely used, mostly
- for irregular personal pronouns.
-
-<note> * Explanatory note. No explicit font is indicated.
- These segments may be separate, as in the
- separate paragraphs starting <note><hand/,
- or they may just be further explanation within
- (or more usually, following) the main
- definition paragraph. Typographically,
- the notes following the main definition may
- not be distinguishable from additional
- sentences appended to the first sentence
- of a definition.
-
-<plu> * Plural. The "plural" segment starts with a
- "pl." which is italicised, but in this
- segment is not otherwise marked as
- italicised. Other words occurring in this
- segment are plain type. The "pl." can be
- easily explicitly marked if necessary.
-
-<pos> italic Part of speech. Always an abbreviation: e.g.,
- n.; v. i.; v. t.; a.; adv.; pron.; prep.
- Combinations may occur, as "a. & n.".
-
-<plw> small caps Plural word. The actual plural form of the word,
- found within a <plu> segment.
-
-<pr> * pronunciation. The default font is normal, but
- many non-ASCII characters are used.
- The pronunciation field may have more than
- one pronunciation, separated by an "<or/".
- (An "or" here is in italic, and usually is
- represented by the entity <or/).
- There may also be some commentary, such as
- "Fr."(French pronunciation) or "archaic".
- The commentaries are typically italic, and
- should be marked as such. In certain
- pronunciations there is a numbered reference
- to a root form explained in an introductory
- section on pronunciation.
- Very few of the pronunciation fields have
- been filled in. The pronunciation markings use
- a more complicated method than more modern
- dictionaries. It would be interesting to have
- these fields filled in, if there are any
- volunteers willing to do it.
-
-<q> smaller by Quotation. No bracketing quotation marks,
- two points, though occasionally \'bd-\'b8 quotations occur
- centered, within these quotations. These quotations
- Separate tend to be more complete sentences, rather
- paragraph than just phrases, such as are contained
- within quotation marks within the definition
- paragraph.
-
-<qau> italic, Quotation author. Used only for the quotations
- right justified marked with <q> that are centered in their
- own paragraphs.
-
-<qex> italic Quotation example. An example of usage of
- the headword, within quotations marked
- by <q>..</q> tags.
-
-<sd> italic Subdefinition, marked (a), (b), (c), etc. These are
- finer distinctions of word senses, used
- within numbered word-sense (for main entries),
- and also used for subdefinitions within
- collocation segments, which have no numbering of
- senses. The letter is italic, the parentheses
- are not. This tag is also used to indicate the
- lettered subdefinition when it is referred to
- at another point in the text.
-
-<ship> italic The name of a ship. Rarely used.
-
-<sing> * Singular. Analogous to the <plu> segment, but more
- rarely used, mostly for Indian tribes, which
- are listed in the plural form.
-
-<singw> small caps Singular word. The singular form of the
- plural-form headword.
-
-<sn> bold, Sense number. A headword may have over 20
- larger by different sense numbers. Within each numbered
- 2 points sense there may be lettered sub-senses. See
- the <sd> (sub-definition) field.
-
-<source> italic Source. The author of the definition. Used only
- for definitions not originally present in
- Webster 1913, and not present in the original
- version intended to mimic the 1913 printed
- dictionary. This source is used for each
- word sense, and may differ for different
- senses of a word, especially where a Web1913
- definition was substantially modified, or a
- new word sense was added to a previously
- defined word.
-
-<syn> plain Synonyms. A list of synonyms, sometimes followed
- by a <usage> segment.
-
-<usage> narrower Comparisons of word usage for words which are
- spacing sometimes confused. As with collocation segments,
- font is plain, but spacing is smaller than
- normal definition spacing. This seems pointlessly
- complicating for an on-line display.
-
-<vmorph> * Verb morphology (conjugation) segment, delimited
- by square brackets.
-
-<wordforms> * Morphological derivatives not contained in the
- bracketed segments, as above. For nouns
- derived from adjectives, adverbs from
- adjectives, etc. This segment is usually
- found at the end of the main entry. The
- adverbial and nominalized derivatives at the
- end of a main entry are usually introduced
- by an em dash [represented as two hyphens (--)].
-
-<wf> bold, Same font as <hw>, with accents and syllable
- larger by breaks marked as in the headword.
- 2 points Marks the actual morphological forms within
- a <wordforms> segment; typically, adverbial or
- nominalized form of an adjective.
-
-
-<def2> * Second definition (occasionally, a third definition is
- present). This is used where a second or third
- part of speech with the same orthography is
- placed under one headword. Within this segment,
- there will be a <pos> field, and sometimes
- a <mark> and/or a quotation.
-
-<specif> * "Specifically:" Used to mark the words "specifically",
- "Hence", "as" which are used to introduce a second
- definition typically more specific than the first,
- but in general derived by extension of the initial
- definition. This functions as a warning of multiple
- definitions where the sense-numbers are not explicitly
- used. It is also useful in separate senses, to
- tag polysemous definitions which may be
- specializations or generalizations of the preceding
- definition.
-
-<pluf> italic. Plural form.
- Used exclusively to mark the "pl." abbreviation,
- which introduces a definition for the headword,
- *when used in the plural form*. Not related to
- <plu>, which spells out the plural form, but does
- not define it.
-
-<uex> italic Usage example. Used only a few times, within
- <usage> segments.
-
-<isa> italic supertype (hypernym) the inverse of <stype> and
- identical to <hypen> but not derived from WordNet.
-
-<chform> plain, Chemical formula. The letters are plain font,
- numbers but the numbers are subscript. This is mostly
- subscript useful as a functional mark to pinpoint
- chemicals.
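The rule "letters plain, numbers subscript" can be sketched for an on-line display as follows. This Python example assumes HTML output and that every run of digits inside a <chform> field is a subscript; chform_to_html is an illustrative name and is not the tag-converter program mentioned in this file.

```python
import re


def chform_to_html(formula):
    """Wrap each run of digits in <sub>...</sub>; letters are left in plain font."""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)
```

Formulas that carry a tag inside them are marked <chformi> instead and would need separate handling.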
-
-<chformi> plain, Chemical formula same as <chform>, but not
- processed specially by the tag-converter program.
- The letters are plain font, but the numbers are
- subscript.
- Used in place of <chform> when the formula has
- a tag inside, which cannot now be processed by the
- <chform> processing routine.
-
-<chname> * chemical name. Used to allow a IUPAC chemical
- name to be processed as a unit in spite of
- embedded dashes, parentheses, and commas.
-
-<see> * "see" reference to related words, outside of the
- main <def>definition</def> field.
-
-<mathex> italic Mathematical expression. In this dictionary,
- essentially all letters (used as variable labels)
- in math expressions are in italic font.
- The "+" and "-" may also appear typographically
- different from elsewhere in the dictionary.
-
-<ratio> italic Also a mathematical expression, but the colon and
- double colon may have a different typography
- than usual, as in <ratio>a:b</ratio>.
-
-<singf> italic Singular form. Analogous to <pluf>, to define
- the singular word where the headword is the
- plural form. ** only modifies the word "sing."
-
-<mord> * Morphological derivation. Used to mark the
- entry-reference portions of those
- entries which are defined as morphological
- derivatives (plural, p. p., imp.) of other
- headwords. Used just as an attempt to
- mark and regularize the entry format.
- May be ignored typographically.
-
-<fract> a stack, Fraction. Used for non-numerical fractions
- with which cannot be expressed as a <frac12/-style
- numerator, entity. The forward slash "/" is to be
- horizontal interpreted as a horizontal line separating
- bar, and the numerator and denominator.
- denominator
-
-<exp> superscript, Exponential. Used in mathematical expressions.
- smaller
- font.
-
-<xlati> italic Translation (e.g. of Greek), in the body of a
- definition or etymology. Used only twice.
-
-<tran> italic Word translated: the word in italic is translated
- by a subsequent word. Usually in etymologies, where
- the word translated is not actually etymologically
- related to the headword. The translated word
- is not necessarily English.
-
-<tr> italic translation of the preceding word (or of the
- headword) into English.
-
-<fexp> * Functional expression (math). The function names are
- in plain type, the variables are italic.
-
-<iref> italic Illustration reference. Used only occasionally, not
- yet (v. 0.41) consistently.
-
-<figref> italic Figure reference.
-
-<figcap> * Figure caption.
-
-<figtitle> * Figure title.
-
-<funct> * tags a mathematical function or expression.
-
-<chreact> * Chemical reaction. Similar to chemical formulas (which
- are contained but not explicitly marked), with
- some other symbols.
-
-<ptcl> italic Verb Particle. Only a few particles were actually
- marked, but in a future version more may be.
-
-<tabtitle> ? Table Title. Used only once.
-
-<title> italic Title of a literary work, movie, opera, musical
- composition, etc. Used rarely but should be
- used in every case, except in <au> references.
-
-<root> * Square root -- differs from the entity <root/,
- which is a square root sign that does not extend
- beyond the number following it. The <root>
- field has a bar (vinculum) over the expression
- within the field, as well as the square root symbol
- preceding the expression in the field. Used only
- once.
-
-<vinc> * Vinculum. In a mathematical expression, a bar
- extending over the expression within the field.
- Used only once. This apparently serves the same
- function as parentheses, of causing the
- expression within the field to be evaluated
- and the result used as the (mathematical) value
- of the field.
-
-<nul> plain Nultype. An older version of <plain>.
-
-<cd2> * Second collocation definition. Somewhat similar to
- <def2>. Purely a mark to reduce functional ambiguity,
- with no effect on the typography.
-
-<hypen> * Hypernym. Mark introduced for the World Wide Webster,
- when adding words from WordNet. In most cases, this
- tag marks the WordNet hypernym (for nouns and verbs).
- Where the <au> mark is PJC or includes a +PJC, the
- hypernym may not be the same as in WordNet. The words
- marked by this tag need to be bracketed in some way,
- but this is deferred until the definitions included
- with the hypernyms have been deleted, and other
- disambiguating marks substituted.
-
-<stype> italic Subtype. A functional mark, to point out words which
- are conceptually subtypes of the headword.
-
-<styp> * Subtype. A functional mark, to point out words which
- are conceptually subtypes of the headword, but
- with no *typographical* significance.
-
-<simto> * Similar-to. A semantic relational mark for
- closely related words which are not quite
- synonyms, nor hypernyms, nor hyponyms. Introduced
- with WordNet data.
-
-<conseq> * Consequence. For adjectives, is an attribute which
- or is a consequence of possessing the headword attribute.
-<hascons> Introduced with WordNet data.
-
-<consof> * Consequence of. For adjectives, an attribute which
- implies the headword as a natural consequence.
-
-<part> italic Part. Marks a word designating something which is
- conceptually a part of the headword. Rarely used.
-
-<parts> italic Part, plural form. Same as <part>, but marks the
- name of the part in its plural form.
-
-<partof> * Marks a word designating something of which the headword
- is conceptually a part. Inverse of <part>.
- This is very broad, and may mean constituent or
- separable part.
- Rarely used.
-
-<contxt> * Context. Used only for introductions to definitions,
- giving the context of usage, which are not part
- of the definition proper, as:
- <contxt>when used of a person:</contxt>
-
-<grp> * Marks the name of a group of people not formally
- organized.
-
-<membof> italic marks a group of which the headword is a member.
- This is rarely used, but should be indexed as
- an entry word or phrase.
-
-<member> italic marks a member of a group defined by the headword.
- This is rarely used, but should be indexed as
- an entry word or phrase.
-
-<members> italic Same as <member>, but marks a plural word,
- designating the name of the members in its plural form,
- for lack of ambiguity.
-
-<method> * Designates a special type of definition which
- describes a method for achieving the headword,
- used only once for the word "amend". The
- subdefinitions begin with "by".
-
-<corpn> * Name of a business company, corporation, or partnership.
- Started using November 1988. Rare.
-
-<corr> italic Correlative. A word intimately associated with the
- headword in a manner such that one cannot
- appear without the other. Not exactly an inverse.
-
-<qperson> italic marks the name of a person, quoted in a dialogue.
- Used only in <q> blockquotes as of vers. 0.45.
-
-<org> * marks the name of an organization; sometimes used
- for the names of groups of people not
- formally organized (see also <grp>).
-
-<prod> italic produces. Designates a substance produced by
- a living organism. Rarely used.
-
-<prodp> * produces (plainfont). Designates a substance
- produced by a living organism. Same as <prod>,
- but does not affect font. Rarely used.
-
-<prodby> * produced by. Designates a living organism which
- produces the headword substance. Rarely used.
-
-<prodmac> italic produces. Designates an object or substance produced
- by a machine or process. Rarely used.
-
-<stage> italic life stage of an organism. Used to indicate
- variant forms of an organism defined by the
- headword. Rarely used.
-
-<stageof> * an organism one of whose life stages is the headword.
- Inverse (correlative) of <stage>. Rarely used.
-
-<inv> italic inversely related to headword -- e.g. depository
- is the inverse of depositor; buyer is the inverse of
- seller. Called "correlative" in the Webster 1913 and
- the CIDE. Rarely used.
-
-<methodfor> italic is a method to accomplish the action defined by
- the headword. Rarely used, and only in the
- supplemental section.
-
-<examp> italic example or instance of the headword, where the
- tagged and emphasized word is not a proper subtype.
---------------------------------------
-<p><hw>Pa*ron"y*mous</hw> <p><sn>2.</sn> <def>Having a similar sound, but different orthography and different meaning; -- said of certain words, as <examp>all</examp> and <examp>awl</examp>; <examp>hair</examp> and <examp>hare</examp>, etc.</def><br/
-[<source>1913 Webster</source>]</p>
--------------------------------------
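The sample entry above shows how a functional tag can be pulled out of a definition. The Python sketch below collects the contents of one tag; it assumes fields of the same tag do not nest and that the entry text is already in memory. The helper name tagged and the shortened SAMPLE string are illustrative only.

```python
import re

# The <def> field of the sample entry above, shortened to the part that matters.
SAMPLE = (
    "<def>Having a similar sound, but different orthography and different "
    "meaning; -- said of certain words, as <examp>all</examp> and "
    "<examp>awl</examp>; <examp>hair</examp> and <examp>hare</examp>, etc.</def>"
)


def tagged(tag, text):
    """Return the contents of every <tag>...</tag> field, in document order."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    return re.findall(pattern, text, flags=re.DOTALL)
```

Calling tagged("examp", SAMPLE) yields the four example words in order: all, awl, hair, hare.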
-
-<sfield> * subfield of the headword, which must be a field
- of study or of knowledge
-<stage> italic a stage of life of the headword -- for living things,
- such as insects, whose life stages may take different
- names.
-
-<unit> italic a unit of measure, usually preceded by a number.
- Also used to tag the unit of a measure which is the
- headword.
-
-<uses> italic tags a tool or method used by the headword,
- which is usually some process.
-
-<usedfor> * tags a method or process for which the headword
- is a tool.
-
-<usedby> italic tags a tool or method which uses the headword,
- which is usually a physical object.
-
-<perf> italic performs -- tags a word which is a process or
- activity performed by the headword.
-
-<recipr> italic reciprocal -- used for cases where the tagged word
- is a reciprocal participant in an action, such as
- donor and recipient. The difference between this and
- <inv> inverse has not yet been systematically settled.
- Used seldom, and mostly in the supplemented version.
-
-<sig> italic significance, meaning -- used in definitions where the
- actual meaning is prefixed with commentary explaining
- usage or other attributes of the word, as with
- prefixes or suffixes.
-
-<wns> italic WordNet sense. Where known, the correspondence of the
- sense of an entry with that of WordNet 1.6 is
- given after the definition, in a tag of the
- form: <wns>[wns=3]</wns>, in which the number
- is the numbered sense in WordNet.
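The <wns> form shown above is regular enough to read with one pattern. This Python sketch assumes the field always has the exact shape <wns>[wns=N]</wns> with N a decimal number; wordnet_senses is a hypothetical helper name.

```python
import re

# Matches the documented form <wns>[wns=3]</wns> and captures the sense number.
WNS_RE = re.compile(r"<wns>\[wns=(\d+)\]</wns>")


def wordnet_senses(text):
    """Return the WordNet sense numbers found in text, as integers."""
    return [int(number) for number in WNS_RE.findall(text)]
```

Fields that deviate from this shape are simply not matched, so a caller can compare the count of <wns> tags with the length of the result to detect them.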
-
-<w16ns> italic WordNet version 1.6 sense. See <wns> for
- explanation.
-<wnote> * A note related to usage in the corresponding
- WordNet definition.
- =============================================================
-Biological classifications:
----------------------------
-<spn> italic Species name. Used to mark the taxonomic names
- of living things which are represented in
- italic font in the original printed version.
- Originally, not only species, but genera, orders and
- families were also thus marked. The conversion from
- <spn> to <fam>, <gen>, or <ord> is not completed, and
- <spn> may still be found marking such groups.
- However, orders and families are also frequently
- mentioned in the original in normal font, and in such
- cases are not marked with any tag. So, this mark
- is not a reliable indicator of all mentions of
- taxonomic names.
-<kingdom> italic Taxonomic biological Kingdom name.
-<phylum> italic Taxonomic phylum name.
-<subphylum> italic Taxonomic subphylum name.
-<class> italic Taxonomic class name.
-<subclass> italic Taxonomic subclass name.
-<ord> italic Taxonomic order name.
- Also used for suborders, initially.
-<subord> italic Taxonomic suborder name.
-<suborder> italic Taxonomic suborder name.
-<fam> italic Taxonomic family name. Also used to tag "tribes".
-<subfam> italic Taxonomic subfamily name.
-<gen> italic Taxonomic genus name.
-<var> italic Variety. Used to mark subspecies or varieties below
- the level of species in living organism systematic
- names.
-
-<varn> italic Variety. Used to mark subspecies or varieties below
- the level of species in living organism systematic
- names. Duplicative variant of <var>
-
-
+ FIELD MARKS FOR WEBSTER 1913 and CIDE
+ =====================================
+Tagset.web:
+ Explanations of the tags used to mark the Webster 1913 dictionary
+and the CIDE (Collaborative International Dictionary of English).
+Note that the list of tags used to mark the public domain version
+of this dictionary is shorter than the full set described here.
+ If any tag is not listed here, it is either (1) one of the
+"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
+ (2) a functional field with no effect on the typography.
+
+Last modified March 12, 1999.
+ For questions, contact:
+ Patrick Cassidy cassidy@micra.com
+ 735 Belvidere Ave.
+ Plainfield, NJ 07062
+ (908) 561-3416 or (908) 668-5252
+-------------------------------------------------------------
+A separate file, webfont.asc, contains the list of the individual
+non-ASCII characters represented by either higher-order hexadecimal
+character marks (e.g., \'94, for o-umlaut) or by entity tags
+(e.g., <root/, for the square root symbol.)
+--------------------------------------------------------------
+ Use of tags:
+ In the MICRA electronic version of the 1913 Webster, each part of
+the entry headed by an entry word ("headword") is labeled so that no
+part of the entry except some punctuation marks should be found
+outside of all fields, i.e. every character should be within some tagged
+field. In the following description, the word "segment" usually refers to
+a major part of an entry such as an etymology or a definition or a
+collocation segment or a usage block, containing more than one field.
+The term "field" may also be used similarly to "segment", but may also
+denote single-word fields, such as an alternative spelling, labeled <asp>.
+
+ Note: The tags on this list are similar in structure to SGML tags.
+Each tag on this list marks a field; each field opens with a tagname between
+angle brackets thus: <tagname>, and closes with a similar tag containing
+the forward slash thus: </tagname>. No tags are used without closing
+tags. Thus the HTML <BR> to indicate a line break is symbolized
+here as an entity, <br/, and every <p> has a corresponding </p>.
+ The absence of an end-field tag, or the presence of an end-field tag
+without a prior begin-field tag constitutes a typographical error, of which
+there may be a significant number. Any errors detected should be brought
+to the attention of PJC or the appropriate editor.
+ Most of the tagged fields are presented in the text in italic type,
+with a number of exceptions. Where a word is contained within more than
+one field, the innermost field determines the font to be used. Wherever
+recognizable functional fields were found, an attempt was made to tag the
+field with a functional mark, but in many cases, words were italicised only
+to represent the word itself as a discourse entity, and in some such cases,
+the "italic" mark <it> was used, implying nothing regarding functionality
+of the word. The base font is considered "plain". Where an italic field
+is indicated, parentheses or brackets within the field are not italicised.
+ Where no font is specified for a tag, the tag is merely a functional
+division, and was printed in plain font unless otherwise tagged. This type
+of segment is marked by an asterisk (*) where the font name would be.
+ The size of the "plain" font in the original text is about 1.6 mm for
+the height of capitalized letters.
+=============================================================
+Explicit typographical tags:
+ These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional
+tag seemed appropriate.
+-----------------------------------
+Tag Font
+-----------------------------------
+Explicit formatting tags:
+. . . . . . . . . . . . . . . . . .
+<plain> plain font (that used in the body of a definition) --
+ normally not marked, except within fields of
+ a different font.
+<it> italic (in master files)
+<i> italic (for use in HTML presentation)
+<bold> bold (in master files)
+<b> bold (for use in HTML presentation)
+<colf> bold, Collocation font. Same font as used in collocations.
+ smaller This is used only in the list of "un-" words not
+ by 1 point actually defined in the dictionary. Probably could be
+ replaced by a segment mark for the entire list!
+ The "un-" words should be indexed as headwords.
+
+<ct> bold Same as <colf>, a font similar to that used in
+ collocations. However, this tag is used in a table
+ and could be set to a different font.
+
+<h1> * HTML tag -- largest heading font.
+
+<h2> * HTML tag -- second largest heading font.
+
+<headrow> * Marks a Row title in a table.
+
+<hwf> Font the same as the headword <hw>, though the field is
+ not a headword. Used only once.
+
+<mitem> * Multiple items, a set of items in a table.
+<point ...> A series of point size markers, many unique.
+<point1.5> * One of the tags of the form <point**> where **
+<point6> represents the typographic point size of the
+ enclosed text.
+<pre> An HTML tag indicating that the enclosed text is
+ of teletype form, preformatted in a uniform-spaced
+ font.
+<sc> small caps (used mostly for "a. d.", "b. c.")
+ This is the same font as <er>, but has no functional
+ or semantic significance
+<str> group of table data elements in a table
+<sub> subscript, like <subs>
+<subs> subscript
+<sups> superscript
+<supr> superscript
+<sansserif> Sans-serif font
+<stypec> Bold (collocation font) and also a subtype.
+<tt> HTML tag -- teletype font
+<universbold> A squared bold font without serifs approximating the
+ "universe bold" font on the HP Laserjet4, slightly
+ larger than the capitals in a definition body. Used
+ in expositions describing shapes, such as
+ "Y", "T", "U", "X", "V", "F".
+<vertical> Vertically organized column.
+<column1> Vertically organized column -- only part of a table
+ which needs to be completed. Used once.
+<...type> A series of tags, many unique, designating certain
+ unusual fonts, such as "bourgeoistype" for
+ "bourgeois type", in the section on typography.
+ Most of these occur only once, in the section on fonts.
+<antiquetype>
+<blacklettertype>
+<boldfacetype>
+<bourgeoistype>
+<boxtype>
+<clarendontype>
+<englishtype>
+<extendedtype>
+<frenchelzevirtype>
+<germantype>
+<gothictype>
+<greatprimertype>
+<longprimertype>
+<miniontype>
+<nonpareiltype>
+<oldenglishtype>
+<oldstyletype>
+<pearltype>
+<picatype>
+<scripttype>
+<smpicatype>
+<typewritertype>
+
+=============================================================
+Tags with semantic content:
+. . . . . . . . . . . . . . . . . . . . . . . . . . .
+<altsp> * Alternative spelling segment. Almost always
+ contained within square brackets after the main
+ definition segment. Expository words
+ such as "Spelled also" are in plain font;
+ the actual alternative spelling is marked by
+ <asp> ... </asp> tags within this segment.
+
+<ant> italic Antonym.
+
+<asp> italic Alternative spelling. The actual word which is an
+ alternative spelling to the headword. These
+ are functionally synonyms of the headword. In
+ most cases these also occur as headwords, with
+ reference to the word where the actual definition
+ is found, but not all such words are listed
+ separately, particularly if the spelling is
+ close enough to the headword to be found at the
+ same point in the dictionary. Whether listed
+ separately or not, these words should
+ be indexed at this location, also.
+
+<au> italic Authority or author. Used where an authority is
+ (may be right- given for a definition, and also used for the
+ justified. See author, where a quotation within double quotes
+ in the section is given in the same paragraph as the
+ on formatting). definition.
+ The double quotes are indicated
+ by the open-quote (\'bd) and close-quote
+ (\'b8). In both cases, it is typically
+ right-justified, almost always fitting on
+ the same line with the last line of the
+ definition or quotation.
+ Within collocation segments, it is usually
+ used only after quotations, and is not right-
+ justified, except occasionally where it
+ would be close to the right margin, and then
+ apparently it is right-justified. We have
+ not explicitly marked those which are
+ right-justified, but they can be
+ recognized because they are on a line by
+ themselves, preceded by two carriage returns.
+
+<bio> * Marks a biography. Should be longer than
+ a short mention of who a person was, which
+ is typically included as a definition.
+
+<biography> * Same as <bio>
+
+<booki> italic Marks the name of a book, pamphlet, or similar
+ document.
+
+<branchof> * A field of knowledge of which the headword
+ is a division.
+
+<caption> * Caption of a figure or table.
+
+<cas> * tags the CAS (Chemical Abstracts Service) registry
+ number for a chemical substance.
+
+<causes> italic tags the infectious disease caused by the headword.
+ Implied type of the agent is a microorganism, and
+ the tag must mark a disease.
+
+<causesp> * Same as <causes> without the italic type.
+<causedbyp> * Same as <causedby> without the italic type.
+
+<causedby> italic inverse of causes: tags the causative agent of an
+ infectious disease, which is the headword.
+ The tag must mark a microorganism, virus, or
+ prion, and the implied type of the headword is
+ a disease.
+
+<centered> Used only for the single letter in the headers to each
+ letter of the alphabet.
+
+<city> * marks the proper name of a city. Used only
+ occasionally and not consistently at this stage.
+
+<cnvto> italic Converted to: used to tag substances which are
+ products prepared by conversion from the
+ headword. Usually chemicals or complex
+ products from natural materials. Rarely used
+ up to 1998.
+
+<colheads> * List of heads for the columns of a table.
+
+<coltitle> * Title of a column in a table.
+
+<comm> * Comment -- differs from <note> in being in-line with
+ the definition paragraph. Provides a little
+ additional information.
+
+<company> * Name of a company (commercial firm). Compare <org>
+
+<compof> italic Composed of. Tags a substance of which the
+ headword is at least partly composed. The
+ substance may be particulate, such as
+ diatoms composing diatomaceous earth.
+
+<contains> * marks an object contained within the headword.
+
+<contr> italic Contrasting word. Not exactly an antonym, which
+ is marked <ant>, but a contrasting word which is
+ often introduced as "opposite to" or "contrasts
+ with".
+
+<country> * Name of a country (nation) of the world.
+
+<cref> italic Collocation reference. A reference to a collocation.
+ Each such collocation should have its own entry,
+ marked by <col> ... </col> tags, and these
+ references should function as hypertext buttons
+ to access that entry.
+
+<date> * A Date, of any type, e.g. <date>Dec. 25</date>.
+
+<datey> * Date-with-year tags a date containing a year.
+
+<def> * definition. The definition may have subfields,
+ particularly <as> (an illustrative phrase
+ starting with "as" or "thus" and containing
+ the headword (or a morphological derivative)).
+ The <mark>, \'bd...\'b8 quotations (left and
+ right double quotes) and <au> fields may be
+ found within a definition field, but should be,
+ and usually are, located outside the definition
+ proper. The marking macro was
+ inconsistent in this placement, and the
+ exclusion of the <mark>, <au> and quotations
+ needs to be completed by the proof-readers.
+ Certain definitions contain <pos>
+ fields within them, where the headword is
+ an irregular derivative of another headword.
+ In these cases, the <pos> field follows
+ immediately after the <def> tag, and these
+ entries do not have a separate <pos> field.
+ In such cases, the <pos> field is italic, as
+ usual.
+
+<divof> * Division of the headword, usually an organization.
+ E. g. a faculty or department of a university,
+ or a United Nations agency.
+
+<edi> * Marks an education institution, a subtype of
+ organization.
+
+<emits> * tags a physical object or form of radiation
+ emitted by the headword
+
+<figure> Just a place-holder for illustrations, but seldom used.
+
+<film> italic Marks the name of a movie film.
+
+<fld> italic Field of specialization. Most often used for
+ Zoology and Botany, but many "fields of
+ specialization" are marked for technical
+ terms. The parentheses are usually within this
+ field, but are not themselves in italics.
+
+<geog> * Name of a geographical region of any size;
+ if applicable, the more specific <city>,
+ <state>, or <country> are preferred.
+
+<hypen> * Hypernym. Points to the hypernym from WordNet 1.5.
+ Initially, used only for entries extracted
+ from WordNet 1.5. Not present in the original
+ 1913 version.
+
+<illu> * Illustrative usage -- mostly from WordNet, and placed
+ outside the definition, in contrast to <as> usage.
+ These should be converted to <as>...</as> illustrative
+ usage format for consistency.
+
+<illust> * Illustration place-holder. Seldom used.
+<img> * HTML usage -- points to an image file, usually
+ .gif or .jpg. These have no closing tag, and
+ will appear as errors in parsing.
+<intensi> * Points to a word whose meaning is an intensified
+ form of the headword. Taken from WordNet
+ tags, used with some adjectives from WordNet
+<item> * Designates one item in a row of a table. Used only when
+ intervening spaces do not serve properly as natural
+ field separators.
+<itran> italic Translation into a foreign (non-English) language
+ of the previous word in the text -- italic font.
+ (<sig> is a translation into English)
+<itrans> italic Same as <itran>
+<jour> * Title of a journal (periodical).
+<matrix> * Always a filled rectangular array.
+<matrix2x5> * A 2x5 matrix (2 rows by 5 columns).
+<mstypec> * Multiple synonymous subtypes -- used in
+ def. of "grass".
+<mtable> * Multiple table, encloses <table> figures.
+<musfig> * Music figure. Only in a note under the entry "Figure",
+ the two numbers of each such field
+ are bold, 20 point type, stacked as in a fraction with
+ a bar between them, but also having a horizontal stroke
+ midway through each numeral. Unique to this entry.
+<p> * paragraph tag, used always in pairs. Line breaks may
+ be embedded inside the paragraphs.
+<person> * marks the proper name of a person. Used only
+ occasionally, but should be used more frequently
+ for cases where first names are abbreviated,
+ to reduce ambiguity of the period for automatic
+ analysis. Where a title is given, prefixed
+ or postfixed, it is included in this tag.
+
+<persfn> * marks the name of a person, when only one name
+ (usually the last name) is given. Not used
+ consistently where it should be.
+
+<publ> * Marks the name of a publication other than book,
+ which is marked by <booki>. It is often a
+ magazine or journal.
+<qpers> * Tags the name of a person who is speaking,
+ within a quotation.
+<qperson> Same as <qpers>
+<cp> * Collocation, plain text -- used to tag phrases that
+ should be parsed as a unit, but has no typographical
+ significance.
+<qau> italic Always right-justified, as described for <au>.
+<ref> * A reference to a word in the vocabulary.
+<refs> * Marks the set of references used for a longer article
+ such as a biography.
+<river> * Marks the name of a river -- a proper name
+<rj> * Right justified
+<row> * Designates a row in a table.
+<state> * Name of a geopolitical state, the first subdivision of
+ a country. Includes, e.g. Canadian provinces.
+<subtypes> * Lists subtypes of the headword.
+<sup> * superscript
+<supr> * Supra. The two parts of each such field
+ are stacked, one over the other, *without* a
+ horizontal bar between (as in a fraction).
+ Used only in one entry, for a musical notation.
+<table> * Always a filled rectangular array, having <row> and <item>
+ elements.
+<td> * Table datum - one cell in a table
+<th> * Table header
+<tradename> * Tags a commercial Trade name
+<ttitle> * Table title (Larger than normal font)
+====================================================================
+
+Functional Tags
+--------------------------------------------------------------------
+Tag Font Meaning
+ (Comparatives are relative to the plain font.)
+-----------------------------------------------------------------------
+<-- --> * Comment, not a tag. These segments should be deleted
+ from the written or printed text.
+ Page numbers of the original text are indicated
+ within such comments; these may be left in, if
+ desired.
+
+<! !> * HTML-style comment. Used to indicate page numbers
+ in the public domain version.
+
+<adjf> small caps Tags for the actual adjective or adverb
+ comparatives or superlatives. Should be
+ indexed. See also conjf (verbs) and
+ decf (nouns).
+
+<altname> italic Alternative name. Usually for plants or animals,
+ but also used for other cases where words
+ are introduced by "also called", "called also",
+ "formerly called". These are functionally
+ *synonyms* for that word-sense.
+
+<altnpluf> italic Same as <altname>, but the marked word is a
+ plural form, whereas the headword is singular.
+
+<amorph> * Adjective morphological segment, primarily
+ the comparative and superlative forms.
+ The occasional adverb morphology is
+ also tagged this way.
+
+<as> * A segment occurring within the definitional
+ sentence, providing an example of usage of
+ the headword. Not conceptually a part of the
+ actual definition.
+
+<cd> smaller spacing Collocation definition. Similar in structure
+ to headword definitions (the <def> field). May
+ contain an <as> field. Plain type, but with
+ closer spacing than main definitions.
+
+<col> bold, Collocation.
+ A word combination containing the
+ smaller by headword (or a morphological derivative).
+ 1 point The collocations do not have an explicitly
+ marked part of speech.
+ See also <ecol>, tagging embedded collocations.
+
+<colp> Collocation, no typographic significance.
+ Used to mark a word combination defined in
+ the dictionary without effect on font.
+
+<conjf> small caps The conjugated (non-infinitive) forms of
+ verbs. imp. & p. p. is common, as well as
+ p. pr. & vb. n. Irregular variants of
+ these are less common. Words in this
+ field perhaps should be indexed.
+
+<cs> smaller Collocation segment. The font and size is
+ vertical normal in a cs, but the spacing between lines
+ spacing is smaller (0.9 mm between lower-case letters,
+ rather than 1.1 mm in the main body of the
+ definition). For an on-line dictionary,
+ reproducing this typography is probably
+ pointless.
+
+<decf> small caps The actual morphological variants of nouns or
+ pronouns. Should be indexed.
+
+<ecol> * Embedded Collocation. A word combination
+ containing the headword (or a morphological
+ derivative), embedded within a definition
+ without a separate definition of its own.
+ These collocations should be defined
+ implicitly by the text of the definition in
+ which they are embedded.
+ See also <col>, tagging explicitly defined
+ collocations.
+<er> Small Caps Entry reference. References to headwords
+ within the "etymology" section are in small
+ caps. Such references also occur
+ in the body of definitions, and in "usage"
+ segments.
+ Such entry references should function as hypertext
+ buttons to access that entry.
+
+<ety> * Etymology. Always contained within square
+ brackets. Normal type is used for explanatory
+ comments, and italics for the actual words
+ (marked <ets>) considered as etymological
+ sources.
+
+<ets> italic Etymological source. Words from which the
+ headword was derived, or to which it is related.
+ The Greek words within an etymology segment
+ are invariably etymology sources, and should
+ be marked as such, but are not so marked,
+ even in the rare cases where the Greek word
+ transliteration has been written in.
+
+<etsep> italic Etymological source, being the name of a person
+ or geographical location which is the eponym
+ for the concept. This is used to distinguish
+ eponymous etymologies from others, and can also
+ be found in the body of a definition or note,
+ not only in the etymology field. Very few
+ of the names that should be marked this way
+ have actually been so marked, as of version
+ 0.42. In cases where such eponymous names
+ have not yet been thus marked, they will
+ usually be marked by <xex>, the non-semantic
+ italic-font marker, or, in etymologies, by
+ <ets>.
+
+<ex> italic Example. An example of usage of the headword,
+ usually found within an <as> or <note> segment.
+
+<fr> * Frequency of use, ordinal rank. This is used for
+ WordNet entries, in which the synonyms
+ were ranked in order of frequency of use.
+ <fr>1</fr> indicates that the headword is the
+ first word on the list of synonyms.
+
+<fu> * First use. A date at or around which the first
+ use of this word in writing is recorded.
+ Not in the original 1913 Webster, and usu.
+ taken from a recent dictionary. Only a few
+ such fields have been entered as of version
+ 0.41
+
+<grk> transliteration Greek. The Greek words have been transliterated
+ using the equivalents explained in the
+ file "webfont.asc". In most cases, the
+ transliterations are typical for Greek
+ letters, except for theta (transl = q),
+ phi (transl. = f), eta (transl. = h), and
+ upsilon (transl. = y, whether pronounced
+ as y or u). This was to eliminate any
+ ambiguity. These words occur primarily
+ in etymologies, and to conform to the
+ usage of <ets> should also be marked
+ by <ets>, but as of version 0.41 they
+ are not usually thus marked.
+
+<hw> bold, headword.
+ Each main entry begins with the <hw>
+ larger by mark, and ends at the next <hw> mark. The
+ 2 points main entries are not otherwise explicitly
+ marked as a distinctive field.
+ The same word may appear as a headword
+ several times, usually as different parts
+ of speech, but sometimes with different
+ entries as the same part of speech, presumably
+ to indicate a different etymology.
+ Within the hw field the heavy accent is
+ represented by double quote ("), the
+ light accent by open-single-quote (`),
+ and the short dash separating syllables by
+ an asterisk (*). A hyphen (-) is used to
+ represent the hyphen of hyphenated words.
+
+<mark> italic, Usage mark. Almost always within square
+ brackets, occasionally in parentheses or
+ without any bracketing.
+ but The most common usage marks,
+ explanatory "Obs." = obsolete "R." = rare, "Colloq." =
+ may be plain. colloquial, "Prov. Eng." = Provincial England,
+ etc. are in italics. Some usage notes are also
+ marked with <mark>, but are in plain. For
+ simplicity, all words in this field may be
+ italic, until additional explicit marks are
+ added.
+
+<markp> * A usage mark in plain type (not italic). Found
+ within a definition, when there are more than
+ one sense-number listed. "Fig." at the head
+ of an entry is the most common case.
+
+<mcol> * Multiple collocation. Similar to multiple
+ headword, when two or more collocations share
+ one definition; however, the two collocations
+ are in-line, rather than stacked or justified.
+ There may be "or" or "and" words
+ (italicised), or an "etc." (plain type)
+ within this field. In many cases, the
+ <or/ and <and/ entities are used to
+ signify the change of font for these words.
+
+<mhw> * Multiple headword. This field is used where
+ more than one headword shares a single
+ definition.
+ In the dictionary, the
+ (usually) two headwords are left-justified
+ one below the other in the column, and are
+ tied together on the right side of the
+ headwords by a long right curly brace.
+ This division is strictly functional,
+ for analytical purposes, and does not
+ affect the typography.
+
+<nmorph> * Noun morphology section. Rarely used, mostly
+ for irregular personal pronouns.
+
+<note> * Explanatory note. No explicit font is indicated.
+ These segments may be separate, as in the
+ separate paragraphs starting <note><hand/,
+ or they may just be further explanation within
+ (or more usually, following) the main
+ definition paragraph. Typographically,
+ the notes following the main definition may
+ not be distinguishable from additional
+ sentences appended to the first sentence
+ of a definition.
+
+<plu> * Plural. The "plural" segment starts with a
+ "pl." which is italicised, but in this
+ segment is not otherwise marked as
+ italicised. Other words occurring in this
+ segment are plain type. The "pl." can be
+ easily explicitly marked if necessary.
+
+<pos> italic Part of speech. Always an abbreviation: e.g.,
+ n.; v. i.; v. t.; a.; adv.; pron.; prep.
+ Combinations may occur, as "a. & n.".
+
+<plw> small caps Plural word. The actual plural form of the word,
+ found within a <plu> segment.
+
+<pr> * pronunciation. The default font is normal, but
+ many non-ASCII characters are used.
+ The pronunciation field may have more than
+ one pronunciation, separated by an "<or/".
+ (An "or" here is in italic, and usually is
+ represented by the entity <or/).
+ There may also be some commentary, such as
+ "Fr."(French pronunciation) or "archaic".
+ The commentaries are typically italic, and
+ should be marked as such. In certain
+ pronunciations there is a numbered reference
+ to a root form explained in an introductory
+ section on pronunciation.
+ Very few of the pronunciation fields have
+ been filled in.
+ The pronunciation markings use
+ a more complicated method than more modern
+ dictionaries. It would be interesting to have
+ these fields filled in, if there are any
+ volunteers willing to do it.
+
+<q> smaller by Quotation. No bracketing quotation marks,
+ two points, though occasionally \'bd-\'b8 quotations occur
+ centered, within these quotations. These quotations
+ Separate tend to be more complete sentences, rather
+ paragraph than just phrases, such as are contained
+ within quotation marks within the definition
+ paragraph.
+
+<qau> italic, Quotation author. Used only for the quotations
+ right justified marked with <q> that are centered in their
+ own paragraphs.
+
+<qex> italic Quotation example. An example of usage of
+ the headword, within quotations marked
+ by <q>..</q> tags.
+
+<sd> italic Subdefinition, marked (a), (b), (c), etc. These are
+ finer distinctions of word senses, used
+ within numbered word-sense (for main entries),
+ and also used for subdefinitions within
+ collocation segments, which have no numbering of
+ senses. The letter is italic, the parentheses
+ are not. This tag is also used to indicate the
+ lettered subdefinition when it is referred to
+ at another point in the text.
+
+<ship> italic The name of a ship. Rarely used.
+
+<sing> * Singular. Analogous to the <plu> segment, but more
+ rarely used, mostly for Indian tribes, which
+ are listed in the plural form.
+
+<singw> small caps Singular word. The singular form of the
+ plural-form headword.
+
+<sn> bold, Sense number. A headword may have over 20
+ larger by different sense numbers. Within each numbered
+ 2 points sense there may be lettered sub-senses. See
+ the <sd> (sub-definition) field.
+
+<source> italic Source. The author of the definition. Used only
+ for definitions not originally present in
+ Webster 1913, and not present in the original
+ version intended to mimic the 1913 printed
+ dictionary.
+ This source is used for each
+ word sense, and may differ for different
+ senses of a word, especially where a Web1913
+ definition was substantially modified, or a
+ new word sense was added to a previously
+ defined word.
+
+<syn> plain Synonyms. A list of synonyms, sometimes followed
+ by a <usage> segment.
+
+<usage> narrower Comparisons of word usage for words which are
+ spacing sometimes confused. As with collocation segments,
+ font is plain, but spacing is smaller than
+ normal definition spacing. This seems pointlessly
+ complicating for an on-line display.
+
+<vmorph> * Verb morphology (conjugation) segment, delimited
+ by square brackets.
+
+<wordforms> * Morphological derivatives not contained in the
+ bracketed segments, as above. For nouns
+ derived from adjectives, adverbs from
+ adjectives, etc. This segment is usually
+ found at the end of the main entry. The
+ adverbial and nominalized derivatives at the
+ end of a main entry are usually introduced
+ by an em dash [represented as two hyphens (--)].
+
+<wf> bold, Same font as <hw>, with accents and syllable
+ larger by breaks marked as in the headword.
+ 2 points Marks the actual morphological forms within
+ a <wordforms> segment; typically, adverbial or
+ nominalized form of an adjective.
+
+
+<def2> * Second definition (occasionally, a third definition is
+ present). This is used where a second or third
+ part of speech with the same orthography is
+ placed under one headword. Within this segment,
+ there will be a <pos> field, and sometimes
+ a <mark> and/or a quotation.
+
+<specif> * "Specifically:" Used to mark the words "specifically",
+ "Hence", "as" which are used to introduce a second
+ definition typically more specific than the first,
+ but in general derived by extension of the initial
+ definition. This functions as a warning of multiple
+ definitions where the sense-numbers are not explicitly
+ used.
+ It is also useful in separate senses, to
+ tag polysemous definitions which may be
+ specializations or generalizations of the preceding
+ definition.
+
+<pluf> italic. Plural form.
+ Used exclusively to mark the "pl." abbreviation,
+ which introduces a definition for the headword,
+ *when used in the plural form*. Not related to
+ <plu>, which spells out the plural form, but does
+ define it.
+
+<uex> italic Usage example. Used only a few times, within
+ <usage> segments.
+
+<isa> italic supertype (hypernym) the inverse of <stype> and
+ identical to <hypen> but not derived from WordNet.
+
+<chform> plain, Chemical formula. The letters are plain font,
+ numbers but the numbers are subscript. This is mostly
+ subscript useful as a functional mark to pinpoint
+ chemicals.
+
+<chformi> plain, Chemical formula same as <chform>, but not
+ processed specially by the tag-converter program.
+ The letters are plain font, but the numbers are
+ subscript.
+ Used in place of <chform> when the formula has
+ a tag inside, which cannot now be processed by the
+ <chform> processing routine.
+
+<chname> * chemical name. Used to allow a IUPAC chemical
+ name to be processed as a unit in spite of
+ embedded dashes, parentheses, and commas.
+
+<see> * "see" reference to related words, outside of the
+ main <def>definition</def> field.
+
+<mathex> italic Mathematical expression. In this dictionary,
+ essentially all letters (used as variable labels)
+ in math expressions are in italic font.
+ The "+" and "-" may also appear typographically
+ different from elsewhere in the dictionary.
+
+<ratio> italic Also a mathematical expression, but the colon and
+ double colon may have a different typography
+ than usual, as in <ratio>a:b</ratio>
+
+<singf> italic Singular form. Analogous to <pluf>, to define
+ the singular word where the headword is the
+ plural form. ** only modifies the word "sing."
+
+<mord> * Morphological derivation.
+ Used to mark the
+ entry-reference portions of those
+ entries which are defined as morphological
+ derivatives (plural, p. p., imp.) of other
+ headwords. Used just as an attempt to
+ mark and regularize the entry format.
+ May be ignored typographically.
+
+<fract> a stack, Fraction. Used for non-numerical fractions
+ with which cannot be expressed as a <frac12/-style
+ numerator, entity. The forward slash "/" is to be
+ horizontal interpreted as a horizontal line separating
+ bar, and the numerator and denominator.
+ denominator
+
+<exp> superscript, Exponential. Used in mathematical expressions.
+ smaller
+ font.
+
+<xlati> italic Translation (e.g. of Greek), in the body of a
+ definition or etymology. Used only twice.
+
+<tran> italic Word translated: the word in italic is translated
+ by a subsequent word. Usually in etymologies, where
+ the word translated is not actually etymologically
+ related to the headword. The translated word
+ is not necessarily English.
+
+<tr> italic translation of the preceding word (or of the
+ headword) into English.
+
+<fexp> * Functional expression (math). The function names are
+ in plain type, the variables are italic.
+
+<iref> italic Illustration reference. Used only occasionally, not
+ yet (v. 0.41) consistently.
+
+<figref> italic Figure reference.
+
+<figcap> * Figure caption.
+
+<figtitle> * Figure title.
+
+<funct> * tags a mathematical function or expression.
+
+<chreact> * Chemical reaction. Similar to chemical formulas (which
+ are contained but not explicitly marked), with
+ some other symbols.
+
+<ptcl> italic Verb Particle. Only a few particles were actually
+ marked, but in a future version more may be.
+
+<tabtitle> ? Table Title. Used only once.
+
+<title> italic Title of a literary work, movie, opera, musical
+ composition, etc. Used rarely but should be
+ used in every case, except in <au> references.
+ +<root> * Square root -- differs from the entity <root/, + which is a square root sign that does not extend + beyond the number following it. The <root> + field has a bar (vinvulum) over the expression + within the field, as well as the square root symbol + preceding the expression in the field. Used only + once. + +<vinc> * Vinculum. In a mathematical expression, a bar + extending over the expression within the field. + Used only once. This apparently serves the same + function as a parentheses, of causing the + expression within the field to be evaluated + and the result used as the (mathematical) value + of the field. + +<nul> plain Nultype. An older version of <plain>. + +<cd2> * Second collocation definition. Somewhat similar to + <def2>. Purely a mark to reduce functional ambiguity, + with no effect on the typography. + +<hypen> * Hypernym. Mark introduced for the World Wide Webster, + when adding words from WordNet. In most cases, this + tag marks the WordNet hypernym (for nouns and verbs). + Where the <au> mark is PJC or includes a +PJC, the + hypernym may not be the same as in WordNet. The words + marked by this tag need to be bracketed in some way, + but this is deferred until the definitions included + with the hypernyms have been deleted, and other + disambiguating marks substituted. + +<stype> italic Subtype. A functional mark, to point out words which + are conceptually subtypes of the headword. + +<styp> * Subtype. A functional mark, to point out words which + are conceptually subtypes of the headword, but + with no *typographical* significance. + +<simto> * Similar-to. A semantic relational mark for + closely related words which are not quite + synonyms, nor hypernyms, nor hyponyms. Introduced + with WordNet data. + +<conseq> * Consequence. For adjectives, is an attribute which + or is a consequence of possessing the headword attribute. +<hascons> Introduced with WordNet data. + +<consof> * Consequence of. 
For adjectives, an attribute which + implies the headword as a natural consequence. + +<part> italic Part. Marks a word designating something which is + conceptually a part of the headword. Rarely used. + +<parts> italic Part, plural form. Same as <part>, but marks the + name of the part in its plural form. + +<partof> * Marks a word designating something of which the headword + is conceptually a part. Inverse of <part>. + This is very broad, and may mean constituent or + separable part. + Rarely used. + +<contxt> * Context. Used only for introductions to definitions, + giving the context of usage, which are not part + of the definition proper, as: + <contxt>when used of a person:</contxt> + +<grp> * Marks the name of a group of people not formally + organized. + +<membof> italic marks a group of which the headword is a member. + This is rarely used, but should be indexed as + an entry word or phrase. + +<member> italic marks a member of a group defined by the headword. + This is rarely used, but should be indexed as + an entry word or phrase. + +<members> italic Same as <member>, but marks a plural word, + designating the name of the members in its plural form, + for lack of ambiguity. + +<method> * Designates a special type of definition which + describes a method for achieving the headword, + + used only once for the word "amend". The + subdefinitions begin with "by". + +<corpn> * Name of a business company, corporation, or partnership. + Started using November 1988. Rare. + +<corr> italic Correlative. A word intimately associated with the + headword in a manner such that one cannot + appear without the other. NOt exactly an inverse. + +<qperson> italic marks the name of a person, quoted in a dialogue. + Used only in <q> blockquotes as of vers. 0.45. + +<org> * marks the name of an organization; sometimes used + for the names of groups of people not + formally organized *see also <grp>. + +<prod> italic produces. 
Designates a substance produced by + a living organism. Rarely used. + +<prodp> * produces (plainfont). Designates a substance + produced by a living organism. Same as <prod>, + but does not affect font. Rarely used. + +<prodby> * produced by. Designates a living organism which + produces the headword substance. Rarely used. + +<prodmac> italic produces. Designates an object or substance produced + by a machine or process. Rarely used. + +<stage> italic life stage of an organism. Used to indicate + variant forms of an organism defined by the + headword. Rarely used. + +<stageof> * an organism one of whose life stages is the headword. + Inverse (correlative) of <stage>. Rarely used. + +<inv> italic inversely related to headword -- e.g. depository + is the inverse of depositor; buyer is the inverse of + seller. Called "correlative" in the Webster 1913 and + the CIDE. Rarely used. + +<methodfor> italic is a method to accomplish the action defined by + the headword. Rarely used, and only in the + supplemental section. + +<examp> italic example or instance of the headword, where the + tagged and emphasized word is not a proper subtype. +-------------------------------------- +<p><hw>Pa*ron"y*mous</hw> <p><sn>2.</sn> <def>Having a similar sound, but different orthography and different meaning; -- said of certain words, as <examp>all</examp> and <examp>awl</examp>; <examp>hair</examp> and <examp>hare</examp>, etc.</def><br/ +[<source>1913 Webster</source>]</p> +------------------------------------- + +<sfield> * subfield of the headword, which must be a field + of study or of knowledge +<stage> italic a stage of life of the headword -- for living things, + such as insects, whose life stages may take different + names. + +<unit> italic a unit of measure, usually preceded by a number. + Also used to tag the unit of a measure which is the + headword. + +<uses> italic tags a tool or method used by the headword, + which is usually some process. 
+ +<usedfor> * tags a method or process for which the headword + is a tool. + +<usedby> italic tags a tool or method which uses the headword, + which is usually a physical object. + +<perf> italic performs -- tags a word which is a process or + activity performed by the headword. + +<recipr> italic reciprocal -- used for cases where the tagged word + is a reciprocal participant in an action, such as + donor and recipient. The difference between this and + <inv> inverse has not yet been systematically settled. + Used seldom, and mostly in the supplemented version. + +<sig> italic significance, meaning -- used in definitions where the + actual meaning is prefixed with commentary explaining + usage or other attributes of the word, as with + prefixes or suffixes. + +<wns> italic WordNet sense. Where known, the correspondence of the + sense of an entry with that of WordNet 1.6 is + given after the definition, in a tag of the + form: <wns>[wns=3]</wns>, in which the number + is the numbered sense in WordNet. + +<w16ns> italic WordNet version 1.6 sense. See <wns> for + explanation. +<wnote> * A note related to usage in the corresponding + WordNet definition. + ============================================================= +Biological classifications: +--------------------------- +<spn> italic Species name. Used to mark the taxonomic names + of living things which are represented in + italic font in the original printed version. + Originally, not only species, but genera, orders and + families were also thus marked. The conversion from + <spn> to <fam>, <gen>, or <ord> is not completed, and + <spn> may stil be found marking such groups. + However, orders and families are also frequently + mentioned in the original in normal font, and in such + cases are not marked with any tag. So, this mark + is not a reliable indicator of all mentions of + taxonomic names. +<kingdom> italic Taxonomic biological Kingdom name. +<phylum> italic Taxonomic phylum name. 
+<subphylum> italic Taxonomic subphylum name. +<class> italic Taxonomic class name. +<subclass> italic Taxonomic subclass name. +<ord> italic Taxonomic order name. + Also used for suborders, initially. +<subord> italic Taxonomic suborder name. +<suborder> italic Taxonomic suborder name. +<fam> italic Taxonomic family name. Also used to tag "tribes". +<subfam> italic Taxonomic subfamily name. +<gen> italic Taxonomic genus name. +<var> italic Variety. Used to mark subspecies or varities below + the level of species in living organism systematic + names. + +<varn> italic Variety. Used to mark subspecies or varities below + the level of species in living organism systematic + names. Duplicative variant of <var> + + diff --git a/WEBFONT.ASC b/WEBFONT.ASC index 591de89..198c0e0 100644 --- a/WEBFONT.ASC +++ b/WEBFONT.ASC @@ -1,603 +1,603 @@ - WEBSTER FONTS
- =============
-
- Fonts for the Webster 1913 Dictionary.
- For version 0.50
- Last edit May 5, 2001
- ______________________________________
- (This file contains some extended ASCII characters, and should be
-transmitted in binary mode)
-----------------------------------------------------------------------
-
- This file describes a modified font for use in visualizing the
-text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913),
-usable for the DOS operating system of IBM-compatible personal computers.
-The electronic version of that dictionary and this font were prepared by
-MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc.
-For details of permissions and restrictions on using these files, see
-the accompanying file "readme.web".
- The special characters used in the electronic version of the Webster
-1913 are required for visualizing unusual characters used in the
-etymology and pronunciation fields of the dictionary, in a form
-comparable to the way they appear in the original. Since there are
-more than 256 characters used in that dictionary, not all can be
-represented by single-byte codes; the remainder are represented by
-SGML-style "short-form" symbols of the type <x/, where x is a
-specific code for the symbol in the dictionary. (The short form is
-used rather than the "entity" format "&xx;" because the ampersand is
-used frequently, and we prefer to leave the "<" as the only "escape"
-character.)
-See the "Short Form" section below for details about such characters.
-Note that the symbols used here are in some cases abbreviations
-(for compactness) of the ISO 8879 recommended symbols. If necessary,
-the table below allows simple replacement by alternate encodings.
- This symbol font can be loaded in IBM-compatible (x86) computers
-running the DOS operating system by using the "font.bat" command file
-in the "utils" directory. The font files for 8x14 and 8x16 fonts are
-"web14.fnt" and "web16.fnt" respectively.
- For those loading the Webster onto some machine other than an
-IBM-compatible running DOS, it will be necessary to provide a
-translation table, to convert these characters into a code that
-can be handled by that computer. For this reason, I attach an
-"explanation" for each character, for those who cannot view
-the original DOS font.
- The DOS-loadable font does not contain all of the characters needed
-to depict the etymologies or the pronunciations. In addition to an
-absence of several characters used in the pronunciations, no Greek letters are
-included. The Greek words appearing in the etymologies,
-when they are included, will be typed in a
-roman-letter transcription (See section on Greek transcription, below).
-Only a very few Greek words have been thus transcribed as of the
-present version (version 0.41).
- Wherever the typists did not know the character to use, they
-usually inserted a reverse-video question mark (decimal 176).
-This appears in full-ASCII versions as <?/. This mark was used both for
-characters in non-ASCII fonts, and for unreadable characters (i.e.,
-characters smeared in the original or distorted in the copies available
-to the typists). The type in the original was in many places smeared and
-illegible at the left and right page margins; occasionally, small
-parts of words were blotted out by plain white space.
- A character table for the high-order characters appears below.
-Under that is a list and description of most of the special characters
-used in the Webster files.
- Note that there are yet some characters used in the etymologies,
-and some other symbols, which are not in this list. For example, the
-vowels with a double dot *underneath*, e.g. a (as in all) have no representation
-in this character set, and, where explicitly entered in the dictionary,
-are represented by <xdd/ where "x" is the letter, as in "<add/".
-
-ITALICS
--------
- In most places, italic font is represented by the tags <it>...</it>
-surrounding the italic text, or by some other tag which also implies
-italic font. In the pronunciations, however, where italicized vowels
-are used among non-italic and other special characters to indicate
-pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
-are also used to indicate the italicized vowel.
-
-DIACRITICS
--------------
- The European grave and acute accents are represented by the
-standard (IBM PC) high-order codes. Other characters with diacritics
-are represented by special "entity" codes, and in some cases also
-are found in this special WEB1913 font, described below.
- Vowels with a circle above (as in Swedish) are coded <xring/
-(x with a ring, or "degrees" mark over it); vowels with tilde over them
-are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
-also has code 238); letters with a dot above are represented by <xdot/
-- letters with a dot below are represented by <xsdot/ ("subdot");
-vowels with the semi-long mark (a macron with a short perpendicular
-vertical stroke attached above) are represented by <xsl/; the
-circumflex vowels have codes on this list, but may also be represented
-as <xcir/; vowels with macrons above are <xmac/ (including <oomac/,
-the "oo" with an unbroken macron above the two letters, <aemac/ = the
-ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature
-oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent
-(breve) above have codes in this list, but may also be represented by
-<xum/ and <xcr/ respectively. There is an occasional hacek or caron mark
-(an inverted circumflex) in the original; such letters are coded <xcar/.
-The o with a caron has code 213, but no others are in this font list.
-The diaeresis is treated typographically as identical to the umlaut.
- A special modification, used only for poetry (see entry "saturnian verse"
-under "saturnian") is a vowel with a macron, in which the macron is lighter
-than the usual macron, signifying a stressed syllable which has a short
-vowel sound. This is represented by <xsmac/ ("short mac").
- Another special character used in pronunciations is an "n" with an underline (like
-a macron, but below the letter), used to represent the "ng" sound. This is coded
-<nsm/ ("n sub-macron"). The ligated th used in pronunciations to depict the
-"th" sound of "the" is coded as <th/.
- NOTE: the letter combinations "fi" and "fl" are invariably printed as the
-ligatures fi and fl, but these ligatures are not marked as such
-in this transcription, and the two letters are left as individuals.
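
The regular <x.../ naming pattern described above lends itself to mechanical
conversion. The following is a minimal sketch, not part of the original
distribution, which assumes single-letter bases and assumes that Unicode
combining marks are acceptable stand-ins for the printed diacritics. The
function name expand_diacritics and the choice of combining characters are
assumptions of this example; two-letter forms such as <oomac/ and symbols
such as <th/ are deliberately left untouched.

```python
import re
import unicodedata

# Suffix -> Unicode combining mark. The suffix names come from the text
# above; the choice of combining characters is an assumption of this sketch.
COMBINING = {
    "ring": "\u030a",  # ring above
    "til": "\u0303",   # tilde
    "dot": "\u0307",   # dot above
    "sdot": "\u0323",  # dot below ("subdot")
    "cir": "\u0302",   # circumflex
    "mac": "\u0304",   # macron
    "um": "\u0308",    # umlaut / diaeresis
    "cr": "\u0306",    # crescent (breve)
    "car": "\u030c",   # caron (hacek)
    "dd": "\u0324",    # double dot below
}

# One base letter followed by one known suffix and the closing slash.
ENTITY = re.compile(r"<([A-Za-z])(ring|til|sdot|dot|cir|mac|um|cr|car|dd)/")


def expand_diacritics(text: str) -> str:
    """Replace short-form diacritic symbols with Unicode characters."""

    def repl(match: re.Match) -> str:
        base, suffix = match.group(1), match.group(2)
        # NFC composes base + mark into one code point where one exists.
        return unicodedata.normalize("NFC", base + COMBINING[suffix])

    return ENTITY.sub(repl, text)
```

Symbols that the pattern does not recognize pass through unchanged, so the
function can be applied before or after other conversions.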
-
-SPECIAL SYMBOLS
- The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used.
- The double prime, or "seconds" of a degree is sometimes represented by
-a double "light accent" (code 183 = \'b7). In other places, and in later
-versions, it is represented by <sec/ = hex a9, in the webfont.
- The symbols "greater than" <gt/ and "less than" are encountered only
-once, but are distinguished from the right- and left-angle brackets
-(> and <) because of possible typographical differences in some fonts.
- The schwa is symbolized by <schwa/. It is not used in the
-pronunciations, but is mentioned as a symbol.
- The right-pointing arrow is <rarr/, consistent with ISO 8879.
-
-----------------------------------
-Table 1
-----------------------------------
-Numbers
- Hex codes
-1
-11 (12 is a hard page break, 13 CR, 14 sect break)
-21
-31 !"# $%&'(
-121 yz{|} ~ 79-7d 7e-82
-131 83-87 88-8c
-141 8d-91 92-96
-151 97-9b 9c-a0
-161 a1-a5 a6-aa
-171 ab-af b0-b4
-181 b5-b9 ba-be
-191 bf-c3 c4-c8
-201 c9-cd ce-d2
-211 d3-d7 d8-dc
-221 dd-e1 e2-e6
-231 e7-eb ec-f0
-241 f1-f5 f6-fa
-251 fb-ff
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-Below is a complete list of the symbols used in the Webster ("webfont")
-which are encoded in the special font listed above, together with
-corresponding symbols in ISO 8879 and Tex coding. Much of this table was
-prepared by Rik Faith, to whom we express our appreciation.
- The "nearest ASCII" equivalents are given for those who want to
-display the data as best one can in 7-bit simple ASCII symbols without
-using the "entity" symbols.
-=========================================================================
-----------------------------------
-Table 2
-----------------------------------
-
-Comments:
- (1) The symbol in the "entity" column is the SGML-like symbol used in
- the present Webster files; the symbol in the "ISO 8879" column is
- the symbol for the same character given in "The user's guide to
- ISO 8879" by Smith and Stutely.
- (2) An asterisk "*" in the "entity" column means that this symbol and
-code value is not used in any form in the Webster 1913 electronic version.
- (3) If no asterisk is in the "entity" column, and no other symbol is
-there, this means that in the Webster, only the hexadecimal representation
-was used (e.g. for \'d8, \'bd, and \'b8).
- (4) \'b6 and \'b7, the heavy and light "accents", are never above a
-letter (these are not diacritical marks), but in-between letters, as the
-stress accent used in the headwords and pronunciations. The accent
-*follows* the syllable accented. The light accent \'b7 is also used as
-the "prime" in mathematical expressions (e.g. a\'b7 = "a prime"), or as
- "minutes" in degrees-minutes-seconds, and when doubled (\'b7\'b7)
-serves as "double prime" in mathematical expressions, and as "seconds"
-in degrees-minutes-seconds. The character \'a9 (<sec/ or ″) is
-also used to represent the double prime.
- (5) Although the semilong vowels are in the table (e.g. the "asl"
-= "a semilong"), most of the entries in the ASCII version dictionary
-use the <xsl/ symbol coding. If you know of any printers' names for
-these, do let me know.
- (6) For some reason, the a breve and u breve have ISO codes (in the
-Latin-2 table), but the other vowels don't, in the Smith & Stutely book.
-Is this a mistake?
- (7) The symbol <nsc/ is used for "N small capitals", used in
-pronunciations to represent the sound of the nasal N in French words.
- (8) If you find any exceptions to these usage assertions, please
-let me know.
-----------------------------------------------------------------------------------------
- webfont ISO 8879 latin1/ascii TeX nearest description
------------------- ASCII
-oct dec hex entity oct dec hex
---------------------------------------------------------------------------------
-025 21 15 * \S * section symbol
-
-074 60 3c lt 074 60 3c $<$ < less than
-076 62 3e gt 076 62 3e $>$ > greater than
-
-200 128 80 <Cced/ Ccedil 307 199 c7 \c{C} C C cedilla
-201 129 81 <uum/ uuml 374 252 fc \"u ue u umlaut (diaeresis)
-202 130 82 eacute 351 233 e9 \'e e e acute
-203 131 83 <acir/ acirc 342 226 e2 \^a a a circumflex
-204 132 84 <aum/ auml 344 228 e4 \"a ae a umlaut (diaeresis)
-205 133 85 <agrave/ agrave 340 224 e0 \`a a a grave
-206 134 86 <aring/ aring 345 229 e5 \aa a a ring above
-207 135 87 <cced/ ccedil 347 231 e7 \c{c} c c cedilla
-210 136 88 <ecir/ ecirc 352 234 ea \^e e e circumflex
-211 137 89 <eum/ euml 353 235 eb \"e e e umlaut (diaeresis)
-212 138 8a <egrave/ egrave 350 232 e8 \`e e e grave
-213 139 8b <ium/ iuml 357 239 ef \"i i i umlaut (diaeresis)
-214 140 8c <icir/ icirc 356 238 ee \^i i i circumflex
-215 141 8d igrave 354 236 ec \`i i i grave
-216 142 8e Auml A A umlaut
-217 143 8f Aring A A ring above
-
-220 144 90 <Eacute/ Eacute 311 201 c9 \'E e E acute
-221 145 91 <ae/ aelig 346 230 e6 \ae ae ligature ae
-222 146 92 <AE/ AElig 306 198 c6 \AE AE ligature AE
-223 147 93 <ocir/ ocirc 364 244 f4 \^o o o circumflex
-224 148 94 <oum/ ouml 366 246 f6 \"o oe o umlaut (diaeresis)
-225 149 95 ograve 362 242 f2 \`o o o grave
-226 150 96 <ucir/ ucirc 373 251 fb \^u u u circumflex
-227 151 97 ugrave 371 249 f9 \`u u u grave
-230 152 98 <yum/ yuml y y umlaut
-231 153 99 <Oum/ Ouml O O umlaut
-232 154 9a <Uum/ Uuml 334 220 dc \"U U U umlaut (diaeresis)
-233 155 9b
-234 156 9c <pound/ pound 243 163 a3 \pounds * pound sign (British)
-235 157 9d *
-236 158 9e *
-237 159 9f *
-240 160 a0 <aacute/ aacute 341 225 e1 \'a a a acute
-241 161 a1 <iacute/ iacute 355 237 ed \'i i i acute
-242 162 a2 oacute 363 243 f3 \'o o o acute
-243 163 a3 uacute 372 250 fa \'u u u acute
-244 164 a4 <ntil/ ntilde 361 241 f1 \~n ny n tilde
-245 165 a5 <Ntil/ Ntilde NY N tilde
-246 166 a6 <frac23/ $\frac{2}{3}$ 2/3 two-thirds
-247 167 a7 <frac13/ $\frac{1}{3}$ 1/3 one-third
-250 168 a8 *
-251 169 a9 <sec/ Prime seconds (of degree or time)
- Also, inches or double prime
-252 170 aa *
-253 171 ab <frac12/ 275 189 bd $\frac{1}{2}$ 1/2 one-half
-254 172 ac <frac14/ 274 188 bc $\frac{1}{4}$ 1/4 one-quarter
-255 173 ad *
-256 174 ae *
-257 175 af *
-260 176 b0 <?/ (?) Place-holder
- for unknown or illegible character.
-261 177 b1 *
-262 178 b2 *
-263 179 b3 *
-264 180 b4 * $\updownarrow$ * vertical arrow
-265 181 b5 <hand/ * pointing hand
- (printer's "fist")
-266 182 b6 \"{} '' bold accent
- (used in pronunciations)
-267 183 b7 prime 264 180 b4 \'{} ' light accent
- (used in pronunciations)
- also minutes (of arc or time)
-270 184 b8 '' " close double quote
-271 185 b9 *
-272 186 ba * $\parallel$ || vertical double bar (l)
-273 187 bb *
-274 188 bc <sect/ sect \S * section mark
-275 189 bd `` " open double quotes
-276 190 be <amac/ amacr \=a a a macron
-277 191 bf lsquo ` ` left single quote
-
-300 192 c0 <nsm/ ng "n sub-macron"
-301 193 c1 <sharp/ sharp $\sharp$ # musical sharp
-302 194 c2 <flat/ flat $\flat$ * musical flat
-303 195 c3 * -- -- long dash (en-dash? )
-304 196 c4 * $-$ - horizontal line
-305 197 c5 <th/ (part 1) first part of th ligature
- see 231 = e7 for part 2
-306 198 c6 <imac/ imacr \=i i i macron
-307 199 c7 <emac/ emacr \=e e e macron
-310 200 c8 <dsdot/ d Sanskrit/Tamil d dot
-311 201 c9 <nsdot/ n Sanskrit/Tamil n dot
-312 202 ca <tsdot/ t Sanskrit/Tamil t dot
-313 203 cb <ecr/ \u{e} e e breve
-314 204 cc <icr/ \u{i} i i breve
-315 205 cd *
-316 206 ce <ocr/ \u{o} o o breve
-317 207 cf - -- - short dash
-
-320 208 d0 -- mdash --- -- long (em) dash
-321 209 d1 <OE/ OElig \OE OE OE ligature
-322 210 d2 <oe/ oelig \oe oe oe ligature
-323 211 d3 <omac/ omacr \=o o o macron
-324 212 d4 <umac/ umacr \=u u u macron
-325 213 d5 <ocar/ \v{o} o o hacek
-326 214 d6 <aemac/ \=\ae ae ae ligature macron
-327 215 d7 <oemac/ \=\oe oe oe ligature macron
-330 216 d8 par $\parallel$ || double vertical
- bar(s)
-331 217 d9 *
-332 218 da *
-333 219 db *
-334 220 dc <ucr/ ubreve \u{u} u u breve
-335 221 dd <acr/ abreve \u{a} a a breve
-336 222 de <cre/ ssmile \u{} ~ crescent
- (like a breve, but vertically centered --
- represents the short accent in poetic meter)
-337 223 df <ymac/ \=y y y macron
-
-340 224 e0 <asl/ a a "semilong"
- (has a macron above with a short vertical
- bar on top the center of the macron)
- Used in pronunciations.
-341 225 e1 <esl/ e "semilong"
-342 226 e2 <isl/ i "semilong"
-343 227 e3 <osl/ o "semilong"
-344 228 e4 <usl/ u "semilong"
-345 229 e5 <adot/ a a with dot above
-346 230 e6 * mu small Greek mu
-347 231 e7 <th/ (part 2) second part of th ligature
- see 197 = c5 for part 1
-350 232 e8 *
-351 233 e9 *
-352 234 ea *
-353 235 eb <edh/ edh 360 240 f0 th small eth
-354 236 ec *
-355 237 ed <thorn/ thorn 376 254 fe th small thorn
-356 238 ee <atil/ atilde \~a a a tilde
-357 239 ef <ndot/ n n with dot above
-
-360 240 f0 <rsdot/ \d{r} r r with a dot below
-361 241 f1 *
-362 242 f2 *
-363 243 f3 *
-364 244 f4 <yogh/ y small yogh
-365 245 f5 mdash --- -- em dash
-366 246 f6 divide 367 247 f7 $\div$ / division sign
-367 247 f7 ap $\approx$ ~= "double tilde"
-370 248 f8 <deg/ 260 176 b0 ${}^\circ$ * degree sign
-371 249 f9 <middot/ $\bullet$ * bold middle dot
-372 250 fa * 267 183 b7 $\cdot$ * light middle dot
-373 251 fb <root/ radic $\surd$ * root sign
-374 252 fc *
-375 253 fd *
-376 254 fe *
-377 255 ff *
-
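
As an illustration of the translation table mentioned in the introduction,
here is a minimal sketch that decodes webfont bytes using a handful of rows
copied from Table 2. It is an example only: the function name decode_webfont
is hypothetical, the two dictionaries cover just a few rows, and falling back
to the \'xx hexadecimal notation for unmapped bytes is an assumption of this
sketch rather than a documented rule.

```python
# Webfont byte -> Latin-1 code, copied from a few rows of Table 2.
# Latin-1 codes coincide with Unicode code points, so chr() can be used.
WEBFONT_TO_LATIN1 = {
    0x80: 0xC7,  # C cedilla
    0x81: 0xFC,  # u umlaut
    0x82: 0xE9,  # e acute
    0x84: 0xE4,  # a umlaut
    0x87: 0xE7,  # c cedilla
    0x91: 0xE6,  # ligature ae
    0x92: 0xC6,  # ligature AE
    0x94: 0xF6,  # o umlaut
    0xA4: 0xF1,  # n tilde
    0xAB: 0xBD,  # one-half
    0xAC: 0xBC,  # one-quarter
    0xF8: 0xB0,  # degree sign
}

# Webfont byte -> short-form symbol, for rows with no Latin-1 column.
WEBFONT_TO_ENTITY = {
    0xB0: "<?/",     # unknown or illegible character
    0xB5: "<hand/",  # pointing hand
    0xBE: "<amac/",  # a macron
    0xC0: "<nsm/",   # n sub-macron
}


def decode_webfont(data: bytes) -> str:
    """Decode webfont-encoded bytes into text (partial table only)."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # plain ASCII passes through
        elif b in WEBFONT_TO_LATIN1:
            out.append(chr(WEBFONT_TO_LATIN1[b]))
        elif b in WEBFONT_TO_ENTITY:
            out.append(WEBFONT_TO_ENTITY[b])
        else:
            out.append(f"\\'{b:02x}")  # assumed fallback: hex notation
    return "".join(out)
```

A complete converter would fill both dictionaries from every row of the
table; the structure stays the same.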
-----------------------------------
-Table 3
-----------------------------------
-
-====================================================================
-The table below gives some additional information about some of the
-more commonly used entities
--------------------------------------------------------------------
-Frequently used:
-decimal hex char definition
- 21 section symbol -- another section symbol also at 188
- (so that 21 can be used as a normal control
- character)
- 126 ~ used by typists as a place-holder in word
- combinations where an uncapitalized headword
- should be.
- 128 80 <Cced/ c cedilla (uppercase)
- 129 81 <uum/ u umlaut
- 130 82 e acute
- 131 83 a circumflex
- 132 84 <aum/ a umlaut
- 133 85 a grave
- 134 86 <aring/ a with "ring" (circle) above (Swedish!)
- 135 87 <cced/ c cedilla
- 136 - 144 standard European set for IBM
- 136 88 <ecir/ e circumflex
- 137 89 <eum/ e umlaut (or e with dieresis above)
- 138 8a e grave
- 145 91 <ae/ = "ae" fused ligature
- 146 92 <AE/ = upper-case "ae" fused ligature
- 147 93 <ocir/ o circumflex
- 148 94 <oum/ o "umlaut", used mostly in "coöperation,
- Zoöl." and in pronunciations
- 164 a4 <ntil/ Spanish "enye"
- 166 a6 <frac23/ two-thirds (fraction)
- 167 a7 <frac13/ one-third (fraction)
- 169 a9 <sec/ seconds of degree or time, or double-prime
- 171 ab <frac12/ one-half, as in the original IBM set
- 172 ac <frac14/ one-fourth (fraction)
- 176 b0 <?/ = (reverse-video question mark), used
- to represent an uncodable or illegible character
- 180 b4 long vertical double-headed arrow (a reference mark)
- 181 b5 <hand/ = (the typographer's "fist")
- Appearing as a "pointing hand" character
- (for explanatory notes)
- 182 b6 bold accent in headwords
- replaced in full ASCII version by double quote = "
- 183 b7 light accent in headwords
- replaced within headwords in the full ASCII version
- by an open-single-quote (` = ASCII 96, not the same
- as 191, \'bf). This mark is used also
- for minutes of a degree, and for "prime"
- to modify variables in mathematical expressions.
- -- two of these in sequence represent seconds
- of a degree, or double prime. The seconds
- symbol is also represented by <sec/ (hex a9).
- 184 b8 close double quotes (used with 189 [= \'bd], open quote)
- 186 ba vertical double bar - represents the symbol used
- in the printed dictionary before a headword to
- signify that the word was adopted without
- anglicization from a foreign language
- but in the full-ASCII version this function
- uses \'d8 -- see 216
- 188 bc <sect/ section mark
- - alternate to 21 (a control character)
- 189 bd open double quotes (used with 184, close quote)
- 190 be <amac/ a macron
- 191 bf <lsquo/ "left single quote"
- single open quote mark (not same as ASCII 96)
- 192 c0 <nsm/ "n sub-macron", an n with a macron below --
- represents the "ng" sound in pronunciations
- 193 c1 <sharp/ sharp - music notation
- 194 c2 <flat/ flat - music notation
- 195 c3 long dash, one pixel removed from left
- will fuse with left long dash, char 208
- 196 c4 graphic horizontal line
- 195+208 combination for a very long dash. In the
- original typing, the dash char 208 was used
- for both non-breaking hyphen (in hyphenated
- words), and for the em-dash used as an
- introductory mark for various segments.
- The em-dash should be distinguished from
- the hyphen, but that conversion hasn't yet
- been done.
- In the full ASCII version, a double hyphen
- "--" represents the em-dash
- 197 c5 <th/ (part 1) first of a pair of characters
- 197+231 = used to represent the th ligature --
- <th/ represents the "th" sound of "mother"
- see 231 (e7) for part 2
- 198 c6 <imac/ = i macron
- 199 c7 <emac/ = e macron
- 200 c8 <dsdot/ Sanskrit/Tamil d with dot underneath
- 201 c9 <nsdot/ Sanskrit/Tamil n with dot underneath
- 202 ca <tsdot/ Sanskrit/Tamil t with dot underneath
- 203 cb <ecr/ = e with crescent (breve) above. Used
- - in some etymologies and pronunciation
- 204 cc <icr/ = i with crescent (breve) above - used
- - in some etymologies and pronunciation
- 206 ce <ocr/ = o with crescent (breve) above - used
- - in some etymologies and pronunciation
- 207 cf short dash, used in hyphenated words, and in
- breaking syllables where no accent is used. But
- sometimes the typists used the normal hyphen [45],
- or the long dash (decimal 208) for that purpose.
- The normal hyphen is the same length as the long
- dash, but one pixel higher in the character box.
- # In headwords, in the full ASCII version, this
- short dash is represented by the asterisk "*".
- 208 d0 <mdash/ = represents the long dash, used for the em
- dash which often precedes certain sections within a
- definition, and which separates some sections,
- such as wordforms or collocations within a
- collocation segment. This is replaced in the
- full ASCII version by a double hyphen, "--".
- 210 d2 <oe/ = "oe" fused ligature
- 211 d3 <omac/ = o macron
- 212 d4 <umac/ = u macron
- 213 d5 <ocar/ o with caron (hacek) (inverted circumflex) above
- 214 d6 <aemac/ = "ae" ligature with a macron
- 215 d7 <oemac/ = "oe" ligature with a macron
- 216 d8 <par/ double vertical bar (short length; the long
- length is the graphics character 186)
- This precedes words marked with a double vertical bar in
- the original dictionary, signifying that the word was
- adopted directly into English without modification of
- the spelling.
- 220 dc <ucr/ = u with crescent above - used in some etymologies
- 221 dd <acr/ = a with crescent above - used in some etymologies
- 222 de <cre/ = "crescent", an upward-curving crescent
- used as a poetic meter mark
- 223 df <ymac/ = y macron (used in Anglo-Saxon?)
- 229 e5 <adot/ = a with a dot above (for pronunciations)
- 231 e7 <th/ (part 2) second of a two-character combination
- 197+231 = used to represent the th ligature in pronunciations
- <th/ represents the "th" sound of "mother"
- 235 eb <edh/ = Old English and Icelandic "edh", (or "eth")
- like a Greek delta with a hatch mark
- through the ascender. Used to represent the
- Anglo-Saxon/Icelandic/Gothic character,
- in etymologies, pronounced like "th"
- 237 ed <thorn/ "thorn", an Old English and Icelandic
- character, appears like a "p" with an extended
- ascender.
- Used to represent the
- Anglo-Saxon/Icelandic/Gothic character,
- in etymologies, pronounced like "th"
- in "thorn" and also as in "brother"
- 238 ee <atil/ a with tilde above - in some etymologies
- 244 f4 <yogh/ like a script "3" or "z". Used in Old English
- etymologies, analogous to "y"
- 247 f7 double tilde ("approximately equals").
- used by typists as a place-holder in word
- combinations where the capitalized headword
- should be.
- 248 f8 <deg/ degrees (temperature or angle). Note: some
- typists used a superscript "o" to signify
- degrees. This must be corrected!
- 249 f9 middle dot (bold)
- 250 fa middle dot (light)
- 251 fb <root/ "root" sign used in etymologies, as in original
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
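
The full-ASCII headword conventions noted under codes 182, 183 and 207
(double quote for the bold accent, open single quote for the light accent,
asterisk for the short dash) can be read back mechanically. The sketch below
is an illustration only; the function name split_headword and the labels
"bold" and "light" are choices of this example. It relies on the rule that
the accent follows the syllable it marks.

```python
import re

# A syllable is a run of characters other than the three marks, optionally
# followed by exactly one mark that closes it.
SYLLABLE = re.compile(r'([^*"`]+)(["`*]?)')

STRESS = {'"': "bold", "`": "light"}


def split_headword(headword: str) -> list:
    """Split a full-ASCII headword into (syllable, stress) pairs.

    stress is "bold", "light", or None for an unaccented syllable.
    """
    return [
        (text, STRESS.get(mark))
        for text, mark in SYLLABLE.findall(headword)
    ]
```

For example, split_headword('Pa*ron"y*mous') yields four syllables, with
"ron" carrying the bold accent.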
-
-======================================
- Greek transcription
-=====================================
-Greek letters are represented:
- (capitals represent capital letters; lower-case represent lower-case)
- #Note that "h" in transliterations is used individually, as eta, and
- also in the combination "ch" (chi). Conversions to other codings
- must first convert "ch" before converting "h", or at least verify
- that an "h" to be converted has no preceding "c". "c" is not
- otherwise used, so there is no ambiguity. Also, "ps" always
- represents a psi; it could in theory occur as a pi-sigma
- combination, but it doesn't. Occasionally, "th" was entered instead
- of "q" to represent theta; these should be checked to verify that
- they do not represent tau-eta, and converted to "q".
-
-(1) characters individually:
- By the short-form notation <alpha/, <beta/, <gamma/, <lambda/ etc.
- Capitalized letters are <ALPHA/, etc.
-(2) in words:
- By inclusion within the markers <grk></grk>, using the following
- roman-letter equivalents for the Greek letters:
- Accents:
- (a) aspirants -- placed in front of the letter modified, which is
-usually at the *front* of words beginning with vowels. Of two types:
- ' (apostrophe) for the left-curving aspirant (spiritus lenis)
- " (double quote) for the right-curving aspirant (spiritus asper)
- (when the aspirant is on a letter inside a word, it is placed
- in front of the letter it modifies.)
- (the right-curving aspirant is also used over rho, which is
- then usually transliterated "rh". The " in such cases is
- placed in front of the r (for rho) which it modifies).
- (b) normal accent (appearing as an acute accent in the original):
- ` (left open quote, ASCII ) -- placed after accented vowel
- (c) grave accent (appearing as a grave accent in the original):
- ~ (tilde, ASCII ) -- placed after accented vowel. This is
- rarely seen, as in <grk>to~ pa^n</grk> (at "universe") or
- <grk>ta~ gewrgika`</grk> (at "Georgic").
- (d) curving accent (appearing as a rounded circumflex):
- ^ (circumflex) -- placed after accented vowel
- (e) "iota" subscript (ogonek) -- a comma placed after the vowel
- having the subscript
- (f) diaeresis:
- the double dot found occasionally over the iota is
- represented by a colon immediately after the iota,
- as the i-diaeresis in <grk>Farisai:ko`s</grk> (at "pharisaic").
-
- Where a letter has two accents, both are placed *after* the vowel.
- Letters with an aspirant and an accent have the
- aspirant before the letter, and the accent after it.
- ------------------------
-
-
-The capitalized Greek letters are represented by the capitalized
- versions of the letters shown here.
------------------------------------------
- Greek letter transliteration
- ------------ ---------------
- alpha a
- beta b
- gamma g
- delta d
- epsilon e
- zeta z
- eta h
- theta q (th was used in some earlier sections, but was
- changed due to potential confusion with the
- tau+eta combination, as in <grk>lyth`rios</grk>
- (at "lyterian") or <grk>poihth`s</grk>
- (at "maker") )
- iota i
- kappa k
- lambda l
- mu m
- nu n
- xi x
- omicron o
- pi p
- rho r
- sigma s (end form not distinguished here from middle
- form within words, but when isolated, use <sigmat/
- ("terminal sigma") for the end form)
- tau t
- upsilon y (Used for both "u" and "y" pronunciations)
- phi f
- chi ch (c is always followed by h, so the h component
- is not confusable with eta)
- psi ps (theoretically confusable with pi-sigma, but that
- combination seems never to occur)
- omega w
-
- (Roman j, v, u are unused)
-
+ WEBSTER FONTS + ============= + + Fonts for the Webster 1913 Dictionary. + For version 0.50 + Last edit May 5, 2001 + ______________________________________ + (This file contains some extended ASCII characters, and should be +transmitted in binary mode) +---------------------------------------------------------------------- + + This file describes a modified font for use in visualizing the +text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913), +usable for the DOS operating system of IBM-compatible personal computers. +The electronic version of that dictionary and this font were prepared by +MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc. +For details of permissions and restrictions on using these files, see +the accompanying file "readme.web". + The special characters used in the electronic version of the Webster +1913 are required for visualizing unusual characters used in the +etymology and pronunciation fields of the dictionary, in a form +comparable to the way they appear in the original. Since there are +more than 256 characters used in that dictionary, not all can be +represented by single-byte codes, and are instead represented by +SGML-style "short-form" symbols. (rather than the "entity" format +"&xx;" The ampersand is used frequently, and we prefer to leave +the "<" as the only "escape" character) of the type <x/ where x +is a specific code for the symbol in the dictionary. +See the "Short Form" section below for details about such characters. +Note that the symbols used here are in some cases abbreviations +(for compactness) of the ISO 8879 recommended symbols. If necessary, +the table below allows simple replacement by alternate encodings. + This symbol font can be loaded in IBM-compatible (x86) computers +running the DOS operating system by using the "font.bat" command file +in the "utils" directory. The fonts files for 8x14 and 8x16 fonts are +"web14.fnt" and "web16.fnt" respectively. 
+ For those loading the Webster onto some machine other than an +IBM-compatible running DOS, it will be necessary to provide a +translation table, to convert these characters into a code that +can be handled by that computer. For this reason, I attach an +"explanation" for each character, for those who cannot view +the original DOS font. + The DOS-loadable font does not contain all of the characters needed +to depict the etymologies or the pronunciations. In addition to an +absence of several characters used in the pronunciations, no Greek letters are +included. The Greek words appearing in the etymologies, +when they are included, will be typed in a +roman-letter transcription (See section on Greek transcription, below). +Only a very few Greek words have been thus transcribed as of the +present version (version 0.41). + Wherever the typists did not know the character to use, they +usually inserted a reverse-video question mark (decimal 176). +This appears in full-ASCII versions as <?/. This mark was used both for +characters in non-ASCII fonts, and for unreadable characters (i.e., +characters smeared in the original or distorted in the copies available +to the typists. The type in the original was in many places smeared and +illegible at the left and right page margins; occasionally, small +parts of words were blotted out by plain white space). + A character table for the high-order characters appears below. +Under that is a list and description of most of the special characters +used in the Webster files. + Note that there are yet some characters used in the etymologies, +and some other symbols, which are not in this list. For example, the +vowels with a double dot *underneath*, e.g. a (as in all) have no representation +in this character set, and, where explicitly entered in the dictionary, +are represented by <xdd/ where "x" is the letter, as in "<add/". 
+ +ITALICS +------- + In most places, italic font is represented by the tags <it>...</it> +surrounding the italic text, or by some other tag which also implies +italic font. In the pronunciations, however, where italicized vowels +are used among non-italic and other special characters to indicate +pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/, +are also used to indicate the italicized vowel. + +DIACRITICS +------------- + The European grave and acute accents are represented by the +standard (IBM PC) high-order codes. Other characters with diacritics +are represented by special "entity" codes, and in some cases also +are found in this special WEB1913 font, described below. + Vowels with a circle above (as in Swedish) are coded <xring/ +(x with a ring, or "degrees" mark over it); vowels with tilde over them +are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/ +also has code 238); letters with a dot above are represented by <xdot/ +-- letter with a dot below are represented by <xsdot/ ("subdot"); +vowels with the semi-long mark (a macron with a short perpendicular +vertical stroke attached above) are represented by <xsl/; the +circumflex vowels have codes on this list, but may also be represented +as <xcir/; vowels with macrons above are <xmac/ (including <oomac/, +the "oo" with an unbroken macron above the two letters, <aemac/ = the +ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature +oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent +(breve) above have codes in this list, but may also be represented by +<xum/ and <xcr/ respectively. There is an occasional hacek or caron mark +(an inverted circumflex) in the original; such letters are coded <xcar/. +The o with a caron has code 213, but no others are in this font list. +The diaeresis is treated typographically as identical to the umlaut. 
+ A special modification, used only for poetry (see entry "saturnian verse" +under "saturnian") is a vowel with a macron, in which the macron is lighter +than the usual macron, signifying a stressed syllable which has a short +vowel sound. This is represented by <xsmac/ ("short mac"). + Another special character used in pronunciations is an "n" with an underline (like +a macron, but below the letter), used to represent the "ng" sound. This is coded +<nsm/ ("n sub-macron"). The ligated th used in pronunciations to depict the +"th" sound of "the" is coded as <th/. + NOTE: the letter combinations "fi" and "fl" are invariably printed as the +ligatures fi and fl, but these ligatures are not marked as such +in this transcription, and the two letters are left as individuals. + +SPECIAL SYMBOLS + The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used. + The double prime, or "seconds" of a degree is sometimes represented by +a double "light accent" (code 183 = \'b7). In other places, and in later +versions, it is represented by <sec/ = hex a9, in the webfont. + The symbols "greater than" <gt/ and "less than" are encountered only +once, but are distinguished from the right- and left-angle brackets +(> and <) because of possible typographical differences in some fonts. + The schwa is symbolized by <schwa/. It is not used in the +pronunciations, but is mentioned as a symbol. + The right-pointing arrow is <rarr/, consistent with ISO 8879. 
+ +---------------------------------- +Table 1 +---------------------------------- +Numbers + Hex codes +1 +11 (12 is a hard page break, 13 CR, 14 sect break) +21 +31 !"# $%&'( +121 yz{|} ~ 79-7d 7e-82 +131 83-87 88-8c +141 8d-91 92-96 +151 97-9b 9c-a0 +161 a1-a5 a6-aa +171 ab-af b0-b4 +181 b5-b9 ba-be +191 bf-c3 c4-c8 +201 c9-cd ce-d2 +211 d3-d7 d8-dc +221 dd-e1 e2-e6 +231 e7-eb ec-f0 +241 f1-f5 f6-fa +251 fb-ff + +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- +Below is a complete list of the symbols used in the Webster ("webfont") +which are encoded in the special font listed above, together with +corresponding symbols in ISO 8879 and Tex coding. Much of this table was +prepared by Rik Faith, to whom we express our appreciation. + The "nearest ASCII" equivalents are given for those who want to +display the data as best one can in 7-bit simple ASCII symbols without +using the "entity" symbols. +========================================================================= +---------------------------------- +Table 2 +---------------------------------- + +Comments: + (1) The symbol in the "entity" column is the SGML-like symbol used in + the present Webster files; the symbol in the "ISO 8879" column is + the symbol for the same character given in "The user's guide to + ISO 8879" by Smith and Stutely. + (2) An asterisk "*" in the "entity" column means that this symbol and +code value is not used in any form in the Webster 1913 electronic version. + (3) If no asterisk is in the "entity" column, and no other symbol is +there, this means that in the Webster, only the hexadecimal representation +was used (e.g. for \'d8, \'bd, and \'b8). + (4) \'b6 and \'b7, the heavy and light "accents", are never above a +letter (these are not diacritical marks), but in-between letters, as the +stress accent used in the headwords and pronunciations. The accent +*follows* the syllable accented. 
The light accent \'b7 is also used as +the "prime" in mathematical expressions (e.g. a\'b7 = "a prime"), or as + "minutes" in degrees-minutes-seconds, and when doubled (\'b7\'b7) +serves as "double prime" in mathematical expressions, and as "seconds" +in degrees-minutes-seconds. The character \'a9 (<sec/ or ″) is +also used to represent the double prime. + (5) Although the semilong vowels are in the table (e.g. the "asl" += "a semilong", most of the entries in the ASCII version dictionary +use the <xsl/ symbol coding. If you know of any printers' names for +these, do let me know. + (6) For some reason, the a breve and u breve have ISO codes (in the +Latin-2 table), but the other vowels don't, in the Smith & Stutely book. +Is this a mistake? + (7) The symbol <nsc/ is used for "N small capitals", used in +pronunciations to represent the soun fo the nasal N in French words. + (8) If you find any exceptions to these usage assertions, please +let me know. +---------------------------------------------------------------------------------------- + webfont ISO 8879 latin1/ascii TeX nearest description +------------------ ASCII +oct dec hex entity oct dec hex +-------------------------------------------------------------------------------- +025 21 15 * \S * section symbol + +074 60 3c lt 074 60 3c $<$ < less than +076 62 3e gt 076 62 3e $>$ > greater than + +200 128 80 <Cced/ Ccedil 307 199 c7 \c{C} C C cedilla +201 129 81 <uum/ uuml 374 252 fc \"u ue u umlaut (diaeresis) +202 130 82 eacute 351 233 e9 \'e e e acute +203 131 83 <acir/ acirc 342 226 e2 \^a a a circumflex +204 132 84 <aum/ auml 344 228 e4 \"a ae a umlaut (diaeresis) +205 133 85 <agrave/ agrave 340 224 e0 \`a a a grave +206 134 86 <aring/ aring 345 229 e5 \aa a a ring above +207 135 87 <cced/ ccedil 347 231 e7 \c{c} c c cedilla +210 136 88 <ecir/ ecirc 352 234 ea \^e e e circumflex +211 137 89 <eum/ euml 353 235 eb \"e e e umlaut (diaeresis) +212 138 8a <egrave/ egrave 350 232 e8 \`e e e grave +213 139 8b <ium/ 
iuml 357 239 ef \"i i i umlaut (diaeresis) +214 140 8c <icir/ icirc 356 238 ee \^i i i circumflex +215 141 8d igrave 354 236 ec \`i i i grave +216 142 8e Auml A A umlaut +217 143 8f Aring A A ring above + +220 144 90 <Eacute/ Eacute 311 201 c9 \'E e E acute +221 145 91 <ae/ aelig 346 230 e6 \ae ae ligature ae +222 146 92 <AE/ AElig 306 198 c6 \AE AE ligature AE +223 147 93 <ocir/ ocirc 364 244 f4 \^o o o circumflex +224 148 94 <oum/ ouml 366 246 f6 \"o oe o umlaut (diaeresis) +225 149 95 ograve 362 242 f2 \`o o o grave +226 150 96 <ucir/ ucirc 373 251 fb \^u u u circumflex +227 151 97 ugrave 371 249 f9 \`u u u grave +230 152 98 <yum/ yuml y y umlaut +231 153 99 <Oum/ Ouml O O umlaut +232 154 9a <Uum/ Uuml 334 220 dc \"U U U umlaut (diaeresis) +233 155 9b +234 156 9c <pound/ pound 243 163 a3 \pounds * pound sign (British) +235 157 9d * +236 158 9e * +237 159 9f * +240 160 a0 <aacute/ aacute 341 225 e1 \'a a a acute +241 161 a1 <iacute/ iacute 355 237 ed \'i i i acute +242 162 a2 oacute 363 243 f3 \'o o o acute +243 163 a3 uacute 372 250 fa \'u u u acute +244 164 a4 <ntil/ ntilde 361 241 f1 \~n ny n tilde +245 165 a5 <Ntil/ Ntilde NY N tilde +246 166 a6 <frac23/ $\frac{2}{3}$ 2/3 two-thirds +247 167 a7 <frac13/ $\frac{1}{3}$ 1/3 one-third +250 168 a8 * +251 169 a9 <sec/ Prime seconds (of degree or time) + Also, inches or double prime +252 170 aa * +253 171 ab <frac12/ 275 189 bd $\frac{1}{2}$ 1/2 one-half +254 172 ac <frac14/ 274 188 bc $\frac{1}{4}$ 1/4 one-quarter +255 173 ad * +256 174 ae * +257 175 af * +260 176 b0 <?/ (?) Place-holder + for unknown or illegible character. 
+261 177 b1 * +262 178 b2 * +263 179 b3 * +264 180 b4 * $\updownarrow$ * verticle arrow +265 181 b5 <hand/ * pointing hand + (printer's "fist") +266 182 b6 \"{} '' bold accent + (used in pronunciations) +267 183 b7 prime 264 180 b4 \'{} ' light accent + (used in pronunciations) + also minutes (of arc or time) +270 184 b8 '' " close double quote +271 185 b9 * +272 186 ba * $\parallel$ || verticle double bar (l) +273 187 bb * +274 188 bc <sect/ sect \S * section mark +275 189 bd `` " open double quotes +276 190 be <amac/ amacr \=a a a macron +277 191 bf lsquo ` ` left single quote + +300 192 c0 <nsm/ ng "n sub-macron" +301 193 c1 <sharp/ sharp $\sharp$ # musical sharp +302 194 c2 <flat/ flat $\flat$ * musical flat +303 195 c3 * -- -- long dash (en-dash? ) +304 196 c4 * $-$ - horizontal line +305 197 c5 <th/ (part 1) first part of th ligature + see 231 = e7 for part 2 +306 198 c6 <imac/ imacr \=i i i macron +307 199 c7 <emac/ emacr \=e e e macron +310 200 c8 <dsdot/ d Sanskrit/Tamil d dot +311 201 c9 <nsdot/ n Sanskrit/Tamil n dot +312 202 ca <tsdot/ t Sanskrit/Tamil t dot +313 203 cb <ecr/ \u{e} e e breve +314 204 cc <icr/ \u{i} i i breve +315 205 cd * +316 206 ce <ocr/ \u{o} o o breve +317 207 cf - -- - short dash + +320 208 d0 -- mdash --- -- long (em) dash +321 209 d1 <OE/ OElig \OE OE OE ligature +322 210 d2 <oe/ oelig \oe oe oe ligature +323 211 d3 <omac/ omacr \=o o o macron +324 212 d4 <umac/ umacr \=u u u macron +325 213 d5 <ocar/ \v{o} o o hacek +326 214 d6 <aemac/ \=\ae ae ae ligature macron +327 215 d7 <oemac/ \=\oe oe oe ligature macron +330 216 d8 par $\parallel$ || double vertical + bar(s) +331 217 d9 * +332 218 da * +333 219 db * +334 220 dc <ucr/ ubreve \u{u} u u breve +335 221 dd <acr/ abreve \u{a} a a breve +336 222 de <cre/ ssmile \u{} ~ crescent + (like a breve, but vertically centered -- + represents the short accent in poetic meter) +337 223 df <ymac/ \=y y y macron + +340 224 e0 <asl/ a a "semilong" + (has a macron above with a short vertical + 
bar on top the center of the macron) + Used in pronunciations. +341 225 e1 <esl/ e "semilong" +342 226 e2 <isl/ i "semilong" +343 227 e3 <osl/ o "semilong" +344 228 e4 <usl/ u "semilong" +345 229 e5 <adot/ a a with dot above +346 230 e6 * mu small Greek mu +347 231 e7 <th/ (part 2) second part of th ligature + see 197 = c5 for part 1 +350 232 e8 * +351 233 e9 * +352 234 ea * +353 235 eb <edh/ edh 360 240 f0 th small eth +354 236 ec * +355 237 ed <thorn/ thorn 376 254 fe th small thorn +356 238 ee <atil/ atilde \~a a a tilde +357 239 ef <ndot/ n n with dot above + +360 240 f0 <rsdot/ \d{r} r r with a dot below +361 241 f1 * +362 242 f2 * +363 243 f3 * +364 244 f4 <yogh/ y small yogh +365 245 f5 mdash --- -- em dash +366 246 f6 divide 367 247 f7 $\div$ / division sign +367 247 f7 ap $\approx$ ~= "double tilde" +370 248 f8 <deg/ 260 176 b0 ${}^\circ$ * degree sign +371 249 f9 <middot/ $\bullet$ * bold middle dot +372 250 fa * 267 183 b7 $\cdot$ * light middle dot +373 251 fb <root/ radic $\surd$ * root sign +374 252 fc * +375 253 fd * +376 254 fe * +377 255 ff * + +---------------------------------- +Table 3 +---------------------------------- + +==================================================================== +The table below gives some additional information about some of the +more commonly used entities +------------------------------------------------------------------- +Frequently used: +decimal hex char definition + 21 section symbol -- another section also at 197 + (so that 21 can be used as a normal control + character) + 126 ~ used by typists as a place-holder in word + combinations where an uncapitalized headword + should be. + 128 80 <Cced/ c cedilla (uppercase) + 129 81 <uum/ u umlaut + 130 82 e acute + 131 83 a circumflex + 132 84 <aum/ a umlaut + 133 85 a grave + 134 86 <aring/ a with "ring" (circle) above (Swedish!) 
+ 135 87 <cced/ c cedilla + 136 - 144 standard European set for IBM + 136 88 <ecir/ e circumflex + 137 89 <eum/ e umlaut (or e with dieresis above) + 138 8a e grave + 145 91 <ae/ = "ae" fused ligature + 146 92 <AE/ = upper-case "ae" fused ligature + 147 93 <ocir/ o circumflex + 148 94 <oum/ o "umlaut", used mostly in "coperation, + Zol." and in pronunciations + 164 a4 <ntil/ Spanish "enye" + 166 a6 <frac23/ two-thirds (fraction) + 167 a7 <frac13/ one-third (fraction) + 169 a9 <sec/ seconds of degree or time, or double-prime + 171 ab <frac12/ one-half, as in the original IBM set + 172 ac <frac14/ one-fourth (fraction) + 176 b0 <?/ = (reverse-video question mark), used + to represent an uncodable or illegible character + 180 b4 long verticle double-headed arrow (a reference mark) + 181 b5 <hand/ = (the typographer's "fist") + Appearing as a "pointing hand" character + (for explanatory notes) + 182 b6 bold accent in headwords + replaced in full ASCII version by double quote = " + 183 b7 light accent in headwords + replaced within headwords in the full ASCII version + by an open-single-quote (` = ASCII 96, not the same + as 191, \'bf). This mark is used also + for minutes of a degree, and for "prime" + to modify variables in mathematical expressions. + -- two of these in sequence represent seconds + of a degree, or double prime. The seconds + symbol is also represented by <sec/ (hex a9). 
+ 184 b8 close double quotes (used with 189 [= \'bd], open quote) + 186 ba verticle double bar - represents the symbol used + in the printed dictionary before a headword to + signify that the word was adopted without + anglicization from a foreign language + but in the full-ASCII version this function + uses \'d8 -- see 216 + 188 bc <sect/ section mark + - alternate to 21 (a control character) + 189 bd open double quotes (used with 184, close quote) + 190 be <amac/ a macron + 191 bf <lsquo/ "left single quote" + single open quote mark (not same as ASCII 96) + 192 c0 <nsm/ "n sub-macron", an n with a macron below -- + represents the "ng" sound in pronunciations + 193 c1 <sharp/ sharp - music notation + 194 c2 <flat/ flat - music notation + 195 c3 long dash, one pixel removed from left + will fuse with left long dash, char 208 + 196 c4 graphic horizontal line + 195+208 combination for a very long dash. In the + original typing, the dash char 208 was used + for both non-breaking hyphen (in hyphenated + words), and for the em-dash used as an + introductory mark for various segments. + The em-dash should be distinguished from + the hyphen, but that conversion hasn't yet + been done. + In the full ASCII version, a double hypen + "--" represent the m-dash + 197 c5 <th/ (part 1) first of a pair of characters + 197+231 = used to represent the th ligature -- + <th/ represents the "th" sound of "mother" + see 231 (e7) for part 2 + 198 c6 <imac/ = i macron + 199 c7 <emac/ = e macron + 200 c8 <dsdot/ Sanskrit/Tamil d with dot underneath + 201 c9 <nsdot/ Sanskrit/Tamil n with dot underneath + 202 ca <tsdot/ Sanskrit/Tamil t with dot underneath + 203 cb <ecr/ = e with crescent (breve) above. 
Used + - in some etymologies and pronunciation + 204 cc <icr/ = i with crescent (breve) above - used + - in some etymologies and pronunciation + 206 ce <ocr/ = o with crescent (breve) above - used + - in some etymologies and pronunciation + 207 cf short dash, used in hyphenated words, and in + breaking syllables where no accent is used. But + sometimes the typists used the normal hyphen [45], + or the long dash (decimal 208) for that purpose. + The normal hyphen is the same length as the long + dash, but one pixel higher in the character box. + # In headwords, in the full ASCII version, this + short dash is represented by the asterisk "*". + 208 d0 <mdash/ = represents the long dash, used for the em + dash which often precedes certain sections within a + definition, and which separates some sections, + such as wordforms or collocations within a + collocation segment. This is replaced in the + full ASCII version by a double hyphen, "--". + 210 d2 <oe/ = "oe" fused ligature + 211 d3 <omac/ = o macron + 212 d4 <umac/ = u macron + 213 d5 <ocar/ o with caron (hacek) (inverted circumflex) above + 214 d6 <aemac/ = "ae" ligature with a macron + 215 d7 <oemac/ = "oe" ligature with a macron + 216 d8 <par/ double vertical bar (short length; the long + length is the graphics character 186) + This precedes words marked with a double vertical bar in + the original dictionary, signifying that the word was + adopted directly into English without modification of + the spelling. + 220 dc <ucr/ = u with crescent above - used in some etymologies + 221 dd <acr/ = a with crescent above - used in some etymologies + 222 de <cre/ = "crescent", an upward-curving crescent + used as a poetic meter mark + 223 df <ymac/ = y macron (used in Anglo-Saxon?) 
+ 229 e5 <adot/ = a with a dot above (for pronunciations)
+ 231 e7 <th/ (part 2) second of a two-character combination
+ 197+231 = used to represent the th ligature in pronunciations
+ <th/ represents the "th" sound of "mother"
+ 235 eb <edh/ = Old English and Icelandic "edh" (or "eth"),
+ like a Greek delta with a hatch mark
+ through the ascender. Used to represent the
+ Anglo-Saxon/Icelandic/Gothic character,
+ in etymologies, pronounced like "th"
+ 237 ed <thorn/ "thorn", an Old English and Icelandic
+ character, appears like a "p" with an extended
+ ascender.
+ Used to represent the
+ Anglo-Saxon/Icelandic/Gothic character,
+ in etymologies, pronounced like "th"
+ in "thorn" and also as in "brother"
+ 238 ee <atil/ a with tilde above - in some etymologies
+ 244 f4 <yogh/ like a script "3" or "z". Used in Old English
+ etymologies, analogous to "y"
+ 247 f7 double tilde ("approximately equals").
+ Used by typists as a place-holder in word
+ combinations where the capitalized headword
+ should be.
+ 248 f8 <deg/ degrees (temperature or angle). Note: some
+ typists used a superscript "o" to signify
+ degrees. This must be corrected!
+ 249 f9 middle dot (bold)
+ 250 fa middle dot (light)
+ 251 fb <root/ "root" sign used in etymologies, as in original
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+======================================
+ Greek transcription
+=====================================
+Greek letters are represented:
+ (capitals represent capital letters; lower-case represent lower-case)
+ #Note that "h" in transliterations is used individually, as eta, and
+ also in the combination "ch" (chi). Conversions to other codings
+ must first convert "ch" before converting "h", or at least verify
+ that an "h" to be converted has no preceding "c". "c" is not
+ otherwise used, so there is no ambiguity. Also, "ps" always
+ represents a psi; it could in theory occur as a pi-sigma
+ combination, but it doesn't.
+ Occasionally, "th" was entered instead
+ of "q" to represent theta; these should be checked to verify that
+ they do not represent tau-eta, and converted to "q".
+
+(1) characters individually:
+ By the short-form notation <alpha/, <beta/, <gamma/, <lambda/ etc.
+ Capitalized letters are <ALPHA/, etc.
+(2) in words:
+ By inclusion within the markers <grk></grk>, using the following
+ roman-letter equivalents for the Greek letters:
+ Accents:
+ (a) aspirants -- placed in front of the letter modified, which is
+usually at the *front* of words beginning with vowels. Of two types:
+ ' (apostrophe) for the left-curving aspirant (spiritus lenis)
+ " (double quote) for the right-curving aspirant (spiritus asper)
+ (when the aspirant is on a letter inside a word, it is placed
+ in front of the letter it modifies.)
+ (the right-curving aspirant is also used over rho, which is
+ then usually transliterated "rh". The " in such cases is
+ placed in front of the r (for rho) which it modifies).
+ (b) normal accent (appearing as an acute accent in the original):
+ ` (left open quote, ASCII ) -- placed after accented vowel
+ (c) grave accent (appearing as a grave accent in the original):
+ ~ (tilde, ASCII ) -- placed after accented vowel. This is
+ rarely seen, as in <grk>to~ pa^n</grk> (at "universe") or
+ <grk>ta~ gewrgika`</grk> (at "Georgic").
+ (d) curving accent (appearing as a rounded circumflex):
+ ^ (circumflex) -- placed after accented vowel
+ (e) "iota" subscript (ogonek) -- a comma placed after the vowel
+ having the subscript
+ (f) diaeresis:
+ the double dot found occasionally over the iota is
+ represented by a colon immediately after the iota,
+ as the i-diaeresis in <grk>Farisai:ko`s</grk> (at "pharisaic").
+
+ Where a letter has two accents, both are placed *after* the vowel.
+ Letters with an aspirant and an accent have the
+ aspirant before the letter, and the accent after it.
+ ------------------------
+
+
+The capitalized Greek letters are represented by the capitalized
+ versions of the letters shown here.
+-----------------------------------------
+ Greek letter transliteration
+ ------------ ---------------
+ alpha a
+ beta b
+ gamma g
+ delta d
+ epsilon e
+ zeta z
+ eta h
+ theta q (th was used in some earlier sections, but was
+ changed due to potential confusion with the
+ tau+eta combination, as in <grk>lyth`rios</grk>
+ (at "lyterian") or <grk>poihth`s</grk>
+ (at "maker") )
+ iota i
+ kappa k
+ lambda l
+ mu m
+ nu n
+ xi x
+ omicron o
+ pi p
+ rho r
+ sigma s (end form not distinguished here from middle
+ form within words, but when isolated, use <sigmat/
+ ("terminal sigma") for the end form)
+ tau t
+ upsilon y (Used for both "u" and "y" pronunciations)
+ phi f
+ chi ch (c is always followed by h, so the h component
+ is not confusable with eta)
+ psi ps (theoretically confusable with pi-sigma, but that
+ combination seems never to occur)
+ omega w
+
+ (Roman j, v, u are unused)
+
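The Greek transcription rules in WEBFONT.ASC (convert "ch" before "h", treat "ps" as psi, read "q" as theta) can be illustrated with a short Python sketch. This is not part of the commit or of the GCIDE tools. The function name grk_to_unicode and the constants are hypothetical, the sketch simply drops the accent and aspirant marks instead of converting them, and rendering a word-final sigma in its end form is an assumption of the sketch, since the file does not distinguish the two forms within words.

```python
import re

# Single-letter equivalents taken from the transliteration table.
SINGLE = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "y": "υ", "f": "φ", "w": "ω",
}
# Two-letter combinations; these must be matched before "h" and "p", "s".
DIGRAPH = {"ch": "χ", "ps": "ψ"}
# Aspirant, accent, iota-subscript and diaeresis marks (dropped here).
MARKS = "'\"`~^,:"


def grk_to_unicode(text):
    """Convert the contents of a <grk>...</grk> span to Greek letters."""
    text = text.translate({ord(c): None for c in MARKS})
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPH:
            greek = DIGRAPH[pair.lower()]
            out.append(greek.upper() if pair[0].isupper() else greek)
            i += 2
            continue
        ch = text[i]
        greek = SINGLE.get(ch.lower())
        if greek is None:
            out.append(ch)  # spaces and unused roman letters pass through
        else:
            out.append(greek.upper() if ch.isupper() else greek)
        i += 1
    # Assumption: a sigma not followed by another lower-case Greek letter
    # is shown in its end form.
    return re.sub(r"σ(?![α-ω])", "ς", "".join(out))


print(grk_to_unicode("poihth`s"))      # ποιητης
print(grk_to_unicode("Farisai:ko`s"))  # Φαρισαικος
```

Matching the two-letter combinations first is what keeps the "h" of "ch" from being read as eta, as the note in the file requires. A fuller converter would map the marks to Unicode combining diacritics rather than discard them.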