Diffstat
-rw-r--r-- COPYING      680
-rw-r--r-- GNUCIDE.DIR   36
-rw-r--r-- PRONUNC.WEB  636
-rw-r--r-- README.DIC   536
-rw-r--r-- TAGSET.WEB  2120
-rw-r--r-- WEBFONT.ASC 1206
6 files changed, 2589 insertions, 2625 deletions
diff --git a/COPYING b/COPYING
index bfe01e4..b62dadc 100644
--- a/COPYING
+++ b/COPYING
@@ -1,340 +1,340 @@
- GNU GENERAL PUBLIC LICENSE
- Version 2, June 1991
-
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
- 675 Mass Ave, Cambridge, MA 02139, USA
- 617-542-5942
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
- Preamble
-
- The licenses for most software are designed to take away your
-freedom to share and change it. By contrast, the GNU General Public
-License is intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
-
- We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at all.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
- GNU GENERAL PUBLIC LICENSE
- TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
- 0. This License applies to any program or other work which contains
-a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The "Program", below,
-refers to any such program or work, and a "work based on the Program"
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term "modification".) Each licensee is addressed as "you".
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the
-Program (independent of having been made by running the Program).
-Whether that is true depends on what the Program does.
-
- 1. You may copy and distribute verbatim copies of the Program's
-source code as you receive it, in any medium, provided that you
-conspicuously and appropriately publish on each copy an appropriate
-copyright notice and disclaimer of warranty; keep intact all the
-notices that refer to this License and to the absence of any warranty;
-and give any other recipients of the Program a copy of this License
-along with the Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a fee.
-
- 2. You may modify your copy or copies of the Program or any portion
-of it, thus forming a work based on the Program, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
- a) You must cause the modified files to carry prominent notices
- stating that you changed the files and the date of any change.
-
- b) You must cause any work that you distribute or publish, that in
- whole or in part contains or is derived from the Program or any
- part thereof, to be licensed as a whole at no charge to all third
- parties under the terms of this License.
-
- c) If the modified program normally reads commands interactively
- when run, you must cause it, when started running for such
- interactive use in the most ordinary way, to print or display an
- announcement including an appropriate copyright notice and a
- notice that there is no warranty (or else, saying that you provide
- a warranty) and that users may redistribute the program under
- these conditions, and telling the user how to view a copy of this
- License. (Exception: if the Program itself is interactive but
- does not normally print such an announcement, your work based on
- the Program is not required to print an announcement.)
-
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
- 3. You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
- a) Accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of Sections
- 1 and 2 above on a medium customarily used for software interchange; or,
-
- b) Accompany it with a written offer, valid for at least three
- years, to give any third party, for a charge no more than your
- cost of physically performing source distribution, a complete
- machine-readable copy of the corresponding source code, to be
- distributed under the terms of Sections 1 and 2 above on a medium
- customarily used for software interchange; or,
-
- c) Accompany it with the information you received as to the offer
- to distribute corresponding source code. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form with such
- an offer, in accord with Subsection b above.)
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
- 4. You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt
-otherwise to copy, modify, sublicense or distribute the Program is
-void, and will automatically terminate your rights under this License.
-However, parties who have received copies, or rights, from you under
-this License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
- 5. You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
- 6. Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties to
-this License.
-
- 7. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
- 8. If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
- 9. The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and "any
-later version", you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-this License, you may choose any version ever published by the Free Software
-Foundation.
-
- 10. If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
- NO WARRANTY
-
- 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
- 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
-OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
-TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
-YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
-PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGES.
-
- END OF TERMS AND CONDITIONS
-
- Appendix: How to Apply These Terms to Your New Programs
-
- If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least
-the "copyright" line and a pointer to where the full notice is found.
-
- <one line to give the program's name and a brief idea of what it does.>
- Copyright (C) 19yy <name of author>
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
- Gnomovision version 69, Copyright (C) 19yy name of author
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
-The hypothetical commands `show w' and `show c' should show the appropriate
-parts of the General Public License. Of course, the commands you use may
-be called something other than `show w' and `show c'; they could even be
-mouse-clicks or menu items--whatever suits your program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a "copyright disclaimer" for the program, if
-necessary. Here is a sample; alter the names:
-
- Yoyodyne, Inc., hereby disclaims all copyright interest in the program
- `Gnomovision' (which makes passes at compilers) written by James Hacker.
-
- <signature of Ty Coon>, 1 April 1989
- Ty Coon, President of Vice
-
-This General Public License does not permit incorporating your program into
-proprietary programs. If your program is a subroutine library, you may
-consider it more useful to permit linking proprietary applications with the
-library. If this is what you want to do, use the GNU Library General
-Public License instead of this License.
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 675 Mass Ave, Cambridge, MA 02139, USA
+ 617-542-5942
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ Appendix: How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) 19yy <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) 19yy name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/GNUCIDE.DIR b/GNUCIDE.DIR
deleted file mode 100644
index cc416a7..0000000
--- a/GNUCIDE.DIR
+++ /dev/null
@@ -1,36 +0,0 @@
-CIDE A 3,680,387 04-10-02 9:29a CIDE.A
-CIDE B 3,154,243 04-10-02 9:32a CIDE.B
-CIDE C 5,525,332 04-10-02 9:33a CIDE.C
-CIDE D 3,370,374 04-10-02 9:35a cide.d
-CIDE E 2,289,630 01-17-02 10:58p CIDE.E
-CIDE F 2,453,360 01-17-02 11:00p CIDE.F
-CIDE G 1,795,200 04-10-02 9:37a CIDE.G
-CIDE H 2,086,911 01-17-02 11:04p CIDE.H
-CIDE I 2,390,954 01-17-02 11:06p CIDE.I
-CIDE J 497,623 01-17-02 11:07p CIDE.J
-CIDE K 460,759 01-17-02 11:08p CIDE.K
-CIDE L 2,001,288 01-17-02 11:10p CIDE.L
-CIDE M 2,977,038 01-17-02 11:12p CIDE.M
-CIDE N 1,054,928 01-17-02 11:16p CIDE.N
-CIDE O 1,404,418 01-17-02 11:18p CIDE.O
-CIDE P 4,645,196 04-10-02 9:41a CIDE.P
-CIDE Q 312,451 04-10-02 9:43a CIDE.Q
-CIDE R 2,673,840 01-17-02 11:25p CIDE.R
-CIDE S 6,331,172 04-10-02 9:46a CIDE.S
-CIDE T 2,985,967 01-17-02 11:30p CIDE.T
-CIDE U 963,375 01-17-02 11:32p CIDE.U
-CIDE V 962,468 01-17-02 11:33p CIDE.V
-CIDE W 1,569,184 01-17-02 11:35p CIDE.W
-CIDE X 48,494 01-17-02 11:36p CIDE.X
-CIDE Y 182,838 01-17-02 11:38p CIDE.Y
-CIDE Z 134,301 01-17-02 11:41p CIDE.Z
-README DIC 13,775 01-17-02 11:55p readme.dic
-GNUCIDE DIR 1,899 04-10-02 9:58a GNUCIDE.DIR
-PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG
-PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB
-SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG
-TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB
-WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC
-WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG
-COPYING 18,361 02-11-02 4:02p COPYING
- 35 file(s) 59,994,047 bytes
diff --git a/PRONUNC.WEB b/PRONUNC.WEB
index 39ed073..325f8ce 100644
--- a/PRONUNC.WEB
+++ b/PRONUNC.WEB
@@ -1,318 +1,318 @@
-file PRONUNC.WEB
-================
- This file gives a number of examples of pronunciation,
-using the entity symbols representing the pronunciations as
-found in the 1913 Webster unabridged dictionary. Not all
-vowel sounds are given here, but the examples should allow one
-to recognize the characters and recall the symbols used to
-represent them. The set of symbols used for pronunciation
-is different from that used in most modern dictionaries,
-but a more worrisome problem is that the pronuncitions themselves
-seem in many cases to differ from modern usage. The places of
-the strong and weak accent are, however, in every case
-examined the same as in modern dictionaries. Anyone who is
-willing to work at revising the pronunciations to reflect modern
-usage or modern symbols should contact PJC.
-
-
- Pronunciations in the 1913 Webster ASCII version
- =================================================
-
-Syllables:
-----------------
- in pronunciations, the short hyphen used in the printed version as a
-syllable-break is represented in the ASCII version by an asterisk (*).
- the main (heavy) accent is represented by a double-quote (").
- the secondary (light) accent is represented by a left-single-quote
-(grave accent) (`)
- the hyphen in hyphenated words is represented by the ASCII hypen (-).
- where an accent occurs, no other syllable break is used.
- sometimes a hyphen occurs after an accent.
- ------------------------------------------------
-
-Consonants:
- Most consonants have their normal value in the pronunciations,
-but there are a few special characters, as the n-submacron and the
-"th" ligature. See the end of the "special characters" section.
-
-Special characters:
---------------------
- The special characters are represented by two different sets of
-symbols: (1) the RTF-format hexadecimal codes such as \'94 for
-o-umlaut, meaning that the byte code is hexadecimal 94. These
-are used only for those symbols which have been designed into a
-special font set for this dictionary. The font set can only be used
-in a DOS system; or
-(2) an "entity" symbol using "<" and "/" as opening and closing
-delimiters, with a mnemonic string between. In the case of o-umlaut
-the symbol is <oum/. For the vowels, the system is consistent,
-thus <aum/ is a-umlaut, and <ium/ is i-umlaut, etc.
- These delimiters are used in preference to the HTML-style
-(e.g. &auml;) delimiters because of the heavy use of ampersands in
-the dictionary, to minimize file length. For the same reason,
-the codes within the delimiters are generally shorter than the
-corresponding ISO 8879 codes ( <aum/ rather than &auml; ).
- For this discussion, I will use the "entity" coding. The
-equivalent hexadecimal codes, where they exist, will be found in
-the tables in the file "webfont.asc".
-
- The pronunciation system of the 1913 Webster has three peculiarities
-relative to systems used in recent dictionaries.
-(1) a more complex set of symbols are used. This is evident, for
- example, where the long vowels have different symbols whether
- they are used in stressed or unstressed syllables. Thus
- long a in "acre" or "chaos"is represented as a-macron (<amac/ in
- our notation). But in "chaotic" or "connate" or "comate" it is
- represented as a symbol looking like a-macron, but with a short
- ascender in the middle of the macron above the a. This is denoted
- <asl/ ("a semilong") in our notation.
-
- Also, some sounds have more than one symbol. Thus, there are several
- symbols using "y" with a diacritical mark above, representing
- identical sounds using "i" or "e", but used in those cases where the
- written word has a "y" in it. So words ending in "y" with
- pronunciations like the unaccented long "e" usually have
- a y-breve (<ycr/) in the pronunciation. Why? Apparently,
- just to look more like the spelling. In these cases its
- meaning is unambuiguous.
-
-(2) The indicated pronunciations themselves are in some cases
- different from what one would find in a modern dictionary.
- In part this is due to differences among orthoepists with
- different notions of how a word should sound, and possibly
- it is due to differences in the pronunciation between 1890,
- when British pronunciations may have had more influence, and
- the present. Thus we see that words ending in -"ties",
- which are given the pronunciation "-t<icr/z", which sounds
- like "tizz", whereas I have always heard such words pronounced
- with a long "e", as in "teez" (and most modern dictionaries
- give it the long-e pronunciation. In Webster's 10th collegiate,
- they mention that unstressed long e may be pronounced as i in
- southern British or southern US dialects, and perhaps it
- was more common in the US in 1890. The <icr/ is an unreliable
- indicator of modern standard American pronunciation. A long-e
- pronunciation on the antepenult is also sometimes given an
- <icr/ symbol in this dictionary.
-
-(3) The indefinite value, represented by an upside-down e (called
- the "schwa" is not used, the same sound being represented by
- symbols like short u <ucr/, or sometimes other vowels.
-
- So be warned, the pronunciations may not be quite what one would
- expect. But for the first phase of this effort, we are trying
- to reproduce exactly the pronuciations in the original work.
-
- Notice that in pronunciations, vowels that are obscured are often
- represented by the italicised vowel without any diacritical marks;
- these italicised vowels are represented as either <ait/, <eit/, etc.
- or with an <it> tag, as in m<it>e</it>nt
- Thus "Christian" is represented as kr<icr/s"ch<it>a</it>n
- communicant is represented as k<ocr/m*m<umac/"n<icr/*k<ait/nt
-
-
- Some examples of pronunciations follow:
- for further explanations of the entities, see the file "webfont.asc"
- ==============================================================
-
- <amac/ long a (stressed) (a with a macron above it)
- late = l<amac/t
- later = l<amac/t"<etil/r
- comb-shaped = k<omac/m"-sh<amac/pt`
- commemorate = k<ocr/m*m<ecr/m"<osl/*r<amac/t
- deign = d<amac/n
- deflate = d<esl/*fl<amac/t"
- defray = d<esl/*fr<amac/"
- defrayal = d<esl/*fr<amac/"<ait/l
-
-
- <asl/ long a (unstressed)
- commodate = k<ocr/m"m<osl/*d<asl/t
- cometary = k<ocr/m"<ecr/t*<asl/*r<ycr/
-
- <ait/ italic a
- communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt
- defeasance = d<esl/*f<emac/"z<ait/ns
- commercial = k<ocr/m*m<etil/r"sh<ait/l
- compass = k<ucr/m"p<ait/s
-
- <acr/ short a (a with a crescent [breve] above it)
- adipose = <acr/d"<icr/*p<omac/s
- absolve = <acr/b*s<ocr/lv"
- land = l<acr/nd
- lamp = l<acr/mp
-
- <adot/ short a (a with a dot above it)
- again = <adot/*g<ecr/n"
- carouse = k<adot/*rouz"
- coma = k<omac/"m<adot/
- comma = k<ocr/m"m<adot/ | *These sound different
- command = k<ocr/m*m<adot/nd" | to me
- mass = m<adot/s
- mash = m<adot/sh
- mat = m<adot/t
-
- <acir/ a-circumflex ("only in syllables closed by r")
- care = k<acir/r
- chair = ch<acir/r
- share = sh<acir/r
- compare = k<ocr/m*p<acir/r"
-
- <aum/ a-umlaut (in pronunciations not the same as in words)
- arsenic = <aum/r"s<esl/*n<icr/k
- arson = <aum/r"s'n
- arm = <aum/rm
- carp = k<aum/rp
- far = f<aum/r
- mar = m<aum/r
- compart = k<ocr/m*p<aum/rt"
- compartment = k<ocr/m*p<aum/rt"m<eit/nt
-
- <add/ a double dot ( with a double dot *below*)
- all = <add/l
- talk = t<add/k
- swarm = sw<add/rm [not aum??]
- water = w<add/"t<etil/r
- default = d<esl/*f<add/lt"
- defraud = d<esl/*fr<add/d"
- deerstalker = d<emac/r"st<add/k`<etil/r
-
-
- <ecr/ short e (e with a crescent [breve] above it)
- degenerate = d<esl/*j<ecr/n"<etil/r*<amac/t
- delve = d<ecr/lv
- end = <ecr/nd
- pet = p<ecr/t
- ten = t<ecr/n
-
- <esl/ long e (unstressed)
- committee = k<ocr/m*m<icr/t"t<esl/
- defame = d<esl/*f<amac/m"
- define = d<esl/*f<imac/n"
- comedy = k<ocr/m"<esl/*d<ycr/
-
- <eit/ e italic
- compartment = k<ocr/m*p<aum/rt"m<eit/nt
- -ment = -"m<eit/nt (for most -ment endings)
-
- <emac/ e macron (long e, stressed)
- compeer = k<ocr/m*p<emac/r"
- deer = d<emac/r"
-
- <etil/ e-tilde
- (representing the e before r in many words)
- (for the same sound in -ur words, <ucir/ is used!)
- fern = f<etil/rn
- commercial = k<ocr/m*m<etil/r"sh<ait/l
- commerce = k<ocr/m"m<etil/rs
-
- <icr/ short i (i with a crescent [breve] above it)
- Note: In most cases, this is used where the
- short i sound of "lip" is intended, but it is
- also used in the middle of words where Americans
- use an unstressed long "e" sound, (as the
- "i" in "serial" and "serious")!?
- and also in words ending in "ies",
- coded as "<icr/z" (as in liberties)
- lip = l<icr/p
- pin = p<icr/n
- commission = k<ocr/m*m<icr/sh"<ucr/n
- committal = k<ocr/m*m<icr/t"t<ait/l
- *serial = s<emac/"r<icr/*<ait/l
- *serious = s<emac/"r<icr/*<ucr/s
- liberty = l<icr/b"<etil/r*t<ycr/
- *but: liberties = l<icr/b"<etil/r*t<icr/z
-
- <imac/ i-macron (long i, stressed) (i with a macron above it)
- combine = k<ocr/m*b<imac/n"
- combined = k<ocr/m*b<imac/"nd
-
- <isl/ long i (unstressed)
- diameter = d<isl/*<acr/m"<esl/*t<etil/r
- diagonal = d<isl/*<acr/g"<osl/*n<ait/l
-
-
- <ocr/ short o (o with a crescent [breve] above it)
- colossus = k<osl/*l<ocr/s"s<ucr/s
- commute = k<ocr/m*m<umac/t"
-
- <omac/ o-macron (long o, stressed) (o with a macron above it)
- boat = b<omac/t
- colt = k<omac/lt
- comb = k<omac/m
- combing = k<omac/m"<icr/ng
- commode = k<ocr/m*m<omac/d"
- course = k<omac/rs
-
- <ocir/ o-circumflex ("only in syllables closed by r")
- orb = <ocir/rb
- lord = l<ocir/rd
- lordship = l<ocir/rd"sh<icr/p
- lorn = l<ocir/rn
- cord = k<ocir/rd
- commorse = k<ocr/m*m<ocir/rs"
- deform = d<esl/*f<ocir/rm"
- deformed = d<esl/*f<ocir/rmd"
- dehortative = d<esl/*h<ocir/rt"<adot*t<icr/v
-
- <osl/ "o semilong" (long o, unstressed)
- diagonal = d<isl/*<acr/g"<osl/*n<ait/l
- dejectory = d<esl/*j<ecr/k"t<osl/*r<ycr/
-
- <oomac/ oo-macron (an oo with a macron above both o's)
- boom = b<oomac/m
- boot = b<oomac/t
- boost = b<oomac/st
- commove = k<ocr/m*m<oomac/v"
-
- <oomcr/ oo-crescent (an oo with a crescent [breve] above both o's)
- foot = f<oocr/t
- cook = k<oocr/k
-
- <umac/ u macron (long u)
- commute = k<ocr/m*m<umac/t"
- definitude = d<esl/*f<icr/n"<icr/*t<umac/d
- communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt
- defuse = d<esl/*f<umac/z"
-
- <ucr/ short u (u with a crescent [breve] above it)
- come = k<ucr/m
- color = k<ucr/l"<etil/r
- colored = k<ucr/l"<etil/rd
- Columbia = k<osl/*l<ucr/m"b<icr/*<adot/
- up = <ucr/p
-
- <ycr/ y-crescent (y with a crescent [breve] above it)
- used mostly for y-endings (supposed to sound similar to <icr/!!)
- sounds to me like an unstressed long e
- comedy = k<ocr/m"<esl/*d<ycr/
- comely = k<ucr/m"l<ycr/
- liberty = l<icr/b"<etil/r*t<ycr/
-
- <ymac/ y-macron (y with a macron above it)
- used to represent the long i (stressed) sound, but
- examples in pronunciations seem to be absent. It is
- found in some foreign words in the etymologies.
-
- ou the common "ow" sound of "town", "browse"
- count = kount
-
- <nsm/ n-submacron (an n with a macron underneath)
- represents the "ng" sound when it occurs before a
- consonant
- defunct = d<esl/*f<ucr/<nsm/kt"
- commingle = k<ocr/m*m<icr/<nsm/"g'l
-
- <th/ the "th" sound in "mother"
- this is represented in the printed work by a th ligature
- carouse = k<adot/*rouz"
-
- zh not a special character, but used to represent the
- "si" sound in words like
-
- decision = d<esl/*s<icr/zh"<ucr/n
-
- th the usual sound as in thing and thorn
- sh the usual as in ship
- ch the usual as in chip
- N (capital N) represents the nasal "n" sound of the French language
-
+file PRONUNC.WEB
+================
+ This file gives a number of examples of pronunciation,
+using the entity symbols representing the pronunciations as
+found in the 1913 Webster unabridged dictionary. Not all
+vowel sounds are given here, but the examples should allow one
+to recognize the characters and recall the symbols used to
+represent them. The set of symbols used for pronunciation
+is different from that used in most modern dictionaries,
+but a more worrisome problem is that the pronunciations themselves
+seem in many cases to differ from modern usage. The places of
+the strong and weak accent are, however, in every case
+examined the same as in modern dictionaries. Anyone who is
+willing to work at revising the pronunciations to reflect modern
+usage or modern symbols should contact PJC.
+
+
+ Pronunciations in the 1913 Webster ASCII version
+ =================================================
+
+Syllables:
+----------------
+ in pronunciations, the short hyphen used in the printed version as a
+syllable-break is represented in the ASCII version by an asterisk (*).
+ the main (heavy) accent is represented by a double-quote (").
+ the secondary (light) accent is represented by a left-single-quote
+(grave accent) (`)
+ the hyphen in hyphenated words is represented by the ASCII hyphen (-).
+ where an accent occurs, no other syllable break is used.
+ sometimes a hyphen occurs after an accent.
+ ------------------------------------------------
+
+Consonants:
+ Most consonants have their normal value in the pronunciations,
+but there are a few special characters, as the n-submacron and the
+"th" ligature. See the end of the "special characters" section.
+
+Special characters:
+--------------------
+ The special characters are represented by two different sets of
+symbols: (1) the RTF-format hexadecimal codes such as \'94 for
+o-umlaut, meaning that the byte code is hexadecimal 94. These
+are used only for those symbols which have been designed into a
+special font set for this dictionary. The font set can only be used
+in a DOS system; or
+(2) an "entity" symbol using "<" and "/" as opening and closing
+delimiters, with a mnemonic string between. In the case of o-umlaut
+the symbol is <oum/. For the vowels, the system is consistent,
+thus <aum/ is a-umlaut, and <ium/ is i-umlaut, etc.
+ These delimiters are used in preference to the HTML-style
+(e.g. &auml;) delimiters because of the heavy use of ampersands in
+the dictionary, to minimize file length. For the same reason,
+the codes within the delimiters are generally shorter than the
+corresponding ISO 8879 codes ( <aum/ rather than &auml; ).
+ For this discussion, I will use the "entity" coding. The
+equivalent hexadecimal codes, where they exist, will be found in
+the tables in the file "webfont.asc".
+
+ The pronunciation system of the 1913 Webster has three peculiarities
+relative to systems used in recent dictionaries.
+(1) A more complex set of symbols is used. This is evident, for
+ example, where the long vowels have different symbols depending on whether
+ they are used in stressed or unstressed syllables. Thus
+ long a in "acre" or "chaos" is represented as a-macron (<amac/ in
+ our notation). But in "chaotic" or "connate" or "comate" it is
+ represented as a symbol looking like a-macron, but with a short
+ ascender in the middle of the macron above the a. This is denoted
+ <asl/ ("a semilong") in our notation.
+
+ Also, some sounds have more than one symbol. Thus, there are several
+ symbols using "y" with a diacritical mark above, representing
+ identical sounds using "i" or "e", but used in those cases where the
+ written word has a "y" in it. So words ending in "y" with
+ pronunciations like the unaccented long "e" usually have
+ a y-breve (<ycr/) in the pronunciation. Why? Apparently,
+ just to look more like the spelling. In these cases its
+ meaning is unambiguous.
+
+(2) The indicated pronunciations themselves are in some cases
+ different from what one would find in a modern dictionary.
+ In part this is due to differences among orthoepists with
+ different notions of how a word should sound, and possibly
+ it is due to differences in the pronunciation between 1890,
+ when British pronunciations may have had more influence, and
+ the present. Thus we see that words ending in "-ties"
+ are given the pronunciation "-t<icr/z", which sounds
+ like "tizz", whereas I have always heard such words pronounced
+ with a long "e", as in "teez" (and most modern dictionaries
+ give it the long-e pronunciation). In Webster's 10th Collegiate,
+ they mention that unstressed long e may be pronounced as i in
+ southern British or southern US dialects, and perhaps it
+ was more common in the US in 1890. The <icr/ is an unreliable
+ indicator of modern standard American pronunciation. A long-e
+ pronunciation on the antepenult is also sometimes given an
+ <icr/ symbol in this dictionary.
+
+(3) The indefinite value, represented by an upside-down e (called
+ the "schwa"), is not used, the same sound being represented by
+ symbols like short u <ucr/, or sometimes other vowels.
+
+ So be warned, the pronunciations may not be quite what one would
+ expect. But for the first phase of this effort, we are trying
+ to reproduce exactly the pronunciations in the original work.
+
+ Notice that in pronunciations, vowels that are obscured are often
+ represented by the italicised vowel without any diacritical marks;
+ these italicised vowels are represented as either <ait/, <eit/, etc.
+ or with an <it> tag, as in m<it>e</it>nt
+ Thus "Christian" is represented as kr<icr/s"ch<it>a</it>n
+ communicant is represented as k<ocr/m*m<umac/"n<icr/*k<ait/nt
+
+
+ Some examples of pronunciations follow:
+ for further explanations of the entities, see the file "webfont.asc"
+ ==============================================================
+
+ <amac/ long a (stressed) (a with a macron above it)
+ late = l<amac/t
+ later = l<amac/t"<etil/r
+ comb-shaped = k<omac/m"-sh<amac/pt`
+ commemorate = k<ocr/m*m<ecr/m"<osl/*r<amac/t
+ deign = d<amac/n
+ deflate = d<esl/*fl<amac/t"
+ defray = d<esl/*fr<amac/"
+ defrayal = d<esl/*fr<amac/"<ait/l
+
+
+ <asl/ long a (unstressed)
+ commodate = k<ocr/m"m<osl/*d<asl/t
+ cometary = k<ocr/m"<ecr/t*<asl/*r<ycr/
+
+ <ait/ italic a
+ communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt
+ defeasance = d<esl/*f<emac/"z<ait/ns
+ commercial = k<ocr/m*m<etil/r"sh<ait/l
+ compass = k<ucr/m"p<ait/s
+
+ <acr/ short a (a with a crescent [breve] above it)
+ adipose = <acr/d"<icr/*p<omac/s
+ absolve = <acr/b*s<ocr/lv"
+ land = l<acr/nd
+ lamp = l<acr/mp
+
+ <adot/ short a (a with a dot above it)
+ again = <adot/*g<ecr/n"
+ carouse = k<adot/*rouz"
+ coma = k<omac/"m<adot/
+ comma = k<ocr/m"m<adot/ | *These sound different
+ command = k<ocr/m*m<adot/nd" | to me
+ mass = m<adot/s
+ mash = m<adot/sh
+ mat = m<adot/t
+
+ <acir/ a-circumflex ("only in syllables closed by r")
+ care = k<acir/r
+ chair = ch<acir/r
+ share = sh<acir/r
+ compare = k<ocr/m*p<acir/r"
+
+ <aum/ a-umlaut (in pronunciations not the same as in words)
+ arsenic = <aum/r"s<esl/*n<icr/k
+ arson = <aum/r"s'n
+ arm = <aum/rm
+ carp = k<aum/rp
+ far = f<aum/r
+ mar = m<aum/r
+ compart = k<ocr/m*p<aum/rt"
+ compartment = k<ocr/m*p<aum/rt"m<eit/nt
+
+ <add/ a double dot (a with a double dot *below*)
+ all = <add/l
+ talk = t<add/k
+ swarm = sw<add/rm [not aum??]
+ water = w<add/"t<etil/r
+ default = d<esl/*f<add/lt"
+ defraud = d<esl/*fr<add/d"
+ deerstalker = d<emac/r"st<add/k`<etil/r
+
+
+ <ecr/ short e (e with a crescent [breve] above it)
+ degenerate = d<esl/*j<ecr/n"<etil/r*<amac/t
+ delve = d<ecr/lv
+ end = <ecr/nd
+ pet = p<ecr/t
+ ten = t<ecr/n
+
+ <esl/ long e (unstressed)
+ committee = k<ocr/m*m<icr/t"t<esl/
+ defame = d<esl/*f<amac/m"
+ define = d<esl/*f<imac/n"
+ comedy = k<ocr/m"<esl/*d<ycr/
+
+ <eit/ e italic
+ compartment = k<ocr/m*p<aum/rt"m<eit/nt
+ -ment = -"m<eit/nt (for most -ment endings)
+
+ <emac/ e macron (long e, stressed)
+ compeer = k<ocr/m*p<emac/r"
+ deer = d<emac/r"
+
+ <etil/ e-tilde
+ (representing the e before r in many words)
+ (for the same sound in -ur words, <ucir/ is used!)
+ fern = f<etil/rn
+ commercial = k<ocr/m*m<etil/r"sh<ait/l
+ commerce = k<ocr/m"m<etil/rs
+
+ <icr/ short i (i with a crescent [breve] above it)
+ Note: In most cases, this is used where the
+ short i sound of "lip" is intended, but it is
+ also used in the middle of words where Americans
+ use an unstressed long "e" sound, (as the
+ "i" in "serial" and "serious")!?
+ and also in words ending in "ies",
+ coded as "<icr/z" (as in liberties)
+ lip = l<icr/p
+ pin = p<icr/n
+ commission = k<ocr/m*m<icr/sh"<ucr/n
+ committal = k<ocr/m*m<icr/t"t<ait/l
+ *serial = s<emac/"r<icr/*<ait/l
+ *serious = s<emac/"r<icr/*<ucr/s
+ liberty = l<icr/b"<etil/r*t<ycr/
+ *but: liberties = l<icr/b"<etil/r*t<icr/z
+
+ <imac/ i-macron (long i, stressed) (i with a macron above it)
+ combine = k<ocr/m*b<imac/n"
+ combined = k<ocr/m*b<imac/"nd
+
+ <isl/ long i (unstressed)
+ diameter = d<isl/*<acr/m"<esl/*t<etil/r
+ diagonal = d<isl/*<acr/g"<osl/*n<ait/l
+
+
+ <ocr/ short o (o with a crescent [breve] above it)
+ colossus = k<osl/*l<ocr/s"s<ucr/s
+ commute = k<ocr/m*m<umac/t"
+
+ <omac/ o-macron (long o, stressed) (o with a macron above it)
+ boat = b<omac/t
+ colt = k<omac/lt
+ comb = k<omac/m
+ combing = k<omac/m"<icr/ng
+ commode = k<ocr/m*m<omac/d"
+ course = k<omac/rs
+
+ <ocir/ o-circumflex ("only in syllables closed by r")
+ orb = <ocir/rb
+ lord = l<ocir/rd
+ lordship = l<ocir/rd"sh<icr/p
+ lorn = l<ocir/rn
+ cord = k<ocir/rd
+ commorse = k<ocr/m*m<ocir/rs"
+ deform = d<esl/*f<ocir/rm"
+ deformed = d<esl/*f<ocir/rmd"
+ dehortative = d<esl/*h<ocir/rt"<adot/*t<icr/v
+
+ <osl/ "o semilong" (long o, unstressed)
+ diagonal = d<isl/*<acr/g"<osl/*n<ait/l
+ dejectory = d<esl/*j<ecr/k"t<osl/*r<ycr/
+
+ <oomac/ oo-macron (an oo with a macron above both o's)
+ boom = b<oomac/m
+ boot = b<oomac/t
+ boost = b<oomac/st
+ commove = k<ocr/m*m<oomac/v"
+
+ <oocr/ oo-crescent (an oo with a crescent [breve] above both o's)
+ foot = f<oocr/t
+ cook = k<oocr/k
+
+ <umac/ u macron (long u)
+ commute = k<ocr/m*m<umac/t"
+ definitude = d<esl/*f<icr/n"<icr/*t<umac/d
+ communicant = k<ocr/m*m<umac/"n<icr/*k<ait/nt
+ defuse = d<esl/*f<umac/z"
+
+ <ucr/ short u (u with a crescent [breve] above it)
+ come = k<ucr/m
+ color = k<ucr/l"<etil/r
+ colored = k<ucr/l"<etil/rd
+ Columbia = k<osl/*l<ucr/m"b<icr/*<adot/
+ up = <ucr/p
+
+ <ycr/ y-crescent (y with a crescent [breve] above it)
+ used mostly for y-endings (supposed to sound similar to <icr/!!)
+ sounds to me like an unstressed long e
+ comedy = k<ocr/m"<esl/*d<ycr/
+ comely = k<ucr/m"l<ycr/
+ liberty = l<icr/b"<etil/r*t<ycr/
+
+ <ymac/ y-macron (y with a macron above it)
+ used to represent the long i (stressed) sound, but
+ examples in pronunciations seem to be absent. It is
+ found in some foreign words in the etymologies.
+
+ ou the common "ow" sound of "town", "browse"
+ count = kount
+
+ <nsm/ n-submacron (an n with a macron underneath)
+ represents the "ng" sound when it occurs before a
+ consonant
+ defunct = d<esl/*f<ucr/<nsm/kt"
+ commingle = k<ocr/m*m<icr/<nsm/"g'l
+
+ <th/ the "th" sound in "mother"
+ this is represented in the printed work by a th ligature
+ carouse = k<adot/*rouz"
+
+ zh not a special character, but used to represent the
+ "si" sound in words like
+
+ decision = d<esl/*s<icr/zh"<ucr/n
+
+ th the usual sound as in thing and thorn
+ sh the usual as in ship
+ ch the usual as in chip
+ N (capital N) represents the nasal "n" sound of the French language
+
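Note on the notation documented in PRONUNC.WEB above: the rules it gives
(entities written as "<" + mnemonic + "/", an asterisk as syllable break, a
double-quote for the main accent, a grave accent for the secondary accent)
are regular enough to tokenize mechanically. The following is only a minimal
sketch in Python, not part of GCIDE. The function names `entities` and
`split_syllables` are hypothetical, and it assumes, from the examples rather
than from an explicit statement in the file, that an accent mark follows the
syllable it stresses.

```python
import re

# Entities look like <amac/ : "<", a lowercase mnemonic, then "/".
# <it>...</it> tags are deliberately not matched by this pattern.
ENTITY = re.compile(r"<([a-z]+)/")

# Separators: "*" syllable break, '"' main accent, "`" secondary accent,
# "-" hyphen of a hyphenated word. An accent also acts as a syllable break.
SEPARATOR = re.compile(r'([*"`-])')

STRESS = {'"': "primary", "`": "secondary"}


def entities(pron):
    """Return the entity mnemonics in a pronunciation string, in order."""
    return ENTITY.findall(pron)


def split_syllables(pron):
    """Split a pronunciation into (syllable, stress) pairs.

    stress is "primary", "secondary", or None. Assumption: the accent
    mark applies to the syllable immediately before it.
    """
    parts = SEPARATOR.split(pron)
    texts = parts[0::2]            # text between separators
    seps = parts[1::2] + [""]      # separator following each text
    result = []
    for text, sep in zip(texts, seps):
        if not text:               # e.g. the gap between '"' and '-'
            continue
        result.append((text, STRESS.get(sep)))
    return result


if __name__ == "__main__":
    print(entities('k<ocr/m*m<umac/"n<icr/*k<ait/nt'))
    print(split_syllables('d<esl/*fl<amac/t"'))
    print(split_syllables('k<omac/m"-sh<amac/pt`'))
```

The sketch leaves entity names unresolved; a lookup table built from
"webfont.asc" would be the natural next step, but its layout is not shown in
this diff, so none is assumed here.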
diff --git a/README.DIC b/README.DIC
index 780e0bb..edaa4f0 100644
--- a/README.DIC
+++ b/README.DIC
@@ -1,268 +1,268 @@
-File README.DIC
- To accompany the GNU version of the set of files (cide.*) containing
- the electronic version of the
- Collaborative International Dictionary of English.
- (called also GCIDE)
- These files contain Version 0.46 (January 2002)
- * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-
-The dictionary was derived from the
- Webster's Revised Unabridged Dictionary
- Version published 1913
- by the C. & G. Merriam Co.
- Springfield, Mass.
- Under the direction of
- Noah Porter, D.D., LL.D.
-
-and has been supplemented with some of the definitions from
- WordNet, a semantic network created by
- the Cognitive Science Department
- of Princeton University
- under the direction of
- Prof. George Miller
-
-and is being proof-read and supplemented by volunteers from
-around the world. This is an unfunded project, and future
-enhancement of this dictionary will depend on the efforts of
-volunteers willing to help build this free resource into a
-comprehensive body of general information. New definitions
-for missing words or words senses and longer explanatory notes,
-as well as images to accompany the articles are needed. More
-modern illustrative quotations giving recent examples of
-usage of the words in their various senses will be very
-helpful, since most quotations in the original 1913 dictionary
-are now well over 100 years old.
-
- This electronic version is being maintained by World Soul,
-a non-profit organization in Plainfield, NJ. For additional
-information or if you are willing to assist construction of this
-data source, contact:
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
- Patrick J. Cassidy | TEL: (908) 561-3416
- World Soul | if no answer, (908) 668-5252
- 735 Belvidere Ave. | FAX: (908) 668-5904
- Plainfield, NJ 07062-2054
- pc@worldsoul.org or cassidy@micra.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-
-GCIDE is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2, or (at your option)
-any later version.
-
-GCIDE is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this copy of GCIDE; see the file COPYING. If not, write
-to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
-Boston, MA 02111-1307, USA.
- * * * * * * * * * * * * * * * * * * * * *
-
-STRUCTURE OF THE DICTIONARY
----------------------------
- When the archives are unpacked, the main dictionary text of
-the GCIDE will be found in 26 files named "cide.*", where the
-asterisk indicates which letter of the alphabet begins the
-words in each file. For example, file "cide.b" contains words
-beginning with the letter "B". Additional information about the
-tagging conventions and special character symbols is contained in
-ancillary files in this directory (more information below). The main
-body of the 1913 dictionary was essentially identical to the edition
-published in 1890, and was republished in 1913 with an appendix
-containing "New Words". The new words of that appendix have been
-integrated into the main file in this version. However, it is important
-to keep in mind that the definitions in this dictionary are in most
-cases over 100 years old. Use them with caution!
- At the bottom of each paragraph in this dictionary, there is a
-bracketed and tagged "source" indicated. This tells from where the
-definition or other text in that paragraph came, as follows:
-
-[<source>1913 Webster</source>]
- = From the original 1890 dictionary.
-[<source>Webster 1913 Suppl.</source>]
- = From the 1913 "New Words" supplement to the Webster.
-[<source>WordNet 1.5</source>]
- = From the WordNet on-line semantic network.
-[<source>Century Dict. 1906.</source>]
- = From the Century Dictionary published in 1906, especially from
- the "proper Names" supplement (volume IX).
- published
-[<source>XXX</source>]
- = Added by one of the volunteers.
-
- The original definitions have been tagged and in some cases
-reformatted or slightly rearranged. If substantive information
-is added from a second source, usually the additional source is
-also noted, as in:
-[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
-
- A list of the ancillary files related to the GCIDE is appended at
-the bottom of this "readme.dic" file.
- This version is tagged with SGML-like tags of the form <pos>...</pos>
-so that the original typography (italics, bold, block quotes) can be
-reproduced. A list of the most important tags for fields in the
-dictionary is given below. The tags also serve the more important
-function of allowing the information content to be conveniently imported
-into computer programs or databases. The set of tags used is described
-in the accompanying file "tagset.web". ***NOTE*** the paragraph tags
-<p>...</p> do *not* always nest properly with certain other tags, such
-as <note> and <cs> ("collocation section"), which in some cases span
-multiple paragraphs. If you are using a tag parser which detects
-improper nesting, you should first either delete the paragraph
-tags or convert them to non-tag symbols, or, if possible, set the
-parser to ignore the <p>...</p> tags.
- The unusual characters (such as Greek or the European accented
-characters, as well as special characters used in the pronunciations)
-are described in the accompanying file "webfont.asc". Some information
-on the pronunciation system used may be found by viewing the files
-"wxxvii.jpg" and "pronunc.jpg" with an image viewer (or any web browser),
-and additional explanations of pronunciation are in the file
-"pronunc.web".
- Each paragraph of the original text is enclosed within tags of
-the form <p> . . . </p>. Within these paragraphs are no line
-breaks, and some of the paragraphs are over 12,000 characters long.
-These lines are too long to be handled by the vi editor, and probably
-by some other text editors. At some points, embedded line breaks within
-a "paragraph" are marked by a <br/ "entity". The file can therefore
-be converted, if necessary, to a form with shorter lines, and subsequently
-reconverted back to the form having one line per paragraph.
-
- If additional line breaks are added, then in order to remove the
-line breaks and reconstruct the original paragraphs, so that the
-page width can be adjusted, perform the following manipulations:
- (1) convert each line break (cr-lf combination) to a space.
- (2) convert the string "</p>  " (</p> followed by two spaces)
- to </p> followed by two line breaks (cr-lf combinations)
- (3) convert the string "<br/ " (<br/ followed by one space)
- to <br/ followed by one line break (cr-lf).
-There will be some "lines" (paragraphs) with over 12,000 characters,
-which may give trouble to some simple text editors.
- A more sophisticated formatting of spaces within paragraphs may
-require the use of the fully-tagged master files. If you have
-a need for these files, contact Patrick Cassidy: cassidy@micra.com.
- The approximate beginning of each page is marked by an SGML
-comment of the form <-- p. 345 -->. (The exact beginning was in some
-cases in the middle of a paragraph, which we decided was not a
-good location for these page-number comments, so the page number
-was usually moved to the next paragraph break). Pages which have
-been proofread by volunteers (e.g., with initials VOL) will have a
-note within that page comment: <-- p. 345 pr=VOL -->. Pages which have
-not been proofread yet (most of them) will have varying numbers of
-typographical errors in them. We still (January 2002) need
-proofreaders to get the errors out of these dictionary files.
-
-***********************************************************************
-** WARNING!!! **
-***********************************************************************
-
- This version is only a first typing, and has numerous typographic
-errors, including errors in the field-marks. In addition, the user must
-keep in mind that this text is very old and will contain numerous
-obsolete, inaccurate, and perhaps offensive statements, which are
-included solely because this work is intended to reproduce accurately
-this historically interesting classic reference work. This text should
-not be relied upon as an accurate source of information, as in many
-cases it represents the state of knowledge around 1890. The text is
-provided "as is", and the user must accept responsibility for all
-consequences of its use. Please refer to the header of each file and
-the GNU General Public License. If these conditions of use are unacceptable,
-please do not use these texts.
-************************************************************************
-************************************************************************
- This electronic dictionary is also made available as a potential
-starting point for development of a modern comprehensive encyclopedic
-dictionary, to be accessible freely on the internet, and developed by the
-efforts of all individuals willing to help build a large and freely
-available knowledge base. A large number of collaborators are needed to
-bring this dictionary to a more accurate, more modern, and more useful
-state. Anyone willing to assist in any way in constructing such a
-knowledge base should contact Patrick Cassidy (see above). All reports
-of errors will be gratefully received, and should also be transmitted to
-PC at: pc@worldsoul.org.
-
-In addition to the main text of the dictionary, additional
-explanatory material about this version of the dictionary is available
-in the ancillary files:
-
-=====================================================================
-COPYING 18,321 11-03-99 1:13a COPYING
-README DIC 13,775 01-17-02 11:48p readme.dic
-WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC
-TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB
-PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB
-PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG
-SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG
-WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG
-==================================================================
-
-
-Most important tags used in the GCIDE:
-<hw> tags the headword
-<pr> pronunciation
-<pos> part of speech
-<ety> etymology
-<ets> "source" word within an <ety> field, usually foreign words
-<fld> field of knowledge (e.g. Med. = medicine)
-<def> definition
-<cs> collocation section (containing word combinations)
-<col> collocation entry (word combination)
-<cd> collocation definition
-<as> illustrations of usage (within a <def>. . . </def> field)
-<au> authority for a definition, or author of a quotation
-<q> illustrative quotation -- in block quote format
-<au> author of an illustrative <q> quotation
-<altname> alternative name for the headword -- essentially a synonym
-<asp> alternative spelling of the headword
-<syn> list of synonyms for the headword
-<p> paragraph
-<b> bold type
-<it> italic type
-
-For other tags, see the file "tagset.web"
-
-
-============================================================
- OTHER VERSIONS OF THE DICTIONARY
-=============================================================
-
- There are several other derivative versions of this dictionary
-on the internet, in some cases reformatted or provided with an
-interface. Those that I am aware of are:
-
-(1) Project Gutenberg
----------------------
- In the etext96 directory of Project Gutenberg (www.prairienet.org)
-there is a version of the original 1913 dictionary, which is in
-the **public domain**. The main files are in the directory etext96,
-and are labeled pgw050**.***. The tags for that version are a subset
-of those used in this GNU version.
-
-(2) The DICT development group
-------------------------------
-This group has created a program to index and search this dictionary.
-The program can be downloaded and used locally, but at present
-is available only in a Unix-compatible executable version.
-See their web site at http://www.dict.org.
-
-(3) The University of Chicago ARTFL project
----------------------------------------------
-Mark Olsen and Gavin LaRowe at the University of Chicago have
-converted the original 1913 dictionary to HTML and have provided an
-interface allowing search of the headwords. When the supplemented
-version has developed sufficiently to warrant the effort, a
-similar searchable version may be posted there as well. The
-search page is at:
- http://humanities.uchicago.edu/forms_unrest/webster.form.html
-
-That page will provide links to other ARTFL projects and contact
-information for the ARTFL group, who alone can provide information
-about the HTML version or interface.
-
-
- -- PJC
+File README.DIC
+ To accompany the GNU version of the set of files (cide.*) containing
+ the electronic version of the
+ Collaborative International Dictionary of English.
+ (called also GCIDE)
+ These files contain Version 0.46 (January 2002)
+ * * * * * * * * * * * * * * * * * * * * * * * * * * * *
+
+The dictionary was derived from the
+ Webster's Revised Unabridged Dictionary
+ Version published 1913
+ by the C. & G. Merriam Co.
+ Springfield, Mass.
+ Under the direction of
+ Noah Porter, D.D., LL.D.
+
+and has been supplemented with some of the definitions from
+ WordNet, a semantic network created by
+ the Cognitive Science Department
+ of Princeton University
+ under the direction of
+ Prof. George Miller
+
+and is being proof-read and supplemented by volunteers from
+around the world. This is an unfunded project, and future
+enhancement of this dictionary will depend on the efforts of
+volunteers willing to help build this free resource into a
+comprehensive body of general information. New definitions
+for missing words or word senses and longer explanatory notes,
+as well as images to accompany the articles, are needed. More
+modern illustrative quotations giving recent examples of
+usage of the words in their various senses will be very
+helpful, since most quotations in the original 1913 dictionary
+are now well over 100 years old.
+
+ This electronic version is being maintained by World Soul,
+a non-profit organization in Plainfield, NJ. For additional
+information or if you are willing to assist construction of this
+data source, contact:
+
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
+ Patrick J. Cassidy | TEL: (908) 561-3416
+ World Soul | if no answer, (908) 668-5252
+ 735 Belvidere Ave. | FAX: (908) 668-5904
+ Plainfield, NJ 07062-2054
+ pc@worldsoul.org or cassidy@micra.com
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
+
+ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
+
+GCIDE is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GCIDE is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this copy of GCIDE; see the file COPYING. If not, write
+to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+Boston, MA 02111-1307, USA.
+ * * * * * * * * * * * * * * * * * * * * *
+
+STRUCTURE OF THE DICTIONARY
+---------------------------
+ When the archives are unpacked, the main dictionary text of
+the GCIDE will be found in 26 files named "cide.*", where the
+asterisk indicates which letter of the alphabet begins the
+words in each file. For example, file "cide.b" contains words
+beginning with the letter "B". Additional information about the
+tagging conventions and special character symbols is contained in
+ancillary files in this directory (more information below). The main
+body of the 1913 dictionary was essentially identical to the edition
+published in 1890, and was republished in 1913 with an appendix
+containing "New Words". The new words of that appendix have been
+integrated into the main file in this version. However, it is important
+to keep in mind that the definitions in this dictionary are in most
+cases over 100 years old. Use them with caution!
+ At the bottom of each paragraph in this dictionary, there is a
+bracketed and tagged "source" indicated. This tells from where the
+definition or other text in that paragraph came, as follows:
+
+[<source>1913 Webster</source>]
+ = From the original 1890 dictionary.
+[<source>Webster 1913 Suppl.</source>]
+ = From the 1913 "New Words" supplement to the Webster.
+[<source>WordNet 1.5</source>]
+ = From the WordNet on-line semantic network.
+[<source>Century Dict. 1906.</source>]
+ = From the Century Dictionary published in 1906, especially from
+ the "proper Names" supplement (volume IX).
+ published
+[<source>XXX</source>]
+ = Added by one of the volunteers.
+
+ The original definitions have been tagged and in some cases
+reformatted or slightly rearranged. If substantive information
+is added from a second source, usually the additional source is
+also noted, as in:
+[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
+
+ A list of the ancillary files related to the GCIDE is appended at
+the bottom of this "readme.dic" file.
+ This version is tagged with SGML-like tags of the form <pos>...</pos>
+so that the original typography (italics, bold, block quotes) can be
+reproduced. A list of the most important tags for fields in the
+dictionary is given below. The tags also serve the more important
+function of allowing the information content to be conveniently imported
+into computer programs or databases. The set of tags used is described
+in the accompanying file "tagset.web". ***NOTE*** the paragraph tags
+<p>...</p> do *not* always nest properly with certain other tags, such
+as <note> and <cs> ("collocation section"), which in some cases span
+multiple paragraphs. If you are using a tag parser which detects
+improper nesting, you should first either delete the paragraph
+tags or convert them to non-tag symbols, or, if possible, set the
+parser to ignore the <p>...</p> tags.
+ The unusual characters (such as Greek or the European accented
+characters, as well as special characters used in the pronunciations)
+are described in the accompanying file "webfont.asc". Some information
+on the pronunciation system used may be found by viewing the files
+"wxxvii.jpg" and "pronunc.jpg" with an image viewer (or any web browser),
+and additional explanations of pronunciation are in the file
+"pronunc.web".
+ Each paragraph of the original text is enclosed within tags of
+the form <p> . . . </p>. Within these paragraphs are no line
+breaks, and some of the paragraphs are over 12,000 characters long.
+These lines are too long to be handled by the vi editor, and probably
+by some other text editors. At some points, embedded line breaks within
+a "paragraph" are marked by a <br/ "entity". The file can therefore
+be converted, if necessary, to a form with shorter lines, and subsequently
+reconverted back to the form having one line per paragraph.
+
+ If additional line breaks are added, then in order to remove the
+line breaks and reconstruct the original paragraphs, so that the
+page width can be adjusted, perform the following manipulations:
+ (1) convert each line break (cr-lf combination) to a space.
+ (2) convert the string "</p>  " (</p> followed by two spaces)
+ to </p> followed by two line breaks (cr-lf combinations)
+ (3) convert the string "<br/ " (<br/ followed by one space)
+ to <br/ followed by one line break (cr-lf).
+There will be some "lines" (paragraphs) with over 12,000 characters,
+which may give trouble to some simple text editors.
+ A more sophisticated formatting of spaces within paragraphs may
+require the use of the fully-tagged master files. If you have
+a need for these files, contact Patrick Cassidy: cassidy@micra.com.
+ The approximate beginning of each page is marked by an SGML
+comment of the form <-- p. 345 -->. (The exact beginning was in some
+cases in the middle of a paragraph, which we decided was not a
+good location for these page-number comments, so the page number
+was usually moved to the next paragraph break). Pages which have
+been proofread by volunteers (e.g., with initials VOL) will have a
+note within that page comment: <-- p. 345 pr=VOL -->. Pages which have
+not been proofread yet (most of them) will have varying numbers of
+typographical errors in them. We still (January 2002) need
+proofreaders to get the errors out of these dictionary files.
+
+***********************************************************************
+** WARNING!!! **
+***********************************************************************
+
+ This version is only a first typing, and has numerous typographic
+errors, including errors in the field-marks. In addition, the user must
+keep in mind that this text is very old and will contain numerous
+obsolete, inaccurate, and perhaps offensive statements, which are
+included solely because this work is intended to reproduce accurately
+this historically interesting classic reference work. This text should
+not be relied upon as an accurate source of information, as in many
+cases it represents the state of knowledge around 1890. The text is
+provided "as is", and the user must accept responsibility for all
+consequences of its use. Please refer to the header of each file and
+the GNU General Public License. If these conditions of use are unacceptable,
+please do not use these texts.
+************************************************************************
+************************************************************************
+ This electronic dictionary is also made available as a potential
+starting point for development of a modern comprehensive encyclopedic
+dictionary, to be accessible freely on the internet, and developed by the
+efforts of all individuals willing to help build a large and freely
+available knowledge base. A large number of collaborators are needed to
+bring this dictionary to a more accurate, more modern, and more useful
+state. Anyone willing to assist in any way in constructing such a
+knowledge base should contact Patrick Cassidy (see above). All reports
+of errors will be gratefully received, and should also be transmitted to
+PC at: pc@worldsoul.org.
+
+In addition to the main text of the dictionary, additional
+explanatory material about this version of the dictionary is available
+in the ancillary files:
+
+=====================================================================
+COPYING 18,321 11-03-99 1:13a COPYING
+README DIC 13,775 01-17-02 11:48p readme.dic
+WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC
+TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB
+PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB
+PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG
+SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG
+WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG
+==================================================================
+
+
+Most important tags used in the GCIDE:
+<hw> tags the headword
+<pr> pronunciation
+<pos> part of speech
+<ety> etymology
+<ets> "source" word within an <ety> field, usually foreign words
+<fld> field of knowledge (e.g. Med. = medicine)
+<def> definition
+<cs> collocation section (containing word combinations)
+<col> collocation entry (word combination)
+<cd> collocation definition
+<as> illustrations of usage (within a <def>. . . </def> field)
+<au> authority for a definition, or author of a quotation
+<q> illustrative quotation -- in block quote format
+<au> author of an illustrative <q> quotation
+<altname> alternative name for the headword -- essentially a synonym
+<asp> alternative spelling of the headword
+<syn> list of synonyms for the headword
+<p> paragraph
+<b> bold type
+<it> italic type
+
+For other tags, see the file "tagset.web"
+
+
+============================================================
+ OTHER VERSIONS OF THE DICTIONARY
+=============================================================
+
+ There are several other derivative versions of this dictionary
+on the internet, in some cases reformatted or provided with an
+interface. Those that I am aware of are:
+
+(1) Project Gutenberg
+---------------------
+ In the etext96 directory of Project Gutenberg (www.prairienet.org)
+there is a version of the original 1913 dictionary, which is in
+the **public domain**. The main files are in the directory etext96,
+and are labeled pgw050**.***. The tags for that version are a subset
+of those used in this GNU version.
+
+(2) The DICT development group
+------------------------------
+This group has created a program to index and search this dictionary.
+The program can be downloaded and used locally, but at present
+is available only in a Unix-compatible executable version.
+See their web site at http://www.dict.org.
+
+(3) The University of Chicago ARTFL project
+---------------------------------------------
+Mark Olsen and Gavin LaRowe at the University of Chicago have
+converted the original 1913 dictionary to HTML and have provided an
+interface allowing search of the headwords. When the supplemented
+version has developed sufficiently to warrant the effort, a
+similar searchable version may be posted there as well. The
+search page is at:
+ http://humanities.uchicago.edu/forms_unrest/webster.form.html
+
+That page will provide links to other ARTFL projects and contact
+information for the ARTFL group, who alone can provide information
+about the HTML version or interface.
+
+
+ -- PJC
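
The three reconstruction steps listed in README.DIC above (line break to
space, "</p>" plus two spaces to a paragraph break, "<br/" plus one space
to an embedded line break) can be sketched in Python roughly as follows.
This is a minimal illustration and not part of the GCIDE distribution: the
function name rejoin_paragraphs and its newline parameter are hypothetical,
and the sketch assumes the wrapped text still has a blank line after each
</p> and a line break after each <br/, as the README describes.

```python
def rejoin_paragraphs(text: str, newline: str = "\r\n") -> str:
    """Rebuild one-line-per-paragraph text from a re-wrapped copy.

    Hypothetical helper following the three README.DIC steps; `newline`
    is the line break written to the output (the README speaks of cr-lf).
    """
    # Accept either CR-LF or bare LF input by normalizing first.
    text = text.replace("\r\n", "\n")
    # Step 1: convert every line break to a single space.
    text = text.replace("\n", " ")
    # Step 2: "</p>" followed by two spaces was a paragraph boundary,
    # so restore the two line breaks that followed it.
    text = text.replace("</p>  ", "</p>" + newline * 2)
    # Step 3: "<br/" followed by one space was an embedded line break.
    text = text.replace("<br/ ", "<br/" + newline)
    return text


if __name__ == "__main__":
    wrapped = "<p>alpha beta<br/\r\ngamma\r\ndelta</p>\r\n\r\n<p>next</p>\r\n\r\n"
    print(repr(rejoin_paragraphs(wrapped)))
```

The order of the steps matters: step 2 has to run before step 3 would be
harmless either way here, but both must follow step 1, since the marker
strings with trailing spaces only exist once the line breaks have been
flattened.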
diff --git a/TAGSET.WEB b/TAGSET.WEB
index 5714751..1409569 100644
--- a/TAGSET.WEB
+++ b/TAGSET.WEB
@@ -1,1060 +1,1060 @@
- FIELD MARKS FOR WEBSTER 1913 and CIDE
- =====================================
-Tagset.web:
- Explanations of the tags used to mark the Webster 1913 dictionary
-and the CIDE (Collaborative International Dictionary of English).
-Note that the list of tags used to mark the public domain version
-of this dictionary is shorter than the full set described here.
- If any tag is not listed here, it is either (1) one of the
-"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
- (2) Is a functional field with no effect on the typography.
-
-Last modified March 12, 1999.
- For questions, contact:
- Patrick Cassidy cassidy@micra.com
- 735 Belvidere Ave.
- Plainfield, NJ 07062
- (908) 561-3416 or (908) 668-5252
--------------------------------------------------------------
-A separate file, webfont.asc, contains the list of the individual
-non-ASCII characters represented by either higher-order hexadecimal
-character marks (e.g., \'94, for o-umlaut) or by entity tags
-(e.g., <root/, for the square root symbol.)
---------------------------------------------------------------
- Use of tags:
- In the MICRA electronic version of the 1913 Webster, each part of
-the entry headed by an entry word ("headword") is labeled so that no
-part of the entry except some punctuation marks should be found
-outside of all fields, i.e. every character should be within some tagged
-field. In the following description, the word "segment" usually refers to
-a major part of an entry such as an etymology or a definition or a
-collocation segment or a usage block, containing more than one field.
-The term "field" may also be used similarly to "segment", but may also
-denote single-word fields, such as an alternative spelling, labeled <asp>.
-
- Note: The tags on this list are similar in structure to SGML tags. Each
-tag on this list marks a field; each field opens with a tagname between
-angle brackets thus: <tagname>, and closes with a similar tag containing
-the forward slash thus: </tagname>. No tags are used without closing
-tags. Thus the HTML <BR> to indicate a line break is symbolized
-here as an entity, <br/, and every <p> has a corresponding </p>.
- The absence of an end-field tag, or the presence of an end-field tag
-without a prior begin-field tag constitutes a typographical error, of which
-there may be a significant number. Any errors detected should be brought
-to the attention of PJC or the appropriate editor.
- Most of the tagged fields are presented in the text in italic type,
-with a number of exceptions. Where a word is contained within more than
-one field, the innermost field determines the font to be used. Wherever
-recognizable functional fields were found, an attempt was made to tag the
-field with a functional mark, but in many cases, words were italicised only
-to represent the word itself as a discourse entity, and in some such cases,
-the "italic" mark <it> was used, implying nothing regarding functionality
-of the word. The base font is considered "plain". Where an italic field
-is indicated, parentheses or brackets within the field are not italicised.
- Where no font is specified for a tag, the tag is merely a functional
-division, and was printed in plain font unless otherwise tagged. This type
-of segment is marked by an asterisk (*) where the font name would be.
- The size of the "plain" font in the original text is about 1.6 mm for
-the height of capitalized letters.
-=============================================================
-Explicit typographical tags:
- These were used where the purpose of a different font was merely to
-distinguish a word from the body of the text, and no explicit functional
-tag seemed appropriate.
------------------------------------
-Tag Font
------------------------------------
-Explicit formatting tags:
-. . . . . . . . . . . . . . . . . .
-<plain> plain font (that used in the body of a definition) --
- normally not marked, except within fields of
- a different font.
-<it> italic (in master files)
-<i> italic (for use in HTML presentation)
-<bold> bold (in master files)
-<b> bold (for use in HTML presentation)
-<colf> bold, Collocation font. Same font as used in collocations.
- smaller This is used only in the list of "un-" words not
- by 1 point actually defined in the dictionary. Probably could be
- replaced by a segment mark for the entire list!
- The "un-" words should be indexed as headwords.
-
-<ct> bold Same as <colf>, a font similar to that used in
- collocations. However, this tag is used in a table
- and could be set to a different font.
-
-<h1> * HTML tag -- largest heading font.
-
-<h2> * HTML tag -- second largest heading font.
-
-<headrow> * Marks a Row title in a table.
-
-<hwf> Font the same as the headword <hw>, though the field is
- not a headword. Used only once.
-
-<mitem> * Multiple items, a set of items in a table.
-<point ...> A series of point size markers, many unique.
-<point1.5> * One of the tags of the form <point**> where **
-<point6> represents the typographic point size of the
- enclosed text.
-<pre> An HTML tag indicating that the enclosed text is
- of teletype form, preformatted in a uniform-spaced
- font.
-<sc> small caps (used mostly for "a. d.", "b. c.")
- This is the same font as <er>, but has no functional
- or semantic significance
-<str> group of table data elements in a table
-<sub> subscript, like <subs>
-<subs> subscript
-<sups> superscript
-<supr> superscript
-<sansserif> Sans-serif font
-<stypec> Bold (collocation font) and also a subtype.
-<tt> HTML tag -- teletype font
-<universbold> A squared bold font without serifs approximating the
- "universe bold" font on the HP Laserjet4, slightly
- larger than the capitals in a definition body. Used
- in expositions describing shapes, such as
- "Y", "T", "U", "X", "V", "F".
-<vertical> Vertically organized column.
-<column1> Vertically organized column -- only part of a table
- which needs to be completed. Used once.
-<...type> A series of tags, many unique, designating certain
- unusual fonts, such as "bourgeoistype" for
- "bourgeois type", in the section on typography.
- Most of these occur only once, in the section on fonts.
-<antiquetype>
-<blacklettertype>
-<boldfacetype>
-<bourgeoistype>
-<boxtype>
-<clarendontype>
-<englishtype>
-<extendedtype>
-<frenchelzevirtype>
-<germantype>
-<gothictype>
-<greatprimertype>
-<longprimertype>
-<miniontype>
-<nonpareiltype>
-<oldenglishtype>
-<oldstyletype>
-<pearltype>
-<picatype>
-<scripttype>
-<smpicatype>
-<typewritertype>
-
-=============================================================
-Tags with semantic content:
-. . . . . . . . . . . . . . . . . . . . . . . . . . .
-<altsp> * Alternative spelling segment. Almost always
- contained within square brackets after the main
- definition segment. Expository words
- such as "Spelled also" are in plain font;
- the actual alternative spelling is marked by
- <asp> ... </asp> tags within this segment.
-
-<ant> italic Antonym.
-
-<asp> italic Alternative spelling. The actual word which is an
- alternative spelling to the headword. These
- are functionally synonyms of the headword. In
- most cases these also occur as headwords, with
- reference to the word where the actual definition
- is found, but not all such words are listed
- separately, particularly if the spelling is
- close enough to the headword to be found at the
- same point in the dictionary. Whether listed
- separately or not, these words should
- be indexed at this location, also.
-
-<au> italic Authority or author. Used where an authority is
- (may be right- given for a definition, and also used for the
- justified. See author, where a quotation within double quotes
- in the section is given in the same paragraph as the
- on formatting). definition. The double quotes are indicated
- by the open-quote (\'bd) and close-quote
- (\'b8). In both cases, it is typically
- right-justified, almost always fitting on
- the same line with the last line of the
- definition or quotation.
- Within collocation segments, it is usually
- used only after quotations, and is not right-
- justified, except occasionally where it
- would be close to the right margin, and then
- apparently it is right-justified. We have
- not explicitly marked those which are
- right-justified, but they can be
- recognized because they are on a line by
- themselves, preceded by two carriage returns.
-
-<bio> * Marks a biography. Should be longer than
- a short mention of who a person was, which
- is typically included as a definition.
-
-<biography> * Same as <bio>
-
-<booki> italic Marks the name of a book, pamphlet, or similar
- document.
-
-<branchof> * A field of knowledge of which the headword
- is a division.
-
-<caption> * Caption of a figure or table.
-
-<cas> * tags the CAS (Chemical Abstracts Service) registry
- number for a chemical substance.
-
-<causes> italic tags the infectious disease caused by the headword.
- Implied type of the agent is a microorganism, and
- the tag must mark a disease.
-
-<causesp> * Same as <causes> without the italic type.
-<causedbyp> * Same as <causedby> without the italic type.
-
-<causedby> italic inverse of causes: tags the causative agent of an
- infectious disease, which is the headword.
- The tag must mark a microorganism, virus, or
- prion, and the implied type of the headword is
- a disease.
-
-<centered> Used only for the single letter in the headers to each
- letter of the alphabet.
-
-<city> * marks the proper name of a city. Used only
- occasionally and not consistently at this stage.
-
-<cnvto> italic Converted to: used to tag substances which are
- products prepared by conversion from the
- headword. Usually chemicals or complex
- products from natural materials. Rarely used
- up to 1998.
-
-<colheads> * List of heads for the columns of a table.
-
-<coltitle> * Title of a column in a table.
-
-<comm> * Comment -- differs from <note> in being in-line with
- the definition paragraph. Provides a little
- additional information.
-
-<company> * Name of a company (commercial firm). Compare <org>
-
-<compof> italic Composed of. Tags a substance of which the
- headword is at least partly composed. The
- substance may be particulate, such as
- diatoms composing diatomaceous earth.
-
-<contains> * marks an object contained within the headword.
-
-<contr> italic Contrasting word. Not exactly an antonym, which
- is marked <ant>, but a contrasting word which is
- often introduced as "opposite to" or "contrasts
- with".
-
-<country> * Name of a country (nation) of the world.
-
-<cref> italic Collocation reference. A reference to a collocation.
- Each such collocation should have its own entry,
- marked by <col> ... </col> tags, and these
- references should function as hypertext buttons
- to access that entry.
-
-<date> * A Date, of any type, e.g. <date>Dec. 25</date>.
-
-<datey> * Date-with-year tags a date containing a year.
-
-<def> * definition. The definition may have subfields,
- particularly <as> (an illustrative phrase
- starting with "as" or "thus" and containing
- the headword (or a morphological derivative)).
- The <mark>, \'bd...\'b8 quotations (left and
- right double quotes) and <au> fields may be
- found within a definition field, but should be,
- and usually are, located outside the definition
- proper. The marking macro was
- inconsistent in this placement, and the
- exclusion of the <mark>, <au> and quotations
- needs to be completed by the proof-readers.
- Certain definitions contain <pos>
- fields within them, where the headword is
- an irregular derivative of another headword.
- In these cases, the <pos> field follows
- immediately after the <def> tag, and these
- entries do not have a separate <pos> field.
- In such cases, the <pos> field is italic, as
- usual.
-
-<divof> * Division of the headword, usually an organization.
- E. g. a faculty or department of a university,
- or a United Nations agency.
-
-<edi> * Marks an education institution, a subtype of
- organization.
-
-<emits> * tags a physical object or form of radiation
- emitted by the headword
-
-<figure> Just a place-holder for illustrations, but seldom used.
-
-<film> italic Marks the name of a movie film.
-
-<fld> italic Field of specialization. Most often used for
- Zoology and Botany, but many "fields of
- specialization" are marked for technical
- terms. The parentheses are usually within this
- field, but are not themselves in italics.
-
-<geog> * Name of a geographical region of any size;
- if applicable, the more specific <city>,
- <state>, or <country> are preferred.
-
-<hypen> * Hypernym. Points to the hypernym from WordNet 1.5.
- Initially, used only for entries extracted
- from WordNet 1.5. Not present in the original
- 1913 version.
-
-<illu> * Illustrative usage -- mostly from WordNet, and placed
- outside the definition, in contrast to <as> usage.
- These should be converted to <as>...</as> illustrative
- usage format for consistency.
-
-<illust> * Illustration place-holder. Seldom used.
-<img> * HTML usage -- points to an image file, usually
- .gif or .jpg. These have no closing tag, and
- will appear as errors in parsing.
-<intensi> * Points to a word whose meaning is an intensified
- form of the headword. Taken from WordNet
- tags, used with some adjectives from WordNet
-<item> * Designates one item in a row of a table. Used only when
- intervening spaces do not serve properly as natural
- field separators.
-<itran> italic Translation into a foreign (non-English) language
- of the previous word in the text -- italic font.
- (<sig> is a translation into English)
-<itrans> italic Same as <itran>
-<jour> * Title of a journal (periodical).
-<matrix> * Always a filled rectangular array.
-<matrix2x5> * A 2x5 matrix (2 rows by 5 columns).
-<mstypec> * Multiple synonymous subtypes -- used in
- def. of "grass".
-<mtable> * Multiple table, encloses <table> figures.
-<musfig> * Music figure. Only in a note under the entry "Figure",
- the two numbers of each such field
- are bold, 20 point type, stacked as in a fraction with
- a bar between them, but also having a horizontal stroke
- midway through each numeral. Unique to this entry.
-<p> * paragraph tag, used always in pairs. Line breaks may
- be embedded inside the paragraphs.
-<person> * marks the proper name of a person. Used only
- occasionally, but should be used more frequently
- for cases where first names are abbreviated,
- to reduce ambiguity of the period for automatic
- analysis. Where a title is given, prefixed
- or postfixed, it is included in this tag.
-
-<persfn> * marks the name of a person, when only one name
- (usually the last name) is given. Not used
- consistently where it should be.
-
-<publ> * Marks the name of a publication other than book,
- which is marked by <booki>. It is often a
- magazine or journal.
-<qpers> * Tags the name of a person who is speaking,
- within a quotation.
-<qperson> Same as <qpers>
-<cp> * Collocation, plain text -- used to tag phrases that
- should be parsed as a unit, but have no typographical
- significance.
-<qau> italic Always right-justified, as described for <au>.
-<ref> * A reference to a word in the vocabulary.
-<refs> * Marks the set of references used for a longer article
- such as a biography.
-<river> * Marks the name of a river -- a proper name
-<rj> * Right justified
-<row> * Designates a row in a table.
-<state> * Name of a geopolitical state, the first subdivision of
- a country. Includes, e.g. Canadian provinces.
-<subtypes> * Lists subtypes of the headword.
-<sup> * superscript
-<supr> * Supra. The two parts of each such field
- are stacked, one over the other, *without* a
- horizontal bar between (as in a fraction).
- Used only in one entry, for a musical notation.
-<table> * Always a filled rectangular array, having <row> and <item>
- elements.
-<td> * Table datum - one cell in a table
-<th> * Table header
-<tradename> * Tags a commercial Trade name
-<ttitle> * Table title (Larger than normal font)
-====================================================================
-
-Functional Tags
---------------------------------------------------------------------
-Tag Font Meaning
- (Comparatives are relative to the plain font.)
------------------------------------------------------------------------
-<-- --> * Comment, not a tag. These segments should be deleted
- from the written or printed text.
- Page numbers of the original text are indicated
- within such comments; these may be left in, if
- desired.
-
-<! !> * HTML-style comment. Used to indicate page numbers
- in the public domain version.
-
-<adjf> small caps Tags for the actual adjective or adverb
- comparatives or superlatives. Should be
- indexed. See also conjf (verbs) and
- decf (nouns).
-
-<altname> italic Alternative name. Usually for plants or animals,
- but also used for other cases where words
- are introduced by "also called", "called also",
- "formerly called". These are functionally
- *synonyms* for that word-sense.
-
-<altnpluf> italic Same as <altname>, but the marked word is a
- plural form, whereas the headword is singular.
-
-<amorph> * Adjective morphological segment, primarily
- the comparative and superlative forms.
- The occasional adverb morphology is
- also tagged this way.
-
-<as> * A segment occurring within the definitional
- sentence, providing an example of usage of
- the headword. Not conceptually a part of the
- actual definition.
-
-<cd> smaller spacing Collocation definition. Similar in structure
- to headword definitions (the <def> field). May
- contain an <as> field. Plain type, but with
- closer spacing than main definitions.
-
-<col> bold, Collocation. A word combination containing the
- smaller by headword (or a morphological derivative).
- 1 point The collocations do not have an explicitly
- marked part of speech.
- See also <ecol>, tagging embedded collocations.
-
-<colp> Collocation, no typographic significance.
- Used to mark a word combination defined in
- the dictionary without effect on font.
-
-<conjf> small caps The conjugated (non-infinitive) forms of
- verbs. imp. & p. p. is common, as well as
- p. pr. & vb. n. Irregular variants of
- these are less common. Words in this
- field perhaps should be indexed.
-
-<cs> smaller Collocation segment. The font and size is
- vertical normal in a cs, but the spacing between lines
- spacing is smaller (0.9 mm between lower-case letters,
- rather than 1.1 mm in the main body of the
- definition). For an on-line dictionary,
- reproducing this typography is probably
- pointless.
-
-<decf> small caps The actual morphological variants of nouns or
- pronouns. Should be indexed.
-
-<ecol> * Embedded Collocation. A word combination
- containing the headword (or a morphological
- derivative), embedded within a definition
- without a separate definition of its own.
- These collocations should be defined
- implicitly by the text of the definition in
- which they are embedded.
- See also <col>, tagging explicitly defined
- collocations.
-<er> Small Caps Entry reference. References to headwords
- within the "etymology" section are in small
- caps. Such references also occur
- in the body of definitions, and in "usage"
- segments.
- Such entry references should function as hypertext
- buttons to access that entry.
-
-<ety> * Etymology. Always contained within square
- brackets. Normal type is used for explanatory
- comments, and italics for the actual words
- (marked <ets>) considered as etymological
- sources.
-
-<ets> italic Etymological source. Words from which the
- headword was derived, or to which it is related.
- The Greek words within an etymology segment
- are invariably etymology sources, and should
- be marked as such, but are not so marked,
- even in the rare cases where the Greek word
- transliteration has been written in.
-
-<etsep> italic Etymological source, being the name of a person
- or geographical location which is the eponym
- for the concept. This is used to distinguish
- eponymous etymologies from others, and can also
- be found in the body of a definition or note,
- not only in the etymology field. Very few
- of the names that should be marked this way
- have actually been so marked, as of version
- 0.42. In cases where such eponymous names
- have not yet been thus marked, they will
- usually be marked by <xex>, the non-semantic
- italic-font marker, or, in etymologies, by
- <ets>.
-
-<ex> italic Example. An example of usage of the headword,
- usually found within an <as> or <note> segment.
-
-<fr> * Frequency of use, ordinal rank. This is used for
- WordNet entries, in which the synonyms
- were ranked in order of frequency of use.
- <fr>1</fr> indicates that the headword is the
- first word on the list of synonyms.
-
-<fu> * First use. A date at or around which the first
- use of this word in writing is recorded.
- Not in the original 1913 Webster, and usu.
- taken from a recent dictionary. Only a few
- such fields have been entered as of version
- 0.41
-
-<grk> transliteration Greek. The Greek words have been transliterated
- using the equivalents explained in the
- file "webfonts.asc". In most cases, the
- transliterations are typical for Greek
- letters, except for theta (transl = q),
- phi (transl. = f), eta (transl. = h), and
- upsilon (transl. = y, whether pronounced
- as y or u). This was to eliminate any
- ambiguity. These words occur primarily
- in etymologies, and to conform to the
- usage of <ets> should also be marked
- by <ets>, but as of version 0.41 they
- are not usually thus marked.
-
-<hw> bold, headword. Each main entry begins with the <hw>
- larger by mark, and ends at the next <hw> mark. The
- 2 points main entries are not otherwise explicitly
- marked as a distinctive field.
- The same word may appear as a headword
- several times, usually as different parts
- of speech, but sometimes with different
- entries as the same part of speech, presumably
- to indicate a different etymology.
- Within the hw field the heavy accent is
- represented by double quote ("), the
- light accent by open-single-quote (`),
- and the short dash separating syllables by
- an asterisk (*). A hyphen (-) is used to
- represent the hyphen of hyphenated words.
-
-<mark> italic, Usage mark. Almost always within square
- brackets, occasionally in parentheses or
- without any bracketing.
- but The most common usage marks,
- explanatory "Obs." = obsolete "R." = rare, "Colloq." =
- may be plain. colloquial, "Prov. Eng." = Provincial England,
- etc. are in italics. Some usage notes are also
- marked with <mark>, but are in plain. For
- simplicity, all words in this field may be
- italic, until additional explicit marks are
- added.
-
-<markp> * A usage mark in plain type (not italic). Found
- within a definition, when there is more than
- one sense-number listed. "Fig." at the head
- of an entry is the most common case.
-
-<mcol> * Multiple collocation. Similar to multiple
- headword, when two or more collocations share
- one definition; however, the two collocations
- are in-line, rather than stacked or justified.
- There may be "or" or "and" words
- (italicised), or an "etc." (plain type)
- within this field. In many cases, the
- <or/ and <and/ entities are used to
- signify the change of font for these words.
-
-<mhw> * Multiple headword. This field is used where
- more than one headword shares a single
- definition. In the dictionary, the
- (usually) two headwords are left-justified
- one below the other in the column, and are
- tied together on the right side of the
- headwords by a long right curly brace.
- This division is strictly functional,
- for analytical purposes, and does not
- affect the typography.
-
-<nmorph> * Noun morphology section. Rarely used, mostly
- for irregular personal pronouns.
-
-<note> * Explanatory note. No explicit font is indicated.
- These segments may be separate, as in the
- separate paragraphs starting <note><hand/,
- or they may just be further explanation within
- (or more usually, following) the main
- definition paragraph. Typographically,
- the notes following the main definition may
- not be distinguishable from additional
- sentences appended to the first sentence
- of a definition.
-
-<plu> * Plural. The "plural" segment starts with a
- "pl." which is italicised, but in this
- segment is not otherwise marked as
- italicised. Other words occurring in this
- segment are plain type. The "pl." can be
- easily explicitly marked if necessary.
-
-<pos> italic Part of speech. Always an abbreviation: e.g.,
- n.; v. i.; v. t.; a.; adv.; pron.; prep.
- Combinations may occur, as "a. & n.".
-
-<plw> small caps Plural word. The actual plural form of the word,
- found within a <plu> segment.
-
-<pr> * pronunciation. The default font is normal, but
- many non-ASCII characters are used.
- The pronunciation field may have more than
- one pronunciation, separated by an "<or/".
- (An "or" here is in italic, and usually is
- represented by the entity <or/).
- There may also be some commentary, such as
- "Fr."(French pronunciation) or "archaic".
- The commentaries are typically italic, and
- should be marked as such. In certain
- pronunciations there is a numbered reference
- to a root form explained in an introductory
- section on pronunciation.
- Very few of the pronunciation fields have
- been filled in. The pronunciation markings use
- a more complicated method than more modern
- dictionaries. It would be interesting to have
- these fields filled in, if there are any
- volunteers willing to do it.
-
-<q> smaller by Quotation. No bracketing quotation marks,
- two points, though occasionally \'bd-\'b8 quotations occur
- centered, within these quotations. These quotations
- Separate tend to be more complete sentences, rather
- paragraph than just phrases, such as are contained
- within quotation marks within the definition
- paragraph.
-
-<qau> italic, Quotation author. Used only for the quotations
- right justified marked with <q> that are centered in their
- own paragraphs.
-
-<qex> italic Quotation example. An example of usage of
- the headword, within quotations marked
- by <q>..</q> tags.
-
-<sd> italic Subdefinition, marked (a), (b), (c), etc. These are
- finer distinctions of word senses, used
- within numbered word-sense (for main entries),
- and also used for subdefinitions within
- collocation segments, which have no numbering of
- senses. The letter is italic, the parentheses
- are not. This tag is also used to indicate the
- lettered subdefinition when it is referred to
- at another point in the text.
-
-<ship> italic The name of a ship. Rarely used.
-
-<sing> * Singular. Analogous to the <plu> segment, but more
- rarely used, mostly for Indian tribes, which
- are listed in the plural form.
-
-<singw> small caps Singular word. The singular form of the
- plural-form headword.
-
-<sn> bold, Sense number. A headword may have over 20
- larger by different sense numbers. Within each numbered
- 2 points sense there may be lettered sub-senses. See
- the <sd> (sub-definition) field.
-
-<source> italic Source. The author of the definition. Used only
- for definitions not originally present in
- Webster 1913, and not present in the original
- version intended to mimic the 1913 printed
- dictionary. This source is used for each
- word sense, and may differ for different
- senses of a word, especially where a Web1913
- definition was substantially modified, or a
- new word sense was added to a previously
- defined word.
-
-<syn> plain Synonyms. A list of synonyms, sometimes followed
- by a <usage> segment.
-
-<usage> narrower Comparisons of word usage for words which are
- spacing sometimes confused. As with collocation segments,
- font is plain, but spacing is smaller than
- normal definition spacing. This seems pointlessly
- complicating for an on-line display.
-
-<vmorph> * Verb morphology (conjugation) segment, delimited
- by square brackets.
-
-<wordforms> * Morphological derivatives not contained in the
- bracketed segments, as above. For nouns
- derived from adjectives, adverbs from
- adjectives, etc. This segment is usually
- found at the end of the main entry. The
- adverbial and nominalized derivatives at the
- end of a main entry are usually introduced
- by an em dash [represented as two hyphens (--)].
-
-<wf> bold, Same font as <hw>, with accents and syllable
- larger by breaks marked as in the headword.
- 2 points Marks the actual morphological forms within
- a <wordforms> segment; typically, adverbial or
- nominalized form of an adjective.
-
-
-<def2> * Second definition (occasionally, a third definition is
- present). This is used where a second or third
- part of speech with the same orthography is
- placed under one headword. Within this segment,
- there will be a <pos> field, and sometimes
- a <mark> and/or a quotation.
-
-<specif> * "Specifically:" Used to mark the words "specifically",
- "Hence", "as" which are used to introduce a second
- definition typically more specific than the first,
- but in general derived by extension of the initial
- definition. This functions as a warning of multiple
- definitions where the sense-numbers are not explicitly
- used. It is also useful in separate senses, to
- tag polysemous definitions which may be
- specializations or generalizations of the preceding
- definition.
-
-<pluf> italic. Plural form.
- Used exclusively to mark the "pl." abbreviation,
- which introduces a definition for the headword,
- *when used in the plural form*. Not related to
- <plu>, which spells out the plural form, but does
- not define it.
-
-<uex> italic Usage example. Used only a few times, within
- <usage> segments.
-
-<isa> italic supertype (hypernym) the inverse of <stype> and
- identical to <hypen> but not derived from WordNet.
-
-<chform> plain, Chemical formula. The letters are plain font,
- numbers but the numbers are subscript. This is mostly
- subscript useful as a functional mark to pinpoint
- chemicals.
-
-<chformi> plain, Chemical formula same as <chform>, but not
- processed specially by the tag-converter program.
- The letters are plain font, but the numbers are
- subscript.
- Used in place of <chform> when the formula has
- a tag inside, which cannot now be processed by the
- <chform> processing routine.
-
-<chname> * chemical name. Used to allow a IUPAC chemical
- name to be processed as a unit in spite of
- embedded dashes, parentheses, and commas.
-
-<see> * "see" reference to related words, outside of the
- main <def>definition</def> field.
-
-<mathex> italic Mathematical expression. In this dictionary,
- essentially all letters (used as variable labels)
- in math expressions are in italic font.
- The "+" and "-" may also appear typographically
- different from elsewhere in the dictionary.
-
-<ratio> italic Also a mathematical expression, but the colon and
- double colon may have a different typography
- than usual, as in <ratio>a:b</ratio>
-
-<singf> italic Singular form. Analogous to <pluf>, to define
- the singular word where the headword is the
- plural form. ** only modifies the word "sing."
-
-<mord> * Morphological derivation. Used to mark the
- entry-reference portions of those
- entries which are defined as morphological
- derivatives (plural, p. p., imp.) of other
- headwords. Used just as an attempt to
- mark and regularize the entry format.
- May be ignored typographically.
-
-<fract> a stack, Fraction. Used for non-numerical fractions
- with which cannot be expressed as a <frac12/-style
- numerator, entity. The forward slash "/" is to be
- horizontal interpreted as a horizontal line separating
- bar, and the numerator and denominator.
- denominator
-
-<exp> superscript, Exponential. Used in mathematical expressions.
- smaller
- font.
-
-<xlati> italic Translation (e.g. of Greek), in the body of a
- definition or etymology. Used only twice.
-
-<tran> italic Word translated: the word in italic is translated
- by a subsequent word. Usually in etymologies, where
- the word translated is not actually etymologically
- related to the headword. The translated word
- is not necessarily English.
-
-<tr> italic translation of the preceding word (or of the
- headword) into English.
-
-<fexp> * Functional expression (math). The function names are
- in plain type, the variables are italic.
-
-<iref> italic Illustration reference. Used only occasionally, not
- yet (v. 0.41) consistently.
-
-<figref> italic Figure reference.
-
-<figcap> * Figure caption.
-
-<figtitle> * Figure title.
-
-<funct> * tags a mathematical function or expression.
-
-<chreact> * Chemical reaction. Similar to chemical formulas (which
- are contained but not explicitly marked), with
- some other symbols.
-
-<ptcl> italic Verb Particle. Only a few particles were actually
- marked, but in a future version more may be.
-
-<tabtitle> ? Table Title. Used only once.
-
-<title> italic Title of a literary work, movie, opera, musical
- composition, etc. Used rarely but should be
- used in every case, except in <au> references.
-
-<root> * Square root -- differs from the entity <root/,
- which is a square root sign that does not extend
- beyond the number following it. The <root>
- field has a bar (vinculum) over the expression
- within the field, as well as the square root symbol
- preceding the expression in the field. Used only
- once.
-
-<vinc> * Vinculum. In a mathematical expression, a bar
- extending over the expression within the field.
- Used only once. This apparently serves the same
- function as parentheses, of causing the
- expression within the field to be evaluated
- and the result used as the (mathematical) value
- of the field.
-
-<nul> plain Nultype. An older version of <plain>.
-
-<cd2> * Second collocation definition. Somewhat similar to
- <def2>. Purely a mark to reduce functional ambiguity,
- with no effect on the typography.
-
-<hypen> * Hypernym. Mark introduced for the World Wide Webster,
- when adding words from WordNet. In most cases, this
- tag marks the WordNet hypernym (for nouns and verbs).
- Where the <au> mark is PJC or includes a +PJC, the
- hypernym may not be the same as in WordNet. The words
- marked by this tag need to be bracketed in some way,
- but this is deferred until the definitions included
- with the hypernyms have been deleted, and other
- disambiguating marks substituted.
-
-<stype> italic Subtype. A functional mark, to point out words which
- are conceptually subtypes of the headword.
-
-<styp> * Subtype. A functional mark, to point out words which
- are conceptually subtypes of the headword, but
- with no *typographical* significance.
-
-<simto> * Similar-to. A semantic relational mark for
- closely related words which are not quite
- synonyms, nor hypernyms, nor hyponyms. Introduced
- with WordNet data.
-
-<conseq> * Consequence. For adjectives, is an attribute which
- or is a consequence of possessing the headword attribute.
-<hascons> Introduced with WordNet data.
-
-<consof> * Consequence of. For adjectives, an attribute which
- implies the headword as a natural consequence.
-
-<part> italic Part. Marks a word designating something which is
- conceptually a part of the headword. Rarely used.
-
-<parts> italic Part, plural form. Same as <part>, but marks the
- name of the part in its plural form.
-
-<partof> * Marks a word designating something of which the headword
- is conceptually a part. Inverse of <part>.
- This is very broad, and may mean constituent or
- separable part.
- Rarely used.
-
-<contxt> * Context. Used only for introductions to definitions,
- giving the context of usage, which are not part
- of the definition proper, as:
- <contxt>when used of a person:</contxt>
-
-<grp> * Marks the name of a group of people not formally
- organized.
-
-<membof> italic marks a group of which the headword is a member.
- This is rarely used, but should be indexed as
- an entry word or phrase.
-
-<member> italic marks a member of a group defined by the headword.
- This is rarely used, but should be indexed as
- an entry word or phrase.
-
-<members> italic Same as <member>, but marks a plural word,
- designating the name of the members in its plural form,
- for lack of ambiguity.
-
-<method> * Designates a special type of definition which
- describes a method for achieving the headword,
-
- used only once for the word "amend". The
- subdefinitions begin with "by".
-
-<corpn> * Name of a business company, corporation, or partnership.
- Started using November 1988. Rare.
-
-<corr> italic Correlative. A word intimately associated with the
- headword in a manner such that one cannot
- appear without the other. Not exactly an inverse.
-
-<qperson> italic marks the name of a person, quoted in a dialogue.
- Used only in <q> blockquotes as of vers. 0.45.
-
-<org> * marks the name of an organization; sometimes used
- for the names of groups of people not
- formally organized (see also <grp>).
-
-<prod> italic produces. Designates a substance produced by
- a living organism. Rarely used.
-
-<prodp> * produces (plainfont). Designates a substance
- produced by a living organism. Same as <prod>,
- but does not affect font. Rarely used.
-
-<prodby> * produced by. Designates a living organism which
- produces the headword substance. Rarely used.
-
-<prodmac> italic produces. Designates an object or substance produced
- by a machine or process. Rarely used.
-
-<stage> italic life stage of an organism. Used to indicate
- variant forms of an organism defined by the
- headword. Rarely used.
-
-<stageof> * an organism one of whose life stages is the headword.
- Inverse (correlative) of <stage>. Rarely used.
-
-<inv> italic inversely related to headword -- e.g. depository
- is the inverse of depositor; buyer is the inverse of
- seller. Called "correlative" in the Webster 1913 and
- the CIDE. Rarely used.
-
-<methodfor> italic is a method to accomplish the action defined by
- the headword. Rarely used, and only in the
- supplemental section.
-
-<examp> italic example or instance of the headword, where the
- tagged and emphasized word is not a proper subtype.
---------------------------------------
-<p><hw>Pa*ron"y*mous</hw> <p><sn>2.</sn> <def>Having a similar sound, but different orthography and different meaning; -- said of certain words, as <examp>all</examp> and <examp>awl</examp>; <examp>hair</examp> and <examp>hare</examp>, etc.</def><br/
-[<source>1913 Webster</source>]</p>
--------------------------------------
-
-<sfield> * subfield of the headword, which must be a field
- of study or of knowledge
-<stage> italic a stage of life of the headword -- for living things,
- such as insects, whose life stages may take different
- names.
-
-<unit> italic a unit of measure, usually preceded by a number.
- Also used to tag the unit of a measure which is the
- headword.
-
-<uses> italic tags a tool or method used by the headword,
- which is usually some process.
-
-<usedfor> * tags a method or process for which the headword
- is a tool.
-
-<usedby> italic tags a tool or method which uses the headword,
- which is usually a physical object.
-
-<perf> italic performs -- tags a word which is a process or
- activity performed by the headword.
-
-<recipr> italic reciprocal -- used for cases where the tagged word
- is a reciprocal participant in an action, such as
- donor and recipient. The difference between this and
- <inv> inverse has not yet been systematically settled.
- Used seldom, and mostly in the supplemented version.
-
-<sig> italic significance, meaning -- used in definitions where the
- actual meaning is prefixed with commentary explaining
- usage or other attributes of the word, as with
- prefixes or suffixes.
-
-<wns> italic WordNet sense. Where known, the correspondence of the
- sense of an entry with that of WordNet 1.6 is
- given after the definition, in a tag of the
- form: <wns>[wns=3]</wns>, in which the number
- is the numbered sense in WordNet.
-
-<w16ns> italic WordNet version 1.6 sense. See <wns> for
- explanation.
-<wnote> * A note related to usage in the corresponding
- WordNet definition.
- =============================================================
-Biological classifications:
----------------------------
-<spn> italic Species name. Used to mark the taxonomic names
- of living things which are represented in
- italic font in the original printed version.
- Originally, not only species, but genera, orders and
- families were also thus marked. The conversion from
- <spn> to <fam>, <gen>, or <ord> is not completed, and
- <spn> may still be found marking such groups.
- However, orders and families are also frequently
- mentioned in the original in normal font, and in such
- cases are not marked with any tag. So, this mark
- is not a reliable indicator of all mentions of
- taxonomic names.
-<kingdom> italic Taxonomic biological Kingdom name.
-<phylum> italic Taxonomic phylum name.
-<subphylum> italic Taxonomic subphylum name.
-<class> italic Taxonomic class name.
-<subclass> italic Taxonomic subclass name.
-<ord> italic Taxonomic order name.
- Also used for suborders, initially.
-<subord> italic Taxonomic suborder name.
-<suborder> italic Taxonomic suborder name.
-<fam> italic Taxonomic family name. Also used to tag "tribes".
-<subfam> italic Taxonomic subfamily name.
-<gen> italic Taxonomic genus name.
-<var> italic Variety. Used to mark subspecies or varieties below
- the level of species in living organism systematic
- names.
-
-<varn> italic Variety. Used to mark subspecies or varieties below
- the level of species in living organism systematic
- names. Duplicative variant of <var>
-
-
+ FIELD MARKS FOR WEBSTER 1913 and CIDE
+ =====================================
+Tagset.web:
+ Explanations of the tags used to mark the Webster 1913 dictionary
+and the CIDE (Collaborative International Dictionary of English).
+Note that the list of tags used to mark the public domain version
+of this dictionary is shorter than the full set described here.
+ If any tag is not listed here, it is either (1) one of the
+"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
+ (2) a functional field with no effect on the typography.
+
+Last modified March 12, 1999.
+ For questions, contact:
+ Patrick Cassidy cassidy@micra.com
+ 735 Belvidere Ave.
+ Plainfield, NJ 07062
+ (908) 561-3416 or (908) 668-5252
+-------------------------------------------------------------
+A separate file, webfont.asc, contains the list of the individual
+non-ASCII characters represented by either higher-order hexadecimal
+character marks (e.g., \'94, for o-umlaut) or by entity tags
+(e.g., <root/, for the square root symbol.)
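A minimal sketch of how these two kinds of marks might be decoded, assuming entity tags always take the form <name/ with no closing angle bracket. The tables below are illustrative only: they hold just the marks mentioned in this tag list, the Unicode targets are assumptions, and the function name decode_marks is hypothetical. The authoritative mapping is in webfont.asc.

```python
import re

# Illustrative tables only; the full mapping is defined in webfont.asc.
# '94 = o-umlaut, 'bd = open double quote, 'b8 = close double quote.
HEX_MARKS = {"94": "\u00f6", "bd": "\u201c", "b8": "\u201d"}
# Entity tags are written <name/ (assumed: no closing ">").
ENTITIES = {"root": "\u221a"}

HEX_RE = re.compile(r"\\'([0-9a-fA-F]{2})")
ENTITY_RE = re.compile(r"<([A-Za-z0-9]+)/")

def decode_marks(text):
    """Replace known hex marks and entity tags; leave unknown ones untouched."""
    text = HEX_RE.sub(
        lambda m: HEX_MARKS.get(m.group(1).lower(), m.group(0)), text)
    return ENTITY_RE.sub(
        lambda m: ENTITIES.get(m.group(1), m.group(0)), text)
```

Marks that are missing from the tables pass through unchanged, so gaps in the mapping stay visible in the output rather than being silently dropped.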
+--------------------------------------------------------------
+ Use of tags:
+ In the MICRA electronic version of the 1913 Webster, each part of
+the entry headed by an entry word ("headword") is labeled so that no
+part of the entry except some punctuation marks should be found
+outside of all fields, i.e. every character should be within some tagged
+field. In the following description, the word "segment" usually refers to
+a major part of an entry such as an etymology or a definition or a
+collocation segment or a usage block, containing more than one field.
+The term "field" may also be used similarly to "segment", but may also
+denote single-word fields, such as an alternative spelling, labeled <asp>.
+
+ Note: The tags on this list are similar in structure to SGML tags. Each
+tag on this list marks a field; each field opens with a tagname between
+angle brackets thus: <tagname>, and closes with a similar tag containing
+the forward slash thus: </tagname>. No tags are used without closing
+tags. Thus the HTML <BR> to indicate a line break is symbolized
+here as an entity, <br/, and every <p> has a corresponding </p>.
+ The absence of an end-field tag, or the presence of an end-field tag
+without a prior begin-field tag constitutes a typographical error, of which
+there may be a significant number. Any errors detected should be brought
+to the attention of PJC or the appropriate editor.
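The pairing rule above lends itself to a mechanical check. The following is a rough sketch, not the checker used by the editors: the function name find_tag_errors and the tag-name pattern are assumptions. Entity tags such as <br/ are ignored because they do not end in ">".

```python
import re

# Matches <name> and </name>. Entity tags such as <br/ do not match because
# they have no closing ">". The dot allows names like point1.5.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)>")

def find_tag_errors(text):
    """Return a list of (offset, message) for unbalanced field tags."""
    errors = []
    stack = []  # (tagname, offset) of fields that are still open
    for m in TAG_RE.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            errors.append((m.start(), f"unexpected </{name}>"))
    # Anything left on the stack was opened but never closed.
    errors.extend((pos, f"unclosed <{name}>") for name, pos in stack)
    return errors
```

A tag that is documented as having no closing tag, such as <img>, will be reported as unclosed by this sketch.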
+ Most of the tagged fields are presented in the text in italic type,
+with a number of exceptions. Where a word is contained within more than
+one field, the innermost field determines the font to be used. Wherever
+recognizable functional fields were found, an attempt was made to tag the
+field with a functional mark, but in many cases, words were italicised only
+to represent the word itself as a discourse entity, and in some such cases,
+the "italic" mark <it> was used, implying nothing regarding functionality
+of the word. The base font is considered "plain". Where an italic field
+is indicated, parentheses or brackets within the field are not italicised.
+ Where no font is specified for a tag, the tag is merely a functional
+division, and was printed in plain font unless otherwise tagged. This type
+of segment is marked by an asterisk (*) where the font name would be.
+ The size of the "plain" font in the original text is about 1.6 mm for
+the height of capitalized letters.
+=============================================================
+Explicit typographical tags:
+ These were used where the purpose of a different font was merely to
+distinguish a word from the body of the text, and no explicit functional
+tag seemed appropriate.
+-----------------------------------
+Tag Font
+-----------------------------------
+Explicit formatting tags:
+. . . . . . . . . . . . . . . . . .
+<plain> plain font (that used in the body of a definition) --
+ normally not marked, except within fields of
+ a different font.
+<it> italic (in master files)
+<i> italic (for use in HTML presentation)
+<bold> bold (in master files)
+<b> bold (for use in HTML presentation)
+<colf> bold, Collocation font. Same font as used in collocations.
+ smaller This is used only in the list of "un-" words not
+ by 1 point actually defined in the dictionary. Probably could be
+ replaced by a segment mark for the entire list!
+ The "un-" words should be indexed as headwords.
+
+<ct> bold Same as <colf>, a font similar to that used in
+ collocations. However, this tag is used in a table
+ and could be set to a different font.
+
+<h1> * HTML tag -- largest heading font.
+
+<h2> * HTML tag -- second largest heading font.
+
+<headrow> * Marks a Row title in a table.
+
+<hwf> Font the same as the headword <hw>, though the field is
+ not a headword. Used only once.
+
+<mitem> * Multiple items, a set of items in a table.
+<point ...> A series of point size markers, many unique.
+<point1.5> * One of the tags of the form <point**> where **
+<point6> represents the typographic point size of the
+ enclosed text.
+<pre> An HTML tag indicating that the enclosed text is
+ of teletype form, preformatted in a uniform-spaced
+ font.
+<sc> small caps (used mostly for "a. d.", "b. c.")
+ This is the same font as <er>, but has no functional
+ or semantic significance
+<str> group of table data elements in a table
+<sub> subscript, like <subs>
+<subs> subscript
+<sups> superscript
+<supr> superscript
+<sansserif> Sans-serif font
+<stypec> Bold (collocation font) and also a subtype.
+<tt> HTML tag -- teletype font
+<universbold> A squared bold font without serifs approximating the
+ "universe bold" font on the HP Laserjet4, slightly
+ larger than the capitals in a definition body. Used
+ in expositions describing shapes, such as
+ "Y", "T", "U", "X", "V", "F".
+<vertical> Vertically organized column.
+<column1> Vertically organized column -- only part of a table
+ which needs to be completed. Used once.
+<...type> A series of tags, many unique, designating certain
+ unusual fonts, such as "bourgeoistype" for
+ "bourgeois type", in the section on typography.
+ Most of these occur only once, in the section on fonts.
+<antiquetype>
+<blacklettertype>
+<boldfacetype>
+<bourgeoistype>
+<boxtype>
+<clarendontype>
+<englishtype>
+<extendedtype>
+<frenchelzevirtype>
+<germantype>
+<gothictype>
+<greatprimertype>
+<longprimertype>
+<miniontype>
+<nonpareiltype>
+<oldenglishtype>
+<oldstyletype>
+<pearltype>
+<picatype>
+<scripttype>
+<smpicatype>
+<typewritertype>
+
+=============================================================
+Tags with semantic content:
+. . . . . . . . . . . . . . . . . . . . . . . . . . .
+<altsp> * Alternative spelling segment. Almost always
+ contained within square brackets after the main
+ definition segment. Expository words
+ such as "Spelled also" are in plain font;
+ the actual alternative spelling is marked by
+ <asp> ... </asp> tags within this segment.
+
+<ant> italic Antonym.
+
+<asp> italic Alternative spelling. The actual word which is an
+ alternative spelling to the headword. These
+ are functionally synonyms of the headword. In
+ most cases these also occur as headwords, with
+ reference to the word where the actual definition
+ is found, but not all such words are listed
+ separately, particularly if the spelling is
+ close enough to the headword to be found at the
+ same point in the dictionary. Whether listed
+ separately or not, these words should
+ be indexed at this location, also.
+
+<au> italic Authority or author. Used where an authority is
+ (may be right- given for a definition, and also used for the
+ justified. See author, where a quotation within double quotes
+ in the section is given in the same paragraph as the
+ on formatting). definition. The double quotes are indicated
+ by the open-quote (\'bd) and close-quote
+ (\'b8). In both cases, it is typically
+ right-justified, almost always fitting on
+ the same line with the last line of the
+ definition or quotation.
+ Within collocation segments, it is usually
+ used only after quotations, and is not right-
+ justified, except occasionally where it
+ would be close to the right margin, and then
+ apparently it is right-justified. We have
+ not explicitly marked those which are
+ right-justified, but they can be
+ recognized because they are on a line by
+ themselves, preceded by two carriage returns.
+
+<bio> * Marks a biography. Should be longer than
+ a short mention of who a person was, which
+ is typically included as a definition.
+
+<biography> * Same as <bio>
+
+<booki> italic Marks the name of a book, pamphlet, or similar
+ document.
+
+<branchof> * A field of knowledge of which the headword
+ is a division.
+
+<caption> * Caption of a figure or table.
+
+<cas> * tags the CAS (Chemical Abstracts Service) registry
+ number for a chemical substance.
+
+<causes> italic tags the infectious disease caused by the headword.
+ Implied type of the agent is a microorganism, and
+ the tag must mark a disease.
+
+<causesp> * Same as <causes> without the italic type.
+<causedbyp> * Same as <causedby> without the italic type.
+
+<causedby> italic inverse of causes: tags the causative agent of an
+ infectious disease, which is the headword.
+ The tag must mark a microorganism, virus, or
+ prion, and the implied type of the headword is
+ a disease.
+
+<centered> Used only for the single letter in the headers to each
+ letter of the alphabet.
+
+<city> * marks the proper name of a city. Used only
+ occasionally and not consistently at this stage.
+
+<cnvto> italic Converted to: used to tag substances which are
+ products prepared by conversion from the
+ headword. Usually chemicals or complex
+ products from natural materials. Rarely used
+ up to 1998.
+
+<colheads> * List of heads for the columns of a table.
+
+<coltitle> * Title of a column in a table.
+
+<comm> * Comment -- differs from <note> in being in-line with
+ the definition paragraph. Provides a little
+ additional information.
+
+<company> * Name of a company (commercial firm). Compare <org>
+
+<compof> italic Composed of. Tags a substance of which the
+ headword is at least partly composed. The
+ substance may be particulate, such as
+ diatoms composing diatomaceous earth.
+
+<contains> * marks an object contained within the headword.
+
+<contr> italic Contrasting word. Not exactly an antonym, which
+ is marked <ant>, but a contrasting word which is
+ often introduced as "opposite to" or "contrasts
+ with".
+
+<country> * Name of a country (nation) of the world.
+
+<cref> italic Collocation reference. A reference to a collocation.
+ Each such collocation should have its own entry,
+ marked by <col> ... </col> tags, and these
+ references should function as hypertext buttons
+ to access that entry.
+
+<date> * A Date, of any type, e.g. <date>Dec. 25</date>.
+
+<datey> * Date-with-year tags a date containing a year.
+
+<def> * definition. The definition may have subfields,
+ particularly <as> (an illustrative phrase
+ starting with "as" or "thus" and containing
+ the headword (or a morphological derivative)).
+ The <mark>, \'bd...\'b8 quotations (left and
+ right double quotes) and <au> fields may be
+ found within a definition field, but should be,
+ and usually are, located outside the definition
+ proper. The marking macro was
+ inconsistent in this placement, and the
+ exclusion of the <mark>, <au> and quotations
+ needs to be completed by the proof-readers.
+ Certain definitions contain <pos>
+ fields within them, where the headword is
+ an irregular derivative of another headword.
+ In these cases, the <pos> field follows
+ immediately after the <def> tag, and these
+ entries do not have a separate <pos> field.
+ In such cases, the <pos> field is italic, as
+ usual.
+
+<divof> * Division of the headword, usually an organization.
+ E. g. a faculty or department of a university,
+ or a United Nations agency.
+
+<edi> * Marks an education institution, a subtype of
+ organization.
+
+<emits> * tags a physical object or form of radiation
+ emitted by the headword
+
+<figure> Just a place-holder for illustrations, but seldom used.
+
+<film> italic Marks the name of a movie film.
+
+<fld> italic Field of specialization. Most often used for
+ Zoology and Botany, but many "fields of
+ specialization" are marked for technical
+ terms. The parentheses are usually within this
+ field, but are not themselves in italics.
+
+<geog> * Name of a geographical region of any size;
+ if applicable, the more specific <city>,
+ <state>, or <country> are preferred.
+
+<hypen> * Hypernym. Points to the hypernym from WordNet 1.5.
+ Initially, used only for entries extracted
+ from WordNet 1.5. Not present in the original
+ 1913 version.
+
+<illu> * Illustrative usage -- mostly from WordNet, and placed
+ outside the definition, in contrast to <as> usage.
+ These should be converted to <as>...</as> illustrative
+ usage format for consistency.
+
+<illust> * Illustration place-holder. Seldom used.
+<img> * HTML usage -- points to an image file, usually
+ .gif or .jpg. These have no closing tag, and
+ will appear as errors in parsing.
+<intensi> * Points to a word whose meaning is an intensified
+ form of the headword. Taken from WordNet
+ tags, used with some adjectives from WordNet
+<item> * Designates one item in a row of a table. Used only when
+ intervening spaces do not serve properly as natural
+ field separators.
+<itran> italic Translation into a foreign (non-English) language
+ of the previous word in the text -- italic font.
+ (<sig> is a translation into English)
+<itrans> italic Same as <itran>
+<jour> * Title of a journal (periodical).
+<matrix> * Always a filled rectangular array.
+<matrix2x5> * A 2x5 matrix (2 rows by 5 columns).
+<mstypec> * Multiple synonymous subtypes -- used in
+ def. of "grass".
+<mtable> * Multiple table, encloses <table> figures.
+<musfig> * Music figure. Only in a note under the entry "Figure",
+ the two numbers of each such field
+ are bold, 20 point type, stacked as in a fraction with
+ a bar between them, but also having a horizontal stroke
+ midway through each numeral. Unique to this entry.
+<p> * paragraph tag, used always in pairs. Line breaks may
+ be embedded inside the paragraphs.
+<person> * marks the proper name of a person. Used only
+ occasionally, but should be used more frequently
+ for cases where first names are abbreviated,
+ to reduce ambiguity of the period for automatic
+ analysis. Where a title is given, prefixed
+ or postfixed, it is included in this tag.
+
+<persfn> * marks the name of a person, when only one name
+ (usually the last name) is given. Not used
+ consistently where it should be.
+
+<publ> * Marks the name of a publication other than book,
+ which is marked by <booki>. It is often a
+ magazine or journal.
+<qpers> * Tags the name of a person who is speaking,
+ within a quotation.
+<qperson> Same as <qpers>
+<cp> * Collocation, plain text -- used to tag phrases that
+ should be parsed as a unit; the tag has no typographical
+ significance.
+<qau> italic Always right-justified, as described for <au>.
+<ref> * A reference to a word in the vocabulary.
+<refs> * Marks the set of references used for a longer article
+ such as a biography.
+<river> * Marks the name of a river -- a proper name
+<rj> * Right justified
+<row> * Designates a row in a table.
+<state> * Name of a geopolitical state, the first subdivision of
+ a country. Includes, e.g. Canadian provinces.
+<subtypes> * Lists subtypes of the headword.
+<sup> * superscript
+<supr> * Supra. The two parts of each such field
+ are stacked, one over the other, *without* a
+ horizontal bar between (as in a fraction).
+ Used only in one entry, for a musical notation.
+<table> * Always a filled rectangular array, having <row> and <item>
+ elements.
+<td> * Table datum - one cell in a table
+<th> * Table header
+<tradename> * Tags a commercial Trade name
+<ttitle> * Table title (Larger than normal font)
+====================================================================
+
+Functional Tags
+--------------------------------------------------------------------
+Tag Font Meaning
+ (Comparatives are relative to the plain font.)
+-----------------------------------------------------------------------
+<-- --> * Comment, not a tag. These segments should be deleted
+ from the written or printed text.
+ Page numbers of the original text are indicated
+ within such comments; these may be left in, if
+ desired.
+
+<! !> * HTML-style comment. Used to indicate page numbers
+ in the public domain version.
+
+<adjf> small caps Tags for the actual adjective or adverb
+ comparatives or superlatives. Should be
+ indexed. See also conjf (verbs) and
+ decf (nouns).
+
+<altname> italic Alternative name. Usually for plants or animals,
+ but also used for other cases where words
+ are introduced by "also called", "called also",
+ "formerly called". These are functionally
+ *synonyms* for that word-sense.
+
+<altnpluf> italic Same as <altname>, but the marked word is a
+ plural form, whereas the headword is singular.
+
+<amorph> * Adjective morphological segment, primarily
+ the comparative and superlative forms.
+ The occasional adverb morphology is
+ also tagged this way.
+
+<as> * A segment occurring within the definitional
+ sentence, providing an example of usage of
+ the headword. Not conceptually a part of the
+ actual definition.
+
+<cd> smaller spacing Collocation definition. Similar in structure
+ to headword definitions (the <def> field). May
+ contain an <as> field. Plain type, but with
+ closer spacing than main definitions.
+
+<col> bold, Collocation. A word combination containing the
+ smaller by headword (or a morphological derivative).
+ 1 point The collocations do not have an explicitly
+ marked part of speech.
+ See also <ecol>, tagging embedded collocations.
+
+<colp> Collocation, no typographic significance.
+ Used to mark a word combination defined in
+ the dictionary without effect on font.
+
+<conjf> small caps The conjugated (non-infinitive) forms of
+ verbs. imp. & p. p. is common, as well as
+ p. pr. & vb. n. Irregular variants of
+ these are less common. Words in this
+ field perhaps should be indexed.
+
+<cs> smaller Collocation segment. The font and size are
+ vertical normal in a cs, but the spacing between lines
+ spacing is smaller (0.9 mm between lower-case letters,
+ rather than 1.1 mm in the main body of the
+ definition). For an on-line dictionary,
+ reproducing this typography is probably
+ pointless.
+
+<decf> small caps The actual morphological variants of nouns or
+ pronouns. Should be indexed.
+
+<ecol> * Embedded Collocation. A word combination
+ containing the headword (or a morphological
+ derivative), embedded within a definition
+ without a separate definition of its own.
+ These collocations should be defined
+ implicitly by the text of the definition in
+ which they are embedded.
+ See also <col>, tagging explicitly defined
+ collocations.
+<er> Small Caps Entry reference. References to headwords
+ within the "etymology" section are in small
+ caps. Such references also occur
+ in the body of definitions, and in "usage"
+ segments.
+ Such entry references should function as hypertext
+ buttons to access that entry.
+
+<ety> * Etymology. Always contained within square
+ brackets. Normal type is used for explanatory
+ comments, and italics for the actual words
+ (marked <ets>) considered as etymological
+ sources.
+
+<ets> italic Etymological source. Words from which the
+ headword was derived, or to which it is related.
+ The Greek words within an etymology segment
+ are invariably etymology sources, and should
+ be marked as such, but are not so marked,
+ even in the rare cases where the Greek word
+ transliteration has been written in.
+
+<etsep> italic Etymological source, being the name of a person
+ or geographical location which is the eponym
+ for the concept. This is used to distinguish
+ eponymous etymologies from others, and can also
+ be found in the body of a definition or note,
+ not only in the etymology field. Very few
+ of the names that should be marked this way
+ have actually been so marked, as of version
+ 0.42. In cases where such eponymous names
+ have not yet been thus marked, they will
+ usually be marked by <xex>, the non-semantic
+ italic-font marker, or, in etymologies, by
+ <ets>.
+
+<ex> italic Example. An example of usage of the headword,
+ usually found within an <as> or <note> segment.
+
+<fr> * Frequency of use, ordinal rank. This is used for
+ WordNet entries, in which the synonyms
+ were ranked in order of frequency of use.
+ <fr>1</fr> indicates that the headword is the
+ first word on the list of synonyms.
+
+<fu> * First use. A date at or around which the first
+ use of this word in writing is recorded.
+ Not in the original 1913 Webster, and usu.
+ taken from a recent dictionary. Only a few
+ such fields have been entered as of version
+ 0.41
+
+<grk> transliteration Greek. The Greek words have been transliterated
+ using the equivalents explained in the
+ file "webfont.asc". In most cases, the
+ transliterations are typical for Greek
+ letters, except for theta (transl. = q),
+ phi (transl. = f), eta (transl. = h), and
+ upsilon (transl. = y, whether pronounced
+ as y or u). This was to eliminate any
+ ambiguity. These words occur primarily
+ in etymologies, and to conform to the
+ usage of <ets> should also be marked
+ by <ets>, but as of version 0.41 they
+ are not usually thus marked.
+
+<hw> bold, headword. Each main entry begins with the <hw>
+ larger by mark, and ends at the next <hw> mark. The
+ 2 points main entries are not otherwise explicitly
+ marked as a distinctive field.
+ The same word may appear as a headword
+ several times, usually as different parts
+ of speech, but sometimes with different
+ entries as the same part of speech, presumably
+ to indicate a different etymology.
+ Within the hw field the heavy accent is
+ represented by double quote ("), the
+ light accent by open-single-quote (`),
+ and the short dash separating syllables by
+ an asterisk (*). A hyphen (-) is used to
+ represent the hyphen of hyphenated words.
+
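A small sketch of how the <hw> conventions above might be applied when indexing: entries are split at each <hw> mark, and the accent and syllable marks are removed to obtain a plain spelling. The function names are hypothetical, and the sketch does not treat <mhw> groups specially, so each headword inside a multiple-headword field would start its own entry.

```python
import re

HW_RE = re.compile(r"<hw>(.*?)</hw>", re.S)

# Heavy accent ("), light accent (`) and syllable break (*) are dropped;
# the hyphen of hyphenated words is kept.
_STRIP = {ord('"'): None, ord("`"): None, ord("*"): None}

def plain_headword(marked):
    """Return the headword without accent marks or syllable breaks."""
    return marked.translate(_STRIP)

def split_entries(text):
    """Yield (plain headword, body) pairs; a body runs to the next <hw> mark."""
    marks = list(HW_RE.finditer(text))
    for i, m in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(text)
        yield plain_headword(m.group(1)), text[m.end():end]
```

For example, plain_headword('A*ban"don') gives 'Abandon', which can serve as an index key while the marked form is kept for display.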
+<mark> italic, Usage mark. Almost always within square
+ brackets, occasionally in parentheses or
+ without any bracketing.
+ but The most common usage marks,
+ explanatory "Obs." = obsolete "R." = rare, "Colloq." =
+ may be plain. colloquial, "Prov. Eng." = Provincial England,
+ etc. are in italics. Some usage notes are also
+ marked with <mark>, but are in plain. For
+ simplicity, all words in this field may be
+ italic, until additional explicit marks are
+ added.
+
+<markp> * A usage mark in plain type (not italic). Found
+ within a definition, when more than one
+ sense-number is listed. "Fig." at the head
+ of an entry is the most common case.
+
+<mcol> * Multiple collocation. Similar to multiple
+ headword, when two or more collocations share
+ one definition; however, the two collocations
+ are in-line, rather than stacked or justified.
+ There may be "or" or "and" words
+ (italicised), or an "etc." (plain type)
+ within this field. In many cases, the
+ <or/ and <and/ entities are used to
+ signify the change of font for these words.
+
+<mhw> * Multiple headword. This field is used where
+ more than one headword shares a single
+ definition. In the dictionary, the
+ (usually) two headwords are left-justified
+ one below the other in the column, and are
+ tied together on the right side of the
+ headwords by a long right curly brace.
+ This division is strictly functional,
+ for analytical purposes, and does not
+ affect the typography.
+
+<nmorph> * Noun morphology section. Rarely used, mostly
+ for irregular personal pronouns.
+
+<note> * Explanatory note. No explicit font is indicated.
+ These segments may be separate, as in the
+ separate paragraphs starting <note><hand/,
+ or they may just be further explanation within
+ (or more usually, following) the main
+ definition paragraph. Typographically,
+ the notes following the main definition may
+ not be distinguishable from additional
+ sentences appended to the first sentence
+ of a definition.
+
+<plu> * Plural. The "plural" segment starts with a
+ "pl." which is italicised, but in this
+ segment is not otherwise marked as
+ italicised. Other words occurring in this
+ segment are plain type. The "pl." can be
+ easily explicitly marked if necessary.
+
+<pos> italic Part of speech. Always an abbreviation: e.g.,
+ n.; v. i.; v. t.; a.; adv.; pron.; prep.
+ Combinations may occur, as "a. & n.".
+
+<plw> small caps Plural word. The actual plural form of the word,
+ found within a <plu> segment.
+
+<pr> * pronunciation. The default font is normal, but
+ many non-ASCII characters are used.
+ The pronunciation field may have more than
+ one pronunciation, separated by an "<or/".
+ (An "or" here is in italic, and usually is
+ represented by the entity <or/).
+ There may also be some commentary, such as
+ "Fr."(French pronunciation) or "archaic".
+ The commentaries are typically italic, and
+ should be marked as such. In certain
+ pronunciations there is a numbered reference
+ to a root form explained in an introductory
+ section on pronunciation.
+ Very few of the pronunciation fields have
+ been filled in. The pronunciation markings use
+ a more complicated method than more modern
+ dictionaries. It would be interesting to have
+ these fields filled in, if there are any
+ volunteers willing to do it.
+
+<q> smaller by Quotation. No bracketing quotation marks,
+ two points, though occasionally \'bd-\'b8 quotations occur
+ centered, within these quotations. These quotations
+ Separate tend to be more complete sentences, rather
+ paragraph than just phrases, such as are contained
+ within quotation marks within the definition
+ paragraph.
+
+<qau> italic, Quotation author. Used only for the quotations
+ right justified marked with <q> that are centered in their
+ own paragraphs.
+
+<qex> italic Quotation example. An example of usage of
+ the headword, within quotations marked
+ by <q>..</q> tags.
+
+<sd> italic Subdefinition, marked (a), (b), (c), etc. These are
+ finer distinctions of word senses, used
+ within numbered word-sense (for main entries),
+ and also used for subdefinitions within
+ collocation segments, which have no numbering of
+ senses. The letter is italic, the parentheses
+ are not. This tag is also used to indicate the
+ lettered subdefinition when it is referred to
+ at another point in the text.
+
+<ship> italic The name of a ship. Rarely used.
+
+<sing> * Singular. Analogous to the <plu> segment, but more
+ rarely used, mostly for Indian tribes, which
+ are listed in the plural form.
+
+<singw> small caps Singular word. The singular form of the
+ plural-form headword.
+
+<sn> bold, Sense number. A headword may have over 20
+ larger by different sense numbers. Within each numbered
+ 2 points sense there may be lettered sub-senses. See
+ the <sd> (sub-definition) field.
+
+<source> italic Source. The author of the definition. Used only
+ for definitions not originally present in
+ Webster 1913, and not present in the original
+ version intended to mimic the 1913 printed
+ dictionary. This source is used for each
+ word sense, and may differ for different
+ senses of a word, especially where a Web1913
+ definition was substantially modified, or a
+ new word sense was added to a previously
+ defined word.
+
+<syn> plain Synonyms. A list of synonyms, sometimes followed
+ by a <usage> segment.
+
+<usage> narrower Comparisons of word usage for words which are
+ spacing sometimes confused. As with collocation segments,
+ font is plain, but spacing is smaller than
+ normal definition spacing. This seems pointlessly
+ complicating for an on-line display.
+
+<vmorph> * Verb morphology (conjugation) segment, delimited
+ by square brackets.
+
+<wordforms> * Morphological derivatives not contained in the
+ bracketed segments, as above. For nouns
+ derived from adjectives, adverbs from
+ adjectives, etc. This segment is usually
+ found at the end of the main entry. The
+ adverbial and nominalized derivatives at the
+ end of a main entry are usually introduced
+ by an em dash [represented as two hyphens (--)].
+
+<wf> bold, Same font as <hw>, with accents and syllable
+ larger by breaks marked as in the headword.
+ 2 points Marks the actual morphological forms within
+ a <wordforms> segment; typically, adverbial or
+ nominalized form of an adjective.
+
+
+<def2> * Second definition (occasionally, a third definition is
+ present). This is used where a second or third
+ part of speech with the same orthography is
+ placed under one headword. Within this segment,
+ there will be a <pos> field, and sometimes
+ a <mark> and/or a quotation.
+
+<specif> * "Specifically:" Used to mark the words "specifically",
+ "Hence", "as" which are used to introduce a second
+ definition typically more specific than the first,
+ but in general derived by extension of the initial
+ definition. This functions as a warning of multiple
+ definitions where the sense-numbers are not explicitly
+ used. It is also useful in separate senses, to
+ tag polysemous definitions which may be
+ specializations or generalizations of the preceding
+ definition.
+
+<pluf> italic. Plural form.
+ Used exclusively to mark the "pl." abbreviation,
+ which introduces a definition for the headword,
+ *when used in the plural form*. Not related to
+ <plu>, which spells out the plural form, but does
+ not define it.
+
+<uex> italic Usage example. Used only a few times, within
+ <usage> segments.
+
+<isa> italic supertype (hypernym) the inverse of <stype> and
+ identical to <hypen> but not derived from WordNet.
+
+<chform> plain, Chemical formula. The letters are plain font,
+ numbers but the numbers are subscript. This is mostly
+ subscript useful as a functional mark to pinpoint
+ chemicals.
+
+<chformi> plain, Chemical formula same as <chform>, but not
+ processed specially by the tag-converter program.
+ The letters are plain font, but the numbers are
+ subscript.
+ Used in place of <chform> when the formula has
+ a tag inside, which cannot now be processed by the
+ <chform> processing routine.
+
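As an illustration of the <chform> typography (plain letters, subscript numbers), here is a hedged sketch that renders such fields as HTML. It is not the tag-converter program referred to above; the function name chform_to_html is hypothetical, and the sketch subscripts every run of digits, which would be wrong for a formula with a leading coefficient. It also assumes the field contains no nested tags, the case for which <chformi> exists.

```python
import re

CHFORM_RE = re.compile(r"<chform>(.*?)</chform>", re.S)

def chform_to_html(text):
    """Replace each <chform> field by its formula with digits in <sub> tags."""
    def render(m):
        # Letters stay plain; every run of digits becomes a subscript.
        return re.sub(r"\d+", lambda d: f"<sub>{d.group(0)}</sub>", m.group(1))
    return CHFORM_RE.sub(render, text)
```

Text outside <chform> fields is returned unchanged, so numbers elsewhere in a definition are not affected.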
+<chname> * chemical name. Used to allow an IUPAC chemical
+ name to be processed as a unit in spite of
+ embedded dashes, parentheses, and commas.
+
+<see> * "see" reference to related words, outside of the
+ main <def>definition</def> field.
+
+<mathex> italic Mathematical expression. In this dictionary,
+ essentially all letters (used as variable labels)
+ in math expressions are in italic font.
+ The "+" and "-" may also appear typographically
+ different from elsewhere in the dictionary.
+
+<ratio> italic Also a mathematical expression, but the colon and
+ double colon may have a different typography
+ than usual, as in <ratio>a:b</ratio>.
+
+<singf> italic Singular form. Analogous to <pluf>, to define
+ the singular word where the headword is the
+ plural form. ** only modifies the word "sing."
+
+<mord> * Morphological derivation. Used to mark the
+ entry-reference portions of those
+ entries which are defined as morphological
+ derivatives (plural, p. p., imp.) of other
+ headwords. Used just as an attempt to
+ mark and regularize the entry format.
+ May be ignored typographically.
+
+<fract> a stack, with numerator, horizontal bar, and denominator.
+ Fraction. Used for non-numerical fractions
+ which cannot be expressed as a <frac12/-style
+ entity. The forward slash "/" is to be
+ interpreted as a horizontal line separating
+ the numerator and denominator.
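
As a rough illustration of the <fract> convention, the sketch below splits the field at its first forward slash and emits a stacked fraction. The LaTeX `\frac` output and the name `fract_to_latex` are assumptions chosen for the example; any renderer that can draw a horizontal bar would do.

```python
import re

# One <fract> field: text before the first "/" is the numerator,
# everything after it (up to the closing tag) is the denominator.
FRACT = re.compile(r"<fract>([^/<]*)/([^<]*)</fract>")

def fract_to_latex(text: str) -> str:
    """Rewrite <fract>n/d</fract> as \\frac{n}{d}; other text is untouched."""
    def repl(m):
        return r"\frac{%s}{%s}" % (m.group(1).strip(), m.group(2).strip())
    return FRACT.sub(repl, text)
```

A field that contains another tag is deliberately not matched, since the source does not say how nested markup inside <fract> should be handled.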
+
+<exp> superscript, smaller font. Exponential. Used in mathematical expressions.
+
+<xlati> italic Translation (e.g. of Greek), in the body of a
+ definition or etymology. Used only twice.
+
+<tran> italic Word translated: the word in italic is translated
+ by a subsequent word. Usually in etymologies, where
+ the word translated is not actually etymologically
+ related to the headword. The translated word
+ is not necessarily English.
+
+<tr> italic translation of the preceding word (or of the
+ headword) into English.
+
+<fexp> * Functional expression (math). The function names are
+ in plain type, the variables are italic.
+
+<iref> italic Illustration reference. Used only occasionally, not
+ yet (v. 0.41) consistently.
+
+<figref> italic Figure reference.
+
+<figcap> * Figure caption.
+
+<figtitle> * Figure title.
+
+<funct> * tags a mathematical function or expression.
+
+<chreact> * Chemical reaction. Similar to chemical formulas (which
+ are contained but not explicitly marked), with
+ some other symbols.
+
+<ptcl> italic Verb Particle. Only a few particles were actually
+ marked, but in a future version more may be.
+
+<tabtitle> ? Table Title. Used only once.
+
+<title> italic Title of a literary work, movie, opera, musical
+ composition, etc. Used rarely but should be
+ used in every case, except in <au> references.
+
+<root> * Square root -- differs from the entity <root/,
+ which is a square root sign that does not extend
+ beyond the number following it. The <root>
+ field has a bar (vinculum) over the expression
+ within the field, as well as the square root symbol
+ preceding the expression in the field. Used only
+ once.
+
+<vinc> * Vinculum. In a mathematical expression, a bar
+ extending over the expression within the field.
+ Used only once. This apparently serves the same
+ function as parentheses, causing the
+ expression within the field to be evaluated
+ and the result used as the (mathematical) value
+ of the field.
+
+<nul> plain Nultype. An older version of <plain>.
+
+<cd2> * Second collocation definition. Somewhat similar to
+ <def2>. Purely a mark to reduce functional ambiguity,
+ with no effect on the typography.
+
+<hypen> * Hypernym. Mark introduced for the World Wide Webster,
+ when adding words from WordNet. In most cases, this
+ tag marks the WordNet hypernym (for nouns and verbs).
+ Where the <au> mark is PJC or includes a +PJC, the
+ hypernym may not be the same as in WordNet. The words
+ marked by this tag need to be bracketed in some way,
+ but this is deferred until the definitions included
+ with the hypernyms have been deleted, and other
+ disambiguating marks substituted.
+
+<stype> italic Subtype. A functional mark, to point out words which
+ are conceptually subtypes of the headword.
+
+<styp> * Subtype. A functional mark, to point out words which
+ are conceptually subtypes of the headword, but
+ with no *typographical* significance.
+
+<simto> * Similar-to. A semantic relational mark for
+ closely related words which are not quite
+ synonyms, nor hypernyms, nor hyponyms. Introduced
+ with WordNet data.
+
+<conseq> or <hascons> * Consequence. For adjectives, an attribute which
+ is a consequence of possessing the headword attribute.
+ Introduced with WordNet data.
+
+<consof> * Consequence of. For adjectives, an attribute which
+ implies the headword as a natural consequence.
+
+<part> italic Part. Marks a word designating something which is
+ conceptually a part of the headword. Rarely used.
+
+<parts> italic Part, plural form. Same as <part>, but marks the
+ name of the part in its plural form.
+
+<partof> * Marks a word designating something of which the headword
+ is conceptually a part. Inverse of <part>.
+ This is very broad, and may mean constituent or
+ separable part.
+ Rarely used.
+
+<contxt> * Context. Used only for introductions to definitions,
+ giving the context of usage, which are not part
+ of the definition proper, as:
+ <contxt>when used of a person:</contxt>
+
+<grp> * Marks the name of a group of people not formally
+ organized.
+
+<membof> italic marks a group of which the headword is a member.
+ This is rarely used, but should be indexed as
+ an entry word or phrase.
+
+<member> italic marks a member of a group defined by the headword.
+ This is rarely used, but should be indexed as
+ an entry word or phrase.
+
+<members> italic Same as <member>, but marks a plural word,
+ designating the name of the members in its plural form,
+ to avoid ambiguity.
+
+<method> * Designates a special type of definition which
+ describes a method for achieving the headword;
+ used only once, for the word "amend". The
+ subdefinitions begin with "by".
+
+<corpn> * Name of a business company, corporation, or partnership.
+ Started using November 1988. Rare.
+
+<corr> italic Correlative. A word intimately associated with the
+ headword in a manner such that one cannot
+ appear without the other. Not exactly an inverse.
+
+<qperson> italic marks the name of a person, quoted in a dialogue.
+ Used only in <q> blockquotes as of vers. 0.45.
+
+<org> * marks the name of an organization; sometimes used
+ for the names of groups of people not
+ formally organized (see also <grp>).
+
+<prod> italic produces. Designates a substance produced by
+ a living organism. Rarely used.
+
+<prodp> * produces (plainfont). Designates a substance
+ produced by a living organism. Same as <prod>,
+ but does not affect font. Rarely used.
+
+<prodby> * produced by. Designates a living organism which
+ produces the headword substance. Rarely used.
+
+<prodmac> italic produces. Designates an object or substance produced
+ by a machine or process. Rarely used.
+
+<stage> italic life stage of an organism. Used to indicate
+ variant forms of an organism defined by the
+ headword. Rarely used.
+
+<stageof> * an organism one of whose life stages is the headword.
+ Inverse (correlative) of <stage>. Rarely used.
+
+<inv> italic inversely related to headword -- e.g. depository
+ is the inverse of depositor; buyer is the inverse of
+ seller. Called "correlative" in the Webster 1913 and
+ the CIDE. Rarely used.
+
+<methodfor> italic is a method to accomplish the action defined by
+ the headword. Rarely used, and only in the
+ supplemental section.
+
+<examp> italic example or instance of the headword, where the
+ tagged and emphasized word is not a proper subtype.
+--------------------------------------
+<p><hw>Pa*ron"y*mous</hw> <p><sn>2.</sn> <def>Having a similar sound, but different orthography and different meaning; -- said of certain words, as <examp>all</examp> and <examp>awl</examp>; <examp>hair</examp> and <examp>hare</examp>, etc.</def><br/
+[<source>1913 Webster</source>]</p>
+-------------------------------------
+
+<sfield> * subfield of the headword, which must be a field
+ of study or of knowledge
+<stage> italic a stage of life of the headword -- for living things,
+ such as insects, whose life stages may take different
+ names.
+
+<unit> italic a unit of measure, usually preceded by a number.
+ Also used to tag the unit of a measure which is the
+ headword.
+
+<uses> italic tags a tool or method used by the headword,
+ which is usually some process.
+
+<usedfor> * tags a method or process for which the headword
+ is a tool.
+
+<usedby> italic tags a tool or method which uses the headword,
+ which is usually a physical object.
+
+<perf> italic performs -- tags a word which is a process or
+ activity performed by the headword.
+
+<recipr> italic reciprocal -- used for cases where the tagged word
+ is a reciprocal participant in an action, such as
+ donor and recipient. The difference between this and
+ <inv> inverse has not yet been systematically settled.
+ Used seldom, and mostly in the supplemented version.
+
+<sig> italic significance, meaning -- used in definitions where the
+ actual meaning is prefixed with commentary explaining
+ usage or other attributes of the word, as with
+ prefixes or suffixes.
+
+<wns> italic WordNet sense. Where known, the correspondence of the
+ sense of an entry with that of WordNet 1.6 is
+ given after the definition, in a tag of the
+ form: <wns>[wns=3]</wns>, in which the number
+ is the numbered sense in WordNet.
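
A short sketch of how the sense number might be pulled out of a tag of the form <wns>[wns=3]</wns>. The function name `wordnet_senses` is hypothetical, and the pattern assumes the bracketed form is written exactly as shown above, with no extra spaces.

```python
import re

# Matches the documented form <wns>[wns=N]</wns> and captures N.
WNS = re.compile(r"<wns>\[wns=(\d+)\]</wns>")

def wordnet_senses(entry: str) -> list:
    """Return the WordNet sense numbers found in one entry, in order."""
    return [int(n) for n in WNS.findall(entry)]
```

An entry without a <wns> tag simply yields an empty list, which matches the "where known" wording of the description.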
+
+<w16ns> italic WordNet version 1.6 sense. See <wns> for
+ explanation.
+<wnote> * A note related to usage in the corresponding
+ WordNet definition.
+ =============================================================
+Biological classifications:
+---------------------------
+<spn> italic Species name. Used to mark the taxonomic names
+ of living things which are represented in
+ italic font in the original printed version.
+ Originally, not only species, but genera, orders and
+ families were also thus marked. The conversion from
+ <spn> to <fam>, <gen>, or <ord> is not completed, and
+ <spn> may still be found marking such groups.
+ However, orders and families are also frequently
+ mentioned in the original in normal font, and in such
+ cases are not marked with any tag. So, this mark
+ is not a reliable indicator of all mentions of
+ taxonomic names.
+<kingdom> italic Taxonomic biological Kingdom name.
+<phylum> italic Taxonomic phylum name.
+<subphylum> italic Taxonomic subphylum name.
+<class> italic Taxonomic class name.
+<subclass> italic Taxonomic subclass name.
+<ord> italic Taxonomic order name.
+ Also used for suborders, initially.
+<subord> italic Taxonomic suborder name.
+<suborder> italic Taxonomic suborder name.
+<fam> italic Taxonomic family name. Also used to tag "tribes".
+<subfam> italic Taxonomic subfamily name.
+<gen> italic Taxonomic genus name.
+<var> italic Variety. Used to mark subspecies or varieties below
+ the level of species in living organism systematic
+ names.
+
+<varn> italic Variety. Used to mark subspecies or varieties below
+ the level of species in living organism systematic
+ names. Duplicative variant of <var>
+
+
diff --git a/WEBFONT.ASC b/WEBFONT.ASC
index 591de89..198c0e0 100644
--- a/WEBFONT.ASC
+++ b/WEBFONT.ASC
@@ -1,603 +1,603 @@
- WEBSTER FONTS
- =============
-
- Fonts for the Webster 1913 Dictionary.
- For version 0.50
- Last edit May 5, 2001
- ______________________________________
- (This file contains some extended ASCII characters, and should be
-transmitted in binary mode)
-----------------------------------------------------------------------
-
- This file describes a modified font for use in visualizing the
-text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913),
-usable for the DOS operating system of IBM-compatible personal computers.
-The electronic version of that dictionary and this font were prepared by
-MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc.
-For details of permissions and restrictions on using these files, see
-the accompanying file "readme.web".
- The special characters used in the electronic version of the Webster
-1913 are required for visualizing unusual characters used in the
-etymology and pronunciation fields of the dictionary, in a form
-comparable to the way they appear in the original. Since there are
-more than 256 characters used in that dictionary, not all can be
-represented by single-byte codes, and are instead represented by
-SGML-style "short-form" symbols of the type <x/, where x is a
-specific code for the symbol in the dictionary. (This is used rather
-than the "entity" format "&xx;" because the ampersand is used frequently,
-and we prefer to leave the "<" as the only "escape" character.)
-See the "Short Form" section below for details about such characters.
-Note that the symbols used here are in some cases abbreviations
-(for compactness) of the ISO 8879 recommended symbols. If necessary,
-the table below allows simple replacement by alternate encodings.
- This symbol font can be loaded in IBM-compatible (x86) computers
-running the DOS operating system by using the "font.bat" command file
-in the "utils" directory. The fonts files for 8x14 and 8x16 fonts are
-"web14.fnt" and "web16.fnt" respectively.
- For those loading the Webster onto some machine other than an
-IBM-compatible running DOS, it will be necessary to provide a
-translation table, to convert these characters into a code that
-can be handled by that computer. For this reason, I attach an
-"explanation" for each character, for those who cannot view
-the original DOS font.
- The DOS-loadable font does not contain all of the characters needed
-to depict the etymologies or the pronunciations. In addition to an
-absence of several characters used in the pronunciations, no Greek letters are
-included. The Greek words appearing in the etymologies,
-when they are included, will be typed in a
-roman-letter transcription (See section on Greek transcription, below).
-Only a very few Greek words have been thus transcribed as of the
-present version (version 0.41).
- Wherever the typists did not know the character to use, they
-usually inserted a reverse-video question mark (decimal 176).
-This appears in full-ASCII versions as <?/. This mark was used both for
-characters in non-ASCII fonts, and for unreadable characters (i.e.,
-characters smeared in the original or distorted in the copies available
-to the typists. The type in the original was in many places smeared and
-illegible at the left and right page margins; occasionally, small
-parts of words were blotted out by plain white space).
- A character table for the high-order characters appears below.
-Under that is a list and description of most of the special characters
-used in the Webster files.
- Note that there are yet some characters used in the etymologies,
-and some other symbols, which are not in this list. For example, the
-vowels with a double dot *underneath*, e.g. a (as in all) have no representation
-in this character set, and, where explicitly entered in the dictionary,
-are represented by <xdd/ where "x" is the letter, as in "<add/".
-
-ITALICS
--------
- In most places, italic font is represented by the tags <it>...</it>
-surrounding the italic text, or by some other tag which also implies
-italic font. In the pronunciations, however, where italicized vowels
-are used among non-italic and other special characters to indicate
-pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
-are also used to indicate the italicized vowel.
-
-DIACRITICS
--------------
- The European grave and acute accents are represented by the
-standard (IBM PC) high-order codes. Other characters with diacritics
-are represented by special "entity" codes, and in some cases also
-are found in this special WEB1913 font, described below.
- Vowels with a circle above (as in Swedish) are coded <xring/
-(x with a ring, or "degrees" mark over it); vowels with tilde over them
-are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
-also has code 238); letters with a dot above are represented by <xdot/
--- letters with a dot below are represented by <xsdot/ ("subdot");
-vowels with the semi-long mark (a macron with a short perpendicular
-vertical stroke attached above) are represented by <xsl/; the
-circumflex vowels have codes on this list, but may also be represented
-as <xcir/; vowels with macrons above are <xmac/ (including <oomac/,
-the "oo" with an unbroken macron above the two letters, <aemac/ = the
-ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature
-oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent
-(breve) above have codes in this list, but may also be represented by
-<xum/ and <xcr/ respectively. There is an occasional hacek or caron mark
-(an inverted circumflex) in the original; such letters are coded <xcar/.
-The o with a caron has code 213, but no others are in this font list.
-The diaeresis is treated typographically as identical to the umlaut.
- A special modification, used only for poetry (see entry "saturnian verse"
-under "saturnian") is a vowel with a macron, in which the macron is lighter
-than the usual macron, signifying a stressed syllable which has a short
-vowel sound. This is represented by <xsmac/ ("short mac").
- Another special character used in pronunciations is an "n" with an underline (like
-a macron, but below the letter), used to represent the "ng" sound. This is coded
-<nsm/ ("n sub-macron"). The ligated th used in pronunciations to depict the
-"th" sound of "the" is coded as <th/.
- NOTE: the letter combinations "fi" and "fl" are invariably printed as the
-ligatures &filig; and &fllig;, but these ligatures are not marked as such
-in this transcription, and the two letters are left as individuals.
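
For readers converting these files to Unicode, the suffix scheme described above lends itself to a table-driven decoder. The sketch below is an assumption-laden illustration, not part of the Webster utilities: the name `decode_diacritics` and the choice of Unicode combining marks are mine, and only single-letter bases are handled. The semi-long (<xsl/), short-macron (<xsmac/) and ligature forms such as <aemac/ or <oomac/ are left untouched.

```python
import re
import unicodedata

# Suffix of the short-form code -> Unicode combining mark (assumed mapping).
COMBINING = {
    "ring": "\u030a",  # ring above
    "til":  "\u0303",  # tilde
    "dot":  "\u0307",  # dot above
    "sdot": "\u0323",  # dot below ("subdot")
    "cir":  "\u0302",  # circumflex
    "mac":  "\u0304",  # macron
    "um":   "\u0308",  # umlaut / diaeresis
    "cr":   "\u0306",  # crescent (breve)
    "car":  "\u030c",  # caron (hacek)
}

# One base letter followed by a known suffix, e.g. <etil/ or <nsdot/.
ENTITY = re.compile(r"<([A-Za-z])(ring|til|sdot|dot|cir|mac|um|cr|car)/")

def decode_diacritics(text: str) -> str:
    """Replace <x...suffix/> codes with the composed Unicode character."""
    def repl(m):
        return unicodedata.normalize("NFC", m.group(1) + COMBINING[m.group(2)])
    return ENTITY.sub(repl, text)
```

Codes the pattern does not recognise, such as <th/ or <frac12/, pass through unchanged, so the function can be run before or after any other entity handling.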
-
-SPECIAL SYMBOLS
- The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used.
- The double prime, or "seconds" of a degree is sometimes represented by
-a double "light accent" (code 183 = \'b7). In other places, and in later
-versions, it is represented by <sec/ = hex a9, in the webfont.
- The symbols "greater than" <gt/ and "less than" <lt/ are encountered only
-once, but are distinguished from the right- and left-angle brackets
-(> and <) because of possible typographical differences in some fonts.
- The schwa is symbolized by <schwa/. It is not used in the
-pronunciations, but is mentioned as a symbol.
- The right-pointing arrow is <rarr/, consistent with ISO 8879.
-
-----------------------------------
-Table 1
-----------------------------------
-Numbers
- Hex codes
-1  
-11   (12 is a hard page break, 13 CR, 14 sect break)
-21  
-31  !"# $%&'(
-121 yz{|} ~ 79-7d 7e-82
-131 83-87 88-8c
-141 8d-91 92-96
-151 97-9b 9c-a0
-161 a1-a5 a6-aa
-171 ab-af b0-b4
-181 b5-b9 ba-be
-191 bf-c3 c4-c8
-201 c9-cd ce-d2
-211 d3-d7 d8-dc
-221 dd-e1 e2-e6
-231 e7-eb ec-f0
-241 f1-f5 f6-fa
-251 fb-ff
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-Below is a complete list of the symbols used in the Webster ("webfont")
-which are encoded in the special font listed above, together with
-corresponding symbols in ISO 8879 and Tex coding. Much of this table was
-prepared by Rik Faith, to whom we express our appreciation.
- The "nearest ASCII" equivalents are given for those who want to
-display the data as best one can in 7-bit simple ASCII symbols without
-using the "entity" symbols.
-=========================================================================
-----------------------------------
-Table 2
-----------------------------------
-
-Comments:
- (1) The symbol in the "entity" column is the SGML-like symbol used in
- the present Webster files; the symbol in the "ISO 8879" column is
- the symbol for the same character given in "The user's guide to
- ISO 8879" by Smith and Stutely.
- (2) An asterisk "*" in the "entity" column means that this symbol and
-code value is not used in any form in the Webster 1913 electronic version.
- (3) If no asterisk is in the "entity" column, and no other symbol is
-there, this means that in the Webster, only the hexadecimal representation
-was used (e.g. for \'d8, \'bd, and \'b8).
- (4) \'b6 and \'b7, the heavy and light "accents", are never above a
-letter (these are not diacritical marks), but in-between letters, as the
-stress accent used in the headwords and pronunciations. The accent
-*follows* the syllable accented. The light accent \'b7 is also used as
-the "prime" in mathematical expressions (e.g. a\'b7 = "a prime"), or as
- "minutes" in degrees-minutes-seconds, and when doubled (\'b7\'b7)
-serves as "double prime" in mathematical expressions, and as "seconds"
-in degrees-minutes-seconds. The character \'a9 (<sec/ or &Prime;) is
-also used to represent the double prime.
- (5) Although the semilong vowels are in the table (e.g. the "asl"
-= "a semilong"), most of the entries in the ASCII version dictionary
-use the <xsl/ symbol coding. If you know of any printers' names for
-these, do let me know.
- (6) For some reason, the a breve and u breve have ISO codes (in the
-Latin-2 table), but the other vowels don't, in the Smith & Stutely book.
-Is this a mistake?
- (7) The symbol <nsc/ is used for "N small capitals", used in
-pronunciations to represent the sound of the nasal N in French words.
- (8) If you find any exceptions to these usage assertions, please
-let me know.
-----------------------------------------------------------------------------------------
- webfont ISO 8879 latin1/ascii TeX nearest description
------------------- ASCII
-oct dec hex entity oct dec hex
---------------------------------------------------------------------------------
-025 21 15 * \S * section symbol
-
-074 60 3c lt 074 60 3c $<$ < less than
-076 62 3e gt 076 62 3e $>$ > greater than
-
-200 128 80 <Cced/ Ccedil 307 199 c7 \c{C} C C cedilla
-201 129 81 <uum/ uuml 374 252 fc \"u ue u umlaut (diaeresis)
-202 130 82 eacute 351 233 e9 \'e e e acute
-203 131 83 <acir/ acirc 342 226 e2 \^a a a circumflex
-204 132 84 <aum/ auml 344 228 e4 \"a ae a umlaut (diaeresis)
-205 133 85 <agrave/ agrave 340 224 e0 \`a a a grave
-206 134 86 <aring/ aring 345 229 e5 \aa a a ring above
-207 135 87 <cced/ ccedil 347 231 e7 \c{c} c c cedilla
-210 136 88 <ecir/ ecirc 352 234 ea \^e e e circumflex
-211 137 89 <eum/ euml 353 235 eb \"e e e umlaut (diaeresis)
-212 138 8a <egrave/ egrave 350 232 e8 \`e e e grave
-213 139 8b <ium/ iuml 357 239 ef \"i i i umlaut (diaeresis)
-214 140 8c <icir/ icirc 356 238 ee \^i i i circumflex
-215 141 8d igrave 354 236 ec \`i i i grave
-216 142 8e Auml A A umlaut
-217 143 8f Aring A A ring above
-
-220 144 90 <Eacute/ Eacute 311 201 c9 \'E e E acute
-221 145 91 <ae/ aelig 346 230 e6 \ae ae ligature ae
-222 146 92 <AE/ AElig 306 198 c6 \AE AE ligature AE
-223 147 93 <ocir/ ocirc 364 244 f4 \^o o o circumflex
-224 148 94 <oum/ ouml 366 246 f6 \"o oe o umlaut (diaeresis)
-225 149 95 ograve 362 242 f2 \`o o o grave
-226 150 96 <ucir/ ucirc 373 251 fb \^u u u circumflex
-227 151 97 ugrave 371 249 f9 \`u u u grave
-230 152 98 <yum/ yuml y y umlaut
-231 153 99 <Oum/ Ouml O O umlaut
-232 154 9a <Uum/ Uuml 334 220 dc \"U U U umlaut (diaeresis)
-233 155 9b
-234 156 9c <pound/ pound 243 163 a3 \pounds * pound sign (British)
-235 157 9d *
-236 158 9e *
-237 159 9f *
-240 160 a0 <aacute/ aacute 341 225 e1 \'a a a acute
-241 161 a1 <iacute/ iacute 355 237 ed \'i i i acute
-242 162 a2 oacute 363 243 f3 \'o o o acute
-243 163 a3 uacute 372 250 fa \'u u u acute
-244 164 a4 <ntil/ ntilde 361 241 f1 \~n ny n tilde
-245 165 a5 <Ntil/ Ntilde NY N tilde
-246 166 a6 <frac23/ $\frac{2}{3}$ 2/3 two-thirds
-247 167 a7 <frac13/ $\frac{1}{3}$ 1/3 one-third
-250 168 a8 *
-251 169 a9 <sec/ Prime seconds (of degree or time)
- Also, inches or double prime
-252 170 aa *
-253 171 ab <frac12/ 275 189 bd $\frac{1}{2}$ 1/2 one-half
-254 172 ac <frac14/ 274 188 bc $\frac{1}{4}$ 1/4 one-quarter
-255 173 ad *
-256 174 ae *
-257 175 af *
-260 176 b0 <?/ (?) Place-holder
- for unknown or illegible character.
-261 177 b1 *
-262 178 b2 *
-263 179 b3 *
-264 180 b4 * $\updownarrow$ * vertical arrow
-265 181 b5 <hand/ * pointing hand
- (printer's "fist")
-266 182 b6 \"{} '' bold accent
- (used in pronunciations)
-267 183 b7 prime 264 180 b4 \'{} ' light accent
- (used in pronunciations)
- also minutes (of arc or time)
-270 184 b8 '' " close double quote
-271 185 b9 *
-272 186 ba * $\parallel$ || vertical double bar (l)
-273 187 bb *
-274 188 bc <sect/ sect \S * section mark
-275 189 bd `` " open double quotes
-276 190 be <amac/ amacr \=a a a macron
-277 191 bf lsquo ` ` left single quote
-
-300 192 c0 <nsm/ ng "n sub-macron"
-301 193 c1 <sharp/ sharp $\sharp$ # musical sharp
-302 194 c2 <flat/ flat $\flat$ * musical flat
-303 195 c3 * -- -- long dash (en-dash? )
-304 196 c4 * $-$ - horizontal line
-305 197 c5 <th/ (part 1) first part of th ligature
- see 231 = e7 for part 2
-306 198 c6 <imac/ imacr \=i i i macron
-307 199 c7 <emac/ emacr \=e e e macron
-310 200 c8 <dsdot/ d Sanskrit/Tamil d dot
-311 201 c9 <nsdot/ n Sanskrit/Tamil n dot
-312 202 ca <tsdot/ t Sanskrit/Tamil t dot
-313 203 cb <ecr/ \u{e} e e breve
-314 204 cc <icr/ \u{i} i i breve
-315 205 cd *
-316 206 ce <ocr/ \u{o} o o breve
-317 207 cf - -- - short dash
-
-320 208 d0 -- mdash --- -- long (em) dash
-321 209 d1 <OE/ OElig \OE OE OE ligature
-322 210 d2 <oe/ oelig \oe oe oe ligature
-323 211 d3 <omac/ omacr \=o o o macron
-324 212 d4 <umac/ umacr \=u u u macron
-325 213 d5 <ocar/ \v{o} o o hacek
-326 214 d6 <aemac/ \=\ae ae ae ligature macron
-327 215 d7 <oemac/ \=\oe oe oe ligature macron
-330 216 d8 par $\parallel$ || double vertical
- bar(s)
-331 217 d9 *
-332 218 da *
-333 219 db *
-334 220 dc <ucr/ ubreve \u{u} u u breve
-335 221 dd <acr/ abreve \u{a} a a breve
-336 222 de <cre/ ssmile \u{} ~ crescent
- (like a breve, but vertically centered --
- represents the short accent in poetic meter)
-337 223 df <ymac/ \=y y y macron
-
-340 224 e0 <asl/ a a "semilong"
- (has a macron above with a short vertical
- bar on top the center of the macron)
- Used in pronunciations.
-341 225 e1 <esl/ e "semilong"
-342 226 e2 <isl/ i "semilong"
-343 227 e3 <osl/ o "semilong"
-344 228 e4 <usl/ u "semilong"
-345 229 e5 <adot/ a a with dot above
-346 230 e6 * mu small Greek mu
-347 231 e7 <th/ (part 2) second part of th ligature
- see 197 = c5 for part 1
-350 232 e8 *
-351 233 e9 *
-352 234 ea *
-353 235 eb <edh/ edh 360 240 f0 th small eth
-354 236 ec *
-355 237 ed <thorn/ thorn 376 254 fe th small thorn
-356 238 ee <atil/ atilde \~a a a tilde
-357 239 ef <ndot/ n n with dot above
-
-360 240 f0 <rsdot/ \d{r} r r with a dot below
-361 241 f1 *
-362 242 f2 *
-363 243 f3 *
-364 244 f4 <yogh/ y small yogh
-365 245 f5 mdash --- -- em dash
-366 246 f6 divide 367 247 f7 $\div$ / division sign
-367 247 f7 ap $\approx$ ~= "double tilde"
-370 248 f8 <deg/ 260 176 b0 ${}^\circ$ * degree sign
-371 249 f9 <middot/ $\bullet$ * bold middle dot
-372 250 fa * 267 183 b7 $\cdot$ * light middle dot
-373 251 fb <root/ radic $\surd$ * root sign
-374 252 fc *
-375 253 fd *
-376 254 fe *
-377 255 ff *
-
- ----------------------------------
-Table 3
-----------------------------------
-
-====================================================================
-The table below gives some additional information about some of the
-more commonly used entities
--------------------------------------------------------------------
-Frequently used:
-decimal hex char definition
- 21 section symbol -- another section symbol is also at 188
- (so that 21 can be used as a normal control
- character)
- 126 ~ used by typists as a place-holder in word
- combinations where an uncapitalized headword
- should be.
- 128 80 <Cced/ c cedilla (uppercase)
- 129 81 <uum/ u umlaut
- 130 82 e acute
- 131 83 a circumflex
- 132 84 <aum/ a umlaut
- 133 85 a grave
- 134 86 <aring/ a with "ring" (circle) above (Swedish!)
- 135 87 <cced/ c cedilla
- 136 - 144 standard European set for IBM
- 136 88 <ecir/ e circumflex
- 137 89 <eum/ e umlaut (or e with dieresis above)
- 138 8a e grave
- 145 91 <ae/ = "ae" fused ligature
- 146 92 <AE/ = upper-case "ae" fused ligature
- 147 93 <ocir/ o circumflex
- 148 94 <oum/ o "umlaut", used mostly in "coöperation, Zoöl."
- Zol." and in pronunciations
- 164 a4 <ntil/ Spanish "enye"
- 166 a6 <frac23/ two-thirds (fraction)
- 167 a7 <frac13/ one-third (fraction)
- 169 a9 <sec/ seconds of degree or time, or double-prime
- 171 ab <frac12/ one-half, as in the original IBM set
- 172 ac <frac14/ one-fourth (fraction)
- 176 b0 <?/ = (reverse-video question mark), used
- to represent an uncodable or illegible character
- 180 b4 long vertical double-headed arrow (a reference mark)
- 181 b5 <hand/ = (the typographer's "fist")
- Appearing as a "pointing hand" character
- (for explanatory notes)
- 182 b6 bold accent in headwords
- replaced in full ASCII version by double quote = "
- 183 b7 light accent in headwords
- replaced within headwords in the full ASCII version
- by an open-single-quote (` = ASCII 96, not the same
- as 191, \'bf). This mark is used also
- for minutes of a degree, and for "prime"
- to modify variables in mathematical expressions.
- -- two of these in sequence represent seconds
- of a degree, or double prime. The seconds
- symbol is also represented by <sec/ (hex a9).
- 184 b8 close double quotes (used with 189 [= \'bd], open quote)
- 186 ba vertical double bar - represents the symbol used
- in the printed dictionary before a headword to
- signify that the word was adopted without
- anglicization from a foreign language
- but in the full-ASCII version this function
- uses \'d8 -- see 216
- 188 bc <sect/ section mark
- - alternate to 21 (a control character)
- 189 bd open double quotes (used with 184, close quote)
- 190 be <amac/ a macron
- 191 bf <lsquo/ "left single quote"
- single open quote mark (not same as ASCII 96)
- 192 c0 <nsm/ "n sub-macron", an n with a macron below --
- represents the "ng" sound in pronunciations
- 193 c1 <sharp/ sharp - music notation
- 194 c2 <flat/ flat - music notation
- 195 c3 long dash, one pixel removed from left
- will fuse with left long dash, char 208
- 196 c4 graphic horizontal line
- 195+208 combination for a very long dash. In the
- original typing, the dash char 208 was used
- for both non-breaking hyphen (in hyphenated
- words), and for the em-dash used as an
- introductory mark for various segments.
- The em-dash should be distinguished from
- the hyphen, but that conversion hasn't yet
- been done.
- In the full ASCII version, a double hyphen
- "--" represents the em-dash
- 197 c5 <th/ (part 1) first of a pair of characters
- 197+231 = used to represent the th ligature --
- <th/ represents the "th" sound of "mother"
- see 231 (e7) for part 2
- 198 c6 <imac/ = i macron
- 199 c7 <emac/ = e macron
- 200 c8 <dsdot/ Sanskrit/Tamil d with dot underneath
- 201 c9 <nsdot/ Sanskrit/Tamil n with dot underneath
- 202 ca <tsdot/ Sanskrit/Tamil t with dot underneath
- 203 cb <ecr/ = e with crescent (breve) above. Used
- - in some etymologies and pronunciation
- 204 cc <icr/ = i with crescent (breve) above - used
- - in some etymologies and pronunciation
- 206 ce <ocr/ = o with crescent (breve) above - used
- - in some etymologies and pronunciation
- 207 cf short dash, used in hyphenated words, and in
- breaking syllables where no accent is used. But
- sometimes the typists used the normal hyphen [45],
- or the long dash (decimal 208) for that purpose.
- The normal hyphen is the same length as the long
- dash, but one pixel higher in the character box.
- # In headwords, in the full ASCII version, this
- short dash is represented by the asterisk "*".
- 208 d0 <mdash/ = represents the long dash, used for the em
- dash which often precedes certain sections within a
- definition, and which separates some sections,
- such as wordforms or collocations within a
- collocation segment. This is replaced in the
- full ASCII version by a double hyphen, "--".
- 210 d2 <oe/ = "oe" fused ligature
- 211 d3 <omac/ = o macron
- 212 d4 <umac/ = u macron
- 213 d5 <ocar/ o with caron (hacek) (inverted circumflex) above
- 214 d6 <aemac/ = "ae" ligature with a macron
- 215 d7 <oemac/ = "oe" ligature with a macron
- 216 d8 <par/ double vertical bar (short length; the long
- length is the graphics character 186)
- This precedes words marked with a double vertical bar in
- the original dictionary, signifying that the word was
- adopted directly into English without modification of
- the spelling.
- 220 dc <ucr/ = u with crescent above - used in some etymologies
- 221 dd <acr/ = a with crescent above - used in some etymologies
- 222 de <cre/ = "crescent", an upward-curving crescent
- used as a poetic meter mark
- 223 df <ymac/ = y macron (used in Anglo-Saxon?)
- 229 e5 <adot/ = a with a dot above (for pronunciations)
- 231 e7 <th/ (part 2) second of a two-character combination
- 197+231 = used to represent the th ligature in pronunciations
- <th/ represents the "th" sound of "mother"
- 235 eb <edh/ = Old English and Icelandic "edh", (or "eth")
- like a Greek delta with a hatch mark
- through the ascender. Used to represent the
- Anglo-Saxon/Icelandic/Gothic character,
- in etymologies, pronounced like "th"
- 237 ed <thorn/ "thorn", an Old English and Icelandic
- character, appears like a "p" with an extended
- ascender.
- Used to represent the
- Anglo-Saxon/Icelandic/Gothic character,
- in etymologies, pronounced like "th"
- in "thorn" and also as in "brother"
- 238 ee <atil/ a with tilde above - in some etymologies
- 244 f4 <yogh/ like a script "3" or "z". Used in Old English
- etymologies, analogous to "y"
- 247 f7 double tilde ("approximately equals").
- used by typists as a place-holder in word
- combinations where the capitalized headword
- should be.
- 248 f8 <deg/ degrees (temperature or angle). Note: some
- typists used a superscript "o" to signify
- degrees. This must be corrected!
- 249 f9 middle dot (bold)
- 250 fa middle dot (light)
- 251 fb <root/ "root" sign used in etymologies, as in original
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-======================================
- Greek transcription
-=====================================
-Greek letters are represented:
- (capitals represent capital letters; lower-case represent lower-case)
- #Note that "h" in transliterations is used individually, as eta, and
- also in the combination "ch" (chi). Conversions to other codings
- must first convert "ch" before converting "h", or at least verify
- that an "h" to be converted has no preceding "c". "c" is not
- otherwise used, so there is no ambiguity. Also, "ps" always
- represents a psi; it could in theory occur as a pi-sigma
- combination, but it doesn't. Occasionally, "th" was entered instead
- of "q" to represent theta; these should be checked to verify that
- they do not represent tau-eta, and converted to "q".
-
-(1) characters individually:
- By the short-form notation <alpha/, <beta/, <gamma/, <lambda/ etc.
- Capitalized letters are <ALPHA/, etc.
-(2) in words:
- By inclusion within the markers <grk></grk>, using the following
- roman-letter equivalents for the Greek letters:
- Accents:
- (a) aspirants -- used in front of the letter modified, which is
-usually in *front* of words beginning in vowels. Of two types:
- ' (apostrophe) for the left-curving aspirant (spiritus lenis)
- " (double quote) for the right-curving aspirant (spiritus asper)
- (when the aspirant is on a letter inside a word, it is placed
- in front of the letter it modifies.)
- (the right-curving aspirant is also used over rho, which is
- then usually transliterated "rh". The " in such cases is
- placed in front of the r (for rho) which it modifies).
- (b) normal accent (appearing as an acute accent in the original):
- ` (left open quote, ASCII 96) -- placed after accented vowel
- (c) grave accent (appearing as a grave accent in the original):
- ~ (tilde, ASCII 126) -- placed after accented vowel. This is
- rarely seen, as in <grk>to~ pa^n</grk> at "universe" or
- <grk>ta~ gewrgika`</grk> (at "Georgic").
- (d) curving accent (appearing as a rounded circumflex):
- ^ (circumflex) -- placed after accented vowel
- (e) "iota" subscript (ogonek) -- a comma placed after the vowel
- having the subscript
- (f) diaeresis:
- the double dot found occasionally over the iota is
- represented by a colon immediately after the iota,
- as the i-diaeresis in <grk>Farisai:ko`s</grk> (at "pharisaic").
-
- Where a letter has two accents, both are placed *after* the vowel
- Letters with an aspirant and an accent have the
- aspirant before the letter, and the accent after it.
- ------------------------
-
-
-The capitalized Greek letters are represented by the capitalized
- versions of the letters shown here.
------------------------------------------
- Greek letter transliteration
- ------------ ---------------
- alpha a
- beta b
- gamma g
- delta d
- epsilon e
- zeta z
- eta h
- theta q (th was used in some earlier sections, but was
- changed due to potential confusion with the
- tau+eta combination, as in <grk>lyth`rios</grk>
- (at "lyterian") or <grk>poihth`s</grk>
- (at "maker") )
- iota i
- kappa k
- lambda l
- mu m
- nu n
- xi x
- omicron o
- pi p
- rho r
- sigma s (end form not distinguished here from middle
- form within words, but when isolated, use <sigmat/
- ("terminal sigma") for the end form)
- tau t
- upsilon y (Used for both "u" and "y" pronunciations)
- phi f
- chi ch (c is always followed by h, so the h component
- is not confusable with eta)
- psi ps (theoretically confusable with pi-sigma, but that
- combination seems never to occur)
- omega w
-
- (Roman j, v, u are unused)
-
+ WEBSTER FONTS
+ =============
+
+ Fonts for the Webster 1913 Dictionary.
+ For version 0.50
+ Last edit May 5, 2001
+ ______________________________________
+ (This file contains some extended ASCII characters, and should be
+transmitted in binary mode)
+----------------------------------------------------------------------
+
+ This file describes a modified font for use in visualizing the
+text of the 1913 "Webster's Revised Unabridged Dictionary" (W1913),
+usable for the DOS operating system of IBM-compatible personal computers.
+The electronic version of that dictionary and this font were prepared by
+MICRA, Inc., Plainfield NJ, and are copyrighted (C) 1996 by MICRA, Inc.
+For details of permissions and restrictions on using these files, see
+the accompanying file "readme.web".
+ The special characters used in the electronic version of the Webster
+1913 are required for visualizing unusual characters used in the
+etymology and pronunciation fields of the dictionary, in a form
+comparable to the way they appear in the original. Since there are
+more than 256 characters used in that dictionary, not all can be
+represented by single-byte codes; the remainder are instead represented by
+SGML-style "short-form" symbols of the type <x/, where x is a specific
+code for the symbol in the dictionary. (We use this form rather than the
+"entity" format "&xx;" because the ampersand is used frequently in the
+text, and we prefer to leave "<" as the only "escape" character.)
+See the "Short Form" section below for details about such characters.
+Note that the symbols used here are in some cases abbreviations
+(for compactness) of the ISO 8879 recommended symbols. If necessary,
+the table below allows simple replacement by alternate encodings.
+ This symbol font can be loaded in IBM-compatible (x86) computers
+running the DOS operating system by using the "font.bat" command file
+in the "utils" directory. The font files for 8x14 and 8x16 fonts are
+"web14.fnt" and "web16.fnt" respectively.
+ For those loading the Webster onto some machine other than an
+IBM-compatible running DOS, it will be necessary to provide a
+translation table, to convert these characters into a code that
+can be handled by that computer. For this reason, I attach an
+"explanation" for each character, for those who cannot view
+the original DOS font.
+ The DOS-loadable font does not contain all of the characters needed
+to depict the etymologies or the pronunciations. In addition to an
+absence of several characters used in the pronunciations, no Greek letters are
+included. The Greek words appearing in the etymologies,
+when they are included, will be typed in a
+roman-letter transcription (See section on Greek transcription, below).
+Only a very few Greek words have been thus transcribed as of the
+present version (version 0.41).
+ Wherever the typists did not know the character to use, they
+usually inserted a reverse-video question mark (decimal 176).
+This appears in full-ASCII versions as <?/. This mark was used both for
+characters in non-ASCII fonts, and for unreadable characters (i.e.,
+characters smeared in the original or distorted in the copies available
+to the typists). The type in the original was in many places smeared and
+illegible at the left and right page margins; occasionally, small
+parts of words were blotted out by plain white space.
+ A character table for the high-order characters appears below.
+Under that is a list and description of most of the special characters
+used in the Webster files.
+ Note that there are yet some characters used in the etymologies,
+and some other symbols, which are not in this list. For example, the
+vowels with a double dot *underneath*, e.g. a (as in all) have no representation
+in this character set, and, where explicitly entered in the dictionary,
+are represented by <xdd/ where "x" is the letter, as in "<add/".
+
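The short-form convention described above (a "<", a code, then "/") can be located mechanically. The following is a minimal Python sketch, not part of the Webster distribution; the function names and the assumption that a code is either alphanumeric or the single character "?" are illustrative only.

```python
import re

# Matches short-form symbols such as <amac/ or <?/ : "<", a code, then "/".
# Assumption: a code is letters/digits starting with a letter, or just "?".
SHORT_FORM = re.compile(r"<([A-Za-z][A-Za-z0-9]*|\?)/")


def find_short_forms(text):
    """Return the codes of all short-form symbols in text, in order."""
    return SHORT_FORM.findall(text)


def replace_short_forms(text, table, default=None):
    """Replace each <code/ using table.

    Codes missing from the table are left as written unless a
    default replacement string is supplied.
    """
    def substitute(match):
        code = match.group(1)
        if code in table:
            return table[code]
        return match.group(0) if default is None else default

    return SHORT_FORM.sub(substitute, text)
```

Ordinary paired tags such as <it>...</it> are not matched, because they do not end in "/" directly after the code. A caller would supply its own replacement table, for example one built from the tables in this file.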
+ITALICS
+-------
+ In most places, italic font is represented by the tags <it>...</it>
+surrounding the italic text, or by some other tag which also implies
+italic font. In the pronunciations, however, where italicized vowels
+are used among non-italic and other special characters to indicate
+pronunciation, the special codes <ait/, <eit/, <iit/, <oit/, <uit/,
+are also used to indicate the italicized vowel.
+
+DIACRITICS
+-------------
+ The European grave and acute accents are represented by the
+standard (IBM PC) high-order codes. Other characters with diacritics
+are represented by special "entity" codes, and in some cases also
+are found in this special WEB1913 font, described below.
+ Vowels with a circle above (as in Swedish) are coded <xring/
+(x with a ring, or "degrees" mark over it); vowels with tilde over them
+are represented by <xtil/, where "x" is the vowel, as in <etil/ (<atil/
+also has code 238); letters with a dot above are represented by <xdot/
+-- letters with a dot below are represented by <xsdot/ ("subdot");
+vowels with the semi-long mark (a macron with a short perpendicular
+vertical stroke attached above) are represented by <xsl/; the
+circumflex vowels have codes on this list, but may also be represented
+as <xcir/; vowels with macrons above are <xmac/ (including <oomac/,
+the "oo" with an unbroken macron above the two letters, <aemac/ = the
+ligature ae with a macron [also 214 = \'d6], and <oemac/ the ligature
+oe with a macron [also 215 = \'d7]); vowels with umlauts or a crescent
+(breve) above have codes in this list, but may also be represented by
+<xum/ and <xcr/ respectively. There is an occasional hacek or caron mark
+(an inverted circumflex) in the original; such letters are coded <xcar/.
+The o with a caron has code 213, but no others are in this font list.
+The diaeresis is treated typographically as identical to the umlaut.
+ A special modification, used only for poetry (see entry "saturnian verse"
+under "saturnian") is a vowel with a macron, in which the macron is lighter
+than the usual macron, signifying a stressed syllable which has a short
+vowel sound. This is represented by <xsmac/ ("short mac").
+ Another special character used in pronunciations is an "n" with an underline (like
+a macron, but below the letter), used to represent the "ng" sound. This is coded
+<nsm/ ("n sub-macron"). The ligated th used in pronunciations to depict the
+"th" sound of "the" is coded as <th/.
+ NOTE: the letter combinations "fi" and "fl" are invariably printed as the
+ligatures &filig; and &fllig;, but these ligatures are not marked as such
+in this transcription, and the two letters are left as individuals.
+
+SPECIAL SYMBOLS
+ The dagger <dag/, double dagger <ddag/, and paragraph mark <para/ are rarely used.
+ The double prime, or "seconds" of a degree is sometimes represented by
+a double "light accent" (code 183 = \'b7). In other places, and in later
+versions, it is represented by <sec/ = hex a9, in the webfont.
+ The symbols "greater than" <gt/ and "less than" are encountered only
+once, but are distinguished from the right- and left-angle brackets
+(> and <) because of possible typographical differences in some fonts.
+ The schwa is symbolized by <schwa/. It is not used in the
+pronunciations, but is mentioned as a symbol.
+ The right-pointing arrow is <rarr/, consistent with ISO 8879.
+
+----------------------------------
+Table 1
+----------------------------------
+Numbers
+ Hex codes
+1  
+11   (12 is a hard page break, 13 CR, 14 sect break)
+21  
+31  !"# $%&'(
+121 yz{|} ~ 79-7d 7e-82
+131 83-87 88-8c
+141 8d-91 92-96
+151 97-9b 9c-a0
+161 a1-a5 a6-aa
+171 ab-af b0-b4
+181 b5-b9 ba-be
+191 bf-c3 c4-c8
+201 c9-cd ce-d2
+211 d3-d7 d8-dc
+221 dd-e1 e2-e6
+231 e7-eb ec-f0
+241 f1-f5 f6-fa
+251 fb-ff
+
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
+Below is a complete list of the symbols used in the Webster ("webfont")
+which are encoded in the special font listed above, together with
+corresponding symbols in ISO 8879 and TeX coding. Much of this table was
+prepared by Rik Faith, to whom we express our appreciation.
+ The "nearest ASCII" equivalents are given for those who want to
+display the data as best one can in 7-bit simple ASCII symbols without
+using the "entity" symbols.
+=========================================================================
+----------------------------------
+Table 2
+----------------------------------
+
+Comments:
+ (1) The symbol in the "entity" column is the SGML-like symbol used in
+ the present Webster files; the symbol in the "ISO 8879" column is
+ the symbol for the same character given in "The user's guide to
+ ISO 8879" by Smith and Stutely.
+ (2) An asterisk "*" in the "entity" column means that this symbol and
+code value is not used in any form in the Webster 1913 electronic version.
+ (3) If no asterisk is in the "entity" column, and no other symbol is
+there, this means that in the Webster, only the hexadecimal representation
+was used (e.g. for \'d8, \'bd, and \'b8).
+ (4) \'b6 and \'b7, the heavy and light "accents", are never above a
+letter (these are not diacritical marks), but in-between letters, as the
+stress accent used in the headwords and pronunciations. The accent
+*follows* the syllable accented. The light accent \'b7 is also used as
+the "prime" in mathematical expressions (e.g. a\'b7 = "a prime"), or as
+ "minutes" in degrees-minutes-seconds, and when doubled (\'b7\'b7)
+serves as "double prime" in mathematical expressions, and as "seconds"
+in degrees-minutes-seconds. The character \'a9 (<sec/ or &Prime;) is
+also used to represent the double prime.
+ (5) Although the semilong vowels are in the table (e.g. the "asl"
+= "a semilong"), most of the entries in the ASCII version dictionary
+use the <xsl/ symbol coding. If you know of any printers' names for
+these, do let me know.
+ (6) For some reason, the a breve and u breve have ISO codes (in the
+Latin-2 table), but the other vowels don't, in the Smith & Stutely book.
+Is this a mistake?
+ (7) The symbol <nsc/ is used for "N small capitals", used in
+pronunciations to represent the sound of the nasal N in French words.
+ (8) If you find any exceptions to these usage assertions, please
+let me know.
+----------------------------------------------------------------------------------------
+ webfont ISO 8879 latin1/ascii TeX nearest description
+------------------ ASCII
+oct dec hex entity oct dec hex
+--------------------------------------------------------------------------------
+025 21 15 * \S * section symbol
+
+074 60 3c lt 074 60 3c $<$ < less than
+076 62 3e gt 076 62 3e $>$ > greater than
+
+200 128 80 <Cced/ Ccedil 307 199 c7 \c{C} C C cedilla
+201 129 81 <uum/ uuml 374 252 fc \"u ue u umlaut (diaeresis)
+202 130 82 eacute 351 233 e9 \'e e e acute
+203 131 83 <acir/ acirc 342 226 e2 \^a a a circumflex
+204 132 84 <aum/ auml 344 228 e4 \"a ae a umlaut (diaeresis)
+205 133 85 <agrave/ agrave 340 224 e0 \`a a a grave
+206 134 86 <aring/ aring 345 229 e5 \aa a a ring above
+207 135 87 <cced/ ccedil 347 231 e7 \c{c} c c cedilla
+210 136 88 <ecir/ ecirc 352 234 ea \^e e e circumflex
+211 137 89 <eum/ euml 353 235 eb \"e e e umlaut (diaeresis)
+212 138 8a <egrave/ egrave 350 232 e8 \`e e e grave
+213 139 8b <ium/ iuml 357 239 ef \"i i i umlaut (diaeresis)
+214 140 8c <icir/ icirc 356 238 ee \^i i i circumflex
+215 141 8d igrave 354 236 ec \`i i i grave
+216 142 8e Auml A A umlaut
+217 143 8f Aring A A ring above
+
+220 144 90 <Eacute/ Eacute 311 201 c9 \'E e E acute
+221 145 91 <ae/ aelig 346 230 e6 \ae ae ligature ae
+222 146 92 <AE/ AElig 306 198 c6 \AE AE ligature AE
+223 147 93 <ocir/ ocirc 364 244 f4 \^o o o circumflex
+224 148 94 <oum/ ouml 366 246 f6 \"o oe o umlaut (diaeresis)
+225 149 95 ograve 362 242 f2 \`o o o grave
+226 150 96 <ucir/ ucirc 373 251 fb \^u u u circumflex
+227 151 97 ugrave 371 249 f9 \`u u u grave
+230 152 98 <yum/ yuml y y umlaut
+231 153 99 <Oum/ Ouml O O umlaut
+232 154 9a <Uum/ Uuml 334 220 dc \"U U U umlaut (diaeresis)
+233 155 9b
+234 156 9c <pound/ pound 243 163 a3 \pounds * pound sign (British)
+235 157 9d *
+236 158 9e *
+237 159 9f *
+240 160 a0 <aacute/ aacute 341 225 e1 \'a a a acute
+241 161 a1 <iacute/ iacute 355 237 ed \'i i i acute
+242 162 a2 oacute 363 243 f3 \'o o o acute
+243 163 a3 uacute 372 250 fa \'u u u acute
+244 164 a4 <ntil/ ntilde 361 241 f1 \~n ny n tilde
+245 165 a5 <Ntil/ Ntilde NY N tilde
+246 166 a6 <frac23/ $\frac{2}{3}$ 2/3 two-thirds
+247 167 a7 <frac13/ $\frac{1}{3}$ 1/3 one-third
+250 168 a8 *
+251 169 a9 <sec/ Prime seconds (of degree or time)
+ Also, inches or double prime
+252 170 aa *
+253 171 ab <frac12/ 275 189 bd $\frac{1}{2}$ 1/2 one-half
+254 172 ac <frac14/ 274 188 bc $\frac{1}{4}$ 1/4 one-quarter
+255 173 ad *
+256 174 ae *
+257 175 af *
+260 176 b0 <?/ (?) Place-holder
+ for unknown or illegible character.
+261 177 b1 *
+262 178 b2 *
+263 179 b3 *
+264 180 b4 * $\updownarrow$ * vertical arrow
+265 181 b5 <hand/ * pointing hand
+ (printer's "fist")
+266 182 b6 \"{} '' bold accent
+ (used in pronunciations)
+267 183 b7 prime 264 180 b4 \'{} ' light accent
+ (used in pronunciations)
+ also minutes (of arc or time)
+270 184 b8 '' " close double quote
+271 185 b9 *
+272 186 ba * $\parallel$ || vertical double bar (l)
+273 187 bb *
+274 188 bc <sect/ sect \S * section mark
+275 189 bd `` " open double quotes
+276 190 be <amac/ amacr \=a a a macron
+277 191 bf lsquo ` ` left single quote
+
+300 192 c0 <nsm/ ng "n sub-macron"
+301 193 c1 <sharp/ sharp $\sharp$ # musical sharp
+302 194 c2 <flat/ flat $\flat$ * musical flat
+303 195 c3 * -- -- long dash (en-dash? )
+304 196 c4 * $-$ - horizontal line
+305 197 c5 <th/ (part 1) first part of th ligature
+ see 231 = e7 for part 2
+306 198 c6 <imac/ imacr \=i i i macron
+307 199 c7 <emac/ emacr \=e e e macron
+310 200 c8 <dsdot/ d Sanskrit/Tamil d dot
+311 201 c9 <nsdot/ n Sanskrit/Tamil n dot
+312 202 ca <tsdot/ t Sanskrit/Tamil t dot
+313 203 cb <ecr/ \u{e} e e breve
+314 204 cc <icr/ \u{i} i i breve
+315 205 cd *
+316 206 ce <ocr/ \u{o} o o breve
+317 207 cf - -- - short dash
+
+320 208 d0 -- mdash --- -- long (em) dash
+321 209 d1 <OE/ OElig \OE OE OE ligature
+322 210 d2 <oe/ oelig \oe oe oe ligature
+323 211 d3 <omac/ omacr \=o o o macron
+324 212 d4 <umac/ umacr \=u u u macron
+325 213 d5 <ocar/ \v{o} o o hacek
+326 214 d6 <aemac/ \=\ae ae ae ligature macron
+327 215 d7 <oemac/ \=\oe oe oe ligature macron
+330 216 d8 par $\parallel$ || double vertical
+ bar(s)
+331 217 d9 *
+332 218 da *
+333 219 db *
+334 220 dc <ucr/ ubreve \u{u} u u breve
+335 221 dd <acr/ abreve \u{a} a a breve
+336 222 de <cre/ ssmile \u{} ~ crescent
+ (like a breve, but vertically centered --
+ represents the short accent in poetic meter)
+337 223 df <ymac/ \=y y y macron
+
+340 224 e0 <asl/ a a "semilong"
+ (has a macron above with a short vertical
+ bar on top the center of the macron)
+ Used in pronunciations.
+341 225 e1 <esl/ e "semilong"
+342 226 e2 <isl/ i "semilong"
+343 227 e3 <osl/ o "semilong"
+344 228 e4 <usl/ u "semilong"
+345 229 e5 <adot/ a a with dot above
+346 230 e6 * mu small Greek mu
+347 231 e7 <th/ (part 2) second part of th ligature
+ see 197 = c5 for part 1
+350 232 e8 *
+351 233 e9 *
+352 234 ea *
+353 235 eb <edh/ edh 360 240 f0 th small eth
+354 236 ec *
+355 237 ed <thorn/ thorn 376 254 fe th small thorn
+356 238 ee <atil/ atilde \~a a a tilde
+357 239 ef <ndot/ n n with dot above
+
+360 240 f0 <rsdot/ \d{r} r r with a dot below
+361 241 f1 *
+362 242 f2 *
+363 243 f3 *
+364 244 f4 <yogh/ y small yogh
+365 245 f5 mdash --- -- em dash
+366 246 f6 divide 367 247 f7 $\div$ / division sign
+367 247 f7 ap $\approx$ ~= "double tilde"
+370 248 f8 <deg/ 260 176 b0 ${}^\circ$ * degree sign
+371 249 f9 <middot/ $\bullet$ * bold middle dot
+372 250 fa * 267 183 b7 $\cdot$ * light middle dot
+373 251 fb <root/ radic $\surd$ * root sign
+374 252 fc *
+375 253 fd *
+376 254 fe *
+377 255 ff *
+
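For machines that cannot load the DOS font, Table 2 can serve as the basis of a translation table. The sketch below is illustrative only: it covers a handful of rows, the byte values are taken from Table 2, and the Unicode targets are an assumption of this sketch rather than something the Webster files specify. Bytes not in the partial table become U+FFFD.

```python
# Partial table: webfont byte value -> Unicode character.
# Byte values follow Table 2; the Unicode targets are this sketch's choice.
WEBFONT_TO_UNICODE = {
    0x80: "\u00c7",  # <Cced/  C cedilla
    0x81: "\u00fc",  # <uum/   u umlaut
    0x82: "\u00e9",  #         e acute
    0x91: "\u00e6",  # <ae/    ligature ae
    0xBE: "\u0101",  # <amac/  a macron
    0xC6: "\u012b",  # <imac/  i macron
    0xC7: "\u0113",  # <emac/  e macron
    0xD3: "\u014d",  # <omac/  o macron
    0xD4: "\u016b",  # <umac/  u macron
    0xEB: "\u00f0",  # <edh/   small eth
    0xED: "\u00fe",  # <thorn/ small thorn
    0xF8: "\u00b0",  # <deg/   degree sign
}


def decode_webfont(data, unknown="\ufffd"):
    """Decode bytes in the webfont encoding into a Unicode string.

    Bytes below 0x80 are plain ASCII; high-order bytes are looked up in
    the (partial) table, and anything missing becomes `unknown`.
    """
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(WEBFONT_TO_UNICODE.get(byte, unknown))
    return "".join(out)
```

A complete converter would need every row of Table 2, plus special handling for the two-byte th ligature (197+231), which a per-byte lookup like this one cannot express.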
+ ----------------------------------
+Table 3
+----------------------------------
+
+====================================================================
+The table below gives some additional information about some of the
+more commonly used entities
+-------------------------------------------------------------------
+Frequently used:
+decimal hex char definition
+ 21 section symbol -- another section also at 188
+ (so that 21 can be used as a normal control
+ character)
+ 126 ~ used by typists as a place-holder in word
+ combinations where an uncapitalized headword
+ should be.
+ 128 80 <Cced/ c cedilla (uppercase)
+ 129 81 <uum/ u umlaut
+ 130 82 e acute
+ 131 83 a circumflex
+ 132 84 <aum/ a umlaut
+ 133 85 a grave
+ 134 86 <aring/ a with "ring" (circle) above (Swedish!)
+ 135 87 <cced/ c cedilla
+ 136 - 144 standard European set for IBM
+ 136 88 <ecir/ e circumflex
+ 137 89 <eum/ e umlaut (or e with dieresis above)
+ 138 8a e grave
+ 145 91 <ae/ = "ae" fused ligature
+ 146 92 <AE/ = upper-case "ae" fused ligature
+ 147 93 <ocir/ o circumflex
+ 148 94 <oum/ o "umlaut", used mostly in "coöperation,
+ Zoöl." and in pronunciations
+ 164 a4 <ntil/ Spanish "enye"
+ 166 a6 <frac23/ two-thirds (fraction)
+ 167 a7 <frac13/ one-third (fraction)
+ 169 a9 <sec/ seconds of degree or time, or double-prime
+ 171 ab <frac12/ one-half, as in the original IBM set
+ 172 ac <frac14/ one-fourth (fraction)
+ 176 b0 <?/ = (reverse-video question mark), used
+ to represent an uncodable or illegible character
+ 180 b4 long vertical double-headed arrow (a reference mark)
+ 181 b5 <hand/ = (the typographer's "fist")
+ Appearing as a "pointing hand" character
+ (for explanatory notes)
+ 182 b6 bold accent in headwords
+ replaced in full ASCII version by double quote = "
+ 183 b7 light accent in headwords
+ replaced within headwords in the full ASCII version
+ by an open-single-quote (` = ASCII 96, not the same
+ as 191, \'bf). This mark is used also
+ for minutes of a degree, and for "prime"
+ to modify variables in mathematical expressions.
+ -- two of these in sequence represent seconds
+ of a degree, or double prime. The seconds
+ symbol is also represented by <sec/ (hex a9).
+ 184 b8 close double quotes (used with 189 [= \'bd], open quote)
+ 186 ba vertical double bar - represents the symbol used
+ in the printed dictionary before a headword to
+ signify that the word was adopted without
+ anglicization from a foreign language
+ but in the full-ASCII version this function
+ uses \'d8 -- see 216
+ 188 bc <sect/ section mark
+ - alternate to 21 (a control character)
+ 189 bd open double quotes (used with 184, close quote)
+ 190 be <amac/ a macron
+ 191 bf <lsquo/ "left single quote"
+ single open quote mark (not same as ASCII 96)
+ 192 c0 <nsm/ "n sub-macron", an n with a macron below --
+ represents the "ng" sound in pronunciations
+ 193 c1 <sharp/ sharp - music notation
+ 194 c2 <flat/ flat - music notation
+ 195 c3 long dash, one pixel removed from left
+ will fuse with left long dash, char 208
+ 196 c4 graphic horizontal line
+ 195+208 combination for a very long dash. In the
+ original typing, the dash char 208 was used
+ for both non-breaking hyphen (in hyphenated
+ words), and for the em-dash used as an
+ introductory mark for various segments.
+ The em-dash should be distinguished from
+ the hyphen, but that conversion hasn't yet
+ been done.
+ In the full ASCII version, a double hyphen
+ "--" represents the em-dash
+ 197 c5 <th/ (part 1) first of a pair of characters
+ 197+231 = used to represent the th ligature --
+ <th/ represents the "th" sound of "mother"
+ see 231 (e7) for part 2
+ 198 c6 <imac/ = i macron
+ 199 c7 <emac/ = e macron
+ 200 c8 <dsdot/ Sanskrit/Tamil d with dot underneath
+ 201 c9 <nsdot/ Sanskrit/Tamil n with dot underneath
+ 202 ca <tsdot/ Sanskrit/Tamil t with dot underneath
+ 203 cb <ecr/ = e with crescent (breve) above. Used
+ - in some etymologies and pronunciation
+ 204 cc <icr/ = i with crescent (breve) above - used
+ - in some etymologies and pronunciation
+ 206 ce <ocr/ = o with crescent (breve) above - used
+ - in some etymologies and pronunciation
+ 207 cf short dash, used in hyphenated words, and in
+ breaking syllables where no accent is used. But
+ sometimes the typists used the normal hyphen [45],
+ or the long dash (decimal 208) for that purpose.
+ The normal hyphen is the same length as the long
+ dash, but one pixel higher in the character box.
+ # In headwords, in the full ASCII version, this
+ short dash is represented by the asterisk "*".
+ 208 d0 <mdash/ = represents the long dash, used for the em
+ dash which often precedes certain sections within a
+ definition, and which separates some sections,
+ such as wordforms or collocations within a
+ collocation segment. This is replaced in the
+ full ASCII version by a double hyphen, "--".
+ 210 d2 <oe/ = "oe" fused ligature
+ 211 d3 <omac/ = o macron
+ 212 d4 <umac/ = u macron
+ 213 d5 <ocar/ o with caron (hacek) (inverted circumflex) above
+ 214 d6 <aemac/ = "ae" ligature with a macron
+ 215 d7 <oemac/ = "oe" ligature with a macron
+ 216 d8 <par/ double vertical bar (short length; the long
+ length is the graphics character 186)
+ This precedes words marked with a double vertical bar in
+ the original dictionary, signifying that the word was
+ adopted directly into English without modification of
+ the spelling.
+ 220 dc <ucr/ = u with crescent above - used in some etymologies
+ 221 dd <acr/ = a with crescent above - used in some etymologies
+ 222 de <cre/ = "crescent", an upward-curving crescent
+ used as a poetic meter mark
+ 223 df <ymac/ = y macron (used in Anglo-Saxon?)
+ 229 e5 <adot/ = a with a dot above (for pronunciations)
+ 231 e7 <th/ (part 2) second of a two-character combination
+ 197+231 = used to represent the th ligature in pronunciations
+ <th/ represents the "th" sound of "mother"
+ 235 eb <edh/ = Old English and Icelandic "edh", (or "eth")
+ like a Greek delta with a hatch mark
+ through the ascender. Used to represent the
+ Anglo-Saxon/Icelandic/Gothic character,
+ in etymologies, pronounced like "th"
+ 237 ed <thorn/ "thorn", an Old English and Icelandic
+ character, appears like a "p" with an extended
+ ascender.
+ Used to represent the
+ Anglo-Saxon/Icelandic/Gothic character,
+ in etymologies, pronounced like "th"
+ in "thorn" and also as in "brother"
+ 238 ee <atil/ a with tilde above - in some etymologies
+ 244 f4 <yogh/ like a script "3" or "z". Used in Old English
+ etymologies, analogous to "y"
+ 247 f7 double tilde ("approximately equals").
+ used by typists as a place-holder in word
+ combinations where the capitalized headword
+ should be.
+ 248 f8 <deg/ degrees (temperature or angle). Note: some
+ typists used a superscript "o" to signify
+ degrees. This must be corrected!
+ 249 f9 middle dot (bold)
+ 250 fa middle dot (light)
+ 251 fb <root/ "root" sign used in etymologies, as in original
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
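Table 3 names several replacements made in the full ASCII version: 176 becomes <?/, 182 becomes a double quote, 183 becomes an open single quote (ASCII 96) within headwords, 207 becomes an asterisk in headwords, 208 becomes "--", and the pair 197+231 becomes <th/. The Python sketch below applies just those rules to a headword; the function name is hypothetical, and writing every other high-order byte in the \'xx hexadecimal form is an assumption of this sketch.

```python
# Single-byte replacements stated in Table 3 for the full ASCII version.
SINGLE = {
    176: "<?/",  # unknown or illegible character
    182: '"',    # bold accent in headwords
    183: "`",    # light accent in headwords (ASCII 96)
    207: "*",    # short dash in headwords
    208: "--",   # long (em) dash
}


def headword_to_full_ascii(data):
    """Convert a headword in the webfont encoding to 7-bit ASCII text."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        # The th ligature is a two-byte pair: 197 followed by 231.
        if byte == 197 and i + 1 < len(data) and data[i + 1] == 231:
            out.append("<th/")
            i += 2
            continue
        if byte < 128:
            out.append(chr(byte))
        elif byte in SINGLE:
            out.append(SINGLE[byte])
        else:
            # Assumption: fall back to the \'xx hexadecimal notation.
            out.append("\\'%02x" % byte)
        i += 1
    return "".join(out)
```

Checking for the 197+231 pair before any single-byte lookup matters, since neither half has a meaning on its own.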
+======================================
+ Greek transcription
+=====================================
+Greek letters are represented:
+ (capitals represent capital letters; lower-case represent lower-case)
+ #Note that "h" in transliterations is used individually, as eta, and
+ also in the combination "ch" (chi). Conversions to other codings
+ must first convert "ch" before converting "h", or at least verify
+ that an "h" to be converted has no preceding "c". "c" is not
+ otherwise used, so there is no ambiguity. Also, "ps" always
+ represents a psi; it could in theory occur as a pi-sigma
+ combination, but it doesn't. Occasionally, "th" was entered instead
+ of "q" to represent theta; these should be checked to verify that
+ they do not represent tau-eta, and converted to "q".
+
+(1) characters individually:
+ By the short-form notation <alpha/, <beta/, <gamma/, <lambda/ etc.
+ Capitalized letters are <ALPHA/, etc.
+(2) in words:
+ By inclusion within the markers <grk></grk>, using the following
+ roman-letter equivalents for the Greek letters:
+ Accents:
+ (a) aspirants -- used in front of the letter modified, which is
+usually in *front* of words beginning in vowels. Of two types:
+ ' (apostrophe) for the left-curving aspirant (spiritus lenis)
+ " (double quote) for the right-curving aspirant (spiritus asper)
+ (when the aspirant is on a letter inside a word, it is placed
+ in front of the letter it modifies.)
+ (the right-curving aspirant is also used over rho, which is
+ then usually transliterated "rh". The " in such cases is
+ placed in front of the r (for rho) which it modifies).
+ (b) normal accent (appearing as an acute accent in the original):
+ ` (left open quote, ASCII 96) -- placed after accented vowel
+ (c) grave accent (appearing as a grave accent in the original):
+ ~ (tilde, ASCII 126) -- placed after accented vowel. This is
+ rarely seen, as in <grk>to~ pa^n</grk> at "universe" or
+ <grk>ta~ gewrgika`</grk> (at "Georgic").
+ (d) curving accent (appearing as a rounded circumflex):
+ ^ (circumflex) -- placed after accented vowel
+ (e) "iota" subscript (ogonek) -- a comma placed after the vowel
+ having the subscript
+ (f) diaeresis:
+ the double dot found occasionally over the iota is
+ represented by a colon immediately after the iota,
+ as the i-diaeresis in <grk>Farisai:ko`s</grk> (at "pharisaic").
+
+ Where a letter has two accents, both are placed *after* the vowel
+ Letters with an aspirant and an accent have the
+ aspirant before the letter, and the accent after it.
+ ------------------------
+
+
+The capitalized Greek letters are represented by the capitalized
+ versions of the letters shown here.
+-----------------------------------------
+ Greek letter transliteration
+ ------------ ---------------
+ alpha a
+ beta b
+ gamma g
+ delta d
+ epsilon e
+ zeta z
+ eta h
+ theta q (th was used in some earlier sections, but was
+ changed due to potential confusion with the
+ tau+eta combination, as in <grk>lyth`rios</grk>
+ (at "lyterian") or <grk>poihth`s</grk>
+ (at "maker") )
+ iota i
+ kappa k
+ lambda l
+ mu m
+ nu n
+ xi x
+ omicron o
+ pi p
+ rho r
+ sigma s (end form not distinguished here from middle
+ form within words, but when isolated, use <sigmat/
+ ("terminal sigma") for the end form)
+ tau t
+ upsilon y (Used for both "u" and "y" pronunciations)
+ phi f
+ chi ch (c is always followed by h, so the h component
+ is not confusable with eta)
+ psi ps (theoretically confusable with pi-sigma, but that
+ combination seems never to occur)
+ omega w
+
+ (Roman j, v, u are unused)
+
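The transliteration rules above can be sketched as a small converter from the contents of a <grk>...</grk> span to Unicode Greek. This is an illustrative Python sketch under stated assumptions, not a tool shipped with the dictionary: the choice of Unicode combining marks is this sketch's own, a comma or colon directly after a letter is always read as iota subscript or diaeresis, and word-final sigma is rendered in its end form even though the transcription does not distinguish it. The digraphs "ch" and "ps" are tested before single letters, as the note on "h" requires.

```python
import re
import unicodedata

LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "y": "υ", "f": "φ", "w": "ω",
}
DIGRAPHS = {"ch": "χ", "ps": "ψ"}
# Aspirants are typed BEFORE the letter they modify.
BREATHINGS = {"'": "\u0313", '"': "\u0314"}
# Accents and other marks are typed AFTER the letter they modify.
AFTER_MARKS = {"`": "\u0301", "~": "\u0300", "^": "\u0342",
               ",": "\u0345", ":": "\u0308"}


def grk_to_unicode(text):
    """Convert the roman-letter transcription inside <grk> to Greek."""
    out = []
    pending = ""          # breathing waiting for its letter
    after_letter = False  # True while marks may attach to a letter
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i:i + 2].lower()
        if ch in BREATHINGS:
            pending = BREATHINGS[ch]
            i += 1
            continue
        if pair in DIGRAPHS:            # "ch" and "ps" before "h", "p"
            greek, step = DIGRAPHS[pair], 2
        elif ch.lower() in LETTERS:
            greek, step = LETTERS[ch.lower()], 1
        else:
            greek, step = None, 1
        if greek is not None:
            out.append(greek.upper() if ch.isupper() else greek)
            out.append(pending)
            after_letter = True
        elif after_letter and ch in AFTER_MARKS:
            out.append(AFTER_MARKS[ch])
        else:
            out.append(ch)              # spaces, unused letters, etc.
            after_letter = False
        pending = ""   # a breathing not followed by a letter is dropped
        i += step
    joined = "".join(out)
    # Assumption: a sigma not followed by a word character is the end form.
    joined = re.sub(r"σ(?!\w)", "ς", joined)
    return unicodedata.normalize("NFC", joined)
```

Applied to examples cited above, grk_to_unicode("poihth`s") yields the word for "maker" with the second eta accented, and grk_to_unicode("Farisai:ko`s") yields the form cited at "pharisaic". Entries that still use "th" for theta would need to be converted to "q" first, as noted above.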
