MediaWiki Markup Translator
===========================

This package provides a Python framework for translating articles
written in MediaWiki markup to various formats.  The present version
supports conversion to plain text, HTML, and Texinfo.

A command line converter utility is included.

Classes
=======

class ``WikiMarkup``
--------------------

A base class for all translator classes.  Unless you plan to extend
wikitrans, you will never have to create objects of this class.
Instead, you will use one of its derived classes.

Constructor arguments common to all derived classes:

filename = *name*
  The file *name* is opened and used for input.
file = *fd*
  An already opened file *fd* is used for input.
text = *string*
  Input is taken from *string*, line by line.
lang = *code*
  Specifies language version.  Default is ``en``.  This variable can be
  referred to as ``%(lang)s`` in the keyword arguments below.
html_base = *url*
  Base URL for cross-references.  Default is
  ``http://%(lang)s.wikipedia.org/wiki/``.
image_base = *url*
  Base URL for images.  Default is
  ``http://upload.wikimedia.org/wikipedia/commons/thumb/a/bf``
media_base = *url*
  Base URL for media files.  Default is
  ``http://www.mediawiki.org/xml/export-0.3``

class ``TextWikiMarkup``
------------------------

Translates material in Wiki markup language to plain text.  Usage::

  from WikiTrans.wiki2text import TextWikiMarkup
  markup = TextWikiMarkup(filename='input.txt')
  markup.parse()
  print(str(markup))

Specific constructor arguments:

width = *N*
  Limit output width to *N* columns.  Default is 78.
show_urls = *bool*
  Whether to show the URLs that links refer to.  If *bool* is ``True``
  (the default), the URL is displayed in parentheses next to the link
  text.  If ``False``, only the link text is displayed.

class ``TextWiktionaryMarkup``
------------------------------

Translates material from Wiktionary to plain text.  This is supposed to
provide a Wiktionary-specific form of ``TextWikiMarkup``.
Currently, this class differs from ``TextWikiMarkup`` only in that the
default value for ``html_base`` is
``http://%(lang)s.wiktionary.org/wiki/``.

class ``TexiWikiMarkup``
------------------------

Translates Wiki markup to Texinfo source.  Usage::

  from WikiTrans.wiki2texi import TexiWikiMarkup
  markup = TexiWikiMarkup(filename='input.txt')
  markup.parse()
  print(str(markup))

Two markup-specific keywords control the sectioning model used.

sectioning_model = *model*
  Selects the Texinfo sectioning model for the output document.
  Possible values are:

  ``numbered``
    Top of document is marked with ``@top``.  Headings (``=``, ``==``,
    ``===``, etc) produce ``@chapter``, ``@section``, ``@subsection``,
    etc.
  ``unnumbered``
    Unnumbered sectioning: ``@top``, ``@unnumbered``,
    ``@unnumberedsec``, ``@unnumberedsubsec``.
  ``appendix``
    Sectioning suitable for appendix entries: ``@top``, ``@appendix``,
    ``@appendixsec``, ``@appendixsubsec``, etc.
  ``heading``
    Use heading directives to reflect sectioning: ``@majorheading``,
    ``@chapheading``, ``@heading``, ``@subheading``, etc.
sectioning_start = *n*
  Shift the resulting heading level by *n* positions.  For example,
  with ``sectioning_model=numbered``, ``== A ==`` produces
  ``@section A`` on output.  If ``sectioning_start=1`` is also given,
  the same heading produces ``@subsection A`` instead.

class ``HtmlWikiMarkup``
------------------------

Translates Wiki markup to HTML.  Usage::

  from WikiTrans.wiki2html import HtmlWikiMarkup
  markup = HtmlWikiMarkup(filename='input.txt')
  markup.parse()
  print(str(markup))

Supported keywords are the same as for the ``WikiMarkup`` class.

class ``HtmlWiktionaryMarkup``
------------------------------

Translates material from Wiktionary to HTML.  This is supposed to
provide a Wiktionary-specific form of ``HtmlWikiMarkup``.  Currently
both classes are equivalent, except that the default value for
``html_base`` in ``HtmlWiktionaryMarkup`` is
``http://%(lang)s.wiktionary.org/wiki/``.

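The ``%(lang)s`` placeholder that appears in the ``html_base`` defaults
has the shape of an ordinary Python ``%``-style named substitution.  The
sketch below shows how such a template could be expanded under that
assumption.  The helper ``expand_base`` is hypothetical and is not part
of the package; it only illustrates the substitution.

```python
# Illustrative only: plain Python %-formatting, not wikitrans internals.
# The template is the html_base default quoted in the WikiMarkup section.
DEFAULT_HTML_BASE = 'http://%(lang)s.wikipedia.org/wiki/'


def expand_base(template, lang='en'):
    """Substitute the language code into a base URL template.

    Hypothetical helper; assumes the template uses %(lang)s exactly as
    written in the documentation.
    """
    return template % {'lang': lang}


if __name__ == '__main__':
    print(expand_base(DEFAULT_HTML_BASE))             # lang defaults to 'en'
    print(expand_base(DEFAULT_HTML_BASE, lang='de'))
```

A template without the placeholder passes through unchanged, so a fixed
URL can be given for ``html_base`` as well.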
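The interplay of the ``sectioning_model`` and ``sectioning_start``
keywords of ``TexiWikiMarkup`` may be easier to follow as a lookup
table.  The following is a minimal sketch, not the package's
implementation: the function ``texinfo_command`` is hypothetical, the
position of each command within the ``heading`` model is an assumption,
and levels deeper than those listed in the documentation are simply
clamped to the last listed command.

```python
# Command tables transcribed from the sectioning_model description.
# Index 0 is the top of the document; index k is a level-k wiki heading.
# (For 'heading', treating @majorheading as index 0 is an assumption.)
SECTIONING = {
    'numbered': ['@top', '@chapter', '@section', '@subsection'],
    'unnumbered': ['@top', '@unnumbered', '@unnumberedsec',
                   '@unnumberedsubsec'],
    'appendix': ['@top', '@appendix', '@appendixsec', '@appendixsubsec'],
    'heading': ['@majorheading', '@chapheading', '@heading', '@subheading'],
}


def texinfo_command(wiki_heading, model='numbered', start=0):
    """Return the Texinfo line a wiki heading maps to in this sketch."""
    stripped = wiki_heading.strip()
    # Heading level is the number of leading '=' characters.
    level = len(stripped) - len(stripped.lstrip('='))
    if level == 0:
        raise ValueError('not a wiki heading: %r' % wiki_heading)
    title = stripped.strip('=').strip()
    commands = SECTIONING[model]
    # Shift by `start`, then clamp to the commands listed above
    # (clamping is a simplification made for this example).
    index = max(0, min(level + start, len(commands) - 1))
    return '%s %s' % (commands[index], title)


if __name__ == '__main__':
    print(texinfo_command('== A =='))            # numbered model, no shift
    print(texinfo_command('== A ==', start=1))   # shifted one level down
```

With the ``numbered`` model the two calls reproduce the example given
for ``sectioning_start``: ``@section A`` without a shift and
``@subsection A`` with a shift of 1.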

The ``wikitrans`` utility
=========================

This command line utility converts the supplied text to the selected
output format.  The usage syntax is::

  wikitrans [OPTIONS] ARG

If ARG looks like a URL, the wiki text to be converted will be
downloaded from that URL.  Otherwise, if the ``--base-url=URL`` option
is given, ARG is treated as the name of the page to get from the
MediaWiki installation at ``URL``.  Otherwise, ARG is treated as the
name of the file to read wiki material from.

Examples::

  wikitrans text.wiki
  wikitrans --base-url http://en.wiktionary.org door
  wikitrans https://en.wiktionary.org/wiki/Special:Export/door

Options are:

``--version``
  Show program's version number and exit.
``-h``, ``--help``
  Show a short usage summary and exit.
``-v``, ``--verbose``
  Verbose operation.
``-I ITYPE``, ``--input-type=ITYPE``
  Set input document type.  *ITYPE* is one of: ``default`` or
  ``wiktionary``.
``-t OTYPE``, ``--to=OTYPE``, ``--type=OTYPE``
  Set output document type: ``html`` (the default), ``texi``, ``text``,
  or ``dump``.
``-l LANG``, ``--lang=LANG``
  Set input document language.
``-o KW=VAL``, ``--option=KW=VAL``
  Pass the keyword argument ``KW=VAL`` to the parser class constructor.
``-d DEBUG``, ``--debug=DEBUG``
  Set debug level (0..100).
``-D``, ``--dump``
  Dump parse tree and exit; same as ``--type=dump``.
``-b URL``, ``--base-url=URL``
  Set base URL.

Note: when using ``--base-url`` or passing a URL as an argument (the
second and third use cases above), if the URL is in the
``wikipedia.org`` or ``wiktionary.org`` domain, the options
``--input-type`` and ``--lang`` are set automatically.
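
The three ways of interpreting ARG, and the automatic settings mentioned
in the note, can be sketched in a few lines of Python.  This is an
illustration under stated assumptions, not the utility's actual code:
the functions ``classify_arg`` and ``guess_settings`` are hypothetical,
"looks like a URL" is taken to mean an ``http`` or ``https`` scheme with
a host name, and the language is assumed to be the first label of the
host name (``en`` in ``en.wiktionary.org``).

```python
from urllib.parse import urlparse


def classify_arg(arg, base_url=None):
    """Return 'url', 'page' or 'file', following the three cases above."""
    parts = urlparse(arg)
    # Assumption: a URL is anything with an http(s) scheme and a host.
    if parts.scheme in ('http', 'https') and parts.netloc:
        return 'url'
    if base_url is not None:
        return 'page'
    return 'file'


def guess_settings(url):
    """Guess (input_type, lang) for a Wikipedia or Wiktionary URL.

    Returns None for any other host.  Taking the language from the
    first host label is an assumption made for this example.
    """
    host = urlparse(url).hostname or ''
    labels = host.split('.')
    if len(labels) >= 3 and labels[-1] == 'org':
        if labels[-2] == 'wiktionary':
            return ('wiktionary', labels[-3])
        if labels[-2] == 'wikipedia':
            return ('default', labels[-3])
    return None


if __name__ == '__main__':
    # The three invocations from the Examples block.
    print(classify_arg('text.wiki'))
    print(classify_arg('door', base_url='http://en.wiktionary.org'))
    print(classify_arg('https://en.wiktionary.org/wiki/Special:Export/door'))
    print(guess_settings('http://en.wiktionary.org'))
```

Explicit ``--input-type`` and ``--lang`` options are not modelled here;
how they interact with the automatic settings is not described above.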