Archive for August, 2006

Using HTMLDOC to split HTML in multiple pages

August 31, 2006

Do you have a really big HTML page that takes forever to load in the browser? How about breaking it into smaller pieces, one topic per page? The HTMLDOC tool can do it for you.

The main purpose of this tool is the opposite: joining multiple HTML files into a single PDF file. It has a huge list of options that give you fine control over the process, such as setting fonts, headers and footers, generating an automatic Table of Contents, inserting a cover page, and more.

The txt2tags User Guide PDF is generated from a big HTML file by HTMLDOC.

The latest version comes with a new target called “htmlsep”, which takes a structured HTML page (full of <H1>, <H2>) and breaks it into multiple pages. This is the command line usage:

htmldoc -t htmlsep -d output-folder file.html

Note that you must create the folder for the generated files before running the command. Let’s break a file. Here’s a quick sample HTML file with some headings:

<html>
<body>
<h1>Greatest Bands Ever</h1>
  <h2>Punk Rock</h2>
  RAMONES.
  <h2>Softcore</h2>
  Millencolin, No Fun At All, No Use For A Name, ...
  <h2>Other</h2>
  Toy Dolls, Operation Ivy, Face to Face, ...

<h1>Greatest Movies Ever</h1>
  <h2>Documentary</h2>
  Dogtown And Z-Boys, Riding Giants, Step Into Liquid, ...
  <h2>Strange</h2>
  Cube

</body>
</html>

Follow along:

$ ls -F
greatest.html  output/

$ htmldoc -t htmlsep -d output greatest.html
BYTES: 715
BYTES: 1135
BYTES: 883
BYTES: 1024
BYTES: 1030
BYTES: 1059
BYTES: 998
BYTES: 1074
BYTES: 896

$

Before that command, we had just the HTML file and an empty folder. While running, HTMLDOC prints those “BYTES” lines to show that everything is OK. Now, let’s check what we have in the output folder:

$ ls output/
Documentary.html        Other.html              Strange.html
GreatestBandsEver.html  PunkRock.html           index.html
GreatestMoviesEver.html Softcore.html           toc.html

Great! Each heading went to its own file, named accordingly. The extra files are “index.html” and “toc.html”, which hold the cover page and the Table of Contents. All the pages have the following navigation links: Contents, Previous, Next, so you can browse them in sequence.

Handy, simple and fast.
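
The steps above (create the folder, run HTMLDOC, list the result) can be wrapped in a small shell function. This is only a sketch: the name split_html and its two arguments are made up for illustration, and the only HTMLDOC options used are the ones already shown in this post.

```shell
#!/bin/sh
# split_html INPUT OUTDIR
# Hypothetical helper (not part of HTMLDOC) wrapping the commands shown above.
split_html() {
    input=$1
    outdir=$2

    # The output folder must exist before HTMLDOC runs, so create it first.
    mkdir -p "$outdir" || return 1

    # Stop with a clear message when HTMLDOC is not installed.
    if ! command -v htmldoc >/dev/null 2>&1; then
        echo "htmldoc not found in PATH" >&2
        return 127
    fi

    # Same call as in the session above, then show the generated pages.
    htmldoc -t htmlsep -d "$outdir" "$input" || return 1
    ls "$outdir"
}
```

For the sample file, the call would be: split_html greatest.html output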

You may play with other options to customize the files:

$ htmldoc -t htmlsep -d output \
	--no-title --toclevels 2 --toctitle "Contents" \
	greatest.html

Remember that big old User Guide in HTML that has been hanging around on the txt2tags site? Now it is split into multiple files. If you prefer the all-in-one version, download the PDF (see the About topic).

Note 1: HTMLDOC has no support for CSS. You’ll have to add the <link> tag to the generated files.

Note 2: HTMLDOC reads the file data starting from the first heading. Use a %!postproc to remove the <H1>Page Title</H1> line when converting to HTML.

Note 3: Download HTMLDOC from www.htmldoc.org, which is the free Open Source version. If you’re on Linux, search for “htmldoc” in your package manager. On the Mac you can find it in Fink, or download and compile the sources (it’s quick). Windows users may have to install the commercial demo from Easy Software.
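
For Note 1, adding the <link> tag by hand gets boring fast, so here is a minimal sketch that patches every generated page. The function name add_css_link and the stylesheet name are hypothetical, and the sketch assumes each generated page contains a closing head tag (in upper or lower case); check one of your files before relying on it.

```shell
#!/bin/sh
# add_css_link DIR HREF
# Hypothetical helper (not part of HTMLDOC): adds a stylesheet <link> tag
# to every .html file in DIR. HREF must not contain "|" or "&".
add_css_link() {
    dir=$1
    href=$2
    for page in "$dir"/*.html; do
        [ -f "$page" ] || continue

        # Skip pages that already reference this stylesheet.
        grep -qF "href=\"$href\"" "$page" && continue

        # Insert the tag right before the closing head tag ("&" keeps the
        # matched tag as it was written). A temp file avoids sed -i, which
        # behaves differently across systems.
        sed "s|</[Hh][Ee][Aa][Dd]>|<link rel=\"stylesheet\" type=\"text/css\" href=\"$href\">&|" \
            "$page" > "$page.tmp" && mv "$page.tmp" "$page"
    done
}
```

After splitting the file, something like add_css_link output style.css would patch all the pages at once. Running it twice is harmless, since already patched pages are skipped.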

Quit your word processor

August 25, 2006

Are you a writer? How about not using a word processor to write your text? Read on, I promise it will make sense :)

There’s a nice article about text-only writing at Linux.com. It mentions three tools to replace the big, memory-hungry office word processor:

  • A text editor - Vi
  • A formatting tool - txt2tags
  • A spell checker - Aspell

Check it out! Minimalist tools for writers by Dmitri Popov.

For more detailed information, read the Writing Books with Txt2tags document.

New translations: Swedish and Chinese

August 10, 2006

The txt2tags Team continues to grow. New contributors have spent their spare time helping to improve the program documentation. A big WELCOME to the newcomers!

  • Per Erik Strandberg translated the Sample file to Swedish.
  • wfifi translated the program messages (potfile) to Chinese.
  • Nicolas Dumoulin revised the User Guide’s French translation.

Their work is already online on the documentation page.

And you?

Maybe you could help us by translating the Markup Demo or the sample file into your language? It’s quick and easy, and takes just a few minutes. Start now!

Minor version 2.3.2 released

August 9, 2006

Summary: New commented block mark and several bug fixes.

This release introduces a new mark for commented blocks: %%%. The syntax is similar to the Verbatim and Raw blocks, using the same mark to open and close the block. Kudos to Leo Rosa for sending the patch!

    This is a paragraph.

    %%%
    This is a commented block.
    Remember that the %%% must be at the line
    beginning with no leading spaces.
    %%%

    Another paragraph.
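
To make the toggle behavior concrete, here is a small awk filter that mimics the effect of the new mark: every line between a pair of %%% lines is dropped. It is only an illustration of the idea, not the txt2tags implementation. The function name is made up, and allowing trailing blanks after the mark is an assumption.

```shell
#!/bin/sh
# strip_comment_blocks
# Hypothetical filter (not txt2tags code): reads text on stdin and drops
# everything between a pair of %%% lines, marks included.
strip_comment_blocks() {
    awk '
        # A line holding only %%% at the very beginning toggles the state.
        /^%%%[ \t]*$/ { in_comment = !in_comment; next }
        # Lines outside a commented block pass through unchanged.
        !in_comment   { print }
    '
}

# The sample from above: only the two paragraphs (and the blank lines) survive.
printf '%s\n' \
    'This is a paragraph.' \
    '' \
    '%%%' \
    'This is a commented block.' \
    '%%%' \
    '' \
    'Another paragraph.' | strip_comment_blocks
```

Note that a mark with leading spaces does not match the pattern, so such a line is printed as ordinary text instead of opening a block.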

The txt2tags test suite was extended from 120 to 144 tests! The new checks revealed some hard-to-catch bugs and some strange behavior. They even exposed fatal errors triggered by uncommon markup in the source file.

Now everything is fine. Oh, if only I had implemented that test suite thing from the beginning…

Vanished Bugs

Bugfix: Removed useless <P></P> after Table followed by blank line

    $ echo -e "\n| Table\n" | txt2tags-2.3 -t html -H -o- -i-
    <TABLE CELLPADDING="4">
    <TR>
    <TD>Table</TD>
    </TR>
    </TABLE>

    <P></P>

    $ echo -e "\n| Table\n" | txt2tags-2.3.2 -t html -H -o- -i-
    <TABLE CELLPADDING="4">
    <TR>
    <TD>Table</TD>
    </TR>
    </TABLE>

Bugfix: Raw doesn’t close Quote anymore

    $ echo -e '\n\tQuote\n""" Raw' | txt2tags-2.3 -t html -H -o- -i-
            <BLOCKQUOTE>
            Quote
            </BLOCKQUOTE>
    Raw

    $ echo -e '\n\tQuote\n""" Raw' | txt2tags-2.3.2 -t html -H -o- -i-
            <BLOCKQUOTE>
            Quote
    Raw
            </BLOCKQUOTE>

Bugfix: Macro at line beginning now closes Quote

    $ echo -e "\n\tQuote\n%%date" | txt2tags-2.3 -t html -H -o- -i-
            <BLOCKQUOTE>
            Quote
            20060809
            </BLOCKQUOTE>

    $ echo -e "\n\tQuote\n%%date" | txt2tags-2.3.2 -t html -H -o- -i-
            <BLOCKQUOTE>
            Quote
            </BLOCKQUOTE>
    <P>
    20060809
    </P>

Bugfix: Verbatim and Raw areas are now mutually exclusive

    $ echo -e '\n```\n"""\nRaw in Verb\n"""\n```' | txt2tags-2.3 -t html -H -o- -i-
    <PRE>
    </PRE>
    Raw in Verb
    <PRE>
    </PRE>

    $ echo -e '\n```\n"""\nRaw in Verb\n"""\n```' | txt2tags-2.3.2 -t html -H -o- -i-
    <PRE>
      """
      Raw in Verb
      """
    </PRE>

Bugfix: Fatal error on macro after table

    $ echo -e "\n| x |\n%%date" | txt2tags-2.3 -t html -H -o- -i-
    Sorry! Txt2tags aborted by an unknown error.

    $ echo -e "\n| x |\n%%date" | txt2tags-2.3.2 -t html -H -o- -i-
    <TABLE CELLPADDING="4" BORDER="1">
    <TR>
    <TD>x</TD>
    </TR>
    </TABLE>

    <P>
    20060809
    </P>

Bugfix: Fatal error on table inside deflist

    $ echo -e "\n: | Table inside List Term" | txt2tags-2.3 -t html -H -o- -i-
    Sorry! Txt2tags aborted by an unknown error.

    $ echo -e "\n: | Table inside List Term" | txt2tags-2.3.2 -t html -H -o- -i-
    <DL>
    <DT>| Table inside List Term</DT><DD>
    </DL>

Bugfix: Fatal error on empty table

    $ echo -e "\n| |" | txt2tags-2.3 -t html -H -o- -i-
    Sorry! Txt2tags aborted by an unknown error.

    $ echo -e "\n| |" | txt2tags-2.3.2 -t html -H -o- -i-
    <TABLE CELLPADDING="4" BORDER="1">
    <TR>
    <TD></TD>
    </TR>
    </TABLE>
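
The before/after comparisons above follow the same pattern: feed a tiny input to the converter and look at the output. A check like that could be scripted roughly as below. This is a sketch only and not the real txt2tags test suite; the function name check_case is invented, and the converter command line is passed in as a plain string.

```shell
#!/bin/sh
# check_case CONVERTER INPUT EXPECTED
# Hypothetical helper (not the real test suite): prints "ok" when the
# converter output matches EXPECTED exactly, "FAIL" otherwise.
check_case() {
    converter=$1   # full command line, e.g. "txt2tags -t html -H -o- -i-"
    input=$2       # source text; \n and \t escapes are expanded by printf %b
    expected=$3    # exact output the converter should produce

    # $converter is left unquoted on purpose so its options split into words.
    # printf '%b\n' plays the role of "echo -e" in the examples above.
    actual=$(printf '%b\n' "$input" | $converter 2>&1)

    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "FAIL"
        return 1
    fi
}
```

A call would look like check_case "txt2tags -t html -H -o- -i-" '\n| |' "$expected_html", where expected_html holds the output you consider correct. Trailing newlines are ignored by the comparison, because the shell strips them from the captured output.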

Get the new code at the download page, under the Minor Releases section.