NAME
    HTML::ParseTables - Extract text from HTML tables. Version 0.05
    (pre-alpha)

SYNOPSIS
        use HTML::ParseTables;

        my $p = HTML::ParseTables->new();
        if (my $html = shift) {
            $p->parse_file($html);
        }
        else {
            while (<DATA>) { $p->parse($_) }
        }
        
        print $p->table_count, " tables found\n";
        my %h = $p->all_tables;

        # Keys are table numbers, so sort them numerically (10 after 9).
        foreach my $table (sort { $a <=> $b } keys %h) {
            my @rows = $p->get_table($table);
            my $row_count = 0;
            print "\nTABLE $table. ",
                  $p->row_count($table),
                  " rows:\n";
            foreach my $row (@rows) {
                print ++$row_count,
                      " (", scalar(@{$row}), " cells)\t",
                      join("\t", @{$row}), "\n";
            }
        }

        print "Table 1, Cell B2    : ",
              $p->get_cell(1, 'B2'), "\n";
        print "Last table, cell A1 : ",
              $p->get_cell('A1'), "\n";

    __DATA__

        <HTML><BODY>
        <P> paragraph before table </P>
        <TABLE>
            <TR> <TD>A1</TD> <TD>B1</TD> </TR>
            <TR> <TD>A2</TD> <TD>B2</TD> </TR>
        </TABLE>
        <TABLE>
            <TR> <TD>T2-A1</TD> <TD>T2-B1</TD> </TR>
            <TR> <TD>T2-A2</TD> <TD>T2-B2</TD> </TR>
        </TABLE>

DESCRIPTION
    Extracts text from HTML documents that contain tables. The
    module focuses on an intuitive interface to table content. In
    particular, it accepts several notations for addressing
    individual cells, including the popular spreadsheet-style "B2"
    notation.

    This version should be considered "pre-alpha": it may contain
    many bugs, much of it is undocumented, and the interface may
    change quickly. The only reliable documentation is the code
    itself. I have no time to polish it now, but I was asked to
    post it, so here it is.

    It works well for me in a few scripts that run daily, so
    hopefully you can use it too.

DETAILS
    Important: the module uses 1 as the index to the first
    table/column/row/cell, not 0!
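
    To illustrate the 1-based convention, here is a minimal sketch
    that maps module-style indexes onto ordinary 0-based Perl
    arrays. The table is built by hand and "cell_at" is a
    hypothetical helper written for this example; it is not part of
    the module.

```perl
use strict;
use warnings;

# Hand-built stand-in for one parsed table: a list of row references.
my @rows = (
    [ 'A1', 'B1' ],
    [ 'A2', 'B2' ],
);

# Hypothetical helper: translate a 1-based (column, row) pair into
# Perl's 0-based array indexes. Returns nothing for out-of-range input.
sub cell_at {
    my ($rows, $column, $row) = @_;
    return if $column < 1 || $row < 1;    # 0 is not a valid index here
    return if $row > @{$rows};
    return $rows->[ $row - 1 ][ $column - 1 ];
}

print cell_at(\@rows, 1, 1), "\n";    # prints "A1"
print cell_at(\@rows, 2, 2), "\n";    # prints "B2"
```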

  %config

    Is meant to hold user preferences for output formatting and for
    what is retained during parsing.

  get_table([$table])

    Returns table $table as a list of rows, each row being a
    reference to a list of cells. The first table is table 1.
    Without an argument, returns the last table.

  get_table_as_text([$table])

    Returns table $table as a string, with newlines between rows.
    The separator between cells depends on $config{format}. Without
    an argument, returns the last table.
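
    The kind of flattening described here can be sketched as
    follows. This is an assumption-laden illustration, not the
    module's code: "rows_to_text" is a hypothetical helper, and the
    separator is passed in explicitly because the values accepted by
    $config{format} are not documented.

```perl
use strict;
use warnings;

# Hand-built stand-in for the rows of one table.
my @rows = ( [ 'A1', 'B1' ], [ 'A2', 'B2' ] );

# Hypothetical helper: join the cells of each row with $separator,
# then join the rows with newlines (no trailing newline is added).
sub rows_to_text {
    my ($separator, @rows) = @_;
    return join "\n", map { join $separator, @{$_} } @rows;
}

print rows_to_text("\t", @rows), "\n";    # tab-separated cells
print rows_to_text(", ", @rows), "\n";    # comma-separated cells
```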

  get_row([$table,] $row)

    Returns row $row from table $table as a list of cells. If $table
    is omitted, uses the last table. The first row is row 1.

  get_cell()

    Accepts several call formats:

        get_cell($table, $column, $row)
        get_cell($table, 'B3')
        get_cell($column, $row)
        get_cell('A5')

    Returns the cell content. If $table is omitted, uses the last
    table.
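
    One way the four call forms listed above could be told apart is
    sketched below. This is a guess at the logic, not the module's
    implementation: "find_cell" and "column_number" are hypothetical
    names, the tables are built by hand from the SYNOPSIS data, and
    the sketch assumes that a final argument such as 'B3' (letters
    followed by digits) is always a spreadsheet reference.

```perl
use strict;
use warnings;

# Stand-in for the parsed document: tables 1 and 2 from the SYNOPSIS.
my @tables = (
    [ [ 'A1',    'B1'    ], [ 'A2',    'B2'    ] ],
    [ [ 'T2-A1', 'T2-B1' ], [ 'T2-A2', 'T2-B2' ] ],
);

# Convert a column given as letters ('A', 'B', ..., 'AA') or as a
# number into a 1-based column number.
sub column_number {
    my ($column) = @_;
    return $column if $column =~ /^\d+$/;
    my $n = 0;
    $n = $n * 26 + ord($_) - ord('A') + 1 for split //, uc $column;
    return $n;
}

# Hypothetical dispatcher for the four documented call forms.
sub find_cell {
    my @args = @_;
    my ($column, $row);

    if ($args[-1] =~ /^([A-Za-z]+)(\d+)$/) {
        # Trailing spreadsheet reference: it supplies column and row.
        ($column, $row) = ($1, $2);
        pop @args;
    }
    else {
        $row    = pop @args;
        $column = pop @args;
    }

    # Whatever is left is the table number; default to the last table.
    my $table = @args ? $args[0] : scalar @tables;
    my $col   = column_number($column);
    return if $table < 1 || $row < 1 || $col < 1;

    my $rows  = $tables[ $table - 1 ] or return;
    my $cells = $rows->[ $row - 1 ]   or return;
    return $cells->[ $col - 1 ];
}

print find_cell(1, 2, 2),  "\n";    # prints "B2"
print find_cell(1, 'B2'),  "\n";    # prints "B2"
print find_cell(2, 1),     "\n";    # prints "T2-B1" (last table)
print find_cell('A1'),     "\n";    # prints "T2-A1" (last table)
```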

  table_count()

    Returns number of tables found. Takes no argument.

  row_count([$table])

    Returns number of rows in table $table or last table.

  cell_count($table, $row)

    Returns number of cells in row $row of table $table.

  all_tables

    Returns a hash with all tables. Keys are numbers from 1 to the
    number of tables. Values are references to lists of lists (rows
    of cells).
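
    Because the keys are numbers, they should be sorted numerically
    when the tables are processed in order. The sketch below uses a
    hand-built hash of the same shape (11 one-cell tables, invented
    for this example) to show how the default string sort differs.

```perl
use strict;
use warnings;

# Stand-in for the hash described above: table number => list of rows.
my %tables = map { ($_ => [ [ "cell of table $_" ] ]) } 1 .. 11;

my @string_order  = sort keys %tables;                  # string comparison
my @numeric_order = sort { $a <=> $b } keys %tables;    # numeric comparison

print "string sort : @string_order\n";     # 1 10 11 2 3 4 5 6 7 8 9
print "numeric sort: @numeric_order\n";    # 1 2 3 4 5 6 7 8 9 10 11

# Walk the tables in document order and print the first cell of each.
for my $number (@numeric_order) {
    my $rows = $tables{$number};
    print "table $number: $rows->[0][0]\n";
}
```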

LIMITATIONS
    Lots for now:

    Doesn't handle nested tables.

    Doesn't understand colspan and rowspan.

    Documentation incomplete and possibly even wrong.

    This is really not finished.

    ...?

BUGS
    Let me know what you find.

AUTHOR
    Milivoj Ivkovic <mi@alma.ch>. Others are welcome to extend it.

COPYRIGHT
    Copyright Milivoj Ivkovic, 1999. Same license as Perl itself.



