
Moving to GitHub, slowly

The software on these pages will slowly be moved to GitHub https://github.com/hilbix/. The CVS repository will be migrated to Git as well, so the history will be preserved, at least partially. See FAQ.

Scylla and Charybdis, tinohtmlparse - Tools

The tools are developed under Linux with ESR's paradigm "release early, release often" in mind.
So you can consider this beta software, or alpha, or pre-alpha, or even worse ;)

Have a look in the download directory for all downloads.
As always here, all you get is the source. No binaries here.

tinohtmlparse 0.3.1-20090713-002009

Simple HTML parser to extract information from HTML files in shell scripts


⇒ ⇒ ⇒ Development has moved. All future versions will be published at https://github.com/hilbix/tinohtmlparse ⇐ ⇐ ⇐


tinohtmlparse is based on the ekHTML parser and enables simple shell scripts to extract data from an HTML file.

For an example, read the README file.
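To illustrate the general idea of turning HTML into line-oriented records a shell script can consume, here is a sketch that swaps in Python's standard html.parser module instead of ekHTML. The record layout (tag, attribute, value separated by TABs) and the helper names are assumptions for illustration only; tinohtmlparse's actual output format is documented in its README.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (tag, attribute, value) for href/src attributes while parsing."""

    LINK_ATTRS = {"href", "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        # Tag and attribute names arrive lowercased; entities in values are decoded.
        for name, value in attrs:
            if name in self.LINK_ATTRS and value is not None:
                self.links.append((tag, name, value))


def extract_links(html_text):
    """Parse an HTML string and return its link-like attributes in document order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


def to_shell_lines(links):
    """Format each record as one TAB-separated line, easy to split with cut or read."""
    return [f"{tag}\t{attr}\t{value}" for tag, attr, value in links]
```

Self-closing tags such as `<img .../>` are handled too, because HTMLParser's default handle_startendtag forwards them to handle_starttag.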

History:

version 0.3.1-20090713-002009

download (13907 bytes) sig

Bugfix release: the URL parser in tinohtmlabsurl.sh was broken.

Short URLs of the #anchor and ?query type did not work correctly: the base filename and/or the query were left out. "mailto:" and "news:" type URLs should now work as well.
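Resolving short references like these against the page URL is specified by RFC 3986 (reference resolution). As a cross-check of what the results should look like, here is a sketch using Python's urllib.parse.urljoin, which follows that algorithm. It is not the code in tinohtmlabsurl.sh; absolutize is a hypothetical helper and the example URLs are made up.

```python
from urllib.parse import urljoin


def absolutize(base_url, ref):
    """Resolve a (possibly short) URL reference against the page's base URL."""
    # urljoin implements RFC 3986 reference resolution:
    #  - "#frag"  keeps the base path and query, only the fragment changes
    #  - "?query" keeps the base path, replaces the query
    #  - a reference with a different scheme (mailto:, news:) is returned unchanged
    return urljoin(base_url, ref)


base = "http://example.com/dir/page.html?x=1"
for ref in ("#top", "?q=2", "other.html", "mailto:user@example.com"):
    print(ref, "->", absolutize(base, ref))
```

With this base, "#top" resolves to http://example.com/dir/page.html?x=1#top and "?q=2" to http://example.com/dir/page.html?q=2, i.e. the base filename and query are kept where the reference does not override them.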

version 0.3.0-20090713-001713

download (13539 bytes) sig

Changed the output format: a TAB now precedes the URL. The tool now installs into /usr/local/bin.

If you need the old format (a space instead of the TAB), use --old.
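A TAB separator helps when the text before the URL may itself contain blanks. A minimal sketch of splitting such lines on the consumer side, assuming each output line ends with the URL and the URL contains no raw TAB (or, with --old, no blank); split_record is a hypothetical helper, and the exact field layout should be checked against the README.

```python
def split_record(line, old_format=False):
    """Split one output line into (prefix, url), or return None if no separator.

    Assumption: the URL is the last field. The new format puts a TAB before it,
    the --old format a single space.
    """
    line = line.rstrip("\n")
    sep = " " if old_format else "\t"
    # rpartition splits at the LAST separator, so blanks in the prefix are harmless
    prefix, found, url = line.rpartition(sep)
    if not found:
        return None
    return prefix, url
```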

version 0.2.0-20071230-185919

download (13237 bytes) sig

Bugfix: ∧ had the wrong entity code.

Also: the code is now placed under the CLL.
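Assuming the entity in question is &and; (LOGICAL AND), its correct code point is U+2227 (decimal 8743). Entity tables like the one fixed here can be cross-checked against Python's html.entities.name2codepoint, which lists the HTML 4 entities. entity_to_char is a hypothetical helper, not part of tinohtmlparse.

```python
from html.entities import name2codepoint

# HTML 4 entity "and" is LOGICAL AND, U+2227
AND_CODEPOINT = name2codepoint["and"]


def entity_to_char(name):
    """Map an HTML 4 entity name (without '&' and ';') to its character."""
    return chr(name2codepoint[name])


print(hex(AND_CODEPOINT), entity_to_char("and"))
```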

version 0.1.6-20071230-181703

(11983 bytes archive)

README corrected

version 0.1.5-20070916-082005

(11829 bytes archive)

README corrected

version 0.1.4-20070916-080531

(11486 bytes archive)

Bugfix: % signs in URLs are no longer double-escaped. Bugfix: SPC (blanks) in URLs are now properly %-escaped.
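One way to get both behaviours at once is to percent-escape unsafe characters while treating % itself as safe, so existing %XX escapes pass through untouched. A sketch using Python's urllib.parse.quote; the SAFE_CHARS set (RFC 3986 reserved characters plus %) is an assumption, and note that a lone % not followed by two hex digits is also left as-is, which may or may not match tinohtmlparse.

```python
from urllib.parse import quote

# Left unescaped: RFC 3986 reserved characters plus "%", so that existing
# %XX escapes are not escaped a second time (quote also always keeps
# letters, digits and "_.-~").
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def escape_url(url):
    """Percent-escape blanks and other unsafe characters without double-escaping %."""
    return quote(url, safe=SAFE_CHARS)


print(escape_url("http://example.com/a b/100%25.html"))
```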

version 0.1.3-20070212-080023

(11555 bytes archive)

Bugfix: for some reason, lines fed to EKHTML were split into parts.

It looks like EKHTML internally copies data into a buffer, so that in certain circumstances cb_data() was still called with incomplete lines. This is another fix for that. It is perhaps still not perfect, as the problem should be fixed in cb_data() and not in the call to EKHTML (however, fixing it in cb_data() would require a lot of rewriting). For now it seems to work as expected.

Also, tinohtmlabsurl.sh can now send non-URL lines to some file descriptor (like /dev/null).
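The 0.1.3 workaround for incomplete lines boils down to buffering data until a complete line is available. A minimal sketch of that technique in Python, independent of ekHTML; LineAssembler, feed and flush are hypothetical names, and the actual fix sits around the call into EKHTML rather than in a class like this.

```python
class LineAssembler:
    """Reassemble complete lines from data that arrives in arbitrary chunks."""

    def __init__(self):
        self._pending = ""  # trailing partial line not yet terminated by "\n"

    def feed(self, chunk):
        """Add a chunk; return the lines it completes (without the '\\n')."""
        self._pending += chunk
        # every element except the last is a complete line
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self):
        """Return the trailing partial line, if any, and reset the buffer."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []
```

Feeding "ab", "c\nde" and "\nf" yields the complete lines "abc" and "de"; a final flush() returns the unterminated "f".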

version 0.1.1-20060611-090022

(10865 bytes archive)

Minor changes.

Basically, only the documentation was corrected and typos were fixed.

version 0.1.0-20060212-043634

(10776 bytes archive)

Now decodes HTML entities in attributes into the correct text, as EKHTML does not do this.

Call with --raw to get the old behavior.

This new parsing is just a hack and not complete yet!
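Decoding an attribute value means mapping named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) character references to their characters. A sketch with Python's html.unescape, which uses the full HTML5 entity list; decode_attribute is a hypothetical helper and its raw flag is only a stand-in for the --raw option, not tinohtmlparse's implementation.

```python
import html


def decode_attribute(raw_value, raw=False):
    """Return the text an attribute value denotes, or the value unchanged if raw."""
    if raw:
        # mimic the old behaviour: leave character references undecoded
        return raw_value
    # html.unescape handles named (&amp;), decimal (&#38;) and hex (&#x26;) references
    return html.unescape(raw_value)


print(decode_attribute("a.cgi?x=1&amp;y=2"))
```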

version 0.0.0-20050206-011845

(7114 bytes archive)

First version: usable, but barely tested.

License and Disclaimer

All you can see here is free software according to the GNU GPL.
Copyright (C)2000-2011 by Valentin Hilbig
Note that the software comes with absolutely no warranty of any kind.
You use the software at your own risk.
Valentin Hilbig cannot be held responsible for any unintended damage,
lost data or malfunction of the software you can find here.


Last modified: 2011-09-12 by Valentin Hilbig