The software on these pages will slowly be moved to GitHub: https://github.com/hilbix/. The CVS repository will be migrated to Git as well, so the history will be preserved, at least in part. See the FAQ.
The tools are developed under Linux following the motto "release early, release often".
So you can consider this beta software, or alpha, or pre-alpha, or even worse ;)
Have a look in the download directory for all downloads.
As always here, all you get is the source. No binaries here.
Simple HTML parser to extract information from HTML files by shell
⇒ ⇒ ⇒ Development has moved to another location. All future versions will be published at https://github.com/hilbix/tinohtmlparse ⇐ ⇐ ⇐
tinohtmlparse is based on the ekHTML parser and enables simple shell scripts to extract data from an HTML file.
For an example, read the README file.
Bugfix release: URL parser in tinohtmlabsurl.sh was broken.
Short URLs of the #anchor and ?query type did not work correctly: the parser forgot to include the base filename and/or the query. Also, "mailto:" and "news:" type URLs should work now.
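The expected behavior for such short URLs can be illustrated with a small Python sketch. This is not the code of tinohtmlabsurl.sh; it only uses the standard library's urljoin to show what a correct result looks like, and the function name absolute_url and the example.com URLs are made up for the example.

```python
from urllib.parse import urljoin


def absolute_url(base: str, ref: str) -> str:
    """Resolve a reference found in a page against the URL of that page."""
    return urljoin(base, ref)


base = "http://example.com/dir/page.html?x=1"

# "#anchor" keeps the base filename and the base query.
print(absolute_url(base, "#top"))
# "?query" keeps the base filename but replaces the query.
print(absolute_url(base, "?y=2"))
# URLs with their own scheme, such as mailto: or news:, pass through unchanged.
print(absolute_url(base, "mailto:user@example.com"))
```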
Changed the output format. Now a TAB precedes the URL. Now installs into /usr/local/bin.
If you need the old format with only spaces (space instead of TAB), use --old.
Bugfix: ∧ had the wrong entity code.
Also: Code now placed under the CLL.
(11983 bytes archive)
(11829 bytes archive)
(11486 bytes archive)
Bugfix: % signs in URLs are no longer double-escaped. Bugfix: spaces (blanks) in URLs are now properly %-escaped.
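The two escaping rules can be sketched in Python as follows. This is an illustration under assumptions, not the implementation used by the tools: the function name escape_url and the chosen set of safe characters are made up for the example. Keeping "%" in the safe set is what prevents an existing escape such as %20 from becoming %2520.

```python
from urllib.parse import quote

# Characters left untouched: "%" (so existing escapes survive) plus the
# reserved characters that give a URL its structure.
SAFE = "%/:?#[]@!$&'()*+,;="


def escape_url(url: str) -> str:
    """Percent-escape blanks and other unsafe characters without re-escaping '%'."""
    return quote(url, safe=SAFE)


print(escape_url("http://example.com/a b.html"))    # blank becomes %20
print(escape_url("http://example.com/a%20b.html"))  # already escaped, unchanged
```

Note that with this approach a literal % that is not part of an escape is also left alone.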
(11555 bytes archive)
Bugfix: For some reason, lines which were fed to EKHTML were broken into parts.
It looks like EKHTML internally copies data into a buffer, so that under certain circumstances cb_data() was still called with incomplete lines. This release contains another fix for that. It may still not be perfect, because the problem should really be fixed in cb_data() and not in the call to EKHTML. (However, fixing it in cb_data() would need a lot of rewriting.) For now it seems to work as expected.
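The general technique of holding back partial lines until they are complete can be sketched as follows. This is only an illustration in Python, not the C code of tinohtmlparse and not the actual fix; the class name LineBuffer and its methods are made up for the example.

```python
class LineBuffer:
    """Collect arbitrary chunks of text and pass on only complete lines."""

    def __init__(self, sink):
        self._sink = sink      # callable that receives each complete line
        self._pending = ""     # text after the last newline seen so far

    def feed(self, chunk: str) -> None:
        # Everything before the last newline is complete; the rest is kept.
        self._pending += chunk
        *complete, self._pending = self._pending.split("\n")
        for line in complete:
            self._sink(line + "\n")

    def close(self) -> None:
        # Flush a final line that has no trailing newline.
        if self._pending:
            self._sink(self._pending)
            self._pending = ""


lines = []
buf = LineBuffer(lines.append)
for chunk in ("<a hr", "ef='x'>\nte", "xt"):
    buf.feed(chunk)
buf.close()
print(lines)
```

The sink is called once per line regardless of how the input was split into chunks.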
Also, tinohtmlabsurl.sh now has the ability to send non-URL lines to some file descriptor (like /dev/null).
(10865 bytes archive)
Basically only the documentation was corrected and typos removed.
(10776 bytes archive)
Now parses HTML entities in attributes into the correct text, because EKHTML does not do this itself.
Call with --raw to get the old behavior.
This new parsing is just a hack and not complete yet!
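What decoding entities in an attribute value means can be shown with a short Python sketch. It uses html.unescape from the Python standard library and is not the parsing code of tinohtmlparse; the function name decode_attribute and the sample values are made up for the example.

```python
from html import unescape


def decode_attribute(raw_value: str) -> str:
    """Turn entities such as &amp; or &#65; in an attribute value into text."""
    return unescape(raw_value)


# An href as it appears in the HTML source, and the URL it actually means.
print(decode_attribute("search.cgi?x=1&amp;y=2"))
print(decode_attribute("&quot;quoted&quot; &#65;"))
```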
(7114 bytes archive)
First version: usable, but almost entirely untested.
License and Disclaimer
All you can see here is free software according to the GNU GPL.