Joe English
Last updated: Thursday 06 March 2003, 18:45
HTMLLIB is a small library of Tcl routines for converting SGML to HTML with Cost. Typical usage is to create a file called mydtd-html.spec containing a Cost specification that maps the element types of your DTD to HTML element types, put it somewhere in the $COSTPATH so Cost can find it, and run:
costsh --spec=mydtd-html.spec --input=myfile.sgm > myfile.html
A typical specification file might look something like:
    # mydtd-html.spec, version 0.1
    #
    package require Cost
    package require Cost-HTML    ;# OR 'require htmllib.tcl'

    html:configure ?options...?

    specification htmlSpec {
        { element Heading } { html H1 }
        { element Para } { html P }
        { elements "GlossList VariableList" } { html DL class [query gi] }
        ...
    }
    #EOF
Parameters recognized in the specification:
Allowable values for html parameter
htmllib.tcl provides a default main procedure, which costsh will automatically execute after loading the specification and source document. This is what it does:
... fill this bit in ...
If this doesn't do exactly what you want, you can define your own main routine in the .spec file; it must at least do the following:
... fill this bit in ...
html:configure [ option value ] ...
Sets configuration parameters for HTML conversion. Valid options are:
html:preprocess [ specName ]
Makes an initial preprocessing pass over the source document; you must call this before html:convert. Sets the following properties:
... fill this in.
html:processNode [ options... ]
html:processChildren [ options... ]
%%% note anchorName param, html:anchorName command; how and when to generate A NAME=... elements.
html:hrefpos
...
... implement this: -spec option to html:processNode, html:processChildren
Some of these may be useful.
html:escape text
html:escapeAttval text
Replace characters in text that would otherwise be interpreted as HTML markup with the appropriate entity references: < becomes &lt;, > becomes &gt;, and & becomes &amp;. html:escapeAttval also replaces single and double quotes with the appropriate numeric character references. Both routines also replace @ signs with numeric character references, in hopes of fooling robots that scour the web for e-mail addresses to spam. Returns: the escaped string.
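The escaping rules above can be sketched in plain Tcl. This is only an illustration, not HTMLLIB's own code: the names sketch:escape and sketch:escapeAttval are hypothetical, and the particular numeric references chosen (&#64; for @, &#34; and &#39; for the quotes) are assumptions.

```tcl
# Hypothetical stand-ins for html:escape and html:escapeAttval.
# "string map" makes a single left-to-right pass over the input, so the "&"
# introduced by one replacement is never escaped a second time.
proc sketch:escape {text} {
    return [string map {& &amp; < &lt; > &gt; @ &#64;} $text]
}

# Same as above, plus double and single quotes for use inside attribute values.
proc sketch:escapeAttval {text} {
    return [string map {& &amp; < &lt; > &gt; @ &#64; \" &#34; ' &#39;} $text]
}

puts [sketch:escape {Mail <joe@example.com> & co.}]
puts [sketch:escapeAttval {say "hi" & 'bye'}]
```

Doing all replacements in one pass is the main design point; chaining separate substitutions would require handling & first to avoid escaping it twice.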
html:output text
Inserts text verbatim into the current HTML output stream.
html:text text
Inserts text into the current HTML output stream after escaping it with html:escape.
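The difference between html:output and html:text can be illustrated with a small stand-alone sketch. It assumes the output stream is just a Tcl string variable, which is a simplification made here for illustration; the names sketchOut, sketch:output and sketch:text are hypothetical.

```tcl
# Assumption: the "current HTML output stream" is modelled as a plain string.
set sketchOut ""

# Append text verbatim; the caller is responsible for any markup in it.
proc sketch:output {text} {
    global sketchOut
    append sketchOut $text
}

# Escape markup-significant characters (and "@"), then append.
proc sketch:text {text} {
    sketch:output [string map {& &amp; < &lt; > &gt; @ &#64;} $text]
}

sketch:output "<P>"
sketch:text "1 < 2 & 3 > 2"
puts $sketchOut
```

In other words, markup goes through the verbatim routine and character data goes through the escaping one.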
html:startTag GI [ attname attval ... ]
Inserts a start-tag with generic identifier GI and attributes "attname=attval...". The attvals are escaped with html:escapeAttval and enclosed in double quotes, with one exception: if an attval is the same as the corresponding attname, it is inserted literally with no quotes. This accounts for the HTML idiom for boolean attributes, e.g., <DL COMPACT>, <INPUT CHECKED>, etc. These are really equivalent to <DL COMPACT="COMPACT">, etc., but many older browsers don't recognize the unminimized form.
For convenience, the attname-attval pairs may be passed as a single list instead of as separate arguments, which avoids the need for eval.
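The start-tag rules described above can be sketched as follows. This is a stand-alone approximation, not the library's implementation: sketch:startTag and sketch:escapeAttval are hypothetical names, and the sketch returns the tag as a string instead of writing it to an output stream.

```tcl
# Hypothetical attribute-value escaper (the numeric references are assumptions).
proc sketch:escapeAttval {text} {
    return [string map {& &amp; < &lt; > &gt; @ &#64; \" &#34; ' &#39;} $text]
}

# Build a start-tag string from a generic identifier and attribute pairs.
proc sketch:startTag {gi args} {
    # Accept either "name value name value ..." or one list holding the pairs.
    if {[llength $args] == 1} {
        set args [lindex $args 0]
    }
    set tag "<$gi"
    foreach {name value} $args {
        if {$name eq $value} {
            # Boolean idiom: emit the bare value, unquoted.
            append tag " $value"
        } else {
            append tag " $name=\"[sketch:escapeAttval $value]\""
        }
    }
    return "$tag>"
}

puts [sketch:startTag DL COMPACT COMPACT]
puts [sketch:startTag A HREF "index.html?a=1&b=2"]
set atts [list CLASS note ID n1]
puts [sketch:startTag P $atts]
```

The last call shows the convenience form: a list built elsewhere is passed as one argument, with no eval needed to expand it.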
html:endTag
... (note HTMLEmpties) ...
HTML does not allow displayed material inside paragraphs, list items, et cetera, whereas many other document types do. The problem is that such material prematurely terminates the enclosing P, LI, etc. element, so a later end-tag for that element would have nothing to close. To work around this, HTMLLIB always omits end-tags for P etc., so that the result is still valid. (A better long-term solution would be to keep track of the output and re-insert start-tags as necessary ...)
... Problem: Apparently Netscape doesn't handle CSS properly if there are any omitted end-tags... sigh.
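As a rough illustration of the end-tag policy described above, the sketch below returns an empty string for elements whose end-tag is suppressed. The two element lists are assumptions drawn from general HTML 4 conventions; they are not the contents of HTMLEmpties, and sketch:endTag is a hypothetical name.

```tcl
# Assumed element lists; HTMLLIB's real lists are not given in this document.
set sketchEmpties   {BR HR IMG INPUT LINK META}
set sketchOmissible {P LI DT DD}

# Return the end-tag for gi, or "" when the end-tag is suppressed.
proc sketch:endTag {gi} {
    global sketchEmpties sketchOmissible
    set key [string toupper $gi]
    if {[lsearch -exact $sketchEmpties $key] >= 0 ||
        [lsearch -exact $sketchOmissible $key] >= 0} {
        return ""
    }
    return "</$gi>"
}

puts "P: '[sketch:endTag P]'  DL: '[sketch:endTag DL]'  br: '[sketch:endTag br]'"
```

Only the lookup is case-insensitive; the emitted end-tag keeps the case the caller supplied.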
html:element GI [ attname attval ... ] script
html:beginPage
...
html:endPage
...
...
Doesn't properly handle HTML elements with CDATA declared content (SCRIPT, inline STYLE, XMP, and LISTING).
%%% Testing: joe@flightlab.com
Some tips: use "{element ...} {html {P class [query gi]}}" wherever appropriate; this carries the source element type name into the output as a CLASS attribute, where stylesheets can use it.
Might possibly need: "break page here" specification: Sect (p1, p2, p3, BREAK, p4, p5, p6) ==> Node(Head, p1, p2, p3), Node(Continued-Head, p4, p5, p6).
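The page-break idea above could be prototyped on a flat list before touching the node tree. The sketch below is purely illustrative: sketch:splitPages is a hypothetical helper, child names stand in for real nodes, and nothing like it is known to exist in HTMLLIB.

```tcl
# Split a flat list of child names at each BREAK marker.
# Returns a list of {heading children} pairs; the first page gets "Head",
# every later page gets "Continued-Head", following the notation above.
proc sketch:splitPages {children} {
    set pages {}
    set current {}
    set heading Head
    foreach child $children {
        if {$child eq "BREAK"} {
            lappend pages [list $heading $current]
            set current {}
            set heading Continued-Head
        } else {
            lappend current $child
        }
    }
    lappend pages [list $heading $current]
    return $pages
}

puts [sketch:splitPages {p1 p2 p3 BREAK p4 p5 p6}]
```

A section with no BREAK marker comes back as a single page, so the helper can be applied unconditionally.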