HTMLLIB: HTML output library for Cost

Version 0.1

Joe English
Last updated: Thursday 06 March 2003, 18:45



1 Introduction

HTMLLIB is a small library of Tcl routines for converting SGML to HTML with Cost. Typical usage is to create a file called mydtd-html.spec containing a Cost specification that maps your DTD's element types to HTML element types, put it somewhere in the $COSTPATH so that Cost can find it, and run

costsh --spec=mydtd-html.spec --input=myfile.sgm > myfile.html

A typical specification file might look something like:

# mydtd-html.spec, version 0.1
#
package require Cost
package require Cost-HTML	;# OR 'require htmllib.tcl'

html:configure ?options...?

specification htmlSpec {
    { element Heading }		{ html H1 }
    { element Para } 		{ html P }
    { elements "GlossList VariableList" } { html {DL class [query gi]} }
    ...

}
#EOF

2 SGML-to-HTML conversion with Cost

Parameters recognized in the specification:

html
...
before
startAction
prefix
content
suffix
endAction
after
...
nodeName
anchorName
... may be #AUTO; a makeAnchor parameter is also needed.

Allowable values for the html parameter:

gi [ attname attval ... ]
Most common case.
#IMPLIED
#IGNORE
#NODE
#TEMPLATE

htmllib.tcl provides a default main procedure, which costsh will automatically execute after loading the specification and source document. This is what it does:

... fill this bit in ...

If this doesn't do exactly what you want, you can define your own main routine in the .spec file; it must at least do the following:

... fill this bit in ...

html:configure [ option value ] ...

Sets configuration parameters for HTML conversion. Valid options are:

doctype publicId
...
outputDir
...
filePrefix
...
fileExtension
...
sourceSpec
...

html:preprocess [ specName ]

Makes an initial preprocessing pass over the source document; you must call this before html:convert. It sets the following properties:

... fill this in.

html:processNode [ options... ]
html:processChildren [ options... ]

3 Creating multiple output files

4 Managing cross-references

%%% note anchorName param, html:anchorName command; how and when to generate A NAME=... elements.

html:hrefpos

...

5 Multiple modes

... implement this: -spec option to html:processNode, html:processChildren

6 Using HTML templates

7 Automatically converting non-SGML entities

8 Low-level utilities

Some of these may be useful.

html:escape text
html:escapeAttval text

Replaces characters in text that would otherwise be interpreted as HTML markup with the appropriate entity references: < becomes &lt;, > becomes &gt;, and & becomes &amp;. html:escapeAttval also replaces single and double quotes with the appropriate numeric character references. Both routines also replace @ signs with numeric character references, in hopes of fooling robots that scour the web for e-mail addresses to spam. Returns: the escaped string.
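The substitutions above map naturally onto Tcl's string map, which scans its input in a single pass, so the & in a newly inserted &lt; is never escaped a second time. The following is a minimal sketch, not the code in htmllib.tcl: the sketch: procedure names are hypothetical, and the exact references used for quotes and @ (&#34;, &#39; and &#64;, the standard decimal references for those characters) are an assumption.

```tcl
# Minimal sketch of the escaping rules described above. The sketch:
# names are hypothetical; this is not the code in htmllib.tcl.

# Substitutions applied in ordinary text. [string map] makes a single
# pass, so the "&" of a newly inserted "&lt;" is never re-escaped.
set sketchTextMap {& &amp; < &lt; > &gt; @ &#64;}

# Attribute values also need both quote characters replaced
# (assumed here to use the decimal references &#34; and &#39;).
set sketchAttvalMap [concat $sketchTextMap {{"} &#34; ' &#39;}]

proc sketch:escape {text} {
    global sketchTextMap
    return [string map $sketchTextMap $text]
}

proc sketch:escapeAttval {text} {
    global sketchAttvalMap
    return [string map $sketchAttvalMap $text]
}

puts [sketch:escape {a < b & c}]        ;# prints: a &lt; b &amp; c
puts [sketch:escapeAttval {say "hi"}]   ;# prints: say &#34;hi&#34;
```

Using one map per context keeps the two routines consistent: the attribute-value map is simply the text map plus the two quote characters.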

html:output text

Inserts text verbatim into the current HTML output stream.

html:text text

Inserts text into the current HTML output stream after escaping it with html:escape.
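To make the difference between these two commands concrete, here is a small sketch that models the output stream as a global string. Everything in it is hypothetical (the sketch: names and the sketchOut buffer); the real commands write to HTMLLIB's current output stream, whose implementation is not described here.

```tcl
# Minimal sketch contrasting html:output and html:text. The output
# stream is modelled as a global string; all names are hypothetical.
set sketchOut ""

proc sketch:escape {text} {
    return [string map {& &amp; < &lt; > &gt; @ &#64;} $text]
}

# Like html:output: append the text verbatim.
proc sketch:output {text} {
    global sketchOut
    append sketchOut $text
}

# Like html:text: escape the text, then append it.
proc sketch:text {text} {
    sketch:output [sketch:escape $text]
}

sketch:output "<B>"
sketch:text "x < y"
sketch:output "</B>"
puts $sketchOut   ;# prints: <B>x &lt; y</B>
```

Markup generated by the specification goes through the verbatim path, while character data from the source document goes through the escaping path.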

html:startTag GI [ attname attval ... ]

Inserts a start-tag with generic identifier GI and attributes "attname=attval...". The attvals are escaped with html:escapeAttval and enclosed in double quotes, with one exception: if an attval is the same as the corresponding attname, only the attval is inserted, literally and without quotes. This accounts for the HTML idiom for boolean attributes, e.g., <DL COMPACT>, <INPUT CHECKED>, etc. These are really equivalent to <DL COMPACT="COMPACT">, etc., but many older browsers don't recognize the unminimized form.

For convenience, the attname-attval pairs may be passed as a single list instead of as separate arguments; this avoids the need for eval.
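The attribute rules above can be illustrated with a sketch that returns the tag as a string instead of inserting it into the output. The sketch: procedure names are hypothetical, and treating a lone extra argument as the attribute list is an assumption about how the single-list form is recognized.

```tcl
# Minimal sketch of the start-tag rules above. It returns the tag
# instead of inserting it; all sketch: names are hypothetical.

proc sketch:escapeAttval {text} {
    return [string map {& &amp; < &lt; > &gt; @ &#64; {"} &#34; ' &#39;} $text]
}

proc sketch:startTag {gi args} {
    # Assumption: a single extra argument is the whole attname/attval
    # list, so callers can pass a list without resorting to eval.
    if {[llength $args] == 1} {
        set args [lindex $args 0]
    }
    set tag "<$gi"
    foreach {name value} $args {
        if {$value eq $name} {
            # Boolean attribute: emit the minimized form, e.g. <DL COMPACT>.
            append tag " $value"
        } else {
            # Ordinary attribute: escape the value and double-quote it.
            append tag " $name=\"[sketch:escapeAttval $value]\""
        }
    }
    return "$tag>"
}

puts [sketch:startTag DL COMPACT COMPACT]         ;# prints: <DL COMPACT>
puts [sketch:startTag A {HREF x.html NAME top}]   ;# prints: <A HREF="x.html" NAME="top">
```

Because the list form is unpacked inside the procedure, a caller holding attributes in a variable can write sketch:startTag A $attrs rather than eval sketch:startTag A $attrs.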

html:endTag

... (note HTMLEmpties) ...

HTML does not allow displayed material inside paragraphs, list items, et cetera, whereas many other document types do. This causes problems because such displayed material can prematurely terminate the enclosing P, LI, etc., element. To work around this, HTMLLIB always omits end-tags for P etc., so that the result is still valid. (A better long-term solution would be to keep track of the output and re-insert start-tags as necessary ...)
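For example, if a source paragraph contains a list, emitting <P>text<UL>...</UL>more</P> is invalid: the UL start-tag implicitly ends the P, so the final </P> has no open element to close. Leaving out the </P> avoids this. A minimal sketch of the policy follows; sketch:endTag is hypothetical, it is passed the GI explicitly (unlike html:endTag), and the set of omitted end-tags (P, LI, DT, DD) is an assumption, since the text above names only P.

```tcl
# Minimal sketch of the end-tag policy. Unlike html:endTag, this
# hypothetical version is passed the GI explicitly.

# Assumed set of elements whose end-tags are always omitted.
set sketchOmitEndTag {P LI DT DD}

proc sketch:endTag {gi} {
    global sketchOmitEndTag
    # The HTML DTD marks these end-tags as omissible, so dropping them
    # keeps the output valid even when a displayed element has already
    # ended the P or LI implicitly.
    if {[lsearch -exact $sketchOmitEndTag [string toupper $gi]] >= 0} {
        return ""
    }
    return "</$gi>"
}

puts [sketch:endTag UL]   ;# prints: </UL>
```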

... Problem: Apparently Netscape doesn't handle CSS properly if there are any omitted end-tags... sigh.

html:element GI [ attname attval ... ] script

html:beginPage

...

html:endPage

...

9 Bugs

...

Doesn't properly handle HTML elements with CDATA declared content (SCRIPT, inline STYLE, XMP, and LISTING).

%%% Testing: joe@flightlab.com

Tip: use "{element ...} {html {P class [query gi]}}" wherever appropriate; this records the source element type in the output's class attribute, so stylesheets can select on it.

Possibly needed: a "break page here" specification: Sect (p1, p2, p3, BREAK, p4, p5, p6) ==> Node(Head, p1, p2, p3), Node(Continued-Head, p4, p5, p6).
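The transformation in the note above amounts to splitting a child list at each BREAK marker. Here is a minimal sketch under simplifying assumptions: children and labels are plain strings rather than Cost nodes, and the sketch:splitAtBreaks name is hypothetical.

```tcl
# Minimal sketch of the "break page here" idea: split a list of
# children at each BREAK marker. Children are plain strings here,
# not Cost nodes; the proc name is hypothetical.
proc sketch:splitAtBreaks {children} {
    set groups {}
    set current {}
    set label Head            ;# label for the first group
    foreach child $children {
        if {$child eq "BREAK"} {
            # Close the current group and start a continuation.
            lappend groups [linsert $current 0 $label]
            set current {}
            set label Continued-Head
        } else {
            lappend current $child
        }
    }
    lappend groups [linsert $current 0 $label]
    return $groups
}

puts [sketch:splitAtBreaks {p1 p2 p3 BREAK p4 p5 p6}]
```

For the example in the note, this returns the two groups {Head p1 p2 p3} and {Continued-Head p4 p5 p6}; a section without any BREAK yields a single Head group.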