.TH ps2html 1
.SH NAME
ps2html \- convert PostScript documents into HTML documents
.SH SYNOPSIS
.B ps2html
.RB [ " options " ]
[
.IR file.ps
]
[
.IR file.html
]
.SH DESCRIPTION
.B ps2html
reads a PostScript document and attempts to represent its objects in
HTML. It uses a PostScript library to obtain as much information as
possible from the PostScript file, then processes this information
to reconstruct the structure of the original document in HTML. Objects
represented in the output include tables, shapes, bitmap images,
headers, footnotes, ordered and unordered lists, and mathematical
expressions. Particular attention is paid to preserving as much of the
plain-text structure as possible, including font sizes and weights
(bold, italic, etc.), vertical line spacing, and centered or indented
lines.
.B ps2html
can also extract the PostScript code of files (mostly describing shapes)
included in the main PostScript file, so it can be used to retrieve and
process a PostScript file whose source has been lost.
.PP
.B ps2html
calls Ghostscript and requires Aladdin Ghostscript version 3.54 or
newer. Ghostscript must be invokable on the current search path as
.BR gs .
Other tools used by
.B ps2html
for image extraction are
.BR hackppm ,
.BR pnmcrop ,
.B ppmtogif
and
.BR cjpeg .
These filters must also be invokable for
.B ps2html
to run properly.
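.PP
One way to confirm these prerequisites by hand is to look each tool up on
the search path. The POSIX shell sketch below is illustrative only and is
not part of
.BR ps2html ;
the function name
.B missing_tools
is hypothetical. The
.B \-check
option described under OPTIONS performs the program's own environment test.
.PP
.nf
.RS
# Print the name of every listed tool that is not on the search path.
missing_tools() {
    for tool in "$@"; do
        command \-v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named above; no output means all of them were found.
missing_tools gs hackppm pnmcrop ppmtogif cjpeg
.RE
.fi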
.PP
.B ps2html
reads and processes its command line from left to right, ignoring the
case of options. When it encounters a filename, it opens the file and
expects to find a PostScript document to process. It can also read
options from a default option file called
.IR .ps2htmlrc ,
located in each user's home directory. If this file does not exist,
.B ps2html
creates it automatically. Users can also create their own
configuration files (structured like
.IR .ps2htmlrc )
and load them with the
.B \-o
option. See the OPTIONS section for details.
.SH OPTIONS
.TP
.B \-o [ filename ]
.B Use
filename as a configuration file for ps2html.
.TP
.B \-e [ filename ]
.B Use
the Encoding Vector described in filename to
decode Type 3 fonts.
.TP
.B \-noimages
.B force
ps2html not to process any kind of images.
When this option is set, results are similar to those of
.I pstotext
or
.IR pstoascii .
.TP
.B \-gif
.B set
the image format to GIF.
.TP
.B \-jpeg
.B set
the image format to JPEG.
.TP
.B \-math
.B tell
ps2html to use TeXMathEncoding in order to extract
mathematical expressions from a dvips-generated PostScript file.
.TP
.B \-psimages
.B tell
ps2html to extract images and shapes in their original
PostScript format.
.TP
.BI \-t " title"
.B optional
title of HTML document.
.TP
.BI \-D " path"
.B tell
ps2html to use
.I path
as the result directory. The default is
.IR ${HOME}/public_html/Results/ .
.TP
.BI \-P " n"
.B add
.I n
to the page index so that soft links work properly.
.TP
.BI \-f " n"
.B tell
ps2html to use
.I n
as a
.I fontsize
scale factor.
.I n
defaults to 5; 0 means no scaling at all.
.TP
.B \-check
.B check
whether the environment is suitable for
.I ps2html
to run properly.
.TP
.B \-help
.B display
a short help message.
.TP
.B \-debug
.B useful
for debugging.
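.SH EXAMPLES
The invocations below are illustrative sketches that combine only the
options documented above. The file names, title and directory are
placeholders, and the exact behavior of each combination has not been
verified against the program.
.PP
Check the environment, then convert a document with GIF images and a title:
.PP
.nf
.RS
ps2html \-check
ps2html \-gif \-t "Annual Report" report.ps report.html
.RE
.fi
.PP
Convert text only, storing the results in a chosen directory:
.PP
.nf
.RS
ps2html \-noimages \-D /tmp/results/ report.ps report.html
.RE
.fi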
.SH DETAILS
.B ps2html
extracts strings the same way
.I pstotext
does. However, it also attempts to retrieve and reconstruct more complex
objects such as tables, bitmap images, shapes, ordered and unordered lists,
and footnotes. One of its main goals is to preserve most of the source
document's structure, including line, paragraph and page breaks, vertical
line spacing, centered and indented lines, the table of contents (whenever
one is found), and font sizes and weights (bold, italic, emphasized). All of
these are represented in HTML and enable partial reconstruction of the
original document from which the PostScript file was generated.
.PP
.B ps2html
completes its execution in five steps:
.TP
.B Step 1:
Initialize the output directory where results are to be stored.
.TP
.B Step 2:
Parse the input file to obtain the information it contains, such as
the number of pages, the bounding box, the fonts used, and the creator
of the document.
When \-psimages is set, ps2html attempts to extract the original
code of any embedded .eps files and store it in separate files for
later use.
.TP
.B Step 3:
Invoke Ghostscript with the input file and the PostScript library. This
produces a temporary file describing most of the input file's objects.
.TP
.B Step 4:
Parse the temporary file and process the information it contains.
As a result, most PostScript objects are recognized and
converted to HTML.
.TP
.B Step 5:
If shapes or images were successfully recognized,
invoke Ghostscript again to create bitmap images from the
PostScript pages and store them in the results directory.
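.PP
As a rough illustration of Step 5, the commands below rasterize a
PostScript file with Ghostscript and convert one page to GIF or JPEG using
filters named in DESCRIPTION. This is a sketch under assumptions, not the
command sequence that
.B ps2html
actually runs: the resolution, the file names and the use of
.B ppmquant
(added here because GIF holds at most 256 colors) are choices made for
illustration only.
.PP
.nf
.RS
# Write one raw PPM file per page: page1.ppm, page2.ppm, ...
gs \-q \-dNOPAUSE \-dBATCH \-sDEVICE=ppmraw \-r72 \-sOutputFile=page%d.ppm file.ps

# Crop the margins, reduce to 256 colors, and write a GIF.
pnmcrop page1.ppm | ppmquant 256 | ppmtogif > page1.gif

# Or crop the margins and write a JPEG instead.
pnmcrop page1.ppm | cjpeg > page1.jpg
.RE
.fi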
.SH BUGS
The complexity and peculiarity of many PostScript documents is the
main reason that ps2html may produce poor results. Note, however, that
this tool was developed mainly for research purposes, so efficiency
was not a primary concern.
.PP
Feel free to use this product and send your comments to:
nikop@csd.uch.gr
.SH AUTHOR
.B Yannis
Nikopoylos, undergraduate student at the Computer Science Department,
University of Crete, Greece.
.SH SEE ALSO
.BR prescript ,
.BR pstotext ,
.BR ps2ascii ,
.B webify
.SH COPYRIGHT
Copyright 1998 Foundation Of Research and TecHnology (FORTH),
Computer Science Institute.