/*
Take an image of a page, and apply shears in X and Y to straighten
lines in the page. The shearing is a "poor man's" approximation
to rotation. The assumption is that the skewed text will still be
acceptable to the accompanying OCR algorithm.
The algorithm is to sample a path across the page, the same as drawing a
line across the page with a DDA. This nearly-horizontal line is slid
vertically over the whole page, and this is repeated for various slopes
of the line. The angle which provides the most intersections between
the line and the white pixels in the image is presumably the angle
where the scanline was able to squeeze between horzontal lines of type.
Since the total number of white vs black pixels on the page remains the
same, no matter what the scanning angle, the discriminating function
has to specifically weigh long runs of white more heavily, so the
score will probably be something like (white-black) or white^2 ...
maybe some sort of root-mean-square algorithm to be determined by
experiment. Whether raw black vs white data is sufficient or whether
we need to specifically count the lengths of runs of white pixels
will be determined by experiment.
The first hack will be crude and assume 1-bit images. Later versions
may use greyscale or colour if needed. Code should accept any input
format including high-res colour - we'll probably use the 'netpbm'
package so that we can work with one standard format and use their
large range of conversion tools (notably "anytopnm") to convert
any sort of graphics file on input. During development we will
probably want immediate graphical feedback but this may be too cumbersome
under the command-line unix I use, so I'll probably generate jpg's to
view on a web page by way of feedback.
Note: we could possibly also find horizontal lines (underlines, rules
etc) with the same algorithm, looking for black rather than white runs.
Perhaps the best approach may be to look for both at once. In fact the
RMS test could be extra useful here as (white-black)^2 will be positive
for an imbalance of black or white pixels.
This is fast code. Time to analyse a page is about equivalent to the
time to load and save the graphics image. No attempt has been made yet
to do bit-level optimisation, and a faster line-drawing algorithm
should speed it up by a factor of say 4 or more.
Once the angle of skew is determined, we can rotate the page by using
essentially the same algorithm; moving across the page horizontally,
following the slope of the DDA. This will give an output page of the
same width as the input page. Actually to be more precise we should
also apply a shear along the other axis, which can be done with
the very same algoritm, just rotated 90deg. In other words, instead
of just moving the horizontal scanning line up the page, we move it
up and also across a couple of pixels to account for the vertical
skew too. The skew is simply the horizontal skew rotated 90deg. This
is a fast way to do rotation; the only problem being that the resulting
bitmap will be slightly stretched in X and Y, by a few pixels. For OCR
this is not an issue. However the code could be adapted to become a
general-purpose rotator by simply adding a post-processing phase where
the bitmap is scaled in X and X by an appropriate factor. Code to
do this (rectilinearly) already exists elsewhere (eg Acorn had
Floyd-Steinberg dithered x/y re-scaling on the ARM years ago). It is
this latter stage which slows the process of rotation down, no matter
how it is implemented; fortunately for us we don't need it.
By carefully chosing the order of processing it should be possible
to update the bitmap in place, but that is probably overkill because
we need to write out an output file anyway and can do the scanning on
the fly as we write to the output.
Note: the complexity of this algorithm is primarily x*y, however
if we scan with slopes whose slant increases by only one pixel per
attempt, we end up with a complexity of x*y*y. A much more sensible
thing to do would be to quantise the scan into say 32 fixed angles,
so that a larger bitmap does not require more passes. The accuracy
of the algorithm shouldn't be overly affected. Perhaps having found
a local maximum with this crude filter, we could fill in the gaps
by doing a +/- scan in the gaps immediately adjacent to the rougher
estimate.
Implementation note: the vertical sliding of the scan line window
across the page is currently in an outer-loop relative to the DDA
procedure itself. In fact we really only need to calculate the pixels
making up the slope of this line just once, because all other lines
will be parallel to it, so we could in fact implement the sliding
as an inner loop inside the DDA procedure, as we move across the
page one pixel at a time.
*/
#include
#include
/* pnmshear.c - read a portable anymap and shear it by some angle
**
** Copyright (C) 1989, 1991 by Jef Poskanzer.
**
** Permission to use, copy, modify, and distribute this software and its
** documentation for any purpose and without fee is hereby granted, provided
** that the above copyright notice appear in all copies and that both that
** copyright notice and this permission notice appear in supporting
** documentation. This software is provided "as is" without express or
** implied warranty.
*/
#include
#include "pnm.h"
#include "shhopt.h"
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif /*M_PI*/
#define SCALE 4096
#define HALFSCALE 2048
struct cmdline_info {
/* All the information the user supplied in the command line,
in a form easy for the program to use.
*/
const char * input_filespec; /* Filespec of input file */
double angle; /* requested shear angle, in radians */
};
static void
parse_command_line(int argc, char ** argv,
struct cmdline_info *cmdlineP) {
optStruct3 opt;
unsigned int option_def_index = 0;
optEntry *option_def = malloc(100*sizeof(optEntry));
opt.opt_table = option_def;
opt.short_allowed = FALSE;
opt.allowNegNum = TRUE;
optParseOptions3(&argc, argv, opt, sizeof(opt), 0);
if (argc-1 < 1)
pm_error("Need an argument: the shear angle.\n");
else {
char *endptr;
cmdlineP->angle = strtod(argv[1], &endptr) * M_PI / 180;
if (*endptr != '\0' || strlen(argv[1]) == 0)
pm_error("Angle argument is not a valid floating point number: "
"'%s'", argv[1]);
if (argc-1 < 2)
cmdlineP->input_filespec = "-";
else {
cmdlineP->input_filespec = argv[2];
if (argc-1 > 2)
pm_error("too many arguments (%d). "
"The only arguments are shear angle and filespec.",
argc-1);
}
}
}
static void
makeNewXel(xel * const outputXelP, xel const curXel, xel const prevXel,
double const fracnew0, double const omfracnew0, int const format) {
/*----------------------------------------------------------------------------
Create an output xel as *outputXel, which is part curXel and part
prevXel, the part given by the fractions omfracnew0 and fracnew0,
respectively. These fraction values are the numerator of a fraction
whose denominator is SCALE.
The format of the pixel is 'format'.
-----------------------------------------------------------------------------*/
switch ( PNM_FORMAT_TYPE(format) ) {
case PPM_TYPE:
PPM_ASSIGN( *outputXelP,
( fracnew0 * PPM_GETR(prevXel)
+ omfracnew0 * PPM_GETR(curXel)
+ HALFSCALE ) / SCALE,
( fracnew0 * PPM_GETG(prevXel)
+ omfracnew0 * PPM_GETG(curXel)
+ HALFSCALE ) / SCALE,
( fracnew0 * PPM_GETB(prevXel)
+ omfracnew0 * PPM_GETB(curXel)
+ HALFSCALE ) / SCALE );
break;
default:
PNM_ASSIGN1( *outputXelP,
( fracnew0 * PNM_GET1(prevXel)
+ omfracnew0 * PNM_GET1(curXel)
+ HALFSCALE ) / SCALE );
break;
}
}
static void
shear_row(xel * const xelrow, int const cols,
xel * const newxelrow, int const newcols,
double const shearCols,
int const format, xel const bgxel) {
/*----------------------------------------------------------------------------
Shear the row 'xelrow' by 'shearCols' columns, and return the result as
'newxelrow'. They are 'cols' and 'newcols' columns wide, respectively.
Our OCR page rotation code knows the exact number of pixels to
shear by, so it's fortuitous that this procedure is written that
way, since the upper level of pbmshear actually uses floating
point degrees.
Fill in the part of the output row that doesn't contain image data with
'bgxel'.
The format of the input xels (which implies something about the
output xels too) is 'format'.
-----------------------------------------------------------------------------*/
int const intShearCols = (int) shearCols;
int col;
for ( col = 0; col < intShearCols; ++col )
newxelrow[col] = bgxel;
for ( col = 0; col < cols; ++col )
newxelrow[intShearCols+col] = xelrow[col];
for ( col = intShearCols + cols; col < newcols; ++col )
newxelrow[col] = bgxel;
}
int pnmshear(int argc, char * argv[]) {
FILE* ifp;
xel* xelrow;
xel* newxelrow;
xel bgxel;
int rows, cols, format;
int newformat, newcols;
int row;
xelval maxval, newmaxval;
double shearfac;
struct cmdline_info cmdline;
pnm_init( &argc, argv );
parse_command_line( argc, argv, &cmdline );
ifp = pm_openr( cmdline.input_filespec );
pnm_readpnminit( ifp, &cols, &rows, &maxval, &format );
xelrow = pnm_allocrow( cols );
/* Promote PBM files to PGM. */
newformat = format;
newmaxval = maxval;
shearfac = tan( cmdline.angle );
if ( shearfac < 0.0 )
shearfac = -shearfac;
newcols = rows * shearfac + cols + 0.999999;
pnm_writepnminit( stdout, newcols, rows, newmaxval, newformat, 0 );
newxelrow = pnm_allocrow( newcols );
bgxel = pnm_backgroundxelrow( xelrow, cols, newmaxval, format );
for ( row = 0; row < rows; ++row ) {
double shearCols;
pnm_readpnmrow( ifp, xelrow, cols, newmaxval, format );
if ( cmdline.angle > 0.0 )
shearCols = row * shearfac;
else
shearCols = ( rows - row ) * shearfac;
shear_row(xelrow, cols, newxelrow, newcols,
shearCols, format, bgxel);
pnm_writepnmrow( stdout, newxelrow, newcols, newmaxval, newformat, 0 );
}
pm_close( ifp );
pm_close( stdout );
exit( 0 );
}
void read_image(int *maxx, int *maxy)
{
/* Read a bitmap, return its size;
Numbers are inclusive, ie bounds are 0:maxx-1, 0:maxy-1 */
*maxx = 8 * 400;
*maxy = 12 * 400;
}
int samplepixel(int x, int y)
{
/* get the value of a pixel. For now, apply threshholding and return 0/1 */
return(0);
}
/*
The code below is a DDA trivially adapted from a line-drawing algorithm;
I have faster algorithms than this - even an anti-aliased one - but
this is rock-solid reliable code that can be trusted during the debugging
phase (and also it appears to be fast enough already!)
*/
int countwhite(int y1,int x1,int y2, int x2)
{
/* calculate a DDA from (xl,yb) to (xr,yt). */
/* instead of plotting pixels, add them up somehow */
int temp,dx,dy,x,y,x_sign,y_sign,flag;
int tot = 0;
dx=abs(x2-x1); // Delta of X
dy=abs(y2-y1); // Delta of Y
if (((dx >=dy) && (x1>x2)) || // Make sure that first coordinate
((dy>dx) && (y1>y2))) // is the one with least value
{
temp=x1;
x1=x2;
x2=temp;
temp=y1;
y1=y2;
y2=temp;
};
if ((y2-y1)<0) y_sign=-1; // The direction into which Y-coord shall travel
else y_sign=1; // Same for X
if ((x2-x1)<0) x_sign=-1; // ---- " ----
else x_sign=1; // ---- " ----
if (dx>=dy) // Which one of the deltas is the greatest one
{
for (x=x1,y=y1,flag=0;x<=x2;x++,flag+=dy) // From x1 to x2
{ // Also increase the
if (flag>=dx) // Increase/decrease // flag (displacement value)
{ // y!
flag-=dx;
y+=y_sign;
};
tot += samplepixel(x,y); // Plot the pixel
};
}
else
{
/* This is the same as above, just with x as y and vice versa.*/
for (x=x1,y=y1,flag=0;y<=y2;y++,flag+=dx)
{
if (flag>=dy)
{
flag-=dy;
x+=x_sign;
};
tot += samplepixel(x,y);
};
};
return(tot);
}
int main(int argc, char **argv)
{
int y, v, delta, maxx, maxy;
int step;
int **score;
int *total;
read_image(&maxx, &maxy); /* get width and height of image */
/* calculate shear in vertical pixels for -5deg to +5deg */
/* tan alpha = opp/adj. (opp = Y, adj = x) -> y = x * tan.alpha */
v = (maxx / 16);
/* very crude approximation for tan - approx 6deg. is it enough? */
/* NOTE: some care is taken here to get the boundary conditions right;
be very careful when tweaking this code to avoid +1 errors */
score = malloc(((v*2)+1) * sizeof(int *));
total = malloc(((v*2)+1) * sizeof(int));
if (score == NULL || total == NULL) {
fprintf(stderr, "straighten: very low on ram\n"); exit(1);
}
score += v; /* adjust origin, so have -v:v */
total += v;
step = (2*v)/32; /* 32 fixed angles. */
if (step == 0) step = 1;
for (delta = -v; delta <= v; delta += step) { /* typically -32:32 */
score[delta] = malloc(maxy * sizeof(int));
if (score[delta] == NULL) {
fprintf(stderr, "straighten: low on ram\n"); exit(2);
}
total[delta] = 0;
for (y = v; y < maxy-v; y++) {
/* tweaking by +/-v is so we don't run off the page */
/* draw a line from 0,0 to maxx,v for all v */
total[delta] += score[delta][y] = countwhite(0,y, maxx-1,y+delta);
}
}
/* Choose best score from -v to v-1 and generate image with this skew.
With a little care we can apply the skew either in situ or as we write
the output file, and avoid a second buffer */
for (delta = -v; delta <= v; delta += step) {
fprintf(stdout, "total[%d] = %d\n", delta, total[delta]);
}
exit(0);
}