DO NOT SAVE THIS POSTING IF YOU EXPECT SOFTWARE TO BE READY TO RUN. THIS IS HACK CODE.

This is yet another posting of compression software. It attempts to make decompression transparent to the programmer by hiding it under a layer of macros which emulate stdio.h functions. At the moment it handles only _input_ from compressed files. However, the procedures check the type of the so-called 'FILE *' which they are given, and if it isn't recognised as one of ours, they pass the call on to the real stdio procedure, so when you write files they ought to be written OK (but not compressed).

This means you should be able to run programs with or without compression, without recompilation. Simply compress your input data file or don't, at your discretion.

This is NOT the zlib code I posted a couple of years back; that code read from 'compress'ed (.Z) files, and was limited to sequential reading. This code uses a user-supplied compression routine (I've thrown in a generic LZ hack, but you should implement your own) and attempts to allow reasonably efficient random seeking within the file. (Seek addresses are offsets into the virtual decompressed data.)

[The LZ code was written as a student exercise by a couple of Dutch guys -- Pieter 'tiggr' Schoenmakers and Mikki Waucomont]

Lots of people don't post to the net because they don't want to release something which is less than perfect. I am *not* one of those people :-)

This code is very much a hack -- I did a reasonably clean prototype some time back, though it was a bit buggy. Recently my co-worker Mark Lester has been hacking it too, and incorporated it into a project we are working on. He fixed the bugs, but unfortunately took out a lot of the stdio emulation (calls we didn't need for our project -- we were getting a bit short of space on PCs). However, I've decided to post his version rather than my old one because he added a lot of buffer caching which is very worthwhile.
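
A minimal sketch of the pass-through idea described above: a wrapper checks whether the 'FILE *' is one of ours and otherwise hands the call to real stdio. This is not the code in this posting. The names ZFILE, z_open_mem, z_getc and z_fseek and the fixed-size handle table are hypothetical, and an in-memory string stands in for the decompressed data. Redefining getc and fseek as macros is exactly the sort of hack the posting is about; strictly speaking, the C standard reserves those names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handle for a "compressed" stream. For this sketch the
   data is just a string that stands in for decompressed bytes. */
typedef struct {
    const char *data;
    size_t      len;
    size_t      pos;    /* virtual (decompressed) offset */
} ZFILE;

#define Z_MAX_OPEN 8
static ZFILE z_table[Z_MAX_OPEN];
static int   z_used[Z_MAX_OPEN];

/* Return our handle if fp points into z_table, else NULL. */
static ZFILE *z_lookup(FILE *fp)
{
    int i;
    for (i = 0; i < Z_MAX_OPEN; i++)
        if (z_used[i] && (void *)fp == (void *)&z_table[i])
            return &z_table[i];
    return NULL;
}

/* Hand out a fake 'FILE *' that is really a pointer to a ZFILE slot.
   The cast is a hack: the result must never reach real stdio. */
FILE *z_open_mem(const char *data)
{
    int i;
    assert(data != NULL);
    for (i = 0; i < Z_MAX_OPEN; i++) {
        if (!z_used[i]) {
            z_used[i] = 1;
            z_table[i].data = data;
            z_table[i].len  = strlen(data);
            z_table[i].pos  = 0;
            return (FILE *)(void *)&z_table[i];
        }
    }
    return NULL;    /* table full */
}

int z_getc(FILE *fp)
{
    ZFILE *z = z_lookup(fp);
    if (z == NULL)
        return fgetc(fp);           /* not ours: real stdio */
    if (z->pos >= z->len)
        return EOF;
    return (unsigned char)z->data[z->pos++];
}

int z_fseek(FILE *fp, long off, int whence)
{
    ZFILE *z = z_lookup(fp);
    long base;
    if (z == NULL)
        return fseek(fp, off, whence);  /* not ours: real stdio */
    base = (whence == SEEK_SET) ? 0L
         : (whence == SEEK_CUR) ? (long)z->pos
         : (long)z->len;
    if (base + off < 0 || (size_t)(base + off) > z->len)
        return -1;
    z->pos = (size_t)(base + off);
    return 0;
}

/* The macro layer: code compiled after this point calls the wrappers. */
#undef getc
#undef fseek
#define getc(fp)               z_getc(fp)
#define fseek(fp, off, whence) z_fseek((fp), (off), (whence))
```

With this in place, getc() on a handle from z_open_mem() reads from our buffer, while getc() on a stream from fopen() or tmpfile() still reaches the real library. A real version would decompress the block containing the requested offset rather than index into a string.
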
[Note: it's quite possible his versions of the _compression_ code are out of date -- I usually ran the compression side on my Acorn Archimedes because PCs don't have the brains. Our sources may have diverged. But I ran a quick test & it sort of works, modulo some warnings about initialisation which you can ignore, I think...]

This code also includes an LRU package which was originally a linked-list package written by Paul Moore (gus...@tharr.uucp). Thanks Paul.

There are references in the code to BELL_LICENCE. This hooks in an alternative compression routine written by Tim Bell at Auckland. It is extremely good code (it's the one we actually use in practice) but it is covered by a non-disclosure and commercial licence from the University of Auckland -- therefore the sources are not included in this posting.

The 'squidge' command compresses your data files. It chooses between two algorithms depending on how well each one packs the data. One of the algorithms is LZ and the other is a filthy hack suitable for compressing files of 32-bit integers where most integers are only slightly larger than their predecessors. Database hackers will understand the need for this :-) [Inverted indexes]

Try playing around with the blocksize parameter to the squidge program -- it trades off seek time for compression ratio.

I have deliberately not put an Archive-name: header on this posting because I would prefer that it isn't archived. I'm posting this for two reasons: 1) I mentioned it in passing on comp.compression recently and a few people asked for it; and 2) I'd much rather you just stole ideas from it, then wrote your own -- more cleanly than this is written. It really is a mess :-( However, I know that when I am desperate for something I'd rather get even half-working code as a starting point than write it all from scratch myself.

As I said, this code was pulled out of the middle of another system today, so there are lots of grungy bits left all over the place.
Perhaps a nice compiler will tell you what isn't used, if you're bothered by that.

If anyone implements a nicer version of this sort of system, please let me (or the net directly) know.

Regards
Graham Toal

PS I wouldn't *dare* copyright anything as crap as this... ;-)

PPS Despite the loud message at the top of this file, I was surprised to find it compiled & ran OK the first time I tried it on a Unix box. Says a lot for the quality of Acorn's ANSI C compiler.