MANUAL FOR HUFFSTAT (BBC BASIC)

This file describes the program HUFFSTAT, which analyses a group of files and
then constructs a Huffman binary tree that reduces the number of bits required
to represent the most frequently occurring symbols, so compressing the data.
The manual and software are (C)2006 SPROW

INSTRUCTIONS-
At the BASIC prompt, type CHAIN"HUFFSTAT"

The program will then read in a set of 7 data files to be analysed. The
filenames can be changed by altering the DATA statements at the end of the
program listing. The format is
  DATA "Description text"
  DATA Filename1
  DATA Filename2
  DATA Filename3
  DATA Filename4
  DATA Filename5
  DATA Filename6
  DATA Filename7
To save having to repeatedly enter the filenames, simply add additional groups
of 7 filenames, then set the value of "choice%" at the start of the program
to select which group to use. Ideally, the data files should be around
100kbytes in total.

After reading in the files and creating the frequency distribution, a graph
showing the distribution will be displayed. The graph is shown in high
resolution MODE 0, which requires either a second processor to be in use or
the presence of shadow screen memory. Alternatively, the mode could be
altered to one requiring less free memory.
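
The frequency distribution described above can be sketched as follows. This
is an illustrative Python sketch, not the HUFFSTAT BASIC source; the function
names count_bytes and read_files are assumptions made for the example, and
in-memory byte strings stand in for the 7 data files.

```python
def count_bytes(chunks):
    """Count how often each byte value (0-255) occurs across all chunks."""
    freq = [0] * 256
    for chunk in chunks:
        for value in chunk:      # iterating over bytes yields integers 0-255
            freq[value] += 1
    return freq


def read_files(paths):
    """Yield the contents of each file in turn (hypothetical helper)."""
    for path in paths:
        with open(path, "rb") as handle:
            yield handle.read()


# In-memory data standing in for the files to be analysed
sample = [b"hello world", b"huffman"]
freq = count_bytes(sample)
print(freq[32])          # number of spaces (character 32)
print(freq[ord("l")])    # number of times "l" occurs
```

With real files, count_bytes(read_files(paths)) would produce the same kind
of 256 entry table for the list of filenames in paths.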

Next, a binary tree will be created based on the frequency distribution.
Symbols which appear very frequently (for example, space, character 32) will
be placed as a "leaf" higher up the tree, and those symbols which appear
infrequently will be placed as leaves lower in the tree - possibly more than
eight branches deep, meaning the output code will take more bits than the
original byte!
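
As a rough illustration of the tree building step, the Python sketch below
uses the standard Huffman method of repeatedly merging the two least frequent
nodes. It is not taken from the HUFFSTAT listing, which may build its tree
differently; the name huffman_code_lengths is an assumption, and only the
depth of each leaf (the code length in bits) is tracked, not the codes.

```python
import heapq


def huffman_code_lengths(freq):
    """Return {symbol: code length in bits} for symbols with non-zero counts."""
    # Heap entries are (total count, unique tie-breaker, {symbol: depth})
    heap = [(count, symbol, {symbol: 0})
            for symbol, count in enumerate(freq) if count > 0]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs one bit per occurrence
        return {symbol: 1 for symbol in heap[0][2]}
    tie = len(freq)
    while len(heap) > 1:
        # Merge the two least frequent nodes; their leaves sink one level
        count_a, _, depths_a = heapq.heappop(heap)
        count_b, _, depths_b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in depths_a.items()}
        merged.update({s: d + 1 for s, d in depths_b.items()})
        heapq.heappush(heap, (count_a + count_b, tie, merged))
        tie += 1
    return heap[0][2] if heap else {}


# A common symbol gets a short code, a rare one a long code
counts = [0] * 256
for symbol, count in zip(b"abcdef", [45, 13, 12, 16, 9, 5]):
    counts[symbol] = count
lengths = huffman_code_lengths(counts)
print(lengths[ord("a")], lengths[ord("f")])  # prints "1 4"

# A very skewed distribution pushes rare symbols more than 8 levels deep
skewed = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(max(huffman_code_lengths(skewed).values()))  # prints 9
```

The second example shows how a code longer than the original 8 bit byte can
arise with only ten distinct symbols.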

The tree generated will be based on the contents of the files provided; files
with different contents will result in different trees. Graphics will give a
different tree to word processor documents, for example.

The tree generation stage can take a long time. The use of a second processor
is recommended, as it can take over 15 minutes on an unexpanded micro.

Lastly, the length of each output code is multiplied by the frequency of its
symbol, and the results are summed, to work out how big the data would be had
it been compressed using the tree that was generated. Note that the files
aren't actually compressed; this is left as an exercise for the reader.
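
The size estimate can be sketched in Python as below. The counts and code
lengths are made-up illustrative values, not output from HUFFSTAT, and the
name estimated_bits is an assumption.

```python
def estimated_bits(freq, lengths):
    """Sum (occurrences of symbol) x (bits in its code) over all symbols."""
    return sum(freq[symbol] * bits for symbol, bits in lengths.items())


# Hypothetical counts and code lengths for six symbols
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
lengths = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}

compressed_bits = estimated_bits(freq, lengths)
original_bits = 8 * sum(freq.values())        # 8 bits per original byte
compressed_bytes = (compressed_bits + 7) // 8  # round up to whole bytes

print(original_bits, compressed_bits, compressed_bytes)  # prints "800 224 28"
```

No data is encoded here; only the predicted size is worked out, which matches
what the manual describes.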

The compression statistics are displayed after this last step, expressed as
the percentage saved relative to the original size. For example, 40%
compression would mean that for a 100k input file the result would occupy 60k.
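
The arithmetic behind that figure can be written as a short Python sketch.
The function names are assumptions for illustration, and the way HUFFSTAT
rounds or formats its own figure is not shown here.

```python
def compression_percent(original_size, compressed_size):
    """Percentage of the original size that is saved."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size


def size_after_compression(original_size, percent):
    """Size remaining once the given percentage has been saved."""
    return original_size * (100 - percent) / 100


print(compression_percent(100, 60))      # prints 40.0
print(size_after_compression(100, 40))   # prints 60.0
```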

An attempt to save the graph with the command
 *SCREENSAVE
is made before ending. If this command is not present, the program can easily
be altered to use an alternative screen dump command, for example to output
the graph to a printer.

KNOWN PROBLEMS/FUTURE ENHANCEMENTS-
Should convert to use GBPB instead of multiple BGET commands.
No known problems

HISTORY-
V1.00 Original
V1.01 Added bitmap file examples