MANUAL FOR HUFFSTAT (BBC BASIC)

This file describes the program HUFFSTAT, which analyses a group of
files and then constructs a Huffman binary tree that reduces the number
of bits required to represent the most frequently occurring symbols,
and so compresses the data.

The manual and software are (C)2006 SPROW

INSTRUCTIONS-
At the BASIC prompt, type
  CHAIN"HUFFSTAT"
The program will then read in a set of 7 data files to be analysed.
The filenames can be changed by altering the DATA statements at the end
of the program listing. The format is
  DATA "Description text"
  DATA Filename1
  DATA Filename2
  DATA Filename3
  DATA Filename4
  DATA Filename5
  DATA Filename6
  DATA Filename7
To save having to repeatedly enter the filenames, simply add additional
groups of 7 filenames, then set the value of "choice%" at the start of
the program to select which group to use. Ideally, the data files
should be around 100kbytes in total.

After reading in the files and creating the frequency distribution, a
graph showing the distribution will be displayed. The graph is shown in
high resolution MODE 0, which requires either a second processor to be
in use or the presence of shadow screen memory. Alternatively, the mode
could be altered to one requiring less free memory.

Next, a binary tree will be created based on the frequency
distribution. Symbols which appear very frequently (for example, space,
character 32) will be placed as a "leaf" higher up the tree, and those
symbols which appear infrequently will be placed as leaves lower in the
tree - possibly more than eight branches deep, meaning the output code
will take more bits than the original byte!
The tree generated depends on the contents of the files provided, so
files with different contents will result in different trees. Graphics
will give a different tree to word processor documents, for example.

The tree generation stage can take a long time. The use of a second
processor is recommended, as it can take over 15 minutes on an
unexpanded micro.

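The frequency count and tree building described above can be sketched
as follows. This is an illustration only, written in Python rather than
BBC BASIC, and is not taken from the HUFFSTAT listing. The function
names count_frequencies and huffman_code_lengths are hypothetical, and
only the depth of each leaf (the code length in bits) is tracked, not
the tree itself.

```python
import heapq
from collections import Counter


def count_frequencies(chunks):
    """Tally how often each byte value (0-255) occurs across all inputs."""
    freq = Counter()
    for chunk in chunks:
        freq.update(chunk)  # iterating over bytes yields integers 0-255
    return freq


def huffman_code_lengths(freq):
    """Return {symbol: leaf depth}, i.e. each symbol's code length in bits."""
    # Heap entries are (total frequency, tie-breaker, symbols in subtree).
    # The tie-breaker is the smallest symbol in the subtree, which is
    # unique because subtrees never share symbols.
    heap = [(count, sym, [sym]) for sym, count in freq.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freq}
    if len(heap) == 1:
        # A lone symbol still needs one bit to be represented.
        lengths[heap[0][1]] = 1
        return lengths
    while len(heap) > 1:
        # Join the two least frequent subtrees under a new parent node.
        count_a, tie_a, syms_a = heapq.heappop(heap)
        count_b, tie_b, syms_b = heapq.heappop(heap)
        merged = syms_a + syms_b
        for sym in merged:
            lengths[sym] += 1  # every leaf below the new parent is one deeper
        heapq.heappush(heap, (count_a + count_b, min(tie_a, tie_b), merged))
    return lengths


if __name__ == "__main__":
    # Two small in-memory "files" stand in for the 7 data files.
    freq = count_frequencies([b"aaaa", b"bbc"])
    print(huffman_code_lengths(freq))
```

In this small example the most frequent byte ("a") ends up one branch
deep, while the two rarer bytes end up two branches deep.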
Lastly, the length of each output code is multiplied by the frequency
of its symbol to work out how big the data would be had it been
compressed using the tree that was generated. Note that the files
aren't actually compressed; this is left as an exercise for the reader.

The compression statistics are displayed after this last step,
expressed as the percentage saved relative to the original size. For
example, 40% compression would mean that for a 100k input file the
result would occupy 60k.

An attempt to save the graph with the command *SCREENSAVE is made
before ending. If this command is not present, the program can easily
be altered to use an alternative screen dump command, for example to
output the graph to a printer.

KNOWN PROBLEMS/FUTURE ENHANCEMENTS-
Should convert to use GBPB instead of multiple BGET commands.
No known problems.

HISTORY-
V1.00 Original
V1.01 Added bitmap file examples
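
To illustrate the size estimate described in the last step of the
instructions, here is a minimal sketch in Python (not BBC BASIC, and
not taken from the HUFFSTAT listing). The function name
estimate_compression and the sample figures are made up for the
example. It assumes each input symbol originally occupies one 8 bit
byte.

```python
def estimate_compression(freq, lengths):
    """Estimate the compressed size from symbol counts and code lengths.

    freq    maps symbol -> number of occurrences in the input
    lengths maps symbol -> length in bits of that symbol's output code
    Returns (compressed size in bits, percentage saved).
    """
    original_bits = 8 * sum(freq.values())  # one byte per input symbol
    compressed_bits = sum(freq[sym] * lengths[sym] for sym in freq)
    saving_percent = 100 * (1 - compressed_bits / original_bits)
    return compressed_bits, saving_percent


if __name__ == "__main__":
    # Made-up figures: three symbols with code lengths of 1, 2 and 2 bits.
    freq = {97: 4, 98: 2, 99: 1}
    lengths = {97: 1, 98: 2, 99: 2}
    bits, saving = estimate_compression(freq, lengths)
    print(bits, "bits,", round(saving, 1), "% saved")
```

Here 7 input bytes (56 bits) would shrink to 10 bits. A symbol whose
code is longer than 8 bits adds to the total rather than reducing it,
which is why rare symbols deep in the tree cost more than the original
byte.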