A file compressor is great for shrinking stored files, but it depresses me whenever I see a file grow instead of shrink. So what I am looking for is a file compression algorithm that never inflates any files, although it is allowed that some files (not all of course!) have the same length after "compression". Ideally it should work on files of all sizes, but I would be satisfied with a compressor that operates only on files larger than 1MB.
Can you provide such an algorithm? No programming knowledge is required for this problem.
All files are basically strings of 0's and 1's. The shortest files would then be one bit long. Clearly these are incompressible, so no file compressor can compress every file.
What a compressor could do is look for long strings of 0's or 1's. Let's say there is a string of n 1's in a row beginning at position p. The compressor could remove these bits from the file and put a note at the end which says these bits were removed.
Presumably there is a minimum string length such that this note would be shorter than the string removed. Unfortunately for my scheme there is no guarantee this will occur.
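To make the idea concrete, here is a minimal sketch of the search step only: finding the longest string of identical bits and where it starts. It assumes the file is held as a Python string of '0' and '1' characters, and the name `longest_run` is just made up for illustration. It says nothing yet about how the note is stored or read back.

```python
from itertools import groupby

def longest_run(bits: str) -> tuple[int, int, str]:
    """Return (position, length, symbol) of the longest run in a '0'/'1' string.

    Ties go to the earliest run; an empty input gives (0, 0, "").
    """
    best = (0, 0, "")
    pos = 0
    for symbol, group in groupby(bits):
        length = sum(1 for _ in group)  # size of this run of identical bits
        if length > best[1]:
            best = (pos, length, symbol)
        pos += length
    return best

print(longest_run("0011110"))
```

The compressor would then compare the run length against the size of the note before deciding whether removing the run is worth it.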
A 1MB (8,000,000 bits more or less) file of randomly chosen bits is likely to have a string of 23 0's or 1's in a row. The position of the string would take 23 bits to encode (2^23 is about 8.4 million), the length of the string takes 5 bits, and the specification of 0's or 1's takes 1 bit. That is 29 bits of note for 23 bits removed, and I don't know if other bits would be needed, so on a random file my scheme appears to save nothing. With these field sizes it needs a string of at least 30 identical bits to come out ahead.
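The bit counts can be checked with a few lines of arithmetic. This is only a sketch under my own assumptions: the position is stored as a fixed-width index into the file, the length as a fixed-width number up to the longest expected string, plus one bit for the symbol. The names `note_bits`, `FILE_BITS` and `RUN_LENGTH` are made up for the example.

```python
FILE_BITS = 8_000_000  # 1 MB taken as 8,000,000 bits, as in the text
RUN_LENGTH = 23        # longest string the text expects in a random file

def note_bits(file_bits: int, max_run: int) -> int:
    """Size of the note: fixed-width position + fixed-width length + 1 symbol bit."""
    position_bits = (file_bits - 1).bit_length()  # enough to index 0 .. file_bits-1
    length_bits = max_run.bit_length()            # enough to store 0 .. max_run
    return position_bits + length_bits + 1

cost = note_bits(FILE_BITS, RUN_LENGTH)
print(cost, RUN_LENGTH - cost)  # note size in bits, then net saving in bits
```

Under these assumptions the position field alone is as long as the string being removed, so the saving is negative unless the file contains a noticeably longer string.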
A real file probably has much longer strings, so maybe this could work better in reality.
If there are no strings of minimum length, the compressor could just leave the file as it is so as not to enlarge it.
Posted by Jer on 2006-10-18 12:04:55