Zipper for ALL files
There is no zipper that compresses ALL files without enlarging some. First observe that the zipped versions of two different files must never be identical; otherwise the unzipper would not know which of the two original files to restore. Now consider the very small files first and assume we have a zipper that does not enlarge any file. We will see that this zipper cannot truly compress any file:
- A 0-byte file cannot become smaller, therefore when zipping it you end up with the same 0-byte file.
- Assume you could truly compress a 1-byte file; then it would become a 0-byte file. But the 0-byte file is already the zipped version of the 0-byte file. Therefore, since our zipper must not enlarge any file, it can only make 1-byte files out of 1-byte files. In other words, our zipper would merely permute the set of all 1-byte files, e.g. by flipping all bits or something similar.
- The same consideration applies to 2-byte files: all 0-byte and 1-byte files have already been shown to be the zipped versions of 0-byte and 1-byte files, so zipping a 2-byte file must result in a 2-byte file, and our zipper merely permutes the set of 2-byte files.
- The above logic holds just as well for 3-byte files, 4-byte files, 5-byte files, etc., and we conclude by induction that our zipper maps n-byte files to n-byte files, i.e. not a single file is compressed!
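The counting behind this induction can be checked with a few lines of Python. This is only a sketch of the arithmetic, not part of any zipper, and the function names are made up for illustration:

```python
def files_of_length(n: int) -> int:
    """Number of distinct files that are exactly n bytes long."""
    return 256 ** n


def files_shorter_than(n: int) -> int:
    """Number of distinct files with fewer than n bytes (0-byte file included)."""
    return sum(256 ** k for k in range(n))


for n in range(1, 5):
    # By the induction above, every shorter file is already the zipped version
    # of a shorter file, so none of these "slots" is free for an n-byte file.
    print(n, files_of_length(n), files_shorter_than(n))
```

For every n there are strictly fewer shorter files than n-byte files, so even without the induction a zipper could never shrink all n-byte files at once.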
Zipper for large enough files
When we only require that our zipper works for sufficiently large files, a simple algorithm exists. A "zipping" algorithm that operates on all but the 0-byte file works as follows:
- If the file contains n bytes, all of which are zero, then we zip the file to an (n-1)-byte file with all bytes being zero.
- If the file contains at least one non-zero byte, we leave the file unchanged. As an alternative (if you don't like the fact that the zipper does not really do anything), write down all bits of the file, interpret them as a binary number and add 1 to it to obtain the bit sequence of the "zipped" file. The one exception is the file whose bits are all equal to 1: its "zipped" file is 00...001. This alternative permutes the set of all n-byte files that contain a non-zero byte.
Obviously this zipper does not enlarge any file. It also compresses some files, even an infinite number of them (the ones with zero bytes only). And no two files get zipped into the same file, which makes it possible to provide an unzipper. The unzipper simply appends a zero byte to every file that consists of zero bytes only and leaves all other files unchanged. (For the alternative, it subtracts 1 from the bit sequence instead, with the exception that the file with bits 00...001 gets "unzipped" to 11...111.) The "empty" file with length 0 is also considered to consist of zero bytes only (or can you find any non-zero byte in it?) and is unzipped to a 1-byte file containing a zero byte. Not sure though if the zipper would be very useful for practical applications...
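For the curious, here is how the "add 1" variant of this zipper and its unzipper might look in Python. It is a minimal sketch under my own assumptions: files are plain bytes objects, the bit sequence is read as a big-endian number, and the function names are invented for illustration.

```python
def is_all_zero(data: bytes) -> bool:
    """True for files that consist of zero bytes only (including the empty file)."""
    return data.count(0) == len(data)


def zip_increment(data: bytes) -> bytes:
    n = len(data)
    if n == 0:
        raise ValueError("the 0-byte file is outside this zipper's domain")
    if is_all_zero(data):
        return data[:-1]                     # n zero bytes -> n-1 zero bytes
    value = int.from_bytes(data, "big")
    all_ones = 256 ** n - 1
    value = 1 if value == all_ones else value + 1   # 11...1 wraps to 00...01
    return value.to_bytes(n, "big")


def unzip_increment(data: bytes) -> bytes:
    if is_all_zero(data):
        return data + b"\x00"                # append one zero byte
    n = len(data)
    value = int.from_bytes(data, "big")
    all_ones = 256 ** n - 1
    value = all_ones if value == 1 else value - 1   # 00...01 goes back to 11...1
    return value.to_bytes(n, "big")
```

Running every 1-byte and 2-byte file through zip_increment and then unzip_increment gives back the original, and no output is ever longer than its input.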
Tricky ways out
A lot of great posts were made regarding impossibility proofs and possible solutions. There are indeed some tricky ways to write a zipper that compresses ALL files, some of which were already mentioned, and I'd like to discuss them briefly. There are probably other tricky solutions.
What if our zipper simply calls any commercial zipper, keeps the result if the file truly gets smaller, and otherwise leaves the file unchanged? Wouldn't that work?
That would be a way out if our unzipping algorithm could somehow determine whether the file was really compressed or simply left unchanged. That requires one more bit of information, which has to be stored somewhere... One could encode that extra bit into the name or extension of the file, e.g. by adding the extension ".ZIPPED" if the file was "really" compressed. The main problem with this approach is that real-world operating systems limit the length of file names: what happens if the name of our original file already has the maximum length? (If the operating system does not have this restriction, there is an even better tricky way out, see below.) Another option is that the zipper shows the user a message along the lines of "not really zipped" if the file could not be shrunk, and the user just needs to remember which files were "really" compressed. I.e. we store our "compressed" bit inside the user, which is a bit of cheating. Similar tricks involve storing that information in the zipping executable itself or in some other system file.
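The extension trick could look roughly like this. It is a sketch only: zlib from the Python standard library stands in for the "commercial zipper", the function names are invented, and it silently assumes that no original file name already ends in ".ZIPPED" and that the longer name still fits within the system's limits.

```python
import zlib
from typing import Tuple

SUFFIX = ".ZIPPED"  # the extra bit of information lives in the file name


def zip_with_flag(name: str, data: bytes) -> Tuple[str, bytes]:
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return name + SUFFIX, packed     # really compressed: mark it in the name
    return name, data                    # did not shrink: leave file untouched


def unzip_with_flag(name: str, data: bytes) -> Tuple[str, bytes]:
    if name.endswith(SUFFIX):
        return name[:-len(SUFFIX)], zlib.decompress(data)
    return name, data                    # no marker: file was stored as-is
```

Note that the content never grows, but the name does, by exactly the seven characters that carry the hidden bit.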
If our operating system allows for arbitrarily long file names and if we allow our zipper to rename the original file, we can even compress all files to length zero! It goes as follows:
- With some suitable algorithm, encode the whole file content into a huge string consisting only of the letters a-z. The only requirement is that the algorithm can be reversed; this is easy to do.
- With some suitable algorithm, encode the whole file name (including extension) into a huge string consisting only of the letters a-z. Again, the algorithm must be reversible.
- Generate a zero-length file. For its file name, concatenate the encoded file content, the character "0" and the encoded file name. The "0" tells the unzipper which part of the file name is the content and which part is the name.
By reversing the encoding steps for content and name, our unzipper can restore the original file from the name of the compressed file.
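To show that the encoding really is easy to reverse, here is a minimal sketch. The choice of encoding is my own assumption (each byte becomes two letters from a-p, one per 4-bit half), as are the function names, and the code only computes the new file name instead of creating a file, since real file systems would reject names this long.

```python
from typing import Tuple

LETTERS = "abcdefghijklmnop"  # 16 letters, one for each 4-bit value


def to_letters(data: bytes) -> str:
    """Encode every byte as two letters (high half first)."""
    return "".join(LETTERS[b >> 4] + LETTERS[b & 0x0F] for b in data)


def from_letters(text: str) -> bytes:
    """Reverse to_letters: every pair of letters becomes one byte."""
    values = [LETTERS.index(c) for c in text]
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


def zip_into_name(name: str, data: bytes) -> str:
    """Return the name of the zero-length 'zipped' file."""
    return to_letters(data) + "0" + to_letters(name.encode("utf-8"))


def unzip_from_name(zipped_name: str) -> Tuple[str, bytes]:
    """Recover (original name, original content) from the zipped file's name."""
    content, _, encoded_name = zipped_name.partition("0")
    return from_letters(encoded_name).decode("utf-8"), from_letters(content)
```

With this particular encoding the name of the zipped file has twice as many letters as the original content has bytes, so the "compression" to length zero is paid for in full by the file name.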