|
Binary Files Comparing
Preface...
Construction of a small-size patch is closely related to the number of changes of the new version versus an old one, and also to the efficiency of the methods, capable to determine these changes and present them in a compact form. Well-known algorithms of text data comparing can't help in comparing of the binary data when the structure of files is completely unknown in advance, and the nature of these changes is poorly predicted. Attempt to apply LCS-search algorithms (Largest Common Subsequence) also have shown their incompetence. Comparison of binary files requires a special approach which is successfully implemented in PatchFactory3. In comparison with the previous versions of the program (1.x and 2.x) we have managed to greatly increase quality and speed of comparing process in version 3.
The basis of the algorithm is the ability to find concurrence in compared files at a level of octet-byte subsequences. Such byte-oriented nature of the algorithm allows to make the efficiency of its work independent of a file format. The size of a resulting difference file, in view of time spent for its construction is assumed as the main criterion of patch-building algorithm efficiency.
Key features of new algorithm:
- Significant improvement of algorithm quality parameters (generated difference files are smaller, work speed is greater);
- Special optimization providing considerable raise of comparison quality for executable modules (exe, dll);
- Comparing files with size up to 2^63 bytes;
- Flexible control under ratio "time/result" with the help of different comparing methods selection;
- Setting utilized resources constraints (memory size, process priority);
- Caching of comparing results. Storage of comparing results in intermediate files allows to significantly reduce the time of patch building.
The comparing algorithm is implemented as a separate console program dfbuild.exe In PatchFactory v3. Accepting as entrance parameters the file to be compared (old and new), dfbuild constructs a difference between them and saves it in a df-file.
The table below contains the examples (implemented for well-known software products main exe-files), illustrating the efficiency of df-files in comparison with LZMA-compression. Percentage values show the ratio of the LZMA-compressed file and df-file sizes to the size of a new file.


PatchFactory v3 Patching Engine Performance table
To evaluate the performance of PatchFactory's byte-level patching engine when building a difference between two particular versions, we have prepared a comparison table (below) for well-known software products where you can see the original size of the new version files, the full original installation package (available from the the author of the corresponding software product), the Update package size (from the author), the difference size (built with the help of PatchFactory), and the result compression ratio of the result difference size (prepared by means of PatchFactory) versus original installation package and uncompressed new version files of the software product.

|