Future Projects

Here are some ideas for improving GNU diff and patch. The GNU project has identified some improvements as potential programming projects for volunteers. You can also help by reporting any bugs that you find.

If you are a programmer and would like to contribute something to the GNU project, please consider volunteering for one of these projects. If you are seriously contemplating work, please write to `gnu@prep.ai.mit.edu' to coordinate with other volunteers.

Suggested Projects for Improving GNU diff and patch

One should be able to use GNU diff to generate a patch from any pair of directory trees, and given the patch and a copy of one such tree, use patch to generate a faithful copy of the other. Unfortunately, some changes to directory trees cannot be expressed using current patch formats; also, patch does not handle some of the existing formats. These shortcomings motivate the following suggested projects.

Handling Changes to the Directory Structure

diff and patch do not handle some changes to directory structure. For example, suppose one directory tree contains a directory named `D' with some subsidiary files, and another contains a file with the same name `D'. `diff -r' does not output enough information for patch to transform the directory subtree into the file.

There should be a way to specify that a file has been deleted without having to include its entire contents in the patch file. There should also be a way to tell patch that a file was renamed, even if there is no way for diff to generate such information.

These problems can be fixed by extending the diff output format to represent changes in directory structure, and extending patch to understand these extensions.

Files that are Neither Directories Nor Regular Files

Some files are neither directories nor regular files: they are unusual files like symbolic links, device special files, named pipes, and sockets. Currently, diff treats symbolic links like regular files; it treats other special files like regular files if they are specified at the top level, but simply reports their presence when comparing directories. This means that patch cannot represent changes to such files. For example, if you change which file a symbolic link points to, diff outputs the difference between the two files, instead of the change to the symbolic link.

diff should optionally report changes to special files specially, and patch should be extended to understand these extensions.
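
As a rough illustration of the distinction, the following Python sketch compares two symbolic links by their targets rather than by the contents of the files they point to. The function name `compare_symlinks' and its report string are hypothetical; nothing here reflects an actual diff output format.

```python
import os

def compare_symlinks(old_path, new_path):
    """Describe how two symbolic links differ, looking only at their targets.

    Hypothetical helper: the returned message format is invented for
    illustration and is not something diff emits.
    """
    if not (os.path.islink(old_path) and os.path.islink(new_path)):
        raise ValueError("both paths must be symbolic links")
    old_target = os.readlink(old_path)
    new_target = os.readlink(new_path)
    if old_target == new_target:
        return None  # the links themselves are unchanged
    return "symlink target changed: %s -> %s" % (old_target, new_target)
```

A report of this kind records the change to the link itself, which is the information patch would need in order to recreate it.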

File Names that Contain Unusual Characters

When a file name contains an unusual character like a newline or white space, `diff -r' generates a patch that patch cannot parse. The problem is with the format of diff output, not just with patch, because with odd enough file names one can cause diff to generate a patch that is syntactically correct but patches the wrong files. The format of diff output should be extended to handle all possible file names.
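
One possible direction, sketched here in Python, is to quote file names using C-style escapes so that newlines and white space cannot be mistaken for syntax. The functions `quote_name' and `unquote_name' and the exact escaping rules are assumptions made for illustration; they do not describe an existing diff or patch format.

```python
def quote_name(name):
    """Return a double-quoted form of name with unusual characters escaped."""
    out = []
    for ch in name:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        else:
            out.append(ch)  # spaces are safe once the name is quoted
    return '"' + "".join(out) + '"'

def unquote_name(quoted):
    """Invert quote_name, rejecting malformed input."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted name")
    body = quoted[1:-1]
    escapes = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1
            if i >= len(body) or body[i] not in escapes:
                raise ValueError("bad escape")
            out.append(escapes[body[i]])
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Because the quoted form never contains a literal newline and always round-trips to the original name, a parser reading it cannot confuse part of a file name with the rest of the patch.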

Arbitrary Limits

GNU diff can analyze files with arbitrarily long lines and files that end in incomplete lines. However, patch cannot patch such files. The patch internal limits on line lengths should be removed, and patch should be extended to parse diff reports of incomplete lines.
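
As a small illustration of what removing these limits involves, the Python sketch below reads lines of any length and records whether the last line is incomplete. The function `read_lines' is hypothetical and is not taken from diff or patch.

```python
import io

def read_lines(stream):
    """Return (lines, last_line_complete) for a binary stream.

    Lines are returned without their trailing newline and may be of any
    length; no fixed-size line buffer is involved.
    """
    data = stream.read()
    if not data:
        return [], True
    complete = data.endswith(b"\n")
    lines = data.split(b"\n")
    if complete:
        lines.pop()  # drop the empty piece after the final newline
    return lines, complete

# Usage: the second input ends in an incomplete line.
print(read_lines(io.BytesIO(b"one\ntwo\n")))
print(read_lines(io.BytesIO(b"one\ntwo")))
```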

Handling Files that Do Not Fit in Memory

diff operates by reading both files into memory. This method fails if the files are too large, and diff should have a fallback.

One way to do this is to scan the files sequentially to compute hash codes of the lines and put the lines in equivalence classes based only on hash code. Then compare the files normally, working from these equivalence classes. Because distinct lines can share a hash code, this produces some false matches.

Then scan the two files sequentially again, checking each match to see whether it is real. When a match is not real, mark both the "matching" lines as changed. Then build an edit script as usual.

The output routines would have to be changed to scan the files sequentially looking for the text to print.
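
The two-pass scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not diff's implementation: the standard `difflib.SequenceMatcher' stands in for diff's comparison algorithm, `zlib.crc32' stands in for the line hash, and the function names are hypothetical. Only the hash codes are held in memory; the line text is re-read sequentially when the matches are verified.

```python
import difflib
import zlib

def line_hashes(path, hash_line=zlib.crc32):
    """Pass 1: scan a file sequentially, keeping one hash code per line."""
    with open(path, "rb") as f:
        return [hash_line(line) for line in f]

def verified_matches(path_a, path_b, hash_line=zlib.crc32):
    """Return (i, j) pairs of 0-based line numbers that truly match."""
    hashes_a = line_hashes(path_a, hash_line)
    hashes_b = line_hashes(path_b, hash_line)

    # Compare on hash codes only; some of these matches may be false.
    matcher = difflib.SequenceMatcher(None, hashes_a, hashes_b, autojunk=False)
    candidates = [(block.a + k, block.b + k)
                  for block in matcher.get_matching_blocks()
                  for k in range(block.size)]

    # Pass 2: scan both files sequentially again and check each match.
    real = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        pos_a = pos_b = -1
        line_a = line_b = None
        for i, j in candidates:
            while pos_a < i:
                line_a = fa.readline()
                pos_a += 1
            while pos_b < j:
                line_b = fb.readline()
                pos_b += 1
            if line_a == line_b:
                real.append((i, j))
            # Otherwise the match was false: both lines count as changed.
    return real
```

Every line that does not appear in the returned pairs is treated as changed, so an edit script can be built from the result in the usual way. The candidate pairs are increasing in both files, which is what allows the verification pass to read each file strictly forward.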

Ignoring Certain Changes

It would be nice to have a feature for specifying two strings, one in from-file and one in to-file, which should be considered to match. Thus, if the two strings are `foo' and `bar', then if two lines differ only in that `foo' in file 1 corresponds to `bar' in file 2, the lines are treated as identical.

It is not clear how general this feature can or should be, or what syntax should be used for it.
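
For a single fixed pair of strings, one simple interpretation can be sketched in Python as below. The function `lines_match' and its behavior are assumptions made for illustration; no such option exists in diff, and other interpretations are equally plausible.

```python
def lines_match(from_line, to_line, from_str, to_str):
    """Return True if the lines are identical, or become identical once
    every occurrence of from_str in from_line is read as to_str."""
    if from_line == to_line:
        return True
    return from_line.replace(from_str, to_str) == to_line

# Usage, with the strings `foo' and `bar' from the text above.
print(lines_match("call foo()", "call bar()", "foo", "bar"))
print(lines_match("call bar()", "call foo()", "foo", "bar"))
```

This version requires every occurrence of the first string to correspond to the second, so a line in which only some occurrences changed is still reported as different. Whether that is the desired behavior is one of the open questions about how general the feature should be.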

Reporting Bugs

If you think you have found a bug in GNU cmp, diff, diff3, sdiff, or patch, please report it by electronic mail to `bug-gnu-utils@prep.ai.mit.edu'. Send as precise a description of the problem as you can, including sample input files that produce the bug, if applicable.

Because Larry Wall has not released a new version of patch since mid-1988 and the GNU version of patch has been changed since then, please send bug reports for patch by electronic mail to both `bug-gnu-utils@prep.ai.mit.edu' and `lwall@netlabs.com'.

