The diff command has the option --ignore-matching-lines=RE (short form: -I RE). The man page says "ignore changes where all lines match RE", which reads as if diff simply ignores any differing line that matches the regex.
The Japanese translation likewise says, "Ignore changes that just insert / delete lines that match REGEXP."
I think both descriptions are quite misleading. (Even rereading them now that I understand the behavior, I am not sure what they are trying to say.) This section describes the exact behavior of this option.
Consider the difference between the following two files.
file1.txt
Date: 2020/4/7
Time: 20:00:00
file2.txt
Date: 2020/4/6
Time: 21:00:00
If you take a plain diff of the two files:
$ diff file1.txt file2.txt
1,2c1,2
< Date: 2020/4/7
< Time: 20:00:00
---
> Date: 2020/4/6
> Time: 21:00:00
Both lines are reported as changed. Here, suppose you want to ignore the difference in Date and see only the difference in Time.
$ diff -I Date file1.txt file2.txt
1,2c1,2
< Date: 2020/4/7
< Time: 20:00:00
---
> Date: 2020/4/6
> Time: 21:00:00
But the output is exactly the same as before. Now try inserting a blank line between the Date and Time lines in both file1.txt and file2.txt.
file1.txt
Date: 2020/4/7

Time: 20:00:00
file2.txt
Date: 2020/4/6

Time: 21:00:00
Run the same diff again:
$ diff -I Date file1.txt file2.txt
3c3
< Time: 20:00:00
---
> Time: 21:00:00
This time only the Time difference is reported, as intended.
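The two runs above can be reproduced with a small script. This is only a minimal sketch, assuming GNU diff and a POSIX shell; the scratch file names (adj1.txt, sep1.txt and so on) are made up for illustration and are not part of the original example.

```shell
#!/bin/sh
# Reproduce both experiments in a scratch directory (assumes GNU diff).
tmp=$(mktemp -d)

# Layout 1: the Date and Time lines are adjacent.
printf 'Date: 2020/4/7\nTime: 20:00:00\n' > "$tmp/adj1.txt"
printf 'Date: 2020/4/6\nTime: 21:00:00\n' > "$tmp/adj2.txt"

# Layout 2: an unchanged blank line separates the two lines.
printf 'Date: 2020/4/7\n\nTime: 20:00:00\n' > "$tmp/sep1.txt"
printf 'Date: 2020/4/6\n\nTime: 21:00:00\n' > "$tmp/sep2.txt"

# diff exits with status 1 when it reports differences, hence "|| true".
out_adjacent=$(diff -I Date "$tmp/adj1.txt" "$tmp/adj2.txt" || true)
out_separated=$(diff -I Date "$tmp/sep1.txt" "$tmp/sep2.txt" || true)

printf '%s\n' "[adjacent lines]" "$out_adjacent"
printf '%s\n' "[separated by a blank line]" "$out_separated"

rm -rf "$tmp"
```

With the adjacent layout the Date lines still show up in the output; with the separated layout only the Time change remains.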
This is because --ignore-matching-lines is applied to hunks, not to individual lines.
A hunk is a contiguous group of differences. In the first example,
1,2c1,2
< Date: 2020/4/7
< Time: 20:00:00
---
> Date: 2020/4/6
> Time: 21:00:00
is one hunk. If you take a plain diff of the files with the blank line inserted,
$ diff file1.txt file2.txt
1c1
< Date: 2020/4/7
---
> Date: 2020/4/6
3c3
< Time: 20:00:00
---
> Time: 21:00:00
you get two hunks. What --ignore-matching-lines actually does is apply the regular expression to each of these hunks, and here it ignores the first one.
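To check how many hunks diff is working with, one option is to count the hunk headers in normal-format output. The following is a rough sketch, assuming GNU diff; count_hunks is a hypothetical helper name, and the scratch file names are invented for the example.

```shell
#!/bin/sh
# Count hunks in normal-format diff output (assumes GNU diff).
count_hunks() {
    # In normal format each hunk starts with a header such as "1,2c1,2".
    # Content lines start with "<", ">" or "---", so only headers
    # begin with a digit.
    diff "$1" "$2" | grep -c '^[0-9]' || true
}

tmp=$(mktemp -d)

# Adjacent Date and Time lines.
printf 'Date: 2020/4/7\nTime: 20:00:00\n' > "$tmp/adj1.txt"
printf 'Date: 2020/4/6\nTime: 21:00:00\n' > "$tmp/adj2.txt"

# Date and Time lines separated by an unchanged blank line.
printf 'Date: 2020/4/7\n\nTime: 20:00:00\n' > "$tmp/sep1.txt"
printf 'Date: 2020/4/6\n\nTime: 21:00:00\n' > "$tmp/sep2.txt"

hunks_adjacent=$(count_hunks "$tmp/adj1.txt" "$tmp/adj2.txt")
hunks_separated=$(count_hunks "$tmp/sep1.txt" "$tmp/sep2.txt")

printf 'adjacent: %s hunk(s)\n' "$hunks_adjacent"
printf 'separated: %s hunk(s)\n' "$hunks_separated"

rm -rf "$tmp"
```

The unchanged blank line is what splits the single hunk in two, and -I is then evaluated separately for each of them.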
The GNU documentation describes this behavior precisely:
However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk—every insertion and every deletion—matches the regular expression. In other words, for each nonignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones.
As "every changed line in the hunk" indicates, a hunk is not ignored unless every changed line in it matches the regular expression. Once you understand this, you can see that the "all" in the man page and the "just" in the Japanese translation refer to all the changed lines of a hunk, but that is quite hard to notice at first sight.
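The rule can be checked against the original files, where Date and Time sit in one hunk. The sketch below assumes GNU diff, which accepts more than one -I option. It also shows one possible workaround that is not part of the explanation above: stripping the Date lines with grep -v before diffing. The filtered file names are hypothetical.

```shell
#!/bin/sh
# Two checks on the adjacent layout (assumes GNU diff).
tmp=$(mktemp -d)
printf 'Date: 2020/4/7\nTime: 20:00:00\n' > "$tmp/file1.txt"
printf 'Date: 2020/4/6\nTime: 21:00:00\n' > "$tmp/file2.txt"

# (1) Every changed line in the single hunk matches one of the two
#     regexes, so the whole hunk is ignored and nothing is printed.
out_all_match=$(diff -I '^Date' -I '^Time' "$tmp/file1.txt" "$tmp/file2.txt" || true)

# (2) Workaround: drop the Date lines first, then diff what is left.
#     Line numbers in the output refer to the filtered files.
grep -v '^Date' "$tmp/file1.txt" > "$tmp/filtered1.txt"
grep -v '^Date' "$tmp/file2.txt" > "$tmp/filtered2.txt"
out_filtered=$(diff "$tmp/filtered1.txt" "$tmp/filtered2.txt" || true)

printf '%s\n' "[all lines match]" "$out_all_match"
printf '%s\n' "[Date lines filtered out]" "$out_filtered"

rm -rf "$tmp"
```

The pre-filtering approach ignores Date lines regardless of how the hunks fall, at the cost of line numbers that no longer match the original files.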