tr: calculate complement set early#6340
Conversation
|
GNU testsuite comparison: |
|
The MacOS failures look unrelated to the |
I also don't have a Mac so my hands are tied. My comment was to make it easier to review c: |
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
Fixes uutils#6163 and adds a test to verify that a regression is not caused. Instead of inverting the conditions to check (e.g. delete characters **not** present in set1) invert set1 when passed the complement flag (`-c`, `-C`, `--complement`). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX). This fixes issue 6163 because it was caused by a combination of the `-c` and `-t` flag. `-c` is the abovementioned complement flag and `-t`/`--truncate-set1` truncates set1 to the length of set2. What happened in issue 6163 is that `set1={b'Y'}` and `set2={b'Z'}`, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be `{b'Y'}`, but the complement of `{b'Y'}`, that is `U \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}`, thus it is truncated to `{0}`. We can verify this by doing: `printf '\0' | tr -c -t Y Z`, which prints `Z` to stdout as expected. Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
|
GNU testsuite comparison: |
|
Ah, I missed the "spelling mistakes". Personally I believe the spellchecker is configured too strictly. For this PR, please somehow make the spellchecker stop complaining about these: You'll find in the same file examples how to use |
Those are new, you didn't miss them, I did T-T |
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
|
No worries, all is good ^^ |
|
I gave it a try to improving perf by rewriting the set from a I think delaying the translation from args to sets might help, but it doesn't seem necessary to me. Is there a benchmark framework used to test |
|
Yes, the TranslateOperation is order-sensitive. I was mostly thinking about the process of computing the complement itself. And I totally agree, let's keep it as-is for now, and worry about performance later. (And when you do, MEASURE first! I feel like this would make a difference, but feelings aren't reliable enough.) I'm not aware of any consistent benchmarking scheme. However, there are a few |
|
GNU testsuite comparison: |
Fixes #6163 and adds a test to verify that a regression is not caused.
Instead of inverting the conditions to check (e.g. delete characters not present in set1) invert set1 when passed the complement flag (
-c,-C,--complement). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX).This fixes issue 6163 because it was caused by a combination of the
-cand-tflag.-cis the abovementioned complement flag and-t/--truncate-set1truncates set1 to the length of set2. What happened in issue 6163 is thatset1={b'Y'}andset2={b'Z'}, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be{b'Y'}, but the complement of{b'Y'}, that isU \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}, thus it is truncated to{0}.We can verify this by doing:
printf '\0' | tr -c -t Y Z, which printsZto stdout as expected.Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.