cksum: take out regex pattern matching#7580
Conversation
|
GNU testsuite comparison: |
|
of course, perf and file size are key! |
bbbbd00 to
b1a19f7
Compare
|
GNU testsuite comparison: |
b1a19f7 to
8f41040
Compare
|
I think this is ready for review. |
|
GNU testsuite comparison: |
|
looks great! bravo Dunno if it is a regression from this change but it seems that the fuzzer isn't happy: |
|
I cannot find your fuzzer error in the latest test, was it in a previous test ? The CICD / MinRustV issue was my fault, I accidentally swapped variables |
The fuzzer discrepancy is not caused by this change. This PR seems only affect |
|
Very good changes overall 👍
|
| let (filename, checksum) = ByteSliceExt::split_once(after_paren, b") = ") | ||
| .or_else(|| ByteSliceExt::split_once(after_paren, b")= "))?; |
There was a problem hiding this comment.
This won't work for files called like 'foo) = ', because the .position() used in ByteSliceExt is short-circuiting, when we'd like it to be greedy.
There was a problem hiding this comment.
I see, I did not think of this case. If I implement rsplit_once that iterates from right to left, that would solve this, no?
I'll add a test case.
Also, I think there is a second issue here.
MD5(filename)= fds65dsf46as5df4d6f54asd is valid openssl syntax.
MD5 (filename) = fds65dsf46as5df4d6f54asd is valid posix syntax.
MD5(filename) = fds65dsf46as5df4d6f54asd is invalid syntax I think but would match here. I'll add a test case.
There was a problem hiding this comment.
I guess that'll work. I'd add a test where there is a ' ) = ' in the middle of the digest as well to see if the error message is the one expected.
I am looking into it none the less, but independently of this PR. |
|
GNU testsuite comparison: |
5178fc0 to
80e5b2f
Compare
|
GNU testsuite comparison: |
failures:
---- test_who::test_boot stdout ----
run: who --version
uutils-tests-info: version for the reference coreutil 'who' is higher than expected; expected: 8.3, found: 9.4
run: who -b
run: /home/runner/work/coreutils/coreutils/target/x86_64-unknown-linux-gnu/debug/coreutils who -b
thread 'test_who::test_boot' panicked at tests/by-util/test_who.rs:37:39:
assertion failed: `(left == right)`
Diff < left / right > :
> system boot Mar 27 13:00
>
stack backtrace:
0: rust_begin_unwind
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692:5
1: core::panicking::panic_fmt
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75:14
2: tests::common::util::CmdResult::stdout_is
at ./tests/common/util.rs:481:9
3: tests::test_who::test_boot
at ./tests/by-util/test_who.rs:37:9
4: tests::test_who::test_boot::{{closure}}
at ./tests/by-util/test_who.rs:33:15
5: core::ops::function::FnOnce::call_once
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/ops/function.rs:250:5
6: core::ops::function::FnOnce::call_once
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
test_who::test_boot
test result: FAILED. 3836 passed; 1 failed; 54 ignored; 0 measured; 0 filtered out; finished in 92.27sI am pretty confident I did not cause this CICD/build error. |
|
Could you please improve the stack ? |
|
The utf-8 was a regression on my part, the original regex handled filenames as bytes, my first rewrite did not.
maybe a fourth:
What do you think ? |
The fourth one is not needed IMO, the 3 others are good 👍 |
647adc2 to
fca3971
Compare
|
Is this ok? |
|
GNU testsuite comparison: |
removes dependency on the regex crate for LineFormat detection and parsing, resulting in a faster and lighter cksum binary.
…dd tests fixes this non-regex implementation's flaw with file_names containing the separator's pattern: - replaces left-to-right greedy separator match with right-to-left one. - added bugfix tests fixes secondary bug: positive match on hybrid posix-openssl format adds secondary bugfix tests Co-authored-by: Dorian Péron <72708393+RenjiSann@users.noreply.github.com>
fca3971 to
09a9dc7
Compare
|
I'm wondering what's wrong with the SELinux test |
|
GNU testsuite comparison: |
After reading #7504, i looked in https://github.com/uutils/coreutils-tracking/blob/main/individual-size-results/ and saw that
cksumhad a big jump in size as well. I identified the culprit: bringing in the regex crate.After looking a bit into regex for bytes, I tried removing the regex calls altogether and i came up with this branch.
The binary size is much smaller:
1.4MBvs3.1MB.The binary is much faster (?!) tested on a 162500 lines long file of b2sum 50% tagged and 50%untagged checksum :
And passes the gnu tests:
as well as:
If this is of value, please let me know. Then I would still have some cleanup to do.
Any remarks/thoughts are very welcome.