cat: Performance improvement when printing line numbers#7645
cat: Performance improvement when printing line numbers#7645sylvestre merged 1 commit intouutils:mainfrom
Conversation
|
Here's some numbers for the performance gains with this change... So ~70% faster than the current mainline implementation, and a tiny bit faster than the GNU implementation. If you pull in #7642 too we actually end up quite a bit quicker than GNU :) |
|
GNU testsuite comparison: |
d14f4a6 to
90828e6
Compare
|
GNU testsuite comparison: |
src/uu/cat/src/cat.rs
Outdated
| fn new() -> Self { | ||
| LineNumber { | ||
| // Initialize buf to b" 1\t" | ||
| buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'], |
There was a problem hiding this comment.
Maybe you could write this as:
| buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'], | |
| buf: Vec::from(b" 1\t"), |
There was a problem hiding this comment.
Thanks - I've made that change 👍
src/uu/cat/src/cat.rs
Outdated
| // If we hit anything other than a b'9' we can break since the next digit is | ||
| // unaffected. | ||
| // Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'. | ||
| // If/else is faster than match since we can prioritize most likely digits first. |
There was a problem hiding this comment.
Have you benchmarked this? I think both would be fine here, but a match would be nice.
There was a problem hiding this comment.
I agree the match is nicer, but benchmarking shows a small benefit with the if-else logic.
Here's a benchmark result from my machine (cat => this code, cat.match => if-else replaced with a match)...
$ hyperfine -L cat ./target/release/cat,./target/release/cat.match "{cat} -n ./wikidatawiki-20240901-pages-logging27.xml"
Benchmark 1: ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml
Time (mean ± σ): 5.445 s ± 0.078 s [User: 4.827 s, System: 0.613 s]
Range (min … max): 5.351 s … 5.635 s 10 runs
Benchmark 2: ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml
Time (mean ± σ): 5.860 s ± 0.102 s [User: 5.239 s, System: 0.616 s]
Range (min … max): 5.722 s … 6.056 s 10 runs
Summary
./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml ran
1.08 ± 0.02 times faster than ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml
So it's a few% faster, worth the gain I think.
|
Very Cool! |
Many thanks for the review! |
90828e6 to
b6131f6
Compare
|
GNU testsuite comparison: |
| } | ||
|
|
||
| // Logic to store a string for the line number. Manually incrementing the value | ||
| // represented in a buffer like this is significantly faster than storing |
There was a problem hiding this comment.
This looks very similar to the fast_inc function I have here: https://github.com/uutils/coreutils/pull/7564/files#diff-8ef9575cb5c2b35e15751308bf63d6dbf3ac217a383790f9355bc5e13ab2fdb9R272
I wonder if we should try to reuse that? (my function is a bit more generic as increment can be other values that aren't 1, so I'm not 100% sure how it'll do in terms of performance)
There was a problem hiding this comment.
I'm inclined to leave my code as-is for now. I've done some benchmarking and even a small change in the logic I've written has a measurable difference on the overall performance, so I think moving to a generic function would slow things down quite a bit.
Do you think you'll ever implement a specialization for the common-case of adding exactly 1?
There was a problem hiding this comment.
No problem, happy to play with this after both our PR are merged ,-)
I'm hoping that by forcing #[inline] the compiler will give us the specialized version for free.
There was a problem hiding this comment.
Ah, I played with it anyway ,-P
Dirty commit here: 606655e gets within 1-2% of your implementation.
And 03a0481 gets 1% faster with an optimized version, but it's a bit unfortunate that we need to copy code (I can't see how we can avoid that). edit: bc689e5 ever more optimized...
I'll need to check what happens when I move the function to uucore (not sure yet to understand how rust does things w.r.t. compile/link...)
One difference is that I preallocate a larger buffer, so that the vector never needs to be grown (we'll never reach more than 1024 digits ,-P)
(again, I don't mean to block this PR, happy to look further after merge)
There was a problem hiding this comment.
Hey, for sure if there's a version in uucore that gives the same performance then let's use that 👍
Add a simple class to manually maintain a string representation of the line number for the `cat` application. Maintaing this string is much faster than converting a `usize` line-number variable to a string each time it's needed. Gives a significant performance improvement with -n and -b flags.
b6131f6 to
c56489e
Compare
|
GNU testsuite comparison: |
| // If we hit anything other than a b'9' we can break since the next digit is | ||
| // unaffected. | ||
| // Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'. | ||
| // If/else here is faster than match (as measured with some benchmarking Apr-2025), |
There was a problem hiding this comment.
interesting, the assembly is indeed a bit different in main here:
https://godbolt.org/z/fr6crT441
Add a simple class to manually maintain a string representation of the line number for the
catapplication.Maintaing this string is much faster than converting a
usizeline-number variable to a string each time it's needed.Gives a significant performance improvement with
-nand-bflags.