cat: Performance improvement when printing line numbers by karlmcdowall · Pull Request #7645 · uutils/coreutils

karlmcdowall · 2025-04-04T02:45:15Z

Add a simple struct to manually maintain a string representation of the line number for the cat application.
Maintaining this string is much faster than converting a usize line-number variable to a string each time it's needed.

Gives a significant performance improvement with -n and -b flags.
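To make the idea concrete, here is a minimal sketch (not the PR's actual code) of a line-number buffer, assuming the GNU-style `-n` layout of a 6-wide, right-aligned number followed by a tab. `LineNumber` and `new` mirror names from the diff; `increment` and `as_bytes` are hypothetical. The increment walks from the last digit towards the front, treats a padding space as 0, and stops as soon as a digit absorbs the carry.

```rust
use std::io::{self, Write};

/// Sketch of a line-number buffer: the ASCII text of the current line number,
/// right-aligned in a 6-wide field and followed by a tab (e.g. "     1\t").
/// Method names other than `new` are hypothetical.
struct LineNumber {
    buf: Vec<u8>,
}

impl LineNumber {
    fn new() -> Self {
        LineNumber {
            buf: Vec::from(b"     1\t"),
        }
    }

    /// Add 1 to the number in `buf`, starting at the last digit (just before
    /// the trailing tab) and moving left only while there is a carry.
    fn increment(&mut self) {
        let mut i = self.buf.len() - 2;
        loop {
            match self.buf[i] {
                b'9' => {
                    // 9 + 1 wraps to 0 and carries into the digit to the left.
                    self.buf[i] = b'0';
                    if i == 0 {
                        // Every position carried: prepend a new leading '1'.
                        self.buf.insert(0, b'1');
                        return;
                    }
                    i -= 1;
                }
                b' ' => {
                    // Padding acts like a leading 0, so it becomes '1'.
                    self.buf[i] = b'1';
                    return;
                }
                d => {
                    // Any other digit absorbs the carry; nothing to its left changes.
                    self.buf[i] = d + 1;
                    return;
                }
            }
        }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.buf
    }
}

fn main() -> io::Result<()> {
    let mut out = io::stdout().lock();
    let mut line_number = LineNumber::new();
    for line in ["alpha", "beta", "gamma"] {
        // No integer-to-string conversion per line: just copy the buffer out.
        out.write_all(line_number.as_bytes())?;
        out.write_all(line.as_bytes())?;
        out.write_all(b"\n")?;
        line_number.increment();
    }
    Ok(())
}
```

Compared with formatting the counter (e.g. `format!("{:6}\t", n)`) on every line, the increment only touches the last digit in 9 out of 10 cases.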

karlmcdowall · 2025-04-04T02:49:20Z

Here are some numbers for the performance gains with this change:

$ hyperfine -L cat /usr/bin/cat,./target/release/cat.original,./target/release/cat "{cat} -n ./wikidatawiki-20240901-pages-logging27.xml"
Benchmark 1: /usr/bin/cat -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.200 s ±  0.104 s    [User: 4.831 s, System: 0.365 s]
  Range (min … max):    5.103 s …  5.422 s    10 runs
 
Benchmark 2: ./target/release/cat.original -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      8.844 s ±  0.111 s    [User: 8.254 s, System: 0.587 s]
  Range (min … max):    8.687 s …  9.007 s    10 runs
 
Benchmark 3: ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.146 s ±  0.082 s    [User: 4.552 s, System: 0.594 s]
  Range (min … max):    5.038 s …  5.259 s    10 runs
 
Summary
  ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml ran
    1.01 ± 0.03 times faster than /usr/bin/cat -n ./wikidatawiki-20240901-pages-logging27.xml
    1.72 ± 0.03 times faster than ./target/release/cat.original -n ./wikidatawiki-20240901-pages-logging27.xml

So ~70% faster than the current mainline implementation (1.72 ± 0.03 times), and on par with the GNU implementation (1.01 ± 0.03 times faster, which is within measurement noise).

If you pull in #7642 too we actually end up quite a bit quicker than GNU :)
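As a quick check on the ~70% figure: assuming hyperfine's "times faster" value is the ratio of mean run times (the numbers in the run above are consistent with that), the speedups can be recomputed from the reported means. This sketch reproduces only the central values, not hyperfine's ± uncertainty; `speedup` is a hypothetical helper.

```rust
/// How many times faster `candidate` is than `baseline`, given mean run times.
fn speedup(baseline_mean_s: f64, candidate_mean_s: f64) -> f64 {
    baseline_mean_s / candidate_mean_s
}

fn main() {
    // Mean wall-clock times (seconds) from the hyperfine run above.
    let gnu = 5.200;
    let original = 8.844;
    let patched = 5.146;

    let vs_original = speedup(original, patched);
    let vs_gnu = speedup(gnu, patched);

    // Prints "vs original: 1.72x (72% faster)" and "vs GNU: 1.01x".
    println!(
        "vs original: {:.2}x ({:.0}% faster)",
        vs_original,
        (vs_original - 1.0) * 100.0
    );
    println!("vs GNU: {:.2}x", vs_gnu);
}
```

The "~70%" is the same 1.72 ratio rounded down, and the 1% edge over GNU is smaller than the reported ± 0.03.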

github-actions · 2025-04-04T03:21:35Z

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions · 2025-04-04T12:31:05Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

tertsdiepraam · 2025-04-04T15:09:00Z

src/uu/cat/src/cat.rs

+    fn new() -> Self {
+        LineNumber {
+            // Initialize buf to b"     1\t"
+            buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'],


Maybe you could write this as:

Suggested change

buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'],

buf: Vec::from(b"     1\t"),

Thanks - I've made that change 👍
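For reference, both spellings of the initial buffer produce identical `Vec<u8>` contents (five spaces, the digit 1, and a tab). A byte-string literal such as `b"     1\t"` has type `&[u8; 7]`, and recent Rust versions (1.74 and later) implement `From<&[T; N]>` for `Vec<T>`; `to_vec()` is an equivalent that also works on older compilers. The helper name `initial_buf` is hypothetical.

```rust
/// Hypothetical helper returning the initial "     1\t" buffer
/// (five spaces, the digit 1, and a tab).
fn initial_buf() -> Vec<u8> {
    Vec::from(b"     1\t")
}

fn main() {
    // The original element-by-element spelling from the diff.
    let spelled_out: Vec<u8> = vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'];
    assert_eq!(initial_buf(), spelled_out);

    // Equivalent alternative via slice-to-Vec conversion.
    assert_eq!(b"     1\t".to_vec(), spelled_out);
}
```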

tertsdiepraam · 2025-04-04T15:11:29Z

src/uu/cat/src/cat.rs

+            // If we hit anything other than a b'9' we can break since the next digit is
+            // unaffected.
+            // Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'.
+            // If/else is faster than match since we can prioritize most likely digits first.


Have you benchmarked this? I think both would be fine here, but a match would be nice.

I agree the match is nicer, but benchmarking shows a small benefit with the if-else logic.
Here's a benchmark result from my machine (cat is this code; cat.match is the same code with the if-else replaced by a match):

$ hyperfine -L cat ./target/release/cat,./target/release/cat.match "{cat} -n ./wikidatawiki-20240901-pages-logging27.xml"
Benchmark 1: ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.445 s ±  0.078 s    [User: 4.827 s, System: 0.613 s]
  Range (min … max):    5.351 s …  5.635 s    10 runs

Benchmark 2: ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.860 s ±  0.102 s    [User: 5.239 s, System: 0.616 s]
  Range (min … max):    5.722 s …  6.056 s    10 runs

Summary
  ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml ran
    1.08 ± 0.02 times faster than ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml

So the if-else version is about 8% faster (1.08 ± 0.02 times), which I think is worth it.
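The two styles being compared can be sketched side by side. This is a hypothetical reconstruction, not the PR's code: `bump_if_else` checks the most likely case (a digit from '0' to '8') first, as the code comment describes, while `bump_match` expresses the same logic as a `match`. The fn-pointer parameter of `increment_with` is there only to check that both variants agree; for benchmarking, each variant should be inlined in its own build, as was done with `cat` and `cat.match` above.

```rust
/// What happened when one byte of the buffer was bumped.
#[derive(Debug, PartialEq)]
enum Step {
    /// The carry was absorbed; nothing further to the left changes.
    Done,
    /// The byte was '9' and wrapped to '0'; continue with the next digit.
    Carry,
}

/// if/else chain: test the most likely case first.
#[inline]
fn bump_if_else(b: &mut u8) -> Step {
    if *b >= b'0' && *b <= b'8' {
        *b += 1;
        Step::Done
    } else if *b == b'9' {
        *b = b'0';
        Step::Carry
    } else {
        // Padding space: treat it as 0 and increment to '1'.
        *b = b'1';
        Step::Done
    }
}

/// Same logic written as a match.
#[inline]
fn bump_match(b: &mut u8) -> Step {
    match *b {
        b'0'..=b'8' => {
            *b += 1;
            Step::Done
        }
        b'9' => {
            *b = b'0';
            Step::Carry
        }
        _ => {
            *b = b'1';
            Step::Done
        }
    }
}

/// Add 1 to a right-aligned number followed by one trailing tab byte.
fn increment_with(buf: &mut Vec<u8>, bump: fn(&mut u8) -> Step) {
    let mut i = buf.len() - 2; // skip the trailing '\t'
    loop {
        if bump(&mut buf[i]) == Step::Done {
            return;
        }
        if i == 0 {
            // Carried past the first byte: grow the number by one digit.
            buf.insert(0, b'1');
            return;
        }
        i -= 1;
    }
}

fn main() {
    let mut a = b"     1\t".to_vec();
    let mut b = a.clone();
    for _ in 0..99 {
        increment_with(&mut a, bump_if_else);
        increment_with(&mut b, bump_match);
    }
    assert_eq!(a, b);
    println!("{}", String::from_utf8_lossy(&a)); // "   100" followed by a tab
}
```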

tertsdiepraam · 2025-04-04T15:12:18Z

Very Cool!

karlmcdowall · 2025-04-04T16:41:40Z

Many thanks for the review!

github-actions · 2025-04-04T17:23:27Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

drinkcat · 2025-04-04T18:38:37Z

src/uu/cat/src/cat.rs

+}
+
+// Logic to store a string for the line number. Manually incrementing the value
+// represented in a buffer like this is significantly faster than storing


This looks very similar to the fast_inc function I have here: https://github.com/uutils/coreutils/pull/7564/files#diff-8ef9575cb5c2b35e15751308bf63d6dbf3ac217a383790f9355bc5e13ab2fdb9R272

I wonder if we should try to reuse that? (My function is a bit more generic, since the increment can be a value other than 1, so I'm not 100% sure how it will perform.)

I'm inclined to leave my code as-is for now. I've done some benchmarking, and even a small change in the logic I've written makes a measurable difference to the overall performance, so I think moving to a generic function would slow things down quite a bit.
Do you think you'll ever implement a specialization for the common case of adding exactly 1?

No problem, happy to play with this after both our PRs are merged ,-)

I'm hoping that by forcing #[inline] the compiler will give us the specialized version for free.
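To illustrate the trade-off being discussed, here is a hypothetical generic routine that adds an arbitrary decimal increment to a right-aligned ASCII number, with +1 as a thin wrapper. It is not the `fast_inc` from the linked PR; `add_decimal` and `increment` are made-up names, and `inc` is assumed to contain only ASCII digits. Whether `#[inline]` plus a constant `b"1"` argument actually lets the optimizer reduce the wrapper to a specialized loop is exactly the open question here, so treat it as something to measure, not a guarantee.

```rust
/// Hypothetical generic version: add the ASCII decimal number `inc` (digits
/// only) to the right-aligned decimal number stored in `buf[..end]`.
/// Padding bytes (anything that is not an ASCII digit, e.g. ' ') count as 0.
#[inline]
fn add_decimal(buf: &mut Vec<u8>, end: usize, inc: &[u8]) {
    let mut i = end; // one past the last digit of the stored number
    let mut j = inc.len(); // one past the last digit of `inc`
    let mut carry = 0u8;
    while j > 0 || carry > 0 {
        let add = if j > 0 {
            j -= 1;
            inc[j] - b'0'
        } else {
            0
        };
        if i == 0 {
            // No room on the left: insert a '0' to extend the number.
            buf.insert(0, b'0');
            i = 1;
        }
        i -= 1;
        let cur = if buf[i].is_ascii_digit() { buf[i] - b'0' } else { 0 };
        let sum = cur + add + carry;
        buf[i] = b'0' + sum % 10;
        carry = sum / 10;
    }
}

/// The `cat -n` case: add exactly 1 to a buffer that ends with a tab.
fn increment(buf: &mut Vec<u8>) {
    let end = buf.len() - 1;
    add_decimal(buf, end, b"1");
}

fn main() {
    let mut buf = b"   998\t".to_vec();
    increment(&mut buf); // 999
    let end = buf.len() - 1;
    add_decimal(&mut buf, end, b"25"); // 999 + 25 = 1024
    println!("{}", String::from_utf8_lossy(&buf)); // "  1024" followed by a tab
}
```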

Ah, I played with it anyway ,-P

Dirty commit here: 606655e gets within 1-2% of your implementation.

And 03a0481 gets 1% faster with an optimized version, but it's a bit unfortunate that we need to copy code (I can't see how we can avoid that). Edit: bc689e5 is even more optimized...

I'll need to check what happens when I move the function to uucore (I don't yet fully understand how Rust handles compiling and linking across crates...)

One difference is that I preallocate a larger buffer, so that the vector never needs to be grown (we'll never reach more than 1024 digits ,-P)
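The preallocation idea can be sketched as follows, assuming a fixed 1024-byte array that holds the digits right-aligned, with every unused slot pre-filled with '0' and a `start` index marking the first significant digit. A carry then simply moves `start` left instead of growing or shifting a vector. `Counter`, `write_prefix`, and the fixed width of 6 are hypothetical; drinkcat's actual buffer handling may differ.

```rust
use std::io::{self, Write};

const CAPACITY: usize = 1024; // assumption: far more digits than any line count needs
const WIDTH: usize = 6; // GNU-style right-aligned field width

/// Hypothetical counter whose digits live at the end of a fixed-size array.
struct Counter {
    digits: [u8; CAPACITY],
    start: usize, // index of the first significant digit
}

impl Counter {
    fn new() -> Self {
        // All slots hold '0', so a carry into an unused slot just works.
        let mut digits = [b'0'; CAPACITY];
        digits[CAPACITY - 1] = b'1';
        Counter { digits, start: CAPACITY - 1 }
    }

    fn increment(&mut self) {
        let mut i = CAPACITY - 1;
        while self.digits[i] == b'9' {
            self.digits[i] = b'0';
            i -= 1; // panics only if all CAPACITY digits overflow
        }
        self.digits[i] += 1;
        // A carry into a previously unused slot makes the number one digit longer.
        self.start = self.start.min(i);
    }

    fn as_bytes(&self) -> &[u8] {
        &self.digits[self.start..]
    }

    /// Write the number right-aligned to WIDTH columns, then a tab.
    fn write_prefix<W: Write>(&self, out: &mut W) -> io::Result<()> {
        const SPACES: &[u8; WIDTH] = b"      ";
        let digits = self.as_bytes();
        let pad = WIDTH.saturating_sub(digits.len());
        out.write_all(&SPACES[..pad])?;
        out.write_all(digits)?;
        out.write_all(b"\t")
    }
}

fn main() -> io::Result<()> {
    let mut out = io::stdout().lock();
    let mut counter = Counter::new();
    for line in ["alpha", "beta"] {
        counter.write_prefix(&mut out)?;
        writeln!(out, "{line}")?;
        counter.increment();
    }
    Ok(())
}
```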

(again, I don't mean to block this PR, happy to look further after merge)

Hey, for sure if there's a version in uucore that gives the same performance then let's use that 👍

github-actions · 2025-04-05T02:25:19Z

GNU testsuite comparison:

Congrats! The gnu test tests/misc/tee is no longer failing!

sylvestre · 2025-04-05T08:42:42Z

src/uu/cat/src/cat.rs

+            // If we hit anything other than a b'9' we can break since the next digit is
+            // unaffected.
+            // Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'.
+            // If/else here is faster than match (as measured with some benchmarking Apr-2025),


Interesting, the assembly is indeed a bit different in main here:
https://godbolt.org/z/fr6crT441

karlmcdowall force-pushed the cat_line_number_formatting branch 2 times, most recently from d14f4a6 to 90828e6 on April 4, 2025 11:52

tertsdiepraam reviewed Apr 4, 2025

karlmcdowall force-pushed the cat_line_number_formatting branch from 90828e6 to b6131f6 on April 4, 2025 16:42

drinkcat reviewed Apr 4, 2025

karlmcdowall force-pushed the cat_line_number_formatting branch from b6131f6 to c56489e on April 5, 2025 01:50

sylvestre reviewed Apr 5, 2025

sylvestre merged commit f6cadac into uutils:main Apr 5, 2025
68 checks passed

drinkcat mentioned this pull request Apr 18, 2025

Move seq's fast_inc to uucore, use it in cat #7782 (merged)

BrewTestBot mentioned this pull request May 24, 2025

uutils-coreutils 0.1.0 Homebrew/homebrew-core#224645 (merged)

moonfruit mentioned this pull request May 26, 2025

uutils-selected 0.1.0 moonfruit/homebrew-tap#243 (closed)

karlmcdowall deleted the cat_line_number_formatting branch July 5, 2025 20:27

4 participants
