base(nc|32|64): Optimize performances reduction memset #9632

Merged: sylvestre merged 7 commits into uutils:main from mattsu2020:base32_performance_check on Dec 19, 2025


mattsu2020 (Contributor) commented Dec 11, 2025

Summary

  • Allocate read buffers with `Vec::with_capacity` and read into their spare capacity, avoiding the cost of zero-initializing them in the fast encode/decode paths.
  • Switch fast encode/decode to read into spare capacity via `slice::from_raw_parts_mut`, with safety comments documenting the initialization guarantees.
  • Tidy the `base_common.rs` imports for consistency.

Related: #9621

Refactor buffer creation from zero-initialized vectors to pre-allocated Vec with_capacity,
using unsafe set_len to avoid unnecessary zeroing, improving performance without affecting
correctness, as only initialized bytes from Read::read are accessed.
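For reference, the distinction this commit relies on: `vec![0u8; n]` allocates and zero-fills `n` bytes (`len() == n`), whereas `Vec::with_capacity(n)` only allocates (`len() == 0`). A minimal sketch of filling that spare capacity from a reader without `unsafe` combines `Read::take` with `Read::read_to_end`, which appends only the bytes actually read. The helper name `read_chunk` is hypothetical, and how much zeroing the standard library does internally is an implementation detail not measured here.

```rust
use std::io::{self, Read};

/// Hypothetical helper: refill `buf` with up to `n` bytes from `reader`,
/// reusing its allocation. Returns the number of bytes read (0 at EOF).
fn read_chunk<R: Read>(reader: &mut R, buf: &mut Vec<u8>, n: usize) -> io::Result<usize> {
    // Drop the old contents but keep the allocation for reuse.
    buf.clear();
    buf.reserve(n);
    // `take` caps the read at `n` bytes; `read_to_end` appends into the Vec's
    // spare capacity and only grows `len` by what was actually read.
    reader.by_ref().take(n as u64).read_to_end(buf)
}

fn main() -> io::Result<()> {
    let zeroed = vec![0u8; 8]; // allocated and zero-filled: len == 8
    let reserved: Vec<u8> = Vec::with_capacity(8); // allocated only: len == 0
    assert_eq!(zeroed.len(), 8);
    assert_eq!(reserved.len(), 0);
    assert!(reserved.capacity() >= 8);

    let mut input = io::Cursor::new(b"hello, base32".to_vec());
    let mut buf = Vec::new();
    let mut chunks = Vec::new();
    loop {
        let got = read_chunk(&mut input, &mut buf, 5)?;
        if got == 0 {
            break; // EOF
        }
        chunks.push(buf.clone());
    }
    assert_eq!(chunks, vec![b"hello".to_vec(), b", bas".to_vec(), b"e32".to_vec()]);
    Ok(())
}
```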

Replaced manual unsafe `set_len` calls and direct reads into uninitialized vectors with `MaybeUninit::slice_assume_init_mut` to prevent potential memory safety issues and improve code reliability in `fast_encode` and `fast_decode` modules. Added buffer clearing to ensure proper reuse.
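The `Read::read` documentation states that calling `read` with an uninitialized buffer is not safe, because a safe `Read` implementation is allowed to read from `buf`. Spare capacity obtained through `Vec::spare_capacity_mut` (a `&mut [MaybeUninit<u8>]`) is sound to use when the surrounding code writes every byte itself before calling `set_len`. A minimal sketch of that pattern, with a hypothetical `push_uppercased` helper standing in for whatever produces the bytes:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: append an ASCII-uppercased copy of `src` to `dst`
/// by writing straight into the Vec's spare capacity.
fn push_uppercased(dst: &mut Vec<u8>, src: &[u8]) {
    dst.reserve(src.len());
    let spare: &mut [MaybeUninit<u8>] = &mut dst.spare_capacity_mut()[..src.len()];
    for (slot, &byte) in spare.iter_mut().zip(src) {
        // `MaybeUninit::write` initializes the slot without reading it.
        slot.write(byte.to_ascii_uppercase());
    }
    let new_len = dst.len() + src.len();
    // SAFETY: `reserve` guaranteed capacity >= new_len, and every element in
    // `dst.len()..new_len` was initialized by the loop above.
    unsafe { dst.set_len(new_len) };
}

fn main() {
    let mut out = b"id=".to_vec();
    push_uppercased(&mut out, b"abc123");
    assert_eq!(out, b"id=ABC123");
}
```

The key difference from reading into spare capacity is that here the code doing the writing is known, so the SAFETY comment can actually be justified.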

Replace unsafe usage of `MaybeUninit::slice_assume_init_mut` with `slice::from_raw_parts_mut` in the fast_encode and fast_decode modules for reading data into the spare capacity of buffers. This change maintains safety guarantees through updated comments while potentially improving code clarity and performance by avoiding MaybeUninit initialization assumptions. The modification ensures the buffer's uninitialized tail is correctly handled as raw bytes during I/O operations.
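To illustrate why a `&mut [u8]` over uninitialized spare capacity is risky however it is constructed, the hypothetical adapter below is ordinary safe code that inspects `buf` before delegating. Given an initialized buffer it is harmless; given a slice built with `slice::from_raw_parts_mut` over uninitialized memory, the same read of `buf` would read uninitialized bytes, which is undefined behavior.

```rust
use std::io::{self, Read};

/// Hypothetical adapter showing what safe code may legally do inside `read`:
/// it looks at `buf` before overwriting it.
struct InspectingReader<R> {
    inner: R,
    seen_nonzero: bool,
}

impl<R: Read> Read for InspectingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reading `buf` is permitted by the trait contract.
        self.seen_nonzero |= buf.iter().any(|&b| b != 0);
        self.inner.read(buf)
    }
}

fn main() -> io::Result<()> {
    let mut reader = InspectingReader {
        inner: io::Cursor::new(b"MZXW6===".to_vec()),
        seen_nonzero: false,
    };
    // Safe pattern: the buffer is initialized (zeroed) before it is handed out.
    let mut buf = [0u8; 4];
    let n = reader.read(&mut buf)?;
    assert_eq!(&buf[..n], b"MZXW");
    assert!(!reader.seen_nonzero); // it only ever saw the zeroed buffer
    Ok(())
}
```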
Moved the `slice` import from after `collections::VecDeque` to after `num::NonZeroUsize` to better align with the module's import grouping style.
codspeed-hq bot commented Dec 11, 2025

CodSpeed Performance Report

Merging #9632 will improve performance by 3.15%

Comparing mattsu2020:base32_performance_check (27b72b4) with main (2000af8)

Summary

⚡ 2 improvements
✅ 125 untouched
⏩ 6 skipped¹

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| b64_decode_synthetic | 169.8 µs | 164.6 µs | +3.15% |
| b64_decode_ignore_garbage_synthetic | 169.7 µs | 164.6 µs | +3.14% |

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived in CodSpeed to remove them from the performance reports.

mattsu2020 changed the title from "base(nc|32|64): Optimize performances remove memset" to "base(nc|32|64): Optimize performances reduction memset" on Dec 11, 2025
github-actions bot commented:

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

github-actions bot commented:

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

sylvestre (Contributor) commented:

I am not a fan of adding unsafe. Would it be possible to find another way? Thanks.

Replace unsafe spare_capacity_mut and from_raw_parts_mut usage with safe Vec initialization and direct read calls in fast_encode and fast_decode. This eliminates potential safety risks while preserving buffer functionality.
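One plausible shape for the safe version, assuming the buffer is zero-initialized once outside the loop so the memset is paid once per call rather than once per chunk. `for_each_chunk` and `CHUNK` are hypothetical names, not the uutils API; the loop also retries on `ErrorKind::Interrupted`, which the `Read::read` documentation describes as retryable.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical chunk size; the real constant in uutils may differ.
const CHUNK: usize = 8;

/// Sketch of a safe read loop: `buf` is zeroed once, then reused, and only
/// the `n` bytes returned by each `read` call are passed on.
fn for_each_chunk<R: Read>(mut reader: R, mut f: impl FnMut(&[u8])) -> io::Result<()> {
    let mut buf = vec![0u8; CHUNK];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(()),  // EOF
            Ok(n) => f(&buf[..n]),   // only the bytes just read
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    let mut total = 0;
    for_each_chunk(io::Cursor::new(vec![7u8; 20]), |chunk| total += chunk.len())?;
    assert_eq!(total, 20);
    Ok(())
}
```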
github-actions bot commented:

GNU testsuite comparison:

Congrats! The gnu test tests/cksum/cksum is no longer failing!

mattsu2020 marked this pull request as draft on December 16, 2025 08:39
sylvestre (Contributor) commented:

Is it ready for review?


Switch from unbuffered Read to BufRead in get_input, handle_input, and fast_encode_stream functions. This reduces syscalls by leveraging buffered reads, improving performance for base32 encoding/decoding operations. Refactor fast_encode_stream to use fill_buf() and manage leftover buffers more efficiently.
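A rough sketch of the `fill_buf()`/`consume()` pattern with leftover handling, assuming base32's 5-byte input groups: whole groups are passed straight from the reader's internal buffer, and only a partial group that straddles two fills is copied into a small `leftover` Vec. `stream_blocks` and its callback are hypothetical and do not reflect the actual `fast_encode_stream` signature.

```rust
use std::io::{self, BufRead};

/// Base32 consumes input in 5-byte groups (40 bits -> 8 output characters).
const BLOCK: usize = 5;

/// Hypothetical sketch: hand whole 5-byte blocks to `encode_blocks` directly
/// from the reader's buffer; copy only a trailing partial block.
fn stream_blocks<R: BufRead>(
    mut reader: R,
    mut encode_blocks: impl FnMut(&[u8]),
) -> io::Result<Vec<u8>> {
    let mut leftover: Vec<u8> = Vec::with_capacity(BLOCK);
    loop {
        let data = reader.fill_buf()?;
        if data.is_empty() {
            break; // EOF
        }
        let len = data.len();
        let mut start = 0;
        // First complete any partial block carried over from the previous fill.
        if !leftover.is_empty() {
            let need = (BLOCK - leftover.len()).min(len);
            leftover.extend_from_slice(&data[..need]);
            start = need;
            if leftover.len() == BLOCK {
                encode_blocks(&leftover);
                leftover.clear();
            }
        }
        // Then pass every whole block in one call, without copying.
        let rest = &data[start..];
        let whole = rest.len() / BLOCK * BLOCK;
        if whole > 0 {
            encode_blocks(&rest[..whole]);
        }
        leftover.extend_from_slice(&rest[whole..]);
        reader.consume(len);
    }
    // The final partial block; a real encoder would pad it.
    Ok(leftover)
}

fn main() -> io::Result<()> {
    // A 3-byte internal buffer forces blocks to straddle fills.
    let reader = io::BufReader::with_capacity(3, io::Cursor::new(b"abcdefghijklm".to_vec()));
    let mut encoded_input = Vec::new();
    let tail = stream_blocks(reader, |blocks| {
        assert_eq!(blocks.len() % BLOCK, 0); // only whole blocks reach the encoder
        encoded_input.extend_from_slice(blocks);
    })?;
    assert_eq!(encoded_input, b"abcdefghij");
    assert_eq!(tail, b"klm");
    Ok(())
}
```

Wrapping the input in `BufReader::with_capacity` with a tiny buffer, as `main` does, is a quick way to exercise the straddling case that the `leftover` buffer exists for.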
github-actions bot commented:

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

mattsu2020 marked this pull request as ready for review on December 17, 2025 10:50
sylvestre merged commit 3de9411 into uutils:main on Dec 19, 2025
127 checks passed
mattsu2020 deleted the base32_performance_check branch on December 19, 2025 23:52
CrazyRoka pushed a commit to CrazyRoka/coreutils that referenced this pull request on Dec 28, 2025
* perf(base32): optimize read buffer allocation in fast encode/decode

Refactor buffer creation from zero-initialized vectors to pre-allocated Vec with_capacity,
using unsafe set_len to avoid unnecessary zeroing, improving performance without affecting
correctness, as only initialized bytes from Read::read are accessed.

* refactor: use MaybeUninit for safer buffer handling in base32 encode/decode

Replaced manual unsafe `set_len` calls and direct reads into uninitialized vectors with `MaybeUninit::slice_assume_init_mut` to prevent potential memory safety issues and improve code reliability in `fast_encode` and `fast_decode` modules. Added buffer clearing to ensure proper reuse.

* refactor(base32): replace MaybeUninit::slice_assume_init_mut with slice::from_raw_parts_mut

Replace unsafe usage of `MaybeUninit::slice_assume_init_mut` with `slice::from_raw_parts_mut` in the fast_encode and fast_decode modules for reading data into the spare capacity of buffers. This change maintains safety guarantees through updated comments while potentially improving code clarity and performance by avoiding MaybeUninit initialization assumptions. The modification ensures the buffer's uninitialized tail is correctly handled as raw bytes during I/O operations.

* refactor(base32): reorder std imports in base_common.rs for consistency

Moved the `slice` import from after `collections::VecDeque` to after `num::NonZeroUsize` to better align with the module's import grouping style.

* refactor(base32): remove unsafe buffer handling in encode/decode

Replace unsafe spare_capacity_mut and from_raw_parts_mut usage with safe Vec initialization and direct read calls in fast_encode and fast_decode. This eliminates potential safety risks while preserving buffer functionality.

* perf(base32): optimize input handling by switching to BufRead for efficient buffering

Switch from unbuffered Read to BufRead in get_input, handle_input, and fast_encode_stream functions. This reduces syscalls by leveraging buffered reads, improving performance for base32 encoding/decoding operations. Refactor fast_encode_stream to use fill_buf() and manage leftover buffers more efficiently.