Fixes #10192 - fix(comm): improve stdout handling and add test for lossy UTF-8 output #10206
sylvestre merged 10 commits into uutils:main
Please run cargo fmt.
The current CI failure occurs during package installation on ubuntu-latest, prior to running any project-specific steps.
As a side note for other reviewers: the way the output is written felt repetitive to me, and I thought it should go through a helper function. It turns out that no such helper exists yet, and many utilities follow the same pattern. A good follow-up task would be to turn that pattern into a helper and clean up every utility that matches it.
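To make the follow-up idea concrete, here is a minimal sketch of what such a helper could look like, using only the standard library. The name write_with_context and its signature are hypothetical and not part of uucore. The real code would presumably go through uucore's own error types instead of a plain io::Error.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not an existing uucore function): writes `bytes` to `out`
/// and, if the write fails, prefixes the error message with `context`.
fn write_with_context<W: Write>(out: &mut W, bytes: &[u8], context: &str) -> io::Result<()> {
    out.write_all(bytes)
        // Keep the original error kind, but attach the caller-supplied context.
        .map_err(|e| io::Error::new(e.kind(), format!("{context}: {e}")))
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Every call site shrinks to one line instead of repeating the map_err chain.
    write_with_context(&mut out, b"line from file 1\n", "write error")?;
    write_with_context(&mut out, b"\tline from file 2\n", "write error")?;
    Ok(())
}
```

Because the helper is generic over Write, the same function can be pointed at an in-memory buffer in tests.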
Wrap stdout in BufWriter to improve performance and avoid duplicate error messages, matching GNU comm behavior.
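As an illustration of the buffering approach described in this commit, the sketch below locks stdout once, wraps the lock in a BufWriter, and flushes explicitly at the end. The function write_lines is a hypothetical stand-in, not the code in src/uu/comm/src/comm.rs.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for the utility's output loop: writes each line
/// followed by a newline. Generic over `Write` so it can target stdout or a buffer.
fn write_lines<W: Write>(out: &mut W, lines: &[&[u8]]) -> io::Result<()> {
    for line in lines {
        out.write_all(line)?;
        out.write_all(b"\n")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let lines: [&[u8]; 3] = [b"a", b"b", b"c"];

    // Lock stdout once and buffer the output, so many small writes
    // are batched into fewer system calls.
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    write_lines(&mut out, &lines)?;

    // Flush explicitly: an error that only surfaces when the BufWriter
    // is dropped would be silently ignored.
    out.flush()
}
```

With this shape, a failing write can surface through a single Result at the end of main rather than at each individual print call.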
CodSpeed Performance Report: Merging this PR will degrade performance by 6.4%.
…for lossy UTF-8 output (uutils#10206)

* fix(comm): improve stdout handling and add test for lossy UTF-8 output
* run cargo fmt
* perf(comm): use BufWriter for buffered stdout output

  Wrap stdout in BufWriter to improve performance and avoid duplicate error messages, matching GNU comm behavior.
* fix: refactor write operations in comm to use a dedicated function
* comm: use translate!

---------

Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
This pull request refactors the output handling in the comm utility to write output directly to a locked stdout handle, improving efficiency and error handling, especially for non-UTF-8 input. It also adds a new test to ensure correct output when processing files with invalid UTF-8 sequences.

Output handling improvements:

* Switched from print! macros to writing directly to a locked stdout handle via stdout.write_all, allowing for more efficient and robust output, particularly for non-UTF-8 data. All output operations now handle errors using map_err_context. (src/uu/comm/src/comm.rs)

Testing for non-UTF-8 input:

* Added a test, test_output_lossy_utf8, to verify that the utility correctly handles and outputs files containing invalid UTF-8 bytes, matching GNU comm's behavior. (tests/by-util/test_comm.rs)
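To show why the output path matters for non-UTF-8 data, the sketch below contrasts writing raw bytes with write_all against a path that first converts the bytes with String::from_utf8_lossy. The input bytes and the function lossy_bytes are made up for illustration. This does not reproduce the PR's test or claim which exact bytes comm emits.

```rust
use std::io::{self, Write};

/// Hypothetical illustration: the bytes a string-based output path would emit
/// after a lossy conversion. Each invalid sequence becomes U+FFFD (EF BF BD in UTF-8).
fn lossy_bytes(line: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(line).into_owned().into_bytes()
}

fn main() -> io::Result<()> {
    // 0xFF can never appear in valid UTF-8, so this line is not a valid string.
    let line: &[u8] = b"caf\xff\n";

    // The lossy path changes the data: one invalid byte becomes three bytes.
    assert_ne!(lossy_bytes(line), line);

    // write_all takes a byte slice, so it can pass the input through unchanged.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    out.write_all(line)?;
    out.flush()
}
```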