date: add ICU support for day/months names#10457
Conversation
eedfc7e to
d505aa4
Compare
|
GNU testsuite comparison: |
dd7465b to
bfff822
Compare
|
GNU testsuite comparison: |
bfff822 to
8219153
Compare
|
GNU testsuite comparison: |
|
I've spent a bunch of time prototyping approaches to the issue of locale and calendar support in date and I am aligned that the approach is to pre-intercept and do our own replacement before passing it to Jiff. Its unfortunate that it means that we essentially have to recreate the majority of the date parser. That being said reviewing this is a bit tricky because its going to be a long journey to get to full compatibility with this approach and we could go back and forth for a long time with things that don't match the gnu date implementation so I'm going to try to keep it high level. From a review perspective I think it would be better to build forward and merge as is and keep iterating on it to keep the PR's smaller and keep adding more regression test cases. The high level comments I would have mainly relate to the approach of the placeholder, when testing locally I could not come up with an example where doing the injection before passing it to Jiff would create a conflict compared to injecting the placeholders then doing the replacements after Jiff. I think it would be simpler to just perform the replacement in advance and not have to worry about adding the placeholder, but if you came across a scenario where that caused an issue this approach would make sense. The other high level comment is related to using jiff-icu. In my prototyping I mainly used jiff-icu to get the locale data: The Jiff ICU library makes the code to do this less flaky and more explicit: Would you be okay if I merge this as is, from the context that there is so far to go and the first PR will be the largest and just adding the locale env variable parser and the dependencies we need already puts us in the right direction and we can make the implementation way more robust in the future. |
| // Only use ICU for non-default locales and when format string contains month or day specifiers | ||
| let use_icu = should_use_icu_locale(); | ||
|
|
||
| if (format_string.contains("%B") |
There was a problem hiding this comment.
I'm wondering if this is redundant since the contains checks will also done inside this statement, but it could also have a perf implication so not entirely sure.
Architecturally this might have a long term approach with modifiers since they will look like %^ but it works fine for now.
| if !full_month.is_empty() { | ||
| temp_format = temp_format.replace("%B", "<<<FULL_MONTH>>>"); | ||
| } | ||
| if !abbrev_month.is_empty() { |
There was a problem hiding this comment.
In the main comment for the review I discussed that I didn't come across an instance where swapping the locale data out conflicted with Jiff, if you had an example I'm interested to know
| let output = result.stdout_str().trim(); | ||
| let expected = format!("{expected_full} {expected_abbrev}"); | ||
|
|
||
| if output == expected { |
There was a problem hiding this comment.
These tests aren't the most useful, they will always succeed even if the output doesn't match.
Also we are using the compiled_data feature so all of the locale data comes built into the binary so we shouldn't ever expect the locale data to not be available
sounds good! |
Fixes: https://bugs.launchpad.net/ubuntu/+source/rust-coreutils/+bug/2130859
implement this: