Skip to content

date: add ICU support for day/months names#10457

Merged
ChrisDryden merged 1 commit intouutils:mainfrom
sylvestre:month-locale
Jan 25, 2026
Merged

date: add ICU support for day/months names#10457
ChrisDryden merged 1 commit intouutils:mainfrom
sylvestre:month-locale

Conversation

@sylvestre
Copy link
Contributor

@sylvestre sylvestre commented Jan 23, 2026

Fixes: https://bugs.launchpad.net/ubuntu/+source/rust-coreutils/+bug/2130859

implement this:

$ LANG=th_TH ./target/debug/coreutils  date +%A
วันเสาร์
$ LANG=ja_JP ./target/debug/coreutils  date +%B
2000年1月1日

@sylvestre sylvestre force-pushed the month-locale branch 2 times, most recently from eedfc7e to d505aa4 Compare January 23, 2026 23:03
@github-actions
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/shuf/shuf-reservoir (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/sort/sort-stale-thread-mem (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

@sylvestre sylvestre force-pushed the month-locale branch 2 times, most recently from dd7465b to bfff822 Compare January 24, 2026 09:09
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/seq/seq-epipe is now passing!

@sylvestre sylvestre marked this pull request as ready for review January 24, 2026 12:42
@github-actions
Copy link

GNU testsuite comparison:

Note: The gnu test tests/env/env-signal-handler is now being skipped but was previously passing.

@ChrisDryden
Copy link
Collaborator

I've spent a bunch of time prototyping approaches to the issue of locale and calendar support in date and I am aligned that the approach is to pre-intercept and do our own replacement before passing it to Jiff. Its unfortunate that it means that we essentially have to recreate the majority of the date parser. That being said reviewing this is a bit tricky because its going to be a long journey to get to full compatibility with this approach and we could go back and forth for a long time with things that don't match the gnu date implementation so I'm going to try to keep it high level. From a review perspective I think it would be better to build forward and merge as is and keep iterating on it to keep the PR's smaller and keep adding more regression test cases.

The high level comments I would have mainly relate to the approach of the placeholder, when testing locally I could not come up with an example where doing the injection before passing it to Jiff would create a conflict compared to injecting the placeholders then doing the replacements after Jiff. I think it would be simpler to just perform the replacement in advance and not have to worry about adding the placeholder, but if you came across a scenario where that caused an issue this approach would make sense.

The other high level comment is related to using jiff-icu. In my prototyping I mainly used jiff-icu to get the locale data:

  let formatted = formatter.format(&date).to_string();
  // Extract month name from formatted date like "15 janvier 2000" or "2000-01-15"
  // Look for a word that contains letters (the month name)
  let words: Vec<&str> = formatted.split_whitespace().collect();
  words.iter()
      .find(|word| word.chars().any(|c| c.is_alphabetic()))
      .map_or_else(String::new, |s| (*s).to_string())

The Jiff ICU library makes the code to do this less flaky and more explicit:

  let icu_zdt = ZonedDateTime::convert_from(&zdt);

  let formatter = DateTimeFormatter::try_new(
      locale!("fr-FR").into(),
      fieldsets::YMDET::long(), 
  )?;
  formatter.format(&icu_zdt).to_string()

Would you be okay if I merge this as is, from the context that there is so far to go and the first PR will be the largest and just adding the locale env variable parser and the dependencies we need already puts us in the right direction and we can make the implementation way more robust in the future.

// Only use ICU for non-default locales and when format string contains month or day specifiers
let use_icu = should_use_icu_locale();

if (format_string.contains("%B")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is redundant since the contains checks will also done inside this statement, but it could also have a perf implication so not entirely sure.

Architecturally this might have a long term approach with modifiers since they will look like %^ but it works fine for now.

if !full_month.is_empty() {
temp_format = temp_format.replace("%B", "<<<FULL_MONTH>>>");
}
if !abbrev_month.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the main comment for the review I discussed that I didn't come across an instance where swapping the locale data out conflicted with Jiff, if you had an example I'm interested to know

let output = result.stdout_str().trim();
let expected = format!("{expected_full} {expected_abbrev}");

if output == expected {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests aren't the most useful, they will always succeed even if the output doesn't match.

Also we are using the compiled_data feature so all of the locale data comes built into the binary so we shouldn't ever expect the locale data to not be available

@sylvestre
Copy link
Contributor Author

Would you be okay if I merge this as is

sounds good!

@ChrisDryden ChrisDryden merged commit e1259d9 into uutils:main Jan 25, 2026
176 of 177 checks passed
@sylvestre sylvestre deleted the month-locale branch January 27, 2026 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants