py: Add support for nested f-strings within f-strings by dpgeorge · Pull Request #18588 · micropython/micropython

dpgeorge · 2025-12-18T13:16:15Z

Summary

During the discussion about t-string support (#17557), I thought that it might not be much effort to support nested f-strings with the current f-string parser. It turned out to be relatively simple.

The way the MicroPython f-string parser works is:

1. it extracts the f-string arguments (things in curly braces) into a temporary buffer (a vstr)
2. once the f-string ends (reaches its closing quote) the lexer switches to tokenizing the temporary buffer
3. once the buffer is empty it switches back to the stream.

The temporary buffer can easily hold f-strings itself (ie nested f-strings) and they can be re-parsed by the lexer using the same algorithm. The only thing stopping that from working is that the temporary buffer can't be reused for the nested f-string because it's currently being parsed.
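To make the buffer idea concrete, here is a minimal sketch of what the tokenized result amounts to for a single, non-nested f-string. It assumes the temporary buffer ends up holding a `.format(...)` call with each argument wrapped in parentheses; the PR text does not spell out the buffer contents, so treat that form as an assumption for illustration.

```python
a, b = 2, 3

# What the programmer writes.
as_fstring = f"{a} + {b} = {a + b}"

# Assumed equivalent after the lexer has tokenized the string and then the
# temporary buffer: the braces are emptied and the arguments follow as a call.
as_format = "{} + {} = {}".format((a), (b), (a + b),)

print(as_fstring == as_format)
```

Both expressions evaluate to the same string, which is why tokenizing the buffer right after the string literal is enough to implement a non-nested f-string.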

This PR fixes that by adding a second temporary buffer, which is the "injection" buffer. That allows an arbitrary number of nestings with a simple modification to the original algorithm:

1. when an f-string is encountered the string is parsed and its arguments are extracted into fstring_args
2. when the f-string finishes, fstring_args is inserted into the current position in inject_chrs (which is the start of that buffer if no injection is ongoing)
3. fstring_args is now cleared and ready for any further f-strings (nested or not)
4. the lexer switches to inject_chrs if it's not already reading from it
5. if an f-string appeared inside the f-string then it is in inject_chrs and can be processed as before, extracting its arguments into fstring_args, which can then be inserted again into inject_chrs
6. once inject_chrs is exhausted (meaning that all levels of f-strings have been fully processed) the lexer switches back to tokenizing the stream

Amazingly, this scheme supports arbitrary numbers of nestings of f-strings using the same quote style.
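The two-buffer scheme can be sketched as a toy source-to-source rewriter. This is not the code in py/lexer.c: `expand_fstrings` is a hypothetical name, Python lists stand in for the vstr buffers, the `.format(...)` rewriting is an assumption, and the sketch ignores escapes, `{{`/`}}`, conversions, format specs and plain string literals. It also assumes well-formed input. Arguments are delimited by counting brackets rather than by looking at quotes, which is what lets the same quote style appear at every level.

```python
def expand_fstrings(src):
    """Toy model: rewrite f-strings in `src` into str.format() calls."""
    stream = list(src)   # the main input stream, consumed from the front
    inject_chrs = []     # unread injected characters, always read before `stream`
    out = []             # rewritten source text

    def peek(k=0):
        # Look k characters ahead across both buffers; "" means end of input.
        if k < len(inject_chrs):
            return inject_chrs[k]
        k -= len(inject_chrs)
        return stream[k] if k < len(stream) else ""

    def next_char():
        # Consume one character, preferring the injection buffer.
        return inject_chrs.pop(0) if inject_chrs else stream.pop(0)

    while peek():
        if not (peek() == "f" and peek(1) in ("'", '"')):
            out.append(next_char())      # ordinary character: copy through
            continue
        next_char()                      # drop the "f" prefix
        quote = next_char()
        out.append(quote)
        fstring_args = ".format("        # fresh argument buffer for this f-string
        while peek() and peek() != quote:
            c = next_char()
            if c == "{":
                depth, arg = 0, ""
                # Copy the argument verbatim until the matching "}".
                while peek() and (depth or peek() != "}"):
                    ch = next_char()
                    depth += (ch in "[{") - (ch in "]}")
                    arg += ch
                next_char()              # closing "}"
                fstring_args += "(" + arg + "),"
                out.append("{}")
            else:
                out.append(c)
        next_char()                      # closing quote
        out.append(quote)
        # Insert at the current read position, which is the front of the
        # unread part of the injection buffer.
        inject_chrs[:0] = fstring_args + ")"
    return "".join(out)


src = 'f"a{f"b{x}c"}d"'                  # same quote style at both levels
rewritten = expand_fstrings(src)
print(rewritten)                         # "a{}d".format(("b{}c".format((x),)),)
print(eval(rewritten, {"x": 5}))         # ab5cd
```

Because a nested f-string is copied verbatim into the argument text, it is only recognised once the rewriter reads it back from the injection buffer, at which point its own arguments are inserted in front of whatever is still unread.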

Testing

A new test is added which will run under CI.

Trade-offs and Alternatives

This adds some code size and a bit more memory usage for the lexer. In particular for a single (non-nested) f-string it now makes an extra copy of the fstring_args data, when copying it across to inject_chrs. That could possibly be optimized to reuse the same buffer (inject_chrs would steal the memory from fstring_args).

Otherwise, memory use only goes up with the complexity of nested f-strings.

github-actions · 2025-12-18T13:26:40Z

Code size report:

Reference:  tests/run-tests.py: Abort test run if enter_raw_repl fails many times. [6436f8b]
Comparison: py/lexer: Move f-string completion code to more logical location. [merge of c9f747c]
  mpy-cross:   -32 -0.008% 
   bare-arm:   -12 -0.021% 
minimal x86:   -94 -0.050% 
   unix x64:   -80 -0.009% standard
      stm32:   +44 +0.011% PYBV10
      esp32:   +20 +0.001% ESP32_GENERIC
     mimxrt:   +48 +0.013% TEENSY40
        rp2:   +56 +0.006% RPI_PICO_W
       samd:   +44 +0.016% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   +66 +0.014% VIRT_RV32

codecov · 2025-12-18T13:39:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.41%. Comparing base (6436f8b) to head (c9f747c).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #18588   +/-   ##
=======================================
  Coverage   98.41%   98.41%           
=======================================
  Files         171      171           
  Lines       22324    22326    +2     
=======================================
+ Hits        21971    21973    +2     
  Misses        353      353


dlech

Impressive that this can be done without increasing code size much. The commit messages are a bit short on the "why" reasoning behind the changes.

dlech · 2025-12-22T16:54:22Z

py/lexer.c

@@ -409,7 +417,7 @@ static void parse_string_literal(mp_lexer_t *lex, bool is_raw, bool is_fstring)
                        // note: "c" can never be MP_LEXER_EOF because next_char


Comment still references MP_LEXER_EOF.

dpgeorge · 2026-01-04

I turned this into a goto and eliminated the sentinel.

dlech · 2025-12-22T16:55:29Z

py/lexer.c

                        // always inserts a newline at the end of the input stream
                        case '\n':
-                            c = MP_LEXER_EOF;
+                            c = (unichar)(-1);


I guess this could be given a name like MP_LEXER_SENTINEL?

dpgeorge · 2026-01-04

This is now a goto.

dlech · 2025-12-22T16:58:26Z

py/lexer.h

    mp_reader_t reader;         // stream source

-    unichar chr0, chr1, chr2;   // current cached characters from source
+    uint8_t chr0, chr1, chr2;   // current cached characters from source


Comment sounds wrong now. I guess these are bytes of a multi-byte character (or used to test if they are part of a multi-byte character or not).

dpgeorge · 2026-01-04

Comment updated.

dpgeorge · 2025-12-23T01:12:00Z

> Impressive that this can be done without increasing code size much. The commit messages are a bit short on the "why" reasoning behind the changes.

Thanks for the review. Yes, the commit messages (and commits themselves) are not finalised because I went back and forth with a failing unix-qemu CI that I couldn't reproduce locally, and at the same time trying to reduce the code size.

dpgeorge · 2026-01-05T00:16:53Z

I spent a lot of time tuning the commits here for code size. Locally I get quite different results to the CI's code size report. I'm using gcc 15.2.1 and arm-none-eabi-gcc 14.2.0. I get for this whole PR:

  mpy-cross:   -64 -0.017%
   bare-arm:   -12 -0.021%
minimal x86:   -74 -0.040%
   unix x64:  -176 -0.021% standard
      stm32:   +28 +0.007% PYBV10
    esp8266:  +500 +0.071% ESP8266_GENERIC
        rp2:   +40 +0.004% RPI_PICO_W

That's pretty good, except esp8266, which increases by a lot due to its need to load/store all 8/16-bit values using a 32-bit load/store and masking (the -mforce-l32 compiler option). I could add a minor workaround for that to cut it down by 464 bytes (change uint8_t chr1, chr2 to uint32_t chr1, chr2 in mp_lexer_t). Is that worth doing?
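As a rough illustration of why narrow fields cost extra instructions when only 32-bit memory accesses are used, here is a sketch of byte access built from word access plus shifting and masking. It models the general idea only, not the code the compiler actually emits under -mforce-l32; the helper names and the little-endian layout are assumptions.

```python
def load_u8(words, addr):
    # Read the containing 32-bit word, then shift and mask out one byte.
    shift = 8 * (addr & 3)
    return (words[addr >> 2] >> shift) & 0xFF


def store_u8(words, addr, value):
    # Read-modify-write: clear the target byte, then merge in the new value.
    shift = 8 * (addr & 3)
    mask = 0xFF << shift
    word = words[addr >> 2]
    words[addr >> 2] = (word & ~mask & 0xFFFFFFFF) | ((value & 0xFF) << shift)


mem = [0x44332211, 0x00000000]   # two 32-bit words, little-endian byte order
print(hex(load_u8(mem, 2)))      # 0x33
store_u8(mem, 1, 0xAB)
print(hex(mem[0]))               # 0x4433ab11
```

A field that is already 32 bits wide needs none of the shift and mask steps, which is the reasoning behind widening chr1 and chr2 as a workaround.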

dpgeorge · 2026-01-07T00:59:15Z

This has been rebased on latest master to update the code_size CI to use ubuntu-latest, which changes the GCC version used for code size comparison, which changes the results. The code size diff is now slightly better, and matches more closely what I get locally.

This saves about 4 bytes on ARM Cortex-M, and about 50-60 bytes on x86-64. It also allows the upcoming `vstr_ins_strn()` function to be inline as well, and have less of a code-size impact when used.

Signed-off-by: Damien George <damien@micropython.org>

This is now an easy function to define as inline, so it does not impact code size unless it's used.

Signed-off-by: Damien George <damien@micropython.org>

Having this check takes code size and execution time, and it's not necessary: all callers of this function pass a non-zero value for `byte_len` already. And even if `byte_len` was zero, the code would still perform correctly.

Signed-off-by: Damien George <damien@micropython.org>

The null byte cannot exist in source code (per CPython), so use it to indicate the end of the input stream (instead of `(mp_uint_t)-1`).

This allows the cache chars (chr0/1/2 and their saved versions) to be 8-bit bytes, making it clear that they are not `unichar` values. It also saves a bit of memory in the `mp_lexer_t` data structure. (And in a future commit allows the saved cache chars to be eliminated entirely by storing them in a vstr instead.)

In order to keep code size down, the frequently used `chr0` is still of type `uint32_t`. Having it 32-bit means that machine instructions to load it are smaller (it adds about +80 bytes to Thumb code if `chr0` is changed to `uint8_t`).

Also add tests for invalid bytes in the input stream to make sure there are no regressions in this regard.

Signed-off-by: Damien George <damien@micropython.org>
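The property this commit message relies on can be checked on CPython, and the sentinel idea can be sketched in a few lines. `accepts_source` and `byte_reader` are hypothetical helpers written for this illustration; the reader is a toy, not MicroPython's `mp_reader_t`.

```python
def accepts_source(text):
    # CPython refuses source text containing a null byte. Depending on the
    # version the error is a ValueError or a SyntaxError, so catch both.
    try:
        compile(text, "<example>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


def byte_reader(data):
    # Toy reader: returns each byte in turn, then 0 forever as the
    # end-of-input marker, so every returned value fits in 8 bits.
    pos = 0

    def next_byte():
        nonlocal pos
        if pos < len(data):
            byte = data[pos]
            pos += 1
            return byte
        return 0

    return next_byte


print(accepts_source("x = 1\n"))       # True
print(accepts_source("x = 1\x00\n"))   # False

read = byte_reader(b"ab")
print([read() for _ in range(4)])      # [97, 98, 0, 0]
```

Since a valid source file never contains the value 0, a reader can return it at end of input without ambiguity, and no wider type is needed to hold an out-of-band marker.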

It turns out that it's relatively simple to support nested f-strings, which is what this commit implements.

The way the MicroPython f-string parser works at the moment is:
1. it extracts the f-string arguments (things in curly braces) into a temporary buffer (a vstr)
2. once the f-string ends (reaches its closing quote) the lexer switches to tokenizing the temporary buffer
3. once the buffer is empty it switches back to the stream.

The temporary buffer can easily hold f-strings itself (ie nested f-strings) and they can be re-parsed by the lexer using the same algorithm. The only thing stopping that from working is that the temporary buffer can't be reused for the nested f-string because it's currently being parsed.

This commit fixes that by adding a second temporary buffer, which is the "injection" buffer. That allows an arbitrary number of nestings with a simple modification to the original algorithm:
1. when an f-string is encountered the string is parsed and its arguments are extracted into `fstring_args`
2. when the f-string finishes, `fstring_args` is inserted into the current position in `inject_chrs` (which is the start of that buffer if no injection is ongoing)
3. `fstring_args` is now cleared and ready for any further f-strings (nested or not)
4. the lexer switches to `inject_chrs` if it's not already reading from it
5. if an f-string appeared inside the f-string then it is in `inject_chrs` and can be processed as before, extracting its arguments into `fstring_args`, which can then be inserted again into `inject_chrs`
6. once `inject_chrs` is exhausted (meaning that all levels of f-strings have been fully processed) the lexer switches back to tokenizing the stream.

Amazingly, this scheme supports arbitrary numbers of nestings of f-strings using the same quote style.

This adds some code size and a bit more memory usage for the lexer. In particular for a single (non-nested) f-string it now makes an extra copy of the `fstring_args` data, when copying it across to `inject_chrs`. Otherwise, memory use only goes up with the complexity of nested f-strings.

Signed-off-by: Damien George <damien@micropython.org>

This way, the use of `lex->fstring_args` is fully self contained within the string literal parsing section of `mp_lexer_to_next()`.

Signed-off-by: Damien George <damien@micropython.org>

dpgeorge added the py-core label (Relates to py/ directory in source) Dec 18, 2025

dpgeorge force-pushed the py-lexer-support-nested-fstrings branch from aba1901 to b51b102 Compare December 18, 2025 13:28

dpgeorge mentioned this pull request Dec 18, 2025

py: Add PEP 750 template strings support #17557

Open

dpgeorge force-pushed the py-lexer-support-nested-fstrings branch 4 times, most recently from b0e571f to 6043a9e Compare December 22, 2025 02:36

dlech reviewed Dec 22, 2025


dpgeorge force-pushed the py-lexer-support-nested-fstrings branch 2 times, most recently from 12cc3e5 to 5413044 Compare January 4, 2026 14:52

dpgeorge requested a review from dlech January 5, 2026 00:21

dpgeorge mentioned this pull request Jan 6, 2026

py: Implement PEP 750 t-strings using existing f-string parser (WIP) #18650

Draft

dpgeorge force-pushed the py-lexer-support-nested-fstrings branch from 928f161 to 1fbeb16 Compare January 7, 2026 00:30

dlech approved these changes Jan 10, 2026


dpgeorge added 6 commits February 4, 2026 23:19

py/vstr: Add vstr_ins_strn helper function.

5f59f39


py/lexer: Move f-string completion code to more logical location.

c9f747c


dpgeorge force-pushed the py-lexer-support-nested-fstrings branch from 1fbeb16 to c9f747c Compare February 4, 2026 12:22

dpgeorge merged commit c9f747c into micropython:master Feb 4, 2026
76 checks passed

dpgeorge deleted the py-lexer-support-nested-fstrings branch February 4, 2026 12:45
