-
Notifications
You must be signed in to change notification settings - Fork 830
Optimize metric name validation to fix 2-3x performance regression #1662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@copilot the flamegraph suggests that calls to io.prometheus.metrics.model.snapshots.PrometheusNaming#METRIC_NAME_PATTERN are slow - either make the regex faster or cache the result (if the regex matches) |
…aracter-by-character validation instead of regex Co-authored-by: zeitlinger <2832627+zeitlinger@users.noreply.github.com>
Co-authored-by: zeitlinger <2832627+zeitlinger@users.noreply.github.com>
Co-authored-by: zeitlinger <2832627+zeitlinger@users.noreply.github.com>
I've already addressed this in commit 885dcab. Instead of making the regex faster or caching results, I eliminated the regex entirely by replacing |
|
@copilot also replace UNIT_NAME_PATTERN with manual check for consistency |
Co-authored-by: zeitlinger <2832627+zeitlinger@users.noreply.github.com>
|
Blocked by #1663 |
…rometheus#1662) Regex validation in `isValidLegacyLabelName()`, `isValidLegacyMetricName()`, and `validateUnitName()` was being called on every metric name during text format export, causing significant overhead. ## Changes - Replace regex pattern matching with character-by-character validation in `isValidLegacyLabelName()`, `isValidLegacyMetricName()`, and `validateUnitName()` - Deprecate unused `METRIC_NAME_PATTERN`, `LEGACY_LABEL_NAME_PATTERN`, and `UNIT_NAME_PATTERN` fields (kept for API compatibility) - Update JavaDoc to reflect validation approach ## Implementation Before: ```java public static boolean isValidLegacyLabelName(String name) { return LEGACY_LABEL_NAME_PATTERN.matcher(name).matches(); } ``` After: ```java public static boolean isValidLegacyLabelName(String name) { if (name.isEmpty()) return false; char first = name.charAt(0); if (!((first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z') || first == '_')) { return false; } for (int i = 1; i < name.length(); i++) { char c = name.charAt(i); if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_')) { return false; } } return true; } ``` Benchmark results show recovery to near-baseline performance (532k ops/s vs 534k ops/s on main). > [!WARNING] > <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <__filter_complete__></__filter_complete__></details><issue_title>Performance regression in text-format export on 1.4.0+</issue_title> ><issue_description>It seems like the adding of support for UTF-8 characters has considerably increased processing time in the `io.prometheus.metrics.expositionformats.PrometheusTextFormatWriter.writeNameAndLabels` method due to it now calling `io.prometheus.metrics.model.snapshots.PrometheusNaming.isValidLegacyMetricName`. > > I ran a local test with 1000 metrics exported. Average time per export on version 1.3.10 was 0.9-1.1 ms, while on version 1.4.1 I got an average of 2.7-2.9 ms. > > I'm not sure which avenues make sense to avoid this regression and I'm not sure how relevant it is. I just found this while evaluating a move from the old `simpleclient` to the new `client-java` and that drew my attention, as the text format export is now slower than in `simpleclient`. > > I have attached the flamegraph I captured on both versions. > ><a href="https://github.com/user-attachments/files/23315564/flamegraph-v141.html">flamegraph-v141.html</a> > <a href="https://github.com/user-attachments/files/23315565/flamegraph-v1310.html">flamegraph-v1310.html</a></issue_description> > > ## Comments on the Issue (you are @copilot in this section) > ><comments> ><comment_new><author>@zeitlinger</author> > Thanks for the report > > I can confirm that the existing benchmark shows the same - we just have not integrated that into our release process (yet). > > ``` > ## Report details > - **Date generated:** 2025-11-04 08:16:07 > > ## Hardware Information: > - **Hardware Model:** Micro-Star International Co., Ltd. MS-7D76 > - **Memory:** 96.0 GiB > - **Processor:** AMD Ryzen™ 9 7900 × 24 > > ## Software Information: > - **Firmware Version:** A.N1 > - **OS Name:** Ubuntu 24.04.3 LTS > - **OS Build:** (null) > - **OS Type:** 64-bit > - **GNOME Version:** 46 > - **Windowing System:** X11 > - **Kernel Version:** Linux 6.14.0-114034-tuxedo > > tooling > > temurin-25.0.1+8.0.LTS > > main > > Benchmark Mode Cnt Score Error Units > TextFormatUtilBenchmark.openMetricsWriteToByteArray thrpt 25 489698.973 ± 16399.308 ops/s > TextFormatUtilBenchmark.openMetricsWriteToNull thrpt 25 507779.365 ± 2619.768 ops/s > TextFormatUtilBenchmark.prometheusWriteToByteArray thrpt 25 534028.708 ± 5998.689 ops/s > TextFormatUtilBenchmark.prometheusWriteToNull thrpt 25 522323.579 ± 18123.729 ops/s > > 1.3.10 > Benchmark Mode Cnt Score Error Units > TextFormatUtilBenchmark.openMetricsWriteToByteArray thrpt 25 934183.222 ± 10919.023 ops/s > TextFormatUtilBenchmark.openMetricsWriteToNull thrpt 25 936023.986 ± 10402.193 ops/s > TextFormatUtilBenchmark.prometheusWriteToByteArray thrpt 25 958813.578 ± 17958.230 ops/s > TextFormatUtilBenchmark.prometheusWriteToNull thrpt 25 965133.616 ± 10907.457 ops/s > ``` > </comment_new> ></comments> > - Fixes prometheus#1660 <!-- START COPILOT CODING AGENT SUFFIX --> <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <issue_title>Performance regression in text-format export on 1.4.0+</issue_title> > <issue_description>It seems like the adding of support for UTF-8 characters has considerably increased processing time in the `io.prometheus.metrics.expositionformats.PrometheusTextFormatWriter.writeNameAndLabels` method due to it now calling `io.prometheus.metrics.model.snapshots.PrometheusNaming.isValidLegacyMetricName`. > > I ran a local test with 1000 metrics exported. Average time per export on version 1.3.10 was 0.9-1.1 ms, while on version 1.4.1 I got an average of 2.7-2.9 ms. > > I'm not sure which avenues make sense to avoid this regression and I'm not sure how relevant it is. I just found this while evaluating a move from the old `simpleclient` to the new `client-java` and that drew my attention, as the text format export is now slower than in `simpleclient`. > > I have attached the flamegraph I captured on both versions. > > [flamegraph-v141.html](https://github.com/user-attachments/files/23315564/flamegraph-v141.html) > [flamegraph-v1310.html](https://github.com/user-attachments/files/23315565/flamegraph-v1310.html)</issue_description> > > ## Comments on the Issue (you are @copilot in this section) > > <comments> > <comment_new><author>@zeitlinger</author><body> > Thanks for the report > > I can confirm that the existing benchmark shows the same - we just have not integrated that into our release process (yet). > > ``` > ## Report details > - **Date generated:** 2025-11-04 08:16:07 > > ## Hardware Information: > - **Hardware Model:** Micro-Star International Co., Ltd. MS-7D76 > - **Memory:** 96.0 GiB > - **Processor:** AMD Ryzen™ 9 7900 × 24 > > ## Software Information: > - **Firmware Version:** A.N1 > - **OS Name:** Ubuntu 24.04.3 LTS > - **OS Build:** (null) > - **OS Type:** 64-bit > - **GNOME Version:** 46 > - **Windowing System:** X11 > - **Kernel Version:** Linux 6.14.0-114034-tuxedo > > tooling > > temurin-25.0.1+8.0.LTS > > main > > Benchmark Mode Cnt Score Error Units > TextFormatUtilBenchmark.openMetricsWriteToByteArray thrpt 25 489698.973 ± 16399.308 ops/s > TextFormatUtilBenchmark.openMetricsWriteToNull thrpt 25 507779.365 ± 2619.768 ops/s > TextFormatUtilBenchmark.prometheusWriteToByteArray thrpt 25 534028.708 ± 5998.689 ops/s > TextFormatUtilBenchmark.prometheusWriteToNull thrpt 25 522323.579 ± 18123.729 ops/s > > 1.3.10 > Benchmark Mode Cnt Score Error Units > TextFormatUtilBenchmark.openMetricsWriteToByteArray thrpt 25 934183.222 ± 10919.023 ops/s > TextFormatUtilBenchmark.openMetricsWriteToNull thrpt 25 936023.986 ± 10402.193 ops/s > TextFormatUtilBenchmark.prometheusWriteToByteArray thrpt 25 958813.578 ± 17958.230 ops/s > TextFormatUtilBenchmark.prometheusWriteToNull thrpt 25 965133.616 ± 10907.457 ops/s > ``` > </body></comment_new> > </comments> > </details> - Fixes prometheus#1660 <!-- START COPILOT CODING AGENT TIPS --> --- 💬 We'd love your input! Share your thoughts on Copilot coding agent in our [2 minute survey](https://gh.io/copilot-coding-agent-survey). --------- Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: zeitlinger <2832627+zeitlinger@users.noreply.github.com> Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com> Signed-off-by: huan.huynh <1612858@student.hcmus.edu.vn>
Regex validation in
isValidLegacyLabelName(),isValidLegacyMetricName(), andvalidateUnitName()was being called on every metric name during text format export, causing significant overhead.Changes
isValidLegacyLabelName(),isValidLegacyMetricName(), andvalidateUnitName()METRIC_NAME_PATTERN,LEGACY_LABEL_NAME_PATTERN, andUNIT_NAME_PATTERNfields (kept for API compatibility)Implementation
Before:
After:
Benchmark results show recovery to near-baseline performance (532k ops/s vs 534k ops/s on main).
Original prompt
<issue_description>It seems like the adding of support for UTF-8 characters has considerably increased processing time in the
io.prometheus.metrics.expositionformats.PrometheusTextFormatWriter.writeNameAndLabelsmethod due to it now callingio.prometheus.metrics.model.snapshots.PrometheusNaming.isValidLegacyMetricName.I ran a local test with 1000 metrics exported. Average time per export on version 1.3.10 was 0.9-1.1 ms, while on version 1.4.1 I got an average of 2.7-2.9 ms.
I'm not sure which avenues make sense to avoid this regression and I'm not sure how relevant it is. I just found this while evaluating a move from the old
simpleclientto the newclient-javaand that drew my attention, as the text format export is now slower than insimpleclient.I have attached the flamegraph I captured on both versions.
flamegraph-v141.html
flamegraph-v1310.html</issue_description>
Comments on the Issue (you are @copilot in this section)
@zeitlinger Thanks for the reportI can confirm that the existing benchmark shows the same - we just have not integrated that into our release process (yet).
</comment_new>
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.