The post Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters appeared first on Engineering at Meta.
Once complete, our AI cluster, Prometheus, will deliver 1 gigawatt of capacity to enhance existing AI experiences and enable new ones across Meta products. Prometheus’ infrastructure will span several data center buildings in a single larger region, interconnecting tens of thousands of GPUs.
A key piece of scaling and connecting this infrastructure is backend aggregation (BAG), which we use to seamlessly connect GPUs and data centers with robust, high-capacity networking. By leveraging modular hardware, advanced routing, and resilient topologies, BAG ensures both performance and reliability at unprecedented scale.
As our AI clusters continue to grow, we expect BAG to play an important role in meeting future demands and driving innovation across Meta’s global network.
BAG is a centralized Ethernet-based super spine network layer that primarily functions to interconnect multiple spine layer fabrics across various data centers and regions within large clusters. Within Prometheus, for example, the BAG layer serves as the aggregation point between regional networks and Meta’s backbone, enabling the creation of mega AI clusters. BAG is designed to support immense bandwidth needs, with inter-BAG capacities reaching the petabit range (e.g., 16-48 Pbps per region pair).

To address the challenge of interconnecting tens of thousands of GPUs, we’re deploying distributed BAG layers regionally.
BAG layers are strategically distributed across regions to serve subsets of L2 fabrics, adhering to distance, buffer, and latency constraints. Inter-BAG connectivity utilizes either a planar (direct match) or spread connection topology, chosen based on site size and fiber availability.

So far, we’ve discussed how the BAG layers are interconnected; now let’s see how a BAG layer connects downstream to L2 fabrics.
We’ve used two main fabric technologies, Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF), to build L2 networks.
Below is an example of DSF L2 zones across five data center buildings connected to the BAG layer via a special backend edge pod in each building.

Below is an example of NSF L2 connected to BAG planes. Each BAG plane connects to matching Spine Training Switches (STSWs) from all spine planes. Effective oversubscription is 4.98:1.

Careful management of oversubscription ratios assists in balancing scale and performance. Typical oversubscription from L2 to BAG is around 4.5:1, while BAG-to-BAG oversubscription varies based on regional requirements and link capacity.
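As a concrete illustration of how such a ratio is computed, the sketch below assumes a hypothetical L2 fabric with 576 downstream 800G ports and 128 uplinks toward BAG; the port counts are invented for the example, not Prometheus’s actual configuration:

```python
def oversubscription(downstream_gbps: float, upstream_gbps: float) -> float:
    """Ratio of downstream (GPU-facing) to upstream (BAG-facing) capacity."""
    return downstream_gbps / upstream_gbps

# Hypothetical L2 fabric: 576 x 800G ports toward GPUs, 128 x 800G uplinks to BAG.
ratio = oversubscription(576 * 800, 128 * 800)
print(f"{ratio:.1f}:1")  # 4.5:1
```

A higher ratio trades per-GPU burst bandwidth for fewer expensive long-haul links, which is why the BAG-to-BAG figure is allowed to vary by region.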
Meta’s implementation of BAG uses a modular chassis equipped with Jericho3 (J3) ASIC line cards, each providing up to 432x800G ports for high-capacity, scalable, and resilient interconnect. The central hub BAG employs a larger chassis to accommodate numerous spokes and long-distance links with varied cable lengths for optimized buffer utilization.
Routing within BAG uses eBGP with link bandwidth attributes, enabling Unequal Cost Multipath (UCMP) for efficient load balancing and robust failure handling. BAG-to-BAG connections are secured with MACsec, aligning with network security requirements.
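The effect of the link bandwidth attribute can be sketched as weighted path selection: traffic is hashed per flow onto next hops in proportion to their advertised capacity. The next-hop names and weights below are illustrative, not Meta’s configuration:

```python
import hashlib

def ucmp_next_hop(flow_key: bytes, links: dict[str, int]) -> str:
    """Hash a flow onto next hops in proportion to advertised link bandwidth.

    `links` maps next-hop name -> bandwidth weight, mirroring how the BGP
    link-bandwidth extended community drives unequal-cost load balancing.
    """
    total = sum(links.values())
    # A deterministic per-flow hash keeps all packets of one flow on one path.
    bucket = int.from_bytes(hashlib.sha256(flow_key).digest()[:8], "big") % total
    for hop, weight in sorted(links.items()):
        if bucket < weight:
            return hop
        bucket -= weight
    raise RuntimeError("unreachable")

# Unequal capacities, e.g., after a partial link failure on one plane.
links = {"bag-plane-1": 800, "bag-plane-2": 800, "bag-plane-3": 400}
hop = ucmp_next_hop(b"10.0.0.1|10.0.1.9|443", links)
```

Because the hash is per flow, a failed or drained plane simply drops out of the weight table and flows rebalance without packet reordering within a flow.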
The network design meticulously details port striping, IP addressing schemes, and comprehensive failure domain analysis to ensure high availability and minimize the impact of failures. Failure modes are analyzed at the BAG, data hall, and power distribution levels. We also employ various strategies to mitigate blackholing risks, including draining affected BAG planes and conditional route aggregation.
An important advantage of BAG’s distributed architecture is that it keeps the distance from the L2 edge small, which is important for shallow-buffer NSF switches. Longer BAG-to-BAG cable distances dictate that we use deep-buffer switches for the BAG role. This provides a large headroom buffer to support lossless congestion control protocols like PFC.
As a technology, BAG is playing an important role in Meta’s next generation of AI infrastructure. By centralizing the interconnection of regional networks, BAG helps enable the gigawatt-scale Prometheus cluster, ensuring seamless, high-capacity networking across tens of thousands of GPUs. This thoughtful design, leveraging modular hardware and resilient topologies, positions BAG to not only meet the demands of Prometheus but also to drive the future innovation and scalability of Meta’s global AI network for years to come.
The post No Display? No Problem: Cross-Device Passkey Authentication for XR Devices appeared first on Engineering at Meta.
Passkeys are a significant leap forward in authentication, offering a phishing-resistant, cryptographically secure alternative to traditional passwords. Generally, the standard cross-device passkey flow, where someone registers or authenticates on a desktop device by approving the action on their nearby mobile device, is done in a familiar way with QR codes scanned by their phone camera. But how can we facilitate this flow for XR devices with a head-mounted display or no screen at all, or for other devices with an inaccessible display like smart home hubs and industrial sensors?
We’ve taken a novel approach to adapting the WebAuthn passkey flow and FIDO’s CTAP hybrid protocol for this unique class of devices that either lack a screen entirely or whose screen is not easily accessible to another device’s camera. Our implementation has been developed and is now broadly available on Meta Quest devices powered by Meta Horizon OS. We hope that this approach can also ensure robust security built on the strength of existing passkey frameworks, without sacrificing usability, for users of a variety of other screenless IoT devices, consumer electronics, and industrial hardware.

The standard cross-device flow relies on two primary mechanisms:
For devices with no display, the QR code method is impossible. Proximity-based discovery is feasible, but initiating the user verification step and confirming the intent without any on-device visual feedback can introduce security and usability risks. People need clear assurance that they are approving the correct transaction on the correct device.
Scanning a QR code sends the authenticator device a command to initiate a hybrid (cross-device) login flow with a nonce that identifies the unauthenticated device client. But if a user has a companion application – like the Meta Horizon app – that uses the same account as the device, we can use that application to pass this same request to the authenticator OS and execute it using general link/intent execution.
We made the flow easy to navigate by using in-app notifications to show users when a login request has been initiated, take them directly into the application, and immediately execute the login request.
For simplicity, we opted to begin the hybrid flow as soon as the application is opened since the user would have had to take some action (clicking the notification or opening the app) to trigger this and there is an additional user verification step in hybrid implementations on iOS and Android.
Here’s how this plays out on a Meta Quest with the Meta Horizon mobile app:

When a passkey login is initiated on the Meta Quest, the headset’s browser locally constructs the same payload that would have been embedded in a QR Code – including a fresh ECDH public key, a session-specific secret, and routing information used later in the handshake. Instead of rendering this information into an image (QR code), the browser encodes it into a FIDO URL (the standard mechanism defined for hybrid transport) that instructs the mobile device to begin the passkey authentication flow.
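The shape of that payload can be illustrated with a simplified sketch. Note the liberties taken: CTAP 2.2 hybrid transport actually encodes a compressed P-256 ECDH key and CBOR data into the FIDO URL, whereas this sketch uses random placeholder bytes and JSON purely to show which fields travel together:

```python
import base64
import json
import os
import secrets

def build_hybrid_payload() -> dict:
    """Stand-in for the data a QR code would normally carry in hybrid transport.

    Field contents are placeholders; the real encoding is CBOR, not JSON.
    """
    return {
        "ecdh_public_key": base64.urlsafe_b64encode(os.urandom(33)).decode(),
        "session_secret": base64.urlsafe_b64encode(secrets.token_bytes(16)).decode(),
        "tunnel_domain": 0,            # routing info used later in the handshake
        "operation": "get_assertion",  # authentication, as opposed to registration
    }

def encode_fido_url(payload: dict) -> str:
    """Pack the payload into a FIDO URL instead of rendering it as a QR image."""
    blob = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("=")
    return "FIDO:/" + blob

fido_url = encode_fido_url(build_hybrid_payload())
```

The key point is that nothing about the payload requires a display: the same bytes that would have been rendered as a QR code can travel over any trusted channel.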
After the FIDO URL is generated, the headset requires a secure and deterministic method for transferring it to the user’s phone. Because the device cannot present a QR code, the system leverages the Meta Horizon app’s authenticated push channel to deliver the FIDO URL directly to the mobile device. When the user selects the passkey option in the login dialog, the headset encodes the FIDO URL as structured data within a GraphQL-based push notification.
The Meta Horizon app, signed in with the same account as the headset, receives this payload and validates the delivery context to ensure it is routed to the correct user.
After the FIDO URL is delivered to the mobile device, the platform’s push service surfaces it as a standard iOS or Android notification indicating that a login request is pending. When the user taps the notification, the operating system routes the deep link to the Meta Horizon app. The app then opens the FIDO URL using the system URL launcher and invokes the operating system passkey interface.
For users who have notifications turned off, launching the Meta Horizon app directly will also trigger a query to the backend for any pending passkey requests associated with the user’s account. If a valid request exists (requests expire after five minutes), the app automatically initiates the same passkey flow by opening the FIDO URL.
Once the FIDO URL is opened, the mobile device begins the hybrid transport sequence, including broadcasting the BLE advertisement, establishing the encrypted tunnel, and producing the passkey assertion. In this flow, the system notification and the app launch path both serve as user consent surfaces and entry points into the standard hybrid transport workflow.
Once the user approves the action on their mobile device, the secure channel is established as per WebAuthn standards. The main difference is the challenge exchange timing:
The inaccessible device then acts as the conduit, forwarding the response to the relying party server to complete the transaction, exactly as a standard display-equipped device would.
This novel implementation successfully bypasses the need for an on-device display in the cross-device flow and still complies with the proximity and other trust challenges that exist today for cross-device passkey login. We hope that our solution paves the way for secure, passwordless authentication across a wider range of different platforms and ecosystems, moving passkeys beyond just mobile and desktop environments and into the burgeoning world of wearable and IoT devices.
We are proud to build on top of the excellent work already done in this area by our peers in the FIDO Alliance and mobile operating systems committed to this work and building a robust and interoperable ecosystem for secure and easy login.
The post Rust at Scale: An Added Layer of Security for WhatsApp appeared first on Engineering at Meta.
WhatsApp provides default end-to-end encryption for over 3 billion people to message securely each and every day. Online security is an adversarial space, and to continue ensuring users can keep messaging securely, we’re constantly adapting and evolving our strategy against cybersecurity threats – all while supporting the WhatsApp infrastructure to help people connect.
For example, WhatsApp, like many other applications, allows users to share media and other types of documents. WhatsApp helps protect users by warning about dangerous attachments like APKs, yet rare and sophisticated malware could be hidden within a seemingly benign file like an image or video. These maliciously crafted files might target unpatched vulnerabilities in the operating system, libraries distributed by the operating system, or the application itself.
To help protect against such potential threats, WhatsApp is increasingly using the Rust programming language, including in our media sharing functionality. Rust is a memory-safe language offering numerous security benefits. We believe that this is the largest rollout globally of any library written in Rust.
To help explain why and how we rolled this out, we should first look back at a key OS-level vulnerability that sent an important signal to WhatsApp around hardening media-sharing defenses.
In 2015, Android devices, and the applications that ran on them, became vulnerable to the “Stagefright” vulnerability. The bug lay in the processing of media files by operating system-provided libraries, so WhatsApp and other applications could not patch the underlying vulnerability. Because it could often take months for people to update to the latest version of their software, we set out to find solutions that would keep WhatsApp users safe, even in the event of an operating system vulnerability.
At that time, we realized that a cross-platform C++ library already developed by WhatsApp to send and consistently format MP4 files (called “wamedia”) could be modified to detect files which do not adhere to the MP4 standard and might trigger bugs in a vulnerable OS library on the receiver side – hence putting a target’s security at risk. We rolled out this check and were able to protect WhatsApp users from the Stagefright vulnerability much more rapidly than by depending on users to update the OS itself.
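The core of such a conformance check is structural: MP4 files are a sequence of boxes, each led by a 4-byte big-endian size and a 4-byte type, and a size that is impossibly small or overruns the buffer is exactly the kind of malformed input that trips vulnerable parsers. The sketch below is a teaching simplification, not wamedia itself (real parsers also handle size == 0, 64-bit sizes, and per-box-type rules):

```python
import struct

def mp4_boxes_wellformed(data: bytes) -> bool:
    """Minimal structural check of top-level MP4 boxes."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            return False  # truncated box header
        size, _box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            return False  # malformed size that could trip a vulnerable parser
        offset += size
    return True

good = struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00" * 4  # one 16-byte box
bad = struct.pack(">I4s", 4, b"ftyp")                            # size < 8-byte header
```

Rejecting files like `bad` at the application layer is what let WhatsApp shield users from an OS-level bug without waiting for an OS update.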
But because media checks run automatically on download and process untrusted inputs, we identified early on that wamedia was a prime candidate for using a memory safe language.

Rather than an incremental rewrite, we developed the Rust version of wamedia in parallel with the original C++ version. We used differential fuzzing and extensive integration and unit tests to ensure compatibility between the two implementations.
Two major hurdles were the initial binary size increase due to bringing in the Rust standard library and the build system support required for the diverse platforms supported by WhatsApp. WhatsApp made a long-term bet to build that support. In the end, we replaced 160,000 lines of C++ (excluding tests) with 90,000 lines of Rust (including tests). The Rust version showed performance and runtime memory usage advantages over the C++ version. Given this success, Rust was fully rolled out to all WhatsApp users and many platforms: Android, iOS, Mac, Web, Wearables, and more. With this positive evidence in hand, memory safe languages will play an ever increasing part in WhatsApp’s overall approach to application and user security.
Over time, we’ve added more checks for non-conformant structures within certain file types to help protect downstream libraries from parser differential exploit attempts. Additionally, we check higher risk file types, even if structurally conformant, for risk indicators. For instance, PDFs are often a vehicle for malware, and more specifically, the presence of embedded files and scripting elements within a PDF further raise risks. We also detect when one file type masquerades as another, through a spoofed extension or MIME type. Finally, we uniformly flag known dangerous file types, such as executables or applications, for special handling in the application UX. Altogether, we call this ensemble of checks “Kaleidoscope.” This system protects people on WhatsApp from potentially malicious unofficial clients and attachments. Although format checks will not stop every attack, this layer of defense helps mitigate many of them.
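One of those checks, detecting a file type masquerading as another, can be sketched by comparing the claimed extension against the file’s leading magic bytes. The table below is deliberately tiny; a production system covers far more types and also inspects MIME headers:

```python
# Illustrative magic-byte table; production systems cover many more types.
MAGIC_BYTES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",  # also the container format for APK and Office files
}

def masquerades(filename: str, head: bytes) -> bool:
    """True when the claimed extension disagrees with the file's magic bytes."""
    ext = filename.rsplit(".", 1)[-1].lower()
    expected = MAGIC_BYTES.get(ext)
    if expected is None:
        return False  # unknown extension: this sketch has no opinion
    return not head.startswith(expected)

# An APK (ZIP container) renamed to .png would be flagged:
suspicious = masquerades("photo.png", b"PK\x03\x04...")
```

Flagged files can then be routed into the special UX handling described above rather than opened by a downstream parser expecting the claimed type.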
Each month, these libraries are distributed to billions of phones, laptops, desktops, watches, and browsers running on multiple operating systems for people on WhatsApp, Messenger, and Instagram. This is the largest ever deployment of Rust code to a diverse set of end-user platforms and products that we are aware of. Our experience speaks to the production-readiness and unique value proposition of Rust on the client-side.
This is just one example of WhatsApp’s many investments in security. It’s why we built default end-to-end encryption for personal messages and calls, offer end-to-end encrypted backups, and use key transparency technology to verify a secure connection, provide additional calling protections, and more.
WhatsApp has a strong track record of being loud when we find issues and working to hold bad actors accountable. For example, WhatsApp reports CVEs for important issues we find in our applications, even if we do not find evidence of exploitation. We do this to give people on WhatsApp the best chance of protecting themselves by seeing a security advisory and updating quickly.
To ensure application security, we first must identify and quantify the sources of risk. We do this through internal and external audits like NCC Group’s public assessment of WhatsApp’s end-to-end encrypted backups, fuzzing, static analysis, supply chain management, and automated attack surface analysis. We also recently expanded our Bug Bounty program to introduce the WhatsApp Research Proxy – a tool that makes research into WhatsApp’s network protocol more effective.
Next, we reduce the identified risk. Like many others in the industry, we found that the majority of the high severity vulnerabilities we published were due to memory safety issues in code written in the C and C++ programming languages. To combat this we invest in three parallel strategies:
WhatsApp has added protections like CFI, hardened memory allocators, safer buffer handling APIs, and more. C and C++ developers have specialized security training, development guidelines, and automated security analysis on their changes. We also have strict SLAs for fixing issues uncovered by the risk identification process.
Rust enabled WhatsApp’s security team to develop a secure, high performance, cross-platform library to ensure media shared on the platform is consistent and safe across devices. This is an important step forward in adding additional security behind the scenes for users and part of our ongoing defense-in-depth approach. Security teams at WhatsApp and Meta are highlighting opportunities for high impact adoption of Rust to interested teams, and we anticipate accelerating adoption of Rust over the coming years.
The post Adapting the Facebook Reels RecSys AI Model Based on User Feedback appeared first on Engineering at Meta.
Delivering personalized video recommendations is a common challenge for user satisfaction and long-term engagement on large-scale social platforms. At Facebook Reels, we’ve been working to close this gap by focusing on “interest matching” – ensuring that the content people see truly aligns with their unique preferences. By combining large-scale user surveys with recent advances in machine learning, we are now able to better understand and model what people genuinely care about, which has led to significant improvements in both recommendation quality and overall user satisfaction.
Traditional recommendation systems often rely on engagement signals – such as likes, shares, and watch time – or heuristics to infer user interests. However, these signals can be noisy and may not fully capture the nuances of what people actually care about or want to see. Models trained only on these signals tend to recommend content that has high short-term user value measured by watch time and engagement but doesn’t capture true interests that are important for long-term utility of the product. To bridge this gap, we needed a more direct way to measure user perception of content relevance. Our research shows that effective interest matching goes beyond simple topic alignment; it also encompasses factors like audio, production style, mood, and motivation. By accurately capturing these dimensions, we can deliver recommendations that feel more relevant and personalized, encouraging people to return to the app more frequently.

To validate our approach, we launched large-scale, randomized surveys within the video feed, asking users, “How well does this video match your interests?” These surveys were deployed across Facebook Reels and other video surfaces, enabling us to collect thousands of in-context responses from users every day. The results revealed that previous interest heuristics only achieved a 48.3% precision in identifying true interests, highlighting the need for a more robust measurement framework.
By weighting responses to correct for sampling and nonresponse bias, we built a comprehensive dataset that accurately reflects real user preferences – moving beyond implicit engagement signals to leverage direct, real-time user feedback.
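The weighting step is a standard inverse-probability correction: responses from users who were unlikely to be sampled or to respond are upweighted so the estimate reflects the full population. The response probabilities below are made up for illustration:

```python
def weighted_mean(responses: list[tuple[float, float]]) -> float:
    """Inverse-probability-weighted mean of survey scores.

    Each tuple is (score, response_probability): the estimated chance that a
    user like this one was both sampled and chose to respond.
    """
    numerator = sum(score / p for score, p in responses)
    denominator = sum(1.0 / p for _, p in responses)
    return numerator / denominator

# Toy data: heavy users (p=0.8) respond often; light users (p=0.2) rarely do.
data = [(4.0, 0.8), (5.0, 0.8), (2.0, 0.2)]
corrected = weighted_mean(data)              # pulled toward underrepresented users
naive = sum(s for s, _ in data) / len(data)  # would overweight heavy responders
```

Without the correction, the dataset would systematically reflect the preferences of the users most likely to answer surveys rather than the user base as a whole.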

Each day, a proportion of user viewing sessions on the platform is randomly chosen to display a single-question survey asking, “To what extent does this video match your interests?” on a 1-5 scale. The survey aims to gather real-time feedback from users about the content they have just viewed.
The main candidate ranking model used by the platform is a large multi-task, multi-label model. We trained a lightweight UTIS alignment model layer on the collected user survey responses, using the main model’s existing predictions as input features. The survey responses used to train our model were binarized, which simplifies modeling and denoises variance in responses. In addition, new features were engineered to capture user behavior, content attributes, and interest signals, with an objective function that optimizes prediction of the extent to which content matches users’ interests.
The UTIS model outputs the probability that a user is satisfied with a video, and is designed to be interpretable, allowing us to understand the factors contributing to users’ interest matching experience.
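The shape of such an alignment layer can be sketched as a single logistic unit over the main model’s task predictions, trained on the binarized survey labels. The feature names, weights, and bias below are invented for illustration, not Meta’s actual model:

```python
import math

def binarize(score: int, threshold: int = 4) -> int:
    """Collapse a 1-5 survey answer to interested (>= threshold) vs. not."""
    return 1 if score >= threshold else 0

def utis_probability(main_preds: dict[str, float],
                     weights: dict[str, float], bias: float) -> float:
    """A logistic unit over the main ranking model's task predictions."""
    z = bias + sum(weights[name] * value for name, value in main_preds.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical main-model outputs for one (user, video) pair:
preds = {"p_like": 0.30, "p_watch_complete": 0.65, "p_share": 0.05}
weights = {"p_like": 1.2, "p_watch_complete": 2.0, "p_share": 0.8}
p_interest_match = utis_probability(preds, weights, bias=-1.5)
```

Because the layer is a thin, linear-in-features function of named predictions, inspecting the learned weights shows which engagement signals actually track stated interest, which is what makes the model interpretable.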

We have experimented with and deployed several use cases of the UTIS model in our ranking funnel, all of which showed successful tier 0 user retention metric improvements:
The UTIS model score is now one of the inputs to our ranking system. Videos predicted to be of high interest receive a modest boost, while those with low predicted interest are demoted. This approach has led to:
Since launching this approach, we’ve observed robust offline and online performance.
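The boost-and-demote step described above can be sketched as a simple multiplicative adjustment on the ranking score. The thresholds and multipliers are illustrative; production values are tuned against retention metrics:

```python
def adjust_rank_score(base_score: float, utis_p: float,
                      boost: float = 1.10, demote: float = 0.90,
                      hi: float = 0.7, lo: float = 0.3) -> float:
    """Modest multiplicative boost or demotion driven by the UTIS probability."""
    if utis_p >= hi:
        return base_score * boost   # high predicted interest: small boost
    if utis_p <= lo:
        return base_score * demote  # low predicted interest: small demotion
    return base_score               # middle band: leave the score alone

adjusted = adjust_rank_score(base_score=100.0, utis_p=0.8)
```

Keeping the adjustment modest lets the interest signal shift rankings without overriding the engagement predictions the main model was trained on.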
By integrating survey-based measurement with machine learning, we are creating a more engaging and personalized experience – delivering content on Facebook Reels that feels truly tailored to each user and encourages repeat visits. While survey-driven modeling has already improved our recommendations, there remain important opportunities for improvement, such as better serving users with sparse engagement histories, reducing bias in survey sampling and delivery, further personalizing recommendations for diverse user cohorts and improving the diversity of recommendations. To address these challenges and continue advancing relevance and quality, we are also exploring advanced modeling techniques, including large language models and more granular user representations.
Improve the Personalization of Large-Scale Ranking Systems by Integrating User Survey Feedback
The post CSS at Scale With StyleX appeared first on Engineering at Meta.
Build a large enough website with a large enough codebase, and you’ll eventually find that CSS presents challenges at scale. It’s no different at Meta, which is why we open-sourced StyleX, a solution for CSS at scale. StyleX combines the ergonomics of CSS-in-JS with the performance of static CSS. It allows atomic styling of components while deduplicating definitions to reduce bundle size and exposes a simple API for developers.
StyleX has become the standard at companies like Figma and Snowflake. Here at Meta, it’s the standard styling system across Facebook, Instagram, WhatsApp, Messenger, and Threads.
On this episode of the Meta Tech Podcast, meet Melissa, a software engineer at Meta and one of StyleX’s maintainers. Pascal Hartig talks to her about all things StyleX—its origins, how open source has been a force multiplier for the project, and what it’s like interacting with large companies across the industry as they’ve adopted StyleX.
Download or listen to the episode below:
You can also find the episode wherever you get your podcasts, including:
The Meta Tech Podcast is a podcast brought to you by Meta, where we highlight the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.
Send us feedback on Instagram, Threads, or X.
And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.
The post Python Typing Survey 2025: Code Quality and Flexibility As Top Reasons for Typing Adoption appeared first on Engineering at Meta.
The survey was initially distributed on official social media accounts by the survey creators, and subsequently shared organically across further platforms including Reddit, email newsletters, Mastodon, LinkedIn, Discord, and Twitter. When respondents were asked which platform they heard about the survey from, Reddit emerged as the most effective channel, but significant engagement also came from email newsletters and Mastodon, reflecting the diverse spaces where Python developers connect and share knowledge.
The respondent pool was predominantly composed of developers experienced with Python and typing. Nearly half reported over a decade of Python experience, and another third had between five and 10 years. While there was representation from newcomers, the majority of participants brought substantial expertise to their responses. Experience with type hints was similarly robust, with most respondents having used them for several years and only a small minority indicating no experience with typing.
The survey results reveal that Python’s type hinting system has become a core part of development for most engineers. An impressive 86% of respondents report that they “always” or “often” use type hints in their Python code, a figure that remains consistent with last year’s Typed Python survey.
For the first time this year the survey also asked participants to indicate how many years of experience they have with Python and with Python typing. We found that adoption of typing is similar across all experience levels, but there are some interesting nuances:

Overall, the data shows that type hints are widely embraced by the Python community, with strong support from engineers at all experience levels. However, we should note there may be some selection bias at play here, as it’s possible developers who are more familiar with types and use them more often are also more likely to be interested in taking a survey about it.
When asked what developers loved about the Python type system there were some mixed reactions, with a number of responses just stating, “nothing” (note this was an optional question). This indicates the presence of some strong negative opinions towards the type system among a minority of Python users. The majority of responses were positive, with the following themes emerging prominently:

In addition to assessing positive sentiment towards Python typing, we also asked respondents what challenges and pain points they face. With over 800 responses to the question, “What is the hardest part about using the Python type system?” the following themes were identified:
A little less than half of respondents had suggestions for what they thought was missing from the Python type system, the most commonly requested features being:
The developer tooling landscape for Python typing continues to evolve, with both established and emerging tools shaping how engineers work.
Mypy remains the most widely used type checker, with 58% of respondents reporting using it. While this represents a slight dip from 61% in last year’s survey, Mypy still holds a dominant position in the ecosystem. At the same time, new Rust-based type checkers like Pyrefly, Ty, and Zuban are quickly gaining traction, now used by over 20% of survey participants collectively.

When it comes to development environments, VS Code leads the pack as the most popular IDE among Python developers, followed by PyCharm and (Neo)vim/vim. The use of type checking tools within IDEs also mirrors the popularity of the IDEs themselves, with VS Code’s default (Pylance/Pyright) and PyCharm’s built-in support being the first and third most popular options, respectively.
When it comes to learning about Python typing and getting help, developers rely on a mix of official resources, community-driven content, and AI-powered tools, a similar learning landscape to what we saw in last year’s survey.

Official documentation remains the go-to resource for most developers. The majority of respondents reported learning about Python typing through the official docs, with 865 citing it as their primary source for learning and 891 turning to it for help. Python’s dedicated typing documentation and type checker-specific docs are also heavily used, showing that well-maintained, authoritative resources are still highly valued.
Blog posts have climbed in popularity, now ranking as the second most common way developers learn about typing, up from third place last year. Online tutorials, code reviews, and YouTube videos also play a significant role.
Community platforms are gaining traction as sources for updates and new features. Reddit, in particular, has become a key channel for discovering new developments in the type system, jumping from fifth to third place as a source for news. Email newsletters, podcasts, and Mastodon are also on the rise.
Large language models (LLMs) are now a notable part of the help-seeking landscape. Over 400 respondents reported using LLM chat tools, and nearly 300 use in-editor LLM suggestions when working with Python typing.
The 2025 Python Typing Survey highlights the Python community’s sustained adoption of typing features and tools to support their usage. It also points to clear opportunities for continued growth and improvement, including:
To learn more about Meta Open Source, visit our website, subscribe to our YouTube channel, or follow us on Facebook, Threads, X, Bluesky and LinkedIn.
This survey ran from 29th Aug to 16th Sept 2025 and received 1,241 responses in total.
Thanks to everyone who participated! The Python typing ecosystem continues to evolve, and your feedback helps shape its future.
Also, special thanks to the JetBrains PyCharm team for providing the graphics used in this piece.
The post DrP: Meta’s Root Cause Analysis Platform at Scale appeared first on Engineering at Meta.
DrP is a root cause analysis (RCA) platform designed by Meta to programmatically automate the investigation process, significantly reducing the mean time to resolve (MTTR) for incidents and alleviating on-call toil.
Today, DrP is used by over 300 teams at Meta, running 50,000 analyses daily, and has been effective in reducing MTTR by 20-80%.
By understanding DrP and its capabilities, we can unlock new possibilities for efficient incident resolution and improved system reliability.
DrP is an end-to-end platform that automates the investigation process for large-scale systems. It addresses the inefficiencies of manual investigations, which often rely on outdated playbooks and ad-hoc scripts. These traditional methods can lead to prolonged downtimes and increased on-call toil as engineers spend countless hours triaging and debugging incidents.
DrP offers a comprehensive solution by providing an expressive and flexible SDK to author investigation playbooks, known as analyzers. These analyzers are executed by a scalable backend system, which integrates seamlessly with mainstream workflows such as alerts and incident management tools. Additionally, DrP includes a post-processing system to automate actions based on investigation results, such as mitigation steps.

DrP’s key components include:

The process of creating automated playbooks, or analyzers, begins with the DrP SDK. Engineers enumerate the investigation steps, listing inputs and potential paths to isolate problem areas. The SDK provides APIs and libraries to codify these workflows, allowing engineers to capture all required input parameters and context in a type-safe manner.
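The DrP SDK itself isn’t shown in the post, but the idea — all inputs captured up front in a type-safe way, with investigation steps expressed as code — can be illustrated with a toy analyzer. Everything below (the `Analyzer` interface, the record names, the deploy-time data source) is a hypothetical sketch, not DrP’s real SDK:

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a typed analyzer; the Analyzer interface,
// record names, and deploy-time data source are assumptions, not DrP's SDK.
interface Analyzer<I, R> {
    R analyze(I input);
}

// Typed input: all required parameters and context are captured up front.
record LatencyInput(String service, long windowStartMs, long windowEndMs) {}

record Finding(String suspect, String evidence) {}

// One investigation step codified: did a deploy land inside the incident window?
class LatencyAnalyzer implements Analyzer<LatencyInput, List<Finding>> {
    private final Map<String, Long> deployTimes; // stand-in for a real data source

    LatencyAnalyzer(Map<String, Long> deployTimes) {
        this.deployTimes = deployTimes;
    }

    @Override
    public List<Finding> analyze(LatencyInput in) {
        Long t = deployTimes.get(in.service());
        if (t != null && t >= in.windowStartMs() && t <= in.windowEndMs()) {
            return List.of(new Finding("deploy",
                "deploy at t=" + t + " is inside the incident window"));
        }
        return List.of(); // nothing suspicious found by this step
    }
}
```

Because the input is a plain typed value, an analyzer like this can be invoked identically from a UI, a CLI, or an alert trigger.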
Once created, analyzers are tested and sent for code review. DrP offers automated backtesting integrated into code review tools, ensuring high-quality analyzers before deployment.
In production, analyzers integrate with tools like UI, CLI, alerts, and incident management systems. Analyzers can automatically trigger upon alert activation, providing immediate results to on-call engineers and improving response times. The DrP backend manages a queue for requests and a worker pool for secure execution, with results returning asynchronously.
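The backend shape described here — a queue of requests feeding a worker pool, with results returned asynchronously — can be sketched minimally with standard Java concurrency primitives. Class and method names are illustrative assumptions, not DrP’s actual backend:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Minimal model of a request queue plus worker pool with async results.
// This is a sketch of the general pattern, not DrP's implementation.
class AnalysisBackend {
    // The executor's internal queue holds pending requests; the fixed pool
    // bounds how many analyzers execute concurrently.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Submit an analyzer run; the caller gets the result asynchronously.
    CompletableFuture<String> submit(Supplier<String> analyzer) {
        return CompletableFuture.supplyAsync(analyzer, workers);
    }

    void shutdown() { workers.shutdown(); }
}
```

An alert firing would translate into one `submit` call, with the future’s completion feeding results back to the on-call engineer.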
DrP has demonstrated significant improvements in reducing MTTR across various teams and use cases. By automating manual investigations, DrP enables faster triage and mitigation of incidents, leading to quicker system recovery and improved availability.
The automation provided by DrP reduces the on-call effort during investigations, saving engineering hours and reducing on-call fatigue. By automating repetitive and time-consuming steps, DrP allows engineers to focus on more complex tasks, improving overall productivity.
DrP has been successfully deployed at scale at Meta, covering over 300 teams and 2000 analyzers, executing 50,000 automated analyses per day. Its integration into mainstream workflows, such as alerting systems, has facilitated widespread adoption and demonstrated its value in real-world scenarios.
Looking ahead, DrP aims to evolve into an AI-native platform, playing a central role in advancing Meta’s broader AI4Ops vision and enabling more powerful, automated investigations. This transformation will deliver more accurate and insightful analysis results, while streamlined ML algorithms, SDKs, UI, and integrations will make analyzers easier to author and run.
DrP: Meta’s Efficient Investigations Platform at Scale
We wish to thank the contributors to this effort across many teams throughout Meta.
Team – Eduardo Hernandez, Jimmy Wang, Akash Jothi, Kshitiz Bhattarai, Shreya Shah, Neeru Sharma, Alex He, Juan-Pablo E, Oswaldo R, Vamsi Kunchaparthi, Daniel An, Rakesh Vanga, Ankit Agarwal, Narayanan Sankaran, Vlad Tsvang, Khushbu Thakur, Srikanth Kamath, Chris Davis, Rohit JV, Ohad Yahalom, Bao Nguyen, Viraaj Navelkar, Arturo Lira, Nikolay Laptev, Sean Lee, Yulin Chen
Leadership – Sanjay Sundarajan, John Ehrhardt, Ruben Badaro, Nitin Gupta, Victoria Dudin, Benjamin Renard, Gautam Shanbhag, Barak Yagour, Aparna Ramani
The post DrP: Meta’s Root Cause Analysis Platform at Scale appeared first on Engineering at Meta.
Kenan and Emanuel, from Meta’s Wearables org, join Pascal Hartig on the Meta Tech Podcast to talk about the unique challenges of designing game-changing wearable technology, from its novel display technology to emerging UI patterns for display glasses.
You’ll also learn what particle physics and hardware design have in common and how to celebrate even the incremental wins in a fast-moving culture.
Download or listen to the episode below:
You can also find the episode wherever you get your podcasts, including:
The Meta Tech Podcast is brought to you by Meta and highlights the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.
Send us feedback on Instagram, Threads, or X.
And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.
The post How We Built Meta Ray-Ban Display: From Zero to Polish appeared first on Engineering at Meta.
Some functions within operating systems, or provided by third parties, come with a risk of misuse that could compromise security. To mitigate this, we wrap or replace these functions with our own secure-by-default frameworks. These frameworks play an important role in helping our security and software engineers maintain and improve the security of our codebases while preserving developer speed.
But implementing these frameworks comes with practical challenges, like design tradeoffs. Building a secure framework on top of Android APIs, for example, requires a thoughtful balance between security, usability, and maintainability.
With the emergence of AI-driven tools and automation, we can scale the adoption of these frameworks across Meta’s large codebase. AI can assist in identifying insecure usage patterns, suggesting or automatically applying secure framework replacements, and continuously monitoring compliance. This not only accelerates migration but also ensures consistent security enforcement at scale.
Together, these strategies empower our development teams to ship well-secured software efficiently, safeguarding user data and trust while maintaining high developer productivity across Meta’s vast ecosystem.
Designing secure-by-default frameworks for use by a large number of developers shipping vastly different features across multiple apps is an interesting challenge. There are a lot of competing concerns such as discoverability, usability, maintainability, performance, and security benefits.
Practically speaking, developers only have a finite amount of time to code each day. The goal of our frameworks is to improve product security while being largely invisible and friction-free to avoid slowing developers down unnecessarily. This means that we have to correctly balance all those competing concerns discussed above. If we strike the wrong balance, some developers could avoid using our frameworks, which could reduce our ability to prevent security vulnerabilities.
For example, if we design a framework that improves product security in one area but introduces three new concepts and requires developers to provide five additional pieces of information per call site, some app developers may try to find a way around using them. Conversely, if we provide these same frameworks that are trivially easy to use, but they consume noticeable amounts of CPU and RAM, some app developers may, again, seek ways around using them, albeit for different reasons.
These examples might seem a bit obvious, but they are taken from real experiences over the last 10+ years developing ~15 secure-by-default frameworks targeting Android and iOS. Over that time, we’ve established some best practices for designing and implementing these new frameworks.
To the maximum extent possible, an effective framework should embody the following principles:
Now that we’ve looked at the design philosophy behind our frameworks, let’s look at one of our most widely used Android security frameworks, SecureLinkLauncher.
SecureLinkLauncher (SLL) is one of our widely-used secure frameworks. SLL is designed to prevent sensitive data from spilling through the Android intents system. It exemplifies our approach to secure-by-default frameworks by wrapping native Android intent launching methods with scope verification and security checks, preventing common vulnerabilities such as intent hijacking without sacrificing developer velocity or familiarity.
The system consists of intent senders and intent receivers. SLL is targeted to intent senders.
SLL offers a semantic API that closely mirrors the familiar Android Context API for launching intents, including methods like startActivity() and startActivityForResult(). Instead of invoking the potentially insecure Android API directly, such as context.startActivity(intent);, developers use SecureLinkLauncher with a similar method call pattern, for example, SecureLinkLauncher.launchInternalActivity(intent, context);. Internally, SecureLinkLauncher delegates to the stable Android startActivity() API, ensuring that all intent launches are securely verified and protected by the framework.
public void launchInternalActivity(Intent intent, Context context) {
  // Verify that the target activity is internal (same package)
  if (!isInternalActivity(intent, context)) {
    throw new SecurityException("Target activity is not internal");
  }
  // Delegate to Android's startActivity to launch the intent
  context.startActivity(intent);
}
Similarly, instead of calling context.startActivityForResult(intent, code); directly, developers use SecureLinkLauncher.launchInternalActivityForResult(intent, code, context);. SecureLinkLauncher (SLL) wraps Android’s startActivity() and related methods, enforcing scope verification before delegating to the native Android API. This approach provides security by default while preserving the familiar Android intent launching semantics.
One of the most common ways that data is spilled through intents is incorrect targeting of the intent. As an example, the following intent isn’t targeting a specific package. This means it can be received by any app with a matching <intent-filter>. While the developer’s intention might be that their Intent ends up in the Facebook app based on the URL, the reality is that any app, including a malicious application, could add an <intent-filter> that handles that URL and receive the intent.
Intent intent = new Intent(FBLinks.PREFIX + "profile");
intent.putExtra(SECRET_INFO, userId);
startActivity(intent);
// startActivity can't guarantee which app will receive this implicit intent
In the example below, SLL ensures that the intent is directed to one of the family apps, as specified by the developer’s scope for implicit intents. Without SLL, these intents can resolve to both family and non-family apps, potentially exposing SECRET_INFO to third-party or malicious apps on the user’s device. By enforcing this scope, SLL can prevent such information leaks.
SecureLinkLauncher.launchFamilyActivity(intent, context);
// launchFamilyActivity would make sure intent goes to the meta family apps
In a typical Android environment, two scopes – internal and external – might seem sufficient for handling intents within the same app and between different apps. However, Meta’s ecosystem is unique, comprising multiple apps such as Facebook, Instagram, Messenger, WhatsApp, and their variants (e.g., WhatsApp Business). The complexity of inter-process communication between these apps demands more nuanced control over intent scoping. To address this need, SLL provides a more fine-grained approach to intent scoping, with scopes tailored to specific use cases, such as same-app (internal) launches and launches between Meta family apps.
By leveraging these scopes, developers can ensure that sensitive data is shared securely and intentionally within the Meta ecosystem, while also protecting against unintended or malicious access. SLL’s fine-grained intent scoping capabilities, which are built upon the secure-by-default framework principles discussed above, empower developers to build more robust and secure applications that meet the unique demands of Meta’s complex ecosystem.
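To make the idea of fine-grained scoping concrete, here is a toy model of a scope check. The scope names (beyond the internal and family scopes shown above) and the package allow-list are assumptions for illustration; this is not SLL’s real implementation:

```java
import java.util.Set;

// Illustrative model of fine-grained intent scopes; this is a sketch of the
// concept, not SecureLinkLauncher's actual API or logic.
enum LaunchScope { INTERNAL, FAMILY, EXTERNAL }

class ScopePolicy {
    // Hypothetical allow-list of Meta family app packages.
    private static final Set<String> FAMILY = Set.of(
        "com.facebook.katana", "com.instagram.android", "com.whatsapp");

    static boolean isAllowed(LaunchScope scope, String callerPkg, String targetPkg) {
        return switch (scope) {
            case INTERNAL -> callerPkg.equals(targetPkg); // same app only
            case FAMILY   -> FAMILY.contains(targetPkg);  // any Meta family app
            case EXTERNAL -> true;                        // explicitly opted out
        };
    }
}
```

A launcher built on a check like this can refuse to send an intent whose resolved target falls outside the declared scope, which is the property SLL enforces.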
Adopting these frameworks in a large codebase is non-trivial. The main complexity is choosing the correct scope, as that choice relies on information that is not readily available at existing call sites. While one could imagine a deterministic analysis attempting to infer the scope based on dataflows, that would be a large undertaking. Furthermore, it would likely have some precision-scalability trade-off.
Instead, we explored using Generative AI for this case. AI can read the surrounding code and attempt to infer the scope based on variable names and comments surrounding the call site. While this approach isn’t always perfect, it doesn’t need to be. It just needs to provide good enough guesses, such that code owners can one-click accept suggested patches.
If the patches are correct in most cases, this is a big timesaver that enables efficient adoption of the framework. This complements our recent work on AutoPatchBench, a benchmark designed to evaluate AI-powered patch generators that leverage large language models (LLMs) to automatically recommend and apply security patches. Secure-by-default frameworks are a great example of the kinds of code modifications that an automatic patching system can apply to improve the security of a code base.
We’ve built a framework leveraging Llama as the core technology, which takes locations in the codebase that we want to migrate and suggests patches for code owners to accept:

The AI workflow starts with a call site we want to migrate including its file path and line number. The location is used to extract a code snippet from the code base. This means opening the file where the call site is present, copying 10-20 lines before and after the call site location, and pasting this into the prompt template that gives general instructions as to how to perform the migration. This description is very similar to what would be written as an onboarding guide to the framework for human engineers.
The prompt is then provided to a Llama model (llama4-maverick-17b-128e-instruct). The model is asked to output two things: the modified code snippet, where the call site has been migrated; and, optionally, some actions (like adding an import to the top of a file). The main purpose of actions is to work around a limitation of this approach: not all required code changes are local to the code snippet. Actions enable the model’s fix to reach outside the snippet for some limited, deterministic changes. This is useful for adding imports or dependencies, which are rarely local to the code snippet but are necessary for the code to compile. The code snippet is then inserted back into the code base and any actions are applied.
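The extraction step described above — open the file, copy a window of lines around the call site, and paste it into a prompt template of general migration instructions — can be sketched as follows. The window size and template text are illustrative assumptions, not the real pipeline:

```java
import java.util.List;

// Rough sketch of the snippet-extraction step: given a file's lines and a
// 1-indexed call-site line, take a fixed window of context around it and
// paste it into a prompt template.
class SnippetExtractor {
    static String extract(List<String> fileLines, int callSiteLine, int window) {
        int from = Math.max(0, callSiteLine - 1 - window);       // convert to 0-indexed
        int to = Math.min(fileLines.size(), callSiteLine + window);
        return String.join("\n", fileLines.subList(from, to));
    }

    static String buildPrompt(String instructions, String snippet) {
        // General migration instructions, followed by the code to migrate.
        return instructions + "\n\n---\n" + snippet + "\n---\n";
    }
}
```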
Finally, we perform a series of validations on the code base. We run each validation with and without the AI changes and report only the difference.
If any errors arise during validation, their error messages are included in the prompt (along with the “fixed” code snippet) and the AI is asked to try again. We repeat this loop up to five times and give up if no successful fix is produced. If validation succeeds, we submit a patch for human review.
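The fix-and-retry loop above can be sketched like this. The `Function` below stands in for the combined model call and build/lint validation, which are not public; all names are illustrative:

```java
import java.util.Optional;
import java.util.function.Function;

// Sketch of the fix-and-retry loop: ask the model for a patch, validate it,
// and feed validation errors back into the next prompt. Up to five attempts,
// then give up.
class RetryLoop {
    record Attempt(String prompt) {}
    record Result(String patch, String error) {
        boolean ok() { return error == null; }
    }

    static Optional<String> run(Function<Attempt, Result> modelAndValidate, String basePrompt) {
        String prompt = basePrompt;
        for (int i = 0; i < 5; i++) {
            Result r = modelAndValidate.apply(new Attempt(prompt));
            if (r.ok()) {
                return Optional.of(r.patch()); // validation passed: send for human review
            }
            // Include the failed patch and its error message in the next prompt.
            prompt = basePrompt + "\nPrevious attempt:\n" + r.patch()
                    + "\nError:\n" + r.error();
        }
        return Optional.empty(); // no successful fix after five tries
    }
}
```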
By adhering to core design principles such as providing an API that closely resembles existing OS patterns, relying solely on public and stable OS APIs, and designing frameworks that cover broad user bases rather than niche use cases, developers can create robust, secure-by-default features that integrate seamlessly into existing codebases.
These same design principles help us leverage AI to smoothly adopt frameworks at scale. While there are still challenges around the accuracy of generated code – for example, the AI choosing the incorrect scope or using incorrect syntax – the internal feedback loop allows the LLM to automatically move past easily solvable problems without human intervention, increasing scalability and reducing developer frustration.
Internally, this project helped prove that AI can be impactful for adopting security frameworks across a diverse codebase in a way that is minimally disruptive to our developers. There are now projects tackling similar problems across a range of codebases and languages – including C/C++ – using diverse models and validation techniques. We expect this trend to continue and accelerate in 2026 as developers become more comfortable with state-of-the-art AI tools and the quality of code they are capable of producing.
As our codebase grows and security threats become more sophisticated, the combination of thoughtful framework design and intelligent automation will be essential to protecting user data and maintaining trust at scale.
The post How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks appeared first on Engineering at Meta.