Apple Machine Learning Research
https://machinelearning.apple.com
Apple machine learning teams are engaged in state-of-the-art research in machine learning and artificial intelligence. Learn about the latest advancements.
Wed, 18 Feb 2026 00:00:00 GMT

Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
https://machinelearning.apple.com/research/query-auto-completion
Query Auto-Completion (QAC) is a critical feature of modern search systems that improves search efficiency by suggesting completions as users type. However, existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have poor long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. We present a unified framework that reformulates QAC as end-to-end list generation through Retrieval-Augmented Generation (RAG) and multi-objective Direct Preference Optimization (DPO).
Our approach…
Wed, 18 Feb 2026 00:00:00 GMT

Models That Prove Their Own Correctness
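The retrieve-then-generate idea from the Query Auto-Completion entry above can be made concrete with a toy retrieval-augmented completion loop. Everything here is hypothetical — the query log, the frequency ranking, and the stubbed generator — and the paper's list generation and multi-objective DPO alignment stage are not shown:

```python
from collections import Counter

# Hypothetical query log standing in for the retrieval index.
QUERY_LOG = Counter({
    "weather today": 50,
    "weather tomorrow": 30,
    "weather in paris": 12,
    "wealth management": 8,
    "web browser download": 5,
})

def retrieve(prefix, k=3):
    """Retrieve the k most frequent logged queries extending the prefix."""
    hits = [(q, n) for q, n in QUERY_LOG.items() if q.startswith(prefix)]
    hits.sort(key=lambda qn: -qn[1])
    return [q for q, _ in hits[:k]]

def generate_completions(prefix, retrieved, k=3):
    """Stand-in for the generator: a real system would prompt an LLM with
    `retrieved` as grounding context; here we fall back to the retrieved
    evidence itself, the degenerate RAG case that cannot hallucinate
    queries absent from the log."""
    return retrieved[:k] if retrieved else [prefix]

completions = generate_completions("weather", retrieve("weather"))
# completions == ["weather today", "weather tomorrow", "weather in paris"]
```

Grounding the emitted list in retrieved evidence is what trades long-tail coverage against the hallucination risk the abstract describes.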
https://machinelearning.apple.com/research/correctness
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train Self-Proving models that prove the correctness of their output to a verification algorithm V via an Interactive Proof. Self-Proving models satisfy that, with high probability over an input sampled from a given distribution, the model generates a correct output and successfully proves its correctness to V. The…
Tue, 17 Feb 2026 00:00:00 GMT

Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
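A one-round special case of the Self-Proving idea from the entry above can be illustrated with a classic certificate: a "model" that outputs gcd(x, y) together with Bézout coefficients, which a verifier V checks in constant time. This is an illustrative stand-in, not the paper's trained models or its multi-round Interactive Proofs:

```python
def prover(x, y):
    """'Model' output: an answer g plus a proof of correctness — Bezout
    coefficients (a, b) with a*x + b*y = g, via the extended Euclidean
    algorithm."""
    a0, a1, b0, b1, r0, r1 = 1, 0, 0, 1, x, y
    while r1:
        q = r0 // r1
        a0, a1 = a1, a0 - q * a1
        b0, b1 = b1, b0 - q * b1
        r0, r1 = r1, r0 - q * r1
    return r0, (a0, b0)

def verifier(x, y, g, cert):
    """V accepts iff the certificate proves g = gcd(x, y): g divides both
    inputs, and a*x + b*y = g rules out any larger common divisor."""
    a, b = cert
    return g > 0 and x % g == 0 and y % g == 0 and a * x + b * y == g

answer, proof = prover(54, 24)
accepted = verifier(54, 24, answer, proof)  # True: correct output, valid proof
rejected = verifier(54, 24, 8, proof)       # False: a wrong answer cannot be certified
```

The guarantee mirrors the abstract: on any input where V accepts, the output is provably correct, not just correct on average.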
https://machinelearning.apple.com/research/ferret-ui
Developing autonomous agents that effectively interact with Graphical User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Utilizing techniques optimized for developing small models, we build our 3B Ferret-UI Lite agent through curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool-use, and…
Tue, 17 Feb 2026 00:00:00 GMT

Asynchronous Verified Semantic Caching for Tiered LLM Architectures
https://machinelearning.apple.com/research/semantic-caching
Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployments typically use a tiered static-dynamic design: a static cache of curated, offline vetted responses mined from logs, backed by a dynamic cache populated online. In practice, both tiers are commonly governed by a single embedding similarity threshold, which induces a hard tradeoff: conservative thresholds miss safe reuse opportunities, while aggressive thresholds risk serving semantically incorrect…
Mon, 16 Feb 2026 00:00:00 GMT

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
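The single-threshold tradeoff described in the semantic-caching entry above can be demonstrated in a few lines. The bag-of-words "embedding", the cached entries, and the threshold values are all toy assumptions; a production system would use a learned encoder and the paper's verification machinery:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real deployment uses a learned encoder."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class TieredCache:
    """Static tier of curated, offline-vetted responses backed by a dynamic
    tier populated online — both gated by one similarity threshold, the
    design whose tradeoff the entry analyzes."""

    def __init__(self, static_entries, threshold):
        self.static = [(embed(q), r) for q, r in static_entries.items()]
        self.dynamic = []
        self.threshold = threshold

    def lookup(self, query):
        q = embed(query)
        best, best_sim = None, 0.0
        for emb, resp in self.static + self.dynamic:
            sim = cosine(emb, q)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

    def insert(self, query, response):
        self.dynamic.append((embed(query), response))

cache = TieredCache({"capital of france": "Paris"}, threshold=0.9)
missed = cache.lookup("capital city of france")  # None: conservative, safe but wasteful
cache.threshold = 0.8
reused = cache.lookup("capital city of france")  # "Paris": aggressive, reuses the hit
```

The two lookups differ only in the threshold: the paraphrase scores cosine ≈ 0.87, so 0.9 forgoes a safe reuse while 0.8 serves it — and a threshold loose enough would eventually serve semantically incorrect responses, which is the gap the paper's verification addresses.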
https://machinelearning.apple.com/research/completed-hyperparameter
Hyperparameter tuning can dramatically impact training stability and final performance of large-scale models. Recent works on neural network parameterisations, such as μP, have enabled transfer of optimal global hyperparameters across model sizes. These works propose an empirical practice of searching for optimal global base hyperparameters at a small model size and transferring them to a large size. We extend these works in two key ways. To handle scaling along the most important scaling axes, we propose the Complete(d) Parameterisation that unifies scaling in width and depth — using an adaptation of…
Fri, 13 Feb 2026 00:00:00 GMT

A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation
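The μP-style width transfer that the hyperparameter-transfer entry above builds on can be sketched as a per-module learning-rate rule. The factors below follow the commonly cited Adam recipe (matrix-like parameters scale as 1/width) and are illustrative only; exact conventions vary by optimizer and implementation, and the paper's Complete(d) Parameterisation extends such rules to depth, batch size, and duration:

```python
def mup_learning_rates(base_lr, base_width, width):
    """muP-style width transfer for Adam: hidden and readout (matrix-like)
    parameters scale their learning rate by base_width/width, while
    embeddings and biases keep the base rate.  Illustrative factors, not a
    definitive recipe."""
    scale = base_width / width
    return {
        "embedding": base_lr,        # width-independent
        "hidden": base_lr * scale,   # shrinks as the model widens
        "readout": base_lr * scale,
    }

small = mup_learning_rates(1e-2, base_width=256, width=256)   # tuned at small width
large = mup_learning_rates(1e-2, base_width=256, width=4096)  # transferred, not re-tuned
```

The point of the parameterisation is that the base learning rate found at width 256 remains near-optimal at width 4096 once rescaled, so the expensive sweep happens only at the small size.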
https://machinelearning.apple.com/research/controlled-experimentation
What research can be pursued with small models trained to complete true programs? Typically, researchers study program synthesis via large language models (LLMs) which introduce issues such as knowing what is in or out of distribution, understanding fine-tuning effects, understanding the effects of tokenization, and higher demand on compute and storage to carry out experiments. We present a system called Cadmus which includes an integer virtual machine (VM), a dataset composed of true programs of diverse tasks, and an autoregressive transformer model that is trained for under $200 of compute…
Fri, 13 Feb 2026 00:00:00 GMT

Faster Rates For Federated Variational Inequalities
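A miniature integer VM in the spirit of the Cadmus entry above illustrates the setup: programs are short opcode sequences, and the VM provides ground-truth execution against which completions can be checked. The opcodes and the program here are invented for illustration; the real system's VM, dataset, and model are far richer:

```python
def run(program, fuel=1000):
    """Minimal stack-based integer VM (invented here; far smaller than
    Cadmus).  A 'true program' is one that halts within its fuel budget
    and leaves a well-defined stack."""
    stack, pc = [], 0
    while pc < len(program) and fuel > 0:
        op, *args = program[pc]
        fuel -= 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "HALT":
            break
        pc += 1
    return stack

# (2 + 3) * 4, expressed as a token sequence a small autoregressive model
# could be trained to complete, with the VM as the ground-truth executor.
result = run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("HALT",)])
# result == [20]
```

Because every token sequence either executes or fails deterministically, distribution membership and tokenization effects become fully controllable — the experimental control the abstract argues LLM-based setups lack.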
https://machinelearning.apple.com/research/faster-rates
In this paper, we study federated optimization for solving stochastic variational inequalities (VIs), a problem that has attracted growing attention in recent years. Despite substantial progress, a significant gap remains between existing convergence rates and the state-of-the-art bounds known for federated convex optimization. In this work, we address this limitation by establishing a series of improved convergence rates. First, we show that, for general smooth and monotone variational inequalities, the classical Local Extra SGD algorithm admits tighter guarantees under a refined analysis…
Fri, 13 Feb 2026 00:00:00 GMT

Mapping the Design Space of User Experience for Computer Use Agents
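The extra-step mechanism at the heart of Local Extra SGD, mentioned in the federated-VI entry above, can be shown on the simplest monotone VI: the bilinear saddle point min_x max_y x·y. This sketch is single-client (no federated averaging, no stochastic noise) with illustrative constants:

```python
def F(z):
    """Operator of the monotone VI for the bilinear saddle point
    min_x max_y x*y:  F(x, y) = (y, -x)."""
    x, y = z
    return (y, -x)

def extragradient(z, lr, steps):
    """Extra-step update: probe at z - lr*F(z), then step using the operator
    evaluated at the probe point — the deterministic core of Local Extra SGD."""
    for _ in range(steps):
        gx, gy = F(z)
        probe = (z[0] - lr * gx, z[1] - lr * gy)
        px, py = F(probe)
        z = (z[0] - lr * px, z[1] - lr * py)
    return z

def plain_descent_ascent(z, lr, steps):
    """Naive simultaneous update, which spirals away from the solution."""
    for _ in range(steps):
        gx, gy = F(z)
        z = (z[0] - lr * gx, z[1] - lr * gy)
    return z

eg = extragradient((1.0, 1.0), lr=0.1, steps=200)
gda = plain_descent_ascent((1.0, 1.0), lr=0.1, steps=200)
eg_norm = (eg[0] ** 2 + eg[1] ** 2) ** 0.5    # contracts toward the solution (0, 0)
gda_norm = (gda[0] ** 2 + gda[1] ** 2) ** 0.5  # grows without bound
```

The extra evaluation is what buys convergence on monotone problems where plain descent-ascent diverges; the paper's contribution is sharper rates for the federated, local-update version of this scheme.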
https://machinelearning.apple.com/research/mapping
Large language model (LLM)-based computer use agents execute user commands by interacting with available UI elements, but little is known about how users want to interact with these agents or what design factors matter for their user experience (UX). We conducted a two-phase study to map the UX design space for computer use agents. In Phase 1, we reviewed existing systems to develop a taxonomy of UX considerations, then refined it through interviews with eight UX and AI practitioners. The resulting taxonomy included categories such as user prompts, explainability, user control, and users’…
Thu, 12 Feb 2026 00:00:00 GMT

Trace Length is a Simple Uncertainty Signal in Reasoning Models
https://machinelearning.apple.com/research/trace-length
Uncertainty quantification for LLMs is a key research direction towards addressing hallucination and other issues that limit their reliable deployment. In this work, we show that reasoning trace length is a simple and useful confidence estimator in large reasoning models. Through comprehensive experiments across multiple models, datasets, and prompts, we show that trace length performs in comparable but complementary ways to other zero-shot confidence estimators such as verbalized confidence. Our work reveals that reasoning post-training fundamentally alters the relationship between trace…
Thu, 12 Feb 2026 00:00:00 GMT

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
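Evaluating trace length as a zero-shot confidence signal, as in the entry above, reduces to an ordinary AUROC computation. The records below are fabricated solely to show the mechanics; they are not results from the paper:

```python
def auroc(scores, labels):
    """Probability that a random correct answer outscores a random incorrect
    one (ties count half) — a threshold-free measure of a confidence signal."""
    pos = [s for s, ok in zip(scores, labels) if ok]
    neg = [s for s, ok in zip(scores, labels) if not ok]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Fabricated evaluation records: (reasoning-trace length in tokens, correct?).
records = [(120, True), (180, True), (240, True), (700, True),
           (650, False), (900, False), (1100, False), (1400, False)]
lengths = [n for n, _ in records]
correct = [ok for _, ok in records]

# Shorter traces ~ higher confidence, so negate length to get a score.
score = auroc([-n for n in lengths], correct)
# score == 0.9375 on this toy data: length alone ranks most correct answers
# above the incorrect ones
```

No logits, sampling, or verbalized self-reports are needed — the appeal of the signal is that the trace length is already available at inference time.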
https://machinelearning.apple.com/research/parallel-track
Efficient large-scale inference of transformer-based large language models (LLMs) remains a fundamental systems challenge, frequently requiring multi-GPU parallelism to meet stringent latency and throughput targets. Conventional tensor parallelism decomposes matrix operations across devices but introduces substantial inter-GPU synchronization, leading to communication bottlenecks and degraded scalability. We propose the Parallel Track (PT) Transformer, a novel architectural paradigm that restructures computation to minimize cross-device dependencies. PT achieves up to a 16x reduction in…
Tue, 10 Feb 2026 00:00:00 GMT
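A back-of-envelope synchronization count makes the Parallel Track entry's headline figure concrete. The per-layer and per-boundary sync counts below are assumptions (two all-reduces per layer is the usual tensor-parallel pattern), not the paper's measured scheme:

```python
def tensor_parallel_syncs(layers, syncs_per_layer=2):
    """Conventional tensor parallelism: each transformer layer all-reduces
    twice (after the attention block and after the MLP)."""
    return layers * syncs_per_layer

def parallel_track_syncs(layers, track_depth, syncs_per_boundary=1):
    """Parallel-track style: devices run independent tracks of `track_depth`
    layers and exchange activations only at track boundaries — an idealized
    reading of the entry, not the paper's exact architecture."""
    return (layers // track_depth) * syncs_per_boundary

tp = tensor_parallel_syncs(32)                # 64 all-reduces per token
pt = parallel_track_syncs(32, track_depth=8)  # 4 boundary exchanges
# tp // pt == 16, consistent with the entry's "up to a 16x reduction"
```

Under these assumptions the saving comes purely from restructuring where cross-device dependencies occur, which is the architectural claim the abstract makes.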