NVIDIA Open-Sources Its Latest VLA: Can It Break Through the L4 Autonomous Driving Barrier?

As large AI models are increasingly integrated into the automotive industry, competition in the sector is shifting from basic functionality to high-level intelligent driving capabilities. The VLA (Vision-Language-Action Model) is now seen as the key variable driving the next wave of technological advancement.

On December 1, NVIDIA officially announced the open-sourcing of its latest autonomous driving Vision-Language-Action (VLA) model, Alpamayo-R1. The model can simultaneously process vehicle camera footage and textual instructions to output driving decisions. It is now available on both GitHub and Hugging Face, and NVIDIA has released the Cosmos Cookbook development toolkit alongside it.

This marks the industry's first open-source VLA model dedicated to autonomous driving. NVIDIA aims to use this move to provide core technical support for the adoption of L4-level autonomous driving.

Notably, compared with traditional black-box autonomous driving algorithms, NVIDIA's Alpamayo-R1 emphasizes explainability: it can state the reasoning behind its decisions. This feature assists with safety validation, regulatory review, and accident liability determination. Accompanying tools like the Cosmos Cookbook make it easier for companies and developers to train, evaluate, and deploy solutions.

Industry experts believe NVIDIA is attempting to lower development barriers, accelerate the standardization of the software stack, and break away from the costly "fully in-house development" approach prevalent in Robotaxi operations. The goal is to create an "Android-style" ecosystem that allows for rapid modular assembly.

However, some insiders told the author that NVIDIA's open-sourcing of Alpamayo-R1 is similar to Baidu's Apollo initiative: valuable for newcomers to the autonomous driving field, but not particularly significant for established, specialized companies.

Currently, VLA technology is widely recognized as the next-generation core for intelligent driving, prompting increased investment from major players. In China, companies such as Li Auto, XPeng Motors, GWM (already applied in the Wey Lanshan model), and DeepRoute have all achieved mass production deployments based on VLA.

Addressing the Pain Points of Traditional End-to-End Models

Traditional end-to-end models often function as a black box—they may be "visible but not understandable," and are prone to failure when encountering long-tail scenarios such as illegal left turns or pedestrians darting into the road.

Compared with traditional end-to-end models, VLA introduces a language modality as an intermediate layer, transforming visual perception into an interpretable logical chain. This improves its potential to handle long-tail and complex, unpredictable scenarios, allowing the machine to observe, reason, and decide like a human, rather than merely mapping massive amounts of data into outputs.

In the field of autonomous driving, the VLA (Vision-Language-Action) large model represents a technological direction that deeply integrates visual perception, language understanding, and decision-making control. It can directly output driving actions, and its core advantages include more powerful environmental comprehension and reasoning abilities, more efficient integrated decision-making, superior handling of long-tail situations, more transparent human-machine interaction and trust-building, and more natural vehicle control methods.

The VLA model Alpamayo-R1, newly open-sourced by NVIDIA, is trained on an entirely new Chain of Causation (CoC) dataset. Each segment of driving data is annotated not only with what the vehicle did, but also with why it took that action.

For example: "The vehicle slowed down and changed lanes to the left because there was a moped stopped at the red light ahead, and the left lane was clear." This means the model learns reasoning based on causality, rather than memorizing fixed patterns by rote.
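To make the "what plus why" annotation idea concrete, here is a minimal sketch of how one such sample might be represented. The class names, field names, and clip identifier are hypothetical and chosen only for illustration; the article does not describe the actual schema of the Chain of Causation dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CausalFactor:
    """One observed cause that contributed to the driving decision."""
    description: str


@dataclass
class CoCSample:
    """Hypothetical record pairing a driving action with its stated causes."""
    clip_id: str
    action: str
    causes: List[CausalFactor] = field(default_factory=list)

    def to_reasoning_trace(self) -> str:
        """Render the annotation as an 'action because causes' sentence."""
        if not self.causes:
            return f"{self.action}."
        reasons = " and ".join(c.description for c in self.causes)
        return f"{self.action} because {reasons}."


# The example sentence from the article, split into action and causes.
sample = CoCSample(
    clip_id="clip-0001",  # made-up identifier
    action="The vehicle slowed down and changed lanes to the left",
    causes=[
        CausalFactor("there was a moped stopped at the red light ahead"),
        CausalFactor("the left lane was clear"),
    ],
)
print(sample.to_reasoning_trace())
```

Keeping the action and its causes as separate fields, rather than one free-text string, is a design choice of this sketch: it would let a training pipeline check that every recorded action has at least one stated cause.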

At the same time, thanks to a modular VLA architecture, NVIDIA's Alpamayo-R1 combines Cosmos-Reason, a vision-language model pretrained for physical AI applications, with a diffusion-based trajectory decoder, enabling real-time generation of dynamically feasible plans. In addition, a multi-stage training strategy is used: supervised fine-tuning is first employed to enhance reasoning ability, followed by reinforcement learning (RL) to optimize reasoning quality, leveraging feedback from large reasoning models and enforcing consistency between reasoning and action.
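The two-stage split described above (a reasoning module followed by a trajectory decoder) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the rule-based `reason` stub, the 3.5 m lane offset, and the 5 m waypoint spacing are invented, and the simple iterative refinement loop only stands in for a real diffusion decoder, which would use a learned denoising network. This is not NVIDIA's implementation.

```python
import random
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (forward, lateral) position in metres


def reason(scene: dict) -> str:
    """Stage 1 stand-in: map a scene description to a textual decision."""
    if scene.get("obstacle_ahead") and scene.get("left_lane_clear"):
        return "change_left"
    if scene.get("obstacle_ahead"):
        return "stop"
    return "keep_lane"


def target_path(decision: str, horizon: int) -> List[Waypoint]:
    """Reference waypoints for a decision (assumed 3.5 m lane, 5 m spacing)."""
    lateral = {"change_left": 3.5, "keep_lane": 0.0, "stop": 0.0}[decision]
    spacing = 0.0 if decision == "stop" else 5.0
    return [(spacing * (i + 1), lateral * (i + 1) / horizon)
            for i in range(horizon)]


def decode_trajectory(decision: str, horizon: int = 6,
                      steps: int = 20, seed: int = 0) -> List[Waypoint]:
    """Stage 2 stand-in: refine random waypoints toward the conditioned path.

    Starts from Gaussian noise and halves the remaining error at every step,
    mimicking only the start-from-noise, refine-iteratively shape of a
    diffusion decoder.
    """
    rng = random.Random(seed)
    goal = target_path(decision, horizon)
    traj = [(rng.gauss(0.0, 10.0), rng.gauss(0.0, 10.0)) for _ in range(horizon)]
    for _ in range(steps):
        traj = [(x + 0.5 * (gx - x), y + 0.5 * (gy - y))
                for (x, y), (gx, gy) in zip(traj, goal)]
    return traj


decision = reason({"obstacle_ahead": True, "left_lane_clear": True})
trajectory = decode_trajectory(decision)
print(decision, [(round(x, 2), round(y, 2)) for x, y in trajectory])
```

The point of the sketch is the interface: the decoder consumes the output of the reasoning stage, so the textual decision and the resulting trajectory can be compared for consistency.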

According to data released by NVIDIA, Alpamayo-R1 achieves a 12% improvement in trajectory planning performance in complex scenarios, a 25% reduction in close-range collision rates, a 45% improvement in reasoning quality, and a 37% boost in reasoning-action consistency. As the model was scaled from 0.5B to 7B parameters, performance continued to improve. On-road vehicle testing has verified its real-time capability (a latency of 99 milliseconds) and the feasibility of deployment in urban environments.
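As a rough sense of scale for the 99-millisecond figure, the short calculation below converts it into a planning rate and into the distance a vehicle covers during one inference. The 50 km/h urban speed is an assumed value, not one from the article, and the calculation assumes one plan per inference with no pipelining.

```python
LATENCY_MS = 99.0        # latency reported in the article
ASSUMED_SPEED_KMH = 50.0  # illustrative urban speed, not from the article


def planning_rate_hz(latency_ms: float) -> float:
    """Plans per second if each plan takes one full inference."""
    return 1000.0 / latency_ms


def distance_during_latency_m(speed_kmh: float, latency_ms: float) -> float:
    """Metres travelled while a single inference is running."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s * latency_ms / 1000.0


print(f"planning rate: {planning_rate_hz(LATENCY_MS):.1f} Hz")
print(f"distance per inference: "
      f"{distance_during_latency_m(ASSUMED_SPEED_KMH, LATENCY_MS):.3f} m")
```

Under these assumptions the model replans roughly ten times per second, and the vehicle moves a little under a metre and a half between plans.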

As a result, NVIDIA's Alpamayo-R1 is expected to enable a major leap in capabilities for L4 autonomous driving, paving the way for Robotaxi services to safely integrate into real-world, chaotic public roads.

Becoming the "Android" of the Autonomous Driving Arena

The open-sourcing of Alpamayo-R1 once again demonstrates NVIDIA's ambition in the autonomous driving sector. The company is no longer content with merely being a hardware supplier; it aims to become the "Android" of the autonomous driving industry.

In fact, as early as this October, NVIDIA quietly released its Alpamayo-R1 large model to the public. At the Washington GTC conference, NVIDIA also unveiled its autonomous driving platform—NVIDIA DRIVE AGX Hyperion 10.

Hyperion 10 is regarded as the "body" of NVIDIA's autonomous driving solution, while Alpamayo-R1 is its "brain."

Notably, Hyperion 10 achieves a closed-loop process "from simulation to real vehicle": in the cloud, DGX supercomputers use DRIVE Sim to generate high-fidelity simulation data, which is used to train DRIVE AV models. On the vehicle end, sensor data from Hyperion 10 integrates seamlessly with Thor chips.

Thus, if an automaker wants to quickly launch a model with L4 capabilities, it no longer needs to build separate, large-scale teams for hardware integration, software algorithms, and data training. By adopting NVIDIA's full-stack solutions, rapid vehicle deployment can be achieved.

At the same time, NVIDIA is building an "Android-style" Robotaxi ecosystem, and has announced a clear timeline for rollout: deployment of 100,000 Robotaxis starting in 2027.

NVIDIA has announced partnerships with Uber, Mercedes-Benz, Stellantis, Lucid, and others to jointly build the "world's largest L4-level autonomous driving fleet." As of October 2025, NVIDIA's cloud platform had accumulated over 5 million hours of real-world road data.

NVIDIA's entry is shifting Robotaxi competition from a pure technology race to a battle of ecosystems.

A closed ecosystem not only leads to redundant R&D investment; its deeper drawback is the creation of data silos. For example, Waymo's driving data from U.S. roads can hardly benefit Chinese automakers, and each player climbs the technology curve independently and slowly.

NVIDIA's open ecosystem creates an opportunity for players within it to share anonymized feature data while ensuring data privacy and security. For instance, if Automaker A encounters an extreme scenario at a specific intersection, that scenario's data can be anonymized and converted into training features, helping Automaker B's models more quickly recognize similar risks.

If NVIDIA can become the Android of the autonomous driving sector, it could shift the entire industry's technological evolution from a linear pace to exponential acceleration. This is more than just technology sharing; it's also about shared costs. The marginal cost of collectively addressing long-tail scenarios, the industry's biggest challenge, will continue to decrease as the ecosystem expands.

According to Zhou Guang, CEO of Yuanrong Qixing (DeepRoute.ai), VLA could deliver a generational lead and become the critical variable in the next round of competition.

Tian Shan, CTO of DeepWay, told the author that VLA is currently a hot trend in autonomous driving, attracting many researchers. It can significantly improve both the generalization and reasoning capabilities of autonomous driving models. NVIDIA's open-sourcing of Alpamayo-R1 allows more people to participate in research and contribute to this promising self-driving technology, which will actively promote the development and implementation of VLA. Moreover, this technology can also be applied to embodied intelligence and other physical AI scenarios.

Invisible Barriers Still Lie Ahead

However, for Alpamayo-R1 to meet automotive-grade latency requirements, it still needs to run on top-tier cards like the RTX A6000 Pro Blackwell. The INT8 performance of this card reaches 4,000 TOPS, which is about six times that of Thor.

NVIDIA's business model means its open-source initiatives are ultimately designed to better sell its hardware and full-stack solutions. The Alpamayo-R1 model is deeply integrated with NVIDIA's chips (such as Thor) and development platforms (such as Drive), allowing for greater computational efficiency.

In other words, joining the Nvidia ecosystem brings convenience, but also creates a deep dependency on Nvidia for core computing power.

Additionally, as DeepWay CTO Tian Shan pointed out, whether VLA is the best autonomous driving technology is still an open question. The Alpamayo-R1 model's toolchain is based on NVIDIA's platforms, which is a limitation for many developers. As a result, other technologies and computing platforms are also pushing forward the development of autonomous driving.

In Tian Shan's view, most companies should focus more on the application of technology in real-world scenarios, that is, the engineering implementation of technology. Addressing practical, real-life problems and achieving a commercially viable closed loop for intelligent driving technology as soon as possible will be more beneficial for the industry's long-term, healthy development.

Furthermore, the large-scale commercialization and implementation of L4 autonomous driving—or Robotaxi services—are closely tied to policies and regulations. The ability to operate within compliance frameworks, undergo safety assessments, and strike a balance between data utilization and privacy protection is just as important as technological capability itself.

Jensen Huang, the founder and CEO of NVIDIA, has always regarded Robotaxi as the "first commercial application of robotics technology." Instead of building a single driverless taxi, NVIDIA's goal has always been to provide the technological foundation enabling all players to create their own driverless taxis. Now, he is attempting to establish a fast-replicable production line for this application through the open-sourcing of VLA.

However, whether open source can truly lower the barriers to entry and accelerate the arrival of L4 autonomous driving, ultimately unleashing the technology's full potential across broader commercial horizons, remains to be seen. The open-sourcing of NVIDIA's Alpamayo-R1 model is only the beginning of the game. More hurdles still lie ahead, and only the market can validate its true potential. (Writing by Zhang Min; Editing by Chelsea Sun and Li Chengcheng)