Databricks co-founder Matei Zaharia has been awarded the ACM Prize and states that AGI has already arrived.
In summary: Matei Zaharia, the creator of Apache Spark, co-founder and CTO of Databricks, and a computer science professor at UC Berkeley, has been awarded the 2026 ACM Prize in Computing for his contributions to distributed data systems and AI infrastructure. The $250,000 award, funded through an Infosys endowment, is one of the most prestigious honors in computer science for mid-career professionals. Zaharia has chosen to donate the prize money to charity. In an interview after the announcement, he said that AGI is already here, “it’s just not in a form that we value,” and suggested that the field should stop comparing AI to human cognition.
From PhD dissertation to global infrastructure
Zaharia began developing Apache Spark in 2009, as a PhD student at UC Berkeley, as a faster alternative to Hadoop MapReduce. MapReduce had become the standard framework for large-scale distributed data processing, but it wrote intermediate results to disk between processing stages, which made it slow. Spark kept intermediate results in memory instead, cutting the running time of workloads such as machine learning training, graph processing, and stream analysis from hours to minutes or even seconds. The improvement was large enough that Spark displaced MapReduce for most analytical workloads soon after its release, and it remains one of the most widely used data processing frameworks in the world.
Zaharia’s dissertation on Spark earned him the ACM Doctoral Dissertation Award in 2014, and the project laid the groundwork for Databricks, the data and AI company he co-founded in 2013 with six colleagues from Berkeley. By December 2025, Databricks had reached a valuation of $134 billion following its Series L funding round, and in February 2026 it announced a revenue run rate of $5.4 billion, growing more than 65% year on year.
In its prize citation, the ACM recognized Zaharia for his “visionary development of distributed data systems and computing infrastructure, which has enabled large-scale machine learning, analytics, and AI on a global scale.” The open-source approach Zaharia championed with Apache Spark, which is licensed under Apache 2.0, as is Google’s recent Gemma 4 open-weight model family, has become the standard model for AI model and tool releases targeting widespread commercial use.
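The difference between disk-based and in-memory intermediate results described above can be illustrated with a toy sketch in plain Python. It does not use Spark or Hadoop APIs; the function names (`update`, `run_disk_based`, `run_in_memory`, `compare`) and the JSON staging files are assumptions made purely for illustration. It counts how many disk operations a multi-stage job performs under each strategy.

```python
import json
import os
import tempfile


def update(values):
    """One hypothetical processing stage: a cheap numeric transform."""
    return [v * 0.5 + 1.0 for v in values]


def run_disk_based(values, stages, workdir):
    """MapReduce-style toy: each stage reads its input from disk and writes its output back."""
    disk_ops = 0
    path = os.path.join(workdir, "stage_0.json")
    with open(path, "w") as f:
        json.dump(values, f)
    disk_ops += 1
    for i in range(1, stages + 1):
        with open(path) as f:  # read the previous stage's output
            current = json.load(f)
        disk_ops += 1
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:  # materialize this stage's output
            json.dump(update(current), f)
        disk_ops += 1
    with open(path) as f:  # read the final result
        result = json.load(f)
    disk_ops += 1
    return result, disk_ops


def run_in_memory(values, stages):
    """Spark-style toy: intermediate results stay in memory between stages."""
    current = list(values)
    for _ in range(stages):
        current = update(current)
    return current, 0


def compare(values, stages):
    """Run both strategies and return (disk_result, disk_ops, mem_result, mem_ops)."""
    with tempfile.TemporaryDirectory() as workdir:
        disk_result, disk_ops = run_disk_based(values, stages, workdir)
    mem_result, mem_ops = run_in_memory(values, stages)
    return disk_result, disk_ops, mem_result, mem_ops
```

Both strategies produce the same result, but the disk-based version performs two disk operations per stage plus the initial write and the final read. On real clusters that I/O is what dominates iterative workloads such as machine learning training.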
Delta Lake, MLflow, and the data lakehouse
Zaharia’s work extended beyond Spark. As data infrastructure moved to the cloud and organizations began accumulating large volumes of unstructured data in object stores such as Amazon S3, a new problem emerged: cloud data lakes offered speed and cost benefits but were unreliable, providing no transactional guarantees, no consistent schema enforcement, and no safe handling of concurrent writes. To address this, Zaharia co-developed Delta Lake, which brought ACID transactions to cloud object stores and enabled a new architecture, the data lakehouse, that combines the cost and scalability advantages of a data lake with the consistency and governance of a traditional data warehouse. The lakehouse has become Databricks’ primary commercial offering and is widely adopted in enterprise data engineering.
Another significant project, MLflow, was created to tackle the operational disorder that arose as machine learning moved from research into production. Teams building ML models struggled to track experiments, version models, and manage deployments across the many tools an organization might use at once, including scikit-learn, TensorFlow, PyTorch, and XGBoost. MLflow introduced a structured lifecycle framework and became one of the leading platforms for operationalizing AI at scale.
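One common way to get transactional commits on top of an object store is an ordered commit log combined with optimistic concurrency. The sketch below shows that general idea in plain Python. It is not Delta Lake's implementation or API: the `ObjectStore` and `TableLog` classes, the `_log/` prefix, and the `add`/`remove` action format are hypothetical, and the store is an in-memory stand-in that is assumed to offer an atomic put-if-absent operation. A real system would also check whether a competing commit conflicts with the writer's changes before retrying, which this sketch omits.

```python
import json
import threading


class ObjectStore:
    """In-memory stand-in for an object store with atomic put-if-absent (an assumption)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Store value under key only if key is unused; return True on success."""
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._objects[key]

    def keys(self, prefix):
        with self._lock:
            return sorted(k for k in self._objects if k.startswith(prefix))


class TableLog:
    """Hypothetical table whose state is defined by numbered commit files."""

    def __init__(self, store, prefix="_log/"):
        self.store = store
        self.prefix = prefix

    def latest_version(self):
        keys = self.store.keys(self.prefix)
        if not keys:
            return -1
        # Keys are zero-padded, so the last one in sorted order is the newest.
        return int(keys[-1][len(self.prefix):-len(".json")])

    def commit(self, actions, max_retries=50):
        """Optimistically claim the next version; retry if another writer wins."""
        for _ in range(max_retries):
            version = self.latest_version() + 1
            key = f"{self.prefix}{version:020d}.json"
            if self.store.put_if_absent(key, json.dumps(actions)):
                return version
        raise RuntimeError("too many concurrent commits")

    def snapshot(self):
        """Replay the log in order to compute the set of live data files."""
        files = set()
        for key in self.store.keys(self.prefix):
            for action in json.loads(self.store.get(key)):
                if action["op"] == "add":
                    files.add(action["path"])
                elif action["op"] == "remove":
                    files.discard(action["path"])
        return files
```

Because each version number can be claimed only once, concurrent writers end up in a single agreed order, and readers that replay the log see either all of a commit's changes or none of them.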
Agents, DSPy, and the current research frontier
More recently, Zaharia’s research has shifted from data infrastructure to improving the reliability and capability of AI agents. He is a co-author of DSPy, an open-source framework that automatically optimizes the prompts and parameters used to instruct language models for specific tasks, replacing the manual prompt engineering that can make production AI systems brittle. A related initiative, GEPA, focuses on agent quality, aiming to make multi-step AI workflows more reliable in settings where errors accumulate across successive decisions. The unifying theme of Zaharia’s career has been systems thinking applied to the parts of AI that lie beyond the models themselves: data pipelines, experiment tracking, deployment infrastructure, and now agent orchestration.
The enterprise AI deployment ecosystem shaped by these tools has grown into a significant commercial sector. Infosys, which supports the ACM Prize through its endowment, is also a key partner in Anthropic’s Claude Partner Network, launched in March 2026 with a $100 million investment directed towards enterprise AI deployment, a market that owes much of its current existence to the accessible data and ML infrastructure established by Zaharia’s open-source contributions.
“The aspect I find most exciting,” Zaharia remarked in the TechCrunch interview, “is what I would term AI for search, specifically for research or engineering.” He envisions students and researchers leveraging AI to simulate molecular-level changes within biological systems and forecast their results, enabling autonomous scientific exploration at
