The Dramatic Rise Of Startup 'Databricks': Journey From Academia To A $6.2B Business

In June 2020, data-crunching startup Databricks, Inc. ranked 36th on CNBC's 2020 Disruptor 50 list of innovative companies. During the ongoing recession caused by the Covid-19 pandemic, Databricks hasn't laid off any of its more than 1,300 employees (spread across four continents), while many of its technology peers have downsized. The company is poised to keep growing through the rest of 2020 as more and more enterprises adopt artificial intelligence to strengthen their operations. It is also gearing up to go public in 2021, and Databricks' CEO and co-founder Ali Ghodsi says there's plenty of investor demand to make that happen.

Databricks: The data and A.I. company

San Francisco-headquartered startup 'Databricks' was founded in 2013, with origins in both academia and the open-source community. The company, started by seven co-founders (Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Scott Shenker) and driven by huge industry potential, helps data teams (engineers, analysts, and scientists) work together to find value inside data and solve the world's toughest problems.

All seven co-founders, at the time of conceiving Databricks, were UC Berkeley researchers, who were able to successfully bank on the proposition: when combined with A.I., data holds the promise of curing diseases, saving lives, reversing climate change, and even changing the way we live.

As a result, Databricks positions itself as the only open unified platform for massive-scale data management, business analytics, and full-lifecycle machine learning, enabling data teams to innovate collaboratively and faster. Today, more than 5,000 organizations worldwide rely on Databricks, including Shell, Condé Nast, and Regeneron. The company has hundreds of global partners, including Microsoft, Tableau, Capgemini, and Amazon.

Databricks' close ties with academia (UC Berkeley in particular) helped it gain traction in the A.I. space. As more and more enterprises adopt A.I., Databricks looks set to keep growing year after year. Its total funding raised to date is $897 million, and the company continues to gain traction.

Academia to Industry: An Uphill Battle

Things weren't always hunky-dory for this present-day buzzy unicorn. The company has had a history of interesting twists and dramatic turns. The groundwork that happened in an academic setting (the creation of the 'Apache Spark' project, which led to the inception of Databricks) was an uphill battle with lots of struggle. Things took time, and eventually, in 2015, Databricks achieved the popularity it needed both for initial traction and for becoming a highly valued company.

Even so, it was far from being a sustainable entrepreneurial venture until early 2017, when the company finally became a viable business en route to hyper-growth. In 2018, after a year of impressive revenue growth, it could confidently boast of having product-market fit.

History

Databricks was co-founded by seven computer science researchers at UC Berkeley. Among them, co-founder Ali Ghodsi, who has been Databricks' CEO since January 2016, had the vision of creating a company that would be the clear winner in the big data platform race. He understood that data would soon become more valuable than oil.

The beginning: Ali, his fellow co-founders, and the 'Apache Spark' open source project at UC Berkeley

In 1984, when Ali was just five years old, his family had to flee Iran (due to the revolution) with next to nothing in hand. The family picked frigid northern Sweden as their safe haven. While he was growing up in the not-so-great suburbs of Sweden, his parents got him a used Commodore 64. With nothing else to do at home, he read the stack of manuals that came with it and became a self-taught programmer.

By the time he was eight years old, he was spending so much time coding on his Commodore that he could write small programs, and eventually he wrote games for it. This was how his love for coding and solving problems developed and strengthened. From age eight until he became Databricks' CEO, Ali hadn't spent a day without programming.

His higher education in Sweden comprised a computer engineering degree, an MBA in logistics and strategic marketing, and a Ph.D. in distributed computing (in 2006). He began his post-Ph.D. career with a professorship in Sweden, followed by a short summer stint in the U.S. (in Palo Alto).

In 2009, he accepted an opportunity to collaborate with Ion Stoica at UC Berkeley. The stay was intended to last about a year, after which he wanted to head back to Sweden. Ion, a former academic colleague of Ali's, was now a professor at UC Berkeley. After a year of working with Ion, Ali was blown away by the projects and opportunities, and he decided to stay on. The project they primarily ended up building on was 'Spark', which team member Matei Zaharia had initiated that same year, 2009.

In the UC Berkeley research labs, they collaborated with another team that was working on machine learning. That other team was struggling with a competition called 'The Netflix Contest.' To win the competition, participant teams had to come up with a machine learning algorithm that would accurately predict what movies Netflix should recommend to its viewers.

Ali and his team tied for first in the competition by introducing 'Apache Spark' as their competition entry. Apache Spark made it possible to take lots and lots of data (like movie recommendations) and run machine learning on it to predict things, for example, which movies people haven’t watched yet but would like to watch.

Tying for first place in the competition was a confidence builder, and it earned them the tag of being the true founders of A.I. With Spark, they knew they had created something useful, but the challenge ahead was to get people to adopt this seemingly alien technology.

Tough times & hope: long search for adoption, impact, & initial traction

From 2009 to 2012, none of the companies to which Ali and his team pitched 'Apache Spark' paid any heed. At this stage, as "Berkeley hippies", they were just looking for impact. However, having no adopters meant there was skepticism about the technology, and competing technologies were spreading FUD (fear, uncertainty, and doubt). Ali and his team had to do something so that this technology would no longer be dismissed as "academic" or as mere "source code".

In 2013, after the tough years, came hope. Ben Horowitz (of the well-known VC firm Andreessen Horowitz), who had heard about Spark through Berkeley professor Scott Shenker, firmly believed that a $100 billion company could be built around the Spark technology. He emphasized that Ali and his team would have to create a company (with a corporate type of structure) on their own to take A.I. to the masses.

The co-founders started debating among themselves how much of an offer would be good enough to let Ben in. They had conflicting suggestions for the valuation, ranging from $20 million to $35 million. Finally, Ben walked in and said the company was worth $50 million and that he was willing to invest $14 million. The offer was instantly accepted. This came at a time when Ali was earning $59,000 at UC Berkeley.

With the money in hand, they got to work: they coded away, hired experts, and built the company 'Databricks'. Yet they still faced the lingering challenge of a lack of adopters and the FUD created by competing technologies.

A common misleading rumor making the rounds at conferences (attended by thousands) and elsewhere was, "Spark is great if you have massive amounts of memory. But what if you have so much data it doesn’t fit in the memory? Then Spark will not work, so don’t use this technology, it won’t work!"

The founders knew Spark could handle such workloads and that the rumors were false. It wasn't until 2015 that something amazing happened and things turned around. Before delving into the turnaround story, it is worth looking at the important relationship that had formed between Spark and Databricks.

The Relationship Between Apache Spark & Databricks

The development of Spark was initiated in 2009 by Matei Zaharia at UC Berkeley's AMPLab. In 2010, Spark was open-sourced under a BSD (Berkeley Software Distribution) license. In 2013, it was donated to the Apache Software Foundation, and it became a Top-Level Apache Project in early 2014.

Databricks is the enterprise behind Apache Spark. It offers a managed platform for running Apache Spark, so users can reap the full benefit of Spark without having to learn perplexing cluster management concepts or perform endless maintenance tasks. Instead, through a point-and-click user interface of the kind preferred by data analysts and data scientists, Databricks enables users to be more productive with Spark.

The overnight turnaround: Global adoption of Spark

Finally, in 2015, Spark, the technology Databricks is built on, gained huge popularity and global adoption. Databricks' founders were fed up with the rumor that the technology doesn't work if the data doesn't fit in memory. They decided to resort to marketing.

They took part in a geeky competition. This time it wasn't the 'Netflix Contest' but the 'Sorting Contest', where the challenge was to sort a petabyte of data. With the help of Databricks' co-founder (and chief architect) Reynold, they broke the world record for sorting one petabyte of data, and they did so with far less than a petabyte of memory. The achievement caught media attention, and suddenly Spark became the most popular software around, even topping the Gartner Hype Cycle.
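The general idea of sorting more data than fits in memory can be illustrated with an external merge sort. The sketch below is a minimal single-machine illustration in Python, not Spark's implementation and not the setup used in the contest; the function name `external_sort` and its `chunk_size` parameter are hypothetical names chosen for this example. It sorts small chunks in memory, writes each sorted chunk to a temporary file, and then merges the files while reading only one value per file at a time.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Yield the integers in `values` in sorted order.

    Illustrative assumption: at most `chunk_size` values are buffered in
    memory while building sorted runs; everything else lives on disk.
    """
    run_paths = []

    def write_run(chunk):
        # Sort one chunk in memory and spill it to a temporary file.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            for v in chunk:
                f.write(f"{v}\n")
        run_paths.append(path)

    def read_run(path):
        # Stream a sorted run back from disk one value at a time.
        with open(path) as f:
            for line in f:
                yield int(line)

    try:
        chunk = []
        for v in values:
            chunk.append(v)
            if len(chunk) >= chunk_size:
                write_run(chunk)
                chunk = []
        if chunk:
            write_run(chunk)

        # k-way merge: heapq.merge keeps only one pending value per run.
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        # Clean up the temporary run files.
        for p in run_paths:
            os.remove(p)


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(list(external_sort(data, chunk_size=3)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Here ten values are sorted while buffering only three at a time. A distributed system spreads the same kind of work across many machines and disks, but the principle that total memory can be much smaller than the data is the same.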

Throughout 2015, everybody across the world was suddenly talking about the technology. There was global adoption, but with annual revenue of just $1 million, Databricks' internal business challenges outweighed the global adoption (and popularity) of its technology. Also, at that time, co-founder Ion Stoica had to step away as Chief Executive to return to his Berkeley professorship commitments.

Entrepreneurial Strategy: Pivoting for Traction & Explosive Growth

In January 2016, with Ion heading back to Berkeley, Ali Ghodsi became CEO at the suggestion of Ben Horowitz, a decision he says came down to his being the eldest co-founder after Ion. That year the company's valuation was around $500 million, but annual revenue was only $1 million. The board was getting anxious, as even a local restaurant had higher revenues. The technology was amazing and had impacted the world, but it didn't have to be given away for free.

Ali wasn't psyched out by the business challenges that lay ahead. He knew that revenues were extremely low, so almost any change he implemented was likely to prove fruitful. He introduced three changes.

Change No. 1: The pivot to enterprise sales

This pivot was made to target large enterprises. Prior to this, Databricks had been operating in an almost self-service way, without the need for a massive salesforce. Now the company went all-in on sales and hired an enterprise sales leader. The basis for this pivot was the huge revenue potential of big corporations that would benefit from Databricks' A.I.-based technology to clean their enormous volumes of data. The pivot was so important that Databricks was paying each of these salespeople a salary of around $350,000 on average.

Change No. 2: Hiring an executive team over-indexed on experience

The six remaining co-founders (including Ali), who are all Ph.D.s, were already focusing on the research and innovation side of the technology. What they needed now were pros to handle the other business functions. Ali up-leveled the team by hiring a total of twelve experts across the marketing, sales, finance, and customer success departments.

Change No. 3: Building proprietary software focused on features sought by large enterprises

The open-source technology was great, but Databricks needed to offer proprietary software that would be focused on enterprise features to solve the problems of large enterprises. This way Databricks would have really valuable products they could sell.

Results & The way forward

A year into his journey as CEO, in 2017, Ali and the company struck their first million-dollar deal. Ali's entrepreneurial strategies had kicked in with positive momentum. Databricks' recurring revenue reached $40 million by the end of that year and $100 million in 2018, and its revenue run-rate hit $200 million during Q3 of 2019. Today, the company has a valuation of $6.2 billion, and its growth trajectory appears unaffected by the pandemic.
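The growth implied by these figures can be checked with simple arithmetic. The short Python sketch below uses only the numbers quoted in this article (in millions of dollars); the function name `growth_multiples` is a hypothetical helper for this example. Note that the 2016 figure is annual revenue, the 2017 and 2018 figures are recurring revenue, and the 2019 figure is a Q3 run-rate, so the multiples are rough comparisons rather than like-for-like growth rates.

```python
# Revenue figures quoted in the article, in millions of US dollars.
# Assumption: the differently defined figures are treated as comparable.
revenue_musd = {2016: 1, 2017: 40, 2018: 100, 2019: 200}


def growth_multiples(series):
    """Return each year's figure divided by the previous year's figure."""
    years = sorted(series)
    return {cur: series[cur] / series[prev] for prev, cur in zip(years, years[1:])}


for year, multiple in growth_multiples(revenue_musd).items():
    print(f"{year}: {multiple:.1f}x the previous year")
```

On these numbers, revenue grew roughly 40-fold in Ali's first year as CEO, then 2.5-fold, then 2-fold.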

Today, Databricks has transitioned from being a visionary company to a leader in Data Science, by gaining traction because of the following product features:

It's an open unified analytics platform
It simplifies working with Apache Spark
It offers multi-language and multi-platform support
Rich notebooks and dashboards make it user-friendly

The company's growth, funding, and valuation results today are astounding. Yet, these results shouldn't conceal the invaluable entrepreneurial lessons we can derive from Databricks' rocky journey to mega success. A combination of history, struggle, forces of destiny, and an uphill entrepreneurial path took the company from scratch to unprecedented hyper-growth.