Could you tell us about your fascinating journey into the tech space? How did you start at Dremio?
I’ve been in tech my entire career. As a CS undergrad at the Technion in Israel I joined IBM Research, and later worked on enterprise security products at Microsoft. I joined MapR back in 2009 when Hadoop was the hottest thing around. As the fourth employee and member of the executive team, I helped grow the company to about 400 employees and hundreds of enterprise customers. But after 5.5 years, it became clear that Hadoop was simply too hard for most companies, and only hardcore developers were able to take advantage. Around that time, the Public Cloud started to gain traction not just with startups, but also larger enterprises, and Cloud data lake storage (i.e., S3) was emerging as the system of record for Big Data in the Cloud. Jacques Nadeau and I founded Dremio in 2015, backed by some of the top VCs in Silicon Valley, to capitalize on this trend and radically simplify the analytics stack for the enterprise. We developed what we now call a Data Lake Engine.
As a leader in the Data Management industry, what unique challenges and opportunities did you find in your journey?
The problems that we’re solving at Dremio are extremely complex. As part of our platform, we’ve built a distributed query engine that can elastically scale up to 1000 instances/servers and outperform other query engines such as Presto and Hive by an order of magnitude. This is an endeavor that should probably take a decade, but as a startup you have to ship the product within a year or two. That required building a world-class team of 10x Engineers with hundreds of combined years of experience developing databases and distributed systems from the ground up. That was obviously not an easy thing to do.
Could you tell us about the current benchmarks in Big Data and Analytics and how you train your team in managing these rather high-ended expectations?
I believe that performance benchmarks will be a thing of the past within a couple of years. When you combine scale-out software like Dremio with the infinite scalability and elasticity of the Public Cloud, you can achieve any performance, so efficiency becomes more interesting than performance. And efficiency can be measured in dollars ($) per unit of work. For example, what does it cost to run a specific Tableau or Power BI query on a specific dataset? I believe that’s the future of benchmarking, and we’ve developed many ground-breaking technologies such as Data Reflections, C3 and Apache Arrow to give us a strong lead in both performance and efficiency.
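To make the "dollars per unit of work" idea concrete, here is a minimal sketch of how the cost of a single query could be estimated. The function name, node counts, hourly price, and runtimes are all hypothetical values chosen for illustration; they are not Dremio figures or real cloud prices.

```python
def cost_per_query(num_nodes: int, price_per_node_hour: float,
                   runtime_seconds: float) -> float:
    """Estimate the dollar cost of one query.

    Cost = nodes x hourly price per node x hours the query runs.
    All inputs are illustrative assumptions, not measured values.
    """
    return num_nodes * price_per_node_hour * (runtime_seconds / 3600.0)


# Hypothetical comparison: a small cluster that runs longer versus a
# larger cluster that finishes sooner.
small_cluster = cost_per_query(num_nodes=4, price_per_node_hour=1.0,
                               runtime_seconds=90)
large_cluster = cost_per_query(num_nodes=16, price_per_node_hour=1.0,
                               runtime_seconds=30)

print(f"small cluster: ${small_cluster:.4f} per query")
print(f"large cluster: ${large_cluster:.4f} per query")
```

In this made-up case the larger cluster answers three times faster but costs more per query, which is the kind of trade-off a cost-based benchmark would surface and a pure speed benchmark would hide.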
What is Dremio’s data lake engine? How do you ensure flexibility and openness in the platform?
Our Data Lake Engine is a new approach to Data Analytics that delivers lightning fast query speed and a self-service semantic layer operating directly against data lake storage such as AWS S3 and Microsoft ADLS. Dremio eliminates the need for traditional ETL, data warehouses, cubes, and aggregation tables, as well as the infrastructure, copies of data, and effort these systems entail. We ensure flexibility and openness in a number of important ways. First, Dremio works directly with data lake storage, meaning customers don’t have to send their data to Dremio or have it stored in proprietary, expensive data warehouses or other formats that lock them in. Second, we provide powerful joining abilities across a wide range of data sources so customers can easily access their data anytime, without ETL. Third, Dremio is built on open source technologies such as Apache Arrow and Apache Arrow Flight, which we co-created.
These projects have set the industry standard for columnar, in-memory data representation and sharing, powering dozens of open source and commercial technologies. And finally, customers have full flexibility to deploy and run Dremio on AWS, Azure, or even on-premises. They can even query data across disparate regions or Clouds. And the abstraction provided by our semantic layer enables them to migrate data from one location to another, without impacting their analysts or data scientists.
What message do you have for young professionals in the Data Harnessing industry?
The Cloud makes it really easy to experiment with data, and very inexpensive to store it thanks to systems such as S3 and ADLS. Take advantage of these environments so that you can easily utilize best-of-breed technologies to explore, analyze, visualize and model your data.
Our AI readers would like to know from you about the best resources to develop Algorithms for weighing/categorizing data. How do you handle heterogeneity of data inputs?
We create connectors to enable Dremio to process and interact with data sources, and join that data with all other sources. In addition to the native connectors built into Dremio, we recently introduced the Dremio Hub Marketplace, which provides an accessible, easy-to-use listing of connectors provided by the Dremio Community for a wide variety of data sources, from Relational Databases through SaaS applications. Customers can simply find the data source to connect to, download the connector and follow the straightforward installation steps.
In addition to providing a marketplace of connectors for download, Dremio Hub includes an easy-to-use framework for Dremio Community members to build new connectors and to share and publish them for the Dremio Community. Connectors can be built for any data source with a JDBC driver and are template-based, making it simple to define new connectors without complex coding. Dremio Hub connectors have the exact same high-performance capabilities as native Dremio connectors, including Advanced Relational Pushdown, which executes complex SQL logic directly within the data source for high performance, and the ability to define Reflections.
What are your future predictions for Digital Transformation based on AI, Data Science and IoT? What is the practicality of these applications compared to what we see or use today?
There has been a lot said about IoT in the last decade. Industrial IoT (IIoT) and consumer IoT are now commonplace. IoT platforms will soon be making IoT data easily queryable, so that data consumers can write a SQL statement or use a BI or Data Science tool to analyze historical IoT data with tremendous speed and virtually no effort. I also expect the introduction of 5G to further accelerate IoT adoption/use cases, and IoT will drive significant growth in Cloud data lake storage consumption.
How do you foresee issues in Data Management and Security, especially of applications that run in tandem with Microsoft enterprise software? How can business owners safeguard their interest and that of their customers?
The enemy of security is data copies. Once companies lose control of their sensitive data within the company, it becomes impossible to control and monitor access to that data. It’s then just a matter of time until data leaks and other incidents occur. At Dremio our focus is on delivering a data platform that enables IT to maintain security and control, while also providing self-service to analysts and data scientists. As counterintuitive as it may sound, we believe that self-service is the foundation of security, because when IT can’t provide self-service, the consumers of data find workarounds and the data quickly ends up in various data extracts such as spreadsheets.
What is your future roadmap for the companies that are yet to build their own Data Science/AI teams?
Dremio is enabling Data Science/AI teams at many of the world’s largest companies to explore large volumes of data and perform feature engineering. With the newly created Arrow Flight technology, data scientists can now populate data frames in Python and R over 100x faster than through traditional ODBC/JDBC interfaces. An Arrow client library, such as pyarrow, establishes a parallel connection to Dremio, and because Dremio’s internal memory format is Apache Arrow, the data flows directly from inside the Dremio engine into the memory on the client application. In essence, this enables data scientists to work with millions to billions of records without having to create static offline extracts on their own systems.
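As a rough illustration of the flow described above, the following sketch uses the `pyarrow.flight` client to run a SQL query and load the result into a pandas data frame. It is a sketch under stated assumptions, not Dremio's reference client: the helper names `flight_location` and `query_to_dataframe` are hypothetical, port 32010 is assumed to be the server's Flight port, basic username/password authentication is assumed, and both `pyarrow` and `pandas` must be installed. For simplicity it reads the endpoints one after another rather than in parallel.

```python
def flight_location(host: str, port: int = 32010, tls: bool = False) -> str:
    """Build a Flight URI. The default port 32010 is an assumption."""
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def query_to_dataframe(sql: str, host: str, username: str, password: str,
                       port: int = 32010, tls: bool = False):
    """Run `sql` over Arrow Flight and return the result as a pandas DataFrame."""
    # Imported lazily so flight_location() works without pyarrow installed.
    import pyarrow as pa
    from pyarrow import flight

    client = flight.FlightClient(flight_location(host, port, tls))

    # Exchange basic credentials for a bearer-token header pair and
    # attach it to every subsequent call.
    token_header = client.authenticate_basic_token(username, password)
    options = flight.FlightCallOptions(headers=[token_header])

    # Ask the server to plan the query; the reply lists the endpoints
    # from which the result can be fetched.
    descriptor = flight.FlightDescriptor.for_command(sql)
    info = client.get_flight_info(descriptor, options)

    # Fetch the Arrow record batches from each endpoint, sequentially,
    # and combine them into a single table.
    tables = [client.do_get(endpoint.ticket, options).read_all()
              for endpoint in info.endpoints]
    return pa.concat_tables(tables).to_pandas()
```

A call might look like `query_to_dataframe("SELECT * FROM my_space.my_table", "localhost", "user", "secret")`, where the table name and credentials are placeholders. Because the data arrives already in Arrow's columnar format, the client does not convert it row by row the way an ODBC/JDBC driver typically would.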
How do you prepare for an AI-centric world? How do you inspire your people to work with technology?
Here at Dremio, inspiration stems from our strong cultural values. We put customer needs, problems, and aspirations at the center of everything we do, and we’re constantly looking for creative solutions. Powerful new technologies like AI create opportunities for our people to innovate in really interesting ways to help our customers, and that’s exciting for us.
What start-ups are you keenly following –
Preset (the company commercializing Superset), Anaconda.
One technology that will be outdated by 2025
Cloud data warehouses. They seemed different when they came out, but companies are starting to realize that a data warehouse is a data warehouse, and the risks of lock-in and runaway costs outweigh the benefits. The Cloud data lake, with storage provided by S3 and ADLS, will take over as the more open and flexible data platform architecture.
Something you do better than others – the secret of your success?
Customer obsession. While we’re not perfect, every single employee at Dremio will jump through walls to help our customers. It’s our core value.
Tag the one person in the industry whose answers to these questions you would love to read.
Juergen Kraemer – Head of Analytics and IoT at Software AG.
Thank you, Tomer! That was fun, and we hope to see you back on AiThority soon.
Tomer is Co-founder and CEO of Dremio. Previously he was the vice president of product at MapR, where he was responsible for product strategy, roadmap and new feature development. As a member of the executive team, he helped grow the company from five employees to more than 300 employees and 1000 enterprise customers.
Dremio is the Data-as-a-Service Platform. Created by veterans of open source and big data technologies, and the creators of Apache Arrow, Dremio is a fundamentally new approach to data analytics that helps companies get more value from their data, faster. Dremio makes data engineering teams more productive, and data consumers more self-sufficient.
Founded in 2015, Dremio is headquartered in Mountain View, CA. Investors include Lightspeed Venture Partners, Redpoint, and Norwest Venture Partners.