Last month, we reviewed hundreds of AIOps companies and open source platforms for our AI RADAR. One company that stood out in a unique niche was Gigster, a data-driven Machine Learning company that prides itself on giving Project Managers unmatched efficiency in Application Development. It helps teams build specs 80% faster than average, and its developers can start with 60% of the code already written.
We spoke to CTO Debo Olaosebikan to understand Gigster's roadmap for 2019-2020 and how it competes in the exciting AI/ML DevOps space. Here are some of his opinions, insights, and predictions for DevOps and AI engineers pursuing careers in Data Science and Deep Learning.
Look to IBM, Google, Microsoft, Amazon, and other providers of Machine Learning APIs to release more inclusive datasets to combat the discrimination and bias embedded in AI.
Machine Learning is the dominant form of Artificial Intelligence today, driving success in fields as diverse as speech recognition in Amazon Alexa, facial recognition in Facebook's auto-tagging feature, pedestrian detection in self-driving cars, and even the decision to show you a shoe ad because you visited a shoe e-commerce site. In supervised Machine Learning, decisions are learned from existing records of human decisions and labels.
For example, to teach a computer to distinguish a dog from a cat, we show it many labeled images of dogs and many labeled images of cats, and it learns the difference. This seemingly innocuous approach embeds a huge problem: bias. If we blindly feed computers human labels and decisions, they may simply replicate our biases. Microsoft's infamous Tay chatbot is a reminder of this.
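To make the point concrete, here is a minimal Python sketch of learning from labels, using a 1-nearest-neighbor classifier as a deliberately simple stand-in for a real image model. The two features per image (ear pointiness and snout length) are hypothetical hand-made numbers; real systems learn features from pixels. The sketch shows that the model has no notion of truth beyond its labels: if a labeler systematically mislabels one kind of cat, the model reproduces that label.

```python
import math

def nearest_neighbor(training, point):
    """1-nearest-neighbor: copy the label of the closest labeled example."""
    _features, label = min(training, key=lambda example: math.dist(example[0], point))
    return label

# Hypothetical hand-made features per image: (ear_pointiness, snout_length) in [0, 1].
labeled = [
    ((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"), ((0.6, 0.5), "cat"),
    ((0.3, 0.8), "dog"), ((0.2, 0.9), "dog"), ((0.4, 0.7), "dog"),
]
print(nearest_neighbor(labeled, (0.85, 0.25)))  # cat
print(nearest_neighbor(labeled, (0.25, 0.85)))  # dog

# A biased labeler who systematically calls one kind of cat a "dog".
biased = [(f, "dog") if f == (0.6, 0.5) else (f, lbl) for f, lbl in labeled]
print(nearest_neighbor(labeled, (0.62, 0.48)))  # cat
print(nearest_neighbor(biased, (0.62, 0.48)))   # dog: the model copies the biased label
```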
Even worse, but more subtle, is bias that comes from data that is not representative of the broader group we want to understand. For example, earlier this year, work by Joy Buolamwini and Timnit Gebru showed that, on the task of classifying a person's gender, major commercially available computer vision products performed best on images of light-skinned men and worst on images of dark-skinned women. It is a massive problem if the datasets we train these classifiers on do not contain enough properly labeled images of people of color and do not capture broader cultural nuances across different places of origin.
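One practical way to catch this kind of gap is disaggregated evaluation: report accuracy separately for each subgroup instead of a single overall number. The sketch below uses made-up toy predictions from a hypothetical gender classifier, not data from the study; it shows how a respectable overall accuracy can hide a large gap between groups.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Accuracy computed separately per subgroup.

    Each record is (group, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

# Made-up toy predictions, for illustration only.
records = [
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("darker_female", "female", "female"),
    ("darker_female", "female", "male"),
    ("darker_female", "female", "female"),
    ("darker_female", "female", "male"),
]
overall = sum(truth == predicted for _, truth, predicted in records) / len(records)
per_group = accuracy_by_group(records)
print(overall)    # 0.75
print(per_group)  # {'lighter_male': 1.0, 'darker_female': 0.5}
```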
Machine Learning models trained on these non-inclusive datasets will make flawed decisions about undersampled people. In 2019, we will see large companies with major computer vision products release more inclusive datasets openly. These datasets will be more balanced across geography, race, gender, cultural concepts, and other dimensions, and because they are released openly, they will supercharge research on minimizing bias in AI.
Adoption of AI in healthcare and financial services will rise as products that make previously black-box AI decisions more interpretable become mainstream.
Life was much simpler when AI was based on algorithms whose decisions could be easily explained.
For example, an algorithm that first checks whether you have a headache, then checks whether you have a fever, and then concludes that you have the flu is interpretable. Whether or not its prediction is right, there is huge value in being able to explain how it reached its decision.
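The headache-and-fever rule can be written as a tiny decision procedure that returns its reasoning path along with its prediction. This is a toy sketch of the example above, not a clinical model; the function name and trace format are illustrative.

```python
def diagnose(symptoms):
    """Toy rule-based diagnosis that returns (prediction, reasoning_trace)."""
    trace = []
    if not symptoms.get("headache", False):
        trace.append("no headache")
        return "not flu", trace
    trace.append("has headache")
    if not symptoms.get("fever", False):
        trace.append("no fever")
        return "not flu", trace
    trace.append("has fever")
    return "flu", trace

prediction, why = diagnose({"headache": True, "fever": True})
print(prediction, "because", " and ".join(why))
# flu because has headache and has fever
```

Every prediction comes with the exact path that produced it, right or wrong, which is the property deep models lack.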
In fields like medicine, where machines may help make life-or-death decisions, it’s clearly important that we can go back and understand why a machine suggested a course of action. In finance, this is also critical. If an AI algorithm denies someone a loan, we need to understand why, and it’s especially important to know that there was no unjustified discrimination. As AI has become more successful, it has relied more and more on a technique called “Deep Learning,” which uses neural networks with many layers (hence the term “deep”). In these systems, there is no clear way to interpret what’s going on or why the machine made a decision.
Such a system is like a black box that takes in a patient's symptoms, measurements, images, current state, and history, and outputs a highly accurate diagnosis.
For example, Google AI has shown that a model can predict whether you are at risk of heart disease simply by looking at a photo of the back of your eye!
What exactly is it about your eyes? No one walks around thinking their eyes look diseased! In 2019, as startups and large companies work to drive the adoption of AI in industries like finance and healthcare, commercially supported systems tailored to these industries will emerge to help us inspect deep neural networks and better interpret their predictions.
There will be attempts to fully automate the explanation of these predictions, but the approach that succeeds will be one that lets humans probe the black box and better understand its decisions, so that the humans behind the machines can come up with their own explanations.
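One simple way to let humans probe a black box is perturbation (or occlusion) analysis: replace one input at a time with a reference value and measure how much the output moves. The sketch below assumes a hypothetical risk model, written as a small logistic formula only so the example runs; in practice the same probing loop would wrap a deep network whose internals you cannot read. The feature names, weights, and baseline values are all made up.

```python
import math

def black_box(features):
    """Hypothetical stand-in for an opaque model: returns a risk score in (0, 1)."""
    z = (0.04 * features["age"] + 1.5 * features["smoker"]
         + 0.02 * features["systolic_bp"] - 6.0)
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity(model, features, baseline):
    """Score drop when each feature alone is reset to its baseline value,
    sorted from most to least influential."""
    original = model(features)
    effects = {}
    for name in features:
        probe = dict(features)
        probe[name] = baseline[name]
        effects[name] = original - model(probe)
    return dict(sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True))

patient = {"age": 60, "smoker": 1, "systolic_bp": 150}
reference = {"age": 40, "smoker": 0, "systolic_bp": 120}  # assumed "typical" values
effects = sensitivity(black_box, patient, reference)
print(list(effects))  # ['smoker', 'age', 'systolic_bp']
```

The ranking does not explain the model on its own; it gives the people reviewing a prediction a starting point for their own explanation.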
Algorithms versus Algorithms
There will be successful AI-powered hacks of AI systems that go beyond “fake news”.
As techniques for generating fake but realistic images and videos advance, and new ways of deceiving Machine Learning algorithms emerge, new security issues will surface for self-driving cars and other mission-critical systems. So far, public concern has centered mainly on images, videos, and audio: broadly speaking, the proliferation of “fake media” and “fake news.” In 2019, we will see demonstrations of attacks that generate convincing but fake structured and unstructured text data, causing problems for automated decision-making in areas like credit scoring and extracting data from documents.
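A deliberately crude sketch shows why text-driven decision systems are exposed. It assumes a hypothetical bag-of-words credit model with made-up word weights, and applies a “good word” style evasion: appending words the model rewards without changing the underlying facts. Real attacks would generate fluent, convincing text rather than a list of keywords, but the failure mode is the same.

```python
# Hypothetical word weights for a toy linear text model that scores
# free-text loan applications; positive weights push toward "approve".
WEIGHTS = {"stable": 1.0, "salaried": 1.2, "savings": 0.8,
           "late": -1.5, "default": -2.0, "debt": -1.0}
THRESHOLD = 0.0

def score(text):
    """Sum the weights of known words (a bag-of-words linear model)."""
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())

def decide(text):
    return "approve" if score(text) > THRESHOLD else "deny"

genuine = "two late payments and one default on prior debt"
# Evasion: append words the model rewards; the facts stay the same.
forged = genuine + " stable salaried savings stable salaried"

print(decide(genuine))  # deny
print(decide(forged))   # approve
```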
Transfer Learning and Simulation will become mainstream and help businesses overcome cold start problems and the high cost of amassing training data.
The success of most AI projects depends largely on the availability of high-quality, labeled data. Most projects die here, either because there is no data about the problem at hand or because it’s extremely difficult to hand-label the data that does exist.
For example, even for something as simple as predicting whether a customer will buy a product, you face the cold start problem at launch, when you have no customers. If your business never gets really big, you never get the “big data” that the most powerful techniques may require. Worse still, collecting thousands of labels, with multiple passes, is incredibly costly when expert knowledge is needed (e.g., labeling a tumor).
An active area of AI research is how to address these challenges. How can we use powerful Deep Learning techniques when we only have small amounts of data? Two approaches will see increased adoption within the enterprise in 2019. The first, which is already working well, is transfer learning, where models trained in a domain with a lot of data are used to bootstrap learning in a different domain with much less data.
For example, Landing AI can detect defects in objects on a manufacturing line using only a handful of defective examples. Anyone can now train a specialized object classifier (e.g., one that spots car or roof damage to automate insurance claims processing) starting from models that have already learned a great deal about images from large datasets like ImageNet. The domains don’t even have to share the same data type: researchers have used models trained on images to build classifiers for sensor data.
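The transfer learning pattern can be sketched as: keep a feature extractor frozen and train only a small classifier head on a handful of labeled examples. In this sketch the “pretrained” backbone is a hypothetical hand-written function (mean brightness and contrast of a toy pixel list) so the code runs without any framework; in practice it would be something like an ImageNet-pretrained network. The pixel values and labels are made up.

```python
import math

def frozen_backbone(pixels):
    """Hypothetical stand-in for a feature extractor pretrained on a large dataset.
    It is never updated below; it returns [mean brightness, contrast]."""
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=500, lr=0.5):
    """Train only a logistic-regression head on top of the frozen features (SGD)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            features = frozen_backbone(pixels)
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = p - label  # gradient of log-loss with respect to the logit
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def predict(head, pixels):
    weights, bias = head
    features = frozen_backbone(pixels)
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) > 0.5

# A handful of labeled parts (1 = defective), as in a few-shot setting.
few_shot = [
    ([0.50, 0.52, 0.48, 0.50], 0),
    ([0.40, 0.42, 0.41, 0.43], 0),
    ([0.50, 0.90, 0.10, 0.50], 1),
    ([0.45, 0.85, 0.20, 0.40], 1),
]
head = train_head(few_shot)
print(predict(head, [0.50, 0.95, 0.05, 0.50]))  # True (high contrast, like the defects)
print(predict(head, [0.45, 0.46, 0.44, 0.45]))  # False
```

Only three numbers are learned here; the heavy lifting is assumed to have happened when the backbone was trained on the data-rich domain.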
The second approach is synthetic data generation and simulation. Generative Adversarial Networks (GANs) let us create extremely realistic data; NVIDIA famously used GANs to generate imaginary but very convincing celebrity faces. Self-driving car companies also create simulated virtual scenarios where they can train their driving algorithms over far greater distances than they could in real life. Waymo, for example, has driven 5 billion miles in simulation compared to 8 million miles on the road. In 2019, companies will leverage simulation, VR, and synthetic data to make giant advances with Machine Learning that were previously impossible due to data constraints.
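Simulation produces labeled data from a model of the world instead of from human labelers. The toy simulator below is not a GAN and not how any particular company works; it draws random speeds and gaps to an obstacle and labels each scenario with the stopping-distance formula v²/(2a). The deceleration constant and sampling ranges are assumptions.

```python
import random

DECELERATION = 7.0  # m/s^2, assumed braking capability of the simulated car

def simulate_scenario(rng):
    """Generate one synthetic driving scenario and its label from simple physics."""
    speed = rng.uniform(0.0, 30.0)  # m/s
    gap = rng.uniform(1.0, 100.0)   # metres to the obstacle
    stopping_distance = speed ** 2 / (2 * DECELERATION)
    label = "brake_now" if stopping_distance >= gap else "continue"
    return (speed, gap), label

def synthetic_dataset(n, seed=0):
    """Seeded so the same synthetic dataset can be regenerated exactly."""
    rng = random.Random(seed)
    return [simulate_scenario(rng) for _ in range(n)]

data = synthetic_dataset(10_000)
print(len(data), sum(label == "brake_now" for _, label in data))
```

Generating ten thousand labeled examples costs milliseconds, which is the economic argument for simulation; the open question is always how faithfully the simulator matches reality.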
Increasing demands for privacy will push more AI to the edge, and internet giants will invest in edge AI to gain a competitive advantage.
As consumers grow more wary of handing all of their data to large internet companies, offering services that don’t require sending data to the cloud will become a competitive advantage. It was generally thought that expensive Machine Learning computations like facial and speech recognition had to run in the cloud, but hardware advances combined with a more privacy-aware climate will push more Machine Learning directly onto mobile phones and even smaller edge devices, reducing the need to send potentially sensitive data to centralized servers.
The trend is in its early stages, with companies like Apple doing Intelligent Processing (running Machine Learning models) on mobile devices rather than in the cloud, e.g., with Core ML and its special-purpose Neural Engine chip, and Google announcing its Edge TPU product. In 2019, we will see the trend accelerate as much more of the mobile, smart home, and IoT ecosystem moves Machine Learning work to the edge.
Next week, stay tuned for Debo Olaosebikan’s mid-year AI predictions and more.
To participate in our AI RADAR, drop us a line at firstname.lastname@example.org