Shortages of engineering talent and high-quality training data are just two of the obstacles we need to address in 2019
At the start of the year, many media outlets were reporting that 2018 would be the “Year of AI.” Analysts estimated that worldwide revenues for cognitive and Artificial Intelligence (AI) systems reached $12.5 billion in 2017, a 60% increase over 2016.
Further, they predicted that investment in AI would continue on that trajectory in 2018 and beyond, achieving a compound annual growth rate (CAGR) of 54.4% through 2020, by which time revenues would exceed $46 billion. While the jury is still out on whether those numbers have been achieved, it’s clear that 2018 has been a big year for AI.
When this is combined with the intense media focus on AI innovation and the way it’s disrupting industries from law to banking, it’s no wonder that misconceptions about Machine Learning abound. Plenty of people are convinced that AI will put us all out of work within a decade.
On the other hand, there are those who believe that hopes for AI are exaggerated and that interest in Machine Learning is nothing but a passing phase. Both of these viewpoints are overly simplistic, and neither helps to clarify matters for those who are actually curious about the effects of automation five, ten, or twenty years from now. The state of AI is a complex topic, but it’s probably fair to say that the current pace of AI innovation is slower than many expected.
Challenges facing the industry
1. Talent shortage
By some estimates there are only 300,000 AI engineers worldwide, when millions are needed. There simply aren’t enough people who truly understand the complex technologies behind AI to sustain, let alone improve on, the ambitious pace we’ve established. Neill Gernon of UK-based Atrovate makes a good point about the skills gap that’s emerging and how it can be bridged.
2. Paucity of AI training data
High-performing algorithms cannot be created without high-quality data. However, it’s proving difficult for companies to collect and organize the enormous amounts of unbiased, categorized, and labeled data needed to train a machine.
Why is AI training data so important?
AI training data is used to build algorithms and teach them to perform tasks. It usually contains pairs of input information and corresponding labeled answers. In some fields, the input will also carry relevant tags to help the algorithm make accurate predictions. For example, in sentiment analysis, the training dataset usually pairs input text with an output label of positive, negative, or neutral. In image recognition, the input would be the image, and the label would indicate what is depicted in it (e.g. table, chair).
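To make the input–label pairing concrete, here is a minimal sketch of what a sentiment-analysis training set might look like in Python. The example texts and variable names are invented for illustration; real datasets would contain thousands of examples.

```python
# A toy sentiment-analysis training set: each example pairs input text
# with a labeled output attribute (positive, negative, or neutral).
training_data = [
    ("The service was fantastic", "positive"),
    ("I will never shop here again", "negative"),
    ("The package arrived on Tuesday", "neutral"),
]

# Separate inputs from labels, the shape most ML libraries expect.
texts = [text for text, label in training_data]
labels = [label for text, label in training_data]
```

The same pattern applies to image recognition: the input would be pixel data rather than text, and the label would name the object depicted.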
Researchers use the training data in an iterative process that fine-tunes the algorithm’s predictions and improves its success rate. Of course, to train algorithms effectively, researchers need a large amount of data. However, data quality is equally important. AI researchers must implement rigorous procedures to ensure the data is clean and organized before using it to train an algorithm. Duplicate, flawed, or irrelevant data can impede an algorithm’s ability to recognize patterns or produce unbiased results. Even small errors, such as incorrectly tagging a word as a noun instead of a verb, can have a significant impact. Just ask Amazon.
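As a rough illustration of the hygiene pass described above, the sketch below drops duplicate, empty, and mislabeled examples from a toy sentiment dataset. The function name, the specific checks, and the sample data are all hypothetical; a real pipeline would apply far more validation (encoding, length limits, class balance, annotator agreement).

```python
def clean_dataset(examples, valid_labels):
    """Drop duplicate, empty, or mislabeled examples before training."""
    seen = set()
    cleaned = []
    for text, label in examples:
        text = text.strip()
        if not text:                    # flawed: empty input
            continue
        if label not in valid_labels:   # flawed: bad or misspelled label
            continue
        key = (text.lower(), label)
        if key in seen:                 # duplicate example
            continue
        seen.add(key)
        cleaned.append((text, label))
    return cleaned

raw = [
    ("Great product!", "positive"),
    ("Great product!", "positive"),  # duplicate
    ("", "negative"),                # empty input
    ("Meh.", "nutral"),              # mistagged label
]
print(clean_dataset(raw, {"positive", "negative", "neutral"}))
# → [('Great product!', 'positive')]
```

Only one of the four raw examples survives the pass, which is exactly the point: unclean data shrinks or distorts what the algorithm actually learns from.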
Ways to boost success
1. Carefully plan your data-gathering roadmap
Most companies don’t know how to use Machine Learning effectively for business, so they aren’t motivated to collect AI training data. These companies often view Machine Learning as just an ‘experiment’ or side project. Even companies that are aware of how Machine Learning can benefit their business are still learning the ropes, so they may not know what kind of algorithm to use (for example, classic models versus neural networks). The starting point for any successful AI project is a training-data requirements document.
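As a hedged illustration of what such a requirements document might capture, here is a sketch as a plain Python dict. The structure, field names, and figures are invented for this example, not a standard; the point is to pin down the task, label set, volume, and sources before any data is gathered.

```python
# Hypothetical training-data requirements for a sentiment-analysis project.
data_requirements = {
    "task": "sentiment analysis of customer reviews",
    "input_format": "plain text, English, max 500 characters",
    "labels": ["positive", "negative", "neutral"],
    "target_volume": 50_000,           # examples needed before training
    "min_examples_per_label": 10_000,  # guard against class imbalance
    "sources": ["support tickets", "product reviews"],
    "annotation": "two independent human labels per example",
}
```

Writing these answers down early forces the team to confront how much data is needed and where it will come from, well before model choice becomes a question.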
2. Don’t underestimate the time it takes to gather data
Companies often decide at the last minute to implement machine learning, right after seeing a competitor release a shiny new AI product. This leads to a stressful scramble to collect data, resulting in a messy dataset hastily assembled from random sources. Realistically, you need to collect data for weeks or months to train a well-performing machine learning model. If you rush that process, or worse, build the algorithm with unclean data, then you’ll end up with a poor model that is unfit for use: an expensive piece of broken tech.
3. Tagging, labeling and classifying — essential tasks that you can outsource
Most Machine Learning algorithms are built on ground-truth training data. To get useful training data that will provide a solid foundation for a model, human annotation is often required. Although there are several crowdsourcing companies, such as Gengo, that can create and annotate datasets efficiently, inexpensively, and to a high standard, many people’s initial perception is that this work requires a huge, expensive workforce.
This misunderstanding sometimes kills promising machine-learning projects over budget concerns, just as these services are enabling small and medium-sized businesses to make an impact in AI. Make sure you’ve done your research and found a company that can fulfil your annotation needs; it could save your project from the scrapheap.
Great AI requires exceptional training — by humans
Ironically, for all the talk of AI displacing humans, humans will be very much in the loop where training is concerned. To drive the pace of AI innovation we’re going to need large volumes of data, and that will require humans to collect, categorize, and prepare it. This task will prove easier for some companies than others.
Large technology companies such as Google and Uber have the resources to build teams and hire employees or contractors who can focus solely on AI training. However, you don’t need an AI department to contribute to the growth of the sector. For small to midsize businesses, there’s an emerging industry of crowd-powered services that can efficiently clean, tag, and annotate their data so that it’s ready for use in their upcoming machine-learning projects.