Do Robots Read? The short answer is “YES”, but not in the way you might think. Unless you’re in the field of data science and natural language processing (NLP). 😉 So, how can a machine learning system read? Reading is not an easy task even for humans, since the same text can be interpreted differently by different individuals. This problem is not as prominent in the field of computer vision as it is in NLP. Let’s suppose we are building algorithms that are able to classify images. The task, in general, is non-ambiguous. Unless you’ve never seen a bicycle before, if you see a picture of one you know what it is. This is not often the case when it comes to reading and interpreting text. A good example are the possible interpretations of the following headline found in the Wall Street Journal:
“Republicans Grill IRS Chief Over Lost Emails.”
This type of sentence is a great example because it can have two very different interpretations.
- Republicans harshly question the chief about the emails
- Republicans cook the chief using email as the fuel
Even though the second interpretation might sound ridiculous to you, that might not be the case for a machine.
The intrinsic ambiguity of language
It’s really easy, even for us humans, to misunderstand language sometimes. How we interpret what someone says is conditional on our own feelings, past experience, and situational context.
So, how can data science and machine learning, in particular, solve that kind of problems? Let’s first define what NLP really is. In short, it is the combined usage of programming and math to perform language-based tasks. As we all might agree, mathematics is the only language shared by all humans regardless of culture, religion, or gender. Pi is still approximately 3.14159 regardless of what country you are in or what language you speak. There are various mathematical techniques that are used to analyze text. We can separate two general approaches for this type of analyses.
Frequency-based approach – by counting the word occurrences in the text and not using their order, hence ignoring the contextual meaning of the sentences, it relies on the most frequently used and the most important words for making inferences on the data.
This approach tends to be used in shallow language-processing models rather than in deep-learning models. Models using the frequency-based approach are nonetheless a powerful and unavoidable feature-engineering tool utilized when using lightweight text-processing models such as logistic regression and random forests.
Context-based approach – One-dimensional convolutional neural networks and recurrent neural networks are capable of learning representations for groups of words and characters without being explicitly told about the existence of such groups. They do this by looking at the continuous word or character sequences.
While the models based on the first approach have been used for many years and are still popular today, this second approach has made huge strives during recent years. Unlike frequency-based models, neural language models are able to recognize that two words are similar without losing the ability to encode each word as distinct from the other. Neural language models share statistical strength between one word (and its context) and other similar words and contexts. The distributed representation the model learns for each word enables this type of sharing by allowing the model to similarly treat words that have features in common. These representations, also called word embeddings, are low dimensional floating-point vectors. Embeddings can be learned jointly with the task you are trying to complete. An alternative is to use pre-trained embeddings derived from more general machine-learning tasks.
The main concept behind word embeddings is to compute vectors corresponding to each word in the vocabulary set in a way that preserves the semantic relationships between these words in geometric terms. Word embeddings are meant to map human language into a geometric space. In this sense, one would expect synonyms to be embedded into similar word vectors (vectors that are close in distance). In addition to distance, you may want specific directions in the embedding space to be meaningful.
The prominent examples of meaningful geometric transformations are “gender” vectors and “plural” vectors. The results can be used for arithmetic operations on words. For example, the famous king – man + woman = queen.
These distributed representations are used to improve models for natural language processing because, as mentioned before, they preserve contextual information. Additionally, by using dimensionality reduction techniques, such as t – SNE one can construct word clouds based on these embeddings.
Of course, none of the modern deep learning models truly understand a text in the human sense. Rather, these models can map the statistical structure of written language, which is sufficient to solve many simple textual tasks. Deep learning for natural-language processing is pattern recognition applied to words, sentences, and paragraphs, in much the same way that computer vision is pattern recognition applied to pixels. Hence there are NLP related tasks that modern modeling approaches are quite successful at such as text classification. This includes sentiment analysis, spam detection, etc. Modern techniques are also quite good at machine translation and information extraction. But there are more advanced tasks such as machine conversations and summarization of the documents, the solutions to which are not quite complete. There is active research being done to improve the results in these areas.
The answer to whether robots are able to read, in a mathematical sense, is yes. And one of the goals of modern data science research is to come as close as possible to the imperfect, but in any case, the gold standard of intelligence; human perception of written text.
I hope you enjoyed this article and stay tuned to the Develandoo Blog. There is more exciting stuff to come!
- Artificial Intelligence