Major Challenges of Natural Language Processing

Natural Language Processing (NLP) is a powerful tool with enormous advantages, but it also comes with many drawbacks and difficulties.

Contextual words, phrases, and homonyms

The same words and phrases can have different meanings depending on context, and many words, particularly in English, are spelled and pronounced exactly the same but mean different things.

For example:

I ran to the store because we ran out of milk.

Can I run something past you real quick?

The house is looking really run down.

Humans understand these sentences because we read the surrounding context and pick the right meaning for each use. NLP language models may have learned all of these meanings, but they can still struggle to tell them apart in context.
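One classic way to approach this is to compare the words around an ambiguous term with a short description of each candidate meaning. The sketch below is a heavily simplified version of that overlap idea (in the spirit of the Lesk algorithm), applied to the "run" examples above. The sense labels, glosses, and stopword list are hypothetical and written only for illustration; they do not come from a real dictionary or library.

```python
import re

# Hypothetical mini sense inventory for "run"; the glosses are illustrative only.
SENSES = {
    "move_fast": "move quickly on foot to a place such as a store or school",
    "exhaust_supply": "use up a supply so that none is left, as in run out of milk or money",
    "dilapidated": "in poor condition or neglected, as in a run down house or building",
}

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "as", "or", "so",
             "that", "is", "on", "such", "we", "i", "up"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence) - {"run", "ran"}  # ignore the ambiguous word itself
    scores = {sense: len(context & tokens(gloss)) for sense, gloss in SENSES.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first sense listed
    return best, scores


for example in ["We ran out of milk.", "The house is looking really run down."]:
    print(example, "->", disambiguate(example)[0])
```

A sentence that uses "ran" in two senses at once, like the first example above, shows the limit of this approach: a single sentence-level overlap score can only pick one sense for both occurrences.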

Synonyms

Ali et al. (2019) describe this challenge in their paper: “Distinguishing between antonymy and synonymy is one of the crucial problems for NLP applications, especially those focused on lexical-semantic relation extraction, such as sentiment analysis, semantic relatedness, opinion mining and machine translation.” Synonyms can cause problems for understanding because several different words can express the same meaning. In addition, some synonyms are exact equivalents while others differ subtly, and different people use synonyms to express slightly different meanings in their own usage.

Therefore, when building NLP systems, it is important to account for as many of a term's definitions and synonyms as possible. Text analysis models still make errors at times, but they handle synonyms better as they are trained on more relevant data.
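A very simple way to account for synonyms is to map each word to one canonical term before further analysis. The following is a minimal sketch under that assumption; the synonym groups are hypothetical, and a real system would more likely draw on a thesaurus or learned word embeddings.

```python
# Hypothetical synonym groups, keyed by the canonical term chosen for each group.
SYNONYM_GROUPS = {
    "big": {"big", "large", "huge", "enormous"},
    "small": {"small", "little", "tiny"},
}

# Invert the groups into a lookup table: word -> canonical term.
CANONICAL = {word: canon
             for canon, group in SYNONYM_GROUPS.items()
             for word in group}


def normalize(text):
    """Replace each known synonym with its canonical term; keep other words as-is."""
    return [CANONICAL.get(word, word) for word in text.lower().split()]


print(normalize("a huge dog and a tiny cat"))
```

The trade-off is visible in the table itself: treating "huge" and "big" as identical discards exactly the slight differences in meaning described above.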

Irony and sarcasm

Ilić et al. (2018), citing Pang and Lee (2006), emphasize that “Sarcastic and ironic expressions are prevalent in social media and, due to the tendency to invert polarity, play an important role in the context of opinion mining, emotion recognition, and sentiment analysis.” Irony and sarcasm are difficult for NLP models because people use words and expressions that are, by definition, positive or negative, but that connote the opposite in context.
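The polarity inversion is easy to see with a naive word-counting sentiment scorer. The sketch below is not a real sentiment model; the two word lists are hypothetical and deliberately tiny, and the scorer simply counts positive words minus negative words.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"great", "love", "wonderful", "perfect"}
NEGATIVE = {"terrible", "hate", "awful", "broken"}


def polarity(text):
    """Return (# positive words) - (# negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


sarcastic = "Oh great, another delay. I just love waiting."
literal = "I hate this awful, broken app."
print(polarity(sarcastic), polarity(literal))
```

The sarcastic sentence receives a positive score because every sentiment-bearing word in it is positive on its face, even though a human reader understands it as a complaint.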

Errors in text or speech

Misspelled or misused words can cause problems for text analysis. Autocorrect and grammar checkers can handle common errors, but they do not always recover what the writer intended.
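As a rough illustration, the sketch below corrects unknown words by picking the closest entry in a vocabulary, using `difflib.get_close_matches` from the Python standard library. The vocabulary and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical vocabulary; a real spell checker would use a much larger word list.
VOCABULARY = ["please", "fill", "out", "the", "form", "from", "in", "plain", "language"]


def correct(word, cutoff=0.8):
    """Return the word unchanged if known, else its closest vocabulary match (if any)."""
    if word in VOCABULARY:
        return word  # known words are never questioned, even if misused
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Correct each whitespace-separated word independently."""
    return " ".join(correct(w) for w in text.lower().split())


print(correct_text("please fill out the from in plain langauge"))
```

The misspelling "langauge" is repaired, but "from" (almost certainly meant as "form") is left alone because it is a valid word. Catching that kind of error requires looking at the context, not just the word.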

Colloquialisms and slang

Informal words, phrases, idioms, and culture-specific lingo raise a range of issues for NLP, particularly for models designed for wide-ranging use. Unlike formal language, colloquialisms may not have any “dictionary definitions,” and the same phrase may even have different meanings in different geographical regions. Moreover, slang keeps evolving, and new words appear all the time.
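One simple diagnostic is to check which words in a text fall outside a known vocabulary. The sketch below assumes a hypothetical, very small vocabulary and flags any token it does not contain.

```python
import re

# Hypothetical vocabulary standing in for a dictionary or a model's known words.
KNOWN_WORDS = {"that", "movie", "was", "really", "good", "fire", "the", "is", "and"}


def oov_tokens(text, vocabulary=KNOWN_WORDS):
    """Return the tokens in the text that are not in the vocabulary, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in vocabulary]


print(oov_tokens("That movie was lowkey fire"))
```

A check like this flags the new word "lowkey" but not "fire", which is a dictionary word used here with a slang meaning. Slang that reuses existing words is therefore invisible to a vocabulary lookup.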

Domain-specific language

Different companies and industries often use very different vocabularies. Therefore, highly specialized industries may need to build or train their own models. For example, an NLP model built for health care would differ considerably from one built for processing legal documents.

Sources

Ali, M. A., Sun, Y., Zhou, X., Wang, W., & Zhao, X. (2019). Antonym-Synonym Classification Based on New Sub-Space Embeddings. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 6204–6211. https://doi.org/10.1609/aaai.v33i01.33016204 

Ilić, S., Marrese-Taylor, E., Balazs, J., & Matsuo, Y. (2018). Deep contextualized word representations for detecting sarcasm and irony. Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 1–6. https://doi.org/10.18653/v1/w18-6202 

Pang, B., & Lee, L. (2006). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 1(2), 91–231.