The amount of text data we send out into the world is staggering. On average, 500 million tweets, 23 billion text messages, and 306.4 billion emails are sent per day. Everything we say, every email we send, and every word on our resumes can be used to understand the world around us, and it also gives us clues about the individual speaking or writing. Hogan’s Data Science team is exploring how best to capture text data and harness its power in understanding human nature. Below are some questions that people unfamiliar with text-based machine learning often ask us.
Q: What is NLP?
A: Natural language processing (NLP) is a type of artificial intelligence that breaks down, processes, and quantifies human language, often by using machine learning. NLP helps us understand the hidden stories within text-based data.
Q: Why is NLP important?
A: Up to 95% of usable organizational data is unstructured, which is driving organizations to use this data to remain competitive. That competition, together with steady advances in computational power, data access, and open-source research initiatives, has led the field of NLP to evolve and grow constantly.
Q: How do NLP, artificial intelligence, and machine learning relate to each other?
A: Broadly speaking, artificial intelligence refers to using machines to mimic human decision making. The decision making can be either rule-based (the machine is told which rules and procedures to follow explicitly by the designer) or learned (the machine learns the rules and procedures based on data).
Machine learning refers to machines learning from data. A machine is said to learn if its performance on a particular task improves as it is exposed to new experience or new data. Machine learning is a subset of artificial intelligence.
Natural language processing refers to using a machine to quantify human language. NLP includes both rule-based and machine learning techniques. So, NLP is a type of artificial intelligence centered around human language that often uses machine learning.
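The difference between rule-based and learned decision making can be sketched with a toy spam check. This is only an illustration: the function names, the keyword list, and the four training messages are all hypothetical, and real spam filters are far more sophisticated.

```python
from collections import Counter


def rule_based_is_spam(message):
    """Rule-based: the designer writes the rule (a fixed keyword list) by hand."""
    banned = {"prize", "winner"}  # hypothetical hand-picked keywords
    return any(word in banned for word in message.lower().split())


def learn_spam_words(labeled_messages):
    """Learned: derive the keyword list from labeled examples.

    Keeps every word that appears more often in spam than in non-spam.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in labeled_messages:
        target = spam_counts if is_spam else ham_counts
        target.update(text.lower().split())
    return {word for word, count in spam_counts.items() if count > ham_counts[word]}


def learned_is_spam(message, spam_words):
    """Apply the learned keyword list to a new message."""
    return any(word in spam_words for word in message.lower().split())


# Hypothetical labeled data: (message, is_spam)
TRAINING = [
    ("claim your free prize", True),
    ("free cash winner", True),
    ("meeting notes attached", False),
    ("your notes from the meeting", False),
]

spam_words = learn_spam_words(TRAINING)

# The hand-written rule misses this message; the learned list catches it
# because "free" and "cash" appeared only in the spam examples.
print(rule_based_is_spam("free cash today"))           # False
print(learned_is_spam("free cash today", spam_words))  # True
```

Both functions process language, so both count as NLP in the sense described above; only the second one learns its rule from data.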
Q: What are some common NLP techniques?
A: There are several common techniques used in this research, including:
- Bag of words – A representation of text as the words it contains and how often each occurs, ignoring word order.
- Inverse document frequency – A measure of how rare a word is across a set of documents. The more documents a word appears in, the lower its score.
- Lemmatization – Reducing a word to its dictionary form (its lemma). For example, “studies” would become “study.”
- N-grams – Sequences of N consecutive words. For example, “computer science” is a bigram (N = 2).
- Stemming – Reducing a word to its stem. For example, “studies” would become “studi.”
- Stop words – Frequently used or extremely common words often removed in NLP analyses.
- Term frequency – How often a word occurs in a document.
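Several of the techniques above can be sketched in a few lines of plain Python. This is a minimal illustration and makes no claim about Hogan's actual pipeline: the three-sentence corpus, the tiny stop-word list, and all function names are hypothetical, and the inverse document frequency shown is the common log(N / document count) form, which is one of several variants in use. Stemming and lemmatization are left out because they normally rely on a dedicated library.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; each string is one "document".
DOCS = [
    "The team studies personality data",
    "The team codes focus group notes",
    "Computer science studies data",
]

# A tiny illustrative stop-word list (real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each word occurs, ignoring word order."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(word, tokens):
    """Share of a document's tokens that are `word`."""
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens)


def inverse_document_frequency(word, tokenized_docs):
    """log(number of documents / number of documents containing `word`)."""
    containing = sum(1 for doc in tokenized_docs if word in doc)
    if containing == 0:
        return 0.0  # assumption: treat words that never appear as scoring zero
    return math.log(len(tokenized_docs) / containing)


TOKENIZED = [tokenize(doc) for doc in DOCS]

print(bag_of_words(TOKENIZED[0]))
print(ngrams(TOKENIZED[2], 2)[0])            # ('computer', 'science')
print(term_frequency("data", TOKENIZED[0]))  # 0.25
# "personality" appears in one document and "data" in two,
# so "personality" receives the higher inverse document frequency.
print(inverse_document_frequency("personality", TOKENIZED))
print(inverse_document_frequency("data", TOKENIZED))
```

Multiplying term frequency by inverse document frequency gives the familiar TF-IDF weight, which is high for words that are frequent in one document but rare across the collection.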
Q: What are some everyday examples of NLP?
A: There are several applications in which NLP might drive something you encounter and use daily:
- Siri, Alexa, or Google Assistant
- Spell-check
- Autocomplete
- Voice-to-text messaging
- Search engines
- Spam filtering
Q: How is Hogan using NLP?
A: One way we are using NLP is by streamlining the coding process of focus group notes for personality scale relevance. We’re injecting NLP into our job analysis strategy to increase the efficiency of the approach and improve the quality of our results. Manually reading and coding focus group notes is a time-intensive and cognitively draining process. Using NLP, on average, we can decrease the overall time it takes while maintaining predictions that are both consistent and accurate. This approach has already shown promising results for correctly identifying the relevance of personality characteristics from focus group notes. When compared against human raters (i.e., subject-matter experts, or SMEs), our model was consistent and had an average accuracy score higher than the average accuracy of the SMEs. Please see our blog post on NLP from February 11, 2020, for more details.
Q: What are some new research directions Hogan is exploring with NLP?
A: Hogan’s Data Science team has several projects in the works using NLP to expand our insight from available text data housed internally as well as from open-source applications (e.g., O*NET):
- Job family matching using job descriptions – Hogan is exploring using NLP to allow someone to enter their job description and receive the job family and relevant Hogan assessment scales linked to their job description.
- Automatic item writing – Hogan is exploring using natural language generation to automatically generate assessment items that (1) tap specific Hogan personality domains, (2) are equivalent in difficulty and readability to our current items, and (3) are interchangeable with our current items to ensure both test security and fairness of the assessment process.
- Automatic feedback generation – Hogan is exploring using natural language generation to provide accurate, quick, and unique feedback to every user based on that user’s assessment results.