Artificial intelligence systems, especially large language models such as GPTs, respond to text-based inputs with novel, humanlike text outputs. You can ask for an essay about the fall of Rome or a love poem to your romantic partner, and the system will readily generate it. Such systems can even take medical test information as inputs and generate logically coherent (and possibly correct) diagnoses. Nonetheless, experts do not recommend using these models for medical advice—at least not yet.
To understand why, it helps to know how these AI systems work. GPTs do not have “understanding” of medical science or poetry or the Roman Empire in the sense that humans understand these topics. Instead, these systems have learned associations between words and pieces of text.i When GPTs see a piece of text that says, “Write me an essay on the fall of Rome,” they see the words “write,” “essay,” and “fall of Rome.” They interpret that input as a request to generate a logically coherent (and possibly factually accurate) series of words and phrases connected to the text “fall of Rome” in essay format. The system learned these associations through a long series of trial-and-error efforts using massive computing power on vast amounts of text (i.e., the internet).
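To make the idea of learned word associations concrete, here is a deliberately tiny sketch in Python. It is not how GPT models actually work (they rely on neural networks trained on enormous text corpora), but it shows in miniature how simply counting which words tend to follow which can produce plausible-looking continuations; the three-sentence corpus is invented purely for illustration.

```python
# Toy illustration of "learned associations between words."
# Real GPT models use neural networks trained on vast text corpora; this sketch
# only counts which word tends to follow which, then predicts the most common
# continuation. The tiny corpus below is invented for illustration.
from collections import defaultdict, Counter

corpus = (
    "the fall of rome was gradual . "
    "the fall of rome had many causes . "
    "historians debate the fall of rome ."
).split()

# Count how often each word follows each other word (bigram associations).
associations = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    associations[current_word][next_word] += 1

def predict_next(word):
    """Return the word most strongly associated with `word` in the corpus."""
    followers = associations[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("fall"))  # -> "of"
print(predict_next("of"))    # -> "rome"
```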
Because AI systems can take virtually any input and produce some logically coherent and possibly accurate output, it should come as no surprise that they can also take scores from Hogan assessments as inputs and generate text-based outputs. Given Hogan Personality Inventory (HPI) scores as inputs, ChatGPT will provide interpretive text for those scores. For example, I asked ChatGPT to provide an interpretation of a score of 5 on the HPI’s Ambition scale, and it said:
“Your score in the Ambition scale is low, which suggests that you may not be particularly driven to achieve power, status, or wealth. You may be content with your current position and not feel the need to constantly pursue advancement or recognition. This can be a positive trait as it may allow you to focus on more important things in life.”
This is a logically coherent and reasonable interpretation. However, the interpretation does not say much about potential for leadership, the degree to which the scorer has a sense of direction in life, or the degree to which the scorer is comfortable in front of a large audience—all of which are captured by Hogan’s Ambition scale. Although the interpretation may seem accurate and valid, it may not be driven by any connection to Hogan at all, but simply by how the word “ambition” is used broadly in language.
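For readers who want to try a similar prompt programmatically rather than through the ChatGPT interface, here is a minimal sketch using the OpenAI Python client. The model name and prompt wording are illustrative assumptions; this is not the code behind the example above, which came from the ChatGPT interface.

```python
# Minimal sketch of sending a similar prompt via the OpenAI Python client
# (openai>=1.0). Model choice and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

prompt = (
    "Provide an interpretation of a score of 5 on the Hogan Personality "
    "Inventory's Ambition scale."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any chat-capable model would work
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)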
Keep in mind that Hogan interpretive reports and guidance are based on empirical relationships between our assessments and outcomes. Scores on our assessments mean what they predict, and our reports reflect those relationships. This is not to say that GPT-based interpretations of Hogan scores will not be valid or accurate now or in the future. In fact, training GPT models to reflect Hogan nomenclature is possible, and we are currently working on such tools.
But we must caution against using general-purpose artificial intelligence systems to generate interpretations of Hogan reports. Our own testing indicates that, at least on some occasions, these systems generate interpretations that are grossly incorrect. While we may change our stance in the future as AI systems improve and more test results come in, for the time being we strongly recommend that any Hogan report interpretation come directly from Hogan or a Hogan-certified practitioner.
This blog post was written by Hogan Chief Science Officer Ryne Sherman, PhD.
Note
i. Some might argue that human understanding of these things is also simply association between words and text; we are not so sure we are ready to make that equivalency yet.