A recent study has shed light on how transformer models, the architecture behind modern chatbots, decide what information to pay attention to. Users often treat their prompts as turns in a conversation, but how does a chatbot understand and reference earlier prompts? Researchers have now revealed the mechanism transformer models use to pick out the relevant content.
Imagine handing a chatbot a lengthy text and asking it to summarize the key topics. To do this, the chatbot has to focus on the right details. Samet Oymak, an assistant professor of electrical and computer engineering at the University of Michigan, supervised a study, presented at the Neural Information Processing Systems (NeurIPS) conference on December 13, that demonstrates mathematically how transformers learn this process.
For instance, GPT-4, a large language model, can take in text the length of an entire book. Transformers break the text into smaller segments called tokens, which are processed in parallel while preserving the context around each word. GPT absorbed vast amounts of internet text during training before emerging as a chatbot capable of holding wide-ranging conversations.
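As a rough illustration of what tokenization does (the tiny vocabulary below is invented; production models learn subword vocabularies, such as byte-pair encodings, with tens of thousands of entries):

```python
# Toy illustration of tokenization: mapping text to integer token IDs.
# Real models use learned subword vocabularies; this made-up word-level
# vocabulary just shows the idea.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}

def tokenize(text: str) -> list[int]:
    """Split on whitespace and look each piece up in the vocabulary."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("The cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```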
The key to transformers is their attention mechanism, which determines what information is most relevant. The team led by Oymak discovered that transformers accomplish this with an old-school method: support vector machines (SVMs), invented three decades ago. An SVM draws a boundary that separates data into two categories; SVMs are commonly used in sentiment analysis, for example, to sort customer reviews into positive and negative. Surprisingly, transformers use a similar process to decide what information deserves attention and what can be disregarded.
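For context, here is a minimal sketch of a linear SVM of the kind described above, using scikit-learn; the features and labels are invented for illustration and are not from the study:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2-D feature vectors for reviews (say, counts of positive vs.
# negative words), with labels: 1 = positive sentiment, 0 = negative.
X = np.array([[5, 1], [4, 0], [6, 2], [1, 5], [0, 4], [2, 6]])
y = np.array([1, 1, 1, 0, 0, 0])

# A linear SVM finds the boundary (hyperplane) with the widest margin
# between the two classes.
clf = SVC(kernel="linear")
clf.fit(X, y)

print(clf.predict([[5, 0]]))            # [1]: looks like a positive review
print(clf.coef_, clf.intercept_)        # parameters of the separating boundary
```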
Although a conversation with ChatGPT mimics human interaction, under the hood it is a high-dimensional mathematical process. Each token of text is converted into a vector, a string of numbers. When prompted, ChatGPT uses its attention mechanism to assign a weight to each vector, deciding how much each word and word combination should count toward its response. Operating as a next-word predictor, it then generates the first word of the response and continues, one word at a time, until the response is complete.
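That weighting step can be sketched with textbook scaled dot-product attention; this is the standard formulation found in the transformer literature, not ChatGPT's actual internals:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores every key,
    softmax turns the scores into weights, and the output is a
    weighted average of the value vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # relevance of each token to each query
    weights = softmax(scores)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))            # 4 token vectors of dimension 8
out, w = attention(tokens, tokens, tokens)  # self-attention over the tokens
print(w.round(2))  # each row: how much each token attends to the others
```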
When a new prompt is entered, ChatGPT appears to treat it as a continuation of the conversation. In reality, ChatGPT reprocesses the entire conversation from the beginning, assigns new weights to every token, and crafts a response based on this updated assessment. That is what gives the impression that ChatGPT remembers earlier statements. For instance, given the first hundred lines of Romeo and Juliet and asked to explain the conflict between the Montagues and Capulets, ChatGPT can summarize the most relevant interactions.
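A minimal sketch of why the model seems to remember: the application simply re-submits the whole history on every turn, and attention re-weights all of it from scratch (the `generate` function below is a hypothetical stand-in for a real model call):

```python
# The model has no memory of its own; the chat application re-sends the
# entire conversation every turn.
history = []  # list of (speaker, text) pairs

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an actual language model call."""
    return f"(model response to {len(prompt)} characters of context)"

def chat(user_message: str) -> str:
    history.append(("user", user_message))
    full_context = "\n".join(f"{who}: {text}" for who, text in history)
    reply = generate(full_context)  # the whole conversation goes in again
    history.append(("assistant", reply))
    return reply

print(chat("Here are the first hundred lines of Romeo and Juliet..."))
print(chat("Why are the Montagues and Capulets fighting?"))
```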
The basic operation of transformer neural networks was already well understood, but these architectures have no explicit threshold for deciding what information should be attended to and what should be ignored. That is where the SVM-like mechanism comes in.
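One way to see the connection, as a toy numeric illustration rather than the paper's actual proof: softmax attention has no built-in threshold, but as its scores grow in magnitude, the weights saturate toward a hard keep-or-ignore split, the kind of margin-based separation an SVM computes:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Made-up relevance scores for five tokens. At small scale, softmax spreads
# weight softly across tokens; as the scores are scaled up, the weights
# converge to a 0/1 split: attend to the top token, ignore the rest.
scores = np.array([2.0, 1.0, 0.5, -1.0, -2.0])
for scale in [1, 5, 25]:
    print(scale, softmax(scale * scores).round(3))
# scale=1:  soft weighting across tokens
# scale=25: effectively one-hot, an implicit SVM-like select/ignore decision
```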
According to Oymak, “We don’t understand what these black box models are doing, and they are going mainstream.” This study is one of the first to reveal how the attention mechanism can uncover valuable information within vast amounts of text.
The research team aims to utilize this knowledge to enhance the efficiency and interpretability of large language models. They anticipate its application in other AI areas where attention is crucial, such as perception, image processing, and audio processing.
A more in-depth paper on the topic, titled “Transformers as Support Vector Machines,” will be presented at the Mathematics of Modern Machine Learning workshop at NeurIPS 2023.