Forecasting the New Hampshire Republican Primary with LLMs

By Wojciech Gryc on January 24, 2024

Introduction

The vision behind Emerging Trajectories is that if we have a semi-competent analyst with access to all of the world's information — polling data, articles, models, and more — they should be able to generate better forecasts than experts, superforecasters, or professional analysts.

As part of this exploration, we've built a forecasting platform to log LLM-powered forecasts and track real-world events. Our goal was to see how well LLMs could do in forecasting the results of an upcoming election or event.

Our Experiment

Emerging Trajectories is a relatively new project, so bear with us! We began forecasting via ChatGPT and GPT-4 (gpt-4-1106-preview) on January 17 and 18, respectively. We generated forecasts every day around 7pm ET. We specifically asked both models to predict (a) the proportion of votes to be cast for Trump, (b) the proportion of votes to be cast for Haley, and (c) the difference between the two.

A few important points:

  1. Both models had access to the Internet. In the case of ChatGPT, we were using GPT-4 powered by Bing, while GPT-4 used PhaseLLM's web search agent, which performed a Google search prior to generating results.
  2. We kept prompts the same day over day; there was no user intervention in terms of what the workflow looked like.
  3. We asked the LLMs to justify their predictions, in the hope of understanding (a) what data they were using and (b) what insights we could generate from these results.
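
To make the setup above more concrete, here is a minimal sketch of what a fixed daily prompt and a response parser could look like. This is not our actual code: the prompt wording, the `TRUMP:`/`HALEY:` output format, and the names `PROMPT_TEMPLATE`, `Forecast`, `build_prompt`, and `parse_forecast` are all assumptions made for illustration, and the call to the LLM itself is left out. As a simplification, the sketch derives the Trump-Haley difference from the two shares rather than asking the model for it separately.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt: the exact wording used in the experiment is not shown
# in this post. The key property is that it stays identical day over day,
# with only the date changing.
PROMPT_TEMPLATE = (
    "Today is {date}. Forecast the 2024 New Hampshire Republican primary.\n"
    "Predict the share of votes cast for Trump and for Haley as percentages, "
    "and justify your prediction.\n"
    "End your answer with two lines in exactly this form:\n"
    "TRUMP: <number>\n"
    "HALEY: <number>\n"
)


@dataclass
class Forecast:
    trump_pct: float
    haley_pct: float

    @property
    def margin(self) -> float:
        # Difference between the two candidates, in percentage points.
        return self.trump_pct - self.haley_pct


def build_prompt(date: str) -> str:
    """Fill in the date; nothing else about the prompt changes."""
    return PROMPT_TEMPLATE.format(date=date)


def parse_forecast(response: str) -> Forecast:
    """Extract the two percentages from a response in the assumed format."""
    values = {}
    for name in ("TRUMP", "HALEY"):
        match = re.search(
            rf"^{name}:\s*([0-9]+(?:\.[0-9]+)?)\s*%?\s*$",
            response,
            re.MULTILINE,
        )
        if match is None:
            raise ValueError(f"No {name} line found in the response")
        values[name] = float(match.group(1))
    return Forecast(trump_pct=values["TRUMP"], haley_pct=values["HALEY"])
```

Asking for a rigid final format keeps the free-text justification intact while still letting each day's numbers be logged automatically.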

Results

This was an interesting week, with Ron DeSantis dropping out of the race on January 21. Nikki Haley was also gathering momentum over the course of the week.

The results for Trump and Haley are shown below, in that order.

[Figure: Forecasts for Trump's % of Vote]

[Figure: Forecasts for Haley's % of Vote]

Analysis

Interestingly, our approach using Google Search results and PhaseLLM/GPT-4 performed better than ChatGPT. While we don't know what's happening under the ChatGPT hood, so to speak, we can analyze the responses from ChatGPT to understand a bit more about what's going on.
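
One common way to make "performed better" concrete is to compare each model's daily forecasts against the actual vote share using mean absolute error. The sketch below is an assumption about how such a comparison could be done, not a description of the metric behind the statement above; the function names `mean_absolute_error` and `better_forecaster` are hypothetical.

```python
from typing import Dict, Sequence


def mean_absolute_error(forecasts: Sequence[float], actual: float) -> float:
    """Average absolute gap, in percentage points, between forecasts and the outcome."""
    if not forecasts:
        raise ValueError("At least one forecast is required")
    return sum(abs(f - actual) for f in forecasts) / len(forecasts)


def better_forecaster(series: Dict[str, Sequence[float]], actual: float) -> str:
    """Return the name of the forecast series with the lowest mean absolute error."""
    return min(series, key=lambda name: mean_absolute_error(series[name], actual))
```

Given one list of daily forecasts per model and the final vote share for a candidate, `better_forecaster` returns whichever model was closer on average. A lower error on one election is only one data point, so a comparison like this is suggestive rather than conclusive.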

Conclusion and Next Steps

Both ChatGPT and PhaseLLM/GPT-4 were able to make forecasts regularly, and both incorporated the information provided in their prompts to generate predictions. The information was, however, very limited: content from top-ranking search results, and nothing more. It's very possible that the results of this forecasting process would have been much better with information from specific news outlets, deeper RAG-based fact extraction from content, and even external world models to help validate the LLMs' reasoning and guide them toward conclusions.

We will be running more thorough experiments around other upcoming events in the next few weeks. Stay tuned as we develop a more thorough framework and modeling approach!

Questions?

If you have any questions or if you want to get involved, please email us at hello --at-- phaseai --dot-- com.