OpenAI (gpt-4o) is unable to use tools properly

Comparing OpenAI, Anthropic, and Meta models on their ability to properly understand LLM tool definitions

I’ve been experimenting with different large language models, using Verida’s Personal Agent Kit to connect a command-line chatbot to my emails, Telegram data, and other private information. During these experiments, I noticed something unexpected in how the various models handle tool definitions.

I started by running my command-line app configured to use Claude 3.5 from Anthropic. I asked it to list the tools it can access, then asked about the parameters for the query email tool. Claude correctly returned the full list of parameters (selector, field, sort, limit, skip, and count), demonstrating complete support for filtering and sorting.
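
For context, tools like this are typically described to the model as a JSON Schema. I don’t have Verida’s exact schema to hand, so the sketch below is a hypothetical reconstruction of what a query-email tool with all six parameters might look like in the common function-calling format; the tool name, types, and descriptions are my assumptions, not Verida’s actual definition:

```python
# Hypothetical tool definition in the common function-calling JSON Schema
# format. Parameter names match what Claude reported; the tool name, types,
# and descriptions are assumptions, not Verida's actual schema.
query_emails_tool = {
    "name": "query_emails",
    "description": "Query the user's email datastore with filtering and sorting.",
    "input_schema": {
        "type": "object",
        "properties": {
            "selector": {
                "type": "object",
                "description": "Filter criteria, e.g. match a sender via regex.",
            },
            "field": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Fields to return for each email.",
            },
            "sort": {
                "type": "array",
                "description": "Sort order, e.g. [{'sentAt': 'desc'}].",
            },
            "limit": {"type": "integer", "description": "Max results to return."},
            "skip": {"type": "integer", "description": "Results to skip (paging)."},
            "count": {
                "type": "boolean",
                "description": "Return only the count of matching emails.",
            },
        },
    },
}
```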

Next, I switched my configuration to use OpenAI’s GPT-4o. After updating the settings, I ran the same command to view the available tools. While GPT-4o did list the tools, with their names and descriptions, when I asked for the parameters of the query email tool it returned only four: field, limit, skip, and count. Notably, it omitted the selector and sort parameters. Without these, it is difficult to filter emails effectively, for example by matching a regular expression or distinguishing sent from received messages.
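
To make concrete what is lost: with selector and sort available, the model can issue a call like the first hypothetical example below; without them, the best it can do is pull unfiltered, unordered pages of results. The Mango/CouchDB-style query syntax and field names here are assumptions on my part, not confirmed against Verida’s actual datastore API:

```python
# Hypothetical call a model could make when it knows about selector and
# sort (query syntax is an assumed Mango/CouchDB style, and the field
# names are invented for illustration).
full_call = {
    "selector": {
        # Only emails from any @example.com address.
        "fromEmail": {"$regex": ".*@example.com$"},
        "type": "received",            # hypothetical sent/received field
    },
    "sort": [{"sentAt": "desc"}],      # newest first
    "limit": 20,
}

# What GPT-4o is left with after dropping selector and sort:
# no filtering, no ordering, just a raw page of results.
degraded_call = {
    "field": ["fromEmail", "subject", "sentAt"],
    "limit": 20,
    "skip": 0,
    "count": False,
}
```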

I also tested Llama 3.3 70B. When configured with Llama, the model initially tried to execute the tools rather than simply list them. On a second attempt, after instructing it to only list the parameters, it returned the complete set, including selector and sort.
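
A useful sanity check here is to dump the tool schemas on the client side, before any model sees them: if the dump shows all six parameters but GPT-4o reports only four, the loss is happening inside the model rather than in the plumbing. A minimal sketch, assuming a LangChain-style tool list (name, description, and args are LangChain BaseTool attributes; adapt for whatever framework your chatbot uses):

```python
# Print each tool's parameter schema as the framework sees it, before any
# model gets involved. Assumes LangChain-style tools; `tools` is whatever
# list your agent was constructed with.
for tool in tools:
    print(f"=== {tool.name} ===")
    print(tool.description)
    for param_name, param_schema in tool.args.items():
        print(f"  {param_name}: {param_schema}")
```

This also sidesteps the problem Llama showed above, since inspecting the schemas programmatically never gives the model a chance to execute anything.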

This inconsistency, particularly with GPT-4o and other OpenAI models, limits their utility in scenarios where full tool parameter support is required. I’m curious to know whether others have experienced similar issues or found workarounds.