I’ve been experimenting with different large language models, using Verida’s Personal Agent Kit to connect a command line chatbot to my emails, Telegram data, and other private information. During these experiments, I noticed something unexpected in how the various models handle tool definitions.
I started by running my command line app configured to use Claude 3.5 from Anthropic. I asked it to list the tools it can access, then asked about the parameters of the query email tool. Claude correctly returned a detailed list of parameters, including selector, field, sort, limit, skip, and count, showing that it had picked up the tool’s full filtering and sorting options.
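For context, here is a rough sketch of what I understand that tool’s parameter schema to look like, based solely on the parameters Claude reported back. The parameter names come from its answer; the types and comments are my own assumptions, not the Personal Agent Kit’s actual definitions.

```typescript
// Illustrative sketch only: the parameter names come from Claude's answer;
// the types and comments are my assumptions, not Verida's actual schema.
interface QueryEmailParams {
  selector?: Record<string, unknown>;        // structured filter over email fields
  field?: string[];                          // which fields to include in results
  sort?: Record<string, "asc" | "desc">[];   // sort order, e.g. newest first
  limit?: number;                            // maximum number of results
  skip?: number;                             // offset, for paging through results
  count?: boolean;                           // return a count instead of documents
}
```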
Next, I switched my configuration to use OpenAI’s GPT-4. After updating the settings, I ran the same command to view the available tools. GPT-4 did list the tools, with names and descriptions, but when I asked for the parameters of the query email tool, it returned only four: field, limit, skip, and count. Notably, it omitted the selector and sort parameters. Without these, it’s difficult to filter emails effectively, for example by matching a regular expression or distinguishing between sent and received emails.
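To make the gap concrete, here is the kind of request the full parameter set allows versus what is left once selector and sort are missing. The selector operators and field names (type, subject, sentAt) are assumptions on my part for illustration, roughly Mango-style as used by document stores, not something I have confirmed against the tool’s actual schema.

```typescript
// Hypothetical request using the full parameter set. The field names and the
// Mango-style selector operators are assumptions for illustration only.
const sentInvoiceEmails = {
  selector: {
    type: "send",                      // assumed field distinguishing sent vs received
    subject: { $regex: "invoice" },    // assumed regex match on the subject line
  },
  sort: [{ sentAt: "desc" }],          // assumed timestamp field, newest first
  limit: 20,
};

// With only field, limit, skip, and count, the most GPT-4 could ask for is an
// unfiltered, unsorted page of emails:
const anyTwentyEmails = { limit: 20, skip: 0 };
```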
I also tested Llama 3.3 70B. When configured with Llama, the model initially tried to execute the tools rather than just list them. On a second attempt, after I made clear that it should only list the parameters, it returned the complete set, including selector and sort.
This inconsistency, particularly with GPT-4 and other OpenAI models, limits their utility in scenarios where full tool parameter support is required. I’m curious to know if others have experienced similar issues or have found workarounds.