New tools for building agents

389 points by meetpateltech 4 months ago

bob1029 4 months ago

I don't know how much this API churn is going to help developers who are trying to integrate OAI into real, actual, non-wrapper products. Every vendor-managed state machine that handles conversation, messages, prompt hand-off, etc., has ultimately proven inadequate, presumptive or distracting for my use cases.

At the end of the day, all I ever seem to use is the chat completion API with structured outputs turned on. Despite my "basic" usage, I am employing tool use, recursive conversations, RAG, etc. I don't see the value in outsourcing state management of my "agent" to a 3rd party. I have way more autonomy if I keep things like this local.

The entire premise of these products is that you are feeding a string literal into some black box and it gives you a new string. Hopefully, as JSON or whatever you requested. If you focus just on the idea of composing the appropriate string each time, everything else melts away. This is the only grain that really matters. Think about other ways in which we compose highly-structured strings based upon business state stored in a database. It's literally the exact same thing you do when you SSR a webpage with PHP. The only real difference is how it is served.

cpfiffer 4 months ago

This is my sense too.
I haven't really found any agent framework that gives me anything I need above a simple structured gen call.
As you say, most requests to LLMs are (should be?) prompt-in structure-out, in line with the Unix philosophy of doing precisely one thing well.
Agent frameworks are simply too early. They are layers built to abstract a set of design patterns that are not common. We should only build abstractions when it is obvious that everyone is reinventing the wheel.
In the case of agents, there is no wheel to invent. It's all simple language model calls.
I commonly use the phrase "the language model should be the most boring part of your code". You should be spending most of your time building the actual software and tooling -- LLMs are a small component of your software. Agent frameworks often make the language model too large a character in your codebase, at least for my tastes.
sippeangelo 4 months ago

I mirror this sentiment. Even their "function calling" abstraction still hallucinates parameters and schema, and the JSON schema itself is clearly way too verbose and breaks down completely if you feed it anything more complex than 5 very simple function calls. This just seems to build upon their already broken black box abstractions and isn't useful for any real world applications, but it's helpful for getting small proof-of-concept apps going, I guess...
- swyx 4 months ago
  
  > Even their "function calling" abstraction still hallucinates parameters and schema
  huh? sample code please? this should not be true since Structured Outputs came out - literally prevented from generating invalid json
  (more: https://www.latent.space/p/openai-api-and-o1)
  
  amitness 4 months ago
  
  It's not enabled by default for their function calling API. So, hallucination is possible.
  You have to set 'strict' to True manually to use the same grammar-based sampling they use for structured outputs.
  https://platform.openai.com/docs/guides/function-calling?api...
Androider 4 months ago

Exactly. You would have to be naive to build a company on top of this kind of API. LLMs are going to be become commodities, and this is OpenAI fighting against that fate as their valuation and continued investment requirements doesn't make any sense otherwise.
If you built on the Assistant API, maybe take the hint and don't just rewrite to the Responses API? Own your product, black box the LLM-of-the-day.
- bob1029 4 months ago
  
  > OpenAI fighting against that fate as their valuation and continued investment requirements doesn't make any sense otherwise.
  Is it actually the case that OpenAI couldn't be viable if all they offered was a simple chat completion API on top of the web experience?
  It seems to me the devil is all in how the margin plays out. I'd focus on driving down costs and pushing boundaries on foundation models. If you are always a half step ahead, highly reliable and reasonably cheap, your competitors will have a tough time. Valuations can be justified if businesses begin to trust the roadmap and stability of the services.
  I'll tell you what's not working right now is the insane model naming scheme and rapid fire vision changes. This kind of stuff is spooking the technology leaders of large prospective customers. Only the most permanently online people can keep things straight. Everyone was super excited and on board with AI in 2024 because who wants to be left out. I think that energy is still justified in many ways, but we've also got to find a way to meet more of the customer base where they are currently at. Wrappers and agentic SDKs are not what these people are looking for. Many F500s already have a gigantic development team who can deal with deep, nasty API integrations and related state contraptions. They're looking for assurances/evidence that OAI's business & product line will remain stable for the next 5+ years before going all-in.
  
  jjfoooo4 4 months ago
  
  The point of the bear thesis on OpenAI is that training frontier models is extraordinarily expensive. They can’t produce cutting edge models, charge a cheap price, and make a profit all at once
- ozim 4 months ago
  
  Looking at all the „AI specialists” that popped up recently- I have feeling there is enough naivety out there for it to work.
  
  dartos 4 months ago
  
  Oh man, don’t look up “vibe coding”
  
  ozim 4 months ago
  
  Too late ;) I ran into 2 guys that were exactly bragging about „vibe coding” on meetups. I just nod ↕
  
  JTyQZSnP3cQGa8B 4 months ago
  
  It feels like I'm becoming way too old for all the new computer stuff. I spent 2 decades trying to use every language available to write reliable programs for everyone, and now the whole world is jumping in this black hole / black box controlled by a few big companies where the output is random and definitely not up to my own standards.
  It's very sad because we were supposed to do better than those who came before us, but instead we're throwing everything in the trash for a so-called productivity that I don't think even exists out of the influencers' brains.
  
  dartos 4 months ago
  
  Me too. That resonates really strongly with my feelings.
  I’m hoping that most people aren’t full steam in AI.
  I haven’t had any coworkers who just rely on AI… I have had some bosses who do tho…
- Terretta 4 months ago
  
  > You would have to be naive to build a company on top of this kind of API.
  You have to be purposefully naive to be a cutting-edge tech entrepreneur in the first place. If you fully acknowledged every risk and roadblock ahead, you’d probably never start.
  But that deliberate naiveté is exactly what’s required to launch a startup in VUCA-space. Outsized success comes from exploiting emerging complexities: betting despite ambiguity, adapting quickly on top of uncertainty, and turning volatility into advantage.
  
  guappa 4 months ago
  
  Recognising when something makes 0 business sense is an important skill for an entrepreneur.
daviding 4 months ago

This bit feels like we are being pushed away from the existing API for non-technical reasons?
> When using Chat Completions, the model always retrieves information from the web before responding to your query. To use web_search_preview as a tool that models like gpt-4o and gpt-4o-mini invoke only when necessary, switch to using the Responses API.
Porting over to the new Responses API is non-trivial, and we already have history, RAG and other things an assistant needs already.
- jjfoooo4 4 months ago
  
  From their perspective, if they don’t have your data, it’s too easy to switch providers.
  
  isoprophlex 4 months ago
  
  Exactly this is what's going on. Moat-building.
- zwily 4 months ago
  
  I can’t find that text in the announcement. In fact it sounds like you have to use a specific model with the chat completions endpoint to get web searches.
  
  bob1029 4 months ago
  
  In the API they are named like "gpt-4o-search-preview".
sagarpatil 4 months ago

Couldn’t have said it better. I’ve developed multiple agents with just function calling, structured outputs and they have been in production for more than a year (back in the day we did not call it agent lol) I think these is targeted towards people who are already using agent frameworks + OpenAI API.
- BonoboIO 4 months ago
  
  What are the agents doing for you? Just interested in your actual use cases.
EGreg 4 months ago

“ These new tools streamline core agent logic, orchestration, and interactions, making it significantly easier for developers to get started with building agents”
Sounds exactly like “the cloud”, especially AWS. Basically “get married to our platform, build on top of it, and make it hard to leave.” The benefits are that it’s easy to get started. And also that they invested in the infrastructure, but now they are trying to lock you in by storing as much state and data as possible with them withoit an easy way to migrate. So, increase your switching costs. For social networks the benefit was that they had the network effect but that doesn’t apply here.
- Androider 4 months ago
  
  All of AWS' big money makers are the meat-and-potatoes services around compute, storage, databases etc. where you could drop their offering and replace it with another in a straightforward way. It will cost you to migrate in terms of time and direct spend (those egress fees...), but it's possible. Companies ultimately stay put because the products work and the price is reasonable, but if they tried to 10X the price overnight everyone would eventually bolt.
  Yeah they keep pushing higher-level services, but the uptake of these is extremely limited. If you used something like SageMaker, which has an extremely high lock-in factor, it's probably because you're an old school company that don't know what you're doing and AWS held your developer's hand to get the Hello World-level app working, but at least you got your name printed in their case study materials of the project at the end.
  I think OpenAI looks at AWS and thinks they can do better. And for their investors, they must do better. But in the end I think the commoditization of LLMs is already almost complete, and this is just a futile attempt to fight it.
ripped_britches 4 months ago

100000000000%
Don’t be fooled by moving state management to somewhere other than your business logic unless it enables a novel use case (which these SDKs do not)
With that said, glad to see the agentic endpoints available but still going to be managing my state this way
malthaus 4 months ago

outsourcing state to openai & co is great for them as vendor lock-in. the real money in AI will be business- and user-interfacing tools built on top of the vendors and it would be a terrible business decision to not abstract away from the model provider in the background and keep all private data under your domain, also from a data protection / legal point of view
i can understand them trying to prevent their business from becoming a commodity but i don't see that working out for them except for some short term buzz, but others will run with their ideas in domain specific applications
edoceo 4 months ago

I just use OpenAI to help me build these "necessary" patterns against their own API. Why make me use some framework when the AI is the framework?
LoganDark 4 months ago

Speaking of string literals, I hate that the state of the art nowadays is to force you to format model inputs as a conversation with separate messages. Ever since OpenAI did that and discontinued the regular completions API it became nearly useless for me since I don't use LLMs for conversation. And because OpenAI is the Apple of LLMs, everyone else is copying that worthless chat messages abstraction and not providing normal completions as they should.
Don't get me wrong, chat completions are nice to have for certain use cases, but that being the only option makes me practically unable to use the model.
danielmarkbruce 4 months ago

100%. I'll build the application, thanks.
But you can't expect them not to try.
mortoc 4 months ago

I get the sense that these sorts of tools are more for power users than for software engineers with production AI experience.
- zombiwoof 4 months ago
  
  Every manager I see now who gave up or was not a good coder is now chomping at the bit to use these tools
  
  Der_Einzige 4 months ago
  
  Not true. It’s impossible to find talent with experience in major agent frameworks like smolagents, autogen/ag2, crewAI.
  I wish that there were tons of managers desperate to learn how to use these tools. I’m not seeing it!
  
  OutOfHere 4 months ago
  
  The talent you speak of can be cultivated in house, but management never has the appetite for it.
samstave 4 months ago

The weak point in the OAI armor is SLAs.
So - are people forming relationships with OAI which include an SLA, and if so - what do those look like?

simonw 4 months ago

There's a really good thread on Twitter from the designer of the new APIs going into the background behind many of the design decisions: https://twitter.com/athyuttamre/status/1899541471532867821

Here's the alternative link for people who aren't signed in to Twitter: https://nitter.net/athyuttamre/status/1899541471532867821

bradyriddle 4 months ago

The nitter link is appreciated!
- ElijahLynn 4 months ago
  
  TIL about Nitter, so grateful as I have Twitter blocked on my computer and phone.
  
  sixhobbits 4 months ago
  
  oh cool, I thought nitter died with the API changes. Glad they have it working again.
telotortium 4 months ago

https://threadreaderapp.com/thread/1899541471532867821.html

mrtksn 4 months ago

I feel like all those AI agent attempts are misguided at their core because they don't attempt to create new ways but replace humans on the legacy systems. This is fundamentally shortsighted because the economy, life and everything is about humans interacting with humans.

The current AI agent approach appears to be permutations of the joke about how people will make AI to expand their once sentence to a long nice e-mail and the AI on the receiving end will summarize that long e-mail into single sentence.

I get that there's a use case for automating tasks on legacy systems but IMHO the real opportunity is the opportunity to remove most of the legacy systems.

Humans are not that bad you know? Is it creating UIs for humans using AI then make AI use these UI to do stuff really the way forward?

NitpickLawyer 4 months ago

> because the economy, life and everything is about humans interacting with humans.
How many hand crafted, clay bowls, baked in a human powered kiln are you using everyday? Or how many weaved baskets, made out of hand picked sticks?
History has showed that anything that can be automated, will be automated. And everything that can be made "cheaper" or "faster" will as well.
- mrtksn 4 months ago
  
  That's not the point though, I'm not anti-automation or anything like that. The point is, using robots on interfaces and systems made for people is not the way to go.
  Why would you want to have your swipes on Tinder and your trip planning to Rio be automated through human interface? If it was for legit reasons it would have happened as from machine to machine communications. I'm big fan of the AI agent concept, my objection is that at in its current state people don't think out of the box and propose using the current infrastructure to delegate human functions instead of re-imagining the new world that is possible when working together with AI.
  
  NitpickLawyer 4 months ago
  
  > my objection is that at in its current state people don't think out of the box and propose using the current infrastructure to delegate human functions instead of re-imagining the new world that is possible when working together with AI.
  Ah, my bad I missread your initial post.
  If I now understand what you're saying, I think there's a paralel in manufacturing, where "custom made bots" on assembly line will win against "humanoid bots" every time. The problem there is that you have to first build the custom-made bots, and they only work on that one task. While a "humanoid" bot can, in theory, do more general things with tools already in place for humans.
  I think specialised APIs and stuff will eventually be built for AI agents. But in the meantime everyone wants to be first to market, and the "human facing" UI/UX is all we have. So they're trying to make it work with what's available.
  
  mrtksn 4 months ago
  
  Right, IMHO the amazing thing about AI is that it can actually build the custom made bot from scratch every time you need it.
  They just need to go a few steps back and evaluate why this system was needed in first place. Awful lot of software and all kinds of interfaces exists only to accommodate humans who need to be in the loop when working with machines and are not actually needed if you are taking the human out of the loop. You can be taking humans off the loop for legit or nefarious reasons and when its legit there's usually opportunity to coordinate with the other machines to remove the people specific parts to make things more efficient.
  In programming this is even more evident, i.e %100 of the programming libraries exist only to make developers do stuff easier or prevent re-inventing the wheel.
  The part about making the developers life easier is quite substential and can be removed by making the AI write the exact code needed to accomplish the task without bothering with human developer accommodations like libraries to separate the code into modules for maintainability.
bob1029 4 months ago

I think the most valuable path for the current generation of AI models is integrating them with the configuration and administration side of the product.
For example, as a supplemental user experience that power users in your org can leverage to macro out client configuration and project management tasks in a B2B SaaS ecosystem. Tool use can be very reliable when you have a well constrained set of abstractions, contexts and users to work with.

zellyn 4 months ago

Notably not mentioned: Model Context Protocol https://www.anthropic.com/news/model-context-protocol

koconder 4 months ago

100% but this is not the same thing, nor is this going to replace the agent SDK (or visa versa). Agents will always need some form of communication protocol, if we look at the world and agentic frameworks its a sea of logos and without some forms of open standards this would be hard.
I'm currently at Comet and I have personally worked on MCP implementations AND have made some contributions to Agent SDK in the form of a native integration and improvement to test suite.
- https://github.com/comet-ml/opik-mcp
- https://github.com/openai/openai-agents-python/pull/91
Our recent integration shipped on day 1:
- https://www.comet.com/docs/opik/tracing/integrations/openai_...
I think the key to what OpenAI is pushing towards is simplicity for developers through very easy to use components. I won't comment on the strategy or pricing etc, but on first glance as a developer the simple modular approach and lack of bloat in their SDK is refreshing.
Kudos to the team and people working on the edge to innovate and think differently in an already crowded and shifting landscape.
nilslice 4 months ago

not implementing doesn't mean its not supported https://github.com/dylibso/mcpx-openai-node (this is for mcp.run tool calling with OpenAI models, not generic)
but yes, it's the strongest anti-developer move to not directly support MCP. not surprised given OpenAI generally. but would be a very nice addition!
- benatkin 4 months ago
  
  DeepSeek doesn’t seem to support it either FWIW. Maybe MCP is just an Anthropic thing.
  
  nilslice 4 months ago
  
  It is not only an Anthropic thing, and it works with any model that supports function calling, which DeepSeek did not when it first launched. That probably has changed since, but I haven't looked!
  
  benatkin 4 months ago
  
  I don't like it. I don't like the OpenAI API all that much either but at least it's lightweight. I think MCP would fit better on mcp.anthropic.com to go along with their email address mcp-support@anthropic.com at the bottom of https://modelcontextprotocol.io/
  I wish they'd done a smaller launch of it and gather feedback rather than announcing a supposed new standard which feels a lot like a wrapper.
  This here is atrocious https://github.com/modelcontextprotocol/quickstart-resources... It includes this mcp PyPI package which pulls in a bunch of other PyPI dependencies. And for some reason they say "we recommend uv". How is that related to just setting up a tool for an AI to use?
  Compare that to this get weather example: https://api-docs.deepseek.com/guides/function_calling/
  It makes me not want to use Claude/Anthropic.
  
  burningion 4 months ago
  
  That example code on DeepSeek doesn't actually include the logic to call a weather API? It just puts a fake answer back in, and you've got to handle the process manually.
  The pyproject.toml in the Model Context Protocol example is just showing the new, "best" way to distribute and install Python projects and dependencies. If you haven't used uv before, it makes working with Python projects substantially better.
  The Model Context Protocol server lets the model autonomously use the tool and incorporate its result. It's a much cleaner (imo obviously) separation of tool definition and execution.
thenameless7741 4 months ago

it's mentioned in the main thread: https://nitter.net/athyuttamre/status/1899511569274347908
> [Q] Does the Agents SDK support MCP connections? So can we easily give certain agents tools via MCP client server connections?
> [A] You're able to define any tools you want, so you could implement MCP tools via function calling
in short, we need to do some plumbing work.
relevant issue in the repo: https://github.com/openai/openai-agents-python/issues/23
knowaveragejoe 4 months ago

You can (somewhat) bridge between them:
https://github.com/SecretiveShell/MCP-Bridge
dgellow 4 months ago

Do you have experience with MCP? If yes, what do you think of it?
- smcleod 4 months ago
  
  It's great! Easy to work with, makes it quick to build tools and isnt over complicated.
- singularity2001 4 months ago
  
  not OP but giving Claude access to local files / emails / database / terminal was … futuristic! (until I hit their stupid request limit)
  
  consumer451 4 months ago
  
  I have been using Windsurf+Sonnet for a couple months, and recently adding Supabase MCP was a total game changer for velocity. I can't believe I waited so long to configure that.
  Querying schema from prompt is great, but also being able to say "I cannot see the Create Project button on the projects list screen. Use MCP to see if user with email me@domain.com has the appropriate permissions" is just amazing.
esafak 4 months ago

How do they compare?
- cowpig 4 months ago
  
  MCP is a protocol, and Anthropic has provided SDKs for implementing that protocol. In practice, I find the MCP protocol to be pretty great, but it leaves basically everything except the model parts out. I.e. MCP really only addresses how "agentic" systems interact with one another, nothing else.
  This SDK is trying to provide a bunch of code for implementing specific agent codebases. There are a bunch of open source ones already, so this is OpenAI throwing their hat in the ring.
  IMO this OpenAI release is kind of ecosystem-hostile in that they are directly competing with their users, in the same way that the GPT apps were.
  
  TeeWEE 4 months ago
  
  Correction: the MCP is a protocol for function calling and getting context into a model. It’s can run locally or over a network.
  It does not specify how “agentic” systems interact with each other. Depending on what you mean there.
  
  cowpig 4 months ago
  
  People are using the word "agentic" to mean this, I think. But yeah it's a dumb, overloaded primarily marketing word. I keep going back and forth on whether I should use the word "agentic"/"agent" at all
  
  esafak 4 months ago
  
  Thank you. Which open source ones do you recommend?
  
  cowpig 4 months ago
  
  Here's a fairly comprehensive list:
  https://github.com/slavakurilyak/awesome-ai-agents
  CrewAI is a popular VC-backed one, but two that I think are kind of interesting in the open source space are:
  https://github.com/i-am-bee/beeai-framework
  https://github.com/lastmile-ai/mcp-agent
  ... However I think the vast majority of "AI Agent" use-cases in practice right now are actually just workflows, and imo dify is great for those:
  https://github.com/langgenius/dify
  [edit] worth mentioning [langfuse](https://github.com/langfuse/langfuse), which is more like a platform that addresses the observability/evals/prompt management piece of the puzzle as opposed to a full-on "agent framework". In practice I have not yet run into a case where I needed something like what OpenAI just released, nor crewAI etc (despite it feeling like those cases may be coming)
  
  esafak 4 months ago
  
  Thanks. mcp-agent lacks tests so I'm skipping it for now: https://github.com/lastmile-ai/mcp-agent/tree/main/tests
  I just realized BeeAI is IBM's project: https://www.ibm.com/think/news/beeai-open-source-multiagent
  I also see there's https://ai.pydantic.dev/ but it lacks MCP support. Finally, the MCP site maintains a nice client list:
  https://modelcontextprotocol.io/clients#feature-support-matr...

swyx 4 months ago

swyx here. we got some preview and time with the API/DX team to ask FAQs about all the new APIs.

https://latent.space/p/openai-agents-platform

main fun part - since responses are stored for free by default now, how can we abuse the Responses API as a database :)

other fun qtns that a HN crew might enjoy:

- hparams for websearch - depth/breadth of search for making your own DIY Deep Research

- now that OAI is offering RAG/reranking out of the box as part of the Responses API, when should you build your own RAG? (i basically think somebody needs to benchmark the RAG capabilities of the Files API now, because the community impression has not really updated from back when Assistants API was first launched)

- whats the diff between Agents SDK and OAI Swarm? (basically types, tracing, pluggable LLMs)

- will the `search-preview` and `computer-use-preview` finetunes be merged into GPT5?

suttontom 4 months ago

What is a "qtns"?
- oofbaroomf 4 months ago
  
  Questions.
  
  xdavidliu 4 months ago
  
  this is why I don't like NSA (non-standard acronyms). It saves half a second for the typer, but causes hours if not days of confusion when summed over all the readers.
  
  sixhobbits 4 months ago
  
  This is an OF (online forum) my bruv, you don't need to follow a style guide to post here
mritchie712 4 months ago

for anyone that likes the Agents SDK, but doesn't want their framework attached to OpenAI, we're really liking PydanticAI[0].
0 - https://ai.pydantic.dev/
- darkteflon 4 months ago
  
  There’s also HF’s smolagents[1].
  1 - https://github.com/huggingface/smolagents
- fullstackwife 4 months ago
  
  Openai SDK docs:
  > Notably, our SDK is compatible with any model providers that support the OpenAI Chat Completions API format.
  so you can use with everything, not only OpenAI?
  
  swyx 4 months ago
  
  yea they mention this on the pod
  
  DrBenCarson 4 months ago
  
  Yes
- startupsfail 4 months ago
  
  Does it encode everything as json object, so special characters are getting escaped?
  I’ve noticed that with longer responses (particularly involving latex), models are a lot less accurate when the results need to be additionally encoded into JSON.
  I like structured, but my preference is yaml/markdown, as it is a lot more readable (and the only thing that works with longer responses, latex or code generation).
ggnore7452 4 months ago

appreciate the question on hparams for websearch!
one of the main reasons i build these ai search tools from scratch is that i can fully control the depth and breadth (and also customize loader to whatever data/sites). and currently the web search isn't very transparent on what sites they do not have full text or just use snippets.
having computer use + websearch is definitely something very powerful (openai's deep research essentially)

rvz 4 months ago

They did not announce the price(s) in the presentation. Likely because they know it is going to be very expensive:

   Web Search [0]
    * $30 and $25 per 1K queries for GPT‑4o search and 4o-mini search.

   File search [1]
    * $2.50 per 1K queries and file storage at $0.10/GB/day
    * First 1GB is free.

   Computer use tool (computer-use-preview model) [2]
    * $3 per 1M input tokens and $12/1M output tokens.

[0] https://platform.openai.com/docs/pricing#web-search

[1] https://platform.openai.com/docs/pricing#built-in-tools

[2] https://platform.openai.com/docs/pricing#latest-models

rudedogg 4 months ago

Those prices, especially for web search, are absurd.
I have a hard time seeing how this API is better than https://www.anthropic.com/news/model-context-protocol.
It seems like the motivation was "how can we make more money", rather than "how can we be more useful for our users".
yard2010 4 months ago

So they're basically pivoting from selling text by the ounce to selling web searches and cloud storage? I like it, it's a bold move. When the slow people at Google finally catch up it might be too late for Google?
- KoolKat23 4 months ago
  
  Google AI Studios "Grounding" basically web search is priced similarly. (Very expensive for either, although Google gives you your first 1500 queries free).
  It seems completely upside down, they always said traditional search was cheaper/less intensive, I guess a lot of tokens must go into the actual LLM searching and retrieving.
_bramses 4 months ago

For anyone who's looking Brave Search is $3/1000 requests [0]
I also wrote a script that searches the web and works pretty well (using the vercel ai sdk)[1]
[0] - https://brave.com/search/api/
[1] - https://gist.github.com/bramses/41e90b27d156590154bcefd4119f...

anorak27 4 months ago

I have built myself a much simpler and powerful version of the responses API and it works with all LLM providers.

https://github.com/Anilturaga/aiide

grvdrm 4 months ago

Thank you for your detailed Readme. A relief / joy to read compared to many other libraries/etc. that provide one basic (if that) example and otherwise leave to your own trial/error.
bsenftner 4 months ago

For so few Github stars, I'm surprised that this is the 4th time I'm reading about your aiide project in 2 days. It looks good, very good BTW.

jumploops 4 months ago

> “we plan to formally announce the deprecation of the Assistants API with a target sunset date in mid-2026.”

The new Responses API is a step in the right direction, especially with the built-in “handoff” functionality.

For agentic use cases, the new API still feels a bit limited, as there’s a lack of formal “guardrails”/state machine logic built in.

> “Our goal is to give developers a seamless platform experience for building agents”

It will be interesting to see how they move towards this platform, my guess is that we’ll see a graph-based control flow in the coming months.

Now there are countless open-source solutions for this, but most of them fall short and/or add unnecessary obfuscation/complexity.

We’ve been able to build our agentic flows using a combination of tool calling and JSON responses, but there’s still a missing higher order component that no one seems to have cracked yet.

falcor84 4 months ago

I'm impressed by the advances in Computer Use mentioned here and this got me wondering - is this already mature enough to be utilized for usability testing? Would I be right to assume that in general, a UI that is more difficult for AI to navigate is likely to also be relatively difficult for humans, and that it's a signal that it should be simplified/improved in some way?

m3t4man 4 months ago

Why would you assume that? Modality of engagement is drastically different between the way LLM engages with UI vs human being
- falcor84 4 months ago
  
  Oh, I had assumed that it was trained on human interaction data and should be generally similar, and from the examples I saw - it generally was (although still not as good as us). In what sense do you expect it to be drastically different?
  
  m3t4man 3 months ago
  
  LLM engages with UI in a pragmatic way. It doesn't care about aesthetics or structural complexity, it doesn't care about latency or tactility. It doesnt have anywhere close to similar preferences and reasons for why it Vs a human being would and wouldn't click on or engage with something on a UI.
  Do you really see no difference in the way a hollow algorithm interacts with UI vs how a person would go about interacting with UI?

ilaksh 4 months ago

The Agents SDK they linked to comes up 404.

BTW I have something somewhat similar to some of this like Responses and File Search in MindRoot by using the task API: https://github.com/runvnc/mindroot/blob/main/api.md

Which could be combined with the query_kb tool from the mr_kb plugin (in my mr_kb repo) which is actually probably better than File Search because it allows searching multiple KBs.

Anyway, if anyone wants to help with my program, create a plugin on PR, or anything, feel free to connect on GitHub, email or Discord/Telegram (runvnc).

edwinarbus 4 months ago

sorry, fixed link: https://openai.github.io/openai-agents-python/
yablak 4 months ago

Loads fine for me. Maybe because I'm logged in?
- IncreasePosts 4 months ago
  
  That should be a 403 then. Tsk tsk open ai
  
  29ebJCyy 4 months ago
  
  Technically it should be a 401. Tsk tsk IncreasePosts.
  
  __float 4 months ago
  
  It's common (see: S3, private GitHub repos) to return 404 instead of unauthorized to avoid even leaking existence of a resource at URL.

serjester 4 months ago

This is one of the few agent abstractions I've seen that actually seems intuitive. Props to the OpenAI team, seems like it'll kill a lot of bad startups.

sdcoffey 4 months ago

Steve here from the OpenAI team–this means a lot! We really hope you enjoy building on it

dazzaji 4 months ago

I was fortunate to get early access to the new Agent SDK and APIs that OpenAI dropped today and made an open source project to show some of the capabilities [1]. If you are using any of the other agent frameworks like LangGraph/LangChain, AutoGen, Crew, etc I definitely suggest giving this agent SDK a spin.

To ease into it, I added the entire SDK with examples and full documentation as a single text file in my repo [2] so you can quickly get up to speed be adding it to a prompt and just asking about it or getting some quick start code to play around with.

The code in my repo is very modular so you can try implementing any module using one of the other frameworks to do a head-to-head.

Here’s a blog post with some more thoughts on this SDK [3] and some if its major capabilities.

I’m liking it. A lot!

[1] https://github.com/dazzaji/agento6

[2] https://raw.githubusercontent.com/dazzaji/agento6/refs/heads...

[3] https://www.dazzagreenwood.com/p/unleashing-creativity-with-...

mentalgear 4 months ago

Well, I'll just wait 2-3 days until a (better) open-source alternative is released. :D

Areibman 4 months ago

Nice to finally see one of the labs throwing weight behind a much needed simple abstraction. It's clear they learned from the incumbents (langchain et al)-- don't sell complexity.

Also very nice of them to include extensible tracing. The AgentOps integration is a nice touch to getting behind the scenes to understand how handoffs and tool calls are triggered

esafak 4 months ago

Extensible how?
swyx 4 months ago

why agentops specifically? doesnt the oai first party one also do it?
- Areibman 4 months ago
  
  The OpenAI dash is great but is clearly missing a lot of features (i.e. data export, alerts, non-oai model compatibility). Believe it or not, but they don't even report response API costs on spans
  
  swyx 4 months ago
  
  oh lol i wasnt looking at usernames. it's you! spiderman pointing

LeoPanthera 4 months ago

I don't know about agents, but this finally adds the ability to search the web to the API. This is a very useful big deal.

Kind of annoying that they've made a bunch of tiny changes to the history format though. It doesn't seem to change anything important, and only serves to make existing code incompatible.

simonw 4 months ago

If you want to get an idea for the changes, here's a giant commit where they updated ALL of the Python library examples in one go from the old chat completions to the new resources APIs: https://github.com/openai/openai-python/commit/2954945ecc185...

sixhobbits 4 months ago

Aa lot of criticism about the potential of vendor lockin etc, but I think this is great, especially for building proof of concepts and small projects. As they said, these are the first building blocks and they look great to me.

When gpt3.5 came out these are literally the first things I built manually. I mainly use LLMs through a telegram bot. I know there are a lot of tools and frameworks out there but I wrote a few hundred lines of hacky python to give my bot memory, web search, image analysis. It's fun and useful and I agree that these are the basic building blocks that many apps need.

Sure you can find better stuff elsewhere with less lock in and more control, but now it "just works" and this responses api is cleaner and more powerful than the chatcompletions one, so personally I'm happy to give openai credit for this, I just don't know why they couldn't have released it two years ago

cowpig 4 months ago

Feels like OpenAI really want to compete with its own ecosystem. I guess they are doing this to try to position themselves as the standard web index that everyone uses, and the standard RAG service, etc.

But they could just make great services and live in the infra layer instead of trying to squeeze everyone out at the application layer. Seems unnecessarily ecosystem-hostile

pas 4 months ago

they target new entrants probably, they need more revenue, and more importantly a killer app that's at least a bit tied to them.

nowittyusername 4 months ago

How does this compare to MCP? Anyone has any considerations on the matter?

tiniuclx 4 months ago

At $30 per 1k search queries, the OpenAI search API seems very expensive. Perplexity's Sonar model charges just $5 per thousand searches [0].

I wonder what justifies this drastic difference in price.

[0] https://docs.perplexity.ai/guides/pricing

theuppermiddle 4 months ago

Does the SDK allow executing Python code generated in some sort of sandbox? If not are there any open source library which does this for us? I would ideally like the state of the code executed, including return values, available for the entire chat session, like IPython, so that subsequent LLM generated code can use them.

sci_prog 4 months ago

Yeah, OpenInterpreter does this (you are not limited to OpenAI only): https://github.com/OpenInterpreter/open-interpreter
I wrote a wrapper around it that works in a web browser (you'll need an OpenAI API key): https://github.com/uhsealevelcenter/IDEA

phren0logy 4 months ago

I'm a bit surprised at the approach to RAG. It will be great to see how well it handles complex PDFs. The max size is far larger than the Anthropic API permits (though that's obviously very different - no RAG).

The chunking strategy is... pretty basic, but I guess we'll see if it works well enough for enough people.

nextworddev 4 months ago

This may be bad for Langflow, Langsmith, etc

daviding 4 months ago

It would have been nice if the Completions use of the internal web-search tool wasn't always mandatory and could be set to 'auto'. Would save a lot of reworking just to go the new Responses API format just for that use case.

laacz 4 months ago

Won't the industry eventually go through all stages of the smart home? Early protocols, often completely incompatible and perpendicular, then few competing standards, etc?

hodanli 4 months ago

I wonder why they phased out Pydantic in structured output for the Responses API.

sdcoffey 4 months ago

Hey there! This is Steve here from the OpenAI team–I worked on the Responses API. We have not removed this! It should still work just like before! Here's an example:
https://github.com/openai/openai-python/blob/main/examples/r...
- hodanli 4 months ago
  
  Hi Steve,
  I based my assumption on the examples in the documentation[^1]. It is great that we can still use that.
  1. https://platform.openai.com/docs/guides/structured-outputs?a...
lunarcave 4 months ago

(Shameless plug) I worked on something for anyone else wanting to get structured outputs from LLMs in a model agnostic way (Including Open AI models): https://github.com/inferablehq/l1m

andrethegiant 4 months ago

$25 per thousand searches seems excessive

tiniuclx 4 months ago

Perplexity charges $5 per 1k searches for their Sonar API - this is pretty ridiculous.
shrisukhani 4 months ago

ya i'm sure they'll get a bunch of usage despite that but don't know who would use it at any kind of scale with that pricing
otoh, they've dropped prices for everything else a ton previously so maybe they will for this as well

casey2 4 months ago

It's so obvious when you are just intentionally holding back releases just to steal mindshare from your competition. YAWN!

cosbgn 4 months ago

We handle over 1M requests per month using the Assistant API on https://rispose.com which apparently will get depreciated mid 2026. Should we move to the new API?

nknj 4 months ago

there's no rush to do this - in the coming weeks, we will add support for:
- assistant-like and thread-like objects to the responses api
- async responses
- code interpreter in responses
once we do this, we'll share a migration guide that allows you to move over without any loss of features or data. we'll also give you a full 12 months to do your migration. feel free to reach out at nikunj[at]openai.com if you have any questions about any of this, and thank you so much for building on the assistants api beta! I think you'll really like responses api too!
- marko-k 4 months ago
  
  If Responses is replacing Assistants, is there a quickstart template available—similar to the one you had for Assistants?
  https://github.com/openai/openai-assistants-quickstart
jstummbillig 4 months ago

Eventually, yes. The addressed Assistant API near the end of the the video: They say there will be a transition path, once they built all Assistant features into the new API, and ample time to take action.

the_clarence 4 months ago

Can you actually see what requests it makes to be able to answer?

shrisukhani 4 months ago

Curious what people think about the CUA API pricing? Any thoughts on what use cases it may or may not work for (the pricing specifically)?

dmayle 4 months ago

Is it just me, or is what OpenAI is really lacking is a billing API/platform?

As an engineer, I have to manage the cost/service ratio manually, making sure I charge enough to handle my traffic, while enforcing/managing/policing the usage.

Additionally, there are customers who already pay for OpenAI, so the value add for them is less, since they are paying twice for the underlying capabilities.

If OpenAPI had a billing API/platform ala AppStore/PlayStore, I have multiple price points matched to OpenAI usage limits (and maybe configurable profit margins).

For customers that don't have an existing relationship with me, OpenAI could support a Netflix/YouTube-style profit-sharing system, where OpenAI customers can try out and use products integrated with the billing platform/API, and my products would receive payment in accordance with customer usage...

mrcwinn 4 months ago

One, if you charge above API costs, you should never police usage (so long as you're transparent with customers). Why would you need to cap usage if you're pricing correctly? (Rate limits aside)
Two, yes, many people will pay $20/mo for ChatGPT and then also pay for a product that under the hood uses OpenAI API. If you're worried about your product's value not being differentiated from ChatGPT, I'd say you have a product problem moreso than OpenAI has a billing model problem.

nnurmanov 4 months ago

Does anyone know if there are any difference if you typed the question with typos vs you did it correctly?

davidbarker 4 months ago

In theory there shouldn't be — LLMs are pretty robust to typos and usually infer the intended meaning regardless.

nekitamo 4 months ago

Does the new Agents SDK support streaming audio and Realtime models?

rohanmehta1 4 months ago

Not yet, but it's on the roadmap!

preaching5271 4 months ago

When TypeScript SDK?

huqedato 4 months ago

OpenAI has become the Yahoo of the AI landscape.

baxtr 4 months ago

A bit off topic but the post comes handy: can we settle the debate what an agent really is? It seems like everyone has their own definition.

Ok I’ll start: an agent is a computer program that utilized LLMs heutiger for decision making.

knowaveragejoe 4 months ago

I think Anthropic's definition makes the most sense.
- Workflows are systems where LLMs and tools are orchestrated through predefined code paths. (imo this is what most people are referring to as "agents")
- Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
https://www.anthropic.com/engineering/building-effective-age...
- kodablah 4 months ago
  
  The problem with this definition is that modern workflow systems are not through predefined code paths, they do dynamically direct their own processes and tool usage.
  
  dzzen 4 months ago
  
  > they do dynamically direct their own processes and tool usage how are they dynamic? if there is a code written for it then they are not dynamic.
  If they are driven by LLMs interpretation then there is no explicit code written and it's figured out at runtime
3stripe 4 months ago

First rule of writing definitions: use everyday English.
- baxtr 4 months ago
  
  True! Meant heuristic
rglover 4 months ago

Agents are just regular LLM chat bots that are prompted to parse user input into instructions about what functions to call in your back-end, with what data, etc. Basically it's a way to take random user input and turn it into pseudo-logic you can write code against.
As an example, I can provide a system prompt that mentions a function like get_weather() being available to call. Then, I can pass whatever my user's prompt text is and the LLM will determine what code I need to call on the back-end.
So if a user types "What is the weather in Nashville?" the LLM would infer that the user is asking about weather and reply to me with a string like "call function get_weather with location Nashville" or if you prompted it, some JSON like { function_to_call: 'get_weather', location: 'Nashville' }. From there, I'd just call that function with any the data I asked the LLM to provide.
- gusmally 4 months ago
  
  That sounds like L1 in this article (there are six) https://www.vellum.ai/blog/levels-of-agentic-behavior
  
  rglover 4 months ago
  
  Relative to that scale, L2 is how I've come to understand it. It's kind of soft-sold as L3 but that will require quite a bit of work on the vendor side (e.g., implementing an AWS Lambda style setup for authoring functions the LLM can call).
codydkdc 4 months ago

an agent is software that does something on behalf of someone (aka software)
I personally strongly prefer the term "bots" for what most of these frameworks call "agents"
- handfuloflight 4 months ago
  
  Stick to the agentic nomenclature if you want at least a magnitude increase in valuation.
kylecazar 4 months ago

Even more off topic, does "heutiger" mean something in English that I'm unaware of? Google tells me it's just German for 'today' or 'current'.
- baxtr 4 months ago
  
  Never heard that word either!
  
  kylecazar 4 months ago
  
  I see from the other comment it's just a typo haha. It all makes sense now!
nsonha 4 months ago

There is already a definition in agent oriented programning. It has something to do with own sensors of environment and react autonomously. I find that definition fits agentic AI too. My rudimentary interpretation is anything with its own inner (event) loop.
- baxtr 4 months ago
  
  So it’s just a program?
  
  nsonha 4 months ago
  
  a program can be unixy: taking inputs and producing output, and does not listen and react to any event. It can also be an UI program that are "event-driven" but all events are from user actions, hence no autonomy.