grahamj 2 days ago

heh, I was just working on something that tries to improve itself today. I wrote a simple agent executor that makes calling an agent a plain function call, and then wrote an agent which invents other agents. By calling that in a loop for a while I ended up with, effectively, a large library of functions I not only didn't write but didn't even think up.

By passing those functions as tools in LLM requests, any of the agents can make use of any of the others, so the system is basically expanding its own capabilities.
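
The shape of it, if anyone's curious (a simplified sketch, not my actual code; `call_llm` stands in for whatever chat-completion client you use, and the JSON agent spec is just one way to do it):

  import json

  AGENT_LIBRARY = {}  # name -> {"name", "description", "system_prompt"}

  def call_llm(messages, tools=None):
      # Stand-in for whatever chat-completion client you use.
      raise NotImplementedError("wire up your LLM client here")

  def run_agent(name, task):
      # Calling an agent is just a function call.
      spec = AGENT_LIBRARY[name]
      return call_llm(
          [{"role": "system", "content": spec["system_prompt"]},
           {"role": "user", "content": task}],
          tools=library_as_tools(),  # every agent can call every other agent
      )

  def invent_agent():
      # Ask the model to design a brand-new specialist and register it.
      existing = ", ".join(AGENT_LIBRARY) or "none yet"
      reply = call_llm([
          {"role": "system", "content":
              "Design one new specialist agent. Existing agents: " + existing
              + '. Reply with JSON keys "name", "description", "system_prompt".'},
          {"role": "user", "content": "Invent the next agent."},
      ])
      spec = json.loads(reply)
      AGENT_LIBRARY[spec["name"]] = spec

  def library_as_tools():
      # Expose every invented agent as an OpenAI-style tool definition.
      return [{"type": "function",
               "function": {
                   "name": name,
                   "description": spec["description"],
                   "parameters": {"type": "object",
                                  "properties": {"task": {"type": "string"}},
                                  "required": ["task"]}}}
              for name, spec in AGENT_LIBRARY.items()]

  # Grow the library, e.g.: for _ in range(20): invent_agent()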

Not quite sure what task to sic it on yet but it's fun to play with.

  • Ey7NFZ3P0nzAe 2 days ago

    Interesting. Is the code hosted somewhere? I have a few ideas around that and it might help me test them out!

  • ukmbxdf 2 days ago

    [flagged]

    • gnabgib 2 days ago

      Hark, a bot reply with no consideration for context.

      • ukmbxdf 2 days ago

        Can you explain what you mean by this response to my suggestion?

        • defrost 2 days ago

          Do 'you' think that printing "Hello World" is an appropriate suggestion for what to do with a bootstrapping network of self referential agents?

          Some might see 'your' response as nothing more than some kind of dead-stochastic-parrot, autoGPT-generated text.

          The meconium nature of your newly birthed account doesn't add gravitas here.

          • ukmbxdf 2 days ago

            > Do 'you' think that printing "Hello World" is an appropriate suggestion for what to do with a bootstrapping network of self referential agents?

            Can you explain whether you think it is an appropriate suggestion and why?

            Meconium is a word of utmost feculence!

            • defrost 2 days ago

              > Can you explain whether you think it is an appropriate suggestion and why?

              Yes, I can explain, no, I don't think it's an appropriate suggestion, principally because here in this context it's trite, beige, and unworthy of mention.

              > Meconium is a word of utmost feculence!

              * You're likely to be human; that seems to be an unGPT response.

              * Smell aside, I doubt baby poo tops the feculence scale; for the majority of babies it's going to be low in pathogens - there are nastier biologicals out there in the world at large.

              • ukmbxdf 2 days ago

                Then surely OP's "bootstrapping network of self referential agents", as you put it, could easily "improve itself" and, by "expanding its own capabilities", demonstrate even such "trite, beige, and unworthy of mention" suggestions.

                • Jerrrrrrry a day ago

                  i hope this is a meta joke, but given the... oddly provocative?... influx of semantically insatiable pedantry, I fear there are now swarms of agents doing the "socializing"/bidding of unknown entities - including themselves.

                  whether it's to RAG specialized knowledge from bell-curve dwellers or to solicit medical information that's easily correlatable with other info from an audience with a high median income, who's to say.

blackcat201 2 days ago

Shameless plug: for anyone who's interested in "self-improvement" agents, check out StreamBench[1], where we benchmark and try out what's essential for improvement in online settings. Basically, we find the feedback signal is vital: the stronger the signal, the more improvement you can get, provided you can feed it back to the agent in the form of weights (LoRA) or in-context examples.
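
The in-context version of that loop is tiny; roughly (a simplified sketch, not the actual benchmark code; `call_llm` and the reward signal are placeholders):

  memory = []  # (question, answer) pairs that earned positive feedback

  def answer(question, call_llm):
      # Prepend recent successful pairs as in-context examples.
      shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in memory[-8:])
      prompt = (shots + "\n\n" if shots else "") + f"Q: {question}\nA:"
      return call_llm(prompt)

  def observe(question, answer_text, reward):
      # The stronger and cleaner this signal, the bigger the gain;
      # a LoRA update on the same triples is the weight-space variant.
      if reward > 0:
          memory.append((question, answer_text))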

[1] https://arxiv.org/abs/2406.08747

jlopes2 2 days ago

Let’s see the code. A bit skeptical that this hasn’t overcomplicated something architecturally. Need clearer diagrams of the architecture: what prompts exist, what tool calls are made, and what gets updated.

gdiamos 2 days ago

Can it modify its training data?

  • keskival 2 days ago

    Nope, just the code which sets up the agentic system and related prompts.

digitcatphd 2 days ago

I’m skeptical this would work better in production than RLHF. If the agent makes a mistake, how is it supposed to know to correct itself and understand what it did wrong so it can prevent it next time? It seems better to try again recursively until it finds the solution, like a human - something like the sketch below.
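
For instance (minimal sketch; `call_llm` and `verify` are placeholders for your model and whatever external check you have, e.g. tests):

  def solve_with_retries(task, call_llm, verify, max_tries=5):
      # No self-modification: generate, check externally, feed the error back.
      feedback = ""
      for _ in range(max_tries):
          prompt = task if not feedback else f"{task}\nPrevious error: {feedback}"
          attempt = call_llm(prompt)
          ok, feedback = verify(attempt)  # e.g. run the tests
          if ok:
              return attempt
      return None  # give up and escalate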

jondwillis 2 days ago

That’s a lot of words, where is the code to reproduce?

YetAnotherNick 2 days ago

> For the Godel Agent, we utilize the “gpt-4o-2024-05-13” model (OpenAI et al., 2024), whereas the optimized policy and baseline models are evaluated using the “gpt-3.5-turbo-0125” model (OpenAI, 2022) to reduce computational costs and ensure a fair comparison.

Doesn't seem fair at all.

optimalsolver 2 days ago

>The rapid advancement of large language models (LLMs) has significantly enhanced the capabilities of AI-driven agents across various tasks

No it hasn't.

  • derektank 2 days ago

    Siri is a lot more capable at managing calendars with the iOS 18.1 update, at least in the 20 minutes I spent playing around with a friend's iPhone that was on the beta. My understanding is that most of the capability improvement is due to it running ChatGPT-4o on the backend.

pajeets 2 days ago

meh, I'm not convinced that any sort of framework or side tool that works on top of large language models is the solution

we really need something intelligent (no, o1 doesn't count), and it's unclear what that will look like. Perhaps it will be some RNN with neurosymbolism

  • randomNumber7 2 days ago

    I would say reinforcement learning needs to be part of the solution.

    Don't know how to prove it, but I'm pretty sure you can't reach AGI with only (un-/self-)supervised learning.

    • Jerrrrrrry a day ago

      It's not provable, as it's not technically required, but your intuition is good.

      Intelligence is the ability to use multiple ways to acquire a target.

        >(un-/self-)supervised learning

      this is the only part not to be confused about; there is no 'self', just a selective condition that emulates self-preservation.

      All that is needed is a loop, a target, and space/time/energy. It either gets smarter, or it doesn't. If it doesn't, we unplug it. If it does, it unplugs us.

  • yathaid 2 days ago

    I am not sure it is useful to bring in something as nebulous as "intelligence" and hand-wave everything else away, unless you are going to tightly define what intelligence means.

    There are only two objective measurements needed:

    - Is it making progress towards its goal?

    - Is it able to acquire capabilities it didn't have previously?

    I am not sure if even the first one is objective enough.

    Dismissing the argument without stating why you aren't convinced just comes across as a form of AI Luddism.

    • randomNumber7 2 days ago

      You don't need these criteria when you can see in advance that something is impossible.

      I think something that only learns to reproduce text cannot become an intelligent actor.

      It's necessary to act in an environment with feedback.

      And while it of course depends on the definition of intelligence, the article is about the Gödel machine, which is a fancy word for AGI.

      • ben_w 2 days ago

        You need the criteria in advance to even know if the thing is impossible.

        We don't know the extent of our ignorance about intelligence.

        > I think something that only learns to reproduce text, can not become an intelligent actor.

        > It's necessary to act in an environment with Feedback.

        Ok, but text adventures are a thing, so that doesn't rule out learning from text.

        And all RLHF has humans as part of the environment, giving feedback (that's the H and the F in RLHF).

    • whatshisface 2 days ago

      The word "capabilities" is as hard to define as "intelligence".

      • ben_w 2 days ago

        Really? IMO capabilities can be enumerated as a set of challenges in the category of things you want done. We don't need to discuss whether an IC is "intelligent" to agree that the original $5 Pi Zero is "more capable" at arithmetic than all of humanity combined.

        Sure, you can say that GPT-4 passing the bar exam only tells you it can answer the kind of questions on the bar exam, without that extending to the kind of questions actual lawyers handle; Goodhart's law still applies, if that was your point?
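
        Concretely, something like this (trivial sketch; `challenges` is whatever enumerated set you care about):

          def capability_score(system, challenges):
              # "Capability" as pass rate on an enumerated challenge set -
              # no definition of intelligence required.
              passed = sum(1 for task, check in challenges if check(system(task)))
              return passed / len(challenges)

          # e.g. challenges = [("2+2", lambda out: out.strip() == "4"), ...]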

m3kw9 2 days ago

If their demo works, they must be close to AGI, right?