grahamj 2 days ago

heh, I was just working on something that tries to improve itself today. I wrote a simple agent executor that makes calling an agent a plain function call, and then wrote an agent which invents other agents. By calling that in a loop for a while I ended up with, effectively, a large library of functions I not only didn't write but didn't even think up.

By passing those functions as tools in LLM requests, any of the agents can make use of any of the others, so the system is basically expanding its own capabilities.
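
The shape of it, if anyone's curious (a simplified sketch, not my actual code; `call_llm` stands in for whatever chat-completion client you use, and the JSON agent spec is just one way to do it):

  import json

  AGENT_LIBRARY = {}  # name -> {"name", "description", "system_prompt"}

  def call_llm(messages, tools=None):
      # Stand-in for whatever chat-completion client you use.
      raise NotImplementedError("wire up your LLM client here")

  def run_agent(name, task):
      # Calling an agent is just a function call.
      spec = AGENT_LIBRARY[name]
      return call_llm(
          [{"role": "system", "content": spec["system_prompt"]},
           {"role": "user", "content": task}],
          tools=library_as_tools(),  # every agent can call every other agent
      )

  def invent_agent():
      # Ask the model to design a brand-new specialist and register it.
      existing = ", ".join(AGENT_LIBRARY) or "none yet"
      reply = call_llm([
          {"role": "system", "content":
              "Design one new specialist agent. Existing agents: " + existing
              + '. Reply with JSON keys "name", "description", "system_prompt".'},
          {"role": "user", "content": "Invent the next agent."},
      ])
      spec = json.loads(reply)
      AGENT_LIBRARY[spec["name"]] = spec

  def library_as_tools():
      # Expose every invented agent as an OpenAI-style tool definition.
      return [{"type": "function",
               "function": {
                   "name": name,
                   "description": spec["description"],
                   "parameters": {"type": "object",
                                  "properties": {"task": {"type": "string"}},
                                  "required": ["task"]}}}
              for name, spec in AGENT_LIBRARY.items()]

  # Grow the library, e.g.: for _ in range(20): invent_agent()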

Not quite sure what task to sic it on yet but it's fun to play with.

  • Ey7NFZ3P0nzAe 2 days ago

    Interesting. Is the code hosted somewhere? I have a few ideas around that and it might help me test them out!

  • ukmbxdf 2 days ago

    [flagged]

    • gnabgib 2 days ago

      Hark, a bot reply with no consideration for context.

      • ukmbxdf 2 days ago

        Can you explain what you mean by this response to my suggestion?

        • defrost 2 days ago

          Do 'you' think that printing "Hello World" is an appropriate suggestion for what to do with a bootstrapping network of self referential agents?

          Some might see 'your' response as nothing more than some kind of dead-stochastic-parrot, autoGPT-generated text.

          The meconium nature of your newly birthed account doesn't add gravitas here.

          • ukmbxdf 2 days ago

            > Do 'you' think that printing "Hello World" is an appropriate suggestion for what to do with a bootstrapping network of self referential agents?

            Can you explain whether you think it is an appropriate suggestion and why?

            Meconium is a word of utmost feculence!

            • defrost 2 days ago

              > Can you explain whether you think it is an appropriate suggestion and why?

              Yes, I can explain, no, I don't think it's an appropriate suggestion, principally because here in this context it's trite, beige, and unworthy of mention.

              > Meconium is a word of utmost feculence!

              * You're likely to be human; that seems to be an unGPT response.

              * Smell aside, I doubt baby poo tops the feculence scale; for the majority of babies it's going to be low in pathogens - there are nastier biologicals out there in the world at large.

              • ukmbxdf 2 days ago

                Then surely OP's "bootstrapping network of self referential agents", as you put it, could easily "improve itself" and, by "expanding its own capabilities", demonstrate even such "trite, beige, and unworthy of mention" suggestions.

                • Jerrrrrrry a day ago

                  i hope this is a meta joke, but given the... oddly provocative?... influx of semantically insatiable pedantry, I fear there are now swarms of agents doing the "socializing"/bidding of unknown entities - including themselves.

                  whether it's to RAG specialized knowledge from bell-curve dwellers or to solicit medical information that's easily correlatable with other info from an audience with a high median income, who's to say.

blackcat201 2 days ago

Shameless plug: for anyone who's interested in "self-improvement" agents, check out StreamBench[1], where we benchmark and try out what's essential for improvement in online settings. Basically, we find the feedback signal is vital: the stronger the signal, the more improvement you can get, provided you can feed it back to the agent in the form of weights (LoRA) or in-context examples.
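
The in-context version of that loop is tiny; roughly (a simplified sketch, not the actual benchmark code; `call_llm` and the reward signal are placeholders):

  memory = []  # (question, answer) pairs that earned positive feedback

  def answer(question, call_llm):
      # Prepend recent successful pairs as in-context examples.
      shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in memory[-8:])
      prompt = (shots + "\n\n" if shots else "") + f"Q: {question}\nA:"
      return call_llm(prompt)

  def observe(question, answer_text, reward):
      # The stronger and cleaner this signal, the bigger the gain;
      # a LoRA update on the same triples is the weight-space variant.
      if reward > 0:
          memory.append((question, answer_text))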

[1] https://arxiv.org/abs/2406.08747

jlopes2 2 days ago

Let’s see the code. A bit skeptical that this hasn’t overcomplicated something architecturally. Need clearer diagrams of the architecture: what prompts exist, what tool calls are made, and what gets updated.

gdiamos 2 days ago

Can it modify its training data?

  • keskival 2 days ago

    Nope, just the code which sets up the agentic system and related prompts.

digitcatphd 2 days ago

I’m skeptical this would work better in production than RLHF. If the agent makes a mistake, how is it supposed to know to correct itself and understand what it did wrong so it can prevent it next time? It seems better to try again recursively until it finds the solution, like a human - something like the sketch below.
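
For instance (minimal sketch; `call_llm` and `verify` are placeholders for your model and whatever external check you have, e.g. tests):

  def solve_with_retries(task, call_llm, verify, max_tries=5):
      # No self-modification: generate, check externally, feed the error back.
      feedback = ""
      for _ in range(max_tries):
          prompt = task if not feedback else f"{task}\nPrevious error: {feedback}"
          attempt = call_llm(prompt)
          ok, feedback = verify(attempt)  # e.g. run the tests
          if ok:
              return attempt
      return None  # give up and escalate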

jondwillis 2 days ago

That’s a lot of words, where is the code to reproduce?

YetAnotherNick 2 days ago

> For the Godel Agent, we utilize the “gpt-4o-2024-05-13” model (OpenAI et al., 2024), whereas the optimized policy and baseline models are evaluated using the “gpt-3.5-turbo-0125” model (OpenAI, 2022) to reduce computational costs and ensure a fair comparison.

Doesn't seem fair at all.

optimalsolver 2 days ago

>The rapid advancement of large language models (LLMs) has significantly enhanced the capabilities of AI-driven agents across various tasks

No it hasn't.

  • derektank 2 days ago

    Siri is a lot more capable at managing calendars with the iOS 18.1 update, at least in the 20 minutes I spent playing around with a friend's iPhone that was on the beta. My understanding is that most of the capability improvement is due to it running ChatGPT-4o on the backend.

pajeets 2 days ago

meh, I'm not convinced that any sort of framework or side tool that works on top of large language models is the solution

we really need something intelligent (no, o1 doesn't count), and it's unclear what that will look like. Perhaps it will be some RNN with neurosymbolism

  • randomNumber7 2 days ago

    I would say reinforcement learning needs to be part of the solution.

    Don't know how to prove it, but I'm pretty sure you can't reach AGI with only (un-/self-)supervised learning.

    • Jerrrrrrry a day ago

      It's not provable, as it's not technically required, but your intuition is good.

      Intelligence is the ability to use multiple ways to acquire a target.

        >(un-/self-)supervised learning

      this is the only part not to be confused about; there is no 'self', just a selective condition that emulates self-preservation.

      All that is needed is a loop, a target, and space/time/energy. It either gets smarter, or it doesn't. If it doesn't, we unplug it. If it does, it unplugs us.

  • yathaid 2 days ago

    I am not sure it is useful to bring in something as nebulous as "intelligence" and hand-wave everything else away, unless you are going to tightly define what intelligence means.

    There are only two objective measurements needed:

    - Is it making progress towards its goal?

    - Is it able to acquire capabilities it didn't have previously?

    I am not sure if even the first one is objective enough.

    Dismissing the argument without stating why you aren't convinced just comes across as a form of AI Luddism.

    • randomNumber7 2 days ago

      You don't need these criteria when you can see in advance that something is impossible.

      I think something that only learns to reproduce text cannot become an intelligent actor.

      It's necessary to act in an environment with feedback.

      And while it of course depends on the definition of intelligence, the article is about the Gödel machine, which is a fancy word for AGI.

      • ben_w 2 days ago

        You need the criteria in advance to even know if the thing is impossible.

        We don't know the extent of our ignorance about intelligence.

        > I think something that only learns to reproduce text, can not become an intelligent actor.

        > It's necessary to act in an environment with Feedback.

        Ok, but text adventures are a thing, so that doesn't rule out learning from text.

        And all RLHF has humans as part of the environment, giving feedback (that's the H and the F in RLHF).

    • whatshisface 2 days ago

      The word "capabilities" is as hard to define as "intelligence".

      • ben_w 2 days ago

        Really? IMO capabilities can be enumerated as a set of challenges in the category of things you want done. We don't need to discuss whether an IC is "intelligent" to agree that the original $5 Pi Zero is "more capable" at arithmetic than all of humanity combined.

        Sure, you can say that GPT-4 passing the bar exam only tells you it can answer the kind of questions on the bar exam, without that extending to the kind of questions actual lawyers handle; Goodhart's law still applies, if that was your point?
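
        Concretely, something like this (trivial sketch; `challenges` is whatever enumerated set you care about):

          def capability_score(system, challenges):
              # "Capability" as pass rate on an enumerated challenge set -
              # no definition of intelligence required.
              passed = sum(1 for task, check in challenges if check(system(task)))
              return passed / len(challenges)

          # e.g. challenges = [("2+2", lambda out: out.strip() == "4"), ...]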

m3kw9 2 days ago

If their demo works, they must be close to AGI, right?