New prompt injection papers: Agents rule of two and the attacker moves second

behnamoh 2 hours ago

I actually want prompt injection to remain possible. So many lazy academic paper reviewers nowadays delegate the review process to AI. It'd be cool if we could inject prompts in the paper that would stop the AI from aiding in such situations. In my experience, prompt injection techniques work for non-reasoning models but gpt-5-high easily ignores them...

ares623 4 hours ago

I don’t know if it’s just me but doesn’t a huge value of LLMs for the general population necessitate all 3 of the circles?

Having just 2 circles requires a person in the loop, and that person will still need knowledge and experience and a low enough throughput to meaningfully action the workload otherwise they would just rubber stamp everything (which is essentially the 3rd circle with extra steps)

mercer 2 hours ago

wouldn't that still add a lot of value, where the person in the loop (sadly, usually) becomes little more than the verifier, but can process a lot more work?
Anecdotally what I'm hearing is that this is pretty much how LLMs are helping programmers get more done, including the work being less enjoyable because it involves more verification and rubber-stamping.
For the business owner, it doesn't matter that the nature of the work has changed, as long as that one person can get more work done. Even worse, the business owner probably doesn't care as much about the quality of the resulting work, as long as it works.
I'm reminded of how much of my work has involved implementing solutions that took less careful thought, where even when I outlined the drawbacks, the owner wanted it done the quick way. And if the problems arose, often quite a bit later, it was as if they hadn't made that initial decision in the first place.
For my personal tinkering, I've all but defaulted to the LLMs returning suggested actions at logical points in the workflow, leaving me to confirm or cancel whatever it came up with. this definitely still makes the process faster, just not as magically automatic.
pprotas 3 hours ago

The HITL is needed to pin the accountability on an employee you can fire
- ares623 3 hours ago
  
  Yeah that seems likely. But still even in that dystopian scenario, the incentives of the human will lead them to go through the back log very thoroughly, which IMO defeats the productivity gains.
  Maybe there will still be some productivity gains even with the human being the bottleneck? Or the humans can be scaled out and parallelized more easily?
- boxed 3 hours ago
  
  Given the incentives here, I'd bet this is mathematically identical to throwing dice and firing people.

kubb 2 hours ago

I’m sorry, what kind of rule is that? How does it guarantee security?

It sounds like we’re making things up at this point.

bawolff 5 minutes ago

It kind of sounds like a weak version of airgapping. If you cant persist state, access private data, or exfiltrate data, there is not much point to jailbreaking the llm.
However, its deeply unsatisying in the same way that securing your laptop by not turning it on, is.

r0x0r007 2 hours ago

Nice, why don't we apply the same principles to our regular applications? Ooh, right, cause we couldn't use them and a whole industry got created that's called cybersecurity and it's supposed to be consulted BEFORE releasing privacy nightmares and using them. But hey, regular applications can't come up with cool poems.