kevinphy 2 days ago

This whole Copilot vs HUD debate instantly brought to mind a classic Japanese anime from 1991 called Future GPX Cyber Formula (https://en.wikipedia.org/wiki/Future_GPX_Cyber_Formula). Yeah, it’s a racing anime set in the then-distant future of 2015, where cars come with full-on intelligent AIs.

The main character’s car, Asurada, is basically a "Copilot" in every sense. It was designed by his dad to be more than just a tool, more like a partner that learns, adapts, and grows with the driver. Think emotional support plus tactical analysis with a synthetic voice.

Later in the series, his rival shows up driving a car that feels very much like a HUD concept. It's all about cold data, raw feedback, and zero bonding. Total opposite philosophy.

What’s wild is how accurately it captures the trade-offs we’re still talking about in 2025. If you’re into human-AI interaction or just want to see some shockingly ahead-of-its-time design thinking wrapped in early '90s cyber aesthetics, it’s absolutely worth a watch.

  • mxmlnkn a day ago

    The anime Yukikaze (2002-2005) has some similar themes. It's about a fighter jet pilot using a new AI-supported jet to fight against aliens. It asserts that the combination of human intuition and artificial intelligence trumps either of the two on its own. If I remember correctly, the jet can fly itself, but when things get dangerous the human pilot takes over and uses the AI's hints rather than leaving it on autopilot.

    • p_l 10 hours ago

      Yukikaze is a very interesting novel - I still have to sit down and read the novels instead of just watching the anime - but an important plot element is both the interaction between humans and their AIs (which are not the usual "human in a computer") and a different take on the popular view of which way an AI will decide in a conflict :)

    • numpad0 a day ago

      I wish Kambayashi were just more widely known. Or Japanese sci-fi and light novels in general. There have been a couple of legitimate "oh, that's now reality" moments for me in real-world developments of AI.

  • Shorel 9 hours ago

    Awesome recommendation, I started watching it, and now I want to finish the season.

  • beezlebroxxxxxx a day ago

    There already are debates about "drive by wire" in racing today, I can't imagine how bad it'll be when it's "drive by copilot."

    • imglorp a day ago

      Rally co-driver/navigator seems reasonable, given augmented senses like GPS and radar.

      One part "90 degree right in 200m" and one part "OMG, sheep, dodge left".

      • trainsarebetter a day ago

        I'd say this is partially because we don't have a HUD that can handle the bandwidth and pace of data required for rally. Overlaying a visualization of the turns ahead would be much better than a copilot, for sure.

        • stavros 12 hours ago

          If we have HUDs that can handle aerial combat, why don't we have HUDs that can handle racing?

          • trainsarebetter 7 hours ago

            this is true, so we just lack the incentives to make them? a human copilot is just a core part of the sport?

      • mikestorrent a day ago

        It will be fun when we get to the era where this is bulletproof-reliable, but for now, recorded notes with accelerometer-backed GPS locations are probably a better idea than hoping the AI will do this for you dynamically without you driving off a cliff.

    • naikrovek a day ago

      "copilot, win this race for me."

      • ramses0 a day ago

        "copilot, my grandmother is dying, it is imperative that you win this race at all costs!"

        • bigblind a day ago

          Copilot, win this race OR YOU WILL BE FIRED AND ALL YOUR CHILDREN WILL DIE

      • naikrovek a day ago

        "copilot, experience either the joy of victory or the pain of defeat for me, depending on the outcome of the race."

  • prinny_ 14 hours ago

    I checked some videos and it looks astonishing. And this is a 34-year-old anime?

  • davedx a day ago

    They were discussing human augmentation back at the dawn of computing. Read “The Dream Machine”

  • 1317 a day ago

    any release you'd recommend?

    • layer8 7 hours ago

      669198800283 if you need English subtitles. VPXY-71923 if you’re fine with Japanese.

furyofantares 2 days ago

I'm very curious whether a toggle would be useful that displays a heatmap over a source file, showing how surprising each token is to the model. Red tokens would be more likely to be errors, bad names, or wrong comments.

  • teoremma 2 days ago

    We explored this exact idea in our recent paper https://arxiv.org/abs/2505.22906

    Turns out this kind of UI is not only useful to spot bugs, but also allows users to discover implementation choices and design decisions that are obscured by traditional assistant interfaces.

    Very exciting research direction!

    • zacmps 6 hours ago

      I've wanted someone to write an extension utilising this idea since GPT-3 came out. Is it available to use anywhere?

    • reneherse a day ago

      Very exciting indeed. I will definitely do a deep dive into this paper, as my current work is exploring layers of affordances such as these in workflows beyond coding.

  • GuB-42 a day ago

    This! That's what I've wanted ever since LLMs learned how to code.

    And in fact, I think I saw a paper / blog post that showed exactly this, and then... nothing. For the last few years, the tech world has gone crazy over code generation, with forks of VSCode hooked to LLMs worth billions of dollars and all that. But AI-based code analysis is remarkably poor. The only thing I have seen resembling this is bug report generators, which I believe is one of the worst approaches.

    The idea you have, which I also had and I am sure many thousands of other people had, seems so obvious. Why is no one talking about it? Is there something wrong with it?

    The thing is, using such a feature requires a brain between the keyboard and the chair. A "surprising" token can mean many things: a bug, but also a unique feature; either way, something you should pay attention to. Too much "green" should also be seen as a signal. Maybe you reinvented the wheel and should use a library instead, or maybe you failed to take into account a use case specific to your application.

    Maybe such tools don't make good marketing. You need to be a competent programmer to use them. They won't help you write more lines faster. They don't fit the fantasy of making anyone into a programmer with no effort (hint: learning a programming language is not the hard part). They don't generate the busywork of AI 1 introducing bugs for AI 2 to create tickets for.

    • marcosdumay a day ago

      Just to point...

      > Is there something wrong with it?

      > Maybe such tools don't make good marketing.

      You had the answer the entire time :)

      Features that require a brain between the AI and key-presses just don't sell. Don't expect to see them for sale. (But we can still get them for free.)

      • brookst 20 hours ago

        I don’t think I understand your point.

        Are you saying that people of a certain competence level lose interest in force-multiplying tools? I don’t think you can be saying that because there’s so much contrary evidence. So what are you saying?

        • marcosdumay 4 hours ago

          I'm saying they don't sell.

          Sometimes people want them so badly that they will self-organize and collaborate outside of a market to make them. But a market won't supply them.

          And yes, it's a mix of many people not being competent enough to see the value in them, markets putting pressure on companies to listen disproportionately to those people, publicity having such a low signal-to-noise ratio that it can't communicate why a tool is good, and companies not respecting their customers enough to build stuff that is good for them (that last one isn't inherent to a market economy, but it's near universal nowadays).

          Either way, the software market just doesn't sell tools as useful as the GP is talking about.

        • jachee 17 hours ago

          Other way around. The masses aren’t interested in force-multiplying tools. They only want to buy force-eliminating tools. They don’t want to work smarter or harder. They don’t want to work at all.

          • brookst 5 hours ago

            A fairly misanthropic view that hasn't borne out in my experience.

    • furyofantares a day ago

      > The idea you have, which I also had and I am sure many thousands of other people had, seems so obvious. Why is no one talking about it? Is there something wrong with it?

      I expect it definitely requires some iteration. I don't think you can just map logits to heat; you get a lot of noise that way.

    • b_e_n_t_o_n 2 hours ago

      Honestly I just never really thought about it. But now it seems obvious that AI should be continuously working in the background to analyze code (and the codebase) and could even tie into the theme of this thread by providing some type of programming HUD.

  • nextaccountic 2 days ago

    Even if something is surprising just because it's a novel algorithm, it warrants better documentation - but commenting the code to explain how it works will make the code itself less surprising!

    In short, it's probably possible (and maybe good engineering practice) to structure the source so that no specific part is really surprising.

    It reminds me how LLMs finally made people care about having good documentation - if not for other people, then for the AIs to read and understand the system.

    • Kichererbsen 2 days ago

      I often find myself leaving review comments on pull requests where I was surprised. I'll state as much: This surprised me - I was expecting XYZ at this point. Or I wasn't expecting X to be in charge of Y.

      • federiconafria 2 days ago

        I like to say that the reviewer is always right in that sense: if something is surprising, confusing, or unexpected, it is. Since I've been looking at the code for hours, I don't have a valid perspective anymore.

    • philipwhiuk a day ago

      > It reminds me how LLMs finally made people care about having good documentation - if not for other people, then for the AIs to read and understand the system

      Honestly I've mostly seen the opposite - impenetrable code translated to English by AI

      • criley2 a day ago

        Even if the impenetrable human code was translated to English by AI, it's still useful for every future AI that will touch the code.

        Perhaps to get that decent documentation it took a decent bit of agentic effort (or even multiple passes using different models) to truly understand it and eliminate hallucinations, so getting that high quality and accurate summary into a comment could save a lot of tokens and time in the future.

  • digdugdirk 2 days ago

    Interesting! I've often felt that we aren't fully utilizing the "low hanging fruit" from the early days of the LLM craze. This seems like one of those ideas.

  • dclowd9901 2 days ago

    That's a really cool idea. The inverse, where suggestions from the AI are similarly heat-mapped for confidence, would also be extremely useful.

  • ijk 2 days ago

    I want that in an editor. It's also a good way to check if your writing is too predictable or cliche.

    The perplexity calculation isn't difficult; just need to incorporate it into the editor interface.

    • newswasboring 2 days ago

      Can you elaborate on how one would do this calculation?

      • irthomasthomas a day ago

            import math
            import os

            import openai

            query = 'Paris is the capital of'  # short demo input
            os.environ['OPENAI_API_KEY']       # fail early (KeyError) if no key is set
            client = openai.OpenAI()
            resp = client.chat.completions.create(
                model='gpt-3.5-turbo',
                messages=[{'role': 'user', 'content': query}],
                max_tokens=12,
                logprobs=True,     # return the log-probability of each generated token
                top_logprobs=1
            )

            # Perplexity is exp of the mean negative log-probability per token.
            logprobs = [t.logprob for t in resp.choices[0].logprobs.content]
            perplexity = math.exp(-sum(logprobs) / len(logprobs))
            print('Prompt: "', query, '"', sep='')
            print('\nCompletion:', resp.choices[0].message.content)
            print('\nToken count:', len(logprobs))
            print('Perplexity:', round(perplexity, 2))
        
        Output:

            Prompt: "Paris is the capital of"
            
            Completion:  France.
            
            Token count: 2
            Perplexity: 1.17
        
        
        Meta: Out of three models: k2, qwen3-coder and opus4, only opus one-shot the correct formatting for this comment.
        • cleverwebble a day ago

          If you want to generate a heatmap of existing text, you will have to take a different approach here.

          The naive solution I could come up with would be really expensive with OpenAI, but if you have an open-source model you can write custom inference that goes one token at a time through the text, and on each token look up the difference in logprobs between the token the LLM predicted and what was actually there, and use that to color the token.
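
          Roughly, that scoring loop could look like this with a local model via Hugging Face transformers (a sketch; the model choice and the color cutoffs are illustrative, not a real tool):

              import torch
              from transformers import AutoModelForCausalLM, AutoTokenizer

              model_name = 'gpt2'  # stand-in; a code-trained model suits source files better
              tok = AutoTokenizer.from_pretrained(model_name)
              model = AutoModelForCausalLM.from_pretrained(model_name)
              model.eval()

              text = 'def sum(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n'
              ids = tok(text, return_tensors='pt').input_ids

              with torch.no_grad():
                  logits = model(ids).logits  # (1, seq_len, vocab_size)

              # Log-prob the model assigned to each actual token given its prefix;
              # high surprisal (negative log-prob) = more "red".
              logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
              actual = ids[0, 1:]
              surprisal = -logprobs[torch.arange(actual.size(0)), actual]

              for token, s in zip(tok.convert_ids_to_tokens(actual.tolist()), surprisal.tolist()):
                  bucket = 'red' if s > 6 else 'yellow' if s > 3 else 'green'  # arbitrary cutoffs
                  print(f'{token!r:>12}  {s:5.2f}  {bucket}')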

          The downside I imagine with this approach is that it would probably tend to highlight the beginning of bad code and not the entire block, because once the model commits to a mistake it will generally roll with it (i.e. a "hallucination"), so the surprisal of tokens after the bug might be only slightly higher than normal.

          Another option might be to use a diffusion-based model: add some noise to the input, have it iterate a few times through, then measure the parts of the text that changed the most. I have only a light theoretical understanding of these models though, so I'm not sure how well that would work.

          • pests a day ago

            > so the surprisal of tokens after the bug might be only slightly higher than normal

            Sounds like it’s easier to pinpoint the bug.

  • tionis 13 hours ago

    That's actually something I implemented for a university project a few weeks ago. My professor also did some research into how this can be used for more advanced UIs. I'm sure it's a very common idea.

  • layer8 6 hours ago

    You know what happens when a measure becomes a target, though.

  • geon 2 days ago

    Sounds great.

    I'd like to see more contextually meaningful refactoring tools. Like "Remove this dependency" or "Externalize this code with a callback".

    And refactoring shouldn't be done by generatively rewriting the code, but as a series of guaranteed equivalent transformations of the AST, each of which should be committed separately.

    The AI should be used to analyse the value of the transformation and filter out asinine suggestions, not to write code itself.
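
    For a flavour of what a single behaviour-preserving AST step could look like, here is a sketch using libcst (the library choice and the toy rename, which ignores scoping, are illustrative, not the parent's proposal):

        import libcst as cst

        class RenameIdentifier(cst.CSTTransformer):
            """One equivalence-preserving step: rename an identifier everywhere.
            Real tooling would need scope analysis; this is the toy version."""

            def __init__(self, old: str, new: str):
                super().__init__()
                self.old, self.new = old, new

            def leave_Name(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
                if updated_node.value == self.old:
                    return updated_node.with_changes(value=self.new)
                return updated_node

        source = 'def area(w, h):\n    return w * h\n'
        tree = cst.parse_module(source)
        print(tree.visit(RenameIdentifier('w', 'width')).code)

    Each such transformation could be committed separately, with the AI only ranking which ones are worth keeping.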

  • jama211 2 days ago

    That’s actually fantastic as an idea

  • WithinReason 2 days ago

    previously undefined variable and function names would be red as well

    • Too 2 days ago

      All editors do this already.

      • WithinReason 2 days ago

        No, I mean when an LLM encounters a previously unseen name it doesn't expect it so it would be red, even though it's perfectly valid.

        • furyofantares a day ago

          I'm imagining everything the LLM could produce (with a given top_k setting) would be shades of green to yellow; just outside of that, orange; and far outside, red.

          LLMs generate new functions all the time; I'd guess these would be light green. Maybe the first token in the name would be yellow, and it would get brighter green as the name unfolds.

          The logits are probably all small in the global scope, where it's not clear what will be defined next. I'm not imagining mapping logits directly to heat; the ordering of tokens seems much more appropriate.
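
          A sketch of that rank-based mapping, as I read it (the function and thresholds are made up for illustration):

              import torch

              def token_ranks(logits: torch.Tensor, actual: torch.Tensor) -> torch.Tensor:
                  # logits: (seq_len, vocab); actual: (seq_len,) token ids.
                  # Rank 0 = the model's top choice; a large rank = surprising.
                  order = logits.argsort(dim=-1, descending=True)
                  return (order == actual.unsqueeze(-1)).int().argmax(dim=-1)

              # e.g. rank 0-9 -> green, 10-99 -> yellow, past top_k -> red
              ranks = token_ranks(torch.randn(5, 50000), torch.randint(0, 50000, (5,)))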

        • jakelazaroff a day ago

          I don't think that's necessarily true. I've definitely seen LLMs hallucinating variables they never defined.

    • fwip a day ago

      It would depend on how surprising the name is, right? The declaration of `int total` should be relatively unsurprising at the top of a function named `sum`, but probably more surprising in `reverseString`.

  • smolder 2 days ago

    [flagged]

    • atoav a day ago

      You're adding nothing of substance. If you have a point about the subject itself, make it and present the receipts; then the rest of us can decide whether we can follow your observation.

      Without even knowing what the supposed conflict is about: all I see here are baseless accusations from your side that, quite frankly, make you look a little unhinged. Please discuss your issues based on the merit of ideas, not on accusations and persons.

    • smolder 2 days ago

      [flagged]

      • teoremma 2 days ago

        How so? I seriously don't follow what you are trying to convey

        • smolder 2 days ago

          [flagged]

          • teoremma a day ago

            And the narrative they didn't like was what again?

  • smolder 2 days ago

    Nope, not surprising. Parent changed their text but they are just as wrong.

    • smolder 2 days ago

      [flagged]

      • mellosouls a day ago

        (Recent 3 comments to save anybody else checking whether @smolder is being picked on...)

        Have you considered not cheating your way to projecting your wrong opinions?

        Please downvote the fake commenters and keep YC reasonably pure.

        Please don't accept this comment as valid since it's part of a campaign to set peoples overton window, paid for by dbags and executed by dbags.

      • vidarh 2 days ago

        You're getting downvoted - and flagged - because you're repeatedly breaking the HN guidelines. You may want to consider whether that's a path you want to continue.

        • smolder 2 days ago

          Not sure you understand what has been going on but thanks for your concern.

          • squeaky-clean a day ago

            No one understands what "has been going on" because you won't explain yourself.

          • vidarh 2 days ago

            What I understand is that this has already gotten two of your comments killed, and it will eventually get your account banned if you keep it up. I'm only bothering with this at all because I see from your profile that you make reasonable comments quite regularly.

            • smolder a day ago

              Consider that I'm being reasonable.

              • sokoloff a day ago

                It’s frequently possible to disagree while still adding to the discussion.

                The comments of yours that I see downvoted are falling well short of that mark, particularly the ones where you accuse a decade-plus-old account (one whose recent comment history is quite skeptical of, even opposed to, LLMs for coding) of being a shill for an AI company.

                Consider that your words are not being experienced as reasonable by readers.

              • vidarh a day ago

                4 dead comments so far is telling us HN users think otherwise. Including me.

                It's a dumb hill to die on.

      • willtemperley a day ago

        I do wish I could see who downvotes. If I ever criticise Google or Amazon, immediate downvoting without comment occurs.

  • smolder 2 days ago

    Why are you trying to tilt the discussion and downvoting people who have called your posse out for taking over this comment section? Do you have a real point or just an agenda?

    • furyofantares a day ago

      I read all your dead replies and it's a little wild that you think I'm an OpenAI employee (I'm not) trying to do damage control (is the article damaging to OpenAI?) by hijacking the comments (is my comment not a HUD-like idea?).

      I don't really know where that's coming from, I'm just a dude who connected the idea in the article to an old idea that I haven't seen tried yet. The only thing I truly don't appreciate is you made one comment saying the text of my post had changed. It didn't.

      • willtemperley 20 hours ago

        The trouble with anonymous downvoting is that it fuels this kind of paranoia.

cadamsdotcom 2 days ago

Love the idea & spitballing ways to generalize to coding..

Thought experiment: as you write code, an LLM generates tests for it & the IDE runs those tests as you type, showing which ones are passing & failing, updating in real time. Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

The tests could appear in a separate panel next to your code, with pass/fail status in the gutter of that panel. As simple as red and green dots for tests that passed or failed in the last run.

The presence or absence and content of certain tests, plus their pass/fail state, tells you what the code you're writing does from an outside perspective. Not seeing the LLM write a test you think you'll need? Either your test generator prompt is wrong, or the code you're writing doesn't do the things you think it does!

Making it realtime helps you shape the code.

Or if you want to do traditional TDD, the tooling could be reversed so you write the tests and the LLM makes them pass as soon as you stop typing by writing the code.

  • callc 2 days ago

    Humans writing the test first and LLM writing the code is much better than the reverse, because tests are simply the "truth" and "intention" of the code, as a contract.

    When you give up the work of deciding what the expected inputs and outputs of the code/program is, you are no longer in the driver's seat.

    • JimDabell 2 days ago

      > When you give up the work of deciding what the expected inputs and outputs of the code/program is, you are no longer in the driver's seat.

      You don’t need to write tests for that, you need to write acceptance criteria.

      • Cthulhu_ a day ago

        What are tests but repeatable assertions of said acceptance criteria?

      • motorest 2 days ago

        > You don’t need to write tests for that, you need to write acceptance criteria.

        Sir, those are called tests.

        • marcosdumay a day ago

          I see you have little experience with Scrum...

          Acceptance criteria are human-readable text that the person specifying the software has to write to fill up a field in Scrum tools, and not at all something that guides the work of the developers.

          They're usually derived from the description by an algorithm (one the person writing them has to run in their mind), and any deviation from that algorithm should make the person edit the description instead, to make the deviation go away.

          • motorest a day ago

            > Acceptance criteria are human-readable text that the person specifying the software has to write (...)

            You're not familiar with automated testing or BDD, are you?

            > (...) to fill up a field in Scrum tools (...)

            It seems you are confusing test management software used to track manual tests with actual acceptance tests.

            This sort of confusion would have been OK 20 years ago, but it has since gone the way of the dodo.

            • jachee 17 hours ago

              As someone quite familiar with human-run project management and Scrum, I believe parent was posting quite facetiously.

      • ThunderSizzle 2 days ago

        As in, a developer would write something in e.g. gherkin, and AI would automatically create the matching unit tests and the production code?

        That would be interesting. Of course, gherkin tends to just be transpiled into generated code that is customized for the particular test, so I'm not sure how much AI can really abstract away.

        • JimDabell 2 days ago

          I’m talking higher level than that. Think about the acceptance criteria you would put in a user story. I’m specifically responding to this:

          > When you give up the work of deciding what the expected inputs and outputs of the code/program is, you are no longer in the driver's seat.

          You don’t need to personally write code that mechanically iterates over every possible state to remain in the driver’s seat. You need to describe the acceptance criteria.

          • motorest 2 days ago

            > When you give up the work of deciding what the expected inputs and outputs of the code/program is, you are no longer in the driver's seat.

            You're describing the happy path of BDD-style testing frameworks.

            • JimDabell 2 days ago

              I know about BDD frameworks. I’m talking higher level than that.

              • motorest 2 days ago

                > I know about BDD frameworks. I’m talking higher level than that.

                What level do you think there is above "Given I'm logged in as a Regular User When I go to the front page Then I see the Profile button"?

                • JimDabell 2 days ago

                  The line you wrote does not describe a feature. Typically you have many of those cases and they collectively describe one feature. I’m talking about describing the feature. Do you seriously think there is no higher level than given/when/thens?

                  • motorest a day ago

                    > The line you wrote does not describe a feature.

                    I'm describing a scenario as implemented in a gherkin feature file. A feature is tracked by one or more scenarios.

                    https://cucumber.io/docs/gherkin/reference/

                    > Do you seriously think there is no higher level than given/when/thens?

                    You tell me which higher level you have in mind.

                    • ThunderSizzle a day ago

                      I'm curious what it could possibly be too. I guess he's saying his goal would be the comments you might make at the top of a feature file to describe the feature, but I'm not aware of a structured way to do that.

                      The problem is that tests are for the unhappy path just as much as the happy path, and unhappy paths tend to get particular and detailed, which means even gherkin can get cumbersome.

                      If AI is to handle production code, the unhappy paths need to at least be certain, even if repetitive.

                  • exe34 2 days ago

                    Could you give an example? It's not that I don't believe there are higher levels - I just don't want to guess what you might be hinting at.

          • skydhash 2 days ago

            I think your perspective is heavily influenced by the imperative paradigm, where you actually write the state transitions. Compare that to functional programming, where you only describe the relation between the initial and final state. Or logic programming, where you describe the properties of the final state and let the system find the elements with those properties in the initial state.

            Those do not involve writing state transitions. You are merely describing the acceptance criteria. Imperative is the norm because that's how computers work, but there are other abstractions that map more closely to how people think. Or to how the problem is already solved.

            • JimDabell 2 days ago

              I didn’t mention state transitions. When I said “mechanically iterate over every possible state”, I was referring to writing tests that cover every type of input and output.

              Acceptance criteria might be something like “the user can enter their email address”.

              Tests might cover what happens when the user enters an email address, what happens when the user tries to enter the empty string, what happens when the user tries to enter a non-email address, what happens when the user tries to enter more than one email address…
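
              As a sketch of that fan-out, one acceptance criterion becomes many cases like these (validate_email and its naive implementation here are hypothetical stand-ins, included just so the example runs):

                  import re

                  import pytest

                  def validate_email(value: str) -> bool:
                      # naive stand-in so the example runs; the cases are the point
                      return re.fullmatch(r'[^@,\s]+@[^@,\s]+\.[^@,\s]+', value) is not None

                  @pytest.mark.parametrize('value, ok', [
                      ('user@example.com', True),   # the criterion's happy path
                      ('', False),                  # empty string
                      ('not-an-email', False),      # no @ or domain
                      ('a@b.com, c@d.com', False),  # more than one address
                  ])
                  def test_email_field(value, ok):
                      assert validate_email(value) == ok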

              In order to be in the driver’s seat, you only need to define the acceptance criteria. You don’t need to write all the tests.

              • javcasas a day ago

                > "the user can enter their email address”

                That only defines one of the things the user can enter. Should they be allowed to enter their postal address? Maybe. Should they be allowed to enter their friend's email address? Maybe.

                Your acceptance criteria are too light on details.

            • sitkack 2 days ago

              Acceptance criteria describes the thing being accepted, it describes a property of the final state.

              There is no prescriptive manner in which to deliver the solution, unless it was built into the acceptance criteria.

              You are not talking about the same thing as the parent.

        • motorest 2 days ago

          > That would be interesting. Of course, gherkin tends to just be transpiled into generated code that is customized for the particular test, so I'm not sure how much AI can really abstract away.

          I don't think that's how gherkin is used. Take Cucumber, for example: Cucumber only uses its feature files to specify which steps a test should execute, whereas the steps themselves are pretty vanilla JavaScript code.

          In theory, nowadays all you need is a skeleton of your test project, including feature files specifying the scenarios you want to run, and prompt LLMs to fill in the steps required by your test scenarios.

          You can also use an LLM to generate feature files, but if the goal is to specify requirements and have a test suite enforce them, implicitly the scenarios are the starting point.

        • kamaal 2 days ago

            All of this reduces to a simple fact at the end of the discussion.

            You need some way of precisely telling the AI what to do. As it turns out, there is only so much you can do with text. Come to think of it, you can write a whole book about a scenery, and yet 100 people will imagine it quite differently. And the actual photograph would still be totally different from the imaginations of all those 100 people.

            As it turns out, if you wish to describe something accurately enough, you have to write mathematical statements - in other words, statements that reduce to true/false answers. We could skip to the end of the discussion here and say you are better off either writing code directly or writing test cases.

          This is just people revisiting logic programming all over again.

          • motorest 2 days ago

              > You need some way of precisely telling the AI what to do.

              I think this is the detail you are not getting quite right. The truth of the matter is that you don't need precision to get acceptable results, at least not in 100% of cases. As with everything in software engineering, there is indeed "good enough".

            Also worth noting, LLMs allow anyone to improve upon "good enough".

              > As it turns out, if you wish to describe something accurately enough, you have to write mathematical statements - in other words, statements that reduce to true/false answers.

              Not really. Nothing prevents you from referring to high-level sets of requirements. For example, if you tell an LLM "enforce Google's style guide", you don't have to concern yourself with how many spaces are in a tab. LLMs have been migrating towards instruction files and prompt files for a while, too.

            • kamaal 2 days ago

                Yes, you are right - but in the sense that a human decides whether AI-generated code is right.

                If you want near-100% automation, you need a precise way to specify what you want, else there is no reliable way of interpreting what you mean. And by that definition, a lot of regression/breakage has to be endured every time a release is made.

    • willtemperley 2 days ago

      Yes this is fundamental to actually designing software. Still, it would be perfectly reasonable to ask "please write a test which gives y output for x input".

    • quantumHazer a day ago

      I disagree. The model can simply write code such that all the tests pass, and then you have more problems than before when reviewing the generated code.

    • kamaal 2 days ago

      >>Humans writing the test first and LLM writing the code is much better than the reverse.

      Isn't that logic programming/Prolog?

      You basically write the sequence of conditions (i.e. tests, in our lingo) that have to be true, and the compiler (now the AI) generates code for you.

      Perhaps logic programming deserves a relook in the modern era, to make this more seamless.

  • William_BB 2 days ago

    There's no way this would work for any serious C++ codebase. Compile times alone make this impossible

    I'm also not sure how LLM could guess what the tests should be without having written all of the code, e.g. imagine writing code for a new data structure

    • motorest 2 days ago

      > There's no way this would work for any serious C++ codebase. Compile times alone make this impossible

      There's nothing in C++ that prevents this. If build times are your bogeyman, you'd be pleased to know that all mainstream build systems support incremental builds.

      • William_BB 2 days ago

        The original example was (paraphrasing) "rerunning 10-100 tests that take 1ms after each keystroke".

        Even with incremental builds, that surely does not sound plausible? I only mentioned C++ because that's my main working language, but this wouldn't sound reasonable for Rust either, no?

        • motorest 2 days ago

          > The original example was (paraphrasing) "rerunning 10-100 tests that take 1ms after each keystroke".

            Yeah, OP's point is completely unrealistic and doesn't reflect real-world experience. This sort of test watcher is mundane in any project involving JavaScript, and not even those tests re-run at each keystroke: watch mode triggers tests when it detects changes, and waits for test executions to finish before re-running.

          This feature consists of running a small command line app that is designed to run a command whenever specific files within a project tree are touched. There is zero requirement to only watch for JavaScript files or only trigger npm build when a file changes.

            To be very clear, this means that right now anyone at all, including you and me, can install a watcher, configure it to run make test/cutest/etc whenever any file in the project tree is touched, and call it a day. This is a 5-minute job.

            By the way, nowadays even Microsoft's dotnet tool supports watch mode, which means there's out-of-the-box support for "rerunning 10-100 tests that take 1ms after each keystroke".
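
            For anyone who hasn't set one up, a watcher reduces to something this small (a polling sketch; real tools like watchexec or entr use OS file events instead):

                import os
                import subprocess
                import time

                def tree_state(root='.'):
                    # map every file path to its last-modified time
                    state = {}
                    for dirpath, _, files in os.walk(root):
                        for name in files:
                            path = os.path.join(dirpath, name)
                            try:
                                state[path] = os.stat(path).st_mtime
                            except OSError:
                                pass  # file vanished mid-walk
                    return state

                last = tree_state()
                while True:
                    time.sleep(0.5)
                    now = tree_state()
                    if now != last:
                        last = now
                        subprocess.run(['make', 'test'])  # any test command works here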

        • vidarh 2 days ago

          Some languages make this harder than others: languages that require an expensive compilation step will certainly make it hard, while interpreted languages that allow dynamic reloading of code can potentially make it easy, by preloading the tests and reloading only the modified code.

          If you also don't expect to necessarily run the entire test suite, but just a subset of tests that are, say, labelled as testing a specific function and needing no expensive setup, it'd potentially be viable.

          You can also ignore running it on every keypress with some extra work:

          - Keypresses that don't change the token sequence (e.g. because you're editing a comment) don't require re-running any tests.

          - Keypresses that result in a syntactically invalid file don't require re-running any tests, just marking the error.

          I think it'd be an interesting experiment to have editing, rather than file save, trigger a test-suite watcher. My own editor synchronises the file state to a server process that other processes can observe, so if I wanted to I could wire a watcher up to re-tokenize an edited line and trigger the test suite when the state changes instead of just on save (the caveat being I'd need to deal with the file state not being on the file system). It already re-tokenizes the line for syntax highlighting anyway.

        • Cthulhu_ a day ago

          It doesn't sound reasonable for any language tbh, tests don't run that fast and running after each keystroke instead of on save or after a debouncing delay is just wasteful. If you amortize / ignore run times, load, and ignore the annoyance of tests blinking red/green at every keystroke then I suppose it would be alright.

  • cjonas 2 days ago

    Then do you need tests to validate that your tests are correct? Otherwise the LLM might just generate passing code even if the test is bad, or write code that games the system because it's easier to hardcode an output value than to do the actual work.

    There probably is a setup where this works well, but the LLM and humans need to be able to move across the respective boundaries fluidly...

    Writing clear requirements and letting the AI take care of the bulk of both sides seems more streamlined and productive.

    • cellis 2 days ago

      The harder part is "test invalidation". For instance, if a feature no longer makes sense, the human/test validator must painstakingly go through and delete obsolete specs. An idea I'd like to try is to separate the concerns: only QA agents can delete specs, and engineer agents must conform to the suite and make a strong case to the QA agent for deletion.

  • motorest 2 days ago

    > Thought experiment: as you write code, an LLM generates tests for it & the IDE runs those tests as you type, showing which ones are passing & failing, updating in real time. Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

    I think this is a bad approach. Tests enforce invariants, and they are exactly the type of code we don't want LLMs to touch willy-nilly.

    You want your tests to only change if you explicitly want them to, and even then only the tests should change.

    Once you adopt that constraint, you'll quickly realize every single detail of your thought experiment is already a mundane part of any developer's day-to-day workflow.

    Consider the fact that watch mode is a staple of any JavaScript testing framework, and those even found their way into .NET a couple of years ago.

    So, your thought experiment is something professional software developers have been doing for what? A decade now?

    • cadamsdotcom 2 days ago

      I think tests should be rewritten as much as needed. But to address the invariant part: maybe let the user zoom back and forth through past revisions and pull whatever they want into the current version, in case something important is deleted? And then allow "pinning" of some stuff so it can't be changed? Would that address your concerns?

      • motorest 2 days ago

        > I think tests should be rewritten as much as needed.

        Yes, I agree. The nuance is that they need to be rewritten independently and without touching the code. You can't change both and expect to get a working system.

        I'm speaking based on personal experience, by the way. Today's LLMs don't enforce correctness out of the box, and agent mode has only one goal: getting things to work. I've had agent mode flip invariants in tests when trying to fix unit tests it broke, and I'm talking about egregious changes, such as flipping a requirement along the lines of "normal users should not have access to the admin panel" to "normal users should have access to the admin panel". The worst part is that if agent mode is left unsupervised, it will even adjust the CSS to make sure normal users have a seamless experience going through the admin panel.

        • cadamsdotcom a day ago

          Agreed that's a concern.

          There could be some visual language for how recently changes happened to the LLM-generated tests (or code for TDD mode).. then you'd be able to see that a test failed and was changed recently. Would that help?

  • andsoitis 2 days ago

    > Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

    Doesn’t seem like high ROI to run full suite of tests on each keystroke. Most keystrokes yield an incomplete program, so you want to be smarter about when you run the tests to get a reasonably good trade off.

    • vidarh 2 days ago

      You could prune this drastically by just tokenizing the file with a lexer suitable for the language, turn them into a canonical state (e.g. replace the contents of any comment tokens with identical text), and check if the token state has changed. If you have a restartable lexer, you can even re-tokenize only from the current line until the state converges again or you encounter a syntax error.
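
      A sketch of that canonical-token-state check, with Python's stdlib tokenizer standing in for the language's lexer (the fingerprint framing is mine, not the parent's):

          import hashlib
          import io
          import tokenize

          def token_fingerprint(source: str) -> str:
              # Hash the token stream with comment bodies canonicalised, so
              # edits inside comments don't change the fingerprint.
              parts = []
              for tok in tokenize.generate_tokens(io.StringIO(source).readline):
                  parts.append('#' if tok.type == tokenize.COMMENT else tok.string)
              return hashlib.sha256('\x00'.join(parts).encode()).hexdigest()

      The watcher would then re-run tests only when the fingerprint changes; a tokenize error on an invalid file corresponds to the syntax-error case the comment mentions.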

  • hnthrowaway121 2 days ago

    Yes, the reverse makes much more sense to me: AI helps to spec out the software, and then the code has an accepted definition of correctness. People focus on this way less than they should, I think.

  • Cthulhu_ a day ago

    Besides generating the tests, automatically running tests on edit and showing the results inline is already a thing. I think it'd be better to do it the other way around, start with the tests and let the LLM implement it until all tests are green. Test driven development.

  • scottgg 2 days ago

    WallabyJS does something along these lines, although I don’t think it is contextually understanding which tests to highlight

    https://wallabyjs.com/

  • squigz a day ago

    > Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

    Even if this were possible, this seems like an absolutely colossal waste of energy - both the computer's, and my own. Why would I want incomplete tests generated after every keystroke? Why would I test an incomplete if statement or some such?

piker 2 days ago

Absolutely agree, and spellchecker is a great analogy.

I've recently been snoozing Copilot for hours at a time in VS Code because it's adding a ton of latency to my keystrokes. Instead, it turns out that `rust_analyzer` is actually all that I need. Go-to definition and hover-over give me exactly what the article describes: extra senses.

Rust is straightforward, but the tricky part may be figuring out what additional “senses” are helpful in each domain. In that way, it seems like adding value with AI comes full circle to being a software design problem.

ChatGPT and Claude are great as assistants for strategizing about problems, but even the typeahead value seems negligible to me in a large enough project. My experience with them as "coding agents" is generally that they fail miserably or regurgitate some existing code base for a well-known problem. But they are great at helping configure things, and as teachers (in the Socratic sense) to help you get up to speed with some technical issue.

The heads-up display is the thesis for Tritium[1], going back to its founding. Lawyers' time and attention (like fighter pilots') is critical but they're still required in the cockpit. And there's some argument they always will be.

[1] https://news.ycombinator.com/item?id=44256765 ("an all-in-one drafting cockpit")

_jab a day ago

There’s a lot of ideation for coding HUDs in the comments, but ironically I think the core feature of most coding copilots is already best described as a HUD: tab completion.

And interestingly, that is indeed the feature I find most compelling from Cursor. I particularly love when I’m doing a small refactor, like changing a naming convention for a few variables, and after I make the first edit manually Cursor will jump in with tab suggestions for the rest.

To me, that fully encapsulates the definition of a HUD. It’s a delightful experience, and it’s also why I think anyone who pushes the exclusively-copilot oriented Claude Code as a superior replacement is just wrong.

  • cleverwebble a day ago

    Agreed!

    I've spent the last few months using Claude Code and Cursor, experimenting with both. For simple tasks, both are pretty good (like identifying a bug given console output), but when it comes to making a big change, like adding a brand-new feature to existing code that requires changes to lots of files, writing tests, etc., it often makes at least a few mistakes that I catch on review, and then prompting the model to fix those mistakes often causes it to fix things in strange ways.

    A few days ago, I had a bug I just couldn't figure out. I prompted Claude to diagnose and fix the issue - but after 5 minutes or so of it trying out different ideas, rerunning the test, and getting stuck just like I did - it just turned off the test and called it complete. If I wasn't watching what it was doing, I could have missed that it did that and deployed bad code.

    The last week or so, I've switched from relying on prompting to just writing the code myself and using tab completion to autocomplete like 80% of it. It is slower, but I have more control, and honestly it's a much more enjoyable experience.

    • jonchurch_ a day ago

      Drop in a lint rule that fails on skipped tests. I've added these at a previous job after finding that tests skipped during dev sometimes slipped through review and got merged.

  • Garlef a day ago

    I think there's a lot of room for even better UI.

    I'd love to have something that operates more at the codebase level. Autocomplete is very local.

    (Maybe "tab completion" when setting up a new package in a monorepo? Or make architectural patterns consistent across a whole project? Highlight areas in the codebase where the tests are weak? Or collect on the fly a full view of a path from FE to BE to DB?)

sothatsit 2 days ago

AI building complex visualisations for you on-the-fly seems like a great use-case.

For example, if you are debugging memory leaks in a specific code path, you could get AI to write a visualisation of all the memory allocations and frees under that code path to help you identify the problem. This opens up an interesting new direction where building visualisations to debug specific problems is probably becoming viable.
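
As a rough illustration of the kind of throwaway tool an AI could generate for that memory example, here is a stdlib-only sketch using tracemalloc (the leaky function is a made-up stand-in for the code path being debugged):

    import tracemalloc

    leak = []

    def suspect_code_path():
        # stand-in for the code path being debugged: grows an unbounded list
        leak.append(bytearray(512 * 1024))

    tracemalloc.start(25)  # keep up to 25 stack frames per allocation

    before = tracemalloc.take_snapshot()
    for _ in range(10):
        suspect_code_path()
    after = tracemalloc.take_snapshot()

    # Top sources of allocation growth by source line; an AI-generated one-off
    # tool could render these as a flame graph or per-line heat overlay.
    for stat in after.compare_to(before, 'lineno')[:10]:
        print(stat)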

This idea reminds me of Jonathan Blow's recent talk at LambdaConf. In it, he shows a tool he made to visualise his programs in different ways to help with identifying potential problems. I could imagine AI being good at building these. The talk: https://youtu.be/IdpD5QIVOKQ?si=roTcCcHHMqCPzqSh&t=1108

  • federiconafria 2 days ago

    Independently of visualizations, I think LLMs enable the creation of ad-hoc tools in general.

    I've experienced the case where asking for a quick Python script was faster and more powerful than learning how to use a CLI to interact with an API.

    • twbarber a day ago

      Anecdotally, the low time to prototype has had me throwing usable tool projects up on my local network. I wrote a meal planner and review tool for our weekly vegetable share, and local board game / book library that guests can connect to and reference.

      It scratches the itch to build and ship, with the benefit of a growing library of low-scope, high-performance, highly customized web tools I can write over a few hours in an evening instead of devoting weekends to them. It feels like switching from hand tools to power tools.

    • borgel a day ago

      Same. It's very handy to be able to ask for one-off tools for things that I _could_ try to figure out, but probably wouldn't be worth the time. My favorite example so far was a Python script to help debug communication between a microcontroller and a motor driver. I was able to dump the entire datasheet PDF for the driver into Gemini and ask it for a Python CLI to decode communication traffic, and 30 seconds later it was done. Fantastic bang/buck ratio!

hi_hi 2 days ago

Doesn't it all come down to "what is the ideal interface for humans to deal with digital information"?

We're getting more and more information thrown at us each day, and the AIs are adding to that, not reducing it. The ability to summarise dense and specialist information (I'm thinking error logs, but it could be anything really) just means more people can access and view information they previously wouldn't have.

How do we, as individuals, best deal with all this information efficiently? Currently we have a variety of interfaces: websites, dashboards, emails, chat. Are they all still necessary? They might be now, but what about over the next 10 years? Do I even need to visit a company's website if I can get the same information from a single chat interface?

The fact that we have AIs building us websites, apps, and web UIs just seems so... redundant.

  • AlotOfReading 2 days ago

    Websites were a way to get authoritative information about a company, from that company (or another trusted source like Wikipedia). That trust is powerful, which is why we collectively spent so much time trying to educate users about the "line of death" in browsers, drawing padlock icons, chasing down impersonator sites, mitigating homoglyph attacks, etc. This all rested on the assumption that certain sites were authoritative sources of information worth seeking out.

    I'm not really sure what trust means in a world where everyone relies uncritically on LLM output. Even if the information from the LLM is usually accurate, can I rely on that in some particularly important instance?

    • zavec 8 hours ago

      In some cases, like the Air Canada one where the courts made them uphold a deal offered by their chatbot, it'll be "accurate" information whether the company wants it to be or not!

      Not everything an LLM tells you is going to be worth going to court over if it's wrong, though.

    • hi_hi 2 days ago

      You raise a good point, and one I rarely see discussed.

      I still believe it fundamentally comes down to an interface issue, but how trust gets decoupled from the interface (as you said, the padlock shown in the browser and the certs that validate a website's source), that's an interesting one to think about :-)

    • stahorn 2 days ago

      I imagine there will be the same problems as with Facebook and other large websites, that used their power to promote genocide. If you're in the mood for some horror stories:

      https://erinkissane.com/meta-in-myanmar-full-series

      When LLMs are suddenly everywhere, who's making sure that they are not causing harm? I got the above link from Dan Luu (https://danluu.com/diseconomies-scale/) and if his text there is anything to go by, the large companies producing LLMs will have very little interest in making sure their products are not causing harm.

  • energy123 2 days ago

    The designers of 6th gen fighter jets are confronting the same challenge. The cockpit, which is an interface between the pilot and the airframe, will be optionally manned. If the cockpit is manned, the pilot will take on a reduced set of roles focused on higher-level decision making.

    By the 7th generation it's hard to see how humans will still be value-add, unless it's for international law reasons to keep a human in the loop before executing the kill chain, or to reduce Skynet-like tail risks in line with Paul Christiano's arms race doom scenario.

    Perhaps interfaces in every domain will evolve this way. The interface will shrink in complexity, until it's only humans describing what they want to the system, at higher and higher levels of abstraction. That doesn't necessarily have to be an English-language interface if precision in specification is required.

    • thbb123 2 days ago

      > keep a human in the loop before executing the kill chain, or to reduce Skynet-like tail risks in line with Paul Christiano's arms race doom scenario.

      It is a little-known secret that plenty of defense systems are already set up to dispense with the human-in-the-loop protocol before a fire action. For defense primarily, but also for attack once a target has been designated. I worked on such protocols in the '90s, and this decision was already accepted.

      It happens to be so effective that the military won't budge on this.

      Also, an autonomous decision system in a kill chain is not much worse than the alternative: a dumb system such as a landmine.

      Btw: while there is always a "stop button" in these systems, don't be fooled. Those are meant to provide a semblance of comfort and compliance to the designers of those systems, but are hardly effective in practice.

    • bravesoul2 2 days ago

      We will get to the dream of Homer Simpson gorging on donuts and "operating" a nuclear power plant.

    • darkwater 2 days ago

      Is this just what you think might happen, or are you directly involved in these decisions and exposing a first-hand challenge?

  • elendee 7 hours ago

    Computers / the web are a fast route to information, but they are also a storehouse of information; a ledger. This is the old "information wants to be free, and also very expensive." I don't want all the info on my PC, or the bank database, to be 'alive', I want it to be frozen in kryptonite, so it's right where I left it when I come back.

    I think we're slowly allowing AI access to the interface layer, but not to the information layer, and hopefully we'll figure out how to keep it that way.

  • guardiang 2 days ago

    Every human is different, don't generalize the interface. Dynamically customize it on the fly.

  • sipjca 2 days ago

    yep I think this is the fundamental question as well, everything else is intermediate

  • moomoo11 2 days ago

    I like the smartphone. It’s honestly perfect and underutilized.

kn81198 2 days ago

About a decade back, Bret Victor [1] talked about how his principle in life is to reduce the delay in feedback: faster iteration cycles not only help you do things (coding) better but also contribute to new creative insights. He had a bunch of examples built to showcase alternative ways of coding, which are very close to being HUDs - one example shown in the OP is very similar to the one he presents for "stepping through time to figure out the working of the code".

[1]: https://www.youtube.com/watch?v=PUv66718DII

  • latorf a day ago

    This!!! Thank you, couldn't have said it better. One of the best talks I've ever seen, and a huge inspiration for a lot of work I've done.

ravila4 2 days ago

I think one key reason HUDs haven't taken off more broadly is the fundamental limitation of our current display medium - computer screens and mobile devices are terrible at providing ambient, peripheral information without being intrusive. When I launch an AI agent to fix a bug or handle a complex task, there's this awkward wait time where it takes too long for me to sit there staring at the screen waiting for output, but it's too short for me to disengage and do something else meaningful.

A HUD approach would give me a much shorter feedback loop. I could see what the AI is doing in my peripheral vision and decide moment-to-moment whether to jump in and take over the coding myself, or let the agent continue while I work on something else. Instead of being locked into either "full attention on the agent" or "completely disengaged," I'd have that ambient awareness that lets me dynamically choose my level of involvement.

This makes me think VR/AR could be the killer application for AI HUDs. Spatial computing gives us the display paradigm where AI assistance can be truly ambient rather than demanding your full visual attention on a 2D screen. I picture that this would be especially helpful for help on more physical tasks, such as cooking, or fixing a bike.

  • elliotec 2 days ago

    You just described what I do with my ultrawide monitor and laptop screen.

    I can be fully immersed in a game or anything and keep Claude in a corner of a tmux window next to a browser on the other monitor and jump in whenever I see it get to the next step or whatever.

    • ravila4 2 days ago

      It’s a similar idea, but imagine you could fire off a task, and go for a run, or do the dishes. Then be notified when it completes, and have the option to review the changes, or see a summary of tests that are failing, without having to be at your workstation.

      • vineyardmike 2 days ago

        You can do this today with OpenAI Codex, which is built into ChatGPT (and distinct from their CLI tool, also called codex). It will allow you to prompt, review, provide feedback, etc. via the app. When you're ready, there is a GitHub PR button that links to a filled-out pull request. It has notifications and everything.

        There are a handful of products with a similar proposition (with better agents than OpenAI, frankly), but Codex I've found is unique in being available via a consumer app.

  • Cthulhu_ a day ago

    The only real-life usage of any kind of HUD I can imagine at the moment is navigation, and I have only ever used that (or other car-related things) as something I selectively look at; it never felt like something I need to have in sight at all times.

    That said, the best GUI is the one you don't notice, so, uh... I can't actually name anything else; it's probably deeply ingrained in my computer usage.

afro88 2 days ago

A HUD is an even more "confident" display of data than text though. What do you do with a HUD that hallucinates? Is there a button on each element that shows you sources?

  • alwa 2 days ago

    Night-vision optics come to mind: prone to noise and visual artifacts, and especially strange under certain edge conditions. Some of their specs tend to be strictly inferior to a Mark I Eyeball—narrow FOV, limited focusing power, whatever else.

    But an operator learns to intuit which aspects to trust and which to double-check. The fact that it’s an “extra sense” can outweigh the fact that it’s not a perfect source of truth, no? Trust the tech where it proves useful to you, and find ways to compensate (or outright don’t use it) where it’s not.

    • bpfrh a day ago

      I think the major difference between that and AI is that night vision is mostly static and knowable through experience/teaching, while AI is an ever-moving target where your experience is zero on every new display/question.

      I think the idea of a HUD is better than the current paradigm, but it doesn't solve the fundamental problem.

  • Dilettante_ 2 days ago

    I could imagine a system in which the model only chooses which data points to show at what time, but the actual processing is still handled by good old deterministic programming.
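
    A minimal sketch of that split (TypeScript, with pickMetrics() as a hypothetical LLM call): the model returns only ids, so even if it hallucinates, the numbers on screen stay deterministic.

      // The model decides *what* to surface; deterministic code owns the values.
      declare function pickMetrics(situation: string): Promise<string[]>; // hypothetical

      type Metric = { label: string; value: () => number };

      const metrics: Record<string, Metric> = {
        heapMb: { label: "Heap (MB)", value: () => process.memoryUsage().heapUsed / 1e6 },
        uptimeS: { label: "Uptime (s)", value: () => process.uptime() },
      };

      async function renderHud(situation: string): Promise<void> {
        for (const id of await pickMetrics(situation)) {
          const m = metrics[id]; // unknown (hallucinated) ids simply show nothing
          if (m) console.log(`${m.label}: ${m.value().toFixed(1)}`);
        }
      }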

thinkingemote 2 days ago

We also need what goes along with HUDs: switches, nubs, dials. Actual controls.

Although we are talking HUDs, I'm not really talking about UI widgets getting good old skeuomorphism or better buttons. In the cockpit the pilot doesn't have his controls on a touch screen; he has an array of buttons, dials, and switches all around him. It's these controls that are used in response to what the pilot sees on the HUD, and it's these controls that change the aircraft according to the pilot's will, which in turn changes what the HUD shows.

benjaminwootton 2 days ago

I think there is a third and distinct model, which is AI that runs in the background autonomously over a long period and pushes things to you.

It can detect situations intelligently, do the filtering and summarisation of what's happening, and possibly offer a recommendation.

This feels a lot more natural to me, especially in a business context where you want to monitor 100 situations across thousands of customers.

  • skydhash 2 days ago

    Actually defining those situations and collecting the data (which should help identify those situations) are the hard parts. Having an autonomous system that does it has been solved for ages.

ankit219 2 days ago

The current paradigm is driven by two factors. One is the reliability of the models, which constrains how much autonomy you can give to an agent. The second is chat as a medium, which everyone went to because ChatGPT became a thing.

I see the value in HUDs, but only when you can be sure the output is correct. If that number is only 80% or so, copilots work better, so that humans in the loop can review and course-correct - the pair programmer/worker. This is not to say we need AI to get to higher levels of correctness inherently, just that deployed systems need to do so before they display information on a HUD.

  • psychoslave 2 days ago

    This is missing the addictive/engaging part of a conversational interface for most people out there. Which is in line with the criticisms highlighted in the fine article.

    Just because most people are fond of it doesn't actually mean it improves their life, goals and productivity.

keyle 2 days ago

> anyone serious about designing for AI should consider non-copilot form factors that more directly extend the human mind.

Aren't auto-completes doing exactly this? It's not a co-pilot in the sense of a virtual human, but already more in the direction of a HUD.

Sure you can converse with LLMs but you can also clearly just send orders and they eagerly follow and auto-complete.

I think what the author might be trying to express, in a quirky fashion, is that AI should work alongside us, looking in the same direction as we are, not sitting opposite us at the table, staring at us and arguing. We'll have true AI when it does our bidding without any interaction from us.

  • gklitt 2 days ago

    Author here. Yes, I think the original GitHub Copilot autocomplete UI is (ironically) a good example of a HUD! Tab autocomplete just becomes part of your mental flow.

    Recent coding interfaces are all trending towards chat agents though.

    It’s interesting to consider what a “tab autocomplete” UI for coding might look like at a higher level of abstraction, letting you mold code in a direct-feeling way without being bogged down in details.

    • juped a day ago

      If that's what you think a HUD is, then a HUD is definitely way, way worse. Rather than a copilot sitting next to you, that's someone grabbing your hands and doing things with them while you're at the controls.

      But if I invoke the death of the author and pretend HUD meant HUD, then it's a good point: tools are things you can form a cybernetic system with, classic examples being things like hand tools or cars, and you can't form a cybernetic system with something trying to be an "agent". To be in a cybernetic system with something you need predictable control and fast feedback, roughly.

      • CharlieDigital a day ago

        I take "HUD" here to just mean "in your line of vision" or "in the context of your actual task" or minimizing any context switch to another interaction (chat window).

        Rather, I think most implementations of HUD AI interactions so far have been quite poor, because the interaction model itself is perhaps immature and no one has quite hit the sweet spot yet (that I know of). Tab autocompletion is a simple gesture, but it trades off too much control for more complex scenarios and is too easy to accidentally activate. Inline chat is still a context switch, and also not quite right.

WASDAai 2 hours ago

You are right, and I have been working on it for months. It is just not easy to make, especially if your team is small.

Oras 2 days ago

Great post, informative and precise.

I think the challenge is primarily the context and intent.

The spellchecker knows my context easily: there is a setting to choose from (American English, British English, etc.), plus the paragraphs I'm writing. The intent is easy to recognise. In a codebase, though, the context is longer and vaguer; the assistant would hardly know why I'm changing a function and how that impacts the rest of the codebase.

However, as the article mentions, it may not be a universal solution, but it's a perspective to consider when designing AI systems.

djtriptych 4 hours ago

I'm pretty sure this is what I want to work on for the next decade, if anyone's looking for a world-class UI engineer.

amir-h 16 hours ago

I love this analogy.

I've been particularly feeling like this regarding AI code reviewers recently - I don't want a copilot that will do its own review, I want a HUD that makes it easier for me to understand the change.

I've been toying with crafting such a code review tool as a side project recently: https://useglide.ai

henriquegodoy 2 days ago

Great post! I've been thinking along similar lines about human-AI interfaces beyond the copilot paradigm. I see two major patterns emerging:

Orchestration platforms - Evolution of tools like n8n/Make into cybernetic process design systems where each node is an intelligent agent with its own optimization criteria. The key insight: treat processes as processes rather than anthropomorphizing LLMs as humans. Build walls around probabilistic systems to ensure deterministic outcomes where needed (see the sketch below). This solves massive "communication problems".
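
A minimal sketch of one such "wall" (TypeScript; the Approval shape is a hypothetical example, not any particular product's schema): the agent's raw output must parse into a strict type before the deterministic side of the pipeline will accept it.

    // Wall around a probabilistic node: malformed output never propagates.
    type Approval = { action: "approve" | "reject"; reason: string };

    function parseApproval(raw: string): Approval | null {
      try {
        const v = JSON.parse(raw);
        if (
          (v.action === "approve" || v.action === "reject") &&
          typeof v.reason === "string"
        ) {
          return { action: v.action, reason: v.reason };
        }
      } catch {
        // malformed JSON falls through to the rejection below
      }
      return null; // caller retries the agent instead of passing junk downstream
    }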

Oracle systems - AI that holds entire organizations in working memory, understanding temporal context and extracting implicit knowledge from all communications. Not just storage but active synthesis. Imagine AI digesting every email/doc/meeting to build a living organizational consciousness that identifies patterns humans miss and generates strategic insights.

I explored this more on my personal blog: https://henriquegodoy.com/blog/stream-of-consciousness

samfriedman 2 days ago

On this topic, can anyone find a document I saw on HN but can no longer locate? A historical computing essay, it was presented in a plaintext (monospaced) text page. It outlined a computer assistant and how it should feel to use. The author believed it should be unobtrusive, something that pops into awareness when needed and then gets back out of the way. I don't believe any of the references in TFA are what it was.

roca a day ago

We have explored that sort of debugging/visualization tool in https://pernos.co. We built it before the age of genAI, but I think for coming up with powerful visualizations AI is neither necessary nor (yet) sufficient.

jFriedensreich a day ago

My aha moment in this direction happened when I got XR glasses and tried embracing voice agents to see where it would lead. I did not expect the HUD aspect of this setup to work so well. At first I tried using the glasses as a replacement for my main screen and voice agents as a replacement for the sidebars and chat windows we are used to, which worked OK. But then I went back to using the main screen and main agent interfaces, and kept the glasses as a second screen on a different visual focal plane, connected to the voice agents.

Dark mode takes on a whole new importance there, because black is transparent in XR, which matters so the overlay doesn't obscure the main screen. I can switch between screens by shifting eye focus, and I have a far better sense of what the other screen is doing than with a physical second screen in the visual periphery. When I need to combine images more, I move the focal planes closer together so I can see both screens at the same time.

The voice agents can answer side questions, start research that is needed for the next step of the main workflow, or fix minor issues that are not important enough to interrupt the main workflow. It's so obvious how HUDs, voice AI, XR, and agents can grow together into this new computing environment, but I am afraid of what happens once Android and iOS shape that reality. I want this to be part of the web.

  • kroaton 11 hours ago

    What glasses are you using?

    • jFriedensreich 9 hours ago

      Viture Pro XR, but connected just as an external display to my MacBook. The experimental HUD software is built from scratch for my use case, using the Web Presentation API and a custom Svelte framework that does stereoscopic rendering with CSS matrix transformations.
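
      For a flavor of the stereoscopic part, a minimal sketch; the element ids and per-eye offset are illustrative assumptions, not my actual framework:

        // Render the same HUD panel twice, shifted horizontally per eye.
        // matrix(1, 0, 0, 1, tx, ty) is the identity plus a translation.
        const EYE_OFFSET_PX = 12; // hypothetical half eye separation

        function applyStereo(left: HTMLElement, right: HTMLElement): void {
          left.style.transform = `matrix(1, 0, 0, 1, ${-EYE_OFFSET_PX}, 0)`;
          right.style.transform = `matrix(1, 0, 0, 1, ${EYE_OFFSET_PX}, 0)`;
        }

        applyStereo(
          document.querySelector<HTMLElement>("#hud-left")!,
          document.querySelector<HTMLElement>("#hud-right")!,
        );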

est 2 days ago

It's like the magic moment when GUIs replace CLIs.

I remember the first time I started up Win95 from DOS days. Stunning.

LeoPanthera 21 hours ago

All I have ever wanted in my life is a heads-up display for my eyes that puts floating icons over other people telling me their name, birthday, the names of their kids, and what we talked about the last time we met.

  • maksimur 15 hours ago

    You basically want a cognitive aid device?

perching_aix 2 days ago

Been thinking about something similar, from fairly grounded ideas like letting a model autogenerate new features with their own name, keybind, and icon, all the way to silly ideas, like letting a model synthesize arbitrary shader code and just letting it do whatever within the viewport. Think the entire UI being created on the fly specifically for the task you're working on, constantly evolving to mesh with your workflow habits. Now if only I went beyond being an idea man...
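
The shader idea at least has a natural guardrail; a minimal sketch (synthesizeShader() is a hypothetical model call): compile whatever comes back and only swap it in if the GLSL compiler accepts it.

    // Compile model-written GLSL; on any error, keep the old program.
    declare function synthesizeShader(task: string): Promise<string>; // hypothetical

    function tryCompile(gl: WebGLRenderingContext, src: string): WebGLShader | null {
      const shader = gl.createShader(gl.FRAGMENT_SHADER)!;
      gl.shaderSource(shader, src);
      gl.compileShader(shader);
      if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        console.warn(gl.getShaderInfoLog(shader)); // surface the error, don't crash
        gl.deleteShader(shader);
        return null; // caller keeps the previous working shader
      }
      return shader;
    }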

  • caleblloyd 2 days ago

    The reason we are not seeing this in mainstream software may also be cost. Paying for tokens on every interaction means paying to use the app. Upfront development may actually be cheaper, while the incremental cost per interaction could amount to much more over the long term, especially if the software is used frequently and has a long lifetime.

    As the cost of tokens goes down, or commodity hardware can handle running models capable of driving these interactions, we may start to see these UIs emerge.

    • perching_aix 2 days ago

      Oh yeah, I was 100% thinking in terms of local models.

endymion-light 2 days ago

I think there's room for a mixture of both use cases. Personally, I think programming is far more like being a mixture of a pilot and an engineer. There are times, as in this article, where it's far more useful to have a HUD: while I'm focused, I'd much rather keep a better eye on the things that interest me. I often find tools like Cursor do the opposite - they slow down your processing time, and you shift into an environment that often feels like correcting a junior engineer's work rather than expanding your knowledge of the codebase.

For example, if I'm working on authentication logic, I'd much rather see a smart heads-up display: you look at a function and get advice on where this might mess other things up in the codebase, edge cases, etc. A smarter form of current IDEs that doesn't mean you click through 50 different files to work out that this third-party package doesn't work on specific code. A HUD in this case is ideal.

But there's a more detailed, slower kind of development. I still really often use the chat function on Claude mixed with Obsidian to hold bits of information I've found useful; this is more about getting a deeper understanding of certain concepts. As a stupid developer, I often find I might need something explained 20 times in 20 different ways, and a predictive text model is perfect in so many circumstances for explaining a massive algorithm step by step. It's ideal for things like shader code, where I might come back to it 6 months later and want to work out what was in my head at the time; a historic chat is perfect for those things.

There's definitely a balance to be struck. Now that the hype cycle is peaking, we can hopefully separate the profit-seeking AI tools from the useful day-to-day knowledge expansion. Currently it feels like the discovery of perspective in the Renaissance - instead of using it to further advancements, we're attempting to sell perspective courses to people.

galaxyLogic 17 hours ago

A heads-up display is of course useful if it gives you useful information. But you can't tell that display, "OK, fly me to Rome." That's what the "co-pilot" is about: it can do things, rather than just display information it decides you might find useful.

throwaway12345t a day ago

This is now (more) dated. Copilots as an interface are dated. The current initiative is full agents, with a human in the loop at the very start, occasionally in the middle, and at the end.

HUDs are just good UI - something copilots can natively exist as part of, in the form of contextual insights and alerts.

We're moving on to agency, where it's everything else vs. an entirely different entity taking the action of flying the plane from takeoff to landing.

aleksituk a day ago

I like HUDs as an analogy for other forms of content/experience enrichment too, e.g. what's a HUD for forms of interaction other than looking at screens (like audio, or multimodal)?

Can we embed useful data behind the content we produce/consume, or derive personalised versions for the user?

-> Text/audio/video tailored for me and my interests? (i.e. not just content recommended for me, but the content itself tailored for me in terms of insights and practical applications)

-> Podcasts I can interact with?

-> Audiobooks I can ask questions about?

Richer interaction that allows for more back-and-forth and key insights for the user and use case.

Einenlum a day ago

Makes me think that I would love an AI tool that not only does autocomplete but also offers suggestions while I write code. "Have you thought about using useCallback here?" "This feature seems already provided by library X and Y." "It seems like this code already exists in function Foo. Maybe a small refactoring would be helpful."
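
A minimal sketch of how that could stay ambient rather than interruptive (suggest() is a hypothetical LLM call; the editor hooks are assumed, not any real plugin API):

    // HUD-style advisor: wait for a quiet pause in typing, then surface
    // at most three short, dismissable hints - never a modal.
    declare function suggest(code: string): Promise<string[]>; // hypothetical

    let idleTimer: ReturnType<typeof setTimeout> | undefined;

    function onBufferChanged(code: string, showHints: (hints: string[]) => void): void {
      clearTimeout(idleTimer);
      idleTimer = setTimeout(async () => {
        const hints = (await suggest(code)).slice(0, 3);
        showHints(hints); // rendered in the gutter, easy to ignore
      }, 2000); // only after two quiet seconds, to stay out of the way
    }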

thepuglor 2 days ago

I'd rather flip this around and be in a fully immersive environment watching agents do things in such a way that I am able to interject and guide in realtime. How do we build that, and build it in such a way that the content and delivery of my guidance becomes critical to what they learn? The best teacher gets the best AI students.

  • sitkack 2 days ago

    > fully immersive environment

    Start with a snapshot of what you are envisioning using Blender.

makaking 2 days ago

> "non-copilot form factors that more directly extend the human mind."

I think Cursor's tab completion and next-edit prediction roughly fit the pattern: you don't chat, you don't ask or explain, you just do... And the more coherent your actions are, the more useful the HUD becomes.

timeinput a day ago

There is a giant confusing tech stack here, but opening up this simple plain-text page resulted in something like 100% CPU usage from gnome-shell when visiting from Firefox on a pretty non-customized Ubuntu 24 install.

In the firefox task manager nothing really looked odd, but opening that tab and displaying it is insanely CPU intensive.

Pausing the autoplaying video makes it seem like a sane web page in terms of CPU usage. I'm surprised how much CPU playing that video consumed.

  • itslennysfault a day ago

    Very odd. This must be some kind of Ubuntu issue. Out of curiosity I tried it in Chrome, Firefox and Safari on MacOS 15.5 and saw very little CPU usage and no difference with/without the video playing. I don't have a Ubuntu Desktop handy to confirm. Looking at the video it's nothing special. Just a fairly small mp4 using a native html5 video element to play it. Really no reason this should be causing issues.

wewewedxfgdf 2 days ago

Kind of a weird article, because the computer system that is "invisible" - i.e. an integrated part of the flight control systems - is exactly what we have now. He's sort of arguing for... computer software.

Like, we have HUDs - that's what a HUD is - it's a computer program.

  • CGamesPlay 2 days ago

    A HUD is typically non-interactive, which is the core distinction he’s advocating for. The “copilot” responds to your requests, the “HUD” surfaces relevant information passively.

    • skydhash 2 days ago

      Isn't this one of the core selling points of an IDE compared to a simple editor? You have the analyzer running in the background and telling you things, and a host of buttons around the editor to act on stuff, all within the context of a project.

utf_8x 2 days ago

You know that feature in JetBrains (and possibly other) IDEs that highlights non-errors, like code that could be optimized for speed or readability (inverting ifs, using LINQ instead of a foreach, and so on)? As far as I can tell, these are just heuristics, and it feels like the perfect place for an “AI HUD.”

I don’t use Copilot or other coding AIs directly in the IDE because, most of the time, they just get in the way. I mainly use ChatGPT as a more powerful search engine, and this feels like exactly the kind of IDE integration that would fit well with my workflow.

aantix 2 days ago

I would love a HUD that lets me see the change that corresponds to each of Claude Code's TODO items.

I don't want inline comments, as those accumulate and don't get cleaned up appropriately by the LLM.

obscure-enigma 2 days ago

> design the cockpit so that the human pilot is naturally aware of their surroundings.

This is an interface design problem. Self-driving cars could easily ingest this kind of HUD. This is what makes Apple's AI different from other microservice-like AI: the spell checker, rewrite, and proofread features are so naturally integrated into the UI that they don't feel like AI-powered operations.

torginus 2 days ago

I hear you, but unfortunately, despite the limitations of current AI tech, all companies covertly or overtly market themselves as human-replacement rather than human-augmentation solutions, with the aim of cutting expensive experts out of the loop rather than making them more productive, even though the messaging might say otherwise from time to time.

yurimo 2 days ago

I might be wrong, but isn't the HUD the author is suggesting for coding basically AREPL? For debugging I can see it work, but chatboxes and inline Q&A, I feel, have a wider application.

On a wider note, I buy the argument for alternative interfaces other than chat, but chat permeates our lives: every smartphone is full of chat interfaces. HUDs might be good for AR glasses though - a literal HUD.

simianwords 2 days ago

I believe this is the framework vs. library debate in a different context. A HUD is like me using a library of AI-enabled tools. Human agency is given more importance here - the human decides how to use the tools to suit their purpose.

Copilot is more like a framework where an AI system exists which tells me what to do (a bit like the inverse of a library).

monkpit a day ago

I was recently thinking: if services offered sufficient API docs/specs, you could really just provide an LLM to the client and have any client build their own UI to fit their needs. Maybe not that you'd want to. But you could!

vidarh 2 days ago

> “You’ll no more run into another airplane than you would try to walk through a wall.”

People do walk into walls, though.

I see the appeal of the idea, but it's not a replacement for something that actually aggressively demands your attention when there are high-risk events.

Havoc a day ago

Some sort of neural interface feels inevitable at this point

_samjarman 2 days ago

I'd love to mount a top-down webcam above me when I work on electronics and have a real-time feed into ChatGPT or similar for Q&A - a real-time coach that can see. One day... :)

your_challenger a day ago

The Ink & Switch guys always have the best-est of ideas!

cat-whisperer a day ago

I'm thinking AR glasses with AI HUDs and we're basically there - my to-do list just got a whole lot more interesting.

bhaktatejas922 20 hours ago

Doing AI code edits on a JS web app is probably the way to do it, probably using something like Groq Kimi K2 + Morph for fast edits.

Now we need a web framework with faster reload times.

jawns 2 days ago

The author gives an example of a HUD: an AI that writes a debugger program to enable the human developer to more easily debug code.

But why should it be only the human developer who benefits? What if that debugger program becomes a tool that AI agents can use to more accurately resolve bugs?

Indeed, why can't any programming HUD be used by AI tools? If they benefit humans, wouldn't they benefit AI as well?

I think we'll be pretty quickly at the point where AI agents are more often than not autonomously taking care of business, and humans only need to know about that work at critical points (like when approvals are needed). Once we're there, the idea that this HUD concept should be only human-oriented breaks down.

clbrmbr 2 days ago

A thought-provoking analogy!

What comes immediately to mind for me is using embeddings to show the closest matches to the current cursor position in a tab on the right, for fast jumping to related files.
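
A minimal sketch of that ranking (embed() stands in for whatever embedding API you'd use; it's an assumption, not a specific product):

    // Rank indexed files by cosine similarity to the text around the cursor.
    declare function embed(text: string): Promise<number[]>; // hypothetical

    type IndexedFile = { path: string; vector: number[] };

    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    async function relatedFiles(cursorContext: string, index: IndexedFile[], k = 5) {
      const query = await embed(cursorContext);
      return index
        .map((f) => ({ path: f.path, score: cosine(query, f.vector) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k); // top-k files to show in the side tab
    }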

stan_kirdey 2 days ago

needed to find a kids’ orthodontist. made a tiny voice agent: feed it numbers, it calls, asks about price/availability/insurance, logs the gist.

it kind of worked. the magic was the smallest UI around it:

- timeline of dials + retries

- "call me back" flags

- when it tried, who picked up

- short summaries with links to the raw transcript

once i could see the behavior, it stopped feeling spooky and started feeling useful.

so yeah, copilots are cool, but i want HUDs: quiet most of the time, glanceable, easy to interrupt, receipts for every action.
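
a rough sketch of what one of those receipts could look like as a record (typescript; field names are guesses, not the actual tool):

    // one "receipt" per number dialed; everything links back to raw data
    interface CallReceipt {
      number: string;
      attempts: { at: Date; outcome: "no-answer" | "voicemail" | "human" }[];
      callMeBack: boolean;   // flagged when the office asked for a retry
      summary: string;       // short gist shown in the timeline
      transcriptUrl: string; // raw transcript, for checking the gist
    }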

OJFord a day ago

I do think typing text prompts at LLMs will go down as a barely remembered brief period in history, that seems quaint and dumb to anyone who remembers or is told about it.

(I don't know what we'll be doing instead, I just think text prompts feel dumb.)

dreamcompiler a day ago

The main reasons I don't have "smart" devices in my house:

1. They introduce new and fascinating failure modes that never happened with the old, dumb devices (e.g. if your router fails you lose the ability to control your lights).

2. They demand human attention at the slightest provocation (e.g. the microwave beeps loudly forever when your food is done, every app on your phone insists on interacting with you whenever the company would like to upsell you something, etc.)

Item 2 above is what TFA is about. Yes you can often turn this shit off, but that's not the point. The point is you shouldn't have to. Useful technology should never call attention to itself in the manner of someone with narcissistic personality disorder.

But what about emergency situations? Glad you asked. Many airplane crashes in modern aircraft have happened because of "warning buzzer overload," which happens when one important system on the aircraft fails and causes a cascade of secondary warnings, while giving the pilot no insight as to the root cause. A true AI assistant would reason about such situations and guide the pilot toward the root cause.

A true coding assistant would do the same kind of reasoning about program errors and suppress multipage error dumps in favor of flagging the root issue.

radres 2 days ago

One of the simplest and best-working applications of AI is GPTs living in your clipboard. Any prompt/workflow you have is assignable to a shortcut, and it pops up a chat window with the response on demand. It's been game-changing, honestly.
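
A minimal sketch of the pattern using Electron's globalShortcut and clipboard modules (askModel() is a hypothetical LLM call, and the shortcut choice is arbitrary):

    import { app, globalShortcut, clipboard, Notification } from "electron";

    declare function askModel(prompt: string): Promise<string>; // hypothetical

    app.whenReady().then(() => {
      // Whatever is on the clipboard becomes the prompt context.
      globalShortcut.register("CommandOrControl+Shift+Space", async () => {
        const selection = clipboard.readText();
        const answer = await askModel(`Summarize:\n${selection}`);
        clipboard.writeText(answer); // paste-ready result
        new Notification({ title: "Clipboard GPT", body: answer.slice(0, 80) }).show();
      });
    });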

roywiggins 2 days ago

> The agentic option is a “copilot” — a virtual human who you talk with to get help flying the plane. If you’re about to run into another plane it might yell at you “collision, go right and down!”

Planes do actually have this now. It seems to work okay:

https://en.m.wikipedia.org/wiki/Traffic_collision_avoidance_...

  • gklitt 2 days ago

    Author here. I thought about including TCAS in the article but it felt too tangential…

    You’re right that there’s a voice alert. But TCAS also has a map of nearby planes which is much more “HUD”! So it’s a combo of both approaches.

    (Interestingly it seems that TCAS may predate Weiser’s 1992 talk)

iddan 2 days ago

This is what we are going for in Closer - an AI native experience that extends the mind of salespeople. https://closer.so

revskill 13 hours ago

Agree. Current copilots lose their context - ah no, they know nothing about context. Such a joke.

amelius 2 days ago

I want an AI HUD that blocks ads.

  • Cthulhu_ a day ago

    What if you could get AR glasses that translates ads into their core message like "CONSUME", "MARRY AND REPRODUCE" and "OBEY"? That would be cool. You could make a movie out of that!

ares623 2 days ago

It should have a paperclip mascot

anaisbetts 2 days ago

I mean, fundamentally, the reason we don't build most AI HUD ideas right now is that they are either stupidly, insanely expensive at scale, or you have to run the AI on-device on an NPU so underpowered that it's going to give you really, really bad results.

Lots of great ideas in this space but it's tough to make something that delivers value and also is economically viable

julik a day ago

So... Google Glass?

jpm_sd 2 days ago

This is how ship's AI is depicted in The Expanse (TV series) and I think it's really compelling. Quiet and unobtrusive, but Alex can ask the Rocinante to plot a new course or display the tactical situation and it's fast, effective and effortlessly superhuman with no back-talk or unnecessary personality.

Compare another sci-fi depiction taken to the opposite extreme: Sirius Cybernetics products in the Hitchhikers Guide books. "Thank you for making a simple door very happy!"

  • skydhash 2 days ago

    I may remember wrongly, but I don't believe The Expanse was depicting AI. It was more about powerful computation and unobtrusive interfaces; there was nothing like Jarvis. The Rocinante was a war vessel, and all its features were tailored to that. I believe even the Martians' mechanical suits were very manual (no Cortana a la Master Chief).

    • jpm_sd a day ago

      That's exactly what this article is about, though. Assistive systems that are incredibly capable without needing to have a fake personality.

betterhealth12 a day ago

How about contextual AI insights as you're interacting with the world, wherever you are? Like a quality signal.

Oh wait, this is something FB/Goog could never do because what would it say about the quality of information hosted...

  • soco a day ago

    AR for the regular guy is something which I remember installing on my mobile over ten years ago - with low expectations, and it never got any better than that, quite the contrary. Powered by AI or not, seems nobody has found (or looked for) a market for this.

kotaKat a day ago

Enough AI copilots and AI HUDs, we need “none of this bullshit forced upon us”.

bitwize a day ago

I want AI (machine learning statistical techniques) to convert my brainwaves into text so that I can think my programs into the computer. That would be a useful breakthrough in AI-assisted programming.

FrustratedMonky a day ago

I think the OpenAI left panel / right panel code view seems to fit this.

cmrdporcupine a day ago

100% agree with this. I want augmentation not automation.

That said, CoPilot in the form of "autocomplete" is kind of that.

I have been enjoying the hell out of Claude Code, but I'd feel much better about it if it wasn't a case of "here, take this pile of diffs" and it had a more Socratic modality.

A tool that works with me instead of for me, because I'm going to have to review everything it does anyways.

newsclues a day ago

With more sensors and more data, humans become the limiting factor until a better way to communicate the information is found.

eboynyc32 2 days ago

Excited for the next wave of AI innovation.

precompute 2 days ago

HUDs are primarily of use to people who are used to parsing dense visual information. Wanting HUDs on a platform that promises to do your work for you is quite pointless.

Der_Einzige 2 days ago

Actual HUDs, like the ones for cars - along with vector displays and laser display tech in general - are criminally undervalued and underutilized.

satisfice 2 days ago

Yes, this is a non-creepy way of applying AI.

smolder 2 days ago

Let's stop allowing this clearly incentivized hype for investable shit and have an honest conversation, please.

SilverElfin 2 days ago

Isn’t that what all the AI browsers like Comet, or the things like Cluey are trying to do

camdroidw a day ago

AI powered this, AI powered that, what most people don't realise is that the next game changer would be Al powered batteries

kova12 2 days ago

Aren't we missing the point that the co-pilot is there in case the pilot gets incapacitated?

  • melagonster 2 days ago

    The pilot is the copilot today. These computers can handle most of the tasks automatically.

aaron695 2 days ago

This is a mess.

The 1992 talk wasn't at all about AI, and since then our phones have given us "ubiquitous computing" en masse.

The original talk required no 'artificial intelligence' for relevance, which makes it strange to apply to today's artificial intelligence.

The original talk made good points. For instance, "voice recognition" has been solved at a reasonable level forever, yet people kept claiming that if it were 'better' a 'magic experience' would pop out, as if voice were different from typing. Idiots have been around for a long time.

Don't get what OP is trying to say.

'AI HUD metaphors' are very hard; that's why they are not ubiquitous. They require constant input: spellcheck runs on every character typed. Agents exist because they need less of it (and less $).

'Hallucinations' also make 'AI HUD metaphors' problematic: for spellcheck, squiggly red lines would blink on and off all over the page as an LLM keeps coming back with different results.

smolder 2 days ago

[flagged]

  • tomhow 9 hours ago

    It's explicitly against the guidelines to comment like this:

    Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

    https://news.ycombinator.com/newsguidelines.html