Is it just me or are LLM chat bots still useless in many cases?

3 points by pthr 7 hours ago

Despite all the enthusiasm I hear around me (at work, mostly) and read online, I've noticed on many occasions that the responses I get are simply useless.

This recent encounter illustrates it well. It's a simple coding task where I couldn't figure out within a minute of reading the manual whether or not I could get something done with a given project. Hence I asked the following question on phind.com, MS Copilot, MS M365 Chat, Llama 3.1 70B (via the Hugging Face web interface), and ChatGPT o1-preview:

---

I have a recent BMW electric vehicle. I want to programmatically list all charging sessions: I need the charging session's start and end time, with the amount of kWh charged.

I found the bimmer_connected Python module, which may be able to do what I want. Can you check whether it indeed allows obtaining such a list, and if so, suggest some sample Python code.

Be brief, I'll read the code and don't need much explanation.

---

Observations:

  - all of them except ChatGPT o1-preview told me the functionality existed (which, from reading the manual, is not the case; I guess BMW doesn't store extended history and hence doesn't offer an API to obtain such info)
  - they all returned code examples calling functions that don't actually exist in the module (see the sketch below)
  - ChatGPT o1-preview told me the functionality didn't exist in bimmer_connected, but suggested another Python module named 'MyBMW'. As I couldn't easily find that, I asked a follow-up question about where to find it: the response was that no such Python module exists.
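
To give an idea, the answers looked roughly like the sketch below (my reconstruction of the pattern, not a verbatim copy of any single reply); the charging-session calls are exactly the part that doesn't exist in bimmer_connected:

    # Rough shape of what the bots returned -- a reconstruction, not a real reply.
    # The account setup mirrors bimmer_connected's documented usage;
    # get_charging_sessions() and the session fields are the hallucinated part.
    import asyncio
    from bimmer_connected.account import MyBMWAccount
    from bimmer_connected.api.regions import Regions

    async def main():
        account = MyBMWAccount("user@example.com", "password", Regions.REST_OF_WORLD)
        await account.get_vehicles()
        vehicle = account.vehicles[0]
        for session in vehicle.get_charging_sessions():  # <-- invented method
            print(session.start_time, session.end_time, session.energy_kwh)

    asyncio.run(main())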

This is really representative of my experience so far.

What are your thoughts and experiences?

Tier3r 7 hours ago

The range of cases AI is used for is too vast to cover, but I do have the same experience. I have a few heuristics for assessing how well AI will work.

1) AI's performance decreases exponentially with the sparsity of relevant examples available on the web. The areas where examples are sparse tend to be specific frameworks, libraries or hardware. AI may sometimes extrapolate from similar examples - for example, many JS libraries resemble one another - but when it comes to, say, firmware libraries, it fails horribly. How relevant is relevant enough is a matter of some dispute; I find that it cannot transfer much beyond direct examples of library code. It's not necessarily a matter of bad documentation: even if the necessary docs are there, it has very poor ability to come up with a solution from first principles.

2) AI's performance decreases linearly with the number of different dependencies a piece of code connects to. Part of it is that it doesn't have the context; part of it is that the chance of it having seen that particular configuration decreases exponentially with the number of possible configurations.

3) Bizarrely, AI is terrible at recommending books. I asked it for introductory books on category theory and it made up a whole bunch.

Where AI is useful is writing boilerplate, especially starter code that has minimal dependencies and plenty of examples online.

  • pthr 6 hours ago

    Thanks, those are very good points!

    Intuitively it makes a lot of sense that, when asked for boilerplate code for Flask (to name something widely used and documented), these AI bots would perform far better than on something for which much less content exists, as in the scenario I described.
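
    For example, a trivial Flask sketch like the one below (the route and names are just placeholders I picked) is the kind of thing they pretty much always get right, simply because thousands of near-identical examples exist online:

      # Minimal Flask app -- the sort of boilerplate LLMs handle well.
      from flask import Flask, jsonify

      app = Flask(__name__)

      @app.route("/health")
      def health():
          return jsonify(status="ok")

      if __name__ == "__main__":
          app.run(debug=True)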