How big of a problem is unstructured data for companies?

8 points by dark7 17 hours ago

I read somewhere that 90% of companies have unstructured data such as documents, PDFs, videos, images, audio clips, and other content, and that this will be a big obstacle for AI.

Are there companies already in this space?

Trying to see if there's something here before I possibly create an MVP.

danjl 7 hours ago

The trick is finding a problem where providing structure to the data increases revenue or decreases costs. Sure, it would be great to bring structure to assets, but you can't just provide search or labeling. You have to figure out how providing the structure actually brings value to those companies. You'd hope they would do that for you, but you need to figure it all out, at least for one set of customers who will pay you, before you build the MVP. The details of how it benefits the company have a profound effect on the design of the MVP, including how to access the assets and how to expose the structure in the UX.

  • dark7 4 hours ago

    Yeah, good point. It would have to provide some value other than search. They can probably already do that with ChatGPT to an extent, with PDFs and whatnot.

evanjrowley 12 hours ago

It's an even bigger obstacle for data management, particularly classification and loss prevention. By comparison, it's less of an obstacle for AI, and AI will most likely be a game changer for addressing those other issues.

edmundsauto 8 hours ago

I work for big tech. Our problem is not the unstructured nature of the data; it is the ratio of noise to signal. Basic ranking and information retrieval are implemented; we have LLM/RAG systems that can be queried. However, it’s hard to evaluate what is good, up-to-date information: 98% of the documents people kick out are not useful.

  • dark7 4 hours ago

    Interesting. So, for example, you’re saying you make a query and the information in the document that comes back is out of date?

theGnuMe 9 hours ago

There are companies, but it is a wide-open space. There was a legal AI startup sold last year for a billion or so...