szszrk 7 hours ago

It looks nice, and I'm positively surprised by the search results I got on the demo page. The AI response is more or less similar to what I'm used to, the images include YouTube thumbnails from videos that are closely related to the topic I asked about (one that took me a while to stumble upon, but is a huge knowledge source), and the text results are decent...

I never looked into private/self-hosted search. How does such a service gather data from the web? What's the original source, who does the scraping, and how do you update it?

  • felladrin 6 hours ago

    Thanks for the review!

    About the source of the search results, both text and images, they're all from [SearXNG](https://github.com/searxng/searxng/).

    > SearXNG is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled.

    SearXNG by itself offers a full-stack platform for you to run searches privately (you can find public instances at <https://searx.space/>, and easily host it yourself [via docker](https://github.com/searxng/searxng-docker)).

    About how they scrape other search engines, it's really simple: HTTP calls and parsing of HTML (for most of them).
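
    To make "HTTP calls and parsing of HTML" concrete, here is a minimal sketch using only the Python standard library. It is not SearXNG's actual engine code: the `result-link` class name, the `fetch` helper, the User-Agent string and the sample markup are all made up for illustration, and every real search engine needs its own parsing rules.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ResultLinkParser(HTMLParser):
    """Collects (title, url) pairs from <a class="result-link"> tags.

    The class name "result-link" is a hypothetical example, not the
    markup of any particular search engine.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None  # href of the result link currently being read
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result-link" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matched result link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_results(html):
    """Return a list of (title, url) tuples found in an HTML page."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


def fetch(url):
    """Plain HTTP GET (hypothetical helper, arbitrary User-Agent)."""
    req = Request(url, headers={"User-Agent": "example-metasearch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Stand-in for a downloaded results page, so the sketch runs offline.
SAMPLE_HTML = """
<div><a class="result-link" href="https://example.org/a">First <b>hit</b></a></div>
<div><a class="nav" href="/next">Next page</a></div>
<div><a class="result-link" href="https://example.org/b">Second hit</a></div>
"""

print(parse_results(SAMPLE_HTML))
```

    In a live setup you would call `parse_results(fetch(url))` once per upstream engine and merge the lists; the navigation link in the sample is skipped because it lacks the assumed class.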

    In MiniSearch, I don't need to store the results myself. The scraping is done in real time by SearXNG and the results are passed to MiniSearch, which in turn runs a similarity search and filters out the textual results that don't seem useful.
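
    As a rough illustration of the "similarity search, then filter" step, the sketch below scores each result against the query and drops the ones under a threshold. It uses plain bag-of-words cosine similarity as a stand-in; MiniSearch's real scoring method is not described here, and the function names, the `title`/`snippet` fields, the sample data and the `0.2` threshold are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_and_filter(query, results, threshold=0.2):
    """Keep results scoring at least `threshold`, best match first.

    The threshold is an arbitrary illustrative value.
    """
    query_vec = Counter(tokenize(query))
    scored = []
    for result in results:
        doc_vec = Counter(tokenize(result["title"] + " " + result["snippet"]))
        score = cosine_similarity(query_vec, doc_vec)
        if score >= threshold:
            scored.append((score, result))
    # Sort on the score only, so the result dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored]


# Made-up results standing in for what a metasearch backend might return.
results = [
    {"title": "Self-hosted search with SearXNG",
     "snippet": "Run a private metasearch engine"},
    {"title": "Best pizza recipes",
     "snippet": "Dough, sauce and toppings"},
    {"title": "Private search engine comparison",
     "snippet": "Which private search engine to pick"},
]

kept = rank_and_filter("private search engine", results)
for result in kept:
    print(result["title"])
```

    With this sample data the pizza result shares no words with the query, so it is dropped, and the remaining two are ordered by how much of their text overlaps with the query. An embedding model could replace the word counts without changing the overall shape.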

    But I'd say the real differentiator of MiniSearch is that it's mobile-first. From the beginning, it was made to run in mobile browsers (Chrome, Safari, Firefox), and [Wllama](https://github.com/ngxson/wllama) together with [Web-LLM](https://github.com/mlc-ai/web-llm), along with LLMs of <1B parameters, made that possible!

    If you're curious, here's the HN post I made about it a year ago: https://news.ycombinator.com/item?id=37885752

    • szszrk 4 hours ago

      Thank you for such a patient and detailed response. I didn't realize that's how SearXNG works.

      Fully agree with you that the mobile experience is the highlight of your project. It feels... not present. The way it should be: clean, intuitive and focused on results, not the tool.

      It really makes me want to add it to the things I self-host.