Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?

  • Avid Amoeba@lemmy.ca
    link
    fedilink
    arrow-up
    0
    ·
    20 hours ago

    Open WebUI + SearXNG + llama.cpp + Qwen 3.6 35B + 16-32GB GPU. Gives you 256K context and runs with 80-100tps on 3090. If you have less VRAM like 16GB it’ll be slower but still probably tens of tps on anything recent. I run it on AMD Pro 9700 which is about as fast as 3090.