Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?


I know you said not clean, but Claude and other LLMs with this ability can and do still hallucinate even when researching. You can have it give you direct source links (not just the attributes that it gives by default) with explicit instructions to quote the exact finding, word for word, and sometimes it will suddenly tell you that it was made up and not actually in the attributed page. Not every time, but it’s something to be very careful with.
Google’s own LLM struggles with this issue as well.