Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?
To pay with money check out Kagi assistant
Is it any good? I pay for Kagi search but don’t use the assistant.
I only use the models available on the cheap plan. I like that the first thing it does is an internet search so it doesn’t say anything too stupid. I’d say it’s as good as any ai product 🤷
Open WebUI + SearXNG + llama.cpp + Qwen 3.6 35B + 16-32GB GPU. Gives you 256K context and runs with 80-100tps on 3090. If you have less VRAM like 16GB it’ll be slower but still probably tens of tps on anything recent. I run it on AMD Pro 9700 which is about as fast as 3090.
Did anyone mention that huggingface will quanticize for you? Its like one button push.
If you have the equipment and about 5TB of storage Fable 5 AI is available by Torrent on Pirate Bay right now. :)
What’s an example of something you can do with this? I’m not familiar with Claude
“Hey Claude, research for me current research to Nuclear Fusion. What are the biggest hurdles what are the next steps, and how promising is private research” enabling the research feature will give you a report, Fact checked (not clean but ok ish), and all the sources for it.
Claude will spin up a bunch of workers and search the web, following leads, and so on.
One of the few actual useful features of AI IMHO
I know you said not clean, but Claude and other LLMs with this ability can and do still hallucinate even when researching. You can have it give you direct source links (not just the attributes that it gives by default) with explicit instructions to quote the exact finding, word for word, and sometimes it will suddenly tell you that it was made up and not actually in the attributed page. Not every time, but it’s something to be very careful with.
Google’s own LLM struggles with this issue as well.
What my Qwen 3.6 35B setup spat out to this prompt. Used 20-something sources:
Based on my research, here’s a comprehensive overview of the current state of nuclear fusion research:
Current Major Breakthroughs
The field has seen remarkable advances recently:
- National Ignition Facility (NIF) achieved fusion ignition for the seventh time in February 2025, setting a new energy gain record of 2.44x. In April 2025, they produced a record fusion yield of 8.6 MJ - that’s more than four times the laser energy put in (2.08 MJ).
- Global investment has surged: Fusion startups have raised over $13 billion to date, with 17 companies having raised $100M+ each.
Biggest Hurdles & Challenges
1. Materials Science
The fusion environment is extremely hostile. Neutrons from fusion reactions will bombard the reactor’s first wall and vacuum vessel with more than 10 displacements per atom (dpa) per year, causing:
- Severe material degradation
- Tritium retention issues
- Erosion of plasma-facing components This requires entirely new materials that can withstand decades of operation.
2. Engineering Scale-Up
Moving from physics experiments to power plants presents enormous engineering challenges:
- For inertial confinement (NIF), you’d need to ignite pellets many times per second—described as a “dizzying” challenge
- Maintaining stable plasma at 150+ million°C for continuous operation (not just short pulses)
- Heat extraction systems that can convert fusion heat into electricity efficiently
3. Plasma Physics
Understanding and controlling plasma turbulence, instabilities, and confinement remains an ongoing science challenge. While tokamaks are proven, consistent, optimal performance is difficult.
4. Tritium Breeding
Fusion reactors need tritium (which is rare), so they must breed their own in the reactor blanket—a technology that hasn’t been demonstrated at commercial scale yet.
5. Cost & Complexity
Current fusion devices cost billions to build and operate. Economic competitiveness with other energy sources (and fission) remains uncertain.
Next Steps & Timeline
Public Research:
- ITER (the international tokamak project): Construction is progressing but the first plasma has been pushed back to 2034, a 9-year delay from earlier plans. Full deuterium-tritium operations are targeted for 2036+
- NIF: Continuing experiments with higher yields, working toward more efficient ignition schemes
Near-term Goals (2025-2030):
- SPARC tokamak (Commonwealth Fusion Systems) aims to demonstrate net energy by mid-2020s using new HTS magnets
- Continued NIF physics campaigns to understand ignition optimization
- DEMO reactor design work for next-generation demonstrations
Commercial Timeline:
The Fusion Industry Association states that the majority of fusion companies anticipate commercial power by the mid-2030s. This seems aggressive but reflects accelerated progress and massive investment.
Private Sector Research: How Promising?
The private sector is arguably making the most exciting progress:
Major Players:
-
Commonwealth Fusion Systems (CFS): Developing SPARC using high-temperature superconducting magnets to create a compact, cheaper tokamak. They’ve already built and tested their HTS magnet technology at MIT’s Alcator C-Mod and are raising capital for the full SPARC device.
-
Helion Energy: Pursuing a unique pulsed magnetic fusion approach (not a tokamak), planning a pilot plant by mid-decade with backing from Amazon’s Jeff Bezos and Bill Gates.
-
Tae Technologies: Founded by John Nuckolls (NIF physicist) and others, using an inertial electrostatic confinement concept.
-
17 startups have each raised $100M+, collectively attracting over $13B in total investment.
Strengths of Private Research:
- Speed: Smaller, more focused teams can iterate faster than billion-dollar international projects like ITER
- Innovation: Willingness to pursue alternative approaches beyond tokamaks (stellarators, field-reversed configurations, laser-based ICF)
- Funding momentum: Strong VC interest and tech billionaire backing
- Regulatory progress: The ADVANCE Act (passed 2024 in the US) creates a more favorable regulatory framework for fusion
Concerns:
- Overhyped timelines (“commercial by mid-2030s” may be optimistic given historical nuclear project delays)
- Many companies will likely fail; it’s an unproven industry
- The “10-year away” phenomenon has plagued fusion for 60 years—can private sector avoid this?
Bottom Line
Nuclear fusion is in the most promising phase of its history. We’ve definitively proven scientific net energy (NIF), new technologies like HTS magnets are enabling smaller reactors, and private investment is unprecedented. However, the gap between “net energy in a lab experiment” and “reliable, economic electricity on the grid” remains massive—possibly 15-20 years still.
The consensus seems to be: fusion might reach early commercial deployment by the mid-2030s from private companies, but widespread adoption is likely decades away. Whether that’s good enough depends on your patience and how you weigh potential reward against current climate urgency.
Openwebui+searxng on a AMD strix board.
Pro: works like a charm, low power consumption, fast, “big” , LLM (running qwen3.6 35B A3B + gemma4 E4B for website summaries and other smaller tasks)
Con: strix boards start at 2k€, more in USA because of tarrifs
Curious why do you swap between Qwen and E4B. On my hardware they perform with similar tps. Qwen 3.6 35B spits out 80-100tps on AMD 9700 and E4B gives me about the same tps.
To avoid context switching on the GPU. OpenWebUi for example uses it for memory and title generation.
Those are not performance critical and background tasks, so instead of slowing down qwen, we just outsource this stuff to the NPU.
What’s a “strix board”? Is that necessary?
AMD Strix is an APU, optimized for AI. It is the cheapest option I am aware of to run bigger models at home. 2k for 56GB VRAM, and less den 300W total power Budget.
One could run smaller models. But for the context sizes required for research work, that is nearly impossible.
Also, external services, like openrouter, can be used to use models hosted in the cloud.
But for self hosted, you need something that can run models with at least 15GB of VRAM + Context. For comparison. Our highly quantized model uses 20GB of vram. For our 4 slots we need another 20GB on top of it (around 5GB for 254k tokens), making it 40GB.
Yup. And if you want to take a small step without major hardware requirements: connect your setup to a paid subscription Mistral or Anthropic API. They allow you to switch off training on your data.
On top of that, the costs are way lower than the normal consumer grade chat subscriptions, and your searches + memory are kept locally (e.g., managed through open webui).
For those who want to know more, rough setup:
- llama-cpp rocmfp4 fork
- currently custom quantized qwen3.6 35B A3B model, working on publishing
- be3 embedding and reranker, also GPU
- gemma4-e4b via FastFlowLM on NPU!
- OpenWebUI and searxng as docker containers on a Pi currently
We get 70-100tok/s generation. Four slots with 256k context length each.
We use a smaller Board with “only” 64GB of shared LPDDR5X. Bottleneck is memory speed, rocmfp4 quants help a lot.
As soon as I get my imatrix calibration right, I will publish the quantized versions.
Most existing quantized models are broken. The authors did some not supported stuff (like using a already quantized model and requantize it) that you may get issues with coherence or sudden Chinese words in the output.
That is not an issue with rocmfp4 but with vibe coders and agent psychosis.
Thank you so so much for pointing out ROCmFP4. I have been tinkering with my RDNA 3 framework on llama. I was struggling with ROCm llama.cpp and have been using vulcan in the meantime. I know there’s some issues on the llama.cpp github to try and fix my issue (UMA stuff), but haven’t come across this specific project. Gonna try it out
Do you have a walk through for setup?
I’m on the strix halo 128 gb variant and while I got ollama working fine, i haven’t gotten any of these multi headed setups working
I am on Gentoo for it, but everything with a decent rocm should work.
Have a look for llama-swap, that handles multi head endpoints.
Also, as you are on a big board, you can quantize yourself, as the BF16 version of qwen has only 72gb.
I will try and post a full writeup next days. But feel free to dm me, if you need some guidance on quantize or more.
I am using this fork currently: https://github.com/charlie12345/ROCmFPX
Stuff happens fast currently, so may be worth to wait a week or two ig you need something super stable, but if you are up for experimenting, that’s the way to go
Great man! Gentoo lover and long time addicted here… Keep it the good work!
THis is great, thanks. I’m on the z-13 and needed to use it for a work project, which is wrapping up soon. I’m planning on re-building it as a locally hosted agent support machine.







