Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · 2 days ago

Remember how great Brexit was, they say.

brucethemoose@lemmy.world · 5 days ago

One good thing that may come of Trump is shaking a lot of complacency out of other countries. Maybe even hurting the far right in them, once the population gets a lot of exposure to full MAGA.

brucethemoose@lemmy.world · edit-2 6 days ago

I saw a Brexit meme awhile back, of America going “hold my beer.”

Well, the UK is holding our beer now. Here we go!

brucethemoose@lemmy.world · 20 days ago

But what disinformation? What’s the lie?

brucethemoose@lemmy.world · edit-2 20 days ago

Iran!

And CSTO countries that don’t partularly like Russia anymore.

brucethemoose@lemmy.world · 20 days ago

No surprise there. Russia’s command structure seems like a disaster based on what I read from the ISW.

brucethemoose@lemmy.world · 20 days ago

Seems like a hard secret to keep, even for Russia.

I’m guessing state media just downplays it? But what do they say? Like, do they just very quickly mention assistance from Korea (not even specifying North/South) in a blip and move on to the next blurb? Is there a longer justification?

Or do they keep it a secret and tell soldiers to keep their mouths shut?

I am morbidly interested in the propaganda aspect.

brucethemoose@lemmy.world · edit-2 20 days ago

I wonder how this is being portrayed inside Russia?

I mean… did the public look on North Korea very favorably before the war?

brucethemoose@lemmy.world · 21 days ago

They’re all mega expensive, arent’ they?

I feel like a huge mistake is making heavy, mega luxury EVs instead of pushing smaller-battery cars with a tiny (I’m talking like 2hp) backup generator.

brucethemoose@lemmy.world · 26 days ago

To actually answer this, you could look into free APIs of open source models, which have daily limits but are otherwise largely catch-free. You could even mirror endpoints on your VPS if you need to, or host “middleware” like prompt formatters and enhancers.

I say this because, as others said, you cannot actually host AI on a VPS…

brucethemoose@lemmy.world · edit-2 27 days ago

I still think that having an operational moon-based spaceport

Depends what it’s used for, but yeah. But I think the human habitation would be extremely minimal, and it would be more of a utilitarian “midpoint” for deep-space missions and a research site rather than a place of extensive human habitation.

Also read: https://www.orionsarm.com/xcms.php?r=oaeg-front

It’s a fictional universe in a wiki format (with some short stories), but based on hard science, and (IMO) a much more realistic idealized depiction of what future humanity could look like.

brucethemoose@lemmy.world · edit-2 27 days ago

Colonization doesn’t make sense in light of what’s likely to come first. Artifical intelligence, mind uploading, extensive genetic engineering, programmable nanotech for fabrication, take your pick… All these are infinitely more reachable and cheaper than dedicating tons of resources to sustaining a squishy, fragile human bodies in space while the vast majority are still stuck on Earth due to economic constraints.

It’s just not economical until humans are so different that it doesn’t really resemble are Star Trek-ish visions of humans on space boats (eg they’re flying around in computers, AI are sent ahead to construct habitation, bodies are genetically engineered for survival in space, that sort of thing).

Again, I am not talking about research or the glory of stepping foot somewhere, but I just don’t see the point of trying to emulate a traditional human living in an environment where it’s so impractical.

brucethemoose@lemmy.world · edit-2 28 days ago

Two things:

That was kinda the dream after WWII, no?
Exploring space should be a uniting purpose of humanity, but colonizing space, as humans live now, is just wildly, hilariously impractical. It would be orders of magnitude cheaper and easier to live at the bottom of the ocean, or under the antarctic ice sheet. And this is speaking as someone really into exotic rocketry and transcendental sci-fi.

I’d recommend reading through Project Rho, if you’re interested: https://projectrho.com/public_html/rocket/

As well as “farther future” but grounded Sci-Fi like Orion’s Arm, where humanity doesn’t really resemble its current form. And play KSP! The more you read and see, the more you realize “wow, sending humans through space is hard, and living there kinda doesn’t make sense right now.”

brucethemoose@lemmy.world · edit-2 29 days ago

If Ehud Barak had gone back to the Israeli people with “You have to give them back their houses and stop encircling/blockading their settlements”, he’d have been assassinated by the Israelis.

Isn’t that the nature of a “winner takes all” knife’s edge political system, though? If the opposition were in power, they would have done something like this, and Israel would hate it, but they’d have to take it just like they took what they didn’t like over the past decades. Maybe they’d lose the next election (and get assassinated), but the deed would already be done.

…Or maybe I’m totally wrong.

brucethemoose@lemmy.world · edit-2 29 days ago

I’m no expert, but I remember Netanyahu’s opposition supporting two state solutions and other much more reasonable approaches than the horrible status quo. But it never quite hit critical mass, right?

And Netanyahu sure seems like the person who clung to power and just barely stopped that opposition from ever taking root.

brucethemoose@lemmy.world · edit-2 29 days ago

Biden: Please no…

Netanyahu: Waves eyebrows.

Ugh, what a toxic relationship. The U.S. bends over backwards for Netanyahu, not Israel, Netanyahu, and he’s going to snipe Biden/Harris at the last second anyway. He could literally cost Harris the election.

And, you know, lead to tons of death and destruction and literal genocide. But thats a secondary concern in his quest to stay in power.

brucethemoose@lemmy.world · edit-2 1 month ago

Most of this seems like a no brainer. Of course Ukraine should join NATO, the whole point of NATO is to defend against Russian aggression and they are doing that by themselves. There couldn’t be a better qualification.

Everyone wants to cooperate with Ukraine now, and rebuild it.

It all just seems like a matter of Donald Trump and EU equivalents not being elected, and “problem” NATO states being coerced to jump onboard.

brucethemoose@lemmy.world · edit-2 1 month ago

This is horrible. What was ostensibly the target, the article mentions nothing about that.

The strikes defied what Mikati had told Al Jazeera just a day earlier were assurances given by the US that Israel would reduce its attacks on Beirut.

…Does the US have any credibility left when it comes to assuring others what Israel will do?

brucethemoose@lemmy.world · edit-2 1 month ago

To go into more detail:

Exllama is faster than llama.cpp with all other things being equal.
exllama’s quantized KV cache implementation is also far superior, and nearly lossless at Q4 while llama.cpp is nearly unusable at Q4 (and needs to be turned up to Q5_1/Q4_0 or Q8_0/Q4_1 for good quality)
With ollama specifically, you get locked out of a lot of knobs like this enhanced llama.cpp KV cache quantization, more advanced quantization (like iMatrix IQ quantizations or the ARM/AVX optimized Q4_0_4_4/Q4_0_8_8 quantizations), advanced sampling like DRY, batched inference and such.

It’s not evidence or options… it’s missing features, thats my big issue with ollama. I simply get far worse, and far slower, LLM responses out of ollama than tabbyAPI/EXUI on the same hardware, and there’s no way around it.

Also, I’ve been frustrated with implementation bugs in llama.cpp specifically, like how llama 3.1 (for instance) was bugged past 8K at launch because it doesn’t properly support its rope scaling. Ollama inherits all these quirks.

I don’t want to go into the issues I have with the ollama devs behavior though, as that’s way more subjective.

brucethemoose@lemmy.world · edit-2 1 month ago

It’s less optimal.

On a 3090, I simply can’t run Command-R or Qwen 2.5 34B well at 64K-80K context with ollama. Its slow even at lower context, the lack of DRY sampling and some other things majorly hit quality.

Ollama is meant to be turnkey, and thats fine, but LLMs are extremely resource intense. Sometimes the manual setup/configuration is worth it to squeeze out every ounce of extra performance and quantization quality.

Even on CPU-only setups, you are missing out on (for instance) the CPU-optimized quantizations llama.cpp offers now, or the more advanced sampling kobold.cpp offers, or more fine grained tuning of flash attention configs, or batched inference, just to start.

And as I hinted at, I don’t like some other aspects of ollama, like how they “leech” off llama.cpp and kinda hide the association without contributing upstream, some hype and controversies in the past, and hints that they may be cooking up something commercial.

brucethemoose@lemmy.world · edit-2 1 month ago

Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · 3 months ago

Pressure grows as "last chance" negotiations for Gaza deal resume