counting the days at this point
counting the days at this point
you’d be surprised how fast a model can be if you narrow the scope, quantize, and target specific hardware, like the AI hardware features they’re announcing.
not a 1-1, but a quantized Mistral 7B runs at ~35 tokens/sec on my M2. that’s not even as optimized as it could be. it can write simple scripts and do some decent writing prompts.
they could get really narrow in scope (super simple RAG, limited responses, etc), quantize down to even something like 4 bit, and run it on custom accelerated hardware. it doesn’t have to reproduce Shakespeare, but i can imagine a PoC that runs circles around Siri in semantic understanding and generated responses. being able to reach out on Slack to the engineers that built the NPU stack ain’t bad neither.
i guess the rare thing is the public commitment, but Apple has generally had a good track record for updates compared to its Android counterparts, who have previously failed to meet their goals or set laughable goals like 2 years.