llama.cpp compiled as a native Android library via the NDK, linked into React Native through a custom JSI bridge. GGUF models loaded straight into memory.
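To make the JSI bridge idea concrete, here's a rough sketch of what the TypeScript side of such a binding could look like. All names here (`__llamaJsi`, `loadModel`, etc.) are illustrative, not the project's actual API: the C++ side installs functions on the JS global object, and a thin typed wrapper exposes them to the app.

```typescript
// Hypothetical shape of JSI bindings installed from C++ via
// jsi::Runtime::global().setProperty(...). Names are made up for illustration.
type LlamaBindings = {
  loadModel(path: string): number; // returns an opaque context handle
  completion(handle: number, prompt: string): string;
  releaseModel(handle: number): void;
};

declare global {
  // populated by the native module at startup
  var __llamaJsi: LlamaBindings | undefined;
}

// Typed accessor so the rest of the app never touches the global directly.
export function getLlama(): LlamaBindings {
  if (!globalThis.__llamaJsi) {
    throw new Error("llama JSI bindings not installed");
  }
  return globalThis.__llamaJsi;
}
```

The appeal of JSI over the old bridge is that calls like `completion()` are synchronous C++ invocations with no JSON serialization in between.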
On Snapdragon devices we use QNN (Qualcomm Neural Network) for hardware acceleration. OpenCL GPU fallback on everything else. CPU-only as a last resort.
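The fallback order described above (QNN on Snapdragon, OpenCL GPU elsewhere, CPU as last resort) can be sketched as a simple cascade. The capability flags and detection are illustrative assumptions, not the app's real detection code:

```typescript
// Backend selection cascade: prefer the Qualcomm NPU path, then a generic
// GPU path, then plain CPU. Detection fields are simplified for illustration.
type Backend = "qnn" | "opencl" | "cpu";

interface DeviceCaps {
  isSnapdragon: boolean; // would come from chipset detection in practice
  hasOpenCL: boolean;    // libOpenCL.so present and usable
}

export function pickBackend(caps: DeviceCaps): Backend {
  if (caps.isSnapdragon) return "qnn"; // hardware-accelerated Qualcomm path
  if (caps.hasOpenCL) return "opencl"; // GPU fallback on other SoCs
  return "cpu";                        // always works, just slower
}
```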
Image gen is Stable Diffusion running on the NPU where available. Vision uses SmolVLM and Qwen3-VL. Voice is on-device Whisper.
The model browser filters by your device's RAM so you never download something your phone can't run. The whole thing is MIT licensed - happy to answer anything about the architecture.
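The RAM filter could be as simple as comparing each model's estimated runtime footprint against device memory. This is a minimal sketch; the 1.3x headroom multiplier for KV cache and runtime overhead is a made-up illustrative number, not the app's actual heuristic:

```typescript
// Hide models whose estimated memory footprint exceeds device RAM.
interface ModelEntry {
  name: string;
  fileSizeGb: number; // GGUF file size on disk
}

export function runnableModels(
  models: ModelEntry[],
  deviceRamGb: number,
  headroom = 1.3, // rough allowance for KV cache + runtime overhead (assumed)
): ModelEntry[] {
  return models.filter((m) => m.fileSizeGb * headroom <= deviceRamGb);
}
```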
The current copy function copies the entire message.
Could it please be improved to allow selection and copying of only the desired text?
You've eliminated the latency problem and the need for a flagship phone. Amazing, my old Android has AI now. Gave you a star on GitHub. Ciao.
lol thanks buddy!
Can you share some technical details? How did you do it? What's under the hood?
Of course, of course.
I've documented everything here: https://github.com/alichherawalla/off-grid-mobile-ai/blob/ma...
Any roadmap to add MediaTek NPU support?
I'm working on that as we speak. Shouldn't be too difficult a lift; I should be able to get it done tonight or in the next couple of nights.