Sunday, October 26, 2025

Deeplite Accelerates AI on Arm CPUs Using Ultra-Compact Quantization

Deeplite, a provider of AI optimization software designed to make AI model inference faster, more compact and more energy-efficient, has announced Deeplite Runtime (DeepliteRT), a new addition to its platform that makes AI models smaller and faster in production deployment without compromising accuracy. Customers benefit from lower power consumption, reduced costs and the ability to run AI models on existing Arm CPUs.

As organizations include more edge devices in their AI and deep learning strategies, they face the challenge of running AI models on small edge devices, such as security cameras, commercial drones and mobile phones, which often have very limited power budgets and processor resources. DeepliteRT addresses this challenge with a new way to run ultra-compact quantized models on commodity Arm processors while maintaining model accuracy.

“Multiple industries continue to look for new ways to do more on the edge. It is where users interact with devices and applications and businesses connect with customers. However, the resource limitations of edge devices are holding them back,” said Nick Romano, CEO and co-founder at Deeplite. “DeepliteRT helps enterprises to roll out AI quickly on existing edge hardware, which can save time and reduce costs by avoiding the need to replace current edge devices with more expensive hardware.”

Deeplite has partnered with Arm to run DeepliteRT on its Cortex-A Series CPUs in everyday devices such as home security cameras. Businesses can run complex AI tasks on these low-power CPUs, eliminating the need for expensive and power-hungry GPU-based hardware solutions that limit AI adoption.

DeepliteRT builds upon the company’s existing inference optimization solutions, including Deeplite Neutrino™, an intelligent optimization engine for Deep Neural Networks (DNNs) on edge devices, where size, speed and power are often major challenges. Neutrino automatically optimizes DNN models for target resource constraints: it takes a large, initial DNN model that has been trained for a specific use case, along with the constraints of the target edge device, and delivers a smaller, more efficient model that remains accurate.
