Skymizer has announced the HTX301, a PCIe AI accelerator card designed to run large language models locally without needing a large GPU cluster.
The Taiwan-based company says the HTX301 can run inference for 700B-parameter models on a single PCIe card. If the claims hold up in independent testing, the card could appeal to companies that want on-premises AI without the cost, power draw, and complexity of large accelerator systems.
The HTX301 is built on Skymizer’s HyperThought platform and uses the company’s next-generation LPU IP. It is aimed at LLM inference rather than general-purpose GPU-style compute, with a focus on decode acceleration, prefill and decode orchestration, low latency, and fixed infrastructure costs.

Each PCIe card carries six HTX301 chips and up to 384GB of memory. Rather than HBM, GDDR, or LPDDR5X, Skymizer uses standard LPDDR4 and LPDDR5 memory, which helps keep power draw and cost down.
Power is one of the boldest claims. Skymizer says the card runs at around 240W, well below the power draw of high-end PCIe AI accelerators such as AMD’s Instinct MI350P or NVIDIA’s RTX PRO 6000 Blackwell server card.
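The 700B and 384GB figures only line up under some assumption about weight precision, which Skymizer does not state. The sketch below is a back-of-envelope calculation, assuming 1 GB = 10^9 bytes and counting weights only (no KV cache, activations, or runtime overhead); the helper names are hypothetical and for illustration only.

```python
# Back-of-envelope check: can a 700B-parameter model's weights fit in 384 GB?
# Assumptions (not stated by Skymizer): 1 GB = 1e9 bytes, weights only,
# no KV cache, activations, or runtime overhead.

CARD_MEMORY_GB = 384.0  # "up to 384GB" per card
PARAMS = 700e9          # 700B-parameter model class


def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Storage needed for the weights alone, in GB."""
    return params * bits_per_param / 8 / 1e9


def max_bits_that_fit(params: float, memory_gb: float) -> float:
    """Highest average bits per weight that still fits in memory_gb."""
    return memory_gb * 1e9 * 8 / params


for bits in (16, 8, 4):
    need = weight_footprint_gb(PARAMS, bits)
    print(f"{bits:>2}-bit: {need:6.0f} GB, fits: {need <= CARD_MEMORY_GB}")

print(f"max average bits per weight: {max_bits_that_fit(PARAMS, CARD_MEMORY_GB):.2f}")
```

Under these assumptions only the 4-bit case fits (350 GB of weights), and the ceiling is about 4.39 bits per weight on average, so the 700B claim appears to depend on weights stored at roughly 4 bits or less, with little room left for the KV cache.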
Here is a quick look at the HTX301:

| Feature | Details |
|---|---|
| Product | Skymizer HTX301 |
| Type | PCIe AI accelerator |
| Main use | Local LLM inference |
| Maximum model class | Up to 700B parameters |
| Chips per card | 6 HTX301 chips |
| Memory | Up to 384GB |
| Memory type | LPDDR4 and LPDDR5 |
| Power | Around 240W |
| Platform | HyperThought with next-generation LPU IP |
| Target market | On-premises AI and enterprise inference |

Skymizer also says its LPU design is efficient enough to reach 30 tokens per second with only 0.5 TOPS of compute and 100GB/s of memory bandwidth. For Llama 2 7B prefill, the company claims an octa-core LPU can reach 240 tokens per second, with multi-chip scaling reaching up to 1,200 tokens per second.
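Single-stream LLM decoding is usually limited by memory bandwidth, because each new token requires reading the model's weights. The sketch below checks what the 30 tokens/s at 0.5 TOPS and 100GB/s figure implies, under simplifying assumptions made here rather than by Skymizer: weights are streamed once per token, each parameter costs about 2 operations (a multiply and an add), and KV-cache traffic, attention, and batching are ignored. Skymizer does not say which model produced the 30 tokens/s figure, so the 7B-parameter size below is a hypothetical example.

```python
# Rough roofline-style bound for single-stream decode speed.
# Assumptions: every generated token streams all weights from memory once,
# ~2 operations per parameter per token, KV-cache traffic and attention ignored.


def decode_bound_tokens_per_s(bandwidth_gb_s: float, tops: float,
                              params: float, bits_per_param: float) -> float:
    """Upper bound on tokens/s: the lower of the memory and compute limits."""
    bytes_per_token = params * bits_per_param / 8
    memory_limit = bandwidth_gb_s * 1e9 / bytes_per_token
    compute_limit = tops * 1e12 / (2 * params)
    return min(memory_limit, compute_limit)


def max_bits_per_param(bandwidth_gb_s: float, params: float,
                       tokens_per_s: float) -> float:
    """Largest average weight size (bits) that still allows tokens_per_s."""
    bytes_per_token = bandwidth_gb_s * 1e9 / tokens_per_s
    return bytes_per_token * 8 / params


# Hypothetical 7B-parameter model; the model behind the 30 tokens/s claim is unnamed.
print(round(decode_bound_tokens_per_s(100, 0.5, 7e9, 4), 1))  # 4-bit weights -> 28.6
print(round(max_bits_per_param(100, 7e9, 30), 2))             # -> 3.81
```

With these assumptions, a 7B model with plain 4-bit weights would top out around 28.6 tokens/s at 100GB/s, while the 0.5 TOPS compute limit (about 35.7 tokens/s here) is not the binding constraint. Reaching 30 tokens/s would need an average of roughly 3.81 bits per weight or less, which is where weight compression could plausibly help.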
The company also uses compression to reduce memory and bandwidth pressure. It claims its weight compression outperforms llama.cpp by 9 to 17.8 percent, and that its KV cache compression keeps perplexity loss low.
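KV cache compression matters because the cache grows with context length and batch size. As an illustration, the sketch below sizes the KV cache for the published Llama 2 7B configuration (32 layers, 32 attention heads with a head dimension of 128, and a 4,096-token context). The 8-bit case is a hypothetical compression level for comparison only; Skymizer has not said what format its KV cache compression uses.

```python
# KV cache size for a standard multi-head attention transformer.
# Factor 2: each layer stores one key and one value vector per head per token.


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   batch: int = 1, bytes_per_elem: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Llama 2 7B shape: 32 layers, 32 KV heads (no grouped-query attention), head_dim 128.
fp16 = kv_cache_bytes(32, 32, 128, 4096)                    # 16-bit entries
int8 = kv_cache_bytes(32, 32, 128, 4096, bytes_per_elem=1)  # hypothetical 8-bit entries
print(fp16 / 2**30, int8 / 2**30)  # GiB -> 2.0 1.0
```

For a single 4,096-token sequence this is 2 GiB at 16 bits per entry, and it scales linearly with batch size and context length, which is why shrinking it eases both memory capacity and bandwidth pressure.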
The main appeal is clear. Many businesses want to run AI locally for privacy, predictable latency, and control over their data. A single PCIe card that can handle very large models at 240W would be much easier to deploy than a rack full of high-power GPUs.
There is still reason to be cautious. These are company claims, and the HTX301 needs independent testing before it can be judged against established GPU-based systems. It also appears focused on inference, not full-scale model training.
Skymizer plans to show the HTX301 at Computex. Until independent benchmarks arrive, it is best viewed as a promising on-premises AI accelerator that could matter if its 700B-model, memory, and power claims hold up in practice.