The story behind Kernelize

AI hardware cannot succeed on silicon alone. Every new model creates new kernel, compiler, and runtime work, and that work still depends on a scarce pool of experts.

At AMD, we saw how Triton changed that model. Instead of treating every new workload as a closed, chip-specific engineering project, Triton created a shared kernel layer where work could compound across developers, models, and hardware targets.

Kernelize was founded to bring that advantage to every accelerator. We help hardware teams connect to the Triton ecosystem, reuse shared kernel infrastructure, and move faster from model support to optimized inference.

Our goal is to turn deep kernel and compiler expertise into reusable infrastructure, so high-performance inference can run on any chip.

Our role in the ecosystem: the kernel bottleneck

Kernelize is an open-core company working in the compiler stack that connects AI models to hardware. We build on Triton, PyTorch, and vLLM, and contribute to the open-source infrastructure that makes new accelerator backends easier to support without forking the ecosystem.

Triton gives the AI hardware industry a common kernel layer. It lets hardware teams and developers share infrastructure, reuse optimization knowledge, and bring new accelerators into the stack through familiar tools instead of isolated, chip-specific workflows.
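
For readers who have not seen Triton code, here is a minimal sketch of what that shared layer looks like: the vector-add kernel from the Triton tutorials, written once in Python and launched from PyTorch. It is an illustrative example, not Kernelize code.

```python
# The canonical vector-add kernel from the Triton tutorials (illustrative).
# The same source compiles for whichever Triton backend is installed.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # one program instance per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)         # enough blocks to cover the tensor
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
    return out
```

Because the kernel is written against Triton's language rather than a vendor instruction set, a new accelerator only needs a Triton backend to run it. That connection point is what Kernelize builds on.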

That open foundation is what makes Kernelize possible. With common connection points in Triton and PyTorch, we can build tools above the kernel layer that analyze models, identify performance bottlenecks, and guide optimization across many accelerator architectures.

The team behind the portable kernel layer

Our team brings compiler, kernel, and runtime experience from AMD, Intel, and NVIDIA, with deep work in Triton and accelerator backend development.

Simon Waters

Co-Founder & CTO

Simon leads Kernelize’s compiler and technical strategy. He has deep experience building optimizing compiler products, including Triton backend work at AMD, and founded Kernelize to make high-performance AI inference portable across all accelerator architectures.

Bryan Bowyer

Co-Founder & Head of Product

Bryan leads Kernelize’s product and go-to-market strategy. He has deep experience bringing AI hardware products to market, including kernel and compiler work across multiple generations of AMD NPU hardware.

Alex Baden

Compiler Engineer

Alex builds compiler infrastructure for Kernelize’s AI accelerator backends. He previously worked on Triton at Intel and brings deep experience in code generation, parallel computing, and making high-performance kernels portable across hardware architectures.

Andrew Brown

Compiler Engineer

Andrew builds open-source compiler infrastructure for Kernelize. His work on Triton Extensions helps make it practical for new AI accelerators to connect into the Triton ecosystem without maintaining long-lived compiler forks.

Matt Leon

Runtime Engineer

Matt builds the infrastructure that lets Kernelize optimize production workloads across heterogeneous hardware, drawing on his experience at Intel in networking, virtualization, and scalable systems.

Rajan Walia

Compiler Engineer

Rajan builds compiler infrastructure for Kernelize’s AI accelerator backends. He previously worked as a senior compiler engineer at NVIDIA, where he developed deep experience in GPU compiler systems, code generation, and high-performance computing software.

Getting Started

Building AI hardware or deploying inference at scale?

Kernelize helps teams bring new models to any hardware and optimize the kernels that determine real-world performance.
