About Us
We are an early-stage deep tech startup building infrastructure software at the intersection of cloud computing and AI performance optimization. Our founding team comes from industry-leading organizations including Meta and Nvidia-acquired Grok. We are backed by experienced founders and are working on a technically hard problem in AI inference and GPU workload optimization.
We do not share full product details publicly at this stage, but candidates who advance to interviews will get the complete picture.
The Role
We are looking for a PhD candidate or doctoral researcher with strong hands-on experience in LLM deployment, GPU computing, and inference optimization. This is not a research-only role: we want someone who can move between theory and implementation, own their work end to end, and operate comfortably without established playbooks.
You will work directly with senior engineers and founders on real infrastructure problems.
What You Will Do
- Deploy and benchmark LLM inference workloads across GPU-accelerated cloud environments
- Work with frameworks including vLLM, LangChain, and LangGraph to build and evaluate AI pipelines
- Contribute to telemetry instrumentation and observability tooling for production AI systems
- Optimize inference performance (latency, throughput, GPU utilization) and surface actionable insights
- Assist with cloud deployment automation on AWS, GCP, or Azure
- Engage with cutting-edge research and apply findings directly to our platform
What We Are Looking For
- Currently enrolled in or recently completed a PhD program in Computer Science, Computer Engineering, Electrical Engineering, or a related field
- Deep understanding of LLM inference, GPU computing, and CUDA programming
- Hands-on experience with vLLM or similar GPU inference frameworks, or with CUDA directly
- Strong familiarity with cloud infrastructure (AWS, GCP, or Azure)
- Experience deploying and benchmarking models in production or research environments
- Comfort working independently in fast-moving, early-stage environments
- MCP (Model Context Protocol) experience is a plus
- Publications or research experience in ML systems, distributed computing, or AI infrastructure is a plus
Why This Role
- Work directly with founders who have built and scaled AI infrastructure at the highest level in the industry
- Real problems: your work ships and has impact
- Flexible hours and fully remote
- Path to full-time for the right candidate
- The opportunity to help build something from the ground up at an inflection point in AI infrastructure