Relativity Space

Staff GPU Systems Engineer, Space Computing

DevOps / Infra Long Beach (CA) Today

Apply for this role

Listed via Greenhouse · Redirects to Relativity Space's careers page

Job Description

At Relativity Space, we’re building rockets to serve today’s needs and tomorrow’s breakthroughs. Our Terran R vehicle will deliver customer payloads to orbit, meeting the growing demand for launch capacity. But that’s just the start. Achieving commercial success with Terran R will unlock new opportunities to advance science, exploration, and innovation, pioneering progress that reaches beyond the known.

Joining Relativity means becoming part of something where autonomy, ownership, and impact exist at every level. Here, you're not just executing tasks; you're solving problems that haven’t been solved before, helping develop a rocket, a factory, and a business from the ground up. Whether you’re in propulsion, manufacturing, software, avionics, or a corporate function, you’ll collaborate across teams, shape decisions, and see your work come to life in record time. Relativity is a place where creativity and technical rigor go hand in hand, and your voice will help define the stories we’re writing together. Now is a unique moment in time where it’s early enough to leave your mark on the product, the process, and the culture, but far enough along that Terran R is tangible and picking up momentum. The most meaningful work of your career is waiting. Join us.

About the Team:

The Interplanetary Sciences Program was established to expand access to scientific exploration across our solar system. Its mission is to make planetary research faster, more affordable, and more capable than ever before by rethinking how science missions are designed, built, and operated. The program aims to enable scientists to send instruments to distant worlds without decades of development or prohibitive costs. By creating a sustainable model for interplanetary exploration, we are transforming space science from an occasional event into a continuous process of discovery that accelerates knowledge, broadens participation, and inspires the next generation of explorers.

About the Role:

Own the GPU compute environment for a space-based data center — setup, driver integration, container runtime, job scheduling, and performance optimization — building the platform that enables onboard AI/ML inference and SAR reprocessing millions of miles from the nearest sysadmin
Profile and optimize compute performance across the full stack: GPU utilization, memory bandwidth, I/O throughput, and storage interface performance, squeezing maximum science return from constrained power and thermal budgets that shift between sunlit burst processing and eclipse idle periods
Build power and thermal-aware compute scheduling that orchestrates batch workloads around orbital constraints, coordinating with the storage platform to sustain 10 Gbps data movement between NAS and compute nodes during processing windows
Develop compute health monitoring and upset recovery mechanisms — checkpoint/restart strategies, GPU fault detection, and automated recovery — so a radiation-induced upset means a restarted job, not a lost processing window
Integrate GPU drivers with the payload Linux image in coordination with the Platform RE, manage the container runtime for compute workloads, and ensure the platform reliably runs ML frameworks and SAR processing pipelines maintained by the broader operations team

About You:

BS/MS in Computer Science or Electrical Engineering and 5+ years of relevant experience
Hands-on experience with GPU programming and compute frameworks — CUDA, ROCm, or OpenCL — with real performance profiling and optimization work, not just running tutorials
Strong Linux systems administration and performance tuning skills: you've diagnosed I/O bottlenecks, tuned memory management, and understood why a workload isn't hitting expected throughput
Experience with container technologies (Docker, Podman, or lightweight alternatives) and HPC job scheduling concepts
Working proficiency in Python for tooling, scripting, and ML framework integration, with C/C++ skills for performance-critical system components

Nice to haves but not required:

Experience with HPC cluster administration, ML infrastructure, or cloud GPU compute platforms at scale
Deep familiarity with ML framework runtime requirements — PyTorch or TensorFlow deployment, model serving, and inference optimization
Knowledge of GPU compute architectures at the hardware level: CUDA cores, compute units, memory hierarchies, and how they affect real workload performance
Experience with high-throughput data movement and storage I/O optimization — NFS tuning, buffer management, and sustaining multi-gigabit throughput
Background in power-managed computing: duty cycling, thermal throttling, and workload scheduling under variable power constraints
Experience designing checkpoint/restart or fault-tolerant batch processing systems — space experience not required, similar problems exist in large-scale distributed infrastructure and autonomous systems

At Relativity Space, we are committed to transparency and fairness in our compensation practices. Actual compensation will be determined based on experience, qualifications, and other job-related factors.

Compensation is only one part of our total rewards package. Relativity Space offers competitive salary and equity, a generous PTO and sick leave policy, parental leave, an annual learning and development stipend, and more! To see some of the benefits & perks we offer, please visit here.

Hiring Range:$181,000—$248,500 USD

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

If you need a reasonable accommodation, please contact us at accommodations@relativityspace.com.