SolarWinds

Senior Site Reliability Engineer

DevOps / Infra · Bangalore Office · Today

Job Description

At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve—including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.

The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us!

Role Overview

We are seeking a Senior Site Reliability Engineer to join our Platform / Site Reliability Engineering team, focusing on the SolarWinds Observability (SWO) platform and related observability products (including AppOptics and Papertrail). You will work across AWS and GCP environments and Kubernetes-based infrastructure, applying GitOps principles to build and operate reliable, scalable, and secure SaaS services.

The ideal candidate has a strong understanding of SRE practices and a proven track record of implementing high-quality reliability capabilities such as SLAs/SLOs, proactive alert management, incident response, and postmortems in production environments.

What You’ll Do (Your Impact)

Own the reliability, performance, and scalability of the SWO platform and shared observability services running in AWS and GCP, used by customers globally.
Work collaboratively with software engineering teams to define and evolve infrastructure, deployment, and operability requirements for SWO platform services.
Contribute actively to automation and observability initiatives, driving best practices in metrics, logging, tracing, and alerting across the platform.
Learn, develop, and maintain operational tooling for deployment, monitoring, and analysis of cloud infrastructure and systems in AWS and GCP.
Participate in and help lead 24/7 on-call rotations, owning the response to production incidents, coordinating resolution, and ensuring high-quality communications to stakeholders.
Create and maintain on-call documentation and incident response playbooks that enable fast, consistent, and high-quality responses to incidents.
Establish and drive SLOs and error budgets for key SWO services, using them to guide reliability investments and operational excellence.
Embrace and help spread best practices for CI/CD and code review, including GitHub pull requests and GitOps workflows, to maximize development and release velocity while maintaining reliability.
Champion a culture of continuous learning and improvement, using postmortems and data to prevent the recurrence of issues and to systematically improve our systems and processes.

What We’re Looking For (Ideal Attributes)

Strong customer orientation and empathy for the impact that reliability and performance have on users.
Excellent interpersonal and organizational skills, with the ability to collaborate effectively with engineers, product managers, and non-technical stakeholders.
High attention to detail and a strong focus on quality, especially in changes affecting production systems.
Ability to act decisively and stay calm under pressure, particularly during high-severity incidents.
A collaborative problem solver with a strong bias for ownership and action—you take responsibility for problems end-to-end and drive them to resolution.

Your Experience & Skills

7+ years of experience designing, building, and maintaining SaaS environments, ideally multi-tenant customer-facing platforms.
5+ years of hands-on experience designing, building, and maintaining AWS and/or GCP infrastructure with Terraform (or similar IaC tools).
Strong experience building and running Kubernetes clusters (managed or self-hosted) in production, including upgrades, scaling, and troubleshooting.
Solid background in observability—monitoring, logging, tracing, and metrics—and using these signals to drive reliability improvements.
Practical experience with GitOps CI/CD processes and tooling (e.g., Argo CD, Flux, or similar), including safe rollouts and rollbacks.
Strong scripting or programming skills in Python, Go (Golang), Bash, or PowerShell, as well as familiarity with AWS CLI tools.
Experience with security operations for cloud infrastructure: security policies, identity and access management, key management, and setting up encryption at rest and in transit.

SolarWinds is an Equal Employment Opportunity Employer. SolarWinds will consider all qualified applicants for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, marital status, disability, veteran status or any other characteristic protected by law.

All applications are treated in accordance with the SolarWinds Privacy Notice: https://www.solarwinds.com/applicant-privacy-notice