Site Reliability Engineer (SRE)

Remote friendly (United States of America)

Job Description

Senior Site Reliability Engineer


As a Site Reliability Engineer (SRE), you will play a key role in the availability, reliability, and resiliency of Litera’s SaaS applications and technology services. You will participate in designing, building, enhancing, and supporting next-generation applications and services while safeguarding vital systems. You will follow and uphold industry best practices for configuration management, change management, incident management, application monitoring, resource scaling, high availability, and disaster recovery.

What You’ll Be Doing as a Part of Our Team

• Serve as a subject matter expert for SaaS-hosted applications and underlying architecture.
• Tackle unique challenges across the organization.
• Address escalations and provide ongoing support for customer-facing applications.
• Participate in a 24x7 SRE team on-call rotation.
• Perform root cause analysis for major outages and incidents.
• Automate processes to reduce toil.
• Build and maintain product availability and performance dashboards.
• Recommend and implement solutions to continuously improve processes and productivity.
• Maintain security compliance and configuration standards.
• Organize and lead cross-functional teams to resolve critical production issues.
• Manage organizational disaster recovery events and processes.
• Build and maintain SRE runbooks/playbooks.
• Be a technical leader and mentor to the team.
• Be dependable and present for the team.

What You Should Have to Qualify

• Proven work experience as a Site Reliability Engineer or similar role.
• Solid grasp of software engineering best practices and agile methodologies.
• Understanding of how services scale, fail, and recover.
• Capacity to recommend architectural changes and enlist others to implement your designs.
• Strong problem-solving skills and reverse engineering discipline.
• Ability to simplify and explain complex issues.
• Able to learn new technologies and concepts quickly and unafraid of ambiguity.
• Comfortable working in a fast-paced environment.
• Recognize the importance of urgency.
• Able to maintain professional composure under pressure.
• Solid understanding of ITIL and problem management processes.
• 3+ years working with container orchestration technologies at scale.
• 3+ years working with infrastructure-as-code and configuration management tools (Terraform, Puppet, Ansible).
• Advanced knowledge of enterprise monitoring and alerting tools such as New Relic, Prometheus, Datadog, or Dynatrace.
• 3+ years debugging applications and performing defect management.
• 3+ years working with cloud providers (AWS, Azure).
• Comfortable navigating and troubleshooting server operating systems (Windows, Linux), databases, and web servers (IIS, Apache).
• Solid experience working with databases (writing queries and performance tuning).
• Experience designing, analyzing, and troubleshooting large distributed systems.

Ideally, You Would Also Have These

• Experience working on SaaS teams.
• Experience working in highly regulated environments (SOX, HIPAA, PCI).
• AWS, Azure, or other relevant industry certifications.
• Software development experience.
• Background in application security and compliance.

California, Colorado, Connecticut, Illinois, Maryland, Nevada, New Jersey, New York, Ohio, Rhode Island, or Washington Residents Only: The salary range for residents of these states is $85,000 to $110,000. Pay is based on several factors, including but not limited to education, work experience, and certifications. In addition to your salary, Litera offers a comprehensive benefits package, incentive and recognition programs, and a 401(k) contribution (all benefits are subject to eligibility requirements).

Litera is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.