placement.solutions

Senior Technology Site Reliability Engineer

Cooley
San Francisco, CA
Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team.

Position summary:
The Senior Technology Site Reliability Engineer (“SRE”) is responsible for ensuring the reliability, scalability, and performance of the firm’s critical infrastructure and applications. The SRE blends software engineering with systems engineering to build and maintain automated, resilient, and observable systems that support high availability and operational excellence. In addition to being technically advanced, the SRE will have a high degree of emotional intelligence and the ability to work as part of a team toward complex and layered objectives. Specific duties and responsibilities include, but are not limited to, the following:

Position responsibilities:
- Monitor and maintain production systems to ensure high availability and performance
- Implement and manage service-level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets
- Participate in on-call rotations and incident response, including root cause analysis and postmortems
- Develop and maintain infrastructure as code (IaC) using Terraform
- Automate deployment, scaling, and recovery processes to reduce manual intervention
- Partner with DevOps to build and maintain CI/CD pipelines to support safe and efficient software delivery
- Implement observability solutions using metrics, logs, traces, and alerting systems (Prometheus, Grafana, Datadog, etc.)
- Proactively identify and resolve system bottlenecks and reliability risks
- Work closely with Infrastructure, DevOps, Development, and Security teams to embed reliability into the development lifecycle
- Contribute to a culture of blameless postmortems and continuous improvement
- Document operational procedures and share knowledge across teams
- All other duties as assigned or required

Skills and experience:

Required:
- After orientation at Cooley LLP, exhibit proficiency in the Microsoft Office suite, iManage, and other firm applications
- Ability to work extended and/or weekend hours, as required
- Ability to travel, as required
- 6+ years of directly applicable experience (e.g., site reliability engineering or a related field)
- Proficiency in Terraform and programming languages such as Python, Go, or Java
- Deep expertise in cloud platforms, particularly AWS, and container orchestration
- Strong background in distributed systems, performance tuning, and automation
- Hands-on experience with configuration management tools such as Puppet, Chef, or Salt

Preferred:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or an associated discipline
- Experience working with advanced ETL data workflows, including technologies such as AWS EMR, Azure Synapse, Azure Data Factory, or Apache Hive/Spark/Airflow
- Experience with IaC deployment of AKS/EKS/GKE architecture
- Experience with enterprise data lake environments using technologies such as Databricks or Snowflake

Competencies: Expert