DevOps Engineer (Resiliency)

monday.com

Software Engineering
Tel Aviv-Yafo, Israel
Posted on Aug 25, 2024

  • R&D
  • Tel Aviv, Israel
  • Full-time

Description

monday.com is looking for a Resilience Engineer to join our newly formed Resilience Engineering team. You will be responsible for developing and maintaining disaster recovery and high availability strategies that ensure the resilience of our infrastructure.

About The Role

  • Develop and maintain comprehensive disaster recovery (DR) plans to ensure rapid recovery from system failures and incidents.
  • Design and implement multi-region architectures to enhance system reliability and ensure high availability across all critical services.
  • Create and manage high availability (HA) strategies to minimize downtime and ensure continuous service delivery.
  • Monitor and plan infrastructure capacity to support resilience initiatives and ensure scalability during failures.
  • Work closely with other engineering teams to integrate resilience solutions into the overall infrastructure.

Our Stack: AWS, Terraform, Kubernetes, Datadog, OpenTelemetry

Requirements

  • Strong experience in disaster recovery planning and high availability architecture.
  • Proven ability to design and implement multi-region architectures.
  • Familiarity with cloud services and infrastructure as code (IaC) practices.
  • Strong collaboration and communication skills.
  • Ability to work in a fast-paced, dynamic environment focused on resilience and reliability.

Our Team

The R&D Team is passionate about building innovative and lovable products while tackling complex engineering problems at great scale. We’re accountable for bringing the company’s vision to life by turning our progress into flawless execution, and we encourage full ownership and independence in all projects. The Infra role is a crucial piece as our company scales and our user base grows, conquering all aspects of product and infrastructure challenges. We are focused on development flow productivity, building application infrastructure, and production resilience. We face huge challenges related to the hypergrowth of our engineering organization, our application, and our data scale.