DevOps - Senior Site Reliability Engineer

About Us

SignifAI is a VC-backed start-up in stealth mode, with offices in Sunnyvale and Tel-Aviv. Our mission is to increase system availability through machine intelligence. Our technology helps technical operations teams solve problems faster, more effectively, and more accurately than ever before, resulting in increased system uptime and team effectiveness. SignifAI has recently released a version of its product, with an initial focus on automated root cause analysis, preventive analysis, and augmented remediation. Our team consists of technologists who have spent the past several years managing large distributed infrastructure handling over 1 billion events every day. We have felt the pain, been on hand to resolve downtime and scaling issues, and suffered from the lack of a clear, simple answer to what's important. Now we have decided to solve it by integrating machine intelligence to increase uptime, strengthen systems, and make teams' lives easier.

Who We Are Looking For

We are looking for a highly energetic, passionate site reliability (DevOps) engineer with a startup mentality to manage our entire production platform while helping our customers identify issues in their implementations with our product.
If you are not afraid of highly distributed scalable infrastructure, this position is for you.

The Role 

This is a unique role that involves site reliability engineering work combined with customer success responsibilities.

You will ensure a highly scalable, reliable, and secure service while managing release processes, workflows, and live deployments. You will help architect scalable deployments for complex production serving systems in AWS and Google Cloud, and you will automate everything, using Ansible and other tools to manage configuration and deployments. You will continuously improve application and system monitoring, log analysis, and metrics, and improve operational tools in order to detect and rapidly respond to incidents and issues while using SignifAI's own product.

You will contribute anywhere else you can to move the company's engineering forward. You will also work behind the scenes to support our customers' large production environments. This is a huge opportunity to grow your skills fast with cutting-edge technologies, open source libraries, and large-scale challenges.

   
Your Qualifications

  • Ability to identify the root causes of instability in a high-traffic, large-scale distributed system
  • Experience with configuration and troubleshooting of Linux, Java, Tomcat, and other middleware technologies
  • Understanding of large-scale complex systems from a reliability perspective
  • Scripting abilities in Python, Go, or JVM-based languages
  • Passion for resolving reliability issues and identifying strategies to mitigate them going forward
  • Experience in both software engineering and cloud-based ops, having previously worked in a DevOps, Site Reliability Engineer, or Operations Engineer role
  • Passion for automation, performance, visibility, and using the best tools possible

Pluses:

  • Computer Science degree from a strong program or professional experience (8200 or another technological unit)
  • Experience with AWS/Google Cloud, including working with and deploying into Kubernetes clusters
  • Experience working on Agile projects in distributed teams