Site Reliability Engineer

Company: mistral.ai
Location: Paris
Closing Date: 02/08/2024
Salary: £80 - £100 Per Annum
Type: Temporary
Job Requirements / Description
Mistral AI is looking for an SRE to shape the reliability, scalability, and performance of our platform and customer-facing applications. You will work closely with our software engineers to ensure our systems meet and exceed our customers' expectations.

Responsibilities
- Make sure our inference and platform resources are always available and in good shape
- Ensure our products are reliable and meet their SLAs
- Design, build, and maintain scalable, highly available, and fault-tolerant standard and AI infrastructure to support our machine learning workloads and services
- Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime
- Develop and maintain comprehensive documentation for infrastructure designs, processes, and best practices
- Participate in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences
- Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform, …
- Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements
- Evaluate and implement new tools, technologies, and processes to enhance our AI infrastructure's efficiency, reliability, and scalability

About you:
- 3+ years of experience in software engineering
- Key technical skills: observability, alerting, and operational maintenance
- Familiar with bare Kubernetes/Grafana/Prometheus
- Experience building cross-datacenter, highly available distributed systems
- Experience profiling and optimizing stacks to the millisecond
- Good programming skills in one language (Python/Go/C++/Rust)
- Master's degree in Computer Science, Engineering, or a related field, or equivalent experience
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role, ideally in an AI/ML-focused environment
- Strong understanding of AI/ML infrastructure requirements
- Experience with containerization and orchestration technologies like Docker and Kubernetes
- Familiarity with infrastructure-as-code tools such as Terraform
- Solid understanding of cloud computing platforms like AWS, GCP, or Azure
- Experience with monitoring, logging, and alerting tools like Prometheus, Grafana, ELK Stack, …
- Strong problem-solving skills and the ability to work independently and collaboratively in a fast-paced environment
- Excellent communication skills, both written and verbal

What We Offer:
- Ability to shape the exciting journey of AI and be part of the very early days of one of Europe's hottest startups
- A fun, young, multicultural team and collaborative work environment, based in Paris and London
- Competitive salary and bonus structure
- Comprehensive benefits package
- Opportunities for professional growth and development

About Mistral AI
Mistral AI is a European company that trains large generative models and provides them to industry. It releases its technology in a fully transparent way, and a significant part of its IP is shared as permissively licensed open-source software. Mistral AI intends to be a technical leader in the open-source generative AI community.

We're a small team, mostly composed of seasoned researchers and engineers in the field of AI. We like to work hard and to be at the edge of science. We are creative, low-ego, team-spirited, and have all been passionate about AI for years. We hire people who thrive in competitive environments because they find them more fun to work in. We hire passionate women and men from all over the world.