The main responsibilities of the position include:
- Assist in designing, implementing, and maintaining scalable MLOps pipelines on AWS using services such as SageMaker, EC2, EKS, S3, Lambda, and other relevant AWS tools
- Coordinate with our platform team to troubleshoot and maintain Kubernetes (EKS) clusters that orchestrate the deployment of machine learning models and other microservices
- Develop and maintain CI/CD pipelines for model and application deployment, testing, and monitoring
- Collaborate closely with the Data Science and DevOps teams to streamline the model development lifecycle, from experimentation to production deployment
- Implement security best practices, including network security, data encryption, and role-based access controls within the AWS infrastructure
- Monitor, troubleshoot, and optimize data and ML pipelines to ensure high availability and performance
- Set up and manage model monitoring systems for performance drift, ensuring continuous model improvement
Main requirements:
- Bachelor's degree in Computer Science, Engineering, or related field
- 1+ years of hands-on experience in MLOps, DevOps, or related fields
- Knowledge of, and preferably working experience with, AWS services for machine learning, such as SageMaker, EKS, S3, EC2, Lambda, and others
- Exposure to Kubernetes for container orchestration
- Experience with Docker
- Exposure to infrastructure-as-code tools such as Terraform or CloudFormation
- Familiarity with CI/CD tools such as GitLab CI
- Understanding of the machine learning model lifecycle
- Familiarity with monitoring and logging solutions such as Prometheus, Grafana, CloudWatch, and the ELK Stack
- Understanding of networking concepts and cloud security best practices
- Proficiency in Python and Bash, and comfort working in Linux environments
- Strong problem-solving and communication skills
The following will be considered an advantage:
- Experience working with serverless architectures and event-driven processing on AWS
- Familiarity with advanced Kubernetes concepts such as Helm
- Experience with Data Engineering pipelines, ETL processes, or big data platforms
- Experience with ML frameworks such as TensorFlow, PyTorch, and Keras
- Experience with ML platforms like Kubeflow and/or SageMaker
- Experience with workflow engines like Argo Workflows and/or Airflow
