Key Responsibilities:
- Design, implement, and maintain scalable ML infrastructure.
- Collaborate with data scientists to deploy and monitor machine learning models.
- Automate data processing workflows and ensure data quality.
- Optimize and manage cloud resources for cost-effective operations.
- Develop and maintain CI/CD pipelines for ML models.
- Troubleshoot and resolve issues related to ML infrastructure and deployments.
- Work closely with cross-functional teams to understand requirements and deliver solutions.
Required Skills and Experience:
- Minimum 3 years of experience in infrastructure engineering.
- Hands-on experience with at least one cloud platform (AWS, Azure, or Google Cloud Platform).
- Strong proficiency in Python scripting, plus working knowledge of at least one other programming language.
- Experience with CI/CD tools and practices.
- Solid understanding of the machine learning lifecycle and best practices.
- Strong problem-solving skills and attention to detail.
- Excellent communication skills and the ability to work collaboratively in a team environment.
- Demonstrated ability to take ownership and drive projects to completion.
Beneficial Skills and Experience:
- Proficiency with Amazon EMR (Elastic MapReduce) for large-scale data processing.
- Experience with containerization and orchestration tools (Docker, Kubernetes).
- Familiarity with data visualization tools and techniques.
- Knowledge of big data technologies (Spark, Hadoop).
- Experience with version control systems (Git).
- Understanding of data governance and security best practices.
- Experience with monitoring and logging tools (Prometheus, Grafana).
- Stakeholder management skills and the ability to communicate technical concepts to non-technical audiences.
