Key Responsibilities:
- Lead the solution architecture, design, implementation, and maintenance of scalable ML infrastructure.
- Collaborate with senior stakeholders and with offshore and onshore teams to align ML Ops initiatives with business objectives, present ideas, and plan delivery.
- Oversee the deployment, monitoring, and optimization of machine learning models.
- Automate complex data processing workflows and ensure data quality.
- Optimize and manage cloud resources for cost-effective operations.
- Develop and maintain robust CI/CD pipelines for ML models.
- Troubleshoot and resolve advanced issues related to ML infrastructure and deployments.
- Mentor and guide team members, fostering a culture of continuous learning and innovation.
- Drive best practices and standards for ML Ops within the organization.
- Manage and prioritize multiple projects and initiatives in a fast-paced environment.
Required Skills and Experience:
- Minimum 7 years of experience in data processing and data engineering.
- Proficiency in using EMR (Elastic MapReduce) for large-scale data processing.
- Extensive experience with SageMaker, ECR, S3, Lambda functions, and other AWS cloud capabilities, including the deployment of ML models.
- Strong proficiency in Python scripting, plus experience with other programming languages.
- Experience with CI/CD tools and practices.
- Solid understanding of the machine learning lifecycle and best practices.
- Strong problem-solving skills and attention to detail.
- Excellent communication skills and ability to work collaboratively in a team environment.
- Demonstrated ability to take ownership and drive projects to completion.
- Proven experience in leading and mentoring teams.
- Strong stakeholder management skills and ability to communicate technical concepts to non-technical audiences.
Beneficial Skills and Experience:
- Experience with containerization and orchestration tools (Docker, Kubernetes).
- Familiarity with data visualization tools and techniques.
- Knowledge of big data technologies (Spark, Hadoop).
- Experience with version control systems (Git).
- Understanding of data governance and security best practices.
- Experience with monitoring and logging tools (Prometheus, Grafana).
