About The Job
As a Machine Learning Data Engineer at CAG, you will be responsible for designing, implementing, and maintaining the data pipelines and infrastructure that support our machine learning projects. You will work closely with data scientists, machine learning engineers, cloud engineers, and other cross-functional teams to ensure the availability, reliability, and performance of our data systems. Your role will be critical in enabling the development and deployment of advanced machine learning models that drive key business insights and innovations.
This role requires a blend of technical expertise, project management skills, and the ability to deliver robust, scalable data solutions in a fast-paced environment.
Responsibilities:
- Architect and implement scalable data solutions to address complex business challenges, leveraging advanced analytics, statistical methods, and machine learning techniques. Apply advanced data preprocessing, transformation, and enrichment techniques to ensure high-quality inputs for machine learning models.
- Partner with data scientists and ML engineers to translate data requirements into actionable insights, optimizing feature engineering processes and model deployment strategies.
- Construct and manage modern data infrastructure, including data warehouses and data lakes, to facilitate seamless data access for analysis and model training.
- Continuously optimize data pipelines for performance, scalability, and cost-effectiveness, considering factors such as data volume, processing speed, and resource utilization.
- Collaborate closely with DevOps and IT teams to ensure smooth deployment, monitoring, and maintenance of data pipelines in production environments.
- Work cross-functionally to ensure adherence to data governance, security, and privacy regulations.
- Stay at the forefront of data engineering and machine learning advancements, driving the adoption of best practices within the team.
- Mentor junior team members and contribute to the overall data strategy of the organization.
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field, with at least 5 years of work experience.
- Extensive hands-on experience with cloud platforms, particularly AWS and GCP, including their data services and analytics offerings.
- Strong coding skills with proficiency in:
  - Infrastructure as Code (e.g., Terraform, CloudFormation)
  - Shell scripting
  - Python
  - SQL
- Deep understanding of big data technologies, distributed computing, and modern data architecture patterns. Proven track record in designing and implementing large-scale data solutions, including data pipelines, data warehouses, and data lakes.
- Demonstrated ability to successfully deliver projects, meet milestones, and drive initiatives from conception to production.
- Experience with data streaming technologies (e.g., Kafka, Kinesis) and batch processing frameworks (e.g., Spark, Hadoop).
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Strong problem-solving skills and ability to translate complex business requirements into technical solutions.
- Excellent communication skills with the ability to collaborate effectively across cross-functional teams.