Job Description:
AI/ML Cloud Infrastructure Engineer
We are seeking an AI Platform Administrator with 5+ years of experience to manage, optimize, and secure AI/ML platforms across Azure and AWS cloud environments. This role involves configuring and maintaining AI services, ensuring optimal performance, managing costs, and supporting AI/ML teams in deploying, scaling, and monitoring workloads. The ideal candidate will have a strong understanding of cloud-native AI tools, infrastructure-as-code (IaC), and DevOps practices.
Responsibilities:
AI/ML Platform Deployment & Management
- Deploy, configure, and maintain AI/ML platforms such as Azure Machine Learning, Azure Cognitive Services, Amazon SageMaker, Rekognition, and Comprehend.
- Ensure seamless integration of AI/ML services with cloud infrastructure.
Cloud Infrastructure & Compute Resource Management
- Manage compute resources (e.g., EC2, EKS, Azure VMs, AKS) to support AI/ML workloads while ensuring cost-efficiency and scalability.
- Optimize containerized AI workloads using Kubernetes, Docker, and serverless architectures.
Storage & Data Management
- Oversee storage solutions for AI applications using AWS S3, Azure Blob Storage, and Data Lake architectures.
- Implement efficient data pipelines for AI/ML workloads to enhance data accessibility and performance.
AI/ML Deployment & Support
- Provide technical support to data scientists, ML engineers, and developers in deploying, monitoring, and troubleshooting AI/ML workloads.
- Streamline MLOps processes for continuous integration and deployment (CI/CD) of AI models.
Security, Compliance & Collaboration
- Work closely with DevOps and security teams to align AI/ML infrastructure with best practices, compliance, and security standards.
- Implement IAM policies, data encryption, and secure access controls for AI workloads.
Performance Monitoring & Optimization
- Continuously monitor AI workloads using Amazon CloudWatch, Azure Monitor, and other observability tools to optimize for cost, latency, and throughput.
- Implement auto-scaling mechanisms to handle dynamic AI/ML workloads efficiently.
Preferred Qualifications:
✔ Experience in deploying AI/ML workloads on Azure and AWS.
✔ Strong knowledge of cloud computing, Kubernetes, and container orchestration.
✔ Hands-on expertise in MLOps, automation, and CI/CD for AI applications.
✔ Familiarity with infrastructure-as-code (IaC) tools such as Terraform, CloudFormation, or ARM templates.
✔ Experience with AI model lifecycle management, monitoring, and governance.