Course Outline

Foundations of Cloud Operations on AWS

  • Operational roles and responsibilities within cloud environments.
  • AWS account structure, Organizations, and multi-account strategies.
  • Core operational services: CloudWatch, CloudTrail, and AWS Config.

Infrastructure as Code and Provisioning

  • Principles of IaC and immutable infrastructure.
  • Provisioning infrastructure with Terraform and AWS CloudFormation.
  • Managing state, modules, and environment promotion processes.

CI/CD and Deployment Strategies

  • Designing CI/CD pipelines for cloud-native applications.
  • Implementing blue/green, canary, and rolling deployment strategies.
  • Automating rollback mechanisms, health checks, and release validation.

Monitoring, Observability, and Alerting

  • Managing metrics, logs, and traces: shipping, storing, and analyzing data.
  • Utilizing CloudWatch, X-Ray, and third-party observability tools.
  • Defining SLOs/SLIs, establishing alerting policies, and setting up on-call procedures.

Security Operations and Identity Management

  • IAM best practices, least-privilege access, and cross-account access management.
  • Secrets management, KMS, and secure parameter stores.
  • Operational security: patching strategies, vulnerability scanning, and audit trails.

Resilience, Backup, and Disaster Recovery

  • Designing for fault tolerance and high availability.
  • Backup strategies, snapshot automation, and restoration procedures.
  • Disaster recovery planning and runbook development.

Cost Optimization and Governance

  • Cost visibility: billing, tagging, and cost allocation strategies.
  • Rightsizing, Reserved Instances and Savings Plans, and budgeting controls.
  • Governance: policies, guardrails, and automation for compliance.

Containers, Serverless, and Runtime Operations

  • Operational considerations for ECS, EKS, and Lambda.
  • Service discovery, autoscaling, and resource limit management.
  • Logging, tracing, and debugging containerized workloads.

Incident Response, Playbooks, and Chaos Engineering

  • Runbook-driven incident response and postmortem practices.
  • Automating remediation and self-healing patterns.
  • Introduction to chaos experiments for validating resilience.

Hands-on Workshop: Operate a Sample Workload

  • Deploying a sample application using IaC and a CI/CD pipeline.
  • Implementing monitoring, alerts, and automated remediation scripts.
  • Simulating incidents and practicing runbook-based responses.

Summary and Next Steps

Requirements

  • Foundational knowledge of cloud concepts and networking.
  • Familiarity with the Linux command line and scripting.
  • Experience with version control systems (Git) and basic CI/CD principles.

Target Audience

  • Cloud operations engineers.
  • Site Reliability Engineers (SREs) and platform engineers.
  • DevOps engineers and technical team leads.

Duration

  • 21 hours
