Course Outline
Introduction to AIOps with Open Source Tools
- Overview of AIOps concepts and associated benefits.
- The role of Prometheus and Grafana within the observability stack.
- How ML fits into AIOps: predictive versus reactive analytics.
Setting Up Prometheus and Grafana
- Installing and configuring Prometheus for time series data collection.
- Building Grafana dashboards from real-time metrics.
- Exploring exporters, relabeling techniques, and service discovery mechanisms.
Data Preprocessing for ML
- Extracting and transforming Prometheus metrics.
- Preparing datasets specifically for anomaly detection and forecasting tasks.
- Utilizing Grafana’s transformations or Python-based pipelines.
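As an illustration of the extraction step, here is a minimal sketch of a Python-based pipeline stage that flattens a Prometheus range-query response into rows suitable for model training. It assumes the JSON layout returned by the Prometheus HTTP API for matrix results; the function name `matrix_to_rows`, the metric name, and the sample values are hypothetical and not part of the course material.

```python
def matrix_to_rows(response):
    """Flatten a Prometheus range-query (matrix) response into one row per sample."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    data = response["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a matrix result")
    rows = []
    for series in data["result"]:
        labels = series["metric"]
        # Each sample is a [unix_timestamp, "value-as-string"] pair.
        for ts, raw in series["values"]:
            rows.append({"labels": labels, "timestamp": float(ts), "value": float(raw)})
    return rows


# Hypothetical sample payload, shaped like a range-query response.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "cpu_usage_ratio", "instance": "web-1"},
                "values": [[1700000000, "0.25"], [1700000060, "0.5"]],
            },
            {
                "metric": {"__name__": "cpu_usage_ratio", "instance": "web-2"},
                "values": [[1700000000, "0.75"]],
            },
        ],
    },
}

rows = matrix_to_rows(sample)
for row in rows:
    print(row["labels"]["instance"], row["timestamp"], row["value"])
```

Sample values arrive as strings, so the conversion to `float` is the one transformation almost any downstream model will need.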
Applying Machine Learning for Anomaly Detection
- Employing basic ML models for outlier detection (such as Isolation Forest and One-Class SVM).
- Training and evaluating models on time series data.
- Visualizing detected anomalies within Grafana dashboards.
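To give a feel for outlier detection on a metric series, the sketch below uses a modified z-score based on the median absolute deviation (MAD). This is a deliberately simpler, dependency-free stand-in, not Isolation Forest or One-Class SVM; the function name, the 3.5 threshold, and the sample values are assumptions chosen for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data are roughly normal.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical CPU usage ratios with one obvious spike at index 5.
cpu = [0.50, 0.52, 0.49, 0.51, 0.50, 0.95, 0.48, 0.51]
anomalies = mad_outliers(cpu)
print(anomalies)
```

The flagged indices could be written back as a separate series so that a dashboard overlays them on the original metric.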
Forecasting Metrics with ML
- Developing simple forecasting models (introducing ARIMA, Prophet, and LSTM).
- Predicting system load or resource usage patterns.
- Leveraging predictions to inform early alerting and scaling decisions.
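The idea of using predictions for early alerting can be sketched with a least-squares linear trend, which is a much simpler baseline than ARIMA, Prophet, or LSTM and is used here only to keep the example self-contained. The function names, the disk-usage figures, and the 90 percent threshold are hypothetical.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, steps_ahead):
    """Extrapolate the fitted trend for the next steps_ahead samples."""
    slope, intercept = fit_linear_trend(values)
    last = len(values) - 1
    return [intercept + slope * (last + k) for k in range(1, steps_ahead + 1)]


def first_breach(predictions, threshold):
    """Return the 1-based step at which the forecast first reaches the threshold, or None."""
    for step, value in enumerate(predictions, start=1):
        if value >= threshold:
            return step
    return None


# Hypothetical disk usage in percent, sampled at a fixed interval.
disk = [70.0, 72.0, 74.0, 76.0, 78.0]
predicted = forecast(disk, steps_ahead=10)
print(first_breach(predicted, threshold=90.0))
```

A breach predicted several steps ahead leaves time to alert or scale before the threshold is actually reached, which is the point of predictive rather than reactive analytics.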
Integrating ML with Alerting and Automation
- Defining alert rules based on ML outputs or established thresholds.
- Managing Alertmanager and configuring notification routing.
- Triggering scripts or automation workflows in response to detected anomalies.
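One way to picture the automation step is a small handler that reads an Alertmanager webhook payload and decides which remediation to run. The sketch below assumes the standard webhook JSON layout (a top-level `alerts` list whose entries carry `status` and `labels`); the alert names, the playbook mapping, and the action names are hypothetical, and the function only plans actions rather than executing anything.

```python
def plan_actions(payload, playbook):
    """Map firing alerts in a webhook payload to (action, instance) pairs."""
    actions = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # Resolved alerts need no remediation.
        labels = alert.get("labels", {})
        action = playbook.get(labels.get("alertname"))
        if action is not None:
            actions.append((action, labels.get("instance", "unknown")))
    return actions


# Hypothetical mapping from alert name to a remediation step.
playbook = {
    "CpuAnomalyDetected": "restart_service",
    "DiskForecastBreach": "expand_volume",
}

# Hypothetical payload, trimmed to the fields used above.
payload = {
    "status": "firing",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "CpuAnomalyDetected", "instance": "web-1"}},
        {"status": "resolved", "labels": {"alertname": "DiskForecastBreach", "instance": "db-1"}},
        {"status": "firing", "labels": {"alertname": "UnmappedAlert", "instance": "web-2"}},
    ],
}

planned = plan_actions(payload, playbook)
print(planned)
```

Keeping the decision logic separate from execution makes it easy to test the mapping before wiring it to real scripts.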
Scaling and Operationalizing AIOps
- Integrating external observability tools (e.g., ELK stack, Moogsoft, Dynatrace).
- Operationalizing ML models within observability pipelines.
- Adhering to best practices for implementing AIOps at scale.
Summary and Next Steps
Requirements
- A solid understanding of system monitoring and observability concepts.
- Practical experience with Grafana or Prometheus.
- Familiarity with Python programming and fundamental machine learning principles.
Audience
- Observability engineers.
- Infrastructure and DevOps teams.
- Monitoring platform architects and Site Reliability Engineers (SREs).
14 Hours