Course Outline

Introduction to AIOps with Open Source Tools

  • Overview of AIOps concepts and associated benefits.
  • The role of Prometheus and Grafana within the observability stack.
  • How ML fits into AIOps: predictive versus reactive analytics.

Setting Up Prometheus and Grafana

  • Installing and configuring Prometheus for time series data collection.
  • Building Grafana dashboards from real-time metrics.
  • Exploring exporters, relabeling techniques, and service discovery mechanisms.

Data Preprocessing for ML

  • Extracting and transforming Prometheus metrics.
  • Preparing datasets specifically for anomaly detection and forecasting tasks.
  • Using Grafana transformations or Python-based pipelines.
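
As a rough illustration of the extraction step, the sketch below flattens a Prometheus range-query response (the `matrix` result type returned by `/api/v1/query_range`) into plain rows that a Python pipeline could consume. The sample payload, the metric and instance names, and the helper `matrix_to_rows` are hypothetical and only stand in for whatever a real query would return.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample shaped like a /api/v1/query_range response (matrix result).
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {"__name__": "node_load1", "instance": "host-a:9100"},
        "values": [[1700000000, "0.52"], [1700000060, "0.61"], [1700000120, "NaN"]]
      }
    ]
  }
}
"""


def matrix_to_rows(payload):
    """Flatten a matrix result into (instance, timestamp, value) rows, skipping NaN samples."""
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    rows = []
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "")
        for ts, raw in series["values"]:
            value = float(raw)  # Prometheus encodes sample values as strings
            if value != value:  # NaN is the only float not equal to itself
                continue
            rows.append((instance, datetime.fromtimestamp(ts, tz=timezone.utc), value))
    return rows


rows = matrix_to_rows(json.loads(SAMPLE_RESPONSE))
for instance, when, value in rows:
    print(instance, when.isoformat(), value)
```

Dropping NaN samples here is one possible choice; a forecasting dataset might instead keep the gaps and interpolate them.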

Applying Machine Learning for Anomaly Detection

  • Employing basic ML models for outlier detection (such as Isolation Forest and One-Class SVM).
  • Training and evaluating models on time series data.
  • Visualizing detected anomalies within Grafana dashboards.
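
To give a feel for outlier detection on a metric series, here is a minimal, dependency-free sketch. Note that it does not use Isolation Forest or One-Class SVM; it swaps in a simpler statistical baseline, the modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The function name `mad_outliers` and the CPU samples are made up for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (MAD-based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # series is (nearly) constant; this baseline cannot score it
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical CPU utilisation samples with one obvious spike at index 5.
cpu = [0.31, 0.29, 0.33, 0.30, 0.32, 0.95, 0.31, 0.28, 0.30]
anomalies = mad_outliers(cpu)
print(anomalies)
```

A baseline like this is useful mainly as a reference point when evaluating a trained model on the same data.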

Forecasting Metrics with ML

  • Developing simple forecasting models (an introduction to ARIMA, Prophet, and LSTM).
  • Predicting system load or resource usage patterns.
  • Leveraging predictions to inform early alerting and scaling decisions.

Integrating ML with Alerting and Automation

  • Defining alert rules based on ML outputs or established thresholds.
  • Managing Alertmanager and configuring notification routing.
  • Triggering scripts or automation workflows in response to detected anomalies.

Scaling and Operationalizing AIOps

  • Integrating external observability tools (e.g., ELK stack, Moogsoft, Dynatrace).
  • Operationalizing ML models within observability pipelines.
  • Adhering to best practices for implementing AIOps at scale.

Summary and Next Steps

Requirements

  • A solid understanding of system monitoring and observability concepts.
  • Practical experience with Grafana or Prometheus.
  • Familiarity with Python programming and fundamental machine learning principles.

Audience

  • Observability engineers.
  • Infrastructure and DevOps teams.
  • Monitoring platform architects and Site Reliability Engineers (SREs).

Duration: 14 hours
