
Course Outline

Overview of CANN Optimization Capabilities

  • Understanding how inference performance is managed within CANN.
  • Defining optimization objectives for edge and embedded AI systems.
  • Comprehending AI Core utilization and memory allocation strategies.

Leveraging the Graph Engine for Analysis

  • Introduction to the Graph Engine and its execution pipeline.
  • Visualizing operator graphs and runtime metrics.
  • Modifying computational graphs to improve performance.

Profiling Tools and Performance Metrics

  • Utilizing the CANN Profiling Tool for workload analysis.
  • Evaluating kernel execution time and identifying bottlenecks.
  • Profiling memory access and implementing tiling strategies.

Custom Operator Development with TIK

  • Overview of TIK and its operator programming model.
  • Implementing custom operators using the TIK DSL.
  • Testing and benchmarking operator performance.

Advanced Operator Optimization with TVM

  • Introduction to TVM integration with CANN.
  • Employing auto-tuning strategies for computational graphs.
  • Deciding when to use TVM or TIK, and how to move between them.

Memory Optimization Techniques

  • Managing memory layouts and buffer placement.
  • Reducing on-chip memory consumption.
  • Adopting best practices for asynchronous execution and resource reuse.

Real-World Deployment and Case Studies

  • Case study: Performance tuning for a smart city camera pipeline.
  • Case study: Optimizing the inference stack for autonomous vehicles.
  • Guidelines for iterative profiling and continuous improvement.

Summary and Next Steps

Requirements

  • Comprehensive knowledge of deep learning model architectures and training workflows.
  • Practical experience with model deployment via CANN, TensorFlow, or PyTorch.
  • Proficiency in Linux CLI, shell scripting, and Python programming.

Target Audience

  • AI performance engineers.
  • Specialists in inference optimization.
  • Developers working on edge AI or real-time systems.

Duration: 14 Hours
