Accelerating PyG on NVIDIA GPUs

22 Sep 2022

In collaboration with the engineering team at NVIDIA, we are excited to announce GPU-accelerated PyTorch Geometric using NVIDIA AI.

Developers and researchers can take advantage of NVIDIA AI for significantly faster model training, which unlocks the ability to run Graph Neural Network (GNN) training workflows directly in the PyG framework with no code changes.

GPU Optimizations Powered by NVIDIA AI

GPU acceleration for PyTorch Geometric is enabled using pyg-lib, a low-level GNN library exposing optimized operations for use in PyG.

These optimizations are enabled by the NVIDIA AI suite of libraries, chiefly cuGraph, a RAPIDS library that accelerates graph algorithms operating on GPU-accelerated DataFrames, and CUTLASS, a CUDA library for implementing high-performance matrix multiplication (GEMM).

Many real-world applications involve graphs that exceed the memory capacity of a single GPU, which requires sub-graphs to be sampled via neighbor sampling as part of the training loop. Currently, this sampling is done entirely on the CPU, leaving room for acceleration with a GPU implementation. Later this year, cuGraph will provide CUDA-optimized neighbor-sampling-based data loading for large graphs.
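To make the role of neighbor sampling concrete, here is a minimal CPU sketch of multi-hop neighbor sampling on an adjacency list. The function name, the `fanouts` parameter, and the toy graph are all illustrative, not part of the cuGraph or PyG API; cuGraph's contribution is performing this kind of sampling in CUDA rather than in Python on the CPU.

```python
import random

def sample_neighbors(adj, seeds, fanouts, seed=0):
    """Multi-hop neighbor sampling (illustrative sketch): at each hop,
    sample up to `fanout` neighbors of every node in the current frontier."""
    rng = random.Random(seed)
    sampled_nodes = set(seeds)
    frontier = list(seeds)
    sampled_edges = []
    for fanout in fanouts:  # one fanout per hop, e.g. [10, 5] for 2-hop
        next_frontier = []
        for v in frontier:
            neighbors = adj.get(v, [])
            picks = rng.sample(neighbors, min(fanout, len(neighbors)))
            for u in picks:
                sampled_edges.append((u, v))  # message flows toward the seed
                if u not in sampled_nodes:
                    sampled_nodes.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return sampled_nodes, sampled_edges

# Toy graph: node 0 is the seed; 2-hop sampling with at most 2 neighbors per hop.
adj = {0: [1, 2, 3], 1: [4, 5], 2: [5, 6], 3: [0]}
nodes, edges = sample_neighbors(adj, [0], [2, 2])
```

The training loop then runs message passing only on the sampled sub-graph, which is what keeps memory usage bounded regardless of the full graph's size.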

CUTLASS grouped GEMM enables GPU-accelerated heterogeneous GNNs with a Typed Matrix Multiply. Current heterogeneous GNNs in PyG use layers with a separate linear weight matrix for each edge type or node type, iterating over them in a for-loop. Typed Matrix Multiply parallelizes across node/edge types, removing the for-loop bottleneck.
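The sketch below illustrates the idea with NumPy: the baseline launches one small matrix multiply per node type, while the grouped formulation computes all per-type products in a single batched call. Names, shapes, and the node types are made up for illustration; the real implementation fuses the per-type GEMMs into one CUTLASS grouped-GEMM kernel rather than gathering weights per row as done here.

```python
import numpy as np

rng = np.random.default_rng(0)

# One feature matrix and one weight matrix per node type (illustrative shapes).
x = {"paper": rng.standard_normal((8, 16)),
     "author": rng.standard_normal((5, 16))}
w = {"paper": rng.standard_normal((16, 32)),
     "author": rng.standard_normal((16, 32))}

# Baseline: a Python for-loop over types -> one small GEMM per type,
# each launched separately; small kernels underutilize the GPU.
out_loop = {t: x[t] @ w[t] for t in x}

# Grouped idea: concatenate inputs, stack weights, map each row to its
# type's weight, and compute everything in one batched contraction.
order = list(x)
sizes = [x[t].shape[0] for t in order]
x_cat = np.concatenate([x[t] for t in order])      # (13, 16)
w_stk = np.stack([w[t] for t in order])            # (2, 16, 32)
type_id = np.repeat(np.arange(len(order)), sizes)  # row -> type index
out_cat = np.einsum('ni,nio->no', x_cat, w_stk[type_id])
```

Both paths produce identical results; the batched form is what exposes enough parallelism for the GPU to process all types at once.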

For GNNs, much of the computation relies on sparse aggregations, which can likewise be optimized. cuGraph-ops integration to accelerate these sparse aggregations is also coming soon.
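The core of such a sparse aggregation is a scatter operation: each edge carries a source node's features to its destination, where they are reduced (here by mean). The NumPy sketch below shows the pattern on a toy graph; the edge list and features are invented for illustration, and cuGraph-ops' contribution is an optimized CUDA implementation of this reduction.

```python
import numpy as np

# Toy graph: two columns of edge_index are (source, destination) pairs.
edge_index = np.array([[0, 1, 2, 2],   # source nodes
                       [1, 0, 0, 1]])  # destination nodes
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
num_nodes = x.shape[0]

src, dst = edge_index
out = np.zeros_like(x)
np.add.at(out, dst, x[src])  # scatter-add each message to its destination
deg = np.bincount(dst, minlength=num_nodes).reshape(-1, 1)
out = out / np.maximum(deg, 1)  # mean over incoming neighbors
```

Because the reduction is irregular (each node has a different number of incoming edges), a naive implementation is memory-bound, which is exactly where a tuned GPU kernel pays off.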

These optimizations will be integrated into PyG to provide significant speed-ups with no code changes required from the user.

Performance Benefits

In the graphs below, you can see the performance speedup of the accelerated GPU sampling and training compared to the CPU baseline.

Accelerated GPU Sampling


2-Hop Neighbor Sampling on the ogbn-mag dataset
SW: PyG 2.1, cuGraph 22.10, GPU: NVIDIA A6000, CPU: AMD Ryzen Threadripper PRO 3975WX


Accelerated Typed Matrix Multiply


HeteroLinear on FakeHeteroDataset with ~20,000 nodes per node type
SW: pyg-lib 0.1.0, GPU: NVIDIA A100, CPU: AMD EPYC 7742 64-Core Processor


Summary

PyTorch Geometric is one of the most popular libraries for graph neural networks, and in collaboration with NVIDIA, we are enabling our users to achieve extremely high performance on NVIDIA GPUs while maintaining the ease and flexibility of PyG, with no code changes.

Learn more about the optimization in our GTC talk, Accelerating GNNs with PyTorch Geometric and GPUs.

Initial CUTLASS integration is available in PyG 2.1, with additional accelerations coming soon.

NVIDIA is also releasing an optimized PyG container with the latest upstream improvements, performance-tuned and tested for NVIDIA GPUs. It will be available in Q4’2022 in early access. Sign up here to join the interest list.