v0.7.0
This is a new major release with various system optimizations, new features and enhancements, new models and bugfixes.
Important: Change on PyPI Installation
DGL pip wheels are no longer shipped on PyPI. Use the following command to install DGL with pip:
- pip install dgl -f https://data.dgl.ai/wheels/repo.html for CPU.
- pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html for CUDA.
- pip install --pre dgl -f https://data.dgl.ai/wheels-test/repo.html for CPU nightly builds.
- pip install --pre dgl-cuXX -f https://data.dgl.ai/wheels-test/repo.html for CUDA nightly builds.
This does not impact conda installation.
GPU-based Neighbor Sampling
DGL now supports uniform neighbor sampling and MFG (message flow graph) conversion on GPU, contributed by @nv-dlasalle from NVIDIA. An experiment with GraphSAGE on the ogbn-products graph shows a >10x speedup (from 113s to 11s per epoch) on a g3.16x instance. The following docs have been updated accordingly:
- A new user guide chapter Using GPU for Neighborhood Sampling about when and how to use this new feature.
- The API doc of NodeDataLoader.
New Tutorials for Multi-GPU and Distributed Training
The release brings two new tutorials about multi-GPU training for node classification and graph classification, respectively. There is also a new tutorial about distributed training across multiple machines. All of them are available at https://docs.dgl.ai/.
Improved CPU Message Passing Kernel
The update includes a new CPU implementation of the core GSpMM kernel for GNN message passing, thanks to @sanchit-misra from Intel. The new kernel performs tiling on the sparse CSR matrix and leverages Intel’s LibXSMM for kernel generation, which gives up to a 4.4x speedup over the old kernel. Please read their paper (https://arxiv.org/abs/2104.06700) for details.
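To make the idea of tiling a CSR matrix concrete, here is a minimal pure-Python sketch of a sum-reducer SpMM that processes the columns in blocks. It is an illustration only, not the LibXSMM-based kernel shipped in DGL: the function name spmm_csr and its tile parameter are hypothetical, and blocking over columns is an assumption about one common way to tile.

```python
def spmm_csr(indptr, indices, feats, tile=None):
    """Sum-aggregate features: out[r] = sum of feats[c] over nonzeros (r, c).

    If `tile` is given, columns are processed in blocks of that width, so
    each pass only reads a contiguous slice of `feats`.
    """
    num_rows = len(indptr) - 1
    num_cols = len(feats)
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(num_rows)]
    width = tile or num_cols  # no tiling means one block covering all columns
    for col_start in range(0, num_cols, width):
        col_end = col_start + width
        for r in range(num_rows):
            # A real kernel would pre-partition the nonzeros per tile;
            # this sketch simply rescans each row and filters by column.
            for k in range(indptr[r], indptr[r + 1]):
                c = indices[k]
                if col_start <= c < col_end:
                    for d in range(dim):
                        out[r][d] += feats[c][d]
    return out


# Rows are destination nodes, columns are source nodes.
indptr = [0, 2, 3, 4]
indices = [1, 2, 0, 0]
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

untiled = spmm_csr(indptr, indices, feats)
tiled = spmm_csr(indptr, indices, feats, tile=2)
```

Both calls produce the same aggregated features; tiling changes only the order in which nonzeros are visited, which is what lets an optimized kernel keep the active slice of the feature matrix in cache.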
More efficient NodeEmbedding for multi-GPU training and distributed training
DGL now utilizes NCCL to synchronize the gradients of sparse node embeddings (dgl.nn.NodeEmbedding) during training (credits to @nv-dlasalle from NVIDIA). The NCCL feature is available in both dgl.optim.SparseAdam and dgl.optim.SparseAdagrad. Experiments show a 20% speedup (reduced from 47.2s to 39.5s per epoch) on a g4dn.12xlarge (4 T4 GPU) instance for training RGCN on ogbn-mag graph. The optimization is automatically turned on when NCCL backend support is detected.
The sparse optimizers for dgl.distributed.DistEmbedding now use a synchronized gradient update strategy. We add a new optimizer dgl.distributed.optim.SparseAdam. The dgl.distributed.SparseAdagrad has been moved to dgl.distributed.optim.SparseAdagrad.
Sparse-sparse Matrix Multiplication and Addition Support
We add two new APIs dgl.adj_product_graph and dgl.adj_sum_graph that perform sparse-sparse matrix multiplications and additions as graph operations respectively. They can run with both CPU and GPU with autograd support. An example usage of these functions is Graph Transformer Networks.
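As a rough illustration of what multiplying and adding adjacency matrices means in graph terms, the pure-Python sketch below stores each weighted graph as a dict mapping (src, dst) to an edge weight. The helper names adj_product and adj_sum are hypothetical and do not mirror the signatures of the DGL APIs; the sketch also has no GPU or autograd support.

```python
from collections import defaultdict


def adj_product(a, b):
    """Return edges (u, w) weighted by the sum over v of a[u, v] * b[v, w]."""
    b_by_src = defaultdict(list)
    for (v, w), weight in b.items():
        b_by_src[v].append((w, weight))
    out = defaultdict(float)
    for (u, v), weight_a in a.items():
        # Every two-hop path u -> v -> w contributes to edge (u, w).
        for w, weight_b in b_by_src[v]:
            out[(u, w)] += weight_a * weight_b
    return dict(out)


def adj_sum(graphs):
    """Return the union of edges, adding weights of edges that coincide."""
    out = defaultdict(float)
    for g in graphs:
        for edge, weight in g.items():
            out[edge] += weight
    return dict(out)


a = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0}
b = {(1, 0): 4.0, (2, 0): 5.0, (0, 1): 1.0}

product = adj_product(a, b)  # edges reachable by one hop in a, then one in b
total = adj_sum([a, b])      # edge (0, 1) appears in both, so weights add
```

The product graph has an edge wherever a two-hop path exists, which is the composition of relations that models such as Graph Transformer Networks rely on.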
PyTorch Lightning Compatibility
DGL is now compatible with PyTorch Lightning for single-GPU training or training with DistributedDataParallel. See these examples of training GraphSAGE with PyTorch Lightning:
- Node classification: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_lightning.py
- Unsupervised learning: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_lightning_unsupervised.py
We thank @justusschock for making DGL DataLoaders compatible with PyTorch Lightning (#2886).
New Models
A batch of 19 new model examples is added to DGL in 0.7, bringing the total number to 90+. Users can now use the search bar on https://www.dgl.ai/ to quickly locate the examples with tagged keywords. Below is the list of new models added.
- Interaction Networks for Learning about Objects, Relations, and Physics (https://arxiv.org/abs/1612.00222.pdf) (#2794, @Ericcsr)
- Multi-GPU RGAT for OGB-LSC Node Classification (#2835, @maqy1995)
- Network Embedding with Completely-imbalanced Labels (https://ieeexplore.ieee.org/document/8979355) (#2813, @Fizyhsp)
- Temporal Graph Networks improved (#2860, @Ericcsr)
- Diffusion Convolutional Recurrent Neural Network (https://arxiv.org/abs/1707.01926) (#2858, @Ericcsr)
- Gated Attention Networks for Learning on Large and Spatiotemporal Graphs (https://arxiv.org/abs/1803.07294) (#2858, @Ericcsr)
- DeeperGCN (https://arxiv.org/abs/2006.07739) (#2831, @xnuohz)
- Deep Graph Contrastive Representation Learning (https://arxiv.org/abs/2006.04131) (#2828, #3009, @hengruizhang98)
- Graph Neural Networks Inspired by Classical Iterative Algorithms (https://arxiv.org/abs/2103.06064) (#2770, @ffttyy)
- GraphSAINT (#2792) (@lt610)
- Label Propagation (#2852, @xnuohz)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks (https://arxiv.org/abs/2010.13993) (#2852, @xnuohz)
- GCNII (#2874, @kyawlin)
- Latent Dirichlet Allocation on GPU (#2883, @yifeim)
- A Heterogeneous Information Network based Cross Domain Insurance Recommendation System for Cold Start Users (#2864, @KounianhuaDu)
- Five heterogeneous graph models: HetGNN/GTN/HAN/NSHE/MAGNN (#2993, @Theheavens)
- New OGB-arxiv and OGB-proteins results (#3018, @espylapiza)
- Heterogeneous Graph Attention Networks with minibatch sampling (#3005, @maqy1995)
- Learning Hierarchical Graph Neural Networks for Image Clustering (https://arxiv.org/abs/2107.01319) (#3087, #3105)
New Datasets
- Two fake news datasets, Gossipcop and Politifact. (#2876, #2939, @kayzliu)
- Two fraud datasets extracted from Yelp and Amazon. See https://arxiv.org/pdf/2008.08692.pdf and https://ponderly.github.io/pub/PCGNN_WWW2021.pdf for details. (#2876, #2908, @kayzliu)
New Functionalities
- KD-Tree, Brute-force family, and NN-descent implementation of KNN (#2767, #2892, #2941) (@lygztq)
- BLAS-based KNN implementation on GPU (#2868, @milesial)
- A new API dgl.sample_neighbors_biased for biased neighbor sampling where each node has a tag, and each tag has its own (unnormalized) probability (#1665, #2987, @soodoshll). We also provide two helper functions sort_csr_by_tag and sort_csc_by_tag to sort the internal storage of a graph based on tags to enable this kind of neighbor sampling (#1664, @soodoshll).
- Distributed sparse Adam node embedding optimizer (#2733)
- Heterogeneous graph’s multi_update_all now supports user-defined cross-type reducers (#2891, @Secbone)
- Add in_degrees and out_degrees support to dgl.DistGraph (#2918)
- A new API dgl.sampling.node2vec_random_walk for Node2vec random walks (#2992, @Smilexuhc)
- dgl.node_subgraph, dgl.edge_subgraph, dgl.in_subgraph and dgl.out_subgraph all have a relabel_nodes argument to allow graph compaction (i.e. removing the nodes with no edges). (#2929)
- Allow direct slicing of a batched graph without constructing a new data structure. (#2349, #2851, #2965)
- Allow setting the distributed node embeddings with NodeEmbedding.all_set_embedding() (#3047)
- Graphs can be directly created from CSR or CSC representations on either CPU or GPU (#3045). See the API doc of dgl.graph for more details.
- A new dgl.reorder API to permute a graph according to RCMK, METIS or custom strategy (#3063)
- dgl.nn.GraphConv now has a left normalization which divides the outgoing messages by out-degrees, equivalent to random-walk normalization (#3114)
- Add a new exclude='self' option to EdgeDataLoader to exclude only the edges sampled in the current minibatch during neighbor sampling when reverse edges are not available (#3122)
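The tag-based biased sampling listed above can be pictured with a small pure-Python sketch. It assumes that each candidate neighbor is drawn without replacement with probability proportional to the unnormalized bias of its tag. The function sample_neighbors_by_tag and its parameters are hypothetical and are not the signature of dgl.sample_neighbors_biased; the sketch also skips the tag-sorted CSR/CSC storage that the helper functions prepare.

```python
import random


def sample_neighbors_by_tag(neighbors, tags, tag_bias, fanout, rng):
    """Pick up to `fanout` distinct neighbors, weighting each by its tag's bias."""
    # Neighbors whose tag has zero bias can never be selected.
    candidates = [n for n in neighbors if tag_bias[tags[n]] > 0]
    picked = []
    while candidates and len(picked) < fanout:
        weights = [tag_bias[tags[n]] for n in candidates]
        choice = rng.choices(candidates, weights=weights, k=1)[0]
        picked.append(choice)
        candidates.remove(choice)  # sample without replacement
    return picked


rng = random.Random(0)
neighbors = [1, 2, 3, 4]
tags = {1: 0, 2: 0, 3: 1, 4: 1}  # node ID -> tag

# Tag 0 has zero bias, so only the tag-1 neighbors can be drawn.
only_tag_one = sample_neighbors_by_tag(neighbors, tags, [0.0, 3.0], 2, rng)

# Biases need not sum to one; tag 1 is simply nine times as likely as tag 0.
one_pick = sample_neighbors_by_tag(neighbors, tags, [1.0, 9.0], 1, rng)
```

Recomputing the weights on every draw keeps the sketch short; sorting neighbors by tag, as the helper functions do, is what would let an implementation sample per tag group instead of per neighbor.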
Performance Optimizations
- Check if a COO is sorted to avoid sync during forward/backward and parallelize sorted COO/CSR conversion. (#2645, @nv-dlasalle)
- Faster uniform sampling with replacement (#2953)
- Eliminating ctor & dtor & IsNullArray overheads in random walks (#2990, @AjayBrahmakshatriya)
- GatedGCNConv shortcut with one edge type (#2994)
- Hierarchical Partitioning in distributed training with 25% speedup (#3000, @soodoshll)
- Save memory usage in node_split and edge_split during partitioning (#3132, @JingchengYu94)
Other Enhancements
- Graph partitioning now returns ID mapping from old nodes/edges to new ones (#2857)
- Better error message when idx_list is out of bound (#2848)
- Kill training jobs on remote machines in distributed training when receiving KeyboardInterrupt (#2881)
- Provide a dgl.multiprocessing namespace for multiprocess training with fork and OpenMP (#2905)
- GAT supports multidimensional input features (#2912)
- Users can now specify graph format for distributed training (#2948)
- CI now runs on Kubernetes (#2957)
- to_heterogeneous(to_homogeneous(hg)) now returns the same hg. (#2958)
- remove_nodes and remove_edges now preserve batch information. (#3119)
Bug Fixes
- Multiprocessing sampling in distributed training hangs in Python 3.8 (#2315, #2826)
- Use correct NIC for distributed training (#2798, @Tonny-Gu)
- Fix potential TypeError in HGT example (#2830, @zhangtianle)
- Distributed training initialization fails with graphs without node/edge data (#2366, #2838)
- DGL Sparse Optimizer will crash when some DGL NodeEmbedding is not involved in the forward pass (#2856, #2859)
- Fix GATConv shape issues with Residual Connections (#2867, #2921, #2922, #2947, #2962, @xieweiyi, @jxgu1016)
- Moving a graph to GPU will change the default CUDA device (#2895, #2897)
- Remove __len__ method to stop polluting PyCharm outputs (#2902)
- Inconsistency in the typing of node types and edge types returned by load_partition (#2742, @chwan-rice)
- NodeDataLoader and EdgeDataLoader now support DistributedDataParallel with proper shuffling and batching (#2539, #2911)
- Nonuniform sampling with replacement may dereference null pointer (#2942, #2943, @nv-dlasalle)
- Strange behavior of bipartite_from_networkx() (#2808, #2917)
- Make GCMC example compatible with torchtext 0.9+ (#2985, @alexpod1000)
- dgl.to_homogeneous doesn't work correctly on graphs with 0 nodes of a given type (#2870, #3011)
- TU regression datasets throw errors (#2952, #3010)
- RGCN generates nan in PyTorch 1.8 but not in PyTorch 1.7.x (#2760, #3013, @nv-dlasalle)
- Deal with situation where num_layers equals 1 for GraphSAGE (#3066, @Wang-Yu-Qing)
- Lengthen the timeout for distributed node embedding (#2966, #2967, @sojiadeshina)
- Misc fixes in code and documentation (#2844, #2869, #2840, #2879, #2863, #2822, #2907, #2928, #2935, #2960, #2938, #2968, #2961, #2983, #2981, #3017, #3051, #3040, #3064, #3065, #3133, #3139) (@Theheavens, @ab-10, @yunshiuan, @moritzblum, @kayzliu, @universvm, @europeanplaice, etc.)
Deprecations
- The preserve_nodes argument in dgl.edge_subgraph is deprecated and renamed to relabel_nodes.