Releases: dmlc/dgl
v1.0.1
What's new
- Enable dgl.sparse on Mac and Windows.
- Fixed several bugs.
v1.0.0
v1.0.0 release is a new milestone for DGL. 🎉🎉🎉
New Package: dgl.sparse
In this release, we introduced a brand new package, dgl.sparse, which allows DGL users to build GNNs in the sparse-matrix paradigm. We provide Google Colab tutorials on the dgl.sparse package, ranging from getting started with the sparse APIs to building different types of GNN models (including Graph Diffusion, Hypergraph and Graph Transformer), along with 10+ examples of commonly used models in the GitHub code base.
NOTE: this feature is currently only available on Linux.
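As a rough sketch of what the sparse-matrix paradigm looks like, the snippet below performs one GCN-style propagation step with dgl.sparse. It follows the patterns in the Colab tutorials, but the constructor name (dglsp.spmatrix here; some versions expose dglsp.from_coo instead) and exact operator behavior are assumptions, so treat it as illustrative rather than authoritative.

```python
import torch
import dgl.sparse as dglsp

# A toy 3-node cycle graph in COO form. The constructor name is an assumption;
# depending on the version it may be dglsp.spmatrix or dglsp.from_coo.
indices = torch.tensor([[0, 1, 2],
                        [1, 2, 0]])
A = dglsp.spmatrix(indices, shape=(3, 3))   # sparse adjacency matrix
X = torch.randn(3, 16)                      # node features

# One GCN-style propagation step: D^-1/2 (A + I) D^-1/2 X
A_hat = A + dglsp.identity(A.shape)
D_hat_invsqrt = dglsp.diag(A_hat.sum(1)) ** -0.5
Y = D_hat_invsqrt @ A_hat @ D_hat_invsqrt @ X   # dense output of shape (3, 16)
```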
New Additions
- A new example of SEAL+NGNN for OGBL datasets (#4550, #4772)
- Add DeepWalk module (#4562)
- A new example of BiPointNet for modelnet40 dataset (#4434)
- Add Transformer-related modules: Metapath2vec (#4660), LaplacianPosEnc (#4750), DegreeEncoder (#4742), ToLevi (#4884), BiasedMultiheadAttention (#4916), PathEncoder (#4956), GraphormerLayer (#4959), SpatialEncoder & SpatialEncoder3d (#4991)
- Add Graph Positional Encoding Ops: double_radius_node_labeling (#4513), shortest_dist (#4799)
- Add a new sampling algorithm: (La)yer-Neigh(bor) sampling (#4668)
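The LABOR sampler is designed to plug into the same data loading pipeline as the existing neighbor samplers. Below is a minimal, hedged sketch: the class name dgl.dataloading.LaborSampler and its fanout-list constructor follow the documentation, while the toy graph and seed nodes are placeholders.

```python
import torch
import dgl

g = dgl.rand_graph(1000, 5000)      # toy homogeneous graph
train_nids = torch.arange(100)      # toy training seeds

# LABOR sampling with per-layer fanouts, used like the ordinary neighbor sampler.
sampler = dgl.dataloading.LaborSampler([10, 15])
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler, batch_size=32, shuffle=True)

for input_nodes, output_nodes, blocks in dataloader:
    pass  # feed the message flow graphs (blocks) to a model
```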
System Enhancement
- Support PyTorch CUDA Stream (#4503)
- Support canonical edge types in HeteroGraphConv (#4440)
- Reduce Memory Consumption in Distributed Training Example (#4558)
- Improve the performance of is_unibipartite (#4556)
- Add options for padding and eigenvalues in Laplacian positional encoding transform (#4628)
- Reduce startup overhead for dist training (#4735)
- Add Heterogeneous Graph support for GNNExplainer (#4401)
- Enable sampling with edge masks on homogeneous graph (#4748)
- Enable save and load for Distributed Optimizer (#4752)
- Add edge-wise message passing operators u_op_v (#4801)
- Support bfloat16 (bf16) (#4648)
- Accelerate CSRSliceMatrix<kDGLCUDA, IdType> by leveraging hashmap (#4924)
- Decouple size of node/edge data files from nodes/edges_per_chunk entries in the metadata.json for Distributed Graph Partition Pipeline (#4930)
- Canonical etypes are always used during partition and loading in distributed DGL (#4777, #4814).
- Add parquet support for node/edge data in Distributed Partition Pipeline (#4933).
Deprecation & Cleanup
- Deprecate unused dataset attributes (#4666)
- Cleanup outdated examples (#4751)
- Remove the deprecated functions (#5115, #5116, #5117)
- Drop outdated modules (#5114, #5118)
Dependency Update
Starting from this release, we will drop support for CUDA 10.1 and 11.0. On Windows, we will further drop support for CUDA 10.2.
Linux: CentOS 7+ / Ubuntu 18.04+
PyTorch ver. \ CUDA ver. | 10.2 | 11.3 | 11.6 | 11.7 |
---|---|---|---|---|
1.12 | ✅ | ✅ | ✅ | |
1.13 | | | ✅ | ✅ |
Windows: Windows 10+/Windows server 2016+
PyTorch ver. \ CUDA ver. | 11.3 | 11.6 | 11.7 |
---|---|---|---|
1.12 | ✅ | ✅ | |
1.13 | | ✅ | ✅ |
Bugfixes
- Fix a bug related to EdgeDataLoader (#4497)
- Fix graph structure corruption with transform (#4753)
- Fix a bug causing UVA cannot work on old GPUs (#4781)
- Fix NN modules crashing with non-FP32 inputs (#4829)
Installation
The installation URL and conda repository have changed for CUDA packages. Please use the following:
# If you installed dgl-cuXX pip wheel or dgl-cudaXX.X conda package, please uninstall them first.
pip install dgl -f https://data.dgl.ai/wheels/repo.html # for CPU
pip install dgl -f https://data.dgl.ai/wheels/cuXX/repo.html # for CUDA, XX = 102, 113, 116 or 117
conda install dgl -c dglteam # for CPU
conda install dgl -c dglteam/label/cuXX # for CUDA, XX = 102, 113, 116 or 117
v0.9.1
v0.9.1 is a minor release with the following updates:
Distributed Graph Partitioning Pipeline
DGL now supports partitioning and preprocessing graph data using multiple machines. At its core is a new data format called Chunked Graph Data Format (CGDF), which stores graph data in chunks. The new pipeline processes data chunks in parallel, which not only reduces the memory requirement of each machine but also significantly accelerates the entire procedure. For the same random graph with 1B nodes/5B edges, using a cluster of 8 AWS EC2 x1e.4xlarge instances (16 vCPU, 488GB RAM each), the new pipeline can reduce the running time to 2.7 hours and cut the monetary cost by 3.7x. Read the feature highlight blog for more details.
To get started with this new feature, check out the new user guide chapter.
New Additions
- A new example of SEAL model for OGBL datasets: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/seal_ogbl (#4291)
- A new example of Directional Graph Substructure Networks (GSN) for OGBG-MolPCBA dataset: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/directional_GSN (#4405)
- A new example of the Network In Graph Neural Network model for OGBL datasets: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ngnn (#4328)
- PyTorch Multi-GPU examples are moved to dgl/examples/pytorch/multigpu/, with a new example of multi-GPU graph property prediction that achieves a 9.5x speedup on 8 GPUs. (#4385)
- A new example of the Heterogeneous RGCN model on the OGBN-MAG dataset: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ogbn-mag (#4331)
- Refactored the code style of the following commonly visited examples: RGCN, GIN, GAT. (#4327) (#4280) (#4240)
System Enhancement
- Two new APIs, dgl.use_libxsmm and dgl.is_libxsmm_enabled, to enable/disable Intel LibXSMM. (#4455)
- Added a new option exclude_self to exclude self-loop edges for dgl.knn_graph. The API now supports creating a batch of KNN graphs (see the sketch after this list). (#4389)
- The distributed training program launched by DGL now reports an error when any trainer/server fails.
- Speedup DataLoader by adding CPU affinity support. (#4126)
- Enable graph partition book to support canonical edge types. (#4343)
- Improve the performance of CUDA SpMMCSr (#4363)
- Add CUDA Weighted Neighborhood Sampling (#4064)
- Enable UVA for Weighted Samplers (#4314)
- Allow adding data to self-loops created by AddSelfLoop or add_self_loop (#4261)
- Add CUDA Weighted Randomwalk Sampling (#4243)
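As referenced in the dgl.knn_graph item above, here is a small sketch. The exclude_self keyword comes from that note; treating a 3-D input tensor as a batch of point sets is an assumption about how the batched KNN graphs are built.

```python
import torch
import dgl

x = torch.randn(100, 3)                        # 100 points in 3D space
g = dgl.knn_graph(x, k=5, exclude_self=True)   # 5 nearest neighbors per node, no self-loop edges

# Batched usage (assumption): a (batch, num_points, dim) tensor yields a batched KNN graph.
xs = torch.randn(4, 100, 3)
bg = dgl.knn_graph(xs, k=5)
```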
Deprecation & Cleanup
- Removed the already deprecated AsyncTransferer class. The functionality has been incorporated into DGL DataLoader. (#4505)
- Removed the already deprecated num_servers and num_workers arguments of dgl.distributed.initialize. (#4284)
Dependency Update
Starting from this release, we will drop support for CUDA 10.1 and 11.0. On Windows, we will further drop support for CUDA 10.2.
Linux: CentOS 7+ / Ubuntu 18.04+
PyTorch ver. \ CUDA ver. | 10.2 | 11.1 | 11.3 | 11.5 | 11.6 |
---|---|---|---|---|---|
1.9 | ✅ | ✅ | | | |
1.10 | ✅ | ✅ | ✅ | | |
1.11 | ✅ | ✅ | ✅ | ✅ | |
1.12 | ✅ | | ✅ | | ✅ |
Windows: Windows 10+/Windows server 2016+
PyTorch ver. \ CUDA ver. | 11.1 | 11.3 | 11.5 | 11.6 |
---|---|---|---|---|
1.9 | ✅ | | | |
1.10 | ✅ | ✅ | | |
1.11 | ✅ | ✅ | ✅ | |
1.12 | | ✅ | | ✅ |
Bugfixes
- Fix a crash bug due to incorrect dtype in dgl.to_block() (#4487)
- Fix a bug related to unpinning when tensoradaptor is not available (#4450)
- Fix a bug related to pinning empty tensors and graphs (#4393)
- Remove duplicate entries of CUB submodule (#4499)
- Fix broken static_assert (#4342)
- A bunch of fixes in edge_softmax_hetero (#4336)
- Fix the default value of num_bases in the RelGraphConv module (#4321)
- Fix etype check in DistGraph.edge_subgraph (#4322)
- Fix incorrect _bias and bias usage (#4310)
- Enable DistGraph.find_edge() to work with str or tuple of str (#4319)
- Fix a numerical bug related to SparseAdagrad. (#4253)
v0.9.0
This is a major update with several new features including graph prediction pipeline in DGL-Go, cuGraph support, mixed precision support, and more.
Starting from 0.9 we also ship arm64 builds for Linux and OSX.
DGL-Go
DGL-Go now supports training GNNs for graph property prediction tasks. It includes two popular GNN models: Graph Isomorphism Network (GIN) and Principal Neighborhood Aggregation (PNA). For example, to train a GIN model on the ogbg-molpcba dataset, first generate a YAML configuration file using the command:
dgl configure graphpred --data ogbg-molpcba --model gin
which generates the following configuration file. Users can then manually adjust the configuration file.
version: 0.0.2
pipeline_name: graphpred
pipeline_mode: train
device: cpu                # Torch device name, e.g., cpu or cuda or cuda:0
data:
  name: ogbg-molpcba
  split_ratio:             # Ratio to generate data split, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
  name: gin
  embed_size: 300          # Embedding size
  num_layers: 5            # Number of layers
  dropout: 0.5             # Dropout rate
  virtual_node: false      # Whether to use virtual node
general_pipeline:
  num_runs: 1              # Number of experiments to run
  train_batch_size: 32     # Graph batch size when training
  eval_batch_size: 32      # Graph batch size when evaluating
  num_workers: 4           # Number of workers for data loading
  optimizer:
    name: Adam
    lr: 0.001
    weight_decay: 0
  lr_scheduler:
    name: StepLR
    step_size: 100
    gamma: 1
  loss: BCEWithLogitsLoss
  metric: roc_auc_score
  num_epochs: 100          # Number of training epochs
  save_path: results       # Directory to save the experiment results
Alternatively, users can fetch model recipes with pre-defined hyperparameters for reproducing the original experiments.
dgl recipe get graphpred_pcba_gin.yaml
To launch training:
dgl train --cfg graphpred_ogbg-molpcba_gin.yaml
Another addition is a new command to run inference with a trained model on another dataset. For example, the following shows how to apply the GIN model trained on ogbg-molpcba to ogbg-molhiv.
# Generate an inference configuration file from a saved experiment checkpoint
dgl configure-apply graphpred --data ogbg-molhiv --cpt results/run_0.pth
# Apply the trained model for inference
dgl apply --cfg apply_graphpred_ogbg-molhiv_pna.yaml
It will save the model predictions in a CSV file.
Mixed Precision
DGL is compatible with the PyTorch Automatic Mixed Precision (AMP) package for mixed precision training, which saves both training time and GPU memory consumption. This feature requires PyTorch 1.6+ and Python 3.7+.
By wrapping the forward pass with torch.cuda.amp.autocast(), PyTorch automatically selects the appropriate data type for each op and tensor. Half-precision tensors are memory efficient, and most operators on half-precision tensors are faster because they leverage GPU tensor cores.
import torch.nn.functional as F
from torch.cuda.amp import autocast

def forward(g, feat, label, mask, model):
    with autocast(enabled=True):
        logit = model(g, feat)
        loss = F.cross_entropy(logit[mask], label[mask])
        return loss
Small gradients in float16 format have underflow problems (they flush to zero). PyTorch provides a GradScaler module to address this issue: it multiplies the loss by a factor and invokes the backward pass on the scaled loss to prevent underflow, then unscales the computed gradients before the optimizer updates the parameters. The scale factor is determined automatically.
from torch.cuda.amp import GradScaler

scaler = GradScaler()

def backward(scaler, loss, optimizer):
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
Putting everything together, we have the example below.
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.data import RedditDataset
from dgl.nn import GATConv
from dgl.transforms import AddSelfLoop

class GAT(nn.Module):
    def __init__(self, in_feats, num_classes, num_hidden=256, num_heads=2):
        super().__init__()
        self.conv1 = GATConv(in_feats, num_hidden, num_heads, activation=F.elu)
        self.conv2 = GATConv(num_hidden * num_heads, num_classes, num_heads)

    def forward(self, g, h):
        h = self.conv1(g, h).flatten(1)
        h = self.conv2(g, h).mean(1)
        return h

device = torch.device('cuda')
transform = AddSelfLoop()
data = RedditDataset(transform=transform)
g = data[0]
g = g.int().to(device)
train_mask = g.ndata['train_mask']
feat = g.ndata['feat']
label = g.ndata['label']
in_feats = feat.shape[1]

model = GAT(in_feats, data.num_classes).to(device)
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
# Reuse the forward/backward helpers and the GradScaler defined above.
for epoch in range(100):
    optimizer.zero_grad()
    loss = forward(g, feat, label, train_mask, model)
    backward(scaler, loss, optimizer)
Thanks @nv-dlasalle, @ndickson-nvidia, @yaox12 and others for the support!
cuGraph Interface
The RAPIDS cuGraph library provides a collection of GPU accelerated algorithms for graph analytics, such as centrality computation and community detection. According to its documentation, “the latest NVIDIA GPUs (RAPIDS supports Pascal and later GPU architectures) make graph analytics 1000x faster on average over NetworkX”.
To install cuGraph, we recommend following the practice below.
conda install mamba -n base -c conda-forge
mamba create -n dgl_and_cugraph -c dglteam -c rapidsai-nightly -c nvidia -c pytorch -c conda-forge cugraph pytorch torchvision torchaudio cudatoolkit=11.3 dgl-cuda11.3 tqdm
conda activate dgl_and_cugraph
DGL is now compatible with cuGraph: it allows conversion between a DGLGraph object and a cuGraph graph object, making it possible for DGL users to access efficient graph analytics implementations in cuGraph. For example, users can perform community detection on a graph with the Louvain method available in cuGraph.
import cugraph
from dgl.data import CoraGraphDataset
dataset = CoraGraphDataset()
g = dataset[0].to('cuda')
cugraph_g = g.to_cugraph()
cugraph_g = cugraph_g.to_undirected()
parts, modularity_score = cugraph.louvain(cugraph_g)
The community membership of nodes from parts['partition'] can then be used as auxiliary node labels or node features.
If you have modified the structure of a cuGraph graph object or loaded graph data with cuGraph, you can also convert it to a DGLGraph object.
import dgl
g = dgl.from_cugraph(cugraph_g)
Credits to @VibhuJawa!
Arm64 builds
Linux AArch64 and OSX M1 (arm64) are now supported. One can install them as usual with pip and conda:
pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html
conda install -c dglteam dgl-cudaXX.X # currently not available for OSX M1
Quality-of-life updates
- Added more missing FP16 specializations (#4140, @ndickson-nvidia )
- Allow communicators of size one when NCCL is missing (#3713, @nv-dlasalle )
- Automatically unpin DGL tensors when out of scope to avoid potential bugs (#4135, @yaox12 )
System optimizations
- Enable using UVA and FP16 with SparseAdam Optimizer (#3885, @nv-dlasalle )
- Enable USE_EPOLL by default in distributed training (#4167)
- Optimize the use of alternative streams in dataloader (#4177, @yaox12 )
- Redirect AllocWorkspace to PyTorch's allocator if available (#4199, @yaox12 )
Bug fixes
- Massive refactoring of examples including GCN, GraphSAGE, PinSAGE, EGES, DGI, GATv2, and many more (#4130, #4194, #4186, #4197, #4201, #4160, #4220, #4219, #4218, #4242, #4255, huge thanks to @chang-l!)
- Fix CareGNN example to adapt to new sampler interface (#4211, @yaox12)
- Fix #4150 (#4164, #4198, #4212)
- Fix etype not guaranteed to be sorted in distributed training (#4156)
- Fix compiler warnings (#4051, @TristonC)
- Fix correct and smooth example using validation labels during prediction in validation (#4158, @LucasPrietoAl )
- Fix build issues on macOS (#4168, #4175)
- Fix that pin_prefetcher is not actually enabled (#4169, @yaox12 )
- Fix A Bug Related to GroupRevRes (#4181)
- Fix deferred_dtype missing error (#4174, @nv-dlasalle)
- Add CUDA context availability check before setting curand seed (#4223, @yaox12)
- Fix dtype mismatch when copy graph into shared memory and get it back (#4222) (#4228)
- Fix graph attribute missing in DataLoader when device is not specified (#4245)
- Record stream when using another CUDA stream for data transfer (#4250, @yaox12)
- Fix Multiple Backwards Pass Error with retain_graph being set (#4078) (#4249)
- Doc fixes (#4149, #4180, #4193, #4246, #4248, @PotatoChipsNinja @yaox12 @alxwen711 @Zhanghyi)
Misc
v0.8.2
This is a minor release with the following updates.
Test AArch64 Build
A 0.8.2 test build for AArch64 is available via:
pip install dgl -f https://data.dgl.ai/wheels-test/repo.html # or dgl-cuXX for CUDA
New Modules
- Graph Isomorphism Network with Edge Features (#3934)
- dgl.transforms.FeatMask for randomly dropping out dimensions of all node/edge features (#3968, @RecLusIve-F); see the sketch after this list
- dgl.transforms.RowFeatNormalizer for normalization of all node/edge features (#3968, @RecLusIve-F)
- Label propagation module (#4017)
- Directional graph network layer (#4017)
- Datasets for developing GNN explainability approaches (#3982)
- dgl.transforms.SIGNDiffusion for augmenting input node features (#3982)
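As referenced in the FeatMask item above, a quick sketch of the feature-masking transform; the p and node_feat_names arguments follow the documented interface, and the random toy graph is a placeholder.

```python
import torch
import dgl
from dgl.transforms import FeatMask

g = dgl.rand_graph(10, 30)
g.ndata['feat'] = torch.randn(10, 8)

# Randomly zero out feature dimensions of 'feat' with probability p.
transform = FeatMask(p=0.5, node_feat_names=['feat'])
g = transform(g)
```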
Quality-of-life Updates
- Allow HeteroLinear with/without bias (#3970, @ksadowski13)
- Allow selection of “socket” for RPC backend in distributed training (#3951)
- Enable specification of maximum number of trials for socket backend in DistDGL (#3977)
- Added floating-point conversion functions to dgl.transforms.functional (#3890, @ndickson-nvidia)
- Improve the warning message when Tensoradapter is not found (#4055)
- Add sanity check for in_edges/out_edges on empty graphs (#4050)
System Optimization
- Improved graph batching on GPU for Graph DataLoaders (#3895, @ayasar70)
- CPU DataLoader affinitization (#3723 @daniil-sizov)
- Memory consumption optimization on index shuffling in dataloader (#3980)
- Remove unnecessary induced vertices in edge subgraph (#3978, @yaox12)
- Change the curandState and launch dimension of GPU neighbor sampling kernel (#3990, @paoxiaode)
Bug fixes
- Fix multi-GPU edge classification crashing with pure-GPU sampling (#3946)
- Fixed race conditions in distributed SparseAdam and SparseAdagrad (#3971, @ndickson-nvidia)
- Fix launch parameters of the index select kernel in sparse pull for multi-GPU sparse embedding (#3524, @nv-dlasalle)
- Fix import error when tensorflow backend is specified (#4015)
- Fix DistDGL crashing when sampling on bipartite graphs (#4014)
- Prevent users from attempting to pin PyTorch non-contiguous tensors or views only encompassing part of tensor (#3992, @nv-dlasalle)
- Fix Cython CAPI holding GIL causes deadlock when Python callback is asynchronous (#4036)
- Misc unit test, example, doc fixes etc. (#3947, #3941, #3928, #3944, #3505, #3953, #3983, #3996, #4009, #4010, #4016, #4022, #4023, #4027, #4030, #4034, #4038, #4053, #4058, #4060 @Kh4L, @daniil-sizov, @HenryChang213, @sharique1006, @msharmavikram, @initzhang, @yinpeiqi, @chang-l, @nv-dlasalle, @Sanzo00, @Eurus-Holmes, @xiaopqr, @decoherencer)
v0.8.1
This is a minor release that includes the following model updates, optimizations, new features and bug fixes.
Model update
- nn.GroupRevRes from Training Graph Neural Networks with 1000 Layers [#3842]
- transforms.LaplacianPositionalEncoding from Graph Neural Networks with Learnable Structural and Positional Representations [#3869]
- transforms.RWPositionalEncoding from Graph Neural Networks with Learnable Structural and Positional Representations [#3869]
- dataloading.SAINTSampler from GraphSAINT [#3879]
- nn.EGNNConv from E(n) Equivariant Graph Neural Networks [#3901]
- nn.PNAConv from the baselines of E(n) Equivariant Graph Neural Networks [#3901]
Example update
- Position-aware GNN [#3823 @RecLusIve-F]
- EGES (Enhanced Graph Embedding with Side info) [#3756 @Wang-Yu-Qing]
Feature update (new functionalities, interface changes, etc.)
- Radius graph: construct a graph by connecting points within a given distance (see the sketch after this list). [#3829 @ksadowski13]
  - It uses torch.cdist, so the space complexity is O(N^2).
- Added a get_attention parameter in GlobalAttentionPooling. [#3837 @decoherencer]
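A minimal sketch of the radius graph construction described above, assuming the dgl.radius_graph entry point with a point tensor and a distance threshold; everything else uses defaults.

```python
import torch
import dgl

x = torch.randn(50, 3)          # 50 points in 3D space
# Connect every pair of points whose Euclidean distance is within 0.5.
# Distances are computed densely via torch.cdist, so memory grows as O(N^2).
g = dgl.radius_graph(x, 0.5)
```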
Quality of life update
- Example to train with multi-GPU with PyTorch Lightning. [#3863]
- Multi-GPU inference with UVA. [#3827 @nv-dlasalle]
- Enable UVA sampling with CPU indices to save GPU memory. [#3892]
- Set stacklevel=2 for DGL-raised warnings. [#3816]
- Pure GPU example of GraphSAGE, with both node classification and link prediction. [#3796 @nv-dlasalle, #3856 @Kh4L]
- Tensoradapter DLPack 0.6 compatibility / PyTorch 1.11 support. [#3803]
System optimization
- Enable UVA for PinSAGE and RandomWalk. [#3857 @yaox12]
- METIS partition with communication volume minimization, reduces the communication volume by 13.4% compared with edge-cut minimization on ogbn-products. [#3821 @chwan1016]
- Change parameter of curand_init for reducing GPU latency [#3794 @paoxiaode]
Bug fixes
- Fix Python 3.10 import error [#3862]
- Fix repeated 0’s in DataLoader index iteration when shuffle=False [#3892]
- DataLoader device cannot be None [#3822 @yinpeiqi]
- Fix device error in negative sampling with UVA [#3904 @nv-dlasalle]
- Illegal instruction in ClusterGCNSampler (#3910)
- Include pin memory status in pickling and deep copy [#3914]
- Misc doc fixes (@lvcrek @AzureLeon1 @decoherencer @yaox12 @ketyi )
v0.8.0post2
This is a bugfix release that includes the following updates:
Quality-of-life updates
- Python 3.10 support.
- PyTorch 1.11 support.
- CUDA 11.5 support on Linux. Please install with
pip install dgl-cu115 -f https://data.dgl.ai/wheels/repo.html # if using pip
conda install dgl-cuda11.5 -c dglteam # if using conda
- Compatibility to DLPack 0.6 in tensoradapter (#3803) for PyTorch 1.11
- Set stacklevel=2 for dgl_warning (#3816)
- Support custom datasets in DataLoader that are not necessarily tensors (#3810 @yinpeiqi )
Bug fixes
- Pass ntype/etype into partition book when node/edge_split (#3828)
- Fix multi-GPU RGCN example (#3871 @yaox12)
- Send rpc messages blockingly in case of congestion (#3867). Note that this fix may cause a speed regression in distributed DGL training; we are still investigating the root cause of the underlying issue in #3881.
- Fix CopyToSharedMem assuming that all relation graphs are homogeneous (#3841)
- Fix HAN example crashing with CUDA (#3841)
- Fix UVA sampling crash without specifying prefetching features (#3862)
- Fix documentation display issue of node/edge_split (#3858)
- Fix device mismatch error in GraphSAGE distributed training example under multi-node multi-GPU (#3870)
- Use torch.distributed.algorithms.join.Join to deal with uneven training sets in distributed training (#3870)
- Dataloader documentation fixes (#3886)
- Remove redundant reference of networkx package in pagerank.py (#3888 @AzureLeon1 )
- Make source build work for systems where the default is Python 2 (#3718)
- Fix UVA sampling with partially specified node types (#3897)
v0.8.0post1
v0.8.0
v0.8.0 is a major release with many new features, system improvements and fixes. Read the blog for the highlighted features.
Major features
Mini-batch Sampling Pipeline Update
Enabled CUDA UVA-based optimization and feature prefetching for all built-in graph samplers (up to 4x speedup compared to v0.7). Users can now specify the features to prefetch and turn on UVA optimization in dgl.dataloading.Sampler and dgl.dataloading.DataLoader.
g = ...          # some DGLGraph data
train_nids = ... # training node IDs
sampler = dgl.dataloading.MultiLayerNeighborSampler(
    [10, 15],                      # fanouts, i.e. number of neighbors per layer
    prefetch_node_feats=['feat'],  # prefetch node feature 'feat'
    prefetch_labels=['label'],     # prefetch node label 'label'
)
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    device='cuda:0',               # perform sampling on GPU 0
    batch_size=1024,
    shuffle=True,
    use_uva=True                   # turn on UVA optimization
)
We have done a major refactor of the sampling components to make it easier to implement new graph samplers. Added a new base class dgl.dataloading.Sampler with one abstract method sample for overriding. Added new APIs dgl.set_src_lazy_features, dgl.set_dst_lazy_features, dgl.set_node_lazy_features and dgl.set_edge_lazy_features for customizing prefetching rules. The code below shows the new user experience.
class NeighborSampler(dgl.dataloading.Sampler):
    def __init__(self,
                 fanouts : list[int],
                 prefetch_node_feats: list[str] = None,
                 prefetch_edge_feats: list[str] = None,
                 prefetch_labels: list[str] = None):
        super().__init__()
        self.fanouts = fanouts
        self.prefetch_node_feats = prefetch_node_feats
        self.prefetch_edge_feats = prefetch_edge_feats
        self.prefetch_labels = prefetch_labels

    def sample(self, g, seed_nodes):
        output_nodes = seed_nodes
        subgs = []
        for fanout in reversed(self.fanouts):
            # Sample a fixed number of neighbors of the current seed nodes.
            sg = g.sample_neighbors(seed_nodes, fanout)
            # Convert this subgraph to a message flow graph.
            sg = dgl.to_block(sg, seed_nodes)
            seed_nodes = sg.srcdata[dgl.NID]
            subgs.insert(0, sg)
        input_nodes = seed_nodes
        # Handle prefetching.
        dgl.set_src_lazy_features(subgs[0], self.prefetch_node_feats)
        dgl.set_dst_lazy_features(subgs[-1], self.prefetch_labels)
        for subg in subgs:
            dgl.set_edge_lazy_features(subg, self.prefetch_edge_feats)
        return input_nodes, output_nodes, subgs
Related documentations:
- Reworked the user guide chapter for customizing graph samplers.
- Added a new user guide chapter for writing graph samplers with feature prefetching.
We thank Xin Yao (@yaox12 ) and Dominique LaSalle (@nv-dlasalle ) from NVIDIA and David Min (@davidmin7 ) from UIUC for their contributions.
DGL-Go
DGL-Go is a new command line tool for users to get started with training, using and studying Graph Neural Networks (GNNs). Data scientists can quickly apply GNNs to their problems, whereas researchers will find it useful to customize their experiments.
The initial release includes:
- Four commands: dgl train, dgl recipe, dgl configure and dgl export.
- 3 training pipelines for node prediction using full-graph training, link prediction using full-graph training, and node prediction using neighbor sampling.
- 5 node encoding models: gat, gcn, gin, sage, sgc; 3 edge encoding models: bilinear, dot-product, element-wise.
- 10 datasets, including custom datasets in CSV format.
NN Modules
We have accelerated dgl.nn.RelGraphConv and dgl.nn.HGTConv by up to 36x and 12x compared with the baselines from v0.7 and PyG, and shortened the implementation of dgl.nn.RelGraphConv by 3x (from 200 lines to 64 lines).
Breaking change: dgl.nn.RelGraphConv no longer accepts a 1-D integer tensor representing node IDs during forward. Please switch to torch.nn.Embedding to explicitly represent trainable node embeddings, as in the sketch below.
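A minimal sketch of the new calling convention, assuming a homogeneous toy graph with an integer edge-type tensor; the sizes and the rand_graph data are placeholders, not part of the release.

```python
import torch
import torch.nn as nn
import dgl
from dgl.nn import RelGraphConv

num_nodes, num_rels, in_feat, out_feat = 100, 3, 16, 8
g = dgl.rand_graph(num_nodes, 500)                       # toy homogeneous graph
etypes = torch.randint(0, num_rels, (g.num_edges(),))    # per-edge relation type

# Trainable node embeddings now live in an explicit nn.Embedding ...
embed = nn.Embedding(num_nodes, in_feat)
conv = RelGraphConv(in_feat, out_feat, num_rels)

# ... and dense features, not integer node IDs, are passed to forward.
h = conv(g, embed(g.nodes()), etypes)
```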
Below are the new NN modules added to v0.8:
- GATv2Conv: GATv2 from How Attentive are Graph Attention Networks?
- EGATConv: Graph attention layer that handles edge features from Rossmann-Toolbox
- EdgePredictor: Predictor/score function for pairs of node representations
- TransE: Similarity measure from Translating Embeddings for Modeling Multi-relational Data
- TransR: Similarity measure from Learning entity and relation embeddings for knowledge graph completion
- HeteroLinear: Apply linear transformations on heterogeneous inputs.
- HeteroEmbedding: Create a heterogeneous embedding table.
- HGTConv: Heterogeneous graph transformer convolution from Heterogeneous Graph Transformer
- TypedLinear: Linear transformation according to types.
- JumpingKnowledge: The Jumping Knowledge aggregation module from Representation Learning on Graphs with Jumping Knowledge Networks
- GNNExplainer: GNNExplainer model from GNNExplainer: Generating Explanations for Graph Neural Networks
A new edge_weight argument is added to several GNN modules to support training on weighted graphs (see the example below). Added a new user guide chapter 5.5 about how to use edge weights in your GNN model.
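For instance, GraphConv is one of the modules that accepts edge_weight; which specific modules gained the argument in this release is not listed here, so take the module choice as an assumption. A minimal sketch with random data:

```python
import torch
import dgl
from dgl.nn import GraphConv

g = dgl.add_self_loop(dgl.rand_graph(100, 500))   # toy graph; self-loops avoid zero-in-degree nodes
feat = torch.randn(100, 16)                       # node features
eweight = torch.rand(g.num_edges())               # one scalar weight per edge

conv = GraphConv(16, 8)
h = conv(g, feat, edge_weight=eweight)            # messages are scaled by the edge weights
```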
Graph Dataset and Transforms
Renamed the old dgl.transform package to dgl.transforms to follow PyTorch’s namespace convention. All DGL’s datasets now accept an extra transform keyword argument for data augmentation and transformation:
import dgl
import dgl.transforms as T
t = T.Compose([
    T.AddSelfLoop(),
    T.GCNNorm(),
])
dataset = dgl.data.CoraGraphDataset(transform=t)
g = dataset[0] # graph and features will be transformed automatically
Added 16 graph data transform modules:
- Compose: Create a transform composed of multiple transforms in sequence.
- AddSelfLoop: Add self-loops for each node in the graph and return a new graph.
- RemoveSelfLoop: Remove self-loops for each node in the graph and return a new graph.
- AddReverse: Add a reverse edge (i,j) for each edge (j,i) in the input graph and return a new graph.
- ToSimple: Convert a graph to a simple graph without parallel edges and return a new graph.
- LineGraph: Return the line graph of the input graph.
- KHopGraph: Return the graph whose edges connect the k-hop neighbors of the original graph.
- AddMetaPaths: Add new edges to an input graph based on given metapaths, as described in Heterogeneous Graph Attention Network.
- GCNNorm: Apply symmetric adjacency normalization to an input graph and save the result edge weights, as described in Semi-Supervised Classification with Graph Convolutional Networks.
- PPR: Apply personalized PageRank (PPR) to an input graph for diffusion, as introduced in The pagerank citation ranking: Bringing order to the web.
- [HeatKernel](https://docs.dgl.ai/generated/dgl.transforms.HeatKernel.html#dgl.trans...
0.7.2
This is a patch release targeting CUDA 11.3 and PyTorch 1.10. It contains (1) distributed training on heterogeneous graphs, and (2) bug fixes and code reorganization commits. The performance impact should be minimal.
To install with CUDA 11.3 support, run either
pip install dgl-cu113 -f https://data.dgl.ai/wheels/repo.html
or
conda install -c dglteam dgl-cuda11.3
Distributed Training on Heterogeneous Graphs
We have made the interface of distributed sampling on heterogeneous graphs consistent with the single-machine code. Please refer to https://github.com/dmlc/dgl/blob/0.7.x/examples/pytorch/rgcn/experimental/entity_classify_dist.py for the new code.
Other fixes
- [Bugfix] Fix bugs of farthest_point_sampler (#3327, @sangyx)
- [Bugfix] Fix sparse embeddings for PyTorch < 1.7 #3291 (#3333)
- Fixes bug in hg.update_all causing crash #3312 (#3345, @sanchit-misra)
- [Bugfix] Add PYTHONPATH in server launch. (#3352)
- [CPU][Sampling][Performance] Improve sampling on the CPU. (#3274, @nv-dlasalle)
- [Performance, CPU] Rewriting OpenMP pragmas into parallel_for (#3171, @tpatejko)
- [Build] Fix OpenMP header inclusion for Mac builds (#3325)
- [Performance] improve coo2csr space complexity when row is not sorted (#3326)
- [BugFix] initialize data if null when converting from row sorted coo to csr (#3360)
- fix broadcast tensor dim in dgl.broadcast_nodes (#3351, @jwyyy)
- [BugFix] fix typo in fakenews dataset variable name (#3363, @kayzliu)
- [Doc] Added md5sum info for OGB-LSC dataset (#3332, @msharmavikram)
- [Feature] Graceful handling of exceptions thrown within OpenMP blocks (#3353)
- Fix torch import in example (#3372, @jwyyy)
- [Distributed] Allow user to pass-in extra env parameters when launching a distributed training task. (#3375)
- [BugFix] extract gz into target dir (#3389)
- [Model] Refine GraphSAINT (#3328 @ljh1064126026 )
- [Bug] check dtype before convert to gk (#3414)
- [BugFix] add count_nonzero() into SA_Client (#3417)
- [Bug] Do not skip graphconv even no edge exists (#3416)
- Fix edge ID exclusion when both g and g_sampling are specified in EdgeDataLoader(#3322)
- [Bugfix] three bugs related to using DGL as a subdirectory(third_party) of another project. (#3379, @yuanzexi )
- [PyTorch][Bugfix] Use uint8 instead of bool in pytorch to be compatible with nightly version (#3406, #3454, @nv-dlasalle)
- [Fix] Use ==/!= to compare constant literals (str, bytes, int, float, tuple) (#3415, @cclauss)
- [Bugfix][Pytorch] Fix model save and load bug of stgcn_wave (#3303, @HaoWei-TomTom )
- [BugFix] Avoid Memory Leak Issue in PyTorch Backend (#3386, @chwan-rice )
- [Fix] Split nccl sparse push into two groups (#3404, @nv-dlasalle )
- [Doc] remove duplicate papers (#3393, @chwan-rice )
- Fix GINConv backward #3437 (#3440)
- [bugfix] Fix compilation with CUDA 11.5's CUB (#3468, @nv-dlasalle )
- [Example][Performance] Enable faster validation for pytorch graphsage example (#3361, @nv-dlasalle )
- [Doc] Evaluation Tutorial for Link Prediction (#3463)