Variants of Graph Neural Networks and implementation in TensorFlow

Krishna Chaitanya
10 min read · Aug 12, 2023


See Graph Neural Networks and implementing in TensorFlow for an introduction and the basics.

There are many variants of Graph Neural Networks (GNNs), each with its own characteristics, use cases, and choice of function for aggregating and transforming node features. Here are a few of the most popular ones:

  1. Graph Convolutional Networks (GCNs): GCNs are one of the simplest and most widely used types of GNNs. They operate by aggregating the features of each node’s neighbors through a weighted average, where the weights are determined by the edge connections.
  2. GraphSAGE (Graph Sample and Aggregate): GraphSAGE extends GCNs by allowing for inductive learning, where the model can generalize to unseen nodes or graphs. It does this by sampling and aggregating features from a node’s local neighborhood.
  3. Graph Attention Networks (GATs): GATs introduce attention mechanisms to GNNs, allowing the model to learn to weigh the features of neighboring nodes differently. This can be particularly useful when some neighbors are more relevant than others for a given task.
  4. ChebNet (Chebyshev Spectral Graph Convolution): ChebNet uses Chebyshev polynomials to create a spectral graph convolution operation. It can capture information from a node’s wider neighborhood by including higher-degree polynomials.
  5. Graph Isomorphism Networks (GINs): GINs are designed to be as powerful as the Weisfeiler-Lehman graph isomorphism test, a classical algorithm used to test if two graphs are isomorphic. GINs can capture more complex graph structures than many other GNN variants.

Each of these variants has its own strengths and weaknesses, and the best choice often depends on the specific task and data.
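
GCNs do not get a dedicated code walkthrough below, so here is a minimal NumPy sketch of the weighted-average aggregation described in point 1 above (the adjacency matrix, features, and weights are all made up for illustration):

import numpy as np

# Toy graph: 3 nodes with a made-up adjacency matrix A and 2-dimensional features X
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Add self-loops and symmetrically normalize: A_hat = D^-1/2 (A + I) D^-1/2
A_tilde = A + np.eye(A.shape[0])
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# One GCN layer: weighted average over neighbours, then a linear map and ReLU
W = np.random.randn(2, 4)          # learnable weights (random here, for illustration)
H = np.maximum(A_hat @ X @ W, 0)   # H = ReLU(A_hat X W): one 4-dim embedding per node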

GraphSAGE (Graph Sample and Aggregate)

GraphSAGE is a method for inductive learning on graphs. Unlike transductive methods like GCNs, which learn a fixed embedding for each node in the training graph, GraphSAGE is able to generate embeddings for previously unseen nodes or graphs. This makes it particularly useful for dynamic graphs where new nodes or edges may be added over time.

The key idea behind GraphSAGE is to learn a function that generates a node’s embedding by sampling and aggregating features from its local neighborhood. This function can then be applied to any node in any graph, regardless of whether the node was present in the training data.

Here’s a step-by-step process of how GraphSAGE works:

  1. Node Feature Initialization: As with GCNs, each node in the graph is initialized with a feature vector.
  2. Neighborhood Sampling: For each node, a fixed-size sample of its neighbors is selected. This allows the model to handle nodes with large degrees and ensures that the computation per node is constant.
  3. Feature Aggregation: Each node aggregates the feature vectors of its sampled neighbors using an aggregation function such as mean, sum, or max. This aggregated feature vector captures the local neighborhood structure of the node.
  4. Feature Transformation: The aggregated feature vector is concatenated with the node’s own feature vector, and then transformed using a fully connected layer and a non-linear activation function.
  5. Repeat Steps 2–4: Steps 2–4 are repeated for a certain number of layers. With each layer, the nodes sample and aggregate features from a larger and larger neighborhood.
  6. Readout: After the final layer, a readout function is used to aggregate the feature vectors of all nodes in the graph to produce a graph-level output.
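
Before turning to Spektral, here is a minimal NumPy sketch of steps 2–4 for a single node (the neighbour sample, aggregation function, and weight shapes are all made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Feature matrix for a toy graph: 6 nodes, 4-dimensional features
X = rng.normal(size=(6, 4))

# Step 2: fixed-size sample of node 0's neighbours (indices are illustrative)
neighbours = [1, 3, 5]
sample = rng.choice(neighbours, size=2, replace=False)

# Step 3: aggregate the sampled neighbours' features with a mean aggregator
agg = X[sample].mean(axis=0)

# Step 4: concatenate with the node's own features, then a dense layer + ReLU
W = rng.normal(size=(8, 16))        # learnable weights (random here)
h0 = np.concatenate([X[0], agg])    # own features || aggregated neighbour features
h0_new = np.maximum(h0 @ W, 0)      # new 16-dimensional embedding for node 0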

Now, let’s see how we can implement a simple GraphSAGE model using the Spektral library in TensorFlow:

from spektral.layers import GraphSageConv
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dropout, Dense

# Define the model
class GraphSageModel(Model):
    def __init__(self, n_hidden, n_labels):
        super().__init__()
        self.sage_conv1 = GraphSageConv(n_hidden)   # aggregates sampled neighbour features
        self.sage_conv2 = GraphSageConv(n_labels)
        self.dropout = Dropout(0.5)                 # regularization on the input features
        self.dense = Dense(n_labels, activation='softmax')

    def call(self, inputs, training=False):
        x, a = inputs                               # node features and adjacency matrix
        x = self.dropout(x, training=training)
        x = self.sage_conv1([x, a])
        x = self.sage_conv2([x, a])
        return self.dense(x)

# Instantiate the model (dataset is assumed to be a loaded Spektral dataset)
model = GraphSageModel(n_hidden=64, n_labels=dataset.n_labels)

In this example, we define a simple GraphSAGE model with two GraphSAGE layers. The GraphSageConv layer provided by Spektral implements the neighborhood sampling and feature aggregation steps of GraphSAGE. We also include dropout for regularization, and a final dense layer with softmax activation for multi-class classification.
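
To train this model, the usual Keras workflow applies. Below is a rough training sketch, assuming the data comes from one of Spektral's citation datasets (e.g. Cora) loaded with a SingleLoader; any adjacency preprocessing that GraphSageConv may require is omitted, and the exact API can vary across Spektral versions:

from spektral.data import SingleLoader
from spektral.datasets.citation import Citation

# Assumption: a citation dataset such as Cora, which ships with train/val/test masks
dataset = Citation("cora")
model = GraphSageModel(n_hidden=64, n_labels=dataset.n_labels)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              weighted_metrics=["acc"])

# SingleLoader yields the whole graph as one batch; the boolean training mask is
# used as 0/1 sample weights so the loss only covers training nodes
loader = SingleLoader(dataset, sample_weights=dataset.mask_tr)
model.fit(loader.load(), steps_per_epoch=loader.steps_per_epoch, epochs=100)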

If you have the same underlying graph structure for all samples and only the node and/or edge features change, you can still use Graph Neural Networks. The GNN will learn to extract meaningful information from the features of the nodes and/or edges, and the graph structure will provide additional context for how those features are related. This is similar to how Convolutional Neural Networks (CNNs) operate on image data: the underlying grid structure of the image is always the same, but the pixel values (features) change from image to image.
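
As a rough illustration of this fixed-structure setting (all shapes and values below are made up), the same adjacency matrix can be paired with a different node-feature matrix for every sample:

import numpy as np

n_nodes, n_features = 10, 8

# One shared graph structure for all samples (a fixed, made-up adjacency matrix)...
a = np.random.binomial(1, 0.3, size=(n_nodes, n_nodes)).astype("float32")

# ...but different node features for every sample, e.g. sensor readings per time step
samples = [np.random.randn(n_nodes, n_features).astype("float32") for _ in range(5)]

# Each sample is then fed to the GNN as the pair (features, shared adjacency),
# e.g. model([x, a]) for any of the models in this post (the adjacency may need
# to be converted to a SparseTensor, depending on the layer)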

Feature Transformation in GraphSAGE: In the provided code example, the feature transformation is implicitly performed by the GraphSageConv layers. Each GraphSageConv layer in Spektral not only aggregates the features of the neighboring nodes, but also applies a transformation to the aggregated features. This transformation is typically a linear transformation (a fully connected layer), followed by a non-linear activation function. So, even though it's not explicitly shown in the code, the feature transformation step is indeed being performed within the GraphSageConv layers.

Inductive vs Transductive Learning in Graphs: The terms inductive and transductive learning refer to how a model generalizes to unseen data:

1. Transductive Learning: In the transductive setting, during training, the model has access to the entire graph, including all nodes and edges. The task is to learn the labels or features of some nodes given the labels or features of other nodes. However, the model cannot generalize to unseen nodes or graphs that were not present in the training graph. GCNs are an example of transductive learning models.

2. Inductive Learning: In the inductive setting, the model is trained on a subset of graphs or nodes and must generalize to unseen graphs or nodes. This is a more realistic setting for many real-world applications where the graph is dynamic and new nodes or edges may be added over time. GraphSAGE is an example of an inductive learning model. It learns a function that generates a node’s embedding based on its local neighborhood, and this function can be applied to any node in any graph, regardless of whether the node or graph was present in the training data.

In simpler terms, if you have a model that’s trained on a social network graph and you add a new user to the network, a transductive model like a GCN wouldn’t be able to generate an embedding for the new user, because it was trained on a fixed graph. On the other hand, an inductive model like GraphSAGE would be able to generate an embedding for the new user by sampling and aggregating features from the user’s friends in the network.

Graph Attention Networks (GATs)

Graph Attention Networks (GATs) introduce attention mechanisms to GNNs, allowing the model to learn to weigh the features of neighboring nodes differently. This can be particularly useful when some neighbors are more relevant than others for a given task.

The key idea behind GATs is to compute the weights for the edges (connections between nodes) using a self-attention mechanism. The attention mechanism allows the model to focus on the most relevant parts of the input and can be learned in a data-driven way.

Here’s a step-by-step process of how GATs work:

  1. Node Feature Initialization: As with GCNs and GraphSAGE, each node in the graph is initialized with a feature vector.
  2. Attention Mechanism: For each node, an attention coefficient is computed for each of its neighbors. This coefficient determines how much attention should be paid to each neighbor when updating the node’s features. The attention coefficients are computed using a shared attentional mechanism (typically a single-layer feedforward neural network, followed by a softmax function to ensure the coefficients sum to 1).
  3. Feature Aggregation: Each node aggregates the feature vectors of its neighbors, weighted by the attention coefficients. This results in a single feature vector for each node that is a weighted combination of its neighbors’ features.
  4. Feature Transformation: The aggregated feature vector is then transformed, typically using a linear transformation followed by a non-linear activation function.
  5. Repeat Steps 2–4: Steps 2–4 are repeated for a certain number of layers. With each layer, the nodes compute attention coefficients and aggregate features from their neighbors.
  6. Readout: After the final layer, a readout function is used to aggregate the feature vectors of all nodes in the graph to produce a graph-level output.
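
Here is a minimal NumPy sketch of steps 2–3 for one node (the weight matrix, attention vector, and neighbour set are all made up for illustration; for simplicity the node's own features are left out of its neighbourhood):

import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 4))      # 5 nodes, 4-dimensional features
W = rng.normal(size=(4, 8))      # shared linear transformation
a = rng.normal(size=(16,))       # attention vector of the single-layer attention network

def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

# Step 2: attention scores between node 0 and its neighbours (here: nodes 1, 2, 4)
neighbours = [1, 2, 4]
h0 = X[0] @ W
scores = np.array([leaky_relu(a @ np.concatenate([h0, X[j] @ W])) for j in neighbours])

# Softmax so the coefficients over the neighbourhood sum to 1
alphas = np.exp(scores) / np.exp(scores).sum()

# Step 3: weighted aggregation of the neighbours' transformed features
h0_new = sum(alpha * (X[j] @ W) for alpha, j in zip(alphas, neighbours))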

Now, let’s see how we can implement a simple GAT model using the Spektral library in TensorFlow:

from spektral.layers import GATConv
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dropout, Dense

# Define the model
class GATModel(Model):
    def __init__(self, n_hidden, n_labels):
        super().__init__()
        self.gat_conv1 = GATConv(n_hidden)      # attention-weighted neighbour aggregation
        self.gat_conv2 = GATConv(n_labels)
        self.dropout = Dropout(0.5)             # regularization on the input features
        self.dense = Dense(n_labels, activation='softmax')

    def call(self, inputs, training=False):
        x, a = inputs                           # node features and adjacency matrix
        x = self.dropout(x, training=training)
        x = self.gat_conv1([x, a])
        x = self.gat_conv2([x, a])
        return self.dense(x)

# Instantiate the model (dataset is assumed to be a loaded Spektral dataset)
model = GATModel(n_hidden=64, n_labels=dataset.n_labels)

In this example, we define a simple GAT model with two GAT layers. The GATConv layer provided by Spektral implements the attention mechanism and feature aggregation steps of GAT. We also include dropout for regularization, and a final dense layer with softmax activation for multi-class classification.
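
Spektral's GATConv layer also supports multi-head attention, as used in the original GAT paper. The small variation below is only a sketch: the attn_heads and concat_heads argument names reflect recent Spektral releases, and the layer sizes are illustrative, so check the documentation for your installed version:

from spektral.layers import GATConv

# 8 heads with concatenated outputs in the hidden layer, a single head in the output layer
gat_hidden = GATConv(64, attn_heads=8, concat_heads=True, activation="elu")
gat_output = GATConv(7, attn_heads=1, concat_heads=False, activation="softmax")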

ChebNet (Chebyshev Spectral Graph Convolution)

ChebNet is a type of spectral Graph Neural Network that uses Chebyshev polynomials to create a spectral graph convolution operation. It can capture information from a node’s wider neighborhood by including higher-degree polynomials.

Chebyshev Polynomials are a sequence of orthogonal polynomials that are defined recursively. They are named after Pafnuty Chebyshev, a Russian mathematician. They have many properties that make them useful in numerical mathematics, approximation theory, and spectral graph theory.

The first few Chebyshev polynomials are defined as follows:

  • T₀(x) = 1
  • T₁(x) = x
  • T₂(x) = 2x² - 1
  • T₃(x) = 4x³ - 3x

In general, the n-th Chebyshev polynomial can be defined recursively using the formula:

  • Tₙ(x) = 2x·Tₙ₋₁(x) - Tₙ₋₂(x)

Chebyshev polynomials have the property that they are bounded by -1 and 1 for x in the range [-1, 1]. This makes them useful for normalizing data.
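
The recurrence is easy to check numerically; here is a minimal sketch (the helper function below is just for illustration):

import numpy as np

def chebyshev(n, x):
    """Evaluate the n-th Chebyshev polynomial T_n(x) via the recurrence."""
    t_prev, t_curr = np.ones_like(x), x          # T_0(x) = 1, T_1(x) = x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

x = np.linspace(-1, 1, 5)
print(chebyshev(2, x))   # matches 2x^2 - 1
print(chebyshev(3, x))   # matches 4x^3 - 3x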

Spectral Analysis in the context of graph theory typically refers to the analysis of the graph Laplacian, which is a matrix that captures the structure of the graph. The eigenvalues and eigenvectors of the graph Laplacian can provide valuable insights into the properties of the graph, such as its connectivity and community structure.

In the context of Graph Neural Networks and specifically ChebNet, spectral analysis is used to perform graph convolutions in the spectral domain (i.e., the domain of the graph Laplacian’s eigenvalues and eigenvectors). The graph Laplacian is first normalized and then approximated using Chebyshev polynomials, which allows the model to capture information from a node’s wider neighborhood.

The key idea behind ChebNet is to perform these spectral graph convolutions using Chebyshev polynomials of the (rescaled) graph Laplacian. This avoids explicitly computing the Laplacian’s eigendecomposition and allows the model to capture more complex graph structures than many other GNN variants.
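
Here is a minimal NumPy sketch of the quantities involved, for a made-up adjacency matrix: the normalized Laplacian, and the rescaled Laplacian on which the Chebyshev polynomials are evaluated (so that its eigenvalues fall in [-1, 1], where the polynomials are bounded):

import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# Symmetrically normalized Laplacian: L = I - D^-1/2 A D^-1/2
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt

# Rescale so the eigenvalues lie in [-1, 1]
lmax = np.linalg.eigvalsh(L).max()
L_scaled = 2.0 * L / lmax - np.eye(3)

# A ChebNet layer then combines T_0(L_scaled) X, T_1(L_scaled) X, ..., T_K(L_scaled) X,
# each term with its own learnable weight matrix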

Here’s a step-by-step process of how ChebNet works:

  1. Node Feature Initialization: As with other GNNs, each node in the graph is initialized with a feature vector.
  2. Spectral Graph Convolution: Each node updates its feature vector by performing a convolution operation in the spectral domain. This operation is implemented using Chebyshev polynomials of the graph Laplacian, which allows the model to capture information from a node’s wider neighborhood.
  3. Feature Transformation: The updated feature vector is then transformed, typically using a linear transformation followed by a non-linear activation function.
  4. Repeat Steps 2 and 3: Steps 2 and 3 are repeated for a certain number of layers. With each layer, the nodes perform a spectral graph convolution and transform their features.
  5. Readout: After the final layer, a readout function is used to aggregate the feature vectors of all nodes in the graph to produce a graph-level output.

Now, let’s see how we can implement a simple ChebNet model using the Spektral library in TensorFlow:

from spektral.layers import ChebConv
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dropout, Dense

# Define the model
class ChebNetModel(Model):
    def __init__(self, n_hidden, n_labels, K):
        super().__init__()
        self.cheb_conv1 = ChebConv(n_hidden, K=K)   # Chebyshev convolution of order K
        self.cheb_conv2 = ChebConv(n_labels, K=K)
        self.dropout = Dropout(0.5)                 # regularization on the input features
        self.dense = Dense(n_labels, activation='softmax')

    def call(self, inputs, training=False):
        x, a = inputs                               # node features and adjacency matrix
        x = self.dropout(x, training=training)
        x = self.cheb_conv1([x, a])
        x = self.cheb_conv2([x, a])
        return self.dense(x)

# Instantiate the model (dataset is assumed to be a loaded Spektral dataset)
model = ChebNetModel(n_hidden=64, n_labels=dataset.n_labels, K=2)

In this example, we define a simple ChebNet model with two ChebNet layers. The ChebConv layer provided by Spektral implements the spectral graph convolution using Chebyshev polynomials. The parameter K is the order of the Chebyshev polynomials, which determines the size of the neighborhood considered by the convolution. We also include dropout for regularization, and a final dense layer with softmax activation for multi-class classification.

Graph Isomorphism Networks (GINs)

The Graph Isomorphism Network (GIN) is a type of GNN designed to be as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test, a classical algorithm used to test if two graphs are isomorphic (i.e., identical apart from the naming of nodes). GINs can capture more complex graph structures than many other GNN variants.

Here’s a step-by-step process of how GINs work:

  1. Node Feature Initialization: As with other GNNs, each node in the graph is initialized with a feature vector.
  2. Feature Aggregation: Each node aggregates the feature vectors of its neighboring nodes using a sum. Unlike many other GNN variants, GINs also include the node’s own features in the aggregation.
  3. Feature Transformation: The aggregated feature vector is then transformed, typically using a linear transformation followed by a non-linear activation function. The transformation includes a learnable parameter that allows the model to control the importance of the node’s own features versus its neighbors’ features.
  4. Repeat Steps 2 and 3: Steps 2 and 3 are repeated for a certain number of layers. With each layer, the nodes aggregate and transform features from a larger and larger neighborhood.
  5. Readout: After the final layer, a readout function is used to aggregate the feature vectors of all nodes in the graph to produce a graph-level output.
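
Here is a minimal NumPy sketch of steps 2–3 for one node, using the GIN-style update h_v' = MLP((1 + ε)·h_v + Σ h_u) with a made-up neighbourhood and a one-layer stand-in for the MLP:

import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 4))     # 5 nodes, 4-dimensional features
eps = 0.1                       # learnable in a real GIN; fixed here for illustration
W = rng.normal(size=(4, 8))     # stand-in for the MLP (a single linear layer + ReLU)

# Step 2: sum-aggregate the neighbours of node 0 (here: nodes 1 and 3),
# with the node's own features weighted by (1 + eps)
neighbours = [1, 3]
agg = (1 + eps) * X[0] + X[neighbours].sum(axis=0)

# Step 3: transform the aggregated vector
h0_new = np.maximum(agg @ W, 0)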

Now, let’s see how we can implement a simple GIN model using the Spektral library in TensorFlow:

from spektral.layers import GINConv
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dropout, Dense

# Define the model
class GINModel(Model):
    def __init__(self, n_hidden, n_labels):
        super().__init__()
        self.gin_conv1 = GINConv(n_hidden)      # sum aggregation + MLP transformation
        self.gin_conv2 = GINConv(n_labels)
        self.dropout = Dropout(0.5)             # regularization on the input features
        self.dense = Dense(n_labels, activation='softmax')

    def call(self, inputs, training=False):
        x, a = inputs                           # node features and adjacency matrix
        x = self.dropout(x, training=training)
        x = self.gin_conv1([x, a])
        x = self.gin_conv2([x, a])
        return self.dense(x)

# Instantiate the model (dataset is assumed to be a loaded Spektral dataset)
model = GINModel(n_hidden=64, n_labels=dataset.n_labels)

In this example, we define a simple GIN model with two GIN layers. The GINConv layer provided by Spektral implements the feature aggregation and transformation steps of GIN. We also include dropout for regularization, and a final dense layer with softmax activation for multi-class classification.
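
The readout step from the list above is not needed in this node-level example. For a graph-level task, a common option is a global pooling layer; below is a rough sketch using Spektral's GlobalSumPool in disjoint mode, where a batch of graphs (e.g. from a DisjointLoader) arrives as node features x, adjacency a, and a graph-index vector i. Details may vary across Spektral versions:

from spektral.layers import GINConv, GlobalSumPool
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout

class GINGraphModel(Model):
    """Sketch of a graph-level GIN: node embeddings are summed per graph
    (the readout) before classification."""

    def __init__(self, n_hidden, n_labels):
        super().__init__()
        self.gin_conv1 = GINConv(n_hidden)
        self.gin_conv2 = GINConv(n_hidden)
        self.readout = GlobalSumPool()          # sums node embeddings per graph
        self.dropout = Dropout(0.5)
        self.dense = Dense(n_labels, activation='softmax')

    def call(self, inputs, training=False):
        x, a, i = inputs                        # node features, adjacency, graph index
        x = self.gin_conv1([x, a])
        x = self.gin_conv2([x, a])
        x = self.readout([x, i])                # readout: one vector per graph
        x = self.dropout(x, training=training)
        return self.dense(x)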


Krishna Chaitanya

Krishna is a researcher specializing in deep learning, control theory, and machine learning to optimize data-driven models for off-road mobile robots.