Impact of Hybrid Structured Random Projections on GNN Performance
Team
- E/20/133, Tharusha Haththella, e20133@eng.pdn.ac.lk
- E/20/305, Sachindu Premasiri, e20305@eng.pdn.ac.lk
- E/20/381, Nimesha Somathilaka, e20381@eng.pdn.ac.lk
Supervisor
- Dr. Eng. Sampath Deegalla, sampath@eng.pdn.ac.lk
Table of content
- Abstract
- Related works
- Methodology
- Experiment Setup and Implementation
- Results and Analysis
- Conclusion
- Publications
- Links
Abstract
Graph representation learning has become an important approach for analyzing complex networked data in tasks such as node classification and link prediction. However, many graph embedding methods rely on dense random projections, which can introduce higher computational overhead and reduced efficiency when scaling to large graphs. This project investigates whether hybrid structured random projections can provide a better balance between efficiency and predictive performance.
The work is centered around the FastRP graph embedding framework and compares three projection strategies: a Gaussian baseline, a striped structured sparse projection, and a hybrid projection that combines structural information with node feature information. Experiments are conducted on benchmark datasets including BlogCatalog, Flickr, Cora, and ogbn-arxiv. Hyperparameters are optimized using Optuna, and downstream performance is evaluated mainly using Macro-F1 and classification accuracy. The results indicate that hybrid and structured projections can remain competitive with, and in some cases outperform, conventional dense Gaussian projections while offering a more scalable direction for graph representation learning.
Related works
Graph embedding methods transform graph nodes into low-dimensional vector representations while preserving structural and semantic relationships. Among these methods, FastRP is a lightweight and scalable embedding approach that generates node representations using random projections and iterative neighborhood aggregation. It is especially attractive for large graphs because it avoids the heavy training cost of deeper graph neural network architectures.
Despite these advantages, dense Gaussian random projections can still impose computational and memory overhead. This has motivated interest in structured sparse random projections, where the projection matrix is designed to be computationally lighter and more efficient.
In addition, recent graph learning research has shown the value of combining graph structure with node feature information. While structure-only embeddings are effective in many settings, feature-rich datasets may benefit significantly from hybrid methods. This project therefore investigates both structured sparse projections and a hybrid structure-feature projection strategy within the same FastRP-based pipeline.
Methodology
The proposed methodology studies how the design of the projection matrix affects graph embedding quality and downstream performance.
The overall workflow is as follows:
- Load the graph structure and, when available, node features.
- Construct a projection matrix using one of the following strategies:
- Gaussian Projection – dense random baseline.
- Striped Structured Projection – sparse structured matrix designed to reduce computational overhead.
- Hybrid Projection – combines structural embedding generation with node feature information.
- Generate node embeddings through iterative neighborhood aggregation.
- Use the embeddings for downstream graph learning tasks such as:
- Node classification
- Link prediction
- Evaluate performance using appropriate metrics.
The main objective is to determine whether structured sparse projections can preserve or improve the representational quality of dense Gaussian projections while reducing overhead. A secondary objective is to evaluate whether hybridizing structural and feature information leads to stronger predictive performance on feature-rich datasets.
Experiment Setup and Implementation
The implementation is carried out in Python using graph learning and scientific computing libraries. The project includes notebook-based experimentation and modular code for data loading, embedding generation, tuning, and downstream evaluation.
Datasets
The experiments are conducted on multiple benchmark datasets with different graph sizes and characteristics:
- BlogCatalog – social network dataset used for node classification
- Flickr – large social/content-sharing graph used for node classification
- Cora – citation network benchmark
- ogbn-arxiv – large-scale citation graph from the Open Graph Benchmark
- Additional citation-style datasets for link prediction evaluation
Experimental pipeline
The evaluation pipeline includes:
- Graph preprocessing and loading
- Sparse or dense projection matrix construction
- Embedding generation using FastRP-style propagation
- Hyperparameter tuning using Optuna
- Node classification using downstream classifiers
- Link prediction evaluation using train/validation/test edge splits and negative sampling
Tuned parameters
Key tunable parameters include:
- Embedding dimension
- Window size
- Normalization
- Structured grouping parameter
g
This setup enables fair comparison between the Gaussian baseline, striped structured projection, and hybrid projection strategies.
Results and Analysis
The experimental results show that the effectiveness of a projection method depends on both the dataset and the availability of node features.
1. Node classification on Cora
| Model | Score |
|---|---|
| Gaussian Baseline | 0.7232 |
| Striped Structured Projection | 0.7356 |
| Hybrid Projection | 0.7541 |
For Cora, both structured and hybrid approaches outperform the Gaussian baseline. The hybrid model achieves the highest score, showing that integrating feature information with structural propagation can improve embedding quality.
2. Large-scale node classification on ogbn-arxiv
| Model | Train Accuracy | Validation Accuracy | Test Accuracy |
|---|---|---|---|
| Gaussian Baseline | 0.5149 | 0.5080 | 0.5211 |
| Striped HRP | 0.5115 | 0.5063 | 0.5184 |
| Hybrid HRP | 0.7468 | 0.6970 | 0.6891 |
On ogbn-arxiv, the hybrid model clearly outperforms both the Gaussian and striped structure-only variants. This suggests that hybrid structure-feature embeddings are especially beneficial on large graphs with informative node attributes.
3. Flickr observation
| Model | Macro-F1 |
|---|---|
| Gaussian Baseline | 0.2030 |
| Striped Structured Projection | 0.1961 |
In this particular Flickr experiment, the structured sparse variant performed slightly below the Gaussian baseline. This indicates that structured sparsity is promising but can be sensitive to dataset properties and hyperparameter choices.
4. General observations
- The Hybrid projection shows the strongest overall performance in the current experiments.
- The Striped structured projection can be competitive and sometimes outperform the Gaussian baseline.
- The Gaussian projection remains a strong baseline for comparison.
- The results suggest that carefully designed hybrid structured projections can improve predictive quality while supporting scalability.
5. Link prediction
A complete link prediction pipeline was also implemented using edge splitting, negative sampling, and Optuna-based tuning. This extends the project beyond node classification and demonstrates the applicability of the proposed projection strategies across multiple graph learning tasks.
Conclusion
This project investigated whether structured and hybrid random projection strategies can improve the efficiency-performance trade-off in graph embedding methods. The findings show that projection design has a measurable effect on downstream graph learning quality.
The hybrid structure-feature projection consistently produced the strongest results in the current experiments, especially on feature-rich datasets such as Cora and ogbn-arxiv. Structured sparse projections also demonstrated promise as a scalable alternative to dense Gaussian projections, although their performance depends on the dataset and tuning configuration.
Overall, the project shows that hybrid structured random projections are a promising direction for scalable graph representation learning and provide a solid basis for future exploration in efficient graph embedding design.