Predictive Link Stability and Trust-Aware Routing in FANETs

A machine learning–based routing protocol that uses Q-learning to enable resilient and self-optimizing routing in Flying Ad-hoc Networks (FANETs)

View GitHub Repository

Our Team

K.C.H.N.A.W.M.R.C.J.N. Bandara
E/20/035
e20035@eng.pdn.ac.lk
P.A.S.V. Jayasooriya
E/20/173
e20173@eng.pdn.ac.lk
N.K.D.P. Jayawardena
E/20/178
e20178@eng.pdn.ac.lk

Supervisors

Dr. Suneth Namal Karunarathna
namal@eng.pdn.ac.lk
Dr. Upul Jayasinghe
upuljm@eng.pdn.ac.lk

Project Summary

Flying Ad Hoc Networks (FANETs), composed of Unmanned Aerial Vehicles (UAVs), have diverse civilian and military applications but suffer from major communication challenges caused by high-speed three-dimensional mobility, frequent topology changes, and intermittent links. These dynamics degrade Quality of Service through increased packet loss and delay, while the decentralized and potentially adversarial nature of FANETs demands strong security mechanisms to detect malicious or non-cooperative nodes.

This project proposes QBR (Q-Learning Based Routing), a unified predictive link stability and trust-aware routing framework for FANETs. Unlike conventional approaches that treat link stability and trust as independent components, QBR integrates cross-layer link metrics, a composite trust model, and distributional reinforcement learning through a Quantile Regression Deep Q-Network (QR-DQN) into a single cohesive architecture. By moving from reactive routing to an adaptive, uncertainty-aware decision-making paradigm, the framework jointly optimizes link quality and node reliability in real time.

The framework is evaluated in a simulation environment using NS-3 against OLSR and AODV baselines. Results confirm that QBR achieves higher Packet Delivery Ratio, more stable end-to-end delay, and improved resilience under adversarial conditions. Control overhead is managed through Multipoint Relay optimization, message piggybacking, and threshold-based suppression mechanisms, ensuring performance gains are achieved without disproportionate communication costs.

1. Abstract

Flying Ad Hoc Networks (FANETs), composed of Unmanned Aerial Vehicles (UAVs), are increasingly critical for mission-oriented applications ranging from disaster response to intelligent surveillance. However, their high-speed three-dimensional mobility, frequent topology changes, and exposure to adversarial threats pose significant challenges for reliable and secure routing. Existing approaches largely treat link stability and trust evaluation as independent problems, limiting their effectiveness in dynamic aerial environments. This paper proposes a unified predictive link stability and trust-aware routing framework, termed QBR, which integrates cross-layer link metrics, a composite trust model, and distributional reinforcement learning through a Quantile Regression Deep Q-Network (QR-DQN). The composite trust model combines direct, indirect, and path-level trust components, while the multi-metric state representation incorporates SNR, ETX, delay, jitter, and node mobility parameters. Control overhead is mitigated through Multipoint Relay optimization, message piggybacking, and threshold-based suppression mechanisms. Simulation results demonstrate that QBR consistently outperforms OLSR and AODV in Packet Delivery Ratio, end-to-end delay, jitter, and scalability, while exhibiting improved resilience under adversarial conditions. The findings confirm that jointly optimizing link quality and trust through uncertainty-aware learning yields more stable and secure routing in highly dynamic FANET environments.

3. Methodology

Baseline Protocol and Design Motivation

The proposed framework was benchmarked against OLSR and AODV to establish a performance reference. OLSR operates proactively through Multipoint Relays but lacks predictive capabilities and trust awareness. AODV establishes routes reactively on demand, making it susceptible to frequent route discovery overhead under high mobility. Neither protocol incorporates link quality prediction or trust-aware decision-making, reflecting the core limitations motivating the development of QBR.

Reinforcement Learning-Based Routing Framework

The routing problem is formulated as a Markov Decision Process and solved using reinforcement learning. The state space represents current network conditions including link quality metrics and trust values. The action space corresponds to next-hop node selection, and the reward signal provides feedback based on routing performance and link reliability. A standard Q-learning model was initially implemented before being extended to a Quantile Regression Deep Q-Network (QR-DQN), which models the full distribution of expected returns rather than a scalar estimate. This enables uncertainty-aware, risk-sensitive routing decisions under stochastic link variations and mobility-induced unpredictability.
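The tabular Q-learning stage described above can be sketched as follows. This is a minimal illustration, not the QBR implementation: the state here is an opaque key, whereas the actual state combines link-quality metrics and trust values, and the hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed, not the tuned QBR values):
# learning rate, discount factor, and epsilon-greedy exploration rate.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q[(state, next_hop)] -> estimated return of forwarding via that neighbor.
Q = defaultdict(float)

def select_next_hop(state, neighbors):
    """Epsilon-greedy next-hop selection over the current neighbor set."""
    if random.random() < EPSILON:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: Q[(state, n)])

def update(state, next_hop, reward, next_state, next_neighbors):
    """Standard one-step Q-learning update toward the bootstrapped target."""
    best_next = max((Q[(next_state, n)] for n in next_neighbors), default=0.0)
    Q[(state, next_hop)] += ALPHA * (reward + GAMMA * best_next - Q[(state, next_hop)])
```

The QR-DQN extension replaces the single scalar `Q[(state, next_hop)]` with a set of learned quantiles of the return distribution, which is what allows risk-sensitive action selection under link uncertainty.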

Link Stability Modeling and Metric Selection

Routing decisions are based on a comprehensive multi-metric evaluation framework. The incorporated metrics include Packet Delivery Ratio, End-to-End Delay, Jitter, Throughput, Expected Transmission Count (ETX), Signal-to-Noise Ratio (SNR), and node mobility parameters (position and velocity). A key contribution is the cross-layer integration of SNR, measured at the physical layer, into the network-layer routing process, enabling more accurate link quality estimation and anticipatory decision-making before link degradation manifests as packet loss.
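Of the metrics above, ETX has a standard closed form worth stating: it is the expected number of transmissions (including retransmissions) needed to deliver a packet over a link, given the forward and reverse delivery ratios measured from probe packets. A minimal sketch:

```python
def etx(forward_delivery: float, reverse_delivery: float) -> float:
    """Expected Transmission Count for a link.

    forward_delivery (df): probability the data packet reaches the neighbor.
    reverse_delivery (dr): probability the link-layer ACK returns.
    ETX = 1 / (df * dr); a dead link in either direction yields infinity.
    """
    if forward_delivery <= 0.0 or reverse_delivery <= 0.0:
        return float("inf")
    return 1.0 / (forward_delivery * reverse_delivery)
```

A perfect bidirectional link gives ETX = 1; a link with 50% delivery in each direction gives ETX = 4, so lower values indicate more stable links.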

Reward Function Design

The reward function is formulated as a multi-objective optimization function balancing reliability, latency, stability, and security. Positive rewards are assigned to routing decisions selecting stable, high-quality, and trustworthy links, while penalties are applied for link failures, packet drops, selection of low-trust nodes, and paths with high energy consumption. Link Expiration Time is included as a reward component, enabling the agent to prefer routes with longer expected lifetimes and reducing the frequency of route failures.
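The multi-objective shape of the reward can be sketched as a weighted sum of the terms listed above. The weights and exact term forms here are assumptions for illustration, not the tuned values used in QBR:

```python
def routing_reward(delivered: bool, delay_s: float, trust: float,
                   link_expiration_s: float, energy_cost: float,
                   w_delay: float = 10.0, w_trust: float = 1.0,
                   w_let: float = 0.1, w_energy: float = 0.5) -> float:
    """Illustrative multi-objective reward (weights are assumed)."""
    r = 1.0 if delivered else -1.0      # reliability: reward delivery, penalize drops
    r -= w_delay * delay_s              # latency penalty
    r += w_trust * (trust - 0.5)        # reward trustworthy, penalize low-trust hops
    r += w_let * link_expiration_s      # prefer links with longer expected lifetime
    r -= w_energy * energy_cost         # penalize energy-hungry paths
    return r
```

The Link Expiration Time term is what biases the agent toward routes that will survive longer, reducing the frequency of route failures under mobility.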

Trust-Aware Routing Mechanism

A composite trust model is incorporated into the routing framework. The trust value of each node is computed as a weighted combination of direct trust derived from first-hand behavioral observations, indirect trust obtained from neighboring node recommendations, and path trust representing cumulative trustworthiness along an entire routing path. The trust score is integrated into both the state space and reward function of QR-DQN. A threshold-based filtering mechanism progressively excludes nodes below a predefined trust level from routing candidates, producing a self-correcting behavior where malicious or non-cooperative nodes are gradually isolated without centralized coordination.
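The weighted combination and threshold filtering described above can be sketched as follows; the weight split and threshold value are illustrative assumptions, not the calibrated QBR parameters:

```python
def composite_trust(direct: float, indirect: float, path: float,
                    w_d: float = 0.5, w_i: float = 0.3, w_p: float = 0.2) -> float:
    """Weighted combination of direct, indirect, and path trust in [0, 1].

    Weights are assumed for illustration and should sum to 1.
    """
    return w_d * direct + w_i * indirect + w_p * path

def filter_candidates(neighbor_trust: dict, threshold: float = 0.4) -> list:
    """Threshold-based filtering: drop neighbors whose composite trust
    has fallen below the cutoff, so they stop appearing as next-hop
    candidates without any centralized blacklist."""
    return [n for n, t in neighbor_trust.items() if t >= threshold]
```

Because filtered nodes also stop earning positive behavioral observations, their direct trust cannot recover through routing activity, which produces the gradual, self-correcting isolation described above.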

Control Overhead Optimization

To address resource constraints, control overhead is minimized through Multipoint Relay optimization for efficient topology dissemination, piggybacking of neighbor and trust information within Topology Control messages, and HELLO message suppression to eliminate redundant periodic transmissions. A threshold-based control mechanism triggers updates only when meaningful changes in network state are detected, ensuring overhead does not scale linearly with network size.
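The threshold-based trigger can be sketched as a small stateful check: a control update is emitted only when the tracked metric has drifted by more than a configured margin since the last emitted update. The threshold value is an assumption for illustration:

```python
class ThresholdedUpdater:
    """Suppress control messages until the tracked metric changes
    meaningfully relative to the last value that was actually sent."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.last_sent = None  # value at the last emitted update

    def should_update(self, value: float) -> bool:
        """Return True (and record the value) only on meaningful change."""
        if self.last_sent is None or abs(value - self.last_sent) > self.threshold:
            self.last_sent = value
            return True
        return False
```

Comparing against the last *sent* value rather than the previous sample is the detail that prevents slow drift from silently accumulating without ever triggering an update.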

Methodology Diagram

4. Results and Analysis

Reliability — Packet Delivery Ratio

QBR maintains PDR values predominantly within the range of approximately 0.88 to 0.96 across the simulation duration, whereas OLSR fluctuates between 0.72 and 0.88, and AODV ranges between 0.60 and 0.79. QBR exhibits a smoother trajectory with limited fluctuations, demonstrating that the QR-DQN model effectively stabilizes routing decisions over time by anticipating link degradation rather than reacting to failures after they occur.

Latency — End-to-End Delay

Following the initial convergence phase, QBR stabilizes and maintains delay values in the range of approximately 0.0023–0.0024 seconds, consistently performing better than OLSR and comparably to AODV after stabilization. The predictive nature of QR-DQN reduces retransmissions and route rediscovery overhead, which are major contributors to delay in conventional protocols.

Scalability under Increasing Node Density

QBR demonstrates a monotonic upward trend in PDR as node density increases, effectively leveraging greater path diversity through intelligent routing decisions. In contrast, both OLSR and AODV exhibit downward trends, with AODV performing worst at higher densities due to frequent route discovery overhead and link breakages. The performance gap between QBR and baseline protocols widens as the number of nodes increases, confirming scalability advantages.

Trust-Aware Routing under Adversarial Conditions

In the presence of malicious nodes, OLSR experiences noticeable degradation in PDR and increased delay due to its inability to differentiate cooperative from non-cooperative nodes. QBR maintains relatively stable performance by integrating trust into both the state representation and reward function, ensuring routing decisions inherently penalize nodes with declining trust scores. The system exhibits self-correcting behavior where malicious nodes are progressively isolated without explicit blacklisting.

Result 1
Result 2
Result 3
Result 4

5. Conclusion

This paper presented QBR, a unified predictive link stability and trust-aware routing framework for FANETs, integrating cross-layer link metrics, a composite trust model, and distributional reinforcement learning via Quantile Regression Deep Q-Network. By transitioning from reactive routing to an adaptive, uncertainty-aware decision-making paradigm, the proposed framework jointly optimizes link quality and node reliability within a single cohesive architecture.

Simulation results confirmed consistent improvements over baseline protocols across all evaluated performance dimensions. QBR achieved a higher Packet Delivery Ratio, quicker convergence to low end-to-end delay, and more stable delay performance than both OLSR and AODV. Scalability analysis demonstrated that QBR effectively leverages increased network density through intelligent routing decisions, while threshold-based trust filtering enabled reliable isolation of malicious nodes without explicit blacklisting, reflecting a self-correcting routing behavior that conventional protocols lack. Control overhead optimizations, including MPR refinement, message piggybacking, and HELLO suppression, ensured that these gains were achieved without disproportionate communication costs.

It is important to acknowledge that all evaluations were conducted under the Gauss–Markov mobility model within a controlled simulation environment, and results should be interpreted carefully when generalizing to real-world deployments involving hardware constraints, channel fading, and heterogeneous node behavior. Future work will focus on validating the framework under diverse mobility patterns and on physical UAV platforms, automating trust parameter tuning through adaptive learning techniques, and strengthening resilience against collusion-based adversarial behaviors. These directions collectively represent the pathway toward practical, secure, and reliable FANET communication for next-generation autonomous UAV applications.