2025

Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL

Lipeng Zu, Hansong Zhou, Xiaonan Zhang

arXiv:2511.03695 [cs.LG]

Offline reinforcement learning (RL) enables training from fixed data without online interaction, but policies learned offline often struggle when deployed in dynamic environments due to distributional shift and unreliable value estimates on unseen state-action pairs. We introduce Behavior-Adaptive Q-Learning (BAQ), a framework designed to enable a smooth and reliable transition from offline to online RL. The key idea is to leverage an implicit behavioral model derived from offline data to provide a behavior-consistency signal during online fine-tuning. BAQ incorporates a dual-objective loss that (i) aligns the online policy with the offline behavior when uncertainty is high, and (ii) gradually relaxes this constraint as more confident online experience accumulates. This adaptive mechanism reduces error propagation from out-of-distribution estimates, stabilizes early online updates, and accelerates adaptation to new scenarios. Across standard benchmarks, BAQ consistently outperforms prior offline-to-online RL approaches, achieving faster recovery, improved robustness, and higher overall performance. Our results demonstrate that implicit behavior adaptation is a principled and practical solution for reliable real-world policy deployment.
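As a rough illustration of such a dual-objective loss, the sketch below combines a TD term with an uncertainty-weighted behavior-consistency term that fades as confidence grows. The uncertainty signal, the weighting schedule, and all names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def baq_style_loss(q_pred, q_target, pi_actions, behavior_actions,
                   uncertainty, lam_max=1.0):
    """Illustrative dual-objective loss: TD error plus an uncertainty-weighted
    behavior-consistency term (assumed form, not the paper's exact loss)."""
    td_loss = F.mse_loss(q_pred, q_target)
    # Behavior consistency: pull the online policy's actions toward actions
    # proposed by the implicit behavioral model learned from offline data.
    bc_loss = F.mse_loss(pi_actions, behavior_actions)
    # High uncertainty -> strong constraint; confident experience -> relaxed.
    lam = lam_max * uncertainty.clamp(0.0, 1.0).mean()
    return td_loss + lam * bc_loss
```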
Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction

Lipeng Zu, Hansong Zhou, Xiaonan Zhang

arXiv:2511.03836 [cs.LG]

Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, the target updates in DQN often rely on next states generated by actions from a past, potentially suboptimal, policy. As a result, these states may not provide informative learning signals, introducing high variance into the update process. This issue is exacerbated when the sampled transitions are poorly aligned with the agent's current policy. To address this limitation, we propose the Successor-state Aggregation Deep Q-Network (SADQ), which explicitly models environment dynamics using a stochastic transition model. SADQ integrates successor-state distributions into the Q-value estimation process, enabling more stable and policy-aligned value updates. It also exploits the modeled transition structure for more efficient action selection. We provide theoretical guarantees that SADQ maintains unbiased value estimates while reducing training variance. Our extensive empirical results across standard RL benchmarks and real-world vector-based control tasks demonstrate that SADQ consistently outperforms DQN variants in both stability and learning efficiency.
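A minimal sketch of what a successor-state-aggregated TD target could look like, assuming a learned stochastic transition model that yields K sampled successor states per transition; the sampling scheme and shapes are illustrative assumptions.

```python
import torch

@torch.no_grad()
def aggregated_td_target(reward, done, next_state_samples, target_q, gamma=0.99):
    """Illustrative SADQ-style target: average the bootstrapped value over
    sampled successor states instead of using the single logged next state.

    next_state_samples: (batch, K, state_dim), drawn from a learned
    stochastic transition model T(s' | s, a) (assumed interface).
    """
    B, K, D = next_state_samples.shape
    q_next = target_q(next_state_samples.reshape(B * K, D))  # (B*K, num_actions)
    # Aggregate over the K successor samples to reduce target variance.
    v_next = q_next.max(dim=-1).values.reshape(B, K).mean(dim=1)
    return reward + gamma * (1.0 - done) * v_next
```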
From Static Constraints to Dynamic Adaptation: Enabling Safe Constraint Release in Offline-to-Online Reinforcement Learning

Submitted to ICLR '26: International Conference on Learning Representations

Transitioning from offline to online reinforcement learning (RL) is challenging because the conservative objectives used by offline algorithms must be gradually released during fine-tuning. Naively removing these objectives often destabilizes training, while keeping them uniformly suppresses adaptation. To address this problem, we propose Dynamic Alignment for Release (DARE), a method that estimates the alignment of each sample with the offline policy and uses this signal to decide where conservative objectives should be released during fine-tuning. DARE first trains a state-conditional diffusion model to capture the offline behavioral distribution and incorporates Q-based energy guidance to improve the quality of this distribution modeling. For each sample, the KL divergence between the generated and actual actions is computed, and the resulting divergences are fitted with a Gaussian to determine a dynamic exchange threshold. This enables conservative objectives to be relaxed selectively by partitioning each batch into offline-like and online-like subsets, which are then optimized with offline and online objectives, respectively. We integrate DARE into Cal-QL and IQL and evaluate it on the D4RL benchmark. Theoretical analysis proves that DARE contracts the offline/online discrepancy while keeping value estimation stable, and empirical evaluations show that DARE consistently improves both adaptability and training stability. (Code will be released upon acceptance.)
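A minimal sketch of the Gaussian-thresholded batch partition described above, assuming per-sample KL values are already computed; the mu + k*sigma threshold form and the k knob are assumptions for illustration.

```python
import torch

def partition_batch(kl_divergences, k=1.0):
    """Illustrative DARE-style split: fit a Gaussian to per-sample KL values
    and use mu + k*sigma as a dynamic threshold. Samples at or below the
    threshold are treated as offline-like (keep the conservative objective);
    the rest as online-like (release it). The k multiplier is an assumed knob."""
    mu, sigma = kl_divergences.mean(), kl_divergences.std()
    threshold = mu + k * sigma
    offline_like = kl_divergences <= threshold
    online_like = ~offline_like
    return offline_like, online_like
```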
Fairness-Oracular MARL with Competitor-Aware Signals for Collaborative Inference

Hansong Zhou, Xiaonan Zhang

NeurIPS '25 - AI4NextG: The Thirty-Ninth Annual Conference on Neural Information Processing Systems

Collaborative inference (CI) in NextG networks enables battery-powered devices to collaborate with nearby edges on deep learning inference. The fairness issue in a multi-device multi-edge (M2M) CI system remains underexplored. Mean-field multi-agent reinforcement learning (MFRL) is a promising solution due to its low complexity and adaptability to system dynamics. However, device mobility in M2M CI systems hinders its effectiveness, as it breaks the premise of stable mean-field statistics. We propose FOCI (Fairness-Oriented Collaborative Inference), an RL-based method with two components: (i) an oracle-shaping reward for approaching max-min fairness, and (ii) a competitor-aware observation augmentation for stabilizing device behaviors. We provide a convergence guarantee with bounded estimation errors. Evaluated on real-world device mobility traces, FOCI achieves the best performance on multiple metrics and tightens the tails: it reduces worst-case latency by up to 56% and worst-case energy by 46% versus baselines, while halving the switch cost and preserving competitive QoS.
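One plausible instantiation of a max-min-oriented shaped reward, shown only to make the fairness objective concrete; the blend with the worst-off device's utility and the alpha weight are assumptions, not the paper's oracle-shaping design.

```python
def shaped_reward(own_utility, all_utilities, alpha=0.5):
    """Illustrative max-min-oriented shaping: blend each device's own utility
    with the worst-off device's utility, so agents are also rewarded for
    lifting the minimum. The blend weight alpha is an assumed parameter."""
    worst = min(all_utilities)
    return (1.0 - alpha) * own_utility + alpha * worst
```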
Similarity-Guided Rapid Deployment of Federated Intelligence Over Heterogeneous Edge Computing

Hansong Zhou, Jingjing Fu, Yukun Yuan, Linke Guo, Xiaonan Zhang

INFOCOM '25: IEEE Conference on Computer Communications

Edge computing is envisioned to enable rapid federated intelligence on edge devices to satisfy their dynamically changing AI service demands. Semi-Asynchronous FL (Semi-Async FL) enables distributed learning in an asynchronous manner, where the server does not have to wait for all local models before improving the global model. Hence, it takes less time to train a global model well. However, system heterogeneity in edge computing results in a staleness issue, which deteriorates training accuracy. In this paper, we propose to accelerate Semi-Async FL while ensuring training accuracy by designing a Similarity-Aware Aggregation (SAA) strategy. SAA enhances the aggregation quality and thus decreases the wall-clock time, i.e., the training time required to reach a target accuracy. In particular, we leverage global model similarity to describe each local model's influence and let those with higher influence contribute more to global aggregation. We further measure the similarity between global model update deviations as directional similarity, which is then used to determine aggregation timing. We theoretically provide a convergence analysis of SAA. Our extensive experimental results empirically show that the proposed SAA strategy reduces wall-clock time by up to 53.7% and wall-clock rounds by up to 59.4% for Semi-Async FL compared with several benchmark schemes.
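A minimal sketch of similarity-weighted aggregation under simplified assumptions: each update is a single flattened tensor, similarity is cosine similarity to the current global update direction, and softmax weighting is an assumed choice rather than the paper's exact influence measure.

```python
import torch
import torch.nn.functional as F

def similarity_aware_aggregate(local_updates, global_update):
    """Illustrative SAA-style aggregation: weight each local update by its
    cosine similarity to the global update, so better-aligned (higher
    influence) updates contribute more. Assumed simplification of SAA."""
    sims = torch.stack([
        F.cosine_similarity(u.flatten(), global_update.flatten(), dim=0)
        for u in local_updates
    ])
    weights = torch.softmax(sims, dim=0)  # assumed normalization choice
    return sum(w * u for w, u in zip(weights, local_updates))
```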
Non-Intrusive Speaker Diarization via mmWave Sensing

Shaoying Wang, Hansong Zhou, Yukun Yuan, Xiaonan Zhang

Sensys '25: Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems (Poster)

Speaker diarization refers to identifying who says what in a conversation. It is critical in sensitive settings like psychological counseling and legal consultations. However, traditional approaches, such as microphone- or video-based recording, raise privacy concerns and cause discomfort to participants due to their noticeable deployment. To address this, we propose a non-intrusive speaker diarization system based on mmWave sensing. Our approach leverages the spatial diversity of signals from multiple objects to distinguish speakers. Specifically, it isolates the signals of speech-induced vibrating objects and extracts speaker-related features through a two-stage feature extraction process. Our system achieves over 93% accuracy in real-world scenarios, demonstrating its effectiveness in reliably distinguishing speakers.

2024

FedAR: Addressing client unavailability in federated learning with local update approximation and rectification

Chutian Jiang, Hansong Zhou, Xiaonan Zhang, Shayok Chakraborty

ECML PKDD '24: Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Federated learning (FL) enables clients to collaboratively train machine learning models under the coordination of a server in a privacy-preserving manner. One of the main challenges in FL is that the server may not receive local updates from each client in each round due to client resource limitations and intermittent network connectivity. The presence of unavailable clients severely deteriorates overall FL performance. In this paper, we propose FedAR, a novel client update Approximation and Rectification algorithm for FL that addresses the client unavailability issue. FedAR involves all clients in the global model update to achieve a high-quality global model on the server, which also furnishes accurate predictions for each client. To this end, the server uses the latest update from each client as a surrogate for its current update. It then assigns a different weight to each client's surrogate update when deriving the global model, in order to guarantee contributions from both available and unavailable clients. Our theoretical analysis proves that FedAR achieves optimal convergence rates on non-IID datasets for both convex and non-convex smooth loss functions. Extensive empirical studies show that FedAR comprehensively outperforms state-of-the-art FL baselines, including FedAvg, MIFA, FedVARP, and SCAFFOLD, in terms of training loss, test accuracy, and bias mitigation. Moreover, FedAR also performs impressively in the presence of a large number of clients with severe client unavailability.
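A minimal sketch of the surrogate-update idea, under the assumption that the server caches each client's most recent update; the inverse-staleness reweighting shown here is an assumed rectification rule, not necessarily the paper's exact weighting.

```python
def fedar_style_aggregate(latest_updates, staleness, base_weights):
    """Illustrative FedAR-style step: every client contributes its most
    recently received update (a surrogate when it is currently unavailable),
    reweighted by an assumed inverse-staleness factor.

    latest_updates: dict client_id -> cached update tensor
    staleness:      dict client_id -> rounds since that update arrived
    base_weights:   dict client_id -> data-size weight
    """
    # Down-weight stale surrogates so fresh updates dominate (assumed rule).
    scores = {c: base_weights[c] / (1.0 + staleness[c]) for c in latest_updates}
    total = sum(scores.values())
    return sum((scores[c] / total) * latest_updates[c] for c in latest_updates)
```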

2023

Waste not, want not: service migration-assisted federated intelligence for multi-modality mobile edge computing

Hansong Zhou, Shaoying Wang, Chutian Jiang, Linke Guo, Yukun Yuan, Xiaonan Zhang

MobiHoc '23: Proceedings of the Twenty-fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing

Future mobile edge computing (MEC) is envisioned to provide federated intelligence for delay-sensitive learning tasks with multi-modal data. Conventional horizontal federated learning (FL) suffers from high resource demand in response to complicated multi-modal models. Multi-modal FL (MFL), on the other hand, offers a more efficient approach to learning from multi-modal data. In MFL, the entire multi-modal model is split into several sub-models, each tailored to a specific data modality and trained on a designated edge. As sub-models are considerably smaller than the full multi-modal model, MFL requires fewer computation resources and reduces communication time. Nevertheless, deploying MFL over MEC faces the challenges of device mobility and edge heterogeneity, which, if not addressed, can negatively impact MFL performance. In this paper, we investigate a Service Migration-assisted Mobile Multi-modal Federated Learning (SM3FL) framework, in which sub-models can migrate between edges. To effectively utilize both communication and computation resources without extravagance in SM3FL, we develop optimal strategies for service migration and data sample collection that minimize the wall-clock time, defined as the training time required to reach the learning target. Our experimental results show that the proposed SM3FL framework performs remarkably well, surpassing other state-of-the-art FL frameworks by substantially reducing the computing demand by 17.5% and the wall-clock time by 25.3%.
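To make the migration trade-off concrete, here is a simple cost-benefit heuristic: migrate a sub-model only when the accumulated per-round speedup outweighs the one-off migration cost. This rule and all its parameters are assumptions for illustration, not the paper's optimal strategy.

```python
def should_migrate(train_time_current, train_time_candidate,
                   migration_time, rounds_left):
    """Illustrative migration rule for an SM3FL-like setting (assumed
    heuristic): compare the total training-time saving over the remaining
    rounds against the one-time cost of moving the sub-model."""
    gain = (train_time_current - train_time_candidate) * rounds_left
    return gain > migration_time
```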

2022

DQN-based QoE Enhancement for Data Collection in Heterogeneous IoT Network

Hansong Zhou, Sihan Yu, Linke Guo, Beatriz Lorenzo, Xiaonan Zhang

MASS '22: IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems

Sensing data collection from Internet of Things (IoT) devices lays the foundation for massive IoT applications, such as patient monitoring in smart health and intelligent control in smart manufacturing. Unfortunately, the heterogeneity of IoT devices and dynamic environments results in not only life-cycle latency but also data collection failures, affecting the quality of experience (QoE) for all users. In this paper, we propose a recovery mechanism with a dynamic data contamination method to handle such failures. To further enhance the long-term overall QoE, we allocate spectrum resources and make contamination decisions for each device using a deep reinforcement learning method. In particular, a lightweight decentralized State-sharing Deep-Recurrent Q-Network (SDRQN) is proposed to find the optimal collection policies. Our simulation results indicate that the recurrent unit in SDRQN yields a 10% lower waiting time and a 60% lower task drop rate than a fully-connected design. Compared to a centralized DQN scheme, SDRQN achieves a similarly ultra-low drop rate of 0.29% while requiring only 1% of the GPU memory, demonstrating its effectiveness in large-scale heterogeneous IoT networks.
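A minimal sketch of a recurrent Q-network in the spirit of SDRQN, where a GRU summarizes past (shared) observations so a decentralized agent can act under partial observability; the layer sizes and architecture details are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Illustrative recurrent Q-network (assumed architecture): a GRU keeps
    a summary of past observations, and a linear head outputs per-action
    Q-values at every timestep."""
    def __init__(self, obs_dim, num_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, obs_seq, h0=None):
        x = torch.relu(self.encoder(obs_seq))  # (batch, time, hidden)
        out, h = self.gru(x, h0)               # recurrent state across steps
        return self.head(out), h               # Q-values per timestep
```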
Signal emulation attack and defense for smart home IoT

Xiaonan Zhang, Sihan Yu, Hansong Zhou, Pei Huang, Linke Guo, Ming Li

IEEE Transactions on Dependable and Secure Computing (TDSC) 2022

The Internet of Things (IoT) is transforming every corner of our daily life and plays an important role in the smart home. Depending on their wireless transmission requirements, various types of IoT devices adopt dedicated wireless protocols. Recent advances in Cross-Technology Communication (CTC) enable direct communication across those wireless protocols, which greatly improves spectrum utilization efficiency. However, it also raises serious security concerns for heterogeneous IoT devices. In this paper, we identify a new physical-layer attack, the cross-technology signal emulation attack, in which a WiFi device eavesdrops on a ZigBee packet on the fly and then manipulates the ZigBee device by emulating a ZigBee signal. To defend against this attack, we propose two defense strategies with the help of an anchor. In particular, the passive defense strategy focuses on misleading ZigBee signal eavesdropping, while the proactive approach develops a real-time detection mechanism for distinguishing between a genuine ZigBee signal and an emulated one. We implement the complete attack process and defense strategies with a TI CC26x2R LaunchPad, a USRP-N210 platform, and smart LED light bulbs, as well as a self-designed prototype in which a general light bulb can be turned on/off directly by a Nexus 5 smartphone. Extensive experiments demonstrate the existence of the attack and the feasibility, effectiveness, and accuracy of the proposed defense strategies.