Joint optimization framework using variational autoencoders to enhance dataset efficiency and query diversity for preference-based terrain cost learning in robot navigation
Research Project
Multi-institutional collaboration
Accepted at RO-MAN 2025
University of Denver
Jordan Sinclair, Elijah Alabi
DEVCOM Army Research Laboratory
Maggie Wigness, Brian Reily
The MITRE Corporation
Christopher Reardon
This research introduces a joint optimization framework that increases learning efficiency by improving both the diversity of the trajectory set and the query selection strategy. Using a variational autoencoder (VAE) to encode and group trajectories based on terrain characteristics, the system identifies underrepresented terrain types and employs cluster-aware query selection to maximize information gain from human preferences.
Effective robot navigation in real-world environments requires understanding terrain properties, as different terrain types impact factors such as speed, safety, and wear on the platform. While preference-based learning offers a compelling framework for inferring terrain costs through simple trajectory queries, existing approaches face significant efficiency challenges.
Preference-based inverse reinforcement learning framework for terrain cost learning
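The idea of inferring terrain costs from trajectory queries can be illustrated with a small sketch. It assumes a linear cost over terrain-fraction features and a softmax (Bradley-Terry) choice model, a common choice in preference-based learning; the terrain names, the sample-based belief, and all function names are hypothetical and not taken from the paper.

```python
import math

TERRAINS = ["grass", "sand", "trees"]  # hypothetical terrain set

def features(trajectory):
    """Fraction of trajectory cells that lie on each terrain type."""
    return [trajectory.count(t) / len(trajectory) for t in TERRAINS]

def cost(weights, trajectory):
    """Linear terrain cost: weighted sum of terrain fractions (assumed form)."""
    return sum(w * f for w, f in zip(weights, features(trajectory)))

def prob_prefers_a(weights, traj_a, traj_b):
    """Softmax probability that the human picks A, where lower cost is preferred."""
    return 1.0 / (1.0 + math.exp(cost(weights, traj_a) - cost(weights, traj_b)))

def update_belief(samples, belief, traj_a, traj_b, chose_a):
    """Bayesian reweighting of candidate weight vectors after one answer."""
    posterior = []
    for w, p in zip(samples, belief):
        likelihood = prob_prefers_a(w, traj_a, traj_b)
        posterior.append(p * (likelihood if chose_a else 1.0 - likelihood))
    total = sum(posterior)
    return [p / total for p in posterior]

# Two candidate cost vectors: one says grass is cheap, the other says sand is cheap.
samples = [[0.1, 0.9, 0.5], [0.9, 0.1, 0.5]]
belief = [0.5, 0.5]
traj_a = ["grass"] * 4
traj_b = ["sand"] * 4
# The human prefers the grass trajectory, so belief shifts toward the first sample.
belief = update_belief(samples, belief, traj_a, traj_b, chose_a=True)
```

Each answered query reweights the candidate cost vectors, which is why the choice of query pair determines how quickly the belief concentrates.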
Our approach introduces a joint optimization framework that uses a variational autoencoder (VAE) both to augment the query set and to improve the query selection strategy, with the goal of maximizing information gain and improving learning efficiency. The system integrates with APReL (a library for active preference-based reward learning) and extends mutual information-based query acquisition by constraining the sample space and expanding the dataset.
VAE architecture used for trajectory encoding and dataset augmentation
Both the encoder and the decoder are constructed from Long Short-Term Memory (LSTM) layers, with linear layers producing the appropriate output representations. Promotes pattern recognition over potentially long terrain sequences, with the latent dimension proportional to the number of unique terrain types.
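The two pieces of standard VAE math that sit between an encoder and a decoder can be sketched without any deep learning library. The sketch below omits the LSTM and linear layers entirely and only shows the reparameterized sampling step and the KL regularizer for a diagonal Gaussian posterior; the number of terrain types and the proportionality factor of 2 are assumptions for illustration.

```python
import math
import random

NUM_TERRAIN_TYPES = 3                # hypothetical
LATENT_DIM = 2 * NUM_TERRAIN_TYPES   # "proportional"; the factor 2 is assumed

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the standard normal prior."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu = [0.0] * LATENT_DIM        # stand-in for the encoder's mean output
log_var = [0.0] * LATENT_DIM   # stand-in for the encoder's log-variance output
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # zero when the posterior equals the prior
```

In a full model, `mu` and `log_var` would come from the encoder's final linear layers and `z` would be passed to the decoder.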
Uses κ-means clustering with κ = |Γ| (the number of terrain types) to create a one-to-one mapping between clusters and terrain types. Clusters maintain semantic relationships, and cluster size analysis identifies underrepresented terrain types for targeted dataset augmentation.
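Clustering latent codes and flagging small clusters can be sketched in a few lines. This is a minimal illustration using plain Lloyd's algorithm with a deterministic farthest-point initialization; the 2-D latent points and the `ratio` threshold that defines "underrepresented" are assumptions, not values from the paper.

```python
import math
from collections import Counter

def init_centroids(points, k):
    """Farthest-point initialization: deterministic and spread out."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def underrepresented(labels, k, ratio=0.5):
    """Clusters smaller than `ratio` times the mean cluster size (assumed criterion)."""
    sizes = Counter(labels)
    mean_size = len(labels) / k
    return [c for c in range(k) if sizes.get(c, 0) < ratio * mean_size]

# Hypothetical 2-D latent codes: two well-populated groups and one lone point.
latents = [(0, 0), (0.2, 0.1), (0.1, 0.3), (-0.1, 0.2), (0.3, -0.2), (0, 0.4),
           (5, 5), (5.1, 4.9), (4.8, 5.2), (5.3, 5.1), (4.9, 4.7),
           (10, 0)]
k = 3  # one cluster per terrain type
labels = kmeans(latents, k)
under = underrepresented(labels, k)  # the cluster holding the lone point
```

The flagged clusters are the ones a targeted augmentation step would generate additional trajectories for.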
Builds on the mutual information-based acquisition function by limiting the sample space to trajectory pairs drawn from distinct clusters. Maximizes contrast between the two trajectories in a pair, reducing query ambiguity and increasing the information gain per query.
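The constraint described here can be sketched as a filter applied before scoring. The example below assumes a softmax preference model and a uniform belief over a handful of sampled weight vectors, and estimates the mutual information of a binary query as the entropy of the mean response minus the mean entropy of the per-sample responses. All names and numbers are hypothetical, and this does not reproduce APReL's API.

```python
import math
from itertools import combinations

def prob_prefers_a(w, feat_a, feat_b):
    """Softmax probability of choosing A under weights w (lower cost preferred)."""
    cost_a = sum(wi * fi for wi, fi in zip(w, feat_a))
    cost_b = sum(wi * fi for wi, fi in zip(w, feat_b))
    return 1.0 / (1.0 + math.exp(cost_a - cost_b))

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) response."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def mutual_information(weight_samples, feat_a, feat_b):
    """I(answer; weights) = H(mean response) - mean H(response per sample)."""
    probs = [prob_prefers_a(w, feat_a, feat_b) for w in weight_samples]
    mean_p = sum(probs) / len(probs)
    return binary_entropy(mean_p) - sum(binary_entropy(p) for p in probs) / len(probs)

def select_query(feats, clusters, weight_samples):
    """Pick the highest-MI pair among pairs whose trajectories lie in distinct clusters."""
    candidates = [(i, j) for i, j in combinations(range(len(feats)), 2)
                  if clusters[i] != clusters[j]]
    # Raises ValueError if every trajectory falls in the same cluster.
    return max(candidates,
               key=lambda ij: mutual_information(weight_samples, feats[ij[0]], feats[ij[1]]))

feats = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
clusters = [0, 0, 1, 2]  # trajectories 0 and 1 share a cluster, so they are never paired
weight_samples = [[0.1, 0.9, 0.5], [0.9, 0.1, 0.5], [0.5, 0.5, 0.9]]
query = select_query(feats, clusters, weight_samples)
```

Filtering first also shrinks the number of pairs that need to be scored, since same-cluster pairs are never evaluated.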
Uses a graph-based path planner to generate new trajectories for underrepresented clusters. Subdivides the map into regions, identifies the areas with the highest concentration of the target terrain types, and maintains consistent trajectory structures across augmented samples.
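The region search and path generation can be sketched on a toy grid map. This example assumes a 4-connected grid graph, fixed square regions, and breadth-first search as the planner; the paper's actual planner, map representation, and region scheme are not specified here, so every name below is illustrative.

```python
from collections import deque

def densest_region(grid, target, size):
    """Top-left corner and count of the size x size region with the most target cells."""
    best, best_count = None, -1
    for r0 in range(0, len(grid), size):
        for c0 in range(0, len(grid[0]), size):
            count = sum(row[c0:c0 + size].count(target) for row in grid[r0:r0 + size])
            if count > best_count:
                best, best_count = (r0, c0), count
    return best, best_count

def shortest_path(grid, start, goal):
    """Breadth-first search over the 4-connected grid; returns a list of cells or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0]) and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None

GRID = [
    ["grass", "grass", "grass", "grass"],
    ["grass", "grass", "grass", "sand"],
    ["grass", "grass", "sand", "sand"],
    ["grass", "grass", "sand", "sand"],
]
SIZE = 2
region, count = densest_region(GRID, "sand", SIZE)
# Aim for the first sand cell inside the densest region.
goal = next((r, c)
            for r in range(region[0], region[0] + SIZE)
            for c in range(region[1], region[1] + SIZE)
            if GRID[r][c] == "sand")
path = shortest_path(GRID, (0, 0), goal)
terrain_sequence = [GRID[r][c] for r, c in path]  # the new trajectory's terrain labels
```

The resulting terrain sequence is the kind of sample that would be encoded and added to the underrepresented cluster.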
Baseline: Mutual Information querying (sand & trees converge to ~0)
Our method: VAE querying (all terrains converge to correct values)
The key result was addressing terrain representation imbalance. With mutual information querying, the weights of the less represented terrain types (sand, trees) converged to approximately zero, which made those terrains indistinguishable. With our VAE-based approach, these terrain weights converged to representative values in the correct ground truth order. This indicates that the method better maintains information diversity and avoids losing information about rare terrain types during preference learning.
Performance comparison: Our method vs. mutual information, volume removal, Thompson sampling, and disagreement selection
Statistical validation: 11 convergence trials (8/11 wins, 1 tie)
Alignment analysis: Consistent faster ground truth alignment
VariQuery (our method name) was tested with and without the data enhancement stage to validate both components of our joint optimization framework. The ablation study confirmed that adding targeted trajectory samples to balance latent space clusters significantly improved convergence time.
VariQuery ablation study: Validating the importance of data enhancement in our joint optimization framework
This work has been accepted for publication at the 2025 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2025), a premier venue for research in human-robot interaction and autonomous systems. The research was initially presented at the InterAI Workshop and in a Late Breaking Report session at RO-MAN 2024.
Conference: IEEE RO-MAN 2025
Status: Accepted for Publication
Research Area: Preference Learning, VAE, Robot Navigation
DOI: To be assigned
Previous Presentations:
• RO-MAN 2024 InterAI Workshop
• RO-MAN 2024 Late Breaking Report Session
Impact: Statistically significant improvements
This work was supported by the Army Research Office (ARO) under grant W911NF-22-2-0238, reflecting the military and civilian applications of enhanced preference-based learning for autonomous navigation systems.
This research establishes a foundational framework for improving preference-based learning efficiency across diverse robotic applications. The joint optimization approach, which combines dataset enhancement with intelligent query selection, has broad implications for human-robot interaction and autonomous system development.
This work addresses a fundamental challenge in preference-based inverse reinforcement learning: how to learn efficiently from limited human feedback. By achieving 40% faster convergence with statistical significance, the methodology reduces the cognitive burden on human experts while improving learning outcomes, a critical advancement for practical human-robot collaboration.
Army Research Office funding highlights mission-critical applications including:
The methodology extends to numerous civilian applications:
Building on this terrain cost learning foundation, current research explores robot aesthetic preference learning using images. This extension applies VAE-based encoding to visual preferences, modeling aesthetic decisions through the APReL framework. While the current implementation focuses on core VAE functionality, future iterations will integrate the data enhancement and clustering strategies that proved effective in this terrain research.
Combining terrain, visual, and tactile preferences for comprehensive environmental understanding
Online learning systems that adapt VAE representations as new terrain types are encountered
Applying learned preferences across different environments and robot platforms
This research contributes to the broader goal of human-aligned autonomous systems that can efficiently learn and adapt to human preferences across diverse domains. By demonstrating statistically significant improvements in preference learning efficiency, this work paves the way for more practical human-robot collaboration in real-world applications where expert time is limited and accuracy is critical.