IMDP Modeling
Model the robotic system as an Interval MDP where each transition probability is bounded by an interval, capturing sensing noise and model inaccuracies without requiring a single precise estimate.
Research Paper -Preprint
arXiv:2510.03481 · Robotics & Formal Methods
University of Virginia · University of Oxford
Encoding 1
Encoding 2
Overview
Robots operating in the real world face sensing noise, actuation imprecision, and modeling errors. Standard Markov Decision Processes (MDPs) assume exact transition probabilities, which rarely hold in practice. Interval MDPs (IMDPs) capture this epistemic uncertainty by allowing transition probabilities to vary within intervals -but existing synthesis methods either produce a single rigid policy, or assume exact models without accounting for uncertainty.
We present the first framework that unifies robust synthesis (correctness under all admissible transitions) with permissive synthesis (retaining multiple admissible actions per state for runtime flexibility). The synthesis problem is cast as a Mixed-Integer Linear Program (MILP) that directly encodes robust Bellman constraints and maximizes a quantitative permissiveness metric: the number of enabled state-action pairs.
We develop two MILP encodings -vertex enumeration and a compact dualization-based formulation -and evaluate them on four robotic benchmark domains. The dualization encoding scales linearly in the number of successors, enabling synthesis over IMDPs with hundreds of thousands of states while preserving near-optimal permissiveness and formal safety guarantees.
Method
Model the robotic system as an Interval MDP where each transition probability is bounded by an interval, capturing sensing noise and model inaccuracies without requiring a single precise estimate.
Cast robust permissive synthesis as a global Mixed-Integer Linear Program. Binary variables encode which actions are admitted per state; continuous variables encode worst-case reachability or reward values via robust Bellman constraints.
Replace exponential vertex enumeration of uncertainty polytopes with LP duality. Dual variables directly encode the adversarial transition choice, yielding a compact formulation that grows linearly with the number of successor states.
Parse IMDP models and specifications via PRISM and solve the MILP with Gurobi. The optimal solution is a maximally permissive multi-strategy guaranteed to satisfy the specification under all admissible transitions.
Results
Synthesizes robust multi-strategies for IMDPs with hundreds of thousands of states across all four benchmark domains: aircraft collision avoidance, semi-autonomous vehicle control, obstacle navigation, and warehouse navigation.
The dualization-based encoding (Encoding 2) reduces MILP constraint count from exponential in successor states to linear, significantly outperforming vertex enumeration on larger instances while producing identical multi-strategies.
Synthesized multi-strategies achieve near-optimal permissiveness -maximizing the number of allowed state-action pairs -while providing worst-case safety guarantees for both probabilistic reachability and expected reward specifications.
Resources
Citation
@article{huynh2025imdp,
title = {Optimization-Based Robust Permissive Synthesis for Interval {MDPs}},
author = {Huynh, Khang Vo and Parker, David and Feng, Lu},
journal = {arXiv preprint arXiv:2510.03481},
year = {2025}
}