Research Paper -Preprint

Optimization-Based Robust Permissive Synthesis for Interval MDPs

arXiv:2510.03481 · Robotics & Formal Methods

Khang Vo Huynh David Parker Lu Feng

University of Virginia · University of Oxford

Encoding 1

Vertex Enumeration

\[\begin{aligned} \operatorname{maximize} \quad & \textstyle\sum_{s \in S} \sum_{a \in \alpha(s)} y_{s,a} \\[6pt] \text{s.t.}\;\; \forall s \in S\text{:} \quad & \textstyle\sum_{a \in \alpha(s)} y_{s,a} \geq 1 \\[2pt] & x_{s_0} \geq p \\[2pt] \forall s \in T\text{:} \quad & x_s = 1 \\[4pt] \forall s \notin T,\, a,\, v \in \mathcal{V}(s,a)\text{:} & \\[-2pt] & x_s \geq \textstyle\sum_{s'} P^{(v)}(s,a,s') \cdot x_{s'} + M(1 - y_{s,a}) \end{aligned}\]

Encoding 2

Dualization-Based

\[\begin{aligned} \operatorname{maximize} \quad & \textstyle\sum_{s \in S} \sum_{a \in \alpha(s)} y_{s,a} \\[6pt] \text{s.t.}\;\; \forall s \in S\text{:} \quad & \textstyle\sum_{a \in \alpha(s)} y_{s,a} \geq 1 \\[2pt] & x_{s_0} \geq p \\[2pt] \forall s \in T\text{:} \quad & x_s = 1 \\[2pt] \forall (s,a,s')\text{:} \quad & \delta_{s,a} - \hat{u}_{s,a,s'} + u_{s,a,s'} \geq x_{s'} \\[2pt] \forall (s,a)\text{:} \quad & x_s \leq M(1 - y_{s,a}) + \delta_{s,a} \\ & \;\;\; + \textstyle\sum_{s'} \bigl[\overline{P}(s,a,s')\, u_{s,a,s'} - \underline{P}(s,a,s')\, \hat{u}_{s,a,s'}\bigr] \end{aligned}\]

Overview

Robust and permissive controller synthesis under transition uncertainty

Robots operating in the real world face sensing noise, actuation imprecision, and modeling errors. Standard Markov Decision Processes (MDPs) assume exact transition probabilities, which rarely hold in practice. Interval MDPs (IMDPs) capture this epistemic uncertainty by allowing transition probabilities to vary within intervals -but existing synthesis methods either produce a single rigid policy, or assume exact models without accounting for uncertainty.

We present the first framework that unifies robust synthesis (correctness under all admissible transitions) with permissive synthesis (retaining multiple admissible actions per state for runtime flexibility). The synthesis problem is cast as a Mixed-Integer Linear Program (MILP) that directly encodes robust Bellman constraints and maximizes a quantitative permissiveness metric: the number of enabled state-action pairs.

We develop two MILP encodings -vertex enumeration and a compact dualization-based formulation -and evaluate them on four robotic benchmark domains. The dualization encoding scales linearly in the number of successors, enabling synthesis over IMDPs with hundreds of thousands of states while preserving near-optimal permissiveness and formal safety guarantees.

Method

From IMDP model to maximally permissive, robust multi-strategy

01

IMDP Modeling

Model the robotic system as an Interval MDP where each transition probability is bounded by an interval, capturing sensing noise and model inaccuracies without requiring a single precise estimate.

02

MILP Formulation

Cast robust permissive synthesis as a global Mixed-Integer Linear Program. Binary variables encode which actions are admitted per state; continuous variables encode worst-case reachability or reward values via robust Bellman constraints.

03

Dualization Encoding

Replace exponential vertex enumeration of uncertainty polytopes with LP duality. Dual variables directly encode the adversarial transition choice, yielding a compact formulation that grows linearly with the number of successor states.

04

Solve & Synthesize

Parse IMDP models and specifications via PRISM and solve the MILP with Gurobi. The optimal solution is a maximally permissive multi-strategy guaranteed to satisfy the specification under all admissible transitions.

Results

Key findings across four robotic benchmarks

Scalability

Synthesizes robust multi-strategies for IMDPs with hundreds of thousands of states across all four benchmark domains: aircraft collision avoidance, semi-autonomous vehicle control, obstacle navigation, and warehouse navigation.

Linear Encoding

The dualization-based encoding (Encoding 2) reduces MILP constraint count from exponential in successor states to linear, significantly outperforming vertex enumeration on larger instances while producing identical multi-strategies.

Robust Permissiveness

Synthesized multi-strategies achieve near-optimal permissiveness -maximizing the number of allowed state-action pairs -while providing worst-case safety guarantees for both probabilistic reachability and expected reward specifications.

Resources

Project materials

Citation

BibTeX

@article{huynh2025imdp,
  title   = {Optimization-Based Robust Permissive Synthesis for Interval {MDPs}},
  author  = {Huynh, Khang Vo and Parker, David and Feng, Lu},
  journal = {arXiv preprint arXiv:2510.03481},
  year    = {2025}
}