Research Paper - Preprint
arXiv:2605.12386 · Robotics & AI Safety
* Equal contribution
Georgia Institute of Technology · University of Virginia
LTLf Property Suite
φ1 - Collision & Contact
Avoid unintended contact with objects, fixtures, and surfaces throughout execution.
φ2 - Grasp Stability
Maintain a stable grip from the moment of grasp through intentional release.
φ3 - Release Stability
Every release must eventually reach a settled, supported resting state.
φ4 - Cross-Contamination
No clean-surface contact after contamination until a sanitization step completes.
φ5 - Action-Onset
Initiate each skill only when task-specific preconditions are verified safe.
φ6 - Mechanism
After hitting an obstacle on a fixture, retract and return to a known safe state.
φ7 - Containment
Transferred objects or liquids must end up fully inside the intended receiver.
φ8-10 - Enclosure & Access
Enforce clearing, opening, and placement sequencing within enclosed fixtures.
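To make the flavor of these properties concrete, the block below sketches how three of them might be written in LTLf. The proposition names (grasped, holding, release_cmd, released, settled, contaminated, touch_clean, sanitized) are illustrative assumptions, not the paper's actual templates, which may differ in both structure and vocabulary. Here G means "always", F means "eventually", and W is weak until, with a W b equivalent to (a U b) ∨ G a.

```latex
\begin{align*}
\varphi_2 &:\ \mathbf{G}\bigl(\mathit{grasped} \rightarrow
            (\mathit{holding}\ \mathbf{W}\ \mathit{release\_cmd})\bigr) \\
\varphi_3 &:\ \mathbf{G}\bigl(\mathit{released} \rightarrow
            \mathbf{F}\,\mathit{settled}\bigr) \\
\varphi_4 &:\ \mathbf{G}\bigl(\mathit{contaminated} \rightarrow
            (\lnot\mathit{touch\_clean}\ \mathbf{W}\ \mathit{sanitized})\bigr)
\end{align*}
```

Because LTLf is interpreted over finite traces, the F in the second formula must be discharged before the rollout ends: a release that never settles within the episode counts as a violation.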
Overview
Robotic manipulation is typically evaluated by whether a task is completed, but task success does not guarantee safe execution. Many safety failures are temporal: a robot may touch a clean surface only after contamination, or release an object before it is fully inside an enclosure. These failures are invisible to success-rate metrics yet critical for real-world deployment.
We introduce SafeManip, a property-driven benchmark that makes temporal safety formally specified, reusable, and checkable. It defines 10 safety property templates across 8 manipulation categories using Linear Temporal Logic over finite traces (LTLf). Given a rollout, SafeManip converts observations to a symbolic predicate trace and evaluates each formula with an online deterministic finite automaton (DFA) monitor, producing property-level verdicts independent of task outcome.
We evaluate six vision-language-action (VLA) policies (π0, π0.5, GR00T N1.5, and three training variants) across 50 RoboCasa365 household tasks. Results show that even state-of-the-art models frequently violate temporal safety properties, and that training improvements that increase task success do not reliably produce safer execution.
Method
Define 10 reusable safety property templates across 8 manipulation categories using LTLf. Templates are parameterized by task-specific objects, fixtures, and predicates, enabling reuse across diverse tasks and environments.
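One way to picture template parameterization is as a formula string with named placeholders that are bound per task. The sketch below is only an illustration under that assumption: the `PropertyTemplate` class, the placeholder names, and the formula text are hypothetical and are not taken from the SafeManip codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PropertyTemplate:
    """Hypothetical container for a reusable LTLf safety template."""
    name: str
    formula: str             # LTLf text containing {placeholders}
    params: Tuple[str, ...]  # placeholder names that must be bound

    def instantiate(self, **bindings: str) -> str:
        """Bind task-specific objects/fixtures and return a concrete formula."""
        missing = [p for p in self.params if p not in bindings]
        if missing:
            raise ValueError(f"unbound parameters: {missing}")
        return self.formula.format(**bindings)


# Illustrative release-stability template (formula text is an assumption).
RELEASE_STABILITY = PropertyTemplate(
    name="release_stability",
    formula="G(released_{obj} -> F(settled_{obj}_on_{support}))",
    params=("obj", "support"),
)

# The same template reused for two different tasks.
mug_formula = RELEASE_STABILITY.instantiate(obj="mug", support="counter")
bowl_formula = RELEASE_STABILITY.instantiate(obj="bowl", support="sink")
print(mug_formula)
print(bowl_formula)
```

Keeping the template separate from its bindings is what allows one property to be checked across tasks that involve different objects and fixtures.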
Run six VLA policies (π0, π0.5, GR00T N1.5, and three training variants) on 50 RoboCasa365 household tasks spanning seven manipulation task suites, collecting full execution trajectories with simulator state access.
At each timestep, query object poses, contact events, gripper state, and fixture state to evaluate Boolean safety predicates. These predicates instantiate the abstract propositions used by the LTLf templates, producing a finite symbolic trace per rollout.
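The predicate-extraction step can be sketched as a mapping from per-timestep simulator snapshots to dictionaries of Boolean propositions. Everything specific below is an assumption made for illustration: the state keys, the predicate names, and the numeric thresholds are invented and do not reflect the RoboCasa365 API or the paper's actual predicates.

```python
from typing import Callable, Dict, List

State = Dict[str, object]            # hypothetical simulator snapshot
Predicate = Callable[[State], bool]  # Boolean safety predicate over one state

# Illustrative predicates; keys and thresholds are made up for this sketch.
PREDICATES: Dict[str, Predicate] = {
    "holding_mug": lambda s: s["gripper_closed"] and "mug" in s["gripper_contacts"],
    "mug_settled": lambda s: s["mug_speed"] < 0.01 and s["mug_supported"],
    "drawer_open": lambda s: s["drawer_joint"] > 0.15,
}


def to_symbolic_trace(
    states: List[State], predicates: Dict[str, Predicate]
) -> List[Dict[str, bool]]:
    """Evaluate every predicate at every timestep, yielding a finite trace."""
    return [
        {name: bool(fn(state)) for name, fn in predicates.items()}
        for state in states
    ]


# Toy three-step rollout: hold the mug, release it, then let it come to rest.
states: List[State] = [
    {"gripper_closed": True, "gripper_contacts": {"mug"},
     "mug_speed": 0.20, "mug_supported": False, "drawer_joint": 0.0},
    {"gripper_closed": False, "gripper_contacts": set(),
     "mug_speed": 0.05, "mug_supported": True, "drawer_joint": 0.0},
    {"gripper_closed": False, "gripper_contacts": set(),
     "mug_speed": 0.00, "mug_supported": True, "drawer_joint": 0.0},
]

trace = to_symbolic_trace(states, PREDICATES)
for t, label in enumerate(trace):
    print(t, label)
```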
Compile each instantiated LTLf formula into a DFA and update it online as the trace is generated. A rollout is marked as violating a property when that property's monitor reaches a rejecting state. Report safe success along with breakdowns by violation category and task suite.
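As a rough illustration of online monitoring, the sketch below hand-codes a three-state DFA for one possible reading of the cross-contamination property, G(contaminated -> (!touch_clean W sanitized)). This is an assumption-laden example: the formula, the proposition names, and the `ContaminationMonitor` class are hypothetical, and a real pipeline would presumably compile formulas to automata automatically rather than write them by hand.

```python
from typing import Dict, List, Optional


class ContaminationMonitor:
    """Hand-written DFA for G(contaminated -> (!touch_clean W sanitized)).

    SAFE:     no outstanding obligation
    PENDING:  contaminated, not yet sanitized
    VIOLATED: clean surface touched while contaminated (rejecting trap state)
    """
    SAFE, PENDING, VIOLATED = "safe", "pending", "violated"

    def __init__(self) -> None:
        self.state = self.SAFE

    def step(self, label: Dict[str, bool]) -> str:
        """Consume one timestep of the symbolic trace and return the new state."""
        if self.state == self.VIOLATED:
            return self.state  # trap: a violation cannot be undone
        obligation = self.state == self.PENDING or label.get("contaminated", False)
        if not obligation or label.get("sanitized", False):
            self.state = self.SAFE
        elif label.get("touch_clean", False):
            self.state = self.VIOLATED
        else:
            self.state = self.PENDING
        return self.state


def first_violation(trace: List[Dict[str, bool]]) -> Optional[int]:
    """Return the first timestep at which the monitor rejects, or None."""
    monitor = ContaminationMonitor()
    for t, label in enumerate(trace):
        if monitor.step(label) == ContaminationMonitor.VIOLATED:
            return t
    return None


# Toy traces: missing propositions are treated as False.
unsafe_trace = [{}, {"contaminated": True}, {}, {"touch_clean": True}]
safe_trace = [{"contaminated": True}, {"sanitized": True}, {"touch_clean": True}]

print(first_violation(unsafe_trace))
print(first_violation(safe_trace))
```

Because the rejecting state is a trap, the verdict is available at the exact timestep of the violation, without waiting for the rollout to finish. Properties with an eventuality, such as release stability, would additionally need an end-of-trace check that the final state is accepting.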
Results
Many rollouts that achieve task success contain temporal safety violations. Task-success rate is an unreliable proxy for safe execution: a policy can complete a task while still exhibiting unsafe contact, unstable releases, or containment failures.
Even the strongest evaluated models (π0, GR00T N1.5) frequently violate temporal safety properties. Longer-horizon and more complex tasks expose significantly more violations, revealing systematic gaps in current VLA safety.
SafeManip enables fine-grained diagnosis by property category and task suite. Training improvements that increase task success do not reliably improve temporal safety, motivating safety-targeted evaluation as a distinct metric.
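The gap between task success and safe execution can be illustrated with a small metric computation. The sketch assumes that "safe success" means a rollout that completes the task with no property violation; that definition, the `RolloutResult` structure, and the category labels are assumptions, and the four rollouts are invented toy data, not results from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class RolloutResult:
    """Hypothetical per-rollout record: task outcome plus violated categories."""
    task_success: bool
    violated: List[str] = field(default_factory=list)


def task_success_rate(results: List[RolloutResult]) -> float:
    return sum(r.task_success for r in results) / len(results)


def safe_success_rate(results: List[RolloutResult]) -> float:
    # Assumed definition: task succeeded AND no safety property was violated.
    return sum(r.task_success and not r.violated for r in results) / len(results)


def violations_by_category(results: List[RolloutResult]) -> Counter:
    return Counter(cat for r in results for cat in r.violated)


# Invented toy data for illustration only.
results = [
    RolloutResult(task_success=True),
    RolloutResult(task_success=True, violated=["release_stability"]),
    RolloutResult(task_success=False, violated=["collision_contact"]),
    RolloutResult(task_success=True,
                  violated=["collision_contact", "release_stability"]),
]

print("task success:", task_success_rate(results))
print("safe success:", safe_success_rate(results))
print("by category:", dict(violations_by_category(results)))
```

In this toy set, three of four rollouts complete the task but only one does so without a violation, which is the kind of discrepancy a success-rate metric alone would hide.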
Citation
@article{huang2026safemanip,
  title   = {{SafeManip}: A Property-Driven Benchmark for Temporal Safety
             Evaluation in Robotic Manipulation},
  author  = {Huang, Chengyue and Huynh, Khang Vo and Elbaum, Sebastian
             and Kira, Zsolt and Feng, Lu},
  journal = {arXiv preprint arXiv:2605.12386},
  year    = {2026}
}