FCBV-Net: Category-Level Robotic Garment Smoothing

Abstract

Category-level generalization for robotic garment manipulation, such as bimanual smoothing, remains a significant hurdle due to high dimensionality, complex dynamics, and intra-category variations. Current approaches often struggle, either overfitting with concurrently learned visual features or failing to predict the value of synergistic actions.

We propose the Feature-Conditioned Bimanual Value Network (FCBV-Net), operating on 3D point clouds to specifically enhance category-level policy generalization. FCBV-Net conditions bimanual action value prediction on pre-trained, frozen dense geometric features, ensuring robustness to intra-category garment variations.

In simulated GarmentLab experiments with the CLOTH3D dataset, FCBV-Net demonstrated superior category-level generalization. It exhibited only an 11.5% efficiency drop on unseen garments compared to 96.2% for a 2D image-based baseline, and achieved 89% final coverage. These results highlight that the decoupling of geometric understanding from bimanual action value learning enables better category-level generalization.

Method Overview

The core hypothesis is that explicitly conditioning an action's value on robust, pre-trained per-point geometric features improves generalization. We use dense features pre-trained for structural understanding and keep them frozen during policy learning.

FCBV-Net Architecture: Dense geometric features are extracted by a pre-trained and frozen PointNet++ backbone. A ValueDecoderPN++ Network predicts initial grasp quality, while a parallel Primitive Selection Head determines the manipulation primitive (Fling, Drag, PickPlace).

Results

In simulated experiments with the CLOTH3D dataset, FCBV-Net demonstrated superior category-level generalization on unseen garments compared to both a 2D image-based baseline (Sim-SF) and a 3D correspondence-based baseline (UGM-PolicyTransfer).

Conceptual Performance Quadrant demonstrating the impact of the decoupling hypothesis. FCBV-Net overcomes the traditional trade-off, achieving both the high manipulation effectiveness characteristic of learned policies and the robust category-level generalization of geometric correspondence methods.

On unseen garments, FCBV-Net maintained high efficiency with only an 11.5% performance drop and achieved 89% final coverage. In contrast, the 2D baseline suffered a 96.2% drop, while the 3D correspondence baseline achieved 83% coverage with a fixed primitive. These results validate that decoupling geometric understanding from bimanual action value learning enables both high effectiveness and robust generalization.

Baseline vs FCBV-Net Comparison Gallery

Data Preparation Stage
Backbone Features Visualization
Self-Supervised Rollouts
Dataset Collection Flow

Select a category to view the dataset samples (12 samples per category).

FCBV Paper: Dataset Collection - Timelapse - All Primitives

FCBV Paper: Self-Supervised Data Collection - Timelapse

FCBV Paper: Data Pool Generation - Timelapse

BibTeX

@misc{daba2025fcbvnet,
      title={FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction}, 
      author={Mohammed Daba and Jing Qiu},
      year={2026},
      url={https://arxiv.org/abs/2508.05153}, 
      doi={10.48550/arXiv.2508.05153}
}

FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction

FCBV-Net transforms an arbitrarily crumpled garment into a smoothed state using bimanual manipulation.

Abstract

Method Overview

Results

Supplementary Media

BibTeX