Affordance Prediction Analysis

Qualitative Comparison Against Baselines

We compare our affordance model against two baselines that use CLIP features, ClipSeg and CliPort, on the RAVENS dataset. Spatula and Sauce Pan are seen at training time, while Hammer is unseen for CliPort. For our method, we annotate one exemplar from each category.


Hammer

Ground Truth

ClipSeg

CliPort

Ours



Sauce Pan

Ground Truth

ClipSeg

CliPort

Ours



Spatula

Ground Truth

ClipSeg

CliPort

Ours

Using CLIP Features for Affordance Prediction

While CLIP features are good at identifying objects with open-vocabulary generalization, they struggle to localize specific object parts or affordances. Below is an example prediction from ClipSeg for the image of a hammer with the prompt "hammer handle".
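
For readers who want to try a part-level query like this themselves, the following is a minimal sketch of prompting ClipSeg through the Hugging Face transformers library. It is not the code used to produce the figures here: the checkpoint name `CIDAS/clipseg-rd64-refined` and the file name `hammer.png` are assumptions, and the sigmoid step is one common way to turn the raw logits into a heatmap.

```python
import numpy as np


def logits_to_heatmap(logits: np.ndarray) -> np.ndarray:
    """Squash raw per-pixel logits into [0, 1] with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-logits.astype(np.float64)))


def clipseg_heatmap(image, prompt: str,
                    checkpoint: str = "CIDAS/clipseg-rd64-refined") -> np.ndarray:
    """Return a relevance heatmap for `prompt` over `image` (a PIL RGB image).

    The checkpoint is an assumption; this page does not say which one was used.
    """
    # Heavy dependencies are imported lazily so the helper above works on its own.
    import torch
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    processor = CLIPSegProcessor.from_pretrained(checkpoint)
    model = CLIPSegForImageSegmentation.from_pretrained(checkpoint)

    inputs = processor(text=[prompt], images=[image],
                       padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # one low-resolution map per prompt

    # squeeze() drops a leading batch dimension if one is present.
    return logits_to_heatmap(logits.squeeze().numpy())


# Hypothetical usage (the file name is a placeholder):
# from PIL import Image
# heatmap = clipseg_heatmap(Image.open("hammer.png").convert("RGB"),
#                           "hammer handle")
```

The returned map is at the model's working resolution, so it would need to be resized to the input image before overlaying it for a visual comparison.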