Textures with similar visual features, scrambled locally, can be obviously distinguishable when viewed directly, but metameric (i.e. perceptually indistinguishable) when viewed in the periphery (Freeman & Simoncelli, 2011). This suggests that the visual system pools complex features over small regions of space. Prior studies of texture perception have used textures synthesized by iteratively updating random noise images to match handcrafted features derived from a linear filter bank (Portilla & Simoncelli, 2000). Extending this approach, we generated textures by matching complex features extracted from various layers of the VGG-19 convolutional neural network (Gatys et al., 2015), pooled over uniform-sized subregions of a naturalistic image. We asked five human observers to distinguish original images from feature-matched textures, presented at 10 degrees eccentricity. Feature-matched textures were less distinguishable from original images when features from later layers (e.g. pool4) were matched (F=346.42, p<0.001) or when features were matched within smaller pooling regions (F=68.59, p<0.001). We modeled behavioral performance as a function of the distance between the features of the original image and those of the texture on each trial. Comparing 12 different observer models, we found that a model using pool4 features computed within 2-degree pooling regions best predicted human performance on held-out trials. Furthermore, to assess the neural basis of texture perception, we measured BOLD activity in the visual cortex as five human subjects viewed texture images during ten 6-minute runs. Following published procedures (Freeman et al., 2013), images were flashed at 5 Hz within 9-second blocks alternating between textures and phase-scrambled images. We found that sensitivity to pool2 and pool4 textures relative to phase-scrambled images emerges in V2 and increases from V2 to V3 to V4.
In sum, these results suggest that both feature complexity and pooling region size contribute to visual metamerism and that cortical representations in areas V2-V4 may support these perceptual effects.
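
As an illustration of the pooling idea described above, the following minimal sketch computes channel-by-channel Gram matrices within square pooling regions of a feature map and the squared distance between two such sets of statistics. It assumes Gram-matrix statistics in the style of Gatys et al.; the function names (`gram_matrix`, `pooled_grams`, `feature_distance`), the nested-list feature map, and the toy values are hypothetical and are not the authors' implementation.

```python
def gram_matrix(channels):
    """Gram matrix of C channels, each a flat list of N activations.

    Entry (i, j) is the mean product of channels i and j over the N
    positions, so spatial arrangement inside the region is discarded.
    """
    n = len(channels[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in channels]
            for ci in channels]


def pooled_grams(fmap, region):
    """Gram matrices for each square pooling region of a C x H x W map.

    `fmap` is a nested list indexed [channel][row][col]; `region` is the
    side length of the pooling region in feature-map units (assumed).
    """
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    grams = []
    for top in range(0, height, region):
        for left in range(0, width, region):
            patch = [[fmap[c][y][x]
                      for y in range(top, min(top + region, height))
                      for x in range(left, min(left + region, width))]
                     for c in range(n_ch)]
            grams.append(gram_matrix(patch))
    return grams


def feature_distance(grams_a, grams_b):
    """Sum of squared differences between two lists of Gram matrices."""
    return sum((a - b) ** 2
               for ga, gb in zip(grams_a, grams_b)
               for ra, rb in zip(ga, gb)
               for a, b in zip(ra, rb))


# Toy 2-channel, 2x2 feature map and a locally scrambled copy in which
# the activations at positions (0, 0) and (1, 1) are swapped.
original = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
scrambled = [[[4, 2], [3, 1]], [[1, 0], [0, 1]]]

d_large = feature_distance(pooled_grams(original, 2), pooled_grams(scrambled, 2))
d_small = feature_distance(pooled_grams(original, 1), pooled_grams(scrambled, 1))
```

In this toy case the scrambled map has a distance of zero when statistics are pooled over the whole 2x2 region but a nonzero distance when each position is its own region, which mirrors the dependence on pooling region size reported above. An observer model of the kind described would then map such a distance to a predicted probability of a correct response; the form of that mapping is not specified here.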