Human visual recognition of complex patterns is supported by hierarchical representations in the ventral stream of visual cortex. However, it remains undetermined whether representations in early visual cortical areas, e.g. primary visual cortex (V1), are directly accessible to perception or are merely intermediate stages used to generate more complex representations in higher-level visual areas, e.g. inferior temporal cortex (IT). Here, we constructed deep convolutional neural network (dCNN)-based simulations of V1 and IT by linearly weighting dCNN features to maximize the predictivity of electrophysiological responses. We used these cortical simulations to synthesize stimuli that linearly interpolate through either a V1- or IT-like feature space. In a visual discrimination task, we found that human observers are highly sensitive to variation through both V1 and IT representational spaces. Behavior on this task cannot be explained by an observer model that relies solely on V1 features or solely on IT features, but is instead best explained by a weighted combination of the two. Our results thus provide evidence for the insufficiency of IT representations alone, and for the necessity of representations in both early and late regions of the ventral visual stream to support perception.
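The two core modeling steps described above, fitting linear weights from dCNN features to neural responses and interpolating through the resulting feature space, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dCNN activations and electrophysiological responses are replaced with synthetic arrays, the dimensions are invented, and the mapping is fit with simple ridge regression as one plausible choice of linear weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 200 stimuli, 512 dCNN features, 50 recorded neurons.
n_stim, n_feat, n_neur = 200, 512, 50

# Stand-in for dCNN activations to a stimulus set (in the paper these would
# come from a layer of a trained network).
X = rng.standard_normal((n_stim, n_feat))

# Synthetic "electrophysiological" responses that are noisily linear in the
# features, matching the linear-weighting assumption of the cortical simulation.
W_true = 0.1 * rng.standard_normal((n_feat, n_neur))
Y = X @ W_true + 0.01 * rng.standard_normal((n_stim, n_neur))

def fit_linear_mapping(X, Y, alpha=1.0):
    """Ridge-regularized linear weights mapping dCNN features to responses."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

W = fit_linear_mapping(X, Y)

def interpolate_features(f_a, f_b, n_steps=8):
    """Linear path between two points in the simulated feature space."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - t) * f_a[None, :] + t * f_b[None, :]

# Interpolate between two stimuli's feature vectors and read out the
# simulated cortical responses at each step along the path.
path = interpolate_features(X[0], X[1])
responses_along_path = path @ W
```

In the actual study, stimuli would then be synthesized so that their dCNN features track this interpolation path, but that optimization step is omitted here.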