Feature selection / removal of information

I am applying a convolutional neural net approach to this challenge. Similar to the MXNet tutorial, I am currently training on all the sax image folders and all 30 images per sax / stack. I would be curious to hear, from those with deep learning expertise and experience, whether it makes theoretical or practical sense to train on only a subset of this information (e.g. only the most relevant sax / cross-cuts, or only some of the 30 images in each stack). Might this improve performance, or does a properly tuned CNN architecture already home in on the most relevant information and discard the superfluous rest?
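
To make the question concrete, here is a minimal sketch of what such subsetting could look like. It assumes each study has already been loaded as a NumPy array of shape (slices, frames, height, width). The function name `select_subset`, that array layout, and the choice of "central slices plus evenly spaced frames" are all illustrative assumptions, not something taken from the tutorial, and whether central slices are actually the most relevant ones is exactly the open question.

```python
import numpy as np

def select_subset(study, n_slices=3, frame_step=2):
    """Keep the n_slices central sax slices and every frame_step-th frame.

    study: array of shape (slices, frames, height, width) -- assumed layout.
    """
    total_slices = study.shape[0]
    n_slices = min(n_slices, total_slices)    # studies may have few slices
    start = (total_slices - n_slices) // 2    # centre the window of slices
    return study[start:start + n_slices, ::frame_step]

# Hypothetical study: 10 sax slices, 30 frames each, 64x64 pixels.
study = np.zeros((10, 30, 64, 64), dtype=np.float32)
subset = select_subset(study)
print(subset.shape)  # (3, 15, 64, 64)
```

Training once on the full arrays and once on the output of a function like this would be one way to compare the two options empirically.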

