> Question for the training dataset


I have a question about the training dataset.

I downloaded the training set following the pipeline provided on GitHub, and I also tried training some baseline models.

However, my training results show that the correlation between the images and the ground-truth labels is really weak. I don't know whether I did something wrong or whether this is simply a really challenging dataset.

Thank you so much.

Posted by: wyn2168 @ Sept. 26, 2022, 3:55 p.m.

You are correct in saying that this is a challenging task. All chest radiographs are portable exams, which means they were likely acquired while patients were in a hospital bed. So, while patients may or may not have had COVID, many will have been sick with other types of pneumonia or have had other diseases or procedures monitored with portable chest radiographs.

You should also note that post-processed images may not be helpful for training (and they won't be part of the validation/test sets), so you may want to remove them from your training data.

You could also play around with your training data to only include the first radiograph post-COVID test per patient, for example, to see if that helps in your model training.
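A minimal sketch of that per-patient filtering, assuming each image comes with a metadata record. The field names `patient_id`, `study_date`, and `covid_test_date` are hypothetical and not taken from the challenge files, and treating a radiograph acquired on the test date itself as "post-test" is also an assumption:

```python
from datetime import date


def first_radiograph_after_test(records):
    """Keep, per patient, the earliest radiograph acquired on or after the COVID test date."""
    earliest = {}
    for rec in records:
        # Skip radiographs acquired before the COVID test.
        if rec["study_date"] < rec["covid_test_date"]:
            continue
        best = earliest.get(rec["patient_id"])
        if best is None or rec["study_date"] < best["study_date"]:
            earliest[rec["patient_id"]] = rec
    return list(earliest.values())


# Made-up records for illustration only (hypothetical field names and values).
records = [
    {"patient_id": "A", "study_date": date(2021, 1, 1), "covid_test_date": date(2021, 1, 5)},
    {"patient_id": "A", "study_date": date(2021, 1, 10), "covid_test_date": date(2021, 1, 5)},
    {"patient_id": "A", "study_date": date(2021, 1, 6), "covid_test_date": date(2021, 1, 5)},
    {"patient_id": "B", "study_date": date(2021, 2, 1), "covid_test_date": date(2021, 2, 1)},
]

selected = first_radiograph_after_test(records)
for rec in selected:
    print(rec["patient_id"], rec["study_date"])
```

Training on `selected` instead of `records` gives one image per patient, which makes it easy to compare against a model trained on everything.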

Hope this helps!

Posted by: kdrukker @ Sept. 26, 2022, 4:10 p.m.

Hi, I have a follow-up question:
What do you mean by post-processing images, and how can I identify them?
Best regards,

Posted by: FBehrendt @ Sept. 29, 2022, 1:37 p.m.

When you see a post-processed image, you'll recognize it: everything appears very sharp since some type of edge filter was applied. For the validation and test sets, we visually verified that all post-processed images were removed, but whether an image was post-processed is likely also recorded in the DICOM header.
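One way to look for that header information is to tally the values of a few header fields across the training files and see which ones split the images into groups. The sketch below is only a starting point: `ImageType`, `PresentationIntentType`, and `SeriesDescription` are standard DICOM keywords, but that any of them actually flags post-processed images in this dataset is an assumption, and the sample values are made up. `read_header` relies on the third-party `pydicom` package and is not called in the example.

```python
from collections import Counter, defaultdict

# Hypothetical shortlist of header keywords to inspect; which one (if any)
# marks post-processed images in this dataset is not confirmed.
CANDIDATE_KEYWORDS = ("ImageType", "PresentationIntentType", "SeriesDescription")


def read_header(path, keywords=CANDIDATE_KEYWORDS):
    """Read selected header fields from one DICOM file (requires pydicom)."""
    import pydicom  # third-party; imported here so the rest runs without it

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return {kw: str(ds.get(kw, "")) for kw in keywords}


def summarize(headers, keywords=CANDIDATE_KEYWORDS):
    """Count how often each value occurs for each keyword across all headers."""
    counts = defaultdict(Counter)
    for header in headers:
        for kw in keywords:
            counts[kw][header.get(kw, "")] += 1
    return {kw: dict(counter) for kw, counter in counts.items()}


# Made-up header values for illustration only.
sample_headers = [
    {"ImageType": "['ORIGINAL', 'PRIMARY']", "PresentationIntentType": "FOR PRESENTATION"},
    {"ImageType": "['DERIVED', 'PRIMARY']", "PresentationIntentType": "FOR PRESENTATION"},
    {"ImageType": "['ORIGINAL', 'PRIMARY']", "PresentationIntentType": "FOR PRESENTATION"},
]

summary = summarize(sample_headers)
for keyword, value_counts in summary.items():
    print(keyword, value_counts)
```

Fields that take more than one value are the ones worth comparing against a handful of images you have visually identified as post-processed.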

Posted by: kdrukker @ Sept. 29, 2022, 2:04 p.m.

I see, I will check the DICOM files, I guess. Thanks a lot!

Posted by: FBehrendt @ Sept. 29, 2022, 2:31 p.m.


Just for clarification: are any of the images in the Training dataset (the downloadable one with 12 images) post-processing images? Bony structures seem to appear very sharp compared to most data in the MIDRC downloadable database.

In other words, are they expected to be representative of the data to be encountered in the Validation and Test phases?

Many thanks,


Posted by: magou190 @ Oct. 3, 2022, 6:56 p.m.

Hi, the 12 example practice images are from the public-facing side of MIDRC without any special selection criteria other than having some COVID+ and some COVID- images.

Posted by: kdrukker @ Oct. 3, 2022, 7:05 p.m.