python - Using tf.keras.utils.image_dataset_from_directory mismatches labels on the images - Stack Overflow

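Here's the code:

from tensorflow.keras.utils import image_dataset_from_directory

# data_path points at the folder that holds one subfolder per class
train_dset = image_dataset_from_directory(
    directory = data_path,
    batch_size = 32,
    image_size = (256,256),
    label_mode = "int",
    shuffle = True
)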

In data_path, I have:

-> dataset
   |-> train
       |-> cats (having only cats images)
       |-> dogs (having only dogs images)
   |-> test
       |-> cats (having only cats images)
       |-> dogs (having only dogs images)

When I keep shuffle = True, the labels appear to be shuffled as well, so that a dog image is assigned the cat label and vice versa. How can I tie each label to its image and then shuffle the data without any label mismatch?

Asked Feb 11 at 12:08 by Chinmaya Tewari
  • Hi @Chinmaya Tewari, you can use the following command: keras.utils.image_dataset_from_directory(directory, labels="inferred") – Sagar, Feb 11 at 13:56
  • Are you using only one path? I think you need one for the train data and another for the test data. In other words, you need to create two datasets. – David Sousa, Feb 11 at 21:26
  • @DavidSousa Certainly not, I'm using two paths, one for the train data and the other for the test data. I didn't post the code for the test set as it's almost the same, my apologies for that. – Chinmaya Tewari, Feb 12 at 6:30
  • @Sagar I've tried using labels="inferred" as well. The outcome was the same. – Chinmaya Tewari, Feb 12 at 6:31
  • Hi @ChinmayaTewari, if you are using Google Colab, can you provide the gist file of the code? – Sagar, Feb 12 at 6:39

1 Answer

image_dataset_from_directory infers the labels from the folder names, so you need a directory structure with one subfolder per label, like this:

├── train/
│   ├── cats/
│   │   ├── image1.jpg
│   │   └── ...
│   ├── dogs/
│   │   ├── imageA.jpeg
│   │   └── ...
│   └── ...
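
As a rough illustration of how folder names turn into integer labels, here is a standard-library sketch. It assumes the behaviour the Keras documentation describes for labels="inferred", namely that class indices follow the alphanumeric order of the subfolder names. infer_labels is a hypothetical helper written for this example, not the library's implementation, and the throwaway files only mimic the tree above.

```python
import os
import tempfile


def infer_labels(root):
    """Hypothetical helper: map each class subfolder of `root` to an integer.

    Returns (class_names, samples), where samples is a list of
    (file_path, label_index) pairs. Class names are sorted, so the
    index of a class is its position in alphanumeric order.
    """
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), index))
    return class_names, samples


# Build a temporary tree shaped like the one above (empty placeholder files).
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("dogs", "imageA.jpeg"), ("cats", "image1.jpg")]:
        os.makedirs(os.path.join(root, cls))
        open(os.path.join(root, cls, fname), "wb").close()

    class_names, samples = infer_labels(root)
    labels = [label for _, label in samples]

print(class_names)  # ['cats', 'dogs']
print(labels)       # [0, 1]
```

Under this assumption cats gets label 0 and dogs gets label 1, regardless of the order in which the folders were created. On the real dataset, train_ds.class_names shows the order that was actually used.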

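With that layout in place, point directory at the train folder:

from tensorflow.keras.utils import image_dataset_from_directory

# "train" is the folder that directly contains cats/ and dogs/
train_ds = image_dataset_from_directory(
    directory = "train",
    batch_size = 32,
    image_size = (256,256),
    label_mode = "int",
    shuffle = True
)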

Similarly, you can create test_ds from the test folder, usually with shuffle = False so that the evaluation order stays fixed.
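
On the question of tying each label to its image before shuffling, the idea can be sketched in plain Python: pair every path with its label first, then shuffle the pairs, so a label can never drift away from its image. This is only an illustration of the concept, not how Keras is implemented. shuffle_pairs is a hypothetical helper, and image2.jpg and imageB.jpeg are made-up file names.

```python
import random


def shuffle_pairs(paths, labels, seed=None):
    """Hypothetical helper: shuffle (path, label) pairs as single units."""
    pairs = list(zip(paths, labels))    # tie each label to its image first
    random.Random(seed).shuffle(pairs)  # then shuffle the pairs in place
    return pairs


# Illustrative inputs: label 0 = cats, label 1 = dogs
paths = [
    "cats/image1.jpg",
    "cats/image2.jpg",
    "dogs/imageA.jpeg",
    "dogs/imageB.jpeg",
]
labels = [0, 0, 1, 1]

shuffled = shuffle_pairs(paths, labels, seed=42)

# Whatever the order, every cats/ path still carries label 0
# and every dogs/ path still carries label 1.
for path, label in shuffled:
    assert (label == 0) == path.startswith("cats/")
```

The dataset returned by image_dataset_from_directory already yields (images, labels) tuples, so the equivalent check there is to read both from the same batch, for example with for images, labels in train_ds.take(1). Iterating a shuffled dataset twice, once for the images and once for the labels, may give a different order on each pass, which can look like a label mismatch.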
