Which method works best to create batches for a GAN?

Hello GitHub community! I'm totally new here and hope that someone can help me =)
As the title explains, I have a question about training a neural network:

I would like to train my deblurGAN (built with Keras) on the REDs dataset. I already tried it with CIFAR10 (where I added some artificial blur) and it works very well. Since the REDs dataset has a different structure and is much bigger, I need to find a new way to load the dataset and create the batches.
I need to use Google Colab and Google Drive because my GPU is too slow.
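
For context, the artificial blur on CIFAR10 was nothing special, roughly along these lines (just a minimal sketch; I'm showing a Gaussian blur via scipy here, the exact blur I used is not important for this question):

import numpy as np
from scipy.ndimage import gaussian_filter
from tensorflow.keras.datasets import cifar10

# load CIFAR10 and scale the images to [0, 1]
(x_sharp, _), _ = cifar10.load_data()
x_sharp = x_sharp.astype('float32') / 255.0

# blur height and width but not the colour channels; sigma=1 is just an example value
x_blurred = np.stack([gaussian_filter(img, sigma=(1, 1, 0)) for img in x_sharp])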

I tried two methods:

  1. I loaded all the pictures into 10 NumPy arrays (there is not enough RAM to load them into just one array) and saved these arrays to my Google Drive. During training I have a loop that loads the current array; for every array I load 16 pictures into a batch, update the weights with this batch, and repeat with the next batch (see the sketch after the training code for how I saved these arrays).
    This only works for the first loaded array. As soon as I load the second one I run out of RAM and get an error. Here's the corresponding code:
import gc
import random
import numpy as np

def training(epochs, batch_size): #,testset_b,testset_s):
  print('start training with batchsize: ',batch_size)
 
  for e in range(epochs):
    current_epoch = e + 1
    weights_name = "deblurGAN_REDs_weights%s_batchsize_%r.h5" %(current_epoch, batch_size)
    print("epoch: %s " %(current_epoch))

    #loop for the 10 numpy arrays
    for d_set in range(10):

      #loading current array
      with np.load('/content/gdrive/My Drive/Colab Notebooks/reds/train%r.npz' %(17*(d_set+1)-1)) as data:
        x_train, y_train = data['arr_0'], data['arr_1']
        del data.f
        data.close()
      gc.collect()
      print('Loaded training set ', d_set + 1, " :", x_train.shape, y_train.shape, type(x_train))
   
      n_batches = int(len(x_train)/batch_size)  # divide the training set into batches of batch_size

      #shuffle the training set with a random index permutation
      random_idx = np.arange(0, x_train.shape[0])
      random.shuffle(random_idx)
      x_train = x_train[random_idx]
      y_train = y_train[random_idx]
     
      #here I'm doing the training with the current batch of the current array
      for i in range(n_batches): 

        img_batch_blured = x_train[i*batch_size:(i+1)*batch_size]
        img_batch_generated = generator.predict(img_batch_blured)
        img_batch_original = y_train[i*batch_size:(i+1)*batch_size]
        img_batch = np.concatenate((img_batch_generated , img_batch_original),0)

        valid0 = -np.ones(batch_size)
        valid1 = np.ones(batch_size)
        valid = np.concatenate((valid0,valid1))
 
        #updating the discriminator
        for k in range(5):
          loss = discriminator.train_on_batch(img_batch, valid)

        discriminator.trainable = False

        GAN_loss = gan.train_on_batch(img_batch_blured, [img_batch_original, valid1])

        discriminator.trainable = True
     
      print("finished training of current dataset")

      # this is what I tried to free all the memory, but it does not work -.-
      x_train = None
      y_train = None
      del x_train
      del y_train
      gc.collect()

    # Save weights: change the path to your directory
    gan.save_weights(F"/content/gdrive/My Drive/Colab Notebooks/data/{weights_name}")
  return 
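
For completeness, this is roughly how I created the ten .npz files beforehand (a sketch with placeholder file lists; the naming follows the loading code above):

import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def save_chunk(blurred_paths, sharp_paths, out_path, image_shape):
  # load one chunk of blurred/sharp image pairs, scale to [0, 1] and save it as a compressed .npz
  x = np.stack([img_to_array(load_img(p, target_size=image_shape[:2])) for p in blurred_paths]) / 255.0
  y = np.stack([img_to_array(load_img(p, target_size=image_shape[:2])) for p in sharp_paths]) / 255.0
  np.savez_compressed(out_path, x, y)  # stored as arr_0 / arr_1, exactly as loaded above

# the full file lists are split into 10 chunks (placeholder loop):
# for d_set, (blurred_chunk, sharp_chunk) in enumerate(chunks):
#   save_chunk(blurred_chunk, sharp_chunk,
#              '/content/gdrive/My Drive/Colab Notebooks/reds/train%r.npz' % (17*(d_set+1)-1),
#              image_shape)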

Maybe you have an idea what I'm doing wrong? Somehow the training process allocates 4 GB of RAM, and even after I set x_train and y_train to None and delete them, those 4 GB stay allocated. So when the next array is loaded there is not enough space (I have 12 GB of RAM in total).
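
In case someone wants to reproduce the problem: this is how the RAM usage can be checked before and after the cleanup (a small helper using psutil, not part of my training code):

import os
import psutil

def print_ram_usage(label):
  # print the resident memory of the current Python process in GB
  rss = psutil.Process(os.getpid()).memory_info().rss
  print('%s: %.2f GB' % (label, rss / 1024**3))

# call it right before and after the del / gc.collect() block:
# print_ram_usage('before cleanup')
# print_ram_usage('after cleanup')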

  2. I tried to use the Keras ImageDataGenerator to load the data and create the batches.
from keras.preprocessing.image import ImageDataGenerator

#blurred data generator
generator_blurred = ImageDataGenerator(rescale=1./255)
dataset_generator_blurred = generator_blurred.flow_from_directory(
    dataset_blurred_path, target_size=(image_shape[0], image_shape[1]),
    color_mode="rgb", batch_size=batch_size, classes=None, class_mode=None, shuffle=False)
print(dataset_generator_blurred)

#sharp data generator
generator_sharp = ImageDataGenerator(rescale=1./255)
dataset_generator_sharp = generator_sharp.flow_from_directory(
    dataset_sharp_path, target_size=(image_shape[0], image_shape[1]),
    color_mode="rgb", batch_size=batch_size, classes=None, class_mode=None, shuffle=False)
print(dataset_generator_sharp)
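
Since both generators use shuffle=False, the blurred and the sharp images should stay aligned. A quick sanity check (assuming both directories use the same file names):

import os

# the iterators expose their file lists in the order the batches will be yielded
assert len(dataset_generator_blurred.filenames) == len(dataset_generator_sharp.filenames)
for f_blur, f_sharp in zip(dataset_generator_blurred.filenames, dataset_generator_sharp.filenames):
  assert os.path.basename(f_blur) == os.path.basename(f_sharp)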

During training …

def training(epochs, batch_size):

  n_batches = int(24000/batch_size)
  for e in range(epochs):

    weights_name = "deblurGAN_REDs_weights%s_batchsize_%r.h5" %(epoch, batch_size)

    for i in range(n_batches):
      
      #creating the batches...
      img_batch_original = dataset_generator_sharp.next()
      img_batch_blured = dataset_generator_blurred.next()

      img_batch_generated = generator.predict(img_batch_blured)
      img_batch = np.concatenate((img_batch_generated , img_batch_original),0)

      #In the last run the batch size may differ (total size / batch size is not an integer)
      current_batchsize = img_batch_original.shape[0]
      
      valid0 = -np.ones(current_batchsize)
      valid1 = np.ones(current_batchsize)
      valid = np.concatenate((valid0,valid1))
 
      for k in range(5):
        loss = discriminator.train_on_batch(img_batch, valid)

      discriminator.trainable = False
      
      gan.train_on_batch(img_batch_blured, [img_batch_original, valid1])

      discriminator.trainable = True     
    gan.save_weights(F"/content/gdrive/My Drive/Colab Notebooks/reds/weights/{weights_name}")

  return 

This works, but the first epoch takes around 6 hours; the epochs after the first one only need about 30 minutes each. However, I often hit the Colab time limit or get an error during the first epoch, so training is not really possible.
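
For context on the slow first epoch: all images are read through the Google Drive mount, which I suspect is the bottleneck. The setup is just the standard mount (paths shortened here):

from google.colab import drive

# mount Google Drive; every image is then read through this mount,
# which is much slower than the local Colab disk
drive.mount('/content/gdrive')

# placeholder paths, the real ones point at the REDS blurred/sharp folders on my Drive
dataset_blurred_path = '/content/gdrive/My Drive/Colab Notebooks/reds/...'
dataset_sharp_path = '/content/gdrive/My Drive/Colab Notebooks/reds/...'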

I hope that you can help me! The neural network itself works really well and was easy to code, but I have wasted so much time trying to find a solution for loading this dataset.
Thanks for reading =)