Benchmarking TensorFlow Input Pipelines | TF_Profiler

Target: improving input pipeline efficiency and GPU utilization

Zukhriddin
6 min read · Dec 19, 2022

The first step in any machine learning problem is to have a good (clean) dataset and then load it to train your model. Here, we will discuss three different ways to load an image dataset with TensorFlow (Keras) and compare their performance:

  • tf.keras.preprocessing.image.ImageDataGenerator
  • tf.keras.utils.image_dataset_from_directory
  • tf.data.Dataset with image files

We will use the Date Fruit Image Dataset on Kaggle for benchmarking.

Full code examples can be found here…

Download the dataset and extract it to the date_fruit folder.

1. Load the dataset with ImageDataGenerator

Let’s now jump into the details of each method, starting with a review of an old legend, which is very simple. ImageDataGenerator generates batches of tensor image data with real-time data augmentation.

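import tensorflow as tf

# dataset_path, IMG_SIZE and BATCH_SIZE are assumed to be defined beforehand
imgDataGen = tf.keras.preprocessing.image.ImageDataGenerator()

# flow_from_directory treats every subfolder of dataset_path as one class
imgDataGen = imgDataGen.flow_from_directory(
    dataset_path,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode="sparse",
    color_mode="rgb",
    shuffle=True,
    interpolation='nearest')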

However, to keep the comparison fair, we will not use data augmentation. Here imgDataGen yields batches of images with their corresponding labels. Since class_mode="sparse", each label is an integer, namely the index of the class in the alphabetically sorted class names, not a one-hot encoding:


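import matplotlib.pyplot as plt

imgDataGen_classes = list(imgDataGen.class_indices.keys())
print(imgDataGen_classes)

# Show the first six images of one batch with their class name and label
for images, labels in imgDataGen:
    plt.figure(figsize=(20, 5))
    for i in range(6):
        ax = plt.subplot(1, 6, i + 1)
        plt.imshow(images[i].astype('uint8'))
        label = int(labels[i])
        plt.title(f"{imgDataGen_classes[label]}\n{label}")
        plt.axis("off")
    break  # the generator loops forever, so stop after the first batch

Figure: ImgDataGen samples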
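
To make the sparse label format concrete, here is a small pure-Python sketch (no TensorFlow needed) of how an integer label relates to the alphabetically sorted class names and to the equivalent one-hot vector. The folder names are hypothetical placeholders chosen for illustration, and to_one_hot is a helper made up for this sketch.

```python
# Hypothetical class folder names, used only for illustration.
folder_names = ["sokari", "ajwa", "medjool"]

# Class names are sorted alphabetically; a sparse label is the index in that list.
class_names = sorted(folder_names)
class_to_index = {name: idx for idx, name in enumerate(class_names)}


def to_one_hot(label, num_classes):
    """Return the one-hot vector equivalent to an integer (sparse) label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


sparse_label = class_to_index["medjool"]
one_hot_label = to_one_hot(sparse_label, len(class_names))
print(class_names, sparse_label, one_hot_label)
```

A sparse label stores one integer per image, whereas the one-hot form stores one value per class; both carry the same information.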

2. Load the dataset with image_dataset_from_directory

This contestant is also very simple and is included in Keras as well.

imgDataFromDir = tf.keras.utils.image_dataset_from_directory(
    dataset_path,
    labels="inferred",
    label_mode="int",
    color_mode="rgb",
    batch_size=BATCH_SIZE,
    image_size=(IMG_SIZE, IMG_SIZE),
    interpolation='nearest',
    shuffle=True)

# Read class_names before cache()/prefetch(): the wrapped datasets no longer expose it
class_names = imgDataFromDir.class_names
imgDataFromDir = imgDataFromDir.cache()
imgDataFromDir = imgDataFromDir.prefetch(tf.data.AUTOTUNE)

Like the first method, the images are resized and batched, and the labels are sparse integers (label_mode="int"). The function itself, however, has no built-in arguments for on-the-fly data augmentation.

The main difference between imgDataGen and imgDataFromDir is that the former relies on Python threading, which is inefficient because of the GIL, while the latter is a tf.data.Dataset object, so it fits nicely with the rest of the pipelining tools such as caching and prefetching. You can still apply any kind of data augmentation by running TensorFlow operations on this dataset. It is a bit less simple but much more customizable.

You can view images from the imgDataFromDir object using the code snippet below:

print(class_names)
for images, labels in imgDataFromDir:
    plt.figure(figsize=(20, 5))
    for i in range(6):
        ax = plt.subplot(1, 6, i + 1)
        label = int(labels[i])
        plt.title(f"{class_names[label]}\n{label}")
        plt.imshow(images[i].numpy().astype('uint8'))
        plt.axis("off")
    break

3. Load the dataset with tf.data.Dataset

The tf.data.Dataset API supports writing descriptive and efficient input pipelines. Dataset usage follows a common pattern:

  1. Create a source dataset from your input data.
  2. Apply dataset transformations to preprocess the data.
  3. Iterate over the dataset and process the elements.

Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
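
The three-step pattern can be mimicked with plain Python generators. This is only an analogy for the streaming behaviour, not how tf.data is implemented, and the function names source, transform and batch are made up for this sketch.

```python
def source(n):
    """Step 1: a source that yields elements lazily, one at a time."""
    for i in range(n):
        yield i


def transform(elements):
    """Step 2: a transformation applied element by element as data streams through."""
    for x in elements:
        yield x * x


def batch(elements, batch_size):
    """Group consecutive elements into lists of at most batch_size items."""
    current = []
    for x in elements:
        current.append(x)
        if len(current) == batch_size:
            yield current
            current = []
    if current:  # emit the final, possibly smaller, batch
        yield current


# Step 3: iterate over the pipeline; each batch is produced only when requested.
batches = []
for b in batch(transform(source(7)), batch_size=3):
    batches.append(b)  # a real training loop would consume b here instead of storing it
print(batches)
```

Because every stage pulls from the previous one on demand, only the elements of the current batch need to exist at any moment.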

subdirs = []
for subdir in sorted(tf.io.gfile.listdir(dataset_path)):
    if tf.io.gfile.isdir(tf.io.gfile.join(dataset_path, subdir)):
        if subdir.endswith("/"):
            subdir = subdir[:-1]
        subdirs.append(subdir)

class_names = subdirs

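import os

def parse_image_and_label(filename):
    # The parent folder name is the class; compare it against every class name
    parts = tf.strings.split(filename, os.sep)
    boolean_label = parts[-2] == class_names
    label = tf.argmax(boolean_label)  # index of the matching class

    image = tf.io.read_file(filename)
    # expand_animations=False keeps the result 3-D (a GIF yields only its first frame)
    image = tf.io.decode_image(image, channels=3, expand_animations=False)
    # Before resizing, only the rank and the channel count are known
    image.set_shape((None, None, 3))
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])

    return image, label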


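# Assumes the layout dataset_path/<class_name>/<image_file>
tf_data = tf.data.Dataset.list_files(os.path.join(dataset_path, "*", "*"))
tf_data = tf_data.map(parse_image_and_label, num_parallel_calls=tf.data.AUTOTUNE)
tf_data = tf_data.shuffle(BATCH_SIZE*8)
# Note: cache() after shuffle() stores the first epoch's order;
# call cache() before shuffle() if every epoch should be reshuffled
tf_data = tf_data.cache()
tf_data = tf_data.batch(BATCH_SIZE, num_parallel_calls=tf.data.AUTOTUNE)
tf_data = tf_data.prefetch(tf.data.AUTOTUNE)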

A brief explanation of the above code snippet:

  • First, we get class_names from the subfolder names.
  • tf.data.Dataset's list_files method takes a glob pattern built from dataset_path and returns a dataset of all matching file paths. You can also find an example of using the from_tensor_slices method in the full code examples…
  • The map method takes a function and applies it to every element of the dataset. Here, parse_image_and_label takes a filename and returns the image as a tensor together with its label.
  • The shuffle method shuffles the dataset using a buffer of BATCH_SIZE*8 elements.
  • The cache method makes training much faster because the data is saved in memory during the first epoch and the cached dataset is used afterwards. If the dataset doesn't fit in memory (RAM), there will be errors. In that case, use cache("path/to/file"), which caches the dataset to path/to/file on disk.
  • The batch method groups consecutive elements into batches of BATCH_SIZE.
  • The prefetch method prepares batch n+1 while the GPU is still processing batch n.
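
The label lookup inside parse_image_and_label can be easier to follow in plain Python. The sketch below mirrors the same idea (compare the parent folder name against every class name and take the index of the match) without TensorFlow. The helper name label_from_path, the file path and the class names are hypothetical.

```python
def label_from_path(path, class_names, sep="/"):
    """Return the index of the class whose name equals the file's parent folder."""
    parts = path.split(sep)
    folder = parts[-2]  # e.g. "medjool" in "date_fruit/medjool/img_001.jpg"
    matches = [folder == name for name in class_names]
    # Index of the first maximum of a boolean list: the position of the first
    # True, or 0 if nothing matches (the same convention as an argmax).
    return max(range(len(matches)), key=lambda i: matches[i])


class_names = ["ajwa", "medjool", "sokari"]  # hypothetical, alphabetically sorted
label = label_from_path("date_fruit/medjool/img_001.jpg", class_names)
print(label)
```

One consequence of the argmax convention is that a file in an unknown folder silently receives label 0, so it is worth checking that every folder name appears in class_names.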

You can learn more about them in this great tutorial!
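
The overlap that prefetching provides can be imitated with a background thread and a bounded queue. This is a rough plain-Python analogy under stated assumptions, not tf.data's implementation (which does this work inside the TensorFlow runtime rather than in Python threads); the helper name prefetch and its buffer_size parameter are made up for this sketch.

```python
import queue
import threading


def prefetch(iterable, buffer_size=1):
    """Yield items from iterable while a background thread prepares the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer has something ready
        if item is sentinel:
            return
        yield item


consumed = [batch for batch in prefetch(range(5), buffer_size=1)]
print(consumed)
```

While the consumer works on one item, the producer is already filling the buffer with the next one. Error handling in the producer is omitted to keep the sketch short.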

print(class_names)
for images, labels in tf_data:
    plt.figure(figsize=(20, 5))
    for i in range(6):
        ax = plt.subplot(1, 6, i + 1)
        plt.imshow(images[i].numpy().astype('uint8'))
        plt.title(f"{class_names[labels[i]]}\n{labels[i].numpy()}")
        plt.axis("off")
    break

Now, our three data pipelines are ready to go. We will prepare a simple CNN model for training. Since the labels in our prepared datasets are sparse integers, we use SparseCategoricalCrossentropy() as the loss function.

def build_model():

    model = tf.keras.Sequential([

        tf.keras.layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),

        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),

        tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),

        tf.keras.layers.GlobalAvgPool2D(),

        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(len(class_names), activation='softmax')
    ])

    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])

    return model
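
As a worked example of why a sparse loss suits integer labels, the sketch below computes the cross-entropy of a single sample in plain Python, once from a sparse label and once from the equivalent one-hot vector. It is a simplified illustration: the numerical safeguards a real implementation needs (such as avoiding log(0)) are omitted, and the probabilities are made-up values.

```python
import math


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probabilities[label])


def categorical_crossentropy(one_hot, probabilities):
    """Same loss computed from a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))


probs = [0.1, 0.7, 0.2]  # made-up softmax output for three classes
loss_sparse = sparse_categorical_crossentropy(1, probs)
loss_one_hot = categorical_crossentropy([0.0, 1.0, 0.0], probs)
print(round(loss_sparse, 4), round(loss_one_hot, 4))
```

Both calls give the same value, -ln(0.7) ≈ 0.3567, so the sparse form simply avoids building the one-hot vector.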

Next, we will prepare a custom training function using tensorboard callbacks and tensorflow_profiler:

def train_the_model(data_gen, epochs=5, profiler=False):
    # profiler: False to train without callbacks, or a run name used as the log sub-directory

    tf.keras.backend.clear_session()

    model = build_model()

    if profiler:
        # Prepare tensorboard with profiler
        tensorboard = tf.keras.callbacks.TensorBoard(
            log_dir=f'tboard/{profiler}',
            histogram_freq=1,
            profile_batch=(280, 300))  # profile batches 280 to 300

        # Start training
        model.fit(data_gen, epochs=epochs, callbacks=[tensorboard])
    else:
        # Start training
        model.fit(data_gen, epochs=epochs)

First, we call the train_the_model function without callbacks, passing our three dataset objects one by one: imgDataGen, imgDataFromDir, and tf_data.

The above results showed that ImageDataGenerator is very slow compared to image_dataset_from_directory and tf.data.

Since image_dataset_from_directory uses the tf.data API under the hood, the two share almost the same performance. Thanks to the cache() method, the dataset is cached during the first epoch, and the cached dataset is then used for the following epochs.
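
The caching behaviour described above can be imitated in plain Python: items are loaded during the first full pass and replayed from memory afterwards. This is an analogy rather than tf.data's implementation, and CachedLoader and fake_load are hypothetical names introduced for the sketch.

```python
class CachedLoader:
    """Load items on the first full pass, then replay them from memory."""

    def __init__(self, filenames, load_fn):
        self.filenames = filenames
        self.load_fn = load_fn
        self._cache = None  # filled once a complete pass has finished

    def __iter__(self):
        if self._cache is not None:
            yield from self._cache  # later epochs: no loading at all
            return
        loaded = []
        for name in self.filenames:
            item = self.load_fn(name)  # first epoch: the expensive step
            loaded.append(item)
            yield item
        self._cache = loaded


load_calls = []


def fake_load(name):
    """Stand-in for reading and decoding an image file."""
    load_calls.append(name)
    return name.upper()


loader = CachedLoader(["a.jpg", "b.jpg", "c.jpg"], fake_load)
epochs = [list(loader) for _ in range(3)]
print(len(load_calls))  # prints 3: each file is loaded once across three epochs
```

Only the first epoch pays the loading cost, which is why the first epoch is typically slower than the later ones when a dataset is cached.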

Deprecated: tf.keras.preprocessing.image.ImageDataGenerator is not recommended for new code. Prefer loading images with tf.keras.utils.image_dataset_from_directory and transforming the output tf.data.Dataset with preprocessing layers. (From Official docs)

Optimize TensorFlow performance using the Profiler

TensorFlow Profiler is used to profile the execution of your TensorFlow code. The Profile tab opens the Overview page which shows you a high-level summary of your model performance. The Overview page also gives you recommendations on potential next steps you can follow to optimize your model performance. TF_Profiler installation…

If you call train_the_model with the profiler parameter set, the TensorBoard callback profiles the training process between batches 280 and 300:

Now, you can see the profiler section in TensorBoard:

As you can see, ImageDataGenerator spent a lot of time in the data input pipeline, while image_dataset_from_directory and tf.data were doing well.

You can explore lots of information about your input data pipeline and RAM, CPU, GPU usage, tf_operations, and more. There are also recommendations to optimize your training in that section.

Conclusion

The above experiment with three TensorFlow input pipelines shows that if your image classification project uses ImageDataGenerator, it is highly recommended to swap it for image_dataset_from_directory. However, if you are working on object detection or segmentation problems, you may need to parse images and labels based on your datasets. In those cases, it is better to use the tf.data.Dataset API, which handles the job well.

References:

TensorFlow Official Docs

Date Fruit Sorting Based on Deep Learning and Discriminant Correlation Analysis
