Data Preprocessing for Deep Learning

By Bill Sharlow

Day 4 of our TensorFlow Deep Learning Framework Setup

Welcome to Day 4 of our 10-Day DIY TensorFlow Deep Learning Framework Setup series! Today we’re looking at data preprocessing for deep learning. How you prepare your data has a direct effect on how well your models train and generalize.

Importance of Data Preprocessing

Data preprocessing directly affects the performance and generalization of your deep learning models. It covers tasks such as normalization (putting inputs on a consistent scale), augmentation (creating varied versions of training examples), and handling missing values, so that your data is in good shape for training.
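To make two of these tasks concrete, here is a minimal NumPy sketch of mean imputation for missing values followed by min-max normalization. The toy array and the helper names impute_column_means and min_max_scale are made up for illustration and are not part of this series’ MNIST example; treat it as one simple approach, not the only one.

```python
import numpy as np

# Hypothetical toy feature matrix: 4 samples, 2 features, one missing value.
features = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 400.0],
    [4.0, 600.0],
])

def impute_column_means(x):
    """Replace each NaN with the mean of the non-missing values in its column."""
    col_means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_means, x)

def min_max_scale(x):
    """Rescale each column to the [0, 1] range."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Guard against constant columns, which would otherwise divide by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / col_range

filled = impute_column_means(features)
scaled = min_max_scale(filled)
print(scaled)
```

Mean imputation is only a simple baseline; depending on your data, dropping incomplete rows or using a model-based imputer may be a better fit.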

Hands-On Data Preprocessing with TensorFlow

In this example, we’ll focus on image data preprocessing using TensorFlow’s tf.data API. We’ll perform normalization and augmentation on the MNIST dataset of handwritten digits.

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

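# Load the MNIST dataset (60,000 training and 10,000 test images, 28x28 pixels each)
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Add a channel dimension and scale pixel values from [0, 255] to [0, 1]
train_images = train_images.reshape((-1, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((-1, 28, 28, 1)).astype('float32') / 255

# One-hot encode the digit labels (10 classes)
train_labels = to_categorical(train_labels, num_classes=10)
test_labels = to_categorical(test_labels, num_classes=10)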

# Create TensorFlow datasets
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels))

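# Data preprocessing functions
def normalize(image, label):
    # Standardize each image to zero mean and unit variance
    image = tf.image.per_image_standardization(image)
    return image, label

def normalize_and_augment(image, label):
    # Random augmentations, for the training set only.
    # Note: flipping can turn a digit into an invalid shape, so the flips
    # are shown for illustration; consider dropping them for MNIST.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return normalize(image, label)

# Apply preprocessing functions to the datasets
train_dataset = (train_dataset
                 .shuffle(60000)
                 .map(normalize_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
                 .batch(64)
                 .prefetch(tf.data.AUTOTUNE))

# The test set is only normalized, never augmented or shuffled
test_dataset = (test_dataset
                .map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
                .batch(64)
                .prefetch(tf.data.AUTOTUNE))

# Print the shape of the first batch of preprocessed images
for images, labels in train_dataset.take(1):
    print('Preprocessed Image Shape:', images.shape)

In this script:

  • We load the MNIST dataset and perform the initial preprocessing steps: reshaping, scaling pixel values to [0, 1], and one-hot encoding the labels
  • TensorFlow datasets are created for both the training and testing sets
  • The normalize_and_augment function applies random flips, brightness and contrast adjustments, and standardization to each training image
  • The normalize function applies standardization only, so the test set is evaluated on images that have not been randomly altered
  • The training dataset is shuffled, then both datasets are mapped with their preprocessing function, batched, and prefetched so that data loading can overlap with training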
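If you’re curious what the standardization step does to each image, the following NumPy sketch mirrors the documented behavior of tf.image.per_image_standardization: subtract the image’s mean, then divide by its standard deviation, with the divisor floored at 1/sqrt(N) for an image of N values. The standardize_image helper and the random stand-in image are illustrative assumptions, not TensorFlow’s own implementation.

```python
import numpy as np

def standardize_image(image):
    """NumPy sketch of per-image standardization for a single image array."""
    image = image.astype(np.float32)
    mean = image.mean()
    # Floor the standard deviation at 1/sqrt(N) so a uniform image
    # (stddev == 0) does not cause a division by zero.
    min_stddev = 1.0 / np.sqrt(image.size)
    adjusted_stddev = max(image.std(), min_stddev)
    return (image - mean) / adjusted_stddev

# Hypothetical stand-in for a 28x28x1 MNIST image scaled to [0, 1].
rng = np.random.default_rng(seed=0)
fake_image = rng.random((28, 28, 1))

standardized = standardize_image(fake_image)
print('mean:', standardized.mean())
print('std: ', standardized.std())

# A blank image stays all zeros instead of turning into NaNs.
blank = standardize_image(np.zeros((28, 28, 1)))
print('blank image is all zeros:', bool(np.all(blank == 0)))
```

After this step each image has a mean close to 0 and a standard deviation close to 1, which means the small brightness and contrast shifts applied earlier end up changing the image’s shape in value space rather than its overall scale.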

What’s Next?

You’ve now gained hands-on experience with data preprocessing in TensorFlow. In the upcoming days, we’ll explore advanced neural network architectures and techniques for optimizing model performance.

Stay tuned for Day 5: Advanced Neural Network Architectures, where we’ll look at convolutional neural networks (CNNs) and their applications. Happy coding!
