Understanding Overfitting in Neural Networks (TensorFlow CNN)

📘 Understanding Overfitting in Neural Networks and Techniques to Prevent It

Using Fashion-MNIST Experiments

Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on the training dataset may fail to generalize to unseen data, leading to poor real-world performance. This post presents a structured investigation of overfitting using the Fashion-MNIST dataset and evaluates several mitigation strategies, including Dropout, L2 Regularisation, and Early Stopping.

All experiments, code, and plots in this post are taken directly from the accompanying notebook.


📂 Dataset Overview: Fashion-MNIST

The Fashion-MNIST dataset contains:

  • 60,000 training images
  • 10,000 test images
  • 28×28 grayscale format
  • 10 output classes

A significantly smaller subset of the training data is intentionally used to make overfitting behaviour more visible.
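
The notebook's exact subsetting code is not reproduced here, so below is a minimal sketch of how x_train_small and y_train_small (used in the experiments that follow) can be prepared. The subset size of 5,000 is an assumption, as are the normalisation and channel-axis steps:

import tensorflow as tf
from tensorflow import keras

# Load Fashion-MNIST (bundled with Keras).
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Scale pixels to [0, 1] and add a channel axis for the Conv2D layers.
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# Keep only a small slice of the training set so overfitting shows up quickly.
SUBSET_SIZE = 5000  # assumed value; the notebook may use a different size
x_train_small = x_train[:SUBSET_SIZE]
y_train_small = y_train[:SUBSET_SIZE]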


🧠 Model Architecture Used Throughout

All experiments share the same CNN architecture, with optional L2 regularisation and Dropout:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    """Build and compile the shared CNN, with optional L2 and Dropout."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),  # 28x28 grayscale Fashion-MNIST images
        layers.Conv2D(32, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),  # no-op when dropout_rate == 0.0
        layers.Dense(10, activation='softmax')
    ])

    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels, no one-hot needed
        metrics=["accuracy"]
    )
    return model

📊 Plotting Function

All performance diagrams were generated using the following utility:

import matplotlib.pyplot as plt

def plot_history(history, title_prefix=""):
    """Plot training vs validation loss and accuracy side by side."""
    hist = history.history
    plt.figure(figsize=(12, 5))

    # Left panel: loss curves
    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    # Right panel: accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()

🔍 1. Baseline Model (No Regularisation)

baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")

[Figure: Baseline performance plot]

Observations

  • Training accuracy continues to increase steadily.
  • Validation accuracy peaks early and then declines.
  • Training loss decreases, while validation loss increases.

This is clear evidence of overfitting.
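
One way to make the gap concrete is to read the final epoch's metrics out of the History object returned by fit — a minimal sketch, assuming the baseline run above:

hist = history_baseline.history
gap = hist["accuracy"][-1] - hist["val_accuracy"][-1]
print(f"Final train accuracy: {hist['accuracy'][-1]:.3f}")
print(f"Final val accuracy:   {hist['val_accuracy'][-1]:.3f}")
print(f"Generalisation gap:   {gap:.3f}")

A large positive gap, paired with rising validation loss, is the numerical signature of the overfitting visible in the curves.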


🛠 2. Dropout (rate = 0.5)

dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")

[Figure: Dropout performance plot]

Observations

  • Training accuracy increases more slowly (expected due to Dropout).
  • Validation accuracy tracks the training curve more closely.
  • Divergence between training and validation loss is significantly reduced.

Dropout is highly effective in this experiment, producing noticeably improved generalisation.
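
A detail worth knowing: Keras disables Dropout automatically at inference time, so evaluation always uses the full network. A small sketch illustrating the difference (x_batch is just an example slice of the training subset):

import tensorflow as tf

x_batch = x_train_small[:32]  # any small batch of images

# training=True keeps Dropout active, so repeated calls give different outputs.
p1 = dropout_model(x_batch, training=True)
p2 = dropout_model(x_batch, training=True)

# training=False (the default for model.predict) uses the full network.
p_eval = dropout_model(x_batch, training=False)

print("Training-mode outputs differ:", not tf.reduce_all(p1 == p2).numpy())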


🧱 3. L2 Regularisation (λ = 0.001)

l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")

[Figure: L2 regularisation performance plot]

Observations

  • Training loss is noticeably higher due to weight penalisation.
  • Validation loss trends are more stable compared to the baseline.
  • Validation accuracy improves moderately.

L2 regularisation produces smoother learning dynamics and alleviates overfitting, though its impact is milder than Dropout in this setup.
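
The penalty itself can be inspected: Keras tracks each layer's regularisation term in model.losses, so the total weight penalty added to the training loss is easy to print — a minimal sketch against the model trained above:

import tensorflow as tf

# Each kernel_regularizer contributes lambda * sum(w**2) to the loss;
# Keras collects these terms in model.losses once the model is built.
l2_penalty = tf.add_n(l2_model.losses)
print("Total L2 penalty added to the training loss:", float(l2_penalty))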


⏳ 4. Early Stopping

earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")

[Figure: Early stopping performance plot]

Observations

  • Training terminates after validation loss stops improving.
  • Avoids the late-epoch overfitting seen in the baseline.
  • Produces one of the cleanest validation curves among all models.

Early stopping is a simple and effective generalisation technique.
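
The callback also records where training halted, which makes the behaviour easy to verify — a small sketch (stopped_epoch is 0 if training ran all 20 epochs):

print("Epochs actually run:", len(history_early.history["loss"]))
print("Epoch at which training stopped:", early_stop.stopped_epoch)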


📦 (Optional) TensorFlow Lite Conversion

converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantisation
tflite_model = converter.convert()
print("Quantised model size (bytes):", len(tflite_model))

This step demonstrates model size reduction for deployment purposes, although it is not a regularisation strategy.


🧾 Conclusion

The experimental results highlight the following:

  • The baseline model exhibits clear overfitting.
  • Dropout provides the largest improvement in validation behaviour.
  • L2 regularisation helps stabilise training dynamics.
  • Early Stopping prevents late-epoch divergence and improves generalisation.

Combining Dropout + Early Stopping produces the most robust performance on the reduced Fashion-MNIST dataset.
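
As an illustrative sketch (not a run from the notebook), the two techniques combine directly with the helpers defined above:

combined_model = create_cnn_model(dropout_rate=0.5)
history_combined = combined_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True)]
)
plot_history(history_combined, title_prefix="Dropout + Early Stopping")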

