2 Benchmarking Deep Learning with Libraries

2.1 Transitioning from Custom MLP Architecture to Keras for Generalized Applications

The custom Multi-Layer Perceptron (MLP) class architecture implemented earlier provides a solid foundational understanding of how neural networks function, including layer definitions, activation mechanisms, forward and backward propagation, and performance evaluation. However, when scaling these implementations to generalizable and complex applications, certain limitations of such custom approaches become evident. This section introduces Keras, a widely adopted library for building and benchmarking Artificial Neural Networks (ANNs), highlighting its relevance as a practical alternative to custom implementations.

2.2 Limitations of the Custom MLP Class Architecture

While the MLP class offers flexibility and transparency for learning purposes, its limitations include:

2.2.1 Scalability

The manual initialization of weights, biases, and layer-specific computations can become cumbersome as the network depth and size increase.
Handling large datasets or multiple training tasks requires additional optimization techniques that are non-trivial to implement manually.

2.2.2 Optimization Challenges

Implementing advanced optimizers like RMSprop, Adam, or adaptive gradient methods demands significant coding effort.
Features like learning rate scheduling and early stopping require extensive additional logic.

2.2.3 Performance Bottlenecks

The current design lacks GPU acceleration, limiting its applicability to computationally intensive tasks.
Debugging and profiling performance manually can be error-prone and time-consuming.

2.2.4 Generalization Issues

While sufficient for specific tasks like XOR gate simulations, the architecture lacks modularity for handling diverse, general-purpose applications such as image recognition or text classification.
Integration with modern research architectures, such as convolutional or recurrent networks, is challenging.

2.2.5 Limited Ecosystem Support

Custom implementations do not leverage pre-trained models, a key requirement for applications in transfer learning and fine-tuning.

2.3 `Keras`: A Practical Alternative for Generalized Applications

Keras is a neural network Application Programming Interface (API) for Python that is tightly integrated with TensorFlow, which is used to build machine learning models. Keras’ models offer a simple, user-friendly way to define a neural network, which will then be built for you by TensorFlow.

2.3.1 Why `Keras`?

Keras addresses the limitations of from the scratch architecture by providing a high-level, modular, and extensible framework built on top of TensorFlow. Here’s how Keras improves upon the custom MLP class:

Ease of Use:

Pre-built layers and optimizers simplify network creation without compromising flexibility.
APIs for common tasks, such as dataset preprocessing, model saving, and loading, minimize boilerplate code.

Scalability and Performance:

Support for GPUs and TPUs ensures computational efficiency, especially for large datasets and deep architectures.
Built-in profiling tools facilitate real-time performance monitoring and debugging.

Modularity for Advanced Architectures:

Keras supports a wide range of network types, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers, which are impractical to implement from scratch.
The library integrates seamlessly with TensorFlow’s ecosystem, allowing access to tools like TensorBoard for visualization.

Pre-trained Models and Ecosystem:

Keras offers access to a library of state-of-the-art pre-trained models, enabling rapid prototyping and transfer learning.
The ecosystem includes community support and extensive documentation, enhancing usability and troubleshooting.

What’s the Difference Between Tensorflow and Keras?

TensorFlow is an open-source set of libraries for creating and working with neural networks, such as those used in Machine Learning (ML) and Deep Learning projects.

Keras, on the other hand, is a high-level API that runs on top of TensorFlow. Keras simplifies the implementation of complex neural networks with its easy to use framework.

When to Use Keras vs TensorFlow

TensorFlow provides a comprehensive machine learning platform that offers both high level and low level capabilities for building and deploying machine learning models. However, it does have a steep learning curve. It’s best used when you have a need for:

Deep learning research
Complex neural networks
Working with large datasets
High performance models

Keras, on the other hand, is perfect for those that do not have a strong background in Deep Learning, but still want to work with neural networks. Using Keras, you can build a neural network model quickly and easily using minimal code, allowing for rapid prototyping. For example:

# Import the Keras libraries required in this example: 
from keras.models import Sequential
from keras.layers import Dense, Activation
# Create a Sequential model: 
model = Sequential()
# Add layers with the add() method: 
model.add(Dense(32, input_dim=784)) 
model.add(Activation('relu'))

Keras is less error prone than TensorFlow, and models are more likely to be accurate with Keras than with TensorFlow. This is because Keras operates within the limitations of its framework, which include:

Computation speed: Keras sacrifices speed for user-friendliness.
Low-level Errors: sometimes you’ll get TensorFlow backend error messages that Keras was not designed to handle.
Algorithm Support: Keras is not well suited for working with certain basic machine learning algorithms and models like clustering and Principal Component Analysis (PCM).
Dynamic Charts: Keras has no support for dynamic chart creation.

2.3.2 Keras Model Overview

Models are the central entities in Keras, enabling the definition of TensorFlow neural networks by specifying attributes, functions, and layers. Keras provides multiple APIs for designing neural networks, catering to varying levels of complexity and use cases:

Sequential API:
- Allows building models layer by layer, suitable for most straightforward problems.
- Provides a simple list-based structure but is restricted to single-input, single-output stacks of layers.
Functional API:
- A comprehensive and flexible API supporting arbitrary model architectures.
- Ideal for creating complex models with multiple inputs, outputs, or shared layers.
Model Subclassing:
- Enables implementing models from scratch by subclassing the base Model class.
- Primarily used for research or highly specialized applications, though rarely needed for typical use cases.

2.4 Transition Example

2.4.1 Simulating XOR Gate with `Keras`

Revisiting the XOR gate example, here’s how it can be implemented using Keras:

Model definition

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(4, input_dim=2, activation='sigmoid')) #first hidden layer with 4 neurons
model.add(Dense(16, activation='sigmoid')) # second hidden layer with 16 neurons
model.add(Dense(1, activation='sigmoid'))
model.summary()

C:\Users\SIJUKSWAMY\AppData\Local\Programs\Python\Python312\Lib\site-packages\keras\src\layers\core\dense.py:87: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 4)              │            12 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 16)             │            80 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 109 (436.00 B)

 Trainable params: 109 (436.00 B)

 Non-trainable params: 0 (0.00 B)

Model compilation:

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Loading data:

import numpy as np
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y=np.array([[0], [1], [1], [0]])

Model training:

model.fit(X, y, epochs=100, batch_size=2,verbose=0)

<keras.src.callbacks.history.History at 0x24c3a72e810>

Model evaluation:

loss, accuracy = model.evaluate(X, y)
print(f"Accuracy: {accuracy}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 162ms/step - accuracy: 0.5000 - loss: 0.69431/1 ━━━━━━━━━━━━━━━━━━━━ 0s 171ms/step - accuracy: 0.5000 - loss: 0.6943
Accuracy: 0.5

Note

By using Keras, the manual implementation steps (e.g., weight initialization, forward/backward propagation) are abstracted, focusing instead on defining and optimizing the architecture. In this approach, the key words are: compile– fit–evaluate–predict.

2.4.2 Performance evaluation of the `keras` XOR gate model

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
# Predict the outputs
y_pred = (model.predict(X) > 0.5).astype(int)

# Calculate performance metrics
conf_matrix = confusion_matrix(y, y_pred)
accuracy = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)
f1 = f1_score(y, y_pred)

# Display results
print("Predictions:")
for inp, pred in zip(X, y_pred):
    print(f"Input: {inp}, Prediction: {pred}")

print("\nPerformance Metrics:")
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall (Sensitivity): {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

print("\nConfusion Matrix:")
print(conf_matrix)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 77ms/step
Predictions:
Input: [0 0], Prediction: [1]
Input: [0 1], Prediction: [1]
Input: [1 0], Prediction: [1]
Input: [1 1], Prediction: [1]

Performance Metrics:
Accuracy: 0.50
Precision: 0.50
Recall (Sensitivity): 1.00
F1 Score: 0.67

Confusion Matrix:
[[0 2]
 [0 2]]

2.4.3 Task works

Task 1: Simulate the OR and AND gates using the same approach and analyse the skill of the model using the performance metrices.

Task 2: Implement the regression task to predict the price of a pizza based on its radius using an MLP model in Keras

Solution:

Loading libraries:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import load_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,r2_score

Creating synthetic data

# Generate synthetic data: radius (X) and price (y)
np.random.seed(42)
radii = np.random.uniform(5, 20, 25)  # Random radii between 5 and 20 cm
prices = radii * 10 + np.random.normal(0, 5, 25)  # Price proportional to radius with some noise

X = radii.reshape(-1, 1)  # Feature: radius
y = prices.reshape(-1, 1)  # Target: price


# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=24003)

Creating a model

# Build the MLP model
model = Sequential()
model.add(Dense(16, input_dim=1, activation='relu')) # Hidden layer with 16 neurons
model.add(Dense(8, activation='relu'))# Hidden layer with 8 neurons
model.add(Dense(1)) # Output layer for regression

C:\Users\SIJUKSWAMY\AppData\Local\Programs\Python\Python312\Lib\site-packages\keras\src\layers\core\dense.py:87: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Compiling the model

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

Train the model

# Train the model and capture training history
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, verbose=0)

Plotting Model performance while training

# Plot training and validation loss
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss During Training')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()
plt.show()

Evaluate and save the model

# Evaluate the model
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

# Compute R-squared values
train_r2 = r2_score(y_train, train_predictions)
test_r2 = r2_score(y_test, test_predictions)
y_pred = model.predict(X_test)
print(f"Training R-squared: {train_r2:.2f}")
print(f"Testing R-squared: {test_r2:.2f}")
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Data: {mse:.2f}")

print(f"Training R-squared: {train_r2:.2f}")
print(f"Testing R-squared: {test_r2:.2f}")
# Save the model
model.save("pizza_price_model.h5")
print("Model saved as 'pizza_price_model.h5'.")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 74ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step

WARNING:absl:You are saving your model as an HDF5 file via `model.save()` or `keras.saving.save_model(model)`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')` or `keras.saving.save_model(model, 'my_model.keras')`.

Training R-squared: 0.98
Testing R-squared: 0.98
Mean Squared Error on Test Data: 47.82
Training R-squared: 0.98
Testing R-squared: 0.98
Model saved as 'pizza_price_model.h5'.

Using the saved model for prediction

# Load the saved model for future use
loaded_model = load_model("pizza_price_model.h5")
print("\nLoaded the saved model and testing it...")

test_radius = np.array([[12]])  # Example: predict price for a 12 cm pizza
predicted_price = loaded_model.predict(test_radius)
print(f"Predicted price for a pizza with radius {test_radius[0][0]} cm: ${predicted_price[0][0]:.2f}")

WARNING:absl:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


Loaded the saved model and testing it...
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 108ms/step
Predicted price for a pizza with radius 12 cm: $120.76