2.1 Transitioning from Custom MLP Architecture to Keras for Generalized Applications
The custom Multi-Layer Perceptron (MLP) class architecture implemented earlier provides a solid foundational understanding of how neural networks function, including layer definitions, activation mechanisms, forward and backward propagation, and performance evaluation. However, when scaling these implementations to generalizable and complex applications, certain limitations of such custom approaches become evident. This section introduces Keras, a widely adopted library for building and benchmarking Artificial Neural Networks (ANNs), highlighting its relevance as a practical alternative to custom implementations.
2.2 Limitations of the Custom MLP Class Architecture
While the MLP class offers flexibility and transparency for learning purposes, its limitations include:
2.2.1 Scalability
The manual initialization of weights, biases, and layer-specific computations can become cumbersome as the network depth and size increase.
Handling large datasets or multiple training tasks requires additional optimization techniques that are non-trivial to implement manually.
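To make the bookkeeping burden concrete, the following is a minimal NumPy sketch of manual parameter initialization and a forward pass for a 2-4-16-1 network. It is not the MLP class from the earlier section; the function names `init_params` and `forward` and the scaled-normal initialization are assumptions chosen only for illustration.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Create one (weights, bias) pair per pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Scaled normal initialization (an illustrative choice)
        W = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, X):
    """Sigmoid forward pass through every layer in order."""
    a = X
    for W, b in params:
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a

layer_sizes = [2, 4, 16, 1]  # inputs, two hidden layers, one output
params = init_params(layer_sizes)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
out = forward(params, X)

# Every weight and bias below must be tracked and updated by hand
n_values = sum(W.size + b.size for W, b in params)
print(out.shape, n_values)
```

Even this small network carries 109 values across three weight matrices and three bias vectors, and the backward pass would need matching gradient code for each of them.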
2.2.2 Optimization Challenges
Implementing advanced optimizers like RMSprop, Adam, or adaptive gradient methods demands significant coding effort.
Features like learning rate scheduling and early stopping require extensive additional logic.
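As an illustration of the coding effort involved, here is a minimal NumPy sketch of the Adam update rule applied to a toy quadratic. The function name `adam_step`, the toy objective, and the hyperparameter values are assumptions for this sketch; a production optimizer would also need per-layer state handling, scheduling, and numerical safeguards.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running moment estimates, t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = sum((w - 3)^2), whose gradient is 2 * (w - 3)
w = np.array([0.0, 10.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)  # both entries should end up close to 3
```

The optimizer state (`m`, `v`, and the step counter `t`) has to be stored and threaded through every parameter array of the network, which is where most of the manual effort goes.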
2.2.3 Performance Bottlenecks
The current design lacks GPU acceleration, limiting its applicability to computationally intensive tasks.
Debugging and profiling performance manually can be error-prone and time-consuming.
2.2.4 Generalization Issues
While sufficient for specific tasks like XOR gate simulations, the architecture lacks modularity for handling diverse, general-purpose applications such as image recognition or text classification.
Integration with modern research architectures, such as convolutional or recurrent networks, is challenging.
2.2.5 Limited Ecosystem Support
Custom implementations do not leverage pre-trained models, a key requirement for applications in transfer learning and fine-tuning.
2.3 Keras: A Practical Alternative for Generalized Applications
Keras is a high-level neural network Application Programming Interface (API) for Python that is tightly integrated with TensorFlow, a platform for building machine learning models. Keras models offer a simple, user-friendly way to define a neural network, which TensorFlow then builds and runs.
2.3.1 Why Keras?
Keras addresses the limitations of the from-scratch architecture by providing a high-level, modular, and extensible framework built on top of TensorFlow. Here’s how Keras improves upon the custom MLP class:
Ease of Use:
Pre-built layers and optimizers simplify network creation without compromising flexibility.
APIs for common tasks, such as dataset preprocessing, model saving, and loading, minimize boilerplate code.
Scalability and Performance:
Support for GPUs and TPUs ensures computational efficiency, especially for large datasets and deep architectures.
Built-in profiling tools facilitate real-time performance monitoring and debugging.
Modularity for Advanced Architectures:
Keras supports a wide range of network types, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers, which are impractical to implement from scratch.
The library integrates seamlessly with TensorFlow’s ecosystem, allowing access to tools like TensorBoard for visualization.
Pre-trained Models and Ecosystem:
Keras offers access to a library of state-of-the-art pre-trained models, enabling rapid prototyping and transfer learning.
The ecosystem includes community support and extensive documentation, enhancing usability and troubleshooting.
What’s the Difference Between TensorFlow and Keras?
TensorFlow is an open-source set of libraries for creating and working with neural networks, such as those used in Machine Learning (ML) and Deep Learning projects.
Keras, on the other hand, is a high-level API that runs on top of TensorFlow. Keras simplifies the implementation of complex neural networks with its easy-to-use framework.
[Figure: TensorFlow vs. Keras]
When to Use Keras vs TensorFlow
TensorFlow provides a comprehensive machine learning platform that offers both high-level and low-level capabilities for building and deploying machine learning models. However, it has a steep learning curve. It is best used when you need:
Deep learning research
Complex neural networks
Working with large datasets
High performance models
Keras, on the other hand, is well suited to those who do not have a strong background in Deep Learning but still want to work with neural networks. Using Keras, you can build a neural network model quickly with minimal code, allowing for rapid prototyping. For example:
# Import the Keras libraries required in this example:
from keras.models import Sequential
from keras.layers import Dense, Activation

# Create a Sequential model:
model = Sequential()

# Add layers with the add() method:
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
Because Keras hides much of the low-level detail, Keras code tends to be less error-prone than equivalent low-level TensorFlow code, and working models are easier to reach. This convenience comes at a cost, because Keras operates within the limitations of its framework, which include:
Computation speed: Keras sacrifices speed for user-friendliness.
Low-level Errors: sometimes you’ll get TensorFlow backend error messages that Keras was not designed to handle.
Algorithm Support: Keras is not well suited for working with certain basic machine learning algorithms and models like clustering and Principal Component Analysis (PCA).
Dynamic Charts: Keras has no support for dynamic chart creation.
2.3.2 Keras Model Overview
Models are the central entities in Keras, enabling the definition of TensorFlow neural networks by specifying attributes, functions, and layers. Keras provides multiple APIs for designing neural networks, catering to varying levels of complexity and use cases:
Sequential API:
Allows building models layer by layer, suitable for most straightforward problems.
Provides a simple list-based structure but is restricted to single-input, single-output stacks of layers.
Functional API:
A comprehensive and flexible API supporting arbitrary model architectures.
Ideal for creating complex models with multiple inputs, outputs, or shared layers.
Model Subclassing:
Enables implementing models from scratch by subclassing the base Model class.
Primarily used for research or highly specialized applications, though rarely needed for typical use cases.
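The three APIs can be compared by building the same small network with each one. This is a minimal sketch: the 2-4-1 layer sizes and the class name `SmallMLP` are assumptions chosen for illustration.

```python
import numpy as np
import keras
from keras import layers

# 1. Sequential API: a plain stack of layers
sequential_model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(4, activation="sigmoid"),
    layers.Dense(1, activation="sigmoid"),
])

# 2. Functional API: layers are called on tensors, so graphs can branch or merge
inputs = keras.Input(shape=(2,))
hidden = layers.Dense(4, activation="sigmoid")(inputs)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
functional_model = keras.Model(inputs=inputs, outputs=outputs)

# 3. Model subclassing: the forward pass is written by hand in call()
class SmallMLP(keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = layers.Dense(4, activation="sigmoid")
        self.out = layers.Dense(1, activation="sigmoid")

    def call(self, x):
        return self.out(self.hidden(x))

subclassed_model = SmallMLP()

# All three accept the same input and return one value per sample
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
models = (sequential_model, functional_model, subclassed_model)
shapes = [m.predict(X, verbose=0).shape for m in models]
print(shapes)
```

For a simple stack like this one the three forms are interchangeable; the Functional and subclassing forms only pay off once the architecture needs multiple inputs, shared layers, or custom control flow.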
2.4 Transition Example
2.4.1 Simulating XOR Gate with Keras
Revisiting the XOR gate example, here’s how it can be implemented using Keras:
Model definition
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(4, input_dim=2, activation='sigmoid'))  # first hidden layer with 4 neurons
model.add(Dense(16, activation='sigmoid'))  # second hidden layer with 16 neurons
model.add(Dense(1, activation='sigmoid'))  # output layer
model.summary()
C:\Users\SIJUKSWAMY\AppData\Local\Programs\Python\Python312\Lib\site-packages\keras\src\layers\core\dense.py:87: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
super().__init__(activity_regularizer=activity_regularizer, **kwargs)
By using Keras, the manual implementation steps (e.g., weight initialization, forward/backward propagation) are abstracted away, so the focus shifts to defining and optimizing the architecture. The workflow in this approach follows four key steps: compile, fit, evaluate, predict.
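The four steps can be sketched end to end on the XOR truth table. This is a minimal example, not a tuned solution: the Adam learning rate, the epoch count, and the random seed are assumptions, and a given run is not guaranteed to reach perfect accuracy. The first layer is declared through `keras.Input`, the form that recent Keras versions recommend over `input_dim`.

```python
import numpy as np
import keras
from keras import layers

keras.utils.set_random_seed(42)  # make the run repeatable

# XOR truth table: inputs and expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

# Same 4-16-1 layout as the model definition above
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(4, activation="sigmoid"),
    layers.Dense(16, activation="sigmoid"),
    layers.Dense(1, activation="sigmoid"),
])

# Compile: choose optimizer, loss, and metrics
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.05),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Fit: train on the four XOR samples
history = model.fit(X, y, epochs=500, verbose=0)

# Evaluate: loss and accuracy on the same four samples
loss, accuracy = model.evaluate(X, y, verbose=0)

# Predict: output probabilities, thresholded at 0.5 to obtain class labels
probabilities = model.predict(X, verbose=0)
labels = (probabilities > 0.5).astype(int)
print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")
print(labels.ravel())
```

If the printed labels do not match the XOR pattern, training longer or changing the learning rate are the first things to try.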
2.4.2 Performance evaluation of a Keras model: pizza price regression
import numpy as np
from sklearn.model_selection import train_test_split

# Generate synthetic data: radius (X) and price (y)
np.random.seed(42)
radii = np.random.uniform(5, 20, 25)  # Random radii between 5 and 20 cm
prices = radii * 10 + np.random.normal(0, 5, 25)  # Price proportional to radius with some noise
X = radii.reshape(-1, 1)  # Feature: radius
y = prices.reshape(-1, 1)  # Target: price

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=24003)
Creating a model
# Build the MLP model
model = Sequential()
model.add(Dense(16, input_dim=1, activation='relu'))  # Hidden layer with 16 neurons
model.add(Dense(8, activation='relu'))  # Hidden layer with 8 neurons
model.add(Dense(1))  # Output layer for regression
C:\Users\SIJUKSWAMY\AppData\Local\Programs\Python\Python312\Lib\site-packages\keras\src\layers\core\dense.py:87: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
super().__init__(activity_regularizer=activity_regularizer, **kwargs)
Compiling the model
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
Train the model
# Train the model and capture training history
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, verbose=0)
Plotting Model performance while training
import matplotlib.pyplot as plt

# Plot training and validation loss
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss During Training')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()
plt.show()
Evaluate and save the model
from sklearn.metrics import mean_squared_error, r2_score

# Evaluate the model
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

# Compute R-squared values and the test-set mean squared error
train_r2 = r2_score(y_train, train_predictions)
test_r2 = r2_score(y_test, test_predictions)
mse = mean_squared_error(y_test, test_predictions)
print(f"Training R-squared: {train_r2:.2f}")
print(f"Testing R-squared: {test_r2:.2f}")
print(f"Mean Squared Error on Test Data: {mse:.2f}")

# Save the model
model.save("pizza_price_model.h5")
print("Model saved as 'pizza_price_model.h5'.")
WARNING:absl:You are saving your model as an HDF5 file via `model.save()` or `keras.saving.save_model(model)`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')` or `keras.saving.save_model(model, 'my_model.keras')`.
Training R-squared: 0.98
Testing R-squared: 0.98
Mean Squared Error on Test Data: 47.82
Model saved as 'pizza_price_model.h5'.
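To clarify what the two reported metrics measure, the sketch below computes them directly from their definitions in NumPy. The helper names `mse_manual` and `r2_manual` and the three sample prices are hypothetical and unrelated to the pizza data above.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_manual(y_true, y_pred):
    """R-squared: 1 minus residual variation over total variation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical prices and predictions, for illustration only
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 200.0])

print(f"MSE: {mse_manual(y_true, y_pred):.2f}")      # MSE: 66.67
print(f"R-squared: {r2_manual(y_true, y_pred):.2f}") # R-squared: 0.96
```

MSE is expressed in squared target units, so its size depends on the price scale, while R-squared is unitless. This is why a model can show a high R-squared alongside an MSE that looks large in absolute terms.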
Using the saved model for prediction
import numpy as np
from keras.models import load_model

# Load the saved model for future use
loaded_model = load_model("pizza_price_model.h5")
print("\nLoaded the saved model and testing it...")
test_radius = np.array([[12]])  # Example: predict price for a 12 cm pizza
predicted_price = loaded_model.predict(test_radius)
print(f"Predicted price for a pizza with radius {test_radius[0][0]} cm: ${predicted_price[0][0]:.2f}")
WARNING:absl:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
Loaded the saved model and testing it...
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 108ms/step
Predicted price for a pizza with radius 12 cm: $120.76
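The HDF5 warning shown earlier recommends the native `.keras` format. The following minimal sketch shows a save-and-reload round trip in that format, assuming a Keras version that supports it. The small untrained model and the temporary directory are stand-ins for illustration, not the trained pizza model.

```python
import os
import tempfile

import numpy as np
import keras
from keras import layers

# Stand-in model with the same input/output shape as the pizza regressor
model = keras.Sequential([
    keras.Input(shape=(1,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

sample = np.array([[12.0]], dtype="float32")  # a 12 cm radius
before = model.predict(sample, verbose=0)

# Save in the native Keras format and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pizza_price_model.keras")
    model.save(path)
    restored = keras.models.load_model(path)

after = restored.predict(sample, verbose=0)
same = bool(np.allclose(before, after))
print(f"Predictions identical after reload: {same}")
```

In the workflow above, the equivalent change would be to use a `.keras` file name in both the `model.save` and `load_model` calls.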