Variational Autoencoders: Hyperparameter Tuning with Docker and Bash Scripts (Part 3)
Welcome to the final segment of our three-part series on Variational Autoencoders. After delving into the introduction and implementation in Part 1, and exploring training procedures in Part 2, we now turn our focus towards optimizing our model’s performance through hyperparameter tuning in Part 3. To access the complete code for this series, please visit our GitHub repository at https://github.com/asokraju/ImageAutoEncoder.
In any machine learning model, hyperparameters are the knobs and levers we adjust to get the best possible model performance. However, finding the right settings can be a bit like finding a needle in a haystack — time-consuming and, at times, downright perplexing! But don’t worry, we’ll guide you through the process in a simplified and straightforward manner.
In this segment, we’ll leverage Docker and Bash scripts to perform hyperparameter tuning of our Variational Autoencoder. Docker, a popular platform used for deploying applications, will help us create an isolated environment for our experiment, ensuring our results are reproducible. Meanwhile, Bash scripts will automate the tedious task of running our model with different hyperparameters, making the process more efficient.
So, are you ready to fine-tune your understanding of Variational Autoencoders and unlock the full potential of your model? Let’s dive into “Variational Autoencoders: Hyperparameter Tuning with Docker and Bash Scripts” together!
Don’t forget to revisit Part 1 and Part 2 if you need a refresher or want to revisit any concepts. Happy learning!
Hyperparameter Experiments
The development of any machine learning model often involves fine-tuning a range of hyperparameters. However, testing every possible combination manually would be a daunting task. That's where master.sh and worker.sh come in handy. These two bash scripts automate the process of trying out different hyperparameters and logging the results, saving us considerable time and effort.
master.sh is our control room, orchestrating the various combinations of hyperparameters we wish to test. It systematically loops through our predefined set of hyperparameters (in this case, learning rates, latent dimensions, and batch sizes) and, for each unique combination, calls the worker.sh script.
The worker.sh script is the worker on the ground. Each time it is invoked, it receives a unique combination of hyperparameters from master.sh, sets up a dedicated log directory for that experiment, and then runs our training script, train.py, with those specific hyperparameters. The log directory is named uniquely based on the hyperparameters used, so that we can easily identify the results of each experiment later.
With these two scripts in place, we can kick back and let our machine do the heavy lifting, running experiments with different hyperparameters and logging the results for us to analyze at our leisure.
# Contents of master.sh
#!/bin/bash -l
for learning_rates in 0.001
do
    for latent_dims in 6 8
    do
        for batch_sizes in 128
        do
            ./scripts/worker.sh $learning_rates $latent_dims $batch_sizes
        done
    done
done
Let’s now take a closer look at the details of these scripts.
Master Script:
The master.sh script's primary function is to loop through the different hyperparameter values we want to test for our model training and then call the worker.sh script to perform each experiment with the provided hyperparameters.
Let’s break down the steps:
- #!/bin/bash -l: This line, often known as a shebang, tells the system that this file is a bash script and should be executed as such.
- for learning_rates in 0.001: This begins a loop that iterates through the different learning rates. In this case it contains only one value, 0.001. You can add more values separated by spaces, for example for learning_rates in 0.001 0.01 0.1.
- for latent_dims in 6 8 and for batch_sizes in 128: These are additional loops for the other hyperparameters, latent dimensions and batch sizes.
- ./scripts/worker.sh $learning_rates $latent_dims $batch_sizes: This is the critical step, where the worker.sh script is invoked with the currently selected hyperparameters. These values are passed to worker.sh as positional arguments.
- done: Each of these closes a for loop. Since there are three for loops, there must be three done commands.
In essence, this script performs a hyperparameter search over the Cartesian product of the specified learning rates, latent dimensions, and batch sizes, running the worker.sh script for each combination.
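To get a feel for how many experiments a sweep will launch before actually starting it, a quick dry run that only prints each combination can help. The snippet below is a small sketch, not part of the repository, and the extra hyperparameter values are purely illustrative:
# dry_run.sh (hypothetical helper): print every combination instead of training
#!/bin/bash
count=0
for learning_rates in 0.001 0.01
do
    for latent_dims in 6 8
    do
        for batch_sizes in 64 128
        do
            count=$((count + 1))
            echo "run ${count}: lr=${learning_rates} latent_dim=${latent_dims} batch_size=${batch_sizes}"
        done
    done
done
echo "total experiments: ${count}"
With two values per hyperparameter this prints 2 × 2 × 2 = 8 runs; the master.sh shown above, with a single learning rate, two latent dimensions, and a single batch size, launches 1 × 2 × 1 = 2 experiments.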
Worker Script
The worker.sh script is designed to accept a set of hyperparameters as input, set up a unique log directory for the experiment, and then run the Python training script with those hyperparameters.
# contents of worker.sh
#!/bin/bash
learning_rate=$1
latent_dim=$2
batch_size=$3
PARENT_DIR="$(dirname $PWD)"
EXEC_DIR=$PWD
log_dir="logs/lr=${learning_rate}_latentdim=${latent_dim}_batchsize=${batch_size}"
mkdir -p $log_dir
echo "Current working directory is: $(pwd)"
python train.py --image-dir='../train_data' --learning-rate=${learning_rate} --latent-dim=${latent_dim} --batch-size=${batch_size} --logs-dir=${log_dir}
Here’s a detailed explanation of its steps:
- #!/bin/bash: Just like in the master.sh script, this shebang declares the file as a bash script.
- learning_rate=$1, latent_dim=$2, batch_size=$3: These lines capture the input arguments provided by master.sh and assign them to corresponding variables.
- PARENT_DIR="$(dirname $PWD)", EXEC_DIR=$PWD: Here we save the parent directory path and the current directory path into variables for potential future use.
- log_dir="logs/lr=${learning_rate}_latentdim=${latent_dim}_batchsize=${batch_size}", mkdir -p $log_dir: This pair of lines creates a unique directory to store the logs for the current set of hyperparameters. The -p flag in the mkdir command ensures that the entire directory path is created if it doesn't already exist.
- echo "Current working directory is: $(pwd)": This line simply prints the current working directory to the terminal for debugging purposes.
- The final line runs the Python training script with the chosen hyperparameters and specifies the log directory for this run:
python train.py --image-dir='../train_data' --learning-rate=${learning_rate} --latent-dim=${latent_dim} --batch-size=${batch_size} --logs-dir=${log_dir}
In summary, the worker.sh script performs a single experiment with a given set of hyperparameters, logs the experiment's output in a dedicated directory, and then terminates.
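Because worker.sh only relies on its three positional arguments, you can also invoke it on its own to try a single configuration without going through master.sh. The hyperparameter values below are arbitrary examples:
# Run one experiment by hand: learning rate 0.001, latent dimension 8, batch size 128
./scripts/worker.sh 0.001 8 128
# The logs for this run end up (relative to the current directory) in:
# logs/lr=0.001_latentdim=8_batchsize=128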
Docker Setup
The Dockerfile and docker-compose files are used in the context of Docker, a platform that allows you to package applications and their dependencies into isolated containers.
The Dockerfile is a text file that contains a set of instructions for building a Docker image. It defines the base image, sets the working directory, copies files into the image, installs dependencies, and specifies the command to run when the container is launched.
The docker-compose file, on the other hand, is used to define and manage multiple containers as a part of a single application. It allows you to define the services, their configurations, and how they interact with each other.
# Contents of Dockerfile
# Use an official TensorFlow runtime as a parent image
FROM tensorflow/tensorflow:latest
# Set the working directory to /autoencoders
WORKDIR /autoencoders
COPY . .
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install pyyaml
RUN chmod +x scripts/master.sh
RUN chmod +x scripts/worker.sh
# Run train.py when the container launches
CMD ["python", "train.py"]
Now, let’s go through each step in the Dockerfile:
- FROM tensorflow/tensorflow:latest: This line specifies the base image to use, which is the latest version of the official TensorFlow runtime image.
- WORKDIR /autoencoders: Sets the working directory inside the container to /autoencoders. This is where the subsequent commands will be executed.
- COPY . .: Copies all the files from the current directory (where the Dockerfile is located) to the /autoencoders directory inside the container.
- RUN pip install --no-cache-dir -r requirements.txt: Installs the Python packages specified in the requirements.txt file. The --no-cache-dir flag is used to avoid caching the package index inside the container.
- RUN pip install pyyaml: Installs the pyyaml package using pip. This package is likely required for some YAML-related functionality.
- RUN chmod +x scripts/master.sh and RUN chmod +x scripts/worker.sh: Change the permissions of the shell scripts master.sh and worker.sh to make them executable.
- CMD ["python", "train.py"]: Specifies the default command to run when the container is launched. In this case, it runs the train.py Python script using the Python interpreter.
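If you prefer not to use docker-compose, the same image can be built and the sweep launched with plain docker commands. The following is a sketch of an equivalent invocation; the image tag autoencoders-vae and the host data path are placeholders you would adapt to your setup:
# Build the image from the Dockerfile in the current directory
docker build -t autoencoders-vae .
# Run the sweep, mounting the project directory and the training data,
# and overriding the default CMD with the master script
docker run --rm \
    -v "$(pwd)":/autoencoders \
    -v /path/to/train_data:/train_data \
    autoencoders-vae ./scripts/master.sh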
# contents of requirements.txt
pandas==1.3.3
numpy==1.21.2
matplotlib==3.4.3
argparse==1.4.0
protobuf==3.20.*
tensorflow==2.7.0
pyyaml
Now, let’s move on to the docker-compose file:
- version: '3': Specifies the version of the docker-compose file format being used.
- services: Defines the services (containers) that make up the application.
- autoencoders: The name of the service.
- build: Specifies how to build the image for this service.
- context: .: Sets the build context to the current directory (where the docker-compose file is located).
- dockerfile: Dockerfile: Specifies the Dockerfile to use for building the image.
- ports: - "8080:80": Maps port 8080 on the host machine to port 80 in the container. This allows accessing a service running inside the container via localhost:8080.
- volumes: - ./:/autoencoders: Mounts the current directory on the host machine to the /autoencoders directory inside the container, ensuring that changes to files on the host are reflected inside the container.
- The bind-mount entry (type: bind, source: F:/train_data, target: /train_data): Binds the F:/train_data directory on the host machine to the /train_data directory inside the container, allowing access to the training data from within the container.
- command: ./scripts/master.sh: Specifies the command to run when starting the container. In this case, it runs the master.sh script located in the scripts directory.
# Contents of docker-compose.yml
version: '3'
services:
  autoencoders:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:80"
    volumes:
      - ./:/autoencoders
      - type: bind
        source: F:/train_data
        target: /train_data
    command: ./scripts/master.sh
In your docker-compose.yml file, you have specified two volumes. The first volume maps the current directory (where your docker-compose.yml file is located) on your host machine to the /autoencoders directory in your Docker container.
The second volume is a bind mount, which binds a directory or file from your host machine to a directory or file in your Docker container. In this case, you are binding the F:/train_data directory on your host machine to the /train_data directory in your Docker container.
This bind mount is significant because your training script (running inside the Docker container) expects to find the training data at /train_data. But since Docker containers are isolated from your host machine, you need a way to provide the training data to the script. The bind mount makes this possible by making the F:/train_data directory on your host machine available at /train_data in the Docker container.
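Before kicking off a long sweep, it can be worth checking that the mount is wired up correctly. A quick way to do this, assuming the service is named autoencoders as in the compose file above, is to start a one-off container and list the mounted data:
# Start a throwaway container from the autoencoders service and list the training data;
# you should see your image files rather than an empty directory or an error
docker-compose run --rm autoencoders ls /train_data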
However, not everyone who uses your scripts will have their training data at F:/train_data. That's why you need to instruct them to change this entry according to where their training data is located. They can replace F:/train_data with the path to their own training data. If their training data is located at C:/Users/user123/data, for example, they would need to change the file to:
# Contents of docker-compose.yml
version: '3'
services:
  autoencoders:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:80"
    volumes:
      - ./:/autoencoders
      - type: bind
        source: C:/Users/user123/data
        target: /train_data
    command: ./scripts/master.sh
These steps collectively define the Dockerfile and docker-compose file for building an image and running the associated container, enabling the training of autoencoders within a containerized environment.
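Finally, with the Dockerfile, docker-compose.yml, and the two bash scripts in place, the whole hyperparameter sweep can be started with a single command. Because the project directory is mounted into the container, the per-experiment log directories created by worker.sh appear under logs/ on the host once the runs finish:
# Build the image (if necessary) and start the hyperparameter sweep defined in master.sh
docker-compose up --build
# For the grid in master.sh above, this produces two log directories on the host:
# logs/lr=0.001_latentdim=6_batchsize=128
# logs/lr=0.001_latentdim=8_batchsize=128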