How to train a Deep Learning model with AWS Deep Learning Containers on Amazon EC2?

Data scientists and machine learning engineers often devote significant time and resources to building, testing, updating, and optimizing Docker images for deep learning. Instead of concentrating on developing and improving models, they are pulled into supporting tasks: installing packages, resolving compatibility issues, tuning performance, and integrating and testing with Amazon SageMaker, Amazon EC2, Amazon ECS, and Amazon EKS. AWS Deep Learning Containers (AWS DL Containers) provide deep learning Docker environments that are tested and optimized and require no installation, configuration, or maintenance. Deep learning practitioners using TensorFlow, PyTorch, or Apache MXNet will find everything they need packaged and optimized in these Docker images.

In this blog, we will explore AWS Deep Learning Containers, their features and benefits, and learn how to train a deep learning model with AWS Deep Learning Containers on Amazon EC2.

We will cover:

  • AWS Deep Learning Containers
  • Key features of AWS Deep Learning Containers
  • Benefits of AWS Deep Learning Containers
  • Pricing of AWS Deep Learning Containers
  • Hands-on
  • Customers of AWS Deep Learning Containers
  • Conclusion

AWS Deep Learning Containers

AWS Deep Learning Containers are Docker images pre-installed with deep learning frameworks. They let you deploy custom machine learning (ML) environments quickly by avoiding the time-consuming process of building and optimizing environments from scratch. AWS DL Containers support TensorFlow, PyTorch, and Apache MXNet. You can deploy them on Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), self-managed Kubernetes on Amazon EC2, and Amazon Elastic Container Service (Amazon ECS). The containers are free to use and are available on Amazon Elastic Container Registry (Amazon ECR) and AWS Marketplace. You pay only for the resources you use.

Key Features of AWS Deep Learning Containers

  • Customizable container images: Pre-packaged Docker container images are fully configured and validated.
  • Best performance and scalability without tuning: Support for TensorFlow, PyTorch, and Apache MXNet
  • Single and multi-node training and inference: Works with Amazon EKS, Amazon ECS, and Amazon EC2

Benefits of AWS Deep Learning Containers

Start building immediately: Deep learning environments may be deployed in minutes using pre-packaged Docker images. The images are completely tested and contain the essential deep learning framework libraries (currently TensorFlow, PyTorch, and Apache MXNet). For greater control over monitoring, compliance, and data processing, you can easily layer your own libraries and tools on top of these images.

Get the best performance automatically: AWS DL Containers include AWS optimizations and enhancements to the newest versions of popular frameworks and libraries, including TensorFlow, PyTorch, and Apache MXNet, to give the best training and inference performance in the cloud. AWS TensorFlow enhancements, for example, allow models to train up to twice as fast because of enhanced GPU scalability.

Quickly add machine learning to Kubernetes applications: AWS DL Containers are designed to work with Kubernetes on Amazon EC2. If you have Kubernetes applications running on Amazon EC2, you can use AWS DL Containers to add machine learning to those applications as a microservice.

Easily manage machine learning workflows: AWS DL Containers are tightly integrated with Amazon SageMaker, Amazon EKS, and Amazon ECS, so you can build custom machine learning workflows for training, validation, and deployment. Through this integration, Amazon EKS and Amazon ECS handle the container orchestration needed to deploy and scale AWS DL Containers on clusters of virtual machines.

Pricing of AWS Deep Learning Containers

AWS DL Containers are available at no additional charge. You pay only for the AWS resources you use, such as Amazon SageMaker, Amazon EC2, Amazon ECS, Amazon EKS, and other AWS services.

Hands-on

In this hands-on, we will see how to start training deep learning models with AWS Deep Learning Containers on Amazon EC2. Docker containers are a popular way to deploy custom ML environments that run consistently across multiple environments, and AWS DL Containers provide Docker images that are pre-installed and tested with the latest versions of popular deep learning frameworks and the libraries they require.

We will first create an IAM user with access to Amazon ECS and attach an inline policy for Amazon ECR to that user. After enabling auto-assignment of public IPv4 addresses on the subnet in our VPC, we will walk through the steps on the Amazon EC2 dashboard to configure a Deep Learning Base AMI instance and launch it with a new RSA key pair. On the command line, we will restrict the permissions on the private key file, connect to the instance via an SSH client, and configure the AWS CLI with the Access Key ID and Secret Access Key of the newly created user. Finally, we will log in to Amazon ECR, pull the Deep Learning Container image if it doesn’t exist locally, clone a repository that contains multiple example models, and test the setup by training an example model or any other model as per your requirements.

To implement this, we will do the following:

  • Log in to your AWS console and navigate to the dashboard.
  • Navigate to the IAM console and create a new IAM user with the required policies attached to it.
  • Attach existing policies and an inline policy to the IAM user.
  • Make a note of the Secret Access Key and Access Key ID.
  • Search for the VPC service and navigate to the subnets of the created VPC.
  • Switch on ‘Auto-assign IPv4 addresses’ for your subnets in the created VPC.
  • Navigate back to the EC2 dashboard and launch a new Deep Learning Base AMI instance.
  • Navigate through all the creation steps and change the configurations as shown in the below steps in this hands-on.
  • Review all the configurations and create a private new key pair and download the key while launching the instance.
  • Connect to the newly created EC2 instance either via EC2 Instance Connect or an SSH client.
  • Restrict the permissions on the .pem file in the folder where you downloaded the key pair.
  • SSH into the EC2 instance from the command line and execute the required commands.
  • On success, configure the AWS CLI with the Access Key ID and Secret Access Key.
  • Log in to Amazon ECR using the command provided in the hands-on.
  • Pull the Deep Learning Container image using a Docker command if it doesn’t exist locally.
  • Finally, clone a repository and train an example model to test the setup, or train your own models as per your requirements.
  • If you are following the hands-on only for learning purposes, make sure to delete all the resources you created along the way.

Log in to your AWS console and navigate to the dashboard.

Search for the IAM service.

You will be navigated to the IAM dashboard.

Now click on Users and navigate to the Users dashboard. Click on Add users.

You will be navigated to the Create user dashboard. Enter a user name for the new IAM user and select Programmatic access. Click on Next: Permissions.

Select Attach existing policies directly. Search for AmazonECS and select AmazonECS_FullAccess. Click on Next: Tags.

Enter tags for the new IAM user if needed. Click Next: Review.

Review all the changes and once done, click on Create user.

On success, you will see the screen as shown in the image below. Make sure to download the .csv file and save it in a secure place. Now, click Close.

Back on the Users dashboard, select the newly created user.

Click on Add inline policy.

Add the following policy in the JSON tab. Once done, click on Attach Policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "ecr:*",
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Review the policy changes and click on Create policy.

You will see the newly created policy attached to the newly created user.
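
The inline policy used in this hands-on allows every Amazon ECR action on every resource. If the user only needs to pull images, a narrower policy may be enough. The sketch below writes such a policy to a local file so that its contents can be pasted into the JSON tab. The file name ecr-pull-policy.json is a placeholder, and the action list is an assumption based on the permissions commonly needed for pulling images, so check it against your own requirements before relying on it.

```shell
# Sketch: write a pull-only Amazon ECR policy to a local file.
# The file name is a hypothetical placeholder.
policy_file="ecr-pull-policy.json"

cat > "$policy_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $policy_file"
```

Pushing images or managing repositories would need further actions, which is why this is only a starting point and not a drop-in replacement for every use case.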

Now, search for the VPC service.

Next, you need to ensure that “Enable auto-assign public IPv4 address” is checked (enabled) for your subnet. This is required because you will connect to your EC2 instance over SSH from your local machine. Without a public IPv4 address, you will not be able to reach the instance in this setup.

To enable/verify it, navigate to the Amazon VPC dashboard.

Click on “Subnets” on the left navigation pane under the “Virtual Private Cloud” section.

Select your subnet in which you will be creating your instance and click on “Actions”.

Select “Modify auto-assign IP settings”. Ensure that the checkbox for “Enable auto-assign public IPv4 address” is checked and click on “Save”.

On success, you will get a success message as shown in the below image.
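
The same subnet setting can also be changed from the AWS CLI. The following is a minimal sketch, assuming the CLI is configured with credentials that are allowed to modify subnets (the IAM user created earlier in this hands-on is not granted that permission). The function name and the subnet ID in the example call are placeholders.

```shell
# Sketch: enable auto-assign public IPv4 on a subnet from the CLI.
# enable_auto_assign_public_ip is an illustrative helper name.
enable_auto_assign_public_ip() {
  local subnet_id="$1"
  aws ec2 modify-subnet-attribute \
    --subnet-id "$subnet_id" \
    --map-public-ip-on-launch
}

# Example call (uncomment and substitute your own subnet ID;
# subnet-0123456789abcdef0 is a placeholder, not a real subnet):
# enable_auto_assign_public_ip "subnet-0123456789abcdef0"
```

The setting only affects instances launched into the subnet afterwards, so it needs to be in place before the EC2 instance is created.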

Now, search for the EC2 service and navigate to the EC2 dashboard.

Click on Instances and navigate to the Instances dashboard. Click on Launch instances.

Search for Deep Learning Base AMI.

Scroll down and select the Deep Learning Base AMI (Amazon Linux). Click on Select.

In the next step, select the instance type as c5.large. Click on Next: Configure Instance Details.

In the third step, configure the instance details or keep the default values. Click on Next: Add Storage.

Alter the storage requirements as per your needs. Click on Next: Add Tags.

Add tags for your EC2 instance if needed. Click on Next: Configure Security Groups.

Make sure the security group allows inbound SSH (port 22) from your IP address. Finally, click on Review and Launch.

Review the settings for your EC2 instance and click on Launch.

Select Create a new key pair, choose RSA as the key pair type, and enter a name for the key pair. Download the key pair by clicking the download button and finally click on Launch instance.

On success, you will see the screen as shown in the image below. Scroll down and click on View Instances.

You will see your newly launched instance in the Running state.

Navigate to the folder where you stored your EC2 key pair.

Open a command prompt or bash terminal in the same folder and execute the command:

chmod 0400 <your .pem filename>
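
SSH clients generally refuse a private key file that group or other users can read, which is why the permissions are restricted first. The sketch below shows one way to check this before connecting. It assumes GNU stat as found on Linux (BSD and macOS stat use different options), and key_mode_ok is an illustrative helper name. The demonstration runs on a throwaway file, not a real key.

```shell
# Sketch: check that a key file is not accessible by group or others.
# Assumes GNU stat (Linux); key_mode_ok is an illustrative helper name.
key_mode_ok() {
  local mode
  mode=$(stat -c '%a' "$1") || return 1
  case "$mode" in
    *00) return 0 ;;  # e.g. 400 or 600: owner-only access
    *)   return 1 ;;
  esac
}

# Demonstration on a temporary file instead of a real .pem file.
demo_key=$(mktemp)
chmod 0644 "$demo_key"
key_mode_ok "$demo_key" || echo "permissions too open"
chmod 0400 "$demo_key"
key_mode_ok "$demo_key" && echo "permissions ok"
rm -f "$demo_key"
```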

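Now, execute either of the two commands below to SSH into the newly created instance, replacing the key file name and the public DNS name with your own. The second command additionally forwards local port 8888 to port 8888 on the instance: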

ssh -i "ECS-ECR.pem" ec2-user@ec2-18-142-162-225.ap-southeast-1.compute.amazonaws.com

ssh -L localhost:8888:localhost:8888 -i ECS-ECR.pem ec2-user@ec2-18-142-162-225.ap-southeast-1.compute.amazonaws.com

Configure the AWS CLI with the credentials of the newly created user. Execute the command:

aws configure

Enter the AWS Access Key ID, the Secret Access Key, and the default region name.

Log in to Amazon ECR using the command:

$(aws ecr get-login --region ap-southeast-1 --no-include-email --registry-ids 763104351884)

The ‘$’ and the parentheses are part of the command: they make the shell execute the docker login command that aws ecr get-login prints. You will see ‘Login Succeeded’ when this step concludes.
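
The get-login subcommand exists only in AWS CLI version 1. With AWS CLI version 2, the usual replacement is aws ecr get-login-password piped into docker login. The following is a minimal sketch, assuming the same region and registry ID as in this hands-on. The function names are illustrative and not part of any AWS tooling.

```shell
# Sketch: ECR login for AWS CLI v2, where `aws ecr get-login` is unavailable.
# Function names are illustrative helpers.

# Build the registry host name from an account ID and a region.
ecr_registry() {
  local account_id="$1" region="$2"
  echo "${account_id}.dkr.ecr.${region}.amazonaws.com"
}

# Fetch a temporary password and hand it to docker login on stdin.
ecr_login() {
  local account_id="$1" region="$2"
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin \
      "$(ecr_registry "$account_id" "$region")"
}

# Example call with the values used in this hands-on:
# ecr_login 763104351884 ap-southeast-1
```

Passing the password on stdin keeps it out of the shell history and the process list, which is the main reason this form is preferred over evaluating a printed docker login command.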

You will now run an AWS Deep Learning Container image on your EC2 instance. The command below automatically pulls the Deep Learning Container image if it doesn’t exist locally.

docker run -it 763104351884.dkr.ecr.ap-southeast-1.amazonaws.com/tensorflow-training:1.13-cpu-py36-ubuntu16.04
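
The image name in the docker run command is made up of a registry host (account ID and region), a repository name, and a tag that encodes the framework version, CPU or GPU, Python version, and base OS. The sketch below assembles the same name from its parts, which can help when switching to another region or image. dlc_image_uri is an illustrative helper, and the account ID is simply the one used in this hands-on; it may differ in some regions, so check the published image list for yours.

```shell
# Sketch: assemble a Deep Learning Container image URI from its parts.
# dlc_image_uri is an illustrative helper, not an AWS tool.
dlc_image_uri() {
  local account_id="$1" region="$2" repository="$3" tag="$4"
  echo "${account_id}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}"
}

# Values taken from the docker run command in this hands-on.
image=$(dlc_image_uri 763104351884 ap-southeast-1 \
  tensorflow-training 1.13-cpu-py36-ubuntu16.04)
echo "$image"

# To start the container interactively (requires a prior ECR login):
# docker run -it "$image"
```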

On success, you will see the screen as shown in the image below.

We will clone the Keras repository, which includes example Python scripts to train models.

git clone https://github.com/fchollet/keras.git

Executing the command below, you will find the cloned repo in your current working directory.

ls

You can now start training your own models or any example models from the cloned repository.
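
To pick an example to train, it can help to list the scripts that the clone actually contains. The sketch below assumes the checkout has an examples/ directory with Python scripts. Newer revisions of the Keras repository may be organized differently, so treat the directory layout and the script name in the usage comment as assumptions. list_examples is an illustrative helper name.

```shell
# Sketch: list candidate example scripts in a cloned repository.
# Assumes an examples/ directory exists in the checkout.
list_examples() {
  local repo_dir="$1"
  if [ ! -d "$repo_dir/examples" ]; then
    echo "no examples/ directory in $repo_dir" >&2
    return 1
  fi
  find "$repo_dir/examples" -name '*.py' | sort
}

# Example usage after the clone:
# list_examples keras
# python keras/examples/mnist_cnn.py  # assumes this script exists in your checkout
```

If the helper reports that the directory is missing, checking out an older tag of the repository or training your own script are the alternatives.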

Customers of AWS Deep Learning Containers

Conclusion

In this blog, we learned about AWS Deep Learning Containers, their key features, benefits, pricing, and customers. We also saw how to start training deep learning models with AWS Deep Learning Containers on Amazon EC2. We first created an IAM user with access to Amazon ECS and attached an inline policy for Amazon ECR to that user. After enabling auto-assignment of public IPv4 addresses on the subnet, we walked through the steps on the Amazon EC2 dashboard to configure a Deep Learning Base AMI instance and launched it with a new RSA key pair. On the command line, we restricted the permissions on the private key file, connected to the instance via an SSH client, and configured the AWS CLI with the Access Key ID and Secret Access Key of the newly created user. Finally, we logged in to Amazon ECR, pulled the Deep Learning Container image, cloned a repository containing multiple example models, and tested the setup. We will discuss more use cases of AWS Deep Learning Containers in our upcoming blogs. Stay tuned for updates about our upcoming blogs on AWS and related technologies.

Meanwhile …

Keep Exploring -> Keep Learning -> Keep Mastering

This blog is part of our effort towards building a knowledgeable and kick-ass tech community. At Workfall, we strive to provide the best tech and pay opportunities to AWS-certified talents. If you’re looking to work with global clients, build kick-ass products while making big bucks doing so, give it a shot at workfall.com/partner today.
