Syncing Docker Images from DockerHub to a Private Harbor Registry with Python
In today’s world of continuous integration and deployment (CI/CD), managing and storing Docker images in a secure, reliable, and centralized registry is crucial for any development workflow. DockerHub is the most popular public registry for images, but many organizations prefer using private registries like Harbor to maintain control over their images and improve security. Storing images in your own private registry allows you to keep an archive of images your organization may still need in the event that they get removed from DockerHub. In this post, we’ll explore how to sync Docker images from DockerHub to a private Harbor registry and automate the process using Python.
I don’t own a boat. What is Harbor?
Harbor is an open-source container registry that provides a secure and scalable solution for storing and distributing container images. It supports both Docker images and Helm charts and provides several advanced features such as role-based access control (RBAC), image vulnerability scanning, replication, and more.
By syncing DockerHub images to a private Harbor registry, organizations can benefit from a private environment with greater control over their assets, enhanced security features, and improved reliability. With the frequent changes in today’s industry, it’s not uncommon to hear of a team removing old, deprecated, or vulnerable images from public registries. If your organization still needs those images, you may be out of luck without a backup somewhere. That’s where Harbor comes in.
Prerequisites
Before we dive into syncing Docker images, make sure you have the following prerequisites:
- Docker installed on your local machine.
- A Harbor registry set up. If you don’t have one, follow Harbor’s installation guide.
- Python 3.x installed on your machine.
- The Harbor API credentials (username and password) to authenticate with your Harbor registry.
Note: A DockerHub account may be required for pulling private images, but this post and the Python script won’t cover the steps to do that.
Step-by-Step Guide to Sync Docker Images from DockerHub to Harbor
1. Log in to DockerHub and Harbor
Before interacting with Harbor, you’ll need to authenticate from your terminal:
# Harbor Login
docker login <your-harbor-registry-url> -u <your-harbor-username> -p <your-harbor-password>
In the Python script, this login is done using the HARBOR_USERNAME and HARBOR_PASSWORD variables. This may not be the best or most secure solution, but it is a working solution for now. Once you’ve logged in, Docker stores your credentials locally and allows you to pull or push images as needed.
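As a rough sketch of how the login step might look in Python, the helper below shells out to the Docker CLI with subprocess and passes the password on stdin through the --password-stdin flag, so it does not show up in the process list or shell history. The script in this post has a run_command function, but its body is not shown here, so the signature below, along with the build_login_command and harbor_login names, are assumptions for illustration.

```python
import subprocess


def run_command(command, input_text=None):
    """Run a command given as a list of arguments and return its stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    (Assumed signature; the original run_command is not shown in the post.)
    """
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def build_login_command(registry, username):
    """Build a `docker login` command that reads the password from stdin."""
    return ["docker", "login", registry, "-u", username, "--password-stdin"]


def harbor_login(registry, username, password):
    """Log in to the Harbor registry through the Docker CLI (hypothetical helper)."""
    return run_command(build_login_command(registry, username), input_text=password)
```

Calling harbor_login(HARBOR_REGISTRY, HARBOR_USERNAME, HARBOR_PASSWORD) once at the start of a run would then cover every later push.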
2. Pull Docker Image from DockerHub
You can pull a Docker image from DockerHub using the docker pull command.
docker pull nginx:latest
This downloads the latest version of the nginx image from DockerHub.
3. Tag the Image for Harbor Registry
In order to push the image to your private Harbor registry, you need to tag it with your Harbor registry URL. This is done using the docker tag command.
docker tag nginx:latest <your-harbor-registry-url>/<your-project-name>/nginx:latest
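Replace <your-harbor-registry-url> with the actual URL of your Harbor registry and <your-project-name> with the name of the project in Harbor where you want to store the image.
Example: if your Harbor API endpoint is "http://harbor.ip.example.com/api/v2.0/projects", the registry host to use in the tag is harbor.ip.example.com, without the scheme or API path.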
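The tagging rule can be sketched in Python as two small helpers: one that pulls the registry host out of a Harbor API URL, and one that builds the full image reference. Both function names and the "myproject" project name are hypothetical and only here for illustration.

```python
from urllib.parse import urlparse


def registry_host_from_api_url(api_url):
    """Return the host (and port, if any) of a Harbor API URL."""
    return urlparse(api_url).netloc


def harbor_image_ref(registry, project, image, tag="latest"):
    """Build the reference Docker expects when pushing to a Harbor project."""
    return f"{registry}/{project}/{image}:{tag}"


# "myproject" is a placeholder project name.
host = registry_host_from_api_url("http://harbor.ip.example.com/api/v2.0/projects")
print(harbor_image_ref(host, "myproject", "nginx"))
```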
4. Push the Docker Image to Harbor
Now that the image is tagged correctly, you can push it to your Harbor registry.
docker push <your-harbor-registry-url>/<your-project-name>/nginx:latest
This command will upload the nginx:latest image to the specified project in Harbor. Once the push is complete, you can verify the image in your Harbor registry.
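The three manual commands (pull, tag, push) map naturally onto one Python function. This is a minimal sketch, assuming the Docker CLI is on the PATH and you are already logged in to Harbor. The sync_image and run names are made up for this example, and the runner parameter exists only so the command sequence can be checked without Docker installed.

```python
import subprocess


def run(command):
    """Run one Docker CLI command, raising CalledProcessError on failure."""
    subprocess.run(command, check=True)


def sync_image(image, tag, registry, project, runner=run):
    """Pull image:tag from DockerHub, retag it for Harbor, and push it.

    Returns the Harbor reference that was pushed.
    """
    source = f"{image}:{tag}"
    target = f"{registry}/{project}/{image}:{tag}"
    runner(["docker", "pull", source])
    runner(["docker", "tag", source, target])
    runner(["docker", "push", target])
    return target
```

For example, sync_image("nginx", "latest", "<your-harbor-registry-url>", "<your-project-name>") would repeat steps 2 through 4 above in one call.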
5. Automating the Sync with Python
Now that we know how to manually sync images, let’s automate the process with Python using the requests library. The Python script will allow you to pull images from DockerHub and push them to your Harbor registry programmatically.
Installing Required Python Libraries
To get started, you need to install the following Python package:
pip install requests
requests: Used for HTTP requests to interact with the Harbor API.
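To give a feel for what requests does here, below is a hedged sketch of listing an image's tags from DockerHub, parsing the JSON, and saving the names. It assumes the public DockerHub tags endpoint under hub.docker.com/v2/repositories and its "results"/"next" response fields. All function names are hypothetical, and the original script may do this differently.

```python
import json


def hub_tags_url(repository, page_size=100):
    """Build the DockerHub tags URL; official images live under `library/`."""
    if "/" not in repository:
        repository = f"library/{repository}"
    return f"https://hub.docker.com/v2/repositories/{repository}/tags?page_size={page_size}"


def parse_tag_names(payload):
    """Extract tag names from one page of the tags response."""
    return [entry["name"] for entry in payload.get("results", [])]


def list_tags(repository):
    """Fetch every tag name for a repository, following `next` page links."""
    # Third-party dependency; imported here so the helpers above work without it.
    import requests

    names, url = [], hub_tags_url(repository)
    while url:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        payload = response.json()
        names.extend(parse_tag_names(payload))
        url = payload.get("next")  # None on the last page ends the loop
    return names


def save_tags(names, path):
    """Save tag names to a JSON file for later runs."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(names, handle, indent=2)
```

Keeping the parsing separate from the HTTP call means the JSON handling can be exercised with a sample payload, without touching the network.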
Example Python Script
Here’s my Python script that automates the process of syncing Docker images from DockerHub to Harbor.
Where logic meets languages
The docker module for Python could be used to manage connecting to DockerHub and syncing Docker images. However, my approach doesn’t use it, because I have found that popular modules like this tend to cause headaches in the future when I come back to update the code. In my quest to learn the C programming language and sharpen my general programming skills with projects like this, I have found that many of my headaches in learning programming or maintaining old code come from dealing with dependencies and dependency hell. Whether it’s React or minimalistic Jekyll blog sites like this one, I find myself updating dependencies and spending more time handling the struggles of templates or modules that aren’t actively maintained. It has made me appreciate the C programming language and inspired a new approach to writing code. I could go on for days about this, but I think it can be summed up with this Linus Torvalds email response I found online. His response is in reference to C++ being used in the Linux kernel, which can also explain the struggles that current Rust maintainers have with implementing Rust into the Linux kernel.
He argues that using C++ libraries leads to bad design choices. More specifically, he warns of:
inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.
Many times, when learning to write code through tutorials or using a template to jumpstart a project, you fall to the mercy of another person’s design choices. Learning C, which can feel like learning to build a car from scratch, has shown me that sometimes even the most popular and useful modules, like Python’s docker module, may be a bad decision. In this case I find that lesson useful, because my example only authenticates to the Harbor registry once. If I wanted to connect to DockerHub, I could create a function that uses the current run_command function to authenticate if needed. There is no need to rely on a Python module that may get updated next week and change the authentication process of this small script. Once you start writing, your choices will start to form on their own. This, along with function names and comments that are clear and robotic, makes it easier to look at the script in the future and update it as necessary.
The requests module allows you to send HTTP requests easily and is used here because its features and functionality are solid and it makes this project much simpler. The complexity required to pull Docker image tags, parse through the tags in JSON, and save them could not have been handled without the requests module, at least not in a timely fashion, and for that reason it will be one of the few modules I choose to use. I have decided to shrink my use of modules to as few as possible, and to question my choices after 3 modules to validate the need for them. Working in large enterprise environments like I do now continues to show me why this is important, and it will be a standard I do my best to follow moving forward. Now that you have read all that, let’s get back to business.
Breakdown of the Script
The journey of creating something comes from the steps involved. When it comes to programming I have found that if you start with something simple and small, and expand over time, it makes the intimidating aspect of writing a big project a lot easier. This can be applied to anything in life really lol. This script was broken down into small problems that I needed functions to fix.
- Pulling the Image
- Tagging the Image
- Harbor Authentication
- Pushing the Image
6. Scheduling the Sync Task
If you need to periodically sync images from DockerHub to Harbor, you can schedule the Python script to run using AAP (Ansible Automation Platform) or by using a local cron job on Linux.
Example: on a Linux system, you can add a cron job by running crontab -e and adding the following line to sync the image every day at midnight:
0 0 * * * python3 /path/to/docker_image_snyc.py
Conclusion
Syncing Docker images from DockerHub to a private Harbor registry is a great way to secure and manage your containerized applications. By using Python, you can automate the process, making it easier to manage your Docker images and ensuring they are safely stored in your private Harbor registry.
With the provided script, you can change the following variables and extend the process to your own CI/CD pipeline. Enjoy it.
HARBOR_URL = "http://bigharborregistry.com/api/v2.0/projects"
HARBOR_REGISTRY = "bigharborregistry.com"
HARBOR_USERNAME = "automation"
HARBOR_PASSWORD = "RoBoTaCcOuNt16DiGiTpAsSwOrd"
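Since hardcoding the password may not be the most secure option, one possible variation is to read the same variables from the environment, which fits well with cron or a CI/CD pipeline. This is only a sketch: the load_harbor_config name is hypothetical, the defaults are the placeholder values above, and the script in this post keeps these as plain variables.

```python
import os


def load_harbor_config(env=os.environ):
    """Read Harbor settings from the environment, failing loudly on a missing password."""
    registry = env.get("HARBOR_REGISTRY", "bigharborregistry.com")
    config = {
        "HARBOR_REGISTRY": registry,
        # Default API URL is derived from the registry host.
        "HARBOR_URL": env.get("HARBOR_URL", f"http://{registry}/api/v2.0/projects"),
        "HARBOR_USERNAME": env.get("HARBOR_USERNAME", "automation"),
        "HARBOR_PASSWORD": env.get("HARBOR_PASSWORD"),
    }
    if not config["HARBOR_PASSWORD"]:
        raise RuntimeError("HARBOR_PASSWORD is not set")
    return config
```

The password then lives in the scheduler's or pipeline's secret store rather than in the script file.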