In this (two-part) post, we’re going to discuss how to deploy a common seismic processing tool in a slightly different setting. Namely, we are going to go through the steps to create and deploy a container on a remote machine (a so-called serverless deployment). This container will run programs from Madagascar, an open-source software suite for geophysical and signal processing applications. Why? I hear you ask. Well, there are a number of reasons:
- Hopefully, it will provide a useful how-to or introduction to some of the general concepts in building and deploying services, which in turn can be used to solve a variety of geophysical problems.
- Madagascar is very powerful and widely used, but it is also large and can be a bit fickle to build. So even if we were running it locally, it might make sense to have a repeatable and transferable installation that is isolated from the rest of our system.
- We want to be able to run large numbers of Madagascar programs in parallel on cloud systems in order to solve seismic processing problems, and container environments provide a very good way to do this.
- It will be cool… No… really, it will…
In terms of target audience, although the end goal is to be able to solve geophysical problems, the content of this post is more about computing. That said, in my experience the majority of geophysical problems are solved using a computer, so I think it often pays for the geophysicist to be somewhat computer savvy. Even if you are not a fully paid-up computer geek, I would invite you to persevere. I will try to avoid specialized content and keep this as simple as possible; however, some technical jargon is inevitable. In particular, some basic understanding of Docker and cloud deployment concepts is probably desirable if you want to follow along with me.
In this part, we’ll focus on building the image and pushing it to a remote repository so that we (and others) can access it and run Madagascar programs without actually having to install Madagascar on our system. Then in part 2 we will focus on running the image as a remote process on Amazon Web Services.
Let’s get started.
What is Madagascar?
For those not already aware of it, Madagascar is an open-source set of programs that perform many seismic processing tasks, both simple and complex. It is freely available and (at the time of writing) actively maintained, with a large user base. The core functionality is written in C/C++, but it also has bindings in Python as well as several other high-level programming languages. For details, check out the project’s GitHub repository at https://github.com/ahay/src.
Pre-requisites
For the material in this post you will need:
- A command line environment
- Docker installed on your machine – https://www.docker.com/products/docker-engine
Making the Docker Image
Docker images are great. A container started from an image behaves much like a lightweight, transferable virtual machine (strictly speaking, containers share the host’s kernel rather than emulating a whole machine), which lets you develop in an environment that is identical to production and isolated from the host operating system. This means you make it work once and it will keep working even if you have to move to a different production environment (e.g. your manager suddenly decides GCP is cooler than AWS). What is more, images can be stored and versioned in remote repositories (a bit like Git), which lets you transfer them between people and places. Finally, you can run multiple instances (containers) of the same image, which makes this a handy way to run parallel operations. There are plenty of other good reasons to use containers like Docker, and there is a lot of further information on the web; https://www.docker.com/ is a good starting point.
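Because every container is an independent instance of the same image, running jobs in parallel on one machine can be as simple as starting several containers at once. The sketch below assumes you have already built and tagged an image as described in the following sections; run_parallel is a hypothetical helper name and username/madagascar:latest is a placeholder for your own image. It only starts local containers with docker run; running on a cloud cluster is the subject of part 2.

```shell
#!/bin/sh
# Sketch: start one container per Madagascar pipeline and wait for all of them.
# IMAGE is a placeholder; point it at your own image.
IMAGE="${IMAGE:-username/madagascar:latest}"

run_parallel() {
  # Each argument is one complete Madagascar pipeline (a bash command line).
  for job in "$@"; do
    # --rm deletes the container when its command exits; & runs it in the background
    docker run --rm --entrypoint=/bin/bash "$IMAGE" -c "$job" &
  done
  # Block until every background container has finished
  wait
}
```

For example, run_parallel "sfspike n1=1000 k1=100 | sfattr" "sfspike n1=1000 k1=900 | sfattr" starts two containers and returns once both have exited; their output may be interleaved on the terminal.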
Download and install Docker
To run Docker containers, you first need to install the Docker daemon (Docker Engine). Go to https://www.docker.com/products/docker-engine and follow the instructions for your platform. The installation process is fairly self-explanatory. While you are at it, I would also advise creating an account on https://hub.docker.com/, which will allow you to distribute images and deploy them on remote machines.
Building Madagascar inside Docker
Docker images have to be built, and Docker builds always start with a Dockerfile. This is a text file usually called … wait for it … “Dockerfile”. It provides the instruction set for how to build the image. The Dockerfile I used to build Madagascar is as follows (you can also obtain the file from https://github.com/motionsignaltechnologies/madagascar-docker-tutorial1):
###################################################################
# Madagascar v2.0 build on Ubuntu 18.04
###################################################################
FROM ubuntu:18.04
###################################################################
# Install some basic pre-requisites
###################################################################
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libxaw7-dev freeglut3-dev libnetpbm10-dev libgd-dev libplplot-dev \
        libavcodec-dev libcairo2-dev libjpeg-dev swig python-dev python-numpy g++ gfortran \
        libopenmpi-dev libfftw3-dev libsuitesparse-dev python-epydoc scons \
        git wget ca-certificates openssl && \
    apt-get -y clean
####################################################################
# Madagascar build
####################################################################
RUN git clone https://github.com/ahay/src /madagascar-src && \
    cd /madagascar-src && \
    git checkout madagascar-2.0 && \
    ./configure API=c++,python --prefix=/usr/local && \
    make install && \
    rm -r /madagascar-src
RUN echo "\n" >> ~/.bashrc && \
    echo "source /usr/local/share/madagascar/etc/env.sh" >> ~/.bashrc
Anyone familiar with Linux and shell scripting should feel fairly at home with Dockerfile syntax: a Dockerfile is essentially just a list of the steps required to build the machine image you want to run. We’ll now go through this sequence for Madagascar and highlight some of the important bits. For instance, the lines:
FROM ubuntu:18.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libxaw7-dev freeglut3-dev libnetpbm10-dev libgd-dev libplplot-dev \
        libavcodec-dev libcairo2-dev libjpeg-dev swig python-dev python-numpy g++ gfortran \
        libopenmpi-dev libfftw3-dev libsuitesparse-dev python-epydoc scons \
        git wget ca-certificates openssl && \
    apt-get -y clean
define the base image (a minimal Ubuntu 18.04 installation) and install the packages required to build Madagascar. The actual build is accomplished with:
RUN git clone https://github.com/ahay/src /madagascar-src && \
    cd /madagascar-src && \
    git checkout madagascar-2.0 && \
    ./configure API=c++,python --prefix=/usr/local && \
    make install && \
    rm -r /madagascar-src
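which clones the Madagascar source from the Git repository, checks out the madagascar-2.0 branch, configures the build (with the C++ and Python APIs, installing under /usr/local), compiles and installs it, and finally removes the source directory. Doing all of this in a single RUN step means the source code is never stored in an image layer, which keeps the image smaller. The last RUN step appends a line to ~/.bashrc that sources Madagascar’s env.sh, so that interactive bash sessions in the container have Madagascar’s environment variables set.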
To build the image, go to the directory containing your Dockerfile and run the following command. Note that this will take a while to run; it does have to build Madagascar from source, after all.
docker build -t username/madagascar:latest .
where you will need to replace username with your Docker Hub username. Also note the trailing dot, which sets the build context: the directory whose contents are sent to Docker for the build, and where Docker looks for the Dockerfile by default. In username/madagascar:latest, username/madagascar is the image (repository) name and latest is the tag; together they give you a name to run this particular image by later. Usually you would use “latest” as the tag, and the image name will match the Docker Hub repository you want to push the image to, but if you just want to run the image on your local system these can be anything.
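If you want users to be able to pin a particular build while still offering a moving latest tag, docker build accepts several -t options, so one build can carry both names. A minimal sketch, assuming a version tag of 2.0 (chosen here only to match the madagascar-2.0 checkout in the Dockerfile) and a hypothetical helper named build_image:

```shell
#!/bin/sh
# Sketch: one build, two tags (a fixed version and "latest").
# DOCKER_USER and VERSION are placeholders for this example.
DOCKER_USER="${DOCKER_USER:-username}"
VERSION="${VERSION:-2.0}"

build_image() {
  # Run from the directory containing the Dockerfile; "." is the build context.
  docker build \
    -t "$DOCKER_USER/madagascar:$VERSION" \
    -t "$DOCKER_USER/madagascar:latest" \
    .
}
```

After build_image finishes, username/madagascar:2.0 and username/madagascar:latest refer to the same image, and each tag can be pushed with its own docker push.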
Once that is done, you can test the image on your local system using a command like:
docker run --entrypoint=/bin/bash username/madagascar:latest -c "sfspike n1=1000 d1=0.004 o1=-2. | sfattr"
*******************************************
rms = 1
mean = 1
2-norm = 31.6228
variance = 0
std dev = 0
max = 1 at 1
min = 1 at 1
nonzero samples = 1000
total samples = 1000
*******************************************
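The above command starts a container from the image, runs the following pipeline, and then exits. Here sfspike generates a 1000-sample trace and sfattr prints its statistics. (Adding --rm to docker run deletes the container once it exits, instead of leaving the stopped container on your system.)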
sfspike n1=1000 d1=0.004 o1=-2. | sfattr
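As a quick sanity check on the output above: sfspike is given no k1 here, and the statistics show that every one of the n = 1000 samples equals 1. With x_i = 1 for all i, the reported values follow directly:

```latex
\begin{aligned}
\text{mean} &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1000}{1000} = 1,\\
\text{rms} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} = 1,\\
\lVert x \rVert_2 &= \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{1000} \approx 31.6228.
\end{aligned}
```

The variance and standard deviation are zero because every sample equals the mean.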
Distributing the image
We are now ready to make the image available elsewhere. Assuming you have an account on Docker Hub, create a repository (following the instructions on Docker Hub), log in from the command line with docker login, and push the image using something like:
docker push username/madagascar:latest
Now anyone with Docker installed can run the Madagascar programs using your image.
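To check that the published image really is usable elsewhere, a second machine only needs Docker. A minimal sketch, with pull_and_check as a hypothetical helper and the image name as a placeholder. docker run would pull a missing image automatically, but an explicit docker pull also fetches a newer latest if one has been pushed since the last download.

```shell
#!/bin/sh
# Sketch: fetch the published image and run a quick smoke test.
# IMAGE is a placeholder for your Docker Hub repository and tag.
IMAGE="${IMAGE:-username/madagascar:latest}"

pull_and_check() {
  # Only run the test pipeline if the pull succeeded
  docker pull "$IMAGE" &&
    docker run --rm --entrypoint=/bin/bash "$IMAGE" \
      -c "sfspike n1=1000 d1=0.004 o1=-2. | sfattr"
}
```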
Using the Image: Madagascar Examples
Running a single command
We have already used this in order to test the container, but I am repeating it here in case anyone has just jumped to the end.
docker run --entrypoint=/bin/bash username/madagascar:latest -c "sfspike n1=1000 d1=0.004 o1=-2. | sfattr"
Using a shared directory
Containers are effectively stateless: anything written inside a container lives in that container’s own writable layer, never in the image, and is lost once the container is removed. However, you can mount directories from the host machine and read and write data there. For example, the following command makes a JPEG image in the current directory using the Madagascar image we have built.
docker run --entrypoint=/bin/bash -v "$(pwd)":/work/ username/madagascar:latest -c "cd /work && sfspike n1=1000 d1=0.004 o1=-2. k1=501 | sfbandpass fhi=2. phase=1 | sfgraph title='Welcome to Madagascar' | jpegpen > plot.jpg"
In order to do this, we have added the -v "$(pwd)":/work/
option, which tells Docker to mount the current directory as a shared folder under /work/. We have also changed the command to
cd /work && sfspike n1=1000 d1=0.004 o1=-2. k1=501 | sfbandpass fhi=2. phase=1 | sfgraph title='Welcome to Madagascar' | jpegpen > plot.jpg
which changes to the shared directory, makes a nice plot of a band-passed spike using sfspike, sfbandpass, and sfgraph, and converts it to JPEG with jpegpen.
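The sfspike parameters describe a regular axis: n1 samples starting at o1 with spacing d1, so sample k sits at o1 + (k − 1)d1 (assuming, as Madagascar’s 1-based sample reporting suggests, that k1 counts from 1). Reading axis 1 as time in seconds, the numbers used above work out as follows, which puts the spike exactly at zero time:

```latex
\begin{aligned}
t_k &= o_1 + (k-1)\,d_1,\\
t_1 &= -2\ \text{s},\\
t_{1000} &= -2 + 999 \times 0.004 = 1.996\ \text{s},\\
t_{501} &= -2 + 500 \times 0.004 = 0\ \text{s}.
\end{aligned}
```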
Open a bash shell
If you just want to play with Madagascar commands from the bash shell, you can use something like:
docker run --entrypoint=/bin/bash -it username/madagascar:latest
which will open up a bash shell in the running container.
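If you switch between one-off commands and interactive sessions often, a small shell function can hide the repeated docker options. This is a sketch rather than part of the original setup: mada is a hypothetical name, the image name is a placeholder, and setting DATAPATH (the variable Madagascar uses to decide where the binary part of .rsf files is written) to the mounted directory is an assumption made so that those files stay on the host.

```shell
#!/bin/sh
# Sketch: run Madagascar against the current directory, mounted at /work.
# "mada" is a hypothetical helper; IMAGE is a placeholder.
IMAGE="${IMAGE:-username/madagascar:latest}"

mada() {
  if [ "$#" -eq 0 ]; then
    # No arguments: open an interactive bash shell inside /work
    docker run --rm -it -v "$(pwd)":/work -w /work -e DATAPATH=/work/ \
      --entrypoint=/bin/bash "$IMAGE"
  else
    # Arguments: run them as one bash command line inside /work, then exit
    docker run --rm -v "$(pwd)":/work -w /work -e DATAPATH=/work/ \
      --entrypoint=/bin/bash "$IMAGE" -c "$*"
  fi
}
```

With this, mada on its own opens a shell in /work, while mada "sfspike n1=1000 | sfattr" runs a single pipeline and exits. The -w option sets the container’s working directory, so no cd is needed.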
Open a Python shell
Similarly, for a Python shell, use:
docker run --entrypoint=/usr/bin/python -it username/madagascar:latest
Summary
We have now built a Madagascar image inside Docker and pushed it to a remote repository, so that we and others can access it from different locations. That’s it for part 1 of this two-part post. If this was your first go with Docker, then congratulations: you have taken your first steps into a larger world. In part 2, we’ll discuss how to put this knowledge to good use by running commands in the image on Amazon’s cloud (specifically AWS Fargate, its serverless container service).
Stay tuned for further updates.
Acknowledgments
Obviously, this post would not have been possible without the communities which develop and maintain tools like Docker and Madagascar. I would also like to thank Mateo Ravasi and Carlos Alberto da Costa Filho for their suggestions on how to improve this post.