Tips for DevOps Pipeline Automation & Bash Scripting
DevOps CI/CD pipelines on platforms such as GitHub or Azure DevOps are basically shell scripts that run in the cloud and are triggered by events, e.g., a Git push. This article explains common mistakes with pipeline scripts and how to avoid them.
Common Errors in Bash Scripts
Globbing (Pattern Matching With Wildcards)
What is Globbing?
Fun fact: the term globbing originates from the glob command, a separate program to expand wildcard characters that was part of early versions of Unix. The name glob is short for global (source).
The one thing that is important to know and remember: wildcards are processed by the shell.
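To make that concrete, here is a minimal sketch that runs in a scratch directory. The file names are made up for illustration. It shows that the command only ever receives the already expanded list, and that quoting the pattern switches the expansion off.

```bash
#!/usr/bin/env bash
# Create a scratch directory with three files (names are illustrative).
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.log"

# The shell replaces the pattern with the matching names before printf runs,
# so printf receives two separate arguments and never sees the asterisk.
expanded=$(cd "$workdir" && printf '%s\n' *.txt)
echo "$expanded"

# Quoting suppresses the expansion: printf receives the literal pattern.
literal=$(cd "$workdir" && printf '%s\n' '*.txt')
echo "$literal"

rm -rf "$workdir"
```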
Globbing Limits
When the shell expands wildcards, it takes an innocent-looking command like rm dir/* and creates a monster command line where the wildcard is replaced by a list of all matching files and directories. As you can imagine, with large numbers of files, such command lines can become quite long. Now, if you’re wondering if there isn’t some kind of limit, you’re on the right track. Of course there is.
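When you hit the limit, you’ll get the error message Argument list too long. Strictly speaking, the limit is not imposed by globbing itself but by the operating system, which caps the total size of the arguments that can be passed to a newly started program (ARG_MAX). That is why it hits external commands such as rm or cp. A Google search for linux bash "argument list too long" yields 475,000 results. Apparently, this is quite a common issue. It’s also rather devious: your script might work perfectly for months and years only to fail unexpectedly when the number of files in a directory processed by the script exceeds the magic limit.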
Recommendation: Don’t Do It
That is why I recommend staying away from globbing unless you can be sure that the number of files in directories processed by your script does not increase beyond the numbers you tested with.
There are various alternatives to globbing that are explained in the answers to this Stack Overflow question, for example.
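One common alternative is to let find walk the directory instead of having the shell build the argument list. The following is a sketch under assumptions: the directory layout and the .tmp suffix are invented for the demo, and -maxdepth is an extension that GNU and BSD find support but POSIX does not require.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir "$workdir/dir"
# A few sample files; a real directory might hold hundreds of thousands.
for i in 1 2 3; do touch "$workdir/dir/file$i.tmp"; done
touch "$workdir/dir/keep.txt"

# The limit that "rm dir/*" can run into, in bytes.
echo "ARG_MAX: $(getconf ARG_MAX)"

# find reads the directory itself, so the shell never builds a long command line.
# The pattern is quoted so that find, not the shell, interprets it.
# "-exec ... {} +" batches the names into as many rm invocations as needed.
find "$workdir/dir" -maxdepth 1 -type f -name '*.tmp' -exec rm -f {} +

remaining=$(ls -A "$workdir/dir")
echo "$remaining"

rm -rf "$workdir"
```

Because find splits the work into batches that fit the limit, the command keeps working no matter how many files accumulate.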
Wildcards and Quotes
If you decide you do want to use globbing, remember that the shell does not expand wildcards inside quotes. The wildcard must therefore be placed outside of the quotes, which is quite counterintuitive.
Wrong:
# This only tries to copy a single file literally named *
cp "source dir/*" target/
Right:
# This copies all files in the directory "source dir"
cp "source dir"/* target/
Copying the Contents of a Directory Recursively
Sometimes, the simple things are the most difficult. This is certainly true for copying only the contents of a directory, in other words: copying everything within a directory to a different directory.
We don’t want to use globbing, and we want the command to stay readable. Luckily, the cp command (in its GNU version, which is the default on Linux) has the -T parameter for this task. Example:
cp -rT source/ dest/
By the way, if the destination directory doesn’t exist, the above command creates it.
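Here is a small demo of that behavior, assuming GNU cp (the -T switch is not available in the BSD/macOS version). The directory and file names are placeholders.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir -p "$workdir/source/sub"
touch "$workdir/source/file.txt" "$workdir/source/.hidden" "$workdir/source/sub/nested.txt"

# dest does not exist yet; cp creates it and copies the contents of source into it.
cp -rT "$workdir/source" "$workdir/dest"

# Running the same command again updates dest instead of creating dest/source.
cp -rT "$workdir/source" "$workdir/dest"

# List everything that ended up in dest, hidden files included.
copied=$(cd "$workdir/dest" && find . -mindepth 1 | LC_ALL=C sort)
echo "$copied"

rm -rf "$workdir"
```

Without -T, the second run would have placed a copy of source inside dest, which is exactly the surprise the switch avoids.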
Deleting the Contents of a Directory Recursively
Deleting the contents of a directory recursively without resorting to globbing is less straightforward because the rm command doesn’t have an equivalent to cp’s -T switch. Instead, it’s often easiest to delete the parent directory altogether (even though we really want to keep it) and recreate it subsequently – or have a command such as cp recreate it for us.
Example of how to delete a directory along with its contents:
rm -rf source
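The delete-and-recreate approach can be wrapped in a small function. This is only a sketch: empty_dir is a hypothetical helper name, and the paths are made up for the demo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: empty a directory by deleting and recreating it.
empty_dir() {
  # Refuse an empty argument, e.g. one that comes from an unset variable.
  if [ -z "${1:-}" ]; then
    echo "empty_dir: no directory given" >&2
    return 1
  fi
  rm -rf -- "$1" && mkdir -p -- "$1"
}

workdir=$(mktemp -d)
mkdir -p "$workdir/cache/sub"
touch "$workdir/cache/file.txt" "$workdir/cache/.hidden" "$workdir/cache/sub/nested.txt"

empty_dir "$workdir/cache"

# The directory is back, and nothing is left inside it (hidden files included).
dir_exists=no
[ -d "$workdir/cache" ] && dir_exists=yes
left_over=$(ls -A "$workdir/cache")
echo "directory exists: $dir_exists, entries left: ${left_over:-none}"

rm -rf "$workdir"
```

Keep in mind that the recreated directory gets default ownership and permissions. If the original directory had special settings, they need to be restored separately.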
Facilitate Debugging of Pipeline Scripts
Pipeline scripts run silently most of the time; their console output is normally disregarded completely – until the day they fail, and you need to debug their innards. The following tips greatly facilitate the debugging process.
Bash: Print a Trace of Commands
Specify set -x at the beginning of Bash scripts to configure the shell to print commands as they are executed. This way, you see the command in addition to its output.
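The following sketch shows what tracing looks like in practice. build_step and the artifact name are invented for illustration. Note that the shell writes the trace to stderr, so it does not get mixed into output you capture from stdout.

```bash
#!/usr/bin/env bash
# Hypothetical build step, used only for illustration.
build_step() {
  set -x                        # start tracing
  artifact="site.tar.gz"
  echo "packaging $artifact"
  { set +x; } 2>/dev/null       # stop tracing without logging the "set +x" itself
}

# Regular output goes to stdout; the trace goes to stderr.
output=$(build_step 2>/dev/null)
trace=$(build_step 2>&1 >/dev/null)

echo "stdout: $output"
echo "stderr trace:"
echo "$trace"
```

In a real pipeline script you would usually just put set -x on the first line and leave it on for the whole run.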
Verbose Command Output
Many commands have a -v switch that enables more verbose output. As a rule of thumb, if the -v switch is available, use it. Commands with a verbose switch include cp, mv, and git add.
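As a quick illustration, this sketch copies and renames a throwaway file with -v and captures what the commands report. The file names are placeholders, and the exact wording of the messages differs between GNU and BSD tools.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
echo "data" > "$workdir/report.txt"

# With -v, cp and mv print one line per file they process.
cp_log=$(cp -v "$workdir/report.txt" "$workdir/report-copy.txt")
mv_log=$(mv -v "$workdir/report-copy.txt" "$workdir/report-final.txt")
echo "$cp_log"
echo "$mv_log"

rm -rf "$workdir"
```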
Git
Many pipeline scripts modify a Git repository. The following tips help with that.
Track Deletions With Git Add
You’ll sometimes want to delete files in a locally checked-out copy of a Git repository. When you do that, remember that git add may ignore deletions: Git versions before 2.0 did so by default when a path was given, so the files you deleted locally were still present in the repository even after you added and committed your changes. Since Git 2.0, git add with a path also records deletions, but pipeline agents don’t always run the version you expect.
To instruct git add to consider all kinds of changes, including deletions, regardless of the Git version, specify the --all parameter. From the docs: This adds, modifies, and removes index entries to match the working tree.
Example:
git add --all -v "dir/*"
In this case, the asterisk (wildcard character) for once can (and should) be placed inside the quotes because it is processed by the git command.
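To see the effect, here is a sketch that builds a throwaway repository, deletes one file, adds another, and then inspects what got staged. The repository layout, file names, and the Demo identity are all invented for the demo.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
repo="$workdir/repo"
git init -q "$repo"
mkdir "$repo/dir"
echo one > "$repo/dir/one.txt"
echo two > "$repo/dir/two.txt"
git -C "$repo" add --all
# Identity passed per command so the demo does not touch any Git configuration.
git -C "$repo" -c user.name="Demo" -c user.email="demo@example.invalid" \
  commit -q -m "Add files"

# Delete one file and create another in the working tree.
rm "$repo/dir/one.txt"
echo three > "$repo/dir/three.txt"

# The quoted pattern reaches git unexpanded; git matches it itself,
# including against the deleted file that is still recorded in the index.
git -C "$repo" add --all -v "dir/*"

# Show what is staged for the next commit (status letter plus path).
staged=$(git -C "$repo" diff --cached --name-status)
echo "$staged"

rm -rf "$workdir"
```

The staged list should contain both the deletion and the addition, which is what a subsequent commit in the pipeline would record.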
User Identity for Git Commits in Pipelines
Variant A: Specify the User Who Triggered a Pipeline
When you commit a change to a Git repository, Git requests a user identity which it stores as part of the commit. CI/CD pipelines can be triggered by different developers or even be executed on a schedule without user intervention at all. It is, therefore, not exactly a best practice to hard-code the Git user name and email in your pipeline scripts.
On Azure DevOps, there is an elegant alternative with the variables $BUILD_REQUESTEDFOR and $BUILD_REQUESTEDFOREMAIL. They can be used as follows in YAML pipeline definition files:
- bash: |
    # Print executed commands
    set -x
    # The email is empty for the Azure DevOps system identity (Microsoft.VisualStudio.Services.TFS)
    if [[ "$BUILD_REQUESTEDFOREMAIL" == "" ]]; then
      BUILD_REQUESTEDFOREMAIL="[email protected]"
    fi
    # Specify Git user
    git config --global user.email "$BUILD_REQUESTEDFOREMAIL"
    git config --global user.name "$BUILD_REQUESTEDFOR"
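The empty-value check can also be written with Bash parameter expansion. The sketch below is one possible variant, not part of Azure DevOps itself: resolve_identity, fallback_name, and fallback_email are hypothetical names, and the fallback address is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical fallback identity; replace with values that suit your project.
fallback_name="Pipeline"
fallback_email="pipeline@example.invalid"

# Hypothetical helper: prints "Name <email>" for the given values.
# ${1:-default} yields the default when the argument is unset or empty.
resolve_identity() {
  printf '%s <%s>\n' "${1:-$fallback_name}" "${2:-$fallback_email}"
}

# In a pipeline, the two Azure DevOps variables would be passed in;
# here they may simply be unset.
resolve_identity "${BUILD_REQUESTEDFOR:-}" "${BUILD_REQUESTEDFOREMAIL:-}"
```

The same expansion works directly in the git config calls, for example "${BUILD_REQUESTEDFOREMAIL:-$fallback_email}", which saves the if block.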
Variant B: Specify a System User
You might want to be able to identify Git commits that originated from CI/CD pipelines. If that is the case, simply specify any user name and email in the Git options. The email needn’t be a real address, by the way.
Example:
- bash: |
    # Print executed commands
    set -x
    # Specify Git user
    git config --global user.email "[email protected]"
    git config --global user.name "DevOps"
1 Comment
Thanks, Helge.
It’s good to get a reminder of how carelessly we tend to throw effective code to the wind when working.
Code that works <= effective code. I still amuse myself getting that regex perfect, or throw large SQL queries off using wildcards.
It forced me to remember that just because something is easier for me to code, that doesn’t make it any more effective for the computer,
regardless of client shell, language, or platform.