Tips for DevOps Pipeline Automation & Bash Scripting
DevOps CI/CD pipelines on platforms such as GitHub Actions or Azure DevOps are basically shell scripts that run in the cloud and are triggered by events, e.g., a Git push. This article explains common mistakes in pipeline scripts and how to avoid them.
Common Errors in Bash Scripts
Globbing (Pattern Matching With Wildcards)
What is Globbing?
Fun fact: the term globbing originates from the glob command, a separate program to expand wildcard characters that was part of early versions of Unix. The name glob is short for global (source).
The one thing that is important to know and remember: wildcards are processed by the shell.
When the shell expands wildcards, it takes an innocent-looking command like rm dir/* and turns it into a monster command line in which the wildcard is replaced by a list of all matching files and directories. As you can imagine, with large numbers of files, such command lines can become quite long. Now, if you’re wondering whether there is some kind of limit, you’re on the right track. Of course there is.
When the expanded command line exceeds the limit the kernel places on the size of the arguments (plus environment) passed to a new program, known as ARG_MAX, you’ll get the error message Argument list too long. A Google search for linux bash "argument list too long" yields 475,000 results. Apparently, this is quite a common issue. It’s also rather devious: your script might work perfectly for months or years, only to fail unexpectedly when the number of files in a directory processed by the script exceeds the magic limit.
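To get a feeling for the numbers on your own machine, the following sketch prints the limit reported by getconf ARG_MAX. It then estimates how many bytes a command like rm dir/* would pass for a given directory. The script name in the usage comment and the per-argument byte estimate are illustrative assumptions. The environment also counts against the limit, so treat the comparison as approximate.

```bash
#!/usr/bin/env bash
# Sketch: compare the size of a glob expansion with the system limit.
# Usage (hypothetical script name): ./arg-size.sh some/dir

# Limit (in bytes) on arguments plus environment for a new process.
limit=$(getconf ARG_MAX)

dir="${1:-.}"
shopt -s nullglob            # expand to nothing if the directory is empty
files=("$dir"/*)             # expanded inside Bash itself: no execve(), no limit here

# Each argument costs roughly its length plus a terminating NUL byte.
bytes=0
for f in "${files[@]}"; do
  bytes=$(( bytes + ${#f} + 1 ))
done

echo "ARG_MAX: $limit bytes"
echo "rm $dir/* would pass ${#files[@]} arguments, about $bytes bytes"
```

The array assignment is safe even for huge directories, because the limit only applies when an external program such as rm is started.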
Recommendation: Don’t Do It
That is why I recommend staying away from globbing unless you can be sure that the number of files in directories processed by your script does not increase beyond the numbers you tested with.
There are various alternatives to globbing that are explained in the answers to this Stack Overflow question, for example.
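Two widely used alternatives are sketched below, assuming a directory that only contains regular files to be deleted. The first is find with -exec ... {} +, which batches paths into as few rm calls as the limit allows. The second is printf piped into xargs -0. It works because printf is a Bash builtin, so the expanded list never passes through the kernel’s argument limit. The demo directory and file names are made up. A few thousand empty files are not enough to hit the limit, only enough to exercise the commands.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway demo directory (hypothetical stand-in for a real build folder).
dir="$(mktemp -d)"
make_files() { for ((i = 1; i <= 5000; i++)); do : > "$dir/file-$i.txt"; done; }

# Alternative 1: find enumerates the files itself; "-exec ... {} +"
# packs as many paths into each rm invocation as the system allows.
make_files
find "$dir" -mindepth 1 -maxdepth 1 -type f -name '*.txt' -exec rm -f -- {} +

# Alternative 2: printf is a builtin, so "$dir"/*.txt is expanded without
# calling execve(); xargs -0 splits the NUL-separated list into rm calls
# that each stay below the limit.
make_files
printf '%s\0' "$dir"/*.txt | xargs -0 rm -f --

ls -A "$dir" | wc -l   # number of remaining entries
```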
Wildcards and Quotes
If you decide you do want to use globbing, remember that wildcards must be placed outside of quotes, which is quite counterintuitive.
# This only copies a single file named *
cp "source dir/*" target/
# This copies all files in the directory "source dir"
cp "source dir"/* target/
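You can see the difference directly by building a throwaway directory whose name contains a space. The paths and file names below are made up for the demo.

```bash
#!/usr/bin/env bash
set -u
work="$(mktemp -d)"
mkdir -p "$work/source dir" "$work/target"
touch "$work/source dir/a.txt" "$work/source dir/b.txt"

# Quoted wildcard: the shell passes the literal string ".../source dir/*"
# to cp, which looks for a file actually named "*" and fails.
cp "$work/source dir/*" "$work/target/" 2>/dev/null || echo "quoted: cp failed"

# Wildcard outside the quotes: the quotes protect the space,
# and the shell expands the unquoted * into a.txt and b.txt.
cp "$work/source dir"/* "$work/target/"
ls "$work/target"
```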
Copying the Contents of a Directory Recursively
Sometimes, the simple things are the most difficult. This is certainly true for copying the contents of a directory only. In other words: copy everything within a directory to a different directory.
We don’t want to use globbing, and we want the command to stay readable. Luckily, GNU cp has the -T (--no-target-directory) parameter for this task; note that the BSD cp shipped with macOS does not support it. Example:
cp -rT source/ dest/
By the way, if the destination directory doesn’t exist, the above command creates it.
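The sketch below uses temporary directories with made-up names. It checks that cp -rT also copies hidden files, which a plain * glob would skip. It also shows cp -R source/. dest/, a variant that, as far as I know, works with the BSD cp on macOS as well.

```bash
#!/usr/bin/env bash
set -euo pipefail
work="$(mktemp -d)"
mkdir -p "$work/source/sub"
echo hello > "$work/source/sub/file.txt"
echo secret > "$work/source/.hidden"   # dotfiles are skipped by a plain * glob

# GNU coreutils: -T treats the destination as a normal file name, so the
# contents of source end up directly in dest-gnu (created if missing).
cp -rT "$work/source" "$work/dest-gnu"

# Variant without -T: copy the "." entry of source into an existing directory.
mkdir -p "$work/dest-portable"
cp -R "$work/source/." "$work/dest-portable/"

ls -A "$work/dest-gnu" "$work/dest-portable"
```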
Deleting the Contents of a Directory Recursively
Deleting the contents of a directory recursively without resorting to globbing is less straightforward, because the rm command doesn’t have an equivalent to the -T switch. Instead, it’s often easiest to delete the directory itself (even though we really want to keep it) and recreate it afterwards, or to have a command such as cp recreate it for us.
Example of how to delete a directory along with its contents:
rm -rf source
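Recreating the directory is not always an option, for example when its permissions or ownership must be preserved. In that case, find can delete everything below a directory without globbing. The following sketch shows both approaches on a throwaway directory with made-up contents.

```bash
#!/usr/bin/env bash
set -euo pipefail
dir="$(mktemp -d)/source"
mkdir -p "$dir/sub"
touch "$dir/a.txt" "$dir/.hidden" "$dir/sub/b.txt"

# Option 1 (as in the text): delete the directory itself, then recreate it.
# The recreated directory gets default permissions, not the original ones.
rm -rf "$dir"
mkdir -p "$dir"

# Recreate some content for option 2.
mkdir -p "$dir/sub"
touch "$dir/a.txt" "$dir/.hidden" "$dir/sub/b.txt"

# Option 2: keep the directory and let find delete everything below it.
# -mindepth 1 excludes the starting directory; -delete works depth-first,
# so subdirectories are already empty when their turn comes.
find "$dir" -mindepth 1 -delete
ls -A "$dir"
```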
Facilitate Debugging of Pipeline Scripts
Pipeline scripts run silently most of the time, and their console output is normally ignored completely, until the day they fail and you need to debug their innards. The following tips greatly facilitate the debugging process.
Bash: Print a Trace of Commands
Add set -x at the beginning of Bash scripts to make the shell print each command (to stderr, prefixed with +) before executing it. This way, you see the command in addition to its output.
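To see what set -x produces, and how set +x switches tracing off again, you can trace a tiny script in a child shell and capture the trace. The variable name and messages are made up for the demo.

```bash
#!/usr/bin/env bash
# Run a tiny traced script in a child Bash and keep only its stderr,
# which is where the trace goes ("2>&1 >/dev/null" sends stderr into
# the capture and discards stdout).
trace="$(bash -c '
  set -x
  greeting="hello"
  echo "$greeting world"
  set +x    # tracing off again, e.g. before a step that handles secrets
  echo "this command is not traced"
' 2>&1 >/dev/null)"

printf '%s\n' "$trace"
```

Each traced line shows the command after variable expansion, which is usually exactly what you need when a pipeline fails.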
Verbose Command Output
Many commands have a -v switch that enables more verbose output. As a rule of thumb, if the -v switch is available, use it. Commands with a verbose switch include
Many pipeline scripts modify a Git repository. The following tips help with that.
Track Deletions With Git Add
You’ll sometimes want to delete files in a locally checked-out copy of a Git repository. When you do that, remember that git add does not always stage deletions. Git versions before 2.0 ignored them by default, and a wildcard expanded by the shell never lists files that no longer exist. In such cases, the files you deleted locally are still present in the repository even after you add and commit your changes.
To make git add consider all kinds of changes, including deletions, specify the --all parameter. From the docs: This adds, modifies, and removes index entries to match the working tree.
git add --all -v "dir/*"
In this case, the asterisk (wildcard character) can, for once, be placed inside the quotes (and should be), because it is processed by Git rather than by the shell. Git matches it against both the working tree and the index.
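The effect is easy to reproduce in a throwaway repository. The sketch below uses made-up file names and a local demo identity. It deletes one tracked file, modifies another, and stages both changes with the quoted pathspec.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"   # local, throwaway identity
git config user.name "Demo"
mkdir dir
echo a > dir/a.txt
echo b > dir/b.txt
git add dir
git commit -q -m "Add files"

rm dir/a.txt
echo changed > dir/b.txt

# Quoted: Git, not the shell, matches "dir/*" against the working tree
# and the index, so the deleted dir/a.txt is staged as a removal.
git add --all -v "dir/*"

git status --short
```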
User Identity for Git Commits in Pipelines
Variant A: Specify the User Who Triggered a Pipeline
When you commit a change to a Git repository, Git requests a user identity which it stores as part of the commit. CI/CD pipelines can be triggered by different developers or even be executed on a schedule without user intervention at all. It is, therefore, not exactly a best practice to hard-code the Git user name and email in your pipeline scripts.
On Azure DevOps, there is an elegant alternative: the variables $BUILD_REQUESTEDFOR and $BUILD_REQUESTEDFOREMAIL contain the name and email of the user who triggered the run. They can be used as follows in YAML pipeline definition files:
- bash: |
    # Print executed commands
    set -x
    # The email is empty for the Azure DevOps system identity (Microsoft.VisualStudio.Services.TFS)
    if [[ -z "$BUILD_REQUESTEDFOREMAIL" ]]; then
      # Placeholder fallback address; replace it with your own
      BUILD_REQUESTEDFOREMAIL="pipeline@example.com"
    fi
    # Specify Git user
    git config --global user.email "$BUILD_REQUESTEDFOREMAIL"
    git config --global user.name "$BUILD_REQUESTEDFOR"
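To try the fallback logic outside of Azure DevOps, the following sketch simulates the two variables. Instead of an if block, it uses Bash’s ${var:-default} expansion. The placeholder address and the simulated values are assumptions. The sketch writes to the local config of a throwaway repository rather than --global, so running it leaves your own Git settings untouched.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Simulated pipeline variables (assumption: in a real run the Azure DevOps
# agent sets these; here they default to the "system identity" case).
BUILD_REQUESTEDFOR="${BUILD_REQUESTEDFOR:-Microsoft.VisualStudio.Services.TFS}"
BUILD_REQUESTEDFOREMAIL="${BUILD_REQUESTEDFOREMAIL:-}"

# Fall back to a placeholder address when the email is empty.
git_email="${BUILD_REQUESTEDFOREMAIL:-pipeline@example.com}"
git_name="$BUILD_REQUESTEDFOR"

# Throwaway repository with local config instead of --global.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.email "$git_email"
git -C "$repo" config user.name "$git_name"

git -C "$repo" config user.email
git -C "$repo" config user.name
```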
Variant B: Specify a System User
You might want to be able to identify Git commits that originated from CI/CD pipelines. If that is the case, simply specify any user name and email in the Git options. The email needn’t be a real address, by the way.
- bash: |
    # Print executed commands
    set -x
    # Specify Git user (placeholder address; it needn’t be real)
    git config --global user.email "devops@example.com"
    git config --global user.name "DevOps"
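Committing under a fixed identity pays off when you want to filter the history later. This sketch uses a throwaway repository with local config instead of --global, along with made-up names and addresses. It then lists only the pipeline’s commits with git log --author, which matches a regular expression against the author name and email.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo="$(mktemp -d)"
git init -q "$repo"

# Pipeline identity, set in the repository's local config for this demo.
git -C "$repo" config user.name "DevOps"
git -C "$repo" config user.email "devops@example.com"
git -C "$repo" commit -q --allow-empty -m "Automated update"

# A human commit with a different identity, set only for this one command.
git -C "$repo" -c user.name="Jane Doe" -c user.email="jane@example.com" \
  commit -q --allow-empty -m "Manual change"

# List only the commits authored by the pipeline identity.
git -C "$repo" log --author="DevOps" --format='%an: %s'
```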