by: Helge, published: Mar 16, 2023

Tips for DevOps Pipeline Automation & Bash Scripting

DevOps CI/CD pipelines on platforms such as GitHub or Azure DevOps are basically shell scripts that run in the cloud and are triggered by events, e.g., a Git push. This article explains common mistakes with pipeline scripts and how to avoid them.

Common Errors in Bash Scripts

Globbing (Pattern Matching With Wildcards)

What is Globbing?

Globbing is the expansion of wildcard patterns such as * and ? into the list of matching file and directory names.

Fun fact: the term globbing originates from glob, a separate program in early versions of Unix that expanded wildcard characters. The name glob is short for global.

The one thing that is important to know and remember: wildcards are processed by the shell.
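A minimal sketch of what that means in practice. The helper count_args is hypothetical and only reports how many arguments it was handed; the file names are made up for illustration:

```bash
#!/usr/bin/env bash
# Create a scratch directory with three files to match against.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

# The shell replaces the pattern with the matching paths before count_args runs,
# so the function never sees the asterisk.
expanded_count=$(count_args "$demo_dir"/*.txt)

echo "count_args received $expanded_count argument(s)"
rm -rf "$demo_dir"
```

The called command only ever sees the expanded list, which is why the length of that list matters.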

Globbing Limits

When the shell expands wildcards, it takes an innocent-looking command like rm dir/* and creates a monster command line where the wildcard is replaced by a list of all matching files and directories. As you can imagine, with large numbers of files, such command lines can become quite long. Now, if you’re wondering if there isn’t some kind of limit, you’re on the right track. Of course there is.

When you hit the limit, you’ll get the error message Argument list too long. The limit is not specific to globbing: it is the operating system’s cap on the total size of the arguments passed to a newly started program, and the expanded file list counts against it. A Google search for linux bash "argument list too long" yields 475,000 results. Apparently, this is quite a common issue. It’s also rather devious: your script might work perfectly for months and years only to fail unexpectedly when the number of files in a directory processed by the script exceeds the magic limit.
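To see how large the limit is on a given build agent, the system’s ARG_MAX value can be queried. A minimal sketch, assuming a POSIX system where getconf is available. The value covers arguments and environment variables together, and it applies to external programs such as rm or cp, not to shell builtins:

```bash
#!/usr/bin/env bash
# Upper bound, in bytes, on the arguments plus environment passed to a new program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"

# Rough size of the current environment, which counts against the same budget.
env_bytes=$(env | wc -c)
echo "Environment currently uses about $env_bytes bytes"
```

The number differs between systems, so a script that works on one agent may still fail on another.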

Recommendation: Don’t Do It

That is why I recommend staying away from globbing unless you can be sure that the number of files in directories processed by your script does not increase beyond the numbers you tested with.

There are various alternatives to globbing that are explained in the answers to this Stack Overflow question, for example.
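One such alternative is to let find do the matching. This is a sketch under assumptions, not the only option: the directory and file names are invented, and -maxdepth is an extension supported by GNU and BSD find rather than part of POSIX.

```bash
#!/usr/bin/env bash
# Scratch directory standing in for a directory that may hold very many files.
work_dir=$(mktemp -d)
touch "$work_dir/one.log" "$work_dir/two.log" "$work_dir/keep.txt"

# find matches the names itself (the pattern is quoted so the shell leaves it alone),
# and -exec ... {} + runs rm in batches that stay within the argument limit.
find "$work_dir" -maxdepth 1 -type f -name '*.log' -exec rm -- {} +

remaining=$(find "$work_dir" -maxdepth 1 -type f | wc -l)
echo "Files left: $remaining"
rm -rf "$work_dir"
```

Because find splits the work into as many rm invocations as needed, the command keeps working as the directory grows.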

Wildcards and Quotes

If you decide you do want to use globbing, remember that wildcards must be placed outside of quotes, which is quite counterintuitive.

Wrong:

# This only copies a single file named *
cp "source dir/*" target/

Right:

# This copies all files in the directory "source dir"
cp "source dir"/* target/
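The difference can be tried out safely in a scratch directory. A small sketch with invented file names; the variables quoted_result and copied_count exist only to report what happened:

```bash
#!/usr/bin/env bash
# Scratch layout with a space in the source directory name.
base=$(mktemp -d)
mkdir "$base/source dir" "$base/target"
touch "$base/source dir/a.txt" "$base/source dir/b.txt"

# Wildcard inside the quotes: cp looks for a file literally named "*" and fails.
if cp "$base/source dir/*" "$base/target/" 2>/dev/null; then
    quoted_result="copied"
else
    quoted_result="failed"
fi

# Directory quoted, wildcard outside the quotes: the shell expands it to both files.
cp "$base/source dir"/* "$base/target/"
copied_count=$(find "$base/target" -type f | wc -l)

echo "quoted wildcard: $quoted_result, files copied with unquoted wildcard: $copied_count"
rm -rf "$base"
```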

Copying the Contents of a Directory Recursively

Sometimes, the simple things are the most difficult. This is certainly true for copying the contents of a directory only. In other words: copy everything within a directory to a different directory.

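We don’t want to use globbing, and we want the command to stay readable. Luckily, the GNU version of the cp command has the -T option (--no-target-directory) for this task. It treats the destination as the target itself rather than as a directory to copy into. Note that -T is a GNU extension and is not available in the cp that ships with macOS or the BSDs. Example: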

cp -rT source/ dest/

By the way, if the destination directory doesn’t exist, the above command creates it.
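The effect of -T is easiest to see side by side with a plain recursive copy. A sketch assuming GNU cp; the directory names plain, with_t, and fresh are made up for the comparison:

```bash
#!/usr/bin/env bash
base=$(mktemp -d)
mkdir -p "$base/source/sub" "$base/plain" "$base/with_t"
touch "$base/source/a.txt" "$base/source/sub/b.txt"

# Without -T, an existing destination receives the directory itself: plain/source/a.txt
cp -r "$base/source" "$base/plain"

# With -T, the contents land directly in the destination: with_t/a.txt
cp -rT "$base/source" "$base/with_t"

# With -T and a destination that does not exist yet, cp creates it: fresh/sub/b.txt
cp -rT "$base/source" "$base/fresh"

if [ -e "$base/plain/source/a.txt" ]; then plain_nested="yes"; else plain_nested="no"; fi
if [ -e "$base/with_t/a.txt" ]; then t_direct="yes"; else t_direct="no"; fi
if [ -e "$base/fresh/sub/b.txt" ]; then fresh_created="yes"; else fresh_created="no"; fi

echo "nested without -T: $plain_nested, direct with -T: $t_direct, created with -T: $fresh_created"
rm -rf "$base"
```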

Deleting the Contents of a Directory Recursively

Deleting the contents of a directory recursively without resorting to globbing is less straightforward because the rm command doesn’t have an equivalent to cp’s -T switch. Instead, it’s often easiest to delete the parent directory altogether (even though we really want to keep it) and recreate it subsequently – or have a command such as cp recreate it for us.

Example of how to delete a directory along with its contents:

rm -rf source
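Put together with the recreation step, this might look as follows. A minimal sketch with invented paths; note that the recreated directory gets default ownership and permissions, which may differ from the original’s. Unlike rm dir/*, this approach also removes hidden files, which a plain asterisk skips by default.

```bash
#!/usr/bin/env bash
base=$(mktemp -d)
target="$base/output"
mkdir -p "$target/nested"
touch "$target/file.txt" "$target/.hidden" "$target/nested/deep.txt"

# Remove the directory with everything in it, then recreate it empty.
rm -rf -- "$target"
mkdir -p -- "$target"

if [ -d "$target" ]; then recreated="yes"; else recreated="no"; fi
entries=$(find "$target" -mindepth 1 | wc -l)
echo "Directory recreated: $recreated, entries left: $entries"
rm -rf "$base"
```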

Facilitate Debugging of Pipeline Scripts

Pipeline scripts run silently most of the time, and their console output is normally disregarded completely – until the day they fail, and you need to debug their innards. The following tips greatly facilitate the debugging process.

Bash: Print a Trace of Commands

Specify set -x at the beginning of Bash scripts to configure the shell to print commands as they are executed. This way, you see the command in addition to its output.
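A short sketch of tracing in action. Customizing PS4 (the prefix Bash prints before each traced command) is optional and shown here only as one possible refinement; the variable artifact_dir and its value are made up. Keep in mind that the trace includes expanded variable values, so it can be worth switching it off with set +x around commands that handle secrets.

```bash
#!/usr/bin/env bash
# PS4 is the prefix printed before each traced command; the line number helps locate it.
PS4='+ line ${LINENO}: '

set -x                      # start tracing (the trace is written to stderr)
artifact_dir="build/output" # hypothetical value, for illustration only
echo "Publishing from $artifact_dir"
set +x                      # stop tracing again

echo "Tracing is off for this line"
```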

Verbose Command Output

Many commands have a -v switch that enables more verbose output. As a rule of thumb, if the -v switch is available, use it. Commands with a verbose switch include cp, mv, and git add.
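For illustration, here is what that looks like for cp and mv. The file names are invented, and the exact wording of the verbose messages differs between implementations, so the sketch only captures and prints them:

```bash
#!/usr/bin/env bash
base=$(mktemp -d)
echo "v1" > "$base/app.conf"

# -v makes each command report what it did, which ends up in the pipeline log.
cp_log=$(cp -v "$base/app.conf" "$base/app.conf.bak")
mv_log=$(mv -v "$base/app.conf.bak" "$base/app.conf.old")

printf '%s\n' "$cp_log" "$mv_log"
rm -rf "$base"
```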

Git

Many pipeline scripts modify a Git repository. The following tips help with that.

Track Deletions With Git Add

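You’ll sometimes want to delete files in a locally checked-out copy of a Git repository. When you do that, remember that git add does not necessarily pick up deletions: Git versions before 2.0 ignored them by default, and in any version, an unquoted wildcard is expanded by the shell to the files that still exist, so deleted files are never passed to git add. In both cases, the files you deleted locally are still present in the repository even after you add and commit your changes.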

To instruct git add to consider all kinds of changes, including deletions, specify the --all parameter. From the docs: This adds, modifies, and removes index entries to match the working tree.

Example:

git add --all -v "dir/*"

In this case, the asterisk (wildcard character) for once can (and should) be placed inside the quotes because it is processed by the git command.

User Identity for Git Commits in Pipelines

Variant A: Specify the User Who Triggered a Pipeline

When you commit a change to a Git repository, Git requests a user identity which it stores as part of the commit. CI/CD pipelines can be triggered by different developers or even be executed on a schedule without user intervention at all. It is, therefore, not exactly a best practice to hard-code the Git user name and email in your pipeline scripts.

On Azure DevOps, there is an elegant alternative with the variables $BUILD_REQUESTEDFOR and $BUILD_REQUESTEDFOREMAIL. They can be used as follows in YAML pipeline definition files:

- bash: |
    # Print executed commands
    set -x

    # The email is empty for the Azure DevOps system identity (Microsoft.VisualStudio.Services.TFS)
    if [[ "$BUILD_REQUESTEDFOREMAIL" == "" ]]; then
        # Placeholder fallback address; substitute your own
        BUILD_REQUESTEDFOREMAIL="devops@example.com"
    fi

    # Specify Git user
    git config --global user.email "$BUILD_REQUESTEDFOREMAIL"
    git config --global user.name "$BUILD_REQUESTEDFOR"
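
The fallback for an empty email can also be written with Bash’s default-value expansion. A sketch only: the function name resolve_git_email and the address pipeline@example.com are hypothetical, and the function merely prints the address it would use.

```bash
#!/usr/bin/env bash
# resolve_git_email is a hypothetical helper. It prints the requester's email, or a
# placeholder address when BUILD_REQUESTEDFOREMAIL is unset or empty.
resolve_git_email() {
    printf '%s\n' "${BUILD_REQUESTEDFOREMAIL:-pipeline@example.com}"
}

# Simulate a run triggered by the system identity, where the email is empty.
BUILD_REQUESTEDFOREMAIL=""
system_email=$(resolve_git_email)

# Simulate a run triggered by a developer.
BUILD_REQUESTEDFOREMAIL="dev@example.org"
developer_email=$(resolve_git_email)

echo "system run: $system_email, developer run: $developer_email"
```

In a pipeline step, the result could then be passed on with git config --global user.email "$(resolve_git_email)".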

Variant B: Specify a System User

You might want to be able to identify Git commits that originated from CI/CD pipelines. If that is the case, simply specify any user name and email in the Git options. The email needn’t be a real address, by the way.

Example:

- bash: |
    # Print executed commands
    set -x

    # Specify Git user (the address is a placeholder and needn't exist)
    git config --global user.email "devops@example.com"
    git config --global user.name "DevOps"
