by: Helge, published: Aug 24, 2020, updated: May 10, 2023, in

Regex Search & Replace in WordPress Posts With WP-CLI

Maintaining a WordPress site that goes back years or even decades (as in this case) can be a challenge. You want to ensure formatting is consistent, even for old posts, and shortcodes are kept in check. As always, automation is key. This post shows how to search and replace in WordPress posts with the help of regular expressions.

Searching WordPress for HTML Tags or Shortcodes

I started thinking about how to search and replace HTML tags in WordPress when I found an older post where expressions were formatted with inline styling like the following: <span style="font-family:'Courier New',Courier,monospace;">. I had migrated to the much more modern <code> tag a long time ago. Seeing that I still had legacy code formatting was a bit of a shock. Apparently, I had overlooked some styling bits in the migration. How many posts might be affected?

“Luckily,” that is easy with WordPress. I put that in quotes because it only works due to the simplicity of WordPress’ search, which dumbly searches what is stored in the database, not caring about the presentation layer at all. This means you can use any WordPress site’s search functionality to look for HTML tags or WordPress shortcodes.

Once I realized the search https://helgeklein.com/?s=font-family returned dozens of posts, I knew I had to do something.

Searching & Replacing in WordPress Posts

The tool of choice for any kind of automated/scripted WordPress maintenance from the command line is WP-CLI. It comes with a search & replace function that optionally accepts regular expressions. It works well, but, as always, the devil’s in the details.

Install or Upgrade WP-CLI

Install WP-CLI according to the docs. If you already have an older version, you should upgrade it or you might get PHP errors. Run the following to upgrade WP-CLI:

sudo wp cli update

Preparation

Important: before you run any commands that modify your site’s content, make sure you a) have a backup and b) first try with the parameters --dry-run and --log=[path].

Navigate to your WordPress directory (you need to use your own path, of course):

cd /var/www/helgeklein.com/public_html/

Dry Run

When you run the search and replace command with the parameters --dry-run and --log=[path] you get a wonderful preview log file of exactly what would happen:

Go!

Once you’re happy with the preview in the log file, remove the --dry-run and --log=[path] parameters to actually make the changes in the WordPress database. The result should look similar to the following:

+----------+--------------+--------------+------+
| Table    | Column       | Replacements | Type |
+----------+--------------+--------------+------+
| wp_posts | post_content | 254          | PHP  |
+----------+--------------+--------------+------+
Success: Made 254 replacements.

Use Cases & Examples

Replacing Courier Font Styles

After some trial and error, I went with the following to replace all courier font formatting with <code> tags:

wp search-replace '<span style="font-family:[^"]*courier[^"]*">(.+?)<\/span>' '<code>\1<|code>' --regex --precise --regex-flags='i' wp_posts --include-columns=post_content
Explanation

Parameters:

  • The first parameter is the search term: '<span style="font-family:[^"]*courier[^"]*">(.+?)<\/span>'.
    • We’re looking for a span where font-family contains Courier, and capture the span’s content in a non-greedy way in a regex group.
  • The second parameter is the replace term: '<code>\1<|code>'.
    • \1 is a variable that is filled with the contents of the capturing group (.+?) of the search term.
  • Please replace the pipe symbol in my examples with a forward slash. I used it to work around some issues with my syntax highlighting plugin.
    • Change | into /
  • --regex enables regular expressions for the search & replace operation.
  • --precise switches to PHP regex processing (which I enabled just to be on the safe side).
  • --regex-flags='i' enables case-insensitive regex matching.
  • wp_posts --include-columns=post_content restricts the operation to the post_content column of the wp_posts table.

Replacing Headline Levels

I used the following to move all HTML headline tags a level higher, e.g., from h3 to h2:

wp search-replace '(</?)h3([^>]*>)' '\1h2\2' --regex --precise --regex-flags='i' wp_posts --include-columns=post_content

The above regex is easy to adjust for other headline-level replacements, e.g., from h4 to h3:

wp search-replace '(</?)h4([^>]*>)' '\1h3\2' --regex --precise --regex-flags='i' wp_posts --include-columns=post_content

Previous Article Twitter Automation: Scheduled Tweets From Google Sheets
Next Article How to Check Which Bluetooth A2DP Audio Codec Is Used on Windows