Regex Search & Replace in WordPress Posts With WP-CLI
Maintaining a WordPress site that goes back years or even decades (as in this case) can be a challenge. You want to ensure formatting is consistent, even for old posts, and shortcodes are kept in check. As always, automation is key. This post shows how to search and replace in WordPress posts with the help of regular expressions.
I started thinking about how to search and replace HTML tags in WordPress when I found an older post where expressions were formatted with inline styling like the following: <span style="font-family:'Courier New',Courier,monospace;">. I had migrated to the much more modern <code> tag a long time ago. Seeing that I still had legacy code formatting was a bit of a shock. Apparently, I had overlooked some styling bits in the migration. How many posts might be affected?
“Luckily,” that is easy to find out with WordPress. I put that in quotes because it only works due to the simplicity of WordPress’ search, which dumbly searches what is stored in the database, not caring about the presentation layer at all. This means you can use any WordPress site’s search functionality to look for HTML tags or WordPress shortcodes.
Once I realized the search https://helgeklein.com/?s=font-family returned dozens of posts, I knew I had to do something.
The tool of choice for any kind of automated/scripted WordPress maintenance from the command line is WP-CLI. It comes with a search & replace function that optionally accepts regular expressions. It works well, but, as always, the devil’s in the details.
Install WP-CLI according to the docs. If you already have an older version, you should upgrade it or you might get PHP errors. Run the following to upgrade WP-CLI:
sudo wp cli update
Important: before you run any commands that modify your site’s content, make sure you a) have a backup and b) first try with the parameters --dry-run and --log=[path].
Navigate to your WordPress directory (you need to use your own path, of course):
When you run the search and replace command with the parameters --dry-run and --log=[path], you get a wonderful preview log file of exactly what would happen.
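A preview run could be wrapped in a small helper along these lines. This is only a sketch: the function name preview_replace, the WP_BIN override, and the log file location are my own assumptions for illustration, and the headline regex from later in this post serves as the example. The flags themselves (--dry-run and --log) are WP-CLI's own.

```shell
#!/usr/bin/env bash
# WP_BIN is a hypothetical override so the helper can be tried without WP-CLI installed.
WP_BIN="${WP_BIN:-wp}"

# Preview a regex search & replace on wp_posts.post_content without changing any data.
# $1 = search regex, $2 = replacement, $3 = log file path (all supplied by you).
preview_replace() {
  "$WP_BIN" search-replace "$1" "$2" \
    --regex --precise --regex-flags='i' \
    wp_posts --include-columns=post_content \
    --dry-run --log="$3"
}

# Only run the example preview if WP-CLI is actually available.
# The log path below is a placeholder; pick your own.
if command -v "$WP_BIN" >/dev/null 2>&1; then
  preview_replace '(</?)h3([^>]*>)' '\1h2\2' "${TMPDIR:-/tmp}/search-replace-preview.log"
fi
```

Run it from within your WordPress directory, then inspect the log file before doing anything for real.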
Once you’re happy with the preview in the log file, remove the --dry-run and --log=[path] parameters to actually make the changes in the WordPress database. The result should look similar to the following:
+----------+--------------+--------------+------+
| Table    | Column       | Replacements | Type |
+----------+--------------+--------------+------+
| wp_posts | post_content | 254          | PHP  |
+----------+--------------+--------------+------+
Success: Made 254 replacements.
After some trial and error, I went with the following to replace all Courier font formatting with <code> tags:
wp search-replace '<span style="font-family:[^"]*courier[^"]*">(.+?)<\/span>' '<code>\1<|code>' --regex --precise --regex-flags='i' wp_posts --include-columns=post_content
- The first parameter is the search term:
  - We’re looking for a <span> element whose font-family style contains Courier, and capture the span’s content in a non-greedy way in a regex group.
- The second parameter is the replace term:
  - \1 is a variable that is filled with the contents of the capturing group (.+?) of the search term.
  - Please replace the pipe symbol in my examples with a forward slash. I used it to work around some issues with my syntax highlighting plugin.
- --regex enables regular expressions for the search & replace operation.
- --precise forces PHP (instead of SQL) processing, which is slower but more thorough (I enabled it just to be on the safe side).
- --regex-flags='i' enables case-insensitive regex matching.
- wp_posts --include-columns=post_content restricts the operation to the post_content column of the wp_posts table.
I used the following to move all HTML headline tags a level higher, e.g., from <h3> to <h2>:
wp search-replace '(</?)h3([^>]*>)' '\1h2\2' --regex --precise --regex-flags='i' wp_posts --include-columns=post_content
The above regex is easy to adjust for other headline-level replacements, e.g., from <h4> to <h3>:
wp search-replace '(</?)h4([^>]*>)' '\1h3\2' --regex --precise --regex-flags='i' wp_posts --include-columns=post_content