Bash pattern matching

Bash pattern matching has never been easy, even for the most experienced bash programmers. And if you are just starting to learn the ropes around bash, you may be thinking, where do I start?

Luckily enough, you are in the right place. Here bash pattern matching will be treated thoroughly, starting from the basics and working towards less commonly touched, advanced pattern matching techniques. Bash pattern matching results, types, and tools will be covered.

Pattern matching results

The result of pattern matching is a list of 1 or more matches. In the case of an empty list, the pattern did not match.

Types of patterns

Before we even get started with our first pattern matching example, let’s lay down the groundwork to build on. That is, let’s list all the types of patterns to be treated in the scope of pattern matching and provide an overview of the examples to follow.

  • Generic pattern
  • String exact pattern
  • String regular expression pattern
  • File exact pattern
  • File glob pattern

Patterns in general

In general, when we are looking to do pattern matching there are three base parameters: the pattern, the subject, and the relation. For simplicity, we’ll assume that there is a function that applies the pattern to the subject and reports whether, and where, it matches. Let’s look at some examples.

General patterns: Alphabet soup

Suppose that we have a bowl of alphabet soup that we wish to make subject to pattern matching. For the pattern, we choose the letter P, as in Pikachu. Then, we throw the ball and wait for the result of pattern matching. The letter P matches alphabet soup. Now we can continue eating our breakfast.

General patterns: Spaghetti Os

Now instead, we have a bowl of Spaghetti-Os. Again, we use the letter P as the pattern and throw the ball. As you would expect, the letter P does not match Spaghetti-Os. Maybe we should have had alphabet soup for breakfast or picked a pattern more likely to match.

Patterns in strings

In bash, all variables, regardless of attributes, are represented internally as strings. That is, all variables in bash are subject to pattern matching in the same way. String patterns can be exact or regular expression patterns.

String patterns: exact pattern

The string exact pattern is a string that represents only 1 string. When it matches, the result is the subject as a whole or the matching substring.

Example 1: simple pattern matching using string exact patterns

Subject:  algorithm
Pattern:  ori
Matches(pattern,subject): true (ori)
See parameter expansion

Example 2: simple pattern mismatch using string exact patterns

Subject:  algorithm
Pattern:  ali
Matches(pattern,subject): false ()
See tests
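
Examples 1 and 2 can be tried directly in bash. The following is a minimal sketch, not the only way to do it: the function name contains is made up for illustration, and it relies on the [[ ]] conditional, where a quoted pattern is compared literally.

```shell
#!/usr/bin/env bash
# contains SUBJECT PATTERN (hypothetical helper name)
# Succeeds (exit status 0) when PATTERN occurs literally inside SUBJECT.
# Quoting "${2}" keeps characters such as * or ? from acting as wildcards.
contains() {
  [[ "${1}" == *"${2}"* ]]
}

contains algorithm ori && echo "ori: match" || echo "ori: no match"   # ori: match
contains algorithm ali && echo "ali: match" || echo "ali: no match"   # ali: no match
```

Because the function only sets an exit status, it can be used directly in if statements and with && and ||.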

String patterns: regular expression patterns

The string regular expression pattern is a string that can be expanded to match one or more strings. It comes in handy when exact string matching just doesn’t cut it. That is, we need magic or regular expressions. Let’s go with the latter.

Example 3: simple pattern matching using string regular expression patterns for the word algorithm

Subject:  algorithm
Pattern:  [logarithm]{9}
Matches(pattern,subject): true (algorithm)
See example in tests

Example 4: simple pattern matching using string regular expression patterns for hyphen separated date strings

Subject:  2020-01-01
Pattern:  [0-9-]*
Matches(pattern,subject): true (2020-01-01)
See example in tests
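
As a rough sketch of the date example in bash: the pattern [0-9-]* also matches the empty string, so the version here assumes a stricter, anchored pattern. The variable name date_re and the extra test subjects are illustrative only.

```shell
#!/usr/bin/env bash
# Assumed stricter pattern: four digits, two digits, two digits, hyphen separated.
date_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'

for subject in 2020-01-01 2020-1-1 algorithm; do
  if [[ "${subject}" =~ ${date_re} ]]; then
    # BASH_REMATCH[0] holds the part of the subject that matched.
    echo "${subject}: true (${BASH_REMATCH[0]})"
  else
    echo "${subject}: false ()"
  fi
done
# 2020-01-01: true (2020-01-01)
# 2020-1-1: false ()
# algorithm: false ()
```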

Patterns in the tree

Bash has a feature called globbing that expands strings outside of quotes to the names of files or directories present in the tree. File expansion, as it is also called, is enabled by default, so you never have to turn it on. However, in some cases, you may opt to turn it off. Do note that although similar, globbing is not as extensive as the regular expressions seen in string patterns.

Example 5: glob all files in the working directory together

Subject:  working directory
Pattern:  *
Matches(pattern, subject): true (all files in working directory)
See example in file expansion

Example 6: glob all files in the working directory together with name containing only a single character

Subject:  working directory
Pattern: ?
Matches(pattern, subject): true (single letter file and directory names)
See example in file expansion
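
Examples 5 and 6 can be sketched in a throwaway directory. The file names are made up for illustration. The second half shows what happens when nothing matches: by default the pattern is left as-is, and with the nullglob shell option it expands to nothing.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so the result does not depend on existing files.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
touch a b aa ab

all=( * )          # every non-hidden name: a aa ab b
single=( ? )       # names that are exactly one character long: a b
echo "all: ${all[*]}"
echo "single: ${single[*]}"

# A pattern with no match is left untouched unless nullglob is set.
unmatched=( z* )
echo "without nullglob: ${#unmatched[@]} word (${unmatched[*]})"   # 1 word (z*)
shopt -s nullglob
unmatched=( z* )
echo "with nullglob: ${#unmatched[@]} words"                       # 0 words
shopt -u nullglob

cd - > /dev/null && rm -r "${workdir}"
```

Capturing the expansion in an array keeps each file name as one element, even if it contains spaces.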

Tools for pattern matching in bash

Bash has only a handful of builtin facilities for pattern matching: file expansion, parameter expansion, and tests. For anything more, it relies on tools such as grep, sed, or awk. Here are the tools in and out of bash for pattern matching.

External tools for bash pattern matching

  • grep
  • gawk
  • sed
  • xxd
  • find

grep

Grep is a simple yet powerful command-line utility and one of the reasons bash has so little pattern matching built in. It searches for a pattern in one or more files. What more can you ask for?

Combined with find and xargs, it can be used to search for patterns across the filesystem.

Suppose that you want to search a directory called haystack for a file containing the word ‘needle’. Here is how we would use grep.

find haystack -type f | xargs grep -e "needle" || echo not found
echo needle >> haystack/aa
find haystack -type f | xargs grep -e "needle" || echo not found

Note that I just happened to rename the sandbox directory in the example below to haystack.
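
One caveat with piping find into xargs is that file names containing spaces get split into several arguments. A hedged alternative is to let find pass the names to grep itself. The sketch below builds its own small haystack in a temporary directory, and the file names and contents are made up for illustration.

```shell
#!/usr/bin/env bash
# Build a tiny haystack; one file name deliberately contains a space.
haystack=$(mktemp -d)
printf 'just hay\n' > "${haystack}/aa"
printf 'a needle in here\n' > "${haystack}/a b"

# -exec ... {} + hands the file names to grep directly, so spaces survive.
# grep -l prints only the names of the files that contain a match.
found=$(find "${haystack}" -type f -exec grep -l "needle" {} +)

if [[ -n "${found}" ]]; then
  echo "found in: ${found}"
else
  echo "not found"
fi
# Remove the temporary directory when done: rm -r "${haystack}"
```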

gawk (or awk)

Perhaps another reason why bash appears to not want anything to do with pattern matching is that awk, the pattern scanning and processing language, existed well before the first release of bash.

In practice, you will find gawk used extensively in many polyglot bash programs as a means of entering pattern matching mode from within a bash script.

Unlike other tools listed for bash pattern matching, gawk has the capability of creating new instances of bash or any other command-line utility through a builtin system function. However, it is often more practical to pipe gawk's output into xargs to run commands in parallel, or into bash directly to run them in sequence.

Gawk may also be used to implement primitive versions of common command-line utilities like tac and shuf, as seen in bash tac command and bash shuf command, respectively.
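
As a small illustration of both uses, here is a sketch that filters lines by a regular expression and then reverses line order in the manner of tac. The function name reverse_lines is hypothetical, and the code sticks to features found in any POSIX awk, so gawk is not strictly required.

```shell
#!/usr/bin/env bash
# Print only the lines matching a regular expression, prefixed by line number.
printf '%s\n' alpha beta gamma | awk '/et/ { print NR ": " $0 }'   # 2: beta

# reverse_lines (hypothetical name): print standard input in reverse line order.
# Every line is stored in an array, then printed from last to first.
reverse_lines() {
  awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }'
}

printf '%s\n' one two three | reverse_lines
# three
# two
# one
```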

sed

Sed, yet another powerful command-line utility and another reason why bash can’t compete by itself in pattern matching, stands for stream editor. It uses a simple programming language built around regular expressions, allowing you to search, replace, and edit files in place, or otherwise do more than string manipulation in bash.

It is commonly used in polyglot bash scripts to replace patterns in files, a task that would be overkill to attempt with bash parameter expansion.

As seen in bash sed examples, there is more to sed than pattern matching alone.
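
For a taste of sed's search and replace, here is a minimal sketch. The function name reorder_date is made up, and the second command assumes a sed that accepts -E for extended regular expressions, which GNU and BSD sed both do.

```shell
#!/usr/bin/env bash
# Replace the first run of digits on each line with the placeholder N.
printf '%s\n' "build 42 passed" "no digits here" | sed 's/[0-9][0-9]*/N/'
# build N passed
# no digits here

# reorder_date (hypothetical name): rewrite YYYY-MM-DD as DD/MM/YYYY.
# The parentheses capture groups that \1, \2, and \3 refer back to.
reorder_date() {
  sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|'
}

echo "released 2020-01-01" | reorder_date   # released 01/01/2020
```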

xxd

xxd is a command-line utility available on most systems that converts its input to and from hex notation. It makes pattern matching and replacement in non-text files easier when used in conjunction with other pattern matching tools in bash.
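
The round trip can be sketched as follows, assuming xxd is installed (it usually ships with vim). The sample data is made up, and a short text string stands in for binary content.

```shell
#!/usr/bin/env bash
# -p prints a plain hex dump without offsets or an ASCII column.
hex=$(printf 'needle' | xxd -p)
echo "${hex}"   # 6e6565646c65

# Once the data is hex text, the usual tools apply. Here sed swaps the bytes
# for "ee" (6565) with the bytes for "oo" (6f6f), and xxd -r -p turns the
# plain hex back into bytes.
echo "${hex}" | sed 's/6565/6f6f/' | xxd -r -p   # noodle
echo
```

One thing to watch for with this approach is alignment: a hex pattern can also match across a byte boundary, so patterns may need to be anchored to even offsets for real binary data.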

find

find is a command-line utility that can be used as an alternative to file expansion when recursion is required. It allows you to traverse the file system while listing files found matching the options set. For pattern matching on file names, the -name option may be used.
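
A short sketch of -name in a temporary tree follows. The directory layout and file names are invented for the example.

```shell
#!/usr/bin/env bash
# Build a small nested tree to search.
tree=$(mktemp -d)
mkdir -p "${tree}/sub/deeper"
touch "${tree}/aa" "${tree}/sub/ab" "${tree}/sub/deeper/abc" "${tree}/sub/deeper/zz"

# Quote the pattern so that find receives it, instead of bash expanding it first.
# -name is compared against the file name only, not the whole path.
find "${tree}" -type f -name 'a?' | sort
# prints the paths ending in /aa and /sub/ab

# Remove the temporary tree when done: rm -r "${tree}"
```

Unlike a plain glob, this descends into sub and sub/deeper without any extra syntax.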

Internal tools for bash pattern matching

Bash has pattern matching capabilities when it comes to files and strings. Here are the tools for pure bash pattern matching:  file expansion (globbing), parameter expansion, tests.

file expansion (globbing)

File expansion allows a string not surrounded by quotes and containing the characters * or ? to be expanded into one or more paths matching the string. When the find command is not required, especially when working interactively on the command line, we may opt to use file expansion instead. File expansion is enabled by default. However, it may be disabled using the set builtin command.

Usage

Wildcard matching 0 or more characters in a filename
*
Wildcard matching exactly 1 character in a filename
?

By default, unquoted strings will expand depending on files present in the working directory.

Globbing may be disabled and enabled by setting noglob.

Disable globbing

set -o noglob

Enable globbing (default)

set +o noglob

Alternatively, you may use the short command for disabling globbing

set -f

There are other ways to modify the file globbing behavior in bash via the set and shopt builtins. For other ways to use set, see The Set Builtin, which deserves a section of its own. You may also find The Shopt Builtin useful.

Commands

Run the following commands to set up a sandbox for file expansion (globbing).

{
  mkdir sandbox
  cd sandbox
  touch {.,}{a..z}{a..z}
  touch {.,}{a..z}{a..z}{a,b}
}

You should now be working in a directory named sandbox containing files with two-letter names such as aa, ab, …, zy, zz and three-letter names such as aaa and aab, including hidden files.

Match all hidden files and directories

echo .*

Match all files and directories

echo .* *

Match all files and directories starting with an ‘a’

echo a*

Match all files and directories starting with an ‘a’ and ending with a ‘b’

echo a*b

Match all files and directories with a name that contains 2 characters and starts with an ‘a’

echo a?

Match all files and directories with name containing 2 characters

echo ??

Last but not least, let’s try to glob with noglob set

set -f
echo .*
echo .* *
echo a*
echo a*b
echo a?
echo ??
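
With noglob set, each pattern is printed as typed, and globbing stays off until it is turned back on. The sketch below shows the difference side by side in a temporary directory. The file names are made up, and the $- check is one possible way to see whether the f flag is currently set.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
touch aa ab

set -f                 # noglob on: patterns are passed through untouched
disabled=( a* )
set +f                 # noglob off: globbing works again
enabled=( a* )

echo "noglob on:  ${disabled[*]}"   # a*
echo "noglob off: ${enabled[*]}"    # aa ab

# The single-letter options currently in effect are listed in $-.
[[ $- == *f* ]] && echo "globbing is disabled" || echo "globbing is enabled"

cd - > /dev/null && rm -r "${workdir}"
```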

parameter expansion

Parameter expansion in bash allows you to manipulate variables containing strings. It may be used to remove and replace a pattern within a string. Support for case insensitive pattern matching is available by using the shopt builtin command.
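
The most common removal and replacement forms can be sketched as follows. The variable name path and its value are arbitrary, and the patterns are glob-style patterns.

```shell
#!/usr/bin/env bash
path="archive.tar.gz"

echo "${path#*.}"      # shortest match removed from the front: tar.gz
echo "${path##*.}"     # longest match removed from the front:  gz
echo "${path%.*}"      # shortest match removed from the end:   archive.tar
echo "${path%%.*}"     # longest match removed from the end:    archive
echo "${path/a/A}"     # first match replaced:                  Archive.tar.gz
echo "${path//a/A}"    # every match replaced:                  Archive.tAr.gz
```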

Usage

Here is a little function I cooked up to show bash pattern matching in action using parameter expansion. It has 2 parameters: 1) subject; and 2) pattern. If the subject matches the pattern, the function prints ‘0’; otherwise, it prints ‘1’. It also prints the subject with every match removed to standard error. Note that the pattern is a glob-style pattern, as used in file expansion, not a regular expression.

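match ()
{
  # Usage: match SUBJECT PATTERN
  local subject pattern new_subject
  subject="${1}"
  pattern="${2}"
  # Remove every occurrence of the pattern from the subject.
  new_subject="${subject//${pattern}/}"
  # Show what is left on standard error.
  echo "${new_subject}" 1>&2
  # If something was removed, the pattern matched.
  test ! "${subject}" = "${new_subject}"
  echo ${?}
}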

Commands

Here is a block of commands showing how the match function works.

subject=$( echo {a..z} | tr -d ' '  )
match "${subject}" a
match "${subject}" ba
match "${subject}" "[a-d]"

Output

bcdefghijklmnopqrstuvwxyz
0
abcdefghijklmnopqrstuvwxyz
1
efghijklmnopqrstuvwxyz
0

tests

Tests in bash allow you to compare files, strings, and integers. They may be used to do pattern matching on a string. In the case of simple pattern matching on strings using regular expressions, we may opt to use tests instead of grep.

Usage

[[ "string" =~ regex ]]
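
Two details of the =~ operator are easy to trip over, sketched below with made-up subjects. Quoting the right-hand side makes bash compare it as a literal string (the behavior in bash 3.2 and later), and parenthesized groups are captured in the BASH_REMATCH array.

```shell
#!/usr/bin/env bash
subject="abc"

# Unquoted: the dot is a regex metacharacter and matches any single character.
[[ "${subject}" =~ a.c ]] && echo "unquoted a.c matches"

# Quoted: the pattern is compared as a literal string, so the dot must be a dot.
[[ "${subject}" =~ "a.c" ]] || echo "quoted a.c does not match"

# Parentheses capture sub-matches into BASH_REMATCH[1], BASH_REMATCH[2], ...
re='^([a-z]+)@([a-z.]+)$'
if [[ "user@example.com" =~ ${re} ]]; then
  echo "name=${BASH_REMATCH[1]} host=${BASH_REMATCH[2]}"   # name=user host=example.com
fi
```

Keeping the regular expression in a variable and expanding it unquoted, as with re above, sidesteps most quoting surprises.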

Commands

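_ ()
{
  # Prints 0 if "algorithm" contains 9 consecutive characters
  # that are all taken from the letters given in ${1}; otherwise prints 1.
  [[ "algorithm" =~ [${1}]{9} ]];
  echo ${?}
}
_ logarithm
_ algorithm
_ algorith_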
Output

0
0
1

TL;DR

I’ll admit, pattern matching goes way beyond bash alone and may require another section with examples and exercises allowing you to get your hands dirty. I’ll just say that, in addition to pure bash pattern matching methods, becoming familiar with the command-line utilities listed as external tools for pattern matching in bash is a definite must. Happy bash programming!
Thanks,



from Linux Hint https://ift.tt/2W8s7Sd
