
Linux Globbing (Wildcard)

See EXPANSION | Pathname Expansion | Pattern Matching (glob & extglob) in man 1 bash.

See 3.5 Shell Expansions | Filename Expansion | Pattern Matching in the Bash Reference Manual (PDF).

Shell GLOB patterns
Bash Extended Globbing

man 7 glob

From NOTES | Regular expressions in man 7 glob:

Note that wildcard patterns are not regular expressions, although they are a bit similar.
First of all, they match filenames, rather than text, and secondly, the conventions are not the same: for example, in a regular expression '*' means zero or more copies of the preceding thing.

Now that regular expressions have bracket expressions where the negation is indicated by a '^', POSIX has declared the effect of a wildcard pattern "[^...]" to be undefined.

Advanced Bash-Scripting Guide

Globbing & Wildcards

Bash itself cannot recognize Regular Expressions.
Inside scripts, it is commands and utilities -- such as sed and awk -- that interpret RE's.

Bash does carry out filename expansion -- a process known as globbing -- but this does not use the standard RE set.
Instead, globbing recognizes and expands wild cards.

Bash performs filename expansion on unquoted command-line arguments.

glob

In computer programming, in particular in a Unix-like environment, glob patterns specify sets of filenames with wildcard characters.

For example, the Unix command mv *.txt textfiles/ moves (mv) all files with names ending in .txt from the current directory to the directory textfiles. Here, * is a wildcard standing for "any string of characters" and *.txt is a glob pattern.

The other common wildcard is the question mark (?), which stands for one character.

The most common wildcards are *, ?, and [list].

wildcards

In software, a wildcard character is a kind of placeholder represented by a single character, such as an asterisk (*), which can be interpreted as a number of literal characters or an empty string. It is often used in file searches so the full name need not be typed.

Bash has only three commonly used wildcards: *, ?, and [list].

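**: when it appears as a path component, matches directories at any depth (recursive). In Bash this requires shopt -s globstar (Bash 4.0 and later); without it, ** behaves like a single *.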

*

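* (asterisk): matches any number of any characters

matches zero or more of any character in a name, including spaces or other strange characters. A leading . (as in hidden files) is not matched unless it is written explicitly.
*: when it appears as a path component, matches one level of subdirectories (not recursive).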

?

? (question mark): matches exactly one character, any character

matches exactly one of any character in a name, including a space or other strange character.

  • The GLOB pattern ???* matches non-hidden names that are three or more characters long.
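A quick way to check ? and ???* is a throwaway directory. The sketch below assumes Bash with default options (in particular no dotglob), so the hidden name .hid is never matched; the file names are invented for the demo.

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # C locale: predictable sort order
demo=$(mktemp -d)                # throwaway directory; file names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

touch a ab abc abcd .hid

one_char=( ? )        # exactly one character: a
three_plus=( ???* )   # three or more characters: abc abcd (.hid is skipped)

echo "?    -> ${one_char[*]}"
echo "???* -> ${three_plus[*]}"
```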

[list]

[list] (square brackets): matches a single character from a list

matches exactly one character in a name from a list of characters.

  • [aA]: it matches any one-character name that is either a or A.
  • [a][A]: only matches aA.
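The difference between one bracket and two can be checked the same way. This is a sketch with made-up file names; it deliberately avoids creating both a and A, because on a case-insensitive filesystem (the macOS default, for instance) they would be the same file.

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # C locale: predictable sort order
demo=$(mktemp -d)                # throwaway directory; file names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

# No file named "A": on a case-insensitive filesystem it would clash with "a".
touch a aA ab b

one_of=( [aA] )    # one-character names that are a or A: a
pair=( [a][A] )    # two brackets, two characters: aA only

echo "[aA]   -> ${one_of[*]}"
echo "[a][A] -> ${pair[*]}"
```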

demo

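*.txt               # all files with the .txt suffix
file?.log           # file1.log, file2.log, ...
[a-z]*.log          # .log files whose names start with a letter from a-z
[^a-z]*.log         # the inverse: .log files whose names do not start with a-z
/etc/**/*.conf      # guess? every .conf file at any depth under /etc (needs shopt -s globstar)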
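The first four patterns above can be tried in a scratch directory. A minimal sketch, assuming Bash and the C locale (set so that [a-z] means only the ASCII lowercase letters and the expansion order is predictable); the file names are hypothetical.

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # ASCII ranges and a predictable sort order
demo=$(mktemp -d)                # scratch directory; file names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

touch notes.txt file1.log file2.log file10.log app.log Zeta.log 9.log

txt=( *.txt )              # notes.txt
numbered=( file?.log )     # file1.log file2.log (file10.log has two characters there)
lower=( [a-z]*.log )       # app.log file1.log file10.log file2.log
not_lower=( [^a-z]*.log )  # 9.log Zeta.log

printf '%-12s %s\n' "*.txt" "${txt[*]}" "file?.log" "${numbered[*]}" \
  "[a-z]*.log" "${lower[*]}" "[^a-z]*.log" "${not_lower[*]}"
```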

A backslash (\) or quotes (' or ") disable the wildcard.

For example, \*, '*', and "*" all stand for a literal *, and do not match any files.
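The effect of quoting is visible if each form is stored in a Bash array: only the unquoted word is expanded. A sketch with hypothetical file names:

```bash
#!/usr/bin/env bash
set -eu
demo=$(mktemp -d)                # scratch directory; file names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

touch one.txt two.txt

unquoted=( *.txt )   # expanded by the shell: one.txt two.txt
single=( '*.txt' )   # single quotes: the literal word *.txt
double=( "*.txt" )   # double quotes: also literal
escaped=( \*.txt )   # backslash: also literal

echo "unquoted: ${unquoted[*]}"
echo "quoted/escaped: ${single[0]} ${double[0]} ${escaped[0]}"
```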

Move helloc.c and hellocpp.cpp into the hello folder:

mv helloc*.c* hello
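Before running a command such as mv or rm, it can help to look at what the pattern expands to. A sketch that recreates the two files in a scratch directory, plus hello.h, a made-up extra file that should not match:

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # predictable sort order
demo=$(mktemp -d)                # scratch directory; hello.h is a hypothetical extra
trap 'rm -rf "$demo"' EXIT
cd "$demo"

mkdir hello
touch helloc.c hellocpp.cpp hello.h

matched=( helloc*.c* )           # helloc.c hellocpp.cpp; neither hello.h nor hello/ matches
echo "mv receives: ${matched[*]} hello"

mv helloc*.c* hello
ls hello                         # helloc.c hellocpp.cpp
```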

Using * as a path component to match paths:

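# match the first-level subdirectories of /usr/include (not recursive)
ls -d /usr/include/*/
# match all subdirectories of /usr/include at any depth (recursive; needs shopt -s globstar)
ls -d /usr/include/**/

# wildcard one level of subdirectories
file /Applications/*/Contents/MacOS/*
# match .conf files in the first-level subdirectories of /etc/ (not recursive)
ls -l /etc/*/*.conf
# match .conf files at any depth under /etc/ (recursive; needs shopt -s globstar)
ls -l /etc/**/*.conf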
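The one-level versus recursive behaviour can be reproduced in a small throwaway tree instead of /usr/include or /etc. This sketch assumes Bash 4.0 or later for shopt -s globstar; the directory and file names are invented for the demo. It also shows that without globstar, ** is just another *.

```bash
#!/usr/bin/env bash
set -eu
demo=$(mktemp -d)                # throwaway tree; names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

mkdir -p a/b/c
touch top.conf a/one.conf a/b/two.conf a/b/c/three.conf

one_level=( */*.conf )     # * is a single path component: a/one.conf
no_globstar=( **/*.conf )  # globstar is off by default, so ** acts like *: a/one.conf

shopt -s globstar          # Bash 4.0+
any_depth=( **/*.conf )    # zero or more directories: all four .conf files

echo "*/*.conf             -> ${one_level[*]}"
echo "**/*.conf (default)  -> ${no_globstar[*]}"
echo "**/*.conf (globstar) -> ${any_depth[*]}"
```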

Find files named wordsize.h under /usr/include (recursively):

$ find /usr/include -type f -name wordsize.h
/usr/include/aarch64-linux-gnu/bits/wordsize.h

If you know the header sits in the bits directory of one of the first-level target subdirectories, you can narrow the search with a wildcard:

$ find /usr/include/*/bits -type f -name wordsize.h
/usr/include/aarch64-linux-gnu/bits/wordsize.h
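Two things happen in that command: the shell expands /usr/include/*/bits into the existing bits directories before find starts, and find itself matches the -name pattern. The sketch below mimics the layout in a scratch directory (the directory names are hypothetical stand-ins for the real target directories) and quotes the '*.h' pattern, which keeps the shell from expanding it if a matching file happens to exist in the current directory.

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # predictable glob and sort order
demo=$(mktemp -d)                # hypothetical layout mimicking /usr/include
trap 'rm -rf "$demo"' EXIT
cd "$demo"

mkdir -p include/aarch64-linux-gnu/bits include/x86_64-linux-gnu/bits include/linux
touch include/aarch64-linux-gnu/bits/wordsize.h include/linux/wordsize.h include/linux/types.h

# Expanded by the shell before find runs: only the */bits directories that exist.
start_dirs=( include/*/bits )

# Narrowed search: include/linux/wordsize.h is outside any bits directory.
in_bits=$(find include/*/bits -type f -name wordsize.h)

# Quoted pattern: find, not the shell, matches '*.h' against each file name.
all_headers=$(find include -type f -name '*.h' | sort)

printf '%s\n' "${start_dirs[@]}" "---" "$in_bits" "---" "$all_headers"
```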

Remove the files in the current directory named helloworld with the suffix .s or .o:

$ rm helloworld.[so]
$ rm helloworld.{s,o}
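The two commands are not quite equivalent: helloworld.[so] is a glob and expands only to files that exist, while helloworld.{s,o} is brace expansion, which always produces both words whether or not the files exist. A sketch, assuming only helloworld.s is present:

```bash
#!/usr/bin/env bash
set -eu
demo=$(mktemp -d)                # scratch directory
trap 'rm -rf "$demo"' EXIT
cd "$demo"

touch helloworld.s               # deliberately no helloworld.o

globbed=( helloworld.[so] )      # glob: existing files only -> helloworld.s
braced=( helloworld.{s,o} )      # brace expansion: helloworld.s helloworld.o

echo "glob : ${globbed[*]}"
echo "brace: ${braced[*]}"

rm -f "${braced[@]}"             # -f: no error for the missing helloworld.o
```

If neither file exists, Bash (without nullglob or failglob) passes helloworld.[so] to rm unchanged, so rm reports that no file by that literal name exists.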

cheatsheet

Syntax

The most common wildcards are *, ?, and […].

| Wildcard | Description | Example | Matches | Does not match |
| --- | --- | --- | --- | --- |
| `*` | matches any number of any characters, including none | `Law*` | Law, Laws, or Lawyer | GrokLaw, La, or aw |
| | | `*Law*` | Law, GrokLaw, or Lawyer | La or aw |
| `?` | matches any single character | `?at` | Cat, cat, Bat, or bat | at |
| `[abc]` | matches one character given in the bracket | `[CB]at` | Cat or Bat | cat or bat |
| `[a-z]` | matches one character from the (locale-dependent) range given in the bracket | `Letter[0-9]` | Letter0, Letter1, Letter2 up to Letter9 | Letters, Letter, or Letter10 |

In all cases the path separator character (/ on Unix or \ on Windows) will never be matched.
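A sketch showing that, during pathname expansion, * stays inside a single path component, so the / has to be written out. The last check shows that the rule belongs to filename expansion only: in string pattern matching such as [[ == ]] or case, * does match /. File names are hypothetical.

```bash
#!/usr/bin/env bash
set -eu
demo=$(mktemp -d)                # scratch directory; names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

mkdir sub
touch top.txt sub/file.txt

here=( *.txt )      # top.txt: * does not reach into sub/
below=( */*.txt )   # sub/file.txt: the / is written out explicitly

# String matching is different: here * also matches the "/".
if [[ sub/file.txt == *.txt ]]; then string_match=yes; else string_match=no; fi

echo "*.txt -> ${here[*]}; */*.txt -> ${below[*]}; [[ == *.txt ]] -> $string_match"
```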

UNIX

On Linux and POSIX systems, * and ? are defined as above, while […] has two additional forms:

| Wildcard | Description | Example | Matches | Does not match |
| --- | --- | --- | --- | --- |
| `[!abc]` | matches one character that is not given in the bracket | `[!C]at` | Bat, bat, or cat | Cat |
| `[!a-z]` | matches one character that is not from the range given in the bracket | `Letter[!3-5]` | Letter1, Letter2, Letter6 up to Letter9, Letterx, etc. | Letter3, Letter4, Letter5, or Letterxx |
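The Letter examples from both tables can be replayed in a scratch directory. A minimal sketch with invented file names, assuming the C locale so that the ranges and the expansion order are predictable:

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # ASCII ranges and predictable order
demo=$(mktemp -d)                # scratch directory; file names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

touch Letter1 Letter3 Letter5 Letter6 Letterx Letterxx

digits=( Letter[0-9] )       # Letter1 Letter3 Letter5 Letter6
not_3_to_5=( Letter[!3-5] )  # Letter1 Letter6 Letterx (Letterxx has one character too many)

echo "Letter[0-9]  -> ${digits[*]}"
echo "Letter[!3-5] -> ${not_3_to_5[*]}"
```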

Some shells (such as the C shell and Bash) support additional syntax known as alternation or brace expansion.

The Bash shell also supports Extended Globbing which allows other pattern matching operators to be used to match multiple occurrences of a pattern enclosed in parentheses. It can be enabled by setting the extglob shell option.
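A minimal extglob sketch, assuming Bash and made-up file names. Bash's extglob operators are ?(...), *(...), +(...), @(...), and !(...); only @(...) and !(...) are used here. shopt -s extglob sits on its own line near the top because Bash must have the option enabled before it parses a line that contains these patterns.

```bash
#!/usr/bin/env bash
set -eu
export LC_ALL=C                  # predictable sort order
shopt -s extglob                 # enable before any line that uses the patterns
demo=$(mktemp -d)                # scratch directory; file names are hypothetical
trap 'rm -rf "$demo"' EXIT
cd "$demo"

touch a.c a.h a.o notes.txt

c_or_h=( *.@(c|h) )              # exactly one of the alternatives: a.c a.h
no_objects=( !(*.o) )            # anything that does not match *.o: a.c a.h notes.txt

echo "*.@(c|h) -> ${c_or_h[*]}"
echo "!(*.o)   -> ${no_objects[*]}"
```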

Globbing vs RegExp

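Wildcards are used to match file names, while regular expressions are used to match file contents (text strings).

What Bash does is expand file names based on wildcards (globbing/wildcard patterns), not regular expressions.
Wildcards are mostly used for file name matching: the shell expands a pattern into the list of files that commands such as ls, cp, and rm then act on, and find applies the same kind of pattern itself through -name.

Apart from the =~ operator inside [[ ]] (Bash 3.0 and later), Bash itself has no regular expression support; in scripts it is commands (grep, sed) and tools (awk) that interpret regular expressions.
Text filters that work on file contents, such as grep (-G, -E), awk, and sed, are all based on regular expressions.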

Wildcards are more limited in what they can express, since they have fewer metacharacters and a simpler syntax.

Equivalence

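| glob | regexp | Description |
| --- | --- | --- |
| `?` | `.` | any single character except newline |
| `*` | `.*` | any number (zero or more) of characters |
| `???*` | `.{3,}` | three or more characters |
| `[[:digit:]]` | `\d` (POSIX: `[[:digit:]]`) | a digit |
| `[[:space:]]` | `\s` (POSIX: `[[:space:]]`) | whitespace |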
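One way to sanity-check the table is to run each glob through [[ == ]] and its regex through [[ =~ ]] on the same strings. In this sketch the regexes are anchored with ^...$, because a glob always has to match the whole string while a regex may match anywhere; and since =~ uses POSIX extended regexes, which lack \d and \s, the bracket classes stand in for them. The sample strings are arbitrary.

```bash
#!/usr/bin/env bash
# Pairs from the table; the EREs are anchored below to compare like with like.
globs=( '?' '*'  '???*'  '[[:digit:]]' '[[:space:]]' )
eres=(  '.' '.*' '.{3,}' '[[:digit:]]' '[[:space:]]' )
samples=( '' 'a' '7' ' ' 'abc' 'abcd' )

mismatches=0
for i in "${!globs[@]}"; do
  re="^${eres[i]}\$"                       # whole-string ERE
  for s in "${samples[@]}"; do
    g=no; [[ $s == ${globs[i]} ]] && g=yes # unquoted right-hand side is a glob pattern
    r=no; [[ $s =~ $re ]] && r=yes         # unquoted $re is an ERE
    if [[ $g != "$r" ]]; then
      echo "disagree: glob '${globs[i]}' vs ERE '$re' on '$s'"
      mismatches=$((mismatches + 1))
    fi
  done
done
echo "mismatches: $mismatches"
```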

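The main difference is that wildcards match position by position, whereas in regular expressions a quantifier modifies the element before it (a*, a+, a?).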

Range exclusion

Now that regular expressions have bracket expressions where the negation is indicated by a '^', POSIX has declared the effect of a wildcard pattern "[^...]" to be undefined.

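Both wildcards and regular expressions can match a set or range of characters, or exclude one. Matching uses the same [abc] form in both, but exclusion differs slightly.

Traditional POSIX wildcards exclude a character set with the [!abc] form, while most modern wildcard implementations (Bash included) and regular expressions support the [^abc] form. In a regular expression, [!abc] is not a negation: the ! is just another member of the set.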
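The difference in exclusion syntax shows up when the same bracket text is fed to Bash as a glob and as a POSIX extended regex via [[ =~ ]]. A sketch; the test characters are arbitrary:

```bash
#!/usr/bin/env bash
re_caret='^[^abc]$'   # ERE: ^ right after [ negates the set
re_bang='^[!abc]$'    # ERE: ! is an ordinary member of the set

glob_bang=no;    [[ x == [!abc] ]]      && glob_bang=yes     # glob: ! negates -> yes
glob_caret=no;   [[ x == [^abc] ]]      && glob_caret=yes    # Bash globs also accept ^ -> yes
ere_caret=no;    [[ x =~ $re_caret ]]   && ere_caret=yes     # ERE: ^ negates -> yes
ere_bang=no;     [[ x =~ $re_bang ]]    && ere_bang=yes      # x is not in {!,a,b,c} -> no
ere_bang_lit=no; [[ '!' =~ $re_bang ]]  && ere_bang_lit=yes  # "!" is in the set -> yes

printf '%-15s %s\n' \
  "glob [!abc]"    "$glob_bang" \
  "glob [^abc]"    "$glob_caret" \
  "ERE [^abc]"     "$ere_caret" \
  "ERE [!abc]"     "$ere_bang" \
  "ERE [!abc] '!'" "$ere_bang_lit"
```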

