Linux Globbing(Wildcard)
See `man 1 bash`, under EXPANSION | Pathname Expansion | Pattern Matching (glob & extglob).
See also the Bash Reference Manual (PDF), section 3.5 Shell Expansions | Filename Expansion | Pattern Matching.
Shell GLOB patterns
Bash Extended Globbing
man 7 GLOB#
NOTES | Regular expressions of man 7 glob
Note that wildcard patterns are not regular expressions, although they are a bit similar.
First of all, they match filenames, rather than text, and secondly, the conventions are not the same: for example, in a regular expression '*' means zero or more copies of the preceding thing.
Now that regular expressions have bracket expressions where the negation is indicated by a '^', POSIX has declared the effect of a wildcard pattern "[^...]" to be undefined.
Advanced Bash-Scripting Guide#
Bash itself cannot recognize Regular Expressions.
Inside scripts, it is commands and utilities -- such as sed and awk -- that interpret RE's.
Bash does carry out filename expansion -- a process known as globbing -- but this does not use the standard RE set.
Instead, globbing recognizes and expands wild cards.
Bash performs filename expansion on unquoted command-line arguments.
glob#
In computer programming, in particular in a Unix-like environment, glob patterns specify sets of filenames with wildcard characters.
For example, the Unix command `mv *.txt textfiles/` moves (`mv`) all files with names ending in `.txt` from the current directory to the directory `textfiles`. Here, `*` is a wildcard standing for "any string of characters" and `*.txt` is a glob pattern.
The other common wildcard is the question mark (`?`), which stands for one character.
The most common wildcards are `*`, `?`, and `[list]`.
wildcards#
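In software, a wildcard character is a kind of placeholder represented by a single character, such as an asterisk (`*`), which can be interpreted as a number of literal characters or an empty string. It is often used in file searches so the full name need not be typed.
Bash has only three commonly used wildcards: `*`, `?`, and `[list]`.
`**`: in a path, matches directories at any depth (recursive). In Bash this requires the `globstar` shell option (Bash 4.0 or later); without it, `**` behaves like a single `*`.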
*#
`*` (asterisk): match any number of any characters.
It matches zero or more of any character in a name, including spaces or other strange characters.
`*` in a path matches a single directory level (not recursive).
?#
`?` (question mark): match exactly one character, any character.
It matches exactly one of any character in a name, including a space or other strange character.
- The GLOB pattern `???*` matches non-hidden names that are three or more characters long.
[list]#
`[list]` (square brackets): match a single character from a list.
It matches exactly one character in a name from a list of characters.
- `[aA]`: matches any one-character name that is either a or A.
- `[a][A]`: only matches aA.
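The three wildcards above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a POSIX-style shell with `mktemp` available; the file names are made up for illustration.

```shell
# Scratch directory with a few sample names (all hypothetical).
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a aA ab abc abcd .hidden

star=$(echo *)           # any non-hidden name
one=$(echo ?)            # names exactly one character long
three_plus=$(echo ???*)  # names at least three characters long
set_one=$(echo [aA])     # one-character names that are a or A
set_two=$(echo [a][A])   # the two-character name aA

printf '%s\n' "*      : $star" "?      : $one" "???*   : $three_plus" \
              "[aA]   : $set_one" "[a][A] : $set_two"

# Leave the scratch directory and remove it.
cd "$OLDPWD" && rm -rf "$dir"
```

Note that `.hidden` never shows up: by default none of these patterns match a leading dot.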
demo#
*.txt # matches every file with the suffix .txt
file?.log # matches file1.log, file2.log, ...
[a-z]*.log # matches .log files whose names start with a-z
[^a-z]*.log # the inverse of the above
/etc/**/*.conf # can you guess?
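A backslash (`\`) or quotes (`'`, `"`) disable wildcard expansion.
For example, `\*`, `'*'`, and `"*"` all stand for a literal `*` and are not expanded to match files.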
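The effect of quoting is easy to observe by echoing the same pattern in its unquoted, escaped, and quoted forms. This is a small sketch in a scratch directory; the file names are hypothetical.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch one.txt two.txt

expanded=$(echo *.txt)   # unquoted: the shell expands the pattern
escaped=$(echo \*.txt)   # backslash: the asterisk stays literal
single=$(echo '*.txt')   # single quotes: literal
double=$(echo "*.txt")   # double quotes: literal

printf '%s\n' "unquoted : $expanded" "escaped  : $escaped" \
              "single   : $single" "double   : $double"

cd "$OLDPWD" && rm -rf "$dir"
```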
Move helloc.c and hellocpp.cpp into the hello folder:
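The original command is not shown here, so the following is only one possible pattern, tried in a scratch directory. It assumes the `hello` directory already exists next to the two source files; `notes.txt` is an extra hypothetical file added to show what is left alone.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch helloc.c hellocpp.cpp notes.txt
mkdir hello

# hello*.c* matches helloc.c and hellocpp.cpp, but neither the directory
# hello (its name contains no ".c") nor notes.txt.
mv hello*.c* hello/

moved=$(cd hello && echo *)
left=$(echo *)
printf '%s\n' "moved into hello/: $moved" "left behind: $left"

cd "$OLDPWD" && rm -rf "$dir"
```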
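`*` can also appear inside a path to match path components:
# match the first-level subdirectories of usr/include (not recursive)
ls -d usr/include/*/
# match all subdirectories of usr/include (recursive; in Bash this needs shopt -s globstar)
ls -d usr/include/**/
# wildcard over one level of subdirectories
file /Applications/*/Contents/MacOS/*
# match conf files in the first-level subdirectories of /etc/ (not recursive)
ls -l /etc/*/*.conf
# match conf files in all subdirectories of /etc/ (recursive; in Bash this needs shopt -s globstar)
ls -l /etc/**/*.conf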
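The difference between `*` and `**` depends on the shell. The sketch below builds a small hypothetical tree and counts matches; it assumes Bash 4.0 or later is installed and starts it with `-O globstar`, which has the same effect as running `shopt -s globstar` first.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/etc/app/conf.d"
touch "$dir/etc/top.conf" "$dir/etc/app/app.conf" "$dir/etc/app/conf.d/extra.conf"
cd "$dir" || exit 1

# One level only: the * between the slashes stands for exactly one directory.
one_level=$(echo etc/*/*.conf)

# With globstar, ** matches zero or more directory levels.
recursive=$(bash -O globstar -c 'set -- etc/**/*.conf; echo $#')

# Without globstar, Bash treats ** like a single *.
plain=$(bash -c 'set -- etc/**/*.conf; echo $#')

printf '%s\n' "one level: $one_level" \
              "matches with globstar: $recursive" \
              "matches without globstar: $plain"

cd "$OLDPWD" && rm -rf "$dir"
```

Other shells differ: zsh, for example, recurses on `**/` without any option.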
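Search /usr/include (recursively) for a file named wordsize.h.
If you know the header lives in the bits directory of one of the first-level target subdirectories, you can narrow the search with a wildcard: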
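The original commands are not shown here; one way to do both searches is sketched below against a scratch tree instead of the real /usr/include. The directory name `x86_64-linux-gnu` is only an assumed example of a first-level target directory.

```shell
root=$(mktemp -d)
mkdir -p "$root/include/x86_64-linux-gnu/bits" "$root/include/linux"
touch "$root/include/x86_64-linux-gnu/bits/wordsize.h" "$root/include/linux/types.h"

# Recursive search by name. The name is quoted so that find, not the shell,
# interprets it as a pattern.
found=$(find "$root/include" -name 'wordsize.h')

# Narrowed search: * stands for exactly one directory level above bits/.
narrowed=$(echo "$root"/include/*/bits/wordsize.h)

printf '%s\n' "find: $found" "glob: $narrowed"

rm -rf "$root"
```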
Remove the files in the current directory that are named helloworld and have the suffix `.s` or `.o`:
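The original command is not shown here; a bracket list is one natural fit. This is a sketch in a scratch directory so nothing real is deleted.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch helloworld helloworld.c helloworld.s helloworld.o

# [so] matches exactly one character, either s or o.
rm helloworld.[so]

left=$(echo *)
printf '%s\n' "remaining: $left"

cd "$OLDPWD" && rm -rf "$dir"
```

In Bash, `rm helloworld.{s,o}` would work as well, but that is brace expansion rather than globbing: it produces both names even if the files do not exist.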
cheatsheet#
Syntax#
The most common wildcards are `*`, `?`, and `[…]`.
Wildcard | Description | Example | Matches | Does not match |
---|---|---|---|---|
`*` | matches any number of any characters including none | `Law*` | Law, Laws, or Lawyer | GrokLaw, La, or aw |
`*` | (as above) | `*Law*` | Law, GrokLaw, or Lawyer | La, or aw |
`?` | matches any single character | `?at` | Cat, cat, Bat or bat | at |
`[abc]` | matches one character given in the bracket | `[CB]at` | Cat or Bat | cat or bat |
`[a-z]` | matches one character from the (locale-dependent) range given in the bracket | `Letter[0-9]` | Letter0, Letter1, Letter2 up to Letter9 | Letters, Letter or Letter10 |
In all cases the path separator character (`/` on Unix or `\` on Windows) will never be matched.
UNIX#
On Linux and POSIX systems `*` and `?` are defined as above, while `[…]` has two additional meanings:
Wildcard | Description | Example | Matches | Does not match |
---|---|---|---|---|
`[!abc]` | matches one character that is not given in the bracket | `[!C]at` | Bat, bat, or cat | Cat |
`[!a-z]` | matches one character that is not from the range given in the bracket | `Letter[!3-5]` | Letter1, Letter2, Letter6 up to Letter9 and Letterx etc. | Letter3, Letter4, Letter5 or Letterxx |
Some shells (such as the C shell and Bash) support additional syntax known as alternation or brace expansion.
The Bash shell also supports Extended Globbing, which allows other pattern matching operators to be used to match multiple occurrences of a pattern enclosed in parentheses. It can be enabled by setting the `extglob` shell option.
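Both features are Bash-specific, so the sketch below runs them through an explicit `bash` invocation; `-O extglob` enables the option at startup, equivalent to `shopt -s extglob`. It assumes Bash is installed, and the file names are hypothetical.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt c.log

# Brace expansion generates words whether or not matching files exist.
braces=$(bash -c 'echo file.{s,o}')

# extglob: !(pattern) matches any name except those matching the pattern.
not_txt=$(bash -O extglob -c 'echo !(*.txt)')

# extglob: @(a|b) matches exactly one of the alternatives.
a_or_b=$(bash -O extglob -c 'echo @(a|b).txt')

printf '%s\n' "braces: $braces" "!(*.txt): $not_txt" "@(a|b).txt: $a_or_b"

cd "$OLDPWD" && rm -rf "$dir"
```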
Globbing vs RegExp#
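Wildcards are used to match file names, whereas regular expressions are used to match file contents (text strings).
What Bash does is expand file names based on wildcards (globbing/wildcard patterns), not regular expressions.
Wildcards are mostly used for file-name matching. For commands such as `ls`, `cp`, and `rm`, the shell expands the pattern before the command runs; `find` matches the pattern given to `-name` itself, so that pattern should be quoted.
Bash does not use regular expressions for filename expansion (its only built-in regexp support is the `=~` operator of `[[ ]]`). In scripts, it is commands (grep, sed) and tools (awk) that interpret regular expressions.
Text-filtering tools that work on file contents, such as `grep` (-G, -e, -E), `awk`, and `sed`, are all based on regular expressions.
Wildcards are more limited in what they can express, since they have fewer metacharacters and a simpler grammar.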
Equivalence#
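glob | regexp | Description |
---|---|---|
`?` | `.` | any single character (the regexp `.` excludes newline) |
`*` | `.*` | any number (zero or more) of characters |
`???*` | `.{3,}` | three or more characters |
`[[:digit:]]` | `\d` | a digit (`\d` is a Perl-style shorthand; POSIX regexps also use `[[:digit:]]`) |
`[[:space:]]` | `\s` | whitespace (`\s` is a Perl-style shorthand; POSIX regexps also use `[[:space:]]`) |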
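The main difference is that a wildcard stands by itself for the characters at its position, whereas a regular-expression quantifier qualifies the element that precedes it (`a*`, `a+`, `a?`).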
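The equivalences can be checked on plain strings, using `case` for glob patterns and `grep -E` for regular expressions. This is a sketch: `glob_match` and `re_match` are hypothetical helper names, and the regular expressions are anchored with `^` and `$` because a glob always has to match the whole string.

```shell
# Print yes if string $1 matches glob pattern $2 (case does the matching).
glob_match() { case $1 in $2) echo yes ;; *) echo no ;; esac; }

# Print yes if string $1 matches extended regular expression $2.
re_match() {
  if printf '%s\n' "$1" | grep -Eq "$2"; then echo yes; else echo no; fi
}

g1=$(glob_match abcd '???*');      r1=$(re_match abcd '^.{3,}$')      # yes / yes
g2=$(glob_match ab '???*');        r2=$(re_match ab '^.{3,}$')        # no  / no
g3=$(glob_match 7 '[[:digit:]]');  r3=$(re_match 7 '^[[:digit:]]$')   # yes / yes

# The same text a* means different things in the two languages:
g4=$(glob_match apple 'a*')   # yes: an a followed by anything
r4=$(re_match apple '^a*$')   # no: only zero or more a characters

printf '%s\n' "???* vs .{3,} on abcd: $g1 $r1" "on ab: $g2 $r2" \
              "digit class on 7: $g3 $r3" "a* on apple: glob=$g4 regexp=$r4"
```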
ranges exclude#
Now that regular expressions have bracket expressions where the negation is indicated by a '`^`', POSIX has declared the effect of a wildcard pattern "`[^...]`" to be undefined.
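Both wildcards and regular expressions can match against, or exclude, a set or range of characters. The matching form is `[abc]` in both, but the excluding form differs slightly.
Traditional POSIX globbing negates a set with `!`, as in `[!abc]`; most modern glob implementations, like regular expressions, also accept the `[^abc]` form.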
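The two negation forms can be compared directly. In this sketch the portable `[!C]` form runs in the current shell, while the `[^C]` form is run through Bash, which is known to accept it; other shells may not, since POSIX leaves it undefined. The file names are hypothetical.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch Bat Cat Rat

posix_form=$(echo [!C]at)            # portable negation
caret_form=$(bash -c 'echo [^C]at')  # accepted by Bash as a synonym

printf '%s\n' "[!C]at : $posix_form" "[^C]at : $caret_form"

cd "$OLDPWD" && rm -rf "$dir"
```

For scripts that must run under any POSIX shell, `[!...]` is the safer choice.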