Usage of [] in regular expressions

The regular-expression metacharacter [ ]

Quoted from the wiki:

A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches “a”, “b”, or “c”. [a-z] specifies a range which matches any lowercase letter from “a” to “z”. These forms can be mixed: [abcx-z] matches “a”, “b”, “c”, “x”, “y”, or “z”, as does [a-cx-z].
The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].

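The quote is clear enough as it stands. The key points:

  • '-' is literal only when it is the first (after the ^, if present) or the last character inside the brackets; anywhere else it denotes a range
  • ']' is literal only when it is the first (after the ^, if present) character inside the brackets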
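The placement rules for '-' and ']' can be checked with a small sketch. The helper name mask and the sample text are made up for illustration; it only assumes a POSIX sh and sed, and replaces every character matched by the bracket expression with "_".

```shell
# mask EXPR TEXT: replace each character of TEXT matched by the
# bracket expression EXPR with "_" (hypothetical helper for illustration).
mask() {
    printf '%s\n' "$2" | sed "s/$1/_/g"
}

mask '[abc-]'  'a-b.c]d'   # '-' last, so it is literal:   ___._]d
mask '[-abc]'  'a-b.c]d'   # '-' first, so it is literal:  ___._]d
mask '[a-c]'   'a-b.c]d'   # '-' in the middle is a range: _-_._]d
mask '[]abc]'  'a-b.c]d'   # ']' first, so it is literal:  _-_.__d
mask '[^]abc]' 'a-b.c]d'   # ']' first after '^':          a_b_c]_
```

Each call prints the masked text, so it is easy to see which characters the expression actually matched.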

Miscellaneous

1. How sed treats special characters

sed can match unusual (non-ASCII) characters. For example, to replace the æ in the middle of the string "helloæworld":

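...]$ echo helloæworld | sed -n 'l0'
hello\303\246world$
...]$ echo helloæworld | sed 's/\o303\o246/a+e/'
helloa+eworld

Here \303\246 is how sed displays æ: the two bytes of its UTF-8 encoding, written in octal. The second command matches those two bytes and replaces æ with a+e.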

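To see how sed represents a special character, which makes it easier to write a matching pattern, print the input with the l command: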

sed -n 'l0'

Based on a reference, with thanks to its author.

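2. Using '-' inside the [] metacharacter

Inside [], '-' denotes a character range. Can any two characters serve as the start and end of a range? For example, writing characters by their decimal ASCII codes, does [\d1-\d255] match any single character? Write the range as [begin-end].

Test results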

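  1. The three ranges that regular expressions conventionally spell out, 0-9, a-z, and A-Z, support this form.
    For example, [0-9] is equivalent to [\d48-\d57]
  2. The ASCII code of begin must not be greater than that of end; a reversed range such as [z-a] is rejected as invalid
  3. In my tests, ranges behaved predictably only inside 0-9, a-z, and A-Z, not across them or with other characters. This is an effect of the locale rather than a rule of the syntax: POSIX defines the ordering of a range only in the POSIX (C) locale, where endpoints are compared by character code and a range such as [!-~] is valid. In other locales the result depends on collation, and \d255 (byte 0xFF) is not a valid character in a UTF-8 locale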
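How ranges behave depends on the locale, so the sketch below forces LC_ALL=C, where range endpoints are compared by byte value. The helper name mask_range and the sample text are assumptions for illustration; the exact wording of the error for a reversed range varies between sed implementations, so the sketch only checks that sed fails.

```shell
# mask_range EXPR TEXT: replace each character of TEXT matched by EXPR
# with "#", in the C locale (hypothetical helper for illustration).
mask_range() {
    printf '%s\n' "$2" | LC_ALL=C sed "s/$1/#/g"
}

mask_range '[0-9]' 'a1 B2,c3'   # digits only:               a# B#,c#
mask_range '[A-z]' 'a1 B2,c3'   # crosses A-Z and a-z:       #1 #2,#3
mask_range '[!-~]' 'a1 B2,c3'   # all printable non-space:   ## #####

# A reversed range is an error: sed exits with a non-zero status.
mask_range '[z-a]' 'abc' 2>/dev/null || echo 'invalid range'
```

Outside the C locale the same expressions may match a different set of characters, which is one plausible explanation for test results that only look consistent inside 0-9, a-z, and A-Z.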

3. '\s' cannot be used inside the [] metacharacter to match a space or tab

  • For a space, use ' '
  • For a tab, use '\t'
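As a complement to listing ' ' and '\t' by hand, POSIX character classes such as [:blank:] (space and tab) and [:space:] (all whitespace) are valid inside a bracket expression. The sketch below uses a made-up helper, mask_ws, and also shows what [\s] matches instead: the two literal characters backslash and "s".

```shell
# mask_ws EXPR TEXT: replace each character of TEXT matched by EXPR
# with "_" (hypothetical helper for illustration).
mask_ws() {
    printf '%s\n' "$2" | sed "s/$1/_/g"
}

text=$(printf 'a b\tc')           # "a", space, "b", tab, "c"

mask_ws '[[:blank:]]' "$text"     # space and tab:         a_b_c
mask_ws '[[:space:]]' "$text"     # any whitespace:        a_b_c
mask_ws '[\s]' 'a\s b'            # backslash and "s":     a__ b
```

The last call leaves the space untouched, which is why \s appears not to work inside the brackets.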

4. Other

sed can express an ASCII character by its numeric value in different bases:

\dxxx
Produces or matches a character whose decimal ASCII value is xxx.
\oxxx
Produces or matches a character whose octal ASCII value is xxx.
\xHH
Produces or matches a character whose hexadecimal ASCII value is HH.
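A short sketch of the three escapes, assuming GNU sed (they are GNU extensions and other sed implementations may not support them). The helper name sub_space is made up; decimal 32, octal 40, and hexadecimal 20 all denote the space character, and decimal 65 is "A".

```shell
# sub_space PATTERN: replace the first match of PATTERN in "a b" with "_"
# (hypothetical helper; requires GNU sed for the \d, \o and \x escapes).
sub_space() {
    printf '%s\n' 'a b' | sed "s/$1/_/"
}

sub_space '\d32'    # decimal 32 is a space: a_b
sub_space '\o40'    # octal 40 is a space:   a_b
sub_space '\x20'    # hex 20 is a space:     a_b

# The escapes also produce characters on the replacement side.
printf '%s\n' 'a' | sed 's/a/\d65/'    # decimal 65 is "A": A
```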