Go regular expressions (regexp) - Golang教程网

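The regexp package implements regular expression search. The accepted syntax is RE2 syntax (except for \C), which is largely the same as the general syntax used by Perl, Python, and other languages.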

regexp

Find(All)?(String)?(Submatch)?(Index)?

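If 'All' is present, the method returns all successive non-overlapping matches of the entire expression. An empty match that directly abuts the preceding match is ignored. These methods take an extra integer argument n: if n >= 0, at most n matches are returned; otherwise all matches are returned.
If 'String' is present, the argument is a string; otherwise it is a []byte. The return values are adjusted to match the argument type.
If 'Submatch' is present, the return value is a slice identifying the successive submatches of the expression. Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression, numbered from left to right in order of opening parenthesis. Submatch 0 is the match of the entire expression, submatch 1 is the match of the first parenthesized subexpression, and so on.
If 'Index' is present, matches and submatches are identified by byte index pairs within the input: result[2*n:2*n+2] holds the start and end indexes of the nth submatch. Without 'Index', a match is identified by its text. A negative index means that the subexpression did not match any text in the input.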
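The naming scheme above can be illustrated with a small sketch. The pattern, the sample input, and the helper names firstPair and valueSpan are assumptions made up for this example; only the Find* methods come from the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyValue is an illustrative pattern with two capturing groups.
var keyValue = regexp.MustCompile(`(\w+)=(\d+)`)

// firstPair uses the 'Submatch' variant: index 0 is the whole match,
// 1 and 2 are the capturing groups.
func firstPair(s string) (key, value string, ok bool) {
	m := keyValue.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

// valueSpan uses the 'Submatch' + 'Index' variant: the byte offsets of
// group n are stored at loc[2*n] and loc[2*n+1].
func valueSpan(s string) (start, end int) {
	loc := keyValue.FindStringSubmatchIndex(s)
	if loc == nil {
		return -1, -1
	}
	return loc[2*2], loc[2*2+1]
}

func main() {
	s := "a=1 b=22 c=333"
	fmt.Println(keyValue.FindString(s))         // first match as text
	fmt.Println(keyValue.FindAllString(s, 2))   // at most two matches
	fmt.Println(keyValue.FindAllString(s, -1))  // all matches
	fmt.Println(keyValue.FindStringSubmatch(s)) // whole match plus groups
	fmt.Println(firstPair(s))
	fmt.Println(valueSpan(s))
}
```

The same combinations exist without 'String' in the name, in which case the input and the returned matches are []byte values.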

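Groups

1. (re)           numbered capturing group
2. (?P<name>re)   named and numbered capturing group
3. (?:re)         non-capturing group
4. (?flags)       set flags within the current group; non-capturing, matches nothing
5. (?flags:re)    set flags during re; non-capturing group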
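As a rough illustration of the group kinds listed above, the following sketch combines a flag group, a non-capturing group, and two named groups. The pattern, the sample input, and the helper namedGroups are hypothetical; SubexpNames and NumSubexp are methods of Regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// datePattern uses (?i) for case-insensitive matching, a non-capturing
// group for the label, and two named capturing groups.
var datePattern = regexp.MustCompile(`(?i)(?:date|day):\s*(?P<year>\d{4})-(?P<month>\d{2})`)

// namedGroups maps each named group to its text in the first match of s.
// It returns nil when there is no match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if i != 0 && name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	g := namedGroups(datePattern, "DATE: 2024-05")
	fmt.Println(g["year"], g["month"])
	// Only capturing groups are counted; (?:...) is not.
	fmt.Println(datePattern.NumSubexp())
}
```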

func Match

func Match(pattern string, b []byte) (matched bool, err error)

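Match reports whether the byte slice b contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.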

func MatchString

func MatchString(pattern string, s string) (matched bool, err error)

MatchString is like Match, but the input is a string.

func MatchReader

func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)

MatchReader is like Match, but the input is read from an io.RuneReader.
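A minimal sketch of the three package-level helpers side by side, including the error they return for an invalid pattern. The helper name matchThreeWays and the inputs are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchThreeWays runs the same pattern through Match, MatchString, and
// MatchReader. It stops at the first error, which is returned when the
// pattern cannot be compiled.
func matchThreeWays(pattern, s string) ([3]bool, error) {
	var res [3]bool
	var err error
	if res[0], err = regexp.Match(pattern, []byte(s)); err != nil {
		return res, err
	}
	if res[1], err = regexp.MatchString(pattern, s); err != nil {
		return res, err
	}
	// strings.Reader implements io.RuneReader.
	if res[2], err = regexp.MatchReader(pattern, strings.NewReader(s)); err != nil {
		return res, err
	}
	return res, nil
}

func main() {
	fmt.Println(matchThreeWays(`\d+`, "abc 123"))

	// An unbalanced parenthesis makes the pattern invalid.
	if _, err := matchThreeWays(`a(b`, "ab"); err != nil {
		fmt.Println("invalid pattern:", err)
	}
}
```

These helpers compile the pattern on every call, so for repeated matching a Regexp compiled once is usually the better choice.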

type Regexp

Regexp is the representation of a compiled regular expression. A Regexp is safe for concurrent use by multiple goroutines.
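The concurrency guarantee can be sketched as follows: one compiled Regexp is shared by several goroutines without any locking. The pattern and the helper countWords are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// wordPattern is compiled once and shared by all goroutines.
var wordPattern = regexp.MustCompile(`\w+`)

// countWords counts the matches in each input concurrently. Each goroutine
// writes only to its own slot of counts, so no mutex is needed.
func countWords(inputs []string) []int {
	counts := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			counts[i] = len(wordPattern.FindAllString(in, -1))
		}(i, in)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countWords([]string{"a b c", "hello", ""}))
}
```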

func Compile

func Compile(expr string) (*Regexp, error)

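Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.

When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This is called leftmost-first matching.
Perl, Python, and other implementations use the same semantics, but this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see CompilePOSIX.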

func CompilePOSIX

func CompilePOSIX(expr string) (*Regexp, error)

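CompilePOSIX is like Compile but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest. Perl-style escapes such as \w and \W are not supported.
When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This is called leftmost-longest matching.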
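The syntax restriction can be checked with a short sketch: a Perl-style class such as \w compiles with Compile but is rejected by CompilePOSIX, while a POSIX bracket class is accepted by both. The helper compileBoth is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileBoth compiles expr with Compile and CompilePOSIX and returns
// the error (if any) from each.
func compileBoth(expr string) (perlErr, posixErr error) {
	_, perlErr = regexp.Compile(expr)
	_, posixErr = regexp.CompilePOSIX(expr)
	return perlErr, posixErr
}

func main() {
	// \w is a Perl-style escape, so only CompilePOSIX reports an error.
	perlErr, posixErr := compileBoth(`\w+`)
	fmt.Println(perlErr == nil, posixErr != nil)

	// A POSIX bracket expression is accepted by both.
	perlErr, posixErr = compileBoth(`[[:alnum:]_]+`)
	fmt.Println(perlErr == nil, posixErr == nil)
}
```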

func MustCompile

func MustCompile(str string) *Regexp

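MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.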
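A minimal sketch of the typical use: a package-level variable initialized with MustCompile, plus a helper that shows the panic for an invalid pattern. The pattern (a deliberately loose e-mail shape) and the helper names looksLikeEmail and panics are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailLike is compiled once at package initialization. An invalid
// pattern here would panic at program start instead of failing later.
var emailLike = regexp.MustCompile(`^[^@\s]+@[^@\s]+$`)

// looksLikeEmail reports whether s has the rough shape local@domain.
func looksLikeEmail(s string) bool {
	return emailLike.MatchString(s)
}

// panics reports whether MustCompile panics for expr.
func panics(expr string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	regexp.MustCompile(expr)
	return false
}

func main() {
	fmt.Println(looksLikeEmail("gopher@example.com"))
	fmt.Println(looksLikeEmail("not an email"))
	fmt.Println(panics(`a(b`))
}
```

For patterns that come from user input, Compile with explicit error handling is the safer option, since a panic there would take down the program.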

func MustCompilePOSIX

func MustCompilePOSIX(str string) *Regexp

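MustCompilePOSIX is like CompilePOSIX but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.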

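What is the difference between regexp.Compile and regexp.CompilePOSIX?

Perl-compatible and POSIX-compatible regular expressions are largely similar but differ in a few key respects, such as how they choose among alternatives.
Take the regular expression (foo|foobar). When it is matched against a string in which more than one alternative can match at the same position (for example, in foobarbaz both foo and foobar match), the Perl-style regexp returns the first alternative that matches (foo), while the POSIX-style regexp returns the longest match (foobar).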

package main

import (
    "fmt"
    "regexp"
)

func main() {
    pattern := "(foo|foobar)"
    str := []byte("foobarbaz")

    // Compile uses Perl-style leftmost-first semantics.
    rPerl, _ := regexp.Compile(pattern)
    // CompilePOSIX uses leftmost-longest semantics.
    rPOSIX, _ := regexp.CompilePOSIX(pattern)

    matchesPerl := rPerl.Find(str)
    fmt.Println(string(matchesPerl))
    // prints "foo"

    matchesPOSIX := rPOSIX.Find(str)
    fmt.Println(string(matchesPOSIX))
    // prints "foobar"
}

func (*Regexp) String

func (re *Regexp) String() string

String returns the source text used to compile the regular expression.

func (*Regexp) Match

func (re *Regexp) Match(b []byte) bool

Match reports whether the byte slice b contains any match of the regular expression re.

func (*Regexp) MatchString

func (re *Regexp) MatchString(s string) bool

MatchString is like Match, but the input is a string.

func (*Regexp) Find

func (re *Regexp) Find(b []byte) []byte

Find returns a []byte slice holding the text of the leftmost match of the regular expression re in b. If there is no match, it returns nil.

func (*Regexp) FindString

func (re *Regexp) FindString(s string) string

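FindString returns a string holding the text of the leftmost match of the regular expression re in s. If there is no match, it returns "", but it also returns "" when the regular expression successfully matches an empty string. Use FindStringIndex or FindStringSubmatch if you need to distinguish these cases.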
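The ambiguity between "no match" and "matched the empty string" can be resolved as in the following sketch. The two patterns and the helper describe are hypothetical; FindStringIndex returns nil only when there is no match at all.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	optionalDigits = regexp.MustCompile(`\d*`) // can match the empty string
	requiredDigits = regexp.MustCompile(`\d+`) // needs at least one digit
)

// describe distinguishes the three possible outcomes using FindStringIndex.
func describe(re *regexp.Regexp, s string) string {
	loc := re.FindStringIndex(s)
	switch {
	case loc == nil:
		return "no match"
	case loc[0] == loc[1]:
		return "empty match"
	default:
		return "match: " + s[loc[0]:loc[1]]
	}
}

func main() {
	// FindString returns "" in both cases, so it cannot tell them apart.
	fmt.Printf("%q %q\n", optionalDigits.FindString("abc"), requiredDigits.FindString("abc"))

	fmt.Println(describe(optionalDigits, "abc"))
	fmt.Println(describe(requiredDigits, "abc"))
	fmt.Println(describe(requiredDigits, "ab12"))
}
```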

https://studygolang.com/pkgdoc

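Examples

Testing for a match

Use the Match or MatchString function to test for a match. The only difference is that Match takes a []byte argument, while MatchString takes a string.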

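// Sample input used by all of the examples in this section
text := "<p>apple study hard ,好好学习, 11 2333 33 study, good，天天向上 study 1243 53, </p>"

// Match against a string
matchResult, _ := regexp.MatchString("study", text)
fmt.Println(matchResult) //true

// Match against a []byte
byteMatchResult, _ := regexp.Match(`\d+`, []byte(text))
fmt.Println(byteMatchResult) //true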

Finding

The main lookup methods are FindString, FindAllString, and FindStringIndex.

// Find all matches; the second argument caps the number of results at 2.
// A negative value such as -1 returns every match.
queryAllResult := regexp.MustCompilePOSIX(`study`).FindAllString(text, 2)
fmt.Println(queryAllResult) //[study study]

// Find the first matching string
queryResult := regexp.MustCompile("study").FindString(text)
fmt.Println(queryResult) //study

// Leftmost-first (Perl-style): the first alternative that matches wins
shortTxt := regexp.MustCompile(`st|study`).FindString(text)
fmt.Println(shortTxt) //st

// Leftmost-longest (POSIX): the longest match wins
longTxt := regexp.MustCompilePOSIX(`st|study`).FindString(text)
fmt.Println(longTxt) //study

// Find the byte offsets [start end] of the first match
findIdxResult := regexp.MustCompile(`study`).FindStringIndex(text)
fmt.Println(findIdxResult)

// Unicode classes: \p{Han} matches Chinese (Han) characters
hanResult := regexp.MustCompile(`[\p{Han}]+`).FindString(text)
fmt.Println(hanResult) //好好学习

// Non-greedy: append ? to the quantifier
notGreedyResult := regexp.MustCompile(`\d+?3`).FindString(text)
fmt.Println(notGreedyResult) //23
// Greedy
greedyResult := regexp.MustCompile(`\d+3`).FindString(text)
fmt.Println(greedyResult) //2333

// Groups: FindString still returns the whole match;
// use FindStringSubmatch to read the individual groups
groupResult := regexp.MustCompile(`(\d+)\s+(\d+)`).FindString(text)
fmt.Println(groupResult) //11 2333
// Named groups
tagGroupResult := regexp.MustCompile(`(?P<tagName_1>\d+)\s+(?P<tagName_2>\d+)`).FindString(text)
fmt.Println(tagGroupResult) //11 2333

// The (?U) flag swaps greedy and non-greedy, so \d+ becomes non-greedy here
notGreedyReg := regexp.MustCompile(`(?U)\d+3`)
flagNotGreedyResult := notGreedyReg.FindString(text)
fmt.Println(flagNotGreedyResult) //23
// Longest switches later searches on this Regexp to leftmost-longest matching
notGreedyReg.Longest()
changeGreedyResult := notGreedyReg.FindString(text)
fmt.Println(changeGreedyResult) //2333
// Greedy by default, without the (?U) flag
plainGreedyResult := regexp.MustCompile(`\d+3`).FindString(text)
fmt.Println(plainGreedyResult) //2333

Replacing

The main replacement methods are ReplaceAllString and ReplaceAllStringFunc. ReplaceAllStringFunc computes each replacement with a user-supplied function.

// Custom replacement function: appends "++" to any match that contains "5"
func updateDisplose(text string) string {
	if strings.Contains(text, "5") {
		return text + "++"
	}
	return text
}

// Replace: strip every tag
updateResult := regexp.MustCompile("<.*?>").ReplaceAllString(text, "")
fmt.Println(updateResult) //apple study hard ,好好学习, 11 2333 33 study, good，天天向上 study 1243 53,

// Replace using a function that decides each replacement
updateFuncResult := regexp.MustCompile(`\d+`).ReplaceAllStringFunc(text, updateDisplose)
fmt.Println(updateFuncResult) //<p>apple study hard ,好好学习, 11 2333 33 study, good，天天向上 study 1243 53++, </p>

// Find groups and expand them into a template
expandReg := regexp.MustCompile(`(\w+).*(\d+)`)
template := "hello $1, study $2"
// Alternative: append the expansion to an existing prefix
//dst := "info:"
//subMatchIdx := expandReg.FindStringSubmatchIndex(text)
//expandResult := expandReg.ExpandString([]byte(dst), template, text, subMatchIdx)
expandResult := expandReg.ExpandString(nil, template, text, expandReg.FindStringSubmatchIndex(text))
fmt.Printf("%s\n", expandResult) //hello p, study 3
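The same $1-style references also work directly in ReplaceAllString, which applies the template to every match. The following is a minimal sketch; the pattern and the helper swapPairs are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// pair matches key=value with one capturing group on each side.
var pair = regexp.MustCompile(`(\w+)=(\w+)`)

// swapPairs rewrites every key=value as value=key. The braces in ${2}
// mark where the group reference ends.
func swapPairs(s string) string {
	return pair.ReplaceAllString(s, "${2}=${1}")
}

func main() {
	fmt.Println(swapPairs("a=1 b=2"))

	// Without braces, "$1x" refers to a group named "1x", which does not
	// exist, so each match is replaced with the empty string.
	fmt.Printf("%q\n", pair.ReplaceAllString("a=1 b=2", "$1x"))
}
```

Writing ${1} instead of $1 is a small habit that avoids the silent empty replacement shown in the second call.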