The story

This post simulates a data-processing program that deliberately uses Go's built-in encoding/json package (relatively slow) for serialization and deserialization.

We then use benchmarks and CPU profiling to locate the performance bottleneck caused by that json package.

Finally, we optimize the program by switching to the easyjson package from GitHub (much faster).

Cuihua, bring on the code!

First, define the data structures (structs.go):

package profiling

type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"payload"`
}

type Response struct {
	TransactionID string `json:"transaction_id"`
	Expression    string `json:"exp"`
}

Both the Request and Response structs use struct tags (similar to annotations in Java) so that Go's built-in json package can map the fields to the desired JSON keys during serialization.
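
As a quick illustration, here is a minimal sketch (the demoTags helper is hypothetical, not part of the project) showing that the tags, not the Go field names, decide the JSON keys:

package profiling

import (
	"encoding/json"
	"fmt"
)

// demoTags is a hypothetical helper: the struct tags control the key names
// in the serialized output.
func demoTags() {
	b, _ := json.Marshal(&Request{TransactionID: "t1", PayLoad: []int{1, 2, 3}})
	fmt.Println(string(b)) // {"transaction_id":"t1","payload":[1,2,3]}
}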

Next, define the processing functions (optimization.go):

package profiling

import (
	"encoding/json"
	"strconv"
	"strings"
)

func createRequest() string {
	payload := make([]int, 100, 100)
	for i := 0; i < 100; i++ {
		payload[i] = i
	}
	req := Request{"demo_transaction", payload}
	v, err := json.Marshal(&req)
	if err != nil {
		panic(err)
	}
	return string(v)
}

func processRequest(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}

		json.Unmarshal([]byte(req), reqObj)

		var buf strings.Builder
		for _, e := range reqObj.PayLoad {
			buf.WriteString(strconv.Itoa(e))
			buf.WriteString(",")
		}
		repObj := &Response{reqObj.TransactionID, buf.String()}

		repJson, err := json.Marshal(&repObj)
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}

Here, createRequest builds a payload slice, wraps it in a Request, and serializes it to a JSON string with json.Marshal.

processRequest receives a slice of JSON strings; for each one it deserializes the string back into a Request with json.Unmarshal, joins the payload elements into a comma-separated string in a for loop, and serializes the result as a Response.

Test code

optimization_test.go

package profiling

import "testing"

func TestCreateRequest(t *testing.T) {
	str := createRequest()
	t.Log(str)
}

func TestProcessRequest(t *testing.T) {
	reqs := []string{}
	reqs = append(reqs, createRequest())
	reps := processRequest(reqs)
	t.Log(reps[0])
}

func BenchmarkProcessRequest(b *testing.B) {

	reqs := []string{}
	reqs = append(reqs, createRequest())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = processRequest(reqs)
	}
	b.StopTimer()

}

The test file contains functional tests for createRequest and processRequest (no assertions... not terribly rigorous), and finally a benchmark that measures the performance of processRequest.
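
For reference, a slightly stricter version of the test could assert on the output instead of just logging it. This is a hedged sketch, not part of the original test file, and it additionally needs "strings" in the import list:

func TestProcessRequestAssert(t *testing.T) {
	reps := processRequest([]string{createRequest()})
	if len(reps) != 1 {
		t.Fatalf("expected 1 response, got %d", len(reps))
	}
	// The transaction id must survive the request/response round trip.
	if !strings.Contains(reps[0], `"transaction_id":"demo_transaction"`) {
		t.Errorf("unexpected response: %s", reps[0])
	}
}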

Run the tests

$ go test -v -bench=.
=== RUN   TestCreateRequest
--- PASS: TestCreateRequest (0.00s)
    opt_test.go:7: {"transaction_id":"demo_transaction","payload":[0,1,2,3,4,5,6,7,8,....]}
=== RUN   TestProcessRequest
--- PASS: TestProcessRequest (0.00s)
    opt_test.go:14: {"transaction_id":"demo_transaction","exp":"0,1,2,3,4,5,6,7,8..."}
goos: windows
goarch: amd64
pkg: opt
BenchmarkProcessRequest-4          37521             29914 ns/op
PASS
ok      opt     1.769s

The output shows that each iteration of BenchmarkProcessRequest took 29914 ns (ns/op); we will use this value as the baseline for optimization.

Finding the bottleneck

First, generate a CPU profile while the benchmark runs, then use the profile to locate the bottleneck in the call path:

$ go test -v -bench=. -cpuprofile=cpu.prof
=== RUN   TestCreateRequest
--- PASS: TestCreateRequest (0.00s)
    opt_test.go:7: {"transaction_id":"demo_transaction","payload":[0,1,2,3,4,5,6,....]}
=== RUN   TestProcessRequest
--- PASS: TestProcessRequest (0.00s)
    opt_test.go:14: {"transaction_id":"demo_transaction","exp":"0,1,2,3,4,5,6,7......."}
goos: windows
goarch: amd64
pkg: opt
BenchmarkProcessRequest-4          38134             31257 ns/op
PASS
ok      opt     3.308s

Running the tests with the -cpuprofile flag writes a cpu.prof file to the current directory:

wanhex@DESKTOP-DP10B8O MINGW64 /d/project_root/gogogo/hello_test/src/opt
$ ls
cpu.prof  opt.go  opt.test.exe*  opt_test.go  stuct.go

Use pprof to analyze the bottleneck:

$ go tool pprof cpu.prof
Type: cpu
Time: Feb 24, 2020 at 12:17pm (CST)
Duration: 2.73s, Total samples = 3.09s (113.37%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof) top -cum
Showing nodes accounting for 0.23s, 7.44% of 3.09s total
Dropped 102 nodes (cum <= 0.02s)
Showing top 10 nodes out of 122
      flat  flat%   sum%        cum   cum%
         0     0%     0%      2.49s 80.58%  opt.BenchmarkProcessRequest
     0.01s  0.32%  0.32%      2.49s 80.58%  opt.processRequest
         0     0%  0.32%      2.49s 80.58%  testing.(*B).launch
         0     0%  0.32%      2.49s 80.58%  testing.(*B).runN
         0     0%  0.32%      2.22s 71.84%  encoding/json.Unmarshal
     0.01s  0.32%  0.65%      2.01s 65.05%  encoding/json.(*decodeState).unmarshal
     0.04s  1.29%  1.94%         2s 64.72%  encoding/json.(*decodeState).value
         0     0%  1.94%      1.99s 64.40%  encoding/json.(*decodeState).object
     0.06s  1.94%  3.88%      1.94s 62.78%  encoding/json.(*decodeState).array
     0.11s  3.56%  7.44%      1.28s 41.42%  encoding/json.(*decodeState).literalStore
(pprof) list opt.processRequest
Total: 3.09s
ROUTINE ======================== opt.processRequest in D:\project_root\gogogo\hello_test\src\opt\opt.go
      10ms      2.49s (flat, cum) 80.58% of Total
         .          .     22:func processRequest(reqs []string) []string {
         .          .     23:   reps := []string{}
         .          .     24:   for _, req := range reqs {
         .          .     25:           reqObj := &Request{}
         .          .     26:
         .      2.23s     27:           json.Unmarshal([]byte(req), reqObj)
         .          .     28:
         .          .     29:           var buf strings.Builder
      10ms       10ms     30:           for _, e := range reqObj.PayLoad {
         .       90ms     31:                   buf.WriteString(strconv.Itoa(e))
         .       50ms     32:                   buf.WriteString(",")
         .          .     33:           }
         .       20ms     34:           repObj := &Response{reqObj.TransactionID, buf.String()}
         .          .     35:
         .       70ms     36:           repJson, err := json.Marshal(&repObj)
         .          .     37:           if err != nil {
         .          .     38:                   panic(err)
         .          .     39:           }
         .       20ms     40:           reps = append(reps, string(repJson))
         .          .     41:   }
         .          .     42:   return reps
         .          .     43:}
(pprof)
(pprof) exit

The top command shows that opt.processRequest is the main hot spot.

The list command breaks opt.processRequest down line by line and shows that json.Unmarshal is the real bottleneck of this call (2.23s of the 2.49s cumulative time).

In Go, the built-in json package uses reflection to serialize and deserialize structs, and reflection is expensive: field names, types, and tags all have to be discovered at runtime.

So we will replace the built-in json package with easyjson, which generates the marshaling code ahead of time instead of relying on reflection.
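
To make the reflection cost concrete, here is a minimal sketch (not encoding/json's actual implementation; demoReflection is hypothetical) of the kind of runtime field walk a reflection-based marshaler performs on every call:

package profiling

import (
	"fmt"
	"reflect"
)

// demoReflection shows that field names, types, and json tags are only
// discovered at runtime; generated code does not pay this cost per call.
func demoReflection() {
	t := reflect.TypeOf(Request{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		fmt.Println(f.Name, f.Type, f.Tag.Get("json"))
	}
	// Output:
	// TransactionID string transaction_id
	// PayLoad []int payload
}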

Optimizing with easyjson

Install easyjson:

$ go get -u github.com/mailru/easyjson/...

Generate the serialization and deserialization functions for the Request and Response structs defined in structs.go:

$ easyjson.exe  -all structs.go

On success, a structs_easyjson.go file is generated in the current directory, containing the corresponding marshaling and unmarshaling functions.
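
The generated methods attach directly to the structs (MarshalJSON, UnmarshalJSON, plus easyjson-specific variants), so Request and Response now satisfy json.Marshaler and json.Unmarshaler. A minimal round-trip sketch, assuming the generated file is in place (roundTripDemo is a hypothetical helper, not part of the project):

package profiling

import "fmt"

// roundTripDemo exercises the easyjson-generated methods directly.
func roundTripDemo() {
	var req Request
	if err := req.UnmarshalJSON([]byte(`{"transaction_id":"t1","payload":[1,2,3]}`)); err != nil {
		panic(err)
	}
	rep := Response{TransactionID: req.TransactionID, Expression: "1,2,3,"}
	out, err := rep.MarshalJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"transaction_id":"t1","exp":"1,2,3,"}
}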

Modify the optimization.go code above (note the commented-out lines):

package profiling

import (
	"encoding/json"
	"strconv"
	"strings"
)

func createRequest() string {
	payload := make([]int, 100, 100)
	for i := 0; i < 100; i++ {
		payload[i] = i
	}
	req := Request{"demo_transaction", payload}
	v, err := json.Marshal(&req)
	if err != nil {
		panic(err)
	}
	return string(v)
}

func processRequest(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}

		// after: use the easyjson-generated method
		reqObj.UnmarshalJSON([]byte(req))
		// before:
		//json.Unmarshal([]byte(req), reqObj)

		var buf strings.Builder
		for _, e := range reqObj.PayLoad {
			buf.WriteString(strconv.Itoa(e))
			buf.WriteString(",")
		}
		repObj := &Response{reqObj.TransactionID, buf.String()}

		// after: use the easyjson-generated method
		repJson, err := repObj.MarshalJSON()
		// before:
		//repJson, err := json.Marshal(&repObj)
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}

After the change, run the tests again to check the performance:

$ go test -v -bench=.
=== RUN   TestCreateRequest
--- PASS: TestCreateRequest (0.00s)
    opt_test.go:7: {"transaction_id":"demo_transaction","payload":[0,1,2,3,4,5,6,7,8,9...]}
=== RUN   TestProcessRequest
--- PASS: TestProcessRequest (0.00s)
    opt_test.go:14: {"transaction_id":"demo_transaction","exp":"0,1,2,3,4,5,6..."}
goos: windows
goarch: amd64
pkg: opt
BenchmarkProcessRequest-4         126828              9673 ns/op
PASS
ok      opt     3.549s

With this optimization, BenchmarkProcessRequest dropped from 29914 ns/op to 9673 ns/op, roughly a 3x improvement.

Of course, you can also combine the -memprofile flag with pprof to analyze and optimize memory usage; memprofile works much the same way as cpuprofile.
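
For example (the flags below are standard go test / pprof options):

$ go test -v -bench=. -memprofile=mem.prof
$ go tool pprof mem.prof
(pprof) top -cum

Adding -benchmem to the go test command also reports allocations per operation directly in the benchmark output.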

Note:

This post is a set of study notes from the Geek Time (极客时间) video course 《Go语言从入门到实战》.

Course link:

Code on GitHub: