Background
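This post simulates a data-processing program that deliberately uses Go's built-in encoding/json package, which is relatively slow, to serialize and deserialize data.
It then uses a benchmark and a CPU profile to locate the performance bottleneck in encoding/json.
Finally, it optimizes the program by switching to easyjson from GitHub, a faster JSON library.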
Show me the code
First, define the data structures (structs.go):
package profiling

type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"payload"`
}

type Response struct {
	TransactionID string `json:"transaction_id"`
	Expression    string `json:"exp"`
}
Both Request and Response use struct tags (similar to annotations in Java) to tell Go's built-in encoding/json package which JSON key each field maps to.
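A quick way to see what the tags do: this minimal sketch redeclares the two structs in a standalone program and marshals them with encoding/json. The values ("t1" and a three-element payload) are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Same shapes as in structs.go; the json tags choose the JSON keys.
type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"payload"`
}

type Response struct {
	TransactionID string `json:"transaction_id"`
	Expression    string `json:"exp"`
}

// toJSON marshals v and panics on error, which is acceptable for a demo.
func toJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(toJSON(Request{"t1", []int{1, 2, 3}}))
	// {"transaction_id":"t1","payload":[1,2,3]}
	fmt.Println(toJSON(Response{"t1", "1,2,3,"}))
	// {"transaction_id":"t1","exp":"1,2,3,"}
}
```

Without the tags, encoding/json would use the Go field names (TransactionID, PayLoad, Expression) as the JSON keys.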
Define the processing functions (optimization.go):
package profiling

import (
	"encoding/json"
	"strconv"
	"strings"
)

// createRequest builds a Request whose payload is 0..99 and returns it as JSON.
func createRequest() string {
	payload := make([]int, 100)
	for i := 0; i < 100; i++ {
		payload[i] = i
	}
	req := Request{"demo_transaction", payload}
	v, err := json.Marshal(&req)
	if err != nil {
		panic(err)
	}
	return string(v)
}

// processRequest decodes each JSON request, joins its payload with commas
// and returns the encoded JSON Responses.
func processRequest(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}
		if err := json.Unmarshal([]byte(req), reqObj); err != nil {
			panic(err)
		}
		var buf strings.Builder
		for _, e := range reqObj.PayLoad {
			buf.WriteString(strconv.Itoa(e))
			buf.WriteString(",")
		}
		repObj := &Response{reqObj.TransactionID, buf.String()}
		// repObj is already a pointer, so pass it as-is.
		repJson, err := json.Marshal(repObj)
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}
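createRequest builds a Request whose payload holds the integers 0 to 99 and serializes it to a JSON string with json.Marshal.
processRequest takes a slice of such JSON strings. For each string, it deserializes the string into a Request with json.Unmarshal and joins the payload elements into a comma-separated string (every element, including the last, is followed by a comma). It then serializes the result as a Response with json.Marshal.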
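To make the data flow concrete, here is a minimal standalone sketch of what processRequest does to a single request. It uses a made-up three-element payload instead of 100 elements. processOne is a hypothetical helper that mirrors the loop body but returns errors instead of panicking.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"payload"`
}

type Response struct {
	TransactionID string `json:"transaction_id"`
	Expression    string `json:"exp"`
}

// processOne (hypothetical) mirrors the body of processRequest's loop:
// decode the request, join the payload with commas, encode the response.
func processOne(req string) (string, error) {
	var r Request
	if err := json.Unmarshal([]byte(req), &r); err != nil {
		return "", err
	}
	var buf strings.Builder
	for _, e := range r.PayLoad {
		buf.WriteString(strconv.Itoa(e))
		buf.WriteString(",") // written after every element, including the last
	}
	out, err := json.Marshal(Response{r.TransactionID, buf.String()})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := processOne(`{"transaction_id":"t1","payload":[1,2,3]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"transaction_id":"t1","exp":"1,2,3,"}
}
```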
Test code
optimization_test.go
package profiling

import "testing"

func TestCreateRequest(t *testing.T) {
	str := createRequest()
	t.Log(str)
}

func TestProcessRequest(t *testing.T) {
	reqs := []string{}
	reqs = append(reqs, createRequest())
	reps := processRequest(reqs)
	t.Log(reps[0])
}

func BenchmarkProcessRequest(b *testing.B) {
	reqs := []string{}
	reqs = append(reqs, createRequest())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = processRequest(reqs)
	}
	b.StopTimer()
}
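The test file checks createRequest and processRequest functionally. The tests only log their output and make no assertions, so they are not very rigorous. BenchmarkProcessRequest then benchmarks processRequest, and b.ResetTimer() keeps the request setup out of the measured time.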
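Run the tests. The output below was captured in a package named opt (files opt.go, opt_test.go and stuct.go), so the file and package names differ from the listings above: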
$ go test -v -bench=.
=== RUN TestCreateRequest
--- PASS: TestCreateRequest (0.00s)
opt_test.go:7: {"transaction_id":"demo_transaction","payload":[0,1,2,3,4,5,6,7,8,....]}
=== RUN TestProcessRequest
--- PASS: TestProcessRequest (0.00s)
opt_test.go:14: {"transaction_id":"demo_transaction","exp":"0,1,2,3,4,5,6,7,8..."}
goos: windows
goarch: amd64
pkg: opt
BenchmarkProcessRequest-4 37521 29914 ns/op
PASS
ok opt 1.769s
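The benchmark ran processRequest 37521 times at an average of 29914 ns (nanoseconds) per call. The -4 suffix on the benchmark name is the GOMAXPROCS value. We will use 29914 ns/op as the baseline for the optimization.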
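ns/op is the measured time divided by the number of iterations, b.N. The program below sketches that relationship by running a benchmark without go test. It calls testing.Benchmark, which returns a BenchmarkResult with N, T and NsPerOp(). The benchmarked function only decodes a small made-up request with json.Unmarshal, so its numbers are not comparable to the ones above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"payload"`
}

// sample is a small, made-up request used only for this demo.
var sample = []byte(`{"transaction_id":"t1","payload":[1,2,3]}`)

// benchDecode has the usual benchmark shape: the loop body runs b.N times.
func benchDecode(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var r Request
		if err := json.Unmarshal(sample, &r); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark raises b.N until the run lasts about the default
	// benchtime (1s), as go test -bench does.
	res := testing.Benchmark(benchDecode)
	fmt.Println("N =", res.N)
	fmt.Println("T =", res.T)
	fmt.Println("ns/op =", res.NsPerOp())
	// The same figure computed by hand: total nanoseconds / iterations.
	fmt.Println("T/N =", res.T.Nanoseconds()/int64(res.N))
}
```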
Finding the bottleneck
First, generate a CPU profile while the benchmark runs, then use it to find where processRequest spends its time:
$ go test -v -bench=. -cpuprofile=cpu.prof
=== RUN TestCreateRequest
--- PASS: TestCreateRequest (0.00s)
opt_test.go:7: {"transaction_id":"demo_transaction","payload":[0,1,2,3,4,5,6,....]}
=== RUN TestProcessRequest
--- PASS: TestProcessRequest (0.00s)
opt_test.go:14: {"transaction_id":"demo_transaction","exp":"0,1,2,3,4,5,6,7......."}
goos: windows
goarch: amd64
pkg: opt
BenchmarkProcessRequest-4 38134 31257 ns/op
PASS
ok opt 3.308s
Running with -cpuprofile writes a cpu.prof file to the current directory. The compiled test binary (opt.test.exe here) is kept as well:
wanhex@DESKTOP-DP10B8O MINGW64 /d/project_root/gogogo/hello_test/src/opt
$ ls
cpu.prof opt.go opt.test.exe* opt_test.go stuct.go
Analyze the profile with pprof:
$ go tool pprof cpu.prof
Type: cpu
Time: Feb 24, 2020 at 12:17pm (CST)
Duration: 2.73s, Total samples = 3.09s (113.37%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof) top -cum
Showing nodes accounting for 0.23s, 7.44% of 3.09s total
Dropped 102 nodes (cum <= 0.02s)
Showing top 10 nodes out of 122
flat flat% sum% cum cum%
0 0% 0% 2.49s 80.58% opt.BenchmarkProcessRequest
0.01s 0.32% 0.32% 2.49s 80.58% opt.processRequest
0 0% 0.32% 2.49s 80.58% testing.(*B).launch
0 0% 0.32% 2.49s 80.58% testing.(*B).runN
0 0% 0.32% 2.22s 71.84% encoding/json.Unmarshal
0.01s 0.32% 0.65% 2.01s 65.05% encoding/json.(*decodeState).unmarshal
0.04s 1.29% 1.94% 2s 64.72% encoding/json.(*decodeState).value
0 0% 1.94% 1.99s 64.40% encoding/json.(*decodeState).object
0.06s 1.94% 3.88% 1.94s 62.78% encoding/json.(*decodeState).array
0.11s 3.56% 7.44% 1.28s 41.42% encoding/json.(*decodeState).literalStore
(pprof) list opt.processRequest
Total: 3.09s
ROUTINE ======================== opt.processRequest in D:\project_root\gogogo\hello_test\src\opt\opt.go
10ms 2.49s (flat, cum) 80.58% of Total
. . 22:func processRequest(reqs []string) []string {
. . 23: reps := []string{}
. . 24: for _, req := range reqs {
. . 25: reqObj := &Request{}
. . 26:
. 2.23s 27: json.Unmarshal([]byte(req), reqObj)
. . 28:
. . 29: var buf strings.Builder
10ms 10ms 30: for _, e := range reqObj.PayLoad {
. 90ms 31: buf.WriteString(strconv.Itoa(e))
. 50ms 32: buf.WriteString(",")
. . 33: }
. 20ms 34: repObj := &Response{reqObj.TransactionID, buf.String()}
. . 35:
. 70ms 36: repJson, err := json.Marshal(&repObj)
. . 37: if err != nil {
. . 38: panic(err)
. . 39: }
. 20ms 40: reps = append(reps, string(repJson))
. . 41: }
. . 42: return reps
. . 43:}
(pprof)
(pprof) exit
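top -cum sorts functions by cumulative time. opt.processRequest accounts for 80.58% of the samples, and encoding/json.Unmarshal alone accounts for 71.84%, so JSON decoding dominates.
list opt.processRequest breaks the function down line by line. Of its 2.49s, the json.Unmarshal call on line 27 takes 2.23s, which confirms it as the bottleneck.
Go's encoding/json serializes and deserializes structs through reflection, inspecting field types and tags at run time, and reflection is relatively expensive.
We will therefore replace encoding/json with easyjson, which generates type-specific marshaling code ahead of time instead of relying on reflection.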
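To illustrate what reflection-based (de)serialization involves, this sketch uses the reflect package at run time to read the json tags and assign a field. That is the kind of work encoding/json has to do for the struct types it handles. It is a simplified illustration, not the real encoding/json code, which caches field information per type and handles many more cases (for example the "-" tag). jsonKeys and setByKey are hypothetical helpers.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"payload"`
}

// jsonKeys lists the JSON key of each field of the struct value v,
// read from the struct tags via reflection.
func jsonKeys(v interface{}) []string {
	t := reflect.TypeOf(v)
	keys := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := strings.Split(f.Tag.Get("json"), ",")[0] // drop options like ",omitempty"
		if name == "" {
			name = f.Name // no tag: encoding/json falls back to the field name
		}
		keys = append(keys, name)
	}
	return keys
}

// setByKey finds the string field whose json key is key and sets it to val,
// the way a reflection-based decoder must locate and assign fields.
func setByKey(ptr interface{}, key, val string) bool {
	v := reflect.ValueOf(ptr).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name := strings.Split(t.Field(i).Tag.Get("json"), ",")[0]
		if name == key && v.Field(i).Kind() == reflect.String {
			v.Field(i).SetString(val)
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(jsonKeys(Request{})) // [transaction_id payload]
	var r Request
	setByKey(&r, "transaction_id", "t1")
	fmt.Println(r.TransactionID) // t1
}
```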
Optimizing with easyjson
Install easyjson:
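$ go get -u github.com/mailru/easyjson/...
On Go 1.17 and later, installing commands with go get is deprecated, and Go 1.18 no longer builds them. Use these commands instead:
$ go get github.com/mailru/easyjson
$ go install github.com/mailru/easyjson/...@latest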
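Generate marshaling and unmarshaling methods for the Request and Response structs in structs.go. The -all flag tells easyjson to process every struct in the file. The command below was run on Windows, hence easyjson.exe: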
$ easyjson.exe -all structs.go
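On success, a structs_easyjson.go file appears in the current directory. It contains the generated methods for both structs, including MarshalJSON and UnmarshalJSON.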
Modify optimization.go as follows (note the comments marking the code before and after the change):
package profiling

import (
	"encoding/json"
	"strconv"
	"strings"
)

func createRequest() string {
	payload := make([]int, 100)
	for i := 0; i < 100; i++ {
		payload[i] = i
	}
	req := Request{"demo_transaction", payload}
	v, err := json.Marshal(&req)
	if err != nil {
		panic(err)
	}
	return string(v)
}

func processRequest(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}
		// After: method generated by easyjson in structs_easyjson.go
		if err := reqObj.UnmarshalJSON([]byte(req)); err != nil {
			panic(err)
		}
		// Before:
		// json.Unmarshal([]byte(req), reqObj)
		var buf strings.Builder
		for _, e := range reqObj.PayLoad {
			buf.WriteString(strconv.Itoa(e))
			buf.WriteString(",")
		}
		repObj := &Response{reqObj.TransactionID, buf.String()}
		// After: method generated by easyjson
		repJson, err := repObj.MarshalJSON()
		// Before:
		// repJson, err := json.Marshal(repObj)
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}
Run the tests again to see the effect of the optimization:
$ go test -v -bench=.
=== RUN TestCreateRequest
--- PASS: TestCreateRequest (0.00s)
opt_test.go:7: {"transaction_id":"demo_transaction","payload":[0,1,2,3,4,5,6,7,8,9...]}
=== RUN TestProcessRequest
--- PASS: TestProcessRequest (0.00s)
opt_test.go:14: {"transaction_id":"demo_transaction","exp":"0,1,2,3,4,5,6..."}
goos: windows
goarch: amd64
pkg: opt
BenchmarkProcessRequest-4 126828 9673 ns/op
PASS
ok opt 3.549s
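With this change, BenchmarkProcessRequest drops from 29914 ns/op to 9673 ns/op, about 3.1 times faster.
You can optimize memory usage the same way. The -memprofile flag works much like -cpuprofile, and its output is also analyzed with pprof. Adding -benchmem makes the benchmark report bytes and allocations per operation.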
Note:
This post is my study notes from the Geek Time (极客时间) video course 《Go语言从入门到实战》 (Go: From Beginner to Practice).
Course link:
Code on GitHub