Golang 性能测试 (3) 跟踪刨析 golang trace

简介

对于绝大部分服务，跟踪刨析是用不到的。但是如果遇到了下面问题，可以不妨一试：

怀疑哪个协程慢了
系统调用有问题
协程调度问题 (chan 交互、互斥锁、信号量等)
怀疑是 gc (garbage-collect) 影响了服务性能
网络阻塞
等等

坦白的讲，通过跟踪刨析可以看到每个协程在某一时刻在干什么。

做跟踪刨析，首先需要获取trace 数据。可以通过代码中插入trace，或者上节提到的通过pprof 下载即可。

example

code

下面通过代码直接插入的方式来获取trace. 内容会涉及到网络请求，涉及协程异步执行等。

package main

import (
	"io/ioutil"
	"math/rand"
	"net/http"
	"os"
	"runtime/trace"
	"strconv"
	"sync"
	"time"
)


var wg sync.waitgroup
var httpclient = &http.client{timeout: 30 * time.second}

func sleepsometime() time.duration{
	return time.microsecond * time.duration(rand.int()%1000)
}

func create(readchan chan int) {
	defer wg.done()
	for i := 0; i < 500; i++ {
		readchan <- getbodysize()
		sleepsometime()
	}
	close(readchan)
}

func convert(readchan chan int, output chan string) {
	defer wg.done()
	for readchan := range readchan {
		output <- strconv.itoa(readchan)
		sleepsometime()
	}
	close(output)
}

func outputstr(output chan string) {
	defer wg.done()
	for _ = range output {
		// do nothing
		sleepsometime()
	}
}

// 获取taobao 页面大小
func getbodysize() int {
	resp, _ := httpclient.get("https://taobao.com")
	res, _ := ioutil.readall(resp.body)
	_ = resp.body.close()
	return len(res)
}

func run() {
	readchan, output := make(chan int), make(chan string)
	wg.add(3)
	go create(readchan)
	go convert(readchan, output)
	go outputstr(output)
}

func main() {
	f, _ := os.create("trace.out")
	defer f.close()
	_ = trace.start(f)
	defer trace.stop()
	run()
	wg.wait()
}

编译，并执行，然后启动trace;

[~/blog]$ go build trace_example.go 
[~/blog]$ ./trace_example
[~/blog]$ go tool trace -http=":8000" trace_example trace.out 
2020/04/15 17:34:48 parsing trace...
2020/04/15 17:34:50 splitting trace...
2020/04/15 17:34:51 opening browser. trace viewer is listening on http://0.0.0.0:8000

然后打开浏览器，访问8000 端口即可。

trace 功能

其中:
view trace：查看跟踪 (按照时间分段，上面我的例子时间比较短，所以没有分段)
goroutine analysis：goroutine 分析
network blocking profile：网络阻塞概况
synchronization blocking profile：同步阻塞概况
syscall blocking profile：系统调用阻塞概况
scheduler latency profile：调度延迟概况
user defined tasks：用户自定义任务
user defined regions：用户自定义区域
minimum mutator utilization：最低 mutator 利用率（主要是gc 的评价标准, 暂时没搞懂）

goroutine 调度分析

下图包含了两种事件：

网络相关 main.create 触发网络写的协程，网络写操作的协程 writeloop，然后等待网络返回。
gc 相关操作

下面是web请求到数据，从epoll 中触发，然后readloop协程响应,直接触发main.create 的协程得到执行。

当然我们也可以筛选协程做具体分析，从 goroutine analysis 进入，选择具体的协程进行分析：

我们选择对 main.create 的协程做分析（这个协程略复杂，可以分析的东西比较多）

可以从图中看出，network 唤醒 readloop 协程，进而readloop 又通知了main.create 协程。

当然，我们也可以选择 main.convert 协程。可以看出协程被main.create 唤醒了（由于给chan 提供了数据）

除了可以分析goroutine 调度之外，还可以做网络阻塞分析，异步阻塞分析，系统调度阻塞分析，协程调度阻塞分析（下图）

自定义 task 和 region

当然，还可以指定task 和 region 做分析，下面是官方举的例子:

//filepath:  src/runtime/trace/trace.go
ctx, task := trace.newtask(ctx, "makecappuccino")
trace.log(ctx, "orderid", orderid)

milk := make(chan bool)
espresso := make(chan bool)

go func() {
        trace.withregion(ctx, "steammilk", steammilk)
        milk <- true
}()
go func() {
        trace.withregion(ctx, "extractcoffee", extractcoffee)
        espresso <- true
}()
go func() {
        defer task.end() // when assemble is done, the order is complete.
        <-espresso
        <-milk
        trace.withregion(ctx, "mixmilkcoffee", mixmilkcoffee)
}()

mmu 图

除此之外，还提供了minimum mutator utilization 图 (mmu 图 )

mmu 图，数轴是服务可以占用cpu的百分比 (其他时间为gc操作)

从图中可以看出，在2ms之后，可利用的cpu逐步上升，直到接近100%.所以gc 毫无压力。

重点提醒

必须用chrome，并且高版本不行。我使用的是76.
trace 的文件都比较大，几分钟可能上百兆，所以网络一定要好，或者使用本机做验证。
造作是 w 放大， s 缩小， a 左移， d 右移
gc 的mmu 图解释（备注下，还没有来得及看）