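Libraries compared: gonum.org/v1/gonum/floats and github.com/cpmech/gosl (its la/mkl package is used to call MKL).

To use gosl's MKL package, MKL must be installed first; download link:

In BLAS, level 1 covers vector-vector operations. The test here benchmarks one of them, the dot product.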
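For reference, the BLAS level-1 `ddot` routine computes the sum of `x[i*incx] * y[i*incy]` for `i` from 0 to `n-1`, and the `incx`/`incy` arguments passed to `mkl.Ddot` in the benchmark are these strides (assuming gosl's wrapper follows the standard BLAS convention). A minimal pure-Go sketch; the name `refDdot` is hypothetical, and it handles only positive increments:

```go
package main

import "fmt"

// refDdot is a hypothetical reference version of BLAS level-1 ddot:
// it returns the sum over i < n of x[i*incx] * y[i*incy].
// Simplification: only positive increments are supported (reference BLAS
// also accepts negative increments, which walk the vectors backwards).
func refDdot(n int, x []float64, incx int, y []float64, incy int) float64 {
	if n <= 0 {
		return 0
	}
	if incx <= 0 || incy <= 0 {
		panic("refDdot: this sketch only supports positive increments")
	}
	sum := 0.0
	for i := 0; i < n; i++ {
		sum += x[i*incx] * y[i*incy]
	}
	return sum
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{5, 6, 7, 8}
	fmt.Println(refDdot(4, x, 1, y, 1)) // 1*5 + 2*6 + 3*7 + 4*8 = 70
	fmt.Println(refDdot(2, x, 2, y, 2)) // x[0]*y[0] + x[2]*y[2] = 5 + 21 = 26
}
```

With `incx = incy = 1`, as in the benchmark, this is the ordinary dot product over contiguous slices.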
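```go
package main

import (
	"fmt"
	"time"

	"github.com/cpmech/gosl/la/mkl"
	"gonum.org/v1/gonum/floats"
)

func main() {
	// Vector lengths to test: 1e5, 1e6, 1e7, 1e8 and 4e8 elements.
	sizes := []int{1e5, 1e6, 1e7, 1e8, 4e8}
	for _, n := range sizes {
		// Fill x with ones, so the expected dot product x·x equals n.
		x := make([]float64, n)
		for i := range x {
			x[i] = 1
		}
		// Baseline: gonum.
		rawDot(x, x)
		// The same dot product through MKL with 1 to 4 threads.
		for threads := 1; threads <= 4; threads++ {
			mklDot(threads, x, x)
		}
	}
}

// mklDot computes x·y with MKL's ddot using threadNum threads and prints the elapsed time.
func mklDot(threadNum int, x, y []float64) {
	n, incx, incy := len(x), 1, 1
	start := time.Now()
	mkl.SetNumThreads(threadNum)
	res := mkl.Ddot(n, x, incx, y, incy)
	fmt.Printf("mkl threadNum:%v len:%v duration:%v res:%v\n", threadNum, len(x), time.Since(start), res)
}

// rawDot computes x·y with gonum's floats.Dot and prints the elapsed time.
func rawDot(x, y []float64) {
	start := time.Now()
	res := floats.Dot(x, y)
	fmt.Printf("raw len:%v duration:%v res:%v\n", len(x), time.Since(start), res)
}
```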
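The code is simple: `raw` is gonum's dot product, used as the baseline for MKL. MKL's threading follows a fork-join model, so it can run on multiple threads; the benchmark therefore also runs the same MKL routine with 1 to 4 threads to compare them.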
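To illustrate what fork-join means for a dot product, here is a minimal goroutine-based sketch. The function name `parallelDot` is hypothetical and this is not how MKL works internally: the input is split into contiguous chunks (fork), each worker computes a partial sum, and the partial sums are added once all workers have finished (join).

```go
package main

import (
	"fmt"
	"sync"
)

// parallelDot is a hypothetical fork-join dot product: it forks `workers`
// goroutines over contiguous chunks, then joins them and adds the partial sums.
func parallelDot(x, y []float64, workers int) float64 {
	n := len(x)
	if len(y) != n {
		panic("parallelDot: length mismatch")
	}
	if n == 0 {
		return 0
	}
	if workers < 1 {
		workers = 1
	}
	if workers > n {
		workers = n
	}
	chunk := (n + workers - 1) / workers // ceiling division
	partial := make([]float64, workers)  // one slot per worker; no slot is shared
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if hi > n {
			hi = n
		}
		if lo >= hi {
			continue // trailing workers can get an empty range, e.g. n=5, workers=4
		}
		wg.Add(1)
		go func(w, lo, hi int) { // fork
			defer wg.Done()
			s := 0.0
			for i := lo; i < hi; i++ {
				s += x[i] * y[i]
			}
			partial[w] = s
		}(w, lo, hi)
	}
	wg.Wait() // join: wait for every worker before reducing
	total := 0.0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	x := make([]float64, 1000)
	for i := range x {
		x[i] = 1
	}
	for t := 1; t <= 4; t++ {
		fmt.Printf("workers:%d res:%v\n", t, parallelDot(x, x, t))
	}
}
```

Because the partial sums are added in a different order for different worker counts, results on general inputs can differ in the last bits. With all-ones vectors like the benchmark's, every partial sum is an exact integer, so the result does not change.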
Output:

```
raw len:100000 duration:34.479µs res:100000
mkl threadNum:1 len:100000 duration:663.039µs res:100000
mkl threadNum:2 len:100000 duration:105.323µs res:100000
mkl threadNum:3 len:100000 duration:40.387µs res:100000
mkl threadNum:4 len:100000 duration:45.641µs res:100000
raw len:1000000 duration:538.054µs res:1e+06
mkl threadNum:1 len:1000000 duration:513.524µs res:1e+06
mkl threadNum:2 len:1000000 duration:319.566µs res:1e+06
mkl threadNum:3 len:1000000 duration:253.524µs res:1e+06
mkl threadNum:4 len:1000000 duration:185.843µs res:1e+06
raw len:10000000 duration:5.57392ms res:1e+07
mkl threadNum:1 len:10000000 duration:3.649214ms res:1e+07
mkl threadNum:2 len:10000000 duration:2.811791ms res:1e+07
mkl threadNum:3 len:10000000 duration:2.545393ms res:1e+07
mkl threadNum:4 len:10000000 duration:2.385943ms res:1e+07
raw len:100000000 duration:59.743135ms res:1e+08
mkl threadNum:1 len:100000000 duration:44.084916ms res:1e+08
mkl threadNum:2 len:100000000 duration:33.835634ms res:1e+08
mkl threadNum:3 len:100000000 duration:32.16158ms res:1e+08
mkl threadNum:4 len:100000000 duration:30.032124ms res:1e+08
raw len:400000000 duration:1.610460541s res:4e+08
mkl threadNum:1 len:400000000 duration:194.624908ms res:4e+08
mkl threadNum:2 len:400000000 duration:141.571815ms res:4e+08
mkl threadNum:3 len:400000000 duration:126.421845ms res:4e+08
mkl threadNum:4 len:400000000 duration:118.767545ms res:4e+08
```
The raw log is hard to read; as a table it is much clearer (columns are vector lengths, times truncated):

Implementation | 1e5 | 1e6 | 1e7 | 1e8 | 4e8 |
---|---|---|---|---|---|
gonum (raw) | 34 µs | 538 µs | 5.5 ms | 59.7 ms | 1610 ms |
MKL, 1 thread | 663 µs | 513 µs | 3.6 ms | 44 ms | 194 ms |
MKL, 2 threads | 105 µs | 319 µs | 2.8 ms | 33.8 ms | 141 ms |
MKL, 3 threads | 40 µs | 253 µs | 2.5 ms | 32.1 ms | 126 ms |
MKL, 4 threads | 45 µs | 185 µs | 2.3 ms | 30 ms | 118 ms |
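Observations:

Single-threaded:
- At around 1e5 elements, gonum is faster (34 µs vs 663 µs). This MKL number comes from the very first MKL call in the program and is even slower than the 1-thread run at 1e6 (513 µs), so it may include one-time initialization cost.
- At 4e8 elements, MKL is about 8x faster (1610 ms vs 194 ms), and its time still grows roughly linearly with length (44 ms at 1e8 to 194 ms at 4e8, about 4.4x for 4x the data). At 1e8 the gap is smaller, about 1.35x (59.7 ms vs 44 ms).
- gonum's time grows non-linearly beyond 1e8: 4x the data takes about 27x as long (59.7 ms to 1610 ms). One possible cause is memory pressure, since a 4e8-element `[]float64` occupies 3.2 GB.

Multithreaded:
- MKL does not scale linearly with the number of threads: going from 1 to 4 threads speeds up the large cases by only about 1.5 to 1.6x. A dot product does one multiply-add per pair of values loaded from memory, so it is likely limited by memory bandwidth rather than by compute.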
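The ratios behind these observations can be computed directly from the measured timings. A small sketch; the timings are copied from the benchmark output (in milliseconds, not re-measured), and the `speedup` helper is hypothetical:

```go
package main

import "fmt"

// Timings in milliseconds, copied from the benchmark output.
var (
	sizes = []string{"1e5", "1e6", "1e7", "1e8", "4e8"}
	gonum = []float64{0.034479, 0.538054, 5.57392, 59.743135, 1610.460541}
	mkl1  = []float64{0.663039, 0.513524, 3.649214, 44.084916, 194.624908}
	mkl4  = []float64{0.045641, 0.185843, 2.385943, 30.032124, 118.767545}
)

// speedup returns how many times faster the `fast` timing is than `slow`.
func speedup(slow, fast float64) float64 { return slow / fast }

func main() {
	fmt.Println("size   gonum/mkl(1t)   mkl(1t)/mkl(4t)")
	for i, s := range sizes {
		fmt.Printf("%-5s  %12.2fx  %14.2fx\n", s, speedup(gonum[i], mkl1[i]), speedup(mkl1[i], mkl4[i]))
	}
}
```

At 4e8 this gives roughly 8.3x for gonum vs single-thread MKL and about 1.6x for 1 vs 4 MKL threads. The 1e5 column is likely distorted by the slow first MKL call, so it says little about thread scaling.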