golang Map
map描述了一种键与值的映射关系,开发者通常会通过键来查询其对应的值。map最常见的底层实现有两种:基于Hash散列和基于平衡树,两者的存取时间复杂度不同,Go语言的map属于前者范畴。
Hash算法有两大核心:设计Hash函数和解决Hash冲突。
Map使用
声明&初始化
声明方式var m map[string]stringnew关键字m := *new(map[string]string)make关键字m := make(map[string]string)直接创建m := map[string]string{"": ""}
由于map是引用类型,因此使用声明方式创建map时在使用之前必须初始化。以上四种方式中最常用的是后面两种使用方式,第二种基本不会使用。
//方式一
// userInfo := map[string]string{}
userInfo := map[string]string{"name":"武沛齐","age":"18"}
userInfo["name"] // 武沛齐
userInfo["age"] = "20"
userInfo["email"] = "wupeiqi@live.com"
// data := make(map[int]int, 10)
data := make(map[int]int)
data[100] = 998
data[200] = 999
//方式二
// 声明,nil
var row map[int]int
row = data
注意:键不重复 & 键必须可哈希(int/bool/float/string/array)
常用操作
长度和容量
// 根据参数值(10),计算出合适的容量,容量是无限的。
// 一个map 中会包含很多桶,每个桶中可以存放8个键值对。
info := make(map[string]string, 10)
info["n1"] = "武沛齐"
info["n2"] = "alex"
v1 := len(info) // 2
// v2 := cap(info) // 报错,无效参数
t.Log(v1)
// t.Log(v2)
增加
data := map[string]string{"n1":"武沛齐","n2":"alex"}
data["n3"] = "eric"
删除
data := map[string]string{"n1":"武沛齐","n2":"alex"}
delete(data,"n2")
修改
data := map[string]string{"n1":"武沛齐","n2":"alex"}
data["n1"] = "eric"
查看
data := map[string]string{"n1":"武沛齐","n2":"alex"}
for key,value := range data{
fmt.Println(key,value)
}
data := map[string]string{"n1": "武沛齐", "n2": "alex"}
k, ok := data["n1"]
fmt.Println("k", k, "ok", ok)
嵌套
v1 := make(map[string]int)
v2 := make(map[string]string)
v3 := make(map[string]...)
v4 := make(map[string][2]int)
v5 := make(map[string][]int)
v6 := make(map[string]map[int]int)
v7 := make(map[string][2]map[string]string)
v7["n1"] = [2]map[string]string{ map[string]string{"name":"武沛齐","age":"18"},map[string]string{"name":"alex","age":"78"}}
v7["n2"] = [2]map[string]string{ map[string]string{"name":"eric","age":"18"},map[string]string{"name":"seven","age":"78"}}
// 伪代码
v7 = {
n1:[
{"name":"武沛齐","age":"18"},
{"name":"alex","age":"78"}
],
n2:[
{"name":"eric","age":"18"},
{"name":"seven","age":"78"}
]
}
前提:键不重复 & 键必须可哈希
v8 := make(map[int]int)
v9 := make(map[string]int)
v10 := make(map[float32]int)
v11 := make(map[bool]int)
v12 := make(map[ [2]int ]int)
v13 := make(map[ []int ]int) // 错误,不可哈希
v14 := make(map[ map[int]int ]int) // 错误,不可哈希
v15 := make(map[ [2][]int ]int) // 报错
v16 := make(map[ [2]map[string]string ]int) // 报错
变量赋值
v1 := map[string]string{"n1":"武沛齐","n2":"alex"}
v2 := v1
v1["n1"] = "wupeiqi"
ftm.Println(v1) // {"n1":"wupeiqi","n2":"alex"}
ftm.Println(v2) // {"n1":"wupeiqi","n2":"alex"}
特别提醒:无论是否存在扩容都指向同一个地址。
源码赏析
本文涉及到的源码解析部分来源于go官方的1.16.5版本,其中英文部分的注释也属于源码,中文部分的注释则属于笔者添加。
Hash
map最常见的底层实现有两种:基于Hash散列和基于平衡树,两者的存取时间复杂度不同,Go语言的map属于前者范畴。
Go语言map的底层实现基于Hash散列。Hash散列是一种著名的广义上的算法,它能够将任意长度的数据映射到有限的值域上面。
Hash算法有两大核心:设计Hash函数和解决Hash冲突。
设计Hash函数
设计Hash函数的基本原则有:
- 尽可能让输入的数据映射到不同的值域上面。
- 函数计算过程需要保证高性能。
但实际工程中由于输入数据范围是无限的,而输出值域范围是有限的,因此必然存在不同的输入数据经过映射后得到相同的输出值,这种现象称为Hash冲突。
hashCode()
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
Java选取31作为Magic Number,31不仅是奇数更是质数。stackoverflow上也有关于该问题的讨论:
31 * i == (i << 5) - i
此外,常见的著名Hash算法还有:MD5、SHA1、SHA2等等。
解决Hash冲突
如何解决Hash冲突是Hash算法中的核心一环,最常见的做法是拉链法。所谓拉链法是指当Hash冲突产生时,将出现冲突的Bucket位用链表这一数据结构串联。
此外,在Java语言JDK 1.8中,HashMap的链表还引入了平衡树来优化局部链表过长的性能问题1。
时间复杂度
Hash散列之所以成为map底层实现的首选,核心原因就是:平均增删改查时间复杂度都是O(1)。相比之下,基于平衡树的map(例如:C++ STL的map)平均时间复杂度是O(logN)。
| Algorithm | Action | Average Time Complexity | Worst Time Complexity |
|---|---|---|---|
| Hash | Insert | O(1) | O(N) |
| Hash | Delete | O(1) | O(N) |
| Hash | Update | O(1) | O(N) |
| Hash | Get | O(1) | O(N) |
| Balanced Binary Tree | Insert | O(logN) | O(logN) |
| Balanced Binary Tree | Delete | O(logN) | O(logN) |
| Balanced Binary Tree | Update | O(logN) | O(logN) |
| Balanced Binary Tree | Get | O(logN) | O(logN) |
需要注意的是,基于平衡树的map在最坏情况下依然能够保证时间复杂度是O(logN),并且能够提供key的有序遍历,这是基于Hash的map所不具备的能力。
go哈希函数
alginit()src/runtime/alg.go
hash 函数,有加密型和非加密型。
加密型的一般用于加密数据、数字摘要等,典型代表就是 md5、sha1、sha256、aes256 这种;
非加密型的一般就是查找。在 map 的应用场景中,用的是查找。
选择 hash 函数主要考察的是两点:性能、碰撞概率。
map的数据结构
// map的类型
type maptype struct {
typ _type
key *_type // 键的类型
elem *_type // 值的类型
bucket *_type // hash桶的内部类型
keysize uint8 // key的尺寸
valuesize uint8 // value的尺寸
bucketsize uint16 // bucket的尺寸
flags uint32
}
在源码中,表示 map 的结构体是 hmap,它是 hashmap 的“缩写”:
type hmap struct {
count int // map中元素(k-v)个数, 调用len(map)实际上返回该值
flags uint8 // map的状态字段, 记录当前map的状态
B uint8 // map当前桶的个数为: 2^B
noverflow uint16 // map当前溢出桶的个数
hash0 uint32 // 在计算key得hash值时作为hash种子使用
buckets unsafe.Pointer // buckets是一块连续内存(理解为一个数组), 用于存放数据(k-v), 该字段指向内存首地址
oldbuckets unsafe.Pointer // map扩容后, 该字段指向扩容前的buckets内存首地址
nevacuate uintptr // map扩容时, 该字段表示已经扩容完毕的buckets数(迁移进度)
extra *mapextra // 可选数据, 并不是所有map都会用到, 当map的key和value都不是指针,Go为了避免GC扫描整个hmap,会将bmap的overflow字段移动到extra。
}
hmapbucketsbmap
type bmap struct {
tophash [bucketCnt]uint8 // 一个[]uint8, 存储桶中8个元素的高八位hash值
}
// bucketCnt是map.go中的常量, bucketCnt = 8
// tophash是key hash的高八位
bmapbmap
type bmap struct {
tophash [bucketCnt]uint8 // tophash 是 hash 值的高 8 位
keys [8]keytype // 存放key的数组, 数组长度为8
values [8]valuetype // 存放value的数组,数组长度为8
pad uintptr
overflow pointer // 指向溢出桶的指针, 溢出桶也是个bmap结构, 只是叫法不同
}
bmapoverflowtophash
map的内存模型
桶(bmap)
map[int64]int8key/value...key/key/.../value/value/...
extra
extra是可选的
type mapextra struct {
overflow *[]*bmap // 所有溢出桶组成的数组
oldoverflow *[]*bmap // 扩容时指向扩容前所有溢出桶组成的数组
nextOverflow *bmap // 指向空闲的溢出桶
}
为什么需要extra
overflow
这一点map源码中的已经告诉了我们答案:当map的key和val均不含指针并且可以内联(size < 128字节)时,bmap可以被标注为不包含指针,这样可以避免GC时扫描整个map(扫描所有的bmap),
overflow
key的定位
后5位确定桶的位置,前8位确定key的位置,key找value;key相同发生哈希冲突,overflow解决,链表链接一个桶。
key 经过哈希计算后得到哈希值,共 64 个 bit 位(64位机,32位机就不讨论了,现在主流都是64位机),计算它到底要落在哪个桶时,只会用到最后 B 个 bit 位。还记得前面提到过的 B 吗?如果 B = 5,那么桶的数量,也就是 buckets 数组的长度是 2^5 = 32。
例如,现在有一个 key 经过哈希函数计算后,得到的哈希结果是:
10010111 | 000011110110110010001111001010100010010110010101010 │ 01010
01010
再用哈希值的高 8 位,找到此 key 在 bucket 中的位置,这是在寻找已有的 key。最开始桶内还没有 key,新加入的 key 会找到第一个空位,放入。
buckets 编号就是桶编号,当两个不同的 key 落在同一个桶中,也就是发生了哈希冲突。冲突的解决手段是用链表法:在 bucket 中,从前往后找到第一个空位。这样,在查找某个 key 时,先找到对应的桶,再去遍历 bucket 中的 key。
0011010010111
如果在 bucket 中没找到,并且 overflow 不为空,还要继续去 overflow bucket 中寻找,直到找到或是所有的 key 槽位都找遍了,包括所有的 overflow bucket。
map的创建
基本用法
makemap
package main
import "fmt"
func main() {
m1 := make(map[int]int)
m2 := make(map[int]int, 10) // 创建map支持传递一个参数,表示初始大小
fmt.Println(m1, m2)
}
让我们通过go的官方工具来查看汇编结果:
$ go tool compile -S main.go
"".main STEXT size=207 args=0x0 locals=0x70 funcid=0x0
0x0000 00000 (main.go:5) TEXT "".main(SB), ABIInternal, $112-0
0x0000 00000 (main.go:5) MOVQ (TLS), CX
0x0009 00009 (main.go:5) CMPQ SP, 16(CX)
0x000d 00013 (main.go:5) PCDATA $0, $-2
0x000d 00013 (main.go:5) JLS 197
0x0013 00019 (main.go:5) PCDATA $0, $-1
0x0013 00019 (main.go:5) SUBQ $112, SP
0x0017 00023 (main.go:5) MOVQ BP, 104(SP)
0x001c 00028 (main.go:5) LEAQ 104(SP), BP
0x0021 00033 (main.go:5) FUNCDATA $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x0021 00033 (main.go:5) FUNCDATA $1, gclocals·d527b79a98f329c2ba624a68e7df03d6(SB)
0x0021 00033 (main.go:5) FUNCDATA $2, "".main.stkobj(SB)
0x0021 00033 (main.go:6) PCDATA $1, $0
0x0021 00033 (main.go:6) CALL runtime.makemap_small(SB)
0x0026 00038 (main.go:6) MOVQ (SP), AX
0x002a 00042 (main.go:6) MOVQ AX, ""..autotmp_25+64(SP)
0x002f 00047 (main.go:7) LEAQ type.map[int]int(SB), CX
0x0036 00054 (main.go:7) MOVQ CX, (SP)
0x003a 00058 (main.go:7) MOVQ $10, 8(SP)
0x0043 00067 (main.go:7) MOVQ $0, 16(SP)
0x004c 00076 (main.go:7) PCDATA $1, $1
0x004c 00076 (main.go:7) CALL runtime.makemap(SB)
0x0051 00081 (main.go:7) MOVQ 24(SP), AX
0x0056 00086 (main.go:9) XORPS X0, X0
0x0059 00089 (main.go:9) MOVUPS X0, ""..autotmp_17+72(SP)
0x005e 00094 (main.go:9) MOVUPS X0, ""..autotmp_17+88(SP)
0x0063 00099 (main.go:9) LEAQ type.map[int]int(SB), CX
0x006a 00106 (main.go:9) MOVQ CX, ""..autotmp_17+72(SP)
0x006f 00111 (main.go:9) MOVQ ""..autotmp_25+64(SP), DX
0x0074 00116 (main.go:9) MOVQ DX, ""..autotmp_17+80(SP)
0x0079 00121 (main.go:9) MOVQ CX, ""..autotmp_17+88(SP)
0x007e 00126 (main.go:9) MOVQ AX, ""..autotmp_17+96(SP)
... 省略不相关的部分
makeruntime.makemap_smallruntime.makemap
// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
h := new(hmap)
h.hash0 = fastrand()
return h
}
const (
// Maximum number of key/elem pairs a bucket can hold.
bucketCntBits = 3
bucketCnt = 1 << bucketCntBits
)
bucketCntruntime.makemap_smallmakemap_small
runtime.makemap
// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
// 如果编译器发现map或者它第一个bucket能够被分配在栈上,则传入的h参数将不为空。
func makemap(t *maptype, hint int, h *hmap) *hmap {
mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
if overflow || mem > maxAlloc {
hint = 0
}
// initialize Hmap
if h == nil {
h = new(hmap)
}
h.hash0 = fastrand()
// Find the size parameter B which will hold the requested # of elements.
// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
// hint表示创建map的预期初始大小,因此要计算一个合适的对数B
// overLoadFactor即用于判断当前的2^B是否已经超过hint
B := uint8(0)
for overLoadFactor(hint, B) {
B++
}
h.B = B
// allocate initial hash table
// if B == 0, the buckets field is allocated lazily later (in mapassign)
// If hint is large zeroing this memory could take a while.
// 如果B == 0,则Buckets数组将会延迟初始化,直到调用mapassign给该map存值
if h.B != 0 {
var nextOverflow *bmap
h.buckets, nextOverflow = makeBucketArray(t, h.B, nil) // 给Buckets数组分配内存
if nextOverflow != nil {
h.extra = new(mapextra)
h.extra.nextOverflow = nextOverflow
}
}
return h
}
runtime.makemap64
func makemap64(t *maptype, hint int64, h *hmap) *hmap {
if int64(int(hint)) != hint {
hint = 0
}
return makemap(t, int(hint), h)
}
小结一下,map的创建底层有三种实现:
makemap_smallmakemap64makemakemapmakemap
选择Hash函数
map底层使用的Hash函数在runtime层的初始化中会根据CPU架构和一些系统支持来选择。让我们翻阅相关源码:
func alginit() {
// Install AES hash algorithms if the instructions needed are present.
// 对于386/amd64/arm64都优先判断是否支持AES Hash函数
if (GOARCH == "386" || GOARCH == "amd64") &&
cpu.X86.HasAES && // AESENC
cpu.X86.HasSSSE3 && // PSHUFB
cpu.X86.HasSSE41 { // PINSR{D,Q}
initAlgAES()
return
}
if GOARCH == "arm64" && cpu.ARM64.HasAES {
initAlgAES()
return
}
getRandomData((*[len(hashkey) * sys.PtrSize]byte)(unsafe.Pointer(&hashkey))[:])
hashkey[0] |= 1 // make sure these numbers are odd
hashkey[1] |= 1
hashkey[2] |= 1
hashkey[3] |= 1
}
由上述源码可知,map的Hash函数优先选择AES,不支持的情况下进入默认逻辑。
map的访问
概述
map访问示例,B = 5
map的访问即通过给定的key在map中寻找其对应value,它的大致步骤如下:
bmap[]bmapbmaptophashbmapbmaptophashtophashbmapoverflowoverflowbmap
基本用法
map的访问方式直接通过key下标获取,返回值可以是一个value,也可以在此基础上多一个表示是否存在的bool:
m := map[int8]int{
1: 1,
2: 2,
3: 3,
}
v1 := m[1]
v2, ok := m[2] // ok是一个bool值,用于判断是否存在2这个key
fmt.Println(v1)
fmt.Println(v2, ok)
让我们通过go的官方工具来查看汇编结果:
$ go tool compile -S main.go
"".main STEXT size=741 args=0x0 locals=0x120 funcid=0x0
0x0000 00000 (main.go:5) TEXT "".main(SB), ABIInternal, $288-0
0x0000 00000 (main.go:5) MOVQ (TLS), CX
0x0009 00009 (main.go:5) LEAQ -160(SP), AX
0x0011 00017 (main.go:5) CMPQ AX, 16(CX)
0x0015 00021 (main.go:5) PCDATA $0, $-2
0x0015 00021 (main.go:5) JLS 726
0x001b 00027 (main.go:5) PCDATA $0, $-1
0x001b 00027 (main.go:5) SUBQ $288, SP
0x0022 00034 (main.go:5) MOVQ BP, 280(SP)
0x002a 00042 (main.go:5) LEAQ 280(SP), BP
0x0032 00050 (main.go:5) FUNCDATA $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x0032 00050 (main.go:5) FUNCDATA $1, gclocals·164458436596f10dcbb268ec85fd5e10(SB)
0x0032 00050 (main.go:5) FUNCDATA $2, "".main.stkobj(SB)
0x0032 00050 (main.go:6) XORPS X0, X0
0x0035 00053 (main.go:6) MOVUPS X0, ""..autotmp_20+232(SP)
0x003d 00061 (main.go:6) MOVUPS X0, ""..autotmp_20+248(SP)
0x0045 00069 (main.go:6) MOVUPS X0, ""..autotmp_20+264(SP)
0x004d 00077 (main.go:6) MOVQ $0, ""..autotmp_21+96(SP)
0x0056 00086 (main.go:6) LEAQ ""..autotmp_21+104(SP), DI
0x005b 00091 (main.go:6) PCDATA $0, $-2
0x005b 00091 (main.go:6) LEAQ -48(DI), DI
0x005f 00095 (main.go:6) NOP
0x0060 00096 (main.go:6) DUFFZERO $277
0x0073 00115 (main.go:6) PCDATA $0, $-1
0x0073 00115 (main.go:6) LEAQ ""..autotmp_21+96(SP), AX
0x0078 00120 (main.go:6) MOVQ AX, ""..autotmp_20+248(SP)
0x0080 00128 (main.go:6) PCDATA $1, $1
0x0080 00128 (main.go:6) CALL runtime.fastrand(SB)
0x0085 00133 (main.go:6) MOVL (SP), AX
0x0088 00136 (main.go:6) MOVL AX, ""..autotmp_20+244(SP)
0x008f 00143 (main.go:7) MOVB $1, ""..autotmp_24+71(SP)
0x0094 00148 (main.go:7) LEAQ type.map[int8]int(SB), AX
0x009b 00155 (main.go:7) MOVQ AX, (SP)
0x009f 00159 (main.go:7) LEAQ ""..autotmp_20+232(SP), CX
0x00a7 00167 (main.go:7) MOVQ CX, 8(SP)
0x00ac 00172 (main.go:7) LEAQ ""..autotmp_24+71(SP), DX
0x00b1 00177 (main.go:7) MOVQ DX, 16(SP)
0x00b6 00182 (main.go:7) CALL runtime.mapassign(SB)
0x00bb 00187 (main.go:7) MOVQ 24(SP), AX
0x00c0 00192 (main.go:7) MOVQ $1, (AX)
0x00c7 00199 (main.go:8) MOVB $2, ""..autotmp_24+71(SP)
0x00cc 00204 (main.go:8) LEAQ type.map[int8]int(SB), AX
0x00d3 00211 (main.go:8) MOVQ AX, (SP)
0x00d7 00215 (main.go:8) LEAQ ""..autotmp_20+232(SP), CX
0x00df 00223 (main.go:8) MOVQ CX, 8(SP)
0x00e4 00228 (main.go:8) LEAQ ""..autotmp_24+71(SP), DX
0x00e9 00233 (main.go:8) MOVQ DX, 16(SP)
0x00ee 00238 (main.go:8) CALL runtime.mapassign(SB)
0x00f3 00243 (main.go:8) MOVQ 24(SP), AX
0x00f8 00248 (main.go:8) MOVQ $2, (AX)
0x00ff 00255 (main.go:9) MOVB $3, ""..autotmp_24+71(SP)
0x0104 00260 (main.go:9) LEAQ type.map[int8]int(SB), AX
0x010b 00267 (main.go:9) MOVQ AX, (SP)
0x010f 00271 (main.go:9) LEAQ ""..autotmp_20+232(SP), CX
0x0117 00279 (main.go:9) MOVQ CX, 8(SP)
0x011c 00284 (main.go:9) LEAQ ""..autotmp_24+71(SP), DX
0x0121 00289 (main.go:9) MOVQ DX, 16(SP)
0x0126 00294 (main.go:9) CALL runtime.mapassign(SB)
0x012b 00299 (main.go:9) MOVQ 24(SP), AX
0x0130 00304 (main.go:9) MOVQ $3, (AX)
0x0137 00311 (main.go:12) LEAQ type.map[int8]int(SB), AX
0x013e 00318 (main.go:12) MOVQ AX, (SP)
0x0142 00322 (main.go:12) LEAQ ""..autotmp_20+232(SP), CX
0x014a 00330 (main.go:12) MOVQ CX, 8(SP)
0x014f 00335 (main.go:12) LEAQ ""..stmp_0(SB), DX
0x0156 00342 (main.go:12) MOVQ DX, 16(SP)
0x015b 00347 (main.go:12) NOP
0x0160 00352 (main.go:12) CALL runtime.mapaccess1(SB)
0x0165 00357 (main.go:12) MOVQ 24(SP), AX
0x016a 00362 (main.go:12) MOVQ (AX), AX
0x016d 00365 (main.go:12) MOVQ AX, "".v1+80(SP)
0x0172 00370 (main.go:13) LEAQ type.map[int8]int(SB), CX
0x0179 00377 (main.go:13) MOVQ CX, (SP)
0x017d 00381 (main.go:13) LEAQ ""..autotmp_20+232(SP), CX
0x0185 00389 (main.go:13) MOVQ CX, 8(SP)
0x018a 00394 (main.go:13) LEAQ ""..stmp_1(SB), CX
0x0191 00401 (main.go:13) MOVQ CX, 16(SP)
0x0196 00406 (main.go:13) PCDATA $1, $0
0x0196 00406 (main.go:13) CALL runtime.mapaccess2(SB)
0x019b 00411 (main.go:13) MOVQ 24(SP), AX
0x01a0 00416 (main.go:13) MOVQ (AX), AX
0x01a3 00419 (main.go:13) MOVQ AX, "".v2+72(SP)
0x01a8 00424 (main.go:13) MOVBLZX 32(SP), CX
0x01ad 00429 (main.go:13) MOVQ CX, ""..autotmp_53+88(SP)
0x01b2 00434 (main.go:15) MOVQ "".v1+80(SP), DX
0x01b7 00439 (main.go:15) MOVQ DX, (SP)
0x01bb 00443 (main.go:15) NOP
0x01c0 00448 (main.go:15) CALL runtime.convT64(SB)
0x01c5 00453 (main.go:15) MOVQ 8(SP), AX
0x01ca 00458 (main.go:15) XORPS X1, X1
0x01cd 00461 (main.go:15) MOVUPS X1, ""..autotmp_34+184(SP)
0x01d5 00469 (main.go:15) LEAQ type.int(SB), CX
0x01dc 00476 (main.go:15) MOVQ CX, ""..autotmp_34+184(SP)
0x01e4 00484 (main.go:15) MOVQ AX, ""..autotmp_34+192(SP)
... 省略不相关的部分
runtime.mapaccess1runtime.mapaccess2
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)
mapaccess2mapaccess1boolmapaccess1
// mapaccess1 returns a pointer to h[key]. Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
if raceenabled && h != nil {
callerpc := getcallerpc()
pc := funcPC(mapaccess1)
racereadpc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled && h != nil {
msanread(key, t.key.size)
}
if h == nil || h.count == 0 {
if t.hashMightPanic() {
t.hasher(key, 0) // see issue 23734
}
return unsafe.Pointer(&zeroVal[0])
}
// map不允许并发读写,会触发该panic。这里是通过flags的标记位来判断是否正在进行写操作。
if h.flags&hashWriting != 0 {
throw("concurrent map read and map write")
}
hash := t.hasher(key, uintptr(h.hash0))
m := bucketMask(h.B) // m即后B位,用于定位[]bmap中对应的bmap。
b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize))) // 定位到具体的bmap。
if c := h.oldbuckets; c != nil {
if !h.sameSizeGrow() {
// There used to be half as many buckets; mask down one more power of two.
m >>= 1
}
oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
if !evacuated(oldb) {
b = oldb
}
}
top := tophash(hash) // tophash即首8位,用于快速比较。
bucketloop:
// 遍历迭代bmap和它的overflow bmap。
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
// 首先比较tophash是否相同,不相同会直接continue。
if b.tophash[i] != top {
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
// tophash相同的情况下,会比较key的值是否相同。若相同,则说明已经定位到该key,返回结果。
if t.key.equal(key, k) {
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
e = *((*unsafe.Pointer)(e))
}
return e
}
}
}
return unsafe.Pointer(&zeroVal[0])
}
key是string/32位整型/64位整型
stringint32/uint32int64/uint64
func mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
func mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool)
func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
func mapaccess2_fast64(t *maptype, h *hmap, key uint64) (unsafe.Pointer, bool)
func mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer
func mapaccess2_faststr(t *maptype, h *hmap, ky string) (unsafe.Pointer, bool)
mapaccess1mapaccess2
func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {
if raceenabled && h != nil {
callerpc := getcallerpc()
racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess1_fast64))
}
if h == nil || h.count == 0 {
return unsafe.Pointer(&zeroVal[0])
}
if h.flags&hashWriting != 0 {
throw("concurrent map read and map write")
}
var b *bmap
if h.B == 0 {
// One-bucket table. No need to hash.
b = (*bmap)(h.buckets)
} else {
hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
m := bucketMask(h.B)
b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
if c := h.oldbuckets; c != nil {
if !h.sameSizeGrow() {
// There used to be half as many buckets; mask down one more power of two.
m >>= 1
}
oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
if !evacuated(oldb) {
b = oldb
}
}
}
for ; b != nil; b = b.overflow(t) {
for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
if *(*uint64)(k) == key && !isEmpty(b.tophash[i]) { // 直接比较key,没有先比较tophash再比较key。
return add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.elemsize))
}
}
}
return unsafe.Pointer(&zeroVal[0])
}
map的赋值
概述
bmaptophashbmaptophash
基本用法
package main
import "fmt"
func main() {
m := make(map[int8]int)
m[1] = 1
fmt.Println(m)
}
让我们通过go的官方工具来查看汇编结果:
$ go tool compile -S main.go
"".main STEXT size=202 args=0x0 locals=0x60 funcid=0x0
0x0000 00000 (main.go:5) TEXT "".main(SB), ABIInternal, $96-0
0x0000 00000 (main.go:5) MOVQ (TLS), CX
0x0009 00009 (main.go:5) CMPQ SP, 16(CX)
0x000d 00013 (main.go:5) PCDATA $0, $-2
0x000d 00013 (main.go:5) JLS 192
0x0013 00019 (main.go:5) PCDATA $0, $-1
0x0013 00019 (main.go:5) SUBQ $96, SP
0x0017 00023 (main.go:5) MOVQ BP, 88(SP)
0x001c 00028 (main.go:5) LEAQ 88(SP), BP
0x0021 00033 (main.go:5) FUNCDATA $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x0021 00033 (main.go:5) FUNCDATA $1, gclocals·713abd6cdf5e052e4dcd3eb297c82601(SB)
0x0021 00033 (main.go:5) FUNCDATA $2, "".main.stkobj(SB)
0x0021 00033 (main.go:6) PCDATA $1, $0
0x0021 00033 (main.go:6) CALL runtime.makemap_small(SB)
0x0026 00038 (main.go:6) MOVQ (SP), AX
0x002a 00042 (main.go:6) MOVQ AX, ""..autotmp_20+64(SP)
0x002f 00047 (main.go:8) LEAQ type.map[int8]int(SB), CX
0x0036 00054 (main.go:8) MOVQ CX, (SP)
0x003a 00058 (main.go:8) MOVQ AX, 8(SP)
0x003f 00063 (main.go:8) LEAQ ""..stmp_0(SB), DX
0x0046 00070 (main.go:8) MOVQ DX, 16(SP)
0x004b 00075 (main.go:8) PCDATA $1, $1
0x004b 00075 (main.go:8) CALL runtime.mapassign(SB)
0x0050 00080 (main.go:8) MOVQ 24(SP), AX
0x0055 00085 (main.go:8) MOVQ $1, (AX)
0x005c 00092 (main.go:10) XORPS X0, X0
0x005f 00095 (main.go:10) MOVUPS X0, ""..autotmp_14+72(SP)
0x0064 00100 (main.go:10) LEAQ type.map[int8]int(SB), AX
0x006b 00107 (main.go:10) MOVQ AX, ""..autotmp_14+72(SP)
0x0070 00112 (main.go:10) MOVQ ""..autotmp_20+64(SP), AX
0x0075 00117 (main.go:10) MOVQ AX, ""..autotmp_14+80(SP)
... 省略不相关的部分
runtime.mapassign
// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
if h == nil {
panic(plainError("assignment to entry in nil map"))
}
if raceenabled {
callerpc := getcallerpc()
pc := funcPC(mapassign)
racewritepc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled {
msanread(key, t.key.size)
}
// map不允许并发读写,通过flags的位判断。
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
hash := t.hasher(key, uintptr(h.hash0))
// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write.
// 因为赋值是写操作,因此置位flags写标志。
h.flags ^= hashWriting
if h.buckets == nil {
h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}
again:
bucket := hash & bucketMask(h.B)
if h.growing() { // 判断是否处于扩容中。
growWork(t, h, bucket)
}
b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize))) // 定位bmap
top := tophash(hash) // 计算tophash
var inserti *uint8
var insertk unsafe.Pointer
var elem unsafe.Pointer
bucketloop:
for {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
// 当遇到第一个空闲位置的tophash,记录下来。
// 遍历完整个bmap及其overflow bmap,没有找到该key,则认为map中不存在该key。
// 最终插入位置即这里记录的位置。
if isEmpty(b.tophash[i]) && inserti == nil {
inserti = &b.tophash[i]
insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
}
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
if !t.key.equal(key, k) {
continue
}
// already have a mapping for key. Update it.
// 表明在map中已经存在这个key,那么就需要对该key的值进行更新操作。
if t.needkeyupdate() {
typedmemmove(t.key, k, key)
}
elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
goto done
}
ovf := b.overflow(t) // 继续遍历下一个overflow bmap
if ovf == nil {
break
}
b = ovf
}
// Did not find mapping for key. Allocate new cell & add entry.
// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
// 触发扩容的情况。
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again // Growing the table invalidates everything, so try again
}
if inserti == nil {
// The current bucket and all the overflow buckets connected to it are full, allocate a new one.
newb := h.newoverflow(t, b)
inserti = &newb.tophash[0]
insertk = add(unsafe.Pointer(newb), dataOffset)
elem = add(insertk, bucketCnt*uintptr(t.keysize))
}
// store new key/elem at insert position
if t.indirectkey() {
kmem := newobject(t.key)
*(*unsafe.Pointer)(insertk) = kmem
insertk = kmem
}
if t.indirectelem() {
vmem := newobject(t.elem)
*(*unsafe.Pointer)(elem) = vmem
}
typedmemmove(t.key, insertk, key)
*inserti = top
h.count++
done:
if h.flags&hashWriting == 0 {
throw("concurrent map writes")
}
h.flags &^= hashWriting // 将写操作标记位置空。
if t.indirectelem() {
elem = *((*unsafe.Pointer)(elem))
}
return elem
}
key是string/32位整型/64位整型
stringint32/uint32int64/uint64
func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer
map的扩容和迁移
概述
map扩容的目的在于减少Hash冲突,防止算法复杂度退化,保持Hash算法O(1)的时间复杂度。
map的扩容对使用方不可见,开发者在使用map的过程中是不会感知到map是否处于扩容中,以及当前扩容的进度。Go语言的map扩容是渐进式的,即整个扩容过程拆散在每一次的写操作里面,这样做的好处是保证每一次map的读写操作时间复杂度都是稳定的。
map触发扩容的时机有两个:
- map的负载因子(长度与容量的比例)超过阈值,此时map被认为无法承担更多的key,需要进行2倍扩容。
负载因子超过阈值
2.map存在局部bmap包含过多overflow的情况,此时map会认为局部的bmap可以进行tophash密集排列,让overflow数量更少
局部overflow过多
扩容过程源码
首先,我们通过源码来查看如何判断负载因子是否超过阈值:
// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}
// bucketShift returns 1<<b, optimized for code generation.
func bucketShift(b uint8) uintptr {
// Masking the shift amount allows overflow checks to be elided.
return uintptr(1) << (b & (sys.PtrSize*8 - 1))
}
const (
// Maximum number of key/elem pairs a bucket can hold.
bucketCntBits = 3
bucketCnt = 1 << bucketCntBits
// Maximum average load of a bucket that triggers growth is 6.5.
// Represent as loadFactorNum/loadFactorDen, to allow integer math.
loadFactorNum = 13
loadFactorDen = 2
)
loadFactorNumloadFactorDencountbucketShift(B)
hashGrow
func hashGrow(t *maptype, h *hmap) {
// If we've hit the load factor, get bigger.
// Otherwise, there are too many overflow buckets,
// so keep the same number of buckets and "grow" laterally.
bigger := uint8(1)
if !overLoadFactor(h.count+1, h.B) { // 如果不是负载因子超过阈值,那么本次map的扩容实际上更应该理解为map的整理,容量不变。
bigger = 0
h.flags |= sameSizeGrow // 标记sameSizeGrow,表示容量不变。
}
oldbuckets := h.buckets // 保留旧的buckets。
newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil) // 创建新的buckets。
flags := h.flags &^ (iterator | oldIterator)
if h.flags&iterator != 0 {
flags |= oldIterator
}
// commit the grow (atomic wrt gc)
h.B += bigger
h.flags = flags
h.oldbuckets = oldbuckets
h.buckets = newbuckets
h.nevacuate = 0 // 表示搬迁进度,从0开始将会依次递增,每调用一次evacuate()会自增一次。
h.noverflow = 0
if h.extra != nil && h.extra.overflow != nil {
// Promote current overflow buckets to the old generation.
if h.extra.oldoverflow != nil {
throw("oldoverflow is not nil")
}
h.extra.oldoverflow = h.extra.overflow
h.extra.overflow = nil
}
if nextOverflow != nil {
if h.extra == nil {
h.extra = new(mapextra)
}
h.extra.nextOverflow = nextOverflow
}
// the actual copying of the hash table data is done incrementally
// by growWork() and evacuate().
}
evacuate
// evacDst is an evacuation destination.
type evacDst struct {
b *bmap // current destination bucket // 搬迁目标bmap。
i int // key/elem index into b // 目标bmap的当前索引。
k unsafe.Pointer // pointer to current key storage // 当前key。
e unsafe.Pointer // pointer to current elem storage // 当前value。
}
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))) // 定位旧bmap。
newbit := h.noldbuckets()
if !evacuated(b) { // 判断是否已经被搬迁过。
// TODO: reuse overflow buckets instead of using new ones, if there
// is no iterator using the old buckets. (If !oldIterator.)
// xy contains the x and y (low and high) evacuation destinations.
var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.e = add(x.k, bucketCnt*uintptr(t.keysize))
if !h.sameSizeGrow() { // 对于2倍扩容的场景,搬迁目标可能有2个,因此xy[1]记录另一个目标。
// Only calculate y pointers if we're growing bigger.
// Otherwise GC can see bad pointers.
y := &xy[1]
y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
y.k = add(unsafe.Pointer(y.b), dataOffset)
y.e = add(y.k, bucketCnt*uintptr(t.keysize))
}
for ; b != nil; b = b.overflow(t) { // 遍历当前的bmap。
k := add(unsafe.Pointer(b), dataOffset)
e := add(k, bucketCnt*uintptr(t.keysize))
for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
top := b.tophash[i]
if isEmpty(top) {
b.tophash[i] = evacuatedEmpty
continue
}
if top < minTopHash {
throw("bad map state")
}
k2 := k
if t.indirectkey() {
k2 = *((*unsafe.Pointer)(k2))
}
var useY uint8
if !h.sameSizeGrow() { // 对于2倍扩容的场景,要重新计算tophash
// Compute hash to make our evacuation decision (whether we need
// to send this key/elem to bucket x or bucket y).
hash := t.hasher(k2, uintptr(h.hash0))
if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
// If key != key (NaNs), then the hash could be (and probably
// will be) entirely different from the old hash. Moreover,
// it isn't reproducible. Reproducibility is required in the
// presence of iterators, as our evacuation decision must
// match whatever decision the iterator made.
// Fortunately, we have the freedom to send these keys either
// way. Also, tophash is meaningless for these kinds of keys.
// We let the low bit of tophash drive the evacuation decision.
// We recompute a new random tophash for the next level so
// these keys will get evenly distributed across all buckets
// after multiple grows.
useY = top & 1
top = tophash(hash)
} else {
if hash&newbit != 0 {
useY = 1
}
}
}
if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
throw("bad evacuatedN")
}
b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
dst := &xy[useY] // evacuation destination
if dst.i == bucketCnt {
dst.b = h.newoverflow(t, dst.b)
dst.i = 0
dst.k = add(unsafe.Pointer(dst.b), dataOffset)
dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
}
dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
// 拷贝内存数据到新的搬迁位置
if t.indirectkey() {
*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
} else {
typedmemmove(t.key, dst.k, k) // copy elem
}
if t.indirectelem() {
*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
} else {
typedmemmove(t.elem, dst.e, e)
}
dst.i++
// These updates might push these pointers past the end of the
// key or elem arrays. That's ok, as we have the overflow pointer
// at the end of the bucket to protect against pointing past the
// end of the bucket.
dst.k = add(dst.k, uintptr(t.keysize))
dst.e = add(dst.e, uintptr(t.elemsize))
}
}
// Unlink the overflow buckets & clear key/elem to help GC.
// 把旧的bmap引用解除,便于GC完成内存回收。
if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
// Preserve b.tophash because the evacuation
// state is maintained there.
ptr := add(b, dataOffset)
n := uintptr(t.bucketsize) - dataOffset
memclrHasPointers(ptr, n)
}
}
if oldbucket == h.nevacuate {
advanceEvacuationMark(h, t, newbit)
}
}
扩容中插入新的数据会放入到新桶中。
扩容
在向map中添加数据时,当达到某个条件,则会引发字典扩容。
扩容条件:
- map中数据总个数 / 桶个数 > 6.5 ,引发翻倍扩容。
- 使用了太多的溢出桶时(溢出桶使用的太多会导致map处理速度降低)。
- B <=15,已使用的溢出桶个数 >= \(2^B\) 时,引发等量扩容。
- B > 15,已使用的溢出桶个数 >= \(2^{15}\) 时,引发等量扩容。
双倍扩容:装载因子多大,直接翻倍,B+1;扩容也不是申请一块内存,立马开始拷贝,每一次访问旧的 buckets 时,就迁移一部分,直到完成,旧 bucket 被 GC 回收。
等量扩容:重新排列,极端情况下,重新排列也解决不了,map 成了链表,性能大大降低,此时哈希种子 hash0 的设置,可以降低此类极端场景的发生。
迁移
扩容之后,必然要伴随着数据的迁移,即:将旧桶中的数据要迁移到新桶中。
翻倍扩容
如果是翻倍扩容,那么迁移规就是将旧桶中的数据分流至新的两个桶中(比例不定),并且桶编号的位置为:同编号位置 和 翻倍后对应编号位置
那么问题来了,如何实现的这种迁移呢?
新桶的B的值=原桶B + 1
底B位
扩容后,B的值在原来的基础上已加1,也就意味着通过多1位来计算此键值对要分流到新桶位置,如上图:
- 当新增的位(红色)的值为 0,则数据会迁移到与旧桶编号一致的位置。
- 当新增的位(红色)的值为 1,则数据会迁移到翻倍后对应编号位置。
等量扩容
如果是等量扩容(溢出桶太多引发的扩容),那么数据迁移机制就会比较简单,就是将旧桶(含溢出桶)中的值迁移到新桶中。
这种扩容和迁移的意义在于:当溢出桶比较多而每个桶中的数据又不多时,可以通过等量扩容和迁移让数据更紧凑,从而减少溢出桶。
扩容和迁移总结
扩容
底B位
map的删除
概述
bmapoverflowtophash
基本用法
builtindelete
package main
func main() {
m := make(map[int8]int)
delete(m, 1)
}
让我们通过go的官方工具来查看汇编结果:
$ go tool compile -S main.go
"".main STEXT size=204 args=0x0 locals=0xa8 funcid=0x0
0x0000 00000 (main.go:3) TEXT "".main(SB), ABIInternal, $168-0
0x0000 00000 (main.go:3) MOVQ (TLS), CX
0x0009 00009 (main.go:3) LEAQ -40(SP), AX
0x000e 00014 (main.go:3) CMPQ AX, 16(CX)
0x0012 00018 (main.go:3) PCDATA $0, $-2
0x0012 00018 (main.go:3) JLS 194
0x0018 00024 (main.go:3) PCDATA $0, $-1
0x0018 00024 (main.go:3) SUBQ $168, SP
0x001f 00031 (main.go:3) MOVQ BP, 160(SP)
0x0027 00039 (main.go:3) LEAQ 160(SP), BP
0x002f 00047 (main.go:3) FUNCDATA $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x002f 00047 (main.go:3) FUNCDATA $1, gclocals·d9a525d1de6bc16975a5efbb873db17d(SB)
0x002f 00047 (main.go:3) FUNCDATA $2, "".main.stkobj(SB)
0x002f 00047 (main.go:4) XORPS X0, X0
0x0032 00050 (main.go:4) MOVUPS X0, ""..autotmp_1+112(SP)
0x0037 00055 (main.go:4) MOVUPS X0, ""..autotmp_1+128(SP)
0x003f 00063 (main.go:4) MOVUPS X0, ""..autotmp_1+144(SP)
0x0047 00071 (main.go:4) MOVQ $0, ""..autotmp_2+24(SP)
0x0050 00080 (main.go:4) LEAQ ""..autotmp_2+32(SP), DI
0x0055 00085 (main.go:4) PCDATA $0, $-2
0x0055 00085 (main.go:4) LEAQ -48(DI), DI
0x0059 00089 (main.go:4) NOP
0x0060 00096 (main.go:4) DUFFZERO $277
0x0073 00115 (main.go:4) PCDATA $0, $-1
0x0073 00115 (main.go:4) LEAQ ""..autotmp_2+24(SP), AX
0x0078 00120 (main.go:4) MOVQ AX, ""..autotmp_1+128(SP)
0x0080 00128 (main.go:4) PCDATA $1, $1
0x0080 00128 (main.go:4) CALL runtime.fastrand(SB)
0x0085 00133 (main.go:4) MOVL (SP), AX
0x0088 00136 (main.go:4) MOVL AX, ""..autotmp_1+124(SP)
0x008c 00140 (main.go:6) LEAQ type.map[int8]int(SB), AX
0x0093 00147 (main.go:6) MOVQ AX, (SP)
0x0097 00151 (main.go:6) LEAQ ""..autotmp_1+112(SP), AX
0x009c 00156 (main.go:6) MOVQ AX, 8(SP)
0x00a1 00161 (main.go:6) LEAQ ""..stmp_0(SB), AX
0x00a8 00168 (main.go:6) MOVQ AX, 16(SP)
0x00ad 00173 (main.go:6) PCDATA $1, $0
0x00ad 00173 (main.go:6) CALL runtime.mapdelete(SB)
0x00b2 00178 (main.go:7) MOVQ 160(SP), BP
0x00ba 00186 (main.go:7) ADDQ $168, SP
0x00c1 00193 (main.go:7) RET
0x00c2 00194 (main.go:7) NOP
0x00c2 00194 (main.go:3) PCDATA $1, $-1
0x00c2 00194 (main.go:3) PCDATA $0, $-2
0x00c2 00194 (main.go:3) CALL runtime.morestack_noctxt(SB)
0x00c7 00199 (main.go:3) PCDATA $0, $-1
0x00c7 00199 (main.go:3) JMP 0
... 省略不相关的部分
runtime.mapdelete
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
if raceenabled && h != nil {
callerpc := getcallerpc()
pc := funcPC(mapdelete)
racewritepc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled && h != nil {
msanread(key, t.key.size)
}
if h == nil || h.count == 0 {
if t.hashMightPanic() {
t.hasher(key, 0) // see issue 23734
}
return
}
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
hash := t.hasher(key, uintptr(h.hash0))
// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write (delete).
h.flags ^= hashWriting
bucket := hash & bucketMask(h.B)
if h.growing() {
growWork(t, h, bucket)
}
b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
bOrig := b
top := tophash(hash)
search:
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
if b.tophash[i] == emptyRest {
break search
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
k2 := k
if t.indirectkey() {
k2 = *((*unsafe.Pointer)(k2))
}
if !t.key.equal(key, k2) {
continue
}
// 删除key
// Only clear key if there are pointers in it.
if t.indirectkey() {
*(*unsafe.Pointer)(k) = nil
} else if t.key.ptrdata != 0 {
memclrHasPointers(k, t.key.size)
}
// 删除value
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
*(*unsafe.Pointer)(e) = nil
} else if t.elem.ptrdata != 0 {
memclrHasPointers(e, t.elem.size)
} else {
memclrNoHeapPointers(e, t.elem.size)
}
// 清空tophash
b.tophash[i] = emptyOne
// If the bucket now ends in a bunch of emptyOne states,
// change those to emptyRest states.
// It would be nice to make this a separate function, but
// for loops are not currently inlineable.
if i == bucketCnt-1 {
if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
goto notLast
}
} else {
if b.tophash[i+1] != emptyRest {
goto notLast
}
}
for {
b.tophash[i] = emptyRest
if i == 0 {
if b == bOrig {
break // beginning of initial bucket, we're done.
}
// Find previous bucket, continue at its last entry.
c := b
for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
}
i = bucketCnt - 1
} else {
i--
}
if b.tophash[i] != emptyOne {
break
}
}
notLast:
h.count--
// Reset the hash seed to make it more difficult for attackers to
// repeatedly trigger hash collisions. See issue 25237.
if h.count == 0 {
h.hash0 = fastrand()
}
break search
}
}
if h.flags&hashWriting == 0 {
throw("concurrent map writes")
}
h.flags &^= hashWriting
}
key是string/32位整型/64位整型
与map的访问同理,当map对应的key类型是string、int32/uint32、int64/uint64其中之一,Go语言的runtime层会调用不同函数,签名如下:
func mapdelete_fast32(t *maptype, h *hmap, key uint32)
func mapdelete_fast64(t *maptype, h *hmap, key uint64)
func mapdelete_faststr(t *maptype, h *hmap, ky string)
map的迭代
概述
nevacuatebmapbmap
基本用法
for range
package main
import "fmt"
func main() {
m := make(map[int8]int)
for k, v := range m {
fmt.Println(k, v)
}
}
runtime.mapiterinitruntime.mapiternext
// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
if raceenabled && h != nil {
callerpc := getcallerpc()
racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
}
if h == nil || h.count == 0 {
return
}
if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
}
it.t = t
it.h = h
// grab snapshot of bucket state
it.B = h.B
it.buckets = h.buckets
if t.bucket.ptrdata == 0 {
// Allocate the current slice and remember pointers to both current and old.
// This preserves all relevant overflow buckets alive even if
// the table grows and/or overflow buckets are added to the table
// while we are iterating.
h.createOverflow()
it.overflow = h.extra.overflow
it.oldoverflow = h.extra.oldoverflow
}
// decide where to start
// 可以看出,map的迭代器会随机选择一个开始的bucket位置和开始的offset。
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
r += uintptr(fastrand()) << 31
}
it.startBucket = r & bucketMask(h.B)
it.offset = uint8(r >> h.B & (bucketCnt - 1))
// iterator state
it.bucket = it.startBucket // 记录随机的开始位置。
// Remember we have an iterator.
// Can run concurrently with another mapiterinit().
if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
atomic.Or8(&h.flags, iterator|oldIterator)
}
mapiternext(it)
}
func mapiternext(it *hiter) {
h := it.h
if raceenabled {
callerpc := getcallerpc()
racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
}
if h.flags&hashWriting != 0 {
throw("concurrent map iteration and map write")
}
t := it.t
bucket := it.bucket
b := it.bptr
i := it.i
checkBucket := it.checkBucket
next:
if b == nil {
if bucket == it.startBucket && it.wrapped {
// end of iteration
it.key = nil
it.elem = nil
return
}
// 判断是否正处于扩容搬迁中,对于正在搬迁过程中仍未完成搬迁的bucket,迭代过程中需要考虑进入旧的bucket里面。
if h.growing() && it.B == h.B {
// Iterator was started in the middle of a grow, and the grow isn't done yet.
// If the bucket we're looking at hasn't been filled in yet (i.e. the old
// bucket hasn't been evacuated) then we need to iterate through the old
// bucket and only return the ones that will be migrated to this bucket.
oldbucket := bucket & it.h.oldbucketmask()
b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
if !evacuated(b) {
checkBucket = bucket
} else {
b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
checkBucket = noCheck
}
} else {
b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
checkBucket = noCheck
}
bucket++
if bucket == bucketShift(it.B) {
bucket = 0
it.wrapped = true
}
i = 0
}
for ; i < bucketCnt; i++ {
offi := (i + it.offset) & (bucketCnt - 1)
if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
// TODO: emptyRest is hard to use here, as we start iterating
// in the middle of a bucket. It's feasible, just tricky.
continue
}
k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
if checkBucket != noCheck && !h.sameSizeGrow() {
// Special case: iterator was started during a grow to a larger size
// and the grow is not done yet. We're working on a bucket whose
// oldbucket has not been evacuated yet. Or at least, it wasn't
// evacuated when we started the bucket. So we're iterating
// through the oldbucket, skipping any keys that will go
// to the other new bucket (each oldbucket expands to two
// buckets during a grow).
if t.reflexivekey() || t.key.equal(k, k) {
// If the item in the oldbucket is not destined for
// the current new bucket in the iteration, skip it.
hash := t.hasher(k, uintptr(h.hash0))
if hash&bucketMask(it.B) != checkBucket {
continue
}
} else {
// Hash isn't repeatable if k != k (NaNs). We need a
// repeatable and randomish choice of which direction
// to send NaNs during evacuation. We'll use the low
// bit of tophash to decide which way NaNs go.
// NOTE: this case is why we need two evacuate tophash
// values, evacuatedX and evacuatedY, that differ in
// their low bit.
if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
continue
}
}
}
if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
!(t.reflexivekey() || t.key.equal(k, k)) {
// This is the golden data, we can return it.
// OR
// key!=key, so the entry can't be deleted or updated, so we can just return it.
// That's lucky for us because when key!=key we can't look it up successfully.
it.key = k
if t.indirectelem() {
e = *((*unsafe.Pointer)(e))
}
it.elem = e
} else {
// The hash table has grown since the iterator was started.
// The golden data for this key is now somewhere else.
// Check the current hash table for the data.
// This code handles the case where the key
// has been deleted, updated, or deleted and reinserted.
// NOTE: we need to regrab the key as it has potentially been
// updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
rk, re := mapaccessK(t, h, k)
if rk == nil {
continue // key has been deleted
}
it.key = rk
it.elem = re
}
it.bucket = bucket
if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
it.bptr = b
}
it.i = i + 1
it.checkBucket = checkBucket
return
}
b = b.overflow(t)
i = 0
goto next
}
小结一下,map的迭代顺序是完全随机的,并且map在迭代过程中会根据搬迁进度来判断是否要兼顾旧的bucket。
是否能在迭代过程中删除元素
这是一个比较热门的话题,即map中是否可以在迭代过程中删除元素。在不同语言中的行为结果可能是不一样的,熟悉C++的读者应该知道这样的行为在C++语言中是不合法的,本文讨论的是Go语言的行为:
import "fmt"
func main() {
m := map[int]int{
1: 1,
2: 2,
3: 3,
}
for k := range m {
if k == 1 {
delete(m, k)
}
}
fmt.Println(m)
}
bmaptophash
key 可以是 float 型吗?
"".statictmp_02.42.4000000000000000000000001math.Float64bits()
float 型可以作为 key,但是由于精度的问题,会导致一些诡异的问题,慎用之。
总结
makemap_smallmakemap64makemapmapaccess2mapaccess1 mapassign_fast32 mapassign_fast64mapassign_faststrmapdelete_fast32mapdelete_fast64mapdelete_faststrruntime.mapiterinitruntime.mapiternext
参考
https://github.com/WuPeiqi/go_course/blob/master/day06 数据类型:指针、切片、字典/笔记/day06 数据类型.md