golang Map源码赏析，map声明，长度和容量，新增，删除，修改，迭代查看，解决哈希冲突，数组+链表（桶），map内存模型

golang Map

map描述了一种键与值的映射关系，开发者通常会通过键来查询其对应的值。map最常见的底层实现有两种：基于Hash散列和基于平衡树，两者的存取时间复杂度不同，Go语言的map属于前者范畴。

Hash算法有两大核心：设计Hash函数和解决Hash冲突。

Map使用

声明&初始化

声明方式var m map[string]stringnew关键字m := *new(map[string]string)make关键字m := make(map[string]string)直接创建m := map[string]string{"": ""}

由于map是引用类型，因此使用声明方式创建map时在使用之前必须初始化。以上四种方式中最常用的是后面两种使用方式，第二种基本不会使用。

//方式一
// userInfo := map[string]string{}
userInfo := map[string]string{"name":"武沛齐","age":"18"}

userInfo["name"]  // 武沛齐
userInfo["age"] = "20"
userInfo["email"] = "wupeiqi@live.com"

// data := make(map[int]int, 10)
data := make(map[int]int)
data[100] = 998
data[200] = 999

//方式二
// 声明，nil
var row map[int]int
row = data

注意：键不重复 & 键必须可哈希（int/bool/float/string/array）

常用操作

长度和容量

  // 根据参数值（10），计算出合适的容量,容量是无限的。
  // 一个map 中会包含很多桶，每个桶中可以存放8个键值对。
  info := make(map[string]string, 10)

  info["n1"] = "武沛齐"
  info["n2"] = "alex"

  v1 := len(info) // 2
  // v2 := cap(info) // 报错,无效参数
  t.Log(v1)
  // t.Log(v2)

增加

data := map[string]string{"n1":"武沛齐","n2":"alex"}
data["n3"] = "eric"

删除

data := map[string]string{"n1":"武沛齐","n2":"alex"}
delete(data,"n2")

修改

data := map[string]string{"n1":"武沛齐","n2":"alex"}
data["n1"] = "eric"

查看

data := map[string]string{"n1":"武沛齐","n2":"alex"}
for key,value := range data{
    fmt.Println(key,value)
}

  data := map[string]string{"n1": "武沛齐", "n2": "alex"}
  k, ok := data["n1"]
  fmt.Println("k", k, "ok", ok)

嵌套

v1 := make(map[string]int)
v2 := make(map[string]string)
v3 := make(map[string]...)
v4 := make(map[string][2]int)
v5 := make(map[string][]int)
v6 := make(map[string]map[int]int)

v7 := make(map[string][2]map[string]string)
v7["n1"] = [2]map[string]string{ map[string]string{"name":"武沛齐","age":"18"},map[string]string{"name":"alex","age":"78"}}
v7["n2"] = [2]map[string]string{ map[string]string{"name":"eric","age":"18"},map[string]string{"name":"seven","age":"78"}}

// 伪代码
v7 = {
    n1:[
        {"name":"武沛齐","age":"18"},
        {"name":"alex","age":"78"}
    ],
    n2:[
        {"name":"eric","age":"18"},
        {"name":"seven","age":"78"}
    ]
}

前提：键不重复 & 键必须可哈希

v8 := make(map[int]int)
v9 := make(map[string]int)
v10 := make(map[float32]int)
v11 := make(map[bool]int)
v12 := make(map[ [2]int ]int)
v13 := make(map[ []int ]int) // 错误,不可哈希
v14 := make(map[ map[int]int ]int) // 错误，不可哈希
v15 := make(map[ [2][]int ]int) // 报错
v16 := make(map[ [2]map[string]string ]int) // 报错

变量赋值

v1 := map[string]string{"n1":"武沛齐","n2":"alex"}
v2 := v1

v1["n1"] = "wupeiqi"

ftm.Println(v1) // {"n1":"wupeiqi","n2":"alex"}
ftm.Println(v2) // {"n1":"wupeiqi","n2":"alex"}

特别提醒：无论是否存在扩容都指向同一个地址。

源码赏析

本文涉及到的源码解析部分来源于go官方的1.16.5版本，其中英文部分的注释也属于源码，中文部分的注释则属于笔者添加。

Hash

map最常见的底层实现有两种：基于Hash散列和基于平衡树，两者的存取时间复杂度不同，Go语言的map属于前者范畴。

Go语言map的底层实现基于Hash散列。Hash散列是一种著名的广义上的算法，它能够将任意长度的数据映射到有限的值域上面。

Hash算法有两大核心：设计Hash函数和解决Hash冲突。

设计Hash函数

设计Hash函数的基本原则有：

尽可能让输入的数据映射到不同的值域上面。
函数计算过程需要保证高性能。

但实际工程中由于输入数据范围是无限的，而输出值域范围是有限的，因此必然存在不同的输入数据经过映射后得到相同的输出值，这种现象称为Hash冲突。

hashCode()

// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Java选取31作为Magic Number，31不仅是奇数更是质数。stackoverflow上也有关于该问题的讨论：

31 * i == (i << 5) - i

此外，常见的著名Hash算法还有：MD5、SHA1、SHA2等等。

解决Hash冲突

如何解决Hash冲突是Hash算法中的核心一环，最常见的做法是拉链法。所谓拉链法是指当Hash冲突产生时，将出现冲突的Bucket位用链表这一数据结构串联。

此外，在Java语言JDK 1.8中，HashMap的链表还引入了平衡树来优化局部链表过长的性能问题1。

时间复杂度

Hash散列之所以成为map底层实现的首选，核心原因就是：平均增删改查时间复杂度都是O(1)。相比之下，基于平衡树的map（例如：C++ STL的map）平均时间复杂度是O(logN)。

Algorithm	Action	Average Time Complexity	Worst Time Complexity
Hash	Insert	O(1)	O(N)
Hash	Delete	O(1)	O(N)
Hash	Update	O(1)	O(N)
Hash	Get	O(1)	O(N)
Balanced Binary Tree	Insert	O(logN)	O(logN)
Balanced Binary Tree	Delete	O(logN)	O(logN)
Balanced Binary Tree	Update	O(logN)	O(logN)
Balanced Binary Tree	Get	O(logN)	O(logN)

需要注意的是，基于平衡树的map在最坏情况下依然能够保证时间复杂度是O(logN)，并且能够提供key的有序遍历，这是基于Hash的map所不具备的能力。

go哈希函数

alginit()src/runtime/alg.go

hash 函数，有加密型和非加密型。
加密型的一般用于加密数据、数字摘要等，典型代表就是 md5、sha1、sha256、aes256 这种；
非加密型的一般就是查找。在 map 的应用场景中，用的是查找。
选择 hash 函数主要考察的是两点：性能、碰撞概率。

map的数据结构

// map的类型
type maptype struct {
  typ        _type
  key        *_type // 键的类型
  elem       *_type // 值的类型
  bucket     *_type // hash桶的内部类型
  keysize    uint8  // key的尺寸
  valuesize  uint8  // value的尺寸
  bucketsize uint16 // bucket的尺寸
  flags      uint32
}

在源码中，表示 map 的结构体是 hmap，它是 hashmap 的“缩写”：

type hmap struct {
    count     int             // map中元素(k-v)个数, 调用len(map)实际上返回该值
    flags     uint8           // map的状态字段, 记录当前map的状态
    B         uint8           // map当前桶的个数为: 2^B
    noverflow uint16          // map当前溢出桶的个数
    hash0     uint32          // 在计算key得hash值时作为hash种子使用
    buckets    unsafe.Pointer // buckets是一块连续内存(理解为一个数组), 用于存放数据(k-v), 该字段指向内存首地址
    oldbuckets unsafe.Pointer // map扩容后, 该字段指向扩容前的buckets内存首地址
    nevacuate  uintptr        // map扩容时, 该字段表示已经扩容完毕的buckets数（迁移进度）
    extra *mapextra           // 可选数据, 并不是所有map都会用到, 当map的key和value都不是指针，Go为了避免GC扫描整个hmap，会将bmap的overflow字段移动到extra。
}

hmapbucketsbmap

type bmap struct {
  tophash [bucketCnt]uint8 // 一个[]uint8, 存储桶中8个元素的高八位hash值
}

// bucketCnt是map.go中的常量, bucketCnt = 8
// tophash是key hash的高八位

bmapbmap

type bmap struct {
    tophash [bucketCnt]uint8 // tophash 是 hash 值的高 8 位
    keys    [8]keytype     // 存放key的数组, 数组长度为8
    values  [8]valuetype   // 存放value的数组，数组长度为8
    pad      uintptr
    overflow pointer         // 指向溢出桶的指针, 溢出桶也是个bmap结构, 只是叫法不同
}

bmapoverflowtophash

map的内存模型

桶(bmap)

map[int64]int8key/value...key/key/.../value/value/...

extra

extra是可选的

type mapextra struct {
    overflow    *[]*bmap   // 所有溢出桶组成的数组
    oldoverflow *[]*bmap   // 扩容时指向扩容前所有溢出桶组成的数组
    nextOverflow *bmap     // 指向空闲的溢出桶
}

为什么需要extra

overflow

这一点map源码中的已经告诉了我们答案：当map的key和val均不含指针并且可以内联(size < 128字节)时，bmap可以被标注为不包含指针，这样可以避免GC时扫描整个map(扫描所有的bmap)，

overflow

key的定位

后5位确定桶的位置，前8位确定key的位置，key找value；key相同发生哈希冲突，overflow解决，链表链接一个桶。

key 经过哈希计算后得到哈希值，共 64 个 bit 位（64位机，32位机就不讨论了，现在主流都是64位机），计算它到底要落在哪个桶时，只会用到最后 B 个 bit 位。还记得前面提到过的 B 吗？如果 B = 5，那么桶的数量，也就是 buckets 数组的长度是 2^5 = 32。

例如，现在有一个 key 经过哈希函数计算后，得到的哈希结果是：

 10010111 | 000011110110110010001111001010100010010110010101010 │ 01010

再用哈希值的高 8 位，找到此 key 在 bucket 中的位置，这是在寻找已有的 key。最开始桶内还没有 key，新加入的 key 会找到第一个空位，放入。

buckets 编号就是桶编号，当两个不同的 key 落在同一个桶中，也就是发生了哈希冲突。冲突的解决手段是用链表法：在 bucket 中，从前往后找到第一个空位。这样，在查找某个 key 时，先找到对应的桶，再去遍历 bucket 中的 key。

0011010010111

如果在 bucket 中没找到，并且 overflow 不为空，还要继续去 overflow bucket 中寻找，直到找到或是所有的 key 槽位都找遍了，包括所有的 overflow bucket。

map的创建

基本用法

makemap

package main

import "fmt"

func main() {
  m1 := make(map[int]int) 
  m2 := make(map[int]int, 10) // 创建map支持传递一个参数，表示初始大小

  fmt.Println(m1, m2)
}

让我们通过go的官方工具来查看汇编结果：

$ go tool compile -S main.go

"".main STEXT size=207 args=0x0 locals=0x70 funcid=0x0
  0x0000 00000 (main.go:5)  TEXT  "".main(SB), ABIInternal, $112-0
  0x0000 00000 (main.go:5)  MOVQ  (TLS), CX
  0x0009 00009 (main.go:5)  CMPQ  SP, 16(CX)
  0x000d 00013 (main.go:5)  PCDATA  $0, $-2
  0x000d 00013 (main.go:5)  JLS  197
  0x0013 00019 (main.go:5)  PCDATA  $0, $-1
  0x0013 00019 (main.go:5)  SUBQ  $112, SP
  0x0017 00023 (main.go:5)  MOVQ  BP, 104(SP)
  0x001c 00028 (main.go:5)  LEAQ  104(SP), BP
  0x0021 00033 (main.go:5)  FUNCDATA  $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
  0x0021 00033 (main.go:5)  FUNCDATA  $1, gclocals·d527b79a98f329c2ba624a68e7df03d6(SB)
  0x0021 00033 (main.go:5)  FUNCDATA  $2, "".main.stkobj(SB)
  0x0021 00033 (main.go:6)  PCDATA  $1, $0
  0x0021 00033 (main.go:6)  CALL  runtime.makemap_small(SB)
  0x0026 00038 (main.go:6)  MOVQ  (SP), AX
  0x002a 00042 (main.go:6)  MOVQ  AX, ""..autotmp_25+64(SP)
  0x002f 00047 (main.go:7)  LEAQ  type.map[int]int(SB), CX
  0x0036 00054 (main.go:7)  MOVQ  CX, (SP)
  0x003a 00058 (main.go:7)  MOVQ  $10, 8(SP)
  0x0043 00067 (main.go:7)  MOVQ  $0, 16(SP)
  0x004c 00076 (main.go:7)  PCDATA  $1, $1
  0x004c 00076 (main.go:7)  CALL  runtime.makemap(SB)
  0x0051 00081 (main.go:7)  MOVQ  24(SP), AX
  0x0056 00086 (main.go:9)  XORPS  X0, X0
  0x0059 00089 (main.go:9)  MOVUPS  X0, ""..autotmp_17+72(SP)
  0x005e 00094 (main.go:9)  MOVUPS  X0, ""..autotmp_17+88(SP)
  0x0063 00099 (main.go:9)  LEAQ  type.map[int]int(SB), CX
  0x006a 00106 (main.go:9)  MOVQ  CX, ""..autotmp_17+72(SP)
  0x006f 00111 (main.go:9)  MOVQ  ""..autotmp_25+64(SP), DX
  0x0074 00116 (main.go:9)  MOVQ  DX, ""..autotmp_17+80(SP)
  0x0079 00121 (main.go:9)  MOVQ  CX, ""..autotmp_17+88(SP)
  0x007e 00126 (main.go:9)  MOVQ  AX, ""..autotmp_17+96(SP)
  ... 省略不相关的部分

makeruntime.makemap_smallruntime.makemap

// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
  h := new(hmap)
  h.hash0 = fastrand()
  return h
}

const (
  // Maximum number of key/elem pairs a bucket can hold.
  bucketCntBits = 3
  bucketCnt     = 1 << bucketCntBits
)

bucketCntruntime.makemap_smallmakemap_small

runtime.makemap

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.

// 如果编译器发现map或者它第一个bucket能够被分配在栈上，则传入的h参数将不为空。
func makemap(t *maptype, hint int, h *hmap) *hmap {
  mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
  if overflow || mem > maxAlloc {
    hint = 0
  }

  // initialize Hmap
  if h == nil {
    h = new(hmap)
  }
  h.hash0 = fastrand()

  // Find the size parameter B which will hold the requested # of elements.
  // For hint < 0 overLoadFactor returns false since hint < bucketCnt.

  // hint表示创建map的预期初始大小，因此要计算一个合适的对数B
  // overLoadFactor即用于判断当前的2^B是否已经超过hint
  B := uint8(0)
  for overLoadFactor(hint, B) {
    B++
  }
  h.B = B

  // allocate initial hash table
  // if B == 0, the buckets field is allocated lazily later (in mapassign)
  // If hint is large zeroing this memory could take a while.
  
  // 如果B == 0，则Buckets数组将会延迟初始化，直到调用mapassign给该map存值
  if h.B != 0 {
    var nextOverflow *bmap
    h.buckets, nextOverflow = makeBucketArray(t, h.B, nil) // 给Buckets数组分配内存
    if nextOverflow != nil {
      h.extra = new(mapextra)
      h.extra.nextOverflow = nextOverflow
    }
  }

  return h
}

runtime.makemap64

func makemap64(t *maptype, hint int64, h *hmap) *hmap {
  if int64(int(hint)) != hint {
    hint = 0
  }
  return makemap(t, int(hint), h)
}

小结一下，map的创建底层有三种实现：

makemap_smallmakemap64makemakemapmakemap

选择Hash函数

map底层使用的Hash函数在runtime层的初始化中会根据CPU架构和一些系统支持来选择。让我们翻阅相关源码：

func alginit() {
  // Install AES hash algorithms if the instructions needed are present.
  
  // 对于386/amd64/arm64都优先判断是否支持AES Hash函数
  if (GOARCH == "386" || GOARCH == "amd64") &&
    cpu.X86.HasAES && // AESENC
    cpu.X86.HasSSSE3 && // PSHUFB
    cpu.X86.HasSSE41 { // PINSR{D,Q}
    initAlgAES()
    return
  }
  if GOARCH == "arm64" && cpu.ARM64.HasAES {
    initAlgAES()
    return
  }
  getRandomData((*[len(hashkey) * sys.PtrSize]byte)(unsafe.Pointer(&hashkey))[:])
  hashkey[0] |= 1 // make sure these numbers are odd
  hashkey[1] |= 1
  hashkey[2] |= 1
  hashkey[3] |= 1
}

由上述源码可知，map的Hash函数优先选择AES，不支持的情况下进入默认逻辑。

map的访问

概述

                                                             map访问示例，B = 5

map的访问即通过给定的key在map中寻找其对应value，它的大致步骤如下：

bmap[]bmapbmaptophashbmapbmaptophashtophashbmapoverflowoverflowbmap

基本用法

map的访问方式直接通过key下标获取，返回值可以是一个value，也可以在此基础上多一个表示是否存在的bool：

  m := map[int8]int{
    1: 1,
    2: 2,
    3: 3,
  }

  v1 := m[1]
  v2, ok := m[2] // ok是一个bool值，用于判断是否存在2这个key

  fmt.Println(v1)
  fmt.Println(v2, ok)

让我们通过go的官方工具来查看汇编结果：

$ go tool compile -S main.go
"".main STEXT size=741 args=0x0 locals=0x120 funcid=0x0
  0x0000 00000 (main.go:5)  TEXT  "".main(SB), ABIInternal, $288-0
  0x0000 00000 (main.go:5)  MOVQ  (TLS), CX
  0x0009 00009 (main.go:5)  LEAQ  -160(SP), AX
  0x0011 00017 (main.go:5)  CMPQ  AX, 16(CX)
  0x0015 00021 (main.go:5)  PCDATA  $0, $-2
  0x0015 00021 (main.go:5)  JLS  726
  0x001b 00027 (main.go:5)  PCDATA  $0, $-1
  0x001b 00027 (main.go:5)  SUBQ  $288, SP
  0x0022 00034 (main.go:5)  MOVQ  BP, 280(SP)
  0x002a 00042 (main.go:5)  LEAQ  280(SP), BP
  0x0032 00050 (main.go:5)  FUNCDATA  $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
  0x0032 00050 (main.go:5)  FUNCDATA  $1, gclocals·164458436596f10dcbb268ec85fd5e10(SB)
  0x0032 00050 (main.go:5)  FUNCDATA  $2, "".main.stkobj(SB)
  0x0032 00050 (main.go:6)  XORPS  X0, X0
  0x0035 00053 (main.go:6)  MOVUPS  X0, ""..autotmp_20+232(SP)
  0x003d 00061 (main.go:6)  MOVUPS  X0, ""..autotmp_20+248(SP)
  0x0045 00069 (main.go:6)  MOVUPS  X0, ""..autotmp_20+264(SP)
  0x004d 00077 (main.go:6)  MOVQ  $0, ""..autotmp_21+96(SP)
  0x0056 00086 (main.go:6)  LEAQ  ""..autotmp_21+104(SP), DI
  0x005b 00091 (main.go:6)  PCDATA  $0, $-2
  0x005b 00091 (main.go:6)  LEAQ  -48(DI), DI
  0x005f 00095 (main.go:6)  NOP
  0x0060 00096 (main.go:6)  DUFFZERO  $277
  0x0073 00115 (main.go:6)  PCDATA  $0, $-1
  0x0073 00115 (main.go:6)  LEAQ  ""..autotmp_21+96(SP), AX
  0x0078 00120 (main.go:6)  MOVQ  AX, ""..autotmp_20+248(SP)
  0x0080 00128 (main.go:6)  PCDATA  $1, $1
  0x0080 00128 (main.go:6)  CALL  runtime.fastrand(SB)
  0x0085 00133 (main.go:6)  MOVL  (SP), AX
  0x0088 00136 (main.go:6)  MOVL  AX, ""..autotmp_20+244(SP)
  0x008f 00143 (main.go:7)  MOVB  $1, ""..autotmp_24+71(SP)
  0x0094 00148 (main.go:7)  LEAQ  type.map[int8]int(SB), AX
  0x009b 00155 (main.go:7)  MOVQ  AX, (SP)
  0x009f 00159 (main.go:7)  LEAQ  ""..autotmp_20+232(SP), CX
  0x00a7 00167 (main.go:7)  MOVQ  CX, 8(SP)
  0x00ac 00172 (main.go:7)  LEAQ  ""..autotmp_24+71(SP), DX
  0x00b1 00177 (main.go:7)  MOVQ  DX, 16(SP)
  0x00b6 00182 (main.go:7)  CALL  runtime.mapassign(SB)
  0x00bb 00187 (main.go:7)  MOVQ  24(SP), AX
  0x00c0 00192 (main.go:7)  MOVQ  $1, (AX)
  0x00c7 00199 (main.go:8)  MOVB  $2, ""..autotmp_24+71(SP)
  0x00cc 00204 (main.go:8)  LEAQ  type.map[int8]int(SB), AX
  0x00d3 00211 (main.go:8)  MOVQ  AX, (SP)
  0x00d7 00215 (main.go:8)  LEAQ  ""..autotmp_20+232(SP), CX
  0x00df 00223 (main.go:8)  MOVQ  CX, 8(SP)
  0x00e4 00228 (main.go:8)  LEAQ  ""..autotmp_24+71(SP), DX
  0x00e9 00233 (main.go:8)  MOVQ  DX, 16(SP)
  0x00ee 00238 (main.go:8)  CALL  runtime.mapassign(SB)
  0x00f3 00243 (main.go:8)  MOVQ  24(SP), AX
  0x00f8 00248 (main.go:8)  MOVQ  $2, (AX)
  0x00ff 00255 (main.go:9)  MOVB  $3, ""..autotmp_24+71(SP)
  0x0104 00260 (main.go:9)  LEAQ  type.map[int8]int(SB), AX
  0x010b 00267 (main.go:9)  MOVQ  AX, (SP)
  0x010f 00271 (main.go:9)  LEAQ  ""..autotmp_20+232(SP), CX
  0x0117 00279 (main.go:9)  MOVQ  CX, 8(SP)
  0x011c 00284 (main.go:9)  LEAQ  ""..autotmp_24+71(SP), DX
  0x0121 00289 (main.go:9)  MOVQ  DX, 16(SP)
  0x0126 00294 (main.go:9)  CALL  runtime.mapassign(SB)
  0x012b 00299 (main.go:9)  MOVQ  24(SP), AX
  0x0130 00304 (main.go:9)  MOVQ  $3, (AX)
  0x0137 00311 (main.go:12)  LEAQ  type.map[int8]int(SB), AX
  0x013e 00318 (main.go:12)  MOVQ  AX, (SP)
  0x0142 00322 (main.go:12)  LEAQ  ""..autotmp_20+232(SP), CX
  0x014a 00330 (main.go:12)  MOVQ  CX, 8(SP)
  0x014f 00335 (main.go:12)  LEAQ  ""..stmp_0(SB), DX
  0x0156 00342 (main.go:12)  MOVQ  DX, 16(SP)
  0x015b 00347 (main.go:12)  NOP
  0x0160 00352 (main.go:12)  CALL  runtime.mapaccess1(SB)
  0x0165 00357 (main.go:12)  MOVQ  24(SP), AX
  0x016a 00362 (main.go:12)  MOVQ  (AX), AX
  0x016d 00365 (main.go:12)  MOVQ  AX, "".v1+80(SP)
  0x0172 00370 (main.go:13)  LEAQ  type.map[int8]int(SB), CX
  0x0179 00377 (main.go:13)  MOVQ  CX, (SP)
  0x017d 00381 (main.go:13)  LEAQ  ""..autotmp_20+232(SP), CX
  0x0185 00389 (main.go:13)  MOVQ  CX, 8(SP)
  0x018a 00394 (main.go:13)  LEAQ  ""..stmp_1(SB), CX
  0x0191 00401 (main.go:13)  MOVQ  CX, 16(SP)
  0x0196 00406 (main.go:13)  PCDATA  $1, $0
  0x0196 00406 (main.go:13)  CALL  runtime.mapaccess2(SB)
  0x019b 00411 (main.go:13)  MOVQ  24(SP), AX
  0x01a0 00416 (main.go:13)  MOVQ  (AX), AX
  0x01a3 00419 (main.go:13)  MOVQ  AX, "".v2+72(SP)
  0x01a8 00424 (main.go:13)  MOVBLZX  32(SP), CX
  0x01ad 00429 (main.go:13)  MOVQ  CX, ""..autotmp_53+88(SP)
  0x01b2 00434 (main.go:15)  MOVQ  "".v1+80(SP), DX
  0x01b7 00439 (main.go:15)  MOVQ  DX, (SP)
  0x01bb 00443 (main.go:15)  NOP
  0x01c0 00448 (main.go:15)  CALL  runtime.convT64(SB)
  0x01c5 00453 (main.go:15)  MOVQ  8(SP), AX
  0x01ca 00458 (main.go:15)  XORPS  X1, X1
  0x01cd 00461 (main.go:15)  MOVUPS  X1, ""..autotmp_34+184(SP)
  0x01d5 00469 (main.go:15)  LEAQ  type.int(SB), CX
  0x01dc 00476 (main.go:15)  MOVQ  CX, ""..autotmp_34+184(SP)
  0x01e4 00484 (main.go:15)  MOVQ  AX, ""..autotmp_34+192(SP)
  ... 省略不相关的部分

runtime.mapaccess1runtime.mapaccess2

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)

mapaccess2mapaccess1boolmapaccess1

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
  if raceenabled && h != nil {
    callerpc := getcallerpc()
    pc := funcPC(mapaccess1)
    racereadpc(unsafe.Pointer(h), callerpc, pc)
    raceReadObjectPC(t.key, key, callerpc, pc)
  }
  if msanenabled && h != nil {
    msanread(key, t.key.size)
  }
  if h == nil || h.count == 0 {
    if t.hashMightPanic() {
      t.hasher(key, 0) // see issue 23734
    }
    return unsafe.Pointer(&zeroVal[0])
  }

  // map不允许并发读写，会触发该panic。这里是通过flags的标记位来判断是否正在进行写操作。
  if h.flags&hashWriting != 0 {
    throw("concurrent map read and map write")
  }
  hash := t.hasher(key, uintptr(h.hash0))
  m := bucketMask(h.B) // m即后B位，用于定位[]bmap中对应的bmap。
  b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize))) // 定位到具体的bmap。
  if c := h.oldbuckets; c != nil {
    if !h.sameSizeGrow() {
      // There used to be half as many buckets; mask down one more power of two.
      m >>= 1
    }
    oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
    if !evacuated(oldb) {
      b = oldb
    }
  }
  top := tophash(hash) // tophash即首8位，用于快速比较。
bucketloop:
  // 遍历迭代bmap和它的overflow bmap。
  for ; b != nil; b = b.overflow(t) {
    for i := uintptr(0); i < bucketCnt; i++ {
      // 首先比较tophash是否相同，不相同会直接continue。
      if b.tophash[i] != top { 
        if b.tophash[i] == emptyRest {
          break bucketloop
        }
        continue
      }
      k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
      if t.indirectkey() {
        k = *((*unsafe.Pointer)(k))
      }

      // tophash相同的情况下，会比较key的值是否相同。若相同，则说明已经定位到该key，返回结果。
      if t.key.equal(key, k) {
        e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
        if t.indirectelem() {
          e = *((*unsafe.Pointer)(e))
        }
        return e
      }
    }
  }
  return unsafe.Pointer(&zeroVal[0])
}

key是string/32位整型/64位整型

stringint32/uint32int64/uint64

func mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
func mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool)

func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
func mapaccess2_fast64(t *maptype, h *hmap, key uint64) (unsafe.Pointer, bool)

func mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer
func mapaccess2_faststr(t *maptype, h *hmap, ky string) (unsafe.Pointer, bool)

mapaccess1mapaccess2

func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {
  if raceenabled && h != nil {
    callerpc := getcallerpc()
    racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess1_fast64))
  }
  if h == nil || h.count == 0 {
    return unsafe.Pointer(&zeroVal[0])
  }
  if h.flags&hashWriting != 0 {
    throw("concurrent map read and map write")
  }
  var b *bmap
  if h.B == 0 {
    // One-bucket table. No need to hash.
    b = (*bmap)(h.buckets)
  } else {
    hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
    m := bucketMask(h.B)
    b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {
      if !h.sameSizeGrow() {
        // There used to be half as many buckets; mask down one more power of two.
        m >>= 1
      }
      oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
      if !evacuated(oldb) {
        b = oldb
      }
    }
  }
  for ; b != nil; b = b.overflow(t) {
    for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
      if *(*uint64)(k) == key && !isEmpty(b.tophash[i]) { // 直接比较key，没有先比较tophash再比较key。
        return add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.elemsize))
      }
    }
  }
  return unsafe.Pointer(&zeroVal[0])
}

map的赋值

概述

bmaptophashbmaptophash

基本用法

package main

import "fmt"

func main() {
  m := make(map[int8]int)

  m[1] = 1

  fmt.Println(m)
}

让我们通过go的官方工具来查看汇编结果：

$ go tool compile -S main.go

"".main STEXT size=202 args=0x0 locals=0x60 funcid=0x0
  0x0000 00000 (main.go:5)  TEXT  "".main(SB), ABIInternal, $96-0
  0x0000 00000 (main.go:5)  MOVQ  (TLS), CX
  0x0009 00009 (main.go:5)  CMPQ  SP, 16(CX)
  0x000d 00013 (main.go:5)  PCDATA  $0, $-2
  0x000d 00013 (main.go:5)  JLS  192
  0x0013 00019 (main.go:5)  PCDATA  $0, $-1
  0x0013 00019 (main.go:5)  SUBQ  $96, SP
  0x0017 00023 (main.go:5)  MOVQ  BP, 88(SP)
  0x001c 00028 (main.go:5)  LEAQ  88(SP), BP
  0x0021 00033 (main.go:5)  FUNCDATA  $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
  0x0021 00033 (main.go:5)  FUNCDATA  $1, gclocals·713abd6cdf5e052e4dcd3eb297c82601(SB)
  0x0021 00033 (main.go:5)  FUNCDATA  $2, "".main.stkobj(SB)
  0x0021 00033 (main.go:6)  PCDATA  $1, $0
  0x0021 00033 (main.go:6)  CALL  runtime.makemap_small(SB)
  0x0026 00038 (main.go:6)  MOVQ  (SP), AX
  0x002a 00042 (main.go:6)  MOVQ  AX, ""..autotmp_20+64(SP)
  0x002f 00047 (main.go:8)  LEAQ  type.map[int8]int(SB), CX
  0x0036 00054 (main.go:8)  MOVQ  CX, (SP)
  0x003a 00058 (main.go:8)  MOVQ  AX, 8(SP)
  0x003f 00063 (main.go:8)  LEAQ  ""..stmp_0(SB), DX
  0x0046 00070 (main.go:8)  MOVQ  DX, 16(SP)
  0x004b 00075 (main.go:8)  PCDATA  $1, $1
  0x004b 00075 (main.go:8)  CALL  runtime.mapassign(SB)
  0x0050 00080 (main.go:8)  MOVQ  24(SP), AX
  0x0055 00085 (main.go:8)  MOVQ  $1, (AX)
  0x005c 00092 (main.go:10)  XORPS  X0, X0
  0x005f 00095 (main.go:10)  MOVUPS  X0, ""..autotmp_14+72(SP)
  0x0064 00100 (main.go:10)  LEAQ  type.map[int8]int(SB), AX
  0x006b 00107 (main.go:10)  MOVQ  AX, ""..autotmp_14+72(SP)
  0x0070 00112 (main.go:10)  MOVQ  ""..autotmp_20+64(SP), AX
  0x0075 00117 (main.go:10)  MOVQ  AX, ""..autotmp_14+80(SP)
  ... 省略不相关的部分

runtime.mapassign

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
  if h == nil {
    panic(plainError("assignment to entry in nil map"))
  }
  if raceenabled {
    callerpc := getcallerpc()
    pc := funcPC(mapassign)
    racewritepc(unsafe.Pointer(h), callerpc, pc)
    raceReadObjectPC(t.key, key, callerpc, pc)
  }
  if msanenabled {
    msanread(key, t.key.size)
  }
  // map不允许并发读写，通过flags的位判断。
  if h.flags&hashWriting != 0 {
    throw("concurrent map writes")
  }
  hash := t.hasher(key, uintptr(h.hash0))

  // Set hashWriting after calling t.hasher, since t.hasher may panic,
  // in which case we have not actually done a write.
  // 因为赋值是写操作，因此置位flags写标志。
  h.flags ^= hashWriting

  if h.buckets == nil {
    h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
  }

again:
  bucket := hash & bucketMask(h.B)
  if h.growing() { // 判断是否处于扩容中。
    growWork(t, h, bucket)
  }
  b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize))) // 定位bmap
  top := tophash(hash) // 计算tophash

  var inserti *uint8
  var insertk unsafe.Pointer
  var elem unsafe.Pointer
bucketloop:
  for {
    for i := uintptr(0); i < bucketCnt; i++ {
      if b.tophash[i] != top {
        // 当遇到第一个空闲位置的tophash，记录下来。
        // 遍历完整个bmap及其overflow bmap，没有找到该key，则认为map中不存在该key。
        // 最终插入位置即这里记录的位置。
        if isEmpty(b.tophash[i]) && inserti == nil {
          inserti = &b.tophash[i]
          insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
          elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
        }
        if b.tophash[i] == emptyRest {
          break bucketloop
        }
        continue
      }
      k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
      if t.indirectkey() {
        k = *((*unsafe.Pointer)(k))
      }
      if !t.key.equal(key, k) {
        continue
      }
      // already have a mapping for key. Update it.
      // 表明在map中已经存在这个key，那么就需要对该key的值进行更新操作。
      if t.needkeyupdate() {
        typedmemmove(t.key, k, key)
      }
      elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
      goto done
    }
    ovf := b.overflow(t) // 继续遍历下一个overflow bmap
    if ovf == nil {
      break
    }
    b = ovf
  }

  // Did not find mapping for key. Allocate new cell & add entry.

  // If we hit the max load factor or we have too many overflow buckets,
  // and we're not already in the middle of growing, start growing.

  // 触发扩容的情况。
  if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
    hashGrow(t, h)
    goto again // Growing the table invalidates everything, so try again
  }

  if inserti == nil {
    // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
    newb := h.newoverflow(t, b)
    inserti = &newb.tophash[0]
    insertk = add(unsafe.Pointer(newb), dataOffset)
    elem = add(insertk, bucketCnt*uintptr(t.keysize))
  }

  // store new key/elem at insert position
  if t.indirectkey() {
    kmem := newobject(t.key)
    *(*unsafe.Pointer)(insertk) = kmem
    insertk = kmem
  }
  if t.indirectelem() {
    vmem := newobject(t.elem)
    *(*unsafe.Pointer)(elem) = vmem
  }
  typedmemmove(t.key, insertk, key)
  *inserti = top
  h.count++

done:
  if h.flags&hashWriting == 0 {
    throw("concurrent map writes")
  }
  h.flags &^= hashWriting // 将写操作标记位置空。
  if t.indirectelem() {
    elem = *((*unsafe.Pointer)(elem))
  }
  return elem
}

key是string/32位整型/64位整型

stringint32/uint32int64/uint64

func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer

func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer

func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer

map的扩容和迁移

概述

map扩容的目的在于减少Hash冲突，防止算法复杂度退化，保持Hash算法O(1)的时间复杂度。

map的扩容对使用方不可见，开发者在使用map的过程中是不会感知到map是否处于扩容中，以及当前扩容的进度。Go语言的map扩容是渐进式的，即整个扩容过程拆散在每一次的写操作里面，这样做的好处是保证每一次map的读写操作时间复杂度都是稳定的。

map触发扩容的时机有两个：

map的负载因子（长度与容量的比例）超过阈值，此时map被认为无法承担更多的key，需要进行2倍扩容。

                                                             负载因子超过阈值

2.map存在局部bmap包含过多overflow的情况，此时map会认为局部的bmap可以进行tophash密集排列，让overflow数量更少

                                                        局部overflow过多

扩容过程源码

首先，我们通过源码来查看如何判断负载因子是否超过阈值：

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
  return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

// bucketShift returns 1<<b, optimized for code generation.
func bucketShift(b uint8) uintptr {
  // Masking the shift amount allows overflow checks to be elided.
  return uintptr(1) << (b & (sys.PtrSize*8 - 1))
}

const (
  // Maximum number of key/elem pairs a bucket can hold.
  bucketCntBits = 3
  bucketCnt     = 1 << bucketCntBits

  // Maximum average load of a bucket that triggers growth is 6.5.
  // Represent as loadFactorNum/loadFactorDen, to allow integer math.
  loadFactorNum = 13
  loadFactorDen = 2
)

loadFactorNumloadFactorDencountbucketShift(B)

hashGrow

func hashGrow(t *maptype, h *hmap) {
  // If we've hit the load factor, get bigger.
  // Otherwise, there are too many overflow buckets,
  // so keep the same number of buckets and "grow" laterally.
  bigger := uint8(1)
  if !overLoadFactor(h.count+1, h.B) { // 如果不是负载因子超过阈值，那么本次map的扩容实际上更应该理解为map的整理，容量不变。
    bigger = 0
    h.flags |= sameSizeGrow // 标记sameSizeGrow，表示容量不变。
  }
  oldbuckets := h.buckets // 保留旧的buckets。
  newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil) // 创建新的buckets。

  flags := h.flags &^ (iterator | oldIterator)
  if h.flags&iterator != 0 {
    flags |= oldIterator
  }
  // commit the grow (atomic wrt gc)
  h.B += bigger
  h.flags = flags
  h.oldbuckets = oldbuckets
  h.buckets = newbuckets
  h.nevacuate = 0 // 表示搬迁进度，从0开始将会依次递增，每调用一次evacuate()会自增一次。
  h.noverflow = 0

  if h.extra != nil && h.extra.overflow != nil {
    // Promote current overflow buckets to the old generation.
    if h.extra.oldoverflow != nil {
      throw("oldoverflow is not nil")
    }
    h.extra.oldoverflow = h.extra.overflow
    h.extra.overflow = nil
  }
  if nextOverflow != nil {
    if h.extra == nil {
      h.extra = new(mapextra)
    }
    h.extra.nextOverflow = nextOverflow
  }

  // the actual copying of the hash table data is done incrementally
  // by growWork() and evacuate().
}

evacuate

// evacDst is an evacuation destination.
type evacDst struct {
  b *bmap          // current destination bucket // 搬迁目标bmap。
  i int            // key/elem index into b // 目标bmap的当前索引。
  k unsafe.Pointer // pointer to current key storage // 当前key。
  e unsafe.Pointer // pointer to current elem storage // 当前value。
}

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
  b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))) // 定位旧bmap。
  newbit := h.noldbuckets()
  if !evacuated(b) { // 判断是否已经被搬迁过。
    // TODO: reuse overflow buckets instead of using new ones, if there
    // is no iterator using the old buckets.  (If !oldIterator.)

    // xy contains the x and y (low and high) evacuation destinations.
    var xy [2]evacDst
    x := &xy[0]
    x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
    x.k = add(unsafe.Pointer(x.b), dataOffset)
    x.e = add(x.k, bucketCnt*uintptr(t.keysize))

    if !h.sameSizeGrow() { // 对于2倍扩容的场景，搬迁目标可能有2个，因此xy[1]记录另一个目标。
      // Only calculate y pointers if we're growing bigger.
      // Otherwise GC can see bad pointers.
      y := &xy[1]
      y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
      y.k = add(unsafe.Pointer(y.b), dataOffset)
      y.e = add(y.k, bucketCnt*uintptr(t.keysize))
    }

    for ; b != nil; b = b.overflow(t) { // 遍历当前的bmap。
      k := add(unsafe.Pointer(b), dataOffset)
      e := add(k, bucketCnt*uintptr(t.keysize))
      for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
        top := b.tophash[i]
        if isEmpty(top) {
          b.tophash[i] = evacuatedEmpty
          continue
        }
        if top < minTopHash {
          throw("bad map state")
        }
        k2 := k
        if t.indirectkey() {
          k2 = *((*unsafe.Pointer)(k2))
        }
        var useY uint8
        if !h.sameSizeGrow() { // 对于2倍扩容的场景，要重新计算tophash
          // Compute hash to make our evacuation decision (whether we need
          // to send this key/elem to bucket x or bucket y).
          hash := t.hasher(k2, uintptr(h.hash0))
          if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
            // If key != key (NaNs), then the hash could be (and probably
            // will be) entirely different from the old hash. Moreover,
            // it isn't reproducible. Reproducibility is required in the
            // presence of iterators, as our evacuation decision must
            // match whatever decision the iterator made.
            // Fortunately, we have the freedom to send these keys either
            // way. Also, tophash is meaningless for these kinds of keys.
            // We let the low bit of tophash drive the evacuation decision.
            // We recompute a new random tophash for the next level so
            // these keys will get evenly distributed across all buckets
            // after multiple grows.
            useY = top & 1
            top = tophash(hash)
          } else {
            if hash&newbit != 0 {
              useY = 1
            }
          }
        }

        if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
          throw("bad evacuatedN")
        }

        b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
        dst := &xy[useY]                 // evacuation destination

        if dst.i == bucketCnt {
          dst.b = h.newoverflow(t, dst.b)
          dst.i = 0
          dst.k = add(unsafe.Pointer(dst.b), dataOffset)
          dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
        }
        dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check

        // 拷贝内存数据到新的搬迁位置
        if t.indirectkey() {
          *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
        } else {
          typedmemmove(t.key, dst.k, k) // copy elem
        }
        if t.indirectelem() {
          *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
        } else {
          typedmemmove(t.elem, dst.e, e)
        }
        dst.i++
        // These updates might push these pointers past the end of the
        // key or elem arrays.  That's ok, as we have the overflow pointer
        // at the end of the bucket to protect against pointing past the
        // end of the bucket.
        dst.k = add(dst.k, uintptr(t.keysize))
        dst.e = add(dst.e, uintptr(t.elemsize))
      }
    }
    // Unlink the overflow buckets & clear key/elem to help GC.
    // 把旧的bmap引用解除，便于GC完成内存回收。
    if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
      b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
      // Preserve b.tophash because the evacuation
      // state is maintained there.
      ptr := add(b, dataOffset)
      n := uintptr(t.bucketsize) - dataOffset
      memclrHasPointers(ptr, n)
    }
  }

  if oldbucket == h.nevacuate {
    advanceEvacuationMark(h, t, newbit)
  }
}

扩容中插入新的数据会放入到新桶中。

扩容

在向map中添加数据时，当达到某个条件，则会引发字典扩容。

扩容条件：

map中数据总个数 / 桶个数 > 6.5 ，引发翻倍扩容。
使用了太多的溢出桶时（溢出桶使用的太多会导致map处理速度降低）。
- B <=15，已使用的溢出桶个数 >= \(2^B\) 时，引发等量扩容。
- B > 15，已使用的溢出桶个数 >= \(2^{15}\) 时，引发等量扩容。

双倍扩容：装载因子多大，直接翻倍，B+1；扩容也不是申请一块内存，立马开始拷贝，每一次访问旧的 buckets 时，就迁移一部分，直到完成，旧 bucket 被 GC 回收。

等量扩容：重新排列，极端情况下，重新排列也解决不了，map 成了链表，性能大大降低，此时哈希种子 hash0 的设置，可以降低此类极端场景的发生。

迁移

扩容之后，必然要伴随着数据的迁移，即：将旧桶中的数据要迁移到新桶中。

翻倍扩容

如果是翻倍扩容，那么迁移规就是将旧桶中的数据分流至新的两个桶中（比例不定），并且桶编号的位置为：同编号位置和翻倍后对应编号位置

那么问题来了，如何实现的这种迁移呢？

新桶的B的值=原桶B + 1

底B位

扩容后，B的值在原来的基础上已加1，也就意味着通过多1位来计算此键值对要分流到新桶位置，如上图：

当新增的位（红色）的值为 0，则数据会迁移到与旧桶编号一致的位置。
当新增的位（红色）的值为 1，则数据会迁移到翻倍后对应编号位置。

等量扩容

如果是等量扩容（溢出桶太多引发的扩容），那么数据迁移机制就会比较简单，就是将旧桶（含溢出桶）中的值迁移到新桶中。

这种扩容和迁移的意义在于：当溢出桶比较多而每个桶中的数据又不多时，可以通过等量扩容和迁移让数据更紧凑，从而减少溢出桶。

扩容和迁移总结

扩容

底B位

map的删除

概述

bmapoverflowtophash

基本用法

builtindelete

package main

func main() {
  m := make(map[int8]int)

  delete(m, 1)
}

让我们通过go的官方工具来查看汇编结果：

$ go tool compile -S main.go
"".main STEXT size=204 args=0x0 locals=0xa8 funcid=0x0
  0x0000 00000 (main.go:3)  TEXT  "".main(SB), ABIInternal, $168-0
  0x0000 00000 (main.go:3)  MOVQ  (TLS), CX
  0x0009 00009 (main.go:3)  LEAQ  -40(SP), AX
  0x000e 00014 (main.go:3)  CMPQ  AX, 16(CX)
  0x0012 00018 (main.go:3)  PCDATA  $0, $-2
  0x0012 00018 (main.go:3)  JLS  194
  0x0018 00024 (main.go:3)  PCDATA  $0, $-1
  0x0018 00024 (main.go:3)  SUBQ  $168, SP
  0x001f 00031 (main.go:3)  MOVQ  BP, 160(SP)
  0x0027 00039 (main.go:3)  LEAQ  160(SP), BP
  0x002f 00047 (main.go:3)  FUNCDATA  $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
  0x002f 00047 (main.go:3)  FUNCDATA  $1, gclocals·d9a525d1de6bc16975a5efbb873db17d(SB)
  0x002f 00047 (main.go:3)  FUNCDATA  $2, "".main.stkobj(SB)
  0x002f 00047 (main.go:4)  XORPS  X0, X0
  0x0032 00050 (main.go:4)  MOVUPS  X0, ""..autotmp_1+112(SP)
  0x0037 00055 (main.go:4)  MOVUPS  X0, ""..autotmp_1+128(SP)
  0x003f 00063 (main.go:4)  MOVUPS  X0, ""..autotmp_1+144(SP)
  0x0047 00071 (main.go:4)  MOVQ  $0, ""..autotmp_2+24(SP)
  0x0050 00080 (main.go:4)  LEAQ  ""..autotmp_2+32(SP), DI
  0x0055 00085 (main.go:4)  PCDATA  $0, $-2
  0x0055 00085 (main.go:4)  LEAQ  -48(DI), DI
  0x0059 00089 (main.go:4)  NOP
  0x0060 00096 (main.go:4)  DUFFZERO  $277
  0x0073 00115 (main.go:4)  PCDATA  $0, $-1
  0x0073 00115 (main.go:4)  LEAQ  ""..autotmp_2+24(SP), AX
  0x0078 00120 (main.go:4)  MOVQ  AX, ""..autotmp_1+128(SP)
  0x0080 00128 (main.go:4)  PCDATA  $1, $1
  0x0080 00128 (main.go:4)  CALL  runtime.fastrand(SB)
  0x0085 00133 (main.go:4)  MOVL  (SP), AX
  0x0088 00136 (main.go:4)  MOVL  AX, ""..autotmp_1+124(SP)
  0x008c 00140 (main.go:6)  LEAQ  type.map[int8]int(SB), AX
  0x0093 00147 (main.go:6)  MOVQ  AX, (SP)
  0x0097 00151 (main.go:6)  LEAQ  ""..autotmp_1+112(SP), AX
  0x009c 00156 (main.go:6)  MOVQ  AX, 8(SP)
  0x00a1 00161 (main.go:6)  LEAQ  ""..stmp_0(SB), AX
  0x00a8 00168 (main.go:6)  MOVQ  AX, 16(SP)
  0x00ad 00173 (main.go:6)  PCDATA  $1, $0
  0x00ad 00173 (main.go:6)  CALL  runtime.mapdelete(SB)
  0x00b2 00178 (main.go:7)  MOVQ  160(SP), BP
  0x00ba 00186 (main.go:7)  ADDQ  $168, SP
  0x00c1 00193 (main.go:7)  RET
  0x00c2 00194 (main.go:7)  NOP
  0x00c2 00194 (main.go:3)  PCDATA  $1, $-1
  0x00c2 00194 (main.go:3)  PCDATA  $0, $-2
  0x00c2 00194 (main.go:3)  CALL  runtime.morestack_noctxt(SB)
  0x00c7 00199 (main.go:3)  PCDATA  $0, $-1
  0x00c7 00199 (main.go:3)  JMP  0
  ... 省略不相关的部分

runtime.mapdelete

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
  if raceenabled && h != nil {
    callerpc := getcallerpc()
    pc := funcPC(mapdelete)
    racewritepc(unsafe.Pointer(h), callerpc, pc)
    raceReadObjectPC(t.key, key, callerpc, pc)
  }
  if msanenabled && h != nil {
    msanread(key, t.key.size)
  }
  if h == nil || h.count == 0 {
    if t.hashMightPanic() {
      t.hasher(key, 0) // see issue 23734
    }
    return
  }
  if h.flags&hashWriting != 0 {
    throw("concurrent map writes")
  }

  hash := t.hasher(key, uintptr(h.hash0))

  // Set hashWriting after calling t.hasher, since t.hasher may panic,
  // in which case we have not actually done a write (delete).
  h.flags ^= hashWriting

  bucket := hash & bucketMask(h.B)
  if h.growing() {
    growWork(t, h, bucket)
  }
  b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
  bOrig := b
  top := tophash(hash)
search:
  for ; b != nil; b = b.overflow(t) {
    for i := uintptr(0); i < bucketCnt; i++ {
      if b.tophash[i] != top {
        if b.tophash[i] == emptyRest {
          break search
        }
        continue
      }
      k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
      k2 := k
      if t.indirectkey() {
        k2 = *((*unsafe.Pointer)(k2))
      }
      if !t.key.equal(key, k2) {
        continue
      }
      // 删除key
      // Only clear key if there are pointers in it.
      if t.indirectkey() {
        *(*unsafe.Pointer)(k) = nil
      } else if t.key.ptrdata != 0 {
        memclrHasPointers(k, t.key.size)
      }

      // 删除value
      e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
      if t.indirectelem() {
        *(*unsafe.Pointer)(e) = nil
      } else if t.elem.ptrdata != 0 {
        memclrHasPointers(e, t.elem.size)
      } else {
        memclrNoHeapPointers(e, t.elem.size)
      }

      // 清空tophash
      b.tophash[i] = emptyOne
      // If the bucket now ends in a bunch of emptyOne states,
      // change those to emptyRest states.
      // It would be nice to make this a separate function, but
      // for loops are not currently inlineable.
      if i == bucketCnt-1 {
        if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
          goto notLast
        }
      } else {
        if b.tophash[i+1] != emptyRest {
          goto notLast
        }
      }
      for {
        b.tophash[i] = emptyRest
        if i == 0 {
          if b == bOrig {
            break // beginning of initial bucket, we're done.
          }
          // Find previous bucket, continue at its last entry.
          c := b
          for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
          }
          i = bucketCnt - 1
        } else {
          i--
        }
        if b.tophash[i] != emptyOne {
          break
        }
      }
    notLast:
      h.count--
      // Reset the hash seed to make it more difficult for attackers to
      // repeatedly trigger hash collisions. See issue 25237.
      if h.count == 0 {
        h.hash0 = fastrand()
      }
      break search
    }
  }

  if h.flags&hashWriting == 0 {
    throw("concurrent map writes")
  }
  h.flags &^= hashWriting
}

key是string/32位整型/64位整型

与map的访问同理，当map对应的key类型是string、int32/uint32、int64/uint64其中之一，Go语言的runtime层会调用不同函数，签名如下：

func mapdelete_fast32(t *maptype, h *hmap, key uint32)
func mapdelete_fast64(t *maptype, h *hmap, key uint64)
func mapdelete_faststr(t *maptype, h *hmap, ky string)

map的迭代

概述

nevacuatebmapbmap

基本用法

for range

package main

import "fmt"

func main() {
  m := make(map[int8]int)

  for k, v := range m {
    fmt.Println(k, v)
  }
}

runtime.mapiterinitruntime.mapiternext

// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
  if raceenabled && h != nil {
    callerpc := getcallerpc()
    racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
  }

  if h == nil || h.count == 0 {
    return
  }

  if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
    throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
  }
  it.t = t
  it.h = h

  // grab snapshot of bucket state
  it.B = h.B
  it.buckets = h.buckets
  if t.bucket.ptrdata == 0 {
    // Allocate the current slice and remember pointers to both current and old.
    // This preserves all relevant overflow buckets alive even if
    // the table grows and/or overflow buckets are added to the table
    // while we are iterating.
    h.createOverflow()
    it.overflow = h.extra.overflow
    it.oldoverflow = h.extra.oldoverflow
  }

  // decide where to start
  // 可以看出，map的迭代器会随机选择一个开始的bucket位置和开始的offset。
  r := uintptr(fastrand())
  if h.B > 31-bucketCntBits {
    r += uintptr(fastrand()) << 31
  }
  it.startBucket = r & bucketMask(h.B)
  it.offset = uint8(r >> h.B & (bucketCnt - 1))

  // iterator state
  it.bucket = it.startBucket // 记录随机的开始位置。

  // Remember we have an iterator.
  // Can run concurrently with another mapiterinit().
  if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
    atomic.Or8(&h.flags, iterator|oldIterator)
  }

  mapiternext(it)
}

func mapiternext(it *hiter) {
  h := it.h
  if raceenabled {
    callerpc := getcallerpc()
    racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
  }
  if h.flags&hashWriting != 0 {
    throw("concurrent map iteration and map write")
  }
  t := it.t
  bucket := it.bucket
  b := it.bptr
  i := it.i
  checkBucket := it.checkBucket

next:
  if b == nil {
    if bucket == it.startBucket && it.wrapped {
      // end of iteration
      it.key = nil
      it.elem = nil
      return
    }
    // 判断是否正处于扩容搬迁中，对于正在搬迁过程中仍未完成搬迁的bucket，迭代过程中需要考虑进入旧的bucket里面。
    if h.growing() && it.B == h.B {
      // Iterator was started in the middle of a grow, and the grow isn't done yet.
      // If the bucket we're looking at hasn't been filled in yet (i.e. the old
      // bucket hasn't been evacuated) then we need to iterate through the old
      // bucket and only return the ones that will be migrated to this bucket.
      oldbucket := bucket & it.h.oldbucketmask()
      b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
      if !evacuated(b) {
        checkBucket = bucket
      } else {
        b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
        checkBucket = noCheck
      }
    } else {
      b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
      checkBucket = noCheck
    }
    bucket++
    if bucket == bucketShift(it.B) {
      bucket = 0
      it.wrapped = true
    }
    i = 0
  }
  for ; i < bucketCnt; i++ {
    offi := (i + it.offset) & (bucketCnt - 1)
    if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
      // TODO: emptyRest is hard to use here, as we start iterating
      // in the middle of a bucket. It's feasible, just tricky.
      continue
    }
    k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
    if t.indirectkey() {
      k = *((*unsafe.Pointer)(k))
    }
    e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
    if checkBucket != noCheck && !h.sameSizeGrow() {
      // Special case: iterator was started during a grow to a larger size
      // and the grow is not done yet. We're working on a bucket whose
      // oldbucket has not been evacuated yet. Or at least, it wasn't
      // evacuated when we started the bucket. So we're iterating
      // through the oldbucket, skipping any keys that will go
      // to the other new bucket (each oldbucket expands to two
      // buckets during a grow).
      if t.reflexivekey() || t.key.equal(k, k) {
        // If the item in the oldbucket is not destined for
        // the current new bucket in the iteration, skip it.
        hash := t.hasher(k, uintptr(h.hash0))
        if hash&bucketMask(it.B) != checkBucket {
          continue
        }
      } else {
        // Hash isn't repeatable if k != k (NaNs).  We need a
        // repeatable and randomish choice of which direction
        // to send NaNs during evacuation. We'll use the low
        // bit of tophash to decide which way NaNs go.
        // NOTE: this case is why we need two evacuate tophash
        // values, evacuatedX and evacuatedY, that differ in
        // their low bit.
        if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
          continue
        }
      }
    }
    if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
      !(t.reflexivekey() || t.key.equal(k, k)) {
      // This is the golden data, we can return it.
      // OR
      // key!=key, so the entry can't be deleted or updated, so we can just return it.
      // That's lucky for us because when key!=key we can't look it up successfully.
      it.key = k
      if t.indirectelem() {
        e = *((*unsafe.Pointer)(e))
      }
      it.elem = e
    } else {
      // The hash table has grown since the iterator was started.
      // The golden data for this key is now somewhere else.
      // Check the current hash table for the data.
      // This code handles the case where the key
      // has been deleted, updated, or deleted and reinserted.
      // NOTE: we need to regrab the key as it has potentially been
      // updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
      rk, re := mapaccessK(t, h, k)
      if rk == nil {
        continue // key has been deleted
      }
      it.key = rk
      it.elem = re
    }
    it.bucket = bucket
    if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
      it.bptr = b
    }
    it.i = i + 1
    it.checkBucket = checkBucket
    return
  }
  b = b.overflow(t)
  i = 0
  goto next
}

小结一下，map的迭代顺序是完全随机的，并且map在迭代过程中会根据搬迁进度来判断是否要兼顾旧的bucket。

是否能在迭代过程中删除元素

这是一个比较热门的话题，即map中是否可以在迭代过程中删除元素。在不同语言中的行为结果可能是不一样的，熟悉C++的读者应该知道这样的行为在C++语言中是不合法的，本文讨论的是Go语言的行为：

import "fmt"

func main() {
  m := map[int]int{
    1: 1,
    2: 2,
    3: 3,
  }

  for k := range m {
    if k == 1 {
      delete(m, k)
    }
  }

  fmt.Println(m)
}

bmaptophash

key 可以是 float 型吗？

"".statictmp_02.42.4000000000000000000000001math.Float64bits()

float 型可以作为 key，但是由于精度的问题，会导致一些诡异的问题，慎用之。

总结

makemap_smallmakemap64makemapmapaccess2mapaccess1 mapassign_fast32 mapassign_fast64mapassign_faststrmapdelete_fast32mapdelete_fast64mapdelete_faststrruntime.mapiterinitruntime.mapiternext

参考

https://github.com/WuPeiqi/go_course/blob/master/day06 数据类型：指针、切片、字典/笔记/day06 数据类型.md