聊一聊golang map - Golang教程网

本文主要从源码角度去理解golang的map。map是golang的哈希表，golang使用拉链法来解决哈希冲突。

Data Structure

首先我们看一下golang里面map的数据结构，map的核心数据结构是hmap，bmap是拉链式哈希表中的桶（bucket）：

// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

type mapextra struct {
	overflow    *[]*bmap
	oldoverflow *[]*bmap
	nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

每个桶最多放8个元素，注意这个bmap结构体，它只存了这个8个key的tophash，用于加速遍历，真正的key和val是紧凑地排在bmap结构体后面的，当要访问元素时会根据offset去访问。

大致结构图如下：

Init map

接着我们看下map是怎么被初始化的，我们平常会用 m := make(map[string]int) 去初始化一个map，在编译阶段做type check之后生成中间代码时根据情况会用 makemap_small/ makemap 去代替：

m := make(map[string]int)  //makemap_small，初始化一个空的hmap
m := make(map[string]int，10)  //makemap

在makemap中，会根据传入的hint(预估大小）调用makeBucketArray分配适当数量的buckets，数量是2的指数。如果数量大于16，数据量可能会比较多，会分配适量的overflow buckets；否则不分配，节约内存。这些overflow buckets会在正常bucket空间不够时会分配给需要的bucket，通过setoverflow函数将bucket末尾的指针指向一个overflow bucket。

好了，我们大致知道map长啥样了，下面我们聊一下golang map的相关操作：读（Read），写(Write)，扩容(Expand)，删除(Delete)，这些功能的主要实现都在src/runtime/map.go里面。其中扩容不是我们要操作的，是我们在写入或者删除元素时map自己操作的。

Read

当我们使用map[key]访问map中的某一个元素的时候，在代码编译阶段会用mapaccess1/mapaccess2替换：

v:=hashmap[key] //使用mapaccess1
v,ok:=hashmap[key] //使用mapaccess2

两个函数实现方式差不多，mapaccess2会多返回一个bool变量用于告诉这个key是否存在于map之中。下面我们截取mapaccess1函数的部分片段分析一下读取map的流程

由于map并发读写不是线程安全的，如果map正在被写入，会抛出一个异常：

if h.flags&hashWriting != 0 {
    throw("concurrent map read and map write")
}

接着计算key的hash值，取hash值的最低几位去找对应的bucket：

hash := t.hasher(key, uintptr(h.hash0))
m := bucketMask(h.B) // m = uint8(1)<<B - 1
b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))

在遍历bucket中的元素之前，我们要看一下map是否在扩容（后面会讲），以及要访问的bucket是否在迁移；如果有oldbucket，表明map正在扩容，由于map的扩容是渐进式的，部分old bucket中的元素可能还没有迁移到新的bucket中去，通过evacuated函数看一下bucket是否在迁移过程中，如果还没迁移，我们就使用old bucket：

if c := h.oldbuckets; c != nil {
	if !h.sameSizeGrow() {
	// There used to be half as many buckets; mask down one more power of two.
		m >>= 1
	}
	oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
	if !evacuated(oldb) {
		b = oldb
	}
}

//evacuated函数
func evacuated(b *bmap) bool {
	h := b.tophash[0]
	return h > emptyOne && h < minTopHash
}

然后计算key的tophash（高8位），然后先和bucket以及其overflow bucket中的tophash比较，相等才去比较整个key是否相等，否则continue省去后面的计算步骤。可以看出tophash在这里起到一个加速遍历的作用：

top := tophash(hash)
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}

如果没有找到key，返回一个zero object，也就是golang中的零值：

false for booleans
0 for numeric types
"" for strings
nil for pointers, functions, interfaces, slices, channels, and maps

return unsafe.Pointer(&zeroVal[0])

//zero object
const maxZero = 1024 // must match value in cmd/compile/internal/gc/walk.go:zeroValSize
var zeroVal [maxZero]byte

Write

map的写入是由 mapassign实现的

首先看在写之前做了哪些操作：会置一个写的标记位；如果没有对应的bucket，会创建一个bucket；如果bucket正在扩容（有oldbucket）,并且当前的bucket是一个old bucket，那么会将old bucket中的元素迁移到新的bucket中：

h.flags ^= hashWriting

if h.buckets == nil {
    h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}


again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket) //调用evacuate函数迁移
	}
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))

接下来和mapaccess的bucketloop差不多，如果有对应的key则更新

如果没有key并且map不在扩容过程中，在下面两个条件下会进行扩容：

达到一定的负载因子
过多的overflow buckets

扩容完成后然后再走一遍之前的流程：

// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
	hashGrow(t, h)
	goto again // Growing the table invalidates everything, so try again
}

最后写入key，value

Expand

当map触发扩容的时候会调用hashGrow函数，如果是overflow buckets过多则等量扩容，如果是负载因子较高则翻倍扩容。扩容会分配新的内存空间，扩容完成后，让oldbucket指向原来的桶, bucket指向新创建的桶：

bigger := uint8(1)
if !overLoadFactor(h.count+1, h.B) {
	bigger = 0
	h.flags |= sameSizeGrow
}
oldbuckets := h.buckets
newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

注意这里只是扩容，但是还没有迁移数据，map的数据迁移是渐进式的，比如之前提到的写入操作时会触发growWork迁移对应的桶中的数据，还有之后的删除操作。

这种渐进式迁移数据的好处是，如果map中的数据较多，一次迁移所有数据会导致系统卡顿较慢。

迁移是通过evacuate 实现的，在桶翻倍时需要分流。比如4个桶变成了8个桶，那么hash值为3和7的key之前在3号桶，扩容后key=7的要分流到7号桶，感兴趣可以看看该函数的实现。

Delete

删除和写入类似，当我们使用delete(map,key)的时候，编译期间会转换成runtime中的对应函数，比如mapdelete, 它的流程和mapassign差不多，期间也会对相应的桶做数据迁移。

聊完了map的实现原理，我们接下来再聊一聊在使用map时的注意事项吧，可能对我们工作更有帮助：

用for range遍历map

我们在用for range去遍历map的时候，编译器会用 mapiterinit 和 mapiternext 去替代。返回的key顺序可能是随机的，因为如果有扩容，会发生rehash，key的顺序会发生变化。所以我们的程序里最好不要有逻辑依赖遍历map返回key的顺序。

map作为参数传入函数

看makemap函数的返回值，map本身其实是一个*hmap指针：

func makemap(t *maptype, hint int, h *hmap) *hmap

作为函数参数传递时传的就是hmap的地址，所以我们在函数内增删改传入的map时也会改变函数外这个map本身的

哪些类型能作为key

只要类型能比较相等就行：

能作为key的类型	不能作为key的类型
Bool	Slice
Numeric	Map
String	Function
Pointer
Channel
Interface
Struct
只包含上述类型的数组

为什么map并发读写不是线程安全的

看了之前的map实现细节也就知道了，写入时可能会出现扩容和迁移数据，没有加锁也没有原子操作，这时一个读的线程进来，可能会出现意外的结果导致这个读线程无法正确执行。其实slice并发读写也不是线程安全的，它也涉及到扩容和数据迁移并且没有锁和原子操作；只不过对于map的并发读写，官方在代码里就直接给你throw一个异常，所以我们印象更深刻。如果要并发读写map，要么自己加锁，要么用官方提供的sync.Map