How WaitGroup Is Implemented in Go

How It Works

type WaitGroup struct {
   noCopy noCopy

   // 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
   // 64-bit atomic operations require 64-bit alignment, but 32-bit
   // compilers only guarantee that 64-bit fields are 32-bit aligned.
   // For this reason on 32 bit architectures we need to check in state()
   // if state1 is aligned or not, and dynamically "swap" the field order if
   // needed.
   state1 uint64
   state2 uint32
}

The noCopy field exists so that go vet can detect a WaitGroup being copied after first use.
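As a rough illustration of the noCopy idiom, the sketch below assumes its conventional form: a zero-size type with no-op Lock and Unlock methods, which is what go vet's copylocks check looks for. The guardedCounter type is hypothetical and exists only for this example.

```go
package main

import "fmt"

// noCopy is a zero-size marker type. Because it has Lock and Unlock
// methods, go vet's copylocks check treats any struct containing it
// as something that must not be copied by value.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// guardedCounter is a hypothetical type used only for illustration.
type guardedCounter struct {
	noCopy noCopy
	n      int
}

// Inc uses a pointer receiver, so all callers share one guardedCounter.
func (c *guardedCounter) Inc() int {
	c.n++
	return c.n
}

func main() {
	c := &guardedCounter{}
	c.Inc()
	fmt.Println(c.Inc()) // sharing c through a pointer is fine
	// bad := *c         // go vet would flag this copy
}
```

The marker costs no memory at run time; the protection comes entirely from static analysis, so a copy still compiles and is only caught when go vet is run.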

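The state1 field is a 64-bit value: the high 32 bits are the counter and the low 32 bits are the waiter count.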

The waiter count is the number of goroutines currently blocked in Wait.

state2 is the semaphore.

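The overall call flow of a WaitGroup can be summarized as follows:

WaitGroup.Add(n): counter + n
WaitGroup.Wait(): waiter++, then block on runtime_Semacquire(semap)
WaitGroup.Done(): counter--; when counter reaches 0, runtime_Semrelease wakes each goroutine blocked in WaitGroup.Wait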
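From the caller's side, that flow looks like the following minimal sketch. The sumSquares function is a made-up example, not something from the standard library; the comments map each call to the counter changes described above.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares computes 1*1 + 2*2 + ... + n*n using one goroutine per term.
func sumSquares(n int) int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1) // counter++ before the goroutine starts
		go func(i int) {
			defer wg.Done() // counter-- when the goroutine finishes
			results[i] = (i + 1) * (i + 1)
		}(i)
	}
	wg.Wait() // waiter++, then block until counter reaches 0

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares(4))
}
```

Each goroutine writes to its own slice element, so the only synchronization needed is the WaitGroup itself.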

Memory Alignment

func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
	if unsafe.Alignof(wg.state1) == 8 || uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
		// state1 is 64-bit aligned: nothing to do.
		return &wg.state1, &wg.state2
	} else {
		// state1 is 32-bit aligned but not 64-bit aligned: this means that
		// (&state1)+4 is 64-bit aligned.
		state := (*[3]uint32)(unsafe.Pointer(&wg.state1))
		return (*uint64)(unsafe.Pointer(&state[1])), &state[0]
	}
}

If a variable is 64-bit aligned (8 bytes), its starting address is a multiple of 8. If a variable is 32-bit aligned (4 bytes), its starting address is a multiple of 4.

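When state1 is 64-bit aligned, state1 holds counter and waiter and state2 is the semaphore. Otherwise the 12 bytes starting at state1 are viewed as a [3]uint32: element 0 becomes semap, and elements 1 and 2 together hold counter, waiter.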

Why such an odd arrangement? It rests on two premises:

Premise 1: In WaitGroup's actual logic, counter and waiter are combined and treated as a single 64-bit integer. Whenever counter or waiter needs to change, that 64-bit integer is updated through atomic operations.
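A short sketch of that packing arithmetic follows. The addCounter helper is hypothetical; it applies the same shift-and-add to a plain uint64 to show that a positive or negative delta changes only the high 32 bits.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// addCounter adds delta to the high 32 bits (counter) of *statep and
// returns the new counter together with the untouched waiter count.
func addCounter(statep *uint64, delta int) (counter int32, waiters uint32) {
	// A negative delta wraps around when converted to uint64, so the
	// addition decrements the high 32 bits and leaves the low 32 bits alone.
	state := atomic.AddUint64(statep, uint64(delta)<<32)
	return int32(state >> 32), uint32(state)
}

func main() {
	var state uint64 = 2 // counter = 0, waiters = 2

	c, w := addCounter(&state, 3) // like Add(3)
	fmt.Println(c, w)

	c, w = addCounter(&state, -1) // like Done()
	fmt.Println(c, w)
}
```

Because both halves live in one word, a single atomic add returns a consistent snapshot of counter and waiter.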

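Premise 2: On a 32-bit system, when atomic is used on a 64-bit variable, the caller is responsible for making sure the variable is 64-bit aligned; otherwise the operation fails at run time. The official Go documentation, sync/atomic/#pkg-note-BUG, puts it this way: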

On ARM, x86-32, and 32-bit MIPS, it is the caller’s responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.

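So counter+waiter must sit in a 64-bit aligned word, and WaitGroup has to guarantee that alignment itself.

Moving semap to the other end of the 12 bytes is enough to guarantee that counter+waiter is 64-bit aligned, without wasting any memory.

Note that the layout is chosen by the actual alignment of state1, not by whether the platform is 32-bit or 64-bit: even on a 32-bit system state1 may happen to be 64-bit aligned, which is why state() checks the address at run time.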

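sync.Mutex needs no such alignment handling even though it also relies on atomic operations, because its state field is an int32 and 32-bit atomic operations do not require 64-bit alignment.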

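Why combine counter and waiter at all? Add has to see counter reach 0 and read waiter in the same atomic step, and Wait has to check counter and increment waiter in one compare-and-swap. Keeping both in a single 64-bit value means the pair is always read and updated consistently.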

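These atomic updates do not use a lock such as Mutex or RWMutex, mainly because a lock carries a noticeable performance cost and can involve context switches. For an operation on a single memory address, atomic is the better fit: it is backed directly by the hardware (CPU instructions), works at a finer granularity, and performs better.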
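To make the comparison concrete, here is a minimal sketch of the same shared counter written both ways. The countAtomic and countMutex names are invented for this example; it only shows that the two approaches give the same result and makes no measurement of their relative speed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomic increments a shared counter from n goroutines without a lock.
func countAtomic(n int) int64 {
	var total int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&total, 1) // one atomic read-modify-write
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&total)
}

// countMutex does the same work but serializes each increment with a Mutex.
func countMutex(n int) int64 {
	var (
		mu    sync.Mutex
		total int64
		wg    sync.WaitGroup
	)
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			mu.Lock()
			total++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countAtomic(1000), countMutex(1000))
}
```

The atomic version never parks a goroutine, whereas a contended Mutex may; a benchmark with the testing package would be the way to quantify the difference on a given machine.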

Source Code

func (wg *WaitGroup) Add(delta int) {
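	// wg.state() returns pointers into the struct, not copies.
	statep, semap := wg.state()

	// Atomically add delta to the high 32 bits of *statep, i.e. the counter.
	state := atomic.AddUint64(statep, uint64(delta)<<32)

	// Shift right by 32 so the high 32 bits become the low 32 bits: the counter.
	v := int32(state >> 32)

	// Truncate to the low 32 bits: the waiter count.
	w := uint32(state)

	// Misuse: the counter went negative (more Done calls than Add).
	if v < 0 {
		panic("sync: negative WaitGroup counter")
	}
	// Misuse: Add raised the counter from 0 while goroutines were already waiting.
	if w != 0 && delta > 0 && v == int32(delta) {
		panic("sync: WaitGroup misuse: Add called concurrently with Wait")
	}
	// The common case: tasks remain (v > 0) or nobody is waiting (w == 0).
	if v > 0 || w == 0 {
		return
	}

	// What remains is counter == 0 and waiter != 0.
	// *statep must still equal state here (with counter 0 it is just the
	// waiter count); anything else means the WaitGroup was misused.
	// All tasks are done, so *statep can be reset to 0 as a whole
	// and the semaphore released once per waiter.
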
	// This goroutine has set counter to 0 when waiters > 0.
	// Now there can't be concurrent mutations of state:
	// - Adds must not happen concurrently with Wait,
	// - Wait does not increment waiters if it sees counter == 0.
	// Still do a cheap sanity check to detect WaitGroup misuse.
	if *statep != state {
		panic("sync: WaitGroup misuse: Add called concurrently with Wait")
	}
	// Reset waiters count to 0.
	*statep = 0
	for ; w != 0; w-- {
		runtime_Semrelease(semap, false, 0)
	}
}

func (wg *WaitGroup) Done() {
	wg.Add(-1)
}

func (wg *WaitGroup) Wait() {
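	// wg.state() returns pointers into the struct, not copies.
	statep, semap := wg.state()

	// Loop so that a failed CAS below can be retried.
	for {
		state := atomic.LoadUint64(statep)
		v := int32(state >> 32) // counter
		w := uint32(state)      // waiter (only used by race-detector code omitted from this excerpt)

		// counter == 0 means every task had already finished when Wait was
		// called, so return immediately. This is why Add() must happen
		// before Wait (for example, before the goroutine is started);
		// otherwise Wait may return too early.
		if v == 0 {
			return
		}
		// waiter++ via an atomic compare-and-swap.
		if atomic.CompareAndSwapUint64(statep, state, state+1) {
			// The increment succeeded: block on the semaphore until Add
			// releases it. The semaphore is what provides the synchronization.
			runtime_Semacquire(semap)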
			return
		}
	}
}

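To sum up, WaitGroup comes down to five things: memory alignment, atomic operations, counter, waiter, and the semaphore.

Memory alignment exists to make the 64-bit atomic operations safe.
counter is incremented and decremented atomically; once it reaches 0, the semaphore is released for every waiter.
waiter is incremented atomically; it records how many times the semaphore must be released.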
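Those pieces can be put together in a small toy model. The sketch below is a hypothetical toyWaitGroup, not the standard library's implementation: it assumes an unbuffered channel can stand in for the runtime semaphore, places the 64-bit state as the first struct field so it is 64-bit aligned, and leaves out the misuse checks and race-detector hooks.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyWaitGroup is a simplified, hypothetical model of sync.WaitGroup.
// state is the first field of an allocated struct, so it is 64-bit
// aligned even on 32-bit platforms.
type toyWaitGroup struct {
	state uint64        // high 32 bits: counter, low 32 bits: waiters
	sema  chan struct{} // stands in for the runtime semaphore
}

func newToyWaitGroup() *toyWaitGroup {
	return &toyWaitGroup{sema: make(chan struct{})}
}

func (wg *toyWaitGroup) Add(delta int) {
	state := atomic.AddUint64(&wg.state, uint64(delta)<<32)
	v, w := int32(state>>32), uint32(state)
	if v < 0 {
		panic("toyWaitGroup: negative counter")
	}
	if v > 0 || w == 0 {
		return
	}
	// counter reached 0 with waiters present: reset, then wake each waiter.
	atomic.StoreUint64(&wg.state, 0)
	for ; w != 0; w-- {
		wg.sema <- struct{}{}
	}
}

func (wg *toyWaitGroup) Done() { wg.Add(-1) }

func (wg *toyWaitGroup) Wait() {
	for {
		state := atomic.LoadUint64(&wg.state)
		if int32(state>>32) == 0 {
			return // nothing to wait for
		}
		// waiter++; retry if another goroutine changed state in between.
		if atomic.CompareAndSwapUint64(&wg.state, state, state+1) {
			<-wg.sema
			return
		}
	}
}

// runWorkers starts n goroutines and reports how many had finished
// by the time Wait returned.
func runWorkers(n int) int64 {
	wg := newToyWaitGroup()
	var finished int64
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			atomic.AddInt64(&finished, 1)
			wg.Done()
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println(runWorkers(5))
}
```

With a channel as the semaphore, the final Done blocks briefly until each waiter has received its wake-up, which is a behavior of this toy and not of the real runtime semaphore.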