Before coding, let's briefly introduce RTP (Real-time Transport Protocol). As its name says, it is a network protocol for transmitting real-time data such as audio and video over IP networks.

Developed by the IETF Audio/Video Transport Working Group, it was first published in 1996 as RFC 1889, and the specification describes the following usage scenarios.

  1. Simple multicast audio conference

This scenario uses the IP multicast service for voice communication. Through some allocation mechanism, a multicast group address and a pair of ports are obtained: one port carries the audio data and the other carries control (RTCP) packets. The address and port information is distributed to the intended participants. If encryption is required, the packets can be encrypted in a specified format.
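The port pair can be sketched in a few lines of Go. This is a minimal illustration, not part of any library: `PortPair` and `NewPortPair` are hypothetical names, and the even RTP port with the next odd port for RTCP is the convention recommended in RFC 3550, which is assumed here rather than stated in this article.

```go
package main

import (
	"errors"
	"fmt"
)

// PortPair holds the two UDP ports used by one RTP session (hypothetical type).
type PortPair struct {
	RTP  int // carries the media packets
	RTCP int // carries the control packets
}

// NewPortPair derives the RTCP port from an even RTP port.
// Assumption: the even/odd pairing recommended by RFC 3550 is in use.
func NewPortPair(rtpPort int) (PortPair, error) {
	if rtpPort < 1024 || rtpPort > 65534 {
		return PortPair{}, errors.New("RTP port out of range")
	}
	if rtpPort%2 != 0 {
		return PortPair{}, errors.New("RTP port should be even")
	}
	return PortPair{RTP: rtpPort, RTCP: rtpPort + 1}, nil
}

func main() {
	pair, err := NewPortPair(5004)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("RTP on port %d, RTCP on port %d\n", pair.RTP, pair.RTCP)
}
```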

  2. Audio and Video Conference

If both audio and video are used in a conference, they are transmitted as separate RTP sessions. Each medium uses its own pair of UDP ports to carry its RTP and RTCP packets, and the multicast addresses may be the same or different. The motivation for this separation is that participants who want only one type of media can choose to receive just that one.

  3. Mixer and Translator

Consider a conference in which most participants are on high-speed network links, while a small group in one location can only connect at low speed. Instead of forcing everyone onto a low-bandwidth encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area. The mixer resynchronizes the incoming audio packets to reconstruct the constant 20 ms spacing generated by the senders, mixes the reconstructed audio into a single stream, translates the encoding to a lower-bandwidth one, and forwards the lower-bandwidth packet stream across the low-speed link.
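The mixing step itself can be sketched as follows. This is a simplified illustration under stated assumptions: the input is already decoded to 16-bit PCM and already resynchronized, so the jitter buffering and the re-encoding to a lower-bandwidth codec are left out. `Source` and `Mix` are hypothetical names. The returned list of SSRCs is what a mixer would place in the CSRC list of the outgoing packet, capped at 15 entries because the count field is 4 bits wide.

```go
package main

import "fmt"

// Source is one incoming stream: its SSRC plus one frame of 16-bit PCM samples.
type Source struct {
	SSRC    uint32
	Samples []int16
}

// maxCSRC is the largest number of contributing sources a 4-bit count can announce.
const maxCSRC = 15

// Mix adds the frames sample by sample, clamps each sum to the int16 range,
// and returns the mixed frame together with the SSRCs of the contributors.
func Mix(sources []Source, frameLen int) ([]int16, []uint32) {
	acc := make([]int32, frameLen)
	csrc := make([]uint32, 0, maxCSRC)
	for _, s := range sources {
		if len(csrc) < maxCSRC {
			csrc = append(csrc, s.SSRC)
		}
		for i := 0; i < frameLen && i < len(s.Samples); i++ {
			acc[i] += int32(s.Samples[i])
		}
	}
	mixed := make([]int16, frameLen)
	for i, v := range acc {
		switch {
		case v > 32767:
			v = 32767 // clamp positive overflow
		case v < -32768:
			v = -32768 // clamp negative overflow
		}
		mixed[i] = int16(v)
	}
	return mixed, csrc
}

func main() {
	frame, csrc := Mix([]Source{
		{SSRC: 1, Samples: []int16{1000, 30000}},
		{SSRC: 2, Samples: []int16{2000, 30000}},
	}, 2)
	fmt.Println("mixed samples:", frame)
	fmt.Println("CSRC list:", csrc)
}
```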

  4. Layered Encoding

Multimedia applications should be able to adjust the transmission rate to match the receiver's capacity or to adapt to network congestion. The responsibility for rate adaptation can be placed at the receiver by combining a layered encoding with a layered transmission system. In the context of RTP over IP multicast, the layers are spread across multiple RTP sessions, each carried on its own multicast group. A receiver can then adjust its receiving bandwidth simply by joining only an appropriate subset of the multicast groups.
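The receiver-side choice can be sketched as picking the longest run of layers that fits a bandwidth budget. This is only an illustration: `Layer` and `SelectLayers` are hypothetical names, the group addresses and bitrates are made up, and the sketch assumes cumulative layers (an enhancement layer is only useful together with all layers below it). Actually joining the groups is left out.

```go
package main

import "fmt"

// Layer describes one layer of a layered encoding, carried on its own multicast group.
type Layer struct {
	Group string // multicast group address (illustrative value)
	Kbps  int    // bitrate this layer adds
}

// SelectLayers returns the longest prefix of layers whose combined bitrate
// fits within budgetKbps. Layers must be ordered from base to highest enhancement.
func SelectLayers(layers []Layer, budgetKbps int) []Layer {
	total := 0
	for i, l := range layers {
		if total+l.Kbps > budgetKbps {
			return layers[:i]
		}
		total += l.Kbps
	}
	return layers
}

func main() {
	layers := []Layer{
		{Group: "239.1.1.1", Kbps: 64},  // base layer
		{Group: "239.1.1.2", Kbps: 128}, // first enhancement layer
		{Group: "239.1.1.3", Kbps: 256}, // second enhancement layer
	}
	for _, l := range SelectLayers(layers, 200) {
		fmt.Printf("join %s (%d kbit/s)\n", l.Group, l.Kbps)
	}
}
```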

RTP packet header fields

The first 12 octets are present in every RTP packet, while the CSRC identifier list is present only when inserted by a mixer. The fields have the following meanings.

  • Version (V): 2 bits

Identifies the RTP version. The current version is 2.

  • Padding (P): 1 bit

If the padding bit is set, the packet contains one or more additional padding octets at the end that are not part of the payload.

  • Extension (X): 1 bit

If the extension bit is set, the fixed header is followed by exactly one header extension.

  • CSRC count (CC): 4 bits

The number of CSRC identifiers that follow the fixed header.

  • Marker (M): 1 bit

The interpretation of the marker is defined by a profile. It is intended to mark significant events, such as frame boundaries, in the packet stream.

  • Payload type (PT): 7 bits

This field identifies the format of the RTP payload and determines how the application interprets it. A receiver must ignore packets with payload types that it does not understand.

  • Sequence number: 16 bits

The sequence number increments by one for each RTP data packet sent. The receiver can use it to detect packet loss and to restore the packet order.

  • Timestamp: 32 bits

This field reflects the sampling instant of the first octet in the RTP data packet.

  • SSRC: 32 bits

Identifies the synchronization source. This identifier should be chosen randomly, so that no two synchronization sources in the same RTP session have the same SSRC identifier.

  • CSRC list: 0 to 15 items, each of which is 32 bits

This field indicates all SSRCs that contribute to the payload data.
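The field layout above can be turned into a small parser using only the standard library. This is a minimal sketch for illustration, not the API of any particular library: `FixedHeader` and `ParseFixedHeader` are hypothetical names, and header extensions and padding are deliberately not handled. The sample bytes in `main` describe a version 2 packet with the extension and marker bits set and payload type 96.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// FixedHeader mirrors the fields of the 12-octet fixed header plus the CSRC list.
type FixedHeader struct {
	Version        uint8
	Padding        bool
	Extension      bool
	CSRCCount      uint8
	Marker         bool
	PayloadType    uint8
	SequenceNumber uint16
	Timestamp      uint32
	SSRC           uint32
	CSRC           []uint32
}

// ParseFixedHeader decodes the fixed header and any CSRC identifiers from b.
func ParseFixedHeader(b []byte) (FixedHeader, error) {
	if len(b) < 12 {
		return FixedHeader{}, errors.New("packet shorter than the 12-octet fixed header")
	}
	h := FixedHeader{
		Version:        b[0] >> 6,       // top 2 bits
		Padding:        b[0]&0x20 != 0,  // P bit
		Extension:      b[0]&0x10 != 0,  // X bit
		CSRCCount:      b[0] & 0x0F,     // low 4 bits
		Marker:         b[1]&0x80 != 0,  // M bit
		PayloadType:    b[1] & 0x7F,     // low 7 bits
		SequenceNumber: binary.BigEndian.Uint16(b[2:4]),
		Timestamp:      binary.BigEndian.Uint32(b[4:8]),
		SSRC:           binary.BigEndian.Uint32(b[8:12]),
	}
	end := 12 + 4*int(h.CSRCCount)
	if len(b) < end {
		return FixedHeader{}, errors.New("packet too short for its CSRC list")
	}
	for off := 12; off < end; off += 4 {
		h.CSRC = append(h.CSRC, binary.BigEndian.Uint32(b[off:off+4]))
	}
	return h, nil
}

func main() {
	raw := []byte{0x90, 0xe0, 0x69, 0x8f, 0xd9, 0xc2, 0x93, 0xda, 0x1c, 0x64, 0x27, 0x82}
	h, err := ParseFixedHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```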

Go implementation

RTP has been implemented in several languages, but implementing it in Go has some advantages.

  • Easy to test

Testing is easy not only because tests are simple to write and tooling can generate a test function skeleton directly from the source code. More importantly, Go has built-in benchmark support, with timing, parallel execution, memory statistics, and other options that developers can use for tuning.

  • Powerful Web development capabilities at the language level

JSON can be parsed and generated quickly with the standard library, with no need for third-party libraries.

  • Excellent performance

Go is faster than interpreted languages such as Python and Ruby, and easier to write than languages such as Node.js and Erlang. If a service needs concurrency, the built-in go keyword can quickly start multiple goroutines.
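The last two points can be sketched together: the standard `encoding/json` package encodes a struct, and the `go` keyword runs each encoding in its own goroutine. `StreamStats` and `EncodeAll` are hypothetical names chosen for this illustration, and the report fields are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// StreamStats is a hypothetical per-stream report used only for this example.
type StreamStats struct {
	SSRC        uint32 `json:"ssrc"`
	PacketsLost int    `json:"packets_lost"`
}

// EncodeAll encodes every report to JSON in its own goroutine.
// Each goroutine writes only to its own index, so the results keep the input order.
func EncodeAll(reports []StreamStats) ([]string, error) {
	out := make([]string, len(reports))
	errs := make([]error, len(reports))
	var wg sync.WaitGroup
	for i, r := range reports {
		wg.Add(1)
		go func(i int, r StreamStats) {
			defer wg.Done()
			b, err := json.Marshal(r)
			out[i], errs[i] = string(b), err
		}(i, r)
	}
	wg.Wait() // block until every goroutine has finished
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	docs, err := EncodeAll([]StreamStats{
		{SSRC: 476325762, PacketsLost: 3},
		{SSRC: 42, PacketsLost: 0},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range docs {
		fmt.Println(d)
	}
}
```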

The Go community has an RTP implementation, pion/rtp, with fairly comprehensive tests. Let's look at it briefly.

packet_test.go (Basic test)

package rtp

import (
  "fmt"
  "reflect"
  "testing"
)

func TestBasic(t *testing.T) {
  p := &Packet{}

  if err := p.Unmarshal([]byte{}); err == nil {
    t.Fatal("Unmarshal did not error on zero length packet")
  }

  rawPkt := []byte{
    0x90, 0xe0, 0x69, 0x8f, 0xd9, 0xc2, 0x93, 0xda, 0x1c, 0x64,
    0x27, 0x82, 0x00, 0x01, 0x00, 0x01, 0xFF, 0xFF, 0xFF, 0xFF, 0x98, 0x36, 0xbe, 0x88, 0x9e,
  }
  parsedPacket := &Packet{
    // Fixed header
    Header: Header{
      Marker:           true,
      Extension:        true,
      ExtensionProfile: 1,
      Extensions: []Extension{
        {0, []byte{
          0xFF, 0xFF, 0xFF, 0xFF,
        }},
      },
      Version:        2,
      PayloadOffset:  20,
      PayloadType:    96,
      SequenceNumber: 27023,
      Timestamp:      3653407706,
      SSRC:           476325762,
      CSRC:           []uint32{},
    },
    // Payload
    Payload: rawPkt[20:],
    Raw:     rawPkt,
  }

  // Unmarshal to the used Packet should work as well.
  for i := 0; i < 2; i++ {
    t.Run(fmt.Sprintf("Run%d", i+1), func(t *testing.T) {
      if err := p.Unmarshal(rawPkt); err != nil {
        t.Error(err)
      } else if !reflect.DeepEqual(p, parsedPacket) {
        t.Errorf("TestBasic unmarshal: got %#v, want %#v", p, parsedPacket)
      }

      if parsedPacket.Header.MarshalSize() != 20 {
        t.Errorf("wrong computed header marshal size")
      } else if parsedPacket.MarshalSize() != len(rawPkt) {
        t.Errorf("wrong computed marshal size")
      }

      if p.PayloadOffset != 20 {
        t.Errorf("wrong payload offset: %d != %d", p.PayloadOffset, 20)
      }

      raw, err := p.Marshal()
      if err != nil {
        t.Error(err)
      } else if !reflect.DeepEqual(raw, rawPkt) {
        t.Errorf("TestBasic marshal: got %#v, want %#v", raw, rawPkt)
      }

      if p.PayloadOffset != 20 {
        t.Errorf("wrong payload offset: %d != %d", p.PayloadOffset, 20)
      }
    })
  }
}

In the basic test, the package's Unmarshal method converts a byte slice directly into the corresponding structure, and Marshal converts it back, which reduces the amount of hand-written packing and unpacking code at the call site. RTP fields travel in network byte order (big-endian), and Go's standard encoding/binary package handles that conversion directly, as in this excerpt from the header parsing code:

h.SequenceNumber = binary.BigEndian.Uint16(rawPacket[seqNumOffset : seqNumOffset+seqNumLength])
h.Timestamp = binary.BigEndian.Uint32(rawPacket[timestampOffset : timestampOffset+timestampLength])
h.SSRC = binary.BigEndian.Uint32(rawPacket[ssrcOffset : ssrcOffset+ssrcLength])
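The byte order named in the excerpt above matters: the same two bytes give a different number when read in the wrong order. The following sketch shows this with the standard `encoding/binary` package only. The helper names `EncodeSeq`, `DecodeSeq`, and `DecodeSeqLittleEndian` are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// EncodeSeq writes a sequence number in network byte order (big-endian).
func EncodeSeq(seq uint16) []byte {
	b := make([]byte, 2)
	binary.BigEndian.PutUint16(b, seq)
	return b
}

// DecodeSeq reads the value back with the matching byte order.
// b must hold at least 2 bytes.
func DecodeSeq(b []byte) uint16 {
	return binary.BigEndian.Uint16(b)
}

// DecodeSeqLittleEndian reads the same bytes with the wrong byte order,
// only to show what a mismatch looks like.
func DecodeSeqLittleEndian(b []byte) uint16 {
	return binary.LittleEndian.Uint16(b)
}

func main() {
	wire := EncodeSeq(27023) // 27023 is 0x698f
	fmt.Printf("wire bytes:         % x\n", wire)
	fmt.Println("big-endian read:   ", DecodeSeq(wire))
	fmt.Println("little-endian read:", DecodeSeqLittleEndian(wire))
}
```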

Slice operations are also very convenient here: a sub-range of an array or buffer can be obtained without copying, which keeps the code flexible. When handling protocol data, slicing selects just the part of the buffer that needs processing:

m := copy(buf[n:], p.Payload)
p.Raw = buf[:n+m]
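To make the two lines above concrete, here is a self-contained sketch of the same pattern: write a header into a caller-supplied buffer, `copy` the payload after it, and reslice to the bytes actually used. `MarshalTo` is a hypothetical function written for this illustration, and it only emits a plain 12-octet header with no CSRC list, extension, or padding.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// MarshalTo writes a minimal fixed header followed by payload into buf
// and returns the filled portion of buf (no new allocation).
func MarshalTo(buf []byte, seq uint16, ts, ssrc uint32, payloadType uint8, payload []byte) ([]byte, error) {
	const headerLen = 12
	if len(buf) < headerLen+len(payload) {
		return nil, errors.New("buffer too small")
	}
	buf[0] = 2 << 6             // version 2, P=0, X=0, CC=0
	buf[1] = payloadType & 0x7F // M=0, 7-bit payload type
	binary.BigEndian.PutUint16(buf[2:4], seq)
	binary.BigEndian.PutUint32(buf[4:8], ts)
	binary.BigEndian.PutUint32(buf[8:12], ssrc)

	n := headerLen
	m := copy(buf[n:], payload) // copy returns the number of bytes written
	return buf[:n+m], nil       // reslice to header plus payload
}

func main() {
	buf := make([]byte, 1500) // scratch buffer, sized like a typical MTU
	pkt, err := MarshalTo(buf, 27023, 3653407706, 476325762, 96, []byte{0x98, 0x36, 0xbe})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d bytes: % x\n", len(pkt), pkt)
}
```

Because the result is a slice of `buf`, it shares memory with the scratch buffer; a caller that reuses the buffer must finish with the packet first.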

With the implementation in place, Go's subtests allow tests to be nested with t.Run. This is especially useful for running specific test cases. The parent test returns only after its subtests have completed.

func TestVP8PartitionHeadChecker_IsPartitionHead(t *testing.T) {
    checker := &VP8PartitionHeadChecker{}
    t.Run("SmallPacket", func(t *testing.T) {
        if checker.IsPartitionHead([]byte{0x00}) {
            t.Fatal("Small packet should not be the head of a new partition")
        }
    })
    t.Run("SFlagON", func(t *testing.T) {
        if !checker.IsPartitionHead([]byte{0x10, 0x00, 0x00, 0x00}) {
            t.Fatal("Packet with S flag should be the head of a new partition")
        }
    })
    t.Run("SFlagOFF", func(t *testing.T) {
        if checker.IsPartitionHead([]byte{0x00, 0x00, 0x00, 0x00}) {
            t.Fatal("Packet without S flag should not be the head of a new partition")
        }
    })
}
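The three subtests above pin down a simple rule: a payload starts a new partition only when the S bit (0x10) of its first byte is set. A function as small as the following satisfies all three cases. This is a hypothetical stand-in written for illustration, not the library's source, and it runs the same cases from a table in `main` so that it works with `go run` instead of `go test`.

```go
package main

import "fmt"

// isPartitionHead is a hypothetical stand-in for the checker exercised above.
// It reports whether the S bit (0x10) of the first payload byte is set.
func isPartitionHead(payload []byte) bool {
	if len(payload) < 1 {
		return false
	}
	return payload[0]&0x10 != 0
}

func main() {
	// The same inputs and expectations as the three subtests, as a table.
	cases := []struct {
		name    string
		payload []byte
		want    bool
	}{
		{"SmallPacket", []byte{0x00}, false},
		{"SFlagON", []byte{0x10, 0x00, 0x00, 0x00}, true},
		{"SFlagOFF", []byte{0x00, 0x00, 0x00, 0x00}, false},
	}
	for _, c := range cases {
		got := isPartitionHead(c.payload)
		fmt.Printf("%-12s got=%v want=%v\n", c.name, got, c.want)
	}
}
```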

For more details, see the implementation source code on GitHub ( https://github.com/pion/rtp ).

Conclusion

Handling all of these transport details by hand can consume a lot of time at the lower layers. There are many existing solutions on the market, both open source and commercial. In industry practice, companies such as Momo and Xiaomi have adopted the Soundnet SDK for related parts of their business. Some companies have even entrusted their core business to it, which speaks to its stability. I have personally tested its cloud classroom services; playback, online demonstration, and other functions are very convenient and can save a lot of development time.