Video stream transcoding with Wowza Streaming Engine on servers with Nvidia RTX A4000 and Nvidia Quadro P4000 graphics accelerators

2023-06-16 19:03:00

Using a graphics processing unit (GPU) to process and convert video streams is the most efficient way to build a high-performance transcoding system. The transcoding process consists of three main operations: decoding the video stream, scaling the frames, and encoding a new, converted stream or several time-synchronized streams with different image quality parameters. Wowza Streaming Engine can use the GPU for all or some of these operations, and it can use multiple GPUs installed in the same system.

GPUs from several manufacturers are supported: Intel, AMD, Advantech, Nvidia. Each manufacturer's products have their own advantages, but Nvidia offers the widest selection of devices. Using Nvidia GPUs from the Quadro/Nvidia RTX line of professional graphics accelerators for workstations, we will build a relatively low-cost but high-performance transcoding system and test it.

We will measure system performance for a standard test scenario: converting a 1080p H.264/AAC source stream with a bitrate of 9500 kbps (big_buck_bunny_1080p_h264.mov, 24 FPS, video file published in a loop) into three streams with different quality parameters for adaptive bitrate broadcasting to the Internet. The encoding profiles for the new streams will be:

  1. 1920x1080, 5000 kbps, H.264 Main, AAC 96 kbps
  2. 1280x720, 2000 kbps, H.264 Main, AAC 96 kbps
  3. 640x360, 800 kbps, H.264 Main, AAC 96 kbps
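As a quick worked example, the nominal output bitrate of one transcoded channel is the sum of the video and audio bitrates of the three profiles. The sketch below only does this arithmetic; the rendition labels ("1080p", "720p", "360p") and the function names are ours and are used for illustration only.

```python
# Encoding ladder from the list above: (label, video kbps, audio kbps).
# The labels are illustrative names, not Wowza identifiers.
LADDER = [
    ("1080p", 5000, 96),
    ("720p", 2000, 96),
    ("360p", 800, 96),
]


def channel_output_kbps(ladder=LADDER):
    """Sum of video and audio bitrates over all renditions of one channel."""
    return sum(video + audio for _label, video, audio in ladder)


def total_output_kbps(channels, ladder=LADDER):
    """Nominal aggregate output bitrate for a number of transcoded channels."""
    return channels * channel_output_kbps(ladder)


print(channel_output_kbps())   # 8088
print(total_output_kbps(8))    # 64704
```

With CBR encoding these nominal figures should be close to the real output rate, but container and protocol overhead are not included.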

CBR (Constant Bit Rate) mode will be used to encode the new streams. We will start collecting test data with 8 channels launched at once, since testing a system with fewer transcoding channels is beyond the scope of our study. Additional channels will then be started one at a time, and the following performance metrics will be measured:

  1. CPU utilization (mpstat)
  2. GPU decode utilization (nvidia-smi dmon)
  3. GPU encode utilization (nvidia-smi dmon)
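The encode and decode figures can be averaged from captured `nvidia-smi dmon` output. The following is a minimal sketch, assuming the usual layout in which a header row starting with `# gpu` names the columns (including `enc` and `dec`) and a second `#` row gives the units. The sample text and the helper names `parse_dmon` and `average` are made up for illustration and are not taken from our measurements.

```python
def parse_dmon(text):
    """Parse `nvidia-smi dmon` text into a list of {column: value} dicts."""
    columns = None
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # The header row starting with "gpu" names the columns;
            # the following "#" row holds the units and is ignored.
            if fields and fields[0].lower() == "gpu":
                columns = fields
            continue
        if columns is None:
            continue
        values = line.split()
        if len(values) != len(columns):
            continue  # skip malformed rows
        samples.append(dict(zip(columns, values)))
    return samples


def average(samples, column):
    """Mean of a numeric column, ignoring placeholders such as '-'."""
    values = [int(s[column]) for s in samples if s.get(column, "-").isdigit()]
    return sum(values) / len(values) if values else None


# Illustrative sample only; real output may contain more or fewer columns.
SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    70    55     -    20    10    60    31  6500  1500
    0    71    55     -    21    10    62    33  6500  1500
"""

rows = parse_dmon(SAMPLE)
print(average(rows, "enc"), average(rows, "dec"))  # 61.0 32.0
```

Looking columns up by name rather than by position keeps the sketch usable if a driver version adds or reorders columns.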

New streams will be added until messages appear in the Wowza log indicating that the transcoder is skipping frames:

    Video behind filter state change. New state: SKIP1FRAME
    Video behind filter state change. New state: SKIP2FRAME
    Video behind filter state change. New state: SKIP4FRAME
    Video behind filter state change. New state: KEYFRAMESONLY
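Watching for these messages can be automated. Here is a minimal sketch that scans log lines for the four states listed above; the function name `skip_states` is ours, and the log prefix in the usage example is invented, since the exact layout of a Wowza log line is not shown here.

```python
import re

# The four overload states quoted above.
SKIP_STATES = ("SKIP1FRAME", "SKIP2FRAME", "SKIP4FRAME", "KEYFRAMESONLY")
PATTERN = re.compile(r"Video behind filter state change\. New state: (\w+)")


def skip_states(lines):
    """Return the skip states found in the given log lines, in order."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match and match.group(1) in SKIP_STATES:
            found.append(match.group(1))
    return found


# Hypothetical log excerpt: only the message text comes from the article.
LOG = [
    "INFO some unrelated message",
    "WARN Video behind filter state change. New state: SKIP1FRAME",
    "WARN Video behind filter state change. New state: KEYFRAMESONLY",
]

print(skip_states(LOG))  # ['SKIP1FRAME', 'KEYFRAMESONLY']
```

A non-empty result for the current channel count would mark the point at which the test run stops.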

During the test runs, we got the following results:

Wowza Streaming Engine 4.8.13 GPU Nvidia RTX A4000 (Driver Version: 525.85.05), Ubuntu 18.04.6, Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz (18 cores) x 2, RAM 64 GB.

    Number of channels   CPU %   GPU encode %   GPU decode %   Comments
    8                    7       60             31
    9                    8       63             32
    10                   9       66             36
    11                   10      72             38
    12                   11      79             40
    13                   12      84             43
    14                   14      93             51             Skip frames


Wowza Streaming Engine 4.8.23 GPU Nvidia RTX A4000 (Driver Version: 525.85.05, nvenc.preset = 2), Ubuntu 18.04.6, Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz (18 cores) x 2, RAM 64 GB.

    Number of channels   CPU %   GPU encode %   GPU decode %   Comments
    8                    7       50             33
    9                    8       58             38
    10                   9       58             38
    11                   11      62             42
    12                   12      65             44
    13                   14      68             45
    14                   15      76             49
    15                   16      78             51
    16                   18      85             56
    17                   20      88             59             Skip frames


Wowza Streaming Engine 4.8.13 GPU Nvidia Quadro P4000 (Driver Version: 525.85.05), Ubuntu 18.04.6, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (10 cores) x 2, RAM 32 GB.

    Number of channels   CPU %   GPU encode %   GPU decode %   Comments
    8                    10      65             32
    9                    11      67             33
    10                   12      73             36
    11                   14      82             43
    12                   16      86             45
    13                   20      88             50
    14                   23      92             53             Skip frames


Wowza Streaming Engine 4.8.23 GPU Nvidia Quadro P4000 (Driver Version: 525.85.05, nvenc.preset = 2), Ubuntu 18.04.6, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (10 cores) x 2, RAM 32 GB.

    Number of channels   CPU %   GPU encode %   GPU decode %   Comments
    8                    10      49             32
    9                    11      60             33
    10                   12      62             37
    11                   14      70             38
    12                   17      75             41
    13                   20      79             50
    14                   22      82             54
    15                   24      89             58             Skip frames


Conclusion:

The Wowza Streaming Engine 4.8.13 results on the servers with the Nvidia RTX A4000 and Nvidia Quadro P4000 cards are almost the same, despite the different GPU generations: both began skipping frames at 14 channels. Apparently, this is due to the rather old Nvidia Video Codec SDK 10 used in this version of Wowza.

The Wowza Streaming Engine 4.8.23 test showed a significant performance increase for the more recent Nvidia RTX A4000 model. In version 4.8.23 we did not observe the GPU transcoding performance degradation inherent in WSE versions 4.8.14 - 4.8.22, although it is still listed in the WSE Known Issues section.
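To put numbers on this comparison, the sketch below takes the channel count at which "Skip frames" first appeared in each test run and treats the previous count as the highest clean one. That interpretation is our assumption, as are the function names; the data come from the result tables.

```python
# Channel count at which "Skip frames" first appeared in each test run.
FIRST_SKIP = {
    ("4.8.13", "RTX A4000"): 14,
    ("4.8.23", "RTX A4000"): 17,
    ("4.8.13", "Quadro P4000"): 14,
    ("4.8.23", "Quadro P4000"): 15,
}


def clean_channels(version, gpu):
    """Highest channel count that ran without frame skipping (assumed)."""
    return FIRST_SKIP[(version, gpu)] - 1


def upgrade_gain(gpu, old="4.8.13", new="4.8.23"):
    """Extra clean channels after the upgrade, absolute and in percent."""
    before = clean_channels(old, gpu)
    after = clean_channels(new, gpu)
    return after - before, round(100 * (after - before) / before, 1)


print(upgrade_gain("RTX A4000"))     # (3, 23.1)
print(upgrade_gain("Quadro P4000"))  # (1, 7.7)
```

Note that the 4.8.23 runs also set nvenc.preset = 2, so the gain cannot be attributed to the version change alone, and channels were added one at a time, so the resolution of these figures is one channel.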


Important notes:

Channels need to be started with a slight delay. To do this, add the following property to the <Root>/<Properties> block of VHost.xml (at the very end of the file):

    <Property>
        <Name>startupStreamsDelayTime</Name>
        <Value>3000</Value>
        <Type>Integer</Type>
    </Property>

When starting more than 12 streams at once, transcoder performance problems may occur, up to a complete freeze.


Although our test used only one GPU card, Wowza supports multiple Nvidia cards in the same server. With several cards, transcoder load balancing between the GPUs must be configured; otherwise, decoding and scaling will be performed on only one GPU and system performance will be disappointing. To do this, add the following to the <Root>/<Properties> block of Server.xml (at the very end of the file):

    <Property>
        <Name>transcoderVideoLoadBalancerClass</Name>
        <Value>com.wowza.wms.transcoder.model.TranscoderVideoLoadBalancerCUDASimple</Value>
        <Type>String</Type>
    </Property>
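After editing the configuration, it can be worth checking that the property is really present and spelled correctly. The following read-only sketch looks a property up by name in the XML text. The helper `get_property` is ours, and the sample document is a trimmed stand-in rather than a real Server.xml.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <Value> of the <Property> with the given <Name>, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("Property"):
        if (prop.findtext("Name") or "").strip() == name:
            return (prop.findtext("Value") or "").strip()
    return None


# Trimmed stand-in for a config file; a real Server.xml has many more elements.
SAMPLE_XML = """<Root>
  <Properties>
    <Property>
      <Name>transcoderVideoLoadBalancerClass</Name>
      <Value>com.wowza.wms.transcoder.model.TranscoderVideoLoadBalancerCUDASimple</Value>
      <Type>String</Type>
    </Property>
  </Properties>
</Root>"""

print(get_property(SAMPLE_XML, "transcoderVideoLoadBalancerClass"))
```

For a real check, the contents of the configuration file would be read and passed in place of `SAMPLE_XML`. The sketch searches the whole document and does not verify that the property sits in the intended <Properties> block.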