In this series of articles, we shall discuss one of my old projects. At the time, I ran an IT consulting company, and this project was one of my first steps in cybersecurity. The project started around the middle of 2015 and ended at the end of 2016. It was a body camera system, intended to compete with products such as the Axon Body 3 camera. During the lifecycle of this project, Axon cameras did not support LTE-based streaming.

The project team and I managed to produce a working prototype of the system, and in this series, I shall present how we implemented it. At the end of the series, I shall show you the actual budget for building this prototype and analyze why it was unsuccessful.

The topic of this part will be an analysis of the advantages and disadvantages of the current video streaming network protocols. We shall start with the standard video streaming protocols, and at the end of the article, we shall discuss our modified, more secure protocol.

There are many protocols for video streaming. Some of them do not support encryption, so we shall focus on those that do.

RTMPE

Real-Time Messaging Protocol or RTMP is used to stream multimedia data – audio and video – between Flash Media Server and Flash Player. The chief utility of the RTMP stream is in the optimization of the audio and video data transfer between the server and player.

Encrypted RTMP (RTMPE) wraps the RTMP session in a lightweight encryption layer, providing low-level stream encryption for high-traffic sites. RTMPE uses the anonymous Diffie-Hellman key exchange method. In this algorithm, the two parties, the media server and the Flash player, establish a shared secret key over an insecure channel.
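To make the key exchange concrete, here is a minimal sketch of an anonymous Diffie-Hellman exchange in Python. The prime and generator are toy values assumed for illustration; they are not the parameters RTMPE uses, and the function names are hypothetical.

```python
import secrets

# Toy parameters for illustration only (assumption): a Mersenne prime and a
# small generator. Real deployments use standardized groups of 2048+ bits.
P = 2**127 - 1
G = 3

def dh_keypair():
    """Return (private, public) values for one party."""
    private = secrets.randbelow(P - 3) + 2   # 2 <= private <= P - 2
    public = pow(G, private, P)
    return private, public

def dh_shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)

server_priv, server_pub = dh_keypair()
player_priv, player_pub = dh_keypair()

# Only the public values travel over the insecure channel.
server_secret = dh_shared_secret(server_priv, player_pub)
player_secret = dh_shared_secret(player_priv, server_pub)
```

Both sides arrive at the same secret, but neither side proves its identity. An attacker sitting in the middle can run a separate exchange with each party, which is the main weakness of the anonymous variant.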

The standard RTMP protocol uses TCP, and RTMPE uses an encryption model based on a shared secret.

HTTP Live Streaming Encryption Methods

While HLS supports AES-128 encryption, there are two different ways to implement it in practice.

Broadcasters can use one key to encrypt the entire video stream, but that also means the whole stream is unprotected if an unauthorized third party intercepts the secret key.

Alternatively, each segment of a stream can be encrypted with a different key. That way, each specific key gives access to only a few seconds of video. Broadcasters might choose this method if the video content they are sharing is highly sensitive.
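The difference between the two approaches shows up in the media playlist. The sketch below generates a playlist where a new `#EXT-X-KEY` tag is written every `rotate_every` segments; a key tag applies to all segments that follow it until the next key tag. The helper name, segment names, and key server URL are assumptions made up for this example.

```python
def build_media_playlist(segments, key_uri_template, rotate_every=1, target_duration=6):
    """Build an HLS media playlist with AES-128 key rotation.

    segments: list of (uri, duration_seconds) pairs.
    key_uri_template: format string with one placeholder for the key number.
    rotate_every: how many segments share one key.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i, (uri, duration) in enumerate(segments):
        if i % rotate_every == 0:
            key_uri = key_uri_template.format(i // rotate_every)
            lines.append(f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"')
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

segments = [("seg0.ts", 6.0), ("seg1.ts", 6.0), ("seg2.ts", 6.0)]
key_url = "https://keys.example.com/key/{}"  # hypothetical key server

per_segment = build_media_playlist(segments, key_url, rotate_every=1)
single_key = build_media_playlist(segments, key_url, rotate_every=len(segments))
```

With `rotate_every=1`, a leaked key exposes one segment; with a single key, it exposes the whole stream.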

As its name suggests, HTTP Live Streaming uses HTTP as its transport, and in that respect it resembles MPEG-DASH. It works by breaking the overall stream into a sequence of small HTTP-based file downloads, each delivering one short chunk of a potentially unbounded transport stream. A list of available streams, encoded at different bit rates, is sent to the client using an extended M3U playlist. HTTP is a TCP-based protocol as well.
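As an illustration of how a client can use that list of streams, the following sketch parses a small master playlist and picks the highest-bandwidth variant that fits a measured throughput. The playlist contents and function names are assumptions for the example, not taken from any particular player.

```python
import re

# Hypothetical master playlist with three variants.
MASTER_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
"""

def parse_variants(playlist_text):
    """Return (bandwidth_bps, uri) pairs from a master playlist."""
    lines = [line.strip() for line in playlist_text.splitlines() if line.strip()]
    variants = []
    for tag, uri in zip(lines, lines[1:]):
        if tag.startswith("#EXT-X-STREAM-INF:"):
            # The lookbehind skips AVERAGE-BANDWIDTH, which has the same suffix.
            match = re.search(r"(?<![A-Z-])BANDWIDTH=(\d+)", tag)
            if match:
                variants.append((int(match.group(1)), uri))
    return sorted(variants)

def pick_variant(variants, measured_bps):
    """Pick the best variant that fits; fall back to the lowest one."""
    fitting = [v for v in variants if v[0] <= measured_bps]
    return fitting[-1] if fitting else variants[0]

variants = parse_variants(MASTER_PLAYLIST)
choice = pick_variant(variants, 2_500_000)
```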

MPEG DASH Encryption

Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH, is an adaptive bitrate streaming technique that enables high-quality streaming of media content over the Internet delivered from conventional HTTP web servers. Similar to Apple’s HTTP Live Streaming (HLS) solution, MPEG-DASH works by breaking the content into a sequence of small segments, which are served over HTTP. Each piece contains a short interval of playback time of content that is potentially many hours in duration, such as a movie or the live broadcast of a sports event.

MPEG DASH supports a Common Encryption mode (CENC), which Bento4 implements. Encrypted MPEG DASH presentations should also include the proper signaling in the MPD to inform the player of what DRM(s) can be used to obtain the decryption keys for the streams. An MPD can contain DRM signaling for several DRMs (either just one or multiple entries if the same stream can reach players with different DRM technologies).
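To show what this signaling looks like in practice, here is a small sketch that reads the `ContentProtection` elements from an MPD and lists the DRM systems they announce. The sample MPD is a trimmed, hypothetical manifest; the two system identifiers are the ones commonly published for Widevine and PlayReady.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical manifest used only for this example.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"/>
      <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"/>
      <Representation id="video-1" bandwidth="2000000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

KNOWN_SCHEMES = {
    "urn:mpeg:dash:mp4protection:2011": "Common Encryption",
    "urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed": "Widevine",
    "urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95": "PlayReady",
}

def list_drm_systems(mpd_text):
    """Return a readable name (or the raw URI) for each ContentProtection entry."""
    root = ET.fromstring(mpd_text)
    systems = []
    for element in root.iterfind(".//mpd:ContentProtection", NS):
        scheme = element.get("schemeIdUri", "").lower()
        systems.append(KNOWN_SCHEMES.get(scheme, scheme))
    return systems

drm_systems = list_drm_systems(SAMPLE_MPD)
```

A player would compare this list against the DRM systems it supports and then request a license from the matching one.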

Again, MPEG-DASH is based on HTTP and therefore on TCP. In that case, DRM encryption is usually based on a public/private key encryption scheme.

In the diagram, you can see a standard AVI container. The video data objects are x264/H.264 frames, which most streaming protocols encode, encrypt, and stream.

Our Modified Streaming Protocol

As you can see from the paragraphs above, every standard encrypted streaming protocol was designed to stream data from a centralized server to a list of devices. Most of them use traditional HTTP delivery networks to speed up their streaming. In our case, we had an entirely different problem. We had to stream encrypted content from multiple body cameras to a centralized server and, after that, restream the video from the server to a web browser-based dashboard. LTE networks can be quite fast when you have proper coverage, but when your signal drops, your network speed drops significantly as well. So we decided to design our own video streaming protocol, with the following requirements:

  • Based on UDP: Sending TCP data through LTE can hurt your performance a lot. That's the reason we decided to build our protocol on UDP and to implement packet control ourselves.
  • Based on X264: x264 is an open-source implementation of the H.264 standard. H.264 is already implemented and natively supported on most Android devices, and the encoding rate is reasonable.
  • Codec agnostic: In the future, we wanted to support H.265 and its open-source implementation. Thus the protocol had to be codec agnostic.
  • To use hybrid encryption: Most of the listed protocols do not use a hybrid encryption approach. We wanted our protocol to have better authentication and encryption mechanisms, and that's why we decided to use hybrid encryption built on RSA and AES-GCM. To implement the encryption correctly, we changed the AES key and IV on every packet frame sent.
  • Binary-based: LTE is usually sold through monthly data plans, and these plans are generally only a couple of gigabytes. So we ended up making a binary-based protocol. Any other kind of protocol, especially a semantic (text-based) one, would result in higher data consumption.
  • Adaptive bitrate: The LTE network bandwidth depends on how strong a radio signal your device has. The weaker the signal, the lower the bandwidth. We had to implement an adaptive bitrate strategy, which lowered the resolution when the signal was weaker. This way, you could receive frames no matter how strong your LTE cell signal was.
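The adaptive bitrate requirement can be sketched as a small controller that smooths throughput measurements and picks the highest resolution that fits. This is not our original implementation; the resolution ladder, the headroom factor, and the smoothing weight are all assumed values chosen for illustration.

```python
# (label, width, height, video_kbps), lowest first. Values are assumptions.
LADDER = [
    ("240p", 426, 240, 300),
    ("360p", 640, 360, 700),
    ("480p", 854, 480, 1200),
    ("720p", 1280, 720, 2500),
]

class BitrateController:
    """Choose an encoding rung from smoothed throughput measurements."""

    def __init__(self, ladder=LADDER, headroom=0.8, alpha=0.3):
        self.ladder = ladder
        self.headroom = headroom      # use only part of the estimated bandwidth
        self.alpha = alpha            # weight of the newest measurement
        self.estimate_kbps = None

    def report_throughput(self, measured_kbps):
        """Feed one measurement and return the rung to encode at."""
        if self.estimate_kbps is None:
            self.estimate_kbps = float(measured_kbps)
        else:
            # Exponentially weighted moving average.
            self.estimate_kbps = (self.alpha * measured_kbps
                                  + (1 - self.alpha) * self.estimate_kbps)
        return self.current_rung()

    def current_rung(self):
        if self.estimate_kbps is None:
            return self.ladder[0]
        budget = self.estimate_kbps * self.headroom
        chosen = self.ladder[0]       # always send something, even on a weak signal
        for rung in self.ladder:
            if rung[3] <= budget:
                chosen = rung
        return chosen

controller = BitrateController()
good_signal = controller.report_throughput(4000)
after_drop = controller.report_throughput(200)
```

The moving average keeps the encoder from switching resolution on every short dip, while the fallback to the lowest rung keeps frames flowing when the signal is poor.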

Our proof-of-concept implementation managed to fulfill these requirements. The finished network protocol was fast enough and binary-based. It supported adaptive bitrate and was codec agnostic.

In the diagram, you can see a sample datagram of this protocol. We kept to an MTU of 1500 bytes so that the protocol would work with all kinds of equipment, not only equipment that supports jumbo frames.

We used a UUIDv4 and an RSA signature for authentication purposes. After that come several fields: an index counter, a date, the packet size, and an array of bytes. The implementation split each H.264 frame into multiple UDP packets and sent them together. The server combined them back into the H.264 frame and appended it to the corresponding file.
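The following sketch shows how such a datagram could be packed, split, and reassembled. The exact field order and sizes are assumptions made for this example (a 256-byte signature assumes RSA-2048, and the date is stored as a millisecond timestamp); the `sign` callable is a hypothetical placeholder, not a real signer.

```python
import struct
import time
import uuid

MTU = 1500
IP_UDP_OVERHEAD = 20 + 8      # IPv4 header without options + UDP header
SIGNATURE_LEN = 256           # assumption: RSA-2048 signature

# Assumed layout: device UUID, signature, index, timestamp (ms), payload size.
HEADER = struct.Struct(f"!16s{SIGNATURE_LEN}sIQH")
MAX_PAYLOAD = MTU - IP_UDP_OVERHEAD - HEADER.size

def pack_packet(device_id, signature, index, timestamp_ms, payload):
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in one datagram")
    header = HEADER.pack(device_id.bytes, signature, index, timestamp_ms, len(payload))
    return header + payload

def unpack_packet(datagram):
    raw_id, signature, index, timestamp_ms, size = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated datagram")
    return uuid.UUID(bytes=raw_id), signature, index, timestamp_ms, payload

def split_frame(device_id, frame, first_index, sign):
    """Split one encoded frame into datagrams that fit the MTU."""
    timestamp_ms = int(time.time() * 1000)
    packets = []
    for n, offset in enumerate(range(0, len(frame), MAX_PAYLOAD)):
        chunk = frame[offset:offset + MAX_PAYLOAD]
        packets.append(pack_packet(device_id, sign(chunk), first_index + n,
                                   timestamp_ms, chunk))
    return packets

def reassemble(datagrams):
    """Order datagrams by index and join their payloads."""
    parts = sorted((unpack_packet(d) for d in datagrams), key=lambda p: p[2])
    return b"".join(p[4] for p in parts)

def dummy_sign(chunk):
    # Placeholder only: a real implementation would return an RSA signature.
    return bytes(SIGNATURE_LEN)

device = uuid.uuid4()
frame = bytes(range(256)) * 20                     # stand-in for an H.264 frame
packets = split_frame(device, frame, first_index=0, sign=dummy_sign)
restored = reassemble(list(reversed(packets)))     # UDP may reorder packets
```

With this assumed layout the header takes 286 bytes, leaving 1186 bytes of payload per datagram, so a per-packet signature is a noticeable share of the data budget. Because the index travels in every packet, the server can also detect gaps and decide whether to drop or request a frame again.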

During our tests of the protocol, we saw that it is better to have the adaptive logic at the codec level. For example, a simple JPEG stream performed much better when the signal was weaker.

In the next part, we shall discuss how we created our body camera device and its software. In the third and final part, we shall discuss our streaming server implementation, present the budget, and explain why the whole business model did not work as expected.
