Recently, our team needed to handle some image processing and object tracking tasks. One of the challenges is performing a large amount of computation on a low-cost platform.
So, in my free time, I have been trying to figure out how images are processed on Nvidia’s platform and how we can accelerate that process. The following is part of my notes for this small personal project.
What do you need for this exercise?
- Nvidia Jetson Nano developer kit: LINK
- A high-speed micro SD card with a card reader.
- 5V, 4A power supply for Jetson Nano. This one should work fine: LINK
- A screen with HDMI input.
- A keyboard and a computer mouse.
- 4K MP4 video example from here: LINK
Please rename it to “sample2.mp4” for the later exercises.
Also make sure it is a 4K video encoded with H.264.
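If you want to check the file from Python instead of the file properties dialog, here is a minimal sketch. The helper names `fourcc_to_str` and `describe_video` are my own, not part of OpenCV. It assumes OpenCV can open the file at all, and that a 4K file reports 3840*2160. For H.264 in an MP4 container the codec tag is typically reported as something like `avc1` or `h264`, but that can depend on the backend.

```python
def fourcc_to_str(code: int) -> str:
    """Decode OpenCV's integer FOURCC value into its four-character tag."""
    return "".join(chr((int(code) >> (8 * i)) & 0xFF) for i in range(4))


def describe_video(path: str) -> dict:
    """Return width, height, nominal FPS and codec tag of a video file."""
    import cv2  # imported here so fourcc_to_str can be used without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"Cannot open {path}")
    try:
        return {
            "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            "fps": cap.get(cv2.CAP_PROP_FPS),
            "codec": fourcc_to_str(int(cap.get(cv2.CAP_PROP_FOURCC))),
        }
    finally:
        cap.release()
```

Calling `describe_video("sample2.mp4")` on the Jetson should then return a dictionary with the resolution, the nominal frame rate and the codec tag of the sample file.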
What should you finish before this exercise?
Install the CUDA-enabled version of OpenCV:
Tutorial: Install OpenCV 4.5.0 on Jetson Nano
This part will take a couple of hours to complete. Make sure you finish it before you jump into this exercise. If you install the regular OpenCV version without CUDA support, it will not work for this exercise.
After you complete the installation, reboot and check OpenCV in Jtop. It should display “compiled with CUDA: YES”, as in the following picture.
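Besides Jtop, you can also ask OpenCV itself. The sketch below parses the text returned by `cv2.getBuildInformation()` and then asks `cv2.cuda.getCudaEnabledDeviceCount()` how many CUDA devices it sees. The function names are hypothetical, and I am assuming the build summary contains a line starting with "NVIDIA CUDA:", which is what I see on CUDA-enabled builds.

```python
import re


def cuda_enabled_in_build(build_info: str) -> bool:
    """Return True if the build summary has an 'NVIDIA CUDA: YES' line."""
    for line in build_info.splitlines():
        if re.match(r"\s*NVIDIA CUDA:", line):
            return "YES" in line
    return False


def check_opencv_cuda() -> bool:
    """Check both the build flags and the number of visible CUDA devices."""
    import cv2  # imported here so the parser above works without OpenCV

    compiled = cuda_enabled_in_build(cv2.getBuildInformation())
    devices = cv2.cuda.getCudaEnabledDeviceCount() if compiled else 0
    print(f"OpenCV {cv2.__version__}: CUDA compiled={compiled}, devices={devices}")
    return compiled and devices > 0
```

If `check_opencv_cuda()` returns False on the Jetson Nano, the installed OpenCV is most likely the regular build without CUDA.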
What happens when you try to play a 4K video with the built-in video player on Jetson Nano?
When you try to play the example video with the built-in video player, the image may look frozen and very laggy. My guess is that the video player is using software decoding instead of the HW decoder of the Jetson Nano.
First example: OpenCV with CUDA but no HW decoder.
Let’s name this file VideoPlay.py. The script needs two libraries: OpenCV (cv2), and time for calculating the FPS.
First, you have to tell OpenCV which MP4 file to open. We use the VideoCapture function for this.
cap = cv2.VideoCapture('sample2.mp4')
To fit the whole video picture on the screen, we need to resize it to a reasonable resolution. Since my screen has a resolution of 1920*1080, we will resize the video to 1280*720.
New = cv2.resize(frame, (1280, 720), fx=0, fy=0, interpolation=cv2.INTER_CUBIC)
Furthermore, we would like to see the FPS in real time while the video is playing, so let’s overlay it on the frame.
We place a timestamp before and after each frame is processed and displayed, so we can work out the elapsed time and, from it, the FPS.
In this case, we will use time() in Python. It returns the number of seconds since 1970/1/1 00:00:00 (UTC), including a fractional part. It looks something like this: 1612930094.2901552.
start = float(time())
Then, we can calculate the FPS and display it in the NEXT frame.
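As a rough sketch of that calculation: the FPS of one frame is simply 1 divided by the time that frame took. The helper names below are my own, and the text position, font and color passed to `cv2.putText` are arbitrary choices.

```python
from time import time


def fps_from_interval(start: float, end: float) -> float:
    """Convert the time spent on one frame (in seconds) into frames per second."""
    elapsed = end - start
    return 1.0 / elapsed if elapsed > 0 else 0.0


def timed_fps(process_frame) -> float:
    """Run one frame's worth of work and return the FPS it corresponds to."""
    start = time()
    process_frame()
    return fps_from_interval(start, time())


def draw_fps(frame, fps: float):
    """Write the FPS measured for the previous frame onto the current frame."""
    import cv2  # imported here so the math above works without OpenCV

    cv2.putText(frame, f"FPS: {fps:.1f}", (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    return frame
```

The value is drawn on the next frame because the time for the current frame is only known after it has already been displayed.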
We also record the start time and the end time of the script. By counting how many frames were processed and how much time has passed, we get an average FPS, which we can print in the terminal.
Putting it all together:
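A minimal sketch of what VideoPlay.py could look like, built only from the pieces described above. The window title, the "q" key to quit and the function names are my own assumptions, so treat it as an illustration rather than the exact script.

```python
from time import time


def average_fps(frame_count: int, elapsed: float) -> float:
    """Average FPS over the whole run: frames shown divided by seconds passed."""
    return frame_count / elapsed if elapsed > 0 else 0.0


def play(path: str = "sample2.mp4") -> float:
    """Decode, resize and display the video; return the average FPS."""
    import cv2  # imported here so average_fps can be used without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"Cannot open {path}")

    fps = 0.0      # FPS of the previous frame, drawn on the current one
    frames = 0
    script_start = time()
    while True:
        start = time()
        ok, frame = cap.read()
        if not ok:  # end of file or read error
            break
        new = cv2.resize(frame, (1280, 720), interpolation=cv2.INTER_CUBIC)
        cv2.putText(new, f"FPS: {fps:.1f}", (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("VideoPlay", new)
        frames += 1
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop early
            break
        elapsed = time() - start
        fps = 1.0 / elapsed if elapsed > 0 else fps

    total = time() - script_start
    cap.release()
    cv2.destroyAllWindows()
    avg = average_fps(frames, total)
    print(f"Average FPS: {avg:.2f}")
    return avg
```

Call `play()` at the end of the file to run it with the default sample2.mp4.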
This is the full example of using Python OpenCV with CUDA support. We expect it to be able to decode and play the 4K video, but at a low FPS.
Compared with the built-in video player, where the picture does not move at all, 10 FPS is OK but still feels rather poor. The original video should be 30 FPS according to the file description.
As you can see, NVDEC is OFF, which means the Python script is still NOT using the HW to decode the MP4 file.
The next question that instantly came to my mind during this study was: how can we use the HW decoder to accelerate it?
Second example: OpenCV with CUDA + HW decoder.
Luckily, I found an example that does exactly what I want to achieve.
The major difference between this code and the previous example is:
As far as I know, it uses a GStreamer pipeline to decode the file. Let’s talk about that later.
Replace the file read method and save the script as VideoPlay2.py.
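For reference, here is a sketch of how such a file read method can look. I am assuming an OpenCV build with GStreamer support and the `nvv4l2decoder` element from Nvidia's GStreamer plugins (older JetPack examples often use `omxh264dec` instead). The function names are hypothetical, and the exact pipeline in the example I found may differ.

```python
def hw_decode_pipeline(path: str) -> str:
    """Build a GStreamer pipeline that decodes H.264 on the HW decoder."""
    return (
        f"filesrc location={path} ! qtdemux ! h264parse ! "
        "nvv4l2decoder ! "                          # HW decoder (NVDEC)
        "nvvidconv ! video/x-raw, format=BGRx ! "   # copy out of NVMM memory
        "videoconvert ! video/x-raw, format=BGR ! "  # BGR is what OpenCV expects
        "appsink"
    )


def open_hw_decoded(path: str):
    """Open the file through GStreamer instead of the default backend."""
    import cv2  # imported here so the pipeline string can be built without OpenCV

    cap = cv2.VideoCapture(hw_decode_pipeline(path), cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise IOError("Pipeline failed to open; is OpenCV built with GStreamer?")
    return cap
```

The rest of the script stays the same: only `cv2.VideoCapture('sample2.mp4')` is swapped for `open_hw_decoded('sample2.mp4')`.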
Run the script again:
Also, check the Jtop information to confirm the HW decoder is working.
COOL! The HW decoder is actually working, and we can see the FPS rise from 10 to around 14~15. That is a great improvement.
Without HW decoding, the maximum FPS is normally around 15. With HW decoding enabled, on the other hand, it reaches up to 25 in some scenes.
Third example: OpenCV with CUDA + HW decoder, but resize in GStreamer.
When I use a GStreamer command to display the image directly, it turns out that GStreamer can play the 4K video resized to 1280*720 at 30 FPS. So I decided to let GStreamer resize the video instead of OpenCV.
Change the command in Python to:
And delete the OpenCV resize part. The Python code becomes:
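A sketch of the idea, under the same assumptions as before (OpenCV built with GStreamer support, the `nvv4l2decoder` element available, hypothetical function names). The only real change is that the caps after `nvvidconv` now also request a width and a height, so the scaling happens inside the pipeline.

```python
def hw_decode_resize_pipeline(path: str, width: int = 1280, height: int = 720) -> str:
    """Build a pipeline that decodes on the HW decoder and scales in nvvidconv."""
    return (
        f"filesrc location={path} ! qtdemux ! h264parse ! nvv4l2decoder ! "
        f"nvvidconv ! video/x-raw, width={width}, height={height}, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )


def open_resized(path: str, width: int = 1280, height: int = 720):
    """Open the file so that frames arrive in OpenCV already resized."""
    import cv2  # imported here so the pipeline string can be built without OpenCV

    cap = cv2.VideoCapture(hw_decode_resize_pipeline(path, width, height),
                           cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise IOError("Pipeline failed to open; is OpenCV built with GStreamer?")
    return cap
```

With this, each frame returned by `cap.read()` is already 1280*720, so it can go straight to `cv2.putText` and `cv2.imshow` without calling `cv2.resize`.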
The FPS is now around 30, which matches the video file description. So OpenCV was indeed the bottleneck in the previous example.
Also, check Jtop: the GPU utilization increases slightly and the CPU usage decreases, which is good. We can also tell that the HW decoder is actually working.
Also check the average FPS count, which matches what we expected.
In this case, we are using the GStreamer framework, which is an alternative to the Multimedia API (MMAPI), to help us accelerate the decoding process. Using existing hardware is a good approach if there is a hardware acceleration solution built into your system, since HW decoding is normally much faster than decoding on the CPU in the Jetson Nano’s case.