What are speaker tracking and auto framing?

by Nexvoo Shop October 13, 2021

Whether you are working from home during COVID or holding business meetings with people in other countries, distance is no longer a restriction, thanks to advanced video conferencing equipment. Video conferencing cameras make it easy to get connected and make communication more efficient.

With AI-powered video conferencing cameras, every meeting participant in the room can be seen and heard clearly. No one needs to worry about improper framing that includes too many or too few people, because the AI-powered video conferencing equipment we use has speaker tracking and auto framing functions.

Video conferencing cameras can accurately capture and track the speaker in real time, no matter where he or she is located, because these two functions work together. In this article, we will explain in detail what speaker tracking and auto framing are in a video conference camera.

What is speaker tracking?

Speaker tracking means that the camera identifies the person who is currently speaking in the room and keeps focusing on that person until he or she stops. The speaker tracking function makes the video conference smoother, and every participant can clearly see the expression and body language of the speaker, just as if the speaker were talking right in front of them.

Here is what speaker tracking does: when the speaker stops talking, the camera continues to focus on that person for a few seconds. Once it captures a new speaker, it switches the focus to the new speaker. If no one speaks for a while, the camera automatically zooms out and frames the entire team or meeting room.
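To make this behavior concrete, here is a minimal sketch of the hold, switch, and zoom-out logic in Python. It is only an illustration, not how any particular camera is implemented: the class name `SpeakerTracker`, the `update` method, and the 5-second `SILENCE_TIMEOUT_S` value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value: how long the camera stays on the last speaker during silence
# before it zooms out to the whole room. Real products choose their own timing.
SILENCE_TIMEOUT_S = 5.0


@dataclass
class SpeakerTracker:
    """Hypothetical tracker: decides who the camera should frame."""
    target: Optional[str] = None   # framed person; None means the wide room view
    last_voice_time: float = 0.0   # last time the framed person was heard

    def update(self, now: float, active_speaker: Optional[str]) -> Optional[str]:
        if active_speaker is not None:
            # Someone is talking: keep the current speaker or switch to the new one.
            self.target = active_speaker
            self.last_voice_time = now
        elif self.target is not None and now - self.last_voice_time >= SILENCE_TIMEOUT_S:
            # Nobody has spoken for a while: go back to framing the whole room.
            self.target = None
        # During a short silence the target is left unchanged (the "hold").
        return self.target


tracker = SpeakerTracker()
print(tracker.update(0.0, "alice"))  # alice starts talking
print(tracker.update(2.0, None))     # short pause: camera stays on alice
print(tracker.update(3.0, "bob"))    # bob starts talking: camera switches
print(tracker.update(8.0, None))     # long silence: wide view (None)
```

A single timeout covers both rules in the text: the few seconds of hold after the speaker stops, and the zoom-out once the silence lasts long enough.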

What does speaker tracking need in order to work? The beamforming microphones and the camera need to work together for optimum results. The microphones first detect the source of the sound and guide the camera to focus there. The camera then pans, tilts, and zooms, physically or digitally, according to the location of the sound source. No matter how far from or close to the camera you are, you have the full focus of the room when you talk.
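As a rough illustration of how a microphone array can point a camera at a sound, the sketch below estimates the direction of a voice from the tiny difference in arrival time between two microphones. This is a simplified stand-in for a full beamforming array, not the method of any specific product. It assumes only two microphones, a sound source that is far away compared with the microphone spacing, and a speed of sound of 343 m/s; the function name `pan_angle_degrees` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed room conditions)


def pan_angle_degrees(delay_s: float, mic_spacing_m: float) -> float:
    """Angle of the sound source away from straight ahead, in degrees.

    delay_s is how much earlier one microphone hears the sound than the other.
    Far-field, two-microphone model: sin(angle) = speed_of_sound * delay / spacing.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it
    # so that asin stays within its valid domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Example: microphones 10 cm apart, sound arrives 0.1 ms earlier on one side.
angle = pan_angle_degrees(delay_s=0.0001, mic_spacing_m=0.10)
print(f"pan the camera about {angle:.1f} degrees off center")
```

A zero delay means the speaker is straight ahead, and the larger the delay, the further the camera has to pan toward the side that heard the voice first.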

The video conference cameras we used in the past could also capture the entire conference room and all the participants, but faraway participants appeared very small on the screen and could not be seen or heard clearly by the people on the other side of the screen. This situation is very common, especially in medium and large conference rooms. Speaker tracking video conference cameras with AI video functions solve this problem, giving all participants a smooth, cinema-level video conference that improves work efficiency and reduces communication costs.

If you can’t imagine how this function works from a text description alone, let me introduce a cost-effective product on the market as an example. The Nexvoo N110 is an all-in-one video bar equipped with a 4K UHD camera, a 120-degree FOV, and a 6-meter pickup distance.

The N110 uses advanced AI algorithms to achieve intelligent speaker tracking, ensuring everyone in the meeting room is well captured and included. Finer details and richer image texture make the meeting more enjoyable.

What is auto framing?

Auto framing combines face recognition, a composition algorithm based on the rule of thirds, and pixel-wise super-resolution to create a portrait with the best composition from the image containing the detected face. Using real-time face detection and location tracking, the auto framing function automatically recognizes all participants in the conference room and adjusts the camera based on their number and location to cover every participant in the conference.
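The framing step can be sketched in a few lines of Python. The example below takes face boxes that some face detector has already produced and computes a crop that contains all of them, padded with a margin and matched to the shape of the video frame. It is a simplified sketch under stated assumptions: the function name `frame_participants` and the 25% margin are made up for illustration, the crop is simply centered on the group rather than composed by the rule of thirds, and no super-resolution is applied.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def frame_participants(faces: List[Box], frame_w: float, frame_h: float,
                       margin: float = 0.25) -> Box:
    """Return a crop with the frame's aspect ratio that contains every face box."""
    if not faces:
        return (0.0, 0.0, frame_w, frame_h)  # nobody detected: show the whole room

    # Smallest rectangle that encloses all detected faces.
    left = min(f[0] for f in faces)
    top = min(f[1] for f in faces)
    right = max(f[2] for f in faces)
    bottom = max(f[3] for f in faces)

    # Add breathing room on every side (margin is an assumed fraction).
    w = (right - left) * (1 + 2 * margin)
    h = (bottom - top) * (1 + 2 * margin)

    # Grow the shorter dimension so the crop has the same shape as the frame.
    aspect = frame_w / frame_h
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # The crop can never be larger than the full camera image.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale

    # Center the crop on the group, then slide it back inside the image if needed.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - w / 2, 0.0), frame_w - w)
    y0 = min(max(cy - h / 2, 0.0), frame_h - h)
    return (x0, y0, x0 + w, y0 + h)


# One person near the center of a 1920x1080 image, then a second person joins.
alone = frame_participants([(900, 500, 1020, 580)], 1920, 1080)
both = frame_participants([(900, 500, 1020, 580), (100, 200, 220, 280)], 1920, 1080)
print(alone)
print(both)
```

Running the function again whenever the set of detected faces changes gives the behavior described here: a tight view for one person, and a wider crop as soon as someone else enters the picture.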

For example, when you use the Nexvoo N109, participants are detected as they enter or leave the detection area. If you are alone in the meeting room, the auto framing function keeps the camera focused on you and follows you when you change position, keeping you in the center of the frame. When you move from the front of the camera to a corner of the meeting room, the camera adjusts its zoom to capture you in the corner and frame you in the center of the screen.

If a new participant joins, the camera will zoom out to include the new participant. This makes the view complete with all participants without anyone having to move a camera that is already set up. The auto framing function takes automatic focusing a step further: during a video conference, you no longer have to worry about adjusting camera angles and can focus on communication.

The auto framing function can be applied to meeting rooms of various sizes. In medium and large conference rooms, when the number of participants reaches a certain level, the auto framing function widens its recognition range to capture more participants.

You can't force all participants to stay fixed in one position at a fixed distance from the camera, so you need the auto framing function of speaker tracking video conference equipment to save you from this awkward situation. In this case, the camera needs a longer focal length and a wider field of view to frame participants in as many positions as possible. For example, the Nexvoo N120, a dual-cam video bar with a long focal length covering up to 6 meters and a 120-degree FOV, can make video conferences in medium or large conference rooms go smoothly.

Speaker tracking and auto framing working together to achieve intelligent tracking

Although the speaker tracking and auto framing functions are based on different principles, they are not completely independent and can work together. The auto framing function instantly captures the people in the conference room and detects their real-time locations. Based on this, the speaker tracking function can identify the speaker in real time, zoom in, and center the speaker in the view. The Nexvoo N110 and N120, for example, both have speaker tracking and auto framing.
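One simple way to picture this cooperation is to match the direction of the sound to the nearest detected face. The Python sketch below does exactly that. It is an illustration under assumptions, not a description of how the N110 or N120 work internally: the function names, the 10-degree tolerance, and the ideal distortion-free lens model are all hypothetical, while the 120-degree FOV is the figure quoted in this article.

```python
import math
from typing import List, Optional


def face_angle_degrees(face_center_x: float, frame_w: float,
                       hfov_deg: float = 120.0) -> float:
    """Horizontal angle of a face relative to the camera axis (ideal lens assumed)."""
    # Focal length in pixels for a lens with the given horizontal field of view.
    focal_px = (frame_w / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((face_center_x - frame_w / 2) / focal_px))


def pick_speaker(face_centers_x: List[float], sound_angle_deg: float,
                 frame_w: float, max_error_deg: float = 10.0) -> Optional[int]:
    """Index of the face closest to the sound direction, or None if none is close."""
    best_index, best_error = None, max_error_deg
    for i, x in enumerate(face_centers_x):
        error = abs(face_angle_degrees(x, frame_w) - sound_angle_deg)
        if error <= best_error:
            best_index, best_error = i, error
    return best_index


# Three faces in a 1920-pixel-wide image: far left, center, far right.
faces_x = [0.0, 960.0, 1920.0]
print(pick_speaker(faces_x, sound_angle_deg=3.0, frame_w=1920))   # 1 (center)
print(pick_speaker(faces_x, sound_angle_deg=55.0, frame_w=1920))  # 2 (far right)
```

Returning `None` when no face lies near the sound direction lets the camera ignore noises such as a door closing instead of zooming in on an empty part of the room.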

The N120 is equipped with dual cameras, a long focal length, and an ultra-wide FOV. It can also automatically optimize the brightness according to the lighting conditions, avoiding the negative effects of a meeting room that is too dark, so that the detailed actions of all participants in the conference room can be seen more clearly by the other party.