VIDEO PROCESSING METHOD AND ASSOCIATED SYSTEM ON CHIP
20230237621 · 2023-07-27
Assignee
Inventors
CPC classification
G06V10/25
PHYSICS
G06T3/4053
PHYSICS
International classification
G06T3/40
PHYSICS
G06V10/25
PHYSICS
Abstract
The present invention provides a system on chip (SoC) including a recognition circuit and a processing circuit. The recognition circuit is configured to obtain image data from an image capturing device and perform a recognition operation on the image data to generate a recognition result. The processing circuit is coupled to the recognition circuit and is configured to determine a region of interest (ROI) in the image data according to the recognition result, perform an image enhancement operation on the ROI to generate an enhanced region, and combine the enhanced region with the image data to generate processed image data.
Claims
1. A system on chip (SoC), comprising: a recognition circuit, configured to obtain image data from an image capturing device, and perform a recognition operation on the image data to generate a recognition result; and a processing circuit, coupled to the recognition circuit, configured to determine a region of interest (ROI) in the image data according to the recognition result, perform an image enhancement operation on the ROI to generate an enhanced region, and combine the enhanced region with the image data to generate processed image data.
2. The SoC of claim 1, wherein the processing circuit overlays the enhanced region onto a specific area of the image data to generate the processed image data.
3. The SoC of claim 2, wherein the specific area does not overlap the ROI.
4. The SoC of claim 2, wherein the processing circuit performs an enlargement operation and a resolution enhancement operation on the ROI to generate the enhanced region.
5. The SoC of claim 1, wherein the recognition circuit is a person recognition circuit, the person recognition circuit performs a person recognition operation on the image data to generate the recognition result, and the SoC further comprises: a sound detection circuit, configured to receive a plurality of sound signals from a plurality of microphones, and detect a position/direction of a main sound to generate a sound detection result; wherein the processing circuit determines a region where a speaker is located in the image data according to the recognition result and the sound detection result, as the ROI.
6. The SoC of claim 5, wherein the recognition result comprises a plurality of regions, each region comprises a person, and the processing circuit refers to the recognition result and the sound detection result to select one of the regions to serve as the ROI.
7. The SoC of claim 1, wherein the SoC is used in an electronic device, and the processed image data is transmitted from the electronic device to another electronic device via network.
8. A video processing method, comprising: obtaining image data from an image capturing device, and performing a recognition operation on the image data to generate a recognition result; determining a region of interest (ROI) in the image data according to the recognition result; performing an image enhancement operation on the ROI to generate an enhanced region; and combining the enhanced region with the image data to generate processed image data.
9. The video processing method of claim 8, wherein the step of combining the enhanced region with the image data to generate the processed image data comprises: overlaying the enhanced region onto a specific region of the image data to generate the processed image data.
10. The video processing method of claim 8, wherein the step of performing the recognition operation on the image data to generate the recognition result is to perform a person recognition operation on the image data to generate the recognition result, and the video processing method further comprises: receiving a plurality of sound signals from a plurality of microphones, and detecting a position/direction of a main sound to generate a sound detection result.
11. A system on chip (SoC) positioned in an electronic device, comprising: a recognition circuit, configured to obtain image data from an image capturing device, and perform a recognition operation on the image data to generate a recognition result; and a processing circuit, coupled to the recognition circuit, configured to determine whether an operating state of the electronic device or characteristics of the image data meet a condition; and if the condition is met, the processing circuit determines a region of interest (ROI) in the image data according to the recognition result, performs an image enhancement operation on the ROI to generate an enhanced region, and combines the enhanced region with the image data to generate processed image data.
12. The SoC of claim 11, wherein if the condition is not met, the processing circuit does not perform the image enhancement operation on the ROI to generate the enhanced region.
13. The SoC of claim 11, wherein the processing circuit determines whether the operating state of the electronic device or the characteristics of the image data meet the condition according to a network speed or throughput of the electronic device, or a resolution of the image data.
14. The SoC of claim 13, wherein when the network speed or the throughput of the electronic device is lower than a threshold or within a range, the processing circuit determines that the operating state of the electronic device or the characteristics of the image data meet the condition.
15. The SoC of claim 13, wherein when the resolution of the image data is lower than a threshold or within a range, the processing circuit determines that the operating state of the electronic device or the characteristics of the image data meet the condition.
16. The SoC of claim 13, wherein when the network speed or the throughput of the electronic device is greater than a threshold, and the resolution of the image data is greater than another threshold, the processing circuit determines that the operating state of the electronic device or the characteristics of the image data do not meet the condition.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0008]
[0009]
[0010]
[0011]
[0012]
[0013]
DETAILED DESCRIPTION
[0014]
[0015] As described in the prior art, the image frame transmitted by the electronic device 110 or the electronic device 120 may have poor quality due to the camera, the codec circuit or the network transmission speed, causing trouble for both parties. For example, when one party in the remote video conference includes multiple participants in the video, poor image quality may make it difficult for the other party's participants to tell who is speaking in the video. Therefore, a system on chip (SoC) in the electronic device 110 of this embodiment provides a method that enhances a region of interest (ROI) in the image, for example, to highlight the person speaking, so that a participant in the second conference room can clearly know which participant in the first conference room is speaking, thereby solving the above problems.
[0016]
[0017] In the SoC 200, the person recognition circuit 210 is used to identify persons in the image data received from the image capturing device 202, i.e., to determine whether there is a person in the received image data, and to determine a characteristic value of each person and the position/region of each person in the image. Specifically, the person recognition circuit 210 can use a deep learning or neural network module to process each frame in the image data, such as using multiple different convolution filters to perform convolution operations on the frame (image frame) to identify whether there is a person in the frame. In addition, for each detected person, a characteristic value (or a characteristic value of the region where the person is located) is determined by the aforementioned deep learning or neural network module, where the characteristic value can be represented as a multidimensional vector, such as a 512-dimensional vector. It is noted that circuit designs related to person recognition are well known to those skilled in the art, and one of the main features of this embodiment lies in the applications of the person identified by the person recognition circuit 210 and the characteristic value thereof, so other details of the person recognition circuit 210 are not described here.
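A typical use of such characteristic values is deciding whether two detections belong to the same person by comparing their vectors. The following is a minimal sketch, not the patent's actual circuit; the `same_person` helper, the cosine-similarity metric, and the 0.6 threshold are illustrative assumptions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two characteristic (feature) vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(feat_a: np.ndarray, feat_b: np.ndarray,
                threshold: float = 0.6) -> bool:
    """Treat two detections as the same person when their 512-dimensional
    characteristic vectors are sufficiently similar (threshold is assumed)."""
    return cosine_similarity(feat_a, feat_b) >= threshold

# Example: identical vectors are maximally similar.
v = np.ones(512)
print(same_person(v, v))   # True
print(same_person(v, -v))  # False
```

The comparison metric and threshold would in practice depend on how the neural network module was trained.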
[0018] The voice activity detection circuit 220 is used to receive the sound signals from the microphones 204_1-204_N and to determine whether there are voice components in the sound signals. Specifically, the voice activity detection circuit 220 mainly performs the following operations: performing a noise reduction operation on each received sound signal, converting the sound signal into the frequency domain, and processing blocks to obtain characteristic values; the characteristic values are then compared with a reference value to determine whether the sound signal is a voice signal. It is noted that since the circuit design of the voice activity detection circuit 220 is well known to those skilled in the art, and one of the main features of this embodiment is to perform the follow-up operations according to the determination result of the voice activity detection circuit 220, other details of the voice activity detection circuit 220 are not described here. In another embodiment, the voice activity detection circuit 220 may receive the sound signals from only some of the microphones 204_1-204_N, rather than from all of them.
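The frequency-domain comparison in paragraph [0018] can be sketched with a crude energy-based check. This is only an illustrative stand-in for the circuit's characteristic-value comparison; the speech band, sample rate, and reference energy below are all assumed values, and real voice activity detection uses richer features.

```python
import numpy as np

def is_voice(frame: np.ndarray, sample_rate: int = 16000,
             band=(300.0, 3400.0), reference: float = 1e-3) -> bool:
    """Crude voice-activity check: move the frame to the frequency domain
    and compare the energy in a typical speech band with a reference value."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = spectrum[mask].sum() / len(frame)
    return band_energy > reference

# A 1 kHz tone (inside the speech band) vs. silence.
t = np.arange(1600) / 16000.0
print(is_voice(np.sin(2 * np.pi * 1000 * t)))  # True
print(is_voice(np.zeros(1600)))                # False
```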
[0019] Regarding the operation of the sound direction detection circuit 230, since the positions of the microphones 204_1-204_N on the electronic device 110 are known, the sound direction detection circuit 230 can determine an azimuth of the main sound in the first conference room according to the time differences of the sound signals from the microphones 204_1-204_N (that is, the phase differences between the received sound signals). That is, the sound direction detection circuit 230 determines the direction and angle of the main speaker relative to the electronic device 110. In this embodiment, the sound direction detection circuit 230 determines only one direction; that is, if multiple people are talking at the same time in the first conference room, the circuit determines from which direction the main sound comes according to some characteristics (e.g., signal strength) of the received sound signals. It is noted that since the circuit design of the sound direction detection circuit 230 is well known to those skilled in the art, and one of the main features of this embodiment is to perform the follow-up operations according to the detection result of the sound direction detection circuit 230, other details of the sound direction detection circuit 230 are not described here.
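The time-difference-to-azimuth relation described in paragraph [0019] can be illustrated with the standard far-field formula for a two-microphone pair, sin(theta) = c * delay / spacing. This is a generic sketch, not the patent's circuit; the microphone spacing and two-element geometry are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air

def azimuth_from_tdoa(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field azimuth (degrees) of the main sound for a two-microphone
    pair, derived from the arrival-time difference between the microphones."""
    s = SPEED_OF_SOUND * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.degrees(math.asin(s))

# No delay: the source is broadside (0 degrees).
print(azimuth_from_tdoa(0.0, 0.1))  # 0.0
# A delay of spacing/c puts the source at the end-fire direction (90 degrees).
print(round(azimuth_from_tdoa(0.1 / SPEED_OF_SOUND, 0.1), 1))  # 90.0
```

With more than two microphones, the same principle extends to pairwise delays combined by least squares or beamforming.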
[0020]
[0021] In Step 308, the processing circuit 240 determines which person in the image (image frame) is speaking by using the regions, determined by the person recognition circuit 210, where each person is located in the frame (for example, the regions 410-430 in
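Combining the person regions with the detected sound azimuth can be sketched as mapping the azimuth to a horizontal pixel position and picking the region that contains it. This is an illustrative assumption about Step 308, not the disclosed implementation; the linear camera model, the 90-degree field of view, and the region coordinates are hypothetical.

```python
def select_speaker_region(regions: dict, azimuth_deg: float,
                          frame_width: int = 1280,
                          fov_deg: float = 90.0):
    """Map the sound azimuth to a horizontal pixel position (assumed
    linear mapping across the camera's field of view) and return the
    person region whose horizontal span contains it, or None."""
    x = (azimuth_deg / fov_deg + 0.5) * frame_width
    for name, (left, right) in regions.items():
        if left <= x < right:
            return name
    return None

# Hypothetical horizontal spans for the regions 410-430.
regions = {"region_410": (0, 400),
           "region_420": (400, 900),
           "region_430": (900, 1280)}
print(select_speaker_region(regions, 0.0))  # region_420 (sound from center)
```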
[0022] In Step 312, the processing circuit 240 combines the enhanced region 420′ with the frame 400; for example, the enhanced region 420′ directly covers an area of the frame 400 to generate a processed frame 500. In one embodiment, the processing circuit 240 may directly overlay the enhanced region 420′ onto a specific position of the frame 400, such as the lower left corner or the lower right corner, to generate the processed frame 500. In another embodiment, the processing circuit 240 may detect an area in the frame 400 where no person appears, and directly overlay the enhanced region 420′ onto that area to generate the processed frame 500, so as to prevent the enhanced region 420′ from covering the person(s) in the regions 410-430.
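The covering operation of Step 312 amounts to overwriting a rectangular area of the frame with the enhanced crop. A minimal sketch, assuming 8-bit RGB frames held as NumPy arrays; the frame size, crop size, and corner placement below are illustrative.

```python
import numpy as np

def overlay_region(frame: np.ndarray, enhanced: np.ndarray,
                   top: int, left: int) -> np.ndarray:
    """Cover (overwrite) an area of the frame with the enhanced region,
    e.g. place the enlarged speaker crop in a corner of the frame."""
    out = frame.copy()
    h, w = enhanced.shape[:2]
    out[top:top + h, left:left + w] = enhanced
    return out

frame = np.zeros((720, 1280, 3), dtype=np.uint8)        # hypothetical frame 400
enhanced = np.full((240, 320, 3), 255, dtype=np.uint8)  # hypothetical region 420'
# Lower-left corner placement.
processed = overlay_region(frame, enhanced, top=720 - 240, left=0)
print(processed[600, 100].tolist())  # [255, 255, 255]
```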
[0023] In Step 314, the processing circuit 240 transmits the processed image data (i.e., the processed frame 500) to a back-end circuit for other image processing, and the processed image data is then transmitted to the electronic device 120 located in the second conference room through the network, so that the participants in the second conference room can clearly know who is currently speaking in the first conference room. In this embodiment, the electronic device 120 located in the second conference room receives only the processed image data (i.e., the processed frame 500), not the original image data (i.e., the frame 400).
[0024] In one embodiment, the processing circuit 240 continues to track the previously enhanced region, and continues to process the image data from the image capturing device 202 to generate the processed image data. Specifically, the person recognition circuit 210 can continuously determine the region where each person is located in the frame and its characteristic value, and the processing circuit 240 can continue to highlight this person in the following frames according to the characteristic value of the previously highlighted person. Taking the region 420 in
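The tracking described in paragraph [0024] can be sketched as matching the previously highlighted person's characteristic value against the detections in the current frame. This is an illustrative assumption about the matching step; the `find_tracked_person` helper, the cosine-similarity metric, and the 0.6 threshold are not from the disclosure.

```python
import numpy as np

def find_tracked_person(prev_feature: np.ndarray, detections: list,
                        threshold: float = 0.6):
    """Among (region, feature_vector) pairs detected in the current frame,
    return the region whose characteristic vector best matches the
    previously highlighted person, or None when no match is good enough."""
    best_box, best_score = None, threshold
    for box, feat in detections:
        score = float(np.dot(prev_feature, feat) /
                      (np.linalg.norm(prev_feature) * np.linalg.norm(feat)))
        if score >= best_score:
            best_box, best_score = box, score
    return best_box

# Hypothetical 3-d features standing in for the 512-d vectors.
prev = np.array([1.0, 0.0, 0.0])
dets = [("region_410", np.array([0.0, 1.0, 0.0])),
        ("region_420", np.array([0.9, 0.1, 0.0]))]
print(find_tracked_person(prev, dets))  # region_420
```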
[0025] As shown in the flowchart of
[0026] In one embodiment, the SoC 200 may determine whether to use the video processing method shown in
[0027] In Step 604, because the network speed is slow or the resolution of the image data is low, the SoC 200 executes Steps 302-314 shown in
[0028] In Steps 606 and 608, because the network speed is high and/or the resolution of the image data is high, the SoC 200 does not perform Steps 302-314 shown in
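The condition check of Steps 602-608 can be sketched as a simple threshold decision: enhance the ROI when the network speed/throughput or the image resolution is low, and skip the enhancement when both are high. The thresholds below (10 Mbps, 720p) are illustrative assumptions, not values from the disclosure.

```python
def should_enhance(network_mbps: float, resolution_px: int,
                   speed_threshold: float = 10.0,
                   resolution_threshold: int = 1280 * 720) -> bool:
    """Decide whether to run the ROI-enhancement flow (Steps 302-314):
    run it when the network speed is below a threshold or the image
    resolution is below a threshold; skip it when both are high."""
    return (network_mbps < speed_threshold
            or resolution_px < resolution_threshold)

print(should_enhance(2.0, 1920 * 1080))   # True  (slow network)
print(should_enhance(50.0, 640 * 480))    # True  (low resolution)
print(should_enhance(50.0, 1920 * 1080))  # False (both high, skip)
```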
[0029] In the embodiment shown in
[0030] It should be noted that, in the embodiments shown in
[0031] Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.