Patent classifications
H04N5/2624
FRONT IMAGE GENERATION DEVICE FOR HEAVY EQUIPMENT
Disclosed is a front image generation device for heavy equipment, which generates a composite front image by using two or more cameras. The front image generation device includes: an upper camera disposed on a wheel loader and configured to generate a first front image; a lower camera disposed on the wheel loader and configured to generate a second front image; an image processor configured to generate a composite front image by compositing the first front image and the second front image; and a display configured to display the composite front image generated by the image processor.
DYNAMIC VIDEO LAYOUT DESIGN DURING ONLINE MEETINGS
Presented herein are techniques for cropping video streams to create an optimized layout in which participants of a meeting are a similar size. A user device receives a plurality of video streams, each video stream including at least one face of a participant participating in a video communication session. Faces in one or more of the plurality of video streams are cropped so that faces in the plurality of video streams are approximately equal in size, to produce a plurality of processed video streams. The plurality of processed video streams are sorted according to video stream widths to produce sorted video streams and the plurality of sorted video streams are distributed for display across a smallest number of rows possible on a display of the user device.
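The sort-and-distribute step in this abstract can be sketched as a row-packing routine. This is a minimal illustration, not the claimed method: the function name `pack_rows`, the pixel widths, and the choice of a greedy first-fit-decreasing heuristic are all assumptions, and that heuristic does not guarantee the smallest possible number of rows in every case.

```python
from typing import List


def pack_rows(widths: List[int], display_width: int) -> List[List[int]]:
    """Distribute stream widths across rows with a first-fit-decreasing heuristic.

    Hypothetical helper: widths are the pixel widths of the cropped streams,
    display_width is the usable width of one row on the user device.
    """
    rows: List[List[int]] = []
    # Sort widest first so large streams claim row space before small ones.
    for width in sorted(widths, reverse=True):
        if width > display_width:
            raise ValueError("a stream is wider than the display")
        for row in rows:
            # Place the stream in the first row that still has room.
            if sum(row) + width <= display_width:
                row.append(width)
                break
        else:
            # No existing row fits, so open a new one.
            rows.append([width])
    return rows


layout = pack_rows([400, 300, 600, 200, 500], display_width=1000)
print(layout)
```

With these sample widths the five streams fit into two rows of exactly 1000 pixels each.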
Display method and video recording method
Provided are a display method and a video recording method for a user to perceive a position of an extraction range moving within an angle of view. A display method according to an aspect of the present invention includes an acquisition step of acquiring a reference video that is a motion picture, an extraction step of extracting an extraction video set to be smaller than an angle of view of the reference video within the angle of view from the reference video, a movement step of moving an extraction range of the extraction video over time, a first display step of displaying the extraction video on a display device, and a second display step of displaying a support video based on a positional relationship between the angle of view and the extraction range on the display device, in which the second display step is executed during execution of the first display step.
OPTICAL TRACKING DEVICE WITH BUILT-IN STRUCTURED LIGHT MODULE
A system is disclosed that includes an optical tracking device and a surgical computing device. The optical tracking device includes a structured light module and an optical module that includes an image sensor and is spaced from the structured light module at a known distance. The surgical computing device includes a display device, a non-transitory computer readable medium including instructions, and processor(s) configured to execute the instructions to generate a depth map from a first image captured by the image sensor during projection of a pattern into a surgical environment by the structured light module. The pattern is projected in a near-infrared (NIR) spectrum. The processor(s) are further configured to execute the stored instructions to reconstruct a 3D surface of anatomical structure(s) based on the generated depth map. Additionally, the processor(s) are configured to execute the stored instructions to output the reconstructed 3D surface to the display device.
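The abstract does not say how the depth map is computed. One common approach for a projector and image sensor separated by a known distance is triangulation, where depth equals focal length times baseline divided by the observed pattern shift. The sketch below assumes that relation; `depth_from_disparity` and its parameter names are hypothetical.

```python
from typing import Optional


def depth_from_disparity(
    disparity_px: float, focal_px: float, baseline_m: float
) -> Optional[float]:
    """Return depth in metres from the standard triangulation relation Z = f * b / d.

    Assumed inputs (not from the abstract):
      disparity_px: shift of the projected pattern in the image, in pixels
      focal_px:     focal length of the optical module, in pixels
      baseline_m:   known spacing between projector and sensor, in metres
    """
    if disparity_px <= 0:
        # No measurable shift means no depth can be recovered for this pixel.
        return None
    return focal_px * baseline_m / disparity_px


print(depth_from_disparity(disparity_px=40.0, focal_px=800.0, baseline_m=0.5))
```

Applying the function to every pixel with a valid pattern match would yield a depth map of the kind the abstract describes as the input to 3D surface reconstruction.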
System and Method for Attention Detection and Visualization
The attention level of participants is measured and then the resulting value is provided on a display of the participants. The participants are presented in a gallery view layout. The frame of each participant is colored to indicate the attention level. The entire window is tinted in colors representing the attention level. The blurriness of the participant indicates attention level. The saturation of the participant indicates attention level. The window sizes vary based on attention level. Color bars are added to provide indications of percentages of attention level over differing time periods. Neural networks are used to find the faces of the participants and then develop facial keypoint values which are used to determine gaze direction, which in turn is used to develop an attention score. The attention score is then used to determine the settings of the layout.
METHODS AND SYSTEMS OF COMBINING VIDEO CONTENT WITH ONE OR MORE AUGMENTATIONS TO PRODUCE AUGMENTED VIDEO
Data processing systems and methods are disclosed for combining video content with one or more augmentations to produce augmented video. Objects within video content may have associated bounding boxes that may each be associated with respective RGBA values. Upon user selection of a pixel, the RGBA value of the pixel may be used to determine a bounding box associated with the RGBA value. The client may transmit an indicator of the determined bounding box to an augmentation system to request augmentation data for the object associated with the bounding box. The system then uses the indicator to determine the augmentation data and transmits the augmentation data to the client device.
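The pixel-to-bounding-box lookup can be illustrated with a small table keyed by color. This is a sketch under stated assumptions: the abstract does not specify the data structure, so the exact-match dictionary, the `find_bounding_box` name, and the box identifiers below are hypothetical.

```python
from typing import Dict, Optional, Tuple

RGBA = Tuple[int, int, int, int]


def find_bounding_box(pixel: RGBA, box_by_color: Dict[RGBA, str]) -> Optional[str]:
    """Return the identifier of the bounding box whose RGBA value matches the pixel.

    Assumes each bounding box is tagged with a unique RGBA value and that the
    selected pixel carries that value exactly; returns None when nothing matches.
    """
    return box_by_color.get(pixel)


# Hypothetical table: two objects, each tagged with a distinct RGBA value.
boxes = {
    (255, 0, 0, 255): "box-player-7",
    (0, 255, 0, 255): "box-scoreboard",
}

selected = find_bounding_box((0, 255, 0, 255), boxes)
print(selected)
```

In the flow the abstract describes, the returned identifier would be the indicator sent to the augmentation system to request data for the selected object.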
AUDIO PROCESSING METHOD AND DEVICE
Embodiments of this application provide an audio processing method and a device. In a multi-channel video recording mode, a plurality of channels of video images and a plurality of channels of audio can be recorded simultaneously, and different audio can be played during video playback. A specific solution is as follows: After detecting an operation of opening a camera by a user, an electronic device displays a shooting preview interface, and then enters a multi-channel video recording mode. After detecting a shooting operation of the user, the electronic device displays a shooting interface, where the shooting interface includes a plurality of channels of video images. Then, the electronic device records the plurality of channels of video images, and records audio corresponding to each of the plurality of channels of video images based on a shooting angle of view corresponding to each channel of video image.
Audio Processing Method and Device
An audio processing method implemented by an electronic device includes entering a multi-channel video recording mode, detecting a shooting operation of a user, simultaneously recording, after detecting the shooting operation, a first video image and a second video image using a first camera and a second camera, and recording audio of a plurality of sound channels, where the audio includes panoramic audio, first audio corresponding to the first video image, and second audio corresponding to the second video image. The electronic device further records the first audio based on a feature value such as a zoom magnification corresponding to the first display area.
Video processing device and video processing method
A video processing device includes a state memory storing a plurality of setting states of each setting related to video processing; a state applying processor configured to apply the setting states to the settings related to the video processing; a history memory setting a series of changes in the settings related to the video processing as a change history of one group and storing a plurality of change histories of the one group; a history reproduction processor reproducing the series of changes of the settings; an execution sequence memory configured to store a sequence of the setting states to be applied among the plurality of setting states and the change histories; and a sequencer configured to execute the application of the setting states by the state applying processor and the reproduction of the change histories by the history reproduction processor in the sequence stored in the execution sequence memory.
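The relationship between the state memory, the history memory, and the sequencer can be sketched in software. This is only an illustration of the described data flow: the class name `Sequencer`, the `("state", name)` and `("history", name)` step encoding, and the sample settings are assumptions, and the abstract describes hardware components rather than this code.

```python
from typing import Dict, List, Tuple

Settings = Dict[str, float]
Change = Tuple[str, float]  # (setting name, new value)
Step = Tuple[str, str]      # ("state" or "history", stored name)


class Sequencer:
    """Applies stored setting states and replays stored change histories in order."""

    def __init__(self, states: Dict[str, Settings], histories: Dict[str, List[Change]]):
        self.states = states        # plays the role of the state memory
        self.histories = histories  # plays the role of the history memory
        self.current: Settings = {}

    def apply_state(self, name: str) -> None:
        # Overwrite the current settings with a stored snapshot.
        self.current.update(self.states[name])

    def replay_history(self, name: str) -> None:
        # Reproduce a grouped series of changes one step at a time.
        for key, value in self.histories[name]:
            self.current[key] = value

    def run(self, sequence: List[Step]) -> Settings:
        # The sequence plays the role of the execution sequence memory.
        for kind, name in sequence:
            if kind == "state":
                self.apply_state(name)
            elif kind == "history":
                self.replay_history(name)
            else:
                raise ValueError(f"unknown step kind: {kind}")
        return self.current


seq = Sequencer(
    states={"intro": {"gain": 1.0, "fade": 0.0}},
    histories={"fade_in": [("fade", 0.5), ("fade", 1.0)]},
)
result = seq.run([("state", "intro"), ("history", "fade_in")])
print(result)
```

The example applies one stored state and then replays one change history, leaving `gain` at its snapshot value and `fade` at the last value in the history.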
Enhanced representations based on sensor data
Techniques for generating enhanced representations based on sensor data are described and are implementable in a video conference setting. Generally, the described implementations enable an enhanced representation of a focal individual, for instance a speaker, to be generated based on sensor data, for instance audio and visual sensor data. The audio data can identify an individual as the speaker or determine a general location of a source of audio. Visual sensors can detect gestures of individuals located in the general location of the source of audio to identify gestures which indicate that one or more individuals are speaking or are about to speak.