RECORDING MEETING AUDIO VIA MULTIPLE INDIVIDUAL SMARTPHONES

20220377458 · 2022-11-24

    Abstract

    A method of providing audio information from a meeting includes receiving a first audio stream from a first input audio device and a second audio stream from a second input audio device during the meeting, identifying a first audio fragment from the first audio stream, and identifying a second audio fragment from the second audio stream. The method also includes compiling the audio fragments from the first and second audio streams into an audio file that includes at least the first audio fragment and the second audio fragment. The method further includes providing the audio file to one or more recipients. The audio file identifies the first audio fragment as corresponding to a first participant of the meeting and the second audio fragment as corresponding to a second participant of the meeting.

    Claims

    1. A method of recording audio information from a meeting, the method comprising: executing a meeting management application, including establishing a plurality of connections with a plurality of audio input devices configured to record audio data; receiving a plurality of audio streams via the plurality of connections during a meeting, the plurality of audio streams including a first audio stream; associating the first audio stream with a first participant; identifying a first audio fragment from the first audio stream; transcribing the first audio fragment to first textual content; and compiling the plurality of audio streams into a storyboard of the meeting, the storyboard including at least the first textual content of the first audio fragment to be displayed in association with the first participant.

    2. The method of claim 1, further comprising: extracting a voice profile from a first audio input device associated with the first participant, wherein the first audio fragment is transcribed based on the voice profile.

    3. The method of claim 1, further comprising: distributing the storyboard of the meeting to a subset of the plurality of audio input devices.

    4. The method of claim 1, wherein the storyboard of the meeting is organized in a chronological order of audio fragments.

    5. The method of claim 1, wherein the storyboard of the meeting further includes a plurality of audio fragments.

    6. The method of claim 1, wherein the storyboard of the meeting further includes at least one of voice annotations and pre-recorded introductions of each active participant.

    7. The method of claim 1, further comprising: receiving one or more user inputs to edit the storyboard; and in response to the one or more user inputs to edit the storyboard, performing an action including one or more of: emphasizing the first audio fragment associated with the first participant; deemphasizing the first audio fragment associated with the first participant; grouping a plurality of audio fragments in the storyboard by topic; grouping audio fragments associated with the first participant in the storyboard; and adding new audio fragments to the storyboard.

    8. The method of claim 1, further comprising: receiving an additional user voice input annotating the first audio fragment, wherein the first audio fragment is automatically identified from the first audio stream in response to the additional user voice input.

    9. The method of claim 1, further comprising: receiving an additional user input annotating the first audio fragment, wherein the first audio fragment is automatically identified from the first audio stream in response to the additional user input.

    10. An electronic device, comprising: one or more processors; and memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for: executing a meeting management application, including establishing a plurality of connections with a plurality of audio input devices configured to record audio data; receiving a plurality of audio streams via the plurality of connections during a meeting, the plurality of audio streams including a first audio stream; associating the first audio stream with a first participant; identifying a first audio fragment from the first audio stream; transcribing the first audio fragment to first textual content; and compiling the plurality of audio streams into a storyboard of the meeting, the storyboard including at least the first textual content of the first audio fragment to be displayed in association with the first participant.

    11. The electronic device of claim 10, wherein the storyboard of the meeting is organized in a chronological order.

    12. The electronic device of claim 10, wherein the storyboard of the meeting further includes at least one of voice annotations and pre-recorded introductions of each active participant.

    13. The electronic device of claim 10, wherein the first audio stream is recorded by a first audio input device, and a visual signal is provided on the first audio input device to request feedback from the first participant associated with the first audio input device.

    14. The electronic device of claim 13, wherein the first participant responds to the visual signal provided on the first audio input device to confirm whether the first participant is currently speaking.

    15. A non-transitory computer-readable medium storing one or more programs configured for execution by a system, the one or more programs including instructions for: executing a meeting management application, including establishing a plurality of connections with a plurality of audio input devices configured to record audio data; receiving a plurality of audio streams via the plurality of connections during a meeting, the plurality of audio streams including a first audio stream; associating the first audio stream with a first participant; identifying a first audio fragment from the first audio stream; transcribing the first audio fragment to first textual content; and compiling the plurality of audio streams into a storyboard of the meeting, the storyboard including at least the first textual content of the first audio fragment to be displayed in association with the first participant.

    16. The non-transitory computer-readable medium of claim 15, the one or more programs further comprising instructions for: extracting a voice profile from a first audio input device associated with the first participant, wherein the first audio fragment is transcribed based on the voice profile.

    17. The non-transitory computer-readable medium of claim 15, the one or more programs further comprising instructions for: distributing the storyboard of the meeting to a subset of the plurality of audio input devices.

    18. The non-transitory computer-readable medium of claim 15, the plurality of audio streams including a second audio stream, the one or more programs further comprising instructions for: identifying from the second audio stream a second audio fragment, wherein the storyboard includes the second audio fragment; and providing the storyboard to one or more recipients, wherein the storyboard identifies the transcribed first audio fragment as corresponding to the first participant and the second audio fragment as corresponding to a second participant.

    19. The non-transitory computer-readable medium of claim 18, wherein providing the storyboard to the one or more recipients includes replaying the second audio fragment to the one or more recipients.

    20. The non-transitory computer-readable medium of claim 18, the one or more programs further comprising instructions for: maintaining the first audio fragment in a first audio channel associated with the first participant; and maintaining the second audio fragment in a second audio channel associated with the second participant, the first and second audio channels being separate from each other.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0027] Embodiments of the system described herein will now be explained in more detail in accordance with the figures of the drawings, which are briefly described as follows.

    [0028] FIG. 1 is a schematic illustration of speaker identification, according to an embodiment of the system described herein.

    [0029] FIG. 2 is a schematic illustration of speaker channels and of handling double-talk episodes, according to an embodiment of the system described herein.

    [0030] FIG. 3 schematically illustrates storyline compilation, post-meeting annotation and voice-to-text features, according to an embodiment of the system described herein.

    [0031] FIG. 4 is a system flow diagram illustrating system functioning, according to an embodiment of the system described herein.

    DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS

    [0032] The system described herein provides a mechanism for recording meeting audio on multiple individual smartphones of meeting participants, automatic speaker identification, handling double-talk episodes, compiling a meeting storyline from fragments recorded in speaker channels, including post-meeting voice annotations, and optional voice-to-text conversion of certain portions of the recording.

    [0033] FIG. 1 is a schematic illustration 100 of speaker identification. Meeting participants 110, 120, 130 come to a meeting with personal smartphones 140, 150, 160 (possibly running different mobile operating systems), all capable of audio recording and having software, explained elsewhere herein, installed. Note that other personal audio input devices may be used in place of one or more of the smartphones 140, 150, 160, such as a tablet, a dedicated recording device, etc. A connection graph 170 is used to identify a current speaker. In FIG. 1, the participant 110 starts speaking and each of the smartphones 140, 150, 160 receives voice signals 170a, 170b, 170c of different volume, as schematically depicted by line thickness decreasing for the smartphones 150, 160 located further from the speaker 110. The system averages signal volumes received by each of the smartphones 140, 150, 160 over short periods of time and builds deltas of average volumes for each edge of the connection graph 170. If a condition min(ΔV) > 0 is satisfied, where the minimum is taken over all edges starting at the node 140, then the participant 110, who is the owner of the smartphone 140, is marked as a candidate for an active speaker. Additional conditions may be checked for verification; for example, unique voice characteristics may be extracted from the signal and compared with a stored value of Voice ID on the smartphone 140 of the participant 110, as illustrated by criteria 180.
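
    The candidate test described above can be sketched in a few lines. This is only an illustrative sketch, not the implementation of the system: it assumes a fully connected graph between devices, assumes that volume is measured as mean absolute amplitude over one short window, and assumes that ΔV on an edge is the volume at the starting node minus the volume at the ending node. The names `average_volume` and `candidate_speaker` and the device labels are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Optional


def average_volume(samples: List[float]) -> float:
    """Mean absolute amplitude over one short window (an assumed volume measure)."""
    return fmean([abs(s) for s in samples])


def candidate_speaker(windows: Dict[str, List[float]]) -> Optional[str]:
    """Return the device that is louder than every other device, or None.

    Assumption: the connection graph is fully connected, and the delta on an
    edge (device, other) is V(device) - V(other). A device is a candidate
    when the minimum delta over its outgoing edges is strictly positive.
    """
    volumes = {device: average_volume(w) for device, w in windows.items()}
    for device, volume in volumes.items():
        deltas = [volume - other for name, other in volumes.items() if name != device]
        if deltas and min(deltas) > 0:
            return device
    return None


# Hypothetical window of samples captured at the same time on three devices.
windows = {
    "phone_140": [0.8, -0.7, 0.9],
    "phone_150": [0.4, -0.5, 0.3],
    "phone_160": [0.2, -0.1, 0.2],
}
candidate = candidate_speaker(windows)
```

    In this sketch a tie between two devices yields no candidate, which would leave the decision to the additional verification checks such as the Voice ID comparison.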

    [0034] If all conditions and checks for a current speaker are satisfied, the participant 110 is marked as an active speaker and the smartphone 140 of the participant 110 is marked as a principal recording device and becomes a designated one of the smartphones 140, 150, 160 recording a voice stream of the participant 110, as schematically shown on the screen of the smartphone 140. A channel of the participant 110 is activated (or created if the participant 110 speaks for the first time in the meeting) and a fragment of an audio recording of the participant 110 is added to the channel after a pause or speaker change, as explained elsewhere herein.
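
    A minimal sketch of the channel bookkeeping described above follows. It assumes that a fragment can be represented by its start and end times in seconds; the `Channel` and `Meeting` classes and the participant labels are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Channel:
    participant: str
    # Each fragment is stored as (start, end) in seconds (an assumed representation).
    fragments: List[Tuple[float, float]] = field(default_factory=list)


class Meeting:
    def __init__(self) -> None:
        self.channels: Dict[str, Channel] = {}

    def add_fragment(self, participant: str, start: float, end: float) -> Channel:
        """Append a finished fragment to the participant's channel.

        The channel is created the first time the participant speaks and
        reused (activated) afterwards.
        """
        if participant not in self.channels:
            self.channels[participant] = Channel(participant)
        channel = self.channels[participant]
        channel.fragments.append((start, end))
        return channel


meeting = Meeting()
meeting.add_fragment("participant_110", 0.0, 4.5)
meeting.add_fragment("participant_120", 4.5, 9.0)
meeting.add_fragment("participant_110", 9.0, 12.0)
```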

    [0035] FIG. 2 is a schematic illustration 200 of speaker channels and of handling double-talk episodes. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve a double-talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel; all double-talk fragments may be cross-referenced and switchable between channels.
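
    The text names LMS filtering without describing how it is applied. One common arrangement, shown here purely as an assumption, is an adaptive canceller: the recording from one principal device is treated as the primary signal, the recording from the other principal device is treated as a reference for the interfering speaker, and a least-mean-squares filter subtracts whatever part of the primary signal is linearly predictable from the reference. The function name `lms_cancel`, the tap count and the step size `mu` are hypothetical choices, and the sketch assumes the reference contains little of the wanted speaker.

```python
from typing import List, Sequence


def lms_cancel(primary: Sequence[float], reference: Sequence[float],
               taps: int = 4, mu: float = 0.01) -> List[float]:
    """Return `primary` with the component predictable from `reference` removed.

    Standard LMS update: estimate = w . x, error = primary - estimate,
    w <- w + mu * error * x. The error sequence is the cleaned signal.
    """
    weights = [0.0] * taps
    cleaned: List[float] = []
    for n in range(len(primary)):
        # Latest `taps` reference samples, newest first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(w * xi for w, xi in zip(weights, x))
        error = primary[n] - estimate
        weights = [w + mu * error * xi for w, xi in zip(weights, x)]
        cleaned.append(error)
    return cleaned
```

    If both principal recordings contain comparable amounts of both voices, a canceller of this kind would remove part of the wanted speech as well, which is consistent with the fallback described above of keeping the unresolved double-talk fragment in each channel.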

    [0036] FIG. 3 is a schematic illustration 300 of storyline compilation, post-meeting annotation and voice-to-text features. The three channels 210, 220, 230 of the original meeting participants 110, 120, 130 correspond to the fragments 240, 280a, 350, 280b of audio recording by each active speaker during the meeting. A commenter 310 listens to the recording of the meeting and decides to add voice comments to several fragments. A separate channel 320 is created for the commenter 310 and maintains a voice annotation 330 for a fragment 240 by the participant 110 and another voice annotation 340 for a fragment 280b by the speaker 130, retrieved from a double-talk, as explained elsewhere herein. Voice annotations may also refer to a particular topic or a meeting as a whole.

    [0037] A storyline 350 of a meeting may be compiled from original audio fragments for meeting participants recorded during the meeting, combined with voice annotations and other components, such as pre-recorded introductions of each speaker, and organized chronologically, by topics or otherwise. For example, the storyline 350 may be organized in a chronological order of speaker fragments, with the addition of voice annotations immediately after annotated fragments. Such storylines may be distributed as key meeting materials shortly after the end of the meeting.
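
    The chronological example above can be sketched as follows. The `Fragment` record, its field names and the identifiers used in the sample data are hypothetical; the sketch assumes each voice annotation stores the identifier of the fragment it annotates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    fragment_id: str
    speaker: str
    start: float                     # seconds from the start of the meeting
    annotates: Optional[str] = None  # id of the annotated fragment, if any


def compile_storyline(fragments: List[Fragment]) -> List[Fragment]:
    """Order speaker fragments by start time and place each voice
    annotation immediately after the fragment it annotates."""
    originals = sorted((f for f in fragments if f.annotates is None),
                       key=lambda f: f.start)
    storyline: List[Fragment] = []
    for original in originals:
        storyline.append(original)
        storyline.extend(f for f in fragments
                         if f.annotates == original.fragment_id)
    return storyline


recorded = [
    Fragment("f2", "participant_120", 5.0),
    Fragment("n1", "commenter_310", 0.0, annotates="f1"),
    Fragment("f1", "participant_110", 0.0),
    Fragment("f3", "participant_130", 9.0),
]
order = [f.fragment_id for f in compile_storyline(recorded)]
```

    Grouping by topic or by speaker, also mentioned above, would replace the sort key in this sketch rather than change its structure.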

    [0038] Some of the recorded audio fragments may be converted to text using voice-to-text technologies. In FIG. 3, a fragment 360 by the participant 110 is automatically transcribed. To facilitate voice recognition, a voice profile 370 may be extracted from a device of the participant 110 (or may be stored in the system for the commenter 310) and used by a voice-to-text system 380 to create transcribed text 390.

    [0039] Referring to FIG. 4, a system flow diagram 400 illustrates processing in connection with recording a meeting on multiple individual phones. Note that the processing for the system described herein may be provided by one or more of the smartphones 140, 150, 160, by a smartphone (or similar device) of a non-participant, a separate computing device (e.g., desktop computer), in connection with a cloud service (or similar) coupled to one or more of the smartphones 140, 150, 160, etc.

    [0040] Processing begins at a step 410, where the system establishes connections between smartphones of participants and/or with a local or cloud service run by the system. The system may also ensure that software for the system is running on each smartphone of each participant and that a recording mode on each smartphone is enabled. After the step 410, processing proceeds to a step 415, where a meeting participant speaks. After the step 415, processing proceeds to a step 420, where the system measures average volume of an audio signal over short periods of time and delay of the audio signal on each smartphone, as explained elsewhere herein (see in particular FIG. 1 and the accompanying text). After the step 420, processing proceeds to a step 425, where the system calculates deltas of average volumes received by different smartphones of the participants over a connectivity graph, as explained in more detail in conjunction with FIG. 1.

    [0041] After the step 425, processing proceeds to a step 430, where a candidate for the current speaker is detected according to specific criteria, as explained elsewhere herein. After the step 430, processing proceeds to a step 435, where the system runs an additional speaker identification check, as explained in conjunction with FIG. 1 and speaker identification criteria 180 explained elsewhere herein. After the step 435, processing proceeds to a step 440 where the system designates and marks a principal recording smartphone (i.e., the only smartphone that records the audio fragment from the current speaker until conditions change, as explained elsewhere herein). After the step 440, processing proceeds to a test step 445, where it is determined whether double-talk is detected by the system. If so, processing proceeds to a step 450, where the system marks the starting time stamp of a double-talk fragment and designates principal smartphones for recording the fragment. After the step 450, processing proceeds to a test step 455, where it is determined whether any of the speakers stopped talking (note that the test step 455 can be independently reached from the test step 445 if double-talk has not been identified). If not, processing proceeds to a step 460 where the system and the principal recording smartphone(s) continue capturing the current fragment.

    [0042] After the step 460, processing proceeds back to the test step 455. If it was determined at the test step 455 that any of the current speakers stopped talking, processing proceeds to a step 465, where a recorded speaker fragment from principal smartphones is added to the corresponding speaker channels, as explained elsewhere herein (see FIGS. 2, 3 and the accompanying text). After the step 465, processing proceeds to a test step 470, where it is determined whether any speaker is talking. If so, processing proceeds back to the step 460 for continued recording of a fragment; otherwise, processing proceeds to a test step 475, where it is determined whether the meeting is over. If not, processing proceeds back to the step 425 for continued speaker identification and recording of the meeting by audio fragments; otherwise, processing proceeds to a step 480 where the system attempts to filter out double-talk background from principal fragments or to split double-talk into speaker channels, as explained elsewhere herein (see FIG. 2 and the accompanying text).
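
    The fragment loop of the steps 445 through 475 can be approximated offline by cutting the meeting wherever the set of active speakers changes. The sketch below is a simplification under stated assumptions: speaker detection has already produced one set of active speakers per short time window, a fragment is closed whenever that set changes, and double-talk is any fragment with more than one active speaker. The `segment` function and the speaker labels are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (first window index, last window index, active speakers)
Segment = Tuple[int, int, FrozenSet[str]]


def segment(active_per_window: List[FrozenSet[str]]) -> List[Segment]:
    """Split a meeting into fragments at every change of the active-speaker set.

    Windows with no active speaker are treated as pauses and produce no fragment.
    """
    fragments: List[Segment] = []
    start = 0
    total = len(active_per_window)
    for i in range(1, total + 1):
        changed = i == total or active_per_window[i] != active_per_window[start]
        if changed:
            if active_per_window[start]:
                fragments.append((start, i - 1, active_per_window[start]))
            start = i
    return fragments


A, B, C = "participant_110", "participant_120", "participant_130"
windows = [frozenset({A}), frozenset({A}), frozenset({A, B}), frozenset({A, B}),
           frozenset({B}), frozenset(), frozenset({C})]
fragments = segment(windows)
double_talk = [f for f in fragments if len(f[2]) > 1]
```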

    [0043] After the step 480, processing proceeds to a step 485, where certain fragments may be optionally transcribed to text, as explained elsewhere herein, in particular, in conjunction with FIG. 3. After the step 485, processing proceeds to a step 490, where voice annotations may optionally be added by meeting participants or other users of the system, as explained elsewhere herein. After the step 490, processing proceeds to a step 495, where audio storyboards of the meeting are compiled and distributed, as explained elsewhere herein. After the step 495, processing is complete.

    [0044] Various embodiments discussed herein may be combined with each other in appropriate combinations in connection with the system described herein. Additionally, in some instances, the order of steps in the flowcharts, flow diagrams and/or described flow processing may be modified, where appropriate. Similarly, elements and areas of the screen described in screen layouts may vary from the illustrations presented herein. Further, various aspects of the system described herein may be implemented using software, hardware, a combination of software and hardware and/or other computer-implemented modules or devices having the described features and performing the described functions. Smartphones functioning as audio recording devices may include software that is pre-loaded with the device, installed from an app store, installed from a desktop (after possibly being pre-loaded thereon), installed from media such as a CD, DVD, etc., and/or downloaded from a Web site. Such smartphones may use operating system(s) selected from the group consisting of iOS, Android OS, Windows Phone OS, Blackberry OS and mobile versions of Linux OS.

    [0045] Software implementations of the system described herein may include executable code that is stored in a computer readable medium and executed by one or more processors. The computer readable medium may be non-transitory and include a computer hard drive, ROM, RAM, flash memory, portable computer storage media such as a CD-ROM, a DVD-ROM, a flash drive, an SD card and/or other drive with, for example, a universal serial bus (USB) interface, and/or any other appropriate tangible or non-transitory computer readable medium or computer memory on which executable code may be stored and executed by a processor. The software may be bundled (pre-loaded), installed from an app store or downloaded from a location of a network operator. The system described herein may be used in connection with any appropriate operating system.

    [0046] Other embodiments of the invention will be apparent to those skilled in the art from a consideration of the specification or practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.