CONTROLLING PLAYBACK OF AUDIO DATA
20230096846 · 2023-03-30
Abstract
Playback of audio data is controlled by receiving a speech signal to be conveyed to a user simultaneously with playback of the audio data. The volume and/or spectral appearance of selected elements of the audio data is then modified to obtain adjusted audio data, and the adjusted audio data is played back. The received speech signal may then be played back simultaneously with the adjusted audio data.
Claims
1. A method for controlling playback of audio data, the method comprising: receiving a speech signal to be conveyed to a user simultaneously with playback of the audio data; modifying volume and/or spectral appearance of selected elements of the audio data to obtain adjusted audio data; and playing back the adjusted audio data; wherein the selected elements of the audio data are modified by attenuating or removing vocal content in a foreground track of the audio data independently from processing of a background track of the audio data.
2. The method according to claim 1, further comprising playing back the received speech signal simultaneously with the adjusted audio data.
3. The method according to claim 2, wherein playing back the received speech signal is delayed based on meta information associated with the audio data and/or based on external data.
4. The method according to claim 3, wherein the speech signal (SP) to be conveyed is an announcement, a notification, verbal information of a voice call, or verbal information uttered by another user.
5. The method according to claim 4, wherein the selected elements of the audio data are further adjusted by filtering background music.
6. The method according to claim 5, wherein the selected elements of the audio data are further adjusted by looping a segment of the audio data.
7. The method according to claim 6, wherein a playback speed of the speech signal is adapted to a duration of the looped segment.
8. The method according to claim 7, wherein a duration of the adjusted audio data is dependent on a duration of the received speech signal.
9. The method according to claim 8, wherein the duration of the adjusted audio data allows for a feedback by the user following the speech signal.
10. A motor vehicle having an apparatus for controlling playback of audio data, the apparatus comprising: a receiving unit configured to receive a speech signal to be conveyed to a user simultaneously with playback of the audio data; a modifying unit configured to modify volume and/or spectral appearance of selected elements of the audio data to obtain adjusted audio data; and a playback unit configured to play back the adjusted audio data; wherein the selected elements of the audio data are modified by attenuating or removing vocal content in a foreground track of the audio data independently from processing of a background track of the audio data.
11. The motor vehicle of claim 10, wherein the apparatus for controlling playback of audio data is further configured for: playing back the received speech signal simultaneously with the adjusted audio data.
12. The motor vehicle of claim 11, wherein playing back the received speech signal is delayed based on meta information associated with the audio data and/or based on external data.
13. The motor vehicle of claim 12, wherein the speech signal (SP) to be conveyed is an announcement, a notification, verbal information of a voice call, or verbal information uttered by another user.
14. The motor vehicle of claim 13, wherein the selected elements of the audio data are further adjusted by filtering background music.
15. The motor vehicle of claim 14, wherein the selected elements of the audio data are further adjusted by looping a segment of the audio data.
16. The motor vehicle of claim 15, wherein a playback speed of the speech signal is adapted to a duration of the looped segment.
17. The motor vehicle of claim 16, wherein a duration of the adjusted audio data is dependent on a duration of the received speech signal.
18. The motor vehicle of claim 17, wherein the duration of the adjusted audio data allows for a feedback by the user following the speech signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
DETAILED DESCRIPTION
[0046] The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure.
[0047] All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions.
[0048] Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
[0049] Thus, for example, it will be appreciated by those skilled in the art that the diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure.
[0050] The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, systems on a chip, microcontrollers, read only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage.
[0051] Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
[0052] In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a combination of circuit elements that performs that function or software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
[0053]
[0054]
[0055] The receiving unit 22, the modifying unit 23, and the playback unit 24 may be controlled by a control module 25. A user interface 28 may be provided for enabling a user to modify settings of the receiving unit 22, the modifying unit 23, the playback unit 24, and the control module 25. The receiving unit 22, the modifying unit 23, the playback unit 24, and the control module 25 can be embodied as dedicated hardware units. Of course, they may likewise be fully or partially combined into a single unit or implemented as software running on a processor, e.g. a CPU or a GPU.
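As a minimal sketch of the software variant mentioned above, the three units can be modeled as callables wired together by a control object. The class name `ControlModule`, its fields, and the rule that a pending speech signal affects exactly one block of audio are assumptions made for illustration; they are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Samples = List[float]


@dataclass
class ControlModule:
    """Hypothetical stand-in for control module 25 wiring the three units."""
    modify: Callable[[Samples], Samples]       # role of modifying unit 23
    play: Callable[[Samples], None]            # role of playback unit 24
    pending_speech: Optional[Samples] = None   # set via the receiving role (unit 22)

    def receive(self, speech: Samples) -> None:
        # Receiving unit 22: store the speech signal SP to be conveyed.
        self.pending_speech = speech

    def process(self, audio: Samples) -> None:
        # Without a pending speech signal, audio data AD passes through unaltered.
        if self.pending_speech is None:
            self.play(audio)
            return
        # Otherwise the adjusted audio data AAD is played back.
        # Assumption: one pending speech signal affects one block of audio.
        self.play(self.modify(audio))
        self.pending_speech = None
```

Passing the modifying and playback roles in as callables mirrors the statement that the units may be combined or implemented separately; a hardware-backed unit could be substituted without changing the control logic.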
[0056] A block diagram of a second embodiment of an apparatus 30 according to the invention for controlling playback of audio data is illustrated in
[0057] The processing device 32 as used herein may include one or more processing units, such as microprocessors, digital signal processors, or a combination thereof.
[0058] The local storage unit 26 and the memory device 31 may include volatile and/or non-volatile memory regions and storage devices such as hard disk drives, optical drives, and/or solid-state memories.
[0059]
[0060]
[0061]
[0062] In case of an announcement, however, the audio data from the different media sources is processed in the media audio domain. For example, a stem-based or object-based audio file may be provided to a mixer, which outputs a background track and a foreground track, e.g. vocal content, to a media playback control block. Alternatively, a classic audio file or live audio may be provided to a content aware audio separator or a classic filter for processing. This audio separator or filter then outputs a background track and a foreground track to the media playback control block. The media playback control block provides the background track to a filter FIL, e.g. for reducing spectral components that are potentially impeding comprehensibility. The foreground track is provided to a gain control GC for attenuation or removal.
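The independent processing of the two tracks can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the filter FIL is approximated by a first-order low-pass, the 300 Hz cutoff and 48 kHz sample rate are arbitrary example values, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Samples = List[float]


def gain_control(foreground: Samples, gain: float) -> Samples:
    """Stand-in for gain control GC: gain 0.0 removes the track, 0 < gain < 1 attenuates it."""
    return [gain * s for s in foreground]


def one_pole_low_pass(background: Samples, cutoff_hz: float,
                      sample_rate_hz: float) -> Samples:
    """Stand-in for filter FIL: a first-order low-pass (assumed filter type)."""
    # Standard one-pole smoothing coefficient for the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out: Samples = []
    state = 0.0
    for s in background:
        state += alpha * (s - state)
        out.append(state)
    return out


def adjust(foreground: Samples, background: Samples, gain: float = 0.0,
           cutoff_hz: float = 300.0,
           sample_rate_hz: float = 48000.0) -> Tuple[Samples, Samples]:
    """Process the two tracks independently; returns (adjusted_fg, adjusted_bg)."""
    return (gain_control(foreground, gain),
            one_pole_low_pass(background, cutoff_hz, sample_rate_hz))
```

Keeping the gain and the filter in separate functions reflects that the foreground track is attenuated or removed independently of whatever processing is applied to the background track.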
[0063] Meta information, which may be associated with the audio data, obtained from an auxiliary meta data source, or derived by a content analysis and detection block, is provided to an optional placement and loop information block. This block determines an appropriate placement of the announcement as well as a segment of the audio data that may be repeated in a loop, if such a segment is available. The placement and loop information block provides the respective information to an announcement playback control block. The announcement playback control block initiates operation of the media playback control block, the filter FIL, and the gain control GC with the appropriate timing, and provides the audio signal of the announcement to a combiner. The placement of the announcement may also be influenced by external data. Such data may describe the driving situation, which may be derived by evaluating map data, traffic data, or sensor data, or may comprise information on the stress or attention of the user. The combiner combines the audio signal of the announcement with the adjusted foreground track and the adjusted background track, and provides the combined audio signal to the selector SEL. The selector SEL is controlled by the announcement playback control block in such a way that the combined audio signal is output to the speaker SPK.
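One way to relate the duration of the announcement to a looped segment is sketched below. The disclosure does not specify how repetitions or playback speed are chosen; the rule used here (round to the nearest whole repetition if a speed change of at most 10 % absorbs the mismatch, otherwise round up and leave the speech untouched) and all names, including `plan_loop` and `max_speed_change`, are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LoopPlan:
    repetitions: int      # how many times the loop segment is played
    speech_speed: float   # playback-speed factor for the speech signal (>1 is faster)


def plan_loop(speech_s: float, segment_s: float, feedback_s: float = 0.0,
              max_speed_change: float = 0.1) -> LoopPlan:
    """Fit a speech signal of speech_s seconds onto whole repetitions of a loop segment.

    feedback_s reserves time after the speech for a reply by the user.
    """
    if speech_s <= 0 or segment_s <= 0:
        raise ValueError("durations must be positive")
    total = speech_s + feedback_s
    # First try the nearest whole number of repetitions.
    reps = max(1, round(total / segment_s))
    available = reps * segment_s - feedback_s
    speed = speech_s / available if available > 0 else math.inf
    if abs(speed - 1.0) > max_speed_change:
        # Mismatch too large: cover the speech fully and keep its original speed.
        reps = math.ceil(total / segment_s)
        speed = 1.0
    return LoopPlan(reps, speed)
```

For example, under these assumptions an 8.4 s announcement over a 4 s segment yields two repetitions with the speech sped up by about 5 %, whereas a 5 s announcement yields two repetitions at unchanged speed.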
[0064] In the voice recognition domain, voice data acquired by one or more microphones MIC are evaluated by a voice recognition block. Voice recognition may be triggered by the announcement. The voice recognition preferably provides a signal to the announcement playback control block that voice recognition is completed. In this way, the announcement playback control block can terminate adjustment of the foreground track and the background track and switch back to unaltered audio playback by providing an appropriate signal to the selector SEL.
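The switching behavior described above can be sketched as a small state holder. The class, method, and mode names are hypothetical, and the rule that the adjustment stays active after the announcement until voice recognition reports completion is one reading of the paragraph, not a confirmed detail.

```python
from enum import Enum, auto


class Mode(Enum):
    UNALTERED = auto()   # selector SEL passes the original audio
    COMBINED = auto()    # selector SEL passes announcement plus adjusted tracks


class AnnouncementPlaybackControl:
    """Hypothetical controller that drives the selector SEL."""

    def __init__(self) -> None:
        self.mode = Mode.UNALTERED
        self.awaiting_voice = False

    def start_announcement(self, expects_reply: bool) -> None:
        # The announcement may trigger voice recognition.
        self.mode = Mode.COMBINED
        self.awaiting_voice = expects_reply

    def announcement_finished(self) -> None:
        # Keep the adjustment active while voice recognition is still running.
        if not self.awaiting_voice:
            self.mode = Mode.UNALTERED

    def voice_recognition_completed(self) -> None:
        # Signal from the voice recognition block: switch back to unaltered playback.
        self.awaiting_voice = False
        self.mode = Mode.UNALTERED
```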
REFERENCE NUMERALS
[0065] 20 Apparatus
[0066] 21 Input
[0067] 22 Receiving unit
[0068] 23 Modifying unit
[0069] 24 Playback unit
[0070] 25 Control module
[0071] 26 Local storage unit
[0072] 27 Output
[0073] 28 User interface
[0074] 30 Apparatus
[0075] 31 Memory device
[0076] 32 Processing device
[0077] 33 Input
[0078] 34 Output
[0079] 40 Motor vehicle
[0080] 41 Speaker
[0081] 42 Infotainment system
[0082] 43 Navigation system
[0083] 44 Environment sensors
[0084] 45 Data transmission unit
[0085] 46 Memory
[0086] 47 Network
[0087] 50 Electronic device
[0088] 51 Speaker
[0089] 52 Socket
[0090] 53 Screen
[0091] AAD Adjusted audio data
[0092] AD Audio data
[0093] E Element
[0094] FIL Filter
[0095] GC Gain control
[0096] MIC Microphone
[0097] OUT Audio output
[0098] SEL Selector
[0099] SP Speech signal
[0100] S1 Receive speech signal
[0101] S2 Modify volume and/or spectral appearance of selected elements of audio data
[0102] S3 Play back adjusted audio data
[0103] S4 Play back received speech signal