AGENT COOPERATION DEVICE, OPERATION METHOD THEREOF, AND STORAGE MEDIUM
20210360326 · 2021-11-18
CPC classification
G10L15/22 (PHYSICS) · H04N21/43079 (ELECTRICITY) · H04N21/41422 (ELECTRICITY) · H04N21/43078 (ELECTRICITY) · H04N21/47217 (ELECTRICITY)
International classification
H04N21/472 (ELECTRICITY) · G10L15/22 (PHYSICS)
Abstract
An agent cooperation device includes: a sound output section that controls sound output in accordance with instructions from plural agents that are configured to receive an instruction regarding a predetermined service by voice dialogue; and a control section that, in a case in which a voice dialogue is provided with respect to one of the plural agents while another agent is playing music or an audiobook as a service, controls the sound output section so as to lower a volume of or stop playback that is being carried out by the other agent.
Claims
1. An agent cooperation device, comprising: a sound output section that controls sound output in accordance with instructions from a plurality of agents that are configured to receive an instruction regarding a predetermined service by voice dialogue; a memory; and a processor that is coupled to the memory and is configured to: in a case in which a voice dialogue is provided with respect to one of the plurality of agents while another agent is playing music or an audiobook as a service, control the sound output section so as to lower a volume of, or stop, playback that is being carried out by the other agent.
2. The agent cooperation device of claim 1, wherein the processor is configured to control the sound output section so as to lower the volume of the playback that is being carried out by the other agent in a case in which the one agent receives a voice dialogue during the playback, and to stop sound of the playback when the one agent outputs a response voice to the voice dialogue.
3. The agent cooperation device of claim 1, wherein the processor is configured to control the sound output section so as to lower the volume of the playback that is being carried out by the other agent in a case in which the one agent receives a voice dialogue during the playback, to stop sound of the playback while the one agent outputs a response voice, and to restart the sound of the playback after the voice dialogue with the one agent ends.
4. The agent cooperation device of claim 1, wherein the processor is configured to control the sound output section so as to, in a case in which the one agent is to play music or an audiobook, while the other agent is playing music or an audiobook, lower a volume of the playback that is being carried out by the other agent when the one agent receives the voice dialogue, and stop the playback of the music or the audiobook by the other agent when the one agent starts playback of music or an audiobook.
5. The agent cooperation device of claim 1, wherein the processor is configured to control the sound output section so as to, in a case in which the one agent is to output a voice response to a voice dialogue while the other agent is playing music or an audiobook, lower a volume of the playback that is being carried out by the other agent when the one agent receives the voice dialogue, and restore the volume of the playback that is being carried out by the other agent after the one agent outputs the voice response.
6. A method of operating an agent cooperation device that includes functions of a plurality of agents that are configured to receive an instruction regarding a predetermined service by voice dialogue, and a sound output section that controls sound output from the plurality of agents, the method comprising: detecting a voice dialogue with respect to one agent among the plurality of agents; determining whether or not another agent among the plurality of agents is playing music or an audiobook as the service; and controlling the sound output section so as to lower a volume of, or stop, playback that is being carried out by the other agent, in a case in which it is determined that the other agent is carrying out the playback.
7. A non-transitory storage medium storing a program that is executable by a computer to perform agent cooperation processing, the computer being configured to perform functions of a plurality of agents that receive an instruction regarding a predetermined service by voice dialogue, and of a sound output section that controls sound output from the plurality of agents, the agent cooperation processing comprising: detecting a voice dialogue with respect to one agent among the plurality of agents; determining whether or not another agent among the plurality of agents is playing music or an audiobook as the service; and controlling the sound output section so as to lower a volume of, or stop, playback that is being carried out by the other agent, in a case in which it is determined that the other agent is carrying out the playback.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0026] Exemplary embodiments of the present disclosure are described in detail hereinafter with reference to the drawings.
[0027] Description will be given by using, as an example, a case in which an agent cooperation device 10 relating to the present embodiment is incorporated in a head unit (H/U) that is installed as an onboard device.
[0028] The agent cooperation device 10 is connected to plural agent servers via a communication device 16. In the present embodiment, for example, the agent cooperation device 10 is connected to two agent servers that are a first agent server 12 and a second agent server 14. By carrying out communication with these two agent servers, the agent cooperation device 10 provides a user with services that the respective agent servers provide. Further, the agent cooperation device 10 has the function of controlling output of sounds from the respective agent servers.
[0029] Each of the first agent server 12 and the second agent server 14 provides the function of a voice dialogue assistant that is referred to as a Virtual Personal Assistant (VPA). Specifically, a predetermined service such as playback of music, playback of an audiobook, providing of a weather report, or the like is provided to a user by a voice dialogue via the agent cooperation device 10. Because any of various known techniques may be used as the detailed structure of the agent servers, description thereof is omitted.
[0030] In the present embodiment, the communication device 16 is a communication instrument dedicated for a vehicle, and carries out communication between the agent cooperation device 10 and the first agent server 12, and between the agent cooperation device 10 and the second agent server 14. For example, these respective communications are carried out via a wireless communication network of a cell phone or the like. For example, a communication device called a Data Communication Module (DCM) is used as the communication device 16.
[0031] The agent cooperation device 10 is, for example, structured by a general microcomputer that includes a Central Processing Unit (CPU), a Read Only Memory (ROM), a Random Access Memory (RAM), and the like. The agent cooperation device 10 includes functions of a sound output controller 18 that serves as an example of the sound output section, an A2A cooperation controller 20 that serves as an example of the control section, and a voice sensing section 26.
[0032] The sound output controller 18 is connected to a speaker 28, and controls sound output from the first agent server 12 and the second agent server 14.
[0033] The A2A cooperation controller 20 is connected to a touch panel 30, the sound output controller 18 and the voice sensing section 26, and transmits and receives information to and from these sections. Further, the A2A cooperation controller 20 has the functions of a first agent 22 and a second agent 24. The first agent 22 is provided in correspondence with the first agent server 12, and controls communications with the first agent server 12. Further, the second agent 24 is provided in correspondence with the second agent server 14, and controls communications with the second agent server 14. In response to receiving information relating to a voice dialogue from any of the agent servers, the A2A cooperation controller 20 notifies the sound output controller 18. The sound output controller 18 thereby controls sound output from the speaker 28 based on the information relating to the voice dialogue.
[0034] The voice sensing section 26 is connected to a microphone 32, and senses voice information obtained from the microphone 32, and notifies the A2A cooperation controller 20 of the results of sensing. For example, the voice sensing section 26 senses a wakeup word for activating an agent.
[0035] An example of the specific operations carried out at the respective sections of the agent cooperation device 10 of the present embodiment, structured as described above, is described next.
[0036] In the agent cooperation device 10 relating to the present embodiment, the voice sensing section 26 senses a wakeup word, and notifies the A2A cooperation controller 20, and the A2A cooperation controller 20 connects to the corresponding agent server via the communication device 16.
[0037] The sound output controller 18 controls the output of a sound from the speaker 28 in accordance with a request for sound output (a voice dialogue, music, an audiobook, or the like) from an agent server.
[0038] In a case in which a voice dialogue is provided with respect to either one of the agents among the first agent 22 and the second agent 24, while the other agent is playing music or an audiobook, the A2A cooperation controller 20 controls the sound output controller 18 so as to lower the volume of, or stop, the playback that is being carried out.
[0039] Further, in a case in which one agent receives a voice dialogue while another agent is performing playback, the A2A cooperation controller 20 carries out control so as to lower the volume of the playback by the other agent, and, at the time when the one agent outputs a response voice to the voice dialogue, stop the sound that is being played by the other agent.
[0040] Further, in a case in which one agent receives a voice dialogue while another agent is performing playback, the A2A cooperation controller 20 carries out control so as to lower the volume of the playback by the other agent, and, while the one agent is outputting a response voice, stop the sound that is being played by the other agent, and, after the voice dialogue with the one agent ends, restart the sound of the playback by the other agent.
[0041] Further, in a case in which one agent is to play back music or an audiobook while another agent is performing playback of music or an audiobook, the A2A cooperation controller 20 carries out control so as to, at the time when the one agent receives a voice dialogue, lower the volume of the playback that is being carried out by the other agent, and, at the time when the one agent starts playback of music or an audiobook, stop the playback of the music or the audiobook by the other agent.
[0042] Moreover, in a case in which one agent outputs a response voice to a voice dialogue while another agent is performing playback of music or an audiobook, the A2A cooperation controller 20 carries out control so as to, at the time when the one agent receives the voice dialogue, lower the volume of the playback that is being carried out by the other agent, and, after the one agent outputs a response voice, return the volume of the playback that is being carried out by the other agent to the original volume.
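The control behavior summarized in paragraphs [0038] through [0042] can be sketched as a small state machine. The class, state values, and method names below are illustrative assumptions for explanation only; they are not taken from the specification.

```python
class PlaybackDucker:
    """Models the other agent's playback state during a voice dialogue."""

    NORMAL, DUCKED, STOPPED = "normal", "ducked", "stopped"

    def __init__(self):
        self.other_playback = self.NORMAL

    def on_dialogue_received(self):
        # When one agent receives a voice dialogue, lower the volume of
        # the playback being carried out by the other agent.
        self.other_playback = self.DUCKED

    def on_response_start(self):
        # While the one agent outputs a response voice, stop the sound
        # that is being played by the other agent.
        self.other_playback = self.STOPPED

    def on_dialogue_end(self, response_was_playback):
        # After the dialogue ends: if the response started new playback
        # of music or an audiobook, the other agent's playback stays
        # stopped; otherwise it is restarted at the original volume.
        if not response_was_playback:
            self.other_playback = self.NORMAL
```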
[0043] Specific processing carried out at the respective sections of the agent cooperation device 10 relating to the present embodiment is described next.
[0044] First, the processing carried out at the voice sensing section 26 is described.
[0045] In step 100, the voice sensing section 26 carries out voice detection, and the routine moves on to step 102. Namely, the voice sensing section 26 detects the voice input from the microphone 32.
[0046] In step 102, the voice sensing section 26 judges whether or not a wakeup word has been detected. This judgment is a judgment as to whether or not a predetermined wakeup word for activating the first agent 22, or a predetermined wakeup word for activating the second agent 24, has been detected. If this judgment is affirmative, the routine moves on to step 104, and if this judgment is negative, the series of processing ends.
[0047] In step 104, the voice sensing section 26 judges whether or not the agent corresponding to the wakeup word is currently activated. If this judgment is negative, the routine moves on to step 106, and, if this judgment is affirmative, the routine moves on to step 112.
[0048] In step 106, the voice sensing section 26 judges whether or not the detected wakeup word is for the first agent 22. If this judgment is affirmative, the routine moves on to step 108. If the wakeup word for the second agent 24 has been detected and the judgment is negative, the routine moves on to step 110.
[0049] In step 108, the voice sensing section 26 notifies the first agent 22 that it is to activate, and the routine moves on to step 112.
[0050] In step 110, the voice sensing section 26 notifies the second agent 24 that it is to activate, and the routine moves on to step 112.
[0051] In step 112, the voice sensing section 26 judges whether or not a voice is sensed within a predetermined time period. If this judgment is negative, i.e., if a voice is not sensed within the predetermined time period, the series of processing ends. If the judgment is affirmative, the routine moves on to step 114.
[0052] In step 114, the voice sensing section 26 notifies the corresponding agent of the sensed voice, and ends the series of processing. Namely, if a voice is sensed within the predetermined time period after the sensing of the wakeup word of the first agent 22, the voice sensing section 26 notifies the first agent 22 of the sensed voice. If a voice is sensed within the predetermined time period after the sensing of the wakeup word of the second agent 24, the voice sensing section 26 notifies the second agent 24 of the sensed voice.
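The voice sensing flow of steps 100 through 114 can be sketched as follows. The function signature, the agent labels, and the notification tuples are hypothetical conveniences for illustration, not structures defined in the specification.

```python
def sense_voice(heard, active_agents, wakeup_words, next_utterance):
    """Sketch of steps 100-114 at the voice sensing section 26.

    heard          -- the text detected from the microphone (step 100)
    active_agents  -- set of agents that are currently activated
    wakeup_words   -- mapping from wakeup word to the agent it activates
    next_utterance -- callable returning the voice sensed within the
                      predetermined time period, or None (step 112)
    """
    agent = wakeup_words.get(heard)                # step 102: wakeup word?
    if agent is None:
        return []                                  # negative: end processing
    notifications = []
    if agent not in active_agents:                 # step 104: already active?
        notifications.append(("activate", agent))  # steps 106-110: notify
    utterance = next_utterance()                   # step 112: voice in time?
    if utterance is not None:
        notifications.append(("voice", agent, utterance))  # step 114
    return notifications
```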
[0053] Processing at the A2A cooperation controller 20 is described next.
[0054] In step 200, the A2A cooperation controller 20 receives an agent activation notification, and the routine moves on to step 202. Namely, the A2A cooperation controller 20 receives the agent activation notification given in step 108 or step 110 of
[0055] In step 202, the A2A cooperation controller 20 judges whether or not the agent activation notification received from the voice sensing section 26 is an activation notification for the first agent 22. If this judgment is affirmative, the routine moves on to step 204, and if this judgment is negative, the routine moves on to step 205.
[0056] In step 204, the first agent 22 is activated, and the routine moves on to step 208. Specifically, communication between the first agent 22 and the first agent server 12 is established, and the system transitions to a state in which the provision of service from the first agent server 12 is possible.
[0057] In step 205, the second agent 24 is activated, and the routine moves on to step 206. Specifically, communication between the second agent 24 and the second agent server 14 is established, and the system transitions to a state in which the provision of service from the second agent server 14 is possible.
[0058] In step 206, the A2A cooperation controller 20 judges whether or not another agent is currently activated. In a case in which one of the first agent 22 and the second agent 24 has received voice information, this judgment is a judgment as to whether or not the other of the first agent 22 and the second agent 24 is currently activated. If this judgment is affirmative, the routine moves on to step 208, and, if this judgment is negative, the routine moves on to step 210.
[0059] In step 208, the A2A cooperation controller 20 lowers the volume of the sound output by the agent that has been activated previously, and the routine moves on to step 210. Namely, the A2A cooperation controller 20 instructs the sound output controller 18 to lower the volume of the sound output (e.g., an audiobook or music or the like) by the agent that has been previously activated. Due thereto, the volume of the sound source that is already outputting is lowered, and it becomes easy to hear the dialogue with the agent. Note that, in step 208, the sound output during the dialogue may be stopped temporarily, rather than the volume thereof being lowered.
[0060] In step 210, the A2A cooperation controller 20 judges whether or not a voice notification has been received from the voice sensing section 26 within a predetermined time period. In this judgment, it is judged whether or not a voice notification has been received by above-described step 114. If this judgment is affirmative, the routine moves on to step 212, and, if this judgment is negative, the series of processing ends.
[0061] In step 212, the A2A cooperation controller 20 transmits the voice information from the corresponding agent to the corresponding agent server, and the routine moves on to step 214. Namely, in a case in which the first agent 22 is activated and receives a voice notification, the first agent 22 transmits the voice information to the first agent server 12. In a case in which the second agent 24 is activated and receives a voice notification, the second agent 24 transmits the voice information to the second agent server 14.
[0062] In step 214, the A2A cooperation controller 20 receives voice information from the agent server, and the routine moves on to step 216. For example, in step 212, in a case in which voice information whose contents are to playback an audiobook or music is transmitted to the agent server, the agent server carries out semantic analysis on the basis of the voice information, and the A2A cooperation controller 20 receives the voice information to playback the corresponding audiobook or music.
[0063] In step 216, the A2A cooperation controller 20 carries out response output processing, and ends the series of processing. The response output processing is processing that gives a response to the dialogue from the user, and, for example, the processing illustrated in
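The activation-side flow of steps 200 through 208 can be sketched as follows. The agent labels and the concrete volume values are illustrative assumptions; the specification does not define a particular ducked level.

```python
class SoundOutput:
    """Minimal stand-in for the sound output controller 18."""
    def __init__(self):
        self.volume = 100        # illustrative full-volume value

    def lower_volume(self):
        self.volume = 30         # illustrative ducked value (step 208)


def handle_activation(agent, active_agents, sound_output):
    """Sketch of steps 200-208 at the A2A cooperation controller 20."""
    other = "second" if agent == "first" else "first"   # step 202
    active_agents.add(agent)                 # steps 204/205: activate agent
    if other in active_agents:               # step 206: other agent active?
        sound_output.lower_volume()          # step 208: duck its output
```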
[0064] Namely, in step 300, the A2A cooperation controller 20 judges whether or not another agent is currently outputting sound. If this judgment is negative, the routine moves on to step 302. If this judgment is affirmative, the routine moves on to step 304.
[0065] In step 302, on the basis of the voice information received from the agent server, the A2A cooperation controller 20 carries out the requested sound playback, and returns the processing of
[0066] In step 304, the A2A cooperation controller 20 judges whether or not the voice information received from the agent server is music playback. If this judgment is affirmative, the routine moves on to step 306. If this judgment is negative, the routine moves on to step 312.
[0067] In step 306, the A2A cooperation controller 20 controls the sound output controller 18 to output a playback start message, and the routine moves on to step 308.
[0068] In step 308, the A2A cooperation controller 20 ends the sound output by the other agent, and the routine moves on to step 310.
[0069] In step 310, the A2A cooperation controller 20 controls the sound output controller 18 so as to playback the requested music, i.e., the music designated by the voice information received from the agent server, and returns the processing of
[0070] In step 312, the A2A cooperation controller 20 judges whether or not the voice information received from the agent server is a weather report. If this judgment is negative, the routine moves on to step 314, and, if this judgment is affirmative, the routine moves on to step 316.
[0071] In step 314, the A2A cooperation controller 20 outputs a voice corresponding to a request other than the weather report, and returns the processing of
[0072] In step 316, the A2A cooperation controller 20 controls the sound output controller 18 such that a weather report that is expressed by the voice information received from the agent server is output, and the routine moves on to step 318. Namely, the weather report is output while the volume of the sound output (e.g., an audiobook, music, or the like) by the other agent is lowered. Therefore, the audibility of the weather report may be improved.
[0073] In step 318, the A2A cooperation controller 20 controls the sound output controller 18 so as to restore the volume of the sound output by the other agent that has been activated previously, and returns the processing of
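The response output processing of steps 300 through 318 can be sketched as follows. The request dictionary, the recorder class, and its method names are hypothetical; only the ordering of the control actions reflects the flow described above.

```python
class SoundLog:
    """Records control actions in order; stands in for the sound
    output controller 18."""
    def __init__(self):
        self.log = []

    def play(self, item):
        self.log.append(("play", item))

    def announce(self, text):
        self.log.append(("announce", text))

    def stop_other(self):
        self.log.append(("stop_other",))

    def restore_other_volume(self):
        self.log.append(("restore_volume",))


def response_output(request, other_outputting, sound):
    """Sketch of steps 300-318."""
    if not other_outputting:                 # step 300: other agent silent?
        sound.play(request["item"])          # step 302: just play the request
        return
    if request["type"] == "music":           # step 304: music playback?
        sound.announce("playback start")     # step 306: playback start message
        sound.stop_other()                   # step 308: end other output
        sound.play(request["item"])          # step 310: play requested music
    elif request["type"] == "weather":       # step 312: weather report?
        sound.announce(request["item"])      # step 316: other output stays ducked
        sound.restore_other_volume()         # step 318: restore other volume
    else:
        sound.announce(request["item"])      # step 314: other requests
```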
[0074] Here, operation of the agent cooperation device 10 relating to the present embodiment is described by using a specific example.
[0075] As illustrated in
[0076] Further, in response to the user's speech of “play music” within a predetermined time period following the wakeup word, at the voice sensing section 26, the judgment of step 112 is affirmative, and the first agent 22 is notified of the voice at step 114. After notification of the voice is given, at the A2A cooperation controller 20, the judgment of above-described step 210 is affirmative, and the spoken voice is transmitted to the first agent server 12 at step 212. Then, semantic analysis is carried out by the first agent server 12, the first agent 22 of the A2A cooperation controller 20 receives a response at step 214, and response output processing is carried out at step 216.
[0077] In the response output processing, the judgments of above-described steps 300 and 304 are affirmative, and, at step 306, a playback start message is output from the first agent 22. Namely, as illustrated in
[0078] By carrying out processing in this way, in the example of
[0079]
[0080] As illustrated in
[0081] Further, in response to the user's speech of “tell me the weather” within a predetermined time period following the wakeup word, at the voice sensing section 26, the judgment of step 112 is affirmative, and the first agent 22 is notified of the voice at step 114. After notification of the voice is given, at the A2A cooperation controller 20, the judgment of above-described step 210 is affirmative, and the spoken voice is transmitted to the first agent server 12 at step 212. Then, semantic analysis is carried out by the first agent server 12, the first agent 22 of the A2A cooperation controller 20 receives a response at step 214, and response output processing is carried out at step 216.
[0082] In the response output processing, the judgment of above-described step 300 is affirmative, the judgment of step 304 is negative, and the judgment of step 312 is affirmative. In step 316, a weather report is output from the first agent 22. Namely, as illustrated in
[0083] By carrying out processing in this way, in the example of
[0084] A modified example of the response output processing is described next.
[0085] In step 300, the A2A cooperation controller 20 judges whether or not another agent is currently outputting sound. If this judgment is negative, the routine moves on to step 302. If this judgment is affirmative, the routine moves on to step 304.
[0086] In step 302, on the basis of the voice information received from the agent server, the A2A cooperation controller 20 carries out the requested sound playback, and returns the processing of
[0087] In step 304, the A2A cooperation controller 20 judges whether or not the voice information received from the agent server is music playback. If this judgment is affirmative, the routine moves on to step 305. If this judgment is negative, the routine moves on to step 312.
[0088] In step 305, the A2A cooperation controller 20 ends the sound output by the other agent, and the routine moves on to step 307.
[0089] In step 307, the A2A cooperation controller 20 controls the sound output controller 18 so as to output a playback start message, and the routine moves on to step 310.
[0090] In step 310, the A2A cooperation controller 20 controls the sound output controller 18 so as to playback the requested music, i.e., the music designated by the voice information received from the agent server, and returns the processing of
[0091] In step 312, the A2A cooperation controller 20 judges whether or not the voice information received from the agent server is a weather report. If this judgment is negative, the routine moves on to step 314, and, if this judgment is affirmative, the routine moves on to step 315.
[0092] In step 314, the A2A cooperation controller 20 outputs a voice corresponding to a request other than the weather report, and returns the processing of
[0093] In step 315, the A2A cooperation controller 20 stops the sound output by the other agent that has been previously activated, and the routine moves on to step 316. Namely, the A2A cooperation controller 20 instructs the sound output controller 18 to stop the sound output (e.g., an audiobook, music or the like) by the other agent that has been previously activated.
[0094] In step 316, the A2A cooperation controller 20 controls the sound output controller 18 such that a weather report that is expressed by the voice information received from the agent server is output, and the routine moves on to step 317. Namely, the weather report is output in a state in which the sound output (e.g., an audiobook, music, or the like) by the other agent is stopped. Therefore, the audibility of the weather report may be improved.
[0095] In step 317, the A2A cooperation controller 20 controls the sound output controller 18 so as to restart the sound output by the other agent that has been activated previously, and returns the processing of
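The modified response output processing of steps 300 through 317 differs from the earlier flow in two places: for music, the other agent's output is ended before the playback start message (steps 305 and 307), and for a weather report the other output is stopped, the report is spoken, and the output is restarted (steps 315 through 317). A sketch, with the same hypothetical names as before:

```python
class SoundLog:
    """Records control actions in order; stands in for the sound
    output controller 18."""
    def __init__(self):
        self.log = []

    def play(self, item):
        self.log.append(("play", item))

    def announce(self, text):
        self.log.append(("announce", text))

    def stop_other(self):
        self.log.append(("stop_other",))

    def restart_other(self):
        self.log.append(("restart_other",))


def response_output_modified(request, other_outputting, sound):
    """Sketch of the modified flow, steps 300-317."""
    if not other_outputting:                 # step 300
        sound.play(request["item"])          # step 302
        return
    if request["type"] == "music":           # step 304
        sound.stop_other()                   # step 305: end other output first
        sound.announce("playback start")     # step 307: then announce playback
        sound.play(request["item"])          # step 310
    elif request["type"] == "weather":       # step 312
        sound.stop_other()                   # step 315: stop other output
        sound.announce(request["item"])      # step 316: speak the report
        sound.restart_other()                # step 317: resume other playback
    else:
        sound.announce(request["item"])      # step 314
```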
[0096] Here, operation of the agent cooperation device 10 relating to the present embodiment to which the response output processing of the modified example is applied is described by using a specific example.
[0097] As illustrated in
[0098] Further, in response to the user's speech of “play music” within a predetermined time period following the wakeup word, at the voice sensing section 26, the judgment of step 112 is affirmative, and the first agent 22 is notified of the voice at step 114. After notification of the voice is given, at the A2A cooperation controller 20, the judgment of above-described step 210 is affirmative, and the spoken voice is transmitted to the first agent server 12 at step 212. Then, semantic analysis is carried out by the first agent server 12, the first agent 22 of the A2A cooperation controller 20 receives a response at step 214, and response output processing is carried out at step 216.
[0099] In the response output processing, the judgments of above-described steps 300 and 304 are affirmative. After the playback of music by the second agent 24 ends at step 305, in step 307, a playback start message is output from the first agent 22. Namely, as illustrated in
[0100] By carrying out processing in this way, in the example of
[0101]
[0102] As illustrated in
[0103] Further, in response to the user's speech of “tell me the weather” within a predetermined time period following the wakeup word, at the voice sensing section 26, the judgment of step 112 is affirmative, and the first agent 22 is notified of the voice at step 114. After notification of the voice is given, at the A2A cooperation controller 20, the judgment of above-described step 210 is affirmative, and the spoken voice is transmitted to the first agent server 12 at step 212. Then, semantic analysis is carried out by the first agent server 12, the first agent 22 of the A2A cooperation controller 20 receives a response at step 214, and response output processing is carried out at step 216.
[0104] In the response output processing, the judgment of above-described step 300 is affirmative, the judgment of step 304 is negative, and the judgment of step 312 is affirmative. After playing of music by the second agent 24 is stopped in step 315, a weather report is output from the first agent 22 in step 316. Namely, as illustrated in
[0105] By carrying out processing in this way, in the example of
[0106] Note that the above-described embodiments describe cases in which the first agent 22 and the second agent 24 provide services such as playing music, playing an audiobook, or providing a weather report, in
[0107] Further, although the above-described embodiments describe examples in which there are two agents that are the first agent 22 and the second agent 24, the present disclosure is not limited to this, and there may be three or more agents. In this case, in a case in which a voice dialogue is carried out with respect to one agent of the plural agents while another agent is playing music or an audiobook, it suffices for the A2A cooperation controller 20 to control the sound output controller 18 such that the volume of the playback being carried out is lowered or the playback being carried out is stopped.
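The generalization to three or more agents can be sketched as follows. The function decides which other agents should have their playback lowered or stopped; the agent identifiers and the data shapes are illustrative assumptions.

```python
def duck_targets(playback_state, dialogue_agent):
    """Given a mapping from agent id to a flag indicating whether that
    agent is currently playing music or an audiobook, return every
    other agent whose playback should be lowered or stopped while a
    voice dialogue is carried out with dialogue_agent."""
    return [agent
            for agent, playing in sorted(playback_state.items())
            if playing and agent != dialogue_agent]
```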
[0108] Although description has been given in which the processing carried out at the agent cooperation device 10 in the above-described respective embodiments is software processing performed by the CPU executing programs, the present disclosure is not limited to this. For example, the processing may be carried out by hardware using Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs) or the like. Alternatively, the processing may be realized by a combination of software and hardware. In the case of software processing, the programs may be stored on any of various types of storage media and distributed.
[0109] The present disclosure is not limited to the above embodiments, and can of course be implemented by being modified in various ways other than the above embodiments within a scope that does not depart from the gist thereof.