Apparatus, method and program to facilitate retrieval of voice messages
09824143 · 2017-11-21
Assignee
Inventors
- Ken Miyashita (Tokyo, JP)
- Tomohiko Hishinuma (Kanagawa, JP)
- Yoshihito Ohki (Tokyo, JP)
- Ryohei Morimoto (Kanagawa, JP)
- Junya Ono (Kanagawa, JP)
CPC classification
- G10L15/30 (PHYSICS)
- G06F16/685 (PHYSICS)
- G06F16/3328 (PHYSICS)
International classification
Abstract
An information processing apparatus includes a display, an input unit, and a controller. The input unit is configured to receive an input of a first keyword from a user. The controller is configured to retrieve first character information including the input first keyword from a database configured to store a plurality of character information items converted from a plurality of voice information items by voice recognition processing, extract a second keyword that is included in the first character information acquired by the retrieval and is different from the first keyword, and control the display to display a list of items including first identification information with which the acquired first character information is identified and the second keyword included in the first character information.
Claims
1. An information processing apparatus, comprising: a display; an input device to receive a first user input of a first keyword inputted by a user in a portion of a graphical user interface; and a processor configured to: receive voice call data and store the voice call data in a local memory; transmit the stored voice call data to a server, wherein the server converts the voice call data to a voice call character string; cause a first query of the server to be performed using the first keyword of the first user input, wherein the first query retrieves first character information associated with the first keyword, and in response to the first query, cause a first list of data items associated with the first character information to be displayed on the display; receive a second user input on a data item in the list of data items, wherein the second user input causes the voice call character string to be displayed, wherein the voice call character string includes a second keyword extracted from the voice call character string, and wherein a visual indication of the second keyword is displayed; and receive a third user input on the voice call character string, and determine a selected position of the third user input on the voice call character string, and determine whether or not the selected position of the third user input on the voice call character string displayed on the display is of the extracted second keyword, and (a) when a determination result thereof indicates that the selected position on the voice call character string is one other than that of the second keyword, cause the voice call data to be played from a playback position corresponding to the selected position on the voice call character string displayed on the display, and (b) when the determination result thereof indicates that the selected position on the voice call character string is that of the second keyword or the visual indication thereof, cause a second query to be performed using the 
second keyword, wherein the second query retrieves second character information associated with the second keyword from the server without causing the voice call data to be played from the position corresponding to the selected position on the voice call character string displayed on the display, wherein the second character information is used to present a second list of data items associated with the second character information to be displayed on the display.
2. An information processing method for use with a processing apparatus, said method comprising: receiving a first user input of a first keyword inputted by a user in a portion of a graphical user interface; and causing, by use of a processor of the processing apparatus, voice call data of the user to be stored in a local memory, the stored voice call data of the user to be transmitted to a server, wherein the server converts the voice call data to a voice call character string, a first query of the server to be performed using the first keyword of the first user input, wherein the first query retrieves first character information associated with the first keyword, and in response to the first query, causing a first list of data items associated with the first character information to be displayed on a display; a receiving of a second user input on a data item in the list of data items, wherein the second user input causes the voice call character string to be displayed on the display, wherein the voice call character string includes a second keyword extracted from the voice call character string, and wherein a visual indication of the second keyword is displayed on the display; and a receiving of a third user input on the voice call character string, and a determining of a selected position of the third user input on the voice call character string, and a determining of whether or not the selected position of the third user input on the voice call character string displayed on the display is of the extracted second keyword, and (a) when the determining result thereof indicates that the selected position on the voice call character string is one other than that of the second keyword, causing the voice call data to be played from a playback position corresponding to the selected position on the voice call character string displayed on the display, and (b) when the determining result thereof indicates that the selected position on the voice call character string is 
that of the second keyword or the visual indication thereof, causing a second query to be performed using the second keyword, wherein the second query retrieves second character information associated with the second keyword from the server without causing the voice call data to be played from the position corresponding to the selected position on the voice call character string displayed on the display, wherein the second character information is used to present a second list of data items associated with the second character information to be displayed on the display.
3. A non-transitory computer readable storage medium having stored thereon a program causing an information processing apparatus to execute the steps of: receiving a first user input of a first keyword inputted by a user in a portion of a graphical user interface; and causing, by use of a processor of the information processing apparatus, voice call data of the user to be stored in a local memory, the stored voice call data of the user to be transmitted to a server, wherein the server converts the voice call data to a voice call character string, a first query of the server to be performed using the first keyword of the first user input, wherein the first query retrieves first character information associated with the first keyword, and in response to the first query, causing a first list of data items associated with the first character information to be displayed on a display; a receiving of a second user input on a data item in the list of data items, wherein the second user input causes the voice call character string to be displayed on the display, wherein the voice call character string includes a second keyword extracted from the voice call character string, and wherein a visual indication of the second keyword is displayed on the display; and a receiving of a third user input on the voice call character string, and a determining of a selected position of the third user input on the voice call character string, and a determining of whether or not the selected position of the third user input on the voice call character string displayed on the display is of the extracted second keyword, and (a) when the determining result thereof indicates that the selected position on the voice call character string is one other than that of the second keyword, causing the voice call data to be played from a playback position corresponding to the selected position on the voice call character string displayed on the display, and (b) when the determining result
thereof indicates that the selected position on the voice call character string is that of the second keyword or the visual indication thereof, causing a second query to be performed using the second keyword, wherein the second query retrieves second character information associated with the second keyword from the server without causing the voice call data to be played from the position corresponding to the selected position on the voice call character string displayed on the display, wherein the second character information is used to present a second list of data items associated with the second character information to be displayed on the display.
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION OF EMBODIMENTS
(9) Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings.
(10) [Outline of System]
(12) As shown in
(13) A plurality of user terminals 200 may exist on the network. The user terminals 200 are typically mobile terminals such as a smartphone, a mobile phone, and a tablet PC (Personal Computer), but may be any information processing apparatuses including desktop and laptop PCs, an electronic book reader, portable A/V (audio/visual) equipment, and the like.
(14) A user of the user terminal 200 makes a voice call to a user of another user terminal with use of the user terminal 200. This voice call data is stored in the user terminal 200 as a call history.
(15) The server 100 acquires the voice call data from the user terminal 200, converts the voice call data into character information by voice recognition processing, and then stores it.
(16) The user terminal 200 retrieves past voice calls via the server 100 based on a keyword input by the user and displays results of the retrieval.
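The retrieval flow above can be sketched as a minimal in-memory "server" that stores transcripts keyed by call ID and answers keyword queries. All names here (CallServer, store, search, the call IDs) are illustrative assumptions, not terms from the embodiment.

```python
# Minimal sketch of the retrieval flow: the server stores character
# information converted from voice calls and returns the calls whose
# transcript contains the search keyword.

class CallServer:
    def __init__(self):
        self.transcripts = {}  # call_id -> transcript text

    def store(self, call_id, transcript):
        self.transcripts[call_id] = transcript

    def search(self, keyword):
        # Return IDs of calls whose transcript contains the keyword.
        return [cid for cid, text in self.transcripts.items() if keyword in text]

server = CallServer()
server.store("call-1", "let's meet at the station tomorrow")
server.store("call-2", "the report is due on friday")
results = server.search("station")  # -> ["call-1"]
```

In the actual embodiment, the substring match would be replaced by a full-text query against the call-related-information storage unit 46.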
(17) [Hardware Configuration of Server]
(19) The CPU 11 accesses the RAM 13 and the like when necessary and performs overall control on the respective blocks of the server 100 while performing various types of computation processing. The ROM 12 is a non-volatile memory in which an OS to be executed by the CPU 11 and firmware such as programs and various parameters are fixedly stored. The RAM 13 is used as a work area or the like of the CPU 11 and temporarily stores the OS, various applications in execution, and various types of data being processed.
(20) Connected to the input and output interface 15 are a display 16, an operation reception unit 17, a storage 18, a communication unit 19, and the like.
(21) The display 16 is a display device using, for example, an LCD (Liquid Crystal Display), an OELD (Organic Electroluminescent Display), a CRT (Cathode Ray Tube), or the like.
(22) The operation reception unit 17 includes input devices such as a pointing device (e.g., a mouse), a keyboard, and a touch panel. In the case where the operation reception unit 17 is a touch panel, the touch panel may be formed integrally with the display 16.
(23) The storage 18 is a non-volatile memory such as an HDD (Hard Disk Drive), a flash memory (SSD (Solid State Drive)), and other solid-state memories. The storage 18 stores the OS, various applications, and various types of data. In particular, in this embodiment, the storage 18 stores voice call data received from the user terminal 200 and character data obtained by performing voice recognition processing on the voice call data.
(24) The communication unit 19 is a NIC (Network Interface Card) or the like for wired connection to the Internet 50 or a LAN (Local Area Network), and performs communication processing with the user terminal 200.
(25) [Hardware Configuration of User Terminal]
(27) The display 21 is constituted of, for example, a liquid crystal display, an EL (Electroluminescent) display, or the like. The display 21 is formed integrally with the touch panel 22. Examples of the touch panel 22 include a resistive touch panel and a capacitive touch panel, but the touch panel may have any system. On the display 21 (touch panel 22), a list of history information of past voice calls is displayed, which will be described later.
(28) The communication unit 23 executes processing such as frequency transform, modulation, and demodulation of radio waves that are transmitted and received by the antenna 24. The antenna 24 transmits and receives radio waves for calls and packet communication of e-mail and the like. Further, the communication unit 23 is also used when voice call data is transmitted to the server 100.
(29) The speaker 26 includes a digital/analog converter, an amplifier, and the like. The speaker 26 executes digital/analog conversion processing and amplification processing on voice call data input from the CPU 25 and outputs a voice via a receiver (not shown).
(30) The microphone 27 includes an analog/digital converter and the like. The microphone 27 converts analog voice data that has been input from the user through a microphone into digital voice data and outputs it to the CPU 25. The digital voice data output to the CPU 25 is encoded and then transmitted via the communication unit 23 and the antenna 24.
(31) The RAM 28 is a volatile memory used as a work area of the CPU 25. The RAM 28 temporarily stores various programs and various types of data used for processing of the CPU 25.
(32) The flash memory 29 is a non-volatile memory in which various programs and various types of data necessary for processing of the CPU 25 are stored. In particular, in this embodiment, the flash memory 29 stores applications and data for displaying a list of the call history and the voice call data.
(33) The CPU 25 performs overall control on the respective units of the user terminal 200 and executes various computations according to various programs. For example, the CPU 25 exchanges data with the server 100 to execute processing of retrieving voice call data based on a character string (keyword) input through the touch panel 22, and displays results of the retrieval on the display 21.
(34) [Software Configurations of Server and User Terminal]
(36) As shown in
(37) The call recording unit 41 stores voice call data of the user in the primary storage area 42.
(38) The transmission and reception processing unit 43 transmits the voice call data stored in the primary storage area 42 to the call-related-information storage unit 46 of the server 100 and notifies the voice-to-character conversion unit 47 of the fact that the voice call data has been transmitted.
(39) The call-related-information storage unit 46 stores the voice call data transmitted from the transmission and reception processing unit 43.
(40) The voice-to-character conversion unit 47 executes voice recognition processing on the received voice call data to convert the voice call data into character data. The converted character data is stored in the call-related-information storage unit 46.
(41) The search word input unit 44 receives an input of a search keyword from the user.
(42) The retrieval result display unit 45 displays, on the display 21, a list of voice call data retrieved from the call-related-information storage unit 46 based on the search keyword.
(43) [Operations of Server and User Terminal]
(44) Next, operations of the server 100 and user terminal 200 configured as described above will be described. Hereinafter, descriptions will be given on the assumption that the CPU 11 of the server 100 and the CPU 25 of the user terminal 200 are subjects of operations. However, those operations are also executed in cooperation with other hardware and software (application) provided to the server 100 and the user terminal 200.
(46) As shown in
(47) Subsequently, the CPU 25 transmits the input keyword to the server 100 (Step 52). In the case where the keyword is input by a voice, voice data thereof is transmitted to the server 100.
(48) Meanwhile, as shown in
(49) Then, the CPU 11 creates, based on results of the retrieval, a list of voice call data including the keyword (Step 63). In this case, the CPU 11 extracts summary information and an important keyword from the call-related-information storage unit 46. The summary information is a summary of contents of character information obtained by converting voice call data items into characters. The important keyword is included in the character information. Then, the CPU 11 adds the extracted summary information and important keyword to a list of the retrieved voice call data (Step 64). The summary information and the important keyword will be described later in detail.
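Steps 63 and 64 above can be sketched as a small list-building function that attaches the extracted summary information and important keyword to each retrieved call. The data layout and helper names are assumptions for illustration only.

```python
# Sketch of Steps 63-64: after retrieving matching calls, attach summary
# information and an important keyword to each entry of the result list.

def build_result_list(matches, summaries, keywords):
    """matches: list of call IDs; summaries/keywords: dicts keyed by call ID."""
    return [
        {
            "call_id": cid,
            "summary": summaries.get(cid, ""),
            "important_keyword": keywords.get(cid, ""),
        }
        for cid in matches
    ]

items = build_result_list(
    ["call-1"],
    {"call-1": "meeting at the station"},
    {"call-1": "station"},
)
```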
(50) The CPU 11 then transmits the list to the user terminal 200 (Step 65).
(51) Meanwhile, as shown in
(53) As shown in
(54) In addition, in each of the voice call data items 71, an important keyword 72 that is different from the search keyword described above and is included in the voice call data is displayed.
(55) The important keyword 72 is, for example, a noun that has been extracted by the voice-to-character conversion unit 47 from the voice call data converted into characters, by morphological analysis processing or the like. The important keyword 72 is underlined in order to indicate that it is selectable (capable of receiving a tap operation, for example).
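The keyword extraction described above relies on morphological analysis; a production system would use a real morphological analyzer (for Japanese, a tool such as MeCab). The sketch below substitutes a crude stopword heuristic so the example stays self-contained; the stopword list and function names are assumptions, not part of the embodiment.

```python
# Sketch of important-keyword extraction: keep content words that differ
# from the user's original search keyword. A stand-in for morphological
# analysis that would normally isolate nouns.

STOPWORDS = {"the", "a", "an", "is", "at", "on", "to", "we", "let's", "meet"}

def extract_keywords(text, search_keyword):
    words = [w.strip(".,!?").lower() for w in text.split()]
    # Drop function words and the search keyword itself.
    return [w for w in words if w not in STOPWORDS and w != search_keyword.lower()]

kws = extract_keywords("Let's meet at the station tomorrow", "tomorrow")  # -> ["station"]
```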
(56) Referring back to
(57) When determining that the tap is made on a portion corresponding to the important keyword 72 (Yes), the CPU 25 executes the retrieval processing performed in Step 52 and subsequent steps with that important keyword 72 as a new search keyword and displays as results of the retrieval a new list of voice call data, which is received from the server 100, in the same manner as that shown in
(58) When determining that the tap is not made on a portion corresponding to the important keyword 72 (No), that is, that the tap is an operation to select a specific one of the voice call data items 71, the CPU 25 displays detailed information of the selected voice call data item (Step 57).
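The branch in Steps 56 and 57 can be sketched as a small dispatch function: a tap inside the underlined keyword's on-screen span triggers a new retrieval with that keyword, while a tap elsewhere on the item opens its detailed view. The span representation and callback names are illustrative assumptions.

```python
# Sketch of the tap dispatch in Steps 56-57.

def handle_tap(tap_x, keyword_span, on_search, on_detail):
    """keyword_span: (start_x, end_x) of the underlined keyword on screen."""
    start, end = keyword_span
    if start <= tap_x <= end:
        return on_search()   # reuse the important keyword as a new search keyword
    return on_detail()       # display detailed information of the selected item

action = handle_tap(15, (10, 20), lambda: "search", lambda: "detail")  # -> "search"
```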
(60) As shown in
(61) The summary information 81 is created by the voice-to-character conversion unit 47 based on the character information stored in the call-related-information storage unit 46, and is itself stored in the call-related-information storage unit 46. The summary information 81 is received together with the list of voice call data from the server 100. The summary information 81 may be created by any method; for example, it is created by combining clauses that include a specific noun in the character information.
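The example summary method mentioned above (combining clauses that contain a specific noun) can be sketched as follows. Splitting clauses on punctuation is a simplification of real clause analysis; the function name and sample text are assumptions.

```python
# Sketch of summary creation: keep only the clauses of the character
# information that contain a specific noun, and join them together.

import re

def summarize(text, noun):
    clauses = [c.strip() for c in re.split(r"[,.;]", text) if c.strip()]
    return ". ".join(c for c in clauses if noun in c)

summary = summarize(
    "We met at the station. The station was crowded. It rained later.",
    "station",
)
# -> "We met at the station. The station was crowded"
```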
(62) In the summary information 81, the character information is displayed while distinguishing speakers by different colors, fonts, and the like. Such differentiation of speakers is executed by the voice-to-character conversion unit 47 in advance and added as metadata. The differentiation is performed by comparing voice features (acoustic patterns), such as waveforms, in the source voice call data. Alternatively, each speaker may be indicated by characters or the like in each sentence within the summary information 81.
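The feature comparison described above can be sketched as assigning each speech segment to the nearest known speaker profile. Real diarization compares much richer features (e.g., spectral patterns over the waveform); the single scalar "feature" and the profile values here are purely illustrative assumptions.

```python
# Sketch of speaker differentiation: assign a segment's acoustic feature
# to the closest reference profile (nearest-neighbor on a 1-D feature).

def assign_speaker(segment_feature, profiles):
    """profiles: dict of speaker label -> reference feature value."""
    return min(profiles, key=lambda name: abs(profiles[name] - segment_feature))

profiles = {"caller": 120.0, "callee": 210.0}
label = assign_speaker(130.0, profiles)  # -> "caller"
```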
(63) Further, in the summary information 81, an important keyword 82 included therein is displayed in a selectable state. The important keyword 82 corresponds to the important keyword 72 displayed on the above-mentioned retrieval result display screen.
(64) Furthermore, a replay button 83 is also displayed on the display screen showing detailed information. Although not described in the flowchart of
(65) Referring back to
(66) Specifically, when determining that the position of the tap operation (tapped position) is on a portion corresponding to the important keyword 82 in the summary information 81, the CPU 25 executes the retrieval processing performed in Step 52 and subsequent steps with that important keyword 82 as a new search keyword. Then, the CPU 25 displays as results of the retrieval a new list of voice call data, which is received from the server 100, in the same manner as that shown in
(67) Further, when determining that the tapped position is on a portion other than the important keyword 82 in the summary information 81, the CPU 25 replays the voice call data from a position corresponding to a character string displayed at the tapped position. For example, by receiving correspondence information between character strings in the summary information 81 and replay positions in the voice call data in advance together with the summary information 81 and the like from the server 100, the CPU 25 distinguishes a character string displayed at the tapped position.
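The correspondence information described above can be sketched as a lookup table from character offsets in the summary to playback times: the terminal finds the last timing anchor at or before the tapped character and replays from there. The table format and function name are assumptions for illustration.

```python
# Sketch of mapping a tapped character position to a replay position,
# using correspondence data received together with the summary.

import bisect

def replay_position(char_index, offsets, times):
    """offsets: sorted character offsets; times: playback seconds at each offset."""
    i = bisect.bisect_right(offsets, char_index) - 1
    return times[max(i, 0)]

offsets = [0, 12, 30]       # character offsets where timing anchors exist
times = [0.0, 4.5, 11.2]    # corresponding playback positions in seconds
pos = replay_position(15, offsets, times)  # -> 4.5
```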
(68) Further, when determining that the tapped position is on the history button 73, the CPU 25 displays again the retrieval result display screen shown in
(69) [Conclusion]
(70) As described above, according to this embodiment, the user terminal 200 cooperates with the server 100 to retrieve voice call data based on a keyword input by the user, and when results of the retrieval are displayed as a list, the user terminal 200 also displays the important keyword 72 other than the search keyword. Accordingly, the user easily grasps the contents of voice call data based on a relationship between the keyword input by the user and the important keyword 72.
Modified Example
(71) The present disclosure is not limited to the embodiment described above and may variously be modified without departing from the gist of the present disclosure.
(72) In the embodiment described above, a target to be retrieved is the voice call data, but it is not limited thereto. For example, music data, voice data in moving image contents, and the like may be targets to be retrieved.
(73) In the embodiment described above, the server 100 performs the processing of converting voice call data into characters and the processing of storing the voice call data and the character information obtained by the conversion. In the case where the user terminal 200 has sufficient storage capacity and calculation capability, however, the processing performed by the server 100 may be executed by the user terminal 200.
(74) In the embodiment described above, the user terminal 200 makes a voice call and then temporarily stores voice call data thereof to transmit it to the server 100. However, an apparatus for making a voice call may be an apparatus different from the user terminal 200. In this case, the user terminal 200 receives voice call data from the apparatus that has made the voice call via a network, for example, or stores voice call data received via a recording medium.
(75) The layout of the user interface of the retrieval result display screen shown in
(76) In the embodiment described above, the present disclosure is applied to the user terminal 200. However, the present disclosure is applicable to an audio player, a television apparatus, a game device, a car navigation apparatus, a recording and reproducing apparatus, and any other information processing apparatuses.
(77) [Others]
(78) It should be noted that the present disclosure may take the following configurations.
(79) (1) An information processing apparatus, including:
(80) a display;
(81) an input unit configured to receive an input of a first keyword from a user; and
(82) a controller configured to retrieve first character information including the input first keyword from a database configured to store a plurality of character information items converted from a plurality of voice information items by voice recognition processing, extract a second keyword that is included in the first character information acquired by the retrieval and is different from the first keyword, and control the display to display a list of items including first identification information with which the acquired first character information is identified and the second keyword included in the first character information.
(83) (2) The information processing apparatus according to Item (1), in which
(84) the controller retrieves, when an operation of the user to select the second keyword included in the displayed items is received, second character information including the second keyword from the database, extracts a third keyword that is included in the second character information acquired by the retrieval and is different from the second keyword, and controls the display to display a list of items including second identification information with which the acquired second character information is identified and the third keyword included in the second character information.
(85) (3) The information processing apparatus according to Item (1) or (2), in which
(86) the database stores a plurality of summary information items obtained by summarizing the plurality of character information items, and
(87) the controller acquires, when an operation of the user to select one of the displayed items is received, a summary information item corresponding to the selected item from the database, and controls the display to display the acquired summary information item.
(88) (4) The information processing apparatus according to Item (3), in which
(89) the controller controls the display to display the summary information item in a state where a third keyword included in the summary information item is selectable, retrieves, when an operation of the user to select the displayed third keyword is received, third character information including the third keyword from the database, extracts a fourth keyword that is included in the third character information acquired by the retrieval and is different from the third keyword, and controls the display to display a list of items including third identification information with which the acquired third character information is identified and the fourth keyword included in the third character information.
(90) (5) The information processing apparatus according to Item (3) or (4), in which
(91) the controller replays, when an operation of the user to designate an arbitrary position of the displayed summary information item is received, one of the plurality of voice information items that corresponds to a character information item as a summary source of the summary information item, from a replay position corresponding to a character string displayed at the designated position.
(92) It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.