SYSTEM AND METHODS FOR SIMULTANEOUSLY CAPTURING AUDIO AND IMAGE DATA FOR DIGITAL PLAYBACK
20170230630 · 2017-08-10
Inventors
CPC classification
H04N5/772
ELECTRICITY
G03B31/06
PHYSICS
International classification
Abstract
The invention provides a system and method for the simultaneous capture and storage of digital audio data in association with still image data. The captured audio clip is instantaneously associated and embedded with the photographic image, allowing the user to digitally play back, view, and share the still image embedded with the audio clip.
Claims
1-6. (canceled)
7. A device comprising: an audio input device operative to capture audio signals; an image input device operative to capture an image; a detection device operative to detect events experienced by the device, such events respectively represented by data associated with respective events, and in response to detecting events, output respective detection signals; and a processor connected to the audio input device, the image input device, and the detection device, and operation of the device includes: the processor initiating capturing of audio signals by the audio input device upon receiving a detection signal by the detection device; the detection signal being generated based on an event experienced by the device that is associated with a user manipulating the device in connection with capturing the image; and the audio input device continuing to perform the capturing of audio signals over the course of a time window, and the processor receiving input from a user to capture the image, the processor, based on the input, initiating the capture of the image by the image input device; the image input device capturing the image; the processor performing processing to store the image; the processor terminating the capturing of the audio signals resulting in termination of the time window in which audio signals are captured, such that (a) the initiating the capturing of audio signals and (b) the terminating the capturing of audio signals, define the time window so as to capture audio over a time spanning the image input device capturing the image; and the processor performing processing to store the captured audio.
8. The device of claim 7, the device constituted by a portable device.
9. The device of claim 7, the processor performing processing to store the image is constituted by the processor storing the image on the device; and the processor performing processing to store the captured audio is constituted by the processor saving the captured audio on the device.
10. The device of claim 7, the processor performing processing to store the image is constituted by the processor storing the image on a cloud-based database; and the processor performing processing to store the captured audio is constituted by the processor storing the captured audio on a cloud-based database.
11. The device of claim 7, the audio input device comprises a microphone.
12. The device of claim 7, the image input device comprises a camera.
13. The device of claim 7, wherein the detection device is constituted by a motion detection device, and the event is constituted by observed motion experienced by the device that is electronically detected by the motion detection device.
14. The device of claim 7, wherein the detection device is constituted by a motion detection device that includes an accelerometer.
15. The device of claim 7, wherein the processor embeds audio data corresponding to the captured audio into image data corresponding to the image, and the processor performing processing to save the audio data, that is embedded into the image data, as embedded audio data.
16. The device of claim 15, wherein the device displays the image on a display in the device and plays the embedded audio data over one or more speakers in the device.
17. The device of claim 7, the device being wireless.
18. A device comprising: an audio input device operative to capture audio signals; an image input device operative to capture an image; a detection device operative to detect observations that are observed by the device and, in response to such observations, output respective detection signals that respectively reflect such observations; and a processor connected to the audio input device, the image input device, and the detection device, and operation of the device includes: the processor initiating capturing of audio signals by the audio input device upon receiving a detection signal by the detection device; the detection signal being generated based on an observation observed by the device, such observation constituted by generation of generated data reflecting an event, such event associated with a user manipulating the device in connection with capturing the image; and the audio input device continuing to perform the capturing of audio signals over the course of a time window, and the processor receiving input from a user to capture the image, the processor, based on the input, initiating the capture of the image by the image input device; the image input device capturing the image; the processor performing processing to store the image; the processor terminating the capturing of the audio signals resulting in termination of the time window in which audio signals are captured, such that (a) the initiating the capturing of audio signals and (b) the terminating the capturing of audio signals, define the time window so as to capture audio over a time spanning the image input device capturing the image; and the processor performing processing to store the captured audio.
19. The device of claim 18, the device constituted by a portable device.
20. The device of claim 18, the processor performing processing to store the image is constituted by the processor storing the image on the device; and the processor performing processing to store the captured audio is constituted by the processor saving the captured audio on the device.
21. The device of claim 18, the detection device is constituted by a motion detection device.
22. The device of claim 21, the event, as reflected in the generated data, including the user moving a camera, and the motion detection device detecting such moving the camera.
23. The device of claim 18, the event, as reflected in the generated data, including the user focusing a camera, and the detection device detecting such focusing the camera.
24. The device of claim 18, the image input device comprises a camera; and the event, as reflected in the generated data, including the user utilizing a camera zoom function of the device, and the detection device detecting such utilizing the camera zoom function; and the audio input device comprises a microphone.
25. The device of claim 18, the image input device comprises a camera, and the event, as reflected in the generated data, including an operation of the camera, and the detection device detecting such operation of the camera.
26. The device of claim 25, the operation of the camera constituted by at least one selected from the group consisting of: (a) the camera automatically adjusting focus, and (b) the camera detecting an object.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] Various exemplary embodiments of this invention will be described in detail, wherein like reference numerals refer to identical or similar components or steps, with reference to the following figures, wherein:
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0017] Particular embodiments of the present invention will now be described in greater detail with reference to the figures.
[0018] As shown in
[0019] The remote processing system 117 may be comprised of one or more servers 118 and one or more databases 120, which may be cloud-based. The servers 118 can be any processor-based device that can be accessed remotely. The remote server 118 has one or more applications resident on the server 118 for processing data, such as image and audio data. The remote server 118 may also be connected to other servers or processor-based devices for running the various applications or processing data. The remote system 117 may also be used by third parties, using other computers and wireless devices, to access the images and audio files.
[0020] In addition, various applications, software, and programs may reside on the one or more servers 118. These applications, software, and programs, resident on the mobile device 104 or on the remote processing system 117, enable the processing of the audio and image data. The user 102 may access these applications 106 through their mobile device 104 and/or through the Internet, local network, wide network, or telecommunication network 114.
[0021] The system and method of the present invention allow users of the system's mobile application 106, executing on at least one processor 105 of the mobile device 104, to interact with the camera 112 and microphone 114 to simultaneously capture an audio signal and an image or video, and then to process and embed or store the digital audio data in association with the image data.
[0022] In the preferred embodiment, the application 106, on the mobile device 104, converts the image and audio file into BASE64 or another similar binary-to-text encoding. The application 106 then posts the encoded image and audio file to the online application resident on the server 118 of the remote system 117. Such action might employ a JSON POST, such as an HTTP POST that sends or receives a string of data. The string of data may include the encoded image or video, the audio, user credentials, and metadata. The remote application resident on the server 118 creates a request object using the JSON data, and the remote application converts or decodes the encoded (e.g., BASE64) image data and audio data back to its original format. The remote application then uploads the data to a cloud environment for sharing. The remote application may use the Paperclip framework to upload the data to a cloud environment such as an Amazon AWS bucket. The remote application also creates a unique URL (uniform resource locator) for the audio and image files and saves them into an object file associated with the remote application on the server 118 and associated with the local application 106 on the mobile device 104. The URLs can
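The encode-and-post step described in paragraph [0022] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the payload field names ("image", "audio", "user", "metadata") are assumptions made for the example.

```python
import base64
import json


def build_upload_payload(image_bytes: bytes, audio_bytes: bytes,
                         username: str, metadata: dict) -> str:
    """Encode the image and audio as BASE64 text and pack them, together
    with user credentials and metadata, into one JSON string suitable for
    an HTTP POST body."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "user": username,
        "metadata": metadata,
    }
    return json.dumps(payload)


def decode_upload_payload(payload: str) -> tuple:
    """Server side: decode the BASE64 fields back to the original binary
    image and audio data."""
    data = json.loads(payload)
    return base64.b64decode(data["image"]), base64.b64decode(data["audio"])
```

A round trip through both functions returns the original bytes unchanged, which is the property the remote application relies on when restoring the files to their original format.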
[0023] In the preferred embodiment, the audio file and image file are separate files with an associated unique identification element, used by the application 106 on the mobile device, or by the application running on the remote server 118, to retrieve the appropriate audio file associated with the appropriate image file. Further, in the preferred embodiment, the image file may be created by using one frame of a video stream. Such video capture helps to eliminate any sounds associated with the actual capture of the image, such as the shutter sound created by the camera hardware. The image and audio files are saved locally in the memory 110 of the mobile device 104 as an object of the application 106. Upon obtaining access through network 116 to the remote system 117, the files and encoded data are transmitted.
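The pairing of separate audio and image files through a shared unique identification element might look like the sketch below; the file extensions and naming scheme are illustrative assumptions, not part of the disclosure.

```python
import uuid
from pathlib import Path


def save_capture_pair(image_bytes: bytes, audio_bytes: bytes,
                      directory: Path) -> str:
    """Store the image and audio as two separate files that share one
    unique ID, so either file can later locate its counterpart."""
    capture_id = uuid.uuid4().hex
    (directory / (capture_id + ".jpg")).write_bytes(image_bytes)
    (directory / (capture_id + ".m4a")).write_bytes(audio_bytes)
    return capture_id


def load_audio_for_image(image_path: Path) -> bytes:
    """Given an image file, retrieve the audio clip with the same ID by
    swapping the file extension."""
    return image_path.with_suffix(".m4a").read_bytes()
```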
[0024] In an alternative embodiment, the audio and image data are captured and the audio file data is encoded and embedded into the image file or the image file is encoded and embedded into the audio file. In such embodiment, the embedded file may be opened for viewing and the captured audio file will be played back while the image is viewed.
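One possible realization of the embedding described in paragraph [0024] is to append the audio bytes to the image file behind a delimiter, since typical JPEG viewers ignore data that follows the end-of-image marker. The delimiter used here is a hypothetical convention for illustration, not a standard format.

```python
# Hypothetical delimiter separating image data from the appended audio.
MARKER = b"EMBEDDED_AUDIO:"


def embed_audio(image_bytes: bytes, audio_bytes: bytes) -> bytes:
    """Produce a single embedded audio image file: the audio clip is
    appended after the image data, behind a marker."""
    return image_bytes + MARKER + audio_bytes


def extract_audio(embedded: bytes) -> tuple:
    """Split an embedded audio image back into its image and audio parts
    so the image can be viewed while the audio plays."""
    image, _, audio = embedded.partition(MARKER)
    return image, audio
```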
[0025] The process of capturing audio data is triggered when the system 100 detects common events that are associated with a user capturing a still image. Various actions may trigger the recording of the audio data surrounding the capturing of a still image. For example, the recording of the audio data may be triggered when the user opens the system's mobile application 106 and begins to focus the camera, utilizes the camera's zoom function, or clicks the image capture button. Additionally, the recording of the audio data may be triggered when the camera itself automatically adjusts its focus, detects objects such as faces, or detects audio cues. In the preferred embodiment of the present invention, the system will utilize the mobile device's accelerometer 115 to detect changes in orientation and thus trigger the capture of audio data associated with the still image.
[0026] For example, when the application 106 is opened, the mobile device 104 is usually in a horizontal position, almost parallel to the ground. When the user decides to take a picture, he or she will usually move the mobile device 104 from this horizontal position to a vertical position perpendicular to the ground. The accelerometer would recognize this abrupt movement of the mobile device from a horizontal position to a vertical position, thus initiating the system's recording of audio data. If the user wants to take another picture, the user may shake the mobile device so that the accelerometer recognizes that another picture is imminent, thus stopping the first audio track and initiating a second audio track or recording.
[0027] In another embodiment, the recording of the audio data is voice activated. For example, the system 100 recognizes verbal commands such as "start," "record," and "next." Furthermore, the system is programmed to trigger audio recording when it detects common phrases said by a photographer, such as "say cheese" or "smile."
[0028] In an alternative embodiment of the present invention, the capturing of the audio data is manually controlled by the user through several capture modes. In a first capture mode, the click of the capture button simultaneously captures the still image and initiates the capture of the audio data for a predetermined time. In a second capture mode, the user can press and hold the capture button to capture the still image and audio data; when the button is released, the system stops recording the audio data. In a third capture mode, the user initiates the audio recording by pressing the capture button, but does not capture the image until the button is released. In a fourth capture mode, the audio and image are captured when a button is released. In another capture mode, the user presses the capture button three times: first to initiate the audio recording, then to capture the still image, and finally to end the audio recording.
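The second and third capture modes above can be sketched as a small button state machine; the mode names and attribute names below are illustrative assumptions, not terminology from the disclosure.

```python
class CaptureButton:
    """Sketch of two capture modes: in "hold" mode, pressing the button
    captures the image and starts audio, and releasing stops the audio;
    in "release" mode, pressing starts the audio and releasing both stops
    the audio and captures the image."""

    def __init__(self, mode: str = "hold"):  # "hold" or "release"
        self.mode = mode
        self.audio_on = False
        self.image_taken = False

    def press(self):
        self.audio_on = True
        if self.mode == "hold":
            self.image_taken = True

    def release(self):
        self.audio_on = False
        if self.mode == "release":
            self.image_taken = True
```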
[0029] In an alternative embodiment, the capturing of the audio begins immediately when the application is opened. However, the system will only save the audio data that occurs during a predetermined amount of time before and after the picture is captured, depending on the user's settings or preferences (e.g., 5 seconds before and after the image is captured).
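The pre/post window of paragraph [0029] behaves like a rolling buffer: audio accumulates continuously, but only the window preceding the shutter is retained, and recording continues for the same span afterwards. A minimal sketch, assuming a fixed sample rate and a symmetric window:

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only the configured window of audio before the shutter;
    after capture, recording continues for the same span, and the
    combined pre/post clip is what gets saved."""

    def __init__(self, sample_rate: int, window_s: float = 5.0):
        self.post_samples = int(sample_rate * window_s)
        # Bounded deque: old samples fall off as new ones arrive.
        self.pre = deque(maxlen=int(sample_rate * window_s))
        self.post = []
        self.capturing_post = False

    def feed(self, sample: float):
        if self.capturing_post:
            if len(self.post) < self.post_samples:
                self.post.append(sample)
        else:
            self.pre.append(sample)

    def shutter(self):
        """Called when the picture is captured."""
        self.capturing_post = True

    def saved_clip(self) -> list:
        return list(self.pre) + self.post
```

With a window of 1 second at 4 samples per second, feeding ten samples, firing the shutter, and feeding six more yields an eight-sample clip centered on the shutter moment.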
[0030] The image may be, but is not limited to, a standard picture, a panoramic picture, a burst image, or a combination of images captured simultaneously using the front and back cameras 112 of a mobile device 104. Furthermore, the user 102 may take several pictures in a given time frame and associate or embed the audio clip with all of the images. Also, the user 102 may take several pictures within a given time frame to instantaneously develop an image collage and save that collage as one embedded audio image.
[0031] In an alternative embodiment, as the still image and audio data are simultaneously recorded, the system 100 instantaneously associates and embeds at least a portion of the audio data with the still image data and stores this captured data on the same media, by a computerized process, as an embedded audio image. In the present invention, that media will be an image database 110 located in the memory of the mobile device 104 containing the mobile application 106, or a remote cloud-based server 118. This process is instantaneous and does not require post-processing of the captured audio and image data in order to associate the data. In another embodiment of the present invention, the audio and image data may be captured and converted into a video file. By way of example, if the user wanted to upload a picture or series of pictures to a video site (e.g., YouTube), or wanted to send them via an SMS/text message to someone who does not have the application installed, the application would convert the audio and image file into a video file and transmit only the video file. Furthermore, in another embodiment, the image data and the audio data could be stored separately in two distinct files, which are synced to initiate or open simultaneously.
[0032] The embedded audio image can be played back on the mobile device 104 that captured the data or on a third-party device. The user can view the embedded audio image through the GUI 108 of the mobile application 106, such as in an album book. By opening the album, the user 102 may advance through each embedded audio image at his or her own pace. The user 102 will have the option to have the audio play automatically when the image is displayed or to manually initiate the audio portion of the image.
[0033] The mobile or web application 106 will also contain a slideshow function where each image is displayed for a duration based on the length of sound recorded for that picture. The user 102 will be able to customize the slideshow by changing its speed as well as the type of transition that occurs. Additionally, the user 102 may save a slideshow and share it with third parties.
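Deriving per-slide display durations from the recorded audio lengths, with a user-selected speed setting, might be sketched as below; the function and parameter names are illustrative assumptions.

```python
def slideshow_schedule(audio_lengths_s: list, speed: float = 1.0) -> list:
    """Each image is shown for the duration of its recorded audio,
    scaled by the user's speed setting (2.0 means twice as fast)."""
    return [length / speed for length in audio_lengths_s]
```

For example, clips of 4 and 6 seconds played at double speed yield display times of 2 and 3 seconds.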
[0034] In another embodiment, the present invention will allow users 102 to edit the image data and audio data of the embedded audio image. For example, the user 102 may add filters to the image as well as crop, remove red-eye, and brighten an image. Furthermore, the user 102 can edit the audio data in various ways, including but not limited to cropping portions of the audio, or changing the tone of the audio or the playback rate of the audio to give the recorded sound a slow-motion feel. Additional methods to edit and manipulate image and audio data may also be used.
[0035] The present invention also allows users to share the embedded audio image with third parties by exchanging data through WiFi, Bluetooth, IrDA, TCP/IP, and other communication protocols. For example, the embedded audio image may be emailed, texted, tweeted, or shared via AirDrop, as well as posted on social networks such as Google+, Facebook, Pinterest, and Flickr. In another embodiment of the present invention, third parties that receive the shared embedded audio image through the mobile application or web application may comment verbally (or by text) on the embedded audio image by recording additional audio data. This additional audio data may modify the existing embedded audio image file or create a second embedded audio image file. The new or modified file can then be shared with the original sender or other third parties. Furthermore, when a new audio comment is created, the system may automatically notify the sender and others that are tagged in the embedded audio image. This allows the photographer to hear reactions to the embedded audio image. Further, in addition to the audio capture, the mobile application can capture a video or image of the viewer when they see the embedded audio image.
[0036] In another embodiment, the system may include a verbal tagging mechanism. This mechanism allows users to verbally tag third parties in the embedded audio image during the recording of the embedded audio image, immediately after the picture is taken, or anytime thereafter. Anyone can be tagged in the embedded audio image, including but not limited to third parties that are in the picture or third parties to whom the user wants the embedded audio image to be automatically sent. For example, once the picture is captured and the recording is complete, the user can say "tag Mary" or "tag Jack," and that information is then associated with the embedded audio image for future playback. Furthermore, the system may also communicate with the contacts on the user's mobile device to acquire contact information for third parties and transmit the embedded audio image file to such users.
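Assuming a speech-to-text step has already produced a transcript of the recording, extracting the spoken tag commands of paragraph [0036] could be sketched as follows; the command pattern and capitalized-name heuristic are illustrative assumptions.

```python
import re


def parse_tags(transcript: str) -> list:
    """Find names spoken after the word 'tag' in a transcript, e.g.
    'tag Mary' -> 'Mary'. Assumes the transcriber capitalizes names."""
    return re.findall(r"\btag\s+([A-Z][a-z]+)", transcript)
```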
[0037] In another embodiment of the present invention, the user will have the opportunity to set up and join groups based on location, interests, and social network connections (e.g., Facebook friends). For example, one user can set up a group for a wedding day where all the attendees can snap embedded audio images and instantaneously share them with the group. Further, an additional embodiment of the present invention allows users to locate and identify pictures based on voice recognition. By detecting a voice and reading the signature of the voice, the system can find pictures in the system which contain the voice.
[0040] It will be recognized by those skilled in the art that changes or modifications may be made to the above described embodiment without departing from the broad inventive concepts of the invention. It is understood therefore that the invention is not limited to the particular embodiment which is described, but is intended to cover all modifications and changes within the scope and spirit of the invention.