Method and System for Interacting with a Wearable Electronic Device
20190129508 · 2019-05-02
Assignee
Inventors
- Christopher Harrison (Pittsburgh, PA, US)
- Robert Xiao (Pittsburgh, CA)
- Gierad Laput (Pittsburgh, PA, US)
Cpc classification
G06F3/015
PHYSICS
A61B5/0048
HUMAN NECESSITIES
G06F3/017
PHYSICS
G06F3/0346
PHYSICS
A61B2503/12
HUMAN NECESSITIES
A61B5/11
HUMAN NECESSITIES
A61B5/1123
HUMAN NECESSITIES
A61B2562/0219
HUMAN NECESSITIES
International classification
G06F3/0346
PHYSICS
A61B5/00
HUMAN NECESSITIES
Abstract
Disclosed herein is a method of interacting with a wearable electronic device. The wearable electronic device, comprising a vibration sensor, captures vibrations transmitted through a body part on which the electronic device is worn. The vibration can emanate from an object in contact with the user's body or by the motions of the body itself. Once received by the wearable electronic device, the vibrations are analyzed and identified as a specific object, data message, or movement.
Claims
1. A method of interacting with a wearable electronic device comprising: providing a wearable electronic device, the wearable electronic device comprising an inertial measurement unit capable of capturing data at a rate of about 4000 Hz or more; placing the wearable electronic device in contact with a first body part; capturing data related to a movement of a second body part, wherein the movement creates vibrations that travel from the second body part to the inertial measurement unit of the wearable electronic device; analyzing the data; and providing feedback through the wearable electronic device based on the analyzed data.
2. The method of claim 1, further comprising: classifying the movement based on the analyzed data.
3. The method of claim 1, wherein the movement comprises a hand gesture.
4. The method of claim 1, wherein the movement comprises motion created by an object touching the second body part.
5. The method of claim 1, wherein the vibrations have a frequency greater than 200 Hz.
6. The method of claim 1, wherein the IMU comprises at least one of an accelerometer and a gyroscope.
7. The method of claim 1, wherein the wearable electronic device is a smart-watch.
8. The method of claim 4, wherein the object is a transducer emitting a structured vibration.
9. The method of claim 8, where the structured vibration comprises a header sequence followed by a message.
10. The method of claim 9, wherein the header sequence comprises chirps at 100 Hz, 200 Hz, and 300 Hz.
11. The method of claim 1, wherein analyzing the data comprises: extracting a maximum value at a plurality of frequency bands.
12. The method of claim 8, wherein the structured vibration comprises a data packetization layer, an error detection layer, an error correction layer, and a modulation layer.
13. The method of claim 1, wherein analyzing the data comprises: determining a power spectra of a fast Fourier transform for each axis of a three-axis accelerometer in the inertial measurement unit; combining the power spectra of each axis into a combined power spectra by using a maximum value of the three axis.
14. A system for providing interaction between a user and a wearable electronic device comprising: a wearable electronic device comprising an inertial measurement unit capable of operating at about 4000 Hz, wherein the inertial measurement unit outputs data related to bio-acoustic vibrations received at the wearable electronic device; a classifier for correlating the data with at least one of a hand gesture, grasped object, or structured vibration.
15. The system of claim 14, further comprising: a vibro tag that outputs the structured vibration.
16. The system of claim 15, wherein the vibro tag comprises a transducer operating at about 100-300 Hz.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
DETAILED DESCRIPTION
[0026] According to embodiments of the present invention is a method and system for interacting with a wearable electronic device 101. As shown in
[0027] The applications 105 include user interfaces that can be launched once a gesture or object is recognized. For example, if a user grasps an electronic toothbrush, the wearable 101 will launch a timer to ensure the user brushes for an appropriate amount of time.
[0028] Although most wearable electronic devices 101 (including smartwatches, activity trackers, and other devices designed to be worn on the body) contain capable IMUs 102, existing software for these devices 101 generally limits accelerometer data access to about 100 Hz. This rate is sufficient for detecting coarse movements such as changes in screen orientation or gross interactions such as walking, sitting, or standing. However, these IMUs 102 often support significantly higher sample rates, up to thousands of hertz. At these faster sampling speeds, the wearable 101 can capture nuanced and fine-grained movements that are initiated or experienced by the human user. Like water, the human body is a non-compressible medium, making it an excellent vibration carrier. For example, when sampling at 4000 Hz, vibrations oscillating up to 2000 Hz (e.g., gestures, grasped objects) can be sensed and identified (per the Nyquist theorem). This superior sensitivity transforms the wearable 101 into a bioacoustic sensor capable of detecting minute compressive waves propagating through the human body.
[0029] For example,
[0030] Like any medium, the human arm characteristically amplifies or attenuates vibrations at different frequencies. Therefore, certain frequencies transmit more easily through the human body.
[0031] In one example embodiment, the wearable electronic device 101 comprises an LG G W100 smartwatch. The smartwatch, in this example, includes an InvenSense MPU6515 IMU 102 capable of measuring acceleration at 4000 samples per second. This type of IMU 102 can be found in many popular smartwatches and activity trackers. Despite the high sampling rate capability, the maximum rate obtainable through the Android Wear API is 100 Hz. Therefore, to detect user movements, the Linux kernel 103 on the device must be modified, replacing the existing accelerometer driver with a custom driver.
[0032] In the example using a smartwatch, the kernel driver interfaces with the IMU 102 via an inter-integrated circuit (I²C) bus, configuring the IMU 102 registers to enable its documented high-speed operation. Notably, this requires the system to use the onboard 4096-byte FIFO of the IMU 102 to avoid excessively waking up the system CPU. However, this FIFO only stores 160 ms of data, as each data sample consists of a 16-bit sample for each of the three axes. Thus, the driver is configured to poll the accelerometer in a dedicated kernel thread, which reads the accelerometer FIFO into a larger buffer every 50 ms. Overall, the thread uses about 9% of one of the four CPU cores of the wearable 101.
[0033] To improve the accuracy of systems with internal clocks that are not temperature-stabilized, a correction is made. For non-corrected clocks, the sampling rate rises as the CPU temperature increases. For example, sampling rates may vary from 3990 Hz (watch sleeping, off wrist) to 4080 Hz (on arm, high CPU activity). To correct this error, in one embodiment the kernel driver is augmented to compute the rate at which samples are written into the MPU's FIFO buffer using a nanosecond-precision kernel timestamp. For applications requiring precise sampling rates, such as resonance profiling and data transmission, the input data is normalized to 4000 Hz using a sinc-based interpolator capable of supporting continuously variable input sample rates.
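The rate normalization described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a Hann-tapered sinc kernel, and the function name `resample`, the `half_width` parameter, and the choice of taper are hypothetical and not taken from the disclosure. The measured rate is assumed to come from the kernel timestamps mentioned above.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def resample(samples, measured_rate_hz, target_rate_hz=4000.0, half_width=16):
    """Resample data captured at measured_rate_hz onto a target_rate_hz grid.

    Assumption: the two rates differ only slightly (e.g. 3990 to 4080 Hz),
    so no extra anti-aliasing cutoff is applied when downsampling.
    """
    n_out = int(len(samples) * target_rate_hz / measured_rate_hz)
    out = []
    for k in range(n_out):
        # Position of output sample k, expressed in input-sample units.
        pos = k * measured_rate_hz / target_rate_hz
        centre = math.floor(pos)
        acc = 0.0
        for j in range(centre - half_width + 1, centre + half_width + 1):
            if 0 <= j < len(samples):
                d = pos - j
                # Hann taper keeps the kernel finite and smooth at its edges.
                taper = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[j] * sinc(d) * taper
        out.append(acc)
    return out
```

Because the input position `pos` is computed per output sample from the measured rate, the same routine handles any input rate without a fixed resampling ratio.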
[0034] In one example method of interacting with the wearable electronic device 101, unique hand gestures, such as flicks, claps, snaps, scratches and taps performed by a user are detected and classified by the wearable 101. Each gesture is then classified by recognizing the distinctive micro-vibrations created by the movement and propagated through the arm. Depending on the location and type of gesture, different frequencies of vibrations are generated. Subsequently, various frequencies are attenuated during propagation (e.g., anatomical features can act as passive vibroacoustic filters). The resulting frequency profiles make many gestures uniquely identifiable. Many types of gestures can be recognized, such as one-handed gestures, two-handed gestures, and on-body touch input (see
[0035]
[0036] Once the bio-acoustic signals are received on the wearable 101, several signal processing operations can be completed to detect and classify hand gestures in real-time. For each incoming signal frame t, the power spectrum of the fast Fourier transform (FFT) is computed on data from each accelerometer axis, producing three spectra $X_t$, $Y_t$, $Z_t$. Optionally, a Hamming window on the FFT is used to minimize spectral banding. To make sensing robust across hand orientations, the DC component is removed and the three FFTs are combined into one by taking the max value across the axes ($F_{t,i} = \max(X_{t,i}, Y_{t,i}, Z_{t,i})$).
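A minimal sketch of the per-frame spectral step follows. It uses a plain discrete Fourier transform from the standard library in place of an optimized FFT, removes the DC component by subtracting the frame mean and dropping bin 0, and takes the per-bin maximum across axes. The function names and the frame length used in the example are assumptions for illustration only.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of one axis for bins 1..N/2 (the DC bin is dropped)."""
    n = len(frame)
    mean = sum(frame) / n
    # Subtract the mean (DC) and apply a Hamming window.
    windowed = [(x - mean) * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def combined_spectrum(x_axis, y_axis, z_axis):
    """F[i] = max(X[i], Y[i], Z[i]) for every frequency bin i."""
    spectra = [power_spectrum(axis) for axis in (x_axis, y_axis, z_axis)]
    return [max(values) for values in zip(*spectra)]
```

Taking the maximum rather than the sum means a vibration that shows up on only one axis is not diluted by the other two, which is one way to read the orientation-robustness rationale given above.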
[0037] Next, the average of the w=20 past FFT spectra ($S_i = \mu(F_{t-1,i}, \ldots, F_{t-w+1,i})$) is computed and statistical features are extracted from the averaged signal: mean, sum, min, max, 1st derivative, median, standard deviation, range, spectral band ratios, and the n highest peaks (n=5). These features form the input to a SMO-based support vector machine (SVM) (poly kernel, $\varepsilon = 10^{-12}$, normalized) for real-time classification. In this example embodiment, the band ratios, peaks, mean, and standard deviation are capable of providing 90% of the bio-acoustic signal's discriminative power. Table 1 describes these features and the motivations behind their use.
TABLE 1
Feature Set | Operation | Justification
Power spectrum | $S_i$ | Specific frequency data
Statistical features of FFT | $\mu_S$, $\sigma_S$, $\Sigma_S$, max(S), min(S), centroid, peaks | Characterizes gross signal
1st Derivative | |
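The feature extraction step can be sketched as below. The disclosure does not define how band ratios or peaks are computed, so the equal-width bands, the local-maximum peak rule, and all function and key names here are assumptions. The sketch averages the most recent w spectra held in a history list and stops before the classifier.

```python
import statistics

def average_spectrum(history, w=20):
    """Element-wise mean of the last w combined spectra in `history`."""
    recent = history[-w:]
    return [sum(column) / len(recent) for column in zip(*recent)]

def extract_features(s, n_peaks=5, n_bands=4):
    """Statistical features of an averaged spectrum s (a list of floats)."""
    features = {
        "mean": statistics.mean(s),
        "sum": sum(s),
        "min": min(s),
        "max": max(s),
        "median": statistics.median(s),
        "std": statistics.pstdev(s),
        "range": max(s) - min(s),
        # First derivative as the difference between neighbouring bins.
        "derivative": [b - a for a, b in zip(s, s[1:])],
    }
    # Indices of the n highest local maxima (assumed peak definition).
    peaks = [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] >= s[i + 1]]
    features["peaks"] = sorted(peaks, key=lambda i: s[i], reverse=True)[:n_peaks]
    # Pairwise energy ratios between equal-width bands (assumed definition).
    size = len(s) // n_bands
    energy = [sum(s[b * size:(b + 1) * size]) for b in range(n_bands)]
    features["band_ratios"] = [
        energy[a] / energy[b] if energy[b] else 0.0
        for a in range(n_bands) for b in range(a + 1, n_bands)
    ]
    return features
```

The resulting dictionary would be flattened into a numeric vector before being passed to a classifier such as the SVM described above.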
[0038] When hand gestures are combined with relative motion tracking (e.g., native data from IMUs 102), the example embodiment uncovers a range of interaction modalities (see
[0039] In another example embodiment, the method of the present invention can be used to identify grasped objects 301. With objects identified, context-relevant functionality or applications can be launched automatically by the wearable electronic device 101. For example, when a user operates a mechanical or motor-powered device, the object 301 produces characteristic vibrations, which transfer into the operator. The wearable electronic device 101 is able to capture these signals, which can be classified, allowing interactive applications to better understand their user's context and further augment a wide range of everyday activities.
[0040] The same signal processing pipeline used for gestures is used for object detection, but with slightly tweaked parameters (w=15, n=15). In addition, the data analysis step comprises a simple voting mechanism (size=10) to stabilize the recognition. The method recognizes a wide range of objects 301 (see
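The voting mechanism mentioned above is not specified further in the disclosure. One simple reading is a sliding-window majority vote over the last 10 classifier outputs, sketched here with a hypothetical `MajorityVoter` class and made-up object labels.

```python
from collections import Counter, deque

class MajorityVoter:
    """Reports the most frequent label among the last `size` predictions."""

    def __init__(self, size=10):
        # A bounded deque discards the oldest prediction automatically.
        self.window = deque(maxlen=size)

    def update(self, label):
        """Add the newest raw prediction and return the stabilized one."""
        self.window.append(label)
        return Counter(self.window).most_common(1)[0][0]
```

A single spurious prediction inside a run of consistent ones is outvoted, at the cost of a short delay before a genuine change of object is reported.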
[0041] In yet another alternative embodiment, the method of the present invention can be used to augment environments and objects with structured vibrations. For example, in one embodiment a vibro-tag 201 comprising a small (2.4 cm³) SparkFun COM-10917 Bone Conductor Transducer, powered by a standard audio amplifier, is used to augment a user's environment. When a user touches the tag 201, modulated vibrations are transmitted bio-acoustically to the wearable electronic device 101, which decodes the acoustic packet and extracts a data payload (see
[0042] In one embodiment, the vibro-tags 201 are inaudible to the user, but still capable of transmitting data at high speed. Because the IMU 102 can only sense frequencies up to 2 kHz, ultrasound frequencies (e.g. frequencies above 16 kHz) cannot be used. Further, frequencies above 300 Hz are not used as they would manifest as audible buzzing sounds to the user. As a result, in one embodiment, 200 Hz is utilized as a suitable carrier frequency for data transmission. However, a person having ordinary skill in the art will appreciate that other frequencies can be used, particularly if audible sounds are tolerable.
[0043] In one example embodiment, the data transmission system is a full stack signal pipeline, consisting of data packetization, error detection, error correction, and modulation layers. The input data stream is segmented into individually transmitted data packets. In one example, the format comprises an 8-bit sequence number combined with a data payload. Packet size is constrained by the error detection and correction layers; in this embodiment, it can be up to 147 bits in length. In order to detect transmission errors and ensure that bad data is not accidentally accepted, an 8-bit cyclic redundancy check (CRC) is optionally appended to the message. In this example, the CRC is computed by truncating the Adler-32 checksum of the message.
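The packetization and error detection layers can be illustrated with the sketch below. It assumes a byte-aligned payload (the disclosure works in bits) and assumes that truncation keeps the low 8 bits of the Adler-32 value, since the disclosure does not say which bits are kept. `build_packet` and `parse_packet` are hypothetical names; `zlib.adler32` is the standard library routine.

```python
import zlib

def build_packet(sequence_number, payload):
    """Return sequence byte + payload + 8-bit check value."""
    body = bytes([sequence_number & 0xFF]) + payload
    # Assumption: "truncating" means keeping the low 8 bits of Adler-32.
    check = zlib.adler32(body) & 0xFF
    return body + bytes([check])

def parse_packet(packet):
    """Return (sequence_number, payload), or None if the check fails."""
    body, check = packet[:-1], packet[-1]
    if zlib.adler32(body) & 0xFF != check:
        return None
    return body[0], body[1:]
```

For example, `parse_packet(build_packet(7, b"room 42"))` returns the original sequence number and payload, while a packet with a flipped payload bit is rejected.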
[0044] Next, error correction is applied. Although this stage also detects errors (like the CRC), its primary purpose is to mitigate the effects of minor transmission problems. In an example embodiment, a Reed-Solomon code is used with 5 bits per symbol, allowing the system to have 31 symbols per message (a total of 155 bits). These parameters were chosen to allow a single message to be transmitted in approximately one second using common modulation parameters. The number of ECC symbols can be tuned to compensate for noisier transmission schemes.
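The bit budget implied by these parameters can be checked with a short calculation. It relies on two standard Reed-Solomon properties: a code with m-bit symbols has a natural block length of 2^m - 1 symbols, and 2t parity symbols correct up to t symbol errors. The function and key names are illustrative only.

```python
def rs_budget(bits_per_symbol=5, ecc_symbols=2, crc_bits=8):
    """Bit budget of one Reed-Solomon block for the given parameters."""
    n_symbols = (1 << bits_per_symbol) - 1       # 2^5 - 1 = 31 symbols
    total_bits = n_symbols * bits_per_symbol     # 31 * 5 = 155 bits
    ecc_bits = ecc_symbols * bits_per_symbol
    return {
        "symbols": n_symbols,
        "total_bits": total_bits,
        # Bits left for the packet (sequence number plus data) after
        # reserving the error correction symbols and the CRC.
        "packet_bits": total_bits - ecc_bits - crc_bits,
        # Reed-Solomon corrects one symbol error per two parity symbols.
        "correctable_symbols": ecc_symbols // 2,
    }
```

With `ecc_symbols=0` the function returns 147 packet bits, matching the maximum packet size stated for this embodiment; each added ECC symbol costs 5 bits of packet space.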
[0045] At this point, the full message+CRC+ECC is transmitted, totaling 155 bits, as modulated vibrations. Four different modulation schemes can be used, using binary Gray coding to encode bit strings as symbols:
[0046] Amplitude Shift Keying (ASK): data is encoded by varying the amplitude of the carrier signal;
[0047] Frequency Shift Keying (FSK): data is encoded by transmitting frequency multiples of the carrier signal;
[0048] Phase Shift Keying (PSK): adjusting the phase of the carrier signal, with respect to a fixed reference phase; and
[0049] Quadrature Amplitude Modulation (QAM): data encoded as variations in phase and amplitude, with symbols encoded according to a constellation diagram mapping phase and amplitude combinations to bit sequences.
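As an illustration of Gray-coded symbol mapping, the sketch below modulates a bit string as 4-PSK on a 200 Hz carrier. The 4000 Hz sample rate and 200 Hz carrier come from the disclosure; the symbol rate of 100 symbols per second, the sine waveform, and the function names are assumptions made for the example.

```python
import math

def gray_encode(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def psk_modulate(bits, carrier_hz=200.0, fs=4000.0, symbol_rate=100.0,
                 bits_per_symbol=2):
    """Return carrier samples whose phase encodes Gray-labelled symbols."""
    m = 1 << bits_per_symbol
    # Constellation point i carries the bit label gray_encode(i), so
    # neighbouring phases differ in exactly one bit.
    index_for_label = {gray_encode(i): i for i in range(m)}
    per_symbol = int(fs / symbol_rate)
    wave = []
    for s in range(0, len(bits), bits_per_symbol):
        label = 0
        for b in bits[s:s + bits_per_symbol]:
            label = (label << 1) | b
        phase = 2 * math.pi * index_for_label[label] / m
        start = len(wave)
        for n in range(per_symbol):
            t = (start + n) / fs
            wave.append(math.sin(2 * math.pi * carrier_hz * t + phase))
    return wave
```

With Gray labelling, a small phase error that pushes a symbol into a neighbouring constellation point corrupts only one bit, which keeps the load on the error correction layer low.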
[0050] In an alternative embodiment, the message is created with a short header sequence consisting of three 20 ms chirps at 100 Hz, 300 Hz, and 200 Hz. This sequence is readily recognized and quite unlikely to occur by accident. Furthermore, the presence of a 300 Hz chirp in the header prevents accidental detection in the middle of a transmission. Finally, the 200 Hz chirp provides a phase and amplitude reference for the ASK, PSK and QAM transmission schemes, eliminating the need for clock synchronization between the tag 201 and wearable 101.
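The header sequence can be sketched as follows, treating each chirp as a fixed-frequency 20 ms tone burst sampled at 4000 Hz (an assumption; the disclosure does not describe the chirp waveform). Detection here simply checks which of the three candidate frequencies dominates each 20 ms slot; the function names are hypothetical and no noise threshold is applied.

```python
import math

FS = 4000                    # sample rate in Hz
CHIRP_MS = 20                # duration of each header burst
HEADER_HZ = (100, 300, 200)  # burst order given in the description

def make_header():
    """Three consecutive 20 ms tone bursts at 100, 300 and 200 Hz."""
    n = FS * CHIRP_MS // 1000
    return [math.sin(2 * math.pi * f * i / FS)
            for f in HEADER_HZ for i in range(n)]

def tone_power(segment, freq_hz):
    """Power of `segment` at freq_hz (single-bin Fourier correlation)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / FS)
             for i, x in enumerate(segment))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / FS)
             for i, x in enumerate(segment))
    return re * re + im * im

def is_header(samples):
    """True if the dominant tone in each 20 ms slot matches HEADER_HZ."""
    n = FS * CHIRP_MS // 1000
    if len(samples) < 3 * n:
        return False
    for slot, expected in enumerate(HEADER_HZ):
        segment = samples[slot * n:(slot + 1) * n]
        strongest = max((100, 200, 300), key=lambda f: tone_power(segment, f))
        if strongest != expected:
            return False
    return True
```

A 20 ms slot holds a whole number of cycles of each candidate frequency, so the three tones do not leak into one another in this idealized setting.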
[0051] Decoding can be performed on the wearable electronic device 101 itself, using an optimized decoding routine. The decoder 106 continuously reads samples from the accelerometer or IMU 102, converts the samples to 6400 Hz (to simplify FFT computations), and continuously searches for the header sequence. When found, the decoder 106 demodulates the signal (using the amplitude and phase of the 200 Hz header chirp), performs decoding, verifies the CRC, and reports the resulting message to an application (if decoding was successful).
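The use of the 200 Hz header chirp as a phase reference can be illustrated for 4-PSK as below. The sketch assumes the decoder has already located the header, that the reference burst and the data share one continuous time base, and that each symbol spans 40 samples at 4000 Hz; these choices and the function names are assumptions rather than details from the disclosure.

```python
import cmath
import math

GRAY_LABELS = [0, 1, 3, 2]  # bit label carried by constellation index 0..3

def tone_phasor(samples, freq_hz, fs, offset=0):
    """Complex correlation of `samples` with a tone on a shared time base."""
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * (offset + i) / fs)
               for i, x in enumerate(samples))

def demodulate_4psk(reference, data, carrier_hz=200.0, fs=4000.0,
                    samples_per_symbol=40):
    """Recover bits from `data` using `reference` as the zero-phase burst."""
    ref = tone_phasor(reference, carrier_hz, fs)
    bits = []
    for start in range(0, len(data) - samples_per_symbol + 1,
                       samples_per_symbol):
        segment = data[start:start + samples_per_symbol]
        sym = tone_phasor(segment, carrier_hz, fs,
                          offset=len(reference) + start)
        # Dividing by the reference cancels the unknown carrier phase
        # and amplitude introduced by the tag and the body.
        relative = cmath.phase(sym / ref)
        index = round(relative / (math.pi / 2)) % 4
        label = GRAY_LABELS[index]
        bits.extend([(label >> 1) & 1, label & 1])
    return bits
```

Because every symbol is measured relative to the reference burst, the tag and the wearable do not need synchronized clocks, which is the property the description attributes to the header.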
[0052] In an example demonstration of the method of the present invention, 18 participants (10 female, mean age 25.3, 17 right-handed) were recruited for a live user study. Participants were asked to perform a series of tasks while wearing a wearable electronic device 101. Since variations in user anatomy could affect bio-acoustic signal propagation, the user's body mass index (BMI, mean=22.3) was recorded to further explore the accuracy of the sensing technique. To verify the robustness of the method across different devices 101, the study used two different devices 101 of the same model (Watch A and Watch B), randomized per user. All machine learning models were trained on Watch A, but deployed and tested on both watches 101.
[0053] To test the accuracy of gesture recognition, different machine learning models were trained for each gesture set (
[0054] For object detection, data was collected from one user on 29 objects using a single wearable electronic device 101. The collected data was then used to train a machine learning model. An example object set and their bio-acoustic signatures are shown in
[0055] After collecting the data from a single user, real-time object classification was performed for all 17 participants using the same 29 objects 301. Objects were spread across six locations to vary environmental conditions. These locations include: personal desk area, shared woodshop, office, kitchen and bathroom, public common area, and a parking space. Further, each object 301 was tested in a location different from the one where it was trained. A single trial involved a user interacting with one of the 29 objects 301. Participants were briefly shown how to operate the objects 301 (for safety), but were free to grasp the object however they wished. Objects 301 were randomized per location (rather than randomized globally).
[0056] Across 29 objects 301, 17 users, and using data that was trained on a single person four weeks prior, an overall object detection accuracy of 91.5% (SD=4.3%) was obtained. Two outlier objects 301 were found that were 3.5 standard deviations below the mean. When these two outlier objects 301 are removed, the method returned an overall accuracy of 94.0% (27 objects), with many objects 301 achieving 100% accuracy. Additionally, no statistical differences were found between a user's body-mass index or object 301 location. Overall, these results suggest that object detection is indeed accurate and robust across users and environment, and object bio-acoustic signatures are consistent over time.
[0057] In another example embodiment, the method recognizes structured vibrations that can be used with several variations of ASK, PSK, FSK and QAM modulation schemes. In addition, multiple symbol rate and bits-per-symbol configurations can be used. For example, configurations can include: 4-FSK (2 bits per symbol, transmitting frequencies of 50, 100, 150 and 200 Hz), 4-PSK (2 bits per symbol), 8-PSK (3 bits per symbol), 8-QAM (3 bits per symbol, non-rectangular constellation), and 16-QAM (4 bits per symbol, non-rectangular constellation).
[0058] Using these various schemes, 1700 trials were collected and evaluated by bit error rate (BER), which compares the received, demodulated message with the original transmitted message. (See
[0059] The 80.sup.th percentile BER (BER.sub.80), for parity with Ripple, is used to get a better sense of the distribution. This measurement has a practical impact on the choice of error correction parameter: if an error correction scheme is chosen that can correct errors up to BER.sub.80, then it can be expected to successfully decode 80% of transmitted packets.
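The two measurements can be sketched as below. The disclosure does not state which percentile definition was used, so the nearest-rank rule here is an assumption, as are the function names.

```python
def bit_error_rate(sent, received):
    """Fraction of bit positions where the received bit differs."""
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)

def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]
```

Applied to the per-trial BER values of one modulation scheme, `percentile(bers, 80)` gives the error level that an error correction setting must handle for about 80% of packets to decode.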
[0060] The results indicate that 4-PSK provides optimal performance in terms of BER across all conditions, when considering the raw bit rate. With a BER.sub.80 of 0.6% (0.93 message bits), only 2 Reed-Solomon ECC symbols would need to be added to the message in order to correct 80% of messages, leaving 137 bits for the payload. This payload takes 0.83 seconds to transmit (155 bits at 200 bits per second, plus header overhead), for an overall transmission rate of 165 bits per second (with a 20% packet loss rate), through the finger, hand and wrist.
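The timing figures above can be approximately reproduced with a short calculation. It assumes the header overhead is the three 20 ms chirps described for the header sequence, which the disclosure does not state explicitly for this measurement; the function and key names are illustrative.

```python
def transmission_summary(total_bits=155, payload_bits=137, bit_rate=200.0,
                         header_chirps=3, chirp_seconds=0.020,
                         bits_per_symbol=2):
    """Approximate airtime and effective payload rate of one message."""
    body_seconds = total_bits / bit_rate            # modulated message
    header_seconds = header_chirps * chirp_seconds  # assumed header cost
    total_seconds = body_seconds + header_seconds
    return {
        "symbol_rate": bit_rate / bits_per_symbol,  # symbols per second
        "seconds": total_seconds,
        "payload_bps": payload_bits / total_seconds,
    }
```

Under these assumptions the message occupies about 0.835 seconds and the payload rate is a little over 164 bits per second, in line with the rounded figures of 0.83 seconds and 165 bits per second given above.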
[0061] In a system that takes advantage of accelerometers and IMUs 102, it is critically important to reduce the detection of false positives (i.e., an action that is unintentionally triggered). To validate the resistance of the method to false positives, the classifier is trained with a large set of background data (i.e., negative training examples). In this example, 17 participants were asked to perform several mundane and physically rigorous activities in different locations. These activities included: walking for two minutes, jogging in place for 30 seconds, performing jumping jacks for 30 seconds, reading a magazine or book for one minute, and washing hands for 30 seconds. These five activities were randomly interspersed throughout the object detection study (i.e., when users transitioned between each of the six building locations).
[0062] While participants performed these activities, the number of false detections triggered by the system was tallied (any prediction other than "null" or "no object" was considered a false positive). Across 17 users, six random locations, and five activities, collectively spanning a total of 77 minutes, the method triggered a total of six false positive classifications. For 12 of 17 participants, the system triggered no false positives. These results suggest that false positives can be greatly reduced by exposing the machine-learning model to a large set of negative examples.
[0063] The methods described herein open the possibility for enhanced interaction with wearable electronic devices 101. Hand gestures can be used to appropriate the area around the watch for input and sensing. For example, in a smartwatch launcher, navigation controls can be placed on the skin (e.g., left, right, select), as well as enabling users to traverse back up through the hierarchy with a flick gesture (
[0064] Other examples of interaction can include the following. Gestures can be used to control remote devices. For example, a user can clap to turn on a proximate appliance, such as a TV; wave gestures navigate and snaps offer input confirmation. Flick gestures can be used to navigate up the menu hierarchy (
[0065] Gestures can also be used to control nearby infrastructure. For example, a user can snap his fingers to turn on the nearest light. A pinching gesture can be used as a clutch for continuous brightness adjustment, and a flick confirms the manipulation (
[0066] Because the method of the present invention can also be used to identify objects 301, applications offer the ability to better understand context and augment everyday activities. For example, the kitchen experience can be augmented by sensing equipment used in the preparation of a meal and e.g., offering a progress indicator for blending ingredients with an egg mixer (
[0067] The method can also sense unpowered objects 301, such as an acoustic guitar. For example, the method can detect the closest note whenever the guitar is grasped, and provide visual feedback to tune the instrument precisely (
[0068] Through object sensing, the method can also augment analog experiences with digital interactivity. For example, with a Nerf gun, it can detect the loading of a new ammo clip, and then keep count of the number of darts remaining (
[0069] Many classes of objects 301 do not emit characteristic vibrations. However, with a vibro-tag 201, the object can emit inaudible, structured vibrations containing data. For example, a glue gun (non-mechanical but electrically powered) can be instrumented with a vibro-tag 201. The tag 201 broadcasts an object ID that enables the wearable 101 to know what object 301 is being held. It also transmits metadata e.g., its current temperature and ideal operating range (
[0070] Structured vibrations are also valuable for augmenting fixed infrastructure with dynamic data or interactivity. For example, in an office setting, a user can retrieve more information about an occupant by touching the room nameplate augmented with a vibro-tag 201, which transmits e.g., the person's contact details to the wearable 101 (
[0071] While the disclosure has been described in detail and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the embodiments. Thus, it is intended that the present disclosure cover the modifications and variations of this disclosure provided they come within the scope of the appended claims and their equivalents.