Encoding/decoding apparatuses and methods for encoding/decoding vibrotactile signals
11113934 · 2021-09-07
Assignee
Inventors
Cpc classification
H03M7/6047
ELECTRICITY
H03M7/30
ELECTRICITY
G08B6/00
PHYSICS
International classification
Abstract
An encoding apparatus for encoding a vibrotactile signal includes a first transforming unit configured to perform a discrete wavelet transform of the signal, a second transforming unit configured to generate a frequency domain representation of the signal, a psychohaptic model unit configured to generate at least one quantization control signal based on the generated frequency domain representation of the sampled signal and on a predetermined perceptual model based on human haptic perception, a quantization unit configured to quantize wavelet coefficients resulting from the performed discrete wavelet transform and adapted by the quantization control signal, a compression unit configured to compress the quantized wavelet coefficients, and a bitstream generating unit configured to generate a bitstream corresponding to the encoded signal based on the compressed quantized wavelet coefficients. The subject matter described herein also includes a corresponding decoding unit, an encoding method and a decoding method.
Claims
1. An encoding apparatus for encoding a vibrotactile signal, comprising: a) a first transforming unit configured to perform a discrete wavelet transform of the signal; b) a second transforming unit configured to generate a frequency domain representation of the signal; c) a psychohaptic model unit configured to generate at least one quantization control signal based on the generated frequency domain representation of the sampled signal and on a predetermined perceptual model based on human haptic perception, wherein the psychohaptic model unit is configured to identify peaks in the signal spectrum, wherein each peak corresponds to a frequency and a magnitude, and wherein the psychohaptic model unit comprises a memory adapted to store the frequency and magnitude of each identified peak; d) a quantization unit configured to quantize wavelet coefficients resulting from the performed discrete wavelet transform and adapted by the quantization control signal; e) a compression unit configured to compress the quantized wavelet coefficients; and f) a bitstream generating unit configured to generate a bitstream corresponding to the encoded signal based on the compressed quantized wavelet coefficients.
2. The encoding apparatus according to claim 1 further comprising a block unit configured to split the vibrotactile signal into a plurality of consecutive blocks.
3. The encoding apparatus according to claim 1 wherein the first transforming unit is adapted to perform the discrete wavelet transform by using a biorthogonal wavelet that comprises at least one of a Cohen-Daubechies-Feauveau-wavelet and a 9/7-Cohen-Daubechies-Feauveau-wavelet.
4. The encoding apparatus according to claim 1, wherein the second transforming unit is configured to generate the frequency domain representation by using a discrete Fourier transform, a fast Fourier transform, a discrete cosine transform or a discrete sine transform of the sampled signal.
5. A transmitter in a communication system comprising the encoding apparatus according to claim 1.
6. The encoding apparatus according to claim 1, wherein the psychohaptic model unit is configured to compute a masking threshold for the peaks at different frequencies based on the frequency and magnitude of each peak.
7. The encoding apparatus according to claim 1, wherein the psychohaptic model unit is further configured to compute an absolute threshold of perception at different frequencies which corresponds to an average signal magnitude required for a human at a certain frequency to be able to perceive a signal.
8. The encoding apparatus according to claim 7, wherein the psychohaptic model unit is further configured to compute a global masking threshold based on the masking threshold and the absolute threshold.
9. The encoding apparatus according to claim 8, wherein the psychohaptic model unit is configured to compute a signal-to-mask-ratio based on the sum of the energy of the global masking threshold at different frequencies and on the energy of the signal, in particular to compute the signal-to-mask-ratio for each frequency band of the wavelet coefficients of the discrete wavelet transform.
10. The encoding apparatus according to claim 9, wherein the quantization unit is configured to quantize wavelet coefficients by allocating bits for each frequency band of the wavelet coefficients based on a mask-to-noise-ratio, wherein the mask-to-noise-ratio is computed based on the signal-to-mask-ratio and a signal-to-noise-ratio which is computed based on the energy of the signal and the energy of a noise introduced by the quantization.
11. The encoding apparatus according to claim 1, wherein the compression unit is adapted to use an algorithm based on set partitioning in hierarchical trees for the compression of wavelet coefficients.
12. The encoding apparatus according to claim 1, wherein the quantization unit is configured to be adapted by the quantization control signal such that the distortion introduced during the quantization of the sampled signal in different frequency ranges is, relative to a perception masking threshold of the perception model, not perceivable by a human.
13. The encoding apparatus according to claim 1, wherein the quantization unit comprises an embedded deadzone quantizer.
14. A decoding apparatus for decoding a vibrotactile signal from a bitstream, comprising: a) a decompression unit configured to decompress the bitstream, wherein in particular an algorithm based on inverse set partitioning in hierarchical trees is provided for the decompression; b) a dequantization unit configured to dequantize the decompressed bitstream, wherein the decompressed bitstream is quantized using a psychohaptic model unit configured to generate at least one quantization control signal based on a frequency domain representation of a sampled signal and on a predetermined perceptual model based on human haptic perception, wherein the psychohaptic model unit is configured to identify peaks in the signal spectrum, wherein each peak corresponds to a frequency and a magnitude, and wherein the psychohaptic model unit comprises a memory adapted to store the frequency and magnitude of each identified peak; and c) a third transforming unit configured to perform an inverse discrete wavelet transform of the dequantized bitstream.
15. A receiver in a communication system comprising the decoding apparatus according to claim 14.
16. An encoding method for encoding a vibrotactile signal, comprising: a) performing a discrete wavelet transform of the signal; b) generating a frequency domain representation of the signal; c) generating at least one quantization control signal based on the generated frequency domain representation of the signal and on a predetermined perceptual model based on human haptic perception, wherein generating the quantization control signal includes using a psychohaptic model unit to identify peaks in the signal spectrum, wherein each peak corresponds to a frequency and a magnitude, and wherein the psychohaptic model unit comprises a memory adapted to store the frequency and magnitude of each identified peak; d) quantizing wavelet coefficients resulting from the performed discrete wavelet transform and adapted by the quantization control signal; e) compressing the quantized wavelet coefficients; and f) generating a bitstream corresponding to the encoded signal based on the compressed quantized wavelet coefficients.
17. A computer program product comprising computer-executable instructions embodied in a non-transitory computer-readable medium which, when executed by a processor of a computer, cause the computer to carry out the method of claim 16.
18. A decoding method for decoding a vibrotactile signal from a bitstream, comprising: a) decompressing the bitstream, wherein an algorithm based on inverse set partitioning in hierarchical trees is provided for the decompression; b) dequantizing the decompressed bitstream, wherein the decompressed bitstream is quantized using a psychohaptic model unit configured to generate at least one quantization control signal based on a frequency domain representation of a sampled signal and on a predetermined perceptual model based on human haptic perception, wherein the psychohaptic model unit is configured to identify peaks in the signal spectrum, wherein each peak corresponds to a frequency and a magnitude, and wherein the psychohaptic model unit comprises a memory adapted to store the frequency and magnitude of each identified peak; and c) performing an inverse discrete wavelet transform of the dequantized bitstream.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Hereinafter, the invention will be described by way of its advantageous embodiments with reference to the drawings. These drawings, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate embodiments of concepts that include the claimed invention, and explain various principles and advantages of those embodiments.
(2) In the drawings:
PRINCIPLE OF THE PRESENT INVENTION
(16) The present invention proposes a tactile codec. Essentially, the proposed codec uses a perceptual approach with a DWT and subsequent quantization. The quantizer is designed to be adaptive, taking a psychohaptic model into account. After quantization, a SPIHT algorithm is used to generate the bitstream to be transmitted. The entire process is modular; hence, the encoder can work with any psychohaptic model (although a specific model is also described below in one embodiment). This allows for future enhancements.
DESCRIPTION OF THE EMBODIMENTS OF THE INVENTION
(18) The encoding apparatus 100 comprises further a psychohaptic model unit 8 configured to generate at least one quantization control signal based on the generated frequency domain representation of the sampled signal and on a predetermined perceptual model based on human haptic perception, a quantization unit 10 configured to quantize wavelet coefficients resulting from the performed discrete wavelet transform and adapted by the quantization control signal and a compression unit 12 configured to compress the quantized wavelet coefficients.
(19) The psychohaptic model of human haptic perception essentially describes how a human being perceives touch or vibrations. It has been researched and established for different model assumptions that the touch (haptic) sensation perceived by a human is frequency-dependent and amplitude-dependent. That is, the intrinsic model of human haptic perception is first of all based on a (measured) threshold amplitude, dependent on frequency, that is necessary to cause a sensation. However, it has also been found that this frequency-dependent sensation threshold is not independent of the actual frequency spectrum of the sensed (vibrotactile) signal (a signal which is generated as a result of vibration, as explained above). This is due to masking effects (further explained below) arising from the different input frequencies and the (measured) threshold. Hence, the psychohaptic model describes how, depending on the input frequency spectrum, the theoretical frequency-dependent sensation amplitude threshold is modified into a frequency-dependent amplitude threshold which represents the amplitude magnitude really needed at each frequency to cause a sensation for the particular input signal considered.
(20) It is one aspect of the present invention that such a modified threshold (hereinafter “masking threshold”), which already takes into account (is modified by) the frequency spectrum of the input signal through a particularly selected psychohaptic model (which describes how the theoretical threshold is modified into the real masking threshold), is used for the allocation of bits in the quantizer by means of a quantization control signal. The quantization control signal is based on the frequency-dependent masking threshold (as modified by the psychohaptic model used) and, since the quantizer operates/quantizes in different frequency bands, it can use this threshold (amplitude) information in each frequency region to make the allocation of the available bits among the different frequency bands adaptable to the sensation masking threshold. Since the modified (masking) threshold represents the real sensation/perception of the human for that particular vibrotactile input signal as a function of frequency, the allocation of bits in each frequency band becomes perception-dependent. This also applies in a dynamic fashion: when the signal changes over time, the spectrum changes, the threshold changes according to the model (which describes how the threshold is modified dependent on frequency and amplitude), and consequently the allocation of bits in each frequency band may also change dynamically.
(21) Since the amplitude perception threshold in each frequency band is now essentially known, the quantization control signal distributes the available bits among the different frequency bands in such a manner that the noise/distortion (invariably introduced by the quantizer when quantizing each frequency band) is as little perceivable as possible, dependent on the available number of bits. That is, the quantization control signal provides the threshold for the frequency band concerned, and the quantizer then considers the quantization noise level (i.e. the distortion) in this frequency range and allocates as many bits as possible to bring the noise level below, or at least as close as possible to, the threshold in that very frequency range. This is possible because allocating more of the available bits to a particular frequency range causes less noise there, while allocating fewer bits causes more noise.
(22) It is a second aspect of the present invention that the modified masking threshold (modified by a (any) psychohaptic model based on the modification of the theoretical threshold by the input spectrum) makes the bit allocation variable in the frequency bands such that the distortion introduced to the sampled signal during the quantization is not, relative to the perception masking threshold in that frequency range, perceivable by a human. This reduces the amount of data to be transmitted or, to put it differently, only usable data of the signal adapted to human perception is transmitted/encoded with low noise.
(23) Hereinafter, a special embodiment of the creation and use of a psychohaptic model for a threshold-dependent noise adjustment and bit allocation is described.
(24) The compression unit 12 may adopt any lossless compression scheme that allows the original data to be perfectly reconstructed from the compressed data. In particular, an algorithm based on set partitioning in hierarchical trees is provided for the compression process.
(25) The encoding apparatus 100 further comprises a header encoding unit 14 configured to add a header at the front of every compressed block to incorporate some side information in a bitstream corresponding to the encoded signal such that a corresponding decoder is able to decompress the signal correctly. In addition, the encoding apparatus 100 comprises a bitstream generating unit 16 configured to generate the bitstream based on the compressed quantized wavelet coefficients and on the header with side information.
(26) The header added by the header encoding unit 14 shown in
(27) In a further preferred embodiment not shown, the signal is split by the block unit 2 into smaller blocks with smaller block lengths. Accordingly, the length of the header is reduced. The block length on which the encoding apparatus operates can be chosen as 32, 64, 128, 256 or 512 input signal samples. To signal this to the corresponding decoding apparatus, a coding with variable bit length is employed. A code of 1 corresponds to a block length of 32, a code of 01 corresponds to a block length of 64, a code of 001 corresponds to a block length of 128, a code of 0001 corresponds to a block length of 256, and a code of 0000 corresponds to a block length of 512.
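By way of illustration only (not part of the disclosure; function names are illustrative), the variable-bit-length block-length code described above can be sketched in Python:

```python
# Variable-bit-length code for the block length, as described in the text:
# "1" -> 32 samples, "01" -> 64, "001" -> 128, "0001" -> 256, "0000" -> 512.

BLOCK_LENGTHS = [32, 64, 128, 256, 512]

def encode_block_length(length: int) -> str:
    """Return the code word signaling a supported block length."""
    idx = BLOCK_LENGTHS.index(length)
    if idx < 4:
        return "0" * idx + "1"
    return "0000"  # 512 is signaled by four zeros, with no terminating 1

def decode_block_length(bits: str) -> tuple[int, int]:
    """Read one code word from the front of `bits`; return (block length, bits consumed)."""
    for idx in range(4):
        if bits[idx] == "1":
            return BLOCK_LENGTHS[idx], idx + 1
    return 512, 4
```

Because no code word is a prefix of another, the decoder can read the header bit by bit without ambiguity.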
(28) The first transforming unit 4 shown in
(29) The number of levels of the DWT l.sub.DWT is given by the following equation:
(30) l.sub.DWT=log.sub.2(L.sub.B)−2
wherein L.sub.B represents the block length. Thus, for a longer block length L.sub.B, the number of levels of the DWT l.sub.DWT is increased in order to achieve a better decorrelation performance.
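As a hedged illustration, the level count can be computed as follows. The closed form l_DWT = log2(L_B) − 2 is an assumption inferred from the evaluation described later in the text (a block length of 512 samples is paired with a DWT of level 7):

```python
import math

def dwt_levels(block_length: int) -> int:
    """Number of DWT decomposition levels for a given block length.

    Assumes l_DWT = log2(L_B) - 2, which is consistent with the example
    elsewhere in the text (block length 512 -> 7 levels)."""
    return int(math.log2(block_length)) - 2
```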
(31) Before the psychohaptic model is applied, a DFT of length 2L.sub.B of the current signal block is performed. Afterwards, the result is normalized by √{square root over (L.sub.B)} such that energy can be computed directly in the spectral domain. Thereafter, the spectrum is cut to the original block length L.sub.B again to obtain a single-sided spectrum. Thereby, a direct mapping of spectral values to wavelet coefficients is achieved.
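These steps can be sketched as follows (a minimal sketch; the exact zero-padding and windowing conventions are assumptions):

```python
import numpy as np

def single_sided_spectrum(block: np.ndarray) -> np.ndarray:
    """DFT of length 2*L_B, normalized by sqrt(L_B), then cut back to L_B bins
    to obtain a single-sided spectrum, as described in the text."""
    L_B = len(block)
    spectrum = np.fft.fft(block, n=2 * L_B) / np.sqrt(L_B)  # zero-padded DFT
    return spectrum[:L_B]  # keep the single-sided half
```

With this normalization, the energy of the single-sided spectrum approximates the energy of the time-domain block, which is what allows energies to be compared directly in the spectral domain.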
(33) The model providing unit 20 comprises a peaks extraction subunit 24, a masking threshold computation subunit 26, a perceptual threshold subunit 25 and a power additive combination subunit 28. The model application unit 30 comprises a band energy computation subunit 32 and a SMR computation subunit 34.
(34) The peaks extraction subunit 24 is configured to identify, based on the extracted magnitude of the signal, peaks that have a certain prominence and level. In particular, the minimum prominence may be chosen as 15 dB and the minimum level as −42 dB. A minimum separation of 10 Hz between the peaks is preferred. However, it is also possible not to set the minimum separation value. Each peak corresponds to a frequency f.sub.p and a magnitude a.sub.p. The psychohaptic model unit comprises a memory (not shown) adapted to store the frequency f.sub.p and magnitude a.sub.p of each identified peak.
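For illustration, such a peak extraction can be sketched with SciPy's `find_peaks` (this is one possible implementation, not the patented one; the parameter defaults follow the values given above):

```python
import numpy as np
from scipy.signal import find_peaks

def extract_peaks(magnitude_db: np.ndarray, freqs: np.ndarray,
                  min_prominence_db: float = 15.0,
                  min_level_db: float = -42.0,
                  min_separation_hz: float = 10.0):
    """Identify spectral peaks with minimum prominence, level, and separation.

    Returns the (f_p, a_p) pairs of the identified peaks."""
    df = freqs[1] - freqs[0]  # assume a uniform frequency grid
    distance = max(1, int(round(min_separation_hz / df)))
    idx, _ = find_peaks(magnitude_db,
                        height=min_level_db,        # minimum level
                        prominence=min_prominence_db,
                        distance=distance)          # minimum separation in bins
    return freqs[idx], magnitude_db[idx]
```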
(35) The masking threshold computation subunit 26 is configured to compute a masking threshold for the peaks at different frequencies f based on the frequency f.sub.p and magnitude a.sub.p of each peak as well as on a sampling frequency f.sub.S of the signal of a sampling unit which is adapted for sampling of the signal. The masking thresholds m.sub.p(f) at different frequencies f for each peak are computed with the following formula:
(36)
(37) The perceptual threshold subunit 25 is configured to compute an absolute threshold of perception at different frequencies (since humans perceive signals differently at different frequencies), which corresponds to a signal magnitude, in particular an average signal magnitude, required for a human at a certain frequency to be able to perceive a signal. The absolute thresholds of perception t(f) at different frequencies f are computed with the following formula:
(38)
(39) The power additive combination subunit 28 is configured to compute a global masking threshold using power additive combination to add the absolute threshold of perception t(f) with the masking threshold m.sub.p(f).
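The power additive combination can be sketched as follows (an illustrative sketch assuming all thresholds are given in dB over a common frequency grid):

```python
import numpy as np

def global_masking_threshold(t_db: np.ndarray, m_db_list) -> np.ndarray:
    """Combine the absolute threshold of perception t(f) with the per-peak
    masking thresholds m_p(f) by summing their powers (linear domain) and
    converting the sum back to dB."""
    power = 10.0 ** (t_db / 10.0)          # absolute threshold as power
    for m_db in m_db_list:
        power += 10.0 ** (m_db / 10.0)     # add each peak's masking power
    return 10.0 * np.log10(power)
```

For instance, combining two equal thresholds of 0 dB yields a global threshold of about 3 dB, as expected for a power sum.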
(40) The band energy computation subunit 32 is configured to compute the energy of the signal in each DWT band E.sub.S,b. The SMR computation subunit 34 is configured to compute a signal-to-mask-ratio (SMR) for each DWT band based on the sum of the energy of the global masking threshold in each band E.sub.M,b and on the computed energy of the signal in each band E.sub.S,b. The SMR for each band is obtained by dividing E.sub.S,b by E.sub.M,b and representing the result in dB. The SMR values for all bands are passed on to the quantization unit 10 together with the values of E.sub.S,b.
(42) In another preferred embodiment, encoding of the quantized wavelet coefficients of each block is performed via a 1D version of the Set Partitioning in Hierarchical Trees (SPIHT) algorithm. However, the present invention is not limited to using a SPIHT algorithm, but may adopt any lossless compression algorithm which removes the redundancies of the signal and is suitable for the DWT.
(43) Since the input of the 1D version of the SPIHT algorithm is a 1D signal, SPIHT is adapted to encode 1D quantized wavelet coefficients. SPIHT is a zero-tree-based coding method which utilizes two types of zero trees and encodes the significant coefficients and zero trees by successive sorting and refinement passes. In SPIHT, each quantized wavelet coefficient is represented with magnitude bitplanes and a corresponding sign plane. The parent-child relationships among these coefficients are defined based on the applied DWT levels and used when encoding the bitplanes through the iterations of sorting and refinement passes.
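A common convention for the 1D parent-child relationship (the patent text does not spell out the exact mapping, so this is an assumption) is that coefficient i parents coefficients 2i and 2i+1:

```python
def children(i: int, n: int) -> list[int]:
    """Children of 1D wavelet coefficient i under the standard 1D SPIHT
    convention (parent i -> children 2i and 2i+1), clipped to n coefficients."""
    return [c for c in (2 * i, 2 * i + 1) if c < n]
```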
(44) An example of the resulting 1D wavelet coefficients and the tree structure is shown in
(45) SPIHT defines three lists, namely the List of Significant Pixels (LSP), the List of Insignificant Pixels (LIP), and the List of Insignificant Sets (LIS). In the sorting pass, the position of a coefficient is inserted into the LSP when the coefficient is significant, into the LIP when it is insignificant, or into the LIS when the coefficients of the corresponding tree are insignificant. Furthermore, the bits of coefficients in the magnitude bitplane are inserted into the encoded bitstream if the coefficient is in the LIP or LIS. In the refinement pass, the bits in the magnitude bitplane for the coefficients belonging to the LSP before the last sorting pass are inserted into the encoded bitstream. These passes are repeated for each magnitude bitplane. The final output of the SPIHT module is the bitstream of losslessly compressed quantized 1D DWT coefficients.
(48) In step S1, a vibrotactile signal is received. In step S2, the received signal is split into blocks. In step S3, the spectrum of the signal blocks is obtained by applying DFT. In step S4, a global masking threshold according to the inventive psychohaptic model is computed. In parallel to step S3, the signal blocks are decomposed by applying DWT (S5).
(49) The quantization unit 10 allocates a certain bit budget to the different DWT bands according to the inventive psychohaptic model in order to reduce the rate considerably without introducing any perceivable distortion. In order to accomplish this task, the quantization unit 10 takes into account the SMR values from the psychohaptic model (resulting from step S4). In a loop shown in
(50) After step S5, each DWT band is initially allocated 0 bits out of the total bit budget of n bits (S6). In every iteration (S7, S9, S10), the SNR is calculated in dB using the signal energy values in each band passed over by the psychohaptic model and the noise energy introduced by the quantization. Then, the mask-to-noise ratio (MNR) is calculated with the formula MNR=SNR−SMR (S9). One bit is then allocated to the band with the lowest MNR value, and this is repeated until all n bits are allocated.
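The greedy loop of steps S6-S10 can be sketched as follows. `noise_energy_fn` is a hypothetical model of the quantization noise energy in a band for a given bit count, stood in for the actual quantizer:

```python
import numpy as np

def allocate_bits(smr_db, signal_energy, noise_energy_fn, total_bits):
    """Greedy bit allocation: start every band at 0 bits and repeatedly give
    one bit to the band with the lowest MNR = SNR - SMR (in dB)."""
    n_bands = len(smr_db)
    bits = [0] * n_bands
    for _ in range(total_bits):
        snr_db = [10.0 * np.log10(signal_energy[b] / noise_energy_fn(b, bits[b]))
                  for b in range(n_bands)]
        mnr_db = [snr_db[b] - smr_db[b] for b in range(n_bands)]
        worst = int(np.argmin(mnr_db))  # band whose noise is most perceivable...
        bits[worst] += 1                # ...receives the next bit
    return bits
```

With equal SMRs, equal band energies, and a noise model in which each extra bit reduces the noise energy by a fixed factor, the budget is split evenly across bands, as one would expect.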
(51) An embodiment of a quantization unit 10 according to the invention, namely an embedded deadzone quantizer, is shown in
(52) The structure in
(53)
where b is the number of bits allocated to a particular band. The wavelet coefficients are then quantized according to the formula
(54)
(55) Thus, the wavelet coefficients are quantized to the original range. This formula also implies the addition of one sign bit. After all bits have been allocated (S7) and therefore all wavelet coefficients have been quantized (S8), all the quantized wavelet coefficients are scaled to integers by
(56)
(57) These quantized integer wavelet coefficients are passed on to the SPIHT algorithm to compress the signal (S11).
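Since the exact quantization formulas are not reproduced here, the following is only a hedged sketch of an embedded deadzone quantizer in its common textbook form, q = sign(c)·floor(|c|/Δ) with Δ derived from the band's bit allocation plus one sign bit; the patented quantizer may differ in detail:

```python
import numpy as np

def deadzone_quantize(coeffs: np.ndarray, max_abs: float, bits: int) -> np.ndarray:
    """Uniform quantizer with a widened zero bin (deadzone): coefficients with
    magnitude below delta map to 0, others to signed integer indices."""
    delta = max_abs / (2 ** bits)  # step size from the band's bit allocation
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / delta)
```

The floor operation makes the zero bin twice as wide as the other bins, which is what suppresses small coefficients and makes the subsequent zero-tree coding effective.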
(58) Afterwards, side information is collected and multiplexed with the encoded signal (S12) in order to generate a bitstream (S13).
(60) In step S21, a bitstream is received. In step S22, the received bitstream is demultiplexed into side information and encoded signal. In step S23, signal blocks are extracted from the encoded signal. In step S24, an inverse SPIHT (ISPIHT) is performed on the received encoded blocks in order to decompress the encoded blocks. In step S25, the decompressed blocks are dequantized. In step S26, an IDWT is applied to the dequantized signal blocks. Afterwards, the decoded blocks are merged (S27) to generate the reconstructed signal (S28).
(62) In order to examine the rate-distortion behavior, a test data set consisting of 280 vibrotactile signals recorded with an accelerometer is encoded. The test data set contains signals of various materials for different exploration speeds. The signals are compressed using a block length of 512 samples and a DWT of level 7.
(63) All signals are encoded, decoded and the resulting output is then compared to the original. The bit budgets of the quantization unit 10 are varied between 8 and 128 bits to achieve different rates and therefore quality levels. The compression ratio (CR) is defined as the ratio between the original rate and the compressed rate. Afterwards, the SNR and PSNR are computed for all 280 test signals for different CR values. The respective scatter plots for all three metrics with averages are given in
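The evaluation metrics can be sketched as follows (one common set of definitions; the exact PSNR normalization used in the evaluation is an assumption):

```python
import numpy as np

def rate_distortion_metrics(original, decoded, original_bits, compressed_bits):
    """Compression ratio (original rate over compressed rate) plus SNR and
    PSNR in dB between the original and the reconstructed signal."""
    original = np.asarray(original, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    noise = original - decoded
    snr = 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))
    psnr = 10.0 * np.log10(len(original) * np.max(original ** 2)
                           / np.sum(noise ** 2))
    cr = original_bits / compressed_bits
    return cr, snr, psnr
```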
(65) As shown in
(67) As shown in
INDUSTRIAL APPLICABILITY
(68) Summarizing, the described invention provides encoding and decoding of vibrotactile signals with low noise and a small amount of data to be transmitted. Whilst the invention has been described with particular emphasis on IoT vibrotactile signals, the invention is of course generally applicable to the efficient transmission of tactile signals in other technological fields.
(69) In the foregoing specification, specific embodiments have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of present teachings.
(70) The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all the claims. The invention is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued.
(71) Moreover, in this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “has,” “having,” “includes,” “including,” “contains,” “containing,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises, has, includes, contains a list of elements does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a,” “has . . . a,” “includes . . . a,” or “contains . . . a,” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises, has, includes, or contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “substantially,” “essentially,” “approximately,” “about,” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art, and in one non-limiting embodiment the term is defined to be within 10%, in another embodiment within 5%, in another embodiment within 1%, and in another embodiment within 0.5%. The term “coupled” as used herein is defined as connected, although not necessarily directly and not necessarily mechanically. A device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
(72) It will be appreciated that some embodiments may be comprised of one or more generic or specialized processors (or “processing devices”) such as microprocessors, digital signal processors, customized processors, and field programmable gate arrays (FPGAs), and unique stored program instructions (including both software and firmware) that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the method and/or apparatus described herein. Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more application specific integrated circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of the two approaches could be used.
(73) Software programs containing software instructions for carrying out the functionalities and method steps in the described units may be used. Therefore, one or more embodiments can be implemented as a computer-readable storage medium having computer readable code stored thereon for programming a computer (e.g., comprising a processor) to perform a method as described and claimed herein. Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein, will be readily capable of generating such software instructions and programs and ICs with minimal experimentation.
(74) The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.
(75) In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.