DATA GENERATION APPARATUS, DATA GENERATION METHOD, AND RECORDING MEDIUM
20230118020 · 2023-04-20
Assignee
Inventors
- So YAMADA (Tokyo, JP)
- Junko WATANABE (Tokyo, JP)
- Riki Eto (Tokyo, JP)
- Hiromi SHIMIZU (Tokyo, JP)
- Noriyuki TONOUCHI (Tokyo, JP)
Cpc classification
G01N5/02
PHYSICS
International classification
Abstract
In a data generation apparatus, an acquisition unit acquires original data which are odor data measured in a specific environment. A generation unit performs a linear transformation with respect to the original data, and generates augmented data which are odor data in an environment where temperature and humidity are different from those in the specific environment.
Claims
1. A data generation apparatus comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: acquire original data which are odor data measured in a specific environment; and generate augmented data by performing a linear transformation with respect to the original data, the augmented data being odor data in an environment where temperature and humidity are different from that in the specific environment.
2. The data generation apparatus according to claim 1, wherein each set of the odor data represents features of an object with a waveform that indicates a rate of each of a plurality of odor molecules, the waveform indicates the plurality of odor molecules on a horizontal axis and the rate of each of the plurality of odor molecules on a vertical axis, and the processor generates the augmented data by performing the linear transformation with respect to a waveform of the original data.
3. The data generation apparatus according to claim 2, wherein the linear transformation shifts the waveform of the original data in a horizontal axis direction and changes a level.
4. The data generation apparatus according to claim 3, wherein the processor generates a vector representing the augmented data by multiplying a vector representing the waveform of the original data with an operation matrix expressing the linear transformation.
5. The data generation apparatus according to claim 4, wherein the operation matrix shifts each of elements of the vector representing the waveform of the original data and changes the level with the same level change rate.
6. The data generation apparatus according to claim 4, wherein the operation matrix shifts each of elements of the vector representing the waveform of the original data with the same shift amount or a different shift amount, and changes the level with the same level change rate or a different level change rate.
7. The data generation apparatus according to claim 4, wherein the operation matrix shifts each of elements of the vector representing the original data with the same shift amount or a different shift amount, and changes the level with a level change rate which is weighted with the same weight or a different weight.
8. The data generation apparatus according to claim 7, wherein the processor is further configured to generate a predictive model that predicts an object based on odor data by using the original data and the augmented data; and determine a weight for weighting the level change rate based on the weight of the predictive model.
9. A data generation method, comprising: acquiring original data which are odor data measured in a specific environment; and generating augmented data by performing a linear transformation with respect to the original data, the augmented data being odor data in an environment where temperature and humidity are different from that in the specific environment.
10. A non-transitory computer-readable recording medium storing a program, the program causing a computer to perform a process comprising: acquiring original data which are odor data measured in a specific environment; and generating augmented data by performing a linear transformation with respect to the original data, the augmented data being odor data in an environment where temperature and humidity are different from that in the specific environment.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
EXAMPLE EMBODIMENTS
[0030] In the following, example embodiments will be described with reference to the accompanying drawings.
First Example Embodiment
[0031] [Overall Configuration]
[0032]
[0033] [Odor Measurement Apparatus]
[0034] The odor measurement apparatus 10 measures an odor of an object using a sensor, and outputs odor data.
[0035] The sensor 12 is a membrane-type surface stress (MSS) sensor. The MSS sensor has, as a receptor, a functional film to which molecules adhere, and a stress generated in a support member of the functional film changes due to attachments and detachments of odor molecules to and from the functional film. The MSS sensor outputs a detected value based on the change in this stress. The sensor 12 is not limited to the MSS sensor, and may be any sensor that outputs the detected value based on a variation in a physical quantity related to a viscoelasticity and a dynamic property (a mass, a moment of inertia, or the like) of a member of the sensor 12 that occurs in response to attachments and detachments of the molecules with respect to the receptor. For instance, one of various types of sensors may be employed, such as a cantilever type, a membrane type, an optical type, a piezoelectric type, a vibration response type, and the like.
[0036] For the sake of explanation, sensing by the sensor 12 is modeled as follows.
[0037] (1) The sensor 12 is exposed to a target gas containing k types of molecules.
[0038] (2) A concentration for each of the k types of molecules in the target gas is a constant ρ.sub.k.
[0039] (3) A total of n molecules can be adhered to the sensor 12.
[0040] (4) The number of the molecules k attached to the sensor 12 at a time t is denoted by n.sub.k(t).
[0041] In this case, a change in the number n.sub.k(t) of the molecules k attached to the sensor 12 over time can be formulated as follows.
[0042] Each of a first term and a second term on a right side of the above formula (1) represents an increase amount of the molecules k per unit time (a number of the molecules k newly attaching to the sensor 12) and a decrease amount of the molecules k per unit time (a number of the molecules k detaching from the sensor 12). Moreover, α.sub.k denotes a rate constant representing a rate at which the molecules k attach to the sensor 12, β.sub.k denotes a rate constant representing a rate at which the molecules k detach from the sensor 12.
[0043] Here, since the concentration ρ.sub.k is constant, the number n.sub.k(t) of the molecules k at the time t from the above formula (1) can be formulated as follows.
[0044] Furthermore, assuming that no molecule is attached to the sensor 12 at a time t.sub.0 (an initial state), n.sub.k(t) is expressed as follows.
[Math 3]
$$n_k(t) = n_k^{*}\left(1 - e^{-\beta_k t}\right)$$
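For reference, a first-order attachment and detachment model that matches the terms described above and leads to the closed form of [Math 3] can be written as follows. The specific forms given here for formulas (1) and (2), the equilibrium count n*.sub.k, and the choice t.sub.0 = 0 are assumptions, since those formulas are not reproduced in this text.

```latex
\begin{aligned}
\frac{d n_k(t)}{dt} &= \alpha_k \rho_k - \beta_k n_k(t)
  && \text{(attachment term minus detachment term)} \\
n_k(t) &= n_k^{*} + \bigl(n_k(t_0) - n_k^{*}\bigr)\, e^{-\beta_k (t - t_0)},
  \qquad n_k^{*} = \frac{\alpha_k \rho_k}{\beta_k}
  && \text{(constant } \rho_k \text{)} \\
n_k(t) &= n_k^{*}\bigl(1 - e^{-\beta_k t}\bigr)
  && \text{(} n_k(t_0) = 0,\ t_0 = 0 \text{)}
\end{aligned}
```

Under these assumptions the attached count rises toward n*.sub.k at a speed set only by the detachment rate constant β.sub.k.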
[0045] The detected value of the sensor 12 is determined by the stress exerted on the sensor 12 by the molecules contained in the target gas. Accordingly, it is considered that a stress exerted on the sensor 12 by a plurality of molecules can be represented by a linear sum of stresses generated by individual molecules. However, it is considered that a stress generated by each molecule varies depending on a type of the molecule. That is, a contribution of the molecule with respect to the detected value of the sensor 12 differs depending on the type of the molecule.
[0046] Therefore, the detected value y(t) of the sensor 12 can be formulated as follows.
[0047] Here, both γ.sub.k and ξ.sub.k represent contributions of a molecule k with respect to the detected value of the sensor 12. Note that the “rising case” refers to a case of exposing the sensor 12 to the target gas, and the “falling case” refers to a case of removing the target gas from the sensor 12. Note that an operation of removing the target gas from the sensor is performed, for instance, by exposing the sensor to a gas called purge gas.
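The linear-sum assumption for the rising case can be illustrated with a minimal sketch. It assumes that each molecule type contributes γ.sub.k times its attached count, with the attached count following the closed form of [Math 3]; the function names and the two-molecule example values are hypothetical and are not taken from the disclosure.

```python
import math


def attached_count(t, n_star, beta):
    """Attached count n_k(t) = n*_k (1 - exp(-beta_k t)), starting from an empty sensor."""
    return n_star * (1.0 - math.exp(-beta * t))


def detected_value(t, gammas, n_stars, betas):
    """Detected value as a linear sum of per-molecule contributions gamma_k * n_k(t)."""
    return sum(
        g * attached_count(t, n, b) for g, n, b in zip(gammas, n_stars, betas)
    )


# Hypothetical gas with two molecule types (values chosen only for illustration).
gammas = [1.0, 0.5]    # contribution per attached molecule
n_stars = [10.0, 20.0]  # equilibrium attached counts
betas = [2.0, 0.1]      # detachment rate constants

samples = [detected_value(t, gammas, n_stars, betas) for t in (0.0, 1.0, 1000.0)]
```

The fast molecule (large β) dominates the early part of the response and the slow molecule the late part, which is why a decomposition by rate constant can separate them.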
[0048] Here, in a case where the time series data Y obtained by the sensor 12 in which the target gas is sensed can be decomposed as in the above formula (4), it is possible to grasp the types of the molecules contained in the target gas and a ratio of each of various types of the molecules contained in the target gas. That is, by the decomposition represented by the formula (4), data representing features of the target gas, that is, a feature amount of the target gas can be obtained.
[0049] Therefore, the odor measurement apparatus 10 acquires the time series data Y output by the sensor 12, and decomposes the data as expressed in the following formula (5).
Here, θ.sub.i denotes a feature constant, that is, a time constant or a rate constant with respect to a magnitude of a change in an amount of the molecules adhering to the sensor 12 over time. ξ.sub.i denotes a contribution value representing a contribution of the feature constant θ.sub.i to the detected value of the sensor 12.
[0050] As a feature constant θ, it is possible to adopt the aforementioned rate constant β and a time constant τ which is an inverse of the rate constant. For each case where β and τ are used as the feature constant θ, the formula (5) can be expressed as follows.
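The two expressions do not survive in this text. Assuming exponential basis functions of the kind that appear in [Math 3] (an assumption, not a quotation of the original formulas, and ignoring any constant offset term), the two cases would plausibly read:

```latex
\begin{aligned}
y(t) &= \sum_i \xi_i \, e^{-\beta_i t}
  && \text{(feature constant } \theta_i = \beta_i \text{)} \\
y(t) &= \sum_i \xi_i \, e^{-t/\tau_i}
  && \text{(feature constant } \theta_i = \tau_i = 1/\beta_i \text{)}
\end{aligned}
```

In either form, the set of pairs (θ.sub.i, ξ.sub.i) is what gives a spectrum over the feature constant.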
[0051] Hereinafter, for convenience of explanation, it is assumed that the time series data Y are represented by the formula (6). As illustrated in
[0052] [Data Augmentation Apparatus]
[0053] (Basic Principle)
[0054] As described above, since the time constant spectrum (hereinafter, also referred to as a “TS”) indicates a rate of each odor molecule in a target gas, a model for predicting an object based on features of odor data can be created by machine learning, or the like. Here, since the TS varies depending on an environment such as temperature or humidity, in order to be able to predict in various environments, it is necessary to measure odor data for each environment with different temperature or humidity and to prepare training data for training a model. However, preparing training data for all environments by measurement requires a huge amount of time and considerable effort. Therefore, a large amount of training data is prepared by applying a data augmentation to the odor data measured in a specific environment, thereby artificially creating sets of odor data for environments with different temperature or humidity.
[0055] From changes in waveforms of the TS (hereinafter also referred to as “TS waveforms”) respectively obtained in the different environments, it is possible to qualitatively know an effect due to the change in the temperature or the humidity on each TS waveform.
[0056]
[0057] Therefore, a linear transformation which gives the change of the waveform as mentioned above is obtained, and augmented data are generated based on the original data of odor data by using this linear transformation. In detail, the data augmentation apparatus 20 performs the linear transformation that shifts the TS waveform of the input original data in the horizontal axis direction, and changes a level in response to a change in the temperature or the humidity, so as to generate the augmented data.
[0058] (Hardware Configuration)
[0059]
[0060] The input IF 21 inputs and outputs odor data. In detail, the input IF 21 is used to acquire original data of the odor data from the DB 5 and to store, in the DB 5, augmented data generated by the data augmentation apparatus 20. The processor 22 is a computer such as a CPU (Central Processing Unit) and controls the entire data augmentation apparatus 20 by executing programs prepared in advance. Specifically, the processor 22 executes a data augmentation process, which will be described later.
[0061] The memory 23 is formed by a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 23 stores various programs to be executed by the processor 22. The memory 23 is also used as a working memory during executions of various processes by the processor 22.
[0062] The recording medium 24 is a non-volatile and non-transitory recording medium such as a disk-shaped recording medium, a semiconductor memory, or the like, and is formed to be detachable from the data augmentation apparatus 20. The recording medium 24 records various programs executed by the processor 22. When the data augmentation apparatus 20 executes various types of processes, programs recorded on the recording medium 24 are loaded into the memory 23 and executed by the processor 22.
[0063] The DB 25 stores data input from an external apparatus via the input IF 21. Specifically, the DB 25 temporarily stores the odor data acquired from the DB 5.
[0064] (Functional Configuration)
[0065]
[0066]
[0067] Now, assuming that the original data of the odor data are represented by x.sub.old, the operation matrix is denoted by O, and the augmented data are represented by x.sub.new, the augmented data can be obtained by the following equation.
x.sub.new=Ox.sub.old
Here, the original data x.sub.old and the augmented data x.sub.new are each represented by a d×1 dimensional vector, and the operation matrix O is represented by a d×d dimensional matrix.
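The matrix product above can be sketched as follows. This is a minimal illustration that assumes a single shift amount and a single level change rate for all elements; the helper names `make_operation_matrix` and `apply_operation` are hypothetical, and elements shifted past either end of the vector are simply dropped, which is also an assumption.

```python
def make_operation_matrix(d, shift, rate):
    """Build a d x d matrix that moves element i to position i + shift and scales it by rate."""
    O = [[0.0] * d for _ in range(d)]
    for i in range(d):
        j = i + shift
        if 0 <= j < d:  # elements shifted past either end are dropped
            O[j][i] = rate
    return O


def apply_operation(O, x):
    """Compute the matrix-vector product O x."""
    return [sum(o_ij * x_j for o_ij, x_j in zip(row, x)) for row in O]


# Hypothetical TS waveform with one peak (values chosen only for illustration).
x_old = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
O = make_operation_matrix(len(x_old), shift=2, rate=0.5)
x_new = apply_operation(O, x_old)
```

Here the peak moves two positions along the horizontal axis and its level is halved, so `x_new` is `[0.0, 0.0, 0.0, 0.5, 2.0, 0.5]`.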
[0068] As illustrated in
[0069] Note that restrictions (1) through (3) illustrated in
[0070]
[0071] Next, the operation matrices O.sub.40→15 and O.sub.25→15 thus obtained are applied to another set of original data illustrated in
[0072]
[0073]
[0074] Next, a method for generating the operation matrix O will be described in detail.
[0075] (A) First Method
[0076] In a first method, all shift amounts n.sub.i of the operation matrix O are the same value and all level change rates a.sub.i are the same value. In a case where the source data used for generating the operation matrix O are denoted by x.sub.source, and the target data are denoted by x.sub.target, the operation matrix O is generated so that a product Ox.sub.source of the operation matrix O and the source data x.sub.source comes close to the target data x.sub.target.
[0077] Now, a difference d is defined as follows and O(n, a) is acquired so as to minimize the difference d.
d=∥x.sub.target−Ox.sub.source∥
where ∥⋅∥ represents a norm.
[0078] In detail, first, an initial value d.sub.min of the difference d is set. Then, for a given shift amount n, the level change rate a and the difference d are calculated by the following formulas.
a=argmin∥x.sub.target−O(n, a)x.sub.source∥
d=∥x.sub.target−O(n, a)x.sub.source∥
[0079] Then, when d.sub.min>d, the current a is kept as the best level change rate and d.sub.min is updated to d.
[0080] By repeating this process a predetermined number of times, a combination of n and a is acquired so that the difference d is minimized.
[0081] In the formula of the level change rate a, in order for a value of the level change rate a not to be excessive, a regularization term may be added as follows.
a=argmin∥x.sub.target−O(n, a)x.sub.source∥+λ∥a∥,
where “λ” is an arbitrary coefficient.
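The search of the first method can be sketched as below. This is a minimal illustration under stated assumptions: candidate shift amounts are enumerated from a given range, the norm is the Euclidean norm, and the scalar level change rate for each shift is obtained in closed form. When `lam` is positive, the sketch swaps in the squared (ridge) variant of the regularized objective because it has a closed form; this differs from the unsquared expression in the text. The function names are hypothetical.

```python
import math


def shift(x, n):
    """Shift x by n positions (positive n moves values toward higher indices), zero-padded."""
    d = len(x)
    return [x[i - n] if 0 <= i - n < d else 0.0 for i in range(d)]


def fit_first_method(x_source, x_target, shifts, lam=0.0):
    """Return (n, a, d) with the smallest difference d over the candidate shifts."""
    best = None  # plays the role of d_min; None means "not yet set"
    for n in shifts:
        s = shift(x_source, n)
        denom = sum(v * v for v in s) + lam
        # Closed-form scalar least squares; lam > 0 gives the ridge variant.
        a = sum(t * v for t, v in zip(x_target, s)) / denom if denom > 0.0 else 0.0
        d = math.sqrt(sum((t - a * v) ** 2 for t, v in zip(x_target, s)))
        if best is None or d < best[2]:
            best = (n, a, d)
    return best


# Hypothetical source and target waveforms (values chosen only for illustration).
x_source = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
x_target = [0.0, 0.0, 0.0, 0.5, 2.0, 0.5]
n_best, a_best, d_best = fit_first_method(x_source, x_target, range(-2, 3))
```

For this example the search returns a shift of 2 and a level change rate of 0.5 with zero difference; a positive `lam` shrinks the rate toward zero.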
[0082] (B) Second Method
[0083] In a second method, each shift amount n.sub.i of the operation matrix O is a different value and each level change rate a.sub.i is a different value. In a case where the source data used for generating the operation matrix O are denoted by x.sub.source and the target data are denoted by x.sub.target, the operation matrix O is generated so that the product Ox.sub.source of the operation matrix O and the source data x.sub.source comes close to the target data x.sub.target.
[0084] Similar to the first method, the difference d is defined as follows.
d=∥x.sub.target−Ox.sub.source∥
where ∥⋅∥ represents a norm. Then, O(n, a) is obtained so that the difference d approaches “0”, and n is obtained so as to minimize a parameter Σ.sub.i|a.sub.i|. In the second method, both the shift amount n and the level change rate a are vectors (their elements may differ depending on i).
[0085] In the second method, the level change rate a has as many dimensions as x.sub.target, so the solution is not uniquely determined even in a case where the norm becomes “0”. Accordingly, by enumerating the shift amount n, n is acquired so that the parameter Σ.sub.i|a.sub.i| is minimized. At this time, for the shift amount n, it is sufficient to determine a realistic range based on an actual TS waveform and perform a search within the range.
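The enumeration can be illustrated with a simplified sketch. It departs from the second method in two labeled ways: a single shared shift amount is enumerated instead of a vector of shift amounts, and only the overlapping part of the two waveforms is fitted. Within that simplification, each element receives its own level change rate so that the fit on the overlap is exact, and the shift with the smallest Σ|a.sub.i| is selected. The function name and the example waveforms are hypothetical.

```python
def fit_second_method(x_source, x_target, shifts):
    """Return (n, rates, cost) minimizing cost = sum(|a_i|) over the candidate shifts."""
    d = len(x_source)
    best = None
    for n in shifts:
        rates = []
        for i in range(d):
            j = i + n  # source element i is mapped onto target element j
            if 0 <= j < d and x_source[i] != 0.0:
                rates.append(x_target[j] / x_source[i])  # exact fit for this element
            else:
                rates.append(0.0)  # outside the overlap, or nothing to scale
        cost = sum(abs(a) for a in rates)
        if best is None or cost < best[2]:
            best = (n, rates, cost)
    return best


# Hypothetical waveforms: the target is the source moved by one position and halved.
x_source = [1.0, 1.0, 2.0, 8.0, 2.0, 1.0, 1.0, 1.0]
x_target = [0.5, 0.5, 0.5, 1.0, 4.0, 1.0, 0.5, 0.5]
n_best, rates_best, cost_best = fit_second_method(x_source, x_target, range(-2, 3))
```

A mismatched shift forces large rates where a small source value must reproduce the target peak, which raises Σ|a.sub.i|; in this example the shift of 1 gives the smallest value.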
[0086] (Modification)
[0087] Next, a modification of the first example embodiment will be described. In the modification, a weight is added to the level change rate a of the operation matrix O.
[0088]
[0089] The predictive model generation unit 33 generates a predictive model for predicting an object or the like from odor data using machine learning or the like. In detail, the predictive model generation unit 33 trains the predictive model using the original data and the augmented data generated by the data augmentation unit 32. At this time, the predictive model generation unit 33 generates each weight Wm indicating a portion that is important in the prediction based on the odor data, that is, the target portion of the TS waveform. For instance, in a case where the predictive model is a linear model, each coefficient of the predictive model can be used as the weight Wm. The weight Wm is input to the operation matrix generation unit 31.
[0090] The operation matrix generation unit 31 normalizes the weight Wm input from the predictive model generation unit 33 and sets the normalized weight Wm to a weight w of the operation matrix O illustrated in
[0091] According to the above modification, the augmented data can inherit the features of the target portion that is important in the prediction using the odor data.
Second Example Embodiment
[0092]
[0093] A part or all of the example embodiments described above may also be described as the following supplementary notes, but not limited thereto.
[0094] (Supplementary Note 1)
[0095] 1. A data generation apparatus comprising:
[0096] an acquisition unit configured to acquire original data which are odor data measured in a specific environment; and
[0097] a generation unit configured to perform a linear transformation with respect to the original data, and generate augmented data which are odor data in an environment where temperature and humidity are different from that in the specific environment.
[0098] (Supplementary Note 2)
[0099] 2. The data generation apparatus according to supplementary note 1, wherein
[0100] each set of the odor data represents features of an object with a waveform that indicates a rate of each of a plurality of odor molecules,
[0101] the waveform indicates the plurality of odor molecules on a horizontal axis and the rate of each of the plurality of odor molecules on a vertical axis, and
[0102] the generation unit generates the augmented data by performing the linear transformation with respect to a waveform of the original data.
[0103] (Supplementary Note 3)
[0104] 3. The data generation apparatus according to supplementary note 2, wherein the linear transformation shifts the waveform of the original data in a horizontal axis direction and changes a level.
[0105] (Supplementary Note 4)
[0106] 4. The data generation apparatus according to supplementary note 3, wherein the generation unit generates a vector representing the augmented data by multiplying a vector representing the waveform of the original data with an operation matrix expressing the linear transformation.
[0107] (Supplementary Note 5)
[0108] 5. The data generation apparatus according to supplementary note 4, wherein the operation matrix shifts each of elements of the vector representing the waveform of the original data and changes the level with the same level change rate.
[0109] (Supplementary Note 6)
[0110] 6. The data generation apparatus according to supplementary note 4, wherein the operation matrix shifts each of elements of the vector representing the waveform of the original data with the same shift amount or a different shift amount, and changes the level with the same level change rate or a different level change rate.
[0111] (Supplementary Note 7)
[0112] 7. The data generation apparatus according to supplementary note 4, wherein the operation matrix shifts each of elements of the vector representing the original data with the same shift amount or a different shift amount, and changes the level with a level change rate which is weighted with the same weight or a different weight.
[0113] (Supplementary Note 8)
[0114] 8. The data generation apparatus according to supplementary note 7, further comprising
[0115] a predictive model generation unit configured to generate a predictive model that predicts an object based on odor data by using the original data and the augmented data; and
[0116] a weight determination unit configured to determine a weight for weighting the level change rate based on the weight of the predictive model.
[0117] (Supplementary Note 9)
[0118] 9. A data generation method, comprising:
[0119] acquiring original data which are odor data measured in a specific environment; and
[0120] performing a linear transformation with respect to the original data, and generating augmented data which are odor data in an environment where temperature and humidity are different from that in the specific environment.
[0121] (Supplementary Note 10)
[0122] 10. A recording medium storing a program, the program causing a computer to perform a process comprising:
[0123] acquiring original data which are odor data measured in a specific environment; and
[0124] performing a linear transformation with respect to the original data, and generating augmented data which are odor data in an environment where temperature and humidity are different from that in the specific environment.
[0125] While the disclosure has been described with reference to the example embodiments and examples, the disclosure is not limited to the above example embodiments and examples. Various modifications that can be understood by those skilled in the art can be made to the structure and details of the present disclosure within the scope of the present disclosure.
DESCRIPTION OF SYMBOLS
[0126] 5, 6 Database (DB)
[0127] 10 Odor measurement apparatus
[0128] 12 Sensor
[0129] 20, 20x Data augmentation apparatus
[0130] 22 Processor
[0131] 23 Memory
[0132] 31 Operation matrix generation unit
[0133] 32 Data augmentation unit
[0134] 33 Predictive model generation unit