METHOD OF AUGMENTING A DATASET USED IN FACIAL EXPRESSION ANALYSIS

Abstract

In a computer-implemented method of augmenting a dataset used in facial expression analysis, a first facial image and a second facial image are added to a training/testing dataset and mapped to two respective points in a continuous dimensional emotion space. The position of a third point in the continuous dimensional emotion space between the first two points is determined. Augmentation is achieved when a labelled facial image is derived from the third point based on its position relative to the first and second facial expression.

Claims

1. A computer-implemented method of augmenting a dataset used in facial expression analysis, the method comprising the steps of: adding to a dataset a first facial image and a second facial image; mapping the first and second facial images to a first expression point and second expression point respectively in a continuous dimensional emotion space; determining a position of a third expression point in the continuous dimensional emotion space, the third expression point being a position between the first and second expression point; generating an expression variation image; assigning a first dimensional label to the expression variation image; and adding the expression variation image to the dataset, wherein the expression variation image and first dimensional label are derived from the position of the third expression point, based on its position relative to the first and second facial image.

2. The method according to claim 1, wherein the expression variation image is obtained by applying a first morph function to interpolate the first facial image and the second facial image and the intensity variation image is obtained by applying a second morph function to interpolate the neutral facial image and the first facial image.

3. The method according to claim 1, wherein the third expression point is determined by applying an expression distance to the first expression point or second expression point.

4. The method according to claim 3, wherein the expression distance is preselected.

5. The method according to claim 1, wherein the first and second facial images are each assigned a label comprising the following expressions: happy, surprised, afraid, angry, disgusted or sad and are mapped to positions that correspond to the labels in the continuous dimensional emotion space.

6. The method according to claim 1, wherein the first and second facial images are apex expressions.

7. The method according to claim 1, further comprising the step of: adding to the dataset, a neutral facial image; mapping the neutral facial image to a neutral point in the continuous dimensional emotion space; determining a first intensity point in the continuous dimensional emotion space, the first intensity point being a position between the neutral facial image and one of: the first facial image, the second facial image or the expression variation image; generating an intensity variation image; assigning a second dimensional label to the first variation image; and adding the intensity variation image to the dataset; wherein the intensity variation image and the second dimensional label are based on the first intensity point's relative position between the neutral facial image and the first facial image.

8. The method according to claim 7, wherein the first intensity point is determined by applying an intensity distance to the neutral expression point or the first facial image.

9. The method according to claim 7, wherein the intensity distance is preselected.

10. The method according to claim 7, wherein the expression variation image is obtained by applying a first morph function to interpolate the first facial image and the second facial image and the intensity variation image is obtained by applying a second morph function to interpolate the neutral facial image and the first facial image.

11. A storage medium comprising machine readable instructions stored thereon for causing a computer system to perform a method according to claim 1.

12. A computer-implemented method of augmenting a dataset used in facial expression analysis comprising the steps of: adding to a dataset, a neutral facial image and a first facial image; mapping the neutral facial image to a neutral point and the first facial image to a first expression point, the neutral point and first expression point being located in a continuous dimensional emotion space; determining a first intensity point on an intensity plane in the continuous dimensional emotion space, the first intensity point being a position between the neutral facial image and the first expression; generating an intensity variation image; and assigning a dimensional label to the intensity variation image; wherein the intensity variation image and the dimensional label are based on the first intensity point's relative position between the neutral facial image and the first facial image.

13. The method according to claim 12, wherein the intensity variation image is obtained by applying a second morph function to interpolate the neutral facial image and the first facial image.

14. The method according to claim 13, wherein the continuous dimensional emotion space is a valence-arousal circumplex.

15. A storage medium comprising machine readable instructions stored thereon for causing a computer system to perform a method according to claim 12.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0014] FIG. 1 illustrates a method of augmenting a dataset used in facial expression analysis according to the present invention.

[0015] FIG. 2 is a depiction of the augmentation framework according to the present invention in valence-arousal space of the circumplex model.

[0016] FIG. 3 depicts two types of morphings use to generate expression and intensity variations.

DETAILED DESCRIPTION OF THE INVENTION

[0017] In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. The illustrative embodiments described in the detailed description, drawings and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein.

[0018] The present invention provides a method of augmenting a dataset used in Facial Expression Analysis systems 100 specifically, by it being premised on the observation that many expression variations can be approximated by generating high-quality morphings between the existing available images of a categorical dataset. While expected to be typically employed to augment training datasets used to train machine learning models used in facial expression analysis, the method can likewise be used to augment a testing dataset.

[0019] Referring to FIG. 1 and in a first embodiment of the present invention, a first facial image and second facial image are added to a dataset (step 101). Facial images that are added to the dataset typically depict one of seven expressions, namely: Neutral (NE), Happy (HA), Surprised (SU), Afraid (AF), Angry (AN), Disgusted (DI) and Sad (SA), and are already annotated/labelled as such. By way of example, a typical categorical dataset is expected to contain each of the seven expressions per subject. However, it will be readily understood by a person of ordinary skill in the art that the first and second facial image may also be other expressions such as, but not limited to Delighted, or Excited. For the purposes of augmenting expressions, the first and second facial images cannot depict Neutral.

[0020] In one embodiment, the first facial image and second facial image both depict an apex expression, meaning the facial expression is a peak intensity.

[0021] Referring to FIG. 2, a 2-dimensional polar affective space, similar to the Arousal-Valence (AV) space of the circumplex model, with Neutral at the center is depicted. Distance from the center (i.e. deviation from neutral) represents the intensity of an expression. High intensity expressions (e.g. ‘extremely happy’) are located at the outer perimeter of the affective space, while low-key expressions (e.g. ‘slightly happy’) near the center of the space, close to Neutral. Arousal and Valence are in the range of [−1, 1]. Emotions are defined by angles in the interval [0°, 360°], while intensity of expression is defined by the distance from the center (Neutral) and is in the interval [0, 1]. The following assumptions are made. First, one-to-one correspondence of affective coordinates to facial deformations. Second, continuity of the facial deformation space. Third, neighboring affective coordinates have similar facial deformations. Small changes in affective coordinates, result in small changes in the facial deformation space. Lastly, continuity of the dimensional affective space.

[0022] The first and second facial images are mapped to a first and second expression point respectively in a continuous dimensional emotion space (step 102). A non-limiting example of a continuous two-dimensional emotion space is the valence-arousal space of the circumplex model which will be discussed further as a non-limiting example.

[0023] Since the first and second facial images are already labelled, the first and second expression points can be very specific coordinates in the valence-arousal space of the circumplex model. Referring to FIG. 2, the original dataset images are depicted as outlined circles. It is apparent after mapping a typical dataset with facial images depicting the seven typical facial expressions that there is a lot of empty space in the valence-arousal space.

[0024] Next, a position of a third expression point in the continuous dimensional emotion space is determined, the third expression point being a position between the first and second expression point (step 103). In one embodiment, the position of the third expression point is determined by applying an expression distance to either the first or second expression point, so long as the third expression point lies therebetween. In one embodiment, further expression points can be determined by applying the expression distance to the third expression point and so on to obtain a series of expression points between the first and second expression point. Although not strictly required, in one embodiment, the expression distance is preselected based on the specific needs of the training or testing dataset, such as expression granularity, augmentation factor and symmetry between the points. In one embodiment, the expression distance is an angle increment of 15° starting from either the first or second expression point and strikes a balance between the aforementioned criteria. In one embodiment, the expression is preselected. The method as described above can also be applied to a neutral point and any expression image, including the first facial image and second facial image and newly generated images as a result of augmentation such as an expression variation image described below to generate an intensity variation image with the third expression point being analogous to the first intensity point for the purposes of mapping.

[0025] In one embodiment, the first and second facial images correspond to points that are immediately adjacent to each other when mapped in the continuous dimensional emotion space. By way of example, and once again referring again to FIG. 2, the emotions, angry and afraid are example of two expressions that are immediately adjacent to each other in the valence arousal space.

[0026] With the position of the third expression determined, an expression variation image can be generated, and a first dimensional label can be assigned to the expression variation image, the expression variation image and first dimensional label being derived from the position of the third expression point, based on its position relative to the first and second facial expression (step 104). In a non-limiting example, and assuming a two-dimensional valence arousal space of the circumplex model.

[0027] Let F.sub.i.sup.E denote the face image of subject i with facial expression E. For categorical datasets, usually E∈{NE,HA,SU,AF,AN,DI,SA}. Let θ.sup.E E denote the specific angle of each expression in the polar valence-arousal space, as estimated from emotion studies. See J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, p. 1161-1178, 1980; and G. Paltoglou and M. Thelwall, “Seeing stars of valence and arousal in blog posts,” IEEE Trans. Affective Computing, vol. 4, no. 1, pp. 116-123, January 2013, each of which is incorporated herein by reference in its entirety. Let I.sub.i.sup.E∈[0, 1] denote the intensity of expression E of subject i. Zero expression intensity I.sup.e=0 coincides with NE (by definition I.sup.NE=0), while I.sup.E=1 represents the highest possible intensity of expression. Let M.sub.p (F.sub.i.sup.source, F.sub.i.sup.target, r) be a morphing function, based on p facial landmarks, that returns a new face image, which is the result of morphing F.sub.i.sup.source towards F.sub.i.sup.target with a ratio r∈[0, 1]; when r=0 the morphed image is identical to F.sub.i.sup.source and when r=1 it is identical to F.sub.i.sup.target. Any known contemporary morphing approach can be used for this, such as Delaunay triangulation followed by local warping on groups of facial landmarks.

[0028] The augmentation method according some aspects the present invention is based on 2 types of morphings. In order to synthesize new expression variations, Apex to Apex morphing (1) is used, between the given apex expressions of the categorical dataset:

[00001] $\begin{matrix} Apex to Apex {\begin{matrix} F_{i}^{A_{1} {rA}_{2}} = M_{p} (F_{i}^{A_{1}}, F_{i}^{A_{2}}, r) \\ I_{i}^{A_{1} {rA}_{2}} = (1 - r) I_{i}^{A_{2}} + {rI}_{i}^{A_{2}} \\ θ^{A_{1} {rA}_{2}} = (1 - r) θ^{A_{1}} + {r θ}^{A_{2}} \end{matrix} & (1) \end{matrix}$

[0029] Where A, A.sub.1 and A.sub.2 are apex expressions from the parent dataset, and r is a ratio in the interval [0, 1].

[0030] In order to synthesize new intensity variations, Neutral to Apex morphing (2) is used, between the NE image and a given (or interpolated) apex image:

[00002] $\begin{matrix} Neutral to Apex {\begin{matrix} F_{i}^{rA} = M_{p} (F_{i}^{NE}, F_{i}^{A}, r) \\ I_{i}^{rA} = {rI}_{i}^{A} \\ θ^{rA} = θ^{A} \end{matrix} & (2) \end{matrix}$

[0031] Referring to FIG. 3, examples of these 2 types of morphings are depicted. For every given or generated face image F.sub.i.sup.E, with I.sub.i.sup.E and θ.sup.E, the first dimensional label can be computed as V.sub.i.sup.E=I.sub.i.sup.E cos(θ.sup.E)=I and A.sub.i.sup.E=I.sub.i.sup.E sin(θ.sup.E). In some embodiments, an intensity variation image is generated by applying a intensity distance, the intensity distance being an intensity increment of 0.1. Optionally, both (1) and (2) can both be applied to generated augmented image(s) for both facial expressions and intensity.

[0032] Optionally, the expression variation image or expression variation image can be added to the dataset (step 105).

[0033] As is traditional in the field of the disclosed technology, features and embodiments are described, and illustrated in the drawings, in terms of various steps. Those skilled in the art will appreciate that these steps are physically implemented by a computer system including one or more computers implemented with electronic (or optical) circuits such as logic circuits, discrete components, microprocessors, hard-wired circuits, memory elements, wiring connections, and the like. The steps may be implemented by a computer-readable storage medium used in conjunction with the one or more computers, and comprising machine readable instructions stored thereon for causing a computer system to perform the steps. In the case of the one or more computers being implemented by microprocessors or similar, they may be programmed using software (e.g., microcode) to perform various functions discussed herein and may optionally be driven by firmware and/or software. Alternatively, certain steps or procedures may be implemented by dedicated hardware, or as a combination of dedicated hardware to perform some functions and a processor (e.g., one or more programmed microprocessors and associated circuitry) to perform other functions. In this sense, the various steps described can be performed automatically to result in a rapid process for performing facial expression analysis or for augmenting a dataset used in performing facial expression analysis.

[0034] While example embodiments have been illustrated and described above, it will be apparent to those skilled in the art that modifications and variations could be made without departing from the scope of the present invention as defined by the appended claims.

METHOD OF AUGMENTING A DATASET USED IN FACIAL EXPRESSION ANALYSIS

Inventors

Cpc classification

Classification Explorer

G06T7/001

PHYSICS

Classification Explorer

G06V40/175

PHYSICS

Classification Explorer

G06T2210/44

PHYSICS

Classification Explorer

G06V40/174

PHYSICS

Classification Explorer

G06T2207/20081

PHYSICS

Classification Explorer

G06V40/168

PHYSICS

International classification

Classification Explorer

G06V40/16

PHYSICS

Classification Explorer

G06T7/00

PHYSICS

Abstract

Claims

Description