METHOD AND APPARATUS FOR ENCODING FEATURE MAP
20250232563 · 2025-07-17
Assignee
Inventors
Cpc classification
G06V10/454
PHYSICS
G06V10/7715
PHYSICS
G06V10/467
PHYSICS
International classification
G06V10/77
PHYSICS
G06V10/46
PHYSICS
Abstract
Disclosed herein is a method for encoding a feature map. The method may include arranging multiple channels based on similarity therebetween for a feature map having the multiple channels, rearranging the arranged multiple channels so as to be adjacent to each other in a feature map channel having a matrix form, and generating an encoded feature map by converting a feature value corresponding to the feature map channel from a real number to an integer.
Claims
1. A method for encoding a feature map, comprising: packing multiple channels of multi-channel feature map into a single channel feature map; and encoding the single channel feature map, wherein a feature value in the single channel feature map is refined based on an average value and a deviation value of feature values, and wherein a refined feature value is obtained by subtracting the average value from the feature value, and dividing a difference obtained by subtracting the average value from the feature value with the deviation value.
2. The method of claim 1, wherein the average value and the deviation value are explicitly encoded into a bitstream.
3. The method of claim 1, wherein multiple channels of the multi-channel feature map are arranged into multiple rows and multiple columns in the single channel feature map, and wherein, based on a packed position of a channel, one of no-flipping, horizontal flipping, vertical flipping and horizontal-vertical flipping is applied to the channel, the packed position of the channel being represented as i-th row and a j-th column in the single channel feature map.
4. The method of claim 3, wherein packing orders of the multiple channels are determined based on a similarity of each channel.
5. The method of claim 4, wherein the similarity of a channel is determined with reference to a reference channel among the multiple channels.
6. The method of claim 3, wherein in response to both the i and j being even number, the no-flipping is applied to the channel.
7. The method of claim 3, wherein, in response to the i being even number and the j being odd number, the horizontal flipping is applied to the channel.
8. The method of claim 3, wherein, in response to the i being odd-number and the j being even number, the vertical-flipping is applied to the channel.
9. The method of claim 3, wherein, in response to both the i and j being odd number, the horizontal-vertical flipping is applied to the channel.
10. The method of claim 4, wherein the packing orders represent descending order of the similarity of channels.
11. A method for decoding a feature map, comprising: decoding a single channel feature map; and restoring a multi-channel feature map by unpacking the single channel feature map, wherein a decoded feature value in the single channel feature map is refined based on an average value and a deviation value of feature values, and wherein a refined feature value is obtained by multiplying the deviation value to the decoded feature value, and adding the average value to intermediate feature value, resulting from the multiplication.
12. The method of claim 11, wherein the average value and the deviation value are obtained by decoding information from a bitstream.
13. The method of claim 11, wherein channels packed into multiple rows and multiple columns in the single channel feature map is restored as multiple channels of the multi-channel feature map, and wherein, based on a packed position of a channel, one of no-flipping, horizontal flipping, vertical flipping and horizontal-vertical flipping is applied to the channel during a restoration of the multiple channels, the packed position of the channel being represented as i-th row and a j-th column in the single channel feature map.
14. The method of claim 13, wherein in response to both the i and j being even number, the no-flipping is applied to the channel.
15. The method of claim 13, wherein, in response to the i being even number and the j being odd number, the horizontal flipping is applied to the channel.
16. The method of claim 13, wherein, in response to the i being odd-number and the j being even number, the vertical-flipping is applied to the channel.
17. The method of claim 13, wherein, in response to both the i and j being odd number, the horizontal-vertical flipping is applied to the channel.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] The above and other objects, features, and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0038] The advantages and features of the present invention and methods of achieving the same will be apparent from the exemplary embodiments to be described below in more detail with reference to the accompanying drawings. However, it should be noted that the present invention is not limited to the following exemplary embodiments, and may be implemented in various forms. Accordingly, the exemplary embodiments are provided only to disclose the present invention and to let those skilled in the art know the category of the present invention, and the present invention is to be defined based only on the claims. The same reference numerals or the same reference designators denote the same elements throughout the specification.
[0039] It will be understood that, although the terms "first", "second", etc. may be used herein to describe various elements, these elements are not intended to be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element discussed below could be referred to as a second element without departing from the technical spirit of the present invention.
[0040] The terms used herein are for the purpose of describing particular embodiments only, and are not intended to limit the present invention. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises", "comprising", "includes", and/or "including", when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0041] Unless differently defined, all terms used herein, including technical or scientific terms, have the same meanings as terms generally understood by those skilled in the art to which the present invention pertains. Terms identical to those defined in generally used dictionaries should be interpreted as having meanings identical to contextual meanings of the related art, and are not to be interpreted as having ideal or excessively formal meanings unless they are definitively defined in the present specification.
[0042] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, the same reference numerals are used to designate the same or similar elements throughout the drawings, and repeated descriptions of the same components will be omitted.
[0043]
[0044] Referring to
[0045] The communication unit 110 may serve to transmit and receive information required for encoding a feature map through a communication network. Here, the network provides a path via which data is delivered between devices, and may be conceptually understood to encompass networks that are currently being used and networks that have yet to be developed.
[0046] For example, the network may be an IP network, which provides service for transmission and reception of a large amount of data and a seamless data service through an Internet Protocol (IP), an all-IP network, which is an IP network structure that integrates different networks based on IP, or the like.
[0047] Also, the network may be configured as a combination of one or more of a wired network, a Wireless Broadband (WiBro) network, a 3G mobile communication network including WCDMA, a 3.5G mobile communication network including a High-Speed Downlink Packet Access (HSDPA) network and an LTE network, a 4G mobile communication network including LTE advanced, a satellite communication network, and a Wi-Fi network.
[0048] Also, the network may be any one of a wired/wireless local area communication network for providing communication between various kinds of data devices in a limited area, a mobile communication network for providing communication between mobile devices or between a mobile device and the outside thereof, a satellite communication network for providing communication between earth stations using a satellite, and a wired/wireless communication network, or may be a combination of two or more selected therefrom.
[0049] The processor 130 may acquire feature map information pertaining to a feature map. The feature map may be defined using result (feature) values that are output when at least one filter (kernel) is applied to the input of a neural network, and may be represented as a 1D, 2D, or 3D array.
[0050] For example, a 2D feature map may be represented using a width and a length, and a 3D feature map may be represented using a width, a length, and a channel size. Also, the number of features (feature values) of a 2D feature map may be equal to the product of the width and the length thereof, and the number of features (feature values) of a 3D feature map may be equal to the product of the width, the length, and the channel size thereof.
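The feature counts described above can be illustrated with a minimal sketch. The function name `feature_count` and the example sizes are hypothetical and are not part of the specification.

```python
def feature_count(width, length, channels=1):
    """Number of feature values in a feature map of the given shape.

    A 2D feature map is treated as the single-channel case (channels=1).
    """
    return width * length * channels


# 2D feature map: width * length
print(feature_count(4, 3))
# 3D feature map: width * length * channel size
print(feature_count(4, 3, 8))
```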
[0051] The processor 130 may reconfigure the feature map based on the feature map information. When the feature map is configured with multiple channels, the multiple channels may be arranged based on similarity therebetween.
[0052] The processor 130 may rearrange the arranged multiple channels so as to be adjacent to each other in a feature map channel. Here, the feature map channel may have a matrix form. For example, the distances to the multiple channels are calculated based on an origin point, and the multiple channels may be rearranged in the feature map channel in ascending order of distance therefrom.
[0053] The processor 130 may perform a flip with respect to a specific direction of the feature map channel. Here, the flip may indicate symmetric transposition. For example, the processor 130 may perform a flip in the vertical direction of the feature map channel. Alternatively, the processor 130 may perform a flip in the horizontal direction of the feature map channel. Alternatively, the processor 130 may perform a flip in both the vertical and horizontal directions of the feature map channel.
[0054] The processor 130 may convert a feature value corresponding to the rearranged feature map channel from a real number to an integer. For example, the feature value may be converted from a real number to an integer through a normalization process using at least one of the average of feature values, the variance thereof, the minimum value of the range thereof after conversion, and the maximum value of the range thereof after conversion.
[0055] Hereinafter, a method for encoding a feature map, performed by a feature-map-encoding apparatus, will be described.
[0056]
[0057] Referring to
[0058] Here, the feature map may be the target to be encoded, and may be the output of a specific layer of a neural network. The feature map information may include at least one of the layer number of the neural network, the width of the feature map, the length of the feature map, the channel length of the feature map, the channel number of the feature map, and the difference of number of channels of the feature map.
[0059]
[0060] As shown in
[0061] Using a feature map channel index included in feature map information, a specific channel, or a specific channel number 1, 2, 3, 4, . . . , or k may be designated. For example, the first channel C1 has a channel index (channel_idx) of 1, and the second channel C2 has a channel index of 2. Accordingly, the channel of the feature map corresponding to a certain channel may be inferred using the feature map channel index.
[0062] The apparatus 100 for encoding a feature map may arrange the multiple channels based on similarity therebetween. The apparatus 100 for encoding a feature map may determine the similarity of feature values between a reference channel and each of the multiple channels and arrange the multiple channels in descending order of similarity.
[0063] For example, the k channels may be sorted into the most similar channel, the second most similar channel, the k-th most similar channel, and the like by determining the similarity between the reference channel and the k channels.
[0064] Assuming that the reference channel is the first channel, the k channels may be sorted into the channel that is most similar to the reference channel, the channel that is second most similar thereto, the channel that is k-th most similar thereto, and so on using at least one of peak signal-to-noise ratio (PSNR) and mean squared error (MSE).
[0065] Here, the sorted multiple channels may be arranged according to a feature map group sequence or in descending order of similarity.
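The similarity-based arrangement described above may be sketched as follows. This is only an illustration: the helper names `mse` and `order_by_similarity` are hypothetical, channels are indexed from zero rather than from one, and treating a lower MSE as a higher similarity is an assumption of the sketch.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D channels."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def order_by_similarity(channels, ref_idx=0):
    """Return channel indices in descending order of similarity to the reference.

    Lower MSE against the reference channel is treated as higher similarity
    (an assumption of this sketch).
    """
    ref = channels[ref_idx]
    return sorted(range(len(channels)), key=lambda k: mse(ref, channels[k]))


channels = [
    [[0.0, 0.0], [0.0, 0.0]],  # reference channel
    [[5.0, 5.0], [5.0, 5.0]],  # far from the reference
    [[1.0, 1.0], [1.0, 1.0]],  # close to the reference
]
order = order_by_similarity(channels)
print(order)
```

The reference channel itself comes first because its error against itself is zero.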
[0066] Referring again to
[0067]
[0068] Referring to
[0069] The apparatus 100 for encoding a feature map according to an embodiment may rearrange the multiple channels CC1, CC2, . . . , CCk so as to be adjacent to each other in the upward, downward, leftward, and rightward directions in the feature map channel MM1 based on the distance from an origin point.
[0070] For example, when the position at which each of the multiple channels is disposed on an image is represented as 2D integer coordinates (i, j), a first distance d_{1_1} or a second distance d_{1_2} from the origin point to each of the rearranged multiple channels is calculated as shown in Equations (1) and (2), and the multiple channels may be arranged in ascending order of first distance d_{1_1} or second distance d_{1_2}.
[0071] For example, the first distance d_{1_1} may be calculated by adding the absolute value of the difference between the i-th row and the origin point and the absolute value of the difference between the j-th column and the origin point. The second distance d_{1_2} may be calculated by adding the square of the difference between the i-th row and the origin point and the square of the difference between the j-th column and the origin point.
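Read literally, the verbal description of the two distances corresponds to the following expressions. This is a reconstruction from the wording above, not a reproduction of the published equations; the symbols i_0 and j_0 for the row and column of the origin point are assumed notation.

```latex
\begin{align}
d_{1\_1} &= \lvert i - i_0 \rvert + \lvert j - j_0 \rvert \tag{1} \\
d_{1\_2} &= (i - i_0)^2 + (j - j_0)^2 \tag{2}
\end{align}
```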
[0072] Accordingly, the rearranged first channel CC1, second channel CC2, . . . , k-th channel CCk may be sequentially arranged in a diagonal direction in the feature map channel.
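A minimal sketch of the distance-based placement is given below. The function names, the default origin at (0, 0), and the row-then-column tie-break are assumptions made for illustration; the specification does not state how ties between equally distant positions are resolved.

```python
def positions_by_distance(rows, cols, squared=False, origin=(0, 0)):
    """List grid positions (i, j) in ascending order of distance from origin.

    squared=False uses the sum of absolute differences (first distance);
    squared=True uses the sum of squared differences (second distance).
    Ties are broken by row, then column (an assumption of this sketch).
    """
    oi, oj = origin

    def dist(pos):
        di, dj = abs(pos[0] - oi), abs(pos[1] - oj)
        return di * di + dj * dj if squared else di + dj

    grid = [(i, j) for i in range(rows) for j in range(cols)]
    return sorted(grid, key=lambda p: (dist(p), p))


def pack_order(sorted_channel_ids, rows, cols, squared=False):
    """Map each already-sorted channel id to its (i, j) slot in the grid."""
    slots = positions_by_distance(rows, cols, squared)
    return dict(zip(sorted_channel_ids, slots))


# Channel ids are assumed to be sorted by similarity beforehand.
layout = pack_order([0, 2, 1, 3], rows=2, cols=2)
print(layout)
```

With the first distance, positions are visited along successive anti-diagonals, which matches the diagonal arrangement described above.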
[0073]
[0074] Embodiments 1 and 2 are images in which multiple channels are arranged in a feature map channel so as to be adjacent to each other based on a first distance or a second distance, and Comparison Examples 1 and 2 are images in which the multiple channels are sequentially arranged in a feature map channel in a row direction.
[0075] Referring to
[0076] Also, referring to
[0077] Referring again to
[0078] For example, a large discontinuity may appear in a boundary area in the feature map channel depending on the form of the feature map channel. Accordingly, in an embodiment, a flip may be performed such that the boundary area in which the discontinuity appears has an image that is as similar as possible to that in the area adjacent thereto.
[0079] The apparatus 100 for encoding a feature map according to an embodiment may perform a flip in a horizontal direction, a vertical direction, or both horizontal and vertical directions.
[0080]
[0081] As shown in
[0082] For example, when the i-th row is an even-numbered row and the j-th column is an odd-numbered column, a flip may be performed in the vertical direction V of the image. Also, after the flip is performed in the vertical direction V of the image, when the i-th row is an even-numbered row and the j-th column is an even-numbered column, a flip may be performed in the horizontal direction H of the image.
[0083] Also, when the i-th row is an odd-numbered row and the j-th column is an even-numbered column, a flip may be performed in the horizontal direction H of the image. Also, when the i-th row is an odd-numbered row and the j-th column is an odd-numbered column, a flip may not be performed.
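The position-dependent flip may be sketched as follows. The sketch uses the parity rule in the form given in claims 6 to 9 and treats i and j as zero-based, which is an assumption; counting rows and columns from one gives the cases described in the two paragraphs above. The function names are hypothetical.

```python
def flip_horizontal(ch):
    """Mirror a 2D channel left to right."""
    return [row[::-1] for row in ch]


def flip_vertical(ch):
    """Mirror a 2D channel top to bottom."""
    return [row[:] for row in ch[::-1]]


def flip_for_position(ch, i, j):
    """Apply the parity-dependent flip for grid slot (i, j), zero-based.

    even i, even j: no flip          even i, odd j: horizontal flip
    odd i, even j: vertical flip     odd i, odd j: horizontal and vertical flip
    """
    out = [row[:] for row in ch]
    if j % 2 == 1:
        out = flip_horizontal(out)
    if i % 2 == 1:
        out = flip_vertical(out)
    return out


tile = [[1, 2],
        [3, 4]]
for i in range(2):
    for j in range(2):
        print((i, j), flip_for_position(tile, i, j))
```

Because each flip is its own inverse, applying the same function again at the same position restores the original channel, which is what a decoder would need.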
[0084]
[0085] Embodiments 3 and 4 are images in which a flip is performed, and Comparison Examples 3 and 4 are images in which a flip is not performed.
[0086] Referring to
[0087] Also, referring to
[0088] Referring again to
[0089] In a neural network structure, a feature (value) in a feature map may be represented as either a real number or an integer having a predetermined range. For example, when a feature map has a single channel, the channel of the feature map is configured with a predetermined number of feature values, and the predetermined number may be n × m. Here, the range for the real number may be -2^128 to 2^128, and the range for the integer may be any one of 0 to 255, 0 to 511, and 0 to 1023.
[0090] In the encoding process according to an embodiment, the features (values) of the feature map are converted from real numbers to integers, after which encoding may be performed.
[0091] Through a normalization process using at least one of the average of the feature values, the variance thereof, the minimum value of the range thereof after conversion, and the maximum value of the range thereof after conversion, the feature values may be converted from real numbers to integers.
[0092] For example, through a normalization process using at least one of the average of the feature values (cast_avg), the variance thereof (cast_var), the minimum value of the range thereof after conversion (cast_min), and the maximum value of the range thereof after conversion (cast_max), a predetermined real-number feature value may be converted into an integer feature value ranging from 0 to 255, as shown in Equation (4).
[0093] Here, the average of the feature values (cast_avg) may be the average of the feature values in the entire feature map or a single feature map channel.
[0094] The variance of the feature values (cast_var) may be the variance of the feature values in the entire feature map or a single feature map channel.
[0095] Accordingly, in an embodiment, at least one of the average of the feature values, the variance thereof, the minimum value of the range thereof after conversion, and the maximum value of the range thereof after conversion may be signaled in order to encode the feature map.
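One possible form of such a conversion is sketched below. It is not Equation (4), which is not reproduced in this text. The sketch assumes that values are standardized with cast_avg and the square root of cast_var, clamped to a hypothetical `clip_sigma` number of standard deviations, and then mapped linearly onto the range from cast_min to cast_max. The function name `cast_to_int` and the `clip_sigma` parameter are illustrative only.

```python
import math


def cast_to_int(values, cast_min=0, cast_max=255, clip_sigma=3.0):
    """Convert real-valued features to integers in [cast_min, cast_max].

    Returns the integer values together with cast_avg and cast_var, which
    would have to be signaled so that a decoder can invert the mapping.
    """
    cast_avg = sum(values) / len(values)
    cast_var = sum((v - cast_avg) ** 2 for v in values) / len(values)
    std = math.sqrt(cast_var) or 1.0  # avoid division by zero for constant maps

    out = []
    for v in values:
        z = (v - cast_avg) / std                      # standardize
        z = max(-clip_sigma, min(clip_sigma, z))      # clamp outliers (assumption)
        unit = (z + clip_sigma) / (2.0 * clip_sigma)  # map to [0, 1]
        out.append(int(round(cast_min + unit * (cast_max - cast_min))))
    return out, cast_avg, cast_var


ints, avg, var = cast_to_int([-1.0, 0.0, 1.0])
print(ints, avg, var)
```

Clamping and rounding make this mapping lossy, so a decoder can only recover an approximation of the original real values.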
[0096] Although the apparatus and method for encoding a feature map are described in an embodiment, an apparatus and method for decoding a feature map may also be provided.
[0097] For example, the apparatus for decoding a feature map may include a communication unit, a processor, and memory, and may perform the method for decoding a feature map. The apparatus for decoding a feature map may acquire feature map information pertaining to an encoded feature map. The apparatus for decoding a feature map may inversely reconfigure the encoded feature map based on the feature map information. The apparatus for decoding a feature map converts the feature values of the inversely reconfigured feature map from integers to real numbers, thereby generating a decoded feature map.
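The relation between the encoder-side and decoder-side refinement, as worded in claims 1 and 11, may be sketched as follows. The function names `refine` and `restore` and the example values are hypothetical; integer rounding and entropy coding between the two steps are omitted from the sketch.

```python
def refine(values, avg, dev):
    """Encoder side (claim 1): subtract the average, then divide by the deviation."""
    return [(v - avg) / dev for v in values]


def restore(refined, avg, dev):
    """Decoder side (claim 11): multiply by the deviation, then add the average."""
    return [r * dev + avg for r in refined]


features = [0.5, 2.0, 3.5]
avg, dev = 2.0, 1.5  # values assumed to be carried in the bitstream
decoded = restore(refine(features, avg, dev), avg, dev)
print(decoded)
```

Without an intermediate rounding step the two operations are exact inverses up to floating-point error; with rounding to integers, the decoded values are approximations.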
[0098]
[0099] Referring to
[0100] The processor 1010 may be a central processing unit or a semiconductor device for executing a program or processing instructions stored in the memory or the storage. The processor 1010 is a kind of central processing unit, and may control the overall operation of the apparatus 100 for encoding a feature map.
[0101] The processor 1010 may include all kinds of devices capable of processing data. Here, the processor may be, for example, a data-processing device embedded in hardware, which has a physically structured circuit in order to perform functions represented as code or instructions included in a program. Examples of the data-processing device embedded in hardware may include processing devices such as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and the like, but are not limited thereto.
[0102] The memory 1030 may store various kinds of data for overall operation, such as a control program, and the like, for performing a method for encoding a feature map according to an embodiment. Specifically, the memory may store multiple applications running in the apparatus for encoding a feature map and data and instructions for operation of the apparatus for encoding a feature map.
[0103] The memory 1030 and the storage 1060 may be storage media including at least one of a volatile medium, a nonvolatile medium, a detachable medium, a non-detachable medium, a communication medium, and an information delivery medium. For example, the memory 1030 may include ROM 1031 or RAM 1032.
[0104] According to the present invention, a feature map may be effectively encoded and decoded in a feature extraction process using a neural network model.
[0105] Also, the present invention may provide technology that enables a feature map to be effectively used in a neural network.
[0106] Also, according to the present invention, similar images may be arranged in a column direction by rearranging multiple channels so as to be adjacent to each other in a feature map channel.
[0107] Also, according to the present invention, a flip is performed in a boundary area of images, whereby a discontinuity may be prevented from appearing in the boundary area of the images.
[0108] Although specific embodiments have been described in the specification, they are not intended to limit the scope of the present invention. For conciseness of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects thereof may be omitted. Also, lines connecting components or connecting members illustrated in the drawings show functional connections and/or physical or circuit connections, and may be represented as various functional connections, physical connections, or circuit connections that are capable of replacing or being added to an actual device. Also, unless specific terms, such as "essential", "important", or the like, are used, corresponding components may not be absolutely necessary.
[0109] Accordingly, the spirit of the present invention should not be construed as being limited to the above-described embodiments, and the entire scope of the appended claims and their equivalents should be understood as defining the scope and spirit of the present invention.