System of visualizing and querying data using data-pearls
11580140 · 2023-02-14
Assignee
Inventors
- Kamalakar Karlapalem (Hyderabad, IN)
- Nahil Jain (Hyderabad, IN)
- Ayush Jain (Hyderabad, IN)
- Nikhil Gogate (Hyderabad, IN)
Cpc classification
G06F16/283
PHYSICS
G06F16/2425
PHYSICS
G06F16/2428
PHYSICS
International classification
G06F7/00
PHYSICS
G06F16/28
PHYSICS
G06F16/2458
PHYSICS
Abstract
A system and method for visualizing and querying high dimensional data for a user. The system includes a user device and a data-pearls visualization and querying server. The server obtains the high dimensional data from the user device associated with the user. The server generates data clusters and sub-divides the data clusters into non-overlapping subsets, called data-pearls, using a clustering technique. The server selects a shape for each data-pearl by comparing a distance between the centroid of the data-pearl and the point farthest from that centroid using L.sub.p norm distance measures. The server configures each data-pearl in a three-dimensional plot. The server enables the user to visualize the data-pearls on a screen of the user device. The server queries data based on a query using a data dimension technique. The server dimensions data related to the query through determined classifiers based on filtered data after pruning data unrelated to the query.
Claims
1. A computer-implemented method for generating visual representations of high dimensional data using a plurality of data-pearls to a user, said method performed by one or more hardware processors comprising: implementing a clustering technique that is configured to extract at least one data cluster from the high dimensional data and sub-dividing the at least one data cluster that is extracted into the plurality of data-pearls such that the plurality of data-pearls are not partitioned further and are not overlapped with one another; implementing L.sub.p norm distance measures that is configured to select a L.sub.p norm shape for each data-pearl of the at least one data cluster by comparing an average distance (Ap) between a centroid of the plurality of data-pearls and all points in the plurality of data-pearls to obtain a minimum average distance, wherein the minimum average distance is fixed as p; constructing the L.sub.p norm shape by providing an envelope for the plurality of data-pearls using the L.sub.p norm distance measures by defining the envelope using the L.sub.p norm distance measures such that L.sub.p(x,y)=r, x, and y are a set of points on the envelope based on a given distance value r, wherein the given distance value r is equal to the minimum average distance; representing a high-dimensional space occupied by the high-dimensional data as sectors in a plane of a three-dimensional plot by defining an origin of the three-dimensional plot as centroid of the at least one data cluster and a range of axes of the three-dimensional plot based on a distance between a center of each data-pearl, the centroid of the at least one data cluster and a radius of each data-pearl; plotting each data-pearl in the L.sub.p norm shape in a corresponding sector of the plane of the three-dimensional plot using a radius of the L.sub.p norm shape, a sector angle in the three-dimensional plot, and an elevation angle; and generating visual representations of the plurality of data-pearls of the high dimensional data in plotted L.sub.p norm shapes in the sectors of the plane of the three-dimensional plot using a graphics module of a user device associated with the user.
2. The method of claim 1, further comprising querying the high-dimensional data based on a query provided by the user using at least one dimension, by (i) determining the plurality of data-pearls and placing the plurality of data-pearls separately using domain values of the at least one dimension, (ii) implementing a concept search that is configured to select the plurality of data-pearls based on the query provided by the user, (iii) incrementing the plurality of data-pearls based on a volume of at least one dataset, wherein the at least one dataset comprises the plurality of data-pearls, wherein the plurality of data-pearls comprises at least one attribute, (iv) filtering the high-dimensional data based on the plurality of data-pearls, (v) selecting data related to the query and performing steps from (i) to (iv) repetitively, and (vi) dimensioning the data related to the query through determined classifiers based on filtered data after pruning unrelated data to the query.
3. The method of claim 1, wherein the farthest point is computed from the centroid of corresponding data-pearl based on Lp norm distance measures.
4. The method of claim 1, wherein the elevation angle is an angle between the centroid of the plurality of data-pearls and a unit vector in dimension with maximum standard deviation.
5. The method of claim 1, further comprising identifying the L.sub.p norm shape from a given set of L.sub.p norm shapes as determined by the closed sets formed by points having at a most distance using the L.sub.p norm distance measures.
6. The method of claim 1, wherein each Lp norm distance measure is defined as a positive square root of sum of all squares of the distance in each dimension related to the high-dimensional data.
7. One or more non-transitory computer-readable storage medium storing the one or more sequence of instructions, which when executed by the one or more processors, causes to perform a method of generating visual representations of high dimensional data using a plurality of data-pearls to a user, wherein the method comprises: implementing a clustering technique that is configured to extract at least one data cluster from the high dimensional data and sub-dividing the at least one data cluster that is extracted into the plurality of data-pearls such that the plurality of data-pearls is not partitioned further and is not overlapped with one another; implementing L.sub.p norm distance measures that is configured to select a L.sub.p norm shape for each data-pearl of the at least one data cluster by comparing an average distance (Ap) between a centroid of the plurality of data-pearls and all points in the plurality of data-pearls to obtain a minimum average distance, wherein the minimum average distance is fixed as p; constructing the L.sub.p norm shape by providing an envelope for the plurality of data-pearls using the L.sub.p norm distance measures by defining the envelope using the L.sub.p norm distance measures such that L.sub.p(x,y)=r, x, and y are a set of points on the envelope based on a given distance value r, wherein the given distance value r is equal to the minimum average distance; representing a high-dimensional space occupied by the high-dimensional data as sectors in a plane of a three-dimensional plot by defining an origin of the three-dimensional plot as centroid of the at least one data cluster and a range of axes of the three-dimensional plot based on a distance between a center of each data-pearl, the centroid of the at least one data cluster and a radius of each data-pearl; plotting each data-pearl in the L.sub.p norm shape in a corresponding sector of the plane of the three-dimensional plot using a radius of the L.sub.p norm shape, a sector angle in the three-dimensional plot, and an elevation angle; and generating visual representations of the plurality of data-pearls of the high dimensional data in plotted L.sub.p norm shapes in the sectors of the plane of the three-dimensional plot using a graphics module of a user device associated with the user.
8. A system for generating visual representations of high dimensional data using a plurality of data-pearls to a user comprising: a device processor; and a non-transitory computer-readable storage medium storing one or more sequences of instructions, which when executed by the device processor, causes: implement a clustering technique that is configured to extract at least one data cluster from the high dimensional data and sub-dividing the at least one data cluster that is extracted into the plurality of data-pearls such that the plurality of data-pearls is not partitioned further and is not overlapped with one another; implement L.sub.p norm distance measures that is configured to select a L.sub.p norm shape for each data-pearl of the at least one data cluster by comparing an average distance (Ap) between a centroid of the plurality of data-pearls and all points in the plurality of data-pearls to obtain a minimum average distance, wherein the minimum average distance is fixed as p; construct the L.sub.p norm shape by providing an envelope for the plurality of data-pearls using the L.sub.p norm distance measures by defining the envelope using the L.sub.p norm distance measures such that L.sub.p(x,y)=r, x, and y are a set of points on the envelope based on a given distance value r, wherein the given distance value r is equal to the minimum average distance; represent a high-dimensional space occupied by the high-dimensional data as sectors in a plane of a three-dimensional plot by defining an origin of the three-dimensional plot as centroid of the at least one data cluster and a range of axes of the three-dimensional plot based on a distance between a center of each data-pearl, the centroid of the at least one data cluster and a radius of each data-pearl; plot each data-pearl in the L.sub.p norm shape in a corresponding sector of the plane of the three-dimensional plot using a radius of the L.sub.p norm shape, a sector angle in the three-dimensional plot, and an elevation angle; and generate visual representations of the plurality of data-pearls of the high dimensional data in plotted L.sub.p norm shapes in the sectors of the plane of the three-dimensional plot using a graphics module of a user device associated with the user.
9. The system of claim 8, wherein the system further performs querying the high-dimensional data based on a query using the at least one of data-pearl, by, (i) determining the plurality of data-pearls and placing the plurality of data-pearls separately using domain values of the at least one dimension, (ii) implementing a concept search that is configured to select the plurality of data-pearls based on the query provided by the user, (iii) incrementing the plurality of data-pearls based on a volume of at least one dataset, wherein the at least one dataset comprises the plurality of data-pearls, wherein the plurality of data-pearls comprises at least one attribute, (iv) filtering the high-dimensional data based on the plurality of data-pearls, wherein the at least one dataset comprises the plurality of data-pearls and the plurality of data-pearls comprises at least one attribute, (v) selecting data related to the query and performing steps from (i) to (iv) repetitively, and (vi) dimensioning the data related to the query through determined classifiers based on filtered data after pruning unrelated data to the query.
10. The system of claim 8, wherein the farthest point is computed from the centroid of corresponding data-pearl based on Lp norm distance measures.
11. The system of claim 8, wherein the elevation angle is an angle between the centroid of the plurality of data-pearls and a unit vector in dimension with maximum standard deviation.
12. The system of claim 8, further comprising identifying the L.sub.p norm shape from a given set of L.sub.p norm shapes as determined by the closed sets formed by points having at a most distance using the L.sub.p norm distance measures.
13. The system of claim 8, wherein each Lp norm distance measure is defined as a positive square root of sum of all squares of the distance in each dimension related to the high-dimensional data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DETAILED DESCRIPTION OF THE DRAWINGS
(10) The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
(11) As mentioned, there remains a need for a system and method for visual querying on high dimensional data using data clusters that are data-pearls. Referring now to the drawings, and more particularly to
(12)
(13) The data-pearls visualization and querying server 108 obtains high dimensional data from the user device 104 associated with the user 102. The high dimensional data may be given to the data-pearls visualization and querying server 108 through a user interface associated with the user device 104. In some embodiments, the high dimensional data includes data in one or more dimensions.
(14) The high dimensional data may include at least one dataset. The at least one dataset may include at least one data cluster. The data-pearls visualization and querying server 108 extracts at least one data cluster from the high dimensional data using a clustering technique. The data-pearls visualization and querying server 108 sub-divides the at least one data cluster into at least one data-pearl using the clustering technique. The at least one data-pearl is a subset of the at least one data cluster. No data-pearl is partitioned further, and no two data-pearls overlap. In some embodiments, the clustering technique may be a K-means clustering technique. The data-pearls visualization and querying server 108 selects an L.sub.p norm shape for each data-pearl. The L.sub.p norm shape is selected by comparing a distance between a centroid of the at least one data-pearl and the point farthest from that centroid using L.sub.p norm distance measures. The farthest point is computed from the centroid of the data-pearl using the L.sub.p norm distance measures. In some embodiments, the L.sub.p norm shapes may be a cube, sphere, etc. In some embodiments, each L.sub.p norm distance measure may be defined as a positive square root of the sum of all squares of the distance in each dimension related to the high-dimensional data. In some embodiments, the L.sub.p norm shape is selected using distance measures other than the L.sub.p norm distance measures.
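By way of a non-limiting illustration, the sub-division into data-pearls and the comparison of L.sub.p norm distances described above may be sketched in Python as follows. The function names (lp_distance, kmeans, data_pearls, compare_norms), the candidate values p = 1, 2 and infinity, and the fixed number of data-pearls per cluster are assumptions made for this sketch and are not taken from the embodiments. The rule for choosing among the compared distances is left to the caller.

```python
import math
import random


def lp_distance(a, b, p):
    """L_p distance between two points; p may be math.inf."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns disjoint, non-empty groups of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k), key=lambda i: lp_distance(pt, centers[i], 2))
            groups[nearest].append(pt)
        # Keep the previous center when a group received no points.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]


def data_pearls(points, n_clusters, pearls_per_cluster):
    """Extract clusters, then sub-divide each one into non-overlapping data-pearls."""
    pearls = []
    for cluster in kmeans(points, n_clusters):
        k = min(pearls_per_cluster, len(cluster))  # assumed fixed split per cluster
        pearls.extend(kmeans(cluster, k))
    return pearls


def compare_norms(pearl, ps=(1, 2, math.inf)):
    """Farthest-point distance from the data-pearl centroid under each candidate p.

    In three dimensions p = 2 corresponds to a sphere and p = inf to a cube.
    """
    c = centroid(pearl)
    return {p: max(lp_distance(pt, c, p) for pt in pearl) for p in ps}
```

For a data-pearl whose points are the four corners of a square of side 2, compare_norms returns a farthest-point distance of 2 for p = 1, about 1.414 for p = 2, and 1 for p = infinity.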
(15) The data-pearls visualization and querying server 108 configures each data-pearl in the L.sub.p norm shape in a three-dimensional plot using a radius of the L.sub.p norm shape, a sector angle, and an elevation angle. In some embodiments, the elevation angle is an angle between the centroid of the at least one data-pearl and a unit vector in the dimension with maximum standard deviation. The data-pearls visualization and querying server 108 defines an origin of the three-dimensional plot as a centroid of the at least one data cluster. The data-pearls visualization and querying server 108 defines a range of axes of the three-dimensional plot using a distance between a center of each data-pearl, the centroid of the at least one data cluster, and a radius of each data-pearl. The data-pearls visualization and querying server 108 represents a high-dimensional space occupied by the high-dimensional data as sectors in a plane of the three-dimensional plot. The data-pearls visualization and querying server 108 plots the at least one data-pearl lying in the high dimensional space in a corresponding sector of the three-dimensional plot to construct the L.sub.p norm shape.
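As a non-limiting sketch of the placement described above, the following Python code computes an elevation angle for a data-pearl and converts a distance, a sector angle, and an elevation angle into coordinates of the three-dimensional plot. The function names elevation_angle and place are hypothetical. Measuring the centroid relative to the plot origin, and treating the elevation angle as the angle above the XY plane in a spherical-to-Cartesian conversion, are assumptions of this sketch rather than details stated in the embodiments.

```python
import math


def elevation_angle(pearl, origin):
    """Angle between the data-pearl centroid (relative to origin) and the unit
    vector of the dimension in which the data-pearl has maximum standard deviation."""
    n = len(pearl)
    dims = list(zip(*pearl))
    center = [sum(col) / n for col in dims]
    # Population standard deviation of the data-pearl in each dimension.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(dims, center)]
    k = max(range(len(stds)), key=stds.__getitem__)
    vec = [c - o for c, o in zip(center, origin)]
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return 0.0  # centroid coincides with the origin; angle is undefined
    cosine = max(-1.0, min(1.0, vec[k] / norm))  # guard against rounding
    return math.acos(cosine)


def place(distance, sector_angle, elevation):
    """Position in the 3D plot from a distance, a sector angle in the XY plane,
    and an elevation angle (assumed here to be measured above that plane)."""
    x = distance * math.cos(elevation) * math.cos(sector_angle)
    y = distance * math.cos(elevation) * math.sin(sector_angle)
    z = distance * math.sin(elevation)
    return (x, y, z)
```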
(16) The data-pearls visualization and querying server 108 constructs the L.sub.p norm shape by providing an envelope for the at least one data-pearl using the L.sub.p norm distance measures. The data-pearls visualization and querying server 108 defines the envelope using the L.sub.p norm distance measures such that L.sub.p(x,y)=r, where x and y are a set of points on the envelope for a given distance value r, and the envelope takes the L.sub.p norm shape determined by the value of p. The data-pearls visualization and querying server 108 identifies the L.sub.p norm shape from a given set of L.sub.p norm shapes as determined by the closed sets formed by points lying within at most a given distance using the L.sub.p norm distance measures.
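To illustrate how the value of p determines the shape of the envelope, the following Python sketch samples points at L.sub.p distance r from a center in two dimensions. The function name lp_envelope_2d, the restriction to a two-dimensional cross-section, and the angular parametrization are assumptions made for illustration only.

```python
import math


def lp_distance(a, b, p):
    """L_p distance between two points; p may be math.inf."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def lp_envelope_2d(center, r, p, n=64):
    """Return n points whose L_p distance from center equals r.

    p = 1 traces a diamond, p = 2 a circle, and p = inf a square.
    """
    cx, cy = center
    points = []
    for i in range(n):
        t = 2 * math.pi * i / n
        c, s = math.cos(t), math.sin(t)
        if p == math.inf:
            m = max(abs(c), abs(s))  # scale so the larger coordinate reaches r
            points.append((cx + r * c / m, cy + r * s / m))
        else:
            e = 2.0 / p  # then |x|^p + |y|^p = cos^2 + sin^2 = 1
            x = math.copysign(abs(c) ** e, c)
            y = math.copysign(abs(s) ** e, s)
            points.append((cx + r * x, cy + r * y))
    return points
```

Every point returned satisfies lp_distance(point, center, p) = r up to floating-point rounding, so the sampled points lie on the envelope for the chosen p.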
(17) For example, the four-dimensional (4D) space of four-dimensional data is divided into 16 parts by the w, x, y, and z axes. In some embodiments, orthogonal axes divisions of the high dimensional space are represented as sectors in the XY plane of the three-dimensional plot. Correspondingly, the XY plane of the three-dimensional plot is also divided into sectors. In some embodiments, the data-pearls lying in various axes are plotted in the corresponding sector in the three-dimensional plot while constructing an overall shape of the clusters. The three-dimensional plot may provide a projection of the data-pearl position on the XY plane. The projection of the data-pearl is used for accessing the Z-axis. The projection of the data-pearl position is obtained using the three-dimensional plot. In some embodiments, the three-dimensional plot is elevated by the elevation angle.
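The correspondence between axis divisions and sectors may be sketched as follows. A space of d dimensions has 2 to the power d such divisions (16 for d = 4), and each may be assigned an equal angular sector of the XY plane. The function names, the ordering of the divisions by sign pattern, and the treatment of zero coordinates as non-negative are assumptions of this sketch.

```python
import math


def orthant_index(point):
    """Index of the axis division containing point: bit i is set when
    coordinate i is negative (zero is treated as non-negative)."""
    return sum(1 << i for i, v in enumerate(point) if v < 0)


def sector_bounds(index, dims):
    """Start and end angle, in radians, of the XY-plane sector assigned to
    the given division when the plane is split into 2**dims equal sectors."""
    width = 2 * math.pi / (2 ** dims)
    return (index * width, (index + 1) * width)
```

For four-dimensional data, the point (-1, 2, -3, 4) falls in division 5 under this ordering, and each of the 16 sectors spans an angle of pi/8.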
(18) The data-pearls visualization and querying server 108 enables the user 102 to visualize at least one data-pearl of the high dimensional data, using a graphics module, on a screen of the user device 104.
(19) The data-pearls visualization and querying server 108 enables the user 102 to search the high-dimensional data based on a query provided by the user 102 using at least one dimension, by (i) determining the at least one data-pearl and placing it in a separated manner using domain values of the at least one dimension, (ii) selecting, using a concept search, the at least one data-pearl based on the query provided by the user 102, (iii) incrementing the at least one data-pearl based on a volume of the at least one dataset, (iv) filtering the high-dimensional data based on the at least one data-pearl, wherein the at least one dataset comprises the at least one data-pearl and the at least one data-pearl comprises at least one attribute, (v) selecting data related to the query and performing steps (i) to (iv) iteratively and recursively, and (vi) dimensioning data related to the query through determined classifiers based on filtered data after pruning data unrelated to the query.
(20) In some embodiments, the at least one attribute of the user 102 may include acceleration, strength, finishing, heading, diving, etc. For example, in some embodiments, the query in the concept search may be “finding all the football players who are good strikers and have moderate heading accuracy”. The data-pearls visualization and querying server 108 selects the at least one attribute based on the query. In some embodiments, using the query stated above, the data-pearls visualization and querying server 108 determines a group of datapoints of all the football players who are “good” at striking, and a group of datapoints of all the football players who are “moderate” in heading accuracy. A data-dimension is selected with the “finishing” data attribute on one axis and the “heading accuracy” data attribute on another axis. The data-pearls visualization and querying server 108 filters the data-pearls using a data dimension technique. The data-pearls visualization and querying server 108 dimensions the data-pearls based on filtered data. The data-pearls visualization and querying server 108 determines a number of classifiers on the Z-axis using the three-dimensional plot. In some embodiments, the Z-axis is a group of datapoints of all the football players who are “moderate” at finishing. The data-pearls visualization and querying server 108 removes irrelevant data-pearls by considering the Z coordinate values. In some embodiments, the steps are recorded as a video in the user device 104.
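The football query above may be illustrated, in a simplified and non-limiting way, by successive filtering on one attribute at a time. The sample records, the attribute names finishing and heading_accuracy, and the numeric bands standing for “good” and “moderate” are all hypothetical. For brevity the sketch filters individual datapoints rather than data-pearls.

```python
# Hypothetical value bands for the concept labels used in the query.
BANDS = {"good": (75, 100), "moderate": (40, 74)}

# Hypothetical sample data: one record per football player.
PLAYERS = [
    {"name": "P1", "finishing": 88, "heading_accuracy": 55},
    {"name": "P2", "finishing": 90, "heading_accuracy": 85},
    {"name": "P3", "finishing": 60, "heading_accuracy": 50},
    {"name": "P4", "finishing": 78, "heading_accuracy": 42},
]


def matches(record, attribute, label):
    """True when the record's attribute value falls inside the labelled band."""
    low, high = BANDS[label]
    return low <= record[attribute] <= high


def run_query(records, conditions):
    """Apply each (attribute, label) condition in turn, pruning unrelated records."""
    selected = list(records)
    for attribute, label in conditions:
        selected = [r for r in selected if matches(r, attribute, label)]
    return selected


# "Good strikers with moderate heading accuracy".
result = run_query(PLAYERS, [("finishing", "good"), ("heading_accuracy", "moderate")])
names = [r["name"] for r in result]
```

With the sample data above, names contains P1 and P4: P2 is pruned because its heading accuracy falls in the “good” band, and P3 because its finishing falls in the “moderate” band.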
(21)
(22) The data-pearl placement module 210 configures each data-pearl in the L.sub.p norm shape in a three-dimensional plot using a radius of the L.sub.p norm shape, a sector angle, and an elevation angle. In some embodiments, the elevation angle is an angle between the centroid of the at least one data-pearl and a unit vector in the dimension with maximum standard deviation. The data-pearl placement module 210 defines an origin of the three-dimensional plot as a centroid of the at least one data cluster and defines a range of axes of the three-dimensional plot using a distance between a center of each data-pearl, the centroid of the at least one data cluster, and a radius of each data-pearl. In some embodiments, the high-dimensional space occupied by the high-dimensional data is represented as sectors in a plane of the three-dimensional plot. The data-pearl placement module 210 plots the at least one data-pearl lying in the high-dimensional space in a corresponding sector of the three-dimensional plot to construct the L.sub.p norm shape.
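One possible way to derive the range of axes from the quantities named above is sketched below. The function name axis_range, the use of Euclidean distance between centers, and the choice of a symmetric range large enough to contain every data-pearl are assumptions of this sketch and are not specified by the embodiments.

```python
import math


def axis_range(cluster_centroid, pearls):
    """Symmetric axis range around the plot origin (the cluster centroid).

    pearls is a list of (center, radius) pairs; the limit is the largest value of
    distance(center, cluster_centroid) + radius, so every data-pearl fits.
    """
    limit = max(math.dist(center, cluster_centroid) + radius
                for center, radius in pearls)
    return (-limit, limit)
```

For a cluster centroid at (0, 0, 0) and data-pearls centered at (3, 4, 0) with radius 1 and at (1, 0, 0) with radius 0.5, the sketch yields a range of -6 to 6 on each axis.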
(23) The visualization module 212 enables the user 102 to visualize at least one data-pearl of the high dimensional data, using a graphics module, on a screen of the user device 104. The querying module 214 enables the user 102 to search the high-dimensional data based on a query provided by the user 102 using at least one dimension, by (i) determining the at least one data-pearl and placing it in a separated manner using domain values of the at least one dimension, (ii) selecting, using a concept search, the at least one data-pearl based on the query provided by the user 102, (iii) incrementing the at least one data-pearl based on a volume of the at least one dataset, wherein the at least one dataset comprises the at least one data-pearl and the at least one data-pearl comprises at least one attribute, (iv) filtering the high-dimensional data based on the at least one data-pearl, (v) selecting data related to the query and performing steps (i) to (iv) repetitively, and (vi) dimensioning data related to the query through determined classifiers based on filtered data after pruning data unrelated to the query.
(24)
(25) In
(26)
(27)
(28)
(29) FIG. 7 illustrates an exploded view of a data-pearls visualization and querying server 108 of
(30) The embodiments herein can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment including both hardware and software elements. The embodiments that are implemented in software include, but are not limited to, firmware, resident software, microcode, etc. Furthermore, the embodiments herein can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
(31) The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
(32) A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
(33) Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, remote controls, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
(34) A representative hardware environment for practicing the embodiments herein is depicted in
(35) The system provides a humanly understandable visualization of high dimensional data, which helps the user to understand the high dimensional data visually. The system provides visual querying on high dimensional data, so querying does not require learning any new language. The system is easy to use. The system provides the ability to determine data of interest through iterative selection and exploration. The system employs a simple technique that has wide applicability for data visualization. The system analyses clustering techniques based on the selection of a number of clusters, which makes the visualization clearer.
(36) The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope.