Methods and systems for building a view of a dataset incrementally according to data types of user-selected data fields
11592955 · 2023-02-28
Assignee
Inventors
- Jock Douglas Mackinlay (Seattle, WA)
- Christopher Richard Stolte (Seattle, WA)
- Patrick Hanrahan (Portola Valley, CA)
Cpc classification
G06F16/283
PHYSICS
G06F16/9535
PHYSICS
G06F16/252
PHYSICS
G06F16/2379
PHYSICS
International classification
G06F16/28
PHYSICS
G06F16/9535
PHYSICS
G06F16/25
PHYSICS
Abstract
A process builds a view of a dataset. The process displays a graphical user interface window, including: a schema display region; a visualization region; and a shelf region that includes multiple shelves. The process detects user input to place a data field icon from the schema display region into the visualization region. Upon ceasing to detect the user input, the process associates the data field icon with a first shelf according to its data type and data types corresponding to other data field icons, if any, previously associated with the shelves, and then places the first data field icon within the first shelf. The method further includes determining a view type based on the data field icon and the association of the data field icon with the first shelf, and generating a graphical representation in the visualization region in accordance with the determined view type.
Claims
1. A computer implemented method for generating a graphical representation of a dataset, comprising: at a computer having one or more processors and memory storing one or more programs configured for execution by the one or more processors: detecting, via a graphical user interface, a first user selection to include a first data field in a graphical representation, the first data field having a first data type; in response to detecting the first user selection: selecting a first visualization type of a graphical representation for visualizing a portion of the dataset based on the first data type, wherein the first visualization type is selected from a plurality of predetermined visualization types; and displaying the graphical representation, having the first visualization type, in the graphical user interface, the graphical representation including a first plurality of visual marks representing data values for data fields that have been selected for inclusion in the graphical representation, including the first data field; detecting, via the graphical user interface, a second user selection to include a second data field in the graphical representation, the second data field having a second data type and the second user selection being distinct from the first user selection; in response to detecting the second user selection: selecting a second visualization type based on the first data type and the second data type, wherein the second visualization type is selected from the plurality of predetermined visualization types and the second visualization type is different from the first visualization type; and displaying an updated graphical representation, having the second visualization type, in the graphical user interface, the updated graphical representation including a second plurality of visual marks representing data values for data fields that have been selected for inclusion in the updated graphical representation, including the first data field and the second data field.
2. The method of claim 1, wherein the second data field is different from the first data field.
3. The method of claim 1, wherein: generating the updated graphical representation comprises adding color encoding to at least some of the second plurality of marks according to data in the second data field.
4. The method of claim 1, wherein the first data type is different from the second data type.
5. The method of claim 1, wherein the first data type is the same as the second data type.
6. The method of claim 1, wherein displaying the graphical representation comprises displaying visual marks, in the graphical representation, that correspond to data from one or more data fields that were previously selected for inclusion in the graphical representation.
7. The method of claim 1, wherein the first data type is selected from the group consisting of ordinal, independent quantitative, and dependent quantitative.
8. The method of claim 1, wherein the first data type is selected from the group consisting of ordinal, ordinal time, dependent ordinal, quantitative, independent quantitative, dependent quantitative, quantitative time, and quantitative position.
9. A computer system for generating graphical representations, comprising: one or more processors; memory; and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for: detecting, via a graphical user interface, a first user selection to include a first data field from a dataset in a graphical representation, the first data field having a first data type; in response to detecting the first user selection: selecting a first visualization type of a graphical representation for visualizing a portion of the dataset based on the first data type, wherein the first visualization type is selected from a plurality of predetermined visualization types; and displaying the graphical representation, having the first visualization type, in the graphical user interface, the graphical representation including a first plurality of visual marks representing data values for data fields that have been selected for inclusion in the graphical representation, including the first data field; detecting, via the graphical user interface, a second user selection to include a second data field in the graphical representation, the second data field having a second data type and the second user selection being distinct from the first user selection; in response to detecting the second user selection: selecting a second visualization type based on the first data type and the second data type, wherein the second visualization type is selected from the plurality of predetermined visualization types and the second visualization type is different from the first visualization type; and displaying an updated graphical representation, having the second visualization type, in the graphical user interface, the updated graphical representation including a second plurality of visual marks representing data values for data fields that have been selected for inclusion in the updated graphical representation, including the first data field and the second data field.
10. A non-transitory computer-readable storage medium storing one or more programs configured for execution by a computer system having one or more processors, and memory, the one or more programs comprising instructions for: detecting, via a graphical user interface, a first user selection to include a first data field from a dataset in a graphical representation, the first data field having a first data type; in response to detecting the first user selection: selecting a first visualization type of a graphical representation for visualizing a portion of the dataset based on the first data type, wherein the first visualization type is selected from a plurality of predetermined visualization types; and displaying the graphical representation, having the first visualization type, in the graphical user interface, the graphical representation including a first plurality of visual marks representing data values for data fields that have been selected for inclusion in the graphical representation, including the first data field; detecting, via the graphical user interface, a second user selection to include a second data field in the graphical representation, the second data field having a second data type and the second user selection being distinct from the first user selection; in response to detecting the second user selection: selecting a second visualization type based on the first data type and the second data type, wherein the second visualization type is selected from the plurality of predetermined visualization types and the second visualization type is different from the first visualization type; and displaying an updated graphical representation, having the second visualization type, in the graphical user interface, the updated graphical representation including a second plurality of visual marks representing data values for data fields that have been selected for inclusion in the updated graphical representation, including the first data field and the second data field.
11. The method of claim 1, wherein the first user selection is any one of: a click on a data field icon corresponding to the first data field, wherein the data field icon is displayed in the graphical user interface; a double click on the data field icon corresponding to the first data field; or typing in a field name corresponding to the first data field.
12. The method of claim 1, wherein the second user selection includes any one of: a click on a data field icon corresponding to a data field of the one or more data fields; a double click on a data field icon corresponding to a data field of the one or more data fields; typing in a field name corresponding to a data field of the one or more data fields; or creating a specification for a set of fields using statistical analysis, historical analysis, or heuristic algorithms.
13. The method of claim 1, wherein the graphical representation is updated in response to the user selecting the second data field and independently of additional user input detected after detection of the user selection of the second data field.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9) Like reference numerals refer to corresponding parts throughout the several views of the drawings.
DETAILED DESCRIPTION OF THE INVENTION
(10) The present invention provides methods, computer program products, and computer systems for automatically providing a user with a clear and useful view of a dataset. In a typical embodiment, the present invention builds and displays a view of a dataset as a user adds fields to the dataset or as a dataset is accessed, such that the view is clear and useful, and is automatically presented to the user. An advantage of the present invention is that data is presented in a clear and useful form automatically.
(11) The present invention operates on a set of data, called a dataset, that is made up of tuples. As one skilled in the art will realize, the dataset can be a relational database, a multidimensional database, a semantic abstraction of a relational database, or an aggregated or unaggregated subset of a relational database, multidimensional database, or semantic abstraction. Fields are categorizations of data in a dataset. A tuple is an item of data (such as a record) from a dataset, specified by attributes from fields in the dataset. A search query across the dataset will return one or more tuples. Each field has a particular type and contains data of that type. These types include:
(12) TABLE-US-00001
Data Type | Symbol
Ordinal | O
Ordinal time (date) | Ot
Dependent ordinal (categorical measure) | Od
Measure names | Om
Quantitative | Q
Independent Quantitative (dimension) | Qi
Dependent Quantitative (measure) | Qd
Measure values | Qm
Quantitative time | Qt
Quantitative position | Qx
Measure names may include an ordinal field whose domain is the name of one or more Qd fields. Measure values may include a dependent quantitative field whose domain and values are the blending of the Qd fields whose names appear in the domain of measure names.
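The type symbols above can be sketched as a small Python enumeration. This is only an illustration: the names `FieldType`, `is_ordinal`, `is_quantitative`, and `blend_measures` are hypothetical and do not come from the patent, and the blending helper reflects one plausible reading of how measure names (Om) and measure values (Qm) relate to the underlying Qd fields.

```python
from enum import Enum


class FieldType(Enum):
    """Type symbols from the data type table (member names are illustrative)."""
    ORDINAL = "O"
    ORDINAL_TIME = "Ot"
    DEPENDENT_ORDINAL = "Od"
    MEASURE_NAMES = "Om"
    QUANTITATIVE = "Q"
    INDEPENDENT_QUANTITATIVE = "Qi"
    DEPENDENT_QUANTITATIVE = "Qd"
    MEASURE_VALUES = "Qm"
    QUANTITATIVE_TIME = "Qt"
    QUANTITATIVE_POSITION = "Qx"


def is_ordinal(field_type: FieldType) -> bool:
    """True for every symbol in the ordinal family (O, Ot, Od, Om)."""
    return field_type.value.startswith("O")


def is_quantitative(field_type: FieldType) -> bool:
    """True for every symbol in the quantitative family (Q, Qi, Qd, Qm, Qt, Qx)."""
    return field_type.value.startswith("Q")


def blend_measures(record: dict, measure_names: list) -> list:
    """Assumed reading of blending: pair each Qd field name (the Om domain)
    with that field's value in the record (the Qm value)."""
    return [(name, record[name]) for name in measure_names]
```

For a record with Qd fields `sales` and `profit`, `blend_measures(record, ["sales", "profit"])` yields one (measure name, measure value) pair per Qd field.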
(13) A view is a visual representation of a dataset or a transformation of that dataset. Text tables, bar charts, and scatter plots are all examples of types of views. Views contain marks that represent one or more tuples in a dataset. In other words, marks are visual representations of tuples in a view. A mark is typically associated with a type of graphical display. Some examples of views and their associated marks are as follows:
(14) TABLE-US-00002
View Type | Associated Mark
Table | Text
Scatter Plot | Shape
Bar Chart | Bar
Gantt Plot | Bar
Line Graph | Line Segment
Circle Graph | Circle
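The association between view types and marks in the table above can be sketched as a simple lookup. The names `ASSOCIATED_MARK` and `choose_mark` are hypothetical; the patent describes a mark chooser but does not say how it is implemented.

```python
# View type -> associated mark, copied from the table above.
ASSOCIATED_MARK = {
    "Table": "Text",
    "Scatter Plot": "Shape",
    "Bar Chart": "Bar",
    "Gantt Plot": "Bar",
    "Line Graph": "Line Segment",
    "Circle Graph": "Circle",
}


def choose_mark(view_type: str) -> str:
    """Return the mark associated with a view type; raises KeyError if unknown."""
    return ASSOCIATED_MARK[view_type]
```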
(15)
(16) In
(17) The computer system modules used to perform this embodiment of the invention are shown in
(18) According to one embodiment of the invention, resulting view selector 110 selects the resulting view by choosing rule(s) for adding the user selected ordered fields (step 208). This is accomplished by rule chooser 114. Rule applier 116 then applies the rule(s) to determine the resulting view's view type (step 210). In another embodiment of the invention, before rule chooser 114 chooses rule(s), view determiner 118 determines whether a first view exists (step 212). In yet another embodiment of the invention, the dataset is displayed in step 206 when mark chooser 126 chooses a mark for the resulting view (step 218), and dataset renderer 128 renders the dataset according to the mark (step 220).
(19)
(20) TABLE-US-00003
Operator | Description | Limitations
= | Assign field to a clause | Left hand side is a column or row
+= | Add field to the end of the clause (some rearrangements may occur) | Right hand side must be O or Qd
*= | Blend field with column or row (blend Qd with first E accepting/containing a Qd). The blend will result in Qm being on column or row, and an Om being added to the view. | Right hand side must be Qd
? | Guard the action. Only add if the column or row accepts the field and the cardinality of the field is less than the cardinality associated with the column or row. | Unary
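A minimal sketch of the four operators, under several assumptions that the table does not settle: a view is modeled as two lists of type symbols (`rows` and `columns`), the Om field produced by a blend is kept in a separate `extras` list, and the blend replaces the first Qd (or existing Qm) it finds. All names here (`View`, `assign`, `append`, `blend`, `guarded`) are hypothetical.

```python
from dataclasses import dataclass, field

SHELVES = ("rows", "columns")


@dataclass
class View:
    rows: list = field(default_factory=list)      # type symbols on the row
    columns: list = field(default_factory=list)   # type symbols on the column
    extras: list = field(default_factory=list)    # assumed home for an added Om


def _clause(view: View, shelf: str) -> list:
    if shelf not in SHELVES:
        raise ValueError("left hand side must be a column or row")
    return getattr(view, shelf)


def assign(view: View, shelf: str, symbol: str) -> None:
    """'=' : assign the field to the clause, replacing its contents."""
    _clause(view, shelf)[:] = [symbol]


def append(view: View, shelf: str, symbol: str) -> None:
    """'+=' : add the field to the end of the clause (O or Qd only)."""
    if symbol not in ("O", "Qd"):
        raise ValueError("right hand side must be O or Qd")
    _clause(view, shelf).append(symbol)


def blend(view: View, shelf: str, symbol: str) -> None:
    """'*=' : blend a Qd into the clause, leaving Qm there and adding Om."""
    if symbol != "Qd":
        raise ValueError("right hand side must be Qd")
    clause = _clause(view, shelf)
    for i, existing in enumerate(clause):
        if existing in ("Qd", "Qm"):
            clause[i] = "Qm"
            break
    else:
        raise ValueError("clause holds no Qd to blend with")
    if "Om" not in view.extras:
        view.extras.append("Om")


def guarded(action, accepts: bool, field_cardinality: int, shelf_cardinality: int) -> bool:
    """'?' : run the action only if the clause accepts the field and the
    field's cardinality is below the cardinality associated with the clause."""
    if accepts and field_cardinality < shelf_cardinality:
        action()
        return True
    return False
```

For example, `guarded(lambda: append(view, "rows", "O"), True, 5, 10)` performs the append, while the same call with a field cardinality of 50 leaves the view unchanged.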
(21) The sets of rules are organized first by the type of the field that is dropped (e.g. O or Qd), and then by the type of the view that the field is being dropped onto. The type of a view depends on its innermost row and column. For example, OO is a view with ordinal fields in the row and column; OQ is a view with an ordinal field in the row and a quantitative field in the column; and φ is an empty view with no fields. For each type of field being dropped, a rule table is shown containing the rules for each type of view into which the field is being dropped. The columns of the rule tables represent the contents of the innermost field on the column (X), and the rows of the rule tables represent the innermost field on the row (Y).
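The view type codes described above (OO, OQ, φ) can be derived from the innermost field on the row and on the column. The sketch below assumes that "innermost" means the last field in each list and that a view with only one empty side uses φ for that side; neither assumption is stated in the text, and the function names are hypothetical.

```python
EMPTY = "φ"


def kind(symbol: str) -> str:
    """Collapse a type symbol such as 'Ot' or 'Qd' to its family, 'O' or 'Q'."""
    if symbol[:1] not in ("O", "Q"):
        raise ValueError(f"unknown type symbol: {symbol!r}")
    return symbol[0]


def view_type_code(rows: list, columns: list) -> str:
    """Return the row family followed by the column family, or φ if empty."""
    if not rows and not columns:
        return EMPTY
    row_kind = kind(rows[-1]) if rows else EMPTY        # innermost row field
    column_kind = kind(columns[-1]) if columns else EMPTY  # innermost column field
    return row_kind + column_kind
```

With an ordinal field on the row and `["O", "Qd"]` on the column, the innermost column field is Qd, so the code is OQ.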
(22) In step 208, rule(s) for adding the user selected field's data type are chosen. For example, if a user selected field is an ordinal, then the set of rules in
(23) Notes for
(24) Conversions for
(25) Notes for
(26) Notes for
(27) Notes for
(28) Notes for
(29) Notes for
(30) The order in which fields are added affects the view type of the resulting view. For example, if a measure data type field is added to an empty view, and is subsequently followed by a dimension data type field, the resulting view will be a bar chart. However, if a dimension data type field is added to an empty view first and a measure data type field is added afterward, then the resulting view will be a text table. The resulting view's view type is thus selected based upon a set of rules. The view type is then assigned to the resulting view and the view is then populated with data from the dataset. In one embodiment, the set of rules is predetermined. In another embodiment, the set of rules is based upon a user's preferences or actual usage. For example, a user may be given the opportunity to designate the best view type for various sequences of the addition of fields to views. Or, after the visual plot is populated and rendered for the user, the user is allowed to choose a different rendering. The user's choice as to the ultimate resulting view, if recorded, may indicate which view type the user considers clear and/or useful. In yet another embodiment, heuristics may be used instead of a set of rules for selecting a resulting view.
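The order dependence can be illustrated with a tiny rule table keyed by the sequence in which field kinds were added. Only the two sequences stated in the paragraph above are included; the names `PREDETERMINED_RULES` and `resulting_view_type` are hypothetical, and returning `None` for an unlisted sequence is an assumption of this sketch.

```python
# Sequence of added field kinds -> resulting view type (from the text above).
PREDETERMINED_RULES = {
    ("measure", "dimension"): "Bar Chart",
    ("dimension", "measure"): "Table",
}


def resulting_view_type(sequence, rules=PREDETERMINED_RULES):
    """Look up the view type for the order in which fields were added."""
    return rules.get(tuple(sequence))


# A user-preference embodiment could overlay recorded choices on the defaults.
user_rules = {**PREDETERMINED_RULES, ("dimension", "measure"): "Bar Chart"}
```

The same two fields therefore produce different views depending on order, and a recorded user preference can override the predetermined outcome.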
(31) In one embodiment, the cardinalities of the fields in the resulting view are computed and are considered in determining how the user selected fields are added. In set theory, cardinality is the size of a set. In the present invention, cardinality refers to the number of distinct instances that are associated with a field's type. For example, if a field type is “States of America”, then the cardinality of such a field would be 50.
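As a minimal sketch, cardinality in this sense can be computed by counting the distinct values a field takes across the tuples. The function name and the dictionary representation of tuples are assumptions made for illustration.

```python
def cardinality(tuples, field_name):
    """Number of distinct values that field_name takes across the tuples."""
    return len({row[field_name] for row in tuples})


rows = [
    {"state": "WA", "rainfall": 38.0},
    {"state": "CA", "rainfall": 22.0},
    {"state": "WA", "rainfall": 38.0},
]
```

Here `cardinality(rows, "state")` counts two distinct states even though there are three tuples.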
(32) In another embodiment, the functional dependency of the fields in the resulting view is computed and is considered in determining how the user selected fields are added. Functional dependency refers to the determination of one field by another field. For example, if one field is of the type “States of America,” and a second field is “Inches of Rainfall of the States of America,” then the second field depends upon the first. Another example is shown in
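One way to test the kind of dependency described in paragraph (32) is to check that every value of the determining field maps to exactly one value of the dependent field. This is a standard check rather than anything the patent specifies; the function name and the dictionary representation of tuples are assumptions.

```python
def functionally_depends(tuples, determinant, dependent):
    """True if each determinant value is paired with only one dependent value."""
    seen = {}
    for row in tuples:
        key, value = row[determinant], row[dependent]
        # setdefault records the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


rainfall = [
    {"state": "WA", "inches": 38.0},
    {"state": "CA", "inches": 22.0},
    {"state": "WA", "inches": 38.0},
]
```

In this sample, `inches` depends on `state`, because each state always appears with the same rainfall figure.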
(33) In yet another embodiment, in the application of the selected rule to populate the resulting view with data from the dataset, a mark is chosen for the resulting view's view type and the data from the dataset is rendered according to the mark. This is shown in
(34)
(35) Now, referring to
(36) In another embodiment, alternative views are formed based upon a set of criteria.
(37) In one embodiment, if the user selected a first option, then the alternative views are ranked according to a rating system by alternative view ranker 134 in step 608. View assignor 120 then assigns the resulting view as the highest ranked alternative view at step 610. Dataset displayer 112 then displays the dataset according to the resulting view in step 606. For example, if all the data in a dataset is aggregated and does not contain any independently quantitative data, then alternative views of all the view types listed in
(38) In another embodiment, if the user selected a second option, then a list of alternative views would be displayed by list displayer 136 at step 622 for the user's selection. After the user's selection is received at step 624 by selection receiver 138, the resulting view is assigned as the alternative view that the user selected by view assignor 120 at step 616, and dataset displayer 112 then displays the dataset according to the resulting view in step 606.
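The two options in paragraphs (37) and (38) can be sketched as a single selection function. The rating system is not specified in the text, so `rate` is a hypothetical caller-supplied scoring function, and `choose` stands in for displaying the list and receiving the user's selection.

```python
def pick_resulting_view(alternatives, option, rate=None, choose=None):
    """Select the resulting view from the alternative views.

    option "first":  rank by the rating function and take the highest ranked.
    option "second": show the list and return whatever the user selects.
    """
    if option == "first":
        return max(alternatives, key=rate)
    if option == "second":
        return choose(list(alternatives))
    raise ValueError(f"unknown option: {option!r}")


# Illustrative ratings only; real scores would come from the rating system.
scores = {"Table": 1, "Bar Chart": 3, "Line Graph": 2}
views = ["Table", "Bar Chart", "Line Graph"]
```

With these illustrative scores, the first option returns the bar chart, while the second option returns whichever entry the `choose` callback picks.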
(39) In yet another embodiment of the invention, cardinality computer 122 computes the cardinality of the fields in the plurality of tuples when forming the alternative views. In a further embodiment, functional dependency computer 124 computes the functional dependency of the fields in the plurality of tuples when forming the alternative views.
(40)
(41)
(42)
(43)
(44)
(45)
(46)
(47)
(48)
(49)
(50)
(51)
(52) The present invention not only accepts datasets and databases as inputs, it also accepts views as inputs. A view can be used to represent a set of fields. Resulting views can also depend on the existing view. For example, rules or operators can take into account the current view to generate a new view that is related to the current view. Also, as one skilled in the art will realize, many other rules are possible, including ones that generate statistical views, maps, pie charts, and three-dimensional views of data.
(53) The present invention can be implemented as a computer program product that comprises a computer program mechanism embedded in a computer readable storage medium. For instance, the computer program product could contain the program modules shown in
(54) Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.
(55) All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.