Methods and Apparatus For Facilitating Context Searching

20180011924 · 2018-01-11

    Abstract

    A search engine provides summary context information for search terms across many or all of the matched records. The summary information can include additional terms including single words, phrases, pairs or other groupings of words, as well as tags or other information having special meaning. Search engines or other providers can allow users to click on or otherwise select one or more of the additional terms. Such selection can be used to add the selected term(s) to a subsequent search, and/or display to the searcher examples of the selected term(s) in context of at least one of selected ones of the plurality of records. Searchers can advantageously be given an ability to set the size of the window from which the additional terms are drawn, the number of records searched

    Claims

    1. A method of searching an electronic database, the method comprising: receiving a first search query that includes first and second search terms; identifying a set of documents in the electronic database that each include both the first and second search terms; defining a first proximity window about instances of the first search term in the set of documents and a second proximity window about instances of the second search term, wherein each first proximity window overlaps at least one second proximity window; identifying additional terms within each overlap; generating a ranked concordance of at least some of the additional terms along with a relative frequency of each additional term within each overlap; presenting the ranked concordance of the additional terms found within the first and second proximity windows to a user interface; receiving a third search term selected from the presented ranked concordance; generating a subset of the set of multiple documents containing the third search term within an overlap; and presenting the subset of the set of multiple documents to the user interface.

    2. The method of claim 1 further comprising sorting the ranked concordance.

    3. The method of claim 1 further comprising sorting the ranked concordance according to a relative frequency of occurrences of each additional term within the set of multiple documents.

    4. The method of claim 1 wherein the first and second search terms each comprise a single word.

    5. The method of claim 1 wherein at least one of the additional terms comprises multiple words.

    6. The method of claim 1 wherein at least one of the additional terms comprises a data tag.

    7. The method of claim 1 wherein at least one of the additional terms comprises a data tag, and another of the additional terms comprises a value corresponding to the data tag.

    8. The method of claim 1 wherein at least one of the additional terms comprises an XML data tag.

    9. The method of claim 1 further comprising providing a size selection interface through which the user can select a maximum size of documents to include in the set of documents.

    10. The method of claim 1 further comprising including in the ranked concordance contexts of the additional terms, as used within the set of multiple documents.

    11. The method of claim 1 further comprising eliminating specified common terms from the listing.

    12. The method of claim 1 further comprising receiving from the user a designation of a size of the first proximity window.

    13. The method of claim 12, wherein the designation comprises a number of words on either side of the first search term.

    14. The method of claim 12 wherein the designation comprises a number of characters on either side of the first search term.

    15. The method of claim 1 further comprising receiving a limit on a number of records to be included in the set of multiple documents.

    16. The method of claim 1, wherein the ranked concordance comprises context for each additional term of the concordance, the context comprising text surrounding each additional term.

    Description

    BRIEF DESCRIPTION OF THE DRAWING

    [0018] FIG. 1 is a mock-up of a search interface that includes a search term(s) entry box and a “What's Nearby?” radio button to access contexts of searches.

    [0019] FIG. 2 is a mock-up of a search interface that includes a first table that lists words occurring within a given window of the search term(s) entered in the entry box, and a second table that lists examples of the search term(s) in context of a selected one of the listed words.

    [0020] FIG. 3 is a mock-up of a search interface that includes a first table that lists phrases occurring within a given window of the search term(s) entered in the entry box, and a second table that lists examples of the search term(s) in context of a selected one of the listed phrases.

    [0021] FIG. 4 is a mock-up of a search interface that includes a first table that lists metatags occurring within a given window of the search term(s) entered in the entry box, and a second table that lists examples of values associated with a selected one of the listed metatags.

    [0022] FIG. 5 is a mock-up of a search interface that includes a first table that lists phrases occurring within records of the Internet or other database, and a second table that lists exemplary occurrences of the search term(s) showing examples of actual usage of a selected one of the listed phrases.

    [0023] FIG. 6 is a mock-up of a search interface in which a user has entered a search specific for a given metatag name and a range of values for that metatag.

    [0024] FIG. 7 is a mock-up of an interface for entering and maintaining user information and preferences.

    DETAILED DESCRIPTION

    [0025] FIGS. 1-6 generally depict an interface 10 which includes title 20, a primary navigation section 30, a search box 42 with secondary navigation section 44, advertising section 50, and results display sections 60, 70, 80, and 90. Each of the sections can be considered its own interface, as can each of the fields within the sections.

    [0026] Navigation section 30 is shown here as having three radio buttons: Search, Preferences, and Alerts. Exemplary Search pages and a Preferences page are depicted in this application. Alerts pages are not separately shown or described herein because examples are well known in the field, and adaptation of known alerts strategies to include materials disclosed herein is well within the scope of ordinary skill in view of these disclosures.

    [0027] In FIG. 1, a user has entered the search term “CLA” in search box 42, and has selected (or defaulted to) the “Search Results” button in the secondary navigation section 44. Entering of a search term, per se, is conventional, and it is contemplated that users could use any combination of terms, Boolean logic, wildcards and so forth. Section 60 is also shown here in a conventional manner, showing titles, page links, text extractions and other information for the top ranked six records. Slider 61 provides a mechanism for users to view more line items than can be shown on the display at a given time.

    [0028] Although the records shown in this example are links to, and excerpts from, web pages on the Internet, FIGS. 1-6 should be interpreted as being representative of searches of any database, including for example legal databases of the type used by Lexis™ and Westlaw™, and even private databases such as those containing company information. In addition, those skilled in the art will appreciate that the display 10 field layout, highlighting, and so forth are exemplary only, and that the specific mock-ups shown in these figures are to be interpreted as being representative of all suitable layouts, highlighting schemes, functional buttons and so forth in keeping with the teachings herein.

    [0029] In FIG. 2, the user has selected the “What's Nearby?” button in the secondary navigation section 44, which prompted the system to display the results display section 70. That section generally includes sections for selecting alpha or numeric sort 102, context selection 104, Window 106, number of Records examined 108, and two columns of data, 72 and 74. Column 72 comprises a summary of terms that fall within a given proximity of the search term(s) entered in box 42, preferably as they occur throughout the entire data set being examined. The sort (alpha or numeric according to frequency of occurrence) is set by sort buttons 151. The proximity is set by the window size 153, and the size of the data set (i.e. number of records) being examined is set by the records size 154. Sliders 73 and 75 provide a mechanism for users to view more line items than can be shown on the display at a given time.

    [0030] Although previously unknown for use in the current context, algorithms for creating concordances are known. Preferred technology for preparing concordances is available at http://www.concordancesoftware.co.uk/

    [0031] In this hypothetical example, the term “linoleic” is shown as occurring 23,667 times in the highest ranked 250,000 records of the database being searched, within a window of 20 words on either side of the term CLA. Looking further down the list, the term “conjugated” is shown as occurring only 3,105 times in the highest ranked 250,000 records of the database being searched, within a window of 20 words on either side of the term CLA.

    [0032] Those skilled in the art will immediately appreciate that the items in column 72 are sorted alphabetically in accordance with the alpha choice in section 102, and that the user could readily change the sort choice to numeric, alter the number of words in the window, and the number of records being considered. Those skilled in the art will also appreciate that the listing of items in column 72 excludes occurrences of common articles “a”, “the”, “an” and other words that are likely to be of little interest to the searcher. It is contemplated that users could set the window by number of characters instead of number of words, and that the number of records examined to develop the summary data of column 72 could be much smaller or larger than 240, and potentially even set to examine all records of a data set. The tradeoff of course in examining too many records is that the processing requirement might have a decidedly negative effect on the response time. A default window size and/or default number of records to be considered in preparing concordances can be advantageously set on a Preferences page (see e.g., FIG. 7, element 322).
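
    By way of illustration only, the summary of column 72 can be sketched as a count of the words found within a fixed window about each occurrence of the search term. The function names, the tokenizer, and the stop list below are assumptions made for this sketch and are not taken from the disclosure, which names only “a”, “the”, and “an” as excluded words.

```python
import re
from collections import Counter

# Hypothetical stop list; the disclosure names only "a", "the", "an" and
# "other words that are likely to be of little interest".
STOP_WORDS = {"a", "an", "the", "and", "of", "or", "to", "in", "is"}


def tokenize(text):
    """Lowercase a record and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def nearby_terms(records, search_term, window=20, max_records=None):
    """Count terms within `window` words on either side of search_term."""
    counts = Counter()
    target = search_term.lower()
    for record in records[:max_records]:  # a limit of None examines all records
        tokens = tokenize(record)
        counted = set()  # positions already counted within this record
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j in counted or tokens[j] == target or tokens[j] in STOP_WORDS:
                    continue
                counted.add(j)
                counts[tokens[j]] += 1
    return counts


def sort_concordance(counts, mode="numeric"):
    """Sort alphabetically ("alpha") or by descending frequency ("numeric")."""
    if mode == "alpha":
        return sorted(counts.items())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

    In this sketch the window is measured in words; a character-based window would replace the token positions with character offsets. Limiting `max_records` reflects the tradeoff between coverage and response time noted above.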

    [0033] In the particular hypothetical example of FIG. 2, the user has selected the terms “weight” and “loss” for further analysis. In response, the system has listed surrounding text for instances of the term “weight” or “loss” occurring within the 20 word window of the search term “CLA”. The terms “weight” and/or “loss” is/are more or less centered in each line. Obviously, if the user had chosen a different term from column 72, such as “fatty”, column 74 would list instances of, and text surrounding, the term “fatty”, but only in records in which the term “fatty” occurs within the 20 word window within the 250,000 top ranked records. It is contemplated that a user could select more than one term from column 72, preferably using the Windows™ standard key combinations, <shift><click> or <cntrl><click>. In the event that the user wants to conveniently add one or more selected terms from column 72 to the search box, the system also provides an “Add” button 155 on the far right of the display 10.
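
    The context listing of column 74 can be sketched as follows, purely as an illustration. The function name `kwic_lines`, the bracket marking, and the fixed character width are assumptions of this sketch; the disclosure requires only that the selected term be more or less centered in each line and that it fall within the window about the search term.

```python
import re


def kwic_lines(records, search_term, selected, window=20, width=30):
    """Return lines of context, each centered on a selected term near search_term."""
    selected = {s.lower() for s in selected}
    target = search_term.lower()
    lines = []
    for record in records:
        words = re.findall(r"\S+", record)
        # Normalized copies (no punctuation, lowercase) are used for matching only.
        norm = [re.sub(r"\W+", "", w).lower() for w in words]
        hits = [i for i, w in enumerate(norm) if w == target]
        for j, w in enumerate(norm):
            # Keep the selected term only if it lies within the window of a hit.
            if w in selected and any(abs(j - i) <= window for i in hits):
                left = " ".join(words[:j])[-width:]
                right = " ".join(words[j + 1:])[:width]
                lines.append(f"{left:>{width}} [{words[j]}] {right:<{width}}")
    return lines
```

    Padding the left context to a fixed width places the selected term at the same column in every line, which approximates the centered presentation described for FIG. 2.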

    [0034] One issue arises as to what should be done about multiple search terms. For example, if a person searches for “apple AND banana”, should the system respond by showing additional terms in a window about “apple” and also show additional terms in a window about “banana”? Similarly, what should the system do if the term “apple” appears ten times in a single web page or other record? These are all design considerations, and it is contemplated that different search engines would implement the inventive concepts herein in disparate ways. The current preference would be to count each occurrence of a search term as a separate instance, unless the two occurrences are within overlapping windows. On the other hand, if a searcher is already looking for words in proximity to each other, as in the searches “apple *** banana” or “apple near/5 banana”, then the window preferably is taken about the most extreme occurrences of the searched-for proximity. Thus, if the search for “apple *** banana” retrieved a record having the following text, “eat at least one serving of a high fiber, fruit or vegetable every day. Apple. Apricot. Asparagus. Banana. Beans (kidney, navy, lima, pinto, lentils)” then the current preference would be for the context window to be a fixed number of words to the left of the term “apple” and a fixed number of words to the right of the term “Banana”, even though the terms “apple” and/or “banana” appear elsewhere in the record at greater than three words apart.
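
    One possible reading of these preferences is sketched below, for illustration only. Occurrences whose windows overlap are merged into a single span so that they are treated as one instance, and a proximity match is given a window extending outward from its two extreme terms. The function names and the use of half-open token spans are assumptions of this sketch.

```python
def merge_windows(positions, window, n_tokens):
    """Build a window about each hit and merge windows that overlap.

    Spans are half-open (start, end) ranges of token positions.
    """
    spans = []
    for p in sorted(positions):
        lo, hi = max(0, p - window), min(n_tokens, p + window + 1)
        if spans and lo < spans[-1][1]:
            # Overlaps the previous window: extend it rather than add a new instance.
            spans[-1] = (spans[-1][0], max(spans[-1][1], hi))
        else:
            spans.append((lo, hi))
    return spans


def proximity_window(first_pos, second_pos, window, n_tokens):
    """Window about the extremes of a proximity match such as "apple *** banana"."""
    left, right = min(first_pos, second_pos), max(first_pos, second_pos)
    return (max(0, left - window), min(n_tokens, right + window + 1))
```

    For example, with a window of 3 words, hits at token positions 5 and 8 share one merged span, while a hit at position 40 forms a separate instance.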

    [0035] It is also contemplated that the context windows could be variable according to circumstance. Thus, if the web page or other record included long lists of somewhat related nouns or verbs that are strung together in a nonsensical fashion, which is often the case where web designers are trying to game the ranking algorithm, the system may well set the context window to zero for those particular lists.

    [0036] FIG. 2 also shows a radio button section 152 for a user to select among “Words”, “Phrases”, and “Tags”. In FIG. 2 the user has selected (or the system has defaulted to) “Words”. In FIG. 3 the user has selected (or the system has defaulted to) “Phrases”, and in FIG. 4 the user has selected (or the system has defaulted to) “Tags”.

    [0037] In FIG. 3, the user has chosen the appropriate radio button of section 152 to view context for Phrases instead of Words. Consequently, the system shows phrases in column 82, and shows specific instances of a selected phrase or phrases in column 84. Sliders 83 and 85 provide a mechanism for users to view more line items than can be shown on the display at a given time. In this particular example, the user has also chosen to list the phrases alphabetically, to use a window of 30 words, and to run the analysis on 500,000 records.

    [0038] Determination of what is considered to be a phrase can be made in any suitable manner. For example, it is contemplated that the search engine could base its phrase searching on an existing compilation such as that provided by http://www.phrases.org.uk/ or Phrases In English at http:/pie.usna.edu/. These and all other extrinsic materials discussed herein are incorporated by reference. It is preferred, however, that the search engine develop its own list of phrases, based upon user searches. Thus, every time a user searches for a phrase in quotation marks, and that phrase generates a hit list of more than a given threshold (perhaps 500 or 1,000 hits), then the system adds that phrase to the list. It is also contemplated that a user could set a field on a Preferences page or elsewhere that limits the phrases being considered by the system in the current user's searches to those having a given threshold of hits (see e.g., FIG. 7, element 324). For example, a user might well want to set the threshold hits for phrases to 100,000, so that only the more common phrases are considered in preparing the phrase concordance. Alternatively, or in addition, a user might well want to set an upper and/or lower limit to the size of phrases (number of words or number of characters) being considered (see e.g., FIG. 7, element 326). At present a preferred threshold is 10,000 hits and preferred size limits on the phrases are between 2 and 6 words.
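
    A phrase list developed from user searches could be maintained along the following lines. This is a minimal sketch under stated assumptions: the class name `PhraseList` and its methods are hypothetical, and the hit count of a query is attributed to every quoted phrase in that query, which is a simplification.

```python
import re


class PhraseList:
    """Phrase list built from quoted user searches (illustrative only)."""

    def __init__(self, add_threshold=1000):
        self.add_threshold = add_threshold  # hits needed before a phrase is added
        self.hits = {}  # phrase -> largest hit count seen so far

    def record_search(self, query, hit_count):
        """Add each quoted phrase in the query if the search exceeded the threshold."""
        if hit_count <= self.add_threshold:
            return
        for phrase in re.findall(r'"([^"]+)"', query):
            phrase = " ".join(phrase.lower().split())  # normalize case and spacing
            self.hits[phrase] = max(hit_count, self.hits.get(phrase, 0))

    def candidates(self, min_hits=10_000, min_words=2, max_words=6):
        """Phrases meeting a user's hit threshold and size limits, sorted alphabetically."""
        return sorted(
            p for p, h in self.hits.items()
            if h >= min_hits and min_words <= len(p.split()) <= max_words
        )
```

    The defaults of 10,000 hits and 2 to 6 words follow the preferred values stated above; the per-user settings correspond to elements 324 and 326 of FIG. 7.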

    [0039] In FIG. 4, the user has chosen the appropriate radio button of section 152 to view context for Tags instead of Words or Phrases. Consequently, the system shows tag names in column 92, and shows specific values for a selected tag in column 94. Sliders 93 and 95 provide a mechanism for users to view more line items than can be shown on the display at a given time. In this particular example, the user has also chosen to list the tag numerically according to frequency of occurrence, to use a window of 20 words, and to run the analysis on all records.

    [0040] The system can consider any type of metatag as a tag, but most preferably focuses on XML type tags because they are currently thought to be the most popular. The type of tags searched could be selected in a Preferences page (see e.g., FIG. 7, element 328), or elsewhere.
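
    For illustration, a tag summary of the kind shown in columns 92 and 94 might be prepared as sketched below for records that are well-formed XML. The function name `tag_summary` is hypothetical, and for brevity the sketch treats the entire record as the window rather than a fixed number of words about the search term.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tag_summary(xml_records, search_term):
    """Count tag names, and collect their values, in records containing search_term."""
    tag_counts = Counter()
    tag_values = defaultdict(list)
    needle = search_term.lower()
    for xml_text in xml_records:
        root = ET.fromstring(xml_text)
        # Simplification: the whole record stands in for the proximity window.
        if needle not in " ".join(root.itertext()).lower():
            continue
        for elem in root.iter():
            value = (elem.text or "").strip()
            if value:  # skip container elements that hold no text of their own
                tag_counts[elem.tag] += 1
                tag_values[elem.tag].append(value)
    return tag_counts, tag_values
```

    Calling `tag_counts.most_common()` gives the numeric sort by frequency of occurrence, and `tag_values[tag]` supplies the values listed for a selected tag.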

    [0041] In FIG. 5 the user has chosen to enter search criteria, but in this case rather than merely entering a search term to search the database at large, the user has chosen to search the phrases database. In this particular example, the user entered “phrase:milk”, which would trigger the system to provide a listing of phrases containing the term “milk”. This includes, for example, “almond milk” and “got milk?”. The terms are sorted alphabetically according to the sort choice 151, and here again the user is presented with an “Add” button 155 to conveniently add the selected phrase(s) to the search box. In this particular example, the system could advantageously respond to a user clicking the “Add” button 155 by replacing “phrase:milk” with “No milk today”, and re-running the search as in FIG. 1. Although it may not be readily apparent on the reproduction of FIG. 5, the terms “Words” and “Tags” are preferably grayed out in section 152 because they are not viable choices when searching for phrases. Numerous XML alternatives are listed at http://www.pault.com/pault/pxml/xmlalternatives.html.

    [0042] In FIG. 6 the user has again chosen to enter specialized search criteria, but here the user has chosen to search for a particular tag name and a particular range of values associated with that tag. In this particular example, the user entered tag:<name>=“Super K” AND tag:<price>≦22, which would trigger the system to provide a listing of records in which the tag <name> is associated with a value of “Super K” and the tag <price> is listed with a value less than or equal to 22. Those skilled in the art will immediately appreciate that wild cards and Boolean logic could be used here and in the other examples to create extremely useful searches.
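
    The tag search of FIG. 6 can be sketched as a filter over XML records, again by way of illustration only. The function name `matches_tag_query` is an assumption, the query is passed as already-parsed arguments rather than parsed from the search box, and records are assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def matches_tag_query(xml_text, name, max_price):
    """True if a <name> tag equals `name` AND a <price> tag is <= max_price."""
    root = ET.fromstring(xml_text)
    name_ok = any((e.text or "").strip() == name for e in root.iter("name"))
    price_ok = False
    for e in root.iter("price"):
        try:
            price_ok = price_ok or float(e.text) <= max_price
        except (TypeError, ValueError):
            continue  # empty or non-numeric price values never satisfy the range
    return name_ok and price_ok


def search_records(xml_records, name, max_price):
    """Return the records satisfying both tag conditions."""
    return [r for r in xml_records if matches_tag_query(r, name, max_price)]
```

    A call such as `search_records(records, "Super K", 22)` corresponds to the example query above; wildcards and further Boolean operators would require a fuller query parser than is shown here.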

    [0043] FIG. 7 is an interface for entering and maintaining user information and preferences. The interface 300 generally comprises the company identifier 20 and navigation line 30 discussed previously, and also includes a personal information table 310, radio buttons for selecting searching preferences 320, and a table for selecting override units 330. Of particular interest here are sections for selecting preferences discussed above, namely “Concordance # Records” 322, “Phrase Threshold—Hits” 324, “Phrase Threshold—Length” 326, and “Tags Evaluated” 328.

    [0044] It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. Moreover, in interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps could be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.