Plagiarism risk detector and interface
11289059 · 2022-03-29
Assignee
Inventors
Cpc classification
G10H2210/061
PHYSICS
G10H1/383
PHYSICS
G10H2220/121
PHYSICS
G10H2240/141
PHYSICS
International classification
Abstract
Methods, systems and computer program products are provided for testing a lead sheet for plagiarism. A test lead sheet having a plurality of passages is received at a plagiarism detector. A set of annotations describing a level of plagiarism of a plurality of elements (e.g., chord sequence, subsequences, melodic fragments (i.e., notes), rhythm, harmony, etc.) of the test lead sheet in relation to preexisting lead sheets is generated and output via an output device.
Claims
1. A method for testing a lead sheet for plagiarism, comprising the steps of: training a machine learning model based on a plurality of preexisting encoded lead sheets; receiving, at a plagiarism detector, an encoded test lead sheet representing a test lead sheet having a plurality of segments; testing the encoded test lead sheet using the trained machine learning model to detect a level of plagiarism of a plurality of elements within one or more segments of the plurality of segments of the encoded test lead sheet in relation to the plurality of preexisting encoded lead sheets; generating a set of annotations describing the level of plagiarism of the plurality of elements; and presenting, via an output device, the set of annotations.
2. The method according to claim 1, further comprising the steps of: displaying the test lead sheet on the output device; and displaying the set of annotations on the output device by overlaying the set of annotations over the test lead sheet.
3. The method according to claim 2, wherein displaying the set of annotations includes: overlaying each annotation of the set of annotations over any one of (i) a corresponding melodic fragment, (ii) a chord sequence, or (iii) a combination of (i) and (ii) depicted on the test lead sheet.
4. The method according to claim 1, wherein each annotation indicates a portion of the plurality of elements and a level of plagiarism of the portion of the plurality of elements.
5. The method according to claim 1, wherein testing the encoded test lead sheet using the trained machine learning model further comprises the step of: for each segment of the plurality of segments of the encoded test lead sheet, determining a similarity value between the segment and each of a plurality of segments of the plurality of preexisting encoded lead sheets.
6. The method according to claim 5, further comprising the steps of: labeling as a match a segment of the encoded test lead sheet and a corresponding segment of the plurality of preexisting encoded lead sheets having a similarity value that meets a similarity threshold.
7. The method according to claim 1, wherein a negative filter database coupled to the plagiarism detector stores a plurality of encoded filter elements, and the method further comprising the steps of: comparing at least one encoded filter element of the plurality of encoded filter elements to the plurality of preexisting encoded lead sheets; and filtering out any segments of the plurality of preexisting encoded lead sheets that match.
8. A plagiarism detector for testing a lead sheet for plagiarism, comprising: at least one processor operable to: train a machine learning model based on a plurality of preexisting encoded lead sheets; receive an encoded test lead sheet representing a test lead sheet having a plurality of segments; test the encoded test lead sheet using the trained machine learning model to detect a level of plagiarism of a plurality of elements within one or more segments of the plurality of segments of the encoded test lead sheet in relation to the plurality of preexisting encoded lead sheets; generate a set of annotations describing the level of plagiarism of the plurality of elements; and cause an output device to present the set of annotations.
9. The plagiarism detector according to claim 8, the at least one processor further configured to: cause the output device to: display the test lead sheet; and display the set of annotations by overlaying the set of annotations over the test lead sheet.
10. The plagiarism detector according to claim 9, the at least one processor further configured to cause the output device to: overlay each annotation of the set of annotations over any one of (i) a corresponding melodic fragment, (ii) a chord sequence, or (iii) a combination of (i) and (ii) depicted on the test lead sheet.
11. The plagiarism detector according to claim 8, wherein each annotation indicates a portion of the plurality of elements and a level of plagiarism of the portion of the plurality of elements.
12. The plagiarism detector according to claim 8, wherein to test the encoded test lead sheet using the trained machine learning model, the at least one processor further configured to: for each segment of the plurality of segments of the encoded test lead sheet, determine a similarity value between the segment and each of a plurality of segments of the plurality of preexisting encoded lead sheets.
13. The plagiarism detector according to claim 12, the at least one processor further configured to: label as a match a segment of the encoded test lead sheet and a corresponding segment of the plurality of preexisting encoded lead sheets having a similarity value that meets a similarity threshold.
14. The plagiarism detector according to claim 8, further comprising: a negative filter database coupled to the plagiarism detector and configured to store a plurality of encoded filter elements; and the at least one processor further configured to: compare at least one encoded filter element of the plurality of encoded filter elements to the plurality of preexisting encoded lead sheets, and filter out any segments of the plurality of preexisting encoded lead sheets that match.
15. A non-transitory computer-readable medium having stored thereon one or more sequences of instructions for causing one or more processors to perform: training a machine learning model based on a plurality of preexisting encoded lead sheets; receiving, at a plagiarism detector, an encoded test lead sheet representing a test lead sheet having a plurality of segments; testing the encoded test lead sheet using the trained machine learning model to detect a level of plagiarism of a plurality of elements within one or more segments of the plurality of segments of the encoded test lead sheet in relation to the plurality of preexisting encoded lead sheets; generating a set of annotations describing the level of plagiarism of the plurality of elements; and presenting, via an output device, the set of annotations.
16. The non-transitory computer-readable medium of claim 15, further having stored thereon a sequence of instructions for causing the one or more processors to perform: displaying the test lead sheet on the output device; and displaying the set of annotations on the output device by overlaying the set of annotations over the test lead sheet.
17. The non-transitory computer-readable medium of claim 16, further having stored thereon a sequence of instructions for causing the one or more processors to perform: overlaying each annotation of the set of annotations over at least one of (i) a corresponding melodic fragment, (ii) a chord sequence, or (iii) a combination of (i) and (ii) depicted on the test lead sheet.
18. The non-transitory computer-readable medium of claim 15 wherein each annotation indicates a portion of the plurality of elements and a level of plagiarism of the portion of the plurality of elements.
19. The non-transitory computer-readable medium of claim 15, further having stored thereon a sequence of instructions for causing the one or more processors to perform: for each segment of the plurality of segments of the encoded test lead sheet, determining a similarity value between the segment and each of a plurality of segments of the plurality of preexisting encoded lead sheets.
20. The non-transitory computer-readable medium of claim 19, further having stored thereon a sequence of instructions for causing the one or more processors to perform: labeling as a match a segment of the encoded test lead sheet and a corresponding segment of the plurality of preexisting encoded lead sheets having a similarity value that meets a similarity threshold.
21. The non-transitory computer-readable medium of claim 15, wherein a negative filter database coupled to the plagiarism detector stores a plurality of encoded filter elements, and the non-transitory computer-readable medium further having stored thereon a sequence of instructions for causing the one or more processors to perform: comparing at least one encoded filter element of the plurality of encoded filter elements to the plurality of preexisting encoded lead sheets; and filtering out any segments of the plurality of preexisting encoded lead sheets that match.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The features and advantages of the example embodiments of the invention presented herein will become more apparent from the detailed description set forth below when taken in conjunction with the following drawings.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
DETAILED DESCRIPTION
(12) The example embodiments of the invention presented herein are directed to methods, systems and computer program products for plagiarism risk assessment, which are now described herein in terms of an example cloud-based service for assessing the probability that a musical work in the form of a lead sheet is plagiaristic and presenting a graphical user interface identifying any potentially plagiaristic portions of the lead sheet along with relevant information. This description is not intended to limit the application of the example embodiments presented herein. In fact, after reading the following description, it will be apparent to one skilled in the relevant art(s) how to implement the following example embodiments in alternative embodiments (e.g., as a dedicated hardware device, and/or involving different types of music scores such as chord charts, and the like).
(13) Generally, lead sheets are encoded in a computer format referred to herein as a music interchange format and the music interchange formatted lead sheets are uploaded to a database. The music interchange format thus contains one or more sequences of information representing the content of a lead sheet. A plagiarism risk assessment service (e.g., that operates a plagiarism risk detector) uses the uploaded music interchange formatted lead sheets for detecting possible plagiarism of a test lead sheet that has also been encoded in the music interchange format. The plagiarism risk assessment service returns a set of annotations describing which aspects of the test lead sheet are similar to existing lead sheets in the database.
(14) In some embodiments, the plagiarism risk assessment service provides the annotations in real-time, and causes a graphical user interface (GUI) to display the annotations. The plagiarism risk assessment GUI can work in conjunction with a scorewriter application GUI. In some embodiments the plagiarism risk assessment GUI is combined with the scorewriter application GUI to provide annotations in substantially real time as the lead sheet is being composed. In some embodiments, the plagiarism risk assessment service is implemented in the form of a plugin of an existing scorewriter.
(15) Electronically Formatting a Lead Sheet
(16) Musical structure generally is the overall organization of a composition into sections, phrases, and patterns, very much like the organization of a text. Songs, for example, include sections, phrases and patterns that can often be further decomposed into elements that include melody, chord progression, rhythm, and lyrics.
(17) Common Western music notation is a symbolic method of representing music for performers and listeners. Besides its use in publishing sheet music, musical scores and parts, the notation has been encoded in different computer formats, referred to herein as music interchange formats. One example music interchange format is MusicXML, an XML-based format intended to be used with scorewriter tools to parse and manipulate a musical score. MusicXML is designed to allow the interchange of music notation data between and among music notation editing and publishing programs, as well as music scanning programs. While the example embodiments of the invention presented herein are described as using MusicXML, it should be understood that other music interchange formats can be used instead of MusicXML. Alternative embodiments can use different types of music interchange formats such as msf, RMTF, MIDI, abc, reativeMusicFile, FinaleFormat, ETF, RhapsodyFormat, EncoreFormat, Noteworthy, GuitarProFormat, TablEditFormat, SmartScore, and the like.
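As an illustration of how a MusicXML-encoded lead sheet might be reduced to the sequences discussed in this description, the following minimal sketch reads chord symbols and melody notes using Python's standard XML parser. The sample fragment is hand-written and deliberately incomplete (it omits, for example, the part list and accidentals), and the function name `extract_sequences` and the `step:kind` string encoding are assumptions made for this example rather than anything specified in the disclosure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MusicXML fragment written for this example.
SAMPLE_MUSICXML = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <harmony><root><root-step>C</root-step></root><kind>major</kind></harmony>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
    <measure number="2">
      <harmony><root><root-step>G</root-step></root><kind>dominant</kind></harmony>
      <note><pitch><step>D</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""


def extract_sequences(musicxml_text):
    """Return (chord sequence, note sequence) as lists of short strings."""
    root = ET.fromstring(musicxml_text)
    chords = []
    notes = []
    # <harmony> elements carry the chord symbols of a lead sheet.
    for harmony in root.iter("harmony"):
        step = harmony.findtext("root/root-step")
        kind = harmony.findtext("kind")
        chords.append(f"{step}:{kind}")
    # <note> elements carry the melody; rests have no <pitch> child.
    for note in root.iter("note"):
        step = note.findtext("pitch/step")
        octave = note.findtext("pitch/octave")
        if step is not None:
            notes.append(f"{step}{octave}")
    return chords, notes


chords, notes = extract_sequences(SAMPLE_MUSICXML)
print(chords, notes)
```

A fuller encoder would also need to account for alterations, durations, and key, which this sketch ignores.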
(18)
(19) Plagiarism Risk Detection System
(20)
(21) In some embodiments, each encoded lead sheet is stored in encoded lead sheet database 306 as sequences S.sub.1, S.sub.2, . . . , S.sub.n, where n is an integer.
(22) In an example implementation, fingerprinting is performed on the segments of the sequences using a fingerprinting algorithm. Generally, a fingerprinting algorithm maps the data contained in the sequences (e.g., segments of the sequences) to, for example, shorter text strings. Such shorter text strings are known as fingerprints. These fingerprints are unique identifiers for their corresponding data and/or files. Now known or future developed mechanisms for fingerprinting and matching encoded test lead sheets to a corpus of encoded lead sheets stored in encoded lead sheet database 306 can be used.
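One way such a fingerprinting step could look is sketched below. The disclosure does not name a particular algorithm, so the choices here are assumptions for illustration only: fixed-length sliding windows as segments, a truncated SHA-1 digest as the fingerprint, and the helper names `segments`, `fingerprint`, and `build_index`. A truncated hash can in principle collide, and an exact hash only matches identical segments, so a practical system would likely need a more tolerant scheme.

```python
import hashlib


def segments(sequence, length):
    """Return every contiguous window of `length` items from `sequence`."""
    return [tuple(sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def fingerprint(segment):
    """Map a segment to a short text string (assumed: truncated SHA-1 hex)."""
    joined = "|".join(segment).encode("utf-8")
    return hashlib.sha1(joined).hexdigest()[:12]


def build_index(corpus, length):
    """Map each fingerprint to the (sheet id, position) pairs where it occurs."""
    index = {}
    for sheet_id, sequence in corpus.items():
        for position, segment in enumerate(segments(sequence, length)):
            index.setdefault(fingerprint(segment), []).append((sheet_id, position))
    return index


# Hypothetical corpus of two encoded chord sequences.
corpus = {
    "sheet-1": ["C", "Am", "F", "G"],
    "sheet-2": ["F", "G", "C", "Am"],
}
index = build_index(corpus, 2)
print(index[fingerprint(("C", "Am"))])
```

Looking up the fingerprint of a test segment in such an index returns every stored location with the same content without scanning the whole corpus.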
(23) In yet another example embodiment, plagiarism risk detector 302 is coupled to a negative filter database 308. Negative filter database 308 stores elements of musical scores that are viewed as non-plagiaristic. In some embodiments, such elements are also encoded in a music interchange format and are referred to herein as encoded filter elements. Negative filter database 308 is used, for example, to filter out matches that are permissible uses, common features of musical scores, or other sections, phrases, and/or patterns (e.g., melodies, chord progressions, rhythms, and lyrics) that are common or otherwise would report false positives for plagiarism. In an example implementation, negative filter database 308 stores encoded filter elements F.sub.1, F.sub.2, . . . , F.sub.x, where x is an integer. The filtering process involves comparing segments of a collection of source sequences S.sub.1, S.sub.2, . . . , S.sub.n, where n is an integer (e.g., representing encoded lead sheets stored in encoded lead sheet database 306), with segments of the sequences of encoded filter elements F.sub.1, F.sub.2, . . . , F.sub.x. The matched segments (e.g., the segments that are similar or substantially similar) are, in turn, filtered out. That is, the matched segments are removed from the comparison set and are not compared to a test lead sheet.
(24) In an example embodiment, fingerprinting is performed on segments of sequences of the encoded filter elements stored in negative filter database 308. Fingerprinting is also performed on the segments of source sequences stored in encoded lead sheet database 306. In this embodiment, one or more fingerprints of the encoded filter elements are compared against the fingerprints of the encoded lead sheets. This reduces the amount of processing resources that need to be used to test an encoded test lead sheet by reducing the test data set that the encoded test lead sheet is compared against.
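The fingerprint-based filtering described above can be sketched as a set difference over a fingerprint index. The index layout (fingerprint string mapped to a list of locations) and the name `negative_filter` are assumptions for this example; the fingerprint strings shown are placeholders, not the output of any particular algorithm.

```python
def negative_filter(source_index, filter_fingerprints):
    """Drop every source fingerprint that also appears among the filter elements.

    source_index: dict mapping fingerprint -> list of (sheet id, position).
    filter_fingerprints: set of fingerprints computed from the filter elements.
    """
    return {
        fp: locations
        for fp, locations in source_index.items()
        if fp not in filter_fingerprints
    }


# Placeholder fingerprints standing in for real ones.
source_index = {
    "aa11": [("sheet-1", 0)],
    "bb22": [("sheet-1", 1), ("sheet-2", 4)],  # e.g., a very common progression
    "cc33": [("sheet-2", 0)],
}
filtered = negative_filter(source_index, {"bb22"})
print(sorted(filtered))
```

Because the filtering runs once against the stored corpus, each later test only has to be compared against the reduced index.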
(25) As shown in
(26) In another example embodiment, a notation service 320 converts media content (e.g., songs) from, for example media distribution service 314 into encoded lead sheets and supplies the encoded lead sheets to encoded lead sheet database 306 for later processing.
(27) As explained above, segments of a collection of source sequences S.sub.1, S.sub.2, . . . , S.sub.n, where n is an integer, representing encoded lead sheets are stored in the encoded lead sheet database 306. In some embodiments, fingerprints of the segments can be stored, for example to decrease the amount of time it takes to compare the segments, to increase the ability to make accurate comparisons, and to reduce processing resources.
(28) Plagiarism risk detector 302 uses the encoded lead sheets stored in encoded lead sheet database 306 to detect possible plagiarism and provide a set of annotations describing which elements of a test lead sheet are similar to existing lead sheets in the encoded lead sheet database 306.
(29) In some embodiments, plagiarism risk detector 302 is communicatively coupled to client device 322. In one embodiment, plagiarism risk detector 302 is coupled to client device 322 via network 310. Client device 322 includes one or more processors and a non-transitory memory device storing an integrated scorewriting and plagiarism detection application, which when executed by the one or more processors causes the client device to operate as an integrated scorewriter and plagiarism detector.
(30) Lead Sheet Conversion and Output Model Generation Procedures
(31)
(32) As described above, in some embodiments, the computer format used to generate computer formatted lead sheet files is a music interchange format. In some example embodiments lead sheet encoding procedure 420 transmits the encoded lead sheets to another service or system for further processing. Lead sheet learning procedure 430 is such a processing service.
(33) Lead sheet learning procedure 430 retrieves the encoded lead sheet files as shown in block S432, performs a learning algorithm on the computer formatted lead sheet files as shown in block S434, and generates an output model as shown in block S436. The machine learning algorithm that is used to generate the output model is not limited to any particular machine learning algorithm implementation. Indeed, in some embodiments, combining multiple base learners can result in improved prediction performance. Those skilled in the art will appreciate that now known or future developed learning algorithms can be used to train the output model.
(34) Lead Sheet Plagiarism Detection Procedure
(35)
(36) In block S452, an encoded test lead sheet is received. The encoded test lead sheet 502 is also sometimes referred to as a query lead sheet.
(37) If a lead sheet to be tested is not already in a music interchange format, the lead sheet is converted into an encoded lead sheet file 502.
(38) In the example embodiment depicted in
(39) An example test lead sheet is illustrated in
(40) In block S454, the test lead sheet is evaluated against a corpus of encoded lead sheets. This can be accomplished in a number of ways.
(41)
(42) In some embodiments, the encoded test lead sheet is formatted as a sequence (e.g., a digitized chord sequence, a digitized subsequence, and the like). Referring to
(43) In some embodiments, a method performs calculating a similarity value indicating the similarity of the segment of the encoded test lead sheet to a corresponding segment of the plurality of preexisting encoded lead sheets and identifying a segment of the encoded test lead sheet having a similarity value that meets a similarity threshold. The segment of the encoded test lead sheet having a similarity value that meets the similarity threshold is labeled as a match (i.e., as potentially plagiaristic), as shown in block S454-3.
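The similarity-and-threshold step can be sketched as follows. The disclosure does not fix a similarity measure, so the one used here (the fraction of positions at which two equal-length segments agree) is an assumption chosen for simplicity, as are the function names and the threshold value.

```python
def similarity(segment_a, segment_b):
    """Fraction of positions at which two equal-length segments agree."""
    if len(segment_a) != len(segment_b) or not segment_a:
        return 0.0
    same = sum(1 for a, b in zip(segment_a, segment_b) if a == b)
    return same / len(segment_a)


def find_matches(test_segments, source_segments, threshold):
    """Return (test index, source index, similarity) for pairs meeting the threshold."""
    matches = []
    for i, test_segment in enumerate(test_segments):
        for j, source_segment in enumerate(source_segments):
            value = similarity(test_segment, source_segment)
            if value >= threshold:
                matches.append((i, j, value))
    return matches


# Hypothetical chord segments of length four.
test_segments = [("C", "Am", "F", "G"), ("Dm", "G", "C", "C")]
source_segments = [("C", "Am", "F", "E"), ("E", "A", "D", "G")]
print(find_matches(test_segments, source_segments, threshold=0.75))
```

This exhaustive pairwise comparison is quadratic in the number of segments; an indexed or fingerprint-based lookup would typically be preferable for a large corpus.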
(44) With this information, the segments of the target sequence which have the highest number of matches M (where M is an integer) in the source collection can be identified as being potentially plagiaristic.
(45) In some embodiments, the music score being composed (e.g., the target sequence T) can be rendered as an audio file (e.g., using a MIDI synthesizer). Then sampling detection methods can be used to detect similar audio segments in the source collection (themselves rendered as audio files).
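Ranking target segments by their number of matches can be sketched with a simple counter. The input format, a list of (target segment index, source sheet id) pairs, and the function name are assumptions made for this example.

```python
from collections import Counter


def rank_segments_by_matches(matches):
    """Return (target segment index, match count M) pairs, highest count first."""
    counts = Counter(target_index for target_index, _source_id in matches)
    return counts.most_common()


# Hypothetical match list: target segment 1 matches three source sheets.
matches = [(0, "sheet-a"), (1, "sheet-a"), (1, "sheet-b"), (2, "sheet-c"), (1, "sheet-c")]
ranking = rank_segments_by_matches(matches)
print(ranking[0])
```

The segments at the top of the ranking are the ones that would be flagged first as potentially plagiaristic.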
(46) As used herein a musical element refers to sections, phrases, and patterns. With respect to songs, for example, the term musical element includes sections, phrases and patterns that can be further decomposed into elements that include melody, chord progression, rhythm, and lyrics.
(47) Referring back to
(48) In turn, in block S460, the test result user interface (UI) overlay is displayed to appear on top of (e.g., overlaid over) the lead sheet notation. At block S462 a determination is made whether a test result user interface overlay has been selected. If so, then at block S464 additional information is rendered onto the display. In some examples, the encoded test lead sheet can be updated in real time as changes (e.g., edits) to the lead sheet are being made through the use of a scorewriter application, for example. In such examples, the lead sheet edit input is received at block S468, and the edited lead sheet is tested using the model at block S470.
(49)
(50) In some embodiments, a link to the media content item that might be infringed (e.g., a track of an album) is provided so that an operator can quickly select the link to listen to the potentially plagiarized work. The links (or the track identifiers) are illustrated here by track identifier 530. However, other forms of identification can be used (e.g., name of song). The number of works 540 potentially plagiarized can also be presented via interface 800.
(51) It will be recognized by those skilled in the art that additional information can be provided via the user interface. For example, a plagiarism probability value (not shown) of the potential plagiarism can be displayed. The calculation can be based on the similarity value. Those skilled in the art will recognize that additional information can be displayed and still be within the scope of the invention.
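The information carried by one annotation in the user interface could be modeled along the following lines. This is a sketch only: the field names, the use of measure numbers to locate the annotated passage, and the use of the best similarity value as a stand-in for a plagiarism probability are all assumptions, since the disclosure states only that the calculation can be based on the similarity value.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start_measure: int
    end_measure: int
    element: str          # e.g., "chord sequence" or "melodic fragment"
    similarity: float     # assumed stand-in for a plagiarism probability
    track_ids: list = field(default_factory=list)

    @property
    def number_of_works(self):
        """Count of distinct works the annotated passage resembles."""
        return len(set(self.track_ids))


def build_annotation(start_measure, end_measure, element, matches):
    """Build an annotation from (track id, similarity value) pairs."""
    best = max((value for _track, value in matches), default=0.0)
    tracks = sorted({track for track, _value in matches})
    return Annotation(start_measure, end_measure, element, best, tracks)


# Hypothetical matches for measures 5 through 8 of a test lead sheet.
annotation = build_annotation(
    5, 8, "chord sequence",
    [("track-2", 0.8), ("track-1", 1.0), ("track-2", 0.9)],
)
print(annotation.number_of_works, annotation.similarity)
```

An overlay renderer could then draw one marker per annotation over the corresponding measures and reveal the track identifiers when the marker is selected.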
(52) The example embodiments presented herein may be provided as a computer program product, or software, that may include an article of manufacture on a machine accessible or machine readable medium having instructions. The instructions on the non-transitory machine-accessible, machine-readable or computer-readable medium may be used to program a computer system or other electronic device. The machine or computer-readable medium may include, but is not limited to, optical disks, CD-ROMs, and magneto-optical disks or other types of media/machine-readable medium suitable for storing or transmitting electronic instructions. The techniques described herein are not limited to any particular software configuration. They may find applicability in any computing or processing environment. The terms “computer-readable”, “machine accessible medium” or “machine readable medium” used herein shall include any medium that is capable of storing, encoding, or transmitting a sequence of instructions for execution by the machine and that causes the machine to perform any one of the methods described herein. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, process, application, module, unit, logic, and so on) as taking an action or causing a result. Such expressions are merely a shorthand way of stating that the execution of the software by a processing system causes the processor to perform an action to produce a result.
(53) Portions of the example embodiments of the invention may be conveniently implemented by using a conventional general purpose computer, a specialized digital computer and/or a microprocessor programmed according to the teachings of the present disclosure, as is apparent to those skilled in the computer art. Appropriate software coding may readily be prepared by skilled programmers based on the teachings of the present disclosure.
(54) Some embodiments may also be implemented by the preparation of application-specific integrated circuits, field programmable gate arrays, or by interconnecting an appropriate network of conventional component circuits.
(55) Some embodiments include a computer program product. The computer program product may be a storage medium or media having instructions stored thereon or therein which can be used to control, or cause, a computer to perform any of the procedures of the example embodiments of the invention. The storage medium may include without limitation an optical disc, a Blu-ray Disc, a DVD, a CD or CD-ROM, a micro-drive, a magneto-optical disk, a ROM, a RAM, an EPROM, an EEPROM, a DRAM, a VRAM, a flash memory, a flash card, a magnetic card, an optical card, nanosystems, a molecular memory integrated circuit, a RAID, remote data storage/archive/warehousing, and/or any other type of device suitable for storing instructions and/or data.
(56) Stored on any one of the computer readable medium or media, some implementations include software for controlling both the hardware of the general and/or special computer or microprocessor, and for enabling the computer or microprocessor to interact with a human user or other mechanism utilizing the results of the example embodiments of the invention. Such software may include without limitation device drivers, operating systems, and user applications. Ultimately, such computer readable media further includes software for performing example aspects of the invention, as described above.
(57) Included in the programming and/or software of the general and/or special purpose computer or microprocessor are software modules for implementing the procedures described above.
(58)
(59) The plagiarism risk detector 302 may further include a mass storage device 930, peripheral device(s) 940, portable non-transitory storage medium device(s) 950, input control device(s) 980, a graphics subsystem 960, and/or an output display interface 970. For explanatory purposes, all components in the plagiarism risk detector 302 are shown in
(60) Mass storage device 930 additionally stores code for similarity test processor 931 (which executes the similarity measurement), test result generator 932, test result user interface (UI) overlay generator 933, test result UI 934, and negative filter 935. Similarity test processor 931 receives encoded lead sheets and performs a similarity measurement to determine whether any segments of sequences of the test lead sheet potentially plagiarize any segments of sequences of preexisting encoded lead sheets. Test result generator 932 generates the test results based on a comparison of the test lead sheet against the corpus of preexisting encoded lead sheets. Test result UI overlay generator 933 performs the rendering of the test result UI overlay onto a screen, and test result UI 934 receives input from, and provides output to, a client device on which a test music score is generated. Negative filter 935 performs negative filtering to filter out matches that are permissible uses, common features of musical scores, or other sections, phrases, and/or patterns (e.g., melodies, chord progressions, rhythms, and lyrics) that are common or otherwise would report false positives for plagiarism.
(61) The portable storage medium device 950 operates in conjunction with a nonvolatile portable storage medium, such as, for example, flash memory, to input and output data and code to and from the plagiarism risk detector 302. In some embodiments, the software for storing information may be stored on a portable storage medium, and may be inputted into the plagiarism risk detector 302 via the portable storage medium device 950. The peripheral device(s) 940 may include any type of computer support device, such as, for example, an input/output (I/O) interface configured to add functionality to the plagiarism risk detector 302. For example, the peripheral device(s) 940 may include a network interface card for interfacing the plagiarism risk detector 302 with a network 920.
(62) The input control device(s) 980 provide a portion of the user interface for a user of the plagiarism risk detector 302. The input control device(s) 980 may include a keypad and/or a cursor control device. The keypad may be configured for inputting alphanumeric characters and/or other key information. The cursor control device may include, for example, a handheld controller or mouse, a trackball, a stylus, and/or cursor direction keys. The plagiarism risk detector 302 may include an optional graphics subsystem 960 and output display 970 to display textual and graphical information. The output display 970 may include a display such as a CSTN (Color Super Twisted Nematic), TFT (Thin Film Transistor), TFD (Thin Film Diode), OLED (Organic Light-Emitting Diode), AMOLED (Active-Matrix Organic Light-Emitting Diode), and/or liquid crystal display (LCD)-type display. The displays can also be touchscreen displays, such as capacitive and resistive-type touchscreen displays.
(63) The graphics subsystem 960 receives textual and graphical information, and processes the information for output to the output display 970. Input control devices 980 can control the operation and various functions of the plagiarism risk detector 302.
(64) Input control devices 980 can include any components, circuitry, or logic operative to drive the functionality of the plagiarism risk detector 302. For example, input control device(s) 980 can include one or more processors acting under the control of an application.
(65) Also shown
(66) Various operations and processes described herein can be performed by the cooperation of two or more devices, systems, processes, or combinations thereof.
(67) While various example embodiments of the invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It is apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein. Thus, the disclosure should not be limited by any of the above described example embodiments, but should be defined only in accordance with the following claims and their equivalents. Further, the Abstract is not intended to be limiting as to the scope of the example embodiments presented herein in any way. It is also to be understood that the procedures recited in the claims need not be performed in the order presented.