SYSTEM AND METHOD FOR MUTATION TUNING OF AN AUDIO FILE
20230368758 · 2023-11-16
Inventors
CPC classification
G10H2210/131 (PHYSICS)
G10H1/0025 (PHYSICS)
G10H2220/126 (PHYSICS)
G10H2250/311 (PHYSICS)
G10H2210/105 (PHYSICS)
International classification
Abstract
Disclosed is a system and method for mutating an audio file, and more particularly, for user-trained mutation tracking and tuning of an audio file, comprising the steps of: receiving a user input, wherein the user input is at least one audio file; entering at least a pattern into a grid sequencer by selecting any number of squares in the grid, wherein each square represents a particular count occupancy probability at a particular count in a musical composition bar that the user prefers to render as a final output; uploading at least one ‘good’ and ‘bad’ audio file sample by the user to affect the particular count occupancy probability based on the user input and pattern; and rendering the final output comprising the mutated audio file based on the user input, pattern, and upload.
Claims
1. A method for mutating an audio file, said method comprising the steps of: receiving a user input, wherein the user input is at least one audio file from a first user; entering at least a pattern into a grid sequencer by selecting any number of squares in the grid, wherein each square represents a particular count occupancy probability at a particular count in a musical composition bar that the user prefers to render as a final output; uploading at least one ‘good’ and ‘bad’ audio file sample by the user to affect the particular count occupancy probability based on the user input and pattern; and rendering the final output comprising the mutated audio file and a visualization of the grid sequencer in terms of an indicator of a probability of a particular count occupancy based on the user input, pattern, and upload.
2. The method of claim 1, wherein the audio file is at least one drumbeat sample categorized as either a hat, kick, or snare.
3. The method of claim 2, wherein the audio files are at least one of uploaded from the user, uploaded from a shared pool, or pre-installed by a developer.
4. The method of claim 1, further comprising a confidence value entered by the user in which the value represents an extent of a muting filter to affect drum count occupancies at a particular count in the bar.
5. The method of claim 4, wherein the confidence value entered is an integer within a range of integers, wherein higher integers within the range result in a higher probability that the highest occupancy counts are rendered to the final output.
6. The method of claim 4, wherein the confidence value entered is an integer within a range of integers, wherein lower integers within the range result in a lower probability that the highest occupancy count is rendered to the final output.
7. The method of claim 1, wherein the uploaded “good” and “bad” audio files train a neural network to determine interrelatedness between sequencer grid counts expressed as weights to determine which count is expressed in the mutated audio file and/or final output.
8. The method of claim 1, further comprising a harvest value entered to determine the size of the mutated audio file and/or final rendered output, expressed in bars.
9. The method of claim 1, wherein the mutated audio file is referred to as a seed, wherein the seed is a distinct procedurally generated fragment derived from at least one of the user's pattern, input, or upload.
10. The method of claim 9, wherein the seed is at least one of saved, played-back, uploaded for training, shared to another user for seed germination based on the other user's preferences, or scraped to determine the first user's seedling characteristics (pattern, input, load, or entered).
11. A method for mutating an audio file, said method comprising the steps of: receiving a user input, wherein the user input is at least one audio file; entering at least a pattern into a grid sequencer by selecting any number of squares in the grid, wherein each square represents a particular count occupancy probability at a particular count in a musical composition bar that the user prefers to render as a final output; uploading at least one ‘good’ and ‘bad’ audio file sample by the user to affect the particular count occupancy probability based on the user input and pattern; and rendering the final output comprising the mutated audio file based on the user input, pattern, and upload.
12. The method of claim 11, further comprising a confidence value entered by the user in which the value represents an extent of a muting filter to affect drum count occupancies at a particular count in the bar.
13. The method of claim 12, wherein the confidence value entered is an integer within a range of integers, wherein higher integers within the range result in a higher probability that the highest occupancy counts are rendered to the final output.
14. The method of claim 12, wherein the confidence value entered is an integer within a range of integers, wherein lower integers within the range result in a lower probability that the highest occupancy count is rendered to the final output.
15. The method of claim 11, wherein the mutated audio file is referred to as a seed, wherein the seed is a distinct procedurally generated fragment derived from at least one of the user's pattern, input, or upload.
16. The method of claim 15, wherein the seed is at least one of saved, played-back, uploaded for training, shared to another user for seed germination based on the other user's preferences, or scraped to determine the first user's seedling characteristics (pattern, input, load, or entered).
17. The method of claim 11, further comprising a visualization of a grid sequencer in terms of an indicator of a probability of a particular count occupancy based on the user input and entered pattern.
18. The method of claim 11, wherein the mutated audio file is referred to as a seed, wherein the seed is a distinct procedurally generated fragment derived from at least one of the user's pattern, input, or upload, for archiving, sharing, playback, or uploading for training.
19. The method of claim 18, wherein a plurality of analogous seeds are visually depicted in a graph based on a pre-defined analogy of sound for further mutation tuning.
20. A system for mutating an audio file, comprising: a rendering module; a visualization module; a processor; a memory element coupled to the processor; a program executable by the processor, over a network, to: receive a user input, wherein the user input is at least one audio file; enter a pattern into a grid sequencer with columns and rows of boxes, wherein each box of the grid represents a particular count occupancy probability at a particular count in a musical composition bar that the user prefers to render as a final output; render the final output comprising the mutated audio file by the rendering module and a visualization of a grid sequencer in terms of an indicator of a probability of a particular count occupancy based on the user input and entered pattern by the visualization module; and render a second final output mutated from the final output based on at least one of a second received input, pattern entered, or training samples uploaded by the rendering module.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0009] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and the accompanying drawings, of which:
DETAILED DESCRIPTION OF THE DRAWINGS
[0017] Numerous embodiments of the invention will now be described in detail with reference to the accompanying figures. The following description of the embodiments of the invention is not intended to limit the invention to these embodiments but rather to enable a person skilled in the art to make and use this invention. Variations, configurations, implementations, and applications described herein are optional and not exclusive to the variations, configurations, implementations, and applications they describe. The invention described herein can include any permutations of these variations, configurations, implementations, and applications.
[0018] In the following description, numerous specific details are outlined in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details.
[0019] Reference in this specification to “one embodiment” or “an embodiment” or “some embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiment(s) is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” or “in some embodiments” in various places in the specification are not necessarily all referring to the same embodiment(s), nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
[0020] As a person skilled in the art will recognize from the previous detailed description and the figures and claims, modifications and changes can be made to the embodiments of the invention without departing from the scope of this invention as disclosed in the present application. It will be appreciated that, although the methods, processes, and functions of the present application have been recited in a particular series of steps, the individual steps of the methods, processes, and functions may be performed in any order, in any combination, or individually.
[0021] Embodiments are described at least in part herein regarding flowchart illustrations and/or block diagrams of methods, systems, and computer program products and data structures according to embodiments of the disclosure. It will be understood that each block of the illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block or blocks.
[0022] The aforementioned computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus, to produce a computer-implemented process such that, the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block or blocks.
[0023] In general, the word “module,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions written in a programming language, such as Java, C, etc. One or more software instructions in the module may be embedded in firmware. The modules described herein may be implemented as either software and/or hardware modules and may be stored in any type of non-transitory computer-readable medium or other non-transitory storage elements. Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.
Certain Terminologies
[0024] Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
[0025] As used herein, an “audio-visual file” or “AV file” is a series of one or more audio-visual (AV) clips recorded on the same video source (e.g., a single video camera). Two or more “parallel AV files” are recordings of the same action recorded on two or more respective video sources.
[0026] Now in reference to
[0027] In one embodiment, a system may comprise: a rendering module 107, 207; a visualization module 105, 205; a processor; a memory element coupled to the processor; a program executable by the processor, over a network 103, to render a mutated audio output comprising the mutated audio file and a visualization of the grid sequencer in terms of an indicator of a probability of a particular count occupancy based on the user input, pattern, and upload (PILE) 101, 102. The interactive visual enables a user to tune/train the construct for hyper-specific randomized mutations/germinations derived from the PILE 101, 102. As shown in
[0028] The network 103 may be any suitable wired network, wireless network, a combination of these, or any other conventional network, without limiting the scope of the present invention. A few examples may include a LAN or wireless LAN connection, an Internet connection, a point-to-point connection, or other network connections and combinations thereof. The network 103 may be any other type of network that is capable of transmitting or receiving data to/from host computers, personal devices, telephones, video/image capturing devices, video/image servers, or any other electronic devices. Further, the network 103 is capable of transmitting/sending data between the mentioned devices. Additionally, the network 103 may be a local, regional, or global communication network, for example, an enterprise telecommunication network, the Internet, a global mobile communication network, or any combination of similar networks. The network 103 may be a combination of an enterprise network (or the Internet) and a cellular network, in which case, suitable systems and methods are employed to seamlessly communicate between the two networks. In such cases, a mobile switching gateway may be utilized to communicate with a computer network gateway to pass data between the two networks. The network 103 may include any software, hardware, or computer applications that can provide a medium to exchange signals or data in any of the formats known in the art, related art, or developed later.
[0029] Preferred embodiments may include the addition of a remote server or cloud server 508 to further provide for back-end functionality and provisioning/analytical support 510. The server 508 may be situated adjacent to or remotely from the system and connected to each system via a communication network 103. In one embodiment, the server 508 may be used to support user behavior profiling; user history function; predictive learning/analytics; alert function; network sharing function; digital footprint tracking; visualization, graphical interactivity, etc. (510).
[0030] The electronic computing device may be any electronic device capable of sending, receiving, and processing information. Examples of the computing device include, but are not limited to, a smartphone, a mobile device/phone, a Personal Digital Assistant (PDA), a computer, a workstation, a notebook, a mainframe computer, a laptop, a tablet, a smartwatch, an internet appliance, and any equivalent device capable of processing, sending, and receiving data. The electronic computing device can include any number of sensors or components configured to intake or gather data from a user of the electronic computing device including, but not limited to, a camera, a heart rate monitor, a temperature sensor, an accelerometer, a microphone, and a gyroscope, to assess a state of the user for informing the user profile/context for more user-specific randomized mutation/germination. The electronic computing device can also include an input device (e.g., a touchscreen, keyboard, or mouse) through which a user may provide touch and/or cursor-control input commands. Multiple inputs from a single user computing device (as shown in
[0031] In another embodiment of the present invention, the rendering/mutation algorithm may employ unsupervised machine learning to learn the features of drum count occupancy probability from the PILE and iterative (i) inputs (any input beyond PILE) for final rendering. For example, a Neural Network Autoencoder can be used to learn the features and then train a Deep Neural Network or a Convolutional Neural Network. The classification may be based on a supervised or unsupervised machine learning technique, and the classification is performed by analyzing one or more features of the inputs (PILE/i). Such approaches result in hyper user specificity in what may otherwise appear as a randomized mutation, not to mention a reduction in power consumption and/or an increase in detection speed and accuracy.
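As an illustration of the autoencoder approach described above, the following sketch learns a small set of latent features from binary 16-count occupancy vectors with a single-hidden-layer autoencoder. The dimensions, toy data, and NumPy implementation are all illustrative assumptions, not the patent's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dataset: 64 bars of 16 counts each (1 = count occupied), ~30% density.
X = (rng.random((64, 16)) < 0.3).astype(float)

n_in, n_hid = 16, 4                      # 4 latent features per bar (illustrative)
W1 = rng.normal(0, 0.1, (n_in, n_hid))   # encoder weights
W2 = rng.normal(0, 0.1, (n_hid, n_in))   # decoder weights

def forward(X):
    H = sigmoid(X @ W1)   # encoder: compressed features of the pattern
    Y = sigmoid(H @ W2)   # decoder: reconstructed occupancy probabilities
    return H, Y

lr = 0.5
_, Y0 = forward(X)
err0 = np.mean((Y0 - X) ** 2)            # reconstruction error before training
for _ in range(500):
    H, Y = forward(X)
    dY = (Y - X) * Y * (1 - Y)           # output delta (squared-error loss)
    dH = (dY @ W2.T) * H * (1 - H)       # backpropagated hidden delta
    W2 -= lr * H.T @ dY / len(X)
    W1 -= lr * X.T @ dH / len(X)
_, Y1 = forward(X)
err1 = np.mean((Y1 - X) ** 2)            # reconstruction error after training
```

The learned hidden activations H could then serve as input features for a downstream classifier, as the paragraph suggests.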
[0032] Additionally, in another embodiment of the invention, the system may comprise a back-propagated neural network to use a series of externally captured buffers containing known audio-visual sources to aid in real-time recognition of the audio and video input by using a probabilistic approach to determine the presence in a captured buffer. A classification algorithm may be based on supervised machine learning techniques such as SVM, Decision Tree, Neural Net, Ada Boost, and the like. Further, the classification may be performed by analyzing one or more features based on any one of, or combination of, any PILE/i.
[0033] While not shown in
[0034] Now in reference to
[0035] In continuing reference to
[0036] Below the Confidence drop-down lies the Seed integer drop-down, specifying an identifier for the random instance from which the output file was rendered. Different seeds render different composition results in the output audio file within the parameters specified in the confidence score value box. The seed may be seen as a distinct procedurally generated fragment derived from at least one of the user's patterns, inputs, or uploads. The seed is at least one of saved, played-back, uploaded for training, shared to another user for seed germination based on the other user's preferences, or scraped to determine the first user's seedling characteristics (pattern, input, load, or entered). Following the user inputs of a confidence and seed value, the user may then enter a Reps value, by drop-down or by manually entering a text/numeric value, indicating how many times the neural network will be trained. By increasing the Reps value, sequencer composition outliers will be further controlled for. By controlling the Reps value, the user has an additional incremental mutation tuning tool, allowing the user to engage in an ever-so-slight germination trajectory yet again. This manipulation serves as a more fine-tuned technique of filtering results that are more or less sporadic, similar to a limiter audio effect that keeps a vocalist's volume level consistent despite varying spacing from the microphone. The confidence parameters establish a ceiling whose curvature is tightened or loosened by the Reps integer. The default value is 100,000. What's more, the seeds may be visually depicted in a graph, based on a pre-defined color-coded analogy of a sound/sound feature/sound characteristic, for further mutation tuning, processing, sharing, etc.
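The interplay of the Confidence and Seed values described above can be sketched as follows. This is a hypothetical model, not the patent's implementation: the 0-10 confidence range, the per-count probabilities, and the draw-averaging scheme are all assumptions chosen for illustration.

```python
import random

def render_bar(probs, confidence, seed, reps=100):
    """Render one bar from per-count occupancy probabilities (illustrative)."""
    rng = random.Random(seed)        # same seed -> same "random instance"
    floor = confidence / 10.0        # confidence acts as a muting filter threshold
    hits = []
    for p in probs:
        if p < floor:
            hits.append(0)           # mute counts below the confidence floor
            continue
        # average many draws so sporadic outlier compositions are controlled for
        draws = sum(rng.random() < p for _ in range(reps))
        hits.append(1 if draws / reps >= 0.5 else 0)
    return hits

probs = [0.9, 0.1, 0.6, 0.2, 0.9, 0.1, 0.7, 0.3]
bar_a = render_bar(probs, confidence=5, seed=42)
bar_b = render_bar(probs, confidence=5, seed=42)   # identical: seed fixes the instance
```

A different seed value would produce a different, but equally reproducible, bar within the same confidence parameters.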
[0037] The user then enters a Harvest value as an integer, indicating how long the output audio file will be, expressed in bars. The value determines the size of the mutated audio file and/or final rendered output. The textbox's default integer is 4 bars. Each bar will contain a unique mutation, making it easier for the user to systematically review them one after another.
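The arithmetic behind the Harvest value can be illustrated as follows, assuming 4/4 time, a tempo in BPM, and an audio sample rate; all three parameters are illustrative and not specified by the patent.

```python
def output_samples(harvest_bars, bpm=120, beats_per_bar=4, sr=44100):
    """Length of the rendered file in audio samples, as a function of bars."""
    seconds_per_beat = 60.0 / bpm
    seconds = harvest_bars * beats_per_bar * seconds_per_beat
    return int(seconds * sr)

# Default Harvest of 4 bars at 120 BPM: 16 beats = 8 seconds of audio.
n = output_samples(4)
```

Doubling the Harvest value doubles the output size, since each bar contributes a fixed duration at a given tempo.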
[0038] The user then presses the Upload button under the Good/1 Batch Size title and enters an integer indicating the batch collection size the user wishes to reflect in the final rendered audio output file. The user selects the “Upload” button to summon the device's native OS file-explorer window, where they can select a text composition file on their device's hard drive. The user presses the Load Notation button to increment the Good/1 Batch Size integer by one, indicating that the user has stored another composition in the good batch group before training the neural network. The user presses the Train Good/1 Batch to train the neural network, where the weights between the perceptron beat count and output neuron are represented as 1 or 0, depending on whether the sample was loaded as a ‘good’ or ‘bad’ training sample. This is the leading factor in significantly increasing beat count occupancy for the associated X counts in the text composition file. The Training Dataset integer increases by the ‘good’ compositions batch quantity. Conversely, the ‘bad’ composition files negatively affect the weights instead, since they belong to the Bad/0 Batch group.
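A minimal perceptron sketch of the Good/1 versus Bad/0 batch training described above. The 16-count bar, learning rate, and toy compositions are illustrative assumptions; the point shown is that counts occupied in ‘good’ (label 1) batches gain weight while counts occupied only in ‘bad’ (label 0) batches lose weight.

```python
def train(batches, labels, n_counts=16, lr=0.1, epochs=200):
    """Train one output neuron against per-count inputs (illustrative)."""
    w = [0.0] * n_counts
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(batches, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            for i in range(n_counts):
                # occupied counts in 'good' bars gain weight, in 'bad' bars lose it
                w[i] += lr * err * x[i]
            b += lr * err
    return w, b

good = [1, 0, 0, 0, 1, 0, 0, 0] + [0] * 8   # sparse kick pattern (hypothetical)
bad  = [1, 1, 1, 1, 1, 1, 1, 1] + [0] * 8   # every count occupied (hypothetical)
w, b = train([good, bad], [1, 0])
```

After training, the weights favor the counts occupied only in the ‘good’ composition, which is the mechanism by which good batches raise the occupancy of their associated counts in the rendered output.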
[0039] The Beat Chamber (BC) is the composition that is dialed into the sequencer window or dialed automatically by uploading an individual composition file to toggle the beat count squares from light to dark gray (state 1 to state 2). This composition is sent through a channel that bypasses the mutation process from the neural network (
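A sketch of the Beat Chamber bypass described above, under the assumption that bars are binary count vectors: the user-dialed composition skips the neural-network mutation channel and is merged unchanged into the rendered bar. The `merge` function and its OR semantics are illustrative, not taken from the patent.

```python
def merge(beat_chamber, mutated):
    """Combine the bypassed Beat Chamber pattern with the mutated pattern."""
    # a count fires if it is locked in the Beat Chamber or survives mutation
    return [bc | m for bc, m in zip(beat_chamber, mutated)]

chamber = [1, 0, 0, 0, 1, 0, 0, 0]   # user-dialed kicks, never mutated
mutated = [0, 0, 1, 0, 0, 0, 1, 1]   # hypothetical network output for the same bar
final = merge(chamber, mutated)
```

Every count dialed into the Beat Chamber survives to the final bar regardless of what the mutation channel produces.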
Interface Source Code - The interface requires a Jupyter Notebook server installed on a Debian Linux system via pip3
[0041] Exemplary Script Excerpt:
import ipywidgets as widgets
from time import sleep
import os
from ipywidgets import ColorPicker, Button, HBox, VBox, Box, Layout, ButtonStyle
from ipywidgets import GridBox, Label, ToggleButton, BoundedIntText, Select, Text
import math
from ipywidgets import FileUpload
# import IPython.display as ipd
# from IPython.display import clear_output

upload = FileUpload()
tds = "Training Dataset"
sle_siz = 5
check_pass = 0

# ---- widgets ----
bat_1 = os.listdir('/root/neuralscan/simple-neural-network_revamped/beat_frame/')
bat_0 = os.listdir('/root/neuralscan/simple-neural-network_revamped/beat_frame_off/')
bc_occ = os.listdir('/root/neuralscan/simple-neural-network_revamped/beat_chamber/')

file = open("/root/logol.jpg", "rb")
image = file.read()
logo = widgets.Image(
    value=image,
    format='png',
    width=150,
    height=200,
)
difficulty_label = widgets.Label(value="Expert User Interface")
prj_label = widgets.Label(value="Project Name: Loader")
drum_samp_arr = ["LexLugerHiHat.wav", "lexhvykick.wav", "lexsnarel.wav"]

select_sample = os.listdir('/root/neuralscan/samples/')
drum_type = widgets.Select(      # was widgets.widgets.Select (typo)
    options=select_sample,
    description='Drum Type:',
    disabled=False,
)

select_hats = os.listdir('/root/neuralscan/samples/hats/')
hats_menu = widgets.Select(
    options=select_hats,
    description='Hats:',
    disabled=False,
)

def on_value_change2(change):
    # fires when the sample menu selection changes
    if str(sample_menu.value) == str(change['new']):
        print("sample change")

def on_value_change(change):
    # repopulate the sample menu when the drum type changes
    if str(drum_type.value) == str(change['new']):
        print("drum type change")
        sample_menu.options = os.listdir(
            "/root/neuralscan/samples/" + str(drum_type.value) + "/")