Author
|
Music A.I.
|
routingwithin
IsraTrance Junior Member
Started Topics :
46
Posts :
204
Posted : Jul 29, 2016 19:55:01
|
Yeah I know, you probably expected more when seeing a title like that, but this is indeed only a basic point. lol
Someone should design an A.I. that lives on the web. If you produce a track, it has to pass through the A.I. first.
Since every person has their own taste in music, the A.I. would, so to speak, hold the combined music taste of everyone in the world. If your track sucks, the upload is denied; same goes for tutorials etc.
So in the end only the best and most important exist on the web.
The reason why: I have seen tutorials and listened to SC music, and some of that stuff should just be deleted, you know. I mean, have you ever listened to someone rambling on, only to figure out after 4 minutes that he is a crappy producer.. hahah
peace out
" We are together in this matter you and I, closer to death, yes, closer than i'd like. How do you feel? - There can be no division in our actions, or everything is lost. What affects you affects me. " |
|
|
knocz
IsraTrance Junior Member
Started Topics :
40
Posts :
1151
Posted : Sep 16, 2017 02:20
|
Well, neural networks are pretty well suited to this problem.
Most of what you describe is quite simple, given that we can "make every track pass through a service" - on any media streaming service, each song already goes through a bunch of services during upload.
But "the A.I. has the music taste of everyone in this world" is where the whole tricky part comes in. How can we do that? Is there any simple, describable set of rules you could set forth to consistently make that judgement?
Neural networks are more or less black ~magic~ math, ideal for solving problems which can't easily be divided into smaller logical rules, but for which we can provide an input and an output and "tell it to adjust".
So, we program a bunch of "nodes" (idealized as neurons) and connect them to each other in some way. On one side we feed an input - a music track, somehow - and we define the output as having just 2 nodes: OK and NAY.
Each "connection" between two nodes has a given weight (often initialized somewhere between 0 and 1).
So, we feed the entire track in at once somehow, say 1 sample per node - that's a friggin huge number of input nodes - and each node sends the value it received along all of its connections, multiplied by each respective connection weight. In other words, you feed the track into the network and look at the OK and NAY nodes on the output: the network should light up one of them, according to the right choice.
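That forward pass can be sketched in a few lines of Python - the numbers here are completely made up (3 inputs, toy weights), just to show the "multiply by the connection weight, sum it up, see which of OK/NAY lights up" idea:

```python
import math

def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

inputs = [0.2, -0.5, 0.9]           # pretend audio samples (invented values)
weights = {                         # one weight per input->output connection
    "OK":  [0.4, 0.1, 0.8],
    "NAY": [0.3, 0.9, -0.2],
}

# Each input node sends its value along every connection, multiplied by that
# connection's weight; the output node sums what arrives and squashes it.
outputs = {
    name: sigmoid(sum(x * w for x, w in zip(inputs, ws)))
    for name, ws in weights.items()
}

print(max(outputs, key=outputs.get))  # whichever node "lights up" more
```

A real network would have hidden layers between input and output, but the multiply-sum-squash step is the same everywhere.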
Strange concept, because it usually starts with random weights all around, so you can imagine it's all gibberish going on inside - values being sent to the next nodes while being multiplied, which are then sent along the next connections, multiplied again.. until it finally reaches the 2 end nodes and we get an "OK" or a "NAY" (or both, or even none, which would be an undesirable result).
It would be trash, if it weren't for those crazy math dudes who decided to apply some weird algorithms to be able to tell the network: "Yo dude, you just said NAY for this track, but it was the friggin hit of the year, everyone was bouncing all over the place, go fix yourself" - essentially teaching the network what is right and wrong, by experience.
This way, we can backpropagate any desired output, and the crazy math will adjust all the weights so that (eventually) that specific input will produce the desired output.
Do this too many times with only the same data, and it will get stuck on that one result. But if you start giving it lots of different input and output pairs, it will learn...
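Here is a minimal sketch of that "go fix yourself" loop - a single sigmoid neuron with a bias, trained on a fake two-track dataset (a "hit" and a "dud"). The features, labels and learning rate are all invented for the demo; real backprop pushes the same kind of correction through many layers:

```python
import math
import random

random.seed(42)  # make the "random starting weights" reproducible

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Fake dataset: a feature vector per track, and the verdict it SHOULD get.
data = [
    ([0.9, 0.8, 0.7], 1.0),   # hit of the year -> OK
    ([0.1, 0.2, 0.1], 0.0),   # workshop material -> NAY
]

weights = [random.uniform(-1, 1) for _ in range(3)]
bias = 0.0
lr = 0.5

for _ in range(2000):                    # repetition = "experience"
    for features, target in data:
        out = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        error = target - out             # "you said NAY, but it was a hit"
        grad = error * out * (1 - out)   # the nudge direction, from the math dudes
        weights = [w + lr * grad * x for w, x in zip(weights, features)]
        bias += lr * grad

for features, target in data:
    out = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    print(f"network says {out:.3f}, should be {target}")
```

After enough laps through the data, the outputs drift toward the targets - that drifting of the weights is all the "learning" there is.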
It's really that "simple" LOL
The tricky parts are:
- how will you represent your input - having 308700 samples to input would mean that many nodes, plus a ton of inner nodes and connections.. we'd need to convert the track to something else first, or try to feed it the tune over time, but then we need to teach it the concept of "time". Not exactly hard, but not the most trivial neural network for sure
- how will you define your architecture - 308700 input nodes, connected to 508700 (let's call it) layer1 nodes, connected to 8700 layer2 nodes, ..., connected to the final 2 output nodes.. That's a lot of connections, a LOT of math and computation that the poor CPU/GPUs need to work out, and this takes time. It's easy to hear these concepts and imagine the human brain, but our CPU can handle this too - it's "made" of exactly this kind of math, it just takes a while
- which knowledge should it start with? Even though we are all different, we share a lot between us, so naturally a lot of our brains are programmed alike. But I think if you made a human choose good and bad tunes for his whole life, his brain would get bored - so we need a lesser brain.
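Just to put numbers on the "that's a lot of connections" point: fully connecting a layer of m nodes to a layer of n nodes costs m × n weights, so the made-up layer sizes from the list above already give a scary total:

```python
# Layer sizes are the post's own illustrative numbers, not a recommendation.
layers = [308700, 508700, 8700, 2]   # input -> layer1 -> layer2 -> output

# Fully connecting layer A (m nodes) to layer B (n nodes) costs m * n weights.
connections = sum(a * b for a, b in zip(layers, layers[1:]))
print(f"{connections:,} weights to multiply on every single pass")
```

That lands north of 161 billion weights, which is exactly why you shrink the input (spectrograms, chunks over time) before you ever build the network.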
I like "Long Short-Term Memory Recurrent Neural Networks" - although such a network can learn knowledge and a lot of tendencies, it won't remember specific inputs for long. So, if we are feeding a track, let's say 44100 samples at a time (which dramatically reduces the number of nodes), it will take some iterations to feed the entire track. It should be able to remember the track it's listening to - so at the end it can give an output somehow - but what it needs to retain for the next track is only what it learned from the previous one - not the entire tune, for sure. So the weights stay adjusted, but the actual input and inner values get wiped.
What is nice about the LSTM RNN is its simplicity to imagine (from the nice graphics you can find on uncle Google): each neuron is in fact a tiny little group of neurons, connected in a specific way, where it can store a piece of information for any given time, it can choose to output it later, and it has the wonderful ability to forget (the creators had actually left this ability out at first - which limited the network's capabilities).
And then you still need to fiddle around with the number of nodes and layers etc., and hope that on this next simulation run the random starting numbers will align and everything will work - because if it doesn't, you have no way of understanding what the inner weight values mean nor what they should be, except telling the output what it should have been and letting it fix itself over time.
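A full LSTM is too long for a forum post, but a single toy cell shows the chunk-by-chunk idea: carry the cell state across chunks of the same track, then wipe it before the next track. The gate weights here are fixed made-up scalars (a real network learns them with backprop):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c):
    """One step of a single-unit LSTM cell with toy scalar weights."""
    f = sigmoid(0.5 * x + 0.5 * h)      # forget gate: how much old memory to keep
    i = sigmoid(0.6 * x - 0.1 * h)      # input gate: how much new info to store
    o = sigmoid(0.4 * x + 0.2 * h)      # output gate: how much memory to reveal
    c = f * c + i * math.tanh(0.9 * x)  # update the cell "memory"
    h = o * math.tanh(c)                # output for this step
    return h, c

track = [[0.1, 0.3, -0.2], [0.4, -0.1, 0.2]]  # two tiny "chunks" of a fake track
h = c = 0.0                     # state persists across chunks of one track...
for chunk in track:
    for sample in chunk:
        h, c = lstm_step(sample, h, c)
verdict = h                     # whatever the cell "thinks" at the end of the tune
h = c = 0.0                     # ...and gets wiped before the next track
print("end-of-track value:", round(verdict, 3))
```

The forget gate `f` is exactly the "ability to forget" mentioned above: when it sits near 0, the old cell memory is thrown away.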
So, you would need to apply some sort of idea like this one. Although it isn't too complex, and the math is all online, it does require some time to implement and train. Plus, you need the data - a large dataset of input tracks along with the info of whether each was a killaarrgh tune, or something worthy of The Workshop section. Then you need to run a large number of simulations to train the network; afterwards, you'll need to be able to store and load the network state, and have all the infrastructure of the online services sending you the tune and getting the result back (or, perhaps, your own website where you provide this service to The People).
Super Banana Sauce http://www.soundcloud.com/knocz |
|
|
knocz
IsraTrance Junior Member
Started Topics :
40
Posts :
1151
Posted : Sep 16, 2017 02:28
|
Now that I review this, I remembered: since the node values are floating-point numbers (not big, but with a decimal point), these numbers would have to be rounded at the output nodes, or even in the inner nodes. But I can't think of anyone who has tried adding the ability to compress/expand the node values toward some (weighted?) threshold.. could our brain compress a signal?
Add a little bit of convolution and, given you can work out the math to backpropagate the errors to train the network, you could be on to something nice
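For what it's worth, that "compress the node values" idea already has a standard name: an activation function. Instead of rounding, each node squashes its raw sum smoothly into a fixed range - sigmoid into (0, 1), tanh into (-1, 1) - and because the squashing is smooth, backprop can still push errors through it:

```python
import math

def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Squashing instead of rounding: the raw value can be anything, the squashed
# value always lands in a bounded range, and it stays differentiable.
for raw in [-20.0, -1.0, 0.0, 1.0, 20.0]:
    print(f"raw {raw:6.1f} -> sigmoid {sigmoid(raw):.4f}, tanh {math.tanh(raw):.4f}")
```

Hard rounding would throw away the gradient, which is exactly what the backprop math needs to survive.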
Super Banana Sauce http://www.soundcloud.com/knocz |