A few weeks ago, I decided to take a look (one free week) at software that claims to be able to create a vocal track from MusicXML and lyrics. Most of this class of software 'extends' the piano roll paradigm used in the AEM (Audio Engineering Model). The (normally separate) software adds two things to a vocal track: a text syllable is placed inside the rectangle representing a note, and a 'curve' is drawn around the rectangle to designate how the note is sung/articulated. I decided to try one of my earlier live instrumental recordings, on which I recorded and played the melody and arrangement on a Yamaha P80 keyboard back in 2002.
https://soundcloud.com/joelirwin/my-friend-forevermore
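The extended piano-roll model described above (a note rectangle carrying a syllable plus an articulation curve) can be sketched as a small data structure. This is a hypothetical illustration, not AceStudio's actual internal format; all names here are invented:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the extended piano-roll note: the usual
# position/length/pitch of the rectangle, plus the two vocal additions.
@dataclass
class VocalNote:
    start_beat: float      # horizontal position of the rectangle, in beats
    length_beats: float    # width of the rectangle (note duration)
    midi_pitch: int        # vertical position (MIDI note number)
    syllable: str          # lyric fragment sung on this note
    # articulation 'curve' as (time, cents) points bending the pitch
    pitch_curve: list = field(default_factory=list)

# e.g., a quarter note on middle C singing "my", scooping up into pitch
note = VocalNote(0.0, 1.0, 60, "my", [(0.0, -50), (0.2, 0)])
print(note.syllable, note.midi_pitch)
```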
I chose AceStudio to try because it is heavily advertised on my Facebook feed. Here is what I tried:
1. I first created a full score of the track as a lead sheet containing the melody and lyrics. I scored the whole song longhand without any repeat sections. The idea of this score was to let me export a MusicXML file from which AceStudio could create a singing track. I imported the MusicXML file into AceStudio, chose a singer, and I was indeed able to generate a vocal track that played well against the original mp3 file - 'to a point'. In real life, musicians may not hold a beat exactly - sometimes they play slightly early, sometimes slightly late. But a singer always manages to stay with the musicians and never sounds out of sync (i.e., early or late relative to them). Unfortunately, about halfway through the playback, the virtual singer consistently started to sing ahead of the musicians, and it was noticeable. In concept it could be fixed by the AE (Audio Engineer), but I had neither the time nor any feature in the software for it other than manually dragging all the out-of-sync rectangles.
2. I thought perhaps the problem was my exported MusicXML, so I tried the AceStudio utility that creates MusicXML from a PDF file - the utility kept aborting. Support said it worked for them. I never had time to troubleshoot why theirs worked and mine didn't.
3. I played the music track, recorded my voice singing it, and then substituted another singer. The issue here is the same as playing a keyboard and recording it into a piano roll track, except a bit more inexact - my recorded voice was sometimes slightly high or low (i.e., I sang slightly out of key) and sometimes early or late. So all the syllabic rectangles were there, but some were too high/low or slightly too far left/right. With more time and the right software, I could have recorded myself separately, put my vocal in key, maybe even synced it to the beat ("quantized" it), and then imported it. Instead, I manually moved all the rectangles into place. Here is how it sounded:
https://soundcloud.com/joelirwin/my-friend-forever-more-acestudio-valentina
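The two clean-up steps mentioned above - snapping recorded note onsets to a beat grid ("quantizing") and pulling slightly off-key pitches into the song's key - can be sketched like this. The function names, the sixteenth-note grid, and the C major scale are assumptions chosen for illustration, not a reference to any particular product's feature:

```python
# Pitch classes (0-11) allowed in C major, for illustration.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def quantize_onset(onset_beats, grid=0.25):
    """Snap a note onset to the nearest grid line (default: sixteenths)."""
    return round(onset_beats / grid) * grid

def snap_to_key(midi_pitch, scale=C_MAJOR):
    """Nudge a slightly off-key pitch to a nearby note in the scale.

    Tries the pitch itself first, then neighbors in order of distance
    (favoring upward on ties), so a sharp/flat note lands on a scale tone.
    """
    for delta in (0, 1, -1, 2, -2):
        if (midi_pitch + delta) % 12 in scale:
            return midi_pitch + delta
    return midi_pitch

print(quantize_onset(1.13))  # a late onset snapped back to 1.25
print(snap_to_key(61))       # C#4 pulled to the nearest scale tone, 62 (D4)
```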
So if you are well versed in the AEM, like to work with a piano roll, and enjoy tweaking and customizing, then you will find that software like AceStudio functions for you much like any virtual instrument you have worked with - keeping in mind that singing is its own idiom and that what gets manipulated and chosen is vocal-specific.
Is it AI? IMHO, that depends on what you call AI - my idea of effective AI is software that does almost everything for you (as you would expect from a real singer). The current software (so far as I have seen), while providing 'intelligence' within the singing paradigm, still requires a significant design and implementation effort by the AE.
Take, for example, the first step of using a vocalist - auditioning for one. In the software model, you filter the singers on certain criteria and then try the results one at a time as your vocalist until you find one that sounds 'right'. You may end up trying out literally dozens of singers. In real life there are contests that function this way, but as the songwriter/music creator, I would prefer the software to analyze my song/music and, knowing (i.e., intelligence) a lot about the candidate singers including some of their recordings, create a ranked list of preferred 'finalists', much like the auditioning process for a theater or an opera.
I believe AI vocalist software has a way to go in offloading intelligence from the AE. For example, each singer has a different sweet spot (AceStudio does not tell you what the sweet spot is, but software like "SoundID Voice AI" does). The music key may need to be modified to maximize the singer's performance. Also, some singers have a limited range, so the key needs adjusting to minimize 'octaving', which may still sometimes be necessary (e.g., singing too high may sound weird, so the vocalist may need to sing an octave lower). And here is one more: word articulation. Singers sing in dialects, and words like "hand" or "potato" are pronounced differently in different dialects and so must be sung differently. You can ask a real singer to sing every occurrence of the word "hand" with a different pronunciation. Right now it would be difficult for an AE to tell the software to change all occurrences of a word or syllable to something else - or to do something more complex, like re-sing a song with a different accent.
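The bulk articulation change described above - re-spelling every occurrence of a word with a different pronunciation - is simple to state programmatically, which is part of why its absence is frustrating. A minimal sketch, assuming note events are (syllable, phonemes) pairs; the phoneme strings here are invented for illustration, and real editors would use their own phoneme alphabet:

```python
def respell_word(notes, word, new_phonemes):
    """Return a copy of the note list with every occurrence of `word`
    re-spelled with `new_phonemes` (a bulk pronunciation change)."""
    return [(syl, new_phonemes if syl.lower() == word else ph)
            for syl, ph in notes]

# Invented phoneme spellings, loosely ARPAbet-style, for illustration only.
song = [("hand", "hh ae n d"), ("in", "ih n"), ("hand", "hh ae n d")]
print(respell_word(song, "hand", "hh aa n d"))
```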
So there may be software out there that is more sophisticated than AceStudio in implementing a virtual singer - I have only tried one so far.
And by the way, I have looked at and discussed only software that creates a virtual singer based on melody and lyrics. This does not pertain to software that creates its own instrumentation and singers, such as Suno. I may look at it just to become acquainted, though I much prefer composing and arranging my own material. I have heard some country demo tracks coming out of Suno and they sound awfully commercial - that's very scary!