Or…. not made by humans….
I’m a music technology geek…. fascinated by what it can do to support the music production process (in the widest meaning of that term, from song writing to final mastering). However, if you have any skin in the music creation world, the odds are you will have considered the impact of the most recent technological disruptor in that space: AI and machine learning.
Me too…. Indeed, this has been a topic that almost everyone I know who works in music creation is both thinking and talking about. Like you, I suspect, I’ve taken part in – and learned things from – those conversations. So, what follows is a very personal take on the issue. Where can/should technology sit in what is, from a social perspective, one of our most significant (precious?) creative endeavours as humans: the making (and recording) of music?
It’s a complex topic with, I think, many contradictions and lots of subjective aspects. I’m not entirely sure what follows helps (I know; sorry about that) but just know that it was written primarily to help me work out what I actually thought rather than to suggest to anyone else how they ought to think….
Still, it’s an interesting topic so, if you want a perspective which you can place alongside your own thoughts and feelings, or just to see how utterly misguided my own take is, then feel free to read on. However, if you do decide to go further, then go and stick the kettle on before you start…. this turned into a long read for which I apologise in advance….
As Led Zeppelin once said…. “Ramble on”…. you have been warned 🙂
Great song…. let’s record it badly….
One generally held aim during the music recording process is to capture the best possible representation of a song. Perhaps not the only role of the recording process, but certainly a very common consideration and, often, a primary concern from the perspective of an artist and/or songwriter. This would include elements such as the most appropriate arrangement of instruments, the intended audio recording quality for each of those included instruments, the best (although ‘best’ might take on different meanings in different contexts here) performances for each of the instruments and, eventually, the final mix of all those elements that best serves the song and the performances in the way the artist envisages it.
There are lots of subjective elements within this (brief and simplified) description of most people’s intentions for the recording process. Terms such as ‘best’ or ‘most appropriate’, whether applied to audio quality of the recordings, arrangement of the instruments, the nature of the performance, or the mix, are all matters where beauty may well be in the ears of the beholder….. and all that’s before any discussion of whether the actual song is worth all the effort put into recording it in the first place….
All of which is perhaps the long way around of saying that, when we have taken the trouble to write what we think is a worthy song, we generally then do our very best to record it to a standard, and in a style, that we think presents that song as well as we are able.
Band in a room
If we go back in recording technology time, any ensemble recording of a song would have been recorded in a suitable studio room with all the musicians playing together. Get really old-school, and everyone might have been arranged around the room with a single mic capturing the performance and the sound being cut directly to disc. Hop forward a technology breakthrough or three and that might have been a few microphones into a mixer and recorded in stereo to tape…. The performance, and the arrangement of the instruments, were all captured at the same time and the band just kept going until their studio budget ran out or they produced a take that they felt captured the song at its best.

Even when we got to multitrack tape, in many (the majority?) of our favourite classic recordings from the 60s and 70s, the band may still have started with an ensemble ‘band in a room’ performance. However, each instrument could be recorded via its own microphone to its own track on the tape (albeit with some ‘bleed’ from other instruments). You could then change the balance and tonality of each instrument after the fact and, if you had sufficient tape tracks, or were prepared to punch in/out, overdubbing could be used onto that ensemble performance to correct duff notes, improve on a specific performance, or add additional elements to the arrangement. And, if you had an engineer who knew how to cut tape (a very scary prospect but the ‘norm’ back in the day), then you could cut and paste bits from one part of the song and put them in another part.

Opening DAWs
Full band recordings are, of course, still a thing. And, indeed, multi-track recordings to tape are also still a thing in some recording contexts, although the cost of the multi-track tape recorders, the maintenance challenges they can present, and the not insignificant price of a reel of 2” magnetic tape, mean this is a platform that is perhaps found only in more elite circles (err… it always was; in this format, multi-track recording always required a healthy budget).
However, while tape can still get used for creative/artistic reasons, even in those high-end recording studios where tape is available, today, digital recording is undoubtedly the norm. Indeed, even where tape is being used, a digital recording format is likely to be used alongside it, even if only in the role of ‘backup’.

And, for the rest of us (pretty much everybody!), digital recording – via our particular DAW software of choice – is our ‘normal’. There is nothing wrong with that. Modern digital audio recording systems are capable of very high fidelity, can cope with dozens (hundreds, depending on the host computer) of tracks, allow sessions to be managed, backed up, and restored with tremendous ease and, when it comes to editing the audio of a performance or two, dragging clips along the timeline is much easier (and less scary) than cutting tape. The DAWs we now take for granted – Pro Tools, Logic, Cubase, Digital Performer, Reason, Reaper, Studio One, and many others – are powerful, generally reliable, convenient and inexpensive (well, compared to those multi-track tape machines at least). The DAW – and digital audio recording in general – has democratized the recording process. Anyone who can afford a basic computer, a DAW, an audio interface, a microphone or two, a MIDI keyboard and a pair of headphones, pretty much has the basis of their own recording studio. Yes, you still have to write a good song, and you still have to learn the basics of the recording and mixing process, but the need for a mega budget to be allowed access to a capable recording system has been largely removed. Your mileage may vary but, to me at least, that seems like a good thing….
Is technology that ‘fixes’ things ‘wrong’?
So, let’s go back to the ‘make the recording the best we can…’ idea for a moment. While our DAW software lets us record our initial audio (or MIDI) performances in ways that mirror the recording process with tape, software is undeniably more powerful and feature-rich when it comes to the editing stage of the overall recording/arranging/mixing process.
Those editing capabilities have also evolved over time and the range of options now available to music producers/artists to help them work towards that ‘best’ end result is almost limitless. Technology has always been used to ‘improve’ how a recording sounds (better microphones, better preamps, better mix consoles, faster tape speeds, etc.) so this is not a new phenomenon…. but it’s interesting to observe how this ‘make it better’ technology has evolved and how different elements of it are perceived, especially the reactions that come when a new option appears for the first time. That new technology may ‘fix’ a ‘problem’ for the artist/producer, ultimately resulting in a recording that they feel is ‘better’ as a result…. but, on occasions, not everyone is comfortable with the fact that technology is providing some of these ‘fixes’.
The obvious example here is pitch correction (or, more generally, any type of pitch manipulation) applied to vocals. Ultimately, this is an audio editing tool and, as it says on the tin, it allows the user to change the pitch properties of a performance compared with how it was originally delivered by the vocalist. While initially only able to shunt the central pitch of notes back onto a simple scale, this technology is now capable of much more. And, of course, it can be used on other instruments as well as the human voice.
Products such as Celemony’s Melodyne, graphical mode in Antares Auto-Tune, and even the pitch manipulation tools within many top-tier DAWs (Cubase’s VariAudio, for example) allow you to adjust almost any aspect of a vocal’s pitch. Yes, you can center notes onto a scale but, equally, you can finesse all sorts of other details such as vibrato, transitions into and out of notes, and the amount of pitch drift that’s present within a sustained note. And, in skilled hands, all of these tasks can be done in a way that makes the individual edits very transparent and free of digital artefacts.
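For the curious, the core idea behind that scale-based note centering is simple enough to sketch out. The little Python snippet below is purely illustrative – it is not how Melodyne, Auto-Tune or VariAudio are actually implemented – but it shows the basic arithmetic: a detected pitch gets pulled towards the nearest note of a chosen scale, with a ‘strength’ control deciding how hard it is pulled.

```python
# Illustrative sketch only: snap a detected pitch towards the nearest scale note.
import math

A4 = 440.0  # reference tuning

def hz_to_midi(hz):
    """Convert a detected frequency to a (fractional) MIDI note number."""
    return 69 + 12 * math.log2(hz / A4)

def midi_to_hz(midi):
    return A4 * 2 ** ((midi - 69) / 12)

def snap_to_scale(detected_hz, scale_degrees=(0, 2, 4, 5, 7, 9, 11), strength=1.0):
    """Pull a detected pitch towards the nearest note of a scale.

    scale_degrees: semitone offsets within the octave (default: C major).
    strength: 0.0 leaves the pitch untouched, 1.0 snaps it fully onto the scale.
    """
    midi = hz_to_midi(detected_hz)
    base_octave = math.floor(midi / 12)
    # candidate target notes in this octave and its neighbours (handles octave wrap)
    candidates = [(base_octave + o) * 12 + d for o in (-1, 0, 1) for d in scale_degrees]
    target = min(candidates, key=lambda n: abs(n - midi))
    corrected = midi + (target - midi) * strength
    return midi_to_hz(corrected)

# A note sung roughly 40 cents flat of A4 (440 Hz):
print(round(snap_to_scale(430.0, strength=1.0), 1))  # fully corrected, back to 440.0
print(round(snap_to_scale(430.0, strength=0.5), 1))  # gently nudged part of the way
```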

That’s not to say experienced ears can’t detect such processing though, especially when it is perhaps overdone within a single performance. It’s perfectly possible to use this technology to remove enough of the ‘human’ contained within the original performance to allow some listeners to ‘hear’ that slightly unnatural (too perfect?) character within the edited performance. And, as we are all aware, sometimes that overcooked pitch manipulation is used as a deliberate creative choice….
So, is this technological fix a good thing or a bad thing? How do you feel about pitch manipulation technology? Is it ‘acceptable’, ‘unacceptable’ or does it ‘depend upon the context’? If it’s just used to rescue a couple of duff notes in what is otherwise a brilliant vocal performance? If it’s used to transform a vocal recording that’s just plain bad (someone who simply can’t hold any sense of a melody) into an acceptable (pitch perfect) lead vocal? If it’s just routinely applied to tighten a decent vocal to make it a bit more ‘polished’? Does anything go, or nothing go? What’s OK for the tech to fix and what’s not?
Are there any lines?
I’ve used pitch manipulation here as an example simply because it is a technology that has generated lots of discussion over the years and it’s become something that, outside the world of recording technology, even casual music enthusiasts are familiar with. However, it’s far from the only possible example that you might consider….
That’s especially the case if you frame the discussion in a context of ‘this technology has been used to create a performance that’s more accomplished than the performer is actually capable of’. Yes, pitch manipulation can be done for creative reasons when used as a deliberate effect or when, after the recording is complete (and the singer no longer available), you decide to change the melody of a phrase (the original was well sung but changing the pitch of a note or two creates a more satisfying, musically pleasing, end result). But, more often than not, it is used to improve some technical aspects of the singer’s pitching…. that is, to make the vocal more ‘in tune’. The technology is being used to make the performance better than in the original recording…. or, if you want to be less charitable, to make the singer’s performance better than the singer was actually capable of delivering in the recording session.
The pros and cons of pitch correction have generated lots of discussion, and it is undoubtedly a topic that also generates very different views amongst musicians (and some music consumers). However, taking that broader framing of the question described above, how about some other examples where ‘this technology has been used to create a performance that’s more accomplished than the performer is actually capable of’?
For example, when the drum sounds in your track sound a bit underwhelming because either (a) the drummer didn’t know how to properly set up and/or tune his drum kit or (b) the person making the recording didn’t (or couldn’t; unsuitable recording space, limited microphone options, etc.) make a very good job of the actual recording process? Well, amongst other things, we can start solving those technical problems with, for example, the use of the humble EQ, making the drums sound better than the musicians or recording engineers were actually able to at the recording stage. Or we can replace the recorded drums with samples….
Or what about when the bass player’s dynamics are all over the shop and the performance recorded has wildly varying volume? Well, some suitable compression might provide a technical fix that gets us a long way towards solving the problem and giving us a better performance than the player was actually able to produce.
Or when your guitar player (read ‘any musician’) couldn’t play the syncopated, funky, chord part in time to the groove. Apply a little audio quantizing and you can end up with a performance that is simply better than the original in that it locks into the rest of the groove more tightly.
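If you want a feel for what audio quantizing is doing under the hood, the toy sketch below shows just the timing maths: detected onsets get pulled towards the nearest grid position by a chosen amount. Real DAWs then have to stretch or crossfade the audio around those onsets, which is glossed over here; the function and figures are purely illustrative.

```python
# Illustrative sketch only: nudge detected onsets towards the nearest grid line.
def quantise_onsets(onsets_sec, bpm=120.0, grid=1 / 16, strength=1.0):
    """Move each onset time (in seconds) towards the nearest grid position.

    grid: note value of the grid (1/16 = sixteenth notes).
    strength: 0.0 = untouched, 1.0 = hard-quantised.
    """
    beat = 60.0 / bpm       # length of a quarter note in seconds
    step = beat * 4 * grid  # length of one grid division in seconds
    moved = []
    for t in onsets_sec:
        nearest = round(t / step) * step
        moved.append(round(t + (nearest - t) * strength, 3))
    return moved

# A slightly loose rhythm part at 120 bpm, quantised 80% to a sixteenth-note grid:
print(quantise_onsets([0.02, 0.27, 0.49, 0.76], strength=0.8))
```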

These sorts of examples (and there could be lots of others) are interesting because, in the main, they are technical solutions that pretty much everyone has available to them in a modern DAW and are regularly used to improve (rescue?) a performance and/or recording. Sometimes, faults with a recording are unavoidable or go unnoticed at the time; in those cases, a modest technical fix is a pragmatic solution. The point of using these very routine examples is that most of us simply (a) take these kinds of technical tools totally for granted and (b) hardly give them a second thought when we need to apply them. But, under the hood, you could argue that they are being used with the same aim in mind as our pitch correction example: to create a performance within our mix that is, in some way, better than it was in its original recorded format. I guess the question here is whether there is a line to be crossed between a ‘modest’ fix and a fix that rescues something that was just plain bad in its original form?
Does this actually matter? Well, it might if that sub-par performance was the best the artist can actually do, and they then take that level of performance out into the live arena (obviously, there is a further debate to be had here)…. but, if you never see the artist in a live performance setting, that might not stop you really enjoying the final recorded mix of the song created with the help of all those technical fixes.
The point I’m exploring here is simply that technology has a long history when it comes to improving or enhancing recorded performances (either in terms of their sonic qualities or in terms of the mechanics of the performance itself) of pretty much every instrument – drums, guitars, bass, keys, strings, etc. and, yes, vocals. Maybe pitch correction generates a little more controversy because of how important vocals are in how we connect with a song? We connect with the emotions and/or feelings expressed within the vocal and lyrical content…. and we want to feel that those sentiments are genuine. And maybe that’s more difficult for some people to do if they know that the vocal has been edited within an inch of its life?
I’ll come back to this issue towards the end of the article…. but, for now, let’s ponder our own positions on where (or if?) there are technical fix boundaries that, at a personal level, represent lines that we’d rather not see (hear) crossed…. and then, from our individual perspectives, try to unpick why we place those lines where we do….
Band in a box
Let’s move on to a related area of ‘tech support’ that is now routinely used in music production: virtual instruments and, in particular, virtual instruments that, to some degree or another, include a ‘performance’ element within their feature set. That is, they not only provide you with the sound of the instrument (drum samples, for example), but also the skills of a performer (a professional drummer, for example).
Perhaps the simplest form of this type of music technological enhancement you might use in your own productions is provided by commercial loop libraries. This is also an issue that individuals can have very strong (and very different) views about. That said, it’s also an approach to music creation that has undoubtedly generated some highly creative end results and some truly iconic songs, especially in particular genres. A personal favourite on this front would be Moby but there are plenty of others. It can be a beautiful thing….
However, it’s also possible to make a cup of tea, lash a few loops together in a couple of minutes, render out a mix, and release a track across multiple streaming platforms before your beverage has cooled down. Yes, there may be contexts where a quick combination of a few loops provides a musical solution (for example, a media composer asked to hit a deadline for a 30 second ad that’s due for broadcast an hour later), but I suspect most music producers and/or songwriters would want to feel they had embedded a little more of themselves into the creative process over and above just arranging half a dozen pre-recorded loops along a timeline.
In the early days of sampling technology, loop libraries became an established part of the music technology marketplace. Indeed, they still are, although loops are now also found embedded within some sophisticated virtual instruments that allow the user to manipulate them with ease, processing and blending them in ways to make them their own. For most music producers, this is perhaps a less ‘problematic’ way to incorporate loop-based elements into their own creative process.
However, the aspect of this ‘performance’ music technology that interests me more than the pros/cons of using loops – and I think represents another of those grey areas where our personal positions often require a little introspection – is not where that technology is used to improve on a performance provided by a real musician, but where it is used to actually create the performance in the first place. In short, where technology allows you to create a musical performance that you might not have been able to create for yourself without the technology.
There are some very obvious examples here and instruments such as Toontrack’s EZdrummer (or Superior Drummer), EZbass and EZkeys would be prime candidates from the current crop. All three of these products (all of which I’ve been lucky enough to review for SOS at one point or another, and I think are amazing, by the way) are really three tools in one….
First, they are powerful sample-based virtual instruments that you can play. So, for example, if you are a drummer, and you hook up your electronic drum kit to trigger EZdrummer, you can create a drum performance based just on EZdrummer’s sounds. I say ‘just’, but those sounds are exceptionally good and, in very many recording contexts, they may be drum sounds that are vastly superior to those that you might be able to achieve in the same recording space with an actual acoustic drum kit and a selection of mics. In this mode of use, EZdrummer is simply giving you great drum sounds while the drummer is delivering the performance. It’s providing a very impressive technical solution to the (often difficult) problem of recording an acoustic kit.

Second, even when you have played the original drum part yourself, as the data is captured as MIDI, EZdrummer lets you rearrange those performances. In this sense, EZdrummer is providing you with a drum tool that can help while you are still in the writing phase of a project. You can capture some basic ideas for drum parts (perhaps for a verse, chorus, bridge, etc) and then use these as the vocal, guitar, bass, etc. ideas are layered in to develop your final song structure. You can also edit the drum performance after the fact with any of the MIDI editing tools EZdrummer (or your DAW) provides. The software is, in essence, providing a tool to assist your song writing process. Incidentally, this ‘song writing assist’ role is even more obvious in something like EZkeys as this contains an excellent chord sequence feature that can help you explore scale/key based chord changes as you work on your song idea.
Third – and this is the one where the conversation may get into one of those slightly grey areas – EZdrummer is also a virtual drummer. That is, if you buy into Toontrack’s mightily impressive drum groove packs (collections of drum grooves played by some of the world’s very best drummers and captured as MIDI), you can arrange a top-notch drum performance by simply choosing, and arranging, a sequence of those drum grooves to suit the musical context (song) you need a drum part for. This is, in essence, not a million miles away from using audio-based drum loops, but here the loops contain MIDI data that will then trigger drum sounds within EZdrummer.
And, in an extra layer to this role, EZdrummer can listen to an example of another instrument within your song idea (for example, a guitar part), and search through your MIDI groove collection to find a good musical fit. Repeat this process for each song section and EZdrummer will find you drum parts that can be sequenced to create a full performance. Yes, you can then edit the MIDI data as required, but the software is undoubtedly doing at least part of a turn as a ‘session drummer’.
This is a process that is taken even further in EZkeys and EZbass with the Bandmate feature, allowing you to get all three members of the EZ trio to feed off each other in terms of suggested performances. They are all doing an element of session musician duties when used in this way.
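Just to make the ‘listen and suggest’ idea a little more concrete, here is a deliberately naive sketch of how grooves could be scored against another part. To be clear, I have no insight into how Toontrack actually implement any of this; the function names and the tiny groove library below are entirely made up for illustration.

```python
# Illustrative sketch only: score candidate grooves against the rhythm of another part
# by comparing which sixteenth-note slots in the bar carry a hit.
def to_slots(onsets_in_beats, slots_per_bar=16, beats_per_bar=4):
    """Convert onset times (in beats, within one bar) to a set of grid slots."""
    return {int(round(t * slots_per_bar / beats_per_bar)) % slots_per_bar
            for t in onsets_in_beats}

def similarity(part_onsets, groove_onsets):
    """Fraction of active slots on which the two patterns agree (Jaccard index)."""
    a, b = to_slots(part_onsets), to_slots(groove_onsets)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# A hypothetical little library of grooves, expressed as onset times in beats
groove_library = {
    "four_on_floor": [0, 1, 2, 3],
    "pushed_funk":   [0, 0.75, 1.5, 2, 2.75, 3.5],
    "half_time":     [0, 2],
}
guitar_part = [0, 0.75, 1.5, 2.75, 3.5]   # a syncopated rhythm part
best = max(groove_library, key=lambda name: similarity(guitar_part, groove_library[name]))
print(best)  # the groove whose accents line up most closely with the guitar
```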
The one person band
So, what we have here is a music technology that allows you to create highly competent musical performances for important instrument groups where, as the artist, songwriter or producer, you have limited (or no) capabilities when it comes to playing those actual instruments. These are session musicians in software.
Toontrack is not, of course, the only option on this front and nor were they the first. PG Music’s Band In A Box software is a longstanding example that will generate full MIDI performances based upon your chosen style/genre. UJAM offer ‘virtual musicians’ in their Virtual Guitarist range (and also offer drum, bass and keyboard equivalents). Native Instruments have a whole series of Session Guitarist titles that cover various acoustic and electric guitar options, bass and ukulele. Logic Pro also offers a number of virtual musicians within its current feature set.

Nor is it limited to instruments normally associated with pop, rock or electronic styles. If you are interested in classical music, or in scoring music for film, TV or computer games, not only are you spoilt for choice in terms of playable virtual instruments full of excellent orchestral sounds, but dip into products such as Sonuscore’s The Score or EastWest’s Hollywood Orchestra, and you will find some very clever technology that, with you supplying just a few simple chords as input, will translate that into full performances across all the orchestral sections without you needing to be able to play a violin, cello, tuba, or flute. It’s mightily impressive technology and, where budgets don’t exist for a real orchestra, you are undoubtedly hearing it regularly in broadcast TV, film and game scores.
This ‘performance assistance’ exists in other forms also. Take, for example, the Chord Pads feature in Cubase, or the Chords Tool now offered in Kontakt. Both of these systems (and there are plenty of other examples of this) let you trigger a full chord voicing from a single MIDI note. Regardless of how limited your own piano keyboard skills might be, you can easily experiment with chord sequences – simple or complex – and create a very effective performance with rhythm and dynamics, all just using a single finger.
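The mechanics behind this sort of single-finger chord triggering are simple enough to sketch. The pad-to-chord mapping below is purely illustrative (it is not Cubase’s or Kontakt’s actual implementation), but it shows the principle: one incoming note gets expanded into a full voicing.

```python
# Illustrative sketch only: expand a single trigger note into a full chord voicing.
CHORD_SHAPES = {
    "maj":  [0, 4, 7],
    "min":  [0, 3, 7],
    "maj7": [0, 4, 7, 11],
    "min7": [0, 3, 7, 10],
}

# Map trigger notes (e.g. the bottom octave of a keyboard) to a root note + chord quality
PADS = {
    36: (60, "maj"),    # C2 pad -> C major around middle C
    37: (62, "min"),    # C#2 pad -> D minor
    38: (65, "maj7"),   # D2 pad -> F major 7
    39: (67, "min7"),   # D#2 pad -> G minor 7
}

def expand_pad(trigger_note):
    """Return the list of MIDI notes to send for a single pad press."""
    root, quality = PADS[trigger_note]
    return [root + interval for interval in CHORD_SHAPES[quality]]

print(expand_pad(36))  # [60, 64, 67] - a C major triad from one key press
```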
For solo artists, songwriters or music producers working out of their own home or project studios, the attraction of these tools – these technological options – is obvious. Not only do you get a quality of sounds that might match those made in a top-tier studio (as opposed to a corner of your spare room) but you can also create professional sounding musical performances worthy of a respectable session musician for virtually any instrument you might care to include, regardless of whether you can play that instrument or not. What’s not to like?

Well, there is the question about whether these kinds of tools have put real musicians out of a job. I’ve no idea and I’m not aware that anyone has actually done the research over the years to quantify that in any meaningful fashion (I suspect it would be a very difficult task). What the rise of these virtual instruments as ‘session musicians’ has done, however, is given individual artists additional creative options in even the most modest of studio environments. It’s another step in making the music creation process more accessible. I guess that’s one reason why almost anyone can now make, and release, their own music…. Of course, while access to these amazing tools might mean you have the potential for great sonic quality, and even musical performances that exceed your own capabilities on every instrument you include within your arrangement, they are no guarantee that you can write a great song, or mix/master it to its full potential….
It bears repeating though…. these kinds of virtual instrument sounds are not just for aspiring artists in their home studios. Tools like Superior Drummer or Hollywood Orchestra or any number of Kontakt-based virtual instruments can be found in many high-end studios around the world and, when they can help, will be used by top-level professional artists, producers and composers. They are an accepted part of the music production process, from the home studio up to the top-tier pro-level environment. The virtual instrument – and the virtual session musician – genie is well out of the bottle….

…. and, to a very large extent, because their use is so widespread, the arguments about whether their use is good or bad have been left behind. They are just another technological tool that is there to be used to help you make the recording of your music as good as you can possibly make it.
Aye-up, AI….
All of which gets us rather nicely to the latest of the ‘technological assistants’ to arrive within music production: AI and/or machine learning.
Like most tech-nerds, I’ve done my fair share of reading about AI and machine learning, but I’d most certainly not claim any great expertise or understanding, much less any ability to predict where it might (very quickly) take us, whether in music technology or elsewhere. But it definitely throws us some interesting ‘good vs evil’ questions to consider.
Let’s start with an ethical one where I think we might find some general consensus: using AI technology to generate ‘new’ stuff, where the creators of the original ‘stuff’ used as the training data set for the AI have (a) not given permission for the ‘stuff’ to be used as training data and (b) not been compensated in any way for the use of that ‘stuff’ as training material. Overall, I think most folk would place this in the ‘not ethical’ (and probably not legal) category. Personally, I think this is pretty straightforward…. but I appreciate that the laws that might be used to control this kind of data use (particularly where the appropriate law(s) may fail to cross international borders) are probably years behind the curve.
What about technological tools where any AI/machine learning element is based upon ethically sourced data and where the original creators of those training data have granted permission for its use and, if required, been correctly compensated? Is it then OK to use that sort of technology in a workflow/process if it can help you realise a new creative product or idea?
You could obviously ask this kind of question in almost any field of activity (film making, teaching, writing books, business strategy, financial strategy, etc.) but, as there are already plenty of examples of music technology software that is advertised as ‘smart’, or as including an AI element, let’s look at some examples from our own sphere of interest….
Actually, if we stick with products from some of the already established development teams here, I think we can be fairly sure that we will avoid the ethical issues concerning the source of the training data…. so the questions will again boil down to how you feel about technology enabling you to do a specific task within the music creation/recording process better than you could do that task without the technology. Is it OK to lean into this technology to (in some way) improve your music production?
So, an example or three…. What about Sonible’s highly regarded ‘smart’ series of plugins (although, actually, the vast majority of their plugins, including those in the ‘pure’ series, could be included here)? For example, smart:comp 2 provides a sophisticated modern take on compression that includes spectral capabilities (the compressor is sensitive to the tonal balance of your audio, not just its overall volume level). You can set all of the plugin’s controls manually should you wish, and it is a very capable and powerful compressor that can produce very effective results. However, you can also let smart:comp 2 audition your material and then, based upon its AI-based capabilities, it will automatically suggest settings for each of its parameters to provide an appropriate style of compression. The bottom line here is that, with little or no knowledge of how compression (let alone spectral compression) works, you can get an appropriate compression starting point for either the individual tracks, or the busses, within your mix. Yes, you still need to be able to trust your ears and use your judgement about the merits of the plugin’s suggestions within the context of your overall mix, but you don’t necessarily need to understand a compressor to get a suggestion that may well be all you need.
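Sonible don’t publish how smart:comp 2 arrives at its suggestions, so treat the following as nothing more than a toy illustration of the general ‘audition, then suggest’ idea: even crude level statistics (peak level versus average level) can be turned into a plausible starting threshold and ratio. Every number and rule of thumb in this sketch is my own invention, purely for illustration.

```python
# Illustrative sketch only: derive starting compressor settings from simple level stats.
import math

def suggest_compressor_settings(samples):
    """Suggest a threshold (dBFS) and ratio from the peak and RMS level of the audio."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak_db = 20 * math.log10(max(peak, 1e-9))
    rms_db = 20 * math.log10(max(rms, 1e-9))
    crest = peak_db - rms_db          # how 'spiky' the material is
    # spikier material -> threshold set further below the peaks, higher ratio
    threshold_db = peak_db - crest * 0.6
    ratio = 2.0 + min(crest, 18.0) / 6.0
    return round(threshold_db, 1), round(ratio, 1)

# A fake, fairly dynamic signal (just numbers for illustration)
fake_audio = [0.02, 0.05, -0.4, 0.9, -0.1, 0.03, 0.6, -0.02] * 100
print(suggest_compressor_settings(fake_audio))
```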

Sonible are not alone obviously. The same sort of ‘audition and then suggest’ technology is built into iZotope’s Ozone, Nectar and Neutron. Users who might not regard themselves as skilled mastering engineers, mixing engineers or vocal mixers can now get instant suggestions on how to do all those tasks from software. Do the results beat those that might be crafted by hand by a more experienced user of the same tools? I’d argue probably not, but it may get the less experienced much closer than they otherwise would or could.
There are other examples where the ‘intelligent’ element of a plugin processor attempts to solve a more specific (and more technical?) problem. For example, iZotope’s Catalyst series of plugins, at one level, provide low-cost options for reverb, delay and saturation. Taking Aurora as an example, this is a straightforward reverb plugin. The reverb itself is perfectly useable and sounds good, and it’s accessibly priced. However, compared to some other reverb plugins, it’s certainly not the most sophisticated tool out there, nor does it emulate a classic hardware reverb of days gone by. However, it does do one thing that’s rather clever and helps solve a problem that reverb itself can create; it ‘intelligently’ ducks frequency ranges within the generated reverb signal if they are likely to mask key frequencies within the original dry signal. The result? Well, used on a vocal for example, it means your reverb shouldn’t ever mask the vocal itself. It uses its ‘smarts’ to solve a problem reverb can create without the user having to find a solution by some other means (perhaps lots of EQ and automation on the reverb return?). The principle applied here strikes me as rather a good one…. and a way of leaning into AI/machine learning that usefully deals with a technical problem before it can even become a problem.
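Again, this is not iZotope’s algorithm (I have no idea what’s actually under Aurora’s hood), but the principle of frequency-selective ducking is easy enough to sketch: where the dry signal is strong in a band, pull the reverb down in that band so it can’t mask the vocal. The thresholds and amounts below are made up for illustration.

```python
# Illustrative sketch only: per-band ducking of a reverb signal against the dry signal.
def duck_reverb_bands(dry_band_db, wet_band_db, threshold_db=-30.0, max_cut_db=9.0):
    """Per frequency band, attenuate the reverb when the dry signal is strong.

    dry_band_db / wet_band_db: lists of band levels in dB for one analysis frame.
    Returns the adjusted reverb band levels.
    """
    ducked = []
    for dry, wet in zip(dry_band_db, wet_band_db):
        if dry > threshold_db:
            # the louder the dry signal is above threshold, the more we cut the reverb
            cut = min(max_cut_db, (dry - threshold_db) * 0.5)
            ducked.append(wet - cut)
        else:
            ducked.append(wet)
    return ducked

# One analysis frame: the vocal (dry) is strong around the mid bands
dry = [-50, -40, -18, -12, -25, -45]   # dB per band, low to high
wet = [-30, -28, -26, -24, -26, -30]
print(duck_reverb_bands(dry, wet))
```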

There is another category of these sorts of ‘intelligent adaptive’ plugins that many readers will be familiar with, the best known examples of which are perhaps Gullfoss and Soothe. These operate in real-time (albeit with some modest latency to allow the plugin to peek ahead in time to evaluate the properties of the incoming audio and for the processing involved) and can automatically refine the tonal balance of an audio signal. They are, in essence, dynamic EQ plugins that attempt to just make your audio sound ‘better’. Now, if you can find a clear account of the built-in decision making used by these plugins (it’s perhaps most opaque in Gullfoss), then do feel free to share…. but, under the hood, there must be some sort of machine learning/AI that’s ‘trained’ these plugins to recognise what sounds ‘good’ to the human ear in a musical context and to then apply suitable processing to the audio to nudge it in that direction. I’ve used both of these plugins myself and think they are great, but I’m not entirely convinced I know how they do what they do. They must be doing something useful though as this type of ‘let the software deal with it’ spectral balancing is now something that’s almost as routinely used in all tiers of the music production world as drum samples and pitch correction.
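Purely as speculation on my part, here is one toy interpretation of that ‘nudge the audio towards a pleasing balance’ idea: per band, measure how far the signal sits from an assumed target curve and apply a gentle corrective gain. Whether Gullfoss or Soothe do anything remotely like this, I honestly couldn’t tell you; the target curve and numbers below are invented for illustration.

```python
# Illustrative sketch only: gentle per-band gains that move a signal towards a target balance.
def dynamic_eq_gains(band_levels_db, target_curve_db, max_move_db=3.0, speed=0.5):
    """Per band, compute a small gain that nudges the signal towards the target curve."""
    gains = []
    for level, target in zip(band_levels_db, target_curve_db):
        error = target - level                          # how far off-balance this band is
        move = max(-max_move_db, min(max_move_db, error * speed))
        gains.append(round(move, 2))
    return gains

# e.g. a mix frame that's a little dull on top and pushed around the upper mids
measured = [-12, -14, -16, -10, -24]    # dB in five broad bands, low to high
target   = [-12, -14, -17, -15, -20]    # an assumed 'pleasing' spectral tilt
print(dynamic_eq_gains(measured, target))
```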

Fixing vs creating: the other AI grey area
The ethical issues surrounding the use of training data aside, there is another grey area in the use of AI when it comes to an activity such as music writing, production and recording. It essentially reflects the (now infamous) social media quote that I think originated with Joanna Maciejewska along the lines of….
“You know what the biggest problem with pushing all-things-AI is? Wrong direction. I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.”
Now, I know that the merits of this statement have been debated and critiqued by lots of folks and, if you get personal enjoyment (or a livelihood) out of doing laundry or washing dishes, you may be less keen to see AI take your role. However, from the perspective of someone whose passion lies in a creative task, it perhaps rings true. Primarily, we might want AI to ‘fix’ things for us, rather than to ‘create’ things for us. To some extent, the software tools mentioned above do that; they employ AI/machine learning to do that kind of ‘fixing’. Sonible’s smart:comp 2 ‘fixes’ my dynamics issues on a bass or vocal track, Neutron ‘fixes’ my vocal track to provide suitable EQ, dynamics and ambience choices to help it shine within my mix, and Gullfoss sits happily on my master buss quietly ‘fixing’ the overall tonal balance of my mix in the background. Yes, you can easily argue that these are tasks that someone could (should?) learn how to do for themselves, manually dialling in the processing options required. And, yes, that is how many people still do it, especially if their role in the music production creative workflow is, for example, as a mix engineer. However, if you are a self-producing solo artist, responsible for every single stage of the music creation, performance, recording, mixing, mastering and release processes, then maybe you will be happy to get some of this type of technical ‘assistance’ from software, especially if it means you can focus your own efforts in areas that you find most creatively satisfying and feel lie at the center of your skill set?
Music by prompt
Perhaps the most obvious (extreme?) position in the ‘fix vs create’ question is posed by AI tools that will generate a full music production based upon a written prompt from the user. This process doesn’t require any specific musical skill or ability although that’s not to say there is not some skill to be developed in constructing a suitable prompt as a starting point to direct the AI process.
As I’m writing this (a few weeks into 2025), the latest website causing a stir on this front is Riffusion. This is currently free to use and, in less time than it takes to open your DAW, configure a blank project, and plug in your guitar, a short Riffusion prompt will generate you a complete song, with vocals and lyrics if required, in almost any possible style you might like. And, if you just do it a few times as an experiment, the results are surprisingly (scarily?) good….
I’ve highlighted Riffusion here for a number of reasons that I think are interesting although there are other websites that offer similar full song outputs. The first – and I think quite important – reason is that the site currently makes it very clear that songs created using Riffusion can NOT be used in any commercial context (for example, released on Spotify or sync’ed into a film or TV soundtrack). I’m not sure what that perhaps says about the source of the training data used by Riffusion’s AI, but I do think this declaration is a good thing given the obviously challenging legal/copyright issues and uncertainties that currently surround materials created by AI. You can use it for personal exploration, but not for commercial gain.
Second – and this is based upon my own limited use of the site, so it might be a false impression – the songs it generated from the prompts I experimented with were surprisingly good. Yes, critical listening would soon let you identify some duff lyrics (although you can get the system to take an additional pass at specific lyrical lines until you get something you like) but these were songs that, if they appeared in a Spotify playlist in a specific genre, might not pop out as ‘AI created’ to the casual listener. Equally, sat as background music in the soundtrack of a low budget reality TV show – with or without vocals – I’m not sure anyone watching the show is going to switch channels because the music sucks. Riffusion is already hitting a quality bar that could make it useful if commercial use was permitted.

Third, Riffusion seems to come up with song structures that ‘work’. That’s maybe not so surprising given that I suspect it will have ‘learned’ about song structures from training data containing lots of successful songs from popular genres. That those ‘winning’ song structures are then spat out by the engine is perhaps to be expected…. but it’s still impressive that the AI is capable of generating such sensible arrangements given that lots of songwriters often agonize over finding the magical ‘hit formula’ for a specific song. Even if I didn’t wish to embrace this particular area of AI ‘creativity’, I’d love to know what the same AI training data set could teach me (or any aspiring songwriter) about writing a great song structure….
Fourth – and I think this is perhaps the most scary part of my personal Riffusion user experience – if the first issue highlighted above gets resolved to everyone’s legal satisfaction, and the site’s output is then cleared for commercial use, if you are currently in the business of producing stock or production music (which is some of what I do), then you are going to have to listen to what this type of AI based music creation tool can achieve because it may well become a competitor. That said, there will be financial considerations here also because I would imagine that, as soon as full commercial clearance is available, services like Riffusion will no longer be free to use. I’d make an educated guess that, at present, any of the ‘free to use’ versions of these types of sites are essentially loss-leader investment/development ventures; when they are ready and able to (legally) make money for their users, they will (quite obviously) need to generate revenue to support a sustainable business model for the development team behind them.
OK, so let’s put the legal issues aside for a minute (and assume that websites like Riffusion will find an approach that is legally and ethically sound to source training data) and get back to the more qualitative question about the desirability of music that is entirely the creation of AI-based computer models. Where are our personal positions on this concept where the computer technology is going way beyond ‘fixing’ and – bar the prompt – does the complete ‘create’ process?
With my ‘technology nerd’ hat on, I can’t help but be fascinated by what AI-based systems such as Riffusion can ‘create’, although I’d qualify that by using ‘generate’ as a better description of what’s happening. However, with my musician/music producer hat on – and especially as someone who earns at least a portion of their income from the music I create – I find it undeniably scary. And, while I might be amazed by the technology, I’d still have no problem placing music creation (music generation) entirely by AI in my own ‘bad thing’ box. I suspect the majority of musicians – people who have invested a lot of time and effort in developing their personal skill set for music creation – would feel the same way.
Unfortunately for us, while our displeasure may be noted, I think this may be a tide we can do little to resist in the short term. First, I think it is obvious that, right about now, casual listeners to music (including us musicians when we are listening ‘casually’) are not really going to be able to tell whether the music that’s playing in their supermarket, or popping up on Spotify after their playlist has run its course, or in the background of a low-budget reality TV show is ‘real’ (composed, performed, recorded and mixed by humans) or AI (generated by a computer based upon a prompt). The quality is already very close to being undetectable and the technology is only going to improve….
Second, I’m not sure our displeasure as musicians will be universally shared by non-musicians. There will be non-musicians who might be music users or consumers who see the ability to generate music from a prompt as a positive and genuinely useful thing. For example, there will undoubtedly be music consumers who like the idea of generating a set of 50 ‘original’ songs in a specific genre for a party playlist.
However, perhaps more interesting is the aspiring film maker or games developer. They are, themselves, doing creative artistic work in their own field of interest. They may be doing it with little or no budget. They may have no expertise in (or the time/desire to learn about) music composition, recording and mixing. But they absolutely do need music for their film or game project. The option to generate music to their own brief (prompt) in an instant and for little or no cost, would undoubtedly be an attractive proposition. Would we, as musicians, rather they hired a composer/music producer, or paid a sync license for some of our existing music from a music production library or stock site? Yes, we would…. And, yes, they might well prefer that route also in an ideal world where they had a budget and the time to do it…. but AI generated music might be the only viable choice they have access to.

But, before we get too judgmental about our (hypothetical) fellow film/game creative bypassing our hard-earned musical skill set in favour of AI, don’t forget that there have already been lots of bands and musicians who have used AI film or animation sites to generate video content to support their musical output. The musicians in this scenario are, essentially, doing exactly the same thing in reverse: using AI to fulfil a secondary need for their ‘made by humans’ musical endeavours because they don’t have either the skills or the budget to generate a music video for their latest single release. AI generated video might provide one means to tick that ‘video’ box in terms of the overall strategy and promotion of their musical project. And, just as with AI generated music, AI generated video is getting impressively good. Might we prefer to find a videographer to do the job in a brilliant, creative, original (human) way? Yup, we might…. but not everyone is in a position to pursue the ‘human’ route.
I think what I’d want to draw out from this bit of the discussion is that the relative merits of AI generated ‘art’ (music, film, images, text, etc.) are very much wrapped up in individual perspectives. What may seem like a threat or problem to one group may be seen as an opportunity and solution by another. Maybe, when the sci-fi B-movies of the past – those that portrayed a time when computers threaten to take over the planet and dispose of humans altogether – eventually do come to pass, we might find the collective view of AI generated ‘creativity’ hardens into a universally negative stance. Until then, I think we are in for a somewhat bumpy ride of very divided opinion….
The voice in the machine
So, technology can pop its head up in a whole raft of ways in the composing, recording, mixing and mastering stages of music production. Many of these are well established and – now – just generally accepted, while others divide opinion. When it comes to music, how much ‘human’ do we need for it to qualify as ‘music creation’? How much ‘technology’ can we involve before it becomes ‘music generation’? There would seem to be some white, some black…. and a whole lot of grey.
We have, therefore, covered a lot of ground here (in a sort of ‘chuck everything at the wall and see what sticks’ sort of a way). However, before I try to draw some of these thoughts together, I’d like to consider one final – and I think quite interesting (thought-provoking) – example: vocal synthesis.
Vocal synthesis – that is, the generation of sung vocals by computer software – has been around for a long time. Probably the most well-known option for this is Vocaloid. I reviewed an early iteration of Vocaloid in the March 2004 issue of SOS, and I remember the interest this product generated when first released. Vocaloid was (and still is) a virtual instrument. You created a MIDI-like melody line in a grid editor, added your own lyrics to each of the notes, and could use a number of the synth engine’s parameters to add certain elements of vocal expression. The engine took a considerable time to then generate your synthetic vocal (and to re-generate sections every time you made an edit), the editing was perhaps best described as painstaking and, at that stage, the results were not going to fool anyone into believing it was actually a human voice.
However, step forward to today, and the technology underlying vocal synthesis has come a very long way. Vocaloid is still around but there are a couple of competitor products – ACE Studio and Synth V – that are perhaps currently leading the way. In addition, we can see the evidence of how good vocal synthesis has become in the output of AI music generation sites such as Riffusion mentioned earlier. The vocal performances within these prompt-driven musical offerings are quite remarkable given the unbelievably complex task they are trying to perform. The human ear is incredibly sensitive to all the small details that are present within the human voice (sung or spoken); if listening critically, even a general listener may be aware something is ‘off’ with a vocal performance, even if they are not quite sure what that ‘off’ is.
In order to explore the issues and/or questions this technology raises, we need a little bit of background as to how it works. For those new to vocal synthesis, I’ll use Dreamtonics’ Synth V as my reference here as it’s the software I’m most familiar with. Synth V can run as a stand-alone application or as a plugin virtual instrument within a suitable DAW. As far as I understand it, Synth V’s engine does employ AI, and that AI is built into its synthesis processing. All that processing is done locally on your host computer (not in the cloud).
Like Vocaloid, you get a timeline-based editing environment that is very similar to a MIDI grid editor, and you can enter notes (or import MIDI files), edit their length, pitch and timing. You can then add lyrical content attached to those notes. This is flexible enough to let you spread a specific word out over multiple notes and control how the syllables fall on each note segment. Thankfully, the editing tools for this phase of the process are pretty slick. However, you can also import an audio file containing a sung vocal, and Synth V will then attempt to extract both the note and lyrical content from that audio and turn it into the note/lyric data the engine can use (and which you can edit). This works remarkably well and can save a lot of time in the initial note/lyric editing stage.
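I don’t know how Dreamtonics implement that audio-to-notes step, but conceptually it’s a pitch-tracking and segmentation problem. Here’s a very rough sketch, assuming you already have a frame-by-frame pitch estimate of the guide vocal: group consecutive frames into note events wherever the (rounded) pitch changes or the voice stops. Everything here is illustrative rather than a description of Synth V’s actual engine.

```python
# Illustrative sketch only: segment a frame-by-frame pitch track into note events.
def pitch_track_to_notes(pitch_frames, frame_sec=0.01):
    """pitch_frames: fractional MIDI pitch per frame, or None where unvoiced.

    Returns (midi_note, start_sec, duration_sec) tuples."""
    notes, current, start = [], None, 0
    for i, p in enumerate(pitch_frames + [None]):          # sentinel flushes the last note
        rounded = round(p) if p is not None else None
        if rounded != current:
            if current is not None:
                notes.append((current, start * frame_sec, (i - start) * frame_sec))
            current, start = rounded, i
    return notes

# A little pitch wobble around A4 (MIDI 69), a jump up to B4 (71), then silence
frames = [68.8, 69.1, 69.0, 68.9, 69.2, 71.1, 70.9, 71.0, None, None]
print(pitch_track_to_notes(frames))
```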

As a brief aside, the obvious question at this point might be ‘if you have already sung the vocal part, why do you need to use software to generate a synthetic vocal?’. Well, that initial sung vocal might be just a guide. While it does need to be cleanly recorded, it doesn’t necessarily need to be brilliantly sung, or delivered with a characterful performance, or even contain the final melodic and/or lyrical content. If you don’t really consider yourself a singer, but can just about hold a tune, it is a means of considerably speeding up the initial creation of the MIDI-like data (which you can subsequently edit) needed by Synth V’s engine.
The other key element of Synth V’s engine is the ‘voicebanks’. I’ve no great insight into the technicalities of their format but each voicebank is created by extensive sampling of an actual individual singer. Presumably, the AI engine is then ‘trained’ on that singer’s performances and learns how they pronounce syllables, how they transition pitches between notes, how their vibrato works on sustained notes and other ‘human’ elements of how that singer actually sings. The initial sampling also includes performances in different vocal styles – loud, soft, gentle, passionate, chest, breathy, overdriven, etc. – and these are then available as ‘vocal modes’ within the engine, allowing you to blend between them, and automate changes between these characteristics, along the timeline of the performance. Each of the voicebanks offers a different selection of these vocal modes depending upon the styles of the original singer.
What this means is that your MIDI-like note and lyrical content can be sung by any of the Synth V vocalists you happen to have available. You can simply switch between male or female vocalists, change the vocal style, adjust the pitch curve, change the pitch transition into/out of notes, adjust the vibrato, re-write the melody, change the lyrics, change the timing of phrases, copy a main vocal line and edit it to generate a double, harmony part or backing vocal. This is exactly the same kind of detailed editing you might do with any sophisticated virtual instrument (a multi-articulation solo violin, for example) and, after every edit you make, Synth V’s engine (very quickly on any modern host computer) simply resynthesizes the part.
There is already a good selection of voices available and 3rd-party developers are also now involved (Eclipsed Sounds are probably the best known of these). No, not every vocal style is currently represented, but the coverage is good so, whether it’s soft female pop, EDM friendly male or female voices, K-pop, show tunes, ballads, opera, soul, rock or metal, there is something that will get you in the ballpark. Oh, and just for information, while each singer’s original native language is specified, the engine can currently generate vocals in English, Japanese, Chinese and Spanish from any of the voices, and rap styles are also supported within the engine.

Importantly – given our legal/ethical AI discussion earlier – all of these singers have been properly contracted (and suitably compensated) by Dreamtonics (or the 3rd-party developers) for their work in providing what is essentially AI training data based upon their vocal skills. As a consequence, users of the software can utilize the synthesised vocals without any legal and/or ethical problems. That’s clearly a much more straightforward situation than you might be in if, for example, you use an online voice changing site to transform your own voice into that of, say, another singer (especially a well-known singer, dead or alive). The IP/copyright/legal/ethical potholes you could get into are considerable if you then put content featuring those vocals into the public domain. With Synth V’s vocals, no celebrity voices have been harmed in the process, so you are good to go on that front at least.
So much for the technical background. How good are the results? Well, you can create some suitably robotic vocals (either as a deliberate creative choice or simply because you haven’t quite got your head around what the engine can offer) but, once you have gained a little familiarity with the software, they can be truly remarkable. I’m choosing my words carefully here, but I’ve played songs to other musician friends containing Synth V generated lead vocals and, on many occasions, they have simply not noticed the vocals were not ‘real’, even when those vocals were placed in a fairly exposed form in a sparse mix. Interestingly, if you tell someone that the vocals are synthesized before they listen, then they are more inclined to listen critically to ‘spot the fake’. There are some potentially fascinating blind listening tests to be constructed here in much the same way that guitarists love a ‘real amp vs amp sim’ blind test. Anyway, in terms of Synth V’s generated vocals, ‘truly remarkable’ just about sums it up.
Whatever your personal comfort level with virtual (synthetic) vocals, the use-case scenarios for a virtual instrument such as Synth V are pretty obvious. For example, a song writer or music producer might use a Synth V vocal as a writing tool. It gives you a realistic impression of what the main vocal part might be as you work on the song idea. It lets you experiment with different melodies, lyrics and styles of delivery and types of voice. Whether you sing or not, this represents a very flexible ‘vocal drafting’ tool. And, whether you sing or not, it gives you access to different vocal styles – and different genders – from those you are personally able to sing. Equally, it lets you easily add backing vocals or harmony vocals into a production that, frankly, given their supportive role in the arrangement, very few listeners are going to identify as ‘not actually human’. Media composers needing vocals – lead, backing or harmony – in multiple styles, and that can be as easily edited as their drum, bass, piano, synth or orchestral elements, could find plenty of applications for vocal parts generated by Synth V.
Of course, given that the realism level of the output can be high (especially when you are fully familiar with how the software operates), Synth V’s output might become more than a guide to the required lead vocal for a human session singer to interpret; in certain musical circumstances, it may become the final vocal. In some musical genres/contexts, I’d have no problems believing that was possible. Take, for example, electronic pop/dance styles. Here, the lead vocals are often heavily stylised and processed. By the time you have got creative with your vocal processing in this sort of production, ‘natural’ is not really what you might be going for in your vocal sounds. Synth V’s very creditable output may be a perfectly acceptable starting point for all that creative sound design/processing.
That said, you can easily coax an incredible amount of realism out of Synth V so, providing you can fully exploit what the software has to offer, the ‘not human’ status may well be missed by the majority of your audience, regardless of the style or genre, from piano ballad to symphonic metal. And, as with other forms of generated content that, in some way, lean into AI/machine learning, that level of realism is only going to improve.
So, if we take Synth V as an example of where the technology of synthesized vocals might be, we can see that it makes use of AI/machine learning to interpret the note/melody/lyrical content that we have created and then synthesizes a vocal performance. The AI is applied to make the synthesized vocal as realistic and ‘human’ as possible, not to generate the underlying musical idea. And, that AI element has been implemented in a fashion that is legally/ethically sound. We can also see that it does not itself generate our melodies or lyrics in any way; that’s a creative process that remains with the user of the software. Equally, the user of that software can customise/automate the various vocal style options (soft, clear, passionate, tense, etc.) to change the character of the vocal delivery to fit the requirements of the song.
In short, it’s not difficult to make the case that Synth V is a virtual instrument with many parallels to other virtual instruments. For example, if we take a top-tier virtual solo violin within a sample engine such as Kontakt, the user gets to write the melodic line as MIDI note data, edit that line in as much detail as they like, change the dynamics of the delivery via automation, and change the articulations (playing styles such as sustains, pizzicato, staccato, etc.) used. And, once you have done all that, Kontakt does its best to produce a performance that is as realistic as it possibly can be, with the aim that, whether solo or within an ensemble, the listener doesn’t question that they are listening to a real violin played by a real violinist. This virtual instrument ‘fakery’ is something the vast majority of music producers and media composers do on a routine basis and without malice; it’s simply a practical solution to placing certain instruments within your project when you don’t have access to a suitable session player or the budget to employ one. And it’s accepted practice…. no drama, no fuss, even if, in an ideal world, we would all love the opportunity to get all of these performances from a capable (inspiring) human musician.
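(As an aside, for anyone who hasn’t programmed a virtual instrument this way, here’s a minimal sketch of what that ‘MIDI notes plus dynamics automation plus articulation switching’ workflow might look like if you built the part in code rather than in your DAW’s piano roll. It uses Python’s mido library; the particular key-switch note and the use of CC1 for dynamics are assumptions made purely for illustration, since every sample library maps these things differently.)

```python
# Illustrative sketch only: build a short violin phrase as MIDI data.
# Assumptions (not from the article): note 24 (C0) acts as a key switch that
# selects a 'sustain' articulation, and CC1 (mod wheel) controls dynamics.
from mido import Message, MidiFile, MidiTrack

mid = MidiFile(ticks_per_beat=480)          # 480 ticks = one quarter note
track = MidiTrack()
mid.tracks.append(track)

# Key switch: a very low note the listener never hears, used to pick the articulation.
track.append(Message('note_on', note=24, velocity=1, time=0))
track.append(Message('note_off', note=24, velocity=0, time=10))

# A three-note phrase, with the mod wheel (CC1) shaping how intensely each note is 'played'.
phrase = [(69, 60), (71, 85), (72, 110)]    # (MIDI note number, dynamics value 0-127)
for note, dynamics in phrase:
    track.append(Message('control_change', control=1, value=dynamics, time=0))
    track.append(Message('note_on', note=note, velocity=90, time=0))
    track.append(Message('note_off', note=note, velocity=0, time=480))  # release one beat later

mid.save('violin_phrase.mid')               # load into your sampler of choice to audition
```

Whether you do this in code or by drawing notes into a piano roll, the point is the same: the human supplies the notes, the dynamics and the playing style, and the instrument engine turns all of that into something that sounds like a performance.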
But….
And, in my own experience at least, there is a but…. because, despite what seems to me to be a reasonable (?) technical comparison between a virtual instrument capable of producing realistic human solo violin (or drum kit, or guitar, or piano, etc.) performances and a virtual instrument capable of producing realistic human vocal performances, once you tell a listener that they have just been humming along to a synthesized vocal in the song you played them, it often dramatically changes their perception of the song. And it does that in a way, and to a degree, that is not the case if they suddenly discover the drums are sampled, or the guitar is virtual, or the whole orchestra is ‘fake’. That’s not to say said listener might not be both impressed and amazed at the technology and the realism of the vocal performance…. but their response to the performance and/or song changes.
So why is that? What is it that drives this kind of reaction to a virtual vocal performance that does not occur with other virtual instrument elements within a piece of music? Well, perhaps the most obvious explanation lies in the role that a vocal plays in a song. Much as we might engage with a song because of a cool guitar riff, epic bass line, or addictive drum groove, more often than not, it’s the vocal performance that takes center stage for the listener. The lyrical content conveys the meaning of the song and provides the message that may resonate with the audience, whether that message is ‘time to party’, ‘my heart is broken’ or ‘the world is beautiful’. In addition, it’s the vocal performance that then puts the appropriate emotional slant on the message to deliver it effectively, whether that’s through a soft and gentle whisper or a raging roar of vocal grit. In short, the lead vocal is the emotional heart of the song; if it lacks authenticity then the recording of the song is unlikely to really engage the listener.
So, if the vocal lacks authenticity or emotion, it will probably be missing that magic ingredient that keeps listeners pressing repeat. This might be why hearing that a vocal has been pitch corrected (or simply being told that it has) is such a big deal for some listeners; it suggests fakery has been involved at the emotional core of the song. If you can’t hear that processing, or are not aware that it has been used, then you either buy into the performance or you don’t, because it engages you or it doesn’t…. but your view is not tainted by being able to detect (or being told about) an element of technological input within the production of the performance. It’s 100% ‘human’ (at least, as far as you are aware) and you like it, or you don’t, on that basis.
And, if I had to hazard a guess, then maybe the same factors come into play with synthesized vocals. If you can hear the synthesis (or you have been told about it), then it’s more difficult to buy into whatever emotional message the vocal is intended to deliver. With the lead vocal playing such an integral/important role within the emotional impact of most songs, it’s perhaps not surprising that even a hint of ‘not human’ is going to undermine the whole listening experience.
So, synthesized vocals are a whole lot of things at once. At their best, they are an amazing technical feat, and they may well go undetected by many (most?) listeners if auditioned without prior knowledge of their synthesized nature. They can be created in an entirely ethical fashion, without any IP/legal issues, if the underlying AI/machine learning is done with full permission from the training data sources. The software user remains in control of the majority of the creative elements, responsible for the vocal melody, lyrics and style of delivery. The ‘generated’ element exists entirely within how those melodic, lyrical and style ideas are used by the software within its synthesis engine to actually create the sound of the voice. That includes the syllable pronunciation and the pitch flow and/or vibrato through the sequence of notes, although all of these elements can also be adjusted in great detail by the user, much like you might adjust parameters within a conventional synth to finesse the sound. In so many ways, it’s just another virtual instrument that we can now use, just as we use virtual drums, virtual bass, virtual pianos, virtual synths, and virtual orchestras.
Until, as listeners, we realise – or are told – that what we are listening to is not human…. and then our listening experience takes on a very different perspective despite the actual audio remaining unchanged. It’s not the sound itself that’s the issue; it’s our response to how that sound was made that changes our perception of it.
There is a potential inconsistency here that’s really interesting. We are used to (and comfortable with) a non-guitarist singer adding a virtual guitarist’s performance to a recording of a song they have written…. but many people feel differently about a non-singing guitarist adding a virtual singer’s performance to a song they’ve written.
Back to the band in a room?
OK, as I did at the start of this piece, I’ll apologise again for the long – and somewhat rambling – nature of this discussion piece. It’s been a long read so, if you made it this far, then you will be glad to know that it is nearly time to put the kettle on again (or pour yourself something stronger)….
However, in considering the role of some current technological tools – and, in particular, tools that include AI/machine learning elements – I think we also get into the much deeper (and much broader) waters of how we use technology at every stage of the music production process. And that’s not a recent phenomenon; AI is just the latest technology to appear within the sphere. From writing, to performing, to recording, to mixing and to mastering, there have always been technological developments intended to ‘fix’ issues within our human-led music making/creation workflow, to assist us with elements of the process that are difficult or, more recently, simply to perform parts of the task for us because the technology can now do that specific task faster and/or better than we can.
The questions about this ‘tech support’ are, therefore, much broader, and much older, than the more recent debates about AI/machine learning, and they hinge on how much we value just how ‘human’ the music we listen to actually is. And, seen from that broader perspective, AI is just one more data point on everyone’s individual continuum of acceptable to unacceptable.
At the extremes of this continuum, I suspect we will find some broad consensus. Only the purest of the purists is probably going to baulk at the use of multitrack recording, or EQ to finesse the overall tonal balance of a recorded sound, or compression to tame the huge dynamic range of a bass guitar so it doesn’t overwhelm other important elements within the music. Equally, at the other end of the spectrum, I’d hazard a guess that most music creators (if not everyone else) would place music that is entirely AI generated into the ‘unacceptable’ category, even if it is produced using ethically sourced training data and is, therefore, not in breach of any copyright laws. If the only unique element required to generate a complete song is a prompt consisting of a few words, I’m pretty sure that’s not going to qualify as ‘music composition’ in most people’s understanding of the term…. although the absence of true creative value is not to say that such music couldn’t easily have commercial value in the right context; a listener may still enjoy it and a TV production may still feel it’s a great fit for their soundtrack needs.
However, as you move away from those extremes, both the black and the white transition very quickly into grey…. areas where there will undoubtedly be differences of opinion. I’ve tried to tease out some of the more obvious examples of this within the discussion above (and, at times, tried to poke the occasional bear with a stick just to make a point). Pitch correction is a very obvious example, and the devil is undoubtedly in the detail of exactly how it is used in any given situation. That said, some would prefer it if we consigned all pitch correction to Room 101.
I would emphasize as I wrap up, however, that all the mainstream examples I’ve mentioned here are tools that sit in my own studio and get regular use. For example, Toontrack’s EZ line of virtual instruments, NI’s Kontakt, and Cubase’s Chord Pads are built into various project templates that sit at the heart of many of my musical projects. That said, I’ve deliberately avoided being very specific about my own black, white and grey opinions here; your mileage may well be very different from mine and that’s absolutely fine with me. Equally, we may both have a very different take from your average music consumer, who just wants great tunes to listen to and a great performance if they go to see the artist perform live; if they can tick both those boxes, they probably don’t sweat the technical details involved in how either of those things is achieved. Although, even there, not many punters will want to see a live show where everyone on stage is miming to a backing track that has been heavily edited by technology.

While technology that can correct imperfections within a music production (a tech fix) has its share of grey areas, technology that actually generates elements within those music productions is just as divisive of opinion (and maybe even more so). Software that can generate chord progressions, melodic lines, or drum beats to serve as inspiration for your next song, for example. Or maybe virtual instruments that can generate a full guitar performance in almost any musical style based upon no more than the input of a few MIDI chords (chords perhaps generated by that chord sequence generation software just mentioned)? Or software that does a similar job in constructing a full orchestral arrangement from that same MIDI chord sequence? Or software that listens to your guitar part (real or generated) and then suggests suitable drum patterns or bass lines to sit alongside it? Or software that generates lead vocals that are realistic enough to replace the most ‘real’ thing in almost any song; the human voice. Or, ultimately, software that generates a complete song when prompted by just a few words….
If the legal/copyright uncertainties surrounding the latter do become resolved in a way that means music generated by AI is not going to get anyone sued, then maybe the collective position on the contribution that technology ought to make (as opposed to ‘can make’) may well shift. Those of you with long memories and a fondness for the music of Queen may well remember the ‘no synthesizers were used in the making of this album’ statements that appeared on some of their earlier album sleeves. I’ve seen various quotes from members of the band describing the rationale behind these ‘no synthesizers’ statements but I do wonder if we might not see something similar coming with AI? A “No AI technology was used in the creation of this music….” kind of thing?
The rule book of creativity
There is a classic music producer joke/meme – one most of you will probably already be familiar with – that, I think, provides a neat way to summarize the issues this discussion has attempted to unpick…. I’ve no idea who originally wrote it, and I’ve seen it in multiple forms over the years, but the version below captures the essence of it…. It’s easy to poke holes in, but it still makes me smile….
“I thought using loops was cheating, so I programmed my own loops using samples. I then thought using purchased samples was cheating, so I recorded real drum hits to make my own samples. Then I thought that perhaps programming patterns was cheating, so I learned to play the drums. Then I wondered if using the drums I had purchased from a store was cheating, so I started to make my own. Then I thought using premade drum skins was cheating, so I bought a goat, killed it, and skinned it to make my own drum skins. Then I wondered if that was also cheating, so I bred my own goat from a baby goat…. How’s the music making going? Well, thanks for asking, but I haven’t made any music lately, what with all the goat farming.”

Maybe, if AI generated music becomes a significant component of what stuffs our streaming platforms, advertising, and game, TV and film soundtracks, and a higher value gets placed upon music in a live performance context, our recorded music process might get back to a place similar to those old-school ‘band in a room’ recordings; all the musicians and musical parts recorded as if it was a totally live performance, but in a controlled recording studio environment to ensure pristine audio quality. This may become (again) the ultimate way in which the most human of our musical content is created and captured. No editing to the grid, no comping of takes, no pitch correction, and no AI; just a little instrument balancing on the mix console and the very best performances the musicians can give. It would, undoubtedly, be expensive, and perhaps somewhat elitist, as a process/format…. but the results would also be very human. To some listeners – those prepared to pay a premium to get as close as possible to music with maximum human and minimum technology – that might be the most important thing.
Writing this discussion piece has been something of a cathartic experience for me. It has also let me examine (and question) my own views on this complex subject. I’m not here to tell you what you should think, but maybe reading it has encouraged you to consider your own opinions on the topic? If so, I hope that’s felt helpful. We might all develop our own personal guidelines in this regard but, outside the reach of the actual law (IP, copyright, etc.), should musical creativity be subject to rules? I sincerely hope not….
Whatever your personal comfort level with leaning into all the types of technology available to support your music making process, good luck with your own music making journey and I hope that process brings you a creative outlet, and a form of expression, that lets you share something about the human in you with others who might listen to the music you create.
Finally (I can hear you sighing!), it’s time for me to ‘Ramble off’…. so, however you choose to do it, enjoy and embrace your own personal creative process, and go and make the very best music you possibly can.