Stolen Songs
On AI training data, cultural IP, and the specific harm the debate keeps leaving out
The debate about AI and music is starting to take a more concrete shape. Training data was taken without consent. Royalty pools are being diluted. Artists are being impersonated. Copyright frameworks are being negotiated in real time. These are real harms affecting real people, and the conversation about them matters. But it tends to leave something out.
Not all music is equivalent in what it means, what it carries, and what happens when it is taken. The song you wrote last year and uploaded to Spotify is yours in a straightforward legal sense. The harm when it is used without your consent to train an AI model is real and has a clear remediation path: consent frameworks, licensing agreements, compensation. The harm is economic and legal and addressable, even if it has not yet been adequately addressed.
Some music carries obligations that are not captured by copyright law and cannot be restored by a licensing payment. Some music is ceremonial. Some is language material. Some is part of a system of knowledge transmission that operates through specific protocols about who may hear it, who may reproduce it, and in what contexts. Some music belongs to a community in a way that Western intellectual property frameworks, built on the assumption of individual authorship and the commodification of creative works, have never been equipped to recognise or protect.
When AI systems are trained on publicly accessible audio, they are trained on all of it. Including the recordings that should not have been accessible in the forms they were made accessible. Including the ceremonial music uploaded without community authorisation. Including the language recordings made without proper consultation. Including the cultural material that exists on digital platforms because of the relentless digitisation of everything, and the inability of that infrastructure to distinguish between what communities wish to share and what they wish to protect.
APRA AMCOS and Canada's SOCAN issued a joint statement in March 2026 explicitly including Indigenous Cultural Intellectual Property in their framework for AI copyright reform. Their stated aim is 'ensuring the songs, stories, languages and knowledge of First Nations peoples are respected and not harvested by AI without consent'.(1) Australia has declined to introduce a copyright exception for AI training and is exploring practical licensing frameworks. This places it ahead of the UK's initial position, which would have defaulted to use, and alongside Canada in treating creator consent as the starting condition rather than an opt-out mechanism.
The policy position is the minimum, and it is coming late in the day. The practical reality is that the training datasets that already exist were built before any of these frameworks were in place, and the music in those datasets has already been stolen and used. The Suno/Udio settlements with Warner and Universal start to establish compensation frameworks for future licensing.
They do not address what was taken in the past. It’s hard to know whether they will, or even can. They do not establish mechanisms for identifying when specific recordings are present in specific training datasets. They do not include Indigenous communities as parties to the negotiation, because those communities are not the rights holders under copyright law, and the settlements operate through copyright law.
This is not a problem unique to AI. The long history of cultural appropriation in the music industry — the adoption of Indigenous, Black, and other minority musical forms by commercial interests that did not compensate or credit the originators — predates AI by centuries and is not a problem that AI created. AI accelerates and scales what was already happening, which is a way of making it more visible and more urgent without making it categorically new.
What is categorically different about AI is the mechanism of extraction. When a commercial recording artist adapted a traditional song into a pop hit in 1965, the adaptation was audible. You could hear the relationship between the source and the derivative. The attribution was often absent and the compensation was usually absent, but the relationship was traceable. When a generative music model is trained on a corpus that includes ceremonial recordings, the relationship between those recordings and the model's outputs is not audible, not traceable, and not attributable. The training process digests everything it encounters and produces outputs that do not visibly connect to any specific source. The harm has occurred before any particular output is generated. The harm is in the training.
In 2025, Germany's GEMA demonstrated that Suno's AI generates outputs 'misleadingly similar' to original compositions in melody, harmony, and rhythm, similar enough to constitute copyright infringement under German law. The case establishes that the training process produces outputs that are legally and measurably connected to specific source material. The implication for the broader question of training data and Indigenous cultural material is that the 'fair use' defence, the argument that AI systems merely learn from rather than reproduce copyrighted works, is weaker than it was assumed to be.
Sony has developed technology capable of identifying the underlying music used in AI-generated tracks, making it possible for rights holders to trace which original recordings influenced a given output and seek compensation.(2) This is meaningful. But it benefits rights holders who are parties to the Sony rights management system, which does not include most independent artists and does not include Indigenous communities whose relationship to their cultural material does not operate through commercial rights frameworks.
The infrastructure for addressing the AI training data problem is being built by and for the entities with the most commercial leverage in the existing system. That is predictable. It is also not sufficient. The communities most vulnerable to the specific harm of having ceremonial, linguistic, and cultural material harvested without consent are the communities with the least commercial leverage, the weakest position in the negotiation, and the most at stake in the outcome.
A platform committed to majority human-created, original music made by its members has a different relationship to this question than a major streaming service negotiating AI training data licensing with major labels. The cooperative structure — governance by member artists, including First Nations member artists — creates at least the structural condition for those communities' knowledge and protocols to be part of the decision-making about how their material is handled.
The debate about AI and music has a shape, and that shape, as ever, tends to centre the concerns of the people most visible in the debate. The songs that were taken from communities that did not give consent, that carry obligations the training process cannot read, that exist in datasets that are still being used to generate music that no court has yet examined for its relationship to those specific sources — those songs are part of the debate whether or not they appear in it.
The policy arguments about consent and compensation are necessary, and they are not yet sufficient. Start from consent. Ask whose consent is missing. Go from there.
Notes
1. APRA AMCOS / SOCAN joint statement on AI copyright and Indigenous Cultural Intellectual Property, March 2026. https://www.apraamcos.com.au/about-us/news-and-events/socan-joint-statement-ai
2. Nikkei Asia, 'Sony Group tech can identify original music in AI-generated songs'. https://asia.nikkei.com/Business/Technology/Artificial-intelligence/Sony-Group-tech-can-identify-original-music-in-AI-generated-songs
If you've made it this far, you probably care about where music is headed.
So do we — that's why we built something different. The Pack Music Co-operative is Australia's first musician-owned streaming platform: cooperative-governed, human-curated, and built on the radical premise that the people who make the music should own the infrastructure that distributes it.
Join the Pack — become an early adopter member, support our crowdfunding campaign, or lend your voice as an Ambassador: 👉 packmusic.au/join-the-pack
Back the campaign — every dollar goes directly toward getting us to launch: 👉 crowdfunding.startsomegood.com/thepackmusiccoop
Read our story — where we came from, why we built it, and what we believe: 👉 packmusic.au/who-we-are
Say hello — we genuinely want to hear from you: 👉 packmusic.au/contact