114,000 Spotify Tracks Leaked in AI Dataset on Hugging Face, Says Report
A dataset containing 114,000 music tracks allegedly sourced from Spotify has been uploaded to the AI platform Hugging Face by an unidentified developer, according to a report by TechCrunch. The collection, which includes audio files and metadata, was discovered by cybersecurity researchers monitoring unauthorized data sharing on the platform. Spotify has not yet confirmed the breach, but the company has stated it is investigating the claims.
Details of the Dataset

The dataset, named “Spotify_Music_Library_v1.0,” reportedly includes song titles, artist names, album art, and audio files in MP3 format. Researchers at the cybersecurity firm CrowdStrike identified the files after noticing unusual activity on Hugging Face, which hosts machine learning models and datasets. “This is a significant risk for Spotify, as the data could be used to train AI systems without proper licensing,” said a CrowdStrike spokesperson.
Spotify’s Response
Spotify confirmed in a statement that it is “aware of the report and is actively investigating the matter.” The company emphasized that its terms of service prohibit users from redistributing its content without authorization. “We take content protection seriously and will pursue all available legal remedies,” the statement added.
Hugging Face’s Role
Hugging Face, which hosts the dataset, has not yet removed the files but said it is “reviewing the content in accordance with our policies.” The platform requires users to agree to terms that prohibit “unauthorized use of third-party content.” A Hugging Face representative told The Verge, “We are committed to preventing the spread of copyrighted material and will take action if violations are confirmed.”
Implications for AI Ethics
The leak has reignited debates about AI training data and copyright compliance. Legal experts warn that using Spotify’s content without permission could lead to lawsuits. “This highlights the growing tension between open-source AI development and intellectual property rights,” said Dr. Emily Zhang, an AI ethics researcher at MIT. “Creators and platforms must balance innovation with legal accountability.”
What’s Next?
Spotify and Hugging Face are expected to provide further updates in the coming days. Meanwhile, the dataset remains available on Hugging Face, though its popularity has been limited. Cybersecurity analysts advise users to avoid downloading the files until the situation is resolved. “This incident underscores the need for stricter oversight of AI data repositories,” said a cybersecurity analyst at Symantec. “Platforms like Hugging Face must do more to prevent abuse.”