From Life Moments to Songs

Think about the moments that defined your year. Maybe it was a big win at work, a trip that changed your perspective, or even a difficult experience that taught you something important. Every moment—joyful or challenging—carries an emotion worth remembering.
What if you could hear those moments as songs?
That's exactly what I built: a Year Soundtrack Generator that transforms your life moments into personalized AI-generated songs. Before we dive into the code, here's what we're building: soundtrack.codeanding.com/en
Why ElevenLabs Music API?
Earlier this year, I participated in a hackathon where we received ElevenLabs credits. Most people know them for voice cloning and text-to-speech, but they also have a Music API that generates complete songs with vocals. With the year ending and credits still available, I wanted to build something meaningful before they expired.
ElevenLabs stood out because:
- It generates music with actual vocals (not just instrumentals)
- The API is straightforward to integrate
- It supports multiple languages for lyrics
Here's what a basic request looks like:
```typescript
const response = await fetch('https://api.elevenlabs.io/v1/music', {
  method: 'POST',
  headers: {
    'xi-api-key': this.apiKey,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: request.prompt,
    music_length_ms: Math.min(Math.max(durationMs, 3000), 300000), // 3s to 5min
    model_id: 'music_v1',
    force_instrumental: false, // Allow vocals based on prompt
  }),
});
```
The key parameters are:
- prompt: A text description of what you want the song to be about
- music_length_ms: Duration between 3 seconds and 5 minutes
- model_id: Currently `music_v1`
- force_instrumental: Set to `false` to include vocals
Credits-wise, expect around 500 credits per minute of generated audio.
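That rate makes cost easy to estimate before you generate anything. A quick sketch; `estimateCredits` is my own helper, and the 500-credits-per-minute figure comes from my own usage rather than an official rate card:

```typescript
// Rough cost estimate: ~500 credits per minute of generated audio.
// The rate is from my own usage, not an official rate card.
function estimateCredits(durationMs: number): number {
  return Math.ceil((durationMs / 60_000) * 500);
}
```

By this estimate, a single 45-second moment runs about 375 credits, and a maximum-length 5-minute track about 2,500.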
The Challenge: Turning Text into Music
Now comes the fun part - how do you tell an AI to make a song about "I got promoted at work"?
My first attempts were... not great. When I just passed the user's moment description directly, the AI would generate very literal lyrics like "I went to the office and got a promotion." Not exactly the emotional soundtrack I was going for!
I needed a smarter approach. Here's my 3-layer prompt system:
Layer 1: Emotion → Music Theory
First, I map emotions to actual music characteristics. A happy moment should sound different from a triumphant one:
```typescript
const EMOTION_TO_MUSIC: Record<string, string> = {
  happy: 'upbeat, major key, bright instrumentation, energetic tempo around 120-140 BPM',
  sad: 'melancholic, minor key, slow tempo 60-80 BPM, piano and strings, emotional',
  nostalgic: 'warm, analog feel, vintage production, reverb-soaked, mid-tempo',
  triumphant: 'epic, orchestral swells, powerful drums, anthemic, building crescendo',
  peaceful: 'ambient, soft, acoustic, gentle, relaxing, 70-90 BPM',
  energetic: 'high energy, driving beat, electronic elements, 130+ BPM, punchy',
};
```
Layer 2: Genre → Style Modifiers
Next, users select their preferred music style. Each genre has specific production characteristics:
```typescript
const STYLE_MODIFIERS: Record<string, string> = {
  pop: 'catchy hooks, polished production, radio-friendly pop',
  rock: 'electric guitars, drums, raw rock energy',
  'lo-fi': 'lo-fi hip hop, vinyl crackle, mellow beats, chill',
  electronic: 'synthesizers, electronic beats, modern EDM production',
  'j-rock': 'Japanese rock, anime soundtrack style, visual kei influences',
  'k-pop': 'Korean pop, bright synths, catchy hooks, polished K-pop production',
  // ... 10 more genres
};
```
Layer 3: Language → Vocal Direction
Since the app supports Spanish, English, and Portuguese, I also direct the AI on vocal style:
```typescript
const LANGUAGE_VOCAL_DIRECTION: Record<string, string> = {
  en: 'English vocals, clear pronunciation',
  es: 'Spanish vocals, Latin influence',
  pt: 'Portuguese vocals, Brazilian or Portuguese influence',
};
```
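The three lookups then get resolved into the description strings that feed the final prompt. Here's a sketch of that glue; `resolvePromptLayers` and the fallback defaults are my own, and the maps are abbreviated to one entry each:

```typescript
// Abbreviated copies of the three lookup layers
const EMOTION_TO_MUSIC: Record<string, string> = {
  happy: 'upbeat, major key, bright instrumentation, energetic tempo around 120-140 BPM',
};
const STYLE_MODIFIERS: Record<string, string> = {
  pop: 'catchy hooks, polished production, radio-friendly pop',
};
const LANGUAGE_VOCAL_DIRECTION: Record<string, string> = {
  en: 'English vocals, clear pronunciation',
};

// Resolve each layer with a fallback so an unknown emotion, genre,
// or language still yields a usable prompt (defaults are my own).
function resolvePromptLayers(emotion: string, genre: string, language: string) {
  return {
    emotionDesc: EMOTION_TO_MUSIC[emotion] ?? 'expressive, emotionally rich',
    styleDesc: STYLE_MODIFIERS[genre] ?? 'contemporary, well-produced',
    vocalLang: LANGUAGE_VOCAL_DIRECTION[language] ?? 'English vocals, clear pronunciation',
  };
}
```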
The Secret Sauce: Artistic Transformation
Here's what I learned after experimenting with different prompts: the model tends to generate very literal lyrics. If your moment is "Beach vacation with friends," you'll get lyrics about sand and waves.
The fix? Explicit instructions for poetic, metaphorical writing:
```typescript
const prompt = [
  styleDesc,
  emotionDesc,
  vocalLang,
  `Song about: "${safeTitle}"`,
  `Core emotion to convey: ${emotionContext}`,
  `Context for inspiration (transform into poetry, NEVER copy literally): ${safeDescription}`,
  'IMPORTANT: Write artistic, metaphorical lyrics using imagery and symbolism',
  'Focus on the emotional journey and transformation, not literal events',
  'Use universal themes like light/dark, rise/fall, storms/calm to express feelings',
  '30-45 second track with vocals, clear intro and outro',
].join('. ');
```
The key phrases that changed everything:
- "Transform into poetry, NEVER copy literally"
- "Artistic, metaphorical lyrics using imagery and symbolism"
- "Universal themes like light/dark, rise/fall, storms/calm"
Now instead of "I went to the beach," I get lyrics about "waves carrying yesterday's promises" and "salt-kissed memories fading into golden horizons."
Multi-language Support
For this to work in Spanish and Portuguese, I also had to think about cultural context. The LANGUAGE_VOCAL_DIRECTION map doesn't just specify the language; it adds cultural flavor:
- English: Clear pronunciation for accessibility
- Spanish: Latin influence for rhythm and warmth
- Portuguese: Brazilian/Portuguese influence for that saudade feeling
What I Learned
After building this, here are my key takeaways:
What works well:
- Emotional mapping creates genuinely different-sounding tracks
- The "artistic transformation" instruction dramatically improves lyrics
- 45-second tracks are the sweet spot for individual moments
- Multi-language vocals work surprisingly well
What could be better:
- The model sometimes doesn't follow style instructions closely (especially for niche genres)
- Very specific prompts can confuse the model
- No control over specific instruments or arrangements
Cost breakdown for a typical soundtrack:
- 5 moments × 45 seconds = ~3.75 minutes of audio
- ~1,875 credits per soundtrack
- Plus optional 2-minute "global" track combining all moments
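In code, the breakdown above is just arithmetic on the ~500-credits-per-minute rate (again, a figure from my own usage, not an official rate card):

```typescript
// Reproduce the cost breakdown above. CREDITS_PER_MINUTE is my
// observed figure, not an official rate card.
const CREDITS_PER_MINUTE = 500;
const momentCredits = 5 * (45 / 60) * CREDITS_PER_MINUTE; // 5 moments x 45 seconds
const globalCredits = 2 * CREDITS_PER_MINUTE;             // optional 2-minute global track
const totalCredits = momentCredits + globalCredits;
```

With the global track included, a full soundtrack comes to roughly 2,875 credits.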
Try It Yourself
I hope you found this guide helpful! If you want to try generating your own year soundtrack, check out the live app: soundtrack.codeanding.com/en
The ElevenLabs Music API documentation is here: ElevenLabs Docs
I'd love to hear your thoughts - have you experimented with ElevenLabs Music API? What creative uses have you found for AI-generated music?
Let's keep coding and learning together!
This is part 1 of a 2-part series. In part 2, I'll cover how I used Supabase for authentication, database, and audio file storage.



