Link to website: https://trumpini-bot.replit.app/
I wanted to make a bot that responds without text, using images or sound instead. Inspired by the dialogue in video games like “Don’t Starve Together” and “Animal Crossing”, where characters speak through sounds rather than words, I decided to create a trumpet bot that replies to messages with trumpet sounds. For a quick prototype, I used AI-generated sounds, though I’m not sure this is the best choice, as I explain below.
https://www.youtube.com/watch?v=Yz9LPwsO9x0
https://www.youtube.com/watch?v=S7-njGsKmTI
Most bots build their character through their content (Twitter bots, for example). Since I have less control over the AI bot’s output, I focused on giving it a strong visual identity: an illustrated avatar that expresses a chill jazz vibe. Rather than using an existing platform, I plan to showcase it on my own website, where I have more control over the visuals.
For the code, I started from an example by Max Bittier. I added a hidden prompt that is combined with the user’s input and sent to MusicGen on Replicate to generate the trumpet sound. (link to code)
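The post does not include the code, so the following is only a minimal sketch of the “hidden prompt + user input” idea using the Replicate Python client. The hidden prompt text, the model identifier `meta/musicgen`, and the `duration` input are assumptions (check the model page on Replicate; you may need to pin a version as `meta/musicgen:<hash>`). The helper names are hypothetical.

```python
# Hypothetical hidden prompt: the bot's real prompt is not published in this post.
HIDDEN_PROMPT = "Solo jazz trumpet, relaxed and playful, a short melodic reply to:"

# Assumed Replicate model identifier; you may need to pin a version as "meta/musicgen:<hash>".
MUSICGEN_MODEL = "meta/musicgen"


def build_prompt(user_message: str, hidden_prompt: str = HIDDEN_PROMPT) -> str:
    """Append the user's message (whitespace collapsed) to the hidden style prompt."""
    message = " ".join(user_message.split())
    return f"{hidden_prompt} {message}".strip()


def musicgen_input(user_message: str, duration: int = 8) -> dict:
    """Build the input payload; 'prompt' and 'duration' are assumed MusicGen inputs."""
    return {"prompt": build_prompt(user_message), "duration": duration}


def generate_trumpet(user_message: str, duration: int = 8):
    """Run MusicGen on Replicate. Needs `pip install replicate` and REPLICATE_API_TOKEN set."""
    import replicate  # lazy import so the helpers above work without the package

    return replicate.run(MUSICGEN_MODEL, input=musicgen_input(user_message, duration))
```

Keeping the style description in a constant the user never sees is what gives every reply the same trumpet “voice”, whatever the message says; only the tail of the prompt changes per message.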
The main problem I ran into is the harsh cut at the end of each sound: the clip stops abruptly instead of ending naturally. A better prompt, or training my own model, might fix this.
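The post does not try this, but a common way to soften an abrupt ending is a short fade-out applied after generation. Below is a stdlib-only sketch that ramps the last `fade_seconds` of the audio linearly to silence. It assumes a 16-bit PCM WAV file; if the model returns another format (for example MP3), it would need converting first.

```python
import array
import sys
import wave


def apply_fade_out(samples, sample_rate, n_channels=1, fade_seconds=0.5):
    """Return a copy of interleaved PCM `samples` whose last `fade_seconds`
    ramp linearly down to silence (the final frame becomes 0)."""
    out = list(samples)
    n_frames = len(out) // n_channels
    fade_frames = min(n_frames, int(sample_rate * fade_seconds))
    start = n_frames - fade_frames
    for i in range(fade_frames):
        gain = 1.0 - (i + 1) / fade_frames
        for ch in range(n_channels):
            idx = (start + i) * n_channels + ch
            out[idx] = int(out[idx] * gain)
    return out


def fade_out_wav(in_path, out_path, fade_seconds=0.5):
    """Apply the fade to a 16-bit PCM WAV file and write the result."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    if params.sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV")
    samples = array.array("h", frames)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    faded = array.array(
        "h", apply_fade_out(samples, params.framerate, params.nchannels, fade_seconds)
    )
    if sys.byteorder == "big":
        faded.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(faded.tobytes())
```

A linear ramp is the simplest choice; an exponential or cosine curve often sounds more natural for a decaying note, but the structure stays the same.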
I’ve realized that I’m trying to avoid “perfect AI quality” and aim instead for a more rustic, handmade style in this bot. Knowing that the sound is AI-generated seems to take away some of its magic and charm. A music generation model is the easiest solution, but it may not be the best one.
Another possible solution would be to generate a music score and play it manually.
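One reading of “generate a score and play it manually” is to turn each message into a few notes and render them with a simple hand-written synthesizer instead of a generative audio model. The sketch below is hypothetical throughout: the word-to-note mapping, the minor-pentatonic scale, and the sawtooth-like tone (a common crude starting point for brass) are illustrative choices, not the bot’s method. It uses only the Python standard library.

```python
import math
import struct
import wave

# Hypothetical scale: C minor pentatonic notes as MIDI numbers (C4 Eb4 F4 G4 Bb4 C5).
SCALE = [60, 63, 65, 67, 70, 72]
SAMPLE_RATE = 22050


def message_to_score(message, max_notes=8):
    """Turn a text message into (midi_note, beats) pairs.
    Each word becomes one note; longer words get longer notes."""
    score = []
    for word in message.split()[:max_notes]:
        note = SCALE[sum(map(ord, word)) % len(SCALE)]
        beats = 1 if len(word) <= 4 else 2
        score.append((note, beats))
    return score


def midi_to_hz(note):
    """Equal-temperament frequency, with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render(score, bpm=120, sample_rate=SAMPLE_RATE):
    """Sawtooth-like tone (first five harmonics at 1/k amplitude) with a short
    attack/release envelope so notes start and stop without clicks."""
    samples = []
    beat_sec = 60.0 / bpm
    attack = release = int(0.02 * sample_rate)
    for note, beats in score:
        freq = midi_to_hz(note)
        n = int(beats * beat_sec * sample_rate)
        for i in range(n):
            t = i / sample_rate
            env = min(1.0, i / attack, (n - i) / release)
            value = sum(math.sin(2 * math.pi * freq * k * t) / k for k in range(1, 6))
            samples.append(env * value)
    peak = max((abs(s) for s in samples), default=1.0) or 1.0
    return [int(32767 * 0.8 * s / peak) for s in samples]  # normalize to 16-bit range


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For example, `write_wav("reply.wav", render(message_to_score("hello trumpet bot")))` produces a short phrase. Because the score is explicit, the phrase always ends on a full note with a release, which sidesteps the harsh-cut problem entirely; swapping the synthesizer for recorded trumpet samples would move it closer to the handmade feel.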
Further ideas
Fine-tuning MusicGen
Transformers.js
Audio analysis to turn a voice into tones
Voice cloning with a service like ElevenLabs to train on my own voice
Human voice training: “assembly voice”
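For the “audio analysis” idea, one classic starting point is autocorrelation pitch detection: find the lag at which a short frame of the recording best matches a shifted copy of itself, and read the frequency off that lag. This is a swapped-in technique, not something the post describes; the 80 to 1000 Hz search range and the 1024-sample frame are assumptions, and real voice recordings are noisier than the pure tone this sketch is easiest on.

```python
import math


def detect_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one short mono frame by
    autocorrelation. Returns None for silence or when no clear period is found."""
    n = len(samples)
    if n == 0:
        return None
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove DC offset
    if not any(x):
        return None
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 2)
    if max_lag <= min_lag:
        return None
    # Autocorrelation for each candidate lag, plus one lag on each side for peak tests.
    corr = {
        lag: sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(min_lag - 1, max_lag + 2)
    }
    best_lag = None
    for lag in range(min_lag, max_lag + 1):
        # Only local maxima count, which skips the falling slope right after lag 0.
        if corr[lag - 1] < corr[lag] >= corr[lag + 1] and corr[lag] > 0:
            if best_lag is None or corr[lag] > corr[best_lag]:
                best_lag = lag
    return None if best_lag is None else sample_rate / best_lag


def hz_to_midi(freq):
    """Nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq / 440.0))


def voice_to_notes(samples, sample_rate, frame_size=1024):
    """Split a recording into frames and return one MIDI note (or None) per frame."""
    notes = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        freq = detect_pitch(samples[start:start + frame_size], sample_rate)
        notes.append(None if freq is None else hz_to_midi(freq))
    return notes
```

The resulting MIDI note numbers could then drive a synthesizer or sampled trumpet, so a hummed reply comes back as a trumpet phrase with the same melody.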