I’ve always considered a “real world” project the best way to learn a new tech: getting your hands dirty, being guided by (sort-of) realistic user requirements, and feeling the excitement of building something step after step, one solved failure at a time.
This is why I decided to “be inspired” by the passion one of my kids has for Sonic the Hedgehog, and to use the latest tools available in the ML and GenAI space to create a “Sonic-AI buddy” for him: a virtual chatbot, looking and acting like Sonic, that my kid can interact and converse with, safely and while having fun.
To break down the complexity of such a project, so I don’t need to learn everything about LLMs before creating something, I want to start with a very basic working prototype providing simple chatbot features (the so-called MVP), and then develop different “skills”, each of them requiring me to learn and use different ML or GenAI techs. Incremental learning and improvements.
- The “Brain” (done): the core part of the project, a text chatbot agent able to impersonate Sonic, giving my kid the feeling he can ask him basic questions and get replies coherent with the style of his favorite hero.
- Technologies: an LLM used as a chatbot, a UI to interact with it, and a system prompt to give the basic characterization.
- The “Memories” (in progress): enrich the chatbot with domain-specific knowledge of the world of Sonic and his friends, so conversations won’t only be “in the tone” of Sonic, but also relevant to the Sonic-verse.
- Technologies: a mix of better prompting, fine-tuning, RAG, or something else to give the LLM the right knowledge about the character to impersonate.
- The “Voice” (in progress): what if the bot could speak with the voice my kid associates with Sonic?
- Technologies: a custom Text-to-Speech model trained to reproduce that voice, and a speaker.
- The “Hearing” (in progress): to completely get rid of text interaction, questions should be asked via voice
- Technologies: connect the chatbot to a Speech-to-Text engine, and a mic.
- The “Eyes” (in progress): Sonic should be able to see the world around him
- Technologies: something to capture a video stream, and a multimodal LLM to process images and text.
- The “Body” (in progress): in addition to a voice, the bot should have some sort of tangible body, something that connects the different input/output sensors. I’m still unsure how to create it.
- Technologies: it could be a 3D-printed figure of Sonic, an animated character, or something else.
There is another prerequisite I want to fulfill: everything must be able to run locally, based on OSS software. I’m a little bit paranoid (let’s say mindful) about privacy, and under no circumstances should my kid’s interactions end up in a training dataset, in some internal model analysis, or anywhere else. So, privacy first.
Let’s start with “the Brain”, the main element to which all the rest can then be attached.
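To make this concrete, here is a minimal sketch of what the “Brain” MVP can look like: a local LLM served by Ollama (OSS, and it runs entirely on my machine, so it fits the privacy-first prerequisite) plus a system prompt that provides the basic characterization. The model name and the prompt wording below are illustrative assumptions, not necessarily the exact ones used in the project.

```python
# Minimal "Brain" sketch: a local chatbot with a Sonic system prompt.
# Assumes Ollama is running locally (https://ollama.com), the `ollama`
# Python package is installed (pip install ollama), and a model has been
# pulled, e.g. `ollama pull llama3` (the model choice is an assumption).
import ollama

MODEL = "llama3"  # assumption: any locally pulled chat model works

# The system prompt gives the basic characterization.
SYSTEM_PROMPT = (
    "You are Sonic the Hedgehog. Reply as Sonic would: upbeat, confident, "
    "playful, and always kid-friendly. Keep answers short and simple."
)

def chat() -> None:
    # Keep the whole conversation in the message list so the bot
    # has short-term memory of what was said earlier in the session.
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    while True:
        question = input("You: ").strip()
        if not question:
            break
        messages.append({"role": "user", "content": question})
        # Inference happens locally: nothing leaves the machine.
        response = ollama.chat(model=MODEL, messages=messages)
        answer = response["message"]["content"]
        messages.append({"role": "assistant", "content": answer})
        print(f"Sonic: {answer}")

if __name__ == "__main__":
    chat()
```

A terminal loop like this is enough for the MVP; a friendlier UI can be layered on top later, and the other “skills” can plug into the same message loop.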