
Personal AI Assistant: First Steps


<p>OK, I've been nurturing the idea of a personal AI assistant for a while now, and I've finally decided to take some steps in that direction. I'll be posting short notes on how it goes.</p>

<p>The project is open-source and will be available at <a href="https://github.com/rvnikita/rv_ai_assistant">https://github.com/rvnikita/rv_ai_assistant</a>.</p>

<p>My main complaints about the current Voice GPT are:</p>
<ol>
<li>It sometimes fails on long audio: I can dictate two to five minutes of thoughts, and then it says, “Oops, something went wrong,” and everything disappears. With a voice message in Telegram this shouldn't happen, since the recording stays in the chat even if processing fails.</li>
<li>It can't call functions. For example, I want it to listen and then call an API to run a search, check the weather, or send a message.</li>
<li>It works poorly in the subway because it requires a constant connection. Telegram, by contrast, lets me record a message with no internet and delivers it for processing once the connection returns.</li>
</ol>

<p>So, at a high level, the current architecture is:</p>

<ul>
<li><strong>High-Level Architecture</strong>
<ul>
<li><strong>Dispatcher Agent:</strong>
<ul>
<li>The main agent that handles incoming messages and decides which module/agent to use based on the request.</li>
<li>Uses GPT-4o for decision making.</li>
<li>Interacts with a RAG (Retrieval-Augmented Generation) system to leverage past interactions and module usage data.</li>
</ul>
</li>
<li><strong>Modules/Agents:</strong>
<ul>
<li>Individual modules that each handle a specific task, such as downloading Twitter videos, with more to be added later.</li>
<li>Each module has a standardized interface for inputs, outputs, and descriptions.</li>
</ul>
</li>
<li><strong>Database:</strong>
<ul>
<li>PostgreSQL for storing messages, configuration, user interactions, and feedback.</li>
<li>Alembic for database migrations.</li>
</ul>
</li>
<li><strong>Transcription Service:</strong>
<ul>
<li>Uses OpenAI's Whisper model for transcribing voice messages.</li>
</ul>
</li>
<li><strong>Environment Management:</strong>
<ul>
<li>Uses .env files to manage environment variables and API keys.</li>
</ul>
</li>
</ul>
</li>
</ul>
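<p>The dispatcher-and-modules idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names <code>Module</code>, <code>EchoModule</code>, <code>TwitterVideoModule</code>, <code>keyword_chooser</code>, and <code>Dispatcher</code> are all hypothetical, and the keyword-based chooser is only a stand-in for the GPT-4o decision step so that the example runs without any API access.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class Module(Protocol):
    """Hypothetical standardized interface: a name, a description, and a run method."""

    name: str
    description: str

    def run(self, text: str) -> str: ...


@dataclass
class EchoModule:
    name: str = "echo"
    description: str = "Repeats the incoming text back to the user."

    def run(self, text: str) -> str:
        return text


@dataclass
class TwitterVideoModule:
    name: str = "twitter_video"
    description: str = "Downloads a video from an x.com link."

    def run(self, text: str) -> str:
        # Placeholder only: a real module would fetch the video here.
        return f"[would download video from: {text}]"


def keyword_chooser(text: str, modules: Dict[str, Module]) -> str:
    """Stand-in for the GPT-4o decision: picks a module name by simple rules."""
    if "x.com/" in text or "twitter.com/" in text:
        return "twitter_video"
    return "echo"


class Dispatcher:
    """Routes each incoming message to the module chosen by `choose`."""

    def __init__(
        self,
        modules: List[Module],
        choose: Callable[[str, Dict[str, Module]], str],
    ) -> None:
        self.modules: Dict[str, Module] = {m.name: m for m in modules}
        self.choose = choose

    def handle(self, text: str) -> str:
        name = self.choose(text, self.modules)
        # Fall back to echo if the chooser returns an unknown module name.
        module = self.modules.get(name, self.modules["echo"])
        return module.run(text)


dispatcher = Dispatcher([EchoModule(), TwitterVideoModule()], keyword_chooser)
print(dispatcher.handle("hello"))
print(dispatcher.handle("https://x.com/user/status/1"))
```

<p>Passing the chooser in as a function means the rule-based stand-in could later be swapped for a call to a language model that reads each module's <code>description</code>, without touching the modules themselves.</p>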

<p>For now, everything works on my local computer, but soon I'll try to deploy it on a server so that I can play with it.</p>

<p>The current version zero does two things: it echoes back any text message it receives, and it transcribes voice messages and replies with the transcript.</p>
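<p>As a rough sketch of that behavior, assuming a simplified message type: <code>IncomingMessage</code> and <code>handle_message</code> are hypothetical names, and the transcriber is passed in as a plain function so the example runs without a real speech-to-text call. In the actual bot that function would wrap the Whisper transcription step.</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical, simplified message: either text or raw voice audio."""

    text: Optional[str] = None
    voice: Optional[bytes] = None


def handle_message(msg: IncomingMessage, transcribe: Callable[[bytes], str]) -> str:
    """Echo text messages; transcribe voice messages and return the transcript."""
    if msg.voice is not None:
        return transcribe(msg.voice)
    if msg.text is not None:
        return msg.text
    return "Unsupported message type."


def fake_transcribe(audio: bytes) -> str:
    # Stand-in for a real speech-to-text call; reports only the audio size.
    return f"<transcript of {len(audio)} bytes of audio>"


print(handle_message(IncomingMessage(text="hello"), fake_transcribe))
print(handle_message(IncomingMessage(voice=b"\x00\x01\x02"), fake_transcribe))
```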

<p>The first module I want to implement is downloading videos from x.com. It's a task I run into often, and right now I have to use third-party sites for it every time.</p>

<p>#ai #agi #rv_ai_assistant</p>

<p><a href="https://t.me/+OvImEUmA7W5mYTRi">————————— Мысли Рвачева —————————</a></p>