Personal AI Assistant: First Steps

OK, I've been nurturing the idea of a personal AI assistant for a while now, and I've finally started taking steps toward it. I'll post short notes on how it's going.

For now, the project is open source and will be available at <a href="https://github.com/rvnikita/rv_ai_assistant">https://github.com/rvnikita/rv_ai_assistant</a>.

My main complaints about voice GPT as it works today:
<ol>
<li>It sometimes fails on long audio. I dictate two, three, or even five minutes of thoughts, then it says “Oops, something went wrong,” and everything is lost. With a voice message in Telegram this shouldn't happen, because the recording stays in the chat even if processing fails.</li>
<li>It can't call functions. For example, I want it to listen and then trigger an API to run a search, check the weather, or send a message.</li>
<li>It works poorly in the subway because it needs a constant connection. Telegram, by contrast, keeps working when there's no internet underground and delivers messages once the connection is back.</li>
</ol>

At a high level, the current architecture looks like this:

<ul>
<li>High-Level Architecture
<ul>
<li>Dispatcher Agent:
<ul>
<li>The main agent that handles incoming messages and decides which module/agent to use based on the request.</li>
<li>Uses GPT-4o for decision making.</li>
<li>Interacts with a RAG (Retrieval-Augmented Generation) system to leverage past interactions and module usage data.</li>
</ul>
</li>
<li>Modules/Agents:
<ul>
<li>Individual modules that handle specific tasks, such as downloading Twitter videos, plus whatever functionality comes next.</li>
<li>Each module has a standardized interface for inputs, outputs, and descriptions.</li>
</ul>
</li>
<li>Database:
<ul>
<li>PostgreSQL for storing messages, configuration, user interactions, and feedback.</li>
<li>Alembic for database migrations.</li>
</ul>
</li>
<li>Transcription Service:
<ul>
<li>Uses OpenAI's Whisper model for transcribing voice messages.</li>
</ul>
</li>
<li>Environment Management:
<ul>
<li>Uses .env files to manage environment variables and API keys.</li>
</ul>
</li>
</ul>
</li>
</ul>
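To make the dispatcher/module split more concrete, here is a minimal Python sketch of one way a standardized module interface and dispatcher could fit together. Everything in it is hypothetical: `Module`, `Dispatcher`, and `keyword_chooser` are illustrative names, not code from the repository. In the project the routing decision is made by GPT-4o (with RAG context); in this sketch it is an injected function, and a naive keyword matcher stands in for the model so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical chooser signature: (message, {module name: description}) -> module name or None.
Chooser = Callable[[str, Dict[str, str]], Optional[str]]


@dataclass
class Module:
    """Hypothetical standardized module: a name, a description for the
    dispatcher to read, and a handler that turns input text into output text."""
    name: str
    description: str
    handler: Callable[[str], str]


class Dispatcher:
    """Routes each message to at most one registered module."""

    def __init__(self, choose: Chooser) -> None:
        # In the real project this decision comes from GPT-4o;
        # here it is injected so the sketch runs without network access.
        self._choose = choose
        self._modules: Dict[str, Module] = {}

    def register(self, module: Module) -> None:
        self._modules[module.name] = module

    def handle(self, message: str) -> str:
        # The catalog of names and descriptions is what the chooser sees.
        catalog = {m.name: m.description for m in self._modules.values()}
        name = self._choose(message, catalog)
        if name is None or name not in self._modules:
            # No suitable module: echo the message back, like version zero.
            return message
        return self._modules[name].handler(message)


def keyword_chooser(message: str, catalog: Dict[str, str]) -> Optional[str]:
    """Offline stand-in for the LLM: picks the first module whose name appears in the message."""
    text = message.lower()
    for name in catalog:
        if name in text:
            return name
    return None


def weather_stub(message: str) -> str:
    """Placeholder handler; a real module would call a weather API."""
    return "Sunny (stub)"


dispatcher = Dispatcher(keyword_chooser)
dispatcher.register(Module("weather", "Report the current weather", weather_stub))
print(dispatcher.handle("What's the weather like?"))  # Sunny (stub)
print(dispatcher.handle("hello"))                      # hello
```

Keeping the chooser injectable means an LLM-backed version could replace the keyword stub without touching the modules, and each module's `description` is exactly the text a model would read when picking one.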

For now, everything runs on my local machine, but soon I'll try deploying it to a server so I can start experimenting with it.

In the current version zero, it simply echoes back any text message and transcribes voice messages, replying with the recognized text.
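That version-zero behaviour fits in a single function. A minimal sketch, assuming a simplified message type (`IncomingMessage` is hypothetical, not a Telegram library type) and an injected `transcribe` callable that would wrap Whisper in the real bot; the fallback reply for other message types is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical, simplified stand-in for a Telegram update."""
    text: Optional[str] = None
    voice: Optional[bytes] = None  # raw audio of a voice message, if any


def reply_to(message: IncomingMessage, transcribe: Callable[[bytes], str]) -> str:
    """Version-zero behaviour: transcribe voice messages, echo text messages."""
    if message.voice is not None:
        # In the real bot, `transcribe` would call OpenAI's Whisper.
        return transcribe(message.voice)
    if message.text is not None:
        return message.text
    # Assumed fallback for stickers, photos, and other message types.
    return "Unsupported message type"


def fake_whisper(audio: bytes) -> str:
    """Offline stub used instead of a real transcription call."""
    return f"<{len(audio)} bytes transcribed>"


print(reply_to(IncomingMessage(text="hi"), fake_whisper))          # hi
print(reply_to(IncomingMessage(voice=b"\x00\x01"), fake_whisper))  # <2 bytes transcribed>
```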

The first module I want to build will download videos from x.com: it's a task I run into often, and right now I keep having to use third-party sites for it.
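Before downloading anything, the bot has to notice that a message contains an x.com post link. A minimal sketch of that detection step, assuming post URLs follow the `https://x.com/<user>/status/<numeric id>` shape and that legacy twitter.com links should be accepted too; the function name and host list are hypothetical, and the download itself is not covered here.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumed hosts: x.com is the current domain, twitter.com the legacy one.
X_HOSTS = {"x.com", "www.x.com", "twitter.com", "www.twitter.com", "mobile.twitter.com"}
URL_RE = re.compile(r"https?://\S+")


def find_x_status_urls(message: str) -> List[str]:
    """Return links in `message` shaped like https://x.com/<user>/status/<numeric id>."""
    found = []
    for raw in URL_RE.findall(message):
        url = raw.rstrip(".,;:!?)")  # drop punctuation that ends the sentence
        parsed = urlparse(url)
        if parsed.hostname not in X_HOSTS:  # hostname is lowercased, or None
            continue
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 3 and parts[1] == "status" and parts[2].isdigit():
            found.append(url)
    return found


print(find_x_status_urls("look: https://x.com/someone/status/1234567890?s=20 and https://example.com"))
# ['https://x.com/someone/status/1234567890?s=20']
```

Only the matched URLs would then be handed to whatever actually fetches the video.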

#ai #agi #rv_ai_assistant

<a href="https://t.me/+OvImEUmA7W5mYTRi">Мысли Рвачева (Rvachev's Thoughts)</a>