AI Assistant embedded in my portfolio
I integrated an AI assistant into my portfolio that answers, in real time, questions about my projects, contact details, tech stack, experience, and education.
Problem
Traditional portfolios are static. Visitors have to navigate manually, read through pages, and figure out the technical context on their own. That creates friction — especially for recruiters or developers who want to quickly understand what you built, how you think, and how deep your knowledge actually goes.
Solution
A conversational AI assistant embedded in the portfolio that lets anyone explore my work through natural language. It answers questions about projects, experience, stack, and more — in real time, using my own content as context.
How it works
- The user asks a question in natural language
- The frontend sends the query to the backend with streaming enabled
- The backend generates an embedding of the question and searches the vector database for the closest chunks by cosine similarity
- The most relevant chunks are injected as context into the prompt
- Groq streams the generated response back through the backend to the frontend in real time
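The retrieval and prompt-building steps above can be sketched in plain Python. This is a minimal illustration, not the portfolio's actual code: in the real system the similarity search runs inside the database (pgvector exposes cosine distance through its `<=>` operator), and the names `cosine_similarity`, `top_k_chunks`, and `build_prompt`, the value of `k`, and the prompt wording are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_embedding: List[float],
    chunks: List[Tuple[str, List[float]]],
    k: int = 3,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query embedding.

    `chunks` is a hypothetical in-memory stand-in for the vector table:
    a list of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Inject the retrieved chunks as context ahead of the question (assumed wording)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Doing the ranking in the database rather than in application code avoids pulling every embedding into memory on each request. The in-memory version here only makes the logic visible.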
Tech stack
- Frontend: Next.js, Tailwind CSS
- Backend: Python, FastAPI
- CMS: Payload CMS
- Embeddings: HuggingFace Inference API (all-MiniLM-L6-v2)
- LLM: Groq, Llama 3.3 70B
- Vector database: Neon DB with pgvector
- Deploy: Docker, Azure App Service
Technical decisions
- RAG instead of static context: retrieves only the fragments most relevant to each question, rather than sending the same fixed content in every prompt
- HuggingFace Inference API: avoids loading the embedding model into the backend's memory, which keeps the app within free-tier memory limits
- Groq: low latency and token-by-token streaming for a fluid conversational experience
- Async FastAPI: non-blocking backend that handles streaming efficiently
- Azure App Service with Docker: no cold start, no artificial memory limits
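To make the non-blocking streaming decision concrete, here is a minimal sketch using only `asyncio`. It assumes the response is delivered as server-sent events, which the write-up does not confirm. `fake_llm_tokens` is a hypothetical stand-in for the Groq stream, and `sse_events` and `collect` are illustrative names, not the project's code.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_llm_tokens(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for an LLM token stream."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop between tokens
        yield word + " "


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Format each token as one server-sent event as soon as it arrives.

    Tokens are JSON-encoded so that newlines inside a token cannot break
    the event framing. The "[DONE]" sentinel is an assumed convention.
    """
    async for token in tokens:
        yield f"data: {json.dumps(token)}\n\n"
    yield "data: [DONE]\n\n"


async def collect(events: AsyncIterator[str]) -> List[str]:
    """Drain the stream into a list; used only to inspect the output."""
    return [event async for event in events]


if __name__ == "__main__":
    for event in asyncio.run(collect(sse_events(fake_llm_tokens("hello world")))):
        print(repr(event))
```

In a FastAPI app, an async generator like `sse_events` could be passed to `StreamingResponse` with `media_type="text/event-stream"`, so each token is forwarded without waiting for the full answer.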
Challenges and learnings
- Getting responses to stream through the Azure proxy without being buffered
- Handling embeddings that the HuggingFace API returns as either a 1D vector or a 2D nested list
- Designing the system prompt for language detection and topic restriction
- Managing database reconnections after the instance has been idle
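The 1D vs 2D embedding issue can be handled with a small normalization step. The sketch below assumes a 2D result should be reduced to one vector by averaging its rows (mean pooling). The write-up does not say which reduction was actually used, and `to_1d_embedding` is a hypothetical helper name.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def to_1d_embedding(raw: Sequence) -> List[float]:
    """Normalize an embedding response to a flat list of floats.

    - 1D input (a list of numbers) is returned as floats.
    - 2D input (a list of equal-length rows) is averaged column by column
      (mean pooling, an assumption); a single-row input yields that row.
    """
    if len(raw) == 0:
        raise ValueError("empty embedding response")

    # Already flat: the first element is a number rather than a nested row.
    if isinstance(raw[0], (int, float)):
        return [float(x) for x in raw]

    rows = [list(row) for row in raw]
    dim = len(rows[0])
    if dim == 0 or any(len(row) != dim for row in rows):
        raise ValueError("embedding rows must be non-empty and of equal length")

    # Average each column across rows to get one vector of length `dim`.
    return [sum(column) / len(rows) for column in zip(*rows)]
```

Normalizing at the boundary means the rest of the pipeline, including the vector column in the database, only ever sees one shape.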