How I built a persona-based LLM chatbot with self-evaluation.
This is a chatbot I built with Gradio in Python and hosted on Hugging Face. It acts as me, answering questions about my skills and work experience. Code for the project can be seen here.
Read more below for a step-by-step guide on how it was built.
1. Load profile data (PDF + summary)
I read in my LinkedIn export and a short summary to give the model authentic, up-to-date context about me.
from pypdf import PdfReader

# Pull the raw text out of my LinkedIn PDF export, page by page
reader = PdfReader("me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text

# Load the short hand-written summary
with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()
Why: Grounding the model with my real experience reduces hallucinations and keeps answers consistent with my background.
2. Define the persona and system prompt
I set the chatbot to “act as” me and embedded both data sources directly into the system message.
name = "Helena Hook"
system_prompt = (
f"You are acting as {name}..."
f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
f"With this context, please chat with the user, always staying in character as {name}."
)
Why: A strong system prompt combined with my own materials gives the model guardrails (tone, audience, scope) and concrete facts.
3. Connect to the primary LLM (OpenAI)
This model produces the user-facing reply.
from openai import OpenAI
openai = OpenAI() # reads OPENAI_API_KEY from env
Why: Keep the main chat model simple and fast (I used gpt-4o-mini).
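As a quick sanity check before wiring anything else up, the client can be exercised directly. This smoke-test snippet is my own illustration (the question is made up), not part of the deployed app:
# Illustrative smoke test of the primary model, using the client created above
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What are your main technical skills?"},
    ],
)
print(response.choices[0].message.content)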
4. Add an evaluator (Gemini)
I use a separate model to critique the first model’s reply before showing it to users.
from pydantic import BaseModel
import os

# Gemini, reached through its OpenAI-compatible endpoint
gemini = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Structured schema the evaluator must fill in
class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str
I share the same context with the evaluator and ask it to judge the chatbot's latest response in the ongoing conversation:
def evaluator_user_prompt(reply, message, history):
    user_prompt = (
        f"Here's the conversation... \n\n{history}\n\n"
        f"Here's the latest message from the User: \n\n{message}\n\n"
        f"Here's the latest response from the Agent: \n\n{reply}\n\n"
        "Please evaluate the response, replying with whether it is acceptable and your feedback."
    )
    return user_prompt
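The evaluate function below also references an evaluator_system_prompt, which I haven't reproduced verbatim here. A minimal sketch, sharing the same profile context (the exact wording is an assumption):
# Minimal sketch of the evaluator's instructions; the exact wording in the
# deployed app differs, but it shares the same summary/LinkedIn context.
evaluator_system_prompt = (
    f"You are an evaluator that decides whether a response to a question is acceptable. "
    f"You are judging an Agent playing the role of {name}. The Agent has this context:\n\n"
    f"## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
    f"Reply with whether the Agent's latest response is acceptable and your feedback."
)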
Then I parse the evaluator’s output into the Evaluation schema:
def evaluate(reply, message, history) -> Evaluation:
    messages = [
        {"role": "system", "content": evaluator_system_prompt},
        {"role": "user", "content": evaluator_user_prompt(reply, message, history)}
    ]
    response = gemini.beta.chat.completions.parse(
        model="gemini-2.0-flash",
        messages=messages,
        response_format=Evaluation
    )
    return response.choices[0].message.parsed
Why: Using a second model to check tone, accuracy, and professionalism catches weak answers before users see them. Pydantic keeps the evaluator’s output structured and reliable.
5. If rejected, help the main model improve and try again
When the evaluator says the answer isn’t good enough, I update the main model’s instructions with:
- The bad answer, and
- The evaluator’s reason for rejection
def rerun(reply, message, history, feedback):
    updated_system_prompt = (
        system_prompt
        + "\n\n## Previous answer rejected\nYou just tried to reply, but the quality control rejected your reply\n"
        + f"## Your attempted answer:\n{reply}\n\n"
        + f"## Reason for rejection:\n{feedback}\n\n"
    )
    messages = [{"role": "system", "content": updated_system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content
Why: This creates a tight feedback loop where the main model learns what to fix and tries again.
6. Add an example of an enforced style rule (keyword trigger)
If the user’s message contains the word "patent", I force the reply to be in Pig Latin (just to demonstrate hard constraints).
def chat(message, history):
    # Keyword trigger (case-sensitive): any message mentioning "patent"
    if "patent" in message:
        system = system_prompt + "\n\nEverything in your reply needs to be in pig latin ..."
    else:
        system = system_prompt
    ...
Why: Shows how to conditionally tighten style/format policies at runtime.
7. Full message flow per user turn
- Build the system prompt (Pig Latin variant if triggered).
- Call the primary LLM to get a draft reply.
- Send draft, user message, and conversation history to the evaluator.
- If evaluation.is_acceptable, return the draft to the user.
- Else, call rerun(...) with evaluator feedback and return the improved reply.
# Inside chat(message, history): `messages` is the chosen system prompt plus
# the conversation history plus the new user message, as in rerun()
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
reply = response.choices[0].message.content
evaluation = evaluate(reply, message, history)
if not evaluation.is_acceptable:
    reply = rerun(reply, message, history, evaluation.feedback)
return reply
Why: This keeps latency reasonable (usually one pass) but upgrades quality automatically when needed.
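Putting steps 6 and 7 together, the complete chat function looks roughly like this, assembled from the snippets above (I've filled in the messages construction to mirror rerun):
def chat(message, history):
    # Step 6: conditionally tighten the style policy
    if "patent" in message:
        system = system_prompt + "\n\nEverything in your reply needs to be in pig latin ..."
    else:
        system = system_prompt

    # Step 7: draft, evaluate, and retry once if rejected
    messages = [{"role": "system", "content": system}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = response.choices[0].message.content

    evaluation = evaluate(reply, message, history)
    if not evaluation.is_acceptable:
        reply = rerun(reply, message, history, evaluation.feedback)
    return reply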
8. Wrap it in a simple UI (Gradio)
I expose the chat loop as a web app with Gradio’s ChatInterface.
import gradio as gr

gr.ChatInterface(
    chat,
    type="messages",
    title="Chatbot",
    theme=gr.themes.Soft(),
    fill_height=True
).launch()
Why: Instant local demo and easy deployment to a small server.
Extensions I’d Add Next
- Retrieval: Embed and index the PDF/summary for better grounding than one giant system prompt (see the sketch after this list).
- Memory: Store common Q&A and let the model cite sources.
- Analytics: Log evaluator feedback to see recurring failure modes.
- Tests: Scripted prompts that must pass the evaluator before deploys.
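On the retrieval point, here is a minimal sketch of what that could look like, using OpenAI embeddings and cosine similarity. The chunk size, embedding model, and retrieve helper are all assumptions for illustration; none of this is in the deployed app yet.
import numpy as np

def embed(texts):
    # Embed a list of strings with OpenAI's embeddings endpoint
    response = openai.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in response.data])

# Naive fixed-size chunking of the LinkedIn text, embedded once at startup
chunks = [linkedin[i:i + 1000] for i in range(0, len(linkedin), 1000)]
chunk_vectors = embed(chunks)

def retrieve(question, k=3):
    # Rank chunks by cosine similarity to the question, return the top k
    q = embed([question])[0]
    scores = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
The top-k chunks would then replace the full LinkedIn dump in the system prompt, keeping the context short and relevant.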