Having tried out quite a few tools for building production LLM apps, I think I've landed on a stack I feel really comfortable with. It's largely based on my personal preference for how apps are built: a Python backend and a Next.js frontend.
At Researchable, we had talks about whether it makes sense to have templates to quick-start a project. Some were against it, saying templates would get outdated. Others (like me) were for it, but had trouble listing what is actually templateable.
I digress; I had similar ideas previously. For example, a prompt zoo: a web app with not only a library of prompts, but also how to implement them and an interactive UI to test them. This would have been nice, because implementing AI patterns for the first time, as simple as they were, was a pain due to a lack of good examples. Some prompts, like Chain of Thought (CoT), didn't really make sense to include, because by the time I tried to implement them the tools already existed, with essentially every library implementing the pattern in one way or another. Scope creep also had its fair share of troubles: pinpointing whether I wanted to limit the examples to just prompts, or to also include RAG-like integrations.
The frontend makes full use of the AI SDK. It's perfect for everything from handling streaming to AI components. The AI SDK has backend support too, but I wasn't a big fan of it. Having it all in one project really got me mixed up about what ran on the server and what ran on the client, and there was a lot of drilling responses around to do things like human-in-the-loop tool calls. Also, a backend in JavaScript is nice for a quick start, but as the project grew it felt much harder to manage.
Separating the backend out into Python added a clear separation of concerns: all the LLM handling happens on the backend, in Python, and all the interaction with the user happens on the frontend, in JavaScript. This did mean I had to integrate with the AI SDK myself. But honestly, that just demystified all the abstraction, which I feel will allow me to develop more complex workflows.
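To give an idea of what that manual integration can look like, here is a minimal sketch of the server side: a plain Python generator that wraps LLM text chunks as server-sent events, which the frontend can then consume as a stream. The `fake_llm_stream` stub and the event shape are my own assumptions for illustration, not the AI SDK's actual wire protocol.

```python
import json

def fake_llm_stream():
    # Hypothetical stand-in for a real streaming LLM client.
    yield "Hello, "
    yield "world!"

def sse_events(chunks):
    """Format text chunks as server-sent events for the frontend."""
    for chunk in chunks:
        # Each SSE message is a `data:` line followed by a blank line.
        yield f"data: {json.dumps({'type': 'text', 'delta': chunk})}\n\n"
    # A sentinel so the client knows the stream is finished.
    yield "data: [DONE]\n\n"

events = list(sse_events(fake_llm_stream()))
```

In a real app this generator would be returned from a streaming HTTP endpoint (e.g. a FastAPI `StreamingResponse`), and the frontend would parse each `data:` line as it arrives.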
I digress again: while developing, I understood what an agent actually is. Previously, I implemented it using the AI SDK backend, and frequently the agent would just stop midway, leading to very annoying interactions like telling it to "continue". In reality, that was not a complete agent, but rather a chat correspondent that could call tools. For it to really be agentic, you need to wrap that thing in a while loop.
This is what I now refer to as the Agent AI pattern. To implement it, you have a while loop that triggers an LLM:
# The conversation history (context)
messages = [{"role": "user", "content": "What's the weather?"}]

while True:
    # 1. Get a response from the LLM (can include text and/or tool calls)
    response = llm.chat(messages)

    # 2. Add the assistant's response to the context
    messages.append(response.message)

    # 3. Check if the LLM wants to use tools
    if not response.tool_calls:
        # No more tools to call? We are done.
        break

    # 4. If there are tool calls, execute them
    for tool_call in response.tool_calls:
        result = execute_tool(tool_call.name, tool_call.args)

        # 5. Add the tool result back to the context so the LLM can see it
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

    # 6. Loop continues: the LLM processes the tool results
    # and decides if it needs more tools or can give the final answer.
print("Final Answer:", messages[-1]["content"])

Every step is also streamed to the user. With this granular access to what is actually streamed, you can make the flow more complex, for example by having it follow a workflow that you designed. A deep researcher would be: make a query -> search -> make more queries -> ... -> search -> compile the results into a complete answer. Here you control the context of every LLM call and the prompts you give it.
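To stream every step, the while loop can be rewritten as a generator that yields an event for each thing that happens: a tool call, a tool result, or the final text. The `FakeLLM` stub below is a hypothetical stand-in for a real client, just to make the sketch self-contained and runnable.

```python
def agent_events(llm, messages, tools):
    """The agent loop as a generator: yields each step so it can be streamed."""
    while True:
        response = llm.chat(messages)
        messages.append(response["message"])
        if not response["tool_calls"]:
            # No tool calls left: the last message is the final answer.
            yield {"type": "text", "content": response["message"]["content"]}
            return
        for call in response["tool_calls"]:
            yield {"type": "tool_call", "name": call["name"]}
            result = tools[call["name"]](**call["args"])
            yield {"type": "tool_result", "name": call["name"], "content": result}
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )

class FakeLLM:
    # Hypothetical stub: first turn requests a tool, second turn answers.
    def __init__(self):
        self.turn = 0

    def chat(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {
                "message": {"role": "assistant", "content": ""},
                "tool_calls": [
                    {"id": "1", "name": "get_weather", "args": {"city": "Oslo"}}
                ],
            }
        return {
            "message": {"role": "assistant", "content": "It is sunny."},
            "tool_calls": [],
        }

tools = {"get_weather": lambda city: f"Sunny in {city}"}
events = list(agent_events(FakeLLM(), [{"role": "user", "content": "Weather?"}], tools))
```

Each yielded event maps naturally onto a streamed chunk for the frontend, which is exactly the granular access described above.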
I can imagine this being used for a specific business process, like doing research and then generating a document, where the agent finishes by sending you a file. And because you control how the flow works, you can force it to do just that.
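Such a forced workflow can be sketched as a plain function: each step is its own call with its own prompt and context, so the model cannot wander off the rails. Every name here (`plan_queries`, `search`, `compile_report`) is a hypothetical stub standing in for an LLM call or a tool, not a real API.

```python
def plan_queries(topic):
    # Hypothetical: would be an LLM call with a "write search queries" prompt.
    return [f"{topic} overview", f"{topic} recent developments"]

def search(query):
    # Hypothetical: would hit a real search tool.
    return f"results for '{query}'"

def compile_report(topic, findings):
    # Hypothetical: would be a final LLM call seeing only the findings.
    return f"Report on {topic}:\n" + "\n".join(findings)

def deep_research(topic, rounds=1):
    """Forced workflow: plan -> search -> (repeat) -> compile."""
    findings = []
    for _ in range(rounds):
        for query in plan_queries(topic):
            findings.append(search(query))
    return compile_report(topic, findings)

report = deep_research("LLM agents")
```

The point is that the control flow lives in ordinary code, so "finish by producing a document" is guaranteed by the last line of the function rather than by hoping the model decides to do it.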