My Week 0 in the AI Era
2 Aug 2025 · AI · MCP · RAG · AI Agent
The past week was so exciting that I skipped a couple of meals and had some sleepless nights. Apparently, I am late to the party: there have been many significant moments last year or earlier that could have marked the beginning of the AI era for humanity (much more fitting than last week), but for me, it was not until last week that I got the feeling of advancing into the AI era.
I learnt plenty of things last week in order to get up to speed. This post notes down the AI techniques that excite me.
To be precise, last week was my Week 0 in the AI Era as an engineer. Before last week, I had been playing with chatbots (Gemini, ChatGPT, Deepseek) driven by Large Language Models (LLMs), asking all kinds of random questions. I had also had wow moments when I saw someone build an LLM-driven wiki-timeline website in a kind of “vibe coding” way, and when NotebookLM parsed the rule book of a board game and generated a podcast where two hosts chatted naturally, introducing the rules to the audience. The most amazing part is that I could even chime in anytime to ask questions! I had a basic understanding of how neural-network-based machine learning works, and I finished the 3-course program on ML given by Andrew Ng earlier this year.
Nevertheless, I felt no revolutionary change that would define a new era. What the recent advancements in AI brought to my life were just incremental improvements. Apparently, I was wrong. There are ways you can involve LLMs in your workflow, even delegate specific tasks to AI agents, and change how you work and live dramatically. The workflow and tasks I am talking about here refer to the kind of work I do on a daily basis as an engineer, not just the general scenarios where people use LLMs for Internet search, document summaries, etc. It was a thrilling moment for me to realize this. Thinking positively, there is a clear pathway for my dream to come true: I have been dreaming that one day I could command AI agents to do the tedious work while I just sit there watching. From another perspective, if I cannot transform my role by that day, I will be the one replaced by AI agents!
A Really Simple Example of Building an Agent
All my Week 0 experience started with a post
shared by my colleague.
In that post, the author demonstrates, along with simple Go code,
how an LLM can be “trained” into an agent step by step.
First, an LLM can learn the syntax of making function calls.
For interpreted languages, a function call starts out as a string (plain text, which LLMs are good at),
so there is no magic.
Then, if the function has an appropriate description of what it does
and what its arguments are,
the LLM will figure out when, which, and how a function should be called.
All of this is based on the semantic meaning of the natural language.
It appears amazing at first glance,
but LLMs are trained precisely for Natural Language Processing (NLP),
so we should not be too surprised.
Then the AI application steps in to play its role.
The AI application is the program running on your side,
establishing a connection to the LLM service,
collecting and sending your messages to the LLM,
and then displaying the LLM's response.
When the response from the LLM appears to be a function call,
the AI application is responsible for parsing the function call string
and making the invocation (yup, the functions are real).
Now it is up to application developers to define the functions and
control how they are exposed to the LLM.
In the demo, the post author shows how the LLM does file editing the way the user asks
(of course, the user commands in natural language, not with shell commands).
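To make that loop concrete, here is a minimal sketch of the idea in Python. It is not the post's actual Go implementation: the `call_llm` stub, the JSON convention for tool calls, and the `read_file` example tool are all my own assumptions for illustration.

```python
import json

# A real function the application is willing to expose to the LLM.
def read_file(path: str) -> str:
    """Return the contents of the text file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

TOOLS = {"read_file": read_file}

def call_llm(messages: list[dict]) -> str:
    """Placeholder for the real LLM API call; returns the model's text reply."""
    raise NotImplementedError("wire this up to your LLM service")

def agent_loop(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = call_llm(messages)
        try:
            # Assumed convention: when the model wants a tool, it replies with JSON like
            # {"tool": "read_file", "args": {"path": "notes.txt"}}
            request = json.loads(reply)
            func = TOOLS[request["tool"]]
        except (json.JSONDecodeError, TypeError, KeyError):
            return reply  # plain-text answer, we are done
        result = func(**request["args"])
        # Feed the tool result back so the model can keep going.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
```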
An Agentic Coding Practice on the Really Simple Example
The post is well written and evocative.
However, I do not have much experience with Go programming, and neither does my colleague, I guess.
Here is how the experience went to the next level.
My colleague said he pointed aider,
an AI-driven agentic coding assistant, at the post (using the /web command in aider),
asked aider to make a Python script that does the same as the Go example,
and to put dummy responses wherever communication with the LLM API endpoint was needed.
It turns out aider did a good job, producing a runnable Python script.
Then my colleague pointed aider at the documentation of our internal LLM infra,
and asked aider to use the internal service to get LLM responses.
He ended up with a fully functioning Python script that works like the file-editing agent!
It just worked out of the box, and he did not need to tweak the code at all
(so-called vibe coding: forget the code even exists).
This was a big encouragement for me. It sounds like almost no effort at all, and you get a Python agent to play with, right? So I tried his approach, and not surprisingly, aider gave ME a script that consistently failed to connect to the LLM service. After a few rounds of struggling with the prompts I gave to aider, going back and forth with my requests, I had no luck getting aider to figure it out by itself. So I ended up digging into the LLM infra myself, which turned out not to be a bad thing.
Tools and Beyond
After much trial and error that is not worth detailing here,
I finally got the example working, but in an unexpected way.
The Go code in that post handles the function call parsing in the AI application.
Actually, this does not need to happen in the AI application.
Many LLM APIs have integrated this, I noticed.
It can be an optional argument of the function you use to prompt the LLM.
Usually the argument is named tools.
In Python it could take a list of Callable.
A well-written docstring (in a certain style) on the Python function gets added into the context shared with the LLM,
so that the LLM can figure out what the “tool” does, as well as how to call this “tool”.
That is convenient.
With tools, the LLM no longer presents itself as just a chatbot, but as an agent that can actually do something for you.
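As a rough illustration of what such an API might do with a plain Python callable (the details vary by SDK, so treat this as my guess rather than any specific library's behaviour), the application can derive a tool description from the function's signature and docstring and put it into the LLM's context:

```python
import inspect
from typing import Callable

def describe_tool(func: Callable) -> dict:
    """Build a simple tool description from a function's signature and docstring."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {name: str(p.annotation) for name, p in sig.parameters.items()},
    }

def get_weather(city: str, unit: str = "celsius") -> str:
    """Return the current temperature for `city` in the given `unit`."""
    return f"22 degrees {unit} in {city} (dummy value)"

# This description is roughly what ends up in the LLM's context, so the model
# knows when and how to request a get_weather call.
print(describe_tool(get_weather))
```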
Here is an example. I do a monthly release of a package from a git repo, and for every release I create a tag with a brief summary of the commits that went in since the last version. If I just throw the git repo URL at ChatGPT, it would likely say it has no access and then make up an example. Well, it is super easy to write a Python function to do the git clone and extract the commits. Once such a Python function is provided to the LLM as a tool, the LLM stops complaining about having no git access and makes tool calls to get what it needs. And if I give it another tool to create a git tag, it will just do it for me, so I do not even need to copy-paste or visit the git repo any more.
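The two tools for this workflow could look roughly like the sketch below (the function names and the plain git commands via subprocess are my own assumptions); they would then be handed to the LLM through the API's tools argument.

```python
import subprocess

def list_commits_since(repo_path: str, last_tag: str) -> str:
    """Return one-line summaries of all commits in `repo_path` made after `last_tag`."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--oneline", f"{last_tag}..HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

def create_release_tag(repo_path: str, tag_name: str, message: str) -> str:
    """Create an annotated tag `tag_name` in `repo_path` with the given release `message`."""
    subprocess.run(
        ["git", "-C", repo_path, "tag", "-a", tag_name, "-m", message],
        check=True,
    )
    return f"Created tag {tag_name}"

# Then, roughly: client.chat(..., tools=[list_commits_since, create_release_tag])
# (the exact call depends on which LLM SDK you use)
```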
In fact, the concept of a tool is one of the three primitives in the Model Context Protocol (MCP). MCP was introduced by Anthropic in November 2024 with the intention of enriching the inputs and outputs of LLMs. This aligns with my understanding from the above examples, but it was still not the game changer yet.
LLM as the Universal Middleware
I came across an inspiring article, MCP: An (Accidentally) Universal Plugin System, which presents an interesting point from another perspective.
As engineers, we have all been building all kinds of tools for our work already. For example, I have many scripts to conduct simulations and to perform post-processing and analysis in various ways and for all kinds of purposes. Most of the scripts were not written by me, and I might only use them once or twice over a long period. For people with a poor memory like me, we likely need to look up the documentation every time we want to run a script. Even for the scripts I run frequently, I make typos, or I have to copy and paste across multiple places to gather the right inputs. Those tedious things not only waste my time but also make the workflow inefficient.
Ideally, we would like to connect the tools for automation. The less human involvement, the higher the efficiency, as machines can run 24/7 tirelessly. The automation has already progressed to some extent. Some of the scripts we have are wrappers around a set of other tightly connected tools; those scripts were themselves created for automation (e.g., before I got a script that launches simulation runs in batches, I had to launch them one by one manually). The only blocker stopping us from traveling the last mile towards full automation is natural language.
Humans have been using natural language to communicate and collaborate for so long that we are tied to it. Whenever we write the specification of a project, whenever we assign tasks to coworkers, whenever we ask others for help with some problem, we use natural language. Natural language is such a universally used tool that it bridges everything, including the tools we have built. If a script goes wrong, we ask it to dump error messages. For scripts that can be used in multiple ways, we document the command-line options in natural language. Nevertheless, natural language is vague, casual, and difficult for machines to execute. It would take a lot of effort to build a component versatile enough to bridge the natural-language interfaces of two tools. The time saved by automation would not be worth the effort.
But we have never been closer to finishing that final mile than we are now.
The LLM is the versatile component specialized in NLP, and MCP is the protocol that lets us use the LLM as a universal middleware in the system. The AI agents we are building are essentially collections of the tools we have invented before, wrapped by MCP, then orchestrated by an LLM.
It took me about a day to get the demo agent working; then for the rest of the week, I was busy wrapping my tools for the LLM to automate my work. Aider did not live up to my expectations on the demo example; however, it started to do a better job writing those wrapper functions. Obviously, I need to think carefully as well as creatively about how those tools should be wrapped. Once I make up my mind and describe my idea to aider, it does write good code, including debugging in the loop, and teaches me a lot along the way. 😂
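As a taste of what that wrapping can look like, here is a minimal sketch using the FastMCP helper from the official MCP Python SDK. The simulation script and its arguments are made up for illustration, and you should check the SDK documentation for the exact API.

```python
# pip install mcp  (the official Model Context Protocol Python SDK)
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("simulation-tools")

@mcp.tool()
def run_batch_simulation(config_path: str, num_runs: int = 1) -> str:
    """Launch `num_runs` simulation runs using the configuration file at
    `config_path` and return the combined console output."""
    # Wraps an existing in-house script; the script name here is hypothetical.
    out = subprocess.run(
        ["python", "launch_sims.py", "--config", config_path, "--runs", str(num_runs)],
        capture_output=True, text=True,
    )
    return out.stdout + out.stderr

if __name__ == "__main__":
    mcp.run()  # expose the tool to any MCP-capable client (e.g., Claude Desktop)
```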
MCP and RAG
Since I mentioned MCP, it would be unfair to talk only about tools and leave the other two primitives alone. The tool is the model-centric primitive: we let the model spontaneously decide when to use some tools. The other two primitives are user-driven, focusing on populating context. One of them is the prompt. Different from the prompts that users type into messages, the prompt completions provided by an MCP server can save the user some time. Additionally, there can be hidden system prompts added into the context to give the LLM prior knowledge or implications that are too implicit for users to spell out. For example, an AI-driven routing app might add the following prompt to set a role for the LLM:
You are an expert in geospatial data and API utilization, specifically tasked with generating precise Google Maps static image URLs…
The last primitive is resources, which have only a subtle difference from prompts. Resources usually refer to documents in their raw formats, and to external systems like databases. Please note that they are added as resources in all kinds of formats (PDF, MP4, URL, etc.) to the AI application, but unlike prompts (plain text), they cannot be fed into the LLM context directly. Resources often require the AI application to do some integration work: extract the useful information as plain text, then give it to the LLM. For example, the AI application might export the user-selected table (or rows, or columns) from the database as a CSV and then include it in the context. In my opinion, prompts and resources are just engineering practices that give convenience to users, so they are not as much of a game changer as the tool is. They are definitely still great additions, though.
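To make the other two primitives concrete, here is a minimal sketch of how an MCP server might declare a prompt and a resource with the FastMCP helper from the MCP Python SDK. The names, the resource URI scheme, and the file contents are illustrative assumptions on my part.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("routing-helper")

@mcp.prompt()
def maps_expert_role() -> str:
    """A reusable system prompt the client can attach to a conversation."""
    return (
        "You are an expert in geospatial data and API utilization, specifically "
        "tasked with generating precise Google Maps static image URLs."
    )

@mcp.resource("docs://release-notes")
def release_notes() -> str:
    """Expose a raw document; the client decides when to pull it into context."""
    with open("docs/release-notes.md", "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()
```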
Speaking of resources, I would like to dump another somewhat related concept I recently learnt: Retrieval Augmented Generation (RAG). With tons and tons of documents, it becomes difficult for us to find what we want. The traditional solution is to index the document snippets with embeddings (a fundamental technique in NLP that turns text into vectors, via statistics over a large corpus, I guess). Then we embed the keywords (or query sentence) in the same way. We let the computer run through all document snippets, calculating the distance between the two embedded vectors (a mathematical way to indicate similarity or relevance), sort by the distance, and return the most likely results. A naive RAG scheme is to let the LLM generate multiple queries similar to the user's query, run the vector search on all variants of the query, do a global ranking on all the results, then feed the top-K unique snippets to the LLM and ask it to summarize. At the end, the user gets the summary from the LLM, along with references to the actual documents. RAG is said to be one of the very first use cases for LLMs. It is also kind of an example of the processing an application would do on resources before feeding them into the LLM.
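Here is a minimal sketch of that naive scheme in Python. The embed and complete functions are placeholders for whatever embedding and LLM services you have, and I take the best score per snippet as a crude stand-in for a proper fusion method like RRF; the cosine-similarity search and top-K selection are the only real logic shown.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model and return a vector."""
    raise NotImplementedError

def complete(prompt: str) -> str:
    """Placeholder: call your LLM and return its reply."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def naive_rag(user_query: str, snippets: list[str], top_k: int = 5) -> str:
    # 1. Let the LLM rephrase the query into a few variants.
    variants = [user_query] + complete(
        f"Rewrite this query in 3 different ways, one per line: {user_query}"
    ).splitlines()

    # 2. Vector search: score every snippet against every query variant,
    #    keeping the best score per snippet as a crude global ranking.
    snippet_vecs = [embed(s) for s in snippets]
    scores: dict[int, float] = {}
    for q in variants:
        qv = embed(q)
        for i, sv in enumerate(snippet_vecs):
            scores[i] = max(scores.get(i, -1.0), cosine(qv, sv))
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # 3. Feed the top-K unique snippets back to the LLM for a summary.
    context = "\n\n".join(snippets[i] for i in top)
    return complete(
        f"Answer the question using only this context:\n{context}\n\nQuestion: {user_query}"
    )
```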
Credits
- How to Build an Agent - Such a good “Get Started” article that really got me started. Also a big thanks to my colleague Josh Misko who shared the article along with his practice!
- MCP 201: The power of protocol - A talk by one of the inventors of MCP about the design, which helped me finally get the subtle differences between the three primitives: prompts, resources, and tools.
- MCP: An (Accidentally) Universal Plugin System - Very inspiring perspective.
- What is RAG, a Zhihu article written in Chinese - A nice introduction to RAG, explained in simple language that is good for beginners like me.
- Understanding Reciprocal Rank Fusion in Hybrid Search - Reciprocal Rank Fusion (RRF) used in RAG is well explained in this video.
