Leipzig Gophers
blog

Hybrid Meetup #56 wrap-up

An older version of the matrix · the uncanny valley

Hybrid Meetup #56 took place on 2025-11-25, 19:00 at Basislager Leipzig, where we looked into basic agents with Go; notes can be found here: miku/unplugged.

Agents are possible because of the reasoning and tool-calling support of language models (and they are fairly simple to write).

An early paper on tools was Toolformer: Language Models Can Teach Themselves to Use Tools (2023-02-09):

We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, a search engine, a translation system, and a calendar.

[…]

Given just a handful of human-written examples of how an API can be used, we let a LM annotate a huge language modeling dataset with potential API calls. We then use a self-supervised loss to determine which of these API calls actually help the model in predicting future tokens.

An agentic setup, then, is mostly a loop that manages a context over time with the help of tools.

Google ADK for Go at this time only supports Gemini out of the box (#233, #242, …), so we wrote a simple agent from scratch (against any OpenAI-compatible endpoint) and ended up with a program that had a short list of tools (some of them just stubs):

get_weather
add_numbers
get_time
search_library_catalog
ping
list_files
read_file
grep
write_file
append_file
run_command

We used both an RTX 4000 SFF Ada and an AMD AI MAX+ 395 with an 8060S (via FWD).

| Spec               | AMD Radeon 8060S       | NVIDIA RTX 4000 SFF Ada |
|--------------------|------------------------|-------------------------|
| FP16 (theoretical) | 59.4 TFLOPS            | ~19.2 TFLOPS            |
| Memory Bandwidth   | ~212 GB/s (DDR5-8000)  | 280 GB/s (GDDR6)        |

However, prefill is a bit faster on the NVIDIA card:

$ time OLLAMA_MODEL=qwen3:14b OLLAMA_HOST=http://ada:11434 ./one -m "how warm is it in leipzig?"
2025/11/26 09:57:17 user: how warm is it in leipzig?               ...
2025/11/26 09:57:17 context length: 4974                           ...
2025/11/26 09:57:23 assistant wants to call 1 tool(s)              ...
2025/11/26 09:57:23 calling tool: get_weather                      ...
2025/11/26 09:57:23 args: {"city":"Leipzig"}                       ...
2025/11/26 09:57:23     Result: {"city": "Leipzig", "temperature": ...
2025/11/26 09:57:23 context length: 5227                           ...
2025/11/26 09:57:30 assistant: The current temperature in Leipzig i...

real    0m13.423s
user    0m0.001s
sys     0m0.013s

$ time OLLAMA_MODEL=qwen3:14b OLLAMA_HOST=http://strix:11434 ./one -m "how warm is it in leipzig?"
2025/11/26 09:57:34 user: how warm is it in leipzig?             ...
2025/11/26 09:57:34 context length: 4974
2025/11/26 09:57:41 assistant wants to call 1 tool(s)
2025/11/26 09:57:41 calling tool: get_weather
2025/11/26 09:57:41 args: {"city":"Leipzig"}
2025/11/26 09:57:41     Result: {"city": "Leipzig", "temperature"...
2025/11/26 09:57:41 context length: 5227
2025/11/26 09:57:50 assistant: The current temperature in Leipzig...

real    0m15.826s
user    0m0.007s
sys     0m0.007s

Still, the interplay is interesting to observe. Requests like “save the temperature in leipzig to temp.txt” or “fetch https://golangleipzig.space/leipzig-gopher.png and convert it to jpg” work with a 9.3 GB, tool-supporting 14B LLM like Qwen3-14B.

Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. – model card

Example: (1) fetch an image, (2) convert it to JPG, (3) calculate its SHA1 and (4) write the result to a file (speedup 1.5x):

Future ideas for tools:

Go’s concurrency facilities seem to be helpful when implementing agents.
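For example, when the model requests several independent tool calls in one turn, goroutines let them run in parallel rather than back to back. A minimal sketch with a stubbed `call` function standing in for real tool latency:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// call simulates one slow tool invocation; a stand-in for get_weather,
// get_time, and friends.
func call(name string) string {
	time.Sleep(50 * time.Millisecond) // pretend network/tool latency
	return name + ": ok"
}

// RunParallel fans the model's tool calls out to goroutines and
// collects results in order, so independent tools don't wait on each
// other.
func RunParallel(names []string) []string {
	results := make([]string, len(names))
	var wg sync.WaitGroup
	for i, n := range names {
		wg.Add(1)
		go func(i int, n string) {
			defer wg.Done()
			results[i] = call(n) // each goroutine writes its own slot
		}(i, n)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(RunParallel([]string{"get_weather", "get_time", "ping"}))
}
```

Three 50 ms calls finish in roughly 50 ms instead of 150 ms; writing to distinct slice indices keeps the goroutines race-free without a mutex.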

Misc


If AI produces too much artificial reality, maybe apply some Ubik?


Join our meetup to get notified of upcoming events.