Leipzig Gophers

Virtual Meetup #17 wrap-up

(Fuzzy) Matching with command line tools and Go

Meetup #17 took place on Apr 20, 2021, at 19:00 CEST and was virtual again (passing the one-year mark of virtual meetups). We had a lightning talk on a data engineering topic:

How to build a graph dataset with about 1B edges from semi-structured data? With Taco Bell style programming, you can reuse (UNIX) command line tools and combine them with a few custom Go programs.

The graph is about citations, so we looked at publications that cite a paper relevant to Go, namely the classic CSP paper from 1978.

Hoare, Charles Antony Richard. “Communicating sequential processes.” Communications of the ACM 21.8 (1978): 666-677.

The custom tool relies on its input being sorted by key: it scans the stream once, in merge sort style, and runs a computation on each group of consecutive items that share a key. One might consider key extraction a map step and the per-group computation a reduce step.

Graph stores and algorithms

Are there interesting graph libraries and projects written in Go? There are a few …

A generic data science umbrella project is: Gonum - Consistent, composable, and comprehensible scientific code. It contains a package for graph processing as well.

Some projects in other languages include:

Sometimes people write custom code for specific algorithms, e.g. for PageRank.


Data stores and analytics engines (outside Go):

Tiny, useful tools:

Reading recommendations:

Some research questions:

Misc in Go and other languages


Thanks everyone for dropping by - great to see people join from across Europe and the globe!

