(Fuzzy) Matching with command line tools and Go
Meetup #17 took place on Apr 20, 2021, at 19:00 CEST and was virtual again (passing the one-year mark of virtual meetups). We had a lightning talk on a data engineering topic:
How do you build a graph dataset with about 1B edges from semi-structured data? With Taco Bell style programming, you can reuse (UNIX) command line tools and combine them with a few custom Go programs.
The graph is about citations, so we looked at publications that cite a paper relevant to Go, namely the classic CSP paper from 1978.
Hoare, Charles Antony Richard. “Communicating sequential processes.” Communications of the ACM 21.8 (1978): 666-677.
The custom tool relies on its input being sorted by key: it works in a merge sort style and runs a computation on each group of items that share the same key. One might think of key extraction as a map step and of the per-group operation as a reduce step.
Graph stores and algorithms
Are there interesting graph libraries and projects written in Go? There are a few …
A generic data science umbrella project is: Gonum - Consistent, composable, and comprehensible scientific code. It contains a package for graph processing as well.
Some projects in other languages include:
Sometimes people write custom code for specific algorithms, e.g. for pagerank.
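As an illustration of how small such custom code can be, here is a sketch of PageRank by power iteration. It is not taken from any of the projects discussed at the meetup; the function name `pageRank`, the adjacency-list representation, and the handling of nodes without outgoing edges (their rank is spread evenly over all nodes) are assumptions made for this example.

```go
package main

import "fmt"

// pageRank is an illustrative sketch, not code from a specific library.
// out[i] lists the nodes that node i links to. The function runs a fixed
// number of power iterations and returns one rank per node; the ranks
// sum to (approximately) 1.
func pageRank(out [][]int, damping float64, iterations int) []float64 {
	n := len(out)
	if n == 0 {
		return nil
	}
	rank := make([]float64, n)
	for i := range rank {
		rank[i] = 1 / float64(n) // start from a uniform distribution
	}
	for it := 0; it < iterations; it++ {
		next := make([]float64, n)
		dangling := 0.0 // total rank of nodes without outgoing edges
		for i, targets := range out {
			if len(targets) == 0 {
				dangling += rank[i]
				continue
			}
			share := rank[i] / float64(len(targets))
			for _, t := range targets {
				next[t] += share
			}
		}
		// Every node receives the teleport term plus an equal part of
		// the rank held by dangling nodes.
		base := (1-damping)/float64(n) + damping*dangling/float64(n)
		for i := range next {
			next[i] = base + damping*next[i]
		}
		rank = next
	}
	return rank
}

func main() {
	// Tiny citation-like graph: 0 -> 2, 1 -> 2, 2 -> 0.
	graph := [][]int{{2}, {2}, {0}}
	for node, r := range pageRank(graph, 0.85, 50) {
		fmt.Printf("node %d: %.4f\n", node, r)
	}
}
```

In this sample graph, node 2 is cited by both other nodes and ends up with the highest rank, while node 1, which nobody cites, ends up with the lowest. For graphs approaching a billion edges, an in-memory adjacency list like this would likely need to be replaced by a more compact representation.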
Misc
- The GOLAB conference hosts free (and paid) webinars over the coming months: https://golab.io/agenda/, e.g. Go and Tensorflow
- Go garbage collector notes: https://blog.golang.org/ismmkeynote
Data stores and analytics engines (outside Go):
Tiny, useful tools:
Reading recommendations:
Some research questions:
- good caching libraries (e.g. for HTTP and other data), besides LRU
- how to write parsers (e.g. for DSLs or markup languages like simpleml; example library: https://github.com/alecthomas/participle)
Misc in Go and other languages
- The Most Beautiful Program Ever Written (Scheme)
- Boundaries, Gary Bernhardt
Thanks
Thanks everyone for dropping by - great to see people join from across Europe and the globe!
Join our meetup to get notified of upcoming events!