Like all good programmers, my son might be a little bit lazy, so I’m going to start teaching him Python programming so that he can figure out how to build an essay generator.
“I will let you teach me Python if you teach me how to cheat at writing an essay using code.”
Fast vector similarity queries using CouchDB views
Problem: I have a corpus of text documents, and I want to compare a new document and quickly find the most similar documents.
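To make the comparison concrete, here is a minimal brute-force sketch in plain Python. It is not the CouchDB view approach itself, only the underlying idea under some assumptions of mine: each document becomes a sparse word-count vector, and similarity is the cosine between vectors. The function names and the tiny sample corpus are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-frequency vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus, top_n=3):
    """Score every document against the query and return the best matches."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), doc_id) for doc_id, text in corpus.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_n]]


# Hypothetical sample corpus, keyed by document id.
corpus = {
    "couch": "CouchDB views index documents with map and reduce functions",
    "trello": "Trello boards hold cards that can act as blog posts",
    "lda": "LDA finds topics in a corpus of text documents",
}
results = most_similar("finding topics in text documents", corpus, top_n=2)
```

This scans the whole corpus on every query, which is exactly the cost a precomputed index is meant to avoid.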
This is how I have built this website.
- Turn off old web hosting and let it rot.
- Figure out a cheap way to host HTML on the domain.
Let's dive in!
I'd done some previous work using Trello as a simple content management system, and I also recently learned about Cloudflare's Workers (free plan), which enable simple serverless functions running on their edge network.
Putting these two free services together gives me a lightweight and familiar content management system, plus super-fast, minimal, serverless global hosting for responding to page requests.
I'm purposely avoiding spending any money whatsoever on a simple personal site / blog, so I'm not using the Workers KV storage, which would simplify some of the steps below.
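As a rough illustration of the data flow (cards in, HTML out), here is a small Python sketch. A real Worker would be written in JavaScript, so treat this only as a model of the idea. It assumes posts live as cards in a single Trello list and that each card's `name` and `desc` fields hold the title and body; the helper names and the placeholder ids are hypothetical.

```python
import html
from urllib.parse import urlencode

TRELLO_API = "https://api.trello.com/1"


def list_cards_url(list_id, key, token):
    """Build the Trello REST URL that returns the cards in one list as JSON."""
    query = urlencode({"key": key, "token": token})
    return f"{TRELLO_API}/lists/{list_id}/cards?{query}"


def render_page(cards, title="My site"):
    """Render card dicts (as returned by the API) into a minimal HTML page."""
    articles = []
    for card in cards:
        # Escape everything: card text is user content, not trusted markup.
        heading = html.escape(card.get("name", ""))
        body = html.escape(card.get("desc", ""))
        articles.append(f"<article><h2>{heading}</h2><p>{body}</p></article>")
    return (
        "<!doctype html><html><head>"
        f"<title>{html.escape(title)}</title></head><body>"
        + "".join(articles)
        + "</body></html>"
    )


# Hypothetical card data shaped like the API response; no network call here.
sample_cards = [{"name": "Hello <world>", "desc": "First post & more"}]
page = render_page(sample_cards)
url = list_cards_url("LIST_ID", "KEY", "TOKEN")
```

Without KV storage there is nowhere to keep rendered pages between requests, so in this model every page view means one fetch from Trello followed by one render.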
Combining LDA and K-Means clustering for automated persona generation
A large project that I've been working on for the past few months is all about natural language analysis. I've had a great deal of fun diving deeply into both cloud-API-powered and local ML and NLP techniques, and in the meantime collecting largish data sets of verbatim quotes from specially crafted consumer surveys.
The background: There's this construct in marketing-land called a "Persona", and it's a useful abstraction of the idea that there are groups of people who behave similarly, or have similar intents in the market, and can be described with a few key wants, needs, and desires.
Given the large number of consumer statements in the survey data, what can we do to help automatically create such personas? We may not be able to do the whole job with code, but there should be interesting ways to find the data points and insights that do 80% of the job, and provide a starting point for human strategists to editorialize and craft into a usable persona document.