Fast vector similarity queries using CouchDB views

Problem: I have a corpus of text documents, and I want to compare a new document against it and quickly find the most similar ones.
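To make the problem concrete, here is a minimal brute-force sketch of "most similar documents" using cosine similarity over word counts. It is not the CouchDB view approach from the post, which this excerpt does not show; it is only a baseline, and the function names (`vectorize`, `cosine_similarity`, `most_similar`) and the whitespace tokenizer are assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-frequency vector (word -> count)."""
    # Assumption: naive lowercase + whitespace tokenization.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus, top_n=3):
    """Rank corpus documents (id -> text) by similarity to the query text."""
    q = vectorize(query)
    scored = [
        (cosine_similarity(q, vectorize(text)), doc_id)
        for doc_id, text in corpus.items()
    ]
    # Highest similarity first.
    return sorted(scored, reverse=True)[:top_n]
```

This compares the query against every document, so the cost grows linearly with the corpus. That is the cost a precomputed index is meant to avoid.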

Posted: Wed, 20 May 2020 17:25:00 GMT

Meta Post

This is how I built this website.
  1. Turn off old web hosting and let it rot.
  2. Figure out a cheap way to host HTML on the domain.
Let's dive in!
I'd done some previous work using Trello as a simple content management system, and had also recently learned about Cloudflare's (free plan) Workers, which enable simple serverless functions running on their edge network.
Putting these two free services together gives me a lightweight, familiar content management system and super-fast, minimal, serverless global hosting for responding to page requests.
I'm purposely avoiding spending any money whatsoever on a simple personal site / blog, so I'm not using the Workers KV storage, which would simplify some of the steps below.

Posted: Tue, 19 May 2020 17:17:00 GMT

Combining LDA and K-Means clustering for automated persona generation

A large project that I've been working on for the past few months is all about natural language analysis. I've had a great deal of fun diving deeply into both cloud-API-powered and local ML and NLP techniques, and in the meantime collecting largish data sets of verbatim quotes from specially crafted consumer surveys.
The background: There's this construct in marketing-land called a "Persona", and it's a useful abstraction of the idea that there are groups of people who behave similarly, or have similar intents in the market, and can be described with a few key wants, needs, and desires.
Given the large number of consumer statements in the survey data, what can we do to help automatically create such personas? We may not be able to do the whole job with code, but there should be interesting ways to help find the data points and insights to do 80% of the job, and provide a starting point for human strategists to editorialize and craft into a usable persona document.
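As a rough illustration of the clustering half of that idea, here is a minimal k-means sketch. It assumes an LDA step has already reduced each respondent's text to a topic-mixture vector; the post's actual pipeline is not shown in this excerpt, and the `kmeans` function and the example vectors are made up for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; seeds centroids with the first k points for repeatability."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical per-respondent topic mixtures (each row sums to 1).
topic_mixtures = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
]
labels, centroids = kmeans(topic_mixtures, k=2)
```

Each resulting centroid is an average topic mixture for a group of respondents, which is one plausible starting point for a strategist drafting a persona.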

Posted: Sun, 17 Jun 2018 17:17:00 GMT