Show HN: An interactive explorable map of 5M Hacker News stories

lmcinnes.github.io

3 points by lmcinnes 7 hours ago

This is a data map providing a view of all Hacker News stories with a score of at least 2 (to remove most of the spam). Stories are close together in the map if they have semantically similar titles. In the bottom left is a histogram of stories over time. Hovering over a bar selects stories from that year, and dragging a selection selects multiple years. A keyword-based text search is in the upper left. Hold down the shift key and drag to lasso-select points and get a word cloud generated from the selection. Clicking on a point opens the URL for that story.
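For anyone who wants to reproduce the filtering and the histogram on their own copy of the data, here is a rough sketch in plain Python. The record layout (`title`, `score`, `time` as a Unix timestamp), the sample stories, and the helper names are all made up for illustration, and the substring search is only a stand-in: the map's own search may work differently.

```python
from collections import Counter
from datetime import datetime, timezone


def ts(year, month, day):
    """Build a Unix timestamp (UTC) for the hypothetical sample records."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Hypothetical sample records; field names are assumptions, not the real schema.
STORIES = [
    {"title": "Show HN: A tiny Lisp interpreter", "score": 12, "time": ts(2012, 3, 4)},
    {"title": "Buy cheap watches", "score": 1, "time": ts(2012, 5, 6)},
    {"title": "UMAP for dimension reduction", "score": 87, "time": ts(2018, 2, 9)},
    {"title": "Lisp in the browser", "score": 2, "time": ts(2018, 7, 1)},
]


def filter_by_score(stories, min_score=2):
    """Keep stories whose score is at least min_score (drops most spam)."""
    return [s for s in stories if s["score"] >= min_score]


def stories_per_year(stories):
    """Count stories per calendar year (UTC), as in the bottom-left histogram."""
    return Counter(
        datetime.fromtimestamp(s["time"], tz=timezone.utc).year for s in stories
    )


def keyword_search(stories, query):
    """Case-insensitive substring match on the title (an assumed search rule)."""
    needle = query.lower()
    return [s for s in stories if needle in s["title"].lower()]


kept = filter_by_score(STORIES)
histogram = stories_per_year(kept)
lisp_hits = keyword_search(kept, "lisp")
```

Selecting a year range in the histogram then amounts to keeping the stories whose year falls between the two ends of the dragged selection.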

The dataset was filtered from https://huggingface.co/datasets/OpenPipe/hacker-news

Stories were embedded in a vector space via nomic-embed: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5

A 2D representation was generated using UMAP: https://github.com/lmcinnes/umap

Clusters were generated and topics named via HDBSCAN and Toponymy using Cohere Command-R:

- https://github.com/TutteInstitute/fast_hdbscan

- https://github.com/TutteInstitute/toponymy

- https://cohere.com/command

The interactive map was generated using DataMapPlot: https://github.com/TutteInstitute/datamapplot
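A minimal sketch of the embed, project, and cluster steps, in case it helps to see how the pieces fit together. This is not the exact code behind the map: the `clustering: ` task prefix, the cosine metric, `min_cluster_size=25`, clustering on the 2D coordinates, and the helper names `prepare_titles` and `build_map` are all assumptions chosen for illustration. Topic naming with Toponymy and Command-R and rendering with DataMapPlot are left out.

```python
# nomic-embed-text-v1.5 expects a task prefix on each input; using the
# "clustering: " prefix here is an assumption about what suits this use.
CLUSTERING_PREFIX = "clustering: "


def prepare_titles(titles):
    """Strip whitespace, drop empty titles, and prepend the task prefix."""
    return [CLUSTERING_PREFIX + t.strip() for t in titles if t and t.strip()]


def build_map(titles, min_cluster_size=25):
    """Return (2D coordinates, cluster labels) for a list of story titles.

    Heavy third-party imports live inside the function so the module can be
    loaded without them. All parameter values are illustrative assumptions.
    """
    from sentence_transformers import SentenceTransformer
    import fast_hdbscan
    import umap

    # Embed titles into a high-dimensional vector space.
    model = SentenceTransformer(
        "nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True
    )
    embeddings = model.encode(prepare_titles(titles))

    # Reduce to two dimensions for plotting.
    coords = umap.UMAP(n_components=2, metric="cosine").fit_transform(embeddings)

    # Cluster the 2D points; a label of -1 marks noise.
    labels = fast_hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(coords)
    return coords, labels
```

On the full set of stories the embedding step dominates the run time, so it is worth caching the embeddings to disk before experimenting with UMAP or clustering settings.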

The map provides a great way to get an overview of Hacker News stories over the years, to explore them, and to find interesting niche topics. There are limitations to both the text embedding and the 2D representation. For example, posts about John Gruber's "Daring Fireball" end up in "Sun-related phenomena" in the Astronomy region of the map, and some topics get squashed into odd places because of the limits of a 2D representation. Nonetheless, most topics, regions and stories are well placed. There is a wealth of knowledge and information packed in here, and a lot to explore.