((((sandro.net))))

Saturday, September 7, 2024

Show HN: I mapped HN's favorite books with GPT-4o https://ift.tt/csRAKXe

Hey HN! I love finding new books to read on here. I wanted to gather the most mentioned books and recreate the serendipity of physical browsing. I scraped 20k comments from HN threads related to reading, extracted the references and opinions using GPT-4o mini, and visualised their embeddings as a map.

- OpenAI's embeddings were processed using UMAP and HDBSCAN. A direct 2D projection from the text embeddings didn't yield visually interesting results. Instead, HDBSCAN is first applied on a high-dimensional projection. Those clusters tend to correspond to different genres. The genre memberships are then embedded using a second round of UMAP (using Hellinger distance) which results in pleasingly dense structures.
- The books' descriptions are based on extractions from the comments and GPT's general knowledge. Quality levels vary, and it leads to some oddly specific points, but I haven't found any yet that are straight up wrong.
- There are multiple books with the same title. Currently, only the most popular one of those makes it onto the map.
- It's surprisingly hard to get high quality book cover images. I tried Google Books and a bunch of open APIs, but they all had their issues. In the end, I got the covers from GoodReads through a hacked together process that combines their autocomplete search with GPT for data linkage. Does anyone know of a reliable source?

https://ift.tt/oXN9Rn7

September 7, 2024 at 09:23AM
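To make the Hellinger step in the post a bit more concrete: the post says the second UMAP round compares books by their genre memberships using Hellinger distance. The sketch below is not the author's code; it only shows what that distance computes for two membership vectors, using the standard definition. The function name `hellinger` and the three example vectors are hypothetical, made up for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two non-negative weight vectors.

    Both inputs are normalized to sum to 1 first, so raw membership
    scores can be passed in directly. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("each vector needs a positive total weight")
    # Bhattacharyya coefficient: how much the two distributions overlap.
    bc = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    # Clamp to guard against tiny floating-point overshoot above 1.
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical soft memberships over three genre clusters (assumed data).
scifi_book = [0.8, 0.1, 0.1]
another_scifi = [0.7, 0.2, 0.1]
history_book = [0.05, 0.05, 0.9]

near = hellinger(scifi_book, another_scifi)
far = hellinger(scifi_book, history_book)
print(f"similar genres: {near:.3f}, different genres: {far:.3f}")
```

Books with similar genre profiles end up with a small distance and books from different genres with a distance closer to 1, which is the kind of signal a layout algorithm can use to place related books near each other.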


DJ Sandro

http://sandroxbox.listen2myradio.com