Friday, November 22, 2024
Show HN: PDF2MD – Rust+Redis+ClickHouse+VLLM conversion pipeline for PDFs https://ift.tt/gSKaQCB
If you just want to use it, try it here: https://ift.tt/WVk9FDE .

I think LLMs are astoundingly good at converting complex PowerPoint-style infographics.

I wouldn't normally expect folks on HN to find this interesting, since the general concept has already been posted about in the past few months. We were heavily inspired by Zerox[1]. However, the stack we went with was fun and over-engineered, which is more likely to create interesting discussion.

We use all the same tools at Trieve (our main product), but wanted to see whether they would be a good fit for something that needed to get built on a tighter timeline, and we think they were! It took us 2 weeks to get this set up end-to-end, and it's by no means complete (see the roadmap in the linked README). However, it's cool that a relatively cookie-cutter web service like this can be created so quickly with pure open-source dependencies and non-standard Rust tooling. Rust won't kill your startup!

- Minijinja templates for the UI[2]
- PDFObject for doc display in-browser[3]
- actix/actix-web HTTP server framework[4]
- Redis queue macro for worker async processing[5]
- ClickHouse for task storage[6]
- chm CLI to handle ClickHouse migrations[7]
- MinIO S3 for object storage[8]

[1]: https://ift.tt/PU6hNl4
[2]: https://ift.tt/eXvLh7c
[3]: https://ift.tt/mLX3a4j
[4]: https://ift.tt/B853AfO
[5]: https://ift.tt/kOFiUJq...
[6]: https://ift.tt/P2R9e4A
[7]: https://ift.tt/60XcRZF
[8]: https://ift.tt/BrkopPG

https://ift.tt/AEjQ32u November 21, 2024 at 06:05PM
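To make the "Redis queue for worker async processing" idea concrete, here is a minimal sketch of the queue-and-worker pattern, not PDF2MD's actual code. It assumes a FIFO queue with push-at-head and pop-at-tail semantics (similar in spirit to Redis `LPUSH`/`RPOP`) and replaces Redis with an in-memory `VecDeque` so that it runs with the standard library alone. The names `Task`, `Status`, `Queue`, and `run_worker` are hypothetical, as is the rule that a zero-page task fails.

```rust
use std::collections::VecDeque;

/// A conversion job. The fields are hypothetical, not PDF2MD's schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: u32,
    pub pages: u32,
}

/// Outcome a worker would record in task storage (ClickHouse in the post).
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Completed { id: u32, pages_converted: u32 },
    Failed { id: u32, reason: String },
}

/// In-memory stand-in for a Redis list used as a FIFO queue (an assumption).
pub struct Queue {
    items: VecDeque<Task>,
}

impl Queue {
    pub fn new() -> Self {
        Queue { items: VecDeque::new() }
    }

    /// Enqueue at the head, analogous to LPUSH.
    pub fn push(&mut self, task: Task) {
        self.items.push_front(task);
    }

    /// Dequeue from the tail, analogous to RPOP; None when the queue is empty.
    pub fn pop(&mut self) -> Option<Task> {
        self.items.pop_back()
    }
}

/// Drain the queue and return one status per task, in processing order.
/// A real worker would block waiting for new work and call the model here.
pub fn run_worker(queue: &mut Queue) -> Vec<Status> {
    let mut results = Vec::new();
    while let Some(task) = queue.pop() {
        let status = if task.pages == 0 {
            // Hypothetical failure rule, used only to show both outcomes.
            Status::Failed { id: task.id, reason: "empty document".to_string() }
        } else {
            Status::Completed { id: task.id, pages_converted: task.pages }
        };
        results.push(status);
    }
    results
}
```

The HTTP handler would call `push` and return immediately, while one or more workers call `run_worker` (or a blocking equivalent) in separate processes. That split keeps slow model calls out of the request path.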