arxiv-sanity lite: tag arxiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tfidf feature vectors based on paper abstracts.

arxiv-sanity-lite

(WIP)

A much lighter-weight arxiv-sanity rewrite. It currently runs only locally and doesn't exist as a website on the internet. However, the code is in a semi "feature-complete" state, in the sense that you can look through arxiv papers, tag any of them arbitrarily, and then arxiv-sanity-lite recommends similar papers for each tag using an SVM trained on tfidf vectors constructed from the paper abstracts. So that's pretty cool: I already find it plenty useful personally, and it may be useful to you as well!
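
The per-tag recommendation described above can be sketched as a one-vs-rest linear classifier: papers carrying the tag are positives, every other paper is a negative, and untagged papers are ranked by the classifier's decision value. This is a hedged, dependency-free illustration, not the repo's code. It assumes each paper is already a sparse tfidf vector (a dict of token weights), and it swaps in a small hand-rolled Pegasos-style subgradient solver for a linear SVM in place of whatever library solver the project actually uses. The names `train_linear_svm` and `recommend` and the parameters `lam` and `epochs` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

SparseVec = Dict[str, float]  # token -> tfidf weight


def normalize(x: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x.values()))
    return {k: v / norm for k, v in x.items()} if norm > 0 else dict(x)


def dot(w: SparseVec, x: SparseVec) -> float:
    """Sparse dot product over the nonzero entries of x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(
    X: Sequence[SparseVec], y: Sequence[int], lam: float = 1.0, epochs: int = 20, seed: int = 0
) -> SparseVec:
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    y[i] is +1 for tagged papers and -1 for everything else. No intercept is
    learned, to keep the sketch short. lam > 0 is the regularization strength.
    """
    rng = random.Random(seed)
    w: SparseVec = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y[i] * dot(w, X[i])   # evaluated before the update
            scale = 1.0 - eta * lam        # shrinkage from the L2 regularizer
            for k in w:
                w[k] *= scale
            if margin < 1.0:               # hinge loss active: move toward y_i * x_i
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def recommend(
    vectors: Dict[str, SparseVec], tagged: Set[str], k: int = 5, lam: float = 1.0
) -> List[Tuple[str, float]]:
    """Rank untagged papers by SVM decision value for one tag (tagged papers = positives)."""
    ids = sorted(vectors)                  # fixed order for reproducibility
    X = [normalize(vectors[pid]) for pid in ids]
    y = [1 if pid in tagged else -1 for pid in ids]
    w = train_linear_svm(X, y, lam=lam)
    scored = [(pid, dot(w, x)) for pid, x in zip(ids, X) if pid not in tagged]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy "tfidf" vectors; real ones would come from the abstracts.
    papers = {
        "a": {"transformer": 1.0, "attention": 1.0},
        "b": {"attention": 1.0, "language": 1.0},
        "c": {"attention": 1.0, "vision": 1.0},
        "d": {"protein": 1.0, "folding": 1.0},
        "e": {"galaxy": 1.0, "redshift": 1.0},
    }
    for pid, score in recommend(papers, tagged={"a", "b"}):
        print(pid, round(score, 3))
```

With these toy vectors, "c", the only untagged paper sharing a token ("attention") with the tagged ones, comes out on top. The default `lam=1.0` is deliberately strong regularization, which keeps the solver stable on tiny inputs; a smaller value fits the tagged set more tightly but typically needs more epochs.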

I hope to make this good over time and, once it's ready, to also host it publicly, deprecating the current bloated arxiv-sanity in favor of this new format. The biggest remaining todos are adding user accounts and making everything nicer, faster, and more scalable as the number of papers in the database grows.

Screenshot

To run

  • (Periodically) run arxiv_daemon.py to add recent papers from arxiv to the database.
  • Then run compute.py to recalculate tfidf features on the paper abstracts and save them to the database.
  • Finally run serve.py to start the server and access the frontend layer over the data, e.g.: export FLASK_APP=serve.py; flask run.
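
The compute.py step can be illustrated with a minimal tfidf sketch in pure Python. This is an assumption-laden illustration rather than the project's implementation: the tokenizer regex, the sublinear term frequency (1 + ln tf), and the smoothed idf ln((1 + n) / (1 + df)) + 1 (the formula scikit-learn's TfidfVectorizer uses with its default smooth_idf=True) are choices made here, and saving the result to the database is left out. The names `tokenize` and `tfidf_vectors` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumption: tokens are lowercase alphanumeric runs of 2+ chars starting with a letter.
TOKEN_RE = re.compile(r"[a-z][a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the abstract and pull out simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(abstracts: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Map paper id -> L2-normalized sparse tfidf vector (token -> weight)."""
    counts = {pid: Counter(tokenize(text)) for pid, text in abstracts.items()}
    n = len(counts)
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: each token counted once per paper
    # Smoothed idf: rarer tokens get larger weights.
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
    vectors: Dict[str, Dict[str, float]] = {}
    for pid, c in counts.items():
        # Sublinear tf dampens tokens that repeat many times in one abstract.
        vec = {tok: (1.0 + math.log(tf)) * idf[tok] for tok, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors[pid] = {tok: v / norm for tok, v in vec.items()} if norm > 0 else vec
    return vectors


if __name__ == "__main__":
    # Hypothetical abstracts for illustration only.
    abstracts = {
        "p1": "Attention is all you need. Attention layers replace recurrence.",
        "p2": "Protein folding with attention.",
        "p3": "A galaxy redshift survey.",
    }
    vecs = tfidf_vectors(abstracts)
    # Highest-weighted tokens of one paper: a quick check on how it is represented.
    print(sorted(vecs["p2"].items(), key=lambda kv: -kv[1])[:3])
```

Each vector is L2-normalized, so the dot product of two papers' vectors is their cosine similarity, and the vectors can be fed directly to a linear classifier.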

License

MIT