kanshan/arxiv-sanity-lite


arxiv-sanity lite: tag arxiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tfidf feature vectors based on paper abstracts.

arxiv deep-learning flask machine-learning


Andrej Karpathy c3161b2a49 do not reveal username since they are kind of secret now		2021-11-26 17:11:19 -08:00
aslite	ok now we can sequester all the database files into data/ folder so everything is nice and clean yay	2021-11-25 13:47:45 -08:00
data	ok now we can sequester all the database files into data/ folder so everything is nice and clean yay	2021-11-25 13:47:45 -08:00
static	let things breathe a bit more	2021-11-26 16:44:21 -08:00
templates	do not reveal username since they are kind of secret now	2021-11-26 17:11:19 -08:00
.gitignore	and i think that's it, we now support user accounts (lite)git commit -m 'and i think that\'s it, we now support user accounts litegit status sweet.'! sweet.	2021-11-26 16:38:36 -08:00
arxiv_daemon.py	add an option to break out early when we've pulled in all new papers most likely	2021-11-24 09:15:36 -08:00
compute.py	sequester all file sytem IO ops only to db.py, so it's not total chaos	2021-11-25 13:28:04 -08:00
LICENSE	Initial commit	2021-11-12 20:34:22 -08:00
Makefile	example makefile	2021-11-25 13:51:52 -08:00
README.md	few notes on some outstanding todos	2021-11-26 10:28:49 -08:00
screenshot.jpg	add a screenshot and rearrange the readme a bit	2021-11-12 21:36:45 -08:00
serve.py	and i think that's it, we now support user accounts (lite)git commit -m 'and i think that\'s it, we now support user accounts litegit status sweet.'! sweet.	2021-11-26 16:38:36 -08:00

README.md

arxiv-sanity-lite

(WIP)

A much lighter-weight rewrite of arxiv-sanity. It currently runs only locally and doesn't exist as a website on the internet. However, the code is in a semi "feature-complete" state in the sense that you can look through arxiv papers, tag any of them arbitrarily, and then arxiv-sanity-lite recommends similar papers for each tag using an SVM trained on tfidf vectors constructed from the paper abstracts. So that's pretty cool: I personally find this plenty useful already, and it may be useful to you as well!
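To make the recommendation step concrete, here is a minimal, dependency-free sketch of the idea: tf-idf vectors built from abstracts, a linear SVM trained with the tagged papers as positives and every other paper as a negative, and untagged papers ranked by the SVM's decision value. This is not the project's code. The tokenizer, the smoothed idf formula, the per-class balancing of C, the value C=0.01, the dual coordinate descent solver, and the names `tokenize`, `tfidf_vectors`, `train_linear_svm`, and `recommend` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(abstracts):
    """One L2-normalized sparse tf-idf vector (dict: term -> weight) per abstract."""
    counts = [Counter(tokenize(a)) for a in abstracts]
    n = len(counts)
    df = Counter(term for c in counts for term in c)
    # Smoothed idf (assumed formula): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vecs = []
    for c in counts:
        v = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vecs.append({t: w / norm for t, w in v.items()})
    return vecs


def dot(w, x):
    # Sparse dot product between two term -> weight dicts.
    return sum(w.get(t, 0.0) * val for t, val in x.items())


def train_linear_svm(xs, ys, C=0.01, passes=20):
    """Hinge-loss linear SVM without bias, solved by dual coordinate descent.

    C is scaled per class ("balanced") so a few tagged papers are not
    drowned out by the many untagged ones (an assumption of this sketch).
    """
    n = len(xs)
    n_pos = sum(1 for y in ys if y > 0)
    if n_pos == 0 or n_pos == n:
        raise ValueError("need at least one tagged and one untagged paper")
    upper = {1: C * n / (2 * n_pos), -1: C * n / (2 * (n - n_pos))}
    alpha = [0.0] * n
    w = {}
    for _ in range(passes):
        for i, (x, y) in enumerate(zip(xs, ys)):
            qii = dot(x, x)
            if qii == 0.0:
                continue  # empty abstract: contributes nothing
            grad = y * dot(w, x) - 1.0
            new = min(max(alpha[i] - grad / qii, 0.0), upper[y])
            delta = new - alpha[i]
            if delta:
                alpha[i] = new
                # Keep w = sum_i alpha_i * y_i * x_i up to date.
                for t, val in x.items():
                    w[t] = w.get(t, 0.0) + delta * y * val
    return w


def recommend(abstracts, tagged, k=5):
    """Indices of the k untagged papers with the highest SVM decision value."""
    xs = tfidf_vectors(abstracts)
    ys = [1 if i in tagged else -1 for i in range(len(xs))]
    w = train_linear_svm(xs, ys)
    ranked = sorted(
        (i for i in range(len(xs)) if i not in tagged),
        key=lambda i: dot(w, xs[i]),
        reverse=True,
    )
    return ranked[:k]
```

For example, with the toy abstracts "convolutional neural networks for image classification", "image classification with deep convolutional networks", "protein folding prediction from amino acid sequences", "reinforcement learning agents play atari games", and "graph algorithms for shortest path queries", tagging paper 0 and calling `recommend(abstracts, {0}, k=2)` returns `[1, 4]`: the other image-classification paper first, then the one that shares only the word "for". With a small C the dual variables typically all end at their upper bound, so `w` is essentially a weighted sum of tagged vectors minus untagged ones, and the ranking reads as "similar to what you tagged, minus similar to everything else".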

I hope to improve this over time and, once it's ready, to also host it publicly, deprecating the current bloated arxiv-sanity in favor of this new format. The biggest remaining todos are adding user accounts and making everything nicer, faster, and more scalable as the number of papers in the database grows.

To run

(Periodically) run arxiv_daemon.py to add recent papers from arxiv to the database.
Then run compute.py to recalculate the tfidf features on the paper abstracts and save them to the database.
Finally, start the Flask server defined in serve.py to access the frontend over the data, e.g.: export FLASK_APP=serve.py; flask run.
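The first step, pulling recent papers, can be sketched as below. Everything here is an assumption rather than what arxiv_daemon.py actually does: the endpoint and the `search_query`, `sortBy`, `sortOrder`, `start`, and `max_results` parameters are those of the public arXiv API, and `build_query_url`, `pull_new`, `fetch_page`, and `break_after` are hypothetical names. The early-exit loop mirrors the commit note about breaking out once all new papers have most likely been pulled in; downloading and parsing the Atom feed is left to the caller-supplied `fetch_page`.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=100):
    """URL for one page of arXiv API results, newest submissions first."""
    params = {
        "search_query": search_query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urlencode(params)


def pull_new(fetch_page, known_ids, page_size=100, max_pages=10, break_after=2):
    """Collect unseen papers page by page (hypothetical helper).

    fetch_page(start, max_results) must return a list of (arxiv_id, abstract)
    pairs. Stops early after `break_after` consecutive pages with nothing new,
    or when a page comes back empty.
    """
    new = {}
    stale = 0
    for page in range(max_pages):
        papers = fetch_page(page * page_size, page_size)
        added = 0
        for pid, abstract in papers:
            if pid not in known_ids and pid not in new:
                new[pid] = abstract
                added += 1
        stale = 0 if added else stale + 1
        if stale >= break_after or not papers:
            break
    return new
```

Because results come back newest first, a run of pages containing only known ids is a reasonable (though not guaranteed) sign that everything new has already been stored.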

todos

add user accounts so we can ship it
the metas table should not be a sqlitedict but a proper sqlite table, for efficiency
build a reverse index to support faster search; right now we iterate through the entire database
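The two database todos above could look roughly like this with the standard library's `sqlite3` module. This is a sketch under assumptions: the `metas` columns (`pid`, `title`, `summary`) and the helpers `create_metas`, `build_reverse_index`, and `search` are hypothetical, and search uses simple AND semantics over lowercase tokens.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def create_metas(conn):
    # Hypothetical schema: one real row per paper instead of a sqlitedict blob.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metas ("
        " pid TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " summary TEXT NOT NULL)"
    )


def build_reverse_index(conn):
    """Map each token to the set of paper ids whose title or abstract contains it."""
    index = defaultdict(set)
    for pid, title, summary in conn.execute("SELECT pid, title, summary FROM metas"):
        for tok in set(tokenize(title + " " + summary)):
            index[tok].add(pid)
    return index


def search(index, query):
    """Ids of papers containing every query token (AND semantics)."""
    toks = tokenize(query)
    if not toks:
        return set()
    # Intersect smallest posting sets first to keep intermediate results small.
    postings = sorted((index.get(t, set()) for t in toks), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result
```

Building the index once per database update turns a query into a few set intersections instead of a pass over every paper; the trade-off is the memory held by the posting sets.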

License

MIT