kanshan/arxiv-sanity-lite

arxiv-sanity lite: tag arxiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tfidf feature vectors based on paper abstracts.

arxiv deep-learning flask machine-learning

Andrej Karpathy e4fe77d118 show user warning if they are not logged in that things won't work		2021-11-26 20:57:20 -08:00
aslite	when writing features do it safely and atomically	2021-11-26 20:00:37 -08:00
data	ok now we can sequester all the database files into data/ folder so everything is nice and clean yay	2021-11-25 13:47:45 -08:00
static	show user warning if they are not logged in that things won't work	2021-11-26 20:57:20 -08:00
templates	show user warning if they are not logged in that things won't work	2021-11-26 20:57:20 -08:00
.gitignore	and i think that's it, we now support user accounts (lite)! sweet.	2021-11-26 16:38:36 -08:00
arxiv_daemon.py	use the process exit code to communicate whether any updates successfully made it into the database at all	2021-11-26 20:19:48 -08:00
compute.py	when writing features do it safely and atomically	2021-11-26 20:00:37 -08:00
LICENSE	Initial commit	2021-11-12 20:34:22 -08:00
Makefile	example makefile	2021-11-25 13:51:52 -08:00
README.md	update the readme	2021-11-26 20:31:57 -08:00
screenshot.jpg	update the screenshot since the interface changed quite a bit	2021-11-26 20:33:10 -08:00
serve.py	and i think that's it, we now support user accounts (lite)! sweet.	2021-11-26 16:38:36 -08:00

README.md

arxiv-sanity-lite

(WIP)

A much lighter-weight, from-scratch rewrite of arxiv-sanity. It periodically polls the arxiv API for new papers of interest and adds them to a database. It then lets a user tag papers of interest with arbitrary tags, and recommends new papers for each tag based on SVMs running on tfidf features of paper abstracts. You can search, rank, sort, slice and dice these results. Create your own tags, track recent arxiv papers in your area, and don't miss out!
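The recommendation idea can be sketched in a few lines. This is a minimal, dependency-free illustration and not the code in this repository: the function names, the whitespace tokenizer, the smoothed IDF formula, the hyperparameters, and the hand-rolled subgradient trainer are all assumptions made for the example. It treats abstracts carrying a tag as positives and other known abstracts as negatives, fits a linear SVM on tfidf vectors, and ranks unseen abstracts by their SVM score.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        # term frequency times a smoothed inverse document frequency
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in Counter(tokens).items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def dot(weights, vec):
    """Sparse dot product; terms without a learned weight count as zero."""
    return sum(weights.get(term, 0.0) * w for term, w in vec.items())


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):  # y is +1 (tagged) or -1 (not tagged)
            violated = y * (dot(weights, vec) + bias) < 1.0
            for term in weights:  # regularisation step: shrink every weight
                weights[term] *= 1.0 - lr * reg
            if violated:  # hinge loss is active: step towards this example
                for term, w in vec.items():
                    weights[term] = weights.get(term, 0.0) + lr * y * w
                bias += lr * y
    return weights, bias


def rank_candidates(tagged, untagged, candidates):
    """Sort candidate abstracts from highest to lowest SVM score."""
    vectors = tfidf_vectors(tagged + untagged + candidates)
    n_train = len(tagged) + len(untagged)
    labels = [1] * len(tagged) + [-1] * len(untagged)
    weights, bias = train_linear_svm(vectors[:n_train], labels)
    scored = [(dot(weights, v) + bias, c) for c, v in zip(candidates, vectors[n_train:])]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

In practice a library implementation (for example scikit-learn's `TfidfVectorizer` and `LinearSVC`) would replace the hand-rolled pieces; the sketch only shows how tags turn into a ranking.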

I am running a live version of this code on arxiv-sanity-lite.com.

To run

To run this locally I usually run the following script to update the database with any new papers. I typically schedule this via a periodic cron job:

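#!/bin/bash

# fetch new papers from the arxiv API and add them to the database
python3 arxiv_daemon.py --num 2000

# exit code 0 means at least one update made it into the database
if [ $? -eq 0 ]; then
    echo "New papers detected! Running compute.py"
    python3 compute.py
else
    echo "No new papers were added, skipping feature computation"
fi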
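The script above branches on the exit status of arxiv_daemon.py, so the daemon has to exit with 0 only when something actually changed in the database. A minimal sketch of that convention follows. The names `merge_papers` and `run_update`, the dict standing in for the database, and the `id`/`updated` fields are hypothetical and not taken from the repository.

```python
def merge_papers(db, fetched):
    """Insert new or updated papers into db; return how many entries changed.

    Hypothetical helper: db is a plain dict keyed by paper id, and each paper
    is a dict with an "id" and an ISO-8601 "updated" timestamp string.
    """
    changed = 0
    for paper in fetched:
        known = db.get(paper["id"])
        # same-format ISO-8601 strings compare correctly as plain strings
        if known is None or known["updated"] < paper["updated"]:
            db[paper["id"]] = paper
            changed += 1
    return changed


def run_update(db, fetched):
    """Return a shell exit status: 0 if anything changed, 1 otherwise."""
    return 0 if merge_papers(db, fetched) > 0 else 1


# A real script would finish with something like: sys.exit(run_update(db, fetched))
```

Returning the status from a function and calling `sys.exit` only at the very end keeps the logic easy to test without terminating the interpreter.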

As the script shows, updating the database is a matter of first downloading the new papers via the arxiv API using arxiv_daemon.py, and then running compute.py to compute the tfidf features of the papers. Finally, to serve the flask server locally we'd run something like:

export FLASK_APP=serve.py; flask run

All database files are stored inside the data directory. If you'd like to run your own instance on the interwebs, I recommend simply running the above on a Linode. For example, I am currently running this code on the smallest "Nanode 1 GB" instance, indexing about 30K papers, which costs $5/month.

todos

I need a proper requirements.txt and such
The metas table should not be a sqlitedict but a proper sqlite table, for efficiency
Build a reverse index to support faster search; right now we iterate through the entire database
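The reverse index in the last item could look roughly like the sketch below. It is only an illustration of the idea, under the assumption that papers are available as a dict from paper id to searchable text. The function names and the whitespace tokenizer are made up for the example.

```python
from collections import defaultdict


def build_index(papers):
    """Map each lowercase word to the set of paper ids whose text contains it."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for word in text.lower().split():
            index[word].add(paper_id)
    return index


def search(index, query):
    """Return the ids of papers that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # look up one posting set per word and intersect them;
    # index.get avoids creating empty entries for unknown words
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)
```

A query then touches only the posting sets of its own words instead of scanning every paper, at the cost of rebuilding or updating the index whenever the database changes.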

License

MIT