Nexus/STC [nexusstc]
If you are interested in mirroring this dataset for archival or LLM training purposes, please contact us.
Overview from datasets page.
Source: Nexus/STC [nexusstc]
Metadata:
✅ Summa database available through IPFS, though it can be slow to download or to interact with directly.
👩‍💻 Anna’s Archive manages a collection of Nexus/STC metadata, through this code.
Files:
✅ Data can be replicated through Iroh.
❌ No mirroring by Anna’s Archive or partner servers yet.

Nexus/STC is a sort of continuation of Sci-Hub, started in 2021. It focuses primarily on academic papers, and is built on distributed web technologies such as IPFS, Iroh, and Summa. It also has a particular focus on AI, machine learning, and large language models (LLMs).

“Nexus” is the name for the community, and seems to encompass various tools, of which STC is one. “STC” (Standard Template Construct) is the actual library and search engine for academic papers.

They often refer to the combination “Nexus/STC”, which we will do as well. This is particularly helpful because “nexus” is a common word, “Science Nexus” (the name of their subreddit) is also the name of a concept in the videogame Stellaris, and “STC” or “Standard Template Construct” refers to a concept in the board game Warhammer 40,000 (“a computer database said to have contained the sum total of human scientific and technological knowledge”).

Nexus/STC seems to be mainly run by one individual, who goes by the name of “Ultranymous”, “ultra_nymous”, “superpirate”, or “the_superpirate”.

At this point we have only integrated their metadata. For this we pull their Summa database (using this code), and repackage it in our Anna’s Archive Containers format. The resulting file can be downloaded on our Nexus/STC torrents page. To mirror the Nexus/STC content files, see their replication page.

As far as we can tell, each Nexus/STC record has an MD5 hash, a CID (IPFS download hash), both, or neither. To accommodate all these combinations, we index all Nexus/STC records in the Metadata section of our search page, through /nexusstc/<nexus_id> URLs. Files with an MD5 are represented in the regular Download and Journal articles sections, through our standard /md5/<md5> URLs. Files without an MD5 but with a CID are also represented in those sections, but through /nexusstc_download/<nexus_id> URLs.
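
The URL scheme described above can be sketched as a small lookup. This is only an illustration of the four MD5/CID combinations, not the actual Anna’s Archive code: the function name `nexusstc_urls` and the dictionary keys `metadata` and `download` are hypothetical, and only the three URL patterns come from the text.

```python
from typing import Dict, Optional


def nexusstc_urls(nexus_id: str,
                  md5: Optional[str] = None,
                  cid: Optional[str] = None) -> Dict[str, str]:
    """Return the URLs under which a Nexus/STC record would appear.

    Hypothetical helper: the names here are illustrative assumptions;
    only the URL patterns are taken from the description above.
    """
    # Every record is indexed in the Metadata section, whatever hashes it has.
    urls = {"metadata": f"/nexusstc/{nexus_id}"}

    if md5:
        # A record with an MD5 (with or without a CID) uses the standard MD5 URL.
        urls["download"] = f"/md5/{md5}"
    elif cid:
        # A record with only a CID gets a Nexus/STC-specific download URL.
        urls["download"] = f"/nexusstc_download/{nexus_id}"
    # A record with neither hash appears only as a metadata record.

    return urls
```

For example, `nexusstc_urls("abc", cid="somecid")` yields a metadata URL plus `/nexusstc_download/abc`, while a record with neither hash yields only the metadata URL.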

Resources