Using 22120 allows you to browse websites that you've previously visited, completely offline, without your web browser noticing the difference.

Along with the web pages you visit, this tool also archives the web page resources exactly as they are sent to the browser, except for video, audio and websockets, at least for now. You don't have to use it all the time: launch it when you want to save some web pages for offline browsing, then close it until you want to save more pages or visit the pages you've saved in offline mode.

If you use an ad blocker, note that 22120 will not archive the blocked elements.

On the 22120 roadmap are the ability to search the offline archive, a library server to serve the archive publicly, and a distributed P2P web browser on IPFS. The developer is also investigating archiving streaming content, like audio and video, and websockets.

Close Google Chrome, then launch the 22120 binary. It will automatically launch Google Chrome with a new tab pointing to the 22120 local URL. From there you can set the web browser to either save mode (to save the pages you visit from this moment on) or serve mode (to serve locally cached web pages).

I struggled a bit to figure out how to use 22120 with a Chromium-based web browser other than Google Chrome, because this information seems to be missing from its documentation. It's possible by launching the Chromium-based web browser of your choice with the --remote-debugging-port=9222 command line argument. For example, for Chromium (the binary name may differ depending on the Linux distribution you use; it may also be just chromium):

chromium-browser --remote-debugging-port=9222

Then open a new tab and go to the 22120 local URL to control 22120.

Configuration

Besides setting 22120 to save or serve mode, you can use the 22120 settings page to set the system path of the archive (which on Linux defaults to a folder called 22120-arc in the home directory). You can also view the archive index there, a list of the URLs saved as your offline Internet library. There's also an option to search your archive, but this feature does not work yet.

(Image: Pages saved for offline browsing by 22120)

22120 supports some command line arguments that allow changing the server port, launching it in save or serve mode, specifying a different Chrome port, and specifying the library path:

The tool also offers the possibility of blacklisting some domains so they are not archived. This is done by creating a file called no.json inside the 22120-arc folder (by default, this is in your home directory on Linux) with the following contents:

Where * matches zero or more characters, and ? matches zero or one character. If you have multiple archives, create a no.json file for each of them.

ArchiveBox ("Your own personal internet archive", website archiving / crawler) is a powerful, self-hosted internet archiving solution to collect, save, and view sites you want to preserve offline. You can set it up as a command-line tool, web app, or desktop app (alpha), on Linux, macOS, and Windows. You can feed it URLs one at a time, or schedule regular imports from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. It saves snapshots of the URLs you feed it in several formats out of the box, including HTML, PDF, PNG screenshots, and WARC. It also extracts and preserves a wide variety of content automatically (article text, audio/video, git repos, etc.).
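The save and serve modes described above can be pictured with a small conceptual model. The sketch below is hypothetical and is not how 22120 is implemented. It assumes the browser hands every response it actually receives to on_response along with a resource type. In save mode the model stores that response unchanged in a dictionary keyed by URL; in serve mode it replays the response from that dictionary. The class and method names are invented for illustration. Modelling the skipped video, audio and websocket content with the Chrome DevTools Protocol resource-type names Media and WebSocket is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Response:
    status: int
    headers: Dict[str, str]
    body: bytes


@dataclass
class OfflineArchive:
    """Toy save/serve archive keyed by URL (hypothetical, not 22120's code)."""

    mode: str = "save"  # "save" or "serve"
    store: Dict[str, Response] = field(default_factory=dict)
    # Resource types the article says are not archived for now; the names
    # follow Chrome DevTools Protocol resource types (an assumption here).
    skipped_types: frozenset = frozenset({"Media", "WebSocket"})

    def on_response(self, url: str, resource_type: str, response: Response) -> None:
        """Record a response the browser actually received (save mode only)."""
        if self.mode == "save" and resource_type not in self.skipped_types:
            # Keep the response as-is so it can be replayed unchanged later.
            self.store[url] = response

    def lookup(self, url: str) -> Optional[Response]:
        """In serve mode, answer from the archive instead of the network."""
        if self.mode != "serve":
            return None  # save mode: let the request go to the network
        return self.store.get(url)
```

In this model, a request blocked by an ad blocker never produces a response, so on_response is never called for it. That mirrors why blocked elements end up missing from the archive.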
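For the Chromium workaround described above, a small helper can start the browser with --remote-debugging-port=9222 and wait until its DevTools endpoint answers before you switch to 22120. This is a sketch and not part of 22120. The candidate binary names come from the text. The helper names are hypothetical, as is the choice to poll http://127.0.0.1:9222/json/version (an HTTP endpoint Chromium exposes when remote debugging is enabled). As with Chrome, close any running instance first; otherwise the new window may join the existing session without the flag taking effect.

```python
import json
import shutil
import subprocess
import time
import urllib.request
from typing import List, Optional, Sequence

# Binary names mentioned above; which one exists depends on the distribution.
CANDIDATES = ("chromium-browser", "chromium")
DEBUG_PORT = 9222  # port from the command above


def build_command(binary: str, port: int = DEBUG_PORT) -> List[str]:
    """Return the argv that starts the browser with remote debugging enabled."""
    return [binary, f"--remote-debugging-port={port}"]


def find_browser(candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def wait_for_devtools(port: int = DEBUG_PORT, timeout: float = 10.0) -> Optional[dict]:
    """Poll the DevTools /json/version endpoint until it answers or time runs out."""
    url = f"http://127.0.0.1:{port}/json/version"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                return json.load(resp)
        except OSError:
            # URLError, refused connections and timeouts are all OSError subclasses.
            time.sleep(0.5)
    return None


if __name__ == "__main__":
    browser = find_browser()
    if browser is None:
        raise SystemExit("No Chromium binary found on PATH")
    subprocess.Popen(build_command(browser))
    info = wait_for_devtools()
    print(info.get("Browser") if info else "DevTools endpoint did not respond")
```

Once the endpoint responds, open a new tab and go to the 22120 local URL as described above.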
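The no.json pattern rules described above (* matches zero or more characters, ? matches zero or one) can be illustrated with a small matcher. This is a hypothetical sketch rather than 22120's code. The layout of no.json is not shown here, so the sketch takes the patterns as a plain Python list. It also assumes the patterns are matched against a URL's hostname, because the feature is described as blacklisting domains. The example patterns are made up.

```python
import re
from typing import List
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a no.json-style pattern into a compiled regex.

    '*' becomes '.*' (zero or more characters), '?' becomes '.?'
    (zero or one character); every other character is escaped and
    matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_blacklisted(url: str, patterns: List[str]) -> bool:
    """Return True if the URL's hostname fully matches any pattern.

    Matching on the hostname rather than the whole URL is an assumption
    of this sketch, not documented 22120 behaviour.
    """
    host = urlsplit(url).hostname or ""
    return any(pattern_to_regex(p).fullmatch(host) for p in patterns)


if __name__ == "__main__":
    rules = ["*.example.com", "ad?.example.net"]  # hypothetical entries
    print(is_blacklisted("https://cdn.example.com/app.js", rules))  # True
    print(is_blacklisted("https://example.com/", rules))            # False
    print(is_blacklisted("https://ads.example.net/banner", rules))  # True
    print(is_blacklisted("https://ad.example.net/banner", rules))   # True
```

With these rules, *.example.com blocks subdomains such as cdn.example.com but not example.com itself, because the literal dot before "example" must still be present.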