Changelog History

  • v1.1.8

    June 13, 2019

    🐛 Bug fixes

    • #375 - Cache doesn't work as expected
  • v1.1.7

    April 09, 2019

    Important notice

    Dropped support for Node 4. The lowest supported version is now 6.13.0. This is due to requirements of some dependencies.

    🆕 New features

    • The README is now generated using jsdoc-to-markdown.

    🐛 Bug fixes

    • #419 - srcset source termination
    • #432 - Add a replacement in cleanURL
    • #439 - Links getting skipped due to escape sequence in href
    • #441 - Pattern for CSS url() syntax also wrongly matches some JS url() function calls
    • #443 - Oldest unfetched item duplicates
    • #447 - Fix TypeError ERR_INVALID_CALLBACK on fs.writeFile for node.js v10
  • v1.1.6

    October 06, 2017

    🆕 New features

    • Sitemap directives in /robots.txt are now added to the queue if Crawler#respectRobotsTxt is truthy.
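
To illustrate what "Sitemap directives" refers to, here is a minimal sketch of pulling the sitemap URLs out of a robots.txt body. The helper name `extractSitemapUrls` and the sample file are hypothetical and are not taken from simplecrawler's source; the library's own parsing may differ.

```javascript
// Hypothetical helper (not part of simplecrawler's API): collects the URLs
// named by "Sitemap:" lines in a robots.txt body.
function extractSitemapUrls(robotsTxt) {
    return robotsTxt
        .split(/\r?\n/)
        // Strip trailing comments; this assumes sitemap URLs contain no "#"
        .map(function (line) { return line.replace(/#.*$/, "").trim(); })
        // The directive name is matched case-insensitively
        .map(function (line) { return /^sitemap\s*:\s*(\S+)/i.exec(line); })
        .filter(Boolean)
        .map(function (match) { return match[1]; });
}

var sampleRobotsTxt = [
    "User-agent: *",
    "Disallow: /private/",
    "Sitemap: https://example.com/sitemap.xml",
    "sitemap: https://example.com/news-sitemap.xml # secondary"
].join("\n");

console.log(extractSitemapUrls(sampleRobotsTxt));
```

Each returned URL is the kind of entry that, per the note above, gets added to the queue when Crawler#respectRobotsTxt is truthy.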

    🐛 Bug fixes

    • #398 - fix issue where multiple cookies weren't properly serialized for outbound requests
    • #400 - fix issue where <meta name="robots"> tags weren't properly parsed
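
For context on the cookie fix (#398): an outbound request carries all of its cookies in a single Cookie header, as name=value pairs separated by "; ". The sketch below shows that format only; the function name and the cookie objects are hypothetical and this is not simplecrawler's actual cookie jar code.

```javascript
// Hypothetical helper: builds one Cookie header value from several cookies.
// Each cookie is assumed to be a plain object with `name` and `value` fields.
function serializeCookieHeader(cookies) {
    return cookies
        .map(function (cookie) { return cookie.name + "=" + cookie.value; })
        .join("; ");
}

var cookieHeader = serializeCookieHeader([
    { name: "session", value: "abc123" },
    { name: "theme", value: "dark" }
]);

console.log(cookieHeader); // session=abc123; theme=dark
```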
  • v1.1.5

    August 15, 2017

    Administrative

    • Welcome @konstantinblaesi! We invited a few people who have made recent and significant contributions to come on board as simplecrawler collaborators. @konstantinblaesi heeded our call and has already submitted several PRs, including an ambitious upstream patch!
    • simplecrawler now has its own GitHub organisation and lives at simplecrawler/simplecrawler. This enables more fine-grained access controls for current and future collaborators. If you are interested in joining us, see #388.

    Important notice

    • We have dropped support for Node 0.12. The lowest supported version is now 4.x. This enables us to use more modern language features and will hopefully enable more patches soon. See #382 for more details.

    🐛 Bug fixes

    • #364 - improved performance of Crawler#cleanExpandResources. See #382 for the relevant patch
    • #385 and #363 - @konstantinblaesi submitted an upstream patch to URI.js that applies the same validation logic when calling the URI constructor as when calling the hostname and port methods. See #393 and medialize/URI.js#345 for the relevant patches
  • v1.1.4

    July 16, 2017

    🐛 Bug fixes

    • #377 - fixed multiple issues with Crawler#removeFetchCondition and Crawler#removeDownloadCondition. Previously, those methods promised to throw an error if they couldn't find the targeted fetch/download condition, but they did not. The previous system for condition IDs was not stable either, because the IDs were indexes into an array whose length could change. Crawler#removeFetchCondition also had a bug where it looked for fetch condition references in the Crawler#_downloadConditions array rather than Crawler#_fetchConditions. All of these issues have now been fixed. Thanks to @venning for a great bug report and PR!
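
The ID problem described above can be sketched with a small registry. This is an illustration of the general technique (IDs that never shift, and removal that throws when the ID is unknown), not simplecrawler's actual implementation; `ConditionRegistry` and its methods are hypothetical names.

```javascript
// Hypothetical registry: IDs come from a counter, so removing one condition
// never changes the ID of another (unlike IDs that are array indexes).
function ConditionRegistry() {
    this._conditions = new Map();
    this._nextId = 0;
}

ConditionRegistry.prototype.add = function (condition) {
    var id = this._nextId++;
    this._conditions.set(id, condition);
    return id;
};

ConditionRegistry.prototype.remove = function (id) {
    // Map#delete returns false when the key was not present
    if (!this._conditions.delete(id)) {
        throw new Error("Unable to find condition with ID " + id);
    }
    return true;
};

var registry = new ConditionRegistry();
var firstId = registry.add(function () { return true; });
var secondId = registry.add(function () { return false; });

registry.remove(firstId);
registry.remove(secondId); // still valid: the ID did not shift after the first removal
```

Calling `registry.remove(firstId)` a second time would throw, which is the behavior the changelog entry says the methods had promised.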
  • v1.1.3

    June 21, 2017

    🆕 New features

    • #376 - added Crawler#sortQueryParameters option. This option will sort the query parameters in a URL, making it simple to avoid fetching duplicate pages on sites that use a lot of query parameters. Thanks @HaroldPutman!
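
A minimal sketch of why sorting query parameters helps with duplicates: two URLs that differ only in parameter order normalize to the same string. It uses Node's built-in WHATWG `URL` class for brevity, which is an assumption made for this example and not necessarily how simplecrawler implements the option; `sortQuery` is a hypothetical name.

```javascript
// Hypothetical helper: returns the URL with its query parameters sorted by name.
function sortQuery(rawUrl) {
    var parsed = new URL(rawUrl);
    parsed.searchParams.sort(); // sorts in place, by parameter name
    return parsed.toString();
}

var firstVariant = sortQuery("https://example.com/list?page=2&category=books");
var secondVariant = sortQuery("https://example.com/list?category=books&page=2");

console.log(firstVariant);                   // https://example.com/list?category=books&page=2
console.log(firstVariant === secondVariant); // true
```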
  • v1.1.2

    June 21, 2017

    🐛 Bug fixes

    • #371 - fixed an issue where custom cache back-ends would not be properly type checked
  • v1.1.1

    March 19, 2017

    🐛 Bug fixes

    • #360 - updated README to correctly reflect new async fetch/download conditions API
    • #353 - updated default srcset discovery function to be more permissive
    • #357 - ensure that the port parameter is properly removed from the Crawler#getRequestOptions return object when using a custom HTTP agent
  • v1.1.0

    March 10, 2017

    🆕 New features

    • Added the ability to make both fetch conditions and download conditions async. This change also deprecates the previous synchronous behavior (we will be removing it in the next major release). This suggestion was originally brought up by @maxcorbeau in #345.
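
As a rough illustration of an async fetch condition, the sketch below reports its decision through a Node-style callback instead of a return value. The `(queueItem, referrerQueueItem, callback)` signature is an assumption based on the async API this entry introduces, so check the README for the exact form. The fake queue item and the helper names are hypothetical, and the example calls the condition directly so that it runs without simplecrawler installed.

```javascript
// Hypothetical predicate: true for paths that end in ".pdf" (case-insensitive).
function isPdfPath(path) {
    return /\.pdf$/i.test(path);
}

// Assumed async condition shape: the decision is delivered via callback(error, result).
function skipPdfCondition(queueItem, referrerQueueItem, callback) {
    // setImmediate stands in for real async work (a database lookup, for example)
    setImmediate(function () {
        callback(null, !isPdfPath(queueItem.path));
    });
}

// Direct invocation with a stand-in queue item. With the library installed, the
// function would presumably be registered via crawler.addFetchCondition instead.
skipPdfCondition({ path: "/docs/report.pdf" }, null, function (error, shouldFetch) {
    console.log(error, shouldFetch); // null false
});
```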
  • v1.0.5

    March 10, 2017

    🐛 Bug fixes

    • #339 - fixed a regression in FetchQueue#defrost that would trick the queue into skipping over unfetched queue items when asked for the oldest unfetched one. Thanks @cival for the PR!