sponge is a website crawler and links downloader command-line tool

Overview

sponge - A website crawler and links downloader command line tool Build Status Codacy Badge Download sponge Awesome Kotlin Badge Twitter URL License

How to build and run it?

You will need a Java JDK 8+ and maven 3.3.9 or above.

mvn clean package assembly:single

cd target && unzip sponge-X.X-SNAPSHOT.zip && cd sponge-X.X-SNAPSHOT

./sponge [OPTIONS]

How to use it?

Usage: sponge [OPTIONS]

Options:
  -u, --uri VALUE                 URI (example: https://www.google.com)
  -o, --output PATH               Output directory where files are downloaded
  -t, --mime-type TEXT            Mime types to download (example: text/plain)
  -e, --file-extension TEXT       Extensions to download (example: png)
  -d, --depth INT                 Search depth (default: 1)
  -m, --max-uris INT              Maximum uris to visit (default: 1000000)
  -s, --include-subdomains        Include subdomains
  -R, --concurrent-requests INT   Concurrent requests (default: 1)
  -D, --concurrent-downloads INT  Concurrent downloads (default: 1)
  -r, --referrer TEXT             Referrer (default: https://www.google.com)
  -U, --user-agent TEXT           User agent (default: Mozilla/5.0 (X11; Linux
                                  x86_64) AppleWebKit/537.36 (KHTML, like
                                  Gecko) Chrome/80.0.3987.132 Safari/537.36)
  -O, --overwrite                 Overwrite existing files
  -v, --version                   Show the version and exit
  -h, --help                      Show this message and exit

Examples

./sponge -u https://freemusicarchive.org/genre/Blues \
         -o output \
         -e mp3 \
         -d 2 \
         -R 5 \
         -s
...
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Checkie_Brown/hey/Checkie_Brown_-_09_-_Mary_Roose_CB_36.mp3 [8 MB] [4145.71 kB/s] [1/19]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Lobo_Loco/Not_my_Brain/Lobo_Loco_-_01_-_Brain_ID_1270.mp3 [8 MB] [4065.18 kB/s] [2/42]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Lobo_Loco/Not_my_Brain/Lobo_Loco_-_02_-_Brain_-_Instrumental_Retro_ID_1271.mp3 [8 MB] [4216.95 kB/s] [3/56]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Lobo_Loco/Salad_Mixed/Lobo_Loco_-_12_-_Madness_is_Everywhere_ID_1228.mp3 [9 MB] [4178.34 kB/s] [4/65]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Lobo_Loco/Salad_Mixed/Lobo_Loco_-_01_-_Allright_in_Lousiana_ID_1234.mp3 [8 MB] [3843.75 kB/s] [5/84]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Lobo_Loco/Salad_Mixed/Lobo_Loco_-_04_-_Peaceful_Morning_ID_1229.mp3 [9 MB] [2889.31 kB/s] [6/85]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/Lobo_Loco/Salad_Mixed/Lobo_Loco_-_03_-_Spencer_-_Bluegrass_ID_1230.mp3 [9 MB] [3951.36 kB/s] [7/94]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/My_Yearnings/You_get_the_Blues_ID_1201.mp3 [10 MB] [3990.58 kB/s] [8/101]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/My_Yearnings/Tropic_Island_-_Clearmix_ID_1172.mp3 [8 MB] [4146.20 kB/s] [9/101]
↓ /home/spypunk/output/files.freemusicarchive.org/storage-freemusicarchive-org/music/no_curator/My_Yearnings/Traveling_Horse_ID_1207.mp3 [6 MB] [3411.93 kB/s] [10/101]
...
./sponge -u https://www.gutenberg.org/ebooks/search/?sort_order=release_date \
         -o output \
         -t text/plain \
         -d 2 \
         -R 5 \
         -D 5
...
↓ /home/spypunk/output/www.gutenberg.org/files/61671/61671-0.txt [34 KB] [202.73 kB/s] [1/1]
↓ /home/spypunk/output/www.gutenberg.org/files/61673/61673-0.txt [363 KB] [778.25 kB/s] [2/3]
↓ /home/spypunk/output/www.gutenberg.org/files/61667/61667-0.txt [280 KB] [359.20 kB/s] [3/4]
↓ /home/spypunk/output/www.gutenberg.org/files/61672/61672-0.txt [953 KB] [149.74 kB/s] [4/4]
↓ /home/spypunk/output/www.gutenberg.org/files/61666/61666-0.txt [866 KB] [438.76 kB/s] [5/6]
↓ /home/spypunk/output/www.gutenberg.org/files/61662/61662-0.txt [556 KB] [625.44 kB/s] [6/6]
↓ /home/spypunk/output/www.gutenberg.org/files/61670/61670-0.txt [140 KB] [397.26 kB/s] [7/7]
↓ /home/spypunk/output/www.gutenberg.org/files/61665/61665-0.txt [74 KB] [277.85 kB/s] [8/8]
↓ /home/spypunk/output/www.gutenberg.org/ebooks/61664.txt.utf-8 [388 KB] [801.17 kB/s] [9/9]
↓ /home/spypunk/output/www.gutenberg.org/files/61661/61661-0.txt [142 KB] [397.52 kB/s] [10/10]
...
./sponge -u https://free-images.com/  \
         -o output \
         -e jpeg \
         -e jpg \
         -e png \
         -d 2 \
         -R 5 \
         -D 5 \
         -s
...
↓ /home/spypunk/output/free-images.com/sm/de66/bees_in_hive.jpg [24 KB] [11229.43 kB/s] [283/300]
↓ /home/spypunk/output/free-images.com/sm/602e/bees_on_flowers_collecting.jpg [15 KB] [70732.48 kB/s] [284/300]
↓ /home/spypunk/output/free-images.com/sm/7491/bees_on_yellow_flowers.jpg [14 KB] [54516.06 kB/s] [285/300]
↓ /home/spypunk/output/free-images.com/sm/e081/bees_pollenating_basil.jpg [15 KB] [57209.33 kB/s] [286/300]
↓ /home/spypunk/output/free-images.com/sm/7e8b/bees_pollenating_insects_bugs.jpg [13 KB] [52990.93 kB/s] [287/300]
↓ /home/spypunk/output/free-images.com/sm/a8ad/bees_pollen_insects_wings.jpg [7 KB] [17312.71 kB/s] [288/300]
↓ /home/spypunk/output/free-images.com/sm/9b0c/bees_really_like_pollinating_0.jpg [11 KB] [66770.38 kB/s] [289/300]
↓ /home/spypunk/output/free-images.com/sm/2d32/delicate_arch_utah_arches.jpg [8 KB] [44408.98 kB/s] [290/300]
↓ /home/spypunk/output/free-images.com/sm/bf50/landscape_arch.jpg [16 KB] [73487.99 kB/s] [291/300]
↓ /home/spypunk/output/free-images.com/sm/0f6a/golden_arches_omaha.jpg [10 KB] [65179.10 kB/s] [292/300]
...

What about license?

This project is licensed under the WTFPL (Do What The Fuck You Want To Public License, Version 2)

WTFPL

Copyright © 2019-2020 spypunk [email protected]

This work is free. You can redistribute it and/or modify it under the terms of the Do What The Fuck You Want To Public License, Version 2, as published by Sam Hocevar. See the COPYING file for more details.

You might also like...
Utility tool to make you a computer ninja.

Cmd Window Never spend 6 minutes doing something by hand when you can spend 6 hours failing to automate it - Zhuowej Zhang What is this about? This to

Tool to look for several security related Android application vulnerabilities

Quick Android Review Kit This tool is designed to look for several security related Android application vulnerabilities, either in source code or pack

gRPC and protocol buffers for Android, Kotlin, and Java.

Wire “A man got to have a code!” - Omar Little See the project website for documentation and APIs. As our teams and programs grow, the variety and vol

General purpose utilities and hash functions for Android and Java (aka java-common)

Essentials Essentials are a collection of general-purpose classes we found useful in many occasions. Beats standard Java API performance, e.g. LongHas

Access and process various types of personal data in Android with a set of easy, uniform, and privacy-friendly APIs.
Access and process various types of personal data in Android with a set of easy, uniform, and privacy-friendly APIs.

PrivacyStreams PrivacyStreams is an Android library for easy and privacy-friendly personal data access and processing. It offers a functional programm

A library for fast and safe delivery of parameters for Activities and Fragments.

MorbidMask - 吸血面具 Read this in other languages: 中文, English, Change Log A library for fast and safe delivery of parameters for Activities and Fragment

A simple and easy to use stopwatch and timer library for android

TimeIt Now with Timer support! A simple and easy to use stopwatch and timer library for android Introduction A stopwatch can be a very important widge

Trail is a simple logging system for Java and Android. Create logs using the same API and the library will detect automatically in which platform the code is running.

Trail Trail is a simple logging system for Java and Android. Create logs using the same API and the library will detect automatically in which platfor

General purpose utilities and hash functions for Android and Java (aka java-common)

Essentials Essentials are a collection of general-purpose classes we found useful in many occasions. Beats standard Java API performance, e.g. LongHas

Comments
  • Issue Building

    Issue Building

    OS: Ubuntu 18.04.4 LTS x86_64 OpenJDK Version: java-8-openjdk-amd64 Maven Version 3.6.0

    All I did was git clone; cd into directory and issue mvn -e clean package assembly:single

    Thank you for your time.

    [INFO] --- kotlin-maven-plugin:1.3.70:test-compile (test-compile) @ sponge ---
    [ERROR] /home/username/Documents/repos/sponge/src/test/kotlin/spypunk/sponge/SpongeServiceTest.kt: (27, 40) Unresolved reference: of
    [ERROR] /home/username/Documents/repos/sponge/src/test/kotlin/spypunk/sponge/SpongeTest.kt: (27, 40) Unresolved reference: of
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  10.541 s
    [INFO] Finished at: 2020-03-19T15:13:59-04:00
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal org.jetbrains.kotlin:kotlin-maven-plugin:1.3.70:test-compile (test-compile) on project sponge: Compilation failure: Compilation failure: 
    [ERROR] /home/username/Documents/repos/sponge/src/test/kotlin/spypunk/sponge/SpongeServiceTest.kt:[27,40] Unresolved reference: of
    [ERROR] /home/username/Documents/repos/sponge/src/test/kotlin/spypunk/sponge/SpongeTest.kt:[27,40] Unresolved reference: of
    [ERROR] -> [Help 1]
    org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.jetbrains.kotlin:kotlin-maven-plugin:1.3.70:test-compile (test-compile) on project sponge: Compilation failure
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
        at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
        at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
        at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
        at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
        at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
        at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke (Method.java:498)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
    Caused by: org.jetbrains.kotlin.maven.KotlinCompilationFailureException: Compilation failure
        at org.jetbrains.kotlin.maven.MavenPluginLogMessageCollector.throwKotlinCompilerException (MavenPluginLogMessageCollector.java:111)
        at org.jetbrains.kotlin.maven.KotlinCompileMojoBase.execute (KotlinCompileMojoBase.java:212)
        at org.jetbrains.kotlin.maven.K2JVMCompileMojo.execute (K2JVMCompileMojo.java:222)
        at org.jetbrains.kotlin.maven.KotlinTestCompileMojo.execute (KotlinTestCompileMojo.java:84)
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
        at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
        at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
        at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
        at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
        at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
        at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke (Method.java:498)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
    [ERROR] 
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR] 
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
    
    bug 
    opened by sosukeinu 2
  • Bump jsoup from 1.13.1 to 1.14.2

    Bump jsoup from 1.13.1 to 1.14.2

    Bumps jsoup from 1.13.1 to 1.14.2.

    Release notes

    Sourced from jsoup's releases.

    jsoup 1.14.2

    Caught by the fuzz! jsoup 1.14.2 is out now, and includes a set of parser bug fixes and improvements for handling rough HTML and XML, as identified by the Jazzer JVM fuzzer. This release also includes other fixes and improvements.

    See the release announcement for the full changelog.

    jsoup 1.14.1

    jsoup 1.14.1 is out now, with simple request session management, increased parse robustness, and a ton of other improvements, speed-ups, and bug fixes.

    See the full announcement for all the details on what's changed.

    Changelog

    Sourced from jsoup's changelog.

    jsoup changelog

    *** Release 1.14.2 [PENDING]

    • Improvement: support Pattern.quote \Q and \E escapes in the selector regex matchers. jhy/jsoup#1536

    • Improvement: Element.absUrl() now supports tel: URLs, and other URLs that are already absolute but that Java does not have input stream handlers for. jhy/jsoup#1610

    • Bugfix: when serializing output, escape characters that are in the < 0x20 range. This improves XML output compatibility, and makes HTML output with these characters easier to read (as they're otherwise invisible). jhy/jsoup#1556

    • Bugfix: the *|el wildcard namespace selector now also matches elements with no namespace. jhy/jsoup#1565

    • Bugfix: corrected a potential case of the parser input stream not being closed immediately on a read exception.

    • Bugfix: when making a HTTP POST, if the request write fails, make sure the connection is immediately cleaned up.

    • Bugfix: in the XML parser, XML processing instructions without attributes would be serialized as if they did. jhy/jsoup#770

    • Bugfix: updated the HtmlTreeParser resetInsertionMode to the current spec for supported elements. jhy/jsoup#1491

    • Bugfix: fixed an NPE when parsing fragment HTML into a standalone table element. jhy/jsoup#1603

    • Bugfix: fixed an NPE when parsing fragment heading HTML into a standalone p element. jhy/jsoup#1601

    • Bugfix: fixed an IOOB when parsing a formatting fragment into a standalone p element. jhy/jsoup#1602

    • Bugfix: tag names must start with an ascii-alpha character. jhy/jsoup#1006

    • Bugfix [Fuzz]: fixed a slow parse when a tag or an attribute name has thousands of null characters in it. jhy/jsoup#1580

    • Bugfix [Fuzz]: the adoption agency algorithm can have an incorrect bookmark position jhy/jsoup#1576

    • Bugfix [Fuzz]: malformed HTML could result in null elements on stack jhy/jsoup#1579

    • Bugfix [Fuzz]: malformed deeply nested table elements could create a stack overflow. jhy/jsoup#1577

    ... (truncated)

    Commits
    • 19c7732 [maven-release-plugin] prepare release jsoup-1.14.2
    • acde180 Compress harder
    • 530c5b0 Refactored fuzz tests to iterate all files in directory; run timeout tests
    • d2c455c Speed improvement: cap number of cloned active formatting elements
    • 0dcb53a Correctly consume to exit state
    • 42da864 In bogusComment, make sure unconsume not called after a potential buffer up
    • eba3e39 Fix an IOOB when HTML root cleared and then attributes added
    • 9d538e6 Annotate some nullables
    • a909600 Comment on URL normalization
    • 6b3ec64 Removed unused import
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump httpclient from 4.5.12 to 4.5.13

    Bump httpclient from 4.5.12 to 4.5.13

    Bumps httpclient from 4.5.12 to 4.5.13.

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump jsoup from 1.14.2 to 1.15.3

    Bump jsoup from 1.14.2 to 1.15.3

    Bumps jsoup from 1.14.2 to 1.15.3.

    Release notes

    Sourced from jsoup's releases.

    jsoup 1.15.3

    jsoup 1.15.3 is out now, and includes a security fix for potential XSS attacks, along with other bug fixes and improvements, including more descriptive validation error messages.

    Details:

    jsoup 1.15.2 is out now with a bunch of improvements and bug fixes.

    jsoup 1.15.1 is out now with a bunch of improvements and bug fixes.

    jsoup 1.14.3

    jsoup 1.14.3 is out now, adding native XPath selector support, improved \<template> support, and also includes a bunch of bug fixes, improvements, and performance enhancements.

    See the release announcement for the full changelog.

    Changelog

    Sourced from jsoup's changelog.

    jsoup changelog

    Release 1.15.3 [2022-Aug-24]

    • Security: fixed an issue where the jsoup cleaner may incorrectly sanitize crafted XSS attempts if SafeList.preserveRelativeLinks is enabled. https://github.com/jhy/jsoup/security/advisories/GHSA-gp7f-rwcx-9369

    • Improvement: the Cleaner will preserve the source position of cleaned elements, if source tracking is enabled in the original parse.

    • Improvement: the error messages output from Validate are more descriptive. Exceptions are now ValidationExceptions (extending IllegalArgumentException). Stack traces do not include the Validate class, to make it simpler to see where the exception originated. Common validation errors including malformed URLs and empty selector results have more explicit error messages.

    • Bugfix: the DataUtil would incorrectly read from InputStreams that emitted reads less than the requested size. This lead to incorrect results when parsing from chunked server responses, for e.g. jhy/jsoup#1807

    • Build Improvement: added implementation version and related fields to the jar manifest. jhy/jsoup#1809

    *** Release 1.15.2 [2022-Jul-04]

    • Improvement: added the ability to track the position (line, column, index) in the original input source from where a given node was parsed. Accessible via Node.sourceRange() and Element.endSourceRange(). jhy/jsoup#1790

    • Improvement: added Element.firstElementChild(), Element.lastElementChild(), Node.firstChild(), Node.lastChild(), as convenient accessors to those child nodes and elements.

    • Improvement: added Element.expectFirst(cssQuery), which is just like Element.selectFirst(), but instead of returning a null if there is no match, will throw an IllegalArgumentException. This is useful if you want to simply abort processing if an expected match is not found.

    • Improvement: when pretty-printing HTML, doctypes are emitted on a newline if there is a preceding comment. jhy/jsoup#1664

    • Improvement: when pretty-printing, trim the leading and trailing spaces of textnodes in block tags when possible, so that they are indented correctly. jhy/jsoup#1798

    • Improvement: in Element#selectXpath(), disable namespace awareness. This makes it possible to always select elements by their simple local name, regardless of whether an xmlns attribute was set. jhy/jsoup#1801

    • Bugfix: when using the readToByteBuffer method, such as in Connection.Response.body(), if the document has not already been parsed and must be read fully, and there is any maximum buffer size being applied, only the default internal buffer size is read. jhy/jsoup#1774

    ... (truncated)

    Commits
    • c596417 [maven-release-plugin] prepare release jsoup-1.15.3
    • d2d9ac3 Changelog for URL cleaner improvement
    • 4ea768d Strip control characters from URLs when resolving absolute URLs
    • 985f1fe Include help link for malformed URLs
    • 6b67d05 Improved Validate error messages
    • 653da57 Normalized API doc link
    • 5ed84f6 Simplified the Test Server startup
    • c58112a Set the read size correctly when capped
    • fa13c80 Added jar manifest default implementation entries.
    • 5b19390 Bump maven-resources-plugin from 3.2.0 to 3.3.0 (#1814)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(1.46)
Owner
spypunk
spypunk
Small command-line utility to safely merge multiple WhatsApp backups (msgstore.db) into one.

whatsapp-database-merger A small command-line utility that can be used to merge multiple WhatsApp databases (msgstore.db) into one. A few notes: input

Mattia Iavarone 31 Dec 27, 2022
A command line utility to help you investigate the sensitive data associated with Macie findings.

Macie Finding Data Reveal This project contains a command line utility to help you investigate the sensitive data associated with Macie findings.

AWS Samples 8 Nov 16, 2022
LearningRegex - Parse links from text via RegEx

Parse links from text via RegEx Supported types: Hashtags Urls emails Using in p

Boris 0 Feb 16, 2022
Command framework built around Kord, built to be robust and scalable, following Kord's convention and design patterns.

Command framework built around Kord, built to be robust and scalable, following Kord's convention and design patterns.

ZeroTwo Bot 4 Jun 15, 2022
Handy library to send & receive command with payload between subscribers for Android.

Commander Handy library to send & receive command with payload between subscribers for Android. Features Subscription based No dependency on Framework

Romman Sabbir 3 Oct 19, 2021
An easy-to-use, cross-platform measurement tool that pulls data out of CD pipelines and analysis the four key metrics for you.

Maintained by SEA team, ThoughtWorks Inc. Read this in other languages: English, 简体中文 Table of Contents About the Project Usage How to Compute Contrib

Thoughtworks 277 Jan 7, 2023
BinGait is a tool to disassemble and view java class files, developed by BinClub.

BinGait Tool to diassemble java class files created by x4e. Usage To run BinGait, run java -jar target/bingait-shadow.jar and BinGait will launch. If

null 18 Jul 7, 2022
A tool to validate text inside TextInputLayout

Download dependencies { implementation 'com.github.anderscheow:validator:2.2.1' } Usage Available rules LengthRule MaxRule MinRule NotEmptyRule NotN

Anders Cheow 129 Nov 25, 2022
recompose is a tool for converting Android layouts in XML to Kotlin code using Jetpack Compose.

recompose is a tool for converting Android layouts in XML to Kotlin code using Jetpack Compose.

Sebastian Kaspari 565 Jan 2, 2023
An intelligent diff tool for the output of Gradle's dependencies task

Dependency Tree Diff An intelligent diff tool for the output of Gradle's dependencies task which always shows the path to the root dependency. +--- c

Jake Wharton 693 Dec 26, 2022