A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML

Overview


skrape{it}

skrape{it} is a Kotlin-based HTML/XML testing and web-scraping library that can be used seamlessly in Spring Boot, Ktor, Android or other Kotlin-JVM projects. What makes it unique is its ability to analyze and extract HTML, including client-side rendered DOM trees, as well as any other XML-related markup such as SVG, UML or RSS. It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. First and foremost skrape{it} aims to be a testing tool (not tied to a particular test runner), but it can also be used to scrape websites in a convenient fashion.

Features

Parsing

  • Deserialization of HTML/XML from websites, local HTML files, and HTML given as a String into data classes / POJOs.
  • Designed to deserialize HTML but can handle any XML-related markup such as SVG, UML, RSS or XML itself.
  • DSL to select HTML elements, plus support for CSS query-selector syntax via string invocation (see the sketch below).
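
As a quick taste of the parsing DSL, here is a minimal sketch (assuming the skrapeit dependency from the Installation section below is on the classpath). It selects the same element once via the tag-based DSL and once via CSS query-selector syntax by string invocation; the full syntax is shown in the Documentation by Example section.

import it.skrape.core.htmlDocument
import it.skrape.selects.html5.p

fun main() {
    htmlDocument("""<div><p class="foo">hello</p></div>""") {
        // select via the tag-based DSL
        p {
            findFirst { println(text) } // prints: hello
        }
        // select the same element via CSS query-selector syntax by string invocation
        "p.foo" {
            findFirst { println(text) } // prints: hello
        }
    }
}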

Http-Client

  • HTTP client without verbosity or ceremony: make requests and set request options such as headers and cookies via a fluent-style interface.
  • Pre-configure a client regarding auth and other request settings.
  • Can handle client-side rendered web pages; JavaScript execution results can optionally be included in the response body.

Idiomatic

  • Easy-to-use, idiomatic and type-safe DSL to ensure a high level of readability.
  • Built-in matchers/assertions based on infix functions to achieve a very high level of readability.
  • The DSL behaves like a fluent API to make data extraction/scraping as comfortable as possible.

Compatibility

  • Not bound to a specific test runner or framework.
  • Open to use any assertion library of your choice.
  • Open to implementing your own fetcher (see the sketch below).
  • Supports non-blocking fetching via coroutines.
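
To give a rough idea of the custom-fetcher extension point, below is a minimal sketch of a fetcher that delegates the actual HTTP call to OkHttp, adapted from a community snippet. It assumes skrape{it} 1.x (NonBlockingFetcher, Request and Result from the it.skrape.fetcher package) plus OkHttp 4 on the classpath; the exact Result constructor may vary slightly between versions.

import it.skrape.fetcher.NonBlockingFetcher
import it.skrape.fetcher.Request
import it.skrape.fetcher.Result
import okhttp3.OkHttpClient

// Minimal non-blocking fetcher that delegates the HTTP call to OkHttp
object OkHttpFetcher : NonBlockingFetcher<Request> {
    override val requestBuilder: Request get() = Request()

    @Suppress("BlockingMethodInNonBlockingContext")
    override suspend fun fetch(request: Request): Result =
        OkHttpClient().newCall(
            okhttp3.Request.Builder()
                .url(request.url)
                .build()
        ).execute().let {
            val body = it.body!!
            // map the OkHttp response onto skrape{it}'s Result
            Result(
                responseBody = body.string(),
                responseStatus = Result.Status(it.code, it.message),
                contentType = body.contentType()?.toString()?.replace(" ", ""),
                headers = it.headers.toMap(),
                cookies = emptyList(),
                baseUri = it.request.url.toString()
            )
        }
}

Such a fetcher can then be passed to the DSL just like the built-in ones, e.g. skrape(OkHttpFetcher) { request { url = "..." } ... }.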

Extensions

In addition, extensions for well-known testing libraries are provided that enrich them with the skrape{it} functionality mentioned above. Currently available:


Quick Start

Read the Docs

You'll always find the latest documentation, release notes and examples for official releases at https://docs.skrape.it. The README you are reading right now provides examples related to the latest master. Use it if you don't want to wait for the latest changes to be released. If you don't want to read that much, or just want a rough overview of how to use skrape{it}, have a look at the Documentation by Example section, which refers to the current master.

Installation

All official/stable releases are published to Maven Central.

Add dependency

Gradle
dependencies {
    implementation("it.skrape:skrapeit:1.1.7")
}
Maven
<dependency>
    <groupId>it.skrape</groupId>
    <artifactId>skrapeit</artifactId>
    <version>1.1.7</version>
</dependency>

Using bleeding-edge features before official release

We offer snapshot releases by publishing every successful build of a commit pushed to the master branch, so you can always install the latest implementation of skrape{it}. Be careful: these are non-official releases that may be unstable, and breaking changes can occur at any time.

Add experimental stuff
Gradle
repositories {
    maven { url = uri("https://oss.sonatype.org/content/repositories/snapshots/") }
}
dependencies {
    implementation("it.skrape:skrapeit:0-SNAPSHOT") { isChanging = true } // version number will stay - implementation may change ...
}

// optional
configurations.all {
    resolutionStrategy {
        cacheChangingModulesFor(0, "seconds")
    }
}
Maven
<repositories>
    <repository>
        <id>snapshot</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
</repositories>

...

<dependency>
    <groupId>it.skrape</groupId>
    <artifactId>skrapeit</artifactId>
    <version>0-SNAPSHOT</version>
</dependency>

Documentation by Example

(referring to current master)

You can find further examples in the project's integration tests.

Android

We have a working Android sample using jetpack-compose in our example projects as living documentation.

Parse and verify HTML from String

@Test
fun `can read and return html from String`() {
    htmlDocument("""
        <html>
            <body>
                <h1>welcome</h1>
                <div>
                    <p>first p-element</p>
                    <p class="foo">some p-element</p>
                    <p class="foo">last p-element</p>
                </div>
            </body>
        </html>""") {
        h1 {
            findFirst {
                text toBe "welcome"
            }
        }
        p {
            withClass = "foo"
            findFirst {
                text toBe "some p-element"
                className toBe "foo"
            }
        }
        p {
            findAll {
                text toContain "p-element"
            }
            findLast {
                text toBe "last p-element"
            }
        }
    }
}

Parse HTML and extract

data class MySimpleDataClass(
    val httpStatusCode: Int,
    val httpStatusMessage: String,
    val paragraph: String,
    val allParagraphs: List<String>,
    val allLinks: List<String>
)

class HtmlExtractionService {

    fun extract() {
        val extracted = skrape(HttpFetcher) {
            request {
                url = "http://localhost:8080"
            }

            response {
                MySimpleDataClass(
                    httpStatusCode = status { code },
                    httpStatusMessage = status { message },
                    allParagraphs = document.p { findAll { eachText } },
                    paragraph = document.p { findFirst { text } },
                    allLinks = document.a { findAll { eachHref } }
                )
            }
        }
        print(extracted)
        // will print:
        // MySimpleDataClass(httpStatusCode=200, httpStatusMessage=OK, paragraph=i'm a paragraph, allParagraphs=[i'm a paragraph, i'm a second paragraph], allLinks=[http://some.url, http://some-other.url])
    }
}

Parse HTML and extract it

data class MyDataClass(
        var httpStatusCode: Int = 0,
        var httpStatusMessage: String = "",
        var paragraph: String = "",
        var allParagraphs: List<String> = emptyList(),
        var allLinks: List<String> = emptyList()
)

class HtmlExtractionService {

    fun extract() {
        val extracted = skrape(HttpFetcher) {
            request {
                url = "http://localhost:8080"
            }           

            extractIt<MyDataClass> {
                it.httpStatusCode = statusCode
                it.httpStatusMessage = statusMessage.toString()
                htmlDocument {
                    it.allParagraphs = p { findAll { eachText }}
                    it.paragraph = p { findFirst { text }}
                    it.allLinks = a { findAll { eachHref }}
                }
            }
        }
        print(extracted)
        // will print:
        // MyDataClass(httpStatusCode=200, httpStatusMessage=OK, paragraph=i'm a paragraph, allParagraphs=[i'm a paragraph, i'm a second paragraph], allLinks=[http://some.url, http://some-other.url])
    }
}

Testing HTML responses:

@Test
fun `dsl can skrape by url`() {
    skrape(HttpFetcher) {
        request {
            url = "http://localhost:8080/example"
        }       
        response {
            htmlDocument {
                // all official html and html5 elements are supported by the DSL
                div {
                    withClass = "foo" and "bar" and "fizz" and "buzz"

                    findFirst {
                        text toBe "div with class foo"

                        // it's possible to search for elements from former search results
                        // e.g. search all matching span elements within the above div with class foo etc...
                        span {
                            findAll {
                                // do something
                            }                       
                        }                   
                    }

                    findAll {
                        toBePresentExactlyTwice
                    }
                }
                // can handle custom tags as well
                "a-custom-tag" {
                    findFirst {
                        toBePresentExactlyOnce
                        text toBe "i'm a custom html5 tag"
                        text
                    }
                }
                // can handle css selector query syntax via string invocation
                "div.foo.bar.fizz.buzz" {
                    findFirst {
                        text toBe "div with class foo"
                    }
                }

                // can combine a string selector with additional selector specifics via the DSL
                "div.foo" {

                    withClass = "bar" and "fizz" and "buzz"

                    findFirst {
                        text toBe "div with class foo"
                    }
                }
            }
        }
    }
}

Scrape a client-side rendered page:

fun getDocumentByUrl(urlToScrape: String) = skrape(BrowserFetcher) { // <--- pass BrowserFetcher to include rendered JS
    request { url = urlToScrape }
    response { htmlDocument { this } }
}


fun main() {
    // do stuff with the document
    println(getDocumentByUrl("https://docs.skrape.it").eachLink)
}

Scrape async

skrape{it}'s AsyncFetcher provides coroutine support

suspend fun getAllLinks(): Map<String, String> = skrape(AsyncFetcher) {
    request {
        url = "https://my-fancy.website"
    }
    response {
        htmlDocument { eachLink }
    }
}
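
Because getAllLinks is a suspending function, it has to be called from a coroutine. As a minimal try-out sketch (assuming kotlinx-coroutines is on the classpath), you can bridge in via runBlocking:

import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // demo only - in real code call getAllLinks() from an existing coroutine scope
    getAllLinks().forEach { (key, value) -> println("$key -> $value") }
}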

Configure HTTP-Client:

class ExampleTest {
    val myPreConfiguredClient = skrape(HttpFetcher) {
        // url can be a plain url as string or build by #urlBuilder
        request {
            method = Method.POST // defaults to GET
            
            url = "" // you can  either pass url as String (defaults to 'http://localhost:8080')
            url { // or build url (will respect value from url as String param)
                // thereby you can pass a url and just override or add parts
                protocol = UrlBuilder.Protocol.HTTPS // defaults to given scheme from url param (HTTP if not set)
                host = "skrape.it" // defaults to given host from url param (localhost if not set)
                port = 12345  // defaults to given port from url param (8080 if not set explicitly - none port if given url param value does noit have port) - set to -1 to remove port
                path = "/foo" // defaults to given path from url param (none path if not set)
                queryParam { // can handle adding query parameters of several types (defaults to none)
                    "foo" to "bar" // add query paramter foo=bar
                    "aaa" to false // add query paramter aaa=false
                    "bbb" to .4711 // add query paramter bbb=0.4711
                    "ccc" to 42    // add query paramter ccc=42
                    "ddd" to listOf("a", 1, null) // add query paramter ddd=a,1,null
                    +"xxx"         // add query paramter xxx (just key, no value)
                }
            }
            timeout = 5000 // optional -> defaults to 5000ms
            followRedirects = true // optional -> defaults to true
            userAgent = "some custom user agent" // optional -> defaults to "Mozilla/5.0 skrape.it"
            cookies = mapOf("some-cookie-name" to "some-value") // optional
            headers = mapOf("some-custom-header" to "some-value") // optional
        }
    }
    
    @Test
    fun `can use preconfigured client`() {
    
        myPreConfiguredClient.response {
            status { code toBe 200 }
            // do more stuff
        }
    
        // slightly modify preconfigured client
        myPreConfiguredClient.apply {
            request {
                followRedirects = false
            }
        }.response {
            status { code toBe 301 }
            // do more stuff
        }
    }
}

send request body

1) plain as string

The most low-level option; the content-type header needs to be set by hand.
skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.GET
        headers = mapOf("Content-Type" to "application/json")
        body = """{"foo":"bar"}"""
    }
    response {
        htmlDocument {
            ...

2) plain text with an automatically added content-type header that can optionally be overwritten

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            data = "just a plain text" // content-type header will automatically set to "text/plain"
            contentType = "your-custom/content" // can optionally override content-type
        }
    }
    response {
        htmlDocument {
            ...

3) with helper functions for json or xml bodies

Supports JSON and XML autocompletion when using IntelliJ.
bar") // will automatically set content-type header to "text/xml" // or form("foo=bar") // will automatically set content-type header to "application/x-www-form-urlencoded" } } response { htmlDocument { ...">
skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            json("""{"foo":"bar"}""") // will automatically set content-type header to "application/json" 
            // or
            xml("
   
    bar
   ") // will automatically set content-type header to "text/xml" 
            // or
            form("foo=bar") // will automatically set content-type header to "application/x-www-form-urlencoded" 
        }
    }
    response {
        htmlDocument {
            ...

4) with on-the-fly created json via dsl

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            // will automatically set content-type header to "application/json"
            // will create {"foo":"bar","xxx":{"a":"b","c":[1,"d"]}} as request body
            json {
                "foo" to "bar"
                "xxx" to json {
                    "a" to "b"
                    "c" to listOf(1, "d")
                }
            }
        }
    }
    response {
        htmlDocument {
            ...

5) with on-the-fly created form via dsl

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            // will automatically set content-type header to "application/x-www-form-urlencoded"
            // will create foo=bar&xxx=1.5 as request body
            form {
                "foo" to "bar"
                "xxx" to 1.5
            }
        }
    }
    response {
        htmlDocument {
            ...

Get in touch

If you need help, have questions on how to use skrape{it}, or want to discuss features, please don't hesitate to use the project's discussions section on GitHub, or raise an issue if you found a bug.

💖 Support the project

skrape{it} is and always will be free and open source. I try to reply to everyone needing help using these projects. Obviously, development and maintenance take time.

However, if you are using this project and are happy with it, want to encourage me to continue creating stuff, or want to fund the caffeine and pizzas that fuel its development, there are a few ways you can do it:

  • Starring and sharing the project 🚀 to help make it more popular
  • Giving proper credit when you use skrape{it}, and telling your friends and others about it 😃
  • Sponsoring skrape{it} with a one-time donation via PayPal by clicking the Donate button, or using the GitHub Sponsors program to support it on a monthly basis 💖

Stargazers repo roster for @skrapeit/skrape.it

Comments
  • [BUG]  BrowserFetcher is still not working on Android

    [BUG] BrowserFetcher is still not working on Android

    Here is the error I get when using BrowseFetcher I think the error is beacuse of hunit-android

    2022-04-12 21:07:05.566 5395-5451/ir.kazemcodes.infinityreader E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-2
        Process: ir.kazemcodes.infinityreader, PID: 5395
        java.lang.NoClassDefFoundError: Failed resolution of: Ljava/awt/datatransfer/ClipboardOwner;
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.handleCharacters(HtmlUnitNekoDOMBuilder.java:593)
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.startElement(HtmlUnitNekoDOMBuilder.java:303)
            at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source:146)
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.startElement(HtmlUnitNekoDOMBuilder.java:289)
            at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source:0)
            at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.startElement(HTMLTagBalancer.java:812)
            at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.startElement(DefaultFilter.java:140)
            at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.startElement(NamespaceBinder.java:278)
            at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2811)
            at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2131)
            at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:937)
            at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:443)
            at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:394)
            at org.apache.xerces.parsers.XMLParser.parse(Unknown Source:5)
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:758)
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:204)
            at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:298)
            at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:218)
            at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:686)
            at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:588)
            at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:506)
            at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:413)
            at it.skrape.fetcher.BrowserFetcher.fetch(BrowserFetcher.kt:19)
            at org.ireader.presentation.feature_library.presentation.LibraryScreenKt$LibraryScreen$3$2$1$1$1$2$1.invokeSuspend(LibraryScreen.kt:157)
            at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
            at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
            at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
            at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
            at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
            at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
         Caused by: java.lang.ClassNotFoundException: Didn't find class "java.awt.datatransfer.ClipboardOwner" on path: DexPathList[[dex file "/data/data/ir.kazemcodes.infinityreader/code_cache/.overlay/base.apk/classes4.dex", dex file "/data/data/ir.kazemcodes.infinityreader/code_cache/.overlay/base.apk/classes11.dex", zip file "/data/app/~~frwX1pOecaUkVEBUDn-uGQ==/ir.kazemcodes.infinityreader-nayBeOZyEhA8jqrDIdfXeQ==/base.apk"],nativeLibraryDirectories=[/data/app/~~frwX1pOecaUkVEBUDn-uGQ==/ir.kazemcodes.infinityreader-nayBeOZyEhA8jqrDIdfXeQ==/lib/arm64, /data/app/~~frwX1pOecaUkVEBUDn-uGQ==/ir.kazemcodes.infinityreader-nayBeOZyEhA8jqrDIdfXeQ==/base.apk!/lib/arm64-v8a, /system/lib64, /system/system_ext/lib64]]
            at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:207)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.handleCharacters(HtmlUnitNekoDOMBuilder.java:593) 
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.startElement(HtmlUnitNekoDOMBuilder.java:303) 
            at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source:146) 
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.startElement(HtmlUnitNekoDOMBuilder.java:289) 
            at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source:0) 
            at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.startElement(HTMLTagBalancer.java:812) 
            at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.startElement(DefaultFilter.java:140) 
            at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.startElement(NamespaceBinder.java:278) 
            at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2811) 
            at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2131) 
            at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:937) 
            at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:443) 
            at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:394) 
            at org.apache.xerces.parsers.XMLParser.parse(Unknown Source:5) 
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:758) 
            at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:204) 
            at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:298) 
            at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:218) 
            at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:686) 
            at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:588) 
            at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:506) 
            at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:413) 
            at it.skrape.fetcher.BrowserFetcher.fetch(BrowserFetcher.kt:19) 
            at org.ireader.presentation.feature_library.presentation.LibraryScreenKt$LibraryScreen$3$2$1$1$1$2$1.invokeSuspend(LibraryScreen.kt:157) 
            at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) 
            at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106) 
            at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571) 
            at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750) 
            at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678) 
            at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665) 
    
    bug 
    opened by kazemcodes 19
  • [BUG] Android Studio Project crashes after adding library dependency

    [BUG] Android Studio Project crashes after adding library dependency

    Hi! Im trying to use the library in an Android project. As suggested here https://github.com/skrapeit/skrape.it/issues/89 I added this to my build.gradle to avoid having problems with the methods containing space in its names and so.

    repositories {
        maven { url "https://jitpack.io" }
    }
    dependencies {
          testImplementation("com.github.skrapeit:skrape.it:master-SNAPSHOT")
    }
    

    And I was able to run this Unit Test successfully

    import it.skrape.core.htmlDocument
    import it.skrape.matchers.toBe
    import it.skrape.matchers.toContain
    import it.skrape.selects.html5.h1
    import it.skrape.selects.html5.p
    import org.junit.Test
    
    import org.junit.Assert.*
    
    /**
     * Example local unit test, which will execute on the development machine (host).
     *
     * See [testing documentation](http://d.android.com/tools/testing).
     */
    class ExampleUnitTest {
    
        @Test
        internal fun `can read and return html from String`() {
            htmlDocument(
                """
            <html>
                <body>
                    <h1>welcome</h1>
                    <div>
                        <p>first p-element</p>
                        <p class="foo">some p-element</p>
                        <p class="foo">last p-element</p>
                    </div>
                </body>
            </html>"""
            ) {
    
                h1 {
                    findFirst {
                        text toBe "welcome"
                    }
                }
                p {
                    withClass = "foo"
                    findFirst {
                        text toBe "some p-element"
                        className toBe "foo"
                    }
                }
                p {
                    findAll {
                        [email protected] toContain "p-element"
                    }
                    findLast {
                        text toBe "last p-element"
                    }
                }
    
            }
        }
    }
    
    

    image

    No problem so far, I guess because the Unit Tests run inside JVM.

    After that I tried to add the dependency to use it from my Android Project adding this to my build.gradle

    implementation("com.github.skrapeit:skrape.it:master-SNAPSHOT")

    And received this error:

    More than one file was found with OS independent path 'META-INF/DEPENDENCIES'

    After a little google search I found this to be the possible solution, so I added this to my build.gradle

    android {
        packagingOptions {
            pickFirst "META-INF/DEPENDENCIES"
        }
    }
    

    After adding that Im gettin more errors, so I think this might not be the solution Here is the StackTrace when I try to run the app

    2020-05-19 01:26:30.841 14644-14644/? I/webscrappertes: Late-enabling -Xcheck:jni
    2020-05-19 01:26:30.879 14644-14644/? E/webscrappertes: Unknown bits set in runtime_flags: 0x8000
    2020-05-19 01:26:31.299 14644-14644/? W/webscrappertes: Bad encoded_array value: Failure to verify dex file '/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk': Bad encoded_value method type size 7
    2020-05-19 01:26:31.306 14644-14644/? E/LoadedApk: Unable to instantiate appComponentFactory
        java.lang.ClassNotFoundException: Didn't find class "androidx.core.app.CoreComponentFactory" on path: DexPathList[[zip file "/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk"],nativeLibraryDirectories=[/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/lib/arm64, /system/lib64, /vendor/lib64, /system/product/lib64]]
            at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:196)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
            at android.app.LoadedApk.createAppFactory(LoadedApk.java:256)
            at android.app.LoadedApk.createOrUpdateClassLoaderLocked(LoadedApk.java:855)
            at android.app.LoadedApk.getClassLoader(LoadedApk.java:950)
            at android.app.LoadedApk.getResources(LoadedApk.java:1188)
            at android.app.ContextImpl.createAppContext(ContextImpl.java:2462)
            at android.app.ContextImpl.createAppContext(ContextImpl.java:2454)
            at android.app.ActivityThread.handleBindApplication(ActivityThread.java:6353)
            at android.app.ActivityThread.access$1300(ActivityThread.java:220)
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1860)
            at android.os.Handler.dispatchMessage(Handler.java:107)
            at android.os.Looper.loop(Looper.java:214)
            at android.app.ActivityThread.main(ActivityThread.java:7397)
            at java.lang.reflect.Method.invoke(Native Method)
            at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492)
            at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:935)
        	Suppressed: java.io.IOException: Failed to open dex files from /data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk because: Bad encoded_array value: Failure to verify dex file '/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk': Bad encoded_value method type size 7
            at dalvik.system.DexFile.openDexFileNative(Native Method)
            at dalvik.system.DexFile.openDexFile(DexFile.java:365)
            at dalvik.system.DexFile.<init>(DexFile.java:107)
            at dalvik.system.DexFile.<init>(DexFile.java:80)
            at dalvik.system.DexPathList.loadDexFile(DexPathList.java:444)
            at dalvik.system.DexPathList.makeDexElements(DexPathList.java:403)
            at dalvik.system.DexPathList.<init>(DexPathList.java:164)
            at dalvik.system.BaseDexClassLoader.<init>(BaseDexClassLoader.java:126)
            at dalvik.system.BaseDexClassLoader.<init>(BaseDexClassLoader.java:101)
            at dalvik.system.PathClassLoader.<init>(PathClassLoader.java:74)
            at com.android.internal.os.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:87)
            at com.android.internal.os.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:116)
            at android.app.ApplicationLoaders.getClassLoader(ApplicationLoaders.java:114)
            at android.app.ApplicationLoaders.getClassLoaderWithSharedLibraries(ApplicationLoaders.java:60)
            at android.app.LoadedApk.createOrUpdateClassLoaderLocked(LoadedApk.java:851)
            		... 13 more
    2020-05-19 01:26:31.330 14644-14644/? I/Perf: Connecting to perf service.
    2020-05-19 01:26:31.342 14644-14679/? E/Perf: Fail to get file list cu.neosoft.webscrappertest
    2020-05-19 01:26:31.342 14644-14679/? E/Perf: getFolderSize() : Exception_1 = java.lang.NullPointerException: Attempt to get length of null array
    2020-05-19 01:26:31.342 14644-14679/? E/Perf: Fail to get file list cu.neosoft.webscrappertest
    2020-05-19 01:26:31.343 14644-14679/? E/Perf: getFolderSize() : Exception_1 = java.lang.NullPointerException: Attempt to get length of null array
    2020-05-19 01:26:31.396 14644-14644/? D/AndroidRuntime: Shutting down VM
        
        
        --------- beginning of crash
    2020-05-19 01:26:31.400 14644-14644/? E/AndroidRuntime: FATAL EXCEPTION: main
        Process: cu.neosoft.webscrappertest, PID: 14644
        java.lang.RuntimeException: Unable to instantiate activity ComponentInfo{cu.neosoft.webscrappertest/cu.neosoft.webscrappertest.MainActivity}: java.lang.ClassNotFoundException: Didn't find class "cu.neosoft.webscrappertest.MainActivity" on path: DexPathList[[zip file "/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk"],nativeLibraryDirectories=[/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/lib/arm64, /system/lib64, /vendor/lib64, /system/product/lib64]]
            at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3195)
            at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3410)
            at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83)
            at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
            at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2017)
            at android.os.Handler.dispatchMessage(Handler.java:107)
            at android.os.Looper.loop(Looper.java:214)
            at android.app.ActivityThread.main(ActivityThread.java:7397)
            at java.lang.reflect.Method.invoke(Native Method)
            at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492)
            at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:935)
         Caused by: java.lang.ClassNotFoundException: Didn't find class "cu.neosoft.webscrappertest.MainActivity" on path: DexPathList[[zip file "/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk"],nativeLibraryDirectories=[/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/lib/arm64, /system/lib64, /vendor/lib64, /system/product/lib64]]
            at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:196)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
            at android.app.AppComponentFactory.instantiateActivity(AppComponentFactory.java:95)
            at android.app.Instrumentation.newActivity(Instrumentation.java:1251)
            at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3183)
            at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3410) 
            at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
            at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
            at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2017) 
            at android.os.Handler.dispatchMessage(Handler.java:107) 
            at android.os.Looper.loop(Looper.java:214) 
            at android.app.ActivityThread.main(ActivityThread.java:7397) 
            at java.lang.reflect.Method.invoke(Native Method) 
            at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492) 
            at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:935) 
        	Suppressed: java.io.IOException: Failed to open dex files from /data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk because: Bad encoded_array value: Failure to verify dex file '/data/app/cu.neosoft.webscrappertest-hy3VhMUfmFw2eV1pLwg5LQ==/base.apk': Bad encoded_value method type size 7
            at dalvik.system.DexFile.openDexFileNative(Native Method)
            at dalvik.system.DexFile.openDexFile(DexFile.java:365)
            at dalvik.system.DexFile.<init>(DexFile.java:107)
            at dalvik.system.DexFile.<init>(DexFile.java:80)
            at dalvik.system.DexPathList.loadDexFile(DexPathList.java:444)
            at dalvik.system.DexPathList.makeDexElements(DexPathList.java:403)
            at dalvik.system.DexPathList.<init>(DexPathList.java:164)
            at dalvik.system.BaseDexClassLoader.<init>(BaseDexClassLoader.java:126)
            at dalvik.system.BaseDexClassLoader.<init>(BaseDexClassLoader.java:101)
            at dalvik.system.PathClassLoader.<init>(PathClassLoader.java:74)
            at com.android.internal.os.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:87)
            at com.android.internal.os.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:116)
            at android.app.ApplicationLoaders.getClassLoader(ApplicationLoaders.java:114)
            at android.app.ApplicationLoaders.getClassLoaderWithSharedLibraries(ApplicationLoaders.java:60)
            at android.app.LoadedApk.createOrUpdateClassLoaderLocked(LoadedApk.java:851)
            at android.app.LoadedApk.getClassLoader(LoadedApk.java:950)
            at android.app.LoadedApk.getResources(LoadedApk.java:1188)
            at android.app.ContextImpl.createAppContext(ContextImpl.java:2462)
            at android.app.ContextImpl.createAppContext(ContextImpl.java:2454)
            at android.app.ActivityThread.handleBindApplication(ActivityThread.java:6353)
    2020-05-19 01:26:31.400 14644-14644/? E/AndroidRuntime:     at android.app.ActivityThread.access$1300(ActivityThread.java:220)
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1860)
            		... 6 more
    

    If there is any need for more info I'll be happy to provide it. Thanks in advance !

    bug help wanted 
    opened by javiereugenio 14
  • [BUG] BrowserFetcher not working on Android

    [BUG] BrowserFetcher not working on Android

    Hey, im new to this, can you help me get the HTML of a whole page, and if you can, also help me parse it into objects?

    I basically want to get all the nutritional information in these tables. pic

    And I also need to make sure 100g is selected pic2

    Here is my code But it's not working, I get error "No static field INSTANCE..." I'm using your code @here

    question 
    opened by p4ulor 12
  • [BUG] error while deploying on Android

    [BUG] error while deploying on Android

    Hi guys, i just trying this lib for scraping webpage, but i got an error when trying to deploy it on my device

    "Space characters in SimpleName 'to be' are not allowed prior to DEX version 040"

    bug 
    opened by glomowa 11
  • Crash on Android api level 30

    Crash on Android api level 30

    Describe the bug Getting crash on Android Level 30 due OkHttp version please update

    Code Sample

     suspend fun extract() {
            coroutineScope {
                val extracted = skrape(HttpFetcher) {
                    request {
                        url = "SUPER_FANCY_URL"
                    }
    
                    extractIt<ScrapSource> {
                        status {
                            it.httpStatusCode = code
                            it.httpStatusMessage = message
                        }
                        htmlDocument {
                            it.allParagraphs = p { findAll { eachText }}
                            it.paragraph = p { findFirst { text }}
                            it.allLinks = a { findAll { eachHref }}
                        }
                    }
                }
                _source.postValue(extracted)
            }
        }
    

    Expected behavior Should be able to work in level 30

    Additional context I can create a PR with similar change following https://stackoverflow.com/questions/63917431/expected-android-api-level-21-but-was-30

    bug 
    opened by cbedoy 10
  • [BUG] Unable to crawling the mvnrepository site

    [BUG] Unable to crawling the mvnrepository site

    Crawling this website in skrape.it will get the wrong HTML, directly in jsoup will get 403, but if via okhttp, everything is normal, can this be solved?

    My current solution is:

    skrape(OkHttpFetcher) {
      request { url = "https://mvnrepository.com/artifact/kotlin" }
      println(scrape().responseBody)
    }
    
    object OkHttpFetcher : NonBlockingFetcher<Request> {
      override val requestBuilder: Request get() = Request()
    
      @Suppress("BlockingMethodInNonBlockingContext")
      override suspend fun fetch(request: Request): Result = OkHttpClient().newCall(
        okhttp3.Request.Builder()
          .url(request.url)
          .build()
      ).execute().let {
        val body = it.body!!
        Result(
          responseBody = body.string(),
          responseStatus = Result.Status(it.code, it.message),
          contentType = body.contentType()?.toString()?.replace(" ", ""),
          headers = it.headers.toMap(),
          cookies = emptyList(),
          baseUri = it.request.url.toString()
        )
      }
    }
    
    bug 
    opened by chachako 9
  • [FEATURE] Support for native image (Spring Native/GraalVM)

    [FEATURE] Support for native image (Spring Native/GraalVM)

    Is your feature request related to a problem? Please describe. I was wondering if it's bug report or feature request but it would be nice to have support for native image building e.g. Spring Native. Currently skrape.it added as dependency instantly fails build process. It might be connected to usage of logback.xml here. I did small reproduction of this problem with logback and it turned out that it can fail build while having logback.xml in classpath

    Describe the solution you'd like Skrape.it supporting native image building.

    Additional context

      - Additional action of task ':generateAot' was implemented by the Java lambda 'org.springframework.aot.gradle.SpringAotGradlePlugin$$Lambda$916/0x00000008012f5230'. Reason: Using Java lambdas is not supported as task inputs. Please refer to https://docs.gradle.org/7.5/userguide/validation_problems.html#implementation_unknown for more details about this problem.
    I 11:19:13.722 [ld.ContextBootstrapContributor] Detected application class: pl.something.api.ApiApplication
    I 11:19:13.724 [ld.ContextBootstrapContributor] Processing application context
    
    org.springframework.boot.logging.LogbackHints$LogbackXmlException: Embedded logback.xml file is not supported yet with Spring Native, read the support section of the documentation for more details
    
    FAILURE: Build failed with an exception.
    
    * What went wrong:
    Execution failed for task ':generateAot'.
    > Process 'command '/Users/user/.sdkman/candidates/java/22.2.r17-grl/bin/java'' finished with non-zero exit value 1
    

    It might be related to https://github.com/spring-projects-experimental/spring-native/issues/625

    feature request 
    opened by marceligrabowski 8
  • [QUESTION] Socket timeout on self signed SSL certs

    [QUESTION] Socket timeout on self signed SSL certs

    Hello! I'm building a simple android app that is going to scrape data from a specific website, and I get socket timeouts on request calls for https sites with self signed certs. I tried a few different sites that have self signed ssl certs and always the same thing happens.

    I tried using the sslRelaxed option for the request function and playing around with different timeout values, but I can't make it work at all.

    Could someone point me in right direction what could be a problem, and or give me some sample code how to do it in case of self singed certs?

    I haven't put a sample code since it is super trivial and similar to samples in the doc., since I just found the skrape.it lib and trying to evaluate it for an app. Thank you!

    question 
    opened by nikoinist 8
  • [BUG] element extraction methods like `$`, el, element and elements not found

    [BUG] element extraction methods like `$`, el, element and elements not found

    Describe the bug The documentation for extracting data from a website is out of date and does not compile.

    Code Sample import it.skrape.extract import it.skrape.selects.$` <-- is not in the selects package and doesn't compile import it.skrape.selects.el <-- is not in the selects package and doesn't compile import it.skrape.skrape

    data class MyScrapedData( val userName: String, val repositoryNames: List )

    fun main() { val githubUserData = skrape { url = "https://github.com/skrapeit"

        extract {
            MyScrapedData(
                    userName = el(".h-card .p-nickname").text(),
                    repositoryNames = `$`("span.repo").map { it.text() }
            )
        }
    }
    println("${githubUserData.userName}'s repos are ${githubUserData.repositoryNames}")
    

    }`

    Expected behavior I've tried all but the most basic examples to learn the different components of scraping. selects.element and selects.elements are also used in the examples but they don't appear to be in the code. This very well could be a problem with how I have or haven't configured intellij.

    bug 
    opened by pedramkeyani 7
  • [IMPROVEMENT] automate release process

    [IMPROVEMENT] automate release process

    Releasing a new version should happen completely automated.

    It should happen on pushing a particular tags to the master (since GitHub Actions doesn't support a parametrized manual build trigger).

    following tags are allowed values and will trigger a corresponding release (bump and commit project version afterwards publish to maven central):

    • major (will bump the major version - e.g. 2.11.1 --> 3.0.0 || 2.11.1-alpha1 --> 3.0.0)
    • feature (will bump the minor version - e.g. 2.11.1 --> 2.12.0 || 2.11.1-alpha1 --> 2.12.0)
    • bug (will bump the patch version - e.g. 2.11.1 --> 2.11.2 || 2.11.1-alpha1 --> 2.11.2)
    • alpha (will bump the alpha version - e.g. 2.11.1 --> 2.11.1-alpha1 || 2.11.1-alpha1 --> 2.11.1-alpha2)
    • beta (will bump the beta version - e.g. 2.11.1 --> 2.11.1-beta1 || 2.11.1-beta1 --> 2.11.1-beta2)
    • rc (will bump the rc version - e.g. 2.11.1 --> 2.11.1-rc1 || 2.11.1-rc1 --> 2.11.1-rc2)
    technical-improvement 
    opened by skrapeit 7
  • [BUG] Skrape.It causes a stack dump when trying to run it from an Android application

    [BUG] Skrape.It causes a stack dump when trying to run it from an Android application

    Describe the bug I get a stack trace when trying to use skrape.it from within an Android app.

    Minimal example, gradle.build and a stack trace are in this gist: https://gist.github.com/Git-Jiro/34e7f49d6abddfe825f53cc6df4d4a4d

    Expected behavior Scraper should not cause a stack trace. (My code works fine in normal java / kotlin application)

    bug help wanted 
    opened by Git-Jiro 6
  • [QUESTION] Execution error on some android devices

    [QUESTION] Execution error on some android devices

    describe what you want to archive Skrape.it works fine on almost all android devices, but there is a small percentage that generate an exception like this and I don't know how to fix it.

    Error report I attach the error that appears. E/System: Uncaught exception thrown by finalizer E/System: java.lang.NullPointerException: Attempt to invoke interface method 'void org.apache.commons.logging.Log.debug(java.lang.Object)' on a null object reference at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.shutdown(PoolingNHttpClientConnectionManager.java:232) at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.finalize(PoolingNHttpClientConnectionManager.java:213) at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:190) at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:173) at java.lang.Thread.run(Thread.java:818) E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-1 Process: dev.jmarin.bibliotecasugr, PID: 3567 java.lang.NoSuchFieldError: No static field INSTANCE of type Lorg/apache/http/message/BasicLineFormatter; in class Lorg/apache/http/message/BasicLineFormatter; or its superclasses (declaration of 'org.apache.http.message.BasicLineFormatter' appears in /system/framework/ext.jar)

    question 
    opened by jesusma3009 0
  • [BUG] No static field INSTANCE of type Lorg/apache/http/message/BasicLineFormatter

    [BUG] No static field INSTANCE of type Lorg/apache/http/message/BasicLineFormatter

    skrapeit-1.3.0-alpha.1

    java.lang.NoSuchFieldError: No static field INSTANCE of type Lorg/apache/http/message/BasicLineFormatter; in class Lorg/apache/http/message/BasicLineFormatter; or its superclasses (declaration of 'org.apache.http.message.BasicLineFormatter' appears in /system/framework/org.apache.http.legacy.jar)
    		at org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:53)
    		at org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:57)
    		at org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.<clinit>(DefaultHttpRequestWriterFactory.java:47)
    		at org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.<init>(ManagedNHttpClientConnectionFactory.java:75)
    		at org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.<init>(ManagedNHttpClientConnectionFactory.java:83)
    		at org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.<clinit>(ManagedNHttpClientConnectionFactory.java:64)
    		at org.apache.http.impl.nio.client.HttpAsyncClientBuilder.build(HttpAsyncClientBuilder.java:688)
    		at io.ktor.client.engine.apache.ApacheEngine.prepareClient(ApacheEngine.kt:78)
    		at io.ktor.client.engine.apache.ApacheEngine.<init>(ApacheEngine.kt:33)
    		at io.ktor.client.engine.apache.Apache.create(Apache.kt:19)
    		at io.ktor.client.HttpClientKt.HttpClient(HttpClient.kt:41)
    		at it.skrape.fetcher.HttpFetcher.configuredClient(HttpFetcher.kt:28)
    		at it.skrape.fetcher.HttpFetcher.fetch(HttpFetcher.kt:24)
    		at it.skrape.fetcher.HttpFetcher.fetch(HttpFetcher.kt:20)
    		at it.skrape.fetcher.FetcherConverter.fetch(Scraper.kt:30)
    		at it.skrape.fetcher.Scraper.scrape(Scraper.kt:17)
    		at it.skrape.fetcher.ScraperKt.response(Scraper.kt:87)
    		at video.downloader.saver.story.helpers.HtmlDynamicLoader$extract$extracted$1.invokeSuspend(HtmlDynamicLoader.kt:19)
    		at video.downloader.saver.story.helpers.HtmlDynamicLoader$extract$extracted$1.invoke(Unknown Source:8)
    		at video.downloader.saver.story.helpers.HtmlDynamicLoader$extract$extracted$1.invoke(Unknown Source:4)
    		at it.skrape.fetcher.ScraperKt$skrape$1.invokeSuspend(Scraper.kt:43)
    		at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    		at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    		at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:279)
    		at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
    		at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
    		at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source:1)
    		at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
    		at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source:1)
    		at it.skrape.fetcher.ScraperKt.skrape(Scraper.kt:42)
    		at video.downloader.saver.story.helpers.HtmlDynamicLoader.extract(HtmlDynamicLoader.kt:14)
    		at video.downloader.saver.story.ui.fragment.browser.BrowserTabFragment$12.doInUIThread(BrowserTabFragment.java:1030)
    		at com.arasthel.asyncjob.AsyncJob$1.run(AsyncJob.java:46)
    		at android.os.Handler.handleCallback(Handler.java:938)
    		at android.os.Handler.dispatchMessage(Handler.java:99)
    		at android.os.Looper.loopOnce(Looper.java:226)
    		at android.os.Looper.loop(Looper.java:313)
    		at android.app.ActivityThread.main(ActivityThread.java:8751)
    		at java.lang.reflect.Method.invoke(Native Method)
    		at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:571)
    		at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1135)
    
    bug 
    opened by nikitoSha 0
  • Multiplatform

    Multiplatform

    I've been working over the last few months to create a multiplatform version for skrape.it It's somewhat in line with #196 but I'm a bit further along in some areas, which is why i wanted to get this out. So far I've converted the buildscripts to multiplatform and implemented some modules in JS. This is still pretty much WIP and I intend to keep working on it. I'll update the pull request as I get further along and improve the code

    What's done so far:

    • Converted buildscripts to multiplatform
    • Added and implemented JS-Targets for the following modules:
      • :dsl
      • :fechter:base-fetcher
      • :html-parser
      • :test-utils
    • Converted the multiplatform modules to use the robstoll/atrium test framework

    What still needs to be done:

    • Convert the rest of the modules
    • Decide what to do with the different fetchers. Are they really necessary?
    • Fixup the kover reports (Should be pretty much the same as #196)
    • Cleanup the code and document it
    • Migrate the JS Target to the new IR compiler (waiting on atrium for that)

    Other notable changes:

    • Kotlin version was bumped to 1.7.10 and a few other dependecies were updated
    • Disabled build caching as well as RepositoriesMode.PREFER_SETTINGS since those can unfortunately mess up the builds in multiplatform
    opened by McDjuady 6
  • [BUG] Crash on Android when using R8

    [BUG] Crash on Android when using R8

    Describe the bug When R8 is enabled, I get the exception ExceptionInInitializerError. Here is the stack trace:

    java.lang.ExceptionInInitializerError
        at v7.u.b(SourceFile:3)
        at x4.f.b(Unknown Source:2)
        at it.skrape.fetcher.ScraperKt.a(SourceFile:5)
        at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiaoActual$2.t(SourceFile:6)
        at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiaoActual$2.m(SourceFile:2)
        at it.skrape.fetcher.ScraperKt$skrape$1.t(SourceFile:4)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.k(SourceFile:3)
        at v7.y.run(SourceFile:18)
        at kotlinx.coroutines.c.A(SourceFile:21)
        at v7.u.Y(SourceFile:14)
        at it.skrape.fetcher.ScraperKt.b(Unknown Source:8)
        at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiao$1.t(SourceFile:5)
        at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiao$1.m(SourceFile:2)
        at androidx.lifecycle.BlockRunner$maybeRun$1.t(SourceFile:9)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.k(SourceFile:3)
        at v7.y.run(SourceFile:18)
        at y7.e.run(SourceFile:2)
        at z7.h.run(SourceFile:1)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$a.run(SourceFile:15)
        Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [w0{Cancelling}@6e8eabc, Dispatchers.IO]
    Caused by: org.apache.commons.logging.LogConfigurationException: java.lang.ClassNotFoundException: Didn't find class "org.apache.commons.logging.impl.LogFactoryImpl" on path: DexPathList[[zip file "/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/base.apk"],nativeLibraryDirectories=[/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/lib/arm64, /system/lib64, /system_ext/lib64]] (Caused by java.lang.ClassNotFoundException: Didn't find class "org.apache.commons.logging.impl.LogFactoryImpl" on path: DexPathList[[zip file "/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/base.apk"],nativeLibraryDirectories=[/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/lib/arm64, /system/lib64, /system_ext/lib64]])
        at n9.b.run(SourceFile:48)
        at java.security.AccessController.doPrivileged(AccessController.java:43)
        at n9.d.l(SourceFile:1)
        at n9.d.c(SourceFile:74)
        at n9.d.f(Unknown Source:0)
        at com.gargoylesoftware.htmlunit.WebClient.<clinit>(SourceFile:1)
        ... 19 more
    Caused by: java.lang.ClassNotFoundException: Didn't find class "org.apache.commons.logging.impl.LogFactoryImpl" on path: DexPathList[[zip file "/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/base.apk"],nativeLibraryDirectories=[/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/lib/arm64, /system/lib64, /system_ext/lib64]]
        at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:218)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
        at n9.b.run(SourceFile:2)
        ... 24 more
    

    It seems that some classes are renamed or stripped by R8, causing the initialization failure.

    Code Sample

    skrape(BrowserFetcher) {
        request {
            url {
                protocol = UrlBuilder.Protocol.HTTPS
                host = "a.example.com"
                port = -1
                path = "/path/to/query"
            }
            userAgent = "Mozilla/5.0 (Linux; Android 12; M2102K1C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Mobile Safari/537.36 EdgA/105.0.1343.48"
            sslRelaxed = true
        }
    
        response {
            val jDoc = Jsoup.parse(responseBody)
    
            // Parse using Jsoup
        }
    }
    

    Expected behavior: skrape{it} should run normally when R8 is enabled.

    Additional context: Maybe adding some ProGuard rules would help?
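
    As one possible (untested) mitigation along those lines, keep rules for the classes that the stack trace shows being loaded reflectively might look like the proguard-rules.pro sketch below; treat the exact rules as an assumption rather than an official recommendation:

    # proguard-rules.pro (sketch): keep the reflectively loaded commons-logging
    # implementation and HtmlUnit's classes so that R8 neither renames nor strips them
    -keep class org.apache.commons.logging.** { *; }
    -keep class com.gargoylesoftware.htmlunit.** { *; }
    -dontwarn org.apache.commons.logging.**
    -dontwarn com.gargoylesoftware.htmlunit.**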

    bug 
    opened by Robotxm 2
  • Three vulnerabilities detected

    Three vulnerabilities detected

    Hello, Gradle informs me of three vulnerabilities coming from jsoup and xalan:

    https://devhub.checkmarx.com/cve-details/CVE-2021-37714/ https://devhub.checkmarx.com/cve-details/CVE-2022-36033/ https://devhub.checkmarx.com/cve-details/CVE-2022-34169/

    Have these libraries been updated, or will they be? (A possible interim workaround is sketched below.)

    Thanks
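
    Until patched versions ship with a skrape{it} release, one possible interim workaround (an assumption, not an official fix) is to pin the vulnerable transitive dependencies via Gradle dependency constraints, for example in build.gradle.kts:

    // build.gradle.kts (sketch): force a patched jsoup version onto the runtime classpath.
    // The versions are assumptions taken from the linked advisories, not official guidance.
    dependencies {
        implementation("it.skrape:skrapeit:1.1.7")

        constraints {
            implementation("org.jsoup:jsoup:1.15.3") {
                because("addresses CVE-2021-37714 and CVE-2022-36033")
            }
            // for xalan, check the CVE-2022-34169 advisory for a fixed release before pinning a version here
        }
    }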

    technical-improvement 
    opened by Nico-GS 0
  • Initial Kotlin Multiplatform setup

    Initial Kotlin Multiplatform setup

    Initial groundwork for Kotlin Multiplatform #192

    Depends on #194

    I was expecting this to be a lot more difficult! I intended to do just one module, but I found that they were all very easy to migrate; html-parser was the most involved.

    That said, I can't run most of the tests (I'm on Windows), so I may have broken something. The really hard work of actually implementing the JS and/or Native code can be done later.

    WIP

    • [x] Migrate test-utils
    • [x] Update Kover config (or disable Kover if this is too difficult)
    • [x] ~Configure Maven publishing buildSrc plugin (shouldn't be too much work to do, I can copy & paste some existing config that works)~ I've briefly tested this locally and it seems to work as expected.
    • [ ] Verify that the new publications are correct and work. This means checking the POMs are correct and expose the right API dependencies.
    • [x] ~jsExecution feature variant - I can't find an alternative for this with Kotlin Multiplatform. https://youtrack.jetbrains.com/issue/KT-33432. ~ I've simply added the 'maven publishing' config to the browser-fetcher project. I think that will achieve the same result.

    Notes

    • Bump Kotlin to 1.7.10, the language level to 1.7, and - ⚠ breaking change - the API level to 1.5 (from 1.4). Kotlin 1.7 has some nice improvements for Kotlin Multiplatform, and API level 1.4 is deprecated, so this seemed like a good time to bump it.
    • JVM only
    • All tests are still JUnit
    • I didn't try migrating any code, just moved things into the correct source sets
    • The real work was creating the expect/actual definitions - so check them out and see if they make sense. The expect definitions are essentially like interfaces that the platform code will implement (see the sketch after this list).
    • The 'JS browser execution' feature probably won't work - I disabled the Gradle option for it
    • The HttpFetcher and BrowserFetcher objects are pretty redundant, as they don't significantly extend the BlockingFetcher interface. I think you can refactor the common code to rely only on the interface.
    • Publishing and releasing are still TODO
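
    To illustrate the expect/actual point above (hypothetical names, not the declarations from this PR), a commonMain contract and its JVM counterpart could look like this:

    // commonMain: the expect declaration behaves like an interface/contract
    expect class TitleParser() {
        fun titleOf(html: String): String
    }

    // jvmMain: the platform-specific actual fulfils the contract, here via Jsoup
    actual class TitleParser actual constructor() {
        actual fun titleOf(html: String): String =
            org.jsoup.Jsoup.parse(html).title()
    }
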
    opened by aSemy 3
High level parsing to ensure your input is in the right shape and satisfies all constraints that business logic requires.

Parsix High level parsing to ensure your input is in the right shape and satisfies all constraints that business logic requires. It is highly inspired

null 190 Oct 16, 2022
CreditCardHelper 🖊️ A Jetpack-Compose library providing useful credit card utilities such as card type recognition and TextField ViewTransformations

CreditCardHelper 🖊️ A Jetpack-Compose library providing useful credit card utilities such as card type recognition and TextField ViewTransformations

Stelios Papamichail 18 Dec 19, 2022
HTML to PDF convertor for Android

HTML to PDF Convertor A simple HTML to PDF convertor for Android Download Add mavenCentral() repository in project's build.gradle allprojects { re

Nvest Solutions 49 Dec 19, 2022
KmmCaching - An application that illustrates fetching data from a remote data source and caching it in local storage

An application that illustrates fetching data from a remote data source and caching it in local storage for both iOS and Android platforms using Kotlin Multiplatform Mobile and SqlDelight.

Felix Kariuki 5 Oct 6, 2022
Complete packet manipulator and sender for testing

Packeteer Packet listening and sending utilities for debugging Installation Download the jar from Releases OR compile it yourself. Instructions to do

Cepi 1 Jan 8, 2022
Native Kotlin library for time-based TOTP and HMAC-based HOTP one-time passwords

A kotlin implementation of HOTP (RFC-4226) and TOTP (RFC-6238). Supports validation and generation of 2-factor authentication codes, recovery codes and randomly secure secrets.

Robin Ohs 6 Dec 19, 2022
A simple Android utils library to write any type of data into cache files and read them later.

CacheUtilsLibrary This is a simple Android utils library to write any type of data into cache files and then read them later, using Gson to serialize

Wesley Lin 134 Nov 25, 2022
Multiplaform kotlin library for calculating text differences. Based on java-diff-utils, supports JVM, JS and native targets.

kotlin-multiplatform-diff This is a port of java-diff-utils to kotlin with multiplatform support. All credit for the implementation goes to original a

Peter Trifanov 51 Jan 3, 2023
A library for calculating on string data.

FookCalc What is FookCalc? A library for calculating on string data. Gradle Add the following to your project's root build.gradle file repositories {

Halil ibrahim EKİNCİ 10 Aug 15, 2022
A convenient library to show a shimmer effect while loading data

A convenient library to show a shimmer effect while loading data. Easily convert your current view with a slick skeleton loading animation just by wrapping your view.

Justin Guedes 11 Apr 28, 2022
Access and process various types of personal data in Android with a set of easy, uniform, and privacy-friendly APIs.

PrivacyStreams PrivacyStreams is an Android library for easy and privacy-friendly personal data access and processing. It offers a functional programm

null 269 Dec 1, 2022
Little utilities for more pleasant immutable data in Kotlin

What can KopyKat do? Mutable copy Nested mutation Nested collections Mapping copyMap copy for sealed hierarchies copy from supertypes copy for type al

KopyKat 193 Dec 19, 2022
An easy-to-use, cross-platform measurement tool that pulls data out of CD pipelines and analyses the four key metrics for you.

Maintained by SEA team, ThoughtWorks Inc. Read this in other languages: English, 简体中文 Table of Contents About the Project Usage How to Compute Contrib

Thoughtworks 277 Jan 7, 2023
Preferences data store example

DataStore Example this example shows how you can use data store to store data in key value pairs and get rid of shared preferences Medium Article: htt

Kashif Mehmood 24 Dec 15, 2022
Keep data as a linked list on disk. An alternative way to reduce redundant operations for DiskLruCache

DiskLinkedList Keep data as a linked list on disk. An alternative way to reduce redundant operation for DiskLruCache Use-case Android has built-in Di

Cuong V. Nguyen 6 Oct 29, 2021
A command line utility to help you investigate the sensitive data associated with Macie findings.

Macie Finding Data Reveal This project contains a command line utility to help you investigate the sensitive data associated with Macie findings.

AWS Samples 8 Nov 16, 2022
A lightning fast, transactional, file-based FIFO for Android and Java.

Tape by Square, Inc. Tape is a collection of queue-related classes for Android and Java. QueueFile is a lightning-fast, transactional, file-based FIFO

Square 2.4k Dec 30, 2022
Gitversion - A native console application to calculate a version based on git commits and tags

GitCommit A native console application to calculate a version based on git commi

Solugo 5 Sep 13, 2022