Live Scroll Transcript with Kotlin

Overview

Live Scroll Transcript

This project enables Android users to automatically scroll text content in sync with currently playing audio (for example, listening to an audiobook while the ebook automatically follows along).

The Live Caption feature provides captions for nearly any audio source. This project provides Live Scroll Transcript, an accessibility service that uses these captions to scroll along in a text source that is currently on-screen.

Contributing

See CONTRIBUTING.md for details.

License

Apache 2.0; see LICENSE for details.

Disclaimer

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

Comments
  • Use text instead of OCR to get live transcribe content

    I noticed OCR was particularly buggy when using this service while listening to podcasts. I also noticed that the accessibility service can read the text content of the Live Caption window directly, so I did that and took the last 10 or so words. In limited testing that seemed to be enough (and not too much) to keep up with scrolling, but there is probably more tuning that could be done.

    This will partially fix #8 too (at least as far as it can be fixed from this service's side; Live Caption still has to support the languages), since we no longer depend on OCR.
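
    A minimal sketch of the "last 10 or so words" step described above. The helper name lastWords and its default of 10 are assumptions for illustration, not the code from this PR:

    ```kotlin
    // Hypothetical helper: keep only the last `maxWords` words of the caption text.
    fun lastWords(captionText: String, maxWords: Int = 10): String =
        captionText
            .trim()
            .split(Regex("\\s+"))          // split on any run of whitespace, including newlines
            .filter { it.isNotEmpty() }     // an empty caption would otherwise yield [""]
            .takeLast(maxWords)
            .joinToString(" ")
    ```

    Calling lastWords(captionText) on the text read from the caption window would give the search phrase; a smaller maxWords reacts faster but is more likely to match the wrong place on the page.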

    opened by alexschneider 3
  • Remove OCR (HUGE thanks to alexschneider!)

    HUGE thanks to alexschneider's https://github.com/KyleFin/live-scroll-transcript/pull/9

    Get caption text directly from the AccessibilityEvent source. This greatly simplifies the code and should help performance (lower CPU and battery usage, and no need to download or support OCR libraries).

    This provides access to a large amount of historical caption text, but we'll only process the most recent caption text.

    Added notes about numCaptionCharsToLookAt and captionViewScrollsThreshold.
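
    One way these two parameters could interact is sketched below. The class name CaptionWindow, the default values, and the exact meaning given to each parameter are assumptions made for illustration; the real implementation may differ:

    ```kotlin
    // Hypothetical sketch: parameter names mirror the ones mentioned above, values are guesses.
    class CaptionWindow(
        private val numCaptionCharsToLookAt: Int = 80,
        private val captionViewScrollsThreshold: Int = 2,
    ) {
        init {
            require(numCaptionCharsToLookAt > 0) { "numCaptionCharsToLookAt must be positive" }
        }

        private var scrollsSinceLastSearch = 0

        /** Returns the recent caption tail to search for, or null if it is too soon to search. */
        fun onCaptionViewScrolled(fullCaptionText: String): String? {
            scrollsSinceLastSearch++
            if (scrollsSinceLastSearch < captionViewScrollsThreshold) return null
            scrollsSinceLastSearch = 0

            // Ignore the (potentially large) historical text and keep only the tail.
            val tail = fullCaptionText.takeLast(numCaptionCharsToLookAt)
            val cut = fullCaptionText.length - tail.length
            val cutMidWord = cut > 0 &&
                !fullCaptionText[cut - 1].isWhitespace() && !tail[0].isWhitespace()
            // Drop a leading partial word if the cut landed in the middle of one.
            val cleaned = if (cutMidWord) tail.dropWhile { !it.isWhitespace() } else tail
            return cleaned.trim()
        }
    }
    ```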

    opened by KyleFin 2
  • Check that rootInActiveWindow is not null before recursive search.

    This prevents an IllegalStateException that crashes the accessibility service if root is null (for example, if captions scroll and the window is requested while a user is switching between apps).
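
    The guard can be illustrated off-device with a stand-in tree. The Node class and nodesContainingWord below are hypothetical substitutes for AccessibilityNodeInfo and the service's own search function, so this is a sketch of the pattern rather than the actual fix:

    ```kotlin
    // Stand-in for AccessibilityNodeInfo so the sketch runs off-device (an assumption, not the real class).
    data class Node(val text: String?, val children: List<Node?> = emptyList())

    // Collects nodes whose text contains `word`.
    // A null root (or null child) yields no matches instead of throwing.
    fun nodesContainingWord(
        root: Node?,
        word: String,
        out: MutableList<Node> = mutableListOf(),
    ): List<Node> {
        if (root == null) return out   // the null check happens before any recursion
        if (root.text?.contains(word, ignoreCase = true) == true) out.add(root)
        for (child in root.children) nodesContainingWord(child, word, out)
        return out
    }
    ```

    In the service, the same early return would apply to rootInActiveWindow, which can be null while the user is switching between apps.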

    opened by KyleFin 0
  • Initial implementation

    This is a working version that uses OCR to read recent Live Captions and search for them in the current screen's accessibility tree.

    Demo video: https://photos.app.goo.gl/erc5cugnLUC4tfp8A

    opened by KyleFin 0
  • Make scrolling frequency and search size user customizable?

    Adjusting frequency of how often we search for text and how much text we search (from captions and in accessibility tree) can improve the user experience.

    For example:

    • More frequent/granular searching for articles/podcasts.
    • Less frequent but more accurate scrolling in a full novel (https://github.com/KyleFin/live-scroll-transcript/issues/5).

    Users may have more context to decide what is appropriate.

    My hesitation in making this configurable is that I don't want users to have to deeply understand the inner workings of this feature, and I'm not sure how best to keep the user experience simple.

    When I was using OCR to read Live Captions, I was thinking of using numCaptionLines to determine scroll frequency. My thought was that it may be intuitive for users that "we scroll every time the caption box has been entirely replaced", so if you want to scroll less frequently and consider more caption text (as in an ebook), you can double-tap the caption box to expand it.

    Another solution might be to have configurable options in Settings > Accessibility > Live Scroll Transcript.

    opened by KyleFin 0
  • Support for non-English languages

    This is dependent on Live Caption ~and ML Kit~ supporting non-English languages. Once that is available it shouldn't be hard to add.

    Consider looking into StringSearch which may be helpful for matching text in different languages.
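
    As a rough illustration of language-tolerant matching, the sketch below does not use StringSearch itself. It swaps in the JDK's java.text.Normalizer as a simpler stand-in that ignores case and accents; the function names fold and containsFolded are hypothetical:

    ```kotlin
    import java.text.Normalizer
    import java.util.Locale

    // Decomposes accented characters (NFD), strips the combining marks, and lowercases,
    // so that "niño" and "nino" fold to the same string.
    fun fold(text: String): String =
        Normalizer.normalize(text, Normalizer.Form.NFD)
            .replace(Regex("\\p{Mn}+"), "")
            .lowercase(Locale.ROOT)

    // True if `needle` occurs in `haystack` once both are folded.
    fun containsFolded(haystack: String, needle: String): Boolean =
        fold(haystack).contains(fold(needle))
    ```

    This only covers accent and case differences. Collation-aware search, which is what StringSearch is designed for, handles more language-specific rules.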

    We could also translate captions and/or transcripts into different languages. (E.g. listen in English and read along in a Spanish transcript) This could be done now for English audio ~and text in languages supported by ML Kit~.

    opened by KyleFin 0
  • Support apps without reliable a11y trees

    The initial implementation works well for text that is all in one page (Chrome or apps with webviews like Gospel Library), broken down by paragraph into small AccessibilityNodeInfos that can be accessed from rootInActiveWindow. (https://github.com/KyleFin/live-scroll-transcript/issues/6 is about communicating this to users)

    There are many use-cases I would like to support. Here are some thoughts:

    Use cases

    (See sample code for an example of how to print a11y tree)

    • Scrolling apps where text is in one giant node or not exposed in a11y tree (Google Docs, Drive, Gmail)
      • If we can't access the a11y tree, we might use OCR and swipe gestures to scroll. One difficulty is knowing whether to continue scrolling when an image fills the whole screen, or how to recover if we miss a scroll and the audio goes beyond the current screen.
      • Create a generic "ctrl + f" macro functionality to find text in the current app?
    • PDFs
      • Could be very useful. OCR and swipe gestures. In addition to images, another complication is knowing which way to swipe (how to handle multiple columns on the same page)
    • Page-turning apps (Hoopla, Libby, Google Play Books)
      • Some may provide good a11y info, but only for the current page. We can probably turn page quite reliably with just a tap or swipe gesture. Also may support NEXT_PAGE action.
      • How to know which apps support page turning (can AccessibilityService query support for NEXT_PAGE?)
      • How to decide when to turn the page? (Once we've matched text on the page, turn page immediately to stay ahead? Wait until audio goes to next page and try to keep up? What if there are images?)
    • Kindle (page-turn or continuous)
      • This may be lower priority because some Kindle books have WhisperSync with Audible to automatically sync text/audio. Live Scroll would extend support for using different versions of the media and work for virtually any title.
      • Page-turn is same as other page-turning apps and provides a11y info for current page.
      • Continuous scroll provides no text a11y info (confirm if implementing), but we could scroll with OCR and gestures.
        • We can know we're in page-turn or continuous mode from package name (Kindle) and content description in KRFView (has text for page, empty for continuous). Confirm if implementing.
      • Same concerns about images and how to recover if we fall behind.
    • Photos (panning instead of scrolling)
      • Very low priority, but it could be neat.
      • Requires OCR and gestures.

    Solutions

    • OCR

      • How to decide where to search for text.
        • Everywhere except in Live Caption box? Exclude system headers?
      • May be helpful to use window changed a11y events to know when current screen has changed (maybe user switched apps so we should start or stop trying to use OCR to scroll).
    • Gestures

      • We may prefer sending more direct commands if possible (show_on_screen, next paragraph, scroll, etc.) because they're more targeted and probably more efficient.
      • See dispatchGesture documentation and sample GestureDescription code below.
      • Pros: work with any app
      • Cons: could interfere with other apps. If we fall behind audio, it's hard to recover.
    • How to determine how to swipe/tap?

      • Curated package/view list? (Specific apps to allow or block)
        • ML model to determine given a screenshot and/or a11y tree how to interact with a given app?
      • Fallback after attempting to use a11y tree.
        • Need to notify user if this is happening so we're not needlessly trying to scroll when user doesn't want scrolling.
      • Can a11y service determine if current screen is scrollable or supports page-turn?
    • How to determine where to swipe?

      • Start from where word is matched and swipe to top of screen?
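
    The "start from where the word is matched and swipe to the top" idea could be planned as plain coordinates before building any gesture. Point, Swipe, and swipeToTop below are hypothetical names, and the coordinates are assumed to be screen pixels:

    ```kotlin
    // Hypothetical value types; on-device these would feed Path.moveTo / Path.lineTo.
    data class Point(val x: Float, val y: Float)
    data class Swipe(val start: Point, val end: Point)

    // Plans a vertical swipe that drags the matched word's top edge up to `topMargin`.
    // Returns null when the word is already at or above the target, so no gesture is needed.
    fun swipeToTop(matchCenterX: Float, matchTop: Float, topMargin: Float = 0f): Swipe? {
        require(topMargin >= 0f) { "gesture paths must not use negative coordinates" }
        if (matchTop <= topMargin) return null
        return Swipe(Point(matchCenterX, matchTop), Point(matchCenterX, topMargin))
    }
    ```

    A non-zero topMargin could keep the swipe clear of system headers or the Live Caption box.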

    Sample code

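      import android.util.Log
      import android.view.accessibility.AccessibilityNodeInfo

      // Logging current a11y tree (very similar to getNodesContainingWord).
      // `root` is nullable because rootInActiveWindow can be null (e.g. while switching apps).
      // `tag` is assumed to be a String constant defined in the service.
      private fun printAccessibilityTree(root: AccessibilityNodeInfo?, level: Int) {
          if (root == null) return
          Log.d(tag, "Node at level %d with childCount %d: %s".format(level, root.childCount, root))
          for (i in 0 until root.childCount) {
              // getChild can return null, so skip missing children.
              root.getChild(i)?.let { printAccessibilityTree(it, level + 1) }
          }
      }

      // From AccessibilityService:
      printAccessibilityTree(this.rootInActiveWindow, 0)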
    
      // Java sketches. They require:
      //   import android.accessibilityservice.GestureDescription;
      //   import android.accessibilityservice.GestureDescription.StrokeDescription;
      //   import android.graphics.Path;

      // Picks a gesture for the current app type; returns null if the region is not recognized.
      private GestureDescription advanceTextGestureDescription() {
        if (currGestureRegion.equals(paginatedAppGestureRegion)) {
          return tapRightSideOfScreen(); // swipeLeftGestureDescription();
        } else if (currGestureRegion.equals(scrollableAppGestureRegion)) {
          return swipeUpGestureDescription(currGestureRegion.bottom);
        }
        return null;
      }

      // Swipe up (e.g. to scroll down).
      private GestureDescription swipeUpGestureDescription(int initialY) {
        Path path = new Path();
        path.moveTo(currGestureRegion.left, initialY);
        path.lineTo(currGestureRegion.left, currGestureRegion.top);
        StrokeDescription strokeDescription =
            new StrokeDescription(path, /*startTime=*/ 0L, /*duration (in ms)=*/ 500L);
        return new GestureDescription.Builder().addStroke(strokeDescription).build();
      }

      // Swipe left (e.g. to turn to next page).
      private GestureDescription swipeLeftGestureDescription() {
        Path longSlowPath = new Path();
        longSlowPath.moveTo(900, 1000);
        longSlowPath.lineTo(200, 1000);

        Path flickPath = new Path();
        flickPath.moveTo(200, 1000);
        flickPath.lineTo(100, 1000);

        StrokeDescription strokeDescription =
            new StrokeDescription(
                longSlowPath, /*startTime=*/ 0L, /*duration (in ms)=*/ 400L, /* willContinue= */ true);
        // NOTE: continueStroke() returns a new StrokeDescription rather than modifying this one.
        // The result is discarded here, so only the long slow stroke is added below. To use the
        // flick, keep the returned stroke and dispatch it in a follow-up gesture.
        strokeDescription.continueStroke(
            flickPath, /*startTime=*/ 0L, /*duration (in ms)=*/ 100L, /* willContinue= */ false);

        return new GestureDescription.Builder().addStroke(strokeDescription).build();
      }

      // Tap right side of screen (e.g. to turn to next page).
      private GestureDescription tapRightSideOfScreen() {
        Path path = new Path();
        // moveTo takes (x, y): x is 5/6 of the width, y is the vertical center.
        path.moveTo(5 * (screenWidth / 6), screenHeight / 2);
        StrokeDescription strokeDescription =
            new StrokeDescription(path, /*startTime=*/ 0L, /*duration (in ms)=*/ 10L);
        return new GestureDescription.Builder().addStroke(strokeDescription).build();
      }
    
    opened by KyleFin 0
  • Communicate to users which apps are supported

    In app description:

    • Works with any audio that works with Live Caption (bluetooth, not casting)
    • Any text that TalkBack can read? (I don't think this is true and this issue is to find a better description) Explain that some apps block Live Caption or screenshots

    The initial implementation works well for text that is all in one page (Chrome or apps with webviews like Gospel Library), broken down by paragraph into small AccessibilityNodeInfos that can be accessed from rootInActiveWindow.

    I want to determine how to clearly identify and communicate to users (in documentation or pop-ups) which apps are supported and why.

    Perhaps there is a way to identify apps which follow a11y guidelines and work well with TalkBack (IIUC that set of apps should heavily overlap with apps we can scroll).

    opened by KyleFin 0
  • Improve performance with large a11y trees (entire ebook in one page)

    The initial implementation works well for articles, podcasts, etc., but not as well for longer text (e.g. the full text of Dracula on Project Gutenberg).

    A few of my thoughts about how to improve performance:

    • We may not want to search as frequently. (Scrolling would be more accurate but more delayed.)

      • Frequency time or captionViewScrollsThreshold could be user-configurable, but I'd strongly prefer having it "just work" since tuning these parameters may be finicky, especially without the logs to really understand what's going on.
      • DO THIS? Increase captionViewScrollsThreshold? If user expands Live Caption box this would also allow having more caption text to build up, possibly giving a better keyword (longest word).
        • The more I think about this, the more I love this solution. We could adjust threshold dynamically based on how many lines are in the caption box. We can document/suggest that users should expand the box for longer pages to allow better keyword selection and prevent too-frequent scrolling. I don't love that the caption box will cover up more of the screen, but it seems pretty intuitive to me that instead of reasoning about frequency you just need to understand "Live Scroll will attempt to scroll every time the Live Caption text has been completely replaced. More lines == less frequent scrolling and higher chance of matching"
      • Add a timer to enforce how frequently we scroll?
      • Reducing frequency should fix screenshot-too-soon errors.
    • We want to avoid searching the full page more than necessary

      • Once we've successfully scrolled a node, we could start searching from there (or its parent) first instead of from root every time.
        • Maybe we can use the "next" actions used by TalkBack to search for or display the next paragraph, etc.
        • This may resolve the "stack too big" error I've seen a few times.
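
    Two of the ideas above, choosing the longest word as the keyword and using a timer to limit how often we scroll, can be sketched in plain Kotlin. The names pickKeyword and ScrollThrottle and the interval values are hypothetical, not taken from the project:

    ```kotlin
    // Picks the longest word as the search keyword, on the assumption that
    // longer words are rarer on the page and therefore match more reliably.
    fun pickKeyword(captionText: String): String? =
        captionText
            .split(Regex("[^\\p{L}\\p{N}']+"))   // split on anything that is not a letter, digit, or apostrophe
            .filter { it.isNotEmpty() }
            .maxByOrNull { it.length }           // the first of the longest words wins a tie

    // Allows at most one scroll attempt per `minIntervalMs`.
    // The clock is injectable so the behavior can be checked without waiting.
    class ScrollThrottle(
        private val minIntervalMs: Long,
        private val now: () -> Long = { System.currentTimeMillis() },
    ) {
        private var lastAttemptMs: Long? = null

        fun tryAcquire(): Boolean {
            val t = now()
            val last = lastAttemptMs
            if (last != null && t - last < minIntervalMs) return false
            lastAttemptMs = t
            return true
        }
    }
    ```

    A caller would check tryAcquire() before each search of the a11y tree. A longer interval trades scrolling latency for fewer full-page searches.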
    opened by KyleFin 0
  • Improve user notifications

    Toasts suggesting page refresh and screenshot too soon in initialImpl are helpful.

    Consider adding more messaging for things like:

    • Attempted to search for text in current screen (every time so user knows we attempted and they can understand frequency of attempts? Every N times to remind user the service is running and they may want to turn it off?)
    • After failing to find a match N times? (Remind user the service is on when they aren't actively using it. I want to investigate performance of leaving the service running constantly. Maybe it's fine.)
    • Scrolled
    • Highlight text that was matched?
    • Other failures?
    opened by KyleFin 1
Owner
Kyle Finlinson