
Overview

Spokestack Android


Spokestack is an all-in-one solution for mobile voice interfaces on Android. It provides every piece of the speech processing puzzle, including voice activity detection, wakeword detection, speech recognition, natural language understanding (NLU), and speech synthesis (TTS). Under its default configuration (on newer Android devices), everything except TTS happens directly on the mobile device—no communication with the cloud means faster results and better privacy.

And Android isn't the only platform it supports!

Creating a free account at spokestack.io lets you train your own NLU models and test out TTS without adding code to your app. We can even train a custom wakeword and TTS voice for you, ensuring that your app's voice is unique and memorable.

For a brief introduction, read on, but for more detailed guides, see the following:

Installation


Note: Spokestack used to be hosted on JCenter, but since the announcement of its discontinuation, we've moved distribution to Maven Central. Please ensure that your root-level build.gradle file includes mavenCentral() in its repositories block in order to access versions >= 11.0.2.


A Note on API Level

The minimum Android SDK version listed in Spokestack's manifest is 8 because that's all you need to run wakeword detection and speech recognition. To use other features, it's best to target at least API level 21.

If you include ExoPlayer for TTS playback (see below), you might have trouble running on versions of Android older than API level 24. If you run into this problem, try adding the following line to your gradle.properties file:

android.enableDexingArtifactTransform=false

Dependencies

Add the following to your app's build.gradle:

android {

  // ...

  compileOptions {
    sourceCompatibility JavaVersion.VERSION_1_8
    targetCompatibility JavaVersion.VERSION_1_8
  }
}

// ...

dependencies {
  // ...

  // make sure to check the badge above or "releases" on the right for the
  // latest version!
  implementation 'io.spokestack:spokestack-android:11.4.2'

  // for TensorFlow Lite-powered wakeword detection and/or NLU, add this one too
  implementation 'org.tensorflow:tensorflow-lite:2.4.0'

  // for automatic playback of TTS audio
  implementation 'androidx.media:media:1.3.0'
  implementation 'com.google.android.exoplayer:exoplayer-core:2.14.0'

  // if you plan to use Google ASR, include these
  implementation 'com.google.cloud:google-cloud-speech:1.22.2'
  implementation 'io.grpc:grpc-okhttp:1.28.0'

  // if you plan to use Azure Speech Service, include these, and
  // note that you'll also need to add the following to your top-level
  // build.gradle's `repositories` block:
  // maven { url 'https://csspeechstorage.blob.core.windows.net/maven/' }
  implementation 'com.microsoft.cognitiveservices.speech:client-sdk:1.9.0'

}

Usage

See the quickstart guide for more information, but here's the 30-second version of setup:

  1. You'll need to request the RECORD_AUDIO permission at runtime. See our skeleton project for an example of this. The INTERNET permission is also required but is included by the library's manifest by default.
  2. Add the following code somewhere, probably in an Activity if you're just starting out:
private lateinit var spokestack: Spokestack

// ...
spokestack = Spokestack.Builder()
    .setProperty("wake-detect-path", "$cacheDir/detect.tflite")
    .setProperty("wake-encode-path", "$cacheDir/encode.tflite")
    .setProperty("wake-filter-path", "$cacheDir/filter.tflite")
    .setProperty("nlu-model-path", "$cacheDir/nlu.tflite")
    .setProperty("nlu-metadata-path", "$cacheDir/metadata.json")
    .setProperty("wordpiece-vocab-path", "$cacheDir/vocab.txt")
    .setProperty("spokestack-id", "your-client-id")
    .setProperty("spokestack-secret", "your-secret-key")
    // `applicationContext` is available inside all `Activity`s
    .withAndroidContext(applicationContext)
    // see below; `listener` here inherits from `SpokestackAdapter`
    .addListener(listener)
    .build()

// ...

// starting the pipeline makes Spokestack listen for the wakeword
spokestack.start()

This example assumes you're storing wakeword and NLU models in your app's cache directory; again, see the skeleton project for an example of decompressing these files from the assets bundle into this directory.
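The decompression step itself can be a plain stream copy. Here's a hedged Java sketch (copyToCache is a hypothetical helper, not part of the library; on Android, the InputStream would come from context.getAssets().open(name) and cacheDir from context.getCacheDir()):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ModelCache {
    // Hypothetical helper: copy a bundled model into the cache directory so
    // Spokestack can load it by file path. On Android, `in` would come from
    // context.getAssets().open(name) and cacheDir from context.getCacheDir().
    public static File copyToCache(InputStream in, File cacheDir, String name)
            throws IOException {
        File dest = new File(cacheDir, name);
        if (!dest.exists()) { // only copy once; models don't change at runtime
            try (OutputStream out = new FileOutputStream(dest)) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
        }
        return dest;
    }
}
```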

To use the demo "Spokestack" wakeword, download the TensorFlow Lite models: detect | encode | filter

If you don't want to bother with that yet, just disable wakeword detection and NLU, and you can leave out all the file paths above:

spokestack = Spokestack.Builder()
    .withoutWakeword()
    .withoutNlu()
    // ...
    .build()

In this case, you'll still need to start() Spokestack as above, but you'll also want to create a button somewhere that calls spokestack.activate() when pressed; this starts ASR, which transcribes user speech.

Alternatively, you can set Spokestack to start ASR any time it detects speech by using a non-default speech pipeline profile as described in the speech pipeline documentation. In this case you'd want the VADTriggerAndroidASR profile:

// replace
.withoutWakeword()
// with
.withPipelineProfile("io.spokestack.spokestack.profile.VADTriggerAndroidASR")

Note also the addListener() line during setup. Speech processing happens continuously on a background thread, so your app needs a way to find out when the user has spoken to it. Important events are delivered to a subclass of SpokestackAdapter. Your subclass can override as many of the following event methods as you like. Choosing not to implement one won't break anything; you just won't receive those events.

  • speechEvent(SpeechContext.Event, SpeechContext): This communicates events from the speech pipeline, including everything from notifications that ASR has been activated/deactivated to partial and complete transcripts of user speech.
  • nluResult(NLUResult): When the NLU is enabled, user speech is automatically sent through NLU for classification. You'll want the results of that classification to help your app decide what to do next.
  • ttsEvent(TTSEvent): If you're managing TTS playback yourself, you'll want to know when speech you've synthesized is ready to play (the AUDIO_AVAILABLE event); even if you're not, the PLAYBACK_COMPLETE event may be helpful if you want to automatically reactivate the microphone after your app reads a response.
  • trace(SpokestackModule, String): This combines log/trace messages from every Spokestack module. Some modules include trace events in their own event methods, but each of those events is also sent here.
  • error(SpokestackModule, Throwable): This combines errors from every Spokestack module. Some modules include error events in their own event methods, but each of those events is also sent here.

The quickstart guide contains sample implementations of most of these methods.
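As a minimal illustration, a listener might look like the Java sketch below (the library itself is Java). The method signatures are those listed above; the import paths are assumed from the io.spokestack.spokestack namespace, so check the Javadocs for your version:

```java
import io.spokestack.spokestack.SpeechContext;
import io.spokestack.spokestack.SpokestackAdapter;
import io.spokestack.spokestack.SpokestackModule;
import io.spokestack.spokestack.nlu.NLUResult;

public class MyListener extends SpokestackAdapter {
    @Override
    public void speechEvent(SpeechContext.Event event, SpeechContext context) {
        if (event == SpeechContext.Event.RECOGNIZE) {
            // a complete transcript of user speech is available here
            String transcript = context.getTranscript();
        }
    }

    @Override
    public void nluResult(NLUResult result) {
        // use the classified intent to decide what the app does next
        String intent = result.getIntent();
    }

    @Override
    public void error(SpokestackModule module, Throwable err) {
        // errors from every module arrive here; log and recover as appropriate
    }
}
```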

As we mentioned, classification is handled automatically if NLU is enabled, so the main methods you need to know about while Spokestack is running are:

  • start()/stop(): Starts/stops the pipeline. While running, Spokestack uses the microphone to listen for your app's wakeword unless wakeword is disabled, in which case ASR must be activated another way. The pipeline should be stopped when Spokestack is no longer needed (or when the app is suspended) to free resources.
  • activate()/deactivate(): Activates/deactivates ASR, which listens to and transcribes what the user says.
  • synthesize(SynthesisRequest): Sends text to Spokestack's cloud TTS service to be synthesized as audio. Under the default configuration, this audio will be played automatically when available.
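For example, a synthesis call might look like this (a sketch; the SynthesisRequest.Builder usage shown here is an assumption about the TTS module's API, so consult the TTS guide for the exact builder options):

```java
// sketch: synthesize a prompt; under the default configuration the resulting
// audio is played automatically when it arrives
SynthesisRequest request = new SynthesisRequest.Builder("Hello from Spokestack!")
        .build();
spokestack.synthesize(request);
```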

Development

Maven is used for building/deployment, and the package is hosted at Maven Central.

This package requires the Android NDK to be installed and the ANDROID_HOME and ANDROID_NDK_HOME variables to be set. On macOS, ANDROID_HOME is usually set to ~/Library/Android/sdk and ANDROID_NDK_HOME is usually set to ~/Library/Android/sdk/ndk/.

ANDROID_NDK_HOME can also be specified in your local Maven settings.xml file as the android.ndk.path property.
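For reference, properties in settings.xml live inside a profile that must be activated; a minimal sketch (the profile id and path are arbitrary):

```xml
<settings>
  <profiles>
    <profile>
      <id>android</id>
      <properties>
        <android.ndk.path>/path/to/android/ndk</android.ndk.path>
      </properties>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>android</activeProfile>
  </activeProfiles>
</settings>
```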

Testing/Coverage

mvn test jacoco:report

Lint

mvn checkstyle:check

Release

Ensure that your Sonatype/Maven Central credentials are in your user settings.xml (usually ~/.m2/settings.xml):

<servers>
    <server>
        <id>ossrh</id>
        <username>sonatype-username</username>
        <password>sonatype-password</password>
    </server>
</servers>

On a non-master branch, run the following command. This will prompt you to enter a version number and tag for the new version, push the tag to GitHub, and deploy the package to the Sonatype repository.

mvn release:clean release:prepare release:perform

The Maven goal may fail due to a bug where it tries to upload the files twice, but the release has still happened.

Complete the process by creating and merging a pull request for the new branch on GitHub and updating the release notes by editing the tag.

For additional information about releasing, see http://maven.apache.org/maven-release/maven-release-plugin/

License

Copyright 2021 Spokestack, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • Missing three trained TensorFlow Lite models for android

    Hi, Thank you for the voice pipelines. I couldn't find the three models that you mentioned over github.

    The wakeword trigger uses three trained TensorFlow Lite models: a filter model for spectrum preprocessing, an autoregressive encoder encode model, and a detect decoder model for keyword classification

    Can you please guide where to download?

    Thanks

    opened by mustansarsaeed 8
  • Network error while using VADTriggerAndroidASR Profile

    Hi, I am trying to implement the VADTriggerAndroidASR profile, which always seems to give NETWORK_ERROR after activation. Please find the log below.

    Can you please suggest a solution for this? Some preliminary google search gave the following result.

    This might happen due to having an overlapping MediaRecorder or AudioRecord instance active at the same time (link)

    { isActive: true,
          error: 'io.spokestack.spokestack.android.SpeechRecognizerError: SpeechRecognizer error code 2: NETWORK_ERROR\n\tat AndroidSpeechRecognizer$SpokestackListener.onError(AndroidSpeechRecognizer.java:143)\n\tat android.speech.SpeechRecognizer$InternalListener$1.handleMessage(SpeechRecognizer.java:450)\n\tat android.os.Handler.dispatchMessage(Handler.java:106)\n\tat android.os.Looper.loop(Looper.java:216)\n\tat android.app.ActivityThread.main(ActivityThread.java:7266)\n\tat java.lang.reflect.Method.invoke(Native Method)\n\tat com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:494)\n\tat com.android.internal.os.ZygoteInit.main(ZygoteInit.java:975)\n',
          message: null,
          transcript: '',
          event: 'ERROR' }
    
    bug 
    opened by karthikpala 8
  • TFWakeWordAzureASR Profile

    Hello,

    Your docs indicate TFWakewordAzureASR to be a valid pipeline profile.

    java.lang.IllegalArgumentException: TFWakewordAzureASR pipeline profile is invalid!

    What is the correct way to call upon the profile?

    opened by rayyan808 6
  • Error when building Google Cloud ASR pipeline

    Hi 👋

    I'm trying to set up the Google Cloud ASR with this configuration:

    var json: String? = null
    try {
        val inputStream: InputStream = assets.open("service_account.json")
        json = inputStream.bufferedReader().use { it.readText() }
    } catch (ex: Exception) {
        ex.printStackTrace()
    }

    val builder = Spokestack.Builder()
        .withoutWakeword()
        .withoutNlu()
        .setProperty("spokestack-id", "my id")
        .setProperty("spokestack-secret", "my secret")
        .withAndroidContext(this)
        .addListener(listener)
    builder
        .pipelineBuilder
        .setProperty("google-credentials", json)
        .setProperty("language", "en-US")
        .useProfile("io.spokestack.spokestack.profile.VADTriggerGoogleASR")
    return builder.build()
    

    Unfortunately, this configuration throws the following exception(s):

    E/AndroidRuntime: FATAL EXCEPTION: main
        Process: mypackagename, PID: 26259
        java.lang.RuntimeException: Unable to start activity ComponentInfo{mypackagename.MainActivity}: java.lang.reflect.InvocationTargetException
            at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3448)
            at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595)
            at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83)
            at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
            at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147)
            at android.os.Handler.dispatchMessage(Handler.java:107)
            at android.os.Looper.loop(Looper.java:237)
            at android.app.ActivityThread.main(ActivityThread.java:7814)
            at java.lang.reflect.Method.invoke(Native Method)
            at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
            at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075)
         Caused by: java.lang.reflect.InvocationTargetException
            at java.lang.reflect.Constructor.newInstance0(Native Method)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
            at io.spokestack.spokestack.SpeechPipeline.createComponents(SpeechPipeline.java:203)
            at io.spokestack.spokestack.SpeechPipeline.start(SpeechPipeline.java:182)
            at io.spokestack.spokestack.Spokestack.start(Spokestack.java:182)
            at mypackagename.MainActivity.onCreate(MainActivity.kt:54)
            at android.app.Activity.performCreate(Activity.java:7955)
            at android.app.Activity.performCreate(Activity.java:7944)
            at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1307)
            at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3423)
            at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595) 
            at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
            at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
            at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147) 
            at android.os.Handler.dispatchMessage(Handler.java:107) 
            at android.os.Looper.loop(Looper.java:237) 
            at android.app.ActivityThread.main(ActivityThread.java:7814) 
            at java.lang.reflect.Method.invoke(Native Method) 
            at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) 
            at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075) 
         Caused by: java.lang.NoClassDefFoundError: Failed resolution of: Lcom/google/auth/oauth2/ServiceAccountCredentials;
            at io.spokestack.spokestack.google.GoogleSpeechRecognizer.<init>(GoogleSpeechRecognizer.java:66)
            at java.lang.reflect.Constructor.newInstance0(Native Method) 
            at java.lang.reflect.Constructor.newInstance(Constructor.java:343) 
            at io.spokestack.spokestack.SpeechPipeline.createComponents(SpeechPipeline.java:203) 
            at io.spokestack.spokestack.SpeechPipeline.start(SpeechPipeline.java:182) 
            at io.spokestack.spokestack.Spokestack.start(Spokestack.java:182) 
            at mypackagename.MainActivity.onCreate(MainActivity.kt:54) 
            at android.app.Activity.performCreate(Activity.java:7955) 
            at android.app.Activity.performCreate(Activity.java:7944) 
            at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1307) 
            at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3423) 
            at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595) 
            at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
            at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
            at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
            at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147) 
            at android.os.Handler.dispatchMessage(Handler.java:107) 
            at android.os.Looper.loop(Looper.java:237) 
            at android.app.ActivityThread.main(ActivityThread.java:7814) 
            at java.lang.reflect.Method.invoke(Native Method) 
            at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) 
            at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075) 
         Caused by: java.lang.ClassNotFoundException: Didn't find class "com.google.auth.oauth2.ServiceAccountCredentials" on path: DexPathList[[zip file "/data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/base.apk"],nativeLibraryDirectories=[/data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/lib/arm64, /data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/base.apk!/lib/arm64-v8a, /system/lib64]]
            at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:196)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
            at io.spokestack.spokestack.google.GoogleSpeechRecognizer.<init>(GoogleSpeechRecognizer.java:66) 
            at java.lang.reflect.Constructor.newInstance0(Native Method)
    

    I'm using the .json file from the service account configured in GCP. What could be the issue here?

    Thank you! 🙏

    opened by DeBusscherePieter 5
  • NLU module updates, threaded through the `Spokestack` wrapper

    I'd like an API sanity check here—see if the commit message pasted below makes sense. Further explanation: I think we need to keep Spokestack.start() and stop() limited to interacting with the speech pipeline—a user might want to force the pipeline to stop listening while playing a TTS prompt, so stop() can't call close() on TTS—but it would also be good to have the ability to fully release all modules' resources if necessary.

    Would it be less surprising if start() also ran prepare() implicitly?


    This change makes the NLUService extend AutoCloseable, which forces a close() method on all implementors. The existing service uses this to release TensorFlow model and vocab resources.

    On the NLU manager, close() has been duplicated as a convenience method called release(), named for parallelism with the newly added prepare() method, which is its inverse.

    The NLU module was the last one to provide release/prepare support, so adding it suggests a change to the Spokestack wrapper API, removing the release/prepare methods used for the TTS module and repurposing them to handle resources for both NLU and TTS.

    opened by space-pope 4
  • Crash in WordpieceTextEncoder

    I tried to search for the min SDK version, and in the manifest it looks like API 8; however, the class WordpieceTextEncoder contains a call using

    return this.vocabulary.getOrDefault(token,this.vocabulary.get(UNKNOWN));
    

    In encodeSingle (line 90) of WordpieceTextEncoder, I had a crash because of getOrDefault, which is supported only from API 24+. It would be nice to use something like

    return this.vocabulary[token] ?: this.vocabulary.get(UNKNOWN)
    

    version 11.4.1

    (that's Kotlin)

    opened by zenyagami 3
  • Add proguard rules to keep spokestack even when used dynamically

    When a project is minified using proguard, Spokestack classes can get removed unless they are loaded and used up front. Some apps may not want to initialize Spokestack until later (e.g. after authentication). I think there's a way to add proguard rules to the project to keep spokestack through minification. For instance, with -keep class com.pylon.spokestack.** { *; }.

    opened by timmywil 2
  • feat: wakeword-only profile and empty ASR

    This adds a no-op ASR and new pipeline profile for a wakeword-only use case. Upon successful wakeword recognition, the pipeline remains active for a single frame and is then deactivated.

    Closes #155.

    opened by space-pope 1
  • fix: ensure pipeline is resumed when tts stops

    Dumb fix here...the Spokestack wrapper resumes the pipeline when it receives an event signaling that playback has stopped, but we only send that event when we know we were playing content, in order to avoid spurious events caused by system sounds, etc. resetPlayerState clears out that knowledge, and it's currently being run in a different thread than (and thus usually/always being executed before) the code that stops the media player.

    opened by space-pope 1
  • fix: enforce ordering of TTS responses

    I'd like a sanity check on the implementation here to make sure I'm not overcomplicating things. Requests have to be async, so I need to impose some external ordering on the responses.


    Currently, TTS requests submitted in close proximity can result in audio being delivered to the client in a different order than the requests were submitted.

    This change keeps the requests asynchronous (as they must be for Android networking) while enforcing ordering for the results by introducing a request queue in the TTS manager.

    opened by space-pope 1
  • update OKHttp dependency

    This addresses an error in the logs on startup when running on Android API 30. The error doesn't appear to affect program functionality, but it can make it look like there are SSL problems. Running with the latest OKHttp eliminates the log.

    opened by space-pope 1
  • Custom HTTP timeouts for Spokestack TTS

    The read and connect timeouts for SpokestackTTSClient should be configurable. This should be achievable by looking for new SpeechConfig properties in SpokestackTTSService and passing them directly to a new client constructor that accepts this configuration (the current constructor should set the configuration to the current values as defaults and call this new constructor).

    enhancement 
    opened by space-pope 0
Releases (latest: spokestack-android-11.5.2)
  • spokestack-android-11.5.2(Aug 20, 2021)

  • spokestack-android-11.5.1(Aug 10, 2021)

    This release includes minor fixes for the Azure speech recognizer.

    Bug Fixes

    • honor locale in azure speech recognizer
    • add partial recognition listener for Azure
    • avoid timeout for empty Azure transcripts
  • spokestack-android-11.5.0(Jul 27, 2021)

  • spokestack-android-11.4.2(Jun 3, 2021)

  • spokestack-android-11.4.1(May 14, 2021)

  • spokestack-android-11.4.0(May 12, 2021)

  • spokestack-android-11.3.0(Apr 29, 2021)

    Features

    • Rasa NLU and dialogue policy
    • Finalize prompts via Spokestack wrapper

    Bug Fixes

    • enforce ordering of TTS responses
    • ensure pipeline is resumed when TTS is stopped with stopPlayback
  • spokestack-android-11.2.0(Apr 2, 2021)

    Features

    • The speech pipeline profile can now be set directly from the Spokestack builder object via withPipelineProfile().
    • Classes for a keyword model can be loaded from the model's metadata file instead of being specified manually (see keyword-metadata-path).
  • spokestack-android-11.1.0(Mar 10, 2021)

    Features

    • added a new profile for using a keyword detector as ASR

    Fixes

    • a typo in the Spokestack ASR profile was preventing it from loading
  • spokestack-android-11.0.3(Mar 5, 2021)

  • spokestack-android-11.0.1(Jan 25, 2021)

    Fixes

    This release updates the receptive field for the keyword recognizer which, along with a new model architecture, improves both its accuracy and computational efficiency.

  • spokestack-android-11.0.0(Dec 17, 2020)

    Breaking changes

    • Spokestack.start() and stop() control all modules; pause() and resume() handle the speech pipeline

      The NLU module has been brought to parity with the other modules in that its services implement AutoCloseable and can release their internal resources (e.g., TensorFlow Lite interpreters). This, in turn, adjusted the higher-level Spokestack API: start and stop now manage resources for all registered modules (ASR, NLU, TTS). To temporarily suspend passive listening (so, for example, the app cannot receive a false positive wakeword activation during a TTS prompt), call pause; to resume, call resume. Spokestack calls pause and resume automatically in response to TTS events so you don't have to remember to do so.

    • TTS module no longer responds to lifecycle events

      Lifecycle responsiveness has been removed from the TTS module, as Spokestack is expected to be a long-lived component that survives Activity transitions. This allows TTS audio to continue playing even as the app transitions between Activitys but changes the builder API for the TTS module and Spokestack itself.

    Features

    • Allow access to the current NLU service

      To match the other modules, NLUManager now provides access to its underlying NLUService via getNlu().

    Fixes

    • Tighten task submission in TTS player

      Tasks submitted to the media player thread have been consolidated to avoid a potential race condition when attempting to play two TTS prompts in quick succession.

    • Timeout event when no keyword is recognized

      In order to match other speech recognizers, KeywordRecognizer has been adjusted to send a TIMEOUT event when no keyword is recognized after its activation limit.

  • spokestack-android-10.0.0(Nov 20, 2020)

    Breaking changes

    • A refactor of the NLU module has changed the type of Spokestack's nlu field from TensorflowNLU to NLUManager. This allows for future expansion of the NLU module to support new providers, as the ASR module currently does. No new providers are included in this release, but custom implementations can be supplied at build time.
    • A draft dialogue management API is also included, and wired into the Spokestack setup class. The API is undocumented, and its use is optional, so it should be considered experimental for now.

    Features

    • References to the Android Context and Lifecycle can now be updated by convenience methods on Spokestack. This is useful for multi-activity applications that need to adjust component lifecycles along with activity transitions.

    Fixes

    • Runtime addition/removal of a TTS listener is now propagated to the TTS output class so that the intended objects receive playback events.
    • Fixed a potential NPE in SpokestackTTSOutput that occurred when it was released before any synthesized speech had been played.
  • spokestack-android-9.1.0(Oct 12, 2020)

    Features

    • clearer SpokestackAdapter method names: we've added module-specific listener method names so it's easier to tell what you're overriding in your listeners

    Fixes

    • allow clients to remove event listeners: previously, only speech pipeline listeners could be removed, which could lead to a memory leak if a multi-activity application registered Activity classes as listeners, as they would not be garbage collected.
  • spokestack-android-9.0.0(Oct 6, 2020)

    This release introduces a turn-key setup wrapper (the Spokestack class) used to build and initialize all Spokestack modules at once. Events can be consumed via the unified SpokestackAdapter interface or old-style listeners attached to the individual modules at build time. See the documentation for more details.

    Breaking changes

    • ASR activation property names have been reverted from active-(min|max) to wake-active-(min|max) to allow the React Native library to set them properly for both platforms.

    Fixes

    • Don't send empty ASR transcripts (#107)
      • If an ASR returns the empty string as either a partial or final result, it will not be reported to the client.
    • Force pipeline stages to reset on deactivate
      • This allows AndroidSpeechRecognizer to be properly stopped when deactivate() is called on the speech pipeline.
    • Send timeout events for empty transcripts
      • If an ASR returns the empty string as a final result, it will be reported to the client as a timeout.
    • Don't send irrelevant playback events
      • PLAYBACK_COMPLETE events were being dispatched from the audio player for audio events unrelated to Spokestack TTS, such as when the Assistant activation beep was complete.
  • spokestack-android-8.1.0(Aug 20, 2020)

    Features

    This release includes the new KeywordRecognizer component, which uses TensorFlow Lite models similar to the wakeword models but designed for multiclass detection. KeywordRecognizer is capable of serving as a lightweight on-device ASR for a limited set of commands.

    Fixes

    Errors reading from the device microphone now stop the speech pipeline instead of attempting to read again on the next dispatch loop. This prevents error spam in host apps, but also means that the app will have to manually call start on a pipeline that has experienced such an error.

  • spokestack-android-8.0.2(Aug 17, 2020)

    This fixes an issue with AndroidSpeechRecognizer where it was possible to stop the speech pipeline without freeing the speech context, which made wakeword and ASR inoperable on a pipeline restart.

  • spokestack-android-8.0.1(Aug 13, 2020)

  • spokestack-android-8.0.0(Aug 11, 2020)

    This release resolves issues with stale state left over after Spokestack regains control of the microphone from a component that previously had it. It's a major release because the fix required adding a new method to the SpeechProcessor interface, so any custom implementations will also need to include this method.

  • spokestack-android-7.0.2 (Aug 10, 2020)

    This release addresses an issue with control flow when using the AndroidSpeechRecognizer. Programmatic reactivation of the pipeline was occasionally being blocked due to internal management of the microphone; this should no longer happen.

  • spokestack-android-7.0.1 (Aug 5, 2020)

  • spokestack-android-7.0.0 (Jul 29, 2020)

    Breaking changes

    Calling start() on a running SpeechPipeline no longer throws an exception. This should only be an issue for clients that were relying on the error, which is inadvisable.

    Features

    • Expose Spokestack NLU slot types
      • Not all NLUs parse slots into types, but Spokestack's NLU does, and those types need to be exposed to other libraries that wrap spokestack-android (such as react-native-spokestack).
    • Propagate partial ASR results to the client
      • ASR providers that offer partial results (all current providers do) now send those results to speech listeners via the new PARTIAL_RECOGNIZE event. The text of the result is available as the SpeechContext's transcript.

    Fixes

    • Re-authorize Spokestack ASR socket
      • For two back-to-back utterances submitted to the Spokestack ASR, the first frame of the second request was producing an error due to a missing handshake.
    • Workaround for Android ASR timeouts
      • Android's built-in SpeechRecognizer has been returning NO_MATCH errors more often than it used to, notably in cases where it previously sent SPEECH_TIMEOUT. In response, we've temporarily remapped NO_MATCH to fire a timeout event to speech listeners instead of an error.
  • spokestack-android-6.0.0 (Jul 24, 2020)

    Note: The only reason this is a major release is the new annotations (described below); no breaking changes to actual library features are expected.

    Features

    • Spokestack cloud-based ASR provider
      • Spokestack credentials can now be used to access a cloud-based ASR independent of Google and Microsoft. Pipeline profiles that use this new component are also included.
    • Annotation updates for IDE convenience
      • Parameters for speech recognition, NLU, and TTS events have been annotated with @NonNull to enable cleaner client code. The downside is that the compiler/Android Studio will now raise errors for any overriding Kotlin event listener methods that declare these parameters as nullable.

    Fixes

    • Remove preference for offline Android ASR
      • The Android ASR now throws an error if the caller indicates that the offline model should be used. This flag has been removed from Spokestack's usage of the on-device Android ASR as a temporary fix.
  • spokestack-android-5.6.0 (Jun 15, 2020)

    Features

    • Allow TTS playback to be stopped
      • The TTSManager.stopPlayback() method stops playback of any currently playing synthesis result and clears any queued results. This is useful if you want to let the user interrupt system speech without having queued speech resume playback when the ASR request ends.

    Fixes

    • Microphone sharing for Android ASR (#53)
      • This addresses an issue with Spokestack sharing the microphone with platform-supplied SpeechRecognizer instances. AndroidSpeechRecognizer and associated pipeline profiles now have much broader device compatibility.

    • Allow i_* tag series to be parsed as slots
      • NLU results were previously dependent on the model returning strictly valid IOB tag sequences. This has been relaxed to allow either a b_ or an i_ tag to mark the beginning of a slot.
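
    The relaxed IOB grouping might be sketched like this. This is an illustrative, self-contained helper, not the library's implementation; `IobSlots` is a hypothetical name:

```java
// Sketch of relaxed IOB slot grouping: a slot span may begin with either
// a "b_" or an "i_" tag, and consecutive tags for the same slot are
// grouped into one value. Illustrative only, not Spokestack's code.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IobSlots {
    static Map<String, String> extract(String[] tokens, String[] tags) {
        Map<String, String> slots = new LinkedHashMap<>();
        String current = null;            // slot name of the open span
        List<String> span = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            if (tag.startsWith("b_") || tag.startsWith("i_")) {
                String name = tag.substring(2);
                // "b_" always opens a new span; "i_" opens one only if no
                // span for the same slot is already open
                if (tag.startsWith("b_") || !name.equals(current)) {
                    flush(slots, current, span);
                    current = name;
                }
                span.add(tokens[i]);
            } else {                      // "o" tag closes any open span
                flush(slots, current, span);
                current = null;
            }
        }
        flush(slots, current, span);
        return slots;
    }

    private static void flush(Map<String, String> slots,
                              String name, List<String> span) {
        if (name != null && !span.isEmpty()) {
            slots.put(name, String.join(" ", span));
        }
        span.clear();
    }
}
```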

  • spokestack-android-5.5.0 (Jun 10, 2020)

    Features

    • New TTS event for playback completion
      • The PLAYBACK_COMPLETE event notifies listeners when the media player has finished playing TTS prompts so that ASR can be reactivated if desired.

    Fixes

    • (NLU) If a slot value recognized in an utterance is not valid according to the model metadata, a Slot with a null value will be returned in order to avoid failing the entire classification with an exception.
  • spokestack-android-5.4.0 (Jun 1, 2020)

    Features

    Slot output changes:

    • Slots declared by an intent but not tagged by the model are returned to the caller in the output with a null value
    • Slots not declared by an intent but tagged by the model do not cause an error and are not returned to the caller
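
    Assuming slot outputs are simple name/value maps, the two rules above amount to something like the following hypothetical helper (not the library's code):

```java
// Hypothetical helper illustrating the two slot-output rules above;
// not the library's actual code.
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class SlotOutput {
    // declared: slot names the matched intent declares
    // tagged:   slot values the model actually produced
    static Map<String, String> merge(Set<String> declared,
                                     Map<String, String> tagged) {
        Map<String, String> out = new HashMap<>();
        for (String name : declared) {
            // declared but untagged slots come back with a null value
            out.put(name, tagged.get(name));
        }
        // tagged but undeclared slots are silently dropped
        return out;
    }
}
```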

    Fixes

    Removed references to a deprecated configuration parameter for the NLU model.

  • spokestack-android-5.3.0 (Apr 30, 2020)

    Features

    • Faster first classification via the NLU model

    Fixes

    • TTS errors, both GraphQL and HTTP, are now surfaced to the client as ERROR events; these were previously being swallowed by the library.
  • spokestack-android-5.2.0 (Apr 22, 2020)

    Features

    • Return unknown slots as raw values
      • If the NLU model tags a slot not associated with the predicted intent, that slot's value will be returned as a String in the NLUResult's slot map instead of causing an error.
    • Support new slot features
      • Newer models include intent-level implicit slots and a capture_name field that changes the slot's return name; these are now supported by the client library.

    Fixes

    • Allow slot tags to be discontinuous
      • Separate spans of "b_slot_name" tags will now be concatenated instead of dropping all tokens but the last one.
    • Trim punctuation for slot value extraction
      • Slots recognized at the end of an input string that contains punctuation will no longer cause an error. As a consequence, both leading and trailing punctuation are removed from the slot value before parsing, so such punctuation is invalid in training data for the slot values.
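
    The trimming described above might look roughly like this. `SlotTrim` is a hypothetical name for illustration, and the real parser may treat punctuation differently:

```java
// Illustrative sketch: strip leading and trailing punctuation from a raw
// slot value before it is parsed against the model metadata. Not the
// library's actual parser.
final class SlotTrim {
    static String trim(String raw) {
        int start = 0;
        int end = raw.length();
        // advance past any leading non-alphanumeric characters
        while (start < end && !Character.isLetterOrDigit(raw.charAt(start))) {
            start++;
        }
        // back up past any trailing non-alphanumeric characters
        while (end > start && !Character.isLetterOrDigit(raw.charAt(end - 1))) {
            end--;
        }
        return raw.substring(start, end);
    }
}
```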
  • spokestack-android-5.1.0 (Mar 16, 2020)

    Features

    • On-device NLU via TensorFlow Lite and BERT-based custom models

    Fixes

    • Declare a minSdkVersion and targetSdkVersion in the manifest to avoid requesting suspicious-looking permissions by default
  • spokestack-android-5.0.0 (Jan 24, 2020)

    Breaking changes

    • Turn activation timeout into a component
      • This removes the activation timeout from wakeword activation and changes the names of the properties used to control it.

    Features

    • Azure Speech Service ASR
    • Speech Markdown support in TTS
    • SpeechPipeline.Builder profiles

    Fixes

    • Set default vad-fall-delay to 500ms
    • Add simple ProGuard rules to keep Spokestack classes from being removed by the tool.
Owner
Spokestack
Voice development platform that enables customized voice navigation for mobile and browser applications