Spokestack is an all-in-one solution for mobile voice interfaces on Android.

Spokestack

Last update: Nov 20, 2022

Related tags

Video/Audio android text-to-speech nlu voice speech tts speech-synthesis voice-recognition speech-recognition vad asr voice-assistant natural-language-understanding voice-as-an-interface speech-api voice-activity-detection voice-synthesis wakeword wakeword-activation

Overview

Spokestack is an all-in-one solution for mobile voice interfaces on Android. It provides every piece of the speech processing puzzle, including voice activity detection, wakeword detection, speech recognition, natural language understanding (NLU), and speech synthesis (TTS). Under its default configuration (on newer Android devices), everything except TTS happens directly on the mobile device—no communication with the cloud means faster results and better privacy.

And Android isn't the only platform it supports!

Creating a free account at spokestack.io lets you train your own NLU models and test out TTS without adding code to your app. We can even train a custom wakeword and TTS voice for you, ensuring that your app's voice is unique and memorable.

For a brief introduction, read on, but for more detailed guides, see the following:

Installation

Note: Spokestack used to be hosted on JCenter, but since the announcement of its discontinuation, we've moved distribution to Maven Central. Please ensure that your root-level build.gradle file includes mavenCentral() in its repositories block in order to access versions >= 11.0.2.

A Note on API Level

The minimum Android SDK version listed in Spokestack's manifest is 8 because that's all you should need to run wake word detection and speech recognition. To use other features, it's best to target at least API level 21.

If you include ExoPlayer for TTS playback (see below), you might have trouble running on versions of Android older than API level 24. If you run into this problem, try adding the following line to your gradle.properties file:

android.enableDexingArtifactTransform=false

Dependencies

Add the following to your app's build.gradle:

android {

  // ...

  compileOptions {
    sourceCompatibility JavaVersion.VERSION_1_8
    targetCompatibility JavaVersion.VERSION_1_8
  }
}

// ...

dependencies {
  // ...

  // make sure to check the badge above or "releases" on the right for the
  // latest version!
  implementation 'io.spokestack:spokestack-android:11.4.2'

  // for TensorFlow Lite-powered wakeword detection and/or NLU, add this one too
  implementation 'org.tensorflow:tensorflow-lite:2.4.0'

  // for automatic playback of TTS audio
  implementation 'androidx.media:media:1.3.0'
  implementation 'com.google.android.exoplayer:exoplayer-core:2.14.0'

  // if you plan to use Google ASR, include these
  implementation 'com.google.cloud:google-cloud-speech:1.22.2'
  implementation 'io.grpc:grpc-okhttp:1.28.0'

  // if you plan to use Azure Speech Service, include these, and
  // note that you'll also need to add the following to your top-level
  // build.gradle's `repositories` block:
  // maven { url 'https://csspeechstorage.blob.core.windows.net/maven/' }
  implementation 'com.microsoft.cognitiveservices.speech:client-sdk:1.9.0'

}

Usage

See the quickstart guide for more information, but here's the 30-second version of setup:

1. You'll need to request the RECORD_AUDIO permission at runtime. See our skeleton project for an example of this. The INTERNET permission is also required but is included by the library's manifest by default.
2. Add the following code somewhere, probably in an Activity if you're just starting out:

private lateinit var spokestack: Spokestack

// ...
spokestack = Spokestack.Builder()
    .setProperty("wake-detect-path", "$cacheDir/detect.tflite")
    .setProperty("wake-encode-path", "$cacheDir/encode.tflite")
    .setProperty("wake-filter-path", "$cacheDir/filter.tflite")
    .setProperty("nlu-model-path", "$cacheDir/nlu.tflite")
    .setProperty("nlu-metadata-path", "$cacheDir/metadata.json")
    .setProperty("wordpiece-vocab-path", "$cacheDir/vocab.txt")
    .setProperty("spokestack-id", "your-client-id")
    .setProperty("spokestack-secret", "your-secret-key")
    // `applicationContext` is available inside all `Activity`s
    .withAndroidContext(applicationContext)
    // see below; `listener` here inherits from `SpokestackAdapter`
    .addListener(listener)
    .build()

// ...

// starting the pipeline makes Spokestack listen for the wakeword
spokestack.start()

This example assumes you're storing wakeword and NLU models in your app's cache directory; again, see the skeleton project for an example of decompressing these files from the assets bundle into this directory.
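
As a rough illustration of getting a model file into the cache directory, the sketch below copies a stream into a target directory once and reuses the file on later calls. It is a minimal sketch, not the skeleton project's code: `copyModelToCache` is a hypothetical helper name, and it performs a plain copy (no decompression). In an Activity you would likely pass `assets.open("detect.tflite")` as the stream and `cacheDir` as the directory.

```kotlin
import java.io.File
import java.io.InputStream

/**
 * Hypothetical helper (not part of Spokestack): copies [source] into
 * [cacheDir] under [fileName], unless a non-empty copy already exists.
 * The stream is closed in either case.
 */
fun copyModelToCache(source: InputStream, cacheDir: File, fileName: String): File {
    val target = File(cacheDir, fileName)
    if (target.exists() && target.length() > 0) {
        // A previous launch already copied this file; reuse it.
        source.close()
        return target
    }
    source.use { input ->
        target.outputStream().use { output -> input.copyTo(output) }
    }
    return target
}
```

The returned file's path is what you would hand to properties such as "wake-detect-path" in the builder shown above.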

To use the demo "Spokestack" wakeword, download the TensorFlow Lite models: detect | encode | filter

If you don't want to bother with that yet, just disable wakeword detection and NLU, and you can leave out all the file paths above:

spokestack = Spokestack.Builder()
    .withoutWakeword()
    .withoutNlu()
    // ...
    .build()

In this case, you'll still need to start() Spokestack as above, but you'll also want to create a button somewhere that calls spokestack.activate() when pressed; this starts ASR, which transcribes user speech.

Alternatively, you can set Spokestack to start ASR any time it detects speech by using a non-default speech pipeline profile as described in the speech pipeline documentation. In this case you'd want the VADTriggerAndroidASR profile:

// replace
.withoutWakeword()
// with
.withPipelineProfile("io.spokestack.spokestack.profile.VADTriggerAndroidASR")

Note also the addListener() line during setup. Speech processing happens continuously on a background thread, so your app needs a way to find out when the user has spoken to it. Important events are delivered to a subclass of SpokestackAdapter. Your subclass can override as many of the following event methods as you like. Choosing not to implement one won't break anything; you just won't receive those events.

speechEvent(SpeechContext.Event, SpeechContext): This communicates events from the speech pipeline, including everything from notifications that ASR has been activated/deactivated to partial and complete transcripts of user speech.
nluResult(NLUResult): When the NLU is enabled, user speech is automatically sent through NLU for classification. You'll want the results of that classification to help your app decide what to do next.
ttsEvent(TTSEvent): If you're managing TTS playback yourself, you'll want to know when speech you've synthesized is ready to play (the AUDIO_AVAILABLE event); even if you're not, the PLAYBACK_COMPLETE event may be helpful if you want to automatically reactivate the microphone after your app reads a response.
trace(SpokestackModule, String): This combines log/trace messages from every Spokestack module. Some modules include trace events in their own event methods, but each of those events is also sent here.
error(SpokestackModule, Throwable): This combines errors from every Spokestack module. Some modules include error events in their own event methods, but each of those events is also sent here.

The quickstart guide contains sample implementations of most of these methods.

As we mentioned, classification is handled automatically if NLU is enabled, so the main methods you need to know about while Spokestack is running are:

start()/stop(): Starts/stops the pipeline. While running, Spokestack uses the microphone to listen for your app's wakeword unless wakeword is disabled, in which case ASR must be activated another way. The pipeline should be stopped when Spokestack is no longer needed (or when the app is suspended) to free resources.
activate()/deactivate(): Activates/deactivates ASR, which listens to and transcribes what the user says.
synthesize(SynthesisRequest): Sends text to Spokestack's cloud TTS service to be synthesized as audio. Under the default configuration, this audio will be played automatically when available.

Development

Maven is used for building/deployment, and the package is hosted at Maven Central.

This package requires the Android NDK to be installed and the ANDROID_HOME and ANDROID_NDK_HOME variables to be set. On macOS, ANDROID_HOME is usually set to ~/Library/Android/sdk and ANDROID_NDK_HOME is usually set to ~/Library/Android/sdk/ndk/.

ANDROID_NDK_HOME can also be specified in your local Maven settings.xml file as the android.ndk.path property.

Testing/Coverage

mvn test jacoco:report

Lint

mvn checkstyle:check

Release

Ensure that your Sonatype/Maven Central credentials are in your user settings.xml (usually ~/.m2/settings.xml):

<servers>
    <server>
        <id>ossrh</id>
        <username>sonatype-username</username>
        <password>sonatype-password</password>
    </server>
</servers>

On a non-master branch, run the following command. This will prompt you to enter a version number and tag for the new version, push the tag to GitHub, and deploy the package to the Sonatype repository.

mvn release:clean release:prepare release:perform

The Maven goal may fail due to a bug where it tries to upload the files twice, but the release has still happened.

Complete the process by creating and merging a pull request for the new branch on GitHub and updating the release notes by editing the tag.

For additional information about releasing see http://maven.apache.org/maven-release/maven-release-plugin/

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments

Missing three trained TensorFlow Lite models for android

Hi, Thank you for the voice pipelines. I couldn't find the three models that you mentioned over github.

The wakeword trigger uses three trained TensorFlow Lite models: a filter model for spectrum preprocessing, an autoregressive encoder encode model, and a detect decoder model for keyword classification

Can you please guide where to download?

Thanks

opened by mustansarsaeed 8

Network error while using VADTriggerAndroidASR Profile

Hi I am trying to implement the following profile VADTriggerAndroidASR - which seems to give NETWORK_ERROR always after activation. Please find the log below.

Can you please suggest a solution for this? Some preliminary google search gave the following result.

This might happen due to having an overlapping MediaRecorder or AudioRecord instance active at the same time (link)

{ isActive: true,
      error: 'io.spokestack.spokestack.android.SpeechRecognizerError: SpeechRecognizer error code 2: NETWORK_ERROR\n\tat AndroidSpeechRecognizer$SpokestackListener.onError(AndroidSpeechRecognizer.java:143)\n\tat android.speech.SpeechRecognizer$InternalListener$1.handleMessage(SpeechRecognizer.java:450)\n\tat android.os.Handler.dispatchMessage(Handler.java:106)\n\tat android.os.Looper.loop(Looper.java:216)\n\tat android.app.ActivityThread.main(ActivityThread.java:7266)\n\tat java.lang.reflect.Method.invoke(Native Method)\n\tat com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:494)\n\tat com.android.internal.os.ZygoteInit.main(ZygoteInit.java:975)\n',
      message: null,
      transcript: '',
      event: 'ERROR' }

bug

opened by karthikpala 8

TFWakeWordAzureASR Profile

Hello,

Your docs indicate TFWakewordAzureASR to be a valid pipeline profile.

java.lang.IllegalArgumentException: TFWakewordAzureASR pipeline profile is invalid!

What is the correct way to call upon the profile?

opened by rayyan808 6

Error when building Google Cloud ASR pipeline

Hi 👋

I'm trying to set up the Google Cloud ASR with this configuration:

var json: String? = null
        try {
            val  inputStream: InputStream = assets.open("service_account.json")
            json = inputStream.bufferedReader().use{it.readText()}
        } catch (ex: Exception) {
            ex.printStackTrace()
        }

        val builder = Spokestack.Builder()
            .withoutWakeword()
            .withoutNlu()
            .setProperty("spokestack-id", "my id")
            .setProperty("spokestack-secret", "my secret")
            .withAndroidContext(this)
            .addListener(listener)
        builder
            .pipelineBuilder
            .setProperty("google-credentials", json)
            .setProperty("language", "en-US")
            .useProfile("io.spokestack.spokestack.profile.VADTriggerGoogleASR")
        return builder.build()

Unfortunately, this configuration throws the following exception(s):

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: mypackagename, PID: 26259
    java.lang.RuntimeException: Unable to start activity ComponentInfo{mypackagename.MainActivity}: java.lang.reflect.InvocationTargetException
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3448)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147)
        at android.os.Handler.dispatchMessage(Handler.java:107)
        at android.os.Looper.loop(Looper.java:237)
        at android.app.ActivityThread.main(ActivityThread.java:7814)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075)
     Caused by: java.lang.reflect.InvocationTargetException
        at java.lang.reflect.Constructor.newInstance0(Native Method)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
        at io.spokestack.spokestack.SpeechPipeline.createComponents(SpeechPipeline.java:203)
        at io.spokestack.spokestack.SpeechPipeline.start(SpeechPipeline.java:182)
        at io.spokestack.spokestack.Spokestack.start(Spokestack.java:182)
        at mypackagename.MainActivity.onCreate(MainActivity.kt:54)
        at android.app.Activity.performCreate(Activity.java:7955)
        at android.app.Activity.performCreate(Activity.java:7944)
        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1307)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3423)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595) 
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147) 
        at android.os.Handler.dispatchMessage(Handler.java:107) 
        at android.os.Looper.loop(Looper.java:237) 
        at android.app.ActivityThread.main(ActivityThread.java:7814) 
        at java.lang.reflect.Method.invoke(Native Method) 
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) 
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075) 
     Caused by: java.lang.NoClassDefFoundError: Failed resolution of: Lcom/google/auth/oauth2/ServiceAccountCredentials;
        at io.spokestack.spokestack.google.GoogleSpeechRecognizer.<init>(GoogleSpeechRecognizer.java:66)
        at java.lang.reflect.Constructor.newInstance0(Native Method) 
        at java.lang.reflect.Constructor.newInstance(Constructor.java:343) 
        at io.spokestack.spokestack.SpeechPipeline.createComponents(SpeechPipeline.java:203) 
        at io.spokestack.spokestack.SpeechPipeline.start(SpeechPipeline.java:182) 
        at io.spokestack.spokestack.Spokestack.start(Spokestack.java:182) 
        at mypackagename.MainActivity.onCreate(MainActivity.kt:54) 
        at android.app.Activity.performCreate(Activity.java:7955) 
        at android.app.Activity.performCreate(Activity.java:7944) 
        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1307) 
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3423) 
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595) 
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147) 
        at android.os.Handler.dispatchMessage(Handler.java:107) 
        at android.os.Looper.loop(Looper.java:237) 
        at android.app.ActivityThread.main(ActivityThread.java:7814) 
        at java.lang.reflect.Method.invoke(Native Method) 
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) 
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075) 
     Caused by: java.lang.ClassNotFoundException: Didn't find class "com.google.auth.oauth2.ServiceAccountCredentials" on path: DexPathList[[zip file "/data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/base.apk"],nativeLibraryDirectories=[/data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/lib/arm64, /data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/base.apk!/lib/arm64-v8a, /system/lib64]]
        at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:196)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
        at io.spokestack.spokestack.google.GoogleSpeechRecognizer.<init>(GoogleSpeechRecognizer.java:66) 
        at java.lang.reflect.Constructor.newInstance0(Native Method)

I'm using the .json file from the service account configured in GCP. What could be the issue here?

Thank you! 🙏

opened by DeBusscherePieter 5

NLU module updates, threaded through the `Spokestack` wrapper

I'd like an API sanity check here—see if the commit message pasted below makes sense. Further explanation: I think we need to keep Spokestack.start() and stop() limited to interacting with the speech pipeline—a user might want to force the pipeline to stop listening while playing a TTS prompt, so stop() can't call close() on TTS—but it would also be good to have the ability to fully release all modules' resources if necessary.

Would it be less surprising if start() also ran prepare() implicitly?

This change makes the NLUService extend AutoCloseable, which forces a close() method on all implementors. The existing service uses this to release TensorFlow model and vocab resources.

On the NLU manager, close() has been duplicated as a convenience method called release(), named for parallelism with the newly added prepare() method, which is its inverse.

The NLU module was the last one to provide release/prepare support, so adding it suggests a change to the Spokestack wrapper API, removing the release/prepare methods used for the TTS module and repurposing them to handle resources for both NLU and TTS.

opened by space-pope 4
Crash in WordpieceTextEncoder
I looked for the minimum SDK version, and the manifest lists API 8. However, the class WordpieceTextEncoder contains this call:

return this.vocabulary.getOrDefault(token,this.vocabulary.get(UNKNOWN));

In encodeSingle (line 90) of WordpieceTextEncoder, I had a crash because of getOrDefault, which is only supported from API 24 onward. It would be nice to use something like this instead (that's Kotlin):

return this.vocabulary[token] ?: this.vocabulary.get(UNKNOWN)

Version 11.4.1
opened by zenyagami 3
Add proguard rules to keep spokestack even when used dynamically

When a project is minified using proguard, Spokestack classes can get removed unless they are loaded and used up front. Some apps may not want to initialize Spokestack until later (e.g. after authentication). I think there's a way to add proguard rules to the project to keep spokestack through minification. For instance, with -keep class com.pylon.spokestack.** { *; }.

opened by timmywil 2
feat: wakeword-only profile and empty ASR

This adds a no-op ASR and new pipeline profile for a wakeword-only use case. Upon successful wakeword recognition, the pipeline remains active for a single frame and is then deactivated.

Closes #155.

opened by space-pope 1
fix: ensure pipeline is resumed when tts stops

Dumb fix here...the Spokestack wrapper resumes the pipeline when it receives an event signaling that playback has stopped, but we only send that event when we know we were playing content, in order to avoid spurious events caused by system sounds, etc. resetPlayerState clears out that knowledge, and it's currently being run in a different thread than (and thus usually/always being executed before) the code that stops the media player.

opened by space-pope 1
fix: enforce ordering of TTS responses

I'd like a sanity check on the implementation here to make sure I'm not overcomplicating things. Requests have to be async, so I need to impose some external ordering on the responses.

Currently, TTS requests submitted in close proximity can result in audio being delivered to the client in a different order than the requests were submitted.

This change keeps the requests asynchronous (as they must be for Android networking) while enforcing ordering for the results by introducing a request queue in the TTS manager.
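
To make the idea concrete, here is a minimal sketch of one way to keep requests asynchronous while delivering their results in submission order. It is not the TTS manager's actual implementation: `OrderedDelivery`, `submit`, and `complete` are hypothetical names. Each request takes a ticket when submitted, and a finished result is held back until every earlier ticket has been delivered.

```kotlin
/**
 * Hypothetical helper (not Spokestack code): results may complete in any
 * order, but [deliver] is always invoked in the order tickets were issued.
 */
class OrderedDelivery<T : Any>(private val deliver: (T) -> Unit) {
    private val pending = HashMap<Int, T>()
    private var nextTicket = 0
    private var nextToDeliver = 0

    /** Call when a request is sent; returns its position in the queue. */
    @Synchronized
    fun submit(): Int = nextTicket++

    /** Call from the async callback once the response for [ticket] arrives. */
    @Synchronized
    fun complete(ticket: Int, result: T) {
        pending[ticket] = result
        // Flush every result that is now unblocked, oldest first.
        while (true) {
            val ready = pending.remove(nextToDeliver) ?: break
            deliver(ready)
            nextToDeliver++
        }
    }
}
```

Note that this sketch calls `deliver` while holding the lock, which keeps it short but would not suit a slow consumer; a failed request would also need to release its ticket somehow, which is omitted here.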

opened by space-pope 1
update OKHttp dependency

This addresses an error in the logs on startup when running on Android API 30. The error doesn't appear to affect program functionality, but it can make it look like there are SSL problems. Running with the latest OKHttp eliminates the log.

opened by space-pope 1
Custom HTTP timeouts for Spokestack TTS

The read and connect timeouts for SpokestackTTSClient should be configurable. This should be achievable by looking for new SpeechConfig properties in SpokestackTTSService and passing them directly to a new client constructor that accepts this configuration (the current constructor should set the configuration to the current values as defaults and call this new constructor).
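
A minimal sketch of the shape this could take, under stated assumptions: the class name `TtsClientSketch`, the property keys, and the default values below are all hypothetical placeholders, since the client's real defaults and the eventual property names are not given here. A plain map stands in for SpeechConfig.

```kotlin
// Hypothetical defaults; the client's real current values are not stated here.
val DEFAULT_CONNECT_TIMEOUT_MS = 5_000L
val DEFAULT_READ_TIMEOUT_MS = 5_000L

/** Stand-in for the client: calling it with no arguments keeps the defaults. */
class TtsClientSketch(
    val connectTimeoutMs: Long = DEFAULT_CONNECT_TIMEOUT_MS,
    val readTimeoutMs: Long = DEFAULT_READ_TIMEOUT_MS
)

/**
 * Builds a client from configuration properties, falling back to the
 * defaults for any key that is missing or not numeric.
 * The key names are illustrative only.
 */
fun clientFromProperties(properties: Map<String, Any>): TtsClientSketch {
    val connect = (properties["tts-connect-timeout-ms"] as? Number)?.toLong()
        ?: DEFAULT_CONNECT_TIMEOUT_MS
    val read = (properties["tts-read-timeout-ms"] as? Number)?.toLong()
        ?: DEFAULT_READ_TIMEOUT_MS
    return TtsClientSketch(connect, read)
}
```

Using default arguments plays the role of the "current constructor calls the new constructor" arrangement described above: existing callers that pass nothing see no change in behavior.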
enhancement

opened by space-pope 0

Releases(spokestack-android-11.5.2)

spokestack-android-11.5.2(Aug 20, 2021)
Fixes

calling resume on a pipeline without first calling pause no longer causes the pipeline to hang

spokestack-android-11.5.1(Aug 10, 2021)
This release includes minor fixes for the Azure speech recognizer.

Bug Fixes

honor locale in azure speech recognizer

add partial recognition listener for Azure

avoid timeout for empty Azure transcripts

spokestack-android-11.5.0(Jul 27, 2021)
Features

wakeword-only pipeline profile and empty ASR stage

spokestack-android-11.4.2(Jun 3, 2021)
Bug Fixes

Refactored getOrDefault call in NLU module that caused crashes on devices running < API 24
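
For context, `Map.getOrDefault` comes from Java 8 and is only available on Android from API 24. The sketch below shows one way a fallback lookup can avoid it; this is an illustration, not the library's actual refactor. The function name `encodeToken` and the `"[UNK]"` value for the unknown token are assumptions.

```kotlin
// Assumed value for the unknown-token entry; the library's constant may differ.
val UNKNOWN_TOKEN = "[UNK]"

/**
 * Looks up [token] in [vocabulary], falling back to the unknown-token entry.
 * Uses only indexing and the elvis operator, so no Java 8 Map methods are
 * called. Returns null if neither entry is present.
 */
fun encodeToken(vocabulary: Map<String, Int>, token: String): Int? =
    vocabulary[token] ?: vocabulary[UNKNOWN_TOKEN]
```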

spokestack-android-11.4.1(May 14, 2021)
Bug Fixes

Reset wake word detector on pipeline deactivation to avoid spurious reactivation

spokestack-android-11.4.0(May 12, 2021)
Features

New speech pipeline profile for using on-device wakeword detection and keyword recognition ASR simultaneously

spokestack-android-11.3.0(Apr 29, 2021)
Features

Rasa NLU and dialogue policy

Finalize prompts via Spokestack wrapper

Bug Fixes

enforce ordering of TTS responses

ensure pipeline is resumed when TTS is stopped with stopPlayback

spokestack-android-11.2.0(Apr 2, 2021)
Features

The speech pipeline profile can now be set directly from the Spokestack builder object via withPipelineProfile().

Classes for a keyword model can be loaded from the model's metadata file instead of being specified manually (see keyword-metadata-path).

spokestack-android-11.1.0(Mar 10, 2021)
Features

added a new profile for using a keyword detector as ASR

Fixes

a typo in the Spokestack ASR profile was preventing it from loading

spokestack-android-11.0.3(Mar 5, 2021)

This release updates a default configuration value of the keyword recognizer to match the current training regime for keyword models.
spokestack-android-11.0.1(Jan 25, 2021)

Fixes

This release updates the receptive field for the keyword recognizer which, along with a new model architecture, improves both its accuracy and computational efficiency.
spokestack-android-11.0.0(Dec 17, 2020)
Breaking changes

Spokestack.start() and stop() control all modules; pause() and resume() handle the speech pipeline

The NLU module has been brought to parity with the other modules in that its services implement AutoCloseable and can release their internal resources (e.g., TensorFlow Lite interpreters). This, in turn, adjusted the higher-level Spokestack API: start and stop now manage resources for all registered modules (ASR, NLU, TTS). To temporarily suspend passive listening (so, for example, the app cannot receive a false positive wakeword activation during a TTS prompt), call pause; to resume, call resume. Spokestack calls pause and resume automatically in response to TTS events so you don't have to remember to do so.

TTS module no longer responds to lifecycle events

Lifecycle responsiveness has been removed from the TTS module, as Spokestack is expected to be a long-lived component that survives Activity transitions. This allows TTS audio to continue playing even as the app transitions between Activities but changes the builder API for the TTS module and Spokestack itself.

Features

Allow access to the current NLU service

To match the other modules, NLUManager now provides access to its underlying NLUService via getNlu().

Fixes

Tighten task submission in TTS player

Tasks submitted to the media player thread have been consolidated to avoid a potential race condition when attempting to play two TTS prompts in quick succession.

Timeout event when no keyword is recognized

In order to match other speech recognizers, KeywordRecognizer has been adjusted to send a TIMEOUT event when no keyword is recognized after its activation limit.

spokestack-android-10.0.0(Nov 20, 2020)
Breaking changes

A refactor to the NLU module has changed the type of Spokestack's nlu field from TensorflowNLU to NLUManager. This allows for future expansion of the NLU module to support new providers, as the ASR module currently does. No new providers are included in this release, but custom implementations can be supplied at build time.

A draft dialogue management API is also included, and wired into the Spokestack setup class. The API is undocumented, and its use is optional, so it should be considered experimental for now.

Features

References to the Android Context and Lifecycle can now be updated by convenience methods on Spokestack. This is useful for multi-activity applications that need to adjust component lifecycles along with activity transitions.

Fixes

Runtime addition/removal of a TTS listener is now propagated to the TTS output class so that the intended objects receive playback events.

Fixed a potential NPE in SpokestackTTSOutput that occurred when it was released before any synthesized speech had been played.

spokestack-android-9.1.0(Oct 12, 2020)
Features

Clearer SpokestackAdapter method names

We've added module-specific listener method names so it's easier to tell what you're overriding in your listeners.

Fixes

Allow clients to remove event listeners

Previously, only speech pipeline listeners could be removed, which could lead to a memory leak if a multi-activity application registered Activity classes as listeners, as they would not be garbage collected.

spokestack-android-9.0.0(Oct 6, 2020)
This release introduces a turn-key setup wrapper (the Spokestack) class used to build and initialize all Spokestack modules at once. Events can be consumed via the unified SpokestackAdapter interface or old-style listeners attached to the individual modules at build time. See the documentation for more details.

Breaking changes

ASR activation property names have been reverted from active-(min|max) to wake-active-(min|max) to allow the React Native library to set them properly for both platforms.

Fixes

Don't send empty ASR transcripts (#107)

If an ASR returns the empty string as either a partial or final result, it will not be reported to the client.

Force pipeline stages to reset on deactivate

This allows AndroidSpeechRecognizer to be properly stopped when deactivate() is called on the speech pipeline.

Send timeout events for empty transcripts

If an ASR returns the empty string as a final result, it will be reported to the client as a timeout.
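
Taken together, the transcript rules in this release can be read as a small decision table. The sketch below models that reading with hypothetical names (`ClientEvent`, `eventFor`); it is not the pipeline's real dispatch code.

```kotlin
/** Hypothetical stand-in for the events a listener might receive. */
enum class ClientEvent { PARTIAL_RECOGNIZE, RECOGNIZE, TIMEOUT }

/**
 * Decides what, if anything, to report for a transcript:
 * - non-empty text is reported as a partial or final recognition,
 * - an empty final result is reported as a timeout,
 * - an empty partial result is not reported at all (null).
 */
fun eventFor(transcript: String, isFinal: Boolean): ClientEvent? = when {
    transcript.isNotEmpty() && isFinal -> ClientEvent.RECOGNIZE
    transcript.isNotEmpty() -> ClientEvent.PARTIAL_RECOGNIZE
    isFinal -> ClientEvent.TIMEOUT
    else -> null
}
```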

Don't send irrelevant playback events

PLAYBACK_COMPLETE events were being dispatched from the audio player for audio events unrelated to Spokestack TTS, such as when the Assistant activation beep was complete.

spokestack-android-8.1.0(Aug 20, 2020)

Features

This release includes the new KeywordRecognizer component, which uses TensorFlow Lite models similar to the wakeword models but designed for multiclass detection. KeywordRecognizer is capable of serving as a lightweight on-device ASR for a limited set of commands.

Fixes

Errors reading from the device microphone now stop the speech pipeline instead of attempting to read again on the next dispatch loop. This prevents error spam in host apps, but also means that the app will have to manually call start on a pipeline that has experienced such an error.
spokestack-android-8.0.2(Aug 17, 2020)

This fixes an issue with AndroidSpeechRecognizer where it was possible to stop the speech pipeline without freeing the speech context, which made wakeword and ASR inoperable on a pipeline restart.
spokestack-android-8.0.1(Aug 13, 2020)

This release addresses an error that could be reached when resetting or closing AndroidSpeechRecognizer without first activating it.
spokestack-android-8.0.0(Aug 11, 2020)

This release resolves issues with stale state left over after Spokestack regains control of the microphone from a component that previously had it. It's a major release because the fix required adding a new method to the SpeechProcessor interface, so any custom implementations will also need to include this method.
spokestack-android-7.0.2(Aug 10, 2020)

This release addresses an issue with control flow when using the AndroidSpeechRecognizer. Programmatic reactivation of the pipeline was occasionally being blocked due to internal management of the microphone; this should no longer happen.
spokestack-android-7.0.1(Aug 5, 2020)

This is a patch release created because some files from 7.0.0 failed to deploy to JCenter. No code has changed.
spokestack-android-7.0.0(Jul 29, 2020)
Breaking changes

Calling start() on a running SpeechPipeline no longer throws an exception. This should only be an issue for clients that were relying on the error, which is inadvisable.

Features

Expose Spokestack NLU slot types

Not all NLUs parse slots into types, but Spokestack's NLU does, and those types need to be exposed to other libraries that wrap spokestack-android (such as react-native-spokestack).

Propagate partial ASR results to client

ASR providers that offer the ability to receive partial results (all current providers do) now send those results to speech listeners via the new PARTIAL_RECOGNIZE event. The text of the result is available as the SpeechContext's transcript.
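A listener might use partial results to drive a live caption. The sketch below is self-contained and uses stand-in types: the `Event` enum and `CaptionListener` class are hypothetical, and only the event name PARTIAL_RECOGNIZE comes from the notes above. It also assumes each partial result carries the full transcript so far, so a newer partial simply replaces the older one.

```java
import java.util.ArrayList;
import java.util.List;

final class PartialResultSketch {
    /** Stand-in event names; a real listener would receive the library's own event type. */
    enum Event { PARTIAL_RECOGNIZE, RECOGNIZE }

    /** Hypothetical listener that keeps a live caption plus the finished transcripts. */
    static final class CaptionListener {
        String caption = "";
        final List<String> finals = new ArrayList<>();

        void onEvent(Event event, String transcript) {
            switch (event) {
                case PARTIAL_RECOGNIZE:
                    // Assumption: each partial supersedes the previous one.
                    caption = transcript;
                    break;
                case RECOGNIZE:
                    finals.add(transcript);
                    caption = ""; // clear the live caption once the result is final
                    break;
                default:
                    break;
            }
        }
    }

    public static void main(String[] args) {
        CaptionListener listener = new CaptionListener();
        listener.onEvent(Event.PARTIAL_RECOGNIZE, "turn on");
        listener.onEvent(Event.PARTIAL_RECOGNIZE, "turn on the lights");
        listener.onEvent(Event.RECOGNIZE, "turn on the lights");
        System.out.println(listener.finals);
    }
}
```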

Fixes

Re-authorize Spokestack ASR socket

For two back-to-back utterances submitted to the Spokestack ASR, the first frame of the second request was resulting in an error due to a missing handshake.

Workaround for Android ASR timeouts

Android's built-in SpeechRecognizer has been returning NO_MATCH errors more than it used to, notably in cases where it used to send SPEECH_TIMEOUT. In response, we've temporarily remapped NO_MATCH to fire a timeout event to speech listeners instead of an error.
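The remapping amounts to treating two platform error codes the same way. A minimal sketch, assuming a hypothetical `Outcome` enum in place of the library's real event type; the two integer constants copy the values of `android.speech.SpeechRecognizer.ERROR_SPEECH_TIMEOUT` and `ERROR_NO_MATCH` so the example runs without the Android SDK:

```java
final class AsrErrorSketch {
    // Same integer values as the constants on android.speech.SpeechRecognizer.
    static final int ERROR_SPEECH_TIMEOUT = 6;
    static final int ERROR_NO_MATCH = 7;

    /** Hypothetical listener-facing outcomes; not Spokestack's event enum. */
    enum Outcome { TIMEOUT, ERROR }

    /** Maps a platform error code to what speech listeners would be told. */
    static Outcome classify(int errorCode) {
        switch (errorCode) {
            case ERROR_SPEECH_TIMEOUT:
            case ERROR_NO_MATCH: // remapped: handled like a timeout, not an error
                return Outcome.TIMEOUT;
            default:
                return Outcome.ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(ERROR_NO_MATCH));
    }
}
```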

spokestack-android-6.0.0(Jul 24, 2020)
Note: The only reason this is a major release is the new annotations (described below). No breaking changes to actual library features are expected.

Features

Spokestack cloud-based ASR provider

Spokestack credentials can now be used to access a cloud-based ASR independent from Google and Microsoft. Pipeline profiles that use this new component are also included.

Annotation updates for IDE convenience

Parameters for speech recognition, NLU, and TTS events have been annotated with @NonNull to enable cleaner client code. The downside is that the compiler/Android Studio will now throw errors for any Kotlin event listener methods marked as overrides that mark these parameters as optional.

Fixes

Remove preference for offline Android ASR

The Android ASR now throws an error if the caller indicates that the offline model should be used. This flag has been removed from Spokestack's usage of the on-device Android ASR as a temporary fix.

spokestack-android-5.6.0(Jun 15, 2020)
Features

Allow TTS playback to be stopped

The TTSManager.stopPlayback() method will stop playback of any currently playing synthesis result and clear any queued results. This is useful if you want to allow the user to interrupt system speech and not have queued speech resume playback when the ASR request ends.

Fixes

Microphone sharing for Android ASR (#53)

This addresses an issue with Spokestack sharing the microphone with platform-supplied SpeechRecognizer instances. AndroidSpeechRecognizer and associated pipeline profiles now have much broader device compatibility.

Allow i_* tag series to be parsed as slots

NLU results were previously dependent on the model returning strictly valid IOB tag sequences. This has been relaxed to allow either a b_ or i_ tag to mark the beginning of a slot.

spokestack-android-5.5.0(Jun 10, 2020)
Features

New TTS event for playback completion

The PLAYBACK_COMPLETE event notifies listeners when the media player has finished playing TTS prompts so that ASR can be reactivated if desired.

Fixes

(NLU) If a slot value recognized in an utterance is not valid according to the model metadata, a Slot with a null value will be returned in order to avoid failing the entire classification with an exception.

spokestack-android-5.4.0(Jun 1, 2020)
Features

Slot output changes:

Slots declared by an intent but not tagged by the model are returned to the caller in the output with a null value

Slots not declared by an intent but tagged by the model do not cause an error and are not returned to the caller
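The two rules above can be sketched as a merge between the slots an intent declares and the slots the model tagged. `SlotOutputSketch` and its plain string maps are hypothetical simplifications for illustration; the library's own result types are richer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SlotOutputSketch {
    /**
     * Builds the caller-facing slot map: every declared slot appears,
     * with a null value when the model did not tag it. Slots the model
     * tagged but the intent did not declare are silently left out.
     */
    static Map<String, String> output(List<String> declared, Map<String, String> tagged) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : declared) {
            out.put(name, tagged.get(name)); // null when untagged
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tagged = new HashMap<>();
        tagged.put("city", "Paris");
        tagged.put("color", "red"); // not declared by the intent
        System.out.println(output(Arrays.asList("city", "date"), tagged));
    }
}
```

Client code reading such a result would need to null-check each declared slot instead of assuming that presence in the map implies a value.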

Fixes

Removed references to a deprecated configuration parameter for the NLU model.
spokestack-android-5.3.0(Apr 30, 2020)
Features

Faster first classification via the NLU model

Fixes

TTS errors, both GraphQL and HTTP, are now surfaced to the client as ERROR events; these were previously being swallowed by the library.

spokestack-android-5.2.0(Apr 22, 2020)
Features

Return unknown slots as raw values

If the NLU model tags a slot not associated with the predicted intent, that slot's value will be returned as a String in the NLUResult's slot map instead of causing an error.

Support new slot features

Newer models include intent-level implicit slots and a capture_name field that changes the slot's return name; these are now supported by the client library.

Fixes

Allow slot tags to be discontinuous

Separate spans of "b_slot_name" tags are now concatenated; previously, all tokens but the last one were dropped.

Trim punctuation for slot value extraction

Slots recognized at the end of an input string that contains punctuation will no longer cause an error. As a consequence, both leading and trailing punctuation are removed from the slot value before parsing, so such punctuation is invalid in training data for the slot values.
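A minimal sketch of this kind of trimming, assuming ASCII punctuation as matched by Java's `\p{Punct}` class. The class name and the exact character set are illustrative; the library may define punctuation differently.

```java
final class SlotTrimSketch {
    /** Strips surrounding whitespace, then leading and trailing ASCII punctuation. */
    static String trim(String raw) {
        return raw.trim().replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trim("tomorrow."));
        System.out.println(trim("$5")); // the leading symbol is lost as well
    }
}
```

The second call shows why such punctuation cannot be part of a slot's training values: under this rule a value like `$5` would never reach the slot parser with its leading symbol intact. Punctuation inside a value, as in `o'clock`, is left alone.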

spokestack-android-5.1.0(Mar 16, 2020)
Features

On-device NLU via TensorFlow Lite and BERT-based custom models

Fixes

Declare a minSdkVersion and targetSdkVersion in the manifest to avoid requesting suspicious-looking permissions by default

spokestack-android-5.0.0(Jan 24, 2020)
Breaking changes

Turn activation timeout into a component

This removes the activation timeout from wakeword activation, making it a separate component, and changes the names of the properties used to control it.

Features

Azure Speech Service ASR

Speech Markdown support in TTS

SpeechPipeline.Builder profiles

Fixes

Set default vad-fall-delay to 500ms

Simple ProGuard rules to avoid Spokestack classes being removed by the tool.
