97 changes: 64 additions & 33 deletions explainers/on-device-speech-recognition.md
@@ -11,7 +11,7 @@ The Web Speech API is a powerful browser feature that enables applications to pe
To address these issues, we introduce **on-device speech recognition capabilities** as part of the Web Speech API. This enhancement allows speech recognition to run locally on user devices, providing a faster, more private, and offline-compatible experience.

## Why Use On-Device Speech Recognition?

### 1. **Privacy**
On-device processing ensures that neither raw audio nor transcriptions leave the user's device, enhancing data security and user trust.

@@ -20,6 +20,40 @@ Local processing reduces latency, providing a smoother and faster user experienc

### 3. **Offline Functionality**
Applications can offer speech recognition capabilities even without an active internet connection, increasing their utility in remote or low-connectivity environments.
+## New API Members
+
+This enhancement introduces new members to the Web Speech API to support on-device recognition:
+
+* An instance attribute `processLocally` on the `SpeechRecognition` object to control processing for individual recognition sessions.
+* A `SpeechRecognitionOptions` dictionary used for querying and installing on-device capabilities.
+* Static methods `SpeechRecognition.available()` and `SpeechRecognition.install()` for managing these capabilities.
+
+### Controlling On-Device Processing for a Session
+
+To instruct a specific speech recognition session to be performed on-device, the `processLocally` attribute on the `SpeechRecognition` instance is used.
+
+- `SpeechRecognition.processLocally` (`boolean`): When set to `true`, it mandates that the recognition for this particular session occurs on the user's device. If `false` (the default), the user agent can select any available recognition method (local or cloud-based).
+
+#### Example: Requesting On-Device for a Single Session
+```javascript
+const recognition = new SpeechRecognition();
+recognition.processLocally = true; // Instruct this session to run on-device
+recognition.start();
+```
+
+### `SpeechRecognitionOptions` Dictionary
+
+This dictionary is used to configure speech recognition preferences, both for individual sessions and for querying or installing capabilities.
+
+It includes the following members:
+
+- `processLocally`: A boolean that, if `true`, instructs the recognition to be performed on-device. If `false` (the default), any available recognition method (cloud-based or on-device) may be used.
+
+```idl
+dictionary SpeechRecognitionOptions {
+  boolean processLocally = false; // Instructs the recognition to be performed on-device. If `false` (default), any available recognition method may be used.
+};
+```

## Example use cases
### 1. Company with data residency requirements
@@ -33,55 +67,52 @@ Applications that need to function in unreliable or offline network conditions

## New Methods

-### 1. `Promise<boolean> availableOnDevice(DOMString lang)`
-This method checks if on-device speech recognition is available for a specific language. Developers can use this to determine whether to enable features that require on-device speech recognition.
+### 1. `static Promise<AvailabilityStatus> SpeechRecognition.available(SpeechRecognitionOptions options)`
+This static method checks the availability of speech recognition capabilities matching the provided `SpeechRecognitionOptions`.
+
+The method returns a `Promise` that resolves to an `AvailabilityStatus` enum string:
+- `"available"`: Ready to use according to the specified options.
+- `"downloadable"`: Not currently available, but resources (e.g., language packs for on-device) can be downloaded.
+- `"downloading"`: Resources are currently being downloaded.
+- `"unavailable"`: Not available and not downloadable.
+
#### Example Usage
```javascript
-const lang = 'en-US';
-SpeechRecognition.availableOnDevice(lang).then((available) => {
-  if (available) {
-    console.log(`On-device speech recognition is available for ${lang}.`);
+// Check availability for on-device English (US)
+const options = { langs: ['en-US'], processLocally: true };
+
+SpeechRecognition.available(options).then((status) => {
+  console.log(`Speech recognition status for ${options.langs.join(', ')} (on-device): ${status}.`);
+  if (status === 'available') {
+    console.log('Ready to use on-device speech recognition.');
+  } else if (status === 'downloadable') {
+    console.log('Resources are downloadable. Call install() if needed.');
+  } else if (status === 'downloading') {
+    console.log('Resources are currently downloading.');
   } else {
-    console.log(`On-device speech recognition is not available for ${lang}.`);
+    console.log('Not available for on-device speech recognition.');
   }
 });
```
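
The four `AvailabilityStatus` values above can also be handled with a small lookup instead of an `if`/`else` chain. The following is a minimal sketch, not part of the proposed API: `nextStepFor` and its return labels (`'start'`, `'install'`, `'wait'`, `'fallback'`) are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: maps an AvailabilityStatus string to a next step.
// The returned labels are illustrative only and are not defined by the
// Web Speech API.
function nextStepFor(status) {
  switch (status) {
    case 'available':
      return 'start'; // Resources are ready; a session can begin.
    case 'downloadable':
      return 'install'; // Resources can be fetched with install().
    case 'downloading':
      return 'wait'; // A download is already in progress.
    case 'unavailable':
    default:
      return 'fallback'; // Nothing to install; treat unknown values the same way.
  }
}
```

Keeping the mapping in a pure function separates the decision from the side effects (logging, installing, starting a session), which may make the decision easier to test.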

-### 2. `Promise<boolean> installOnDevice(DOMString[] lang)`
-This method install the resources required for on-device speech recognition for the given BCP-47 language codes. The installation process may download and configure necessary language models.
+### 2. `Promise<boolean> install(SpeechRecognitionOptions options)`
+This method installs the resources required for speech recognition matching the provided `SpeechRecognitionOptions`. The installation process may download and configure necessary language models.

#### Example Usage
```javascript
-const lang = 'en-US';
-SpeechRecognition.installOnDevice([lang]).then((success) => {
+// Install on-device resources for English (US)
+const options = { langs: ['en-US'], processLocally: true };
+SpeechRecognition.install(options).then((success) => {
   if (success) {
-    console.log('On-device speech recognition resources installed successfully.');
+    console.log(`On-device speech recognition resources for ${options.langs.join(', ')} installed successfully.`);
   } else {
-    console.error('Unable to install on-device speech recognition.');
+    console.error(`Unable to install on-device speech recognition resources for ${options.langs.join(', ')}. This could be due to unsupported languages or download issues.`);
   }
 });
```
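
Taken together, `available()`, `install()`, and `processLocally` suggest a check, install, then start flow. The sketch below is one possible way to combine them, under stated assumptions: `startLocalRecognition` is a hypothetical name, the `langs` option follows the examples in this explainer rather than the dictionary listing, and re-querying `available()` after a successful `install()` is a cautious choice here, not documented behavior. The constructor is passed in as `SR` so the flow can be exercised with a stub.

```javascript
// Sketch only. `SR` is the SpeechRecognition constructor supplied by the
// caller; `startLocalRecognition` is a hypothetical helper, not an API member.
async function startLocalRecognition(SR, langs) {
  const options = { langs, processLocally: true };

  let status = await SR.available(options);
  if (status === 'downloadable') {
    // install() resolves to a boolean indicating success.
    const installed = await SR.install(options);
    // Assumption: re-check availability rather than trusting the boolean alone.
    status = installed ? await SR.available(options) : 'unavailable';
  }
  if (status !== 'available') {
    // Still downloading or unavailable: the caller decides how to proceed.
    return null;
  }

  const recognition = new SR();
  recognition.lang = langs[0];
  recognition.processLocally = true; // Keep this session on-device.
  recognition.start();
  return recognition;
}
```

A page might call `startLocalRecognition(SpeechRecognition, ['en-US'])` and, on a `null` result, fall back to a session without `processLocally` or disable the feature.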

-## New Attribute
-
-### 1. `mode` attribute in the `SpeechRecognition` interface
-The `mode` attribute in the `SpeechRecognition` interface defines how speech recognition should behave when starting a session.
-
-#### `SpeechRecognitionMode` Enum
-
-- **"on-device-preferred"**: Use on-device speech recognition if available. If not, fall back to cloud-based speech recognition.
-- **"on-device-only"**: Only use on-device speech recognition. If it's unavailable, throw an error.
-
-#### Example Usage
-```javascript
-const recognition = new SpeechRecognition();
-recognition.mode = "ondevice-only"; // Only use on-device speech recognition.
-recognition.start();
-```
-
## Privacy considerations
-To reduce the risk of fingerprinting, user agents must implementing privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
+To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).

## Conclusion
-The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.
+The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.