From 9e09bc96a434b4c8833826765663f94a9aa7e25d Mon Sep 17 00:00:00 2001
From: Evan Liu
Date: Fri, 16 May 2025 17:16:55 -0700
Subject: [PATCH 1/2] Update explainer with spec changes

---
 explainers/on-device-speech-recognition.md | 90 ++++++++++++++--------
 1 file changed, 59 insertions(+), 31 deletions(-)

diff --git a/explainers/on-device-speech-recognition.md b/explainers/on-device-speech-recognition.md
index 3b0488e..02ab7cf 100644
--- a/explainers/on-device-speech-recognition.md
+++ b/explainers/on-device-speech-recognition.md
@@ -11,7 +11,7 @@ The Web Speech API is a powerful browser feature that enables applications to pe
 To address these issues, we introduce **on-device speech recognition capabilities** as part of the Web Speech API. This enhancement allows speech recognition to run locally on user devices, providing a faster, more private, and offline-compatible experience.
 
 ## Why Use On-Device Speech Recognition?
-
+
 ### 1. **Privacy**
 On-device processing ensures that neither raw audio nor transcriptions leave the user's device, enhancing data security and user trust.
 
@@ -20,58 +20,86 @@ Local processing reduces latency, providing a smoother and faster user experienc
 
 ### 3. **Offline Functionality**
 Applications can offer speech recognition capabilities even without an active internet connection, increasing their utility in remote or low-connectivity environments.
+## New API Members
+
+This enhancement introduces new members to the Web Speech API to support on-device recognition: a dictionary for configuration, an instance attribute, and static methods for managing capabilities.
+
+### `SpeechRecognitionOptions` Dictionary
+
+This dictionary is used to configure speech recognition preferences, both for individual sessions and for querying or installing capabilities.
+
+It includes the following members:
+
+- `langs`: A required sequence of `DOMString` representing BCP-47 language tags (e.g., `['en-US']`).
+- `processLocally`: A boolean that, if `true`, instructs the recognition to be performed on-device. If `false` (the default), any available recognition method (cloud-based or on-device) may be used.
+
+
+```idl
+dictionary SpeechRecognitionOptions {
+  required sequence<DOMString> langs; // BCP-47 language tags
+  boolean processLocally = false; // Instructs the recognition to be performed on-device. If `false` (default), any available recognition method may be used.
+};
+```
+
+#### Example Usage
+```javascript
+const recognition = new SpeechRecognition();
+recognition.options = {
+  langs: ['en-US'],
+  processLocally: true
+};
+recognition.start();
+```
 
 ## New Methods
 
-### 1. `Promise<boolean> availableOnDevice(DOMString lang)`
-This method checks if on-device speech recognition is available for a specific language. Developers can use this to determine whether to enable features that require on-device speech recognition.
+### 1. `static Promise<AvailabilityStatus> SpeechRecognition.available(SpeechRecognitionOptions options)`
+This static method checks the availability of speech recognition capabilities matching the provided `SpeechRecognitionOptions`.
+
+The method returns a `Promise` that resolves to an `AvailabilityStatus` enum string:
+- `"available"`: Ready to use according to the specified options.
+- `"downloadable"`: Not currently available, but resources (e.g., language packs for on-device) can be downloaded.
+- `"downloading"`: Resources are currently being downloaded.
+- `"unavailable"`: Not available and not downloadable.
 
 #### Example Usage
 ```javascript
-const lang = 'en-US';
-SpeechRecognition.availableOnDevice(lang).then((available) => {
-  if (available) {
-    console.log(`On-device speech recognition is available for ${lang}.`);
+// Check availability for on-device English (US)
+const options = { langs: ['en-US'], processLocally: true };
+
+SpeechRecognition.available(options).then((status) => {
+  console.log(`Speech recognition status for ${options.langs.join(', ')} (on-device): ${status}.`);
+  if (status === 'available') {
+    console.log('Ready to use on-device speech recognition.');
+  } else if (status === 'downloadable') {
+    console.log('Resources are downloadable. Call install() if needed.');
+  } else if (status === 'downloading') {
+    console.log('Resources are currently downloading.');
   } else {
-    console.log(`On-device speech recognition is not available for ${lang}.`);
+    console.log('Not available for on-device speech recognition.');
   }
 });
 ```
 
-### 2. `Promise<boolean> installOnDevice(DOMString[] lang)`
-This method install the resources required for on-device speech recognition for the given BCP-47 language codes. The installation process may download and configure necessary language models.
+### 2. `Promise<boolean> install(SpeechRecognitionOptions options)`
+This method installs the resources required for speech recognition matching the provided `SpeechRecognitionOptions`. The installation process may download and configure necessary language models.
 
 #### Example Usage
 ```javascript
-const lang = 'en-US';
-SpeechRecognition.installOnDevice([lang]).then((success) => {
+// Install on-device resources for English (US)
+const options = { langs: ['en-US'], processLocally: true };
+SpeechRecognition.install(options).then((success) => {
   if (success) {
-    console.log('On-device speech recognition resources installed successfully.');
+    console.log(`On-device speech recognition resources for ${options.langs.join(', ')} installed successfully.`);
   } else {
-    console.error('Unable to install on-device speech recognition.');
+    console.error(`Unable to install on-device speech recognition resources for ${options.langs.join(', ')}. This could be due to unsupported languages or download issues.`);
   }
 });
 ```
 
-## New Attribute
-
-### 1. `mode` attribute in the `SpeechRecognition` interface
-The `mode` attribute in the `SpeechRecognition` interface defines how speech recognition should behave when starting a session.
-
-#### `SpeechRecognitionMode` Enum
-
-- **"on-device-preferred"**: Use on-device speech recognition if available. If not, fall back to cloud-based speech recognition.
-- **"on-device-only"**: Only use on-device speech recognition. If it's unavailable, throw an error.
-
-#### Example Usage
-```javascript
-const recognition = new SpeechRecognition();
-recognition.mode = "ondevice-only"; // Only use on-device speech recognition.
-recognition.start();
-```
 
 ## Privacy considerations
-To reduce the risk of fingerprinting, user agents must implementing privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
+To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
 
 ## Conclusion
 The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.
\ No newline at end of file

From 4a151ae9d726128dfaa58a9c8e3a10def2795f41 Mon Sep 17 00:00:00 2001
From: Evan Liu
Date: Fri, 23 May 2025 22:29:12 -0700
Subject: [PATCH 2/2] Update on-device-speech-recognition.md

---
 explainers/on-device-speech-recognition.md | 35 ++++++++++++----------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/explainers/on-device-speech-recognition.md b/explainers/on-device-speech-recognition.md
index 7aa4453..c0cf7e7 100644
--- a/explainers/on-device-speech-recognition.md
+++ b/explainers/on-device-speech-recognition.md
@@ -22,7 +22,24 @@ Local processing reduces latency, providing a smoother and faster user experienc
 Applications can offer speech recognition capabilities even without an active internet connection, increasing their utility in remote or low-connectivity environments.
 
 ## New API Members
-This enhancement introduces new members to the Web Speech API to support on-device recognition: a dictionary for configuration, an instance attribute, and static methods for managing capabilities.
+This enhancement introduces new members to the Web Speech API to support on-device recognition:
+
+* An instance attribute `processLocally` on the `SpeechRecognition` object to control processing for individual recognition sessions.
+* A `SpeechRecognitionOptions` dictionary used for querying and installing on-device capabilities.
+* Static methods `SpeechRecognition.available()` and `SpeechRecognition.install()` for managing these capabilities.
+
+### Controlling On-Device Processing for a Session
+
+To instruct a specific speech recognition session to be performed on-device, the `processLocally` attribute on the `SpeechRecognition` instance is used.
+
+- `SpeechRecognition.processLocally` (`boolean`): When set to `true`, it mandates that the recognition for this particular session occurs on the user's device. If `false` (the default), the user agent can select any available recognition method (local or cloud-based).
+
+#### Example: Requesting On-Device for a Single Session
+```javascript
+const recognition = new SpeechRecognition();
+recognition.processLocally = true; // Instruct this session to run on-device
+recognition.start();
+```
 
 ### `SpeechRecognitionOptions` Dictionary
 
@@ -30,27 +47,14 @@ This dictionary is used to configure speech recognition preferences, both for in
 
 It includes the following members:
 
-- `langs`: A required sequence of `DOMString` representing BCP-47 language tags (e.g., `['en-US']`).
 - `processLocally`: A boolean that, if `true`, instructs the recognition to be performed on-device. If `false` (the default), any available recognition method (cloud-based or on-device) may be used.
-
 
 ```idl
 dictionary SpeechRecognitionOptions {
-  required sequence<DOMString> langs; // BCP-47 language tags
   boolean processLocally = false; // Instructs the recognition to be performed on-device. If `false` (default), any available recognition method may be used.
 };
 ```
 
-#### Example Usage
-```javascript
-const recognition = new SpeechRecognition();
-recognition.options = {
-  langs: ['en-US'],
-  processLocally: true
-};
-recognition.start();
-```
-
 ## Example use cases
 ### 1. Company with data residency requirements
 Websites with strict data residency requirements (i.e., regulatory, legal, or company policy) can ensure that audio data remains on the user's device and is not sent over the network for processing. This is particularly crucial for compliance with regulations like GDPR, which considers voice as personally identifiable information (PII) as voice recordings can reveal information about an individual's gender, ethnic origin, or even potential health conditions. On-device processing significantly enhances user privacy by minimizing the exposure of sensitive voice data.
@@ -107,9 +111,8 @@ SpeechRecognition.install(options).then((success) => {
 });
 ```
 
-
 ## Privacy considerations
 To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
 
 ## Conclusion
-The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.
\ No newline at end of file
+The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.
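
The series above documents `SpeechRecognition.available()`, `SpeechRecognition.install()`, and the per-session `processLocally` attribute separately. As an illustration of how a page might chain them, here is a minimal sketch that checks availability, installs resources when they are downloadable, and then starts a session restricted to on-device processing. It assumes both static methods accept the `{ langs, processLocally }` options object used in the explainer's own examples (the second patch drops `langs` from the dictionary description, but those examples still pass it), and that `install()` resolves to a boolean, as its example implies. `planForStatus` and `startOnDevice` are hypothetical helpers, not part of the proposed API.

```javascript
// Hypothetical helper (not part of the proposal): maps the AvailabilityStatus
// string resolved by SpeechRecognition.available() to a next step for the page.
function planForStatus(status) {
  switch (status) {
    case 'available':
      return 'start'; // Resources are ready, so a session can start now.
    case 'downloadable':
      return 'install'; // Resources can be fetched with install().
    case 'downloading':
      return 'wait'; // A download is already in progress.
    case 'unavailable':
      return 'fallback'; // Not available and not downloadable for these options.
    default:
      throw new Error(`Unknown availability status: ${status}`);
  }
}

// Hypothetical helper: starts an on-device session if possible, installing
// resources first when they are downloadable. `Recognition` is the
// SpeechRecognition constructor, passed in so the flow can run against a stub.
// Resolves to the started recognition object, or null if the caller must fall back.
async function startOnDevice(Recognition, langs) {
  // Assumed options shape, matching the explainer's available()/install() examples.
  const options = { langs, processLocally: true };
  let plan = planForStatus(await Recognition.available(options));
  if (plan === 'install') {
    // Assumes install() resolves to a boolean success flag, as its example implies.
    const installed = await Recognition.install(options);
    plan = installed ? 'start' : 'fallback';
  }
  if (plan !== 'start') {
    return null; // 'wait' or 'fallback': let the caller decide what to do.
  }
  const recognition = new Recognition();
  recognition.processLocally = true; // Require on-device processing for this session.
  recognition.start();
  return recognition;
}
```

In a browser, a page might call `startOnDevice(SpeechRecognition, ['en-US'])` and, when it resolves to `null`, fall back to its own handling. Passing the constructor in as a parameter is only a convenience that lets the flow be exercised against a stub outside a browser; returning `null` for `"downloading"` leaves it to the caller to query `available()` again later.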