Translator and Language Detector APIs

1. Introduction

The translator and language detector APIs expose the ability to translate text between human languages, and detect the language of such text. They are complementary to any built-in browser UI features for these purposes, giving web developers the ability to trigger these operations programmatically and integrate them into their applications. This can be especially useful for operating on user input, or text retrieved from the network.

These APIs are designed to provide a high-level interface for translation and language detection, abstracting away the complexities of underlying machine learning models and their management. To deal with possible interoperability issues arising from different implementation strategies or language support, the API design guides developers toward checking the availability of languages their applications are dependent on, and including appropriate error-handling.

2. Dependencies

This specification depends on the Infra Standard. [INFRA]

As with the rest of the web platform, human languages are identified in these APIs by BCP 47 language tags, such as "ja", "en-US", "sr-Cyrl", or "de-CH-1901-x-phonebk-extended". The specific algorithms used for validation, canonicalization, and language tag matching are those from the ECMAScript Internationalization API Specification, which in turn defers some of its processing to Unicode Locale Data Markup Language (LDML). [BCP47] [ECMA-402] [UTS35]

These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in Writing Assistance APIs § 5 Shared infrastructure, and the common privacy and security considerations are discussed in Writing Assistance APIs § 6 Privacy considerations and Writing Assistance APIs § 7 Security considerations. Implementing these APIs requires implementing that shared infrastructure, and conforming to those privacy and security considerations. But it does not require implementing or exposing the actual writing assistance APIs. [WRITING-ASSISTANCE-APIS]

3. The translator API

[Exposed=Window, SecureContext]
interface Translator {
  static Promise<Translator> create(TranslatorCreateOptions options);
  static Promise<Availability> availability(TranslatorCreateCoreOptions options);

  Promise<DOMString> translate(
    DOMString input,
    optional TranslatorTranslateOptions options = {}
  );
  ReadableStream translateStreaming(
    DOMString input,
    optional TranslatorTranslateOptions options = {}
  );

  readonly attribute DOMString sourceLanguage;
  readonly attribute DOMString targetLanguage;

  Promise<double> measureInputUsage(
    DOMString input,
    optional TranslatorTranslateOptions options = {}
  );
  readonly attribute unrestricted double inputQuota;
};
Translator includes DestroyableModel;

dictionary TranslatorCreateCoreOptions {
  required DOMString sourceLanguage;
  required DOMString targetLanguage;
};

dictionary TranslatorCreateOptions : TranslatorCreateCoreOptions {
  AbortSignal signal;
  CreateMonitorCallback monitor;
};

dictionary TranslatorTranslateOptions {
  AbortSignal signal;
};
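
As an illustration of how the interface above might be used from a page, here is a minimal sketch. The helper name `translateOrNull` and its `api` parameter (which stands in for the global `Translator` so the function can be exercised against a stub) are hypothetical, and `destroy()` is assumed to be provided by the `DestroyableModel` mixin from the shared infrastructure rather than by anything defined in this section.

```javascript
// Hypothetical helper: translate `text`, or return null when the API is
// missing or the requested language arc is unavailable.
async function translateOrNull(text, sourceLanguage, targetLanguage, api = globalThis.Translator) {
  if (!api) return null; // The API is not exposed in this environment.

  const availability = await api.availability({ sourceLanguage, targetLanguage });
  if (availability === "unavailable") return null;

  // For "downloadable" or "downloading", create() may take a while because
  // it has to wait for the model material to arrive.
  const translator = await api.create({ sourceLanguage, targetLanguage });
  try {
    return await translator.translate(text);
  } finally {
    translator.destroy(); // Assumed DestroyableModel method.
  }
}
```

Checking `availability()` first keeps the unsupported case separate from genuine failures, which `create()` and `translate()` report by rejecting their promises.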

3.1. Creation

The static create(options) method steps are:

Return the result of creating an AI model object given options, "translator", validate and canonicalize translator options, compute translator options availability, download the translation model, initialize the translation model, and create the translator object.

To validate and canonicalize translator options given a TranslatorCreateCoreOptions options, perform the following steps. They mutate options in place to canonicalize language tags, and throw an exception if any are invalid.

Validate and canonicalize language tags given options and "sourceLanguage".
Validate and canonicalize language tags given options and "targetLanguage".
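
The effect of these two steps can be approximated in script with `Intl.getCanonicalLocales()` from ECMA-402. This is only a sketch of the observable behavior, not the specification's algorithm, and `canonicalizeTranslatorOptions` is a hypothetical name.

```javascript
// Sketch: canonicalize both language tags in place, mirroring the two steps
// above. Intl.getCanonicalLocales throws a RangeError for malformed tags.
function canonicalizeTranslatorOptions(options) {
  for (const key of ["sourceLanguage", "targetLanguage"]) {
    const [canonical] = Intl.getCanonicalLocales(options[key]);
    options[key] = canonical;
  }
  return options;
}

const options = canonicalizeTranslatorOptions({
  sourceLanguage: "EN-us",
  targetLanguage: "zh-hant",
});
console.log(options); // { sourceLanguage: 'en-US', targetLanguage: 'zh-Hant' }
```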

To download the translation model, given a TranslatorCreateCoreOptions options:

Assert: these steps are running in parallel.
Initiate the download process for everything the user agent needs to translate text from options["sourceLanguage"] to options["targetLanguage"].

This could include both a base translation model and specific language arc material, or perhaps material for multiple language arcs if an intermediate language is used.
If the download process cannot be started for any reason, then return false.
Return true.

To initialize the translation model, given a TranslatorCreateCoreOptions options:

Assert: these steps are running in parallel.
Perform any necessary initialization operations for the AI model backing the user agent’s capabilities for translating from options["sourceLanguage"] to options["targetLanguage"].

This could include loading the model into memory, or loading any fine-tunings necessary to support the specific options in question.
If initialization failed for any reason, then return a DOMException error information whose name is "OperationError" and whose details contain appropriate detail.
Return null.

To create the translator object, given a realm realm and a TranslatorCreateCoreOptions options:

Assert: these steps are running on realm’s surrounding agent’s event loop.
Let inputQuota be the amount of input quota that is available to the user agent for future translation operations. (This value is implementation-defined, and may be +∞ if there are no specific limits beyond, e.g., the user’s memory, or the limits of JavaScript strings.)
Return a new Translator object, created in realm, with

source language

options["sourceLanguage"]

target language

options["targetLanguage"]

input quota

inputQuota
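
Because creation can involve a download, a page may want to surface progress. The sketch below assumes, based on the shared infrastructure rather than anything defined in this section, that the `monitor` callback receives an event target which fires "downloadprogress" events whose `loaded` property is a fraction between 0 and 1. The helper name `createWithProgress` and the `api` parameter are hypothetical.

```javascript
// Hypothetical helper: create a translator and report download progress
// as a whole-number percentage through `onProgress`.
async function createWithProgress(options, onProgress, api = globalThis.Translator) {
  return api.create({
    ...options,
    monitor(monitorTarget) {
      // Assumed event name and `loaded` semantics; see the lead-in.
      monitorTarget.addEventListener("downloadprogress", (event) => {
        onProgress(Math.round(event.loaded * 100));
      });
    },
  });
}
```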

3.2. Availability

The static availability(options) method steps are:

Return the result of computing AI model availability given options, "translator", validate and canonicalize translator options, and compute translator options availability.

To compute translator options availability given a TranslatorCreateCoreOptions options, perform the following steps. They return either an Availability value or null, and they mutate options in place to update language tags to their best-fit matches.

Assert: this algorithm is running in parallel.
Let availabilities be the user agent’s translator language arc availabilities.
If availabilities is null, then return null.
For each languageArc → availability in availabilities:
1. Let sourceLanguageBestFit be LookupMatchingLocaleByBestFit(« languageArc’s source language », « options["sourceLanguage"] »).
2. Let targetLanguageBestFit be LookupMatchingLocaleByBestFit(« languageArc’s target language », « options["targetLanguage"] »).
3. If sourceLanguageBestFit and targetLanguageBestFit are both not undefined, then:
  1. Set options["sourceLanguage"] to sourceLanguageBestFit.[[locale]].
  2. Set options["targetLanguage"] to targetLanguageBestFit.[[locale]].
  3. Return availability.
If (options["sourceLanguage"], options["targetLanguage"]) can be fulfilled by the identity translation, then return "available".

Such cases could also return "downloadable", "downloading", or "available" because of the above steps, if the user agent has specific entries in its translator language arc availabilities for the given language arc. However, the identity translation is always available, so this step ensures that we never return "unavailable" for such cases.

One language arc that can be fulfilled by the identity translation is ("en-US", "en-GB"). It is conceivable that an implementation might support a specialized model for this translation, which would show up in the translator language arc availabilities.

On the other hand, it’s pretty unlikely that an implementation has any specialized model for the language arc ("en-x-asdf", "en-x-xyzw"). In such a case, this step takes over, and later calls to the translate algorithm will use the identity translation.

Note that when this step takes over, options["sourceLanguage"] and options["targetLanguage"] are not modified, so if this algorithm is being called from create(), that means the resulting Translator object’s sourceLanguage and targetLanguage properties will return the original inputs, and not some canonicalized form.
Return "unavailable".

A language arc is a tuple of two strings, a source language and a target language. Each item is a Unicode canonicalized locale identifier.

The translator language arc availabilities are given by the following steps. They return a map from language arcs to Availability values, or null.

Assert: this algorithm is running in parallel.
If there is some error attempting to determine what language arcs the user agent can support translating text between, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
Return a map from language arcs to Availability values, where each key is a language arc that the user agent supports translating text between, filled according to the following constraints:
- If the user agent currently supports translating text from the source language to the target language of the language arc, then the map must contain an entry whose key is that language arc and whose value is "available".
- If the user agent believes it will be able to support translating text from the source language to the target language of the language arc, but only after finishing a download that is already ongoing, then the map must contain an entry whose key is that language arc and whose value is "downloading".
- If the user agent believes it will be able to support translating text from the source language to the target language of the language arc, but only after performing a not-currently ongoing download, then the map must contain an entry whose key is that language arc and whose value is "downloadable".
- The keys must not include any language arcs that overlap with the other keys.

Let’s suppose that the user agent’s translator language arc availabilities are as follows:

("en", "zh-Hans") → "available"
("en", "zh-Hant") → "downloadable"

The use of LookupMatchingLocaleByBestFit means that availability() will probably give the following answers:

function a(sourceLanguage, targetLanguage) {
  return Translator.availability({ sourceLanguage, targetLanguage });
}

await a("en", "zh-Hans") === "available";
await a("en", "zh-Hant") === "downloadable";

await a("en", "zh") === "available";            // zh will best-fit to zh-Hans

await a("en", "zh-TW") === "downloadable";      // zh-TW will best-fit to zh-Hant
await a("en", "zh-HK") === "downloadable";      // zh-HK will best-fit to zh-Hant
await a("en", "zh-CN") === "available";         // zh-CN will best-fit to zh-Hans

await a("en-US", "zh-Hant") === "downloadable"; // en-US will best-fit to en
await a("en-GB", "zh-Hant") === "downloadable"; // en-GB will best-fit to en

// Even very unexpected subtags will best-fit to en or zh-Hans
await a("en-Braille-x-lolcat", "zh-Hant") === "downloadable";
await a("en", "zh-BR-Kana") === "available";
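
To make the lookup loop concrete, here is a rough simulation. LookupMatchingLocaleByBestFit is implementation-defined, so the matcher below is only a stand-in: it assumes two tags match when they share a language and script after likely subtags are added with `Intl.Locale.prototype.maximize()`. The names `sameLanguageAndScript` and `lookupAvailability` are hypothetical, and the sketch omits the identity-translation fallback and the in-place update of options.

```javascript
// Stand-in for best-fit matching (an assumption, not the normative
// algorithm): compare language and script after adding likely subtags.
function sameLanguageAndScript(tagA, tagB) {
  const a = new Intl.Locale(tagA).maximize();
  const b = new Intl.Locale(tagB).maximize();
  return a.language === b.language && a.script === b.script;
}

// Mirrors the loop over language arcs: the first arc whose source and
// target both match decides the result.
function lookupAvailability(arcs, sourceLanguage, targetLanguage) {
  for (const [arcSource, arcTarget, availability] of arcs) {
    if (sameLanguageAndScript(arcSource, sourceLanguage) &&
        sameLanguageAndScript(arcTarget, targetLanguage)) {
      return availability;
    }
  }
  return "unavailable";
}

const arcs = [
  ["en", "zh-Hans", "available"],
  ["en", "zh-Hant", "downloadable"],
];
console.log(lookupAvailability(arcs, "en-GB", "zh-TW"));
console.log(lookupAvailability(arcs, "en", "zh"));
```

A real implementation is free to match differently, so the answers from this stand-in should be read as plausible rather than guaranteed.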

A language arc arc overlaps with a set of language arcs otherArcs if the following steps return true:

Let sourceLanguages be the set composed of the source languages of each item in otherArcs.
If LookupMatchingLocaleByBestFit(sourceLanguages, « arc’s source language ») is not undefined, then return true.
Let targetLanguages be the set composed of the target languages of each item in otherArcs.
If LookupMatchingLocaleByBestFit(targetLanguages, « arc’s target language ») is not undefined, then return true.
Return false.

The language arc ("en", "fr") overlaps with « ("en", "fr-CA") », so the user agent’s translator language arc availabilities cannot contain both of these language arcs at the same time.

Instead, a typical user agent will either support only one English-to-French language arc (presumably ("en", "fr")), or it could support multiple non-overlapping English-to-French language arcs, such as ("en", "fr-FR"), ("en", "fr-CA"), and ("en", "fr-CH").

In the latter case, if the web developer requested to create a translator using Translator.create({ sourceLanguage: "en", targetLanguage: "fr" }), the LookupMatchingLocaleByBestFit algorithm would choose one of the three possible language arcs to use (presumably ("en", "fr-FR")).
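
The overlap check can be sketched as follows. The `matches` function is a deliberately crude stand-in for "LookupMatchingLocaleByBestFit returns something other than undefined": it only compares primary language subtags, which is an assumption made for illustration. The object shape `{ source, target }` for a language arc is also hypothetical.

```javascript
// Stand-in matcher (assumption): a requested tag matches any available tag
// with the same primary language subtag.
function matches(availableTags, requestedTag) {
  const requested = new Intl.Locale(requestedTag).language;
  return availableTags.some((tag) => new Intl.Locale(tag).language === requested);
}

// Follows the steps above: check the source languages first, then the
// target languages.
function overlaps(arc, otherArcs) {
  const sourceLanguages = otherArcs.map((other) => other.source);
  if (matches(sourceLanguages, arc.source)) return true;
  const targetLanguages = otherArcs.map((other) => other.target);
  return matches(targetLanguages, arc.target);
}

const existing = [{ source: "en", target: "fr-CA" }];
console.log(overlaps({ source: "en", target: "fr" }, existing)); // true
console.log(overlaps({ source: "de", target: "ja" }, existing)); // false
```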

A language arc arc can be fulfilled by the identity translation if the following steps return true:

If LookupMatchingLocaleByBestFit(« arc’s source language », « arc’s target language ») is not undefined, then return true.
If LookupMatchingLocaleByBestFit(« arc’s target language », « arc’s source language ») is not undefined, then return true.
Return false.
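
A sketch of the identity check, again with a stand-in for best-fit matching: here two tags are assumed to match when they share a language and script after `Intl.Locale.prototype.maximize()` adds likely subtags. Because this stand-in is symmetric, a single comparison covers both steps above; a real best-fit matcher might not be symmetric. The function name is hypothetical.

```javascript
// Sketch: decide whether translating between two tags could be handled by
// returning the input unchanged.
function canBeFulfilledByIdentity(sourceLanguage, targetLanguage) {
  const source = new Intl.Locale(sourceLanguage).maximize();
  const target = new Intl.Locale(targetLanguage).maximize();
  return source.language === target.language && source.script === target.script;
}

console.log(canBeFulfilledByIdentity("en-US", "en-GB"));
console.log(canBeFulfilledByIdentity("en-x-asdf", "en-x-xyzw"));
console.log(canBeFulfilledByIdentity("en", "fr"));
```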

3.3. The `Translator` class

Every Translator has a source language, a string, set during creation.

Every Translator has a target language, a string, set during creation.

Every Translator has an input quota, a number, set during creation.

The sourceLanguage getter steps are to return this’s source language.

The targetLanguage getter steps are to return this’s target language.

The inputQuota getter steps are to return this’s input quota.

The translate(input, options) method steps are:

Let operation be an algorithm step which takes arguments chunkProduced, done, error, and stopProducing, and translates input given this’s source language, this’s target language, this’s input quota, chunkProduced, done, error, and stopProducing.
Return the result of getting an aggregated AI model result given this, options, and operation.

The translateStreaming(input, options) method steps are:

Let operation be an algorithm step which takes arguments chunkProduced, done, error, and stopProducing, and translates input given this’s source language, this’s target language, this’s input quota, chunkProduced, done, error, and stopProducing.
Return the result of getting a streaming AI model result given this, options, and operation.
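
From the developer's side, the stream returned by translateStreaming() can be consumed with a standard reader loop. In this sketch the helper name `collectStreamedTranslation` is hypothetical, and each chunk is assumed to be a string holding the next piece of translated text.

```javascript
// Hypothetical helper: read a translation stream chunk by chunk, passing
// each chunk to `onChunk` and returning the concatenated result.
async function collectStreamedTranslation(translator, input, onChunk = () => {}) {
  const reader = translator.translateStreaming(input).getReader();
  let translated = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    onChunk(value); // e.g. append to the page as text arrives
    translated += value;
  }
  return translated;
}
```

If the underlying operation fails, `reader.read()` rejects, so callers can wrap the call in `try`/`catch` in the same way as for translate().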

The measureInputUsage(input, options) method steps are:

Let measureUsage be an algorithm step which takes argument stopMeasuring, and returns the result of measuring translator input usage given input, this’s source language, this’s target language, and stopMeasuring.
Return the result of measuring AI model input usage given this, options, and measureUsage.

3.4. Translation

3.4.1. The algorithm

To translate given:

a string input,
a Unicode canonicalized locale identifier sourceLanguage,
a Unicode canonicalized locale identifier targetLanguage,
a number inputQuota,
an algorithm chunkProduced that takes a string and returns nothing,
an algorithm done that takes no arguments and returns nothing,
an algorithm error that takes error information and returns nothing, and
an algorithm stopProducing that takes no arguments and returns a boolean,

perform the following steps:

Assert: this algorithm is running in parallel.
Let requested be the result of measuring translator input usage given input, sourceLanguage, targetLanguage, and stopProducing.
If requested is null, then return.
If requested is an error information, then:
1. Perform error given requested.
2. Return.
Assert: requested is a number.
If requested is greater than inputQuota, then:
1. Let errorInfo be a quota exceeded error information with a requested of requested and a quota of inputQuota.
2. Perform error given errorInfo.
3. Return.
In reality, we expect that implementations will check the input usage against the quota as part of the same call into the model as the translation itself. The steps are only separated in the specification for ease of understanding.
In an implementation-defined manner, subject to the following guidelines, begin the process of translating input from sourceLanguage into targetLanguage.

If input is the empty string, or otherwise consists of no translatable content (e.g., only contains whitespace, or control characters), then the resulting translation should be input. In such cases, sourceLanguage and targetLanguage should be ignored.

If (sourceLanguage, targetLanguage) can be fulfilled by the identity translation, then the resulting translation should be input.

The translation process must conform to the guidance given in Writing Assistance APIs § 6 Privacy considerations and Writing Assistance APIs § 7 Security considerations, notably including (but not limited to) Writing Assistance APIs § 6.4 User input and Writing Assistance APIs § 7.2 Runtime shared resources.
While true:
1. Wait for the next chunk of translated text to be produced, for the translation process to finish, or for the result of calling stopProducing to become true.
2. If such a chunk is successfully produced:
  1. Let it be represented as a string chunk.
  2. Perform chunkProduced given chunk.
3. Otherwise, if the translation process has finished:
  1. Perform done.
  2. Break.
4. Otherwise, if stopProducing returns true, then break.
5. Otherwise, if an error occurred during translation:
  1. Let the error be represented as a DOMException error information errorInfo according to the guidance in § 3.4.3 Errors.
  2. Perform error given errorInfo.
  3. Break.

3.4.2. Usage

To measure translator input usage, given:

a string input,
a Unicode canonicalized locale identifier sourceLanguage,
a Unicode canonicalized locale identifier targetLanguage, and
an algorithm stopMeasuring that takes no arguments and returns a boolean,

perform the following steps:

Assert: this algorithm is running in parallel.
Let inputToModel be the implementation-defined string that would be sent to the underlying model in order to translate input from sourceLanguage to targetLanguage.

This might be just input itself, if sourceLanguage and targetLanguage were loaded into the model during initialization. Or it might consist of more, e.g. appropriate quota usage for encoding the languages in question, or some sort of wrapper prompt to a language model.

If during this process stopMeasuring starts returning true, then return null.

If an error occurs during this process, then return an appropriate DOMException error information according to the guidance in § 3.4.3 Errors.
Return the amount of input usage needed to represent inputToModel when given to the underlying model. The exact calculation procedure is implementation-defined, subject to the following constraints.

The returned input usage must be nonnegative and finite. It must be 0, if there are no usage quotas for the translation process (i.e., if the input quota is +∞). Otherwise, it must be positive and should be roughly proportional to the length of inputToModel.

This might be the number of tokens needed to represent input in a language model tokenization scheme, or it might be input’s length. It could also be some variation of these which also counts the usage of any prefixes or suffixes necessary to give to the model.

If during this process stopMeasuring starts returning true, then instead return null.

If an error occurs during this process, then instead return an appropriate DOMException error information according to the guidance in § 3.4.3 Errors.
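
A page can use measureInputUsage() together with inputQuota to avoid submitting input that would exceed the quota. The following is a minimal sketch; `translateIfWithinQuota` and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: measure the input first, and only translate when the
// measured usage fits within the translator's quota.
async function translateIfWithinQuota(translator, input) {
  const usage = await translator.measureInputUsage(input);
  if (usage > translator.inputQuota) {
    return { ok: false, usage, quota: translator.inputQuota };
  }
  return { ok: true, text: await translator.translate(input) };
}
```

When there are no quotas, the measured usage is 0 and inputQuota is +∞, so the comparison never blocks translation.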

3.4.3. Errors

When translation fails, the following possible reasons may be surfaced to the web developer. This table lists the possible DOMException names and the cases in which an implementation should use them:

`DOMException` name	Scenarios
"`NotAllowedError`"	Translation is disabled by user choice or user agent policy.
"`NotReadableError`"	The translation output was filtered by the user agent, e.g., because it was detected to be harmful, inaccurate, or nonsensical.
"`UnknownError`"	All other scenarios, including if the user agent believes it cannot translate and also meet the requirements given in Writing Assistance APIs § 6 Privacy considerations and Writing Assistance APIs § 7 Security considerations. Or, if the user agent would prefer not to disclose the failure reason.

This table does not give the complete list of exceptions that can be surfaced by the translator API. It only contains those which can come from certain implementation-defined steps.
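
One way a page might react to the names in the table is sketched below. The function name and the message strings are hypothetical; only the three exception names come from the table, and every other name falls through to a generic message.

```javascript
// Hypothetical helper: map a rejection from translate() to a user-facing
// message, keyed on the DOMException names listed in the table.
function describeTranslationError(error) {
  switch (error.name) {
    case "NotAllowedError":
      return "Translation is disabled by the user or by policy.";
    case "NotReadableError":
      return "The translation output was filtered.";
    case "UnknownError":
    default:
      return "Translation failed for an unspecified reason.";
  }
}

console.log(describeTranslationError({ name: "NotAllowedError" }));
```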

3.5. Permissions policy integration

Access to the translator API is gated behind the policy-controlled feature "translator", which has a default allowlist of 'self'.

4. The language detector API

[Exposed=Window, SecureContext]
interface LanguageDetector {
  static Promise<LanguageDetector> create(
    optional LanguageDetectorCreateOptions options = {}
  );
  static Promise<Availability> availability(
    optional LanguageDetectorCreateCoreOptions options = {}
  );

  Promise<sequence<LanguageDetectionResult>> detect(
    DOMString input,
    optional LanguageDetectorDetectOptions options = {}
  );

  readonly attribute FrozenArray<DOMString>? expectedInputLanguages;

  Promise<double> measureInputUsage(
    DOMString input,
    optional LanguageDetectorDetectOptions options = {}
  );
  readonly attribute unrestricted double inputQuota;
};
LanguageDetector includes DestroyableModel;

dictionary LanguageDetectorCreateCoreOptions {
  sequence<DOMString> expectedInputLanguages;
};

dictionary LanguageDetectorCreateOptions : LanguageDetectorCreateCoreOptions {
  AbortSignal signal;
  CreateMonitorCallback monitor;
};

dictionary LanguageDetectorDetectOptions {
  AbortSignal signal;
};

dictionary LanguageDetectionResult {
  DOMString detectedLanguage;
  double confidence;
};
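
A minimal usage sketch of the interface above follows. The helper name `detectTopLanguage` and its `api` parameter (a stand-in for the global `LanguageDetector`, so the function can be run against a stub) are hypothetical, and `destroy()` is assumed to come from the `DestroyableModel` mixin in the shared infrastructure.

```javascript
// Hypothetical helper: return the most confident detection result, or null
// when detection is unavailable or nothing better than "und" was found.
async function detectTopLanguage(text, expectedInputLanguages, api = globalThis.LanguageDetector) {
  if (!api) return null; // The API is not exposed in this environment.

  const availability = await api.availability({ expectedInputLanguages });
  if (availability === "unavailable") return null;

  const detector = await api.create({ expectedInputLanguages });
  try {
    // Results are ordered by descending confidence, with "und" last.
    const [top] = await detector.detect(text);
    return top.detectedLanguage === "und" ? null : top;
  } finally {
    detector.destroy(); // Assumed DestroyableModel method.
  }
}
```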

4.1. Creation

The static create(options) method steps are:

Return the result of creating an AI model object given options, "language-detector", validate and canonicalize language detector options, compute language detector options availability, download the language detector model, initialize the language detector model, and create the language detector object.

To validate and canonicalize language detector options given a LanguageDetectorCreateCoreOptions options, perform the following steps. They mutate options in place to canonicalize language tags, and throw an exception if any are invalid.

Validate and canonicalize language tags given options and "expectedInputLanguages".

To download the language detector model, given a LanguageDetectorCreateCoreOptions options:

Assert: these steps are running in parallel.
Initiate the download process for everything the user agent needs to detect the languages of input text, including all the languages in options["expectedInputLanguages"].

This could include both a base language detection model, and specific fine-tunings or other material to help with the languages identified in options["expectedInputLanguages"].
If the download process cannot be started for any reason, then return false.
Return true.

To initialize the language detector model, given a LanguageDetectorCreateCoreOptions options:

Assert: these steps are running in parallel.
Perform any necessary initialization operations for the AI model backing the user agent’s capabilities for detecting the languages of input text.

This could include loading the model into memory, or loading any fine-tunings necessary to support the languages identified in options["expectedInputLanguages"].
If initialization failed for any reason, then return a DOMException error information whose name is "OperationError" and whose details contain appropriate detail.
Return null.

To create the language detector object, given a realm realm and a LanguageDetectorCreateCoreOptions options:

Assert: these steps are running on realm’s surrounding agent’s event loop.
Let inputQuota be the amount of input quota that is available to the user agent for future language detection operations. (This value is implementation-defined, and may be +∞ if there are no specific limits beyond, e.g., the user’s memory, or the limits of JavaScript strings.)
Return a new LanguageDetector object, created in realm, with

expected input languages

the result of creating a frozen array given options["expectedInputLanguages"] if it is not empty; otherwise null

input quota

inputQuota

4.2. Availability

The static availability(options) method steps are:

Return the result of computing AI model availability given options, "language-detector", validate and canonicalize language detector options, and compute language detector options availability.

To compute language detector options availability given a LanguageDetectorCreateCoreOptions options, perform the following steps. They return either an Availability value or null, and they mutate options in place to update language tags to their best-fit matches.

Assert: this algorithm is running in parallel.
If there is some error attempting to determine what language detection capabilities the user agent can support, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
Let partition be the result of getting the language availabilities partition given the purpose of detecting text written in that language.
Return the result of computing language availability given options["expectedInputLanguages"] and partition.

4.3. The `LanguageDetector` class

Every LanguageDetector has an expected input languages, a FrozenArray<DOMString> or null, set during creation.

Every LanguageDetector has an input quota, a number, set during creation.

The expectedInputLanguages getter steps are to return this’s expected input languages.

The inputQuota getter steps are to return this’s input quota.

The detect(input, options) method steps are:

Let global be this’s relevant global object.

Assert: global is a Window object.
If global’s associated Document is not fully active, then return a promise rejected with an "InvalidStateError" DOMException.
Let signals be « this’s destruction abort controller’s signal ».
If options["signal"] exists, then append it to signals.
Let compositeSignal be the result of creating a dependent abort signal given signals using AbortSignal and this’s relevant realm.
If compositeSignal is aborted, then return a promise rejected with compositeSignal’s abort reason.
Let promise be a new promise created in this’s relevant realm.
Let abortedDuringOperation be false.

This variable will be written to from the event loop, but read from in parallel.
Add the following abort steps to compositeSignal:
1. Set abortedDuringOperation to true.
2. Reject promise with compositeSignal’s abort reason.
Let inputQuota be this’s input quota.
In parallel:
1. Let stopProducing be the following steps:
  1. Return abortedDuringOperation.
2. Let result be the result of detecting languages given input, inputQuota, and stopProducing.
3. Queue a global task on the AI task source given global to perform the following steps:
  1. If abortedDuringOperation is true, then abort these steps.
  2. Otherwise, if result is an error information, then reject promise with the result of converting error information into an exception object given result.
  3. Otherwise:
    1. Assert: result is a list of LanguageDetectionResult dictionaries. (It is not null, since in that case abortedDuringOperation would have been true.)
    2. Resolve promise with result.

The measureInputUsage(input, options) method steps are:

Let measureUsage be an algorithm step which takes argument stopMeasuring, and returns the result of measuring language detector input usage given input and stopMeasuring.
Return the result of measuring AI model input usage given this, options, and measureUsage.

4.4. Language detection

4.4.1. The algorithm

To detect languages given a string input, a number inputQuota, and an algorithm stopProducing that takes no arguments and returns a boolean, perform the following steps. They will return either null, an error information, or a list of LanguageDetectionResult dictionaries.

Assert: this algorithm is running in parallel.
Let requested be the result of measuring language detector input usage given input and stopProducing.
If requested is null or an error information, then return requested.
Assert: requested is a number.
If requested is greater than inputQuota, then return a quota exceeded error information with a requested of requested and a quota of inputQuota.

In reality, we expect that implementations will check the input usage against the quota as part of the same call into the model as the language detection itself. The steps are only separated in the specification for ease of understanding.
Let partition be the result of getting the language availabilities partition given the purpose of detecting text written in that language.
Let currentlyAvailableLanguages be partition["available"].
In an implementation-defined manner, subject to the following guidelines, let rawResult and unknown be the result of detecting the languages of input.

rawResult must be a map which has a key for each language in currentlyAvailableLanguages. The value for each such key must be a number between 0 and 1. This value must represent the implementation’s confidence that input is written in that language.

unknown must be a number between 0 and 1 that represents the implementation’s confidence that input is not written in any of the languages in currentlyAvailableLanguages.

The values of rawResult, plus unknown, must sum to 1. Each such value, or unknown, may be 0.

If the implementation believes input to be written in multiple languages, then it should attempt to apportion the values of rawResult and unknown such that they are proportionate to the amount of input written in each detected language. The exact scheme for apportioning input is implementation-defined.

If input is "tacosを食べる", the implementation might split this into "tacos" and "を食べる", and then detect the languages of each separately. The first part might be detected as English with confidence 0.5 and Spanish with confidence 0.5, and the second part as Japanese with confidence 1. The resulting rawResult then might be «[ "en" → 0.25, "es" → 0.25, "ja" → 0.5 ]» (with unknown set to 0).

The decision to split this into two parts, instead of e.g. the three parts "tacos", "を", and "食べる", was an implementation-defined choice. Similarly, the decision to treat each part as contributing to "half" of the result, instead of e.g. weighting by number of code points, was implementation-defined.

(Realistically, we expect that implementations will split on larger chunks than this, as generally more than 4-5 code points are necessary for most language detection models.)

If stopProducing returns true at any point during this process, then return null.

If an error occurred during language detection, then return an error information according to the guidance in § 4.4.3 Errors.

The detection process must conform to the guidance given in Writing Assistance APIs § 6 Privacy considerations and Writing Assistance APIs § 7 Security considerations, notably including (but not limited to) Writing Assistance APIs § 6.4 User input and Writing Assistance APIs § 7.2 Runtime shared resources.
Sort rawResult in descending order, with a less than algorithm which, given entries a and b, returns true if a’s value is less than b’s value.
Let results be an empty list.
Let cumulativeConfidence be 0.
For each key → value of rawResult:
1. If value is 0, then break.
2. If value is less than unknown, then break.
3. Append «[ "detectedLanguage" → key, "confidence" → value ]» to results.
4. Set cumulativeConfidence to cumulativeConfidence + value.
5. If cumulativeConfidence is greater than or equal to 0.99, then break.
Assert: 1 − cumulativeConfidence is greater than or equal to unknown.
Assert: If results’s size is greater than 0, then results[results’s size - 1]["confidence"] is greater than or equal to unknown.
Append «[ "detectedLanguage" → "und", "confidence" → unknown ]» to results.
Return results.

Languages which are less than 1% likely, or contribute to less than 1% of the text, are considered more likely to be noise and so not worth returning to the web developer. Similarly, if the implementation is less sure about a language than it is about the text not being in any of the languages it knows, that language is probably not worth returning to the web developer.

Because of such omitted low-probability results, the sum of all confidence values returned to the web developer could be less than 1.
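
The post-processing steps above can be sketched in TypeScript as follows. This is a minimal, non-normative illustration: the function name, the `DetectionResult` type, and the `unknownConfidence` parameter (standing in for unknown) are hypothetical, and rawResult and unknown are assumed to have been produced by the earlier steps of the algorithm.

```typescript
// Hypothetical shape of one entry returned to the web developer.
type DetectionResult = { detectedLanguage: string; confidence: number };

// Non-normative sketch: rawResult maps language tags to confidences, and
// unknownConfidence stands in for the spec's "unknown" value.
function finalizeDetectionResults(
  rawResult: Map<string, number>,
  unknownConfidence: number,
): DetectionResult[] {
  // Sort entries by confidence, highest first.
  const sorted = Array.from(rawResult).sort((a, b) => b[1] - a[1]);

  const results: DetectionResult[] = [];
  let cumulativeConfidence = 0;

  for (const [language, confidence] of sorted) {
    // Stop at zero-confidence entries.
    if (confidence === 0) break;
    // Stop once a language is less likely than "not a known language".
    if (confidence < unknownConfidence) break;

    results.push({ detectedLanguage: language, confidence });
    cumulativeConfidence += confidence;

    // Stop once the returned languages cover at least 99% of the confidence.
    if (cumulativeConfidence >= 0.99) break;
  }

  // The "und" entry always comes last.
  results.push({ detectedLanguage: "und", confidence: unknownConfidence });
  return results;
}
```

In this sketch, an input such as en = 0.6, ja = 0.3, fr = 0.02 with unknown = 0.08 yields entries for "en", "ja", and "und" only, so the returned confidences sum to less than 1.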

4.4.2. Usage

To measure language detector input usage, given a string input and an algorithm stopMeasuring that takes no arguments and returns a boolean, perform the following steps:

Assert: this algorithm is running in parallel.
Let inputToModel be the implementation-defined string that would be sent to the underlying model in order to detect languages given input.

This might be just input itself, or it might include some sort of wrapper prompt to a language model.

If during this process stopMeasuring starts returning true, then return null.

If an error occurs during this process, then return an appropriate DOMException error information according to the guidance in § 4.4.3 Errors.
Return the amount of input usage needed to represent inputToModel when given to the underlying model. The exact calculation procedure is implementation-defined, subject to the following constraints.

The returned input usage must be nonnegative and finite. It must be 0 if there are no usage quotas for the language detection process (i.e., if the input quota is +∞). Otherwise, it must be positive and should be roughly proportional to the length of inputToModel.

This might be the number of tokens needed to represent input in a language model tokenization scheme, or it might be input’s length. It could also be some variation of these which also counts the usage of any prefixes or suffixes necessary to give to the model.

If during this process stopMeasuring starts returning true, then instead return null.

If an error occurs during this process, then instead return an appropriate DOMException error information according to the guidance in § 4.4.3 Errors.
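
One possible calculation that satisfies these constraints is sketched below. It is only an illustration, not the required procedure: the function name is hypothetical, counting code points is just one of the options mentioned above, and clamping to a minimum of 1 is an assumption made here so that the result stays positive for an empty string.

```typescript
// Non-normative sketch of an input usage calculation.
// inputQuota is the detector's input quota; Infinity means "no quota".
function sketchInputUsage(inputToModel: string, inputQuota: number): number {
  // With no usage quota, the usage must be 0.
  if (inputQuota === Infinity) {
    return 0;
  }

  // Assumption: count code points rather than UTF-16 code units or tokens.
  const codePoints = Array.from(inputToModel).length;

  // Assumption: clamp to at least 1 so the result is always positive.
  return Math.max(1, codePoints);
}
```

An implementation backed by a language model would more likely count tokens, including any wrapper prompt, but the same two constraints (0 when unlimited, positive and roughly proportional otherwise) would still apply.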

4.4.3. Errors

When language detection fails, the following possible reasons may be surfaced to the web developer. This table lists the possible DOMException names and the cases in which an implementation should use them:

`DOMException` name	Scenarios
"`NotAllowedError`"	Language detection is disabled by user choice or user agent policy.
"`UnknownError`"	All other scenarios, including if the user agent believes it cannot detect and also meet the requirements given in Writing Assistance APIs § 6 Privacy considerations and Writing Assistance APIs § 7 Security considerations. Or, if the user agent would prefer not to disclose the failure reason.

This table does not give the complete list of exceptions that can be surfaced by the language detector API. It only contains those which can come from certain implementation-defined steps.
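
From the web developer's side, the two names in the table could be distinguished as in the following sketch. The function name, the `NamedError` type, and the message strings are hypothetical; the error is duck-typed on its `name` property so that the example does not depend on a `DOMException` global being present.

```typescript
// Minimal shape of a rejected detection; in a browser this would be a DOMException.
type NamedError = { name: string };

// Non-normative sketch: map a failure to a message suitable for logging or UI.
function describeDetectionFailure(error: NamedError): string {
  switch (error.name) {
    case "NotAllowedError":
      return "Language detection is disabled by user choice or user agent policy.";
    case "UnknownError":
      return "Language detection failed for an undisclosed or unspecified reason.";
    default:
      // The table is not exhaustive: other steps can surface other exceptions.
      return `Language detection failed: ${error.name}`;
  }
}
```

A page would typically call such a helper from the `catch` block around its detection call, and fall back to asking the user for the language or to skipping language-dependent features.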

4.5. Permissions policy integration

Access to the language detector API is gated behind the policy-controlled feature "language-detector", which has a default allowlist of 'self'.
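
Because the default allowlist is 'self', cross-origin frames need explicit delegation, either through the `Permissions-Policy` response header or through the `allow` attribute on the iframe element. The helper below is a hypothetical sketch that builds the header's directive for a given set of extra origins; the function name and the example origin are illustrative only.

```typescript
// Non-normative sketch: build a Permissions-Policy directive for the
// "language-detector" feature, always including the document's own origin.
function languageDetectorPolicy(extraOrigins: string[] = []): string {
  // Origins are written as quoted strings; "self" is an unquoted token.
  const allowlist = ["self", ...extraOrigins.map((origin) => `"${origin}"`)];
  return `language-detector=(${allowlist.join(" ")})`;
}

// Example: allow the page itself plus one embedded origin.
const directive = languageDetectorPolicy(["https://example.com"]);
// A server would send this as:  Permissions-Policy: <directive>
```

The embedding page would additionally need `allow="language-detector"` on the relevant iframe for the delegation to take effect in that frame.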

5. Privacy considerations

Please see Writing Assistance APIs § 6 Privacy considerations for a discussion of privacy considerations for the translator and language detector APIs. That text was written to apply to all APIs sharing the same infrastructure, as noted in § 2 Dependencies.

6. Security considerations

Please see Writing Assistance APIs § 7 Security considerations for a discussion of security considerations for the translator and language detector APIs. That text was written to apply to all APIs sharing the same infrastructure, as noted in § 2 Dependencies.
