Web Neural Network API

Editor’s Draft,

This version:
https://webmachinelearning.github.io/webnn/
Latest published version:
https://www.w3.org/TR/webnn/
Implementation Report:
https://wpt.fyi/results/webnn?label=master&label=experimental&aligned&q=webnn
Test Suite:
https://github.com/web-platform-tests/wpt/tree/master/webnn
Feedback:
GitHub
Inline In Spec
Editors:
Ningxin Hu (Intel Corporation)
Dwayne Robinson (Microsoft Corporation)
Former Editor:
Chai Chaoweeraprasit (Microsoft Corporation)
Other:
Implementation Status, Explainer, Samples

Abstract

This document describes a dedicated low-level API for neural network inference hardware acceleration.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the Web Machine Learning Working Group as an Editors' Draft. This document is intended to become a W3C Recommendation. Feedback and comments on this specification are welcome. Please use GitHub issues. Discussions may also be found in the public-webmachinelearning@w3.org archives.

Publication as an Editors' Draft does not imply endorsement by W3C and its Members. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

Since the initial Candidate Recommendation Snapshot, the Working Group has gathered further implementation experience and added new operations and data types needed for well-known transformers to support generative AI use cases. In addition, informed by this implementation experience, the group removed MLCommandEncoder, support for synchronous execution, and higher-level operations that can be expressed in terms of lower-level primitives in a performant manner. The group has also updated the specification to use modern authoring conventions to improve interoperability and precision of normative definitions. The group is developing a new feature, a backend-agnostic storage type, to improve performance and interoperability between the WebNN and WebGPU APIs and purpose-built hardware for ML, and expects to republish this document as a Candidate Recommendation Snapshot when ready for implementation. This document is maintained and updated at any time. Some parts of this document are work in progress and further improvements are expected to be reflected in revised Candidate Recommendation Drafts and Snapshots.

Before requesting transition to Proposed Recommendation, the Working Group will seek to demonstrate that:

1. Introduction

The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries.

For an illustrated introduction, please see the explainer.

2. Use cases

2.1. Application Use Cases

This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].

Note: Please be aware that some of the use cases described here are, by their very nature, privacy-invasive. Developers who are planning to use the API for such use cases should ensure that the API is being used to benefit users, for purposes that users understand and approve of. They should apply the Ethical Principles for Web Machine Learning [webmachinelearning-ethics] and implement appropriate privacy risk mitigations such as transparency, data minimisation, and user controls.

2.1.1. Person Detection

A user opens a web-based video conferencing application, but she temporarily leaves her room. The application monitors whether she is in front of her PC by using object detection (for example, approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.

When she comes back, the application automatically detects her and notifies other online users that she is active now.

2.1.2. Semantic Segmentation

A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+], [MaskR-CNN] or [SegAny] to semantically split an image into segments, and replaces the segments that represent other people and the background with another picture.

2.1.3. Skeleton Detection

A web-based video conferencing application tracks the pose of the user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking in the teleconference.

2.1.4. Face Recognition

There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects the faces of participants by using object detection (for example, approaches such as [SSD]) and checks whether each face was present at the previous meeting by running a machine learning model such as [FaceNet], which verifies whether two face images belong to the same person.

2.1.5. Facial Landmark Detection

A user wants to find new glasses that fit her well on an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks such as the eyes, nose, and mouth. When she chooses a pair of glasses, the simulator renders the selected glasses at the detected position of the eyes on her facial image.

2.1.6. Style Transfer

A user is looking for cosmetics on an online store and wondering which color may suit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can use the simulator to check how the selected makeup looks on her face.

2.1.7. Super Resolution

A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video drops due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.

2.1.8. Image Captioning

For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt], which predicts descriptive captions for the images in the presentation slides.

2.1.9. Text-to-image

Images are a core part of modern web experiences. An ability to generate images based on text input in a privacy-preserving manner enables visual personalization and adaptation of web applications and content. For example, a web application can use a natural language description on the web page, or a description provided by the user in a text prompt, as input to produce an image matching the text description. This text-to-image use case, enabled by the latent diffusion model architecture [LDM], forms the basis for additional text-to-image use cases. Examples include inpainting, where a portion of an existing image on the web page is selectively modified using newly generated content, and the converse, outpainting, where an original image is extended beyond its original dimensions by filling the empty space with generated content.

2.1.10. Machine Translation

Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each message into a different language.

2.1.11. Emotion Analysis

A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji], which infers emotion from input texts, and displays an emoji that represents the estimated emotion.

2.1.12. Video Summarization

A web-based video conferencing application records received video streams and needs to reduce the amount of recorded video data to be stored. The application generates a short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].

2.1.13. Noise Suppression

A web-based video conferencing application records received audio streams, but background noise is almost always present. The application leverages real-time noise suppression using a recurrent neural network such as [RNNoise] to suppress dynamic background noise, like a crying baby or a barking dog, to improve the audio experience in video conferences.

2.1.14. Speech Recognition

Speech recognition, also known as speech to text, enables recognition and translation of spoken language into text. Example applications of speech recognition include transcription, automatic translation, multimodal interaction, real-time captioning and virtual assistants. Speech recognition improves accessibility of auditory content and makes it possible to interact with such content in a privacy-preserving manner in a textual form. Examples of common use cases include watching videos or participating in online meetings using real-time captioning. Models such as [Whisper] approach humans in their accuracy and robustness and are well positioned to improve accessibility of such use cases.

2.1.15. Text Generation

Various text generation use cases are enabled by large language models (LLM) that are able to perform tasks where a general ability to predict the next item in a text sequence is required. This class of models can translate texts, answer questions based on a text input, summarize a larger body of text, or generate text output based on a textual input. LLMs enable better performance compared to older models based on RNN, CNN, or LSTM architectures and further improve the performance of many other use cases discussed in this section. Examples of LLMs include [t5-small], [m2m100_418M], [gpt2], and [llama-2-7b].

2.1.16. Detecting fake video

A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. A fake video can swap a speaker’s face with the president’s face to incite a user politically or to manipulate a user’s opinion. Deepfake detection applications such as [FaceForensics++] analyze videos and protect a user against fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time.

2.2. Framework Use Cases

This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.

2.2.1. Custom Layer

A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions, such as [LeakyReLU] and [ELU], are not included in the WebNN API. To address this issue, she constructs custom layers for these additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
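As a rough illustration of such a custom layer, the sketch below composes LeakyReLU and ELU from simpler element-wise primitives. The primitives (`maxScalar`, `minScalar`, `mulScalar`, `subScalar`, `exp`, `add`) are hypothetical plain-TypeScript stand-ins operating on number arrays, not WebNN API calls; in an actual custom layer each would typically correspond to a graph-building operation.

```typescript
// A minimal sketch: tensors are modeled as flat number arrays, and the
// "primitives" below are hypothetical stand-ins for graph operations.
type Tensor = number[];

const maxScalar = (a: Tensor, s: number): Tensor => a.map((v) => Math.max(v, s));
const minScalar = (a: Tensor, s: number): Tensor => a.map((v) => Math.min(v, s));
const mulScalar = (a: Tensor, s: number): Tensor => a.map((v) => v * s);
const subScalar = (a: Tensor, s: number): Tensor => a.map((v) => v - s);
const exp = (a: Tensor): Tensor => a.map((v) => Math.exp(v));
const add = (a: Tensor, b: Tensor): Tensor => a.map((v, i) => v + b[i]);

// LeakyReLU(x) = max(x, 0) + alpha * min(x, 0)
function leakyRelu(x: Tensor, alpha = 0.01): Tensor {
  return add(maxScalar(x, 0), mulScalar(minScalar(x, 0), alpha));
}

// ELU(x) = max(x, 0) + alpha * (exp(min(x, 0)) - 1)
function elu(x: Tensor, alpha = 1.0): Tensor {
  return add(maxScalar(x, 0), mulScalar(subScalar(exp(minScalar(x, 0)), 1), alpha));
}
```

Expressing an activation this way lets it run wherever the underlying primitives are available, at the cost of several operations instead of a single fused one.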

2.2.2. Network Concatenation

A web application uses a DNN model, and its model data of upper convolutional layers and lower fully-connected layers are stored in separate files, since model data of the fully-connected layers are periodically updated due to fine tuning at the server side.

Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.

2.2.3. Performance Adaptation

A web application developer is concerned about the performance of her DNN model on mobile devices. She has confirmed that it may run too slowly on mobile devices that do not have GPU acceleration. To address this issue, her web application queries the WebNN API to check whether acceleration is available, so that the application can display a warning on devices without acceleration.

After several weeks, she has developed a tiny DNN model that can even run on CPU. In order to accommodate CPU execution, she modifies the application so that the application loads the tiny model in the case of CPU-only devices.
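The adaptation logic could be sketched as follows, assuming that requesting a context for an unavailable device type fails, as described in § 4 Privacy Considerations. The `createContext` parameter is an injected stand-in for `navigator.ml.createContext()` so that the selection logic can run outside a browser, and `FULL_MODEL_URL` and `TINY_MODEL_URL` are hypothetical placeholders.

```typescript
type DeviceType = "cpu" | "gpu" | "npu";

// Injected stand-in for navigator.ml.createContext(); assumed to reject when
// the requested device type cannot be satisfied.
type CreateContext = (options: { deviceType: DeviceType }) => Promise<unknown>;

interface ModelChoice {
  deviceType: DeviceType;
  modelUrl: string;
  showWarning: boolean; // true when no accelerated device was available
}

// Hypothetical model locations, for illustration only.
const FULL_MODEL_URL = "models/full-model.bin";
const TINY_MODEL_URL = "models/tiny-model.bin";

async function chooseModel(createContext: CreateContext): Promise<ModelChoice> {
  // Try accelerated device types first.
  for (const deviceType of ["gpu", "npu"] as const) {
    try {
      await createContext({ deviceType });
      return { deviceType, modelUrl: FULL_MODEL_URL, showWarning: false };
    } catch {
      // This device type is unavailable; try the next one.
    }
  }
  // CPU-only device: load the tiny model and warn the user.
  return { deviceType: "cpu", modelUrl: TINY_MODEL_URL, showWarning: true };
}
```

Because the underlying platform may fall back to other devices for parts of the graph (see § 7.2.1 MLContextOptions), a successful request is a heuristic for the presence of acceleration, not a guarantee.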

2.2.4. Operation Level Execution

A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on a hardware device, such as a CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework executes all the operations on the same device. For a compute-intensive operation, such as 2D convolution or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on the selected device.

2.2.5. Integration with real-time video processing

The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a § 2.1.2 Semantic Segmentation model blurs the background in the user’s live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.

3. Security Considerations

This specification defines a low-level API for neural network inference hardware acceleration. This API is considered a powerful feature [POWERFUL-FEATURES] because it grants low-level access to a user’s computer. To meet the authentication and confidentiality expectations of a powerful feature and to prevent man-in-the-middle attacks, all interfaces defined by this specification are only available in a secure context.

This API is disabled by default in all cross-origin frames using the § 6.4 Permissions Policy Integration. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.

This API allows creation of an MLContext from a GPUDevice defined by the WebGPU specification. See WebGPU Security Considerations for more information regarding the security characteristics of this context.

This API provides an abstraction across GPU, CPU, and dedicated ML accelerator hardware. When using a GPU, denial of service considerations similar to WebGPU apply. When using a CPU or a dedicated ML accelerator, the types of potential resource contention are different and mitigations will be implementation and configuration dependent. Implementations should use whatever mechanisms are available from the platform to prevent sites from using an unfair amount of system resources. These compute units are shared resources, and the use of any compute API will affect overall performance on a fully-loaded system.

Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. Bounds checking occurs when the MLContext.dispatch() method is invoked to execute the graph against the actual data. No actual data is bound to the compiled graph before this stage. It is the implementation’s responsibility to ensure that proper bounds checking occurs against the shapes of the data already inferred by that time.
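A minimal sketch of such a dispatch-time check, assuming each bound buffer must hold exactly the number of bytes implied by the operand's inferred shape and data type. `OperandDesc`, `BYTES_PER_ELEMENT`, `expectedByteLength` and `checkBindings` are hypothetical names, and the data type list is illustrative rather than normative.

```typescript
type DataType = "float32" | "float16" | "int32" | "uint32" | "int64" | "uint64" | "int8" | "uint8";

const BYTES_PER_ELEMENT: Record<DataType, number> = {
  float32: 4, float16: 2, int32: 4, uint32: 4, int64: 8, uint64: 8, int8: 1, uint8: 1,
};

interface OperandDesc {
  dataType: DataType;
  shape: number[]; // finalized when the graph is compiled
}

function expectedByteLength(desc: OperandDesc): number {
  // An empty shape denotes a scalar, which holds one element.
  const elements = desc.shape.reduce((acc, dim) => acc * dim, 1);
  return elements * BYTES_PER_ELEMENT[desc.dataType];
}

// Throws if any binding is missing, unexpected, or has the wrong size.
function checkBindings(expected: Map<string, OperandDesc>, bound: Map<string, ArrayBuffer>): void {
  expected.forEach((desc, name) => {
    const buffer = bound.get(name);
    if (buffer === undefined) {
      throw new TypeError(`missing binding for "${name}"`);
    }
    const want = expectedByteLength(desc);
    if (buffer.byteLength !== want) {
      throw new TypeError(`"${name}": expected ${want} bytes, got ${buffer.byteLength}`);
    }
  });
  bound.forEach((_buffer, name) => {
    if (!expected.has(name)) throw new TypeError(`unexpected binding "${name}"`);
  });
}
```

Because the shapes are fixed before any data is bound, the check is a simple comparison per binding and can run before any device work is scheduled.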

Document operations susceptible to out-of-bounds access as guidance to implementers.

Implementations must defend against control-flow attacks based on changes to data considered to be constant. For example, optimizations in the underlying platform may assume that a weight remains unchanged throughout a computation. If the API allowed the contents of buffers holding weights to change during a computation then those optimization assumptions would be invalidated, causing undefined behavior in the underlying platform. The API mitigates this category of attacks from script by always copying or transferring buffers, but implementations should consider additional defenses such as process isolation of data assumed to be constant.

As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specification. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.

Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.

In order not to allow an attacker to target a specific implementation that may contain a flaw, the § 6.2 Device Selection mechanism is a hint only, and the concrete device selection is left to the implementation; for instance, a user agent could choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.

Hinting partially mitigates the concern. Investigate additional mitigations.

The API design minimizes the attack surface for the compiled computational graph. The MLGraphBuilder interface that hosts the various operations is a data definition API and as such doesn’t execute anything; it only constructs data. It follows that the potential for an attack is limited to binding the data to the graph before executing it by invoking the MLContext.dispatch() method. This enables implementers to focus on hardening the MLContext.dispatch() method, for example by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.

Purpose-built Web APIs for measuring high-resolution time mitigate timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [hr-time-3]. The practical deployment of WebNN implementations is likely to introduce enough jitter to make timing attacks impractical (e.g. because they would use IPC), but implementers are advised to consider and test their implementations against timing attacks.

3.1. Guidelines for new operations

This section is non-normative.

To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:

In general, always consider the security and privacy implications as documented in [security-privacy-questionnaire] by the Technical Architecture Group and the Privacy Interest Group when adding new features.

4. Privacy Considerations

This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.

This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.

No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.

Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.

Unlike WebGPU, this API does not intrinsically support custom shader authoring and, as a result, is not prone to timing attacks that rely on shader caches or other persistent data. The API builds upon pre-existing shaders and lower-level primitives of the browser or the underlying OS. Web developers who interface with GPUDevice are expected to be aware of WebGPU compilation cache considerations.

The WebGPU API identifies machine-specific artifacts as a privacy consideration. Similarly, the WebNN API’s compute unit scheduling may under certain circumstances introduce a fingerprint. However, similarly to WebGPU, such fingerprints are identical across most or all of the devices of each vendor, mitigating the concern. Furthermore, software implementations can be used to further eliminate such artifacts.

The WebNN API defines two developer-settable preferences to help inform § 6.2 Device Selection and allow the implementation to better select the most appropriate underlying execution device for the workload. An MLDeviceType normatively indicates the kind of device and is one of: "cpu", "gpu", "npu". If this type cannot be satisfied, an "OperationError" DOMException is thrown, thus this type can in some cases add two bits of entropy to the fingerprint. An MLPowerPreference indicates preference as related to the power consumption and is considered a hint only and as such does not increase entropy of the fingerprint.

MLContextOptions is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. [Issue #623]

If a future version of this specification introduces support for a new MLDeviceType that can only support a subset of MLOperandDataTypes, that could introduce a new fingerprint.

In general, implementers of this API are expected to apply WebGPU Privacy Considerations to their implementations where applicable.

5. Ethical Considerations

The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. The Working Group publishes and maintains an Ethical Principles for Web Machine Learning document [webmachinelearning-ethics] open to contributions from the wider community via a dedicated GitHub repository.

6. Programming Model

6.1. Overview

At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.

The MLGraph interface represents a compiled computational graph that is immutable (that is, a model).

The MLGraphBuilder interface serves as a builder (factory) to construct a computational graph (its graph) that is then compiled to create an MLGraph.

In WebNN, a computational graph is composed of operators which act on data, and are the nodes of the graph. MLOperands are a representation of data that flows within the computational graph, and are the edges of the graph. MLOperands include a computational graph's input values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. An operator's input is one or more MLOperands. An operator's output is one or more MLOperands. Operators have operator-specific parameters that control their behavior, which can include zero or more activation functions.

A key part of the MLGraphBuilder interface is its set of methods, such as gemm() and relu(), that create an operator representing the actual operation to perform on the input data when the computation is run, and return a new MLOperand holding the operator. Methods that create an MLOperand connect any inputs and activations to the operator. Each method invocation returns a distinct new value, without changing the value of any other MLOperand.

An operator has a label, a string which may be included in diagnostics such as exception messages. When an operator is created its label is initialized in an implementation-defined manner and may include the passed label.

Note: Implementations are encouraged to use the label provided by developers to enhance error messages and improve debuggability, both for synchronous errors during graph construction and for errors that occur during the asynchronous build() method.

Consider adding a mechanism for reporting errors during dispatch(). [Issue #778]

At inference time, every MLOperand will be bound to a tensor (the actual data), which is essentially a multidimensional array. The representation of tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).

Operations within the computational graph have functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape or slice may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.
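To illustrate how functional semantics permit such sharing, the sketch below models a tensor as a flat `Float32Array` plus a shape. `ToyTensor`, `reshape` and `sliceRows` are hypothetical; `sliceRows` only slices along the leading axis, where the selected elements are contiguous, whereas a general slice may require strided views or a copy.

```typescript
// Hypothetical minimal tensor: a flat buffer plus shape metadata.
interface ToyTensor {
  data: Float32Array;
  shape: number[];
}

const numElements = (shape: number[]): number => shape.reduce((acc, dim) => acc * dim, 1);

// reshape: same data, new metadata. The entire buffer is shared.
function reshape(t: ToyTensor, shape: number[]): ToyTensor {
  if (numElements(shape) !== t.data.length) throw new RangeError("element count mismatch");
  return { data: t.data, shape };
}

// Slice along the leading axis: a contiguous part of the buffer is shared.
function sliceRows(t: ToyTensor, start: number, count: number): ToyTensor {
  const rowSize = numElements(t.shape.slice(1));
  return {
    // subarray() creates a view over the same ArrayBuffer, not a copy.
    data: t.data.subarray(start * rowSize, (start + count) * rowSize),
    shape: [count, ...t.shape.slice(1)],
  };
}
```

Since no operation mutates its inputs, this sharing is unobservable from the graph's point of view.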

Before execution, the computational graph that is used to compute one or more specified outputs needs to be converted, compiled, and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion. The user agent may also perform these optimizations during graph conversion.

The MLGraphBuilder.build() method compiles the graph in the background without blocking the calling thread, and returns a Promise that resolves to an MLGraph. Each MLGraphBuilder can build at most one MLGraph.
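The builder model described in this section (operators as nodes, MLOperands as edges, a distinct new operand per method call, and at most one build per builder) can be sketched with a small, hypothetical `ToyGraphBuilder`. It is not the WebNN API: it only records graph structure, and its `build()` is synchronous for brevity, whereas MLGraphBuilder.build() compiles in the background and returns a Promise.

```typescript
// Hypothetical toy model of the graph-builder pattern; not the WebNN API.
interface Operator {
  readonly kind: string;
  readonly inputs: readonly Operand[];
  readonly label: string;
}

interface Operand {
  readonly id: number;
  readonly producer: Operator | null; // null for graph inputs and constants
}

class ToyGraphBuilder {
  private nextId = 0;
  private built = false;

  private newOperand(producer: Operator | null): Operand {
    return { id: this.nextId++, producer };
  }

  input(): Operand { return this.newOperand(null); }
  constant(): Operand { return this.newOperand(null); }

  // Each call creates a new operator node and returns a distinct output operand.
  private op(kind: string, inputs: Operand[], label?: string): Operand {
    return this.newOperand({ kind, inputs, label: label ?? kind });
  }

  gemm(a: Operand, b: Operand, label?: string): Operand { return this.op("gemm", [a, b], label); }
  relu(x: Operand, label?: string): Operand { return this.op("relu", [x], label); }

  // Returns operator labels in dependency order; a builder builds at most once.
  build(outputs: Record<string, Operand>): string[] {
    if (this.built) throw new Error("this builder has already built a graph");
    this.built = true;
    const order: string[] = [];
    const seen = new Set<Operator>();
    const visit = (operand: Operand): void => {
      const producer = operand.producer;
      if (producer === null || seen.has(producer)) return;
      seen.add(producer);
      producer.inputs.forEach(visit);
      order.push(producer.label);
    };
    Object.keys(outputs).forEach((key) => visit(outputs[key]));
    return order;
  }
}
```

For example, `relu(gemm(x, w, "fc1"))` builds to the order `["fc1", "relu"]`, and a second `build()` call on the same builder throws.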

The MLGraph underlying implementation will be composed of platform-specific representations of operators and operands which correspond to the MLGraphBuilder's operators and MLOperands, but which are not script-visible and may be compositions or decompositions of the graph as constructed by script.

Once the MLGraph is constructed, the MLContext.dispatch() method performs the execution of the graph asynchronously either on a parallel timeline in a separate worker thread for the CPU execution or on a GPU timeline in a GPU command queue. This method returns immediately without blocking the calling thread while the actual execution is offloaded to a different timeline. The caller supplies the input values using MLNamedTensors, binding the input MLOperands to their values. The caller also supplies MLNamedTensors for output MLOperands which will contain the result of graph execution, if successful, which may be read back to script using the MLContext.readTensor(tensor) method. This type of execution supports CPU, GPU, and NPU devices.
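The non-blocking contract of dispatch() and the read-back through readTensor() can be modeled, very loosely, with a promise chain standing in for the separate execution timeline. `ToyContext` is hypothetical and does not reflect the real MLContext signatures: its `dispatch()` takes a callback instead of a graph and MLNamedTensors, and its `readTensor()` takes a plain `Float32Array`.

```typescript
// Hypothetical toy model of dispatch()/readTensor() ordering; not the WebNN API.
class ToyContext {
  // A promise chain stands in for the separate execution timeline.
  private timeline: Promise<void> = Promise.resolve();
  readonly log: string[] = [];

  // Enqueues the execution and returns immediately, with no result.
  dispatch(name: string, execute: () => void): void {
    this.timeline = this.timeline.then(() => {
      execute();
      this.log.push(`executed ${name}`);
    });
    this.log.push(`dispatched ${name}`);
  }

  // Resolves after all previously dispatched work has run, with a copy of the data.
  readTensor(tensor: Float32Array): Promise<Float32Array> {
    return this.timeline.then(() => tensor.slice());
  }
}
```

Right after `dispatch()` returns, only the "dispatched" entry is logged; the read-back promise is the point at which script observes the results.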

6.2. Device Selection

An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. In addition to the default method of creation with MLContextOptions, an MLContext could also be created from a specific GPUDevice that is already in use by the application.

In a situation when a GPU context executes a graph with a constant or an input in the system memory as an ArrayBufferView, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. These data upload and download cycles occur only when the execution device requires the data to be copied out of and back into the system memory, as in the case of the GPU. They do not occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may use a native memory access pattern for intermediate results within the graph, the output of the last operation of the graph must convert the content back to a known layout format in order to maintain the expected behavior from the caller’s perspective.

When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s MLPowerPreference and MLDeviceType options.

6.3. Task Source

The ML task source is a task source to be used for all tasks related to asynchronous compilation and execution of MLGraphs and creation of MLContexts.

To queue an ML task given a global object global and a series of steps steps, queue a global task on the ML task source with global and steps.

6.4. Permissions Policy Integration

This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.
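A simplified model of what the default allowlist 'self' implies for the cross-origin frames mentioned in § 3 Security Considerations. It is an assumption-laden sketch: `FrameInfo` and `isWebnnAllowed` are hypothetical, and the model ignores `Permissions-Policy` response headers, origin lists inside the `allow` attribute, and other details of the Permissions Policy specification.

```typescript
// Simplified, hypothetical model of the "webnn" policy-controlled feature.
interface FrameInfo {
  origin: string;
  parentOrigin: string | null; // null for a top-level document
  parentAllowed: boolean; // whether the embedding document itself has the feature
  allowAttribute?: string; // value of the <iframe allow="..."> attribute, if any
}

function isWebnnAllowed(frame: FrameInfo): boolean {
  // Top-level documents are covered by the default allowlist 'self'.
  if (frame.parentOrigin === null) return true;
  // A frame cannot gain the feature if its embedder does not have it.
  if (!frame.parentAllowed) return false;
  // Same-origin frames match 'self'.
  if (frame.origin === frame.parentOrigin) return true;
  // Cross-origin frames need an explicit grant such as allow="webnn".
  const features = (frame.allowAttribute ?? "")
    .split(";")
    .map((directive) => directive.trim().split(/\s+/)[0]);
  return features.includes("webnn");
}
```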

7. API

7.1. The navigator.ml interface

An ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml.

interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;

7.2. ML interface

enum MLDeviceType {
  "cpu",
  "gpu",
  "npu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {
  MLDeviceType deviceType = "cpu";
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  Promise<MLContext> createContext(GPUDevice gpuDevice);
};

7.2.1. MLContextOptions

MLContextOptions is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. The Working Group is considering additional API controls to allow the definition of a fallback device, multiple devices in a preferred order, or an exclusion of a specific device. Other considerations under discussion include error handling, ultimate fallback, and quantized operators. Feedback is welcome on any of these design considerations from web developers, library authors, OS and hardware vendors, and other stakeholders via GitHub: [Issue #623]

The deviceType option is an MLDeviceType and indicates the application’s preference for the kind of device used for the context. It is one of the following:

"cpu"
Provides the broadest compatibility and usability across all client devices with varying degrees of performance.
"gpu"
Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations. The underlying platform implementation may fall back to other devices for certain operators and parts of the graph.
"npu"
Provides power efficiency for sustained workloads across hardware platforms with purpose-built accelerators. The underlying platform implementation may fall back to other devices for certain operators and parts of the graph.

The powerPreference option is an MLPowerPreference and indicates the application’s preference as related to power consumption. It is one of the following:

"default"
Let the user agent select the most suitable behavior.
"high-performance"
Prioritizes execution speed over power consumption.
"low-power"
Prioritizes power consumption over other considerations such as execution speed.

7.2.2. createContext()

Arguments:

Returns: Promise<MLContext>.
To create a context given realm realm and options (a GPUDevice or MLContextOptions), run these steps:
  1. Let context be a new MLContext object with realm.

  2. If options is a GPUDevice object:

    1. Set context.[[contextType]] to "webgpu".

    2. Set context.[[deviceType]] to "gpu".

    3. Set context.[[powerPreference]] to "default".

  3. Otherwise:

    1. Set context.[[contextType]] to "default".

    2. Set context.[[lost]] to a new promise.

    3. If options["deviceType"] exists, then set context.[[deviceType]] to options["deviceType"]. Otherwise, set context.[[deviceType]] to "cpu".

    4. If options["powerPreference"] exists, then set context.[[powerPreference]] to options["powerPreference"]. Otherwise, set context.[[powerPreference]] to "default".

  4. If the user agent cannot support context.[[contextType]], context.[[deviceType]] and context.[[powerPreference]], return failure.

  5. Return context.
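The option-resolution part of the create a context steps can be modelled in plain JavaScript. This is a minimal sketch under stated assumptions, not a user agent implementation: createContextModel, FakeGPUDevice and the isSupported callback are hypothetical names, failure is modelled as null, and the [[lost]] promise is left out.

```javascript
// Hypothetical stand-in for a WebGPU GPUDevice.
class FakeGPUDevice {}

// Models the "create a context" steps; returns null to represent failure.
// `isSupported` plays the role of the user agent's capability check.
function createContextModel(options, isSupported = () => true) {
  const context = {};
  if (options instanceof FakeGPUDevice) {
    context.contextType = 'webgpu';
    context.deviceType = 'gpu';
    context.powerPreference = 'default';
  } else {
    context.contextType = 'default';
    // Use the documented defaults when a member is absent.
    context.deviceType = options.deviceType ?? 'cpu';
    context.powerPreference = options.powerPreference ?? 'default';
  }
  return isSupported(context) ? context : null;
}
```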

The createContext(options) method steps are:
  1. Let global be this's relevant global object.

  2. If global’s associated Document is not allowed to use the webnn feature, return a new promise rejected with a "SecurityError" DOMException.

  3. Let realm be this's relevant realm.

  4. Let promise be a new promise.

  5. Run the following steps in parallel.

    1. Let context be the result of creating a context given realm and options. If that returns failure, then queue an ML task with global to reject promise with a "NotSupportedError" DOMException and abort these steps.

    2. Queue an ML task with global to resolve promise with context.

  6. Return promise.

The createContext(gpuDevice) method steps are:
  1. Let global be this's relevant global object.

  2. If global’s associated Document is not allowed to use the webnn feature, return a new promise rejected with a "SecurityError" DOMException.

  3. Let realm be this's relevant realm.

  4. Let promise be a new promise.

  5. Run the following steps in parallel.

    1. Let context be the result of creating a context given realm and gpuDevice. If that returns failure, then queue an ML task with global to reject promise with a "NotSupportedError" DOMException and abort these steps.

    2. Queue an ML task with global to resolve promise with context.

  6. Return promise.

7.3. MLContext interface

The MLContext interface represents a global state of neural network compute workload and execution processes. Each MLContext object has an associated context type, MLDeviceType, and MLPowerPreference.
typedef record<USVString, MLTensor> MLNamedTensors;

dictionary MLContextLostInfo {
  DOMString message;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {
  undefined dispatch(MLGraph graph, MLNamedTensors inputs, MLNamedTensors outputs);

  Promise<MLTensor> createTensor(MLTensorDescriptor descriptor);

  Promise<ArrayBuffer> readTensor(MLTensor tensor);
  Promise<undefined> readTensor(MLTensor tensor, AllowSharedBufferSource outputData);

  undefined writeTensor(MLTensor tensor, AllowSharedBufferSource inputData);

  MLOpSupportLimits opSupportLimits();

  undefined destroy();

  readonly attribute Promise<MLContextLostInfo> lost;
};
MLContext has the following internal slots:
[[contextType]] of type context type.

The MLContext's context type.

[[deviceType]] of type MLDeviceType.

The MLContext's MLDeviceType.

[[powerPreference]] of type MLPowerPreference.

The MLContext's MLPowerPreference.

[[lost]] of type Promise<MLContextLostInfo>.

A Promise that is resolved when the MLContext's underlying execution device is no longer available.

[[timeline]]

A timeline associated with the execution of operations on the compute units of the MLContext. These operations include inferencing on computational graphs and modifying the [[data]] of MLTensors.

More rigorously define this timeline. [Issue #529]

The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:

"default"
Context created per user preference options.
"webgpu"
Context created from WebGPU device.
When the [[contextType]] is "default" and MLContextOptions.deviceType is "gpu", the user agent is responsible for creating an internal GPU device that operates within the context and is capable of submitting ML workloads on behalf of the calling application.
To validate buffer with descriptor given AllowSharedBufferSource bufferSource and MLOperandDescriptor descriptor, run the following steps:
  1. If bufferSource’s byte length is not equal to descriptor’s byte length, then return false.

  2. Switch on the type of bufferSource:

    ArrayBuffer

    Return true.

    SharedArrayBuffer

    Return true.

    ArrayBufferView

    If bufferSource’s element type matches descriptor’s dataType according to this table, then return true; otherwise, return false.
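The validate buffer with descriptor steps above can be sketched in plain JavaScript as follows. The data type to typed array mapping in TYPED_ARRAY_FOR is an assumption standing in for the table the spec refers to, which is not reproduced in this section; byteLengthOf and validateBufferWithDescriptor are hypothetical helper names.

```javascript
// Assumed mapping: MLOperandDataType -> [typed array name, element size].
const TYPED_ARRAY_FOR = {
  float32: ['Float32Array', 4],
  float16: ['Float16Array', 2],
  int32: ['Int32Array', 4],
  uint32: ['Uint32Array', 4],
  int64: ['BigInt64Array', 8],
  uint64: ['BigUint64Array', 8],
  int8: ['Int8Array', 1],
  uint8: ['Uint8Array', 1],
};

// Byte length of a descriptor: product of dimensions times element size.
function byteLengthOf(desc) {
  const elements = desc.shape.reduce((n, d) => n * d, 1);
  return elements * TYPED_ARRAY_FOR[desc.dataType][1];
}

function validateBufferWithDescriptor(bufferSource, desc) {
  if (bufferSource.byteLength !== byteLengthOf(desc)) return false;
  // ArrayBuffer and SharedArrayBuffer: only the length is checked.
  if (!ArrayBuffer.isView(bufferSource)) return true;
  // ArrayBufferView: Symbol.toStringTag yields the typed array's name
  // (e.g. "Float32Array"); a DataView reports "DataView" and never matches.
  return bufferSource[Symbol.toStringTag] === TYPED_ARRAY_FOR[desc.dataType][0];
}
```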

To validate tensors with descriptors given an MLNamedTensors namedTensors and a record<USVString, MLOperandDescriptor> namedDescriptors, run the following steps:
  1. If namedTensors’s size is not equal to namedDescriptors’s size, then return false.

  2. For each name → tensor of namedTensors:

    1. If namedDescriptors[name] does not exist, then return false.

    2. If tensor.[[descriptor]] is not equal to namedDescriptors[name], then return false.

  3. Return true.
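A similar sketch for validate tensors with descriptors, assuming tensors are plain objects with a descriptor field and that descriptor equality means an identical dataType and shape (the readable and writable members are ignored). The function names are hypothetical.

```javascript
// Assumed equality: same dataType and element-wise identical shape.
function sameOperandDescriptor(a, b) {
  return a.dataType === b.dataType &&
      a.shape.length === b.shape.length &&
      a.shape.every((d, i) => d === b.shape[i]);
}

function validateTensorsWithDescriptors(namedTensors, namedDescriptors) {
  const names = Object.keys(namedTensors);
  // Step 1: the two records must have the same number of entries.
  if (names.length !== Object.keys(namedDescriptors).length) return false;
  for (const name of names) {
    // Step 2.1: every tensor name must be a known graph input/output.
    if (!Object.prototype.hasOwnProperty.call(namedDescriptors, name)) return false;
    // Step 2.2: the tensor's descriptor must match the graph's descriptor.
    if (!sameOperandDescriptor(namedTensors[name].descriptor,
                               namedDescriptors[name])) return false;
  }
  return true;
}
```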

7.3.1. dispatch()

Schedules the computational workload of a compiled MLGraph on the MLContext's [[timeline]].

Arguments:

Returns: undefined.

Note: dispatch() itself provides no signal that graph execution has completed. Rather, callers can await the results of reading back the output tensors. See § 7.3.1.1 Examples below.

The dispatch(graph, inputs, outputs) method steps are:
  1. If graph.[[context]] is not this, then throw a TypeError.

  2. If graph.[[isDestroyed]] is true, then throw an "InvalidStateError" DOMException.

  3. Let allTensors be a list of MLTensors consisting of inputs’s values extended by outputs’s values.

  4. If allTensors contains any duplicate items, then throw a TypeError.

  5. For each tensor of allTensors:

    1. If tensor.[[context]] is not this, then throw a TypeError.

    2. If tensor.[[isDestroyed]] is true, then throw a TypeError.

  6. If validating tensors with descriptors given inputs and graph.[[inputDescriptors]] returns false, then throw a TypeError.

  7. If validating tensors with descriptors given outputs and graph.[[outputDescriptors]] returns false, then throw a TypeError.

  8. Enqueue the following steps to graph.[[context]].[[timeline]]:

    1. Run these steps, but abort when this is lost:

      1. Issue a compute request to graph.[[implementation]] given inputs and outputs.

        Add a mechanism for reporting errors during graph execution. [Issue #778]

  9. Return undefined.
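The synchronous checks at the start of dispatch() can be modelled as below. This is an illustrative sketch only: graphs and tensors are plain objects whose context and isDestroyed fields mirror the internal slots, a plain Error stands in for the "InvalidStateError" DOMException, and the descriptor validation (steps 6 and 7) and the timeline step are omitted.

```javascript
// Hypothetical model of dispatch() steps 1 to 5.
function dispatchChecks(context, graph, inputs, outputs) {
  if (graph.context !== context) throw new TypeError('graph belongs to another context');
  // Stands in for an "InvalidStateError" DOMException.
  if (graph.isDestroyed) throw new Error('InvalidStateError: graph is destroyed');
  const allTensors = [...Object.values(inputs), ...Object.values(outputs)];
  // The same MLTensor may not appear twice, e.g. as both an input and an output.
  if (new Set(allTensors).size !== allTensors.length) throw new TypeError('duplicate tensor');
  for (const tensor of allTensors) {
    if (tensor.context !== context) throw new TypeError('tensor belongs to another context');
    if (tensor.isDestroyed) throw new TypeError('tensor is destroyed');
  }
}
```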

7.3.1.1. Examples
The following code showcases executing an MLGraph using MLTensors.
const descriptor = {dataType: 'float32', shape: [2, 2]};
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// 1. Create a computational graph 'C = 0.2 * A + B'.
const constant = builder.constant(descriptor, new Float32Array(4).fill(0.2));
const A = builder.input('A', descriptor);
const B = builder.input('B', descriptor);
const C = builder.add(builder.mul(A, constant), B);

// 2. Compile the graph.
const graph = await builder.build({'C': C});

// 3. Create reusable input and output tensors.
const [inputTensorA, inputTensorB, outputTensorC] =
    await Promise.all([
      context.createTensor({
        dataType: A.dataType, shape: A.shape, writable: true
      }),
      context.createTensor({
        dataType: B.dataType, shape: B.shape, writable: true
      }),
      context.createTensor({
        dataType: C.dataType, shape: C.shape, readable: true
      })
    ]);

// 4. Initialize the inputs.
context.writeTensor(inputTensorA, new Float32Array(4).fill(1.0));
context.writeTensor(inputTensorB, new Float32Array(4).fill(0.8));

// 5. Execute the graph.
const inputs = {
  'A': inputTensorA,
  'B': inputTensorB
};
const outputs = {
  'C': outputTensorC
};
context.dispatch(graph, inputs, outputs);
    
// 6. Read back the computed result.
const result = await context.readTensor(outputTensorC);
console.log('Output value:', new Float32Array(result));  // [1, 1, 1, 1]

7.3.2. createTensor()

Creates an MLTensor associated with this MLContext.

Arguments:

Returns: Promise<MLTensor>.

The createTensor(descriptor) method steps are:
  1. Let global be this's relevant global object.

  2. If this is lost, then return a new promise rejected with an "InvalidStateError" DOMException.

  3. Let tensor be the result of creating an MLTensor given this and descriptor.

  4. Let promise be a new promise.

  5. Enqueue the following steps to this.[[timeline]]:

    1. Run these steps, but abort when this is lost:

      1. Create tensor.[[data]] given descriptor and initialize all bytes to zeros.

      2. If that fails, then queue an ML task with global to reject promise with an "UnknownError" DOMException, and abort these steps.

      3. Otherwise, queue an ML task with global to resolve promise with tensor.

    2. If aborted, then queue an ML task with global to reject promise with an "InvalidStateError" DOMException.

  6. Return promise.

7.3.3. readTensor(tensor)

Reads back the [[data]] of an MLTensor from the MLContext.[[timeline]] to script.

Arguments:

Returns: Promise<ArrayBuffer>. A buffer containing the result of the read.

The readTensor(tensor) method steps are:
  1. Let global be this's relevant global object.

  2. Let realm be this's relevant realm.

  3. If tensor.[[context]] is not this, then return a new promise rejected with a TypeError.

  4. If tensor.[[isDestroyed]] is true, then return a new promise rejected with a TypeError.

  5. If tensor.[[descriptor]].readable is false, then return a new promise rejected with a TypeError.

  6. Let promise be a new promise.

  7. Enqueue the following steps to tensor.[[context]].[[timeline]]:

    1. Run these steps, but abort when this is lost:

      1. Let bytes be a byte sequence containing a copy of tensor.[[data]].

      2. If that fails, then queue an ML task with global to reject promise with an "UnknownError" DOMException, and abort these steps.

      3. Otherwise, queue an ML task with global to create an ArrayBuffer result given bytes and realm and then resolve promise with result.

    2. If aborted, then queue an ML task with global to reject promise with an "InvalidStateError" DOMException.

  8. Return promise.

7.3.4. readTensor(tensor, outputData)

Bring-your-own-buffer variant of readTensor(tensor). Reads back the [[data]] of an MLTensor into the provided buffer.

Arguments:

Returns: Promise<undefined>.

The readTensor(tensor, outputData) method steps are:
  1. Let global be this's relevant global object.

  2. If tensor.[[context]] is not this, then return a new promise rejected with a TypeError.

  3. If tensor.[[isDestroyed]] is true, then return a new promise rejected with a TypeError.

  4. If tensor.[[descriptor]].readable is false, then return a new promise rejected with a TypeError.

  5. If validating buffer with descriptor given outputData and tensor.[[descriptor]] returns false, then return a new promise rejected with a TypeError.

  6. Let promise be a new promise.

  7. Enqueue the following steps to tensor.[[context]].[[timeline]]:

    1. Run these steps, but abort when this is lost:

      1. Let bytes be a byte sequence containing a copy of tensor.[[data]].

      2. If that fails, then queue an ML task with global to reject promise with an "UnknownError" DOMException, and abort these steps.

      3. Otherwise, queue an ML task with global to run these steps:

        1. If outputData is detached, reject promise with a TypeError, and abort these steps.

          Note: Validating buffer with descriptor above will fail if outputData is detached, but it is possible that outputData could be detached between that step and this one.

        2. Write bytes to outputData.

        3. Resolve promise with undefined.

    2. If aborted, then queue an ML task with global to reject promise with an "InvalidStateError" DOMException.

  8. Return promise.

7.3.5. writeTensor()

Writes data to the [[data]] of an MLTensor on the MLContext's [[timeline]].

Arguments:

Returns: undefined.

The writeTensor(tensor, inputData) method steps are:
  1. If tensor.[[context]] is not this, then throw a TypeError.

  2. If tensor.[[isDestroyed]] is true, then throw a TypeError.

  3. If tensor.[[descriptor]].writable is false, then throw a TypeError.

  4. If validating buffer with descriptor given inputData and tensor.[[descriptor]] returns false, then throw a TypeError.

  5. Let bytes be the result of getting a copy of the bytes held by the buffer source given inputData.

  6. Assert: bytes’s length is equal to tensor.[[descriptor]]'s byte length.

  7. Enqueue the following steps to tensor.[[context]].[[timeline]]:

    1. Run these steps, but abort when this is lost:

      1. Copy bytes to tensor.[[data]].

        Add a mechanism for reporting errors while writing to a tensor. [Issue #778]

  8. Return undefined.

Note: Similar to dispatch(), writeTensor() itself provides no signal that the write has completed. To inspect the contents of a tensor, callers can await the results of reading back the tensor.
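Because writeTensor() and readTensor() both enqueue work on the same timeline, a read enqueued after a write observes the written bytes. The following single-threaded model illustrates that ordering; TimelineModel, TensorModel and the callback-based readTensorModel are hypothetical stand-ins and do not reflect how a user agent actually schedules work.

```javascript
// FIFO queue of steps that only run when drained.
class TimelineModel {
  #queue = [];
  enqueue(step) { this.#queue.push(step); }
  drain() { while (this.#queue.length) this.#queue.shift()(); }
}

class TensorModel {
  constructor(byteLength) { this.data = new Uint8Array(byteLength); }
}

function writeTensorModel(timeline, tensor, inputBytes) {
  // Copy immediately, so later changes to inputBytes are not observed.
  const bytes = Uint8Array.from(inputBytes);
  timeline.enqueue(() => tensor.data.set(bytes));
  // Like writeTensor(), returns undefined: no completion signal.
}

function readTensorModel(timeline, tensor, onResult) {
  timeline.enqueue(() => onResult(Uint8Array.from(tensor.data)));
}
```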

7.3.6. opSupportLimits()

The opSupportLimits() method exposes the level of operator support, which differs across implementations. Consumers of the WebNN API are encouraged to probe the support level with opSupportLimits() to determine the optimal model architecture to deploy on each target platform.
7.3.6.1. MLOpSupportLimits dictionary
The MLOpSupportLimits dictionary has the following top-level members. In addition, each operator has a corresponding member defined alongside its builder method.
dictionary MLOpSupportLimits {
  MLInputOperandLayout preferredInputLayout;
  MLSupportLimits input;
  MLSupportLimits constant;
  MLSupportLimits output;
};
preferredInputLayout, of type MLInputOperandLayout

Preferred input layout for layout-dependent operators such as conv2d().

input, of type MLSupportLimits

Support limits for input MLOperands for an MLGraph.

constant, of type MLSupportLimits

Support limits for constant MLOperands for an MLGraph.

output, of type MLSupportLimits

Support limits for output MLOperands for an MLGraph.

7.3.6.2. MLSupportLimits dictionary
dictionary MLSupportLimits {
  sequence<MLOperandDataType> dataTypes;
};
dataTypes, of type sequence<MLOperandDataType>

Supported data types.

7.3.6.3. MLBinarySupportLimits dictionary
dictionary MLBinarySupportLimits {
  MLSupportLimits a;
  MLSupportLimits b;
  MLSupportLimits output;
};
a, of type MLSupportLimits

MLSupportLimits for operand a.

b, of type MLSupportLimits

MLSupportLimits for operand b.

output, of type MLSupportLimits

MLSupportLimits for the output operand.

7.3.6.4. MLSingleInputSupportLimits dictionary
dictionary MLSingleInputSupportLimits {
  MLSupportLimits input;
  MLSupportLimits output;
};
input, of type MLSupportLimits

MLSupportLimits for the input operand.

output, of type MLSupportLimits

MLSupportLimits for the output operand.
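As an illustration of probing support, the following hypothetical helper picks the first preferred data type that an MLOpSupportLimits-shaped object reports for graph inputs, constants and outputs. It only reads the top-level members described above; the mock object used to exercise it is invented for illustration.

```javascript
// Returns the first type in `preferred` listed under input, constant and
// output in `limits`, or null if none qualifies.
function pickIoDataType(limits, preferred) {
  const supports = (member, type) =>
      (limits[member]?.dataTypes ?? []).includes(type);
  return preferred.find(type =>
      ['input', 'constant', 'output'].every(m => supports(m, type))) ?? null;
}
```

In a page, limits would come from the context, for example pickIoDataType(context.opSupportLimits(), ['float16', 'float32']).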

7.3.7. destroy()

The destroy() method can be called to release all resources associated with the context. Any outstanding compute requests and MLTensor creation/read/write requests will fail.

The destroy() method steps are:
  1. If this is lost, then abort these steps.

  2. Run the steps to lose this with an implementation-defined message.

    Note: A message indicating that destroy() was called can help developers distinguish the cause of the context loss.

7.3.8. Errors

When a user agent determines that an MLContext is no longer available to fulfill requests, it must run the context lost steps for it.

The context lost steps for MLContext context are:
  1. Let global be context’s relevant global object.

  2. Queue an ML task with global to run these steps:

    1. Lose context, with an implementation-defined message.

To lose MLContext context with DOMString message, run the following steps:
  1. Let info be a new MLContextLostInfo.

  2. Set info.message to message.

  3. Resolve context.[[lost]] with info.

  4. For each MLGraph graph where graph.[[context]] equals context:

    1. Run the destroy() method steps for graph with graph as this.

  5. For each MLTensor tensor where tensor.[[context]] equals context:

    1. Run the destroy() method steps for tensor with tensor as this.

The MLContextLostInfo dictionary has the following member:

message, of type DOMString

An implementation-defined message providing information about the error that occurred.

The lost getter steps are to return this's [[lost]] Promise.

An MLContext is lost if its [[lost]] Promise is settled.
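The interplay between destroy(), lose, and the lost attribute can be summarized with a small model. It is a sketch under simplifying assumptions: ContextModel is a hypothetical class, a lostInfo field stands in for observing that [[lost]] has settled (which script can only see asynchronously), and destroying graphs and tensors is reduced to setting their isDestroyed flags.

```javascript
class ContextModel {
  constructor() {
    this.lostInfo = null;   // non-null once "lose" has run
    this.graphs = [];
    this.tensors = [];
    this.lost = new Promise(resolve => { this.resolveLost = resolve; });
  }
  get isLost() { return this.lostInfo !== null; }
  // Models "lose": resolve [[lost]], then destroy owned graphs and tensors.
  lose(message) {
    this.lostInfo = {message};
    this.resolveLost(this.lostInfo);
    for (const g of this.graphs) g.isDestroyed = true;
    for (const t of this.tensors) t.isDestroyed = true;
  }
  // Models destroy(): a no-op if the context is already lost.
  destroy() {
    if (this.isLost) return;
    this.lose('destroy() was called');
  }
}
```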

7.4. MLGraph interface

The MLGraph interface represents a compiled computational graph. Once constructed, a compiled graph is immutable.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {
  undefined destroy();
};
MLGraph has the following internal slots:
[[context]] of type MLContext

The context of type MLContext associated with this MLGraph.

[[inputDescriptors]] of type record<USVString, MLOperandDescriptor>

Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.

[[outputDescriptors]] of type record<USVString, MLOperandDescriptor>

Maps the name of an output MLOperand to its MLOperandDescriptor for all output MLOperands of this MLGraph.

[[implementation]]

The underlying implementation provided by the user agent.

[[isDestroyed]] of type boolean

Whether the MLGraph.destroy() method steps have been run. Once destroyed, the MLGraph can no longer be used.

7.4.1. destroy()

The destroy() method can be called to release all resources associated with the graph.

The destroy() method steps are:
  1. If this.[[isDestroyed]] is true, then abort these steps.

  2. Set this.[[isDestroyed]] to true.

  3. Queue a task on this.[[context]].[[timeline]] to mark resources owned by this graph as freeable.

Note: Since no further workloads can be enqueued using this graph, implementations can free any additional resource allocations associated with this graph once all previously submitted workloads using it are complete.

7.5. MLOperandDescriptor dictionary

An MLOperandDescriptor describes the shape (dimensions) and data type of an operand. MLOperandDescriptors are used to describe the inputs and constants of an MLGraph, and every MLOperand has an internal MLOperandDescriptor.

enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandDataType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int64",
  "uint64",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  required MLOperandDataType dataType;
  required sequence<[EnforceRange] unsigned long> shape;
};
dataType, of type MLOperandDataType

The operand data type.

shape, of type sequence<[EnforceRange] unsigned long>

The list of dimensions of the operand. It is empty for scalar operands.

To create an MLOperandDescriptor given MLOperandDataType dataType and list shape, run the following steps:
  1. Let descriptor be a new MLOperandDescriptor.

  2. Set descriptor.dataType to dataType.

  3. Set descriptor.shape to a clone of shape.

  4. Return descriptor.

The byte length of an MLOperandDescriptor desc is the value returned by the following steps:
  1. Let elementLength be 1.

  2. For each dimension of desc.shape:

    1. Set elementLength to elementLength * dimension.

  3. Let elementSize be the element size of one of the ArrayBufferView types that matches desc.dataType according to this table.

  4. Return elementLength * elementSize.
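A worked instance of the byte length steps, where r is the rank of descriptor d; the element sizes assume the 2-byte and 8-byte typed arrays that correspond to "float16" and "int64", and the shapes are chosen for illustration:

```latex
\begin{aligned}
\mathrm{byteLength}(d) &= \Bigl(\prod_{i=0}^{r-1} d.\mathrm{shape}_i\Bigr) \times \mathrm{elementSize}(d.\mathrm{dataType}) \\
\text{float16, shape } [2, 3, 4] &:\quad (2 \cdot 3 \cdot 4) \times 2 = 48 \text{ bytes} \\
\text{int64, shape } [\,] &:\quad 1 \times 8 = 8 \text{ bytes (the empty product is 1)}
\end{aligned}
```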

A valid dimension is an integer greater than zero and in the range of long. Implementations may impose a smaller upper bound.

Should 0-size dimensions be supported? [Issue #391]

To check dimensions given MLOperandDescriptor descriptor, run the following steps:
  1. If any element of descriptor.shape is not a valid dimension, return false.

  2. If descriptor.shape's size is too large to be supported by the implementation, return false.

    The maximum number of operand dimensions is not defined, but native ML APIs usually have a maximum supported size. [Issue #456]

  3. If descriptor’s byte length is not supported by the implementation, then return false.

  4. Return true.
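The check dimensions steps can be sketched as follows. The spec leaves the rank limit and the supported byte length implementation-defined, so MAX_RANK and MAX_BYTE_LENGTH below are hypothetical values; the byte length is computed with BigInt so that very large shapes are compared exactly.

```javascript
const MAX_RANK = 8;               // hypothetical implementation limit
const MAX_BYTE_LENGTH = 2n ** 31n; // hypothetical implementation limit
const LONG_MAX = 2 ** 31 - 1;      // upper end of the WebIDL long range
const ELEMENT_SIZE = {float32: 4, float16: 2, int32: 4, uint32: 4,
                      int64: 8, uint64: 8, int8: 1, uint8: 1};

// A valid dimension is an integer greater than zero and within long.
function isValidDimension(d) {
  return Number.isInteger(d) && d > 0 && d <= LONG_MAX;
}

function checkDimensions(desc) {
  if (!desc.shape.every(isValidDimension)) return false;
  if (desc.shape.length > MAX_RANK) return false;
  const elements = desc.shape.reduce((n, d) => n * BigInt(d), 1n);
  const byteLength = elements * BigInt(ELEMENT_SIZE[desc.dataType]);
  return byteLength <= MAX_BYTE_LENGTH;
}
```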

7.6. MLOperand interface

An MLOperand represents an intermediary graph being constructed as a result of compositing parts of an operation into a fully composed operation.

For instance, an MLOperand can represent a constant feeding into an operation, or the result of combining multiple constants together into an operation. See also § 6 Programming Model.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {
  readonly attribute MLOperandDataType dataType;
  readonly attribute FrozenArray<unsigned long> shape;
};

dictionary MLOperatorOptions {
  USVString label = "";
};

typedef (bigint or unrestricted double) MLNumber;
MLOperand has the following internal slots:
[[builder]] of type MLGraphBuilder

The MLOperand's associated builder object.

[[descriptor]] of type MLOperandDescriptor

The MLOperand's descriptor.

[[name]] of type string

The MLOperand's name (only for input operands).

[[operator]] of type operator

Reference to MLOperand's corresponding operator.

An MLOperand's dataType is its [[descriptor]].dataType.

An MLOperand's shape is its [[descriptor]].shape.

An MLOperand's rank is its shape's size.

The dataType getter steps are to return this's dataType.

The shape getter steps are to return this's shape.

Since the [[builder]] object is bound by the MLGraphBuilder() constructor to an MLContext object, an MLOperand is also always bound to the same MLContext object.

If an operation supports only a subset of MLOperandDataTypes, the allowed data types for each of the operation’s input operands, including both positional arguments and options, are given as either an explicit list of MLOperandDataTypes, or a constraint that the operand’s dataType must be the same as the dataType of another input operand, or any to allow any MLOperandDataType.

If an operation requires input operands with a particular rank, the allowed ranks for each of the operation’s input operands, including both positional arguments and options, are given as an explicit rank (e.g. 1), or N to allow any dimensionality, or the same as another operand. More specific constraints are common, such as when an input operand’s shape must be unidirectionally broadcastable to or bidirectionally broadcastable with another input operand; in these cases, the allowed ranks are listed as a range, with specific validation given as steps in the operation.

MLOperatorOptions has the following members:

label, of type USVString, defaulting to ""

Optionally provided when an operator is created using MLGraphBuilder methods that create MLOperands. The implementation may use this value to initialize the operator's label.

7.6.1. Creating an MLOperand

The MLOperand objects are created by the methods of MLGraphBuilder, internally using the following algorithms.
To create an MLOperand given MLGraphBuilder builder and MLOperandDescriptor desc, run the following steps:
  1. Let operand be a new MLOperand.

  2. Set operand.[[builder]] to builder.

  3. Set operand.[[descriptor]] to desc.

  4. Return operand.

To copy an MLOperand given MLOperand operand, run the following steps:
  1. Let result be a new MLOperand.

  2. Set result.[[builder]] to operand.[[builder]].

  3. Set result.[[descriptor]] to operand.[[descriptor]].

  4. If operand.[[name]] exists, then set result.[[name]] to operand.[[name]].

  5. Return result.

To validate operand given MLGraphBuilder builder and MLOperand operand, return true if operand.[[builder]] is builder, and false otherwise.

7.6.1.1. MLNumber

MLNumber is used when specifying the type of a numeric option for an MLOperand which can be of any MLOperandDataType, including both 64-bit integer types ("uint64" and "int64") and 32-bit floating point ("float32"). Implementations process the value according to the corresponding MLOperandDataType. For example, if clamp(input, options) is called with an MLOperand with dataType "uint32", the MLNumber parameters are explicitly cast to unsigned long.

Specifying the option as double would lose accuracy when passing values over 2^53, and specifying long long would disallow values over 2^63.

Support for unions of bigint and numeric types is new in [WEBIDL], and implementation support is also limited. Prototype implementations are encouraged to provide feedback for this approach. [Issue #whatwg/webidl#1388]
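The precision argument for MLNumber can be checked directly in JavaScript: the integer 2^53 + 1 survives as a bigint but is rounded when converted to a double.

```javascript
const exact = 2n ** 53n + 1n;      // exact as a bigint
const viaDouble = Number(exact);   // rounded to the nearest double
console.log(exact.toString());     // 9007199254740993
console.log(viaDouble);            // 9007199254740992
console.log(BigInt(viaDouble) === exact); // false: the +1 was lost
```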

7.7. MLTensorDescriptor dictionary

An MLTensorDescriptor describes the characteristics and capabilities of an MLTensor.

dictionary MLTensorDescriptor : MLOperandDescriptor {
  boolean readable = false;
  boolean writable = false;
};
readable, of type boolean, defaulting to false

Whether the tensor’s contents can be read via readTensor(tensor) or readTensor(tensor, outputData).

writable, of type boolean, defaulting to false

Whether the tensor’s contents can be written to via writeTensor().

7.8. MLTensor interface

The MLTensor interface represents a tensor which may be used as an input or output to an MLGraph. The memory backing an MLTensor should be allocated in an implementation-defined fashion according to the requirements of the MLContext and the MLTensorDescriptor used to create it. Operations involving the [[data]] of an MLTensor occur on the [[timeline]] of its associated MLContext.

The implementation-defined requirements of how an MLTensor is allocated may include constraints such as that the memory is allocated with a particular byte alignment or in a particular memory pool.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLTensor {
  readonly attribute MLOperandDataType dataType;
  readonly attribute FrozenArray<unsigned long> shape;
  readonly attribute boolean readable;
  readonly attribute boolean writable;

  undefined destroy();
};
MLTensor has the following internal slots:
[[context]] of type MLContext

The MLTensor's associated context.

[[descriptor]] of type MLTensorDescriptor

The MLTensor's descriptor.

[[isDestroyed]] of type boolean

Whether the MLTensor.destroy() steps have been run. Once destroyed, the MLTensor can no longer be used.

[[data]] of an implementation-defined type

The bytes backing the MLTensor. This data may only be accessed or modified from the [[context]].[[timeline]].

An MLTensor's dataType is its [[descriptor]]'s dataType.

An MLTensor's shape is its [[descriptor]]'s shape.

The dataType getter steps are to return this's dataType.

The shape getter steps are to return this's shape.

The readable getter steps are to return this.[[descriptor]].readable.

The writable getter steps are to return this.[[descriptor]].writable.

7.8.1. Creating an MLTensor

An MLTensor is created by its associated MLContext.

To create an MLTensor given MLContext context and MLTensorDescriptor descriptor, run the following steps:
  1. Let tensor be a new MLTensor.

  2. Set tensor.[[context]] to context.

  3. Set tensor.[[descriptor]] to descriptor.

  4. Set tensor.[[isDestroyed]] to false.

  5. Return tensor.

7.8.2. destroy()

Releases the resources associated with the MLTensor. This method is idempotent.

Returns: undefined.
The destroy() method steps are:
  1. Set this.[[isDestroyed]] to true.

  2. Enqueue the following steps to this.[[context]].[[timeline]]:

    1. Release this.[[data]].

  3. Return undefined.

Note: Since no further operations can be enqueued using this tensor, implementations can free any additional resource allocations associated with this tensor once all previously submitted operations using it are complete.
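The creation and destroy() steps for MLTensor can be modelled as below. This is a hypothetical sketch: the context's timeline is an array of queued steps, releasing [[data]] is modelled by setting data to null, and calling destroy twice merely queues a second, harmless release.

```javascript
// Models "create an MLTensor": data is allocated later, on the timeline.
function createTensorModel(context, descriptor) {
  return {context, descriptor, isDestroyed: false, data: null};
}

// Models destroy(): mark destroyed now, release data on the timeline.
function destroyTensorModel(tensor) {
  tensor.isDestroyed = true;
  tensor.context.timeline.push(() => { tensor.data = null; });
}
```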

7.9. MLGraphBuilder interface

The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.

typedef record<USVString, MLOperand> MLNamedOperands;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(USVString name, MLOperandDescriptor descriptor);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor descriptor,
                     AllowSharedBufferSource buffer);

  // Create a scalar operand from the specified number of the specified type.
  MLOperand constant(MLOperandDataType type, MLNumber value);

  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> build(MLNamedOperands outputs);
};
The MLGraphBuilder.build() method compiles the graph builder state up to the specified output operands into a compiled graph according to the type of MLContext that creates it. When the [[contextType]] of the MLContext is set to "default", the compiled graph is initialized right before the MLGraph is returned. This graph initialization stage is important for optimal performance of the subsequent graph executions. It typically involves a process known as "weight preprocessing" where all the constant inputs to the graph are preprocessed and cached at the operating system level for subsequent graph execution calls. The initializing inputs are typically the constant weight data specified through the constant() method as constant operands during graph construction time.
MLGraphBuilder has the following internal slots:
[[context]] of type MLContext

The context of type MLContext associated with this MLGraphBuilder.

[[hasBuilt]] of type boolean

Whether MLGraphBuilder.build() has been called. Once built, the MLGraphBuilder can no longer create operators or compile MLGraphs.

An MLGraphBuilder can build if its [[hasBuilt]] is false and its [[context]] is not lost.

7.9.1. MLGraphBuilder constructor

Arguments:
The new MLGraphBuilder(context) constructor steps are:
  1. If this's relevant global object's associated Document is not allowed to use the webnn feature, then throw a "SecurityError" DOMException.

  2. If context is lost, then throw an "InvalidStateError" DOMException.

  3. Set this.[[context]] to context.

  4. Set this.[[hasBuilt]] to false.

7.9.2. input operands

Create a named MLOperand based on a descriptor, that can be used as an input.

Arguments: Returns: an MLOperand.
The input(name, descriptor) method steps are:
  1. If this cannot build, then throw an "InvalidStateError" DOMException.

  2. If name is empty, then throw a TypeError.

  3. If any MLOperands in this's graph's inputs have a [[name]] equal to name, then throw a TypeError.

  4. If checking dimensions given descriptor returns false, then throw a TypeError.

  5. Make graph connections:

    1. Let operand be the result of creating an MLOperand given this and descriptor.

    2. Set operand.[[name]] to name.

    3. Add operand to this's graph's inputs.

  6. Return operand.

The MLGraphBuilder API allows creating an MLGraph without input operands. If the underlying platform doesn’t support that, implementations can add a stub input, or pass constants as inputs to the graph.
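The validation order in the input() steps can be sketched with plain objects. This is a minimal sketch under stated assumptions. addInput and checkDimensions are hypothetical helpers. The dimension check is simplified to "every dimension is a positive integer", while the spec's "checking dimensions" algorithm may impose further limits. The "can build" check is omitted.

```javascript
// Hypothetical, simplified stand-in for "checking dimensions": every dimension
// must be a positive integer. The real algorithm may enforce additional limits.
function checkDimensions(descriptor) {
  return descriptor.shape.every((d) => Number.isInteger(d) && d > 0);
}

// Sketch of the input(name, descriptor) checks, in the order the steps list them.
function addInput(graphInputs, name, descriptor) {
  if (name === "") {
    throw new TypeError("Input name must not be empty.");
  }
  if (graphInputs.some((operand) => operand.name === name)) {
    throw new TypeError(`Duplicate input name "${name}".`);
  }
  if (!checkDimensions(descriptor)) {
    throw new TypeError("Invalid dimensions.");
  }
  const operand = { name, descriptor };
  graphInputs.push(operand);
  return operand;
}

const inputs = [];
addInput(inputs, "x", { dataType: "float32", shape: [1, 3] });
try {
  addInput(inputs, "x", { dataType: "float32", shape: [1, 3] });
} catch (e) {
  console.log(e.message); // Duplicate input name "x".
}
```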

7.9.3. constant operands

Create a constant MLOperand that can be used in MLGraphBuilder methods.
7.9.3.1. constant(descriptor, buffer)
Create a constant MLOperand of the specified data type and shape that contains the initializing data.
Returns: an MLOperand. The constant output tensor.
The constant(descriptor, buffer) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If checking dimensions given descriptor returns false, then throw a TypeError.

  3. If validating buffer with descriptor given buffer and descriptor returns false, then throw a TypeError.

  4. Make graph connections:

    1. Let operand be the result of creating an MLOperand given this and descriptor.

    2. Let bytes be the result of getting a copy of the bytes held by the buffer source given buffer.

    3. Add operand to this's graph's constants with bytes as value.

  5. Return operand.
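The buffer check in constant(descriptor, buffer) comes down to comparing byte counts. The sketch below is a simplified stand-in for "validating buffer with descriptor", which is defined elsewhere and may check more (for example, that the view type matches the data type). The element-size table and both helper names are assumptions made for illustration.

```javascript
// Assumed element sizes in bytes for each MLOperandDataType.
const BYTES_PER_ELEMENT = {
  float32: 4, float16: 2, int32: 4, uint32: 4,
  int64: 8, uint64: 8, int8: 1, uint8: 1,
};

// Byte length implied by a descriptor: element count times element size.
// A scalar descriptor (shape []) has exactly one element.
function descriptorByteLength(descriptor) {
  const elements = descriptor.shape.reduce((product, d) => product * d, 1);
  return elements * BYTES_PER_ELEMENT[descriptor.dataType];
}

// Simplified check: the buffer must hold exactly the descriptor's bytes.
function bufferMatchesDescriptor(buffer, descriptor) {
  return buffer.byteLength === descriptorByteLength(descriptor);
}

const desc = { dataType: "float32", shape: [2, 3] };
console.log(descriptorByteLength(desc)); // 24
console.log(bufferMatchesDescriptor(new Float32Array(6), desc)); // true
console.log(bufferMatchesDescriptor(new Float32Array(5), desc)); // false
```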

7.9.3.2. constant(type, value)
Create a scalar constant MLOperand of the specified value and data type.
Data truncation will occur when the specified value exceeds the range or precision of the specified output data type, e.g. when a floating-point value is assigned to an "int8" data type.
Returns: an MLOperand. The constant output.
The constant(type, value) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. Set value to the result of casting value to type.

  3. Let descriptor be the result of creating an MLOperandDescriptor given type and « ».

  4. Make graph connections:

    1. Let operand be the result of creating an MLOperand given this and descriptor.

    2. Add operand to this's graph's constants with value as value.

  5. Return operand.

7.9.4. build method

Build a composed graph up to the given output operands into a computational graph asynchronously.
Returns: Promise<MLGraph>.
The build(outputs) method steps are:
  1. If this can not build, then return a new promise rejected with an "InvalidStateError" DOMException.

  2. If outputs is empty, then return a new promise rejected with a TypeError.

  3. For each name → operand of outputs:

    1. If name is empty, then return a new promise rejected with a TypeError.

    2. If validating operand given this and operand returns false, then return a new promise rejected with a TypeError.

    3. If operand is in this's graph's inputs or constants, then return a new promise rejected with a TypeError.

  4. Let operands be a new empty set.

  5. Let operators be a new empty set.

  6. Let inputs be a new empty set.

  7. Let queue be a new queue containing outputs’s values.

  8. While queue is not empty:

    1. Dequeue operand from queue.

    2. Append operand to operands.

    3. Append operand.[[operator]] to operators.

    4. If operand is in this's graph's inputs, append operand to inputs.

    5. For each input of operand.[[operator]]'s inputs:

      1. Enqueue input to queue.

  9. Let global be this's relevant global object.

  10. Let realm be this's relevant realm.

  11. Let graph be a new MLGraph with realm.

  12. Set graph.[[context]] to this.[[context]].

  13. Set graph.[[isDestroyed]] to false.

  14. For each operand in inputs:

    1. Set graph.[[inputDescriptors]][operand.[[name]]] to operand.[[descriptor]].

  15. For each name → operand of outputs:

    1. Set graph.[[outputDescriptors]][name] to operand.[[descriptor]].

  16. Set this.[[hasBuilt]] to true.

  17. Let promise be a new promise.

  18. Run the following steps in parallel:

    1. Run these steps, but abort when graph.[[context]] is lost:

      1. Let graphImpl be the result of converting this's graph with operands, operators, inputs, and outputs’s values into an implementation-defined format which can be interpreted by the underlying platform.

      2. If the previous step failed, then queue an ML task with global to reject promise with an "OperationError" DOMException, and abort these steps.

      3. Set graph.[[implementation]] to graphImpl.

      4. Queue an ML task with global to resolve promise with graph.

    2. If aborted, then queue an ML task with global to reject promise with an "InvalidStateError" DOMException.

  19. Return promise.

NOTE: Specifying an input operand or constant operand as a graph output results in an error, as this is usually an incorrect usage of the API. Callers can work around this by introducing an identity() operator.
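The traversal in step 8 of build() walks backwards from the outputs through each operand's producing operator. The sketch below models operands as plain objects { name?, operator? } and operators as { inputs }; these shapes and the function name collectGraph are assumptions. Two choices differ from the literal steps and are labeled in the code: operands without a producing operator (inputs and constants) are guarded, and already-visited operands are skipped so that shared subgraphs are not re-walked.

```javascript
// Sketch of the reachability walk in build(). Operands are plain objects
// { name?, operator? }; operators are { inputs: [...] } (assumed shapes).
function collectGraph(outputs, graphInputs) {
  const operands = new Set();
  const operators = new Set();
  const inputs = new Set();
  const queue = [...Object.values(outputs)];
  while (queue.length > 0) {
    const operand = queue.shift();
    // Addition to the spec steps: skip operands already visited.
    if (operands.has(operand)) continue;
    operands.add(operand);
    if (graphInputs.includes(operand)) inputs.add(operand);
    // Inputs and constants have no producing operator, so stop there.
    if (operand.operator) {
      operators.add(operand.operator);
      queue.push(...operand.operator.inputs);
    }
  }
  return { operands, operators, inputs };
}

// x is a graph input, w a constant, y = op1(x, w), z = op2(y, x).
const x = { name: "x" };
const w = {};
const y = { operator: { inputs: [x, w] } };
const z = { operator: { inputs: [y, x] } };
const result = collectGraph({ z }, [x]);
console.log(result.operands.size, result.operators.size, result.inputs.size); // 4 2 1
```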

7.9.5. argMin/argMax operations

Return the index of the minimum or maximum value among the input values along the given axis. In case of ties, which of the tied indices is returned is implementation-dependent.
dictionary MLArgMinMaxOptions : MLOperatorOptions {
  boolean keepDimensions = false;
  MLOperandDataType outputDataType = "int32";
};

partial interface MLGraphBuilder {
  MLOperand argMin(MLOperand input, [EnforceRange] unsigned long axis,
                   optional MLArgMinMaxOptions options = {});
  MLOperand argMax(MLOperand input, [EnforceRange] unsigned long axis,
                   optional MLArgMinMaxOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits argMin;
  MLSingleInputSupportLimits argMax;
};

MLArgMinMaxOptions has the following members:

keepDimensions, of type boolean, defaulting to false

If true, retains reduced dimensions with size 1.

outputDataType, of type MLOperandDataType, defaulting to "int32"

An MLOperandDataType. The output data type.


Returns: an MLOperand. The N-D tensor of the reduced shape. The values must be of type options.outputDataType in the range [0, N-1] where N is the size of the input dimension specified by axis.

Constraints for argMin()/argMax()
operand | allowed data types | allowed ranks
input | any | N
output | options.outputDataType | input's rank - 1 to input's rank

MLOpSupportLimits has the following members for argMin() and argMax():

argMin, of type MLSingleInputSupportLimits

Support limits for operator argMin().

argMax, of type MLSingleInputSupportLimits

Support limits for operator argMax().

To create argMin/argMax operation given string op, MLOperand input, unsigned long axis, and MLArgMinMaxOptions options, run the following steps:
  1. Assert: op is one of "argMin", "argMax".

  2. If this can not build, then throw an "InvalidStateError" DOMException.

  3. If validating operand with this and input returns false, then throw a TypeError.

  4. If input’s shape[axis] is greater than options.outputDataType's maximum value, then throw a TypeError.

  5. Let outputShape be the result of calculating reduction output sizes given input’s shape, « axis », and options.keepDimensions. If that returns failure, then throw a TypeError.

  6. Let desc be the result of creating an MLOperandDescriptor given options.outputDataType and outputShape.

  7. Make graph connections:

    1. Let operator be an operator for the op operation, given options.

    2. Let output be the result of creating an MLOperand given this and desc.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  8. Return output.

The following argMin/argMax algorithms are supported.
The argMin(input, axis, options) method steps are:
  1. Let output be the result of running the create argMin/argMax operation given "argMin", input, axis and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The argMax(input, axis, options) method steps are:
  1. Let output be the result of running the create argMin/argMax operation given "argMax", input, axis and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.
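The output shape and the index semantics of argMin()/argMax() can be illustrated on plain arrays. This is a minimal sketch. argReduceShape handles only the single-axis case (the spec's "calculating reduction output sizes" is more general), and argMaxLastAxis resolves ties to the first index, which is one choice the implementation-dependent tie rule allows. Both names are hypothetical.

```javascript
// Simplified single-axis version of "calculating reduction output sizes".
function argReduceShape(shape, axis, keepDimensions = false) {
  if (!Number.isInteger(axis) || axis < 0 || axis >= shape.length) {
    throw new TypeError("axis out of range");
  }
  return keepDimensions
    ? shape.map((d, i) => (i === axis ? 1 : d))
    : shape.filter((_, i) => i !== axis);
}

// Reference argMax over the last axis of a 2-D array.
// Ties resolve to the first index here (an implementation choice).
function argMaxLastAxis(rows) {
  return rows.map((row) => {
    let best = 0;
    for (let i = 1; i < row.length; i++) {
      if (row[i] > row[best]) best = i;
    }
    return best;
  });
}

console.log(argReduceShape([2, 3, 4], 1));            // [ 2, 4 ]
console.log(argReduceShape([2, 3, 4], 1, true));      // [ 2, 1, 4 ]
console.log(argMaxLastAxis([[1, 5, 5], [7, 0, 2]]));  // [ 1, 0 ]
```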

7.9.6. batchNormalization

Normalize the values of the input tensor using [Batch-Normalization]. For each input feature, the mean and variance values of that feature are computed across all the samples in the batch dimension while the model is trained. These mean and variance values are then given to this operation during model inference.
dictionary MLBatchNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  [EnforceRange] unsigned long axis = 1;
  double epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};

dictionary MLBatchNormalizationSupportLimits {
  MLSupportLimits input;
  MLSupportLimits mean;
  MLSupportLimits variance;
  MLSupportLimits scale;
  MLSupportLimits bias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLBatchNormalizationSupportLimits batchNormalization;
};

MLBatchNormalizationOptions has the following members:

scale, of type MLOperand

The 1-D tensor of the scaling values whose size is equal to the size of the input dimension denoted by axis.

bias, of type MLOperand

The 1-D tensor of the bias values whose size is equal to the size of the input dimension denoted by axis.

axis, of type unsigned long, defaulting to 1

The index of the feature count dimension of the input shape, to which the mean and variance values correspond. Its value must be in the range [0, N-1] where N is the rank of the input tensor. The default value is 1, corresponding to the channel ("c") dimension in the "nchw" data layout.

epsilon, of type double, defaulting to 1e-5

A small value added to the variance to prevent division by zero.

Returns: an MLOperand. The batch-normalized N-D tensor of the same shape as input.

Constraints for batchNormalization()
operand | allowed data types | allowed ranks
input | "float32", "float16" | N
mean | same as input | 1
variance | same as input | 1
scale | same as input | 1
bias | same as input | 1
output | same as input | same as input

MLBatchNormalizationSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

mean, of type MLSupportLimits

MLSupportLimits for mean operand.

variance, of type MLSupportLimits

MLSupportLimits for variance operand.

scale, of type MLSupportLimits

MLSupportLimits for scale operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following members for batchNormalization():

batchNormalization, of type MLBatchNormalizationSupportLimits

Support limits for operator batchNormalization().

The batchNormalization(input, mean, variance, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, mean, variance, options.scale (if it exists), and options.bias (if it exists) returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If options.axis is not in the range 0 to input’s rank, exclusive, then throw a TypeError.

  5. If mean’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  6. If mean’s shape is not equal to « input’s shape[options.axis] », then throw a TypeError.

  7. If variance’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  8. If variance’s shape is not equal to « input’s shape[options.axis] », then throw a TypeError.

  9. Set options.epsilon to the result of casting options.epsilon to input’s dataType.

  10. If options.scale exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « input’s shape[options.axis] », then throw a TypeError.

  11. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « input’s shape[options.axis] », then throw a TypeError.

  12. Make graph connections:

    1. Let operator be an operator for the "batchNormalization" operation, given input, mean, variance and options.

    2. Let output be the result of creating an MLOperand given this and input.[[descriptor]].

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to input, mean, and variance.

    5. If options.scale exists, then add it to operator’s inputs.

    6. If options.bias exists, then add it to operator’s inputs.

    7. Set operator’s output to output.

  13. Return output.

The behavior of this operation when the input tensor is 4-D of the "nchw" layout can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
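function batchNormalization(builder, input, mean, variance, options = {}) {
  // Defaults match MLBatchNormalizationOptions: axis 1 ("c" in "nchw"), epsilon 1e-5.
  const axis = options.axis ?? 1;
  const epsilon = options.epsilon ?? 1e-5;
  const shape = [1, input.shape[axis], 1, 1];
  // (input - mean) / sqrt(variance + epsilon)
  let output = builder.div(
    builder.sub(input, builder.reshape(mean, shape)),
    builder.sqrt(builder.add(
      builder.reshape(variance, shape),
      builder.constant(input.dataType, epsilon))));
  // scale and bias are optional; apply them only when present.
  if (options.scale) {
    output = builder.mul(builder.reshape(options.scale, shape), output);
  }
  if (options.bias) {
    output = builder.add(output, builder.reshape(options.bias, shape));
  }
  return output;
}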
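For checking results numerically, a plain-array reference of the same formula can help. This is a minimal sketch assuming a 4-D "nchw" tensor stored as a flat array and normalization along axis 1. The name batchNormNCHW is hypothetical, and scale and bias are treated as 1 and 0 when omitted, which matches the algebra of the formula rather than any normative text.

```javascript
// Reference batch normalization for a flat "nchw" array, along axis 1:
//   y = scale[c] * (x - mean[c]) / sqrt(variance[c] + epsilon) + bias[c]
// scale and bias default to 1 and 0 when omitted (an assumption of this sketch).
function batchNormNCHW(data, [n, c, h, w], mean, variance,
                       { scale, bias, epsilon = 1e-5 } = {}) {
  const out = new Float32Array(data.length);
  for (let b = 0; b < n; b++) {
    for (let ch = 0; ch < c; ch++) {
      const s = scale ? scale[ch] : 1;
      const t = bias ? bias[ch] : 0;
      const denom = Math.sqrt(variance[ch] + epsilon);
      for (let i = 0; i < h * w; i++) {
        const idx = (b * c + ch) * h * w + i; // flat index into nchw data
        out[idx] = s * (data[idx] - mean[ch]) / denom + t;
      }
    }
  }
  return out;
}

// Two channels with two values each; epsilon 0 keeps the arithmetic exact.
const normalized = batchNormNCHW([1, 3, 10, 20], [1, 2, 1, 2], [2, 15], [1, 25],
                                 { epsilon: 0 });
console.log(Array.from(normalized)); // [ -1, 1, -1, 1 ]
```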

7.9.7. cast

Cast each element in the input tensor to the target data type.
partial interface MLGraphBuilder {
  MLOperand cast(MLOperand input,
                 MLOperandDataType type,
                 optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits cast;
};
Returns: an MLOperand. The N-D tensor of the same shape as input, with each element cast to the target data type.

Constraints for cast()
operand | allowed data types | allowed ranks
input | any | N
output | type | same as input

MLOpSupportLimits has the following members for cast():

cast, of type MLSingleInputSupportLimits

Support limits for operator cast().

Casting between MLOperandDataTypes is specified for some cases and implementation-defined in other cases, according to the following table:

Behavior of the cast() operation given the input's dataType (rows) and target type (columns).
Input type \ Target type | "float32", "float16" | "int32", "uint32", "int64", "uint64", "int8", "uint8"
"float32", "float16" | If in range, nearest representable value. If out of range, +/-Infinity. | If in range, truncated. If out of range, implementation-defined.
"int32", "uint32", "int64", "uint64", "int8", "uint8" | If in range, nearest representable value. If out of range, +/-Infinity. | If in range, same value. If out of range, the lowest N bits (where N is the bit width of the target type) reinterpreted as the target type, assuming two’s complement for signed types.

NOTE: For example, casting -1 from "int8" to "uint8" is specified to yield 255. But casting -1 from "float32" to "uint8" is implementation-defined.
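The specified integer-to-integer behavior (keep the lowest N bits, reinterpret with two's complement for signed types) happens to match how JavaScript typed arrays convert integer values on store. The sketch below relies on that. It is limited to integer inputs and to the 8-bit and 32-bit types, since the 64-bit typed arrays require BigInt values. The name castIntegers and the typed-array mapping are conveniences of this sketch, not something implementations are required to use.

```javascript
// Integer-only sketch of the out-of-range integer cast rule, using typed
// arrays' modular conversion (ToInt8, ToUint8, ToInt32, ToUint32).
const INTEGER_ARRAYS = {
  int8: Int8Array, uint8: Uint8Array,
  int32: Int32Array, uint32: Uint32Array,
};

function castIntegers(values, targetType) {
  const ArrayType = INTEGER_ARRAYS[targetType];
  if (!ArrayType) throw new TypeError(`unsupported type ${targetType}`);
  return Array.from(ArrayType.from(values));
}

console.log(castIntegers([-1, 127, 128], "uint8"));  // [ 255, 127, 128 ]
console.log(castIntegers([255, 256, -129], "int8")); // [ -1, 0, 127 ]
```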

The cast(input, type, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. Make graph connections:

    1. Let operator be an operator for the "cast" operation, given type and options.

    2. Let output be the result of creating an MLOperand given this and the result of creating an MLOperandDescriptor given type and input’s shape.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  4. Return output.

7.9.8. clamp

Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions : MLOperatorOptions {
  MLNumber minValue;
  MLNumber maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand input, optional MLClampOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits clamp;
};

MLClampOptions has the following members:

minValue, of type MLNumber

The minimum value of the range. When it is not specified, no clamping is performed on the lower limit of the range.

maxValue, of type MLNumber

The maximum value of the range. When it is not specified, no clamping is performed on the upper limit of the range.

Returns: an MLOperand. The output tensor of the same shape as input.
Constraints for clamp()
operand | allowed data types | allowed ranks
input | any | N
output | same as input | same as input

MLOpSupportLimits has the following member for clamp():

clamp, of type MLSingleInputSupportLimits

Support limits for operator clamp().

The clamp(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. Let minValue be options.minValue if given, or -Infinity otherwise.

  4. Set options.minValue to the result of casting minValue to input’s dataType.

  5. Let maxValue be options.maxValue if given, or Infinity otherwise.

  6. Set options.maxValue to the result of casting maxValue to input’s dataType.

  7. If options.minValue is greater than options.maxValue, then throw a TypeError.

  8. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "clamp" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  9. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function clamp(builder, input, options) {
  if (options.minValue === undefined) {
    if (options.maxValue === undefined) {
      return input;
    } else {
      return builder.min(
        input, builder.constant(input.dataType, options.maxValue));
    }
  } else {
    if (options.maxValue === undefined) {
      return builder.max(
        input, builder.constant(input.dataType, options.minValue));
    } else {
      return builder.min(
        builder.max(input, builder.constant(input.dataType, options.minValue)),
        builder.constant(input.dataType, options.maxValue));
    }
  }
}
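A plain-array reference for clamp() shows how an omitted bound leaves that side of the range unclamped. This is a minimal sketch. The name clampValues is hypothetical, and casting the bounds to the input's data type is not modeled.

```javascript
// Element-wise reference clamp. An omitted bound defaults to -Infinity
// (minValue) or +Infinity (maxValue), so that side is left unclamped.
function clampValues(values, { minValue = -Infinity, maxValue = Infinity } = {}) {
  if (minValue > maxValue) {
    throw new TypeError("minValue must not exceed maxValue");
  }
  return values.map((v) => Math.min(Math.max(v, minValue), maxValue));
}

console.log(clampValues([-2, 0.5, 3], { minValue: 0, maxValue: 1 })); // [ 0, 0.5, 1 ]
console.log(clampValues([-2, 0.5, 3], { maxValue: 1 }));              // [ -2, 0.5, 1 ]
```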

7.9.9. concat

Concatenate the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs,
                   [EnforceRange] unsigned long axis,
                   optional MLOperatorOptions options = {});
};

dictionary MLConcatSupportLimits {
  MLSupportLimits inputs;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLConcatSupportLimits concat;
};
Returns: an MLOperand. The concatenated tensor of all the inputs along axis. The output tensor has the same shape as each input except along axis, where its size is the sum of the inputs' sizes along that dimension.

Constraints for concat()
operand | allowed data types | allowed ranks
inputs's items | any | N
output | same as inputs's items | same as inputs's items

MLConcatSupportLimits has the following members:

inputs, of type MLSupportLimits

MLSupportLimits for all input operands.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for concat():

concat, of type MLConcatSupportLimits

Support limits for operator concat().

The concat(inputs, axis, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any item in inputs returns false, then throw a TypeError.

  3. If inputs is empty, then throw a TypeError.

  4. Let first be inputs[0].

  5. If axis is greater than or equal to first’s rank, then throw a TypeError.

  6. Let desc be the result of creating an MLOperandDescriptor given first’s dataType and first’s shape.

  7. Set desc.shape[axis] to first’s shape[axis].

  8. For each index in the range 1 to inputs’s size, exclusive:

    1. Let input be inputs[index].

    2. If input’s dataType is not equal to first’s dataType, then throw a TypeError.

    3. If input’s rank is not equal to first’s rank, then throw a TypeError.

    4. For each dim in the range 0 to input’s rank, exclusive:

      1. If dim is not equal to axis and if input’s shape[dim] is not equal to first’s shape[dim], then throw a TypeError.

      2. If dim is equal to axis:

        1. Let size be the sum of desc.shape[axis] and input’s shape[dim].

        2. If size is not a valid dimension, then throw a TypeError.

        3. Set desc.shape[axis] to size.

  9. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "concat" operation, given inputs, axis, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to inputs.

    5. Set operator’s output to output.

  10. Return output.
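The shape rules in the concat() steps can be sketched on plain descriptors. This is a minimal sketch. concatOutputShape is a hypothetical helper, and the "valid dimension" check on the summed size is omitted.

```javascript
// Sketch of the concat() shape checks: all inputs share a data type and
// rank, agree on every dimension except axis, and the output size along
// axis is the sum of the input sizes along axis.
function concatOutputShape(inputs, axis) {
  if (inputs.length === 0) throw new TypeError("inputs is empty");
  const first = inputs[0];
  if (axis >= first.shape.length) throw new TypeError("axis out of range");
  const shape = [...first.shape];
  for (const input of inputs.slice(1)) {
    if (input.dataType !== first.dataType) throw new TypeError("dataType mismatch");
    if (input.shape.length !== first.shape.length) throw new TypeError("rank mismatch");
    input.shape.forEach((dim, i) => {
      if (i === axis) {
        shape[axis] += dim; // accumulate sizes along the concat axis
      } else if (dim !== first.shape[i]) {
        throw new TypeError(`dimension ${i} mismatch`);
      }
    });
  }
  return shape;
}

console.log(concatOutputShape([
  { dataType: "float32", shape: [2, 3] },
  { dataType: "float32", shape: [2, 5] },
], 1)); // [ 2, 8 ]
```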

7.9.10. conv2d

Compute a 2-D convolution given 4-D input and filter tensors.
enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

dictionary MLConv2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input,
                   MLOperand filter,
                   optional MLConv2dOptions options = {});
};

dictionary MLConv2dSupportLimits {
  MLSupportLimits input;
  MLSupportLimits filter;
  MLSupportLimits bias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLConv2dSupportLimits conv2d;
};

MLConv2dOptions has the following members:

padding, of type sequence<[EnforceRange] unsigned long>

A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the convolution input. The default value is [0, 0, 0, 0].

strides, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the convolution input. The default value is [1, 1].

dilations, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied on the convolution filter (kernel). The default value is [1, 1].

groups, of type unsigned long, defaulting to 1

The number of groups that input channels and output channels are divided into.

inputLayout, of type MLInputOperandLayout, defaulting to "nchw"

Specifies the layout format of the input and output tensors as follows:

  • "nchw":

    • input tensor: [batches, inputChannels, height, width]

    • output tensor: [batches, outputChannels, height, width]

  • "nhwc":

    • input tensor: [batches, height, width, inputChannels]

    • output tensor: [batches, height, width, outputChannels]

filterLayout, of type MLConv2dFilterOperandLayout, defaulting to "oihw"

Specifies the layout format of the filter tensor as follows:

  • "oihw": [outputChannels, inputChannels/groups, height, width]

  • "hwio": [height, width, inputChannels/groups, outputChannels]

  • "ohwi": [outputChannels, height, width, inputChannels/groups]

  • "ihwo": [inputChannels/groups, height, width, outputChannels]

bias, of type MLOperand

An additional 1-D tensor with the shape of [outputChannels] whose values are to be added to the convolution result.


Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, the spatial dimensions or the sizes of the last two dimensions of the output tensor for the "nchw" input layout can be calculated as follows:

outputSize = 1 + floor((inputSize - (filterSize - 1) * dilation - 1 + beginningPadding + endingPadding) / stride)

Constraints for conv2d()
operand | allowed data types | allowed ranks
input | "float32", "float16" | 4
filter | same as input | 4
bias | same as input | 1
output | same as input | 4

MLConv2dSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

filter, of type MLSupportLimits

MLSupportLimits for filter operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for conv2d():

conv2d, of type MLConv2dSupportLimits

Support limits for operator conv2d().

A depthwise conv2d operation is a variant of grouped convolution, used in models such as MobileNet, where options.groups = inputChannels = outputChannels and the shape of the filter tensor is [options.groups, 1, height, width] for the "oihw" layout, [height, width, 1, options.groups] for the "hwio" layout, [options.groups, height, width, 1] for the "ohwi" layout, and [1, height, width, options.groups] for the "ihwo" layout.
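The depthwise filter shapes listed above follow from the general filter layouts with one input channel per group. The hypothetical helper below simply restates that mapping for each MLConv2dFilterOperandLayout value.

```javascript
// Filter shape of a depthwise conv2d (groups = inputChannels = outputChannels,
// so inputChannels / groups = 1) for each MLConv2dFilterOperandLayout.
function depthwiseFilterShape(groups, height, width, filterLayout = "oihw") {
  switch (filterLayout) {
    case "oihw": return [groups, 1, height, width];
    case "hwio": return [height, width, 1, groups];
    case "ohwi": return [groups, height, width, 1];
    case "ihwo": return [1, height, width, groups];
    default: throw new TypeError(`unknown filter layout ${filterLayout}`);
  }
}

console.log(depthwiseFilterShape(32, 3, 3, "hwio")); // [ 3, 3, 1, 32 ]
```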
To calculate conv output size given unsigned integers inputSize, filterSize, beginningPadding, endingPadding, stride and dilation, perform these steps. They return a number.
  1. Let effectiveFilterSize be ( filterSize - 1 ) * dilation + 1.

  2. Let outputSize be ( inputSize - effectiveFilterSize + beginningPadding + endingPadding ) / stride + 1.

  3. Return outputSize.

To calculate conv2d output sizes given unsigned integers inputHeight, inputWidth, filterHeight and filterWidth, list of 4 unsigned integers padding, list of 2 unsigned integers strides, and list of 2 unsigned integers dilations, perform these steps. They return a list of 2 numbers.
  1. Let outputHeight be the result of calculating conv output size given inputHeight, filterHeight, padding[0], padding[1], strides[0] and dilations[0].

  2. Let outputWidth be the result of calculating conv output size given inputWidth, filterWidth, padding[2], padding[3], strides[1] and dilations[1].

  3. Return « outputHeight, outputWidth ».
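The two output-size algorithms can be sketched directly. These functions are hypothetical helpers, not part of the API. Unlike the spec's "calculate conv output size", which returns an unfloored number, convOutputSize floors its result the way conv2d() does when it builds the output shape. For example, a 3x3 filter with padding 1 on every side and stride 2 on a 224x224 input gives (224 - 3 + 1 + 1) / 2 + 1 = 112.5, which is floored to 112.

```javascript
// Output size for one spatial dimension, floored as conv2d() later does.
function convOutputSize(inputSize, filterSize, beginningPadding, endingPadding,
                        stride, dilation) {
  const effectiveFilterSize = (filterSize - 1) * dilation + 1;
  return Math.floor(
    (inputSize - effectiveFilterSize + beginningPadding + endingPadding) / stride + 1);
}

// padding = [beginningHeight, endingHeight, beginningWidth, endingWidth]
function conv2dOutputSizes(inputHeight, inputWidth, filterHeight, filterWidth,
                           padding, strides, dilations) {
  return [
    convOutputSize(inputHeight, filterHeight, padding[0], padding[1], strides[0], dilations[0]),
    convOutputSize(inputWidth, filterWidth, padding[2], padding[3], strides[1], dilations[1]),
  ];
}

console.log(conv2dOutputSizes(224, 224, 3, 3, [1, 1, 1, 1], [2, 2], [1, 1])); // [ 112, 112 ]
```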

The conv2d(input, filter, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, filter, and options.bias (if it exists) returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If input’s rank is not its allowed rank, then throw a TypeError.

  5. If filter’s rank is not its allowed rank, then throw a TypeError.

  6. If filter’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  7. If options.padding does not exist, set it to the list « 0, 0, 0, 0 ».

  8. Otherwise, if options.padding's size is not 4, then throw a TypeError.

  9. If options.strides does not exist, set it to the list « 1, 1 ».

  10. Otherwise, if options.strides's size is not 2, then throw a TypeError.

  11. If any element in options.strides is equal to 0, then throw a TypeError.

  12. If options.dilations does not exist, set it to the list « 1, 1 ».

  13. Otherwise, if options.dilations's size is not 2, then throw a TypeError.

  14. If any element in options.dilations is equal to 0, then throw a TypeError.

  15. If options.groups is 0, then throw a TypeError.

  16. Calculate the output shape:

    1. Let inputShape be input’s shape.

    2. Switch on options.inputLayout:

      "nchw"

      Let « batches, inputChannels, inputHeight, inputWidth » be inputShape.

      "nhwc"

      Let « batches, inputHeight, inputWidth, inputChannels » be inputShape.

    3. Let filterShape be filter’s shape.

    4. Switch on options.filterLayout:

      "hwio"

      Let « filterHeight, filterWidth, filterInputChannels, outputChannels » be filterShape.

      "ohwi"

      Let « outputChannels, filterHeight, filterWidth, filterInputChannels » be filterShape.

      "ihwo"

      Let « filterInputChannels, filterHeight, filterWidth, outputChannels » be filterShape.

      "oihw"

      Let « outputChannels, filterInputChannels, filterHeight, filterWidth » be filterShape.

    5. If inputChannels % options.groups is not 0, then throw a TypeError.

    6. Otherwise, if inputChannels / options.groups is not equal to filterInputChannels, then throw a TypeError.

    7. If options.bias exists:

      1. If its shape is not equal to « outputChannels », then throw a TypeError.

      2. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    8. Let outputSizes be the result of calculating conv2d output sizes given inputHeight, inputWidth, filterHeight, filterWidth, options.padding, options.strides, and options.dilations.

    9. Switch on options.inputLayout:

      "nchw"

      Let outputShape be « batches, outputChannels, floor( outputSizes[0] ), floor( outputSizes[1] ) ».

      "nhwc"

      Let outputShape be « batches, floor( outputSizes[0] ), floor( outputSizes[1] ), outputChannels ».

    10. If any item in outputShape is not a valid dimension, then throw a TypeError.

    11. Let desc be the result of creating an MLOperandDescriptor given input’s dataType and outputShape.

  17. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "conv2d" operation, given options and filter.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to input and filter.

    5. If options.bias exists, then add it to operator’s inputs.

    6. Set operator’s output to output.

  18. Return output.
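Step 16 of conv2d() can be traced on plain shape arrays. This is a minimal sketch. conv2dOutputShape is a hypothetical helper. It reads sizes according to the two layouts, applies the groups check, and floors each spatial size. The bias check and the "valid dimension" check on the result are omitted.

```javascript
// Hypothetical helper mirroring conv2d() step 16 for plain shape arrays.
// Bias and "valid dimension" checks are omitted.
function conv2dOutputShape(inputShape, filterShape, {
  padding = [0, 0, 0, 0], strides = [1, 1], dilations = [1, 1], groups = 1,
  inputLayout = "nchw", filterLayout = "oihw",
} = {}) {
  // Read the input sizes according to inputLayout ("nchw" or "nhwc").
  const [batches, inputChannels, inputHeight, inputWidth] = inputLayout === "nchw"
    ? inputShape
    : [inputShape[0], inputShape[3], inputShape[1], inputShape[2]];

  // Read the filter as [outputChannels, filterInputChannels, height, width].
  const f = filterShape;
  const filterDims = {
    oihw: [f[0], f[1], f[2], f[3]],
    hwio: [f[3], f[2], f[0], f[1]],
    ohwi: [f[0], f[3], f[1], f[2]],
    ihwo: [f[3], f[0], f[1], f[2]],
  }[filterLayout];
  if (!filterDims) throw new TypeError(`unknown filter layout ${filterLayout}`);
  const [outputChannels, filterInputChannels, filterHeight, filterWidth] = filterDims;

  // Input channels must split evenly into groups that match the filter.
  if (groups === 0 || inputChannels % groups !== 0 ||
      inputChannels / groups !== filterInputChannels) {
    throw new TypeError("input channels do not match filter and groups");
  }

  // Per-dimension output size, floored as in the conv2d() steps.
  const size = (inSize, filterSize, begin, end, stride, dilation) =>
    Math.floor((inSize - ((filterSize - 1) * dilation + 1) + begin + end) / stride + 1);
  const outputHeight = size(inputHeight, filterHeight, padding[0], padding[1],
                            strides[0], dilations[0]);
  const outputWidth = size(inputWidth, filterWidth, padding[2], padding[3],
                           strides[1], dilations[1]);

  return inputLayout === "nchw"
    ? [batches, outputChannels, outputHeight, outputWidth]
    : [batches, outputHeight, outputWidth, outputChannels];
}

// A 7x7, stride-2 convolution with padding 3 on a 224x224 RGB image.
console.log(conv2dOutputShape([1, 3, 224, 224], [64, 3, 7, 7],
  { padding: [3, 3, 3, 3], strides: [2, 2] })); // [ 1, 64, 112, 112 ]
```

The same helper covers the depthwise case: an "nhwc" input of [1, 112, 112, 32] with an "hwio" filter of [3, 3, 1, 32], groups 32, and padding 1 keeps the shape [1, 112, 112, 32].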

7.9.11. convTranspose2d

Compute a 2-D transposed convolution given 4-D input and filter tensors.
enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};

dictionary MLConvTranspose2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  sequence<[EnforceRange] unsigned long> outputPadding;
  sequence<[EnforceRange] unsigned long> outputSizes;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
};

partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input, MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLConv2dSupportLimits convTranspose2d;
};

MLConvTranspose2dOptions has the following members:

padding, of type sequence<[EnforceRange] unsigned long>

A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the convolution input. The default value is [0, 0, 0, 0].

strides, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the convolution input. The default value is [1, 1].

dilations, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied on the convolution filter (kernel). The default value is [1, 1].

outputPadding, of type sequence<[EnforceRange] unsigned long>

A list of length 2, in [height, width] order. Specifies the padding values applied to each spatial dimension of the output tensor. Explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when any value in options.strides is greater than 1.

Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding values to be written to the output tensor.

The default value is [0, 0].

outputSizes, of type sequence<[EnforceRange] unsigned long>

A list of length 2, in [height, width] order. Specifies the sizes of the spatial dimensions of the output tensor, regardless of options.inputLayout. When the output sizes are explicitly specified, the output padding values in outputPadding are ignored.

If not specified, the output sizes are automatically computed.

groups, of type unsigned long, defaulting to 1

The number of groups that input channels and output channels are divided into.

inputLayout, of type MLInputOperandLayout, defaulting to "nchw"

Specifies the layout format of the input and output tensors as follows:

  • "nchw":

    • input tensor: [batches, inputChannels, height, width]

    • output tensor: [batches, outputChannels, height, width]

  • "nhwc":

    • input tensor: [batches, height, width, inputChannels]

    • output tensor: [batches, height, width, outputChannels]

filterLayout, of type MLConvTranspose2dFilterOperandLayout, defaulting to "iohw"

Specifies the layout format of the filter tensor as follows:

  • "iohw": [inputChannels, outputChannels/groups, height, width]

  • "hwoi": [height, width, outputChannels/groups, inputChannels]

  • "ohwi": [outputChannels/groups, height, width, inputChannels]

bias, of type MLOperand

An additional 1-D tensor with the shape of [outputChannels] whose values are to be added to the convolution result.

Arguments:

Returns: an MLOperand. The output 4-D tensor that contains the transposed convolution result. The output shape is interpreted according to the options.inputLayout value. Unless options.outputSizes is explicitly specified, the spatial dimensions of the output tensor are computed from options.outputPadding and the other options as follows:

outputSize = (inputSize - 1) * stride + (filterSize - 1) * dilation + 1 - beginningPadding - endingPadding + outputPadding

Constraints for convTranspose2d()
| operand | allowed data types   | allowed ranks |
|---------|----------------------|---------------|
| input   | "float32", "float16" | 4             |
| filter  | same as input        | 4             |
| bias    | same as input        | 1             |
| output  | same as input        | 4             |

MLOpSupportLimits has the following member for convTranspose2d():

convTranspose2d, of type MLConv2dSupportLimits

Support limits for operator convTranspose2d().

To calculate convtranspose output size given unsigned integers inputSize, filterSize, beginningPadding, endingPadding, stride, dilation, and outputPadding, perform these steps. They return a number.
  1. Let effectiveFilterSize be ( filterSize - 1 ) * dilation + 1.

  2. Let outputSize be ( inputSize - 1 ) * stride + effectiveFilterSize - beginningPadding - endingPadding + outputPadding.

  3. Return outputSize.

To calculate convtranspose2d output sizes given unsigned integers inputHeight, inputWidth, filterHeight and filterWidth, list of 4 unsigned integers padding, list of 2 unsigned integers strides, list of 2 unsigned integers dilations, and list of 2 unsigned integers outputPadding, perform these steps. They return a list of 2 numbers.
  1. Let outputHeight be the result of calculating convtranspose output size given inputHeight, filterHeight, padding[0], padding[1], strides[0], dilations[0], and outputPadding[0].

  2. Let outputWidth be the result of calculating convtranspose output size given inputWidth, filterWidth, padding[2], padding[3], strides[1], dilations[1], and outputPadding[1].

  3. Return « outputHeight, outputWidth ».
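The two output-size algorithms above can be sketched in JavaScript as follows. This is a non-normative illustration: the function names convTransposeOutputSize and convTranspose2dOutputSizes are hypothetical, and the arguments are assumed to be already-validated non-negative integers. The example also shows why outputPadding exists: with a 3x3 filter, stride 2, and no padding, conv2d() maps both 7x7 and 8x8 inputs to a 3x3 output, so transposing a 3x3 input needs outputPadding to choose between them.

```javascript
// Hypothetical sketch of "calculate convtranspose output size" for one spatial dimension.
function convTransposeOutputSize(inputSize, filterSize, beginningPadding,
                                 endingPadding, stride, dilation, outputPadding) {
  // Dilation spreads the filter taps apart.
  const effectiveFilterSize = (filterSize - 1) * dilation + 1;
  return (inputSize - 1) * stride + effectiveFilterSize
      - beginningPadding - endingPadding + outputPadding;
}

// Hypothetical sketch of "calculate convtranspose2d output sizes".
// padding is [beginningHeight, endingHeight, beginningWidth, endingWidth];
// strides, dilations, and outputPadding are [height, width].
function convTranspose2dOutputSizes(inputHeight, inputWidth, filterHeight, filterWidth,
                                    padding, strides, dilations, outputPadding) {
  const outputHeight = convTransposeOutputSize(inputHeight, filterHeight,
      padding[0], padding[1], strides[0], dilations[0], outputPadding[0]);
  const outputWidth = convTransposeOutputSize(inputWidth, filterWidth,
      padding[2], padding[3], strides[1], dilations[1], outputPadding[1]);
  return [outputHeight, outputWidth];
}

// Same 3x3 input, stride 2: outputPadding selects the 7x7 or the 8x8 output.
console.log(convTranspose2dOutputSizes(3, 3, 3, 3, [0, 0, 0, 0], [2, 2], [1, 1], [0, 0])); // [ 7, 7 ]
console.log(convTranspose2dOutputSizes(3, 3, 3, 3, [0, 0, 0, 0], [2, 2], [1, 1], [1, 1])); // [ 8, 8 ]
```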

The convTranspose2d(input, filter, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, filter, and options.bias (if it exists) returns false, then throw a TypeError.

  3. If input’s rank is not its allowed rank, then throw a TypeError.

  4. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  5. If filter’s rank is not its allowed rank, then throw a TypeError.

  6. If filter’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  7. If options.padding does not exist, set it to the list « 0, 0, 0, 0 ».

  8. Otherwise, if options.padding's size is not 4, then throw a TypeError.

  9. If options.strides does not exist, set it to the list « 1, 1 ».

  10. Otherwise, if options.strides's size is not 2, then throw a TypeError.

  11. If any element in options.strides is equal to 0, then throw a TypeError.

  12. If options.dilations does not exist, set it to the list « 1, 1 ».

  13. Otherwise, if options.dilations's size is not 2, then throw a TypeError.

  14. If any element in options.dilations is equal to 0, then throw a TypeError.

  15. If options.outputPadding does not exist, set it to the list « 0, 0 ».

  16. Otherwise, if options.outputPadding's size is not 2, then throw a TypeError.

  17. If options.outputSizes exists:

    1. If its size is not 2, then throw a TypeError.

  18. Otherwise:

    1. If options.outputPadding[0] is greater than or equal to options.strides[0], or options.outputPadding[1] is greater than or equal to options.strides[1], then throw a TypeError.

  19. If options.groups is 0, then throw a TypeError.

  20. Calculate the output shape:

    1. Let inputShape be input’s shape.

    2. Switch on options.inputLayout:

      "nchw"

      Let « batches, inputChannels, inputHeight, inputWidth » be inputShape.

      "nhwc"

      Let « batches, inputHeight, inputWidth, inputChannels » be inputShape.

    3. Let filterShape be filter’s shape.

    4. Switch on options.filterLayout:

      "iohw"

      Let « filterInputChannels, filterOutputChannels, filterHeight, filterWidth » be filterShape.

      "hwoi"

      Let « filterHeight, filterWidth, filterOutputChannels, filterInputChannels » be filterShape.

      "ohwi"

      Let « filterOutputChannels, filterHeight, filterWidth, filterInputChannels » be filterShape.

    5. If inputChannels is not equal to filterInputChannels, then throw a TypeError.

    6. Let outputChannels be filterOutputChannels * options.groups.

    7. If options.bias exists:

      1. If its shape is not equal to « outputChannels », then throw a TypeError.

      2. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    8. If options.outputSizes exists, let outputSizes be options.outputSizes.

    9. Otherwise, let outputSizes be the result of calculating convtranspose2d output sizes given inputHeight, inputWidth, filterHeight, filterWidth, options.padding, options.strides, options.dilations, and options.outputPadding.

    10. Switch on options.inputLayout:

      "nchw"

      Let outputShape be « batches, outputChannels, floor( outputSizes[0] ), floor( outputSizes[1] ) ».

      "nhwc"

      Let outputShape be « batches, floor( outputSizes[0] ), floor( outputSizes[1] ), outputChannels ».

    11. If any item in outputShape is not a valid dimension, then throw a TypeError.

    12. Let desc be the result of creating an MLOperandDescriptor given input’s dataType and outputShape.

  21. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "convTranspose2d" operation, given options and filter.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to input and filter.

    5. If options.bias exists, then add it to operator’s inputs.

    6. Set operator’s output to output.

  22. Return output.
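For illustration, the shape inference in step 20, together with the related option checks, can be approximated by a plain JavaScript function over shape arrays. This is a sketch under stated assumptions, not an implementation of MLGraphBuilder: the helper name convTranspose2dOutputShape is hypothetical, data types and bias are not modeled, option list lengths are not checked, and a valid dimension is approximated as a positive integer.

```javascript
// Hypothetical helper: infers the convTranspose2d() output shape from plain
// shape arrays, following the method steps above. Throws TypeError on failure.
function convTranspose2dOutputShape(inputShape, filterShape, options = {}) {
  const {
    padding = [0, 0, 0, 0], strides = [1, 1], dilations = [1, 1],
    outputPadding = [0, 0], outputSizes, groups = 1,
    inputLayout = "nchw", filterLayout = "iohw",
  } = options;
  if (inputShape.length !== 4 || filterShape.length !== 4) {
    throw new TypeError("input and filter must be 4-D");
  }
  if (strides.includes(0) || dilations.includes(0) || groups === 0) {
    throw new TypeError("strides, dilations, and groups must be non-zero");
  }

  // Unpack the input according to inputLayout.
  let batches, inputChannels, inputHeight, inputWidth;
  switch (inputLayout) {
    case "nchw": [batches, inputChannels, inputHeight, inputWidth] = inputShape; break;
    case "nhwc": [batches, inputHeight, inputWidth, inputChannels] = inputShape; break;
    default: throw new TypeError(`unknown inputLayout "${inputLayout}"`);
  }

  // Unpack the filter according to filterLayout.
  let filterInputChannels, filterOutputChannels, filterHeight, filterWidth;
  switch (filterLayout) {
    case "iohw": [filterInputChannels, filterOutputChannels, filterHeight, filterWidth] = filterShape; break;
    case "hwoi": [filterHeight, filterWidth, filterOutputChannels, filterInputChannels] = filterShape; break;
    case "ohwi": [filterOutputChannels, filterHeight, filterWidth, filterInputChannels] = filterShape; break;
    default: throw new TypeError(`unknown filterLayout "${filterLayout}"`);
  }
  if (inputChannels !== filterInputChannels) throw new TypeError("channel mismatch");
  const outputChannels = filterOutputChannels * groups;

  // Explicit outputSizes win; otherwise compute them, which requires outputPadding < strides.
  let sizes = outputSizes;
  if (sizes === undefined) {
    if (outputPadding[0] >= strides[0] || outputPadding[1] >= strides[1]) {
      throw new TypeError("outputPadding must be smaller than strides");
    }
    const size = (inSize, fSize, begin, end, stride, dilation, outPad) =>
      (inSize - 1) * stride + (fSize - 1) * dilation + 1 - begin - end + outPad;
    sizes = [
      size(inputHeight, filterHeight, padding[0], padding[1], strides[0], dilations[0], outputPadding[0]),
      size(inputWidth, filterWidth, padding[2], padding[3], strides[1], dilations[1], outputPadding[1]),
    ];
  }
  const [height, width] = sizes.map(Math.floor);
  const outputShape = inputLayout === "nchw"
      ? [batches, outputChannels, height, width]
      : [batches, height, width, outputChannels];
  if (!outputShape.every((d) => Number.isInteger(d) && d > 0)) {
    throw new TypeError("output shape has an invalid dimension");
  }
  return outputShape;
}

console.log(convTranspose2dOutputShape([1, 2, 3, 3], [2, 4, 3, 3], { strides: [2, 2] }));
// [ 1, 4, 7, 7 ]
```

The layout switches only reorder the same four values, so the arithmetic is shared by every combination of inputLayout and filterLayout.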

7.9.12. Element-wise binary operations

Compute the element-wise binary addition, subtraction, multiplication, division, power, maximum and minimum of the two input tensors.

The operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
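A minimal sketch of bidirectional broadcasting on shapes, assuming shapes are plain JavaScript arrays of positive integers. The helper name broadcastShapes is hypothetical, and null stands in for the algorithm's failure result.

```javascript
// Hypothetical helper: NumPy-style bidirectional broadcasting of two shapes.
function broadcastShapes(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const output = [];
  for (let i = 0; i < rank; i++) {
    // Align trailing dimensions; a missing leading dimension acts as size 1.
    const a = shapeA[shapeA.length - rank + i] ?? 1;
    const b = shapeB[shapeB.length - rank + i] ?? 1;
    if (a !== b && a !== 1 && b !== 1) return null; // incompatible sizes
    output.push(a === 1 ? b : a);
  }
  return output;
}

console.log(broadcastShapes([5, 1, 3], [4, 1])); // [ 5, 4, 3 ]
console.log(broadcastShapes([2, 3], [4]));       // null
```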

partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand sub(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand mul(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand div(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand max(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand min(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand pow(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits add;
  MLBinarySupportLimits sub;
  MLBinarySupportLimits mul;
  MLBinarySupportLimits div;
  MLBinarySupportLimits max;
  MLBinarySupportLimits min;
  MLBinarySupportLimits pow;
};
Arguments:

Returns: an MLOperand. The output tensor that contains the result of element-wise binary operation of the two input tensors.

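Operation types:

  • add: Add the values of the two input tensors, element-wise.

  • sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.

  • mul: Multiply the values of the two input tensors, element-wise.

  • div: Divide the values of the first input tensor by the values of the second input tensor, element-wise.

  • max: Select the greater values of the two input tensors, element-wise.

  • min: Select the lesser values of the two input tensors, element-wise.

  • pow: Raise the values of the first input tensor to the power of the values of the second input tensor, element-wise.
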
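Constraints for element-wise binary operations
| operand | allowed data types | allowed ranks                      |
|---------|--------------------|------------------------------------|
| a       | any                | N                                  |
| b       | same as a          | N                                  |
| output  | same as a          | maximum of a's rank and b's rank   |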

MLOpSupportLimits has the following members for element-wise binary operations:

add, of type MLBinarySupportLimits

Support limits for operator add().

sub, of type MLBinarySupportLimits

Support limits for operator sub().

mul, of type MLBinarySupportLimits

Support limits for operator mul().

div, of type MLBinarySupportLimits

Support limits for operator div().

max, of type MLBinarySupportLimits

Support limits for operator max().

min, of type MLBinarySupportLimits

Support limits for operator min().

pow, of type MLBinarySupportLimits

Support limits for operator pow().

To create element-wise binary operation given string op, MLOperand a, MLOperand b, and MLOperatorOptions options, run the following steps:
  1. Assert: op is one of "add", "sub", "mul", "div", "max", "min", "pow".

  2. If this can not build, then throw an "InvalidStateError" DOMException.

  3. If validating operand with this and any of a and b returns false, then throw a TypeError.

  4. If a’s dataType is not equal to b’s dataType, then throw a TypeError.

  5. Let outputShape be the result of bidirectionally broadcasting a’s shape and b’s shape.

    1. If that returns failure, then throw a TypeError.

  6. Let descriptor be the result of creating an MLOperandDescriptor given a’s dataType and outputShape.

  7. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and descriptor.

    2. Let operator be an operator for the op operation, given a, b, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to a and b.

    5. Set operator’s output to output.

  8. Return output.

The element-wise binary operation algorithms invoke the create element-wise binary operation steps as follows.
The add(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "add", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The sub(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "sub", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The mul(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "mul", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The div(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "div", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The max(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "max", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The min(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "min", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The pow(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise binary operation given "pow", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

7.9.13. Element-wise logical operations

Compare input tensors element-wise and return a "uint8" tensor of values 0 (false) or 1 (true) for the comparisons. For the single-operand operation logicalNot(), return the element-wise logical negation of the input.

For multiple-operand operations, the operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.

partial interface MLGraphBuilder {
  MLOperand equal(MLOperand a,
                  MLOperand b,
                  optional MLOperatorOptions options = {});
  MLOperand greater(MLOperand a,
                    MLOperand b,
                    optional MLOperatorOptions options = {});
  MLOperand greaterOrEqual(MLOperand a,
                           MLOperand b,
                           optional MLOperatorOptions options = {});
  MLOperand lesser(MLOperand a,
                   MLOperand b,
                   optional MLOperatorOptions options = {});
  MLOperand lesserOrEqual(MLOperand a,
                          MLOperand b,
                          optional MLOperatorOptions options = {});
  MLOperand logicalNot(MLOperand a, optional MLOperatorOptions options = {});
};

dictionary MLLogicalNotSupportLimits {
  MLSupportLimits a;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits equal;
  MLBinarySupportLimits greater;
  MLBinarySupportLimits greaterOrEqual;
  MLBinarySupportLimits lesser;
  MLBinarySupportLimits lesserOrEqual;
  MLLogicalNotSupportLimits logicalNot;
};
Arguments:

Returns: an MLOperand. The output tensor that contains the result of the element-wise comparison of the two input tensors or, for logicalNot(), the element-wise logical negation of the input tensor.

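Constraints for element-wise logical operations
| operand | allowed data types                   | allowed ranks                    |
|---------|--------------------------------------|----------------------------------|
| a       | specified as part of operation steps | N                                |
| b       | same as a                            | N                                |
| output  | "uint8"                              | maximum of a's rank and b's rank |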

MLLogicalNotSupportLimits has the following members:

a, of type MLSupportLimits

MLSupportLimits for a operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following members for element-wise logical operations:

equal, of type MLBinarySupportLimits

Support limits for operator equal().

greater, of type MLBinarySupportLimits

Support limits for operator greater().

greaterOrEqual, of type MLBinarySupportLimits

Support limits for operator greaterOrEqual().

lesser, of type MLBinarySupportLimits

Support limits for operator lesser().

lesserOrEqual, of type MLBinarySupportLimits

Support limits for operator lesserOrEqual().

logicalNot, of type MLLogicalNotSupportLimits

Support limits for operator logicalNot().

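Operation types:

  • equal: Compare if the values of the two input tensors are equal, element-wise.

  • greater: Compare if the values of the first input tensor are greater than the values of the second input tensor, element-wise.

  • greaterOrEqual: Compare if the values of the first input tensor are greater than or equal to the values of the second input tensor, element-wise.

  • lesser: Compare if the values of the first input tensor are less than the values of the second input tensor, element-wise.

  • lesserOrEqual: Compare if the values of the first input tensor are less than or equal to the values of the second input tensor, element-wise.

  • logicalNot: Invert the values of the input tensor, element-wise: a non-zero value becomes 0 and a zero value becomes 1.
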
Although greaterOrEqual() and lesserOrEqual() could each be expressed in terms of logicalNot(), lesser(), and greater() (for example, builder.greaterOrEqual(a, b) as builder.logicalNot(builder.lesser(a, b))), they are defined as separate operations because such decompositions give different results when either input is NaN, and for performance reasons, to avoid a second comparison.
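The NaN difference can be illustrated with plain JavaScript comparisons, which follow IEEE 754 semantics: every ordered comparison involving NaN is false. The helpers below are hypothetical reference functions over same-length flat arrays (broadcasting is omitted) that return Uint8Array values of 0 or 1, mirroring the "uint8" output of these operations; they are not part of the WebNN API.

```javascript
// Hypothetical reference semantics on same-length flat arrays (no broadcasting).
const greaterOrEqual = (a, b) => Uint8Array.from(a, (x, i) => (x >= b[i] ? 1 : 0));
const lesser = (a, b) => Uint8Array.from(a, (x, i) => (x < b[i] ? 1 : 0));
const logicalNot = (a) => Uint8Array.from(a, (x) => (x === 0 ? 1 : 0));

const a = Float32Array.of(1, 2, NaN);
const b = Float32Array.of(2, 2, 0);
console.log(greaterOrEqual(a, b));     // Uint8Array(3) [ 0, 1, 0 ]
console.log(logicalNot(lesser(a, b))); // Uint8Array(3) [ 0, 1, 1 ]  (differs at the NaN)
```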
To create element-wise logical operation given string op, MLOperand a, an optional MLOperand b, and MLOperatorOptions options, run the following steps:
  1. Assert: op is one of "equal", "greater", "greaterOrEqual", "lesser", "lesserOrEqual", "logicalNot".

  2. If this can not build, then throw an "InvalidStateError" DOMException.

  3. If op is "logicalNot":

    1. If validating operand with this and a returns false, then throw a TypeError.

    2. If a’s dataType is not "uint8", then throw a TypeError.

    3. Let outputShape be a clone of a’s shape.

  4. Otherwise:

    1. If validating operand with this and any of a and b returns false, then throw a TypeError.

    2. If a’s dataType is not equal to b’s dataType, then throw a TypeError.

    3. Let outputShape be the result of bidirectionally broadcasting a’s shape and b’s shape. If that returns failure, then throw a TypeError.

  5. Let descriptor be the result of creating an MLOperandDescriptor given "uint8" and outputShape.

  6. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and descriptor.

    2. Let operator be an operator for the op operation, given a and (if op is not "logicalNot") b, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to a and (if op is anything other than "logicalNot") b.

    5. Set operator’s output to output.

  7. Return output.

The element-wise logical operation algorithms invoke the create element-wise logical operation steps as follows.
The equal(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise logical operation given "equal", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The greater(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise logical operation given "greater", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The greaterOrEqual(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise logical operation given "greaterOrEqual", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The lesser(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise logical operation given "lesser", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The lesserOrEqual(a, b, options) method steps are:
  1. Let output be the result of running the create element-wise logical operation given "lesserOrEqual", a, b, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The logicalNot(a, options) method steps are:
  1. Let output be the result of running the create element-wise logical operation given "logicalNot", a, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

7.9.14. Element-wise unary operations

Compute the element-wise unary operation for the input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand ceil(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand cos(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand erf(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand exp(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand floor(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand identity(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand log(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand neg(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand reciprocal(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand sin(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand sqrt(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand tan(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits abs;
  MLSingleInputSupportLimits ceil;
  MLSingleInputSupportLimits cos;
  MLSingleInputSupportLimits erf;
  MLSingleInputSupportLimits exp;
  MLSingleInputSupportLimits floor;
  MLSingleInputSupportLimits identity;
  MLSingleInputSupportLimits log;
  MLSingleInputSupportLimits neg;
  MLSingleInputSupportLimits reciprocal;
  MLSingleInputSupportLimits sin;
  MLSingleInputSupportLimits sqrt;
  MLSingleInputSupportLimits tan;
};
Arguments:

Returns: an MLOperand. The output tensor that contains the result of element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of input tensor.

Constraints for element-wise unary operations
| operand | allowed data types                   | allowed ranks |
|---------|--------------------------------------|---------------|
| input   | specified as part of operation steps | N             |
| output  | same as input                        | same as input |

MLOpSupportLimits has the following members for element-wise unary operations:

abs, of type MLSingleInputSupportLimits

Support limits for operator abs().

ceil, of type MLSingleInputSupportLimits

Support limits for operator ceil().

cos, of type MLSingleInputSupportLimits

Support limits for operator cos().

erf, of type MLSingleInputSupportLimits

Support limits for operator erf().

exp, of type MLSingleInputSupportLimits

Support limits for operator exp().

floor, of type MLSingleInputSupportLimits

Support limits for operator floor().

identity, of type MLSingleInputSupportLimits

Support limits for operator identity().

log, of type MLSingleInputSupportLimits

Support limits for operator log().

neg, of type MLSingleInputSupportLimits

Support limits for operator neg().

reciprocal, of type MLSingleInputSupportLimits

Support limits for operator reciprocal().

sin, of type MLSingleInputSupportLimits

Support limits for operator sin().

sqrt, of type MLSingleInputSupportLimits

Support limits for operator sqrt().

tan, of type MLSingleInputSupportLimits

Support limits for operator tan().

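Operation types:

  • abs: Compute the absolute value of the input tensor, element-wise.

  • ceil: Compute the ceiling of the input tensor, element-wise.

  • cos: Compute the cosine of the input tensor, element-wise.

  • erf: Compute the Gauss error function of the input tensor, element-wise.

  • exp: Compute the exponential of the input tensor, element-wise.

  • floor: Compute the floor of the input tensor, element-wise.

  • identity: Copy the value of the input tensor to the output tensor, element-wise.

  • log: Compute the natural logarithm of the input tensor, element-wise.

  • neg: Compute the numerical negative value of the input tensor, element-wise.

  • reciprocal: Compute the reciprocal of the input tensor, element-wise.

  • sin: Compute the sine of the input tensor, element-wise.

  • sqrt: Compute the square root of the input tensor, element-wise.

  • tan: Compute the tangent of the input tensor, element-wise.
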
To create element-wise unary operation given string op, MLOperand input, optional list allowedDataTypes, and MLOperatorOptions options, run the following steps:
  1. Assert: op is one of "abs", "ceil", "cos", "erf", "exp", "floor", "identity", "log", "neg", "reciprocal", "sin", "sqrt", "tan".

  2. If this can not build, then throw an "InvalidStateError" DOMException.

  3. If validating operand with this and input returns false, then throw a TypeError.

  4. If allowedDataTypes is given and it does not contain input’s dataType, then throw a TypeError.

  5. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the op operation given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  6. Return output.

The element-wise unary operation algorithms invoke the create element-wise unary operation steps as follows.
The abs(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "abs", input, « "float32", "float16", "int32", "int8" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The ceil(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "ceil", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The cos(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "cos", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The erf(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "erf", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The exp(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "exp", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The floor(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "floor", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The identity(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "identity", input, and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The log(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "log", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The neg(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "neg", input, « "float32", "float16", "int32", "int8" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reciprocal(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "reciprocal", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The sin(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "sin", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The sqrt(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "sqrt", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The tan(input, options) method steps are:
  1. Let output be the result of running the create element-wise unary operation given "tan", input, « "float32", "float16" », and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.
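The per-operation data type restrictions passed to the create element-wise unary operation steps can be summarized as a lookup table. The sketch below restates the lists from the method steps above; the table name and checkUnaryDataType() are hypothetical, and only the data type check (step 4) is modeled. Because the output is a copy of the input operand, its data type and shape always match the input.

```javascript
// Allowed input data types per unary op, restated from the method steps.
// identity passes no allowedDataTypes list, so any data type is accepted.
const FLOAT_TYPES = ["float32", "float16"];
const SIGNED_TYPES = ["float32", "float16", "int32", "int8"];
const UNARY_ALLOWED_DATA_TYPES = {
  abs: SIGNED_TYPES, neg: SIGNED_TYPES,
  ceil: FLOAT_TYPES, cos: FLOAT_TYPES, erf: FLOAT_TYPES, exp: FLOAT_TYPES,
  floor: FLOAT_TYPES, log: FLOAT_TYPES, reciprocal: FLOAT_TYPES,
  sin: FLOAT_TYPES, sqrt: FLOAT_TYPES, tan: FLOAT_TYPES,
  identity: undefined,
};

// Hypothetical validator mirroring step 4 of "create element-wise unary operation".
function checkUnaryDataType(op, dataType) {
  if (!(op in UNARY_ALLOWED_DATA_TYPES)) throw new Error(`unknown op "${op}"`);
  const allowed = UNARY_ALLOWED_DATA_TYPES[op];
  if (allowed !== undefined && !allowed.includes(dataType)) {
    throw new TypeError(`${op}() does not accept "${dataType}"`);
  }
  return true;
}

console.log(checkUnaryDataType("neg", "int8")); // true
```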

7.9.15. elu

Calculate the exponential linear unit function (ELU) on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
dictionary MLEluOptions : MLOperatorOptions {
  double alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand input, optional MLEluOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits elu;
};

MLEluOptions has the following members:

alpha, of type double, defaulting to 1

A scalar multiplier.

Arguments:

Returns: an MLOperand. The output tensor of the same shape as input.

Constraints for elu()
| operand | allowed data types   | allowed ranks |
|---------|----------------------|---------------|
| input   | "float32", "float16" | N             |
| output  | same as input        | same as input |

MLOpSupportLimits has the following members for elu():

elu, of type MLSingleInputSupportLimits

Support limits for operator elu().

The elu(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Set options.alpha to the result of casting options.alpha to input’s dataType.

  5. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "elu" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  6. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function elu(builder, input, options) {
  return builder.add(
    builder.max(builder.constant(input.dataType, 0), input),
    builder.mul(
      builder.constant(input.dataType, options.alpha),
      builder.sub(
        builder.exp(builder.min(builder.constant(input.dataType, 0), input)),
        builder.constant(input.dataType, 1))));
}
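As a complement to the graph-level decomposition above, a scalar reference for the ELU expression can be written directly over typed arrays. This sketch assumes "float32" data: Math.fround models casting options.alpha to float32, and Float32Array rounds each result to float32. The helper name eluFloat32 is hypothetical, and float16 casting is not shown.

```javascript
// Hypothetical scalar reference for max(0, x) + alpha * (exp(min(0, x)) - 1),
// assuming "float32" data.
function eluFloat32(values, alpha = 1) {
  const a = Math.fround(alpha); // cast alpha to float32, as in step 4
  // Float32Array.from rounds every computed value to float32.
  return Float32Array.from(values,
      (x) => Math.max(0, x) + a * (Math.exp(Math.min(0, x)) - 1));
}

console.log(eluFloat32([-1, 0, 2])); // negative inputs saturate toward -alpha
```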

7.9.16. expand

Expand any dimension of size 1 of the input tensor to a larger size according to the new shape. The expansion is consistent with [numpy-broadcasting-rule]. The input tensor must be unidirectionally broadcastable to the new shape; each dimension must be of size 1 or match the sizes of the corresponding output dimensions according to the new shape.
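A minimal sketch of the shape check behind expand(), assuming shapes are plain arrays of positive integers. The helper name expandShape is hypothetical, and null stands in for the failure result of unidirectional broadcasting. Unlike bidirectional broadcasting, newShape is never altered: each input dimension, aligned from the trailing end, must be 1 or equal to the corresponding newShape dimension.

```javascript
// Hypothetical helper: unidirectionally broadcast inputShape to newShape.
function expandShape(inputShape, newShape) {
  // The input cannot have more dimensions than the target shape.
  if (inputShape.length > newShape.length) return null;
  const offset = newShape.length - inputShape.length;
  for (let i = 0; i < inputShape.length; i++) {
    const from = inputShape[i];
    const to = newShape[offset + i];
    if (from !== to && from !== 1) return null; // only size-1 dims may grow
  }
  return [...newShape];
}

console.log(expandShape([3, 1], [2, 3, 4])); // [ 2, 3, 4 ]
console.log(expandShape([3, 2], [3, 4]));    // null
```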
partial interface MLGraphBuilder {
  MLOperand expand(MLOperand input,
                   sequence<[EnforceRange] unsigned long> newShape,
                   optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits expand;
};
Arguments:

Returns: an MLOperand. The output tensor, whose shape is newShape.

Constraints for expand()
| operand | allowed data types | allowed ranks    |
|---------|--------------------|------------------|
| input   | any                | N                |
| output  | same as input      | newShape's size  |

MLOpSupportLimits has the following members for expand():

expand, of type MLSingleInputSupportLimits

Support limits for operator expand().

The expand(input, newShape, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. Let outputShape be the result of unidirectionally broadcasting input’s shape and newShape.

    1. If that returns failure, then throw a TypeError.

  4. Let outputDescriptor be the result of creating an MLOperandDescriptor given input’s dataType and outputShape.

  5. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and outputDescriptor.

    2. Let operator be an operator for the "expand" operation, given input, newShape, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  6. Return output.

7.9.17. gather

Gather values of the input tensor along an axis according to the indices.
dictionary MLGatherOptions : MLOperatorOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  MLOperand gather(MLOperand input,
                   MLOperand indices,
                   optional MLGatherOptions options = {});
};

dictionary MLGatherSupportLimits {
  MLSupportLimits input;
  MLSupportLimits indices;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLGatherSupportLimits gather;
};

MLGatherOptions has the following members:

axis, of type unsigned long, defaulting to 0

The axis along which the gathered values are obtained. Its value must be in the range [0, N-1] where N is the rank of the input tensor.

Arguments:

Returns: an MLOperand. The output N-D tensor of rank equal to the rank of input + the rank of indices - 1.

The indices parameter to gather() cannot be clamped to the allowed range when the graph is built because the index values are not known until execution. Implementations can introduce clamp() in the compiled graph if the specified clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
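The exact clamping rule is not spelled out in this section, so the following per-index sketch is an assumption: a negative index is first converted to a position counted from the end of the dimension, and the result is then clamped into [0, dimSize - 1]. The helper normalizeGatherIndex is hypothetical and only illustrates the kind of transformation an implementation might insert into the compiled graph.

```javascript
// Hypothetical helper (assumed behavior, not a normative rule): map a
// possibly negative gather index onto [0, dimSize - 1]. Negative indices
// count from the end of the dimension; anything still out of range is
// clamped to the nearest valid position.
function normalizeGatherIndex(index, dimSize) {
  const i = index < 0 ? index + dimSize : index;
  return Math.min(Math.max(i, 0), dimSize - 1);
}

console.log(normalizeGatherIndex(-1, 4)); // 3
console.log(normalizeGatherIndex(7, 4));  // 3
console.log(normalizeGatherIndex(-9, 4)); // 0
```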
Constraints for gather()
operand allowed data types allowed ranks
input any N
indices "int32", "uint32", "int64" N
output same as input input's rank + indices's rank - 1

MLGatherSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

indices, of type MLSupportLimits

MLSupportLimits for indices operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for gather():

gather, of type MLGatherSupportLimits

Support limits for operator gather().

The gather(input, indices, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input and indices returns false, then throw a TypeError.

  3. If indices’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Let shapeInput be input’s shape and rankInput be shapeInput’s rank.

  5. Let shapeIndices be indices’s shape.

  6. Let axis be options.axis.

  7. If axis is greater than or equal to rankInput, then throw a TypeError.

  8. Let dimCount be zero.

  9. Let rankOutput be zero.

  10. Let shapeOutput be an empty list.

  11. For each size of shapeInput:

    1. If dimCount is equal to axis then break.

    2. Set shapeOutput[dimCount] to size.

    3. Increment dimCount by one.

  12. Set rankOutput to dimCount.

  13. Set dimCount to zero.

  14. For each size of shapeIndices:

    1. Set shapeOutput[rankOutput + dimCount] to size.

    2. Increment dimCount by one.

  15. Set rankOutput to rankOutput + dimCount.

  16. Set dimCount to zero.

  17. For each size of shapeInput:

    1. If dimCount is greater than axis, then set shapeOutput[rankOutput + dimCount - axis - 1] to size.

    2. Increment dimCount by one.

  18. Let desc be the result of creating an MLOperandDescriptor given input’s dataType and shapeOutput.

  19. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "gather" operation, given input, indices, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to input and indices.

    5. Set operator’s output to output.

  20. Return output.
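Steps 4 through 17 amount to replacing the dimension at axis in the input shape with the entire indices shape. A compact sketch of that shape computation on plain arrays follows; gatherOutputShape is a hypothetical helper, not part of the API.

```javascript
// Hypothetical sketch of the gather() shape computation: the output shape
// is input's shape with the dimension at `axis` replaced by indices' shape,
// so the output rank is rank(input) + rank(indices) - 1.
function gatherOutputShape(inputShape, indicesShape, axis = 0) {
  if (axis >= inputShape.length) {
    throw new TypeError(`axis ${axis} is out of range for rank ${inputShape.length}`);
  }
  return [
    ...inputShape.slice(0, axis),   // dimensions before axis
    ...indicesShape,                // replaces the axis dimension
    ...inputShape.slice(axis + 1),  // dimensions after axis
  ];
}

console.log(JSON.stringify(gatherOutputShape([4, 3], [2])));       // [2,3]
console.log(JSON.stringify(gatherOutputShape([4, 3], [2, 2], 1))); // [4,2,2]
```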

Examples of how gather works in different slicing schemes.
// input of shape [4,3]:
//   [[ 0,  1,  2],
//    [10, 11, 12],
//    [20, 21, 22],
//    [30, 31, 32]]
const input = builder.constant(
  {dataType: 'float32', shape: [4, 3]},
  new Float32Array([0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32]));

const indices1 =
  builder.constant({dataType: 'uint32', shape: [2]}, new Uint32Array([3, 1]));

const indices2 = builder.constant(
  {dataType: 'uint32', shape: [3]}, new Uint32Array([2, 1, 1]));

const indices3 = builder.constant(
  {dataType: 'uint32', shape: [2, 2]}, new Uint32Array([0, 1, 1, 2]));

// axis = 0 (default)
// indices of shape [2]:
//   [3,1]
// output of shape [2,3]:
//   [[30, 31, 32],
//    [10, 11, 12]]
const output1 = builder.gather(input, indices1);

// axis = 1
// indices of shape [3]:
//   [2,1,1]
// output of shape [4,3]:
//   [[ 2,  1,  1],
//    [12, 11, 11],
//    [22, 21, 21],
//    [32, 31, 31]]
const output2 = builder.gather(input, indices2, {axis: 1});

// axis = 1
// indices of shape [2,2]:
//   [[0, 1],
//    [1, 2]]
// output of shape [4,2,2]:
//   [[[ 0,  1], [ 1,  2]],
//    [[10, 11], [11, 12]],
//    [[20, 21], [21, 22]],
//    [[30, 31], [31, 32]]]
const output3 = builder.gather(input, indices3, {axis: 1});
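To check results like the ones above outside of a WebNN context, a host-side reference over a flat, row-major buffer can help. This sketch assumes all indices are already in range; gatherReference is a hypothetical helper that takes the indices tensor flattened in row-major order.

```javascript
// Hypothetical host-side reference for gather() over a flat, row-major
// buffer. `indices` is the flattened indices tensor; every index is assumed
// to be in range already.
function gatherReference(data, inputShape, indices, axis = 0) {
  const product = (dims) => dims.reduce((acc, d) => acc * d, 1);
  const outer = product(inputShape.slice(0, axis));   // elements before axis
  const axisSize = inputShape[axis];
  const inner = product(inputShape.slice(axis + 1));  // elements after axis
  const out = [];
  for (let o = 0; o < outer; ++o) {
    for (const index of indices) {
      // Copy the contiguous run of `inner` elements selected by `index`.
      const base = (o * axisSize + index) * inner;
      for (let k = 0; k < inner; ++k) out.push(data[base + k]);
    }
  }
  return out;
}

const data = [0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32];
console.log(JSON.stringify(gatherReference(data, [4, 3], [3, 1])));
// [30,31,32,10,11,12]
console.log(JSON.stringify(gatherReference(data, [4, 3], [2, 1, 1], 1)));
// [2,1,1,12,11,11,22,21,21,32,31,31]
```

Because the indices dimensions sit between the outer and inner blocks of the output, iterating the flattened indices in order produces the correct row-major output layout for indices of any rank.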

7.9.18. gelu

Compute the Gaussian error linear unit function (GELU) of the input tensor. The calculation follows the expression 0.5 * x * (1 + erf(x / sqrt(2))).
partial interface MLGraphBuilder {
  MLOperand gelu(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits gelu;
};
Arguments:

Returns: an MLOperand. The output tensor of the same shape as input.

Constraints for gelu()
operand allowed data types allowed ranks
input "float32", "float16" N
output same as input same as input

MLOpSupportLimits has the following member for gelu():

gelu, of type MLSingleInputSupportLimits

Support limits for operator gelu().

The gelu(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "gelu" operation given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function gelu(builder, input) {
  return builder.mul(
    builder.mul(input, builder.constant(input.dataType, 0.5)),
    builder.add(
      builder.constant(input.dataType, 1),
      builder.erf(builder.div(
        input, builder.sqrt(builder.constant(input.dataType, 2))))));
}
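When validating gelu() output numerically on the host, a scalar reference of the same expression is handy. JavaScript has no built-in erf, so this sketch substitutes the Abramowitz and Stegun approximation 7.1.26 (absolute error below about 1.5e-7); that substitution is an assumption of the sketch and not something the specification requires.

```javascript
// erf approximated with Abramowitz and Stegun formula 7.1.26, using odd
// symmetry erf(-x) = -erf(x) for negative inputs.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Scalar GELU reference: 0.5 * x * (1 + erf(x / sqrt(2))).
function geluReference(x) {
  return 0.5 * x * (1 + erf(x / Math.SQRT2));
}

console.log(geluReference(0));            // 0
console.log(geluReference(1).toFixed(4)); // 0.8413
```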

7.9.19. gemm

Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is unidirectionally broadcastable to the shape [M, N]. A and B can optionally be transposed prior to the calculation.
dictionary MLGemmOptions : MLOperatorOptions {
  MLOperand c;
  double alpha = 1.0;
  double beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};

dictionary MLGemmSupportLimits {
  MLSupportLimits a;
  MLSupportLimits b;
  MLSupportLimits c;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLGemmSupportLimits gemm;
};

MLGemmOptions has the following members:

c, of type MLOperand

The third input tensor. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape [M, N]. When it is not specified, the computation is done as if c is a scalar 0.0.

alpha, of type double, defaulting to 1.0

A multiplier for the first input.

beta, of type double, defaulting to 1.0

A multiplier for the third input c.

aTranspose, of type boolean, defaulting to false

Indicates if the first input is transposed prior to calculating the output.

bTranspose, of type boolean, defaulting to false

Indicates if the second input is transposed prior to calculating the output.

Arguments:

Returns: an MLOperand. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.

Constraints for gemm()
operand allowed data types allowed ranks
a "float32", "float16" 2
b same as a 2
c same as a 0 to 2
output same as a 2

MLGemmSupportLimits has the following members:

a, of type MLSupportLimits

MLSupportLimits for a operand.

b, of type MLSupportLimits

MLSupportLimits for b operand.

c, of type MLSupportLimits

MLSupportLimits for c operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for gemm():

gemm, of type MLGemmSupportLimits

Support limits for operator gemm().

The gemm(a, b, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of a, b, and options.c (if it exists) returns false, then throw a TypeError.

  3. If the dataType of any of a or b is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If the rank of any of a or b is not its allowed rank, then throw a TypeError.

  5. Set options.alpha to the result of casting options.alpha to a’s dataType.

  6. Set options.beta to the result of casting options.beta to a’s dataType.

  7. Let shapeA be a clone of a’s shape.

  8. Let shapeB be a clone of b’s shape.

  9. If options.aTranspose is true, then reverse the order of the items in shapeA.

  10. If options.bTranspose is true, then reverse the order of the items in shapeB.

  11. If shapeA[1] is not equal to shapeB[0], then throw a TypeError.

  12. If options.c exists:

    1. If it is not unidirectionally broadcastable to the shape « shapeA[0], shapeB[1] », then throw a TypeError.

    2. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  13. Let desc be the result of creating an MLOperandDescriptor given a’s dataType and « shapeA[0], shapeB[1] ».

  14. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "gemm" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to a and b.

    5. If options.c exists, then add it to operator’s inputs.

    6. Set operator’s output to output.

  15. Return output.
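The shape checks in steps 7 through 13 can be sketched on plain arrays as follows. gemmOutputShape is a hypothetical helper; the broadcastability check for c assumes the right-aligned unidirectional rule (each dimension of c is 1 or matches [M, N]), and data type checks are omitted.

```javascript
// Hypothetical sketch of the gemm() shape checks. Returns [M, N] or throws
// a TypeError, mirroring the method steps.
function gemmOutputShape(shapeA, shapeB,
                         {aTranspose = false, bTranspose = false, shapeC} = {}) {
  if (shapeA.length !== 2 || shapeB.length !== 2) {
    throw new TypeError('a and b must be 2-D');
  }
  // Reversing a 2-D shape is the same as swapping its two dimensions.
  const [m, kA] = aTranspose ? [shapeA[1], shapeA[0]] : shapeA;
  const [kB, n] = bTranspose ? [shapeB[1], shapeB[0]] : shapeB;
  if (kA !== kB) throw new TypeError(`inner dimensions differ: ${kA} vs ${kB}`);
  if (shapeC !== undefined) {
    if (shapeC.length > 2) throw new TypeError('c has rank greater than 2');
    const target = [m, n];
    const offset = 2 - shapeC.length;  // right-align c against [M, N]
    shapeC.forEach((d, i) => {
      if (d !== 1 && d !== target[offset + i]) {
        throw new TypeError('c is not broadcastable to [M, N]');
      }
    });
  }
  return [m, n];
}

console.log(JSON.stringify(gemmOutputShape([3, 4], [4, 5])));  // [3,5]
console.log(JSON.stringify(gemmOutputShape(
    [4, 3], [5, 4], {aTranspose: true, bTranspose: true, shapeC: [5]})));  // [3,5]
```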

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function gemm(builder, a, b, options = {}) {
  // Apply the dictionary defaults so the decomposition also works when
  // alpha or beta is omitted.
  const alpha = options.alpha ?? 1.0;
  const beta = options.beta ?? 1.0;

  if (options.aTranspose)
    a = builder.transpose(a);

  if (options.bTranspose)
    b = builder.transpose(b);

  const ab = builder.matmul(
    builder.mul(builder.constant(a.dataType, alpha), a), b);
  return (
    options.c ?
      builder.add(
        ab,
        builder.mul(builder.constant(a.dataType, beta), options.c)) :
      ab);
}
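For host-side checking of gemm() results, here is a small numeric reference on nested arrays. To keep it short, it assumes c is omitted, a plain number, or a full [M][N] array, rather than any shape that is unidirectionally broadcastable to [M, N]; gemmReference and transpose2d are hypothetical helpers.

```javascript
// Transpose a 2-D nested array.
function transpose2d(x) {
  return x[0].map((_, j) => x.map((row) => row[j]));
}

// Hypothetical reference for alpha * A * B + beta * C on nested arrays.
// c may be undefined (treated as 0), a number, or a full [M][N] array.
function gemmReference(a, b, {c, alpha = 1.0, beta = 1.0,
                              aTranspose = false, bTranspose = false} = {}) {
  const A = aTranspose ? transpose2d(a) : a;
  const B = bTranspose ? transpose2d(b) : b;
  const m = A.length, k = A[0].length, n = B[0].length;
  if (B.length !== k) throw new TypeError('inner dimensions differ');
  const out = [];
  for (let i = 0; i < m; ++i) {
    out.push([]);
    for (let j = 0; j < n; ++j) {
      let sum = 0;
      for (let p = 0; p < k; ++p) sum += A[i][p] * B[p][j];
      const cij = c === undefined ? 0 : (typeof c === 'number' ? c : c[i][j]);
      out[i].push(alpha * sum + beta * cij);
    }
  }
  return out;
}

console.log(JSON.stringify(
    gemmReference([[1, 2], [3, 4]], [[5, 6], [7, 8]], {c: 1, beta: 2})));
// [[21,24],[45,52]]
```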

7.9.20. gru

Gated Recurrent Unit [GRU] recurrent network uses an update, reset, and new gate to compute the output state that rolls into the output across the temporal sequence of the network.
enum MLGruWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkActivation {
  "relu",
  "sigmoid",
  "tanh"
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLGruWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input,
                          MLOperand weight,
                          MLOperand recurrentWeight,
                          [EnforceRange] unsigned long steps,
                          [EnforceRange] unsigned long hiddenSize,
                          optional MLGruOptions options = {});
};

dictionary MLGruSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits initialHiddenState;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLGruSupportLimits gru;
};

MLGruOptions has the following members:

bias, of type MLOperand

The 2-D input bias tensor of shape [numDirections, 3 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the layout argument.

recurrentBias, of type MLOperand

The 2-D recurrent bias tensor of shape [numDirections, 3 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the layout argument.

initialHiddenState, of type MLOperand

The 3-D initial hidden state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations must use a tensor filled with zeros.

resetAfter, of type boolean, defaulting to true

Indicates whether to apply the reset gate after or before matrix multiplication.

returnSequence, of type boolean, defaulting to false

Indicates whether to return, in addition to the output of the last time step, the entire sequence of outputs from every time step.

direction, of type MLRecurrentNetworkDirection, defaulting to "forward"

The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.

layout, of type MLGruWeightLayout, defaulting to "zrn"

The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape.

activations, of type sequence<MLRecurrentNetworkActivation>

Specifies a pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, defaults to the "sigmoid" and "tanh" functions, respectively.

Arguments:

Returns: sequence<MLOperand>. The first element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the cell output from the last time step of the network. Additionally, if options.returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, numDirections, batchSize, hiddenSize] containing the cell outputs from every time step in the temporal sequence.

Constraints for gru()
operand allowed data types allowed ranks
input "float32", "float16" 3
weight same as input 3
recurrentWeight same as input 3
bias same as input 2
recurrentBias same as input 2
initialHiddenState same as input 3
outputs[0] same as input 3
outputs[1] if returnSequence is true same as input 4

MLGruSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

weight, of type MLSupportLimits

MLSupportLimits for weight operand.

recurrentWeight, of type MLSupportLimits

MLSupportLimits for recurrentWeight operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

recurrentBias, of type MLSupportLimits

MLSupportLimits for recurrentBias operand.

initialHiddenState, of type MLSupportLimits

MLSupportLimits for initialHiddenState operand.

outputs, of type MLSupportLimits

MLSupportLimits for all the output operands.

MLOpSupportLimits has the following member for gru():

gru, of type MLGruSupportLimits

Support limits for operator gru().

The gru(input, weight, recurrentWeight, steps, hiddenSize, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, weight, recurrentWeight, options.bias (if it exists), options.recurrentBias (if it exists), and options.initialHiddenState (if it exists) returns false, then throw a TypeError.

  3. If the dataType of any of input, weight or recurrentWeight is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If the rank of any of input, weight or recurrentWeight is not its allowed rank, then throw a TypeError.

  5. If input’s shape[0] is not equal to steps, then throw a TypeError.

  6. Let batchSize be input’s shape[1].

  7. Let inputSize be input’s shape[2].

  8. Let numDirections be 2 if options.direction is "both", or 1 otherwise.

  9. If weight’s shape is not equal to « numDirections, 3 * hiddenSize, inputSize », then throw a TypeError.

  10. If recurrentWeight’s shape is not equal to « numDirections, 3 * hiddenSize, hiddenSize », then throw a TypeError.

  11. If hiddenSize * 6 is not a valid dimension, then throw a TypeError.

    Why hiddenSize * 6? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 3 * hiddenSize + 3 * hiddenSize also needs to be a valid dimension.
  12. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, 3 * hiddenSize », then throw a TypeError.

  13. If options.recurrentBias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, 3 * hiddenSize », then throw a TypeError.

  14. If options.initialHiddenState exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, batchSize, hiddenSize », then throw a TypeError.

  15. If options.activations exists:

    1. If its size is not 2, then throw a TypeError.

    2. Let activations be a clone of options.activations.

  16. Otherwise:

    1. Let activations be « "sigmoid", "tanh" ».

  17. Calculate the output shape:

    1. Let desc0 be the result of creating an MLOperandDescriptor given input’s dataType and « numDirections, batchSize, hiddenSize ».

    2. If options.returnSequence is true:

      1. Let desc1 be the result of creating an MLOperandDescriptor given input’s dataType and « steps, numDirections, batchSize, hiddenSize ».

  18. Make graph connections:

    1. Let operator be an operator for the "gru" operation, given weight, recurrentWeight, steps, hiddenSize and options.

    2. Let output0 be the result of creating an MLOperand given this and desc0.

    3. If options.returnSequence is true:

      1. Let output1 be the result of creating an MLOperand given this and desc1.

      2. Let output be the list « output0, output1 ».

      3. Set output0.[[operator]] and output1.[[operator]] to operator.

    4. Otherwise:

      1. Let output be the list « output0 ».

      2. Set output0.[[operator]] to operator.

    5. Set operator’s inputs to input, weight, and recurrentWeight.

    6. If options.bias exists, then add it to operator’s inputs.

    7. If options.recurrentBias exists, then add it to operator’s inputs.

    8. If options.initialHiddenState exists, then add it to operator’s inputs.

    9. Set operator’s activation functions to a clone of activations.

    10. Set operator’s outputs to output.

  19. Return output.
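The core shape rules in steps 5 through 17 can be sketched on plain arrays. gruOutputShapes is a hypothetical helper; for brevity it omits the data type checks, the valid-dimension check, and the bias, recurrentBias, and initialHiddenState shape checks.

```javascript
// Hypothetical sketch of the main gru() shape checks and output shapes.
function gruOutputShapes(inputShape, weightShape, recurrentWeightShape,
                         steps, hiddenSize,
                         {direction = 'forward', returnSequence = false} = {}) {
  const sameShape = (x, y) =>
      x.length === y.length && x.every((d, i) => d === y[i]);
  if (inputShape.length !== 3 || inputShape[0] !== steps) {
    throw new TypeError('input must have shape [steps, batchSize, inputSize]');
  }
  const [, batchSize, inputSize] = inputShape;
  const numDirections = direction === 'both' ? 2 : 1;
  if (!sameShape(weightShape, [numDirections, 3 * hiddenSize, inputSize])) {
    throw new TypeError('unexpected weight shape');
  }
  if (!sameShape(recurrentWeightShape, [numDirections, 3 * hiddenSize, hiddenSize])) {
    throw new TypeError('unexpected recurrentWeight shape');
  }
  const shapes = [[numDirections, batchSize, hiddenSize]];
  if (returnSequence) shapes.push([steps, numDirections, batchSize, hiddenSize]);
  return shapes;
}

console.log(JSON.stringify(gruOutputShapes(
    [5, 2, 8], [2, 12, 8], [2, 12, 4], 5, 4,
    {direction: 'both', returnSequence: true})));
// [[2,2,4],[5,2,2,4]]
```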

Using a squeeze() helper, the behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function gru(
  builder, input, weight, recurrentWeight, steps, hiddenSize, options = {}) {
  const batchSize = input.shape[1];
  const inputSize = input.shape[2];
  const numDirections = (options.direction == 'both' ? 2 : 1);
  let hiddenState = options.initialHiddenState;

  if (!hiddenState) {
    // Zero-filled initial state of shape [numDirections, batchSize, hiddenSize]
    // in input's data type, built by expanding a scalar zero.
    hiddenState = builder.expand(
      builder.constant(input.dataType, 0),
      [numDirections, batchSize, hiddenSize]);
  }

  let sequence = null;
  let currentWeight = [];
  let currentRecurrentWeight = [];
  let currentBias = [];
  let currentRecurrentBias = [];

  for (let dir = 0; dir < numDirections; ++dir) {
    currentWeight.push(squeeze(
      builder,
      builder.slice(weight, [dir, 0, 0], [1, 3 * hiddenSize, inputSize])));
    currentRecurrentWeight.push(squeeze(
      builder,
      builder.slice(
        recurrentWeight, [dir, 0, 0], [1, 3 * hiddenSize, hiddenSize])));
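    // Use undefined (not null) so that the gruCell() bias options are
    // treated as absent rather than failing MLOperand conversion.
    currentBias.push(
      options.bias ?
        (squeeze(
          builder,
          builder.slice(options.bias, [dir, 0], [1, 3 * hiddenSize]))) :
        undefined);
    currentRecurrentBias.push(
      options.recurrentBias ?
        (squeeze(
          builder,
          builder.slice(
            options.recurrentBias, [dir, 0], [1, 3 * hiddenSize]))) :
        undefined);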
  }

  for (let step = 0; step < steps; ++step) {
    let currentHidden = [];
    let currentOutput = null;

    for (let dir = 0; dir < numDirections; ++dir) {
      currentHidden.push(squeeze(
        builder,
        builder.slice(hiddenState, [dir, 0, 0], [1, batchSize, hiddenSize])));
    }

    for (let dir = 0; dir < numDirections; ++dir) {
      let slice =
        (dir == 1 || options.direction == 'backward' ? steps - step - 1 : step);
      let currentInput = squeeze(
        builder,
        builder.slice(input, [slice, 0, 0], [1, batchSize, inputSize]));

      let result = builder.reshape(
        builder.gruCell(
          currentInput,
          currentWeight[dir],
          currentRecurrentWeight[dir],
          currentHidden[dir],
          hiddenSize,
          {
            bias: currentBias[dir],
            recurrentBias: currentRecurrentBias[dir],
            resetAfter: options.resetAfter,
            layout: options.layout,
            activations: options.activations
          }),
        [1, batchSize, hiddenSize]);

      currentOutput =
        (currentOutput ? builder.concat([currentOutput, result], 0) : result);
    }

    hiddenState = currentOutput;

    if (options.returnSequence) {
      currentOutput = builder.reshape(
        currentOutput, [1, numDirections, batchSize, hiddenSize]);
      sequence =
        (sequence ? builder.concat([sequence, currentOutput], 0) :
                    currentOutput);
    }
  }

  return (sequence ? [hiddenState, sequence] : [hiddenState]);
}

7.9.21. gruCell

A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLGruWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input,
                    MLOperand weight,
                    MLOperand recurrentWeight,
                    MLOperand hiddenState,
                    [EnforceRange] unsigned long hiddenSize,
                    optional MLGruCellOptions options = {});
};

dictionary MLGruCellSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits hiddenState;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLGruCellSupportLimits gruCell;
};

MLGruCellOptions has the following members:

bias, of type MLOperand

The 1-D input bias tensor of shape [3 * hiddenSize]. The ordering of the bias vectors along its single dimension is specified according to the layout argument.

recurrentBias, of type MLOperand

The 1-D recurrent bias tensor of shape [3 * hiddenSize]. The ordering of the bias vectors along its single dimension is specified according to the layout argument.

resetAfter, of type boolean, defaulting to true

Indicates whether to apply the reset gate after or before matrix multiplication.

layout, of type MLGruWeightLayout, defaulting to "zrn"

The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the first dimension of the weight and bias tensor shapes.

activations, of type sequence<MLRecurrentNetworkActivation>

Specifies a pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, defaults to the "sigmoid" and "tanh" functions, respectively.

Arguments:

Returns: an MLOperand. The 2-D tensor of shape [batchSize, hiddenSize], the cell output hidden state of a single time step of the recurrent network.

Constraints for gruCell()
operand allowed data types allowed ranks
input "float32", "float16" 2
weight same as input 2
recurrentWeight same as input 2
hiddenState same as input 2
bias same as input 1
recurrentBias same as input 1
output same as input 2

MLGruCellSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

weight, of type MLSupportLimits

MLSupportLimits for weight operand.

recurrentWeight, of type MLSupportLimits

MLSupportLimits for recurrentWeight operand.

hiddenState, of type MLSupportLimits

MLSupportLimits for hiddenState operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

recurrentBias, of type MLSupportLimits

MLSupportLimits for recurrentBias operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for gruCell():

gruCell, of type MLGruCellSupportLimits

Support limits for operator gruCell().

The gruCell(input, weight, recurrentWeight, hiddenState, hiddenSize, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, weight, recurrentWeight, hiddenState, options.bias (if it exists), and options.recurrentBias (if it exists) returns false, then throw a TypeError.

  3. If the dataType of any of input, weight, recurrentWeight, or hiddenState is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If the rank of any of input, weight, recurrentWeight, or hiddenState is not its allowed rank (according to this table), then throw a TypeError.

  5. Let batchSize be input’s shape[0].

  6. Let inputSize be input’s shape[1].

  7. If weight’s shape is not equal to « 3 * hiddenSize, inputSize », then throw a TypeError.

  8. If recurrentWeight’s shape is not equal to « 3 * hiddenSize, hiddenSize », then throw a TypeError.

  9. If hiddenState’s shape is not equal to « batchSize, hiddenSize », then throw a TypeError.

  10. If hiddenSize * 6 is not a valid dimension, then throw a TypeError.

    Why hiddenSize * 6? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 3 * hiddenSize + 3 * hiddenSize also needs to be a valid dimension.
  11. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « 3 * hiddenSize », then throw a TypeError.

  12. If options.recurrentBias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « 3 * hiddenSize », then throw a TypeError.

  13. If options.activations exists:

    1. If its size is not 2, then throw a TypeError.

    2. Let activations be a clone of options.activations.

  14. Otherwise:

    1. Let activations be « "sigmoid", "tanh" ».

  15. Let desc be the result of creating an MLOperandDescriptor given input’s dataType and « batchSize, hiddenSize ».

  16. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "gruCell" operation, given weight, recurrentWeight, hiddenState, hiddenSize and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to input, weight, recurrentWeight, and hiddenState.

    5. If options.bias exists, then add it to operator’s inputs.

    6. If options.recurrentBias exists, then add it to operator’s inputs.

    7. Set operator’s activation functions to a clone of activations.

    8. Set operator’s output to output.

  17. Return output.

The behavior of this operation when the weight layout is the default "zrn" layout, and the activation functions of the update/reset gate and new gate are sigmoid() and tanh() respectively can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function gruCell(
  builder, input, weight, recurrentWeight, hiddenState, hiddenSize,
  options = {}) {
  const one = builder.constant(input.dataType, 1);
  const zero = builder.constant(input.dataType, 0);

  const inputSize = input.shape[1];

  // update gate (z)
  let z = builder.sigmoid(builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
      (options.recurrentBias ?
         builder.slice(options.recurrentBias, [0], [hiddenSize]) :
         zero)),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(
          builder.slice(weight, [0, 0], [hiddenSize, inputSize]))),
      builder.matmul(
        hiddenState,
        builder.transpose(
          builder.slice(recurrentWeight, [0, 0], [hiddenSize, hiddenSize]))))));

  // reset gate (r)
  let r = builder.sigmoid(builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) :
                      zero),
      (options.recurrentBias ?
         builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) :
         zero)),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(
          builder.slice(weight, [hiddenSize, 0], [hiddenSize, inputSize]))),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(
          recurrentWeight, [hiddenSize, 0], [hiddenSize, hiddenSize]))))));

  // new gate (n)
  let n;
  if (options.resetAfter) {
    n = builder.tanh(builder.add(
      (options.bias ?
         builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) :
         zero),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(
            weight, [2 * hiddenSize, 0], [hiddenSize, inputSize]))),
        builder.mul(
          r,
          builder.add(
            (options.recurrentBias ?
               builder.slice(
                 options.recurrentBias, [2 * hiddenSize], [hiddenSize]) :
               zero),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(
                recurrentWeight,
                [2 * hiddenSize, 0],
                [hiddenSize, hiddenSize]))))))));
  } else {
    n = builder.tanh(builder.add(
      builder.add(
        (options.bias ?
           builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) :
           zero),
        (options.recurrentBias ?
           builder.slice(
             options.recurrentBias, [2 * hiddenSize], [hiddenSize]) :
           zero)),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(
            weight, [2 * hiddenSize, 0], [hiddenSize, inputSize]))),
        builder.matmul(
          builder.mul(r, hiddenState),
          builder.transpose(builder.slice(
            recurrentWeight,
            [2 * hiddenSize, 0],
            [hiddenSize, hiddenSize]))))));
  }

  // compute the new hidden state
  return builder.add(
    builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
}

7.9.22. hardSigmoid

Calculate the non-smooth hard sigmoid function y = max(0, min(1, alpha * x + beta)) on the input tensor element-wise. It is used instead of the sigmoid function for faster computation.
dictionary MLHardSigmoidOptions : MLOperatorOptions {
  double alpha = 0.2;
  double beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand input, optional MLHardSigmoidOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits hardSigmoid;
};

MLHardSigmoidOptions has the following members:

alpha, of type double, defaulting to 0.2

A scalar multiplier.

beta, of type double, defaulting to 0.5

A scalar addition.

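Arguments:

input: an MLOperand. The input tensor.

options: an optional MLHardSigmoidOptions. The optional parameters of the operation.

Returns: an MLOperand. The output tensor of the same shape as input.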

Constraints for hardSigmoid()
operand   allowed data types     allowed ranks
input     "float32", "float16"   N
output    same as input          same as input

MLOpSupportLimits has the following member for hardSigmoid():

hardSigmoid, of type MLSingleInputSupportLimits

Support limits for operator hardSigmoid().

The hardSigmoid(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Set options.alpha to the result of casting options.alpha to input’s dataType.

  5. Set options.beta to the result of casting options.beta to input’s dataType.

  6. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "hardSigmoid" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  7. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function hardSigmoid(builder, input, options) {
  return builder.max(
    builder.min(
      builder.add(
        builder.mul(builder.constant(input.dataType, options.alpha), input),
        builder.constant(input.dataType, options.beta)),
      builder.constant(input.dataType, 1)),
    builder.constant(input.dataType, 0));
}
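To check an emulation or a native implementation numerically, the same element-wise math can be written directly in plain JavaScript. The following is a minimal sketch, not part of the WebNN API: hardSigmoidReference() is a hypothetical helper operating on a Float32Array, and Math.fround() models only the casting of options.alpha and options.beta to "float32" in steps 4 and 5 (casting to "float16" is not modeled).

```javascript
// Hypothetical plain-JavaScript reference for hardSigmoid over a Float32Array.
// Not part of the WebNN API; intended only for checking results numerically.
function hardSigmoidReference(input, {alpha = 0.2, beta = 0.5} = {}) {
  // Mirror method steps 4 and 5 for "float32": cast alpha and beta.
  const a = Math.fround(alpha);
  const b = Math.fround(beta);
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; ++i) {
    // max(0, min(1, alpha * x + beta)), the same clamp as the decomposition.
    output[i] = Math.max(0, Math.min(1, a * input[i] + b));
  }
  return output;
}

const y = hardSigmoidReference(new Float32Array([-5, 0, 1, 5]));
console.log(Array.from(y));
```

Inputs at or below -beta / alpha map to 0 and inputs at or above (1 - beta) / alpha map to 1; only the band in between follows the linear ramp.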

7.9.23. hardSwish

Computes the nonlinear function y = x * max(0, min(6, x + 3)) / 6, introduced by [MobileNetV3], on the input tensor element-wise.
partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits hardSwish;
};
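
Arguments:

input: an MLOperand. The input tensor.

options: an optional MLOperatorOptions. The optional parameters of the operation.

Returns: an MLOperand. The output tensor of the same shape as input.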

Constraints for hardSwish()
operand   allowed data types     allowed ranks
input     "float32", "float16"   N
output    same as input          same as input

MLOpSupportLimits has the following member for hardSwish():

hardSwish, of type MLSingleInputSupportLimits

Support limits for operator hardSwish().

The hardSwish(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "hardSwish" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function hardSwish(builder, input, options) {
  return builder.div(
    builder.mul(
      input,
      builder.max(
        builder.constant(input.dataType, 0),
        builder.min(
          builder.constant(input.dataType, 6),
          builder.add(input, builder.constant(input.dataType, 3))))),
    builder.constant(input.dataType, 6));
}
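As a numerical cross-check of the formula, the element-wise computation can be sketched in plain JavaScript. hardSwishReference() is a hypothetical helper over a Float32Array and is not part of the WebNN API.

```javascript
// Hypothetical plain-JavaScript reference for hardSwish:
// y = x * max(0, min(6, x + 3)) / 6, applied element-wise.
function hardSwishReference(input) {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; ++i) {
    const x = input[i];
    output[i] = (x * Math.max(0, Math.min(6, x + 3))) / 6;
  }
  return output;
}

const y = hardSwishReference(new Float32Array([-4, -1.5, 0, 3, 4]));
console.log(Array.from(y));
```

The function is 0 for x <= -3, equals x for x >= 3, and is the quadratic x * (x + 3) / 6 in between.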

7.9.24. instanceNormalization

Normalize the input using [Instance-Normalization]. Unlike batchNormalization(), where the mean and variance values used in the normalization are computed across all the samples in the batch dimension while the model is trained, instance normalization computes the mean and variance values on the fly for each input feature of each individual sample in the batch.
dictionary MLInstanceNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  double epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};

dictionary MLNormalizationSupportLimits {
  MLSupportLimits input;
  MLSupportLimits scale;
  MLSupportLimits bias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLNormalizationSupportLimits instanceNormalization;
};

MLInstanceNormalizationOptions has the following members:

scale, of type MLOperand

The 1-D tensor of the scaling values whose size is equal to the number of channels, i.e. the size of the feature dimension of the input. For example, for an input tensor with "nchw" layout, the size is equal to input’s shape[1].

bias, of type MLOperand

The 1-D tensor of the bias values whose size is equal to the size of the feature dimension of the input. For example, for an input tensor with "nchw" layout, the size is equal to input’s shape[1].

epsilon, of type double, defaulting to 1e-5

A small value to prevent computational error due to divide-by-zero.

layout, of type MLInputOperandLayout, defaulting to "nchw"

The layout format of the input.

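Arguments:

input: an MLOperand. The input 4-D tensor.

options: an optional MLInstanceNormalizationOptions. The optional parameters of the operation.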

Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as input.

Constraints for instanceNormalization()
operand   allowed data types     allowed ranks
input     "float32", "float16"   4
scale     same as input          1
bias      same as input          1
output    same as input          4

MLNormalizationSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

scale, of type MLSupportLimits

MLSupportLimits for scale operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for instanceNormalization():

instanceNormalization, of type MLNormalizationSupportLimits

Support limits for operator instanceNormalization().

The instanceNormalization(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, options.scale (if it exists), and options.bias (if it exists) returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If input’s rank is not its allowed rank, then throw a TypeError.

  5. Set options.epsilon to the result of casting options.epsilon to input’s dataType.

  6. Let axis be 1 if options.layout is "nchw", and 3 otherwise.

  7. If options.scale exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « input’s shape[axis] », then throw a TypeError.

  8. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « input’s shape[axis] », then throw a TypeError.

  9. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "instanceNormalization" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. If options.scale exists, then add it to operator’s inputs.

    6. If options.bias exists, then add it to operator’s inputs.

    7. Set operator’s output to output.

  10. Return output.

The behavior of this operation when the input tensor is 4-D of the "nchw" layout can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function instanceNormalization(builder, input, options) {
  // The reduction of the mean and variance values happens over the spatial
  // dimensions of the input e.g. axis 2 and 3 of the input tensor.
  const reduceOptions = {axes: [2, 3], keepDimensions: true};
  const mean = builder.reduceMean(input, reduceOptions);
  const variance = builder.reduceMean(
    builder.pow(builder.sub(input, mean), builder.constant(input.dataType, 2)),
    reduceOptions);

  // The scale and bias values are applied per input feature
  // e.g. axis 1 of the input tensor.
  const shape = [1, input.shape[1], 1, 1];
  return builder.add(
    builder.mul(
      builder.reshape(options.scale, shape),
      builder.div(
        builder.sub(input, mean),
        builder.sqrt(builder.add(
          variance, builder.constant(input.dataType, options.epsilon))))),
    builder.reshape(options.bias, shape));
}
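For checking results numerically, the "nchw" computation can also be sketched in plain JavaScript over a flat row-major Float32Array. instanceNormalizationReference() is a hypothetical helper, not part of the WebNN API; it assumes scale defaults to 1 and bias to 0 when omitted, and uses the population variance, matching the reduceMean-based decomposition.

```javascript
// Hypothetical plain-JavaScript reference for instanceNormalization, "nchw".
// input: Float32Array holding an [n, c, h, w] tensor in row-major order.
// scale, bias: optional Float32Arrays of length c (assumed defaults 1 and 0).
function instanceNormalizationReference(input, [n, c, h, w],
                                        {scale, bias, epsilon = 1e-5} = {}) {
  const spatial = h * w;
  const output = new Float32Array(input.length);
  for (let b = 0; b < n; ++b) {
    for (let ch = 0; ch < c; ++ch) {
      const base = (b * c + ch) * spatial;
      // Mean and variance over the spatial axes 2 and 3 of this sample/channel.
      let mean = 0;
      for (let i = 0; i < spatial; ++i) mean += input[base + i];
      mean /= spatial;
      let variance = 0;
      for (let i = 0; i < spatial; ++i) {
        variance += (input[base + i] - mean) ** 2;
      }
      variance /= spatial;
      // Per-channel scale and bias, applied after normalization.
      const s = scale ? scale[ch] : 1;
      const t = bias ? bias[ch] : 0;
      const denom = Math.sqrt(variance + epsilon);
      for (let i = 0; i < spatial; ++i) {
        output[base + i] = s * (input[base + i] - mean) / denom + t;
      }
    }
  }
  return output;
}

// Usage: one sample with two 2x2 channels.
const y = instanceNormalizationReference(
  new Float32Array([1, 2, 3, 4, 5, 5, 5, 5]), [1, 2, 2, 2]);
console.log(Array.from(y));
```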

7.9.25. layerNormalization

Normalize the input using [Layer-Normalization]. In batchNormalization(), the mean and variance values are computed across all the samples in the batch dimension while the model is trained; in instanceNormalization(), they are computed on the fly for each input feature of each individual sample in the batch. In layer normalization, by contrast, the mean and variance values are computed on the fly across all the input features of each individual sample in the batch.
dictionary MLLayerNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  sequence<[EnforceRange] unsigned long> axes;
  double epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand layerNormalization(MLOperand input,
                               optional MLLayerNormalizationOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLNormalizationSupportLimits layerNormalization;
};

MLLayerNormalizationOptions has the following members:

scale, of type MLOperand

The N-D tensor of the scaling values. Its shape is determined by the axes member: each value in axes identifies an input dimension that has scaling values. For example, for axes values of [1,2,3], the shape of this tensor is the list of the sizes of input dimensions 1, 2 and 3. When this member is not present, the scaling value is assumed to be 1.

bias, of type MLOperand

The N-D tensor of the bias values. Its shape is determined by the axes member: each value in axes identifies an input dimension that has bias values. For example, for axes values of [1,2,3], the shape of this tensor is the list of the sizes of input dimensions 1, 2 and 3. When this member is not present, the bias value is assumed to be 0.

axes, of type sequence<[EnforceRange] unsigned long>

The indices of the input dimensions to reduce. When this member is not present, it is treated as if all dimensions except the first were given (e.g. for a 4-D input tensor, axes = [1,2,3]); that is, the mean and variance values are calculated across all the input features of each individual sample in the batch. If empty, no dimensions are reduced.

epsilon, of type double, defaulting to 1e-5

A small value to prevent computational error due to divide-by-zero.

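Arguments:

input: an MLOperand. The input N-D tensor.

options: an optional MLLayerNormalizationOptions. The optional parameters of the operation.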

Returns: an MLOperand. The layer-normalized N-D tensor of the same shape as input.

Constraints for layerNormalization()
operand   allowed data types     allowed ranks
input     "float32", "float16"   N
scale     same as input          0 to input's rank
bias      same as input          0 to input's rank
output    same as input          same as input

MLOpSupportLimits has the following member for layerNormalization():

layerNormalization, of type MLNormalizationSupportLimits

Support limits for operator layerNormalization().

The layerNormalization(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, options.scale (if it exists), and options.bias (if it exists) returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If options.axes does not exist, then set options.axes to a new list, either equal to the range from 1 to input’s rank, exclusive, if input’s rank is greater than 1, or an empty list otherwise.

  5. Otherwise, if options.axes contains duplicate values, or if any of its elements is not in the range 0 to input’s rank, exclusive, then throw a TypeError.

  6. Set options.epsilon to the result of casting options.epsilon to input’s dataType.

  7. If options.scale exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its rank is not equal to options.axes's size, then throw a TypeError.

  8. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its rank is not equal to options.axes's size, then throw a TypeError.

  9. For each index in the range 0 to options.axes's size, exclusive:

    1. Let axis be options.axes[index].

    2. If axis is greater than or equal to input’s rank, then throw a TypeError.

    3. Let size be input’s shape[axis].

    4. If options.scale exists:

      1. If its shape[index] is not equal to size, then throw a TypeError.

    5. If options.bias exists:

      1. If its shape[index] is not equal to size, then throw a TypeError.

  10. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "layerNormalization" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. If options.scale exists, then add it to operator’s inputs.

    6. If options.bias exists, then add it to operator’s inputs.

    7. Set operator’s output to output.

  11. Return output.
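The axes handling in steps 4, 5, 7.2 and 9 can be sketched in plain JavaScript on bare shapes. resolveLayerNormAxes() is a hypothetical helper, not part of the WebNN API; it treats every rejection as a TypeError and checks only the scale shape (the bias check is identical).

```javascript
// Hypothetical sketch of layerNormalization() steps 4, 5, 7.2 and 9.
// inputShape: array of dimension sizes; axes: optional array of indices;
// scaleShape: optional shape of options.scale. Throws TypeError on failure.
function resolveLayerNormAxes(inputShape, axes, scaleShape) {
  const rank = inputShape.length;
  if (axes === undefined) {
    // Step 4: default to [1, ..., rank - 1], or [] when rank <= 1.
    axes = [];
    for (let i = 1; i < rank; ++i) axes.push(i);
  } else {
    // Step 5: reject duplicate and out-of-range indices.
    if (new Set(axes).size !== axes.length) {
      throw new TypeError('axes contains duplicate values');
    }
    for (const axis of axes) {
      if (axis < 0 || axis >= rank) {
        throw new TypeError(`axis ${axis} is out of range`);
      }
    }
  }
  if (scaleShape !== undefined) {
    // Step 7.2: scale's rank must equal the number of axes.
    if (scaleShape.length !== axes.length) {
      throw new TypeError('scale rank does not match axes');
    }
    // Step 9: scale's shape[index] must equal input's shape[axes[index]].
    axes.forEach((axis, index) => {
      if (scaleShape[index] !== inputShape[axis]) {
        throw new TypeError('scale shape does not match input');
      }
    });
  }
  return axes;
}

console.log(resolveLayerNormAxes([2, 3, 4, 5])); // [ 1, 2, 3 ]
```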

The behavior of this operation when the axes parameter is set to [1,2,3] can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function layerNormalization(builder, input, options) {
  // The reduction of the mean and variance values happens over the spatial
  // dimensions across all the input features (i.e. all channels) of the input
  // tensor.
  const reduceOptions = {axes: [1, 2, 3], keepDimensions: true};
  const mean = builder.reduceMean(input, reduceOptions);
  const variance = builder.reduceMean(
    builder.pow(builder.sub(input, mean), builder.constant(input.dataType, 2)),
    reduceOptions);

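  // The scale and bias tensors are of the shape of the input
  // specified by the values in the axes parameter (i.e. [1,2,3]).
  // When absent, scale is assumed to be 1 and bias to be 0.
  const scale = options.scale ?? builder.constant(input.dataType, 1);
  const bias = options.bias ?? builder.constant(input.dataType, 0);
  return builder.add(
    builder.mul(
      scale,
      builder.div(
        builder.sub(input, mean),
        builder.sqrt(builder.add(
          variance, builder.constant(input.dataType, options.epsilon))))),
    bias);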
}
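For numerical checks with the default axes [1, ..., rank - 1], each sample (index along axis 0) is normalized over all of its remaining elements, so a flat row-major buffer can be processed block by block. layerNormalizationReference() is a hypothetical plain-JavaScript helper, not part of the WebNN API; scale and bias, when given, are assumed to be Float32Arrays holding a tensor of shape inputShape.slice(1) in row-major order.

```javascript
// Hypothetical plain-JavaScript reference for layerNormalization with the
// default axes [1, ..., rank - 1]. input is a row-major Float32Array.
function layerNormalizationReference(input, inputShape,
                                     {scale, bias, epsilon = 1e-5} = {}) {
  const batch = inputShape[0];
  const blockSize = input.length / batch;  // elements per sample
  const output = new Float32Array(input.length);
  for (let n = 0; n < batch; ++n) {
    const base = n * blockSize;
    // Mean and population variance over every feature of this sample.
    let mean = 0;
    for (let i = 0; i < blockSize; ++i) mean += input[base + i];
    mean /= blockSize;
    let variance = 0;
    for (let i = 0; i < blockSize; ++i) {
      variance += (input[base + i] - mean) ** 2;
    }
    variance /= blockSize;
    const denom = Math.sqrt(variance + epsilon);
    for (let i = 0; i < blockSize; ++i) {
      // Element-wise scale and bias (defaults 1 and 0 when absent).
      const s = scale ? scale[i] : 1;
      const t = bias ? bias[i] : 0;
      output[base + i] = s * (input[base + i] - mean) / denom + t;
    }
  }
  return output;
}

const y = layerNormalizationReference(
  new Float32Array([1, 2, 3, 4, 10, 10, 10, 10]), [2, 4]);
console.log(Array.from(y));
```

Unlike the per-channel scale and bias of instanceNormalization(), here the scale and bias vary across every reduced position, which is why they are indexed by the element offset inside the sample.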

7.9.26. leakyRelu

Calculate the leaky version of the rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * min(0, x).
dictionary MLLeakyReluOptions : MLOperatorOptions {
  double alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand input, optional MLLeakyReluOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits leakyRelu;
};

MLLeakyReluOptions has the following members:

alpha, of type double, defaulting to 0.01

A scalar multiplier.

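Arguments:

input: an MLOperand. The input tensor.

options: an optional MLLeakyReluOptions. The optional parameters of the operation.

Returns: an MLOperand. The output tensor of the same shape as input.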

Constraints for leakyRelu()
operand   allowed data types     allowed ranks
input     "float32", "float16"   N
output    same as input          same as input

MLOpSupportLimits has the following member for leakyRelu():

leakyRelu, of type MLSingleInputSupportLimits

Support limits for operator leakyRelu().

The leakyRelu(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Set options.alpha to the result of casting options.alpha to input’s dataType.

  5. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "leakyRelu" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  6. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function leakyRelu(builder, input, options) {
  return builder.add(
    builder.max(builder.constant(input.dataType, 0), input),
    builder.mul(
      builder.constant(input.dataType, options.alpha),
      builder.min(builder.constant(input.dataType, 0), input)));
}
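A plain-JavaScript sketch of the same expression can serve as a numerical cross-check. leakyReluReference() is a hypothetical helper over a Float32Array, not part of the WebNN API; Math.fround() models only the "float32" cast of step 4.

```javascript
// Hypothetical plain-JavaScript reference for leakyRelu:
// max(0, x) + alpha * min(0, x), applied element-wise.
function leakyReluReference(input, {alpha = 0.01} = {}) {
  const a = Math.fround(alpha);  // cast alpha to float32, as in step 4
  // Float32Array.prototype.map returns a new Float32Array.
  return input.map((x) => Math.max(0, x) + a * Math.min(0, x));
}

const y = leakyReluReference(new Float32Array([-10, 0, 2]));
console.log(Array.from(y));
```

With alpha = 0 the expression reduces to the ordinary rectified linear function, and with alpha = 1 it is the identity.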

7.9.27. linear

Calculate a linear function y = alpha * x + beta on the input tensor.
dictionary MLLinearOptions : MLOperatorOptions {
  double alpha = 1;
  double beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand input, optional MLLinearOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits linear;
};

MLLinearOptions has the following members:

alpha, of type double, defaulting to 1

A scalar multiplier.

beta, of type double, defaulting to 0

A scalar addition.

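Arguments:

input: an MLOperand. The input tensor.

options: an optional MLLinearOptions. The optional parameters of the operation.

Returns: an MLOperand. The output tensor of the same shape as input.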

Constraints for linear()
operand   allowed data types     allowed ranks
input     "float32", "float16"   N
output    same as input          same as input

MLOpSupportLimits has the following member for linear():

linear, of type MLSingleInputSupportLimits

Support limits for operator linear().

The linear(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Set options.alpha to the result of casting options.alpha to input’s dataType.

  5. Set options.beta to the result of casting options.beta to input’s dataType.

  6. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "linear" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  7. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function linear(builder, input, options) {
  return builder.add(
    builder.mul(input, builder.constant(input.dataType, options.alpha)),
    builder.constant(input.dataType, options.beta));
}
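The affine map can likewise be sketched in plain JavaScript for numerical checks. linearReference() is a hypothetical helper over a Float32Array, not part of the WebNN API; Math.fround() models only the "float32" casts of steps 4 and 5.

```javascript
// Hypothetical plain-JavaScript reference for linear: alpha * x + beta.
function linearReference(input, {alpha = 1, beta = 0} = {}) {
  const a = Math.fround(alpha);  // cast to float32, as in step 4
  const b = Math.fround(beta);   // cast to float32, as in step 5
  return input.map((x) => a * x + b);
}

const y = linearReference(new Float32Array([0, 1, -1]), {alpha: 2, beta: 1});
console.log(Array.from(y));
```

The defaults alpha = 1 and beta = 0 make linear() the identity. With alpha = 0.2 and beta = 0.5 followed by a clamp to [0, 1], the same map yields hardSigmoid(), matching that operation's decomposition.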

7.9.28. lstm

Long Short-Term Memory [LSTM] recurrent network uses an input, output, forget, and cell gate to compute the output state that rolls into the output across the temporal sequence of the network.
enum MLLstmWeightLayout {
  "iofg", // input-output-forget-cell gate ordering
  "ifgo"  // input-forget-cell-output gate ordering
};

dictionary MLLstmOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLOperand initialHiddenState;
  MLOperand initialCellState;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLLstmWeightLayout layout = "iofg";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstm(MLOperand input,
                           MLOperand weight,
                           MLOperand recurrentWeight,
                           [EnforceRange] unsigned long steps,
                           [EnforceRange] unsigned long hiddenSize,
                           optional MLLstmOptions options = {});
};

dictionary MLLstmSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits peepholeWeight;
  MLSupportLimits initialHiddenState;
  MLSupportLimits initialCellState;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLLstmSupportLimits lstm;
};

MLLstmOptions has the following members:

bias, of type MLOperand

The 2-D input bias tensor of shape [numDirections, 4 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.

recurrentBias, of type MLOperand

The 2-D recurrent bias tensor of shape [numDirections, 4 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.

peepholeWeight, of type MLOperand

The 2-D weight tensor for peepholes of shape [numDirections, 3 * hiddenSize]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate, respectively.

initialHiddenState, of type MLOperand

The 3-D initial hidden state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations must use a tensor filled with zero.

initialCellState, of type MLOperand

The 3-D initial cell state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations must use a tensor filled with zero.

returnSequence, of type boolean, defaulting to false

Indicates whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step.

direction, of type MLRecurrentNetworkDirection, defaulting to "forward"

The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.

layout, of type MLLstmWeightLayout, defaulting to "iofg"

The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i), output (o), forget (f), and cell (g) gate, as indicated in the second dimension of the weight and bias tensor shapes.

activations, of type sequence<MLRecurrentNetworkActivation>

A list of three activation functions: the first is used for the input (i), forget (f), and output (o) gates; the second is used for the cell (g) gate; and the third is used to filter the output cell state before it is combined with the result of the output gate to form the output hidden state. When not specified, it defaults to the "sigmoid", "tanh", and "tanh" functions, respectively.
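The effect of layout on slicing can be illustrated with a small plain-JavaScript helper that returns where each gate's block of hiddenSize rows starts along the 4 * hiddenSize dimension of weight, recurrentWeight, bias and recurrentBias. lstmGateOffsets() is hypothetical and not part of the WebNN API; the gate orders come from the MLLstmWeightLayout values.

```javascript
// Hypothetical helper: start offset of each gate's hiddenSize-wide block
// along the 4 * hiddenSize dimension, for each MLLstmWeightLayout value.
function lstmGateOffsets(layout, hiddenSize) {
  const gateOrders = {
    iofg: ['i', 'o', 'f', 'g'],  // input, output, forget, cell
    ifgo: ['i', 'f', 'g', 'o'],  // input, forget, cell, output
  };
  const order = gateOrders[layout];
  if (order === undefined) throw new TypeError(`unknown layout: ${layout}`);
  const offsets = {};
  order.forEach((gate, index) => {
    offsets[gate] = index * hiddenSize;
  });
  return offsets;
}

// With hiddenSize = 4, the forget gate rows start at 8 for "iofg"
// and at 4 for "ifgo".
console.log(lstmGateOffsets('iofg', 4)); // { i: 0, o: 4, f: 8, g: 12 }
```

The offsets apply only to the 4 * hiddenSize tensors; peepholeWeight has 3 * hiddenSize entries packed in a fixed input, output, forget order regardless of layout.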

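Arguments:

input: an MLOperand. The 3-D input tensor of shape [steps, batchSize, inputSize].

weight: an MLOperand. The 3-D input weight tensor of shape [numDirections, 4 * hiddenSize, inputSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to options.layout.

recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [numDirections, 4 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to options.layout.

steps: an unsigned long scalar. The number of time steps in the recurrent network; it must equal input’s shape[0].

hiddenSize: an unsigned long scalar. The size of the hidden state, i.e. the last dimension of the output state tensors.

options: an optional MLLstmOptions. The optional parameters of the operation.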

Returns: sequence<MLOperand>. The first element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the output hidden state from the last time step of the network. The second element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the output cell state from the last time step of the network. Additionally, if options.returnSequence is set to true, the third element is the 4-D output tensor of shape [steps, numDirections, batchSize, hiddenSize] containing every output from each time step in the temporal sequence.

Constraints for lstm()
operand                                  allowed data types     allowed ranks
input                                    "float32", "float16"   3
weight                                   same as input          3
recurrentWeight                          same as input          3
bias                                     same as input          2
recurrentBias                            same as input          2
peepholeWeight                           same as input          2
initialHiddenState                       same as input          3
initialCellState                         same as input          3
outputs[0]                               same as input          3
outputs[1]                               same as input          3
outputs[2] (if returnSequence is true)   same as input          4

MLLstmSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

weight, of type MLSupportLimits

MLSupportLimits for weight operand.

recurrentWeight, of type MLSupportLimits

MLSupportLimits for recurrentWeight operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

recurrentBias, of type MLSupportLimits

MLSupportLimits for recurrentBias operand.

peepholeWeight, of type MLSupportLimits

MLSupportLimits for peepholeWeight operand.

initialHiddenState, of type MLSupportLimits

MLSupportLimits for initialHiddenState operand.

initialCellState, of type MLSupportLimits

MLSupportLimits for initialCellState operand.

outputs, of type MLSupportLimits

MLSupportLimits for all the output operands.

MLOpSupportLimits has the following member for lstm():

lstm, of type MLLstmSupportLimits

Support limits for operator lstm().

The lstm(input, weight, recurrentWeight, steps, hiddenSize, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, weight, recurrentWeight, options.bias (if it exists), options.recurrentBias (if it exists), options.peepholeWeight (if it exists), options.initialHiddenState (if it exists), and options.initialCellState (if it exists) returns false, then throw a TypeError.

  3. Let numDirections be 2 if options.direction is "both", or 1 otherwise.

  4. If the dataType of any of input, weight or recurrentWeight is not one of its allowed data types (according to this table), then throw a TypeError.

  5. If the rank of any of input, weight or recurrentWeight is not its allowed rank, then throw a TypeError.

  6. If input’s shape[0] is not equal to steps, then throw a TypeError.

  7. Let batchSize be input’s shape[1].

  8. Let inputSize be input’s shape[2].

  9. If weight’s shape is not equal to « numDirections, 4 * hiddenSize, inputSize », then throw a TypeError.

  10. If recurrentWeight’s shape is not equal to « numDirections, 4 * hiddenSize, hiddenSize », then throw a TypeError.

  11. If hiddenSize * 8 is not a valid dimension, then throw a TypeError.

    Why hiddenSize * 8? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 4 * hiddenSize + 4 * hiddenSize also needs to be a valid dimension.

  12. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, 4 * hiddenSize », then throw a TypeError.

  13. If options.recurrentBias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, 4 * hiddenSize », then throw a TypeError.

  14. If options.peepholeWeight exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, 3 * hiddenSize », then throw a TypeError.

  15. If options.initialHiddenState exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, batchSize, hiddenSize », then throw a TypeError.

  16. If options.initialCellState exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « numDirections, batchSize, hiddenSize », then throw a TypeError.

  17. If options.activations exists:

    1. If its size is not 3, then throw a TypeError.

    2. Let activations be a clone of options.activations.

  18. Otherwise:

    1. Let activations be « "sigmoid", "tanh", "tanh" ».

  19. Calculate the output shape:

    1. Let desc be the result of creating an MLOperandDescriptor given input’s dataType and « numDirections, batchSize, hiddenSize ».

    2. If options.returnSequence is true:

      1. Let desc2 be the result of creating an MLOperandDescriptor given input’s dataType and « steps, numDirections, batchSize, hiddenSize ».

  20. Make graph connections:

    1. Let operator be an operator for the "lstm" operation, given weight, recurrentWeight, steps, hiddenSize and options.

    2. Let output0 be the result of creating an MLOperand given this and desc.

    3. Let output1 be the result of creating an MLOperand given this and desc.

    4. If options.returnSequence is true:

      1. Let output2 be the result of creating an MLOperand given this and desc2.

      2. Let output be the list « output0, output1, output2 ».

      3. Set output0.[[operator]], output1.[[operator]] and output2.[[operator]] to operator.

    5. Otherwise:

      1. Let output be the list « output0, output1 ».

      2. Set output0.[[operator]] and output1.[[operator]] to operator.

    6. Set operator’s inputs to input, weight, and recurrentWeight.

    7. If options.bias exists, then add it to operator’s inputs.

    8. If options.recurrentBias exists, then add it to operator’s inputs.

    9. If options.peepholeWeight exists, then add it to operator’s inputs.

    10. If options.initialHiddenState exists, then add it to operator’s inputs.

    11. If options.initialCellState exists, then add it to operator’s inputs.

    12. Set operator’s activation functions to a clone of activations.

    13. Set operator’s output to output.

  21. Return output.
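The shape checks and output shapes of the steps above can be sketched in plain JavaScript, with every operand represented only by its shape array. lstmOutputShapes() is a hypothetical helper, not part of the WebNN API; it omits the dataType checks, the rank checks for weight and recurrentWeight, the activations check, and the hiddenSize * 8 valid-dimension check.

```javascript
// Hypothetical sketch of lstm() steps 3, 6-10 and 12-16 plus the output
// shapes of step 19. Operands are represented by shapes (plain arrays).
function lstmOutputShapes(inputShape, weightShape, recurrentWeightShape,
                          steps, hiddenSize, options = {}) {
  const sameShape = (a, b) =>
    a.length === b.length && a.every((value, i) => value === b[i]);
  const numDirections = options.direction === 'both' ? 2 : 1;
  if (inputShape.length !== 3) throw new TypeError('input must be 3-D');
  const [inputSteps, batchSize, inputSize] = inputShape;
  if (inputSteps !== steps) {
    throw new TypeError('input shape[0] must equal steps');
  }
  if (!sameShape(weightShape, [numDirections, 4 * hiddenSize, inputSize])) {
    throw new TypeError('weight has the wrong shape');
  }
  if (!sameShape(recurrentWeightShape,
                 [numDirections, 4 * hiddenSize, hiddenSize])) {
    throw new TypeError('recurrentWeight has the wrong shape');
  }
  // Expected shapes of the optional operands (steps 12-16).
  const expected = {
    bias: [numDirections, 4 * hiddenSize],
    recurrentBias: [numDirections, 4 * hiddenSize],
    peepholeWeight: [numDirections, 3 * hiddenSize],
    initialHiddenState: [numDirections, batchSize, hiddenSize],
    initialCellState: [numDirections, batchSize, hiddenSize],
  };
  for (const [name, shape] of Object.entries(expected)) {
    const actual = options[name];
    if (actual !== undefined && !sameShape(actual, shape)) {
      throw new TypeError(`${name} has the wrong shape`);
    }
  }
  // Step 19: hidden state and cell state, plus the optional sequence output.
  const state = [numDirections, batchSize, hiddenSize];
  const outputShapes = [state, [...state]];
  if (options.returnSequence) outputShapes.push([steps, ...state]);
  return outputShapes;
}

// Bidirectional LSTM: steps = 5, batchSize = 2, inputSize = 3, hiddenSize = 4.
const shapes = lstmOutputShapes(
  [5, 2, 3], [2, 16, 3], [2, 16, 4], 5, 4,
  {direction: 'both', returnSequence: true, bias: [2, 16]});
console.log(JSON.stringify(shapes)); // [[2,2,4],[2,2,4],[5,2,2,4]]
```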

Using a squeeze() helper, the behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function lstm(
  builder, input, weight, recurrentWeight, steps, hiddenSize, options) {
  const batchSize = input.shape[1];
  const inputSize = input.shape[2];
  const numDirections = (options.direction == 'both' ? 2 : 1);
  let hiddenState = options.initialHiddenState;
  let cellState = options.initialCellState;

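  // The default states are zero-filled [numDirections, batchSize, hiddenSize]
  // tensors. This sketch assumes a "float32" input; a "float16" input would
  // need a matching data type and buffer.
  if (!hiddenState) {
    const desc = {
      dataType: 'float32', shape: [numDirections, batchSize, hiddenSize]};
    const totalSize = numDirections * batchSize * hiddenSize;
    hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
  }

  if (!cellState) {
    const desc = {
      dataType: 'float32', shape: [numDirections, batchSize, hiddenSize]};
    const totalSize = numDirections * batchSize * hiddenSize;
    cellState = builder.constant(desc, new Float32Array(totalSize).fill(0));
  }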

  let sequence = null;
  let currentWeight = [];
  let currentRecurrentWeight = [];
  let currentBias = [];
  let currentRecurrentBias = [];
  let currentPeepholeWeight = [];

  for (let dir = 0; dir < numDirections; ++dir) {
    currentWeight.push(squeeze(
      builder,
      builder.slice(weight, [dir, 0, 0], [1, 4 * hiddenSize, inputSize])));
    currentRecurrentWeight.push(squeeze(
      builder,
      builder.slice(
        recurrentWeight, [dir, 0, 0], [1, 4 * hiddenSize, hiddenSize])));
    currentBias.push(
      options.bias ?
        (squeeze(
          builder,
          builder.slice(options.bias, [dir, 0], [1, 4 * hiddenSize]))) :
        null);
    currentRecurrentBias.push(
      options.recurrentBias ?
        (squeeze(
          builder,
          builder.slice(
            options.recurrentBias, [dir, 0], [1, 4 * hiddenSize]))) :
        null);
    currentPeepholeWeight.push(
      options.peepholeWeight ?
        (squeeze(
          builder,
          builder.slice(
            options.peepholeWeight, [dir, 0], [1, 3 * hiddenSize]))) :
        null);
  }

  for (let step = 0; step < steps; ++step) {
    let currentHidden = [];
    let currentCell = [];
    let nextHidden = null;
    let nextCell = null;

    for (let dir = 0; dir < numDirections; ++dir) {
      currentHidden.push(squeeze(
        builder,
        builder.slice(hiddenState, [dir, 0, 0], [1, batchSize, hiddenSize])));
      currentCell.push(squeeze(
        builder,
        builder.slice(cellState, [dir, 0, 0], [1, batchSize, hiddenSize])));
    }

    for (let dir = 0; dir < numDirections; ++dir) {
      let slice =
        (dir == 1 || options.direction == 'backward' ? steps - step - 1 : step);
      let currentInput = squeeze(
        builder,
        builder.slice(input, [slice, 0, 0], [1, batchSize, inputSize]));

      let results = builder.lstmCell(
        currentInput,
        currentWeight[dir],
        currentRecurrentWeight[dir],
        currentHidden[dir],
        currentCell[dir],
        hiddenSize,
        {
          bias: currentBias[dir],
          recurrentBias: currentRecurrentBias[dir],
          peepholeWeight: currentPeepholeWeight[dir],
          layout: options.layout,
          activations: options.activations
        });

      let output = builder.reshape(results[0], [1, batchSize, hiddenSize]);
      let cell = builder.reshape(results[1], [1, batchSize, hiddenSize]);

      nextHidden =
        (nextHidden ? builder.concat([nextHidden, output], 0) : output);
      nextCell = (nextCell ? builder.concat([nextCell, cell], 0) : cell);
    }

    hiddenState = nextHidden;
    cellState = nextCell;

    if (options.returnSequence) {
      nextHidden =
        builder.reshape(nextHidden, [1, numDirections, batchSize, hiddenSize]);
      sequence =
        (sequence ? builder.concat([sequence, nextHidden], 0) : nextHidden);
    }
  }

  return (
    sequence ? [hiddenState, cellState, sequence] : [hiddenState, cellState]);
}
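Inside the time-step loop above, the expression assigned to slice decides which input time step each direction reads: the forward direction walks the sequence from the start, while the backward direction (or the second direction when options.direction is "both") walks it from the end. The hypothetical helper below is a plain JavaScript sketch that does not use MLGraphBuilder; it evaluates the same expression so that the visiting order can be checked in isolation.

```javascript
// Hypothetical helper (not part of the WebNN API): for each direction, list the
// input time-step indices in the order the lstm() emulation above reads them.
function lstmTimeStepOrder(steps, direction = 'forward') {
  const numDirections = direction === 'both' ? 2 : 1;
  const order = [];
  for (let dir = 0; dir < numDirections; ++dir) {
    const indices = [];
    for (let step = 0; step < steps; ++step) {
      // Same expression as `slice` in the emulation.
      indices.push(dir === 1 || direction === 'backward' ? steps - step - 1 : step);
    }
    order.push(indices);
  }
  return order;
}

// A 3-step sequence processed in both directions.
const bothOrder = lstmTimeStepOrder(3, 'both'); // [[0, 1, 2], [2, 1, 0]]
```

Each direction still produces one hidden state per loop iteration; only the input time step it consumes differs.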

7.9.29. lstmCell

Compute a single time step of the Long Short-Term Memory [LSTM] recurrent network. Using the cell state together with the input, output, and forget gates, it computes the cell state and the hidden state for the next time step; across the temporal sequence of the network, the hidden state rolls into the output.
dictionary MLLstmCellOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLLstmWeightLayout layout = "iofg";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstmCell(MLOperand input,
                               MLOperand weight,
                               MLOperand recurrentWeight,
                               MLOperand hiddenState,
                               MLOperand cellState,
                               [EnforceRange] unsigned long hiddenSize,
                               optional MLLstmCellOptions options = {});
};

dictionary MLLstmCellSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits hiddenState;
  MLSupportLimits cellState;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits peepholeWeight;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLLstmCellSupportLimits lstmCell;
};

MLLstmCellOptions has the following members:

bias, of type MLOperand

The 1-D input bias tensor of shape [4 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified by the layout member.

recurrentBias, of type MLOperand

The 1-D recurrent bias tensor of shape [4 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified by the layout member.

peepholeWeight, of type MLOperand

The 1-D weight tensor for peepholes of shape [3 * hiddenSize]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate, respectively.

layout, of type MLLstmWeightLayout, defaulting to "iofg"

The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i), output (o), forget (f), and cell (g) gate, as indicated in the first dimension of the weight and bias tensor shapes.

activations, of type sequence<MLRecurrentNetworkActivation>

A list of three activation functions. The first is used for the input (i), forget (f), and output (o) gates; the second is used for the cell (g) gate; and the third filters the output cell state before it is combined with the result of the output gate to form the output hidden state. When not specified, defaults to a sequence of the "sigmoid", "tanh", and "tanh" functions, respectively.

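Arguments:

  • input: an MLOperand. The 2-D input tensor of shape [batchSize, inputSize].

  • weight: an MLOperand. The 2-D input weight tensor of shape [4 * hiddenSize, inputSize]. The ordering of the weight vectors in the first dimension is specified by options.layout.

  • recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [4 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the first dimension is specified by options.layout.

  • hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batchSize, hiddenSize].

  • cellState: an MLOperand. The 2-D input cell state tensor of shape [batchSize, hiddenSize].

  • hiddenSize: an unsigned long scalar. The number of features in the hidden state, which is also the second dimension of both output tensors.

  • options: an optional MLLstmCellOptions. The optional parameters of the operation.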

Returns: sequence<MLOperand>. The first element is the output hidden state of the current time step of the recurrent network. The second element is the output cell state. Both elements are 2-D tensors of shape [batchSize, hiddenSize].

Constraints for lstmCell()
operand          allowed data types      allowed ranks
input            "float32", "float16"    2
weight           same as input           2
recurrentWeight  same as input           2
hiddenState      same as input           2
cellState        same as input           2
bias             same as input           1
recurrentBias    same as input           1
peepholeWeight   same as input           1
outputs[0]       same as input           2
outputs[1]       same as input           2

MLLstmCellSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

weight, of type MLSupportLimits

MLSupportLimits for weight operand.

recurrentWeight, of type MLSupportLimits

MLSupportLimits for recurrentWeight operand.

hiddenState, of type MLSupportLimits

MLSupportLimits for hiddenState operand.

cellState, of type MLSupportLimits

MLSupportLimits for cellState operand.

bias, of type MLSupportLimits

MLSupportLimits for bias operand.

recurrentBias, of type MLSupportLimits

MLSupportLimits for recurrentBias operand.

peepholeWeight, of type MLSupportLimits

MLSupportLimits for peepholeWeight operand.

outputs, of type MLSupportLimits

MLSupportLimits for all the output operands.

MLOpSupportLimits has the following member for lstmCell():

lstmCell, of type MLLstmCellSupportLimits

Support limits for operator lstmCell().

The lstmCell(input, weight, recurrentWeight, hiddenState, cellState, hiddenSize, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input, weight, recurrentWeight, hiddenState, cellState, options.bias (if it exists), options.recurrentBias (if it exists), and options.peepholeWeight (if it exists) returns false, then throw a TypeError.

  3. If the dataType of any of input, weight, recurrentWeight, hiddenState or cellState is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If the rank of any of input, weight, recurrentWeight, hiddenState or cellState is not its allowed rank, then throw a TypeError.

  5. Let batchSize be input’s shape[0].

  6. Let inputSize be input’s shape[1].

  7. If weight’s shape is not equal to « 4 * hiddenSize, inputSize », then throw a TypeError.

  8. If recurrentWeight’s shape is not equal to « 4 * hiddenSize, hiddenSize », then throw a TypeError.

  9. If hiddenState’s shape is not equal to « batchSize, hiddenSize », then throw a TypeError.

  10. If cellState’s shape is not equal to « batchSize, hiddenSize », then throw a TypeError.

  11. If hiddenSize * 8 is not a valid dimension, then throw a TypeError.

    Why hiddenSize * 8? Some underlying platforms operate on a single bias tensor that concatenates bias and recurrentBias, so 4 * hiddenSize + 4 * hiddenSize must also be a valid dimension.
  12. If options.bias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « 4 * hiddenSize », then throw a TypeError.

  13. If options.recurrentBias exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « 4 * hiddenSize », then throw a TypeError.

  14. If options.peepholeWeight exists:

    1. If its dataType is not one of its allowed data types (according to this table), then throw a TypeError.

    2. If its shape is not equal to « 3 * hiddenSize », then throw a TypeError.

  15. If options.activations exists:

    1. If its size is not 3, then throw a TypeError.

    2. Let activations be a clone of options.activations.

  16. Otherwise:

    1. Let activations be « "sigmoid", "tanh", "tanh" ».

  17. Let desc be a new MLOperandDescriptor.

  18. Set desc.shape to the list « batchSize, hiddenSize ».

  19. Set desc.dataType to input’s dataType.

  20. Make graph connections:

    1. Let output0 be the result of creating an MLOperand given this and desc.

    2. Let output1 be the result of creating an MLOperand given this and desc.

    3. Let output be the list « output0, output1 ».

    4. Let operator be an operator for the "lstmCell" operation, given weight, recurrentWeight, hiddenState, cellState, hiddenSize and options.

    5. Set output0.[[operator]] and output1.[[operator]] to operator.

    6. Set operator’s inputs to input, weight, recurrentWeight, hiddenState, and cellState.

    7. If options.bias exists, then add it to operator’s inputs.

    8. If options.recurrentBias exists, then add it to operator’s inputs.

    9. If options.peepholeWeight exists, then add it to operator’s inputs.

    10. Set operator’s activation functions to a clone of activations.

    11. Set operator’s output to output.

  21. Return output.
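The shape checks in steps 4 to 14 above depend only on operand shapes and hiddenSize, so they can be sketched on plain arrays of dimensions. In the sketch below, checkLstmCellShapes, shapeEquals, and isValidDimension are hypothetical names, and isValidDimension assumes that a valid dimension is a positive integer no larger than 2^31 - 1. Data type checks and the builder state check are out of scope.

```javascript
// Assumption: a "valid dimension" is a positive integer no larger than 2**31 - 1.
function isValidDimension(size) {
  return Number.isInteger(size) && size > 0 && size <= 2 ** 31 - 1;
}

function shapeEquals(shape, expected) {
  return shape.length === expected.length && shape.every((d, i) => d === expected[i]);
}

// Hypothetical sketch of the shape checks in lstmCell() steps 4-14.
// Returns the shape shared by both outputs, or throws a TypeError.
function checkLstmCellShapes(shapes, hiddenSize) {
  const {input, weight, recurrentWeight, hiddenState, cellState} = shapes;
  // Step 4: every required operand is 2-D.
  for (const shape of [input, weight, recurrentWeight, hiddenState, cellState]) {
    if (shape.length !== 2) throw new TypeError('required operands must be 2-D');
  }
  // Steps 5-10: derive batchSize and inputSize, then check the other shapes.
  const [batchSize, inputSize] = input;
  const required = [
    ['weight', weight, [4 * hiddenSize, inputSize]],
    ['recurrentWeight', recurrentWeight, [4 * hiddenSize, hiddenSize]],
    ['hiddenState', hiddenState, [batchSize, hiddenSize]],
    ['cellState', cellState, [batchSize, hiddenSize]],
  ];
  for (const [name, shape, expected] of required) {
    if (!shapeEquals(shape, expected)) throw new TypeError(`${name} must have shape [${expected}]`);
  }
  // Step 11: bias and recurrentBias may be fused into one [8 * hiddenSize] tensor.
  if (!isValidDimension(hiddenSize * 8)) {
    throw new TypeError('hiddenSize * 8 is not a valid dimension');
  }
  // Steps 12-14: optional operands are checked only when present.
  for (const [name, gates] of [['bias', 4], ['recurrentBias', 4], ['peepholeWeight', 3]]) {
    const shape = shapes[name];
    if (shape !== undefined && !shapeEquals(shape, [gates * hiddenSize])) {
      throw new TypeError(`${name} must have shape [${gates * hiddenSize}]`);
    }
  }
  return [batchSize, hiddenSize];
}

const outputShape = checkLstmCellShapes({
  input: [2, 3], weight: [16, 3], recurrentWeight: [16, 4],
  hiddenState: [2, 4], cellState: [2, 4], bias: [16], peepholeWeight: [12],
}, 4); // [2, 4]
```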

When the weight layout is the default "iofg" layout, and the activation functions are sigmoid() for the input, forget, and output gates and tanh() for both the cell gate and the cell state filter that produces the output hidden state, the behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function lstmCell(
  builder,
  input,
  weight,
  recurrentWeight,
  hiddenState,
  cellState,
  hiddenSize,
  options) {
  const zero = builder.constant(input.dataType, 0);

  const inputSize = input.shape[1];

  // input gate (i)
  let i = builder.sigmoid(builder.add(
    builder.mul(
      cellState,
      (options.peepholeWeight ?
         builder.slice(options.peepholeWeight, [0], [hiddenSize]) :
         zero)),
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
        (options.recurrentBias ?
           builder.slice(options.recurrentBias, [0], [hiddenSize]) :
           zero)),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(
            builder.slice(weight, [0, 0], [hiddenSize, inputSize]))),
        builder.matmul(
          hiddenState,
          builder.transpose(builder.slice(
            recurrentWeight, [0, 0], [hiddenSize, hiddenSize])))))));

  // forget gate (f)
  let f = builder.sigmoid(builder.add(
    builder.mul(
      cellState,
      (options.peepholeWeight ?
         builder.slice(options.peepholeWeight, [2 * hiddenSize], [hiddenSize]) :
         zero)),
    builder.add(
      builder.add(
        (options.bias ?
           builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) :
           zero),
        (options.recurrentBias ?
           builder.slice(
             options.recurrentBias, [2 * hiddenSize], [hiddenSize]) :
           zero)),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(
            weight, [2 * hiddenSize, 0], [hiddenSize, inputSize]))),
        builder.matmul(
          hiddenState,
          builder.transpose(builder.slice(
            recurrentWeight,
            [2 * hiddenSize, 0],
            [hiddenSize, hiddenSize])))))));

  // cell gate (g)
  let g = builder.tanh(builder.add(
    builder.add(
      (options.bias ?
         builder.slice(options.bias, [3 * hiddenSize], [hiddenSize]) :
         zero),
      (options.recurrentBias ?
         builder.slice(options.recurrentBias, [3 * hiddenSize], [hiddenSize]) :
         zero)),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(
          builder.slice(weight, [3 * hiddenSize, 0], [hiddenSize, inputSize]))),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(
          recurrentWeight, [3 * hiddenSize, 0], [hiddenSize, hiddenSize]))))));

  // output gate (o)
  let o = builder.sigmoid(builder.add(
    builder.mul(
      cellState,
      (options.peepholeWeight ?
         builder.slice(options.peepholeWeight, [hiddenSize], [hiddenSize]) :
         zero)),
    builder.add(
      builder.add(
        (options.bias ?
           builder.slice(options.bias, [hiddenSize], [hiddenSize]) :
           zero),
        (options.recurrentBias ?
           builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) :
           zero)),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(
            builder.slice(weight, [hiddenSize, 0], [hiddenSize, inputSize]))),
        builder.matmul(
          hiddenState,
          builder.transpose(builder.slice(
            recurrentWeight, [hiddenSize, 0], [hiddenSize, hiddenSize])))))));

  // output cell state (ct)
  let ct = builder.add(builder.mul(f, cellState), builder.mul(i, g));

  // output hidden state (ht)
  let ht = builder.mul(o, builder.tanh(ct));

  return [ht, ct];
}
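For checking an implementation numerically, the same computation can be written without MLGraphBuilder. The sketch below is a plain JavaScript reference for one time step over nested arrays, under the same assumptions as the decomposition above: the "iofg" layout, sigmoid() for the input, forget, and output gates, tanh() for the cell gate and the cell state filter, and peepholes applied to the incoming cellState. The names lstmCellReference and sigmoid are hypothetical.

```javascript
// Logistic sigmoid, the default activation of the input, forget, and output gates.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Hypothetical plain-JavaScript reference for one lstmCell() step, assuming the
// "iofg" layout, default activations, and peepholes on the incoming cellState.
function lstmCellReference(input, weight, recurrentWeight, hiddenState, cellState,
                           hiddenSize, options = {}) {
  const {bias, recurrentBias, peepholeWeight} = options;
  // First row of each gate in "iofg" order; peephole weights are packed as i, o, f.
  const gateOffset = {i: 0, o: hiddenSize, f: 2 * hiddenSize, g: 3 * hiddenSize};
  const peepholeOffset = {i: 0, o: hiddenSize, f: 2 * hiddenSize};
  const dot = (u, v) => u.reduce((sum, x, k) => sum + x * v[k], 0);

  const ht = [];
  const ct = [];
  for (let b = 0; b < input.length; ++b) {
    const hRow = [];
    const cRow = [];
    for (let h = 0; h < hiddenSize; ++h) {
      // Pre-activation of a gate for batch entry b and hidden unit h.
      const preActivation = (gate) => {
        const row = gateOffset[gate] + h;
        let z = dot(input[b], weight[row]) + dot(hiddenState[b], recurrentWeight[row]);
        if (bias) z += bias[row];
        if (recurrentBias) z += recurrentBias[row];
        if (peepholeWeight && gate !== 'g') {
          z += cellState[b][h] * peepholeWeight[peepholeOffset[gate] + h];
        }
        return z;
      };
      const i = sigmoid(preActivation('i'));
      const f = sigmoid(preActivation('f'));
      const o = sigmoid(preActivation('o'));
      const g = Math.tanh(preActivation('g'));
      const c = f * cellState[b][h] + i * g; // output cell state
      cRow.push(c);
      hRow.push(o * Math.tanh(c));           // output hidden state
    }
    ht.push(hRow);
    ct.push(cRow);
  }
  return [ht, ct];
}

// One batch entry, inputSize 2, hiddenSize 1, zero weights, and a cell-gate bias of 1:
// every sigmoid gate is 0.5 and g = tanh(1).
const [hiddenOut, cellOut] = lstmCellReference(
  [[1, 2]], [[0, 0], [0, 0], [0, 0], [0, 0]], [[0], [0], [0], [0]],
  [[0]], [[2]], 1, {bias: [0, 0, 0, 1]});
```

Because each gate row is located with the same offsets as the slices in the decomposition, the two can be compared element by element.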

7.9.30. matmul

Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits matmul;
};
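Arguments:

  • a: an MLOperand. The first input tensor, which is at least 2-D.

  • b: an MLOperand. The second input tensor, which is at least 2-D.

  • options: an optional MLOperatorOptions. The optional parameters of the operation.

Returns: an MLOperand. The output tensor that contains the matrix product of two input tensors.

Computes the matrix product of two input tensors as follows:

  • If both a and b are 2-D, they are multiplied like conventional matrices, producing a 2-D output tensor.

  • If either a or b has a rank greater than 2, it is treated as a stack of matrices residing in its last two dimensions. The leading (batch) dimensions are bidirectionally broadcast according to [numpy-broadcasting-rule], and the rank of the output is the maximum of the two input ranks.

Constraints for matmul()
operand   allowed data types      allowed ranks
a         "float32", "float16"    2 or greater
b         same as a               2 or greater
output    same as a               maximum of a's rank and b's rank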

MLOpSupportLimits has the following member for matmul():

matmul, of type MLBinarySupportLimits

Support limits for operator matmul().

To calculate matmul output sizes, given MLOperand a and MLOperand b, run the following steps:
  1. Let shapeA be a clone of a’s shape.

  2. Let rankA be a’s rank.

  3. Let shapeB be a clone of b’s shape.

  4. Let rankB be b’s rank.

  5. If either rankA or rankB is less than 2, then throw a TypeError.

  6. Let colsA be shapeA[rankA - 1].

  7. Let rowsA be shapeA[rankA - 2].

  8. Let colsB be shapeB[rankB - 1].

  9. Let rowsB be shapeB[rankB - 2].

  10. If colsA is not equal to rowsB, then throw a TypeError.

  11. Let batchShapeA be a clone of shapeA with the last 2 items (the matrix dimensions) removed.

  12. Let batchShapeB be a clone of shapeB with the last 2 items (the matrix dimensions) removed.

  13. Let outputShape be the result of bidirectionally broadcasting batchShapeA and batchShapeB. If that returns failure, then throw a TypeError.

  14. Append « rowsA, colsB » to outputShape.

  15. Return outputShape.
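The steps above can be sketched on plain arrays of dimensions as follows. matmulOutputShape and broadcastShapes are hypothetical names; broadcastShapes is a minimal implementation of bidirectional broadcasting (trailing dimensions aligned, a size of 1 stretched to match) that returns null where the algorithm returns failure.

```javascript
// Hypothetical helper: bidirectional (NumPy-style) broadcasting of two shapes,
// returning null when they are not broadcastable.
function broadcastShapes(a, b) {
  const rank = Math.max(a.length, b.length);
  const out = [];
  for (let i = 0; i < rank; ++i) {
    // Align trailing dimensions; missing leading dimensions act as size 1.
    const da = i < rank - a.length ? 1 : a[i - (rank - a.length)];
    const db = i < rank - b.length ? 1 : b[i - (rank - b.length)];
    if (da !== db && da !== 1 && db !== 1) return null;
    out.push(da === 1 ? db : da);
  }
  return out;
}

// Hypothetical sketch of "calculate matmul output sizes".
function matmulOutputShape(shapeA, shapeB) {
  if (shapeA.length < 2 || shapeB.length < 2) {
    throw new TypeError('both operands must be at least 2-D');
  }
  const [rowsA, colsA] = shapeA.slice(-2);
  const [rowsB, colsB] = shapeB.slice(-2);
  if (colsA !== rowsB) throw new TypeError('inner dimensions do not match');
  // Broadcast the batch dimensions, then append the matrix dimensions.
  const outputShape = broadcastShapes(shapeA.slice(0, -2), shapeB.slice(0, -2));
  if (outputShape === null) throw new TypeError('batch dimensions are not broadcastable');
  return [...outputShape, rowsA, colsB];
}

const batchedShape = matmulOutputShape([2, 1, 3, 4], [5, 4, 6]); // [2, 5, 3, 6]
```

In the usage line, the batch shapes [2, 1] and [5] broadcast to [2, 5], and the [3, 4] by [4, 6] matrix product contributes [3, 6].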

The matmul(a, b, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of a and b returns false, then throw a TypeError.

  3. If the dataType of any of a or b is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Let outputShape be the result of calculating matmul output sizes given a and b.

    1. If that throws an error, re-throw the error.

  5. Let desc be the result of creating an MLOperandDescriptor given a’s dataType and outputShape.

  6. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "matmul" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to a and b.

    5. Set operator’s output to output.

  7. Return output.

7.9.31. pad

Pad the tensor at its edges with a constant value, replicated edge values, or mirrored values.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions : MLOperatorOptions {
  MLPaddingMode mode = "constant";
  MLNumber value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input,
                sequence<[EnforceRange] unsigned long> beginningPadding,
                sequence<[EnforceRange] unsigned long> endingPadding,
                optional MLPadOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits pad;
};

MLPadOptions has the following members:

mode, of type MLPaddingMode, defaulting to "constant"

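The method used to fill the padded region:

  • "constant": fill with options.value.

  • "edge": repeat the edge value of input along each dimension.

  • "reflection": mirror the values of input about the edge, excluding the edge value itself.

  • "symmetric": mirror the values of input about the edge, including the edge value itself.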

value, of type MLNumber, defaulting to 0

The padding value when mode is set to "constant".

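Arguments:

  • input: an MLOperand. The input tensor, which must not be a scalar (rank 0).

  • beginningPadding: a sequence of unsigned long. The number of padding values to add at the beginning of each dimension of input. Its size must equal input's rank.

  • endingPadding: a sequence of unsigned long. The number of padding values to add at the end of each dimension of input. Its size must equal input's rank.

  • options: an optional MLPadOptions. The optional parameters of the operation.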

Returns: an MLOperand. The padded output tensor. Each dimension of the output tensor can be calculated as follows:

output size = beginning padding + input size + ending padding

Constraints for pad()
operand   allowed data types      allowed ranks
input     any                     N
output    same as input           same as input

MLOpSupportLimits has the following member for pad():

pad, of type MLSingleInputSupportLimits

Support limits for operator pad().

To calculate padding output sizes, given input, beginningPadding and endingPadding, run the following steps:
  1. Let shape be a copy of input’s shape.

  2. For each index in the range 0 to shape’s rank, exclusive:

    1. Add to shape[index] the value of beginningPadding[index].

    2. Add to shape[index] the value of endingPadding[index].

  3. Return shape.
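Together with the rank checks in the pad() method steps, this algorithm reduces to adding both paddings to every dimension. The sketch below works on plain arrays of dimensions; padOutputShape is a hypothetical name, and the "valid dimension" check on the result is left out.

```javascript
// Hypothetical sketch: the rank checks of pad() plus "calculate padding output
// sizes", on plain arrays of dimensions.
function padOutputShape(inputShape, beginningPadding, endingPadding) {
  const rank = inputShape.length;
  if (rank === 0) throw new TypeError('input must not be a scalar');
  if (beginningPadding.length !== rank || endingPadding.length !== rank) {
    throw new TypeError('padding lists must have one entry per input dimension');
  }
  // output size = beginning padding + input size + ending padding, per dimension.
  return inputShape.map((size, index) => beginningPadding[index] + size + endingPadding[index]);
}

// A [2, 3] input padded by [1, 2] at both ends becomes [4, 7].
const paddedShape = padOutputShape([2, 3], [1, 2], [1, 2]);
```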

The pad(input, beginningPadding, endingPadding, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s rank is 0, then throw a TypeError.

  4. If beginningPadding’s size and endingPadding’s size are not both equal to input’s rank, then throw a TypeError.

  5. Let desc be a copy of input.[[descriptor]].

  6. Let outputShape be the result of calculating padding output sizes given input, beginningPadding and endingPadding.

  7. If any item in outputShape is not a valid dimension, then throw a TypeError.

  8. Set options.value to the result of casting options.value to input’s dataType.

  9. Set desc.shape to outputShape.

  10. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "padding" operation, given beginningPadding, endingPadding and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  11. Return output.

Examples for constant, edge, reflection and symmetric padding:
// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
  {dataType: 'float32', shape: [2, 3]}, new Float32Array([1, 2, 3, 4, 5, 6]));

const beginningPadding = [1, 2];
const endingPadding = [1, 2];

// "constant" padded:
//    [[0,0,0,0,0,0,0],
//     [0,0,1,2,3,0,0],
//     [0,0,4,5,6,0,0],
//     [0,0,0,0,0,0,0]]
builder.pad(input, beginningPadding, endingPadding);

// "edge" padded:
//    [[1,1,1,2,3,3,3],
//     [1,1,1,2,3,3,3],
//     [4,4,4,5,6,6,6],
//     [4,4,4,5,6,6,6]]
builder.pad(input, beginningPadding, endingPadding, {mode: 'edge'});

// "reflection" padded:
//    [[6,5,4,5,6,5,4],
//     [3,2,1,2,3,2,1],
//     [6,5,4,5,6,5,4],
//     [3,2,1,2,3,2,1]]
builder.pad(input, beginningPadding, endingPadding, {mode: 'reflection'});

// "symmetric" padded:
//    [[2,1,1,2,3,3,2],
//     [2,1,1,2,3,3,2],
//     [5,4,4,5,6,6,5],
//     [5,4,4,5,6,6,5]]
builder.pad(input, beginningPadding, endingPadding, {mode: 'symmetric'});
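The four modes differ only in how an output coordinate that falls in the padded region is mapped back to the input. The plain JavaScript sketch below works on 2-D nested arrays, makes that mapping explicit, and reproduces the outputs listed above. sourceIndex and pad2dReference are hypothetical names, and the sketch assumes each padding amount is small enough for a single reflection to stay inside the input (at most size - 1 for "reflection" and at most size for "symmetric").

```javascript
// Hypothetical helper: map output coordinate j (padded by `begin` at the start
// of a dimension of size n) to an input coordinate, or -1 for a constant fill.
function sourceIndex(j, begin, n, mode) {
  const s = j - begin;
  if (s >= 0 && s < n) return s; // inside the original input
  switch (mode) {
    case 'constant':
      return -1;
    case 'edge': // repeat the nearest edge element
      return s < 0 ? 0 : n - 1;
    case 'reflection': // mirror about the edge, excluding the edge element
      return s < 0 ? -s : 2 * (n - 1) - s;
    case 'symmetric': // mirror about the edge, including the edge element
      return s < 0 ? -s - 1 : 2 * n - 1 - s;
    default:
      throw new TypeError(`unknown padding mode ${mode}`);
  }
}

// Hypothetical reference for pad() on a 2-D nested array.
function pad2dReference(input, beginningPadding, endingPadding,
                        {mode = 'constant', value = 0} = {}) {
  const rows = input.length;
  const cols = input[0].length;
  const output = [];
  for (let r = 0; r < beginningPadding[0] + rows + endingPadding[0]; ++r) {
    const sr = sourceIndex(r, beginningPadding[0], rows, mode);
    const row = [];
    for (let c = 0; c < beginningPadding[1] + cols + endingPadding[1]; ++c) {
      const sc = sourceIndex(c, beginningPadding[1], cols, mode);
      row.push(sr < 0 || sc < 0 ? value : input[sr][sc]);
    }
    output.push(row);
  }
  return output;
}

const reflected = pad2dReference([[1, 2, 3], [4, 5, 6]], [1, 2], [1, 2], {mode: 'reflection'});
```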

7.9.32. Pooling operations

Compute a pooling operation across all the elements within the moving window over the input tensor.
enum MLRoundingType {
  "floor",
  "ceil"
};

dictionary MLPool2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> windowDimensions;
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<[EnforceRange] unsigned long> outputSizes;
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits averagePool2d;
  MLSingleInputSupportLimits l2Pool2d;
  MLSingleInputSupportLimits maxPool2d;
};

MLPool2dOptions has the following members:

windowDimensions, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [windowHeight, windowWidth]. Specifies the dimensions of the sliding window. Defaults to the height and width dimensions of the input shape.

padding, of type sequence<[EnforceRange] unsigned long>

A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the input. The default value is [0,0,0,0].

strides, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the input. The default value is [1,1].

dilations, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor applied to the sliding window for each spatial dimension. The default value is [1,1].

layout, of type MLInputOperandLayout, defaulting to "nchw"

Specifies the layout format of the input and output tensor as follows:

  • "nchw"

    • input tensor: [batches, inputChannels, height, width]

    • output tensor: [batches, outputChannels, height, width]

  • "nhwc"

    • input tensor: [batches, height, width, inputChannels]

    • output tensor: [batches, height, width, outputChannels]

roundingType, of type MLRoundingType, defaulting to "floor"

The rounding function used to compute the output shape.

outputSizes, of type sequence<[EnforceRange] unsigned long>

A list of length 2: [outputHeight, outputWidth]. Specifies the sizes of the two spatial dimensions of the output tensor. When the output sizes are explicitly specified, options.roundingType is ignored.

If not specified, the output sizes are automatically computed.

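Arguments:

  • input: an MLOperand. The 4-D input tensor. Its logical shape is interpreted according to the value of options.layout.

  • options: an optional MLPool2dOptions. The optional parameters of the operation.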

Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout. More specifically, if options.roundingType is "floor", the spatial dimensions of the output tensor can be calculated as follows:

output size = floor(1 + (input size - dilated window size + beginning padding + ending padding) / stride)

or if options.roundingType is "ceil":

output size = ceil(1 + (input size - dilated window size + beginning padding + ending padding) / stride)

where dilated window size = (window size - 1) * dilation + 1, which equals the window size when the dilation is 1.

Constraints for pooling operations
operand   allowed data types                        allowed ranks
input     specified as part of operation steps      4
output    same as input                             4

MLOpSupportLimits has the following members for pooling operations:

averagePool2d, of type MLSingleInputSupportLimits

Support limits for operator averagePool2d().

l2Pool2d, of type MLSingleInputSupportLimits

Support limits for operator l2Pool2d().

maxPool2d, of type MLSingleInputSupportLimits

Support limits for operator maxPool2d().

A global pooling operation, such as global max pooling, is a variant of pooling in which the window dimensions equal the spatial (height and width) dimensions of the input shape. Because options.windowDimensions defaults to those dimensions, global pooling is expressed by omitting it, as follows.
// 'global' max pooling
builder.maxPool2d(input);
To calculate pool2d output sizes given MLInputOperandLayout layout, list of 4 unsigned integers inputShape, MLRoundingType roundingType, list of 2 unsigned integers windowDimensions, list of 4 unsigned integers padding, list of 2 unsigned integers strides, list of 2 unsigned integers dilations, and optional list of 2 unsigned integers outputSizes, perform these steps. They return a list of 4 unsigned integers.
  1. Switch on layout:

    "nchw"

    Let « batches, channels, inputHeight, inputWidth » be inputShape.

    "nhwc"

    Let « batches, inputHeight, inputWidth, channels » be inputShape.

  2. If outputSizes is given, then let « outputHeight, outputWidth » be outputSizes.

  3. Otherwise:

    1. Let outputSizes be the result of calculating conv2d output sizes given inputHeight, inputWidth, windowDimensions[0], windowDimensions[1], padding, strides, and dilations.

    2. Let « outputHeight, outputWidth » be outputSizes.

    3. Switch on roundingType:

      "floor"
      1. Set outputWidth to floor(outputWidth).

      2. Set outputHeight to floor(outputHeight).

      "ceil"
      1. Set outputWidth to ceiling(outputWidth).

      2. Set outputHeight to ceiling(outputHeight).

  4. Switch on layout:

    "nchw"

    Return « batches, channels, outputHeight, outputWidth ».

    "nhwc"

    Return « batches, outputHeight, outputWidth, channels ».
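These steps can be sketched in plain JavaScript as follows. The unrounded size comes from calculating conv2d output sizes, which is defined outside this section; the convOutputSize helper below is an assumption that computes (inputSize - ((windowSize - 1) * dilation + 1) + beginningPadding + endingPadding) / stride + 1 for one spatial dimension. Both function names are hypothetical.

```javascript
// Assumption: unrounded output size along one spatial dimension, with the
// window dilated to (windowSize - 1) * dilation + 1 elements.
function convOutputSize(inputSize, windowSize, beginningPadding, endingPadding, stride, dilation) {
  const dilatedWindowSize = (windowSize - 1) * dilation + 1;
  return (inputSize - dilatedWindowSize + beginningPadding + endingPadding) / stride + 1;
}

// Hypothetical sketch of "calculate pool2d output sizes". padding is
// [beginningHeight, endingHeight, beginningWidth, endingWidth].
function pool2dOutputShape(layout, inputShape, roundingType, windowDimensions,
                           padding, strides, dilations, outputSizes) {
  let batches, channels, inputHeight, inputWidth;
  if (layout === 'nchw') {
    [batches, channels, inputHeight, inputWidth] = inputShape;
  } else if (layout === 'nhwc') {
    [batches, inputHeight, inputWidth, channels] = inputShape;
  } else {
    throw new TypeError(`unknown layout ${layout}`);
  }

  let outputHeight, outputWidth;
  if (outputSizes !== undefined) {
    // Explicit output sizes take precedence; roundingType is ignored.
    [outputHeight, outputWidth] = outputSizes;
  } else {
    const round = roundingType === 'ceil' ? Math.ceil : Math.floor;
    outputHeight = round(convOutputSize(inputHeight, windowDimensions[0],
                                        padding[0], padding[1], strides[0], dilations[0]));
    outputWidth = round(convOutputSize(inputWidth, windowDimensions[1],
                                       padding[2], padding[3], strides[1], dilations[1]));
  }

  return layout === 'nchw'
    ? [batches, channels, outputHeight, outputWidth]
    : [batches, outputHeight, outputWidth, channels];
}

// (8 - 3) / 2 + 1 = 3.5, so "ceil" yields 4 where "floor" would yield 3.
const pooledShape = pool2dOutputShape('nchw', [1, 3, 8, 8], 'ceil', [3, 3], [0, 0, 0, 0], [2, 2], [1, 1]);
```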

To create pooling operation given string op, MLOperand input, MLPool2dOptions options, and optional list allowedDataTypes, run the following steps:
  1. Assert: op is one of "averagePool2d", "l2Pool2d", "maxPool2d".

  2. If this can not build, then throw an "InvalidStateError" DOMException.

  3. If validating operand with this and input returns false, then throw a TypeError.

  4. If allowedDataTypes is given and it does not contain input’s dataType, then throw a TypeError.

  5. If input’s rank is not 4, then throw a TypeError.

  6. If options.windowDimensions exists and its size is not 2, then throw a TypeError.

  7. If options.windowDimensions does not exist, set it to the height and width dimensions of the shape of input.

  8. If options.outputSizes exists, or if options.padding does not exist, set options.padding to the list « 0, 0, 0, 0 ».

  9. If options.padding's size is not 4, then throw a TypeError.

  10. If options.strides does not exist, set options.strides to the list « 1, 1 ».

  11. If options.strides's size is not 2, then throw a TypeError.

  12. If any value in options.strides is not greater than 0, then throw a TypeError.

  13. If options.outputSizes exists:

    1. If its size is not 2, then throw a TypeError.

    2. If its elements are not smaller than the elements at the same dimension (index) for options.strides, then throw a TypeError.

  14. If options.dilations does not exist, set options.dilations to the list « 1, 1 ».

  15. If options.dilations's size is not 2, then throw a TypeError.

  16. If any value in options.dilations is not greater than 0, then throw a TypeError.

  17. Let desc be a copy of input.[[descriptor]].

  18. Let outputShape be the result of calculating pool2d output sizes given options.layout, input’s shape, options.roundingType, options.windowDimensions, options.padding, options.strides, options.dilations, and options.outputSizes (if it exists).

  19. If any item in outputShape is not a valid dimension, then throw a TypeError.

  20. Set desc.shape to outputShape.

  21. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the op operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  22. Return output.
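The rank check and option defaulting in steps 5 through 12 and 14 through 16 can be sketched on plain objects as follows. normalizePool2dOptions is a hypothetical name. The sketch returns a copy instead of modifying options, reads the default window from the height and width positions given by options.layout (an assumption about how "the height and width dimensions" apply to "nhwc"), and leaves out the data type check and the options.outputSizes checks in step 13.

```javascript
// Hypothetical sketch of option defaulting in "create pooling operation".
function normalizePool2dOptions(inputShape, options = {}) {
  // Step 5: pooling operates on 4-D tensors only.
  if (inputShape.length !== 4) throw new TypeError('input must be 4-D');
  const opts = {layout: 'nchw', roundingType: 'floor', ...options};

  // Steps 6-7: the window defaults to the input's height and width (global pooling).
  if (opts.windowDimensions === undefined) {
    opts.windowDimensions =
      opts.layout === 'nhwc' ? inputShape.slice(1, 3) : inputShape.slice(2, 4);
  } else if (opts.windowDimensions.length !== 2) {
    throw new TypeError('windowDimensions must have 2 items');
  }

  // Steps 8-9: explicit output sizes, or absent padding, mean zero padding.
  if (opts.outputSizes !== undefined || opts.padding === undefined) {
    opts.padding = [0, 0, 0, 0];
  }
  if (opts.padding.length !== 4) throw new TypeError('padding must have 4 items');

  // Steps 10-12 and 14-16: strides and dilations default to 1 and must be positive.
  for (const name of ['strides', 'dilations']) {
    if (opts[name] === undefined) opts[name] = [1, 1];
    if (opts[name].length !== 2) throw new TypeError(`${name} must have 2 items`);
    if (opts[name].some((v) => v <= 0)) throw new TypeError(`${name} must be positive`);
  }
  return opts;
}

const globalOptions = normalizePool2dOptions([1, 3, 5, 6]);
// windowDimensions [5, 6], padding [0, 0, 0, 0], strides [1, 1], dilations [1, 1]
```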

The following pooling algorithms are supported.
The averagePool2d(input, options) method steps are:
  1. Let output be the result of running the create pooling operation given "averagePool2d", input, options, and « "float32", "float16" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The l2Pool2d(input, options) method steps are:
  1. Let output be the result of running the create pooling operation given "l2Pool2d", input, options, and « "float32", "float16" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The maxPool2d(input, options) method steps are:
  1. Let output be the result of running the create pooling operation given "maxPool2d", input and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

7.9.32.1. averagePool2d
Calculate the average value for patches of a feature map, and use it to create a pooled feature map. See § 7.9.32 Pooling operations for more detail.
7.9.32.2. l2Pool2d
Apply the L2 norm function to a region of the input feature map. The L2 norm is the square root of the sum of the squares of its elements. See § 7.9.32 Pooling operations for more detail.
7.9.32.3. maxPool2d
Calculate the maximum value for patches of a feature map, and use it to create a pooled feature map. See § 7.9.32 Pooling operations for more detail.
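The three pooling operations differ only in how the values under each window are combined. The plain JavaScript sketch below computes them for a single 2-D feature map (one batch and one channel) with no padding, no dilation, and "floor" rounding; pool2dReference and its kind argument are hypothetical.

```javascript
// Hypothetical reference for averagePool2d, l2Pool2d, and maxPool2d on one
// 2-D feature map, without padding or dilation, using "floor" rounding.
function pool2dReference(map, [windowHeight, windowWidth], [strideHeight, strideWidth], kind) {
  const outputHeight = Math.floor((map.length - windowHeight) / strideHeight) + 1;
  const outputWidth = Math.floor((map[0].length - windowWidth) / strideWidth) + 1;
  const combine = {
    average: (values) => values.reduce((sum, v) => sum + v, 0) / values.length,
    l2: (values) => Math.sqrt(values.reduce((sum, v) => sum + v * v, 0)),
    max: (values) => Math.max(...values),
  }[kind];
  if (!combine) throw new TypeError(`unknown pooling kind ${kind}`);

  const output = [];
  for (let y = 0; y < outputHeight; ++y) {
    const row = [];
    for (let x = 0; x < outputWidth; ++x) {
      // Gather the values under the window whose top-left corner is
      // (y * strideHeight, x * strideWidth).
      const values = [];
      for (let dy = 0; dy < windowHeight; ++dy) {
        for (let dx = 0; dx < windowWidth; ++dx) {
          values.push(map[y * strideHeight + dy][x * strideWidth + dx]);
        }
      }
      row.push(combine(values));
    }
    output.push(row);
  }
  return output;
}

const featureMap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]];
const maxPooled = pool2dReference(featureMap, [2, 2], [2, 2], 'max'); // [[6, 8], [14, 16]]
```

A window that covers the whole feature map, as in global pooling, yields a single output value per channel.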

7.9.33. prelu

Calculate the parametric version of the rectified linear function (Parametric ReLU) on the input tensor element-wise. Parametric ReLU is a type of leaky ReLU that, instead of using a fixed scalar slope such as 0.01, makes the slope (the coefficient of leakage) a parameter that is learned during the model training phase. The calculation follows the expression max(0, x) + slope * min(0, x).

The operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.

partial interface MLGraphBuilder {
  MLOperand prelu(MLOperand input,
                  MLOperand slope,
                  optional MLOperatorOptions options = {});
};

dictionary MLPreluSupportLimits {
  MLSupportLimits input;
  MLSupportLimits slope;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLPreluSupportLimits prelu;
};
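Arguments:

  • input: an MLOperand. The input tensor.

  • slope: an MLOperand. The slope tensor. Its shape must be bidirectionally broadcastable with input's shape.

  • options: an optional MLOperatorOptions. The optional parameters of the operation.

Returns: an MLOperand. The output tensor, whose shape is the result of bidirectionally broadcasting input's and slope's shapes.

Constraints for prelu()
operand   allowed data types                       allowed ranks
input     "float32", "float16", "int32", "int8"    N
slope     same as input                            N
output    same as input                            maximum of input's rank and slope's rank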

MLPreluSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

slope, of type MLSupportLimits

MLSupportLimits for slope operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for prelu():

prelu, of type MLPreluSupportLimits

Support limits for operator prelu().

The prelu(input, slope, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of input and slope returns false, then throw a TypeError.

  3. If the dataType of any of input or slope is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Let outputShape be the result of bidirectionally broadcasting slope’s shape and input’s shape.

    1. If that returns failure, then throw a TypeError.

  5. Let descriptor be the result of creating an MLOperandDescriptor given input’s dataType and outputShape.

  6. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and descriptor.

    2. Let operator be an operator for the "prelu" operation, given slope and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to input and slope.

    5. Set operator’s output to output.

  7. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function prelu(builder, input, slope) {
  return builder.add(
    builder.max(builder.constant(input.dataType, 0), input),
    builder.mul(
      slope, builder.min(builder.constant(input.dataType, 0), input)));
}
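To make the broadcasting rule and the expression concrete, the following sketch computes prelu on plain JavaScript Arrays rather than through MLGraphBuilder. The helper names (broadcastShapes, broadcastIndex, preluReference) and the representation of a tensor as a flat row-major Array plus a shape are assumptions for illustration only, and every dimension is assumed to be at least 1.

```javascript
// Hypothetical reference helpers; a tensor is a flat row-major Array plus a shape.

// NumPy-style bidirectional broadcast of two shapes; returns null on failure.
function broadcastShapes(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const result = [];
  for (let i = 0; i < rank; i++) {
    // Missing leading dimensions of the lower-rank shape behave like size 1.
    const a = shapeA[shapeA.length - rank + i] ?? 1;
    const b = shapeB[shapeB.length - rank + i] ?? 1;
    if (a !== b && a !== 1 && b !== 1) return null;
    result.push(Math.max(a, b));
  }
  return result;
}

// Map output coordinates to a flat index into an operand broadcast to the output.
function broadcastIndex(coords, shape) {
  const offset = coords.length - shape.length;
  let index = 0;
  for (let d = 0; d < shape.length; d++) {
    const c = shape[d] === 1 ? 0 : coords[offset + d];
    index = index * shape[d] + c;
  }
  return index;
}

// Element-wise max(0, x) + slope * min(0, x), with slope broadcast against input.
function preluReference(input, inputShape, slope, slopeShape) {
  const shape = broadcastShapes(inputShape, slopeShape);
  if (shape === null) throw new TypeError("input and slope are not broadcastable");
  const size = shape.reduce((product, d) => product * d, 1);
  const data = new Array(size);
  const coords = new Array(shape.length);
  for (let flat = 0; flat < size; flat++) {
    // Decompose the flat output index into per-dimension coordinates.
    let rest = flat;
    for (let d = shape.length - 1; d >= 0; d--) {
      coords[d] = rest % shape[d];
      rest = Math.floor(rest / shape[d]);
    }
    const x = input[broadcastIndex(coords, inputShape)];
    const s = slope[broadcastIndex(coords, slopeShape)];
    data[flat] = Math.max(0, x) + s * Math.min(0, x);
  }
  return { shape, data };
}

// Usage: a per-column slope of shape [2] broadcast against a [2, 2] input.
const result = preluReference([-2, -1, 0, 3], [2, 2], [0.5, 0.25], [2]);
console.log(result.shape, result.data); // [ 2, 2 ] [ -1, -0.25, 0, 3 ]
```

Negative inputs are scaled by the slope of their column, while non-negative inputs pass through unchanged.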

7.9.34. Reduction operations

Reduce the input tensor along all dimensions, or along the axes specified in the axes option. Each specified axis is reduced, i.e. the resulting tensor does not contain that dimension, unless keepDimensions is true, in which case the dimension is kept with size 1. The values of the resulting tensor are calculated using the specified reduction function, which takes as parameters all the input values across the reduced dimensions.
dictionary MLReduceOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> axes;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reduceL1;
  MLSingleInputSupportLimits reduceL2;
  MLSingleInputSupportLimits reduceLogSum;
  MLSingleInputSupportLimits reduceLogSumExp;
  MLSingleInputSupportLimits reduceMax;
  MLSingleInputSupportLimits reduceMean;
  MLSingleInputSupportLimits reduceMin;
  MLSingleInputSupportLimits reduceProduct;
  MLSingleInputSupportLimits reduceSum;
  MLSingleInputSupportLimits reduceSumSquare;
};

MLReduceOptions has the following members:

axes, of type sequence<[EnforceRange] unsigned long>

The dimensions to reduce, which also specifies which of the values in the input tensor are used with the reduction function. The axes in the list must be in the range [0, N-1] where N is the rank of the input tensor.

If not present, all dimensions are reduced. The input values for the reduction function are all of the values in the input tensor.

If present and not empty, the input values for the reduction function are all the values for the specified dimensions of the input tensor.

If present and empty, no dimensions are reduced, and the shape of the output tensor is the same as the shape of the input tensor; the reduction function is applied to each value in the tensor individually.

keepDimensions, of type boolean, defaulting to false

If true, the output has the same rank as the input, setting any reduced dimensions to size 1.

Arguments:

Returns: an MLOperand. The reduced output tensor. If the input operand is a scalar, the reduction function is applied to the scalar value, and the output is also a scalar.

Constraints for reduction operations
operand | allowed data types | allowed ranks
input | specified as part of operation steps | N
output | same as input | 0 to input's rank, depending on axes and keepDimensions

MLOpSupportLimits has the following members for reduction operations:

reduceL1, of type MLSingleInputSupportLimits

Support limits for operator reduceL1().

reduceL2, of type MLSingleInputSupportLimits

Support limits for operator reduceL2().

reduceLogSum, of type MLSingleInputSupportLimits

Support limits for operator reduceLogSum().

reduceLogSumExp, of type MLSingleInputSupportLimits

Support limits for operator reduceLogSumExp().

reduceMax, of type MLSingleInputSupportLimits

Support limits for operator reduceMax().

reduceMean, of type MLSingleInputSupportLimits

Support limits for operator reduceMean().

reduceMin, of type MLSingleInputSupportLimits

Support limits for operator reduceMin().

reduceProduct, of type MLSingleInputSupportLimits

Support limits for operator reduceProduct().

reduceSum, of type MLSingleInputSupportLimits

Support limits for operator reduceSum().

reduceSumSquare, of type MLSingleInputSupportLimits

Support limits for operator reduceSumSquare().

Reduction types:
To calculate reduction output sizes, given a list of unsigned integers inputShape, an optional list of unsigned integers axes, and a boolean keepDimensions, perform the following steps. They return a new list of unsigned integers, or failure.
  1. Let inputRank be inputShape’s size.

  2. If axes is not given, let axes be the range 0 to inputRank, exclusive.

  3. Otherwise, if axes contains duplicate values, or if any of its elements is not in the range 0 to inputRank, exclusive, then return failure.

  4. If keepDimensions is true, then:

    1. Let outputShape be a clone of inputShape.

    2. For each axis of axes:

      1. Set outputShape[axis] to 1.

  5. Otherwise:

    1. Let outputShape be an empty list.

    2. For each index in the range 0 to inputRank, exclusive:

      1. If axes does not contain index, then append inputShape[index].

  6. Return outputShape.
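The steps above can be sketched in plain JavaScript as follows. The function name calculateReductionOutputSizes is hypothetical, and null stands in for "failure".

```javascript
// Hypothetical helper mirroring "calculate reduction output sizes" step by step.
// Returns the output shape as an Array, or null in place of "failure".
function calculateReductionOutputSizes(inputShape, axes, keepDimensions) {
  const inputRank = inputShape.length;
  if (axes === undefined) {
    // Step 2: no axes means every dimension is reduced.
    axes = [...Array(inputRank).keys()];
  } else if (new Set(axes).size !== axes.length ||
             axes.some((axis) => axis < 0 || axis >= inputRank)) {
    // Step 3: duplicate or out-of-range axes are a failure.
    return null;
  }
  if (keepDimensions) {
    // Step 4: keep the rank, setting each reduced dimension to 1.
    const outputShape = [...inputShape];
    for (const axis of axes) outputShape[axis] = 1;
    return outputShape;
  }
  // Step 5: drop each reduced dimension.
  return inputShape.filter((_, index) => !axes.includes(index));
}

console.log(calculateReductionOutputSizes([2, 3, 4], [1], true));        // [ 2, 1, 4 ]
console.log(calculateReductionOutputSizes([2, 3, 4], [1], false));       // [ 2, 4 ]
console.log(calculateReductionOutputSizes([2, 3, 4], undefined, false)); // []
console.log(calculateReductionOutputSizes([2, 3, 4], [], false));        // [ 2, 3, 4 ]
console.log(calculateReductionOutputSizes([2, 3, 4], [3], false));       // null
```

The last two calls illustrate the edge cases described for MLReduceOptions: an empty axes list reduces nothing, and an axis outside [0, N-1] is rejected.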

To create reduction operation given string op, MLOperand input, MLReduceOptions options, and optional list allowedDataTypes, run the following steps:
  1. Assert: op is one of "reduceL1", "reduceL2", "reduceLogSum", "reduceLogSumExp", "reduceMax", "reduceMean", "reduceMin", "reduceProduct", "reduceSum", "reduceSumSquare".

  2. If this can not build, then throw an "InvalidStateError" DOMException.

  3. If validating operand with this and input returns false, then throw a TypeError.

  4. If allowedDataTypes is given and it does not contain input’s dataType, then throw a TypeError.

  5. Let outputShape be the result of calculating reduction output sizes given input’s shape, options.axes (if it exists), and options.keepDimensions. If that returns failure, then throw a TypeError.

  6. Let desc be the result of creating an MLOperandDescriptor given input’s dataType and outputShape.

  7. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the op operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  8. Return output.

The following reduction algorithms are supported.
The reduceL1(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceL1", input, options, and « "float32", "float16", "int32", "uint32" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceL2(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceL2", input, options, and « "float32", "float16" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceLogSum(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceLogSum", input, options, and « "float32", "float16" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceLogSumExp(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceLogSumExp", input, options, and « "float32", "float16" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceMax(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceMax", input and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceMean(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceMean", input, options, and « "float32", "float16" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceMin(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceMin", input and options.

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceProduct(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceProduct", input, options, and « "float32", "float16", "int32", "uint32" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceSum(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceSum", input, options, and « "float32", "float16", "int32", "uint32" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.

The reduceSumSquare(input, options) method steps are:
  1. Let output be the result of creating reduction operation given "reduceSumSquare", input, options, and « "float32", "float16", "int32", "uint32" ».

    1. If that throws an error, then re-throw the error.

  2. Return output.
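As an illustration of what each reduction function computes on the list of values gathered from the reduced dimensions, here is a plain-JavaScript sketch. The object name reductionFunctions is hypothetical; the definitions are the conventional ones (for example, L1 as the sum of absolute values and L2 as the square root of the sum of squares), and the LogSum, LogSumExp, and SumSquare entries agree with the emulation samples in this section.

```javascript
// Hypothetical scalar reference for each reduction function, applied to the
// flat list of values gathered from the reduced dimensions.
// Math.max(...v) and Math.min(...v) are fine for short illustrative lists.
const reductionFunctions = {
  reduceL1: (v) => v.reduce((acc, x) => acc + Math.abs(x), 0),
  reduceL2: (v) => Math.sqrt(v.reduce((acc, x) => acc + x * x, 0)),
  reduceLogSum: (v) => Math.log(v.reduce((acc, x) => acc + x, 0)),
  reduceLogSumExp: (v) => Math.log(v.reduce((acc, x) => acc + Math.exp(x), 0)),
  reduceMax: (v) => Math.max(...v),
  reduceMean: (v) => v.reduce((acc, x) => acc + x, 0) / v.length,
  reduceMin: (v) => Math.min(...v),
  reduceProduct: (v) => v.reduce((acc, x) => acc * x, 1),
  reduceSum: (v) => v.reduce((acc, x) => acc + x, 0),
  reduceSumSquare: (v) => v.reduce((acc, x) => acc + x * x, 0),
};

// Print every reduction of the same small list of values.
const values = [3, 4];
for (const [name, fn] of Object.entries(reductionFunctions)) {
  console.log(name, fn(values));
}
```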

The behavior of several reduction operations can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function reduceLogSum(builder, input, options) {
  return builder.log(builder.reduceSum(input, options));
}

function reduceLogSumExp(builder, input, options) {
  return builder.log(builder.reduceSum(builder.exp(input), options));
}

function reduceSumSquare(builder, input, options) {
  // pow() takes two MLOperands, so square the input with mul() instead.
  return builder.reduceSum(builder.mul(input, input), options);
}
Some underlying platforms do not support an option like keepDimensions directly. This option does not affect the underlying tensor data, only the shape. For example, if the input shape is [2, 3, 4], the axis is 1, and keepDimensions is true, the expected output shape is [2, 1, 4]. If the underlying platform never keeps reduced dimensions, it will produce an output shape of [2, 4], and the implementation can introduce a no-op reshape to [2, 1, 4]. A similar no-op reshape can be introduced if keepDimensions is false but the underlying platform always keeps reduced dimensions.
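The claim that keepDimensions changes only the shape can be checked with a small plain-JavaScript reduction over a flat row-major Array. The helper reduceSumAlongAxis is hypothetical and handles a single axis only.

```javascript
// Hypothetical helper: reduceSum along a single axis of a flat row-major tensor.
function reduceSumAlongAxis(data, shape, axis) {
  // Elements before the axis form "outer" blocks, elements after it are "inner".
  const outer = shape.slice(0, axis).reduce((p, d) => p * d, 1);
  const inner = shape.slice(axis + 1).reduce((p, d) => p * d, 1);
  const extent = shape[axis];
  const out = new Array(outer * inner).fill(0);
  for (let o = 0; o < outer; o++) {
    for (let k = 0; k < extent; k++) {
      for (let i = 0; i < inner; i++) {
        out[o * inner + i] += data[(o * extent + k) * inner + i];
      }
    }
  }
  return out;
}

const shape = [2, 3, 4];
const data = Array.from({ length: 24 }, (_, i) => i);
const reduced = reduceSumAlongAxis(data, shape, 1);
// The same 8 values serve as shape [2, 4] (keepDimensions false) or as
// shape [2, 1, 4] (keepDimensions true); only the shape metadata differs.
console.log(reduced.length); // 8
```

Because both output shapes describe the same row-major sequence of values, switching between them is exactly the no-op reshape described above.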

7.9.35. relu

Compute the rectified linear function of the input tensor element-wise. The calculation follows the expression max(0, x).
partial interface MLGraphBuilder {
  MLOperand relu(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits relu;
};
Arguments:

Returns:

Constraints for relu()
operand | allowed data types | allowed ranks
input | "float32", "float16", "int32", "int8" | N
output | same as input | same as input

MLOpSupportLimits has the following member for relu():

relu, of type MLSingleInputSupportLimits

Support limits for operator relu().

The relu(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "relu" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function relu(builder, input) {
  return builder.max(builder.constant(input.dataType, 0), input);
}

7.9.36. resample2d

Resample the tensor values from the source to the destination dimensions according to the axes and scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions : MLOperatorOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<[EnforceRange] unsigned long> sizes;
  sequence<[EnforceRange] unsigned long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits resample2d;
};
Arguments:

Returns: an MLOperand. The output 4-D tensor.

MLResample2dOptions has the following members:

mode, of type MLInterpolationMode, defaulting to "nearest-neighbor"

The interpolation algorithm used to fill the output tensor values.

scales, of type sequence<float>

A list of length 2. Specifies the scaling factor for each input dimension from axes: [scaleForFirstAxis, scaleForSecondAxis]. The default value is [1.0, 1.0].

sizes, of type sequence<[EnforceRange] unsigned long>

A list of length 2. Specifies the target sizes for each input dimension from axes: [sizeForFirstAxis, sizeForSecondAxis]. When the target sizes are specified, the scales option is ignored, since the scaling factors are derived from the target sizes.

axes, of type sequence<[EnforceRange] unsigned long>

A list of length 2. Specifies the two dimensions of the input tensor to which the interpolation algorithm applies. The default value is [2, 3].

Constraints for resample2d()
operand | allowed data types | allowed ranks
input | "float32", "float16" | 4
output | same as input | 4

MLOpSupportLimits has the following member for resample2d():

resample2d, of type MLSingleInputSupportLimits

Support limits for operator resample2d().

To check resample options given options and input, run the following steps:
  1. If options.scales does not exist, set it to the list « 1.0, 1.0 ».

  2. Otherwise, if any of its values is not greater than 0, or if its size is not 2, return false.

  3. If options.sizes exists, and if its size is not 2, or if any of its values is not greater than 0, return false.

  4. If options.axes does not exist, set it to the list « 2, 3 ».

  5. Otherwise, if options.axes contains duplicate values, or if any of its elements is not in the range 0 to input’s rank, exclusive, then return false.

  6. Return true.

To calculate resample output sizes given MLOperand input and MLResample2dOptions options, run the following steps:
  1. Let inputDescriptor be input.[[descriptor]].

  2. Let outputShape be a clone of inputDescriptor.shape.

  3. For each index in the range 0 to options.axes's size, exclusive:

    1. If options.sizes exists, then let size be options.sizes[index].

    2. Otherwise, let size be floor(input’s shape[options.axes[index]] * options.scales[index]).

    3. If size is not a valid dimension, then return failure.

    4. Set outputShape[options.axes[index]] to size.

  4. Return the result of creating an MLOperandDescriptor given inputDescriptor.dataType and outputShape.
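The two algorithms above can be sketched in plain JavaScript as follows. The function names are hypothetical, the sketch returns a copy of the options instead of mutating them in place, null stands in for false or failure, and "valid dimension" is approximated as a positive integer (an assumption; the full definition lives elsewhere in this specification).

```javascript
// Hypothetical helpers mirroring "check resample options" and
// "calculate resample output sizes"; null stands in for false/failure.
function checkResampleOptions(options, inputRank) {
  const checked = { ...options };
  if (checked.scales === undefined) {
    checked.scales = [1.0, 1.0];
  } else if (checked.scales.length !== 2 || checked.scales.some((s) => !(s > 0))) {
    return null;
  }
  if (checked.sizes !== undefined &&
      (checked.sizes.length !== 2 || checked.sizes.some((s) => !(s > 0)))) {
    return null;
  }
  if (checked.axes === undefined) {
    checked.axes = [2, 3];
  } else if (new Set(checked.axes).size !== checked.axes.length ||
             checked.axes.some((a) => a < 0 || a >= inputRank)) {
    return null;
  }
  return checked;
}

function calculateResampleOutputSizes(inputShape, options) {
  const outputShape = [...inputShape];
  for (let index = 0; index < options.axes.length; index++) {
    const axis = options.axes[index];
    // Target sizes take precedence over scales.
    const size = options.sizes !== undefined
      ? options.sizes[index]
      : Math.floor(inputShape[axis] * options.scales[index]);
    // Assumption: a "valid dimension" is treated here as a positive integer.
    if (!Number.isInteger(size) || size <= 0) return null;
    outputShape[axis] = size;
  }
  return outputShape;
}

const inputShape = [1, 3, 4, 6];
const options = checkResampleOptions({ scales: [2.0, 0.5] }, inputShape.length);
console.log(calculateResampleOutputSizes(inputShape, options)); // [ 1, 3, 8, 3 ]
```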

The resample2d(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If input’s rank is not its allowed rank, then throw a TypeError.

  5. If checking resample options given options and input returns false, then throw a TypeError.

  6. Let desc be the result of calculating resample output sizes given input and options. If that returns failure, then throw a TypeError.

  7. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "resample2d" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  8. Return output.

7.9.37. reshape

Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical shape for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input,
                    sequence<[EnforceRange] unsigned long> newShape,
                    optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reshape;
};
Arguments:

Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by the newShape argument.

Constraints for reshape()
operand | allowed data types | allowed ranks
input | any | N
output | same as input | newShape's size

MLOpSupportLimits has the following member for reshape():

reshape, of type MLSingleInputSupportLimits

Support limits for operator reshape().

The reshape(input, newShape, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. Let outputShape be an empty array of unsigned long.

  4. If newShape’s size is 0, set outputShape to an empty list for a scalar.

  5. If any item in newShape is not a valid dimension, then throw a TypeError.

  6. Let inputElementCount be the product of all elements in input’s shape. If input’s shape is empty (a scalar), inputElementCount is 1.

  7. If the product of all values in newShape is not equal to inputElementCount, then throw a TypeError.

  8. Let desc be a copy of input.[[descriptor]].

  9. Set desc.shape to newShape.

  10. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and desc.

    2. Let operator be an operator for the "reshape" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  11. Return output.
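The element-count validation in these steps can be sketched in plain JavaScript. The function name reshapeOutputShape is hypothetical, and a valid dimension is approximated as a positive integer (an assumption).

```javascript
// Hypothetical helper mirroring the reshape() validation steps: returns the
// output shape, or throws a TypeError where the method steps do.
function reshapeOutputShape(inputShape, newShape) {
  // Assumption: a valid dimension is a positive integer.
  if (newShape.some((d) => !Number.isInteger(d) || d <= 0)) {
    throw new TypeError("newShape contains an invalid dimension");
  }
  // The product of an empty shape (a scalar) is 1.
  const count = (shape) => shape.reduce((p, d) => p * d, 1);
  if (count(newShape) !== count(inputShape)) {
    throw new TypeError(`cannot reshape [${inputShape}] to [${newShape}]`);
  }
  return [...newShape];
}

console.log(reshapeOutputShape([2, 3, 4], [6, 4])); // [ 6, 4 ]
console.log(reshapeOutputShape([1, 1], []));        // []
```

Only the shape is computed here; the data is never touched, which matches the statement that reshape does not copy or change the content of the tensor.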

7.9.38. sigmoid

Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).
partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits sigmoid;
};
Arguments:

Returns:

Constraints for sigmoid()
operand | allowed data types | allowed ranks
input | "float32", "float16" | N
output | same as input | same as input

MLOpSupportLimits has the following member for sigmoid():

sigmoid, of type MLSingleInputSupportLimits

Support limits for operator sigmoid().

The sigmoid(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "sigmoid" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function sigmoid(builder, input) {
  return builder.div(
    builder.constant(input.dataType, 1),
    builder.add(
      builder.exp(builder.neg(input)), builder.constant(input.dataType, 1)));
}

7.9.39. slice

Produce a slice of the input tensor.
partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input,
                  sequence<[EnforceRange] unsigned long> starts,
                  sequence<[EnforceRange] unsigned long> sizes,
                  optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits slice;
};
Arguments:

Returns: an MLOperand. The output tensor, of the same rank as the input tensor, containing for each dimension d the sizes[d] elements that begin at index starts[d]. The shape of the output tensor is sizes.

Constraints for slice()
operand | allowed data types | allowed ranks
input | any | N
output | same as input | same as input

MLOpSupportLimits has the following member for slice():

slice, of type MLSingleInputSupportLimits

Support limits for operator slice().

The slice(input, starts, sizes, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If any of sizes’s items are 0, then throw a TypeError.

  4. If starts’s size and sizes’s size are not both equal to input’s rank, then throw a TypeError.

  5. For each index in the range 0 to input’s rank, exclusive:

    1. If sizes[index] is 0, then throw a TypeError.

      If 0-size dimensions are allowed, revise these steps. [Issue #391]

    2. If starts[index] is greater than or equal to input’s shape[index], then throw a TypeError.

    3. If starts[index] + sizes[index] is greater than input’s shape[index], then throw a TypeError.

  6. Make graph connections:

    1. Let output be the result of copying an MLOperand given input, and set output’s shape to sizes.

    2. Let operator be an operator for the "slice" operation, given starts, sizes, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  7. Return output.
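A plain-JavaScript sketch of slice() on a flat row-major Array follows. It repeats the validation from the method steps and then copies the selected elements, producing an output whose shape is sizes. The function name sliceReference and the tensor representation are assumptions for illustration.

```javascript
// Hypothetical reference for slice() on a flat row-major Array: validates
// starts/sizes like the method steps, then copies the selected elements.
function sliceReference(data, shape, starts, sizes) {
  const rank = shape.length;
  if (starts.length !== rank || sizes.length !== rank) {
    throw new TypeError("starts and sizes must have one entry per dimension");
  }
  for (let d = 0; d < rank; d++) {
    if (sizes[d] === 0 || starts[d] >= shape[d] || starts[d] + sizes[d] > shape[d]) {
      throw new TypeError(`invalid slice along dimension ${d}`);
    }
  }
  const count = sizes.reduce((p, s) => p * s, 1);
  const out = new Array(count);
  const coords = new Array(rank);
  for (let flat = 0; flat < count; flat++) {
    // Compute output coordinates, then shift by starts to locate the input element.
    let rest = flat;
    for (let d = rank - 1; d >= 0; d--) {
      coords[d] = rest % sizes[d];
      rest = Math.floor(rest / sizes[d]);
    }
    let index = 0;
    for (let d = 0; d < rank; d++) index = index * shape[d] + starts[d] + coords[d];
    out[flat] = data[index];
  }
  return { shape: [...sizes], data: out };
}

// Rows 1..2 and columns 1..2 of a 3x4 matrix holding 0..11.
const matrix = Array.from({ length: 12 }, (_, i) => i);
console.log(sliceReference(matrix, [3, 4], [1, 1], [2, 2]).data); // [ 5, 6, 9, 10 ]
```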

7.9.40. softmax

Compute the softmax values of the N-D input tensor along the given axis.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand input,
                    [EnforceRange] unsigned long axis,
                    optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softmax;
};
Arguments:

Returns:

Constraints for softmax()
operand | allowed data types | allowed ranks
input | "float32", "float16" | N
output | same as input | same as input

MLOpSupportLimits has the following member for softmax():

softmax, of type MLSingleInputSupportLimits

Support limits for operator softmax().

The softmax(input, axis, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. If axis is greater than or equal to input’s rank, then throw a TypeError.

  5. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "softmax" operation, given axis and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  6. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function softmax(builder, input, axis) {
  // This sample deploys a well-known implementation trick [1] to compute the
  // exponentials of the distances to the max value, instead of the exponentials
  // of the input values themselves, in order to increase the numerical stability of
  // the result.
  // [1]: https://cs231n.github.io/linear-classify/#softmax
  const maxX = builder.reduceMax(input, {axes: [axis], keepDimensions: true});
  const expX = builder.exp(builder.sub(input, maxX));
  return builder.div(
    expX, builder.reduceSum(expX, {axes: [axis], keepDimensions: true}));
}
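The effect of subtracting the maximum can be seen on a 1-D list of plain JavaScript numbers (float64). The names naiveSoftmax and stableSoftmax are hypothetical; the stable variant applies the same trick as the emulation sample above, along a single axis.

```javascript
// Hypothetical 1-D references: naive softmax versus the max-subtracted form.
function naiveSoftmax(values) {
  const exps = values.map((x) => Math.exp(x));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function stableSoftmax(values) {
  // Shifting by the maximum keeps every exponent <= 0, so exp() cannot overflow.
  const max = Math.max(...values);
  const exps = values.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [1000, 1001];
console.log(naiveSoftmax(logits));  // [ NaN, NaN ] because exp(1000) overflows to Infinity
console.log(stableSoftmax(logits)); // approximately [ 0.2689, 0.7311 ]
```

The shift does not change the mathematical result because the common factor exp(-max) cancels between numerator and denominator.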

7.9.41. softplus

Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(x)).
partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softplus;
};
Arguments:

Returns:

Constraints for softplus()
operand | allowed data types | allowed ranks
input | "float32", "float16" | N
output | same as input | same as input

MLOpSupportLimits has the following member for softplus():

softplus, of type MLSingleInputSupportLimits

Support limits for operator softplus().

The softplus(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "softplus" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function softplus(builder, input) {
  return builder.log(
    builder.add(builder.exp(input), builder.constant(input.dataType, 1)));
}
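A direct transcription of ln(1 + exp(x)) overflows for large x even though the true value is close to x. The sketch below contrasts it with the algebraically equivalent form max(x, 0) + ln(1 + exp(-|x|)), a common reformulation that is not prescribed by this specification. Both function names are hypothetical and operate on plain JavaScript numbers.

```javascript
// Direct transcription of ln(1 + exp(x)), as in the emulation sample above.
function softplusNaive(x) {
  return Math.log(1 + Math.exp(x));
}

// Algebraically equivalent form: max(x, 0) + ln(1 + exp(-|x|)).
// exp(-|x|) is at most 1, so it never overflows; Math.log1p keeps precision
// when exp(-|x|) is tiny.
function softplusStable(x) {
  return Math.max(x, 0) + Math.log1p(Math.exp(-Math.abs(x)));
}

console.log(softplusNaive(1000));  // Infinity
console.log(softplusStable(1000)); // 1000
```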

7.9.42. softsign

Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).
partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softsign;
};
Arguments:

Returns:

Constraints for softsign()
operand | allowed data types | allowed ranks
input | "float32", "float16" | N
output | same as input | same as input

MLOpSupportLimits has the following member for softsign():

softsign, of type MLSingleInputSupportLimits

Support limits for operator softsign().

The softsign(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "softsign" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function softsign(builder, input) {
  return builder.div(
    input,
    builder.add(builder.constant(input.dataType, 1), builder.abs(input)));
}

7.9.43. split

Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions : MLOperatorOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(
      MLOperand input,
      ([EnforceRange] unsigned long or sequence<[EnforceRange] unsigned long>) splits,
      optional MLSplitOptions options = {});
};

dictionary MLSplitSupportLimits {
  MLSupportLimits input;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLSplitSupportLimits split;
};
Arguments:

Returns: sequence<MLOperand>. The split output tensors. If splits is an unsigned long, the number of output tensors equals splits, and the shape of each output tensor is the same as input’s shape except that its dimension size along axis equals the dimension size of input along axis divided by splits. If splits is a sequence<unsigned long>, the number of output tensors equals splits’s size, and the shape of the i-th output tensor is the same as input’s shape except that its dimension size along axis is splits[i].

MLSplitOptions has the following members:

axis, of type unsigned long, defaulting to 0

The dimension along which to split. Its value must be in the range [0, N-1] where N is the rank of the input tensor.

Constraints for split()
operand | allowed data types | allowed ranks
input | any | N
outputs | same as input | same as input

MLSplitSupportLimits has the following members:

input, of type MLSupportLimits

MLSupportLimits for input operand.

outputs, of type MLSupportLimits

MLSupportLimits for all the output operands.

MLOpSupportLimits has the following member for split():

split, of type MLSplitSupportLimits

Support limits for operator split().

The split(input, splits, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. Let axis be options.axis.

  4. If axis is greater than or equal to input’s rank, then throw a TypeError.

  5. If splits is an unsigned long:

    1. If input’s shape[axis] % splits is not 0, then throw a TypeError.

    2. Otherwise, let splitCount be splits.

  6. If splits is a sequence<unsigned long>:

    1. If any of its elements is equal to 0, then throw a TypeError.

      If 0-size dimensions are allowed, revise the above step. [Issue #391]

    2. If the sum of its elements is not equal to input’s shape[axis], then throw a TypeError.

    3. Otherwise, let splitCount be splits’s size.

  7. Make graph connections:

    1. Let operator be an operator for the "split" operation, given splits and options.

    2. Let outputs be a new list.

    3. For each index in the range 0 to splitCount, exclusive:

      1. Let operand be the result of copying an MLOperand given input.

      2. If splits is an unsigned long, then let newDimension be operand’s shape[axis] / splits.

      3. Otherwise, let newDimension be splits[index].

      4. Set operand’s shape[axis] to newDimension.

      5. Set operand.[[operator]] to operator.

      6. Append operand to outputs.

    4. Set operator’s input to input.

    5. Set operator’s outputs to outputs.

  8. Return outputs.
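The validation and output-shape computation in the steps above can be sketched on plain arrays. This is a minimal sketch, not part of the API: splitOutputShapes is a hypothetical helper that takes a plain shape array instead of an MLOperand, and it throws a TypeError under the same conditions as the validation steps.

```javascript
// Hypothetical helper: computes the output shapes of split() from plain arrays,
// mirroring the validation and shape steps above. Not part of the WebNN API.
function splitOutputShapes(inputShape, splits, axis = 0) {
  if (axis >= inputShape.length) {
    throw new TypeError('axis must be less than the input rank');
  }
  const dimension = inputShape[axis];
  let sizes;
  if (typeof splits === 'number') {
    // Even split: the dimension size must be divisible by splits.
    if (dimension % splits !== 0) {
      throw new TypeError('dimension size is not divisible by splits');
    }
    sizes = Array(splits).fill(dimension / splits);
  } else {
    // Explicit sizes: each must be non-zero and they must sum to the dimension size.
    if (splits.some((size) => size === 0)) {
      throw new TypeError('split sizes must be non-zero');
    }
    if (splits.reduce((a, b) => a + b, 0) !== dimension) {
      throw new TypeError('split sizes must sum to the dimension size');
    }
    sizes = Array.from(splits);
  }
  // Each output keeps the input shape except along axis.
  return sizes.map((size) => {
    const shape = Array.from(inputShape);
    shape[axis] = size;
    return shape;
  });
}
```

For example, splitting a [2, 6] input along axis 1 with splits = 3 yields three [2, 2] shapes, while splits = [1, 5] yields [2, 1] and [2, 5].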

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function split(builder, input, splits, options = {}) {
  // This sample shows the case that the splits parameter is an array.
  const axis = options.axis ?? 0;
  const outputs = [];
  const inputShape = input.shape;
  const inputRank = inputShape.length;
  const starts = Array(inputRank).fill(0);
  // Copy the shape so that updating the split axis does not modify input.shape.
  const sizes = Array.from(inputShape);
  let start = 0;
  for (const size of splits) {
    starts[axis] = start;
    sizes[axis] = size;
    outputs.push(builder.slice(input, starts, sizes));
    start += size;
  }
  return outputs;
}

7.9.44. tanh

Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).
partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits tanh;
};
Arguments:

Returns: an MLOperand. The output tensor of the same shape as input.

Constraints for tanh()
operand allowed data types allowed ranks
input "float32", "float16" N
output same as input same as input

MLOpSupportLimits has the following member for tanh():

tanh, of type MLSingleInputSupportLimits

Support limits for operator tanh().

The tanh(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s dataType is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "tanh" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function tanh(builder, input) {
  // tanh(x) = (exp(2 * x) - 1) / (exp(2 * x) + 1); exp(2 * x) is built once.
  const one = builder.constant(input.dataType, 1);
  const exp2x = builder.exp(
    builder.mul(builder.constant(input.dataType, 2), input));
  return builder.div(builder.sub(exp2x, one), builder.add(exp2x, one));
}
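As a numerical sanity check of the expression in the description, the following plain JavaScript sketch evaluates it on scalar Numbers rather than through MLGraphBuilder; tanhFromExp is a hypothetical name.

```javascript
// Hypothetical scalar reference for (exp(2 * x) - 1) / (exp(2 * x) + 1).
function tanhFromExp(x) {
  const e = Math.exp(2 * x);
  return (e - 1) / (e + 1);
}
```

For moderate inputs this agrees with Math.tanh to within rounding error. For large positive x, however, Math.exp(2 * x) overflows to Infinity and the quotient becomes NaN (for example tanhFromExp(400)), so an emulation that follows the expression literally may need to guard against overflow.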

7.9.45. transpose

Permute the dimensions of the input tensor according to the permutation argument.
dictionary MLTransposeOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits transpose;
};

MLTransposeOptions has the following members:

permutation, of type sequence<[EnforceRange] unsigned long>

The values used to permute the output shape. The default is [N-1, ..., 0], where N is the rank of the input tensor, e.g. [2,1,0] for a 3-D tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values must be the same as the rank of the input tensor, and the values must be within the range from 0 to N-1 with no duplicates.

Arguments:

Returns: an MLOperand. The permuted or transposed N-D tensor.

Constraints for transpose()
operand allowed data types allowed ranks
input any N
output same as input same as input

MLOpSupportLimits has the following member for transpose():

transpose, of type MLSingleInputSupportLimits

Support limits for operator transpose().

The transpose(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If options.permutation does not exist, set options.permutation to the reversed sequence of all indices of input’s shape, i.e. [N-1, ..., 0] where N is input’s rank.

  4. Otherwise:

    1. If its size is not equal to input’s rank, then throw a TypeError.

    2. If any of its values is not in the range 0 to input’s rank, exclusive, then throw a TypeError.

    3. If it contains duplicate values, then throw a TypeError.

  5. Make graph connections:

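    1. Let output be the result of copying an MLOperand given input, then for each index in the range 0 to input’s rank, exclusive, set output’s shape[index] to input’s shape[options.permutation[index]].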

    2. Let operator be an operator for the "transpose" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  6. Return output.
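The permutation validation and the resulting output shape can be sketched on plain arrays. transposeShape is a hypothetical helper, not part of the API, and it assumes the usual convention (as in NumPy and ONNX) that output dimension i takes the size of input dimension permutation[i].

```javascript
// Hypothetical helper: validates a permutation and computes the transposed shape
// on plain arrays. Not part of the WebNN API.
function transposeShape(inputShape, permutation) {
  const rank = inputShape.length;
  // Default: reverse all dimensions, e.g. [2, 1, 0] for a rank-3 input.
  const perm = permutation ?? [...Array(rank).keys()].reverse();
  if (perm.length !== rank) {
    throw new TypeError('permutation size must equal the input rank');
  }
  if (perm.some((p) => p < 0 || p >= rank)) {
    throw new TypeError('permutation values must be in [0, rank)');
  }
  if (new Set(perm).size !== rank) {
    throw new TypeError('permutation must not contain duplicates');
  }
  // Output dimension i takes the size of input dimension perm[i].
  return perm.map((p) => inputShape[p]);
}
```

For example, transposeShape([2, 3, 4]) returns [4, 3, 2], and transposeShape([2, 3, 4], [0, 2, 1]) returns [2, 4, 3].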

7.9.46. triangular

Given a 2-D tensor (matrix), return a 2-D tensor containing either the upper or lower triangular part of the input tensor. If the input tensor has more than 2 dimensions, it is treated as a batch of matrices and the result has the same shape.
dictionary MLTriangularOptions : MLOperatorOptions {
  boolean upper = true;
  [EnforceRange] long diagonal = 0;
};

partial interface MLGraphBuilder {
  MLOperand triangular(MLOperand input, optional MLTriangularOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits triangular;
};

MLTriangularOptions has the following members:

upper, of type boolean, defaulting to true

Indicates whether the upper or the lower part of the input matrix is retained. True indicates that the upper part is retained.

diagonal, of type long, defaulting to 0

Specifies how many diagonals above or below the main diagonal of the input matrix are retained or excluded. A value of 0 means that no diagonals other than the main diagonal are affected.

Arguments:

Returns: an MLOperand. The output tensor representing a triangular matrix, or a batch of matrices, with the same shape as the input.

Constraints for triangular()
operand allowed data types allowed ranks
input any 2 or greater
output same as input same as input

MLOpSupportLimits has the following member for triangular():

triangular, of type MLSingleInputSupportLimits

Support limits for operator triangular().

The triangular(input, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and input returns false, then throw a TypeError.

  3. If input’s rank is not one of its allowed ranks (according to this table), then throw a TypeError.

  4. Make graph connections:

    1. Let output be the result of copying an MLOperand given input.

    2. Let operator be an operator for the "triangular" operation, given options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s input to input.

    5. Set operator’s output to output.

  5. Return output.
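The masking rule implied by upper and diagonal can be sketched on a single matrix held in nested JavaScript arrays. This is a hypothetical reference (triangularRef is not part of the API); the rule that an element at row i, column j is retained when j - i >= diagonal (upper) or j - i <= diagonal (lower) is inferred from the examples in this section.

```javascript
// Hypothetical reference for one 2-D matrix given as nested arrays.
// Element (i, j) is kept when j - i >= diagonal (upper) or j - i <= diagonal
// (lower); every other element is replaced by 0.
function triangularRef(matrix, {upper = true, diagonal = 0} = {}) {
  return matrix.map((row, i) =>
    row.map((value, j) => {
      const keep = upper ? j - i >= diagonal : j - i <= diagonal;
      return keep ? value : 0;
    }));
}
```

A batched input can be handled by applying the same function to each matrix in the batch.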

Examples of how triangular works in different diagonal settings.
// input:
//   [[7, 1, 2],
//    [9, 4, 8],
//    [2, 6, 3]]
const input = builder.constant(
  {dataType: 'float32', shape: [3, 3]},
  new Float32Array([7, 1, 2, 9, 4, 8, 2, 6, 3]));

// upper triangular matrix:
//   [[7, 1, 2],
//    [0, 4, 8],
//    [0, 0, 3]]
const upper = builder.triangular(input);

// upper triangular matrix with one additional set of diagonals excluded:
//   [[0, 1, 2],
//    [0, 0, 8],
//    [0, 0, 0]]
const upperPositive = builder.triangular(input, {diagonal: 1});

// upper triangular matrix with one additional set of diagonals retained:
//   [[7, 1, 2],
//    [9, 4, 8],
//    [0, 6, 3]]
const upperNegative = builder.triangular(input, {diagonal: -1});

// lower triangular matrix:
//   [[7, 0, 0],
//    [9, 4, 0],
//    [2, 6, 3]]
const lower = builder.triangular(input, {upper: false});

// lower triangular matrix with one additional set of diagonals retained:
//   [[7, 1, 0],
//    [9, 4, 8],
//    [2, 6, 3]]
const lowerPositive = builder.triangular(input, {upper: false, diagonal: 1});

// lower triangular matrix with one additional set of diagonals excluded:
//   [[0, 0, 0],
//    [9, 0, 0],
//    [2, 6, 0]]
const lowerNegative = builder.triangular(input, {upper: false, diagonal: -1});

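// lower triangular matrix with two batches:
// input:
//   [[[7, 1, 2],
//     [9, 4, 8],
//     [2, 6, 3]],
//    [[1, 2, 3],
//     [4, 5, 6],
//     [7, 8, 9]]]
const inputWithBatches = builder.constant(
  {dataType: 'float32', shape: [2, 3, 3]},
  new Float32Array([7, 1, 2, 9, 4, 8, 2, 6, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9]));
// output:
//   [[[7, 0, 0],
//     [9, 4, 0],
//     [2, 6, 3]],
//    [[1, 0, 0],
//     [4, 5, 0],
//     [7, 8, 9]]]
const lowerWithBatches = builder.triangular(inputWithBatches, {upper: false});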

7.9.47. where

Select the values from the trueValue or the falseValue tensor depending on the corresponding values of the condition tensor, where non-zero is true and zero is false. The condition tensor is often the output of one of the element-wise logical operations.

The operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.

partial interface MLGraphBuilder {
  MLOperand where(MLOperand condition,
                  MLOperand trueValue,
                  MLOperand falseValue,
                  optional MLOperatorOptions options = {});
};

dictionary MLWhereSupportLimits {
  MLSupportLimits condition;
  MLSupportLimits trueValue;
  MLSupportLimits falseValue;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLWhereSupportLimits where;
};
Arguments:

Returns: an MLOperand. The output tensor that contains the values selected element-wise from either the trueValue or the falseValue tensor.

Constraints for where()
operand allowed data types allowed ranks
condition "uint8" N
trueValue any N
falseValue same as trueValue N
output same as trueValue maximum of condition's rank, trueValue's rank and falseValue's rank

MLWhereSupportLimits has the following members:

condition, of type MLSupportLimits

MLSupportLimits for condition operand.

trueValue, of type MLSupportLimits

MLSupportLimits for trueValue operand.

falseValue, of type MLSupportLimits

MLSupportLimits for falseValue operand.

output, of type MLSupportLimits

MLSupportLimits for output operand.

MLOpSupportLimits has the following member for where():

where, of type MLWhereSupportLimits

Support limits for operator where().

The where(condition, trueValue, falseValue, options) method steps are:
  1. If this can not build, then throw an "InvalidStateError" DOMException.

  2. If validating operand with this and any of condition, trueValue, and falseValue returns false, then throw a TypeError.

  3. If the dataType of any of condition, trueValue, or falseValue is not one of its allowed data types (according to this table), then throw a TypeError.

  4. Let outputShape be the result of bidirectionally broadcasting trueValue’s shape and falseValue’s shape.

    1. If that returns failure, then throw a TypeError.

  5. Set outputShape to the result of bidirectionally broadcasting condition’s shape and outputShape.

    1. If that returns failure, then throw a TypeError.

  6. Let descriptor be the result of creating an MLOperandDescriptor given trueValue’s dataType and outputShape.

  7. Make graph connections:

    1. Let output be the result of creating an MLOperand given this and descriptor.

    2. Let operator be an operator for the "where" operation, given condition, trueValue, falseValue, and options.

    3. Set output.[[operator]] to operator.

    4. Set operator’s inputs to condition, trueValue and falseValue.

    5. Set operator’s output to output.

  8. Return output.
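The selection semantics (non-zero is true, zero is false) can be sketched element-wise on flat arrays. whereRef is a hypothetical helper that assumes all three inputs already share one shape; broadcasting is omitted for brevity.

```javascript
// Hypothetical element-wise reference on flat arrays of equal length.
// A non-zero condition value selects from trueValues, zero selects from falseValues.
function whereRef(condition, trueValues, falseValues) {
  return Array.from(condition, (c, i) => (c !== 0 ? trueValues[i] : falseValues[i]));
}
```

For example, whereRef(new Uint8Array([1, 0, 2, 0]), [1, 2, 3, 4], [10, 20, 30, 40]) selects [1, 20, 3, 40]. Unlike a decomposition that multiplies by 0/1 masks, direct selection does not let a non-finite value in the unselected tensor leak into the result (Infinity * 0 is NaN).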

The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
function where(builder, condition, trueValue, falseValue) {
  // Normalize every non-zero condition value to 1.
  const c = builder.clamp(condition, {'minValue': 0, 'maxValue': 1});
  return builder.add(
    builder.mul(trueValue, builder.cast(c, trueValue.dataType)),
    builder.mul(
      falseValue, builder.cast(builder.logicalNot(c), falseValue.dataType)));
}

8. Algorithms

8.1. Broadcasting

Broadcasting describes how WebNN treats tensors with different shapes during graph construction and computation. It is heavily influenced by [NumPy] and follows the [numpy-broadcasting-rule]. Loosely speaking, it allows an operation on a smaller tensor to be "broadcast" across the shape of a larger tensor, so that the same data can be applied repeatedly without making copies.

The simplest example is the application of a scalar constant to an N-dimensional tensor with element-wise binary operations such as add() or mul(). Rather than needing to allocate and populate a matching N-dimensional tensor containing multiple copies of the scalar constant, these element-wise operations allow the scalar constant to be used directly, and broadcast the scalar value across the N-dimensional tensor. With the following considerations, the same logic applies to tensors of other dimensions.

The shapes of the input tensors must be compatible. A tensor is unidirectionally broadcastable to another tensor if the first tensor can be "stretched" by repeating the first tensor along an axis with size 1 or repeating across new dimensions, starting from the last (rightmost) dimension. For example, a [4] tensor can be broadcast to a [5, 4] tensor by repeating it 5 times. A [1] tensor can be broadcast to a [5,4] tensor by repeating it 4 times on the last dimension and 5 times on the preceding dimension. Unidirectional broadcasting is important for operations such as expand() where the target tensor shape is explicitly given.

Two tensors are bidirectionally broadcastable if they can be mutually "stretched" (repeated) across their various dimensions, starting from the last dimension. For example, a [5,1] tensor can be bidirectionally broadcast with a [1,6] tensor by repeating the first tensor 6 times in the last dimension and the second tensor 5 times in the preceding dimension. The result of the operation will be a [5,6] tensor. Bidirectional broadcasting is convenient for element-wise operations.

Some operations allow broadcasting with special semantics. For example, matmul() treats the last two dimensions of the input tensors as the rows and columns of the matrices, and the number of columns in the first matrix must be equal to the number of rows in the second matrix. The matrix multiplication is bidirectionally broadcast across any additional dimensions, treating the input tensors as stacks of matrices to multiply.

To unidirectionally broadcast the shapes shapeFrom and shapeTo, perform the following steps. shapeFrom and shapeTo are lists of positive integers, representing the dimensions of tensors, and the steps return a new list of positive integers, or failure.
  1. Let sizeFrom be shapeFrom’s size.

  2. Let sizeTo be shapeTo’s size.

  3. If sizeFrom > sizeTo, then return failure.

  4. Let paddedShapeFrom be a clone of shapeFrom.

  5. While paddedShapeFrom’s size is less than sizeTo, prepend 1 to paddedShapeFrom.

  6. Let outputShape be a new list.

  7. For each index in the range 0 to sizeTo, exclusive:

    1. Let dimFrom be paddedShapeFrom[index].

    2. Let dimTo be shapeTo[index].

    3. If dimTo is not equal to dimFrom and dimFrom is not equal to 1, then return failure.

    4. Append dimTo to outputShape.

  8. Return outputShape.
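The unidirectional broadcasting steps above translate almost line for line into JavaScript. This is a sketch on plain arrays; the function name is hypothetical, and it returns null to signal failure.

```javascript
// Sketch of the unidirectional broadcasting steps on plain arrays.
// Returns the output shape, or null to signal failure.
function unidirectionallyBroadcastShapes(shapeFrom, shapeTo) {
  const sizeFrom = shapeFrom.length;
  const sizeTo = shapeTo.length;
  if (sizeFrom > sizeTo) return null;
  // Prepend 1s so that shapeFrom has the same rank as shapeTo.
  const paddedShapeFrom = Array(sizeTo - sizeFrom).fill(1).concat(shapeFrom);
  const outputShape = [];
  for (let index = 0; index < sizeTo; index++) {
    const dimFrom = paddedShapeFrom[index];
    const dimTo = shapeTo[index];
    if (dimTo !== dimFrom && dimFrom !== 1) return null;
    outputShape.push(dimTo);
  }
  return outputShape;
}
```

Matching the examples in the prose, both [4] and [1] broadcast to [5, 4], while [3] does not.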

shapeFrom is unidirectionally broadcastable to shapeTo if unidirectionally broadcasting shapeFrom and shapeTo does not result in failure.

To bidirectionally broadcast the shapes shapeA and shapeB, perform the following steps. shapeA and shapeB are lists of positive integers, representing the dimensions of tensors, and the steps return a new list of positive integers, or failure.
  1. Let sizeA be shapeA’s size.

  2. Let sizeB be shapeB’s size.

  3. Let outputSize be the maximum of sizeA and sizeB.

  4. Let paddedA be a clone of shapeA.

  5. While paddedA’s size is less than outputSize, prepend 1 to paddedA.

  6. Let paddedB be a clone of shapeB.

  7. While paddedB’s size is less than outputSize, prepend 1 to paddedB.

  8. Let outputShape be a new list.

  9. For each index in the range 0 to outputSize, exclusive:

    1. Let dimA be paddedA[index].

    2. Let dimB be paddedB[index].

    3. If dimA is not equal to dimB, and dimA is not equal to 1, and dimB is not equal to 1, then return failure.

    4. Append the maximum of dimA and dimB to outputShape.

  10. Return outputShape.
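Similarly, the bidirectional broadcasting steps can be sketched on plain arrays; the function name is hypothetical, and null signals failure.

```javascript
// Sketch of the bidirectional broadcasting steps on plain arrays.
// Returns the output shape, or null to signal failure.
function bidirectionallyBroadcastShapes(shapeA, shapeB) {
  const outputSize = Math.max(shapeA.length, shapeB.length);
  // Prepend 1s so that both shapes have rank outputSize.
  const paddedA = Array(outputSize - shapeA.length).fill(1).concat(shapeA);
  const paddedB = Array(outputSize - shapeB.length).fill(1).concat(shapeB);
  const outputShape = [];
  for (let index = 0; index < outputSize; index++) {
    const dimA = paddedA[index];
    const dimB = paddedB[index];
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) return null;
    outputShape.push(Math.max(dimA, dimB));
  }
  return outputShape;
}
```

For where(), the output shape is obtained by applying this twice, first to trueValue’s and falseValue’s shapes and then to condition’s shape and that result; for example, shapes [3, 1] and [1, 4] combine to [3, 4], which then combines with a [2, 1, 1] condition to give [2, 3, 4].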

shapeA is bidirectionally broadcastable to shapeB if bidirectionally broadcasting shapeA and shapeB does not result in failure.

8.2. Casting

Explicit numeric casting is used in algorithms where parameters passed as MLNumber or double need to be converted to match the MLOperandDataType of input or output MLOperands.

To cast a number x to a given MLOperandDataType dataType, perform the following steps. They return a number.
  1. Switch on dataType:

    "float32"

    Return ConvertToFloat(x, 32).

    "float16"

    Return ConvertToFloat(x, 16).

    "int64"

    Return ConvertToInt(x, 64, "signed").

    "uint64"

    Return ConvertToInt(x, 64, "unsigned").

    "int32"

    Return ConvertToInt(x, 32, "signed").

    "uint32"

    Return ConvertToInt(x, 32, "unsigned").

    "int8"

    Return ConvertToInt(x, 8, "signed").

    "uint8"

    Return ConvertToInt(x, 8, "unsigned").

NOTE: The input to cast is an abstract number with unlimited range and precision, including the special values Infinity, -Infinity and NaN. The output is also an abstract number, but exactly representable as the specified type.

The ConvertToFloat(x, bitLength) steps are:
  1. If x is NaN, then return NaN.

  2. Switch on bitLength:

    32
    1. Let upperBound be 2^128.

    2. Let lowerBound be -2^128.

    3. Let S be the set of [IEEE-754-2019] binary32 floating point values except -0, but with the special values upperBound and lowerBound added.

    16
    1. Let upperBound be 2^16.

    2. Let lowerBound be -2^16.

    3. Let S be the set of [IEEE-754-2019] binary16 floating point values except -0, but with the special values upperBound and lowerBound added.

  3. Let y be the number in S that is closest to x, selecting the number with an even significand if there are two equally close values. The two special values lowerBound and upperBound are considered to have even significands for this purpose.

  4. If y is upperBound, then return +Infinity.

  5. If y is lowerBound, then return -Infinity.

  6. If y is +0 and x is negative, return -0.

  7. Return y.

NOTE: This is based on a definition in [WEBIDL], but extended to cover 16-bit floating point values.
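For bitLength 32, the steps above (round to nearest with ties to even, and values at or beyond the midpoint between the largest finite binary32 value and 2^128 becoming infinities) coincide with IEEE 754 round-to-nearest-even conversion to binary32, which JavaScript exposes as Math.fround. A minimal sketch assuming that equivalence; convertToFloat32 is a hypothetical name, and the 16-bit case is not covered.

```javascript
// Hypothetical helper: ConvertToFloat(x, 32) expressed with Math.fround, which
// rounds a Number to the nearest binary32 value (ties to even) and overflows to
// +/-Infinity once the magnitude reaches 2^128 - 2^103.
function convertToFloat32(x) {
  return Math.fround(x);
}
```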

The ConvertToInt(x, bitLength, signedness) steps are:
  1. If signedness is "unsigned", then:

    1. Let lowerBound be 0.

    2. Let upperBound be 2^bitLength - 1.

  2. Otherwise:

    1. Let lowerBound be -(2^(bitLength - 1)).

    2. Let upperBound be 2^(bitLength - 1) - 1.

  3. If x is -0, then set x to +0.

  4. If x is NaN, then return +0.

  5. Set x to min(max(x, lowerBound), upperBound).

  6. Round x to the nearest integer, choosing the even integer if it lies halfway between two, and choosing +0 rather than -0.

  7. Return x.

NOTE: This is based on a definition in [WEBIDL] with these differences: 64-bit integers are not treated specially, the input x is an abstract number, and clamping is always performed.
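The ConvertToInt steps can be sketched in JavaScript for bitLength up to 32, where every bound and result is exactly representable as a Number; 64-bit types would need BigInt arithmetic instead. convertToInt is a hypothetical name, and the tie-breaking is written out explicitly because Math.round rounds halves toward +Infinity rather than to even.

```javascript
// Hypothetical helper mirroring ConvertToInt(x, bitLength, signedness)
// for bitLength <= 32.
function convertToInt(x, bitLength, signedness) {
  let lowerBound;
  let upperBound;
  if (signedness === 'unsigned') {
    lowerBound = 0;
    upperBound = 2 ** bitLength - 1;
  } else {
    lowerBound = -(2 ** (bitLength - 1));
    upperBound = 2 ** (bitLength - 1) - 1;
  }
  if (Number.isNaN(x)) return 0;
  // Clamp first; this also maps +/-Infinity to the bounds.
  x = Math.min(Math.max(x, lowerBound), upperBound);
  // Round to the nearest integer, choosing the even one on a tie.
  const floor = Math.floor(x);
  const fraction = x - floor;
  let result;
  if (fraction < 0.5) result = floor;
  else if (fraction > 0.5) result = floor + 1;
  else result = floor % 2 === 0 ? floor : floor + 1;
  // Normalize -0 to +0.
  return result === 0 ? 0 : result;
}
```

For example, convertToInt(2.5, 8, "signed") gives 2, convertToInt(-2.5, 8, "signed") gives -2, and convertToInt(300, 8, "unsigned") clamps to 255.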

9. Examples

Given the following build graph:
constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+
The following code implements the graph:
// Use tensors in 4 dimensions.
const TENSOR_SHAPE = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = {
  dataType: 'float32',
  shape: TENSOR_SHAPE
};

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before
// execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);
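Once the graph is built and executed (not shown here), its results can be checked against a plain JavaScript reference of the same computation, output = (constant1 + input1) * (constant2 + input2) element-wise. referenceOutput is a hypothetical helper that does not use the WebNN API; it hard-codes the 0.5 constants from the example.

```javascript
// Hypothetical plain-JavaScript reference for the graph above:
// output = (constant1 + input1) * (constant2 + input2), element-wise.
function referenceOutput(input1Data, input2Data) {
  const constant1 = 0.5;
  const constant2 = 0.5;
  return Float32Array.from(
      input1Data, (x, i) => (constant1 + x) * (constant2 + input2Data[i]));
}
```

For example, with input1 filled with 1 and input2 filled with 2, every element of the output is (0.5 + 1) * (0.5 + 2) = 3.75.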

10. Operator Emulation

This section is non-normative.

Operations present in other neural network inference APIs can often be emulated using operations present in WebNN.

10.1. squeeze

The squeeze operation returns a tensor with all specified dimensions of input of size 1 removed. It can be generically implemented using the reshape() operation as follows:
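function squeeze(builder, input, axes) {
  const shape = Array.from(input.shape);
  // When no axes are given, consider every dimension.
  if (!axes || !axes.length)
    axes = shape.map((_, i) => i);
  // Remove dimensions from the highest axis down, so that earlier removals do
  // not shift the indices of axes that are still to be visited.
  const sortedAxes = Array.from(axes).sort((a, b) => b - a);
  for (const axis of sortedAxes)
    if (axis < shape.length && shape[axis] == 1)
      shape.splice(axis, 1);
  return builder.reshape(input, shape);
}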

10.2. unsqueeze

The unsqueeze operation returns a new tensor with dimensions of size one inserted at the positions specified by axes. It can be generically implemented using the reshape() operation as follows:
function unsqueeze(builder, input, axes) {
  const shape = Array.from(input.shape);
  // Insert in ascending order; each axis refers to a position in the output shape.
  const sortedAxes = Array.from(axes).sort((a, b) => a - b);
  for (const axis of sortedAxes)
    shape.splice(axis, 0, 1);
  return builder.reshape(input, shape);
}

10.3. flatten

The flatten operation reshapes the input into a two-dimensional tensor: the dimensions before axis are collapsed into the first output dimension and the remaining dimensions into the second. It can be generically implemented using the reshape() operation as follows:
function flatten(builder, input, axis) {
  if (axis > input.shape.length)
    return input;
  // An initial value of 1 handles an empty range (e.g. axis == 0).
  const before = input.shape.slice(0, axis).reduce((a, b) => a * b, 1);
  const after = input.shape.slice(axis).reduce((a, b) => a * b, 1);
  return builder.reshape(input, [before, after]);
}

11. Appendices

11.1. MLOperandDataType and ArrayBufferView compatibility

MLOperandDataType ArrayBufferView
float32 Float32Array
float16 Float16Array
int64 BigInt64Array
uint64 BigUint64Array
int32 Int32Array
uint32 Uint32Array
int8 Int8Array
uint8 Uint8Array

Float16Array is at ECMA Stage 3, signaling that its design is finished. Implementers wanting to enable this type ahead of native implementations can emulate the type by passing raw bits via Uint16Array. [Issue webnn#373]
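Passing float16 data as raw bits via Uint16Array requires encoding each value as IEEE 754 binary16. The following is a minimal sketch under stated assumptions: toFloat16Bits is a hypothetical helper, it rounds to nearest with ties to even, and because it rounds each value to binary32 first, a few double inputs may round differently than a direct double-to-binary16 conversion would.

```javascript
// Hypothetical helper: encodes a Number as binary16 bits for a Uint16Array.
function toFloat16Bits(value) {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  const exponent = (bits >>> 23) & 0xff;
  let mantissa = bits & 0x7fffff;
  if (exponent === 0xff) {
    // Infinity keeps a zero mantissa; NaN becomes a quiet NaN.
    return sign | 0x7c00 | (mantissa ? 0x200 : 0);
  }
  const e = exponent - 127 + 15; // exponent re-biased for binary16
  if (e >= 0x1f) return sign | 0x7c00; // too large: overflow to Infinity
  if (e <= 0) {
    // The result is a binary16 subnormal, or rounds to a signed zero.
    if (e < -10) return sign;
    mantissa |= 0x800000; // restore the implicit leading bit
    const shift = 14 - e;
    let half = mantissa >>> shift;
    const remainder = mantissa & ((1 << shift) - 1);
    const halfway = 1 << (shift - 1);
    if (remainder > halfway || (remainder === halfway && (half & 1))) half++;
    return sign | half;
  }
  let half = (e << 10) | (mantissa >>> 13);
  const remainder = mantissa & 0x1fff;
  if (remainder > 0x1000 || (remainder === 0x1000 && (half & 1))) half++;
  return sign | half; // a carry out of the mantissa correctly rounds up to Infinity
}
```

For example, Uint16Array.from([1, 0.5], toFloat16Bits) holds the bit patterns 0x3c00 and 0x3800.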

12. Acknowledgements

This specification follows the concepts of the Android Neural Networks API C API.

Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.

Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.

Thanks to Sangwhan Moon and the W3C Technical Architecture Group for review of this specification for web architecture fit, design consistency and developer ergonomics.

Thanks to Zoltan Kis for adding algorithms and making navigating this specification a delightful experience. Thanks to Joshua Bell for aligning the specification with modern editorial conventions. Thanks to Ningxin Hu, Lisha Guo, Shiyi Zou, Mingming Xu, Junwei Fu, Bruce Dai and Bin Miao for careful review and comments.

Thanks to W3C Privacy Interest Group for privacy and security review and feedback.

Thanks to Alex Gough and the Chrome Security team for security review and questions.

Thanks to Michal Karzynski for sharing practical guidelines and learnings from ONNX.

Thanks to Kaustubha Govind and Chrome privacy reviewers for feedback and privacy considerations.

Thanks to Jiewei Qian for Chromium implementation review and feedback.

Thanks to Dwayne Robinson, Joshua Lochner and Wanming Lin for their work investigating and providing recommendations for transformer support. Additional thanks to Dwayne and Wanming for providing reviews of operator conformance and web-platform-tests implementation.

Thanks to Feng Dai for his continuous contributions that keep web-platform-tests evolving alongside the specification.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[NUMPY-BROADCASTING-RULE]
The SciPy community. General Broadcasting Rules of NumPy. July 2019. URL: https://numpy.org/doc/stable/user/basics.broadcasting.html#general-broadcasting-rules
[PERMISSIONS-POLICY-1]
Ian Clelland. Permissions Policy. URL: https://w3c.github.io/webappsec-permissions-policy/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBGPU]
Brandon Jones; Kai Ninomiya; Jim Blandy. WebGPU. URL: https://gpuweb.github.io/gpuweb/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[Batch-Normalization]
Sergey Ioffe; Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. March 2015. URL: https://arxiv.org/abs/1502.03167
[ContextualLoss]
Roey Mechrez; Itamar Talmi; Lihi Zelnik-Manor. The Contextual Loss for Image Transformation with Non-Aligned Data. July 2018. URL: https://arxiv.org/abs/1803.02077
[DeepLabv3+]
Liang-Chieh Chen; et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. August 2018. URL: https://arxiv.org/abs/1802.02611
[DeepMoji]
Bjarke Felbo; et al. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. October 2017. URL: https://arxiv.org/abs/1708.00524
[ELU]
Djork-Arné Clevert; Thomas Unterthiner; Sepp Hochreiter. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). February 2016. URL: https://arxiv.org/abs/1511.07289
[Error-Function]
Larry C. Andrews. Special functions of mathematics for engineers. 1998. URL: https://books.google.com/books?id=2CAqsF-RebgC&pg=PA110
[FaceForensics++]
Andreas Rössler; et al. FaceForensics++. January 2019. URL: https://github.com/ondyari/FaceForensics
[FaceNet]
Florian Schroff; Dmitry Kalenichenko; James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. June 2015. URL: https://arxiv.org/abs/1503.03832
[FAN]
Adrian Bulat; Georgios Tzimiropoulos. How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks). September 2017. URL: https://arxiv.org/abs/1703.07332
[GNMT]
Minh-Thang Luong; Eugene Brevdo; Rui Zhao. Neural Machine Translation (seq2seq) Tutorial. May 2017. URL: https://github.com/tensorflow/nmt
[GPT2]
Alec Radford; et al. Language Models are Unsupervised Multitask Learners. February 2019. URL: https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
[GRU]
Kyunghyun Cho; et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. September 2014. URL: https://arxiv.org/pdf/1406.1078.pdf
[HR-TIME-3]
Yoav Weiss. High Resolution Time. URL: https://w3c.github.io/hr-time/
[IEEE-754-2019]
IEEE Standard for Floating-Point Arithmetic. 22 July 2019. URL: https://ieeexplore.ieee.org/document/8766229
[IM2TXT]
Oriol Vinyals; et al. Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. September 2016. URL: https://arxiv.org/abs/1609.06647
[Instance-Normalization]
Dmitry Ulyanov; Andrea Vedaldi; Victor Lempitsky. Instance Normalization: The Missing Ingredient for Fast Stylization. July 2016. URL: https://arxiv.org/abs/1607.08022
[Layer-Normalization]
Jimmy Lei Ba; Jamie Ryan Kiros; Geoffrey E. Hinton. Layer Normalization. July 2016. URL: https://arxiv.org/abs/1607.06450
[LDM]
Robin Rombach; et al. High-Resolution Image Synthesis with Latent Diffusion Models. April 2022. URL: https://arxiv.org/abs/2112.10752
[LeakyReLU]
Andrew L. Maas; Awni Y. Hannun; Andrew Y. Ng. Rectifier Nonlinearities Improve Neural Network Acoustic Models. June 2013. URL: https://pdfs.semanticscholar.org/367f/2c63a6f6a10b3b64b8729d601e69337ee3cc.pdf
[LLAMA-2-7B]
Hugo Touvron; et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. July 2023. URL: https://arxiv.org/abs/2307.09288
[LSTM]
Sepp Hochreiter; Jürgen Schmidhuber. Long Short-Term Memory. November 1997. URL: https://doi.org/10.1162/neco.1997.9.8.1735
[m2m100_418M]
Angela Fan; et al. Beyond English-Centric Multilingual Machine Translation. October 2020. URL: https://arxiv.org/abs/2010.11125
[MaskR-CNN]
Kaiming He; et al. Mask R-CNN. January 2018. URL: https://arxiv.org/abs/1703.06870
[MobileNetV3]
Andrew Howard; et al. Searching for MobileNetV3. November 2019. URL: https://arxiv.org/pdf/1905.02244
[MODELS]
Machine Learning for the Web Community Group. The first-wave models. 2020. URL: https://github.com/webmachinelearning/webnn/blob/master/op_compatibility/first_wave_models.md
[NumPy]
The SciPy community. NumPy. July 2019. URL: https://numpy.org/doc/stable/
[OpenNMT]
Guillaume Klein; et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation. March 2017. URL: https://arxiv.org/abs/1701.02810
[PairedCycleGAN]
Huiwen Chang; et al. PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup. June 2018. URL: http://openaccess.thecvf.com/content_cvpr_2018/html/Chang_PairedCycleGAN_Asymmetric_Style_CVPR_2018_paper.html
[PoseNet]
Dan Oved. Real-time Human Pose Estimation in the Browser with TensorFlow.js. May 2018. URL: https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5
[POWERFUL-FEATURES]
Mike West. Secure Contexts. URL: https://w3c.github.io/webappsec-secure-contexts/
[RNNoise]
Jean-Marc Valin. Recurrent neural network for audio noise reduction. September 2017. URL: https://github.com/xiph/rnnoise
[SECURITY-PRIVACY-QUESTIONNAIRE]
Theresa O'Connor; Peter Snyder. Self-Review Questionnaire: Security and Privacy. URL: https://w3ctag.github.io/security-questionnaire/
[SegAny]
Alexander Kirillov; et al. Segment Anything. April 2023. URL: https://arxiv.org/abs/2304.02643
[SRGAN]
Christian Ledig; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. May 2017. URL: https://arxiv.org/abs/1609.04802
[SSD]
Wei Liu; et al. SSD: Single Shot MultiBox Detector. December 2016. URL: https://arxiv.org/abs/1512.02325
[T5-SMALL]
Colin Raffel; et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. June 2020. URL: https://jmlr.org/papers/volume21/20-074/20-074.pdf
[Video-Summarization-with-LSTM]
Ke Zhang; et al. Video summarization with long short-term memory. October 2016. URL: http://www-scf.usc.edu/~zhan355/ke_eccv2016.pdf
[WEBMACHINELEARNING-ETHICS]
Anssi Kostiainen. Ethical Principles for Web Machine Learning. URL: https://webmachinelearning.github.io/webmachinelearning-ethics/
[Whisper]
Alec Radford; et al. Robust Speech Recognition via Large-Scale Weak Supervision. December 2022. URL: https://arxiv.org/abs/2212.04356
[YOLO]
Joseph Redmon; et al. You Only Look Once: Unified, Real-Time Object Detection. May 2016. URL: https://arxiv.org/abs/1506.02640

IDL Index

interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;

enum MLDeviceType {
  "cpu",
  "gpu",
  "npu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {
  MLDeviceType deviceType = "cpu";
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  Promise<MLContext> createContext(GPUDevice gpuDevice);
};

typedef record<USVString, MLTensor> MLNamedTensors;

dictionary MLContextLostInfo {
  DOMString message;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {
  undefined dispatch(MLGraph graph, MLNamedTensors inputs, MLNamedTensors outputs);

  Promise<MLTensor> createTensor(MLTensorDescriptor descriptor);

  Promise<ArrayBuffer> readTensor(MLTensor tensor);
  Promise<undefined> readTensor(MLTensor tensor, AllowSharedBufferSource outputData);

  undefined writeTensor(MLTensor tensor, AllowSharedBufferSource inputData);

  MLOpSupportLimits opSupportLimits();

  undefined destroy();

  readonly attribute Promise<MLContextLostInfo> lost;
};

dictionary MLOpSupportLimits {
  MLInputOperandLayout preferredInputLayout;
  MLSupportLimits input;
  MLSupportLimits constant;
  MLSupportLimits output;
};

dictionary MLSupportLimits {
  sequence<MLOperandDataType> dataTypes;
};

dictionary MLBinarySupportLimits {
  MLSupportLimits a;
  MLSupportLimits b;
  MLSupportLimits output;
};

dictionary MLSingleInputSupportLimits {
  MLSupportLimits input;
  MLSupportLimits output;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {
  undefined destroy();
};

enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandDataType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int64",
  "uint64",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  required MLOperandDataType dataType;
  required sequence<[EnforceRange] unsigned long> shape;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {
  readonly attribute MLOperandDataType dataType;
  readonly attribute FrozenArray<unsigned long> shape;
};

dictionary MLOperatorOptions {
  USVString label = "";
};

typedef (bigint or unrestricted double) MLNumber;

dictionary MLTensorDescriptor : MLOperandDescriptor {
  boolean readable = false;
  boolean writable = false;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLTensor {
  readonly attribute MLOperandDataType dataType;
  readonly attribute FrozenArray<unsigned long> shape;
  readonly attribute boolean readable;
  readonly attribute boolean writable;

  undefined destroy();
};

typedef record<USVString, MLOperand> MLNamedOperands;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(USVString name, MLOperandDescriptor descriptor);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor descriptor,
                     AllowSharedBufferSource buffer);

  // Create a scalar operand from the specified number of the specified type.
  MLOperand constant(MLOperandDataType type, MLNumber value);

  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> build(MLNamedOperands outputs);
};

dictionary MLArgMinMaxOptions : MLOperatorOptions {
  boolean keepDimensions = false;
  MLOperandDataType outputDataType = "int32";
};

partial interface MLGraphBuilder {
  MLOperand argMin(MLOperand input, [EnforceRange] unsigned long axis,
                   optional MLArgMinMaxOptions options = {});
  MLOperand argMax(MLOperand input, [EnforceRange] unsigned long axis,
                   optional MLArgMinMaxOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits argMin;
  MLSingleInputSupportLimits argMax;
};

dictionary MLBatchNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  [EnforceRange] unsigned long axis = 1;
  double epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};

dictionary MLBatchNormalizationSupportLimits {
  MLSupportLimits input;
  MLSupportLimits mean;
  MLSupportLimits variance;
  MLSupportLimits scale;
  MLSupportLimits bias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLBatchNormalizationSupportLimits batchNormalization;
};

partial interface MLGraphBuilder {
  MLOperand cast(MLOperand input,
                 MLOperandDataType type,
                 optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits cast;
};

dictionary MLClampOptions : MLOperatorOptions {
  MLNumber minValue;
  MLNumber maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand input, optional MLClampOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits clamp;
};

partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs,
                   [EnforceRange] unsigned long axis,
                   optional MLOperatorOptions options = {});
};

dictionary MLConcatSupportLimits {
  MLSupportLimits inputs;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLConcatSupportLimits concat;
};

enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

dictionary MLConv2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input,
                   MLOperand filter,
                   optional MLConv2dOptions options = {});
};

dictionary MLConv2dSupportLimits {
  MLSupportLimits input;
  MLSupportLimits filter;
  MLSupportLimits bias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLConv2dSupportLimits conv2d;
};

enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};

dictionary MLConvTranspose2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  sequence<[EnforceRange] unsigned long> outputPadding;
  sequence<[EnforceRange] unsigned long> outputSizes;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
};

partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input, MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLConv2dSupportLimits convTranspose2d;
};

partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand sub(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand mul(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand div(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand max(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand min(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
  MLOperand pow(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits add;
  MLBinarySupportLimits sub;
  MLBinarySupportLimits mul;
  MLBinarySupportLimits div;
  MLBinarySupportLimits max;
  MLBinarySupportLimits min;
  MLBinarySupportLimits pow;
};

partial interface MLGraphBuilder {
  MLOperand equal(MLOperand a,
                  MLOperand b,
                  optional MLOperatorOptions options = {});
  MLOperand greater(MLOperand a,
                    MLOperand b,
                    optional MLOperatorOptions options = {});
  MLOperand greaterOrEqual(MLOperand a,
                           MLOperand b,
                           optional MLOperatorOptions options = {});
  MLOperand lesser(MLOperand a,
                   MLOperand b,
                   optional MLOperatorOptions options = {});
  MLOperand lesserOrEqual(MLOperand a,
                          MLOperand b,
                          optional MLOperatorOptions options = {});
  MLOperand logicalNot(MLOperand a, optional MLOperatorOptions options = {});
};

dictionary MLLogicalNotSupportLimits {
  MLSupportLimits a;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits equal;
  MLBinarySupportLimits greater;
  MLBinarySupportLimits greaterOrEqual;
  MLBinarySupportLimits lesser;
  MLBinarySupportLimits lesserOrEqual;
  MLLogicalNotSupportLimits logicalNot;
};

partial interface MLGraphBuilder {
  MLOperand abs(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand ceil(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand cos(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand erf(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand exp(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand floor(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand identity(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand log(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand neg(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand reciprocal(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand sin(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand sqrt(MLOperand input, optional MLOperatorOptions options = {});
  MLOperand tan(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits abs;
  MLSingleInputSupportLimits ceil;
  MLSingleInputSupportLimits cos;
  MLSingleInputSupportLimits erf;
  MLSingleInputSupportLimits exp;
  MLSingleInputSupportLimits floor;
  MLSingleInputSupportLimits identity;
  MLSingleInputSupportLimits log;
  MLSingleInputSupportLimits neg;
  MLSingleInputSupportLimits reciprocal;
  MLSingleInputSupportLimits sin;
  MLSingleInputSupportLimits sqrt;
  MLSingleInputSupportLimits tan;
};

dictionary MLEluOptions : MLOperatorOptions {
  double alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand input, optional MLEluOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits elu;
};

partial interface MLGraphBuilder {
  MLOperand expand(MLOperand input,
                   sequence<[EnforceRange] unsigned long> newShape,
                   optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits expand;
};

dictionary MLGatherOptions : MLOperatorOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  MLOperand gather(MLOperand input,
                   MLOperand indices,
                   optional MLGatherOptions options = {});
};

dictionary MLGatherSupportLimits {
  MLSupportLimits input;
  MLSupportLimits indices;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLGatherSupportLimits gather;
};

partial interface MLGraphBuilder {
  MLOperand gelu(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits gelu;
};

dictionary MLGemmOptions : MLOperatorOptions {
  MLOperand c;
  double alpha = 1.0;
  double beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};

dictionary MLGemmSupportLimits {
  MLSupportLimits a;
  MLSupportLimits b;
  MLSupportLimits c;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLGemmSupportLimits gemm;
};

enum MLGruWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkActivation {
  "relu",
  "sigmoid",
  "tanh"
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLGruWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input,
                          MLOperand weight,
                          MLOperand recurrentWeight,
                          [EnforceRange] unsigned long steps,
                          [EnforceRange] unsigned long hiddenSize,
                          optional MLGruOptions options = {});
};

dictionary MLGruSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits initialHiddenState;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLGruSupportLimits gru;
};

dictionary MLGruCellOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLGruWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input,
                    MLOperand weight,
                    MLOperand recurrentWeight,
                    MLOperand hiddenState,
                    [EnforceRange] unsigned long hiddenSize,
                    optional MLGruCellOptions options = {});
};

dictionary MLGruCellSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits hiddenState;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLGruCellSupportLimits gruCell;
};

dictionary MLHardSigmoidOptions : MLOperatorOptions {
  double alpha = 0.2;
  double beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand input, optional MLHardSigmoidOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits hardSigmoid;
};

partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits hardSwish;
};

dictionary MLInstanceNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  double epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};

dictionary MLNormalizationSupportLimits {
  MLSupportLimits input;
  MLSupportLimits scale;
  MLSupportLimits bias;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLNormalizationSupportLimits instanceNormalization;
};

dictionary MLLayerNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  sequence<[EnforceRange] unsigned long> axes;
  double epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand layerNormalization(MLOperand input,
                               optional MLLayerNormalizationOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLNormalizationSupportLimits layerNormalization;
};

dictionary MLLeakyReluOptions : MLOperatorOptions {
  double alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand input, optional MLLeakyReluOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits leakyRelu;
};

dictionary MLLinearOptions : MLOperatorOptions {
  double alpha = 1;
  double beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand input, optional MLLinearOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits linear;
};

enum MLLstmWeightLayout {
  "iofg", // input-output-forget-cell gate ordering
  "ifgo"  // input-forget-cell-output gate ordering
};

dictionary MLLstmOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLOperand initialHiddenState;
  MLOperand initialCellState;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLLstmWeightLayout layout = "iofg";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstm(MLOperand input,
                           MLOperand weight,
                           MLOperand recurrentWeight,
                           [EnforceRange] unsigned long steps,
                           [EnforceRange] unsigned long hiddenSize,
                           optional MLLstmOptions options = {});
};

dictionary MLLstmSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits peepholeWeight;
  MLSupportLimits initialHiddenState;
  MLSupportLimits initialCellState;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLLstmSupportLimits lstm;
};


dictionary MLLstmCellOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLLstmWeightLayout layout = "iofg";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstmCell(MLOperand input,
                               MLOperand weight,
                               MLOperand recurrentWeight,
                               MLOperand hiddenState,
                               MLOperand cellState,
                               [EnforceRange] unsigned long hiddenSize,
                               optional MLLstmCellOptions options = {});
};

dictionary MLLstmCellSupportLimits {
  MLSupportLimits input;
  MLSupportLimits weight;
  MLSupportLimits recurrentWeight;
  MLSupportLimits hiddenState;
  MLSupportLimits cellState;
  MLSupportLimits bias;
  MLSupportLimits recurrentBias;
  MLSupportLimits peepholeWeight;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLLstmCellSupportLimits lstmCell;
};

partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits matmul;
};

enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions : MLOperatorOptions {
  MLPaddingMode mode = "constant";
  MLNumber value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input,
                sequence<[EnforceRange] unsigned long> beginningPadding,
                sequence<[EnforceRange] unsigned long> endingPadding,
                optional MLPadOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits pad;
};

enum MLRoundingType {
  "floor",
  "ceil"
};

dictionary MLPool2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> windowDimensions;
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<[EnforceRange] unsigned long> outputSizes;
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits averagePool2d;
  MLSingleInputSupportLimits l2Pool2d;
  MLSingleInputSupportLimits maxPool2d;
};

partial interface MLGraphBuilder {
  MLOperand prelu(MLOperand input,
                  MLOperand slope,
                  optional MLOperatorOptions options = {});
};

dictionary MLPreluSupportLimits {
  MLSupportLimits input;
  MLSupportLimits slope;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLPreluSupportLimits prelu;
};

dictionary MLReduceOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> axes;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reduceL1;
  MLSingleInputSupportLimits reduceL2;
  MLSingleInputSupportLimits reduceLogSum;
  MLSingleInputSupportLimits reduceLogSumExp;
  MLSingleInputSupportLimits reduceMax;
  MLSingleInputSupportLimits reduceMean;
  MLSingleInputSupportLimits reduceMin;
  MLSingleInputSupportLimits reduceProduct;
  MLSingleInputSupportLimits reduceSum;
  MLSingleInputSupportLimits reduceSumSquare;
};

partial interface MLGraphBuilder {
  MLOperand relu(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits relu;
};

enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions : MLOperatorOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<[EnforceRange] unsigned long> sizes;
  sequence<[EnforceRange] unsigned long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits resample2d;
};

partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input,
                    sequence<[EnforceRange] unsigned long> newShape,
                    optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reshape;
};

partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits sigmoid;
};

partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input,
                  sequence<[EnforceRange] unsigned long> starts,
                  sequence<[EnforceRange] unsigned long> sizes,
                  optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits slice;
};

partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand input,
                    [EnforceRange] unsigned long axis,
                    optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softmax;
};

partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softplus;
};

partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softsign;
};

dictionary MLSplitOptions : MLOperatorOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(
      MLOperand input,
      ([EnforceRange] unsigned long or sequence<[EnforceRange] unsigned long>) splits,
      optional MLSplitOptions options = {});
};

dictionary MLSplitSupportLimits {
  MLSupportLimits input;
  MLSupportLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLSplitSupportLimits split;
};

partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits tanh;
};

dictionary MLTransposeOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits transpose;
};

dictionary MLTriangularOptions : MLOperatorOptions {
  boolean upper = true;
  [EnforceRange] long diagonal = 0;
};

partial interface MLGraphBuilder {
  MLOperand triangular(MLOperand input, optional MLTriangularOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits triangular;
};

partial interface MLGraphBuilder {
  MLOperand where(MLOperand condition,
                  MLOperand trueValue,
                  MLOperand falseValue,
                  optional MLOperatorOptions options = {});
};

dictionary MLWhereSupportLimits {
  MLSupportLimits condition;
  MLSupportLimits trueValue;
  MLSupportLimits falseValue;
  MLSupportLimits output;
};

partial dictionary MLOpSupportLimits {
  MLWhereSupportLimits where;
};
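The IDL above defines the shape of operand descriptors and support-limit dictionaries, but it includes no helper code for working with them. The JavaScript below is a non-normative sketch of two small utilities a caller might write. The first computes the byte length described by an MLOperandDescriptor or MLTensorDescriptor from its dataType and shape, for example to size a buffer for constant() or writeTensor(). The second checks whether an MLOpSupportLimits-shaped object lists a given data type for a particular operator operand. The names BYTES_PER_ELEMENT, byteLengthOf, and supportsDataType are hypothetical and are not part of the API. The element sizes follow the bit widths implied by each MLOperandDataType name.

```javascript
// Hypothetical table: bytes per element for each MLOperandDataType value.
const BYTES_PER_ELEMENT = Object.freeze({
  float32: 4,
  float16: 2,
  int32: 4,
  uint32: 4,
  int64: 8,
  uint64: 8,
  int8: 1,
  uint8: 1,
});

// Hypothetical helper: number of bytes described by an MLOperandDescriptor
// (or MLTensorDescriptor), i.e. element size times the product of the shape.
function byteLengthOf(descriptor) {
  const size = BYTES_PER_ELEMENT[descriptor.dataType];
  if (size === undefined) {
    throw new TypeError(`Unknown MLOperandDataType: ${descriptor.dataType}`);
  }
  // An empty shape describes a scalar, which holds exactly one element.
  // A shape containing 0 yields 0 bytes (see Issue #391 on 0-size dimensions).
  const elements = descriptor.shape.reduce((count, dim) => count * dim, 1);
  return elements * size;
}

// Hypothetical helper: true if `dataType` appears in the `dataTypes` list that
// an MLOpSupportLimits-shaped object reports for operator `op` and operand
// `operand` (e.g. "conv2d" and "input"). Missing entries count as unsupported.
function supportsDataType(limits, op, operand, dataType) {
  const dataTypes = limits?.[op]?.[operand]?.dataTypes ?? [];
  return dataTypes.includes(dataType);
}

// Example (illustrative values only, not reported by a real device):
//   byteLengthOf({ dataType: "float16", shape: [1, 3, 4] });        // 24
//   const limits = { conv2d: { input: { dataTypes: ["float32", "float16"] } } };
//   supportsDataType(limits, "conv2d", "input", "float16");        // true
//   supportsDataType(limits, "conv2d", "bias", "float16");         // false
```

In practice, the limits object would be the value returned by the MLContext opSupportLimits() method. Treating an absent operator or operand entry as unsupported is a conservative assumption of this sketch, not a rule stated by the IDL.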

Issues Index

Document operations susceptible to out-of-bounds access as guidance to implementers.
Investigate the feasibility of side-channel attacks, given that the CPU is currently shared between processes running renderers.
Hinting partially mitigates the concern. Investigate additional mitigations.
MLContextOptions is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. [Issue #623]
Consider adding a mechanism for reporting errors during dispatch(). [Issue #778]
MLContextOptions is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. The Working Group is considering additional API controls to allow the definition of a fallback device, multiple devices in a preferred order, or an exclusion of a specific device. Other considerations under discussion include error handling, ultimate fallback, and quantized operators. Feedback is welcome on any of these design considerations from web developers, library authors, OS and hardware vendors, and other stakeholders via GitHub: [Issue #623]
More rigorously define this timeline. [Issue #529]
Add a mechanism for reporting errors during graph execution. [Issue #778]
Add a mechanism for reporting errors while writing to a tensor. [Issue #778]
Should 0-size dimensions be supported? [Issue #391]
The maximum number of operand dimensions is not defined, but native ML APIs usually have a maximum supported size. [Issue #456]
Support for unions of bigint and numeric types is new in [WEBIDL], and implementation support is also limited. Prototype implementations are encouraged to provide feedback for this approach. [Issue #whatwg/webidl#1388]
If 0-size dimensions are allowed, revise these steps. [Issue #391]
If 0-size dimensions are allowed, revise the above step. [Issue #391]