Ethical Web Machine Learning

Editor’s Draft,

This version:
https://webmachinelearning.github.io/ethical-webmachinelearning
Latest published version:
https://w3.org/TR/ethical-webmachinelearning
Feedback:
GitHub
Editor:
Anssi Kostiainen (Intel Corporation)

Abstract

This document discusses ethical issues associated with using Machine Learning and outlines considerations for web technologies that enable related use cases.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the Web Machine Learning Working Group as an Editors' Draft. This document is intended to become a W3C Recommendation. Feedback and comments on this specification are welcome. Please use GitHub issues. Discussions may also be found in the public-webmachinelearning@w3.org archives.

Publication as an Editors' Draft does not imply endorsement by W3C and its Members. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

This document is published by the Web Machine Learning Working Group as a Working Group Note and does not contain any normative content. This document is for guidance only and does not constitute legal or professional advice. The document will evolve and receive updates as often as needed.

1. Introduction

Machine Learning (ML) enables compelling new user experiences that affect individuals and society and shape the world. These experiences may feel magical, but there is no magic: algorithms are at play that are invisible and opaque to the user, and sometimes their output is wrong. A technology this powerful and this broadly influential should be developed together with an ethical set of guiding principles, so that the ethical issues that arise are widely understood and can be considered from the outset when designing this technology and applying it to real-world problems. The objective of this document is to advance understanding of ethical issues at the intersection of Machine Learning and web technologies.

Joseph Weizenbaum, a pioneer of modern artificial intelligence, wrote a book in 1976, Computer Power and Human Reason, whose guidance on the limitations of computers and on unique human qualities is even more important and relevant today:

"I want them [teachers of computer science] to have heard me affirm that the computer is a powerful new metaphor for helping us understand many aspects of the world, but that it enslaves the mind that has no other metaphors and few other resources to call on. The world is many things, and no single framework is large enough to contain them all, neither that of man’s science nor of his poetry, neither that of calculating reason nor that of pure intuition. And just as the love of music does not suffice to enable one to play the violin - one must also master the craft of the instrument and the music itself - so it is not enough to love humanity in order to help it survive. The teacher’s calling to his craft is therefore an honorable one. But he must do more than that: he must teach more than one metaphor, and he must teach more by the example of his conduct than by what he writes on the blackboard. He must teach the limitations of his tools as well as their power." —Joseph Weizenbaum [weizenbaum1976cpahr]

2. The branches of ethics

The branches of ethics that inform this field include, but are not limited to [europarl2020gdprai]:

Today, ML algorithms are commonly used in decision-making processes in modern applications across domains, e.g. chatbots, fraud detection systems and recommendation engines. ML is also increasingly used in applications that directly process personal data, e.g. facial recognition and speech recognition systems. ML-assisted decision-making processes that involve personal data are susceptible to ethical issues such as bias, lack of transparency and privacy violations.

This document identifies and discusses key ethical issues in machine learning, in particular from the perspective of the web as a development and deployment platform. The web community has accumulated expertise in privacy-by-design web standards and in accessibility techniques that support social inclusion, and more recently has distilled ethical web principles that steer web technologies toward considering general ethical issues. The aim of this document is to build on this foundation and focus on ethical issues at the intersection of the Web and Machine Learning, for example, to help identify mitigation techniques and best practices for normative web specifications.

3. Machine Learning and the Web Platform

In parallel with general advances in ML, the web platform is gaining client-side Machine Learning capabilities. By moving inference to the client, mediated by the browser, these capabilities make large-scale deployment of ML systems feasible without investment in cloud-based infrastructure. This development opens the door to tens of millions of do-it-yourself web developers and aligns this technology with the decentralized web architecture ideal, which minimizes single points of failure and single points of control, democratizing the technology’s use.

At its core, an application that makes use of ML employs a model trained to infer understanding from data in order to solve a domain-specific problem. The models used to solve real-world problems are complex input-output functions that learn statistical correlations from past data in order to map inputs to expected outputs. In practice, it is impossible for humans without domain-specific expertise to understand and evaluate the predictions or decisions made by such systems. In web-based ML applications, the model may reside on the server or on the client, and the data processing, or inference, can similarly be offloaded to the client.

By offloading computationally expensive tasks involving ML to on-device hardware, any web application could be enriched with ML capabilities, and existing web content progressively enhanced. The web browser, also known as the user agent, acts on behalf of a user and is well-positioned to help the user make ethical decisions, e.g. by improving transparency and explainability of machine learning models.

James H. Moor defines four kinds of ethical robots: ethical impact agents, implicit ethical agents, explicit ethical agents, and full ethical agents. Using Moor’s definitions, expand on the user agent’s role as a good ethical decision-maker.

Solving all ethical issues identified in this space by means of web specifications and Web API design alone is not realistic. However, substantial improvements can still be made or set in motion. Raising awareness of ethical issues in this context is important, and the group’s long-term aspiration is to see ethical issues elevated to the level of other web platform horizontal considerations (internationalization, accessibility, privacy and security) to ensure that this evolving technology, with all its flaws and uncertainties, remains a force for good when combined with the reach of the Web.

4. Ethical frameworks

This section aims to build an understanding of the still-evolving landscape of ethical frameworks by reviewing existing literature studies, and to recast the most prominent principles extracted from these studies that are relevant in the web context in § 6 Ethical considerations for ML-enabling web technologies.

5. Ethical Web Principles

This section discusses the W3C TAG Ethical Web Principles [ETHICAL-WEB] and highlights issues and opportunities in applying these principles to web technologies that enable new ML-based web user experiences. Where applicable, these principles are mapped to § 6 Ethical considerations for ML-enabling web technologies.

There is one web

Differences in Internet connection speeds across geographical locations, combined with the large size of production-grade models, mean that the user experience of on-device inference is not equal in all locations. This issue is not specific to ML and can be mitigated in part by using a Content Delivery Network and by offering reduced-size models. Such network condition-based adaptation is possible if network characteristics are exposed to the client, as proposed by the Network Information API. This concern generalizes to the distribution of other large assets, such as video and images, that can adapt on demand.
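The network condition-based adaptation described above could look roughly like the following sketch. It assumes three hypothetical variants of the same model hosted at placeholder URLs and maps the `effectiveType` values defined by the Network Information API draft to a variant; `selectModel`, the URLs, and the size thresholds are illustrative assumptions, not recommendations.

```typescript
// Connection classes reported by the Network Information API draft
// (navigator.connection.effectiveType).
type EffectiveType = "slow-2g" | "2g" | "3g" | "4g";

interface ModelVariant {
  url: string; // hypothetical model location, e.g. on a CDN
  sizeMB: number; // approximate download size (illustrative values)
}

// Hypothetical variants of the same model at different sizes.
const VARIANTS: Record<"small" | "medium" | "large", ModelVariant> = {
  small: { url: "https://cdn.example.com/model-small.bin", sizeMB: 5 },
  medium: { url: "https://cdn.example.com/model-medium.bin", sizeMB: 25 },
  large: { url: "https://cdn.example.com/model-large.bin", sizeMB: 100 },
};

// Choose a model variant from network hints. When no hint is available
// (the API is not supported in every browser) or the user has asked to
// save data, prefer the smallest variant instead of assuming a fast link.
function selectModel(effectiveType?: EffectiveType, saveData = false): ModelVariant {
  if (saveData || effectiveType === undefined) {
    return VARIANTS.small;
  }
  switch (effectiveType) {
    case "4g":
      return VARIANTS.large;
    case "3g":
      return VARIANTS.medium;
    default:
      // "2g" and "slow-2g"
      return VARIANTS.small;
  }
}
```

In a browser, the inputs could be read from `navigator.connection.effectiveType` and `navigator.connection.saveData` where the API is supported. Falling back to the smallest variant when these hints are absent keeps the experience usable on slow or unknown connections.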

The web should not cause harm to society

The group is committed to the priority of constituencies, as demonstrated by API design driven by use cases that prioritize privacy-enhancing and accessibility-improving features. The work is governed by the open W3C Process and the W3C Code of Ethics and Professional Conduct to ensure the requirements and views of marginalized communities are respected.

The web must support healthy community and debate

The group encourages submission of use cases that apply this technology in ways that mitigate ethical issues on the web. For example, the fake video detection use case helps mitigate misinformation by identifying fake videos and images in real time.

The web is for all people

Machine Learning can improve web accessibility for people with a wide range of disabilities. People with visual disabilities benefit, for example, from person detection, face recognition, super resolution, image captioning, and video summarization capabilities. Noise suppression improves the web user experience for people with auditory disabilities, and machine translation capabilities help address language disabilities. Many of these capabilities benefit not just people with disabilities but all people.

Security and privacy are essential

Client-side inference enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox. Furthermore, using an explicit API for inference increases transparency toward the user compared to repurposing, for example, the WebGL graphics API.

The web must enable freedom of expression

Some use cases, such as face recognition, are known to be used in inappropriate ways, e.g. for surveillance. While this is not a web-specific concern, the web has built-in affordances that help mitigate it by requiring explicit consent to access privacy-sensitive capabilities such as the on-device camera.

The web must make it possible for people to verify the information they see

The complexity of real-world model architectures means validating or verifying the output of a model is infeasible. However, Web APIs by their design make it possible to integrate features into browser developer tools that help build intuition on how neural networks work, in the spirit of the "view source" principle. Web-based visualization tools have been developed for deep networks for educational use, and their integration into browsers remains future work.

The web must enhance individuals' control and power

Client-side Web APIs such as the Web Neural Network API enable local processing and support a decentralized web architecture, democratizing development.

The web must be an environmentally sustainable platform

Machine learning algorithms consume a significant amount of energy, and the field of energy evaluation is still evolving, with both energy prediction models and power monitoring tools being developed [garcia2019energyconsumption]. This represents an opportunity for web browsers to make the energy impact of various workloads running in the browser visible, for example through the proposed Compute Pressure API.

The web is transparent

Given the design of the Web Neural Network API, it is possible to integrate a conceptual graph of the model’s structure into web browser developer tools, allowing developers to inspect and understand the model architecture.

The web is multi-browser, multi-OS and multi-device

The APIs in scope of this group will not be tied to any particular platform and will be implementable on top of existing major platform APIs, such as Android Neural Networks API, Windows DirectML, and macOS/iOS Metal Performance Shaders and Basic Neural Network Subroutines.

People should be able to render web content as they want

This technology is agnostic with respect to presentational aspects. There is, however, a useful analogy to other web content that imposes substantial network and compute resource usage on load, such as auto-played videos. As with videos, sites should make loading large models or running expensive compute tasks opt-in rather than doing so on page load.
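One way a site might make model loading opt-in is to defer the download until an explicit user action. The sketch below assumes a hypothetical `OptInModelLoader` class and an injected fetch function; it is not tied to any particular ML library or to the Web Neural Network API.

```typescript
// Any function that downloads a model and resolves to its bytes. In a
// browser this could wrap fetch(url).then((r) => r.arrayBuffer()).
type Fetcher = (url: string) => Promise<ArrayBuffer>;

// Hypothetical helper: nothing is downloaded until load() is called,
// which a page would do only in response to an explicit user action.
class OptInModelLoader {
  private readonly url: string;
  private readonly fetcher: Fetcher;
  private pending: Promise<ArrayBuffer> | null = null;

  constructor(url: string, fetcher: Fetcher) {
    this.url = url;
    this.fetcher = fetcher;
  }

  // True once the user has opted in and a download has started.
  get started(): boolean {
    return this.pending !== null;
  }

  // Starts the download on the first call; later calls reuse the same
  // in-flight or completed download. A failed download is forgotten so
  // that the user can retry.
  load(): Promise<ArrayBuffer> {
    if (this.pending === null) {
      this.pending = this.fetcher(this.url).catch((err: unknown) => {
        this.pending = null;
        throw err;
      });
    }
    return this.pending;
  }
}
```

In a page, `load()` would be called from the click handler of a control such as an "Enable smart features" button, so no model bytes are transferred on page load unless the user asks for them.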

6. Ethical considerations for ML-enabling web technologies

Discuss ethical considerations for ML-enabling web technologies and web-applicable mitigations in this section. Choose a subset of issues identified in § 4 Ethical frameworks based on popularity, applicability and/or severity in the web context and cross-link to § 5 Ethical Web Principles?

The WebGL graphics API [webgl-2], the WebGPU API [webgpu] and WebAssembly [wasm-core-1] enable certain ML use cases in a constrained manner, but many latency-sensitive real-world use cases backed by bigger, more complex models in practice require a purpose-built Web Neural Network API [webnn] for a good user experience. This emerging API makes use of platform capabilities beneficial for ML, such as CPU parallelism, general-purpose GPUs, or dedicated ML hardware accelerators, and allows browsers to better understand the workloads being executed and implement adequate controls.

6.1. Bias

Decision-making systems may be unfair to certain groups of people, or have harmful impacts on them, due to social differences.

6.2. Privacy

Models that involve sensitive human-centric data, e.g. biometrics, medical information and facial recognition data, can be used to violate users’ privacy.

TODO: Review An Overview of Privacy in Machine Learning [decristofaro2020overview] and extract web-applicable considerations.

6.3. Explainability and Transparency

Complex relationships between model inputs and outputs make it impossible in practice for users to infer how a model arrives at a particular outcome.

6.3.1. Visualization techniques

Visualization of neural networks helps domain experts build intuition on how changes to the network structure and hyperparameters influence the output. An example of an educational visualization that runs in a browser is TensorFlow Playground [smilkov2017directmanipulation], which is particularly helpful for understanding training. Netron is a viewer for neural networks that supports popular model formats, runs in a browser, and is intended as a tool for domain experts to analyze a model’s structure.

Feature visualization techniques such as saliency maps help visualize what a neural network is paying attention to, and offer a view into computer vision models that otherwise seem like black boxes to regular web users. FlashTorch is one toolkit that uses this technique to help understand how neural networks perceive images.
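To illustrate the idea behind gradient-based saliency, the sketch below estimates how strongly a model's score responds to each input element. Toolkits such as FlashTorch compute this gradient by backpropagation over image pixels; this sketch substitutes central finite differences over a plain number array, and `saliencyMap` and the linear toy model are hypothetical.

```typescript
// A model, reduced to a function from an input vector to a class score.
type ScoreFn = (input: number[]) => number;

// Estimate a saliency map: for each input element, the magnitude of the
// score's rate of change with respect to that element. Real toolkits obtain
// this gradient by backpropagation; here it is approximated with central
// finite differences, which needs two model evaluations per element.
function saliencyMap(score: ScoreFn, input: number[], eps = 1e-4): number[] {
  return input.map((_value, i) => {
    const plus = input.slice();
    const minus = input.slice();
    plus[i] += eps;
    minus[i] -= eps;
    return Math.abs(score(plus) - score(minus)) / (2 * eps);
  });
}

// Hypothetical toy model: a linear score over four "pixels".
const weights = [0.5, -2, 0, 1];
const linearScore: ScoreFn = (x) =>
  x.reduce((sum, xi, i) => sum + weights[i] * xi, 0);
```

For this linear toy model the saliency of each element equals the absolute value of its weight, so the second element stands out. For an image classifier, the same per-pixel quantity arranged in the image's shape forms the heat map that saliency-map tools display.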

Integration of visualization techniques into browser developer tools and ML libraries as user-facing features is possible future work.

6.3.2. Model reporting

Another approach to increasing transparency is to make visible information about what a trained machine learning model is and how well it works. Model Cards [mitchell2019modelcards] is a framework for reporting such information in a human-readable manner and is accompanied by the Model Card Toolkit, which automates Model Card creation. Exposing such information through browser UI in a consistent manner is possible future work, as discussed at the W3C workshop.
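A Model Card could be represented as a simple structured record that a browser UI or developer tool renders in a consistent way. The section names below follow those proposed in the Model Cards paper; the field names, the `renderModelCard` function, and the plain-text output format are hypothetical and do not reproduce the Model Card Toolkit schema.

```typescript
// Sections proposed in the Model Cards paper. The camelCase field names and
// plain-string types are a hypothetical simplification, not the schema used
// by the Model Card Toolkit.
interface ModelCard {
  modelDetails: string;
  intendedUse: string;
  factors: string;
  metrics: string;
  evaluationData: string;
  trainingData: string;
  quantitativeAnalyses: string;
  ethicalConsiderations: string;
  caveatsAndRecommendations: string;
}

// Human-readable titles, in the order they are rendered.
const SECTION_TITLES: Record<keyof ModelCard, string> = {
  modelDetails: "Model Details",
  intendedUse: "Intended Use",
  factors: "Factors",
  metrics: "Metrics",
  evaluationData: "Evaluation Data",
  trainingData: "Training Data",
  quantitativeAnalyses: "Quantitative Analyses",
  ethicalConsiderations: "Ethical Considerations",
  caveatsAndRecommendations: "Caveats and Recommendations",
};

// Render a card as plain text for a hypothetical browser info panel.
// Empty sections are flagged as missing instead of being silently dropped,
// so gaps in the reporting stay visible to the user.
function renderModelCard(card: ModelCard): string {
  const keys = Object.keys(SECTION_TITLES) as (keyof ModelCard)[];
  return keys
    .map((key) => {
      const body = card[key].trim();
      return `${SECTION_TITLES[key]}: ${body === "" ? "(not provided)" : body}`;
    })
    .join("\n");
}
```

Rendering every section in a fixed order, including the empty ones, is one way to make the consistency mentioned above concrete: users would see the same structure for every model and could notice when, for example, ethical considerations were never reported.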

7. References backlog

This section is where we park references and related thoughts that might be useful and need to be refined and looked into.

References

Informative References

[ABOUT-ML]
ABOUT ML Resources Library. URL: https://partnershiponai.org/about-ml-resources-library/
[ASBROECK2019DATAOWNERSHIP]
Big Data & Issues & Opportunities: Data Ownership. 2019. URL: https://www.twobirds.com/en/news/articles/2019/global/big-data-and-issues-and-opportunities-data-ownership
[DECRISTOFARO2020OVERVIEW]
An Overview of Privacy in Machine Learning. URL: https://arxiv.org/abs/2005.08679
[ETHICAL-WEB]
Daniel Appelquist; Hadley Beeman. W3C TAG Ethical Web Principles. 27 October 2020. TAG Finding. URL: https://www.w3.org/2001/tag/doc/ethical-web-principles/
[ETHZURICH2019STUDY]
Artificial Intelligence: the global landscape of ethics guidelines (preprint). 2019. URL: https://arxiv.org/pdf/1906.11668.pdf
[EUROPARL2020GDPRAI]
The impact of the General Data Protection Regulation (GDPR) on artificial intelligence. 2020. URL: https://www.europarl.europa.eu/stoa/en/document/EPRS_STU(2020)641530
[GARCIA2019ENERGYCONSUMPTION]
Estimation of energy consumption in machine learning. 2019. URL: https://doi.org/10.1016/j.jpdc.2019.07.007
[HARVARD2020STUDY]
Principled Artificial Intelligence: Mapping Consensus in Ethical and Rights-based Approaches to Principles for AI. 2020. URL: https://dash.harvard.edu/handle/1/42160420
[MITCHELL2019MODELCARDS]
Model Cards for Model Reporting. 2019. URL: https://arxiv.org/abs/1810.03993
[SMILKOV2017DIRECTMANIPULATION]
Direct-Manipulation Visualization of Deep Networks. URL: https://arxiv.org/abs/1708.03788
[WASM-CORE-1]
Andreas Rossberg. WebAssembly Core Specification. 5 December 2019. REC. URL: https://www.w3.org/TR/wasm-core-1/
[WEBGL-2]
Dean Jackson; Jeff Gilbert. WebGL 2.0 Specification. 12 August 2017. URL: https://www.khronos.org/registry/webgl/specs/latest/2.0/
[WEBGPU]
Dzmitry Malyshau; Kai Ninomiya. WebGPU. 8 December 2021. WD. URL: https://www.w3.org/TR/webgpu/
[WEBNN]
Ningxin Hu; Chai Chaoweeraprasit. Web Neural Network API. 25 November 2021. WD. URL: https://www.w3.org/TR/webnn/
[WEFORUM2018STUDY]
How to Prevent Discriminatory Outcomes in Machine Learning. 2018. URL: https://www3.weforum.org/docs/WEF_40065_White_Paper_How_to_Prevent_Discriminatory_Outcomes_in_Machine_Learning.pdf
[WEIZENBAUM1976CPAHR]
Joseph Weizenbaum. Computer Power and Human Reason: From Judgment to Calculation. 1976.