Performing inference locally can:
- Preserve privacy, by not shipping user data across the network
- Improve performance, by eliminating network latency
- Provide a fallback when network access is unavailable, possibly using a smaller, lower-quality model
The API is modeled after existing inference APIs like TensorFlow Serving, Clipper, TensorRT, and MXNet Model Server, which are already widely used by many products and organizations to handle large volumes of requests. Using an API that is similar to a server-based API makes it easier to switch between server-based and local inference.
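The switching and fallback behavior described above can be sketched as follows. This is a minimal illustration in Python, not part of any proposed API; all names (`RemoteModel`, `LocalModel`, `predict_with_fallback`) are hypothetical, and the point is only that a shared interface lets callers swap server-based inference for a smaller on-device model when the network is unavailable.

```python
# Hypothetical sketch: switching between server-based and local inference.
# None of these names come from the proposed API; they only illustrate the
# pattern of a shared interface with a lower-quality local fallback.

class NetworkUnavailable(Exception):
    """Raised when the inference server cannot be reached."""

class RemoteModel:
    """Stands in for a large model served over the network."""
    def __init__(self, available=True):
        self.available = available

    def predict(self, inputs):
        if not self.available:
            raise NetworkUnavailable()
        # Pretend the server-side model returns higher-precision scores.
        return [round(x * 0.9, 3) for x in inputs]

class LocalModel:
    """Stands in for a smaller, lower-quality on-device model."""
    def predict(self, inputs):
        # Coarser scores, as if from a quantized or distilled model.
        return [round(x * 0.9, 1) for x in inputs]

def predict_with_fallback(remote, local, inputs):
    # Because both models expose the same predict() interface, the caller
    # can switch between server-based and local inference transparently.
    try:
        return remote.predict(inputs)
    except NetworkUnavailable:
        return local.predict(inputs)

# With the network down, the caller transparently gets the coarser local result.
print(predict_with_fallback(RemoteModel(available=False), LocalModel(), [0.15]))
```

The key design choice this illustrates is interface parity: if the local API mirrors the server-based one, the fallback logic stays trivial and the application code does not need two inference paths.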
Unlike the Shape Detection API, the model loader APIs are generic: they are not tied to any particular task, and application-specific libraries and APIs could be built on top of them.
The API does not support training a model, modifying a model, federated learning, or other such functionality. Future APIs could address these capabilities if they prove useful.