Last updated on 2020-05-26 by Yuan Tang
This CRAN task view contains a list of packages, grouped by topic, that
provide functionality to streamline the process of deploying models to various
environments, such as mobile devices, edge devices, cloud, and GPUs, for
scoring or inference on new data.
Model deployment is often challenging for a variety of reasons. Some example challenges are:
- Models must be deployed to heterogeneous environments, e.g. edge devices, mobile devices, GPUs, etc.
- It is hard to compress a model to a size small enough to fit on devices with limited storage while keeping the same precision and minimizing the overhead of loading the model for inference.
- Deployed models sometimes need to process new data records within the limited memory of small devices.
- Many deployment environments have poor network connectivity, so cloud solutions may not meet their requirements.
- There is interest in stronger user data privacy paradigms where user data does not need to leave the mobile device.
- There is growing demand for on-device, model-based data filtering before the data are collected.
Many of the areas discussed in this task view are undergoing rapid
changes in industry and academia. Please send any suggestions to the
task view maintainer or submit a pull request or issue to the
GitHub repository of this task view.
Suggestions and corrections by Achim Zeileis, Dirk Eddelbuettel, and Kevin Kuo (as well as others I may have forgotten to add here) are gratefully acknowledged. Thanks to Dirk Eddelbuettel, who made the initial
.ctv file and the Markdown conversion script available in the GitHub repository of the CRAN Task View for High Performance Computing. Last but not least, thanks to Dirk Eddelbuettel and Achim Zeileis, who helped me get started on organizing this task view.
Deployment through Different Types of Artifacts
This section includes packages that provide functionality to export a trained
model to an artifact that can fit on small devices such as mobile devices
(e.g. Android, iOS) and edge devices (e.g. Raspberry Pi). The packages are grouped
by the model format they are built on.
Predictive Model Markup Language (PMML) is an XML-based language that
provides a way for applications to define statistical and data mining models
and to share models between PMML-compliant applications. The following packages
are based on PMML:
- The pmml package provides the main interface to PMML.
- The pmmlTransformations package allows data to be transformed before they are used to
construct models. It builds structures that allow functions in the pmml package to
output transformation details in addition to the model in the resulting PMML file.
- The rattle package allows you to load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML or as scores.
- The arules package provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). The associations can be written to disk in PMML.
- The arulesSequences package is an add-on for arules to handle and mine frequent sequences.
- The arulesCBA package provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.
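Because PMML is plain XML, a model exported with one of the packages above can in principle be scored by any PMML-compliant consumer, not only by R. The following is a minimal sketch, assuming a hand-written PMML 4.4 `RegressionModel` with a single numeric predictor (the document and its coefficients are hypothetical, not actual output of the pmml package). It reads the regression table with the Python standard library and scores one new record. It is not a conforming PMML engine: only the intercept and `NumericPredictor` terms are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document encoding y = 1.5 + 2.0 * x
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score_regression(pmml_text, record):
    """Score one record with the first RegressionTable of a PMML RegressionModel.

    Only the intercept and NumericPredictor terms are supported (a sketch,
    not a full PMML consumer).
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    result = float(table.get("intercept", "0"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        value = float(record[pred.get("name")])
        exponent = int(pred.get("exponent", "1"))  # PMML default exponent is 1
        result += float(pred.get("coefficient")) * value ** exponent
    return result


print(score_regression(PMML_DOC, {"x": 3.0}))  # 1.5 + 2.0 * 3.0 = 7.5
```

Here the consumer never needs R or the original model object. The XML document alone carries everything required to compute a prediction.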
Plain Old Java Objects (POJOs) and Model Objects, Optimized (MOJOs) are intended to
be easily embeddable in any Java environment. The only compilation and runtime
dependency for a generated model is the h2o-genmodel.jar file produced as
build output. The h2o package provides an easy-to-use interface to
build a wide range of machine learning models, such as GLM, DRF, and XGBoost models (the latter based on xgboost), which can then be exported in MOJO and POJO format. The MOJO and POJO artifacts can then be loaded through H2O's REST interface as well as its different language bindings, e.g. Java, Scala, R, and Python.
Portable Format for Analytics (PFA) is a specification for event-based
processors that perform predictive or analytic calculations and is aimed at
helping smooth the transition from statistical model development to large-scale
and/or online production. PFA combines the ease of portability across systems with algorithmic flexibility: models, pre-processing, and post-processing are all functions that can be arbitrarily composed, chained, or built into complex workflows. The aurelius package provides tools for converting R
objects and syntax into the PFA format.
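PFA documents are JSON, and a document's `action` is an expression tree in which function calls nest inside one another. This nesting is what lets pre-processing, scoring, and post-processing be composed. The following toy evaluator is a hypothetical sketch covering only a tiny subset of PFA (binary arithmetic and the `input` symbol). It is not a conforming PFA engine, and the example document is hand-written rather than produced by aurelius.

```python
import json
import operator

# A tiny subset of PFA's core library: binary arithmetic only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(expr, env):
    """Evaluate a PFA-style expression: a number, a symbol name, or {fcn: [args]}."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]  # in PFA, a bare JSON string is a symbol reference
    if isinstance(expr, dict) and len(expr) == 1:
        op, args = next(iter(expr.items()))
        if op not in OPS:
            raise ValueError("unsupported function: " + op)
        left, right = (evaluate(a, env) for a in args)
        return OPS[op](left, right)
    raise ValueError("unsupported expression: %r" % (expr,))


def run(pfa_doc, value):
    """Run the document's action on one input; the last expression is the output."""
    action = pfa_doc["action"]
    exprs = action if isinstance(action, list) else [action]
    result = None
    for expr in exprs:
        result = evaluate(expr, {"input": value})
    return result


# Hand-written document: center the input, rescale it, then multiply, as nested calls.
doc = json.loads("""
{"input": "double", "output": "double",
 "action": [{"*": [{"/": [{"-": ["input", 10]}, 2]}, 100]}]}
""")
print(run(doc, 14.0))  # ((14 - 10) / 2) * 100 = 200.0
```

Swapping the centering step for a different pre-processing expression only changes the inner JSON object. The enclosing scoring and post-processing calls stay the same.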
TensorFlow's SavedModel format has an optimized counterpart, TensorFlow Lite, which uses many techniques for achieving low latency, such as kernels optimized for mobile apps, pre-fused activations, and quantized
kernels that allow smaller and faster (fixed-point math) models. TensorFlow Lite enables
on-device machine learning inference with low latency and a small binary size.
The packages listed below can produce models in these formats. Note that these packages are R wrappers around the corresponding Python APIs, built on the reticulate package. Although a Python installation is required to create the models, it is not required at inference time for deployment.
- The tensorflow package provides full access to the TensorFlow API for numerical computation using data flow graphs.
- The tfestimators package provides a high-level API to machine learning models as well as highly customized neural network architectures.
- The keras package provides a high-level API to construct different types of neural networks.
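The quantized, fixed-point kernels mentioned for TensorFlow Lite rely on storing values as 8-bit integers together with a scale and a zero point, so that `real ≈ scale * (q - zero_point)`. The following is a minimal sketch of per-tensor affine int8 quantization in plain Python. It illustrates the general technique under that assumption and is not TensorFlow Lite's converter code. The weight values are made up for illustration.

```python
def quantize_params(values, qmin=-128, qmax=127):
    """Choose a per-tensor scale and zero point mapping [lo, hi] onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped 8-bit integer codes."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (q - zero_point) for q in qvalues]


weights = [-0.62, 0.0, 0.13, 0.47, 1.05]  # made-up example weights
scale, zp = quantize_params(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)

# In an int8 tensor each weight takes 1 byte instead of 4 (float32); the price
# is a rounding error of roughly half a quantization step per weight.
print(q)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The design choice of forcing the range to include 0.0 means that zeros (for example, zero padding) survive the round trip exactly.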
The onnx package provides an interface to Open Neural Network Exchange (ONNX), a standard format for models built using different frameworks (e.g. TensorFlow, MXNet, PyTorch, CNTK, etc.). ONNX defines an extensible computation graph model, as well as definitions of built-in operators and standard data types, so models trained in one framework can be transferred to another framework for inference. This open format enables interoperability between frameworks and streamlines the path from research to production, which can speed up innovation in the AI community. Note that this package uses the reticulate package to interface with the original Python API, so a Python installation is required for deployment.
The mleap package is a sparklyr extension that provides an interface to MLeap. MLeap is an open source library that enables the persistence of Apache Spark ML pipelines and their subsequent deployment in any Java-enabled device or service. At runtime, the only dependencies besides the serialized model file are a Java Virtual Machine (JVM) and the MLeap Runtime; a Spark cluster is not required.
Deployment through Cloud/Server
Many deployment environments are based on cloud/server infrastructure. The following packages
provide functionality to deploy models in those types of environments:
- The yhatr package allows you to deploy, maintain, and invoke models via Yhat.
- The cloudml package provides functionality to easily deploy models to
Google Cloud ML Engine.
- The tfdeploy package provides functions to run a local test server that supports the same REST API as CloudML and RStudio Connect.
- The domino package provides an R interface to the Domino CLI, a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features
designed for analytical workflows.
- The tidypredict package provides functionality to run predictions inside databases. It is based on dplyr and dbplyr, which translate data manipulations written in R into database queries that can later be used to execute the data transformations and aggregations inside various types of databases.
- The ibmdbR package allows many basic and complex R operations to be pushed down into the database, which removes R's main-memory limit and makes full use of parallel processing in the underlying database.
- The sparklyr package provides bindings to Apache Spark's distributed machine learning library and allows you to deploy trained models to clusters. Additionally, the rsparkling package uses sparklyr for Spark job deployment while using the h2o package for regular model building.
- The mrsdeploy package provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provide.
- The opencpu package provides a server that exposes a simple but powerful HTTP API for RPC and data interchange with R. This provides a reliable and scalable foundation for statistical services or building R web applications.
- Several general-purpose server/client frameworks for R exist that could help deploy models in server-based environments:
  - The Rserve and RSclient packages together provide server and client functionality for TCP/IP or local socket interfaces to enable access to R from many languages and systems.
  - The httpuv package provides low-level socket and protocol support for handling HTTP and WebSocket requests directly within R.
- Several packages offer functionality for turning R code into a web API:
  - The FastRWeb package provides some basic infrastructure for this.
  - The plumber package allows you to create a web API by merely decorating your existing R source code with special comments.
  - The RestRserve package is an R web API framework for building high-performance microservices and app backends based on Rserve.
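The in-database approach of tidypredict rests on one idea: a fitted model is turned into an expression the database can evaluate, so the rows never leave the database. The following is a minimal sketch of that idea in Python with the standard-library `sqlite3` module. It assumes a hypothetical linear model whose coefficients are invented for illustration. This is not tidypredict's API, which works from R model objects through dbplyr.

```python
import sqlite3


def linear_model_to_sql(intercept, coefficients):
    """Render a linear model as a SQL expression, e.g. '1.5 + (2.0 * "x")'."""
    terms = [repr(float(intercept))]
    for column, coef in coefficients.items():
        quoted = '"' + column.replace('"', '""') + '"'  # quote the column identifier
        terms.append("(%r * %s)" % (float(coef), quoted))
    return " + ".join(terms)


# Hypothetical fitted model with invented coefficients: fit = 37.2 - 3.9 * wt - 0.03 * hp
model_sql = linear_model_to_sql(37.2, {"wt": -3.9, "hp": -0.03})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER, wt REAL, hp REAL)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [(1, 2.62, 110.0), (2, 3.44, 175.0)],
)

# The prediction is computed by the database engine itself, row by row.
query = "SELECT id, %s AS fit FROM cars ORDER BY id" % model_sql
for row in conn.execute(query):
    print(row)
```

Because the model becomes plain SQL, the same expression could be embedded in a view or a scheduled query. No R process needs to be running at scoring time.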