Data scientists sometimes have to (help) “productionize” their work, i.e. integrate data analyses, dashboards, and predictive models into a larger process or software pipeline. For example, imagine a system that (1) monitors for a data change, (2) triggers a data analysis process whenever a change happens, and (3) takes the output of the analysis to render a webpage and/or store output parameters in a database for other systems to use.
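To make the three steps concrete, here is a minimal Python sketch of one pass through such a system. Everything in it is an illustrative assumption rather than part of any real pipeline: the function names (`fingerprint`, `analyze`, `run_once`), the one-number-per-line data file, the `results` table, and the choice of SQLite as the database.

```python
import hashlib
import sqlite3
from pathlib import Path


def fingerprint(path):
    """Return a content hash, used here to detect that the data changed."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def analyze(path):
    """Placeholder analysis (hypothetical): count and mean of whitespace-separated numbers."""
    numbers = [float(token) for token in Path(path).read_text().split()]
    return {"n": len(numbers), "mean": sum(numbers) / len(numbers)}


def run_once(path, conn, last_seen):
    """Run steps (1)-(3) once and return the fingerprint to pass to the next call."""
    current = fingerprint(path)
    if current == last_seen:
        # (1) no data change detected, so nothing to do
        return last_seen
    # (2) a change happened: trigger the analysis
    result = analyze(path)
    # (3) store the output parameters for other systems to use
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (fingerprint TEXT, n INTEGER, mean REAL)"
    )
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?)",
        (current, result["n"], result["mean"]),
    )
    conn.commit()
    return current
```

A scheduler or a simple loop could call `run_once` periodically, passing back the fingerprint it returned last time. In a real deployment, step (2) would be the data scientist's own analysis code.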

Data scientists typically work on part (2), prototyping a bunch of R or Python code. But when it’s time to build and deploy the system, integrating such data science code is not trivial. A big challenge is that a data scientist’s work environment (e.g. a MacBook laptop with R and many, many, many R packages) is typically very different from a “deployment” environment (e.g. a Linux box in AWS EC2 or corporate VMs). Installing R and a bunch of dependent R libraries on the machine is frowned upon by ops and software engineers, since it’s usually a painful, fragile process.

In an ideal world, the R / Python code that data scientists developed on their laptops would “just work” when dropped on the deployment server(s). Too good to be true? Well, that ideal world is already here thanks to a fantastic technology called “Docker”. Using Docker, data science analyses and prototypes can get very close to something that can be deployed quickly and efficiently, just like devops helped developers productionize and operationalize their work better. We can even call it “data science devops”.

Essentially, the first step to achieve data science devops consists of two practices:

1. Make the R code into a command line script that can be executed via Rscript, preferably with an advanced option parser like R argparse. This has the added benefit of forcing reproducibility. It also pushes data scientists to think in terms of APIs and the “do one thing well” (UNIX philosophy) mentality, which leads to cleaner code structure.
2. Package the script as a Docker image, so that it can be pulled like: $ docker pull your-org/my-r-app and run like: $ docker run your-org/my-r-app ARG1 ARG2 ...
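The first practice can be sketched as follows. This is a Python analogue rather than an Rscript one, using Python's standard `argparse` module. The analysis itself (a trimmed summary of a few numbers), the option name `--trim`, and the function names are hypothetical stand-ins for real analysis code.

```python
#!/usr/bin/env python3
"""Hypothetical analysis script that exposes its inputs as command line options."""
import argparse
import json
import statistics
import sys


def build_parser():
    """Declare the script's command line interface in one place."""
    parser = argparse.ArgumentParser(description="Summarize a list of numbers.")
    parser.add_argument("values", nargs="+", type=float, help="numbers to summarize")
    parser.add_argument(
        "--trim",
        type=int,
        default=0,
        help="drop this many of the smallest and largest values first",
    )
    return parser


def summarize(values, trim=0):
    """Do one thing: return summary statistics for the (optionally trimmed) values."""
    ordered = sorted(values)
    if trim:
        ordered = ordered[trim:len(ordered) - trim]
    if not ordered:
        raise ValueError("no values left after trimming")
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
    }


def main(argv):
    """Parse arguments, run the analysis, and print machine-readable JSON."""
    args = build_parser().parse_args(argv)
    print(json.dumps(summarize(args.values, args.trim)))


if __name__ == "__main__":
    # With no arguments, fall back to a demo invocation so the sketch runs as-is.
    main(sys.argv[1:] or ["1", "2", "3", "100", "--trim", "1"])
```

Because all inputs arrive as arguments and the output goes to stdout, the same script could serve as the command a container runs, with the arguments given to docker run passed straight through to it.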