Routine is mostly a good thing. Morning routine, gym routine, bedtime routine, and so on. Thanks to a routine or a good habit, you don't spend much time and energy deciding what to do or how to do it, which saves energy for more important questions like "why".

Routine is mostly a good thing for data scientists, too. Here's my routine for starting a new data science project in R, large or small:

  • Create a GitHub repo for the project with a sensible name: all lowercase with dashes, no underscores (~1min)
  • git clone it into my usual project directory (~/projects/) (30sec)
  • Write a short description of what the project is about (~1min)
  • Fire up RStudio and create an RStudio project (.Rproj) in the directory (~1min)
  • Write the first R script, typically named initial-analysis.R
  • The first few lines of the script are almost always the same, like:
    • library(tidyverse)
    • df <- read_csv("datafile")
    • glimpse(df)
    • df %>% ggplot(aes(x, y)) + geom_.... : yes... this is where things start to diverge...
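
For illustration, the opening lines above can be put together into a script that runs on its own. This is only a minimal sketch: the temporary CSV, the column names x and y, and the choice of geom_point() are assumptions made for the example, since the original leaves the data file and the geom open.

```r
library(tidyverse)

# Hypothetical stand-in for "datafile": write a tiny CSV to a temp file
# so the script is self-contained. Columns x and y are made up.
datafile <- tempfile(fileext = ".csv")
write_csv(tibble(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8)), datafile)

# Read the data and take a first look at column names and types
df <- read_csv(datafile, show_col_types = FALSE)
glimpse(df)

# A first plot; geom_point() is just one possible choice of geom
p <- df %>% ggplot(aes(x, y)) + geom_point()
print(p)
```

In a real project, datafile would simply be the path to the actual data, and the geom is where each project starts to diverge.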

So, that's about 10min to hit the ground running and start producing useful stuff.

Once things start rolling, daily routines are similar:

  • A bunch of data massaging, like:
    • df %>%
    • group_by(x) %>%
    • filter(y %in% c("good", "fine")) %>%
    • summarize(mz=median(z))
  • ... and visualization:
    • df %>%
    • ggplot(aes(x, y)) +
    • geom_... +
    • facet_wrap(~w)
  • ... and reporting:
    • rmarkdown::render("that-special-markdown.Rmd")
  • ... and git commit / git push frequently.
  • Talk to the stakeholders about questions, news, etc.
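
The data massaging and visualization steps above can be sketched as one small runnable script. The toy data frame and its values are made up for the example, and the plot uses z on the y-axis (an assumption, since y is categorical in this toy data); the column names otherwise follow the snippets in the list.

```r
library(tidyverse)

# Made-up toy data using the column names from the snippets above
df <- tibble(
  x = c("a", "a", "a", "b", "b", "b"),
  y = c("good", "fine", "bad", "good", "good", "bad"),
  z = c(1, 3, 100, 10, 20, 200),
  w = c("u", "v", "u", "v", "u", "v")
)

# Data massaging: keep only the "good"/"fine" rows, then take the median of z per x
summary_df <- df %>%
  group_by(x) %>%
  filter(y %in% c("good", "fine")) %>%
  summarize(mz = median(z))

print(summary_df)

# Visualization: one panel per value of w (z on the y-axis is an assumption here)
p <- df %>%
  ggplot(aes(x, z)) +
  geom_point() +
  facet_wrap(~w)

print(p)
```

With this toy data, summary_df has one row per value of x: a median of 2 for "a" and 15 for "b". The reporting step is left out of the sketch because it needs an existing .Rmd file to render.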

Overall, though, it's fairly automatic, fast, and effective. Yes, routine is mostly a good thing.

What's your routine for starting a data science project in R?

Is it very different from mine?

Let me (and the world) know!