For the last 12 months I’ve been using R and R markdown for the majority of my data projects. To be honest, I’m head over heels for improvements in reporting efficiency and a cleaner reproducible data pipeline.
The tidyverse has revolutionised R for me. It has taken me a good while to really reprogram how my brain thinks but I’m convince it was worth persisting.
Reading in Excel data with the tidyverse
data = readxl::read_xlsx(filename)
My X__1 = X…2 problem
Recently the tibble package updated the way names are modified to ensure they are unique. The readxl package used to name any unidentified columns as X__1, X__2, X__3, etc. As of Dec 2018 (tibble version 2.0.0) it names all ambiguous names with an appended with …n where n is the column number. e.g. ID, …2, Age, …4
This is the first time I’ve run into updates breaking my old code but inspired by the latest Not So Standard Deviations episode which mentioned backwards compatibility I was convinced there was a genuine solution (anything has to be better than me changing all my variable names to the new format). GitHub issue tracker to the rescue!
A huge thank you to Jenny Bryan for explaining the reasoning behind the change and offering up some great solutions. It’s truly amazing the work that goes into these open source packages – the world is indebted to the team at R Studio!
Modification for backwards compatibility
data = readxl::read_xlsx(filename,.name_repair=”minimal”)
data = tibble::repair_names(
tibble::as_tibble(data, validate = FALSE), prefix = “X”, sep = “__”
)