Blog

readxl – X__1 is no longer

For the last 12 months I’ve been using R and R Markdown for the majority of my data projects. To be honest, I’m head over heels for the improvements in reporting efficiency and the cleaner, reproducible data pipeline.

The tidyverse has revolutionised R for me. It has taken me a good while to really reprogram how my brain thinks, but I’m convinced it was worth persisting.

Reading in Excel data with the tidyverse

data = readxl::read_xlsx(filename)

My X__1 = X…2 problem

Recently the tibble package updated the way names are modified to ensure they are unique. The readxl package used to name any unidentified columns X__1, X__2, X__3, etc. As of Dec 2018 (tibble version 2.0.0) it repairs blank or duplicated names by appending ...n, where n is the column number, e.g. ID, ...2, Age, ...4
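To see the new naming without needing a spreadsheet, here is a small sketch. It builds a data frame with two blank column names (a made-up stand-in for a sheet with missing header cells) and lets tibble repair them. It assumes tibble 2.0.0 or later is installed; the object names `raw` and `repaired` are just for illustration.

```r
library(tibble)

# Hypothetical stand-in for a sheet whose 2nd and 4th header cells are empty
raw <- data.frame(1, 2, 3, 4)
names(raw) <- c("ID", "", "Age", "")

# "unique" repair fills blank names with ... plus the column position
repaired <- as_tibble(raw, .name_repair = "unique")
names(repaired)
```

The names that come back should follow the ID, ...2, Age, ...4 pattern described above, and tibble prints a message listing each name it changed.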

This is the first time I’ve run into updates breaking my old code. Inspired by the latest Not So Standard Deviations episode, which mentioned backwards compatibility, I was convinced there was a genuine solution (anything has to be better than changing all my variable names to the new format). GitHub issue tracker to the rescue!

A huge thank you to Jenny Bryan for explaining the reasoning behind the change and offering up some great solutions. It’s truly amazing the work that goes into these open source packages – the world is indebted to the team at RStudio!

Modification for backwards compatibility

data = readxl::read_xlsx(filename, .name_repair = "minimal")
data = tibble::repair_names(
  tibble::as_tibble(data, validate = FALSE), prefix = "X", sep = "__"
)
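Another route I believe should work is passing a function to `.name_repair`, since readxl hands that argument on to tibble. The helper below is my own hypothetical sketch, not something from readxl or tibble: it gives blank names the old X__1, X__2 labels in order of appearance, which is how I understand the old behaviour. It does not de-duplicate repeated names.

```r
# Hypothetical helper (not part of readxl or tibble): label blank or missing
# column names X__1, X__2, ... in the order they appear
legacy_names <- function(nms) {
  blank <- is.na(nms) | nms == ""
  nms[blank] <- paste0("X__", seq_len(sum(blank)))
  nms
}

legacy_names(c("ID", "", "Age", ""))

# Assumed usage, with `filename` pointing at your workbook:
# data <- readxl::read_xlsx(filename, .name_repair = legacy_names)
```

The appeal of this version is that the renaming rule sits in one small function you can adjust, rather than relying on `repair_names()`.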

Generalisability of pharmacoepidemiological studies using restricted prescription data.

I am glad to report that my first ever first-author publication is now officially available. I’ve been honoured to help many people publish work over my career as a clinical trial statistician (and I always appreciated the effort involved in getting everything together and approved) but it has been a great experience to do everything myself.

This is the first of many to come from the pharmacoepidemiology work I’ve been doing in Ireland and hopefully they’re not far behind this one.

I’m in favour of open access publishing so here is the link to the submitted version as per the policy of the Irish Journal of Medical Science: Brown et al. Generalisability of pharmacoepidemiological studies using GMS data

ANY function in SAS

I have been using SAS for a few months now and am starting to think in SAS code. There are still times when I get frustrated mainly because my internal function library isn’t complete.  Sometimes I’m sure SAS is missing a function I’m used to using elsewhere.  Luckily this isn’t a big problem – you just write a SAS macro to fill the void!

So this was my ANY function: %ANY(variable, condition1, condition2, …)
resolves to: variable=condition1 OR variable=condition2…

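%macro ANY/parmbuff;
   %local var num val s;
   /* First argument is the variable; the rest are the values to test */
   %let var=%scan(&syspbuff,1);
   %let num=2;
   %let val=%scan(&syspbuff,&num);
   /* Start the expression with the first value, then move on to the next */
   %let s=&var=&val;
   %let num=%eval(&num+1);
   %let val=%scan(&syspbuff,&num);
   %do %while(&val ne);
      %let s=&s or &var=&val;
      %let num=%eval(&num+1);
      %let val=%scan(&syspbuff,&num);
   %end;
   /* No trailing semicolon, so the macro can sit inside a larger expression */
   &s
%mend ANY;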

Used as:

%PUT %ANY(id,321,2321);
PROC PRINT DATA=sashelp.class;
   WHERE %ANY(name,'Barbara','John');
RUN;

However, as with most things SAS, there is a super easy way to do this – you just need to know it! Meet the IN operator:

PROC PRINT DATA=sashelp.class;
   WHERE name in ('Barbara','John');
RUN;

I only noticed this because it is what the clever SAS log showed my WHERE clause compiles to (NOTE: it is listed in the help under ‘IN operator’)… Anyway, problem solved and I will now never forget the IN.

End of my EndNote not the end of the world

I recently found myself without my beloved referencing software and had to discover the free alternatives.  You should try Zotero – it will probably win you over!  It even has a few features that you might find make it better than EndNote.

As you develop as an academic you embrace technology because it makes life easier.  One of the tools you’d now consider critical to your life is your electronic reference manager.  These tools save you hours because they capture search results direct from PubMed and automate numbering and bibliographies in your papers – priceless!

I had the good fortune of developing my skills as an academic at The University of Sydney.  In large institutions quality reference management software such as EndNote is widely available, and after a while you open it nearly as often as your email.

Having taken long-term leave and made a temporary move to the Republic of Ireland to work on a research project, I found myself without my beloved software.  Disaster – or is it?  You will be comforted by the fact that there is a multitude of free alternatives.  One such tool, Zotero, has won me and others over.

Here is a summary of a few of the things you’d find are different:

The good

  • You can have multiple folder levels so you can store your publication library in the tree-like structure it probably resembles.
  • It is super-easy to eliminate duplicate records (warning: if you have a huge number (over 1000!) this can be tiresome – I ended up implementing a cool program to automate a left click)
  • Zotero integrates directly into any browser (except Internet Explorer).  You click 1 button and most online publications (including almost any journal) download automatically into your library, storing the PDF if freely available.
  • The program can automatically sync your library with the Zotero server so that you can access everything online or on multiple computers (like EndNote web).
  • You can set up collaborative groups and share access to a reference library (privately or publicly). You can also do this in EndNote, but the Zotero interface for invitations is easy to use. When you are writing, all of these libraries are searched for the citation you are looking for.

The not so good

  • I miss being able to search PubMed directly from EndNote (you can search for a PMID, DOI or ISBN in Zotero if you know it)
  • I really miss being able to “search for pdf” but am amazed at how many journals are already open access or have open archives (e.g. articles published more than 12 months ago are available free)
  • Zotero is only free for up to 300 MB of synced storage.  Relatively cheap storage options (2, 6, 10, 25 GB) are available.
  • It slows down a bit once you add a lot of references. After adding my entire EndNote library my collection contains around 1000. I’m also synced with a 5000-reference group. Not as snappy as it was but still usable.

Summary

Zotero has come to the rescue and provides a sound & free alternative to EndNote for reference management.  Now that Mendeley has been sold, Zotero is likely to become many people’s default. I know I’m only just discovering the power of this new tool but am ever wary of things to watch out for.  Have you used Zotero?  Is there anything I’ve missed that you love or dislike?  Most importantly, do you have any time-saving tricks for reference management?

My Ireland project

I’m pleased to announce the first publication on my new project.  Well, not really, but it is the first newsletter from the Cancer pharmacoepidemiology and pharmacoeconomics (CAPPE) group at Trinity College Dublin.  This is one of the research groups which the National Cancer Registry is collaborating with on this project.

Here is an excerpt about my project:

Research Profile: Chris Brown, Statistician/Epidemiologist, National Cancer Registry Ireland

Ovarian cancer is a significant problem in Ireland. It is currently the fourth most commonly diagnosed cancer in women and our long-term mortality rates are among the highest in Europe. There is growing evidence that commonly taken drugs (such as NSAIDs, statins and beta-blockers) may have anti-cancer effects. This project will extend previous breast cancer research by the group (Barron et al. and more recently Eva Flahavan and Susan Spillane), into the study of ovarian cancer. Specifically, it will investigate associations between the use of these drugs and how far the disease has spread at diagnosis, risk of recurrence and survival. The HRB-funded project will involve collaboration with colleagues at Queen’s University Belfast. It will bring together for the first time data from the Republic of Ireland, Northern Ireland and Britain, to ensure we have sufficient data to draw meaningful conclusions.

Converted to a Blog

I have finally bitten the bullet and made the leap of faith from my trusty, simple old <HTML> website (view my archive) to a WordPress-based blog!  I used WordPress to help build and maintain the website for our 2013 Australian Young Statisticians Conference (www.ysc2013.com) and was so impressed by the ease with which you could develop, deploy and evolve the content, not to mention the ability to quickly leverage plugins for tasks that would otherwise take a long time.  I still haven’t got my head around how to smoothly incorporate a WordPress blog as part of a wider site, but I am sure the time will come when creative needs send me back to customised HTML/CSS pages (with a blog component) – although I’ve seen some amazing WordPress blogs so it’s possible that may never happen…

Thanks for reading – please feel free to get in contact.