How to reshape data in wide or long format in R

reshape data wide long r gather melt dplyr

Reshaping though frequently required in data analysis, so often it remains confusing even if you are frequent user of reshape function in R.Here I have provided more explanation with simulated data.

Reshape data from wide to long

Wide data: the observation has been made several time but all the observations for given person is recorded in single row in spreadsheet.For example, in clinic, person visiting every month for their blood pressure check, and nurse note down the value for each month in single row which belong to this specific person.

Let’s first create small wide data

# wide data
df <- data.frame(matrix(data = NA, nrow = 3, ncol = 5, dimnames = list(NULL, paste0(c("id","name","jan","feb","march")))))
df[1,] <- c(1,"A",123,120,125)
df[2,] <- c(2,"B",140,150,155)
df[3,] <- c(3,"C",96,86,97)

Our “wide” data look like

id name jan feb march
1 A 123 120 125
2 B 140 150 155
3 C 96 86 97

Now rather than recording each month’s observation in row, we want data each which only one observation per row for that given person. for that we need to use standard reshape command in R.

The reshape has following argument:

  • idvar : unique identifier for person,place or object on which observations(measurements) are made at different time points or repeatedly. Example Case ID

  • varying : if observation for specific individuals are made at different time points, in which columns values are recorded i.e time-varying columns. Example Jan, Feb, March

  • timevar: what will be the name of column once the time-varying columns above has been staked in rows. Example Month

  • times : what will be the values (of time) once the time-varying columns above has been staked in rows. Example Jan, Feb, March

  • v.names: what will be the values (of observations) once the time-varying columns above has been staked in rows. Example BP

  • direction: data needs to converted from wide to long format.

df_long <- reshape(df,
idvar = "id",  
        #[unique identifier for person,place or object on which observations(measurments) are made at different time points or repeatedly]

varying = c("jan","feb","march"),  
        # [if observation for specific individulas are made at different time points, in which columns values are recorded i.e time-varying columns ]

timevar = "month", 
        # [what will be the name of column once the timevarying columns above has been staked in rows]

times = c("jan","feb","march"),
        # [what will be the values (of time) once the timevarying columns above has been staked in rows]

v.names = "BP", 
        # [what will be the values (of observations) once the timevarying columns above has been staked in rows]

direction = "long")   
        # [we want to convert wide df into long one])

Our “long” data look like

df_long <- arrange(df_long, id)

kable(df_long,format = "html") %>%  kable_styling(full_width = F, position = "center",font_size = 14)
id name month BP
1 A jan 123
1 A feb 120
1 A march 125
2 B jan 140
2 B feb 150
2 B march 155
3 C jan 96
3 C feb 86
3 C march 97

Sometime, not only one type of measurement (BP) but also other types (such as Heart rate -HR) are measured and recorded row wise. For example,

This data can be converted into “long” by using list for group of time-varying columns for varying

df_long <- reshape(df,
idvar = "id",  
varying = list(c("BP_jan","BP_feb","BP_march"),c("HR_jan","HR_feb","HR_march") ),  
timevar = "month", 
times = c("jan","feb","march"),
v.names = c("BP","HR"), 
direction = "long")   
df_long <- arrange(df_long, id)
id name month BP HR
1 A jan 123 72
1 A feb 120 70
1 A march 125 71
2 B jan 140 85
2 B feb 150 82
2 B march 155 86
3 C jan 96 65
3 C feb 86 52
3 C march 97 59

Reshape data from long to wide

To make data “wide” from long, the reshape function will need only two main arguments apart from id.

  • v.names : which column represent values of the observations in long data ( so that reshape function can transform these values into rows for each ID)

  • timevar : which column represent the timing of the observations ( so that reshape function associate it with the value for given time for each ID )

Here is the example

df_wide <- reshape(df_long,
       idvar = "id",
       # unique identifier
       v.names = c("BP","HR"),
       # the columns represent the value of the observation (BP,HR)
       timevar = "month",
       # the column represent the timing of the observations
       direction = "wide",
       sep = "_"
        )

Here is the wide data

id name BP_jan HR_jan BP_feb HR_feb BP_march HR_march
1 1 A 123 72 120 70 125 71
4 2 B 140 85 150 82 155 86
7 3 C 96 65 86 52 97 59