Organizing Data

    Formatting, reshaping, and organizing data should be a second nature of every data scientists.

  1. List
    1. List is the most general (and flexible) form of objects in R.
      [] is used to access the sub-list, [[]] to access the element.

    2. Create List
      1. list() function
      2. alnum=list(c("a","b","c","d"), c(1,2))
        [[1]]
        [1] "a" "b" "c" "d"
        
        [[2]]
        [1] 1 2

        The first sub-list of ‘alnum’ would be the character vector. This character vector may be retrieved by running the following code.

        alnum[[1]]
        [1] "a" "b" "c" "d"

        Similarly, the following code will retrieve the second element of the second sub-list.

        alnum[[2]][2]
        [1] 2

        One can also name the sub-list simply adding labels before the sub list.

        alnum2=list(
          alphabet=c("e","f","g","h"), 
          integer=c(3,4))
        $alphabet
        [1] "e" "f" "g" "h"
        
        $integer
        [1] 3 4

        When the sub-lists are labeled, [[]] is no longer necessary.

        alnum2$alphabet     #equivalent of alnum2[[1]]
        [1] "e" "f" "g" "h"
        alnum2$integer[2]   #equivalent of alnum2[[2]][2]
        [1] 4

        cf) as.list()
        cf) unlist()

      3. List as an output from a function
      4. Functions such as lapply or strsplit generates its output in a list format.

        lapply(1:2, function(x) c(x-1,x+1))
        [[1]]
        [1] 0 2
        
        [[2]]
        [1] 1 3
        strsplit("abcdef", "c")
        [[1]]
        [1] "ab"  "def"

    3. Merge Multiple Lists
      1. Into a list via append function
      2. When someone is trying to combine two or more lists together, this is probably what they want.

        append(alnum, alnum2)
        [[1]]
        [1] "a" "b" "c" "d"
        
        [[2]]
        [1] 1 2
        
        $alphabet
        [1] "e" "f" "g" "h"
        
        $integer
        [1] 3 4

        Do not get confused with the following code.

        list(alnum, alnum2)
        [[1]]
        [[1]][[1]]
        [1] "a" "b" "c" "d"
        
        [[1]][[2]]
        [1] 1 2
        
        
        [[2]]
        [[2]]$alphabet
        [1] "e" "f" "g" "h"
        
        [[2]]$integer
        [1] 3 4
      3. Into a Data Frame
      4. Often times (especially when using a function that gives output in a list format) one may wish to coerce a list to a 2D matrix or into a data frame. The function do.call may be handy in this case in conjunction with functions such as rbind, cbind, rbind.data.frame, cbind.data.frame.

        lapply(1:2, function(x) c(x-1,x+1)) %>%
          do.call("rbind", .)
             [,1] [,2]
        [1,]    0    2
        [2,]    1    3

        cf) Check dplyr::bind_rows or plyr::rbind.fill for list with unequal length.

  2. Data Frame
    1. Data frame is a list of vectors of equal length with all contents being heterogeneous.
      cf) matrix for homogeneous

    2. Create Data Frame
      1. Using base R functions
      2. Use data.frame, as.data.frame, rbind.data.frame, cbind.data.frame to create or change existing data to data frame format. Note that the data.frame function builds data frame but as.data.frame is a function that coerce other objects into dataframe class.

        e.g. Check the output difference between the two codes.

        data.frame(matrix(1:9, nrow=3), letters[1:3])
          X1 X2 X3 letters.1.3.
        1  1  4  7            a
        2  2  5  8            b
        3  3  6  9            c
        as.data.frame(matrix(1:9, nrow=3), letters[1:3])
          V1 V2 V3
        a  1  4  7
        b  2  5  8
        c  3  6  9

        Useful argument. stringsAsFactors (logical): F if the character vector should not be converted to a factor. e.g.

        check=matrix(letters[1:9], nrow=3, byrow=T)
        check %>% as.data.frame() %>% str()
        'data.frame':   3 obs. of  3 variables:
         $ V1: Factor w/ 3 levels "a","d","g": 1 2 3
         $ V2: Factor w/ 3 levels "b","e","h": 1 2 3
         $ V3: Factor w/ 3 levels "c","f","i": 1 2 3
        check %>% as.data.frame(stringsAsFactors=F) %>% str()
        'data.frame':   3 obs. of  3 variables:
         $ V1: chr  "a" "d" "g"
         $ V2: chr  "b" "e" "h"
         $ V3: chr  "c" "f" "i"

        Note that the factors and levels are dropped.

    3. Subsetting
      1. Column subset
        1. The $ operator
        2. df$colmn name

        3. By the column number
        4. df[,column number]

        5. By the column name
        6. df[, "column name"]

        7. dplyr package
        8. Utilize the function select from the dplyr package.
          select(df, column name)
          recommend using it in conjunction with the pipe operator %>% and other functions such as filter, summarize or mutate.

        e.g.

        head(mtcars)
                           mpg cyl disp  hp drat    wt  qsec vs am gear carb
        Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
        Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
        Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
        Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
        Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
        Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
        mtcars$mpg
        mtcars[,1]
        mtcars[,"mpg"]
        mtcars %>%
          select(mpg)
        [1] 21.0 21.0 22.8 21.4 18.7 18.1


        Note

        • Subset multiple columns
        mtcars[,1:5]
        mtcars[,c(1,3,5)]
        mtcars[,c("mpg", "disp", "drat")]
        • Complement
        mtcars[,-c(1,3,5)]
        • S4 objects Check @ for S4.
          To be updated…


      2. Row subset
        1. By the row number
        2. df[row number,]

        3. By the row name
        4. df["row name"“, ]

          e.g.

          mtcars[1,]
          mtcars["Mazda RX4",]
                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
          Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

    4. Filtering
      1. Using base R functions
        1. == and !=
        2. df[df$“column name”==“value of interest”] df[df$“column name”!=“value of interest”]

          e.g. There are 3 different types of cylinders in the mtcars data set.

          
           4  6  8 
          11  7 14 

          Use the logical operators == or != to filter data with certain cylinder number.

          mtcars[mtcars$cyl==6,]
                          mpg cyl  disp  hp drat    wt  qsec vs am gear carb
          Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
          Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
          Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
          Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
          Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
          Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
          Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
          mtcars[mtcars$cyl!=8,]
                          mpg cyl  disp  hp drat    wt  qsec vs am gear carb
          Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
          Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
          Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
          Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
          Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
          Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
          Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
          Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
          Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
          Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
          Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
          Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
          Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
          Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
          Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
          Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
          Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
          Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

          Logical operators such as | and & may also be applied.

          mtcars[(mtcars$cyl==6)|(mtcars$cyl==8),]
                               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
          Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
          Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
          Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
          Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
          Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
          Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
          Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
          Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
          Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
          Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
          Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
          Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
          Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
          Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
          Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
          AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
          Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
          Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
          Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
          Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
          Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
          mtcars[(mtcars$cyl==6) & (mtcars$hp==110),]
                          mpg cyl disp  hp drat    wt  qsec vs am gear carb
          Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
          Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
          Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
        3. %in% and its corresponding negation
        4. %in% is an operator that matches first argument with its second and returns a logical vector. It is somewhat similar to the == operator, but more useful if there are multiple values that needs to be subsetted. For example, the code in the above example: "mtcars[(mtcars$cyl==6)|(mtcars$cyl==8),]" can be simplified as "mtcars[mtcars$cyl %in% c(6, 8),]".

          identical(mtcars[mtcars$cyl %in% c(6, 8),], mtcars[(mtcars$cyl==6)|(mtcars$cyl==8),])
          [1] TRUE

          Similarly, %in% may be negated as the following code.

          mtcars[!mtcars$cyl %in% c(6, 8),]
                          mpg cyl  disp  hp drat    wt  qsec vs am gear carb
          Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
          Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
          Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
          Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
          Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
          Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
          Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
          Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
          Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
          Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
          Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

          Note that mtcars[mtcars$cyl !%in% c(6, 8),] would be a wrong code. I personally like to introduce a new function %!in%that negates the %in% function.

          '%!in%' <-function(x, y)!('%in%'(x,y))
          identical(mtcars[!mtcars$cyl %in% c(6, 8),],mtcars[mtcars$cyl %!in% c(6, 8),])
          [1] TRUE

      2. dplyr package
      3. filter function from the dplyr pacakge can do the same job, except is can be well integrated with other functions using the pipe operator %>%.

        mtcars %>%
          filter(cyl<6)
        Warning: package 'bindrcpp' was built under R version 3.3.3
            mpg cyl  disp  hp drat    wt  qsec vs am gear carb
        1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
        2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
        3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
        4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
        5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
        6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
        7  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
        8  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
        9  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
        10 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
        11 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

        The filter function may be applied with |, %in%, or & operators as well.

        mtcars %>%
          filter(cyl %in% c(6,8))
            mpg cyl  disp  hp drat    wt  qsec vs am gear carb
        1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
        2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
        3  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
        4  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
        5  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
        6  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
        7  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
        8  17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
        9  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
        10 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
        11 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
        12 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
        13 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
        14 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
        15 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
        16 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
        17 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
        18 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
        19 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
        20 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
        21 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
        mtcars %>%
          filter(cyl ==6 | gear==4)
            mpg cyl  disp  hp drat    wt  qsec vs am gear carb
        1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
        2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
        3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
        4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
        5  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
        6  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
        7  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
        8  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
        9  17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
        10 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
        11 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
        12 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
        13 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
        14 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
        15 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
        mtcars %>%
          filter(cyl==6 & gear==4)
           mpg cyl  disp  hp drat    wt  qsec vs am gear carb
        1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
        2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
        3 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
        4 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

    5. Data Presentation
      1. Data set structured in a human friendly format dose not necessarily imply computer/machine friendly fashion or vise versa. Often times, therefore, data structure should be changed from a human friendly wide form to a computer friendly long form or in the reversed manner. This may be even more important if one is sending and receiving data with co-workers who are not familiar with programming. tidyr and reshape are two useful packages that converts between the two data format: long vs. wide.

      2. tidyr
        1. Wide to long format using the gather function
        2. Consider the following sub data set from the mtcars data set.

                   cars  mpg cyl  hp gear
          1   Mazda RX4 21.0   6 110    4
          2 Merc 450SLC 15.2   8 180    3

          This is an example of a wide form data set where all variables are seperated in each column of the data frame. This may be a human friendly data format, however, may not be efficient for the computer to read in. Use the gather function from the tidyr package to convert the data from wide format to long format.

          gather simply collapses multiple columns and its values from the original data frame into two columns: the ‘key’ and the ‘value’ column. The levels of the ‘key’ variable will be the name of the columns gathered, and the ‘value’ column will contain its corresponding values.

          Therefore, the function has 4 important arguments: data, key, value, and selected column.

          • data: The original wide format data frame.
          • key: Name of a column that will be newly created in the long format data set. The levels of the ‘key’ variable will be the column names of the ‘selected columns’ argument below.
          • value: Name of a column that will be newly created in the long format data set. The values in the ‘value’ are the values from the ‘selected columns’ argument.
          • selected columns: Specific columns to be gathered. These selected columns will be put into a single file formation. Column numbers may be used as well as its name.
          # Using column numbers
          subcars %>%
            gather(key=spec, value=performance, 2, 4)
          
          # Using column names
          subcars %>%
            gather(key=spec, value=performance, mpg, hp)
                   cars cyl gear spec performance
          1   Mazda RX4   6    4  mpg        21.0
          2 Merc 450SLC   8    3  mpg        15.2
          3   Mazda RX4   6    4   hp       110.0
          4 Merc 450SLC   8    3   hp       180.0
          # Using column numbers with :
          subcars %>%
            gather(key=spec, value=values, 2:5)
          
          # Using column names with :
          subcars %>%
            gather(key=spec, value=values, mpg:gear)
                   cars spec values
          1   Mazda RX4  mpg   21.0
          2 Merc 450SLC  mpg   15.2
          3   Mazda RX4  cyl    6.0
          4 Merc 450SLC  cyl    8.0
          5   Mazda RX4   hp  110.0
          6 Merc 450SLC   hp  180.0
          7   Mazda RX4 gear    4.0
          8 Merc 450SLC gear    3.0

          Sometimes, thinking in terms of complement or ‘everything else’ might be effecient when writing a script. Use the- to indicate what else to ‘gather’.

          subcars %>%
            gather(key=spec, value=performance, -1, -3)
          
          subcars %>%
            gather(key=spec, value=performance, -cars, -cyl)
                   cars cyl spec performance
          1   Mazda RX4   6  mpg        21.0
          2 Merc 450SLC   8  mpg        15.2
          3   Mazda RX4   6   hp       110.0
          4 Merc 450SLC   8   hp       180.0
          5   Mazda RX4   6 gear         4.0
          6 Merc 450SLC   8 gear         3.0
        3. Long to wide format using the spread function
        4. The spread function will do the opposite of the gather function. This is useful when one is trying to present a data set.

                   cars cyl gear spec performance
          1   Mazda RX4   6    4  mpg        21.0
          2 Merc 450SLC   8    3  mpg        15.2
          3   Mazda RX4   6    4   hp       110.0
          4 Merc 450SLC   8    3   hp       180.0
          longform %>%
            spread(key=spec, value=performance)
                   cars cyl gear  hp  mpg
          1   Mazda RX4   6    4 110 21.0
          2 Merc 450SLC   8    3 180 15.2

          An important argument that might be useful is the fill argument. This argument replaces missing values, either implicit or explicit.

          • Implicit Missing
          # Example of implicit missing
          implicit_missing%>%
            spread(key=spec, value=performance, fill=NA)
                   cars cyl gear spec performance
          1   Mazda RX4   6    4  mpg          21
          3   Mazda RX4   6    4   hp         110
          4 Merc 450SLC   8    3   hp         180
                   cars cyl gear  hp mpg
          1   Mazda RX4   6    4 110  21
          2 Merc 450SLC   8    3 180  NA
          • Explicit Missing
          # Example of implicit missing
          explicit_missing%>%
            spread(key=spec, value=performance, fill=NA)
                   cars cyl gear spec performance
          1   Mazda RX4   6    4  mpg          21
          2 Merc 450SLC   8    3  mpg          NA
          3   Mazda RX4   6    4   hp         110
          4 Merc 450SLC   8    3   hp         180
                   cars cyl gear  hp mpg
          1   Mazda RX4   6    4 110  21
          2 Merc 450SLC   8    3 180  NA

      3. reshape2
        1. The reshape2 package could do similar operations introduced in the tidyr section above. I think tidyr package has a simpler syntax compared to the reshape 2 package, however, reshape 2 package can perform additional features that the tidyr package cannot.

        2. Wide to long format using the melt function
        3. The melt function can ‘gather’ the data set.

          subcars %>%
            melt(
              variable.name = "spec",
              value.names = "performance",
              id.vars = c("cars", "cyl", "gear"))
                   cars cyl gear spec value
          1   Mazda RX4   6    4  mpg  21.0
          2 Merc 450SLC   8    3  mpg  15.2
          3   Mazda RX4   6    4   hp 110.0
          4 Merc 450SLC   8    3   hp 180.0
        4. Long to wide format using the dcast function
        5. dcast function is compatible to that of the ‘spread’ function.

          longform %>%
            dcast(formula = cars+cyl+gear ~ spec, value.var = "performance")
                   cars cyl gear  hp  mpg
          1   Mazda RX4   6    4 110 21.0
          2 Merc 450SLC   8    3 180 15.2
          longform %>%
            dcast(cars+cyl+gear ~ spec)
          Using performance as value column: use value.var to override.
                   cars cyl gear  hp  mpg
          1   Mazda RX4   6    4 110 21.0
          2 Merc 450SLC   8    3 180 15.2
          longform %>%
            dcast(cars+cyl+gear ~ ., mean)
          Using performance as value column: use value.var to override.
                   cars cyl gear    .
          1   Mazda RX4   6    4 65.5
          2 Merc 450SLC   8    3 97.6

          One positive side of using the reshape 2 package over the simpler tidyr package is that the cast function will aggregate all variables not defined in the function. It will not make much sence adding or taking means of “gear” and “mpg” for this particular example, but the example is given to check the syntax.

                   cars cyl gear spec performance
          1   Mazda RX4   6    4  mpg        21.0
          2 Merc 450SLC   8    3  mpg        15.2
          3   Mazda RX4   6    4   hp       110.0
          4 Merc 450SLC   8    3   hp       180.0
          # Sum of gear and mpg variable for each car+cylinder
          longform %>% 
            dcast(cars+cyl~., sum)
          Using performance as value column: use value.var to override.
                   cars cyl     .
          1   Mazda RX4   6 131.0
          2 Merc 450SLC   8 195.2
          # Mean of gear and mpg variable for each car+cylinder combination
          longform %>%
            dcast(cars+cyl ~ ., mean)
          Using performance as value column: use value.var to override.
                   cars cyl    .
          1   Mazda RX4   6 65.5
          2 Merc 450SLC   8 97.6

    6. Nested Data Frame
    7. a=lapply(1:4, function(x) rep(x,x))
      sapply(a, function(x) x+1)
      ## [[1]]
      ## [1] 2
      ## 
      ## [[2]]
      ## [1] 3 3
      ## 
      ## [[3]]
      ## [1] 4 4 4
      ## 
      ## [[4]]
      ## [1] 5 5 5 5