2  Basics of R programming for Data Analysis

Reference: Kabacoff (2022), Mayor (2015).

2.1 Running Basic R Commands

  1. In the Console panel: Type the following commands and press Enter after each one:

    1 + 1
    [1] 2
    x <- 10
    print(x)
    [1] 10

    You should see the output of the commands printed in the console.

  2. Create a new R Script:

    • Click File -> New File -> R Script.
    • Type the same commands from above into the script.
    • Save the script as lesson1.R.
  3. Run the script:

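    • Click the “Source” button in the script editor (or press Ctrl+Shift+S) to run the whole script.
    • Plain Source shows only output that is printed explicitly, such as print(x). To see every command and its result in the console, use Source with Echo (Ctrl+Shift+Enter).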
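RStudio's Source button runs the script through R's source() function. The sketch below shows the difference between plain Source and Source with Echo. It writes the three commands from step 1 to a temporary file, which stands in for lesson1.R so nothing in your project is overwritten. It then sources that file twice.

```r
# Write the commands from step 1 to a temporary script file
# (a stand-in for lesson1.R, so nothing in your project is overwritten).
script <- tempfile(fileext = ".R")
writeLines(c("1 + 1", "x <- 10", "print(x)"), script)

# Plain source(): every line runs, but only print(x) shows output,
# because values of top-level expressions such as 1 + 1 are not auto-printed.
source(script)

# echo = TRUE prints each command followed by its result, as in the console.
source(script, echo = TRUE)

# The script assigned x in the global environment, so it is now available.
x
```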

2.2 Installing and Loading Packages

R packages are collections of functions, data, and documentation that extend the capabilities of R. The tidyverse package is a collection of popular packages for data science.

  1. Install the tidyverse package: In the Console, type the following command and press Enter:

    install.packages("tidyverse")

    R will download and install the tidyverse package and its dependencies. This may take a few minutes.

  2. Load the tidyverse package: In the Console or in your script, type the following command and press Enter:

    library(tidyverse)

    This loads the tidyverse package into your R session, making its functions available for use.
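Scripts shared with others often install a package only when it is missing. The helper load_or_install() below is a hypothetical convenience function; it is not part of the tidyverse or base R. It checks availability with requireNamespace() before calling install.packages() and library().

```r
# Hypothetical helper (not from any package): install `pkg` only if it is
# not already available, then attach it so its functions can be used.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)  # character.only: pkg holds the name as a string
  invisible(pkg %in% .packages())      # TRUE if the package is now attached
}

# A base package is used here so the example runs without downloading anything.
load_or_install("stats")

# Typical use in a course script (may download many packages the first time):
# load_or_install("tidyverse")
```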

2.3 Downloading Sample Data

We’ll use a sample CSV file for demonstration.

  • Download the exam_scores.csv file from the course materials to your data directory. Alternatively, you can read the file directly into R from its URL: Sample CSV Data

2.4 Inspecting Data

Now, let’s read the exam_scores.csv file into R and inspect it:

# Replace this URL with the path or URL of your own data file.

exam_scores <- read.csv("https://raw.githubusercontent.com/sijuswamyresearch/R-for-Data-Analytics/refs/heads/main/data/exam_scores.csv")

# Display the first few rows (six by default).
head(exam_scores)
  student_id study_hours score grade
1          1          NA    65     C
2          2           5    88     B
3          3           1    52     F
4          4           3    76     c
5          5           4    82     B
6          6           2   100     C
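Besides head(), a few base R functions give a quick overview of a data frame. So that the sketch runs offline, it re-creates only the six rows printed above in a data frame named exam_head, a name introduced here. With the full data you would pass exam_scores instead. Column types from read.csv() may differ slightly from the re-typed version.

```r
# The six rows printed above, re-typed so this sketch runs without the download.
exam_head <- data.frame(
  student_id  = 1:6,
  study_hours = c(NA, 5, 1, 3, 4, 2),
  score       = c(65, 88, 52, 76, 82, 100),
  grade       = c("C", "B", "F", "c", "B", "C")
)

dim(exam_head)              # number of rows and columns
str(exam_head)              # type and first values of each column
summary(exam_head)          # min, quartiles, mean, max per numeric column, plus NA counts
colSums(is.na(exam_head))   # missing values per column: only study_hours has one
table(exam_head$grade)      # counts per grade; shows lowercase "c" next to "C"
```

The last two lines already point at cleaning work: a missing study_hours value and an inconsistently coded grade.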

2.5 Data Input and Data Types

2.5.1 Data Types in R

R works with several basic data types:

  1. Numeric: Numbers (e.g., 1, 3.14, -2.5).
  2. Character: Text strings (e.g., "hello", "Data Analysis").
  3. Factor: Categorical variables (e.g., "Low", "Medium", "High"). Factors are important for statistical analysis.
  4. Logical: Boolean values, TRUE or FALSE.
  5. Date: Calendar dates (e.g., as.Date("2023-10-27")). Dates that include a clock time use the separate POSIXct class.

2.5.2 Checking Data Types

The class() function tells you the data type of a variable:

x <- 10
class(x)
[1] "numeric"
y <- "hello"
class(y)
[1] "character"
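class() reports how R treats an object, while typeof() reports how the object is stored internally. The sketch below checks the remaining types from the list above. The variable names rating and exam_day are illustrative.

```r
class(3.14)     # "numeric"
class(5L)       # "integer": the L suffix stores a whole number as an integer
class(TRUE)     # "logical"

# A factor with an explicit level order (Low < Medium < High)
rating <- factor(c("Low", "High", "Medium", "Low"),
                 levels = c("Low", "Medium", "High"))
class(rating)    # "factor"
levels(rating)   # "Low" "Medium" "High"
typeof(rating)   # "integer": a factor stores integer codes plus level labels

exam_day <- as.Date("2023-10-27")
class(exam_day)  # "Date"
typeof(exam_day) # "double": the number of days since 1970-01-01
```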

2.5.3 Converting Data Types

You can convert between data types using the as.*() family of functions:

  • as.numeric()
  • as.character()
  • as.factor()
  • as.logical()
  • as.Date()

Example:

x <- "123"
class(x)
[1] "character"
x_numeric <- as.numeric(x)
class(x_numeric)
[1] "numeric"
Important Note:

Converting a character string that does not represent a number to numeric results in NA, and R issues the warning “NAs introduced by coercion”.
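The conversions below cover the NA case and a few other common surprises, including a classic pitfall with factors. The example values are illustrative.

```r
# Text that is not a number becomes NA (R also warns "NAs introduced by coercion";
# suppressWarnings() hides that warning here)
suppressWarnings(as.numeric(c("1", "two", "3")))   # 1 NA 3

# as.logical() accepts forms such as "TRUE", "false" and "T"; anything else is NA
as.logical(c("TRUE", "false", "T", "yes"))          # TRUE FALSE TRUE NA

# Dates in other layouts need an explicit format string
as.Date("27/10/2023", format = "%d/%m/%Y")          # "2023-10-27"

# Pitfall: as.numeric() on a factor returns the internal codes, not the labels
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1
as.numeric(as.character(f))   # 10 20 10
```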

2.6 Lists, Arrays, and Data Frames in R

2.7 Lists

Lists are versatile data structures that can hold elements of different types. A list can contain numbers, strings, vectors, arrays, or even other lists.

2.7.1 Creating Lists

You can create a list using the list() function.

2.7.1.1 Example 1: Creating a Simple List

# Creating a simple list
my_list <- list(name = "John", age = 30, grades = c(85, 90, 78))
print(my_list)
$name
[1] "John"

$age
[1] 30

$grades
[1] 85 90 78
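A list can mix very different kinds of objects, including another list. The sketch below builds one with a number, a character vector, a matrix and a nested list. The name record and its values are hypothetical.

```r
record <- list(
  id      = 101,                                  # a number
  tags    = c("math", "stats"),                   # a character vector
  scores  = matrix(1:4, nrow = 2),                # a matrix (2-D array)
  contact = list(email = "john@example.com",      # a nested list
                 phone = NA)
)
str(record)                       # compact view of the whole structure
record$contact$email              # chain $ to reach nested elements
length(record)                    # 4 top-level elements
```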

2.7.2 Accessing List Elements

You can access list elements using their names or indices.

2.7.2.1 Example 2: Accessing List Elements by Name

# Accessing list elements by name
name <- my_list$name
age <- my_list$age
print(paste("Name:", name))
[1] "Name: John"
print(paste("Age:", age))
[1] "Age: 30"

2.7.2.2 Example 3: Accessing List Elements by Index

# Accessing list elements by index
name <- my_list[[1]]
age <- my_list[[2]]
print(paste("Name:", name))
[1] "Name: John"
print(paste("Age:", age))
[1] "Age: 30"
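Single and double brackets behave differently on lists, and the difference often trips up beginners. The sketch below re-creates my_list with the same values it has at this point.

```r
my_list <- list(name = "John", age = 30, grades = c(85, 90, 78))

class(my_list[1])          # "list": single brackets return a smaller list
class(my_list[[1]])        # "character": double brackets return the element itself
my_list[["grades"]]        # double brackets also accept a name
my_list[["grades"]][2]     # then index into the vector: 90
my_list[c("name", "age")]  # single brackets can select several elements at once
```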

2.7.3 Modifying Lists

You can modify lists by adding, updating, or deleting elements.

# Adding elements to a list
my_list$city <- "New York"
print(my_list)
$name
[1] "John"

$age
[1] 30

$grades
[1] 85 90 78

$city
[1] "New York"
# Updating list elements
my_list$age <- 31
print(my_list)
$name
[1] "John"

$age
[1] 31

$grades
[1] 85 90 78

$city
[1] "New York"
# Deleting list elements
my_list$grades <- NULL
print(my_list)
$name
[1] "John"

$age
[1] 31

$city
[1] "New York"
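Assigning NULL with $ deletes an element, so storing an actual NULL value requires single brackets and list(NULL). The sketch below shows that case, together with updating a value inside an element and appending with c(). It uses a separate list with the hypothetical name person, so my_list is left unchanged.

```r
person <- list(name = "John", age = 31, grades = c(85, 90, 78))

person$grades[2] <- 95                         # change one value inside an element
person <- c(person, list(city = "New York"))   # append elements with c()
person["notes"] <- list(NULL)                  # keep an element whose value is NULL
names(person)    # "name" "age" "grades" "city" "notes"
length(person)   # 5
```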

2.8 Arrays

Arrays hold elements of a single type arranged in one or more dimensions. A matrix is simply a two-dimensional array.

2.8.1 Creating Arrays

You can create an array using the array() function.

# One-dimensional array
a1 <- array(1:10)
print(a1)
 [1]  1  2  3  4  5  6  7  8  9 10
# Creating a 2D array (matrix)
my_matrix <- array(data = 1:9, dim = c(3, 3))
print(my_matrix)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
# Creating a matrix with matrix(); byrow = TRUE fills it row by row (the default fills by column)
m1 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3, byrow = TRUE)
m1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
# Creating a 3D array
my_3d_array <- array(data = 1:27, dim = c(3, 3, 3))
print(my_3d_array)
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

, , 3

     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27
# Accessing array elements
my_matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
element <- my_matrix[2, 3]
print(paste("Element at (2, 3):", element))
[1] "Element at (2, 3): 8"
# Modifying array elements
my_matrix[1, 1] <- 10
print(my_matrix)
     [,1] [,2] [,3]
[1,]   10    4    7
[2,]    2    5    8
[3,]    3    6    9
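Leaving an index empty selects a whole row or column. A logical condition selects every matching element. The sketch below rebuilds the matrix and the 3-D array under the new names m and arr, so my_matrix keeps the modified value used in the next section.

```r
m <- array(data = 1:9, dim = c(3, 3))
m[2, ]      # whole second row: 2 5 8
m[, 3]      # whole third column: 7 8 9
m[m > 5]    # logical indexing, returned in column-major order: 6 7 8 9
dim(m)      # 3 3

arr <- array(data = 1:27, dim = c(3, 3, 3))
arr[2, 3, 2]  # row 2, column 3 of the second layer: 17
arr[, , 3]    # the whole third layer as a 3 x 3 matrix
```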

2.8.2 Array Operations

You can perform various operations on matrices, such as transposing them with t() and multiplying them with the matrix-product operator %*%.

# Transposing a matrix
transposed_matrix <- t(my_matrix)
print(transposed_matrix)
     [,1] [,2] [,3]
[1,]   10    2    3
[2,]    4    5    6
[3,]    7    8    9
# Creating another matrix
another_matrix <- array(data = 10:18, dim = c(3, 3))

# Matrix multiplication
multiplied_matrix <- my_matrix %*% another_matrix
print(multiplied_matrix)
     [,1] [,2] [,3]
[1,]  228  291  354
[2,]  171  216  261
[3,]  204  258  312
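The ordinary arithmetic operators work element by element, which differs from %*%. The sketch below contrasts the two on small matrices named A and B (illustrative names) and adds row and column summaries.

```r
A <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns (5, 6) and (7, 8)

A * B       # element-wise product: rows 5 21 and 12 32
A %*% B     # matrix product: rows 23 31 and 34 46
A + 10      # a scalar is applied to every element
rowSums(A)  # 4 6
colSums(A)  # 3 7
apply(A, 1, max)  # maximum of each row (MARGIN = 1 means rows): 3 4
```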

2.9 Data Frames

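Data frames are table-like data structures that organize data into rows and columns. All values within a column share one type, but different columns can hold different types (for example, numbers in one column and text in another).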

You can create a data frame using the data.frame() function.

# Creating a data frame
my_data_frame <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  score = c(85, 92, 78)
)
print(my_data_frame)
  id    name age score
1  1   Alice  25    85
2  2     Bob  30    92
3  3 Charlie  28    78
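Applying class() to each column confirms that every column has its own single type. The sketch below uses a copy named demo_df, a hypothetical name, so the tutorial's my_data_frame is untouched. In R 4.0.0 and later, text columns stay character by default (stringsAsFactors = FALSE); older versions turned them into factors.

```r
demo_df <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  score = c(85, 92, 78)
)
sapply(demo_df, class)  # id "integer", name "character", age and score "numeric"
dim(demo_df)            # 3 rows, 4 columns
str(demo_df)            # one line per column: type and first values
```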

2.9.1 Accessing Data Frame Elements

You can access data frame elements using their column names or indices.

# Accessing data frame columns by name
student_names <- my_data_frame$name  # avoids masking the base function names()
ages <- my_data_frame$age
print(student_names)
[1] "Alice"   "Bob"     "Charlie"
print(ages)
[1] 25 30 28
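The example above accesses columns by name. The sketch below also uses row and column positions, the [row, column] form, and double brackets. It works on a copy named demo_df, a hypothetical name, holding the original values.

```r
demo_df <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  score = c(85, 92, 78)
)
demo_df[, 2]                      # second column by position: "Alice" "Bob" "Charlie"
demo_df[["age"]]                  # a column by name with double brackets: 25 30 28
demo_df[2, ]                      # second row (still a data frame)
demo_df[2, "name"]                # a single value: "Bob"
demo_df[1:2, c("name", "score")]  # rows 1-2 and two named columns
```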

2.9.2 Modifying Data Frames

You can modify data frames by adding, updating, or deleting columns and rows.

# Adding a column to a data frame
my_data_frame$city <- c("New York", "Los Angeles", "Chicago")
print(my_data_frame)
  id    name age score        city
1  1   Alice  25    85    New York
2  2     Bob  30    92 Los Angeles
3  3 Charlie  28    78     Chicago
# Updating data frame elements
my_data_frame$age[1] <- 26
print(my_data_frame)
  id    name age score        city
1  1   Alice  26    85    New York
2  2     Bob  30    92 Los Angeles
3  3 Charlie  28    78     Chicago
# Deleting a column from a data frame
my_data_frame$city <- NULL
print(my_data_frame)
  id    name age score
1  1   Alice  26    85
2  2     Bob  30    92
3  3 Charlie  28    78
# Adding rows to a data frame
new_row <- data.frame(id = 4, name = "David", age = 32, score = 90)
my_data_frame <- rbind(my_data_frame, new_row)
print(my_data_frame)
  id    name age score
1  1   Alice  26    85
2  2     Bob  30    92
3  3 Charlie  28    78
4  4   David  32    90
# Adding a column using cbind()
new_col <- data.frame(city = c("Kottayam", "Ettumanoor", "Elanji", "Muvattupuzha"))
my_data_frame <- cbind(my_data_frame, new_col)
my_data_frame
  id    name age score         city
1  1   Alice  26    85     Kottayam
2  2     Bob  30    92   Ettumanoor
3  3 Charlie  28    78       Elanji
4  4   David  32    90 Muvattupuzha
# Deleting rows from a data frame
my_data_frame <- my_data_frame[-4, ]
print(my_data_frame)
  id    name age score       city
1  1   Alice  26    85   Kottayam
2  2     Bob  30    92 Ettumanoor
3  3 Charlie  28    78     Elanji
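In practice, rows and cells are usually chosen by a condition rather than by position. The sketch below works on a separate copy, demo_df (a hypothetical name). The pass mark of 80 is an assumption made only for illustration.

```r
demo_df <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  age = c(26, 30, 28),
  score = c(85, 92, 78)
)

demo_df$passed <- demo_df$score >= 80            # new column computed from another (assumed pass mark)
demo_df[demo_df$name == "Bob", "score"] <- 95    # update a cell chosen by a condition
demo_df <- demo_df[demo_df$age != 28, ]          # drop rows by condition, not position
demo_df
```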

2.9.3 Data Frame Operations

You can perform various operations on data frames, such as subsetting, filtering, sorting, and merging.

# Subsetting data frames
subset_df <- my_data_frame[, c("name", "score")]
print(subset_df)
     name score
1   Alice    85
2     Bob    92
3 Charlie    78
# Filtering data frames
filtered_df <- my_data_frame[my_data_frame$age > 28, ]
print(filtered_df)
  id name age score       city
2  2  Bob  30    92 Ettumanoor
# Sorting data frames
sorted_df <- my_data_frame[order(my_data_frame$age), ]
print(sorted_df)
  id    name age score       city
1  1   Alice  26    85   Kottayam
3  3 Charlie  28    78     Elanji
2  2     Bob  30    92 Ettumanoor
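Merging is listed above but not demonstrated, and sorting is shown only in ascending order. The sketch below joins two small hypothetical tables, students and results, with base R's merge() and then sorts in descending order. By default merge() keeps only matching keys (an inner join); all.x = TRUE keeps every row of the first table (a left join).

```r
students <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"))
results  <- data.frame(id = c(1L, 2L, 4L), score = c(85, 92, 70))

# Inner join: keep only ids found in both tables (1 and 2)
inner <- merge(students, results, by = "id")
inner

# Left join: keep every student; Charlie has no result, so score is NA
left <- merge(students, results, by = "id", all.x = TRUE)
left

# Sorting in descending order (NA values are placed last by default)
left[order(left$score, decreasing = TRUE), ]
```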

Kabacoff, Robert. 2022. R in Action: Data Analysis and Graphics with R and Tidyverse. Simon & Schuster.
Mayor, Eric. 2015. Learning Predictive Analytics with R. Packt Publishing Ltd.