Methods Workshop in Quantitative Text Analysis Translated into R

Published: September 3, 2025

Preface

This work is an R translation of a Python tutorial from the following repository: https://github.com/jisukimmmm/NCCR_MWQTA_2024

It was then turned into an interactive tutorial.

Introduction to R language - exercises & answers

Basic Syntax and Operations:

1. Calculate the area of a triangle:

Write a program to calculate the area of a triangle given its base and height.

Hint

The area of a triangle is calculated using the formula: (base * height) / 2

Solution:
base <- 10
height <- 3

triangle <- base * height / 2

paste("This is the area:", triangle)

2. Speed Conversion

Create a program that converts kilometers per hour to meters per second.

Hint

To convert km/h to m/s:
  1. Multiply by 1000 to convert km to m
  2. Divide by 3600 to convert hours to seconds

Solution:
kmph <- 100
ms <- kmph * 1000 / 3600
paste("The answer is", ms)

3. String Reversal

Write an R script that takes a string as input and prints its reverse.

Hint

Use strsplit() to split the string into characters, rev() to reverse them, and paste() with collapse to join them back.

Solution:
my_text <- "This is a text"
rev_text <- paste(rev(strsplit(my_text, NULL)[[1]]), collapse = "")
rev_text
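
The one-line solution nests three calls, which can be hard to read at first. The sketch below unpacks the same logic into separate steps; the intermediate names chars and reversed_chars are introduced here only for illustration.

```r
my_text <- "This is a text"

# Step 1: split into single characters. strsplit() returns a list with one
# element per input string, so [[1]] extracts the character vector.
chars <- strsplit(my_text, NULL)[[1]]

# Step 2: reverse the order of the characters
reversed_chars <- rev(chars)

# Step 3: glue the characters back together with no separator
rev_text <- paste(reversed_chars, collapse = "")
rev_text
```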

Conditional Statements and Loops:

1. Leap Year Check

Create a program that checks whether a given year is a leap year or not.

Hint

A year is a leap year if:
  • It’s divisible by 4 AND not divisible by 100
  • OR it’s divisible by 400

Use the modulo operator %% to check divisibility.

Solution:
year <- 3000
if ((year %% 4 == 0 && year %% 100 != 0) || (year %% 400 == 0)) {
  "This is a leap year"
} else {
  "This is not a leap year"
}
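
An if() statement expects a single TRUE or FALSE, which is why && and || are the usual choice there. The element-wise operators & and | apply the same rule to a whole vector of years at once. The following is a minimal sketch of that idea; is_leap_year is a hypothetical helper name, not part of the original tutorial.

```r
# Hypothetical helper: returns TRUE for each leap year in a numeric vector
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

# 1900 and 3000 are divisible by 100 but not by 400, so they are not leap years
leap_flags <- is_leap_year(c(1900, 2000, 2024, 3000))
leap_flags
```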

2. Sum of Multiples

Write an R script to find the sum of all numbers between 1 and 1000 that are divisible by both 3 and 5.

Hint
  1. Create a sequence from 1 to 1000
  2. Use vector operations with modulo to find numbers divisible by both 3 and 5
  3. Use sum() to add them up
Solution:
numbers <- 1:1000
bag <- numbers[numbers %% 3 == 0 & numbers %% 5 == 0]
sum(bag)
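
A number divisible by both 3 and 5 is divisible by 15, which offers a quick way to cross-check the result. The sketch below compares the filtering approach with a sequence of multiples of 15; the names by_filter and by_seq are illustrative only.

```r
numbers <- 1:1000

# Approach from the solution: keep numbers divisible by both 3 and 5
by_filter <- sum(numbers[numbers %% 3 == 0 & numbers %% 5 == 0])

# Cross-check: generate the multiples of 15 directly
by_seq <- sum(seq(15, 1000, by = 15))

# Both approaches should agree
by_filter == by_seq
```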

3. Geometric Progression

Implement a program to print the first 10 terms of the geometric progression series: 2, 6, 18, 54, …

Hint
  1. Create a numeric vector to store the series
  2. First term is given
  3. Each subsequent term is previous term multiplied by common ratio
Solution:
common_ratio <- 3
gp_series <- 2

for (i in 2:10) {
  gp_series[i] <- gp_series[i-1] * common_ratio
}
gp_series
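
Because the n-th term of a geometric progression is the first term multiplied by the ratio raised to the power n - 1, the loop can also be replaced by a single vectorised expression. This is one possible alternative, with first_term, n_terms, and gp_closed as illustrative names.

```r
first_term <- 2
common_ratio <- 3
n_terms <- 10

# Exponents 0, 1, ..., 9 give the 10 terms in one step
gp_closed <- first_term * common_ratio^(0:(n_terms - 1))
gp_closed
```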

Lists and List Operations:

1. Largest and Smallest Elements

Create a program to find the largest and smallest elements in a list of numbers (stored here as an R numeric vector).

Hint

Use R’s built-in functions:
  • min() for smallest element
  • max() for largest element

Solution:
number_list <- c(2, 5, 1, 67, 4, 7)
mini <- min(number_list)
maxi <- max(number_list)
paste("Min:", mini, "Max:", maxi)

2. List Intersection

Write an R script to find the intersection of two lists.

Hint

Use the intersect() function to find common elements between two vectors

Solution:
list1 <- c(1, 2, 3, 4, 5)
list2 <- c(4, 5, 6, 7, 8)
intersection <- intersect(list1, list2)
intersection

3. Shuffle a Deck of Cards

Implement a program to shuffle a deck of cards represented as a list.

Hint

Use the sample() function to randomly shuffle elements in a vector. First create a vector with all cards.

Solution:
deck <- c("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")
shuffled_deck <- sample(deck)

print(shuffled_deck)
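
The solution above shuffles the 13 ranks only. If a full 52-card deck is wanted, one possible sketch is to combine every rank with every suit before shuffling. The suit names and the seed value are assumptions made for this example.

```r
ranks <- c("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")
suits <- c("Hearts", "Diamonds", "Clubs", "Spades")  # assumed suit names

# Repeat the ranks once per suit, and each suit once per rank, then pair them
deck <- paste(rep(ranks, times = length(suits)),
              "of",
              rep(suits, each = length(ranks)))

set.seed(42)  # arbitrary seed, only to make the shuffle reproducible
shuffled_deck <- sample(deck)
head(shuffled_deck)
```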

Strings and String Operations:

1. Capitalize the first letter of each word

Write an R script to capitalize the first letter of each word in a sentence.

Hint

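The tools::toTitleCase() function converts a string to English title case. It capitalizes the first letter of most words, but by title-case convention it may leave short words such as articles and prepositions in lowercase.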

Solution:
sentence <- "this is a sentence"
capitalized_sentence <- tools::toTitleCase(sentence)

print(capitalized_sentence)
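
If every word must be capitalized regardless of title-case conventions, a base R approach is to split the sentence, upper-case the first character of each word, and join the words again. This is a minimal sketch that assumes words are separated by single spaces.

```r
sentence <- "this is a sentence"

# Split on single spaces (assumption: no repeated or leading spaces)
words <- strsplit(sentence, " ", fixed = TRUE)[[1]]

# Upper-case the first character and append the rest of each word unchanged
capitalized_words <- paste0(toupper(substring(words, 1, 1)),
                            substring(words, 2))

capitalized_sentence <- paste(capitalized_words, collapse = " ")
capitalized_sentence
```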

2. Most Frequent Character

Create a program to find the most frequent character in a given string.

Hint
  1. Split the string into characters using strsplit()
  2. Create a frequency table with table()
  3. Sort in descending order and get the first element
Solution:
string <- "this is a string"
most_frequent_char <- names(sort(table(strsplit(string, NULL)[[1]]), 
                               decreasing = TRUE))[1]
most_frequent_char
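
In the example string, the space character occurs as often as the most frequent letters, so taking only the first element of the sorted table hides a tie. The variant below is a sketch that ignores spaces and reports all tied characters; excluding spaces is an assumption about what "character" should mean here.

```r
text <- "this is a string"

# Split into characters and drop spaces (assumption: spaces do not count)
chars <- strsplit(text, NULL)[[1]]
chars <- chars[chars != " "]

# Count each character and keep every character that reaches the maximum
counts <- table(chars)
top_chars <- names(counts)[as.vector(counts) == max(counts)]
top_chars
```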

3. Check if a string contains only digits

Implement a program to check if a given string contains only digits.

Hint
  • Use grepl() function with a regular expression pattern
  • The pattern ^[0-9]+$ means:
    • ^ start of string
    • [0-9] any digit
    • + one or more occurrences
    • $ end of string
Solution:
string <- "123456"

is_digits <- grepl("^[0-9]+$", string)

print(paste("Does the string contain only digits?", is_digits))
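
Since grepl() is vectorised, the same pattern can be checked against several strings at once. The short example below uses made-up test strings, including an empty string, which fails because + requires at least one digit.

```r
# Made-up inputs: digits only, digits with a letter, empty, leading zeros
candidates <- c("123456", "12a456", "", "007")

is_digits <- grepl("^[0-9]+$", candidates)
is_digits
```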

Functions:

1. Perfect Square Check

Create a function to check whether a given number is a perfect square or not.

Hint
  1. Take the square root of the number
  2. Check if the square root is equal to its floor value
  3. Return TRUE/FALSE accordingly
Solution:
is_perfect_square <- function(x) {
  sqrt_x <- sqrt(x)
  return(sqrt_x == floor(sqrt_x))
}

# Test the function
is_perfect_square(16)  # Should return TRUE
is_perfect_square(15)  # Should return FALSE
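
The function above works for non-negative inputs, but sqrt() of a negative number produces NaN with a warning, so the comparison yields NA rather than FALSE. A more defensive variant might look like the sketch below; is_perfect_square_safe is a hypothetical name, and treating non-numeric or fractional input as FALSE is a design assumption.

```r
is_perfect_square_safe <- function(x) {
  # Reject anything that is not a single non-negative whole number
  if (!is.numeric(x) || length(x) != 1 || is.na(x) || x < 0 || x != floor(x)) {
    return(FALSE)
  }
  # Round the root and square it again to avoid comparing fractional values
  root <- round(sqrt(x))
  root * root == x
}

is_perfect_square_safe(16)
is_perfect_square_safe(-4)
```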

2. Reverse the elements of a vector

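Implement a function that returns the elements of a vector in reverse order. Unlike the in-place list.reverse() in Python, R functions work on a copy of their argument, so the reversed result must be returned and assigned.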

Hint

You can use R’s built-in rev() function to reverse a list or vector. Alternatively, you could write a loop that swaps elements from the beginning and end moving towards the middle.

Solution:
reverse_list <- function(lst) {
  return(rev(lst))
}

# Test the function
my_list <- c(1, 2, 3, 4, 5)
reversed <- reverse_list(my_list)
print(reversed)
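
The loop-based alternative mentioned in the hint can be sketched as follows. The function name reverse_with_swaps is hypothetical; the swaps happen on the function's local copy, which is then returned.

```r
reverse_with_swaps <- function(x) {
  n <- length(x)
  # seq_len() yields an empty sequence when n < 2, so the loop is skipped
  for (i in seq_len(n %/% 2)) {
    j <- n - i + 1   # index of the mirror element from the end
    tmp <- x[i]
    x[i] <- x[j]
    x[j] <- tmp
  }
  x
}

reverse_with_swaps(c(1, 2, 3, 4, 5))
```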

3. Calculate the mean of a list of numbers

Create a function to calculate the mean (average) of a list of numbers.

Hint

The mean is calculated by summing all numbers and dividing by the count of numbers. In R, you can use the built-in mean() function or implement it using sum() and length().

Solution:
calculate_mean <- function(numbers) {
  return(mean(numbers))
}

# Test the function
numbers <- c(1, 2, 3, 4, 5)
avg <- calculate_mean(numbers)
print(paste("The mean is:", avg))
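
The manual version described in the hint, using sum() and length(), could look like this. The name manual_mean is illustrative, and the sketch does no special handling of missing values or empty vectors.

```r
manual_mean <- function(numbers) {
  # Sum of all values divided by how many values there are
  sum(numbers) / length(numbers)
}

numbers <- c(1, 2, 3, 4, 5)
manual_mean(numbers)
```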

File Handling:

1. CSV Data Analysis

Create a program to read a CSV file containing student scores and calculate their average.

Hint
  1. Use read.csv() (base R) or readr::read_csv() to read the CSV file
  2. Access the score column using $
  3. Calculate mean using mean()
Solution:
# Method 1
student_scores <- read.csv("data/student_scores.csv")
mean(student_scores$score)

# Method 2
library(readr)
student_scores <- read_csv("data/student_scores.csv")
mean(student_scores$score)
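
To try the steps without the workshop's data file, the sketch below writes a small made-up table to a temporary CSV and reads it back. The values are invented for illustration; only the column names student_id and score follow the ones used in this tutorial.

```r
# Made-up data, written to a temporary file so the example is self-contained
demo_scores <- data.frame(student_id = 1:5,
                          score = c(70, 85, 90, 65, 80))
csv_path <- tempfile(fileext = ".csv")
write.csv(demo_scores, csv_path, row.names = FALSE)

# Read the file back and average the score column
student_scores_demo <- read.csv(csv_path)
demo_mean <- mean(student_scores_demo$score)
demo_mean
```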

2. Find lines containing a specific word in a text file

Write an R script to find and print all lines containing a specific word in a text file.

Hint

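Use readLines() to read the file content and grep() to search for matching lines. With value = TRUE, grep() returns the matching lines themselves instead of their indices. Note that grep() treats the word as a regular expression and also matches it inside longer words.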

Solution:
find_lines_with_word <- function(file_path, word) {
  lines <- readLines(file_path)
  matching_lines <- grep(word, lines, value = TRUE)
  return(matching_lines)
}

# Example usage
# find_lines_with_word("example.txt", "specific_word")
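
The difference between a substring match and a whole-word match can be seen with a small temporary file. The file contents below are made up, and wrapping the word in \\b word boundaries is one possible way to restrict the match to whole words.

```r
# Made-up file contents for illustration
demo_path <- tempfile(fileext = ".txt")
writeLines(c("The cat sat on the mat.",
             "Concatenate two strings.",
             "No match here."),
           demo_path)
lines <- readLines(demo_path)

# Plain pattern: also matches "cat" inside "Concatenate"
substring_hits <- grep("cat", lines, value = TRUE)

# Word boundaries: matches "cat" only as a separate word
word_hits <- grep("\\bcat\\b", lines, value = TRUE)

substring_hits
word_hits
```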

3. Count words in a text file

Implement a program to count the number of words in a text file.

Hint

Break this down into steps:
  1. Read the file using readLines()
  2. Split the text into words using strsplit() with whitespace as delimiter
  3. Count the total words using length()

Solution:
count_words_in_file <- function(file_path) {
  lines <- readLines(file_path)
  # Trim each line first so leading whitespace does not produce empty "words"
  words <- unlist(strsplit(trimws(lines), "\\s+"))
  return(length(words))
}

# Example usage
# count_words_in_file("example.txt")
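
A quick way to check a word counter is to run it on a temporary file with a known number of words, including an empty line and an indented line. The sketch below defines its own small counter, count_words (a hypothetical name), which trims each line with trimws() so that leading whitespace does not add empty strings to the count.

```r
count_words <- function(file_path) {
  lines <- readLines(file_path)
  # Trim, then split on runs of whitespace; empty lines contribute no words
  words <- unlist(strsplit(trimws(lines), "\\s+"))
  length(words)
}

# Made-up file: 3 words, an empty line, then 2 words with extra spaces
demo_path <- tempfile(fileext = ".txt")
writeLines(c("one two three", "", "  four   five "), demo_path)

word_total <- count_words(demo_path)
word_total
```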

Plotting:

1. Histogram

Histogram of Student Scores: Create a histogram showing the distribution of student scores.

Hint
  1. Load ggplot2
  2. Use geom_histogram()
  3. Set appropriate binwidth
  4. Add proper labels
Solution:
library(ggplot2)
ggplot(student_scores, aes(x = score)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Histogram of Student Scores", 
       x = "Score", 
       y = "Frequency")

2. Create a Boxplot of Student Scores

Boxplot of Student Scores: Generate a boxplot to visualize the spread and central tendency of student scores.

Hint

Use ggplot() with geom_boxplot(). The data should be mapped to the y-axis since we want a vertical boxplot. Don’t forget to add appropriate labels.

Solution:
ggplot(student_scores, aes(y = score)) +
  geom_boxplot() +
  labs(title = "Boxplot of Student Scores", y = "Score")

3. Create a Scatter Plot of Student Scores

Scatter Plot of Student Scores: Create a scatter plot to explore the relationship between student scores and student IDs.

Hint

Use ggplot() with geom_point(). Map student_id to x-axis and score to y-axis. Remember to include appropriate axis labels.

Solution:
ggplot(student_scores, aes(x = student_id, y = score)) +
  geom_point() +
  labs(title = "Scatter Plot of Student Scores", x = "Student ID", y = "Score")