Cleaning and Visualizing 2011 TIMSS Question Level Data With R

Introduction
Data Munging
Visualization
Conclusion

Introduction

The Trends in International Math and Science Study (TIMSS) is a series of international assessments of the mathematics and science knowledge of students around the world. It is administered by both the International Association for the Evaluation of Educational achievement (IEA) and Boston College (BC) who first conducted the assessments in 1995 and re-administered them every 4 years after that (1999, 2003, 2007, 2011, and 2015). Different versions of the assessments are given to 4th graders and 8th graders of participating countries. Each question in the assessment test the students’ grasp of different cognitive and content domains. Every assessment contains both multiple-choice and free-response style questions. After the test is given, each student is given a performance score ranging from 0 as the lowest to 1000 as the highest, with 500 being the intended international mean. Each participating country, in turn, is given a performance score which is the average of all of its students’ scores. The study uses the scores 625, 550, 475, and 400 to represent advanced, high, intermediate, and low international benchmarks respectively. That is, a student with a performance score greater than 625 is considered to have an advanced grasp of either math or science while a student with a performance score less than 400 is considered to have a below low grasp of either math or science.

The primary purpose of this article is to show different methods for visualizing question-level data for the TIMSS Math Assesment. This will be done by analyzing the 2011 math test given to 8th grade students of the Republic of Korea, Lithuania, the United States of America, and Chile. Question-level data is the typical response provided by students of each country to each question asked. Visualizing question-level data will hopefully lead to better analysis of a country’s educational needs. Korea was chosen because it performed the best of all 45 participating countries with a performance score of 613. Lithuania was chosen because it performed just above the TIMSS scale centerpoint with a score of 502. Both the USA and Chile were chosen because they are countries of personal interest with scores of 509 and 416 respectively.¹

It is worth noting that the IEA and BC consider countries with more than 15% of students scoring less than 400 as countries that can not be reliably assessed. This is because such a high rate of below low performers suggests an increased probability of random guessing. If we eliminate all the countries that could not be reliably assessed, then the country with the lowest performance score is Chile, one of the two countries of personal interest.

The secondary purpose of this article is to tidy the datasets provided by IEA. The new tidy datasets will facilitate the visualization of data in this article as well as facilitate analysis that will be done in future articles. There are few articles online that talk about analyzing and cleaning the TIMSS dataset in the R. Therefore, the Data Munging section is meant as a resource for researchers who would like to analyze the TIMSS dataset in R. For everyone else, the section can be summarized as a repetition of the following code:

dataframe %>%
  group_by(column_of_interest) %>%
  nest() %>%
  mutate(
    new_column = lapply(.$data, anonymous_function) %>%
      unlist()) %>%
  unnest()

Feel free to skip directly to the data visualization section if data tidying does not fill you up with joy…

Data Munging

Tidy Data

According to Hadley Wickham, tidy data is “a standard way of mapping the meaning of a dataset to its structure. A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types. In tidy data:

Each variable forms a new column.
Each observation forms a row.
Each type of observational unit forms a table.²

This article will attempt to tidy the TIMSS dataset as much as possible, however, the nesting capabilities of the tidyr package makes it very tempting to keep all observations in a single table. If all information is kept in one table, then there is a greater chance that multiple observations will be contained in a single row.

The TIMSS dataset that this article works on can be divided into 3 main types of observational units. These observational units are measured at the Student, Book, and Question levels. The focus of this article is question-level data, however, rich question-level data cannot be derived without first organizing the TIMSS dataset at the student and book levels. With tidyr each observational unit can be stored in a single dataframe. Tidy data tables can be derived from this single dataset with three simple commands

dataframe %>%
  group_by(column_representing_observational_unit) %>%
  nest()

Therefore the single dataframe this article creates will only be three commands away from a truly tidy dataset.

Obtaining Data

IEA has a data repository where it publicly displays study data. It can be accessed by going to http://rms.iea-dpc.org/. The order of clicks goes in the order of SEARCH > TIMSS > Grade 8 > 2011 > Chile::Student Test Responses > Korea, Rep. of::Student Test Responses > Lithuania::Student Test Responses > United States::Student Test Responses > SPSS > Codebooks > Download Name:::Whatever > Add To Basket > View Basket > (disk/save icon)

The student achievement books contain student level information such as student responses to each question, the students’ gender, and the students’ school. The IEA naming convention for data files is:

Begin with ‘a’ for 4th grade tests or ‘b’ for 8th grade tests
Then ‘sa’ for student achievement files
A three letter string representing the country or ‘tms’ for a codebook.
End with ‘m5’ representing that this is the 5th administration of TIMS

so bsachlm5 represents the 2011 8th grade student achievement test for chile and bsatmsm5 represents the 2011 8th grade student achievement codebook.

Data Tidying

The principle packages used in this article will be tidyverse, and stringr. The tidyverse package is important because it includes the dplyr (cleaning), tidyr (cleaning) and ggplot (visualizing) packages as well as the magrittr (%>%) package. The stringr package is important because of its text/string manipulation capabilities. Other packages used are haven, readxl, and gridExtra.

The TIMSS dataset must be uploaded before an cleaning, analysis, or visualization can be done.

library(tidyverse)
library(stringr)

library(haven)

chl_achievement_11 <- read_spss('bsachlm5.sav')
kor_achievement_11 <- read_spss('bsakorm5.sav')
ltu_achievement_11 <- read_spss('bsaltum5.sav')
usa_achievement_11 <- read_spss('bsausam5.sav')

detach(package:haven)

The student achievement books are in wide format. Each row represents one student and each column represents either a question or student information. The books include both math and science questions as well as student performance scores, school and student IDs, gender, and test information.

dim(chl_achievement_11)

## [1] 5835  581

names(chl_achievement_11) %>%
  head(20)

##  [1] "IDCNTRY"  "IDBOOK"   "IDSCHOOL" "IDCLASS"  "IDSTUD"   "M032166" 
##  [7] "M032721"  "M032757"  "M032760A" "M032760B" "M032760C" "M032761" 
## [13] "M032692"  "M032626"  "M032595"  "M032673"  "M052216"  "M052231" 
## [19] "M052061"  "M052228"

The student achievement codebook must be uploaded as well. The codebook contains information specific to each question asked. However, we do not want all the information in the codebook so we will need to parse it down to only columns which provide important information. These columns are: FIELD_NAME, FIELD_LABL, MEAS_CLASS, and COMMENT1. FIELD_NAME provides the id for each question asked, FIELD_LABL is a very brief summary of what each question asks, MEAS_CLASS provides the answer of each multiple choice question (‘M1’ for ‘A’, ‘M2’ for ‘B’, ‘M3’ for ‘C’, and ‘M4’ for ‘D’) or ‘SA’ for each free response question, and COMMENT1 provides the content and cognitive domain for each question. Student answers to each multiple choice question in the student achievement books are simply numbers (‘1’, ‘2’, ‘3’, ‘4’) so the ‘M’ should be removed from MEAS_CLASS in order to to join the student achievement books with the student achievement codebook. The COMMENT1 column contains two separate variables (‘content domain’ and ‘cognitive domain’) and should be split into two separate columns.

library(readxl)

achievement_codebook_11 <- read_excel('bsatmsm5.xls') %>%
  select(FIELD_NAME, FIELD_LABL, MEAS_CLASS, COMMENT1) %>%
  mutate(MEAS_CLASS = str_replace(MEAS_CLASS, '^M', '')) %>%
  separate(COMMENT1, into = c('content_domain', 'cognitive_domain'), sep = '\\\\') %>%
  mutate(cognitive_domain = str_extract(cognitive_domain, '\\w+')) %>%
  mutate(question_type = sapply(MEAS_CLASS, function(x){
    if(str_detect(x, '\\d')){
      'Multiple Choice'
    } else if (str_detect(x, 'SA|DPC_D')){
      'Free Response'
    } else {
      'Other'
    }
  }))

detach(package:readxl)

The student achievement books, in dataframe format, must be manipulated and cleaned before question level visualization can be conducted. The cleaning/manipulation process can be performed in 5 steps, represented below in 4 separate functions. These functions either reshape the dataframe or add new information derived from existing information.

The dataframe should only contain columns of interest. The columns of interest are any columns that begin with the letter ‘M’ which represent student responses to a particular math question, any columns that begin with ‘BSM’ which represent different math performance scores different researchers gave to each student, and the identification variables such as ‘IDSTUD’, ‘IDBOOK’, and ‘ITSEX’ which represent a particular student’s test ID, which of the the 14 testbooks the student was given, and the student’s gender. This can be done by combining dplyr‘s non-standard evaluation and data manipulation capabilities as well as stringr’s regex capabilities. The questions should be combined into a single column and their values should be combined into another column so that each row has a single STUDENTID|question|answer combination - a process often referred to as going ’from wide to long format’. This can be done done by using tidyr’s reshaping capabilities. Finally, we want to get rid of all NA values in the answer column. NA values represent either questions that the student wasn’t given or questions that the student wasn’t able to respond to. We will see later that removing questions that a student was given, but did not answer, will not affect the analysis.

grab_math_questions <- function(df){
  df %>%
    #^M grabs all the math questions
    #^BSM grabs all the benchmark scores
    select_(.dots = names(.)[str_detect(names(.), '^M|IDSTUD|IDBOOK|IDSCHOOL|ITSEX|^BSM')]) %>%
    gather_('question', 'answer', names(.)[str_detect(names(.), '^M')]) %>% 
    filter(!is.na(answer))}

Book level information must be extracted before attempting to extract question level information. The first function deleted NA values, potentially deleting a STUDENTID|question|answer combination for a student who was given a question, but was unable to answer that question. The total number of students who were given a question can be derived by filtering for unique question|BOOKID combinations, which in turn can be combined with unique STUDENTID|BOOKID combinations to determine which questions were provided to which students. This can be done by using the powerful nest() and lapply() combination. The nest() method creates a column named data which is a dataframe of all selected columns. This is done by filtering all selected columns to unique variables in the unselected column or columns. This data column, once created, can be iterated over with an anonymous function passed to lapply().

get_book_info <- function(df){
  df %>%
    group_by(IDBOOK) %>% 
    nest() %>%
    mutate(
      students_per_book = sapply(seq_along(.$data), function(i){
        .$data[[i]] %>% 
          group_by(IDSTUD) %>%
          count() %>%
          nrow()
      })) %>%
    mutate(
      students_per_book_female = sapply(seq_along(.$data), function(i){
        .$data[[i]] %>% 
          group_by(IDSTUD, ITSEX) %>%
          count() %>%
          filter(ITSEX == 1) %>%
          nrow()
      })) %>%
    mutate(
      students_per_book_male = sapply(seq_along(.$data), function(i){
        .$data[[i]] %>%
          group_by(IDSTUD, ITSEX) %>%
          count() %>%
          filter(ITSEX == 2) %>%
          nrow()
      })) %>%
    unnest()}

The current dataframe identifies which questions were given to each student and it also provides each student’s response to the questions, however it does not provide the correct response to each question. Luckily, for free-response questions, the student’s answer is coded so that a number greater than or equal to 20 is fully correct and a number greater than or equal to 10 is either fully or partially correct. For purpose of this analysis, partially correct data will be considered correct. The answers to the multiple-choice questions can be found in the MEAS_CLASS column within the achivement_codebook_11 dataframe created earlier. The two dataframes can be combined by using dplyr’s table joining capabilities. New information joined into the dataframe should be cleaned for analysis.

combine_datasets <- function(df, cdbook){
  df %>%
    left_join(cdbook, by = c('question' = 'FIELD_NAME')) %>%
    mutate(
      student_gave_correct_answer = sapply(seq_along(.$MEAS_CLASS), function(i){
        if(.$MEAS_CLASS[i] %in% 1:4){
          .$MEAS_CLASS[i] == .$answer[i]
        } else {
          str_detect(.$answer[i], '^[12]')
        }})) %>%
    mutate(
      correct_answer_derived_from_labl = FIELD_LABL %>% 
        str_extract( '\\(\\d\\)|\\(\\w\\)')
    ) %>%
    mutate(
      FIELD_LABL = FIELD_LABL %>%
        str_replace(' \\(\\d\\)|\\(\\w\\)', '')
    )}

Question-level information now can be extracted because the correct answer to each question is known, the total number of students given the question is known, and the students’ answers to the questions are known. Extraction can be done by using the nest() and lapply() combination used in function #2.

get_question_info <- function(df){
  df %>%  
    group_by(question) %>%
    nest() %>%
    mutate(
      students_per_question = sapply(seq_along(.$data), function(i){
        .$data[[i]] %>% 
          group_by(IDBOOK, students_per_book) %>% 
          count() %>%
          .$students_per_book %>%
          sum()
      })) %>%
    mutate(
      students_per_question_female = sapply(seq_along(.$data), function(i){
        .$data[[i]] %>%
          group_by(IDBOOK, students_per_book_female) %>%
          count() %>%
          .$students_per_book_female %>%
          sum()
      })) %>%
    mutate(
      students_per_question_male = sapply(seq_along(.$data), function(i){
        .$data[[i]] %>%
          group_by(IDBOOK, students_per_book_male) %>%
          count() %>%
          .$students_per_book_male %>%
          sum()
      })) %>%
    mutate(
      correct_ratio_per_question = sapply(seq_along(.$data), function(i){
        tot_students  = .$students_per_question[i]
        .$data[[i]] %>%
          filter(student_gave_correct_answer) %>%
          nrow()/tot_students
      })) %>%
    mutate(
      correct_ratio_per_question_female = sapply(seq_along(.$data), function(i){
        tot_female = .$students_per_question_female[i]
        .$data[[i]] %>%
          filter(student_gave_correct_answer, ITSEX == 1) %>%
          nrow()/ tot_female
      })) %>%
    mutate(
      correct_ratio_per_question_male = sapply(seq_along(.$data), function(i){
        tot_male = .$students_per_question_male[i]
        .$data[[i]] %>%
          filter(student_gave_correct_answer, ITSEX == 2) %>%
          nrow()/ tot_male
      })) %>%
    unnest()}

Now that the functions used to clean the data are created, they can be used to actually clean the data. To prevent needless typing, all 4 functions can be contained in a wrapper function.

clean_math <- function(df, cdbook){
  df %>%
    grab_math_questions() %>%
    get_book_info() %>%
    combine_datasets(cdbook) %>%
    get_question_info()}

chl_timss_math_11 <- clean_math(chl_achievement_11, achievement_codebook_11)
ltu_timss_math_11 <- clean_math(ltu_achievement_11, achievement_codebook_11)
kor_timss_math_11 <- clean_math(kor_achievement_11, achievement_codebook_11)
usa_timss_math_11 <- clean_math(usa_achievement_11, achievement_codebook_11)

The cleaned data can be used in further projects and should be stored in seperate files.

write_csv(chl_timss_math_11, '2011_TIMSS_CHILE.csv')
write_csv(ltu_timss_math_11, '2011_TIMSS_LITHUANIA.csv')
write_csv(kor_timss_math_11, '2011_TIMSS_KOREA.csv')
write_csv(usa_timss_math_11, '2011_TIMSS_USA.csv')

Visualization

A separate smaller dataframe will be created specifically for graphing. It will contain only data useful for graphing with ggplot(). The goal of this article is to visualize the typical response provided by students of each country to each question asked. This means that each graphic should show:

Information about the country
Information about the typical student of each country
Information about how the typical student of each country responds to each question.

A typical measurement will be the correct ratio which is the number of students in a country who correctly responded to a question divided by the total number of students. Gender variations of the correct ratio will also be used.

Another typical measurement will be the question rank which is a ranking of questions by how many students in a country were able to successfully answer the question. A question rank of 1 will represent the question with the highest correct ratio.

basic_graph_setup <- function(math_df, country){
  math_df %>%
    group_by(question, students_per_question, correct_ratio_per_question, correct_ratio_per_question_female,
             correct_ratio_per_question_male, FIELD_LABL, content_domain, cognitive_domain, question_type) %>%
    nest() %>%
    arrange(desc(correct_ratio_per_question)) %>%
    mutate(question_rank = 1:nrow(.)) %>%
    mutate(country = country) %>%
    mutate(diff_male_female = correct_ratio_per_question_male - correct_ratio_per_question_female) %>%
    mutate(dominant_gender = sapply(.$diff_male_female, function(x){
      if(x >0) {
        'Male'
      } else if(x<0){
        'Female'
      } else {
        'Tie'
      }
    }))
}

chl_basic_graph_info <- basic_graph_setup(chl_timss_math_11, 'Chile')
ltu_basic_graph_info <- basic_graph_setup(ltu_timss_math_11, 'Lithuania')
kor_basic_graph_info <- basic_graph_setup(kor_timss_math_11, 'Korea')
usa_basic_graph_info <- basic_graph_setup(usa_timss_math_11, 'USA')

international_basic_graph_info <- rbind(chl_basic_graph_info, ltu_basic_graph_info, kor_basic_graph_info, usa_basic_graph_info)

A standard graph is the bar plot. A bar plot that compares the males correct ratio to females correct ratio can be made with geom_col().

ggplot(international_basic_graph_info, aes(question_rank, diff_male_female)) +
  geom_col(aes(fill = dominant_gender)) +
  scale_y_continuous(limits = c(-.15, .15)) +
  facet_wrap(~country) +
  labs(x = 'Questions Ranked From Easiest to Hardest According to Country',
       y = 'Males Correct Ratio - Females Correct Ratio',
       fill = 'Gender',
       title = 'Gender Comparison Within Countries',
       subtitle = 'Bar Plot')

It seems that males in Chile have a lot more success than females in math, while females in Lithuania have a lot more success than males in math. Males in both Korea and the USA have more success than females in math, but the difference isn’t as distinguishable as it is in Chile.

Another standard graph is the scatter plot. A scatter plot that compares the males correct ratio to females correct ratio can be made with geom_point, however, a more interesting version of the scatter plot can be made with geom_text(). A text plot is the same as a scatter plot, but with the points replaced with text. In the following example points are replaced with each question’s id.

ggplot(international_basic_graph_info, aes(correct_ratio_per_question, diff_male_female)) +
  geom_text(aes(label = question, color = dominant_gender)) +
  facet_wrap(~country) +
  labs(x = 'n Students Who Correctly Answered Question / Total Students',
       y = 'Males Correct Ratio - Females Correct Ratio',
       color = 'Gender',
       title = 'Gender Comparison Within Countries',
       subtitle = 'Scatter Plot')

Unfortunately, the graph seems overly clustered and there is not much gained by plotting the actually question id. However the scatter plot format does show a potentially interesting differences between countries.

It is worth exploring the scatter plot format further and it may even be worth combining the scatter plot with the a density plot (if a bar plot shows normal distribution then it can also be represented as a density plot).

The gridExtra package provides the ability to combine multiple graphs into one. A combination of scatter plot and density plots might be useful.³

get_legend<-function(myggplot){
  tmp <- ggplot_gtable(ggplot_build(myggplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}


combomain <- ggplot(international_basic_graph_info, aes(correct_ratio_per_question, diff_male_female)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = .5) +
  geom_point(aes(color = country), size = 3, alpha = 1/3) +
  labs(x = 'n Students Who Correctly Answered Question / Total Students',
       y = 'Males Correct Ratio - Females Correct Ratio',
       color = NULL)

combolegend <- get_legend(combomain)

combomain <- combomain +
  theme(legend.position = 'none')

blank_graph <-  list(
  labs(x = NULL, y = NULL, color = NULL),
  theme(legend.position = 'none',
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(), 
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank(),
        plot.background = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        panel.border = element_blank(),
        panel.background = element_blank()) 
)
  

combotop = ggplot(international_basic_graph_info, aes(correct_ratio_per_question, color = country, fill = country)) +
  geom_density(alpha = .25) + 
  labs(title = 'Gender and Correct Ratio Comparison of All Countries',
       subtitle = 'Scatter and Density Combination Plot') +
  blank_graph

comboright = ggplot(international_basic_graph_info, aes(diff_male_female, color = country, fill = country)) +
  geom_density(alpha = .25) +
  coord_flip() +
  blank_graph

library(gridExtra, warn.conflicts =  F)

grid.arrange(combotop, combolegend, combomain, comboright, 
             ncol=2, nrow=2, widths=c(4, 1.4), heights=c(1.4, 4))

detach(package:gridExtra)

The plot shows stark differences between Korea and Chile with regards to correct ratio while Lithuania clearly stands out as female dominant with regards to the the difference in gendered correct ratios.

The static scatter plot is nice, but it would be better if we could compare questions on an individual level. This graph will be made into an interactive graph to display question specific information.

Question domain information can also be shown in bar plot format.

ggplot(international_basic_graph_info, aes(question_rank, correct_ratio_per_question)) +
  geom_col(aes(fill = content_domain)) +
  scale_y_continuous(limits = c(0,1 )) +
  facet_wrap(~country) +
  labs(x = 'Questions Ranked From Easiest to Hardest According to Country',
       y = 'n Students Who Correctly Answered Question / Total Students', 
       fill = 'Content Domain',
       title = 'Content Domain Comparison of Countries', 
       subtitle = 'Barplot')

The bar plot does not show any clear pattern with regards to content domain.

ggplot(international_basic_graph_info, aes(question_rank, correct_ratio_per_question)) +
  geom_col(aes(fill = cognitive_domain)) +
  scale_y_continuous(limits = c(0,1)) +
  facet_wrap(~country) +
  labs(x = 'Questions Ranked From Easiest to Hardest According to Country',
       y = 'n Students Who Correctly Answered Question / Total Students', 
       fill = 'Cognitive Domain',
       title = 'Cognitive Domain Comparison of Countries', 
       subtitle = 'Barplot')

The bar plot suggests that most countries succeed in the Knowing cognitive domain. the bar plot also suggests that Korea, while still relatively successful to other countries in all domains, is less successful in the Reasoning cognitive domain.

However, sometimes a simple box plot is the most informative.

international_domain <- international_basic_graph_info %>%
  group_by(question, cognitive_domain, content_domain) %>%
  nest() %>%
  gather(type, domain, -c(question, data)) %>% 
  unnest()

ggplot(international_domain, aes(domain, correct_ratio_per_question)) +
  geom_boxplot(aes(color = country)) +
  geom_hline(yintercept = 0.5) +
  labs(x = 'Domain',
        y = 'n Students Who Correctly Answered Question / Total Students',
        color = 'Country',
        title ='TIMSS: Correct Response Rate Per Domain',
        subtitle = 'Boxplot') +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))

It turns out every country’s worst domain is Reasoning and every country favors domains Data and Chance and Knowing

A gendered look at these box plots is worth a look.

ggplot(international_domain, aes(domain, diff_male_female)) +
  geom_boxplot(aes(color = country)) +
  geom_hline(yintercept = 0) +
  labs(x = 'Domain',
       y = 'Males Correct Ratio - Females Correct Ratio',
       color = 'Country',
       title = 'TIMSS: Comparison of Male and Female Response Per Domain',
       subtitle = 'Boxplot') +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))

Females seems to excel in the Algebra and Knowing domains while males seem to excel in the Data and Chance and Number domains.

A scatter plot seems to be the most effective and showing information about individual questions.

ggplot(international_basic_graph_info, aes(correct_ratio_per_question, diff_male_female)) +
  geom_point(aes(color = content_domain, shape = cognitive_domain), position = 'jitter') +
  facet_wrap(~country) +
  labs(x = 'n Students Who Correctly Answered Question / Total Students',
       y = 'Males Correct Ratio - Females Correct Ratio',
       color = 'Gender',
       title = 'Gender Comparison Within Countries',
       subtitle = 'Scatter-Text Plot')

Korea performs uncharacteristically poorly on three questions all of which relate to the Data and Chance domain.

The problem with trying to graph TIMSS at the question level, is that there are more than 200 questions given to 4 different countries meaning that 800 point must be plotted. Comparing how each country performed on a specific question is near impossible because each the points in the plot have to have a different representation for more than 200 questions. Text doesn’t work because plotted text becomes illegible when more than two plotted text points overlap.

The best solution to comparing individual questions across countries is by using a network graph. A network graph connects nodes of similar information to each other. If the nodes of each country are ordered by rank, then the network graph can help show the differences in country responses to any particular question. A package optimized for network graphs is igraph, however, it requires learning data structures specific to network graphs. The below network graph is a simple network graph that can be made using ggplot() without learning any new data structures.⁴

network_graph_setup <- function(basic_graph, str_abr, link1, link2, from_top){
  basic_graph %>%
    select(-data) %>%
    mutate(id = str_c(str_abr, '_', question)) %>%
    mutate(link1 = link1) %>%
    mutate(link2 = link2) %>%
    mutate(order = from_top)
}

chl_network <- network_graph_setup(chl_basic_graph_info, 'chl', 'chl_ltu', 'bottom', 4)
ltu_network <- network_graph_setup(ltu_basic_graph_info, 'ltu', 'ltu_usa', 'chl_ltu', 3)
usa_network <- network_graph_setup(usa_basic_graph_info, 'usa', 'usa_kor', 'ltu_usa', 2)
kor_network <- network_graph_setup(kor_basic_graph_info, 'kor', 'top', 'usa_kor', 1)

international_nodes <- rbind(chl_network, ltu_network, usa_network, kor_network) %>%
  group_by(country, link1, link2) %>%
  nest() %>%
  gather(link_type, link, -c(data, country)) %>%
  unnest() %>%
  mutate(link = str_c(question, '_', link)) %>%
  arrange(order, question_rank) %>%
  mutate(country = factor(country, levels = unique(country), ordered = T)) %>%
  mutate(FIELD_LABL = str_to_title(FIELD_LABL)) %>%
  mutate(content_domain = str_to_title(content_domain)) %>%
  mutate(cognitive_domain = str_to_title(cognitive_domain))

ggplot(international_nodes, aes(question_rank, country, color = correct_ratio_per_question)) +
  geom_line(aes(group = as.factor(link)), color = 'grey80') +
  geom_point(size = 2) +
  scale_color_gradient2(high = 'forestgreen', mid = 'yellow', low = 'saddlebrown', midpoint = .5) +
  labs(x = 'Question Rank: From easiest to hardest',
       y = 'Country',
       color = 'Question Correct Ratio',
       title = 'TIMSS: Comparison of Country Correct Response Per Question',
       subtitle = 'Bi-Partite Network Graph')

This network graphs correct ratio, question rank, and links the same questions given to different countries.

The network graph is nice, but it is difficult to follow the links between nodes. This graph will be made into an interactive graph to display question specific information.

Conclusion

The priciple purpose of data visualization is to help people understand data. TIMSS is an assessment that provides a rich dataset. It is difficult to understand such a rich dataset by looking at numbers alone. This article only analyzed a small portion of the data available from TIMSS. The only dataset analyzed was the student achievement dataset. TIMSS offers other datasets that can combine the student level data provided by the student achievement dataset with student background information, school background information, and even teacher background information. The already rich dataset can be made richer, meaning that even more information can be derived at the question level. If more information can be derived at the question level, and better visualizations are created, then policy makers and educators can make better decisions with regards to education reform.

Further work will be dedicated to enriching the question-level data and visualization with student, school, and teacher background information.

Admittedly, the static graphs provided in this article leave much to be desired. We can see differences in how countries and genders perform on individual questions, however we cannot see information about the actual question. That is to say, we can see question rank, we can see question domain, we can see how a country performs on a question, however we cannot actually see the question id or the question summary. The attempted text-scatter plot fa iled because there was too much overlap.

A better alternative to these static graphs are interactive graphs. This paper identified three static graphs that make ideal candidates for informative interactive graphs: the two scatter plots and the network graph. Creation of these three interactive graphs will hopefully confirm that the TIMSS data can be effectively vizualized and analyzed at the question level.

“TIMSS 2011 User Guide for the International Database”↩
Learn more about tidy data by reading Wickham’s paper “Tidy Data”↩
Below code inspired by the STHDA article “ggplot2 - Easy way to mix multiple graphs on the same page - R software and data visualization”↩
An awesome resource for learning igraph is Katherine Ognyanova’s tutorial Network Analysis and Visualization with R and igraph ↩