Appendix C A List of Projects

[TODO] expand the description of this appendix.

We hand-picked these project ideas so that you can practice different techniques. Unless otherwise specified, you can take a crack at each of the projects listed below with these steps:

  • Import the data frame

    • A description of each variable, in plain English. For example, “lifeExp – Average life expectancy in each country, measured in years.” In the description, you may include information such as: What does this variable measure? If this is a numerical variable, what is the unit of the measurement? Is this a key variable, i.e., you plan to use it in the replication analyses?

    • Missing data analysis. For each key variable, investigate whether there are any missing values (empty cells, NA, or any other kind of illegitimate value). Report the missingness of each variable, that is, how many values are missing in each column.

    • For numerical variables, examine: mean, standard deviation, minimum, maximum, histogram, boxplot

    • For categorical variables: frequency tables

Afterwards, you may follow project-specific instructions.
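
As a rough illustration of the general steps above, here is a minimal sketch in Python that uses only the standard library. The toy table, its column names, and the choice to treat empty cells and NA as missing are all assumptions made for the example; they do not come from any of the project datasets. Histograms and boxplots are left out because they need a plotting library.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical toy data standing in for an imported data frame.
RAW = """country,continent,lifeExp
A,Asia,70.5
B,Europe,NA
C,Asia,65.0
D,,80.1
"""

# Markers treated as missing (an assumption; adjust for your dataset).
MISSING = {"", "NA"}

rows = list(csv.DictReader(io.StringIO(RAW)))


def count_missing(rows, column):
    """Count how many cells in a column hold a missing-value marker."""
    return sum(1 for r in rows if r[column].strip() in MISSING)


def numeric_summary(rows, column):
    """Mean, standard deviation, minimum and maximum of the non-missing values."""
    values = [float(r[column]) for r in rows if r[column].strip() not in MISSING]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def frequency_table(rows, column):
    """Frequency of each non-missing category in a column."""
    return Counter(r[column] for r in rows if r[column].strip() not in MISSING)


missing_counts = {col: count_missing(rows, col) for col in rows[0]}
life_summary = numeric_summary(rows, "lifeExp")
continent_counts = frequency_table(rows, "continent")

print(missing_counts)
print(life_summary)
print(continent_counts)
```

With a real project you would replace the embedded text with the file you downloaded and extend the set of missing-value markers to match whatever the dataset uses.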

C.1 Musician productivity

[TODO] provide a url for the data frame

Why are some classical composers so much more productive than others? In this exercise, you will explore the relationship between composers’ productivity and variables such as birth year, eminence, and period in musical history.

Data and background material

The dataset for this project is embedded in the published article, which can be accessed here. There is even a TED talk about this article!

Target techniques

Data wrangling, correlation, regression

Guided steps

  • Examine correlations, such as those among average annual productivity, eminence, and composer birth year, or between versatility and birth year.

  • Conduct a multiple regression with productivity as the outcome and eminence, versatility, and historical year as the predictors. Report and interpret the results.
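
To make the two guided steps concrete, the sketch below computes a Pearson correlation and fits a multiple regression by ordinary least squares, again in plain Python. Every number is a hypothetical toy value, the variable names are placeholders rather than the column names in the article's dataset, and centering the birth year is our own choice to keep the arithmetic well behaved. The outcome is generated from known coefficients so that you can check the fit; standard errors and p-values, which you will need for reporting, are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols_coefficients(predictors, outcome):
    """Least-squares fit; returns [intercept, slope_1, slope_2, ...]."""
    design = [[1.0] + list(vals) for vals in zip(*predictors)]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy values, one entry per composer.
eminence = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
versatility = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
birth_year_centered = [-50.0, 0.0, -30.0, 50.0, 30.0, 100.0]

# Outcome built from known coefficients so the fitted values can be checked.
productivity = [
    0.5 + 0.3 * e + 0.2 * v + 0.01 * y
    for e, v, y in zip(eminence, versatility, birth_year_centered)
]

r_em_vers = pearson_r(eminence, versatility)
coefficients = ols_coefficients(
    [eminence, versatility, birth_year_centered], productivity
)

print(round(r_em_vers, 3))
print([round(c, 3) for c in coefficients])
```

Each slope is read as the expected change in productivity for a one-unit change in that predictor while the other predictors are held constant.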

C.2 Moral machine [71]

Consider this: a driverless car must choose between killing two passengers and killing five pedestrians. Which outcome do you think is more acceptable? This is the question that a group of researchers at MIT presented to tens of thousands of participants across the world, in order to understand how ethical standards in different societies would fare when faced with moral dilemmas presented by machine intelligence such as self-driving cars.

Data and background material

The dataset for this project can be accessed here. You can read the study associated with this dataset, published in Nature, here. There is also a TED talk about this study.

Target techniques

Data visualization, correlation

Guided steps

  • Visualize the relationship between Moral Machine preferences and other variables at the country level such as economic inequality and gender gap in health, similar to Figure 4 in the original article.

  • Compute the correlation between Moral Machine preferences and those variables at the country level.

C.3 Cost of lying [72]

If lying is cognitively harder than telling the truth, would “practice makes perfect” hold true for deception as well? A group of researchers explored the effects of practice on one’s ability to lie.

Data and background material

You can access the dataset for this project here. You can also read the study associated with this dataset here.

Target technique

ANOVA

Guided steps

  • Visualize the interaction between experiment phase (baseline vs. practice vs. test) and condition (frequent-lie vs. frequent-truth vs. control) on the test trials’ latency.

  • Conduct a three-by-three mixed ANOVA on the effect of experiment phase and condition on the test trials’ latency.
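
Before plotting the interaction or running the ANOVA, it helps to tabulate the mean latency in each of the nine cells of the design. The sketch below does only that; it does not perform the mixed ANOVA itself. The latencies are hypothetical, the millisecond unit is an assumption, and the layout (one tuple per participant, holding the baseline, practice, and test values) is chosen for brevity rather than taken from the study's data file.

```python
import statistics

PHASES = ("baseline", "practice", "test")

# Hypothetical latencies in milliseconds (unit assumed).
# Each tuple is one participant: (baseline, practice, test).
latencies = {
    "frequent-lie": [(900, 820, 780), (940, 860, 800)],
    "frequent-truth": [(910, 900, 890), (930, 920, 950)],
    "control": [(920, 880, 860), (900, 860, 840)],
}

# Mean latency for every condition-by-phase cell.
cell_means = {
    condition: {
        phase: statistics.mean(p[i] for p in participants)
        for i, phase in enumerate(PHASES)
    }
    for condition, participants in latencies.items()
}

# Print the cell means as a small text table.
print(f"{'condition':<16}" + "".join(f"{ph:>10}" for ph in PHASES))
for condition, means in cell_means.items():
    print(f"{condition:<16}" + "".join(f"{means[ph]:>10.1f}" for ph in PHASES))
```

Lines that are not parallel when these cell means are plotted against phase, one line per condition, are the visual signature of an interaction.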


  71. We would like to thank Steve Highstead for bringing this study to our attention.

  72. We would like to thank Alex Rivard for bringing this study to our attention.