---
title: "Partners or Strangers? Cooperation, Monetary Trade, and the Choice of Scale of Interaction"
output: html_document
date: '2023-01-27'
---

```{r setup, include=FALSE}

# load packages
library(haven)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(knitr)
library(Rtools)
library(devtools)
devtools::install_github("nicolash2/ggbrace")
library(ggbrace)
library(mfx)
library(stargazer)

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(error = TRUE)
options(dplyr.summarise.inform = FALSE)
```

# R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:


# Overview: load data and set up all important variables 

```{r}

data <- read_dta("data/stata/elaborations/data_money_partners_strangers.dta")

# only number of people in group given (2, 12 or 24), convert to group size small or large
data$groupSize <- NA
data$groupSize[data$numInGroup == 2] <- "partnership" 
data$groupSize[data$numInGroup > 2] <- "large" 

# define ResonseTime as sum over all response times
data$ResponseTime = rowSums(data[33:42])

# delete unneccessary columns
data <- data %>%
  dplyr::select(session, treatment, cycle, periodCycle, Subject, numInGroup, type, choice, otherChoice, outcome, 
         Profit, order, pref, gender, groupSize, RightAnswers, ResponseTime)

# impression of data

view <- as.data.frame(data)
head(view)

```

The variables are declared as follows:

session: session of the experiment

treatment:    1=token condition,    2=control 

cycle: cycle of the session, 6 cycles per session, where 1-4 are training phase

periodCycle: periods in cycle, cycles end randomly after period 16

Subject: ID for person participating in the experiment

type:         1=consumer,           2=producer

choice:             0=no choice,    
  for producer:     1=do not help,  2=give help,    3=give help if consumer gives token only  
  for consumer:     4=keep ticket,  5=give ticket,  7=give ticket if producer gives help only 
  
otherChoice: choice of partner

outcome:      0=no cooperation      1=cooperation

Profit: profit in periodCycle

order: order of group size in training (12-2-12-2 or 2-12-2-12), 0=start with large group 1=start with partnership
  
pref:         preference for large groups,           1=fixed pair,              2=random groups of 24

gender:       1=female,             2=male

groupSize: partnership (2) or large (12 or 24)

RightAnswers: amount of right answers in comprehension test on the experimental instructions

ResponseTime: response time for all questions in comprehension test on the experimental instructions


# CHAPTER A: TRAINING PHASE

We want to analyze the difference between cooperation rates and efficiency and the willingness to help in token and control group. 

Result 1: Average cooperation rates were higher in partnerships than in large groups. However, partnerships did not create significantly more surplus.

Result 2: In control give help (unconditional Z) was the predominant choice.

Result 3: In tokens producers were reluctant to help without being concurrently compensated and most of them choose to give help only in exchange for a token.

```{r}

# result 1
knitr::kable(data %>% # kable function for nicer output
  filter(cycle<=4 & periodCycle<=16) %>% # set cycle to training phase
  group_by(treatment, numInGroup) %>% # differentiate by treatment and group size
  summarize(
    cooperation = mean(outcome),
    efficiency = mean((Profit-4.5)/4.5) # normalize by 4.5
    # 4.5 corresponds to payoff if no one cooperates (see p. 200 in paper)
  ))

# result 2
bp1 <- data %>% 
  filter(cycle<=4 & periodCycle<=16 & treatment==2) %>% # training, control
  filter(type==2) %>% # producer
  count(choice) %>% # get percentages of decisions
  mutate(freq=n/sum(n)) %>%
  group_by(choice) # 0.597 give help

# make a barplot for illustration
knitr::kable(bp1)
bp1 <- as.data.frame(bp1)
bp1$help <- ifelse(bp1$choice==1, "no help", "help") # rename for plot

barplot(height=bp1$freq, names=bp1$help, xlab="Choice", ylab="Frequency of Choice")

# result 3
bp2 <- data %>% 
  filter(cycle<=4 & periodCycle<=16 & treatment==1) %>% # training, token
  filter((type==2 & otherChoice!=0) |(type==1 & choice != 0)) %>% # do not include no choice
  group_by(type) %>%
  count(choice) %>% # get percentages of decisions
  mutate(freq=n/sum(n)) %>%
  group_by(choice) # 0.634 of producers choose help if give

knitr::kable(bp2)
bp2 <- filter(bp2, type==2)
bp2 <- as.data.frame(bp2)
bp2$help <- ifelse(bp2$choice==1, "help", ifelse(bp2$choice==2, "no help", "help if token"))

barplot(height=bp2$freq, names=bp2$help, xlab="Choice", ylab="Frequency of Choice")

```

# TABLE 4: HOW MONEY AND GROUP SIZE INFLUENCE EFFICIENCY

Let us verify the observations by running a regression. We need cooperation and efficiency rates for tokens and control for the training phase as well as group sizes and transform all categorical variables to dummy variables. Then fit two regressions, one where cooperation is the dependent variable and one where efficiency is the dependent variable.

```{r results="asis"}

# get data for training phase for token and control and group size
data_t4 <- data %>% 
  filter(cycle<=4 & periodCycle<=16) %>%
  group_by(treatment,groupSize, periodCycle, cycle) %>%
  summarize(
    cooperation = mean(outcome),
    efficiency = mean((Profit-4.5)/4.5)
  )

# transform in dummy variables with interaction for treatment and group size
control <- ifelse(data_t4$treatment == 2, 1, 0)
tokens <- ifelse(data_t4$treatment == 1, 1, 0)
partnership <- ifelse(data_t4$groupSize == "partnership", 1, 0)
large <- ifelse(data_t4$groupSize == "large", 1, 0)
data_t4$tokens_x_small=tokens * partnership
data_t4$tokens_x_large=tokens * large
data_t4$control_x_large=control * large
data_t4$c1<-ifelse(data_t4$cycle == 1, 1, 0)
data_t4$c2<-ifelse(data_t4$cycle == 2, 1, 0)
data_t4$c3<-ifelse(data_t4$cycle == 3, 1, 0)
data_t4$c4<-ifelse(data_t4$cycle == 4, 1, 0)

# delete dummy for cycle 4 as cycle 4 is a linear combination of cyles 1&2&3
data_t4 <- subset(data_t4, select = -c(c1))  

# fit lm models
m1 <- lm(cooperation ~ control_x_large + tokens_x_small + tokens_x_large + c2+c3+c4, data=data_t4)
m2 <- lm(efficiency ~ control_x_large + tokens_x_small + tokens_x_large + c2+c3+c4, data=data_t4)

# compare models in stargazer
stargazer(m1, m2, type="html",
          colums.labels = c("Model 1", "Model 2")
)
```
We indeed see that cooperation rates in partnerships were higher than in groups, however, efficiency was higher in groups.

We also see that participants in tokens selected to interact in large groups more frequently than in control. 0.557 of participants in tokens prefer large groups, but only 0.393 in control.

```{r}

# result 2
knitr::kable(data %>% 
  filter(cycle==4 | cycle==5) %>% # we want to look at cycles 5&6, 
  # but prefs are expressed the cycle before, so filter cycles 4&5
  group_by(treatment) %>%
  summarize(
    pref_for_large_groups = mean(pref-1, na.rm=TRUE) # -1 as pref is coded 1/2
  )) # 0.557 vs 0.393

```

# TABLE 5: SHARE OF PREFERENCE FOR LARGE GROUPS

We want to explore the preference for large groups in control and token condition. 

Result 1: Over all cycles the preference for large groups is higher in tokens than in control.

Result 2: Looking only at the selection phase, that is cycles 5 and 6, the preference for large groups was 0.432 and 0.354 in control and 0.574 and 0.542 in tokens.

```{r}

# result 1
knitr::kable(data %>% 
  group_by(treatment) %>%
    summarize(
      pref_for_large_groups = mean(pref-1, na.rm=TRUE)
    )) # 0.421 and 0.546

# result 2
# selection phase only, as choices are made the cycle before, filter for 4&5 instead of 5&6
bp3 <- data %>% 
  filter(cycle==4 | cycle==5) %>%
  group_by(cycle,treatment) %>%
  summarize(
    pref_for_large_groups = mean(pref-1, na.rm=TRUE)
  ) # 0.432 and 0.573, 0.354 and 0.542

knitr::kable(bp3)
bp3 <- filter(bp3, cycle==5)
bp3 <- as.data.frame(bp3)
bp3$treat <- ifelse(bp3$treatment==1, "token", "control")

barplot(height=bp3$pref_for_large_groups, names=bp3$treat, xlab="Treatment", ylab="Mean preference for large groups")

```

We see that participants choose to interact in groups more frequently in tokens than in control.

# TABLE 6: HOW MONEY AFFECTS PREFERENCES FOR LARGE GROUPS 

We want to show that the result above is robust to separately considering cycles 5 and 6 and controlling for order, sex and understanding of the experiment, expressed by the amount of right answers and the response time on the comprehension test.

```{r}
data_t6 <- data %>%
  filter(cycle==4|cycle==5) %>%
  group_by(session, Subject, treatment, groupSize, cycle, order, RightAnswers, ResponseTime, gender) %>%
  summarize(
    pref_for_large_groups = mean(pref-1, na.rm=TRUE)
  )

# dummy coding
data_t6$tokens = ifelse(data_t6$treatment == 2, 0, 1)
data_t6$cycle6 = ifelse(data_t6$cycle == 5, 1, 0)

# use probitmfx function for panel probit regression, cluster for sessions
probitmfx(pref_for_large_groups ~ tokens + cycle6 + order + RightAnswers + ResponseTime + gender, clustervar1 = "session", data = data_t6)

```
The tokens coefficient is significant, meaning that subjects in tokens more often preferred to interact in large groups.

# FIGURE 1: DISTRIBUTION OF HELP IMBALANCE

As cooperation rates were not significantly different between conditions as seen above, the exposure to tokens must be the reason for the preference of large groups. We will show that the experience of full cooperation, that is, a subject received the same amount of help they gave, and help imbalance influenced the decisions for large groups.

To compute the help imbalance we compare the times that a subject received help and the times they gave help. That is, we have an imbalance if the amount of giving and receiving help differed. Full cooperation is attained if help is always exchanged and the imbalance is 0. Help imbalance is -1 if help was always given but never received and +1 vice versa. While subjects have the ability to punish their counterpart for exploiting behaviour in partnerships, this is not possible in large groups where subjects are matched with new partners each time. The question is whether tokens can mitigate this exploitation hazard. Let us look at the plots for fixed pairs and large groups in control and tokens condition regarding the help imbalance.
 

```{r}

# get data for individual graphs

# top left
# fixed pairs in control
a = data %>%
  filter(cycle<=4 & periodCycle<=16 & treatment==2 & numInGroup==2) %>%
  group_by(Subject, session, cycle) %>%
  summarize(
    N_help_received=sum(outcome==1&type==1),
    N_help_given=sum(outcome==1&type==2),
    imbalance=(N_help_received-N_help_given)/8,
    # each cycle consists of 16 periods, where each subject is 8 times consumer and 8 times producer
    # therefore /8 to get shares
    full_coop=imbalance==0 & N_help_received/8==1, # share of help received==1
    panel="a"
  )

# top right
# fixed pairs in tokens
b = data %>%
  filter(cycle<=4 & periodCycle<=16 & treatment==1 & numInGroup==2) %>%
  group_by(Subject, session, cycle) %>%
  summarize(
    N_help_received=sum(outcome==1&type==1),
    N_help_given=sum(outcome==1&type==2),
    imbalance=(N_help_received-N_help_given)/8,
    full_coop=imbalance==0 & N_help_received/8==1,
    panel="b"
  )

# bottom left 
# large groups in control
c = data %>%
  filter(cycle<=4 & periodCycle<=16 & treatment==2 & numInGroup>2) %>%
  group_by(Subject, session, cycle) %>%
  summarize(
    N_help_received=sum(outcome==1&type==1),
    N_help_given=sum(outcome==1&type==2),
    imbalance=(N_help_received-N_help_given)/8,
    full_coop=imbalance==0 & N_help_received/8==1,
    panel="c"
  )

# bottom right 
# large groups in tokens
d = data %>%
  filter(cycle<=4 & periodCycle<=16 & treatment==1 & numInGroup>2) %>%
  group_by(Subject, session, cycle) %>%
  summarize(
    N_help_received=sum(outcome==1&type==1),
    N_help_given=sum(outcome==1&type==2),
    imbalance=(N_help_received-N_help_given)/8,
    full_coop=imbalance==0 & N_help_received/8==1,
    panel="d"
  )

# combine data
abcd = rbind(a,b,c,d)

# count imbalance per panel
hist_count <- abcd %>%
  group_by(panel, imbalance) %>%
  count(imbalance) 

# plot all graphs with shares of imbalance
# use geom_brace from ggbrace package from github to plot nice braces with shares

pa <- ggplot(data=a, aes(x=imbalance, fill=full_coop)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth=0.125, center=0) +
  ylim(0,0.65) + 
  ggtitle("Control condition") + ylab("Fixed pairs") + xlab("") + 
  scale_fill_discrete("", labels=c("Partial or no cooperation", "Full cooperation")) +
  geom_brace(aes(c(-1,-0.1),c(0.2,0.22), # defines position and size of brace
                 # get data from a where imb<0, compute share of imb<0
                 label=round(sum(hist_count$n[(hist_count$panel=="a")==1 & (hist_count$imbalance<0)==1])/sum(hist_count$n[(hist_count$panel=="a")]),3)), 
             inherit.data=F, labelsize = 3) + 
  geom_brace(aes(c(0.1,1),c(0.2,0.22), 
                 # get data from a where imb>0, compute share of imb>0
                 label=round(sum(hist_count$n[(hist_count$panel=="a")==1 & (hist_count$imbalance>0)==1])/sum(hist_count$n[(hist_count$panel=="a")]),3)), 
             inherit.data=F, labelsize = 3)

# repeat for b,c,d

pb <- ggplot(data=b, aes(x=imbalance, fill=full_coop)) + 
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth=0.125, center=0) +
  ylim(0,0.65) + 
  ggtitle("Token condition") + ylab("") + xlab("") +
  geom_brace(aes(c(-1,-0.1),c(0.2,0.22), 
                 label=round(sum(hist_count$n[(hist_count$panel=="b")==1 & (hist_count$imbalance<0)==1])/sum(hist_count$n[(hist_count$panel=="b")]),3)), 
             inherit.data=F, labelsize = 3) +
  geom_brace(aes(c(0.1,1),c(0.2,0.22), 
                 label=round(sum(hist_count$n[(hist_count$panel=="b")==1 & (hist_count$imbalance>0)==1])/sum(hist_count$n[(hist_count$panel=="b")]),3)), 
             inherit.data=F, labelsize = 3)

pc <- ggplot(data=c, aes(x=imbalance)) + 
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth=0.125, center=0) +
  ylim(0,0.65) + 
  xlab("Help imbalance") + ylab("Large groups") +
  geom_brace(aes(c(-1,-0.1),c(0.2,0.22), 
                 label=round(sum(hist_count$n[(hist_count$panel=="c")==1 & (hist_count$imbalance<0)==1])/sum(hist_count$n[(hist_count$panel=="c")]),3)), 
             inherit.data=F, labelsize = 3) +
  geom_brace(aes(c(0.1,1),c(0.2,0.22), 
                 label=round(sum(hist_count$n[(hist_count$panel=="c")==1 & (hist_count$imbalance>0)==1])/sum(hist_count$n[(hist_count$panel=="c")]),3)),
             inherit.data=F, labelsize = 3)

pd <- ggplot(data=d, aes(x=imbalance)) + 
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth=0.125, center=0) +
  ylim(0,0.65) +
  xlab("Help imbalance") + ylab("") +
  geom_brace(aes(c(-1,-0.1),c(0.2,0.22), 
                 label=round(sum(hist_count$n[(hist_count$panel=="d")==1 & (hist_count$imbalance<0)==1])/sum(hist_count$n[(hist_count$panel=="d")]),3)), 
             inherit.data=F, labelsize = 3) +
  geom_brace(aes(c(0.1,1),c(0.2,0.22), 
                 label=round(sum(hist_count$n[(hist_count$panel=="d")==1 & (hist_count$imbalance>0)==1])/sum(hist_count$n[(hist_count$panel=="d")]),3)),
             inherit.data=F, labelsize = 3)

# combine all graphs and plot
ggarrange(
  pa, pb, pc, pd,
  common.legend=TRUE, legend="bottom"
)

# amount of 0 help imbalance
knitr::kable(data %>% 
  filter(cycle<=4 & periodCycle<=16) %>%
  group_by(treatment, groupSize, Subject, session, cycle) %>%
  summarize(
    N_help_received=sum(outcome==1&type==1),
    N_help_given=sum(outcome==1&type==2),
    imbalance=(N_help_received-N_help_given )/8
  ) %>%
  group_by(treatment, groupSize) %>%
  count(treatment, imbalance==0) %>% # count how often 0 imb was attained per treatment
  summarize(
    share_imbalance_yes_vs_no =  n/sum(n) # compute shares
  )) # 0.563 vs 0.156, 0.609 vs 0.299

```

We see that a zero imbalance was more common in partnerships than large groups and also more common in tokens than control. Also note that in large groups more people were exploited than in large groups.

# TABLE 7

Let us look at how the experiences during the training phase impacted the preference for large groups in the selection phase. Therefore look at the preferences for cycles 5 and 6 as well as experienced imbalances in pairs vs groups and experienced full cooperation rate.

(I did not obtain the same result as in the paper, therefore these aggregations are likely not completely accurate.)

```{r}
pref4 <- data %>%
  filter(cycle==4) %>% # choice for cycle 5
  group_by(session, Subject) %>%
  summarize(
    pref4 = mean(pref-1, na.rm=TRUE)
  )

pref5 <- data %>%
  filter(cycle==5) %>% # choice for cycle 6
  group_by(session, Subject) %>%
  summarize(
    pref5 = mean(pref-1, na.r=TRUE)
  )

imb <- data %>%
  filter(cycle<=4 & periodCycle<=16) %>%
  group_by(Subject, session) %>%
  summarize(
    treatment=treatment,
    N_gift_received=sum(outcome==1&type==1),
    N_gift_given=sum(outcome==1&type==2),
    imbalance=(N_gift_received - N_gift_given )/8
  ) 

imb <- distinct(imb)

fc <- data %>%
  filter(cycle<=4 & periodCycle<=16) %>%
  group_by(Subject, session) %>%
  summarize(
    full_coop=mean(outcome)
  )

merge1 <- merge(pref4, pref5, by=c("session", "Subject"))
merge2 <- merge(merge1, imb, by=c("session", "Subject"))
merge3 <- merge(merge2, fc, by=c("session", "Subject"))

data_t7 <- data %>%
  filter(cycle==4|cycle==5) %>%
  filter(periodCycle<=16) %>%
  group_by(session, Subject, treatment, numInGroup, cycle, order, RightAnswers, ResponseTime, gender) %>%
  summarize(
    pref_for_large_groups = mean(pref-1, na.rm=TRUE)
  )

merge4 <- merge(merge3, data_t7, by=c("session", "Subject"))
data_t7 <- merge4

tokens <- ifelse(data_t7$treatment.x == 1, 1, 0)
cycle6 <- ifelse(data_t7$cycle == 5, 1, 0)
data_t7$cycle6 <- ifelse(data_t7$cycle == 5, 1, 0)
data_t7$tokens_x_cycle5=tokens * !cycle6
data_t7$tokens_x_cycle6=tokens * cycle6

full_coop_dummy <- ifelse(data_t7$full_coop >0, 1, 0)
partnership <- ifelse(data_t7$numInGroup == 2, 1, 0)
data_t7$full_coop2 <- full_coop_dummy * partnership
data_t7$imbalance2 <- data_t7$imbalance * partnership
data_t7$imbalance12 <- data_t7$imbalance * !partnership

probitmfx(pref_for_large_groups ~ tokens_x_cycle5 +tokens_x_cycle6 + cycle6 + imbalance2 + imbalance12 + full_coop2 + order + 
            RightAnswers + ResponseTime + gender, 
          clustervar1 = "session", data = data_t7)

```

Note that although the coefficients are not the same as in the paper, they go in the same direction. Full_coop2 is the exception, however, the coefficient is not significant in this model.

This regression shows that subjects who had a positive imbalance, that is, received more help than they gave, were more likely to choose large groups, while those that were exploited and had a negative help imbalance were more likely to prefer partnerships. This makes sense as in partnerships it is possible to directly punish the other player.

# TABLE 8: HOW MONETARY TRADE AND GROUP SIZE INFLUENCE EFFICIENCY

Lastly, we want to analyze differences in efficiency and cooperation rates between group sizes and treatment in the selection phase. Efficiency in partnerships are quite similar around 55% for both conditions, but in groups efficiency was significantly higher in tokens than in control.
We run a linear regression for the selection phase where select treatment, group size, cycle and session from the data.

```{r}

knitr::kable(data %>%
  filter(cycle==5|cycle==6) %>% # selection phase only
  filter(periodCycle<=16) %>%
  group_by(treatment, groupSize) %>%
  summarize(
    efficiency = mean((Profit-4.5)/4.5)
  ))

data_t8 <- data %>%
  filter(cycle==5|cycle==6) %>% # selection phase only
  filter(periodCycle<=16) %>%
  group_by(treatment, groupSize, cycle, session) %>%
  summarize(
    cooperation = mean(outcome),
    efficiency = mean((Profit-4.5)/4.5) # normalize profit
  )

# dummy coding
control <- ifelse(data_t8$treatment == 2, 1, 0)
tokens <- ifelse(data_t8$treatment == 1, 1, 0)
partnership <- ifelse(data_t8$groupSize == "partnership", 1, 0)
large <- ifelse(data_t8$groupSize == "large", 1, 0)
data_t8$tokens_x_small=tokens * partnership
data_t8$tokens_x_large=tokens * large
data_t8$control_x_large=control * large
data_t8$cycle6 <- ifelse(data_t8$cycle == 6, 1, 0)

# fit linear model
t8 <- lm(efficiency ~ control_x_large + tokens_x_small + tokens_x_large + cycle6, data=data_t8)
summary(t8)

```

The linear regression shows how efficiency varies with group sizes and treatment. The default condition is control and partnerships. We can verify that in partnerships efficiency was similar but in large groups a higher efficiency was attained in tokens.

# TABLE 9: COOPERATION IN THE SELECTION PHASE

Cooperation rates in partnerships are also quite similar at around 85% for both conditions, but in groups the average cooperation rate was significantly higher in tokens than in control. Using the same data as above we can also run a linear regression for cooperation rates. Again, the default condition  is control and partnerships.

```{r}

knitr::kable(data %>%
  filter(cycle==5|cycle==6) %>% # selection phase only
  filter(periodCycle<=16) %>%
  group_by(treatment, groupSize) %>%
  summarize(
    cooperation = mean(outcome)
  ))

# fit linear model
t9 <- lm(cooperation ~ control_x_large + tokens_x_small + tokens_x_large + cycle6, data=data_t8)
summary(t9)

```
Our first impressions are again verified. In partnerships cooperation rates are statistically similar across conditions. We see that in both tokens and control cooperation rates drop when looking at large groups. However, they drop significantly lower in control than in tokens.