Last shall be first: A field study of biases in sequential performance evaluation on the Idol series
Forthcoming in the Journal of Economic Behavior and Organization (2009)
Lionel Page∗
Westminster Business School, University of Westminster
35 Marylebone Road, NW1 5LS, London, UK
l.page@wmin.ac.uk

Katie Page
Heythrop College, University of London
Kensington Square, W8 5HQ, London, UK
k.page@heythrop.ac.uk
∗ Corresponding author. Tel: +44 2 07911 5000 (2706)
Abstract
When performances are evaluated they are very often presented in a sequential order. Previous research suggests that the sequential presentation of alternatives may induce systematic biases in the way performances are evaluated. Such a phenomenon has been scarcely studied in economics. Using a large data set of performance evaluation in the Idol series (N=1522), this paper presents new evidence about the systematic biases in sequential evaluation of performances and the psychological phenomena at the origin of these biases.
JEL codes: D81, Z1
Keywords: order effects, memory, television show
We frequently make judgements and decisions about information which is presented to us in a sequential manner. This is particularly the case when we have to quickly assess the performance of individuals within a pool of contestants: job interviews, singing auditions, political debates, or even dating evenings. The psychological literature suggests that sequential presentation of information may influence the way each piece of information is processed and recorded. Studies in economics (Neilson, 1998) and marketing (Novemsky and Dhar, 2005) have also found that choices made in sequence may depend on the history of the sequence.

This issue is of special importance for situations of performance evaluation. If the order in which people are assessed has any effect on the final evaluation of individual performances, the evaluation process is biased. Stated simply, what should be completely irrelevant information (the passing order) plays a significant role in the evaluation process. The issue of potential bias in performance evaluation raises two main concerns: efficiency and fairness. First, from the perspective of the assessor, any bias in the evaluation process results in a loss of efficiency because the best options may eventually not be selected. Second, from the perspective of the contestant, any bias in the evaluation process raises the question of the fairness of the selection process: are some contestants disadvantaged relative to others for irrelevant reasons?

If there are biases in evaluation processes involving a sequential ordering of contestants/options, we need to be aware of them in order to design strategies to minimize their adverse effects and ensure that outcomes are as fair and efficient as possible. Paradoxically, few studies have attempted to assess empirically the presence of systematic biases in the sequential evaluation of performance (Bruine de Bruin, 2005).
In economics, the fairness and efficiency of performance evaluation procedures have mostly been studied relative to the possible biases arising from the judges’ incentives (Prendergast and Topel, 1993; Clerides and Stengos, 2006) and from discriminating preferences (Goldin and Rouse, 2000; Segrest Purkiss et al., 2006). The economic literature has largely ignored the possible distortions arising from the pure cognitive biases in the evaluation of performance.
Such biases, if significant and of practical importance, must however be studied carefully in order to limit their detrimental effects on the efficiency and fairness of selection procedures which rely on the evaluation of performances. Using a unique dataset on the Idol series spanning competitions from 8 countries (Australia, Brazil, Canada, Germany, India, Netherlands, United Kingdom, USA), this paper contributes to our understanding of order biases in performance evaluation in a naturalistic setting. Because of their generic format, the Idol shows provide a large set of identical situations where a group of individuals have to perform sequentially and are assessed by television viewers who vote for them. The statistical analysis of this large dataset of 1,522 performances over 165 shows confirms some of the previous empirical literature on ordering effects and furthers our understanding of the psychological phenomena underlying these effects. Our results suggest that systematic biases in sequential evaluation of performance arise through two parallel processes: the effect of the ordering on the propensity to remember each candidate, and the propensity to assess a contestant by comparing him or her to the previous contestant(s). The remainder of the paper is organized as follows. Section 1 presents the literature on sequential biases in performance evaluation, Section 2 presents our dataset and Section 3 our results. Section 4 concludes.
1 Sequential biases in performance evaluation
There are two main reasons why biases may result from sequential ordering. First, judges may not remember equally well the different performances in the sequence; second, the criteria or benchmark of the evaluation may change over time. For example, the evaluation of a performance may depend on the history of previous performance(s). These potential caveats may produce two types of biases. First, ordering biases may result because the evaluation of a contestant’s performance is conditional on his or her passing order. Second, the evaluation of a performance may directly depend on the quality of the previous performance(s). We will call these two types of biases respectively “sequential order bias” and “sequential history bias”.
1.1 Sequential order bias
Few studies have addressed the effect of order on judgments of performance. Generally the research evidence indicates that later serial positions benefit from more positive evaluations. The evidence comes from several naturalistic studies on performance in competitions, including a study on international synchronized swimming competitions (Wilson, 1977), work on the Queen Elizabeth Contest for violin and piano (Glejser and Heyndels, 2001), and studies of the Eurovision song contest (Bruine de Bruin, 2005) and ice skating competitions (Bruine de Bruin, 2005, 2006).
Wilson (1977) showed that there was a significant negative correlation between serial positions and final ranks in the 1973 world synchronized swimming championships and in an amateur meet held in the same year, such that better rankings tended to be in later serial positions. An evaluation of the judgments by 15 experts in the Queen Elizabeth Contest for classical violin and piano by Glejser and Heyndels (2001) showed that musicians who performed on a later day in the competition received better judgments. Moreover, higher overall rankings were also given for performances scheduled later in the week and later in the evening (Glejser and Heyndels, 2001). Bruine de Bruin (2005) examined the effect of order in both the Eurovision song contest and ice skating judgments. She found an increasing linear trend such that contestants who were in the later serial positions had significantly higher ratings than those in the earlier positions. This effect was also found in her follow-up study on ice skating (Bruine de Bruin, 2006) with a larger data set.

Two potential explanations exist in the literature for this observed order bias. First, there is a well-established literature on the effects of order on memory. The serial position effect is the phenomenon whereby recall accuracy (usually for words) varies as a function of an item’s position within a list (Murdock, 1962). Specifically, there are two main effects: the primacy effect and the recency effect. When asked to freely recall items from a list, participants generally remember better those stimuli at both the beginning (primacy effect) and end (recency effect) of a sequence, resulting in a roughly U-shaped curve. The serial position effect is a robust, well-researched phenomenon in the cognitive psychology literature (Glanzer and Cunitz, 1966; Burgess and Hitch, 1999; Gershberg and Shimamura, 1994).
These serial position effects have been demonstrated both in the laboratory (Singh and Cole, 1993; Snyder and Harrison, 1997) and in naturalistic settings (Terry, 2005; Pieters and Bijmolt, 1997). Different memory mechanisms have been proposed to underlie the primacy and recency effects, with primacy effects linked to long term memory and recency effects explained through short term memory mechanisms (Glanzer and Cunitz, 1966). Moreover, several factors have been found to influence or alter their effects, for example distinctiveness (Neath and Crowder, 1996), emotional content (Rubin and Friendly, 1986; Maratos et al., 2000; Snyder and Harrison, 1997), prolonged distraction (Glenberg et al., 1980) and the length of the series (Anderson et al., 1998). Generally though, holding other factors constant, first and last items are remembered better. The interest in the role of memory and its limitations in economic decision making has grown recently (Dow, 1991; Piccione and Rubinstein, 1997; Benabou and Tirole, 2002; Mullainathan, 2002; Devetag and Warglien, 2003; Bernheim and Thomadsen, 2005; Devetag and Warglien, 2007). For instance Mullainathan (2002) proposes an exponential decay of recall probabilities, compatible with a recency effect:

$$E(f_k) = f \, \frac{1 - \rho^{-k}}{1 - \rho}$$

where $f_k$ is the probability to forget the event and $\rho \in \{0, 1\}$ a parameter
representing the propensity to remember an event from one period to the next. Recently, such a recency effect has been integrated by Sarafidis (2007) in a model where individuals can anticipate such biases and use them strategically. More generally a recency effect will be compatible with ∂E(fk )/∂k ≥ 0 for k ≥ k , and a primacy effect with ∂E(fk )/∂k ≤ 0 for k < k . These memory explanations have been seldom linked to the evaluation of sequential performance, and this issue has not yet been studied in economics. If we extend the idea suggested by Mullainathan (2002) to model memory limitations as the effect of time on the probability to remember an event, it is clear that contestants whose performance/qualities are less likely to be remembered are less likely to be positively selected. The primacy and recency effect would therefore suggest that contestants who are in earlier and later positions will benefit positively as a result of their performances being remembered better. A second possible explanation for the empirical results is proposed by Bruine de Bruin (2005). They explain their results through a direction of comparison effect. Specifically, they posit that as each new option is presented judges search for unique features (positive or negative) in the performance and, if found, these influence upwardly (for positive unique features) and downwardly (for unique negative features) the judgments, because more weight is given to these unique aspects rather than any overlapping features of the performance. Overall, they conclude that the direction-of-comparison effect is most prominent in tasks that promote sequential judgment, and in options with unique positive features (Bruine de Bruin and Keren, 2003). 
They further speculate that the direction-of-comparison effect may have contributed to the linear order effects found in jury evaluations of world-level figure skating contests (Bruine de Bruin and Keren, 2003), international synchronized swimming competitions (Wilson, 1977), the Eurovision Song Contest for popular music (Bruine de Bruin, 2005), and the Queen Elizabeth Contest (Glejser and Heyndels, 2001). However, this would only be the case if the judges were focused on the unique positive features of each performance, which may or may not have been the case. Fundamentally, the idea of a direction-of-comparison effect relies on a specific form of reference dependent preferences which is one of the most important hypotheses in modern behavioural economics (Bruni and Sugden, 2007). The direction of comparison effect supposes that performances are evaluated for their differences relative to a previous set of performances. If contestants present positive and negative differences relative to previous ones, and if the judges focus on the positive ones, then a systematic positive trend in the evaluation of contestants should appear. It is however not clear why judges would focus on positive differences, while not taking into account negative differences.
1.2 Sequential history bias
The second possible bias in the sequential evaluation of performance is that each person’s performance evaluation may depend on the performance of the previous person relative to whom they are often implicitly compared. For each judgment in a given sequence (with the exception of the first judgment), it is
the case that the judge has already very recently evaluated another target on that same dimension. Therefore, the knowledge the judge has activated to make that previous judgment is highly accessible at the time the next judgment has to be made. Consequently, this knowledge of the previous judgment is likely to influence the subsequent judgment (Damisch et al., 2006). Thus, the evaluation of a target at almost any point of the sequence is likely to be affected by the information that was activated during the preceding judgment of another target on that dimension (Damisch et al., 2006, p. 167). The selective accessibility model of Mussweiler et al. (2004) outlines two main comparison processes - contrast and assimilation - that take place during the assessment of two consecutive stimuli. These comparison processes are, from an economic point of view, two different forms of reference dependent evaluation. Contrast occurs when judges focus on differences in the stimuli, and assimilation occurs when the focus is on similarities. More precisely, the direction of the influence is determined by the perceived similarity between the two sequential stimuli. A priori it is not clear which phenomenon is likely to be at work in a sequential performance evaluation, but regardless of its nature it is likely to create biases in the individual evaluation of performances because the evaluation of a contestant’s performance will depend on the performance of the previous contestant. Damisch et al. (2006) examined sequential performance judgments in both the 2004 Olympic Games and data gathered in a laboratory setting. Their aim was to apply the concepts in the selective accessibility model (Mussweiler et al., 2004) to sequential judgments in sport. Their results demonstrated that the score of an athlete increases with increasing scores of his or her immediate predecessor and decreases with decreasing scores of his or her predecessor, showing assimilation rather than contrast.
Moreover, this effect carries on beyond the immediate successor: the correlations between a target and subsequent targets who do not immediately follow it (second, third, etc.) are also significant. According to research by Mussweiler et al. (2004) and Gentner and Markman (1994), unless otherwise instructed judges tend to search for similarities in the performances of people; that is, assimilation often appears to be the default judgmental outcome, resulting in significant positive correlations between performances. This paper investigates these two potential forms of bias in the sequential evaluation of performance in a large data set from a naturalistic setting. Its unique contribution is twofold. First, no previous work has evaluated these two biases concurrently; this paper therefore adds to the existing work by enabling a direct comparison of these two processes in sequential order effects on performance evaluation. This is extremely important because it will enable us to isolate which factors contribute to an observed ordering effect in performance and provide clearer theoretical implications. Second, this paper uses a large, multicultural dataset which has the advantage of ecological validity and generalisability. A large majority of the previous studies of these order biases tend to be laboratory based or naturalistic studies using much smaller or restricted datasets. Our paper is unique in this respect
and hence provides a strong base for testing the theoretical predictions.
2 The data
Our data consist of observations of the ranking of contestants in live shows for several pop Idol series: Australia (Australian Idol: 2003, 2004, 2006, 2007; X-factor: 2005), Brazil (Ídolos Brazil: 2007), Canada (Canadian Idol: 2003, 2004, 2005, 2006, 2007), Germany (Deutschland sucht den Superstar: 2003, 2004, 2006, 2007), India (Indian Idol: 2006, 2007), Netherlands (Idols: 2005; X factor: 2006), UK (X-factor: 2004, 2005, 2006, 2007), and the USA (American Idol: 2002, 2003, 2004, 2005, 2006, 2007). All of these shows share the same format in their final stage, specifically, the final set of contestants (10 to 13 depending on the series) are progressively eliminated one by one after each show. In each show participants have to perform a new song. Their performance is then assessed by television viewers who can vote for their preferred performance. The votes are tallied and one of the last two (or three) contestants who have received the fewest votes from the public is then eliminated (sometimes this last step is determined by a choice from the judges). The generic format of these shows, which is almost identical across countries and seasons, provides a unique opportunity to study the effects of ordering on the evaluation of individual performance. In addition, the variety of countries in our sample ensures that our results are not idiosyncratic to a given culture or to a given series. For each season we observe the final shows where candidates have to perform one song one after the other before the public is allowed to vote for them. We do not analyse the very final stage of the competition, when four or five competitors are left and they each sing two or more songs. We therefore only observe shows where there are between 5 and 13 competitors singing one song and one or two competitors being voted off at the end of each show. These data have been collected from various online sources: wikipedia.org, tv.com and the shows’ websites.
Due to the marketing policy of the show, and in order to maintain the highest suspense during the competition, the shows do not reveal the exact proportion of votes for each contestant. However, we do have some information about the rankings of the contestants because the bottom two, three or four competitors are revealed for each show.
Table 1: Breakdown of the number of shows by country and number of contestants

Contestants  AUS  BRA  CAN  GER  IND  NED   UK  USA  Total
5              4    0    5    3    2    0    1    1     16
6              4    1    5    4    2    1    4    5     25
7              4    1    4    4    2    2    4    6     25
8              5    1    5    3    2    2    4    6     26
9              3    1    4    4    2    1    3    4     21
10             4    1    4    4    1    2    2    6     22
11             3    0    1    2    1    1    3    4     14
12             4    1    0    0    1    0    3    4     13
13             0    0    0    2    1    1    0    0      3
Total         31    6   28   26   14   10   24   36    165
3 Method
To assess the existence of a bias in the evaluation of contestants’ performances, we compare the empirical probability to be “safe” during one show to the theoretical probability to be “safe” (when there are no biases from the sequential ordering). Imagine a series of shows with a constant number $N$ of contestants and suppose that these contestants have the same qualities (hence the same a priori probability to be safe). Let $b_k \in \{2, 4\}$ be the number of individuals in the bottom tier for a show $k$; the probability to be safe for a contestant is:

$$p_i = 1 - \frac{b_k}{N}$$

Suppose now that the ordering of the performances in the live show has an impact on the evaluation of the performance by the television viewers. Some participants will be favored by their position in the series and others disadvantaged. Let us call $\text{bias}(X, Z)$ this systematic departure from the theoretical probability of being safe, where $X$ is a set of variables characterising the position of the contestant in the passing order, and $Z$ is a set of variables describing the characteristics of previous contestants. The probability to be safe for a participant in this position is

$$p_k = 1 - \frac{b_k}{N} + \text{bias}(X, Z)$$

Suppose that, in this simple situation, we want to estimate the bias linked with every position $i$ of the order, $E(\text{bias}(i, Z) \mid i)$. We could compare the theoretical probability to be safe $p_T = 1 - b_k/N$ to the actual frequency of safe contestants in each position $i$, $p_i = \sum_k 1_{\{i \text{ is safe}\}} / N_s$, where $N_s$ is the number of shows observed:

$$E(\text{bias}(i, Z) \mid i) = \frac{\sum_k 1_{\{i \text{ is safe}\}}}{N_s} - \left(1 - \frac{b_k}{N}\right)$$

Our data are slightly more complex than this example because the number of contestants varies across the shows. To estimate $E(\text{bias}(X, Z) \mid X, Z)$, we calculate the variable $\text{bias}_{jk}$, which, for a participant $j$ performing in the show $k$, takes the value:

$$\text{bias}_{jk} = 1_{\{j \text{ is safe}\}} - \left(1 - \frac{b_k}{N_k}\right)$$

By definition, we have $E(\text{bias}(X, Z) \mid X, Z) = E(\text{bias}_{jk} \mid X, Z)$. We can then define the two biases found in the literature as:

Definition 1 (Sequential order bias) There is a sequential order bias as soon as for any variable $x_j$ characterising the position of a performance $j$ in the passing order: $E(\text{bias}_{jk} \mid x_j) \neq 0$

Definition 2 (Sequential history bias) There is a sequential history bias as soon as for any variable $z$ characterising the previous candidates: $E(\text{bias}_{jk} \mid z) \neq 0$

The following sections study these two possible biases in turn.
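As an illustration only, the construction of the bias variable can be sketched in a few lines of Python. The sketch reads the definition as 1{j is safe} − (1 − b_k/N_k) and averages it by passing position. The function names and the two example shows are hypothetical and are not taken from the Idol data.

```python
from collections import defaultdict


def bias(is_safe, n_bottom, n_contestants):
    """bias_jk = 1{j is safe} - (1 - b_k / N_k)."""
    theoretical_safe = 1.0 - n_bottom / n_contestants
    return (1.0 if is_safe else 0.0) - theoretical_safe


def mean_bias_by_position(shows):
    """Average bias_jk for each passing position.

    Each show is a list of booleans in passing order (True = safe).
    The size of the bottom tier b_k is taken to be the number of False entries.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for safe_flags in shows:
        n_contestants = len(safe_flags)
        n_bottom = sum(1 for safe in safe_flags if not safe)
        for position, is_safe in enumerate(safe_flags, start=1):
            totals[position] += bias(is_safe, n_bottom, n_contestants)
            counts[position] += 1
    return {p: totals[p] / counts[p] for p in sorted(totals)}


# Hypothetical data: two shows of five contestants, bottom tier of two.
example_shows = [
    [True, False, False, True, True],
    [True, True, False, False, True],
]
mean_bias = mean_bias_by_position(example_shows)
```

By construction the bias values sum to zero within a show, so a positive mean bias at some positions must be offset by a negative mean bias at others.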
4 Sequential order bias
A sequential order bias arises when a candidate is advantaged or disadvantaged by his/her position in the order. To study this possible bias, we first look at the value of E(biasjk |i) which represents, for a given position in the order of appearance i, the difference in percentage points between the actual and theoretical probability to be safe. It therefore measures the advantage/disadvantage the position confers to a contestant in terms of the probability to be safe. Specifically, if E(biasjk |i) is positive then a contestant j in position i is more likely to be safe, and if E(biasjk |i) is negative he/she is less likely to be safe.

*** Figure 1: Bias in performance evaluation by position order ***

Figure 1 presents the mean bias per order over the whole set of orders. A clear pattern emerges which shows a positive trend as the order increases. However, this graph is imperfect because the relative position of each order may be different. For example the 5th order will be the last one in some situations, while in other situations it will be located in the middle between the beginning and the end of the series. In this graph the last order also consists of different orders, for example, sometimes it is 5th, 9th or 11th. Figures 2 and 3 present the decomposition of the ordering effect for the shows which have between 5
and 12 contestants. The last contestants appear to benefit from a positive bias, while contestants in the middle of the order (especially closer to the beginning) seem to be disadvantaged.

*** Figure 2: Order effect for each type of show ***

*** Figure 3: Order effect for each type of show ***

In order to summarize the effects at the beginning and at the end of the sequence, Figure 4 compares the evolution from the beginning of the order to the evolution when looking at the reverse order. The last contestants appear to have a significant advantage relative to the contestants in other positions.

*** Figure 4: Bias in performance evaluation at the beginning and the end of the series ***

Overall, these results suggest an increasing linear trend such that contestants in the later positions have an advantage relative to those in earlier positions. The worst positions in terms of bias seem to be positions two and three. One potential caveat of the research concerns the allocation process of the contestants. The above analysis assumes the random ordering of contestants to positions. What if this is not the case? In fact, there are two main reasons to think that the ordering is not random. First, the goal of the production is to maximise the entertainment level and if there is not a strict rule about the random allocation of contestants, this could produce a spurious correlation between ordering and results. For instance, better quality contestants could be more likely to be placed in some specific positions (like the beginning or the end) just for production purposes.
This implies that even if there were no ordering effect at all, a selection bias could induce some differences between the probability of success of different positions. Second, the production could have an agenda regarding the applicants and be willing to keep good contestants longer (because they will attract more viewers later on for instance). So, if there is any ordering effect, they could use it to advantage/disadvantage some contestants. This implies that if there is an ordering effect, the magnitude of this effect could be biased by a selection effect. In order to control for this potential caveat, we implement fixed effect models and estimate the ordering effects while controlling for the ability of the contestant. To analyse the effect of the ordering on the evaluation of the performance of contestants, it is possible to use a linear regression model with the variable bias as a dependent variable. Contestants in general perform more than once in the shows; we can therefore write:

$$\text{bias}_{jk} = \beta_0 + X_{jk}\beta + u_j + \varepsilon_{jk} \qquad (1)$$
where Xjk is a vector of variables relative to the order i of the participant j in the show k. The term uj is an individual effect specific to the individual j representing his/her ability. If the allocation of contestants is random, contestants performing at different places in the order do not tend to have on average differences in ability: E(uj |Xjk ) = 0. In such a situation, the OLS estimator is unbiased but not efficient and a random effect model must be used instead. However one may doubt the hypothesis of random allocation of contestants. One could suspect for instance the production to select on average better contestants to perform at the end of the show. In this case, we have E(uj |Xjk ) ≠ 0 and the random effect estimation will be biased. To control for
such a possibility we use a fixed effect estimator to estimate equation (1). The fixed effect estimator is a within estimator which uses only the variations in results observed within each contestant when they perform at different positions¹. Figure 5 presents the intuition of this estimator, and how it corrects for a possible bias in the allocation of contestants, using hypothetical contestants. The higher part of the Figure shows a situation where there is a systematic bias in the allocation of contestants, in which two contestants with unequal abilities perform 4 times each in a sequence of 6 contestants. The weaker contestant (in the higher part of the Figure) tends to be allocated to slots at the beginning of the sequence (1,2,3,4). The stronger contestant (in the lower part of the Figure) tends to be allocated to slots at the end of the sequence (3,4,5,6). If one were to pool the observations and run an OLS regression (or a random effect model which relies on the same assumptions), one would get the spurious result that performing at the end of the sequence gives an advantage. To control for such a possibility we use a fixed effect model which only takes into account the variations in performance relative to the average performance of the individual (right part of top of Figure 5). There is then no apparent link between ordering and change in performances. The lower part of Figure 5 presents the situation where there is a bias in the allocation and an ordering effect. The fixed effect estimation will control for the bias in the allocation and only keep the influence of the ordering on the performance of each individual taken separately. An effect of the ordering on the result will therefore be detected only if a given candidate tends to perform better if he/she is at the end of the order rather than at the beginning.
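The intuition of the within estimator can be checked numerically with a minimal sketch. The data below are hypothetical and only mimic the allocation pattern described for Figure 5: a weaker contestant in slots 1 to 4 and a stronger one in slots 3 to 6. The ability values (-0.2 and 0.2), the true order effect of 0.05 per slot and the helper names are assumptions made for the illustration, not estimates from the paper.

```python
def ols_slope(xs, ys):
    """Slope of a bivariate least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract from each value the mean of its group (within transformation)."""
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


# Hypothetical allocation: the weak contestant performs early, the strong one late.
contestant = ["weak"] * 4 + ["strong"] * 4
position = [1, 2, 3, 4, 3, 4, 5, 6]
ability = [-0.2] * 4 + [0.2] * 4

# Case 1: no order effect, the outcome reflects ability only.
outcome_no_effect = list(ability)
pooled_no_effect = ols_slope(position, outcome_no_effect)
within_no_effect = ols_slope(
    demean_by_group(contestant, position),
    demean_by_group(contestant, outcome_no_effect),
)

# Case 2: biased allocation plus a true order effect of 0.05 per slot.
outcome_effect = [a + 0.05 * p for a, p in zip(ability, position)]
pooled_effect = ols_slope(position, outcome_effect)
within_effect = ols_slope(
    demean_by_group(contestant, position),
    demean_by_group(contestant, outcome_effect),
)
```

In the first case the pooled slope is positive although position plays no role, while the within slope is zero. In the second case the within slope recovers the assumed 0.05 and the pooled slope overstates it.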
*** Figure 5: Identification strategy: using within variations in results to eliminate a possible systematic bias in the allocation of contestants ***

If there is no order effect, no variable x from Xjk should have a significant coefficient. Given that the result for each contestant is not independent of the result of other contestants within a given show, these models are estimated with a cluster-robust variance matrix, with the shows as clusters. For all shows the order variable was normalised between 0 (first) and 1 (last). A dummy variable was created to capture the difference between being the first to perform (1) and all other positions (0). Table 2 presents the regression results. The first two columns are random effect estimations; they are more efficient and well identified if the ordering of candidates is not linked with their specific characteristics. The last two columns are fixed effect estimations; they are unbiased even if the ordering of contestants depends on their specific characteristics. Overall the order effect is very significant and implies that, with the exception of the first position, moving one position closer to the end of the show
1 It is also called ANCOVA in psychology and other social sciences with the contestant playing the role of the group variable. While psychologists use the ANCOVA to study the between groups effect controlling for covariates, economists use the fixed effect model to study the effect of the covariates controlling for systematic differences between groups (here the contestants).
Table 2: Regression: the ordering effect on performance evaluation
Dependent variable: bias

                        Random effects              Fixed effects
                        (1)          (2)            (3)          (4)
Order                   0.202***     0.265***       0.181***     0.234***
                        (6.25)       (6.67)         (5.07)       (5.70)
First                                0.111*                      0.092
                                     (2.39)                      (1.87)
Cons                    -0.139***    -0.182***      -0.090***    -0.128***
                        (-8.69)      (-7.85)        (-4.58)      (-5.09)
R2                                                  0.022        0.026
N                       1522         1522           1522         1522
Number of group         352          352            352          352
Hausman test p-value                                .263         .492

* p<0.05, ** p<0.01, *** p<0.001
provides an additional 5 percentage point chance of being safe for a contestant. Therefore, ordering plays a major role in the competition, at least to discriminate between contestants close in ability (which is often the case in the latter rounds of such competitions). The difference between the random effects and fixed effects model gives an indication about the existence of a selection bias of contestants for each position. The coefficients are very close indicating that the order effect is very unlikely to be driven by a selection bias. To test for a significant difference between the coefficients of the two types of model, we need to implement a generalised version of the Hausman test given that we use a matrix of variance robust to the clusterisation of data in our estimation of both models (Wooldridge, 2001, p. 291). In both cases this test indicates no significant difference in coefficients between the two models (p-values in the last row of Table 2). This result suggests that the random effects models are consistent and must be considered as the best estimation procedure available. Practically, this means that there is no reason to think that the results are driven by a non random allocation of the candidates. Figure 6 presents the estimation of the parametric prediction from the fixed effect model and a non parametric estimation using a local linear regression for greater flexibility. The two curves match very well and this confirms the good calibration of the linear models. The results for the effect of ordering on performance evaluation show a J-shaped curve rather than a U-shaped curve indicating both primacy and recency effects, with a stronger recency effect.
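The J shape can be made concrete with a short sketch that evaluates the linear prediction at each passing position. Several assumptions are involved: the coefficients are those read from column (4) of Table 2, the normalisation of the order is taken to be (i − 1)/(N − 1), the individual effect is set to zero, and the show is a hypothetical one with ten contestants. The function names are illustrative.

```python
def order_regressors(position, n_contestants):
    """Normalised order (0 = first, 1 = last) and a dummy for performing first.

    The linear normalisation (i - 1) / (N - 1) is an assumption of this sketch.
    """
    order = (position - 1) / (n_contestants - 1)
    first = 1 if position == 1 else 0
    return order, first


# Coefficients as read from column (4) of Table 2 (fixed effects, with First dummy).
CONS, B_ORDER, B_FIRST = -0.128, 0.234, 0.092


def predicted_bias(position, n_contestants):
    """Linear prediction of the bias for a contestant with a zero individual effect."""
    order, first = order_regressors(position, n_contestants)
    return CONS + B_ORDER * order + B_FIRST * first


# Hypothetical show with ten contestants.
predictions = [predicted_bias(i, 10) for i in range(1, 11)]
worst_position = predictions.index(min(predictions)) + 1
```

Under these assumptions the first performer is predicted to do better than the second, the second position is the lowest point of the curve, and the prediction then rises steadily to its maximum at the last position.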
*** Figure 6: Effect of the relative order on performance evaluation ***
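The non-parametric curve in Figure 6 comes from a local linear regression. As a rough illustration of that technique, the sketch below fits a kernel-weighted straight line around each evaluation point and reads off its intercept. The Gaussian kernel, the bandwidth of 0.2 and the toy data are assumptions made for the example; the paper does not report which kernel or bandwidth was used.

```python
import math


def local_linear(xs, ys, x0, bandwidth):
    """Local linear estimate of E[y | x = x0] using a Gaussian kernel (assumed)."""
    # Kernel weights: observations close to x0 count more.
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    # Weighted least squares of y on (1, x - x0), solved in closed form.
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = s0 * s2 - s1 ** 2
    # The intercept of the local line is the fitted value at x0.
    return (s2 * t0 - s1 * t1) / det


# Hypothetical toy data: relative order in [0, 1] and an evaluation bias.
order = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
bias = [0.00, -0.10, -0.12, -0.10, -0.06, -0.02, 0.02, 0.06, 0.10, 0.15, 0.22]
curve = [local_linear(order, bias, x0, bandwidth=0.2) for x0 in order]
```

A smaller bandwidth follows the data more closely, while a larger one pulls the curve toward a single straight line, so comparing the two fits is informative only for a reasonable intermediate choice.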
5 Sequential history bias
Another bias possibly arising from the sequential ordering of contestants is that the evaluation of a contestant’s performance may be influenced by the performance of the previous contestant, to whom they may be compared. If there is an assimilation process, we would expect contestants performing just after a good contestant to be more likely to be highly evaluated and to be safe. On the contrary, if there is a contrast effect, we would expect it to be a disadvantage to perform after a good contestant, as this is likely to negatively affect the evaluation of the contestant’s performance.

The previous results of each contestant provide an indicator of the contestant’s quality. We calculate the indicator strong, a binary variable indicating whether the candidate has always been safe in the previous shows. While many contestants are in the bottom tier only once, when they are eliminated, many others are in the bottom tier several times before being eliminated. For each show following the first one, there are two categories of contestant: those who have always been safe before and those who have been in the bottom tier in a previous show. Arguably, for a given show, a contestant who has never been in the bottom tier previously is less likely to be in the lower range of the ranking than contestants who have been in the bottom tier. Using the variable strong, we examine the effect of being preceded by strong contestants on the probability of being safe. We therefore estimate the model:
$$ bias_{jk} = \beta_0 + X_{jk}\beta + \sum_{h=1}^{6} strong_{i-h} + u_j + \varepsilon_{jk} \qquad (2) $$
where $strong_{i-h}$ is the dummy variable indicating whether the contestant who passed h positions before has always been safe in previous shows. Table 3 displays the results of this model. The estimation of the random effects model does not indicate any effect of the quality of previous contestants. However, the fixed effects model suggests a strong effect of the previous contestant. The Hausman test indicates that the coefficients in the fixed effects model are significantly different from the coefficients in the random effects model. This suggests that the random effects model is inconsistent. This may be the case if, for instance, the producers of the shows tend to avoid placing two weak candidates consecutively. The effect estimated in the fixed effects model is then underestimated in the random effects model.

The results of the fixed effects model suggest a significant and important effect of the previous contestant’s quality on the evaluation of the current contestant’s performance. When the previous contestant has never been in the bottom tier before, the current contestant has a 10 percentage point higher chance of being safe. The coefficients for the contestants further back in the sequence are mostly positive as well but smaller, and almost always non-significant.
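To make the construction of the regressors concrete, the following sketch builds the strong indicator from each contestant's history and then collects the lagged values for the contestants who performed just before. The function names, the data layout and the toy values are hypothetical. Marking positions without a predecessor as missing is also an assumption, although the shrinking sample sizes in Table 3 are what one would expect if such rows are dropped.

```python
def strong_flags(bottom_history):
    """bottom_history[c] lists, show by show, whether contestant c was in the bottom tier.

    Returns flags[c][k] = 1 if c was always safe in the shows before show k,
    0 otherwise, and None for the first show (no history yet).
    """
    flags = {}
    for name, history in bottom_history.items():
        out, ever_bottom = [], False
        for k, in_bottom in enumerate(history):
            out.append(None if k == 0 else int(not ever_bottom))
            ever_bottom = ever_bottom or in_bottom
        flags[name] = out
    return flags


def lagged_strong(running_order, strong_in_show, max_lag):
    """For each position i in one show, collect strong_{i-h} for h = 1..max_lag.

    Positions with no contestant h places earlier get None (assumed to be
    dropped from the estimation sample).
    """
    rows = []
    for i, name in enumerate(running_order):
        lags = [strong_in_show[running_order[i - h]] if i - h >= 0 else None
                for h in range(1, max_lag + 1)]
        rows.append((name, lags))
    return rows


# Hypothetical example: three contestants, one show.
example = lagged_strong(["A", "B", "C"], {"A": 1, "B": 0, "C": 1}, max_lag=2)
```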
Table 3: Regression: the comparison effect relative to the previous contestant
Dependent variable: bias

Random effects
                  (1)          (2)          (3)          (4)
Order             0.272***     0.288***     0.291***     0.310***
                  (6.88)       (6.06)       (5.20)       (4.61)
strong_{i-1}      0.047        0.043        0.047        0.027
                  (1.84)       (1.51)       (1.53)       (0.82)
strong_{i-2}                   -0.008       -0.015       0.003
                               (-0.30)      (-0.49)      (0.09)
strong_{i-3}                                0.026        0.014
                                            (0.84)       (0.41)
strong_{i-4}                                             -0.033
                                                         (-0.97)
Cons              -0.225***    -0.222***    -0.241***    -0.209**
                  (-7.17)      (-4.92)      (-3.94)      (-2.65)
N                 1339         1156         973          790

Fixed effects
                  (5)          (6)          (7)          (8)
Order             0.251***     0.249***     0.234***     0.239**
                  (5.91)       (4.96)       (3.93)       (2.99)
strong_{i-1}      0.108***     0.102**      0.092**      0.056
                  (3.90)       (3.21)       (2.61)       (1.47)
strong_{i-2}                   0.034        0.016        0.028
                               (1.08)       (0.48)       (0.70)
strong_{i-3}                                0.069*       0.062
                                            (2.13)       (1.58)
strong_{i-4}                                             -0.012
                                                         (-0.30)
Cons              -0.219***    -0.239***    -0.260***    -0.229*
                  (-6.50)      (-5.24)      (-4.04)      (-2.56)
R2                0.047        0.039        0.033        0.023
N                 1339         1156         973          790
Hausman p-value   <0.001       <0.001       <0.001       <0.001

* p<0.05, ** p<0.01, *** p<0.001
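The gap between the random effects and fixed effects columns rests on the within transformation, which removes anything constant for a given contestant before the slope is estimated. The sketch below illustrates this for a single regressor. A pooled least-squares slope is used as a simple stand-in for an estimator that does not remove contestant effects (it is not the random effects estimator used in the paper), and the toy data are hypothetical.

```python
from collections import defaultdict


def within_slope(groups, xs, ys):
    """Fixed effects (within) slope of y on one regressor x, demeaning by group."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, x, y in zip(groups, xs, ys):
        s = sums[g]
        s[0] += x
        s[1] += y
        s[2] += 1
    num = den = 0.0
    for g, x, y in zip(groups, xs, ys):
        sx, sy, n = sums[g]
        dx, dy = x - sx / n, y - sy / n  # deviations from the group means
        num += dx * dy
        den += dx * dx
    return num / den


def pooled_slope(xs, ys):
    """Ordinary least-squares slope that ignores the group structure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


# Hypothetical data: contestant "b" has a higher individual effect and also
# tends to have larger values of x, so the pooled slope picks up both.
groups = ["a", "a", "b", "b"]
xs = [0.0, 0.2, 0.8, 1.0]
ys = [0.00, 0.02, 0.58, 0.60]
fe, pooled = within_slope(groups, xs, ys), pooled_slope(xs, ys)
```

In this toy case the individual effect is positively related to the regressor, so the pooled slope overstates the within slope; the direction of the distortion depends on the sign of that relationship.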
6 Test of the random allocation of the contestants
In the previous sections we have been careful to control for a possible non-random allocation of the contestants in the show. It is however interesting to check whether this allocation is random or not. Given that the shows do not reveal the number of votes received by each contestant, it is not possible to assess directly whether the allocation of contestants is random. We do however have some ways to assess whether the allocation is roughly random or tends to be systematically biased.

A first piece of information comes from the fact that, for a small subset of shows in the American version, a website (Diallidol) provides an estimate of the success of each contestant in terms of votes. The website estimates the number of phone calls made for each candidate (voters have to call a number specific to the candidate they want to support). This website has proven very successful in its estimations, with prediction success rates of 87, 91 and 97% respectively in the last three seasons. Using these numbers we can see whether, over these three seasons, there is a link between the results of candidates in previous shows and their place in the ordering sequence in a show. Using the sum of the results over the last shows as an indication of quality, we estimated by local linear regression how the average quality of contestants varies as a function of the order in a show. Figure 7 shows the result of this estimation and indicates that there is no link between the relative place in the ordering sequence and the average quality of the contestants.

*** Figure 7: Random allocation of the contestants in the American idol shows ***

While worth noting, this result concerns only a subset of our sample (N=215). Whilst we do not have complete information on the results of contestants for our whole dataset, the information on the performances of the contestants in previous shows provides us with a way to test more generally whether there is a random allocation in the show. We can test whether “strong” contestants who have never been in the bottom tier in previous shows are more likely to be at the end or the beginning of the show. To do so, we assess the probability that a contestant at a given order is strong depending on his/her order:

$$ strong_{ik} = \beta_0 + X_{ik}\beta + \nu_k + \varepsilon_{ik} \qquad (3) $$
where $\nu_k$ is the fixed effect specific to show k. This fixed effect approach is necessary as the proportion of candidates who have been placed in the bottom tier before may change from one show to the next; typically it can increase with the number of shows in the competition².

Assuming that the term $\varepsilon_{ik}$ follows a logistic distribution, this model is a conditional logit. Table 4 presents the results of the estimation of this model. These results confirm what our previous analyses suggested. There is no systematic bias in the allocation of contestants relative to the passing order; that is, better contestants are not more likely to be toward the end or the beginning of the show.
² Note that this does not bias the estimations presented in Table 3.
Table 4: Test of the random allocation of the contestants
Conditional logit

                  Strong candidate
Order             -0.0299
                  (0.22)
First             -0.0939
                  (0.22)
Observations      1153
R-squared         <0.001

Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
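For readers unfamiliar with the conditional logit behind model (3), the sketch below shows how the show fixed effect drops out of the likelihood: within each show, the probability of the observed set of strong contestants is computed conditional on how many strong contestants that show contains. It uses a single regressor and enumerates every subset, which is only practical for small shows. The function names, the ternary-search optimiser and the toy data are assumptions for illustration and not the estimation routine used for Table 4.

```python
import math
from itertools import combinations


def cond_loglik(beta, shows):
    """Conditional log-likelihood; each show is a list of (x, y) with y in {0, 1}."""
    total = 0.0
    for show in shows:
        xs = [x for x, _ in show]
        n_strong = sum(y for _, y in show)
        observed = sum(x for x, y in show if y == 1)
        # Denominator: every way of choosing n_strong contestants in this show.
        denom = sum(math.exp(beta * sum(subset))
                    for subset in combinations(xs, n_strong))
        total += beta * observed - math.log(denom)
    return total


def fit_beta(shows, lo=-20.0, hi=20.0, iters=200):
    """Maximise the concave conditional log-likelihood by ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cond_loglik(m1, shows) < cond_loglik(m2, shows):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Hypothetical data: x is the relative order, y = 1 for a strong contestant.
toy_shows = [
    [(0.0, 1), (0.5, 0), (1.0, 1)],
    [(0.0, 0), (0.5, 1), (1.0, 1)],
    [(0.0, 1), (0.5, 1), (1.0, 0)],
]
beta_hat = fit_beta(toy_shows)
```

A coefficient close to zero, as in Table 4, is what one would expect if strong contestants are spread evenly over the running order.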
7 Discussion
Our results indicate that in a competition the order of contestants may have a decisive role in the evaluation of their performance. Given the importance of job interviews and competitive oral examinations in allocating positions and rewards, these results underline the need for awareness of these potential biases in the evaluation process.

More specifically, our analyses suggest that the two mechanisms, memory and direct comparison, both play a role in the order bias. With respect to memory, it appears that both primacy and recency effects are involved when performance is evaluated sequentially. Irrespective of ability, contestants who perform first are more likely to be positively evaluated than those who come in second and third positions, which provides evidence of a primacy effect. Contestants who perform in the later serial positions (particularly the last position) have the largest advantage with respect to positive evaluations, implying a strong recency effect. The curve showing performance evaluation by serial position is J-shaped for this dataset, implying a much stronger recency effect. These results are partially consistent with those of Bruine de Bruin (2005), who found an increasing linear trend. However, there is divergence with respect to a primacy effect: we find evidence of a small primacy effect while Bruine de Bruin (2005) found no benefit to being in first position. This seems to indicate that memory limitations do play a role in the sequential evaluation of performance. It also suggests that the primacy effect could receive more attention in economics: economic models of memory limitation such as those of Mullainathan (2002) and Sarafidis (2007) only integrate a recency effect.

The second bias we demonstrate is a direct comparison effect with the previous contestant. Specifically, one’s performance evaluation is influenced by the evaluation of the previous contestant.
If you perform after a weak contestant, you are more likely to be evaluated poorly than if you perform after a strong contestant. Therefore, we find evidence for an assimilation effect with respect to sequential judgements. These findings lend further support to the selective accessibility model of Mussweiler (2003) and Mussweiler et al. (2004). Specifically, our results indicate that judges tend to assess performances based on similarities with the previous contestant and not differences. This also concurs with evidence from Damisch et al. (2006). Overall, we show that these two effects both operate and are important explanatory mechanisms in the evaluation of sequential performance.

One factor which could influence these findings is that contestants may change their performance as a result of being privy to the performance of others. Specifically, it is plausible that people change their performance (increased effort or motivation) after having witnessed the previous performance(s). This mechanism could work in one of two ways. First, if the task is novel, the contestants could learn from the previous performances. However, this is not the case in most tasks which have been studied in the literature (sport and singing competitions), as the task is known in advance. Second, previous performances could act as a benchmark or goal that the next contestant can aim for. Exactly how this process works is unclear and not easy to predict. It could however explain the apparent dominance of assimilation over contrast, because it is the actual performance that changes rather than the criteria of the judges. One way to test this idea would be to investigate these biases in cases where performances are not seen by the contestants, for example in job interviews or private auditions, and to compare these effects with those in cases where the performances can be witnessed.

A limitation of the current study is that we do not have information about the number of people who are watching the shows throughout the broadcasts. It is possible, although unlikely in our opinion, that more people are watching the show toward the end of the program and that these same people, who miss the beginning of the show, also decide to vote.
First, it seems likely that the people who vote are the more ardent fans and are less likely to miss the beginning of the show. Second, even if a large enough proportion of voters missed the early performance(s), we should simply see an increasing monotonic trend (assuming people do not vote for contestants they do not see). The significant primacy effect we find runs contrary to this prediction. If anything, these “late voters” would bias the primacy effect downwards, which means our estimate of the initial memory effect is likely to be conservative.

Relatively speaking, the magnitude of the effect is quite large and is therefore likely to have a significant impact on both the contestants and the judges. Specifically, it is significant enough to raise questions about the fairness of the process from the contestants’ perspective and to pose problems in relation to the efficiency of the process from the perspective of the judges.

These findings have implications for the way in which performances should be evaluated. At the very least, judges (and perhaps contestants) could be made aware of these effects. What they do with this information and how best they assimilate it into their judgments (performances) remains to be studied. This work also suggests that future research is needed to study these effects in depth. For example, questions that need to be addressed include: which is the stronger of these two mechanisms? Do these biases depend on the type of competition and the delay before judging? Does making people aware of these biases eliminate them? Moreover, future work needs to study the conditions under which assimilation and contrast are likely to occur in the evaluation of sequential performance. Are certain types of performances (those that are judged on a tight set of criteria) more likely to lead to assimilation effects?
References
Anderson, J., D. Bothell, C. Lebiere, and M. Matessa, 1998, An Integrated Theory of List Memory, Journal of Memory and Language 38, 341–380.
Arellano, M., 2003, Panel Data Econometrics, Oxford University Press.
Benabou, R. and J. Tirole, 2002, Self-Confidence and Personal Motivation, Quarterly Journal of Economics 117, 871–915.
Bernheim, B. and R. Thomadsen, 2005, Memory and Anticipation, The Economic Journal 115, 271–304.
Bruine de Bruin, W., 2005, Save the last dance for me: unwanted serial position effects in jury evaluations, Acta Psychologica 118, 245–260.
Bruine de Bruin, W., 2006, Save the last dance II: Unwanted serial position effects in figure skating judgments, Acta Psychologica 123, 299–311.
Bruine de Bruin, W. and G. Keren, 2003, Order effects in sequentially judged options due to the direction of comparison, Organizational Behavior and Human Decision Processes 92, 91–101.
Bruni, L. and R. Sugden, 2007, The road not taken: how psychology was removed from economics, and how it might be brought back, The Economic Journal 117, 146–173.
Burgess, N. and G. Hitch, 1999, Memory for serial order: A network model of the phonological loop and its timing, Psychological Review 106, 551–581.
Clerides, S. and T. Stengos, 2006, Love Thy Neighbour, Love Thy Kin: Strategy and Bias in the Eurovision Song Contest, Centre for Economic Policy Research.
Damisch, L., T. Mussweiler, and H. Plessner, 2006, Olympic Medals as Fruits of Comparison? Assimilation and Contrast in Sequential Performance Judgments, Journal of Experimental Psychology: Applied 12, 166.
Devetag, G. and M. Warglien, 2003, Games and phone numbers: Do short-term memory bounds affect strategic behavior?, Journal of Economic Psychology 24, 189–202.
Devetag, G. and M. Warglien, 2007, Playing the wrong game: An experimental analysis of relational complexity and strategic misrepresentation, Games and Economic Behavior.
Dow, J., 1991, Search Decisions with Limited Memory, Review of Economic Studies 58, 1–14.
Gentner, D. and A. Markman, 1994, Structural alignment in comparison: No difference without similarity, Psychological Science 5, 152–158.
Gershberg, F. and A. Shimamura, 1994, Serial position effects in implicit and explicit tests of memory, Learning, Memory 20, 1370–1378.
Glanzer, M. and A. Cunitz, 1966, Two storage mechanisms in free recall, Journal of Verbal Learning and Verbal Behavior 5, 1–360.
Glejser, H. and B. Heyndels, 2001, Efficiency and Inefficiency in the Ranking in Competitions: the Case of the Queen Elisabeth Music Contest, Journal of Cultural Economics 25, 109–129.
Glenberg, A., M. Bradley, J. Stevenson, T. Kraus, M. Tkachuk, A. Gretz, et al., 1980, A two-process account of long-term serial position effects, Journal of Experimental Psychology: Human Learning and Memory 6.
Goldin, C. and C. Rouse, 2000, Orchestrating Impartiality: The Impact of “Blind” Auditions on Female Musicians, The American Economic Review 90, 715–741.
Maratos, E., K. Allan, and M. Rugg, 2000, Recognition memory for emotionally negative and neutral words: an ERP study, Neuropsychologia 38, 1452–1465.
Mullainathan, S., 2002, A Memory-Based Model of Bounded Rationality, Quarterly Journal of Economics 117, 735–774.
Murdock, B., 1962, The serial position effect of free recall, Journal of Experimental Psychology 64, 482–488.
Mussweiler, T., 2003, Comparison Processes in Social Judgment: Mechanisms and Consequences, Psychological Review 110(3), 472–489.
Mussweiler, T., K. Rüter, and K. Epstude, 2004, The ups and downs of social comparison: Mechanisms of assimilation and contrast, Journal of Personality and Social Psychology 87, 832–844.
Neath, I. and R. Crowder, 1996, Distinctiveness and very short-term serial position effects, Memory 4, 1–18.
Neilson, W., 1998, Reference Wealth Effects in Sequential Choice, Journal of Risk and Uncertainty 17, 27–48.
Novemsky, N. and R. Dhar, 2005, Goal Fulfillment and Goal Targets in Sequential Choice, Journal of Consumer Research 32, 396–404.
Piccione, M. and A. Rubinstein, 1997, On the Interpretation of Decision Problems with Imperfect Recall, Games and Economic Behavior 20, 3–24.
Pieters, R. and T. Bijmolt, 1997, Consumer Memory for Television Advertising: A Field Study of Duration, Serial Position, and Competition Effects, Journal of Consumer Research 23, 362.
Prendergast, C. and R. Topel, 1993, Discretion and bias in performance evaluation, European Economic Review 37, 355–65.
Rubin, D. C. and M. Friendly, 1986, Predicting which words get recalled: measures of free recall, availability, goodness, emotionality, and pronunciability for 925 nouns, Memory and Cognition 14, 79–94.
Sarafidis, Y., 2007, What Have you Done for me Lately? Release of Information and Strategic Manipulation of Memories, The Economic Journal 117, 307–326.
Segrest Purkiss, S., P. Perrewé, T. Gillespie, B. Mayes, and G. Ferris, 2006, Implicit sources of bias in employment interview judgments and decisions, Organizational Behavior and Human Decision Processes 101, 152–167.
Singh, S. and C. Cole, 1993, The Effects of Length, Content, and Repetition on Television Commercial Effectiveness, Journal of Marketing Research 30, 91–104.
Snyder, K. and D. Harrison, 1997, The affective auditory verbal learning test, Archives of Clinical Neuropsychology 12, 477–482.
Terry, W., 2005, Serial Position Effects in Recall of Television Commercials, The Journal of General Psychology 132, 151–164.
Wilson, V., 1977, Objectivity and effect of order of appearance in judging of synchronized swimming meets, Perceptual and Motor Skills 44, 295–298.
Wooldridge, J., 2001, Econometric Analysis of Cross Section and Panel Data (MIT Press).