Cecilia A. Conrad
Senior Advisor, Collaborative Philanthropy and Fellows, and CEO of Lever for Change

Our 100&Change evaluation panel was a diverse group of thinkers and visionaries chosen to represent many different fields of expertise.

Rather than having our panel of wise heads review submissions based on their specific areas of expertise, we randomly assigned proposals and asked the judges to evaluate projects based on their broad knowledge. Each application was judged by a panel of five experts using four criteria: meaningful, verifiable, feasible, and durable.
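
To make the mechanics concrete, here is a minimal Python sketch of how proposals might be randomly assigned to panels of five judges. The identifiers, the `assign_panels` helper, and the fixed seed are hypothetical illustrations, not the tooling actually used for 100&Change.

```python
import random

# Hypothetical identifiers standing in for the real judges and proposals.
JUDGES = [f"judge_{i}" for i in range(413)]
PROPOSALS = [f"proposal_{i}" for i in range(801)]
PANEL_SIZE = 5  # each application was read by five judges
CRITERIA = ["meaningful", "verifiable", "feasible", "durable"]

def assign_panels(proposals, judges, panel_size, seed=0):
    """Randomly draw a panel of distinct judges for each proposal."""
    rng = random.Random(seed)
    return {p: rng.sample(judges, panel_size) for p in proposals}

panels = assign_panels(PROPOSALS, JUDGES, PANEL_SIZE)
```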

With 413 judges and 801 proposals, we knew we were likely to encounter some conflicts of interest. We addressed conflicts of interest in two ways.

First, if anyone was a member of a team, an officer, or a director of an organization that submitted a proposal, we recused that person from serving on the evaluation panel. Second, we asked judges to identify actual or potential conflicts of interest with respect to any application they were randomly assigned. We then reassigned that application to a new judge.

For example, a professor at the School of Education at the University of X could serve on the evaluation panel. But that same professor would not read the application from the Medical School at the University of X.
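
Continuing the hypothetical sketch above, a declared conflict can be handled by swapping the conflicted judge for a randomly chosen eligible replacement. The `conflicts` mapping and the example entry below are invented for illustration only.

```python
def reassign_conflicts(panels, conflicts, judges, seed=1):
    """Replace any assigned judge who declared a conflict on that proposal.

    `conflicts` maps a proposal to the set of judges who cannot review it
    (for example, same institution or membership on the submitting team).
    """
    rng = random.Random(seed)
    for proposal, panel in panels.items():
        blocked = conflicts.get(proposal, set())
        for i, judge in enumerate(panel):
            if judge in blocked:
                eligible = [j for j in judges if j not in blocked and j not in panel]
                panel[i] = rng.choice(eligible)
    return panels

# Invented example: judge_7 declared a conflict with proposal_42,
# so that application is handed to a different judge.
panels = reassign_conflicts(panels, {"proposal_42": {"judge_7"}}, JUDGES)
```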

Another concern with such a large group of judges was developing a common, shared understanding of the four criteria. With a smaller panel, we might have convened the group in a single room and conducted calibration exercises to get each judge to interpret the numerical rating scale in the same way. With our large multinational evaluation panel, this was not an option.

In lieu of in-person calibration exercises, we held training webinars. We also built into the scoring rubric specific language defining each numerical value on the 1-to-5 scale.

Nevertheless, we worried that some judges might consistently give high scores and others consistently low scores. We did not want to disadvantage a proposal assigned to a judge who tended to score low, or tip the scales in favor of a proposal whose judge tended to score high. So judges' scores were statistically normalized to ensure that, no matter which judges were assigned to an applicant, each proposal was given equal consideration.
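
The exact normalization method is not spelled out here, but one common approach is to convert each judge's raw ratings to z-scores, so a habitually harsh or generous scorer neither sinks nor lifts the proposals they happen to draw, and then average the normalized scores each proposal received from its panel. The sketch below illustrates that assumption; the data structures are hypothetical and this is not necessarily the exact procedure used.

```python
from statistics import mean, pstdev

def normalize_scores(raw_scores):
    """Convert each judge's raw 1-5 ratings to z-scores.

    `raw_scores` maps judge -> {proposal: raw score}; the structure is
    hypothetical. After normalization, every judge's scores are centered
    on zero with unit spread, regardless of how harshly they graded.
    """
    normalized = {}
    for judge, scores in raw_scores.items():
        mu = mean(scores.values())
        sigma = pstdev(scores.values()) or 1.0  # guard against zero spread
        normalized[judge] = {p: (s - mu) / sigma for p, s in scores.items()}
    return normalized

def proposal_totals(normalized):
    """Average the normalized scores each proposal received from its panel."""
    totals = {}
    for judge_scores in normalized.values():
        for proposal, z in judge_scores.items():
            totals.setdefault(proposal, []).append(z)
    return {p: mean(zs) for p, zs in totals.items()}
```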

In addition to scoring the proposals, each judge provided narrative comments to justify the rating for each evaluation criterion. Some judges also expressed a desire to provide an overall narrative assessment of the evaluation process, and we will share some of those assessments in the coming days.