How to Have a Meaningful Global Research Study

Want to know a dirty little secret from the market research industry? Most global studies are useless. Well, not entirely useless, but meaningful comparison across countries is nearly impossible with the average research survey. The truth is, people from different countries and cultures perceive and respond to scales differently, and simply comparing their results side by side misses half the picture.

For instance, in Japan and Germany, survey respondents tend to be more reserved, favoring the lower ends of scales to provide more critical ratings. In India and Brazil, the scenario is quite the opposite. Respondents in these countries tend to embrace a more optimistic viewpoint, resulting in nearly every item receiving remarkably high ratings.

If you’re part of a global organization, chances are you’ve encountered this dilemma: either ignore the differences or rely on historical trends to grasp what is truly going on in each market. The problem is, ignoring the differences means missing key insights, and relying on historical trends is costly and often impractical.

While we can’t expect respondents in different corners of the globe to treat traditional scales the same, we can certainly control and even correct the issue. How do we do this? We disrupt the very psychology of rating scales by altering how the questions are framed. We can not only eliminate biases, but we can do so at the individual level, setting the stage for meaningful cross-country and within-company comparisons. At PSB Insights, we’ve developed several innovative solutions that allow us to achieve these goals:

Choice-Based or Item Competition. Would you rather be wealthy, intelligent, or good-looking? I know I’d like to be all three, and if you ask me to rate each option on a 5-point scale, chances are I’d give a resounding “strongly agree” to all three. However, if you force me to choose or prioritize, that’s when you truly uncover my motivations.

While this approach absolutely solves the issue of response tendencies, it’s not always a practical solution for every type of question. Plus, you’re left without a measure of collective importance of the total set (though we have a fix for that). It can also be time-consuming if not done right.

Two-Step Rating. We love two-step ratings here at PSB Insights. Usually, when you ask someone on a 4- or 5-point scale how much they like something, they tend to stick to their cultural tendencies, their comfort zone on the scale. That comfort zone differs across individuals and cultures. This can lead to serious problems, including incorrect (even flipped) relationships and bad business decisions.

However, you can use a two-step process for better results. In the first step, we take the same list of items and ask a preliminary question like “What’s most important to you?” Then, we take just the items selected in that first step and ask a follow-up question: “Which of these are very important?” The two steps combined produce ratings similar to a top-box (selected in the second step) and a top-2-box (selected in the first step). This approach does a bit of magic by reducing cross-cultural response differences and minimizing extremely positive scores.
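
To make the scoring concrete, here is a minimal Python sketch of how two-step selections could be turned into top-box and top-2-box shares. The function name, the data layout (a "step1" and "step2" set per respondent), and the sample items are all hypothetical and chosen only for illustration; this is not PSB Insights’ actual implementation.

```python
def two_step_scores(responses, items):
    """Return top-box and top-2-box shares per item from two-step selections.

    Hypothetical layout: each response is a dict with a "step1" set
    (items picked as important) and a "step2" set (items picked as
    very important). Assumes at least one respondent.
    """
    n = len(responses)
    scores = {}
    for item in items:
        # A step-1 selection plays the role of a top-2-box rating.
        in_step1 = sum(1 for r in responses if item in r["step1"])
        # A step-2 selection only counts if the item also survived step 1.
        in_step2 = sum(
            1 for r in responses if item in r["step1"] and item in r["step2"]
        )
        scores[item] = {"top_box": in_step2 / n, "top_2_box": in_step1 / n}
    return scores


# Made-up answers from four respondents, for illustration only.
items = ["price", "safety", "design"]
responses = [
    {"step1": {"price", "safety"}, "step2": {"safety"}},
    {"step1": {"safety"}, "step2": {"safety"}},
    {"step1": {"price", "design"}, "step2": set()},
    {"step1": {"safety", "design"}, "step2": {"design"}},
]

scores = two_step_scores(responses, items)
for item in items:
    print(item, scores[item])
```

In this toy data, safety is picked in step one by three of four respondents and in step two by two of them, so it separates clearly from price, which nobody marks as very important.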

The magic is in how it changes the psychology of the rating task. Instead of sticking within a small range on a scale (say, rating everything a 4 or 5), the two-step approach sets a different expectation. A list is provided, and it is clear some items are expected to be selected, but not all. This combats the perception that everything should be positive, or everything should be negative. And ultimately, it forces the respondent to give more consideration to what is being asked in the context of the questionnaire.

The biggest advantage to the two-step, beyond the greater differentiation and massive reduction in positive response tendencies, is the increased validity. Our research-on-research shows considerable improvement in predicting consumer choice from a two-step approach versus standard rating scales.

In most cases, this approach is enough. However, smaller country-related response tendencies can still remain, as shown in the graphic, where India still has the highest likelihood of showing a positive response, albeit at a much-reduced level. There are also times when you are just stuck with your rating scales but still need to deal with the cultural response tendencies. In that case, we need to lean on some advanced analytics to level the field.

Modeling for Item Response Tendencies (IRT). While there can be some advanced modeling involved, the logic behind IRT is relatively straightforward. To put it simply, we estimate how you as an individual are likely to use a scale, and then correct for that effect. How, you ask? By throwing in completely unrelated items, we create a model of how you respond to survey questions, or a “response tendency”.

For example, there’s probably zero correlation between how much you like cats and what you are looking for in a car. But by asking about something completely unrelated, we can determine how you use the scale and then calibrate your responses accordingly so that the playing field is even across all respondents.
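
The calibration idea can be sketched in a few lines of Python. Note the assumption: this uses plain mean-centering on the unrelated items as a deliberately simplified stand-in, not the IRT model described here, which involves more advanced modeling. The function name, item names, and ratings are hypothetical.

```python
from statistics import mean


def correct_for_tendency(anchor_ratings, target_ratings):
    """Subtract a respondent's typical scale use from their target ratings.

    Simplified stand-in (assumption): the respondent's "response tendency"
    is estimated as the mean rating they give to unrelated anchor items.
    """
    tendency = mean(anchor_ratings)
    return {item: rating - tendency for item, rating in target_ratings.items()}


# Made-up respondents: one uses the top of a 5-point scale, one the bottom.
high_rater = correct_for_tendency(
    anchor_ratings=[5, 4, 5, 4],  # unrelated items, e.g. "I like cats"
    target_ratings={"fuel economy": 5, "styling": 4},
)
low_rater = correct_for_tendency(
    anchor_ratings=[2, 3, 2, 3],
    target_ratings={"fuel economy": 3, "styling": 2},
)

print(high_rater)
print(low_rater)
```

The raw ratings look very different (5 and 4 versus 3 and 2), but once each respondent’s own scale use is removed, both show the same preference for fuel economy over styling.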

IRT correction solves our response tendency issue and allows us to finally make meaningful comparisons across countries. And the best part: if you have an existing survey that has already been fielded, IRT can still fix the issue.

If you are starting from scratch, we always recommend our clients get the best of both worlds by combining a two-step approach with IRT. Two-step ratings are inherently better than single-step, and using IRT on top allows for meaningful country-by-country comparisons.

And the data speaks volumes. In the unadjusted importance rating below, it’s challenging to discern any notable variation for the metric across the four countries. When we employ the two-step approach (without IRT correction), we start to uncover more meaningful insights, particularly in Germany and India.

However, this isn’t the whole story. Given the tendency for respondents in India to lean towards more positive ratings and those in Germany towards more negative ones, how do we ensure fair comparison? Enter IRT. By correcting the responses, the importance of the metric in Germany becomes unmistakably clear.

Cross-cultural response tendencies create a pervasive issue within international studies. Doing nothing about it is a choice, but there are better options to manage it:

Disrupt ratings with a stepped approach
Using a stepped approach to get gradation is effective in reducing response tendencies in ratings.
Deal with the issue statistically
Multivariate IRT lets us get beyond the problem for real comparisons and models.
Combine both methods if possible
Using both approaches can be highly effective, particularly with cultures that use the top end of rating scales.

Written by Colleen McCauley
Questions? Ask Rob Kaiser, EVP of Advanced Analytics.