Understanding Statistical Testing

On the Suzy platform, we currently have statistical testing available for our monadic testing feature, rank grids, and scale grids. 

In our monadic feature, we added a new question type called rating, which includes statistical testing in the PowerPoint export and the demographic data Excel files. Within the monadic feature, only rating, rank grid, and scale grid questions have statistical testing; all other question types do not.

Please note that we only apply statistical testing to a sample size of 50 or above.

For this feature we use a z-test, which analyzes the differences between samples.

With the monadic feature, there is no overlap in sample between the concepts: each concept is evaluated by its own unique set of respondents. As a result, a two-sample proportion z-test is the best-suited statistical model for determining the differences between the results of the concepts.
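To make the test concrete, here is a minimal sketch of a two-sample proportion z-test in Python. It is an illustration, not the platform's actual implementation: the function name is hypothetical, and the pooled standard error and two-sided p-value are assumptions, since this article does not specify those details.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for two independent samples.

    Assumption: pooled standard error and a two-sided test.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both concepts are equal
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-0% or all-100%: no measurable difference
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, with made-up counts of 60 out of 100 respondents for one concept and 45 out of 100 for another, `two_proportion_z_test(60, 100, 45, 100)` gives a z value a little above 2.1, which clears the 95% threshold of 1.96.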

For rank and scale grids, you will see statistical testing at the top box level (the lowest number for rank and the highest number for scales) and at the top two box level.
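As an illustration of what top box and top two box mean, the sketch below computes the share of responses that fall in those boxes. The function name, its parameters, and the sample responses are hypothetical and only serve the example.

```python
def box_share(responses, scale_max, kind="scale", boxes=1):
    """Share of responses in the top `boxes` positions.

    kind="scale": the top box is the highest number (e.g. 5 on a 1-5 scale).
    kind="rank":  the top box is the lowest number (rank 1).
    """
    if kind == "scale":
        top = set(range(scale_max - boxes + 1, scale_max + 1))
    else:
        top = set(range(1, boxes + 1))
    return sum(r in top for r in responses) / len(responses)

# Hypothetical responses on a 1-5 scale
ratings = [5, 4, 4, 3, 1]
top_box = box_share(ratings, scale_max=5, boxes=1)      # only the 5s
top_two_box = box_share(ratings, scale_max=5, boxes=2)  # 4s and 5s
```

These shares are the proportions that would then be compared between concepts.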

By default, we test at the 95% (uppercase letters) and 90% (lowercase letters) confidence levels. If a concept is statistically higher at the 95% confidence level, it is also statistically higher at the 90% confidence level. The reverse is not true: a concept that is higher at the 90% level is not necessarily higher at the 95% level.

Statistical testing results are shown in the PowerPoint export and the Excel sheet. Each concept is assigned a letter (e.g. A, B, C, etc.). To read the output properly, identify which numbers have one or more letters next to them. A letter indicates that the particular concept (on that metric) is significantly higher than the concept identified by that letter. If there are no letters shown, the data was statistically tested but there were no statistical differences between the concepts.


In this instance:

  • Concepts 3 and 4 are significantly higher than Concept 1 at the 95% level for appeal.
  • Concept 2 is significantly higher than Concept 1 at the 90% level for appeal.
  • Concept 3 is statistically higher than Concept 1 at the 95% level for relevance.
  • Concept 3 is statistically higher than Concept 2 at the 90% level for relevance.
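The letter notation can be sketched in code as follows. This is a hedged illustration rather than the platform's actual logic: the function names and the counts are hypothetical, and the pooled standard error and the two-sided critical values (1.960 for 95%, 1.645 for 90%) are assumptions. The minimum base of 50 comes from the note earlier in this article.

```python
import math
from string import ascii_uppercase

MIN_BASE = 50  # testing is only applied to a sample size of 50 or above

def z_stat(x_a, n_a, x_b, n_b):
    """Pooled two-sample proportion z statistic (positive when A is higher)."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (x_a / n_a - x_b / n_b) / se

def stat_letters(concepts):
    """concepts: list of (count_in_box, base_size) in concept order.

    Returns one string per concept listing the letters of the concepts it
    beats: uppercase at the 95% level, lowercase at the 90% level.
    """
    labels = ascii_uppercase[:len(concepts)]
    results = []
    for i, (x_i, n_i) in enumerate(concepts):
        letters = ""
        for j, (x_j, n_j) in enumerate(concepts):
            if i == j or min(n_i, n_j) < MIN_BASE:
                continue  # skip self-comparison and bases that are too small
            z = z_stat(x_i, n_i, x_j, n_j)
            if z > 1.960:      # assumed two-sided 95% critical value
                letters += labels[j]
            elif z > 1.645:    # assumed two-sided 90% critical value
                letters += labels[j].lower()
        results.append(letters)
    return results

# Hypothetical top box counts for concepts A, B, C (100 respondents each)
letters = stat_letters([(45, 100), (57, 100), (60, 100)])
```

With these made-up counts, concept B is marked "a" (higher than A at the 90% level only), concept C is marked "A" (higher than A at the 95% level), and concept A has no letters.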