Assuming a missing data mechanism

There are two main ways to make an assumption about the missing data mechanism. The first, and most important, is ‘common sense’. Based on what is known about the data collection process and the data in general, most researchers have an idea of why data are missing. It is very important to take this knowledge into account when making an assumption about the missing data mechanism.

A second way to make an assumption about the missing data mechanism is by statistical testing. The data can be split into two groups: participants with complete data on variable A (e.g., IQ) and participants with missing data on that variable. We can create a dichotomous variable, the missing data indicator, that indicates whether data on IQ are missing or observed. We can then test for group differences with a t-test; for example, we can test whether the two groups differ in age. If they do, we can conclude that our missing data are NOT missing completely at random, so not MCAR.

We can also use the missing data indicator as the outcome in a logistic regression model and test whether other variables in the data are related to it. If we find variables related to the missing data indicator, we can conclude that the data are not MCAR. If we find no relations at all, the data may be MCAR, but this is not proof: a non-significant result only means that no departure from MCAR was detected, which may also reflect limited power.

However, these tests can never distinguish between MAR and MNAR, because MNAR means that missingness depends on the unobserved values themselves, which are exactly the values we do not have. We can only test whether data are MCAR or not MCAR. It is therefore always important to combine statistical testing with common sense.
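Both tests described above can be sketched in a few lines. This is a minimal illustration in Python, assuming NumPy and SciPy are available; the data are simulated and hypothetical, with IQ made missing more often for participants older than 45 (a MAR mechanism, because missingness depends on observed age). The t-test uses Welch's variant (`equal_var=False`), and the logistic regression is fitted by hand with Newton-Raphson (the helper `fit_logistic` is hypothetical) and tested with a likelihood-ratio test against an intercept-only model. In practice a statistics package would normally be used for the regression.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data: age and IQ for 1,000 participants.
rng = np.random.default_rng(seed=1)
n = 1000
age = rng.normal(loc=40, scale=10, size=n)
iq = rng.normal(loc=100, scale=15, size=n)

# Assumed mechanism for illustration: IQ is missing more often for
# participants older than 45 (missingness depends on observed age, so MAR).
p_missing = np.where(age > 45, 0.6, 0.1)
iq_observed = np.where(rng.random(n) < p_missing, np.nan, iq)

# Missing data indicator: 1 = IQ missing, 0 = IQ observed.
r = np.isnan(iq_observed).astype(int)

# 1) t-test (Welch's version): do the two groups differ in age?
t_stat, t_p = stats.ttest_ind(age[r == 1], age[r == 0], equal_var=False)


def fit_logistic(X, y, n_iter=25):
    """Hypothetical helper: fit a logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        # Negative Hessian of the log-likelihood (information matrix).
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, gradient)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik


# 2) Logistic regression with the missing data indicator as outcome and
#    standardized age as predictor, compared with an intercept-only model
#    using a likelihood-ratio test on 1 degree of freedom.
age_z = (age - age.mean()) / age.std()
X_null = np.ones((n, 1))
X_full = np.column_stack([np.ones(n), age_z])
_, ll_null = fit_logistic(X_null, r)
beta_full, ll_full = fit_logistic(X_full, r)
lr_stat = 2.0 * (ll_full - ll_null)
lr_p = stats.chi2.sf(lr_stat, df=1)

print(f"t-test on age:       t = {t_stat:.2f}, p = {t_p:.2g}")
print(f"logistic regression: LR chi2 = {lr_stat:.2f}, p = {lr_p:.2g}")

# A small p-value means the data are not MCAR. A large p-value would not
# prove MCAR, and neither test can tell MAR apart from MNAR.
```

With this simulated mechanism both tests should give very small p-values, pointing to data that are not MCAR. Note that the same result would be obtained if the data were MNAR and also related to age, which is why the tests cannot separate MAR from MNAR.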