February 21, 2018

Measuring shared decision making: how valid and reliable are our instruments?

By Victor M. Montori

Recently, a systematic review that my colleagues and I started working on two years ago was published in PLOS ONE. Here, we provide a summary of the methods and results and share our conclusions and recommendations. The aim of this review was to rate the psychometric quality of existing instruments measuring the process of shared decision making (SDM). Publishing this work is a great milestone for me for several reasons. Doing a systematic literature review is a time-consuming and intense process, and for months you long for the moment the work will finally be published and shown to the world. Also, this is my first scientific article in the field of SDM, combining my experience in performing psychometric validation studies with my current research focus and passion, SDM.

The main aim of this systematic review, as stated in the background, was to help researchers identify the best instrument to measure SDM in their studies. Because so many SDM instruments are available, reviewing them individually also gave us the opportunity to aggregate results and identify overall strengths and limitations of the instruments and of the methods applied in their development and evaluation studies. This, I think, is of even greater value to the SDM field than merely providing insight into the quality of the separate instruments. By presenting overall results on the methodological quality and the psychometric quality of SDM instruments, we aimed to point out the challenges that our field faces in the development and evaluation of the measurement instruments we use in our research and practice evaluation. I hope that our work will trigger reflection on the methods commonly applied and their limitations, and that it helps to start and continue discussions on future directions for improving the quality of studies validating SDM instruments, as well as those using them.

I look forward to hearing your thoughts and views on our findings and ways forward. My co-authors and I will join a few conferences this year (e.g. SMDM-Europe 2018 in Leiden, the Netherlands and ICCH 2018 in Porto, Portugal), so for a discussion in person, please come and meet us there!


As readers of this blog may be aware, research on shared decision making is growing rapidly. Most studies on the extent of shared decision making (SDM) in clinical care, on the effects of SDM training and tools for healthcare providers and patients, and on the effect of SDM on psychosocial and physical patient outcomes use standardized measurement instruments to assess the actual realization of SDM. The validity of their results depends heavily on the psychometric quality of the instruments used. Existing instruments to measure SDM include questionnaires for patients or providers, and observer-based coding schemes to be applied to audio- or videotaped consultations. We performed a systematic literature review of instruments assessing the SDM process, in order to help researchers choose the best instrument in terms of psychometric quality.


We systematically searched seven databases. Two researchers independently evaluated all retrieved records for eligibility, using pre-defined inclusion criteria (i.e., peer-reviewed articles that describe the development or evaluation of an SDM-process instrument). For each instrument we identified in the included articles, we rated the psychometric quality separately for ten measurement properties: internal consistency, reliability (test-retest reliability for questionnaires and intra-rater and inter-rater reliability for coding schemes), measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, responsiveness, and interpretability.
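To make one of these properties concrete: internal consistency of a questionnaire is commonly summarized with Cronbach's alpha. The sketch below is only an illustration and not the procedure used in the review; the function name `cronbach_alpha` and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer the same number of items")

    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents answering a four-item questionnaire.
example = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(example), 2))
```

A real validation study would also need an adequate sample size and a check of the scale's dimensionality before alpha is interpreted.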

For this quality rating we performed two appraisals: we appraised 1) the quality of the methods applied in the development and/or validation study, using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist [1-2], and 2) the psychometric quality of the measurement property per instrument, based on the results of the development and/or validation studies [4]. For each instrument, we synthesized the results of the two appraisals into a best level of evidence per measurement property. The levels of evidence were: ‘unknown’ (due to poor methods), ‘conflicting’, ‘limited’, ‘moderate’, and ‘strong’. These were scored as either positive or negative results for a measurement property evaluation [5].


Our search resulted in 51 included articles, describing 23 different instruments measuring the SDM process. Including all revisions and translations of these instruments, we found 40 different instrument versions in total. Most were observer-based coding schemes (N=18), followed by patient questionnaires (N=16) and provider questionnaires (N=4); two instruments were dyadic, i.e., they included multiple perspectives in the assessment of SDM.

Overall trends in the quality of SDM instruments and the methods applied in their validation studies

Generally, evidence is lacking regarding the measurement quality of existing SDM instruments, partly because not all measurement properties have been evaluated, and partly because the methodology applied in the evaluation studies was of poor quality.

Overall, six measurement properties had been evaluated for fewer than 20% of the instruments to which they are applicable: test-retest reliability of questionnaires (17%), measurement error (0%), content validity (14%), cross-cultural validity (13%), responsiveness (2%), and interpretability (0%). Where they had been evaluated, the best-evidence synthesis indicated positive results for about half of the instruments for content validity (50%) and structural validity (53%). In contrast, negative results were found for about half of the evaluated instruments for inter-rater reliability (47%; coding schemes only) and for hypotheses testing for construct validity (59%). Differences in quality between instrument types were found for internal consistency and structural validity: results for questionnaires were overall more positive than for coding schemes, and results for coding schemes were more often unknown than for questionnaires, either because these measurement properties had not been validated or because of the poor methodological quality of the studies.
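As an illustration of what inter-rater reliability of a coding scheme means in practice, the sketch below computes Cohen's kappa, a chance-corrected agreement statistic for two raters assigning categorical codes. It is an assumption that kappa is the relevant statistic here; for ordinal item scores, a weighted kappa or an intraclass correlation coefficient may be more appropriate. The function name and the example codes are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same consultations.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of units")
    n = len(rater_a)

    # Proportion of units on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("both raters used a single identical code; kappa is undefined")

    return (observed - expected) / (1 - expected)


# Hypothetical codes: was a given SDM behavior observed in each of 8 consultations?
coder_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
coder_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
print(round(cohen_kappa(coder_1, coder_2), 2))
```

Kappa equals 1 when the raters always agree and 0 when they agree no more often than chance would predict.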

Concerning the often poor results of hypotheses testing for construct validity, it is of note that hypotheses about relationships with instruments designed to measure the same construct (i.e., the SDM process), measured from either the same or a different perspective, were often not confirmed, or did not reach the threshold we applied for a positive result: a correlation coefficient of ≥0.50. The weak correlations point both to a lack of consensus on how to define the process of SDM and to the question of whether SDM viewed from the perspective of the patient, provider, or observer can be regarded as the same construct.
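A minimal sketch of this kind of check follows, assuming a Pearson correlation between the scores of two instruments administered in the same consultations; the text above does not specify which correlation coefficient each validation study used. Only the 0.50 threshold comes from the review; the function names and scores are hypothetical.

```python
from math import sqrt

POSITIVE_THRESHOLD = 0.50  # threshold for a positive result, as stated above


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sum_xx = sum(d * d for d in dx)
    sum_yy = sum(d * d for d in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("scores do not vary; correlation is undefined")
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sum_xx * sum_yy)


def hypothesis_confirmed(x, y, threshold=POSITIVE_THRESHOLD):
    """True if the two instruments correlate at or above the threshold."""
    return pearson_r(x, y) >= threshold


# Hypothetical scores for the same six consultations on two SDM instruments.
patient_questionnaire = [60, 75, 40, 85, 55, 70]
observer_coding = [30, 28, 35, 40, 22, 31]
print(hypothesis_confirmed(patient_questionnaire, observer_coding))
```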

This fits the finding that developers often provided only a vague definition of the construct to be measured, or none at all. Also, only two developers explicitly mentioned which underlying measurement model they assumed: a formative model, in both instances. The choice of measurement model has implications for the development and validation criteria of instruments [6]: in a reflective model the latent construct determines the items (effect indicators), whereas in a formative model the latent construct is the result of independent items (causal indicators). Neglecting the differences may result in applying an inappropriate methodology. In 2011, Wollschläger called upon the SDM field to reach consensus on the most suitable underlying measurement model [7], a call that has not yet been clearly answered.

Conclusions and recommendations

A large number of instruments are available to assess the SDM process, but evidence is still lacking regarding their measurement quality, partly because measurement properties have not been evaluated at all, and partly because the validation studies have been of poor quality. Clearly, this does not imply that existing instruments are of poor quality, but rather that their quality is often unknown. In practice, the choice of the most appropriate instrument for your research can therefore best be based on the content of the instrument and on other characteristics that best suit the aim of the study and the resources available, such as the perspective that is assessed and the number of items. To improve the quality of existing SDM instruments and of validation studies in the SDM field, we recommend the following:

Key recommendations:

-  Reach consensus on the most suitable underlying model for the construct of the SDM process.

-  Provide a clear definition of the construct the instrument aims to measure, in this case the SDM process.

-  Perform content validity analyses prior to further validation of new instruments.

-  Include large-enough sample sizes in validation studies; improvement of sample sizes is especially needed for inter- and intra-rater reliability testing of coding schemes.

-  Seek alternative ways to evaluate test-retest reliability of questionnaires for the process of SDM.

-  Find ways to improve inter-rater reliability of coding schemes; e.g., by providing more detailed descriptions of coding scheme items.

-  When formulating hypotheses to evaluate construct validity, include instruments with constructs that are as similar as possible to the construct of the instrument under investigation and, alternatively, make use of known-group differences testing.

-  Determine minimal important change values to inform the interpretation of change scores in intervention studies.

-  Above all, we recommend further evaluating and refining existing instruments and adhering as closely as possible to the COSMIN guidelines to help guarantee high-quality evaluations of psychometric properties.

For a more detailed description of the methods and results of our systematic review and for a more nuanced discussion of our findings, please take a look at our full paper in PLOS ONE.

For any questions about this work, feel free to contact Fania Gärtner.

Submitted by

Fania R. Gärtner¹, Hanna Bomhof-Roordink¹, Ian P. Smith¹, Isabelle Scholl²,³, Anne M. Stiggelbout¹, Arwen H. Pieterse¹

Author affiliations

1 Medical Decision Making, Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands

2 Department of Medical Psychology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

3 The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, NH, United States


Dr. Fania Gärtner holds a Master’s degree in Social Psychology and a PhD in occupational medicine. In her work, she combines her expertise in the development and evaluation of measurement instruments, and doctor-patient communication and SDM. She has a special focus on learning needs and barriers of oncologists for applying SDM in daily practice. Next to her work as a researcher, Fania has extensive experience in training medical students and specialists in communication and SDM skills, which brings her in contact with diverse attitudes and levels of competencies, and feeds her eagerness for the research in this field.



  1. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539-49.
  2. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737-45.
  3. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651-7.
  4. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34-42.
  5. Terwee CB, Prinsen CA, Ricci Garotti MG, Suman A, de Vet HC, Mokkink LB. The quality of systematic reviews of health-related outcome measurement instruments. Qual Life Res. 2016;25(4):767-79.
  6. Jarvis CB, Mackenzie SB, Podsakoff PM. A Critical Review of Construct Indicators and Measurement Model Misspecification in Marketing and Consumer Research. J Consum Res. 2003;30:199-218.
  7. Wollschläger D. Short communication: Where is SDM at home? Putting theoretical constraints on the way shared decision making is measured. Z Evid Fortbild Qual Gesundhwes. 2012;106(4):272-4.
