2013 Classification Criteria for Systemic Sclerosis
Methods
Overview
The development and testing of the classification system for SSc were based on both data and expert clinical judgment. First, candidate items for the classification criteria were generated using consensus methods and evaluated using existing databases. Second, multicriteria decision analysis was used to reduce the number of candidate criteria and assign preliminary weights; the resulting classification system was repeatedly tested and adapted using prospectively collected SSc cases and non-SSc controls, and compared against expert clinical judgment. Third, the classification criteria were tested in a validation cohort and compared with preexisting criteria sets.
Item Generation and Reduction
One hundred sixty-eight candidate criteria were identified through two Delphi exercises. A 3-round Delphi exercise and a face-to-face consensus meeting using nominal group technique facilitated reduction of the 168 items to 23. Using a random sample from existing databases of patients with SSc (n=783) and control patients with diseases similar to SSc (n=1071), all based on physician diagnosis, the candidate criteria were found to have good discriminative validity.
Item Reduction and Weighting
Draft Classification System. A face-to-face meeting of four European and four North American SSc experts was held to further reduce items and assign preliminary weights using multicriteria decision analysis. The number of experts was limited in advance to 8, and they were invited based on geographic representation, knowledge from a scientific and a practical diagnostic viewpoint, and availability. At the meeting, the experts determined by consensus to which cases the criteria should and should not be applied, and which items were sufficient on their own to allow classification of a patient as having SSc (sufficient criteria). They then participated in a multicriteria decision analysis to further reduce the 23 items and assign preliminary weights. The experts were presented with hypothetical pairs of cases that differed on 2 of the 23 items at a time (eg, Raynaud's phenomenon positive and abnormal nailfold capillaries absent vs Raynaud's phenomenon negative and abnormal nailfold capillaries present, all other manifestations being considered equal), and they were asked to vote individually and electronically on which case of the pair was more likely to be SSc. The result of the votes was presented immediately. If agreement among the experts was not complete, considerations were discussed and a second round of voting was conducted. As a result of the repeated choices between two alternative cases, items were ranked, and weights for the items were derived using 1000Minds decision-making software. Additional details about the methods are available in ref.
Initial Threshold Identification. The committee prepared summaries of 45 SSc cases, with a concentration of cases that were difficult to classify. These were presented to 22 SSc experts, who classified each case as definite SSc or not. The draft classification system derived from the multicriteria decision analysis was applied to the 45 cases, resulting in a score for each case. The ranking of cases by the SSc experts was then compared with the ranking of cases by the scores produced by the draft scoring system. Higher scores in the scoring system were expected to correspond to a higher probability that the experts would classify the case as SSc. Using these results, an initial threshold score for SSc was identified.
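The text does not state how the initial threshold was chosen from the two rankings. As an illustration only, the sketch below assumes one common approach: scan every observed score as a candidate cut-off and keep the one that maximises Youden's J (sensitivity + specificity - 1) against the experts' classification. The function name, the choice of Youden's J, and the example scores are all assumptions, not the committee's procedure or data.

```python
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float],
                   expert_ssc: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, youden_j) for the cut-off that best matches the experts.

    A case counts as classified SSc when its score is >= threshold.
    Youden's J is an assumed selection rule, not one named in the text.
    """
    if len(scores) != len(expert_ssc):
        raise ValueError("scores and expert_ssc must have the same length")
    n_pos = sum(1 for y in expert_ssc if y)
    n_neg = len(expert_ssc) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one SSc and one non-SSc case")

    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, expert_ssc) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, expert_ssc) if not y and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical scores and expert votes for six cases (not study data).
example_scores = [2, 4, 5, 7, 9, 10]
example_expert = [False, False, False, True, True, True]
threshold, j_stat = best_threshold(example_scores, example_expert)
```

With difficult-to-classify cases the two rankings will rarely separate this cleanly, so a cut-off chosen this way is a starting point for discussion rather than a final value.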
Reduction and Testing of Iterative Changes. In the next step, the committee reduced the number of items, simplified the weights, and modified the threshold score. First, data on the candidate items were prospectively collected at 13 SSc centres in North America and 10 in Europe, using standardised case record forms. Data were collected from 368 consecutive patients with SSc (diagnosis based on physician opinion), of whom half were to have had SSc for a maximum of 2 years (based on the time from the first non-Raynaud's symptom) in order to include early SSc. Data were also collected from 237 consecutive control patients with a scleroderma-like disorder: eosinophilic fasciitis (also called Shulman's disease or diffuse fasciitis with eosinophilia), scleromyxedema, systemic lupus erythematosus, dermatomyositis, polymyositis, primary Raynaud's phenomenon, mixed connective tissue disease, undifferentiated connective tissue disease, generalised morphea, nephrogenic systemic fibrosis, or diabetic cheiroarthropathy. From these 605 patients, a random sample of 100 SSc cases and 100 controls (50% from North America and 50% from Europe) was selected to form the derivation sample. The remaining 268 cases and 137 controls formed the validation sample. Institutional research ethics board approval was obtained for the collection of patient data.
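The split into derivation and validation samples can be sketched as a stratified random draw. The example below is a minimal illustration, assuming each patient record carries a case/control flag and a region label; the field names, the seed, and the synthetic cohort (in particular the per-region counts, which the text does not report) are hypothetical.

```python
import random


def split_derivation_validation(patients, n_per_group=100, seed=0):
    """Draw n_per_group cases and n_per_group controls, half from each region.

    patients: list of dicts with keys 'id', 'is_ssc' (bool) and
    'region' ('NA' or 'EU'). These field names are assumptions.
    Returns (derivation, validation); validation is everyone not drawn.
    """
    rng = random.Random(seed)
    half = n_per_group // 2
    derivation = []
    for is_ssc in (True, False):
        for region in ("NA", "EU"):
            stratum = [p for p in patients
                       if p["is_ssc"] == is_ssc and p["region"] == region]
            # random.sample raises ValueError if the stratum is too small
            derivation.extend(rng.sample(stratum, half))
    chosen = {p["id"] for p in derivation}
    validation = [p for p in patients if p["id"] not in chosen]
    return derivation, validation


# Synthetic cohort with the totals from the text: 368 cases, 237 controls.
# The alternating region assignment is invented for the example.
cohort = [
    {"id": i, "is_ssc": i < 368, "region": "NA" if i % 2 == 0 else "EU"}
    for i in range(605)
]
derivation, validation = split_derivation_validation(cohort)
```

Applied to the synthetic cohort, the draw leaves 268 cases and 137 controls in the validation sample, matching the counts reported above.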
The committee then met and made iterative changes to the draft system, applying each change in real time to the derivation cohort described above. Using the derivation cohort, the scoring system was simplified by removing items that occurred with low frequency or were redundant, by aggregating similar items, and then by transforming the weights to obtain single digits. The preliminary score threshold was adjusted to account for the weight simplification. The impact of each proposed change was evaluated by assessing the resulting change in sensitivity and specificity of the criteria in the derivation cohort. The reference standard for sensitivity and specificity was the diagnosis made by the SSc expert who submitted the case(s) and control(s).
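The evaluation loop described here, scoring each patient with a weighted item list and then checking sensitivity and specificity against the submitting expert's diagnosis, can be sketched as follows. The item names, weights, and threshold are placeholders invented for the example; they are not the items or weights of the classification criteria.

```python
def total_score(items_present, weights):
    """Sum the weights of the items a patient has."""
    return sum(weights[item] for item in items_present)


def sensitivity_specificity(scores, reference_ssc, threshold):
    """Compare 'score >= threshold' against the reference diagnosis.

    Returns (sensitivity, specificity).
    """
    tp = fn = tn = fp = 0
    for score, is_ssc in zip(scores, reference_ssc):
        classified = score >= threshold
        if is_ssc and classified:
            tp += 1
        elif is_ssc:
            fn += 1
        elif classified:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder weights and threshold (hypothetical, for illustration only).
weights = {"item_a": 4, "item_b": 3, "item_c": 2}
threshold = 5

# Four invented patients: the items each has, and the expert diagnosis.
patients = [
    ({"item_a", "item_b"}, True),   # score 7, classified SSc (true positive)
    ({"item_c"}, True),             # score 2, missed (false negative)
    ({"item_b"}, False),            # score 3, not classified (true negative)
    ({"item_a", "item_c"}, False),  # score 6, classified SSc (false positive)
]
scores = [total_score(items, weights) for items, _ in patients]
reference = [is_ssc for _, is_ssc in patients]
sens, spec = sensitivity_specificity(scores, reference, threshold)
```

Rerunning the same two functions after each change to the weights, item list, or threshold gives the before-and-after comparison of sensitivity and specificity that the paragraph describes.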
At the same time, the changes in the classification system were also tested in 38 difficult-to-classify cases. Consequently, weights of some items were adjusted to align the scoring system with the reference standard formed by the opinions of the SSc experts as to which cases were to be classified as having SSc.
Validation
The final classification system was independently tested using the validation sample of SSc cases and controls. Sensitivity and specificity were calculated for the 1980 ACR preliminary classification criteria for SSc, the classification criteria proposed by LeRoy and Medsger in 2001, and the newly developed classification criteria. Exact binomial confidence limits were calculated for sensitivity and specificity. The ACR criteria and the LeRoy/Medsger criteria were each compared with the new criteria using 2×2 tables and McNemar's χ² test with continuity correction. The criteria sets were also tested separately using only the subgroup of patients with a disease duration of ≤3 years. Further, the classification system was validated against the expert consensus on the set of 38 selected cases.
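The two statistical procedures named above can be sketched with the Python standard library alone. The sketch assumes that "exact binomial confidence limits" refers to the Clopper-Pearson interval, found here by bisection on the binomial distribution, and uses the usual continuity-corrected McNemar statistic on the discordant pairs of a paired 2×2 table. Function names and the example counts are illustrative, not results from the study.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for modest n."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _bisect(f, target, increasing):
    """Solve f(p) = target on [0, 1] for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n."""
    # Lower limit solves P(X >= k | p) = alpha/2 (increasing in p).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper limit solves P(X <= k | p) = alpha/2 (decreasing in p).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


def mcnemar_cc(b, c):
    """McNemar's chi-square with continuity correction.

    b, c: discordant counts (patients classified correctly by one
    criteria set but not the other). Returns (chi2, p_value), df = 1.
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return chi2, p_value


# Invented counts for illustration only.
ci_low, ci_high = exact_binomial_ci(k=10, n=10)
chi2, p_value = mcnemar_cc(b=10, c=2)
```

McNemar's test fits this comparison because each criteria set is applied to the same patients, so only the cases on which two sets disagree carry information about which performs better.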