Does Pay-for-Performance Improve Surgical Outcomes?
Methods
Data Source and Study Population
We used data from the State Inpatient Databases (SID) of 12 states (Arizona, California, Florida, Iowa, Massachusetts, Maryland, North Carolina, Nebraska, New Jersey, New York, Washington, and Wisconsin) from 2003 to 2009. These databases are maintained and distributed as part of the Healthcare Cost and Utilization Project of the Agency for Healthcare Research and Quality and contain all inpatient discharges from short-term, acute-care, nonfederal, general, and other specialty hospitals in participating states. We chose these 12 states because they (1) were geographically dispersed across the United States (allowing for diversity in our sample), (2) had data available for the entire study period, and (3) had relatively large sample sizes. The discharge records contain information collected as part of billing records, including patient demographics, International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) procedure and diagnosis codes, expected payer, admission and discharge dates, and disposition. Data on hospital characteristics were obtained from the American Hospital Association Annual Survey.
Using ICD-9-CM procedure codes, we identified all adult patients undergoing coronary artery bypass grafting (CABG) (36.10–19), hip replacement (81.51–52), and total knee replacement (81.54) in these 12 states. We excluded patients undergoing CABG whose procedure codes indicated that other operations were performed simultaneously (eg, valve replacement) (35.00–99, 36.2, 37.32, 37.34, 37.35). Patients undergoing joint replacement were excluded if they had revision procedures or trauma diagnoses (ICD-9-CM procedure codes 00.70–73, 00.80–85, 81.53, 81.55 and ICD-9-CM diagnosis codes 800–959).
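The inclusion and exclusion rules above can be sketched as a small classifier. This is an illustration only, not the authors' code: the function name `classify_discharge`, the cohort labels, and the assumption that codes arrive as dotted strings (for example "36.12") are all hypothetical, and the age restriction to adults is not handled here.

```python
from typing import Iterable, Optional

# Code lists taken from the text; the dotted string format is an assumption.
CABG_EXCLUSIONS = {"36.2", "37.32", "37.34", "37.35"}
JOINT_CODES = {"81.51", "81.52", "81.54"}
JOINT_PROC_EXCLUSIONS = {"81.53", "81.55"}


def _between(code: str, low: str, high: str) -> bool:
    """Lexicographic range check for dotted codes with a fixed-width stem."""
    return low <= code <= high


def _is_trauma(diagnosis: str) -> bool:
    """True when the three-digit category is in 800-959 (E and V codes are skipped)."""
    category = diagnosis.split(".")[0]
    return category.isdigit() and 800 <= int(category) <= 959


def classify_discharge(procedures: Iterable[str],
                       diagnoses: Iterable[str]) -> Optional[str]:
    """Return "CABG", "joint", or None when the discharge is outside the cohort."""
    procs = set(procedures)
    if any(_between(p, "36.10", "36.19") for p in procs):
        # Exclude CABG combined with other operations (35.00-99 and listed codes).
        concurrent = any(p.startswith("35.") or p in CABG_EXCLUSIONS for p in procs)
        return None if concurrent else "CABG"
    if procs & JOINT_CODES:
        revision = any(
            p in JOINT_PROC_EXCLUSIONS
            or _between(p, "00.70", "00.73")
            or _between(p, "00.80", "00.85")
            for p in procs
        )
        trauma = any(_is_trauma(d) for d in diagnoses)
        return None if (revision or trauma) else "joint"
    return None
```

Administrative files often store ICD-9-CM codes without the decimal point, so a real pipeline would likely need a normalization step before a check like this.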
Study Overview
In 2003, the Centers for Medicare & Medicaid Services (CMS) created the largest experiment in pay-for-performance (P4P) to date: the Premier Hospital Quality Incentive Demonstration (HQID). A total of 216 hospitals agreed to provide data on process and quality indicators for 3 medical conditions (acute myocardial infarction, congestive heart failure, and pneumonia) and 2 surgical procedures (CABG and total knee or hip replacement), with additional indicators for risk-adjusted mortality for acute myocardial infarction and CABG and 30-day readmissions for total knee or hip replacement. Recruitment of all participating hospitals was completed in March 2003. Hospitals were required to participate in each of the 5 clinical areas only if they provided care for that condition. The project also required a minimum patient volume of 30 cases for inclusion. In phase 1 of the Premier HQID (2003–2006), the top 20% of hospitals received 1% to 2% bonuses in Medicare reimbursements. This design was criticized for rewarding only high-performing hospitals, with little incentive for poor-performing hospitals to improve. Therefore, the Premier HQID revised its incentive structure in 2006. In phase 2 (2006–2009), financial bonuses were additionally given to hospitals that significantly improved their performance. Hospitals could now qualify for bonuses in 3 ways: (1) performing in the top 20% of hospitals ("Top Performance Award"); (2) performing above the median level of performance in the current year and ranking in the top 20% in terms of improvement ("Improvement Award"); and (3) performing above the median level of performance for a composite quality score benchmark from 2 years prior ("Attainment Award"). Over the 6 years of the demonstration, the CMS awarded more than $60 million in financial bonuses, with almost $12 million in incentive payments in the final year.
The goal of this analysis was to examine the impact of the HQID incentive structure changes on adverse events after accounting for temporal trends toward improved outcomes in cardiac and orthopedic surgery. We used an econometric technique, the difference-in-difference approach, which is commonly used to evaluate the impact of policy changes. This approach isolates the impact of the policy change on outcomes above and beyond any changes seen in a control group that was not exposed to the policy change. In our analysis, we chose non-Premier hospitals in the SID as the control group because they were exposed to all other factors driving improved outcomes over time except participation in the Premier HQID.
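The core arithmetic of the difference-in-difference idea can be shown on unadjusted rates. The sketch below is a simplification: the analysis described here uses risk-adjusted regression rather than raw rates, and the function name and the example rates are hypothetical values chosen only to illustrate the calculation.

```python
def difference_in_difference(treated_pre: float, treated_post: float,
                             control_pre: float, control_post: float) -> float:
    """Change in the exposed group minus the change in the control group."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical complication rates, not results from the study.
estimate = difference_in_difference(
    treated_pre=0.030, treated_post=0.020,   # exposed hospitals
    control_pre=0.028, control_post=0.021,   # control hospitals
)
```

A negative estimate means the exposed group improved by more than the control group did over the same period; an estimate near zero means the improvement matched the background trend.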
Quality Measures
Our outcomes of interest were risk-adjusted inpatient mortality, inpatient complications, and serious inpatient complications. We used specific ICD-9-CM codes to identify inpatient complications, as previously validated by medical record review in The Complications Screening Program. The following postoperative complications were identified in our study: pulmonary failure (518.81, 518.4, 518.5, 518.8), pneumonia (481, 482.0–9, 483, 484, 485, 507.0), deep venous thrombosis/pulmonary embolism (415.1, 451.11, 451.19, 451.2, 451.81, 453.8), acute renal failure (584), hemorrhage (998.1), surgical site infection (958.3, 998.3, 998.5, 998.59, 998.51), gastrointestinal hemorrhage (530.82, 531.00–21, 531.40–41, 531.60–61, 532.00–21, 532.40–41, 532.60–61, 533.00–21, 533.40–41, 533.60–61, 534.00–21, 534.40–41, 534.60–61, 535.01, 535.11, 535.21, 535.31, 535.41, 535.51, 535.61, 578.9), and, for patients undergoing joint replacement only, myocardial infarction (410.00–91). The coding of surgical and medical complications, including those identified in our study, has been shown by others to agree well with the medical record. Myocardial infarction was not considered a complication after CABG because the temporal relationship of an acute myocardial infarction to the operation could not be assessed. Serious complications were defined as any of the aforementioned complications accompanied by a length of stay above the 75th percentile. The extended length of stay criterion was intended to increase the specificity of the outcome variable.
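The serious-complication definition combines two conditions, which a short sketch makes concrete. Several details are assumptions not stated in the text: the record field names, the population over which the 75th percentile is taken (here, whatever list is passed in), and the percentile method (the default of Python's `statistics.quantiles`).

```python
import statistics
from typing import Dict, List


def los_75th_percentile(lengths_of_stay: List[float]) -> float:
    """Third quartile of length of stay; the quantile method is an assumption."""
    return statistics.quantiles(lengths_of_stay, n=4)[2]


def flag_serious(patients: List[Dict], threshold: float) -> List[bool]:
    """Serious = any listed complication AND length of stay above the threshold."""
    return [
        bool(p["any_complication"]) and p["los_days"] > threshold
        for p in patients
    ]


# Hypothetical records for illustration only.
cohort = [
    {"los_days": 1, "any_complication": False},
    {"los_days": 2, "any_complication": False},
    {"los_days": 3, "any_complication": True},
    {"los_days": 4, "any_complication": False},
    {"los_days": 5, "any_complication": False},
    {"los_days": 6, "any_complication": False},
    {"los_days": 7, "any_complication": True},
    {"los_days": 8, "any_complication": False},
]
threshold = los_75th_percentile([p["los_days"] for p in cohort])
serious = flag_serious(cohort, threshold)
```

In this toy cohort only the patient with a complication and a 7-day stay is flagged; the complication with a 3-day stay and the 8-day stay without a complication are not.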
Statistical Analysis
All participating Premier HQID hospitals through the 6 years of the demonstration in the aforementioned 12 states were identified from the Premier Web site and included in the analysis. To perform the difference-in-difference analysis, we used the following logistic regression model to evaluate the relationship between patient outcomes Yit (inpatient mortality, complications, and serious complications) and the HQID incentive structure changes:
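The displayed equation is not reproduced in this text. A form consistent with the terms described in this section, with the coefficient labels and subscript conventions assumed here for illustration (i indexing patients, t indexing time), would be:

```latex
% Assumed notation: beta_0..beta_4 are illustrative coefficient labels,
% not taken from the source; beta_3 is the difference-in-difference estimator.
\operatorname{logit}\,\Pr(Y_{it} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Premier}_{i}
  + \beta_2\,\mathrm{Post}_{t}
  + \beta_3\,(\mathrm{Premier}_{i} \times \mathrm{Post}_{t})
  + \beta_4\,\mathrm{Time}_{t}
  + \theta X_{it}
```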
We included categorical variables indicating whether the patient was treated at a Premier hospital (Premier) and whether this treatment occurred before or after the incentive expansion (Post). Because the SID does not include dates of surgery, we used admission quarter to define whether patients had surgery in phase 1 (January 2003 to September 2006) or phase 2 (October 2006 to December 2009) of the program. To adjust for secular trends, we included a continuous time variable that accounts for linear time trends. In all models, we adjusted for patient characteristics (θXit) by entering the 29 Elixhauser comorbid diseases as individual covariates, a widely used and previously validated approach for risk adjustment in administrative data. Finally, we added an interaction term between the Premier (vs non-Premier) variable and the pre-post incentive change variable (Premier*Post). The coefficient of this interaction term, that is, the difference-in-difference estimator, can be interpreted as the independent impact of the Premier HQID incentive changes.
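The model covariates described above can be derived from hospital participation and admission quarter alone. The sketch below is a minimal illustration under stated assumptions: the function and field names are hypothetical, and measuring the continuous time variable in quarters since the first quarter of 2003 is an assumption, since the text does not give its unit.

```python
from typing import Dict

PHASE2_START = (2006, 4)  # October 2006 falls in the fourth quarter of 2006
STUDY_START_YEAR = 2003


def post_indicator(year: int, quarter: int) -> int:
    """1 for admissions in phase 2 (2006 Q4 onward), 0 for phase 1."""
    return int((year, quarter) >= PHASE2_START)


def time_index(year: int, quarter: int) -> int:
    """Quarters elapsed since 2003 Q1 (assumed unit for the linear trend)."""
    return (year - STUDY_START_YEAR) * 4 + (quarter - 1)


def design_row(premier: bool, year: int, quarter: int) -> Dict[str, int]:
    """Premier, Post, their interaction, and the time trend for one discharge."""
    post = post_indicator(year, quarter)
    return {
        "premier": int(premier),
        "post": post,
        "premier_x_post": int(premier) * post,  # difference-in-difference term
        "time": time_index(year, quarter),
    }
```

The interaction column equals 1 only for Premier hospital admissions in phase 2, which is why its coefficient captures the change specific to exposed hospitals after the redesign.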
We performed a sensitivity analysis examining the impact of the incentive change on the bottom 20% of hospitals. The incentive redesign in 2006 was aimed at motivating poor-performing hospitals to improve their performance. Hospitals in the bottom 20% of performance had the most room to improve and were the perceived targets of the expanded "Improvement Award." To establish which hospitals were in the bottom 20%, we used logistic regression to predict the probability of inpatient mortality, any complication, and serious complication for each patient, incorporating patient and hospital characteristics, for both CABG and joint replacement. We aggregated these data at the hospital level to calculate 6 distinct ratios of observed to expected outcome rates (inpatient mortality, complications, and serious complications for both procedures). The bottom 20% of hospitals for each adverse outcome of interest was isolated, and the difference-in-difference analysis described above was repeated in this subgroup. All statistical analyses were performed using Stata 11.0 (StataCorp, College Station, Texas).
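The hospital-level ranking step can be sketched as follows. This is an illustration, not the authors' Stata code: the predicted probabilities are taken as given rather than fitted, the names are hypothetical, and selecting the worst fifth by simple rank with the count rounded up is an assumption about how the bottom 20% was cut.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def observed_to_expected(
    records: Iterable[Tuple[str, int, float]]
) -> Dict[str, float]:
    """Per-hospital O/E ratio from (hospital_id, event, predicted_probability)."""
    observed: Dict[str, float] = defaultdict(float)
    expected: Dict[str, float] = defaultdict(float)
    for hospital_id, event, predicted in records:
        observed[hospital_id] += event        # count of actual adverse events
        expected[hospital_id] += predicted    # sum of patient-level predictions
    return {h: observed[h] / expected[h] for h in observed if expected[h] > 0}


def worst_fifth(ratios: Dict[str, float]) -> Set[str]:
    """Hospitals with the highest O/E ratios, that is, the poorest performers."""
    n_worst = math.ceil(len(ratios) / 5)
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return set(ranked[:n_worst])


# Hypothetical patient-level records for five hospitals.
records = [
    ("A", 1, 0.25), ("A", 0, 0.25),
    ("B", 0, 0.5), ("B", 0, 0.5),
    ("C", 1, 0.5), ("C", 0, 0.5),
    ("D", 0, 0.5), ("D", 1, 0.5), ("D", 0, 0.5), ("D", 0, 0.5),
    ("E", 1, 0.5), ("E", 1, 0.5), ("E", 0, 0.5),
]
ratios = observed_to_expected(records)
bottom_performers = worst_fifth(ratios)
```

A ratio above 1 means more adverse events than the case mix predicts, so the bottom 20% of performance corresponds to the top 20% of ratios. One such ranking would be produced for each of the 6 outcome and procedure combinations.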