Abstract
Background
The World Health Organization classification of myeloproliferative neoplasms discriminates between essential thrombocythemia and the prefibrotic phase of primary myelofibrosis. This discrimination is clinically relevant because essential thrombocythemia is associated with a favorable prognosis whereas patients with primary myelofibrosis have a higher risk of progression to myelofibrosis or blast crisis.
Design and Methods
To assess the reproducibility of the classification, six hematopathologists from five European countries re-classified 102 non-fibrotic bone marrow trephines, obtained because of sustained thrombocytosis.
Results
Consensus on histological classification defined as at least four identical diagnoses occurred for 63% of the samples. Inter-observer agreement showed low to moderate kappa values (0.28 to 0.57, average 0.41). The percentage of unclassifiable myeloproliferative neoplasms rose from 2% to 23% when minor criteria for primary myelofibrosis were taken into account. In contrast, the frequency of primary myelofibrosis dropped from 23% to 7%, indicating that the majority of patients with a histological diagnosis of primary myelofibrosis did not fulfill the complete criteria for this disease. Thus, over 50% of cases in this series either could not be reproducibly classified or fell into the category of unclassifiable myeloproliferative neoplasms.
Conclusions
World Health Organization criteria for discrimination of essential thrombocythemia from prefibrotic primary myelofibrosis are poorly to only moderately reproducible and lead to a higher proportion of non-classifiable myeloproliferative neoplasms than histology alone.
The classification of BCR-ABL-negative myeloproliferative neoplasms (MPN) has been a matter of debate for decades. The recent World Health Organization (WHO) classification discriminates between essential thrombocythemia (ET), primary myelofibrosis (PMF) and polycythemia vera (PV).1,2 It is the differentiation between the first two entities, both presenting with non-reactive thrombocytosis, which causes particular controversy.3–6 Fibrosis in PMF may follow a non-fibrotic cellular phase in a proportion of cases. This proliferation is clonal in nature7 and has previously been designated chronic megakaryocytic-granulocytic myelosis,8 cellular idiopathic myelofibrosis9 or cellular primary myelofibrosis.1,2 Three major and four minor criteria have been included in the WHO scheme to define PMF and to discriminate it from ET and other MPN. All three major and two of the minor criteria have to be fulfilled to diagnose PMF. PMF is characterized by the presence of megakaryocyte proliferation and atypia, usually accompanied by either reticulin and/or collagen fibrosis. In the absence of significant reticulin fibrosis, the megakaryocyte changes must be accompanied by increased bone marrow cellularity characterized by granulocytic proliferation and often decreased erythropoiesis (i.e., prefibrotic cellular-phase disease). Two of the following minor criteria have to be met in order to diagnose PMF: leukoerythroblastosis, increase in serum lactate dehydrogenase level, anemia, and palpable splenomegaly.1,2 Morphological criteria for ET require that bone marrow histology exhibits proliferation mainly of the megakaryocytic lineage with increased numbers of enlarged, mature megakaryocytes. 
There should be no significant increase or left-shift of neutrophilic granulopoiesis or erythropoiesis.1,2 Although single center studies were able to demonstrate a clinical significance of histology-based discrimination of ET and PMF,8 the reproducibility of the histological criteria has been questioned.4,6
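The combination rule described above (all three major criteria plus at least two of the four minor criteria) can be written as a short check. This is only a minimal sketch: the function name, the dictionary keys and the boolean encoding of the findings are hypothetical, and it does not decide how a case that fails the rule should be labeled.

```python
# Hypothetical keys for the four minor criteria named in the text.
MINOR_CRITERIA = (
    "leukoerythroblastosis",
    "increased_ldh",
    "anemia",
    "palpable_splenomegaly",
)


def meets_who_pmf_rule(major_fulfilled: int, minor_findings: dict) -> bool:
    """Return True if all three major and at least two minor criteria are met.

    major_fulfilled: number of major criteria fulfilled (0 to 3).
    minor_findings: maps a minor-criterion key to True/False; a missing key
    is treated as "not fulfilled" (an assumption made for this sketch).
    """
    minor_count = sum(bool(minor_findings.get(name, False)) for name in MINOR_CRITERIA)
    return major_fulfilled == 3 and minor_count >= 2
```

For example, `meets_who_pmf_rule(3, {"anemia": True})` is false: typical histology with only one minor criterion does not satisfy the complete rule.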
In a recent very detailed study, 16 histopathological criteria such as cellularity and megakaryocyte morphology were analyzed for interobserver agreement and utility for identifying ET and prefibrotic PMF.4 Substantial interobserver variability was found for all criteria with the exception of reticulin grade.4
The European Bone Marrow Working Group initiated the present study on the interobserver agreement in classification of MPN based on the WHO criteria because: (i) this classification is broadly applied; (ii) in contrast to the newly proposed lymphoma classifications10,11 it has never been evaluated for feasibility in a multicenter setting; (iii) it introduces minor criteria which were not included in the retrospective studies by Thiele and co-workers, who inaugurated the concept of PMF;9,12 and (iv) its practical utility has been questioned.3,4,6
Five European centers participated in this study. In contrast to previous studies,4,6 cases with overt fibrosis were not included. Also, a complete classification, combining histological and clinical data, was tested for reproducibility, rather than single histological parameters. Furthermore, neither obvious opponents nor authors of the MPN chapter of the WHO classification were involved. The review panel consisted of experienced hematopathologists, who apply the WHO criteria in their everyday practice. No follow-up data with regard to progression to myelofibrosis were available for the study cases. It was beyond the scope of this study to evaluate the WHO criteria for their ability to predict myelofibrosis.
One hundred and two bone marrow biopsies from patients with non-reactive thrombocytoses were evaluated. In the first round only age and platelet counts were known to the panel members; in the second round clinical data on leukoerythroblastosis, serum lactate dehydrogenase level, hemoglobin, and spleen size were added and a revised diagnosis integrating clinical data was rendered.
Design and Methods
Design of the study
Six hematopathologists from five European countries, experienced in bone marrow histology but not involved in the WHO classification of MPN, rendered independent diagnoses of ET, PMF, or MPN unclassifiable (MPNuc) on a set of 102 bone marrow trephines. Cases which were considered in consensus (4/6) as inconclusive for MPN were excluded from the study (n=6). When the diagnosis “inconclusive for MPN” was rendered by fewer than four panel members this was counted as a diagnosis of unclassifiable MPN. In the first round the diagnosis was based on morphology alone, and in the second round it included the knowledge of further clinical and molecular data. Of the six pathologists, five contributed at least 20 cases.
The inclusion criteria were as follows: (i) biopsy taken in the period January 1, 2009 to December 31, 2009; (ii) clinical suspicion of MPN because of sustained thrombocytosis; (iii) histopathological diagnosis of MPN (either ET or PMF); (iv) no fibrosis or fibrosis grade 1; (v) biopsy size of ≥1 cm. Consecutive cases were included without further selection beyond the above-mentioned criteria (i) to (v).
In the majority of cases information on spleen size, blood picture, JAK2 status, lactate dehydrogenase concentration, and blood cell counts was available. In the other cases only incomplete data were reported with one or more of the parameters lacking (n=31). These cases were excluded from the second round of consensus assessment. Due to anonymization of study patients the missing clinical data could not be retrieved retrospectively. The clinical data of the study population are summarized in Table 1.
From each block, five sets of stained slides were produced, including slides stained with hematoxylin-eosin, Giemsa and Gomori-silver stains and an unstained section for individual use. Slides were sent to the participants together with information on gender, age and platelet count. After morphological diagnoses had been rendered, further clinical information on spleen size, blood picture, JAK2 status, lactate dehydrogenase concentration, blood cell counts and differential blood picture was distributed, and the participants then modified the diagnoses according to WHO minor criteria for PMF. For the diagnosis of fibrotic as well as prefibrotic PMF three out of three major criteria and at least two out of four minor criteria (anemia, leukoerythroblastic blood picture, increased serum lactate dehydrogenase, splenomegaly) have to be fulfilled.1 A full set of data was available for 66 cases. A consensus for a diagnosis was defined as four identical diagnoses (66.7%) out of the six that were given to each biopsy in both rounds.
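The consensus definition used here, together with the handling of "inconclusive for MPN" votes described in the study design, can be sketched as follows. The function name, the string labels and the return values `"excluded"` and `None` are hypothetical choices made for illustration; the study itself does not describe any software for this step.

```python
from collections import Counter


def consensus_diagnosis(diagnoses, threshold=4):
    """Return the consensus label for one biopsy, or None if there is none.

    diagnoses: the six labels given to one biopsy, each one of
    "ET", "PMF", "MPNuc" or "inconclusive" (hypothetical encoding).
    """
    counts = Counter(diagnoses)
    # Cases judged inconclusive for MPN in consensus were excluded from the study.
    if counts["inconclusive"] >= threshold:
        return "excluded"
    # Fewer than `threshold` inconclusive votes are counted as MPNuc.
    recoded = ["MPNuc" if d == "inconclusive" else d for d in diagnoses]
    label, votes = Counter(recoded).most_common(1)[0]
    return label if votes >= threshold else None
```

With six observers and a threshold of four, at most one label can reach consensus, so checking only the most frequent label is sufficient. A 3:3 split between ET and PMF returns `None`.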
Interobserver agreement was assessed using the kappa statistic, which takes into account the agreement expected solely on the basis of chance and can be used if more than two categories are classified. Total agreement is indicated by a value of 1.0, but agreement by chance only results in a zero value.
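For orientation, the unweighted form of the statistic for one pair of observers A and B can be written as below. The symbols are introduced here for illustration: p_o is the observed proportion of cases with identical diagnoses, and p_{A,k} and p_{B,k} are the proportions of cases that each observer assigned to category k. The study reports a weighted kappa, and its weighting scheme is not given in the text, so this is a simplified sketch rather than the exact quantity that was computed.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_{A,k}\, p_{B,k}
```

The expression equals 1 when the observers agree on every case (p_o = 1) and 0 when the observed agreement equals the chance agreement (p_o = p_e), matching the two reference values given above.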
Although there is no generally accepted value of kappa that indicates sufficient (i.e. good) agreement in the literature, Landis and Koch13 suggested the following guidelines: a kappa less than 0.4 represents poor-to-fair agreement, a kappa of 0.4 – 0.6 indicates moderate agreement, from 0.6 – 0.8 indicates substantial agreement and greater than 0.8 indicates almost perfect agreement. To measure the grade of agreement, a weighted kappa-statistic was calculated using the statistical software package SAS, version 8.0. Below 0.1 was considered to indicate no agreement.
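As an illustration of how a pairwise kappa value and its interpretation band could be obtained, the sketch below computes an unweighted Cohen's kappa in plain Python. It is not the SAS procedure used in the study, it omits the weighting, and the function names are hypothetical. The handling of values falling exactly on 0.6 or 0.8 is an assumption, since the bands quoted above share these boundaries.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Map a kappa value to the bands quoted in the text."""
    if kappa < 0.1:
        return "no agreement"
    if kappa < 0.4:
        return "poor to fair"
    if kappa <= 0.6:  # assumption: 0.6 itself is counted as moderate
        return "moderate"
    if kappa <= 0.8:  # assumption: 0.8 itself is counted as substantial
        return "substantial"
    return "almost perfect"
```

Under these bands, `agreement_band(0.41)` returns `"moderate"` and `agreement_band(0.28)` returns `"poor to fair"`.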
The study was approved by the Ethics Committee of the Medizinische Hochschule Hannover.
The participants of the trial chose between four differential diagnoses: ET, PMF (prefibrotic or cellular phase), MPNuc, inconclusive for MPN.
In the first round of assessment only 3% of cases received four different diagnoses, indicating total disagreement. Three different diagnoses were given in 18%. The majority of cases (79%) were uniformly categorized or had only two different diagnoses (Table 2). These numbers improved to 0% and 87%, respectively, after more clinical data were incorporated to apply the WHO minor criteria for PMF (Table 2).
The most frequent diagnosis agreed upon by at least four participants was ET (37%), followed by PMF (24%). When minor PMF criteria were considered, the frequency of MPNuc rose to 23%, while PMF became less frequent (7%).
Complete consensus (100%) from all six hematopathologists concurring on a single morphological interpretation was observed in only 10% of cases. This figure increased to 17% when more detailed clinical data were considered. Representative consensus cases of ET and PMF categories are shown in Figure 1. At a consensus level of 83.3% (the same diagnosis by five out of the six hematopathologists) the figures were 35% in the first round and 32% in the second round, respectively. When consensus was defined by at least four out of the six hematopathologists coming to identical conclusions, consensus was reached in 61% and 64% of cases, respectively (Table 3).
In 12 cases, half of the panel members gave a morphological diagnosis of ET and half a diagnosis of PMF. Representative examples are illustrated in Figure 1. These cases were characterized by a slightly increased cellularity, a range of megakaryocytic size but no dense cluster formation and a lower grade of megakaryocytic pleomorphism. The number of ambiguous cases decreased from 12 to four after clinical data were taken into consideration, but then a comparably high number of divergent diagnoses was seen between ET and MPNuc (14%). The percentage of MPN cases considered by at least four participants as unclassifiable rose from 2% to 23% after inclusion of minor criteria for PMF, whereas the percentage of cases with a firm PMF diagnosis dropped from 24% to 7%. Thus, 71% of cases which would have been classified as PMF exclusively on the ground of bone marrow histology did not fulfill the complete WHO criteria.
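The 71% figure follows from the two consensus frequencies for PMF reported above, taken as the relative drop from the first to the second round:

```latex
\frac{24\% - 7\%}{24\%} = \frac{17}{24} \approx 0.71
```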
When more clinical data became available, no cases remained as inconclusive for MPN by consensus diagnosis (four out of six participants). Based on histology alone, 6% of cases fell into this category and were not considered further for the calculation of consensus.
The concordance of diagnoses between pathologists was measured by the kappa statistic. The agreement between the diagnoses from each of the hematopathologists was determined, leading to 15 different kappa values. Interobserver variability in the first round showed low to moderate kappa values (0.28 to 0.57, average 0.41). Moderate concordance rates were seen in six out of 15 possible paired combinations between observers, whose agreement was compared (40% of all kappa values). Poor concordance rates occurred in 60%. After consideration of clinical data no concordance at all emerged for two out of 15 possible paired combinations between observers (13% of all kappa values) (Table 4).
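The count of 15 kappa values is the number of distinct observer pairs that can be formed from six observers, and the percentages quoted are shares of those 15 pairs. A minimal sketch of this arithmetic follows; the observer labels and the helper name `share` are placeholders, not identifiers from the study.

```python
from itertools import combinations

observers = ["A", "B", "C", "D", "E", "F"]  # placeholder labels for the six panel members
pairs = list(combinations(observers, 2))    # every unordered pair of observers
n_pairs = len(pairs)                        # 6 * 5 / 2 = 15


def share(count):
    """Percentage of the observer pairs, rounded to a whole number."""
    return round(100 * count / n_pairs)
```

Here `share(6)` gives 40 and `share(2)` gives 13, matching the percentages in the text, and the remaining nine pairs account for the 60% with poor concordance.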
Since Dameshek’s seminal description of the chronic myeloproliferative diseases as a group of interrelated diseases14 there has been continuous controversy on how to subtype the disorders which fall into this category. The discovery of the JAK2 mutation15,16 confirmed that these diseases not only share clinical and histopathological features but are even more closely related because they have a common molecular abnormality and hence most likely also a shared pathogenesis. Consequently, these diseases were grouped together as MPN by the WHO classification.1,2 Despite considerable overlap not all MPN exhibit the same risk of progression to either myelofibrosis or acute leukemia. In particular Thiele et al. emphasized that the risk of myelofibrosis is intrinsic to a special subtype of MPN, which in its early stages is not fibrotic.9,12 The early stage was named the cellular phase of idiopathic myelofibrosis and later the name was changed to prefibrotic PMF.1,9 A number of histological features of the bone marrow, such as pleomorphism and clustering of megakaryocytes, help to distinguish early PMF from ET, which is said not to exhibit a tendency to progress to overt fibrosis or blast crisis.9 The WHO classification has adopted the morphological criteria and combined them with clinical criteria (minor criteria) to establish the diagnosis of PMF.1,2 Such a combination of criteria was not used in the purely histological studies by Thiele et al.9,12 and to the best of our knowledge has not been tested in clinical trials. This approach has been challenged in particular by the groups involved in the largest prospective therapy study on ET so far.4,5
When the diagnoses rendered by six hematopathologists in cases which fall into the category of either ET or PMF were compared, it became obvious that the interobserver variability is high and only low to moderate kappa values were achieved (Table 4). Only 13% of cases were interpreted in complete consensus. At least four out of six pathologists concurred in roughly two thirds of cases. Whereas the subclassification is obviously a matter of subjectivity, the diagnosis of MPN as such appears to be much more reproducible. “Inconclusive for MPN” was diagnosed either in consensus (six cases before more detailed clinical data were given, Table 3) or not at all (data not shown).
The current study shows that a considerable proportion of cases which would have been categorized as cellular idiopathic myelofibrosis or prefibrotic PMF on the grounds of histological criteria alone became unclassifiable when two minor criteria were also taken into consideration. Many pathologists, including those who apply the WHO criteria, may not be aware of the fact that a clear-cut case of PMF on the basis of histology cannot be diagnosed histologically as such according to the WHO criteria if the minor criteria are either not fulfilled or are not known. Therefore, more than 50% of PMF cases diagnosed morphologically by at least four out of six panel members became MPNuc in this study when the WHO criteria were applied. A similar figure was reported by Campbell et al.17
In pathology the typing and subtyping of most diseases should have a high degree of interobserver reliability. Classifications which cannot guarantee this reliability must be reconsidered. In contrast to typing and subtyping, grading is known to be more prone to subjectivity with a lower degree of reproducibility but it has the capacity to roughly reflect differences in biology. Histomorphology would, therefore, be expected to reflect biological differences between thrombocytic MPN more adequately if a grading scheme were to be applied.
A possible scheme is proposed which would discriminate between PV and ET within BCR-ABL-negative MPN. Both diseases can progress to myelofibrosis. The likelihood of ET doing this could be indicated by grading, paralleling the approach to PV. For example ET grade 1 could stand for “true” ET without an increase in reticulin, and ET grade 2 for cases with pleomorphic megakaryocytes and/or granulocytic proliferation and/or grade 1 fibrosis (corresponding to prefibrotic or cellular PMF). Grade 3 ET would comprise those bone marrow samples which in addition to the criteria for ET grade 2 exhibit overt fibrosis. The advantage of such a classification would be that it fits better with the overlapping findings of JAK2 and MPL mutations,18 and is in concordance with the approach to PV, in which it is recognized that some cases of PMF may follow undiagnosed PV. We suggest reserving the diagnosis of PMF for those cases with grade 3 fibrosis in the bone marrow and which fulfill at least two of the four minor criteria required for PMF by the WHO classification. In this way, the emotively loaded term “myelofibrosis” will be reserved for clinically relevant cases.
Although there seems to be a limited reproducibility of the histological category of PMF, differences in histology are paralleled by constant variations in gene expression between ET and PMF.18–21 The histological distinction, which is widely accepted,8,9,22 does, therefore, appear to have a biological basis and reflects differences in the propensity to advance to myelofibrosis.
This is confirmed by recent studies.23,24 In one of these studies,23 carried out in two centers, a kappa value of 0.626 (substantial) was achieved when only morphology was considered. This finding indicates that by training and consensus panels the inter-observer agreement can potentially be improved from moderate to substantial. In the current study no consensus conference was organized. Consensus conferences are not usually part of the everyday practice of well-trained hematopathologists whose work was intended to be reflected in this study. Furthermore, two of the panelists were affiliated with the same institution. Interestingly, the level of concordance between them did not differ significantly from that found among the other panelists. This finding suggests that consensus conferences might lead to higher concordance rates which, however, will most likely not persist under routine diagnostic circumstances. However, as close as an experiment on reproducibility of diagnoses may come to reality, it is not identical to the diagnostic situation in real life because of the lack of therapeutic impact for patients and the artificially high frequency of similar cases. Consequently, the figures on consensus cannot be extrapolated directly to clinical practice.
In one of the studies reporting substantial concordance rates between two centers on the basis of histology alone, no patient in a series of 646 cases classified as having early PMF had a leukoerythroblastic blood picture, more than 50% of patients had no anemia, roughly 50% of patients did not have splenomegaly, and only a slight median increase of lactate dehydrogenase was documented.23 From this compilation of data alone it can be concluded, with high likelihood, that in this study minor WHO criteria for PMF were not fulfilled in a considerable proportion of patients with early PMF.23 Accordingly, the data suggest that strict adherence to the WHO criteria in this study would have led to a higher proportion of MPNuc cases, as seen in our study, a situation which is most unsatisfactory for clinical decision-making.
The WHO classification scheme, which leads to a high proportion of unclassifiable cases (23%) and causes discrepant diagnoses in at least 36% of cases, may thus be considered inadequate for routine clinical use or stratification in prospective therapeutic trials. Until accurate, prospective molecular markers for progression to myelofibrosis are available, pathologists and hematologists should be aware of the potential prognostic value but also the limited reproducibility of the currently applied histological classification.
We gratefully acknowledge the support of Harald Choritz and Britta Wiese in the statistical analysis of kappa-values. We thank Drs. Jürgen Thiele, Jim Vardiman and Guntram Büsche for useful discussions.
- Authorship and Disclosures The information provided by the authors about contributions from persons listed as authors and in acknowledgments is available with the full text of this paper at www.haematologica.org.
- Financial and other disclosures provided by the authors using the ICMJE (www.icmje.org) Uniform Format for Disclosure of Competing Interests are also available at www.haematologica.org.
- Received July 4, 2011.
- Revision received October 25, 2011.
- Accepted October 25, 2011.
- Swerdlow SH, Campo E, Harris NL, Jaffe ES, Pileri SA, Stein H. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. Lyon: IARC; 2008.
- Tefferi A, Thiele J, Orazi A, Kvasnicka HM, Barbui T, Hanson CA. Proposals and rationale for revision of the World Health Organization diagnostic criteria for polycythemia vera, essential thrombocythemia, and primary myelofibrosis: recommendations from an ad hoc international expert panel. Blood. 2007; 110(4):1092-7. https://doi.org/10.1182/blood-2007-04-083501
- Spivak JL, Silver RT. The revised World Health Organization diagnostic criteria for polycythemia vera, essential thrombocytosis, and primary myelofibrosis: an alternative proposal. Blood. 2008; 112(2):231-9. https://doi.org/10.1182/blood-2007-12-128454
- Wilkins BS, Erber WN, Bareford D, Buck G, Wheatley K, East CL. Bone marrow pathology in essential thrombocythemia: interobserver reliability and utility for identifying disease subtypes. Blood. 2008; 111(1):60-70. https://doi.org/10.1182/blood-2007-05-091850
- Campbell PJ, Bareford D, Erber WN, Wilkins BS, Wright P, Buck G. Reticulin accumulation in essential thrombocythemia: prognostic significance and relationship to therapy. J Clin Oncol. 2009; 27(18):2991-9. https://doi.org/10.1200/JCO.2008.20.3174
- Brousseau M, Parot-Schinkel E, Moles MP, Boyer F, Hunault M, Rousselet MC. Practical application and clinical impact of the WHO histopathological criteria on bone marrow biopsy for the diagnosis of essential thrombocythemia versus prefibrotic primary myelofibrosis. Histopathology. 2010; 56(6):758-67. https://doi.org/10.1111/j.1365-2559.2010.03545.x
- Kreipe H, Jaquet K, Felgner J, Radzun HJ, Parwaresch MR. Clonal granulocytes and bone marrow cells in the cellular phase of agnogenic myeloid metaplasia. Blood. 1991; 78(7):1814-7.
- Georgii A, Buhr T, Buesche G, Kreft A, Choritz H. Classification and staging of Ph-negative myeloproliferative disorders by histopathology from bone marrow biopsies. Leuk Lymphoma. 1996; 22(Suppl 1):15-29. https://doi.org/10.3109/10428199609074357
- Thiele J, Kvasnicka HM. Diagnostic differentiation of essential thrombocythaemia from thrombocythaemias associated with chronic idiopathic myelofibrosis by discriminate analysis of bone marrow features--a clinicopathological study on 272 patients. Histol Histopathol. 2003; 18(1):93-102.
- A clinical evaluation of the International Lymphoma Study Group classification of non-Hodgkin’s lymphoma. Blood. 1997; 89(11):3909-18.
- Naresh KN, Agarwal B, Nathwani BN, Diebold J, McLennan KA, Muller-Hermelink KH. Use of the World Health Organization (WHO) classification of non-Hodgkin’s lymphoma in Mumbai, India: a review of 200 consecutive cases by a panel of five expert hematopathologists. Leuk Lymphoma. 2004; 45(8):1569-77. https://doi.org/10.1080/10428190410001683679
- Thiele J, Kvasnicka HM. Chronic myeloproliferative disorders with thrombocythemia: a comparative study of two classification systems (PVSG, WHO) on 839 patients. Ann Hematol. 2003; 82(3):148-52.
- Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33(1):159-74. https://doi.org/10.2307/2529310
- Dameshek W. Some speculations on the myeloproliferative syndromes. Blood. 1951; 6(4):372-5.
- James C, Ugo V, Le Couédic JP, Staerk J, Delhommeau F, Lacout C, Garçon L. A unique clonal JAK2 mutation leading to constitutive signalling causes polycythaemia vera. Nature. 2005; 434(7037):1144-8. https://doi.org/10.1038/nature03546
- Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg JR. A gain-of-function mutation of JAK2 in myeloproliferative disorders. N Engl J Med. 2005; 352(17):1779-90. https://doi.org/10.1056/NEJMoa051113
- Campbell PJ, Bareford D, Erber WN, Wilkins BS, Wright P, Buck G. Reply to J. Thiele, et al. J Clin Oncol. 2009; 27(18):e222-e223. https://doi.org/10.1200/JCO.2009.24.5233
- Hussein K, Bock O, Theophile K, von Neuhoff N, Buhr T, Schlué J. JAK2(V617F) allele burden discriminates essential thrombocythemia from a subset of prefibrotic-stage primary myelofibrosis. Exp Hematol. 2009; 37(10):e7-1193. https://doi.org/10.1016/j.exphem.2009.07.005
- Florena AM, Tripodo C, Di Bernardo A, Iannitto E, Guarnotta C, Porcasi R. Different immunophenotypical apoptotic profiles characterise megakaryocytes of essential thrombocythaemia and primary myelofibrosis. J Clin Pathol. 2009; 62(4):331-8. https://doi.org/10.1136/jcp.2007.054353
- Muth M, Engelhardt BM, Kröger N, Hussein K, Schlué J, Büsche G. Thrombospondin-1 (TSP-1) in primary myelofibrosis (PMF) - a megakaryocyte-derived biomarker which largely discriminates PMF from essential thrombocythemia. Ann Hematol. 2011; 90(1):33-40. https://doi.org/10.1007/s00277-010-1024-z
- Muth M, Büsche G, Bock O, Hussein K, Kreipe H. Aberrant proplatelet formation in chronic myeloproliferative neoplasms. Leuk Res. 2010; 34(11):1424-9. https://doi.org/10.1016/j.leukres.2010.03.040
- Florena AM, Tripodo C, Iannitto E, Porcasi R, Ingrao S, Franco V. Value of bone marrow biopsy in the diagnosis of essential thrombocythemia. Haematologica. 2004; 89(8):911-9.
- Thiele J, Kvasnicka HM, Müllauer L, Buxhofer-Autsch V, Gisslinger B, Gisslinger H. Essential thrombocythemia versus early primary myelofibrosis – a multicenter study to validate WHO classification. Blood. 2011; 117(21):5710-8. https://doi.org/10.1182/blood-2010-07-293761
- Barbui T, Thiele J, Passamonti F, Rumi E, Boveri E, Ruggeri M. Survival and disease progression in essential thrombocythemia are significantly influenced by accurate morphologic diagnosis: an international study of 1,104 patients. J Clin Oncol. 2011; 29(23):3179-84. https://doi.org/10.1200/JCO.2010.34.5298