Evaluation of AI colon biopsy screening tool errors

Background

Lenses and microscopes

The use of microscopy has been pivotal to the diagnosis of disease for hundreds of years. The earliest evidence for the use of magnification dates back to the 5th century BC, followed by the use of simple microscopes and of lenses in spectacles in the 13th century AD. Compound microscopes appeared in Europe in the early 17th century and enabled the discovery of microorganisms, reported by van Leeuwenhoek in 1676, as well as the formulation of Koch’s postulates (the criteria to establish a causal relationship between a microbe and a disease) in 1884. The microscope remained largely in the same form until the late 19th and early 20th centuries, when the use of a condensing lens was better understood and electric light became available as a light source.

Stains

The advances in microscopy allowed the development of histopathology into the science as we understand it today (‘the microscopic examination of tissue to study the manifestation of disease’), where very thin sections of biopsy material are fixed to slides for examination. Hematoxylin and eosin (H&E) staining, introduced in the late 1800s, allowed for better visualization of the nucleus (purple/blue) and cytoplasm (pink) in cells and is still the most commonly used stain. Developed in the late 20th century, techniques such as immunohistochemistry and fluorescence microscopy (using fluorescently labelled antibodies) and in situ hybridization have allowed the identification of precise antigens and DNA or RNA molecules, respectively.

Although used in many diagnostic and prognostic situations (such as infections and inflammatory diseases), histopathology is perhaps seen as most crucial in the diagnosis, staging and prognosis of cancer.

Digital pathology and whole slide imaging

Digital pathology grew out of early forays into telemedicine, where microscope-mounted cameras were used to take static photographs of slides to allow image sharing and remote analysis. Digital pathology as we know it now really developed in the early 2000s with the advent of whole slide imaging scanners that create digital images. There are many benefits of digital pathology, which have been discussed in earlier articles [1–3]. These include easier storage (no broken slides, fading stains or bulky physical storage archives), more efficient integration into laboratory information systems and patient records and, of course, the original goal of telepathology and remote image analysis. Clinical and diagnostic benefits include greater accuracy and efficiency from high-resolution images that allow zooming in to inspect details; Z-stacking, which shows the different focal planes along the vertical axis of the tissue and so provides the pathologist with a greater spatial awareness of the features; and the multiplexing of different markers on the same slide.

Artificial intelligence (AI)

One of the important advantages of image digitization is, of course, the opportunity that this creates for objective quantification of data using image analysis tools. Image segmentation, which partitions the image into different segments (regions or objects), combined with classification algorithms, can automatically identify and quantify features such as tumour necrosis, lymphovascular invasion and mitotic figure counts per unit area of tumour, characterize the full extent of fibrosis, and provide an estimation of molecular biomarkers such as mutated genes, tumour mutational burden or transcriptional changes. The automation of such tasks, usually done manually by a pathologist, increases standardization and pathologist throughput. This improves turnaround times, can increase accuracy and releases the pathologist from routine tasks to concentrate on the more difficult, equivocal decisions. For the patient, this kind of analysis can affect clinically impactful measures such as tumour grade, treatment options and prognosis. These automated analyses are built on neural networks and machine learning (which form the basis of AI), and are already being used in practice.

Digital pathology enables easier storage and information retrieval, tracking and sharing (Adobe Stock)

Evaluation of AI image analysis tools

In order for the benefits of AI to be realized in health care, it must be effective and safe. To date, most assessment of AI tools has been retrospective and has concentrated on diagnostic accuracy, which is high. However, data on the clinical impact of the error rates are often not reported [4]. A recent study tested the IGUANA (Interpretable Gland-Graphs using a Neural Aggregator) algorithm, which examines H&E whole slide images (WSIs) of colon and rectal biopsies and classifies them into normal or abnormal categories [5]. The results showed that the algorithm had a high specificity at high-sensitivity cut-off values but, nevertheless, errors were made. The same team has also now investigated the potential clinical impact of the errors that the algorithm made [6].
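To make the phrase "specificity at a high-sensitivity cut-off" concrete, the following is a minimal sketch in Python. It is not the method used in the IGUANA study: the function name, the scores and the labels are all hypothetical, and the toy data are chosen only so that the arithmetic is easy to follow. The idea is to pick the highest score threshold that still reaches a target sensitivity and then report how many normal slides fall below it.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) for the highest
    threshold whose sensitivity meets the target.

    scores: hypothetical predicted probability of "abnormal" per slide.
    labels: ground truth, 1 = abnormal, 0 = normal.
    A slide is called abnormal when its score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one abnormal and one normal slide")

    # Walk candidate thresholds from highest to lowest; the first one
    # that reaches the target sensitivity gives the best specificity.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(s >= threshold for s in positives)
        sensitivity = true_pos / len(positives)
        if sensitivity >= target_sensitivity:
            true_neg = sum(s < threshold for s in negatives)
            specificity = true_neg / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity not reachable")


# Toy data (invented for illustration): five abnormal, five normal slides.
scores = [0.95, 0.90, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(specificity_at_sensitivity(scores, labels, 0.8))  # (0.6, 0.8, 0.8)
print(specificity_at_sensitivity(scores, labels, 1.0))  # (0.3, 1.0, 0.6)
```

In this toy example, demanding that every abnormal slide is caught pushes the threshold down and lowers specificity from 0.8 to 0.6, which is the trade-off that a screening tool operating at a high-sensitivity cut-off has to manage.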

The original data set consisted of 5054 WSIs from 2080 patients, and 42% of the WSIs were classed as abnormal. Investigation showed that IGUANA had made false negative (FN) errors on 220 WSIs (4.4% of WSIs; 7.9% of patient cases). A smaller number of false positive (FP) errors were made. The FP errors were deemed less problematic in terms of patient health impact, as these slides would be reviewed by a pathologist and the correct diagnosis reached. However, FN errors would result in a stop in the patient care pathway, creating a high-risk situation where a patient who needed further investigation and/or treatment would not receive it, causing a delayed or missed diagnosis and treatment, potentially resulting in serious patient harm or death. Analysis of 218 FN WSIs showed which diagnoses IGUANA had classified as normal. The most common features misclassified were: acute/active inflammation (36 WSIs); chronic inflammation (32); active chronic inflammation (22); hyperplastic polyp (20); low-grade tubular adenoma (12); collagenous colitis and hyperplastic polyp (goblet cell rich variant/no or no serrations) (both 11); acute/active inflammation (with granuloma), lymphocytic colitis, and within normal histological limits (WNHL)/subtle abnormality pathologically insignificant (SAPI) (all 10); with other missed features occurring in single digits.
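The figures in this paragraph can be checked with a few lines of arithmetic. The Python sketch below uses only the counts quoted above; the variable and function names are illustrative, and the "other" total assumes that each analysed FN WSI is counted under exactly one feature, which the text does not state.

```python
def percentage(count, total):
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


TOTAL_WSIS = 5054        # whole slide images in the data set
FN_WSIS = 220            # WSIs with a false negative error
ANALYSED_FN_WSIS = 218   # FN WSIs whose diagnoses were analysed

# WSI counts for the features named individually in the text.
named_features = {
    "acute/active inflammation": 36,
    "chronic inflammation": 32,
    "active chronic inflammation": 22,
    "hyperplastic polyp": 20,
    "low-grade tubular adenoma": 12,
    "collagenous colitis": 11,
    "hyperplastic polyp (goblet cell rich variant)": 11,
    "acute/active inflammation (with granuloma)": 10,
    "lymphocytic colitis": 10,
    "WNHL/SAPI": 10,
}

wsi_fn_share = percentage(FN_WSIS, TOTAL_WSIS)
named_total = sum(named_features.values())
# Assumption: one feature per WSI, so the remainder is the single-digit group.
other_total = ANALYSED_FN_WSIS - named_total

print(f"{wsi_fn_share:.1f}%")    # 4.4%
print(named_total, other_total)  # 174 44
```

Note that the 4.4% figure is the share of all WSIs, not the miss rate among abnormal WSIs only; the two use different denominators and should not be compared directly.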

Next, the level of harm that would have resulted had these FNs occurred in clinical practice was assigned on a five-level scale: 1, no harm (88.4% of patients); 2, minimal harm (7.8%); 3, minor harm (2.3%); 4, moderate harm (0.8%); and 5, major harm (0.8%). Of the 15 cases where patient harm would have resulted (categories 2–5), the impact on patient management was assigned as delayed diagnosis (2 cases); delayed surveillance (2); and delayed treatment (11).
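As a quick consistency check on these figures, the short Python sketch below tallies the harm categories and the management impacts. It uses only the numbers quoted in this paragraph; the dictionary names are illustrative and no underlying patient counts are assumed.

```python
# Percentage of patients per harm level, as quoted in the text.
harm_levels = {
    1: ("no harm", 88.4),
    2: ("minimal harm", 7.8),
    3: ("minor harm", 2.3),
    4: ("moderate harm", 0.8),
    5: ("major harm", 0.8),
}

# Management impact for the cases with any harm (levels 2-5).
management_impact = {
    "delayed diagnosis": 2,
    "delayed surveillance": 2,
    "delayed treatment": 11,
}

# Share of patients in any harm category (levels 2-5).
harmed_share = round(
    sum(pct for level, (_, pct) in harm_levels.items() if level >= 2), 1
)
# Number of harmed cases implied by the management breakdown.
harmed_cases = sum(management_impact.values())
# Total across all levels; rounded inputs need not sum to exactly 100.
total_share = round(sum(pct for _, pct in harm_levels.values()), 1)

print(harmed_share)  # 11.7
print(harmed_cases)  # 15
print(total_share)   # 100.1
```

The management impacts add up to the 15 harmed cases stated in the text. The percentages sum to 100.1 rather than 100, which is consistent with each value having been rounded to one decimal place.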

The authors also discuss the reasons why IGUANA made the FN errors, as well as the reasons why an FN error did not always lead to patient harm. They also discuss ways of improving the evaluation and the training of AI tools.

They conclude by saying that:

… even with a 4.4% WSI FN error rate, or 7.9% case-level FN error rate, this AI tool might be more suitable for adoption than this statistic portrays. It highlights that simply reporting error rates without addressing the clinical impact could lead to misrepresentation of AI tool safety.

Automated image analysis can greatly assist the human pathologist, but the adoption of these technologies is not always easy. Above all, there must be confidence that any errors are not going to cause patient harm. The paper by Evans et al. [6] demonstrates a clinically relevant way of investigating errors that goes beyond simply reporting the statistical error rate, and shows that the error rate alone does not provide the full picture.

Mildly inflamed colonic mucosa, 20× (Adobe Stock)

References
1. Williams B. Digital pathology in the clinical lab. CLI 2020/2021;Dec/Jan:6–7 (https://clinlabint.com/digital-pathology-in-the-clinical-lab/).
2. Abbey B, Parker BS. Colorimetric histology: benefits, opportunities and future directions. CLI 2021/2022;Dec/Jan:6–9 (https://clinlabint.com/colorimetric-histology-benefits-opportunities-and-future-directions/).
3. Yousif M, McClintock DS, Yao K. Artificial intelligence is the key driver for digital pathology adoption. CLI 2020/2021;Dec/Jan:8–11 (https://clinlabint.com/artificial-intelligence-is-the-key-driver-for-digital-pathology-adoption/).
4. McGenity C, Clarke EL, Jennings C, Matthews G, Cartlidge C et al. Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy. NPJ Digit Med 2024;7(1):114 (https://doi.org/10.1038/s41746-024-01106-8).
5. Graham S, Minhas F, Bilal M, Ali M, Tsang YW et al. Screening of normal endoscopic large bowel biopsies with interpretable graph learning: a retrospective study. Gut 2023;72(9):1709–1721 (https://doi.org/10.1136/gutjnl-2023-329512).
6. Evans H, Sivakumar N, Bhanderi S, Graham S, Snead D et al. Evaluating the pathological and clinical implications of errors made by an artificial intelligence colon biopsy screening tool. BMJ Open Gastroenterol 2025;12(1):e001649 (https://doi.org/10.1136/bmjgast-2024-001649).