Publications

2023

Semantic Annotation of Tabular Data for Machine-to-Machine Interoperability via Neuro-Symbolic Anchoring
Shervin Mehryar, Remzi Celebi
In this paper we investigate automated annotation of tabular data using semantic technologies in combination with neural network embedding. Specifically, we propose an anchoring model in which property and cell types from the data embedding space are aligned with ontology relation and entity types. We show that by combining the power of symbolic reasoning, neural embeddings, and loss function design, a significant performance improvement as high as 86% for column property, 82% for column type, and 87% for column qualifier annotations can be achieved based on DBpedia and Wikidata table extractions.
Shervin Mehryar, Remzi Celebi. Semantic Annotation of Tabular Data for Machine-to-Machine Interoperability via Neuro-Symbolic Anchoring. SemTab’23: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 2023, co-located with the 22nd International Semantic Web Conference (ISWC), November 6-10, 2023, Athens, Greece
Deep Learning of Multimodal Ultrasound: Stratifying the Response to Neoadjuvant Chemotherapy in Breast Cancer Before Treatment
Jionghui Gu, Xian Zhong, Chengyu Fang, Wenjing Lou, Peifen Fu, Henry C. Woodruff, Baohua Wang, Tianan Jiang, Philippe Lambin
Background: Not only should resistance to neoadjuvant chemotherapy (NAC) be considered in patients with breast cancer but also the possibility of achieving a pathologic complete response (PCR) after NAC. Our study aims to develop 2 multimodal ultrasound deep learning (DL) models to noninvasively predict resistance and PCR to NAC before treatment.
Methods: From January 2017 to July 2022, a total of 170 patients with breast cancer were prospectively enrolled. All patients underwent multimodal ultrasound examination (grayscale 2D ultrasound and ultrasound elastography) before NAC. We combined clinicopathological information to develop 2 DL models, DL_Clinical_resistance and DL_Clinical_PCR, for predicting resistance and PCR to NAC, respectively. In addition, these 2 models were combined to stratify the prediction of response to NAC.
Results: In the test cohort, DL_Clinical_resistance had an AUC of 0.911 (95%CI, 0.814-0.979) with a sensitivity of 0.905 (95%CI, 0.765-1.000) and an NPV of 0.882 (95%CI, 0.708-1.000). Meanwhile, DL_Clinical_PCR achieved an AUC of 0.880 (95%CI, 0.751-0.973) and sensitivity and NPV of 0.875 (95%CI, 0.688-1.000) and 0.895 (95%CI, 0.739-1.000), respectively. By combining DL_Clinical_resistance and DL_Clinical_PCR, 37.1% of patients with resistance and 25.7% of patients with PCR were successfully identified by the combined model, suggesting that these patients could benefit from an early change of treatment strategy or from an organ preservation strategy after NAC.
Conclusions: The proposed DL_Clinical_resistance and DL_Clinical_PCR models and combined strategy have the potential to predict resistance and PCR to NAC before treatment and allow stratified prediction of NAC response.
Gu J, Zhong X, Fang C, Lou W, Fu P, Woodruff HC, Wang B, Jiang T, Lambin P. Deep Learning of Multimodal Ultrasound: Stratifying the Response to Neoadjuvant Chemotherapy in Breast Cancer Before Treatment. Oncologist. 2023 Sep 5:oyad227. doi: 10.1093/oncolo/oyad227
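The abstract above reports sensitivity and negative predictive value (NPV). As a reading aid, here is a minimal sketch of how those two metrics are defined from confusion-matrix counts. The function names and the counts in the usage lines are hypothetical illustrations and are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive cases that the model flags: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no positive cases: sensitivity is undefined")
    return tp / (tp + fn)


def npv(tn: int, fn: int) -> float:
    """Share of negative predictions that are truly negative: TN / (TN + FN)."""
    if tn + fn == 0:
        raise ValueError("no negative predictions: NPV is undefined")
    return tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(sensitivity(tp=18, fn=2))
    print(npv(tn=27, fn=2))
```

A high sensitivity together with a high NPV means few positive cases are missed and a negative prediction can be relied on, which is the property the abstract emphasises for pre-treatment stratification.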
Principles of ontology-based annotation of clinical narratives
Stefan Schulz, Warren Del-Pinto, Lifeng Han, Markus Kreuzthaler, Sareh Aghaei and Goran Nenadic
Despite the increasing availability of ontology-based semantic resources for biomedical content representation, large amounts of clinical data are in narrative form only. Therefore, many clinical information management tasks require information extraction using natural language processing (NLP).
Clinical corpora annotated by humans are crucial resources for this purpose. On the one hand, they are needed to domain-fine-tune language models (LMs) for the purpose of formally representing clinical information extracted from unstructured free text. On the other hand, annotated corpora are indispensable for assessing the results of information extraction using NLP.
The effectiveness of annotations crucially depends on annotation quality. Detailed annotation guidelines, which define the form that extracted information should take, prevent human annotators from taking erratic annotation decisions and guarantee a good inter-annotator agreement. Our hypothesis is that, to this end, annotations should (i) be based on ontological principles and (ii) be consistent with existing clinical documentation standards.
Drawing on the experience of several annotation projects, we highlight the need for sophisticated guidelines. We formulate a set of abstract principles on which such guidelines should be based, followed by examples of how to keep them, on the one hand, user-friendly and consistent, and on the other hand compatible with the international semantic standards SNOMED CT and FHIR, including their areas of overlap.
We sketch how the resulting representations can be expressed in a knowledge graph, a state-of-the-art semantic representation paradigm, which can be enriched by additional content at the A-Box and T-Box levels and to which symbolic and neural reasoning tasks can be applied.
Schulz, Stefan & Del-Pinto, Warren & Han, Lifeng & Kreuzthaler, Markus & Dinani, Sareh & Nenadic, Goran. (2023). Principles of ontology-based annotation of clinical narratives. Published in International Conference on Biomedical Ontology 2023
Toward human-level concept learning: Pattern benchmarking for AI algorithms
Andreas Holzinger, Anna Saranti, Alessa Angerschmid, Bettina Finzel, Ute Schmid, Heimo Mueller
Artificial intelligence (AI) today is very successful at standard pattern-recognition tasks due to the availability of large amounts of data and advances in statistical data-driven machine learning. However, there is still a large gap between AI pattern recognition and human-level concept learning. Humans can learn amazingly well even under uncertainty from just a few examples and are capable of generalizing these concepts to solve new conceptual problems. The growing interest in explainable machine intelligence requires experimental environments and diagnostic/benchmark datasets to analyze existing approaches and drive progress in pattern analysis and machine intelligence. In this paper, we provide an overview of current AI solutions for benchmarking concept learning, reasoning, and generalization; discuss the state-of-the-art of existing diagnostic/benchmark datasets (such as CLEVR, CLEVRER, CLOSURE, CURI, Bongard-LOGO, V-PROM, RAVEN, Kandinsky Patterns, CLEVR-Humans, CLEVRER-Humans, and their extension containing human language); and provide an outlook of some future research directions in this exciting research domain.
Holzinger A, Saranti A, Angerschmid A, Finzel B, Schmid U, Mueller H. Toward human-level concept learning: Pattern benchmarking for AI algorithms. Patterns (N Y). 2023 Jul 5;4(8):100788. doi: 10.1016/j.patter.2023.100788. PMC: 10435961
Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms
Manon P. L. Beuque, Marc B. I. Lobbes, Yvonka van Wijk, Yousif Widaatalla, Sergey Primakov, Michael Majer, Corinne Balleyguier, Henry C. Woodruff, Philippe Lambin
A deep learning algorithm was able to accurately identify and delineate suspicious lesions on contrast-enhanced mammograms, and the combined outputs of this tool and a handcrafted radiomics model achieved good diagnostic performance.
Background: Handcrafted radiomics and deep learning (DL) models individually achieve good performance in lesion classification (benign vs malignant) on contrast-enhanced mammography (CEM) images.
Purpose: To develop a comprehensive machine learning tool able to fully automatically identify, segment, and classify breast lesions on the basis of CEM images in recall patients.
Materials and Methods: CEM images and clinical data were retrospectively collected between 2013 and 2018 for 1601 recall patients at Maastricht UMC+ and 283 patients at Gustave Roussy Institute for external validation. Lesions with a known status (malignant or benign) were delineated by a research assistant overseen by an expert breast radiologist. Preprocessed low-energy and recombined images were used to train a DL model for automatic lesion identification, segmentation, and classification. A handcrafted radiomics model was also trained to classify both human- and DL-segmented lesions. Sensitivity for identification and the area under the receiver operating characteristic curve (AUC) for classification were compared between individual and combined models at the image and patient levels.
Results: After the exclusion of patients without suspicious lesions, the total number of patients included in the training, test, and validation data sets were 850 (mean age, 63 years ± 8 [SD]), 212 (62 years ± 8), and 279 (55 years ± 12), respectively. In the external data set, lesion identification sensitivity was 90% and 99% at the image and patient level, respectively, and the mean Dice coefficient was 0.71 and 0.80 at the image and patient level, respectively. Using manual segmentations, the combined DL and handcrafted radiomics classification model achieved the highest AUC (0.88 [95% CI: 0.86, 0.91]) (P < .05 except compared with DL, handcrafted radiomics, and clinical features model, where P = .90). Using DL-generated segmentations, the combined DL and handcrafted radiomics model showed the highest AUC (0.95 [95% CI: 0.94, 0.96]) (P < .05).
Conclusion: The DL model accurately identified and delineated suspicious lesions on CEM images, and the combined output of the DL and handcrafted radiomics models achieved good diagnostic performance.
Beuque MPL, Lobbes MBI, van Wijk Y, Widaatalla Y, Primakov S, Majer M, Balleyguier C, Woodruff HC, Lambin P. Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms. Radiology. 2023 Jun;307(5):e221843. doi: 10.1148/radiol.221843. PMID: 37338353
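The abstract above uses the Dice coefficient to score segmentation overlap. As a reading aid, here is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), applied to two binary masks. The function name, the nested-list mask format, and the toy masks are assumptions for illustration only and do not reflect the authors' implementation or data.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice overlap between two binary masks given as nested sequences of 0/1."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(p) + sum(t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


if __name__ == "__main__":
    # Toy masks (hypothetical): 3 predicted pixels, 2 true pixels, 2 shared.
    predicted = [[0, 1, 1],
                 [0, 1, 0]]
    reference = [[0, 1, 1],
                 [0, 0, 0]]
    print(dice_coefficient(predicted, reference))  # 2*2 / (3+2) = 0.8
```

A value of 1 means the two masks coincide and 0 means they do not overlap at all.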
AI for life: Trends in artificial intelligence for biotechnology
Andreas Holzinger, Katharina Keiblinger, Petr Holub, Kurt Zatloukal, Heimo Müller
Due to popular successes (e.g., ChatGPT), Artificial Intelligence (AI) is on everyone's lips today. When advances in biotechnology are combined with advances in AI, unprecedented new potential solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life below Water, or protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss. AI is ubiquitous in the life sciences today. Topics cover a wide range, from machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, and temporal and spatial representation and inference, to methodological aspects of explainable AI (XAI) with applications in biotechnology. In this pre-Editorial paper, we provide an overview of open research issues and challenges for each of the topics addressed in this special issue. Potential authors can directly use this as a guideline for developing their paper.
Holzinger A, Keiblinger K, Holub P, Zatloukal K, Müller H. AI for life: Trends in artificial intelligence for biotechnology. N Biotechnol. 2023;74:16-24. doi: 10.1016/j.nbt.2023.02.001. PMID: 36754147