Yifan Peng, PhD, is an Assistant Professor in the Division of Health Sciences, Department of Population Health Sciences, at Weill Cornell Medicine. His main research interests include BioNLP and medical image analysis. He has published in major AI and healthcare informatics venues, including ACL, CVPR, MICCAI, and ICHI, as well as medical venues, including Nature Medicine, Nucleic Acids Research, npj Digital Medicine, and JAMIA. His research has been funded by federal agencies, including NIH and NSF, as well as by industry partners such as Amazon and Google. He is an Editorial Board Member for the Journal of Biomedical Informatics. He received the AMIA New Investigator Award in 2023.
Natural Language Processing in Healthcare
Cornell Course
Course Overview
Clinical notes and patient records contain vast amounts of data, but this data is not always in a format machines can interpret. In this course, you will discover how natural language processing (NLP) can help you transform free text into structured data for extracting insights. You'll start by reviewing NLP methods to prepare raw text for machine analysis. Using the Python package spaCy, you'll perform NLP tasks like sentence splitting, tokenization, part-of-speech tagging, and parsing.
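As a rough illustration of the preprocessing steps named above, the following minimal sketch uses spaCy to split a short note into sentences and tokens. It assumes spaCy v3 is installed and uses a blank English pipeline with the rule-based `sentencizer` component, so no trained model is required. The sample note is invented for illustration. Part-of-speech tagging and parsing need a trained pipeline (for example, one loaded with `spacy.load`), which is not shown here.

```python
import spacy

# Blank English pipeline: rule-based tokenizer only, no trained model needed.
nlp = spacy.blank("en")
# The sentencizer marks sentence boundaries based on punctuation.
nlp.add_pipe("sentencizer")

# Invented example text, not real patient data.
note = "Patient denies chest pain. Blood pressure was stable on admission."
doc = nlp(note)

sentences = [sent.text for sent in doc.sents]  # sentence splitting
tokens = [token.text for token in doc]         # tokenization

for sentence in sentences:
    print(sentence)
print(tokens)
```

A punctuation-based splitter is a simplification. Clinical text often contains abbreviations and decimals that call for a trained or domain-specific sentence segmenter.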
You will then explore key NLP applications. Using the scikit-learn and scispaCy Python packages, you'll apply text classification and named entity recognition (NER) to gain insights from medical texts. Finally, you'll advance to deep learning models, examining their application to healthcare tasks such as the de-identification of patient data. You will also consider the ethical implications of using such models, focusing on patient security and privacy. By the end of this course, you will have gained hands-on experience using NLP techniques to extract insights from healthcare data while also considering how to apply these methods ethically and responsibly.
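To give a sense of what text classification with scikit-learn can look like, here is a minimal sketch that trains a TF-IDF plus logistic regression pipeline on a handful of short snippets. The snippets, the two labels, and the choice of model are all assumptions made for illustration and are not taken from the course materials. A real project would need far more data and a proper evaluation split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy snippets and labels, not real patient data.
texts = [
    "chest pain radiating to the left arm",
    "palpitations and elevated heart rate",
    "irregular heartbeat noted on exam",
    "shortness of breath with wheezing",
    "persistent cough and congested lungs",
    "wheezing relieved by inhaler",
]
labels = [
    "cardiology", "cardiology", "cardiology",
    "respiratory", "respiratory", "respiratory",
]

# TF-IDF turns free text into numeric features; logistic regression classifies them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["patient reports palpitations and chest pain"])[0]
print(prediction)
```

Wrapping the vectorizer and classifier in one pipeline keeps the text-to-feature step tied to the model, so the same transformation is applied at training and prediction time.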
Students must have intermediate proficiency in Python programming and machine learning to succeed in this course.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Machine Learning in Healthcare
- Data Management in Healthcare
Key Course Takeaways
- Examine classical natural language processing methods, including sentence splitting, tokenization, part-of-speech tagging, and dependency parsing
- Examine NLP applications, including text classification and sequential labeling within the healthcare sector
- Explore BERT and generative AI as well as their applications within the healthcare sector
Course Author
Who Should Enroll
- Data scientists
- Medical and health services managers
- Database and IT data architects
- Data engineers
- Digital transformation managers
- Clinicians with experience in informatics
- Biomedical and clinical informatics fellows
- Aspiring medical database managers or administrators
100% Online