Sumanta Basu is an Assistant Professor in the Department of Statistics and Data Science at Cornell University. Broadly, his research interests are structure learning and the prediction of large systems from data, with a particular emphasis on developing learning algorithms for time series data. Professor Basu also collaborates with biological and social scientists on a wide range of problems, including genomics, large-scale metabolomics, and systemic risk monitoring in financial markets. His research is supported by multiple awards from the National Science Foundation and the National Institutes of Health. At Cornell, Professor Basu teaches “Introductory Statistics” for graduate students outside the Statistics Department and “Computational Statistics” for Statistics Ph.D. students. He also serves as a faculty consultant at the Cornell Statistical Consulting Unit, which assists the broader Cornell community with various aspects of analyzing empirical research. Professor Basu received his Ph.D. from the University of Michigan and was a postdoctoral scholar at the University of California, Berkeley, and Lawrence Berkeley National Laboratory. Before he received his Ph.D., Professor Basu was a business analyst, working with large retail companies on the design and data analysis of their promotional campaigns.
Exploring Summarization and Visualization
Cornell Course
Course Overview
Summarizing and visualizing text data is a key skill for professionals looking to uncover meaningful insights from large volumes of information. In this course, you will master the tools and techniques to condense and display text data, making complex patterns easier to interpret.
Starting with the tidytext package in R, you will tokenize unstructured text data and convert it into structured data for analysis. You will then summarize word distributions within individual documents and bring them to life with visualizations like word clouds. As you progress, you will explore advanced techniques for summarizing and comparing text across multiple documents, using tools such as document-feature matrices.
By the end of the course, you will have the skills to compare word usage across texts and track how language patterns evolve over time, helping you reveal deeper trends in your data.
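As a rough illustration of the workflow described above (tokenize text, count words within each document, then arrange the counts as a document-feature matrix), here is a minimal sketch. It is written in Python with only the standard library rather than with tidytext, so none of the names below are tidytext functions, and the two example documents are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical example documents (not taken from the course materials).
docs = {
    "doc1": "The cat sat on the mat.",
    "doc2": "The dog chased the cat.",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# One token per row: (document, word) pairs, the "tidy" layout for text.
tidy_tokens = [
    (doc_id, word)
    for doc_id, text in docs.items()
    for word in tokenize(text)
]

# Word counts within each individual document.
word_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

# Document-feature matrix: one row per document, one column per word
# in the sorted vocabulary, each cell holding that word's count.
vocabulary = sorted({word for counts in word_counts.values() for word in counts})
dfm = [[word_counts[doc_id][word] for word in vocabulary] for doc_id in docs]

if __name__ == "__main__":
    print(vocabulary)
    for doc_id, row in zip(docs, dfm):
        print(doc_id, row)
```

Each row of the matrix can be compared column by column with any other row, which is what makes this layout convenient for contrasting word usage across documents.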
You are required to have completed the following course or have equivalent experience before taking this course:
- Mastering NLP Fundamentals
Key Course Takeaways
- Apply the tidytext R package to tokenize and analyze text
- Summarize and visualize text data within a single text document
- Select advanced text processing techniques for summarizing and visualizing text data across multiple documents

Course Authors
Sreyoshi Das designs and teaches courses on the applications of statistics and data science in industry, with a specific emphasis on economics and finance. Her courses aim to integrate academic training with hands-on work experience.
Before joining Cornell in 2022, Professor Das worked in economic consulting, where she developed a variety of quantitative and qualitative analyses to support testifying experts, client attorneys, government agencies, and corporations. In 2017, Professor Das received her Ph.D. in Economics from the University of Michigan, where she conducted research on banking and systemic risk, financial markets in emerging economies, and behavioral macroeconomics.
Who Should Enroll
- Data scientists
- Computer scientists
- Analysts
- User behavior and UX teams
- Researchers
- Social scientists
100% Online