Natural Language Processing (NLP) is a branch of Artificial Intelligence that enables computers to process, analyze, and understand text and speech. This course introduces students to the fundamental concepts behind language models, including how GPT-style systems are trained to generate text and respond to prompts, so that students understand how computers process and generate human language. The course also examines how factors such as bias, context, and data quality influence NLP performance.
Students will work with a range of core NLP methods, including:
- tokenization, which breaks a piece of text into smaller units, such as words or parts of words, so that a computer can process it
- text classification, which categorizes text into different groups based on its content, such as identifying whether an email is spam or not
- sentiment analysis, which helps determine the emotions expressed in a piece of text, such as whether a review is positive or negative
- text generation, which creates new text using patterns learned from existing data.
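As a rough illustration of the first three methods above, the following minimal Python sketch tokenizes a sentence with a regular expression and assigns a sentiment label by counting words. It is a toy example, not how pretrained models work: the function names `tokenize` and `sentiment` and the two word lists are hypothetical, chosen only for illustration, and modern systems learn these associations from data instead of relying on fixed lists.

```python
import re

# Hypothetical toy word lists, chosen only for illustration.
POSITIVE_WORDS = {"great", "love", "excellent", "good"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "poor"}


def tokenize(text):
    """Split text into lowercase word tokens using a simple regex.

    This is a simplification; model tokenizers usually split text
    into subword units learned from data.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def sentiment(text):
    """Classify text as positive, negative, or neutral by counting words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "I love this course!"
    print(tokenize(review))
    print(sentiment(review))
```

A word-counting rule like this misses negation ("not good") and context, which is one reason the course turns to pretrained models for these tasks.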
Through hands-on programming in Python and guided use of the Hugging Face Transformers library, students will develop the skills to apply pretrained models to real-world language tasks. By the end of the course, students will have a foundation for building, evaluating, and responsibly deploying language-based Artificial Intelligence systems that communicate effectively and ethically, preparing them for future studies in computer science and engineering.