CSE 60556: Large Language Models
Description
This is a graduate-level elective course aimed at graduate students who are interested in using and/or developing large language model (LLM) techniques. It is designed for those who have prior knowledge of, and programming experience with, machine learning. The course introduces tasks and datasets related to language models (LMs), LLM architectures, LLM training techniques, reasoning methods, knowledge augmentation methods, efficient LLM methods, a variety of LLM applications (e.g., assistants, education, healthcare, RecSys, planning), and challenges in LLMs for social good. Specifically, we discuss popular LLM concepts in depth, such as scaling laws, GPT, RLHF, ICL, IFT, CoT, RAG, PEFT, agents, hallucination, and trustworthiness. Students will be expected to routinely read and present research papers and to complete a research project at the end of the course. In the project, students attempt to reimplement and improve upon a research paper on a topic of their choosing.
Instructor
Prerequisites
Credit earned, with graduate student status, in at least one of the following courses: CSE 60625 Machine Learning, CSE 60647 Data Science, CSE 60657 Natural Language Processing, or CSE 60868 Neural Networks; or in a closely related graduate-level course at another accredited university.
Course Topics
- Why language models: Tasks and benchmarks
- Neural language models
- Programming language models
- GPTs: GPT-1, GPT-2, scaling law, GPT-3, and prompt engineering
- LLM training: RLHF, IFT, open-weight LLMs, and foundation models
- Reasoning: Tasks and benchmarks, CoT and its family
- RAG: Retrieval-based LM, retrieval augmentation, generative retrieval, knowledge augmentation, memory augmentation
- Efficient LM: PEFT, distillation, quantization, and long-context LLM
- LLM applications: Agent and assistant, RecSys, planning, education, healthcare, and structured data
- LLMs for social good: Hallucination and trustworthiness