CSE 60556: Large Language Models
Description
This is a graduate-level elective course aimed at graduate students who are interested in using and/or developing large language model (LLM) techniques. It is designed for those who have prior knowledge of, and programming experience with, machine learning. The course introduces tasks and datasets related to language models (LMs), LLM architectures, LLM training techniques, reasoning methods, knowledge augmentation methods, efficient LLM methods, a variety of LLM applications (e.g., assistants, education, healthcare, RecSys, planning), and challenges in LLMs for social good. Specifically, we discuss popular LLM concepts in depth, such as scaling laws, GPT, RLHF, ICL, IFT, CoT, RAG, PEFT, agents, hallucination, and trustworthiness. Students will be expected to routinely read and present research papers and to complete a research project at the end of the course. In the project, students attempt to reimplement and improve upon a research paper on a topic of their choosing.
Instructor
Prerequisites
Credit earned, with graduate student status, in at least one of the following courses: CSE 60625 Machine Learning, CSE 60647 Data Science, CSE 60657 Natural Language Processing, or CSE 60868 Neural Networks; or in a closely related graduate-level course at another accredited university.
Course Topics
- Why language models: Tasks and benchmarks
- Neural language models
- Programming language models
- GPTs: GPT-1, GPT-2, scaling law, GPT-3, and prompt engineering
- LLM training: RLHF, IFT, open-weight LLMs, and foundation models
- Reasoning: Tasks and benchmarks, CoT and its family
- RAG: Retrieval-based LM, retrieval augmentation, generative retrieval, knowledge augmentation, memory augmentation
- Efficient LM: PEFT, distillation, quantization, and long-context LLM
- LLM applications: Agent and assistant, RecSys, planning, education, healthcare, and structured data
- LLMs for social good: Hallucination and trustworthiness