CS 598: Machine Learning Algorithms for Large Language Models

Fall 2024

Course Description

This course provides a general overview of the machine learning algorithms used in the current development of large language models (LLMs). It covers a relatively broad range of topics, starting with mathematical models for sequence generation and key neural network architectures, with a focus on transformers. We then investigate variants of transformer-based language models, along with algorithms for prompt engineering and for improving reasoning capability. Other topics include LLM safety, hallucination, fine-tuning, alignment via reinforcement learning from human feedback (RLHF), multimodal LLMs, and common methods for accelerating training and inference.

Basic Information

Lectures

Lecture  Topic
1        Introduction
2        Training and Optimization
3        Sequence Modeling
4        Transformer
5        State Space Model
6        Extensions of Transformer
7        Encoder-Only and Encoder-Decoder Models
8        Decoder-Only Model (GPT)
9        Emergent Abilities and Scaling Laws
10       Prompt Engineering
11       LLM Safety
12       Hallucination
13       Retrieval-Augmented Generation
14       Instruction Tuning - Methods
15       Instruction Tuning - Data Acquisition
16       Fine-Tuning and Evaluation
17       Resource-Efficient Fine-Tuning
18       RLHF Basics
19       RLHF Algorithms
20       Open-Source LLMs
21       LLMs with Tools
22       Planning and Agents
23       LLM Data Generation and Distillation
24       Coding LLMs
25       Math LLMs
26       Multimodal Embeddings
27       Multimodal LLMs
28       GPU Acceleration Techniques