Introduction to Large Language Models (LLMs)
Overview
This journal-club presentation provides a comprehensive introduction to Large Language Models (LLMs), covering their background, modeling techniques, adaptation methods, and future directions.
Outline
- LM Background: Evolution from traditional Language Models (LMs) to LLMs
- LLM Modeling and Pre-training: Transformer architectures and training approaches
- Adaptation to Downstream Tasks: Fine-tuning, prompting, and task-specific adaptations
- Scaling and Modern LLMs: GPT-4, DeepSeek, and efficiency optimizations
- Future Perspectives: Multi-modal models, scaling laws, and ethical concerns
Language Models (LMs)
- LM Definition: Probability distribution over a sequence of tokens
- Autoregressive LMs: Each token is generated conditioned on the preceding context (see the factorization sketched after this list)
- Evolution to LLMs: N-gram models → RNNs → Transformers
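As background for the definitions above, a minimal sketch of the standard autoregressive factorization (the chain rule over tokens; the symbols $x_1, \dots, x_T$ are generic and not tied to any particular model in these slides):

$$p(x_1, \dots, x_T) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)$$

Generation proceeds left to right: at each step the model samples $x_t$ from the conditional distribution given the tokens produced so far.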
Transformer-Based LLMs
- Key Components: Self-attention, positional encoding, masked training objectives (a scaled dot-product attention sketch follows this list)
- Architectures:
  - Encoder-only (e.g., BERT, RoBERTa) – best for classification tasks
  - Decoder-only (e.g., GPT-3, ChatGPT) – ideal for generative tasks
  - Encoder-decoder (e.g., T5, BART) – used for translation and structured tasks
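A minimal sketch of scaled dot-product self-attention with an optional causal mask, assuming toy NumPy arrays rather than any particular framework; the `self_attention` helper, weight names, and shapes are illustrative and not taken from any specific model above:

```python
# Scaled dot-product self-attention with an optional causal mask,
# illustrating the "self-attention" and "masked training" components above.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=False):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) similarity scores
    if causal:
        # Decoder-style mask: each position may only attend to itself
        # and earlier positions.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V                 # (seq_len, d_head) weighted values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 5, 16, 8
    X = rng.standard_normal((seq_len, d_model))
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv, causal=True)
    print(out.shape)  # (5, 8)
```

With `causal=True` each position attends only to itself and earlier positions, which is the masking used when training decoder-only models; encoder-only models such as BERT instead train with a masked-token prediction objective over the full, unmasked sequence.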
Adapting LLMs to Tasks
- Supervised Fine-Tuning: Optimizing models for specific applications
- Lightweight Fine-Tuning: Efficient tuning with minimal parameter updates (e.g., LoRA, BitFit); a LoRA-style sketch follows this list
- Prompting Strategies: Zero-shot, one-shot, few-shot learning
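A minimal sketch of the low-rank-update idea behind LoRA, assuming a plain NumPy linear layer; the class name `LoRALinear` and all shapes are illustrative and do not reflect the API of any real library (e.g., Hugging Face PEFT differs):

```python
# LoRA idea: keep the pretrained weight W frozen and learn only a small
# low-rank update B @ A, so the effective weight is W + (alpha/rank) * B @ A.
import numpy as np

class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, W, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                          # frozen pretrained weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01   # trainable, small init
        self.B = np.zeros((d_out, rank))                    # trainable, zero init
        self.scale = alpha / rank                           # common LoRA scaling factor

    def __call__(self, x):
        # x: (..., d_in) -> (..., d_out); the update starts at zero because B is zero,
        # so the adapted layer initially matches the frozen base layer exactly.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

# Usage: during fine-tuning only A and B would receive gradient updates; W stays fixed.
rng = np.random.default_rng(1)
layer = LoRALinear(rng.standard_normal((32, 64)), rank=4)
y = layer(rng.standard_normal((10, 64)))
print(y.shape)  # (10, 32)
```

The design point is that W stays frozen while only the small factors A and B are updated, so the number of trainable parameters scales with the chosen rank rather than with the full weight matrix.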
Scaling Laws & Model Comparisons
- GPT-4o: Multimodal expansion with a reported ~1.8T parameters (unofficial estimate) and a 128k-token context window
- DeepSeek-R1: Efficient MoE-based training with reduced GPU requirements
- Comparison of LLMs: Trade-offs in efficiency, inference speed, and adaptability (a commonly cited scaling-law form is sketched after this list)
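As context for the section title, one widely cited parametric form for pre-training loss as a function of model and data size is the "Chinchilla" scaling law of Hoffmann et al. (2022); it is included here as general background, not as a claim about any specific model listed above:

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is the number of parameters, $D$ the number of training tokens, and $E, A, B, \alpha, \beta$ are empirically fitted constants; the fit implies that parameters and training tokens should be scaled together for compute-optimal training.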
Future of LLMs
- Data Considerations: Privacy, fairness, contamination risks
- Multi-Modality: Integration of text, images, and audio (e.g., CLIP, GPT-4V)
- Beyond Transformers?: Exploring alternative architectures for next-gen AI
Resources
- Courses:
- Review Papers:
- Paper Lists & Blog Posts:
  - Awesome LLM GitHub Repository
  - Why Most LLMs are Decoder-Only
Note: Many slides in this presentation were adapted from Changhao Shi.