Introduction to Large Language Models (LLMs)

Overview

This journal-club presentation provides a comprehensive introduction to Large Language Models (LLMs), covering their background, modeling techniques, adaptation methods, and future directions.

Outline

  • LM Background: Evolution from traditional Language Models (LMs) to LLMs
  • LLM Modeling and Pre-training: Transformer architectures and training approaches
  • Adaptation to Downstream Tasks: Fine-tuning, prompting, and task-specific adaptations
  • Scaling and Modern LLMs: GPT-4, DeepSeek, and efficiency optimizations
  • Future Perspectives: Multi-modal models, scaling laws, and ethical concerns

Language Models (LMs)

  • LM Definition: Probability distribution over a sequence of tokens
  • Autoregressive LMs: Token generation conditioned on the prior context (see the chain-rule factorization after this list)
  • Evolution to LLMs: N-gram models → RNNs → Transformers
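
The definition and autoregressive bullets above boil down to the standard chain-rule factorization (a textbook identity, not specific to any one model):

```latex
% Autoregressive factorization of a sequence probability:
% each token is predicted from the tokens that precede it.
P(x_1, x_2, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})
```

N-gram models truncate the conditioning context to the previous n-1 tokens; RNNs and Transformers learn to condition on (in principle) the entire prefix.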

Transformer-Based LLMs

  • Key Components: Self-attention, positional encoding, masked training (a minimal self-attention sketch follows this list)
  • Architectures:
    • Encoder-only (e.g., BERT, RoBERTa) – best for classification tasks
    • Decoder-only (e.g., GPT-3, ChatGPT) – ideal for generative tasks
    • Encoder-decoder (e.g., T5, BART) – used for translation and structured tasks
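
To make the self-attention bullet concrete, here is a minimal single-head scaled dot-product attention sketch in NumPy. The function name, shapes, and the `causal` flag are illustrative choices, not taken from any particular library:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (T, T) token-to-token similarities
    if causal:
        # Decoder-only (GPT-style) models mask future positions so each
        # token attends only to itself and earlier tokens.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (T, d_v) context vectors

# Toy usage: 4 tokens, 8-dimensional queries/keys/values.
T, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V, causal=True)
print(out.shape)  # (4, 8)
```

Encoder-only models use this attention bidirectionally (no causal mask), while decoder-only models apply the causal mask shown above.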

Adapting LLMs to Tasks

  • Supervised Fine-Tuning: Optimizing models for specific applications
  • Lightweight Fine-Tuning: Efficient tuning with minimal parameter updates (e.g., LoRA, BitFit); a LoRA-style sketch follows this list
  • Prompting Strategies: Zero-shot, one-shot, few-shot learning
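
As a sketch of the LoRA idea referenced above: freeze the pretrained weight and train only a low-rank update. The class name, rank, and scaling below are illustrative defaults in plain NumPy, not the reference implementation:

```python
import numpy as np

class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update B @ A.

    Only A and B (r * (d_in + d_out) parameters) would be updated during
    fine-tuning, instead of the full d_out * d_in weight matrix.
    """

    def __init__(self, W_pretrained, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W_pretrained                   # frozen, shape (d_out, d_in)
        d_out, d_in = W_pretrained.shape
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))           # trainable; zero init so the update starts at 0
        self.scale = alpha / r                  # standard LoRA scaling factor

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

# Toy usage: a 1024x1024 "pretrained" layer with a rank-8 adapter.
W = np.random.default_rng(1).normal(size=(1024, 1024))
layer = LoRALinear(W, r=8, alpha=16)
x = np.random.default_rng(2).normal(size=(4, 1024))
print(layer(x).shape)  # (4, 1024)
```

In this toy setup the adapter adds about 16k trainable parameters against roughly a million frozen ones, which is the source of LoRA's memory savings.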

Scaling Laws & Model Comparisons

  • GPT-4o: Multimodal expansion with a reported ~1.8T parameters (not officially disclosed) and a 128k-token context window
  • DeepSeek-R1: Efficient MoE-based training with reduced GPU requirements (see the routing sketch after this list)
  • Comparison of LLMs: Trade-offs in efficiency, inference speed, and adaptability
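
The efficiency claim for MoE models rests on sparse activation: a router sends each token to only a few experts, so most parameters are untouched per token. The sketch below shows generic top-k routing with made-up sizes; it is not DeepSeek's actual architecture:

```python
import numpy as np

def moe_forward(x, experts, router_W, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:        (T, d) token representations
    experts:  list of (d, d) weight matrices (toy "experts")
    router_W: (d, n_experts) routing weights
    """
    logits = x @ router_W                          # (T, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]     # indices of top-k experts per token
    # Softmax over only the selected experts' logits.
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per token
        for j in range(k):                         # only k experts run
            e = topk[t, j]
            out[t] += gates[t, j] * (x[t] @ experts[e])
    return out

# Toy usage: 6 tokens, d=16, 8 experts, 2 active per token.
rng = np.random.default_rng(0)
T, d, n_experts = 6, 16, 8
x = rng.normal(size=(T, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_W = rng.normal(size=(d, n_experts))
print(moe_forward(x, experts, router_W, k=2).shape)  # (6, 16)
```

With 8 experts and k = 2 in this toy setup, only a quarter of the expert parameters run for any given token, which is the basic source of MoE's reduced compute per forward pass.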

Future of LLMs

  • Data Considerations: Privacy, fairness, contamination risks
  • Multi-Modality: Integration of text, images, and audio (e.g., CLIP, GPT-4V)
  • Beyond Transformers?: Exploring alternative architectures for next-gen AI

Resources