Pinned Memory (Page-Locked Memory) in CUDA and GPU Computing

Learn how pinned memory (page-locked memory) improves CPU-to-GPU data transfer performance in CUDA, deep learning, and high-performance AI workloads using direct memory access (DMA).

Written by Hitesh Sahu, a passionate developer and blogger.

Tue May 26 2026


Pinned Memory (Page-Locked Memory)

Pinned memory (also called page-locked memory) is a region of host RAM that the operating system is not allowed to page out to disk or relocate. Because its physical address stays fixed, the GPU can copy data to and from it using direct memory access (DMA), which is typically faster than copying from ordinary pageable memory.

It is commonly used in:

  • CUDA
  • GPU programming
  • high-performance computing
  • AI training pipelines

Pinned memory enables faster data transfer between CPU (host) memory and GPU (device) memory.

Why AI Training Uses Pinned Memory

flowchart TD

    A[Dataset on CPU]

    A --> B[Pinned Memory Buffer]

    B --> C[GPU Training]

    C --> D[Model Forward Pass]

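Staging batches in pinned memory shortens each host-to-device copy, which reduces the time the GPU sits idle waiting for input.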

Why Pinned Memory Matters

Normally, operating systems can:

  • move memory pages
  • swap pages to disk

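A GPU's DMA engine needs a stable physical address, so it cannot read pageable memory directly. The driver must first copy the data into a temporary page-locked buffer, which adds overhead to every transfer.

Pinned memory avoids this extra step.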

Core Idea

flowchart LR

    A[CPU RAM] -->|Transfer| B[GPU VRAM]

    A -.Page Locked.- C[OS Cannot Swap Memory]

Because memory remains fixed in physical RAM:

  • the DMA engine can read it directly, with no staging copy
  • per-transfer overhead and latency decrease

Pageable vs Pinned Memory

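| Feature | Pageable Memory | Pinned Memory |
| --- | --- | --- |
| OS can swap | Yes | No |
| Transfer speed | Slower | Faster |
| Allocation cost | Lower | Higher |
| Direct GPU DMA | No (staged through a temporary pinned buffer) | Yes |
| Memory flexibility | High | Lower |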

Normal Pageable Memory

flowchart TD

    A[Application Memory]

    A --> B[Virtual Memory]

    B --> C[OS May Swap to Disk]

    C --> D[Slower GPU Transfer]

CUDA Pageable Memory Transfer

sequenceDiagram

    participant CPU as CPU RAM
    participant TMP as Temporary Pinned Buffer
    participant GPU as GPU

    CPU->>TMP: Copy to Temporary Buffer

    TMP->>GPU: Transfer to GPU

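The extra host-side copy into the temporary pinned buffer costs CPU time and memory bandwidth, which lowers the effective transfer throughput.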

Pinned Memory Workflow

flowchart TD

    A[Allocate Pinned Memory]

    A --> B[Memory Locked in RAM]

    B --> C[Direct DMA Transfer]

    C --> D[Faster GPU Copy]

CUDA Pinned Memory Transfer

sequenceDiagram

    participant CPU as Pinned Memory
    participant GPU as GPU

    CPU->>GPU: Direct DMA Transfer

With no intermediate copy, the DMA engine reads the pinned buffer directly, which improves throughput.
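The difference between the two transfer paths can be sketched with a crude cost model. This is a back-of-the-envelope illustration, not a measurement: the function name `transfer_time_ms` and both bandwidth figures are assumptions chosen for round numbers, and the model treats the staging copy and the DMA transfer as strictly serial, whereas a real driver may overlap them.

```python
def transfer_time_ms(n_bytes, dma_gbps, staging_gbps=None):
    """Estimate a host-to-device copy time in milliseconds.

    staging_gbps=None models pinned memory (DMA transfer only).
    Otherwise the pageable path is modeled as a host-side staging
    copy followed by the DMA transfer, with no overlap between them.
    """
    dma_ms = n_bytes / (dma_gbps * 1e9) * 1e3
    if staging_gbps is None:
        return dma_ms
    staging_ms = n_bytes / (staging_gbps * 1e9) * 1e3
    return staging_ms + dma_ms


# Illustrative numbers only (assumptions, not benchmarks):
# a 1 GB buffer, 20 GB/s over the bus, 10 GB/s for the staging copy.
size = 1_000_000_000
pinned_ms = transfer_time_ms(size, dma_gbps=20.0)
pageable_ms = transfer_time_ms(size, dma_gbps=20.0, staging_gbps=10.0)

print(f"pinned:   {pinned_ms:.0f} ms")
print(f"pageable: {pageable_ms:.0f} ms")
```

Under these assumed figures the staging copy dominates the pageable path, which is why removing it matters most when the bus is fast relative to host memory copies.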

DMA (Direct Memory Access)

Pinned memory allows GPU hardware to directly access system memory using DMA:

$$\text{CPU RAM} \rightarrow \text{DMA} \rightarrow \text{GPU VRAM}$$

The CPU only initiates the transfer; the DMA engine then moves the data without further CPU intervention.

Zero-Copy Memory

Pinned memory can also be mapped into the GPU's address space, so that GPU kernels access host memory directly without an explicit copy. This is known as zero-copy memory access.

Every such access crosses the CPU-GPU interconnect, so it is slower than reading VRAM.


Performance Benefit

Pinned memory typically improves throughput for:

  • Host-to-device transfer
  • Device-to-host transfer
  • Streaming workloads

Especially for:

  • large tensors
  • AI model training
  • batch pipelines
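The size of the benefit depends on the hardware, so it is worth measuring. The following is a minimal sketch of such a measurement using PyTorch, assuming PyTorch is installed and a CUDA device is present; the helper name `measure_h2d_ms` and the buffer size are hypothetical choices, and the function returns `None` instead of failing when either assumption does not hold.

```python
import importlib.util


def measure_h2d_ms(pinned, n_bytes=64 * 1024 * 1024, repeats=5):
    """Average host-to-device copy time in ms, or None without PyTorch/CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    # Host buffer: page-locked if pinned=True, ordinary pageable otherwise.
    host = torch.empty(n_bytes, dtype=torch.uint8, pin_memory=pinned)
    device_buf = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    device_buf.copy_(host)          # warm-up copy, not timed
    torch.cuda.synchronize()

    start.record()
    for _ in range(repeats):
        # non_blocking only takes effect when the source is pinned.
        device_buf.copy_(host, non_blocking=pinned)
    end.record()
    torch.cuda.synchronize()        # wait until all timed copies finish

    return start.elapsed_time(end) / repeats


for pinned in (False, True):
    ms = measure_h2d_ms(pinned)
    label = "pinned" if pinned else "pageable"
    print(label, "skipped (no PyTorch/CUDA)" if ms is None else f"{ms:.2f} ms per copy")
```

Results vary with bus generation, buffer size, and system load, so treat a single run as a rough indication only.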

AI / Deep Learning Usage

Pinned memory is heavily used in:

  • PyTorch
  • TensorFlow
  • data loaders that feed CUDA devices

Examples

CUDA Pinned Memory Allocation

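Example:

float* ptr = nullptr;                // host pointer that receives the pinned buffer
cudaMallocHost((void**)&ptr, size);  // size is the buffer length in bytes

This allocates page-locked host memory. Release it with cudaFreeHost(ptr) when it is no longer needed.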

Memory Transfer Example

cudaMemcpy(device_ptr,
           host_ptr,
           size,
           cudaMemcpyHostToDevice);

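When host_ptr points to pinned memory (for example, a buffer allocated with cudaMallocHost), the driver can skip the staging copy, so the transfer is usually faster. Pinned host memory is also what allows cudaMemcpyAsync to overlap copies with kernel execution.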

PyTorch Example

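from torch.utils.data import DataLoader

# dataset is any torch.utils.data.Dataset instance
loader = DataLoader(
    dataset,
    batch_size=32,
    pin_memory=True   # return batches in pinned host memory
)

With pin_memory=True, the DataLoader copies each batch into pinned memory before returning it, which speeds up the subsequent copy to the GPU in training input pipelines.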
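To benefit from pinned batches, the copy to the GPU is usually issued with `non_blocking=True`. The sketch below shows one way this might look end to end. The toy dataset and the function name `run_pinned_loader_demo` are hypothetical, and the code falls back to the CPU (or returns `None`) when CUDA or PyTorch is not available.

```python
import importlib.util


def run_pinned_loader_demo():
    """Iterate a toy DataLoader; return the device type of the last batch."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch not installed
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # Hypothetical toy dataset: 256 samples with 16 features each.
    dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))

    # Pinning only helps when batches are later copied to a CUDA device.
    loader = DataLoader(dataset, batch_size=32, pin_memory=use_cuda)

    last_device = None
    for features, labels in loader:
        # From pinned memory, non_blocking=True lets the copy run
        # asynchronously with respect to the host thread.
        features = features.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        last_device = features.device.type
    return last_device


print(run_pinned_loader_demo())
```

Combining pinned batches with non-blocking copies lets the host prepare the next batch while the current one is still in flight.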


Advantages

| Advantage | Description |
| --- | --- |
| Faster GPU transfer | Lower latency |
| DMA support | Efficient hardware transfer |
| Better throughput | Improves training pipelines |
| Useful for streaming | Real-time workloads |

Limitations

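| Limitation | Description |
| --- | --- |
| Higher allocation overhead | Allocating pinned memory is more expensive than allocating pageable memory |
| Reduces OS flexibility | Pinned RAM cannot be swapped |
| Excessive usage hurts system | Can reduce overall performance |
| Limited resource | Over-pinning leaves less RAM for the OS and other processes |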

Best Practices

Use pinned memory for:

  • Frequent GPU transfers
  • Large batch pipelines
  • Streaming data workloads

Avoid excessive allocation

Too much pinned memory:

  • reduces available pageable RAM
  • can slow down the operating system

Pinned Memory vs Unified Memory

| Pinned Memory | Unified Memory |
| --- | --- |
| Explicit memory management | Automatic migration |
| Faster transfers | Easier programming |
| More control | Less optimization control |
| Common in HPC | Common in simpler CUDA apps |
