
Open-Channel SSD Series (3): pblk Read Path — Kernel Code Analysis

Kernel source code walkthrough of the pblk read path in the LightNVM Open-Channel SSD subsystem — from bio submission to block-layer dispatch.


Related paper: Bjørling et al., “LightNVM: The Linux Open-Channel SSD Subsystem,” FAST’17.

pblk: Physical Block Device Target

pblk implements a fully associative, host-based FTL that exposes a traditional block I/O interface. Its primary responsibilities are:

  • Map logical addresses onto physical addresses (4 KB granularity) in an L2P table.
  • Maintain L2P table integrity and support recovery from normal shutdown and power outage.
  • Deal with controller- and media-specific constraints.
  • Handle I/O errors.
  • Implement garbage collection.
  • Maintain consistency across synchronization points in the I/O stack.

For more information: http://lightnvm.io

Source Code Overview

Most of LightNVM’s core functionality is implemented in drivers/lightnvm/ within the kernel source tree. Key files:

File              Purpose
pblk.h            Main header for the pblk target
rrpc.h, rrpc.c    Round-robin page-based hybrid FTL
pblk-cache.c      pblk’s write cache
pblk-core.c       Core functionality
pblk-gc.c         Garbage collector
pblk-init.c       Initialization
pblk-map.c        LBA → PPA mapping strategy
pblk-rb.c         Write ring buffer
pblk-read.c       Read path
pblk-recovery.c   Recovery path
pblk-rl.c         Rate limiter for user I/O
pblk-sysfs.c      sysfs interface
pblk-write.c      Write path (buffer → media)

Creation of the target’s block device is handled in core.c (the LightNVM core). The FTL functions proper (write buffering, address mapping, garbage collection) reside in the pblk-* files.

Read Path Overview

The read path starts when the file system submits a bio. The entry point is pblk_make_rq, which is registered as the make_rq callback:

static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
{
    ...
    switch (pblk_rw_io(q, pblk, bio)) {
        ...
    }
}
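
For context, pblk registers this callback through its nvm_tgt_type definition in pblk-init.c, and the LightNVM core attaches it to the target’s request queue via blk_queue_make_request. A shortened view of the target type (field list abridged; based on the ~4.14 sources):

static struct nvm_tgt_type tt_pblk = {
    .name       = "pblk",
    .version    = {1, 0, 0},

    .make_rq    = pblk_make_rq,
    .capacity   = pblk_capacity,
    ...
};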

pblk_rw_io dispatches reads and writes to their respective handlers:

static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
                       struct bio *bio)
{
    if (bio_data_dir(bio) == READ) {
        ...
        ret = pblk_submit_read(pblk, bio);
        ...
        return ret;
    }

    // else → write path
    ...
}
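
For completeness, the write branch does not touch the media directly: the bio is absorbed by the write cache in pblk-cache.c and drained to the media later by the write path. A rough sketch of the hand-off (abridged; based on the ~4.14 sources):

    // writes go to the ring buffer; pblk-write.c drains it to the media
    return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);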

pblk_submit_read

This function builds an nvm_rq structure (containing the bio and PPA addresses), performs LBA → PPA translation, and submits the read I/O:

int pblk_submit_read(struct pblk *pblk, struct bio *bio)
{
    ...
    // Build the rqd structure (bio + ppa)

    // LBA to PPA translation
    if (nr_secs > 1)
        pblk_read_ppalist_rq(pblk, rqd, blba, &read_bitmap);
    else
        pblk_read_rq(pblk, rqd, blba, &read_bitmap);
    ...
    ret = pblk_submit_read_io(pblk, rqd);
    ...
}
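
The read_bitmap records which sectors were already satisfied from the write buffer. If every sector was found there, no media I/O is needed and the bio completes on the spot; otherwise only the uncached sectors are sent to the device. A sketch of that fast path (abridged; based on the ~4.14 sources):

    // every sector came out of the write buffer: complete the bio
    if (bitmap_full(&read_bitmap, nr_secs)) {
        bio_endio(bio);
        ...
        return NVM_IO_OK;
    }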

Address Translation (L2P Lookup)

Both pblk_read_ppalist_rq and pblk_read_rq call pblk_lookup_l2p_seq to translate logical addresses. If the data is still sitting in the write buffer (cache), it is read from there directly. The single-sector variant, pblk_read_rq, is shown below:

static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
                         sector_t lba, unsigned long *read_bitmap)
{
    ...
    // LBA → PPA translation
    pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);

retry:
    ...
    // If the data is in the write buffer, read it from the cache
    if (pblk_addr_in_cache(ppa)) {
        if (!pblk_read_from_cache(pblk, bio, lba, ppa, 0, 1)) {
            // the entry was evicted under us; re-translate and retry
            pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
            goto retry;
        }
    } else {
        rqd->ppa_addr = ppa;
    }
    ...
}
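
Whether a PPA points into the cache is encoded in the address itself: struct ppa_addr carries a union in which one variant flags cache residency and stores the position in the write buffer. The helpers are small (from include/linux/lightnvm.h and pblk.h, ~4.14; geometry fields abridged):

struct ppa_addr {
    union {
        ...                     // device geometry view (ch/lun/blk/pg/...)
        struct {
            u64 line      : 63; // position in the write buffer
            u64 is_cached : 1;  // set while the entry lives in the cache
        } c;
        u64 ppa;
    };
};

static inline int pblk_addr_in_cache(struct ppa_addr ppa)
{
    return (ppa.ppa != ADDR_EMPTY && ppa.c.is_cached);
}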

The L2P table is accessed under a spin lock to ensure thread safety:

void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
                          u64 *lba_list, int nr_secs)
{
    ...
    spin_lock(&pblk->trans_lock);
    for (i = 0; i < nr_secs; i++) {
        lba = lba_list[i];
        ...
        ppas[i] = pblk_trans_map_get(pblk, lba);
    }
    spin_unlock(&pblk->trans_lock);
}
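
Note that all nr_secs translations happen under a single acquisition of trans_lock, which amortizes the locking cost across multi-sector reads instead of paying it once per sector.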

pblk_trans_map_get

The trans_map inside the pblk structure holds the L2P mapping table. This function returns the physical page address for a given logical block address:

static inline struct ppa_addr pblk_trans_map_get(struct pblk *pblk,
                                                  sector_t lba)
{
    struct ppa_addr ppa;

    if (pblk->ppaf_bitsize < 32) {
        u32 *map = (u32 *)pblk->trans_map;
        ppa = pblk_ppa32_to_ppa64(pblk, map[lba]);
    } else {
        struct ppa_addr *map = (struct ppa_addr *)pblk->trans_map;
        ppa = map[lba];
    }

    return ppa;
}
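
A quick sizing note on the two layouts: at 4 KB granularity the compact 32-bit encoding costs 4 bytes per sector, i.e. roughly 1 GB of host DRAM per 1 TB of media, while the full struct ppa_addr layout doubles that. This is why pblk uses the 32-bit form whenever the device’s PPA format fits into 32 bits.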

I/O Submission

After translation, the I/O is submitted through the NVMe stack:

static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
{
    ...
    err = pblk_submit_io(pblk, rqd);
    ...
}

pblk_submit_io checks each PPA against the owning line’s state (under the line’s spin lock) and then hands the request to the LightNVM core:

int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd)
{
    ...
    for (i = 0; i < rqd->nr_ppas; i++) {
        spin_lock(&line->lock);
        // bad PPA check
        spin_unlock(&line->lock);
    }
    ...
    return nvm_submit_io(dev, rqd);
}

This invokes the NVMe driver’s submit_io callback:

int nvm_submit_io(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd)
{
    ...
    ret = dev->ops->submit_io(dev, rqd);
    ...
}
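
For an NVMe device, this ops table comes from the NVMe driver’s LightNVM glue in drivers/nvme/host/lightnvm.c (table abridged; based on the ~4.14 sources):

static struct nvm_dev_ops nvme_nvm_dev_ops = {
    .identity   = nvme_nvm_identity,
    ...
    .submit_io  = nvme_nvm_submit_io,
    ...
};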

The call therefore lands in nvme_nvm_submit_io, which translates the nvm_rq into an NVMe command and hands the resulting request to the block layer:

static int nvme_nvm_submit_io(struct nvm_dev *dev, struct nvm_rq *rqd)
{
    ...
    blk_execute_rq_nowait(q, NULL, rq, 0, nvme_nvm_end_io);
}

Block Layer Dispatch

blk_execute_rq_nowait inserts the request into the block I/O scheduler queue for asynchronous execution:

void blk_execute_rq_nowait(struct request_queue *q,
                            struct gendisk *bd_disk,
                            struct request *rq, int at_head,
                            rq_end_io_fn *done)
{
    ...
    if (q->mq_ops) {
        blk_mq_sched_insert_request(rq, at_head, true, false, false);
        return;
    }
    spin_lock_irq(q->queue_lock);
    ...
    __elv_add_request(q, rq, where);
    __blk_run_queue(q);
    spin_unlock_irq(q->queue_lock);
}

For a blk-mq device such as NVMe, the request is handed to the multi-queue scheduler by blk_mq_sched_insert_request and dispatched from there. On the legacy single-queue path shown above, __blk_run_queue eventually reaches __blk_run_queue_uncond, which invokes the registered request_fn. Some request_fn implementations drop the queue lock internally, so multiple threads may run this function concurrently; this is why the number of active invocations is tracked:

inline void __blk_run_queue_uncond(struct request_queue *q)
{
    ...
    q->request_fn_active++;
    q->request_fn(q);
    q->request_fn_active--;
}
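
To close the loop, completion travels back up the same stack: the nvme_nvm_end_io callback passed to blk_execute_rq_nowait copies the controller status into the nvm_rq and calls nvm_end_io, which dispatches to the rqd->end_io handler that pblk installed (pblk_end_io_read for reads). A sketch of the hook (abridged; based on the ~4.14 sources):

static void nvme_nvm_end_io(struct request *rq, blk_status_t status)
{
    struct nvm_rq *rqd = rq->end_io_data;

    // propagate the controller status into the LightNVM request
    rqd->ppa_status = le64_to_cpu(nvme_req(rq)->result.u64);
    rqd->error = nvme_req(rq)->status;
    nvm_end_io(rqd);    // ends up in pblk_end_io_read

    kfree(nvme_req(rq)->cmd);
    blk_mq_free_request(rq);
}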

2026 Update Note

  • This post was migrated from the original blog and language-polished in 2026.
  • The code references are based on Linux kernel ~4.12–4.14. Modern kernels have moved entirely to blk-mq (multi-queue), and the legacy single-queue path shown at the end no longer exists. The LightNVM subsystem, pblk included, was removed in Linux 5.15; its role has since been taken over by NVMe ZNS (Zoned Namespaces).
This article is licensed under the copyright holder’s CC BY 4.0 license.