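What is an address space (address_space)? It is a data structure provided by the Linux kernel for managing the page cache pages that hold data mapped into memory from the various backing devices.
That sounds abstract, but the address space is really just an intermediate layer. The kernel gathers the data scattered across backing devices and maps it onto page cache pages; kernel subsystems then work on those cached pages through the address space and, by doing so, indirectly operate on the underlying devices.
A page, in this context, is a page of kernel-managed memory that holds a cached copy of data from the real physical device. Reading and writing the page stands in for operating on the device itself, which has several benefits:
- ① It hides the physical differences between devices, so kernel subsystems do not have to access the hardware directly.
- ② Memory operations are far faster than waiting on a peripheral, which improves overall performance.
- ③ A file may be stored non-contiguously on the device; the page cache hides this and presents the file to programs as a contiguous run of pages.
For example: a file's data is stored on a block device, the kernel maps that data onto page cache pages, and the VFS reads and writes the file by operating on those cached pages through the address space.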
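The read path in the example above can be sketched in user space. This is only a toy model, not kernel code: `sketch_device`, `struct sketch_cache`, `sketch_get_page` and `sketch_read_byte` are all hypothetical names, the "device" is a byte array, and the page size is assumed to be 4 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */
#define SKETCH_NR_PAGES  4u      /* size of the toy "device" in pages */

/* Toy backing device: a flat byte array standing in for a block device. */
static unsigned char sketch_device[SKETCH_NR_PAGES * SKETCH_PAGE_SIZE];

/* One cached page: a copy of device data plus a presence flag. */
struct sketch_page {
	int present;                 /* 1 once the page has been read in */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Toy address space: maps a page index to a cached page. */
struct sketch_cache {
	struct sketch_page pages[SKETCH_NR_PAGES];
	unsigned long nrpages;       /* number of pages currently cached */
	unsigned long device_reads;  /* how often the device was touched */
};

/* Return the cached page for index, reading from the device on a miss. */
struct sketch_page *sketch_get_page(struct sketch_cache *c, size_t index)
{
	assert(index < SKETCH_NR_PAGES);
	struct sketch_page *p = &c->pages[index];

	if (!p->present) {
		memcpy(p->data, sketch_device + index * SKETCH_PAGE_SIZE,
		       SKETCH_PAGE_SIZE);
		p->present = 1;
		c->nrpages++;
		c->device_reads++;
	}
	return p;
}

/* Read one byte of the "file" at byte offset pos through the cache. */
unsigned char sketch_read_byte(struct sketch_cache *c, size_t pos)
{
	struct sketch_page *p = sketch_get_page(c, pos / SKETCH_PAGE_SIZE);

	return p->data[pos % SKETCH_PAGE_SIZE];
}
```

Two reads that land in the same page touch the toy device only once; the second is served from memory, which is the effect described in benefit ② above.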
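1. Data structure
struct address_space is declared in fs.h; the full description is in the code and comments below. The key fields are:
- host: the owner of the address space, either a file's inode or a block device
- i_pages: the cached pages of the file or block device
- i_mmap_writable: the number of VM_SHARED mappings of this address space (a counter, not a simple "writable" flag)
- nrpages: the number of cached page entries
- writeback_index: the page index at which writeback to the device starts
- a_ops: the address space operations table, which implements page reads, writes and so on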
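Both i_pages and writeback_index are addressed by page index rather than by byte offset, so it helps to see how one converts into the other. The following is a small sketch of that arithmetic, assuming 4 KiB pages; the `sketch_*` names are hypothetical helpers and not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

/* Page index within the mapping for byte offset pos (the role of pgoff_t). */
uint64_t sketch_page_index(uint64_t pos)
{
	return pos >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of pos inside its page. */
uint64_t sketch_offset_in_page(uint64_t pos)
{
	return pos & (SKETCH_PAGE_SIZE - 1);
}

/* Number of pages needed to cover the byte range [pos, pos + len). */
uint64_t sketch_pages_spanned(uint64_t pos, uint64_t len)
{
	assert(len > 0);
	return sketch_page_index(pos + len - 1) - sketch_page_index(pos) + 1;
}
```

For instance, byte offset 10000 falls in page 2 at in-page offset 1808, and a 200-byte access starting at offset 4000 spans two pages.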
// fs.h
/**
 * struct address_space - Contents of a cacheable, mappable object.
 * @host: Owner, either the inode or the block_device.
 * @i_pages: Cached pages.
 * @invalidate_lock: Guards coherency between page cache contents and
 *   file offset->disk block mappings in the filesystem during invalidates.
 *   It is also used to block modification of page cache contents through
 *   memory mappings.
 * @gfp_mask: Memory allocation flags to use for allocating pages.
 * @i_mmap_writable: Number of VM_SHARED mappings.
 * @nr_thps: Number of THPs in the pagecache (non-shmem only).
 * @i_mmap: Tree of private and shared mappings.
 * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
 * @nrpages: Number of page entries, protected by the i_pages lock.
 * @writeback_index: Writeback starts here.
 * @a_ops: Methods.
 * @flags: Error bits and flags (AS_*).
 * @wb_err: The most recent error which has occurred.
 * @private_lock: For use by the owner of the address_space.
 * @private_list: For use by the owner of the address_space.
 * @private_data: For use by the owner of the address_space.
 */
struct address_space {
	struct inode		*host;
	struct xarray		i_pages;
	struct rw_semaphore	invalidate_lock;
	gfp_t			gfp_mask;
	atomic_t		i_mmap_writable;
#ifdef CONFIG_READ_ONLY_THP_FOR_FS
	/* number of thp, only for non-shmem files */
	atomic_t		nr_thps;
#endif
	struct rb_root_cached	i_mmap;
	struct rw_semaphore	i_mmap_rwsem;
	unsigned long		nrpages;
	pgoff_t			writeback_index;
	const struct address_space_operations *a_ops;
	unsigned long		flags;
	errseq_t		wb_err;
	spinlock_t		private_lock;
	struct list_head	private_list;
	void			*private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
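The i_mmap_writable field is easy to misread, so here is a user-space sketch of how such a counter can be used. It is loosely modelled on my understanding of the mapping_map_writable() and mapping_deny_writable() helpers in fs.h: a positive value counts shared writable mappings, and a negative value means writable mappings are currently denied. The `sketch_*` names are hypothetical, and a plain int stands in for the kernel's atomic_t, so this model is not safe for concurrent use.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the counter (a plain int, not the kernel's atomic_t). */
struct sketch_as {
	int i_mmap_writable;  /* > 0: shared writable mappings; < 0: denied */
};

/* Register one shared writable mapping; fails while writes are denied. */
int sketch_map_writable(struct sketch_as *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

/* Drop one shared writable mapping. */
void sketch_unmap_writable(struct sketch_as *m)
{
	assert(m->i_mmap_writable > 0);
	m->i_mmap_writable--;
}

/* Deny future writable mappings; fails while any still exist. */
int sketch_deny_writable(struct sketch_as *m)
{
	if (m->i_mmap_writable > 0)
		return -EBUSY;
	m->i_mmap_writable--;
	return 0;
}

/* Undo one earlier deny. */
void sketch_allow_writable(struct sketch_as *m)
{
	assert(m->i_mmap_writable < 0);
	m->i_mmap_writable++;
}
```

The single counter therefore encodes two mutually exclusive states, which is why the field cannot be read as a boolean "is writable" flag.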
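2. The address space operations table: a_ops
As a core data structure, the address space also carries an operations table. The table consists mainly of operations on page cache pages; operating on the device is done by operating on those pages. The code and a field-by-field explanation follow: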
// fs.h
struct address_space_operations {
	int (*writepage)(struct page *page, struct writeback_control *wbc);
	int (*readpage)(struct file *, struct page *);

	/* Write back some dirty pages from this mapping. */
	int (*writepages)(struct address_space *, struct writeback_control *);

	/* Set a page dirty.  Return true if this dirtied it */
	int (*set_page_dirty)(struct page *page);

	/*
	 * Reads in the requested pages. Unlike ->readpage(), this is
	 * PURELY used for read-ahead!.
	 */
	int (*readpages)(struct file *filp, struct address_space *mapping,
			struct list_head *pages, unsigned nr_pages);
	void (*readahead)(struct readahead_control *);

	int (*write_begin)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned flags,
				struct page **pagep, void **fsdata);
	int (*write_end)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned copied,
				struct page *page, void *fsdata);

	/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
	sector_t (*bmap)(struct address_space *, sector_t);
	void (*invalidatepage) (struct page *, unsigned int, unsigned int);
	int (*releasepage) (struct page *, gfp_t);
	void (*freepage)(struct page *);
	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
	/*
	 * migrate the contents of a page to the specified target. If
	 * migrate_mode is MIGRATE_ASYNC, it must not block.
	 */
	int (*migratepage) (struct address_space *,
			struct page *, struct page *, enum migrate_mode);
	bool (*isolate_page)(struct page *, isolate_mode_t);
	void (*putback_page)(struct page *);
	int (*launder_page) (struct page *);
	int (*is_partially_uptodate) (struct page *, unsigned long,
					unsigned long);
	void (*is_dirty_writeback) (struct page *, bool *, bool *);
	int (*error_remove_page)(struct address_space *, struct page *);

	/* swapfile support */
	int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
				sector_t *span);
	void (*swap_deactivate)(struct file *file);
};
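The point of such a table is that generic code calls through function pointers without knowing which file system sits behind them. Here is a cut-down user-space sketch of that dispatch pattern. Everything in it is hypothetical: the table has only two hooks, their signatures are simplified compared with the kernel's, and the "ramdisk" backing store is just a char array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u  /* tiny page size to keep the demo readable */

struct sketch_page {
	unsigned long index;      /* page index within the mapping */
	int uptodate;             /* 1 once data holds valid contents */
	char data[SKETCH_PAGE_SIZE];
};

struct sketch_mapping;

/* Cut-down operations table: one read hook and one write hook. */
struct sketch_aops {
	int (*readpage)(struct sketch_mapping *m, struct sketch_page *page);
	int (*writepage)(struct sketch_mapping *m, struct sketch_page *page);
};

struct sketch_mapping {
	const struct sketch_aops *a_ops;
	char *backing;                /* toy backing store */
	unsigned long backing_pages;  /* its size in pages */
};

/* "File system" side: fill a page from the toy backing store. */
int ramdisk_readpage(struct sketch_mapping *m, struct sketch_page *page)
{
	if (page->index >= m->backing_pages)
		return -1;
	memcpy(page->data, m->backing + page->index * SKETCH_PAGE_SIZE,
	       SKETCH_PAGE_SIZE);
	page->uptodate = 1;
	return 0;
}

/* "File system" side: copy a page back to the toy backing store. */
int ramdisk_writepage(struct sketch_mapping *m, struct sketch_page *page)
{
	if (page->index >= m->backing_pages)
		return -1;
	memcpy(m->backing + page->index * SKETCH_PAGE_SIZE, page->data,
	       SKETCH_PAGE_SIZE);
	return 0;
}

const struct sketch_aops ramdisk_aops = {
	.readpage  = ramdisk_readpage,
	.writepage = ramdisk_writepage,
};

/* Generic side: knows only the table, not the implementation behind it. */
int sketch_read_page(struct sketch_mapping *m, struct sketch_page *page)
{
	assert(m->a_ops != NULL);
	if (!m->a_ops->readpage)
		return -1;
	return m->a_ops->readpage(m, page);
}

int sketch_write_page(struct sketch_mapping *m, struct sketch_page *page)
{
	assert(m->a_ops != NULL);
	if (!m->a_ops->writepage)
		return -1;
	return m->a_ops->writepage(m, page);
}
```

Swapping in a different table changes where the data comes from without touching sketch_read_page or sketch_write_page, which mirrors benefit ① from the introduction.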
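Field | Meaning | Details |
writepage | Write a dirty page to the backing device | Called by the VM |
readpage | Read data from the backing device into a cache page | Called by the VM |
writepages | Write the dirty pages associated with the address space to the backing device | Called by the VM |
set_page_dirty | Mark a page dirty | Returns true if this call dirtied the page |
readpages | Read data from the backing device into cache pages | Used only for read-ahead |
readahead | Read the pages described by the readahead_control into the cache | Used for read-ahead |
write_begin | Prepare for a write | Main steps (for buffer-head-based file systems): 1) find the cache page, creating a new one if it does not exist; 2) create buffers for the page |
write_end | Finish a write; always paired with write_begin | 1) commit the data to the buffers and mark the buffer heads (bh) dirty; 2) update the inode's i_size field; 3) unlock and release the page; 4) mark the inode dirty |
bmap | Map a logical block offset to a physical block number | See generic_block_bmap |
invalidatepage | Invalidate part or all of a page that is being removed from the address space, for example on truncate | |
releasepage | Ask the owner to drop private data attached to a page so that the page can be freed | |
freepage | Called once a page is no longer visible in the page cache | |
direct_IO | Direct I/O support | See blockdev_direct_IO |
migratepage | Migrate a cache page | |
isolate_page | Isolate a movable (non-LRU) page ahead of migration | Returns true if the page was isolated |
putback_page | Hand an isolated page back to its owner when migration fails | |
launder_page | Write back a dirty page before it is freed | The page stays locked throughout so that it cannot be redirtied |
is_partially_uptodate | Check whether a given byte range of a page is up to date even though the page as a whole is not | Relevant when the block size is smaller than the page size |
is_dirty_writeback | Report whether a page is dirty or under writeback | Returns void; the result is delivered through the two bool output parameters, which the VM consults during page reclaim |
error_remove_page | Remove a page from the cache after a memory error | Commonly set to generic_error_remove_page |
swap_activate | Prepare a file for use as swap space; called on swapon | swapfile support |
swap_deactivate | Called when swapoff is run on a file | swapfile support |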
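The pairing of write_begin and write_end is easiest to see in a loop. The sketch below is a user-space toy that loosely follows the shape of a buffered write: prepare the page, copy the data in, then commit. All `sketch_*` names are hypothetical, there are no buffer heads, and the page size is shrunk to 8 bytes so that a short write crosses a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8u   /* tiny pages so a short write spans several */
#define SKETCH_MAX_PAGES 4u

struct sketch_page {
	int present, locked, dirty;
	char data[SKETCH_PAGE_SIZE];
};

struct sketch_file {
	struct sketch_page pages[SKETCH_MAX_PAGES];
	size_t i_size;            /* file size, playing the role of i_size */
};

/* Prepare: find or create the page covering pos, then lock it. */
int sketch_write_begin(struct sketch_file *f, size_t pos,
		       struct sketch_page **pagep)
{
	size_t index = pos / SKETCH_PAGE_SIZE;

	if (index >= SKETCH_MAX_PAGES)
		return -1;
	struct sketch_page *p = &f->pages[index];
	if (!p->present) {
		memset(p->data, 0, sizeof p->data);
		p->present = 1;
	}
	p->locked = 1;
	*pagep = p;
	return 0;
}

/* Commit: mark the page dirty, grow i_size if needed, unlock the page. */
size_t sketch_write_end(struct sketch_file *f, size_t pos, size_t copied,
			struct sketch_page *p)
{
	assert(p->locked);
	p->dirty = 1;
	if (pos + copied > f->i_size)
		f->i_size = pos + copied;
	p->locked = 0;
	return copied;
}

/* Buffered write loop: one begin/copy/end round per page touched. */
long sketch_buffered_write(struct sketch_file *f, size_t pos,
			   const char *buf, size_t len)
{
	size_t written = 0;

	while (written < len) {
		size_t off = pos % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - off;
		struct sketch_page *p;

		if (chunk > len - written)
			chunk = len - written;
		if (sketch_write_begin(f, pos, &p) != 0)
			return written ? (long)written : -1;
		memcpy(p->data + off, buf + written, chunk);
		written += sketch_write_end(f, pos, chunk, p);
		pos += chunk;
	}
	return (long)written;
}
```

Writing the 5 bytes of "hello" at offset 6 takes two rounds: 2 bytes land at the end of page 0 and 3 bytes at the start of page 1. Both pages end up dirty and unlocked, and i_size becomes 11. Nothing reaches a device at this point; flushing the dirty pages is the job of writepage or writepages.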