014 Linux File System Data Structures in Detail: The Address Space struct address_space

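What is an address space (address_space)? It is a data structure provided by the Linux kernel that manages the page cache pages into which data scattered across various devices is mapped in memory.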

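That sounds obscure, but an address space is really just an intermediate layer. The kernel organizes the data scattered across peripheral devices and maps it onto page cache pages. Kernel subsystems then work on those pages through the address space, and in doing so they operate on the devices behind them.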

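A page here is therefore a piece of kernel-managed memory that holds an in-memory copy of data from a real physical device. Operating on the page is enough to operate on the underlying device, which brings the following benefits: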

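  • ① It hides the physical differences between devices, so kernel subsystems do not access physical devices directly
  • ② Memory operations are far faster than having the CPU wait on a peripheral, which improves overall performance
  • ③ A file may be stored non-contiguously on a device; pages hide this and present the file to programs as a contiguous sequence of pages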

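For example: the data of a file is stored on a block device, and the kernel maps the block device onto page cache pages. Through the address space, the VFS operates on those cache pages to read and write the file.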

Address space and virtual memory
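The "intermediate layer" idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `toy_device`, `toy_mapping`, `toy_get_page`, `toy_read_byte`, and `toy_release` are hypothetical names, the page size is assumed to be 4096 bytes, and a plain array stands in for the kernel's real index structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size */
#define TOY_DEV_PAGES 16u    /* assumed size of the toy device, in pages */

/* Hypothetical backing device: a flat byte array standing in for a disk. */
struct toy_device {
    unsigned char data[TOY_DEV_PAGES * TOY_PAGE_SIZE];
    unsigned long reads;     /* number of page-sized device reads */
};

/* Toy stand-in for an address space: maps a page index to a cached page. */
struct toy_mapping {
    struct toy_device *host;              /* owner of the cached data */
    unsigned char *pages[TOY_DEV_PAGES];  /* NULL means "not cached yet" */
    unsigned long nrpages;                /* number of cached pages */
};

/* Return the cached page for `index`, filling it from the device on a miss. */
static unsigned char *toy_get_page(struct toy_mapping *m, unsigned long index)
{
    assert(m != NULL && m->host != NULL);
    if (index >= TOY_DEV_PAGES)
        return NULL;
    if (m->pages[index] == NULL) {
        unsigned char *p = malloc(TOY_PAGE_SIZE);
        if (p == NULL)
            return NULL;
        memcpy(p, m->host->data + index * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
        m->host->reads++;        /* the slow path touches the device */
        m->pages[index] = p;
        m->nrpages++;
    }
    return m->pages[index];      /* the fast path is a memory access only */
}

/* Read one byte of the cached object. Returns -1 if out of range. */
static int toy_read_byte(struct toy_mapping *m, unsigned long pos)
{
    unsigned char *p = toy_get_page(m, pos / TOY_PAGE_SIZE);
    return p ? p[pos % TOY_PAGE_SIZE] : -1;
}

/* Drop every cached page. */
static void toy_release(struct toy_mapping *m)
{
    for (unsigned long i = 0; i < TOY_DEV_PAGES; i++) {
        free(m->pages[i]);
        m->pages[i] = NULL;
    }
    m->nrpages = 0;
}
```

In this model a second read from the same page never reaches the device, which is the performance benefit listed in ② above.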

I. Data structure

struct address_space is declared in fs.h. The code and comments below give the full description; the key fields are:

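  • host: the owner of this address space, either the inode of a file or a block device
  • i_pages: the cached pages of the file or block device
  • i_mmap_writable: the number of VM_SHARED mappings of this address space (a counter, not a simple "writable" flag)
  • nrpages: the number of cached page entries
  • writeback_index: the page index at which writeback to the device starts
  • a_ops: the address space operations table, which implements page reads, writes, and so on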
// fs.h

/**
 * struct address_space - Contents of a cacheable, mappable object.
 * @host: Owner, either the inode or the block_device.
 * @i_pages: Cached pages.
 * @invalidate_lock: Guards coherency between page cache contents and
 *   file offset->disk block mappings in the filesystem during invalidates.
 *   It is also used to block modification of page cache contents through
 *   memory mappings.
 * @gfp_mask: Memory allocation flags to use for allocating pages.
 * @i_mmap_writable: Number of VM_SHARED mappings.
 * @nr_thps: Number of THPs in the pagecache (non-shmem only).
 * @i_mmap: Tree of private and shared mappings.
 * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
 * @nrpages: Number of page entries, protected by the i_pages lock.
 * @writeback_index: Writeback starts here.
 * @a_ops: Methods.
 * @flags: Error bits and flags (AS_*).
 * @wb_err: The most recent error which has occurred.
 * @private_lock: For use by the owner of the address_space.
 * @private_list: For use by the owner of the address_space.
 * @private_data: For use by the owner of the address_space.
 */
struct address_space {
	struct inode		*host;
	struct xarray		i_pages;
	struct rw_semaphore	invalidate_lock;
	gfp_t			gfp_mask;
	atomic_t		i_mmap_writable;
#ifdef CONFIG_READ_ONLY_THP_FOR_FS
	/* number of thp, only for non-shmem files */
	atomic_t		nr_thps;
#endif
	struct rb_root_cached	i_mmap;
	struct rw_semaphore	i_mmap_rwsem;
	unsigned long		nrpages;
	pgoff_t			writeback_index;
	const struct address_space_operations *a_ops;
	unsigned long		flags;
	errseq_t		wb_err;
	spinlock_t		private_lock;
	struct list_head	private_list;
	void			*private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
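Both i_pages and writeback_index work with page indexes (pgoff_t) rather than byte offsets. The following user-space sketch shows the arithmetic that turns a file position into a page index and an offset inside that page. The `sketch_` names are hypothetical, and the page shift of 12 (4 KiB pages) is an assumption; the real value depends on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Index of the cached page that holds byte `pos` of the file. */
static uint64_t sketch_page_index(uint64_t pos)
{
    return pos >> SKETCH_PAGE_SHIFT;
}

/* Offset of byte `pos` inside that page. */
static uint64_t sketch_offset_in_page(uint64_t pos)
{
    return pos & (SKETCH_PAGE_SIZE - 1);
}

/* Number of pages touched by the byte range [pos, pos + len), len > 0. */
static uint64_t sketch_pages_spanned(uint64_t pos, uint64_t len)
{
    assert(len > 0);
    return sketch_page_index(pos + len - 1) - sketch_page_index(pos) + 1;
}
```

Under these assumptions, a 10-byte access starting at position 4090 crosses a page boundary and therefore touches two cached pages.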

II. The address space operations table a_ops

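As a core data structure, the address space also comes with an operations table. The table mainly consists of operations on page cache pages, and operating on pages is how the device itself gets operated on. The code is shown below, followed by a field-by-field explanation: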

//fs.h

struct address_space_operations {
	int (*writepage)(struct page *page, struct writeback_control *wbc);
	int (*readpage)(struct file *, struct page *);

	/* Write back some dirty pages from this mapping. */
	int (*writepages)(struct address_space *, struct writeback_control *);

	/* Set a page dirty.  Return true if this dirtied it */
	int (*set_page_dirty)(struct page *page);

	/*
	 * Reads in the requested pages. Unlike ->readpage(), this is
	 * PURELY used for read-ahead!.
	 */
	int (*readpages)(struct file *filp, struct address_space *mapping,
			struct list_head *pages, unsigned nr_pages);
	void (*readahead)(struct readahead_control *);

	int (*write_begin)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned flags,
				struct page **pagep, void **fsdata);
	int (*write_end)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned copied,
				struct page *page, void *fsdata);

	/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
	sector_t (*bmap)(struct address_space *, sector_t);
	void (*invalidatepage) (struct page *, unsigned int, unsigned int);
	int (*releasepage) (struct page *, gfp_t);
	void (*freepage)(struct page *);
	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
	/*
	 * migrate the contents of a page to the specified target. If
	 * migrate_mode is MIGRATE_ASYNC, it must not block.
	 */
	int (*migratepage) (struct address_space *,
			struct page *, struct page *, enum migrate_mode);
	bool (*isolate_page)(struct page *, isolate_mode_t);
	void (*putback_page)(struct page *);
	int (*launder_page) (struct page *);
	int (*is_partially_uptodate) (struct page *, unsigned long,
					unsigned long);
	void (*is_dirty_writeback) (struct page *, bool *, bool *);
	int (*error_remove_page)(struct address_space *, struct page *);

	/* swapfile support */
	int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
				sector_t *span);
	void (*swap_deactivate)(struct file *file);
};
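| Field | Meaning | Details |
|---|---|---|
| writepage | Write a dirty page to the backing device | Called by the virtual memory (VM) subsystem |
| readpage | Read data from the backing device into a cache page | Called by the VM |
| writepages | Write the pages associated with the address space to the backing device | Called by the VM |
| set_page_dirty | Mark a page dirty | |
| readpages | Read data from the backing device into cache pages | Used for read-ahead |
| readahead | | |
| write_begin | Prepare for a write | Main flow: 1) prepare the cache page, creating a new one if it does not exist yet; 2) create the buffers for the page |
| write_end | Finish a write; always paired with write_begin | 1) commit the data to the buffers and mark the buffer heads (bh) dirty; 2) update the i_size field of the inode; 3) unlock and release the page; 4) mark the inode dirty |
| bmap | Map a logical block offset to a physical block number | See generic_block_bmap |
| invalidatepage | | |
| releasepage | | |
| freepage | | |
| direct_IO | Direct I/O support | See blockdev_direct_IO |
| migratepage | Migrate a cache page | |
| isolate_page | Called by the VM when it isolates a movable non-LRU page for migration | Returns true if the page was isolated |
| putback_page | Called by the VM to hand an isolated page back when its migration fails | |
| launder_page | Write back a dirty page before it is freed | |
| is_partially_uptodate | Lets the file system report whether a given range of a page is up to date even though the page as a whole is not | |
| is_dirty_writeback | Called by the VM during reclaim to ask whether a page is dirty or under writeback | Returns nothing; the two states are reported through the two bool pointers |
| error_remove_page | | |
| swap_activate | Called when swapon is run on a file, to prepare the file for use as swap space | Swapfile support |
| swap_deactivate | Called when swapoff is run on the file | Swapfile support |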
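The pairing of write_begin and write_end, and the way a generic caller reaches them only through the operations table, can be sketched in user space as follows. This is a simplified illustration under stated assumptions and not the kernel implementation: every `demo_` name is hypothetical, the signatures are heavily reduced compared with the real ones shown above, the file has a single cached page, and buffer heads are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size */

struct demo_page {
    unsigned char data[DEMO_PAGE_SIZE];
    int locked;
    int dirty;
};

struct demo_file;

/* Hypothetical, much reduced analogue of an address space operations table. */
struct demo_aops {
    int (*write_begin)(struct demo_file *f, size_t pos, size_t len,
                       struct demo_page **pagep);
    int (*write_end)(struct demo_file *f, size_t pos, size_t copied,
                     struct demo_page *page);
};

struct demo_file {
    struct demo_page page;          /* a single cached page, for brevity */
    size_t i_size;                  /* file size */
    int inode_dirty;                /* set when the size or data changed */
    const struct demo_aops *a_ops;  /* operations table */
};

/* Prepare the page that will receive the data and hand it to the caller. */
static int demo_write_begin(struct demo_file *f, size_t pos, size_t len,
                            struct demo_page **pagep)
{
    if (pos > DEMO_PAGE_SIZE || len > DEMO_PAGE_SIZE - pos)
        return -1;                  /* range does not fit in the page */
    f->page.locked = 1;
    *pagep = &f->page;
    return 0;
}

/* Commit the copied data: mark dirty, update the size, unlock the page. */
static int demo_write_end(struct demo_file *f, size_t pos, size_t copied,
                          struct demo_page *page)
{
    page->dirty = 1;
    if (pos + copied > f->i_size)
        f->i_size = pos + copied;
    page->locked = 0;
    f->inode_dirty = 1;
    return (int)copied;
}

static const struct demo_aops demo_ops = {
    .write_begin = demo_write_begin,
    .write_end   = demo_write_end,
};

/* Generic caller: it knows only the table, not the code behind it. */
static int demo_write(struct demo_file *f, size_t pos,
                      const void *buf, size_t len)
{
    struct demo_page *page = NULL;

    assert(f->a_ops && f->a_ops->write_begin && f->a_ops->write_end);
    if (f->a_ops->write_begin(f, pos, len, &page) != 0)
        return -1;
    memcpy(page->data + pos, buf, len);  /* copy between begin and end */
    return f->a_ops->write_end(f, pos, len, page);
}
```

The point of the design is that `demo_write` never names a concrete implementation. A different file type could install another table in `a_ops` and the caller would stay unchanged.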

References:

Official documentation: VFS

Understanding address spaces: Gorman's discussion of address spaces in the VMM
