02 ext2如何构造超级块super_block

上一篇介绍了ext2注册文件系统注册的流程,其中提到在注册文件系统时,给VFS传递了file_system_type结构体,在这个结构体中,包含了挂载函数指针ext2_mount(),用于挂载的回调操作。

在ext2_mount()函数实现中,调用了VFS的mount_bdev()来挂载块设备,这个mount_bdev()函数最后一个参数,是一个函数指针,VFS使用此函数完成超级块的构造和填充,而超级块是文件系统核心数据结构,它的构造是构建文件系统主要流程,本文就重点梳理一下超级块的构造,以下是mount_bdev()函数原型:

dentry *mount_bdev(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data,
	int (*fill_super)(struct super_block *, void *, int));

一、先说数据结构

在说超级块的构造之前,有必要说一下超级块数据结构,前面章节介绍过,一般情况下,超级块结构有如下三种形态:

  • 磁盘上超级块结构,持久化结构(struct ext2_super_block)
  • 内存中缓存的超级块结构体(struct ext2_sb_info)
  • 文件系统挂载后,VFS使用的超级块(struct super_block)

ext2文件系统也不例外,也存在三种形态的超级块,并且这三种超级块类型,通过字段可以关联起来,这是文件系统实现的常规做法。

首先,VFS超级块super_block通过s_fs_info域可以链接到内存超级块,在内存超级块的结构中s_es域链接到持久化的超级块,这样就把三个结构体串联起来。

在Linux系统中,文件系统数据流转过程,传递的都是VFS的超级块,通过这个超级块就可以获取其他两个形态的结构,非常的方便。

ext2的三个超级块
ext2的三个超级块

VFS使用的超级块,是所有文件系统公用的,前面章节已经介绍过,所以本篇博客只介绍ext2特有的两个结构,持久化结构和内存中结构。

1.持久化超级块:ext2_super_block

以下代码是ext2的磁盘超级块结构,展示了超级块在磁盘上的存储格式,包含了整个ext2文件系统的信息:inode数量、block数量、空闲的block数量、空闲的inode数量、块组信息、碎片信息、第一个数据block、group信息等

/*
 * Structure of the super block
 */
struct ext2_super_block {
	__le32	s_inodes_count;		/* Inodes count */
	__le32	s_blocks_count;		/* Blocks count */
	__le32	s_r_blocks_count;	/* Reserved blocks count */
	__le32	s_free_blocks_count;	/* Free blocks count */
	__le32	s_free_inodes_count;	/* Free inodes count */
	__le32	s_first_data_block;	/* First Data Block */
	__le32	s_log_block_size;	/* Block size */
	__le32	s_log_frag_size;	/* Fragment size */
	__le32	s_blocks_per_group;	/* # Blocks per group */
	__le32	s_frags_per_group;	/* # Fragments per group */
	__le32	s_inodes_per_group;	/* # Inodes per group */
	__le32	s_mtime;		/* Mount time */
	__le32	s_wtime;		/* Write time */
	__le16	s_mnt_count;		/* Mount count */
	__le16	s_max_mnt_count;	/* Maximal mount count */
	__le16	s_magic;		/* Magic signature */
	__le16	s_state;		/* File system state */
	__le16	s_errors;		/* Behaviour when detecting errors */
	__le16	s_minor_rev_level; 	/* minor revision level */
	__le32	s_lastcheck;		/* time of last check */
	__le32	s_checkinterval;	/* max. time between checks */
	__le32	s_creator_os;		/* OS */
	__le32	s_rev_level;		/* Revision level */
	__le16	s_def_resuid;		/* Default uid for reserved blocks */
	__le16	s_def_resgid;		/* Default gid for reserved blocks */
	/*
	 * These fields are for EXT2_DYNAMIC_REV superblocks only.
	 *
	 * Note: the difference between the compatible feature set and
	 * the incompatible feature set is that if there is a bit set
	 * in the incompatible feature set that the kernel doesn't
	 * know about, it should refuse to mount the filesystem.
	 * 
	 * e2fsck's requirements are more strict; if it doesn't know
	 * about a feature in either the compatible or incompatible
	 * feature set, it must abort and not try to meddle with
	 * things it doesn't understand...
	 */
	__le32	s_first_ino; 		/* First non-reserved inode */
	__le16   s_inode_size; 		/* size of inode structure */
	__le16	s_block_group_nr; 	/* block group # of this superblock */
	__le32	s_feature_compat; 	/* compatible feature set */
	__le32	s_feature_incompat; 	/* incompatible feature set */
	__le32	s_feature_ro_compat; 	/* readonly-compatible feature set */
	__u8	s_uuid[16];		/* 128-bit uuid for volume */
	char	s_volume_name[16]; 	/* volume name */
	char	s_last_mounted[64]; 	/* directory where last mounted */
	__le32	s_algorithm_usage_bitmap; /* For compression */
	/*
	 * Performance hints.  Directory preallocation should only
	 * happen if the EXT2_COMPAT_PREALLOC flag is on.
	 */
	__u8	s_prealloc_blocks;	/* Nr of blocks to try to preallocate*/
	__u8	s_prealloc_dir_blocks;	/* Nr to preallocate for dirs */
	__u16	s_padding1;
	/*
	 * Journaling support valid if EXT3_FEATURE_COMPAT_HAS_JOURNAL set.
	 */
	__u8	s_journal_uuid[16];	/* uuid of journal superblock */
	__u32	s_journal_inum;		/* inode number of journal file */
	__u32	s_journal_dev;		/* device number of journal file */
	__u32	s_last_orphan;		/* start of list of inodes to delete */
	__u32	s_hash_seed[4];		/* HTREE hash seed */
	__u8	s_def_hash_version;	/* Default hash version to use */
	__u8	s_reserved_char_pad;
	__u16	s_reserved_word_pad;
	__le32	s_default_mount_opts;
 	__le32	s_first_meta_bg; 	/* First metablock block group */
	__u32	s_reserved[190];	/* Padding to the end of the block */
};

2.内存超级块:ext2_sb_info

以下是内存中超级块的结构,与磁盘超级块类似,包含了inode、block块信息、计数器等等。

/*
 * second extended-fs super-block data in memory
 */
struct ext2_sb_info {
	unsigned long s_frag_size;	/* Size of a fragment in bytes */
	unsigned long s_frags_per_block;/* Number of fragments per block */
	unsigned long s_inodes_per_block;/* Number of inodes per block */
	unsigned long s_frags_per_group;/* Number of fragments in a group */
	unsigned long s_blocks_per_group;/* Number of blocks in a group */
	unsigned long s_inodes_per_group;/* Number of inodes in a group */
	unsigned long s_itb_per_group;	/* Number of inode table blocks per group */
	unsigned long s_gdb_count;	/* Number of group descriptor blocks */
	unsigned long s_desc_per_block;	/* Number of group descriptors per block */
	unsigned long s_groups_count;	/* Number of groups in the fs */
	unsigned long s_overhead_last;  /* Last calculated overhead */
	unsigned long s_blocks_last;    /* Last seen block count */
	struct buffer_head * s_sbh;	/* Buffer containing the super block */
	struct ext2_super_block * s_es;	/* Pointer to the super block in the buffer */
	struct buffer_head ** s_group_desc;
	unsigned long  s_mount_opt;
	unsigned long s_sb_block;
	kuid_t s_resuid;
	kgid_t s_resgid;
	unsigned short s_mount_state;
	unsigned short s_pad;
	int s_addr_per_block_bits;
	int s_desc_per_block_bits;
	int s_inode_size;
	int s_first_ino;
	spinlock_t s_next_gen_lock;
	u32 s_next_generation;
	unsigned long s_dir_count;
	u8 *s_debts;
	struct percpu_counter s_freeblocks_counter;
	struct percpu_counter s_freeinodes_counter;
	struct percpu_counter s_dirs_counter;
	struct blockgroup_lock *s_blockgroup_lock;
	/* root of the per fs reservation window tree */
	spinlock_t s_rsv_window_lock;
	struct rb_root s_rsv_window_root;
	struct ext2_reserve_window_node s_rsv_window_head;
	/*
	 * s_lock protects against concurrent modifications of s_mount_state,
	 * s_blocks_last, s_overhead_last and the content of superblock's
	 * buffer pointed to by sbi->s_es.
	 *
	 * Note: It is used in ext2_show_options() to provide a consistent view
	 * of the mount options.
	 */
	spinlock_t s_lock;
	struct mb_cache *s_ea_block_cache;
	struct dax_device *s_daxdev;
};

二、构造(填充)超级块

文章开头已经说明,在挂载ext2文件系统时,VFS将会回调ext2_mount函数,在ext2_mount()函数中,调用VFS的mount_bdev()函数,携带了填充超级块的函数指针ext2_fill_super,VFS挂载后将会使用此函数构造超级块,函数指针原型如下:

/*
 * param1:VFS超级块对象
 * param2:mount挂载时,携带的选项数据
 * param3:未使用
 */
int ext2_fill_super(struct super_block *sb, void *data, int silent)

这个填充函数ext2_fill_super(),主要实现了几个功能:填充了VFS的超级块sb、构造了内存超级块sbi、构造持久化超级块es,具体流程见下面的分解动作:

1.获取超级块起始位置*

下面的代码是解析mount选项(get_sb_block),确认是否携带了sb=xxx参数,这个参数是指定超级块位置,单位是K。如果sb=1,表示从1K开始是超级块位置,sb的默认值就是1,也就是第二个1024位置。第一个1024位置是磁盘引导块,文件系统不使用。

unsigned long sb_block = get_sb_block(&data);
超级块位置
超级块位置

2.读取磁盘超级块***

调用sb_bread()函数,从磁盘上读取超级块,一共传递了三个非常重要的参数:

  • sb->s_bdev:块设备信息,这是挂载操作指定的块
  • block:块号,也就是起始块,超级块默认是1
  • sb->s_blocksize:块大小,默认是1024字节

读取磁盘超级块到内存之后,在首地址加了一个offset,其实offset默认值为0,如果超级块在特殊偏移位置,可以使用offset。

然后,把超级块内存首地址给了sbi的s_es域,同时获取魔术数给VFS的超级块。

if (!(bh = sb_bread(sb, logic_sb_block))) {
	ext2_msg(sb, KERN_ERR, "error: unable to read superblock");
	goto failed_sbi;
}
/*
 * Note: s_es must be initialized as soon as possible because
 *       some ext2 macro-instructions depend on its value
 */
es = (struct ext2_super_block *) (((char *)bh->b_data) + offset);
sbi->s_es = es;
sb->s_magic = le16_to_cpu(es->s_magic);

/*
 * Read a block from bdev
 */
static inline struct buffer_head *
sb_bread(struct super_block *sb, sector_t block)
{
	return __bread_gfp(sb->s_bdev, block, sb->s_blocksize, __GFP_MOVABLE);
}

3.初始化内存和VFS超级块信息1

这一部分主要是根据磁盘超级块信息,来初始化内存超级块,主要内容包括:

  • 各种挂载选项(es->s_default_mount_opts)
  • 块用户ID和组ID(es->s_def_resuid,es->s_def_resgid)
  • 检查s_rev_level、features、只读属性

这一部分代码较为简单,这里不做详细展开。

4.再次读取超级块

为什么要重新读取超级块呢,如果逻辑块大小与之前计算的不一致,需要重新计算超级块块号和偏移量,然后调用sb_bread()重新读取,整体逻辑与第2步中类似,此处不做过多赘述。

blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);


/* If the blocksize doesn't match, re-read the thing.. */
if (sb->s_blocksize != blocksize) {
	brelse(bh);

	if (!sb_set_blocksize(sb, blocksize)) {
		ext2_msg(sb, KERN_ERR,
			"error: bad blocksize %d", blocksize);
		goto failed_sbi;
	}

	logic_sb_block = (sb_block*BLOCK_SIZE) / blocksize;
	offset = (sb_block*BLOCK_SIZE) % blocksize;
	bh = sb_bread(sb, logic_sb_block);
	if(!bh) {
		ext2_msg(sb, KERN_ERR, "error: couldn't read"
			"superblock on 2nd try");
		goto failed_sbi;
	}
	es = (struct ext2_super_block *) (((char *)bh->b_data) + offset);
	sbi->s_es = es;
	if (es->s_magic != cpu_to_le16(EXT2_SUPER_MAGIC)) {
		ext2_msg(sb, KERN_ERR, "error: magic mismatch");
		goto failed_mount;
	}
}

5.初始化内存和VFS超级块信息2

第4步获取准确的逻辑块大小,读取了准确的超级块信息,接下来把这些信息赋值给内存超级块和VFS超级块:

  • sb->s_maxbytes:计算ext2支持最大文件大小
  • sb->s_max_links:最大支持的链接数:32000
  • sbi->s_inode_size:单个Inode结构大小
  • sbi->s_first_ino:第一个使用的inode编号(非系统保留的inode)
  • sbi->s_frag_size:单个碎片大小,最小默认为1024字节
  • sbi->s_frags_per_block:使用逻辑块大小除以碎片大小
  • sbi->s_blocks_per_group:每个块组的块数
  • sbi->s_frags_per_group:每个块组的碎片数
  • sbi->s_inodes_per_group:每个块组的inode数
  • sbi->s_inodes_per_block:每个逻辑块的inode数,逻辑块大小除以inode大小
  • sbi->s_itb_per_group:每个块组inode表块数
  • sbi->s_desc_per_block:每个块的组描述符数,逻辑块大小除以组描述符大小
  • sbi->s_sbh:磁盘超级块首地址
  • sbi->s_mount_state:就是磁盘超级块里的文件系统状态es->s_state
  • sbi->s_addr_per_block_bits:
  • sbi->s_desc_per_block_bits:
sb->s_maxbytes = ext2_max_size(sb->s_blocksize_bits);
sb->s_max_links = EXT2_LINK_MAX;
sb->s_time_min = S32_MIN;
sb->s_time_max = S32_MAX;

if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV) {
	sbi->s_inode_size = EXT2_GOOD_OLD_INODE_SIZE;
	sbi->s_first_ino = EXT2_GOOD_OLD_FIRST_INO;
} else {
	sbi->s_inode_size = le16_to_cpu(es->s_inode_size);
	sbi->s_first_ino = le32_to_cpu(es->s_first_ino);
	if ((sbi->s_inode_size < EXT2_GOOD_OLD_INODE_SIZE) ||
		!is_power_of_2(sbi->s_inode_size) ||
		(sbi->s_inode_size > blocksize)) {
		ext2_msg(sb, KERN_ERR,
			"error: unsupported inode size: %d",
			sbi->s_inode_size);
		goto failed_mount;
	}
}

sbi->s_frag_size = EXT2_MIN_FRAG_SIZE <<
			   le32_to_cpu(es->s_log_frag_size);
if (sbi->s_frag_size == 0)
	goto cantfind_ext2;
sbi->s_frags_per_block = sb->s_blocksize / sbi->s_frag_size;

sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group);
sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);

sbi->s_inodes_per_block = sb->s_blocksize / EXT2_INODE_SIZE(sb);
if (sbi->s_inodes_per_block == 0 || sbi->s_inodes_per_group == 0)
	goto cantfind_ext2;
sbi->s_itb_per_group = sbi->s_inodes_per_group /
				sbi->s_inodes_per_block;
sbi->s_desc_per_block = sb->s_blocksize /
				sizeof (struct ext2_group_desc);
sbi->s_sbh = bh;
sbi->s_mount_state = le16_to_cpu(es->s_state);
sbi->s_addr_per_block_bits =
	ilog2 (EXT2_ADDR_PER_BLOCK(sb));
sbi->s_desc_per_block_bits =
	ilog2 (EXT2_DESC_PER_BLOCK(sb));

6.初始化内存和VFS超级块信息3

接下来继续初始化内存超级块和VFS超级块。

  • sbi->s_groups_count:逻辑块组描述符数量
  • sbi->s_blockgroup_lock:初始化所有块组锁
  • sbi->s_debts:
  • sbi->s_group_desc[i]:从设备上读取块组描述符
  • sbi->s_gdb_count:
  • sbi->s_freeblocks_counter:计数器初始化
  • sb->s_op:超级块操作表
  • sb->s_export_op:跟NFS交互的操作表
  • sb->s_xattr:Xattr操作表
  • sb->dq_op:磁盘配额特性操作表
  • sb->s_qcop:磁盘配额特性控制操作表
  • sb->s_quota_types:配额类型
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
			le32_to_cpu(es->s_first_data_block) - 1)
				/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
db_count = (sbi->s_groups_count + EXT2_DESC_PER_BLOCK(sb) - 1) /
	   EXT2_DESC_PER_BLOCK(sb);
sbi->s_group_desc = kmalloc_array(db_count,
				   sizeof(struct buffer_head *),
				   GFP_KERNEL);
if (sbi->s_group_desc == NULL) {
	ret = -ENOMEM;
	ext2_msg(sb, KERN_ERR, "error: not enough memory");
	goto failed_mount;
}
bgl_lock_init(sbi->s_blockgroup_lock);
sbi->s_debts = kcalloc(sbi->s_groups_count, sizeof(*sbi->s_debts), GFP_KERNEL);
if (!sbi->s_debts) {
	ret = -ENOMEM;
	ext2_msg(sb, KERN_ERR, "error: not enough memory");
	goto failed_mount_group_desc;
}
for (i = 0; i < db_count; i++) {
	block = descriptor_loc(sb, logic_sb_block, i);
	sbi->s_group_desc[i] = sb_bread(sb, block);
	if (!sbi->s_group_desc[i]) {
		for (j = 0; j < i; j++)
			brelse (sbi->s_group_desc[j]);
		ext2_msg(sb, KERN_ERR,
			"error: unable to read group descriptors");
		goto failed_mount_group_desc;
	}
}
if (!ext2_check_descriptors (sb)) {
	ext2_msg(sb, KERN_ERR, "group descriptors corrupted");
	goto failed_mount2;
}
sbi->s_gdb_count = db_count;
get_random_bytes(&sbi->s_next_generation, sizeof(u32));
spin_lock_init(&sbi->s_next_gen_lock);

/* per filesystem reservation list head & lock */
spin_lock_init(&sbi->s_rsv_window_lock);
sbi->s_rsv_window_root = RB_ROOT;
/*
 * Add a single, static dummy reservation to the start of the
 * reservation window list --- it gives us a placeholder for
 * append-at-start-of-list which makes the allocation logic
 * _much_ simpler.
 */
sbi->s_rsv_window_head.rsv_start = EXT2_RESERVE_WINDOW_NOT_ALLOCATED;
sbi->s_rsv_window_head.rsv_end = EXT2_RESERVE_WINDOW_NOT_ALLOCATED;
sbi->s_rsv_window_head.rsv_alloc_hit = 0;
sbi->s_rsv_window_head.rsv_goal_size = 0;
ext2_rsv_window_add(sb, &sbi->s_rsv_window_head);

err = percpu_counter_init(&sbi->s_freeblocks_counter,
			ext2_count_free_blocks(sb), GFP_KERNEL);
/*
 * set up enough so that it can read an inode
 */
sb->s_op = &ext2_sops;
sb->s_export_op = &ext2_export_ops;
sb->s_xattr = ext2_xattr_handlers;

#ifdef CONFIG_QUOTA
sb->dq_op = &dquot_operations;
sb->s_qcop = &ext2_quotactl_ops;
sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP;
#endif

7.构造根inode

跟超级块类似,ext2文件系统的索引节点(inode)也有三种形态:

  • 磁盘上持久化的inode:struct ext2_inode
  • 内存中的inode:struct ext2_inode_info
  • VFS使用的inode:struct inode

本文虽然是讲述超级块填充,因为超级块结构中s_root指向根dentry,所以需要创建根dentry,而dentry结构d_inode域指向索引节点,所以也就需要构造出根inode,然后填充到根dentry中。

构造根inode过程与构造超级块类似,需要根据根inode编号(编号是2),从磁盘上读取inode信息,然后填充到内存形态的inode和VFS使用inode。

主要涉及的函数有:ext2_iget() -> ext2_get_inode(),其中ext2_get_inode负责从设备上读取inode,ext2_iget负责填充到内存inode和VFS使用的inode。

root = ext2_iget(sb, EXT2_ROOT_INO);
if (IS_ERR(root)) {
	ret = PTR_ERR(root);
	goto failed_mount3;
}

1)从磁盘读取inode

从磁盘读取inode的关键是知道块号,那如何计算块号呢?

  • 首先根据inode编号,计算出inode所属的块组:(ino – 1) / EXT2_INODES_PER_GROUP(sb)
  • 根据块组查找到块组描述符,这样就得到了inode_table的首地址
  • 然后再计算具体的偏移量:就是取模的零头*一个inode大小
  • 那块号就是,inode_table首地址
  • 调用sb_bread()函数读取块信息
  • 最终得到inode的首地址:bh->b_data + 偏移量
static struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino,
					struct buffer_head **p)
{
	struct buffer_head * bh;
	unsigned long block_group;
	unsigned long block;
	unsigned long offset;
	struct ext2_group_desc * gdp;

	......

	block_group = (ino - 1) / EXT2_INODES_PER_GROUP(sb);
	gdp = ext2_get_group_desc(sb, block_group, NULL);
	if (!gdp)
		goto Egdp;
	/*
	 * Figure out the offset within the block group inode table
	 */
	offset = ((ino - 1) % EXT2_INODES_PER_GROUP(sb)) * EXT2_INODE_SIZE(sb);
	block = le32_to_cpu(gdp->bg_inode_table) +
		(offset >> EXT2_BLOCK_SIZE_BITS(sb));
	if (!(bh = sb_bread(sb, block)))
		goto Eio;

	*p = bh;
	offset &= (EXT2_BLOCK_SIZE(sb) - 1);
	return (struct ext2_inode *) (bh->b_data + offset);

        ......
}

2)初始化内存中和VFS使用的inode信息

在ext2_iget()函数中,利用从磁盘读取的inode信息,完成了内存中Inode和VFS使用的Inode初始化:

  • raw_inode:磁盘上inode(struct ext2_inode)
  • ei:内存中inode(struct ext2_inode_info)
  • inode:VFS使用的Inode(struct inode)

详细的处理逻辑见下方代码,代码逻辑比较简单,在此不做赘述。只说明一点,类似le16_to_cpu这种宏,是将磁盘上小端转换成CPU要求的,如果CPU要求小端序,将不做处理,如果CPU是大端序,将会从端转换成大端。

struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
{
	......

	raw_inode = ext2_get_inode(inode->i_sb, ino, &bh);
	if (IS_ERR(raw_inode)) {
		ret = PTR_ERR(raw_inode);
 		goto bad_inode;
	}

	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
	i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
	i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
	if (!(test_opt (inode->i_sb, NO_UID32))) {
		i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
		i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
	}
	i_uid_write(inode, i_uid);
	i_gid_write(inode, i_gid);
	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
	inode->i_size = le32_to_cpu(raw_inode->i_size);
	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
	inode->i_ctime.tv_sec = (signed)le32_to_cpu(raw_inode->i_ctime);
	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->i_mtime);
	inode->i_atime.tv_nsec = inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = 0;
	ei->i_dtime = le32_to_cpu(raw_inode->i_dtime);
	
	......
	
	inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
	ei->i_flags = le32_to_cpu(raw_inode->i_flags);
	ext2_set_inode_flags(inode);
	ei->i_faddr = le32_to_cpu(raw_inode->i_faddr);
	ei->i_frag_no = raw_inode->i_frag;
	ei->i_frag_size = raw_inode->i_fsize;
	ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl);
	ei->i_dir_acl = 0;

	if (ei->i_file_acl &&
	    !ext2_data_block_valid(EXT2_SB(sb), ei->i_file_acl, 1)) {
		ext2_error(sb, "ext2_iget", "bad extended attribute block %u",
			   ei->i_file_acl);
		ret = -EFSCORRUPTED;
		goto bad_inode;
	}

	if (S_ISREG(inode->i_mode))
		inode->i_size |= ((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32;
	else
		ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl);
	if (i_size_read(inode) < 0) {
		ret = -EFSCORRUPTED;
		goto bad_inode;
	}
	ei->i_dtime = 0;
	inode->i_generation = le32_to_cpu(raw_inode->i_generation);
	ei->i_state = 0;
	ei->i_block_group = (ino - 1) / EXT2_INODES_PER_GROUP(inode->i_sb);
	ei->i_dir_start_lookup = 0;

	/*
	 * NOTE! The in-memory inode i_data array is in little-endian order
	 * even on big-endian machines: we do NOT byteswap the block numbers!
	 */
	for (n = 0; n < EXT2_N_BLOCKS; n++)
		ei->i_data[n] = raw_inode->i_block[n];

	if (S_ISREG(inode->i_mode)) {
		ext2_set_file_ops(inode);
	} else if (S_ISDIR(inode->i_mode)) {
		inode->i_op = &ext2_dir_inode_operations;
		inode->i_fop = &ext2_dir_operations;
		if (test_opt(inode->i_sb, NOBH))
			inode->i_mapping->a_ops = &ext2_nobh_aops;
		else
			inode->i_mapping->a_ops = &ext2_aops;
	} else if (S_ISLNK(inode->i_mode)) {
		if (ext2_inode_is_fast_symlink(inode)) {
			inode->i_link = (char *)ei->i_data;
			inode->i_op = &ext2_fast_symlink_inode_operations;
			nd_terminate_link(ei->i_data, inode->i_size,
				sizeof(ei->i_data) - 1);
		} else {
			inode->i_op = &ext2_symlink_inode_operations;
			inode_nohighmem(inode);
			if (test_opt(inode->i_sb, NOBH))
				inode->i_mapping->a_ops = &ext2_nobh_aops;
			else
				inode->i_mapping->a_ops = &ext2_aops;
		}
	} else {
		inode->i_op = &ext2_special_inode_operations;
		if (raw_inode->i_block[0])
			init_special_inode(inode, inode->i_mode,
			   old_decode_dev(le32_to_cpu(raw_inode->i_block[0])));
		else 
			init_special_inode(inode, inode->i_mode,
			   new_decode_dev(le32_to_cpu(raw_inode->i_block[1])));
	}
	brelse (bh);
	unlock_new_inode(inode);
	return inode;
	
bad_inode:
	brelse(bh);
	iget_failed(inode);
	return ERR_PTR(ret);
}

8.创建根dentry

首先dentry跟超级块和inode不一样,超级块和Inode都有实体的磁盘结构,而dentry是为了快速搜索inode,每次文件系统装载时,都要重新构造dentry,所以它只有内存结构,没有磁盘结构。以下代码展示如何创建根dentry,大体上分为如下几个步骤:

  • 分配dentry缓存
  • dentry结构中文件名初始化
  • d_lock、d_flags 、d_parent初始化
  • d_sb链接到超级块
  • 初始化各个链表:d_hash、d_lru、d_subdirs、d_child
  • 使用sb->s_d_op来初始化d_op
  • 调用d_op->d_init函数来完成dentry初始化
sb->s_root = d_make_root(root);
if (!sb->s_root) {
	ext2_msg(sb, KERN_ERR, "error: get root inode failed");
	ret = -ENOMEM;
	goto failed_mount3;
}


struct dentry *d_make_root(struct inode *root_inode)
{
	struct dentry *res = NULL;

	if (root_inode) {
		res = d_alloc_anon(root_inode->i_sb);
		if (res)
			d_instantiate(res, root_inode);
		else
			iput(root_inode);
	}
	return res;
}


struct dentry *d_alloc_anon(struct super_block *sb)
{
        /* 第二个参数为NULL,说明是根dentry */
	return __d_alloc(sb, NULL);
}

static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
{
	struct dentry *dentry;
	char *dname;
	int err;

        /* 给dentry分配缓存 */
	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
	if (!dentry)
		return NULL;

	/*
	 * We guarantee that the inline name is always NUL-terminated.
	 * This way the memcpy() done by the name switching in rename
	 * will still always have a NUL at the end, even if we might
	 * be overwriting an internal NUL character
	 */
	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
	if (unlikely(!name)) {
                /* 文件名为空,使用“/” */
		name = &slash_name;
		dname = dentry->d_iname;
	} else if (name->len > DNAME_INLINE_LEN-1) {
		size_t size = offsetof(struct external_name, name[1]);
		struct external_name *p = kmalloc(size + name->len,
						  GFP_KERNEL_ACCOUNT |
						  __GFP_RECLAIMABLE);
		if (!p) {
			kmem_cache_free(dentry_cache, dentry); 
			return NULL;
		}
		atomic_set(&p->u.count, 1);
		dname = p->name;
	} else  {
		dname = dentry->d_iname;
	}	

	dentry->d_name.len = name->len;
	dentry->d_name.hash = name->hash;
	memcpy(dname, name->name, name->len);
	dname[name->len] = 0;

	/* Make sure we always see the terminating NUL character */
	smp_store_release(&dentry->d_name.name, dname); /* ^^^ */

	dentry->d_lockref.count = 1;
	dentry->d_flags = 0;
	spin_lock_init(&dentry->d_lock);
	seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
	dentry->d_inode = NULL;
	dentry->d_parent = dentry;

        /* 链接到超级块 */
	dentry->d_sb = sb;
	dentry->d_op = NULL;
	dentry->d_fsdata = NULL;

        /* 初始化各个链表 */
	INIT_HLIST_BL_NODE(&dentry->d_hash);
	INIT_LIST_HEAD(&dentry->d_lru);
	INIT_LIST_HEAD(&dentry->d_subdirs);
	INIT_HLIST_NODE(&dentry->d_u.d_alias);
	INIT_LIST_HEAD(&dentry->d_child);

        /* 设置d_op,并调用d_init函数,完成初始化 */
	d_set_d_op(dentry, dentry->d_sb->s_d_op);
	if (dentry->d_op && dentry->d_op->d_init) {
		err = dentry->d_op->d_init(dentry);
		if (err) {
			if (dname_external(dentry))
				kfree(external_name(dentry));
			kmem_cache_free(dentry_cache, dentry);
			return NULL;
		}
	}

        /* 目录项计数器增加1 */
	this_cpu_inc(nr_dentry);

	return dentry;
}

10.检查revision_level

如果修订级别revision_level大于EXT2_MAX_SUPP_REV,将当前文件系统设置成只读。

static int ext2_setup_super (struct super_block * sb,
			      struct ext2_super_block * es,
			      int read_only)
{
	int res = 0;
	struct ext2_sb_info *sbi = EXT2_SB(sb);

	if (le32_to_cpu(es->s_rev_level) > EXT2_MAX_SUPP_REV) {
		ext2_msg(sb, KERN_ERR,
			"error: revision level too high, "
			"forcing read-only mode");
		res = SB_RDONLY;
	}
	if (read_only)
		return res;
	
	.....
	
	return res;
}

11.同步超级块到磁盘

终于到了超级块的最后一步,将缓存中的超级块信息,同步到的磁盘上。为了完成同步,分为下面三个步骤来执行:

  • 重新计算可用的块数量,然后设置给es->s_free_blocks_count
  • 重新计算可用的Inode数量,然后设置给es->s_free_inodes_count
  • 获取当前时间到秒数,作为写入时间赋值给s_wtime
  • 将磁盘超级块的s_sbh缓存域标志成脏块:mark_buffer_dirty(EXT2_SB(sb)->s_sbh)
  • 将磁盘超级块的s_sbh缓存域同步到磁盘:sync_dirty_buffer(EXT2_SB(sb)->s_sbh)
static void ext2_write_super(struct super_block *sb)
{
	if (!sb_rdonly(sb))
		ext2_sync_fs(sb, 1);
}

void ext2_sync_super(struct super_block *sb, struct ext2_super_block *es,
		     int wait)
{
	ext2_clear_super_error(sb);
	spin_lock(&EXT2_SB(sb)->s_lock);
	es->s_free_blocks_count = cpu_to_le32(ext2_count_free_blocks(sb));
	es->s_free_inodes_count = cpu_to_le32(ext2_count_free_inodes(sb));
	es->s_wtime = cpu_to_le32(ktime_get_real_seconds());
	/* unlock before we do IO */
	spin_unlock(&EXT2_SB(sb)->s_lock);
	mark_buffer_dirty(EXT2_SB(sb)->s_sbh);
	if (wait)
		sync_dirty_buffer(EXT2_SB(sb)->s_sbh);
}

三、总结

挂载ext2文件系统时,会回调注册时提供的ext2_mount()函数,此函数调用mount_bdev()来挂载块设备,此函数要求提供填充超级块函数,ext2代码提供了ext2_fill_super()函数,来实现超级块构造和填充。

超级块有三种形态,磁盘上超级块、内存中超级块和VFS使用的超级块,他们都需要构造和填充,主要步骤包括:

  • 从磁盘上读取磁盘超级块
  • 初始化内存和VFS使用的超级块
  • 从磁盘上读取根inode信息
  • 初始化内存和VFS使用的inode
  • 创建根dentry
  • 将填充好的超级块信息,同步到磁盘上

上一篇:ext2流程(1):注册/解注册文件系统

参考资料:

https://www.cnblogs.com/codetravel/p/4779430.html

https://tldp.org/LDP/tlk/fs/filesystem.html

https://e2fsprogs.sourceforge.net/ext2intro.html

《02 ext2如何构造超级块super_block》有一个想法

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注