Last year, Paolo Bonzini, a Distinguished Engineer at Red Hat, proposed a new file system named statsfs for the Linux kernel. Unlike conventional file systems, statsfs is designed to gather and display statistics from Linux kernel subsystems.
Later, Greg Kroah-Hartman, one of the lead Linux kernel maintainers, liked the idea and gave the go-ahead. Then, at the end of last month, Emanuele Giuseppe Esposito, an engineering intern at Red Hat, pushed a series of implementation patches for review.
Statsfs: Synthetic RAM-Based Virtual File System
The Linux kernel consists mainly of five major subsystems: the process scheduler, memory management, the virtual file system (VFS), networking, and inter-process communication (IPC). However, as Emanuele points out, the kernel currently has no common way to expose their statistics to userspace. Instead, each subsystem handles its statistics on its own and exposes them in its own form, such as files.
Hence, statsfs was proposed as an independent, general-purpose file system for exposing the statistics of Linux kernel subsystems. It stores each statistic as a file in a directory hierarchy defined through the statsfs API. Each file can be read, or deleted if its file mode permits it.
The new statsfs file system contains several components and concepts that bind together to work as a single file system. Let’s take a look at some important elements:
‘Values’ And ‘Sources’ In Statsfs
At its core, statsfs is built on two concepts: "values" (files) and "sources" (directories). A value represents a single quantity of data, such as the number of VM exits, the amount of memory used by some data structure, or the length of the longest hash table chain.
Each value is described by the following struct:
struct statsfs_value {
	const char *name;
	enum stat_type type;	/* STAT_TYPE_{BOOL,U64,...} */
	u16 aggr_kind;		/* Bitmask with zero or more of
				 * STAT_AGGR_{MIN,MAX,SUM,...} */
	u16 mode;		/* File mode */
	int offset;		/* Offset from base address
				 * to field containing the value */
};
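Sources, on the other hand, hold two kinds of items: values (described by the same struct statsfs_value) and subordinate sources, which appear as subdirectories. The struct below, from the second revision of the patches (which renamed the statsfs_ prefix to stats_fs_), pairs an array of values with the base address of the data they describe: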
struct stats_fs_value_source {
	void *base_addr;
	bool files_created;
	struct stats_fs_value *values;
	struct list_head list_element;
};
Statsfs API
To create sources and to add or remove values and subordinate sources, statsfs provides an API with several functions:
struct statsfs_source *statsfs_source_create(const char *fmt, ...);
void statsfs_source_add_values(struct statsfs_source *source, struct statsfs_value *stat, int n, void *ptr);
void statsfs_source_add_subordinate(struct statsfs_source *source, struct statsfs_source *sub);
void statsfs_source_remove_subordinate(struct statsfs_source *source, struct statsfs_source *sub);
The statsfs API is a public API, declared in include/linux/statsfs.h, for manipulating statsfs sources and values. The same API is used to build the statistics directory tree, with the values gathered automatically from the registered data structures.
To expose the statistics to userspace, statsfs maps sources to directories and values to files, and mounts the root source as a virtual file system at /sys/kernel/stats. When userspace reads a value from there, the request implicitly goes through the statsfs API.
The full set of statsfs API functions and interfaces can be found in the corresponding patch.
Statsfs To Replace KVM Debugfs
Other kernel subsystems, such as KVM, can use the statsfs API to create a source, add child sources, values and aggregates to it, and register it with the virtual file system. Statsfs aims for a more-or-less stable API, with its own file system and mount point (/sys/kernel/stats).
In fact, KVM (Kernel-based Virtual Machine) would be the first user of statsfs. KVM currently exposes its statistics in debugfs, which is restricted by the security lockdown patches.
The Way Ahead
Statsfs was well received by the kernel developers who reviewed the first version. They also suggested several changes and corrections to improve it.
Emanuele has since sent a second revision of the patches that incorporates the improvements suggested for the first version. For instance, the 'statsfs' function and file names were renamed to 'stats_fs' to avoid confusion with the existing statfs() system call.
As of now, the code is still under review, and developers are adding review comments. It will likely take a few more months before it is merged into the mainline kernel.