Ext4
Encyclopedia
The ext4 or fourth extended filesystem is a journaling file system
Journaling file system
A journaling file system is a file system that keeps track of the changes that will be made in a journal before committing them to the main file system...

 for Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

, developed as the successor to ext3
Ext3
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian...

.

It was born as a series of backward compatible
Backward compatibility
In the context of telecommunications and computing, a device or technology is said to be backward or downward compatible if it can work with input generated by an older device...

 extensions to ext3, many of them originally developed by Cluster File Systems
Cluster File Systems
Cluster File Systems, Inc. is the company that originally developed the Lustre distributed file system. CFS was a privately held company with offices in the United States, and China.CFS was founded in 2001 by Dr. Peter Braam...

 for the Lustre
Lustre (file system)
Lustre is a massively parallel distributed file system, generally used for large scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster...

 file system between 2003 and 2006, meant to extend storage limits and add other performance improvements. However, other Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....

 developers opposed accepting extensions to ext3 for stability reasons, and proposed to fork
Fork (software development)
In software engineering, a project fork happens when developers take a legal copy of source code from one software package and start independent development on it, creating a distinct piece of software...

 the source code of ext3, rename it as ext4, and do all the development there, without affecting the current ext3 users. This proposal was accepted, and on 28 June 2006, Theodore Ts'o
Theodore Ts'o
Theodore Y. "Ted" Ts'o is a software developer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems.He graduated in 1990 from MIT with a degree in computer science...

, the ext3 maintainer, announced the new plan of development for ext4.

A preliminary development version of ext4 was included in version 2.6.19 of the Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....

. On 11 October 2008, the patches that mark ext4 as stable code were merged in the Linux 2.6.28 source code repositories, denoting the end of the development phase and recommending ext4 adoption. Kernel 2.6.28, containing the ext4 filesystem, was finally released on 25 December 2008. On 15 January 2010, Google
Google
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...

 announced that it would upgrade its storage infrastructure from ext2
Ext2
The ext2 or second extended filesystem is a file system for the Linux kernel. It was initially designed by Rémy Card as a replacement for the extended file system ....

 to ext4. On December 14, 2010, they also announced they would use ext4, instead of YAFFS
YAFFS
YAFFS was designed and written by Charles Manning, of Whitecliffs, New Zealand, for the company .Yaffs1 is the first version of this file system and works on NAND chips that have 512 byte pages + 16 byte spare areas. These older chips also generally allow 2 or 3 write cycles per page, which...

, on Android 2.3.

Features

Large file system
The ext4 filesystem can support volumes with sizes up to 1 exbibyte (EiB)
Exbibyte
The exbibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The exbibyte unit symbol is EiB....

 and files with sizes up to 16 tebibytes (TiB).


Extents
Extents replace the traditional block mapping
Block (data storage)
In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...

 scheme used by ext2/3 filesystems. An extent is a range of contiguous physical blocks, improving large file performance and reducing fragmentation. A single extent in ext4 can map up to 128 MiB of contiguous space with a 4 KiB block size. There can be 4 extents stored in the inode
Inode
In computing, an inode is a data structure on a traditional Unix-style file system such as UFS. An inode stores all the information about a regular file, directory, or other file system object, except its data and name....

. When there are more than 4 extents to a file, the rest of the extents are indexed in an Htree
Htree
An HTree is a specialized tree data structure for directory indexing, similar to a B-tree. They are constant depth of either one or two levels, have a high fanout factor, use a hash of the filename, and do not require balancing...

.


Backward compatibility
The ext4 filesystem is backward compatible with ext3
Ext3
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian...

 and ext2
Ext2
The ext2 or second extended filesystem is a file system for the Linux kernel. It was initially designed by Rémy Card as a replacement for the extended file system ....

, making it possible to mount
Mount (computing)
Mounting takes place before a computer can use any kind of storage device . The user or their operating system must make it accessible through the computer's file system. A user can access only files on mounted media.- Mount point :A mount point is a physical location in the partition used as a...

 ext3 and ext2 filesystems as ext4. This will slightly improve performance, because certain new features of ext4 can also be used with ext3 and ext2, such as the new block allocation algorithm.

The ext3
Ext3
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian...

 file system is partially forward compatible with ext4, that is, an ext4 filesystem can be mounted as an ext3 partition (using "ext3" as the filesystem type when mounting). However, if the ext4 partition uses extents (a major new feature of ext4), then the ability to mount the file system as ext3 is lost.


Persistent pre-allocation
The ext4 filesystem allows for pre-allocation of on-disk space for a file. The current method for this on most file systems is to write the file full of 0s to reserve the space when the file is created. This method is no longer required for ext4; instead, a new fallocate system call was added to the Linux kernel for use by filesystems, including ext4 and XFS
XFS
XFS is a high-performance journaling file system created by Silicon Graphics, Inc. It is the default file system in IRIX releases 5.3 and onwards and later ported to the Linux kernel. XFS is particularly proficient at parallel IO due to its allocation group based design...

, that have this capability. The space allocated for files such as these would be guaranteed and would likely be contiguous. This has applications for media streaming and databases.


Delayed allocation
Ext4 uses a filesystem performance technique called allocate-on-flush
Allocate-on-flush
Allocate-on-flush is a computer file system feature implemented in the HFS+, XFS, Reiser4, ZFS, Btrfs and ext4 file systems...

, also known as delayed allocation. It consists of delaying block allocation until the data is going to be written to the disk, unlike some other file systems, which may allocate the necessary blocks before that step. This improves performance and reduces fragmentation
File system fragmentation
In computing, file system fragmentation, sometimes called file system aging, is the inability of a file system to lay out related data sequentially , an inherent phenomenon in storage-backed file systems that allow in-place modification of their contents. It is a special case of data fragmentation...

 by improving block allocation decisions based on the actual file size.


Break 32,000 subdirectory limit
In ext3
Ext3
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian...

 the number of subdirectories that a directory can contain is limited to 32,000. This limit has been raised to 64,000 in ext4, and with the "dir_nlink" feature it can go beyond this (although it will stop increasing the link count on the parent). To allow for continued performance given the possibility of much larger directories, Htree
Htree
An HTree is a specialized tree data structure for directory indexing, similar to a B-tree. They are constant depth of either one or two levels, have a high fanout factor, use a hash of the filename, and do not require balancing...

 indexes
Index (information technology)
In computer science, an index can be:# an integer that identifies an array element# a data structure that enables sublinear-time lookup -Array element identifier:...

 (a specialized version of a B-tree
B-tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...

) are turned on by default in ext4. This feature is implemented in Linux kernel 2.6.23. Htree is also available in ext3 when the dir_index feature is enabled.


Journal checksumming
Ext4 uses checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

s in the journal to improve reliability, since the journal is one of the most used files of the disk. This feature has a side benefit; it can safely avoid a disk I/O wait during the journaling process, improving performance slightly. The technique of journal checksumming was inspired by a research paper from the University of Wisconsin titled IRON File Systems (specifically, section 6, called "transaction checksums") with modifications within the implementation of compound transactions performed by the IRON file system (originally proposed by Sam Naghshineh in the RedHat summit).

Faster file system checking
In ext4, unallocated block groups and sections of the inode table are marked as such. This enables e2fsck to skip them entirely on a check and greatly reduces the time it takes to check a file system of the size ext4 is built to support. This feature is implemented in version 2.6.24 of the Linux kernel.


Multiblock allocator
When a file is being appended to, ext3 calls the block allocator once for each block individually; with multiple concurrent writers, files can easily become fragmented
File system fragmentation
In computing, file system fragmentation, sometimes called file system aging, is the inability of a file system to lay out related data sequentially , an inherent phenomenon in storage-backed file systems that allow in-place modification of their contents. It is a special case of data fragmentation...

 on disk. With delayed allocation, however, ext4 buffers up a larger amount of data, and then allocates a group of blocks in a batch. This means that the allocator has more information about what's being written and can make better choices for allocating files contiguously on disk. The multiblock allocator is used when delayed allocation is enabled for a file system, or when files are opened in O_DIRECT mode. This feature does not affect the disk format.


Improved timestamps
As computers become faster in general and as Linux becomes used more for mission-critical
Mission Critical
Mission critical refers to any factor of a system whose failure will result in the failure of business operations. That is, it is critical to the organization's 'mission'....

 applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides timestamp
Timestamp
A timestamp is a sequence of characters, denoting the date or time at which a certain event occurred. A timestamp is the time at which an event is recorded by a computer, not the time of the event itself...

s measured in nanoseconds. In addition, 2 bits of the expanded timestamp field are added to the most significant bits of the seconds field of the timestamps to defer the year 2038 problem
Year 2038 problem
The year 2038 problem may cause some computer software to fail at some point near the year 2038...

 for an additional 204 years.

Ext4 also adds support for date-created timestamps. But, as Theodore Ts'o
Theodore Ts'o
Theodore Y. "Ted" Ts'o is a software developer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems.He graduated in 1990 from MIT with a degree in computer science...

 points out, while it is easy to add an extra creation-date field in the inode
Inode
In computing, an inode is a data structure on a traditional Unix-style file system such as UFS. An inode stores all the information about a regular file, directory, or other file system object, except its data and name....

 (thus technically enabling support for date-created timestamps in ext4), it is more difficult to modify or add the necessary system calls, like stat
Stat (Unix)
stat is a Unix system call that returns useful data about a file inode. The semantics of stat vary between operating systems. As an example, the Unix command ls uses it to retrieve information on : time of last modification , time of last status change and time of last access .-stat functions and...

 (which would probably require a new version), and the various libraries that depend on them (like glibc). These changes would require coordination of many projects. So, even if ext4 developers implement initial support for creation-date timestamps, this feature will not be available to user programs for now.

Criticism

In 2008, the principal developer of the ext3
Ext3
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian...

 and ext4 file systems, Theodore Ts'o
Theodore Ts'o
Theodore Y. "Ted" Ts'o is a software developer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems.He graduated in 1990 from MIT with a degree in computer science...

, stated that although ext4 has improved features, it is not a major advance, it uses old technology, and is a stop-gap; Ts'o believes that Btrfs
Btrfs
Btrfs is a GPL-licensed copy-on-write file system for Linux.Development began at Oracle Corporation in 2007....

 is the better direction because "it offers improvements in scalability, reliability, and ease of management". Btrfs also has "a number of the same design ideas that reiser3
ReiserFS
ReiserFS is a general-purpose, journaled computer file system designed and implemented by a team at Namesys led by Hans Reiser. ReiserFS is currently supported on Linux . Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel...

/4
Reiser4
Reiser4 is a computer file system, successor to the ReiserFS file system, developed from scratch by Namesys and sponsored by DARPA as well as Linspire...

 had".

Delayed allocation and potential data loss

Because delayed allocation changes the behavior that programmers have been relying on with ext3
Ext3
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian...

, the feature poses some additional risk of data loss in cases where the system crashes or loses power before all of the data has been written to disk. Due to this, ext4 in kernel versions 2.6.30 and later automatically detects these cases and reverts to the old behavior.

The typical scenario in which this might occur is a program replacing the contents of a file without forcing a write to the disk with fsync. There are two common ways of replacing the contents of a file on Unix systems:
  • fd=open
    Open (system call)
    For most file systems, a program initializes access to a file in a filesystem using the open system call. This allocates resources associated to the file , and returns a handle that the process will use to refer to that file...

    ("file", O_TRUNC); write(fd, data); close(fd);
In this case, an existing file is truncated at the time of open (due to O_TRUNC flag), then new data is written out. Since the write can take some time, there is an opportunity of losing contents even with ext3, but usually very small. However, because ext4 can delay allocating file data for a long time, this opportunity is much greater.
There are several problems with this approach:
  1. If the write does not succeed (which may be due to error conditions in the writing program, or due to external conditions such as a full disk), then both the original version and the new version of the file will be lost, and the file may be corrupted because only a part of it has been written.
  2. If other processes access the file while it is being written, they see a corrupted version.
  3. If other processes have the file open and do not expect its contents to change, those processes may crash. One notable example is a shared library file which is mapped into running programs.
Because of these issues, often the following idiom is preferred over the above one:

  • fd=open("file.new"); write(fd, data); close(fd); rename("file.new", "file");
A new temporary file ("file.new") is created, which initially contains the new contents. Then the new file is renamed over the old one. Replacing files by the "rename" call is guaranteed to be atomic by POSIX
POSIX
POSIX , an acronym for "Portable Operating System Interface", is a family of standards specified by the IEEE for maintaining compatibility between operating systems...

 standards – i.e. either the old file remains, or it's overwritten with the new one. Because the ext3 default "ordered" journaling mode guarantees file data is written out on disk before metadata, this technique guarantees that either the old or the new file contents will persist on disk. ext4's delayed allocation breaks this expectation, because the file write can be delayed for a long time, and the rename is usually carried out before new file contents reach the disk.


Using fsync more often to reduce the risk for ext4 could lead to performance penalties on ext3 filesystems mounted with the data=ordered flag (the default on most Linux distributions). Given that both file systems will be in use for some time, this complicates matters for end-user application developers. In response, ext4 in Linux kernels 2.6.30 and newer detect the occurrence of these common cases and force the files to be allocated immediately. For a small cost in performance, this provides semantics similar to ext3 ordered mode and increases the chance that either version of the file will survive the crash. This new behavior is enabled by default, but can be disabled with the "noauto_da_alloc" mount option.

The new patches have become part of the mainline kernel 2.6.30, but various distributions chose to backport them to 2.6.28 or 2.6.29. For instance Ubuntu
Ubuntu (operating system)
Ubuntu is a computer operating system based on the Debian Linux distribution and distributed as free and open source software. It is named after the Southern African philosophy of Ubuntu...

 made them part of the 2.6.28 kernel in version 9.04 ("Jaunty Jackalope").

These patches don't completely prevent potential data loss or help at all with new files. No other filesystem is perfect in terms of data loss either, although the probability of data loss is lower on ext3. The only way to be safe is to write and use software that does fsync when it needs to. Performance problems can be minimized by limiting crucial disk writes that need fsync to occur less frequently.

Compatibility with Windows and Macintosh

Ext4 does not yet have as much support as ext2 and ext3 on non-linux operating systems. Ext2 and ext3 have stable drivers such as Ext2IFS
Ext2IFS
Ext2IFS is a freeware Installable File System for Microsoft Windows that provides support for reading and writing the ext2 and ext3 file systems that are typically used by Linux. According to the author, it is handy for users with a dual boot configuration to transfer files between the two...

, which are not yet available for ext4. It is possible to create compatible ext4 filesystems for use in Windows by disabling the extents feature, and sometimes specifying an Inode size. Viewing and copying files from ext4 to Windows, even with extents enabled, is also possible with the Ext2Read software. However, there are no available drivers that provide full read and write compatibility with Windows. Current Macintosh drivers are also read-only.

See also

  • List of file systems
  • Comparison of file systems
    Comparison of file systems
    -General information:-Limits:-Metadata:-Features:-Allocation and layout policies:-Supporting operating systems:-See also:* Comparison of archive formats* Comparison of file archivers* List of archive formats* List of file archivers...

  • Btrfs
    Btrfs
    Btrfs is a GPL-licensed copy-on-write file system for Linux.Development began at Oracle Corporation in 2007....

  • Reiser4
    Reiser4
    Reiser4 is a computer file system, successor to the ReiserFS file system, developed from scratch by Namesys and sponsored by DARPA as well as Linspire...

  • ZFS
    ZFS
    In computing, ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include data integrity verification against data corruption modes , support for high storage capacities, integration of the concepts of filesystem and volume management,...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK