Ext3
Encyclopedia
The ext3 or third extended filesystem is a journaled file system
Journaling file system
A journaling file system is a file system that keeps track of the changes that will be made in a journal before committing them to the main file system...

 that is commonly used by the Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....

. It is the default file system
File system
A file system is a means to organize data expected to be retained after a program terminates by providing procedures to store, retrieve and update data, as well as manage the available space on the device which contain it. A file system organizes data in an efficient manner and is tuned to the...

 for many popular Linux distributions, including Debian
Debian
Debian is a computer operating system composed of software packages released as free and open source software primarily under the GNU General Public License along with other free software licenses. Debian GNU/Linux, which includes the GNU OS tools and Linux kernel, is a popular and influential...

. Stephen Tweedie
Stephen Tweedie
Dr. Stephen C. Tweedie is a software developer who is known for his work on the Linux kernel, in particular his work on filesystems.After becoming involved with the development of the ext2 filesystem working on performance issues, he led the development of the ext3 filesystem which involved adding...

 first revealed that he was working on extending ext2
Ext2
The ext2 or second extended filesystem is a file system for the Linux kernel. It was initially designed by Rémy Card as a replacement for the extended file system ....

 in Journaling the Linux ext2fs Filesystem in a 1998 paper and later in a February 1999 kernel mailing list posting, and the filesystem was merged with the mainline Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....

 in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling
Journaling file system
A journaling file system is a file system that keeps track of the changes that will be made in a journal before committing them to the main file system...

 which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4
Ext4
The ext4 or fourth extended filesystem is a journaling file system for Linux, developed as the successor to ext3.It was born as a series of backward compatible extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to...

.

Advantages

Although its performance (speed) is less attractive than competing Linux filesystems such as ext4
Ext4
The ext4 or fourth extended filesystem is a journaling file system for Linux, developed as the successor to ext3.It was born as a series of backward compatible extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to...

, JFS, ReiserFS
ReiserFS
ReiserFS is a general-purpose, journaled computer file system designed and implemented by a team at Namesys led by Hans Reiser. ReiserFS is currently supported on Linux . Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel...

 and XFS
XFS
XFS is a high-performance journaling file system created by Silicon Graphics, Inc. It is the default file system in IRIX releases 5.3 and onwards and later ported to the Linux kernel. XFS is particularly proficient at parallel IO due to its allocation group based design...

, it has a significant advantage in that it allows in-place upgrades from the ext2 file system without having to back up
Backup
In information technology, a backup or the process of backing up is making copies of data which may be used to restore the original after a data loss event. The verb form is back up in two words, whereas the noun is backup....

 and restore data. Benchmarks suggest that ext3 also uses less CPU power than ReiserFS and XFS. It is also considered safer than the other Linux file systems due to its relative simplicity and wider testing base.

The ext3 file system adds, over its predecessor:
  • A Journaling file system
    Journaling file system
    A journaling file system is a file system that keeps track of the changes that will be made in a journal before committing them to the main file system...

    .
  • Online file system growth.
  • Htree
    Htree
    An HTree is a specialized tree data structure for directory indexing, similar to a B-tree. They are constant depth of either one or two levels, have a high fanout factor, use a hash of the filename, and do not require balancing...

     indexing for larger directories.


Without these, any ext3 file system is also a valid ext2 file system. This has allowed well-tested and mature file system maintenance utilities for maintaining and repairing ext2 file systems to also be used with ext3 without major changes. The ext2 and ext3 file systems share the same standard set of utilities, e2fsprogs
E2fsprogs
e2fsprogs is a set of utilities for maintaining the ext2, ext3 and ext4 file systems. Since those file systems are often the default for Linux distributions, it is commonly considered to be essential software....

, which includes an fsck
Fsck
The system utility fsck is a tool for checking the consistency of a file system in Unix and Unix-like operating systems such as Linux.-Use:...

 tool. The close relationship also makes conversion between the two file systems (both forward to ext3 and backward to ext2) straightforward.

While in some contexts the lack of "modern" filesystem features such as dynamic inode
Inode
In computing, an inode is a data structure on a traditional Unix-style file system such as UFS. An inode stores all the information about a regular file, directory, or other file system object, except its data and name....

 allocation and extents could be considered a disadvantage, in terms of recoverability this gives ext3 a significant advantage over file systems with those features. The file system metadata is all in fixed, well-known locations, and there is some redundancy inherent in the data structures that may allow ext2 and ext3 to be recoverable in the face of significant data corruption, where tree-based file systems may not be recoverable.

Size limits

ext3 has a maximum size for both individual files and the entire filesystem. For the filesystem as a whole that limit is 232 blocks
Block (data storage)
In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...

. Both limits are dependent on the block size of the filesystem; the following chart summarizes the limits:
Block size Maximum
file size
Maximum
file system size
1 KiB
Kibibyte
The kibibyte is a multiple of the unit byte for quantities of digital information. The binary prefix kibi means 1024; therefore, 1 kibibyte is . The unit symbol for the kibibyte is KiB. The unit was established by the International Electrotechnical Commission in 1999 and has been accepted for use...

16 GiB
Gibibyte
The gibibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The gibibyte unit symbol is GiB....

2 TiB
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....

2 KiB
Kibibyte
The kibibyte is a multiple of the unit byte for quantities of digital information. The binary prefix kibi means 1024; therefore, 1 kibibyte is . The unit symbol for the kibibyte is KiB. The unit was established by the International Electrotechnical Commission in 1999 and has been accepted for use...

256 GiB
Gibibyte
The gibibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The gibibyte unit symbol is GiB....

8 TiB
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....

4 KiB
Kibibyte
The kibibyte is a multiple of the unit byte for quantities of digital information. The binary prefix kibi means 1024; therefore, 1 kibibyte is . The unit symbol for the kibibyte is KiB. The unit was established by the International Electrotechnical Commission in 1999 and has been accepted for use...

2 TiB
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....

16 TiB
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....

8 KiB
Kibibyte
The kibibyte is a multiple of the unit byte for quantities of digital information. The binary prefix kibi means 1024; therefore, 1 kibibyte is . The unit symbol for the kibibyte is KiB. The unit was established by the International Electrotechnical Commission in 1999 and has been accepted for use...

In Linux, 8 KiB
Kibibyte
The kibibyte is a multiple of the unit byte for quantities of digital information. The binary prefix kibi means 1024; therefore, 1 kibibyte is . The unit symbol for the kibibyte is KiB. The unit was established by the International Electrotechnical Commission in 1999 and has been accepted for use...

 block size is only available on architectures which allow 8 KiB pages
Paging
In computer operating systems, paging is one of the memory-management schemes by which a computer can store and retrieve data from secondary storage for use in main memory. In the paging memory-management scheme, the operating system retrieves data from secondary storage in same-size blocks called...

, such as Alpha
DEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...

.
2 TiB
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....

32 TiB
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....




Journaling levels

There are three levels of journaling
Journaling file system
A journaling file system is a file system that keeps track of the changes that will be made in a journal before committing them to the main file system...

 available in the Linux implementation of ext3:

Journal (lowest risk): Both metadata and file contents are written to the journal before being committed to the main file system. Because the journal is relatively continuous on disk, this can improve performance provided journal size is sufficient. In other cases, performance gets worse because the data must be written twice - once to the journal, and once to the main part of the filesystem.

Ordered (medium risk): Only metadata is journaled; file contents are not, but it's guaranteed that file contents are written to disk before associated metadata is marked as committed in the journal. This is the default on many Linux distributions. If there is a power outage or kernel panic
Kernel panic
A kernel panic is an action taken by an operating system upon detecting an internal fatal error from which it cannot safely recover. The term is largely specific to Unix and Unix-like systems; for Microsoft Windows operating systems the equivalent term is "Bug check" .The kernel routines that...

 while a file is being written or appended to, the journal will indicate the new file or appended data has not been "committed", so it will be purged by the cleanup process. (Thus appends and new files have the same level of integrity protection as the "journaled" level.) However, files being overwritten can be corrupted because the original version of the file is not stored. Thus it's possible to end up with a file in an intermediate state between new and old, without enough information to restore either one or the other (the new data never made it to disk completely, and the old data is not stored anywhere). Even worse, the intermediate state might intersperse old and new data, because the order of the write is left up to the disk's hardware. XFS uses this form of journaling.

Writeback (highest risk): Only metadata is journaled; file contents are not. The contents might be written before or after the journal is updated. As a result, files modified right before a crash can become corrupted. For example, a file being appended to may be marked in the journal as being larger than it actually is, causing garbage at the end. Older versions of files could also appear unexpectedly after a journal recovery. The lack of synchronization between data and journal is faster in many cases. JFS uses this level of journaling, but ensures that any "garbage" due to unwritten data is zeroed out on reboot.

In all three modes, internal structure of file system is assured to be consistent even after a crash. Risk only affects user data content and users metadata of files. In any case, only the data content of files or directories which were being modified when the system crashed will be affected; the rest will be intact after recovery.

Functionality

Since ext3 aims to be backwards compatible with the earlier ext2, many of the on-disk structures are similar to those of ext2. Because of that, ext3 lacks a number of features of more recent designs, such as extents, dynamic allocation of inode
Inode
In computing, an inode is a data structure on a traditional Unix-style file system such as UFS. An inode stores all the information about a regular file, directory, or other file system object, except its data and name....

s, and block suballocation
Block suballocation
Block suballocation is a feature of some computer file systems which allows large blocks or allocation units to be used while making efficient use of "slack" space at the end of large files, space which would otherwise be lost for other use to internal fragmentation.In file systems that don't...

. There is a limit of 31998 sub-directories per one directory, stemming from its limit of 32000 links per inode.

ext3, like most current Linux filesystems, cannot be fsck
Fsck
The system utility fsck is a tool for checking the consistency of a file system in Unix and Unix-like operating systems such as Linux.-Use:...

-ed while the filesystem is mounted for writing. Attempting to check a file system that is already mounted may detect bogus errors where changed data has not reached the disk yet, and corrupt the file system in an attempt to "fix" these errors.

Defragmentation

There is no online ext3 defragmentation
Defragmentation
In the maintenance of file systems, defragmentation is a process that reduces the amount of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contiguous regions . It also attempts to create larger regions of...

 tool that works on the filesystem level. An offline ext2 defragmenter, e2defrag, exists but requires that the ext3 filesystem be converted back to ext2 first. But depending on the feature bits turned on in the filesystem, e2defrag may destroy data; it does not know how to treat many of the newer ext3 features.

There are userspace defragmentation tools like Shake and defrag. Shake works by allocating space for the whole file as one operation, which will generally cause the allocator to find contiguous disk space. It also tries to write files used at the same time next to each other. Defrag works by copying each file over itself. However they only work if the filesystem is reasonably empty. A true defragmentation tool does not exist for ext3.

That being said, as the Linux System Administrator Guide states, "Modern Linux filesystem(s) keep fragmentation at a minimum by keeping all blocks in a file close together, even if they can't be stored in consecutive sectors. Some filesystems, like ext3, effectively allocate the free block that is nearest to other blocks in a file. Therefore it is not necessary to worry about fragmentation in a Linux system."

While ext3 is more resistant to file fragmentation than the FAT
File Allocation Table
File Allocation Table is a computer file system architecture now widely used on many computer systems and most memory cards, such as those used with digital cameras. FAT file systems are commonly found on floppy disks, flash memory cards, digital cameras, and many other portable devices because of...

 filesystem, nonetheless ext3 filesystems can get fragmented over time or on specific usage patterns, like slowly-writing large files. Consequently the successor to the ext3 filesystem, ext4
Ext4
The ext4 or fourth extended filesystem is a journaling file system for Linux, developed as the successor to ext3.It was born as a series of backward compatible extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to...

, is planned to eventually include an online filesystem defragmentation utility, and currently supports extents (contiguous file regions).

Recovery

There is no support of deleted file recovery in the file system design.
The ext3 driver actively deletes files by wiping file inodes
for crash safety reasons. This is why an accidental 'rm -rf ...' command may cause permanent data loss.

There are still several techniques
and some free
and commercial
software for recovery of deleted or lost files using file system journal
analysis; however, they do not guarantee any specific file recovery.

Compression

Support for transparent compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

 is available as an unofficial patch for ext3. This patch is a direct port of e2compr and still needs further development, it compiles and boots well with upstream kernels but journaling is not implemented yet. The current patch is named e3compr.

Lack of snapshots support

Unlike a number of modern file systems, Ext3 does not have native support for snapshots
Snapshot (computer storage)
In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography. It can refer to an actual copy of the state of a system or to a capability provided by certain systems....

 - the ability to quickly capture the state of the filesystem at arbitrary times, instead relying on less space-efficient volume level snapshots provided by the Linux LVM
Logical Volume Manager (Linux)
LVM is a logical volume manager for the Linux kernel; it manages disk drives and similar mass-storage devices, in particular large ones. The term "volume" refers to a disk drive or partition thereof...

. The Next3
Next3
Next3 is a journaled file system for Linux based on ext3 which adds snapshots support, yet retains compatibility to the ext3 on-disk format. Next3 is implemented as open-source software, licensed under the GPL license.-Background:...

 file system is a modified version of Ext3 which offers snapshots support, yet retains compatibility to the EXT3 on-disk format.

No checksumming in journal

Ext3 does not do checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

ming when writing to the journal. If barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware is doing out-of-order write caching, one runs the risk of severe filesystem corruption during a crash.

Consider the following scenario: If hard disk writes are done out-of-order (due to modern hard disks caching writes in order to amortize
Amortized analysis
In computer science, amortized analysis is a method of analyzing algorithms that considers the entire sequence of operations of the program. It allows for the establishment of a worst-case bound for the performance of an algorithm irrespective of the inputs by looking at all of the operations...

 write speeds), it is likely that one will write a commit block of a transaction before the other relevant blocks are written. If a power failure or unrecoverable crash should occur before the other blocks get written, the system will have to be rebooted. Upon reboot, the file system will replay the log as normal, and replay the "winners" (transactions with a commit block, including the invalid transaction above which happened to be tagged with a valid commit block). The unfinished disk write above will thus proceed, but using corrupt journal data. The file system will thus mistakenly overwrite normal data with corrupt data while replaying the journal. There is a test program available to trigger the problematic behavior. If checksums had been used, where the blocks of the "fake winner" transaction were tagged with a mutual checksum, the file system could have known better and not replayed the corrupt data onto the disk. Journal checksumming has been added to ext4
Ext4
The ext4 or fourth extended filesystem is a journaling file system for Linux, developed as the successor to ext3.It was born as a series of backward compatible extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to...

.

Filesystems going through the device mapper interface (including software RAID
RAID
RAID is a storage technology that combines multiple disk drive components into a logical unit...

 and LVM
Logical Volume Manager (Linux)
LVM is a logical volume manager for the Linux kernel; it manages disk drives and similar mass-storage devices, in particular large ones. The term "volume" refers to a disk drive or partition thereof...

 implementations) may not support barriers, and will issue a warning if that mount option is used. There are also some disks that do not properly implement the write cache flushing extension necessary for barriers to work, which causes a similar warning. In these situations, where barriers are not supported or practical, reliable write ordering is possible by turning off the disk's write cache and using the data=journal mount option. Turning off the disk's write cache may be required even when barriers are available. Applications like databases expect a call to fsync
Sync (Unix)
sync is a standard system call in the Unix operating system, which commits to disk all data in the kernel filesystem buffers, i.e., data which has been scheduled for writing via low-level I/O system calls. Note that higher-level I/O layers such as stdio may maintain separate buffers of their...

 will flush pending writes to disk, and the barrier implementation doesn't always clear the drive's write cache in response to that call. There is also a potential issue with the barrier implementation related to error handling during events such as a drive failure It is also known that sometimes some virtualization
Virtualization
Virtualization, in computing, is the creation of a virtual version of something, such as a hardware platform, operating system, a storage device or network resources....

 technologies do not properly forward flush command to the underlying devices (files, volumes, disk) from guest operating system. Similarly some hard disks or controllers implements cache flushing incorrectly or not at all, but still advertise that it is supported, and do not return any error when it is used. For this reasons it is safer to assume that cache flushing do not work, or test it extensively with more reliable and tested components (like SCSI disks).

ext4

An enhanced version of the file system was announced by Theodore Ts'o
Theodore Ts'o
Theodore Y. "Ted" Ts'o is a software developer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems.He graduated in 1990 from MIT with a degree in computer science...

, the principal developer of ext3, on June 28, 2006 under the name of ext4. On October 11, 2008, the patches that mark ext4
Ext4
The ext4 or fourth extended filesystem is a journaling file system for Linux, developed as the successor to ext3.It was born as a series of backward compatible extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to...

 as stable code were merged in the Linux 2.6.28 source code repositories, marking the end of the development phase and recommending its adoption.
In 2008, Ts'o stated that although ext4 has improved features, it is not a major advance, it uses old technology, and is a stop-gap; Ts'o believes that Btrfs
Btrfs
Btrfs is a GPL-licensed copy-on-write file system for Linux.Development began at Oracle Corporation in 2007....

 is the better direction because "it offers improvements in scalability, reliability, and ease of management". Btrfs also has "a number of the same design ideas that reiser3
ReiserFS
ReiserFS is a general-purpose, journaled computer file system designed and implemented by a team at Namesys led by Hans Reiser. ReiserFS is currently supported on Linux . Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel...

/4
Reiser4
Reiser4 is a computer file system, successor to the ReiserFS file system, developed from scratch by Namesys and sponsored by DARPA as well as Linspire...

 had".

See also

  • List of file systems
  • Comparison of file systems
    Comparison of file systems
    -General information:-Limits:-Metadata:-Features:-Allocation and layout policies:-Supporting operating systems:-See also:* Comparison of archive formats* Comparison of file archivers* List of archive formats* List of file archivers...

  • Extended file attributes
    Extended file attributes
    Extended file attributes is a file system feature that enables users to associate computer files with metadata not interpreted by the filesystem, whereas regular attributes have a purpose strictly defined by the filesystem...


External links

as of 2004-10-14.
  • Introducing ext3 – IBM developerWorks Advanced filesystem implementor's guide, Part 7
  • Paragon ExtBrowser Free ext2/ext3 Windows driver
  • Ext2 File System For Windows GPL ext2/ext3 file system driver for Windows 2000/XP/2003/VISTA/2008 (opensource, supports read & write) supports larger disks (max 256 I-nodes)
  • Ext2 Installable File System For Windows ext2/ext3 file system driver for MS Windows NT/2000/XP (freeware, supports read & write on Windows NT4.0/2000/XP/2003/Vista on x86/AMD64) does not support larger disks (max. 128 bit I-nodes)
  • EXT2 IFS ext2/ext3 file system driver (read only) for MS Windows NT/2000/XP (opensource), latest version in the web archive
  • Explore2fs An explorer-like GUI tool for accessing ext2/ext3 filesystems under MS Windows
  • "Ext2read" A windows application to read/copy ext2/ext3/ext4 files with extent and LVM2 support.
  • UFS Explorer Standard Recovery version 4 Commercial data recovery and file undelete software for Ext2/Ext3 file systems.
  • ext2/ext3 resizing tools
  • Presentation on EXT3 Journaling Filesystem by Dr. Stephen Tweedie at the Ottawa Linux Symposium, 20 July 2000
  • State of the Art: Where we are with the Ext3 filesystem by Mingming Cao, Theodore Y. Ts'o, Badari Pulavarty, Suparna Bhattacharya, IBM Linux Technology Center, 2005
  • Tutorial – Determining Your EXT3 Size Limits
  • HTree
  • fuse-ext2 An open source ext2/ext3 file system driver for FUSE
    Filesystem in Userspace
    Filesystem in Userspace is a loadable kernel module for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code...

    . (Supports Mac OS X
    Mac OS X
    Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

     10.4 and later (Universal Binary
    Universal binary
    A universal binary is, in Apple parlance, an executable file or application bundle that runs natively on either PowerPC or Intel-manufactured IA-32 or Intel 64-based Macintosh computers; it is an implementation of the concept more generally known as a fat binary.With the release of Mac OS X Snow...

    ), using MacFuse)
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK