Write amplification
Write amplification is an undesirable phenomenon associated with Flash memory and solid-state drives (SSDs). Because Flash memory must be erased before it can be rewritten, the process of performing these operations results in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD, which shortens the time it can reliably operate. The increased writes also consume bandwidth to the Flash memory, which mainly reduces random write performance to the SSD. Many factors affect the write amplification of an SSD; some can be controlled by the user and some are a direct result of the data written to, and the usage of, the SSD.

As early as 2008, both Intel and SiliconSystems (acquired by Western Digital in 2009) were the first companies to use the term write amplification in their papers and publications. Write amplification is typically measured as the ratio of writes committed to the Flash memory to the writes coming from the host system. Without compression, write amplification cannot drop below one. Using compression, SandForce has claimed to achieve a typical write amplification of 0.5.

Basic SSD operation

Due to the nature of Flash memory's operation, data cannot be directly overwritten as it can in a hard disk drive. When data is first written to an SSD, the cells all start in an erased state, so data can be written directly, one page at a time (a page is typically a few kilobytes in size). The SSD controller, which manages the Flash memory and interfaces with the host system, uses a logical-to-physical mapping system known as logical block addressing (LBA) that is part of the Flash translation layer (FTL). When new data comes in replacing older data already written, the SSD controller writes the new data in a new location and updates the logical mapping to point to the new physical location. The old location no longer holds valid data, but it will eventually need to be erased before it can be written again.

Flash memory can only be programmed and erased a limited number of times. This is often referred to as the maximum number of program/erase cycles (P/E cycles) it can sustain over the life of the Flash memory. Single-level cell (SLC) Flash memory, designed for highest performance, can typically sustain between 50,000 and 100,000 cycles. Flash memory known as multi-level cell (MLC) is designed for lower-cost applications and has a greatly reduced cycle count, typically between 3,000 and 5,000. A lower write amplification is more desirable, as it corresponds to a reduced number of P/E cycles on the Flash memory and thereby to an increased SSD life.

Calculating the value

Write amplification was always present in SSDs before the term was defined, but it was in 2008 that both Intel and SiliconSystems started using the term in their papers and publications. All SSDs have a write amplification value that is based on both what is currently being written and what was previously written to the SSD. In order to accurately measure the value for a specific SSD, the selected test should be run for enough time to ensure the drive has reached a steady state condition. The formula to calculate the write amplification of an SSD is:

    write amplification = (data written to the Flash memory) / (data written by the host)
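
As a minimal illustration (not from the original article), the ratio can be computed directly from the two byte counts; real drives expose these counters through vendor-specific attributes, so the values below are hypothetical:

```python
def write_amplification(flash_bytes: float, host_bytes: float) -> float:
    """Bytes physically written to Flash divided by bytes sent by the host."""
    if host_bytes <= 0:
        raise ValueError("host writes must be positive")
    return flash_bytes / host_bytes

# Example: the controller wrote 150 GB to Flash to store 100 GB of host data.
print(write_amplification(150e9, 100e9))   # 1.5
```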


Factors affecting the value

Many factors affect the write amplification of an SSD. The table below lists the primary factors and how they affect the write amplification. For factors that are variable, the table notes if it has a direct relationship or an inverse relationship. For example, as the amount of over-provisioning increases, the write amplification decreases (inverse relationship). If the factor is a toggle (enabled or disabled) function then it has either a positive or negative relationship.
Write Amplification Factors

Factor | Description | Type | Relationship*
Garbage collection | Efficiency of the algorithm used to pick the next best block to erase and rewrite | Variable | Inverse (good)
Over-provisioning | Percentage of physical capacity allocated to the SSD controller and not given to the user | Variable | Inverse (good)
TRIM | A SATA command sent by the operating system (OS) which tells the SSD what data can be ignored during garbage collection | Toggle | Positive (good)
Free user space | Percentage of the user capacity free of actual user data; requires TRIM, otherwise the SSD gains no benefit from any free user capacity | Variable | Inverse (good)
Secure erase | Erases all user data and related metadata, resetting the SSD to the initial out-of-box performance (until garbage collection resumes) | Toggle | Positive (good)
Wear leveling | Efficiency of the algorithm that ensures every block is written an equal number of times, as evenly as possible | Variable | Direct (bad)
Separating static and dynamic data | Grouping data based on how often it tends to change | Toggle | Positive (good)
Sequential writes | In theory, sequential writes have a write amplification of 1, but other factors will still affect the value | Toggle | Positive (good)
Random writes | Writing to non-sequential LBAs has the greatest impact on write amplification | Toggle | Negative (bad)


*Relationship Definitions

Relationship | Description
Direct | As the factor increases, the WA increases
Inverse | As the factor increases, the WA decreases
Positive | When the factor is present, the WA decreases
Negative | When the factor is present, the WA increases

Garbage collection

Data is written to the Flash memory in units called pages (made up of multiple cells). However, the memory can only be erased in larger units called blocks (made up of multiple pages). If the data in some of the pages of a block are no longer needed (also called stale pages), only the pages with good data in that block are read and rewritten into another, previously erased, empty block. The free pages left by not moving the stale data are then available for new data. This process is called garbage collection (GC). All SSDs include some level of garbage collection, but they may differ in when and how fast they perform the process. Garbage collection is a big part of write amplification on the SSD.
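
To make the mechanism concrete, here is a small, hypothetical simulation (not from the original article) of a greedy garbage collection policy: the controller picks the block with the most stale pages, copies its remaining valid pages into a previously erased block, and erases it. The extra copies counted here are exactly what drives write amplification:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    valid: set = field(default_factory=set)   # page numbers holding live data
    stale: set = field(default_factory=set)   # page numbers holding dead data

def garbage_collect(blocks: list[Block], free_block: Block) -> int:
    """Erase the block with the most stale pages; return pages copied."""
    victim = max(blocks, key=lambda b: len(b.stale))
    copied = len(victim.valid)           # valid pages must be rewritten...
    free_block.valid |= victim.valid     # ...into a previously erased block
    victim.valid.clear()
    victim.stale.clear()                 # the whole block can now be erased
    return copied                        # extra writes -> write amplification

blocks = [Block(valid={0, 1}, stale={2, 3}), Block(valid={4, 5, 6}, stale={7})]
print(garbage_collect(blocks, Block()))  # copies 2 valid pages
```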

Reads do not require an erase of the Flash memory, so they are not generally associated with write amplification. In the rare case of a read disturb error, the data in that block is read and rewritten, but this does not have any material impact on the write amplification of the drive.

Background garbage collection

The process of garbage collection involves reading and rewriting data to the Flash memory. This means that a new write from the host will first require a read of the whole block, a write of the parts of the block which still include valid data, and then a write of the new data. This can significantly reduce the performance of the system. Some SSD controllers implement background garbage collection (BGC), sometimes called idle garbage collection or idle-time garbage collection (ITGC), where the controller uses idle time to consolidate blocks of Flash memory before the host needs to write new data. This enables the performance of the device to remain high.

If the controller were to background garbage collect all of the spare blocks before it was absolutely necessary, new data written from the host could be written without having to move any data in advance, letting the performance operate at its peak speed. The trade-off is that some of those blocks of data are actually not needed by the host and will eventually be deleted, but the OS did not tell the controller this information. The result is that the soon-to-be-deleted data is rewritten to another location in the Flash memory, increasing the write amplification. In some of the SSDs from OCZ the background garbage collection clears up only a small number of blocks and then stops, thereby limiting the amount of excessive writes. Another solution is an efficient garbage collection system which can perform the necessary moves in parallel with the host writes. This solution is more effective in high-write environments where the SSD is rarely idle. The SandForce SSD controllers and the systems from Violin Memory have this capability.

Over-provisioning

Over-provisioning (sometimes spelled as OP, over provisioning, or overprovisioning) is the difference between the physical capacity of the Flash memory and the logical capacity presented through the operating system (OS) as available for the user. During the garbage collection, wear-leveling, and bad block mapping operations on the SSD, the additional space from over-provisioning helps lower the write amplification when the controller writes to the Flash memory.

The first level of over-provisioning comes from the computation of the capacity and the difference between the decimal and binary interpretations of the gigabyte (GB). Both HDD and SSD vendors use the term GB to represent a decimal GB, or 1,000,000,000 (10^9) bytes. Flash memory (like most other electronic storage) is assembled in power-of-two multiples, so the physical capacity of an SSD is based on 1,073,741,824 (2^30) bytes per binary GB. The difference between these two values is 7.37% ((2^30 − 10^9) / 10^9). Therefore, a 128 GB SSD with 0% over-provisioning would provide 128,000,000,000 bytes to the user. This initial 7.37% is typically not counted in the total over-provisioning number.
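
A quick check of that arithmetic (a sketch, not from the original article):

```python
binary_gb = 2**30          # 1,073,741,824 bytes
decimal_gb = 10**9         # 1,000,000,000 bytes

# A drive built from binary GB of Flash but sold in decimal GB keeps
# the difference as built-in spare area.
delta = (binary_gb - decimal_gb) / decimal_gb
print(f"{delta:.2%}")      # 7.37%
```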

The second level of over-provisioning comes from the manufacturer. This level of over-provisioning is typically 0%, 7%, or 28%, based on the difference between the decimal GB of the physical capacity and the decimal GB of the space available to the user. As an example, a manufacturer might publish a specification for their SSD at 100 GB or 120 GB based on 128 GB of possible capacity. The difference of 28% and 7%, respectively, is the basis for the manufacturer claiming 28% of over-provisioning on their drive. This does not count the additional 7.37% of capacity available from the difference between the decimal and binary GB.

The third level of over-provisioning comes from end users. Some SSDs permit the end user to select additional over-provisioning to gain endurance and performance at the expense of capacity. Alternatively an OS partition created with less than the full user capacity on the SSD will perform the same function. Over-provisioning does take away from user capacity, but it gives back reduced write amplification, increased endurance, and increased performance.
    over-provisioning = (physical capacity − user capacity) / user capacity
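
As an illustrative sketch (not from the original article), the levels combine like this for the 128 GB example above:

```python
def over_provisioning(physical_bytes: int, user_bytes: int) -> float:
    """Spare capacity as a fraction of the user-visible capacity."""
    return (physical_bytes - user_bytes) / user_bytes

physical = 128 * 2**30             # 128 binary GB of raw Flash
for user_gb in (128, 120, 100):    # marketed as 0%, 7%, and 28% over-provisioning
    user = user_gb * 10**9         # vendors advertise decimal GB
    total = over_provisioning(physical, user)
    print(f"{user_gb} GB user capacity: {total:.1%} total spare area")
    # prints ~7.4%, ~14.5%, ~37.4% -- the marketed figure plus the
    # hidden 7.37% binary/decimal difference described above
```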


TRIM

TRIM is a SATA command that enables the operating system to tell an SSD which blocks of previously saved data are no longer needed as a result of file deletions or using the format command. When an LBA is replaced by the OS, as with an overwrite of a file, the SSD knows that the original LBA can be marked as stale or invalid, and it will not save those blocks during garbage collection. If the user or operating system erases a file (not just removes parts of it), the file will typically be marked for deletion, but the actual contents on the disk are never actually erased. Because of this, the SSD does not know that the LBAs the file previously occupied can be erased, so the SSD will keep garbage collecting them.

The introduction of the TRIM command resolves this problem for operating systems that support it, such as Windows 7 and Linux (since kernel 2.6.33). When a file is permanently deleted or the drive is formatted, the OS sends the TRIM command along with the LBAs that no longer contain valid data. This informs the SSD that those LBAs can be erased and reused, which reduces the number of LBAs needing to be moved during garbage collection. The result is that the SSD will have more free space, enabling lower write amplification and higher performance.
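
In the same spirit as the hypothetical FTL sketch earlier (again, not from the original article), the effect of TRIM can be pictured in a few lines: it invalidates mappings without writing anything, so the garbage collector no longer has to copy those pages:

```python
def trim(ftl: dict[int, int], stale_pages: set[int], lbas: list[int]) -> None:
    """Invalidate mappings for deleted LBAs; nothing is written or moved."""
    for lba in lbas:
        page = ftl.pop(lba, None)   # drop the logical-to-physical mapping
        if page is not None:
            stale_pages.add(page)   # the page is now garbage, not live data

ftl, stale = {0: 10, 1: 11, 2: 12}, set()
trim(ftl, stale, [0, 2])            # the OS deleted the file at LBAs 0 and 2
print(ftl, stale)                   # {1: 11} {10, 12}
```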

Limitations and dependencies of TRIM

The TRIM command also needs the support of the SSD. If the firmware in the SSD does not support the TRIM command, the LBAs received with the TRIM command will not be marked as invalid and the drive will continue to garbage collect the data, assuming it is still valid. Only when the OS saves new data into those LBAs will the SSD know to mark the original LBAs as invalid. SSD manufacturers that did not originally build TRIM support into their drives can either offer a firmware upgrade to the user, or provide a separate utility that extracts the information on the invalid data from the OS and separately TRIMs the SSD. The benefit is realized only after each run of that utility, though the user could set it up to run periodically in the background as an automatically scheduled task.

An SSD that supports the TRIM command will not necessarily be able to perform at top speed immediately afterwards. The space freed up by the TRIM command may be in random locations spread throughout the SSD, and it will take a number of passes of writing data and garbage collecting before those spaces are consolidated to show improved performance.

Even after the OS and SSD are configured to support the TRIM command, other conditions may prevent any benefit from TRIM. Databases and RAID systems, for example, are not yet TRIM-aware and consequently will not know how to pass that information on to the SSD. In those cases the SSD will continue to save and garbage collect those blocks until the OS uses those LBAs for new writes.

The actual benefit of the TRIM command depends upon the free user space on the SSD. If the user capacity on the SSD was 100 GB and the user actually saved 95 GB of data to the drive, any TRIM operation could not add more than 5 GB of free space for garbage collection and wear leveling. In those situations, increasing the amount of over-provisioning by 5 GB would give the SSD more consistent performance, because it would always have that additional free space without having to wait for the TRIM command from the OS.

Free user space

The SSD controller will use any free blocks on the SSD for garbage collection and wear leveling. The portion of the user capacity which is free from user data (either already TRIMed or never written in the first place) will look the same as over-provisioning space (until the user saves new data to the SSD). If the user only saves data consuming 1/2 of the total user capacity of the drive, the other half of the user capacity will look like additional over-provisioning (as long as the TRIM command is supported in the system).

Secure erase

The ATA Secure Erase command is designed to remove all user data from a drive. On an SSD without integrated encryption, this command will put the drive back to its original out-of-box state. This initially restores its performance to the highest possible level and the best (lowest) possible write amplification, but as soon as the drive starts garbage collecting again, the performance and write amplification will start returning to the former levels. Many tools use the ATA Secure Erase command to reset the drive and also provide a user interface. One free tool that is commonly referenced in the industry is called HDDErase.

Drives which encrypt all writes on the fly can implement ATA Secure Erase in another way: they simply zeroize and generate a new random encryption key each time a secure erase is done. In this way the old data can no longer be read, as it cannot be decrypted. Some drives with integrated encryption may require a TRIM command to be sent to the drive to put it back to its original out-of-box state.

Wear leveling

If a particular block were programmed and erased repeatedly without writing to any other blocks, that block would wear out before all the other blocks, thereby prematurely ending the life of the SSD. For this reason, SSD controllers use a technique called wear leveling to distribute writes as evenly as possible across all the Flash blocks in the SSD. In a perfect scenario, this would enable every block to be written to its maximum life so they all fail at the same time. Unfortunately, the process of distributing writes evenly requires that data previously written and not changing (cold data) be moved, so that data which changes more frequently (hot data) can be written into those blocks. Each time data are relocated without being changed by the host system, the write amplification increases and the life of the Flash memory is reduced. The key is to find an optimum algorithm that balances the evenness of wear against the extra writes needed to achieve it.
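
One common textbook policy (a hypothetical sketch, not the article's algorithm) is to steer each new write toward the free block with the lowest erase count:

```python
# Hypothetical sketch: steer writes toward the least-worn free block.
erase_counts = {0: 120, 1: 95, 2: 400, 3: 97}   # block id -> P/E cycles used
free_blocks = {1, 2, 3}

def pick_block_for_write() -> int:
    """Choose the free block with the fewest erases to even out wear."""
    return min(free_blocks, key=lambda b: erase_counts[b])

print(pick_block_for_write())   # block 1 (95 erases) absorbs the next writes
# A fuller policy would also move cold data INTO worn blocks, at the cost
# of extra writes -- the trade-off against write amplification noted above.
```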

Separating static and dynamic data

The separation of static and dynamic data to reduce write amplification is not a simple process for the SSD controller. The process requires the SSD controller to separate the LBAs with data which is constantly changing and requiring rewriting (dynamic data) from the LBAs with data which rarely changes and does not require any rewrites (static data). If the data is mixed in the same blocks, as with almost all systems today, any rewrites will require the SSD controller to garbage collect both the dynamic data (which caused the rewrite initially) and the static data (which did not require any rewrite). Any garbage collection of data that would not otherwise have required moving will increase write amplification. Separating the data therefore enables static data to stay at rest; if it never gets rewritten, it will have the lowest possible write amplification. The drawback is that the SSD controller must still find a way to wear level the static data, because blocks that never change will otherwise never be written to their maximum P/E cycles.
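
A simple way to picture the separation (a hypothetical sketch, not a controller's actual method) is to count how often each LBA is rewritten and route hot and cold LBAs to different open blocks:

```python
from collections import Counter

write_counts: Counter = Counter()   # rewrites seen per LBA
HOT_THRESHOLD = 4                   # hypothetical cut-off

def route_write(lba: int) -> str:
    """Send frequently rewritten LBAs to a 'hot' block, the rest to 'cold'."""
    write_counts[lba] += 1
    return "hot block" if write_counts[lba] >= HOT_THRESHOLD else "cold block"

for _ in range(5):
    dest = route_write(42)   # LBA 42 keeps changing and migrates to the hot block
print(dest)                  # hot block
# Keeping cold data together means whole blocks stay valid and never need
# their contents copied during garbage collection.
```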

Sequential writes

When an SSD is writing data sequentially, the write amplification is equal to one, meaning there is no write amplification. The reason is that, as the data is written, the entire block is filled sequentially with data related to the same file. If the OS determines that the file is to be replaced or deleted, the entire block can be marked as invalid, and there is no need to read parts of it to garbage collect and rewrite into another block. It will only need to be erased, which is much easier and faster than the read-modify-write process needed for randomly written data going through garbage collection.

Random writes

The peak random write performance of an SSD is driven by an abundance of free blocks, after the SSD has been completely garbage collected, secure erased, 100% TRIMed, or newly installed. The maximum speed will depend upon the number of parallel Flash channels connected to the SSD controller, the efficiency of the firmware, and the speed of the Flash memory in writing to a page. During this phase the write amplification will be the best it can ever be for random writes, approaching one. Once the blocks have all been written once, garbage collection will begin and the performance will be gated by the speed and efficiency of that process. Write amplification in this phase will increase to the highest levels the drive will experience.

Impact on performance

The overall performance of an SSD is dependent upon a number of factors, including write amplification. Writing to a Flash memory device takes longer than reading from it. An SSD generally uses multiple Flash memory components connected in parallel to increase performance. If the SSD has a high write amplification, the controller will be required to write that many more times to the Flash memory. This requires even more time to write the data from the host. An SSD with a low write amplification will not need to write as much data and can therefore be finished writing sooner than a drive with a high write amplification.

Product statements

In September 2008, Intel announced the X25-M SATA SSD with a reported WA as low as 1.1. In April 2009, SandForce announced the SF-1000 SSD Processor family with a reported WA of 0.5 which appears to come from some form of data compression. Before this announcement, a write amplification of 1.0 was considered the lowest that could be attained with an SSD. Currently, only SandForce employs compression in their SSD controller.
