Algorithmic efficiency - AbsoluteAstronomy.com

Computer science

Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, efficiency is used to describe properties of an algorithm

Algorithm

In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

relating to how much of various types of resources

Computational resource

In computational complexity theory, a computational resource is a resource used by some computational models in the solution of computational problems....

it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity

Productivity

Productivity is a measure of the efficiency of production. Productivity is a ratio of what is produced to what is required to produce it. Usually this ratio is in the form of an average, expressing the total output divided by the total input...

for a repeating or continuous process, where the goal is to reduce resource consumption, including time to completion, to some acceptable, optimal level.

No automatic process

Gregory Chaitin

Gregory John Chaitin is an Argentine-American mathematician and computer scientist.-Mathematics and computer science:Beginning in 2009 Chaitin has worked on metabiology, a field parallel to biology dealing with the random evolution of artificial software instead of natural software .Beginning in...

proved that compacting an algorithm cannot be automated by a generalized algorithm; rather, it can only be done heuristically, i.e. by exhaustive search (examples to be found at Busy beaver

Busy beaver

In computability theory, a busy beaver is a Turing machine that attains the maximum "operational busyness" among all the Turing machines in a certain class...

), trial and error, cleverness, insight, application of inductive reasoning

Inductive reasoning

Inductive reasoning, also known as induction or inductive logic, is a kind of reasoning that constructs or evaluates propositions that are abstractions of observations. It is commonly construed as a form of reasoning that makes generalizations based on individual instances...

, etc.

Software metrics

The two most frequently encountered and measurable metrics of an algorithm are:-

speed or running time - the time it takes for an algorithm to complete, and
'space' - the memory
Memory
In psychology, memory is an organism's ability to store, retain, and recall information and experiences. Traditional studies of memory began in the fields of philosophy, including techniques of artificially enhancing memory....

or 'non-volatile storage' used by the algorithm during its operation.

but also might apply to

transmission size - such as required bandwidth during normal operation or
size of external memory- such as temporary disk space used to accomplish its task

and perhaps even

the size of required 'longterm' disk space required after its operation to record its output or maintain its required function during its required useful lifetime (examples: a data table
Table (information)
A table is a means of arranging data in rows and columns.Production % of goalNorth 4087102%South 4093110% The use of tables is pervasive throughout all communication, research and data analysis. Tables appear in print media, handwritten notes, computer software, architectural...

, archive
File archiver
A file archiver is a computer program that combines a number of files together into one archive file, or a series of archive files, for easier transportation or storage...

or a computer log) and also
the performance per watt
Performance per watt
In computing, performance per watt is a measure of the energy efficiency of a particular computer architecture or computer hardware. Literally, it measures the rate of computation that can be delivered by a computer for every watt of power consumed....

and the total energy, consumed by the chosen hardware
Hardware
Hardware is a general term for equipment such as keys, locks, hinges, latches, handles, wire, chains, plumbing supplies, tools, utensils, cutlery and machine parts. Household hardware is typically sold in hardware stores....

implementation (with its System requirements
System requirements
To be used efficiently, all computer software needs certain hardware components or other software resources to be present on a computer. These pre-requisites are known as system requirements and are often used as a guideline as opposed to an absolute rule. Most software defines two sets of system...

, necessary auxiliary support systems including interfaces, cabling, switching, cooling
Computer cooling
Computer cooling is required to remove the waste heat produced by computer components, to keep components within their safe operating temperature limits.Various cooling methods help to improve processor performance or reduce the noise of cooling fans....

and security
Computer security
Computer security is a branch of computer technology known as information security as applied to computers and networks. The objective of computer security includes protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to...

), during its required useful lifetime. See Total cost of ownership
Total cost of ownership
Total cost of ownership is a financial estimate whose purpose is to help consumers and enterprise managers determine direct and indirect costs of a product or system...

for other potential costs that might be associated with any particular implementation.

(An extreme example of these metrics might be to consider their values in relation to a repeated simple algorithm for calculating and storing (π+n) to 50 decimal places running for say, 24 hours, on a processor operating in its own purpose-built heated or air conditioned unit.)

The process of making code more efficient is known as optimization

Optimization (computer science)

In computer science, program optimization or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources...

and in the case of automatic optimization (i.e. compiler optimization

Compiler optimization

Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied...

- performed by compilers on request or by default), usually focus on space at the cost of speed, or vice versa. There are also quite simple programming techniques and 'avoidance strategies' that can actually improve both at the same time, usually irrespective of hardware, software or language. Even the re-ordering of nested

Nested function

In computer programming, a nested function is a function which is lexically encapsulated within another function. It can only be called by the enclosing function or by functions directly or indirectly nested within the same enclosing function. In other words, the scope of the nested function is...

conditional statements to put the least frequently occurring condition first can reduce actual instruction path length

Instruction path length

In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performance on a particular computer hardware...

. For example, a condition might test patients for (age > 18) before testing (blood type

Blood type

A blood type is a classification of blood based on the presence or absence of inherited antigenic substances on the surface of red blood cells . These antigens may be proteins, carbohydrates, glycoproteins, or glycolipids, depending on the blood group system...

'AB-') because this type of blood occurs in only about 1 in 100 of the population. This would eliminate the second test at runtime in 99% of instances; something an optimizing compiler would almost certainly not be aware of - but which a programmer can research relatively easily even without specialist medical knowledge.
History
The first machines that were capable of computation were severely limited by purely mechanical considerations. As later electronic machines were developed they were, in turn, limited by the speed of their electronic counterparts. As software replaced hard-wired circuits, the efficiency of algorithms also became important. It has long been recognized that the precise 'arrangement of processes' is critical in reducing elapse time.

"In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selections amongst them for the purposes of a calculating engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation"

Ada Lovelace

Augusta Ada King, Countess of Lovelace , born Augusta Ada Byron, was an English writer chiefly known for her work on Charles Babbage's early mechanical general-purpose computer, the analytical engine...

1815-1852, generally considered as 'the first programmer' who worked on Charles Babbage

Charles Babbage

Charles Babbage, FRS was an English mathematician, philosopher, inventor and mechanical engineer who originated the concept of a programmable computer...

's early mechanical general-purpose computer

"In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"

Extract from "Structured Programming with go to Statements" by Donald Knuth

Donald Knuth

Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

, renowned computer scientist

Computer science

and Professor Emeritus

Emeritus

Emeritus is a post-positive adjective that is used to designate a retired professor, bishop, or other professional or as a title. The female equivalent emerita is also sometimes used.-History:...

of the Art of Computer Programming at Stanford University

Stanford University

The Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is a private research university on an campus located near Palo Alto, California. It is situated in the northwestern Santa Clara Valley on the San Francisco Peninsula, approximately northwest of San...

"The key to performance is elegance, not battalions of special cases"

attributed to Jon Bentley

Jon Bentley

Jon Louis Bentley is a researcher in the field of computer science. He is credited with the invention of the k-d tree....

and (Malcolm) Douglas McIlroy

Douglas McIlroy

Malcolm Douglas McIlroy is a mathematician, engineer, and programmer. As of 2007 he is an Adjunct Professor of Computer Science at Dartmouth College. Dr...

Speed
The absolute speed of an algorithm for a given input can simply be measured as the duration of execution (or clock time) and the results can be averaged over several executions to eliminate possible random effects. Most modern processors operate in a multi-processing & multi-programming environment so consideration must be made for parallel processes occurring on the same physical machine, eliminating these as far as possible. For superscalar

Superscalar

A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate...

processors, the speed of a given algorithm can sometimes be improved through instruction-level parallelism

Instruction level parallelism

Instruction-level parallelism is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program: 1. e = a + b 2. f = c + d 3. g = e * f...

within a single processor (but, for optimal results, the algorithm may require some adaptation to this environment to gain significant advantage ('speedup'), becoming, in effect, an entirely different algorithm). A relative measure of an algorithms performance can sometimes be gained from the total instruction path length

Instruction path length

which can be determined by a run time Instruction Set Simulator

Instruction Set Simulator

An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...

(where available).

An estimate of the speed of an algorithm can be determined in various ways. The most common method uses time complexity

Time complexity

In computer science, the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the size of the input to the problem. The time complexity of an algorithm is commonly expressed using big O notation, which suppresses multiplicative constants and...

to determine the Big-O

Big O notation

In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

of an algorithm. See Run-time analysis for estimating how fast a particular algorithm may be according to its type (example: lookup unsorted list, lookup sorted list etc.) and in terms of scalability

Scalability

In electronics scalability is the ability of a system, network, or process, to handle growing amount of work in a graceful manner or its ability to be enlarged to accommodate that growth...

- its dependence on 'size of input', processor power and other factors.
Memory
Often, it is possible to make an algorithm faster at the expense of memory. This might be the case whenever the result of an 'expensive' calculation is cache

Cache

In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere...

d rather than recalculating it afresh each time. The additional memory requirement would, in this case, be considered additional overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

although, in many situations, the stored result occupies very little extra space and can often be held in pre-compiled static

Static memory allocation

Static memory allocation refers to the process of allocating memory at compile-time before the associated program is executed, unlike dynamic memory allocation or automatic memory allocation where memory is allocated as required at run-time....

storage, reducing not just processing time but also allocation & deallocation of working memory. This is a very common method of improving speed, so much so that some programming languages often add special features to support it, such as C++

C++

C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

's 'mutable' keyword.

The memory requirement of an algorithm is actually two separate but related things:-

The memory taken up by the compiled executable code (the object code
Object code
Object code, or sometimes object module, is what a computer compiler produces. In a general sense object code is a sequence of statements in a computer language, usually a machine code language....

or binary file
Binary file
A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

) itself (on disk or equivalent, depending on the hardware and language). This can often be reduced by preferring run-time decision making mechanisms (such as virtual function
Virtual function
In object-oriented programming, a virtual function or virtual method is a function or method whose behaviour can be overridden within an inheriting class by a function with the same signature...

s and run-time type information
Run-time type information
In programming, RTTI refers to a C++ system that makes information about an object's data type available at runtime. Run-time type information can apply to simple data types, such as integers and characters, or to generic objects...

) over certain compile-time decision making mechanisms (such as macro substitution and templates
Template (programming)
Templates are a feature of the C++ programming language that allow functions and classes to operate with generic types. This allows a function or class to work on many different data types without being rewritten for each one....

). This, however, comes at the cost of speed.

Amount of temporary "dynamic memory" allocated during processing. For example, dynamically pre-caching results, as mentioned earlier, improves speed at the cost of this attribute. Even the depth of sub-routine calls can impact heavily on this cost and increase path length too, especially if there are 'heavy' dynamic memory requirements for the particular functions invoked. The use of copied function parameters (rather than simply using pointers to earlier, already defined, and sometimes static values) actually doubles the memory requirement for this particular memory metric (as well as carrying its own processing overhead
Computational overhead
In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

for the copying itself. This can be particularly relevant for quite 'lengthy' parameters such as html
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

script, JavaScript
JavaScript
JavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....

source programs or extensive freeform text such as letters or email
Email
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

s.

(See also sections Choice of instruction or data type and Avoiding costs concerning deliberate 'stretching' of data structures to force them to have a fixed length which then becomes a multiple of 2, 4, 8 etc.)

Rematerialization

It has been argued that Rematerialization

Rematerialization

Rematerialization or remat is a compiler optimization which saves time by recomputing a value instead of loading it from memory. It is typically tightly integrated with register allocation, where it is used as an alternative to spilling registers to memory. It was conceived by Preston Briggs, Keith D...

(re-calculating) may occasionally be more efficient than holding results in cache. This is the somewhat non-intuitive belief that it can be faster to re-calculate from the input - even if the answer is already known - when it can be shown, in some special cases, to decrease "register pressure". Some optimizing compilers have the ability to decide when this is considered worthwhile based on a number of criteria such as complexity and no side effects

Side effect (computer science)

In computer science, a function or expression is said to have a side effect if, in addition to returning a value, it also modifies some state or has an observable interaction with calling functions or the outside world...

, and works by keeping track of the expression used to compute each variable, using the concept of available expression

Available expression

In the field of compiler optimizations, available expressions is an analysis algorithm that determines for each point in the program the set of expressions that need not be recomputed. Those expressions are said to be available at such a point...

s.

This is most likely to be true when a calculation is very fast (such as addition or bitwise operations), while the amount of data which must be cached would be very large, resulting in inefficient storage. Small amounts of data can be stored very efficiently in registers or fast cache, while in most contemporary computers large amounts of data must be stored in slower memory or even slower hard drive storage, and thus the efficiency of storing data which can be computed quickly drops significantly.

Precomputation

Precomputing a complete range of results prior to compiling, or at the beginning of an algorithm's execution, can often increase algorithmic efficiency substantially. This becomes advantageous when one or more inputs is constrained to a small enough range that the results can be stored in a reasonably sized block of memory. Because memory access is essentially constant in time complexity (except for caching

Cache

delays), any algorithm with a component which has worse than constant efficiency over a small input range can be improved by precomputing values. In some cases efficient approximation algorithms can be obtained by computing a discrete

Discrete mathematics

Discrete mathematics is the study of mathematical structures that are fundamentally discrete rather than continuous. In contrast to real numbers that have the property of varying "smoothly", the objects studied in discrete mathematics – such as integers, graphs, and statements in logic – do not...

subset of values and interpolating for intermediate input values, since interpolation is also a linear operation.
Transmission size
Data compression

Data compression

In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

algorithms can be useful because they help reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. This however also comes at a cost - which is additional processing time to compress and subsequently decompress. Depending upon the speed of the data transfer, compression may reduce overall response times which, ultimately, equates to speed - even though processing within the computer itself takes longer. For audio, MP3

MP3

MPEG-1 or MPEG-2 Audio Layer III, more commonly referred to as MP3, is a patented digital audio encoding format using a form of lossy data compression...

is a compression method used extensively in portable sound systems. The efficiency of a data compression algorithm relates to the compression factor and speed of achieving both compression and decompression. For the purpose of archiving an extensive database

Database

A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

, it might be considered worthwhile to achieve a very high compression ratio, since decompression is less likely to occur on the majority of the data.
Data presentation
Output data can be presented to the end user in many ways - such as via punched tape or card, digital displays, local display monitors, remote computer monitors or printed. Each of these has its own inherent initial cost and, in some cases, an ongoing cost (e.g. refreshing

Refresh rate

The refresh rate is the number of times in a second that a display hardware draws the data...

an image from memory) . As an example, the web site "Google

Google

Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...

" recently showed, as its logo, an image of the Vancouver

Vancouver

Vancouver is a coastal seaport city on the mainland of British Columbia, Canada. It is the hub of Greater Vancouver, which, with over 2.3 million residents, is the third most populous metropolitan area in the country,...

olympics that is around 8K of gif

GIF

The Graphics Interchange Format is a bitmap image format that was introduced by CompuServe in 1987 and has since come into widespread usage on the World Wide Web due to its wide support and portability....

image. The normally displayed Google image is a PNG image of 28K (or 48K), yet the raw text string for "Google" occupies only 6 octet

Octet (computing)

An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

s or 48 bits (4,778 or 8192 times less). This graphically illustrates that how data is presented can significantly effect the overall efficiency of transmission (and also the complete algorithm - since both GIF and PNG images require yet more processing).

It is estimated by "Internet World Stats" that there were 1,733,993,741 internet users in 2009 and, to transmit this new image to each one of them, would require around 136,000 billion (10⁹)octets of data to be transmitted - at least once - into their personal web cache

Web cache

A web cache is a mechanism for the temporary storage of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag...

. In "Computational Energy Cost of TCP", co-authors Bokyung Wang and Suresh Singh examine the energy costs for TCP

Internet protocol suite

The Internet protocol suite is the set of communications protocols used for the Internet and other similar networks. It is commonly known as TCP/IP from its most important protocols: Transmission Control Protocol and Internet Protocol , which were the first networking protocols defined in this...

and calculated, for their chosen example, a cost of 0.022 Joule

Joule

The joule ; symbol J) is a derived unit of energy or work in the International System of Units. It is equal to the energy expended in applying a force of one newton through a distance of one metre , or in passing an electric current of one ampere through a resistance of one ohm for one second...

s per packet (of approx 1489 octets). On this basis, a total of around 2,000,000,000 joules (2 GJ) of energy

Energy

In physics, energy is an indirectly observed quantity. It is often understood as the ability a physical system has to do work on other physical systems...

might be expended by TCP elements alone to display the new logo for all users for the first time. To maintain or re-display this image requires still more processing and consequential energy cost (in contrast to printed output for instance).
Encoding and decoding methods (compared and contrasted)
When data is encoded for any 'external' use, it is possible to do so in an almost unlimited variety of different formats that are sometimes conflicting. This content encoding

Content format

A content format is an encoded format for converting a specific type of data to displayable information. Content formats are used in recording and transmission to prepare data for observation or interpretation. This includes both analog and digitized content...

(of the raw data

Raw data

'\putang inaIn computing, it may have the following attributes: possibly containing errors, not validated; in sfferent formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January...

) may be designed for:

optimal readability – by humans
optimal decoding speed – by other computer programs
optimal compression – for archiving or data transmission
Data transmission
Data transmission, digital transmission, or digital communications is the physical transfer of data over a point-to-point or point-to-multipoint communication channel. Examples of such channels are copper wires, optical fibres, wireless communication channels, and storage media...
optimal compatibility – with "legacy
Legacy system
A legacy system is an old method, technology, computer system, or application program that continues to be used, typically because it still functions for the users' needs, even though newer technology or more efficient methods of performing a task are now available...

" or other existing formats or programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s
optimal security – using encryption
Encryption
In cryptography, encryption is the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of the process is encrypted information...

(For character level encoding

Character encoding

A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the transmission of data through telecommunication networks or storage of text in...

, see the various encoding techniques such as EBCDIC

EBCDIC

Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

or ASCII

ASCII

The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

)

It is unlikely that all of these goals could be met with a single 'generic' encoding scheme and so a compromise will often be the desired goal and will often be compromised by the need for standardization and/or legacy and compatibility issues.

Encoding efficiently

For data encoding whose destination is to be input for further computer processing, readability is not an issue – since the receiving processors algorithm can decode the data to any other desirable form including human readable. From the perspective of algorithmic efficiency, minimizing subsequent decoding (with zero or minimal parsing) should take highest priority. The general rule of thumb is that any encoding system that 'understands' the underlying data structure - sufficiently to encode it in the first place - should be equally capable of easily encoding it in such a way that makes decoding it highly efficient. For variable length data with possibly omitted data values, for instance, this almost certainly means the utilization of declarative notation (i.e. including the length of the data item as a prefix to the data so that a de-limiter is not required and parsing completely eliminated). For keyword data items, tokenizing the key to an index (integer) after its first occurrence not only reduces subsequent data size but, furthermore, reduces future decoding overhead for the same items that are repeated.

For more 'generic' encoding for efficient data compression see Arithmetic encoding and entropy encoding

Entropy encoding

In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium....

articles.

Historically, optimal encoding was not only worthwhile from an efficiency standpoint but was also common practise to conserve valuable memory, external storage and processor resources. Once validated a country name for example could be held as a shorter sequential country code

Country code

Country codes are short alphabetic or numeric geographical codes developed to represent countries and dependent areas, for use in data processing and communications. Several different systems have been developed to do this. The best known of these is ISO 3166-1...

which could then also act as an index for subsequent 'decoding', using this code as an entry number within a table or record number within a file. If the table or file contained fixed length entries, the code could easily be converted to an absolute memory address or disk address for fast retrieval. The ISBN system for identifying books is a good example of a practical encoding method which also contains a built-in hierarchy

Hierarchy

A hierarchy is an arrangement of items in which the items are represented as being "above," "below," or "at the same level as" one another...

.

According to recent articles in New Scientist

New Scientist

New Scientist is a weekly non-peer-reviewed English-language international science magazine, which since 1996 has also run a website, covering recent developments in science and technology for a general audience. Founded in 1956, it is published by Reed Business Information Ltd, a subsidiary of...

and Scientific American

Scientific American

Scientific American is a popular science magazine. It is notable for its long history of presenting science monthly to an educated but not necessarily scientific public, through its careful attention to the clarity of its text as well as the quality of its specially commissioned color graphics...

;
"TODAY'S telecommunications networks could use one ten-thousandth of the power they presently consume if smarter data-coding techniques were used", according to Bell Labs

Bell Labs

Bell Laboratories is the research and development subsidiary of the French-owned Alcatel-Lucent and previously of the American Telephone & Telegraph Company , half-owned through its Western Electric manufacturing subsidiary.Bell Laboratories operates its...

, based in Murray Hill, New Jersey

Murray Hill, New Jersey

Murray Hill is an unincorporated area within portions of both Berkeley Heights and New Providence, located in Union County in northern New Jersey, United States....

It recognizes that this is only a theoretical limit but nevertheless sets itself a more realistic, practical goal of a 1,000 fold reduction within 5 years with future, as yet unidentified, technological changes.

Examples of several common encoding methods

Comma separated values (CSV
Comma-separated values
A comma-separated values file stores tabular data in plain-text form. As a result, such a file is easily human-readable ....

- a list of data values separated by commas)
Tab separated values (TSV) - a list of data values separated by 'tab
Tab key
Tab key on a keyboard is used to advance the cursor to the next tab stop.- Origin :The word tab derives from the word tabulate, which means "to arrange data in a tabular, or table, form"...

' characters
HyperText Markup Language (HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

) - the predominant markup language
Markup language
A markup language is a modern system for annotating a text in a way that is syntactically distinguishable from that text. The idea and terminology evolved from the "marking up" of manuscripts, i.e. the revision instructions by editors, traditionally written with a blue pencil on authors' manuscripts...

for web page
Web page
A web page or webpage is a document or information resource that is suitable for the World Wide Web and can be accessed through a web browser and displayed on a monitor or mobile device. This information is usually in HTML or XHTML format, and may provide navigation to other web pages via hypertext...

s
Extensible Markup Language (XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

) - a generic framework for storing any amount of text or any data whose structure can be represented as a tree with at least one element - the root element.
JavaScript Object Notation (JSON
JSON
JSON , or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects...

) - human-readable format for representing simple data structures

The last of these, (JSON) is apparently widely used for internet

Internet

The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

data transmission

Data transmission

Data transmission, digital transmission, or digital communications is the physical transfer of data over a point-to-point or point-to-multipoint communication channel. Examples of such channels are copper wires, optical fibres, wireless communication channels, and storage media...

, primarily it seems because the data can be uploaded by a single JavaScript

JavaScript

JavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....

'eval

Eval

In some programming languages, eval is a function which evaluates a string as though it were an expression and returns a result; in others, it executes multiple lines of code as though they had been included instead of the line including the eval...

' statement - without the need to produce what otherwise would likely have been a more efficient purpose built encoder/decoder. There are in fact quite large amounts of repeated (and therefore redundant data

Redundancy (engineering)

In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the case of a backup or fail-safe....

) in this particular format, and also in HTML and XML source, that could quite easily be eliminated.
XML is recognized as a verbose form of encoding. Binary XML

Binary XML

Binary XML refers to any specification which defines the compact representation of XML in a binary format. While there are several competing formats, none has been widely adopted by a standards organization or accepted as a de facto standard...

has been put forward as one method of reducing transfer and processing times for XML and, while there are several competing formats, none has been widely adopted by a standards organization or accepted as a de facto standard. It has also been criticized by Jimmy Zhang for essentially trying to solve the wrong problem

There are a number of freely available products on the market that partially compress HTML files and perform some or all of the following:

merge lines
remove unnecessary whitespace
Whitespace (computer science)
In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page...

characters
remove unnecessary quotation marks. For example, BORDER="0" will be replaced with BORDER=0)
replace some tags with shorter ones (e.g. replace STRIKE tags with S, STRONG with B and EM with I)
remove HTML comments (comments within scripts and styles are not removed)
remove tags
remove named meta tag
Meta element
Meta elements are the HTML or XHTML <meta … > element used to provide structured metadata about a Web page. Multiple elements are often used on the same page: the element is the same, but its attributes are different...

s

The effect of this limited form of compression is to make the HTML code smaller and faster to load, but more difficult to read manually (so the original HTML code is usually retained for updating), but since it is predominantly meant to be processed only by a browser

Web browser

A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier and may be a web page, image, video, or other piece of content...

, this causes no problems. Despite these small improvements, HTML, which is the predominant language for the web still remains a predominantly source distributed, interpreted markup language, with high redundancy.

Kolmogorov complexity

The study of encoding techniques has been examined in depth in an area of computer science characterized by a method known as Kolmogorov complexity

Kolmogorov complexity

In algorithmic information theory , the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object...

where a value known as ('K') is accepted as 'not a computable function'. The Kolmogorov complexity of any computable object is the length of the shortest program that computes it and then halts. The invariance theorem

Invariance theorem

In algorithmic information theory, the invariance theorem, originally proved by Ray Solomonoff, states that a universal Turing machine provides an optimal means of description, up to an additive constant...

shows that it is not really important which computer is used. Essentially this implies that there is no automated method that can produce an optimum result and is therefore characterized by a requirement for human ingenuity

Ingenuity

Ingenuity refers to the process of applying ideas to solve problems or meet challenges. The process of figuring out how to cross a mountain stream using a fallen log, build an airplane from a sheet of paper, or start a new company in a foreign culture all involve the exercising of ingenuity...

or Innovation

Innovation

Innovation is the creation of better or more effective products, processes, technologies, or ideas that are accepted by markets, governments, and society...

. See also Algorithmic probability

Algorithmic probability

In algorithmic information theory, algorithmic probability is a method of assigning a probability to each hypothesis that explains a given observation, with the simplest hypothesis having the highest probability and the increasingly complex hypotheses receiving increasingly small probabilities...

.
Effect of programming paradigms

The effect that different programming paradigm

Programming paradigm

A programming paradigm is a fundamental style of computer programming. Paradigms differ in the concepts and abstractions used to represent the elements of a program and the steps that compose a computation A programming paradigm is a fundamental style of computer programming. (Compare with a...

s have on algorithmic efficiency is fiercely contested, with both supporters and antagonist

Antagonist

An antagonist is a character, group of characters, or institution, that represents the opposition against which the protagonist must contend...

s for each new paradigm. Strong supporters of structured programming

Structured programming

Structured programming is a programming paradigm aimed on improving the clarity, quality, and development time of a computer program by making extensive use of subroutines, block structures and for and while loops - in contrast to using simple tests and jumps such as the goto statement which could...

, such as Dijkstra for instance, who favour entirely goto-less

Structured programming

programs are met with conflicting evidence that appears to nullify its supposed benefits.
The truth is, even if the structured code itself contains no goto

Goto

goto is a statement found in many computer programming languages. It is a combination of the English words go and to. It performs a one-way transfer of control to another line of code; in contrast a function call normally returns control...

s, the optimizing compiler that creates the binary code

Binary code

A binary code is a way of representing text or computer processor instructions by the use of the binary number system's two-binary digits 0 and 1. This is accomplished by assigning a bit string to each particular symbol or instruction...

almost certainly generates them (and not necessarily in the most efficient way).
Similarly, OOP

Object-oriented programming

Object-oriented programming is a programming paradigm using "objects" – data structures consisting of data fields and methods together with their interactions – to design applications and computer programs. Programming techniques may include features such as data abstraction,...

protagonist

Protagonist

A protagonist is the main character of a literary, theatrical, cinematic, or musical narrative, around whom the events of the narrative's plot revolve and with whom the audience is intended to most identify...

s who claim their paradigm is superior are met with opposition from strong sceptics such as Alexander Stepanov

Alexander Stepanov

Alexander Alexandrovich Stepanov is the primary designer and implementer of the C++ Standard Template Library, which he started to develop around 1992 while employed at HP Labs...

who suggested that OOP provides a mathematically limited viewpoint and called it, "almost as much of a hoax as Artificial Intelligence"
In the long term, benchmark

Benchmark (computing)

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it...

s, using real-life examples, provide the only real hope of resolving such conflicts - at least in terms of run-time efficiency.
Optimization techniques
The word optimize is normally used in relation to an existing algorithm/computer program (i.e. to improve upon completed code). In this section it is used both in the context of existing programs and also in the design and implementation of new algorithms, thereby avoiding the most common performance pitfalls. It is clearly wasteful to produce a working program - at first using an algorithm that ignores all efficiency issues - only to then have to redesign or rewrite sections of it if found to offer poor performance. Optimization can be broadly categorized into two domains:-

Environment specific - that are essentially worthwhile only on certain platforms
Platform (computing)
A computing platform includes some sort of hardware architecture and a software framework , where the combination allows software, particularly application software, to run...

or particular computer languages
General techniques - that apply irrespective of platform

Environment specific

Optimization of algorithms frequently depends on the properties of the machine the algorithm will be executed on as well as the language the algorithm is written in and chosen data type

Data type

In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...

s. For example, a programmer might optimize code for time efficiency in an application for home computers (with sizable amounts of memory), but for code destined to be embedded in small, "memory-tight" devices, the programmer may have to accept that it will run more slowly, simply because of the restricted memory available for any potential software optimization.

For a discussion of hardware

Hardware

Hardware is a general term for equipment such as keys, locks, hinges, latches, handles, wire, chains, plumbing supplies, tools, utensils, cutlery and machine parts. Household hardware is typically sold in hardware stores....

performance, see article on Computer performance

Computer performance

Computer performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.Depending on the context, good computer performance may involve one or more of the following:...

which covers such things as CPU clock speed, cycles per instruction and other relevant metrics

Metric (mathematics)

In mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set but not all topologies can be generated by a metric...

. For a discussion on how the choice of particular instructions available on a specific machine effect efficiency, see later section 'Choice of instruction and data type'.

General techniques

Linear search such as unsorted table look-ups in particular can be very expensive in terms of execution time but can be reduced significantly through use of efficient techniques such as indexed arrays and binary searches. Using a simple linear search on first occurrence and using a cached result thereafter is an obvious compromise.
Use of index
Index (information technology)
In computer science, an index can be:# an integer that identifies an array element# a data structure that enables sublinear-time lookup -Array element identifier:...

ed program branching, utilizing branch table
Branch table
In computer programming, a branch table is a term used to describe an efficient method of transferring program control to another part of a program using a table of branch instructions. It is a form of multiway branch...

s or "threaded code"
Threaded code
In computer science, the term threaded code refers to a compiler implementation technique where the generated code has a form that essentially consists entirely of calls to subroutines...

to control program flow, (rather than using multiple conditional IF statements or unoptimized CASE/SWITCH
Switch statement
In computer programming, a switch, case, select or inspect statement is a type of selection control mechanism that exists in most imperative programming languages such as Pascal, Ada, C/C++, C#, Java, and so on. It is also included in several other types of languages...

) can drastically reduce instruction path length
Instruction path length
In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performance on a particular computer hardware...

, simultaneously reduce program size
Binary file
A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

and even also make a program easier to read and more easily maintainable (in effect it becomes a 'decision table
Decision table
Decision tables are a precise yet compact way to model complicated logic.Decision tables, like flowcharts and if-then-else and switch-case statements, associate conditions with actions to perform, but in many cases do so in a more elegant way....

' rather than repetitive spaghetti code
Spaghetti code
Spaghetti code is a pejorative term for source code that has a complex and tangled control structure, especially one using many GOTOs, exceptions, threads, or other "unstructured" branching constructs. It is named such because program flow tends to look like a bowl of spaghetti, i.e. twisted and...

).
Loop unrolling performed manually, or more usually by an optimizing compiler, can provide significant savings in some instances. By processing 'blocks' of several array elements at a time, individually addressed, (for example, within a While loop), much pointer arithmetic and end of loop testing can be eliminated, resulting in decreased instruction path lengths. Other Loop optimization
Loop optimization
In compiler theory, loop optimization plays an important role in improving cache performance, making effective use of parallel processing capabilities, and reducing overheads associated with executing loops. Most execution time of a scientific program is spent on loops...

s are also possible.

Tunnel vision

There are many techniques for improving algorithms, but focusing on a single favorite technique can lead to a "tunnel vision

Tunnel vision

Tunnel vision is the loss of peripheral vision with retention of central vision, resulting in a constricted circular tunnel-like field of vision.- Medical / biological causes :Tunnel vision can be caused by:...

" mentality. For example, in this X86 assembly example, the author offers loop unrolling as a reasonable technique that provides some 40% improvements to his chosen example. However, the same example would benefit significantly from both inlining and use of a trivial hash function. If they were implemented, either as alternative or complementary techniques, an even greater percentage gain might be expected. A combination of optimizations may provide ever increasing speed, but selection of the most easily implemented and most effective technique, from a large repertoire of such techniques, is desirable as a starting point.

Dependency trees and spreadsheets

Spreadsheet

A spreadsheet is a computer application that simulates a paper accounting worksheet. It displays multiple cells usually in a two-dimensional matrix or grid consisting of rows and columns. Each cell contains alphanumeric text, numeric values or formulas...

s are a 'special case' of algorithms that self-optimize by virtue of their dependency trees that are inherent in the design of spreadsheets to reduce re-calculations when a cell changes. The results of earlier calculations are effectively cached within the workbook and only updated if another cells changed value effects it directly.

Table lookup

Table lookups

Lookup table

In computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than...

can make many algorithms more efficient, particularly when used to bypass computations with a high time complexity. However, if a wide input range is required, they can consume significant storage resources. In cases with a sparse valid input set, hash function

Hash function

A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

s can be used to provide more efficient lookup access than a full table.

Hash function algorithms

A hash function is any well-defined procedure

Algorithm

or mathematical function

Function (mathematics)

In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...

which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer

Integer

The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

that may serve as an index to an array

Array data type

In computer science, an array type is a data type that is meant to describe a collection of elements , each selected by one or more indices that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array...

. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

Hash functions are frequently used to speed up table lookups. The choice of a hashing function (to avoid a linear

Linear search

In computer science, linear search or sequential search is a method for finding a particular value in a list, that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found....

or brute force search) depends critically on the nature of the input data, and their probability distribution

Probability distribution

In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

in the intended application.

Trivial hash function

Sometimes if the datum is small enough, a "trivial hash function" can be used to effectively provide constant time searches at almost zero cost. This is particularly relevant for single byte lookups (e.g. ASCII

ASCII

or EBCDIC

EBCDIC

Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

characters)

Searching strings

Searching for particular text string

String (computer science)

In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

s (for instance "tags" or keywords) in long sequences of characters potentially generates lengthy instruction path

Instruction path length

s. This includes searching for delimiter

Delimiter

A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.Delimiters represent...

s in comma separated

Comma-separated values

A comma-separated values file stores tabular data in plain-text form. As a result, such a file is easily human-readable ....

files or similar processing which can be very simply and effectively eliminated (using declarative notation for instance).

Several methods of reducing the cost for general searching have been examined and the "Boyer–Moore string search algorithm

Boyer–Moore string search algorithm

The Boyer–Moore string search algorithm is a particularly efficient string searching algorithm, and it has been the standard benchmark for the practical string search literature. It was developed by Bob Boyer and J Strother Moore in 1977...

" (or Boyer–Moore–Horspool algorithm

Boyer–Moore–Horspool algorithm

In computer science, the Boyer–Moore–Horspool algorithm or Horspool's algorithm is an algorithm for finding substrings in strings. It was published by Nigel Horspool in 1980....

, a similar but modified version) is one solution that has been proven to give superior results to repetitive comparisons of the entire search string along the sequence.

Hot spot analyzers

Special system software products known as "performance analyzers" are often available from suppliers to help diagnose "hot spot

Hot spot (computer science)

A hot spot in computer science is most usually defined as a region of a computer program where a high proportion of executed instructions occur or where most time is spent during the program's execution .If a program is stopped randomly, the program counter...

s" - during actual execution of computer programs - using real or test data - they perform a Performance analysis

Performance analysis

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the usage of memory, the usage of particular instructions, or frequency and duration of function calls...

under generally repeatable conditions. They can pinpoint sections of the program that might benefit from specifically targeted programmer optimization without necessarily spending time optimizing the rest of the code. Using program re-runs, a measure of relative improvement can then be determined to decide if the optimization was successful and by what amount. Instruction Set Simulator

Instruction Set Simulator

s can be used as an alternative to measure the instruction path length

Instruction path length

at the machine code

Machine code

Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...

level between selected execution paths, or on the entire execution.

Regardless of the type of tool used, the quantitative values obtained can be used in combination with anticipated reductions (for the targeted code) to estimate a relative or absolute overall saving. For example if 50% of the total execution time (or path length) is absorbed in a subroutine

Subroutine

In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

whose speed can be doubled by programmer optimization, an overall saving of around 25% might be expected (Amdahl law).

Efforts have been made at the University of California

University of California

The University of California is a public university system in the U.S. state of California. Under the California Master Plan for Higher Education, the University of California is a part of the state's three-tier public higher education system, which also includes the California State University...

, Irvine

Irvine, California

Irvine is a suburban incorporated city in Orange County, California, United States. It is a planned city, mainly developed by the Irvine Company since the 1960s. Formally incorporated on December 28, 1971, the city has a population of 212,375 as of the 2010 census. However, the California...

to produce dynamic executable code using a combination of hot spot analysis and run-time program trace

Tracing (software)

In software engineering, tracing is a specialized use of logging to record information about a program's execution. This information is typically used by programmers for debugging purposes, and additionally, depending on the type and detail of information contained in a trace log, by experienced...

tree. A JIT

Just-in-time compilation

In computing, just-in-time compilation , also known as dynamic translation, is a method to improve the runtime performance of computer programs. Historically, computer programs had two modes of runtime operation, either interpreted or static compilation...

like dynamic compiler was built by Andreas Gal and others, "in which relevant (i.e., frequently executed) control flows are ...discovered lazily during execution"

Benchmarking & competitive algorithms

For new versions of software or to provide comparisons with competitive systems, benchmark

Benchmark (computing)

s are sometimes used which assist with gauging an algorithms relative performance. If a new sort

Sorting algorithm

In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...

algorithm is produced for example it can be compared with its predecessors to ensure that at least it is efficient as before with known data - taking into consideration any functional improvements.
Benchmarks can be used by customers when comparing various products from alternative suppliers to estimate which product will best suit their specific requirements in terms of functionality and performance. For example in the mainframe

Mainframe computer

Mainframes are powerful computers used primarily by corporate and governmental organizations for critical applications, bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.The term originally referred to the...

world certain proprietary sort

Mainframe sort merge

The Sort/Merge utility is a mainframe program to sort records in a file into a specified order, merge pre-sorted files into a sorted file, or copy selected records...

products from independent software companies such as Syncsort

Syncsort

Syncsort Incorporated is a software company specializing in high speed sorting products, as well as data integration and backup software and services, for Windows, Unix, Linux, and mainframe systems. According to its website, Syncsort products are used by over 90 of the Fortune 100 companies and...

compete with products from the major suppliers such as IBM

IBM

International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...

for speed.
Some benchmarks provide opportunities for producing an analysis comparing the relative speed of various compiled and interpreted languages for example
and The Computer Language Benchmarks Game compares the performance of implementations of typical programming problems in several programming languages.
(Even creating "do it yourself

Do it yourself

Do it yourself is a term used to describe building, modifying, or repairing of something without the aid of experts or professionals...

" benchmarks to get at least some appreciation of the relative performance of different programming languages, using a variety of user specified criteria, is quite simple to produce as this "Nine language Performance roundup" by Christopher W. Cowell-Shah demonstrates by example)

Compiled versus interpreted languages

A compiled algorithm will, in general, execute faster than the equivalent interpreted

Interpreter (computing)

In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

algorithm simply because some processing is required even at run time to 'understand' (i.e. interpret) the instructions to effect an execution. A compiled program will normally output an object

Object code

Object code, or sometimes object module, is what a computer compiler produces. In a general sense object code is a sequence of statements in a computer language, usually a machine code language....

or machine code

Machine code

equivalent of the algorithm that has already been processed by the compiler into a form more readily executed by microcode

Microcode

Microcode is a layer of hardware-level instructions and/or data structures involved in the implementation of higher level machine code instructions in many computers and other processors; it resides in special high-speed memory and translates machine instructions into sequences of detailed...

or the hardware directly. The popular Perl

Perl

Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

language is an example of an interpreted language and benchmarks indicate that it executes approximately 24 times more slowly than compiled C.

Optimizing compilers

Many compilers have features that attempt to optimize the code they generate, utilizing some of the techniques outlined in this article and others specific to the compilation itself. Loop optimization

Loop optimization

In compiler theory, loop optimization plays an important role in improving cache performance, making effective use of parallel processing capabilities, and reducing overheads associated with executing loops. Most execution time of a scientific program is spent on loops...

is often the focus of optimizing compilers because significant time is spent in program loops and parallel processing opportunities can often be facilitated by automatic code re-structuring such as loop unrolling. Optimizing compilers are by no means perfect. There is no way that a compiler can guarantee that, for all program source code, the fastest (or smallest) possible equivalent compiled program is output; such a compiler is fundamentally impossible because it would solve the halting problem

Halting problem

In computability theory, the halting problem can be stated as follows: Given a description of a computer program, decide whether the program finishes running or continues to run forever...

. Additionally, even optimizing compilers generally have no access to runtime metrics to enable them to improve optimization through 'learning'.

Just-in-time compilers

'On-the-fly' processors known today as just-in-time

Just-in-time compilation

or 'JIT' compilers combine features of interpreted languages with compiled languages and may also incorporate elements of optimization to a greater or lesser extent. Essentially the JIT compiler can compile small sections of source code statements (or bytecode

Bytecode

Bytecode, also known as p-code , is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code...

) as they are newly encountered and (usually) retain the result for the next time the same source is processed. In addition, pre-compiled segments of code can be in-lined

Inline function

In various versions of the C and C++ programming languages, an inline function is a function upon which the compiler has been requested to perform inline expansion...

or called as dynamic functions that themselves perform equally fast as the equivalent 'custom' compiled function. Because the JIT processor also has access to run-time information (that a normal compiler cannot have) it is also possible for it to optimize further executions depending upon the input and also perform other run-time introspective

Introspection

Introspection is the self-observation and reporting of conscious inner thoughts, desires and sensations. It is a conscious and purposive process relying on thinking, reasoning, and examining one's own thoughts, feelings, and, in more spiritual cases, one's soul...

optimization as execution proceeds. A JIT processor may, or may not, incorporate self modifying code or its equivalent by creating 'fast path' routes through an algorithm. It may also use such techniques as dynamic Fractional cascading

Fractional cascading

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in...

or any other similar runtime device based on collected actual runtime metrics

Metric (mathematics)

.
It is therefore entirely possible that a JIT compiler might (counter intuitively) execute even faster than an optimally 'optimized' compiled program.

Self-modifying code

As mentioned above, just-in-time compilers often make extensive use of self-modifying code

Self-modifying code

In computer science, self-modifying code is code that alters its own instructions while it is executing - usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance...

to generate the actual machine instructions required to be executed. The technique can also be used to reduce instruction path length

Instruction path length

s in application programs where otherwise repetitive conditional tests might otherwise be required within the main program flow. This can be particularly useful where a sub routine may have embedded debugging code that is either active (testing mode) or inactive (production mode) depending upon some input parameter. A simple solution using a form of dynamic dispatch

Dynamic dispatch

In computer science, dynamic dispatch is the process of mapping a message to a specific sequence of code at runtime. This is done to support the cases where the appropriate method can't be determined at compile-time...

ing is where the sub routine entry point is dynamically 'swapped' at initialization, depending upon the input parameter. Entry point A) includes the debugging code prologue and entry point B) excludes the prologue; thus eliminating all overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

except the initial 'test and swap' (even when test/debugging is selected, when the overhead is simply the test/debugging code itself).

Genetic algorithm

In the world of performance related algorithms it is worth mentioning the role of genetic algorithm

Genetic algorithm

A genetic algorithm is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems...

s which compete using similar methods to the natural world in eliminating inferior algorithms in favour of more efficient versions.

Object code optimizers

Some proprietary program optimizers such as the "COBOL Optimizer" developed by Capex Corporation

Capex Corporation

Capex Corporation was a software house based in Phoenix, Arizona founded by three former employees of General Electric. It was subsequently acquired by Computer Associates.-Products:* Autotab - an early batch spreadsheet program...

in the mid-1970s for COBOL

COBOL

COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....

, actually took the unusual step of optimizing the Object code

Object code

Object code, or sometimes object module, is what a computer compiler produces. In a general sense object code is a sequence of statements in a computer language, usually a machine code language....

(or binary file

Binary file

A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

) after normal compilation. This type of optimizer, recently sometimes referred to as a "post pass" optimizer or peephole optimizer

Peephole optimization

In compiler theory, peephole optimization is a kind of optimization performed over a very small set of instructions in a segment of generated code. The set is called a "peephole" or a "window"...

, depended, in this case, upon knowledge of 'weaknesses' in the standard IBM COBOL compiler and actually replaced (or patch

Patch (computing)

A patch is a piece of software designed to fix problems with, or update a computer program or its supporting data. This includes fixing security vulnerabilities and other bugs, and improving the usability or performance...

ed) sections of the object code with more efficient code.

A number of other suppliers have recently adopted the same approach. See main article for list of these products

Alignment of data

Most processors execute faster if certain data values are aligned on word, doubleword or page

Page (computing)

A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory that is the smallest unit of data for the following:* memory allocation performed by the operating system for a program; and...

boundaries. If possible design/specify structures to satisfy appropriate alignments. This avoids exceptions.

Locality of reference

To reduce Cache miss exceptions by providing good spatial locality of reference

Locality of reference

In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources...

, specify 'high frequency'/volatile working storage data within defined structure(s) so that they are also allocated from contiguous sections of memory (rather than possibly scattered over many pages). Group closely related data values also 'physically' close together within these structures. Consider the possibility of creating an 'artificial' structure to group some otherwise unrelated, but nevertheless frequently referenced, items together.

Choice of instruction or data type

Particularly in an Assembly language

Assembly language

An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

(although also applicable to HLL

High-level programming language

A high-level programming language is a programming language with strong abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or be from the specification of the program, making the process of...

statements), the choice of a particular 'instruction' or data type

Data type

, can have a large impact on execution efficiency. In general, instructions that process variables such as signed or unsigned 16-bit

16-bit

-16-bit architecture:The HP BPC, introduced in 1975, was the world's first 16-bit microprocessor. Prominent 16-bit processors include the PDP-11, Intel 8086, Intel 80286 and the WDC 65C816. The Intel 8088 was program-compatible with the Intel 8086, and was 16-bit in that its registers were 16...

or 32-bit

32-bit

The range of integer values that can be stored in 32 bits is 0 through 4,294,967,295. Hence, a processor with 32-bit memory addresses can directly access 4 GB of byte-addressable memory....

integer

Integer

The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

s are faster than those that process floating point

Floating point

In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16...

or packed decimal. Modern processors are even capable of executing multiple 'fixed point' instructions in parallel with the simultaneous execution of a floating point instruction. If the largest integer to be encountered can be accommodated by the 'faster' data type, defining the variables as that type will result in faster execution - since even a non-optimizing compiler will, in-effect, be 'forced' to choose appropriate instructions that will execute faster than would have been the case with data types associated with 'slower' instructions. Assembler programmers (and optimizing compiler writers) can then also benefit from the ability to perform certain common types of arithmetic for instance - division by 2, 4, 8 etc. by performing the very much faster binary shift right operations (in this case by 1, 2 or 3 bits).

If the choice of input data type is not under the control of the programmer, although prior conversion (outside of a loop for instance) to a faster data type carries some overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

, it can often be worthwhile if the variable is then to be used as a loop counter

Loop counter

In software engineering, a loop counter is the term often used to refer to the variable that controls the iterations of a loop...

, especially if the count could be quite a high value or there are many input values to process. As mentioned above, choice of individual assembler instructions (or even sometimes just their order of execution) on particular machines can effect the efficiency of an algorithm. See Assembly Optimization Tips for one quite numerous arcane

Esotericism

Esotericism or Esoterism signifies the holding of esoteric opinions or beliefs, that is, ideas preserved or understood by a small group or those specially initiated, or of rare or unusual interest. The term derives from the Greek , a compound of : "within", thus "pertaining to the more inward",...

list of various technical (and sometimes non-intuitive) considerations for choice of assembly instructions on different processors that also discusses the merits of each case. Sometimes microcode

Microcode

or hardware

Hardware

quirks can result in unexpected performance differences between processors that assembler programmers can actively code for - or else specifically avoid if penalties result - something even the best optimizing compiler may not be designed to handle.

Data granularity

The greater the granularity

Granularity

Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation. It is the "extent to which a larger entity is subdivided...

of data definitions (such as splitting a geographic address into separate street/city/postal code fields) can have performance overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

implications during processing. Higher granularity in this example implies more procedure calls in Object-oriented programming

Object-oriented programming

and parallel computing

Parallel computing

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...

environments since the additional objects are accessed via multiple method

Subroutine

In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

calls rather than perhaps one.

Subroutine granularity

For structured programming

Structured programming

and procedural programming

Procedural programming

Procedural programming can sometimes be used as a synonym for imperative programming , but can also refer to a programming paradigm, derived from structured programming, based upon the concept of the procedure call...

in general, great emphasis is placed on designing programs as a hierarchy of (or at least a set of) subroutines. For object oriented programming, the method call (a subroutine call) is the standard method of testing and setting all values in objects and so increasing data granularity consequently causes increased use of subroutines.
The greater the granularity of subroutine usage, the larger the proportion of processing time devoted to the mechanism of the subroutine linkages themselves.The presence of a (called) subroutine in a program contributes nothing extra to the functionality of the program. The extent to which subroutines (and their consequent memory requirements) influences the overall performance of the complete algorithm depends to a large extent on the actual implementation.

In assembly language

Assembly language

programs, the invocation of a subroutine need not involve much overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

, typically adding just a couple of machine instructions to the normal instruction path length

Instruction path length

, each one altering the control flow

Control flow

In computer science, control flow refers to the order in which the individual statements, instructions, or function calls of an imperative or a declarative program are executed or evaluated....

either to the subroutine or returning from it (saving the state

State (computer science)

In computer science and automata theory, a state is a unique configuration of information in a program or machine. It is a concept that occasionally extends into some forms of systems programming such as lexers and parsers....

on a stack

Stack (data structure)

In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...

being optional, depending on the complexity of the subroutine and its requirement to reuse general purpose registers). In many cases, small subroutines that perform frequently used data transformations using 'general purpose' work areas can be accomplished without the need to save or restore any registers, including the return register.

By contrast, HLL

High-level programming language

programs typically always invoke a 'standard' procedure call (the calling convention
Calling convention
In computer science, a calling convention is a scheme for how subroutines receive parameters from their caller and how they return a result; calling conventions can differ in:...

), which involves saving the program state by default and usually allocating additional memory on the stack to save all registers and other relevant state data (the prologue and epilogue

Function prologue

In assembly language programming, the function prologue is a few lines of code at the beginning of a function, which prepare the stack and registers for use within the function...

code). Recursion

Recursion (computer science)

Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and is one of the central ideas of computer science....

in a HLL program can consequently consume significant overhead in both memory and execution time managing the stack to the required depth.

Guy Steele pointed out in a 1977 paper that a well-designed programming language implementation can have very low overheads for procedural abstraction (but laments, in most implementations, that they seldom achieve this in practice - being "rather thoughtless or careless in this regard"). Steele concludes that "we should have a healthy respect for procedure calls" (because they are powerful) but he also cautioned "use them sparingly"

See section Avoiding costs for discussion of how inlining

Inline expansion

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the callee. This optimization may improve time and space usage at runtime, at the possible cost of increasing the final size of the program In computing, inline...

subroutines can be used to improve performance. For the Java

Java (programming language)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

language, use of the "final" keyword can be used to force method

Method (computer science)

In object-oriented programming, a method is a subroutine associated with a class. Methods define the behavior to be exhibited by instances of the associated class at program run time...

inlining (resulting in elimination of the method call, no dynamic dispatch and the possibility to constant-fold the value - with no code executed at runtime)

Choice of language / mixed languages

Some computer languages can execute algorithms more efficiently than others. As stated already, interpreted languages often perform less efficiently than compiled languages. In addition, where possible, 'high-use', and time-dependent sections of code may be written in a language such as assembler

Assembly language

that can usually execute faster and make better use of resources on a particular platform than the equivalent HLL

High-level programming language

code on the same platform. This section of code can either be statically called

Subroutine

In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

or dynamically invoked (external function) or embedded within the higher level code (e.g. Assembler instructions embedded in a 'C

C (programming language)

C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

' language program).
The effect of higher levels of abstraction when using a HLL has been described as the Abstraction penalty
Programmers who are familiar with assembler language (in addition to their chosen HLL) and are also familiar with the code that will be generated by the HLL, under known conditions, can sometimes use HLL language primitive

Language primitive

In computing, language primitives are the simplest elements available in a programming language. A primitive can be defined as the smallest 'unit of processing' available to a programmer of a particular machine, or can be an atomic element of an expression in a language.-Machine level primitives:A...

s of that language to generate code almost identical to assembler language. This is most likely to be possible only in languages that support pointers such as PL/1 or C. This is facilitated if the chosen HLL compiler provides an optional assembler listing as part of its printout so that various alternatives can be explored without also needing specialist knowledge of the compiler internals.

Software validation versus hardware validation

An optimization technique that was frequently taken advantage of on 'legacy

Legacy system

A legacy system is an old method, technology, computer system, or application program that continues to be used, typically because it still functions for the users' needs, even though newer technology or more efficient methods of performing a task are now available...

' platforms was that of allowing the hardware (or microcode

Microcode

) to perform validation on numeric data fields such as those coded in (or converted to) packed decimal (or packed BCD).
The choice was to either spend processing time checking each field for a valid numeric content in the particular internal representation chosen or simply assume the data was correct and let the hardware

Hardware

detect the error upon execution. The choice was highly significant because to check for validity on multiple fields (for sometimes many millions of input records), it could occupy valuable computer resources. Since input data fields were in any case frequently built from the output of earlier computer processing, the actual probability of a field containing invalid data was exceedingly low and usually the result of some 'corruption'.
The solution was to incorporate an 'event handler' for the hardware detected condition ('data exception

Exception handling

Exception handling is a programming language construct or computer hardware mechanism designed to handle the occurrence of exceptions, special conditions that change the normal flow of program execution....

)' that would intercept the occasional errant data field and either 'report, correct and continue' or, more usually, abort

Abort (computing)

In a computer or data transmission system, to abort means to terminate, usually in a controlled manner, a processing activity because it is impossible or undesirable for the activity to proceed. Such an action may be accompanied by diagnostic information on the aborted process.In addition to being...

the run with a core dump

Core dump

In computing, a core dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally...

to try to determine the reason for the bad data.

Similar event handlers are frequently utilized in today's web based applications to handle other exceptional conditions but repeatedly parsing data input, to ensure its validity before execution, has nevertheless become much more commonplace - partly because processors have become faster (and the perceived need for efficiency in this area less significant) but, predominantly - because data structure

Data structure

In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

s have become less 'formalized' (e.g. .csv and .tsv files) or uniquely identifiable (e.g. packed decimal). The potential savings using this type of technique may have therefore fallen into general dis-use as a consequence and therefore repeated data validations and repeated data conversions have become an accepted overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

. Ironically, one consequence of this move to less formalized data structures is that a corruption of say, a numeric binary integer

Integer

The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

value, will not be detected at all by the hardware upon execution (for instance: is an ASCII

ASCII

hexadecimal

Hexadecimal

In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

value '20202020' a valid signed or unsigned binary value - or simply a string of blanks that has corrupted it?)

Algorithms for vector & superscalar processors

Algorithms for vector processor

Vector processor

A vector processor, or array processor, is a central processing unit that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items...

s are usually different than those for scalar processor

Scalar processor

Scalar processors represent the simplest class of computer processors. A scalar processor processes one datum at a time . , a scalar processor is classified as a SISD processor .In a vector processor, by contrast, a single instruction operates simultaneously on multiple data items...

s since they can process multiple instructions and/or multiple data elements in parallel (for examples of these differences see 'description' in the main article). The process of converting an algorithm from a scalar to a vector process is known as vectorization and methods for automatically performing this transformation as efficiently as possible are constantly sought.
There are intrinsic limitations for implementing instruction level parallelism

Instruction level parallelism

in Superscalar

Superscalar

processors (which are discussed in the 'limitations' section in the main article) but, in essence, the overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

in deciding for certain if particular instruction sequences can be processed in parallel can sometimes exceed the efficiency gain in so doing. The achievable reduction is governed primarily by the (somewhat obvious) law known as Amdahl's law

Amdahl's law

Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl, and is used to find the maximum expected improvement to an overall system when only part of the system is improved...

, that essentially states that the improvement from parallel processing is determined by its slowest sequential component. Algorithms designed for this class of processor therefore require more care if they are not to unwittingly disrupt the potential gains.

Avoiding costs

Defining variables as integer
Integer
The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

s for indexed arrays instead of floating point
Floating point
In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16...

will result in faster execution (see above).
Defining structures whose structure length is a multiple of a power of 2 (2,4,8,16 etc.), will allow the compiler to calculate array indexes by shifting a binary index by 1, 2 or more bits to the left, instead of using a multiply instruction will result in faster execution. Adding an otherwise redundant short filler variable to 'pad out' the length of a structure element to say 8 bytes when otherwise it would have been 6 or 7 bytes may reduce overall processing time by a worthwhile amount for very large arrays. See for generated code differences for C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

as for example.
Storage defined in terms of bits, when bytes would suffice, may inadvertently involve extremely long path lengths involving bitwise operation
Bitwise operation
A bitwise operation operates on one or more bit patterns or binary numerals at the level of their individual bits. This is used directly at the digital hardware level as well as in microcode, machine code and certain kinds of high level languages...

s instead of more efficient single instruction 'multiple byte' copy instructions. (This does not apply to 'genuine' intentional bitwise operations - used for example instead of multiplication or division by powers of 2 or for TRUE/FALSE flags.)
Unnecessary use of allocated dynamic storage when static storage
Static memory allocation
Static memory allocation refers to the process of allocating memory at compile-time before the associated program is executed, unlike dynamic memory allocation or automatic memory allocation where memory is allocated as required at run-time....

would suffice, can increase the processing overhead substantially - both increasing memory requirements and the associated allocation/deallocation path length
Path length
In chemistry, the path length is defined as the distance that light travels through a sample in an analytical cell. Typically, a sample cell is made of quartz, glass, or a plastic rhombic cuvette with a volume typically ranging from 0.1 mL to 10 mL or larger used in a spectrophotometer. For the...

overheads for each function call.
Excessive use of function calls for very simple functions, rather than in-line statements, can also add substantially to instruction path length
Instruction path length
In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performance on a particular computer hardware...

s and stack
Stack (data structure)
In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...

/unstack overheads. For particularly time critical systems that are not also code size sensitive, automatic or manual inline expansion
Inline expansion
In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the callee. This optimization may improve time and space usage at runtime, at the possible cost of increasing the final size of the program In computing, inline...

can reduce path length by eliminating all the instructions that call the function and return from it. (A conceptually similar method, loop unrolling, eliminates the instructions required to set up and terminate a loop by, instead; repeating the instructions inside the loop multiple times. This of course eliminates the branch back instruction but may also increase the size of the binary file
Binary file
A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

or, in the case of JIT
Just-in-time compilation
In computing, just-in-time compilation , also known as dynamic translation, is a method to improve the runtime performance of computer programs. Historically, computer programs had two modes of runtime operation, either interpreted or static compilation...

built code, dynamic memory. Also, care must be taken with this method, that re-calculating addresses for each statement within an unwound indexed loop is not more expensive than incrementing pointers within the former loop would have been. If absolute indexes are used in the generated (or manually created) unwound code, rather than variables, the code created may actually be able to avoid generated pointer arithmetic instructions altogether, using offsets instead).

Memory management

Whenever memory is automatically allocated (for example in HLL

High-level programming language

programs, when calling

Subroutine

In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

a procedure or when issuing a system call

System call

In computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...

), it is normally released (or 'freed'/ 'deallocated'/ 'deleted' ) automatically when it is no longer required - thus allowing it to be re-used for another purpose immediately. Some memory management

Memory management

Memory management is the act of managing computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed. This is critical to the computer system.Several...

can easily be accomplished by the compiler, as in this example. However, when memory is explicitly allocated (for example in OOP

Object-oriented programming

when "new" is specified for an object

Object (computer science)

In computer science, an object is any entity that can be manipulated by the commands of a programming language, such as a value, variable, function, or data structure...

), releasing the memory is often left to an asynchronous 'garbage collector

Garbage collection (computer science)

In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program...

' which does not necessarily release the memory at the earliest opportunity (as well as consuming some additional CPU resources deciding if it can be). The current trend nevertheless appears to be towards taking full advantage of this fully automated method, despite the tradeoff in efficiency - because it is claimed that it makes programming easier. Some functional languages are known as 'lazy functional languages' because of their tendency to delay computations in form of allocated thunks, leading to significant use of garbage collection and can consume much more memory as a result.

Array processing
Array processing
Array processing is signal processing of the outputs of an array of sensors to:* Enhance the signal-to-interference-plus-noise ratio compared to that of a single sensor using conventional or adaptive beamforming....

may simplify programming but use of separate statements to sum different elements of the same array(s) may produce code that is not easily optimized and that requires multiple passes of the arrays that might otherwise have been processed in a single pass. It may also duplicate data if array slicing
Array slicing
In computer programming, array slicing is an operation that extracts certain elements from an array and packages them as another array, possibly with different number of indices and different index ranges. Two common examples are extracting a substring from a string of characters In computer...

is used, leading to increased memory
Memory
In psychology, memory is an organism's ability to store, retain, and recall information and experiences. Traditional studies of memory began in the fields of philosophy, including techniques of artificially enhancing memory....

usage and copying overhead
Computational overhead
In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

.
In OOP, if an object
Object (computer science)
In computer science, an object is any entity that can be manipulated by the commands of a programming language, such as a value, variable, function, or data structure...

is known to be immutable
Immutable object
In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created...

, it can be copied simply by making a copy of a reference to it instead of copying the entire object. Because a reference (typically only the size of a pointer) is usually much smaller than the object itself, this results in memory savings and a boost in execution speed.

Readability, trade offs and trends

One must be careful, in the pursuit of good coding style, not to over-emphasize efficiency. Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand. There are exceptions to this 'rule' (such as embedded system

Embedded system

An embedded system is a computer system designed for specific control functions within a larger system. often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal...

s, where space is tight, and processing power minimal) but these are rarer than one might expect.

However, increasingly, for many 'time critical' applications such as air line reservation systems, point-of-sale applications, ATMs

Automated teller machine

An automated teller machine or automatic teller machine, also known as a Cashpoint , cash machine or sometimes a hole in the wall in British English, is a computerised telecommunications device that provides the clients of a financial institution with access to financial transactions in a public...

(cash-point machines), Airline Guidance system

Guidance system

A guidance system is a device or group of devices used to navigate a ship, aircraft, missile, rocket, satellite, or other craft. Typically, this refers to a system that navigates without direct or continuous human control...

s, Collision avoidance systems and numerous modern web based applications - operating in a real-time environment

Real-time computing

In computer science, real-time computing , or reactive computing, is the study of hardware and software systems that are subject to a "real-time constraint"— e.g. operational deadlines from event to system response. Real-time programs must guarantee response within strict time constraints...

where speed of response is fundamental - there is little alternative.

Determining if optimization is worthwhile

The essential criteria for using optimized code are of course dependent upon the expected use of the algorithm. If it is a new algorithm and is going to be in use for many years and speed is relevant, it is worth spending some time designing the code to be as efficient as possible from the outset. If an existing algorithm is proving to be too slow or memory is becoming an issue, clearly something must be done to improve it.

For the average application, or for one-off applications, avoiding inefficient coding techniques and encouraging the compiler to optimize where possible may be sufficient.

One simple way (at least for mathematicians) to determine whether an optimization is worthwhile is as follows: Let the original time and space requirements (generally in Big-O notation) of the algorithm be

and

. Let the new code require

and

time and space respectively. If

, the optimization should be carried out. However, as mentioned above, this may not always be true.

Implications for algorithmic efficiency

A recent report, published in December 2007, from Global Action Plan, a UK-based environmental organisation found that computer server

Server (computing)

In the context of client-server architecture, a server is a computer program running to serve the requests of other programs, the "clients". Thus, the "server" performs some computational task on behalf of "clients"...

s are "at least as great a threat to the climate

Climate

Climate encompasses the statistics of temperature, humidity, atmospheric pressure, wind, rainfall, atmospheric particle count and other meteorological elemental measurements in a given region over long periods...

as SUVs or the global aviation

Aviation

Aviation is the design, development, production, operation, and use of aircraft, especially heavier-than-air aircraft. Aviation is derived from avis, the Latin word for bird.-History:...

industry" drawing attention to the carbon footprint of the IT

Information technology

Information technology is the acquisition, processing, storage and dissemination of vocal, pictorial, textual and numerical information by a microelectronics-based combination of computing and telecommunications...

industry in the UK.
According to an Environmental Research Letters

Environmental Research Letters

Environmental Research Letters is an open-access electronic-only peer-reviewed scientific journal covering research in all aspects of environmental science. Numerical modelling or simulation, as well as theoretical and experimental approaches to environmental science form the core content...

report published in September 2008, "Total power used by information technology equipment in data centers represented about 0.5% of world electricity consumption in 2005. When cooling and auxiliary infrastructure are included, that figure is about 1%. The total data center power demand in 2005 is equivalent (in capacity terms) to about seventeen 1000 MW power plants for the world."

Some media reports claim that performing two Google search

Google search

Google or Google Web Search is a web search engine owned by Google Inc. Google Search is the most-used search engine on the World Wide Web, receiving several hundred million queries each day through its various services....

es from a desktop computer

Desktop computer

A desktop computer is a personal computer in a form intended for regular use at a single location, as opposed to a mobile laptop or portable computer. Early desktop computers are designed to lay flat on the desk, while modern towers stand upright...

can generate about the same amount of carbon dioxide

Carbon dioxide

Carbon dioxide is a naturally occurring chemical compound composed of two oxygen atoms covalently bonded to a single carbon atom...

as boiling a kettle for a cup of tea, according to new research; however, the factual accuracy of this comparison is disputed, and the author of the study in question asserts that the two-searches-tea-kettle statistic is a misreading of his work.

Greentouch, a recently established consortium of leading Information and Communications Technology (ICT) industry, academic and non-governmental research experts, has set itself the mission of reducing reduce energy consumption per user by a factor of 1000 from current levels. "A thousand-fold reduction is roughly equivalent to being able to power the world’s communications networks, including the Internet, for three years using the same amount of energy that it currently takes to run them for a single day". The first meeting in February 2010 will establish the organization’s five-year plan, first year deliverables and member roles and responsibilities. Intellectual property issues will be addressed and defined in the forum’s initial planning meetings. The conditions for research and the results of that research will be high priority for discussion in the initial phase of the research forum’s development.

Computers having become increasingly more powerful over the past few decades, emphasis was on a 'brute force

Brute-force search

In computer science, brute-force search or exhaustive search, also known as generate and test, is a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's...

' mentality. This may have to be reconsidered in the light of these reports and more effort placed in future on reducing carbon footprint

Carbon footprint

A carbon footprint has historically been defined as "the total set of greenhouse gas emissions caused by an organization, event, product or person.". However, calculating a carbon footprint which conforms to this definition is often impracticable due to the large amount of data required, which is...

s through optimization. It is a timely reminder that algorithmic efficiency is just another aspect of the more general thermodynamic efficiency. The genuine economic benefits of an optimized algorithm are, in any case, that more processing can be done for the same cost or that useful results can be shown in a more timely manner and ultimately, acted upon sooner.
Criticism of the current state of programming

David May
David May (computer scientist)
Michael David May, born February 24, 1951, is a British computer scientist. He is Professor of Computer Science at the University of Bristol and founder and Chief Technology Officer of XMOS Semiconductor.May was lead architect for the transputer...

FRS a British
United Kingdom
The United Kingdom of Great Britain and Northern IrelandIn the United Kingdom and Dependencies, other languages have been officially recognised as legitimate autochthonous languages under the European Charter for Regional or Minority Languages...

computer scientist
Computer scientist
A computer scientist is a scientist who has acquired knowledge of computer science, the study of the theoretical foundations of information and computation and their application in computer systems....

and currently Professor
Professor
A professor is a scholarly teacher; the precise meaning of the term varies by country. Literally, professor derives from Latin as a "person who professes" being usually an expert in arts or sciences; a teacher of high rank...

of Computer Science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

at University of Bristol
University of Bristol
The University of Bristol is a public research university located in Bristol, United Kingdom. One of the so-called "red brick" universities, it received its Royal Charter in 1909, although its predecessor institution, University College, Bristol, had been in existence since 1876.The University is...

and founder and CTO
Chief technical officer
A chief technology officer is an executive-level position in a company or other entity whose occupant is focused on scientific and technological issues within an organization....

of XMOS Semiconductor
XMOS
XMOS is a fabless semiconductor company that develops multi-core multi-threaded processors designed to execute several real-time tasks, DSP, and control flow all at once.-Company history:...

, believes one of the problems is that there is a reliance on Moore's law
Moore's Law
Moore's law describes a long-term trend in the history of computing hardware: the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years....

to solve inefficiencies. He has advanced an 'alternative' to Moore's law (May's law) stated as follows:
Software efficiency halves every 18 months, compensating Moore's Law
He goes on to state
In ubiquitous systems, halving the instructions executed can double the battery life and big data sets bring big opportunities for better software and algorithms: Reducing the number of operations from N x N to N x log(N) has a dramatic effect when N is large... for N = 30 billion, this change is as good as 50 years of technology improvements

Software author Adam N. Rosenburg in his blog "The failure of the Digital computer", has described the current state of programming as nearing the "Software event horizon", (alluding to the fictitious "shoe event horizon" described by Douglas Adams
Douglas Adams
Douglas Noel Adams was an English writer and dramatist. He is best known as the author of The Hitchhiker's Guide to the Galaxy, which started life in 1978 as a BBC radio comedy before developing into a "trilogy" of five books that sold over 15 million copies in his lifetime, a television...

in his Hitchhiker's Guide to the Galaxy book). He estimates there has been a 70 dB factor loss of productivity or "99.99999 percent, of its ability to deliver the goods", since the 1980s - "When Arthur C. Clarke
Arthur C. Clarke
Sir Arthur Charles Clarke, CBE, FRAS was a British science fiction author, inventor, and futurist, famous for his short stories and novels, among them 2001: A Space Odyssey, and as a host and commentator in the British television series Mysterious World. For many years, Robert A. Heinlein,...

compared the reality of computing in 2001 to the computer HAL
HAL 9000
HAL 9000 is the antagonist in Arthur C. Clarke's science fiction Space Odyssey saga. HAL is an artificial intelligence that interacts with the astronaut crew of the Discovery One spacecraft, usually represented as a red television-camera eye found throughout the ship...

in his book 2001: A Space Odyssey, he pointed out how wonderfully small and powerful computers were but how disappointing computer programming had become".
Conrad Weisert gives examples, some of which were published in ACM SIGPLAN (Special Interest Group on Programming Languages) Notices, December 1995 in: "Atrocious Programming Thrives"
Marc Andreessen
Marc Andreessen
Marc Andreessen is an American entrepreneur, investor, software engineer, and multi-millionaire best known as co-author of Mosaic, the first widely-used web browser, and co-founder of Netscape Communications Corporation. He founded and later sold the software company Opsware to Hewlett-Packard...

co-founder of Netscape
Netscape
Netscape Communications is a US computer services company, best known for Netscape Navigator, its web browser. When it was an independent company, its headquarters were in Mountain View, California...

is quoted in "Mavericks at Work" (ISBN 0060779616) as saying "Five great programmers can completely outperform 1,000 mediocre programmers."http://www.linkedin.com/news?actionBar=&articleID=590198259&ids=0NdjgSd3wOejkIc3cPej0VczARb3ARczwVcj0VdiMVczcUcP8OejkIc3gTdj0TczAR&aag=true&freq=weekly&trk=eml-tod-b-ttle-96