
Benchmark (computing)



 
 


In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term 'benchmark' is also commonly used for the elaborately designed benchmarking programs themselves. Benchmarking is usually associated with assessing performance characteristics of computer hardware, for example, the floating-point operation performance of a CPU, but there are circumstances when the technique is also applicable to software. Software benchmarks are, for example, run against compilers or database management systems. Another type of test program, namely test suites or validation suites, is intended to assess the correctness of software.

Benchmarks provide a method of comparing the performance of various subsystems across different chip/system architectures.

Purpose

As computer architecture advanced, it became more difficult to compare the performance of various computer systems simply by looking at their specifications. Therefore, tests were developed that allowed comparison of different architectures. For example, Pentium 4 processors generally operate at a higher clock frequency than Athlon XP processors, which does not necessarily translate to more computational power. A processor with a lower clock frequency can perform as well as a processor operating at a higher frequency. See BogoMips and the megahertz myth.
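
The point about clock frequency can be made concrete with a rough model in which throughput is the product of clock rate and the average number of instructions completed per cycle (IPC). The sketch below uses invented figures for two hypothetical processors; they are not measurements of any real Pentium 4 or Athlon XP part, and real performance depends on far more than these two numbers.

```python
def instructions_per_second(clock_hz, ipc):
    """Rough throughput model: clock rate times average instructions per cycle."""
    return clock_hz * ipc

# Hypothetical figures chosen only for illustration.
cpu_a = instructions_per_second(clock_hz=3.0e9, ipc=1.0)  # higher clock, lower IPC
cpu_b = instructions_per_second(clock_hz=2.0e9, ipc=1.5)  # lower clock, higher IPC

print(f"CPU A: {cpu_a:.2e} instructions/s")
print(f"CPU B: {cpu_b:.2e} instructions/s")
```

Under these assumed values both processors reach the same throughput, even though one runs at two thirds of the other's clock frequency.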

Benchmarks are designed to mimic a particular type of workload on a component or system. Synthetic benchmarks do this with specially created programs that impose the workload on the component. Application benchmarks run real-world programs on the system. Whilst application benchmarks usually give a much better measure of real-world performance on a given system, synthetic benchmarks are useful for testing individual components, like a hard disk or networking device.
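
As a minimal sketch of what a synthetic benchmark looks like in practice, the following program imposes an artificial floating-point workload and times it. The function names, the iteration count, and the choice to report the best of several runs are all assumptions made for this illustration, and the result is dominated by interpreter overhead rather than raw hardware speed.

```python
import time

def float_workload(n):
    """Synthetic workload: n multiply-add steps on floating-point values."""
    acc = 0.0
    x = 1.000001
    for _ in range(n):
        acc = acc * x + 1.0  # one multiply and one add per iteration
    return acc

def run_benchmark(n=200_000, repeats=5):
    """Time the workload several times and keep the fastest run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        float_workload(n)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    flops = 2 * n / best  # two floating-point operations per iteration
    return best, flops

best_seconds, flops = run_benchmark()
print(f"best time: {best_seconds:.6f} s, about {flops / 1e6:.1f} MFLOPS")
```

An application benchmark would instead time a real program doing real work on the same system.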

Benchmarks are particularly important in CPU design, giving processor architects the ability to measure and make tradeoffs in microarchitectural decisions. For example, if a benchmark extracts the key algorithms of an application, it will contain the performance-sensitive aspects of that application. Running this much smaller snippet on a cycle-accurate simulator can give clues on how to improve performance.

Prior to 2000, computer and microprocessor architects used SPEC to do this, although SPEC's Unix-based benchmarks were quite lengthy and thus unwieldy to use intact.

Computer manufacturers are known to configure their systems to give unrealistically high performance on benchmark tests that is not replicated in real usage. For instance, during the 1980s some compilers could detect a specific mathematical operation used in a well-known floating-point benchmark and replace the operation with a faster, mathematically equivalent operation. However, such a transformation was rarely useful outside the benchmark until the mid-1990s, when RISC and VLIW architectures emphasized the importance of compiler technology as it related to performance. Benchmarks are now regularly used by compiler companies to improve not only their own benchmark scores, but also real application performance.
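
The text does not name the operation or the benchmark involved, so the following is only an analogy for the kind of substitution described: a loop that sums the integers from 1 to n is replaced by the closed form n(n+1)/2, which gives the same answer at a fraction of the cost. The function names and the value of n are hypothetical.

```python
import timeit

def sum_by_loop(n):
    """Straightforward form, as a benchmark author might write it."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n):
    """Mathematically equivalent replacement: n(n+1)/2."""
    return n * (n + 1) // 2

n = 100_000
same_answer = sum_by_loop(n) == sum_closed_form(n)

loop_time = timeit.timeit(lambda: sum_by_loop(n), number=20)
closed_time = timeit.timeit(lambda: sum_closed_form(n), number=20)
print(f"same answer: {same_answer}")
print(f"loop: {loop_time:.4f} s, closed form: {closed_time:.6f} s")
```

A score obtained this way says little about how the same system handles code in which no such shortcut exists.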

CPUs that have many execution units (such as a superscalar CPU, a VLIW CPU, or a reconfigurable computing CPU) typically have slower clock rates than a sequential CPU with one or two execution units, when built from transistors that are just as fast. Nevertheless, CPUs with many execution units often complete real-world and benchmark tasks in less time than the supposedly faster high-clock-rate CPU.

Given the large number of benchmarks available, a manufacturer can usually find at least one benchmark that shows its system will outperform another system; the other systems can be shown to excel with a different benchmark.
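
A small illustration of this effect, using entirely hypothetical scores for two made-up systems (higher is better): which system "wins" depends on which benchmark is quoted.

```python
# Hypothetical scores (higher is better) for two made-up systems.
scores = {
    "integer_test": {"system_x": 120, "system_y": 95},
    "floating_point_test": {"system_x": 80, "system_y": 130},
    "disk_io_test": {"system_x": 101, "system_y": 99},
}

def winners(scores):
    """Map each benchmark name to the system with the highest score."""
    return {bench: max(result, key=result.get) for bench, result in scores.items()}

by_benchmark = winners(scores)
for bench, system in by_benchmark.items():
    print(f"{bench}: {system} wins")
```

Quoting only the integer test favors one system, and quoting only the floating-point test favors the other.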

Manufacturers commonly report only those benchmarks (or aspects of benchmarks) that show their products in the best light. They have also been known to misrepresent the significance of benchmarks, again to show their products in the best possible light. Taken together, these practices are called bench-marketing.

Ideally benchmarks should only substitute for real applications if the application is unavailable, or too difficult or costly to port to a specific processor or computer system. If performance is critical, the only benchmark that matters is the intended workload.

Challenges

Benchmarking is not easy and often involves several iterative rounds in order to arrive at predictable, useful conclusions. Interpretation of benchmarking data is also extraordinarily difficult. Here is a partial list of common challenges:

  • Vendors tend to tune their products specifically for industry-standard benchmarks. Norton SysInfo (SI) is particularly easy to tune for, since it is mainly biased toward the speed of multiple operations. Use extreme caution in interpreting such results.
  • Many benchmarks focus entirely on the speed of computational performance, neglecting other important features of a computer system, such as:
    • Benchmarks generally do not give any credit for any qualities of service aside from raw performance. Examples of unmeasured qualities of service include security, availability, reliability, execution integrity, serviceability, and scalability (especially the ability to quickly and nondisruptively add or reallocate capacity). There are often real trade-offs between and among these qualities of service, and all are important in business computing. Transaction Processing Performance Council benchmark specifications partially address these concerns by specifying ACID property tests, database scalability rules, and service level requirements.
    • In general, benchmarks do not measure total cost of ownership (TCO). Transaction Processing Performance Council benchmark specifications partially address this concern by specifying that a price/performance metric must be reported in addition to a raw performance metric, using a simplified TCO formula.
    • Electrical power. When more power is used, a portable system will have a shorter battery life and require recharging more often. This is often the antithesis of performance, as most semiconductors require more power to switch faster. See also performance per watt.
    • In some embedded systems, where memory is a significant cost, better code density can significantly reduce costs.
  • Benchmarks seldom measure real-world performance of mixed workloads, that is, running multiple applications concurrently in a full, multi-department or multi-application business context. For example, IBM's mainframe servers (System z9) excel at mixed workloads, but industry-standard benchmarks don't tend to measure the strong I/O and large and fast memory design such servers require. (Most other server architectures dictate fixed-function (single-purpose) deployments, e.g. "database servers" and "Web application servers" and "file servers," and measure only that. The better question is, "What more computing infrastructure would I need to fully support all this extra workload?")
  • Vendor benchmarks tend to ignore requirements for development, test, and disaster recovery computing capacity. Vendors only like to report what might be narrowly required for production capacity in order to make their initial acquisition price seem as low as possible.
  • Benchmarks are having trouble adapting to widely distributed servers, particularly those with extra sensitivity to network topologies. The emergence of grid computing, in particular, complicates benchmarking since some workloads are "grid friendly", while others are not.
  • Users can have very different perceptions of performance than benchmarks may suggest. In particular, users appreciate predictability: servers that always meet or exceed service level agreements. Benchmarks tend to emphasize mean scores (IT perspective) rather than low standard deviations (user perspective).
  • Many server architectures degrade dramatically at high (near 100%) levels of usage ("fall off a cliff"), and benchmarks should (but often do not) take that factor into account. Vendors, in particular, tend to publish server benchmarks at a continuous load of about 80% usage (an unrealistic situation) and do not document what happens to the overall system when demand spikes beyond that level.
  • Benchmarking institutions often disregard or do not follow basic scientific method. This includes, but is not limited to: small sample size, lack of variable control, and the limited repeatability of results.
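
The contrast between mean scores and standard deviations can be sketched with a few lines of code. The response times below are hypothetical values for two made-up servers; the point is only that a summary restricted to the mean hides the variability that users notice.

```python
import statistics

# Hypothetical response times in milliseconds for two made-up servers.
server_a = [100, 101, 99, 100, 100, 102, 98, 100]
server_b = [60, 65, 70, 62, 63, 61, 64, 255]

def summarize(samples):
    """Return mean, sample standard deviation, and worst case for one server."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "worst": max(samples),
    }

for name, samples in (("A", server_a), ("B", server_b)):
    s = summarize(samples)
    print(f"server {name}: mean={s['mean']:.1f} ms, "
          f"stdev={s['stdev']:.1f} ms, worst={s['worst']} ms")
```

With these assumed numbers, server B has the better mean but a far larger spread and a much worse slowest response, so a ranking by mean alone would favor the less predictable machine. Reporting the spread over repeated runs also addresses the sample-size and repeatability concerns listed above.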


Types of benchmarks

  1. Real program
    • word processing software
    • tool software of CAD
    • user's application software (MIS)
  2. Kernel
    • contains key codes
    • normally abstracted from actual program
    • popular kernel: Livermore loop
    • LINPACK benchmark (contains basic linear algebra subroutines written in FORTRAN)
    • results are represented in MFLOPS
  3. Component Benchmark/ micro-benchmark
    • programs designed to measure performance of a computer's basic components
    • automatic detection of computer's hardware parameters like number of registers, cache size, memory latency
  4. Synthetic Benchmark
    • Procedure for programming a synthetic benchmark
      • take statistics of all types of operations from many application programs
      • get proportion of each operation
      • write a program based on the proportion above
    • Types of Synthetic Benchmark are:
      • Whetstone
      • Dhrystone
    • These were the first general purpose industry standard computer benchmarks. They do not necessarily obtain high scores on modern pipelined computers.
  5. I/O benchmarks
  6. Parallel benchmarks: used on machines with multiple processors or systems consisting of multiple machines.
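
The three-step procedure listed under synthetic benchmarks (gather operation statistics, compute proportions, write a program that follows them) can be sketched as follows. The operation names and counts are hypothetical, and real synthetic benchmarks such as Whetstone and Dhrystone are hand-written programs rather than a random schedule like this one.

```python
import random
import time

# Step 1 (assumed already done): hypothetical operation counts gathered
# from profiling many application programs.
observed_counts = {"int_add": 500, "float_mul": 300, "list_append": 200}

def proportions(counts):
    """Step 2: turn raw counts into the fraction each operation contributes."""
    total = sum(counts.values())
    return {op: count / total for op, count in counts.items()}

def synthetic_run(mix, steps=100_000, seed=0):
    """Step 3: execute operations in the same proportions as the observed mix."""
    rng = random.Random(seed)
    ops = list(mix)
    weights = [mix[op] for op in ops]
    schedule = rng.choices(ops, weights=weights, k=steps)
    i, f, items = 0, 1.0, []
    start = time.perf_counter()
    for op in schedule:
        if op == "int_add":
            i += 1
        elif op == "float_mul":
            f *= 1.0000001
        else:
            items.append(i)
    return time.perf_counter() - start

mix = proportions(observed_counts)
print(mix)
print(f"synthetic run took {synthetic_run(mix):.4f} s")
```

The schedule is built before the timer starts so that the measured interval covers only the workload itself, not the random selection.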


Common benchmarks


Industry standard (audited and verifiable)

  • Business Applications Performance Corporation (BAPCo)
  • Embedded Microprocessor Benchmark Consortium (EEMBC)
  • Standard Performance Evaluation Corporation (SPEC)
  • Transaction Processing Performance Council (TPC)


Open source benchmarks

  • DEISA Benchmark Suite: scientific HPC applications benchmark
  • Dhrystone: integer arithmetic performance
  • Fhourstones: an integer benchmark
  • HINT: It ranks a computer system as a whole.
  • Iometer: I/O subsystem measurement and characterization tool for single and clustered systems.
  • Linpack / LAPACK
  • Livermore loops
  • NAS parallel benchmarks
  • PAL: a benchmark for realtime physics engines
  • Phoronix Test Suite: open-source benchmarking suite for Linux, OpenSolaris, and FreeBSD
  • POV-Ray: 3D render
  • TPoX: An XML transaction processing benchmark for XML databases
  • Ubench: A simple CPU and memory benchmark for various flavors of Unix (including Linux).
  • VMmark: a server virtualization benchmark suite from VMware.
  • Whetstone: floating-point arithmetic performance
  • LMBench: Suite of simple, portable benchmarks, useful for comparing performance of different UNIX systems


Microsoft Windows benchmarks

  • BAPCo: MobileMark, SYSmark, WebMark
  • Futuremark: 3DMark, PCMark
  • Whetstone
  • PiFast
  • SuperPrime
  • Super PI
  • WinSAT, introduced with Windows Vista, providing an index for consumers to rate their systems easily


Others

  • BRL-CAD
  • Khornerstone
  • iCOMP, the Intel comparative microprocessor performance, published by Intel
  • Performance Rating, a modelling scheme used by AMD and Cyrix to reflect relative performance, usually compared to competing products


See also

  • Benchmarking (business perspective)
  • Test suite: a collection of test cases intended to show that a software program has some specified set of behaviors
  • Figure of merit


Further reading

  • Jim Gray (Editor), The Benchmark Handbook for Database and Transaction Systems (2nd Edition), Morgan Kaufmann, 1993, ISBN 1-55860-292-5
  • Bert Scalzo, Kevin Kline, Claudia Fernandez, Donald K. Burleson, Mike Ault (2007), Database Benchmarking Practical Methods for Oracle & SQL Server. ISBN 0-9776715-3-4


External links

  • comp.benchmarks benchmark newsgroup (news:comp.benchmarks)
  • - Kernels, Synthetic, Component (CPU, Caches, RAM, Graphics, Disk and other I/O, Network), Real/Simulated Real, Burn-in