Performance analysis - AbsoluteAstronomy.com

Software engineering

Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software...

, profiling ("program profiling", "software profiling") is a form of dynamic program analysis

Dynamic program analysis

Dynamic program analysis is the analysis of computer software that is performed by executing programs built from that software system on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs to produce interesting...

that measures, for example, the usage of memory, the usage of particular instructions

Instruction Set Simulator

An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...

, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization

Optimization (computer science)

In computer science, program optimization or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources...

.

Profiling is achieved by instrumenting either the program source code

Source code

In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

or its binary executable form using a tool called a profiler (or code profiler).

The methodology of the profiler itself classify the profiler as event-based, as statistical, as instrumentation, or as simulation.

Gathering program events

Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation

Instrumentation (computer programming)

In context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system...

, instruction set simulation

Instruction Set Simulator

, operating system hooks

Hooking

In computer programming, the term hooking covers a range of techniques used to alter or augment the behavior of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components...

, and performance counter

Hardware performance counter

In computers, hardware performance counters, or hardware counters are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems...

s. The usage of profilers is 'called out' in the performance engineering

Performance Engineering

Performance engineering within systems engineering, encompasses the set of roles, skills, activities, practices, tools, and deliverables applied at every phase of the Systems Development Life Cycle which ensures that a solution will be designed, implemented, and operationally supported to meet the...

process.

Use of profilers

Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new architectures

Computer architecture

In computer science and engineering, computer architecture is the practical art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals and the formal modelling of those systems....

. Software writers need tools to analyze their programs and identify critical sections of code. Compiler

Compiler

A compiler is a computer program that transforms source code written in a programming language into another computer language...

writers often use such tools to find out how well their instruction scheduling

Instruction scheduling

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines...

or branch prediction algorithm is performing... (ATOM, PLDI, '94)

The output of a profiler may be:-

A statistical summary of the events observed (a profile)

Summary profile information is often shown annotated against the source code statements where the events occur, so the size of measurement data is linear to the code size of the program.




 /* ------------ source------------------------- count */          

 0001             IF X = "A"                     0055

 0002                THEN DO                    

 0003                  ADD 1 to XCOUNT           0032

 0004                ELSE

 0005             IF X = "B"                     0055

A stream of recorded events (a trace)

For sequential programs, a summary profile is usually sufficient, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring a full trace to get an understanding of what is happening.

The size of a (full) trace is linear to the program's instruction path length

Instruction path length

In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performance on a particular computer hardware...

, making it somewhat impractical. A trace may therefore be initiated at one point in a program and terminated at another point to limit the output.

An ongoing interaction with the hypervisor
Hypervisor
In computing, a hypervisor, also called virtual machine manager , is one of many hardware virtualization techniques that allow multiple operating systems, termed guests, to run concurrently on a host computer. It is so named because it is conceptually one level higher than a supervisory program...

(continuous or periodic monitoring via on-screen display for instance)

This provides the opportunity to switch a trace on or off at any desired point during execution in addition to viewing on-going metrics about the (still executing) program. It also provides the opportunity to suspend asynchronous processes at critical points to examine interactions with other parallel processes in more detail.

History

Performance analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the Program status word

Program status word

The Program status word is an IBM System/360 architecture and successors control register which performs the function of a Status register in other architectures, and more....

(PSW) at set timer intervals to detect "hot spots" in executing code. This was an early example of sampling

Sampling (statistics)

In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....

(see below). In early 1974, Instruction Set Simulator

Instruction Set Simulator

s permitted full trace and other performance monitoring features.

Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. In 1982, gprof extended the concept to a complete call graph

Call graph

A call graph is a directed graph that represents calling relationships between subroutines in a computer program. Specifically, each node represents a procedure and each edge indicates that procedure f calls procedure g...

analysis.

In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation

Digital Equipment Corporation

Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...

published a paper describing ATOM. ATOM is a platform for converting a program into its own profiler. That is, at compile time

Compile time

In computer science, compile time refers to either the operations performed by a compiler , programming language requirements that must be met by source code for it to be successfully compiled , or properties of the program that can be reasoned about at compile time.The operations performed at...

, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique - modifying a program to analyze itself - is known as "instrumentation

Instrumentation (computer programming)

".

In 2004, both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time.

Flat profiler

Flat profilers compute the average call times, from the calls, and do not break down the call times based on the callee or the context.

Call-graph profiler

Call graph

profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. However context is not preserved.

Event-based profilers

The programming languages listed here have event-based profilers:

Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

: the JVMTI
Java Virtual Machine Tools Interface
Java Virtual Machine Tool Interface was introduced in J2SE 5.0 . This interface allows a program to inspect the state and to control the execution of applications running in the Java Virtual Machine...

(JVM Tools Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers, for trapping events like calls, class-load, unload, thread enter leave.
.NET
.NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...

: Can attach a profiling agent as a COM server to the CLR using Profiling API. Like Java, the runtime then provides various callbacks into the agent, for trapping events like method JIT / enter / leave, object creation, etc. Particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways.
Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

: Python profiling includes the profile module, hotshot (which is call-graph based), and using the 'sys.setprofile' function to trap events like c_{call,return,exception}, python_{call,return,exception}.
Ruby
Ruby (programming language)
Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...

: Ruby also uses a similar interface like Python for profiling. Flat-profiler in profile.rb, module, and ruby-prof a C-extension are present.

Statistical profilers

Some profilers operate by sampling

Sampling (statistics)

In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....

. A sampling profiler probes the target program's program counter

Program counter

The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...

at regular intervals using operating system

Operating system

An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

interrupt

Interrupt

In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....

s. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed.

The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods.

In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as system call

System call

In computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...

processing.

Still, kernel code to handle the interrupts entails a minor loss of CPU cycles, diverted cache usage, and is unable to distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity).

Dedicated hardware can go beyond this: some recent MIPS processors JTAG interface have a PCSAMPLE register, which samples the program counter

Program counter

in a truly undetectable manner.

Some of the most commonly used statistical profilers are AMD CodeAnalyst

CodeAnalyst

AMD CodeAnalyst is a GUI-based code profiler for x86-based machines. CodeAnalyst has similar look and feel on both Linux and Microsoft Windows platforms...

, Apple Inc. Shark, gprof, Intel VTune

VTune

Intel VTune Amplifier XE is a commercial application for software performance analysis for 32 and 64-bit x86 based machines, and has both GUI and command line interfaces. It is available for both Linux and Microsoft Windows operating systems...

and Parallel Amplifier (part of Intel Parallel Studio

Intel Parallel Studio

Intel Parallel Studio is a software development product developed by Intel that plugs into the Microsoft Visual Studio Integrated Development Environment. Its purpose is to facilitate developing programs for parallel computing...

Instrumenting profilers

Some profilers instrument the target program with additional instructions to collect the required information.

Instrumenting the program can cause changes in the performance of the program, potentially causing inaccurate results and heisenbugs. Instrumenting will always have some impact on the program execution, typically always slowing it. However, instrumentation can be very specific and be carefully controlled to have a minimal impact. The impact on a particular program depends on the placement of instrumentation points and the mechanism used to capture the trace. Hardware support for trace capture means that on some targets, instrumentation can be on just one machine instruction. The impact of instrumentation can often be deducted (i.e. eliminated by subtraction) from the results.

gprof is an example of a profiler that uses both instrumentation and sampling. Instrumentation is used to gather caller information and the actual timing values are obtained by statistical sampling.

Instrumentation

Manual: Performed by the programmer, e.g. by adding instructions to explicitly calculate runtimes, simply count events or calls to measurement APIs such as the Application Response Measurement
Application Response Measurement
Application Response Measurement is an open standard published by the Open Group for monitoring and diagnosing performance bottlenecks within complex enterprise applications that use loosely-coupled designs or service-oriented architectures....

standard.
Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy.
Compiler assisted: Example: "gcc -pg ..." for gprof, "quantify g++ ..." for Quantify
Binary translation: The tool adds instrumentation to a compiled binary. Example: ATOM
Runtime instrumentation: Directly before execution the code is instrumented. The program run is fully supervised and controlled by the tool. Examples: Pin
Pin (computer program)
Pin is a dynamic binary instrumentation framework for the IA-32 and x86-64 instruction set architectures that enables the creation of dynamic program analysis tools...

, Valgrind
Valgrind
Valgrind is a GPL licensed programming tool for memory debugging, memory leak detection, and profiling. The name valgrind comes from the main entrance to Valhalla in Norse mythology....
Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions. Example: DynInst
DynInst
DynInst is a multi-platform runtime code-patching library developed at the University of Wisconsin–Madison and University of Maryland, College Park. It may be useful in the development of performance measurement tools, debuggers, and simulators. The most recent release is Version 7.0....

Interpreter instrumentation

Interpreter debug options can enable the collection of performance metrics as the interpreter encounters each target statement. A bytecode
Bytecode
Bytecode, also known as p-code , is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code...

, control table
Control table
Control tables are tables that control the program flow or play a major part in program control. There are no rigid rules concerning the structure or content of a control table - its only qualifying attribute is its ability to direct program flow in some way through its 'execution' by an associated...

or JIT
JIT
JIT may refer to:* Various meanings of Just In Time:** Just-in-time compilation - a technique for improving the performance of virtual machines in computing.** Just-in-time - a business inventory strategy.* Jabber ICQ Transport....

interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection opportunities.

Hypervisor/Simulator

Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisor
Hypervisor
In computing, a hypervisor, also called virtual machine manager , is one of many hardware virtualization techniques that allow multiple operating systems, termed guests, to run concurrently on a host computer. It is so named because it is conceptually one level higher than a supervisory program...

. Example: SIMMON
SIMMON
SIMMON was a proprietary software testing system developed in the late 1960s in the IBM Product Test Laboratory, then at Poughkeepsie, N.Y...
Simulator and Hypervisor: Data collected interactively and selectively by running the unmodified program under an Instruction Set Simulator
Instruction Set Simulator
An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...

. Examples: SIMON
SIMON (Batch Interactive test/debug)
SIMON was a proprietary test/debugging toolkit for interactively testing Batch programs designed to run on IBM's System 360/370/390 architecture....

and OLIVER.

External links

Article "Need for speed — Eliminating performance bottlenecks" on doing execution time analysis of Java applications using IBM Rational Application Developer
IBM Rational Application Developer
IBM Rational Application Developer for WebSphere Software is a commercial Eclipse-based integrated development environment , made by IBM's Rational Software division, for visually designing, constructing, testing, and deploying Web services, portals, and Java Enterprise Edition ...

.
Profiling Runtime Generated and Interpreted Code using the VTune Performance Analyzer

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.

Gathering program events

Use of profilers

History

Flat profiler

Call-graph profiler

Event-based profilers

Statistical profilers

Instrumenting profilers

Instrumentation

Interpreter instrumentation

Hypervisor/Simulator

See also

External links