SAS System
SAS is an integrated system of software products provided by SAS Institute Inc. that enables programmers to perform:
  • data retrieval, management, and mining
  • report writing and graphics
  • statistical analysis
  • business planning, forecasting, and decision support
  • operations research and project management
  • quality improvement
  • applications development
  • data warehousing (extract, transform, load)
  • platform-independent and remote computing


In addition, SAS offers many business solutions for large-scale software deployments in areas such as IT management, human resource management, financial management, business intelligence, customer relationship management and more.

Description

SAS is driven by SAS programs, which define a sequence of operations to be performed on data stored as tables. Although non-programmer graphical user interfaces to SAS exist (such as the SAS Enterprise Guide), these GUIs are most often merely a front-end that automates or facilitates the generation of SAS programs. The functionalities of SAS components are intended to be accessed via application programming interfaces, in the form of statements and procedures.

A SAS program has three major parts:
  1. the DATA step
  2. procedure steps (effectively, everything that is not enclosed in a DATA step)
  3. a macro language


SAS Library Engines and Remote Library Services allow access to data stored in external data structures and on remote computer platforms.

The DATA step section of a SAS program, like other database-oriented fourth-generation programming languages such as SQL or Focus, assumes a default file structure, and automates the process of identifying files to the operating system, opening the input file, reading the next record, opening the output file, writing the next record, and closing the files. This allows the user/programmer to concentrate on the details of working with the data within each record, in effect working almost entirely within an implicit program loop that runs for each record.
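The implicit per-record loop can be pictured with a short Python sketch. It is only an analogy, not SAS code and not a description of SAS internals: the function name `data_step`, the CSV layout, and the `bmi` calculation are all assumptions made up for illustration.

```python
import csv
import io


def data_step(infile, outfile, process_record):
    """Mimic the bookkeeping a DATA step automates: read each input
    record, hand it to the user's logic, and write the result."""
    reader = csv.DictReader(infile)
    writer = None
    for record in reader:                      # the implicit loop over records
        result = process_record(dict(record))  # user logic sees one record at a time
        if result is None:                     # returning None drops the record
            continue
        if writer is None:                     # header is written once, before the first row
            writer = csv.DictWriter(outfile, fieldnames=list(result))
            writer.writeheader()
        writer.writerow(result)


def add_bmi(rec):
    # Hypothetical per-record logic: derive a new variable from two inputs.
    height_m = float(rec["height_cm"]) / 100
    rec["bmi"] = round(float(rec["weight_kg"]) / height_m ** 2, 1)
    return rec


source = io.StringIO("name,height_cm,weight_kg\nAnn,170,65\nBob,180,81\n")
target = io.StringIO()
data_step(source, target, add_bmi)
output_text = target.getvalue()
print(output_text)
```

The user-supplied function `add_bmi` never opens, reads, or closes a file; it only describes what happens to a single record, which is the division of labour the paragraph above attributes to the DATA step.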

All other tasks are accomplished by procedures that operate on the data set (SAS' terminology for "table") as a whole. Typical tasks include printing or performing statistical analysis, and may require the user/programmer only to identify the data set. Procedures are not restricted to only one behavior and thus allow extensive customization, controlled by mini-languages defined within the procedures. SAS also has an extensive SQL procedure, allowing SQL programmers to use the system with little additional knowledge.

Macro programming extensions allow repetitive sections of a program to be rationalized. Proper imperative and procedural programming constructs can be simulated by use of the "open code" macros or the Interactive Matrix Language SAS/IML component.

Macro code in a SAS program, if any, undergoes preprocessing. At run time, DATA steps are compiled and procedures are interpreted and run in the sequence they appear in the SAS program. A SAS program requires the SAS software to run.
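The idea of preprocessing can be illustrated with a toy Python sketch that rewrites program text before anything is executed. It models only the simplest case, replacing `&name` references (optionally ended by a dot) with stored values; the real macro facility is far richer. The function name `resolve_macro_variables` and the sample program text are assumptions for illustration.

```python
import re


def resolve_macro_variables(source, symbols):
    """Replace &name (or &name.) references with their values.

    A toy stand-in for preprocessing: the program text is rewritten
    before any step is compiled or interpreted.
    """
    def substitute(match):
        name = match.group(1).lower()
        # Leave unknown references untouched rather than guess a value.
        return symbols.get(name, match.group(0))

    # &name, optionally followed by one dot that terminates the reference
    return re.sub(r"&([A-Za-z_]\w*)\.?", substitute, source)


program = "proc print data=&lib..sales_&year; run;"
resolved = resolve_macro_variables(program, {"lib": "work", "year": "2010"})
print(resolved)
```

In this sketch the first dot after `&lib` is consumed as the end of the reference and the second dot survives as ordinary text, so the rewritten program refers to `work.sales_2010`.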

Compared to general-purpose programming languages, this structure allows the user/programmer to concentrate less on the technical details of the data and how it is stored, and more on the information contained in the data. This blurs the line between user and programmer, appealing to individuals who fall more into the 'business' or 'research' area and less in the 'information technology' area, since SAS does not enforce (although it recommends) a structured, centralized approach to data and infrastructure management.

SAS runs on IBM mainframes, Unix, Linux, OpenVMS Alpha, and Microsoft Windows. Code can be moved "almost" transparently between these environments. Older versions have supported PC-DOS, the Apple Macintosh, VMS, VM/CMS, PrimeOS, Data General AOS and OS/2.

Early history

SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate student from 1962 to 1964, Barr had created an analysis of variance modeling language inspired by the notation of statistician Maurice Kendall, followed by a multiple regression program that generated machine code for performing algebraic transformations of the raw data. Drawing on those programs and his experience with structured data files, he created SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr developed the fundamental structure and language of SAS.

In January 1968, Barr and James Goodnight collaborated, integrating new multiple regression and analysis of variance routines developed by Goodnight into Barr's framework. Goodnight's routines made the handling of basic statistical analysis more robust, and his later implementation (in SAS 76) of the general linear model increased the analytical power of the system. By 1971, SAS was gaining popularity within the academic community. One strength of the system was analyzing experiments with missing data, which was useful to the pharmaceutical and agricultural industries, among others.

In 1973, John Sall joined the project, making extensive programming contributions in econometrics, time series, and matrix algebra. Other participants in the early years included Caroll G. Perkins, Jolayne W. Service, and Jane T. Helwig. Perkins made programming contributions. Service and Helwig created the early documentation.

In 1976, SAS Institute, Inc. was incorporated by Barr, Goodnight, Sall, and Helwig.

Components

SAS consists of a number of components, which organizations separately license and install as required.

Base SAS: The core of SAS, the so-called Base SAS Software, manages data. SAS procedures software analyzes and reports the data. The SQL procedure allows SQL (Structured Query Language) programming in lieu of DATA step and procedure programming. Library Engines allow transparent access to common data structures such as Oracle, as well as pass-through of SQL to be executed by such data structures. The Macro facility is a tool for extending and customizing SAS software programs and reducing overall program verbosity. The DATA step debugger is a programming tool that helps find logic problems in DATA step programs. The Output Delivery System (ODS) is an extendable system that delivers output in a variety of formats, such as SAS data sets, listing files, RTF, PDF, XML, or HTML. The SAS windowing environment is an interactive, graphical user interface used to run and test SAS programs.

BI Dashboard: A plugin for Information Delivery Portal. It allows the user to create various graphics that represent a broad range of data. This allows a quick glance to provide a lot of information, without having to look at all the underlying data.

Data Integration Studio: Provides extract, transform, load (ETL) services.

SAS Enterprise Business Intelligence Server: Includes both a suite of business intelligence (BI) tools and a platform to provide uniform access to data. The goal of this product is to compete with Business Objects' and Cognos' offerings.

Enterprise Computing Offer (ECO): Not to be confused with Enterprise Guide or Enterprise Miner, ECO is a product bundle.

Enterprise Guide: SAS Enterprise Guide is a Microsoft Windows client application that provides a guided mechanism to use SAS and publish dynamic results throughout an organization in a uniform way. It is marketed as the default interface to SAS for business analysts, statisticians, and programmers. Though Data Integration Studio is the true ETL tool of SAS, Enterprise Guide can be used for the ETL of smaller projects.

Enterprise Miner: A data mining tool.

Information Delivery Portal: Allows users to set up personalized homepages where they can view automatically generated reports, dashboards, and other SAS data structures.

Information Map Studio: A client application that helps with building information maps.

OLAP Cube Studio: A client application that helps with building OLAP Cubes.

SAS Web OLAP Viewer for Java: Web-based application for viewing OLAP cubes and data explorations. (Discontinued as of November 2010.)

SAS Web OLAP Viewer for .NET:

SAS/ACCESS: Provides the ability for SAS to transparently share data with non-native data sources.

SAS/ACCESS for PC Files: Allows SAS to transparently share data with personal computer applications including MS Access and Microsoft Office Excel.

SAS Add-In for Microsoft Office: A component of the SAS Enterprise Business Intelligence Server, designed to provide access to data, analysis, reporting and analytics for non-technical workers (such as business analysts, power users, domain experts and decision makers) via menus and toolbars integrated into Office applications.

SAS/AF: Applications facility, a set of application development tools to create customized desktop GUI applications; a library of drag-and-drop widgets is available; widgets and models are fully object oriented; SCL programs can be attached as needed.

SAS/SCL: SAS Component Language, allows programmers to create and compile object-oriented programs. Uniquely, SAS allows objects to submit and execute Base/SAS and SAS/Macro statements.

SAS/ASSIST: An early point-and-click interface to SAS, since superseded by SAS Enterprise Guide and its client–server architecture.

SAS/C

SAS/CALC: A discontinued spreadsheet application, released in version 6 for mainframes and PCs and not carried into later versions.

SAS/CONNECT: Provides the ability for SAS sessions on different platforms to communicate with each other.

SAS/DMI: A programming interface between interactive SAS and ISPF/PDF applications. Obsolete since version 5.

SAS/EIS: A menu-driven system for developing, running, and maintaining enterprise information systems.

SAS/ETS: Provides econometrics and time series analysis.

SAS/FSP: Allows interaction with data using integrated tools for data entry, computation, query, editing, validation, display, and retrieval.

SAS/GIS: An interactive desktop Geographic Information System for mapping applications.

SAS/GRAPH: Although Base SAS includes primitive graphing capabilities, SAS/GRAPH is needed for charting on graphical media.

SAS/IML: Matrix-handling SAS script extensions.

SAS/INSIGHT: Dynamic tool for data mining - allows examination of univariate distributions, visualization of multivariate data, and model fitting using regression, analysis of variance, and the generalized linear model.

SAS/Integration Technologies: Allows the SAS System to use standard protocols, like LDAP for directory access, CORBA and Microsoft's COM/DCOM for inter-application communication, as well as message-oriented middleware like Microsoft Message Queuing and IBM WebSphere MQ. Also includes SAS' proprietary client–server protocols used by all SAS clients.

SAS/IntrNet: Extends SAS' data retrieval and analysis functionality to the Web with a suite of CGI and Java tools.

SAS/LAB: Superseded by SAS Enterprise Guide.

SAS/OR: Operations research.

SAS/PH-Clinical: Defunct product.

SAS/QC: Quality control; provides quality improvement tools.

SAS/SHARE: A data server that allows multiple users to gain simultaneous access to SAS files.

SAS/SHARE*NET: Discontinued and now part of SAS/SHARE. It allowed a SAS/SHARE data server to be accessed from non-SAS clients, such as JDBC- or ODBC-compliant applications.

SAS/SPECTRAVIEW: Allows visual exploration of large amounts of data. Once the system has plotted the data in a 3D space, users can then visualise it by creating envelope surfaces, cutting planes, etc., which can be animated depending on a fourth parameter (time, for example).

SAS/STAT: Statistical Analysis with a number of procedures, providing statistical information such as analysis of variance, regression, multivariate analysis, and categorical data analysis. Note for example the GLIMMIX procedure.

SAS/TOOLKIT

SAS/Warehouse Administrator: Superseded in SAS 9 by SAS ETL Server.

SAS Web Report Studio: Part of the SAS Enterprise Business Intelligence Server, provides access to query and reporting capabilities on the Web. Aimed at non-technical users.

SAS Financial Management: Budgeting, planning, financial reporting and consolidation.

SAS Activity Based Management: Cost and revenue modeling.

SAS Strategy Management (formerly Strategic Performance Management): Collaborative scorecards.

SAS Scalable Performance Data Server (SPDS): Distributed data system offering increased performance; Data processing server.

Terminology

Where many other languages refer to tables, rows, and columns/fields, SAS uses the terms data sets, observations, and variables (although some of the GUI applications are not consistent with these terms and sometimes refer to columns and rows). There are only two kinds of variables in SAS: numeric and character (string). By default, all numeric variables are stored as 8-byte real (floating-point) values; precision can be reduced in external storage only. Date and datetime variables are numeric variables that follow the C tradition of counting from a fixed starting point, which in SAS is 1 January 1960: they are stored as the number of days (for date variables) or seconds (for datetime variables) since that point.
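As a minimal sketch of how date and datetime values behave as ordinary numbers, the following DATA step writes a few values to the log. The variable names are invented for illustration.

```sas
data _null_;
  /* Date literals are numbers: days counted from 1 January 1960. */
  day_zero = '01JAN1960'd;
  next_day = '02JAN1960'd;

  /* Datetime literals are seconds counted from midnight, 1 January 1960. */
  one_minute = '01JAN1960:00:01:00'dt;

  /* Without a format, the raw counts are written to the log. */
  put day_zero= next_day= one_minute=;

  /* A format changes only the display, not the stored value. */
  put next_day= date9.;
run;
```

The first PUT statement writes the raw counts (0, 1 and 60); the second writes the same stored value for next_day as a readable date.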

Features

  • Read and write different file formats.
  • Process data in different formats.
  • SAS programming language, often described as a 4th-generation programming language. SAS DATA steps are written in a 3rd-generation procedural language very similar to PL/I; SAS PROCs, especially PROC SQL, are non-procedural and therefore better fit the definition of a 4GL.
  • WHERE filtering available in DATA steps and PROCs; based on SQL WHERE clauses, incl. operators like LIKE and BETWEEN/AND.
  • Built-in statistical and random number functions.
  • Functions for manipulating character and numeric variables. Version 9 includes Perl Regular Expression processing.
  • System of formats and informats. These control representation and categorization of data and may be used within DATA step programs in a wide variety of ways. Users can create custom formats, either by direct specification or via an input dataset.
  • Comprehensive date- and time-handling functions; a variety of formats to represent date and time information without transformation of underlying values.
  • Interaction with database products through a subset of SQL (and the ability to use SQL internally to manipulate SAS data sets). Almost all SAS functions and operators are available in PROC SQL.
  • SAS/ACCESS modules allow communication with databases (including databases accessible via ODBC); in most cases, database tables can be viewed as though they were native SAS data sets. As a result, applications may combine data from many platforms without the end-user needing to know details of or distinctions between data sources.
  • Direct output of reports to CSV, HTML, PCL, PDF, PostScript, RTF, XML, and more using the Output Delivery System. Templates, custom tagsets, styles (incl. CSS) and other markup tools are available and fully programmable.
  • Interaction with the operating system (for example, pipelining on Unix and Windows, and DDE on Windows).
  • Fast development time, particularly from the many built-in procedures, functions, in/formats, the macro facility, etc.
  • An integrated development environment.
  • Dynamic data-driven code generation using the SAS Macro language.
  • Can process files containing millions of rows and thousands of columns of data.
  • University research centers often offer SAS code for advanced statistical techniques, especially in fields such as Political Science, Economics and Business Administration.
  • Large user community supported by SAS Institute. Users have a say in future development, e.g. via the annual SASWare Ballot.
  • SAS Text Miner was rated as the third most used text mining software (9%) by Rexer's Annual Data Miner Survey in 2010.
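Two of the features listed above, user-defined formats and SQL-style WHERE filtering, can be sketched together as follows. This is only an illustration: the data set work.visits, its variables, and the format name chargefmt are hypothetical and do not come from any SAS product.

```sas
/* Hypothetical sample data, invented for this sketch. */
data work.visits;
  input patient $ charge;
  datalines;
Adams 50
Baker 150
Burns 450
Clark 900
;
run;

/* A user-defined format that groups a numeric value into categories. */
proc format;
  value chargefmt
    low -< 100 = 'Low'
    100 -< 500 = 'Medium'
    500 - high = 'High';
run;

/* WHERE uses SQL-style operators (LIKE, BETWEEN/AND).              */
/* The FORMAT statement makes PROC FREQ count by category, while    */
/* the stored values of CHARGE remain unchanged.                    */
proc freq data=work.visits;
  where patient like 'B%' and charge between 100 and 500;
  tables charge;
  format charge chargefmt.;
run;
```

Because the format is applied only at reporting time, the same data set can be summarized under different groupings without creating new variables.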

Example SAS code

SAS uses data steps and procedures to analyze and manipulate data. By default, a data step iterates through each observation in a data set (like every row in a SQL table).

This data step creates a new data set BBB that includes those observations from data set AAA that had charges greater than 100.

data BBB;
set AAA;
where charge > 100;
run;

SAS makes available procedures that can summarize data. The FREQ procedure (proc freq) shows a frequency distribution of a given variable in a data set.

proc freq data=BBB;
table charge;
run;

SAS also allows direct subsetting of rows and/or columns of the data used as input to a procedure. The two previous examples could be replaced by the following:

proc freq data=AAA;
where charge > 100;
table charge;
run;

The same program could produce a data set containing the frequency distribution:
...
table charge / out=charge_freq;
...

The SAS Macro Language enables such features as conditional execution of SAS language components either across multiple data-steps and proc-steps, or within a single such step. One can think of it as a "code-generator", although it can also be used merely to establish static values that can be reused throughout the program, and altered as needed. For instance, the above example could be re-used in many pieces of code by rewriting it as a macro:

%macro freqtable (table, variable);
proc freq data = &table;
table &variable;
run;
%mend freqtable;

%freqtable (BBB, charge)

Further, other macro variables could be used both for conditional execution and to modify the functionality of the step, as shown below. The first macro is modified to include a new parameter, limitObs, which, if supplied, limits the number of observations read before the frequency analysis is performed. A second macro provides overall program control, including a flag indicating whether the frequency analysis should be performed at all.

%macro freqtable (table, variable, limitObs);
proc freq data = &table
%if .&limitObs ne . %then (obs=&limitObs);
; /* End of PROC FREQ statement */
table &variable;
run;
%mend freqtable;

%macro wrapper(myTable, myVariable, limitObs, doFreq);
/* Perform other proc-steps and data-steps */
%if &doFreq=Y %then %freqtable(&mytable, &myVariable, &limitObs);
%mend wrapper;

%wrapper(work.test, CLASS, 20, Y)

SAS also features SQL, usable to create, modify or query SAS datasets or external database tables accessed with a SAS libname engine. For example, duplicate records could be extracted from a table for analysis:

proc sql;
create table dup_recs
as select *
from your_dataset
group by id
having count(*) > 1
;
quit;

The proc print procedure allows the user to display information in ways not possible using only the SQL SELECT statement.

proc print data=BBB;
run;

SAS features SCL (SAS Component Language), which can be used to create object-oriented programs. SCL programs provide a robust library of features not available in Base SAS or the SAS Macro Language:

class arrays;
public num supplyChain [*,*,*,*];
eventhandler runInterface / (sender='*', event='prepack for singles and bulk');

runInterface: method;
call send(_self_, 'step1');
call send(_self_, 'step2');
* ---;
call send(_self_, 'step99');
endmethod;

step1: method / (description='initialize array: suppliers, distro-centers, stores, prepack options');
supplyChain=makearray(34000, 15, 3207, 10);
endmethod;

step2: method / (description='load data');
* code cut;
endmethod;

step99: method / (description='print results');
submit continue;
proc print data=work.results;
run;
endsubmit;
endmethod;
endclass;

SAS 71

SAS 71 represents the first limited release of the system. The first manual for SAS was printed at this time, approximately 60 pages long. The DATA step was implemented. Regression and analysis of variance were the main uses of the program.

SAS 72

This more robust release was the first to achieve wide distribution. It included a substantial user's guide, 260 pages in length. The MERGE statement was introduced in this release, adding the ability to perform a database JOIN on two data sets. This release also introduced the comprehensive handling of missing data.

SAS 76

SAS 76 was a complete system-level rewrite, featuring an open architecture for adding and extending procedures, and for extending the compiler. The INPUT and INFILE statements were significantly enhanced to read virtually all data formats in use on the IBM mainframe. Report generation was added through the PUT and FILE statements. The capacity to analyze general linear models was added.

79.3 - 82.4

1980 saw the addition of SAS/GRAPH, a graphing component; and SAS/ETS for econometric and time-series analysis. In 1981 SAS/FSP followed, providing full-screen interactive data entry, editing, browsing, retrieval, and letter writing.

In 1983 full-screen spreadsheet capabilities were introduced (PROC FSCALC).

For IBM mainframes, SAS 82 no longer required SAS databases to have direct access organization (DSORG=DAU), because SAS 82 removed location-dependent information from databases. This permitted SAS to work with datasets on tape and other media besides disk.

Version 4 series

In the early 1980s, SAS Institute released Version 4, the first version for non-IBM computers. It was written mostly in a subset of the PL/I language, to run on several minicomputer manufacturers' operating systems and hardware: Data General's AOS/VS, Digital Equipment's VAX/VMS, and Prime Computer's PRIMOS. The version was colloquially called "Portable SAS" because most of the code was portable, i.e., the same code would run under different operating systems.

Version 6 series

Version 6 represented a major milestone for SAS. While it appeared superficially similar to the user, major changes occurred "under the hood": the software was rewritten. After its FORTRAN origins, followed by PL/I and mainframe assembly language, SAS was rewritten in C for version 6, to provide enhanced portability between operating systems, as well as access to an increasing pool of C programmers compared to the shrinking pool of PL/I programmers.

This was the first version to run on UNIX, MS-DOS and Windows platforms. The DOS versions were incomplete implementations of the Version 6 spec: some functions and formats were unavailable, as were SQL and related items such as indexing and WHERE subsetting. DOS memory limitations restricted the size of some user-defined items.

The mainframe version of SAS 6 changed the physical format of SAS databases from "direct files" (DSORG=DA) to standard blocked physical sequential files (DSORG=PS,RECFM=FS), accessed through a customized EXCP macro rather than BSAM or QSAM; BDAM had been used through version 5, until the complete rewrite for version 6. The practical benefit of this change is that a SAS 6 database can be copied from any media with any copying tool, including IEBGENER, which uses BSAM.

In 1984 a project management component was added (SAS/PROJECT).

In 1985 SAS/AF software, an econometrics and time series analysis component (SAS/ETS), and interactive matrix programming software (SAS/IML) were introduced. MS-DOS SAS (version 6.02) was introduced, along with a link to mainframe SAS.

In 1986 a statistical quality improvement component was added (SAS/QC software); SAS/IML and SAS/STAT software were released for personal computers.

1987 saw concurrent update access provided for SAS data sets with SAS/SHARE software. Database interfaces were introduced for DB2 and SQL/DS.

In 1988 SAS introduced the concept of MultiVendor Architecture (MVA); SAS/ACCESS software was released. Support for UNIX-based hardware was announced. SAS/ASSIST software for building user-friendly front-end menus was introduced. New SAS/CPE software established SAS as an innovator in computer performance evaluation. Version 6.03 for MS-DOS was released.

6.06 for MVS, CMS, and OpenVMS was announced in 1990. The same year, the last MS-DOS version (6.04) was released.

Data visualization capabilities were added in 1991 with SAS/INSIGHT software.

In 1992 SAS/CALC, SAS/TOOLKIT, SAS/PH-Clinical, and SAS/LAB software were released.

In 1993 software for building customized executive information systems (EIS) was introduced. Release 6.08 for MVS, CMS, VMS, VSE, OS/2, and Windows was announced.

1994 saw the addition of ODBC support, plus the SAS/SPECTRAVIEW and SAS/SHARE*NET components.

6.09 saw the addition of a data step debugger.

6.09E for MVS.

6.10 in 1995 was a Microsoft Windows release and the first release for the Apple Macintosh. Version 6 was the first and last series to run on the Macintosh. JMP, also produced by the SAS Institute, is the software package the company produces for the Macintosh.

Also in 1995, 6.11 (codenamed Orlando) was released for Windows 95, Windows NT, and UNIX.

6.12 (Some of the following milestones in this sub-section may belong under version 7 or 8.)

In 1996 SAS announced Web enablement of SAS software and introduced the Scalable Performance Data Server.

In 1997 SAS/Warehouse Administrator and SAS/IntrNet software went into production.

1998 saw SAS introduce a customer relationship management (CRM) solution and an ERP access interface, the SAS/ACCESS interface for SAP R/3. SAS was also the first to release OLE-DB for OLAP, and released a HOLAP solution. Balanced scorecard, SAS/Enterprise Reporter, and HR Vision were released, along with the first release of SAS Enterprise Miner.

1999 saw the releases of HR Vision software, the first end-to-end decision-support system for human resources reporting and analysis, and Risk Dimensions software, an end-to-end risk-management solution. MS-DOS versions were abandoned because of Y2K issues and lack of continued demand.

In 2000 SAS shipped Enterprise Guide and ported its software to Linux.

Version 7 series

The Output Delivery System debuted in version 7; as did long variable names (from 8 to 32 characters); storage of long character strings in variables (from 200 to 32,767); and a much improved built-in text editor, the Enhanced Editor.

Version 7 saw the synchronisation of features between the various platforms for a particular version number (which previously hadn't been the case).

Version 7 foreshadowed version 8. It was believed in the SAS users community, although never officially confirmed, that in releasing version 7 SAS Institute released a snapshot from their development on version 8 to meet a deadline promise. To some, SAS Institute recommending that sites wait until version 8 before deploying the new software was a confirmation of this.

Version 8 series

Released about 1999; 8.0, 8.1 and 8.2 were Unix, Linux, Microsoft Windows, CMS (z/VM) and z/OS releases. Key features: long variable names, Output Delivery System (ODS).

SAS 8.1 was released in 2000.

SAS 8.2 was released in 2001.

Version 9 series

SAS 9.2 is the latest release (March 2008) and was demonstrated at SAS Global Forum (previously called SUGI) 2008. A list of features added to this release of SAS can be seen at the "What's New in SAS" web page.

SAS 9.2 will be released incrementally in three phases:

1) MVA-based products, e.g. SAS/BASE, SAS/STAT, SAS/GRAPH. Nothing that relies on metadata. Limited availability from March 2008, because most users rely on the Metadata Server (see Phase 2) or on products released in Phase 3.

2) Enterprise Intelligence Platform. Metadata Server for Business Intelligence (BI) and Data Integration. Availability from March 2009.

3) Client software for metadata-driven analytics and business solutions: Enterprise Miner, Text Miner, Model Manager. Solutions include Financial, Retail, Health & Life Science. Availability unknown, probably 2nd quarter 2009.

Version 9 makes additions to base SAS. The new hash object now allows functionality similar to the MERGE statement without sorting data or building formats. The function library was enlarged, and many functions have new parameters. Perl Regular Expressions are now supported, as opposed to the old "Regular Expression" facility, which was incompatible with most other implementations of Regular Expressions. Long format names are now supported.
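The hash object lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data sets work.prices and work.orders and their variables are hypothetical, and the pattern shown is one common idiom rather than the only way to use a hash object. Note that work.orders is not sorted by the key.

```sas
/* Hypothetical lookup table. */
data work.prices;
  input id price;
  datalines;
1 10
2 20
;
run;

/* Hypothetical transactions, deliberately not sorted by ID. */
data work.orders;
  input order_no id;
  datalines;
101 2
102 1
103 3
;
run;

data work.priced_orders;
  /* Never executed; only adds ID and PRICE to the program data vector. */
  if 0 then set work.prices;

  /* Load the lookup table into memory once, on the first iteration. */
  if _n_ = 1 then do;
    declare hash lookup(dataset: 'work.prices');
    lookup.defineKey('id');
    lookup.defineData('price');
    lookup.defineDone();
  end;

  set work.orders;

  /* FIND returns 0 when the key exists, and then fills PRICE. */
  if lookup.find() = 0 then output;
run;

/* Perl regular expressions: PRXMATCH returns the position of the */
/* first match, or 0 when the pattern does not match.             */
data _null_;
  position = prxmatch('/\d{3}-\d{4}/', 'call 555-1234');
  put position=;
run;
```

In this sketch, orders whose id has no entry in the lookup table (order 103 here) are dropped, which resembles an inner join; writing every observation regardless of the FIND result would instead resemble a left join.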

Criticism

The Base SAS component had been criticized for its poor graphics when compared with other statistical software packages. With the release of the Output Delivery System (ODS) for Statistical Graphics extension in SAS 7, and with the use of the SAS/GRAPH component, the graphics have improved significantly. The development tools provided (which include the Enhanced Editor, log, DATA step debugger, and SCL debugger) are also outdated compared to what other development environments provide. Debugging tools are especially lacking. Finding bugs in modern SAS programs that use many macros can be complex: SAS will often not report the correct line number when reporting an error, because diagnostic messages refer to the expanded macro code.

Competitors

  • R
  • SPSS
  • Stata
  • STATISTICA
  • WPS (World Programming System)
  • XLSTAT



The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 