SPSS

Overview
SPSS is a computer program used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, statistical analysis, and collaboration and deployment (batch and automated scoring services).

Statistics program


SPSS (originally, Statistical Package for the Social Sciences) was released in its first version in 1968 after being developed by Norman H. Nie and C. Hadlai Hull. Norman Nie was then a political science postgraduate at Stanford University, and is now Research Professor in the Department of Political Science at Stanford and Professor Emeritus of Political Science at the University of Chicago. SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been described as one of "sociology's most influential books". In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software.

Statistics included in the base software:
  • Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics
  • Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Nonparametric tests
  • Prediction for numerical outcomes: Linear regression
  • Prediction for identifying groups: Factor analysis, cluster analysis (two-step, K-means, hierarchical), Discriminant


The many features of SPSS are accessible via pull-down menus or can be programmed with a proprietary 4GL command syntax language. Command syntax programming has the benefits of reproducibility, simplifying repetitive tasks, and handling complex data manipulations and analyses. Additionally, some complex applications can only be programmed in syntax and are not accessible through the menu structure. The pull-down menu interface also generates command syntax; this can be displayed in the output, although the default settings have to be changed to make the syntax visible to the user. The generated syntax can also be pasted into a syntax file using the "paste" button present in each menu. Programs can be run interactively or unattended, using the supplied Production Job Facility. Additionally, a "macro" language can be used to write command language subroutines, and a Python programmability extension can access the information in the data dictionary and data and dynamically build command syntax programs. The Python programmability extension, introduced in SPSS 14, replaced the less functional SAX Basic "scripts" for most purposes, although SAX Basic remains available. In addition, the Python extension allows SPSS to run any of the statistics in the free software package R. From version 14 onwards, SPSS can be driven externally by a Python or a VB.NET program using supplied "plug-ins".
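
The idea of dynamically building command syntax from the data dictionary can be sketched in plain Python. This is only an illustration under stated assumptions: the DICTIONARY contents and the build_syntax helper are hypothetical, the sketch does not call SPSS itself, and in a real session the dictionary would be read through the programmability extension and the generated text passed back to SPSS to run.

```python
# Hypothetical data dictionary: variable name -> measurement level.
# In a real session this information would come from the SPSS data dictionary.
DICTIONARY = {
    "age": "scale",
    "income": "scale",
    "sex": "nominal",
    "region": "nominal",
}


def build_syntax(dictionary):
    """Build command syntax text: DESCRIPTIVES for scale variables,
    FREQUENCIES for all other variables."""
    scale = [name for name, level in dictionary.items() if level == "scale"]
    other = [name for name, level in dictionary.items() if level != "scale"]
    commands = []
    if scale:
        commands.append("DESCRIPTIVES VARIABLES=" + " ".join(scale) + ".")
    if other:
        commands.append("FREQUENCIES VARIABLES=" + " ".join(other) + ".")
    return "\n".join(commands)


# The result is ordinary command syntax text, one command per line.
syntax = build_syntax(DICTIONARY)
print(syntax)
```

Because the commands are generated from the dictionary rather than typed by hand, the same script can be reused on a datafile with different variables.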

SPSS places constraints on internal file structure, data types, data processing and matching files, which together considerably simplify programming. SPSS datasets have a 2-dimensional table structure where the rows typically represent cases (such as individuals or households) and the columns represent measurements (such as age, sex or household income). Only 2 data types are defined: numeric and text (or "string"). All data processing occurs sequentially case-by-case through the file. Files can be matched one-to-one and one-to-many, but not many-to-many.
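
A one-to-many match with sequential, case-by-case processing might look like the following Python sketch. It is not SPSS code: the households and persons tables, the hh_id key and the match_one_to_many helper are all hypothetical, chosen only to show why a repeated key in the lookup table (a many-to-many match) is rejected.

```python
# Hypothetical household table: one row per household, keyed by hh_id.
households = [
    {"hh_id": 1, "hh_income": 52000},
    {"hh_id": 2, "hh_income": 31000},
]

# Hypothetical person file: several cases may share one household.
persons = [
    {"hh_id": 1, "name": "Ann", "age": 41},
    {"hh_id": 1, "name": "Bob", "age": 39},
    {"hh_id": 2, "name": "Cy", "age": 67},
]


def match_one_to_many(cases, table, key):
    """Attach the single matching table row to every case."""
    lookup = {}
    for row in table:
        if row[key] in lookup:
            # A repeated key would turn this into a many-to-many match.
            raise ValueError(f"duplicate key in lookup table: {row[key]}")
        lookup[row[key]] = row
    merged = []
    for case in cases:  # one sequential pass, case by case
        merged.append({**case, **lookup.get(case[key], {})})
    return merged


result = match_one_to_many(persons, households, "hh_id")
for case in result:
    print(case)
```

Each person keeps its own row and gains the household income, so the output has as many rows as the person file.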

The graphical user interface has two views which can be toggled by clicking on one of the two tabs in the bottom left of the SPSS window. The 'Data View' shows a spreadsheet view of the cases (rows) and variables (columns). Unlike spreadsheets, the data cells can only contain numbers or text, and formulas cannot be stored in these cells. The 'Variable View' displays the metadata dictionary where each row represents a variable and shows the variable name, variable label, value label(s), print width, measurement type and a variety of other characteristics. Cells in both views can be manually edited, defining the file structure and allowing data entry without using command syntax. This may be sufficient for small datasets. Larger datasets such as statistical surveys are more often created in data entry software, or entered during computer-assisted personal interviewing, by scanning and using optical character recognition and optical mark recognition software, or by direct capture from online questionnaires. These datasets are then read into SPSS.
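
The split between the data cells and the metadata dictionary can be illustrated with a small Python sketch. The Variable class and its field names are hypothetical and only mirror some of the characteristics listed above; they are not an SPSS API.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One row of a hypothetical 'Variable View': metadata about a column."""
    name: str
    label: str
    var_type: str       # "numeric" or "string", the only two data types
    measure: str        # measurement type, e.g. "nominal" or "scale"
    value_labels: dict = field(default_factory=dict)

    def display(self, value):
        """Return the value label if one is defined, else the raw value."""
        return self.value_labels.get(value, value)


# The metadata lives in the dictionary entry, not in the data cells.
sex = Variable("sex", "Sex of respondent", "numeric", "nominal",
               {1: "Male", 2: "Female"})

# 'Data View' cells hold only raw numbers or text.
cases = [{"sex": 1}, {"sex": 2}, {"sex": 9}]
labelled = [sex.display(case["sex"]) for case in cases]
print(labelled)
```

The unlabelled code 9 is shown as entered, which is how a value without a label would appear.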

SPSS can read and write data from ASCII text files (including hierarchical files), other statistics packages, spreadsheets and databases. SPSS can read and write to external relational database tables via ODBC and SQL.

Statistical output is written to a proprietary file format (*.spv file, supporting pivot tables) for which, in addition to the in-package viewer, a stand-alone reader can be downloaded. The proprietary output can be exported to text or Microsoft Word. Alternatively, output can be captured as data (using the OMS command), as text, tab-delimited text, PDF, XLS, HTML, XML, SPSS dataset or a variety of graphic image formats (JPEG, PNG, BMP and EMF).

SPSS Server is a version of SPSS with a client/server architecture. It had some features not available in the desktop version, such as scoring functions (scoring functions are included in the desktop version from version 19).

Versions


Early versions of SPSS were designed for batch processing on mainframes, including for example IBM and ICL versions, originally using punched cards for input. A processing run read a command file of SPSS commands and either a raw input file of fixed format data with a single record type, or a 'getfile' of data saved by a previous run. To save precious computer time, an 'edit' run could be done to check command syntax without analysing the data. From version 10 (SPSS-X) in 1983, data files could contain multiple record types.
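
Reading fixed format data with a single record type can be sketched in Python as follows. The column layout, the sample records and the read_fixed helper are assumptions made for illustration; they are not taken from any SPSS manual.

```python
# Hypothetical layout: (variable name, first column, last column),
# with columns counted from 1 as on a punched card.
LAYOUT = [("id", 1, 3), ("age", 4, 5), ("sex", 6, 6)]

# Hypothetical raw input: one fixed-width record per case.
RAW = "001341\n002272\n"


def read_fixed(text, layout):
    """Turn fixed-width records into a list of cases (dicts of integers)."""
    cases = []
    for record in text.splitlines():
        case = {
            name: int(record[start - 1:end])  # convert 1-based columns to a slice
            for name, start, end in layout
        }
        cases.append(case)
    return cases


cases = read_fixed(RAW, LAYOUT)
for case in cases:
    print(case)
```

Every record is cut at the same column positions, which is what makes a single record type simple to process in one pass.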

SPSS version 16.0 runs under Windows, Mac OS X 10.5 and earlier, and Linux. The graphical user interface is written in Java. The Mac OS X version is provided as a Universal binary, making it fully compatible with both PowerPC and Intel-based Mac hardware.

Prior to SPSS 16.0, different versions of SPSS were available for Windows, Mac OS X and Unix. The Windows version was updated more frequently, and had more features, than the versions for other operating systems.

SPSS version 13.0 for Mac OS X was not compatible with Intel-based Macintosh computers, due to the Rosetta emulation software causing errors in calculations. SPSS 15.0 for Windows needed a downloadable hotfix to be installed in order to be compatible with Windows Vista.

Ownership history


Between 2009 and 2010, the flagship SPSS product was sold under the name PASW (Predictive Analytics SoftWare) Statistics. SPSS Inc. announced on July 28, 2009 that it was being acquired by IBM for US$1.2 billion. As of January 2010, it became "SPSS: An IBM Company". The complete transfer of business to IBM was finished by October 1, 2010, and on that date SPSS: An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM Corporation, and is one of the brands under IBM Software Group's Business Analytics Portfolio, together with IBM Cognos.

Add-ons


Add-on modules provide additional capabilities. The available modules are:
  • SPSS Programmability Extension (added in version 14). Allows Python, R, and .NET programming control of SPSS.
  • SPSS Data Preparation (added in version 14). Allows programming of logical checks and reporting of suspicious values.
  • SPSS Regression. Logistic regression, ordinal regression, multinomial logistic regression, and mixed models.
  • SPSS Advanced Models. Multivariate GLM and repeated measures ANOVA (removed from base system in version 14).
  • SPSS Decision Trees. Creates classification and decision trees for identifying groups and predicting behaviour.
  • SPSS Custom Tables. Allows user-defined control of output for reports.
  • SPSS Exact Tests. Allows statistical testing on small samples.
  • SPSS Categories
  • SPSS Forecasting
  • SPSS Conjoint
  • SPSS Missing Values. Simple regression-based imputation.
  • SPSS Complex Samples (added in version 12). Adjusts for stratification and clustering and other sample selection biases.
  • AMOS (Analysis of Moment Structures). Add-on that allows modeling of structural equations and covariance structures and path analysis, and also has more basic capabilities such as linear regression analysis, ANOVA and ANCOVA.

Release history

  • SPSS 15.0.1 - November 2006
  • SPSS 16.0.2 - April 2008
  • SPSS Statistics 17.0.1 - December 2008
  • PASW Statistics 17.0.3 - September 2009
  • PASW Statistics 18.0 - August 2009
  • PASW Statistics 18.0.1 - December 2009
  • PASW Statistics 18.0.2 - April 2010
  • PASW Statistics 18.0.3 - September 2010
  • IBM SPSS Statistics 19.0 - August 2010
  • IBM SPSS Statistics 20.0 - August 2011

See also


  • List of statistical packages
  • Comparison of statistical packages
  • PSPP - a free replacement for SPSS
  • gretl - an open source alternative to SPSS that can import SPSS data files
  • R Commander - an open source alternative to SPSS based on the R programming language

External links

  • Raynald Levesque's SPSS Tools - library of worked solutions for SPSS programmers (FAQ, command syntax, macros, scripts, Python)
  • Archives of SPSSX-L Discussion - SPSS Listserv active since 1996. Discusses programming, statistics and analysis
  • UCLA ATS Resources to help you learn SPSS - Resources for learning SPSS
  • UCLA ATS Technical Reports - Report 1 compares Stata, SAS and SPSS against R (R is a language and environment for statistical computing and graphics).
  • Using SPSS For Data Analysis - SPSS Tutorial from Harvard
  • SPSS Developer Central - Support for developers of applications using SPSS, including materials and examples of the Python programmability feature