SOFA Statistics
SOFA Statistics is an open-source statistical package, with an emphasis on ease of use, learn as you go, and beautiful output. The name stands for Statistics Open For All. It has a graphical user interface and can connect directly to MySQL, PostgreSQL, SQLite, MS Access, and Microsoft SQL Server. Data can also be imported from CSV files or spreadsheets (Microsoft Excel, OpenOffice.org Calc, Gnumeric, Google Docs). The main statistical tests available are independent and paired t-tests, Wilcoxon signed ranks, Mann–Whitney U, Pearson's chi-squared, Kruskal–Wallis H, one-way ANOVA, Spearman's R, and Pearson's R. Nested tables can be produced with row and column percentages, totals, standard deviation, mean, median, and sum. Simple but dynamic bar charts (frequencies or means), clustered bar charts (frequencies or means), pie charts, single or multiple line charts (frequencies or means), area charts (frequencies or means), histograms, scatterplots, and box and whisker plots are available. It is also possible to create chart series.
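
As a rough illustration of what a table with row percentages involves, the following sketch cross-tabulates two columns read from an SQLite database using only Python's standard library. It is not SOFA Statistics code: the function name `row_percentages`, the `survey` table, and its columns are hypothetical and exist only for this example.

```python
import sqlite3
from collections import Counter


def row_percentages(conn, table, row_var, col_var):
    """Cross-tabulate two columns; return {(row, col): (count, row %)}."""
    # Identifiers are interpolated into the SQL text, so they must come
    # from trusted code, never from user input.
    query = f'SELECT "{row_var}", "{col_var}" FROM "{table}"'
    counts = Counter(conn.execute(query).fetchall())

    # Total number of records in each row category.
    row_totals = Counter()
    for (row_val, _col_val), n in counts.items():
        row_totals[row_val] += n

    return {
        (row_val, col_val): (n, round(100.0 * n / row_totals[row_val], 1))
        for (row_val, col_val), n in counts.items()
    }


# Hypothetical demo data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (gender TEXT, agegroup TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [
        ("F", "<30"), ("F", "<30"), ("F", "30+"), ("F", "30+"),
        ("M", "<30"), ("M", "30+"), ("M", "30+"), ("M", "30+"),
    ],
)

table = row_percentages(conn, "survey", "gender", "agegroup")
for key in sorted(table):
    print(key, table[key])
```

Each cell holds the count and its share of the row total, so the percentages within one row category sum to 100.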

Installation packages are available for Microsoft Windows, Ubuntu, Linux Mint, and Mac OS X (Leopard and Snow Leopard).

SOFA Statistics is written in Python, and the widget toolkit used is wxPython. The statistical analyses are based on functions available through the SciPy stats module.
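
To give a sense of what two of the listed statistics compute, the sketch below calculates Pearson's correlation coefficient and the independent-samples t statistic by hand, using only Python's standard library. It is an illustration under stated assumptions, not SOFA Statistics code: the function names are hypothetical, the t statistic assumes equal variances (pooled), and no p-value is computed, since that requires the t distribution that a library such as SciPy supplies.

```python
import math
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def independent_t(a, b):
    """Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance (equal-variance assumption).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly linear data
t, df = independent_t([1, 2, 3], [4, 5, 6])    # two small made-up groups
print(r, t, df)
```

In practice a statistics library would also return the p-value and handle edge cases such as constant input, which this sketch does not.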

Analysis and reporting can be automated using Python scripts – either exported directly from SOFA Statistics or manually written.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.