Statistical database
Encyclopedia
A statistical database is a database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

 used for statistical
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

 analysis purposes. It is an OLAP
OLAP
In computing, online analytical processing, or OLAP , is an approach to swiftly answer multi-dimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining...

 instead of OLTP system, although this term precedes that modern decision, and classical statistical databases are often closer to the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

 than the multidimensional model commonly used in OLAP
OLAP
In computing, online analytical processing, or OLAP , is an approach to swiftly answer multi-dimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining...

 systems today.

Statistical databases often incorporate support for advanced statistical analysis techniques, such as correlations, which go beyond SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....

. They also pose unique security
Security
Security is the degree of protection against danger, damage, loss, and crime. Security as a form of protection are structures and processes that provide or improve security as a condition. The Institute for Security and Open Methodologies in the OSSTMM 3 defines security as "a form of protection...

concerns, which were the focus of much research, particularly in the late 1970s and early to mid 1980s.

Security in statistical databases

In a statistical database, it is often desired to allow query access only to aggregate datas, not individual records. Securing such a database is a difficult problem, since intelligent users can use a combination of aggregate queries to derive information about a single individual.

Some common approaches are:
  • only allowing aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
  • rather than returning exact values for sensitive data like income, only return which partition it belongs to (e.g. 35k-40k)
  • return imprecise counts (e.g. rather than 141 records met query, only indicate 130-150 records met it.)
  • don't allow overly selective WHERE clauses
  • audit all users queries, so users using system incorrectly can be investigated
  • use intelligent agents to detect automatically inappropriate system use


Research in this area has largely stalled; reference 3 below showed that, in general, securing statistical databases was an impossible aim: if they were open to legitimate use, they were also open to abuse; and if they were restricted so tightly as to be incapable of abuse, they would then be useless for practical statistical purposes. To quote:
The conclusion is that statistical databases are almost always subject to compromise. Severe restrictions on allowable query set sizes will render the database useless as a source of statistical information but will not secure the confidential records.

Some further reading

Statistical and Scientific Database Management (SSDBM) An important series of conferences in this field

Some key papers in this field:
  1. - Dorothy E. Denning, Secure statistical databases with random sample queries, ACM Transactions on Database Systems (TODS), Volume 5, Issue 3 (September 1980), Pages: 291 - 315
  2. - Wiebren de Jonge, Compromising statistical databases responding to queries about means, ACM Transactions on Database Systems, Volume 8, Issue 1 (March 1983), Pages: 60 - 80
  3. - Dorothy E. Denning, Jan Schlörer, A fast procedure for finding a tracker in a statistical database, ACM Transactions on Database Systems, Volume 5, Issue 1 (March 1980) . Pages: 88 - 102
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK