Data warehouse appliance
In computing, a data warehouse appliance consists of an integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre-optimized for data warehousing (DW). Alternatively, the term can also apply to similar software-only systems that are purportedly very easy to install on specific recommended hardware configurations, or that come preconfigured as a complete system (a true appliance).

DW appliances provide solutions for the mid-to-large volume data warehouse market, offering low-cost performance most commonly on data volumes in the terabyte to petabyte range.

Appliance technology

Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "shared-nothing architecture" where each server operates self-sufficiently and controls its own memory and disk. Shared-nothing architectures have a proven record for high scalability and little contention. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows DW appliances to resolve a relational query by scanning data on each server in parallel. The divide-and-conquer approach delivers high performance and scales close to linearly as new servers are added into the architecture. Other DW appliance vendors use specialized hardware and advanced software instead of MPP architectures. This approach aims to achieve MPP-class performance in a much smaller form factor. Netezza was the first vendor to market with a data warehouse appliance featuring specialized SQL hardware, in 2003; it used FPGA technology as projection and restriction filters, minimizing data movement and I/O within the system. Kickfire followed in 2008 with what it calls a dataflow "SQL chip".

MPP database architectures have a long pedigree. Teradata, Tandem, Britton Lee, and Sequent offered MPP SQL-based architectures in the 1980s. Open source and commodity components have aided a re-emergence of MPP data warehouses. Advances in technology have reduced costs and improved performance in storage devices, multi-core CPUs and networking components. Open-source RDBMS products, such as Ingres and PostgreSQL, reduce software-license costs and allow DW-appliance vendors to focus on optimization rather than providing basic database functionality. Open-source Linux provides a stable, well-implemented operating system for DW appliances.
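
The shared-nothing scan described in this section can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the table, the column names (`order_id`, `region`, `product`, `amount`) and the function names are hypothetical, and threads stand in for independent servers. Each "server" owns a disjoint slice of the rows, applies the restriction and projection locally, and returns only a small partial result for the coordinator to merge.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count, key):
    """Hash-distribute rows so each node owns a disjoint slice of the table."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def local_scan(partition):
    """Runs on one node: restrict to region 'EU', keep two columns, pre-aggregate."""
    partial = defaultdict(float)
    for row in partition:
        if row["region"] == "EU":                      # restriction
            partial[row["product"]] += row["amount"]   # projection + partial sum
    return partial


def parallel_query(nodes):
    """Coordinator: scan every node concurrently, then merge the partial sums."""
    totals = defaultdict(float)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        for partial in pool.map(local_scan, nodes):
            for product, amount in partial.items():
                totals[product] += amount
    return dict(totals)


# Hypothetical sample data, for illustration only.
sales = [
    {"order_id": 1, "region": "EU", "product": "disk", "amount": 100.0},
    {"order_id": 2, "region": "US", "product": "disk", "amount": 250.0},
    {"order_id": 3, "region": "EU", "product": "cpu", "amount": 400.0},
    {"order_id": 4, "region": "EU", "product": "disk", "amount": 50.0},
]
nodes = distribute(sales, node_count=3, key="order_id")
result = parallel_query(nodes)
print(result)
```

Because filtering happens where the data lives, only the per-product partial sums cross the "network", which is the same motivation the text gives for minimizing data movement and I/O.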

History

Some consider Teradata's initial product, or Britton Lee's, to be the first DW appliance. (Teradata acquired Britton Lee, by then renamed ShareBase, in June 1990.) Others disagree, considering appliances a "disruptive technology" for Teradata. Interest in the data warehouse appliance category is generally dated to the emergence of Netezza in the early 2000s.

A second generation of DW appliances has emerged, marking the move to mainstream vendor integration. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Netezza introduced its TwinFin platform based on commodity IBM hardware. Other DW appliance vendors have also partnered with major hardware vendors to help bring their appliances to market. DATAllegro, prior to acquisition by Microsoft, partnered with EMC and Dell and implemented open-source Ingres on Linux. Greenplum has a partnership with Sun Microsystems and implements Greenplum Database (based on PostgreSQL) on Solaris using the ZFS file system. HP Neoview has a wholly owned solution and uses HP NonStop SQL. XtremeData offers an FPGA-based data-warehousing appliance built on commodity hardware and an open-source operating system for "deep analytics" and data mining.

Kognitio offers a row-based "virtual" data warehouse appliance, while Vertica, EXASOL and ParAccel offer column-based "virtual" data warehouse appliances. Like Greenplum, ParAccel partners with Sun Microsystems. These vendors provide software-only solutions deployed on clusters of commodity hardware. Kognitio's homegrown WX2 database runs on several blade configurations. Other players in the DW appliance space include Calpont and Kickfire. Kickfire employs a column-store storage engine compatible with MySQL for ease of deployment and use, in combination with specialized hardware for performance.

The market has also seen the emergence of data-warehouse bundles where vendors combine their hardware and database software together as a data warehouse platform. The Oracle Optimized Warehouse Initiative combines the Oracle Database with hardware from various computer manufacturers (Dell, EMC, HP, IBM, SGI and Sun Microsystems). Oracle's Optimized Warehouses offer pre-validated configurations, and the database software comes pre-installed. In 2008 Oracle began offering a more classic appliance, the HP Oracle Database Machine, a jointly developed and co-branded platform that Oracle sells and supports and that HP builds in configurations specifically for Oracle. In 2009, Oracle released a second-generation Exadata system, based on its newly acquired Sun Microsystems hardware.

Benefits

The total cost of ownership (TCO) of a data warehouse consists of initial entry costs, ongoing maintenance costs and the cost of changing capacity as the data warehouse grows. DW appliances offer low entry and maintenance costs. Initial costs depend on the size of the appliance installed.

The resource cost for monitoring and tuning the data warehouse makes up a large part of the TCO, often as much as 80%. DW appliances reduce administration for day-to-day operations, setup and integration. Many also offer low costs for expanding processing power and capacity.
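
As a rough illustration of how these cost components combine, the following Python sketch adds them up over a planning period. All figures and parameter names are hypothetical and chosen only to show the arithmetic; they are not drawn from any vendor or survey. Administration (monitoring and tuning) is modelled as its own yearly cost so that its share of the total can be computed.

```python
def total_cost_of_ownership(entry, yearly_maintenance, yearly_admin, expansion, years):
    """Sum entry cost, recurring costs over the period, and capacity changes."""
    return entry + years * (yearly_maintenance + yearly_admin) + expansion


# Hypothetical figures for a five-year period (illustration only).
years = 5
yearly_admin = 300_000
tco = total_cost_of_ownership(
    entry=200_000,
    yearly_maintenance=20_000,
    yearly_admin=yearly_admin,
    expansion=100_000,
    years=years,
)
admin_share = years * yearly_admin / tco

print(f"TCO over {years} years: {tco}")
print(f"Administration share: {admin_share:.0%}")
```

With these assumed inputs administration accounts for roughly four fifths of the total, which shows why reducing day-to-day administration has a large effect on TCO.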

With an increased focus on controlling costs combined with tight IT budgets, data warehouse managers sometimes need to reduce and manage expenses while still getting as much as possible from their technology, which makes DW appliances an attractive option.

Parallel performance

Many DW appliances support mixed workloads in which a broad range of ad hoc queries and reports run simultaneously with loading. DW appliance vendors use several distribution and partitioning methods to provide parallel performance. Some DW appliances scan data using partitioning and sequential I/O instead of index usage. Other DW appliances use standard database indexing.
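
The idea of scanning by partition instead of by index can be sketched in Python as follows. This is a simplified illustration under assumed names (`load`, `scan`, a `day` column) and an assumed month-based partitioning scheme; real appliances use their own, vendor-specific partitioning methods. The query skips whole partitions that cannot contain matching rows and then reads the remaining ones sequentially.

```python
from datetime import date


def partition_key(day):
    """Assumed scheme: one partition per calendar month."""
    return (day.year, day.month)


def load(rows):
    """Place each row in the partition for its month."""
    partitions = {}
    for row in rows:
        partitions.setdefault(partition_key(row["day"]), []).append(row)
    return partitions


def scan(partitions, start, end):
    """Sequentially scan only the partitions that overlap [start, end]."""
    low, high = partition_key(start), partition_key(end)
    hits, partitions_read = [], 0
    for key in sorted(partitions):
        if low <= key <= high:              # partition pruning, no index lookup
            partitions_read += 1
            hits.extend(r for r in partitions[key] if start <= r["day"] <= end)
    return hits, partitions_read


# Hypothetical sample rows, for illustration only.
rows = [
    {"day": date(2009, 1, 15), "amount": 10},
    {"day": date(2009, 2, 3), "amount": 20},
    {"day": date(2009, 2, 20), "amount": 30},
    {"day": date(2009, 3, 9), "amount": 40},
]
partitions = load(rows)
hits, partitions_read = scan(partitions, date(2009, 2, 1), date(2009, 2, 28))
print(len(hits), "rows from", partitions_read, "of", len(partitions), "partitions")
```

The trade-off is that no index has to be built or maintained during loading, at the price of reading every row in the partitions that survive pruning.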

With high performance on highly granular data, DW appliances can address analytics that previously could not meet performance requirements.

Reduced administration

DW appliances provide a single-vendor solution and take ownership of optimizing the parts and software within the appliance. This eliminates the customer's costs for integration and regression testing of the DBMS, storage and OS on a terabyte scale and avoids some of the compatibility issues that arise from multi-vendor solutions. A single point of support also provides a single source for problem resolution and a simplified upgrade path for software and hardware.

Built-in high availability

MPP DW appliance vendors provide built-in high availability through redundancy on components within the appliance. Many offer warm-standby servers, dual networks, dual power-supplies, disk mirroring with failover and solutions for server failure.
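
Disk mirroring with failover can be pictured with a small Python sketch. The `Server` class, the `NodeDown` exception and the `read_with_failover` function are hypothetical names introduced for illustration; the sketch only shows the principle that a read is retried against a mirrored copy when the primary server is unavailable, not how any particular appliance implements it.

```python
class NodeDown(Exception):
    """Raised when a read is attempted against a failed server."""


class Server:
    def __init__(self, name, partitions):
        self.name = name
        self.partitions = partitions   # partition id -> rows on this server's disks
        self.up = True

    def read(self, partition_id):
        if not self.up:
            raise NodeDown(self.name)
        return self.partitions[partition_id]


def read_with_failover(partition_id, primary, mirror):
    """Read from the primary; fall back to the mirrored copy if it is down."""
    try:
        return primary.read(partition_id), primary.name
    except NodeDown:
        return mirror.read(partition_id), mirror.name


rows = [("order", 1), ("order", 2)]
primary = Server("server-a", {0: rows})
mirror = Server("server-b", {0: list(rows)})    # mirrored copy of partition 0

before = read_with_failover(0, primary, mirror)
primary.up = False                               # simulate a server failure
after = read_with_failover(0, primary, mirror)
print(before[1], "->", after[1])
```

The query returns the same rows before and after the simulated failure; only the server that answers changes.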

Scalability

DW appliances scale for both capacity and performance. Many DW appliances implement a modular design that database administrators can add to incrementally, eliminating up-front costs for over-provisioning. In contrast, architectures that do not support incremental expansion can require hours of production downtime, during which database administrators export and re-load terabytes of data. In MPP architectures, adding servers increases performance as well as capacity. This does not always happen with alternative solutions.
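
A back-of-the-envelope model shows why adding servers to an MPP system can raise performance along with capacity. The Python sketch below assumes an idealized case: data is spread evenly, every server scans its share at the same rate, and coordination overhead is ignored. Real systems fall short of this because of data skew and inter-server communication. The function name and the numbers are illustrative assumptions.

```python
def ideal_scan_seconds(table_gb, servers, gb_per_second_per_server):
    """Time for a full-table scan when every server scans an equal share."""
    if servers < 1:
        raise ValueError("at least one server is required")
    return table_gb / (servers * gb_per_second_per_server)


# Hypothetical 10 TB table and 0.5 GB/s scan rate per server.
ten_servers = ideal_scan_seconds(10_000, servers=10, gb_per_second_per_server=0.5)
twenty_servers = ideal_scan_seconds(10_000, servers=20, gb_per_second_per_server=0.5)
print(ten_servers, twenty_servers)
```

Under these assumptions, doubling the number of servers halves the scan time, because each server is responsible for half as much data.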

Rapid time-to-value

Companies increasingly expect to use business analytics to improve the current cycle. DW appliances provide fast implementations without the need for regression and integration testing. In some cases, reduced tuning, reduced index creation, fast loading and reduced need for aggregation make rapid prototyping possible.

Application uses

DW appliances provide solutions for many analytic application uses, including:
  • enterprise data warehousing
  • super-sized sandboxes which isolate power users with resource intensive queries
  • pilot projects or projects requiring rapid prototyping and rapid time-to-value
  • off-loading projects from the enterprise data warehouse, such as large analytical query projects that affect the overall workload of the enterprise data warehouse
  • applications with specific performance or loading requirements
  • data marts that have outgrown their present environment
  • turnkey data warehouses or data marts
  • solutions for applications with high data-growth and high-performance requirements
  • applications requiring data warehouse encryption

Trends

The DW appliance market continues to shift in several areas as it evolves:
  • Vendors have started moving toward using commodity technologies rather than proprietary assembly of commodity components.
  • Implemented applications show usage expanding from tactical and data-mart solutions to strategic and enterprise data-warehouse use.
  • Mainstream vendor participation has become apparent.
  • With a lower total cost of ownership, reduced maintenance and high performance to address business analytics on growing data volumes, most analysts believe that DW appliances will gain market share, though Teradata maintains its leadership position.
  • Vendors have begun providing the ability to incorporate "in-database" analytic algorithms to take advantage of their MPP architectures, eliminating the need to extract large datasets into traditional analytic and data mining platforms such as SAS.
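
The "in-database" analytics trend in the last item can be illustrated with a short Python sketch. It is a simplified example under stated assumptions, not a vendor API: `local_stats` and `merge` are hypothetical names, and plain lists stand in for the data held on each server. Every server reduces its own data to three numbers (count, sum, sum of squares), and only those small summaries are combined, so the full dataset never has to be extracted.

```python
def local_stats(values):
    """Runs next to the data on one server: count, sum and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def merge(partials):
    """Combine per-server summaries into a global mean and population variance."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


# Hypothetical values held on three servers.
server_data = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
mean, variance = merge([local_stats(values) for values in server_data])
print(mean, variance)
```

The sum-of-squares formula is used here for brevity; it can lose precision on large or badly scaled data, so a production implementation would more likely use a numerically stable merge.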

See also

  • Business intelligence (BI)
  • Data mining
  • Data mart (DM)
  • Data warehouse

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.