MySQL Cluster
MySQL Cluster is a technology that provides shared-nothing clustering capabilities for the MySQL database management system. It was first included in the production release of MySQL 4.1 in November 2004. It is designed to provide high availability and high performance, while allowing for nearly linear scalability. MySQL Cluster is implemented through an additional storage engine available within MySQL called NDB or NDBCLUSTER ("NDB" stands for Network Database).

Architecture

MySQL Cluster has a few important concepts behind its design, which add both benefits and burdens.

Replication

MySQL Cluster uses synchronous replication through a two-phase commit mechanism in order to guarantee that data is written to multiple nodes upon committing the data. (This is in contrast to what is usually referred to as "MySQL Replication", which is asynchronous.) Two copies (known as replicas) of the data are required to guarantee availability; however, the cluster can be configured to store between one and four copies at any single time.
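
The idea of committing through two phases can be illustrated with a minimal, generic sketch. It is not NDB's actual protocol: the Replica class, its fields, and the two_phase_commit function are hypothetical names introduced only for illustration, and the sketch simply aborts when any replica cannot prepare, which may differ from how a real cluster handles a failed node.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one data node holding a replica."""
    name: str
    healthy: bool = True
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: stage the write and vote on whether it can be committed.
        if not self.healthy:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        # Phase 2: make the staged write visible.
        self.committed[key] = self.pending.pop(key)

    def abort(self, key):
        # Discard a staged write, if there is one.
        self.pending.pop(key, None)


def two_phase_commit(replicas, key, value):
    """Return True only if every replica prepared and then committed."""
    votes = [r.prepare(key, value) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit(key)
        return True
    # At least one replica voted no: roll back everywhere.
    for r in replicas:
        r.abort(key)
    return False
```

With two healthy replicas, two_phase_commit([a, b], "k", 1) returns True and both replicas hold the value; if either replica is marked unhealthy, the call returns False and no replica keeps the staged write.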

Starting with MySQL 5.1, it is also possible to replicate asynchronously between clusters; this is sometimes referred to as "MySQL Cluster Replication" or "geographical replication".

Horizontal data partitioning

Data within NDB tables is automatically partitioned across all of the data nodes in the system. This is done using a hashing algorithm based on the PRIMARY KEY of the table, and is transparent to the end application.

In the 5.1 release, users can define their own partitioning schemes.
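
A minimal sketch of hash-based partitioning on a primary key follows. The hash function NDB actually uses is not specified here, so MD5 is an assumption chosen only because it is stable and well distributed; partition_for, distribute, and the "id" key column are hypothetical names.

```python
import hashlib


def partition_for(primary_key, num_partitions):
    """Map a primary key value to a partition number in [0, num_partitions)."""
    # MD5 is an illustrative choice, not the hash used by NDB.
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(rows, num_partitions, key_column="id"):
    """Group rows (dicts) into partitions by hashing their primary key."""
    partitions = {p: [] for p in range(num_partitions)}
    for row in rows:
        partitions[partition_for(row[key_column], num_partitions)].append(row)
    return partitions
```

Because the partition is a pure function of the key, any node can compute where a row lives without consulting a central directory, which is what keeps the scheme transparent to the application.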

Hybrid storage

MySQL Cluster allows datasets larger than the capacity of a single machine to be stored and accessed across multiple machines.

MySQL Cluster maintains all indexed columns in distributed memory. Non-indexed columns can also be maintained in distributed memory, or can be maintained on disk with an in-memory page cache. Storing non-indexed columns on disk allows MySQL Cluster to store datasets larger than the aggregate memory of the clustered machines.

MySQL Cluster writes Redo logs to disk for all data changes and regularly checkpoints data to disk. This allows the cluster to recover consistently from disk after a full cluster outage. Because the Redo logs are written asynchronously with respect to transaction commit, a small number of transactions can be lost if the full cluster fails. The current default asynchronous write delay is 2 seconds, and is configurable. Normal single-point-of-failure scenarios do not result in any data loss, because data is replicated synchronously within the cluster.
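
The size of that loss window can be estimated with simple arithmetic. The sketch below is a rough upper bound under two assumptions that are not from the text: the commit rate is constant, and only transactions committed within the write delay are at risk. The function name transactions_at_risk is hypothetical.

```python
def transactions_at_risk(commits_per_second, flush_delay_seconds=2.0):
    """Upper bound on committed transactions not yet in the on-disk Redo log."""
    if commits_per_second < 0 or flush_delay_seconds < 0:
        raise ValueError("rate and delay must be non-negative")
    # Everything committed since the last flush may be lost on a full outage.
    return commits_per_second * flush_delay_seconds
```

For example, at 500 commits per second and the default 2-second delay, up to 1000 transactions could be lost in a full cluster failure; shortening the delay to 0.5 seconds lowers that bound to 250.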

When a MySQL Cluster table is maintained in memory, the cluster accesses disk storage only to write Redo records and checkpoints. Because these writes are sequential and involve few random access patterns, MySQL Cluster can achieve higher write throughput with limited disk hardware than a traditional disk-based caching RDBMS.

Shared nothing

MySQL Cluster is designed to have no single point of failure. Provided that the cluster is set up correctly, any single node, system, or piece of hardware can fail without the entire cluster failing. Shared disk (SAN) is not required. The interconnects between nodes can be standard Ethernet. Gigabit Ethernet and SCI interconnects are also supported.

Implementation

MySQL Cluster uses three different types of nodes (processes):
  • Data node (ndbd/ndbmtd process): Stores the data.
  • Management node (ndb_mgmd process): Used for configuration and monitoring of the cluster. Management nodes are required only during node startup.
  • SQL node (mysqld process): A MySQL server (mysqld) that connects to all of the data nodes in order to perform data storage and retrieval. This node type is optional; it is possible to query data nodes directly via the NDB API.


Generally, it is expected that each node will run on a separate host computer.

Versions

MySQL Cluster version numbers are no longer tied to those of MySQL Server; for example, the most recent version is MySQL Cluster 7.1, even though it is based on and contains the server component from MySQL 5.1.

Higher versions of MySQL Cluster include all of the features of lower versions, plus some new features.

Currently available versions:
  • Ndb included in MySQL 5.1.X source tree
    This is old and not maintained. Do not use.
  • MySQL Cluster 6.2 based on MySQL 5.1.A
    First 'telco' or 'carrier grade edition' release. Supports 255 nodes, online table alter, and replication latency and throughput enhancements, among other features.
  • MySQL Cluster 6.3 based on MySQL 5.1.B
    Includes compressed backups and local checkpoints (LCP), circular replication support, conflict detection/resolution, and table optimization, among other features.
  • MySQL Cluster 7.0 based on MySQL 5.1.C
    Includes multi-threaded data nodes (ndbmtd), transactional DDL, and Windows support.
  • MySQL Cluster 7.1 based on MySQL 5.1.D
    Includes ClusterJ and ClusterJPA connectors.

Limitations

In the 5.1 release, non-indexed columns can be stored on disk and do not require dedicated RAM. However, in 5.0 all indexes as well as all data are still in main memory.

In the 5.1 release, a maximum of 255 nodes can belong to a single MySQL Cluster with up to 48 of those being data nodes. In the 5.0 release the total number of nodes cannot exceed 63. It is possible to change this at compile time, but that has not been thoroughly tested at this point.

Versions up to and including 5.0 do not have support for variable-width columns, instead using the entire storage width of the column declaration, effectively making a VARCHAR(255) column into a CHAR(255) column. MySQL 5.1 adds true VARCHAR support for NDB tables.
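
The storage cost of fixed-width handling can be illustrated with a rough calculation. The sketch below assumes a single-byte character set and a one-byte length prefix per variable-width value; both are simplifying assumptions, and the real per-row overhead in NDB is not described here. The function names are hypothetical.

```python
def fixed_width_bytes(values, declared_width):
    """Bytes used when every value occupies the full declared width."""
    return len(values) * declared_width


def variable_width_bytes(values, length_prefix=1):
    """Bytes used when each value stores its own length plus its content."""
    # latin-1 encodes each character as one byte (single-byte charset assumed).
    return sum(length_prefix + len(v.encode("latin-1")) for v in values)
```

For the three values "Ann", "Bob", and "Christopher" in a column declared with width 255, the fixed-width estimate is 765 bytes, while the variable-width estimate is 20 bytes.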

The foreign key construct is ignored, just as it is in MyISAM tables.

Beginning with MySQL 5.0.6, the maximum number of metadata objects has increased to 20320. This includes database tables, system tables, and indexes.

Other limitations are listed in MySQL Cluster Limitations 5.0 and MySQL Cluster Limitations 5.1.

Requirements

Minimum system requirements (for each node; at least 3 machines):
  • OS: Linux (Red Hat, SUSE), Solaris, Mac OS X, Windows
  • CPU: Intel/AMD x86
  • Memory: 512MB RAM
  • HDD: 3GB
  • Network: 1+ nodes (Standard Ethernet - TCP/IP)


Preferred system requirements (for each node; 4 machines preferred):
  • OS: Linux (Red Hat, SUSE), Solaris, Mac OS X, Windows
  • CPU: 2x Intel Xeon, AMD Opteron, Sun SPARC
  • Memory: 16GB RAM
  • HDD: 4x 36GB SCSI (RAID 1 Controller)
  • Network: 1-8 Nodes (Gigabit Ethernet); 8+ Nodes (Dedicated Cluster Interconnect e.g. SCI)

History

MySQL AB acquired the technology behind MySQL Cluster from Alzato, a small venture company started by Ericsson. NDB was originally designed for the telecom market, with its high availability and high performance requirements. The original press release is still available from MySQL AB.

NDB has since been integrated into the MySQL product, with its first release being in MySQL 4.1.

Other

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 