Ab Initio
The Ab Initio software is a fourth-generation, data analysis, batch processing, data manipulation, graphical user interface (GUI)-based parallel processing product which is commonly used to extract, transform, and load (ETL) data. The Ab Initio product also allows for processing of real-time data.
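
The ETL pattern named above can be illustrated with a minimal sketch in plain Python (not Ab Initio's own tooling; the file layout and field names here are invented for illustration):

```python
import csv

def extract(path):
    """Extract: read raw records from a CSV source (a hypothetical input file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(records):
    """Transform: normalize each record to fit the target schema."""
    return [
        {"id": int(r["id"]), "name": r["name"].strip().upper()}
        for r in records
    ]

def load(records, path):
    """Load: write the transformed records to the target (a CSV file here)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(records)
```

In a real ETL product the three stages would typically be separate configurable components connected by data flows rather than plain function calls.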

Ab Initio has a two-tier architecture in which the Graphical Development Environment (GDE) and the Co>Operating System are coupled together to form a client-server-like architecture.
The Ab Initio software is a suite of products which together provide a platform for data processing applications. The Core Ab Initio products are:
  • Graphical Development Environment
  • Co>Operating System
  • Enterprise Meta>Environment (EME)
  • Conduct>It
  • The Component Library
  • Data Profiler
  • BRE (Business Rules Environment)


Ab Initio Software Corporation was founded in the mid-1990s by Sheryl Handler, the former CEO of Thinking Machines Corporation, and several other former employees after that company's bankruptcy.

The Co>Operating System

The Co>Operating System (Co>Op) is layered on top of a native operating system. It unites a network of computing resources (CPUs, storage disks, programs, datasets) into a data processing system with scalable performance. The Co>Operating System runs on operating systems such as IBM AIX, Sun Solaris, HP-UX, and Windows NT. It can connect to high-performance databases such as IBM DB2, Oracle, Informix, and SQL Server, and to other software packages such as SAS and Trillium.

It is the heart of the Ab Initio tool suite: all graphs developed in the GDE run on the Co>Operating System. It runs across a variety of operating systems and hardware platforms, including OS/390 and z/OS on mainframes, Unix, Linux, and Windows. It supports distributed and parallel execution, and can provide scalability proportional to the hardware resources provided. It supports platform-independent data transport using the Ab Initio data manipulation language. The Co>Operating System is the underlying system for all parts of the product suite, allowing communication and integration of all parts into the platform; it also translates Ab Initio commands for the native operating system.
Much of Ab Initio's performance comes from its parallel runtime environment, in which some or all components of an application (datasets and processing modules) are replicated into a number of partitions, each spawning its own process.

Forms of Parallelism
  • Component Parallelism
  • Pipeline Parallelism (Inherent in Ab Initio)
  • Data Parallelism (Inherent in Ab Initio)


Data parallelism - When data is divided into segments or partitions and the same processing runs simultaneously on each partition.

Component parallelism - When different instances of the same component run on separate data sets. Components execute simultaneously on different branches of a graph.

Pipeline parallelism - When multiple components run on the same data set, i.e. when a record is processed in one component while a previous record is being processed in another component. Operations such as sorting and aggregation break pipeline parallelism.
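
As a rough analogy in plain Python (not Ab Initio), data parallelism amounts to splitting the input into partitions and applying the same operation to each partition concurrently; the partition count and the stand-in transform below are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def transform_partition(part):
    """The same operation applied to every partition (a stand-in transform)."""
    return [r * 2 for r in part]

def run_data_parallel(data, n_partitions=4):
    """Data parallelism: split the input round-robin into partitions and
    run the same transform on each partition concurrently."""
    parts = [data[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(transform_partition, parts))
    # Flatten the per-partition results; note the round-robin split
    # changes record order relative to the input.
    return [r for part in results for r in part]
```

A real parallel runtime would run each partition as a separate process (often on a separate machine) and repartition data between stages as needed.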

The Graphical Development Environment

The GDE is a GUI for building applications in Ab Initio. It connects to the Co>Operating System using several protocols, such as Telnet, rexec, SSH, DCOM, and FTP (for file transfer).
It is loosely bound to the Co>Operating System: the two have separate release mechanisms, so the Co>Operating System can be upgraded without changing the GDE release.

The GDE provides a graphical interface for editing and executing Ab Initio programs. This development environment uses the available components from the library to carry out the various ETL activities. The Co>Operating System can execute these programs directly. The GDE also allows monitoring of running applications to quantify data volumes and execution times for performance estimation.

An Ab Initio computer program is called a graph, as it behaves similarly to its mathematical counterpart. A graph contains one or more components (vertices), each joined by a flow (edge) through which data flows. Data flows in only one direction, which allows the graph to run in a parallel processing environment. The GDE compiles each graph into a Korn shell script (or a batch file in a Windows environment), which can be run by the Co>Operating System.

Before the GDE, developers used the Shell Development Environment (SDE), with which it was also possible to write Ab Initio programs (graphs) in an ordinary text editor; this is extremely cumbersome and so rarely done in practice nowadays.
The following statements hold true of a graph.
A graph
  • is the logical modular unit of an application.
  • is a diagram that defines the various processing stages of a task and the streams of data as they move from one stage to another.
  • consists of several components that form the building blocks of an Ab Initio application.
  • can itself be used as a component (a reusable subgraph) in another graph.
A component, in turn, is a program that does a specific type of job and can be controlled by its parameter settings.
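
The graph model described above can be sketched in plain Python using generators as components connected by one-directional flows (the component names and the filter rule here are invented; this is an analogy, not Ab Initio code):

```python
def read_source(records):
    """Source component: emits records into its output flow."""
    for r in records:
        yield r

def filter_component(flow, predicate):
    """Processing component: passes through records matching the predicate."""
    for r in flow:
        if predicate(r):
            yield r

def output_sink(flow):
    """Sink component: collects the records arriving on its input flow."""
    return list(flow)

# Wire the components into a graph: source -> filter -> sink.
flow = read_source([1, 2, 3, 4, 5])
flow = filter_component(flow, lambda r: r % 2 == 1)
result = output_sink(flow)  # [1, 3, 5]
```

Because generators hand records downstream one at a time, this sketch also hints at pipeline parallelism: the filter works on one record while the source is already producing the next.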

The Component Library

The Component Library consists of reusable software modules for sorting, joining, data transformation, database loading, and so on. The components adapt at runtime to the record formats and business rules controlling their behavior. Most of the components are shipped along with the GDE. Components also include various system "connectors" giving access to various storage engines.
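
The idea of a component adapting at runtime to a record format can be illustrated in plain Python (the component and records below are hypothetical and not the actual library API):

```python
def sort_component(records, key_field):
    """A generic 'sort' component: the record format is not fixed when the
    component is built; the key field is supplied as a runtime parameter."""
    return sorted(records, key=lambda r: r[key_field])

# The same component handles any record layout that has the named field.
customers = [
    {"id": 3, "name": "carol"},
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
by_id = sort_component(customers, "id")
by_name = sort_component(customers, "name")
```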

Enterprise Meta>Environment

Enterprise Meta>Environment (EME) is an object-oriented data storage system that version-controls and manages various kinds of information associated with Ab Initio applications, ranging from design information to operational data.
In simple terms, it is a repository containing data about data: metadata.

The EME tracks changes in the development of graphs, as well as metadata pertaining to that development, how data is used, and other means of data classification. Storing graph-related metadata enables data impact analysis, giving the user a visual sense of how the data changes within a graph and the impact those changes have on other graphs. Additionally, the EME doubles as a configuration- and change-management (version control) system, preserving the history of a graph through subsequent code changes and thereby ensuring access to the latest code and data. It performs the following operations:
  1. version controlling,
  2. statistical analysis,
  3. dependence analysis, and
  4. metadata management.

Data Profiler

The Data Profiler is a graphical data analysis tool which runs on top of the Co>Operating system. It can be used to characterize data range, scope, distribution, variance, and quality.
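
Profiling of the kind described amounts to computing summary statistics per field. A minimal sketch in plain Python (not the Data Profiler's actual interface; the completeness check as a quality proxy is an assumption):

```python
from statistics import mean, pvariance

def profile_field(values):
    """Characterize one numeric field: range, central tendency, and
    variance, plus a simple missing-value count as a quality proxy."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),  # quality: incomplete records
        "min": min(present),                    # range
        "max": max(present),
        "mean": mean(present),                  # distribution
        "variance": pvariance(present),         # variance
    }
```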

Conduct>It

Ab Initio Conduct>It is a tool for developing high-volume data processing systems. It enables combining graphs from the Graphical Development Environment with custom scripts and programs from other vendors.

Major Competitors

The Gartner Magic Quadrant for Data Integration Tools 2009 lists Informatica, Oracle Corporation, and SAP Business Objects among the leading vendors, followed by Microsoft as a challenger. iWay Software, SAS Institute, Sybase, Pervasive Software, and Talend are listed as visionaries. The 2009 report also includes newcomers such as expressor, CloverETL, and Pentaho. Gartner considers Ab Initio not to meet its analysis criteria because of a lack of available information. As a result, Ab Initio has fared poorly in recent Gartner Magic Quadrants, dropping from Visionary (2005) to Niche (2006) before falling off the grid entirely (2007). Ab Initio has also been criticised for its extreme secrecy about its products: anyone working with the product (even employees of organizations that use Ab Initio) operates under a non-disclosure agreement that prevents them from revealing Ab Initio technical information to the public. As its competitors gained market share, Ab Initio introduced a free, feature-limited version known as Elementum in 2010. However, it is available only to customers who have already purchased a commercial Ab Initio license, and it is intended for desktop use only.

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 