Innovative Routines International
Encyclopedia
Innovative Routines International (IRI), Inc. is an American software company known for bringing mainframe sort merge
Mainframe sort merge
The Sort/Merge utility is a mainframe program to sort records in a file into a specified order, merge pre-sorted files into a sorted file, or copy selected records...

 functionality into open systems
Open system (computing)
Open systems are computer systems that provide some combination of interoperability, portability, and open software standards. The term was popularized in the early 1980s, mainly to describe systems based on Unix,...

. IRI was the first vendor to develop a commercial replacement for the Unix sort
Sort (Unix)
sort is a standard Unix command line program that prints the lines of its input or concatenation of all files listed in its argument list in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire input is taken as sort key...

 command, and combine data transformation
Data transformation
In metadata and data warehouse, a data transformation converts data from a source data format into destination data.Data transformation can be divided into two steps:...

 and reporting
Report Generator
Broadly speaking a Report Generator is an application whose purpose it is to take data from a source such as an XML data feed and then use it to produce a document in a format which satisfies a particular human readership....

 functionality in Unix batch processing
Batch processing
Batch processing is execution of a series of programs on a computer without manual intervention.Batch jobs are set up so they can be run to completion without manual intervention, so all input data is preselected through scripts or command-line parameters...

 environments. In 2007, IRI's coroutine
Coroutine
Coroutines are computer program components that generalize subroutines to allow multiple entry points for suspending and resuming execution at certain locations...

 sort ("CoSort") became the first product to collate and convert multi-gigabyte XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

 and LDIF files, join
Join (SQL)
An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...

 and lookup
Lookup
In computing, lookup usually refers to searching a data structure for an item that satisfies some specified property. For example, variable lookup performed by a language interpreter, virtual machine or other similar engine usually consists of performing certain...

 across multiple files, and apply role-based data privacy
Data privacy
Information privacy, or data privacy is the relationship between collection and dissemination of data, technology, the public expectation of privacy, and the legal and political issues surrounding them....

 functions (including AES-256 encryption
Advanced Encryption Standard
Advanced Encryption Standard is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. It supersedes DES...

) for fields within sensitive files.

IRI is headquartered in Melbourne, Florida, United States, and has resale and support offices in 25 countries, including France, Japan, South Africa, and Brazil. Primary computing platform partners include HP, IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...

, Fujitsu
Fujitsu
is a Japanese multinational information technology equipment and services company headquartered in Tokyo, Japan. It is the world's third-largest IT services provider measured by revenues....

, Intel, Novell
Novell
Novell, Inc. is a multinational software and services company. It is a wholly owned subsidiary of The Attachmate Group. It specializes in network operating systems, such as Novell NetWare; systems management solutions, such as Novell ZENworks; and collaboration solutions, such as Novell Groupwise...

, Red Hat
Red Hat
Red Hat, Inc. is an S&P 500 company in the free and open source software sector, and a major Linux distribution vendor. Founded in 1993, Red Hat has its corporate headquarters in Raleigh, North Carolina with satellite offices worldwide....

, Sun Microsystems
Sun Microsystems
Sun Microsystems, Inc. was a company that sold :computers, computer components, :computer software, and :information technology services. Sun was founded on February 24, 1982...

, and Microsoft
Microsoft
Microsoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...

. CoSort users include: AIM Healthcare, EDS, HSBC Insurance, and Thomson Reuters.

Products

IRI software is designed to transform, convert, report, and protect large data volumes rapidly in distributed, heterogeneous computing environments. These functions are built into the CoSort package or through spin-offs for data extraction, generation, security, and migration. Each tool uses the same metadata
Metadata
The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

 format for defining and manipulating data. This data definition file format is also supported by Meta Integration Technology (MITI) so that third-party ETL, BI, and data modeling tool users can re-use their existing metadata in IRI product environments.

CoSort

CoSort was released for CP/M
CP/M
CP/M was a mass-market operating system created for Intel 8080/85 based microcomputers by Gary Kildall of Digital Research, Inc...

 in 1978, DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

 in 1980, Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 in the mid-eighties, and Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

 in the early nineties, and received a readership award from DMReview magazine in 2000, CoSort was initially designed as a file sorting utility, and added interfaces to replace or convert the sort program parameters used in IBM Infosphere DataStage
Datastage
IBM InfoSphere DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite and IBM InfoSphere. It uses a graphical notation to construct data integration solutions and is available in various versions such as the Server Edition and the Enterprise Edition.DataStage originated...

, Informatica
Informatica
Informatica Corporation is a NASDAQ listed company with ticker INFA. Founded in 1993, its headquarters is in Redwood City, California. Founded by Diaz Nesamoney and Gaurav Dhillon...

, Micro Focus COBOL, JCL
Job Control Language
Job Control Language is a scripting language used on IBM mainframe operating systems to instruct the system on how to run a batch job or start a subsystem....

, NATURAL
NATURAL
NATURAL is a fourth-generation programming language from Software AG. It is largely used for building databases output in plain text form, for example. * Hello World in NATURAL WRITE 'Hello World!' END...

, SAS
SAS System
SAS is an integrated system of software products provided by SAS Institute Inc. that enables programmers to perform:* retrieval, management, and mining* report writing and graphics* statistical analysis...

, and SyncSort Unix.

In 1992, CoSort added related data manipulation functions through a control language interface based on DEC
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...

 VAX/VMS
OpenVMS
OpenVMS , previously known as VAX-11/VMS, VAX/VMS or VMS, is a computer server operating system that runs on VAX, Alpha and Itanium-based families of computers. Contrary to what its name suggests, OpenVMS is not open source software; however, the source listings are available for purchase...

 sort utility syntax, which evolved through the years to handle file-based data integration
Data integration
Data integration involves combining data residing in different sources and providing users with a unified view of these data.This process becomes significant in a variety of situations, which include both commercial and scientific domains...

 and staging functions in data warehouse
Data warehouse
In computing, a data warehouse is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.A data warehouse...

 ETL
Extract, transform, load
Extract, transform and load is a process in database usage and especially in data warehousing that involves:* Extracting data from outside sources* Transforming it to fit operational needs...

 operations:

"For data warehouse and data mart applications, CoSort performs source data extraction, data cleansing, sorting, reformatting, data type conversion, aggregation, and indexing, all in a single pass. Most operational data in commercial and public sector enterprises reside internally in sequential flat files, (relational) database tables, or are imported from data tapes and transmissions generated externally. These historical databases are optimized for ad hoc queries and transactions, rather than for extraction. CoSort accepts multiple input files (large-scale tables or flat-file data dumps), or records streaming through pipes, to perform conditional selection on records for downstream processes." - Dennis Hill, Database Trends Magazine, July 1999

CoSort Version 9, released in 2007, can simultaneously transform, convert, report, and/or protect data for ETL
Extract, transform, load
Extract, transform and load is a process in database usage and especially in data warehousing that involves:* Extracting data from outside sources* Transforming it to fit operational needs...

, business intelligence
Business intelligence
Business intelligence mainly refers to computer-based techniques used in identifying, extracting, and analyzing business data, such as sales revenue by products and/or departments, or by associated costs and incomes....

, change data capture
Change data capture
In databases, change data capture is a set of software design patterns used to determine the data that has changed so that action can be taken using the changed data...

, database load and query, application development, and data migration
Data migration
Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks...

 activities.

Fast Extract (FACT)

FACT for Oracle
Oracle Database
The Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation....

 and FACT for IBM DB2
IBM DB2
The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...

 exports large database tables into flat files for archive, ETL, reorg, reporting and other applications. FACT and CoSort used together "provide for rapid unloading and transformation of data in Oracle databases in support of ETL processes."

RowGen

RowGen is designed to generate test data
Test data
Test Data are data which have been specifically identified for use in tests, typically of a computer program.Some data may be used in a confirmatory way, typically to verify that a given set of input to a given function produces some expected result...

 in production table, file, and report formats for prototype database population, compliance, outsourcing, and application prototyping projects. RowGen's GUI
Gui
Gui or guee is a generic term to refer to grilled dishes in Korean cuisine. These most commonly have meat or fish as their primary ingredient, but may in some cases also comprise grilled vegetables or other vegetarian ingredients. The term derives from the verb, "gupda" in Korean, which literally...

 parses data models to define table layouts and relationships so database test sets are structurally and referentially correct. RowGen can also transform
Data transformation
In metadata and data warehouse, a data transformation converts data from a source data format into destination data.Data transformation can be divided into two steps:...

 and format test data during its generation.

FieldShield

Like RowGen, FieldShield is a CoSort spin-off designed to protect data privacy. The software protects personally identifiable information
Personally identifiable information
Personally Identifiable Information , as used in information security, is information that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual...

 and other private data at the field
Field (computer science)
In computer science, data that has several parts can be divided into fields. Relational databases arrange data as sets of database records, also called rows. Each record consists of several fields; the fields of all records form the columns....

 or record level within files and other sources subject to data spill
Data spill
A data breach is the intentional or unintentional release of secure information to an untrusted environment. Other terms for this phenomenon include unintentional information disclosure, data leak and also data spill...

. Privacy functions include AES
Advanced Encryption Standard
Advanced Encryption Standard is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. It supersedes DES...

 encryption
Encryption
In cryptography, encryption is the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of the process is encrypted information...

, data masking
Data masking
Data masking is the process of obscuring specific data elements within data stores. It ensures that sensitive data is replaced with realistic but not real data. The goal is that sensitive customer information is not available outside of the authorized environment...

, and pseudonymization
Pseudonymization
Pseudonymisation is a procedure by which the most identifying fields within a data record are replaced by one or more artificial identifier's. There can be a single pseudonym for a collection of replaced fields or a pseudonym per replaced field. The purpose is to render the data record less...

. Job details can be audited from a log file
Log file
The term log file can refer to:*Text saved by a computer operating system to recored its activities, such as by the Unix syslog facility*Output produced by a data loggerAlso see Wikibooks chapter...

 in XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

 format.

NextForm

NextForm is a data migration
Data migration
Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks...

 spin-off from CoSort functionality designed to convert between structured file formats such as CSV
Comma-separated values
A comma-separated values file stores tabular data in plain-text form. As a result, such a file is easily human-readable ....

, ISAM
ISAM
ISAM stands for Indexed Sequential Access Method, a method for indexing data for fast retrieval. ISAM was originally developed by IBM for mainframe computers...

, LDIF, and XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

, plus data types such as ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

, EBCDIC
EBCDIC
Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

, Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

, and Packed Decimal.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK