Dataflow language
In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing.

Properties of dataflow programming languages

Dataflow programming focuses on how things connect, unlike imperative programming, which focuses on how things happen. In imperative programming a program is modeled as a series of operations (things that "happen"), and the flow of data between these operations is of secondary concern to the behavior of the operations themselves. Dataflow programming instead models programs as a series of (sometimes interdependent) connections, with the operations between these connections being of secondary importance.

One of the key concepts in computer programming is the idea of "state", essentially a snapshot of the various conditions in the system at a given moment. Most programming languages require a considerable amount of state information in order to operate properly, information which is generally hidden from the programmer. For a real-world example, consider a three-way light switch. Typically a switch turns on a light when it is moved to the "on" position, but in a three-way arrangement the same movement may turn the light back off; the result depends on the state of the other switch, which is likely out of view.

In fact, the state is often hidden from the computer itself as well, which normally has no idea that this piece of information encodes state, while that one is temporary and will soon be discarded. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Without knowing which state is important and which isn't, most languages force the programmer to add a considerable amount of extra code to indicate which data and parts of the code are important in this respect.

This code tends to be expensive in terms of performance, difficult to debug, and often downright ugly; most programmers simply ignore the problem. Those who cannot must pay a heavy performance cost, which is paid even in the most common case, when the program runs on one processor. Explicit parallelism is one of the main reasons for the poor performance of Enterprise JavaBeans when building data-intensive, non-OLTP applications.

Dataflow languages promote the data to become the main concept behind any program. It may be considered odd that this is not always the case, as programs generally take in data, process it, and then feed it back out. This was especially true of older programs, and is well represented in the Unix operating system, which pipes the data between small single-purpose tools. Programs in a dataflow language start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The data is now explicit, often illustrated physically on the screen as a line or pipe showing where the information flows.
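
The pipe analogy can be sketched in Python with generators, where each stage is a small single-purpose function and the data handed from one stage to the next is explicit. This is only an illustration: the stage names (read_lines, grep, upper, run_pipeline) are hypothetical and do not correspond to any particular dataflow language.

```python
from typing import Iterable, Iterator, List


def read_lines(text: str) -> Iterator[str]:
    """Source stage: emit the input one line at a time."""
    yield from text.splitlines()


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Filter stage: pass through only the lines containing `needle`."""
    return (line for line in lines if needle in line)


def upper(lines: Iterable[str]) -> Iterator[str]:
    """Transform stage: upper-case every line it receives."""
    return (line.upper() for line in lines)


def run_pipeline(text: str, needle: str) -> List[str]:
    """Wire the stages together; data flows source -> grep -> upper."""
    return list(upper(grep(read_lines(text), needle)))


output = run_pipeline("alpha\nbeta\nalpha beta", "alpha")
```

Each stage knows nothing about its neighbours; only the wiring in run_pipeline decides where the data goes.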

Operations consist of "black boxes" with inputs and outputs, all of which are always explicitly defined. They run as soon as all of their inputs become valid, as opposed to when the program encounters them. Whereas a traditional program essentially consists of a series of statements saying "do this, now do this", a dataflow program is more like a series of workers on an assembly line, who will do their assigned task as soon as the materials arrive. This is why dataflow languages are inherently parallel; the operations have no hidden state to keep track of, and the operations are all "ready" at the same time.
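
A minimal Python sketch of this firing rule follows. The Node class and its methods are hypothetical names chosen for illustration, not part of any real dataflow system. The point is that an operation runs when its last input arrives, regardless of the order in which the program delivers the values.

```python
from typing import Any, Callable, Dict, List, Tuple


class Node:
    """A 'black box' operation with explicitly named inputs and one output."""

    def __init__(self, name: str, inputs: List[str], func: Callable[..., Any]):
        self.name = name
        self.func = func
        self.required = list(inputs)                  # inputs that must all be valid
        self.pending: Dict[str, Any] = {}             # inputs that have arrived so far
        self.targets: List[Tuple["Node", str]] = []   # (downstream node, input name)
        self.result: Any = None
        self.fired = False

    def connect(self, target: "Node", input_name: str) -> None:
        """Route this node's output to one input of another node."""
        self.targets.append((target, input_name))

    def receive(self, input_name: str, value: Any) -> None:
        """Accept one input; fire as soon as every required input is valid."""
        self.pending[input_name] = value
        if all(key in self.pending for key in self.required):
            self.fire()

    def fire(self) -> None:
        """Run the operation and push its result downstream."""
        args = [self.pending[key] for key in self.required]
        self.result = self.func(*args)
        self.fired = True
        for target, input_name in self.targets:
            target.receive(input_name, self.result)


# Graph for (a + b) * c.
add = Node("add", ["a", "b"], lambda a, b: a + b)
mul = Node("mul", ["sum", "c"], lambda s, c: s * c)
add.connect(mul, "sum")

mul.receive("c", 10)   # mul cannot fire yet: "sum" is still missing
add.receive("b", 3)    # add cannot fire yet: "a" is still missing
add.receive("a", 2)    # add fires, which in turn lets mul fire
```

Nothing in the calling code says "now run mul"; the arrival of the data triggers it.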

Dataflow programs are generally represented very differently inside the computer as well. A traditional program is just what it seems: a series of instructions that run one after the other. A dataflow program might instead be implemented as a big hash table, with uniquely identified inputs as the keys, and pointers to the code as data. When any operation completes, the program scans down the list of operations until it finds the first operation where all of the inputs are currently valid, and runs it. When that operation finishes it will typically put data into one or more outputs, thereby making some other operation become valid.
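
One possible reading of this scheme, sketched in Python under stated assumptions: a dictionary maps each operation's input names to its output name and code, and a second dictionary holds the values that are currently valid. The OpTable layout and the run function are hypothetical. A real implementation would also need to handle operations that share the same inputs, which this table cannot represent.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical table layout: input names -> (output name, code to run).
OpTable = Dict[Tuple[str, ...], Tuple[str, Callable[..., Any]]]


def run(table: OpTable, values: Dict[str, Any]) -> Dict[str, Any]:
    """Repeatedly run the first operation whose inputs are all valid."""
    values = dict(values)    # data that is currently valid
    pending = dict(table)    # operations that have not run yet
    while pending:
        for inputs, (output, func) in list(pending.items()):
            if all(name in values for name in inputs):
                values[output] = func(*(values[name] for name in inputs))
                del pending[inputs]
                break        # rescan from the top after each completion
        else:
            raise RuntimeError("no operation has all of its inputs valid")
    return values


program: OpTable = {
    # Listed first, but it can only run after "total" becomes valid.
    ("total", "c"): ("product", lambda total, c: total * c),
    ("a", "b"): ("total", lambda a, b: a + b),
}
result = run(program, {"a": 2, "b": 3, "c": 10})
```

The order in which operations are listed does not determine the order in which they run; the availability of their inputs does.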

For parallel operation only the list needs to be shared; the list itself is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime instead. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
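
The idea of swapping the runtime without touching the program can be sketched as follows. Both functions are hypothetical illustrations: run_serial executes one ready operation at a time with no threading machinery, while run_parallel submits every ready operation of a wave to a thread pool from Python's standard concurrent.futures module. The threads here only illustrate the structure; whether they yield a real speed-up depends on the Python implementation and on the work done by each operation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical operation record: (input names, output name, code to run).
Operation = Tuple[Tuple[str, ...], str, Callable[..., Any]]


def ready(ops: List[Operation], values: Dict[str, Any]) -> List[Operation]:
    """Return the operations whose inputs are all currently valid."""
    return [op for op in ops if all(name in values for name in op[0])]


def run_serial(ops: List[Operation], values: Dict[str, Any]) -> Dict[str, Any]:
    """Single-core runtime: run ready operations one at a time, no threads."""
    values, remaining = dict(values), list(ops)
    while remaining:
        runnable = ready(remaining, values)
        if not runnable:
            raise RuntimeError("no operation has all of its inputs valid")
        inputs, output, func = runnable[0]
        values[output] = func(*(values[name] for name in inputs))
        remaining.remove(runnable[0])
    return values


def run_parallel(ops: List[Operation], values: Dict[str, Any],
                 workers: int = 4) -> Dict[str, Any]:
    """Parallel runtime: run every ready operation of a wave concurrently."""
    values, remaining = dict(values), list(ops)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            runnable = ready(remaining, values)
            if not runnable:
                raise RuntimeError("no operation has all of its inputs valid")
            futures = [
                (op, pool.submit(op[2], *(values[name] for name in op[0])))
                for op in runnable
            ]
            for op, future in futures:
                values[op[1]] = future.result()   # wait for the wave to finish
                remaining.remove(op)
    return values


# The same program is handed, unchanged, to either runtime.
program: List[Operation] = [
    (("total", "diff"), "product", lambda t, d: t * d),
    (("a", "b"), "total", lambda a, b: a + b),
    (("a", "b"), "diff", lambda a, b: a - b),
]
serial_result = run_serial(program, {"a": 5, "b": 3})
parallel_result = run_parallel(program, {"a": 5, "b": 3})
```

In the parallel version, "total" and "diff" are independent and are submitted in the same wave; "product" waits until both are valid.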

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos.

Data flow has also been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play a role analogous to variables, fields, and parameters in Java-like programming languages.

History

Dataflow languages were originally developed in order to make parallel programming easier. In Bert Sutherland's 1966 Ph.D. thesis, The On-line Graphical Specification of Computer Procedures, Sutherland created one of the first graphical dataflow programming frameworks. Subsequent dataflow languages were often developed at the large supercomputer labs. One of the most popular was SISAL, developed at Lawrence Livermore National Laboratory. SISAL looks like most statement-driven languages, but each variable should be assigned only once. This allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC (Single Assignment C), which tries to remain as close to the popular C programming language as possible.
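
Why single assignment makes inputs and outputs easy to identify can be shown with a small sketch. It uses Python syntax as a stand-in (it is not SISAL or SAC code) and Python's standard ast module to read a list of simple assignments. The dependencies function is a hypothetical helper: because every name is assigned exactly once, the names read by its defining expression are its inputs, and the dataflow graph falls out directly.

```python
import ast
from typing import Dict, Set


def dependencies(source: str) -> Dict[str, Set[str]]:
    """Map each assigned name to the names its defining expression reads."""
    deps: Dict[str, Set[str]] = {}
    for stmt in ast.parse(source).body:
        simple = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not simple:
            raise ValueError("only `name = expression` statements are supported")
        target = stmt.targets[0].id
        if target in deps:
            # A second assignment would make the producer of `target` ambiguous.
            raise ValueError(f"{target!r} is assigned more than once")
        deps[target] = {
            node.id for node in ast.walk(stmt.value) if isinstance(node, ast.Name)
        }
    return deps


graph = dependencies("s = a + b\nd = a - b\np = s * d\n")
```

Here s and d depend only on a and b, so they could be computed in parallel, while p must wait for both. This sketch treats every name in an expression as a data input, including function names, which a real compiler would distinguish.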

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Ironically, Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.

The most popular dataflow languages are more practical, the most famous being National Instruments LabVIEW. It was originally intended to make linking data between lab equipment easy for non-programmers, but has since become more general purpose. Another is VEE, optimized for use with data acquisition devices like digital voltmeters and oscilloscopes, and source devices like arbitrary waveform generators and power supplies.

Languages

  • Agilent VEE
  • AviSynth scripting language, for video processing
  • Blitzprog concurrent data flow programming language
  • BMDFM (Binary Modular Dataflow Machine)
  • DUP dataflow-based coordination language for POSIX systems
  • Hartmann pipelines
  • JMax - the jMax visual programming environment for building interactive real-time music and multimedia applications
  • LabVIEW / G
  • LAU [French]
  • Lily - Inspired by Max/MSP, but it runs in a browser
  • Lucid
  • Lustre
  • Max/MSP
  • Microsoft Visual Programming Language - A component of Microsoft Robotics Studio designed for robotics programming
  • OpenWire - adds visual dataflow programming capabilities to Delphi via VCL or FireMonkey components and a graphical editor (homonymous binary protocol is unrelated)
  • Oz - now also distributed since 1.4.0
  • Pifagor
  • Prograph
  • Pure Data
  • Quartz Composer - Designed by Apple; used for graphic animations and effects
  • Show and Tell
  • RVC-CAL
  • Simulink
  • SAC (Single Assignment C)
  • SIGNAL (a dataflow-oriented synchronous language enabling multi-clock specifications)
  • SISAL
  • SPACE - AREVA's toolchain for the TELEPERM XS instrumentation and control system used in the nuclear industry
  • Tersus - Visual programming platform (open source)
  • Verilog
  • VHDL
  • Vignette's VBIS language for business processes integration
  • vvvv
  • X10
  • XEE (Starlight) - XML Engineering Environment
  • XProc

Application Programming Interfaces

  • SystemC: Library for C++, mainly aimed at hardware design.
  • "Pervasive DataRush": Java API for Dataflow programming
  • ecto, C++ backend with graph construction/execution in Python

See also

  • Dataflow
  • Actor model
  • Digital signal processing
  • Event-driven programming
  • Flow-based programming
  • Functional reactive programming
  • Incremental computing
  • Partitioned global address space
  • Signal programming
  • Stream processing
  • Yahoo Pipes
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 