NetCDF
NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage is hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR). Unidata is also the chief source of netCDF software, standards development, and updates. The format is an open standard. NetCDF Classic and 64-bit Offset Format are an international standard of the Open Geospatial Consortium.

The project is actively supported by UCAR. Version 4.0, released in 2008, greatly enhanced the data model by allowing the use of the HDF5 data file format. Version 4.1 added support for C and Fortran client access to specified subsets of remote data via OPeNDAP.

The format was originally based on the conceptual model of the NASA CDF but has since diverged and is not compatible with it.

Format description

The netCDF libraries support three different binary formats for netCDF files:
  • The classic format was used in the first netCDF release, and is still the default format for file creation.
  • The 64-bit offset format was introduced in version 3.6.0, and it supports larger variable and file sizes.
  • The netCDF-4/HDF5 format was introduced in version 4.0; it is the HDF5 data format, with some restrictions.
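
As a minimal sketch of how the three formats can be told apart, the first bytes of a file can be compared against known signatures. The function names below are hypothetical and not part of the netCDF libraries. The sketch assumes the HDF5 signature sits at offset 0 (HDF5 also permits it at later offsets), and any HDF5 file, not only a netCDF-4 one, would pass the third check.

```python
# Leading bytes ("magic numbers") that identify each on-disk format.
CLASSIC_MAGIC = b"CDF\x01"           # classic format
OFFSET64_MAGIC = b"CDF\x02"          # 64-bit offset format
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"    # HDF5 signature, used by netCDF-4/HDF5


def detect_format(first_bytes: bytes) -> str:
    """Guess the netCDF binary format from the first bytes of a file."""
    if first_bytes.startswith(CLASSIC_MAGIC):
        return "classic"
    if first_bytes.startswith(OFFSET64_MAGIC):
        return "64-bit offset"
    if first_bytes.startswith(HDF5_MAGIC):
        return "netCDF-4/HDF5"
    return "unknown"


def detect_file_format(path: str) -> str:
    """Read the first eight bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return detect_format(handle.read(8))
```

Because the check needs only the first eight bytes, it works without loading any of the data arrays.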


All formats are "self-describing". This means that there is a header which describes the layout of the rest of the file, in particular the data arrays, as well as arbitrary file metadata in the form of name/value attributes. The format is platform independent, with issues such as endianness being addressed in the software libraries. The data are stored in a fashion that allows efficient subsetting.
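
To illustrate what a self-describing, byte-order-independent header can look like, the sketch below builds and then parses a simplified classic-format header that declares dimensions only. It assumes the published classic layout (magic bytes, a record count, then dimension, attribute and variable lists, all stored as big-endian 4-byte words). The helper names are hypothetical, and the attribute and variable lists that real files carry are left empty here.

```python
import struct

NC_DIMENSION = 0x0A  # tag that introduces a non-empty dimension list


def pack_name(name: str) -> bytes:
    """Encode a name as a 4-byte length plus its bytes, padded to a multiple of 4."""
    raw = name.encode("utf-8")
    padding = (-len(raw)) % 4
    return struct.pack(">i", len(raw)) + raw + b"\x00" * padding


def build_header(dims: dict) -> bytes:
    """Build a header holding only dimensions (no attributes, no variables)."""
    absent = struct.pack(">ii", 0, 0)  # an empty list is two zero words
    if dims:
        dim_list = struct.pack(">ii", NC_DIMENSION, len(dims))
        for name, length in dims.items():
            dim_list += pack_name(name) + struct.pack(">i", length)
    else:
        dim_list = absent
    numrecs = struct.pack(">i", 0)  # no records written yet
    return b"CDF\x01" + numrecs + dim_list + absent + absent


def read_dimensions(header: bytes) -> dict:
    """Recover dimension names and lengths from a header built above."""
    if header[:4] != b"CDF\x01":
        raise ValueError("not a classic-format header")
    tag, count = struct.unpack_from(">ii", header, 8)
    dims = {}
    if tag != NC_DIMENSION:
        return dims  # dimension list is absent
    offset = 16
    for _ in range(count):
        (name_len,) = struct.unpack_from(">i", header, offset)
        offset += 4
        name = header[offset:offset + name_len].decode("utf-8")
        offset += name_len + (-name_len) % 4  # skip the name and its padding
        (length,) = struct.unpack_from(">i", header, offset)
        offset += 4
        dims[name] = length
    return dims


# A length of 0 marks the record (unlimited) dimension in the classic format.
header = build_header({"time": 0, "lat": 73, "lon": 144})
dimensions = read_dimensions(header)
```

The explicit `>` in every `struct` format string fixes the byte order, so the same bytes are produced and understood on little-endian and big-endian machines alike.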

Starting with version 4.0, the netCDF API allows the use of the HDF5 data format. NetCDF users can create HDF5 files with benefits not available with the earlier netCDF formats, such as much larger files and multiple unlimited dimensions.

Full backward compatibility in accessing old netCDF files and using previous versions of the C and Fortran APIs is supported.

Access libraries

The software libraries supplied by UCAR provide read-write access to netCDF files, encoding and decoding the necessary arrays and metadata. The core library is written in C and provides an API for C and C++, plus two APIs for Fortran applications: one for Fortran 77 and one for Fortran 90. An independent implementation, also developed and maintained by Unidata, is written entirely in Java; it extends the core data model and adds functionality. Interfaces to netCDF based on the C library are also available in other languages including R (ncdf and ncvar packages), Perl, Python, Ruby, MATLAB, IDL, and Octave. The specification of the API calls is very similar across the different languages, apart from inevitable differences of syntax. The API calls for version 2 were rather different from those in version 3, but are also supported by versions 3 and 4 for backward compatibility. Application programmers using supported languages need not normally be concerned with the file structure itself, even though the formats are openly specified.

Applications

A wide range of application software has been written which makes use of netCDF files. These range from command line utilities to graphical visualization packages. A number are listed below, and a longer list is on the UCAR website.
  • A commonly used set of Unix command line utilities for netCDF files is the NetCDF Operators (NCO) suite, which provides a range of commands for manipulation and analysis of netCDF files, including basic record concatenating, slicing and averaging.
  • ncBrowse is a generic netCDF file viewer that includes Java graphics, animations and 3D visualizations for a wide range of netCDF file conventions.
  • ncview is a visual browser for netCDF format files. This program is a simple, fast, GUI-based tool for visualising fields in a netCDF file. One can browse through the various dimensions of a data array, taking a look at the raw data values. It is also possible to change color maps, invert the data, etc.
  • Panoply is a netCDF file viewer developed at the NASA Goddard Institute for Space Studies which focuses on presentation of geo-gridded data. It is written in Java and thus platform independent. Although its feature set overlaps with ncBrowse and ncview, Panoply is distinguished by offering a wide variety of map projections and the ability to work with different scale color tables.
  • The NCAR Command Language is used to analyze and visualize data in netCDF files (among other formats).
  • PyNIO is a Python programming language module that allows read and/or write access to a variety of data formats, including netCDF.
  • Ferret is an interactive computer visualization and analysis environment designed to meet the needs of oceanographers and meteorologists analyzing large and complex gridded data sets. Ferret offers a Mathematica-like approach to analysis; new variables may be defined interactively as mathematical expressions involving data set variables. Calculations may be applied over arbitrarily shaped regions. Fully documented graphics are produced with a single command.
  • nCDF_Browser is a visual nCDF browser, written in the IDL programming language. Variables, attributes, and dimensions can be immediately downloaded to the IDL command line for further processing. All the Coyote Library files necessary to run nCDF_Browser are available in the zip file.
  • ArcGIS versions after 9.2 support netCDF files that follow the Climate and Forecast Metadata Conventions and contain rectilinear grids with equally-spaced coordinates. The Multidimensional Tools toolbox can be used to create raster layers, feature layers, and table views from netCDF data in ArcMap, or convert feature, raster, and table data to netCDF.
  • Origin 8 software imports netCDF files as matrix books where each book can hold a 4D array. Users can select a subset of the imported data to make surface, contour or image plots.
  • The Geospatial Data Abstraction Library (GDAL) provides support for read and write access to netCDF data.

Common uses

NetCDF is commonly used in climatology, meteorology and oceanography applications (e.g., weather forecasting, climate change) and GIS applications.

It is an input/output format for many GIS applications, and for general scientific data exchange. The project site describes it as follows: "NetCDF (network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data."

Conventions

The Climate and Forecast (CF) conventions are metadata conventions for earth science data, intended to promote the processing and sharing of files created with the NetCDF Application Programmer Interface (API). The conventions define metadata that are included in the same file as the data (thus making the file "self-describing"), that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data (including information about grids, such as grid cell bounds and cell averaging methods). This enables users of data from different sources to decide which data are comparable, and allows building applications with powerful extraction, regridding, and display capabilities.
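
As a rough illustration of how such metadata lets users decide which data are comparable, the sketch below models each variable's attributes as a plain dictionary and compares two CF attributes, `standard_name` and `units`. The `comparable` function and the sample variables are hypothetical; a real tool would read the attributes from the files and would typically convert between compatible units rather than require identical strings.

```python
def comparable(var_a: dict, var_b: dict) -> bool:
    """Treat two variables as comparable when both carry the same
    CF-style standard_name and units attributes."""
    keys = ("standard_name", "units")
    return all(
        key in var_a and var_a.get(key) == var_b.get(key)
        for key in keys
    )


# Attribute sets as they might appear on variables from three different sources.
model_temperature = {
    "standard_name": "air_temperature",
    "units": "K",
    "cell_methods": "time: mean",
}
station_temperature = {"standard_name": "air_temperature", "units": "K"}
sea_temperature = {"standard_name": "sea_surface_temperature", "units": "K"}

same_quantity = comparable(model_temperature, station_temperature)
different_quantity = comparable(model_temperature, sea_temperature)
```

Here the first pair describes the same physical quantity in the same units, while the second pair shares units but not the quantity, so only the first pair is treated as comparable.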

Parallel-NetCDF

An extension of netCDF for parallel computing called Parallel-NetCDF (or PnetCDF) has been developed by Argonne National Laboratory and Northwestern University. This is built upon MPI-IO, the I/O extension to MPI communications. Using the high-level netCDF data structures, the Parallel-NetCDF libraries can make use of optimizations to efficiently distribute the file read and write operations between multiple processors. The Parallel-NetCDF package can read and write only the classic and 64-bit offset formats; it cannot read or write the HDF5-based format available with netCDF-4.0. The Parallel-NetCDF package uses different, but similar, APIs in Fortran and C.

Parallel I/O in the Unidata netCDF library has been supported since release 4.0, for HDF5 data files. Since version 4.1.1 the Unidata NetCDF C library supports parallel I/O to classic and 64-bit offset files using the Parallel-NetCDF library, but with the NetCDF API.
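
The idea of distributing reads and writes between processors can be sketched without any MPI code: each process works on its own contiguous block of one dimension, described by a start index and a count. The `partition` helper below is a hypothetical illustration of such a block decomposition and is not part of Parallel-NetCDF or the Unidata library.

```python
def partition(length: int, nprocs: int) -> list:
    """Split `length` indices along one dimension into contiguous
    (start, count) blocks, one per process, as evenly as possible."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    base, extra = divmod(length, nprocs)
    blocks = []
    start = 0
    for rank in range(nprocs):
        # The first `extra` ranks take one additional index each.
        count = base + (1 if rank < extra else 0)
        blocks.append((start, count))
        start += count
    return blocks


# Ten time steps shared between four processes.
blocks = partition(10, 4)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

In a parallel program, each process would pass its own (start, count) pair to the library's read or write call, so that the blocks together cover the whole dimension without overlapping.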

Interoperability of C/Fortran/C++ libraries with other formats

The netCDF C library, and the libraries based on it (Fortran 77 and Fortran 90, C++, and all third-party libraries) can, starting with version 4.1.1, read some data in other data formats. Data in the HDF5 format can be read, with some restrictions. Data in the HDF4 format can be read by the netCDF C library if created using the HDF4 Scientific Data (SD) API.

NetCDF-Java common data model

The NetCDF-Java library currently reads the following file formats and remote access protocols:
  • BUFR Format Documentation (ongoing development)
  • CINRAD level II (Chinese Radar format)
  • DMSP (Defense Meteorological Satellite Program)
  • DORADE radar file format
  • GINI (GOES Ingest and NOAAPORT Interface) image format
  • GEMPAK gridded data
  • GRIB version 1 and version 2 (ongoing work on tables)
  • GTOPO 30-sec elevation dataset (USGS)
  • Hierarchical Data Format (HDF4, HDF-EOS, HDF5, HDF5-EOS)
  • NetCDF (classic and large format)
  • NetCDF-4 (built on HDF5)
  • NEXRAD Radar level 2 and level 3.


There are a number of other formats in development. Since each of these is accessed transparently through the NetCDF API, the NetCDF-Java library is said to implement a Common Data Model for scientific datasets.

The Common Data Model has three layers, which build on top of each other to add successively richer semantics:
  1. The data access layer, also known as the syntactic layer, handles data reading.
  2. The coordinate system layer identifies the coordinates of the data arrays. Coordinates are a completely general concept for scientific data; specialized georeferencing coordinate systems, important to the Earth Science community, are specially annotated.
  3. The scientific data type layer identifies specific types of data, such as grids, images, and point data, and adds specialized methods for each kind of data.
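
The layering above can be sketched in a few lines, with each layer built on the one below it. The class and method names here are hypothetical and are not the NetCDF-Java API; the sketch assumes a single one-dimensional variable with one coordinate axis.

```python
class DataAccess:
    """Layer 1 (syntactic): returns raw arrays by variable name."""

    def __init__(self, arrays: dict):
        self._arrays = arrays

    def read(self, name: str) -> list:
        return self._arrays[name]


class CoordinateSystem:
    """Layer 2: records which array holds the coordinates of each variable."""

    def __init__(self, access: DataAccess, axis_of: dict):
        self.access = access
        self._axis_of = axis_of  # data variable name -> coordinate variable name

    def coordinates(self, name: str) -> list:
        return self.access.read(self._axis_of[name])


class Grid1D:
    """Layer 3: a scientific data type with a method that works in coordinate space."""

    def __init__(self, system: CoordinateSystem, name: str):
        self._system = system
        self._name = name

    def value_at(self, coordinate: float) -> float:
        """Return the data value whose coordinate is nearest to `coordinate`."""
        coords = self._system.coordinates(self._name)
        values = self._system.access.read(self._name)
        nearest = min(range(len(coords)), key=lambda i: abs(coords[i] - coordinate))
        return values[nearest]


access = DataAccess({"lat": [30.0, 40.0, 50.0], "temperature": [290.0, 285.0, 280.0]})
grid = Grid1D(CoordinateSystem(access, {"temperature": "lat"}), "temperature")
value = grid.value_at(41.0)  # looked up by latitude, not by array index
```

The caller of `value_at` never handles array indices: the lowest layer reads arrays, the middle layer attaches coordinates to them, and the top layer answers questions posed in coordinate space.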


The data model of the data access layer is a generalization of the NetCDF-3 data model, and substantially the same as the NetCDF-4 data model. The coordinate system layer implements and extends the concepts in the Climate and Forecast Metadata Conventions. The scientific data type layer allows data to be manipulated in coordinate space, analogous to the Open Geospatial Consortium specifications. The identification of coordinate systems and data typing is ongoing, but users can plug in their own classes at runtime for specialized processing.

See also

  • Common Data Format (CDF)
  • CGNS (CFD General Notation System)
  • EAS3 (Ein-Ausgabe-System)
  • FITS (Flexible Image Transport System)
  • GRIB (GRIdded Binary)
  • Hierarchical Data Format (HDF)
  • OPeNDAP client-server protocols
  • Tecplot binary files
  • XMDF (eXtensible Model Data Format)

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 