Appendix B - Standards for Near-Term and Longer-Term Missions
Near-Term Missions Standards Recommendations
July 30, 2002
ESDSWG Near-Term Mission Standard Study Team
Contributors
- Richard Ullman, NASA/GSFC, Study Team Lead
- Jingli Yang, ERT, Study Team
- Cheryl Craig, NCAR, Study Team
- John Evans, GST, Study Team
- Larry Klein, L-3 Analytics, Study Team
- Dorian Shuford, ERT, Study Team
- Siri Jodha Singh Khalsa, L-3 Analytics, Study Team
- Matt Smith, UAH, Study Team
Table of Contents
1.0 INTRODUCTION
1.1 ESDSWG GOALS AND STRATEGY
1.2 THE RATIONALE FOR STANDARDS
1.3 ASSUMPTIONS
1.4 METHODOLOGY
2.0 NEAR-TERM MISSION AND HERITAGE MISSION STANDARDS
2.1 ESDSWG NEAR-TERM MISSIONS
2.2 HERITAGE MISSION STANDARDS
3.0 LESSONS LEARNED
3.1 LESSONS LEARNED ON IMPLEMENTING AND USING NASA EOS STANDARDS
3.1.1 Landsat 7
3.1.2 TERRA
3.1.3 AQUA
3.1.4 AURA
3.1.5 QuikSCAT/SeaWinds
3.1.6 ACRIM
3.1.7 SeaWiFS
3.1.8 Jason-1
3.1.9 AVHRR
3.2 LESSONS LEARNED ON IMPLEMENTING AND USING OTHER STANDARDS
3.2.1 NOAA Standards
3.2.2 The Spatial Data Transfer Standard (SDTS)
4.0 ESSENTIAL STANDARDS CONCEPTS
4.1 A COMPARISON WITH PRIVATE, AD-HOC, BINARY INFORMATION TRANSFER
4.2 MANDATORY VS. OPTIONAL ELEMENTS, PROFILES AND EXTENSIONS
4.3 ABSTRACT VS. IMPLEMENTATION STANDARDS
4.4 CONTENT AND FORMAT VS. BEHAVIOR AND INTERFACE
4.5 WEB-BASED DATA SERVICE STANDARDS
5.0 STANDARDS EVALUATION
5.1 EVALUATION CRITERIA
5.2 DATA STANDARDS EVALUATION
5.3 METADATA AND DOCUMENTATION STANDARDS EVALUATION
5.4 USER SURVEYS
5.4.1 Data Format Standards
5.4.2 Metadata Format Standards
6.0 SUMMARY
7.0 CONCLUSIONS
7.1 DATA INTERFACE STANDARDS RECOMMENDATIONS
7.2 DATA PACKAGING STANDARDS
7.2.1 Data Distribution Formats Recommendations
7.2.2 Data Interchange Formats Recommendations
7.3 METADATA STANDARDS RECOMMENDATIONS
7.4 DOCUMENTATION STANDARDS RECOMMENDATIONS
7.5 STANDARD EVOLUTION PROCESS & OTHER ACTIVITIES RECOMMENDATIONS
ACRONYM LIST
List of Tables and Figures
FIGURE 1.1.1 SIMPLIFIED ESE NETWORK DATA FLOW
TABLE 2.1.1 ESDSWG NEAR-TERM MISSIONS
TABLE 2.1.2 ESDSWG NEAR-TERM MISSION STANDARDS
TABLE 2.2.3 ESDSWG HERITAGE MISSIONS DATA MANAGEMENT INFORMATION
TABLE 4.4.1 VIEWPOINTS AND LEVELS OF ABSTRACTION
TABLE 4.4.2 CRITERIA FOR FORMAT STANDARDS
TABLE 4.4.3 CRITERIA FOR INTERFACE STANDARDS
TABLE 4.4.4 DATA MODELS AND SOFTWARE ACCESS LIBRARIES
TABLE 5.2.1 DATA STANDARDS INTEROPERABILITY
TABLE 5.2.2 DATA STANDARDS AVAILABILITY
TABLE 5.2.3 DATA STANDARDS PORTABILITY
TABLE 5.2.4 DATA STANDARDS EVOLVABILITY
TABLE 5.2.5 DATA STANDARDS EXTENSIBILITY
TABLE 5.2.6 DATA STANDARDS SELF-DESCRIBING
TABLE 5.2.7 DATA STANDARDS TOOLS SUPPORT
TABLE 5.2.8 SEMANTIC COMPLETENESS
TABLE 5.2.9 DATA STANDARDS EVALUATION
TABLE 5.3.1 METADATA AND DOCUMENTATION STANDARDS EVALUATION
TABLE 5.4.1 SURVEY RATINGS OF ATTRIBUTE IMPORTANCE
TABLE 5.4.2 DATA STANDARDS SURVEY EVALUATION
TABLE 5.4.3 SUMMARY OF SURVEY ESSAY QUESTIONS
TABLE 5.4.4 METADATA STANDARDS SURVEY EVALUATION
TABLE 5.4.5 SUMMARY OF METADATA SURVEY ESSAY QUESTIONS
1.0 Introduction
1.1 ESDSWG Goals and Strategy
ESDSWG, previously called NewDISS, involves the Strategic Evolution of
the Earth Science Enterprise Data Systems to serve research and
application needs over the next ten years. Its primary goal is to
support NASA's Earth Science Enterprise (ESE), which, in turn,
contributes to the US Global Change Research Program (USGCRP). As such,
ESDSWG is driven principally by the objectives of scientific research,
but it must also serve a wide variety of practical applications.
Future ESE data systems will consist of a heterogeneous mix of
interdependent components derived from the contributions of numerous
individuals and institutions. These widely varying participants will be
responsible for data management functions including data acquisition
and synthesis; access to data and services; and data stewardship.
"An important premise underlying the operation of [the ESE network of
data systems and services] is that its various parts should have
considerable freedom in the ways in which they implement their
functions and capabilities. Implementation will not be centrally
developed, nor will the pieces developed be centrally managed. However,
every part of [the ESE network] should be configured in such a way that
data and information can be readily transferred to any other. This will
be achieved primarily through the adoption of common standards and
practices [1]."
Figure 1.1.1 is a simplified data flow diagram of the ESE network of
data systems and services [1]. The diagram shows five types of data
centers: Backbone Processing Centers, PI-managed Mission Data Centers,
Science Data Centers, Applications Data Centers, and Multimission Data
Centers. For simplicity, several data flows are omitted, such as those
between PI-managed Mission Data Centers and Multimission Data Centers
(in both directions), between Science Data Centers and Applications
Data Centers (in both directions), among Science Data Centers, and
among PI-managed Mission Data Centers. The diagram identifies four
types of data flow. Internal data flow refers to data flow inside each
data center. L0 or spacecraft data flow refers to the flow of
spacecraft or Level 0 data between mission operations, PI-managed
Mission Data Centers or Multimission Data Centers, and Backbone or
Long-Term Archive Data Centers. Distribution flow denotes data
distribution to end-users. System interchange flow denotes data
exchange between data centers. As Figure 1.1.1 suggests, the ESE
network opens numerous new channels through which Earth Science
satellite data streams can reach the user community. These data streams
will flow to users both directly from mission data processing centers
and via many intermediate information providers.
Figure 1.1.1 Simplified ESE Network Data Flow (Adapted from Figure
C-2 [1])
The ESDSWG Near-Term Missions Standards (NTMS) study group is tasked
with making recommendations for the use of standards by the ESE
near-term missions (described in the Appendix, Section 1.0). These
standards are not meant to prescribe how each near-term mission manages
its internal data flow or its L0 or spacecraft data flow. Instead, the
recommended standards pertain to data distribution to end-users and to
data interchange among the components of the ESE network of data
systems and services (i.e., between the different data centers shown
in Figure 1.1.1).
1.2 The Rationale for Standards
Standards aid in interoperability between data systems and facilitate
access by users and the software they use. The successful adoption and
use of standards for the ESE network of data systems would reduce the
cost and enhance the efficiency of data system development and
maintenance. Use of standards for interchange among ESE data and
service components also makes it easy for new data and service
providers to join the ESE network of data systems without negotiating
one-to-one agreements with every other participant. The standards that
the NTMS study group is addressing include data packaging standards,
content standards, data service interface standards, metadata
standards, and documentation standards, as defined below.
- Data Packaging Standards define how to package or encode
data that is stored on a computer or transferred from one system to
another. Software libraries may be available to facilitate decoding,
encoding, or manipulating data packaged in a particular way.
- Content Standards for data or metadata define the
information elements and their intended meaning (semantics),
independently of how these elements may be encoded in files (their
syntax). Two or more encodings of the same content standard can be
mapped (machine-translated) to each other with no loss of information.
- Data Service Interface Standards specify data access
requests and service invocations between ESE data and services
components, usually over a network. These interface standards are
defined independently of the data's packaging (encoding). Web service
standards, driven by electronic commerce and other markets, are a
particularly promising class of service interface standards in the
World Wide Web context.
- Metadata / Documentation Standards provide a common
lexicon and a set of attributes describing data to ensure that users
can 1) find the data in catalogs, registries, and other indexes; 2)
interpret the data unambiguously; and 3) apply system services
correctly. Metadata is usually highly structured and formalized,
whereas documentation usually refers to less structured, free-text
descriptions. Most metadata and documentation standards are content
standards (format-independent); XML is a popular encoding for metadata.
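To make the content-standard idea concrete, the following is a minimal sketch in Python, assuming a hypothetical content standard with three elements (`title`, `start_time`, `platform`, invented for illustration rather than taken from FGDC or ECS). The same content is encoded once as XML and once as `name = value` text, and the two encodings are translated into each other through the shared element list, so nothing is lost in either direction.

```python
import xml.etree.ElementTree as ET

# Hypothetical content standard: the element names and their meaning are
# fixed here, independently of any file syntax. Names are illustrative only.
CONTENT_ELEMENTS = ("title", "start_time", "platform")


def to_xml(record):
    """Encode a record that follows the content standard as XML text."""
    root = ET.Element("metadata")
    for name in CONTENT_ELEMENTS:
        ET.SubElement(root, name).text = record[name]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Decode the XML encoding back into content elements."""
    root = ET.fromstring(text)
    return {name: root.findtext(name) for name in CONTENT_ELEMENTS}


def to_keyvalue(record):
    """Encode the same content as 'name = value' lines (single-line values assumed)."""
    return "\n".join(f"{name} = {record[name]}" for name in CONTENT_ELEMENTS)


def from_keyvalue(text):
    """Decode the 'name = value' encoding back into content elements."""
    pairs = (line.split(" = ", 1) for line in text.splitlines())
    return {name: value for name, value in pairs}


def xml_to_keyvalue(text):
    """Translate XML to key/value text by way of the shared content elements."""
    return to_keyvalue(from_xml(text))


example = {"title": "Example granule", "start_time": "2001-06-01T00:00:00Z",
           "platform": "Terra"}
print(xml_to_keyvalue(to_xml(example)))
```

Because both encodings are driven by the same element list, supporting another syntax means writing one encoder/decoder pair rather than a separate translator for every pair of formats.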
For years, various satellite missions and scientific communities have
found ways to use each other's data, but stable, rich standards can
further promote opportunities in research and applications for data
users worldwide. The evolution of these standards over the past 25
years or so has largely been driven by specific science communities
seeking to make their own work easier. The past 10 years or so have
seen ever wider global scientific communities tied together through
the Internet, with the goal of still faster data exchange and,
hopefully, faster research results. However, the diversity of
available data sources and data standards presents a significant
challenge to Earth science researchers, especially interdisciplinary
Earth scientists.
As almost any researcher can attest, a substantial portion of the
resources required to perform an investigation is spent on locating,
obtaining, reading, and possibly reformatting the necessary data.
Standardization of data formats, metadata, and documentation can lower
the barriers both to data exchange among the components of the ESE
network of data systems and services and to user access to the data
products. The Internet offers a compelling example of the essential
role standards play in facilitating data exchange. Without the
standards that underpin it (TCP/IP, HTML, SMTP, GIF, JPEG, PDF, etc.),
the explosion of information exchange brought about by the Internet
could never have happened.
1.3 Assumptions
This study focuses on near-term missions that are already in
formulation and aims to provide concrete, specific recommendations for
use by the near-term missions. The following assumptions underlie this
study.
- The emerging field of Web Services is driving rapid development
of data format-neutral service interface standards. Examples relevant
to ESE data include the OpenGIS Web Map Service and Web Coverage
Service. However, the use of online services is still only emerging in
practical ESE work; it will take some time before Web Services become a
part of mainstream data access and distribution.
- For the near-term missions, the preferred mode of delivering data
remains the transfer of discrete files. Therefore the file format
itself is critical to the interchange standard.
- Content standards for data (which define information elements and
their intended meaning, or semantics, independently of their syntax)
provide well-known semantics that can support interoperability through
translators or cross-reference tables. The leading developer of such
standards is the Federal Geographic Data Committee (FGDC), which has
developed the Content Standard for Remote Sensing Swath Data and the
Content Standard for Digital Orthoimagery. In practice, however,
content standards alone may not suffice for transferring complex data
between different user communities without information loss or
distortion.
- The processes of standards development and adoption are the
responsibility of the long-term standards study team.
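The OpenGIS Web Map Service named in the first assumption illustrates a format-neutral interface standard: a request names a layer, an area, an image size, and an output format, and the server decides how to produce the result. The following is a minimal sketch of an OGC WMS 1.1.1 GetMap request URL in Python; the endpoint URL and layer name are hypothetical placeholders, and only the standard GetMap parameters are used.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height, image_format="image/png"):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    endpoint and layer are placeholders for a real server and layer name.
    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326 degrees.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value selects the server's default style
        "SRS": "EPSG:4326",      # WMS 1.1.1 uses SRS; WMS 1.3.0 renamed it CRS
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical server and layer, for illustration only.
print(wms_getmap_url("https://example.org/wms", "sea_surface_temp",
                     (-180, -90, 180, 90), 720, 360))
```

Nothing in the request depends on how the server stores its data, which is why such interfaces can sit on top of any of the file formats discussed in this report.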
1.4 Methodology
This document provides recommendations for the use of standards by the
near-term missions. We analyzed what standards are currently in use in
the near-term heritage missions and other EOS missions, posing
questions such as: What are the lessons learned on implementing and
using those standards currently in use? What are the lessons learned
from other government agencies such as NOAA? What criteria should we
use to evaluate different standards? What feedback do data producers
and data users have on standards? What standards do users think NASA
should use in the future? Once we provide recommendations, how can they
be implemented for the near-term missions? What activities should be
supported to facilitate the adoption of the standards?
This report intends to answer these questions. It is based on a
previous report entitled "Near-Term Missions and Standards Survey,"
which examines near-term missions, heritage missions, and the standards
used by the heritage mission data management systems, as well as
several emerging standards. Most of the content of the survey report is
included in the Appendices as background material. In this report,
we present a summary of the heritage missions and standards in use in
the heritage missions. We review lessons learned from implementing and
using standards in heritage missions and in some NOAA missions. We
compare standards based on essential standards concepts. In addition,
we develop a suite of standards evaluation criteria and carry out a
standards analysis. The results from the standards analysis are
presented.
To incorporate data users' and data producers' feedback on the data and
metadata standards currently used in ESE missions, we conducted a user
interview/survey; this report summarizes and analyzes its results.
References:
[1] A 6 to 10 Year Approach to Data Systems and
Services for NASA's Earth Science Enterprise; Draft Version 1.0;
February 2001; Section A.3.
2.0 Near-Term Mission and Heritage Mission Standards
2.1 ESDSWG Near-Term Missions
ESDSWG initially targets the eight near-term missions listed in Table
2.1.1. A detailed description of these missions can be found in the
Appendix, Section 1.
Table 2.1.1 ESDSWG Near-Term Missions
| Mission Name | Phase | Anticipated Launch Date |
|---|---|---|
| Landsat Data Continuity Mission (LDCM) | Formulation | 2005 |
| NPOESS Preparatory Project (NPP) | Formulation | 2005 |
| Ocean Surface Topography Mission (OSTM) | Formulation | 2005 |
| Ocean Vector Winds | Formulation | 2006 |
| Global Precipitation Measurement (GPM) | Formulation | 2007 |
| Solar Irradiance | Formulation | 2007 |
| Carbon Cycle Initiative (CCI) | Pre-Formulation | 2008-2012 |
| Total Column Ozone | Pre-Formulation | N/A |
See Acronym List if needed
Table 2.1.2 summarizes the near-term mission instruments, data formats,
and metadata standards. As the table shows, LDCM, the first near-term
mission, has already selected the data and metadata standards it plans
to use for its data products (specified in the Request For Proposal
(RFP) released in October 2001). Our recommendations for the use of
data, metadata, and data interface standards in near-term missions may
or may not affect LDCM.
Table 2.1.2 ESDSWG Near-Term Mission Standards
| Mission | Instrument | Data Format | Metadata Format |
|---|---|---|---|
| LDCM | Not specified | HDF; GeoTIFF; L7 Fast Format | ECS; FGDC |
| NPP | | N/A | N/A |
| OSTM (or Jason-2) | N/A | N/A | N/A |
| Ocean Winds | SeaWinds | N/A | N/A |
| GPM | Dual Frequency Radar (DFR); Advanced TRMM Microwave Imager (TMI); Nadir-viewing Microwave Radiometer | N/A | N/A |
| Solar Irradiance | N/A | N/A | N/A |
| CCI Mission 1: Pathfinder CO2 | A passive spectrometer | N/A | N/A |
| CCI Mission 2: Ocean Carbon | A rotating scanner telescope | N/A | N/A |
| CCI Mission 3: Low Density Biomass | A hyperspectral imager | N/A | N/A |
| CCI Mission 4: High Density Biomass | A P-band SAR and an imaging laser altimeter | N/A | N/A |
| CCI Mission 5: Advanced Atmospheric CO2 | A pulsed, dual frequency, tunable laser sounder | N/A | N/A |
| Total Column Ozone | Some combination of OMPS-like, TOMS-like, SAGE-like and an IR limb sounder | N/A | N/A |
See Acronym List if needed
2.2 Heritage Mission Standards
Data management information for near-term missions and heritage
missions is presented in Table 2.2.3.
Table 2.2.3 ESDSWG Heritage Missions Data Management Information
| Mission | Heritage Mission | Heritage Instrument | Production Site | Archive Site | Data Format | Metadata Format |
|---|---|---|---|---|---|---|
| LDCM | Landsat 1-7 | | EDC DAAC | EDC DAAC | HDF 4; GeoTIFF; L7 Fast Format | ECS; FGDC |
| NPP | Aqua | | GSFC DAAC | GSFC DAAC | HDF-EOS 4 | ECS |
| | Terra | MODIS | GSFC DAAC; NSIDC DAAC; EDC DAAC | GSFC DAAC; NSIDC DAAC; EDC DAAC | HDF-EOS 4 | ECS |
| OSTM | Jason-1 | Poseidon-2 Radar Altimeter; Jason Microwave Radiometer | | | Native Binary | Custom |
| | Topex/Poseidon | Topex Altimeter; Topex Microwave Radiometer | | | Native Binary for low-level products; netCDF for Level 3 products | Custom |
| Ocean Winds | ADEOS-1 | NSCAT | JPL SeaPAC | PO DAAC | HDF 4 | Adapted ECS |
| | QuikSCAT | SeaWinds | JPL SeaPAC | PO DAAC | | Adapted ECS |
| | ADEOS-2 | SeaWinds | JPL SeaPAC | PO DAAC | | Adapted ECS |
| GPM | TRMM | TMI | GSFC DAAC | GSFC DAAC | HDF 4 | ECS |
| | | VIRS | GSFC DAAC | GSFC DAAC | HDF 4 | ECS |
| | | PR | GSFC DAAC | GSFC DAAC | HDF 4 | ECS |
| | | CERES | LaRC SIP | LaRC DAAC | HDF 4 | ECS |
| | | LIS | GHRC SIP | GHRC | HDF 4 | ECS |
| Solar Irradiance | SNOE | XPS | LASP | LASP | ASCII | Custom |
| | UARS SOLSTICE | | UARS CDHF and GSFC | GSFC DAAC | Native Binary Format | Native SFDU format |
| | ACRIM III | TIM | ACRIM III SIPS | LaRC DAAC | HDF 4 | ECS |
| | EOS SORCE | | LASP SORCE SIP | GSFC DAAC | HDF 5 | ECS |
| CCI | SeaStar | SeaWiFS | GSFC DAAC | GSFC DAAC | | ECS |
| | Terra | MODIS | GSFC DAAC | GSFC DAAC | HDF-EOS 4 | ECS |
| | Nimbus-7 | CZCS | GSFC DAAC | GSFC DAAC | | Native format |
| | VCL | MBLA | Raytheon ITSS | EDC DAAC | Unknown | Unknown |
| Total Column Ozone | Nimbus-7; Meteor-3; ADEOS; Earth Probe; QuikTOMS | TOMS | GSFC DAAC | GSFC DAAC | HDF 4 | ECS |
| | AURA | OMI | GSFC DAAC | GSFC DAAC | HDF 4 for Level 0 and 1; HDF-EOS 5 for Level 2 and up | ECS |
See Acronym List if needed
Several observations can be made from Table 2.2.3:
- Most of the heritage missions use the Hierarchical Data Format
(HDF) or HDF-EOS (Earth Observing System) data formats and the EOSDIS
Core System (ECS) metadata format for archiving and distribution.
Heritage missions that do not use the HDF or HDF-EOS data formats and
the ECS metadata format for product distribution are the Jason-1,
Topex/Poseidon, and the Upper Atmospheric Research Satellite (UARS)
missions. The Jason-1 and Topex/Poseidon missions are heritage missions
to the Ocean Surface Topography Mission. UARS is a heritage mission to
the Solar Irradiance mission.
- Several heritage missions distribute their data products in
multiple data and metadata formats. For example, Landsat missions
distribute their data products in three different data formats, namely
HDF, GeoTIFF, and Fast Format, and two metadata formats, ECS and FGDC
(Federal Geographic Data Committee). SeaWinds data products are
distributed in HDF and BUFR (Binary Universal Form for the
Representation of meteorological data) formats. HDF is used by the NASA
Jet Propulsion Laboratory (JPL) Distributed Active Archive Center
(DAAC) to distribute research data products, while BUFR is used by
NOAA NESDIS (National Environmental Satellite, Data, and Information
Service) to distribute operational data products.
- Data distribution formats for heritage missions consist of HDF,
HDF-EOS, netCDF, GeoTIFF, Fast Format, BUFR, Binary, and ASCII.
Metadata distribution formats for heritage missions include ECS, FGDC,
and custom formats. A survey and critique of different data standards
and metadata standards can be found in the Appendix, Section 2.0 and
Section 3.0, respectively.
3.0 Lessons Learned
This chapter presents lessons learned from past experiences with data
and metadata standards used for NASA ESDSWG heritage missions and NOAA
missions. Some of the lessons learned pertain to past experiences with
developing or implementing the standards, and others are related to
past experiences with using the standards.
3.1 Lessons Learned on Implementing and Using NASA EOS Standards
3.1.1 Landsat 7
Landsat 7 data products are archived in the HDF format but distributed
in three different formats: GeoTIFF, Landsat 7 Fast Format, and HDF.
Based on statistics collected by the EDC DAAC [Earth Resources
Observation System (EROS) Data Center (EDC) Distributed Active Archive
Center (DAAC)] User Services from January 1, 2001, to September 30,
2001, most of the users ordered L-7 data either in Fast Format (46%) or
in GeoTIFF (42%). Only 12% of the users ordered L-7 data in HDF format.
Of the users who ordered data in HDF format, most were from
international ground stations and the data product they ordered was
Level 0R. HDF is the only format available for Level 0R. These
statistics indicate that:
- User communities welcome multiple distribution data formats.
Statistics have shown that users order Landsat 7 data in all three
available formats with the majority (88%) of the users choosing GeoTIFF
or Fast Format. This indicates that for well-developed satellite
mission user communities such as the Landsat data user community,
multiple data distribution formats are needed. Different users choose
different data formats in their applications.
- Heritage mission data distribution formats play an important
role. The majority of Landsat 7 users may choose GeoTIFF or Fast Format
because products from the heritage mission, Landsat 5, are distributed
in Fast Format or GeoTIFF, so users were already familiar with those
two formats. It is natural for users to stay with a format they
already know rather than switch to a new data format such as HDF.
- The GeoTIFF data format is gaining popularity among Geographic
Information System (GIS) users. Landsat Thematic Mapper (TM) data
products (Landsat 4-5) have been distributed in Fast Format since 1984,
and EDC DAAC began distributing Landsat 5 TM data products in GeoTIFF
only in recent years. Even so, based on the statistics collected from
January 1 to September 30, 2001, almost half (42%) of the users ordered
Landsat 7 data products in GeoTIFF format. As GeoTIFF becomes a popular
data format in the GIS user community, EDC DAAC is considering
distributing other land remote sensing data, such as ASTER (Advanced
Spaceborne Thermal Emission and Reflection Radiometer) data products,
in GeoTIFF format in addition to HDF.
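The order shares quoted above can be checked with a few lines of Python. The figures are the EDC DAAC User Services percentages given in the text; the script only confirms that they cover all orders and that the two formats users already knew account for the 88% cited.

```python
# Share of Landsat 7 orders by distribution format, EDC DAAC User Services,
# January 1 to September 30, 2001 (percentages quoted in the text).
order_share_pct = {"Fast Format": 46, "GeoTIFF": 42, "HDF": 12}

total = sum(order_share_pct.values())
familiar = order_share_pct["Fast Format"] + order_share_pct["GeoTIFF"]

print(f"All formats: {total}%")               # All formats: 100%
print(f"Fast Format + GeoTIFF: {familiar}%")  # Fast Format + GeoTIFF: 88%
```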
3.1.2 TERRA
The flagship of NASA's Earth Observing System (EOS), Terra launched on
December 18, 1999 and began collecting science data on February 24,
2000. There are five instruments onboard Terra, namely MODIS, ASTER,
MISR, CERES, and MOPITT (see Acronym List). The data products from
Terra, consisting of a great variety of ocean, atmosphere, and land
data sets, are archived and distributed in HDF-EOS format as required
by the EOS project. Terra metadata conforms to the ECS data model.
In the early 1990s, NASA's Earth Science Data and Information System
(ESDIS) Project began evaluating data format standards in preparation
for the launches of the EOS satellites. In 1993, after careful
consideration of over a dozen different formats, ESDIS chose the
Hierarchical Data Format (HDF) for EOS standard data products. During
the ECS design
phase, it was realized that while HDF was a good format to use for
storing data, further standardization would be advantageous. HDF
provided little convention for associating spatial and temporal
information with the science data itself. To enable additional
standardization, the HDF-EOS data format was developed. This format
adds mechanisms for storing geo-referencing and temporal information,
data organization, and metadata storage.
Terra instrument teams and users have had several problems with
implementing and using the HDF-EOS standard and the ECS data model.
- The HDF-EOS Grid and Swath provided a natural structure for the
bulk of data taken on Terra and other EOS missions; however, there was
no convention for storing individual data values. For example, in the
case of one producer, real numbers are stored in 14 bits and 2
additional bits are used for a special purpose rather than using all 16
bits to store the number. The HDF-EOS library can access these data;
however, translation and other application tools can have problems. If
processing is to be performed on individual words or bits, errors can
occur if the user is not cognizant of the storage method.
- There was no convention for packaging both HDF-EOS and HDF
objects in the same file. Each MODIS (Moderate-Resolution Imaging
Spectroradiometer) Level 2 and Level 3 product is organized
differently. Even though they use HDF-EOS structures to store their
primary data, MODIS standard products include a wide variety of plain
HDF objects. MODIS also uses global and local text attributes to store
non-ECS metadata rather than placing it all in the ArchiveMetadata
attributes as the HDF-EOS design calls for. As a result, software
beyond the HDF-EOS library is required to access the additional
attributes.
- Even though HDF-EOS provides a standard for packaging geolocation
information, there was no detailed standard for actually calculating
this information. For example, some ASTER products are geolocated using
a geoid (geodetic coordinates) while others are geolocated using an
ellipsoid (geocentric coordinates). This is not a priori obvious to
data users.
- HDF-EOS has a steep learning curve. Once that hurdle is overcome,
platform independence and common packaging make access convenient.
However, scientists who are used to flat binary formats complain about
the complexity of HDF-EOS.
- It was a mistake to try to have one HDF-EOS profile fit all
disciplines. In the case of Terra MODIS, this led to unproductive
wrangling, an overly broad profile, and a poor fit for some (perhaps
all) disciplines. The lesson learned is to develop strong
discipline-specific profiles first and address cross-discipline issues
later.
- An important lesson learned from Terra is not to impose immature
standards such as HDF-EOS. All of the following are needed no later
than three years before launch:
  - An expert base before products are defined.
  - Tools to verify proper implementation.
  - Experienced help desk support (and more) to help with
    implementation.
- There have been many mismatches between ESDTs (Earth Science Data
Types) and the metadata output from MODIS production, which has led to
a large number of ingest failures. Quality control on the production
end was lacking, which can be traced to poor version control on the
MODIS processing system side. There would have been no problem if the
MODIS processing team had acquired their Metadata Configuration Files
(MCFs) from the descriptors installed at the DAACs. Instead, the team
modified the MCFs locally and then sent the changes to ECS, so there
could be mismatches between the ESDTs installed at the DAACs and those
MODIS was using. This problem has all but disappeared now that the
MODIS processing team uses only the official MCFs.
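The 14-bit storage case in the first Terra lesson is easy to mishandle. The Python sketch below assumes, purely for illustration, that the value occupies the low 14 bits of an unsigned 16-bit word and the 2 special-purpose bits occupy the high bits; the actual layout used by that producer is not given in the text. A reader that treats the whole word as one integer gets the wrong number whenever a flag bit is set.

```python
# Assumed layout (hypothetical): bits 0-13 hold the value, bits 14-15 hold flags.
VALUE_BITS = 14
VALUE_MASK = (1 << VALUE_BITS) - 1  # 0x3FFF
FLAG_MASK = 0b11


def pack_word(value: int, flags: int) -> int:
    """Combine a 14-bit value and 2 flag bits into one unsigned 16-bit word."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit in 14 bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in 2 bits")
    return (flags << VALUE_BITS) | value


def unpack_word(word: int) -> tuple[int, int]:
    """Split an unsigned 16-bit word into (value, flags) under the assumed layout."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected an unsigned 16-bit word")
    return word & VALUE_MASK, word >> VALUE_BITS


word = pack_word(1000, 0b10)
print(word)               # 33768: what a reader unaware of the layout would see
print(unpack_word(word))  # (1000, 2): the stored value and its flag bits
```

Recording such a layout alongside the data, rather than leaving it implicit in one producer's code, is the kind of convention this lesson says was missing.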
3.1.3 AQUA
Aqua is a NASA Earth Science satellite mission designed mainly to study
Earth's water cycle. Aqua was formerly named EOS PM, signifying its
afternoon equatorial crossing time, as opposed to the morning
equatorial crossing time of Terra. Aqua carries six instruments in a
near-polar, low-Earth orbit: the Atmospheric Infrared Sounder (AIRS),
the Advanced Microwave Sounding Unit (AMSU-A), the Humidity Sounder for
Brazil (HSB), the Advanced Microwave Scanning Radiometer for EOS
(AMSR-E), the Moderate-Resolution Imaging Spectroradiometer (MODIS),
and the Clouds and the Earth's Radiant Energy System (CERES). The MODIS
and CERES instruments are of the same design as those onboard Terra,
which launched in December 1999. The Aqua mission launched in May
2002.
The data format and metadata standards for the AQUA instrument data are
the same as those for TERRA, namely the HDF-EOS and the ECS data model,
respectively. Lessons learned from the AIRS instrument team (Evan
Manning, AIRS principal developer) and the AMSR-E instrument team (Dawn
Conway, University of Alabama in Huntsville, Lead Software Engineer for
the AMSR-E Science Team) on implementing the data and metadata
standards are summarized below.
- In general, using the HDF-EOS standards requires a fair amount of
"buy-in" and has a steep learning curve. Instrument team developers
adapted, but casual users had more trouble. For example, it was
relatively easy for an instrument programmer to produce HDF-EOS files
using the simple API, but many end-users are reluctant to accept or
"buy into" HDF-EOS because it is new. Both the AIRS and the AMSR-E
teams found HDF-EOS very easy to use.
- The HDF-EOS format has adequately supported AIRS and AMSR-E
requirements, but:
  - HDF-EOS should explicitly support field annotations. Without a
    standard, some developers will add their own annotations to
    internal HDF objects.
  - The field/attribute distinction is not clear. It seems that a
    swath attribute is anything that does not have a geolocation
    dimension, whereas the HDF-EOS Swath interface treats anything
    with fewer than two dimensions as an attribute.
- The documentation for HDF-EOS is nearly adequate, but it would
benefit from good sample programs, for example programs that do
something non-trivial such as checking for error conditions.
- The AMSR-E Lead Science Computing Facility (SCF) found
implementation of the required ECS metadata simple and straightforward;
in fact, the AMSR-E team found the Science Data Processing (SDP)
toolkit unnecessary for completing their tasks. It was noted, however,
that the ECS keywords should relate better to the keywords used in the
GCMD (Global Change Master Directory). The AIRS team, by contrast,
encountered several problems implementing the ECS data model:
  - The ECS tools for implementing the ECS metadata standards are
    not easy to use. There are some tricky parts, such as setting
    "hdfattrname" to "coremetadata.0" or "coremetadata" depending on
    whether or not the metadata is embedded. The interface is
    generally confusing.
  - The lead time for adding an ECS Product Specific Attribute or
    changing attribute valids, etc., is too long.
  - Documentation for the ECS data model is not adequate.
  - The AIRS team supported ESDIS's attempts (led by Bob Lutz) to
    add new valids for ScienceQualityFlag. The failure of those
    attempts makes it hard for AIRS to support data access the way
    the team would prefer.
- On a general development note, both teams discovered the
importance of regular, consistent communications (telecons, meetings,
etc.) between the SCF, SIPS (Science Investigator-led Processing
System), DAAC, and ECS.
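The field/attribute ambiguity reported by the AIRS and AMSR-E teams can be made concrete by coding the two rules they describe side by side. In this Python sketch the dimension names (`GeoTrack`, `GeoXTrack`, `Channel`, `Detector`) and the data objects are hypothetical, and the rank-based rule is taken from the teams' description rather than from the HDF-EOS library source. The two rules disagree in both directions.

```python
# Hypothetical geolocation dimension names for a swath.
GEO_DIMS = {"GeoTrack", "GeoXTrack"}


def attribute_by_geolocation(dims: tuple) -> bool:
    """Rule as the teams read it: an attribute has no geolocation dimension."""
    return not any(d in GEO_DIMS for d in dims)


def attribute_by_rank(dims: tuple) -> bool:
    """Rule as described for the HDF-EOS Swath interface: fewer than two dimensions."""
    return len(dims) < 2


# Hypothetical swath objects and their dimensions.
objects = {
    "scan_time": ("GeoTrack",),                        # 1-D, along-track
    "detector_noise": ("Channel", "Detector"),         # 2-D, no geolocation dimension
    "radiance": ("GeoTrack", "GeoXTrack", "Channel"),  # 3-D, geolocated
}

for name, dims in objects.items():
    a = "attribute" if attribute_by_geolocation(dims) else "field"
    b = "attribute" if attribute_by_rank(dims) else "field"
    note = "" if a == b else "  <- rules disagree"
    print(f"{name:15s} geolocation rule: {a:9s} rank rule: {b}{note}")
```

A one-dimensional along-track time array is a field under the first rule but an attribute under the second, while a two-dimensional calibration table is the reverse; an explicit convention would remove this guesswork for readers.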
3.1.4 AURA
Aura is a NASA mission to study the Earth's ozone, air quality, and
climate. This mission is designed exclusively to conduct research on
the composition, chemistry, and dynamics of the Earth's upper and lower
atmosphere by employing multiple instruments on a single satellite.
Aura's chemistry measurements will follow up on measurements that began
with NASA's UARS and will continue the record of satellite ozone data
collected from the TOMS (Total Ozone Mapping Spectrometer) missions.
The satellite will be launched in June 2003 and will operate for five
or more years. The Aura data products will be distributed in HDF-EOS5
format. Aura metadata will conform to the ECS data model.
The HDF file format was designed to be very flexible. It can store
many different types of scientific data in a variety of ways. While
this flexibility is an asset for customized data storage, it hinders
data sharing: two developers storing exactly the same data can
organize it in dramatically different ways. To constrain HDF for use
in the EOS community, HDF-EOS was developed.
While HDF-EOS constrains HDF with its POINT, GRID, and SWATH
interfaces, it is still possible to create two files holding the same
kind of data that require dramatically different readers. Areas
of potential mismatch include:
- Organization of data fields and attributes
- Dimension names
- Geolocation names and dimension ordering
- Data field names and dimension ordering
- Units for data fields
- Attribute names, values, and units
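Once a team agrees on conventions, the kinds of mismatch listed above can be checked mechanically. The Python sketch below is illustrative only: its rules (required geolocation fields Latitude, Longitude, and Time; dimension names nTimes and nLevels; nTimes as the leading dimension; "deg" and "s" units) are hypothetical stand-ins, not the actual Aura guidelines, and a real checker would read field descriptions from an HDF-EOS file instead of building them by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    """Minimal description of one swath field: name, dimension names, units."""
    name: str
    dims: tuple
    units: str

# Hypothetical conventions standing in for a mission's file format guidelines.
REQUIRED_GEOLOCATION = {"Latitude", "Longitude", "Time"}
ALLOWED_DIMS = {"nTimes", "nLevels"}
EXPECTED_UNITS = {"Latitude": "deg", "Longitude": "deg", "Time": "s"}
LEADING_DIM = "nTimes"

def check_swath(geolocation, data):
    """Return a list of guideline violations for one swath (empty if it conforms)."""
    problems = []
    geo_names = {f.name for f in geolocation}
    # Geolocation names: every required field must be present.
    for missing in sorted(REQUIRED_GEOLOCATION - geo_names):
        problems.append(f"missing geolocation field {missing}")
    for f in geolocation + data:
        # Dimension names: only the agreed-upon names are allowed.
        for d in f.dims:
            if d not in ALLOWED_DIMS:
                problems.append(f"{f.name}: non-standard dimension name {d}")
        # Dimension ordering: the time dimension must come first.
        if f.dims and f.dims[0] != LEADING_DIM:
            problems.append(f"{f.name}: first dimension should be {LEADING_DIM}")
        # Units: fields with an agreed unit must use exactly that unit string.
        expected = EXPECTED_UNITS.get(f.name)
        if expected is not None and f.units != expected:
            problems.append(f"{f.name}: units {f.units!r}, expected {expected!r}")
    return problems

if __name__ == "__main__":
    geo = [Field("Latitude", ("nTimes",), "deg"),
           Field("Longitude", ("nTimes",), "deg"),
           Field("Time", ("nTimes",), "s")]
    data = [Field("Temperature", ("nTimes", "nLevels"), "K")]
    print(check_swath(geo, data))  # conforming swath: empty list
    other = [Field("Latitude", ("scanline",), "degrees_north"),
             Field("Longitude", ("scanline",), "deg")]
    for p in check_swath(other, []):
        print(p)
```

Two instruments that both pass such a check can share one reader, which is the point of agreeing on the guidelines in the first place.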
When the Aura Data System Working Group (DSWG) reviewed the proposed
structure of the Level 2 data files from each instrument, it found
that the instruments' data files at times differed considerably. The
DSWG agreed that, with a little work, it was possible to adopt a
uniform set of file format guidelines and that it was advantageous to
do so. One of the main advantages of this standard is that users can
apply the same set of tools and I/O routines to the Level 2 data from
any Aura instrument. At the time of this writing, the "HDF-EOS Aura
File Format Guidelines" have been adopted by all of the EOS Aura
instrument teams. The guidelines contain detailed, specific information
on how to store data, and they address each of the items listed above.
Because Aura has not yet launched, the outcome of this endeavor cannot
yet be judged, but the expectation is that adopting a uniform set of
strict guidelines will yield many benefits. The current guidelines can
be found at:
http://www.eos.ucar.edu/hirdls/HDFEOS_Aura_File_Format_Guidelines.doc
(Microsoft Word version)
http://www.eos.ucar.edu/hirdls/HDFEOS_Aura_File_Format_Guidelines.pdf
(Adobe Acrobat format)
3.1.5 QuikSCAT/SeaWinds
The SeaWinds instrument on the QuikSCAT satellite is a specialized
microwave radar that measures near-surface wind speed and direction
under all weather conditions and cloud cover. It was launched in 1999
as a follow-on mission to the NASA scatterometer (NSCAT), which flew on
the Japanese ADEOS-1 (Advanced Earth Observing Satellite) platform
during 1996-1997, and to the Seasat-A scatterometer system (SASS), which
flew in 1978.
A unique feature of the QuikSCAT/SeaWinds mission is that SeaWinds data
are processed, archived, and distributed at both NASA JPL and NOAA
NESDIS. SeaWinds data are downloaded from QuikSCAT once every orbit
(101 minutes). The data stream passes from the receiving ground station
to the Central Standard Autonomous File Server (C-SAFS) at Goddard
Space Flight Center (GSFC), which forwards the data to both JPL and
NOAA. JPL uses these data to produce its science-level wind product,
while NOAA uses a modified version of JPL's processing to produce its
own Near Real Time (NRT) wind product. This dichotomy can be summarized
as follows:
- While the processing software used at NASA JPL and at NOAA NESDIS
is essentially the same, data products produced at JPL are research
products (with higher accuracy) used by the research and applications
communities, while data products from NOAA are near real-time products
(within 3 hours of observation) targeted at operational users such as
the National Weather Service (NWS).
- The SeaWinds products distributed by JPL are in HDF format while
data products distributed by NOAA NESDIS are in BUFR format. This is
because many operational and modeling users use the WMO (World
Meteorological Organization) data standards, BUFR and GRiB (GRidded
Binary). NOAA is required to provide data to its operational users in
BUFR/GRiB format. The current plan is to move the NRT processing from
NOAA to the Physical Oceanography (PO) DAAC at JPL, starting with the
ADEOS-II mission in 2002.
3.1.6 ACRIM
For ACRIM, using HDF-EOS was required; however, since mapping the
terrain of the Earth was not necessary (ACRIM is solar pointing), the
EOS part did not apply. In effect, ACRIM used a subset of HDF. Because
ACRIM used HDF in a limited fashion, adequate tools were available, but
the team still had to learn almost everything about HDF in order to
determine which functions it actually needed. Overall, HDF was
relatively easy to implement. The team identified the following as
things that would have helped its implementation of HDF:
- An instruction manual - "What would have been helpful is a manual
with step-by-step instructions; it could have been a quicker
implementation."
- Help desk - "Having someone who could spend a little time over
the phone would have been very helpful."
- Resolving the problems encountered when creating HDF files with
REAL and INTEGER values.
[Frank Boecherer, ACRIM Science Computing Facility, Personal
Communication, June 2002]
3.1.7 SeaWiFS
Ten years ago, when SeaWiFS was in development, HDF lacked some
capabilities that the project needed. In the beginning, HDF was
largely an image format; it supported only a limited number of data
sets, and only floating-point numbers. The SeaWiFS team identified
these deficiencies early on, documented them in reports, and received
responses from the National Center for Supercomputing Applications
(NCSA). As a result, HDF became friendlier and easier to use. In
addition, the parallel development of HDF support in IDL allowed users
to write their own HDF tools. The main lesson learned from
implementing HDF in the SeaWiFS project was that good user support is
essential. The group at NCSA responded to all of the team's needs at
the time. "That was the thing that made it work - user support, help
desk." [Fred Patt, SAIC Project Manager, Personal Communication, June
2002]
SeaDAS (SeaWiFS Data Analysis System) is a comprehensive image analysis
package for the processing, display, analysis, and quality control of
all SeaWiFS data products, as well as ADEOS/OCTS (Advanced Earth
Observing Satellite / Ocean Color and Temperature Scanner, Japan), MOS
(Modular Optoelectronic Scanner, Germany), CZCS (Coastal Zone Color
Scanner, NASA), and ancillary (meteorological, ozone) data. HDF facilitated the
development of this powerful tool. The versatility of HDF also allows
individuals to develop their own uses within the SeaDAS system. HDF was
mandated for the SeaWiFS project because EOS was still under
development, and SeaWiFS was to pave the way for future missions. One
lesson learned is: allow time to develop tools (or preferably use
existing tools) to facilitate ease of use. [Jim Acker, DAAC User
Support, Personal Communication, June 2002]
3.1.8 Jason-1
For Jason-1, binary was chosen as the primary data product format for
historical reasons (continuity). The main advantage of binary is that
it is fast and simple: once users have the read program, the data set
is self-contained. A disadvantage of binary is that each data set
requires its own read program.
Initially, one problem with HDF was that software to read the format
was not widely available and did not run on many important classes of
computers. A second problem, in the past, was that installing the HDF
libraries required considerable system administration knowledge. Also,
the initial jump into HDF is difficult and requires a lot of
"handholding", though only for first-time users. The beauty of HDF,
however, is its uniformity across mission data sets.
From these ideas, the main lessons drawn are:
- Before declaring a format "STD", make sure it installs properly
and runs on the main machines intended.
- Understand which classes of users will be EXCLUDED by the new
format (for example, the simple binary format of Topex can be read on
even a Windows 95 computer, but HDF will not install there). It is
acceptable to exclude classes of users CONSCIOUSLY, but not through
oversight.
- Do not underestimate the "handholding" that will be needed to
help users install, then run, the new software. HDF and similar
packages are not 'read programs'; in complexity they compare to major
operating systems or major commercial packages (IDL, Matlab,
Mathematica, etc.), and their installation can be as complex.
[Victor Zlotnicki, Jet Propulsion Laboratory, Personal Communication,
June 2002]
3.1.9 AVHRR
AVHRR data format was based on TIROS data for continuity (level 1B,
native binary). However, about two years ago, NOAA began offering AMSU
data in HDF-EOS along with the BUFR and 1B products. The response to
HDF-EOS was very positive: almost all of the climate scientists now use
the HDF-EOS format by their own choice. In the future, NOAA hopes to
offer AVHRR as an HDF-EOS product, due to customer demand. [Ingrid
Guch, National Environmental Satellite, Data and Information Services
(NESDIS), Personal Communication, June 2002]
The HDF format has already been chosen for the reprocessing of all
AVHRR data for JPL. The data files had to be compressed, but with
conventional whole-file compression, extracting even a small part of a
large data set requires decompressing the entire file first. HDF
offers chunking (also called tiling), which stores a data set as a set
of chunks that are compressed, and can be decompressed, separately, so
only the chunks covering the requested subset need to be decompressed.
Thus, HDF-4 was chosen for the reprocessing of the AVHRR
data. [Peter Cornillon, University of Rhode Island, Oceanography
Department, Personal Communication, June 2002]
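The benefit of chunking can be illustrated without HDF itself. The Python sketch below uses the standard zlib module to compress each tile of a 2-D grid independently, then reads a sub-window by decompressing only the tiles that overlap it. The tile size, the unsigned 16-bit sample type, and the in-memory dictionary of tiles are illustrative assumptions; HDF's actual chunk storage and compression interfaces differ.

```python
import zlib
from array import array

def make_tiles(grid, th, tw):
    """Split a 2-D grid (list of equal-length rows of ints in 0..65535) into
    th x tw tiles and compress each tile independently with zlib."""
    nrows, ncols = len(grid), len(grid[0])
    tiles = {}
    for r0 in range(0, nrows, th):
        for c0 in range(0, ncols, tw):
            block = array("H")  # unsigned 16-bit values, row-major within the tile
            for r in range(r0, min(r0 + th, nrows)):
                block.extend(grid[r][c0:c0 + tw])
            tiles[(r0 // th, c0 // tw)] = zlib.compress(block.tobytes())
    return tiles

def read_subset(tiles, ncols, th, tw, rows, cols):
    """Return the sub-grid for half-open ranges rows=(start, stop) and
    cols=(start, stop), decompressing only the tiles that overlap it.
    Also returns how many tiles were decompressed."""
    (r_start, r_stop), (c_start, c_stop) = rows, cols
    out = [[0] * (c_stop - c_start) for _ in range(r_stop - r_start)]
    decompressed = 0
    for ti in range(r_start // th, (r_stop - 1) // th + 1):
        for tj in range(c_start // tw, (c_stop - 1) // tw + 1):
            block = array("H")
            block.frombytes(zlib.decompress(tiles[(ti, tj)]))
            decompressed += 1
            width = min(tw, ncols - tj * tw)  # tiles at the right edge may be narrower
            for k, value in enumerate(block):
                r, c = ti * th + k // width, tj * tw + k % width
                if r_start <= r < r_stop and c_start <= c < c_stop:
                    out[r - r_start][c - c_start] = value
    return out, decompressed

if __name__ == "__main__":
    grid = [[r * 100 + c for c in range(100)] for r in range(100)]
    tiles = make_tiles(grid, 10, 10)
    sub, n = read_subset(tiles, 100, 10, 10, rows=(5, 15), cols=(20, 25))
    print(f"decompressed {n} of {len(tiles)} tiles")
    assert sub == [row[20:25] for row in grid[5:15]]
```

Here a 10 x 5 window touches only 2 of the 100 tiles; with whole-file compression, all of the data would have to be decompressed before the subset could be extracted.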
3.2 Lessons Learned on Implementing and Using other Standards
3.2.1 NOAA Standards
The National Oceanic and Atmospheric Administration's (NOAA's) National
Environmental Satellite, Data, and Information Service (NESDIS)
operates NOAA's environmental (weather) satellites and manages the
processing and distribution of the data and images these satellites
produce daily. NOAA's operational weather satellite system is composed
of two types of satellites: Geostationary Operational Environmental
Satellites (GOES) for "now-casting" and short-range warning and
Polar-orbiting Operational Environmental Satellites (POES) for longer-term
forecasting. Both types of satellites are necessary for providing a
complete global weather monitoring system. The primary customer is
NOAA's National Weather Service (NWS), which uses satellite data to
create forecasts for the public, television, radio, and weather
advisory services.
NOAA NESDIS does not use consistent data and metadata formats for its
POES and GOES satellite data archive and distribution. The POES and
GOES data are processed by the Information Processing Division (IPD) of
the NESDIS Office of Satellite Data Processing and Distribution (OSDPD).
The IPD is responsible for ingest, processing, and dissemination of
environmental satellite data. The GOES data are distributed in McIDAS
formats. The POES weather and climate data products are distributed in
various data formats, including flat binary files, Level 1b, GIF,
ASCII, BUFR, GRiB, HDF-EOS, netCDF, and McIDAS [1].
- In general, NOAA NESDIS uses multiple distribution data formats
to satisfy different user communities' needs [Ingrid Guch, NOAA NESDIS,
personal communication]. The National Weather Service and the modeling
community (US and international) use the WMO data standards, BUFR and
GRiB. These users have been relying on NOAA to format the data in BUFR
and GRiB (as opposed to taking the data and running their own
converters). The BUFR/GRiB formats are very complex, though, and are
not generally used by people outside the modeling community.
- The imaging, climate, and scientific community as well as the
NOAA NESDIS maintenance personnel greatly prefer the HDF-EOS data (ease
in visualization, combining datasets, using commercial software, etc.).
The netCDF format offers the same benefits.
- Other experienced users (education, academic, etc.) seem to
prefer a binary or ASCII flat file so they can easily manipulate it and
add GIS or other extensions as they wish.
- Browsing users (education, some academic users, etc.) prefer the
option of ASCII, spreadsheet, and GIF.
- For satellite data (sensor counts with navigation and calibration
appended but not applied), users seem satisfied with the current packed
binary file (Level 1b format). The internal NESDIS maintenance
personnel have been using an unpacked binary file (Level 1b star) for
ease of use in real-time processing. However, if reprocessing becomes
necessary (as when problems occurred in the real-time processing), the
"unpacked" file must be recreated from the archived metadata and the
Level 1b file.
Long-term environmental satellite data products are archived and
distributed at the NOAA National Climatic Data Center (NCDC). NCDC's
archive formats vary by data product. Many
products are archived in a custom format and others are in HDF-EOS,
Level 1b, ASCII, or JPEG [Kathy Kidwell, NOAA NCDC, personal
communication, 2002]. Data distribution formats are the same as the
archive formats in NCDC. Lessons learned on NOAA data standards are
summarized below:
- Since NOAA is an operational agency and its main customer is the
NWS, NOAA NESDIS is required to distribute its satellite data in
BUFR/GRiB format to the NWS and the modeling users, although the
BUFR/GRiB format has many problems [Ingrid Guch, NOAA NESDIS,
personal communication, 2002].
- NOAA NCDC has many legacy systems that have problems
translating data to/from BUFR/GRiB format [Geoffery Goodrum, NOAA NCDC,
personal communication, 2002].
- The NOAA NESDIS staff have had a positive experience with the
HDF-EOS data format [2] and their users, mainly
imaging, climate, and scientific communities, like the HDF-EOS format
because of the flexibility, tools, and vendor support [3].
3.2.2 The Spatial Data Transfer Standard (SDTS)
The Spatial Data Transfer Standard became a Federal Information
Processing Standard (FIPS 173) in 1992, after a 10-year development
effort. It was to serve as the national spatial data transfer mechanism
for all U. S. Federal agencies, and to be available for use by state
and local government entities, the private sector, and research
organizations. SDTS specifies exchange format constructs, addressing
structure, and content, for spatially-referenced vector and raster
data, to facilitate data transfer between dissimilar spatial database
systems [4]. SDTS does not prescribe a single data model; rather, it
provides a set of rules intended to represent virtually any data model.
However, SDTS fell short of its ambitious goals, and the marketplace
was slow to accept and support it. Arctur et al. [5]
list a number of reasons for this:
- Complexity - SDTS was driven primarily by large
national-level data producers and their needs (very large databases,
complex interdependencies, high precision, flexible models, extensive
metadata, collaborative updates, etc.). These needs far exceeded those
of casual "desktop GIS" users and of most commercial, regional, or
local GIS projects, and they stretch even today's GIS technology to its
limits. Many people in the GIS community found SDTS to be overly
complex, few understood its intended purpose, and thus few chose it
when other, more established formats were available [6].
(Arctur et al. [5] suggest that as GIS users
become more sophisticated, they may demand more of their technology
(including data models and formats), and be more able and willing to
cope with the implied complexity.)
- Slow development of the standard in a fast-changing market - In
the decade that elapsed between the first work on SDTS and its
final adoption as a standard, the GIS industry grew significantly, and
several vendor-specific exchange formats came into widespread use,
which satisfied many users' immediate needs, and thus limited the
community's interest in using SDTS (which many perceived as yet another
format). Even though the standard was mandated for all federal
agencies, most data suppliers, responding to user demand, offered
alternative data encodings - and only the most curious and experimental
users chose SDTS.
- Limited vendor support - SDTS got caught in a
"chicken-and-egg" situation with GIS vendors: in order to build market
demand for SDTS-aware software, data providers needed to produce large
volumes of SDTS data. But they needed to use commercial GIS products to
build these data; so they had to persuade vendors to produce SDTS
products in the absence of customer demand. A few vendors did include
SDTS conversion tools in their products (e.g., ESRI's Arc/Info,
Laser-Scan's Gothic); however, different products interpreted SDTS
ambiguities differently (see below), so they would often fail to
translate unexpected SDTS constructs introduced by another vendor's
product.
- Slow development of practical profiles - SDTS was a very
general standard: any practical use of it required users to agree on a
particular profile. But due to the complexity of SDTS, and the limited
educational material (such as usage examples) available to the
geospatial community, it took another four years to complete the first
usable profile of SDTS (the Topological Vector Profile). The lack of
interest in, and understanding of, SDTS among the GIS community also
reduced the demand for useful profiles, and the community's enthusiasm
for working on them. In the end, this first profile proved to be both
limiting (encoding fairly mundane examples required awkward
workarounds) and unnecessarily complex (it required arc/node/polygon
topology, which was unnecessary or even meaningless for many
commonly-used cases). [7]
- Harmonization delays - Subsequent efforts to define
other SDTS profiles (the Raster Profile and Transportation Network
Profile) were almost complete when they became mired in attempts to
harmonize them with similar standards being developed in NIMA, NATO,
and the European Union. This resulted in further delays to their
development. (Arctur et al. [5] suggest that early
harmonization is easier, and that profiles should not be developed so
quickly as to overlook other, related standards.)
- Ambiguity in the data model (e.g., the cardinality of
relationships) and the data semantics (e.g., the meaning of
relationships among entities) of SDTS and its profiles limited the
utility of SDTS for reliable information transfer. (Arctur [8] likens an SDTS profile to a game in which teams
agree on the size of the ball and the shape of the field, but not on
the rules of play.) SDTS was supposed to be very general, and to make
datasets self-describing; that is, the data model could be
determined from the dataset contents. But this proved an elusive goal,
and thus many, even among those who were willing to be "SDTS pioneers,"
ultimately concluded that its practical value was limited.
In addition, during and after the development of SDTS, new,
unanticipated technical expectations arose, which demanded significant
technical (re)design and international coordination, and further
weakened the community's support for SDTS:
- a standard means of representing subtiles within a dataset;
- support for permanent, universally unique object identifiers
across all datasets;
- support for value-added extensions and incremental updates by
users;
- support for tracking changes and historical lineage of features
and spatial primitives;
- harmonizing the metadata content with emerging international
standards; and
- harmonizing repository organization with emerging OpenGIS
software interfaces.
Some of these issues might have been anticipated in the design of SDTS,
while others stemmed from the increasing sophistication of GIS products
and their users over the years.
The need for harmonization with OpenGIS led to OpenGIS' work on
interface specifications for access to geospatial data (features,
coverages, identifiers, etc.). Since the late 1990s, OpenGIS has been
the locus of much subsequent work in this area. It focused first on
accessing geospatial data (e.g., Simple Features Access for SQL, COM,
and CORBA), then on encoding geospatial features in XML (Geography
Markup Language (GML)) for transfer between clients and servers.
In summary, the SDTS experience illustrates the importance of keeping
pace with technology and market trends and emerging expectations, even
after capturing initial requirements. It shows the role of timing: a
standard may be "ahead of its time" (arriving before people are ready
to understand it or accept more complexity) or "overcome by events"
(arriving after people are used to making do without flexible, general,
or vendor-independent solutions). Paradoxically perhaps, SDTS was both!
The SDTS experience also underscores the need to balance advanced needs
with more basic ones; the importance of good documentation and usage
examples; the challenge of "priming the pump" among vendors in advance
of market demand; the benefits and risks of harmonizing with related
standards; and the futility of mandating a standard that fails to meet
a need.
References:
[1] NESDIS Satellite Product Overview Display. http://osdacces.nesdis.noaa.gov:8081/satprod/products/prod_frameset.cfm?prodid=-1
[2] Meng, H., Moor, D., Zhao, L., and Ferraro, R. (2000). HDF-EOS at
NOAA/NESDIS. Presentation, HDF-EOS Workshop, Landover, MD.
http://hdfeos.gsfc.nasa.gov/hdfeos/WSfour/meng/hdfeos4.ppt
[3] Jones, A. S., and Vonder Haar, T. H. (2002). A Dynamic Parallel
Data-Computing Environment for Cross-sensor Satellite Data Merger and
Scientific Analysis. Accepted for publication by Journal of Atmospheric
and Oceanic Technology, March 2002.
[4] Fegeas, R. (1995). Q3.3: What is this SDTS thing and is it
available via ftp? In L. Nyman (ed.), GIS-L / comp.infosystems.gis
Frequently Asked Questions and General Info List.
http://www.prenhall.com/startgis/faq.html
[5] Arctur, D., Hair, D., Timson, G., Martin, E., and Fegeas, R. (1998).
Issues and Prospects for the Next Generation of the Spatial Data
Transfer Standard (SDTS). International Journal of Geographical
Information Science 12(4): 403-425.
[6] Hastings et al. (1996). The Spatial Data Transfer Standard: Closing
the Loop? Panel discussion at GIS/LIS, Denver, Colorado, November 19,
1996. http://www.ngdc.noaa.gov/seg/tools/sdts/gislis_main.html
[7] Kelley, C., and Gosinski, T. (1994). Spatial Data Transfer Standard:
Do You Fit the Profile? GIS World, August 1994, pp. 48-50.
http://wwwsgi.ursus.maine.edu/gisweb/spatdb/urisa/ur94076.html
[8] Arctur, D. K. (1996). Spatial Data Transfer Standard: A GIS Vendor's
Perspective. Online article.
http://web.archive.org/web/19990222153931/www.lsl.co.uk/~arctur/portfolio/sdts.html
4.0 Essential Standards Concepts
Before evaluating individual data or metadata standards, it may be
useful to review several key concepts crucial to understanding and
comparing standards.
- A comparison with private, ad-hoc, binary information transfer
- Mandatory vs. optional elements of a standard; profiles and
extensions
- Abstract vs. implementation standards
- Content and format standards vs. behavior and interface standards
4.1 A Comparison with Private, Ad-hoc, Binary Information Transfer
Webster's dictionary defines a standard as: "That which is established
as a rule or model by authority, custom, or general consent." Thus,
standards exist only within a community of people sharing certain usage
patterns ("custom") or organizational structures (formal "authority" or
informal "general consent"). The emphasis in this study is on standards
accepted by a fairly broad set of users, publicly documented, and
either stable (unchanging) or changeable only by a consensus among
these users.
Another aspect of standards is that they govern only a part of the
information transfer process. For instance, GeoTIFF codifies the
georeferencing of an image, but is silent on the meaning of its pixel
values. Whatever a standard does not specify is left to the private
(often implicit) understanding of each user community or to ad-hoc
ancillary information (such as a README file or a telephone message
describing data details). So, at one extreme, complex and rigid
standards specify every aspect of information transfer, and at the
other extreme, private agreements or ad-hoc communications leave
everything implicit or unstructured. Most standards fall somewhere in
between; they govern a certain piece of the information transfer
process to let a certain set of users communicate or work together, but
users must also rely on other standards, private agreements, or ad-hoc
qualifiers.
Many earth science data users favor a "raw binary" data format that is
both simple and comprehensive. In practice, "raw binary" does not mean
mysterious data files whose layout one must guess at, but rather a
simple format (often some kind of raster grid) used by a small set of
colleagues with little attention to documentation or stability. Thus,
"raw binary" denotes not a single format, but as many different formats
as there are workgroups. Each such format has a syntax and semantics
invented just for that data set or data series, usually without a lot
of attention to other related formats, and with many details left
implicit or provided "out of band" in a mission report or some other
natural-language document.
Such a format may serve many people's immediate needs, for several
reasons.
- A given science team often works with only one kind of data and
so gets used to the one syntax for that data (e.g., keeps reusing the
same parser) and the one set of semantics (e.g., pins a page from the
mission report to the cubicle wall).
- Science teams have traditionally placed little emphasis on making
their data accessible to anyone outside their immediate colleagues. They
may feel that they have done their job by distributing a simple
"README" file with the data.
- ESE data is commonly encoded as raster grids for which one can
make easy guesses as to syntax (band-sequential, etc.).
- Most importantly, perhaps, the semantics of much ESE data
(platform orientation, sensor model, calibration information,
interpretation algorithms) are so complex that bundling them with the
data is often difficult; so they tend to remain in a mission report
document - or in people's heads. (For example, each MODIS L1b granule
has dozens of ancillary data items required for proper interpretation
along with several grids of data error and reliability estimates.)
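To see why raw binary depends on out-of-band knowledge, consider the band-interleaving conventions mentioned above. In the Python sketch below, the same bytes yield different pixel values depending on whether the reader assumes band-sequential (BSQ), band-interleaved-by-line (BIL), or band-interleaved-by-pixel (BIP) order. The sample type (signed 16-bit), byte order (little-endian), and image dimensions are likewise assumptions the reader must obtain from documentation, since nothing in the file records them.

```python
import struct

def element_index(layout, band, row, col, bands, rows, cols):
    """Position of one sample in a flat array under a given interleaving."""
    if layout == "BSQ":   # all of band 0, then all of band 1, ...
        return (band * rows + row) * cols + col
    if layout == "BIL":   # for each row: band 0's line, then band 1's line, ...
        return (row * bands + band) * cols + col
    if layout == "BIP":   # for each pixel: all of its bands together
        return (row * cols + col) * bands + band
    raise ValueError(f"unknown layout {layout!r}")

def read_sample(raw, layout, band, row, col, bands, rows, cols, fmt="<h"):
    """Read one sample, assuming a layout, shape, and struct format (here
    little-endian signed 16-bit) that the file itself does not record."""
    offset = element_index(layout, band, row, col, bands, rows, cols) * struct.calcsize(fmt)
    (value,) = struct.unpack_from(fmt, raw, offset)
    return value

if __name__ == "__main__":
    bands, rows, cols = 2, 2, 3
    # Write a tiny BSQ image whose values encode their position: band*100 + row*10 + col.
    values = [b * 100 + r * 10 + c
              for b in range(bands) for r in range(rows) for c in range(cols)]
    raw = struct.pack(f"<{len(values)}h", *values)
    for layout in ("BSQ", "BIL", "BIP"):
        print(layout, read_sample(raw, layout, 1, 0, 2, bands, rows, cols))
```

Only the BSQ guess returns the intended value (102) for band 1, row 0, column 2; the BIL and BIP guesses silently return a different sample (12), and nothing in the bytes warns the user.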
However, the use of raw binary data relies much more on private
agreements among colleagues than on documented, consensus standards.
Its properties are largely the opposite of those of standards described
in Section 1.2, "The Rationale for Standards." It limits the ability
of science teams to move beyond traditional work methods towards more
effective interdisciplinary research, collaborative work, and
applications. The essential points are:
- Data in a standard format (should) convey something about the
data that its users need to know; whereas, users of binary data must
rely on inside knowledge or educated guesses to read and interpret the
data.
- Data in a standard format may be used outside of an "inner
circle" of colleagues, but only holders of the necessary private
information can use raw binary data.
- Standard data formats limit the need for pair-wise translators to
and from every possible format, whereas, each raw binary format needs a
different translator.
- Standard data formats, by fixing the syntax and semantics of
information, allow the possibility of machine-to-machine communication
between different systems (that is, interoperability). In contrast, raw
binary data requires human inspection and intervention, hindering or
even preventing system interoperability.
- Standard data formats facilitate unambiguous transfer of
information between users of different systems working with different
datasets. This is more difficult with raw binary data, which often
loses all but raw pixel values in translation.
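The translator argument above is simple arithmetic. If each of n formats must be converted directly into every other, one one-way translator is needed per ordered pair, n(n - 1) in all; if every format is instead converted to and from one common standard, 2n translators suffice. A minimal Python sketch of the comparison, counting each direction as a separate translator (an assumption about how converters are counted):

```python
def pairwise_translators(n: int) -> int:
    """One-way translators needed when every format converts directly to every other."""
    return n * (n - 1)

def hub_translators(n: int) -> int:
    """Translators needed when each format converts to and from one common standard."""
    return 2 * n

if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n:>2} formats: {pairwise_translators(n):>3} pairwise "
              f"vs {hub_translators(n):>2} via a standard")
```

The two counts are equal at three formats; beyond that, the pairwise approach grows quadratically (90 translators for ten formats, against 20 through a common standard).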
In summary, the use of private agreements does not constitute a
standard, and so "raw binary" data formats cannot be compared alongside
open consensus standards.
Of course, members of a community may choose to turn a private,
internal convention into a standard for a wider community by
documenting and publishing their shared syntax and semantics and by
sticking to what they document (that is, submitting any changes to a
public consensus process or formal authority). Most standards are born
this way when a usage community publishes its internal conventions to
facilitate collaboration with others.
4.2 Mandatory vs. Optional Elements, Profiles and Extensions
Many standards have a set of mandatory elements to ensure basic
interoperability plus a set of optional elements to serve a diversity
of users and uses. This provides a "base" standard from which a
particular community of users may define a profile (a more
specific standard) to support richer communication among themselves, or
more fine-grained control of each other's services. A profile is a
standard derived from a base standard by adding restrictions: it may
require (or exclude) an element that is optional in the base standard;
it may limit the valid entries under a heading; it may fix the
cardinality of a repeating element; and so on. But the profile cannot
contradict the base standard; anything mandatory in the base standard
remains mandatory in the profile. Thus, any product that complies with
the profile will comply with the base standard. [1, 2]
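The rule that a profile may only add restrictions can be expressed directly in code. In the Python sketch below, a standard is reduced to sets of mandatory and optional element names plus optional lists of allowed values. The hypothetical helper make_profile derives a profile by promoting optional elements to mandatory, excluding optional elements, or narrowing value lists, and it refuses any change that would contradict the base standard. The element names are invented for illustration and are not taken from FGDC, ISO, or HDF-EOS.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Standard:
    mandatory: frozenset
    optional: frozenset
    allowed_values: dict = field(default_factory=dict)  # element -> frozenset of valid entries

    def validate(self, record):
        """Return a list of violations for a record given as {element: value}."""
        errors = [f"missing mandatory element {e}"
                  for e in sorted(self.mandatory - set(record))]
        known = self.mandatory | self.optional
        for key, value in record.items():
            if key not in known:
                errors.append(f"element {key} not permitted")
            elif key in self.allowed_values and value not in self.allowed_values[key]:
                errors.append(f"{key}={value!r} is not an allowed value")
        return errors

def make_profile(base, require=(), exclude=(), restrict=None):
    """Derive a profile from base by adding restrictions only."""
    require, exclude = frozenset(require), frozenset(exclude)
    if not require <= base.optional:
        raise ValueError("a profile can only make optional elements mandatory")
    if exclude & base.mandatory:
        raise ValueError("a profile cannot exclude a mandatory element")
    allowed = dict(base.allowed_values)
    for key, values in (restrict or {}).items():
        values = frozenset(values)
        if key in allowed and not values <= allowed[key]:
            raise ValueError(f"a profile may only narrow the valid entries for {key}")
        allowed[key] = values
    return Standard(base.mandatory | require, base.optional - require - exclude, allowed)

if __name__ == "__main__":
    base = Standard(frozenset({"Title", "Abstract"}),
                    frozenset({"Keywords", "Platform", "Sensor"}))
    profile = make_profile(base, require={"Platform"}, exclude={"Sensor"},
                           restrict={"Platform": {"Terra", "Aqua", "Aura"}})
    record = {"Title": "Ozone profiles", "Abstract": "Level 2 data", "Platform": "Aura"}
    print(profile.validate(record), base.validate(record))  # valid under both
```

Because the profile keeps every mandatory element of the base standard, permits only a subset of its elements, and can only shrink its value lists, any record that passes the profile's check also passes the base standard's, which is the compliance property described above.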
One profile presented here is HDF-EOS, an EOS-specific adaptation of
the very general Hierarchical Data Format. Of the many different file
structures that are possible with HDF, HDF-EOS defines three (point,
grid, and swath), each with spatial and temporal details alongside
scientific data. In another example, FGDC's Metadata Content Standard
has allowed several community-specific profiles to be defined, and in
fact, the International Organization for Standardization's (ISO)
Metadata standard was designed primarily for profiles. It defines
several hundred elements of which fewer than 20 are required; the
remaining elements are shared vocabulary (i.e., a dictionary) for
building profiles. [3]
Related to profiles is the notion of extensions. These are
elements added to the base standard by consensus among a certain
community of users. As with profiles, extensions do not contradict the
base standard - what's mandatory remains mandatory; products that fit
the extended standard have everything needed by the base standard, and
more. Nonetheless, by adding more loosely controlled, loosely defined
elements to a standard, extensions may complicate the interoperability
and maintenance of the standard.
For example, the earth imagery user community has extended the FGDC
metadata content standard to more fully describe remotely sensed data
by adding metadata elements such as the sensor model and the orbital
platform, both of which the base standard doesn't provide [4].
4.3 Abstract vs. Implementation Standards
Standards and specifications for information systems are defined
primarily at two different "levels of abstraction": implementation
specifications and abstract models [5].
- Implementation specifications tell software developers how to
express information or requests within particular distributed computing
environments (such as XML, Java, or the World Wide Web). Such standards
define data formats, access protocols, object models, naming
conventions, etc., in terms that are directly usable within the
targeted computing environment.
- Implementation specifications are the more immediately useful
standards when they apply to one's chosen computing context. The
data-format standards are implementation specifications, as are the
eXtensible Markup Language (XML) encodings of FGDC, ISO, and other
metadata standards seen in the Appendix, Section 3.0.
- Abstract models specify what information or requests are valid or
required in principle, irrespective of individual computing
environments. They define the essential concepts, vocabulary, and
generic structure (type hierarchy) of computational services and
information transfer. Although not directly usable to build data or
software, these models set the stage for creating implementation
specifications and for extending existing ones to new environments.
- Abstract models provide well-known semantics that can support
interoperability through translators or cross-reference tables. For
instance, thanks to FGDC's Content Standard, Z39.50's GEO profile can
"normalize" any FGDC-compliant metadata (regardless of actual record
formats or field names) for external access - that is, map its internal
data elements to the GEO field names.
- Consensus-based abstract models of data are often termed "content
standards." They define the information elements and their intended
meaning (semantics) independently of their syntax - that is,
independently of how these elements may be encoded in files on disk or
along a communications link. In principle, content standards allow
different parties to communicate meaningfully by mapping their data
element names to those of the content standard even when they use
different formats for their data. This works well for fairly simple
data structures such as the "parameter=value" pairs of many metadata
files and catalog records. However, with more complex syntax or
semantics, translating the abstract concepts of the content standard
into the terms of a particular format often becomes a task of
interpretation that involves judgment calls and assumptions and can
introduce ambiguity. So in practice, content standards alone may not
suffice for transferring complex data between different user
communities without information loss or distortion.
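Mapping local element names through a content standard can be sketched as a pair of crosswalk tables. The content-standard names below are modeled on FGDC CSDGM element names, and both providers' local field names are hypothetical. Note how an unmapped local field is silently dropped, a small instance of the information loss just described.

```python
# Crosswalk tables: local field name -> content-standard element name.
# Provider field names are hypothetical; standard names are modeled on FGDC CSDGM.
PROVIDER_A = {
    "name": "Title",
    "lon_min": "West_Bounding_Coordinate",
    "lon_max": "East_Bounding_Coordinate",
    "lat_min": "South_Bounding_Coordinate",
    "lat_max": "North_Bounding_Coordinate",
}
PROVIDER_B = {
    "DatasetTitle": "Title",
    "WestLon": "West_Bounding_Coordinate",
    "EastLon": "East_Bounding_Coordinate",
    "SouthLat": "South_Bounding_Coordinate",
    "NorthLat": "North_Bounding_Coordinate",
}


def to_standard(record, crosswalk):
    """Rename local fields to content-standard names; unmapped fields are dropped."""
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}


def from_standard(record, crosswalk):
    """Rename content-standard fields back to one provider's local names."""
    inverse = {std: local for local, std in crosswalk.items()}
    return {inverse[k]: v for k, v in record.items() if k in inverse}


a_record = {"name": "SST daily", "lon_min": -180.0, "lon_max": 180.0,
            "lat_min": -90.0, "lat_max": 90.0, "internal_id": 42}

# Provider A -> content standard -> provider B, with no direct A-to-B table.
b_record = from_standard(to_standard(a_record, PROVIDER_A), PROVIDER_B)
print(b_record)  # "internal_id" has no standard equivalent, so it is lost
```

With N parties, each needs only one crosswalk to the content standard rather than a separate translator for every other party.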
4.4 Content and Format vs. Behavior and Interface
Table 4.4.1 shows that at each level of abstraction certain
standards define the interfaces that allow different systems to work
together or the expected behavior of software systems. This is the
computation viewpoint, whose accent is on invoking services effectively
and unambiguously. Other standards define the content of geospatial
information or its encoding (or packaging) for accurate transfer
between different processing systems. This is the information
viewpoint, which emphasizes efficient, lossless communication [5].
Table 4.4.1 Viewpoints and Levels of Abstraction
| | Service Invocation (computation viewpoint) | Information Transfer (information viewpoint) |
|---|---|---|
| Implementation specifications ("how") | Interface | Encoding (format) |
| Abstract models ("what") | Behavior | Content |
For distributed computing, both of these viewpoints are crucial and
intertwined. For instance, information content isn't useful without
services to transmit and use it. Conversely, invoking a service
effectively requires that its underlying information be available and
its meaning clear. However, the two viewpoints are also separable: we
may agree on how to represent information regardless of what services
carry it; conversely, we may define how to invoke a service
independently of how we package the information needed or conveyed by
the service.
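The separability of the two viewpoints can be sketched in a few lines: one function defines a service's behavior (subsetting a grid) without knowing any file format, and separate encoders package the result. All function names and the format keys are hypothetical; this illustrates the idea, not any particular service standard.

```python
import csv
import io
import json


def subset(grid, row_range, col_range):
    """Service behavior: return a rectangular subset; knows nothing about formats."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [row[c0:c1] for row in grid[r0:r1]]


def encode_csv(grid):
    """Encoding: serialize a 2-D list as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


def encode_json(grid):
    """Encoding: serialize a 2-D list as JSON text."""
    return json.dumps(grid)


# Hypothetical registry: any encoding can be paired with any service operation.
ENCODERS = {"text/csv": encode_csv, "application/json": encode_json}


def handle_request(grid, row_range, col_range, fmt):
    """Interface: the request names operation parameters and output format independently."""
    return ENCODERS[fmt](subset(grid, row_range, col_range))


grid = [[10 * r + c for c in range(4)] for r in range(3)]
print(handle_request(grid, (1, 3), (1, 3), "text/csv"))          # 11,12 / 21,22
print(handle_request(grid, (1, 3), (1, 3), "application/json"))  # [[11, 12], [21, 22]]
```

Adding a new format means adding one encoder; adding a new operation means adding one service function. Neither change touches the other.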
In a given context, either the computation view (implemented as
interfaces) or the information view (implemented as formats) may take
precedence. Tables 4.4.2 and 4.4.3 below show a few guidelines for
prioritizing standards definition or adoption in certain contexts. In
general, however, deciding which view to emphasize in a given setting
is not straightforward.
Table 4.4.2 Criteria For Format Standards
| Worry about a data format standard when ... | Don't worry about a data format standard when ... |
|---|---|
| Users of different formats need to share or communicate data with each other. | There's no reason for users of different formats ever to share information. |
| Each user group (or each user) uses a different format. | A user consensus already exists on one or a few non-proprietary data formats. |
| Available formats fail to convey all the information needed for proper use. (Thus users have to rely on implicit knowledge or ad-hoc notes to use the data.) | A practical, reasonably simple data format conveys all of the information users need. |
Table 4.4.3 Criteria For Interface Standards
| Worry about a service interface standard when ... | Don't worry about a service interface standard (i.e., rely on FTP or FedEx) when ... |
|---|---|
| Most users want the output of a few well-known processing operations, such as subsetting, filtering, transformations, etc. | Most users need direct access to raw data (as archived) for ad-hoc processing and analysis. |
| The intended applications are streamed or interactive: they only use parts of the available data at a given moment. | Most use of the data requires all of it (full size and detail) to be present simultaneously. |
| No one reasonably simple format will ever meet everyone's needs. (A service allows users to request the data they need in a format that fits it.) | Users have not begun to map their workflow to online database transactions or Web services. |
Among the data standards reviewed in this report, GeoTIFF, Landsat Fast
Format, and BUFR/GRiB are clearly file format standards; they specify
an encoding and are silent on what access interface to use. HDF,
HDF-EOS, and netCDF provide a software library to facilitate reading
and writing data files, but they too are file format standards; they
don't specify a format-neutral interface to a service. Table 4.4.4
compares the data models and software access libraries for a variety of
data packaging standards.
Table 4.4.4 Data Models and Software Access Libraries
| Data Format | Logical Model | Physical Model | Software Access Libraries |
|---|---|---|---|
| HDF | Disk format, hierarchical, and similar to Unix file systems; self-description provided in global and local (individual object) attributes; header describes disk structure with metadata & pointers; usable for general scientific data storage (the HDF4 data model contains arrays, tables, raster images, and text objects; the HDF5 data model has HDF4-type objects embedded within arrays and text attribute objects); will support extended (multiple-machine) files | XDR-based; storage layout is contiguous (serial) or chunked (direct access); datasets consist of header attributes & data; machine-independent | C, C++, FORTRAN, Java |
| HDF-EOS | HDF-based (versions 4 and 5); provides a standard for mapping geolocation data to science data; Point structure: model for sparse, randomly geolocated data; Swath structure: model for data best organized by time, latitude, or track parameter; Grid structure: model for data organized spatially and projected | (Same as HDF) XDR-based; storage layout is contiguous (serial) or chunked (direct access); datasets consist of header attributes & data; machine-independent; disk format is available to user | C, C++, FORTRAN |
| netCDF | Self-describing; usable for general scientific data storage | XDR-based; storage layout is direct access (indexed); datasets consist of header & data; machine-independent; disk format is hidden | C, FORTRAN, Java, Perl, Python, Ruby, Tcl/Tk |
| GeoTIFF | TIFF-based, with geolocation tags; raster image data only; multiple images can be stored in a single file; version 2 will support extended files | Storage layout allows random access to pixels by band, strip, or tile | C, Perl, Python, Java |
| BUFR | Tailored to atmospheric data (point data); based on sequential tape format | Storage layout is serial; dataset consists of header + data | FORTRAN 77 |
| GRiB | Tailored to atmospheric data (gridded data); based on sequential tape format | Storage layout appears to be serial ("messages"); dataset consists of header + data | Command-line translators to ASCII or IEEE binary |
| Fast Format | Multi-band image data | Separate header and data files; direct access to individual bands | Users write their own software based on examples |
| Binary | Data model chosen by user; record and data types determined by specific platform | Different for every product; machine-dependent | Custom software; users must write their own |
See Acronym List if needed
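The serial, message-oriented layout listed for GRiB means a reader must walk a file message by message rather than jump to a given record. The sketch below shows that walk. It assumes the published Indicator Section layout of GRIB edition 1 (total length in octets 5-7) and edition 2 (total length in octets 9-16), does not decode message contents, and ignores the large-message convention some GRIB1 producers use.

```python
def grib_messages(data: bytes):
    """Yield (offset, edition, length) for each GRIB message in a byte string.

    Simplified sketch: reads only the Indicator Section ("GRIB", then the
    edition number in octet 8) and skips ahead by the stated message length.
    """
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start < 0 or start + 8 > len(data):
            return
        edition = data[start + 7]  # octet 8 (1-based) holds the edition number
        if edition == 1:
            length = int.from_bytes(data[start + 4:start + 7], "big")   # octets 5-7
        elif edition == 2:
            length = int.from_bytes(data[start + 8:start + 16], "big")  # octets 9-16
        else:
            pos = start + 4  # not a recognizable header; keep scanning
            continue
        yield start, edition, length
        pos = start + max(length, 8)  # serial access: jump to the next message


# Synthetic messages (indicator section + end marker only), for illustration.
m1 = b"GRIB" + (12).to_bytes(3, "big") + bytes([1]) + b"7777"
m2 = b"GRIB" + b"\x00\x00" + b"\x00" + bytes([2]) + (20).to_bytes(8, "big") + b"7777"
stream = b"junk" + m1 + m2
print(list(grib_messages(stream)))  # [(4, 1, 12), (16, 2, 20)]
```

By contrast, the chunked layouts of HDF or the indexed layout of netCDF let a library seek directly to the requested piece of a dataset.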
4.5 Web-based Data Service Standards
The World Wide Web is driving rapid development of format-neutral
service interface standards. Examples particularly relevant to ESE data
include the OpenGIS Web Coverage Service [6] and
Web Map Service [7] and the Distributed
Oceanographic Data System (DODS) [8].
The OpenGIS Consortium (OGC) Web Coverage Service (WCS) is likely to
become an OGC specification in early 2003. It will provide access to
images, imagery collections, and other systematic "fields" of values or
measurements - usually arrayed on a 2D or 3D spatial grid. It fully
describes the data's spatial location and its semantic content and
allows clients to request subsets in space or along any of the data
dimensions using a syntax based on either Uniform Resource Locators
(URLs) or structured XML messages. The EOSDIS Core System (ECS) Synergy
effort intends to provide WCS access to its large online data holdings
("data pools"), and the GLOBE educational project (Global Learning and
Observations to Benefit the Environment) has begun experimenting with
WCS and with WMS (described next).
The OGC Web Map Service (WMS) provides access to rendered maps and
pictures using a simple spatial query syntax and common graphics
formats (PNG, JPEG, etc.). Since its inception in early 2000, this
interface has seen widespread implementation by many vendors,
laboratories, and open-source efforts.
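Both OGC interfaces can be invoked with plain URLs, so a map or a data subset is just an HTTP request. The sketch assumes WMS 1.1.1 parameter names (the version cited in [7]) and the key-value encoding of the later WCS 1.0.0 specification, which may differ from the 2001 discussion paper [6]. The endpoints, layer/coverage name, and FORMAT values are hypothetical; valid values depend on each server's capabilities document.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def kvp_url(endpoint, params):
    """Append key-value-pair parameters to a service endpoint as a URL query."""
    return endpoint + "?" + urlencode(params)


bbox = "-180,-90,180,90"  # minx,miny,maxx,maxy in longitude/latitude degrees

# WMS GetMap: returns a rendered picture (PNG) of the requested layer.
wms_url = kvp_url("http://example.org/wms", {
    "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
    "LAYERS": "sea_surface_temperature", "STYLES": "",
    "SRS": "EPSG:4326", "BBOX": bbox,
    "WIDTH": 720, "HEIGHT": 360, "FORMAT": "image/png",
})

# WCS GetCoverage: returns the underlying data values for the same region.
wcs_url = kvp_url("http://example.org/wcs", {
    "SERVICE": "WCS", "VERSION": "1.0.0", "REQUEST": "GetCoverage",
    "COVERAGE": "sea_surface_temperature", "CRS": "EPSG:4326",
    "BBOX": bbox, "WIDTH": 720, "HEIGHT": 360, "FORMAT": "GeoTIFF",
})

print(wms_url)
print(parse_qs(urlparse(wcs_url).query)["BBOX"])  # ['-180,-90,180,90']
```

The two requests differ mainly in what comes back: WMS returns a picture for display, while WCS returns values a client can analyze.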
The Distributed Oceanographic Data System provides format-neutral
access to scientific datasets; its query syntax allows for "slicing" or
"sampling" a dataset along any of its variable values. DODS originated
at MIT and the University of Rhode Island (URI) in the mid-1990s; since
then, it has been implemented fairly widely in the oceanographic
community and among NASA DAACs. Recently, URI and NASA DAACs have built
"gateways" from DODS to WMS and WCS; and URI has begun defining two
distinct successors to DODS: an "Open Source Project for a Network Data
Access Protocol" (OPeNDAP) (tools for generic infrastructure protocols)
and a "National Virtual Ocean Data System (NVODS)" (to supply
oceanographic data and applications). [9]
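DODS "slicing" is expressed in the request URL as a constraint expression appended to the dataset address. The sketch below builds a hyperslab request of the form `var[start:stride:stop]`, one bracket per dimension, with zero-based indices and an inclusive stop index. The server address, dataset, and variable name are hypothetical. The `.dods` suffix requests binary data; `.dds` and `.das` return the dataset's structure and attributes instead.

```python
def dods_url(dataset_url, variable, slices, suffix=".dods"):
    """Build a DODS/OPeNDAP request that slices one array variable.

    slices: one (start, stride, stop) triple per dimension; indices are
    zero-based and the stop index is inclusive.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}{suffix}?{variable}{hyperslab}"


def hyperslab_indices(start, stride, stop):
    """Indices a server would return along one dimension (stop is inclusive)."""
    return list(range(start, stop + 1, stride))


# First time step, every other row and column in a small window (hypothetical dataset).
url = dods_url("http://example.org/dods/sst.nc", "sst",
               [(0, 1, 0), (100, 2, 110), (200, 2, 220)])
print(url)
print(hyperslab_indices(100, 2, 110))  # [100, 102, 104, 106, 108, 110]
```

Because the constraint travels with the request, the server does the subsetting and only the selected values cross the network, whatever format the dataset is stored in.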
(Notable Web-based services in the ESE environment include the
University of Maryland's MOCHA project ("Middleware based on a
code-shipping architecture") [10]; the Tropical
Rainforest Information Center (TRFIC) at Michigan State University [11]; EOS-Webster at the University of New Hampshire
[12]; and many others. However, these are not
service interface standards but, rather, particular implementations of
distributed systems. Although useful to their users, they are not
linked by a well-defined, published service
interface standard; instead, they rely on tightly coupled components or
on unpublished or proprietary interfaces.)
Finally, a number of vendors in the world of e-commerce have championed
the notion of "Web Services" [13] consisting of
the Web Services Description Language (WSDL) [14];
Simple Object Access Protocol (SOAP) [15]; and
Universal Description, Discovery, and Integration (UDDI) [16]. These industry specifications have gained
broad visibility and offer considerable promise for Web-based data
access; however, this very active area of technology development is
far from settled.
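For orientation, a SOAP request is an XML envelope whose body names an operation and its parameters; WSDL describes such operations, and UDDI registries help clients find the services offering them. The sketch builds one envelope with the Python standard library. It uses the SOAP 1.1 envelope namespace (the SOAP 1.2 drafts cited in [15] use a different one), and the service namespace, operation, and parameter names are hypothetical. WSDL parsing and UDDI lookup are not shown.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:granule-search"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)


def soap_request(operation, params):
    """Wrap one operation call and its parameters in a SOAP envelope."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(env, encoding="unicode")


msg = soap_request("SearchGranules", {"dataset": "SST", "start": "2002-01-01"})
print(msg)  # would be sent as the body of an HTTP POST to the service endpoint
```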
Generally, the use of Web-based services is still only emerging in
practical ESE work. The primary mechanism for information interchange
in the ESE context remains the transfer of discrete files; it will take
some time before Web-based services become a part of mainstream data
access and distribution in ESE. Accordingly, for the near-term missions
this document addresses only format and content standards.
References:
[1] ISO TC211 (2001), "Geographic Information -
Profiles" http://www.isotc211.org/protdoc/211n1134/211n1134.pdf
[2] Federal Geographic Data Committee (1998),
Content Standard for Digital Geospatial Metadata, Appendices D and E: http://www.fgdc.gov/metadata/csdgm/
[3] Simon Cox (2001), Summary of some geospatial
metadata standards: http://www.ned.dem.csiro.au/research/visualisation/metadata/geospatial/
[4] FGDC (2001), Content Standard for Digital
Geospatial Metadata: Extensions for Remote Sensing Metadata: http://www.fgdc.gov/standards/status/csdgm_rs_ex.html
[5] NASA (2001), Digital Earth Reference Model
v0.5: http://www.digitalearth.gov/derm/v05/
[6] Evans, John D. (2001), OGC Web Coverage Server
(WCS), Discussion Paper #01-018: http://www.opengis.org/techno/discussions/01-018.pdf
[7] de La Beaujardière, Jeff (2001), OGC Web
Map Server Interface Implementation Specification, version 1.1.1: http://www.opengis.org/techno/specs/01-068r3.pdf
[8] Distributed Oceanographic Data System (DODS): http://www.unidata.ucar.edu/packages/dods/
[9] Cornillon, Peter (2002), "DODS: OPeNDAP
providing plug-and-play interoperability in a distributed data system,"
presentation at the 9th Assembly Meeting of the ESIP Federation,
University of Maryland, College Park, May 15-17, 2002. http://www.esipfed.org/business/library/meetings/9th_fed_meeting/ppt/DODS.PPT
[10] The MOCHA project: Self-Extensible Middleware
Architecture. http://www.cs.umd.edu/projects/mocha/
[11] Tropical Rain Forest Information Center
(TRFIC). http://www.bsrsi.msu.edu/trfic/
[12] EOS-WEBSTER: Earth Science Data from the
University of New Hampshire. http://eos-webster.sr.unh.edu/
[13] World Wide Web Consortium (W3C), 2002: Web
Services home page. http://www.w3.org/2002/ws/
[14] World Wide Web Consortium (W3C), 2002: Web
Services Description Language (WSDL) Version 1.2: W3C Working Draft 9
July 2002. http://www.w3.org/TR/2002/WD-wsdl12-20020709/
[15] World Wide Web Consortium (W3C), 2002: SOAP
Version 1.2 Part 1: Messaging Framework. http://www.w3.org/TR/soap12-part1/.
[16] Universal Description, Discovery, and
Integration (UDDI) of Business for the Web. http://www.uddi.org
5.0 Standards Evaluation
To assess objectively the data and metadata standards identified in
Chapter 2 for the ESDSWG near-term missions, an analysis evaluates the
standards against the criteria defined below. In addition, a user
interview/survey gathers the user community's feedback on using the
standards.
5.1 Evaluation Criteria
Many features or criteria can be used to evaluate the data and metadata
standards identified. The intention of this study is not to identify
one all-purpose standard but, rather, to identify appropriate uses of
the standards. For example, some standards are more suitable for
transmission and archiving, while others are more suitable for
analysis. For transmission and archiving, the most important features
are semantic completeness, portability, self-description,
extensibility, interoperability, etc. [1]. For analysis, standards
should have features such as ease of use and support from analysis
tools. Many of these features, and others, are defined below.
1. Interoperability - Tools exist to translate to other standard
formats with no information loss.
* Is there a defined relationship or semantic equivalence between this
standard and other standards? That is, can the standard be broken into
elements whose content matches elements of other standards?
* Is the definition sufficiently precise to allow development of a
translation algorithm between standards?
* What well-known translation tools have been developed?
2. Availability - Source code for writing and reading data in the
format is widely and publicly available.
* Is the source code for writing and reading data widely and publicly
available?
* Is the software for reading and writing well documented?
* Are the search and order methods for data using the format well
understood and established?
3. Portability - Data in this standard can be used on a variety of
platforms or in a variety of applications (vendor support).
* Is the format sufficiently well defined so that data can be ported to
new commonly used platforms with minimal effort?
* Is the format sufficiently well implemented that new applications can
access the implementation with minimal effort?
* Can the standard be implemented on one platform and installed and
tested on other platforms with minimal modification of source code
(i.e., is machine-dependent code minimized)?
4. Evolvability - A clear process for maintaining and evolving the
standard exists.
* Is there a methodology for adding new features to the standard?
* Is there a software development process?
* Is there a standard for documentation?
* Is there an open process for evolution?
5. Extensibility - Support for extensions and profiles exists.
* Does the standard allow extensions or profiles to be developed?
* Are there extensions or profiles developed for the standard?
6. Self-describing - Files contain data descriptions along with the
data.
* Can data in this format be read without a separate document detailing
file contents?
* Can the data be described internally to facilitate development of
applications?
* Does the format contain information to allow geospatial, temporal,
and/or spectral subsetting?
7. Tools Support - Software tools are available to support the
standard.
* Does the standard have freeware support?
* Does the standard have COTS (Commercial Off-The-Shelf) software
support?
8. Completeness - The capacity to carry semantic descriptive elements
of the data explicitly and unambiguously. Higher levels of completeness
can reduce the user's dependency on outside information, implicit
knowledge, or guesswork when interpreting and applying the data.
* Can the format carry everything users need to use the data correctly?
i.e., can the format convey the data's precise spatial location, its
units of measure, the observation parameters (e.g., spectral bands),
accuracy estimates (error bars), and other elements needed to
understand the data and apply it?
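Criteria 6 (self-describing) and 8 (completeness) can be illustrated with a toy file layout that carries its own description: the reader below recovers variable name, units, and physical values using only what the file says, with no separate document. The layout (a 4-byte big-endian header length, a UTF-8 JSON header, then big-endian 16-bit integers) is invented for illustration and is not HDF, netCDF, or any other standard reviewed here.

```python
import json
import struct


def write_self_describing(name, units, scale_factor, values):
    """Pack a 1-D integer array with a JSON header describing it (toy layout)."""
    header = json.dumps({
        "variable": name,
        "units": units,
        "scale_factor": scale_factor,  # physical value = stored value * scale_factor
        "struct_code": "h",            # 16-bit signed integers, big-endian
        "count": len(values),
    }).encode("utf-8")
    payload = struct.pack(f">{len(values)}h", *values)
    return struct.pack(">I", len(header)) + header + payload


def read_self_describing(blob):
    """Recover name, units, and physical values using only the file's own header."""
    (hlen,) = struct.unpack_from(">I", blob, 0)
    header = json.loads(blob[4:4 + hlen].decode("utf-8"))
    raw = struct.unpack_from(f">{header['count']}{header['struct_code']}", blob, 4 + hlen)
    values = [v * header["scale_factor"] for v in raw]
    return header["variable"], header["units"], values


blob = write_self_describing("sst", "K", 0.01, [27315, 30000])
name, units, vals = read_self_describing(blob)
print(name, units, [round(v, 2) for v in vals])  # sst K [273.15, 300.0]
```

Without the header's units and scale factor, a user would have to know from outside documentation that 27315 means 273.15 K, which is exactly the dependency on implicit knowledge that criterion 8 aims to reduce.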
5.2 Data Standards Evaluation
Using the standards evaluation criteria defined above, Tables 5.2.1
through 5.2.8 analyze and compare data standards in use in heritage
missions and other ESE missions.
Table 5.2.1 Data Standards Interoperability
| Data Standard | Defined relationship or semantic equivalence between the standard and other standards? | Can translation algorithms be developed easily? | Well-known translation tools developed |
|---|---|---|---|
| HDF | Yes. Since HDF can contain general scientific data, it encompasses all the other standards. | Yes; HDF has a well-documented software API. | GIF <-> HDF5; HDF4 <-> HDF5; Ensight6 -> HDF5 |
| HDF-EOS | Yes. As a superset of HDF, it also encompasses the other standards. | Yes; the Point, Grid, and Swath add-on structures are well documented. | GIF <-> HDF5; HDF4 <-> HDF5; Ensight6 -> HDF5 |
| GeoTIFF | Yes, for image-based standards; no, for non-image standards. | Yes. A public-domain API library exists but is only partially documented. | Many converters for TIFF, plus GeoTIFF tag read & write; specialized converters for L7, MODIS, MISR, ASTER |
| Fast Format | No | No. No API or library exists. | No |
| Native Binary | Depends on the standard; most are specific to the application. | Depends on the standard, but usually not, unless specific efforts are made to document and publish an API. | No; you have to write your own translation tool. |
| netCDF | Yes. Since netCDF can contain general scientific data, it encompasses all the other standards. | Yes; netCDF has a well-documented API. | -> HDF; -> Matlab5 |
| BUFR/GRiB | Yes: meteorological parameters can be translated to other formats with no loss of content. No, for non-meteorological standards. | Yes | BUFR -> CDF |
See Acronym List if needed
Table 5.2.2 Data Standards Availability
| Data Standard | Source code for writing and reading data widely available? | Read/write software well documented? | Format well described to facilitate application development? |
|---|---|---|---|
| HDF | Yes | Yes | C, C++, Fortran, and Java interfaces exist; applications must use one of these interfaces to access the data. |
| HDF-EOS | Yes | Yes | C, C++, Fortran, and Java interfaces exist; applications must use one of these interfaces to access the data. |
| GeoTIFF | Open-source libraries; many COTS and freeware applications available | User interface well documented | TIFF format well documented; COTS vendors sometimes use variations of the standard. |
| Fast Format | No | No | No |
| Native Binary | Not always | Not always | Not always |
| netCDF | Yes (C, C++, FORTRAN, Perl) | Yes | Yes |
| BUFR/GRiB | Several slightly different read/write packages exist from different organizations or countries | Not always | Not always |
See Acronym List if needed
Table 5.2.3 Data Standards Portability
| Data Standard | Portable among commonly used platforms? | Format sufficiently well implemented that new applications can access the implementation with minimal effort? | Standard can be implemented on one platform and installed and tested on other platforms with minimal modification of source code? |
|---|---|---|---|
| HDF | Precompiled HDF libraries for a variety of popular platforms such as AIX, Cray, HP, SGI, Sun, Linux, and Windows. | Yes | Yes |
| HDF-EOS | Precompiled HDF-EOS libraries for a variety of popular platforms such as AIX, HP, SGI, Sun, and Linux. | Yes | Yes |
| GeoTIFF | Works on common OSs (Linux, Unix, Windows). Designed to be portable, but some knowledge of the specification is needed. | Some knowledge of the specification, including an understanding of geotags, is needed to develop applications. | Yes |
| Fast Format | Yes | No | Yes |
| Native Binary | Usually not | No | No |
| netCDF | All major OSs: Windows, Unix, Linux, Mac OS | Yes | Yes |
| BUFR/GRiB | Yes | A generalized application would require in-depth knowledge of all variants, which is not easy to obtain. | Yes |
See Acronym List if needed
Table 5.2.4 Data Standards Evolvability
| Data Standard | Is there a methodology for adding new features? | Is there a software development process? | Is there a standard for documentation? | Is there an open process for evolution? |
|---|---|---|---|---|
| HDF | The HDF group at NCSA is an active, externally funded group devoted to the HDF project. It manages development schedules, is open to suggestions from users, and is funded from a variety of sources. | Yes; the HDF library is funded and under active development. | Yes; the HDF library follows an internally defined documentation standard. | Yes; the HDF group allows input from outside users. |
| HDF-EOS | Support is provided under a NASA contract, and the developers respond to suggestions from users. NASA decides how long to support the contract and whether to fund development as well as maintenance. | Yes; the HDF-EOS library is funded and under active development. | Yes; the HDF-EOS library follows an internally defined documentation standard. | Yes; the HDF-EOS group allows input from outside users. |
| GeoTIFF | Maintained by JPL. No formal process (i.e., no standards committee); the standard can be modified by others. | Yes | Yes | OpenGIS, but no formal process. Work on the GeoTIFF v2.0 specification has been slow, though there have been some recent efforts. |
| Fast Format | No | No | No | No |
| Native Binary | No | No | No | No |
| netCDF | Yes, through Unidata | Yes | Yes | Yes, informally through Unidata |
| BUFR/GRiB | Yes | No | Appears so; WMO issues technical documents on these formats. | The WMO CBS approves changes to the format and maintains a software registry. |
See Acronym List if needed
Table 5.2.5 Data Standards Extensibility
| Data Standard | Does the standard allow extensions or profiles to be developed? | Are there extensions or profiles developed for the standard? |
|---|---|---|
| HDF | Yes | HDF-EOS is a profile developed for HDF. |
| HDF-EOS | It is itself a profile of HDF. | No |
| GeoTIFF | Yes. New projections can be added; multiple-band GeoTIFFs are allowed; GeoTIFF 2.0 will allow external files. | None beyond the unofficial list of projections |
| Fast Format | No | No |
| Native Binary | No | No |
| netCDF | Yes | Yes, e.g., MINC (Medical Image netCDF) |
| BUFR/GRiB | Yes | Not sure |
See Acronym List if needed
Table 5.2.6 Data Standards Self-Describing
Data Standard
Evaluation Questions
Can data be stored so that it can be read without a separate
document detailing file contents?
Can the data be described internally to facilitate development of
applications?
Does the format contain information to allow subsetting?
HDF
Data can be stored so that it is self-describing. However, nothing in
the standard prevents developers from using uninformative names such as
Variable1.
Data can be described in enough detail to allow applications to
process data appropriately. For instance, scale factors may be included,
but how to do so is left to the developer. As a result, generic
applications are limited in scope, whereas applications developed for a
specific data set can be very precise.
Yes, information can be supplied to allow subsetting, but there is no
requirement to do so in a consistent way. Subsetting by selected data
fields can easily be done on any HDF file.
HDF-EOS
Data can be stored so that it is self-describing. However, nothing in
the standard prevents developers from using uninformative names such as
Variable1.
Data can be described in enough detail to allow applications to
process data appropriately. For instance, scale factors may be included,
but how to do so is left to the developer. As a result, generic
applications are limited in scope, whereas applications developed for a
specific data set can be very precise.
Because of the profile, subsetting along certain geolocation fields can
be done. Individual developers can break this process by not following
the profile (there is no internal c