
ESDSWG: Strategic Evolution of ESE Data Systems

(Formerly NewDISS: New Data and Information Systems and Services)

Appendix B - Standards for Near-Term and Longer-Term Missions


Near-Term Missions Standards Recommendations

July 30, 2002
ESDSWG Near-Term Mission Standard Study Team



Contributors

  • Richard Ullman, NASA/GSFC, Study Team Lead
  • Jingli Yang, ERT, Study Team
  • Cheryl Craig, NCAR, Study Team
  • John Evans, GST, Study Team
  • Larry Klein, L-3 Analytics, Study Team
  • Dorian Shuford, ERT, Study Team
  • Siri Jodha Singh Khalsa, L-3 Analytics, Study Team
  • Matt Smith, UAH, Study Team


Table of Contents

1.0 INTRODUCTION
1.1 ESDSWG GOALS AND STRATEGY
1.2 THE RATIONALE FOR STANDARDS
1.3 ASSUMPTIONS
1.4 METHODOLOGY
2.0 NEAR-TERM MISSION AND HERITAGE MISSION STANDARDS
2.1 ESDSWG NEAR-TERM MISSIONS
2.2 HERITAGE MISSION STANDARDS
3.0 LESSONS LEARNED
3.1 LESSONS LEARNED ON IMPLEMENTING AND USING NASA EOS STANDARDS
3.1.1 Landsat 7
3.1.2 TERRA
3.1.3 AQUA
3.1.4 AURA
3.1.5 QuikSCAT/SeaWinds
3.1.6 ACRIM
3.1.7 SeaWiFS
3.1.8 Jason-1
3.1.9 AVHRR
3.2 LESSONS LEARNED ON IMPLEMENTING AND USING OTHER STANDARDS
3.2.1 NOAA Standards
3.2.2 The Spatial Data Transfer Standard (SDTS)
4.0 ESSENTIAL STANDARDS CONCEPTS
4.1 A COMPARISON WITH PRIVATE, AD-HOC, BINARY INFORMATION TRANSFER
4.2 MANDATORY VS. OPTIONAL ELEMENTS, PROFILES AND EXTENSIONS
4.3 ABSTRACT VS. IMPLEMENTATION STANDARDS
4.4 CONTENT AND FORMAT VS. BEHAVIOR AND INTERFACE
4.5 WEB-BASED DATA SERVICE STANDARDS
5.0 STANDARDS EVALUATION
5.1 EVALUATION CRITERIA
5.2 DATA STANDARDS EVALUATION
5.3 METADATA AND DOCUMENTATION STANDARDS EVALUATION
5.4 USER SURVEYS
5.4.1 Data Format Standards
5.4.2 Metadata Format Standards
6.0 SUMMARY
7.0 CONCLUSIONS
7.1 DATA INTERFACE STANDARDS RECOMMENDATIONS
7.2 DATA PACKAGING STANDARDS
7.2.1 Data Distribution Formats Recommendations
7.2.2 Data Interchange Formats Recommendations
7.3 METADATA STANDARDS RECOMMENDATIONS
7.4 DOCUMENTATION STANDARDS RECOMMENDATIONS
7.5 STANDARD EVOLUTION PROCESS & OTHER ACTIVITIES RECOMMENDATIONS
ACRONYM LIST

List of Tables and Figures

FIGURE 1.1.1 SIMPLIFIED ESE NETWORK DATA FLOW
TABLE 2.1.1 ESDSWG NEAR-TERM MISSIONS
TABLE 2.1.2 ESDSWG NEAR-TERM MISSION STANDARDS
TABLE 2.2.3 ESDSWG HERITAGE MISSIONS DATA MANAGEMENT INFORMATION
TABLE 4.4.1 VIEWPOINTS AND LEVELS OF ABSTRACTION
TABLE 4.4.2 CRITERIA FOR FORMAT STANDARDS
TABLE 4.4.3 CRITERIA FOR INTERFACE STANDARDS
TABLE 4.4.4 DATA MODELS AND SOFTWARE ACCESS LIBRARIES
TABLE 5.2.1 DATA STANDARDS INTEROPERABILITY
TABLE 5.2.2 DATA STANDARDS AVAILABILITY
TABLE 5.2.3 DATA STANDARDS PORTABILITY
TABLE 5.2.4 DATA STANDARDS EVOLVABILITY
TABLE 5.2.5 DATA STANDARDS EXTENSIBILITY
TABLE 5.2.6 DATA STANDARDS SELF-DESCRIBING
TABLE 5.2.7 DATA STANDARDS TOOLS SUPPORT
TABLE 5.2.8 SEMANTIC COMPLETENESS
TABLE 5.2.9 DATA STANDARDS EVALUATION
TABLE 5.3.1 METADATA AND DOCUMENTATION STANDARDS EVALUATION
TABLE 5.4.1 SURVEY RATINGS OF ATTRIBUTE IMPORTANCE
TABLE 5.4.2 DATA STANDARDS SURVEY EVALUATION
TABLE 5.4.3 SUMMARY OF SURVEY ESSAY QUESTIONS
TABLE 5.4.4 METADATA STANDARDS SURVEY EVALUATION
TABLE 5.4.5 SUMMARY OF METADATA SURVEY ESSAY QUESTIONS

1.0 Introduction

1.1 ESDSWG Goals and Strategy

ESDSWG, previously called NewDISS, involves the Strategic Evolution of the Earth Science Enterprise Data Systems to serve research and application needs in the next ten years. Its primary goal is to support NASA's Earth Science Enterprise (ESE), which, in turn, contributes to the US Global Change Research Program (USGCRP). As such, ESDSWG is driven principally by the objectives of scientific research, but must also serve the needs of both scientific research and a wide variety of practical applications.

Future ESE data systems will consist of a heterogeneous mix of interdependent components derived from the contributions of numerous individuals and institutions. These widely varying participants will be responsible for data management functions including data acquisition and synthesis; access to data and services; and data stewardship.

"An important premise underlying the operation of [the ESE network of data systems and services] is that its various parts should have considerable freedom in the ways in which they implement their functions and capabilities. Implementation will not be centrally developed, nor will the pieces developed be centrally managed. However, every part of [the ESE network] should be configured in such a way that data and information can be readily transferred to any other. This will be achieved primarily through the adoption of common standards and practices [1]."

Figure 1.1.1 is a simplified data flow diagram of the ESE network of data systems and services [1]. Five types of data centers, namely Backbone Processing Centers, PI-managed Mission Data Centers, Science Data Centers, Applications Data Centers, and Multimission Data Centers are shown in the diagram. Several data flows, such as data flows from PI-managed Mission Data Centers to Multimission Data Centers and vice versa, from Science Data Centers to Applications Data Centers and vice versa, from Science Data Centers to Science Data Center, from PI-managed Mission Data Centers to PI-managed Mission Data Centers, etc. are omitted for simplicity. Four different types of data flow are identified in the diagram. Internal data flow refers to data flow inside each data center. L0 or spacecraft data flow refers to spacecraft or level 0 data flow between mission operations, PI-managed Data Centers or Multimission Data Centers, and Backbone or Long-Term Archive Data Centers. Distribution flow denotes data distribution to end-users. System interchange flow denotes data exchange between data centers. As suggested by Figure 1.1.1, the ESE network provides a means for opening numerous new channels for Earth Science satellite data streams to reach the user community. Such data streams will flow to users both directly from mission data processing centers as well as via many intermediate information providers.


This graphic gives a simplified flow diagram.  The key points are made in the preceding paragraph.

Figure 1.1.1 Simplified ESE network Data Flow (Adopted from Figure C-2 [1])


The ESDSWG Near-Term Missions Standards (NTMS) study group is tasked to make recommendations for the use of standards by the ESE near-term missions (described in the Appendix, Section 1.0). These standards are not meant to prescribe the ways that each near-term mission manages data internally or the L0 or spacecraft data flow. Instead, the recommended standards pertain to the data distribution to end-users and to the data interchange between the ESE network of data systems and services components (i.e., between different data centers as shown in Figure 1.1.1).

1.2 The Rationale for Standards

Standards aid in interoperability between data systems and facilitate access by users and the software they use. The successful adoption and use of standards for the ESE network of data systems would reduce the cost and enhance the efficiency of data system development and maintenance. Use of standards for the interchange among the ESE data and service components also makes it easy for data and service providers to join the ESE network of data systems without negotiating one-to-one agreements with each potential provider. The standards that the NTMS study group is addressing include data packaging standards, data service interface standards, metadata standards, and documentation standards, as defined below.

  • Data Packaging Standards define how to package or encode data that is stored on a computer or transferred from one system to another. Software libraries may be available to facilitate decoding, encoding, or manipulating data packaged in a particular way.
  • Content Standards for data or metadata define the information elements and their intended meaning (semantics), independently of how these elements may be encoded in files (their syntax). Two or more encodings of the same content standard can be mapped (machine-translated) to each other with no loss of information.
  • Data Service Interface Standards specify data access requests and service invocations between ESE data and services components, usually over a network. These interface standards are defined independently of the data's packaging (encoding). Web service standards, driven by electronic commerce and other markets, are a particularly promising class of service interface standards in the World Wide Web context.
  • Metadata / Documentation Standards provide a common lexicon and a set of attributes describing data to ensure that users can 1) find the data in catalogs, registries, and other indexes; 2) interpret the data unambiguously; and 3) apply system services correctly. Metadata is usually highly structured and formalized, whereas documentation usually refers to more free-text descriptions. Most metadata and documentation standards are content standards (format-independent); XML is a popular encoding for metadata.
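
The distinction drawn above between a content standard and its encodings can be illustrated with a minimal Python sketch. It encodes one record in two syntaxes, JSON and XML, and converts between them without losing information. The element names (title, platform, start_date), the root element name, and the record values are hypothetical and chosen only for illustration; they are not taken from any actual metadata standard. The sketch also assumes that every value is a non-empty string.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical content: element names and values are illustrative only.
# All values are assumed to be non-empty strings.
RECORD = {
    "title": "Sample granule",
    "platform": "Terra",
    "start_date": "2000-02-24",
}


def to_json(record):
    """Encode the record using JSON syntax."""
    return json.dumps(record, sort_keys=True)


def from_json(text):
    """Decode a JSON-encoded record."""
    return json.loads(text)


def to_xml(record):
    """Encode the same record using XML syntax (one child element per field)."""
    root = ET.Element("metadata")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Decode an XML-encoded record."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def xml_to_json(text):
    """Machine-translate one encoding into the other."""
    return to_json(from_xml(text))


# Both encodings carry the same content, so translation loses nothing.
print(from_json(xml_to_json(to_xml(RECORD))) == RECORD)
```

The two encodings differ only in syntax; the information elements and their meaning are fixed by the (here hypothetical) content definition, which is what makes the lossless mapping possible.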

For years, various satellite missions and scientific communities have found ways to use each other's data, but stable, rich standards can further promote opportunities in research and applications for data users worldwide. The evolution of these standards over the past 25 years or so has largely been driven by specific science communities with a goal of making life easier for themselves. The past 10 years or so has seen ever wider global scientific communities tied together through the Internet with a goal of still faster-paced data exchange and hopefully faster-paced research results. However, the diversity of available data sources and data standards presents a significant challenge to Earth science researchers, especially interdisciplinary Earth scientists.

As almost any researcher can attest, a substantial portion of the resources required to perform an investigation is expended on locating, obtaining, and then reading and possibly reformatting the necessary data. Standardization of data formats, metadata, and documentation can lower the barriers both to data exchange among the components of the ESE network of data systems and services and to user access to the data products. The Internet offers a compelling example of the essential role standards play in facilitating data exchange. Without the standards that underpin the Internet (TCP/IP, HTML, SMTP, GIF, JPEG, PDF, etc.), the explosion of information exchange brought about by the Internet could never have happened.

1.3 Assumptions

This study focuses on near-term missions that are already in formulation and aims to provide concrete, specific recommendations for the near-term missions' use. The following assumptions are made to carry out this study.

  1. The emerging field of Web Services is driving rapid development of data format-neutral service interface standards. Examples relevant to ESE data include the OpenGIS Web Map Service and Web Coverage Service. However, the use of online services is still only emerging in practical ESE work; it will take some time before Web Services become a part of mainstream data access and distribution.
  2. For the near-term missions, the preferred mode of delivering data remains the transfer of discrete files. Therefore the file format itself is critical to the interchange standard.
  3. Content standards for data (which define the information elements and their intended meaning, or semantics, independently of their syntax) provide well-known semantics that can support interoperability through translators or cross-reference tables. The leading developer of such standards is the Federal Geographic Data Committee (FGDC), which has developed the Content Standard for Remote Sensing Swath Data and the Content Standards for Digital Orthoimagery. In practice, however, content standards alone may not suffice for transferring complex data between different user communities without information loss or distortion.
  4. The processes of standards development and adoption are the responsibility of the long-term standards study team.
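
Assumption 3 refers to translators and cross-reference tables, and to the risk of information loss. The following Python sketch shows one simple way such a table could work, and how a field with no counterpart in the target vocabulary is dropped in translation. The two vocabularies, every field name, and the sample record are hypothetical; none of them comes from the FGDC standards or any other real standard.

```python
# Hypothetical cross-reference table between two metadata vocabularies.
# None of these field names are taken from a real standard.
CROSS_REFERENCE = {
    "platform_name": "satellite",
    "sensor_name": "instrument",
    "begin_time": "start_date",
}


def translate(record, table):
    """Return (translated record, names of fields that had no mapping)."""
    translated = {}
    unmapped = []
    for name, value in record.items():
        if name in table:
            translated[table[name]] = value
        else:
            # No counterpart in the target vocabulary: this is where
            # information is lost unless the table is extended.
            unmapped.append(name)
    return translated, unmapped


source = {
    "platform_name": "Landsat 7",
    "sensor_name": "ETM+",
    "begin_time": "2001-01-01",
    "cloud_cover_method": "automated",  # hypothetical field with no mapping
}

translated, unmapped = translate(source, CROSS_REFERENCE)
print(translated)
print("Not transferred:", unmapped)
```

Reporting the unmapped fields, rather than silently discarding them, makes the loss visible to whoever maintains the table.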

1.4 Methodology

This document provides recommendations for the use of standards by the near-term missions. We analyzed what standards are currently in use in the near-term heritage missions and other EOS missions, posing questions such as: What are the lessons learned on implementing and using those standards currently in use? What are the lessons learned from other government agencies such as NOAA? What criteria should we use to evaluate different standards? What feedback do data producers and data users have on standards? What standards do users think NASA should use in the future? Once we provide recommendations, how can the recommendations be implemented for the near-term missions? What respective activities should be supported in order to facilitate the adoption of the standards?

This report intends to answer these questions. It is based on a previous report entitled "Near-Term Missions and Standards Survey," which examines near-term missions and heritage missions, the standards in use by the heritage mission data management systems, and several emerging standards. Most of the content of the survey report is included in the Appendices as background material. In this report, we present a summary of the heritage missions and the standards in use in the heritage missions. We review lessons learned from implementing and using standards in heritage missions and in some NOAA missions. We compare standards based on essential standards concepts. In addition, we develop a suite of standards evaluation criteria, carry out a standards analysis, and present its results.

In order to include data users' and data producers' feedback on current data and metadata standards in use in the ESE missions in the study, we conducted a user interview/survey; this report summarizes and analyzes the results from the interview and survey.


References:

[1] A 6 to 10 Year Approach to Data Systems and Services for NASA's Earth Science Enterprise; Draft Version 1.0; February 2001; Section A.3.



2.0 Near-Term Mission and Heritage Mission Standards

2.1 ESDSWG Near-Term Missions

The missions that ESDSWG is initially targeted to support include the following eight near-term missions (Table 2.1.1). A detailed description of these missions can be found in the Appendix, Section 1.

Table 2.1.1 ESDSWG Near-Term Missions
Mission Name | Phase | Anticipated Launch Date
Landsat Data Continuity Mission (LDCM) | Formulation | 2005
NPOESS Preparatory Project (NPP) | Formulation | 2005
Ocean Surface Topography Measurement (OSTM) | Formulation | 2005
Ocean Vector Winds | Formulation | 2006
Global Precipitation Measurement (GPM) | Formulation | 2007
Solar Irradiance | Formulation | 2007
Carbon Cycle Initiative (CCI) | Pre-Formulation | 2008-2012
Total Column Ozone | Pre-Formulation | N/A

See Acronym List if needed

A summary of the near-term mission instruments, data formats, and metadata standards is given in Table 2.1.2. As shown in the table, LDCM, the first near-term mission, has already decided on the data and metadata standards it plans to use for the mission data products (specified in the Request For Proposal (RFP) released in October 2001). Our recommendations for the use of data, metadata, and data interface standards in near-term missions may or may not affect the LDCM mission.


Table 2.1.2 ESDSWG Near-Term Mission Standards
Missions | Instrument | Data Format | Metadata Format
LDCM | Not specified | HDF; GeoTIFF; L7 Fast Format | ECS; FGDC
NPP | ATMS; CrIS; VIIRS | N/A | N/A
OSTM (or Jason-2) | N/A | N/A | N/A
Ocean Winds | Seawinds | N/A | N/A
GPM | Dual Frequency Radar (DFR); Advanced TRMM Microwave Imager (TMI); Nadir-viewing Microwave Radiometer | N/A | N/A
Solar Irradiance | N/A | N/A | N/A
CCI Mission 1: Pathfinder CO2 | A passive spectrometer | N/A | N/A
CCI Mission 2: Ocean Carbon | A rotating scanner telescope | N/A | N/A
CCI Mission 3: Low Density Biomass | A hyperspectral imager | N/A | N/A
CCI Mission 4: High Density Biomass | A P-band SAR and an imaging laser altimeter | N/A | N/A
CCI Mission 5: Advanced Atmospheric CO2 | A pulsed, dual frequency, tunable laser sounder | N/A | N/A
Total Column Ozone | Some combination of OMPS-like, TOMS-like, SAGE-like and an IR limb sounder | N/A | N/A

See Acronym List if needed

2.2 Heritage Mission Standards

Data management information for near-term missions and heritage missions is presented in Table 2.2.3.


Table 2.2.3 ESDSWG Heritage Missions Data Management Information
Mission | Heritage Mission | Heritage Instrument | Production Site | Archive Site | Data Format | Metadata Format
LDCM | Landsat 1-7 | TM; ETM+ | EDC DAAC | EDC DAAC | HDF 4; GeoTIFF; L7 Fast Format | ECS; FGDC
NPP | Aqua | AMSU; HSB; AIRS | GSFC DAAC | GSFC DAAC | HDF-EOS 4 | ECS
NPP | Terra | MODIS | GSFC DAAC; NSIDC DAAC; EDC DAAC | GSFC DAAC; NSIDC DAAC; EDC DAAC | HDF-EOS 4 | ECS
OSTM | Jason-1 | Poseidon-2 Radar Altimeter; Jason Microwave Radiometer | PO DAAC; AVISO | PO DAAC; AVISO | Native Binary | Custom
OSTM | Topex/Poseidon | Topex Altimeter; Topex Microwave Radiometer | PO DAAC; AVISO | PO DAAC; AVISO | Native Binary for low level products; netCDF for Level 3 product | Custom
Ocean Winds | Adeos-1 | NSCAT | JPL SeaPAC | PO DAAC | HDF 4 | Adapted ECS
Ocean Winds | Quikscat | Seawinds | JPL SeaPAC | PO DAAC | HDF 4; BUFR | Adapted ECS
Ocean Winds | Adeos-2 | Seawinds | JPL SeaPAC | PO DAAC | HDF 4; BUFR | Adapted ECS
GPM | TRMM | TMI | GSFC DAAC | GSFC DAAC | HDF 4 | ECS
GPM | TRMM | VIIRS | GSFC DAAC | GSFC DAAC | HDF 4 | ECS
GPM | TRMM | PR | GSFC DAAC | GSFC DAAC | HDF 4 | ECS
GPM | TRMM | CERES | LaRC SIP | LaRC DAAC | HDF 4 | ECS
GPM | TRMM | LIS | GHRC SIP | GHRC | HDF 4 | ECS
Solar Irradiance | SNOE | XPS | LASP | LASP | ASCII | Custom
Solar Irradiance | UARS SOLSTICE | SOLSTICE; SIM | UARS CDHF and GSFC | GSFC DAAC | Native Binary Format | Native SFDU format
Solar Irradiance | ACRIM III | TIM | ACRIM III SIPS | LaRC DAAC | HDF 4 | ECS
Solar Irradiance | EOS SORCE | SIM; SOLSTICE; XPS; TIM | LASP SORCE SIP | GSFC DAAC | HDF-5 | ECS
CCI | SeaStar | SeaWiFS | GSFC DAAC | GSFC DAAC | HDF; FF | ECS
CCI | Terra | MODIS | GSFC DAAC | GSFC DAAC | HDF-EOS 4 | ECS
CCI | Nimbus-7 | CZCS | GSFC DAAC | GSFC DAAC | HDF; DSP; CRTT | Native format
CCI | VCL | MBLA | Raytheon ITSS | EDC DAAC | Unknown | Unknown
Total Column Ozone | Nimbus-7; Meteor-4; ADEOS; Earth Probe; QuikTOMS | TOMS | GSFC DAAC | GSFC DAAC | HDF-4 | ECS
Total Column Ozone | AURA | OMI | GSFC DAAC | GSFC DAAC | HDF-4 for Level 0 and 1; HDF-EOS 5 for Level 2 up | ECS

See Acronym List if needed


Several observations can be made from Table 2.2.3:

  1. Most of the heritage missions use the Hierarchical Data Format (HDF) or HDF-EOS (Earth Observing System) data formats and the EOSDIS Core System (ECS) metadata format for archiving and distribution. Heritage missions that do not use the HDF or HDF-EOS data formats and the ECS metadata format for product distribution are the Jason-1, Topex/Poseidon, and the Upper Atmospheric Research Satellite (UARS) missions. The Jason-1 and Topex/Poseidon missions are heritage missions to the Ocean Surface Topography Mission. UARS is a heritage mission to the Solar Irradiance mission.
  2. Several heritage missions distribute their data products in multiple data and metadata formats. For example, Landsat missions distribute their data products in three different data formats, namely HDF, GeoTIFF, and Fast Format, and two metadata formats, ECS and FGDC (Federal Geographic Data Committee). SeaWinds distributes its data products in the HDF and BUFR (Binary Universal Format For Representation of data) formats. The HDF format is used by the NASA Jet Propulsion Laboratory (JPL) Distributed Active Archive Center (DAAC) to distribute research data products, while the BUFR format is used by NOAA NESDIS (National Environmental Satellite, Data, and Information Service) to distribute operational data products.
  3. Data distribution formats for heritage missions consist of HDF, HDF-EOS, netCDF, GeoTIFF, Fast Format, BUFR, Binary, and ASCII. Metadata distribution formats for heritage missions include ECS, FGDC, and custom formats. A survey and critique of different data standards and metadata standards can be found in the Appendix, Section 2.0 and Section 3.0, respectively.

3.0 Lessons Learned

This chapter presents lessons learned from past experiences with data and metadata standards used for NASA ESDSWG heritage missions and NOAA missions. Some of the lessons learned pertain to past experiences with developing or implementing the standards, and others are related to past experiences with using the standards.

3.1 Lessons Learned on Implementing and Using NASA EOS Standards


3.1.1 Landsat 7

Landsat 7 data products are archived in the HDF format but distributed in three different formats: GeoTIFF, Landsat 7 Fast Format, and HDF. Based on statistics collected by the EDC DAAC [Earth Resources Observation System (EROS) Data Center (EDC) Distributed Active Archive Center (DAAC)] User Services from January 1, 2001, to September 30, 2001, most of the users ordered L-7 data either in Fast Format (46%) or in GeoTIFF (42%). Only 12% of the users ordered L-7 data in HDF format. Of the users who ordered data in HDF format, most were from international ground stations and the data product they ordered was Level 0R. HDF is the only format available for Level 0R. These statistics indicate that:

  • User communities welcome multiple distribution data formats. Statistics have shown that users order Landsat 7 data in all three available formats with the majority (88%) of the users choosing GeoTIFF or Fast Format. This indicates that for well-developed satellite mission user communities such as the Landsat data user community, multiple data distribution formats are needed. Different users choose different data formats in their applications.
  • Heritage mission data distribution formats play an important role. The reason the majority of the Landsat 7 users choose GeoTIFF or Fast Format may be because the Landsat 7 heritage mission Landsat 5 data products are distributed in Fast Format or GeoTIFF format. Thus, users were already familiar with those two formats. It seems natural that users should choose to use a format they are already familiar with rather than switching to a new data format, such as HDF.
  • GeoTIFF data format is gaining popularity among Geographic Information System (GIS) users. Landsat Thematic Mapper (TM) data (Landsat 4-5) products have been distributed in Fast Format since 1984. EDC DAAC began distributing Landsat 5 TM data products in GeoTIFF in recent years. However, based on the statistics collected from January 1 to September 30, 2001, almost half (42%) of the users order Landsat 7 data products in GeoTIFF format. As GeoTIFF format is becoming a popular data format in the GIS user community, EDC DAAC is considering distributing other land remote sensing data, such as ASTER (Advanced Spaceborne Thermal Emission And Reflection Radiometer) data products, in GeoTIFF format in addition to the HDF format.

3.1.2 TERRA

The flagship in NASA's Earth Observing System (EOS), Terra launched on December 18, 1999 and began collecting science data on February 24, 2000. There are five instruments onboard Terra, namely MODIS, ASTER, MISR, CERES, and MOPITT (see Acronym List). The data products from Terra, consisting of a great variety of ocean, atmosphere, and land data sets, are archived and distributed in HDF-EOS format as required by the EOS project. Terra metadata conforms to the ECS data model.

In the early 1990s, NASA's Earth Science Data Information Systems (ESDIS) began evaluating data format standards in preparation for the launches of the EOS satellites. In 1993, after careful consideration of over a dozen different formats, ESDIS chose the Hierarchical Data Format (HDF) for EOS standard data products. During the ECS design phase, it was realized that while HDF was a good format for storing data, further standardization would be advantageous: HDF provided little convention for associating spatial and temporal information with the science data itself. To enable additional standardization, the HDF-EOS data format was developed. This format adds mechanisms for storing geo-referencing and temporal information, data organization, and metadata storage.

Terra instrument teams and users have had several problems with implementing and using the HDF-EOS standard and the ECS data model.

  • The HDF-EOS Grid and Swath provided a natural structure for the bulk of data taken on Terra and other EOS missions; however, there was no convention for storing individual data values. For example, in the case of one producer, real numbers are stored in 14 bits and 2 additional bits are used for a special purpose rather than using all 16 bits to store the number. The HDF-EOS library can access these data; however, translation and other application tools can have problems. If processing is to be performed on individual words or bits, errors can occur if the user is not cognizant of the storage method.
  • There was no convention for packaging both HDF-EOS and HDF objects in the same file. All MODIS (Moderate-Resolution Imaging Spectroradiometer) Level 2 and 3 products are different. Even though they use HDF-EOS structures to store their primary data, many and varied vanilla HDF objects are included in MODIS standard products. MODIS also uses global and local text attributes to store non-ECS metadata rather than dumping it all into the ArchiveMetadata attributes as the HDF-EOS design calls for. This implies that software beyond the HDF-EOS library is required to access the additional attributes.
  • Even though HDF-EOS provides a standard for packaging geolocation information, there was no detailed standard for actually calculating this information. For example, some ASTER products are geolocated using a geoid (geodetic coordinates) while others are geolocated using an ellipsoid (geocentric coordinates). This is not a priori obvious to data users.
  • HDF-EOS has a steep learning curve. Once that hurdle is overcome, platform independence and common packaging provide convenience in access. However, scientists who are used to flat binary format complain about the complexity of HDF-EOS.
  • It was a mistake to try to have one HDF-EOS profile fit all disciplines. In the Terra MODIS case, this led to unproductive wrangling, an overly broad profile, and a poor fit for some (maybe all) disciplines. The lesson learned is to develop strong discipline-specific profiles and worry about crossing disciplines later.
  • An important lesson learned from Terra is not to impose immature standards such as HDF-EOS. All of the following need to be in place no later than three years before launch:
    • An expert base before products are defined.
    • Tools to verify proper implementation.
    • Experienced help desk support (and more) to help with implementation.
  • There have been many mismatches between the ESDT (Earth Sciences Data Type) and the metadata output from MODIS production. This has led to a large number of ingest failures. Quality control on the production end was lacking, which can be traced to poor versioning on the MODIS processing system end. There would have been no problem if the MODIS processing team had acquired their Metadata Configuration Files (MCFs) from the installed descriptors at the DAACs. Instead, they modified the MCFs locally and then sent the changes to ECS. As a result, there could be mismatches between the DAACs' installed ESDT and what MODIS was using. This problem has all but disappeared now that the MODIS processing team uses only the official MCFs.
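
The first bullet above describes a producer that stores a value in 14 bits of a 16-bit word and reserves the remaining 2 bits for a special purpose. The Python sketch below shows why a reader that is unaware of such a layout gets wrong numbers. The specific layout is an assumption made for illustration (an unsigned value in the low 14 bits and flags in the high 2 bits); the source does not say which bits the producer actually used or what the special bits mean.

```python
# Assumed layout (hypothetical): low 14 bits hold an unsigned value,
# high 2 bits hold special-purpose flags.
VALUE_BITS = 14
VALUE_MASK = (1 << VALUE_BITS) - 1  # 0x3FFF
FLAG_MASK = 0b11


def pack(value, flags):
    """Combine a 14-bit value and 2 flag bits into one 16-bit word."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit in 14 bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in 2 bits")
    return (flags << VALUE_BITS) | value


def unpack(word):
    """Split a 16-bit word back into (value, flags)."""
    return word & VALUE_MASK, (word >> VALUE_BITS) & FLAG_MASK


word = pack(1000, 0b10)
naive_value = word            # a tool that treats all 16 bits as the number
value, flags = unpack(word)   # a tool that knows the storage convention

print(naive_value)  # 33768
print(value, flags)  # 1000 2
```

A generic translation tool has no way to know that masking is needed unless the convention is recorded in the file or its documentation, which is the gap the bullet identifies.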

3.1.3 AQUA

AQUA is a NASA Earth Science satellite mission designed mainly to study Earth's water cycle. AQUA was formerly named EOS PM, signifying its afternoon equatorial crossing time, as opposed to the morning equatorial crossing time for TERRA. AQUA carries six instruments in a near-polar, low-Earth orbit: the Atmospheric Infrared Sounder (AIRS), the Advanced Microwave Sounding Unit (AMSU-A), the Humidity Sounder of Brazil (HSB), the Advanced Microwave Scanning Radiometer for EOS (AMSR-E), the Moderate-Resolution Imaging Spectroradiometer (MODIS), and the Clouds and the Earth's Radiant Energy System (CERES). The MODIS and CERES instruments are the same as those onboard TERRA, which was launched in December 1999. The AQUA mission was launched in May 2002.

The data format and metadata standards for the AQUA instrument data are the same as those for TERRA, namely HDF-EOS and the ECS data model, respectively. Lessons learned from the AIRS instrument team (Evan Manning, AIRS principal developer) and the AMSR-E instrument team (Dawn Conway, University of Alabama in Huntsville, Lead Software Engineer for the AMSR-E Science Team) on implementing the data and metadata standards are summarized below.

  1. In general, using the HDF-EOS standards requires a fair amount of "buy-in" and has a steep learning curve. Instrument team developers adapted, but casual users had more trouble. For example, it was relatively easy for an instrument programmer to produce the HDF-EOS files using the simple API. Many end-users, however, are reluctant to accept or "buy into" HDF-EOS because it is new. Both the AIRS and the AMSR-E teams themselves found HDF-EOS very easy to use.
  2. The HDF-EOS format has adequately supported AIRS and AMSR-E requirements, but:
    • The HDF-EOS should explicitly support field annotations. Without a standard, some developers will add their own annotation to internal HDF objects.
    • The field/attribute distinction is not clear. It seems that a swath attribute is anything that does not have a geolocation dimension, but HDF-EOS Swath treats it as anything with fewer than two dimensions.
  3. The documentation for the HDF-EOS is nearly adequate. It could really use some good sample programs. For example, provide examples that actually do something non-trivial, such as check for error conditions.
  4. While the AMSR-E Lead Science Computing Facility (SCF) found that implementation of the required ECS metadata was simple and straightforward, the AIRS team encountered several problems implementing the ECS data model. In fact, the AMSR-E team found the Science Data Processing (SDP) toolkit unnecessary to complete their tasks. It was noted, however, that the ECS keywords should better relate to keywords used in the GCMD (Global Change Master Directory). Problems that the AIRS team encountered are:
    • The ECS tools for implementing the ECS metadata standards are not easy to use. There are some really tricky parts, like setting "hdfattrname" to "coremetadata.0" or "coremetadata" depending on whether it is embedded metadata or not. The interface is generally confusing.
    • The amount of lead-time for adding an ECS Product Specific Attribute or changing attribute valids, etc. is too long.
    • Documentation for the ECS data model is not adequate.
    • The AIRS team supported ESDIS's (led by Bob Lutz) attempts to add new valids for ScienceQualityFlag. The failure of those attempts makes it hard for AIRS to support data access as they would prefer to.
  5. On a general development note, both teams discovered the importance of regular, consistent communications (telecons, meetings, etc.) between the SCF, SIPS (Science Investigator-led Processing System), DAAC, and ECS.

3.1.4 AURA

Aura is a NASA mission to study the Earth's ozone, air quality, and climate. This mission is designed exclusively to conduct research on the composition, chemistry, and dynamics of the Earth's upper and lower atmosphere by employing multiple instruments on a single satellite. Aura's chemistry measurements will follow up on measurements that began with NASA's UARS and will continue the record of satellite ozone data collected from the TOMS (Total Ozone Mapping Spectrometer) missions. The satellite will be launched in June 2003 and will operate for five or more years. The Aura data products will be distributed in HDF-EOS5 format. Aura metadata will conform to the ECS data model.

The HDF file format was designed to be a very flexible format. It is able to store many different types of scientific data in a variety of ways. While this flexibility is an asset to customized data storage, it is not ideal when one is trying to ease sharing of data. As there is so much flexibility, two different developers storing the exact same data can store the data in dramatically different ways. To constrain HDF for use in the EOS community, HDF-EOS was developed.

While HDF-EOS constrains HDF with its POINT, GRID, and SWATH interfaces, it is still possible to create two files that are completely different and require dramatically different readers. Areas of potential mismatch include:

  • Organization of data fields and attributes
  • Dimension names
  • Geolocation names and dimension ordering
  • Data field names and dimension ordering
  • Units for data fields
  • Attribute names, values, and units
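
To make these mismatches concrete, the following is a minimal Python sketch, not HDF-EOS library code. It describes the same Level 2 field as two teams might have laid it out and lists the points on which the layouts disagree. Every field name, dimension name, and unit below is invented for illustration; none is taken from an actual Aura product or from the guidelines.

```python
# Two hypothetical descriptions of the same Level 2 field, as written by
# two different teams. Every name and unit here is invented for illustration.
layout_a = {
    "field": "O3",
    "dims": ("nTimes", "nLevels"),
    "units": "vmr",
    "geolocation": ("Latitude", "Longitude"),
}
layout_b = {
    "field": "Ozone",
    "dims": ("level", "time"),
    "units": "ppmv",
    "geolocation": ("lat", "lon"),
}


def mismatches(a, b):
    """Return the sorted keys on which two layout descriptions disagree."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


print(mismatches(layout_a, layout_b))
```

A reader written against the first layout would have to be rewritten for the second, even though both files could be valid HDF-EOS swaths.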

When the Aura Data System Working Group (DSWG) reviewed the proposed structure of the Level 2 data files from each instrument, it discovered that the instruments' data files were, at times, quite different. The DSWG agreed that, with a little work, it was possible to adopt a uniform set of file format guidelines and that it was advantageous to do so. One of the main advantages of this standard is that users can apply the same set of tools and I/O routines to any of the Level 2 data from instruments within Aura. At the time of this writing, the "HDF-EOS Aura File Format Guidelines" has been adopted by all of the EOS Aura instrument teams. The guidelines contain detailed, specific information on how to store data. All of the items listed above are specifically addressed. As Aura has not yet launched at the time of this writing, the outcome of this endeavor has not been determined, but it is hoped that adopting a uniform set of strict guidelines will yield many benefits. The current guidelines can be found at:

http://www.eos.ucar.edu/hirdls/HDFEOS_Aura_File_Format_Guidelines.doc (Microsoft Word version)

http://www.eos.ucar.edu/hirdls/HDFEOS_Aura_File_Format_Guidelines.pdf (Adobe Acrobat format)

3.1.5 QuikSCAT/SeaWinds

The SeaWinds instrument on the QuikSCAT satellite is a specialized microwave radar that measures near-surface wind speed and direction under all weather conditions and cloud cover. It was launched in 1999 as a follow-on mission to the NASA scatterometer (NSCAT), which flew on the Japanese ADEOS-1 (Advanced Earth Observing Satellite) platform during 1996-1997, and the Seasat-A scatterometer system (SASS), which flew in 1978.

A unique feature of the QuikSCAT/SeaWinds mission is that SeaWinds data are processed, archived, and distributed at both NASA JPL and NOAA NESDIS. SeaWinds data are downloaded from QuikSCAT once every orbit (101 minutes). The stream passes on from the receiving ground station to the Central Standard Autonomous File Server (C-SAFS) at Goddard Space Flight Center (GSFC). The data are then forwarded to both JPL and NOAA. JPL uses these data to produce its science-level wind product, while NOAA uses an altered version of JPL's processing to produce its own Near Real Time (NRT) wind product. This dichotomy can be summarized as follows:

  • While the processing software used at NASA JPL and at NOAA NESDIS is the same, data products produced at JPL are research products (with higher accuracy) used for research and in the application community, while data products from NOAA are near real-time products (within 3 hours of observation) targeted for operational users such as the National Weather Service (NWS).
  • The SeaWinds products distributed by JPL are in HDF format while data products distributed by NOAA NESDIS are in BUFR format. This is because many operational and modeling users use the WMO (World Meteorological Organization) data standards, BUFR and GRiB (GRidded Binary). NOAA is required to provide data to their operational users in BUFR/GRiB format. For the future, the current plan is to move the NRT processing from NOAA to the Physical Oceanography (PO) DAAC at JPL, starting with the ADEOS-II mission in 2002.

3.1.6 ACRIM

For ACRIM, using HDF-EOS was required; however, since mapping the terrain of the Earth was not necessary (ACRIM is solar pointing), the EOS part did not apply. ACRIM was actually using something akin to a subset of HDF. Because ACRIM used HDF in a limited fashion, enough tools were available, but it still required the team to learn almost everything about HDF in order to determine what functions they actually needed. Overall, HDF was relatively easy to implement. Some lessons learned indicate that the following would have been helpful in the implementation of HDF:

  • An instruction manual - "What would have been helpful is a manual with step-by-step instructions; it could have been a quicker implementation."
  • Help desk - "Having someone who could spend a little time over the phone would have been very helpful."
  • Rectifying the problems with creating HDF files with REAL and INTEGER values.

[Frank Boecherer, ACRIM Science Computing Facility, Personal Communication, June 2002]

3.1.7 SeaWiFS

Ten years ago, when SeaWiFS was in development, HDF did not yet support some needed capabilities. In the beginning, HDF was largely an image format; it only supported a limited number of data sets, and it had floating point numbers only. The SeaWiFS team identified these deficiencies early on, documented them in reports, and received responses from the National Center for Supercomputing Applications (NCSA). As a result, HDF was made more friendly and easier to use. In addition, the parallel development of HDF for use with IDL allowed users to write their own HDF tools. The main thing that was learned through the experience of implementing HDF into the SeaWiFS project was that good user support is essential. The group at NCSA responded to all of their needs at the time. "That was the thing that made it work - user support, help desk." [Fred Patt, SAIC Project Manager, Personal Communication, June 2002]

SeaDAS (SeaWiFS Data Analysis System) is a comprehensive image analysis package for the processing, display, analysis, and quality control of all SeaWiFS data products, ADEOS / OCTS (Advanced Earth Observing Satellite / Ocean Color and Temperature Scanner, Japan), MOS (Modular Optoelectronic Scanner, Germany), CZCS (Coastal Zone Color Scanner, NASA), and Ancillary data (Meteorological, Ozone). HDF facilitated the development of this powerful tool. The versatility of HDF also allows individuals to develop their own uses within the SeaDAS system. HDF was mandated for the SeaWiFS project because EOS was still under development, and SeaWiFS was to pave the way for future missions. One lesson learned is: allow time to develop tools (or preferably use existing tools) to facilitate ease of use. [Jim Acker, DAAC User Support, Personal Communication, June 2002]

3.1.8 Jason-1

For Jason-1, binary was chosen as the primary data product for historical reasons (continuity). The main advantage of using binary is that it is fast and simple. Once given the read program, it is self-contained. A disadvantage to binary is that each data set requires its own read program.

Initially, one of the problems with HDF was that software to read the format was not widely available, and it did not work on many important computer classes. A second problem, in the past, was that installing the HDF libraries required major system administration knowledge. Also, the initial jump into HDF is difficult and requires a lot of "handholding", but only for first-time users. However, the beauty of HDF is uniformity across mission data sets.

From these ideas, the main lessons drawn are:

  • Before declaring a format "STD", make sure it installs properly and runs on the main machines intended.
  • Understand which classes of users will be EXCLUDED by the new format (for example, the simple binary format of Topex can be read on even a Windows 95 computer, but HDF will not install there). It is acceptable to exclude classes of users CONSCIOUSLY, but not because of oversight.
  • Do not underestimate the "handholding" that will be needed to help users install, then run, the new software. HDF, etc. are not 'read programs'; they compare to major operating systems or major commercial packages (IDL, Matlab, Mathematica, etc.) in their complexity, and their installation can be as complex.

[Victor Zlotnicki, Jet Propulsion Laboratory, Personal Communication, June 2002]

3.1.9 AVHRR

The AVHRR data format was based on TIROS data for continuity (level 1B, native binary). However, about two years ago, NOAA began offering AMSU data in HDF-EOS along with the BUFR and 1B products. The response to HDF-EOS was great. Almost all of the climate scientists are now using the HDF-EOS format by their own choice. In the future, NOAA hopes to offer AVHRR as an HDF-EOS product, due to customer demand. [Ingrid Guch, National Environmental Satellite, Data and Information Services (NESDIS), Personal Communication, June 2002]

The HDF format has already been chosen for the reprocessing of all AVHRR data for JPL. It was known that the data files would need to be compressed, but the problem was, if just a small part of a big data set was needed, the entire file would have to be decompressed and then the small subset would have to be extracted. With HDF, a chunking process exists (also called tiling). This compresses the data in such a way that it allows storage of data sets in chunks that can be decompressed separately. Thus, HDF-4 was chosen for the reprocessing of the AVHRR data. [Peter Cornillon, University of Rhode Island, Oceanography Department, Personal Communication, June 2002]
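
The idea behind chunked compression can be sketched in a few lines of Python. This is only an illustration of the principle, using the standard zlib module rather than the HDF library; the chunk size, row length, and function names are assumptions made for the example and do not reflect how HDF-4 stores chunks internally.

```python
import zlib

CHUNK_ROWS = 4  # rows per chunk; an arbitrary choice for this sketch


def write_chunked(rows):
    """Compress a list of byte rows in independent chunks of CHUNK_ROWS rows."""
    chunks = []
    for start in range(0, len(rows), CHUNK_ROWS):
        block = b"".join(rows[start:start + CHUNK_ROWS])
        chunks.append(zlib.compress(block))
    return chunks


def read_row(chunks, row_index, row_len):
    """Return one row, decompressing only the chunk that contains it."""
    chunk_no, offset = divmod(row_index, CHUNK_ROWS)
    block = zlib.decompress(chunks[chunk_no])
    return block[offset * row_len:(offset + 1) * row_len]


# 16 rows of 8 bytes each; row i is filled with the byte value i.
rows = [bytes([i]) * 8 for i in range(16)]
chunks = write_chunked(rows)
print(len(chunks))                        # number of independent chunks
print(read_row(chunks, 9, 8) == rows[9])  # only one chunk was decompressed
```

Had the 16 rows been compressed as a single stream, reading row 9 would have required decompressing everything before it; with independent chunks the cost is bounded by the chunk size.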

3.2 Lessons Learned on Implementing and Using other Standards

3.2.1 NOAA Standards

The National Oceanic and Atmospheric Administration's (NOAA's) National Environmental Satellite, Data, and Information Service (NESDIS) operates NOAA's environmental (weather) satellites and manages the processing and distribution of the data and images these satellites produce daily. NOAA's operational weather satellite system is composed of two types of satellites: Geostationary Operational Environmental Satellites (GOES) for "now-casting" and short-range warning and Polar-Orbiting Environmental Satellites (POES) for longer-term forecasting. Both types of satellites are necessary for providing a complete global weather monitoring system. The primary customer is NOAA's National Weather Service (NWS), which uses satellite data to create forecasts for the public, television, radio, and weather advisory services.

NOAA NESDIS does not use consistent data and metadata formats for their POES and GOES satellite data archive and distribution. The POES and GOES data are processed by the Information Processing Division (IPD) of the NESDIS Office of Satellite Data Processing and Distribution (OSDPD). The IPD is responsible for ingest, processing, and dissemination of environmental satellite data. The GOES data are distributed in McIDAS formats. The POES weather and climate data products are distributed in various data formats, including flat binary file, Level 1b, GIF, ASCII, BUFR, GRiB, HDF-EOS, netCDF, and McIDAS [1].

  • In general, NOAA NESDIS uses multiple distribution data formats to satisfy different user communities' needs [Ingrid Guch, NOAA NESDIS, personal communication]. The National Weather Service or the modeling community (US and international) uses the WMO data standards, BUFR and GRiB. These users have been relying on NOAA to format the data in BUFR and GRiB (as opposed to them taking the data and running their own converter). The BUFR/GRiB formats are very complex, though, and not generally used by the people outside the modeling community.
  • The imaging, climate, and scientific community as well as the NOAA NESDIS maintenance personnel greatly prefer the HDF-EOS data (ease in visualization, combining datasets, using commercial software, etc.). The netCDF format has the same benefit.
  • Other experienced users (education, academic, etc.) seem to prefer a binary or ASCII flat file so they can easily manipulate it and add GIS or whatever extensions they like.
  • Browsing users (education, some academic folks, etc.) prefer the option of ASCII, spreadsheet, and GIF.
  • For satellite data (sensor counts with navigation and calibration appended but not applied), users seem satisfied with the current packed binary file (Level 1b format). The internal NESDIS maintenance personnel have been using an unpacked binary file (Level 1b star) for ease of use in real-time processing. However, this requires recreation of the "unpacked" file from archived metadata and the 1b if reprocessing is necessary (problems occurred in the real-time processing).

Long-term environmental satellite data products are archived and distributed at the NOAA National Climatic Data Center (NCDC). Archive formats used in NCDC are different for different data products. Many products are archived in a custom format and others are in HDF-EOS, Level 1b, ASCII, or JPEG [Kathy Kidwell, NOAA NCDC, personal communication, 2002]. Data distribution formats are the same as the archive formats in NCDC. Lessons learned on NOAA data standards are summarized below:

  • Since NOAA is an operational agency and its main customer is the NWS, NOAA NESDIS is required to distribute their satellite data in BUFR/GRiB format to the NWS or the modeling users, although there are many problems with the BUFR/GRiB format [Ingrid Guch, NOAA NESDIS, personal communication, 2002].
  • NOAA NCDC has many legacy systems and they have problems translating data to/from BUFR/GRiB format [Geoffery Goodrum, NOAA NCDC, personal communication, 2002].
  • The NOAA NESDIS staff have had a positive experience with the HDF-EOS data format [2] and their users, mainly imaging, climate, and scientific communities, like the HDF-EOS format because of the flexibility, tools, and vendor support [3].

3.2.2 The Spatial Data Transfer Standard (SDTS)

The Spatial Data Transfer Standard (SDTS) became a Federal Information Processing Standard (FIPS 173) in 1992, after a 10-year development effort. It was to serve as the national spatial data transfer mechanism for all U.S. Federal agencies, and to be available for use by state and local government entities, the private sector, and research organizations. SDTS specifies exchange format constructs, addressing structure, and content, for spatially-referenced vector and raster data, to facilitate data transfer between dissimilar spatial database systems.[4] SDTS doesn't prescribe a single data model; rather, it provides a set of rules intended to represent virtually any data model.

However, SDTS fell short of its ambitious goals; and the marketplace was slow to accept and support it. Arctur et al. [5] list a number of reasons for this:

  • Complexity - SDTS was driven primarily by large national-level data producers and their needs (very large databases, complex interdependencies, high precision, flexible models, extensive metadata, collaborative updates, etc.). These needs far exceeded those of casual "desktop GIS" users and of most commercial, regional, or local GIS projects, and they stretch even today's GIS technology to its limits. Many people in the GIS community found SDTS to be overly complex, few understood its intended purpose, and thus few chose it when other, more established formats were available.[6] (Arctur et al. [5] suggest that as GIS users become more sophisticated, they may demand more of their technology (including data models and formats), and be more able and willing to cope with the implied complexity.)
  • Slow development of the standard in a fast-changing market - In the decade that elapsed between the first work on SDTS and its final adoption as a standard, the GIS industry grew significantly, and several vendor-specific exchange formats came into widespread use, which satisfied many users' immediate needs, and thus limited the community's interest in using SDTS (which many perceived as yet another format). Even though the standard was mandated for all federal agencies, most data suppliers, responding to user demand, offered alternative data encodings - and only the most curious and experimental users chose SDTS.
  • Limited vendor support - SDTS got caught in a "chicken-and-egg" situation with GIS vendors: in order to build market demand for SDTS-aware software, data providers needed to produce large volumes of SDTS data. But they needed to use commercial GIS products to build these data; so they had to persuade vendors to produce SDTS products in the absence of customer demand. A few vendors did include SDTS conversion tools in their products (e.g., ESRI's Arc/Info, Laser-Scan's Gothic); however, different products interpreted SDTS ambiguities differently (see below), so they would often fail to translate unexpected SDTS constructs introduced by another vendor's product.
  • Slow development of practical profiles - SDTS was a very general standard: any practical use of it required users to agree on a particular profile. But due to the complexity of SDTS, and the limited educational material (such as usage examples) available to the geospatial community, it took another four years to complete the first usable profile of SDTS (the Topological Vector Profile). The lack of interest in, and understanding of, SDTS among the GIS community also reduced the demand for useful profiles, and the community's enthusiasm for working on them. In the end, this first profile proved to be both limiting (encoding fairly mundane examples required awkward workarounds) and unnecessarily complex (it required arc/node/polygon topology, which was unnecessary or even meaningless for many commonly-used cases). [7]
  • Harmonization delays - Subsequent efforts to define other SDTS profiles (the Raster Profile and Transportation Network Profile) were almost complete when they became mired in attempts to harmonize them with similar standards being developed in NIMA, NATO, and the European Union. This resulted in further delays to their development. (Arctur et al. [5] suggest that early harmonization is easier, and that profiles should not be developed so quickly as to overlook other, related standards.)
  • Ambiguity in the data model (e.g., the cardinality of relationships) and the data semantics (e.g., the meaning of relationships among entities) of SDTS and its profiles limited the utility of SDTS for reliable information transfer. (Arctur [8] likens an SDTS profile to a game in which teams agree on the size of the ball and the shape of the field, but not on the rules of play.) SDTS was supposed to be very general, and to make datasets self-describing; that is, the data model could be determined from the dataset contents. But this proved an elusive goal; and thus many even of those who were willing to be "SDTS pioneers" ultimately concluded that its practical value was limited.

In addition, during and after the development of SDTS, new, unanticipated technical expectations arose, which demanded significant technical (re)design and international coordination, and further weakened the community's support for SDTS:

  • a standard means of representing subtiles within a dataset;
  • support for permanent, universally unique object identifiers across all datasets;
  • support for value-added extensions and incremental updates by users;
  • support for tracking changes and historical lineage of features and spatial primitives;
  • harmonizing the metadata content with emerging international standards; and
  • harmonizing repository organization with emerging OpenGIS software interfaces.

Some of these issues might have been anticipated in the design of SDTS, while others stemmed from the increasing sophistication of GIS products and their users over the years.

The need for harmonization with OpenGIS led to OpenGIS' work on interface specifications for access to geospatial data (features, coverages, identifiers, etc.). Since the late 1990s, OpenGIS has been the locus of much subsequent work in this area. It focused first on accessing geospatial data (e.g., Simple Features Access for SQL, COM, and CORBA), then on encoding geospatial features in XML (Geography Markup Language (GML)) for transfer between clients and servers.

In summary, the SDTS experience illustrates the importance of keeping pace with technology and market trends and emerging expectations, even after capturing initial requirements. It shows the role of timing: a standard may be "ahead of its time" (arriving before people are ready to understand it or accept more complexity) or "overcome by events" (arriving after people are used to making do without flexible, general, or vendor-independent solutions). Paradoxically perhaps, SDTS was both!

The SDTS experience also underscores the need to balance advanced needs with more basic ones; the importance of good documentation and usage examples; the challenge of "priming the pump" among vendors in advance of market demand; the benefits and risks of harmonizing with related standards; and the futility of mandating a standard that fails to meet a need.

References:

[1] NESDIS Satellite Product Overview Display, http://osdacces.nesdis.noaa.gov:8081/satprod/products/prod_frameset.cfm?prodid=-1

[2] Huan Meng, Doug Moor, Limin Zhao, Ralph Ferraro, HDF-EOS at NOAA/NESDIS; Presentation; HDF-EOS Workshop, 2000, Landover, MD: http://hdfeos.gsfc.nasa.gov/hdfeos/WSfour/meng/hdfeos4.ppt

[3] Andrew S. Jones and Thomas H. Vonder Haar; A Dynamic Parallel Data-Computing Environment for Cross-sensor Satellite Data Merger and Scientific Analysis; Accepted for publication by Journal of Atmospheric and Oceanic Technology; March 2002.

[4] Fegeas, Robin (1995). Q3.3: What is this SDTS thing and is it available via ftp? In GIS-L / comp.infosystems.gis Frequently Asked Questions and General Info List (Lisa Nyman, ed.). http://www.prenhall.com/startgis/faq.html.

[5] Arctur, D., Hair, D., Timson, G., Martin, E., and Fegeas, R. (1998) Issues and Prospects for the Next Generation of the Spatial Data Transfer Standard (SDTS). International Journal of Geographical Information Science 12(4): 403-425.

[6] Hastings et al. (1996) The Spatial Data Transfer Standard: Closing the Loop? - Panel Discussion at GIS/LIS--Denver, Colorado--November 19, 1996. http://www.ngdc.noaa.gov/seg/tools/sdts/gislis_main.html

[7] Kelley, C., and Gosinski, T., 1994: Spatial Data Transfer Standard: do you fit the profile? GIS World, August 1994, pp. 48-50. http://wwwsgi.ursus.maine.edu/gisweb/spatdb/urisa/ur94076.html

[8] Arctur, David K., 1996: Spatial Data Transfer Standard: A GIS Vendor's Perspective. Online article. http://web.archive.org/web/19990222153931/www.lsl.co.uk/~arctur/portfolio/sdts.html


4.0 Essential Standards Concepts

Before evaluating individual data or metadata standards, it may be useful to review several key concepts crucial to understanding and comparing standards.

  • A comparison with private, ad-hoc, binary information transfer
  • Mandatory vs. optional elements of a standard; profiles and extensions
  • Abstract vs. implementation standards
  • Content and format standards vs. behavior and interface standards

4.1 A Comparison with Private, Ad-hoc, Binary Information Transfer

Webster's dictionary defines a standard as: "That which is established as a rule or model by authority, custom, or general consent." Thus, standards exist only within a community of people sharing certain usage patterns ("custom") or organizational structures (formal "authority" or informal "general consent"). The emphasis in this study is on standards accepted by a fairly broad set of users, publicly documented, and either stable (unchanging) or changeable only by a consensus among these users.

Another aspect of standards is that they govern only a part of the information transfer process. For instance, GeoTIFF codifies the georeferencing of an image, but is silent on the meaning of its pixel values. Whatever a standard does not specify is left to the private (often implicit) understanding of each user community or to ad-hoc ancillary information (such as a README file or a telephone message describing data details). So, at one extreme, complex and rigid standards specify every aspect of information transfer, and at the other extreme, private agreements or ad-hoc communications leave everything implicit or unstructured. Most standards fall somewhere in between; they govern a certain piece of the information transfer process to let a certain set of users communicate or work together, but users must also rely on other standards, private agreements, or ad-hoc qualifiers.

Many earth science data users favor a "raw binary" data format that is both simple and comprehensive. In fact, "raw binary" doesn't actually mean mysterious data files that one must guess at - but, rather, a simple format (often some kind of raster grid) used by a small set of colleagues with little attention to documentation or stability. Thus, "raw binary" denotes not a single format, but as many different formats as there are workgroups. Each such format has a syntax and semantics invented just for that data set or data series, usually without a lot of attention to other related formats, and with many details left implicit or provided "out of band" in a mission report or some other natural-language document.

Such a format may serve many people's immediate needs, for several reasons.

  • A given science team often works with only one kind of data and so gets used to the one syntax for that data (e.g., keeps reusing the same parser) and the one set of semantics (e.g., pins a page from the mission report to the cubicle wall).
  • Science teams traditionally put a low emphasis on making their data accessible to others outside of their immediate colleagues. They may feel that they have done their job by distributing a simple "README" file with the data.
  • ESE data is commonly encoded as raster grids for which one can make easy guesses as to syntax (band-sequential, etc.).
  • Most importantly, perhaps, the semantics of much ESE data (platform orientation, sensor model, calibration information, interpretation algorithms) are so complex that bundling them with the data is often difficult; so they tend to remain in a mission report document - or in people's heads. (For example, each MODIS L1b granule has dozens of ancillary data items required for proper interpretation along with several grids of data error and reliability estimates.)

However, the use of raw binary data relies much more on private agreements among colleagues than on documented, consensus standards. It has many of the properties opposite to those of standards, as listed above in the "Rationale for Standards" paragraph. It limits the ability of science teams to move beyond traditional work methods towards more effective interdisciplinary research, collaborative work, and applications. The essential points are:

  • Data in a standard format (should) convey something about the data that its users need to know, whereas users of binary data must rely on inside knowledge or educated guesses to read and interpret the data.
  • Data in a standard format may be used outside of an "inner circle" of colleagues, but only holders of the necessary private information can use raw binary data.
  • Standard data formats limit the need for pair-wise translators to and from every possible format, whereas each raw binary format needs a different translator.
  • Standard data formats, by fixing the syntax and semantics of information, allow the possibility of machine-to-machine communication between different systems (that is, interoperability). In contrast, raw binary data requires human inspection and intervention, thus hindering (or preventing) system interoperability.
  • Standard data formats facilitate unambiguous transfer of information between users of different systems working with different datasets. This is more difficult with raw binary data, which often loses all but raw pixel values in translation.
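
The contrast can be illustrated with a minimal Python sketch. The header layout used here (a 4-byte length followed by a JSON description) is a hypothetical convention invented for this example and is not any of the formats evaluated in this study; the units value is likewise a placeholder.

```python
import json
import struct

values = [1.5, 2.5, 3.5, 4.5]

# "Raw binary": just the numbers. Type, byte order, and shape live elsewhere,
# for example in a README or in a colleague's memory.
raw = struct.pack(">4f", *values)

# The same bytes read with a wrong guess about byte order give wrong values.
wrong = struct.unpack("<4f", raw)
right = struct.unpack(">4f", raw)


def write_described(vals, shape, units):
    """Prefix the payload with a length-tagged JSON header (hypothetical layout)."""
    fmt = ">%df" % len(vals)
    header = json.dumps({"fmt": fmt, "shape": shape, "units": units}).encode("ascii")
    return struct.pack(">I", len(header)) + header + struct.pack(fmt, *vals)


def read_described(blob):
    """Recover header and values using only what the file itself states."""
    (hlen,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4:4 + hlen].decode("ascii"))
    data = struct.unpack(header["fmt"], blob[4 + hlen:])
    return header, list(data)


header, data = read_described(write_described(values, [2, 2], "K"))
print(list(right) == values, list(wrong) == values)
print(header["shape"], header["units"], data)
```

Note that the reader still has to know the header convention itself. That shared convention is the part a published standard would fix; everything else travels with the data.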

In summary, the use of private agreements does not constitute a standard, and so "raw binary" data formats cannot be compared alongside open consensus standards.

Of course, members of a community may choose to turn a private, internal convention into a standard for a wider community by documenting and publishing their shared syntax and semantics and by sticking to what they document (that is, submitting any changes to a public consensus process or formal authority). Most standards are born this way when a usage community publishes its internal conventions to facilitate collaboration with others.

4.2 Mandatory vs. Optional Elements, Profiles and Extensions

Many standards have a set of mandatory elements to ensure basic interoperability plus a set of optional elements to serve a diversity of users and uses. This provides a "base" standard from which a particular community of users may define a profile (a more specific standard) to support richer communication among themselves, or more fine-grained control of each other's services. A profile is a standard derived from a base standard by adding restrictions: it may require (or exclude) an element that is optional in the base standard; it may limit the valid entries under a heading; it may fix the cardinality of a repeating element; and so on. But the profile cannot contradict the base standard; anything mandatory in the base standard remains mandatory in the profile. Thus, any product that complies with the profile will comply with the base standard. [1, 2]
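
The relationship between a base standard and a profile can be sketched as two validation functions. This is a minimal Python illustration under stated assumptions: the element names, the allowed platform values, and the function names are all hypothetical and are not drawn from any real metadata standard.

```python
# A hypothetical base standard: element names are invented for illustration.
BASE = {
    "mandatory": {"title", "date"},
    "optional": {"platform", "sensor", "keywords"},
}

# A hypothetical profile: it makes one optional element required, excludes
# another, and limits the valid entries of a third.
PROFILE = {
    "also_required": {"platform"},
    "excluded": {"keywords"},
    "allowed_values": {"platform": {"Aqua", "Terra"}},
}


def complies_with_base(record):
    """Mandatory elements are present and no unknown element appears."""
    keys = set(record)
    known = BASE["mandatory"] | BASE["optional"]
    return BASE["mandatory"] <= keys and keys <= known


def complies_with_profile(record):
    """Base rules still hold; the profile only adds restrictions on top."""
    keys = set(record)
    if not complies_with_base(record):
        return False
    if not PROFILE["also_required"] <= keys:
        return False
    if PROFILE["excluded"] & keys:
        return False
    return all(record[k] in allowed
               for k, allowed in PROFILE["allowed_values"].items()
               if k in record)


good = {"title": "L2 ozone", "date": "2002-07-30", "platform": "Aqua"}
base_only = {"title": "L2 ozone", "date": "2002-07-30", "keywords": "ozone"}
print(complies_with_profile(good), complies_with_base(good))
print(complies_with_profile(base_only), complies_with_base(base_only))
```

Because the profile check begins by calling the base check, a record that passes the profile necessarily passes the base standard, while the reverse does not hold.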

One profile presented here is HDF-EOS, an EOS-specific adaptation of the very general Hierarchical Data Format. Of the many different file structures that are possible with HDF, HDF-EOS defines three (point, grid, and swath), each with spatial and temporal details alongside scientific data. In another example, FGDC's Metadata Content Standard has allowed several community-specific profiles to be defined, and in fact, the International Organization for Standardization's (ISO) Metadata standard was designed primarily for profiles. It defines several hundred elements of which fewer than 20 are required; the remaining elements are shared vocabulary (i.e., a dictionary) for building profiles. [3]

Related to profiles is the notion of extensions. These are elements added to the base standard by consensus among a certain community of users. As with profiles, extensions do not contradict the base standard - what's mandatory remains mandatory; products that fit the extended standard have everything needed by the base standard, and more. Nonetheless, by adding more loosely controlled, loosely defined elements to a standard, extensions may complicate the interoperability and maintenance of the standard.

For example, the earth imagery user community has extended the FGDC metadata content standard to more fully describe remotely sensed data by adding metadata elements such as the sensor model and the orbital platform, both of which the base standard doesn't provide [4].

4.3 Abstract vs. Implementation Standards

Standards and specifications for information systems are defined primarily at two different "levels of abstraction": implementation specifications and abstract models [5].

  • Implementation specifications tell software developers how to express information or requests within particular distributed computing environments (such as XML, Java, or the World Wide Web). Such standards define data formats, access protocols, object models, naming conventions, etc., in terms that are directly usable within the targeted computing environment.
    • Implementation specifications are the more immediately useful standards when they apply to one's chosen computing context. The data-format standards are implementation specifications, as are the eXtensible Markup Language (XML) encodings of FGDC, ISO, and other metadata standards seen in the Appendix, Section 3.0.
  • Abstract models specify what information or requests are valid or required in principle, irrespective of individual computing environments. They define the essential concepts, vocabulary, and generic structure (type hierarchy) of computational services and information transfer. Although not directly usable to build data or software, these models set the stage for creating implementation specifications and for extending existing ones to new environments.
    • Abstract models provide well-known semantics that can support interoperability through translators or cross-reference tables. For instance, thanks to FGDC's Content standard, Z39.50's GEO profile can "normalize" any FGDC-compliant metadata for external access, regardless of actual record formats or field names: it maps the record's internal data elements to the GEO field names.
    • Consensus-based abstract models of data are often termed "content standards." They define the information elements and their intended meaning (semantics) independently of their syntax - that is, independently of how these elements may be encoded in files on disk or along a communications link. In principle, content standards allow different parties to communicate meaningfully by mapping their data element names to those of the content standard, even when they use different formats for their data. This works well for fairly simple data structures such as the "parameter=value" pairs of many metadata files and catalog records. However, with more complex syntax or semantics, translating the abstract concepts of the content standard into the terms of a particular format often becomes an interpretation task that requires judgment calls and assumptions and leaves room for ambiguity. So in practice, content standards alone may not suffice for transferring complex data between different user communities without information loss or distortion.
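
The cross-reference-table approach described above can be sketched as follows. The local field names and the content-standard element names are hypothetical (they are not the actual FGDC or GEO names); the point is only that two parties with different record layouts can meet on a shared vocabulary, and that unmapped fields should be surfaced rather than silently dropped.

```python
# Hypothetical crosswalk tables: local field name -> content-standard element.
# Neither the local names nor the element names come from FGDC or GEO.
CROSSWALK_A = {
    "ttl": "title",
    "west": "west_bounding_coordinate",
    "pubdate": "publication_date",
}
CROSSWALK_B = {
    "Name": "title",
    "MinLon": "west_bounding_coordinate",
}


def normalize(record, crosswalk):
    """Rename a record's fields to content-standard names.

    Fields without a crosswalk entry are returned separately, because
    dropping them silently would hide information loss.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            unmapped[field] = value
    return mapped, unmapped


rec_a = {"ttl": "Sea surface temperature", "west": -75.0, "pubdate": "2002"}
rec_b = {"Name": "Sea surface temperature", "MinLon": -75.0, "Notes": "gridded"}

norm_a, left_a = normalize(rec_a, CROSSWALK_A)
norm_b, left_b = normalize(rec_b, CROSSWALK_B)
```

After normalization both records expose `title` and `west_bounding_coordinate`, so they can be compared or searched together. The leftover `Notes` field in the second record is the kind of content that a simple name mapping cannot carry across.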

4.4 Content and Format vs. Behavior and Interface

Table 4.4.1 shows that at each level of abstraction certain standards define the interfaces that allow different systems to work together or the expected behavior of software systems. This is the computation viewpoint, whose accent is on invoking services effectively and unambiguously. Other standards define the content of geospatial information or its encoding (or packaging) for accurate transfer between different processing systems. This is the information viewpoint, which emphasizes efficient, lossless communication [5].

Table 4.4.1 Viewpoints and Levels of Abstraction

| Level of abstraction | Service Invocation (computation viewpoint) | Information Transfer (information viewpoint) |
|---|---|---|
| Implementation specifications ("how") | Interface | Encoding (format) |
| Abstract models ("what") | Behavior | Content |

For distributed computing, both of these viewpoints are crucial and intertwined. For instance, information content isn't useful without services to transmit and use it. Conversely, invoking a service effectively requires that its underlying information be available and its meaning clear. However, the two viewpoints are also separable: we may agree on how to represent information regardless of what services carry it; conversely, we may define how to invoke a service independently of how we package the information needed or conveyed by the service.
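
A minimal sketch may help show how the two viewpoints separate. Here the service interface (`get_subset`) stays the same while the encoding of the result is chosen by the caller. The function names, the two encodings, and the sample values are assumptions made for illustration and do not correspond to any standard reviewed in this report.

```python
import json
import struct


def subset(values, start, stop):
    """Service behavior: return part of a dataset, independent of any format."""
    return values[start:stop]


def encode_json(values):
    """One possible encoding: a JSON array as ASCII text."""
    return json.dumps(values).encode("ascii")


def encode_binary(values):
    """Another encoding: a 4-byte count followed by big-endian 64-bit floats."""
    return struct.pack(">I", len(values)) + struct.pack(f">{len(values)}d", *values)


ENCODERS = {"json": encode_json, "binary": encode_binary}


def get_subset(values, start, stop, fmt):
    """Service interface: the same call regardless of the encoding requested."""
    return ENCODERS[fmt](subset(values, start, stop))


data = [271.5, 272.0, 273.25, 274.0]
as_json = get_subset(data, 1, 3, "json")
as_binary = get_subset(data, 1, 3, "binary")
```

Adding a third encoding would not change how clients invoke the service, and changing the subsetting behavior would not change either encoding.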

In a given context, either the computation view (implemented as interfaces) or the information view (implemented as formats) may take precedence. Tables 4.4.2 and 4.4.3 below show a few guidelines for prioritizing standards definition or adoption in certain contexts. In general, however, deciding which view to emphasize in a given setting is not straightforward.

Table 4.4.2 Criteria For Format Standards
| Worry about a data format standard when ... | Don't worry about a data format standard when ... |
|---|---|
| Users of different formats need to share or communicate data with each other. | There's no reason for users of different formats ever to share information. |
| Each user group (or each user) uses a different format. | A user consensus already exists on one or a few non-proprietary data formats. |
| Available formats fail to convey all the information needed for proper use. (Thus users have to rely on implicit knowledge or ad-hoc notes to use the data.) | A practical, reasonably simple data format conveys all of the information users need. |



Table 4.4.3 Criteria For Interface Standards
| Worry about a service interface standard when ... | Don't worry about a service interface standard (i.e. rely on FTP / FedEx) when ... |
|---|---|
| Most users want the output of a few well-known processing operations, such as subsetting, filtering, transformations, etc. | Most users need direct access to raw data (as archived) for ad-hoc processing and analysis. |
| The intended applications are streamed or interactive - they only use parts of the available data at a given moment. | Most use of the data requires all of it (full size and detail) to be present simultaneously. |
| No one reasonably simple format will ever meet everyone's needs. (A service allows users to request the data they need in a format that fits it.) | Users have not begun to map their workflow to online database transactions or Web services. |

Among the data standards reviewed in this report, GeoTIFF, Landsat Fast Format, and BUFR/GRiB are clearly file format standards; they specify an encoding and are silent on what access interface to use. HDF, HDF-EOS, and netCDF provide a software library to facilitate reading and writing data files, but they too are file format standards; they don't specify a format-neutral interface to a service. Table 4.4.4 compares the data models and software access libraries for a variety of data packaging standards.


Table 4.4.4 Data Models and Software Access Libraries
HDF
  Logical model:
    • Disk format, hierarchical, and similar to Unix file systems
    • Self-description provided in global and local (individual objects) attributes
    • Header describes disk structure with metadata & pointers
    • Usable for general scientific data storage; the HDF4 data model contains arrays, tables, raster image and text objects. The HDF5 data model has HDF4-type objects embedded within arrays and text attribute objects.
    • Will support extended (multiple machine) files
  Physical model:
    • XDR-based
    • Storage layout is contiguous (serial) or chunked (direct access)
    • Datasets consist of header attributes & data
    • Machine-independent
  Software access libraries: C, C++, FORTRAN, Java

HDF-EOS
  Logical model:
    • HDF-based: Versions 4 and 5
    • Provides a standard for mapping geolocation data to science data
    • Point Structure: model for sparse, randomly geolocated data
    • Swath Structure: model for data best organized by time, latitude or track parameter
    • Grid Structure: model for data organized spatially and projected
  Physical model:
    • (Same as HDF)
    • XDR-based
    • Storage layout is contiguous (serial) or chunked (direct access)
    • Datasets consist of header attributes & data
    • Machine-independent
    • Disk format is available to user
  Software access libraries: C, C++, FORTRAN

netCDF
  Logical model:
    • Self-describing
    • Usable for general scientific data storage
  Physical model:
    • XDR-based
    • Storage layout is direct access, indexed
    • Datasets consist of header & data
    • Machine-independent
    • Disk format is hidden
  Software access libraries: C, FORTRAN, Java, Perl, Python, Ruby, Tcl/Tk

GeoTIFF
  Logical model:
    • TIFF-based, with geolocation tags
    • Raster image data only
    • Multiple images can be stored in a single file
    • Version 2 will support extended files
  Physical model:
    • Storage layout allows random access to pixels by band, strip, or tile
  Software access libraries: C, Perl, Python, Java

BUFR
  Logical model:
    • Tailored to atmospheric data - point data
  Physical model:
    • Based on sequential, tape format
    • Storage layout is serial
    • Dataset consists of header + data
  Software access libraries: FORTRAN 77

GRiB
  Logical model:
    • Tailored to atmospheric data - gridded data
  Physical model:
    • Based on sequential, tape format
    • Storage layout appears to be serial - "messages"
    • Dataset consists of header + data
  Software access libraries: Command-line translators to ASCII or IEEE binary

Fast Format
  Logical model:
    • Multi-band image data
  Physical model:
    • Separate header and data files
    • Direct access to individual bands
  Software access libraries: Users write their own software based on examples

Binary
  Logical model:
    • Data model chosen by user
    • Record, data types determined by specific platform
  Physical model:
    • Different for every product
    • Machine dependent
  Software access libraries:
    • Custom software
    • Users must write their own

See Acronym List if needed
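
The difference between the "machine-independent" and "machine dependent" entries in Table 4.4.4 can be illustrated with Python's standard `struct` module. XDR represents single-precision floating-point numbers as 4-byte big-endian IEEE values, so the sketch below uses an explicit big-endian layout as a stand-in. It does not use the HDF or netCDF libraries, and the sample values are arbitrary.

```python
import struct

values = [1.5, -2.25, 1024.0]

# Portable layout: byte order is fixed by the format (big-endian, as in XDR),
# so any reader decodes it the same way regardless of its own hardware.
portable = struct.pack(">3f", *values)
decoded = list(struct.unpack(">3f", portable))

# A reader that instead assumes its own (here: little-endian) byte order,
# as often happens with undocumented native binary files, gets other numbers.
misread = list(struct.unpack("<3f", portable))
```

With a format that fixes the byte order, `decoded` matches the original values on every platform. With a native binary dump, whether the reader obtains `decoded` or `misread` depends on information that is not in the file.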


4.5 Web-based Data Service Standards

The World Wide Web is driving rapid development of format-neutral service interface standards. Examples particularly relevant to ESE data include the OpenGIS Web Coverage Service [6] and Web Map Service [7] and the Distributed Oceanographic Data System (DODS) [8].

The OpenGIS Consortium (OGC) Web Coverage Service (WCS) is likely to become an OGC specification in early 2003. It will provide access to images, imagery collections, and other systematic "fields" of values or measurements - usually arrayed on a 2D or 3D spatial grid. It fully describes the data's spatial location and semantic content and allows clients to request subsets in space or along any of the data dimensions, using a syntax based on either Uniform Resource Locators (URLs) or structured XML messages. The EOSDIS Core System (ECS) Synergy effort intends to provide WCS access to its large online data holdings ("data pools"), and the GLOBE educational project (Global Learning and Observations to Benefit the Environment) has begun experimenting with WCS and WMS (described next).

The OGC Web Map Service (WMS) provides access to rendered maps and pictures using a simple, spatial query syntax and common graphics formats (PNG, JPEG, etc.). Since its inception in early 2000, this interface has seen widespread implementation by many vendors, laboratories, and open-source efforts.
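
To make the "simple, spatial query syntax" concrete, the sketch below assembles a GetMap request URL. The parameter names follow the WMS 1.1.1 GetMap request as understood here. The server address and the layer name are hypothetical, and a real client should first consult the server's capabilities document for the layers, formats, and spatial reference systems it actually offers.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def getmap_url(base_url, layers, bbox, width, height,
               srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap URL (sketch; no request is sent)."""
    params = {
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                       # empty value requests default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, for illustration only.
url = getmap_url("http://example.org/wms", ["sst"], (-180, -90, 180, 90), 720, 360)

# Parse the query back to confirm what a server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

The whole request is an ordinary URL, which is a large part of why the interface has been easy for vendors and open-source projects to implement.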

The Distributed Oceanographic Data System (DODS) provides format-neutral access to scientific datasets; its query syntax allows for "slicing" or "sampling" a dataset along any of its variable values. DODS originated at MIT and the University of Rhode Island (URI) in the mid-1990s; since then, it has seen a fair bit of implementation in the oceanographic community and among NASA DAACs. Recently, URI and NASA DAACs have built "gateways" from DODS to WMS and WCS, and URI has begun defining two distinct successors to DODS: an "Open Source Project for a Network Data Access Protocol" (OPeNDAP), which provides tools for generic infrastructure protocols, and a "National Virtual Ocean Data System" (NVODS), which supplies oceanographic data and applications [9].
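
As a sketch of what such a "slicing" query can look like, the function below assembles a DODS-style URL with an array hyperslab constraint of the form `[start:stride:stop]` for each dimension. The server address, dataset path, and variable name are hypothetical, and the response suffixes that a given server accepts should be checked against its documentation.

```python
def hyperslab(start, stop, stride=1):
    """Format one dimension of an array constraint as [start:stride:stop]."""
    return f"[{start}:{stride}:{stop}]"


def dods_url(dataset_url, variable, slices, suffix="dods"):
    """Build a DODS request URL that selects part of one variable.

    `slices` holds one (start, stop) or (start, stop, stride) tuple per
    array dimension; `suffix` selects the kind of response requested.
    """
    constraint = variable + "".join(hyperslab(*s) for s in slices)
    return f"{dataset_url}.{suffix}?{constraint}"


# Hypothetical dataset and variable, for illustration only.
request = dods_url("http://example.org/dods/sst.nc", "sst", [(0, 9), (20, 29, 2)])
```

The client names a variable and index ranges rather than a file layout, so the same request works whatever format the server stores the data in.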

(Notable Web-based services in the ESE environment include the University of Maryland's MOCHA project ("Middleware based on a code-shipping architecture") [10]; the Tropical Rainforest Information Center (TRFIC) at Michigan State University [11]; EOS-Webster at the University of New Hampshire [12]; and many others. However, these are not service interface standards but, rather, particular implementations of distributed systems. Although they provide a useful benefit to their users, they are not linked by a well-defined, published service interface standard; instead, they rely on tightly coupled components or on unpublished or proprietary interfaces.)

Finally, a number of vendors in the world of e-commerce have championed the notion of "Web Services" [13] consisting of the Web Services Description Language (WSDL) [14]; Simple Object Access Protocol (SOAP) [15]; and Universal Description, Discovery, and Integration (UDDI) [16]. These industry specifications have gained broad visibility and offer a lot of promise for Web-based data access; however, the dust is far from settling on this very active area of technology development.

Generally, the use of Web-based services is still only emerging in practical ESE work. The primary mechanism for information interchange in the ESE context remains the transfer of discrete files; it will take some time before Web-based services become a part of mainstream data access and distribution in ESE. Accordingly, for the near-term missions this document treats only format and content standards.

References:

[1] ISO TC211 (2001), "Geographic Information - Profiles" http://www.isotc211.org/protdoc/211n1134/211n1134.pdf

[2] Federal Geographic Data Committee (1998), Content Standard for Digital Geospatial Metadata: http://www.fgdc.gov/metadata/csdgm/ : Appendix D; Appendix E.

[3] Simon Cox (2001), Summary of some geospatial metadata standards: http://www.ned.dem.csiro.au/research/visualisation/metadata/geospatial/

[4] FGDC (2001), Content Standard for Digital Geospatial Metadata: Extensions for Remote Sensing Metadata: http://www.fgdc.gov/standards/status/csdgm_rs_ex.html

[5] NASA (2001), Digital Earth Reference Model v0.5: http://www.digitalearth.gov/derm/v05/

[6] Evans, John D. (2001), OGC Web Coverage Server (WCS), Discussion Paper #01-018: http://www.opengis.org/techno/discussions/01-018.pdf

[7] de La Beaujardière, Jeff (2001), OGC Web Map Server Interface Implementation Specification, version 1.1.1: http://www.opengis.org/techno/specs/01-068r3.pdf

[8] Distributed Oceanographic Data System (DODS): http://www.unidata.ucar.edu/packages/dods/

[9] Cornillon, Peter (2002), "DODS: OPeNDAP providing plug-and-play interoperability in a distributed data system," presentation at the 9th Assembly Meeting of the ESIP Federation, University of Maryland, College Park, May 15-17, 2002. http://www.esipfed.org/business/library/meetings/9th_fed_meeting/ppt/DODS.PPT

[10] The MOCHA project: Self-Extensible Middleware Architecture. http://www.cs.umd.edu/projects/mocha/

[11] Tropical Rain Forest Information Center (TRFIC). http://www.bsrsi.msu.edu/trfic/

[12] EOS-WEBSTER: Earth Science Data from the University of New Hampshire. http://eos-webster.sr.unh.edu/

[13] World Wide Web Consortium (W3C), 2002: Web Services home page. http://www.w3.org/2002/ws/

[14] World Wide Web Consortium (W3C), 2002: Web Services Description Language (WSDL) Version 1.2: W3C Working Draft 9 July 2002. http://www.w3.org/TR/2002/WD-wsdl12-20020709/

[15] World Wide Web Consortium (W3C), 2002: SOAP Version 1.2 Part 1: Messaging Framework. http://www.w3.org/TR/soap12-part1/

[16] Universal Description, Discovery, and Integration (UDDI) of Business for the Web. http://www.uddi.org

5.0 Standards Evaluation

To assess objectively the data and metadata standards identified in Chapter 2 for the ESDSWG near-term missions, this chapter evaluates each standard against a set of features, or criteria. In addition, a user opinion interview/survey gathers the user community's feedback on using the standards.

5.1 Evaluation Criteria

Many features or criteria can be used to evaluate the data and metadata standards identified. The intention of this study is not to identify one all-purpose standard but, rather, to identify appropriate uses of the standards. For example, some standards are more suitable for transmission and archiving, while others are more suitable for analysis. For transmission and archiving, the most important features include semantic completeness, portability, self-description, extensibility, and interoperability [1]. For analysis, the most important features include ease of use and support from analysis tools. Many of these features and others are defined below.

1. Interoperability - Tools exist to translate to other standard formats with no information loss.

* Is there a defined relationship or semantic equivalence between this standard and other standards? i.e., can the standard be broken into elements that have the same content as elements for other standards?

* Is the definition sufficiently precise to allow development of a translation algorithm between standards?

* What translation tools (well known) have been developed?

2. Availability - Source code for writing and reading data in the format is widely and publicly available.

* Is the source code for writing and reading data widely and publicly available?

* Is the software for reading and writing well documented?

* Are the search and order methods for data using the format well understood and established?

3. Portability - Data in this standard can be used on a variety of platforms or in a variety of applications (vendor support).

* Is the format sufficiently well defined so that data can be ported to new commonly used platforms with minimal effort?

* Is the format sufficiently well implemented that new applications can access the implementation with minimal effort?

* Can the standard be implemented on one platform and installed and tested on other platforms with minimal modification of source code? i.e., machine dependent code is minimized.

4. Evolvability - A clear process for maintaining and evolving the standard exists.

* Is there a methodology for adding new features to the standard?

* Is there a software development process?

* Is there a standard for documentation?

* Is there an open process for evolution?

5. Extensibility - Support for extensions and profiles exists.

* Does the standard allow extensions or profiles to be developed?

* Are there extensions or profiles developed for the standard?

6. Self-describing - Files contain data descriptions along with the data.

* Can data in this format be read without a separate document detailing file contents?

* Can the data be described internally to facilitate development of applications?

* Does the format contain information to allow geospatial, temporal, and/or spectral subsetting?

7. Tools Support - Software tools are available to support the standard.

* Does the standard have freeware support?

* Does the standard have COTS (Commercial Off-The-Shelf) software support?

8. Completeness - The capacity to carry semantic descriptive elements of the data explicitly and unambiguously. Higher levels of completeness can reduce the user's dependency on outside information, implicit knowledge, or guesswork when interpreting and applying the data.

* Can the format carry everything users need to use the data correctly? i.e., can the format convey the data's precise spatial location, its units of measure, the observation parameters (e.g., spectral bands), accuracy estimates (error bars), and other elements needed to understand the data and apply it?
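
The self-describing and completeness criteria can be illustrated with a toy container. The layout below (a length-prefixed JSON header followed by big-endian 16-bit integers) is invented for this sketch and is not HDF, netCDF, or any other format reviewed here. It shows only that when the name, units, and scale factor travel with the data, a reader needs no separate document to recover physical values.

```python
import json
import struct


def write_self_describing(values, name, units, scale_factor=1.0):
    """Pack values as scaled 16-bit integers behind a descriptive header."""
    header = json.dumps({
        "name": name,
        "units": units,
        "count": len(values),
        "type": "int16",
        "byte_order": "big",
        "scale_factor": scale_factor,
    }).encode("utf-8")
    packed = [round(v / scale_factor) for v in values]
    payload = struct.pack(f">{len(packed)}h", *packed)
    # 4-byte header length, then the header, then the data.
    return struct.pack(">I", len(header)) + header + payload


def read_self_describing(blob):
    """Recover the header and physical values using only the blob itself."""
    (header_len,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    raw = struct.unpack(f">{header['count']}h", blob[4 + header_len:])
    return header, [r * header["scale_factor"] for r in raw]


blob = write_self_describing([271.5, 272.25, 273.0], "brightness_temperature", "K", 0.25)
header, restored = read_self_describing(blob)
```

Without the header, the payload is six bytes of integers whose meaning, units, and scaling would have to come from outside documentation, which is the situation the criteria above are meant to detect.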

5.2 Data Standards Evaluation

Using the standards evaluation criteria defined above, Tables 5.2.1 through 5.2.8 analyze and compare data standards in use in heritage missions and other ESE missions.

Table 5.2.1 Data Standards Interoperability

| Data Standard | Is there a defined relationship or semantic equivalence between the standard and other standards? | Can translation algorithms be developed easily? | What well-known translation tools have been developed? |
|---|---|---|---|
| HDF | Yes. Since HDF can contain general scientific data, it encompasses all the other standards. | Yes, HDF has a well-documented software API. | GIF <-> HDF5; HDF4 <-> HDF5; Ensight6 -> HDF5 |
| HDF-EOS | Yes. As a superset of HDF, it also encompasses the other standards. | Yes, the Point, Grid, and Swath add-on structures are well documented. | GIF <-> HDF5; HDF4 <-> HDF5; Ensight6 -> HDF5 |
| GeoTIFF | Yes, for image-based standards; no, for non-image standards. | Yes. Public domain API library partially documented. | Lots of converters for TIFF; also GeoTIFF tag read & write. Specialized converters for L7, MODIS, MISR, ASTER. |
| Fast Format | No | No. No API or library exists. | No |
| Native Binary | Depends on the standard. Most are specific to the application. | Depends on the standard, but usually not, unless specific efforts are made to document and publish an API. | No. You have to write your own translation tool. |
| netCDF | Yes. Since netCDF can contain general scientific data, it encompasses all the other standards. | Yes. netCDF has a well-documented API. | -> HDF; -> Matlab5 |
| BUFR/GRiB | Yes - translation of meteorological parameters to other formats is possible, with no loss of content. No for non-meteorological standards. | Yes | BUFR -> CDF |

See Acronym List if needed

Table 5.2.2 Data Standards Availability

| Data Standard | Source code for writing and reading data widely available? | Read/write software well documented? | Format well described to facilitate application development? |
|---|---|---|---|
| HDF | Yes | Yes | C, C++, Fortran, and Java interfaces exist. Applications must use one of these interfaces to access the data. |
| HDF-EOS | Yes | Yes | C, C++, Fortran, and Java interfaces exist. Applications must use one of these interfaces to access the data. |
| GeoTIFF | Open source libraries; many COTS and freeware applications available | User interface well documented | TIFF format well documented. COTS vendors sometimes use variations of the standard. |
| Fast Format | No | No | No |
| Native Binary | Not always | Not always | Not always |
| netCDF | Yes (C, C++, FORTRAN, Perl) | Yes | Yes |
| BUFR/GRiB | A few slightly different read and write software packages exist, from different organizations or countries | Not always | Not always |

See Acronym List if needed

Table 5.2.3 Data Standards Portability

| Data Standard | Portable among commonly used platforms? | Format is sufficiently well implemented that new applications can access the implementation with minimal effort? | Standard can be implemented on one platform and installed and tested on other platforms with minimal modification of source code? |
|---|---|---|---|
| HDF | Precompiled HDF libraries exist for a variety of popular platforms such as AIX, Cray, HP, SGI, Sun, Linux, and Windows. | Yes | Yes |
| HDF-EOS | Precompiled HDF-EOS libraries exist for a variety of popular platforms such as AIX, HP, SGI, Sun, and Linux. | Yes | Yes |
| GeoTIFF | Works on common operating systems (Linux, Unix, Windows). Designed to be portable, but some knowledge of the specs is needed. | Some knowledge of the specs is needed, as is an understanding of geotags to develop applications. | Yes |
| Fast Format | Yes | No | Yes |
| Native Binary | Usually not | No | No |
| netCDF | All major operating systems: Windows, Unix, Linux, MacOS | Yes | Yes |
| BUFR/GRiB | Yes | A generalized application would require in-depth knowledge of all variants, which is not easy to obtain | Yes |

See Acronym List if needed

Table 5.2.4 Data Standards Evolvability

| Data Standard | Is there a methodology for adding new features? | Is there a software development process? | Is there a standard for documentation? | Is there an open process for evolution? |
|---|---|---|---|---|
| HDF | NCSA hosts an active, externally funded group devoted to the HDF project. The group manages development schedules and is open to suggestions from users. It is funded from a variety of sources. | Yes, the HDF library is funded and software development is ongoing. | Yes, the HDF library follows an internally defined standard for its documentation. | Yes, the HDF group allows input from outside users. |
| HDF-EOS | Support is provided under a contract from NASA. The contractor responds to suggestions from users. NASA decides how long to support the contract and whether to fund development as well as maintenance. | Yes, the HDF-EOS library is funded and software development is ongoing. | Yes, the HDF-EOS library follows an internally defined standard for its documentation. | Yes, the HDF-EOS group allows input from outside users. |
| GeoTIFF | Maintained by JPL. No formal process (i.e., no standards committee). The standard can be modified by others. | Yes | Yes | OpenGIS, but no formal process. Work on the GeoTIFF v2.0 spec has been slow recently, with some recent efforts. |
| Fast Format | No | No | No | No |
| Native Binary | No | No | No | No |
| netCDF | Yes, through Unidata | Yes | Yes | Yes, informally through Unidata |
| BUFR/GRiB | Yes | No | Appears so; WMO issues technical documents on these formats | The WMO CBS approves changes to the format and maintains a software registry |

See Acronym List if needed

Table 5.2.5 Data Standards Extensibility

| Data Standard | Does the standard allow extensions or profiles to be developed? | Are there extensions or profiles developed for the standard? |
|---|---|---|
| HDF | Yes | HDF-EOS is a profile which was developed. |
| HDF-EOS | This is a profile of HDF | No |
| GeoTIFF | Yes. New projections can be added. Multiple-band GeoTIFFs allowed. GeoTIFF 2.0 will allow external files. | None that are not part of unofficial list of projections |
| Fast Format | No | No |
| Native Binary | No | No |
| netCDF | Yes | Yes, e.g., MINC (Medical Image netCDF) |
| BUFR/GRiB | Yes | Not sure |

See Acronym List if needed

Table 5.2.6 Data Standards Self-Describing

Data Standard

Evaluation Questions

Is data able to be stored so that it can be read without a separate document detailing file contents?

Can the data be described internally to facilitate development of applications?

Does the format contain information to allow subsetting?

HDF

Data can be stored so that it is self-describing. There are no restrictions in the standard though to prevent developers from using names such as Variable1.

Data can be described with enough detail to allow applications to process data appropriately. For instance, scale factors may be included but it is developer dependent on how to do this. As a result, generic applications are limited in their scope. Applications developed for a specific data set can be very precise.

Yes, information can be supplied to allow subsetting, but there is no requirement to do so in a consistent way. Subsetting by selecting data fields can easily be done on any HDF file.

HDF-EOS

Data can be stored so that it is self-describing. There are no restrictions in the standard though to prevent developers from using names such as Variable1.

Data can be described with enough detail to allow applications to process data appropriately. For instance, scale factors may be included but it is developer dependent on how to do this. As a result, generic applications are limited in their scope. Applications developed for a specific data set can be very precise.

Because of the profile, subsetting along certain geolocation fields can be done. Individual developers can break this process by not following the profile (there is no internal c