Government of India
Ministry of Statistics and Programme Implementation
Data Dissemination: National Metadata Structure (NMDS) For Statistical Products
Introduction
The National Statistical Office (NSO), Ministry of Statistics & Programme Implementation, presents and disseminates data and metadata through products such as census data (Economic Census); survey data, such as NSS surveys (e.g. household surveys) and the Annual Survey of Industries (ASI); Consumer Price Indices (CPI); and macro-economic aggregates such as National Income and the Index of Industrial Production (IIP). In addition, statistical data is presented in analytical publications such as NSS Reports, Annual Survey of Industries Reports, the National Indicator Framework (NIF) for monitoring SDGs, Energy Statistics, EnviStats India and Women & Men in India. These publications provide analysis of the data, supported by visual presentation in the form of graphs and maps.
Producing data and presenting its metadata structure requires an overview of the arrangements, technical infrastructure and skills needed for a holistic and integrated approach to presenting and disseminating statistical data and metadata to different user groups. The National Metadata Structure (NMDS) is intended to provide guidelines for data producers to adhere to a basic minimum quality standard, in order to establish and maintain the quality of data and make data easier to share. The specific objectives of this document are:
to promote reporting for each type of statistical process and its outputs across different Ministries/Divisions/Departments of NSO, thereby facilitating comparisons across processes and outputs;
to ensure that producer reports contain all the information required to facilitate identification of quality issues and potential improvements in statistical processes and their outputs; and
to ensure that user reports contain all the information required by users to assess whether statistical outputs are fit for the purposes they have in mind.
A. What is Metadata?
A.1. Metadata should contain all the information users need to analyse a dataset and draw conclusions. It increases data accessibility by summarising the most important information (e.g. methodology, sampling design, interview mode) required for analysing a dataset, which reduces the need for users to search for supporting documents and reports. Furthermore, good metadata clearly articulates the potential uses of a dataset, helping to prevent misuse. Metadata is also a tool for rendering complex microdata structures into something meaningful, navigable and user-friendly. Finally, the adoption of well-known metadata schemas and vocabularies allows for semantic interoperability.
The metadata process is fully integrated into the Generic Statistical Business Process Model (GSBPM) of the UNECE (United Nations Economic Commission for Europe, https://statswiki.unece.org/display/GSBPM/GSBPM+v5.1), in which metadata is one of the key elements in version 5.1.
B. Why Metadata?
B.1. In most information technology usage, the prefix meta- conveys "an underlying definition or description". At its most basic, then, metadata is data about data. More precisely, metadata describes data by recording specific information such as type, length, textual description and other characteristics. Metadata makes it much easier to find relevant data. To use a dataset, users need to understand how the data is structured, the definitions of the terms used, how the data was collected and how it should be read.
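As an illustration of B.1, the following is a minimal sketch in Python of a variable-level metadata record holding the characteristics named above (type, length and a textual description). The class VariableMetadata, its field names and the state_code example are hypothetical and are not part of any MoSPI or NMDS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical variable-level metadata record (illustrative only)."""

    name: str          # variable (column) name as it appears in the dataset
    data_type: type    # expected Python type of each value, e.g. str or int
    max_length: int    # maximum length of the value's text representation
    description: str   # plain-language definition of the variable

    def conforms(self, value) -> bool:
        """Return True if a value matches the declared type and length."""
        return isinstance(value, self.data_type) and len(str(value)) <= self.max_length


# Hypothetical example: a two-character state code in a survey dataset.
state_code = VariableMetadata(
    name="state_code",
    data_type=str,
    max_length=2,
    description="Two-character code identifying the state of the household",
)

print(state_code.conforms("09"))   # True
print(state_code.conforms("123"))  # False: longer than max_length
print(state_code.conforms(9))      # False: an int, not a str
```

Because the record travels with the dataset, a user can read the definition of a variable and check values against it without consulting separate documentation, which is the accessibility benefit described in A.1.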
B.2. Metadata is an important way to protect resources and their future accessibility. For archiving and preservation purposes, metadata elements are needed that track an object's lineage and describe its physical characteristics and behaviour, so that it can be replicated on future technologies.
B.3. In today's data-driven world and in the era of digital transactions, huge amounts of data are generated in real time, and a large number of organisations and agencies have lately started producing data in large volumes. This creates the need for a standard regulatory framework to assure the quality of data produced by different producers. Such a framework will also help ensure data comparability over time, enabling a better understanding of social and economic movements.
B.4. Although metadata may not seem exciting or impressive, its importance should never be underestimated. Hence, a concerted effort is needed to build a sound metadata structure in order to draw maximum gains from varied datasets.
C. Role of MoSPI in Building Metadata Structure
C.1. As the nodal agency for the planned development of the statistical system in the country, MoSPI is also responsible for maintaining the highest standards of data quality, in line with the basic guidelines of international agencies, so as to ensure that India's statistical system is among the frontrunners in producing quality data. MoSPI aims to raise the National Statistical System (NSS) to the level of one of the best professionally equipped government data-producing systems by building, among other things, the best IT infrastructure in the system. Metadata is one of the building blocks for achieving this objective.
C.2. This document presents the NMDS in two formats: the first is an indexed version listing the NMDS concepts (Section F), and the second presents details of the concepts through definitions and guidelines (Section G).
D. Metadata Management
D.1. Official statistics-producing agencies are advised to put a metadata policy in place ab initio. The policy is a set of broad, high-level principles that form the guiding framework within which metadata management can operate.
D.2. Once an organisation has put its metadata policy in place, metadata should be compiled and maintained actively; otherwise its currency, and thus its use, will degrade over time. To realise the full capabilities of metadata, it must be maintained over a long period of time. Even with investment in technically sophisticated search tools, such systems may find little acceptance among stakeholders if the metadata are incomplete or not updated regularly.
While preparing the NMDS, the following core principles should be borne in mind:
i. Metadata Handling:
a. Statistical Business Process Model
b. Active, not Passive
c. Reuse for Efficiency
d. Version Preservation
ii. Metadata Authority
e. Registration
f. Single Source
g. One Entry/Update
h. Standards Variations
iii. Relationship to Statistical Business Processes
i. Integrity
j. Matching Metadata
k. Describe Flow
l. Capture at Source
m. Exchange and Use
iv. Users
n. Identify Users
o. Variant Formats
p. Availability
E. Retention, Preservation, and Destruction
National Statistics constitute valuable and irreplaceable assets whose value can increase through widespread and long-term use. National Statistics should therefore be backed by a Data Management Policy setting out the arrangements in place for the retention, long-term preservation and destruction of resources, including metadata.