This article continues the previous one. For a better understanding, we recommend reading the earlier blog posts first.
Data Formats and Data Sources
There are several formats for storing geographical data, and the available data can be classified by several criteria. Because the data has to be consolidated before it can be analyzed, we apply the following classification of storage formats:
Textual spatial data sources in open formats: datasets in CSV, GeoJSON, fixed-width text and other text formats tied to geographical coordinates;
Map files, which are divided into vector and raster maps.
Vector maps contain information on geographical objects, natural features, etc. in the form of polygon datasets tied to a coordinate system, whereas a raster map is an image placed on a coordinate grid, in which every pixel characterizes the landscape at the corresponding point.
Specialized binary formats requiring interpretation/decoding. Examples are NetCDF and GRIB, which NASA and NOAA widely use for the output of their prediction models. These formats are placed in a separate class because they require special additional processing. The following figure (NOAA GFS) illustrates the list of parameters available in the standard GRIB dataset produced by the NOAA GFS weather prediction model.
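To illustrate the first class, here is a minimal Python sketch that converts CSV rows carrying coordinates into a GeoJSON FeatureCollection. The column names `lat` and `lon`, the function name and the sample rows are hypothetical; real sources name their coordinate columns differently, and all non-coordinate values are kept as strings.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_col="lat", lon_col="lon"):
    """Convert CSV rows with latitude/longitude columns into a GeoJSON
    FeatureCollection; every other column becomes a feature property."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows (hypothetical stations, temperatures in degrees C)
sample = "station,lat,lon,temp\nA,43.25,76.95,12.5\nB,51.17,71.45,-3.0\n"
print(json.dumps(csv_to_geojson(sample), indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, which is a common source of swapped points on a map.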
One of the most striking visualizations of wind and temperature over the entire surface of the Earth, based on real-time NCEP/GFS weather data, is available at http://earth.nullschool.net/
The weather data sources reviewed in this article are:
NOAA GSOD (Global Surface Summary of the Day)
NASA SSE (Surface meteorology and Solar Energy)
ECMWF (European Centre for Medium-Range Weather Forecasts) archive datasets
Some public datasets from NOAA (National Oceanic and Atmospheric Administration)
NCEP (National Centers for Environmental Prediction)
GSOD can be used to build an overall picture of the climate across the whole territory of a country. It also covers the history of climate change, with data going back to 1929. GSOD includes environmental parameters such as temperature, air humidity and pressure, dew point, wind speed and direction, precipitation, etc. All the data is daily, i.e. the time resolution is 24 hours. However, the spatial resolution of this source is very low for the territory of Kazakhstan, because the number of measurement sites (weather stations belonging to the network of the WMO, the World Meteorological Organization) is extremely small compared with developed countries. According to 2012 data, there were 359 weather stations on the territory of Kazakhstan versus 48,529 in the USA: the ratio of areas is about 1 to 3.5, while the ratio of station counts is about 1 to 135. A larger number of measurement sites would make it possible to build detailed forecasting models and to estimate the solar and wind potential of the whole territory. The data is provided as text files with fixed-width columns, a format that can easily be converted into a CSV file or loaded into a relational database. The data scheme is shown in the entity diagram.
The datasets are split by year, and a separate file describes the weather stations and their locations.
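Because the files are fixed-width text, converting them to CSV is mostly a matter of slicing each line at known column offsets. The sketch below assumes a hypothetical four-column layout (`COLSPECS`); these offsets are illustrative only, and the real column positions must be taken from the official GSOD format description.

```python
import csv
import io

# Hypothetical column layout: (field name, start, end) as 0-based,
# end-exclusive character offsets. Illustrative only, not the real GSOD spec.
COLSPECS = [
    ("station", 0, 6),
    ("wban", 7, 12),
    ("date", 14, 22),
    ("temp", 24, 30),
]


def parse_fixed_width(lines, colspecs=COLSPECS):
    """Slice each non-blank fixed-width line into a dict of stripped fields."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        records.append(
            {name: line[start:end].strip() for name, start, end in colspecs}
        )
    return records


def to_csv(records, colspecs=COLSPECS):
    """Serialize parsed records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[name for name, _, _ in colspecs])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up sample line laid out according to COLSPECS above
sample = ["010010 99999  20120101    23.4"]
print(to_csv(parse_fixed_width(sample)))
```

The same parsed records can be inserted into a relational table instead of written to CSV; joining them with the separate station file on the station identifier attaches coordinates to every daily record.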
SSE datasets were obtained through a collaboration between several organizations that carry out solar measurements, among them BP Solar International, the National Renewable Energy Laboratory (NREL), Solar Energy International (SEI) and the Atmospheric Sciences Research Center of the State University of New York at Albany. The main goal of the SSE project is to provide the data needed for developing green energy, especially solar energy. Remote sensing data is used as well. At some points the data is interpolated; measurements are available for 22 years (from 1983 to 2005). Coverage is global, with a resolution of 1 degree of latitude and longitude. The parameters include air temperature, the number of sunny days, several types of insolation layers, cloud cover, and wind direction and speed.
The project has its own web site where users can select the data they are interested in. This data can help determine parameters such as the size, shape, location and tilt angle of solar panels, as well as wind and temperature characteristics and the influence of vegetation on wind intensity.
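With a 1-degree global grid, a value at an arbitrary location has to be estimated from the surrounding grid nodes. The sketch below uses plain bilinear interpolation over the four nearest nodes. This is a generic technique, not necessarily the interpolation SSE itself applies, and the grid layout (nodes on whole degrees, rows starting at 90°S, columns starting at 180°W) is an assumption.

```python
import math


def bilinear(grid, lat, lon):
    """Estimate a value at (lat, lon) from the four surrounding grid nodes.

    Assumed (hypothetical) layout: grid[i][j] is the value at latitude
    -90 + i and longitude -180 + j, so a 1-degree global grid has
    181 rows and 361 columns.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    y = lat + 90.0   # fractional row index
    x = lon + 180.0  # fractional column index
    # Clamp so that the neighbours (i0 + 1, j0 + 1) stay inside the grid.
    i0 = min(math.floor(y), len(grid) - 2)
    j0 = min(math.floor(x), len(grid[0]) - 2)
    fy, fx = y - i0, x - j0
    # Interpolate along longitude on the two bracketing latitude rows...
    south = grid[i0][j0] * (1 - fx) + grid[i0][j0 + 1] * fx
    north = grid[i0 + 1][j0] * (1 - fx) + grid[i0 + 1][j0 + 1] * fx
    # ...then along latitude between those two rows.
    return south * (1 - fy) + north * fy


# Synthetic field: value = latitude + 2 * longitude at every node.
grid = [[(-90 + i) + 2 * (-180 + j) for j in range(361)] for i in range(181)]
print(bilinear(grid, 10.5, 20.25))  # → 51.0 (a linear field is reproduced exactly)
```

Bilinear interpolation is cheap and continuous, but it cannot recover detail finer than the 1-degree cells, so local effects such as terrain remain smoothed out.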
ECMWF is a European intergovernmental organization that carries out research on climate measurements and weather data analysis. Its main activity is numerical weather prediction over medium-range intervals of 15 to 30 days. ECMWF develops weather forecasting algorithms that would be impossible without big data. Some of the data is open and can easily be requested; the web site links to a web API that helps automate data requests. ECMWF offers a wide variety of parameters, including sophisticated ones such as snow depth, soil temperature and ozone layer characteristics. Unfortunately, high-resolution data is mostly paid.
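As a rough sketch of such automated requests, the code below builds a MARS-style request dictionary in the shape used by the legacy ECMWF Web API Python client (`ecmwf-api-client`, module `ecmwfapi`, method `ECMWFDataServer().retrieve`). The values mirror the classic ERA-Interim surface example (parameter `167.128` is 2 m temperature). ERA-Interim has since been superseded by ERA5, which is distributed through the Copernicus Climate Data Store, so treat the request values as a historical illustration. The helper names `build_request` and `submit` are hypothetical.

```python
def build_request(start_date, end_date, param="167.128", target="output.grib"):
    """Build a MARS-style request dict for the legacy ECMWF Web API.

    Values follow the classic ERA-Interim surface example; adjust the
    dataset, grid and parameters to what your account can access.
    """
    return {
        "class": "ei",
        "dataset": "interim",
        "date": f"{start_date}/to/{end_date}",
        "expver": "1",
        "grid": "0.75/0.75",     # output resolution in degrees
        "levtype": "sfc",        # surface-level fields
        "param": param,          # 167.128 = 2 m temperature
        "step": "0",
        "stream": "oper",
        "time": "00:00:00",
        "type": "an",            # analysis rather than forecast
        "target": target,        # local GRIB file to write
    }


def submit(request):
    """Send the request; needs ecmwf-api-client and a configured API key."""
    from ecmwfapi import ECMWFDataServer  # optional dependency, imported lazily
    ECMWFDataServer().retrieve(request)


req = build_request("2012-01-01", "2012-01-31")
print(req["date"])
# submit(req)  # uncomment once credentials are configured
```

Keeping request construction separate from submission makes it easy to generate many date ranges in a loop and to inspect the requests before any data is downloaded.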
From the point of view of data visualization, the most interesting information at present comes from organizations affiliated with NASA and NOAA. Most of this data is the result of remote sensing. Among these organizations, NCDC (National Climatic Data Center) stands out, mainly because its data can be ordered in a printed, certified form and comes with a guarantee. The data is gathered from satellites, probes, weather stations and radars, and is also obtained from climate models. NCDC is in fact the world's largest climate database. Datasets are provided in different forms, depending on the data source NCDC draws on.
Besides textual and numeric data, spatial data may be stored in map form (map layers). There are a great many map sources covering different topics. Online map services such as OpenStreetMap (OSM), OpenWeatherMap (OWM, mentioned earlier), Google Maps and others have recently become very popular. OWM, for instance, provides maps of cloud cover, air pressure, temperature and precipitation for the whole world. Map providers usually offer a well-documented API (free in most cases) with which you can create your own web-based GIS using the different map layers they provide. Still, a lot of data is also available as map files or satellite images. The most popular map storage formats are Shapefile and GeoJSON for vector data, and TIFF and JPEG for raster data.
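Most online map services, OSM included, serve their layers as square Web Mercator tiles addressed by zoom level `z` and tile indices `x`, `y`. The sketch below computes which tile contains a given point using the standard "slippy map" formula. The OpenWeatherMap URL pattern and layer names such as `temp_new` are assumptions to be checked against the provider's current documentation, and `tile_url` is a hypothetical helper.

```python
import math

# Assumed URL pattern for OpenWeatherMap weather layers; verify before use.
OWM_TILE_URL = "https://tile.openweathermap.org/map/{layer}/{z}/{x}/{y}.png?appid={key}"


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point,
    following the standard slippy-map z/x/y tiling scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases such as lon == 180 into the valid range 0..n-1.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(layer, lon, lat, zoom, api_key):
    """Hypothetical helper: URL of the weather tile covering a point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return OWM_TILE_URL.format(layer=layer, z=zoom, x=x, y=y, key=api_key)


print(lonlat_to_tile(76.95, 43.25, 6))
print(tile_url("temp_new", 76.95, 43.25, 6, "YOUR_API_KEY"))
```

Because every provider using this scheme shares the same tile grid, layers from different services (for example an OSM base map and an OWM temperature overlay) line up when stacked in one web GIS.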
One of the most important tasks is to gather data from existing weather stations in poorly covered territories such as Central Asia. In Kazakhstan, for instance, the largest weather station network belongs to the governmental organization Kazgidromet (287 weather stations). Data obtained from this network is suitable for analyzing actual weather conditions at the Earth's surface. Amateur weather stations are also steadily gaining popularity. Such devices are usually less accurate, but they are portable and autonomous. Moreover, a large number of such stations would allow real-time monitoring of ecological and weather conditions within a particular city or region.
The most effective way to present the collected data and the results of its processing and analysis is to create your own GIS that provides map data together with a set of services exposed through a user interface and an API (application programming interface).
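A typical request to such a GIS API is "which stations lie in this area?". The sketch below shows the core of a hypothetical bounding-box endpoint as pure functions over a GeoJSON FeatureCollection. The `minLon,minLat,maxLon,maxLat` parameter format and the function names are assumptions; a real service would wrap this in a web framework and use a spatial index instead of a linear scan.

```python
import json


def features_in_bbox(collection, min_lon, min_lat, max_lon, max_lat):
    """Return a FeatureCollection with only the Point features that fall
    inside the bounding box (edges included)."""
    selected = []
    for feature in collection["features"]:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this simple sketch ignores lines and polygons
        lon, lat = geometry["coordinates"][:2]
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            selected.append(feature)
    return {"type": "FeatureCollection", "features": selected}


def handle_bbox_query(collection, bbox_param):
    """Hypothetical endpoint body: parse 'minLon,minLat,maxLon,maxLat'
    and return the matching features as a JSON string."""
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox_param.split(","))
    return json.dumps(features_in_bbox(collection, min_lon, min_lat, max_lon, max_lat))


# Made-up stations; coordinates are [longitude, latitude] as in GeoJSON.
stations = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [76.95, 43.25]},
     "properties": {"station": "A"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [71.45, 51.17]},
     "properties": {"station": "B"}},
]}
print(handle_bbox_query(stations, "70,40,80,45"))
```

Returning GeoJSON keeps the API directly consumable by common web mapping libraries, so the same endpoint can feed both the user interface and external clients.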
In the next article, we are going to cover methods for collecting and processing non-uniform data. See you then!