IMPUTATION METHOD FOR SURFACE ULTRAVIOLET IRRADIANCE BASED ON FEASIBLE CLOUD INFORMATION AND MACHINE LEARNING
20230127492 · 2023-04-27
Abstract
An imputation method for surface ultraviolet irradiance based on feasible cloud information and machine learning includes: establishing a deep learning model, wherein the deep learning model is designed to be a two-layered stacking ensemble learning model; constructing a first layer of the deep learning model as a combination of multiple fundamental machine learning models; constructing a second layer of the deep learning model as a Lasso model, which integrates the output of the first layer to obtain a final retrieval result; matching the surface ultraviolet irradiance with input features comprising cloud and meteorological information according to the temporal and spatial variables; establishing a statistical relationship between the surface ultraviolet irradiance and the input features by training the deep learning model; and estimating the surface ultraviolet irradiance based on the trained deep learning model in regions with missing satellite observations of the surface ultraviolet irradiance.
Claims
1. An imputation method for surface ultraviolet irradiance based on feasible cloud information and machine learning, the method comprising: A) establishing a deep learning model, wherein the deep learning model is designed to be a two-layered stacking ensemble learning model; constructing a first layer of the deep learning model as a combination of multiple fundamental machine learning models; constructing a second layer of the deep learning model as a Lasso model, which integrates the output of the first layer to obtain a final retrieval result; B) matching the surface ultraviolet irradiance with input features comprising cloud and meteorological information according to date, latitude and longitude; establishing a statistical relationship between the surface ultraviolet irradiance and the input features by training the deep learning model; C) estimating the surface ultraviolet irradiance based on the trained deep learning model in regions with missing satellite observations of the surface ultraviolet irradiance; and D) inputting the cloud and meteorological information to produce a UV index.
2. The method of claim 1, wherein the multiple fundamental machine learning models in A) comprise at least a Random Forest model.
3. The method of claim 1, wherein the multiple fundamental machine learning models comprise Random Forest, XGBoost, LightGBM, and CatBoost.
4. The method of claim 1, wherein step B) comprises: B1) collecting the surface ultraviolet irradiance from the satellite product OMUVBd with a temporal range of 1 year and a spatial range covering more than 700 km×700 km, comprising the surface ultraviolet irradiance at specific wavelengths together with latitude and longitude, where: the surface ultraviolet irradiance is denoted as UV; the date is denoted as YY/MM/DD, with YY as the year, MM as the month and DD as the day; latitude is denoted as LAT; longitude is denoted as LON; B2) collecting the input variables comprising meteorological information and cloud information from ERA5 products, wherein: the cloud information variables comprise cloud coverage (TCC), total cloud ice water content (TCIW) and total cloud liquid water content (TCLW); the meteorological information variables comprise: surface temperature (ST), dewpoint temperature (DT), surface pressure (SP), U-direction wind speed (UW), V-direction wind speed (VW), boundary layer height (BLH), relative humidity (RH), total precipitation (TP) and total evaporation (TE); B3) constructing a data table by matching the satellite ultraviolet irradiance, meteorological information, cloud information and auxiliary information by the date, latitude and longitude; and B4) setting up the deep learning model (DL) in regions where satellite surface ultraviolet irradiance is available, with input features comprising date, latitude and longitude, satellite ultraviolet irradiance, meteorological information and cloud information; and training the deep learning model with a model target of UV.
5. The method of claim 1, wherein step C) comprises: C1) reading the meteorological information, the cloud information and the date with global coverage from ERA5 products; and C2) acquiring the full-coverage global surface ultraviolet irradiance by inputting the cloud information, meteorological information, temporal information and geographical information into the deep learning model.
6. The method of claim 1, wherein step D) comprises: D1) applying the geolocation information of Hong Kong and finding the corresponding TCC, TCIW, TCLW, ST, DT, SP, UW, VW, BLH, RH, TP, TE, YY/MM/DD, LAT and LON of the specific geolocation; and D2) obtaining the UV index by inputting the variables listed in D1) into the trained deep learning model (DL).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] Some more features, purposes, and advantages of the disclosure will become more apparent by reading a detailed description of the following figures as non-restrictive embodiments of the disclosure.
DETAILED DESCRIPTION
[0035] The embodiments of the disclosure are described in detail as follows. The embodiments are implemented on the premise of the technical scheme of the disclosure, and the detailed embodiments and the specific operation process are given. It should be pointed out that a person skilled in the art can make several modifications and improvements without departing from the idea of the present disclosure, and these fall within the protection scope of the present disclosure.
[0036] 1. Implementation Goals
[0037] This embodiment takes as an example the application of the UV index, a sub-product of the satellite UV product. The satellite-observed UV index is used as a means of monitoring the risk of skin cancer. Specifically, a UV index higher than 10 indicates dangerous exposure to UV and a high risk of skin cancer, whereas a UV index lower than 2 suggests that no protective measures are necessary. However, an impairment of the satellite sensor known as the “row anomaly” degrades data coverage. Meanwhile, clouds reduce the accuracy of UV index retrieval algorithms. Under these circumstances, the surface ultraviolet irradiance imputation method proposed in this disclosure is used to provide data in the missing regions, thus enabling an overall evaluation of the risk of skin cancer in Hong Kong in 2018.
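The two thresholds stated above can be written as a small helper. This is only an illustrative sketch: the function name and the label "intermediate" for values between 2 and 10 are assumptions, since the disclosure names only the two extremes.

```python
def uv_risk_level(uv_index):
    """Classify a UV index using the two thresholds given in the text."""
    if uv_index > 10:
        return "dangerous"             # high risk of skin cancer
    if uv_index < 2:
        return "no protection needed"  # no protective measures necessary
    # Assumption: the source does not name the range between 2 and 10.
    return "intermediate"


levels = [uv_risk_level(v) for v in (1.5, 6.0, 11.2)]
```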
[0038] 2. Data Selection
[0039] The surface ultraviolet irradiance product OMUVBd from the OMI sensor onboard the Aura satellite provides ultraviolet irradiance at a series of wavelengths, such as 305 nm, 310 nm, 324 nm and 380 nm, as well as the UV index. It has a spatial resolution of 1°×1° and a daily temporal resolution. In this embodiment, the ultraviolet irradiance at 380 nm is selected. To achieve high-resolution, full-coverage surface ultraviolet irradiance estimates over mainland China, the OMUVBd products for 2018 are selected as the foundation. Meanwhile, ERA5 data products from ECMWF with the same spatial and temporal range are collected along with the satellite products. The cloud information is obtained from ECMWF as well. These datasets are the raw material from which the statistical model relates the available surface ultraviolet irradiances to the input variables. To depict the UV index in Hong Kong, the geolocation of the city is taken as 114.17° E, 22.32° N.
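To look up the variables for Hong Kong, the city's coordinates have to be mapped to a cell of the gridded products. The following is a minimal sketch that snaps a coordinate to the nearest grid node. It assumes that grid nodes sit at integer multiples of the resolution, which the disclosure does not specify, and the function name is hypothetical.

```python
def snap_to_grid(value, resolution):
    """Return the grid node nearest to `value`.

    Assumption: grid nodes lie at integer multiples of `resolution`.
    """
    return round(value / resolution) * resolution


HONG_KONG_LON, HONG_KONG_LAT = 114.17, 22.32  # from the text

# Nearest node on a 0.25-degree grid.
grid_lon = snap_to_grid(HONG_KONG_LON, 0.25)
grid_lat = snap_to_grid(HONG_KONG_LAT, 0.25)
```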
[0040] 3. Implementation Process
[0041] A) Setup of deep learning model (DL)
[0042] 1) Loading Python 3.8 and installing the modules mlxtend, catboost, xgboost, lightgbm and sklearn.
[0043] 2) Constructing the framework of a Stacking ensemble learner; applying four models in the first layer of the Stacking ensemble learner, namely Random Forest, XGBoost, LightGBM and CatBoost; and applying Lasso as a generalized linear regression model in the second layer of the Stacking ensemble learner.
[0044] 3) Setting the number of decision trees to 100 in the Random Forest model; and determining the parameters of the other machine learning models by the grid search method.
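The two-layer structure described in steps 1) to 3) can be sketched as follows. This is not the embodiment's code: it uses scikit-learn's `StackingRegressor` instead of mlxtend, and a single `GradientBoostingRegressor` stands in for XGBoost, LightGBM and CatBoost so that the example needs only scikit-learn and NumPy. The Lasso `alpha`, the number of folds and the synthetic data are arbitrary illustrative choices, and the grid search is omitted.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Lasso


def build_stacking_model(random_state=0):
    """Two-layer stacking regressor: tree ensembles feeding a Lasso."""
    first_layer = [
        # 100 decision trees, as stated in step 3).
        ("rf", RandomForestRegressor(n_estimators=100,
                                     random_state=random_state)),
        # Assumption: stand-in for the XGBoost / LightGBM / CatBoost models.
        ("gb", GradientBoostingRegressor(random_state=random_state)),
    ]
    # Second layer: Lasso combines the first-layer predictions.
    return StackingRegressor(estimators=first_layer,
                             final_estimator=Lasso(alpha=0.01), cv=5)


# Synthetic data, for illustration only (not UV observations).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.05, size=200)

model = build_stacking_model()
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
```

In a fuller setup, the stand-in could be replaced by the regressors from the three boosting libraries, each wrapped in a grid search over its own parameters.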
[0045] B) Training of deep learning models for existing satellite observations
[0046] 1) collecting the satellite products OMUVBd, the reanalysis products ERA5 and USGS-SRTM for the year 2018 with spatial coverage of mainland China.
[0047] 2) constructing a data table to organize the available satellite products by matching the satellite ultraviolet irradiance, meteorological information, cloud information and auxiliary information according to the date, latitude and longitude; and forming the final data table with a spatial resolution of 0.25° and a daily temporal resolution.
[0048] 3) training the deep learning model (DL) with input features comprising YY, MM, DD, LAT, LON, TCC, TCIW, TCLW, ST, DT, SP, UW, VW, BLH, RH, TP and TE, and with UV as the model target, on the records in the data table where the satellite-observed surface ultraviolet irradiance is available; and saving the deep learning model.
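The matching in step 2) amounts to a join on the key (date, latitude, longitude). A minimal sketch in plain Python is given below. It assumes both products have already been brought onto the same grid so that coordinates compare exactly. The record layout, the function name and all numeric values are made up for illustration.

```python
from datetime import date


def build_data_table(uv_records, era5_records):
    """Join UV and reanalysis records on the (date, lat, lon) key.

    Every reanalysis record yields one row; "uv" is None where the
    satellite product has no observation for that key.
    """
    uv_by_key = {(r["date"], r["lat"], r["lon"]): r["uv"] for r in uv_records}
    table = []
    for rec in era5_records:
        row = dict(rec)
        row["uv"] = uv_by_key.get((rec["date"], rec["lat"], rec["lon"]))
        table.append(row)
    return table


# Illustrative values only, not real observations.
uv_records = [
    {"date": date(2018, 7, 1), "lat": 22.25, "lon": 114.25, "uv": 0.61},
]
era5_records = [
    {"date": date(2018, 7, 1), "lat": 22.25, "lon": 114.25, "tcc": 0.3},
    {"date": date(2018, 7, 2), "lat": 22.25, "lon": 114.25, "tcc": 0.9},
]

table = build_data_table(uv_records, era5_records)
# Only rows with an observed UV value are used for training in step 3).
training_rows = [row for row in table if row["uv"] is not None]
```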
[0049] C) Application of the model to impute the missing satellite observations
[0050] 1) preparing the model input YY*, MM*, DD*, LAT*, LON*, TCC*, TCIW*, TCLW*, ST*, DT*, SP*, UW*, VW*, BLH*, RH*, TP* and TE* from the data table, which should have global coverage and a daily temporal resolution.
[0051] 2) running the trained model DL on the input dataset to obtain the spatially continuous surface ultraviolet irradiances UV*.
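The application step can be sketched as filling each record whose UV value is missing with a model prediction. The sketch below keeps observed values unchanged, which is an assumption about how the output is assembled. `ToyCloudModel` is a hypothetical stand-in with a `predict` method, not the trained DL model, and the single feature is chosen only to keep the example short.

```python
def impute_uv(table, model, feature_names):
    """Return a copy of `table` with missing "uv" values predicted."""
    filled = []
    for row in table:
        new_row = dict(row)
        missing = new_row["uv"] is None
        if missing:
            features = [new_row[name] for name in feature_names]
            new_row["uv"] = model.predict([features])[0]
        new_row["imputed"] = missing  # flag predicted values
        filled.append(new_row)
    return filled


class ToyCloudModel:
    """Hypothetical stand-in for the trained model (illustration only)."""

    def predict(self, rows):
        # Toy rule: more cloud cover gives lower irradiance.
        return [1.0 - 0.5 * tcc for (tcc,) in rows]


table = [
    {"tcc": 0.2, "uv": 0.8},   # observed by the satellite
    {"tcc": 0.6, "uv": None},  # gap to be imputed
]
result = impute_uv(table, ToyCloudModel(), ["tcc"])
```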
[0052] 4. Method Evaluation
[0053] Cross-validation is conducted to evaluate the practicability, precision and robustness of this disclosure. In the validation, the collected records containing satellite surface ultraviolet irradiance observations are randomly divided into a 90% training set and a 10% testing set. The DL model is trained on the training set and tested on the testing set. The validation is repeated 10 times, so that every record is used in the testing set exactly once.
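The procedure above corresponds to 10-fold cross-validation. A minimal sketch of the index splitting, using only the standard library, might look like this. The function name and the seed are illustrative.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)         # random assignment to folds
    folds = [indices[i::k] for i in range(k)]    # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(100, k=10))
```

With 100 records, each round trains on 90 records and tests on the remaining 10.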
[0054] Validation under different schemes highlights the importance of the cloud information in this disclosure. Three schemes are used. The first is the sample-based cross-validation described above. The second is the stripe-based cross-validation, in which data records are divided by longitude into a 50% training set and a 50% testing set: records within the longitude ranges 75°-85°, 95°-105° and 115°-125° are assigned to the training set, and the remaining records are assigned to the testing set. The third is the week-based validation, in which data records from odd-numbered weeks are assigned to the training set and the rest are assigned to the testing set.
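The stripe-based and week-based assignments can be sketched as two small functions. Two details are assumptions, because the text does not state them: the longitude bands are treated as half-open intervals, and weeks are counted as ISO week numbers.

```python
from datetime import date

# Longitude bands (degrees east) assigned to the training set in the text.
TRAIN_LON_BANDS = [(75.0, 85.0), (95.0, 105.0), (115.0, 125.0)]


def stripe_split(lon):
    """Return "train" if `lon` lies in a training band, else "test"."""
    # Assumption: bands are half-open, lower bound inclusive.
    in_band = any(lo <= lon < hi for lo, hi in TRAIN_LON_BANDS)
    return "train" if in_band else "test"


def week_split(day):
    """Return "train" for odd-numbered weeks, else "test"."""
    # Assumption: ISO week numbering.
    week = day.isocalendar()[1]
    return "train" if week % 2 == 1 else "test"


hong_kong_set = stripe_split(114.17)
first_week_set = week_split(date(2018, 1, 1))
```

Under these assumptions, the Hong Kong longitude of 114.17° E falls just outside the 115°-125° band and therefore lands in the testing set of the stripe-based scheme.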
[0055] The embodiment aims to monitor and evaluate the risk of skin cancer via an indicator of human exposure to surface UV irradiance, namely, the UV index. The embodiment illustrates the use of this disclosure in Hong Kong in 2018.
[0056] The specific embodiments described herein are merely illustrative of the spirit of the disclosure. Those skilled in this field to which this disclosure pertains can make various modifications or additions to the described specific embodiments or substitute in similar manners but will not deviate from the spirit of the present disclosure or go beyond the definition of the appended claims.