Use Python to automatically download and parse Himawari-8 ("Sunflower 8") satellite AOD data (converting it to TIFF)

With my summer internship coming to an end, I am writing up some recent results, both as a record and as a reference for similar problems in the future.

To begin with, we need the GDAL package from osgeo, which can be downloaded as a wheel from https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal and installed. I use "GDAL-3.1.2-cp37-cp37m-win_amd64.whl" here: 3.1.2 is the version, and cp37 means it targets Python 3.7; choose the wheel that matches your own environment.
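
After installing the wheel (for example with "pip install GDAL-3.1.2-cp37-cp37m-win_amd64.whl"), a quick sanity check confirms that GDAL imports correctly and that the GeoTIFF driver used later is available; this is just a minimal sketch:

from osgeo import gdal

# Print the GDAL release name; it should match the wheel you installed (e.g. 3.1.2)
print(gdal.VersionInfo("RELEASE_NAME"))
# The GeoTIFF driver is what the parsing step below relies on
print(gdal.GetDriverByName("GTiff") is not None)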

Although the whole program is written to parse Himawari-8 AOD products, the approach extends to other Himawari-8 products and, in principle, to any NetCDF-format data. Note, however, that Himawari-8 AOD product data is only three-dimensional (length * width * variable); if you encounter more complex data, such as data containing both altitude and time dimensions, you will need to make some changes.
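
If you are unsure what a given NetCDF file contains, it is worth printing its dimensions and variables before adapting the code; a short sketch (the file name is just an example):

import netCDF4 as nc

ds = nc.Dataset("H08_20200802_0000_1DARP030_FLDK.02401_02401.nc")
print(ds.dimensions)  # for AOD products: latitude and longitude, 2401 each
for name, var in ds.variables.items():
    print(name, var.dimensions, var.shape)
ds.close()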

1. Composition and function introduction

The program consists of three parts: "Download_AOD.py", "AOD_NetCDF_to_GeoTIFF.py", and "Main.py".

"Download_AOD.py" is mainly used for downloading. The download function has been written in the previous blog, including the download of AOD daily average data and hourly average data. When downloading daily average data, the AOD daily average data of yesterday will be downloaded by default. When downloading the AOD hourly average data, the hourly average data from 9:00 to 17:00 Beijing time on the day the program runs will be downloaded by default. If the program is not updated at the time of running, the operation will be automatically terminated after downloading the latest data. The program stores the file being downloaded locally in ". temp" format. When the current file is downloaded, it will be stored in ". nc" format. Before each download, the program will first delete the ". temp" file that was not downloaded during the last program execution, and check whether there is data to be downloaded in the storage directory. If so, skip, otherwise the download will start.

"AOD_NetCDF_to_GeoTIFF.py" is used to parse the downloaded AOD data from NetCDF format to TIFF format. For daily and hourly mean data, the parsing field name is "AOT_L2_Mean". According to the official documents, the grid values of NoData are set to - 32768, and the program modifies all these values to 0 during parsing. You can also set the values of NoData to other specific values according to later needs.

"Main.py" is used to run.

2. Storage path and naming rules

Assuming that the storage path entered when the program runs is footpath, then (a short sketch that creates this layout follows the list):

  • Raw hourly data is stored in: footpath/YYYY-MM-DD/AOD_Hourly_Download
  • Parsed hourly data is stored in: footpath/YYYY-MM-DD/AOD_Hourly_Analysis
  • Raw daily data is stored in: footpath/YYYY-MM-DD/AOD_Daily_Download
  • Parsed daily data is stored in: footpath/YYYY-MM-DD/AOD_Daily_Analysis
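
This layout can be created with a few lines of standard-library code; a sketch, where the footpath value is an arbitrary example:

import os
import time

footpath = "D:/AOD"  # example root path entered by the user
day_dir = os.path.join(footpath, time.strftime("%Y-%m-%d"))
for sub in ("AOD_Hourly_Download", "AOD_Hourly_Analysis",
            "AOD_Daily_Download", "AOD_Daily_Analysis"):
    os.makedirs(os.path.join(day_dir, sub), exist_ok=True)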

For hourly mean data files, the original file name matches the official naming, in the format "H08_YYYYMMDD_hh00_1HARP030_FLDK.02401_02401.nc". The parsed hourly mean file is named "YYYYMMDD_hh00.tif".

For daily mean data files, the original file name likewise matches the official naming, in the format "H08_YYYYMMDD_0000_1DARP030_FLDK.02401_02401.nc". The parsed daily mean file is named "YYYYMMDD_0000.tif".

Note: in all the folder and file names above, YYYY is a four-digit year, MM a two-digit month, DD a two-digit day, and hh a two-digit hour. The times in file names are UTC; add 8 hours to convert to Beijing time.
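
For example, the timestamp embedded in a file name can be converted like this (the file name is an example):

from datetime import datetime, timedelta

name = "H08_20200819_0700_1HARP030_FLDK.02401_02401.nc"
utc = datetime.strptime(name[4:17], "%Y%m%d_%H%M")  # "20200819_0700", UTC
beijing = utc + timedelta(hours=8)  # UTC+8
print(beijing.strftime("%Y-%m-%d %H:%M"))  # 2020-08-19 15:00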

3. Code

3.1 Code of the download part - "Download_AOD.py"

import os
import ftplib
import time


# This function is used to convert the date from an integer to the string required by the FTP path
def getDateStr(yearNum, monNum, dayNum):
    # Four digit year
    yearStr = str(yearNum)

    # Double digit month
    if monNum < 10:
        monStr = "0" + str(monNum)
    else:
        monStr = str(monNum)

    # Double digit days
    if dayNum < 10:
        dayStr = "0" + str(dayNum)
    else:
        dayStr = str(dayNum)

    return yearStr, monStr, dayStr


# This function is used to get the date number of the previous day
def getYesterday(year, month, day):
    if day == 1:

        if month == 1:
            year -= 1
            month = 12
            day = 31

        elif month == 2 or month == 4 or month == 6 or month == 8 or month == 9 or month == 11:
            month -= 1
            day = 31

        elif month == 5 or month == 7 or month == 10 or month == 12:
            month -= 1
            day = 30

        elif month == 3:
            # Leap year: divisible by 400, or divisible by 4 but not by 100
            if year % 400 == 0 or (year % 4 == 0 and year % 100 != 0):
                day = 29
            else:
                day = 28
            month -= 1
    else:
        day -= 1

    return year, month, day


# Check whether the file name ends with any of the given suffixes
def suffix(file, *suffixName):
    return file.endswith(tuple(suffixName))


# Delete any files with the .temp extension in the given directory
def deleteFile(fileDir):
    targetDir = fileDir
    for file in os.listdir(targetDir):
        targetFile = os.path.join(targetDir, file)
        if suffix(file, '.temp'):
            os.remove(targetFile)


class myFTP:
    ftp = ftplib.FTP()

    # Connect to the FTP server. host is the server address and port is the port, 21 by default
    def __init__(self, host, port=21, YesdayNum=1):
        self.ftp.connect(host, port)
        self._dayNum = YesdayNum

    # Log in to the FTP connection. User is the user name and password is the password
    def Login(self, user, password):
        self.ftp.login(user, password)
        print(self.ftp.welcome)  # Display login information

    # Download a single file. LocalFile represents the local storage path and file name, and RemoteFile is the FTP path and file name
    def DownLoadFile(self, LocalFile, RemoteFile):
        bufSize = 102400

        file_handler = open(LocalFile, 'wb')

        # Retrieve the file from the server and write it to the local file
        self.ftp.retrbinary('RETR ' + RemoteFile, file_handler.write, bufSize)
        self.ftp.set_debuglevel(0)
        file_handler.close()
        return True

    # Download all files in a directory. LocalDir is the local storage path and RemoteDir is the FTP path
    def DownLoadFileTree(self, LocalDir, RemoteDir, choice):
        # print("remoteDir:", RemoteDir)
        # If the path does not exist locally, it is created
        if not os.path.exists(LocalDir):
            os.makedirs(LocalDir)

        # Get all the file names under the FTP path and store them in a list
        # (the listing returned by the server appears to be unordered)
        self.ftp.cwd(RemoteDir)
        RemoteNames = self.ftp.nlst()
        RemoteNames.reverse()

        # print("RemoteNames: ", RemoteNames)
        for file in RemoteNames:
            # Download to a temporary .temp file first, then rename it to .nc after the download finishes
            # This prevents a file left over from an interrupted download from being mistaken for a complete one on the next run
            Local = os.path.join(LocalDir, file[0:-3] + ".temp")
            LocalNew = os.path.join(LocalDir, file)

            '''
            For hourly files, only download the UTC 1:00 to 10:00 files (9:00 to 18:00 Beijing time)
            The downloaded files must be in .nc format
            If a file already exists, skip the download
            '''
            # Example of hourly data naming format: H08_20200819_0700_1HARP030_FLDK.02401_02401.nc
            if choice == 1:
                if int(file[13:15]) >= 1 and int(file[13:15]) <= 10:
                    if not os.path.exists(LocalNew):
                        print("Downloading the file of %s" % file)
                        self.DownLoadFile(Local, file)
                        os.rename(Local, LocalNew)
                        print("The download of the file of %s has finished\n" % file)
                    elif os.path.exists(LocalNew):
                        print("The file of %s has already existed!\n" % file)
                else:
                    pass

            # Example of day data naming format: H08_20200802_0000_1DARP030_FLDK.02401_02401.nc
            elif choice == 2:
                if int(file[10:12]) == self._dayNum and not os.path.exists(LocalNew):
                    print("Downloading the file of %s" % file)
                    self.DownLoadFile(Local, file)
                    os.rename(Local, LocalNew)
                    print("The download of the file of %s has finished\n" % file)
                elif int(file[10:12]) == self._dayNum and os.path.exists(LocalNew):
                    print("The file of %s has already existed!" % file)

        self.ftp.cwd("..")
        return

    def close(self):
        self.ftp.quit()

3.2 Code of the parsing part - "AOD_NetCDF_to_GeoTIFF.py"

# Module import
import numpy as np
import netCDF4 as nc
from osgeo import gdal, osr, ogr
import os
import glob


def NC_to_tiffs(data, Output_folder):
    # Read the basic information
    nc_data_obj = nc.Dataset(data)
    Lon = nc_data_obj.variables["longitude"][:]
    Lat = nc_data_obj.variables["latitude"][:]

    # When a variable is read, values are automatically unscaled using the scale factor, but NoData cells hold -32768
    # The variable name is "AOT_L2_Mean" for both daily and hourly data
    AOD_arr = np.asarray(nc_data_obj.variables["AOT_L2_Mean"])  # Read the AOD data as an array

    # Change all NoData values (i.e. -32768) to 0
    AOD_arr[AOD_arr == -32768] = 0.0

    # The four corner coordinates (extent) of the image
    LonMin, LatMax, LonMax, LatMin = [Lon.min(), Lat.max(), Lon.max(), Lat.min()]

    # Resolution calculation; this could in fact be hardcoded, since the grid is always 2401 * 2401
    N_Lat = len(Lat)
    N_Lon = len(Lon)
    Lon_Res = (LonMax - LonMin) / (float(N_Lon) - 1)
    Lat_Res = (LatMax - LatMin) / (float(N_Lat) - 1)

    # This line is required: it adds a leading band dimension so the loop below iterates over 2-D slices
    AOD_arr = np.array([AOD_arr])

    for i in range(len(AOD_arr)):
        # Create the .tif file
        driver = gdal.GetDriverByName("GTiff")
        TIFF_name = os.path.basename(data)
        out_tif_name = os.path.join(Output_folder, TIFF_name.split("_")[1] + "_" + TIFF_name.split("_")[2] + ".tif")
        out_tif = driver.Create(out_tif_name, N_Lon, N_Lat, 1, gdal.GDT_Float32)
        out_tif = driver.Create(out_tif_name, N_Lon, N_Lat, 1, gdal.GDT_Float32)

        # Set the display range of the image
        # -Lat_Res must be negative
        geotransform = (LonMin, Lon_Res, 0, LatMax, 0, -Lat_Res)
        out_tif.SetGeoTransform(geotransform)

        # Obtain the geographic coordinate system information, which is used to select the required geographic coordinate system
        srs = osr.SpatialReference()
        srs.ImportFromEPSG(4326)  # Define the output coordinate system as "WGS 84", AUTHORITY["EPSG","4326"]
        out_tif.SetProjection(srs.ExportToWkt())  # Assign projection information to the new layer

        # Data writing
        out_tif.GetRasterBand(1).WriteArray(AOD_arr[i])  # Write data to memory, not to hard disk at this time
        out_tif.FlushCache()  # Write data to hard disk
        out_tif = None  # Note: the tif dataset must be released so GDAL closes the file

    nc_data_obj.close()  # Close the NetCDF file
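
To verify a converted file, open it with GDAL and check the size and georeferencing; a quick sketch (the file name is an example output):

from osgeo import gdal

ds = gdal.Open("20200802_0000.tif")  # example output file
print(ds.RasterXSize, ds.RasterYSize)  # expect 2401 2401
print(ds.GetGeoTransform())  # (LonMin, Lon_Res, 0, LatMax, 0, -Lat_Res)
print(ds.GetProjection()[:60])  # the WKT should mention WGS 84
ds = None  # release the dataset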

3.3 Execution script - "Main.py"

import time
import os
import glob
import Download_AOD as Daod
import AOD_NetCDF_to_GeoTIFF as trans

dt = time.localtime(time.time())
_yearNum = dt.tm_year
_monNum = dt.tm_mon
_dayNum = dt.tm_mday
_yearStr = ""
_monStr = ""
_dayStr = ""

if __name__ == "__main__":
    # Pass in the server address
    # YesdayNum is yesterday's day of month, used when filtering daily files
    # (computed with getYesterday so that the first day of a month is handled correctly)
    ftp = Daod.myFTP(host='ftp.ptree.jaxa.jp', YesdayNum=Daod.getYesterday(_yearNum, _monNum, _dayNum)[2])

    # You can register and modify your user name and password by yourself
    ftp.Login('Write your account here', 'Write your password here')

    # Download files from the FTP path ftp_filePath to the local path dst_filePath
    dst_filePath = input("Please enter the path to store the file:")
    dst_filePath = dst_filePath + "/" + time.strftime("%Y-%m-%d")  # %Y-%m-%d instead of %F, which is not supported on Windows
    if not os.path.exists(dst_filePath):
        os.makedirs(dst_filePath)

    '''
    For both hourly and daily data the base path is: /pub/himawari/L3/ARP/030
    Daily data example path:  /pub/himawari/L3/ARP/030/202008/daily/
    Hourly data example path: /pub/himawari/L3/ARP/030/202008/19/
    '''
    print("Please select the data to download:")
    _choice = int(input("1. AOD hourly data (9:00 to 18:00 Beijing time today)  2. AOD daily mean data (yesterday's)\n"))

    # Download_Path is used to store the downloaded raw data
    Download_Path = ""
    # Analysis_Path is a folder used to store processed data (that is, data converted to TIFF)
    Analysis_Path = ""

    # If AOD hour data is selected
    if _choice == 1:
        _yearStr, _monStr, _dayStr = Daod.getDateStr(_yearNum, _monNum, _dayNum)
        ftp_filePath = "/pub/himawari/L3/ARP/030" + "/" + _yearStr + _monStr + "/" + _dayStr + "/"

        Download_Path = dst_filePath + "/AOD_Hourly_Download"
        if not os.path.exists(Download_Path):
            os.makedirs(Download_Path)
        Daod.deleteFile(Download_Path)  # Delete the temporary files in the storage path (that is, the complete files were not downloaded last time)

        Analysis_Path = dst_filePath + "/AOD_Hourly_Analysis"
        if not os.path.exists(Analysis_Path):
            os.makedirs(Analysis_Path)

        ftp.DownLoadFileTree(Download_Path, ftp_filePath, _choice)

    # If AOD daily data is selected (yesterday's)
    elif _choice == 2:
        _yearNum, _monNum, _dayNum = Daod.getYesterday(_yearNum, _monNum, _dayNum)
        _yearStr, _monStr, _dayStr = Daod.getDateStr(_yearNum, _monNum, _dayNum)
        ftp_filePath = "/pub/himawari/L3/ARP/030" + "/" + _yearStr + _monStr + "/" + "daily" + "/"

        Download_Path = dst_filePath + "/AOD_Daily_Download"
        if not os.path.exists(Download_Path):
            os.makedirs(Download_Path)
        Daod.deleteFile(Download_Path)  # Delete the temporary files in the storage path (that is, the complete files were not downloaded last time)

        Analysis_Path = dst_filePath + "/AOD_Daily_Analysis"
        if not os.path.exists(Analysis_Path):
            os.makedirs(Analysis_Path)

        ftp.DownLoadFileTree(Download_Path, ftp_filePath, _choice)

    else:
        print("Wrong selection!")

    # End of download
    ftp.close()
    print("Download complete!")

    # Let's start data processing
    # Read all of the downloaded nc files
    data_list = glob.glob(os.path.join(Download_Path, "*.nc"))

    # The for loop parses every file
    for data in data_list:
        trans.NC_to_tiffs(data, Analysis_Path)
        print(data + " -----converted to tif successfully")

    print("----End of conversion----")
