Landslides are a type of natural geohazard that interferes with many economic and social activities and causes serious harm to human life. They rank among the major disasters, threatening life, property, and the environment, so early prediction of landslide-prone areas is vital. A variety of causative factors, such as glacier melt, excessive rainfall, mining, volcanic activity, active faults, earthquakes, logging, erosion, urbanization, construction, and other human activities, can trigger landslides. Identifying the factors that directly influence slide events is therefore in high demand. Several topographical, geological, and hydrological datasets (e.g., slope, aspect, geology, terrain roughness, vegetation index, distance to stream, distance to road, distance to fault, land use, precipitation, profile curvature, plan curvature) are considered effective conditioning factors. However, the importance of each factor differs from one study to another. This study investigates the effectiveness of four sets of landslide conditioning variables. Fourteen landslide conditioning variables were considered and divided into four groups: G1, G2, G3, and G4. Three machine learning algorithms, namely Random Forest (RF), Naive Bayes (NB), and Boosted Logistic Regression (LogitBoost), were trained on each dataset to determine which set is most suitable for landslide susceptibility prediction. In total, 227 landslide inventory records of the study area were used, with 70% for training and 30% for testing.
To this end, the present research had two main objectives: 1) To investigate the effectiveness of 14 landslide conditioning factors (altitude, slope, aspect, total curvature, profile curvature, plan curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), Terrain Roughness Index (TRI), distance to fault, distance to road, distance to stream, land use, and geology) by determining the most important factors using the variance inflation factor (VIF), Pearson’s correlation, and Chi-square techniques. Accordingly, four datasets were defined: the first included all 14 conditioning factors; the second included the Digital Elevation Model (DEM) derivatives (morphometric factors); the third was based on only 5 factors, namely lithology, land use, distance to stream, distance to road, and distance to fault; and the last included 8 factors selected using factor analysis and optimization. 2) To evaluate the sensitivity of each modeling technique (NB, RF, and LogitBoost) to the different conditioning factors using the area under the curve (AUC). Ultimately, RF with the optimized variables (G4) performed best, with an AUC of 0.940, followed by LogitBoost (0.898) and NB (0.864).
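As an illustration of the evaluation metric, the AUC can be read as the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location (the Mann-Whitney formulation). The following is a minimal sketch under that definition, not the implementation used in this study; the function name `auc_from_scores` and the toy labels and scores are hypothetical and are not drawn from the landslide inventory.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    labels: iterable of 1 (landslide) or 0 (non-landslide)
    scores: iterable of predicted susceptibility scores, same length
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three landslide and three non-landslide cells.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(round(toy_auc, 3))  # 0.889 (8 of 9 positive/negative pairs ordered correctly)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a test set of this size; a rank-based computation would be preferable for larger inventories.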