Impact of Similarity Measures on Causal Relation Based Feature Selection Method for Clustering Maritime Accident Reports

Santosh Tirunagari, Maria Hanninen, Abhishek Guggilla, Kaarle Stahlberg, Pentti Kujala


Unsupervised document clustering is an automated process in which documents are grouped based on their similarity. In this paper, we propose a new feature selection method based on causal relations to cluster maritime accident reports in an unsupervised manner. We also compare the impact of different similarity measures on the proposed feature selection method. Based on the analysis, we conclude that the proposed feature selection method performs better than the conventional method, which is affected by the curse of dimensionality. The similarity measures also perform better when combined with the proposed feature selection method. In the analysis, we compared the Correlation, Cosine, Spearman, Bray-Curtis, Euclidean, City-block, Squared-Euclidean, Standardized Euclidean, and Chebychev similarity measures. Correlation and Cosine produced the best results, followed by Spearman and Bray-Curtis. The remaining measures did not produce good results with the maritime accident reports used in our analysis. Interestingly, Chi-Square gave good results with the proposed method in our analysis.
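As an illustration of how several of the measures named above differ, the following minimal sketch computes them on toy term-count vectors. It assumes the standard textbook definitions of each measure, expressed as distances (lower means more similar); the abstract does not define them, and the function names and the example vectors `doc_a`, `doc_b`, and `doc_c` are hypothetical and do not come from the accident reports used in the paper.

```python
import math

# Hypothetical term-count vectors for three documents over four features.
doc_a = [2, 0, 1, 3]
doc_b = [4, 0, 2, 6]   # same term proportions as doc_a, twice the length
doc_c = [0, 5, 0, 1]   # mostly different vocabulary


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def correlation_distance(u, v):
    """Cosine distance after subtracting each vector's own mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])


def bray_curtis_distance(u, v):
    """Sum of absolute differences divided by sum of absolute sums."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cityblock_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev_distance(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    measures = [cosine_distance, correlation_distance, bray_curtis_distance,
                euclidean_distance, cityblock_distance, chebyshev_distance]
    for measure in measures:
        print(measure.__name__,
              round(measure(doc_a, doc_b), 4),
              round(measure(doc_a, doc_c), 4))
```

In this toy case, the angle-based measures (cosine and correlation) treat `doc_a` and `doc_b` as essentially identical because they ignore document length, while the magnitude-based measures (Euclidean, city-block, Chebychev) report a nonzero distance between them. This is only one possible intuition for the ranking reported above, not an explanation given by the authors.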

© 2017 International Journal of Global Research in Computer Science (JGRCS)