Abstract
Feature selection (FS) is used in almost all data mining applications and brings benefits such as reduced computation and storage costs. Most current feature selection algorithms work only in a centralized manner, an approach that does not scale effectively to high-dimensional datasets. In this paper, we propose a distributed version of the Minimum Redundancy Maximum Relevance (mRMR) algorithm. The proposed algorithm solves the problem in six steps: it partitions the dataset horizontally into subsets, selects features while eliminating redundant ones, and finally merges the subsets into a single set. We evaluate the performance of the proposed method on different datasets. The results show that the suggested method can improve classification accuracy and reduce runtime.
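The partition, select, and merge idea described above can be illustrated with a minimal sketch. It is not the paper's six-step procedure, whose details the abstract does not give. It assumes discrete feature values, the difference form of the mRMR criterion (relevance minus mean redundancy), a stride-based horizontal split, and a majority-vote merge. The function names `mutual_information`, `mrmr`, and `distributed_mrmr` are hypothetical, chosen here for illustration only.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(rows, labels, k):
    """Greedy mRMR: pick the feature maximizing relevance minus mean redundancy."""
    n_features = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_features)]
    relevance = [mutual_information(c, labels) for c in cols]
    selected = []
    while len(selected) < min(k, n_features):
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(cols[j], cols[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        candidates = (j for j in range(n_features) if j not in selected)
        selected.append(max(candidates, key=score))
    return selected


def distributed_mrmr(rows, labels, k, n_parts):
    """Assumed scheme: split rows horizontally, run mRMR per part, merge by votes."""
    votes = Counter()
    for p in range(n_parts):
        part_rows, part_labels = rows[p::n_parts], labels[p::n_parts]
        if not part_rows:
            continue  # more parts than rows; skip the empty partition
        votes.update(mrmr(part_rows, part_labels, k))
    # Merge step (an assumption): keep the k most frequently selected features,
    # breaking ties in favor of the lower feature index.
    ranked = sorted(votes, key=lambda j: (-votes[j], j))
    return sorted(ranked[:k])
```

In this sketch each partition could be processed by a separate worker, since the loop iterations share no state apart from the vote counter. A duplicated feature is rejected inside each partition because its redundancy with an already selected feature cancels its relevance.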