چکیده
|
Due to the increasing growth of data, many methods are proposed to extract useful data and remove noisy data. Instance selection is one of these methods which selects some instances of a data set and removes others. This paper proposes a new instance selection algorithm based on ReliefF, which is a feature selection algorithm. In the proposed algorithm, based on the Jaccard index, the nearest instances of each class are found for each instance. Then, based on the nearest neighbor’s set, the weight of each instance is calculated. Finally, only instances with more weights are selected. This algorithm can reduce data at a specified rate and have the ability to run parallel on the instances. It can work on a variety of data sets with nominal and numeric data with missing values and is also suitable for working with imbalanced data sets. The proposed algorithm tests on three data sets. Results show that the proposed algorithm can reduce the volume of data, without a significant change in classification accuracy of these datasets.
|