
    Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/108596

    Title: Clustering-Based Undersampling for Class-imbalanced Data
    Authors: Wei-Chao Lin (林維昭); Tsai, C.-F.; Hu, Y.-H.; Jhang, J.-S.
    Contributors: Department of Computer Science and Information Engineering (資訊工程學系)
    Date: 2017-10
    Issue Date: 2017-12-08 14:46:24 (UTC+8)
    Abstract: Class imbalance is often a problem in various real-world data sets, where one class (i.e. the minority class) contains a small number of data points and the other (i.e. the majority class) contains a large number of data points. It is notably difficult to develop an effective model using current data mining and machine learning algorithms without considering data preprocessing to balance the imbalanced data sets. Random undersampling and oversampling have been used in numerous studies to ensure that the different classes contain the same number of data points. A classifier ensemble (i.e. a structure containing several classifiers) can be trained on several different balanced data sets for later classification purposes. In this paper, we introduce two undersampling strategies in which a clustering technique is used during the data preprocessing step. Specifically, the number of clusters in the majority class is set to be equal to the number of data points in the minority class. The first strategy uses the cluster centers to represent the majority class, whereas the second strategy uses the nearest neighbors of the cluster centers. A further study was conducted to examine the effect on performance of the addition or deletion of 5 to 10 cluster centers in the majority class. The experimental results obtained using 44 small-scale and 2 large-scale data sets revealed that the clustering-based undersampling approach with the second strategy outperformed five state-of-the-art approaches. Specifically, this approach combined with a single multilayer perceptron classifier and C4.5 decision tree classifier ensembles delivered optimal performance over both small- and large-scale data sets.
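The two undersampling strategies described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says "a clustering technique", so plain k-means (Lloyd's algorithm) is assumed here, and the names `kmeans`, `undersample`, and the strategy labels `"centers"` and `"nearest"` are hypothetical. The number of clusters is set to the size of the minority class, as the abstract states.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    for _ in range(iters):
        # Assign every point to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def undersample(majority, minority, strategy="nearest", seed=0):
    """Reduce the majority class to len(minority) representatives.

    strategy="centers": use the cluster centers themselves (first strategy).
    strategy="nearest": use the real majority point closest to each
                        center (second strategy).
    """
    k = len(minority)  # number of clusters = minority class size
    centers = kmeans(majority, k, seed=seed)
    if strategy == "centers":
        return centers
    return [min(majority, key=lambda p: dist(p, c)) for c in centers]


# Synthetic two-dimensional data, for illustration only.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

by_centers = undersample(majority, minority, strategy="centers")
by_nearest = undersample(majority, minority, strategy="nearest")
balanced = by_nearest + minority  # 10 majority + 10 minority points
```

With the `"nearest"` strategy every retained point is an actual majority-class observation, whereas `"centers"` yields synthetic averages. The balanced set would then be passed to whichever classifier or ensemble is being trained.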
    Appears in Collections: [Department of Computer Science and Information Engineering] Journal Articles
