A Classification Study of Fraudulent Accounts on an Electronic Payment Platform Based on Recognition Models
|School||Zhejiang University of Technology|
|Keywords||Credit Risk; Fraud; Data Mining; Classification Technique; CART; TreeNet|
Transaction volumes, the supply of virtual currency and the number of accounts on e-payment platforms are all growing very rapidly. As a result, credit-related issues such as redundant capital, money laundering, illicit cash-out and fraud are emerging as the biggest bottlenecks curbing the development of e-commerce. These credit risks take widely varying forms and are concealed by sophisticated techniques that evolve over time. Currently, e-payment companies usually adopt security products that provide only defensive functions and lack proactive mechanisms. It remains very difficult to identify and capture these risks accurately through experience and manual inspection alone. E-payment companies therefore need to strengthen their technologies to prevent and contain credit risks proactively and effectively, and to enhance their capabilities in credit identification and risk management.
In this article, we start from the psychological motivation of fraudulent users, who aim to boost their credit ratings with as little money and time as possible. On that basis we give a clear definition of fraudulent accounts and conduct univariate analysis on both fraudulent and normal accounts. We find that fraudulent users are characterized by batch registrations, very low transaction amounts and highly concentrated transaction times.
On the problem of identifying fraudulent accounts, Lach (1999) indicates that data mining technology can be used to identify and understand patterns of fraudulent behavior, so that actions can be taken to reduce fraud rates. Two tree algorithms, ID3 (Mitchell, 1997) and C4.5 (Quinlan, 1993), have been applied to a simple fraud dataset from an e-commerce company. Because ID3 and C4.5 attempt to extract as much information as possible, the resulting trees often have too many branches and grow too large. CART, which chooses its splits by the Gini index, offers a simpler tree algorithm with greater efficiency.
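The Gini splitting rule can be illustrated with a short sketch. It shows how a CART-style algorithm scores every binary split "x <= t versus x > t" on one numeric feature. It then keeps the threshold with the lowest weighted Gini impurity, where the impurity of a node is 1 - Σ p_k² over the class proportions p_k. The choice of transaction amount as the feature, the toy data, and the names `gini` and `best_split` are all hypothetical and are not taken from the study. This is a pure-Python illustration of the rule, not the CART software used in the article.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of a node holding these class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Search binary splits x <= t versus x > t on one numeric feature and return
    (t, weighted Gini impurity) for the split with the lowest impurity.
    Returns (None, impurity of the unsplit node) if no split lowers it."""
    n = len(labels)
    best_t: Optional[float] = None
    best_score = gini(labels)
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weight each child's impurity by the share of samples it receives.
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: transaction amounts with labels 1 = fraudulent, 0 = normal.
amounts = [0.1, 0.2, 0.5, 1.0, 50.0, 120.0, 300.0, 80.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_split(amounts, labels))  # → (25.5, 0.0)
```

A full tree applies this search to every feature at every node and recurses on the two children. The resulting nodes are always binary, which is what keeps CART trees compact compared with the multiway splits of ID3 and C4.5.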
CART produces binary trees that are simply structured and easy to understand. We also take real-world data quality issues into account. For that reason we choose CART as the modeling tool, since tree algorithms are robust to outliers and can handle missing values automatically. In addition, a single CART tree is unstable, and boosting can be used to improve stability and accuracy; this motivated the multiple-tree algorithm TreeNet (Friedman, 2002).
In this article, a real dataset from an e-commerce platform is used to compare fraudulent accounts with normal accounts. We experimented with logistic regression, CART and TreeNet. We find that the non-parametric methods (CART and TreeNet) outperform the parametric method (logistic regression). We also find that the multiple-tree method, TreeNet, performs better than the single-tree CART model but is less interpretable. Considering ease of system implementation and interpretability, the CART model is usually preferred whenever it does not sacrifice accuracy. The models can be translated into general-purpose languages such as C and Java. They can then be incorporated into marketing platforms and the corresponding analytical reports to support future analysis and routine operational decision-making.
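The boosting idea behind TreeNet can be sketched in a few lines. TreeNet implements Friedman's stochastic gradient boosting, but its exact internals are not described in this abstract. The sketch below is therefore a plain, non-stochastic variant: gradient boosting of depth-1 regression trees (stumps) under log-loss. Each round fits a stump to the residuals y - p of the current model and adds a shrunken copy of it to the ensemble. The row subsampling that makes the method "stochastic" is deliberately omitted. The class `BoostedStumps`, its parameters, the two features and the toy data are hypothetical and illustrative only.

```python
import math
from typing import List, Optional, Tuple

# A stump is (feature index, threshold, value if x <= threshold, value if x > threshold).
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares depth-1 regression tree fitted to the residuals r."""
    best: Optional[Stump] = None
    best_sse = math.inf
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    if best is None:  # every feature is constant: fall back to a single leaf
        mean_r = sum(r) / len(r)
        best = (0, math.inf, mean_r, mean_r)
    return best


def stump_predict(s: Stump, row: List[float]) -> float:
    j, t, lv, rv = s
    return lv if row[j] <= t else rv


class BoostedStumps:
    """Gradient boosting of regression stumps under log-loss (labels 0/1)."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1) -> None:
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.f0 = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "BoostedStumps":
        p = sum(y) / len(y)
        if not 0.0 < p < 1.0:
            raise ValueError("both classes must be present")
        self.f0 = math.log(p / (1.0 - p))  # start from the log-odds of the base rate
        F = [self.f0] * len(y)
        for _ in range(self.n_rounds):
            # The negative gradient of log-loss with respect to F is y - sigmoid(F).
            r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
            s = fit_stump(X, r)
            self.stumps.append(s)
            # Shrink each step by the learning rate before adding it to the ensemble.
            F = [fi + self.learning_rate * stump_predict(s, row) for fi, row in zip(F, X)]
        return self

    def predict_proba(self, row: List[float]) -> float:
        """Estimated probability that the account is fraudulent (label 1)."""
        z = self.f0 + sum(self.learning_rate * stump_predict(s, row) for s in self.stumps)
        return sigmoid(z)


# Hypothetical toy data: [transaction amount, share of transactions in the busiest hour].
X = [[0.1, 1.0], [0.2, 1.0], [0.5, 0.9], [1.0, 0.8],
     [50.0, 0.1], [120.0, 0.0], [300.0, 0.2], [80.0, 0.1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = fraudulent, 0 = normal
model = BoostedStumps().fit(X, y)
print(model.predict_proba([0.3, 0.95]), model.predict_proba([200.0, 0.05]))
```

Because every round targets the residuals that the current ensemble still leaves, later stumps concentrate on accounts that are not yet well separated. The small learning rate trades extra rounds for a smoother, more stable fit than any single tree. This ensemble of many trees is also why such a model is harder to read than one CART tree.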