The Study of a Prediction Method for Search Ad CTR Based on Logistic Regression Model
|Course||Computer Software and Theory|
|Keywords||search advertising, Click-Through Rate, Logistic Regression, feature selection, time decay|
Search advertising has become the major source of revenue for search engine companies such as Baidu and Google. Advertisers are currently charged per user click, and ad positions are limited, so for each query the central concern of a search engine company is how to select the ads that users are most likely to click. CTR (Click-Through Rate) measures the probability that a user clicks on an ad. Showing the ads with the highest CTR leads to a win-win-win situation. For users, it improves the search experience, because they see ads they are willing to click. For advertisers, precise ad delivery reaches potential customers and thus improves profitability. For search engine companies, more clicks mean more revenue.

CTR prediction is a complicated task. This paper proposes a prediction method based on the logistic regression model. The CTR prediction work is divided into two main parts: offline training and online calculation. The offline training job runs on Hadoop as a pipeline of data cleaning, feature extraction, feature sorting and dimension reduction, model solving, and model validation, and it eventually produces a mapping file that contains the features and their weights. The online calculation part performs extended matching, coarse ad selection, and CTR computation, and finally returns the top 10 ads with the highest CTR.

Logistic regression belongs to the supervised learning family, and the key to its success is finding features that carry enough discriminative information. This paper proposes three one-item and two-item features; when they were applied in the production environment, experiments showed that they effectively improve search engine revenue. This paper also proposes a time decay factor that distinguishes the influence of older historical records from that of newer ones, which has also achieved good results.
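
The online calculation step and the time decay idea can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the feature naming scheme (`query=...`, `ad=...`, `query=...&ad=...`), the exponential half-life form of the decay, and the function names `decayed_ctr`, `predict_ctr`, and `top_k_ads` are all assumptions made for illustration. The `weights` dictionary stands in for the feature-to-weight mapping file produced by offline training.

```python
import math
from heapq import nlargest


def decayed_ctr(records, today, half_life_days=7.0,
                prior_clicks=0.0, prior_impressions=0.0):
    """Historical CTR in which older records count less.

    records: iterable of (day, impressions, clicks) tuples.
    The half-life decay is an assumed form, not the paper's formula.
    """
    clicks = prior_clicks
    impressions = prior_impressions
    for day, imps, clk in records:
        # A record that is one half-life old counts half as much.
        weight = 0.5 ** ((today - day) / half_life_days)
        impressions += weight * imps
        clicks += weight * clk
    return clicks / impressions if impressions > 0 else 0.0


def predict_ctr(weights, features, bias=0.0):
    """Logistic regression over binary features: sigmoid(bias + sum of weights)."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def top_k_ads(weights, query, candidate_ads, k=10):
    """Score each candidate ad and keep the k ads with the highest predicted CTR."""
    def score(ad):
        # Two one-item features and one two-item (cross) feature (hypothetical names).
        features = [f"query={query}", f"ad={ad}", f"query={query}&ad={ad}"]
        return predict_ctr(weights, features)
    return nlargest(k, candidate_ads, key=score)
```

For example, with `weights = {"ad=a": 1.0, "ad=b": -1.0, "query=shoes&ad=b": 3.0}`, the cross feature lifts ad `b` above ad `a` for the query `shoes`, which shows why two-item features can carry information that one-item features miss.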