\section{Method}
In this section we present several baseline models that address the task independently: logistic regression, SVM, decision tree, and random forest. Finally, we describe how to ensemble these models to boost performance.
\subsection{Logistic Regression}
Logistic regression is a natural fit for the binary classification problem in this task. The model is as follows:
\begin{equation}
P(y|x) = \sigma(y\cdot W^Tx)
\end{equation}
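Here $\sigma$ denotes the logistic sigmoid, with the labels encoded as $y \in \{-1,+1\}$ in this form (the loss below uses the equivalent $\{0,1\}$ encoding):
\begin{equation}
\sigma(z) = \frac{1}{1+e^{-z}}
\end{equation}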
Given this model, we minimize the negative log-likelihood and learn the parameters by stochastic gradient descent:
\begin{equation}
\mathrm{loss}(y, x) = -\sum_{i=1}^{m} \left[ y_i \ln\sigma(W^Tx_i) + (1-y_i)\ln\big(1-\sigma(W^Tx_i)\big) \right]
\end{equation}
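For a single example $(x_i, y_i)$ with $y_i \in \{0,1\}$, the gradient of this loss takes the standard form
\begin{equation}
\nabla_W \, \mathrm{loss} = \big(\sigma(W^Tx_i) - y_i\big)\,x_i,
\end{equation}
so each stochastic update is $W \leftarrow W - \eta\,\big(\sigma(W^Tx_i) - y_i\big)\,x_i$ for a learning rate $\eta$.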
In general, with a large number of features the model tends to overfit the training data, so we add a regularization term to reduce model complexity. The loss function then becomes:
\begin{equation}
\mathrm{loss}_{\mathrm{reg}}(y, x) = \mathrm{loss}(y, x) + \lambda W^TW
\end{equation}
We now explore the influence of the regularization strength $\lambda$.
\begin{figure}[ht]
\centering
\includegraphics[width=1\linewidth]{img/f1_score_bs1_lg_smote.png}
\caption{F1 score as a function of the regularization strength $\lambda$}
\end{figure}
We can see that setting $\lambda=0.189$ gives the best performance.
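A minimal sketch of this sweep, assuming a scikit-learn setup; the arrays \texttt{X\_train}, \texttt{y\_train}, \texttt{X\_val}, \texttt{y\_val} are hypothetical placeholders for a data split that is not shown in this section:
\begin{verbatim}
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

# Sweep the L2 regularization strength; alpha plays the role of lambda.
lambdas = np.logspace(-4, 1, 20)
scores = []
for lam in lambdas:
    clf = SGDClassifier(loss="log_loss", penalty="l2", alpha=lam,
                        max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)          # X_train, y_train: assumed data
    scores.append(f1_score(y_val, clf.predict(X_val)))

print("best lambda:", lambdas[int(np.argmax(scores))])
\end{verbatim}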
\subsection{SVM}
The intuition behind the SVM\cite{SVM} is to maximize the margin, which yields better generalization. The hard-margin formulation is:
\begin{align}
\min_{W, b} & \quad \frac{1}{2} W^TW \\
\text{s.t.} & \quad y_i(W^T x_i + b) \geq 1, \quad i = 1, \dots, m
\end{align}
In general, the data cannot be linearly separated in the original feature space, so we map them into a higher-dimensional space where they become separable; the kernel trick lets us do this implicitly and often improves performance. Several kernel functions are available, so we compare two of them below.
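For reference, the two kernels compared in the following figures are the Gaussian (RBF) kernel and the polynomial kernel:
\begin{align}
k_{\mathrm{rbf}}(x, x') &= \exp\!\left(-\gamma \|x - x'\|^2\right)\\
k_{\mathrm{poly}}(x, x') &= (x^T x' + c)^d
\end{align}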
\begin{figure}[ht]
\centering
\includegraphics[width=1\linewidth]{img/f1_score_bs1_svm_rbf_smote.png}
\caption{The Gaussian kernel}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width=1\linewidth]{img/f1_score_bs1_svm_poly_smote.png}
\caption{The polynomial kernel}
\end{figure}
We can see that the Gaussian kernel performs better than the polynomial kernel, because the Gaussian kernel implicitly maps the data into an infinite-dimensional space.
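A minimal sketch of this kernel comparison, again assuming scikit-learn and the same hypothetical data split:
\begin{verbatim}
from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Compare the Gaussian (RBF) and polynomial kernels on one split.
for kernel in ("rbf", "poly"):
    clf = SVC(kernel=kernel, C=1.0)    # gamma, degree left at defaults
    clf.fit(X_train, y_train)          # X_train, y_train: assumed data
    print(kernel, f1_score(y_val, clf.predict(X_val)))
\end{verbatim}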
\subsection{Decision Tree}
The decision tree provides an explainable model for this task. The dataset contains several different attributes, and we compute the information gain of each candidate attribute to decide the split:
\begin{equation}
Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v)
\end{equation}
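where $\mathrm{Ent}(D)$ is the information entropy of the class distribution in $D$, with $p_k$ the proportion of class $k$ among the $|\mathcal{Y}|$ classes:
\begin{equation}
\mathrm{Ent}(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k
\end{equation}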
Information gain solves most problems to some degree, but it is biased toward attributes with many values. The gain ratio corrects this bias and gives better performance:
\begin{align}
&Gain\_ratio(D, a) = \frac{Gain(D,a)}{IV(a)}\\
&IV(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2\frac{|D^v|}{|D|}
\end{align}
At each node, we choose the attribute with the highest gain ratio as the splitting attribute; the computation is sketched below. Decision trees also expose many tunable hyperparameters.
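To make the split criterion concrete, here is a small self-contained sketch that computes the gain ratio of one discrete attribute, using base-2 logarithms to match the equations above (the helper names are ours, not from the original experiments):
\begin{verbatim}
import numpy as np

def entropy(labels):
    # Ent(D) = -sum_k p_k * log2(p_k) over the class proportions p_k.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(attr, labels):
    # Gain(D, a) and IV(a) from the partition D^v induced by attribute a.
    ent_d, cond, iv = entropy(labels), 0.0, 0.0
    for v in np.unique(attr):
        mask = attr == v
        w = mask.mean()                # |D^v| / |D|
        cond += w * entropy(labels[mask])
        iv -= w * np.log2(w)
    return (ent_d - cond) / iv

# Toy example: a three-valued attribute and binary labels.
a = np.array(["x", "x", "y", "y", "z", "z"])
y = np.array([1, 1, 0, 0, 1, 0])
print(gain_ratio(a, y))
\end{verbatim}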
Finally, we compare the three methods by their AUC scores.
\begin{figure}[ht]
\centering
\includegraphics[width=1\linewidth]{img/myplot.png}
\caption{AUC scores of the three models}
\end{figure}
As the figure shows, the logistic regression model achieves the best performance.