{"cells":[{"cell_type":"markdown","metadata":{"id":"r9vo3Yf30U-r"},"source":["---\n","
"]},{"cell_type":"raw","metadata":{"id":"meVAlr6T0U-u"},"source":["\n","Throughout the financial sector, machine learning algorithms are being developed to detect fraudulent transactions, and this project does the same. Using a dataset of nearly 28,500 credit card transactions and multiple unsupervised anomaly detection algorithms, we will identify transactions with a high probability of being credit card fraud. We will build and deploy the following two machine learning algorithms:\n","\n","* Local Outlier Factor (LOF)\n","* Isolation Forest Algorithm\n","\n","Furthermore, using metrics such as precision, recall, and F1-score, we will investigate why classification accuracy alone can be misleading for these algorithms.\n","\n","In addition, we will use data visualization techniques common in data science, such as parameter histograms and correlation matrices, to better understand the underlying distribution of data in our dataset. Let's get started!"]},{"cell_type":"markdown","metadata":{"id":"2oqQhlgs0U-v"},"source":["## 1. Importing Necessary Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Jqw0kjZv0U-w"},"outputs":[],"source":["# import the necessary packages\n","import numpy as np\n","import pandas as pd\n","import matplotlib.pyplot as plt\n","import seaborn as sns"]},{"cell_type":"markdown","metadata":{"id":"YG6BjULy0U-w"},"source":["## 2. Load the Data Set\n","In the following cells, we will import our dataset from a .csv file as a Pandas DataFrame. We will then begin exploring it to understand the type, quantity, and distribution of the data. 
For this purpose, we will use Pandas' built-in describe feature, as well as parameter histograms and a correlation matrix."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"G2nsgKBc0U-x","outputId":"a87715a5-37ce-44ad-f466-b211531ad54f"},"outputs":[{"name":"stdout","output_type":"stream","text":["(284807, 31)\n"," Time V1 V2 V3 V4 V5 V6 V7 \\\n","0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 \n","1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 \n","2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 \n","3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 \n","4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 \n","\n"," V8 V9 ... V21 V22 V23 V24 V25 \\\n","0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 \n","1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 \n","2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 \n","3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 \n","4 -0.270533 0.817739 ... 
-0.009431 0.798278 -0.137458 0.141267 -0.206010 \n","\n"," V26 V27 V28 Amount Class \n","0 -0.189115 0.133558 -0.021053 149.62 0 \n","1 0.125895 -0.008983 0.014724 2.69 0 \n","2 -0.139097 -0.055353 -0.059752 378.66 0 \n","3 -0.221929 0.062723 0.061458 123.50 0 \n","4 0.502292 0.219422 0.215153 69.99 0 \n","\n","[5 rows x 31 columns]\n"]}],"source":["# Load the dataset from the csv file using pandas\n","data = pd.read_csv('creditcard.csv')\n","print(data.shape)\n","print(data.head())"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_9gG3G1B0U-y","outputId":"22dd35ef-a6ae-46e6-9421-c027d3324ac0"},"outputs":[{"name":"stdout","output_type":"stream","text":["Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',\n"," 'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',\n"," 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',\n"," 'Class'],\n"," dtype='object')\n"]}],"source":["# Start exploring the dataset\n","print(data.columns)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"490ebrbW0U-y","outputId":"a23400c9-4b5c-45d1-9e17-34dc6aabb147"},"outputs":[{"data":{"text/html":["|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |\n","
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 169876 | 119907.0 | -0.611712 | -0.769705 | -0.149759 | -0.224877 | 2.028577 | -2.019887 | 0.292491 | -0.523020 | 0.358468 | ... | -0.075208 | 0.045536 | 0.380739 | 0.023440 | -2.220686 | -0.201146 | 0.066501 | 0.221180 | 1.79 | 0 |\n","
| 127467 | 78340.0 | -0.814682 | 1.319219 | 1.329415 | 0.027273 | -0.284871 | -0.653985 | 0.321552 | 0.435975 | -0.704298 | ... | -0.128619 | -0.368565 | 0.090660 | 0.401147 | -0.261034 | 0.080621 | 0.162427 | 0.059456 | 1.98 | 0 |\n","
| 137900 | 82382.0 | -0.318193 | 1.118618 | 0.969864 | -0.127052 | 0.569563 | -0.532484 | 0.706252 | -0.064966 | -0.463271 | ... | -0.305402 | -0.774704 | -0.123884 | -0.495687 | -0.018148 | 0.121679 | 0.249050 | 0.092516 | 0.89 | 0 |\n","
| 21513 | 31717.0 | -1.328271 | 1.018378 | 1.775426 | -1.574193 | -0.117696 | -0.457733 | 0.681867 | -0.031641 | 0.383872 | ... | -0.220815 | -0.419013 | -0.239197 | 0.009967 | 0.232829 | 0.814177 | 0.098797 | -0.004273 | 15.98 | 0 |\n","
| 134700 | 80923.0 | 1.276712 | 0.617120 | -0.578014 | 0.879173 | 0.061706 | -1.472002 | 0.373692 | -0.287204 | -0.084482 | ... | -0.160161 | -0.430404 | -0.076738 | 0.258708 | 0.552170 | 0.370701 | -0.034255 | 0.041709 | 0.76 | 0 |\n","
5 rows × 31 columns
\n","|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |\n","
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20637 | 31172.0 | 1.081380 | -0.808354 | -0.075382 | -2.219655 | -0.696680 | -0.523893 | -0.261545 | 0.000618 | 1.679131 | ... | 0.095880 | 0.308749 | 0.889073 | -0.305266 | -0.237899 | 0.703002 | -0.597632 | 0.067376 | 0.023303 | 99.99 |\n","
| 215628 | 140152.0 | -0.219573 | 0.364497 | 0.650628 | -0.245234 | 0.657173 | 0.824911 | 0.700127 | 0.198098 | 0.563869 | ... | 0.005662 | -0.079944 | -0.001599 | 0.397672 | -1.145711 | -1.432307 | -0.656639 | 0.137209 | 0.040315 | 85.06 |\n","
| 272210 | 164968.0 | 2.050245 | -0.346468 | -2.571895 | -1.156155 | 1.369042 | 0.625456 | 0.196785 | 0.141313 | 0.278761 | ... | -0.293740 | 0.134582 | 0.507608 | 0.078827 | -0.878924 | 0.008064 | 1.063248 | -0.107996 | -0.111011 | 3.02 |\n","
| 224227 | 143702.0 | -6.122199 | 4.091185 | -3.495669 | -0.304348 | -2.637439 | -0.855252 | -2.441024 | 3.913505 | -0.701046 | ... | -1.026459 | 0.046612 | -0.948586 | 0.155294 | 0.550086 | 1.064556 | -0.396596 | -1.615856 | -0.431825 | 30.19 |\n","
| 86493 | 61258.0 | -1.933338 | 1.883129 | -0.736065 | -0.742869 | -0.603784 | 0.184637 | 0.961916 | 0.876091 | -1.204311 | ... | -0.593771 | -0.202868 | -0.906577 | -0.009884 | -0.902972 | -0.149118 | 0.011736 | -0.569260 | -0.176868 | 194.85 |\n","
5 rows × 30 columns
\n","|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |\n","
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 231988 | 147004.0 | -0.472661 | -1.430584 | -0.116832 | -2.528959 | 0.975027 | 1.077347 | -0.436959 | 0.426111 | -2.597479 | ... | -0.020614 | -0.074423 | -0.127588 | 0.561987 | -0.803431 | -1.137093 | -0.408330 | 0.208716 | 0.218764 | 100.00 |\n","
| 223080 | 143238.0 | -0.298687 | 0.932025 | -0.460979 | -0.922886 | 0.321475 | -1.422644 | 0.773147 | 0.170359 | 0.025994 | ... | -0.491993 | 0.454823 | 1.210077 | -0.065351 | -0.020786 | -0.816140 | -0.406927 | 0.112648 | 0.187384 | 5.34 |\n","
| 3594 | 3073.0 | 1.188739 | -0.110925 | -0.247423 | -0.056450 | -0.230669 | -0.884735 | 0.252123 | -0.107019 | -0.128072 | ... | -0.053187 | -0.339530 | -1.248046 | 0.070858 | -0.003754 | 0.148766 | 0.706722 | -0.132117 | -0.008134 | 58.92 |\n","
| 147745 | 88926.0 | 0.080975 | 0.866882 | -0.367443 | -0.620509 | 0.801360 | -0.530850 | 0.767727 | 0.137662 | -0.093225 | ... | -0.070243 | -0.291797 | -0.798181 | 0.128168 | 0.560247 | -0.531633 | 0.097231 | 0.211335 | 0.071850 | 5.49 |\n","
| 110882 | 72013.0 | -0.523168 | 0.504064 | 2.327159 | 1.033996 | -0.256216 | 0.920008 | 0.123060 | 0.452847 | 0.348782 | ... | -0.110084 | -0.211943 | -0.184578 | 0.053897 | 0.224449 | -0.507835 | -0.653984 | 0.232471 | 0.162894 | 20.00 |\n","
5 rows × 30 columns
\n","