5_classification_supervisee.ipynb 115 KiB
    Alessandro Cerioni committed
    {
     "cells": [
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# 5 - Supervised classification"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 6,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "import pandas as pd\n",
        "import seaborn as sns # cf. https://stackoverflow.com/questions/41499857/seaborn-why-import-as-sns#44484758"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 7,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "sns.set(rc={\"figure.figsize\": (32, 16)})"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 8,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "df5 = pd.read_pickle('data/df5.pkl')"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 9,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "<class 'pandas.core.frame.DataFrame'>\n",
    
          "Int64Index: 74963 entries, 0 to 95159\n",
    
          "Data columns (total 28 columns):\n",
    
          " #   Column              Non-Null Count  Dtype  \n",
          "---  ------              --------------  -----  \n",
          " 0   essencefrancais     74963 non-null  object \n",
          " 1   circonference_cm    74963 non-null  float64\n",
          " 2   hauteurtotale_m     74963 non-null  int64  \n",
          " 3   hauteurfut_m        74963 non-null  float64\n",
          " 4   diametrecouronne_m  74963 non-null  int64  \n",
          " 5   rayoncouronne_m     74900 non-null  float64\n",
          " 6   dateplantation      50216 non-null  object \n",
          " 7   genre               74963 non-null  object \n",
          " 8   espece              74963 non-null  object \n",
          " 9   variete             74963 non-null  object \n",
          " 10  essence             74963 non-null  object \n",
          " 11  architecture        74963 non-null  object \n",
          " 12  localisation        74963 non-null  object \n",
          " 13  naturerevetement    74963 non-null  object \n",
          " 14  mobilierurbain      74963 non-null  object \n",
          " 15  anneeplantation     50218 non-null  float64\n",
          " 16  commune             74963 non-null  object \n",
          " 17  codeinsee           74963 non-null  int64  \n",
          " 18  nomvoie             74963 non-null  object \n",
          " 19  codefuv             74808 non-null  float64\n",
          " 20  identifiant         74963 non-null  int64  \n",
          " 21  numero              74963 non-null  int64  \n",
          " 22  codegenre           74963 non-null  int64  \n",
          " 23  gid                 74963 non-null  int64  \n",
          " 24  surfacecadre_m2     49993 non-null  float64\n",
          " 25  lat                 74963 non-null  float64\n",
          " 26  lon                 74963 non-null  float64\n",
          " 27  circonference_m     74963 non-null  float64\n",
    
          "dtypes: float64(9), int64(7), object(12)\n",
    
          "memory usage: 16.6+ MB\n"
    
         ]
        }
       ],
       "source": [
        "df5.info()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Objective\n",
        "\n",
        "Determine the genus of a tree from its measurable properties: total height, trunk height, circumference, crown diameter, latitude, longitude. This is a **supervised classification** problem, which we will solve using the `scikit-learn` library, https://scikit-learn.org/."
    
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "As a reminder:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 10,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
    
          "Number of distinct genera =  86\n"
         ]
        }
       ],
       "source": [
        "print(\"Number of distinct genera = \", df5['genre'].nunique())"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "It is convenient to store the numeric features we want to use in the following variable, since we will need them below:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 11,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "num_features = ['circonference_m', 'diametrecouronne_m', 'hauteurfut_m', 'hauteurtotale_m', 'lat', 'lon']"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "From `df5`, we can create a DataFrame containing only these features:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 12,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/html": [
           "<div>\n",
           "<style scoped>\n",
           "    .dataframe tbody tr th:only-of-type {\n",
           "        vertical-align: middle;\n",
           "    }\n",
           "\n",
           "    .dataframe tbody tr th {\n",
           "        vertical-align: top;\n",
           "    }\n",
           "\n",
           "    .dataframe thead th {\n",
           "        text-align: right;\n",
           "    }\n",
           "</style>\n",
           "<table border=\"1\" class=\"dataframe\">\n",
           "  <thead>\n",
           "    <tr style=\"text-align: right;\">\n",
           "      <th></th>\n",
           "      <th>circonference_m</th>\n",
           "      <th>diametrecouronne_m</th>\n",
           "      <th>hauteurfut_m</th>\n",
           "      <th>hauteurtotale_m</th>\n",
           "      <th>lat</th>\n",
           "      <th>lon</th>\n",
           "    </tr>\n",
           "  </thead>\n",
           "  <tbody>\n",
           "    <tr>\n",
           "      <th>0</th>\n",
           "      <td>0.30</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.804503</td>\n",
           "      <td>4.772993</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>1</th>\n",
           "      <td>0.45</td>\n",
           "      <td>4</td>\n",
           "      <td>2.0</td>\n",
           "      <td>6</td>\n",
           "      <td>45.803322</td>\n",
           "      <td>4.775080</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>2</th>\n",
           "      <td>0.50</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.803241</td>\n",
           "      <td>4.775227</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>3</th>\n",
           "      <td>0.40</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.804540</td>\n",
           "      <td>4.772921</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>4</th>\n",
           "      <td>0.30</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.804468</td>\n",
           "      <td>4.773058</td>\n",
           "    </tr>\n",
           "  </tbody>\n",
           "</table>\n",
           "</div>"
          ],
          "text/plain": [
           "   circonference_m  diametrecouronne_m  hauteurfut_m  hauteurtotale_m  \\\n",
           "0             0.30                   5           2.0                7   \n",
           "1             0.45                   4           2.0                6   \n",
           "2             0.50                   5           2.0                7   \n",
           "3             0.40                   5           2.0                7   \n",
           "4             0.30                   5           2.0                7   \n",
           "\n",
           "         lat       lon  \n",
           "0  45.804503  4.772993  \n",
           "1  45.803322  4.775080  \n",
           "2  45.803241  4.775227  \n",
           "3  45.804540  4.772921  \n",
           "4  45.804468  4.773058  "
          ]
         },
    
         "execution_count": 12,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "X = df5[ num_features ].copy()\n",
        "X.head()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 13,
       "metadata": {},
       "outputs": [],
       "source": [
        "min_lat = X.lat.min()\n",
        "max_lat = X.lat.max()\n",
        "min_lon = X.lon.min()\n",
        "max_lon = X.lon.max()"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 14,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Min-max scaling of latitude and longitude to [0, 1] (vectorized)\n",
        "X['nlat'] = (X.lat - min_lat) / (max_lat - min_lat)\n",
        "X['nlon'] = (X.lon - min_lon) / (max_lon - min_lon)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 15,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/html": [
           "<div>\n",
           "<style scoped>\n",
           "    .dataframe tbody tr th:only-of-type {\n",
           "        vertical-align: middle;\n",
           "    }\n",
           "\n",
           "    .dataframe tbody tr th {\n",
           "        vertical-align: top;\n",
           "    }\n",
           "\n",
           "    .dataframe thead th {\n",
           "        text-align: right;\n",
           "    }\n",
           "</style>\n",
           "<table border=\"1\" class=\"dataframe\">\n",
           "  <thead>\n",
           "    <tr style=\"text-align: right;\">\n",
           "      <th></th>\n",
           "      <th>circonference_m</th>\n",
           "      <th>diametrecouronne_m</th>\n",
           "      <th>hauteurfut_m</th>\n",
           "      <th>hauteurtotale_m</th>\n",
           "      <th>lat</th>\n",
           "      <th>lon</th>\n",
           "      <th>nlat</th>\n",
           "      <th>nlon</th>\n",
           "    </tr>\n",
           "  </thead>\n",
           "  <tbody>\n",
           "    <tr>\n",
           "      <th>0</th>\n",
           "      <td>0.30</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.804503</td>\n",
           "      <td>4.772993</td>\n",
           "      <td>0.638981</td>\n",
           "      <td>0.209793</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>1</th>\n",
           "      <td>0.45</td>\n",
           "      <td>4</td>\n",
           "      <td>2.0</td>\n",
           "      <td>6</td>\n",
           "      <td>45.803322</td>\n",
           "      <td>4.775080</td>\n",
           "      <td>0.635795</td>\n",
           "      <td>0.215563</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>2</th>\n",
           "      <td>0.50</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.803241</td>\n",
           "      <td>4.775227</td>\n",
           "      <td>0.635576</td>\n",
           "      <td>0.215970</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>3</th>\n",
           "      <td>0.40</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.804540</td>\n",
           "      <td>4.772921</td>\n",
           "      <td>0.639080</td>\n",
           "      <td>0.209593</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>4</th>\n",
           "      <td>0.30</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>45.804468</td>\n",
           "      <td>4.773058</td>\n",
           "      <td>0.638886</td>\n",
           "      <td>0.209974</td>\n",
           "    </tr>\n",
           "  </tbody>\n",
           "</table>\n",
           "</div>"
          ],
          "text/plain": [
           "   circonference_m  diametrecouronne_m  hauteurfut_m  hauteurtotale_m  \\\n",
           "0             0.30                   5           2.0                7   \n",
           "1             0.45                   4           2.0                6   \n",
           "2             0.50                   5           2.0                7   \n",
           "3             0.40                   5           2.0                7   \n",
           "4             0.30                   5           2.0                7   \n",
           "\n",
           "         lat       lon      nlat      nlon  \n",
           "0  45.804503  4.772993  0.638981  0.209793  \n",
           "1  45.803322  4.775080  0.635795  0.215563  \n",
           "2  45.803241  4.775227  0.635576  0.215970  \n",
           "3  45.804540  4.772921  0.639080  0.209593  \n",
           "4  45.804468  4.773058  0.638886  0.209974  "
          ]
         },
         "execution_count": 15,
    
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "X.head()"
       ]
      },
    
      {
       "cell_type": "code",
       "execution_count": 16,
       "metadata": {},
       "outputs": [],
       "source": [
        "X = X.drop(['lat', 'lon'], axis=1)"
       ]
      },
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "... and another DataFrame containing only the column we want to predict:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 17,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "y = df5[ ['genre'] ]"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 18,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/html": [
           "<div>\n",
           "<style scoped>\n",
           "    .dataframe tbody tr th:only-of-type {\n",
           "        vertical-align: middle;\n",
           "    }\n",
           "\n",
           "    .dataframe tbody tr th {\n",
           "        vertical-align: top;\n",
           "    }\n",
           "\n",
           "    .dataframe thead th {\n",
           "        text-align: right;\n",
           "    }\n",
           "</style>\n",
           "<table border=\"1\" class=\"dataframe\">\n",
           "  <thead>\n",
           "    <tr style=\"text-align: right;\">\n",
           "      <th></th>\n",
           "      <th>genre</th>\n",
           "    </tr>\n",
           "  </thead>\n",
           "  <tbody>\n",
           "    <tr>\n",
           "      <th>0</th>\n",
           "      <td>Acer</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>1</th>\n",
           "      <td>Acer</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>2</th>\n",
           "      <td>Acer</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>3</th>\n",
           "      <td>Acer</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>4</th>\n",
           "      <td>Acer</td>\n",
           "    </tr>\n",
           "  </tbody>\n",
           "</table>\n",
           "</div>"
          ],
          "text/plain": [
           "  genre\n",
           "0  Acer\n",
           "1  Acer\n",
           "2  Acer\n",
           "3  Acer\n",
           "4  Acer"
          ]
         },
    
         "execution_count": 18,
    
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "y.head()"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 19,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
           "array([['Acer'],\n",
           "       ['Acer'],\n",
           "       ['Acer'],\n",
           "       ...,\n",
           "       ['Quercus'],\n",
           "       ['Fraxinus'],\n",
           "       ['Acer']], dtype=object)"
          ]
         },
         "execution_count": 19,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "y.values"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 20,
       "metadata": {},
       "outputs": [],
       "source": [
        "y = y.values.ravel() # flatten y to the 1-D format expected by the library used below..."
       ]
      },
    
      {
       "cell_type": "code",
       "execution_count": 21,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
           "array(['Acer', 'Acer', 'Acer', ..., 'Quercus', 'Fraxinus', 'Acer'],\n",
           "      dtype=object)"
          ]
         },
         "execution_count": 21,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "y"
       ]
      },
    
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Splitting the dataset into two parts: *training set* and *test set*"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The `sklearn` library provides the function we need, cf. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 22,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "from sklearn.model_selection import train_test_split"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 23,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, shuffle=True)"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 24,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
    
          "0.7499966650214106\n",
          "0.25000333497858945\n"
    
         ]
        }
       ],
       "source": [
        "print( len(X_train)/len(X) )\n",
        "print( len(X_test)/len(X) )"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The `scikit-learn` library includes several supervised classification algorithms, cf. https://scikit-learn.org/stable/supervised_learning.html#supervised-learning. Here we will only test a few of them. To compare the algorithms with one another, we will store each algorithm's accuracy in the `accuracy_report` dictionary."
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 25,
       "metadata": {},
    
       "outputs": [],
       "source": [
        "accuracy_report = dict()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Logistic Regression"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 26,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
    
          "/home/acerioni/Documents/ClubDevAnonymes/20190703_Python/venv/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
          "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
          "\n",
          "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
          "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
          "Please also refer to the documentation for alternative solver options:\n",
          "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
          "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n"
    
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy of the Logistic Regression classifier on the training set: 0.36\n",
    
          "Accuracy of the Logistic Regression classifier on the test set: 0.35\n"
    
         ]
        }
       ],
       "source": [
        "from sklearn.linear_model import LogisticRegression\n",
        "\n",
        "logreg = LogisticRegression()\n",
        "\n",
        "logreg.fit( X_train, y_train )\n",
        "\n",
        "print('Accuracy of the Logistic Regression classifier on the training set: {:.2f}'\n",
        "     .format( logreg.score(X_train, y_train)) )\n",
        "\n",
        "print('Accuracy of the Logistic Regression classifier on the test set: {:.2f}'\n",
        "     .format( logreg.score(X_test, y_test)) )\n",
        "\n",
        "accuracy_report[ 'logreg' ] = logreg.score(X_test, y_test)"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 27,
    
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
    
           "{'logreg': 0.3537164505629369}"
    
          ]
         },
    
         "execution_count": 27,
    
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "accuracy_report"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### K-Nearest Neighbors Classifier"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 28,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy of the K-NN Classifier on the training set: 0.51\n",
    
          "Accuracy of the K-NN classifier on the test set: 0.49\n"
    
         ]
        }
       ],
       "source": [
        "from sklearn.neighbors import KNeighborsClassifier\n",
        "\n",
        "knn = KNeighborsClassifier(n_neighbors = 50) # <- the algorithm should be run with different values of this parameter in order to select the best configuration... \n",
        "\n",
        "knn.fit(X_train, y_train)\n",
        "\n",
        "print('Accuracy of the K-NN Classifier on the training set: {:.2f}'\n",
        "     .format(knn.score(X_train, y_train)))\n",
        "\n",
        "print('Accuracy of the K-NN classifier on the test set: {:.2f}'\n",
        "     .format(knn.score(X_test, y_test)))\n",
        "\n",
        "accuracy_report[ 'knn' ] = knn.score(X_test, y_test)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Decision Tree Classifier"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 29,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy of the Decision Tree classifier on the training set: 1.00\n",
          "Accuracy of the Decision Tree classifier on the test set: 0.77\n"
         ]
        }
       ],
       "source": [
        "from sklearn.tree import DecisionTreeClassifier\n",
        "\n",
        "dt = DecisionTreeClassifier().fit(X_train, y_train)\n",
        "\n",
        "print('Accuracy of the Decision Tree classifier on the training set: {:.2f}'\n",
        "     .format(dt.score(X_train, y_train)))\n",
        "\n",
        "print('Accuracy of the Decision Tree classifier on the test set: {:.2f}'\n",
        "     .format(dt.score(X_test, y_test)))\n",
        "\n",
        "accuracy_report[ 'decision_tree' ] = dt.score(X_test, y_test)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The results from this algorithm are quite respectable, although the gap between the training accuracy (1.00) and the test accuracy (0.77) suggests some overfitting. This deserves a closer look:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 30,
    
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
    
          "/home/acerioni/Documents/ClubDevAnonymes/20190703_Python/venv/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
          "  _warn_prf(average, modifier, msg_start, len(result))\n",
          "/home/acerioni/Documents/ClubDevAnonymes/20190703_Python/venv/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n",
          "  _warn_prf(average, modifier, msg_start, len(result))\n"
    
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "                 precision    recall  f1-score   support\n",
          "\n",
    
          "          Abies       1.00      0.17      0.29         6\n",
          "         Acacia       1.00      1.00      1.00         2\n",
          "           Acer       0.70      0.71      0.70      2099\n",
          "       Aesculus       0.75      0.71      0.73       276\n",
          "      Ailanthus       1.00      0.55      0.71        11\n",
          "       Albizzia       0.68      0.72      0.70        47\n",
          "          Alnus       0.59      0.59      0.59       303\n",
          "    Amelanchier       0.58      0.65      0.61        23\n",
          "         Betula       0.66      0.61      0.63        69\n",
          "   Broussonetia       0.22      0.29      0.25         7\n",
          "          Buxus       0.00      0.00      0.00         1\n",
          "     Calocedrus       0.67      0.47      0.55        17\n",
          "       Carpinus       0.67      0.68      0.68       151\n",
          "       Castanea       0.67      0.67      0.67         3\n",
          "        Catalpa       0.46      0.38      0.42        29\n",
          "        Cedrela       0.60      0.75      0.67         8\n",
          "         Cedrus       0.71      0.76      0.73       132\n",
          "         Celtis       0.81      0.82      0.82      1451\n",
          " Cercidiphyllum       0.00      0.00      0.00         0\n",
          "         Cercis       0.60      0.55      0.57        55\n",
          "     Cladrastis       1.00      1.00      1.00         4\n",
          "         Cornus       0.43      0.43      0.43         7\n",
          "        Corylus       0.73      0.74      0.73       389\n",
          "      Crataegus       1.00      0.73      0.85        15\n",
          "Cupressocyparis       0.00      0.00      0.00         3\n",
          "      Cupressus       0.50      0.83      0.62         6\n",
          "        Davidia       1.00      1.00      1.00         1\n",
    
          "      Elaeagnus       0.00      0.00      0.00         0\n",
    
          "     Eucalyptus       0.00      0.00      0.00         1\n",
          "         Evodia       0.65      0.58      0.61        38\n",
          "          Fagus       0.33      0.53      0.41        15\n",
          "          Ficus       1.00      0.50      0.67         2\n",
          "       Fraxinus       0.73      0.72      0.72      1420\n",
          "         Ginkgo       0.59      0.68      0.63        60\n",
          "      Gleditsia       0.74      0.78      0.76       469\n",
          "    Gymnocladus       0.75      0.60      0.67         5\n",
    
          "        Halesia       1.00      1.00      1.00         1\n",
    
          "       Hibiscus       0.00      0.00      0.00         1\n",
          "        Juglans       0.35      0.41      0.38        29\n",
          "   Koelreuteria       0.62      0.68      0.65       141\n",
          "  Lagerstroemia       0.93      0.82      0.87        49\n",
          "          Larix       0.00      0.00      0.00         0\n",
          "      Ligustrum       0.00      0.00      0.00         2\n",
          "    Liquidambar       0.64      0.62      0.63       154\n",
          "   Liriodendron       0.58      0.60      0.59        80\n",
          "       Magnolia       0.70      0.68      0.69       101\n",
          "          Malus       0.72      0.65      0.68       195\n",
          "          Melia       0.89      0.76      0.82        21\n",
          "       Mespilus       0.00      0.00      0.00         0\n",
          "    Metasequoia       0.69      0.59      0.63        41\n",
          "          Morus       0.61      0.66      0.64        53\n",
          "          Nyssa       0.00      0.00      0.00         3\n",
          "           Olea       0.00      0.00      0.00         0\n",
          "         Ostrya       0.67      0.73      0.70       102\n",
          "       Parrotia       0.62      0.64      0.63        25\n",
          "      Paulownia       0.58      0.56      0.57        75\n",
    
          "  Phellodendron       1.00      1.00      1.00         1\n",
    
          "          Picea       0.14      0.33      0.20         3\n",
          "          Pinus       0.72      0.61      0.66       168\n",
          "          Pirus       0.72      0.75      0.74       782\n",
          "       Platanus       0.93      0.93      0.93      4524\n",
          "     Platycarya       0.00      0.00      0.00         1\n",
          "        Populus       0.69      0.73      0.71        98\n",
          "         Prunus       0.67      0.66      0.67       729\n",
          "    Pseudotsuga       0.00      0.00      0.00         4\n",
          "     Pterocarya       0.56      0.53      0.55        34\n",
          "        Quercus       0.70      0.70      0.70      1180\n",
          "           Rhus       0.00      0.00      0.00         1\n",
          "        Robinia       0.60      0.58      0.59       139\n",
          "          Salix       0.75      0.50      0.60        72\n",
          "        Sequoia       0.00      0.00      0.00         3\n",
          "        Sophora       0.79      0.77      0.78       754\n",
          "         Sorbus       0.45      0.50      0.48        10\n",
          "       Taxodium       0.00      0.00      0.00         0\n",
          "          Taxus       0.00      0.00      0.00         3\n",
          "          Thuya       0.00      0.00      0.00         0\n",
          "          Tilia       0.76      0.77      0.76      1440\n",
          "          Ulmus       0.74      0.73      0.73       328\n",
          "        Zelkova       0.62      0.62      0.62       269\n",
    
          "\n",
    
          "       accuracy                           0.77     18741\n",
          "      macro avg       0.53      0.51      0.51     18741\n",
          "   weighted avg       0.77      0.77      0.77     18741\n",
    
          "\n"
         ]
        }
       ],
       "source": [
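        "from sklearn.metrics import classification_report\n",
        "\n",
        "# Predict the class (tree genus) of every sample in the held-out test set\n",
        "y_pred = dt.predict(X_test)\n",
        "\n",
        "# Per-class precision, recall, F1-score and support\n",
        "print(classification_report(y_test, y_pred))"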
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The algorithm can also tell us which *features* matter most for the classification:"
       ]
      },
      {
       "cell_type": "code",
    
       "execution_count": 31,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/html": [
           "<div>\n",
           "<style scoped>\n",
           "    .dataframe tbody tr th:only-of-type {\n",
           "        vertical-align: middle;\n",
           "    }\n",
           "\n",
           "    .dataframe tbody tr th {\n",
           "        vertical-align: top;\n",
           "    }\n",
           "\n",
           "    .dataframe thead th {\n",
           "        text-align: right;\n",
           "    }\n",
           "</style>\n",
           "<table border=\"1\" class=\"dataframe\">\n",
           "  <thead>\n",
           "    <tr style=\"text-align: right;\">\n",
           "      <th></th>\n",
           "      <th>circonference_m</th>\n",
           "      <th>diametrecouronne_m</th>\n",
           "      <th>hauteurfut_m</th>\n",
           "      <th>hauteurtotale_m</th>\n",
           "      <th>nlat</th>\n",
           "      <th>nlon</th>\n",
           "    </tr>\n",
           "  </thead>\n",
           "  <tbody>\n",
           "    <tr>\n",
           "      <th>0</th>\n",
           "      <td>0.30</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>0.638981</td>\n",
           "      <td>0.209793</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>1</th>\n",
           "      <td>0.45</td>\n",
           "      <td>4</td>\n",
           "      <td>2.0</td>\n",
           "      <td>6</td>\n",
           "      <td>0.635795</td>\n",
           "      <td>0.215563</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>2</th>\n",
           "      <td>0.50</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>0.635576</td>\n",
           "      <td>0.215970</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>3</th>\n",
           "      <td>0.40</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>0.639080</td>\n",
           "      <td>0.209593</td>\n",
           "    </tr>\n",
           "    <tr>\n",
           "      <th>4</th>\n",
           "      <td>0.30</td>\n",
           "      <td>5</td>\n",
           "      <td>2.0</td>\n",
           "      <td>7</td>\n",
           "      <td>0.638886</td>\n",
           "      <td>0.209974</td>\n",
           "    </tr>\n",
           "  </tbody>\n",
           "</table>\n",
           "</div>"
          ],
          "text/plain": [
           "   circonference_m  diametrecouronne_m  hauteurfut_m  hauteurtotale_m  \\\n",
           "0             0.30                   5           2.0                7   \n",
           "1             0.45                   4           2.0                6   \n",
           "2             0.50                   5           2.0                7   \n",
           "3             0.40                   5           2.0                7   \n",
           "4             0.30                   5           2.0                7   \n",
           "\n",
           "       nlat      nlon  \n",
           "0  0.638981  0.209793  \n",
           "1  0.635795  0.215563  \n",
           "2  0.635576  0.215970  \n",
           "3  0.639080  0.209593  \n",
           "4  0.638886  0.209974  "
          ]
         },
         "execution_count": 31,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "X.head()"
       ]
      },
      {