diff --git a/notebooks/machine-learning/answers/0.bias-and-common-errors.ipynb b/notebooks/machine-learning/answers/0.bias-and-common-errors.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..d02f22faef70e5a95e4f1939c0cbfca05a01c4f8 --- /dev/null +++ b/notebooks/machine-learning/answers/0.bias-and-common-errors.ipynb @@ -0,0 +1,145 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "411ba7b3-7d56-45fe-b01e-205275e1988a", + "metadata": {}, + "source": [ + "# Biases and common errors" + ] + }, + { + "cell_type": "markdown", + "id": "4e2fcf4b-d8aa-4bb2-8eab-dfe9a3210604", + "metadata": {}, + "source": [ + "The following exercises will help you get familiar with the concepts covered in the introduction to *machine learning*. First of all, import the required libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4bbfb43b-1feb-4366-b1e7-5536f0f5aacd", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "\n", + "sns.set_context('notebook')" + ] + }, + { + "cell_type": "markdown", + "id": "61c8d84f-a791-425e-ae70-306f0da93a55", + "metadata": {}, + "source": [ + "## Long-distance relationships" + ] + }, + { + "cell_type": "markdown", + "id": "057d738a-a8a8-4d38-9dd2-b109d1325308", + "metadata": {}, + "source": [ + "The universe is said to be expanding, and that expansion is said to be accelerating. At least, that is what the study by Wendy Freedman et al. showed ([*Freedman, 2001*](../0.about-datasets.ipynb#Stellar-Objects)). 
Consequently, we expect a stellar object to recede from us faster the farther away it is.\n", + "\n", + "Let's load the dataset, focusing on objects close to us (between 10 and 30 megaparsecs, i.e. roughly 30 to 100 million light-years):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cf3ab56-418f-46e3-bc3f-36cf0eec0dbf", + "metadata": {}, + "outputs": [], + "source": [ + "# load data\n", + "df = pd.read_csv(\"../files/stellar-objects.csv\", sep=\"\\t\")\n", + "\n", + "# distance: in megaparsecs (Mpc)\n", + "# velocity: in km/s (v_helio, falling back to v_flow, then v_cmb)\n", + "df[\"velocity\"] = df.v_helio.fillna(df.v_flow.fillna(df.v_cmb))\n", + "\n", + "# objects close to Earth, but not that close :)\n", + "data = df[(df.distance > 10) & (df.distance < 30)]" + ] + }, + { + "cell_type": "markdown", + "id": "f0a306e1-be3e-4431-84a3-32216340c326", + "metadata": {}, + "source": [ + "Let's draw a scatter plot to check the claim made by these NASA bigwigs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1fb0d73f-62bd-4777-b4e4-276554e2a599", + "metadata": {}, + "outputs": [], + "source": [ + "sns.scatterplot(data=data, x=\"distance\", y=\"velocity\")\n", + "\n", + "sns.despine()\n", + "\n", + "plt.title(\"Relationship between distance and velocity of stellar objects\")\n", + "plt.xlabel(\"Distance (Mpc)\")\n", + "plt.ylabel(\"Velocity (km/s)\")\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "fbb20849-4a22-4870-940b-8067fd06e548", + "metadata": {}, + "source": [ + "Nothing very conclusive at first glance, is it? 
To check visually whether there really is a linear relationship between distance and recession velocity, draw a regression line:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "125c4241-faf9-4209-b8c6-cfc2c1b07105", + "metadata": {}, + "outputs": [], + "source": [ + "# your code here\n", + "\n", + "_ = sns.regplot(data=data, x=\"distance\", y=\"velocity\")" + ] + }, + { + "cell_type": "markdown", + "id": "aa3c4eeb-5ce4-44f2-9403-9d50a9e425e9", + "metadata": {}, + "source": [ + "Right, call BFM TV, Wendy got it wrong: two thirds of the points fall outside the 95% confidence interval. Or did we perhaps make a methodological error?" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/machine-learning/stellar-objects.ipynb b/notebooks/machine-learning/stellar-objects.ipynb deleted file mode 100644 index 816a825d12ff8d25388b1930e8527ccaf7dd0d81..0000000000000000000000000000000000000000 --- a/notebooks/machine-learning/stellar-objects.ipynb +++ /dev/null @@ -1,86 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "id": "4bbfb43b-1feb-4366-b1e7-5536f0f5aacd", - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import pandas as pd\n", - "import seaborn as sns" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1cf3ab56-418f-46e3-bc3f-36cf0eec0dbf", - "metadata": {}, - "outputs": [], - "source": [ - "# distance: megaparsec (MPC)\n", - "# velocity: in km/s\n", - "df = pd.read_csv(\"./galaxies.csv\", sep=\"\\t\")" - 
] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f6e8f95c-1da6-4d6f-b8c0-a6aa5cdddc2d", - "metadata": {}, - "outputs": [], - "source": [ - "df[\"velocity\"] = df.v_helio.fillna(df.v_flow.fillna(df.v_cmb))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8396a4ef-9f1f-425a-886e-8d2bf3d979ee", - "metadata": {}, - "outputs": [], - "source": [ - "plt.title(\"Relation between distance and velocity of stellar objects\")\n", - "plt.xlabel(\"Distance (MPC)\")\n", - "plt.ylabel(\"Velocity (km/s)\")\n", - "\n", - "#sns.scatterplot(data=df, x=\"distance\", y=\"velocity\", color=\"orange\")\n", - "sns.regplot(data=df, x=\"distance\", y=\"velocity\", color=\"orange\")\n", - "\n", - "sns.despine()\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6d94f18a-29ec-4da1-b542-b498e3017d2d", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}