"
]
},
{
"cell_type": "markdown",
"id": "ba1e2404",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Optional Lab - W1: Brief Introduction to Python and Jupyter Notebooks\n",
"Welcome to the first optional lab! \n",
"Optional labs are available to:\n",
"- provide information - like this notebook\n",
"- reinforce lecture material with hands-on examples\n",
"- provide working examples of routines used in the graded labs"
]
},
{
"cell_type": "markdown",
"id": "7eed3933",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Goals\n",
"In this lab, you will:\n",
"- Get a brief introduction to Jupyter notebooks\n",
"- Take a tour of Jupyter notebooks\n",
"- Learn the difference between markdown cells and code cells\n",
"- Practice some basic python\n"
]
},
{
"cell_type": "markdown",
"id": "da73e262",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The easiest way to become familiar with Jupyter notebooks is to take the tour available above in the Help menu:"
]
},
{
"cell_type": "markdown",
"id": "4957c87e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"
\n",
"
\n",
""
]
},
{
"cell_type": "markdown",
"id": "466abf8c",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Jupyter notebooks have two types of cells that are used in this course. Cells such as this which contain documentation called `Markdown Cells`. The name is derived from the simple formatting language used in the cells. You will not be required to produce markdown cells. Its useful to understand the `cell pulldown` shown in graphic below. Occasionally, a cell will end up in the wrong mode and you may need to restore it to the right state:"
]
},
{
"cell_type": "markdown",
"id": "e25d5702",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"
\n",
" \n",
""
]
},
{
"cell_type": "markdown",
"id": "30d554fe",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The other type of cell is the `code cell` where you will write your code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f35db920",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"#This is a 'Code' Cell\n",
"print(\"This is code cell\")"
]
},
{
"cell_type": "markdown",
"id": "21f16015",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Python\n",
"You can write your code in the code cells. \n",
"To run the code, select the cell and either\n",
"- hold the shift-key down and hit 'enter' or 'return'\n",
"- click the 'run' arrow above\n",
"
\n",
" \n",
"\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "47759d5b",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Print statement\n",
"Print statements will generally use the python f-string style. \n",
"Try creating your own print in the following cell. \n",
"Try both methods of running the cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8babb554",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# print statements\n",
"variable = \"right in the strings!\"\n",
"print(f\"f strings allow you to embed variables {variable}\")"
]
},
{
"cell_type": "markdown",
"id": "66da1cd0",
"metadata": {},
"source": [
"### Practice Quiz "
]
},
{
"cell_type": "markdown",
"id": "b7f8d20a",
"metadata": {},
"source": [
"#### Quiz - 1"
]
},
{
"cell_type": "markdown",
"id": "3be811d0",
"metadata": {},
"source": [
"
\n"
]
},
{
"cell_type": "markdown",
"id": "c071f6e5",
"metadata": {},
"source": [
"## Module - 2"
]
},
{
"cell_type": "markdown",
"id": "af7db867",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Optional Lab W2: Python, NumPy and Vectorization\n",
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
"\n",
"#### Outline\n",
"- [1.1 Goals ](#toc_40015_1.1)\n",
"- [1.2 Useful References ](#toc_40015_1.2)\n",
"- [2 Python and NumPy ](#toc_40015_2)\n",
"- [3 Vectors ](#toc_40015_3)\n",
"- [3.1 Abstract ](#toc_40015_3.1)\n",
"- [3.2 NumPy Arrays ](#toc_40015_3.2)\n",
"- [3.3 Vector Creation ](#toc_40015_3.3)\n",
"- [3.4 Operations on Vectors ](#toc_40015_3.4)\n",
"- [4 Matrices ](#toc_40015_4)\n",
"- [4.1 Abstract ](#toc_40015_4.1)\n",
"- [4.2 NumPy Arrays ](#toc_40015_4.2)\n",
"- [4.3 Matrix Creation ](#toc_40015_4.3)\n",
"- [4.4 Operations on Matrices ](#toc_40015_4.4)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ba156ba",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np # it is an unofficial standard to use np for numpy\n",
"import time"
]
},
{
"cell_type": "markdown",
"id": "ba562829",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### 1.1 Goals \n",
"\n",
"In this lab, you will:\n",
"- Review the features of NumPy and Python that are used in Course 1"
]
},
{
"cell_type": "markdown",
"id": "8eed19e7",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### 1.2 Useful References\n",
"\n",
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
]
},
{
"cell_type": "markdown",
"id": "4feabd06",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### 2 Python and NumPy \n",
"\n",
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
]
},
{
"cell_type": "markdown",
"id": "c5e38bfd",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### 3 Vectors\n",
""
]
},
{
"cell_type": "markdown",
"id": "3afb8d41",
"metadata": {},
"source": [
"##### 3.1 Abstract\n",
"\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
" Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted \n",
" with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector \n",
" does not, for example, contain both characters and numbers. The number of elements in the array is often \n",
" referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n",
"
\n"
]
},
{
"cell_type": "markdown",
"id": "c9e73ec6",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 3.2 NumPy Arrays\n",
"\n",
"\n",
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
"\n",
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
" "
]
},
{
"cell_type": "markdown",
"id": "4c557076",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 3.3 Vector Creation\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "34a5e347",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25aac0da",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# NumPy routines which allocate memory and fill arrays with value\n",
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
]
},
{
"cell_type": "markdown",
"id": "3caf3a13",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Some data creation routines do not take a shape tuple:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8cd7c0b3",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
]
},
{
"cell_type": "markdown",
"id": "f107c53e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"values can be specified manually as well. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a4cc9b8",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# NumPy routines which allocate memory and fill with user specified values\n",
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
]
},
{
"cell_type": "markdown",
"id": "1580f398",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
]
},
{
"cell_type": "markdown",
"id": "8f8a0cf6",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 3.4 Operations on Vectors\n",
"\n",
"Let's explore some operations using vectors.\n",
"\n",
"###### 3.4.1 Indexing\n",
"\n",
"\n",
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aed264a5",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"#vector indexing operations on 1-D vectors\n",
"a = np.arange(10)\n",
"print(a)\n",
"\n",
"#access an element\n",
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
"\n",
"# access the last element, negative indexes count from the end\n",
"print(f\"a[-1] = {a[-1]}\")\n",
"\n",
"#indexs must be within the range of the vector or they will produce and error\n",
"try:\n",
" c = a[10]\n",
"except Exception as e:\n",
" print(\"The error message you'll see is:\")\n",
" print(e)"
]
},
{
"cell_type": "markdown",
"id": "09a98b39",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.2 Slicing\n",
"\n",
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d4b3666",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"#vector slicing operations\n",
"a = np.arange(10)\n",
"print(f\"a = {a}\")\n",
"\n",
"#access 5 consecutive elements (start:stop:step)\n",
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
"\n",
"# access 3 elements separated by two \n",
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
"\n",
"# access all elements index 3 and above\n",
"c = a[3:]; print(\"a[3:] = \", c)\n",
"\n",
"# access all elements below index 3\n",
"c = a[:3]; print(\"a[:3] = \", c)\n",
"\n",
"# access all elements\n",
"c = a[:]; print(\"a[:] = \", c)"
]
},
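{
"cell_type": "markdown",
"id": "c5a1e0d2",
"metadata": {},
"source": [
"The slices above can be read as a rule on indices. This is a sketch that assumes a positive `step` and `start`/`stop` values inside the array: `a[start:stop:step]` selects the indices\n",
"$$ i = \\text{start} + k \\cdot \\text{step}, \\quad k = 0, 1, 2, \\ldots \\quad \\text{while } i < \\text{stop} $$\n",
"For `a[2:7:2]` this gives the indices 2, 4 and 6, that is $\\lceil (7 - 2) / 2 \\rceil = 3$ elements. The `stop` index itself is never included."
]
},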
{
"cell_type": "markdown",
"id": "7bd222c2",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.3 Single vector operations\n",
"\n",
"\n",
"There are a number of useful operations that involve operations on a single vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "52c244d1",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"a = np.array([1,2,3,4])\n",
"print(f\"a : {a}\")\n",
"# negate elements of a\n",
"b = -a \n",
"print(f\"b = -a : {b}\")\n",
"\n",
"# sum all elements of a, returns a scalar\n",
"b = np.sum(a) \n",
"print(f\"b = np.sum(a) : {b}\")\n",
"\n",
"b = np.mean(a)\n",
"print(f\"b = np.mean(a): {b}\")\n",
"\n",
"b = a**2\n",
"print(f\"b = a**2 : {b}\")"
]
},
{
"cell_type": "markdown",
"id": "408fcc40",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.4 Vector Vector element-wise operations\n",
"\n",
"\n",
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
"$$ \\mathbf{a} + \\mathbf{b} = \\sum_{i=0}^{n-1} a_i + b_i $$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0169f2ba",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"a = np.array([ 1, 2, 3, 4])\n",
"b = np.array([-1,-2, 3, 4])\n",
"print(f\"Binary operators work element wise: {a + b}\")"
]
},
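{
"cell_type": "markdown",
"id": "c5a1e0d3",
"metadata": {},
"source": [
"For the two vectors in the cell above, the element-wise sum can be worked out by hand (a sketch of the arithmetic only):\n",
"$$ \\mathbf{a} + \\mathbf{b} = [\\,1 + (-1),\\; 2 + (-2),\\; 3 + 3,\\; 4 + 4\\,] = [\\,0,\\; 0,\\; 6,\\; 8\\,] $$"
]
},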
{
"cell_type": "markdown",
"id": "c3d4350f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Of course, for this to work correctly, the vectors must be of the same size:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2118077b",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"#try a mismatched vector operation\n",
"c = np.array([1, 2])\n",
"try:\n",
" d = a + c\n",
"except Exception as e:\n",
" print(\"The error message you'll see is:\")\n",
" print(e)"
]
},
{
"cell_type": "markdown",
"id": "4ee67081",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.5 Scalar Vector operations\n",
"\n",
"\n",
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bff1040",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"a = np.array([1, 2, 3, 4])\n",
"\n",
"# multiply a by a scalar\n",
"b = 5 * a \n",
"print(f\"b = 5 * a : {b}\")"
]
},
{
"cell_type": "markdown",
"id": "8651d003",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.6 Vector Vector dot product\n",
"\n",
"\n",
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
]
},
{
"cell_type": "markdown",
"id": "615a4d74",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
" "
]
},
{
"cell_type": "markdown",
"id": "36aa2cda",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
"Vector dot product requires the dimensions of the two vectors to be the same. "
]
},
{
"cell_type": "markdown",
"id": "31cf42f3",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Let's implement our own version of the dot product below:\n",
"\n",
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
"Assume both `a` and `b` are the same shape."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4cccf6b1",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def my_dot(a, b): \n",
" \"\"\"\n",
" Compute the dot product of two vectors\n",
" \n",
" Args:\n",
" a (ndarray (n,)): input vector \n",
" b (ndarray (n,)): input vector with same dimension as a\n",
" \n",
" Returns:\n",
" x (scalar): \n",
" \"\"\"\n",
" x=0\n",
" for i in range(a.shape[0]):\n",
" x = x + a[i] * b[i]\n",
" return x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95c40306",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# test 1-D\n",
"a = np.array([1, 2, 3, 4])\n",
"b = np.array([-1, 4, 3, 2])\n",
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
]
},
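{
"cell_type": "markdown",
"id": "c5a1e0d4",
"metadata": {},
"source": [
"As a hand check of the test above, the sum can be expanded term by term for the same `a` and `b` (a worked sketch of the arithmetic, not additional lab code):\n",
"$$ \\sum_{i=0}^{3} a_i b_i = (1)(-1) + (2)(4) + (3)(3) + (4)(2) = -1 + 8 + 9 + 8 = 24 $$"
]
},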
{
"cell_type": "markdown",
"id": "8aeeabc3",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Note, the dot product is expected to return a scalar value. \n",
"\n",
"Let's try the same operations using `np.dot`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b50e32ca",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# test 1-D\n",
"a = np.array([1, 2, 3, 4])\n",
"b = np.array([-1, 4, 3, 2])\n",
"c = np.dot(a, b)\n",
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
"c = np.dot(b, a)\n",
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
]
},
{
"cell_type": "markdown",
"id": "ba8b9cdc",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Above, you will note that the results for 1-D matched our implementation."
]
},
{
"cell_type": "markdown",
"id": "abcfcc14",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.7 The Need for Speed: vector vs for loop\n",
"\n",
"\n",
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "427fae1d",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"np.random.seed(1)\n",
"a = np.random.rand(10000000) # very large arrays\n",
"b = np.random.rand(10000000)\n",
"\n",
"tic = time.time() # capture start time\n",
"c = np.dot(a, b)\n",
"toc = time.time() # capture end time\n",
"\n",
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
"\n",
"tic = time.time() # capture start time\n",
"c = my_dot(a,b)\n",
"toc = time.time() # capture end time\n",
"\n",
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
"\n",
"del(a);del(b) #remove these big arrays from memory"
]
},
{
"cell_type": "markdown",
"id": "21897335",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"So, vectorization provides a large speed up in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
]
},
{
"cell_type": "markdown",
"id": "fc56006f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 3.4.8 Vector Vector operations in Course 1\n",
"\n",
"\n",
"Vector Vector operations will appear frequently in course 1. Here is why:\n",
"- Going forward, our examples will be stored in an array, `X_train` of dimension (m,n). This will be explained more in context, but here it is important to note it is a 2 Dimensional array or matrix (see next section on matrices).\n",
"- `w` will be a 1-dimensional vector of shape (n,).\n",
"- we will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example:`X[i]`\n",
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
"\n",
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9eac936c",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# show common Course 1 example\n",
"X = np.array([[1],[2],[3],[4]])\n",
"w = np.array([2])\n",
"c = np.dot(X[1], w)\n",
"\n",
"print(f\"X[1] has shape {X[1].shape}\")\n",
"print(f\"w has shape {w.shape}\")\n",
"print(f\"c has shape {c.shape}\")"
]
},
{
"cell_type": "markdown",
"id": "05e58839",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### 4 Matrices\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "9e5b0b36",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 4.1 Abstract\n",
"\n",
"\n",
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
"
\n",
"
\n",
" Generic Matrix Notation, 1st index is row, 2nd is column \n",
""
]
},
{
"cell_type": "markdown",
"id": "88c4fda5",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 4.2 NumPy Arrays\n",
"\n",
"\n",
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
"\n",
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
"- data creation\n",
"- slicing and indexing"
]
},
{
"cell_type": "markdown",
"id": "45cba502",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 4.3 Matrix Creation\n",
"\n",
"\n",
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
]
},
{
"cell_type": "markdown",
"id": "15f7050e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1269c0be",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"a = np.zeros((1, 5)) \n",
"print(f\"a shape = {a.shape}, a = {a}\") \n",
"\n",
"a = np.zeros((2, 1)) \n",
"print(f\"a shape = {a.shape}, a = {a}\") \n",
"\n",
"a = np.random.random_sample((1, 1)) \n",
"print(f\"a shape = {a.shape}, a = {a}\") "
]
},
{
"cell_type": "markdown",
"id": "f17b4454",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "203da595",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# NumPy routines which allocate memory and fill with user specified values\n",
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
"a = np.array([[5], # One can also\n",
" [4], # separate values\n",
" [3]]); #into separate rows\n",
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
]
},
{
"cell_type": "markdown",
"id": "fcebc352",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### 4.4 Operations on Matrices\n",
"\n",
"\n",
"Let's explore some operations using matrices."
]
},
{
"cell_type": "markdown",
"id": "0852ee28",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 4.4.1 Indexing\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "ca8c4a30",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb8fa67f",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"#vector indexing operations on matrices\n",
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
"\n",
"#access an element\n",
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
"\n",
"#access a row\n",
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
]
},
{
"cell_type": "markdown",
"id": "5d66eadf",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
]
},
{
"cell_type": "markdown",
"id": "2d803091",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Reshape** \n",
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
"`a = np.arange(6).reshape(-1, 2) ` \n",
"This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written: \n",
"`a = np.arange(6).reshape(3, 2) ` \n",
"To arrive at the same 3 row, 2 column array.\n",
"The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.\n"
]
},
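{
"cell_type": "markdown",
"id": "c5a1e0d5",
"metadata": {},
"source": [
"As a worked instance of the `-1` rule described above (a sketch, assuming the array size divides evenly by the given number of columns, which `reshape` requires):\n",
"$$ \\text{rows} = \\frac{\\text{size}}{\\text{columns}} = \\frac{6}{2} = 3 $$\n",
"so `np.arange(6).reshape(-1, 2)` has shape (3, 2)."
]
},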
{
"cell_type": "markdown",
"id": "118df767",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### 4.4.2 Slicing\n",
"\n",
"\n",
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "934be8fa",
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"scrolled": true
},
"outputs": [],
"source": [
"#vector 2-D slicing operations\n",
"a = np.arange(20).reshape(-1, 10)\n",
"print(f\"a = \\n{a}\")\n",
"\n",
"#access 5 consecutive elements (start:stop:step)\n",
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
"\n",
"#access 5 consecutive elements (start:stop:step) in two rows\n",
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
"\n",
"# access all elements\n",
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
"\n",
"# access all elements in one row (very common usage)\n",
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
"# same as\n",
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
]
},
{
"cell_type": "markdown",
"id": "3624960f",
"metadata": {},
"source": [
"### Practice Quiz "
]
},
{
"cell_type": "markdown",
"id": "a1516b3a",
"metadata": {},
"source": [
"#### Quiz-1 "
]
},
{
"cell_type": "markdown",
"id": "cc8002d6",
"metadata": {},
"source": [
"
\n",
""
]
},
{
"cell_type": "markdown",
"id": "6d64f2fc",
"metadata": {},
"source": [
"### Assignment W2: \n"
]
},
{
"cell_type": "markdown",
"id": "30e6a206",
"metadata": {},
"source": [
"#### Practice Lab: Linear Regression\n",
"\n",
"Welcome to your first practice lab! In this lab, you will implement linear regression with one variable to predict profits for a restaurant franchise.\n",
"\n",
"\n",
"##### Outline\n",
"- [ 1 - Packages ](#1)\n",
"- [ 2 - Linear regression with one variable ](#2)\n",
"- [ 2.1 Problem Statement](#2.1)\n",
"- [ 3 Dataset](#3)\n",
"- [ 4 Refresher on linear regression](#4)\n",
"- [ 5 Compute Cost](#5)\n",
" - [ Exercise 1](#ex01)\n",
"- [ 6 Gradient descent ](#6)\n",
" - [ Exercise 2](#ex02)\n",
" - [ 6.1 Learning parameters using batch gradient descent ](#6.1)\n"
]
},
{
"cell_type": "markdown",
"id": "22a5e94a",
"metadata": {},
"source": [
"#### 1 - Packages \n",
"\n",
"\n",
"First, let's run the cell below to import all the packages that you will need during this assignment.\n",
"- [numpy](www.numpy.org) is the fundamental package for working with matrices in Python.\n",
"- [matplotlib](http://matplotlib.org) is a famous library to plot graphs in Python.\n",
"- ``utils.py`` contains helper functions for this assignment. You do not need to modify code in this file.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebb0fd1d",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"#add modules from the path\n",
"sys.path.append(\"/home/amitk/my_web/Machine-Learning-Andrew-Ng/source/source_files/Supervised_Machine_Learning_Regression_and_Classification/week2/C1W2A1\")\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from utils import *\n",
"import copy\n",
"import math\n",
"%matplotlib inline \n",
"#to show graphs inline"
]
},
{
"cell_type": "markdown",
"id": "425b1ed5",
"metadata": {},
"source": [
"#### 2 - Problem Statement\n",
"\n",
"Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet.\n",
"- You would like to expand your business to cities that may give your restaurant higher profits.\n",
"- The chain already has restaurants in various cities and you have data for profits and populations from the cities.\n",
"- You also have data on cities that are candidates for a new restaurant. \n",
" - For these cities, you have the city population.\n",
" \n",
"Can you use the data to help you identify which cities may potentially give your business higher profits?\n",
"\n",
"#### 3 - Dataset\n",
"\n",
"You will start by loading the dataset for this task. \n",
"- The `load_data()` function shown below loads the data into variables `x_train` and `y_train`\n",
" - `x_train` is the population of a city\n",
" - `y_train` is the profit of a restaurant in that city. A negative value for profit indicates a loss. \n",
" - Both `X_train` and `y_train` are numpy arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb386022",
"metadata": {},
"outputs": [],
"source": [
"# load the dataset\n",
"x_train, y_train = load_data()"
]
},
{
"cell_type": "markdown",
"id": "492f2c9c",
"metadata": {},
"source": [
"##### View the variables\n",
"Before starting on any task, it is useful to get more familiar with your dataset. \n",
"- A good place to start is to just print out each variable and see what it contains.\n",
"\n",
"The code below prints the variable `x_train` and the type of the variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d19e9005",
"metadata": {},
"outputs": [],
"source": [
"# print x_train\n",
"print(\"Type of x_train:\",type(x_train))\n",
"print(\"First five elements of x_train are:\\n\", x_train[:5]) "
]
},
{
"cell_type": "markdown",
"id": "bb820aa5",
"metadata": {},
"source": [
"`x_train` is a numpy array that contains decimal values that are all greater than zero.\n",
"- These values represent the city population times 10,000\n",
"- For example, 6.1101 means that the population for that city is 61,101\n",
" \n",
"Now, let's print `y_train`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e65298e",
"metadata": {},
"outputs": [],
"source": [
"# print y_train\n",
"print(\"Type of y_train:\",type(y_train))\n",
"print(\"First five elements of y_train are:\\n\", y_train[:5]) "
]
},
{
"cell_type": "markdown",
"id": "4338e255",
"metadata": {},
"source": [
"Similarly, `y_train` is a numpy array that has decimal values, some negative, some positive.\n",
"- These represent your restaurant's average monthly profits in each city, in units of \\$10,000.\n",
" - For example, 17.592 represents \\$175,920 in average monthly profits for that city.\n",
" - -2.6807 represents -\\$26,807 in average monthly loss for that city."
]
},
{
"cell_type": "markdown",
"id": "f0bd0ace",
"metadata": {},
"source": [
"##### Check the dimensions of your variables\n",
"\n",
"Another useful way to get familiar with your data is to view its dimensions.\n",
"\n",
"Please print the shape of `x_train` and `y_train` and see how many training examples you have in your dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da8ceba3",
"metadata": {},
"outputs": [],
"source": [
"print ('The shape of x_train is:', x_train.shape)\n",
"print ('The shape of y_train is: ', y_train.shape)\n",
"print ('Number of training examples (m):', len(x_train))"
]
},
{
"cell_type": "markdown",
"id": "1edcdd3f",
"metadata": {},
"source": [
"The city population array has 97 data points, and the monthly average profits also has 97 data points. These are NumPy 1D arrays."
]
},
{
"cell_type": "markdown",
"id": "65fcd323",
"metadata": {},
"source": [
"##### Visualize your data\n",
"\n",
"It is often useful to understand the data by visualizing it. \n",
"- For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). \n",
"- Many other problems that you will encounter in real life have more than two properties (for example, population, average household income, monthly profits, monthly sales).When you have more than two properties, you can still use a scatter plot to see the relationship between each pair of properties.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd2f807a",
"metadata": {},
"outputs": [],
"source": [
"# Create a scatter plot of the data. To change the markers to red \"x\",\n",
"# we used the 'marker' and 'c' parameters\n",
"plt.scatter(x_train, y_train, marker='x', c='r') \n",
"\n",
"# Set the title\n",
"plt.title(\"Profits vs. Population per city\")\n",
"# Set the y-axis label\n",
"plt.ylabel('Profit in $10,000')\n",
"# Set the x-axis label\n",
"plt.xlabel('Population of City in 10,000s')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1294b699",
"metadata": {},
"source": [
"Your goal is to build a linear regression model to fit this data.\n",
"- With this model, you can then input a new city's population, and have the model estimate your restaurant's potential monthly profits for that city."
]
},
{
"cell_type": "markdown",
"id": "a86e4912",
"metadata": {},
"source": [
"#### 4 - Refresher on linear regression\n",
"\n",
"\n",
"\n",
"In this practice lab, you will fit the linear regression parameters $(w,b)$ to your dataset.\n",
"- The model function for linear regression, which is a function that maps from `x` (city population) to `y` (your restaurant's monthly profit for that city) is represented as \n",
" $$f_{w,b}(x) = wx + b$$\n",
" \n",
"\n",
"- To train a linear regression model, you want to find the best $(w,b)$ parameters that fit your dataset. \n",
"\n",
" - To compare how one choice of $(w,b)$ is better or worse than another choice, you can evaluate it with a cost function $J(w,b)$\n",
" - $J$ is a function of $(w,b)$. That is, the value of the cost $J(w,b)$ depends on the value of $(w,b)$.\n",
" \n",
" - The choice of $(w,b)$ that fits your data the best is the one that has the smallest cost $J(w,b)$.\n",
"\n",
"\n",
"- To find the values $(w,b)$ that gets the smallest possible cost $J(w,b)$, you can use a method called **gradient descent**. \n",
" - With each step of gradient descent, your parameters $(w,b)$ come closer to the optimal values that will achieve the lowest cost $J(w,b)$.\n",
" \n",
"\n",
"- The trained linear regression model can then take the input feature $x$ (city population) and output a prediction $f_{w,b}(x)$ (predicted monthly profit for a restaurant in that city)."
]
},
{
"cell_type": "markdown",
"id": "0819a0e6",
"metadata": {},
"source": [
"#### 5 - Compute Cost\n",
"\n",
"\n",
"Gradient descent involves repeated steps to adjust the value of your parameter $(w,b)$ to gradually get a smaller and smaller cost $J(w,b)$.\n",
"- At each step of gradient descent, it will be helpful for you to monitor your progress by computing the cost $J(w,b)$ as $(w,b)$ gets updated. \n",
"- In this section, you will implement a function to calculate $J(w,b)$ so that you can check the progress of your gradient descent implementation.\n",
"\n",
"##### Cost function\n",
"As you may recall from the lecture, for one variable, the cost function for linear regression $J(w,b)$ is defined as\n",
"\n",
"$$J(w,b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2$$ \n",
"\n",
"- You can think of $f_{w,b}(x^{(i)})$ as the model's prediction of your restaurant's profit, as opposed to $y^{(i)}$, which is the actual profit that is recorded in the data.\n",
"- $m$ is the number of training examples in the dataset\n",
"\n",
"##### Model prediction\n",
"\n",
"- For linear regression with one variable, the prediction of the model $f_{w,b}$ for an example $x^{(i)}$ is representented as:\n",
"\n",
"$$ f_{w,b}(x^{(i)}) = wx^{(i)} + b$$\n",
"\n",
"This is the equation for a line, with an intercept $b$ and a slope $w$\n",
"\n",
"##### Implementation\n",
"\n",
"Please complete the `compute_cost()` function below to compute the cost $J(w,b)$."
]
},
{
"cell_type": "markdown",
"id": "ed5704b1",
"metadata": {},
"source": [
"#### Exercise 1\n",
"\n",
"\n",
"Complete the `compute_cost` below to:\n",
"\n",
"* Iterate over the training examples, and for each example, compute:\n",
" * The prediction of the model for that example \n",
" $$\n",
" f_{wb}(x^{(i)}) = wx^{(i)} + b \n",
" $$\n",
" \n",
" * The cost for that example $$cost^{(i)} = (f_{wb} - y^{(i)})^2$$\n",
" \n",
"\n",
"* Return the total cost over all examples\n",
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} cost^{(i)}$$\n",
" * Here, $m$ is the number of training examples and $\\sum$ is the summation operator\n",
"\n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "575a6d9e",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C1\n",
"# GRADED FUNCTION: compute_cost\n",
"\n",
"def compute_cost(x, y, w, b): \n",
" \"\"\"\n",
" Computes the cost function for linear regression.\n",
" \n",
" Args:\n",
" x (ndarray): Shape (m,) Input to the model (Population of cities) \n",
" y (ndarray): Shape (m,) Label (Actual profits for the cities)\n",
" w, b (scalar): Parameters of the model\n",
" \n",
" Returns\n",
" total_cost (float): The cost of using w,b as the parameters for linear regression\n",
" to fit the data points in x and y\n",
" \"\"\"\n",
" # number of training examples\n",
" m = x.shape[0] \n",
" \n",
" # You need to return this variable correctly\n",
" total_cost = 0\n",
"\n",
" ### START CODE HERE ###\n",
" cost=0\n",
" for i in range(m):\n",
" f_wb = w*x[i]+b\n",
" cost += (f_wb - y[i])**2\n",
" \n",
" total_cost = cost/(2*m)\n",
" \n",
" ### END CODE HERE ### \n",
"\n",
" return total_cost"
]
},
{
"cell_type": "markdown",
"id": "4bdd9017",
"metadata": {},
"source": [
"\n",
" Click for hints\n",
" \n",
" \n",
" * You can represent a summation operator eg: $h = \\sum\\limits_{i = 0}^{m-1} 2i$ in code as follows:\n",
" ```python \n",
" h = 0\n",
" for i in range(m):\n",
" h = h + 2*i\n",
" ```\n",
" \n",
" * In this case, you can iterate over all the examples in `x` using a for loop and add the `cost` from each iteration to a variable (`cost_sum`) initialized outside the loop.\n",
"\n",
" * Then, you can return the `total_cost` as `cost_sum` divided by `2m`.\n",
" \n",
" \n",
" Click for more hints\n",
" \n",
" * Here's how you can structure the overall implementation for this function\n",
" ```python \n",
" def compute_cost(x, y, w, b):\n",
" # number of training examples\n",
" m = x.shape[0] \n",
" \n",
" # You need to return this variable correctly\n",
" total_cost = 0\n",
" \n",
" ### START CODE HERE ### \n",
" # Variable to keep track of sum of cost from each example\n",
" cost_sum = 0\n",
" \n",
" # Loop over training examples\n",
" for i in range(m):\n",
" # Your code here to get the prediction f_wb for the ith example\n",
" f_wb = \n",
" # Your code here to get the cost associated with the ith example\n",
" cost = \n",
" \n",
" # Add to sum of cost for each example\n",
" cost_sum = cost_sum + cost \n",
"\n",
" # Get the total cost as the sum divided by (2*m)\n",
" total_cost = (1 / (2 * m)) * cost_sum\n",
" ### END CODE HERE ### \n",
"\n",
" return total_cost\n",
" ```\n",
" \n",
" If you're still stuck, you can check the hints presented below to figure out how to calculate `f_wb` and `cost`.\n",
" \n",
" \n",
" Hint to calculate f_wb For scalars $a$, $b$ and $c$ (x[i], w and b are all scalars), you can calculate the equation $h = ab + c$ in code as h = a * b + c\n",
" \n",
" More hints to calculate f You can compute f_wb as f_wb = w * x[i] + b \n",
" \n",
" \n",
"\n",
" \n",
" Hint to calculate cost You can calculate the square of a variable z as z**2\n",
" \n",
" More hints to calculate cost You can compute cost as cost = (f_wb - y[i]) ** 2\n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
" \n"
]
},
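{
"cell_type": "markdown",
"id": "a7c01e55",
"metadata": {},
"source": [
"The loop in `compute_cost` can also be written without an explicit Python loop by letting NumPy operate on the whole array at once. The sketch below is one possible vectorized variant, not the graded solution; the name `compute_cost_vectorized` and the three-point toy dataset are assumptions introduced only for this illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def compute_cost_vectorized(x, y, w, b):\n",
"    # Hypothetical vectorized variant of compute_cost (illustration only)\n",
"    m = x.shape[0]\n",
"    # Predictions for all m examples at once: f_wb[i] = w * x[i] + b\n",
"    f_wb = w * x + b\n",
"    # Sum of squared errors divided by 2m\n",
"    return np.sum((f_wb - y) ** 2) / (2 * m)\n",
"\n",
"# Toy data (assumed for this example, not the restaurant dataset)\n",
"x_toy = np.array([1.0, 2.0, 3.0])\n",
"y_toy = np.array([2.0, 4.0, 6.0])\n",
"\n",
"cost_perfect = compute_cost_vectorized(x_toy, y_toy, 2.0, 0.0)  # line passes through every point\n",
"cost_off = compute_cost_vectorized(x_toy, y_toy, 1.0, 0.0)      # errors are -1, -2, -3\n",
"print(cost_perfect, cost_off)\n",
"```\n",
"\n",
"With $w=1, b=0$ the squared errors are $1, 4, 9$, so the cost is $\\frac{14}{2 \\cdot 3} \\approx 2.33$, which is a quick way to check the formula by hand."
]
},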
{
"cell_type": "markdown",
"id": "ac5bcb29",
"metadata": {},
"source": [
"You can check if your implementation was correct by running the following test code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e52a6e0b",
"metadata": {},
"outputs": [],
"source": [
"# Compute cost with some initial values for paramaters w, b\n",
"initial_w = 2\n",
"initial_b = 1\n",
"\n",
"cost = compute_cost(x_train, y_train, initial_w, initial_b)\n",
"print(type(cost))\n",
"print(f'Cost at initial w (zeros): {cost:.3f}')\n",
"\n",
"# Public tests\n",
"from public_tests import *\n",
"compute_cost_test(compute_cost)"
]
},
{
"cell_type": "markdown",
"id": "1ff4d7d1",
"metadata": {},
"source": [
"**Expected Output**:\n",
"
\n",
"
\n",
"
Cost at initial w (zeros): 75.203
\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "39fdb569",
"metadata": {},
"source": [
"#### 6 - Gradient descent \n",
"\n",
"\n",
"In this section, you will implement the gradient for parameters $w, b$ for linear regression. "
]
},
{
"cell_type": "markdown",
"id": "c22f102d",
"metadata": {},
"source": [
"As described in the lecture videos, the gradient descent algorithm is:\n",
"\n",
"$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & \\phantom {0000} b := b - \\alpha \\frac{\\partial J(w,b)}{\\partial b} \\newline \\; & \\phantom {0000} w := w - \\alpha \\frac{\\partial J(w,b)}{\\partial w} \\tag{1} \\; & \n",
"\\newline & \\rbrace\\end{align*}$$\n",
"\n",
"where, parameters $w, b$ are both updated simultaniously and where \n",
"$$\n",
"\\frac{\\partial J(w,b)}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \\tag{2}\n",
"$$\n",
"$$\n",
"\\frac{\\partial J(w,b)}{\\partial w} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) -y^{(i)})x^{(i)} \\tag{3}\n",
"$$\n",
"* m is the number of training examples in the dataset\n",
"\n",
" \n",
"* $f_{w,b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$, is the target value\n",
"\n",
"\n",
"You will implement a function called `compute_gradient` which calculates $\\frac{\\partial J(w)}{\\partial w}$, $\\frac{\\partial J(w)}{\\partial b}$ "
]
},
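{
"cell_type": "markdown",
"id": "b3d9f2a6",
"metadata": {},
"source": [
"Equations (2) and (3) follow from differentiating the cost $J(w,b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2$ with the chain rule. A brief derivation, using $f_{w,b}(x^{(i)}) = wx^{(i)} + b$:\n",
"\n",
"$$\n",
"\\frac{\\partial J(w,b)}{\\partial w} = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} 2\\,(wx^{(i)} + b - y^{(i)})\\,x^{(i)} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)}\n",
"$$\n",
"$$\n",
"\\frac{\\partial J(w,b)}{\\partial b} = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} 2\\,(wx^{(i)} + b - y^{(i)}) \\cdot 1 = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})\n",
"$$\n",
"\n",
"The factor of 2 produced by the square cancels the $\\frac{1}{2}$ in the cost, which is why the gradients carry $\\frac{1}{m}$ rather than $\\frac{1}{2m}$."
]
},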
{
"cell_type": "markdown",
"id": "5162e280",
"metadata": {},
"source": [
"##### Exercise 2\n",
"\n",
"\n",
"Please complete the `compute_gradient` function to:\n",
"\n",
"* Iterate over the training examples, and for each example, compute:\n",
" * The prediction of the model for that example \n",
" $$\n",
" f_{wb}(x^{(i)}) = wx^{(i)} + b \n",
" $$\n",
" \n",
" * The gradient for the parameters $w, b$ from that example \n",
" $$\n",
" \\frac{\\partial J(w,b)}{\\partial b}^{(i)} = (f_{w,b}(x^{(i)}) - y^{(i)}) \n",
" $$\n",
" $$\n",
" \\frac{\\partial J(w,b)}{\\partial w}^{(i)} = (f_{w,b}(x^{(i)}) -y^{(i)})x^{(i)} \n",
" $$\n",
" \n",
"\n",
"* Return the total gradient update from all the examples\n",
" $$\n",
" \\frac{\\partial J(w,b)}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} \\frac{\\partial J(w,b)}{\\partial b}^{(i)}\n",
" $$\n",
" \n",
" $$\n",
" \\frac{\\partial J(w,b)}{\\partial w} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} \\frac{\\partial J(w,b)}{\\partial w}^{(i)} \n",
" $$\n",
" * Here, $m$ is the number of training examples and $\\sum$ is the summation operator\n",
"\n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd59d520",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C2\n",
"# GRADED FUNCTION: compute_gradient\n",
"def compute_gradient(x, y, w, b): \n",
" \"\"\"\n",
" Computes the gradient for linear regression \n",
" Args:\n",
" x (ndarray): Shape (m,) Input to the model (Population of cities) \n",
" y (ndarray): Shape (m,) Label (Actual profits for the cities)\n",
" w, b (scalar): Parameters of the model \n",
" Returns\n",
" dj_dw (scalar): The gradient of the cost w.r.t. the parameters w\n",
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b \n",
" \"\"\"\n",
" \n",
" # Number of training examples\n",
" m = x.shape[0]\n",
" \n",
" # You need to return the following variables correctly\n",
" dj_dw = 0\n",
" dj_db = 0\n",
" \n",
" ### START CODE HERE ### \n",
" for i in range(m):\n",
" f_wb = w*x[i]+b\n",
" dj_db += f_wb - y[i]\n",
" dj_dw += (f_wb - y[i])*x[i]\n",
" dj_dw /= m\n",
" dj_db /= m\n",
" \n",
" ### END CODE HERE ### \n",
" \n",
" return dj_dw, dj_db"
]
},
{
"cell_type": "markdown",
"id": "03265cc0",
"metadata": {},
"source": [
"\n",
" Click for hints\n",
" \n",
" * You can represent a summation operator eg: $h = \\sum\\limits_{i = 0}^{m-1} 2i$ in code as follows:\n",
" ```python \n",
" h = 0\n",
" for i in range(m):\n",
" h = h + 2*i\n",
" ```\n",
" \n",
" * In this case, you can iterate over all the examples in `x` using a for loop and for each example, keep adding the gradient from that example to the variables `dj_dw` and `dj_db` which are initialized outside the loop. \n",
"\n",
" * Then, you can return `dj_dw` and `dj_db` both divided by `m`. \n",
" \n",
" Click for more hints\n",
" \n",
" * Here's how you can structure the overall implementation for this function\n",
" ```python \n",
" def compute_gradient(x, y, w, b): \n",
" \"\"\"\n",
" Computes the gradient for linear regression \n",
" Args:\n",
" x (ndarray): Shape (m,) Input to the model (Population of cities) \n",
" y (ndarray): Shape (m,) Label (Actual profits for the cities)\n",
" w, b (scalar): Parameters of the model \n",
" Returns\n",
" dj_dw (scalar): The gradient of the cost w.r.t. the parameters w\n",
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b \n",
" \"\"\"\n",
" \n",
" # Number of training examples\n",
" m = x.shape[0]\n",
" \n",
" # You need to return the following variables correctly\n",
" dj_dw = 0\n",
" dj_db = 0\n",
" \n",
" ### START CODE HERE ### \n",
" # Loop over examples\n",
" for i in range(m): \n",
" # Your code here to get prediction f_wb for the ith example\n",
" f_wb = \n",
" \n",
" # Your code here to get the gradient for w from the ith example \n",
" dj_dw_i = \n",
" \n",
" # Your code here to get the gradient for b from the ith example \n",
" dj_db_i = \n",
" \n",
" # Update dj_db : In Python, a += 1 is the same as a = a + 1\n",
" dj_db += dj_db_i\n",
" \n",
" # Update dj_dw\n",
" dj_dw += dj_dw_i\n",
" \n",
" # Divide both dj_dw and dj_db by m\n",
" dj_dw = dj_dw / m\n",
" dj_db = dj_db / m\n",
" ### END CODE HERE ### \n",
" \n",
" return dj_dw, dj_db\n",
" ```\n",
" \n",
" If you're still stuck, you can check the hints presented below to figure out how to calculate `f_wb` and `cost`.\n",
" \n",
" \n",
" Hint to calculate f_wb\n",
" You did this in the previous exercise! For scalars $a$, $b$ and $c$ (x[i], w and b are all scalars), you can calculate the equation $h = ab + c$ in code as h = a * b + c\n",
" \n",
" More hints to calculate f\n",
" You can compute f_wb as f_wb = w * x[i] + b \n",
" \n",
" \n",
" \n",
" \n",
" Hint to calculate dj_dw_i\n",
" For scalars $a$, $b$ and $c$ (f_wb, y[i] and x[i] are all scalars), you can calculate the equation $h = (a - b)c$ in code as h = (a-b)*c\n",
" \n",
" More hints to calculate f\n",
" You can compute dj_dw_i as dj_dw_i = (f_wb - y[i]) * x[i] \n",
" \n",
" \n",
" \n",
" \n",
" Hint to calculate dj_db_i\n",
" You can compute dj_db_i as dj_db_i = f_wb - y[i] \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n"
]
},
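{
"cell_type": "markdown",
"id": "c48e7b19",
"metadata": {},
"source": [
"One way to gain confidence in a gradient implementation is to compare it with a finite-difference approximation of the cost. The sketch below does this on a small toy dataset. The helper names `cost_fn`, `grad_fn` and `numerical_gradient`, the toy data, and the step size `eps` are assumptions made for this illustration and are not part of the graded lab.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def cost_fn(x, y, w, b):\n",
"    # Squared-error cost J(w,b) = 1/(2m) * sum((w*x + b - y)^2)\n",
"    return np.sum((w * x + b - y) ** 2) / (2 * x.shape[0])\n",
"\n",
"def grad_fn(x, y, w, b):\n",
"    # Analytic gradient from equations (2) and (3), returned as (dj_dw, dj_db)\n",
"    err = w * x + b - y\n",
"    return np.mean(err * x), np.mean(err)\n",
"\n",
"def numerical_gradient(x, y, w, b, eps=1e-6):\n",
"    # Central differences: (J(p + eps) - J(p - eps)) / (2 * eps)\n",
"    dj_dw = (cost_fn(x, y, w + eps, b) - cost_fn(x, y, w - eps, b)) / (2 * eps)\n",
"    dj_db = (cost_fn(x, y, w, b + eps) - cost_fn(x, y, w, b - eps)) / (2 * eps)\n",
"    return dj_dw, dj_db\n",
"\n",
"# Toy data (assumed for this example)\n",
"x_toy = np.array([1.0, 2.0, 3.0, 4.0])\n",
"y_toy = np.array([1.5, 3.5, 6.0, 8.5])\n",
"\n",
"analytic = grad_fn(x_toy, y_toy, 0.5, 0.2)\n",
"numeric = numerical_gradient(x_toy, y_toy, 0.5, 0.2)\n",
"print('analytic:', analytic)\n",
"print('numeric: ', numeric)\n",
"```\n",
"\n",
"If the two pairs agree to several decimal places, the analytic gradient is very likely correct. A large mismatch usually points to a sign error or a missing factor of $x^{(i)}$ in the $w$ gradient."
]
},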
{
"cell_type": "markdown",
"id": "247d6609",
"metadata": {},
"source": [
"Run the cells below to check your implementation of the `compute_gradient` function with two different initializations of the parameters $w$,$b$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9b34b8a",
"metadata": {},
"outputs": [],
"source": [
"# Compute and display gradient with w initialized to zeroes\n",
"initial_w = 0\n",
"initial_b = 0\n",
"\n",
"tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, initial_w, initial_b)\n",
"print('Gradient at initial w, b (zeros):', tmp_dj_dw, tmp_dj_db)\n",
"\n",
"compute_gradient_test(compute_gradient)"
]
},
{
"cell_type": "markdown",
"id": "cd7603c1",
"metadata": {},
"source": [
"Now let's run the gradient descent algorithm implemented above on our dataset.\n",
"\n",
"**Expected Output**:\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "d13b681c",
"metadata": {},
"source": [
"##### 6.1 Learning parameters using batch gradient descent \n",
"\n",
"\n",
"You will now find the optimal parameters of a linear regression model by using batch gradient descent. Recall batch refers to running all the examples in one iteration.\n",
"- You don't need to implement anything for this part. Simply run the cells below. \n",
"\n",
"- A good way to verify that gradient descent is working correctly is to look\n",
"at the value of $J(w,b)$ and check that it is decreasing with each step. \n",
"\n",
"- Assuming you have implemented the gradient and computed the cost correctly and you have an appropriate value for the learning rate alpha, $J(w,b)$ should never increase and should converge to a steady value by the end of the algorithm."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47307808",
"metadata": {},
"outputs": [],
"source": [
"def gradient_descent(x, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): \n",
" \"\"\"\n",
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
" num_iters gradient steps with learning rate alpha\n",
" \n",
" Args:\n",
" x : (ndarray): Shape (m,)\n",
" y : (ndarray): Shape (m,)\n",
" w_in, b_in : (scalar) Initial values of parameters of the model\n",
" cost_function: function to compute cost\n",
" gradient_function: function to compute the gradient\n",
" alpha : (float) Learning rate\n",
" num_iters : (int) number of iterations to run gradient descent\n",
" Returns\n",
" w : (ndarray): Shape (1,) Updated values of parameters of the model after\n",
" running gradient descent\n",
" b : (scalar) Updated value of parameter of the model after\n",
" running gradient descent\n",
" \"\"\"\n",
" \n",
" # number of training examples\n",
" m = len(x)\n",
" \n",
" # An array to store cost J and w's at each iteration — primarily for graphing later\n",
" J_history = []\n",
" w_history = []\n",
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
" b = b_in\n",
" \n",
" for i in range(num_iters):\n",
"\n",
" # Calculate the gradient and update the parameters\n",
" dj_dw, dj_db = gradient_function(x, y, w, b ) \n",
" \n",
" # Update Parameters using w, b, alpha and gradient\n",
" w = w - alpha * dj_dw \n",
" b = b - alpha * dj_db \n",
"\n",
" # Save cost J at each iteration\n",
" if i<100000: # prevent resource exhaustion \n",
" cost = cost_function(x, y, w, b)\n",
" J_history.append(cost)\n",
"\n",
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
" if i% math.ceil(num_iters/10) == 0:\n",
" w_history.append(w)\n",
" print(f\"Iteration {i:4}: Cost {float(J_history[-1]):8.2f} \")\n",
" \n",
" return w, b, J_history, w_history #return w and J,w history for graphing"
]
},
{
"cell_type": "markdown",
"id": "3302fed3",
"metadata": {},
"source": [
"Now let's run the gradient descent algorithm above to learn the parameters for our dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c16c485b",
"metadata": {},
"outputs": [],
"source": [
"# initialize fitting parameters. Recall that the shape of w is (n,)\n",
"initial_w = 20\n",
"initial_b = 5\n",
"\n",
"# some gradient descent settings\n",
"iterations = 15000\n",
"alpha = 0.01\n",
"\n",
"w,b,_,_ = gradient_descent(x_train ,y_train, initial_w, initial_b, compute_cost, compute_gradient, alpha, iterations)\n",
"print(\"w,b found by gradient descent:\", w, b)"
]
},
{
"cell_type": "markdown",
"id": "fe7c4c79",
"metadata": {},
"source": [
"**Expected Output**:\n",
"
\n",
"
\n",
"
w, b found by gradient descent
\n",
"
1.16636235 -3.63029143940436
\n",
"
\n",
"
"
]
},
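{
"cell_type": "markdown",
"id": "d51a6c3e",
"metadata": {},
"source": [
"The claim that $J(w,b)$ should not increase when the learning rate is appropriate can be checked directly on the cost history. The sketch below is a self-contained illustration on synthetic data. The function `run_gd`, the noise-free toy data, and the two learning rates are assumptions chosen for this example, not values taken from the lab.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def run_gd(x, y, alpha, num_iters):\n",
"    # Plain batch gradient descent from w = b = 0; returns the cost after each step\n",
"    w, b = 0.0, 0.0\n",
"    history = []\n",
"    for _ in range(num_iters):\n",
"        err = w * x + b - y\n",
"        dj_dw, dj_db = np.mean(err * x), np.mean(err)\n",
"        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update\n",
"        history.append(np.mean((w * x + b - y) ** 2) / 2)\n",
"    return np.array(history)\n",
"\n",
"# Synthetic data on an exact line (assumed for this example)\n",
"x_toy = np.linspace(5, 25, 50)\n",
"y_toy = 1.2 * x_toy - 3.5\n",
"\n",
"good = run_gd(x_toy, y_toy, alpha=0.001, num_iters=200)\n",
"bad = run_gd(x_toy, y_toy, alpha=0.1, num_iters=20)\n",
"\n",
"print('small alpha, cost never increases:', bool(np.all(np.diff(good) <= 0)))\n",
"print('large alpha, cost increases:', bool(np.any(np.diff(bad) > 0)))\n",
"```\n",
"\n",
"Checking `np.diff` of the cost history is a cheap diagnostic: any positive difference suggests that the learning rate is too large for the scale of the inputs."
]
},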
{
"cell_type": "markdown",
"id": "0c0460e9",
"metadata": {},
"source": [
"We will now use the final parameters from gradient descent to plot the linear fit. \n",
"\n",
"Recall that we can get the prediction for a single example $f(x^{(i)})= wx^{(i)}+b$. \n",
"\n",
"To calculate the predictions on the entire dataset, we can loop through all the training examples and calculate the prediction for each example. This is shown in the code block below."
]
},
{
"cell_type": "markdown",
"id": "908a3ada",
"metadata": {},
"source": [
"##### Assignment2: My solution"
]
},
{
"cell_type": "markdown",
"id": "35d5b587",
"metadata": {},
"source": [
"\n",
" My Solution\n",
" \n",
" ```python\n",
" \n",
"#my solution: Dictate learning rate automatically,costrain parameter within boundry\n",
"\n",
"import sys\n",
"#add modules from the path\n",
"sys.path.append(\"/home/amitk/my_web/Machine-Learning-Andrew-Ng/source/source_files/Supervised_Machine_Learning_Regression_and_Classification/week2/C1W2A1\")\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from utils import *\n",
"import copy\n",
"import math\n",
"%matplotlib inline \n",
"\n",
"#to show graphs inline\n",
"# load the dataset\n",
"#x_train, y_train = load_data()\n",
"x_train=np.linspace(5,25,100)\n",
"y_train= 3*x_train-8 + np.random.normal(0,1,len(x_train)) \n",
"\n",
"\n",
"def model(x,theta):\n",
" w,b=theta\n",
" return w*x+b\n",
"\n",
"def dmodel_w(x,theta): \n",
" w,b=theta\n",
" return x\n",
"\n",
"def dmodel_b(x,theta): \n",
" w,b=theta\n",
" return 1.\n",
"\n",
"\n",
"def cost(x,theta,y):\n",
" cf= ( model(x,theta) - y)**2\n",
" return np.sum(cf)/2/np.shape(x_train)[0]\n",
"\n",
"def dcost_w(x,theta,y):\n",
" return np.sum((model(x,theta)-y)*dmodel_w(x,theta))/len(x)\n",
"\n",
"def dcost_b(x,theta,y):\n",
" return np.sum((model(x,theta)-y)*dmodel_b(x,theta))/len(x)\n",
" \n",
"def compute_gradient(x,theta,y):\n",
" return dcost_w(x,theta,y),dcost_b(x,theta,y)\n",
"\n",
"np.set_printoptions(precision=2)\n",
"def gradient_decent(x,y,theta,alpha,niter):\n",
" w,b=theta\n",
" if theta[1]>0: #constraining parameters\n",
" b=-theta[1]\n",
" cost_i=np.zeros(niter)\n",
" for i in np.arange(niter):\n",
" if i>1:\n",
" if np.abs((cost_i[i]-cost_i[i-1])/cost_i[i])<0.05:\n",
" alpha/=2\n",
" \n",
" dcw,dcb= compute_gradient(x,theta,y)\n",
" w = w-alpha*dcw\n",
" b = b-alpha*dcb\n",
" theta=w,b\n",
" cost_i[i]=cost(x,theta,y)\n",
" if i>1:\n",
" if cost_i[i]>cost_i[i-1]:\n",
" alpha/=2\n",
" #print(cost_i[i],alpha)\n",
" #print(theta) \n",
" return cost_i,theta\n",
"\n",
" \n",
" \n",
"niter=10000\n",
"Win=20\n",
"Bin=5\n",
"alpha=0.5\n",
"theta_in=Win,Bin\n",
"grad_dec_result,theta_f=gradient_decent(x_train,y_train,theta_in,alpha,niter) \n",
"\n",
"wf,bf=theta_f\n",
"print(wf,bf,grad_dec_result[-1])\n",
"#print(compute_gradient(x_train,y_train,0.2,0.2))\n",
"ax=plt.subplot(121)\n",
"\n",
"plt.plot(np.arange(niter),grad_dec_result,\".\")\n",
"plt.yscale(\"log\")\n",
"plt.xlabel(\"No of steps\")\n",
"plt.ylabel(\"Cost function\")\n",
"plt.ylim(bottom=0.01)\n",
"#plt.xlim(0,100)\n",
"#plt.show()\n",
"m = x_train.shape[0]\n",
"predictedamit = np.zeros(m)\n",
"\n",
"for i in range(m):\n",
" predictedamit[i] = wf * x_train[i] + bf\n",
"\n",
" \n",
"ax=plt.subplot(122) \n",
"# Plot the linear fit\n",
"#plt.plot(x_train, predicted, c = \"b\")\n",
"plt.plot(x_train, predictedamit, c = \"g\",label=\"Predcited model\")\n",
"\n",
"# Create a scatter plot of the data. \n",
"plt.scatter(x_train, y_train, marker='x', c='r') \n",
"\n",
"# Set the title\n",
"plt.title(\"Model fit\")\n",
"# Set the y-axis label\n",
"plt.ylabel('training data')\n",
"# Set the x-axis label\n",
"plt.xlabel('training input') \n",
"plt.legend()\n",
"plt.tight_layout()\n",
" \n",
" ```\n",
"\n",
"\n",
"\n",
"\n",
" \n"
]
},
{
"cell_type": "markdown",
"id": "88f79b65",
"metadata": {},
"source": [
"\n",
" How to write summary\n",
" \n",
" ```python\n",
" import math\n",
" %matplotlib inline \n",
" plt.xlabel('Area of triangle')\n",
" ```\n",
"\n",
" \n",
" See hints\n",
" \n",
" ```python\n",
" import math\n",
" %matplotlib inline \n",
" plt.xlabel('Area of triangle')\n",
" ```\n",
" \n",
" \n",
"\n",
" \n",
"\n",
"\n",
"\n",
" \n",
" Hint to calculate f_wb For scalars $a$, $b$ and $c$ (x[i], w and b are all scalars), you can calculate the equation $h = ab + c$ in code as h = a * b + c\n",
" \n",
" More hints to calculate f You can compute f_wb as f_wb = w * x[i] + b \n",
" \n",
" \n",
"\n",
" \n",
" Hint to calculate cost You can calculate the square of a variable z as z**2\n",
" \n",
" More hints to calculate cost You can compute cost as cost = (f_wb - y[i]) ** 2\n",
" \n",
" \n",
"\n",
"\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1180e202",
"metadata": {},
"outputs": [],
"source": [
"m = x_train.shape[0]\n",
"predicted = np.zeros(m)\n",
"\n",
"for i in range(m):\n",
" predicted[i] = w * x_train[i] + b\n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"id": "c973eb09",
"metadata": {},
"source": [
"We will now plot the predicted values to see the linear fit."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "238e66d0",
"metadata": {},
"outputs": [],
"source": [
"# Plot the linear fit\n",
"plt.plot(x_train, predicted, c = \"b\")\n",
"#plt.plot(x_train, predictedamit, c = \"g\")\n",
"\n",
"# Create a scatter plot of the data. \n",
"plt.scatter(x_train, y_train, marker='x', c='r') \n",
"\n",
"# Set the title\n",
"plt.title(\"Profits vs. Population per city\")\n",
"# Set the y-axis label\n",
"plt.ylabel('Profit in $10,000')\n",
"# Set the x-axis label\n",
"plt.xlabel('Population of City in 10,000s')"
]
},
{
"cell_type": "markdown",
"id": "e3f2dba9",
"metadata": {},
"source": [
"Your final values of $w,b$ can also be used to make predictions on profits. Let's predict what the profit would be in areas of 35,000 and 70,000 people. \n",
"\n",
"- The model takes in population of a city in 10,000s as input. \n",
"\n",
"- Therefore, 35,000 people can be translated into an input to the model as `np.array([3.5])`\n",
"\n",
"- Similarly, 70,000 people can be translated into an input to the model as `np.array([7.])`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0395000c",
"metadata": {},
"outputs": [],
"source": [
"predict1 = 3.5 * w + b\n",
"print('For population = 35,000, we predict a profit of $%.2f' % (predict1*10000))\n",
"\n",
"predict2 = 7.0 * w + b\n",
"print('For population = 70,000, we predict a profit of $%.2f' % (predict2*10000))"
]
},
{
"cell_type": "markdown",
"id": "4029bda9",
"metadata": {},
"source": [
"**Expected Output**:\n",
"
\n"
]
},
{
"cell_type": "markdown",
"id": "f2acf972",
"metadata": {},
"source": [
"## Module - 3"
]
},
{
"cell_type": "markdown",
"id": "d5c72459",
"metadata": {},
"source": [
"### Optional Lab W3"
]
},
{
"cell_type": "markdown",
"id": "3941ceb8",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.1: Classification\n",
"\n",
"In this lab, you will contrast regression and classification."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbd4f7cc",
"metadata": {},
"outputs": [],
"source": [
"import os,sys\n",
"proj_path=f\"{os.environ['HOME']}/my_web/Machine-Learning-Andrew-Ng\"\n",
"module3=f\"{proj_path}/source/source_files/Supervised_Machine_Learning_Regression_and_Classification/\"\n",
"os.chdir(module3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2572523f",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.style.use(\"week3/OptionalLabs/deeplearning.mplstyle\")\n",
"sys.path.append(f\"{module3}/week3/OptionalLabs\")\n",
"from lab_utils_common import dlc, plot_data\n",
"from plt_one_addpt_onclick import plt_one_addpt_onclick\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "f31ffb13",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Classification Problems\n",
" Examples of classification problems are things like: identifying email as Spam or Not Spam or determining if a tumor is malignant or benign. In particular, these are examples of *binary* classification where there are two possible outcomes. Outcomes can be described in pairs of 'positive'/'negative' such as 'yes'/'no, 'true'/'false' or '1'/'0'. \n",
"\n",
"Plots of classification data sets often use symbols to indicate the outcome of an example. In the plots below, 'X' is used to represent the positive values while 'O' represents negative outcomes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef30f6ed",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"x_train = np.array([0., 1, 2, 3, 4, 5])\n",
"y_train = np.array([0, 0, 0, 1, 1, 1])\n",
"X_train2 = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
"y_train2 = np.array([0, 0, 0, 1, 1, 1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "555a8709",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"pos = y_train == 1\n",
"neg = y_train == 0\n",
"\n",
"fig,ax = plt.subplots(1,2,figsize=(8,3))\n",
"#plot 1, single variable\n",
"ax[0].scatter(x_train[pos], y_train[pos], marker='x', s=80, c = 'red', label=\"y=1\")\n",
"ax[0].scatter(x_train[neg], y_train[neg], marker='o', s=100, label=\"y=0\", facecolors='none', edgecolors=dlc[\"dlblue\"],lw=3)\n",
"\n",
"ax[0].set_ylim(-0.08,1.1)\n",
"ax[0].set_ylabel('y', fontsize=12)\n",
"ax[0].set_xlabel('x', fontsize=12)\n",
"ax[0].set_title('one variable plot')\n",
"ax[0].legend()\n",
"\n",
"#plot 2, two variables\n",
"plot_data(X_train2, y_train2, ax[1])\n",
"ax[1].axis([0, 4, 0, 4])\n",
"ax[1].set_ylabel('$x_1$', fontsize=12)\n",
"ax[1].set_xlabel('$x_0$', fontsize=12)\n",
"ax[1].set_title('two variable plot')\n",
"ax[1].legend()\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"id": "4110c136",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Note in the plots above:\n",
"- In the single variable plot, positive results are shown both a red 'X's and as y=1. Negative results are blue 'O's and are located at y=0.\n",
" - Recall in the case of linear regression, y would not have been limited to two values but could have been any value.\n",
"- In the two-variable plot, the y axis is not available. Positive results are shown as red 'X's, while negative results use the blue 'O' symbol.\n",
" - Recall in the case of linear regression with multiple variables, y would not have been limited to two values and a similar plot would have been three-dimensional."
]
},
{
"cell_type": "markdown",
"id": "5590635c",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Linear Regression approach\n",
"In the previous week, you applied linear regression to build a prediction model. Let's try that approach here using the simple example that was described in the lecture. The model will predict if a tumor is benign or malignant based on tumor size. Try the following:\n",
"- Click on 'Run Linear Regression' to find the best linear regression model for the given data.\n",
" - Note the resulting linear model does **not** match the data well. \n",
"One option to improve the results is to apply a *threshold*. \n",
"- Tick the box on the 'Toggle 0.5 threshold' to show the predictions if a threshold is applied.\n",
" - These predictions look good, the predictions match the data\n",
"- *Important*: Now, add further 'malignant' data points on the far right, in the large tumor size range (near 10), and re-run linear regression.\n",
" - Now, the model predicts the larger tumor, but data point at x=3 is being incorrectly predicted!\n",
"- to clear/renew the plot, rerun the cell containing the plot command."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de29056d",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"w_in = np.zeros((1))\n",
"b_in = 0\n",
"plt.close('all') \n",
"addpt = plt_one_addpt_onclick( x_train,y_train, w_in, b_in, logistic=False)"
]
},
{
"cell_type": "markdown",
"id": "4a7b4314",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The example above demonstrates that the linear model is insufficient to model categorical data. The model can be extended as described in the following lab."
]
},
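{
"cell_type": "markdown",
"id": "c41d7e9a",
"metadata": {},
"source": [
"The effect seen in the interactive plot can also be reproduced numerically. The cell below is a minimal sketch, not the code behind `plt_one_addpt_onclick`: it assumes an ordinary least-squares fit via `np.polyfit` and three hypothetical extra 'malignant' points at x = 9, 10 and 11. The helper `fit_and_predict` and the `*_small` / `*_big` names are illustrative only. It applies the 0.5 threshold before and after the extra points are added and lists any training example whose prediction no longer matches its target."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e8b35f06",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def fit_and_predict(x, y, threshold=0.5):\n",
"    # Least-squares line fit followed by a threshold (illustrative helper)\n",
"    w, b = np.polyfit(x, y, 1)          # slope and intercept of the best-fit line\n",
"    return (w * x + b >= threshold).astype(int)\n",
"\n",
"x_small = np.array([0., 1, 2, 3, 4, 5])\n",
"y_small = np.array([0, 0, 0, 1, 1, 1])\n",
"pred_small = fit_and_predict(x_small, y_small)\n",
"print('original data  :', pred_small, 'targets:', y_small)\n",
"\n",
"# Hypothetical extra malignant examples with large tumor sizes\n",
"x_big = np.append(x_small, [9., 10., 11.])\n",
"y_big = np.append(y_small, [1, 1, 1])\n",
"pred_big = fit_and_predict(x_big, y_big)\n",
"print('with new points:', pred_big, 'targets:', y_big)\n",
"print('misclassified x:', x_big[pred_big != y_big])"
]
},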
{
"cell_type": "markdown",
"id": "ab2364fc",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"In this lab you:\n",
"- explored categorical data sets and plotting\n",
"- determined that linear regression was insufficient for a classification problem."
]
},
{
"cell_type": "markdown",
"id": "9ffe5098",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.2: Logistic Regression\n",
"\n",
"In this ungraded lab, you will \n",
"- explore the sigmoid function (also known as the logistic function)\n",
"- explore logistic regression; which uses the sigmoid function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c92ccec",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"from plt_one_addpt_onclick import plt_one_addpt_onclick\n",
"from lab_utils_common import draw_vthresh\n",
"plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"id": "39a36487",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Sigmoid or Logistic Function\n",
"As discussed in the lecture videos, for a classification task, we can start by using our linear regression model, $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b$, to predict $y$ given $x$. \n",
"- However, we would like the predictions of our classification model to be between 0 and 1 since our output variable $y$ is either 0 or 1. \n",
"- This can be accomplished by using a \"sigmoid function\" which maps all input values to values between 0 and 1. \n",
"\n",
"\n",
"Let's implement the sigmoid function and see this for ourselves.\n",
"\n",
"##### Formula for Sigmoid function\n",
"\n",
"The formula for a sigmoid function is as follows - \n",
"\n",
"$g(z) = \\frac{1}{1+e^{-z}}\\tag{1}$\n",
"\n",
"In the case of logistic regression, z (the input to the sigmoid function), is the output of a linear regression model. \n",
"- In the case of a single example, $z$ is scalar.\n",
"- in the case of multiple examples, $z$ may be a vector consisting of $m$ values, one for each example. \n",
"- The implementation of the sigmoid function should cover both of these potential input formats.\n",
"Let's implement this in Python."
]
},
{
"cell_type": "markdown",
"id": "256fca46",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"NumPy has a function called [`exp()`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), which offers a convenient way to calculate the exponential ( $e^{z}$) of all elements in the input array (`z`).\n",
" \n",
"It also works with a single number as an input, as shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa442685",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Input is an array. \n",
"input_array = np.array([1,2,3])\n",
"exp_array = np.exp(input_array)\n",
"\n",
"print(\"Input to exp:\", input_array)\n",
"print(\"Output of exp:\", exp_array)\n",
"\n",
"# Input is a single number\n",
"input_val = 1 \n",
"exp_val = np.exp(input_val)\n",
"\n",
"print(\"Input to exp:\", input_val)\n",
"print(\"Output of exp:\", exp_val)"
]
},
{
"cell_type": "markdown",
"id": "e9596531",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The `sigmoid` function is implemented in python as shown in the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc319850",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def sigmoid(z):\n",
" \"\"\"\n",
" Compute the sigmoid of z\n",
"\n",
" Args:\n",
" z (ndarray): A scalar, numpy array of any size.\n",
"\n",
" Returns:\n",
" g (ndarray): sigmoid(z), with the same shape as z\n",
" \n",
" \"\"\"\n",
"\n",
" g = 1/(1+np.exp(-z))\n",
" \n",
" return g"
]
},
{
"cell_type": "markdown",
"id": "14c4ba5e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Let's see what the output of this function is for various value of `z`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dac07d87",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Generate an array of evenly spaced values between -10 and 10\n",
"z_tmp = np.arange(-10,11)\n",
"\n",
"# Use the function implemented above to get the sigmoid values\n",
"y = sigmoid(z_tmp)\n",
"\n",
"# Code for pretty printing the two arrays next to each other\n",
"np.set_printoptions(precision=3) \n",
"print(\"Input (z), Output (sigmoid(z))\")\n",
"print(np.c_[z_tmp, y])"
]
},
{
"cell_type": "markdown",
"id": "5a2287bb",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The values in the left column are `z`, and the values in the right column are `sigmoid(z)`. As you can see, the input values to the sigmoid range from -10 to 10, and the output values range from 0 to 1. \n",
"\n",
"Now, let's try to plot this function using the `matplotlib` library."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3e9e89ad",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Plot z vs sigmoid(z)\n",
"fig,ax = plt.subplots(1,1,figsize=(5,3))\n",
"ax.plot(z_tmp, y, c=\"b\")\n",
"\n",
"ax.set_title(\"Sigmoid function\")\n",
"ax.set_ylabel('sigmoid(z)')\n",
"ax.set_xlabel('z')\n",
"draw_vthresh(ax,0)"
]
},
{
"cell_type": "markdown",
"id": "1ee2fdc7",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"As you can see, the sigmoid function approaches `0` as `z` goes to large negative values and approaches `1` as `z` goes to large positive values.\n"
]
},
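{
"cell_type": "markdown",
"id": "7b0a93d4",
"metadata": {},
"source": [
"A few quick numerical checks can make these properties concrete. The cell below is a small sketch under stated assumptions: `sigmoid_check` is a hypothetical name that repeats the formula from equation (1) so the cell runs on its own, and the test values in `z_check` are arbitrary. It confirms that the function accepts both a scalar and an array, that $g(0) = 0.5$, and that the curve is symmetric about $z = 0$, i.e. $g(-z) = 1 - g(z)$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f6c18ab",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid_check(z):\n",
"    # Same formula as sigmoid(z) above, redefined so this cell runs on its own\n",
"    return 1 / (1 + np.exp(-z))\n",
"\n",
"print(sigmoid_check(0))                       # scalar input\n",
"z_check = np.array([-5., -1., 0., 1., 5.])\n",
"g_check = sigmoid_check(z_check)              # array input, evaluated element-wise\n",
"print(g_check)\n",
"# Symmetry about z = 0: g(-z) = 1 - g(z)\n",
"print(np.allclose(sigmoid_check(-z_check), 1 - g_check))"
]
},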
{
"cell_type": "markdown",
"id": "5b6be211",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Logistic Regression\n",
" A logistic regression model applies the sigmoid to the familiar linear regression model as shown below:\n",
"\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b ) \\tag{2} $$ \n",
"\n",
" where\n",
"\n",
" $g(z) = \\frac{1}{1+e^{-z}}\\tag{3}$\n"
]
},
{
"cell_type": "markdown",
"id": "59745ca1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
" \n",
"Let's apply logistic regression to the categorical data example of tumor classification. \n",
"First, load the examples and initial values for the parameters.\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5db9b025",
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"x_train = np.array([0., 1, 2, 3, 4, 5])\n",
"y_train = np.array([0, 0, 0, 1, 1, 1])\n",
"\n",
"w_in = np.zeros((1))\n",
"b_in = 0"
]
},
{
"cell_type": "markdown",
"id": "076378af",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Try the following steps:\n",
"- Click on 'Run Logistic Regression' to find the best logistic regression model for the given training data\n",
" - Note the resulting model fits the data quite well.\n",
" - Note, the orange line is '$z$' or $\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b$ above. It does not match the line in a linear regression model.\n",
"Further improve these results by applying a *threshold*. \n",
"- Tick the box on the 'Toggle 0.5 threshold' to show the predictions if a threshold is applied.\n",
" - These predictions look good. The predictions match the data\n",
" - Now, add further data points in the large tumor size range (near 10), and re-run logistic regression.\n",
" - unlike the linear regression model, this model continues to make correct predictions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0f8764c",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"plt.close('all') \n",
"addpt = plt_one_addpt_onclick( x_train,y_train, w_in, b_in, logistic=True)"
]
},
{
"cell_type": "markdown",
"id": "60ac40e2",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You have explored the use of the sigmoid function in logistic regression."
]
},
{
"cell_type": "markdown",
"id": "c164236e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.3: Logistic Regression, Decision Boundary\n"
]
},
{
"cell_type": "markdown",
"id": "7f045e80",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Goals\n",
"In this lab, you will:\n",
"- Plot the decision boundary for a logistic regression model. This will give you a better sense of what the model is predicting.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f69c660a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"from lab_utils_common import plot_data, sigmoid, draw_vthresh\n",
"#plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"id": "1205b0c6",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Dataset\n",
"\n",
"Let's suppose you have following training dataset\n",
"- The input variable `X` is a numpy array which has 6 training examples, each with two features\n",
"- The output variable `y` is also a numpy array with 6 examples, and `y` is either `0` or `1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3642026f",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
"y = np.array([0, 0, 0, 1, 1, 1]).reshape(-1,1) "
]
},
{
"cell_type": "markdown",
"id": "8be4a974",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Plot data \n",
"\n",
"Let's use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as blue circles. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "199f1848",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"fig,ax = plt.subplots(1,1,figsize=(4,4))\n",
"plot_data(X, y, ax)\n",
"\n",
"ax.axis([0, 4, 0, 3.5])\n",
"ax.set_ylabel('$x_1$')\n",
"ax.set_xlabel('$x_0$')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "74003f78",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Logistic regression model\n",
"\n",
"\n",
"* Suppose you'd like to train a logistic regression model on this data which has the form \n",
"\n",
" $f(x) = g(w_0x_0+w_1x_1 + b)$\n",
" \n",
" where $g(z) = \\frac{1}{1+e^{-z}}$, which is the sigmoid function\n",
"\n",
"\n",
"* Let's say that you trained the model and get the parameters as $b = -3, w_0 = 1, w_1 = 1$. That is,\n",
"\n",
" $f(x) = g(x_0+x_1-3)$\n",
"\n",
" (You'll learn how to fit these parameters to the data further in the course)\n",
" \n",
" \n",
"Let's try to understand what this trained model is predicting by plotting its decision boundary"
]
},
{
"cell_type": "markdown",
"id": "c287e48f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Refresher on logistic regression and decision boundary\n",
"\n",
"* Recall that for logistic regression, the model is represented as \n",
"\n",
" $$f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b) \\tag{1}$$\n",
"\n",
" where $g(z)$ is known as the sigmoid function and it maps all input values to values between 0 and 1:\n",
"\n",
" $g(z) = \\frac{1}{1+e^{-z}}\\tag{2}$\n",
" and $\\mathbf{w} \\cdot \\mathbf{x}$ is the vector dot product:\n",
" \n",
" $$\\mathbf{w} \\cdot \\mathbf{x} = w_0 x_0 + w_1 x_1$$\n",
" \n",
" \n",
" * We interpret the output of the model ($f_{\\mathbf{w},b}(x)$) as the probability that $y=1$ given $\\mathbf{x}$ and parameterized by $\\mathbf{w}$ and $b$.\n",
"* Therefore, to get a final prediction ($y=0$ or $y=1$) from the logistic regression model, we can use the following heuristic -\n",
"\n",
" if $f_{\\mathbf{w},b}(x) >= 0.5$, predict $y=1$\n",
" \n",
" if $f_{\\mathbf{w},b}(x) < 0.5$, predict $y=0$\n",
" \n",
" \n",
"* Let's plot the sigmoid function to see where $g(z) >= 0.5$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06536178",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Plot sigmoid(z) over a range of values from -10 to 10\n",
"z = np.arange(-10,11)\n",
"\n",
"fig,ax = plt.subplots(1,1,figsize=(5,3))\n",
"# Plot z vs sigmoid(z)\n",
"ax.plot(z, sigmoid(z), c=\"b\")\n",
"\n",
"ax.set_title(\"Sigmoid function\")\n",
"ax.set_ylabel('sigmoid(z)')\n",
"ax.set_xlabel('z')\n",
"draw_vthresh(ax,0)"
]
},
{
"cell_type": "markdown",
"id": "31154efa",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"* As you can see, $g(z) >= 0.5$ for $z >=0$\n",
"\n",
"* For a logistic regression model, $z = \\mathbf{w} \\cdot \\mathbf{x} + b$. Therefore,\n",
"\n",
" if $\\mathbf{w} \\cdot \\mathbf{x} + b >= 0$, the model predicts $y=1$\n",
" \n",
" if $\\mathbf{w} \\cdot \\mathbf{x} + b < 0$, the model predicts $y=0$\n",
" \n",
" \n",
" \n",
"##### Plotting decision boundary\n",
"\n",
"Now, let's go back to our example to understand how the logistic regression model is making predictions.\n",
"\n",
"* Our logistic regression model has the form\n",
"\n",
" $f(\\mathbf{x}) = g(-3 + x_0+x_1)$\n",
"\n",
"\n",
"* From what you've learnt above, you can see that this model predicts $y=1$ if $-3 + x_0+x_1 >= 0$\n",
"\n",
"Let's see what this looks like graphically. We'll start by plotting $-3 + x_0+x_1 = 0$, which is equivalent to $x_1 = 3 - x_0$.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7575f8d5",
"metadata": {},
"outputs": [],
"source": [
"#plotting some decision boundry\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"x0 = np.arange(0,2.1,0.01)\n",
"\n",
"x1 = np.sqrt(4 - x0**2)\n",
"fig,ax = plt.subplots(1,1,figsize=(5,4))\n",
"# Plot the decision boundary\n",
"ax.plot(x0,x1, c=\"b\")\n",
"ax.axis([0, 4, 0, 4])\n",
"\n",
"# Fill the region below the line\n",
"ax.fill_between(x0,x1, alpha=0.2)\n",
"\n",
"# Plot the original data\n",
"ax.set_ylabel(r'$x_1$')\n",
"ax.set_xlabel(r'$x_0$')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b33eeb88",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Choose values between 0 and 6\n",
"x0 = np.arange(0,6)\n",
"\n",
"x1 = 3 - x0\n",
"fig,ax = plt.subplots(1,1,figsize=(5,4))\n",
"# Plot the decision boundary\n",
"ax.plot(x0,x1, c=\"b\")\n",
"ax.axis([0, 4, 0, 3.5])\n",
"\n",
"# Fill the region below the line\n",
"ax.fill_between(x0,x1, alpha=0.2)\n",
"\n",
"# Plot the original data\n",
"plot_data(X,y,ax)\n",
"ax.set_ylabel(r'$x_1$')\n",
"ax.set_xlabel(r'$x_0$')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "4b86da6b",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"* In the plot above, the blue line represents the line $x_0 + x_1 - 3 = 0$ and it should intersect the x1 axis at 3 (if we set $x_1$ = 3, $x_0$ = 0) and the x0 axis at 3 (if we set $x_1$ = 0, $x_0$ = 3). \n",
"\n",
"\n",
"* The shaded region represents $-3 + x_0+x_1 < 0$. The region above the line is $-3 + x_0+x_1 > 0$.\n",
"\n",
"\n",
"* Any point in the shaded region (under the line) is classified as $y=0$. Any point on or above the line is classified as $y=1$. This line is known as the \"decision boundary\".\n",
"\n",
"As we've seen in the lectures, by using higher order polynomial terms (eg: $f(x) = g( x_0^2 + x_1 -1)$, we can come up with more complex non-linear boundaries."
]
},
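{
"cell_type": "markdown",
"id": "9d3e50c7",
"metadata": {},
"source": [
"The decision rule can also be checked directly on the six training examples. The cell below is a minimal sketch that assumes the parameters quoted in this lab ($w_0 = 1$, $w_1 = 1$, $b = -3$). The `*_demo` names are illustrative and chosen so they do not overwrite `X` and `y` above. It computes $z = \\mathbf{w} \\cdot \\mathbf{x} + b$ for each example and predicts $y=1$ wherever $z \\ge 0$, which is the same as thresholding the sigmoid output at 0.5."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51c7d2fe",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
"y_demo = np.array([0, 0, 0, 1, 1, 1])\n",
"w_demo = np.array([1., 1.])\n",
"b_demo = -3.\n",
"\n",
"z_demo = X_demo @ w_demo + b_demo        # z = w . x + b for every example\n",
"f_demo = 1 / (1 + np.exp(-z_demo))       # model output g(z)\n",
"pred_demo = (z_demo >= 0).astype(int)    # same as thresholding f_demo at 0.5\n",
"\n",
"print('z          :', z_demo)\n",
"print('f_wb(x)    :', np.round(f_demo, 3))\n",
"print('prediction :', pred_demo)\n",
"print('target     :', y_demo)"
]
},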
{
"cell_type": "markdown",
"id": "38168e17",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You have explored the decision boundary in the context of logistic regression."
]
},
{
"cell_type": "markdown",
"id": "976d73e3",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.4: Logistic Regression, Logistic Loss\n",
"\n",
"In this ungraded lab, you will:\n",
"- explore the reason the squared error loss is not appropriate for logistic regression\n",
"- explore the logistic loss function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d02e05b0",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"from plt_logistic_loss import plt_logistic_cost, plt_two_logistic_loss_curves, plt_simple_example\n",
"from plt_logistic_loss import soup_bowl, plt_logistic_squared_error\n",
"plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"id": "3be5d79c",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Squared error for logistic regression?\n",
" Recall for **Linear** Regression we have used the **squared error cost function**:\n",
"The equation for the squared error cost with one variable is:\n",
" $$J(w,b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\tag{1}$$ \n",
" \n",
"where \n",
" $$f_{w,b}(x^{(i)}) = wx^{(i)} + b \\tag{2}$$\n"
]
},
{
"cell_type": "markdown",
"id": "5f552ee0",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Recall, the squared error cost had the nice property that following the derivative of the cost leads to the minimum."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f97dafd",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"soup_bowl()"
]
},
{
"cell_type": "markdown",
"id": "01265df7",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"This cost function worked well for linear regression, it is natural to consider it for logistic regression as well. However, as the slide above points out, $f_{wb}(x)$ now has a non-linear component, the sigmoid function: $f_{w,b}(x^{(i)}) = sigmoid(wx^{(i)} + b )$. Let's try a squared error cost on the example from an earlier lab, now including the sigmoid.\n",
"\n",
"Here is our training data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7a0079c",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"x_train = np.array([0., 1, 2, 3, 4, 5],dtype=np.longdouble)\n",
"y_train = np.array([0, 0, 0, 1, 1, 1],dtype=np.longdouble)\n",
"plt_simple_example(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"id": "5ef9cf54",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Now, let's get a surface plot of the cost using a *squared error cost*:\n",
" $$J(w,b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 $$ \n",
" \n",
"where \n",
" $$f_{w,b}(x^{(i)}) = sigmoid(wx^{(i)} + b )$$\n"
]
},
{
"cell_type": "markdown",
"id": "5b9b2478",
"metadata": {},
"source": [
"###### Plot logistic squared error "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c78eb3c",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm\n",
"x_train = np.array([0., 1, 2, 3, 4, 5],dtype=np.longdouble)\n",
"y_train = np.array([0, 0, 0, 1, 1, 1],dtype=np.longdouble)\n",
"\n",
"wx,by=np.meshgrid(np.linspace(-6,12,100),np.linspace(10,-20,100))\n",
"\n",
"\n",
"def logistic_model(x,w,b):\n",
" return 1/(1+np.exp(-(w*x+b)))\n",
"\n",
"def cost_fn_logistic(x,w,b,y):\n",
" return np.sum((logistic_model(x,w,b)-y)**2)/2/len(x)\n",
"\n",
"\n",
"cost_f=np.zeros(wx.shape)\n",
"for wi in range(wx.shape[0]):\n",
" for wj in range(wx.shape[1]):\n",
" w,b=wx[wi,wj],by[wi,wj]\n",
" #print(cost_fn_logistic(x_train,w,b,y_train))\n",
" cost_f[wi,wj]=cost_fn_logistic(x_train,w,b,y_train)\n",
" \n",
" \n",
"fig = plt.figure()\n",
"fig.canvas.toolbar_visible = False\n",
"fig.canvas.header_visible = False\n",
"fig.canvas.footer_visible = False\n",
"ax = fig.add_subplot(1, 1, 1, projection='3d')\n",
"ax.plot_surface(wx, by, cost_f, alpha=0.6,cmap=cm.coolwarm)\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10726c44",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"plt.close('all')\n",
"plt_logistic_squared_error(x_train,y_train)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "a8165594",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"While this produces a pretty interesting plot, the surface above not nearly as smooth as the 'soup bowl' from linear regression! \n",
"\n",
"Logistic regression requires a cost function more suitable to its non-linear nature. This starts with a Loss function. This is described below."
]
},
{
"cell_type": "markdown",
"id": "c6b0b89a",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Logistic Loss Function\n",
"\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "5f552d09",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Logistic Regression uses a loss function more suited to the task of categorization where the target is 0 or 1 rather than any number. \n",
"\n",
">**Definition Note:** In this course, these definitions are used: \n",
"**Loss** is a measure of the difference of a single example to its target value while the \n",
"**Cost** is a measure of the losses over the training set\n",
"\n",
"\n",
"This is defined: \n",
"* $loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is:\n",
"\n",
"\\begin{equation}\n",
" loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) = \\begin{cases}\n",
" - \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) & \\text{if $y^{(i)}=1$}\\\\\n",
" - \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) & \\text{if $y^{(i)}=0$}\n",
" \\end{cases}\n",
"\\end{equation}\n",
"\n",
"\n",
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value.\n",
"\n",
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(\\mathbf{w} \\cdot\\mathbf{x}^{(i)}+b)$ where function $g$ is the sigmoid function.\n",
"\n",
"The defining feature of this loss function is the fact that it uses two separate curves. One for the case when the target is zero or ($y=0$) and another for when the target is one ($y=1$). Combined, these curves provide the behavior useful for a loss function, namely, being zero when the prediction matches the target and rapidly increasing in value as the prediction differs from the target. Consider the curves below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edb72317",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"plt_two_logistic_loss_curves()"
]
},
{
"cell_type": "markdown",
"id": "277f5476",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Combined, the curves are similar to the quadratic curve of the squared error loss. Note, the x-axis is $f_{\\mathbf{w},b}$ which is the output of a sigmoid. The sigmoid output is strictly between 0 and 1."
]
},
{
"cell_type": "markdown",
"id": "f5c96b6e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The loss function above can be rewritten to be easier to implement.\n",
" $$loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) = (-y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)$$\n",
" \n",
"This is a rather formidable-looking equation. It is less daunting when you consider $y^{(i)}$ can have only two values, 0 and 1. One can then consider the equation in two pieces: \n",
"when $ y^{(i)} = 0$, the left-hand term is eliminated:\n",
"$$\n",
"\\begin{align}\n",
"loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), 0) &= (-(0) \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - 0\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\\\\n",
"&= -\\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\n",
"\\end{align}\n",
"$$\n",
"and when $ y^{(i)} = 1$, the right-hand term is eliminated:\n",
"$$\n",
"\\begin{align}\n",
" loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), 1) &= (-(1) \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - 1\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\\\\\n",
" &= -\\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\n",
"\\end{align}\n",
"$$\n",
"\n",
"OK, with this new logistic loss function, a cost function can be produced that incorporates the loss from all the examples. This will be the topic of the next lab. For now, let's take a look at the cost vs parameters curve for the simple example we considered above:"
]
},
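{
"cell_type": "markdown",
"id": "0ae4b6d1",
"metadata": {},
"source": [
"To see that the single-expression form agrees with the two-case definition, the sketch below evaluates both for a few predictions. The function names `loss_piecewise` and `loss_combined` and the sample predictions 0.1, 0.5 and 0.9 are assumptions made for illustration. Note how the loss is small when the prediction is close to the target and grows as the prediction moves away from it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7251c9e",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def loss_piecewise(f, y):\n",
"    # Two-case definition of the logistic loss\n",
"    return -np.log(f) if y == 1 else -np.log(1 - f)\n",
"\n",
"def loss_combined(f, y):\n",
"    # Single-expression form of the same loss\n",
"    return -y * np.log(f) - (1 - y) * np.log(1 - f)\n",
"\n",
"for f in [0.1, 0.5, 0.9]:\n",
"    for y in [0, 1]:\n",
"        print(f'f={f}, y={y}: piecewise={loss_piecewise(f, y):.4f}, combined={loss_combined(f, y):.4f}')"
]
},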
{
"cell_type": "code",
"execution_count": null,
"id": "4022f38d",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"plt.close('all')\n",
"cst = plt_logistic_cost(x_train,y_train)"
]
},
{
"cell_type": "markdown",
"id": "beb51ef1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"This curve is well suited to gradient descent! It does not have plateaus, local minima, or discontinuities. Note, it is not a bowl as in the case of squared error. Both the cost and the log of the cost are plotted to illuminate the fact that the curve, when the cost is small, has a slope and continues to decline. Reminder: you can rotate the above plots using your mouse."
]
},
{
"cell_type": "markdown",
"id": "5b367649",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You have:\n",
" - determined a squared error loss function is not suitable for classification tasks\n",
" - developed and examined the logistic loss function which **is** suitable for classification tasks.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "5d717b07",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.5: Cost Function for Logistic Regression\n",
"\n",
"##### Goals\n",
"In this lab, you will:\n",
"- examine the implementation and utilize the cost function for logistic regression."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a4fdc5e",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"from lab_utils_common import plot_data, sigmoid, dlc\n",
"plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"id": "c075181d",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### Dataset \n",
"Let's start with the same dataset as was used in the decision boundary lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ef2e3dc",
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]]) #(m,n)\n",
"y_train = np.array([0, 0, 0, 1, 1, 1]) #(m,)"
]
},
{
"cell_type": "markdown",
"id": "d3d3cba7",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We will use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as blue circles."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf605c65",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"fig,ax = plt.subplots(1,1,figsize=(4,4))\n",
"plot_data(X_train, y_train, ax)\n",
"\n",
"# Set both axes to be from 0-4\n",
"ax.axis([0, 4, 0, 3.5])\n",
"ax.set_ylabel('$x_1$', fontsize=12)\n",
"ax.set_xlabel('$x_0$', fontsize=12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f22e896e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Cost function\n",
"\n",
"In a previous lab, you developed the *logistic loss* function. Recall, loss is defined to apply to one example. Here you combine the losses to form the **cost**, which includes all the examples.\n",
"\n",
"\n",
"Recall that for logistic regression, the cost function is of the form \n",
"\n",
"$$ J(\\mathbf{w},b) = \\frac{1}{m} \\sum_{i=0}^{m-1} \\left[ loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) \\right] \\tag{1}$$\n",
"\n",
"where\n",
"* $loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is:\n",
"\n",
" $$loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\tag{2}$$\n",
" \n",
"* where m is the number of training examples in the data set and:\n",
"$$\n",
"\\begin{align}\n",
" f_{\\mathbf{w},b}(\\mathbf{x^{(i)}}) &= g(z^{(i)})\\tag{3} \\\\\n",
" z^{(i)} &= \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+ b\\tag{4} \\\\\n",
" g(z^{(i)}) &= \\frac{1}{1+e^{-z^{(i)}}}\\tag{5} \n",
"\\end{align}\n",
"$$\n",
" "
]
},
{
"cell_type": "markdown",
"id": "6c505662",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Code Description\n",
"\n",
"\n",
"\n",
"The algorithm for `compute_cost_logistic` loops over all the examples calculating the loss for each example and accumulating the total.\n",
"\n",
"Note that the variables X and y are not scalar values but matrices of shape ($m, n$) and ($𝑚$,) respectively, where $𝑛$ is the number of features and $𝑚$ is the number of training examples.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22a5ddbe",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_cost_logistic(X, y, w, b):\n",
" \"\"\"\n",
" Computes cost\n",
"\n",
" Args:\n",
" X (ndarray (m,n)): Data, m examples with n features\n",
" y (ndarray (m,)) : target values\n",
" w (ndarray (n,)) : model parameters \n",
" b (scalar) : model parameter\n",
" \n",
" Returns:\n",
" cost (scalar): cost\n",
" \"\"\"\n",
"\n",
" m = X.shape[0]\n",
" cost = 0.0\n",
" for i in range(m):\n",
" z_i = np.dot(X[i],w) + b\n",
" f_wb_i = sigmoid(z_i)\n",
" cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)\n",
" \n",
" cost = cost / m\n",
" return cost\n"
]
},
{
"cell_type": "markdown",
"id": "180dec69",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Check the implementation of the cost function using the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2bb9ca6",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"w_tmp = np.array([1,1])\n",
"b_tmp = -3\n",
"print(compute_cost_logistic(X_train, y_train, w_tmp, b_tmp))"
]
},
{
"cell_type": "markdown",
"id": "0d1f4ef7",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected output**: 0.3668667864055175"
]
},
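  {
   "cell_type": "markdown",
   "id": "b7c41e02",
   "metadata": {},
   "source": [
    "The loop in `compute_cost_logistic` can also be expressed with NumPy array operations. The block below is a minimal sketch of one such vectorized variant, not part of the original lab: the name `compute_cost_logistic_vec` is made up for illustration, and `sigmoid`, `X_demo` and `y_demo` are redefined locally (the data is assumed to be the same six-point set used in this lab) so that the block runs on its own.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    # Logistic function g(z) = 1 / (1 + exp(-z))\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "def compute_cost_logistic_vec(X, y, w, b):\n",
    "    # Hypothetical vectorized variant of compute_cost_logistic\n",
    "    f_wb = sigmoid(X @ w + b)                                # predictions, shape (m,)\n",
    "    loss = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)    # per-example loss, shape (m,)\n",
    "    return np.mean(loss)                                     # average over the m examples\n",
    "\n",
    "# Data assumed to match the training set used in this lab\n",
    "X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y_demo = np.array([0, 0, 0, 1, 1, 1])\n",
    "cost_demo = compute_cost_logistic_vec(X_demo, y_demo, np.array([1, 1]), -3)\n",
    "print(cost_demo)\n",
    "```\n",
    "\n",
    "If the data assumption holds, the printed value should agree with the expected output above up to floating-point rounding, since both versions average the same per-example losses."
   ]
  },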
{
"cell_type": "code",
"execution_count": null,
"id": "7315c0e3",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm\n",
"\n",
"wx,by=np.meshgrid(np.linspace(-6,12,100),np.linspace(10,-20,100))\n",
"\n",
"\n",
"def logistic_model(x,w,b):\n",
" return np.array([ (1/(1+np.exp(-(np.dot(w,i)+b)))) for i in x])\n",
" \n",
"def cost_fn_logistic(x,w,b,y):\n",
" return np.sum(-y*np.log(logistic_model(x,w,b))-(1-y)*np.log(1-logistic_model(x,w,b)))/len(x)\n",
"cost_fn_logistic(X_train,np.array([1,1]),-4,y_train)\n"
]
},
{
"cell_type": "markdown",
"id": "ac31da9f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Example\n",
"Now, let's see what the cost function output is for a different value of $w$. \n",
"\n",
"* In a previous lab, you plotted the decision boundary for $b = -3, w_0 = 1, w_1 = 1$. That is, you had `b = -3, w = np.array([1,1])`.\n",
"\n",
"* Let's say you want to see if $b = -4, w_0 = 1, w_1 = 1$, or `b = -4, w = np.array([1,1])` provides a better model.\n",
"\n",
"Let's first plot the decision boundary for these two different $b$ values to see which one fits the data better.\n",
"\n",
"* For $b = -3, w_0 = 1, w_1 = 1$, we'll plot $-3 + x_0+x_1 = 0$ (shown in blue)\n",
"* For $b = -4, w_0 = 1, w_1 = 1$, we'll plot $-4 + x_0+x_1 = 0$ (shown in magenta)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dfff9371",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Choose values between 0 and 6\n",
"x0 = np.arange(0,6)\n",
"\n",
"# Plot the two decision boundaries\n",
"x1 = 3 - x0\n",
"x1_other = 4 - x0\n",
"\n",
"fig,ax = plt.subplots(1, 1, figsize=(4,4))\n",
"# Plot the decision boundary\n",
"ax.plot(x0,x1, c=dlc[\"dlblue\"], label=\"$b$=-3\")\n",
"ax.plot(x0,x1_other, c=dlc[\"dlmagenta\"], label=\"$b$=-4\")\n",
"ax.axis([0, 4, 0, 4])\n",
"\n",
"# Plot the original data\n",
"plot_data(X_train,y_train,ax)\n",
"ax.axis([0, 4, 0, 4])\n",
"ax.set_ylabel('$x_1$', fontsize=12)\n",
"ax.set_xlabel('$x_0$', fontsize=12)\n",
"plt.legend(loc=\"upper right\")\n",
"plt.title(\"Decision Boundary\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "c30b9c56",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You can see from this plot that `b = -4, w = np.array([1,1])` is a worse model for the training data. Let's see if the cost function implementation reflects this."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d2a4e61",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"w_array1 = np.array([1,1])\n",
"b_1 = -3\n",
"w_array2 = np.array([1,1])\n",
"b_2 = -4\n",
"\n",
"print(\"Cost for b = -3 : \", compute_cost_logistic(X_train, y_train, w_array1, b_1))\n",
"print(\"Cost for b = -4 : \", compute_cost_logistic(X_train, y_train, w_array2, b_2))"
]
},
{
"cell_type": "markdown",
"id": "68a25495",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected output**\n",
"\n",
"Cost for b = -3 : 0.3668667864055175\n",
"\n",
"Cost for b = -4 : 0.5036808636748461\n",
"\n",
"\n",
"You can see the cost function behaves as expected and the cost for `b = -4, w = np.array([1,1])` is indeed higher than the cost for `b = -3, w = np.array([1,1])`"
]
},
{
"cell_type": "markdown",
"id": "8fb5b5be",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"In this lab you examined and utilized the cost function for logistic regression."
]
},
{
"cell_type": "markdown",
"id": "6beb4eb1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.6: Gradient Descent for Logistic Regression"
]
},
{
"cell_type": "markdown",
"id": "984999b1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Goals\n",
"In this lab, you will:\n",
"- update gradient descent for logistic regression.\n",
"- explore gradient descent on a familiar data set"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0247ebcd",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import copy, math\n",
"import numpy as np\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"from lab_utils_common import dlc, plot_data, plt_tumor_data, sigmoid, compute_cost_logistic\n",
"from plt_quad_logistic import plt_quad_logistic, plt_prob\n",
"plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"id": "7a1e36bb",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### Data set \n",
"Let's start with the same two feature data set used in the decision boundary lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4fc1fdc5",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
"y_train = np.array([0, 0, 0, 1, 1, 1])"
]
},
{
"cell_type": "markdown",
"id": "8df6592b",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as blue circles."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd3a0530",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"fig,ax = plt.subplots(1,1,figsize=(4,4))\n",
"plot_data(X_train, y_train, ax)\n",
"\n",
"ax.axis([0, 4, 0, 3.5])\n",
"ax.set_ylabel('$x_1$', fontsize=12)\n",
"ax.set_xlabel('$x_0$', fontsize=12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f893d122",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Logistic Gradient Descent\n",
"\n",
"\n",
"Recall the gradient descent algorithm utilizes the gradient calculation:\n",
"$$\\begin{align*}\n",
"&\\text{repeat until convergence:} \\; \\lbrace \\\\\n",
"& \\; \\; \\;w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j := 0..n-1} \\\\ \n",
"& \\; \\; \\; \\; \\;b = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\\\\n",
"&\\rbrace\n",
"\\end{align*}$$\n",
"\n",
"Where each iteration performs simultaneous updates on $w_j$ for all $j$, where\n",
"$$\\begin{align*}\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3} \n",
"\\end{align*}$$\n",
"\n",
"* m is the number of training examples in the data set \n",
"* $f_{\\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target\n",
"* For a logistic regression model \n",
" $z = \\mathbf{w} \\cdot \\mathbf{x} + b$ \n",
" $f_{\\mathbf{w},b}(x) = g(z)$ \n",
" where $g(z)$ is the sigmoid function: \n",
" $g(z) = \\frac{1}{1+e^{-z}}$ \n",
" \n"
]
},
{
"cell_type": "markdown",
"id": "8dad45d1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Gradient Descent Implementation\n",
"The gradient descent algorithm implementation has two components: \n",
"- The loop implementing equation (1) above. This is `gradient_descent` below and is generally provided to you in optional and practice labs.\n",
"- The calculation of the current gradient, equations (2,3) above. This is `compute_gradient_logistic` below. You will be asked to implement this week's practice lab.\n",
"\n",
"###### Calculating the Gradient, Code Description\n",
"Implements equation (2),(3) above for all $w_j$ and $b$.\n",
"There are many ways to implement this. Outlined below is this:\n",
"- initialize variables to accumulate `dj_dw` and `dj_db`\n",
"- for each example\n",
" - calculate the error for that example $g(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b) - \\mathbf{y}^{(i)}$\n",
" - for each input value $x_{j}^{(i)}$ in this example, \n",
" - multiply the error by the input $x_{j}^{(i)}$, and add to the corresponding element of `dj_dw`. (equation 2 above)\n",
" - add the error to `dj_db` (equation 3 above)\n",
"\n",
"- divide `dj_db` and `dj_dw` by total number of examples (m)\n",
"- note that $\\mathbf{x}^{(i)}$ in numpy `X[i,:]` or `X[i]` and $x_{j}^{(i)}$ is `X[i,j]`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ded2e5ce",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "2e692682",
"metadata": {},
"source": [
"#### My solution\n",
"\n",
"\n",
" Logistic regression(1 variable) \n",
" \n",
" ```python\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm\n",
"x_train = np.array([0., 1, 2, 3, 4, 5],dtype=np.longdouble)\n",
"y_train = np.array([0, 0, 0, 1, 1, 1],dtype=np.longdouble)\n",
"\n",
"#wx,by=np.meshgrid(np.linspace(-6,12,100),np.linspace(10,-20,100))\n",
"\n",
"def model(x,theta):\n",
" w,b=theta\n",
" sigmoid=np.zeros(len(x))\n",
" for i in range(len(x)):\n",
" if np.isscalar(w):\n",
" w=np.array(w)\n",
" if w.shape!=x[i].shape:\n",
" print(\"Shape of W and X dosn't match\")\n",
" sys.exit() \n",
" sigmoid[i]=1/(1+np.exp(-(np.dot(w,x[i])+b)))\n",
" return sigmoid\n",
"\n",
"def dmodel_w(x,theta): \n",
" w,b=theta\n",
" return x\n",
"\n",
"def dmodel_b(x,theta): \n",
" w,b=theta\n",
" return 1.\n",
"\n",
"def cost(x,theta,y):\n",
" w,b=theta\n",
" cf= -y*np.log(model(x,theta))-(1-y)*np.log(1-model(x,theta))\n",
" return np.sum(cf)/np.shape(x)[0]\n",
"\n",
"def dcost_w(x,theta,y):\n",
" return np.sum((model(x,theta)-y)*dmodel_w(x,theta))/len(x)\n",
"\n",
"def dcost_b(x,theta,y):\n",
" return np.sum((model(x,theta)-y)*dmodel_b(x,theta))/len(x)\n",
" \n",
"def compute_gradient(x,theta,y):\n",
" return dcost_w(x,theta,y),dcost_b(x,theta,y)\n",
"\n",
"np.set_printoptions(precision=2)\n",
"def gradient_decent(x,y,theta,alpha,niter):\n",
" w,b=theta\n",
" if theta[1]>0: #constraining parameters\n",
" b=-theta[1]\n",
" cost_i=np.zeros(niter)\n",
" for i in np.arange(niter):\n",
" if i>1:\n",
" if np.abs((cost_i[i]-cost_i[i-1])/cost_i[i])<0.05:\n",
" alpha/=2\n",
" \n",
" dcw,dcb= compute_gradient(x,theta,y)\n",
" w = w-alpha*dcw\n",
" b = b-alpha*dcb\n",
" theta=w,b\n",
" cost_i[i]=cost(x,theta,y)\n",
" if i>1:\n",
" if cost_i[i]>cost_i[i-1]:\n",
" alpha/=2\n",
" #print(cost_i[i],alpha)\n",
" #print(theta) \n",
" return cost_i,theta\n",
"\n",
" \n",
" \n",
"niter=1000\n",
"Win=20\n",
"Bin=5\n",
"alpha=0.5\n",
"theta_in=Win,Bin\n",
"grad_dec_result,theta_f=gradient_decent(x_train,y_train,theta_in,alpha,niter) \n",
"\n",
"wf,bf=theta_f\n",
"print(wf,bf,grad_dec_result[-1])\n",
"\n",
"\n",
"\n",
"plt.figure(figsize=(8,4))\n",
"ax=plt.subplot(121)\n",
"plt.plot(np.arange(niter),grad_dec_result,\".\")\n",
"plt.yscale(\"log\")\n",
"plt.xlabel(\"No of steps\")\n",
"plt.ylabel(\"Cost function\")\n",
"plt.ylim(bottom=0.01)\n",
"\n",
"\n",
"\n",
"ax=plt.subplot(1,2,2) \n",
"plt.plot(x_train, model(x_train,theta_f), c = \"g\",label=\"Predcited model\")\n",
"plt.scatter(x_train, y_train, marker='x', c='r') \n",
"# Set the title\n",
"plt.title(\"Model fit\")\n",
"# Set the y-axis label\n",
"plt.ylabel('training data')\n",
"# Set the x-axis label\n",
"plt.xlabel('training input') \n",
"plt.legend()\n",
"plt.tight_layout()\n",
"\n",
" \n",
" ```\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" Logistic regression(2 variables) \n",
"\n",
" ```python\n",
"\n",
"import numpy as np,sys\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm\n",
"x_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2],[0.5,0.5],[2.7,1.5], [1, 2.5]])\n",
"y_train = np.array([1, 1, 1, 1, 1,0,1, 1],dtype=np.longdouble)\n",
"\n",
"\n",
"\n",
"#wx,by=np.meshgrid(np.linspace(-6,12,100),np.linspace(10,-20,100))\n",
"\n",
"def model(x,theta):\n",
" w,b=theta\n",
" if np.isscalar(x):\n",
" x=np.array(x)\n",
" if np.isscalar(w):\n",
" w=np.array([w])\n",
" elif isinstance(w,tuple):\n",
" w=np.array(w)\n",
" sigmoid=np.zeros(len(x))\n",
" for i in range(len(x)):\n",
" if w.shape!=x[i].shape:\n",
" print(\"Shape of W and X dosn't match\", w.shape,x[i].shape)\n",
" sys.exit() \n",
" sigmoid[i]=1/(1+np.exp(-(np.dot(w,x[i])+b)))\n",
" return sigmoid\n",
"\n",
"def dmodel_w(x,theta):\n",
" w,b=theta\n",
" return x\n",
"\n",
"def dmodel_b(x,theta): \n",
" w,b=theta\n",
" return 1.\n",
"\n",
"def cost(x,theta,y):\n",
" w,b=theta\n",
" cf= -y*np.log(model(x,theta))-(1-y)*np.log(1-model(x,theta))\n",
" return np.sum(cf)/np.shape(x)[0]\n",
"\n",
"def dcost_w(x,theta,y):\n",
" w,b=theta\n",
" if np.isscalar(w):\n",
" w=np.array([w])\n",
" elif isinstance(w,tuple):\n",
" w=np.array(w)\n",
" dcost_w_result=np.zeros(w.shape) \n",
" for wi in range(len(w)):\n",
" dcost_w_result[wi]=np.sum((model(x,theta)-y)*dmodel_w(x,theta)[:,wi])/len(x) \n",
" return dcost_w_result \n",
"\n",
"def dcost_b(x,theta,y):\n",
" return np.sum((model(x,theta)-y)*dmodel_b(x,theta))/len(x)\n",
" \n",
"def compute_gradient(x,theta,y):\n",
" return dcost_w(x,theta,y),dcost_b(x,theta,y)\n",
"\n",
"np.set_printoptions(precision=2)\n",
"\n",
"def gradient_decent(x,y,theta,alpha,niter):\n",
" w,b=theta\n",
" if np.isscalar(w):\n",
" w=np.array(w)\n",
" elif isinstance(w, tuple):\n",
" w=np.array(w)\n",
"\n",
" if theta[1]>0: #constraining parameters\n",
" b=-theta[1]\n",
" cost_i=np.zeros(niter)\n",
" for i in np.arange(niter):\n",
" if i>1:\n",
" if np.abs((cost_i[i]-cost_i[i-1])/cost_i[i])<0.05:\n",
" alpha/=2\n",
" dcw,dcb= compute_gradient(x,theta,y)\n",
" \n",
" w = w-alpha*dcw\n",
" b = b-alpha*dcb\n",
" theta=w,b\n",
" cost_i[i]=cost(x,theta,y)\n",
" if i>1:\n",
" if cost_i[i]>cost_i[i-1]:\n",
" alpha/=2\n",
" #print(cost_i[i],alpha)\n",
" #print(theta) \n",
" return cost_i,theta\n",
"\n",
" \n",
" \n",
"niter=10000\n",
"Win=np.array([2.,3.])\n",
"Bin=1.\n",
"\n",
"alpha=0.5\n",
"theta_in=Win,Bin\n",
"grad_dec_result,theta_f=gradient_decent(x_train,y_train,theta_in,alpha,niter) \n",
"\n",
"wf,bf=theta_f\n",
"print(wf,bf,grad_dec_result[-1])\n",
"\n",
"\n",
"\n",
"plt.figure(figsize=(8,4))\n",
"ax=plt.subplot(121)\n",
"plt.plot(np.arange(niter),grad_dec_result,\".\")\n",
"plt.yscale(\"log\")\n",
"plt.xlabel(\"No of steps\")\n",
"plt.ylabel(\"Cost function\")\n",
"plt.ylim(bottom=0.01)\n",
"\n",
"\n",
"ax=plt.subplot(1,2,2) \n",
"#plt.plot(x_train, model(x_train,theta_f), c = \"g\",label=\"Predcited model\")\n",
"ax.plot((-bf/wf[0],0),(0,-bf/wf[1]),label=\"Predicted model\")\n",
"pos=y_train>0.5\n",
"neg=y_train<0.5\n",
"plt.scatter(x_train[:,0][pos],x_train[:,1][pos] , marker='x', c='r') \n",
"plt.scatter(x_train[:,0][neg],x_train[:,1][neg] , marker='o', c='b') \n",
"ax.set_ylabel(r'$x_1$')\n",
"ax.set_xlabel(r'$x_0$') \n",
"ax.axis([0, 4, 0, 3.5])\n",
"# Set the title\n",
"plt.title(\"Model fit\")\n",
"# Set the y-axis label\n",
"plt.ylabel('training data')\n",
"# Set the x-axis label\n",
"plt.xlabel('training input') \n",
"plt.legend()\n",
"plt.tight_layout()\n",
"\n",
"\n",
"x_train, model(x_train,theta_f),y_train\n",
"\n",
" ```\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe38d2d9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_gradient_logistic(X, y, w, b): \n",
" \"\"\"\n",
" Computes the gradient for linear regression \n",
" \n",
" Args:\n",
" X (ndarray (m,n): Data, m examples with n features\n",
" y (ndarray (m,)): target values\n",
" w (ndarray (n,)): model parameters \n",
" b (scalar) : model parameter\n",
" Returns\n",
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
" dj_db (scalar) : The gradient of the cost w.r.t. the parameter b. \n",
" \"\"\"\n",
" m,n = X.shape\n",
" dj_dw = np.zeros((n,)) #(n,)\n",
" dj_db = 0.\n",
"\n",
" for i in range(m):\n",
" f_wb_i = sigmoid(np.dot(X[i],w) + b) #(n,)(n,)=scalar\n",
" err_i = f_wb_i - y[i] #scalar\n",
" for j in range(n):\n",
" dj_dw[j] = dj_dw[j] + err_i * X[i,j] #scalar\n",
" dj_db = dj_db + err_i\n",
" dj_dw = dj_dw/m #(n,)\n",
" dj_db = dj_db/m #scalar\n",
" \n",
" return dj_db, dj_dw"
]
},
{
"cell_type": "markdown",
"id": "5ed1faef",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Check the implementation of the gradient function using the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddd41a33",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"X_tmp = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
"y_tmp = np.array([0, 0, 0, 1, 1, 1])\n",
"w_tmp = np.array([2.,3.])\n",
"b_tmp = 1.\n",
"dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_tmp, y_tmp, w_tmp, b_tmp)\n",
"print(f\"dj_db: {dj_db_tmp}\" )\n",
"print(f\"dj_dw: {dj_dw_tmp.tolist()}\" )"
]
},
{
"cell_type": "markdown",
"id": "745399c8",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected output**\n",
"``` \n",
"dj_db: 0.49861806546328574\n",
"dj_dw: [0.498333393278696, 0.49883942983996693]\n",
"```"
]
},
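  {
   "cell_type": "markdown",
   "id": "c2d85f13",
   "metadata": {},
   "source": [
    "The double loop in `compute_gradient_logistic` can also be written as two array operations. The sketch below is one possible vectorized variant, offered as an illustration rather than as the lab's reference implementation: the name `compute_gradient_logistic_vec` and the `_demo` variables are hypothetical, and `sigmoid` is redefined locally so the block is self-contained. It keeps the same `(dj_db, dj_dw)` return order as the loop version.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    # Logistic function g(z) = 1 / (1 + exp(-z))\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "def compute_gradient_logistic_vec(X, y, w, b):\n",
    "    # Hypothetical vectorized variant of compute_gradient_logistic\n",
    "    m = X.shape[0]\n",
    "    err = sigmoid(X @ w + b) - y      # f_wb(x_i) - y_i for every example, shape (m,)\n",
    "    dj_dw = X.T @ err / m             # equation (2), shape (n,)\n",
    "    dj_db = np.sum(err) / m           # equation (3), scalar\n",
    "    return dj_db, dj_dw               # same return order as the loop version\n",
    "\n",
    "# Same inputs as the check cell above\n",
    "X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y_demo = np.array([0, 0, 0, 1, 1, 1])\n",
    "dj_db_demo, dj_dw_demo = compute_gradient_logistic_vec(X_demo, y_demo, np.array([2., 3.]), 1.)\n",
    "print(dj_db_demo, dj_dw_demo.tolist())\n",
    "```\n",
    "\n",
    "Because the inputs match the check cell, the printed gradients should agree with the expected output above up to floating-point rounding."
   ]
  },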
{
"cell_type": "markdown",
"id": "c4cfd3fc",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Gradient Descent Code \n",
"The code implementing equation (1) above is implemented below. Take a moment to locate and compare the functions in the routine to the equations above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77f30888",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def gradient_descent(X, y, w_in, b_in, alpha, num_iters): \n",
" \"\"\"\n",
" Performs batch gradient descent\n",
" \n",
" Args:\n",
" X (ndarray (m,n) : Data, m examples with n features\n",
" y (ndarray (m,)) : target values\n",
" w_in (ndarray (n,)): Initial values of model parameters \n",
" b_in (scalar) : Initial values of model parameter\n",
" alpha (float) : Learning rate\n",
" num_iters (scalar) : number of iterations to run gradient descent\n",
" \n",
" Returns:\n",
" w (ndarray (n,)) : Updated values of parameters\n",
" b (scalar) : Updated value of parameter \n",
" \"\"\"\n",
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
" J_history = []\n",
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
" b = b_in\n",
" \n",
" for i in range(num_iters):\n",
" # Calculate the gradient and update the parameters\n",
" dj_db, dj_dw = compute_gradient_logistic(X, y, w, b) \n",
"\n",
" # Update Parameters using w, b, alpha and gradient\n",
" w = w - alpha * dj_dw \n",
" b = b - alpha * dj_db \n",
" \n",
" # Save cost J at each iteration\n",
" if i<100000: # prevent resource exhaustion \n",
" J_history.append( compute_cost_logistic(X, y, w, b) )\n",
"\n",
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
" if i% math.ceil(num_iters / 10) == 0:\n",
" print(f\"Iteration {i:4d}: Cost {J_history[-1]} \")\n",
" \n",
" return w, b, J_history #return final w,b and J history for graphing\n"
]
},
{
"cell_type": "markdown",
"id": "720c6fdd",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Let's run gradient descent on our data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed3a1ce9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"w_tmp = np.zeros_like(X_train[0])\n",
"b_tmp = 0.\n",
"alph = 0.1\n",
"iters = 10000\n",
"\n",
"w_out, b_out, _ = gradient_descent(X_train, y_train, w_tmp, b_tmp, alph, iters) \n",
"print(f\"\\nupdated parameters: w:{w_out}, b:{b_out}\")"
]
},
{
"cell_type": "markdown",
"id": "3acea32f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Let's plot the results of gradient descent:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "454c6748",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"fig,ax = plt.subplots(1,1,figsize=(5,4))\n",
"# plot the probability \n",
"plt_prob(ax, w_out, b_out)\n",
"\n",
"# Plot the original data\n",
"ax.set_ylabel(r'$x_1$')\n",
"ax.set_xlabel(r'$x_0$') \n",
"ax.axis([0, 4, 0, 3.5])\n",
"plot_data(X_train,y_train,ax)\n",
"\n",
"# Plot the decision boundary\n",
"x0 = -b_out/w_out[0]\n",
"x1 = -b_out/w_out[1]\n",
"ax.plot([0,x0],[x1,0], c=dlc[\"dlblue\"], lw=1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "db45e5dd",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"In the plot above:\n",
" - the shading reflects the probability y=1 (result prior to decision boundary)\n",
" - the decision boundary is the line at which the probability = 0.5\n",
" "
]
},
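  {
   "cell_type": "markdown",
   "id": "d3e96a24",
   "metadata": {},
   "source": [
    "The second point follows from the sigmoid: $g(z) = 0.5$ exactly when $z = 0$, so the boundary is the line $w_0 x_0 + w_1 x_1 + b = 0$, which crosses the axes at $x_0 = -b/w_0$ and $x_1 = -b/w_1$. The short sketch below checks this with made-up parameters; `w_demo` and `b_demo` are illustrative values only, not the ones gradient descent returns.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    # Logistic function g(z) = 1 / (1 + exp(-z))\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "# Illustrative parameters only (assumption, not the fitted values)\n",
    "w_demo = np.array([2.0, 4.0])\n",
    "b_demo = -8.0\n",
    "\n",
    "# Axis intercepts of the line w0*x0 + w1*x1 + b = 0\n",
    "x0_intercept = -b_demo / w_demo[0]    # boundary point (x0_intercept, 0)\n",
    "x1_intercept = -b_demo / w_demo[1]    # boundary point (0, x1_intercept)\n",
    "\n",
    "# Evaluate the model at both intercepts; z is 0 there, so the probability is 0.5\n",
    "boundary_points = np.array([[x0_intercept, 0.0], [0.0, x1_intercept]])\n",
    "p_boundary = sigmoid(boundary_points @ w_demo + b_demo)\n",
    "print(p_boundary)\n",
    "```\n",
    "\n",
    "The same two intercepts are what the plotting cell above connects to draw the boundary line."
   ]
  },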
{
"cell_type": "markdown",
"id": "434a95de",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### Another Data set\n",
"Let's return to a one-variable data set. With just two parameters, $w$, $b$, it is possible to plot the cost function using a contour plot to get a better idea of what gradient descent is up to."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "555b0836",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"x_train = np.array([0., 1, 2, 3, 4, 5])\n",
"y_train = np.array([0, 0, 0, 1, 1, 1])"
]
},
{
"cell_type": "markdown",
"id": "61a539b3",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as blue circles."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64f0f080",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"fig,ax = plt.subplots(1,1,figsize=(4,3))\n",
"plt_tumor_data(x_train, y_train, ax)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f6ea41f2",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"In the plot below, try:\n",
"- changing $w$ and $b$ by clicking within the contour plot on the upper right.\n",
" - changes may take a second or two\n",
" - note the changing value of cost on the upper left plot.\n",
" - note the cost is accumulated by a loss on each example (vertical dotted lines)\n",
"- run gradient descent by clicking the orange button.\n",
" - note the steadily decreasing cost (contour and cost plot are in log(cost) \n",
" - clicking in the contour plot will reset the model for a new run\n",
"- to reset the plot, rerun the cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6983ece1",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"w_range = np.array([-1, 8])\n",
"b_range = np.array([1, -18])\n",
"quad = plt_quad_logistic( x_train, y_train, w_range, b_range )"
]
},
{
"cell_type": "markdown",
"id": "0e72b476",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You have:\n",
"- examined the formulas and implementation of calculating the gradient for logistic regression\n",
"- utilized those routines in\n",
" - exploring a single variable data set\n",
" - exploring a two-variable data set"
]
},
{
"cell_type": "markdown",
"id": "619d67d0",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.7: Ungraded Lab: Logistic Regression using Scikit-Learn\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "fbb02dcf",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Goals\n",
"In this lab you will:\n",
"- Train a logistic regression model using scikit-learn.\n"
]
},
{
"cell_type": "markdown",
"id": "825654b3",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### Dataset \n",
"Let's start with the same dataset as before."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b4efcaa",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
"y = np.array([0, 0, 0, 1, 1, 1])"
]
},
{
"cell_type": "markdown",
"id": "3dcb0174",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Fit the model\n",
"\n",
"The code below imports the [logistic regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d13eeeff",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"lr_model = LogisticRegression()\n",
"lr_model.fit(X, y)"
]
},
{
"cell_type": "markdown",
"id": "d8f1d0ec",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Make Predictions\n",
"\n",
"You can see the predictions made by this model by calling the `predict` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcf0a8d6",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"y_pred = lr_model.predict(X)\n",
"\n",
"print(\"Prediction on training set:\", y_pred)"
]
},
{
"cell_type": "markdown",
"id": "1df576c2",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Calculate accuracy\n",
"\n",
"You can calculate this accuracy of this model by calling the `score` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9728cd8c",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"print(\"Accuracy on training set:\", lr_model.score(X, y))"
]
},
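  {
   "cell_type": "markdown",
   "id": "e4fa7b35",
   "metadata": {},
   "source": [
    "To connect the scikit-learn model back to the earlier labs, the sketch below reads the fitted parameters and probabilities from the model. It is an illustration rather than part of the original lab: the `_demo` names are made up, while `coef_`, `intercept_`, `predict_proba` and `score` are the estimator's documented attributes and methods. Note that `LogisticRegression` applies L2 regularization by default, so the fitted values will generally differ from those found by the unregularized gradient descent in the previous lab.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y_demo = np.array([0, 0, 0, 1, 1, 1])\n",
    "model_demo = LogisticRegression().fit(X_demo, y_demo)\n",
    "\n",
    "w_fit = model_demo.coef_[0]                        # learned weights, shape (n,)\n",
    "b_fit = model_demo.intercept_[0]                   # learned bias\n",
    "proba = model_demo.predict_proba(X_demo)[:, 1]     # estimated P(y = 1 | x)\n",
    "\n",
    "# For a two-class problem this should match the sigmoid of w . x + b\n",
    "proba_manual = 1.0 / (1.0 + np.exp(-(X_demo @ w_fit + b_fit)))\n",
    "\n",
    "# Accuracy is the fraction of predictions that equal the labels\n",
    "pred = model_demo.predict(X_demo)\n",
    "manual_acc = np.mean(pred == y_demo)\n",
    "print(w_fit, b_fit, proba, manual_acc)\n",
    "```\n",
    "\n",
    "Here `manual_acc` is computed the same way `score` defines accuracy, so the two should agree."
   ]
  },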
{
"cell_type": "code",
"execution_count": null,
"id": "b07d490f",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "6ec838ec",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.8: Ungraded Lab: Logistic Regression using Scikit-Learn\n"
]
},
{
"cell_type": "markdown",
"id": "e895d21e",
"metadata": {},
"source": [
"##### Ungraded Lab: Overfitting \n",
"\n",
"\n",
"\n",
"\n",
"\n",
"###### Goals\n",
"In this lab, you will explore:\n",
"- the situations where overfitting can occur\n",
"- some of the solutions\n",
"\n",
"\n",
"##### Overfitting\n",
"The week's lecture described situations where overfitting can arise. Run the cell below to generate a plot that will allow you to explore overfitting. There are further instructions below the cell.\n",
"\n",
" ```python\n",
"plt.close(\"all\")\n",
"display(output)\n",
"ofit = overfit_example(False)\n",
" ```\n",
"In the plot above you can:\n",
"- switch between Regression and Categorization examples\n",
"- add data\n",
"- select the degree of the model\n",
"- fit the model to the data \n",
"\n",
"Here are some things you should try:\n",
"- Fit the data with degree = 1; Note 'underfitting'.\n",
"- Fit the data with degree = 6; Note 'overfitting'\n",
"- tune degree to get the 'best fit'\n",
"- add data:\n",
" - extreme examples can increase overfitting (assuming they are outliers).\n",
" - nominal examples can reduce overfitting\n",
"- switch between `Regression` and `Categorical` to try both examples.\n",
"\n",
"To reset the plot, re-run the cell. Click slowly to allow the plot to update before receiving the next click.\n",
"\n",
"Notes on implementations:\n",
"- the 'ideal' curves represent the generator model to which noise was added to achieve the data set\n",
"- 'fit' does not use pure gradient descent to improve speed. These methods can be used on smaller data sets. \n",
"\n",
"You have developed some intuition about the causes and solutions to overfitting. In the next lab, you will explore a commonly used solution, Regularization.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6428c6a2",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"from ipywidgets import Output\n",
"import sys\n",
"sys.path.append(\"week3/OptionalLabs\")\n",
"plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')\n",
"from plt_overfit import overfit_example, output"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5b657b94",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "af12f81533e941d494086be48ae8e298",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c4c372a238644bdda01e4f48e2a06862",
"version_major": 2,
"version_minor": 0
},
"image/png": "",
"text/html": [
"\n",
"
\n",
"
\n",
" Figure\n",
"
\n",
" \n",
"
\n",
" "
],
"text/plain": [
"Canvas(footer_visible=False, header_visible=False, toolbar=Toolbar(toolitems=[('Home', 'Reset original view', …"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.close(\"all\")\n",
"display(output)\n",
"ofit = overfit_example(False)"
]
},
{
"cell_type": "markdown",
"id": "3a1ffc89",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Optional Lab - 3.9 - Regularized Cost and Gradient"
]
},
{
"cell_type": "markdown",
"id": "20519af2",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### Goals\n",
"In this lab, you will:\n",
"- extend the previous linear and logistic cost functions with a regularization term.\n",
"- rerun the previous example of over-fitting with a regularization term added.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f50f471a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np,sys,os\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"proj_path=f\"{os.environ['HOME']}/my_web/Machine-Learning-Andrew-Ng\"\n",
"os.chdir(f\"{proj_path}/source/source_files/Supervised_Machine_Learning_Regression_and_Classification/\")\n",
"sys.path.append(\"week3/C1W3A1\")\n",
"sys.path.append(\"week3/OptionalLabs\")\n",
"\n",
"from plt_overfit import overfit_example, output\n",
"from lab_utils_common import sigmoid\n",
"np.set_printoptions(precision=5)"
]
},
{
"cell_type": "markdown",
"id": "6fa743d1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Adding regularization\n",
"\n",
"\n",
"\n",
"The slides above show the cost and gradient functions for both linear and logistic regression. Note:\n",
"- Cost\n",
" - The cost functions differ significantly between linear and logistic regression, but adding regularization to the equations is the same.\n",
"- Gradient\n",
" - The gradient functions for linear and logistic regression are very similar. They differ only in the implementation of $f_{wb}$."
]
},
{
"cell_type": "markdown",
"id": "49ea942b",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Cost functions with regularization\n",
"###### Cost function for regularized linear regression\n",
"\n",
"The equation for the cost function regularized linear regression is:\n",
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 + \\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2 \\tag{1}$$ \n",
"where:\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{2} $$ \n",
"\n",
"\n",
"Compare this to the cost function without regularization (which you implemented in a previous lab), which is of the form:\n",
"\n",
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 $$ \n",
"\n",
"The difference is the regularization term, \n",
" $\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$ \n",
" \n",
"Including this term encourages gradient descent to minimize the size of the parameters. Note, in this example, the parameter $b$ is not regularized. This is standard practice.\n",
"\n",
"Below is an implementation of equations (1) and (2). Note that this uses a *standard pattern for this course*, a `for loop` over all `m` examples."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6e3a2d3",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_cost_linear_reg(X, y, w, b, lambda_=1):\n",
"    \"\"\"\n",
"    Computes the cost over all examples\n",
"    Args:\n",
"      X (ndarray (m,n)): Data, m examples with n features\n",
"      y (ndarray (m,)): target values\n",
"      w (ndarray (n,)): model parameters\n",
"      b (scalar)      : model parameter\n",
"      lambda_ (scalar): Controls amount of regularization\n",
"    Returns:\n",
"      total_cost (scalar): cost\n",
"    \"\"\"\n",
"\n",
"    m = X.shape[0]\n",
"    n = len(w)\n",
"    cost = 0.\n",
"    for i in range(m):\n",
"        f_wb_i = np.dot(X[i], w) + b             #(n,)(n,)=scalar, see np.dot\n",
"        cost = cost + (f_wb_i - y[i])**2         #scalar\n",
"    cost = cost / (2 * m)                        #scalar\n",
"\n",
"    reg_cost = 0\n",
"    for j in range(n):\n",
"        reg_cost += (w[j]**2)                    #scalar\n",
"    reg_cost = (lambda_/(2*m)) * reg_cost        #scalar\n",
"\n",
"    total_cost = cost + reg_cost                 #scalar\n",
"    return total_cost                            #scalar"
]
},
{
"cell_type": "markdown",
"id": "f2fd183e",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Run the cell below to see it in action."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5281d9a1",
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"np.random.seed(1)\n",
"X_tmp = np.random.rand(5,6)\n",
"y_tmp = np.array([0,1,0,1,0])\n",
"w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5\n",
"b_tmp = 0.5\n",
"lambda_tmp = 0.7\n",
"cost_tmp = compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
"\n",
"print(\"Regularized cost:\", cost_tmp)"
]
},
{
"cell_type": "markdown",
"id": "6a21f014",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected Output**:\n",
"<table>\n",
"  <tr>\n",
"    <td> <b>Regularized cost: </b> 0.07917239320214275 </td>\n",
"  </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "fda319e0",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"###### Cost function for regularized logistic regression\n",
"For regularized **logistic** regression, the cost function is of the form\n",
"$$J(\\mathbf{w},b) = \\frac{1}{m} \\sum_{i=0}^{m-1} \\left[ -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\right] + \\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2 \\tag{3}$$\n",
"where:\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = sigmoid(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b) \\tag{4} $$ \n",
"\n",
"Compare this to the cost function without regularization (which you implemented in a previous lab):\n",
"\n",
"$$ J(\\mathbf{w},b) = \\frac{1}{m}\\sum_{i=0}^{m-1} \\left[ -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\\right] $$\n",
"\n",
"As was the case in linear regression above, the difference is the regularization term, which is \n",
" $\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$ \n",
"\n",
"Including this term encourages gradient descent to minimize the size of the parameters. Note, in this example, the parameter $b$ is not regularized. This is standard practice. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4641aca8",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_cost_logistic_reg(X, y, w, b, lambda_=1):\n",
"    \"\"\"\n",
"    Computes the cost over all examples\n",
"    Args:\n",
"      X (ndarray (m,n)): Data, m examples with n features\n",
"      y (ndarray (m,)): target values\n",
"      w (ndarray (n,)): model parameters\n",
"      b (scalar)      : model parameter\n",
"      lambda_ (scalar): Controls amount of regularization\n",
"    Returns:\n",
"      total_cost (scalar): cost\n",
"    \"\"\"\n",
"\n",
"    m, n = X.shape\n",
"    cost = 0.\n",
"    for i in range(m):\n",
"        z_i = np.dot(X[i], w) + b                #(n,)(n,)=scalar, see np.dot\n",
"        f_wb_i = sigmoid(z_i)                    #scalar\n",
"        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)   #scalar\n",
"\n",
"    cost = cost/m                                #scalar\n",
"\n",
"    reg_cost = 0\n",
"    for j in range(n):\n",
"        reg_cost += (w[j]**2)                    #scalar\n",
"    reg_cost = (lambda_/(2*m)) * reg_cost        #scalar\n",
"\n",
"    total_cost = cost + reg_cost                 #scalar\n",
"    return total_cost                            #scalar"
]
},
{
"cell_type": "markdown",
"id": "ed8316cf",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Run the cell below to see it in action."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3e592cf7",
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"np.random.seed(1)\n",
"X_tmp = np.random.rand(5,6)\n",
"y_tmp = np.array([0,1,0,1,0])\n",
"w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5\n",
"b_tmp = 0.5\n",
"lambda_tmp = 0.7\n",
"cost_tmp = compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
"\n",
"print(\"Regularized cost:\", cost_tmp)"
]
},
{
"cell_type": "markdown",
"id": "9959f568",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected Output**:\n",
"<table>\n",
"  <tr>\n",
"    <td> <b>Regularized cost: </b> 0.6850849138741673 </td>\n",
"  </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "4531d5d5",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Gradient descent with regularization\n",
"The basic algorithm for running gradient descent does not change with regularization. It is:\n",
"$$\\begin{align*}\n",
"&\\text{repeat until convergence:} \\; \\lbrace \\\\\n",
"& \\; \\; \\;w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j := 0..n-1} \\\\ \n",
"& \\; \\; \\; \\; \\;b = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\\\\n",
"&\\rbrace\n",
"\\end{align*}$$\n",
"Where each iteration performs simultaneous updates on $w_j$ for all $j$.\n",
"\n",
"What changes with regularization is computing the gradients."
]
},
{
"cell_type": "markdown",
"id": "7936754c",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Computing the Gradient with regularization (both linear/logistic)\n",
"The gradient calculations for linear and logistic regression are nearly identical, differing only in the computation of $f_{\\mathbf{w},b}$.\n",
"$$\\begin{align*}\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} + \\frac{\\lambda}{m} w_j \\tag{2} \\\\\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3} \n",
"\\end{align*}$$\n",
"\n",
"* m is the number of training examples in the data set \n",
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target\n",
"\n",
"\n",
"* For a **linear** regression model  \n",
"    $f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b$  \n",
"* For a **logistic** regression model  \n",
"    $z = \\mathbf{w} \\cdot \\mathbf{x} + b$  \n",
"    $f_{\\mathbf{w},b}(\\mathbf{x}) = g(z)$  \n",
"    where $g(z)$ is the sigmoid function:  \n",
"    $g(z) = \\frac{1}{1+e^{-z}}$  \n",
"\n",
"The term that adds regularization is $\\frac{\\lambda}{m} w_j$."
]
},
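{
"cell_type": "markdown",
"id": "b7e41c90",
"metadata": {},
"source": [
"As a quick check on where the extra gradient term comes from (a short worked derivation, not part of the original lab), differentiate the regularization term of the cost with respect to a single weight $w_j$. Only the $k = j$ term of the sum depends on $w_j$:\n",
"$$\\frac{\\partial}{\\partial w_j} \\left( \\frac{\\lambda}{2m} \\sum_{k=0}^{n-1} w_k^2 \\right) = \\frac{\\lambda}{2m} \\cdot 2 w_j = \\frac{\\lambda}{m} w_j$$\n",
"\n",
"Substituting the regularized gradient into the update for $w_j$ and grouping the $w_j$ terms gives\n",
"$$w_j = w_j \\left(1 - \\alpha \\frac{\\lambda}{m}\\right) - \\alpha \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)}$$\n",
"so, assuming $0 < \\alpha \\frac{\\lambda}{m} < 1$, each step first shrinks $w_j$ slightly and then applies the usual unregularized update. The bias $b$ does not appear in the regularization term, so its gradient is unchanged."
]
},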
{
"cell_type": "markdown",
"id": "dbc80d53",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Gradient function for regularized linear regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56709277",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_gradient_linear_reg(X, y, w, b, lambda_):\n",
"    \"\"\"\n",
"    Computes the gradient for linear regression\n",
"    Args:\n",
"      X (ndarray (m,n)): Data, m examples with n features\n",
"      y (ndarray (m,)): target values\n",
"      w (ndarray (n,)): model parameters\n",
"      b (scalar)      : model parameter\n",
"      lambda_ (scalar): Controls amount of regularization\n",
"\n",
"    Returns:\n",
"      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.\n",
"      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.\n",
"    \"\"\"\n",
"    m, n = X.shape           #(number of examples, number of features)\n",
"    dj_dw = np.zeros((n,))\n",
"    dj_db = 0.\n",
"\n",
"    for i in range(m):\n",
"        err = (np.dot(X[i], w) + b) - y[i]\n",
"        for j in range(n):\n",
"            dj_dw[j] = dj_dw[j] + err * X[i, j]\n",
"        dj_db = dj_db + err\n",
"    dj_dw = dj_dw / m\n",
"    dj_db = dj_db / m\n",
"\n",
"    for j in range(n):\n",
"        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]\n",
"\n",
"    return dj_db, dj_dw"
]
},
{
"cell_type": "markdown",
"id": "bacf0f56",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Run the cell below to see it in action."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b729786",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"np.random.seed(1)\n",
"X_tmp = np.random.rand(5,3)\n",
"y_tmp = np.array([0,1,0,1,0])\n",
"w_tmp = np.random.rand(X_tmp.shape[1])\n",
"b_tmp = 0.5\n",
"lambda_tmp = 0.7\n",
"dj_db_tmp, dj_dw_tmp = compute_gradient_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
"\n",
"print(f\"dj_db: {dj_db_tmp}\", )\n",
"print(f\"Regularized dj_dw:\\n {dj_dw_tmp.tolist()}\", )"
]
},
{
"cell_type": "markdown",
"id": "f753293f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected Output**\n",
"```\n",
"dj_db: 0.6648774569425726\n",
"Regularized dj_dw:\n",
" [0.29653214748822276, 0.4911679625918033, 0.21645877535865857]\n",
" ```"
]
},
{
"cell_type": "markdown",
"id": "fd9ea7e5",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Gradient function for regularized logistic regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "04553ad7",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_gradient_logistic_reg(X, y, w, b, lambda_):\n",
"    \"\"\"\n",
"    Computes the gradient for logistic regression\n",
"\n",
"    Args:\n",
"      X (ndarray (m,n)): Data, m examples with n features\n",
"      y (ndarray (m,)): target values\n",
"      w (ndarray (n,)): model parameters\n",
"      b (scalar)      : model parameter\n",
"      lambda_ (scalar): Controls amount of regularization\n",
"    Returns:\n",
"      dj_db (scalar)            : The gradient of the cost w.r.t. the parameter b.\n",
"      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w.\n",
"    \"\"\"\n",
"    m, n = X.shape\n",
"    dj_dw = np.zeros((n,))                       #(n,)\n",
"    dj_db = 0.0                                  #scalar\n",
"\n",
"    for i in range(m):\n",
"        f_wb_i = sigmoid(np.dot(X[i], w) + b)    #(n,)(n,)=scalar\n",
"        err_i = f_wb_i - y[i]                    #scalar\n",
"        for j in range(n):\n",
"            dj_dw[j] = dj_dw[j] + err_i * X[i, j]   #scalar\n",
"        dj_db = dj_db + err_i\n",
"    dj_dw = dj_dw/m                              #(n,)\n",
"    dj_db = dj_db/m                              #scalar\n",
"\n",
"    for j in range(n):\n",
"        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]\n",
"\n",
"    return dj_db, dj_dw\n"
]
},
{
"cell_type": "markdown",
"id": "b58a224a",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Run the cell below to see it in action."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7747ecab",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"np.random.seed(1)\n",
"X_tmp = np.random.rand(5,3)\n",
"y_tmp = np.array([0,1,0,1,0])\n",
"w_tmp = np.random.rand(X_tmp.shape[1])\n",
"b_tmp = 0.5\n",
"lambda_tmp = 0.7\n",
"dj_db_tmp, dj_dw_tmp = compute_gradient_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
"\n",
"print(f\"dj_db: {dj_db_tmp}\", )\n",
"print(f\"Regularized dj_dw:\\n {dj_dw_tmp.tolist()}\", )"
]
},
{
"cell_type": "markdown",
"id": "00f3b856",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"**Expected Output**\n",
"```\n",
"dj_db: 0.341798994972791\n",
"Regularized dj_dw:\n",
" [0.17380012933994293, 0.32007507881566943, 0.10776313396851499]\n",
" ```"
]
},
{
"cell_type": "markdown",
"id": "442e5abd",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Rerun over-fitting example"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6464dc89",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"plt.close(\"all\")\n",
"display(output)\n",
"ofit = overfit_example(True)"
]
},
{
"cell_type": "markdown",
"id": "5817863b",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"In the plot above, try out regularization on the previous example. In particular:\n",
"- Categorical (logistic regression)\n",
"    - set degree to 6, lambda to 0 (no regularization), fit the data\n",
"    - now set lambda to 1 (increase regularization), fit the data, and notice the difference.\n",
"- Regression (linear regression)\n",
"    - try the same procedure."
]
},
{
"cell_type": "markdown",
"id": "8e73bb4f",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You have:\n",
"- examples of cost and gradient routines with regularization added for both linear and logistic regression\n",
"- developed some intuition on how regularization can reduce over-fitting"
]
},
{
"cell_type": "markdown",
"id": "1c887ea3",
"metadata": {},
"source": [
"### Practice Quiz "
]
},
{
"cell_type": "markdown",
"id": "a79ab212",
"metadata": {},
"source": [
"#### Quiz-1 "
]
},
{
"cell_type": "markdown",
"id": "268b0a6d",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "f97644cb",
"metadata": {},
"source": [
"### Assignment W3: \n"
]
},
{
"cell_type": "markdown",
"id": "990eb3c7",
"metadata": {},
"source": [
"##### Logistic Regression\n",
"\n",
"In this exercise, you will implement logistic regression and apply it to two different datasets. \n",
"\n",
"\n",
"###### Outline\n",
"- [1 - Packages](#1)\n",
"- [2 - Logistic Regression](#2)\n",
"  - [2.1 Problem Statement](#2.1)\n",
"  - [2.2 Loading and visualizing the data](#2.2)\n",
"  - [2.3 Sigmoid function](#2.3)\n",
"  - [2.4 Cost function for logistic regression](#2.4)\n",
"  - [2.5 Gradient for logistic regression](#2.5)\n",
"  - [2.6 Learning parameters using gradient descent](#2.6)\n",
"  - [2.7 Plotting the decision boundary](#2.7)\n",
"  - [2.8 Evaluating logistic regression](#2.8)\n",
"- [3 - Regularized Logistic Regression](#3)\n",
"  - [3.1 Problem Statement](#3.1)\n",
"  - [3.2 Loading and visualizing the data](#3.2)\n",
"  - [3.3 Feature mapping](#3.3)\n",
"  - [3.4 Cost function for regularized logistic regression](#3.4)\n",
"  - [3.5 Gradient for regularized logistic regression](#3.5)\n",
"  - [3.6 Learning parameters using gradient descent](#3.6)\n",
"  - [3.7 Plotting the decision boundary](#3.7)\n",
"  - [3.8 Evaluating regularized logistic regression model](#3.8)\n"
]
},
{
"cell_type": "markdown",
"id": "c7d0ad3c",
"metadata": {},
"source": [
"#### 1 - Packages \n",
"\n",
"\n",
"First, let's run the cell below to import all the packages that you will need during this assignment.\n",
"- [numpy](https://www.numpy.org) is the fundamental package for scientific computing with Python.\n",
"- [matplotlib](http://matplotlib.org) is a widely used library for plotting graphs in Python.\n",
"- ``utils.py`` contains helper functions for this assignment. You do not need to modify code in this file."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a5c62d8a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'/home/amit/my_web/Machine-Learning-Andrew-Ng/source/source_files/Supervised_Machine_Learning_Regression_and_Classification'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5c2efd93",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import subprocess\n",
"import sys\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from pathlib import Path\n",
"home_path = str(Path.home())\n",
"proj_path=home_path+\"/my_web/Machine-Learning-Andrew-Ng/source/source_files/Supervised_Machine_Learning_Regression_and_Classification\"\n",
"sys.path.append(f\"{proj_path}/week3/C1W3A1\")\n",
"#os.chdir(proj_path)\n",
"from utils import *\n",
"import copy\n",
"import math\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"id": "2610c7f7",
"metadata": {},
"source": [
"#### 2 - Logistic Regression\n",
"\n",
"\n",
"In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.\n",
"\n",
"##### 2.1 Problem Statement\n",
"\n",
"\n",
"Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. \n",
"* You have historical data from previous applicants that you can use as a training set for logistic regression. \n",
"* For each training example, you have the applicant’s scores on two exams and the admissions decision. \n",
"* Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams. \n",
"\n",
"##### 2.2 Loading and visualizing the data\n",
"\n",
"\n",
"You will start by loading the dataset for this task. \n",
"- The `load_data()` function shown below loads the data into variables `X_train` and `y_train`\n",
"  - `X_train` contains the scores on the two exams for each student\n",
"  - `y_train` is the admission decision\n",
"      - `y_train = 1` if the student was admitted\n",
"      - `y_train = 0` if the student was not admitted\n",
"  - Both `X_train` and `y_train` are numpy arrays.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "15b6df01",
"metadata": {},
"outputs": [],
"source": [
"# load dataset\n",
"X_train, y_train = load_data(\"week3/C1W3A1/data/ex2data1.txt\")"
]
},
{
"cell_type": "markdown",
"id": "0b1f0d8a",
"metadata": {},
"source": [
"###### View the variables\n",
"Let's get more familiar with your dataset. \n",
"- A good place to start is to just print out each variable and see what it contains.\n",
"\n",
"The code below prints the first five values of `X_train` and the type of the variable."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "701d523f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First five elements in X_train are:\n",
" [[34.62365962 78.02469282]\n",
" [30.28671077 43.89499752]\n",
" [35.84740877 72.90219803]\n",
" [60.18259939 86.3085521 ]\n",
" [79.03273605 75.34437644]]\n",
"Type of X_train: <class 'numpy.ndarray'>\n"
]
}
],
"source": [
"print(\"First five elements in X_train are:\\n\", X_train[:5])\n",
"print(\"Type of X_train:\",type(X_train))"
]
},
{
"cell_type": "markdown",
"id": "78850227",
"metadata": {},
"source": [
"Now print the first five values of `y_train`"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e93bae7c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First five elements in y_train are:\n",
" [0. 0. 0. 1. 1.]\n",
"Type of y_train: <class 'numpy.ndarray'>\n"
]
}
],
"source": [
"print(\"First five elements in y_train are:\\n\", y_train[:5])\n",
"print(\"Type of y_train:\",type(y_train))"
]
},
{
"cell_type": "markdown",
"id": "4f7d0260",
"metadata": {},
"source": [
"###### Check the dimensions of your variables\n",
"\n",
"Another useful way to get familiar with your data is to view its dimensions. Let's print the shape of `X_train` and `y_train` and see how many training examples we have in our dataset."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9a2a991c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The shape of X_train is: (100, 2)\n",
"The shape of y_train is: (100,)\n",
"We have m = 100 training examples\n"
]
}
],
"source": [
"print ('The shape of X_train is: ' + str(X_train.shape))\n",
"print ('The shape of y_train is: ' + str(y_train.shape))\n",
"print ('We have m = %d training examples' % (len(y_train)))"
]
},
{
"cell_type": "markdown",
"id": "588956c1",
"metadata": {},
"source": [
"###### Visualize your data\n",
"\n",
"Before starting to implement any learning algorithm, it is always good to visualize the data if possible.\n",
"- The code below displays the data on a 2D plot (as shown below), where the axes are the two exam scores, and the positive and negative examples are shown with different markers.\n",
"- We use a helper function in the ``utils.py`` file to generate this plot. \n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "f49c2b82",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot examples\n",
"plot_data(X_train, y_train[:], pos_label=\"Admitted\", neg_label=\"Not admitted\")\n",
"\n",
"# Set the y-axis label\n",
"plt.ylabel('Exam 2 score') \n",
"# Set the x-axis label\n",
"plt.xlabel('Exam 1 score') \n",
"plt.legend(loc=\"upper right\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "734e4cff",
"metadata": {},
"source": [
"Your goal is to build a logistic regression model to fit this data.\n",
"- With this model, you can then predict if a new student will be admitted based on their scores on the two exams."
]
},
{
"cell_type": "markdown",
"id": "41a18ab6",
"metadata": {},
"source": [
"##### 2.3 Sigmoid function\n",
"\n",
"\n",
"Recall that for logistic regression, the model is represented as\n",
"\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = g(\\mathbf{w}\\cdot \\mathbf{x} + b)$$\n",
"where function $g$ is the sigmoid function. The sigmoid function is defined as:\n",
"\n",
"$$g(z) = \\frac{1}{1+e^{-z}}$$\n",
"\n",
"Let's implement the sigmoid function first, so it can be used by the rest of this assignment.\n",
"\n",
"###### Exercise 1\n",
"\n",
"\n",
"Please complete the `sigmoid` function to calculate\n",
"\n",
"$$g(z) = \\frac{1}{1+e^{-z}}$$\n",
"\n",
"Note that \n",
"- `z` is not always a single number, but can also be an array of numbers. \n",
"- If the input is an array of numbers, we'd like to apply the sigmoid function to each value in the input array.\n",
"\n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "b21a4ef0",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C1\n",
"# GRADED FUNCTION: sigmoid\n",
"\n",
"def sigmoid(z):\n",
"    \"\"\"\n",
"    Compute the sigmoid of z\n",
"\n",
"    Args:\n",
"        z (ndarray): A scalar or numpy array of any size.\n",
"\n",
"    Returns:\n",
"        g (ndarray): sigmoid(z), with the same shape as z\n",
"\n",
"    \"\"\"\n",
"\n",
"    ### START CODE HERE ###\n",
"    g = 1/(1+np.exp(-z))\n",
"    ### END CODE HERE ###\n",
"\n",
"    return g"
]
},
{
"cell_type": "markdown",
"id": "b63b9554",
"metadata": {},
"source": [
"<details>\n",
"  <summary>Click for hints</summary>\n",
"\n",
"`numpy` has a function called [`np.exp()`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), which offers a convenient way to calculate the exponential ($e^{z}$) of all elements in the input array (`z`).\n",
"\n",
"<details>\n",
"  <summary>Click for more hints</summary>\n",
"\n",
"- You can translate $e^{-z}$ into code as `np.exp(-z)`\n",
"\n",
"- You can translate $1/e^{-z}$ into code as `1/np.exp(-z)`\n",
"\n",
"If you're still stuck, you can check the hints presented below to figure out how to calculate `g`\n",
"\n",
"<details>\n",
"  <summary>Hint to calculate g</summary>\n",
"  <code>g = 1 / (1 + np.exp(-z))</code>\n",
"</details>\n",
"\n",
"</details>\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "6f2932ee",
"metadata": {},
"source": [
"When you are finished, try testing a few values by calling `sigmoid(x)` in the cell below. \n",
"- For large positive values of x, the sigmoid should be close to 1, while for large negative values, the sigmoid should be close to 0. \n",
"- Evaluating `sigmoid(0)` should give you exactly 0.5. \n"
]
},
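{
"cell_type": "markdown",
"id": "c2d85a13",
"metadata": {},
"source": [
"These checks follow directly from the definition of $g$ (a short worked sketch, added for reference):\n",
"$$g(0) = \\frac{1}{1+e^{0}} = \\frac{1}{2}, \\qquad \\lim_{z \\to \\infty} g(z) = 1, \\qquad \\lim_{z \\to -\\infty} g(z) = 0$$\n",
"\n",
"The sigmoid is also symmetric about the point $(0, \\tfrac{1}{2})$:\n",
"$$g(-z) = \\frac{1}{1+e^{z}} = \\frac{e^{-z}}{1+e^{-z}} = 1 - g(z)$$\n",
"so, for example, $g(1) \\approx 0.7311$ implies $g(-1) \\approx 0.2689$."
]
},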
{
"cell_type": "code",
"execution_count": 22,
"id": "2e8ef3d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sigmoid(0) = 0.5\n"
]
}
],
"source": [
"print (\"sigmoid(0) = \" + str(sigmoid(0)))"
]
},
{
"cell_type": "markdown",
"id": "c07b16cd",
"metadata": {},
"source": [
"**Expected Output**:\n",
"<table>\n",
"  <tr>\n",
"    <td> <b>sigmoid(0)</b></td>\n",
"    <td> 0.5 </td>\n",
"  </tr>\n",
"</table>\n",
" \n",
"- As mentioned before, your code should also work with vectors and matrices. For a matrix, your function should perform the sigmoid function on every element."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "42b4570d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sigmoid([ -1, 0, 1, 2]) = [0.26894142 0.5 0.73105858 0.88079708]\n",
"\u001b[92mAll tests passed!\n"
]
}
],
"source": [
"print (\"sigmoid([ -1, 0, 1, 2]) = \" + str(sigmoid(np.array([-1, 0, 1, 2]))))\n",
"\n",
"# UNIT TESTS\n",
"from public_tests import *\n",
"sigmoid_test(sigmoid)"
]
},
{
"cell_type": "markdown",
"id": "90682034",
"metadata": {},
"source": [
"**Expected Output**:\n",
"<table>\n",
"  <tr>\n",
"    <td><b>sigmoid([-1, 0, 1, 2])</b></td>\n",
"    <td>[0.26894142 0.5 0.73105858 0.88079708]</td>\n",
"  </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "f05dce11",
"metadata": {},
"source": [
"##### 2.4 Cost function for logistic regression\n",
"\n",
"\n",
"In this section, you will implement the cost function for logistic regression.\n",
"\n",
"###### Exercise 2\n",
"\n",
"\n",
"Please complete the `compute_cost` function using the equations below.\n",
"\n",
"Recall that for logistic regression, the cost function is of the form \n",
"\n",
"$$ J(\\mathbf{w},b) = \\frac{1}{m}\\sum_{i=0}^{m-1} \\left[ loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) \\right] \\tag{1}$$\n",
"\n",
"where\n",
"* m is the number of training examples in the dataset\n",
"\n",
"\n",
"* $loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)})$ is the loss for a single data point:\n",
"\n",
"    $$loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\tag{2}$$\n",
"\n",
"\n",
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label\n",
"\n",
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b)$ where function $g$ is the sigmoid function.\n",
"    * It might be helpful to first calculate an intermediate variable $z_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b = w_0x^{(i)}_0 + ... + w_{n-1}x^{(i)}_{n-1} + b$ where $n$ is the number of features, before calculating $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(z_{\\mathbf{w},b}(\\mathbf{x}^{(i)}))$\n",
"\n",
"Note:\n",
"* As you are doing this, remember that the variables `X_train` and `y_train` are not scalar values but arrays of shape $(m, n)$ and $(m,)$ respectively, where $n$ is the number of features and $m$ is the number of training examples.\n",
"* You can use the sigmoid function that you implemented above for this part.\n",
"\n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "dc682c2d",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C2\n",
"# GRADED FUNCTION: compute_cost\n",
"def compute_cost(X, y, w, b, lambda_= 1):\n",
"    \"\"\"\n",
"    Computes the cost over all examples\n",
"    Args:\n",
"      X : (ndarray Shape (m,n)) data, m examples by n features\n",
"      y : (array_like Shape (m,)) target value\n",
"      w : (array_like Shape (n,)) Values of parameters of the model\n",
"      b : scalar Values of bias parameter of the model\n",
"      lambda_: unused placeholder\n",
"    Returns:\n",
"      total_cost: (scalar) cost\n",
"    \"\"\"\n",
"\n",
"    m, n = X.shape\n",
"\n",
"    ### START CODE HERE ###\n",
"    cost = 0\n",
"    for i in range(m):\n",
"        z = np.dot(X[i],w) + b\n",
"        f_wb = sigmoid(z)\n",
"        cost += -y[i]*np.log(f_wb) - (1-y[i])*np.log(1-f_wb)\n",
"    total_cost = cost/m\n",
"\n",
"    ### END CODE HERE ###\n",
"\n",
"    return total_cost"
]
},
{
"cell_type": "markdown",
"id": "af06ab44",
"metadata": {},
"source": [
"<details>\n",
"  <summary>Click for hints</summary>\n",
"\n",
"* You can represent a summation operator, e.g. $h = \\sum\\limits_{i = 0}^{m-1} 2i$, in code as follows:\n",
"  ```python\n",
"  h = 0\n",
"  for i in range(m):\n",
"      h = h + 2*i\n",
"  ```\n",
"\n",
"* In this case, you can iterate over all the examples in `X` using a for loop and add the `loss` from each iteration to a variable (`loss_sum`) initialized outside the loop.\n",
"\n",
"* Then, you can return the `total_cost` as `loss_sum` divided by `m`.\n",
"\n",
"<details>\n",
"  <summary>Click for more hints</summary>\n",
"\n",
"* Here's how you can structure the overall implementation for this function\n",
"  ```python\n",
"  def compute_cost(X, y, w, b, lambda_= 1):\n",
"      m, n = X.shape\n",
"\n",
"      ### START CODE HERE ###\n",
"      loss_sum = 0\n",
"\n",
"      # Loop over each training example\n",
"      for i in range(m):\n",
"\n",
"          # First calculate z_wb = w[0]*X[i][0]+...+w[n-1]*X[i][n-1]+b\n",
"          z_wb = 0\n",
"          # Loop over each feature\n",
"          for j in range(n):\n",
"              # Add the corresponding term to z_wb\n",
"              z_wb_ij = # Your code here to calculate w[j] * X[i][j]\n",
"              z_wb += z_wb_ij # equivalent to z_wb = z_wb + z_wb_ij\n",
"          # Add the bias term to z_wb\n",
"          z_wb += b # equivalent to z_wb = z_wb + b\n",
"\n",
"          f_wb = # Your code here to calculate prediction f_wb for a training example\n",
"          loss = # Your code here to calculate loss for a training example\n",
"\n",
"          loss_sum += loss # equivalent to loss_sum = loss_sum + loss\n",
"\n",
"      total_cost = (1 / m) * loss_sum\n",
"      ### END CODE HERE ###\n",
"\n",
"      return total_cost\n",
"  ```\n",
"\n",
"If you're still stuck, you can check the hints presented below to figure out how to calculate `z_wb_ij`, `f_wb` and `loss`.\n",
"\n",
"<details>\n",
"  <summary>Hint to calculate z_wb_ij</summary>\n",
"  <code>z_wb_ij = w[j]*X[i][j]</code>\n",
"</details>\n",
"\n",
"<details>\n",
"  <summary>Hint to calculate f_wb</summary>\n",
"\n",
"$f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(z_{\\mathbf{w},b}(\\mathbf{x}^{(i)}))$ where $g$ is the sigmoid function. You can simply call the `sigmoid` function implemented above.\n",
"\n",
"<details>\n",
"  <summary>More hints to calculate f</summary>\n",
"  You can compute f_wb as <code>f_wb = sigmoid(z_wb)</code>\n",
"</details>\n",
"\n",
"</details>\n",
"\n",
"<details>\n",
"  <summary>Hint to calculate loss</summary>\n",
"  You can use the <code>np.log</code> function to calculate the log\n",
"  <details>\n",
"    <summary>More hints to calculate loss</summary>\n",
"    You can compute loss as <code>loss = -y[i] * np.log(f_wb) - (1 - y[i]) * np.log(1 - f_wb)</code>\n",
"  </details>\n",
"</details>\n",
"\n",
"</details>\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "603ffa12",
"metadata": {},
"source": [
"Run the cells below to check your implementation of the `compute_cost` function with two different initializations of the parameters $w$"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "5811e870",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cost at initial w (zeros): 0.693\n"
]
}
],
"source": [
"m, n = X_train.shape\n",
"\n",
"# Compute and display cost with w initialized to zeroes\n",
"initial_w = np.zeros(n)\n",
"initial_b = 0.\n",
"cost = compute_cost(X_train, y_train, initial_w, initial_b)\n",
"print('Cost at initial w (zeros): {:.3f}'.format(cost))"
]
},
{
"cell_type": "markdown",
"id": "1d6e1746",
"metadata": {},
"source": [
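"**Expected Output**:\n",
"<table>\n",
"  <tr>\n",
"    <td> <b>Cost at initial w (zeros)</b></td>\n",
"    <td> 0.693 </td>\n",
"  </tr>\n",
"</table>"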
]
},
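{
"cell_type": "markdown",
"id": "e90f4b76",
"metadata": {},
"source": [
"As a sanity check on the initial cost of 0.693 printed above (a short worked calculation, relying on `np.log` being the natural logarithm): with $\\mathbf{w} = \\mathbf{0}$ and $b = 0$, every example has $z = 0$, so the prediction is $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(0) = 0.5$. The loss for each example is then\n",
"$$-y^{(i)} \\log(0.5) - \\left(1 - y^{(i)}\\right) \\log(0.5) = -\\log(0.5) = \\log 2 \\approx 0.693$$\n",
"whether $y^{(i)}$ is 0 or 1. Averaging $m$ identical losses leaves the value unchanged, so $J(\\mathbf{0}, 0) = \\log 2 \\approx 0.693$ for any dataset."
]
},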
{
"cell_type": "markdown",
"id": "e6eff0f4",
"metadata": {},
"source": [
"##### 2.5 Gradient for logistic regression\n",
"\n",
"\n",
"In this section, you will implement the gradient for logistic regression.\n",
"\n",
"Recall that the gradient descent algorithm is:\n",
"\n",
"$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & b := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\; & w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j := 0..n-1}\\newline & \\rbrace\\end{align*}$$\n",
"\n",
"where, parameters $b$, $w_j$ are all updated simultaniously"
]
},
{
"cell_type": "markdown",
"id": "cfdbf87b",
"metadata": {},
"source": [
"###### Exercise 3\n",
"\n",
"\n",
"\n",
"Please complete the `compute_gradient` function to compute $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w}$, $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ from equations (2) and (3) below.\n",
"\n",
"$$\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)}) \\tag{2}\n",
"$$\n",
"$$\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)})x_{j}^{(i)} \\tag{3}\n",
"$$\n",
"* m is the number of training examples in the dataset\n",
"\n",
" \n",
"* $f_{\\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label\n",
"\n",
"\n",
"- **Note**: While this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression have different definitions of $f_{\\mathbf{w},b}(x)$.\n",
"\n",
"As before, you can use the sigmoid function that you implemented above and if you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "ec8a88b0",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C3\n",
"# GRADED FUNCTION: compute_gradient\n",
"def compute_gradient(X, y, w, b, lambda_=None): \n",
" \"\"\"\n",
" Computes the gradient for logistic regression \n",
" \n",
" Args:\n",
" X : (ndarray Shape (m,n)) variable such as house size \n",
" y : (array_like Shape (m,1)) actual value \n",
" w : (array_like Shape (n,1)) values of parameters of the model \n",
" b : (scalar) value of parameter of the model \n",
" lambda_: unused placeholder.\n",
" Returns\n",
" dj_dw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. \n",
" dj_db: (scalar) The gradient of the cost w.r.t. the parameter b. \n",
" \"\"\"\n",
" m, n = X.shape\n",
" dj_dw = np.zeros(w.shape)\n",
" dj_db = 0.\n",
"\n",
" ### START CODE HERE ### \n",
" for i in range(m):\n",
" f_wb_i = sigmoid(np.dot(X[i],w) + b) \n",
" err_i = f_wb_i - y[i] \n",
" for j in range(n):\n",
" dj_dw[j] = dj_dw[j] + err_i * X[i,j] \n",
" dj_db = dj_db + err_i\n",
" dj_dw = dj_dw/m \n",
" dj_db = dj_db/m \n",
" \n",
" ### END CODE HERE ###\n",
"\n",
" \n",
" return dj_db, dj_dw"
]
},
{
"cell_type": "markdown",
"id": "fcb8beee",
"metadata": {},
"source": [
" \n",
" Click for hints\n",
" \n",
" \n",
"* Here's how you can structure the overall implementation for this function\n",
" ```python \n",
" def compute_gradient(X, y, w, b, lambda_=None): \n",
" m, n = X.shape\n",
" dj_dw = np.zeros(w.shape)\n",
" dj_db = 0.\n",
" \n",
" ### START CODE HERE ### \n",
" for i in range(m):\n",
" # Calculate f_wb (exactly as you did in the compute_cost function above)\n",
" f_wb = \n",
" \n",
" # Calculate the gradient for b from this example\n",
" dj_db_i = # Your code here to calculate the error\n",
" \n",
" # add that to dj_db\n",
" dj_db += dj_db_i\n",
" \n",
" # get dj_dw for each attribute\n",
" for j in range(n):\n",
" # You code here to calculate the gradient from the i-th example for j-th attribute\n",
" dj_dw_ij = \n",
" dj_dw[j] += dj_dw_ij\n",
" \n",
" # divide dj_db and dj_dw by total number of examples\n",
" dj_dw = dj_dw / m\n",
" dj_db = dj_db / m\n",
" ### END CODE HERE ###\n",
" \n",
" return dj_db, dj_dw\n",
" ```\n",
" \n",
" If you're still stuck, you can check the hints presented below to figure out how to calculate `f_wb`, `dj_db_i` and `dj_dw_ij` \n",
" \n",
" \n",
" Hint to calculate f_wb\n",
" Recall that you calculated f_wb in compute_cost above — for detailed hints on how to calculate each intermediate term, check out the hints section below that exercise\n",
" \n",
" More hints to calculate f_wb\n",
" You can calculate f_wb as\n",
"
\n",
" for i in range(m): \n",
" # Calculate f_wb (exactly how you did it in the compute_cost function above)\n",
" z_wb = 0\n",
" # Loop over each feature\n",
" for j in range(n): \n",
" # Add the corresponding term to z_wb\n",
" z_wb_ij = X[i, j] * w[j]\n",
" z_wb += z_wb_ij\n",
" \n",
" # Add bias term \n",
" z_wb += b\n",
" \n",
" # Calculate the prediction from the model\n",
" f_wb = sigmoid(z_wb)\n",
"
\n",
" \n",
" \n",
" \n",
" Hint to calculate dj_db_i\n",
" You can calculate dj_db_i as dj_db_i = f_wb - y[i]\n",
" \n",
" \n",
" \n",
" Hint to calculate dj_dw_ij\n",
" You can calculate dj_dw_ij as dj_dw_ij = (f_wb - y[i])* X[i][j]\n",
" \n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "d5a86862",
"metadata": {},
"source": [
"Run the cells below to check your implementation of the `compute_gradient` function with two different initializations of the parameters $w$"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "ca500a49",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dj_db at initial w (zeros):-0.1\n",
"dj_dw at initial w (zeros):[-12.00921658929115, -11.262842205513591]\n"
]
}
],
"source": [
"# Compute and display gradient with w initialized to zeroes\n",
"initial_w = np.zeros(n)\n",
"initial_b = 0.\n",
"\n",
"dj_db, dj_dw = compute_gradient(X_train, y_train, initial_w, initial_b)\n",
"print(f'dj_db at initial w (zeros):{dj_db}' )\n",
"print(f'dj_dw at initial w (zeros):{dj_dw.tolist()}' )"
]
},
{
"cell_type": "markdown",
"id": "1643f15d",
"metadata": {},
"source": [
"**Expected Output**:\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "50b93479",
"metadata": {},
"source": [
"##### 2.6 Learning parameters using gradient descent \n",
"\n",
"\n",
"Similar to the previous assignment, you will now find the optimal parameters of a logistic regression model by using gradient descent. \n",
"- You don't need to implement anything for this part. Simply run the cells below. \n",
"\n",
"- A good way to verify that gradient descent is working correctly is to look\n",
"at the value of $J(\\mathbf{w},b)$ and check that it is decreasing with each step. \n",
"\n",
"- Assuming you have implemented the gradient and computed the cost correctly, your value of $J(\\mathbf{w},b)$ should never increase, and should converge to a steady value by the end of the algorithm."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "d27a2ef7",
"metadata": {},
"outputs": [],
"source": [
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters, lambda_): \n",
" \"\"\"\n",
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
" num_iters gradient steps with learning rate alpha\n",
" \n",
" Args:\n",
" X : (array_like Shape (m, n)\n",
" y : (array_like Shape (m,))\n",
" w_in : (array_like Shape (n,)) Initial values of parameters of the model\n",
" b_in : (scalar) Initial value of parameter of the model\n",
" cost_function: function to compute cost\n",
" alpha : (float) Learning rate\n",
" num_iters : (int) number of iterations to run gradient descent\n",
" lambda_ (scalar, float) regularization constant\n",
" \n",
" Returns:\n",
" w : (array_like Shape (n,)) Updated values of parameters of the model after\n",
" running gradient descent\n",
" b : (scalar) Updated value of parameter of the model after\n",
" running gradient descent\n",
" \"\"\"\n",
" \n",
" # number of training examples\n",
" m = len(X)\n",
" \n",
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
" J_history = []\n",
" w_history = []\n",
" \n",
" for i in range(num_iters):\n",
"\n",
" # Calculate the gradient and update the parameters\n",
" dj_db, dj_dw = gradient_function(X, y, w_in, b_in, lambda_) \n",
"\n",
" # Update Parameters using w, b, alpha and gradient\n",
" w_in = w_in - alpha * dj_dw \n",
" b_in = b_in - alpha * dj_db \n",
" \n",
" # Save cost J at each iteration\n",
" if i<100000: # prevent resource exhaustion \n",
" cost = cost_function(X, y, w_in, b_in, lambda_)\n",
" J_history.append(cost)\n",
"\n",
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
" if i% math.ceil(num_iters/10) == 0 or i == (num_iters-1):\n",
" w_history.append(w_in)\n",
" print(f\"Iteration {i:4}: Cost {float(J_history[-1]):8.2f} \")\n",
" \n",
" return w_in, b_in, J_history, w_history #return w and J,w history for graphing"
]
},
{
"cell_type": "markdown",
"id": "6a70e252",
"metadata": {},
"source": [
"Now let's run the gradient descent algorithm above to learn the parameters for our dataset.\n",
"\n",
"**Note**\n",
"\n",
"The code block below takes a couple of minutes to run, especially with a non-vectorized version. You can reduce the `iterations` to test your implementation and iterate faster. If you have time, try running 100,000 iterations for better results."
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "ea6e6ab2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0: Cost 1.01 \n",
"Iteration 1000: Cost 0.31 \n",
"Iteration 2000: Cost 0.30 \n",
"Iteration 3000: Cost 0.30 \n",
"Iteration 4000: Cost 0.30 \n",
"Iteration 5000: Cost 0.30 \n",
"Iteration 6000: Cost 0.30 \n",
"Iteration 7000: Cost 0.30 \n",
"Iteration 8000: Cost 0.30 \n",
"Iteration 9000: Cost 0.30 \n",
"Iteration 9999: Cost 0.30 \n"
]
}
],
"source": [
"np.random.seed(1)\n",
"intial_w = 0.01 * (np.random.rand(2).reshape(-1,1) - 0.5)\n",
"initial_b = -8\n",
"\n",
"\n",
"# Some gradient descent settings\n",
"iterations = 10000\n",
"alpha = 0.001\n",
"\n",
"w,b, J_history,_ = gradient_descent(X_train ,y_train, initial_w, initial_b, \n",
" compute_cost, compute_gradient, alpha, iterations, 0)"
]
},
{
"cell_type": "markdown",
"id": "70b0aac4",
"metadata": {},
"source": [
"\n",
"\n",
" Expected Output: Cost 0.30, (Click to see details):\n",
"\n",
"\n",
" # With the following settings\n",
" np.random.seed(1)\n",
" intial_w = 0.01 * (np.random.rand(2).reshape(-1,1) - 0.5)\n",
" initial_b = -8\n",
" iterations = 10000\n",
" alpha = 0.001\n",
" #\n",
"\n",
"```\n",
"Iteration 0: Cost 1.01 \n",
"Iteration 1000: Cost 0.31 \n",
"Iteration 2000: Cost 0.30 \n",
"Iteration 3000: Cost 0.30 \n",
"Iteration 4000: Cost 0.30 \n",
"Iteration 5000: Cost 0.30 \n",
"Iteration 6000: Cost 0.30 \n",
"Iteration 7000: Cost 0.30 \n",
"Iteration 8000: Cost 0.30 \n",
"Iteration 9000: Cost 0.30 \n",
"Iteration 9999: Cost 0.30 \n",
"```"
]
},
{
"cell_type": "markdown",
"id": "73bc1fc7",
"metadata": {},
"source": [
"##### 2.7 Plotting the decision boundary\n",
"\n",
"\n",
"We will now use the final parameters from gradient descent to plot the linear fit. If you implemented the previous parts correctly, you should see the following plot: \n",
"\n",
"\n",
"We will use a helper function in the `utils.py` file to create this plot."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "4f008e19",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_decision_boundary(w, b, X_train, y_train)"
]
},
{
"cell_type": "markdown",
"id": "e948ffa2",
"metadata": {},
"source": [
"##### 2.8 Evaluating logistic regression\n",
"\n",
"\n",
"We can evaluate the quality of the parameters we have found by seeing how well the learned model predicts on our training set. \n",
"\n",
"You will implement the `predict` function below to do this.\n"
]
},
{
"cell_type": "markdown",
"id": "2ce67b86",
"metadata": {},
"source": [
"##### Exercise 4\n",
"\n",
"\n",
"\n",
"Please complete the `predict` function to produce `1` or `0` predictions given a dataset and a learned parameter vector $w$ and $b$.\n",
"- First you need to compute the prediction from the model $f(x^{(i)}) = g(w \\cdot x^{(i)})$ for every example \n",
" - You've implemented this before in the parts above\n",
"- We interpret the output of the model ($f(x^{(i)})$) as the probability that $y^{(i)}=1$ given $x^{(i)}$ and parameterized by $w$.\n",
"- Therefore, to get a final prediction ($y^{(i)}=0$ or $y^{(i)}=1$) from the logistic regression model, you can use the following heuristic -\n",
"\n",
" if $f(x^{(i)}) >= 0.5$, predict $y^{(i)}=1$\n",
" \n",
" if $f(x^{(i)}) < 0.5$, predict $y^{(i)}=0$\n",
" \n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "432d08e4",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C4\n",
"# GRADED FUNCTION: predict\n",
"\n",
"def predict(X, w, b): \n",
" \"\"\"\n",
" Predict whether the label is 0 or 1 using learned logistic\n",
" regression parameters w\n",
" \n",
" Args:\n",
" X : (ndarray Shape (m, n))\n",
" w : (array_like Shape (n,)) Parameters of the model\n",
" b : (scalar, float) Parameter of the model\n",
"\n",
" Returns:\n",
" p: (ndarray (m,1))\n",
" The predictions for X using a threshold at 0.5\n",
" \"\"\"\n",
" # number of training examples\n",
" m, n = X.shape \n",
" p = np.zeros(m)\n",
" \n",
" ### START CODE HERE ### \n",
" # Loop over each example\n",
" for i in range(m): \n",
" z_wb = np.dot(X[i],w) \n",
" # Loop over each feature\n",
" for j in range(n): \n",
" # Add the corresponding term to z_wb\n",
" z_wb += 0\n",
" \n",
" # Add bias term \n",
" z_wb += b\n",
" \n",
" # Calculate the prediction for this example\n",
" f_wb = sigmoid(z_wb)\n",
"\n",
" # Apply the threshold\n",
" p[i] = 1 if f_wb>0.5 else 0\n",
" \n",
" ### END CODE HERE ### \n",
" return p"
]
},
{
"cell_type": "markdown",
"id": "ba0c79a5",
"metadata": {},
"source": [
"\n",
" Click for hints\n",
" \n",
" \n",
"* Here's how you can structure the overall implementation for this function\n",
" ```python \n",
" def predict(X, w, b): \n",
" # number of training examples\n",
" m, n = X.shape \n",
" p = np.zeros(m)\n",
" \n",
" ### START CODE HERE ### \n",
" # Loop over each example\n",
" for i in range(m): \n",
" \n",
" # Calculate f_wb (exactly how you did it in the compute_cost function above) \n",
" # using a couple of lines of code\n",
" f_wb = \n",
"\n",
" # Calculate the prediction for that training example \n",
" p[i] = # Your code here to calculate the prediction based on f_wb\n",
" \n",
" ### END CODE HERE ### \n",
" return p\n",
" ```\n",
" \n",
" If you're still stuck, you can check the hints presented below to figure out how to calculate `f_wb` and `p[i]` \n",
" \n",
" \n",
" Hint to calculate f_wb\n",
" Recall that you calculated f_wb in compute_cost above — for detailed hints on how to calculate each intermediate term, check out the hints section below that exercise\n",
" \n",
" More hints to calculate f_wb\n",
" You can calculate f_wb as\n",
"
\n",
" for i in range(m): \n",
" # Calculate f_wb (exactly how you did it in the compute_cost function above)\n",
" z_wb = 0\n",
" # Loop over each feature\n",
" for j in range(n): \n",
" # Add the corresponding term to z_wb\n",
" z_wb_ij = X[i, j] * w[j]\n",
" z_wb += z_wb_ij\n",
" \n",
" # Add bias term \n",
" z_wb += b\n",
" \n",
" # Calculate the prediction from the model\n",
" f_wb = sigmoid(z_wb)\n",
"
\n",
" \n",
" \n",
" \n",
" Hint to calculate p[i]\n",
" As an example, if you'd like to say x = 1 if y is less than 3 and 0 otherwise, you can express it in code as x = y < 3 . Now do the same for p[i] = 1 if f_wb >= 0.5 and 0 otherwise. \n",
" \n",
" More hints to calculate p[i]\n",
" You can compute p[i] as p[i] = f_wb >= 0.5\n",
" \n",
" \n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "b5cf416b",
"metadata": {},
"source": [
"Once you have completed the function `predict`, let's run the code below to report the training accuracy of your classifier by computing the percentage of examples it got correct."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "c01a976c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Output of predict: shape (4,), value [0. 1. 1. 1.]\n",
"\u001b[92mAll tests passed!\n"
]
}
],
"source": [
"# Test your predict code\n",
"np.random.seed(1)\n",
"tmp_w = np.random.randn(2)\n",
"tmp_b = 0.3 \n",
"tmp_X = np.random.randn(4, 2) - 0.5\n",
"\n",
"tmp_p = predict(tmp_X, tmp_w, tmp_b)\n",
"print(f'Output of predict: shape {tmp_p.shape}, value {tmp_p}')\n",
"\n",
"# UNIT TESTS \n",
"predict_test(predict)"
]
},
{
"cell_type": "markdown",
"id": "3c73ab64",
"metadata": {},
"source": [
"**Expected output** \n",
"\n",
"
\n",
"
\n",
"
Output of predict: shape (4,),value [0. 1. 1. 1.]
\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "eb8c0f6a",
"metadata": {},
"source": [
"Now let's use this to compute the accuracy on the training set"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "0cfe1116",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Accuracy: 92.000000\n"
]
}
],
"source": [
"#Compute accuracy on our training set\n",
"p = predict(X_train, w,b)\n",
"print('Train Accuracy: %f'%(np.mean(p == y_train) * 100))"
]
},
{
"cell_type": "markdown",
"id": "5fd9bc6e",
"metadata": {},
"source": [
"
\n",
"
\n",
"
Train Accuracy (approx):
\n",
"
92.00
\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "b46e747b",
"metadata": {},
"source": [
"#### 3 - Regularized Logistic Regression\n",
"\n",
"\n",
"In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant passes quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. \n",
"\n",
"##### 3.1 Problem Statement\n",
"\n",
"\n",
"Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. \n",
"- From these two tests, you would like to determine whether the microchips should be accepted or rejected. \n",
"- To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.\n",
"\n",
"##### 3.2 Loading and visualizing the data\n",
"\n",
"\n",
"\n",
"Similar to previous parts of this exercise, let's start by loading the dataset for this task and visualizing it. \n",
"\n",
"- The `load_dataset()` function shown below loads the data into variables `X_train` and `y_train`\n",
" - `X_train` contains the test results for the microchips from two tests\n",
" - `y_train` contains the results of the QA \n",
" - `y_train = 1` if the microchip was accepted \n",
" - `y_train = 0` if the microchip was rejected \n",
" - Both `X_train` and `y_train` are numpy arrays."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "728ac6da",
"metadata": {},
"outputs": [],
"source": [
"# load dataset\n",
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"import sys\n",
"sys.path.append(\"week3/OptionalLabs\")\n",
"sys.path.append(\"week3/C1W3A1\")\n",
"from utils import *\n",
"plt.style.use('week3/OptionalLabs/deeplearning.mplstyle')\n",
"from plt_overfit import overfit_example, output\n",
"\n",
"X_train, y_train = load_data(\"week3/C1W3A1/data/ex2data2.txt\")"
]
},
{
"cell_type": "markdown",
"id": "0365b1a4",
"metadata": {},
"source": [
"###### View the variables\n",
"\n",
"The code below prints the first five values of `X_train` and `y_train` and the type of the variables.\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "3716097e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train: [[ 0.05 0.7 ]\n",
" [-0.09 0.68]\n",
" [-0.21 0.69]\n",
" [-0.38 0.5 ]\n",
" [-0.51 0.47]]\n",
"Type of X_train: \n",
"y_train: [1. 1. 1. 1. 1.]\n",
"Type of y_train: \n"
]
}
],
"source": [
"# print X_train\n",
"print(\"X_train:\", X_train[:5])\n",
"print(\"Type of X_train:\",type(X_train))\n",
"\n",
"# print y_train\n",
"print(\"y_train:\", y_train[:5])\n",
"print(\"Type of y_train:\",type(y_train))"
]
},
{
"cell_type": "markdown",
"id": "2e174fe0",
"metadata": {},
"source": [
"###### Check the dimensions of your variables\n",
"\n",
"Another useful way to get familiar with your data is to view its dimensions. Let's print the shape of `X_train` and `y_train` and see how many training examples we have in our dataset."
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "0e6fb2f1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The shape of X_train is: (118, 2)\n",
"The shape of y_train is: (118,)\n",
"We have m = 118 training examples\n"
]
}
],
"source": [
"print ('The shape of X_train is: ' + str(X_train.shape))\n",
"print ('The shape of y_train is: ' + str(y_train.shape))\n",
"print ('We have m = %d training examples' % (len(y_train)))"
]
},
{
"cell_type": "markdown",
"id": "e68a2775",
"metadata": {},
"source": [
"###### Visualize your data\n",
"\n",
"The helper function `plot_data` (from `utils.py`) is used to generate a figure like Figure 3, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "dee26d62",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2b07931d87014b7f881c17310c7d83a5",
"version_major": 2,
"version_minor": 0
},
"image/png": "",
"text/html": [
"\n",
"
\n",
"
\n",
" Figure\n",
"
\n",
" \n",
"
\n",
" "
],
"text/plain": [
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot examples\n",
"plot_data(X_train, y_train[:], pos_label=\"Accepted\", neg_label=\"Rejected\")\n",
"\n",
"# Set the y-axis label\n",
"plt.ylabel('Microchip Test 2') \n",
"# Set the x-axis label\n",
"plt.xlabel('Microchip Test 1') \n",
"plt.legend(loc=\"upper right\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "a5b2e048",
"metadata": {},
"source": [
"Figure 3 shows that our dataset cannot be separated into positive and negative examples by a straight-line through the plot. Therefore, a straight forward application of logistic regression will not perform well on this dataset since logistic regression will only be able to find a linear decision boundary.\n"
]
},
{
"cell_type": "markdown",
"id": "f9dda84b",
"metadata": {},
"source": [
"##### 3.3 Feature mapping\n",
"\n",
"\n",
"One way to fit the data better is to create more features from each data point. In the provided function `map_feature`, we will map the features into all polynomial terms of $x_1$ and $x_2$ up to the sixth power.\n",
"\n",
"$$\\mathrm{map\\_feature}(x) = \n",
"\\left[\\begin{array}{c}\n",
"x_1\\\\\n",
"x_2\\\\\n",
"x_1^2\\\\\n",
"x_1 x_2\\\\\n",
"x_2^2\\\\\n",
"x_1^3\\\\\n",
"\\vdots\\\\\n",
"x_1 x_2^5\\\\\n",
"x_2^6\\end{array}\\right]$$\n",
"\n",
"As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 27-dimensional vector. \n",
"\n",
"- A logistic regression classifier trained on this higher-dimension feature vector will have a more complex decision boundary and will be nonlinear when drawn in our 2-dimensional plot. \n",
"- We have provided the `map_feature` function for you in utils.py. "
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "6905b1de",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original shape of data: (118, 2)\n",
"Shape after feature mapping: (118, 27)\n"
]
}
],
"source": [
"print(\"Original shape of data:\", X_train.shape)\n",
"\n",
"mapped_X = map_feature(X_train[:, 0], X_train[:, 1])\n",
"print(\"Shape after feature mapping:\", mapped_X.shape)"
]
},
{
"cell_type": "markdown",
"id": "f5cdea39",
"metadata": {},
"source": [
"Let's also print the first elements of `X_train` and `mapped_X` to see the tranformation."
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "ef06b33e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train[0]: [0.05 0.7 ]\n",
"mapped X_train[0]: [5.13e-02 7.00e-01 2.63e-03 3.59e-02 4.89e-01 1.35e-04 1.84e-03 2.51e-02\n",
" 3.42e-01 6.91e-06 9.43e-05 1.29e-03 1.76e-02 2.39e-01 3.54e-07 4.83e-06\n",
" 6.59e-05 9.00e-04 1.23e-02 1.68e-01 1.82e-08 2.48e-07 3.38e-06 4.61e-05\n",
" 6.29e-04 8.59e-03 1.17e-01]\n"
]
}
],
"source": [
"print(\"X_train[0]:\", X_train[0])\n",
"print(\"mapped X_train[0]:\", mapped_X[0])"
]
},
{
"cell_type": "markdown",
"id": "79288cae",
"metadata": {},
"source": [
"While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. In the next parts of the exercise, you will implement regularized logistic regression to fit the data and also see for yourself how regularization can help combat the overfitting problem.\n",
"\n",
"##### 3.4 Cost function for regularized logistic regression\n",
"\n",
"\n",
"In this part, you will implement the cost function for regularized logistic regression.\n",
"\n",
"Recall that for regularized logistic regression, the cost function is of the form\n",
"$$J(\\mathbf{w},b) = \\frac{1}{m} \\sum_{i=0}^{m-1} \\left[ -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\right] + \\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$$\n",
"\n",
"Compare this to the cost function without regularization (which you implemented above), which is of the form \n",
"\n",
"$$ J(\\mathbf{w}.b) = \\frac{1}{m}\\sum_{i=0}^{m-1} \\left[ (-y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\\right]$$\n",
"\n",
"The difference is the regularization term, which is $$\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$$ \n",
"Note that the $b$ parameter is not regularized."
]
},
{
"cell_type": "markdown",
"id": "786f8919",
"metadata": {},
"source": [
"###### Exercise 5\n",
"\n",
"\n",
"Please complete the `compute_cost_reg` function below to calculate the following term for each element in $w$ \n",
"$$\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$$\n",
"\n",
"The starter code then adds this to the cost without regularization (which you computed above in `compute_cost`) to calculate the cost with regulatization.\n",
"\n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "137c8eaa",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C5\n",
"def compute_cost_reg(X, y, w, b, lambda_ = 1):\n",
" \"\"\"\n",
" Computes the cost over all examples\n",
" Args:\n",
" X : (array_like Shape (m,n)) data, m examples by n features\n",
" y : (array_like Shape (m,)) target value \n",
" w : (array_like Shape (n,)) Values of parameters of the model \n",
" b : (array_like Shape (n,)) Values of bias parameter of the model\n",
" lambda_ : (scalar, float) Controls amount of regularization\n",
" Returns:\n",
" total_cost: (scalar) cost \n",
" \"\"\"\n",
"\n",
" m, n = X.shape\n",
" \n",
" # Calls the compute_cost function that you implemented above\n",
" cost_without_reg = compute_cost(X, y, w, b) \n",
" \n",
" # You need to calculate this value\n",
" reg_cost = 0.\n",
" \n",
" ### START CODE HERE ###\n",
" reg_cost = sum(np.square(w))\n",
" ### END CODE HERE ### \n",
" \n",
" # Add the regularization cost to get the total cost\n",
" total_cost = cost_without_reg + (lambda_/(2 * m)) * reg_cost\n",
"\n",
" return total_cost"
]
},
{
"cell_type": "markdown",
"id": "fa69610d",
"metadata": {},
"source": [
"\n",
" Click for hints\n",
" \n",
" \n",
"* Here's how you can structure the overall implementation for this function\n",
" ```python \n",
" def compute_cost_reg(X, y, w, b, lambda_ = 1):\n",
" \n",
" m, n = X.shape\n",
" \n",
" # Calls the compute_cost function that you implemented above\n",
" cost_without_reg = compute_cost(X, y, w, b) \n",
" \n",
" # You need to calculate this value\n",
" reg_cost = 0.\n",
" \n",
" ### START CODE HERE ###\n",
" for j in range(n):\n",
" reg_cost_j = # Your code here to calculate the cost from w[j]\n",
" reg_cost = reg_cost + reg_cost_j\n",
"\n",
" ### END CODE HERE ### \n",
" \n",
" # Add the regularization cost to get the total cost\n",
" total_cost = cost_without_reg + (lambda_/(2 * m)) * reg_cost\n",
"\n",
" return total_cost\n",
" ```\n",
" \n",
" If you're still stuck, you can check the hints presented below to figure out how to calculate `reg_cost_j` \n",
" \n",
" \n",
" Hint to calculate reg_cost_j\n",
" You can use calculate reg_cost_j as reg_cost_j = w[j]**2 \n",
" \n",
" \n",
" \n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "b22ef148",
"metadata": {},
"source": [
"##### 3.5 Gradient for regularized logistic regression\n",
"\n",
"\n",
"In this section, you will implement the gradient for regularized logistic regression.\n",
"\n",
"\n",
"The gradient of the regularized cost function has two components. The first, $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ is a scalar, the other is a vector with the same shape as the parameters $\\mathbf{w}$, where the $j^\\mathrm{th}$ element is defined as follows:\n",
"\n",
"$$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m} \\sum_{i=0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) $$\n",
"\n",
"$$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} = \\left( \\frac{1}{m} \\sum_{i=0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) x_j^{(i)} \\right) + \\frac{\\lambda}{m} w_j \\quad\\, \\mbox{for $j=0...(n-1)$}$$\n",
"\n",
"Compare this to the gradient of the cost function without regularization (which you implemented above), which is of the form \n",
"$$\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)}) \\tag{2}\n",
"$$\n",
"$$\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)})x_{j}^{(i)} \\tag{3}\n",
"$$\n",
"\n",
"\n",
"As you can see,$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ is the same, the difference is the following term in $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w}$, which is $$\\frac{\\lambda}{m} w_j \\quad\\, \\mbox{for $j=0...(n-1)$}$$ \n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "61898e54",
"metadata": {},
"source": [
"###### Exercise 6\n",
"\n",
"\n",
"Please complete the `compute_gradient_reg` function below to modify the code below to calculate the following term\n",
"\n",
"$$\\frac{\\lambda}{m} w_j \\quad\\, \\mbox{for $j=0...(n-1)$}$$\n",
"\n",
"The starter code will add this term to the $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w}$ returned from `compute_gradient` above to get the gradient for the regularized cost function.\n",
"\n",
"\n",
"If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "76cef830",
"metadata": {},
"outputs": [],
"source": [
"# UNQ_C6\n",
"def compute_gradient_reg(X, y, w, b, lambda_ = 1): \n",
"    \"\"\"\n",
"    Computes the gradient for regularized logistic regression\n",
"    \n",
"    Args:\n",
"      X : (ndarray Shape (m,n))   input data, m examples by n features\n",
"      y : (ndarray Shape (m,))    actual value \n",
"      w : (ndarray Shape (n,))    values of parameters of the model      \n",
"      b : (scalar)                value of parameter of the model  \n",
"      lambda_ : (scalar,float)    regularization constant\n",
"    Returns:\n",
"      dj_db: (scalar)             The gradient of the cost w.r.t. the parameter b. \n",
"      dj_dw: (ndarray Shape (n,)) The gradient of the cost w.r.t. the parameters w. \n",
"\n",
"    \"\"\"\n",
"    m, n = X.shape\n",
"    \n",
"    # Gradient of the unregularized cost\n",
"    dj_db, dj_dw = compute_gradient(X, y, w, b)\n",
"\n",
"    ### START CODE HERE ###     \n",
"    # Add the regularization term (lambda_/m) * w[j] to each element of dj_dw\n",
"    for j in range(n):\n",
"        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]\n",
"    ### END CODE HERE ###         \n",
"        \n",
"    return dj_db, dj_dw"
]
},
{
"cell_type": "markdown",
"id": "bab72c69",
"metadata": {},
"source": [
"\n",
" Click for hints\n",
" \n",
" \n",
"* Here's how you can structure the overall implementation for this function\n",
"    ```python \n",
"    def compute_gradient_reg(X, y, w, b, lambda_ = 1): \n",
"        m, n = X.shape\n",
"    \n",
"        dj_db, dj_dw = compute_gradient(X, y, w, b)\n",
"\n",
"        ### START CODE HERE ###     \n",
"        # Loop over the elements of w\n",
"        for j in range(n): \n",
"            \n",
"            dj_dw_j_reg = # Your code here to calculate the regularization term for dj_dw[j]\n",
"            \n",
"            # Add the regularization term to the corresponding element of dj_dw\n",
"            dj_dw[j] = dj_dw[j] + dj_dw_j_reg\n",
"        \n",
"        ### END CODE HERE ###         \n",
"        \n",
"        return dj_db, dj_dw\n",
"    ```\n",
" \n",
" If you're still stuck, you can check the hints presented below to figure out how to calculate `dj_dw_j_reg` \n",
" \n",
" \n",
" Hint to calculate dj_dw_j_reg\n",
" You can calculate dj_dw_j_reg as dj_dw_j_reg = (lambda_ / m) * w[j] \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n"
]
},
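{
"cell_type": "markdown",
"id": "e7b3f5a2",
"metadata": {},
"source": [
"The loop over `j` in the hints can also be written as one NumPy expression. The sketch below compares the two forms on made-up numbers; the helper names `add_reg_term_loop` and `add_reg_term_vectorized` are hypothetical and are not part of the lab's starter code.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def add_reg_term_loop(dj_dw, w, lambda_, m):\n",
"    # Loop version: add (lambda_/m) * w[j] to each element of a copy of dj_dw\n",
"    out = dj_dw.copy()\n",
"    for j in range(w.shape[0]):\n",
"        out[j] = out[j] + (lambda_ / m) * w[j]\n",
"    return out\n",
"\n",
"def add_reg_term_vectorized(dj_dw, w, lambda_, m):\n",
"    # Vectorized version: the same update for every j in one expression\n",
"    return dj_dw + (lambda_ / m) * w\n",
"\n",
"# Made-up values, for illustration only\n",
"dj_dw = np.array([0.10, -0.20, 0.30])\n",
"w = np.array([1.0, 2.0, -3.0])\n",
"\n",
"loop_result = add_reg_term_loop(dj_dw, w, lambda_=0.5, m=4)\n",
"vec_result = add_reg_term_vectorized(dj_dw, w, lambda_=0.5, m=4)\n",
"print(np.allclose(loop_result, vec_result))\n",
"```\n",
"\n",
"Both versions leave `dj_db` untouched, since the regularization term applies only to the weights."
]
},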
{
"cell_type": "markdown",
"id": "f05e0b4e",
"metadata": {},
"source": [
"Run the cell below to check your implementation of the `compute_gradient_reg` function."
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "5b26b0dd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dj_db: 0.07138288792343662\n",
"First few elements of regularized dj_dw:\n",
" [-0.010386028450548701, 0.011409852883280122, 0.0536273463274574, 0.003140278267313462]\n",
"\u001b[92mAll tests passed!\n"
]
}
],
"source": [
"X_mapped = map_feature(X_train[:, 0], X_train[:, 1])\n",
"np.random.seed(1) \n",
"initial_w = np.random.rand(X_mapped.shape[1]) - 0.5 \n",
"initial_b = 0.5\n",
" \n",
"lambda_ = 0.5\n",
"dj_db, dj_dw = compute_gradient_reg(X_mapped, y_train, initial_w, initial_b, lambda_)\n",
"\n",
"print(f\"dj_db: {dj_db}\", )\n",
"print(f\"First few elements of regularized dj_dw:\\n {dj_dw[:4].tolist()}\", )\n",
"\n",
"# UNIT TESTS \n",
"compute_gradient_reg_test(compute_gradient_reg)\n"
]
},
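{
"cell_type": "markdown",
"id": "f2a8c6d4",
"metadata": {},
"source": [
"Besides the unit tests, a finite-difference check is one common way to sanity-check a gradient. The sketch below is self-contained and does not use the lab's functions: `cost_reg`, `grad_reg` and `numeric_dj_dw` are hypothetical stand-ins, and the cost is assumed to be the cross-entropy loss plus $\\frac{\\lambda}{2m}\\sum_j w_j^2$.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def cost_reg(X, y, w, b, lambda_):\n",
"    # Assumed form: mean cross-entropy + lambda_/(2m) * sum(w^2)\n",
"    m = X.shape[0]\n",
"    f = sigmoid(X @ w + b)\n",
"    loss = -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))\n",
"    return loss + (lambda_ / (2 * m)) * np.sum(w ** 2)\n",
"\n",
"def grad_reg(X, y, w, b, lambda_):\n",
"    # Analytic gradient, including the (lambda_/m) * w term for the weights\n",
"    m = X.shape[0]\n",
"    err = sigmoid(X @ w + b) - y\n",
"    dj_db = np.mean(err)\n",
"    dj_dw = X.T @ err / m + (lambda_ / m) * w\n",
"    return dj_db, dj_dw\n",
"\n",
"def numeric_dj_dw(X, y, w, b, lambda_, eps=1e-6):\n",
"    # Central differences: perturb one weight at a time\n",
"    grad = np.zeros_like(w)\n",
"    for j in range(w.shape[0]):\n",
"        step = np.zeros_like(w)\n",
"        step[j] = eps\n",
"        cost_plus = cost_reg(X, y, w + step, b, lambda_)\n",
"        cost_minus = cost_reg(X, y, w - step, b, lambda_)\n",
"        grad[j] = (cost_plus - cost_minus) / (2 * eps)\n",
"    return grad\n",
"\n",
"# Random toy data, for illustration only\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(20, 3))\n",
"y = (rng.random(20) > 0.5).astype(float)\n",
"w = rng.normal(size=3)\n",
"\n",
"_, dj_dw = grad_reg(X, y, w, 0.5, lambda_=0.5)\n",
"max_diff = np.max(np.abs(dj_dw - numeric_dj_dw(X, y, w, 0.5, lambda_=0.5)))\n",
"print(max_diff)\n",
"```\n",
"\n",
"If the analytic gradient is correct, the printed difference should be very close to zero."
]
},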
{
"cell_type": "markdown",
"id": "c1d9f328",
"metadata": {},
"source": [
"**Expected Output**:\n",
""
]
},
{
"cell_type": "markdown",
"id": "0b4a8a31",
"metadata": {},
"source": [
"##### 3.6 Learning parameters using gradient descent\n",
"\n",
"\n",
"As in the previous parts, you will use the gradient descent function implemented above to learn the optimal parameters $w$, $b$. \n",
"- If you have completed the cost and gradient for regularized logistic regression correctly, you should be able to step through the next cell to learn the parameters $w$. \n",
"- After training the parameters, we will use them to plot the decision boundary. \n",
"\n",
"**Note**\n",
"\n",
"The code block below takes quite a while to run, especially with a non-vectorized version. You can reduce the `iterations` to test your implementation and iterate faster. If you have time, run for 100,000 iterations to see better results."
]
},
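{
"cell_type": "markdown",
"id": "a9d1b7c3",
"metadata": {},
"source": [
"To make the update rule concrete, here is a minimal, self-contained sketch of regularized gradient descent for logistic regression on a tiny made-up dataset. `gradient_descent_sketch` is a hypothetical stand-in; it is not the lab's `gradient_descent`, whose implementation is not shown in this section.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def gradient_descent_sketch(X, y, w, b, alpha, num_iters, lambda_):\n",
"    m = X.shape[0]\n",
"    for _ in range(num_iters):\n",
"        err = sigmoid(X @ w + b) - y\n",
"        dj_db = np.mean(err)                        # b is not regularized\n",
"        dj_dw = X.T @ err / m + (lambda_ / m) * w   # extra (lambda_/m) * w term\n",
"        w = w - alpha * dj_dw\n",
"        b = b - alpha * dj_db\n",
"    return w, b\n",
"\n",
"# Made-up training data, for illustration only\n",
"X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],\n",
"              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])\n",
"y = np.array([0., 0., 0., 1., 1., 1.])\n",
"\n",
"w, b = gradient_descent_sketch(X, y, np.zeros(2), 0.0,\n",
"                               alpha=0.1, num_iters=1000, lambda_=0.01)\n",
"print(w, b)\n",
"```\n",
"\n",
"A larger `lambda_` pulls the weights toward zero on every step, which is why varying it changes the shape of the learned decision boundary."
]
},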
{
"cell_type": "markdown",
"id": "7c0683be",
"metadata": {},
"source": [
"###### Regularized Gradient Descent"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "6e1c8da6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0: Cost 0.72 \n",
"Iteration 1000: Cost 0.59 \n",
"Iteration 2000: Cost 0.56 \n",
"Iteration 3000: Cost 0.53 \n",
"Iteration 4000: Cost 0.51 \n",
"Iteration 5000: Cost 0.50 \n",
"Iteration 6000: Cost 0.48 \n",
"Iteration 7000: Cost 0.47 \n",
"Iteration 8000: Cost 0.46 \n",
"Iteration 9000: Cost 0.45 \n",
"Iteration 9999: Cost 0.45 \n"
]
}
],
"source": [
"# Initialize fitting parameters\n",
"np.random.seed(1)\n",
"initial_w = np.random.rand(X_mapped.shape[1])-0.5\n",
"initial_b = 1.\n",
"\n",
"# Set the regularization parameter lambda_ (you can try varying this)\n",
"lambda_ = 0.01\n",
"# Some gradient descent settings\n",
"iterations = 10000\n",
"alpha = 0.01\n",
"\n",
"w,b, J_history,_ = gradient_descent(X_mapped, y_train, initial_w, initial_b, \n",
" compute_cost_reg, compute_gradient_reg, \n",
" alpha, iterations, lambda_)"
]
},
{
"cell_type": "markdown",
"id": "97f9d12d",
"metadata": {},
"source": [
"\n",
"\n",
" Expected Output: Cost < 0.5 (Click for details)\n",
"\n",
"\n",
"```\n",
"# Using the following settings\n",
"#np.random.seed(1)\n",
"#initial_w = np.random.rand(X_mapped.shape[1])-0.5\n",
"#initial_b = 1.\n",
"#lambda_ = 0.01; \n",
"#iterations = 10000\n",
"#alpha = 0.01\n",
"Iteration 0: Cost 0.72 \n",
"Iteration 1000: Cost 0.59 \n",
"Iteration 2000: Cost 0.56 \n",
"Iteration 3000: Cost 0.53 \n",
"Iteration 4000: Cost 0.51 \n",
"Iteration 5000: Cost 0.50 \n",
"Iteration 6000: Cost 0.48 \n",
"Iteration 7000: Cost 0.47 \n",
"Iteration 8000: Cost 0.46 \n",
"Iteration 9000: Cost 0.45 \n",
"Iteration 9999: Cost 0.45 \n",
" \n",
"```"
]
},
{
"cell_type": "markdown",
"id": "25ea15f9",
"metadata": {},
"source": [
"##### 3.7 Plotting the decision boundary\n",
"\n",
"\n",
"To help you visualize the model learned by this classifier, we will use our `plot_decision_boundary` function, which plots the (non-linear) decision boundary that separates the positive and negative examples. \n",
"\n",
"- The function plots the non-linear decision boundary by computing the classifier’s predictions on an evenly spaced grid and then drawing a contour where the predictions change from y = 0 to y = 1.\n",
"\n",
"- After learning the parameters $w$, $b$, the next step is to plot a decision boundary similar to Figure 4.\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "6d59cdd7",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2b07931d87014b7f881c17310c7d83a5",
"version_major": 2,
"version_minor": 0
},
"image/png": "",
"text/html": [
"Figure"
],
"text/plain": [
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_decision_boundary(w, b, X_mapped, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "a9f5c8f2",
"metadata": {},
"outputs": [],
"source": [
"def sig(z):\n",
"    # Sigmoid function\n",
"    return 1/(1+np.exp(-z))\n",
"\n",
"def plot_decision_boundary(w, b, X, y):\n",
"    # Credit to dibgerge on Github for this plotting code\n",
"\n",
"    # Plot the training examples using the first two feature columns\n",
"    plot_data(X[:, 0:2], y)\n",
"\n",
"    if X.shape[1] <= 2:\n",
"        # Linear boundary: solve w[0]*x0 + w[1]*x1 + b = 0 for x1\n",
"        plot_x = np.array([min(X[:, 0]), max(X[:, 0])])\n",
"        plot_y = (-1. / w[1]) * (w[0] * plot_x + b)\n",
"\n",
"        plt.plot(plot_x, plot_y, c=\"b\")\n",
"\n",
"    else:\n",
"        u = np.linspace(-1, 1.5, 50)\n",
"        v = np.linspace(-1, 1.5, 50)\n",
"\n",
"        z = np.zeros((len(u), len(v)))\n",
"\n",
"        # Evaluate the predicted probability sig(w . x + b) over the grid\n",
"        for i in range(len(u)):\n",
"            for j in range(len(v)):\n",
"                z[i,j] = sig(np.dot(map_feature(u[i], v[j]), w) + b)\n",
"\n",
"        # important to transpose z before calling contour\n",
"        z = z.T\n",
"\n",
"        # Plot the contour where the predicted probability equals 0.5\n",
"        plt.contour(u,v,z, levels = [0.5], colors=\"g\")\n",
"    plt.show()"
]
},
{
"cell_type": "markdown",
"id": "36468e29",
"metadata": {},
"source": [
"##### 3.8 Evaluating regularized logistic regression model\n",
"\n",
"\n",
"You will use the `predict` function that you implemented above to calculate the accuracy of the regularized logistic regression model on the training set."
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "36559ced",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Accuracy: 82.203390\n"
]
}
],
"source": [
"#Compute accuracy on the training set\n",
"p = predict(X_mapped, w, b)\n",
"\n",
"print('Train Accuracy: %f'%(np.mean(p == y_train) * 100))"
]
},
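{
"cell_type": "markdown",
"id": "b5e9d3f8",
"metadata": {},
"source": [
"The accuracy above is the fraction of examples whose thresholded prediction equals the label. The sketch below illustrates that calculation on four made-up examples; `predict_sketch` is a hypothetical stand-in that thresholds the sigmoid output at 0.5 and is not the lab's `predict`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def predict_sketch(X, w, b):\n",
"    # Predict 1 where sigmoid(X @ w + b) >= 0.5, otherwise 0\n",
"    prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))\n",
"    return (prob >= 0.5).astype(float)\n",
"\n",
"# Made-up data and parameters, for illustration only\n",
"X = np.array([[1.0, 2.0], [-1.0, -2.0], [2.0, -1.0], [-2.0, 1.0]])\n",
"y = np.array([1., 0., 1., 1.])\n",
"w = np.array([1.0, 1.0])\n",
"b = 0.0\n",
"\n",
"p = predict_sketch(X, w, b)\n",
"print(f\"Accuracy: {np.mean(p == y) * 100:.1f}%\")\n",
"```\n",
"\n",
"Here three of the four predictions match the labels. Note that `p` and `y` need the same shape for `p == y` to compare element by element."
]
},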
{
"cell_type": "markdown",
"id": "15f64b07",
"metadata": {},
"source": [
"**Expected Output**:\n",
"\n",
"Train Accuracy: ~80%"
]
},
{
"cell_type": "markdown",
"id": "6b7715cd",
"metadata": {},
"source": [
"##### My Solution"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "066eceb6",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The cost is 2.0028840493199764\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_6349/2829149365.py:82: RuntimeWarning: divide by zero encountered in scalar divide\n",
" if np.abs((cost_i[i]-cost_i[i-1])/cost_i[i])<0.05:\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The cost is 0.6893248648821225\n",
"The cost is 0.5947327181536963\n",
"The cost is 0.5594964140290553\n",
"The cost is 0.5439204525169686\n",
"The cost is 0.5365941656209957\n",
"The cost is 0.5329892209020973\n",
"The cost is 0.5311518234465994\n",
"The cost is 0.5301878305603884\n",
"The cost is 0.5296693737410955\n",
"The cost is 0.5293843854020753\n",
"The cost is 0.5292246537838604\n",
"The cost is 0.5291335540683393\n",
"The cost is 0.5290807827627604\n",
"The cost is 0.5290497884840468\n",
"The cost is 0.5290313609587369\n",
"The cost is 0.5290202869125287\n",
"The cost is 0.5290135693988894\n",
"The cost is 0.5290094612865763\n",
"The cost is 0.5290069311969307\n",
"[ 0.62 1.18 -2.02 -0.92 -1.43 0.13 -0.37 -0.36 -0.17 -1.46 -0.05 -0.61\n",
" -0.27 -1.19 -0.24 -0.2 -0.04 -0.27 -0.29 -0.46 -1.04 0.03 -0.28 0.02\n",
" -0.32 -0.14 -0.93] 1.2715618096527268 0.5290053880660263\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Accuracy: 0.000000\n",
"(array([ 0.62, 1.18, -2.02, -0.92, -1.43, 0.13, -0.37, -0.36, -0.17,\n",
" -1.46, -0.05, -0.61, -0.27, -1.19, -0.24, -0.2 , -0.04, -0.27,\n",
" -0.29, -0.46, -1.04, 0.03, -0.28, 0.02, -0.32, -0.14, -0.93]), 1.2715618096527268)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2516291ceb8a486e86cf3517f8d513c4",
"version_major": 2,
"version_minor": 0
},
"image/png": "",
"text/html": [
"\n",
"