Really Calculating the Gradient

By Ingo Dahn

This is an activated version of the above Khan Academy video on how to compute gradients. A large part of the text below is taken from the transcript of the video's narration.

Additionally, this page contains interactive code cells that activate the formulas explained in the video, thus providing the opportunity to apply them to calculate gradients even of very complex functions. These cells use the computational power of SageMath. No programming knowledge is required to edit the code cells and evaluate the formulas for your own functions.

Important Note: Code cells may depend on previous code cells. Therefore it is important that the code cells are evaluated in the order in which they occur. If a code cell is edited, all following code cells should be re-evaluated.

License: This document may be used in accordance with the YouTube license and the Creative Commons BY-SA license - whichever is more restrictive for the particular purpose.

Now, follow the video. It will stop at predefined points to let you read the respective comment and to really compute the gradient.

So here I'm going to talk about the gradient, and in this video I'm only going to describe how you compute it; in the next couple of videos I'm going to give the geometric interpretation. And I hate doing this. I hate showing the computation before the geometric intuition, since usually it should go the other way around, but the gradient is one of those weird things where the way that you compute it actually seems kind of unrelated to the intuition. You'll see that we'll connect them in the next few videos, but to do that we need to know what both of them actually are.

For actually computing something, we need to tell the computer in which space we are working. See here for background and details.

By default, the Euclidean plane is equipped with an orthonormal basis consisting of the vectors e_x, e_y.
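
In SageMath, this space can be declared with the EuclideanSpace construction. The following cell is a sketch of what such a declaration might look like; the name E for the plane is our choice:

```
# Declare the Euclidean plane with Cartesian coordinates (x, y);
# its default frame (e_x, e_y) is orthonormal.
E.<x,y> = EuclideanSpace()
E.default_frame()
```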

So on the computation side of things let's say you have some sort of function and I'm just going to make it a two-variable function and let's say it's $f(x,y) = x^2 \sin(y)$.

We define the function $f$ - feel free to replace the definition of $f$ at will!
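
Such a code cell could, for instance, define $f$ as a scalar field on the Euclidean plane E declared above (a sketch; replace the expression x^2*sin(y) with your own function if you like):

```
# Define f(x,y) = x^2*sin(y) as a scalar field on the Euclidean plane E
f = E.scalar_field(x^2*sin(y), name='f')
f.display()
```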

The gradient is a way of packing together all the partial derivative information of a function so let's just start by computing the partial derivatives of this guy.
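
For $f(x,y)=x^2\sin(y)$ these are $\frac{\partial f}{\partial x} = 2x\sin(y)$ and $\frac{\partial f}{\partial y} = x^2\cos(y)$. A code cell along the following lines computes them from the coordinate expression of $f$ (a sketch, assuming the definitions above):

```
# Partial derivatives of f with respect to x and y
fx = diff(f.expr(), x)   # 2*x*sin(y)
fy = diff(f.expr(), y)   # x^2*cos(y)
fx, fy
```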

So now what the gradient does is, it just puts both of these together in a vector, and specifically you denote it with a little upside down triangle $\nabla$. The name of that symbol is nabla, but you often just pronounce it del; you'd say "del f" or "gradient of f". What this equals is a vector that has those two partial derivatives in it. So the first one is the partial derivative with respect to $x$, $\frac{\partial f}{\partial x}$, the bottom one the partial derivative with respect to $y$, $\frac{\partial f}{\partial y}$.

We can import the gradient operator grad from a particular library. Then we can apply it to our function $f$ - or any other differentiable function.
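
In SageMath the operator grad lives in the manifolds library; a minimal sketch of such a cell, assuming the scalar field f defined above:

```
# Import the gradient operator and apply it to f
from sage.manifolds.operators import grad
grad(f).display()   # expected: 2*x*sin(y) e_x + x^2*cos(y) e_y
```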

Maybe I should emphasize this is actually a vector-valued function: It's got an $x$ and a $y$.

This is a function that takes in a point in two-dimensional space and outputs a two-dimensional vector.

$$\nabla f: \left[ \begin{matrix} x \\ y \end{matrix} \right] \mapsto \left[ \begin{matrix} \frac{\partial f}{\partial x}(x,y)\\ \frac{\partial f}{\partial y}(x,y) \end{matrix} \right]$$

So you could also imagine doing this with three different variables; then you would have three partial derivatives and a three-dimensional output. The way you might write this more generally: we could go down here and say the gradient of any function is equal to a vector with its partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$.
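
The same recipe works in three (or more) dimensions. As a sketch, here is a hypothetical three-variable function g on a three-dimensional Euclidean space (the names F, u, v, w and the expression for g are just illustrative choices):

```
# Gradient of a three-variable function on three-dimensional Euclidean space
F.<u,v,w> = EuclideanSpace()
g = F.scalar_field(u^2*sin(v)*w, name='g')   # an arbitrary example function
grad(g).display()
```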

And in some sense, you know, we call these partial derivatives, but I like to think of the gradient as the full derivative, because it kind of captures all of the information that you need.

So a very helpful mnemonic device with the gradient is to think about this triangle, this nabla symbol, as being a vector full of partial derivative operators, $\nabla = \left[ \begin{matrix} \frac{\partial}{\partial x}\\ \frac{\partial}{\partial y} \end{matrix} \right]$, and by operator I just mean something where you could give it a function and it gives you another function.

So you give this guy $\nabla$ the function $f$ and it gives you this expression $\left[ \begin{matrix} \frac{\partial f}{\partial x}\\ \frac{\partial f}{\partial y} \end{matrix} \right]$, which is a multivariable function, as a result.

So the nabla symbol is this vector full of different partial derivative operators.

This is kind of a weird thing: This is a vector that has operators in it. That's not what I thought vectors do! You could think of it as a memory trick, but it's in some sense a little bit deeper than that.

You can kind of imagine multiplying $\nabla$ by $f$ - really it's like an operator taking in this function $f$ and it's going to give you another function $\left[ \begin{matrix} x \\ y \end{matrix} \right] \mapsto \left[ \begin{matrix} \frac{\partial f}{\partial x}(x,y)\\ \frac{\partial f}{\partial y}(x,y) \end{matrix} \right]$.

We can really calculate the gradient of the function $f$ for any point $\left[ \begin{matrix} a \\ b \end{matrix} \right]$, say for $a=1, b=2$ (feel free to substitute other values).
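
A code cell along the following lines does this, assuming E, f and grad from the cells above (the point name p is just a choice):

```
# Evaluate the gradient of f at the point (a, b) = (1, 2)
a, b = 1, 2
p = E((a, b), name='p')   # the point with coordinates (1, 2)
grad(f).at(p).display()   # expected: 2*sin(2) e_x + cos(2) e_y
```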

The reason for doing this is that this symbol comes up a lot in other contexts. There are two other operators that you're going to learn about, called the divergence and the curl.

I'll get to those later, all in due time, but it's useful to think about this vector-ish thing of partial derivatives. And I mean, one weird thing about it:

You could say: Okay, so this nabla symbol is a vector of partial derivative operators - what's its dimension? If you had a three-dimensional function, that would mean that you should treat this like it's got three different operators as part of it.

I'd kind of finish this off down here, and if you had something that was 100-dimensional, it would have 100 different operators in it, and that's fine. It's really just, again, kind of a memory trick. So with that, that's how you compute the gradient.

It's pretty much just partial derivatives, but you smack them into a vector. Where it gets fun and where it gets interesting is with the geometric interpretation; I'll get to that in the next couple videos. It's also a super important tool for something called the directional derivative.

So you've got a lot of fun stuff ahead.

Now, can you answer this? You may use the cells above.