## Quick Intro

SimpleGrad includes two main composite types (basically Julia's version of objects/classes): *Values* and *Tensors*. *Values* store single numbers and *Tensors* store arrays of numbers. Both *Values* and *Tensors* support a variety of operations, which are automatically tracked so that the gradients can be calculated with a backward pass.

In this section, we'll cover how to actually use *Values* and *Tensors* to do calculations and compute gradients. Then in the Under the Hood section, we'll take a look at the source code and talk about how it works. I've also included two tutorials for extra practice (and will probably add more later): linear regression and MNIST.

### *Values*

Let's start with the *Value* composite type. Here's how you define one:

```
using SimpleGrad
x = Value(4.0)
println(x)
# output: Value(4.0)
```

*Values* can store numbers, perform operations, and automatically track the gradients of the variables involved.

Here's how you take a look at the number a *Value* is storing (called `Value.data`) and its gradient (called `Value.grad`):

```
println(x.data) # the number
# output: 4.0
println(x.grad) # the gradient
# output: 0.0
```

Here, `x.data == 4.0` because the *Value* `x` is storing the number `4.0`, and `x.grad == 0.0` is a placeholder for the gradient, which will change once we perform some operations and back-calculate the gradient.

Next let's try an operation. We'll define another *Value* called `y`, add it to `x`, and save the result as `z`.

```
y = Value(3.0)
z = x + y
println(z)
# output: Value(7.0)
```

Pretty simple so far, right? But here's the cool part: we can now do a backward pass to calculate the derivative of `z` with respect to `x` and `y`. Here's how we do that:

```
backward(z)
```

Now the `grad` fields of `x` and `y` are populated, and will tell us the derivative of `z` with respect to each of the inputs `x` and `y`.

```
println(x.grad) # dz/dx = 1, meaning an increase of 1 in x will lead to an increase of 1 in z.
# output: 1.0
println(y.grad) # dz/dy = 1, meaning an increase of 1 in y will lead to an increase of 1 in z.
# output: 1.0
```

In mathematical terms, we're considering the equation $z = x + y$ and are interested in the derivatives $\frac{dz}{dx}$ and $\frac{dz}{dy}$. `x.grad == 1` tells us that $\frac{dz}{dx} = 1$, and `y.grad == 1` tells us that $\frac{dz}{dy} = 1$, for the values of `x` and `y` that we've defined in our code (and in this specific example, for all values of `x` and `y`). If you're rusty on the calculus, you can also think of it this way: increasing `x` by 1 will cause `z` to increase by 1, and increasing `y` by 1 will also cause `z` to increase by 1.

So that's the basic functionality of the *Value* type. We can store numbers, do operations, and track the derivative of the output with respect to all of the inputs. This allows us, for example, to minimize a loss function through gradient descent: we track the derivative of the loss with respect to the model parameters, then update those parameters so that the loss decreases.
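To make that concrete, here's a minimal gradient-descent sketch (my own example, not from the package docs): we minimize $(x - 3)^2$ by repeatedly running a backward pass and stepping against the gradient. I rebuild the *Value* each iteration so we don't have to worry about whether `backward` accumulates into old `grad` fields.

```julia
using SimpleGrad

# Minimize loss = (x - 3)^2 by gradient descent.
x_data = 0.0   # initial guess
lr = 0.1       # learning rate
for step in 1:50
    x = Value(x_data)
    loss = (x - Value(3.0))^2      # exponent is an Integer, as required
    backward(loss)                 # populates x.grad = dloss/dx = 2(x - 3)
    global x_data -= lr * x.grad   # step against the gradient
end
println(x_data)   # converges toward 3.0, the minimizer
```

Each step moves `x_data` a little closer to 3, where the gradient vanishes and the loss is at its minimum.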

Here's a list of the operations currently supported:

- **Addition**
- **Subtraction**
- **Multiplication**
- **Division**
- **Exponents**
- **e^x**
- **log()**
- **tanh()**

Let's test a couple of them out. We've already done addition, so let's try subtraction.

```
x = Value(10.0)
y = Value(3.0)
z = x - y
println(z)
# output: Value(7.0)
```

If you want, you can try `backward(z)`, and you should be able to find `x.grad == 1`, meaning that $\frac{dz}{dx} = 1$, and `y.grad == -1`, meaning that $\frac{dz}{dy} = -1$. But I'll skip over that for now.

Next let's try multiplication.

```
x = Value(6.0)
y = Value(2.0)
z = x * y
println(z)
# output: Value(12.0)
```

And again, we can get the derivative of `z` with respect to `x` and `y`.

```
backward(z)
println(x.grad) # dz/dx = y = 2
# output: 2.0
println(y.grad) # dz/dy = x = 6
# output: 6.0
```

Alright, so far so good! Let's try division now:

```
x = Value(15.0)
y = Value(5.0)
z = x / y
println(z)
# output: Value(3.0)
```

And the backward pass:

```
backward(z)
println(x.grad) # dz/dx = 1/5 = 0.2
# output: 0.2
println(y.grad) # dz/dy = -x / y^2 = -15/25 = -0.6
# output: -0.6
```
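As a sanity check, the quotient rule reproduces both numbers:

$$\frac{dz}{dx} = \frac{1}{y} = \frac{1}{5} = 0.2, \qquad \frac{dz}{dy} = -\frac{x}{y^2} = -\frac{15}{25} = -0.6$$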

Ok, now let's try exponents. **NOTE:** for this function, the exponent must be an Integer, NOT a *Value* or a Float. I might work on fixing this later.

```
x = Value(5.0)
y = 2 # NOTE - exponent must be an Integer, not a Value or Float
z = x^y
println(z)
# output: Value(25.0)
```

And here's the backward pass:

```
backward(z)
println(x.grad) # dz/dx = 2x = 10
# output: 10.0
```

Ok, now for the exponential function $e^x$, which we'll call with `exp()`.

```
x = Value(2.0)
z = exp(x)
println(z)
# output: Value(7.38905609893065)
```

And here's the backward pass:

```
backward(z)
println(x.grad) # dz/dx = e^x = z (same value we got above)
# output: 7.38905609893065
```

Ok, now for the natural logarithm, which we call with `log()`.

```
x = Value(10.0)
z = log(x)
println(z)
# output: Value(2.302585092994046)
```

And here's the backward pass:

```
backward(z)
println(x.grad) # dz/dx = 1/x = 0.1
# output: 0.1
```

Lastly, the `tanh()` function. Personally my trig is pretty rusty and I don't use this function very often, but I'm including it because it was in Andrej Karpathy's Micrograd, which the SimpleGrad *Value* is based on. `tanh()` is useful as an activation function for a linear layer of neurons, adding nonlinearity and bounding the layer outputs to [-1, 1].

```
x = Value(3.0)
z = tanh(x)
println(z)
# output: Value(0.9950547536867305)
```

And here's the backward pass:

```
backward(z)
println(x.grad) # dz/dx = 1 - tanh^2(x) ≈ 0.00987
# output: 0.009866037165440211
```

So far these examples have been pretty simple. But as long as we stick to these supported operations, we can combine them in arbitrarily complicated ways, and the gradients can still be calculated for all the inputs using backpropagation and the chain rule of derivatives.

Let's try a more complicated example to see this:

```
input1 = Value(2.3)
input2 = Value(-3.5)
input3 = Value(3.9)
weight1 = Value(-0.8)
weight2 = Value(1.8)
weight3 = Value(3.0)
bias = Value(-3.2)
y_pred = tanh(input1*weight1 + input2*weight2 + input3*weight3 + bias)
y_true = Value(0.8)
loss = (y_pred - y_true)^2
println(loss)
# output: Value(0.20683027474728832)
```

Here we're using 3 inputs, 3 weights, a bias, and a `tanh()` activation function to come up with a prediction in a regression problem, then calculating a loss by comparing it to the target value.

Even though this looks pretty complicated, we can still use `backward(loss)` to calculate the derivative of the loss with respect to everything.

```
backward(loss)
println(weight1.grad) # dloss/dweight1
# output: -1.8427042527651991
println(weight2.grad) # dloss/dweight2
# output: 2.80411516725139
println(weight3.grad) # dloss/dweight3
# output: -3.12458547208012
println(bias.grad) # dloss/dbias
# output: -0.8011757620718257
```
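These gradients can be checked by hand with the chain rule. Writing the pre-activation as $u$, we have $u = 2.3(-0.8) + (-3.5)(1.8) + 3.9(3.0) - 3.2 = 0.36$ and $\hat{y} = \tanh(0.36) \approx 0.3452$, so for the first weight:

$$\frac{\partial\,\text{loss}}{\partial\,\text{weight}_1} = 2(\hat{y} - y_{\text{true}}) \cdot (1 - \tanh^2 u) \cdot \text{input}_1 = 2(0.3452 - 0.8)(1 - 0.3452^2)(2.3) \approx -1.8427$$

which matches the `weight1.grad` printed above.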

### *Tensors*

*Values* are pretty useful for some specific cases, but unfortunately their scalar-valued calculations are too slow for implementing even a fairly basic neural network. So in addition to *Values*, we also have the *Tensor* composite type, which stores data in array format (either one-dimensional or two-dimensional).

We can define a *Tensor* like this:

```
x = Tensor([2.0, 3.0, 4.0])
println(x)
# output: Tensor([2.0, 3.0, 4.0])
```

Similarly to *Values*, *Tensors* also have fields called `data` and `grad` that store their arrays of numbers and gradients.

```
println(x.data)
# output: [2.0, 3.0, 4.0]
println(x.grad)
# output: [0.0, 0.0, 0.0]
```

Right now the *Tensor* type has pretty much the bare minimum needed to implement a simple neural network, although I'm probably going to add more in the future. Here's a list of the operations currently supported:

- **Addition**
- **Matrix Multiplication**
- **ReLU**
- **Softmax Activation / Cross Entropy Loss Combination**

Rather than testing out all of these individually, let's see if we can save some time by testing them all out at once:

```
using Random
Random.seed!(1234)
inputs = Tensor(rand(2, 3)) # Matrix with shape (2,3) -- batch of 2 samples, 3 input features each
weights1 = Tensor(rand(3, 4)) # Matrix with shape (3,4) -- takes 3 inputs, has 4 neurons
weights2 = Tensor(rand(4, 5)) # Matrix with shape (4,5) -- takes 4 inputs, has 5 neurons
biases1 = Tensor([1.0, 1.0, 1.0, 1.0]) # Bias vector for first layer neurons
biases2 = Tensor([1.0, 1.0, 1.0, 1.0, 1.0]) # Bias vector for second layer neurons
layer1_out = relu(inputs * weights1 + biases1)
layer2_out = layer1_out * weights2 + biases2
# important -- correct classes should be one-hot encoded and NOT a Tensor, just a regular matrix.
y_true = [0 1 0 0 0;
          0 0 0 1 0]
loss = softmax_crossentropy(layer2_out, y_true)
println(loss)
# output: Tensor([2.137377648400186])
```

Now we can find the derivative of the loss with respect to the weights and biases (and the inputs too, if we want, although that's less relevant).

```
backward(loss)
println(weights1.grad)
# output: [-0.3908952261176255 0.12683215951155127 0.2416920583878119 0.38808820865148697; -0.29634913482989794 0.07488805376600344 0.14838075027585607 0.29197521584353536; -0.4750896539667244 0.14050215019726503 0.27138275497284103 0.4702367656227933]
println(weights2.grad)
# output: [0.863617972699941 -0.2976748175494542 0.023649534838850777 -0.6817622076321975 0.09216951764285997; 1.0039983446760201 -0.3429788768507116 0.02749410384710956 -0.7956612611906658 0.10714768951824792; 1.19691493443326 -0.43168111913872687 0.03277440852700377 -0.9257730447883243 0.1277648209667876; 1.0153161801798791 -0.3479645917015216 0.02780390869850365 -0.8035124553248412 0.10835695814798015]
println(biases1.grad)
# output: [-0.5767635099832011 0.16409112495001651 0.31884196813703536 0.5701877733300129]
println(biases2.grad)
# output: [0.5785120187360111 -0.20248891413334258 0.01584176327066983 -0.453610397799674 0.06174552992633567]
```
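With the gradients in hand, a training loop would update each parameter against its gradient. Here's a sketch of a single SGD step, continuing from the example above. It assumes the `data` and `grad` fields are ordinary Julia arrays of matching shape that can be updated in place, and note that, depending on how `backward` handles accumulation, you may also need to reset `grad` to zero between steps.

```julia
# Hypothetical SGD update, continuing from the forward/backward pass above.
lr = 0.1
for p in (weights1, weights2, biases1, biases2)
    p.data .-= lr .* p.grad   # broadcasted in-place step against the gradient
end
```

Re-running the forward pass with the updated parameters should then produce a lower loss.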

Pretty cool! To see how all of this actually works, check out the Under the Hood section. For more extensive tutorials, check out the linear regression and MNIST sections.