- Neural Network: A parameterized function approximator: a composition of affine maps and nonlinearities whose weights are adjusted to fit data (see the sketch after this list).
- Stochastic Gradient Descent: In gradient descent, the loss and its gradient are computed over the whole dataset before each weight update. In stochastic gradient descent, each update uses only a single data point or a small mini-batch, giving faster but noisier progress toward the optimum (the two update rules are contrasted in a sketch below).
- Computational Graph: A directed graph in which nodes represent operations (e.g., matrix multiplication, addition, activation functions) and edges represent the tensors that flow between them. Note that the edges carry data, not the weights themselves; the weights are inputs to the operation nodes (see the worked example below).
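
As a sketch of the first point, here is a minimal one-hidden-layer MLP in NumPy. The layer width, tanh nonlinearity, and random weights are illustrative assumptions; the point is only that the network is just a function `f(x; W1, b1, W2, b2)`:

```python
import numpy as np

# A neural network is a parameterized function f(x; theta).
# Here: a one-hidden-layer MLP mapping R -> R (hypothetical sizes).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def mlp(x):
    """Forward pass: two affine maps with a tanh nonlinearity between them."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
print(mlp(x))  # with suitably trained weights, this can approximate e.g. sin(x)
```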
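
To contrast the two update rules from the second point, here is a minimal NumPy sketch on a toy linear-regression problem. The dataset, learning rate, batch size, and step counts are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                              # toy dataset
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

lr, batch_size = 0.1, 32

def grad(Xb, yb, w):
    """Gradient of mean squared error for a linear model."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Gradient descent: each update uses the *whole* dataset.
w_gd = np.zeros(3)
for _ in range(100):
    w_gd -= lr * grad(X, y, w_gd)

# Stochastic (mini-batch) gradient descent: many cheap, noisy updates,
# each computed from a small random subset of the data.
w_sgd = np.zeros(3)
for _ in range(100):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    w_sgd -= lr * grad(X[idx], y[idx], w_sgd)

print(w_gd, w_sgd)  # both approach the true weights [2, -1, 0.5]
```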
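
And a worked computational-graph example for the third point, done by hand for the toy function f(x, w, b) = (x * w + b)**2. Real frameworks build and differentiate such graphs automatically; this sketch just makes the nodes and edges explicit:

```python
# Nodes are the operations (mul, add, square); the values flowing along
# the edges are the intermediate tensors, not the weights themselves.

# Forward pass: evaluate each node, keeping intermediates for backprop.
x, w, b = 3.0, 2.0, 1.0
u = x * w          # node 1: multiply
v = u + b          # node 2: add
f = v ** 2         # node 3: square

# Backward pass: walk the edges in reverse, applying the chain rule.
df_dv = 2 * v      # d(v^2)/dv
df_du = df_dv * 1  # add passes the gradient through unchanged
df_dw = df_du * x  # d(x*w)/dw = x
df_dx = df_du * w  # d(x*w)/dx = w

print(f, df_dw, df_dx)  # 49.0, 42.0, 28.0
```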