Introduction to Machine Learning

7. Introduction to Machine Learning

Artificial intelligence (AI) can be broadly defined as the field that deals with making machines perform tasks that require intelligence when performed by humans, such as reasoning, perception, representation, language processing, planning and learning.

Machine learning (ML) is a branch of AI focused on statistical algorithms that can learn from data, generalize to unseen data, and perform tasks without explicit instructions.[1]

Three core paradigms. Algorithms in machine learning can be divided into three paradigms:

Supervised Learning, SL: the algorithm learns from labelled data; many applications can be reduced to two main tasks: regression (or function approximation) and classification.
Unsupervised Learning, UL: the algorithm learns patterns from unlabelled data; examples of tasks in UL are clustering, dimensionality reduction (and recognition of the main components in data), and compression (retaining only the relevant components in data). Some historical algorithms and linear algebra decompositions can be interpreted as, or generalized to, unsupervised learning.
Reinforcement Learning, RL: an algorithm (agent) learns a policy, i.e. the way to behave, by interacting with an environment and maximizing some performance measure, in order to perform the required tasks efficiently. Applications of RL include planning and control.
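As a rough illustration of the first two paradigms, the sketch below fits a line to labelled pairs (supervised regression) and groups unlabelled points into two clusters (unsupervised clustering). It is a minimal sketch in plain Python, not a reference implementation: the function names `fit_line` and `two_means`, the data, and the choice of least squares and of a two-centre k-means are assumptions made for this example. Reinforcement learning is not covered here.

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y ~ a*x + b from labelled pairs (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def two_means(points, n_iter=20):
    """Unsupervised: split unlabelled 1D points into two clusters (k-means, k=2)."""
    c0, c1 = min(points), max(points)  # simple initial guess for the two centres
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        if not g0 or not g1:
            break  # degenerate case, e.g. all points identical
        # Update step: each centre moves to the mean of its group.
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return c0, c1


# Labelled data generated by y = 2x + 1: the labels ys guide the fit.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Unlabelled data: only the points themselves are available.
centre_low, centre_high = two_means([0.9, 1.0, 1.1, 4.9, 5.0, 5.1])
```

In the supervised case the labels define what a good answer is; in the unsupervised case the structure (two groups) has to be found from the data alone.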

Goals and methodology. ML is mainly an engineering-oriented and application-focused discipline, relying on statistical inference (todo be more explicit). An ML model usually takes an input \(\mathbf{u}\) and produces an output \(\mathbf{y}\) that depends on the model's structure, on a set of parameters \(\boldsymbol{\theta}\) and on a set of hyper-parameters \(\boldsymbol{\mu}\). Learning usually relies on the optimization of an objective function

\[L(\boldsymbol{\theta}; \boldsymbol{\mu}) \ ,\]

w.r.t. the parameters \(\boldsymbol{\theta}\), whose values are learned, i.e. adjusted towards an optimal solution \(\boldsymbol{\theta}^*\) at which \(L(\boldsymbol{\theta}^*; \boldsymbol{\mu})\) attains an extremum (a minimum or a maximum, depending on the problem). The hyper-parameters \(\boldsymbol{\mu}\) are not adjusted by this optimization; their choice instead influences the training process and the model behavior. Optimization usually relies on gradient methods, which update the parameters along the direction of the gradient of the objective function w.r.t. the parameters,

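\[\boldsymbol{\theta} \ \leftarrow \ \boldsymbol{\theta} + \alpha \nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}; \boldsymbol{\mu}) \ ,\]

where \(\alpha\) is the step size (learning rate). With \(\alpha > 0\) the update performs gradient ascent, suited to maximizing \(L\); when \(L\) is a cost to be minimized, the step is taken against the gradient (gradient descent), i.e. with \(\alpha < 0\) in this notation.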
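The update rule can be tried on a small example. The sketch below is a minimal illustration under stated assumptions, not a general optimizer: the objective \(L(\boldsymbol{\theta}; \mu) = \sum_i (\theta_i - t_i)^2 + \mu \sum_i \theta_i^2\), the target values and the function names are all made up for this example, with the scalar \(\mu\) playing the role of a hyper-parameter. Since this \(L\) is a cost to be minimized, the step \(\alpha\) is taken negative so that the update written as \(\boldsymbol{\theta} + \alpha \nabla_{\boldsymbol{\theta}} L\) moves against the gradient.

```python
def objective(theta, mu, target):
    """Hypothetical cost: squared distance to target plus a penalty weighted by mu."""
    fit = sum((th - t) ** 2 for th, t in zip(theta, target))
    penalty = mu * sum(th ** 2 for th in theta)
    return fit + penalty


def gradient(theta, mu, target):
    """Analytical gradient of the cost above with respect to theta."""
    return [2.0 * (th - t) + 2.0 * mu * th for th, t in zip(theta, target)]


def optimize(theta, mu, target, alpha, n_steps):
    """Repeat the update theta <- theta + alpha * grad L(theta; mu)."""
    for _ in range(n_steps):
        grad = gradient(theta, mu, target)
        theta = [th + alpha * g for th, g in zip(theta, grad)]
    return theta


TARGET = [3.0, -1.5]       # assumed data for the example
MU = 0.5                   # hyper-parameter: fixed during the optimization
THETA_0 = [0.0, 0.0]       # initial guess for the parameters

# alpha < 0: the cost is minimized, so each step goes against the gradient.
theta_star = optimize(THETA_0, MU, TARGET, alpha=-0.1, n_steps=200)
```

For this quadratic cost the minimizer is known in closed form, \(\theta_i^* = t_i / (1 + \mu)\), which gives a simple check of the result; changing `MU` changes the optimum, while changing `alpha` only changes how (and whether) the iteration converges to it.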

Optimization of model parameters is made fast by back-propagation and automatic differentiation (AD), which efficiently compute the gradients of the cost function with respect to the model’s parameters. Recent hardware improvements make this technically feasible for high-dimensional models, such as the multi-layered neural networks used in deep learning.[2] These algorithms are not only feasible on modern processing architectures, such as GPUs and TPUs, but also particularly well suited to them (and a major driver of new designs): such hardware accelerates the large-scale matrix and tensor computations involved in both the forward and backward passes of training.
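To give an idea of what AD and back-propagation do, here is a toy reverse-mode AD for scalars in plain Python. It is only a sketch under simplifying assumptions (scalar values, addition and multiplication only); the class `Var` and its `backward` method are hypothetical names for this example and do not reproduce the API of any specific library. The forward pass records each operation together with its local derivatives; the backward pass applies the chain rule from the output back to the inputs.

```python
class Var:
    """A scalar that remembers how it was computed (toy reverse-mode AD)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs (parent Var, d(self)/d(parent))

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__  # addition and multiplication are commutative
    __rmul__ = __mul__

    def backward(self):
        """Backward pass: accumulate d(self)/d(node) in node.grad for every node."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse topological order: a node is processed only after all its users.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


# Forward pass of a one-parameter-pair model y = w*x + b with a squared-error cost.
w, b = Var(2.0), Var(1.0)
x, t = 3.0, 5.0                  # assumed input and target
err = w * x + b + (-t)           # prediction error: 2*3 + 1 - 5 = 2
loss = err * err                 # cost: 4

loss.backward()                  # fills w.grad and b.grad
```

One backward pass yields the derivatives with respect to all parameters at once (here `w.grad` is \(2 \cdot \text{err} \cdot x\) and `b.grad` is \(2 \cdot \text{err}\)), which is what makes this approach attractive for models with many parameters.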

todo Show NVIDIA, TSMC revenues

todo Add references: Bishop,…

1: “Without explicit instructions” means that a system has no user-coded behavior, but learns it, usually via optimization: either the minimization of an error function, or the maximization of an objective function or of an energy/information content.
2: Deep learning can be roughly defined as the branch of machine learning that uses multi-layered neural networks.