基础知识

机器学习的一些基础知识

资源


交叉熵


  • 相对熵=交叉熵-信息熵,其中
  • 信息熵表示在结果出来之前对可能产生的信息量的期望(期望就是所有可能结果的概率乘以该对应的结果)。用来衡量事务不确定性的,信息熵越大,事务越具有不确定性。我的理解就是y*与y的不确定性。
  • 相对熵用来表示两个概率分布的差异,当两个随机分布相同时,他们的相对熵为零,当两个随机分布的差别增大时,他们的相对熵也会增大。
  • 交叉熵,由于在机器学习和深度学习中,样本和标签已知,那么信息熵相当于常量,此时,只需拟合交叉熵(此时交叉熵近似与相对熵)。

梯度范数


  • 可以通过惩罚梯度范数来提高模型的泛化能力,原理是 penalizing the gradient norm of loss function is to encourage the optimizer to find a minimum that lies in a relatively flat neighborhood region, since such flat minima have been demonstrated to be able to lead to better model generalization than sharp ones.(becase penalize the gradient norm of loss function would motivate the loss function to have small Lipschitz constant in local. If the loss function has a smaller Lipschitz constant, it would indicate that the loss function landscape is flatter, which in consequence could lead to better model generalization)

zero(one)-shot learning


  • Zero-shot learning (ZSL) is a problem setup in machine learning, where at test time, a learner observes samples from classes, which were not observed during training, and needs in order to predict the class they belong to. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects.[1] For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence (“AI”), which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception.

  • One-shot learning is an object categorization problem, found mostly in computer vision. Whereas most machine learning-based object categorization algorithms require training on hundreds or thousands of samples, one-shot learning aims to classify objects from one, or only a few, samples.

  • Few-shot learning means making classification or regression based on a very small number of samples. Few-shot learning is the problem of making predictions based on a limited number of samples. Few-shot learning is different from standard supervised learning. The goal of few-shot learning is not to let the model recognize the images in the training set and then generalize to the test set. Instead, the goal is to learn. The goal of training is not to know what an elephant is and what a tiger is. Instead, the goal is to know the similarity and difference between objects.