Sequence Model (Deep Learning Ai)

Deep learning by Andrew Ng

Posted by CHENEY WANG on November 19, 2018

Recently by the same author:



#Sequence Models

Normal LSTM

Four gates:

LSTM中最重要的就是它的四个gates. 四个分别包含着记忆的丢弃和增加功能。 ####Forget gate layer It looks at $h_{t-1}$ and $x_t$, and outputs a number between 0 and 1 for each number in the cell state $C_{t−1}$. A 1 represents “completely keep this” while a 0 represents “completely get rid of this.”
照我先前的理解是,$h_{t-1}$如果在一个句子生成模型中,将是一个单词,但是后来我发现,这不对,如果是一个单词的,present 只和上一个有关系,这样子的话,forget gate layer只存在上一个单词和现在这个单词记或者不记住这两种关系。所以$h_{t-1}$是一个vector,它保存了从第一个单词开始所生成的所有单词形成的句子。所以在forget gate layer我们可以选择去遗忘会影响我们进行下一个单词生成的那部分单词从而最终达到长记忆的作用。

Input gate layer

A sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, $C̃_t$, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.

相比于之前的forget, 这个应该是update。用来更新我们在新词出现以后我们所需要更新的,其实就是说,比如,第一人称第三人称的不同会影响后面的单词生成,所以我们需要进行新的名词或者说是代词的更新。这样子的话,我的理解是,sigmoid gate所输出的应该就是按0,1分布的。也就是说和前面的forget gate一样,它所输出的形式应该是[0,1,1,0,0,1,1]这样的形式。1代表仍然需要,而0代表不需要了,需要舍弃。因为这个两个输出最终都进行的是pointwise opersation 中的 multiply。而$C̃_t$代表的是我产生新的一个加入了$x_t$以后的candidate vector用于和$i_t$进行相乘。

Combine previous two gates

Output gate Layer

First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.


What I’ve described so far is a pretty is the core concept of normal lstm. But not all LSTM are the same as the above.