
The training process of a face verification network is very similar to the training process of a face recognition network
Specifically, we would still train a siamese network
However, we add an extra layer after the $f(x)$ embeddings

[Figure: siamese network with a single sigmoid output neuron]

This extra layer contains a single sigmoid neuron
Specifically, this neuron is trained to output:
- A $1$ if the two images show the same person
- A $0$ if the two images show different people
Therefore, we are not using a triplet loss anymore
Instead, we are using a cross-entropy loss function
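
As a minimal sketch of this loss, assuming labels of $1$ for same-person pairs and $0$ otherwise, the binary cross-entropy could be written in NumPy as follows. The function name, the `eps` clipping constant, and the example values are illustrative assumptions, not something taken from the notes:

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Mean cross-entropy between predictions y_hat in (0, 1) and labels y in {0, 1}."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # keep log() away from 0
    per_pair = -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
    return float(np.mean(per_pair))

# Hypothetical sigmoid outputs for three image pairs and their labels
# (1 = same person, 0 = different people).
y_hat = np.array([0.9, 0.2, 0.6])
y = np.array([1.0, 0.0, 1.0])
loss = binary_cross_entropy(y_hat, y)
```

The loss is small when confident predictions agree with the labels and grows quickly when a confident prediction is wrong, which is what makes it a natural fit for a single sigmoid output.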

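The sigmoid neuron takes the element-wise absolute difference between the two embeddings as its input features:

$$ \vert f_{k}(x^{(i)}) - f_{k}(x^{(j)}) \vert $$

For a generic output layer $l$ with $128$ inputs, the prediction is:

$$ \hat{y} = \sigma(\sum_{k=1}^{128} w_{k}^{[l]} a_{k}^{[l-1]} + b^{[l]}) $$

Substituting these features for the activations $a^{[l-1]}$ gives:

$$ \hat{y} = \sigma(\sum_{k=1}^{128} w_{k} \vert f_{k}(x^{(i)}) - f_{k}(x^{(j)}) \vert + b) $$

The $\chi^{2}$ similarity is an alternative choice of features:

$$ \chi^{2} = \frac{(f_{k}(x^{(i)}) - f_{k}(x^{(j)}))^{2}}{f_{k}(x^{(i)}) + f_{k}(x^{(j)})} $$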
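
The formulas above can be sketched in NumPy as a forward pass through the single sigmoid neuron. This is only an illustration under stated assumptions: the embeddings are random stand-ins for $f(x)$, the weights `w` and bias `b` are hand-picked rather than learned, and the `eps` term in the $\chi^{2}$ features is an added guard against division by zero that assumes nonnegative embedding values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def abs_diff_features(f_i, f_j):
    """Element-wise |f_k(x_i) - f_k(x_j)|."""
    return np.abs(f_i - f_j)

def chi_square_features(f_i, f_j, eps=1e-12):
    """Element-wise (f_i - f_j)^2 / (f_i + f_j); eps avoids 0/0 (assumption)."""
    return (f_i - f_j) ** 2 / (f_i + f_j + eps)

def predict_same(f_i, f_j, w, b, features=abs_diff_features):
    """y_hat = sigma(sum_k w_k * feature_k + b)."""
    return sigmoid(np.dot(w, features(f_i, f_j)) + b)

rng = np.random.default_rng(0)
f_i = rng.random(128)          # stand-in embedding of image i
f_j = rng.random(128)          # stand-in embedding of image j
w = -np.ones(128)              # hypothetical weights: larger differences lower the score
b = 5.0                        # hypothetical bias

y_same = predict_same(f_i, f_i, w, b)   # identical embeddings
y_diff = predict_same(f_i, f_j, w, b)   # unrelated embeddings
```

With identical embeddings every feature is zero, so the prediction depends on the bias alone. Passing `features=chi_square_features` swaps in the alternative features without changing the rest of the computation.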

The output of our network becomes a sigmoid function applied to the features
These features aren't the raw embeddings themselves
Instead, the activations $a^{[l-1]}$ become the following:

$$ \vert f_{k}(x^{(i)}) - f_{k}(x^{(j)}) \vert $$
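
Training a single sigmoid neuron on these features amounts to logistic regression over image pairs. The sketch below illustrates that idea with plain gradient descent on the cross-entropy loss. It rests on several assumptions: the embeddings are synthetic unit vectors rather than outputs of a trained siamese network, only the final neuron is trained, and `make_pairs`, `train`, the learning rate, and the step count are hypothetical choices made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_pairs(n_pairs, dim, rng):
    """Toy stand-in for f(x): random unit vectors instead of real face embeddings."""
    first = rng.normal(size=(n_pairs, dim))
    first /= np.linalg.norm(first, axis=1, keepdims=True)
    y = (np.arange(n_pairs) % 2).astype(float)           # alternate same / different
    noisy = first + 0.05 * rng.normal(size=first.shape)  # "same person": perturbed copy
    other = rng.normal(size=first.shape)                 # "different person": unrelated
    second = np.where(y[:, None] == 1.0, noisy, other)
    second /= np.linalg.norm(second, axis=1, keepdims=True)
    return np.abs(first - second), y                     # |f_k(x_i) - f_k(x_j)|, labels

def train(features, y, lr=0.5, steps=200, eps=1e-12):
    """Gradient descent on mean binary cross-entropy for one sigmoid neuron."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(steps):
        y_hat = sigmoid(features @ w + b)
        clipped = np.clip(y_hat, eps, 1.0 - eps)
        losses.append(float(np.mean(-(y * np.log(clipped)
                                      + (1.0 - y) * np.log(1.0 - clipped)))))
        grad_z = (y_hat - y) / n          # d(loss)/dz for sigmoid + cross-entropy
        w -= lr * (features.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b, losses

rng = np.random.default_rng(0)
features, y = make_pairs(200, 128, rng)
w, b, losses = train(features, y)
```

In a real system the embeddings would come from the siamese network, and the weights of that network could be trained jointly with the sigmoid neuron rather than held fixed as they effectively are here.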
