Introducing Siamese Networks
- A siamese network is a type of neural network that uses two or more identical subnetworks with the same architecture
- The subnetworks must share the same parameters and weights
- This implies we only need to train one set of weights (not two)
- Typically, a siamese network is used when we're interested in determining whether two inputs are similar to each other
- The following is an example of a siamese network using an LSTM (see the sketch after this list):
- Note, not all siamese networks use an LSTM
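Below is a minimal PyTorch sketch of this idea (not from the original notes): a single encoder, here an embedding layer followed by an LSTM, is applied to both inputs, so only one set of weights exists. The vocabulary size, dimensions, and tokenization are placeholder assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SiameseLSTM(nn.Module):
    """One encoder shared by both branches -- a single set of weights."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) tensor of integer token ids
        _, (hidden, _) = self.lstm(self.embedding(token_ids))
        return hidden[-1]                              # (batch, hidden_dim)

    def forward(self, ids_a, ids_b):
        # The same encoder (same weights) processes both inputs
        return F.cosine_similarity(self.encode(ids_a), self.encode(ids_b), dim=1)
```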
Interpreting the Output of Siamese Networks
- The output of a siamese network is a cosine similarity
- Meaning, the output becomes a measure of the similarity between the two inputs
- When the cosine similarity is less than some threshold τ, the two inputs are considered different
- When the cosine similarity is greater than the threshold τ, the two inputs are considered similar
- Here, τ represents the threshold determining whether two inputs are similar or not
- A larger τ implies only very similar sentences will be considered similar
- Note, τ is a tunable hyperparameter
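As a minimal sketch of this thresholding step (the value τ = 0.7 and the helper names are arbitrary placeholders, not from the notes):

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Cosine similarity between the two subnetwork output vectors
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def is_similar(v1, v2, tau=0.7):
    # Inputs are similar when the cosine similarity exceeds the threshold tau
    return cosine_similarity(v1, v2) > tau
```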
Describing Applications of Siamese Networks
- Some common applications for siamese networks include:
- Facial recognition
- Signature verification for banks
- Paraphrase identification
- Checking whether two similar questions are duplicates
- A siamese network performs well for tasks with little training data
- This is because the two subnetworks have shared weights
- Thus, there are fewer parameters to learn during training
- Specifically, siamese networks are useful when there are many classes with a small number of observations for each class
Defining the Triplet Loss Function
- Within our training data, each observation must include a positive and negative pair
- There are three components of a triplet loss function:
- Anchor instance
- Positive instance
- Negative instance
- An anchor refers to the input we want to test
- A positive example refers to a saved input that's similar to the anchor
- A negative example refers to a saved input that's different from the anchor
- The following observation satisfies the above requirements: anchor: "What is your age?", positive: "How old are you?", negative: "Where are you from?"
- The triplet loss function is used to train siamese networks on training data with positive and negative pairings
- The goal of the triplet loss function is to minimize the difference cos(A, N) - cos(A, P), so that the anchor is more similar to the positive than to the negative
- When training, we should choose positive and negative examples that aren't easily distinguished from one another (i.e., hard triplets)
- The following formula defines the triplet loss function, where α is a margin hyperparameter:

  L(A, P, N) = max(cos(A, N) - cos(A, P) + α, 0)
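A small sketch of that formula applied to already-encoded vectors; the margin α = 0.25 is an assumed placeholder value.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_loss(anchor, positive, negative, alpha=0.25):
    # Loss is zero once cos(A, P) exceeds cos(A, N) by at least the margin alpha
    return max(cos(anchor, negative) - cos(anchor, positive) + alpha, 0.0)
```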
Illustrating the Use of One-Shot Learning
- When we only have a few instances of each class, one-shot learning can be a useful alternative to standard classification
- For example, classifying a signature as one of the many names in our database may be infeasible in practice
- Instead, we may compare the similarity of an anchor signature to positive and negative signatures already saved in our database
- This alternative process is called one-shot learning
- Most signature verification systems must learn from only one image
- As a result, the learning algorithm trains on a small training set
- With one-shot learning, there's no longer a need to retrain our model when more signature examples are added
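A hedged sketch of this comparison-based verification: the anchor signature's embedding is checked against stored reference embeddings, and adding a new reference signature requires no retraining. The encoder producing the embeddings and the threshold are assumptions.

```python
import numpy as np

def verify_signature(anchor_vec, reference_vecs, tau=0.7):
    """One-shot verification: accept the anchor if it is similar enough
    to any stored reference embedding; no retraining needed to add references."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    best_similarity = max(cos(anchor_vec, ref) for ref in reference_vecs)
    return best_similarity > tau
```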
Training an LSTM-based Siamese Network
- Prepare the training set in the following fashion:

| Question 1 | Question 2 | Is Duplicate? |
| --- | --- | --- |
| What is your age? | How old are you? | true |
| Where are you from? | Where are you going? | false |
| ... | ... | ... |
- Prepare batches so that each question in Batch 1 has its duplicate at the corresponding index in Batch 2:

| Batch 1 | Batch 2 |
| --- | --- |
| What is your age? | How old are you? |
| Where are you from? | Where were you born? |
| ... | ... |
- Start by defining a model architecture for a single subnetwork
- Create a siamese network consisting of two subnetworks following identical model architectures
- Feed in two different questions from the different batches as inputs
- Transform each question into embeddings
- Pass the embeddings into an LSTM
- Receive the output vectors for the two questions (e.g., the final hidden state of each LSTM)
- Compare these vectors using cosine similarity (see the sketch below)
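A hedged end-to-end sketch of these training steps in PyTorch. The tokenization, vocabulary size, batch construction, and margin are placeholder assumptions; the loss treats the question at the same index in the other batch as the positive and a row-shifted copy of that batch as negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionEncoder(nn.Module):
    """Shared subnetwork: embedding -> LSTM -> final hidden state."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        _, (hidden, _) = self.lstm(self.embedding(token_ids))
        return hidden[-1]                          # (batch, hidden_dim)

def triplet_style_loss(v1, v2, margin=0.25):
    # v1[i] and v2[i] encode duplicate questions (positives);
    # rolling v2 by one row pairs each question with a non-duplicate (negative).
    pos = F.cosine_similarity(v1, v2, dim=1)
    neg = F.cosine_similarity(v1, torch.roll(v2, shifts=1, dims=0), dim=1)
    return torch.clamp(neg - pos + margin, min=0.0).mean()

encoder = QuestionEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Placeholder batches of token ids; in practice these come from a tokenizer,
# with batch1[i] and batch2[i] being duplicate questions.
batch1 = torch.randint(0, 10000, (16, 20))
batch2 = torch.randint(0, 10000, (16, 20))

optimizer.zero_grad()
loss = triplet_style_loss(encoder(batch1), encoder(batch2))
loss.backward()
optimizer.step()
```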
Testing an LSTM-based Siamese Network
- Convert each input into an array of numbers
- Feed the inputs into our model
- Compare the output vectors v1 and v2 using cosine similarity
- Test the similarity against the threshold τ
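A brief sketch of this test-time flow, assuming a trained shared encoder like the one sketched in the training section; the threshold value is a placeholder.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_duplicate(encoder, ids_q1, ids_q2, tau=0.7):
    """Encode both questions with the shared encoder, compare the output
    vectors with cosine similarity, and apply the threshold tau."""
    v1, v2 = encoder(ids_q1), encoder(ids_q2)
    similarity = F.cosine_similarity(v1, v2, dim=1)
    return similarity > tau          # boolean tensor: True means "duplicate"

# Example usage with the QuestionEncoder sketched in the training section
# and placeholder token ids (a real pipeline would tokenize the questions):
# encoder = QuestionEncoder()
# ids_q1 = torch.randint(0, 10000, (1, 20))
# ids_q2 = torch.randint(0, 10000, (1, 20))
# print(predict_duplicate(encoder, ids_q1, ids_q2))
```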