Introducing Siamese Networks
- A siamese network is a type of neural network that uses two or more identical subnetworks with the same architecture
- The subnetworks must share the same parameters and weights
- This implies we only need to train one set of weights (not two)
- Typically, a siamese network is used when we're interested in determining whether two inputs are similar to each other
- The following is an example of a siamese network using an LSTM (see the sketch after this list):
- Note, not all siamese networks use an LSTM
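Below is a minimal PyTorch sketch of this idea (not from the original notes): a single encoder, here an embedding layer followed by an LSTM, is applied to both inputs, so only one set of weights exists. The vocabulary size, dimensions, and tokenization are placeholder assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SiameseLSTM(nn.Module):
    """One encoder shared by both branches -- a single set of weights."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) tensor of integer token ids
        _, (hidden, _) = self.lstm(self.embedding(token_ids))
        return hidden[-1]                              # (batch, hidden_dim)

    def forward(self, ids_a, ids_b):
        # The same encoder (same weights) processes both inputs
        return F.cosine_similarity(self.encode(ids_a), self.encode(ids_b), dim=1)
```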
Interpreting the Output of Siamese Networks
- The output of a siamese network is a cosine similarity
- Meaning, the output becomes a measure of the similarity between the two inputs
- When the cosine similarity is less than some threshold τ, the two inputs are considered different
- When the cosine similarity is greater than the threshold τ, the two inputs are considered similar
- Here, τ represents the threshold determining whether two inputs are similar or not
- A larger τ implies only very similar sentences will be considered similar
- Note, τ is a tunable hyperparameter
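As a minimal sketch of this thresholding step (the value τ = 0.7 and the helper names are arbitrary placeholders, not from the notes):

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Cosine similarity between the two subnetwork output vectors
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def is_similar(v1, v2, tau=0.7):
    # Inputs are similar when the cosine similarity exceeds the threshold tau
    return cosine_similarity(v1, v2) > tau
```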
Describing Applications of Siamese Networks
- Some common applications for siamese networks include:
- Facial recognition
- Signature verification for banks
- Paraphrase identification
- Checking whether two similar questions are duplicates
- A siamese network performs well for tasks with little training data
- This is because the two subnetworks have shared weights
- Thus, there are fewer parameters to learn during training
- Specifically, siamese networks are useful when there are many classes with a small number of observations for each class
Defining the Triplet Loss Function
- Within our training data, each observation must include a positive and negative pair
- There are three components of a triplet loss function:
- Anchor instance
- Positive instance
- Negative instance
- An anchor refers to the input we want to test
- A positive example refers to a saved input that's similar to the anchor
- A negative example refers to a saved input that's different from the anchor
- The following observation satisfies the above requirements: anchor: "What is your age?", positive: "How old are you?", negative: "Where are you from?"
- The triplet loss function is used to train siamese networks on training data with positive and negative pairings
- The goal of the triplet loss function is to minimize the difference cos(A, N) - cos(A, P), so that the anchor is more similar to the positive than to the negative
- When training, we should choose positive and negative examples that aren't easily distinguished from one another (i.e., hard triplets)
- The following formula defines the triplet loss function, where α is a margin hyperparameter:

  L(A, P, N) = max(cos(A, N) - cos(A, P) + α, 0)
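A small sketch of that formula applied to already-encoded vectors; the margin α = 0.25 is an assumed placeholder value.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_loss(anchor, positive, negative, alpha=0.25):
    # Loss is zero once cos(A, P) exceeds cos(A, N) by at least the margin alpha
    return max(cos(anchor, negative) - cos(anchor, positive) + alpha, 0.0)
```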
Illustrating the Use of One-Shot Learning
- When we only have a few instances of each class, one-shot learning can be a useful alternative to standard classification
- For example, classifying a signature as one of the many names in our database may be infeasible in practice
- Instead, we may compare the similarity of an anchor signature to positive and negative signatures already saved in our database
- This alternative process is called one-shot learning
- Most signature verification systems must learn from only one image
- As a result, the learning algorithm trains on a small training set
- With one-shot learning, there's no longer a need to retrain our model when more signature examples are added
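A hedged sketch of this comparison-based verification: the anchor signature's embedding is checked against stored reference embeddings, and adding a new reference signature requires no retraining. The encoder producing the embeddings and the threshold are assumptions.

```python
import numpy as np

def verify_signature(anchor_vec, reference_vecs, tau=0.7):
    """One-shot verification: accept the anchor if it is similar enough
    to any stored reference embedding; no retraining needed to add references."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    best_similarity = max(cos(anchor_vec, ref) for ref in reference_vecs)
    return best_similarity > tau
```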
Training an LSTM-based Siamese Network
- Prepare the training set in the following fashion:

| Question 1 | Question 2 | Is Duplicate? |
| --- | --- | --- |
| What is your age? | How old are you? | true |
| Where are you from? | Where are you going? | false |
| ... | ... | ... |
- Prepare batches so that each question in Batch 1 has its duplicate at the corresponding index in Batch 2:

| Batch 1 | Batch 2 |
| --- | --- |
| What is your age? | How old are you? |
| Where are you from? | Where were you born? |
| ... | ... |
- Start by defining a model architecture for a single subnetwork
- Create a siamese network consisting of two subnetworks following identical model architectures
- Feed in two different questions from the different batches as inputs
- Transform each question into embeddings
- Pass the embeddings into an LSTM
- Receive the output vectors for the two questions (e.g., the final hidden state of each LSTM)
- Compare these vectors using cosine similarity (see the sketch below)
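A hedged end-to-end sketch of these training steps in PyTorch. The tokenization, vocabulary size, batch construction, and margin are placeholder assumptions; the loss treats the question at the same index in the other batch as the positive and a row-shifted copy of that batch as negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionEncoder(nn.Module):
    """Shared subnetwork: embedding -> LSTM -> final hidden state."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        _, (hidden, _) = self.lstm(self.embedding(token_ids))
        return hidden[-1]                          # (batch, hidden_dim)

def triplet_style_loss(v1, v2, margin=0.25):
    # v1[i] and v2[i] encode duplicate questions (positives);
    # rolling v2 by one row pairs each question with a non-duplicate (negative).
    pos = F.cosine_similarity(v1, v2, dim=1)
    neg = F.cosine_similarity(v1, torch.roll(v2, shifts=1, dims=0), dim=1)
    return torch.clamp(neg - pos + margin, min=0.0).mean()

encoder = QuestionEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Placeholder batches of token ids; in practice these come from a tokenizer,
# with batch1[i] and batch2[i] being duplicate questions.
batch1 = torch.randint(0, 10000, (16, 20))
batch2 = torch.randint(0, 10000, (16, 20))

optimizer.zero_grad()
loss = triplet_style_loss(encoder(batch1), encoder(batch2))
loss.backward()
optimizer.step()
```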
Testing an LSTM-based Siamese Network
- Convert each input into an array of numbers
- Feed the inputs into our model
- Compare the output vectors v1 and v2 using cosine similarity
- Test the similarity against the threshold τ
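A brief sketch of this test-time flow, assuming a trained shared encoder like the one sketched in the training section; the threshold value is a placeholder.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_duplicate(encoder, ids_q1, ids_q2, tau=0.7):
    """Encode both questions with the shared encoder, compare the output
    vectors with cosine similarity, and apply the threshold tau."""
    v1, v2 = encoder(ids_q1), encoder(ids_q2)
    similarity = F.cosine_similarity(v1, v2, dim=1)
    return similarity > tau          # boolean tensor: True means "duplicate"

# Example usage with the QuestionEncoder sketched in the training section
# and placeholder token ids (a real pipeline would tokenize the questions):
# encoder = QuestionEncoder()
# ids_q1 = torch.randint(0, 10000, (1, 20))
# ids_q2 = torch.randint(0, 10000, (1, 20))
# print(predict_duplicate(encoder, ids_q1, ids_q2))
```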