Distance Metrics in Machine Learning In Python IN SHORT

Distance Metrics

Sandipan Paul
5 min read · Jan 22, 2023

Distance metrics are used in both supervised and unsupervised learning, generally to calculate the similarity between data points.

Types of Distance Metrics in Machine Learning

  1. Euclidean Distance
  2. Manhattan Distance
  3. Minkowski Distance
  4. Hamming Distance
  5. Cosine Distance

A Few Machine Learning Algorithms That Use Distance Metrics

  1. Clustering Algorithms (For example, K Means etc.)
  2. Classification Algorithms (For example, KNN Classification etc.)
Only one library is needed for the analysis.

Euclidean Distance

Euclidean Distance represents the shortest (straight-line) distance between two points.

Many machine learning algorithms, including K-Means, use this distance metric to measure the similarity between observations.

Overview of Euclidean Distance
Calculate Euclidean Distance in Python
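As a minimal sketch of the calculation, the function below uses only the Python standard library. The function name and the sample points are illustrative assumptions, not taken from the original code; SciPy users would typically reach for scipy.spatial.distance.euclidean instead.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical sample points, chosen so the result is easy to check by hand
point_1 = (1, 2, 3)
point_2 = (4, 6, 3)
print(euclidean_distance(point_1, point_2))  # 5.0
```

The coordinate differences here are 3, 4 and 0, so the distance is sqrt(9 + 16 + 0) = 5.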

Manhattan Distance

Manhattan Distance is the sum of absolute differences between points across all the dimensions.

Overview of Manhattan Distance
Comparison of Manhattan Distance vs Euclidean Distance. The inference is they are quite similar.
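The definition can be sketched in a few lines of plain Python. The function name and the sample points are assumptions made for illustration; in SciPy the equivalent function is scipy.spatial.distance.cityblock.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences across all dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points
point_1 = (1, 2, 3)
point_2 = (4, 6, 3)
print(manhattan_distance(point_1, point_2))  # 7
```

The straight-line (Euclidean) distance between these two points is 5.0, which illustrates that the Manhattan distance is never smaller than the Euclidean distance for the same pair of points.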

Minkowski Distance

Minkowski Distance is the generalized form of Euclidean and Manhattan Distance.

  • If lambda = 1, it reduces to Manhattan Distance
  • If lambda = 2, it reduces to Euclidean Distance

In the SciPy package, the order is set through the p parameter of the Minkowski distance function:

  • When the order(p) = 1, it will represent Manhattan Distance
  • When the order(p) = 2, it will represent Euclidean Distance
Overview of Minkowski Distance
Minkowski Distance in Python
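The generalization can be written as D(x, y) = (sum over i of |x_i - y_i|^p)^(1/p). The sketch below implements that formula in plain Python to show how the order changes the result. The function name and sample points are illustrative assumptions; the argument name p mirrors the parameter of scipy.spatial.distance.minkowski.

```python
def minkowski_distance(u, v, p):
    """Minkowski distance of order p between two equal-length points."""
    if len(u) != len(v):
        raise ValueError("points must have the same number of dimensions")
    if p < 1:
        raise ValueError("the order p must be at least 1")
    # Raise each absolute difference to the power p, sum, then take the p-th root
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

# Hypothetical sample points
point_1 = (1, 2, 3)
point_2 = (4, 6, 3)
print(minkowski_distance(point_1, point_2, p=1))  # 7.0, same as Manhattan Distance
print(minkowski_distance(point_1, point_2, p=2))  # 5.0, same as Euclidean Distance
```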

Hamming Distance

Hamming Distance measures how different two strings of the same length are.

The Hamming Distance between two strings of the same length is the number of positions at which the corresponding characters are different.

Let’s say we have two strings: “euclidean” and “manhattan”

Since the two strings have the same length, we can calculate the Hamming Distance. We go character by character and compare the strings. The first characters of the two strings (e and m respectively) are different. Similarly, the second characters (u and a) are different, and so on.

Look carefully: seven characters are different, whereas two characters (the last two) are the same: “euclide|an” and “manhatt|an”. Hence, the Hamming Distance here is 7.

Note that the larger the Hamming Distance between two strings, the more dissimilar those strings are (and vice versa).

Hamming Distance in Python
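The character-by-character comparison described above can be sketched in plain Python as follows. The function name is an illustrative assumption. Note that scipy.spatial.distance.hamming reports the proportion of differing positions rather than the count, so its result would need to be multiplied by the string length to match the number below.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    # Each mismatching pair of characters counts as 1
    return sum(c1 != c2 for c1, c2 in zip(s, t))

print(hamming_distance("euclidean", "manhattan"))  # 7
```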
In most cases, we use Euclidean distance because it’s more intuitive and it’s something that everybody understands.

Cosine Distance / Cosine Similarity

Cosine similarity is used to determine the similarity between documents or vectors. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space.

The relation between cosine similarity and cosine distance can be defined as follows.

  1. Similarity decreases when the distance between two vectors increases
  2. Similarity increases when the distance between two vectors decreases

Cosine Distance = 1 - Cosine Similarity, where Cosine Similarity = cos(θ)

Case 1: When the angle between points P1 and P2 is 45 degrees:

cosine_similarity = cos 45° ≈ 0.707

Case 2: When the two points P1 and P2 are far from each other and the angle between them is 90 degrees:

cosine_similarity = cos 90° = 0

Case 3: When the two points P1 and P2 are very near, lie on the same axis, and the angle between them is 0 degrees:

cosine_similarity = cos 0° = 1

Case 4: When points P1 and P2 lie opposite to each other and the angle between them is 180 degrees:

cosine_similarity = cos 180° = -1

Case 5: When the angle between points P1 and P2 is 270 degrees:

cosine_similarity = cos 270° = 0

Case 6: When the angle between points P1 and P2 is 360 degrees:

cosine_similarity = cos 360° = 1

Let's substitute the similarity for each angle discussed above into the formula Cosine Distance = 1 - Cosine Similarity and see the Cosine Distance between the two points.

Case 1: At 45 degrees: Cosine_Distance = 1 - 0.707 = 0.293

Case 2: At 90 degrees: Cosine_Distance = 1 - 0 = 1

Case 3: At 0 degrees: Cosine_Distance = 1 - 1 = 0

Case 4: At 180 degrees: Cosine_Distance = 1 - (-1) = 2

Case 5: At 270 degrees: Cosine_Distance = 1 - 0 = 1

Case 6: At 360 degrees: Cosine_Distance = 1 - 1 = 0

We can clearly see that when the distance is small, the similarity is high (the points are near each other), and when the distance is large, the two points are dissimilar (far away from each other).
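To check a few of these cases numerically, here is a minimal sketch in plain Python that computes the cosine of the angle from the dot product and the vector lengths. The function names and the sample vectors are illustrative assumptions; SciPy provides the distance form directly as scipy.spatial.distance.cosine.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    return 1 - cosine_similarity(u, v)

# Hypothetical vectors chosen so the angle between them is known
print(cosine_distance((1, 0), (0, 1)))   # 1.0 (angle of 90 degrees)
print(cosine_distance((1, 0), (2, 0)))   # 0.0 (angle of 0 degrees)
print(cosine_distance((1, 0), (-1, 0)))  # 2.0 (angle of 180 degrees)
```

Because only the angle matters, (1, 0) and (2, 0) have a cosine distance of 0 even though their lengths differ.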

GitHub Code:

If this was helpful, please give it a thumbs up. Thank you, and please follow:

Medium: https://medium.com/@sandipanpaul

GitHub: https://github.com/sandipanpaul21
