A distance measure is a function d(x,y) on pairs of points that satisfies the four axioms below.
Non-negativity of a distance
d(x,y)>=0
Positivity of a distance
The distance between two points is 0 if and only if they have the same co-ordinates.
d(x,y) = 0 if and only if x = y
Symmetry of a distance
The distance from x to y is the same as the distance from y to x.
d(x,y)= d(y,x)
Triangle Inequality of a distance
Going from x to y through a third point z is never shorter than going directly.
d(x,z) + d(z,y) >= d(x,y)
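The four axioms above can be spot-checked numerically on a small set of points. The sketch below is only an illustration: the helper names (euclidean, squared_euclidean, satisfies_axioms), the sample points and the tolerance are assumptions, not part of the notes. Passing the check on a finite sample does not prove that a function is a distance measure; failing it does show that it is not one.

```python
import itertools
import math


def euclidean(p, q):
    """Ordinary straight-line (L2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def squared_euclidean(p, q):
    """Squared L2 distance; used here as a function that breaks an axiom."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def satisfies_axioms(d, points, tol=1e-9):
    """Check the four axioms for d on every pair and triple of sample points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                        # non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):       # zero exactly when x equals y
            return False
        if abs(d(x, y) - d(y, x)) > tol:       # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) + d(z, y) < d(x, y) - tol:  # triangle inequality
            return False
    return True


plane_points = [(6, 4), (2, 7), (0, 0)]
line_points = [(0,), (1,), (2,)]

euclidean_ok = satisfies_axioms(euclidean, plane_points)
squared_ok = satisfies_axioms(squared_euclidean, line_points)
print(euclidean_ok, squared_ok)
```

The squared distance fails on the three collinear points because going from 0 to 2 directly costs 4, while going through 1 costs only 1 + 1 = 2, which violates the triangle inequality.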
Different types of Distance Measures
1. Euclidean Distance
2. Jaccard Similarity
3. Edit Distance
4. Hamming Distance
5. Cosine Distance
1. Euclidean Distance
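Find the distance between P and Q under the L1, L2 and L∞ norms. The Euclidean distance is the L2 norm.
P(6,4) and Q(2,7)
L1 norm = |6 - 2| + |4 - 7| = 4 + 3 = 7
L2 norm = [ (6 - 2)^2 + (4 - 7)^2 ]^(1/2) = (16 + 9)^(1/2) = 5
L∞ norm = max (|6 - 2|, |4 - 7|) = max (4, 3) = 4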
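The three norms in this example can be reproduced with a short sketch. The function names minkowski_distance and chebyshev_distance are illustrative choices, not something defined in the notes; the points P and Q are the ones from the example.

```python
def minkowski_distance(p, q, r):
    """L_r distance between two points of equal dimension (r >= 1)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def chebyshev_distance(p, q):
    """L-infinity distance: the largest difference along any one co-ordinate."""
    return max(abs(a - b) for a, b in zip(p, q))


P = (6, 4)
Q = (2, 7)

l1 = minkowski_distance(P, Q, 1)    # |6 - 2| + |4 - 7|
l2 = minkowski_distance(P, Q, 2)    # Euclidean distance
l_inf = chebyshev_distance(P, Q)    # max(|6 - 2|, |4 - 7|)
print(l1, l2, l_inf)
```

The L∞ case is written as a separate function because it is the limit of the L_r distance as r grows, which cannot be obtained by plugging a finite r into the formula.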
2. Jaccard Similarity
Compute Jaccard distance between A and B
A = {1,2,5,4}
B= {2,3,5,7}
A∩ B = {2,5}
No of elements in A, |A| = 4
No of elements in B, |B| = 4
No of elements in A∩B, |A∩B| = 2
|A∪B| = |A| + |B| - |A∩B| = 4+4-2 = 6
JS = |A∩B| / |A∪B| = 2/6 = 1/3
Jaccard Distance = 1 - JS = 1- 1/3 = 2/3
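The same calculation can be sketched with Python's built-in sets. The function names are illustrative, and the treatment of two empty sets (similarity 1) is an assumption made here to avoid dividing by zero; the notes do not cover that case.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard distance is 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


A = {1, 2, 5, 4}
B = {2, 3, 5, 7}

js = jaccard_similarity(A, B)   # 2 common elements out of 6 distinct ones
jd = jaccard_distance(A, B)
print(js, jd)
```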
3. Edit Distance
a] Longest Common Subsequence (LCS) Method
Edit Distance = |x| + |y| - 2 |LCS (x,y)|
Compute the edit distance of x = a b c d e f and y = b c d e s g
LCS (x,y) = b c d e
|x| =6
|y| = 6
|LCS (x,y) |= 4
Edit Distance = |x| + |y| - 2 |LCS (x,y)|
= 6+6-2(4)
= 4
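The LCS formula can be sketched with the standard dynamic-programming table for the longest common subsequence. The function names lcs_length and edit_distance_lcs are illustrative; the strings are the x and y from the example, written without spaces.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y."""
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, cx in enumerate(x, start=1):
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]


def edit_distance_lcs(x, y):
    """Insert/delete edit distance: |x| + |y| - 2 |LCS(x, y)|."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


x = "abcdef"
y = "bcdesg"
lcs = lcs_length(x, y)
distance = edit_distance_lcs(x, y)
print(lcs, distance)
```

Every character of x that is not in the LCS has to be deleted and every character of y that is not in the LCS has to be inserted, which is where the two subtracted copies of |LCS(x,y)| come from.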
b] Classical Method
x = A B C D E
y= A C F D E G
Step 1: Delete B from position 2 in x
x = A C D E
y= A C F D E G
Step 2: Insert F at position 3 in x
x = A C F D E
y= A C F D E G
Step 3: Insert G at the end of x
x = A C F D E G
y= A C F D E G
Edit Distance = no of insertions + no of deletions = 2+1 = 3
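The minimum number of insertions and deletions can also be counted directly, without computing the LCS first. The sketch below is one possible dynamic-programming formulation, assuming only insertions and deletions are allowed (no substitutions), as in the steps above; the function name edit_distance_indel is illustrative.

```python
def edit_distance_indel(x, y):
    """Minimum number of insertions and deletions that turn x into y."""
    rows, cols = len(x) + 1, len(y) + 1
    # dist[i][j] = edits needed to turn the prefix x[:i] into the prefix y[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i          # delete all i characters of x[:i]
    for j in range(cols):
        dist[0][j] = j          # insert all j characters of y[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            if x[i - 1] == y[j - 1]:
                dist[i][j] = dist[i - 1][j - 1]        # characters match, no edit
            else:
                dist[i][j] = 1 + min(dist[i - 1][j],   # delete x[i - 1]
                                     dist[i][j - 1])   # insert y[j - 1]
    return dist[rows - 1][cols - 1]


steps = edit_distance_indel("ABCDE", "ACFDEG")
print(steps)
```

For the example strings this agrees with the hand count of one deletion and two insertions, and with the LCS formula, since LCS(x,y) = A C D E gives 5 + 6 - 2(4) = 3.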
4. Hamming Distance
The number of components in which two vectors of the same length differ.
dist(c1,c2) = 2 [c and d ]
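A minimal sketch of the Hamming distance follows. The vectors u and v are hypothetical examples chosen for illustration; they are not the c1 and c2 referred to above, whose values are not given in these notes.

```python
def hamming_distance(u, v):
    """Number of positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs vectors of equal length")
    return sum(1 for a, b in zip(u, v) if a != b)


# Hypothetical vectors: they differ in the 2nd and 4th components only.
u = [1, 0, 1, 0, 1]
v = [1, 1, 1, 1, 1]
differences = hamming_distance(u, v)
print(differences)
```

The same function works on strings of equal length, since it only compares components position by position.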
5. Cosine Distance
Compute cosine distance of x = [1,2,-1] and y = [2,1,1 ]
Step 1: Dot product
x.y = (1)(2) + (2)(1) + (-1)(1) = 3
Step 2: L2 norm of x and y
||x|| = [ (1)^2 + (2)^2 + (-1)^2 ]^(1/2)
= √6
||y|| = [ (2)^2 + (1)^2 + (1)^2 ]^(1/2)
= √6
Step 3: Calculate the angle from its cosine
cos θ = dot product / (L2 norm of x × L2 norm of y)
= 3 / (√6 √6) = 3/6 = 1/2
Cosine distance = θ = arccos(1/2) = 60°
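The three steps can be sketched in a few lines using only the standard math module. The function name cosine_angle_degrees is illustrative, and reporting the cosine distance as an angle in degrees follows the worked example above. The clamp on the cosine is a precaution against floating-point rounding pushing the value slightly outside [-1, 1].

```python
import math


def cosine_angle_degrees(x, y):
    """Angle in degrees between two non-zero vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(x, y))      # step 1: dot product
    norm_x = math.sqrt(sum(a * a for a in x))   # step 2: L2 norms
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("the angle is undefined for a zero vector")
    cos_theta = dot / (norm_x * norm_y)         # step 3: cosine of the angle
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


x = [1, 2, -1]
y = [2, 1, 1]
angle = cosine_angle_degrees(x, y)
print(round(angle, 6))
```

Because the result goes through floating-point square roots and an arccosine, it should be compared with 60 using a small tolerance rather than exact equality.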