written 8.5 years ago by |
Euclidean Distance:
The most familiar distance measure is the one we normally think of as “distance.” An n-dimensional Euclidean space is one where points are vectors of n real numbers. The conventional distance measure in this space, which we shall refer to as the L2-norm, is defined:
That is, we square the distance in each dimension, sum the squares, and take the positive square root.
Example: Consider the two-dimensional Euclidean space (the customary plane) and the points (2, 7) and (6, 4). The L2-norm gives a distance of
The L1-norm gives a distance of |2 − 6| + |7 − 4| = 4 + 3 = 7. The L∞-norm gives a distance of
max(|2 − 6|, |7 − 4|) = max(4, 3) = 4
Jaccard Distance
We define the Jaccard distance of sets by d(x, y) = 1 − SIM(x, y). That is, the Jaccard distance is 1 minus the ratio of the sizes of the intersection and union of sets x and y. We must verify that this function is a distance measure.
d(x, y) is non negative because the size of the intersection cannot exceed the size of the union.
d(x, y) = 0 if x = y, because x ∪ x = x ∩ x = x. However, if x 6= y, then the size of x ∩ y is strictly less than the size of x ∪ y, so d(x, y) is strictly positive.
d(x, y) = d(y, x) because both union and intersection are symmetric; i.e., x ∪ y = y ∪ x and x ∩ y = y ∩ x.
Cosine Distance
The cosine distance makes sense in spaces that have dimensions, including Euclidean spaces and discrete versions of Euclidean spaces, such as spaces where points are vectors with integer components or boolean (0 or 1) components. In such a space, points may be thought of as directions. The cosine distance between two points is the angle that the vectors to those points make. This angle will be in the range 0 to 180 degrees, regardless of how many dimensions the space has.
We can calculate the cosine distance by first computing the cosine of the angle, and then applying the arc-cosine function to translate to an angle in the 0-180 degree range. Given two vectors x and y, the cosine of the angle between them is the dot product x.y divided by the L2-norms of x and y (i.e., their Euclidean distances from the origin).
Edit Distance
This distance makes sense when points are strings. The distance between two strings x = x1x2 • • • xn and y = y1y2 • • • ym is the smallest number of insertions and deletions of single characters that will convert x to y.
Hamming Distance
Given a space of vectors, we define the Hamming distance between two vectors to be the number of components in which they differ. The Hamming distance cannot be negative, and if it is zero, then the vectors are identical. The distance does not depend on which of two vectors we consider first. The triangle inequality should also be evident. If x and z differ in m components, and z and y differ in n components, then x and y cannot differ in more than m+n components. Most commonly, Hamming distance is used when the vectors are boolean; they consist of 0’s and 1’s only. However, in principle, the vectors can have components from any set.