What Are Outliers?
Outlier: a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism.
Outliers are different from noise.
Noise is random error or variance in a measured variable.
Noise should be removed before outlier detection.
Outliers are interesting because they violate the mechanism that generates the normal data.
Outlier detection vs. novelty detection: at an early stage a novel pattern is treated as an outlier, but later it may be merged into the model.
Applications:
Credit card fraud detection
Telecom fraud detection
Customer segmentation
Medical analysis
Types of Outliers
Three kinds: global, contextual and collective outliers
1. Global outlier (or point anomaly)
An object is a global outlier if it deviates significantly from the rest of the data set.
Ex. Intrusion detection in computer networks
- Issue: Find an appropriate measurement of deviation
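One possible measurement of deviation is sketched below. It is an illustration only, not a method prescribed by these notes: it scores each value by its distance from the median, scaled by the median absolute deviation (the "modified z-score"). The function name `global_outliers`, the cutoff of 3.5, and the sample readings are all assumptions made for the example.

```python
from statistics import median

def global_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. The cutoff 3.5 is a common rule of thumb and is
    an assumption here, not something the notes specify.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: this measure cannot rank deviations.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical sensor readings with one value far from the rest.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(global_outliers(readings))
```

Median-based statistics are used instead of mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in a small sample.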
2. Contextual outlier (or conditional outlier)
An object is a contextual outlier if it deviates significantly with respect to a selected context.
Ex. Is 80°F in Urbana an outlier? It depends on whether it is summer or winter.
Attributes of data objects should be divided into two groups:
Contextual attributes: define the context, e.g. time & location
Behavioral attributes: characteristics of the object, used to evaluate whether it is an outlier in its context, e.g. temperature
Can be viewed as a generalization of local outliers, whose density significantly deviates from that of their local area
Issue: How to define or formulate a meaningful context?
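The split into contextual and behavioral attributes can be sketched as follows, using the temperature example. This is a minimal illustration under stated assumptions: the season is the contextual attribute, the temperature is the behavioral attribute, and each temperature is scored only against other temperatures from the same season. The function name `contextual_outliers`, the 3.5 cutoff, and the temperatures themselves are made up for the example.

```python
from collections import defaultdict
from statistics import median

def contextual_outliers(records, threshold=3.5):
    """Flag (context, value) pairs that deviate within their own context.

    `records` is a list of (context, value) pairs. Each value is scored
    against the median and median absolute deviation (MAD) of its context
    group only. The cutoff 3.5 is an assumed rule of thumb.
    """
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        group = by_context[context]
        med = median(group)
        mad = median([abs(v - med) for v in group])
        if mad > 0 and 0.6745 * abs(value - med) / mad > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical temperatures in degrees Fahrenheit, tagged by season.
records = [
    ("summer", 78), ("summer", 82), ("summer", 80),
    ("summer", 85), ("summer", 79), ("summer", 83),
    ("winter", 30), ("winter", 28), ("winter", 35), ("winter", 32),
    ("winter", 80), ("winter", 29), ("winter", 31),
]
print(contextual_outliers(records))
```

The same reading of 80 is unremarkable among the summer values and is flagged only in the winter group, which is the point of choosing a context first.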
3. Collective outliers
A subset of data objects collectively deviates significantly from the whole data set, even if the individual data objects are not outliers
Applications: e.g. intrusion detection, when a number of computers keep sending denial-of-service packets to each other
Detection of collective outliers
Consider not only behavior of individual objects, but also that of groups of objects
Need to have the background knowledge on the relationship among data objects, such as a distance or similarity measure on objects.
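A rough sketch of the group-level view, loosely following the intrusion-detection example: one host sending a suspicious packet is not treated as unusual, but many distinct hosts doing so in the same time window is. The time window stands in for the background knowledge that relates the objects to each other. The function name `collective_outliers`, the `(window, host)` event format, and the `min_hosts` threshold are assumptions for illustration, not part of any real intrusion-detection tool.

```python
from collections import defaultdict

def collective_outliers(events, min_hosts=3):
    """Return {window: sorted hosts} for windows with many active hosts.

    `events` is an iterable of (window, host) pairs, one per suspicious
    packet. A window is flagged when at least `min_hosts` distinct hosts
    appear in it; the threshold is an assumed parameter.
    """
    hosts_by_window = defaultdict(set)
    for window, host in events:
        hosts_by_window[window].add(host)
    return {
        window: sorted(hosts)
        for window, hosts in hosts_by_window.items()
        if len(hosts) >= min_hosts
    }

# Hypothetical events: each host also appears alone in other windows.
events = [(0, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (2, "d"), (3, "c")]
print(collective_outliers(events))
```

No single event is flagged on its own; only the group of hosts that share window 2 is reported.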