anomaly detection

Kernel density estimation(KDE)

Jinhee. 2020. 3. 17. 12:52

- Density Estimation : estimating the distribution of the underlying variable from the distribution of the observed data

    * In machine learning, estimating the density of a variable 'x' means estimating the pdf (probability density function) of 'x'

    * Density estimation methods can be divided into parametric and non-parametric methods

        (parametric density estimation) Assume a predefined pdf model and estimate its parameters from the data

            (example) Under the assumption that daily traffic follows a normal distribution, we only need to estimate the mean and variance from the data

        (non-parametric density estimation) However, in practice the model is rarely given → the pdf has to be estimated directly from the data, without assuming a model in advance

            The simplest form of non-parametric density estimation is the histogram (normalize the histogram obtained from the observed data and treat it as the pdf)
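The two approaches can be compared with a minimal sketch. The "daily traffic" sample below is synthetic and purely hypothetical (a normal distribution with mean 1000 and standard deviation 50 is assumed only to generate data), and names such as `traffic` and `normal_pdf` are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "daily traffic" sample; the generating distribution is an assumption.
traffic = rng.normal(loc=1000.0, scale=50.0, size=5000)

# Parametric: assume a normal pdf and estimate only its two parameters.
mu = traffic.mean()
var = traffic.var(ddof=1)  # unbiased sample variance


def normal_pdf(x, mu, var):
    """Normal pdf with mean mu and variance var."""
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


# Non-parametric: normalize the histogram and treat it as a piecewise-constant pdf.
density, edges = np.histogram(traffic, bins=30, density=True)
widths = np.diff(edges)

print("estimated mean / variance:", mu, var)
print("histogram area:", (density * widths).sum())
```

With `density=True`, NumPy scales the bar heights so that height times bin width sums to 1, which is what allows the histogram to be read as a pdf.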

 

- Kernel Density Estimation (KDE) : a method for estimating density using a kernel function

    - cons of histogram : discontinuous at bin edges, result depends on bin size, inefficient for high-dimensional data

    - KDE : improves on the problems of the histogram in non-parametric density estimation by using a kernel function

        * Kernel function : a non-negative function that is symmetric about the origin and integrates to 1 (ex. Gaussian, uniform)

        * Key choices when using KDE : which kernel function to use and what value of h to use for the bandwidth

        (ref) https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use

        (ref) https://en.wikipedia.org/wiki/Kernel_density_estimation
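For reference, the estimator described in these notes is usually written as below. This is the standard textbook form rather than something taken from the post itself; the symbols n (number of observations), x_i (the i-th observation), K (the kernel function), and \hat{f}_h (the estimate with bandwidth h) are notation introduced here.

```latex
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) \ge 0, \quad K(-u) = K(u), \quad \int_{-\infty}^{\infty} K(u)\,du = 1

% Example kernel (Gaussian):
K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^{2}}{2}\right)
```

In words: a copy of the kernel, scaled by h, is centered on every observation, and the copies are averaged. A small h gives a spiky estimate, and a large h gives a smooth one.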

 


(figure) left: histogram compared with KDE; right: effect of the kernel bandwidth
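The effect of the bandwidth can also be checked numerically. The following is a minimal sketch, assuming a Gaussian kernel and a synthetic two-peak sample. The function names `gaussian_kernel` and `kde`, the sample, and the three bandwidth values are all hypothetical choices for illustration, not from the original post.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal pdf: non-negative, symmetric about 0, integrates to 1."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kde(x_grid, samples, h):
    """Average of kernels of bandwidth h centered on each sample, evaluated on x_grid."""
    # Scaled distances between every grid point and every sample,
    # shape (len(x_grid), len(samples)).
    u = (x_grid[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(samples) * h)


rng = np.random.default_rng(0)
# Synthetic two-peak sample (an assumption, only for demonstration).
samples = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(1.0, 1.0, 300)])

x_grid = np.linspace(-12.0, 12.0, 1201)
dx = x_grid[1] - x_grid[0]

estimates = {h: kde(x_grid, samples, h) for h in (0.05, 0.4, 2.0)}
for h, f_hat in estimates.items():
    area = f_hat.sum() * dx                      # Riemann sum, should be close to 1
    wiggliness = np.abs(np.diff(f_hat)).sum()    # total variation of the curve
    print(f"h={h}: area={area:.3f}, total variation={wiggliness:.3f}")
```

Every estimate integrates to about 1, but the total variation shrinks as h grows. A very small h follows individual samples and produces a noisy curve, while a very large h smooths the two peaks together.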

 

source : https://darkpgmr.tistory.com/147