Package 'feature'

Title: Local Inferential Feature Significance for Multivariate Kernel Density Estimation
Description: Local inferential feature significance for multivariate kernel density estimation.
Authors: Tarn Duong [aut, cre], Matt Wand [aut]
Maintainer: Tarn Duong
License: GPL-2 | GPL-3
Version: 1.2.15
Built: 2025-02-17 05:15:59 UTC
Source: https://github.com/cran/feature

Help Index


feature

Description

Package for feature significance for multivariate kernel density estimation.

Details

The feature package contains functions to compute and display kernel density estimates and their regions of significant gradient and significant curvature. Significant gradient and/or curvature regions often correspond to significant features (e.g. local modes).

There are two main functions in this package. featureSignifGUI is the interactive function where the user can select bandwidths from a pre-defined range. This mode is useful for initial exploratory data analysis. featureSignif is the non-interactive function. This is useful when the user has a more definite idea of suitable values for the bandwidths. For a more detailed example for 1-, 2- and 3-d data, see vignette("feature").

Author(s)

Tarn Duong & Matt Wand

See Also

ks, sm, KernSmooth


Mt St Helens earthquake data

Description

This data set is a reduced version of the full data set in Scott (1992). It contains the first three variables.

Usage

data(earthquake)

Format

A matrix with 3 columns and 510 rows. Each row corresponds to the measurements of an earthquake beneath the Mt St Helens volcano. The first column is the longitude (in degrees, where a negative number indicates west of the Prime Meridian), the second column is the latitude (in degrees, where a positive number indicates north of the Equator) and the third column is the depth (in km, where a negative number indicates below the Earth's surface).
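The examples in this manual transform the depth column with -log10(-depth) before smoothing. A minimal base-R sketch on made-up depths (not values from the data set) shows what that does: negating makes the depths positive so log10() is defined, and the outer minus sign keeps the original ordering. It assumes every depth is strictly negative, since log10(0) is -Inf and log10 of a negative number is NaN.

```r
# Hypothetical depths in km (negative = below the surface), not from the data set.
depth <- c(-0.1, -1, -10)

# -depth is positive, so log10() is defined; the outer minus sign keeps the
# original ordering (shallower earthquakes still get larger values).
logDepth <- -log10(-depth)
logDepth
```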

Source

Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley & Sons Inc., New York.


Feature significance for kernel density estimation

Description

Identify significant features of kernel density estimates of 1- to 4-dimensional data.

Usage

featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE,
   addSignifCurv=TRUE, signifLevel=0.05)

Arguments

x

data matrix

bw

vector of bandwidth(s)

gridsize

vector of estimation grid sizes

scaleData

flag for scaling the data, i.e. transforming to unit variance for each dimension.

addSignifGrad

flag for computing significant gradient regions

addSignifCurv

flag for computing significant curvature regions

signifLevel

significance level
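Read literally, scaleData=TRUE rescales each column of x to unit variance. A minimal base-R sketch of that transformation on a made-up matrix follows; whether featureSignif also centres the columns or adjusts the bandwidths is not stated here, so the sketch only divides by the column standard deviations.

```r
# Hypothetical 3 x 2 data matrix.
x <- cbind(a = c(1, 2, 3), b = c(10, 20, 60))

# Divide each column by its sample standard deviation (no centring).
xScaled <- sweep(x, 2, apply(x, 2, sd), "/")
apply(xScaled, 2, sd)   # both columns now have standard deviation 1
```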

Details

Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).

The test statistic for gradient testing at a point $\mathbf{x}$ is

$$W(\mathbf{x}) = \Vert \widehat{\nabla f}(\mathbf{x}; \mathbf{H}) \Vert^2$$

where $\widehat{\nabla f}(\mathbf{x}; \mathbf{H})$ is the kernel estimate of the gradient of $f(\mathbf{x})$ with bandwidth $\mathbf{H}$, and $\Vert\cdot\Vert$ is the Euclidean norm. $W(\mathbf{x})$ is approximately chi-squared distributed with $d$ degrees of freedom, where $d$ is the dimension of the data.
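For $d = 1$ the gradient is just the derivative $\hat f'(x)$, and the chi-squared approximation relies on standardising it by its estimated variance. The following R sketch makes that step explicit under stated assumptions: a Gaussian kernel, a scalar bandwidth h, and a variance estimated from the sample variance of the per-observation terms. The helper gradTest1d is hypothetical; it tests each grid point separately, with no multiple comparison adjustment, so it is not featureSignif's implementation.

```r
# Hypothetical helper: pointwise standardised gradient test for 1-d data
# with a Gaussian kernel and scalar bandwidth h (not the package's code).
gradTest1d <- function(x, data, h, alpha = 0.05) {
  n <- length(data)
  u <- outer(x, data, "-") / h         # rows: grid points; columns: observations
  Y <- -u * dnorm(u) / h^2             # per-observation terms of the derivative estimate
  grad <- rowMeans(Y)                  # kernel estimate of f'(x) at each grid point
  varGrad <- apply(Y, 1, var) / n      # estimated variance of that estimate
  W <- grad^2 / varGrad                # standardised squared gradient
  list(grad = grad, W = W,
       signif = W > qchisq(1 - alpha, df = 1))  # pointwise, no multiplicity adjustment
}

set.seed(1)
sim <- c(rnorm(200, mean = -2), rnorm(200, mean = 2))  # simulated bimodal sample
grid <- seq(-4, 4, by = 0.5)
res <- gradTest1d(grid, sim, h = 0.5)
res$signif[grid == -3]   # left flank of the left mode
```

On this simulated sample the estimated gradient is expected to be significantly positive on the left flank (x = -3) and significantly negative on the right flank (x = 3).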

The analogous test statistic for the curvature is

$$W^{(2)}(\mathbf{x}) = \Vert \mathrm{vech}\, \widehat{\nabla^{(2)} f}(\mathbf{x}; \mathbf{H}) \Vert^2$$

where $\widehat{\nabla^{(2)} f}(\mathbf{x}; \mathbf{H})$ is the kernel estimate of the curvature of $f(\mathbf{x})$, and vech is the vector-half operator. $W^{(2)}(\mathbf{x})$ is approximately chi-squared distributed with $d(d+1)/2$ degrees of freedom.
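The $d(d+1)/2$ degrees of freedom come from vech, which keeps only the distinct entries (the lower triangle, including the diagonal) of the symmetric $d \times d$ curvature matrix. A base-R sketch with a made-up 2 x 2 curvature estimate shows the bookkeeping; vech is defined here for illustration, and the sketch does not perform the variance standardisation a real test needs.

```r
# Half-vectorisation: stack the lower triangle (including the diagonal)
# of a symmetric matrix, column by column.
vech <- function(m) m[lower.tri(m, diag = TRUE)]

# Hypothetical 2 x 2 curvature (Hessian) estimate at one grid point.
hess <- matrix(c(-1.5, 0.3,
                  0.3, -0.8), nrow = 2)
v <- vech(hess)                    # the 3 distinct entries: -1.5, 0.3, -0.8
W2 <- sum(v^2)                     # squared Euclidean norm of vech(hess)
d <- nrow(hess)
nDf <- d * (d + 1) / 2             # degrees of freedom: 3 when d = 2
crit <- qchisq(0.95, df = nDf)     # pointwise 5% critical value
```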

Since many dependent hypothesis tests are carried out, the Hochberg multiple comparison procedure is used to control the overall level of significance. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
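The Hochberg step-up rule itself is short. A minimal base-R sketch on made-up p-values follows; hochbergReject is a hypothetical name, and how featureSignif forms p-values over its estimation grid is not reproduced here. The built-in stats::p.adjust(method = "hochberg") gives the same decisions.

```r
# Hochberg (1988) step-up procedure. Sort the m p-values increasingly,
# find the largest k with p_(k) <= alpha / (m - k + 1), and reject the
# hypotheses with the k smallest p-values.
hochbergReject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)                                   # positions of p in increasing order
  passes <- which(p[o] <= alpha / (m - seq_len(m) + 1))
  reject <- logical(m)                            # all FALSE to start
  if (length(passes) > 0) reject[o[seq_len(max(passes))]] <- TRUE
  reject
}

p <- c(0.04, 0.045, 0.03, 0.02)
hochbergReject(p)                                 # all four rejected
p.adjust(p, method = "hochberg") <= 0.05          # same decisions from stats
```

Here a Bonferroni correction (threshold 0.05/4 = 0.0125) would reject nothing, which illustrates why Hochberg's procedure is described as sharper.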

Value

Returns an object of class fs, which is a list with the following fields:

x

data matrix

names

name labels used for plotting

bw

vector of bandwidths

fhat

kernel density estimate on a grid

grad

logical grid for significant gradient

curv

logical grid for significant curvature

gradData

logical vector for significant gradient data points

gradDataPoints

significant gradient data points

curvData

logical vector for significant curvature data points

curvDataPoints

significant curvature data points

References

Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.

Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.

Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.

Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.

See Also

featureSignifGUI, plot.fs

Examples

## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addKDE=FALSE, addData=TRUE)  ## data only
plot(fs, addKDE=TRUE)                 ## KDE plot only
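plot(fs, addSignifGradRegion=TRUE)                ## KDE with significant gradient regions
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)  ## significant curvature regions only
plot(fs, addSignifCurvData=TRUE, curvCol="cyan")  ## significant curvature data points in cyan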

GUI for feature significance for kernel density estimation

Description

GUI for feature significance for kernel density estimation.

Usage

featureSignifGUI(x, scaleData=FALSE)

Arguments

x

data matrix

scaleData

flag for scaling the data to the unit interval in each dimension

Details

In the first column are the sliders for selecting the bandwidths (one for each dimension). Move the slider buttons to change the bandwidth values. The text field is for the grid size, which specifies the number of points in each dimension of the kernel estimation binning grid. Press the Compute significant features button to begin the computation. This calls featureSignif and creates a plot of the kernel density estimate (KDE) of the data with the specified bandwidths. Once this is complete, a pop-up window will appear.

In the second column are the axis limits and labels. The last text field is for the (maximum) number of data points used in the display. Press the Reset plot (except KDE) button to clear the plot of all added features except for the KDE itself.

The third column contains 5 buttons for adding the data points, significant gradient points/regions and significant curvature points/regions to the KDE plot. For 1-d data, the third column also contains a Compute SiZer map button. Press this button to compute a gradient SiZer map using the SiZer function. Once this is complete, a pop-up window will appear. For 2- and 3-d data, this button is instead Reset plot, which clears the plot of all features as well as the KDE. This is useful for showing only the significant features when the KDE may interfere with their display.

For 3-d data, there is an extra fourth column of options: sliders for the transparency values of the features. Move a slider button to the desired value (between 0 and 1) and then press the corresponding Add ... button to its left. Repeatedly pressing an Add ... button decreases the transparency of the features. In this case, press one of the Reset plot buttons to clear the plot window, and replot the significant features with the desired transparency.

Examples

if (interactive()){
library(MASS)
data(geyser)
duration <- geyser$duration 
featureSignifGUI(duration)  ## univariate example
featureSignifGUI(geyser)    ## bivariate example

data(earthquake)            ## trivariate example
earthquake[,3] <- -log10(-earthquake[,3])
featureSignifGUI(earthquake, scaleData=TRUE)
}

Feature significance plot for 1- to 3-dimensional data

Description

Feature significance plot for 1- to 3-dimensional data.

Usage

## S3 method for class 'fs'
plot(x, xlab, ylab, zlab, xlim, ylim, zlim, add=FALSE, addData=FALSE,
   scaleData=FALSE, addDataNum=1000, addKDE=TRUE,jitterRug=TRUE,
   addSignifGradRegion=FALSE, addSignifGradData=FALSE,
   addSignifCurvRegion=FALSE, addSignifCurvData=FALSE, addAxes3d=TRUE,
   densCol, dataCol="black", gradCol="#33A02C", curvCol="#1F78B4",
   axisCol="black", bgCol="white", dataAlpha=0.1, gradDataAlpha=0.3,
   gradRegionAlpha=0.2, curvDataAlpha=0.3, curvRegionAlpha=0.3, rgl=FALSE, ...)

Arguments

x

object of class fs (output from featureSignif function)

xlim, ylim, zlim

x-, y-, z-axis limits

xlab, ylab, zlab

x-, y-, z-axis labels

scaleData

flag for scaling the data, i.e. transforming to unit variance for each dimension

add

flag for adding to an existing plot

addData

flag for display of the data

addDataNum

maximum number of data points plotted in displays

addKDE

flag for display of kernel density estimates

jitterRug

flag for jittering of rug-plot for univariate data display

addSignifGradRegion, addSignifGradData

flag for display of significant gradient regions/data points

addSignifCurvRegion, addSignifCurvData

flag for display of significant curvature regions/data points

addAxes3d

flag for displaying axes in 3-d displays

densCol

colour of density estimate curve

dataCol

colour of data points

gradCol

colour of significant gradient regions/data points

curvCol

colour of significant curvature regions/data points

axisCol

colour of axes

bgCol

colour of background

dataAlpha

transparency of data points

gradRegionAlpha, gradDataAlpha

transparency of significant gradient regions/data points

curvRegionAlpha, curvDataAlpha

transparency of significant curvature regions/data points

rgl

flag to send 3-d graphics to an RGL window. Default is FALSE (the usual graphics window).

...

other graphics parameters

Value

Plots of 1-d and 2-d kernel density estimates are sent to the graphics window. Plots of 3-d kernel density estimates are sent to the RGL window (if rgl=TRUE) or to the graphics window.

See Also

featureSignif

Examples

## See ?featureSignif for univariate and bivariate examples
## Trivariate example
data(earthquake)
earthquake[,3] <- -log10(-earthquake[,3])
fs <- featureSignif(earthquake, scaleData=TRUE, bw=c(0.06, 0.06, 0.05))
plot(fs, addKDE=TRUE, addSignifCurvData=TRUE)
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
if (interactive()) plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE, rgl=TRUE)

SiZer and SiCon plots for 1-dimensional data

Description

SiZer (Significant Zero crossings) and SiCon (Significant Convexity) plots for 1-dimensional data.

Usage

SiZer(x, bw, gridsize, scaleData=FALSE, signifLevel=0.05, plotSiZer=TRUE,
   logbw=TRUE, xlim, xlab, addLegend=TRUE, posLegend="bottomright") 

SiCon(x, bw, gridsize, scaleData=FALSE, signifLevel=0.05, plotSiCon=TRUE,
   logbw=TRUE, xlim, xlab, addLegend=TRUE, posLegend="bottomright")

Arguments

x

data vector

bw

vector giving the range of bandwidths

gridsize

number of x- and y-axis grid points

scaleData

flag for scaling the data, i.e. transforming to unit variance for each dimension.

signifLevel

significance level

plotSiZer, plotSiCon

flag for displaying SiZer/SiCon map

logbw

flag for displaying log bandwidths on y-axis

xlim

x-axis limits

xlab

x-axis label

addLegend

flag for legend display

posLegend

legend position

Details

The gradient SiZer and curvature SiCon maps of Chaudhuri & Marron (1999) are implemented. The horizontal axis is the data axis and the vertical axis is the bandwidth. The colour scheme for the SiZer map is red: negative gradient, blue: positive gradient, purple: zero gradient and grey: sparse regions. For the SiCon map, the colours are orange: negative curvature (concave), blue: positive curvature (convex), green: zero curvature and grey: sparse regions.
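Both colour schemes amount to a three-way classification of each (location, bandwidth) cell, plus a grey class for sparse regions. The R sketch below reproduces that logic from a derivative estimate, its standard error and an effective sample size. Its choices are assumptions, not the package's rules: a pointwise two-sided normal critical value (rather than any multiplicity-adjusted quantile the actual maps may use) and a minimum effective sample size of 5. signifColour is a hypothetical name, not a function in feature.

```r
# Hypothetical helper: colour cells from a derivative estimate `est`,
# its standard error `se` and an effective sample size `ess`.
signifColour <- function(est, se, ess, neg, pos, zero,
                         alpha = 0.05, minEss = 5) {
  z <- qnorm(1 - alpha / 2)              # pointwise two-sided critical value
  tStat <- est / se
  col <- ifelse(tStat > z, pos, ifelse(tStat < -z, neg, zero))
  col[ess < minEss] <- "grey"            # too few nearby observations
  col
}

# Made-up estimates for four cells.
est <- c(2, -3, 0.5, 4)
se  <- c(0.5, 1, 1, 1)
ess <- c(20, 20, 20, 2)
sizer <- signifColour(est, se, ess, neg = "red", pos = "blue", zero = "purple")
sicon <- signifColour(est, se, ess, neg = "orange", pos = "blue", zero = "green")
```

The same helper serves both maps: gradient estimates with the SiZer colours, curvature estimates with the SiCon colours.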

Value

SiZer/SiCon plot sent to graphics window.

References

Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.

See Also

featureSignif

Examples

data(earthquake)
eq3 <- -log10(-earthquake[,3])
SiZer(eq3)
SiCon(eq3)