Title: | Local Inferential Feature Significance for Multivariate Kernel Density Estimation |
---|---|
Description: | Local inferential feature significance for multivariate kernel density estimation. |
Authors: | Tarn Duong [aut, cre], Matt Wand [aut] |
Maintainer: | Tarn Duong <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.2.15 |
Built: | 2025-02-17 05:15:59 UTC |
Source: | https://github.com/cran/feature |
Package for feature significance for multivariate kernel density estimation.
The feature package contains functions to display and compute kernel density estimates, significant gradient and significant curvature regions. Significant gradient and/or curvature regions often correspond to significant features (e.g. local modes).
There are two main functions in this package. featureSignifGUI is the interactive function where the user can select bandwidths from a pre-defined range. This mode is useful for initial exploratory data analysis. featureSignif is the non-interactive function. This is useful when the user has a more definite idea of suitable values for the bandwidths.
For a more detailed example for 1-, 2- and 3-d data, see vignette("feature").
Tarn Duong <[email protected]> & Matt Wand <[email protected]>
See also: ks, sm, KernSmooth
This data set is a reduced version of the full data set in Scott (1992). It contains the first three variables.
data(earthquake)
A matrix with 3 columns and 510 rows. Each row corresponds to the measurements of an earthquake beneath the Mt St Helens volcano. The first column is the longitude (in degrees, where a negative number indicates west of the Prime Meridian), the second column is the latitude (in degrees, where a positive number indicates north of the Equator) and the third column is the depth (in km, where a negative number indicates below the Earth's surface).
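The examples in this manual transform the depth column with -log10(-depth) before estimation. As a small worked illustration, using made-up depths rather than values from the data set, the transform behaves as follows under the sign convention described above:

```r
# Hypothetical depths in km (negative = below the surface); not real data
depth <- c(-0.1, -1, -10)

# Negate to obtain positive values, take log10, then negate again
log_depth <- -log10(-depth)
print(log_depth)
```

Deeper events still map to more negative values, but on a logarithmic scale, which compresses the long tail of deep events.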
Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley & Sons Inc., New York.
Identify significant features of kernel density estimates of 1- to 4-dimensional data.
featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE, addSignifCurv=TRUE, signifLevel=0.05)
x |
data matrix |
bw |
vector of bandwidth(s) |
gridsize |
vector of estimation grid sizes |
scaleData |
flag for scaling the data i.e. transforming to unit variance for each dimension. |
addSignifGrad |
flag for computing significant gradient regions |
addSignifCurv |
flag for computing significant curvature regions |
signifLevel |
significance level |
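As a rough illustration of what scaling to unit variance means for the scaleData flag, the sketch below divides each column of a simulated matrix by its sample standard deviation. This is base R only and is an assumption about the general idea; the package's internal scaling may differ in detail.

```r
set.seed(1)
# Hypothetical bivariate data whose columns have very different spreads
x <- cbind(rnorm(100, sd = 10), rnorm(100, sd = 0.1))

# Divide each column by its sample standard deviation
x_scaled <- sweep(x, 2, apply(x, 2, sd), "/")

# Both columns now have unit sample standard deviation
print(apply(x_scaled, 2, sd))
```

Scaling of this kind makes a single bandwidth value comparable across dimensions that are measured in different units.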
Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).
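The test statistic for gradient testing at a point $\mathbf{x}$ is
$$W(\mathbf{x}) = \Vert \widehat{\nabla f}(\mathbf{x}; \mathbf{H}) \Vert^2$$
where $\widehat{\nabla f}(\mathbf{x}; \mathbf{H})$ is the kernel estimate of the gradient of $f(\mathbf{x})$ with bandwidth $\mathbf{H}$, and $\Vert \cdot \Vert$ is the Euclidean norm. $W(\mathbf{x})$ is approximately chi-squared distributed with $d$ degrees of freedom, where $d$ is the dimension of the data.
The analogous test statistic for the curvature is
$$W^{(2)}(\mathbf{x}) = \Vert \mathrm{vech}\, \widehat{\nabla^{(2)} f}(\mathbf{x}; \mathbf{H}) \Vert^2$$
where $\widehat{\nabla^{(2)} f}(\mathbf{x}; \mathbf{H})$ is the kernel estimate of the curvature of $f(\mathbf{x})$, and vech is the vector-half operator. $W^{(2)}(\mathbf{x})$ is approximately chi-squared distributed with $d(d+1)/2$ degrees of freedom.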
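To make the vector-half operator and the degrees of freedom concrete, here is a minimal base R sketch. The matrix values and the helper name vech are illustrative assumptions, not taken from the package, and the critical values shown are pointwise only, before any multiple-comparison adjustment.

```r
# vech: stack the lower triangle (including the diagonal) of a symmetric matrix
vech <- function(m) m[lower.tri(m, diag = TRUE)]

# Hypothetical 2 x 2 curvature (Hessian) estimate at a single grid point
hess <- matrix(c(-2.0,  0.5,
                  0.5, -1.0), nrow = 2, byrow = TRUE)

d <- nrow(hess)
v <- vech(hess)               # elements (1,1), (2,1), (2,2)
df_curv <- d * (d + 1) / 2    # degrees of freedom for the curvature test

# Pointwise chi-squared critical values at level 0.05
signifLevel <- 0.05
crit_grad <- qchisq(1 - signifLevel, df = d)
crit_curv <- qchisq(1 - signifLevel, df = df_curv)
print(c(gradient = crit_grad, curvature = crit_curv))
```

For d = 2 the curvature test has 3 degrees of freedom, because a symmetric 2 x 2 matrix has only three distinct entries.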
Since this is a situation with many dependent hypothesis tests, we use the Hochberg multiple comparison testing procedure to control the overall level of significance. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
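The Hochberg step-up adjustment is available in base R through p.adjust. The sketch below applies it to a handful of made-up p-values; it illustrates the procedure in general and is not the package's internal code.

```r
# Hypothetical p-values from five pointwise tests (illustrative only)
p <- c(0.001, 0.012, 0.020, 0.040, 0.300)
signifLevel <- 0.05

# Hochberg-adjusted p-values; reject where the adjusted value is at most the level
p_adj <- p.adjust(p, method = "hochberg")
reject <- p_adj <= signifLevel

print(data.frame(p = p, p_adj = p_adj, reject = reject))
```

Note that the third and fourth p-values fall below 0.05 on their own but are no longer significant after adjustment.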
Returns an object of class fs, which is a list with the following fields:
x |
data matrix |
names |
name labels used for plotting |
bw |
vector of bandwidths |
fhat |
kernel density estimate on a grid |
grad |
logical grid for significant gradient |
curv |
logical grid for significant curvature |
gradData |
logical vector for significant gradient data points |
gradDataPoints |
significant gradient data points |
curvData |
logical vector for significant curvature data points |
curvDataPoints |
significant curvature data points |
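A short sketch of inspecting the returned object, using only the field names listed above. It assumes the feature package is installed and skips quietly otherwise; the exact contents of each field are not shown here because they depend on the data and the bandwidth.

```r
if (requireNamespace("feature", quietly = TRUE)) {
  library(feature)
  data(earthquake)
  eq3 <- -log10(-earthquake[,3])
  fs <- featureSignif(eq3, bw = 0.1)

  print(class(fs))
  print(names(fs))

  # Number of data points flagged as lying in significant gradient regions
  print(sum(fs$gradData))
}
```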
Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.
Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.
Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.
Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.
Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.
library(feature)

## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addKDE=FALSE, addData=TRUE)   ## data only
plot(fs, addKDE=TRUE)                  ## KDE plot only
plot(fs, addSignifGradRegion=TRUE)
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
plot(fs, addSignifCurvData=TRUE, curvCol="cyan")
GUI for feature significance for kernel density estimation.
featureSignifGUI(x, scaleData=FALSE)
x |
data matrix |
scaleData |
flag for scaling the data to the unit interval in each dimension |
In the first column are the sliders for selecting the bandwidths (one for each dimension). Move the slider buttons to change the values of the bandwidths. The text field is for the grid size, which specifies the number of points in each dimension of the kernel estimation binning grid. Press the "Compute significant features" button to begin the computation. This creates a plot of the kernel density estimate (KDE) from the data with the specified bandwidths by calling featureSignif. Once this is complete, a pop-up window will appear.
In the second column are the axis limits and labels. The last text field is for the (maximum) number of data points used in the display. Press the "Reset plot (except KDE)" button to clear the plot of all added features except for the KDE itself.
In the third column are 5 buttons which can be used to add features to the KDE plot, such as the data points, significant gradient points/regions and significant curvature points/regions.
For 1-d data, the button in the third column is "Compute SiZer map". Press this button to compute a gradient SiZer plot using the SiZer function. Once this is complete, a pop-up window will appear.
For 2- and 3-d data, the button in the third column is "Reset plot". This will clear the plot of all features as well as the KDE. This is useful for showing only the significant features when the KDE may interfere with their display.
For 3-d data, there is an extra fourth column of options: these are sliders for the transparency values for the features. Move the slider button along to the desired value (between 0 and 1) and then press the "Add ..." button to the left. Repeatedly pressing the "Add ..." button will cause the transparency of the features to decrease. In this case, press one of the "Reset plot" buttons to clear the plot window, and replot the significant feature with the desired transparency.
if (interactive()) {
  library(MASS)
  data(geyser)
  duration <- geyser$duration
  featureSignifGUI(duration)   ## univariate example
  featureSignifGUI(geyser)     ## bivariate example

  data(earthquake)             ## trivariate example
  earthquake$depth <- -log10(-earthquake$depth)
  featureSignifGUI(earthquake, scaleData=TRUE)
}
Feature significance plot for 1- to 3-dimensional data.
## S3 method for class 'fs'
plot(x, xlab, ylab, zlab, xlim, ylim, zlim, add=FALSE, addData=FALSE,
     scaleData=FALSE, addDataNum=1000, addKDE=TRUE, jitterRug=TRUE,
     addSignifGradRegion=FALSE, addSignifGradData=FALSE,
     addSignifCurvRegion=FALSE, addSignifCurvData=FALSE, addAxes3d=TRUE,
     densCol, dataCol="black", gradCol="#33A02C", curvCol="#1F78B4",
     axisCol="black", bgCol="white", dataAlpha=0.1, gradDataAlpha=0.3,
     gradRegionAlpha=0.2, curvDataAlpha=0.3, curvRegionAlpha=0.3,
     rgl=FALSE, ...)
x |
object of class fs |
xlim, ylim, zlim |
x-, y-, z-axis limits |
xlab, ylab, zlab |
x-, y-, z-axis labels |
scaleData |
flag for scaling the data i.e. transforming to unit variance for each dimension |
add |
flag for adding to an existing plot |
addData |
flag for display of the data |
addDataNum |
maximum number of data points plotted in displays |
addKDE |
flag for display of kernel density estimates |
jitterRug |
flag for jittering of rug-plot for univariate data display |
addSignifGradRegion, addSignifGradData |
flag for display of significant gradient regions/data points |
addSignifCurvRegion, addSignifCurvData |
flag for display of significant curvature regions/data points |
addAxes3d |
flag for displaying axes in 3-d displays |
densCol |
colour of density estimate curve |
dataCol |
colour of data points |
gradCol |
colour of significant gradient regions/data points |
curvCol |
colour of significant curvature regions/data points |
axisCol |
colour of axes |
bgCol |
colour of background |
dataAlpha |
transparency of data points |
gradRegionAlpha, gradDataAlpha |
transparency of significant gradient regions/data points |
curvRegionAlpha, curvDataAlpha |
transparency of significant curvature regions/data points |
rgl |
flag to send 3D graphics to RGL window. Default is FALSE (usual graphics window). |
... |
other graphics parameters |
Plots of 1-d and 2-d kernel density estimates are sent to the graphics window. The plot for 3-d data is sent to the RGL window or the graphics window, depending on the rgl flag.
## See ?featureSignif for uni- and bivariate examples

## Trivariate example
data(earthquake)
earthquake[,3] <- -log10(-earthquake[,3])
fs <- featureSignif(earthquake, scaleData=TRUE, bw=c(0.06, 0.06, 0.05))
plot(fs, addKDE=TRUE, addSignifCurvData=TRUE)
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
if (interactive()) plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE, rgl=TRUE)
SiZer (Significant Zero crossings) and SiCon (Significant Convexity) plots for 1-dimensional data.
SiZer(x, bw, gridsize, scaleData=FALSE, signifLevel=0.05, plotSiZer=TRUE,
      logbw=TRUE, xlim, xlab, addLegend=TRUE, posLegend="bottomright")
SiCon(x, bw, gridsize, scaleData=FALSE, signifLevel=0.05, plotSiCon=TRUE,
      logbw=TRUE, xlim, xlab, addLegend=TRUE, posLegend="bottomright")
x |
data vector |
bw |
vector of range of bandwidths |
gridsize |
number of x- and y-axis grid points |
scaleData |
flag for scaling the data i.e. transforming to unit variance for each dimension. |
signifLevel |
significance level |
plotSiZer, plotSiCon |
flag for displaying SiZer/SiCon map |
logbw |
flag for displaying log bandwidths on y-axis |
xlim |
x-axis limits |
xlab |
x-axis label |
addLegend |
flag for legend display |
posLegend |
legend position |
The gradient SiZer and curvature SiCon maps of Chaudhuri & Marron (1999) are implemented. The horizontal axis is the data axis and the vertical axis shows the bandwidths. The colour scheme for the SiZer map is red: negative gradient, blue: positive gradient, purple: zero gradient and grey: sparse regions. For the SiCon map, it is orange: negative curvature (concave), blue: positive curvature (convex), green: zero curvature and grey: sparse regions.
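The raw ingredient of a SiZer map is the sign of the estimated density gradient at each combination of data value and bandwidth. The base R sketch below computes that sign for a Gaussian kernel on a tiny made-up sample. It deliberately omits the significance test and the sparse-region check, so it is only an illustration of the layout of the map, not of what SiZer itself computes; the helper kde_gradient is a hypothetical name.

```r
# Derivative of a Gaussian kernel density estimate at each grid point:
# f'(x) = -(1 / (n h^2)) * sum(u * dnorm(u)), with u = (x - X_i) / h
kde_gradient <- function(grid, data, h) {
  sapply(grid, function(x) {
    u <- (x - data) / h
    -sum(u * dnorm(u)) / (length(data) * h^2)
  })
}

data_x <- c(-1, 0, 1)   # hypothetical sample, not from the package
grid <- c(-2, 2)        # evaluation points (horizontal axis of the map)
bws <- c(0.3, 1)        # bandwidths (vertical axis of the map)

# Rows: grid points; columns: bandwidths; entries: +1 (increasing) or -1 (decreasing)
grad_sign <- sapply(bws, function(h) sign(kde_gradient(grid, data_x, h)))
print(grad_sign)
```

To the left of the sample the estimate is increasing at both bandwidths and to the right it is decreasing, which in a SiZer map would correspond to blue and red cells respectively if those gradients were also found to be significant.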
SiZer/SiCon plot sent to graphics window.
Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.
data(earthquake)
eq3 <- -log10(-earthquake[,3])
SiZer(eq3)
SiCon(eq3)