gvar.dataset - Random Data Sets

Introduction
gvar.dataset contains several tools for collecting and analyzing
random samples from arbitrary distributions. The random samples are
represented by lists of numbers or arrays, where each number/array is a new
sample from the underlying distribution. For example, six samples from a
one-dimensional gaussian distribution, 1±1, might look like
>>> random_numbers = [1.739, 2.682, 2.493, -0.460, 0.603, 0.800]
while six samples from a two-dimensional distribution, [1±1, 2±1], might be
>>> random_arrays = [[ 0.494, 2.734], [ 0.172, 1.400], [ 1.571, 1.304],
... [ 1.532, 1.510], [ 0.669, 0.873], [ 1.242, 2.188]]
Samples from more complicated multidimensional distributions are represented by dictionaries whose values are lists of numbers or arrays: for example,
>>> random_dict = dict(n=random_numbers, a=random_arrays)
where list elements random_dict['n'][i] and random_dict['a'][i] are
part of the same multidimensional sample for every i; that is, the
lists for different keys in the dictionary are synchronized with one
another.
With large samples, we typically want to estimate the mean value of the
underlying distribution. This is done using gvar.dataset.avg_data():
for example,
>>> print(avg_data(random_numbers))
1.31(45)
indicates that 1.31(45) is our best guess, based only upon the samples in
random_numbers, for the mean of the distribution from which those samples
were drawn. Similarly
>>> print(avg_data(random_arrays))
[0.95(22) 1.67(25)]
indicates that the means for the two-dimensional distribution behind
random_arrays are [0.95(22), 1.67(25)]. avg_data() can also
be applied to a dictionary whose values are lists of numbers/arrays: for
example,
>>> print(avg_data(random_dict))
{'a': array([0.95(22), 1.67(25)], dtype=object),'n': 1.31(45)}
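The central value and error that avg_data reports for random_numbers can be reproduced with plain numpy (a cross-check sketch of the default behavior only; avg_data itself also handles arrays, dictionaries, and correlations):

```python
import numpy as np

# the six samples from the 1±1 gaussian distribution above
random_numbers = [1.739, 2.682, 2.493, -0.460, 0.603, 0.800]

# central value: the sample mean
mean = np.mean(random_numbers)

# uncertainty: the standard error of the mean, i.e., the standard
# deviation of the samples divided by sqrt(N)
sdev = np.std(random_numbers) / np.sqrt(len(random_numbers))

print('{:.2f}({:.2f})'.format(mean, sdev))  # 1.31(0.45), i.e. 1.31(45)
```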
Class gvar.dataset.Dataset can be used to assemble dictionaries containing
random samples. For example, imagine that the random samples above were
originally written into a file, as they were generated:
# file: datafile
n 1.739
a [ 0.494, 2.734]
n 2.682
a [ 0.172, 1.400]
n 2.493
a [ 1.571, 1.304]
n -0.460
a [ 1.532, 1.510]
n 0.603
a [ 0.669, 0.873]
n 0.800
a [ 1.242, 2.188]
Here each line is a different random sample, either from the one-dimensional
distribution (labeled n) or from the two-dimensional distribution (labeled
a). Assuming the file is called datafile, this data can be read into
a dictionary, essentially identical to the random_dict dictionary above,
using:
>>> data = Dataset("datafile")
>>> print(data['a'])
[array([ 0.494, 2.734]), array([ 0.172, 1.400]), array([ 1.571, 1.304]) ... ]
>>> print(avg_data(data['n']))
1.31(45)
The brackets and commas can be omitted in the input file for one-dimensional
arrays: for example, datafile (above) could equivalently be written
# file: datafile
n 1.739
a 0.494 2.734
n 2.682
a 0.172 1.400
...
Other data formats may also be easy to use. For example, a data file written
using yaml would look like
# file: datafile
---
n: 1.739
a: [ 0.494, 2.734]
---
n: 2.682
a: [ 0.172, 1.400]
...
and could be read into a gvar.dataset.Dataset using:
import yaml

data = Dataset()
with open("datafile", "r") as dfile:
    for d in yaml.safe_load_all(dfile.read()):  # iterate over yaml records
        data.append(d)                          # d is a dictionary
Finally note that data can be binned, into bins of size binsize, using
gvar.dataset.bin_data(). For example, gvar.dataset.bin_data(data, binsize=3)
replaces every three samples in data by the average of those samples. This
creates a dataset that is 1/3 the size of the original but has the same
mean. Binning is useful for making large datasets more manageable, and also
for removing sample-to-sample correlations. Over-binning, however, erases
statistical information.
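What binning does can be sketched in a few lines of plain Python (a minimal sketch for lists of numbers; the library version also handles arrays and dictionaries):

```python
import numpy as np

def bin_sketch(samples, binsize):
    # average consecutive groups of binsize samples; an incomplete
    # bin at the end is dropped
    nbins = len(samples) // binsize
    return [float(np.mean(samples[i * binsize:(i + 1) * binsize]))
            for i in range(nbins)]

data = [1.739, 2.682, 2.493, -0.460, 0.603, 0.800]
binned = bin_sketch(data, binsize=3)
print(binned)                            # 2 bins, each averaging 3 samples
print(np.mean(binned), np.mean(data))    # the mean is unchanged
```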
Class gvar.dataset.Dataset can also be used to build a dataset sample by
sample in code: for example,
>>> a = Dataset()
>>> a.append(n=1.739, a=[ 0.494, 2.734])
>>> a.append(n=2.682, a=[ 0.172, 1.400])
...
creates the same dataset as above.
Functions
The functions defined in the module are:
- gvar.dataset.avg_data(dataset, spread=False, median=False, bstrap=False, noerror=False, mismatch='truncate', unbias=False, warn=False)
Average dataset to estimate means and covariance.

dataset is a list of random numbers, a list of random arrays, or a dictionary of lists of random numbers and/or arrays: for example,

>>> random_numbers = [1.60, 0.99, 1.28, 1.30, 0.54, 2.15]
>>> random_arrays = [[12.2,121.3],[13.4,149.2],[11.7,135.3],
...                  [7.2,64.6],[15.2,69.0],[8.3,108.3]]
>>> random_dict = dict(n=random_numbers, a=random_arrays)

where in each case there are six random numbers/arrays.
avg_data estimates the means of the distributions from which the random numbers/arrays are drawn, together with the uncertainties in those estimates. The results are returned as a gvar.GVar or an array of gvar.GVars, or a dictionary of gvar.GVars and/or arrays of gvar.GVars:

>>> print(avg_data(random_numbers))
1.31(20)
>>> print(avg_data(random_arrays))
[11.3(1.1) 108(13)]
>>> print(avg_data(random_dict))
{'a': array([11.3(1.1), 108(13)], dtype=object), 'n': 1.31(20)}
The arrays in random_arrays are one dimensional; in general, they can have any shape.

avg_data(dataset) also estimates any correlations between different quantities in dataset. When dataset is a dictionary, it does this by assuming that the lists of random numbers/arrays for the different dataset[k]s are synchronized, with the first element in one list corresponding to the first elements in all other lists, and so on.

Note that estimates of the correlations are robust only if the number of samples being averaged is substantially larger (eg, 10x) than the number of quantities being averaged. The correlation matrix is poorly conditioned or singular if the number of samples is too small. Function gvar.dataset.svd_diagnosis() can be used to determine whether there is a problem, and, if so, the problem can be ameliorated by applying an SVD cut to the data after averaging (gvar.regulate()).

- Parameters:
stderr (bool) – If True (default) the variances and covariances represent the uncertainties in the estimates of the means (corresponding to the standard errors). If False they represent the spread in the data (corresponding to the standard deviations of the distribution). The former is smaller by a factor of one over the number of samples.

spread (bool) – (Legacy) spread=True is the same as stderr=False, and vice versa.

median (bool) – Setting median=True replaces the means in the results by medians, while the standard deviations are approximated by the width of the larger of the intervals above or below the median that contains 34% of the data. These estimates are more robust than the mean and standard deviation when averaging over small amounts of data; in particular, they are unaffected by extreme outliers in the data. But correlations between different variables are ignored in this case. The default is median=False.

bstrap (bool) – Setting bstrap=True is shorthand for setting median=True and stderr=False, and overrides any explicit setting for these keyword arguments. This is the typical choice for analyzing bootstrap data, hence its name. The default value is bstrap=False.

noerror (bool) – Error estimates on the averages (and correlations) are omitted if noerror=True is set; just the mean values are returned. The default value is noerror=False.

mismatch (str) – The mismatch option is only relevant when dataset is a dictionary whose entries dataset[k] have different sample sizes. This complicates the calculation of correlations between the different entries. There are three choices for mismatch:

mismatch='truncate': Samples are discarded from the ends of entries so that all entries have the same sample size (equal to the smallest sample size). This is the default.

mismatch='wavg': The data set is decomposed into a collection of data sets so that the entries in any given data set all have the same sample size; no samples are discarded. Each of the resulting data sets is averaged separately, and the final result is the weighted average of these averages. This choice is the most accurate but also the slowest, especially for large problems. It should only be used when the smallest sample size is much larger (eg, 10x) than the number of quantities being averaged. It also requires the lsqfit Python module.

mismatch='decorrelate': Ignores correlations between different entries dataset[k]. All samples are used and correlations within an entry dataset[k] are retained. This is the fastest choice.

unbias (bool) – The default choice unbias=False means that variances are estimated using sum((x[i] - mean(x))**2)/N where N is the number of random samples. This estimate is biased but is typically more accurate than the unbiased estimate obtained when N is replaced by N-1 (because statistical fluctuations are then larger than the bias). The difference is negligible unless N is very small. Setting unbias=True gives the unbiased estimate.

warn (bool) – warn determines whether or not a warning is issued when different entries in a dictionary data set have different sample sizes. The default is warn=False.
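The variance conventions selected by stderr, spread, and unbias amount to the following numpy expressions, written out by hand for the random_numbers sample from the example above (a sketch of the arithmetic these keywords select, not gvar's actual code):

```python
import numpy as np

x = np.array([1.60, 0.99, 1.28, 1.30, 0.54, 2.15])
N = len(x)
S = np.sum((x - x.mean()) ** 2)   # sum of squared deviations

var_spread = S / N       # stderr=False (spread=True): spread of the data
var_mean = S / N / N     # stderr=True (default): error on the mean,
                         # smaller by a factor of 1/N
var_unbiased = S / (N - 1)   # unbias=True: N replaced by N - 1

print(np.sqrt(var_mean))     # ~0.20, cf. the result 1.31(20) above
```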
- gvar.dataset.autocorr(dataset)
Compute autocorrelation in dataset.

dataset is a list of random numbers or random arrays, or a dictionary of lists of random numbers/arrays.

When dataset is a list of random numbers, autocorr(dataset) returns an array where autocorr(dataset)[i] is the correlation between elements in dataset that are separated by distance i in the list: for example,

>>> print(autocorr([2,-2,2,-2,2,-2]))
[ 1. -1.  1. -1.  1. -1.]
shows perfect correlation between elements separated by an even interval in the list, and perfect anticorrelation between elements separated by an odd interval.
autocorr(dataset) returns a list of arrays of autocorrelation coefficients when dataset is a list of random arrays. Again autocorr(dataset)[i] gives the autocorrelations for dataset elements separated by distance i in the list. Similarly autocorr(dataset) returns a dictionary when dataset is a dictionary.

autocorr(dataset) uses FFTs to compute the autocorrelations; the cost of computing the autocorrelations should grow roughly linearly with the number of random samples in dataset (up to logarithms).
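The FFT approach can be sketched as follows for a list of numbers (an illustration of the algorithmic idea, not gvar's implementation; the normalization shown matches the example above):

```python
import numpy as np

def autocorr_sketch(samples):
    # autocorrelation coefficients computed with FFTs, so the cost
    # grows like N log N rather than N**2
    x = np.asarray(samples, float)
    x = x - x.mean()
    n = len(x)
    f = np.fft.rfft(x, 2 * n)               # zero-pad to avoid wrap-around
    acf = np.fft.irfft(f * np.conj(f))[:n]  # sums of x[i]*x[i+k]
    acf /= n - np.arange(n)                 # average over pairs at each lag
    return acf / acf[0]                     # normalize by the variance

print(autocorr_sketch([2, -2, 2, -2, 2, -2]))
# alternates between +1 and -1, as in the example above
```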
- gvar.dataset.bin_data(dataset, binsize=2)
Bin random data.

dataset is a list of random numbers or random arrays, or a dictionary of lists of random numbers/arrays.

bin_data(dataset, binsize) replaces consecutive groups of binsize numbers/arrays by the average of those numbers/arrays. The result is a new data list (or dictionary) with 1/binsize times as much random data: for example,

>>> print(bin_data([1,2,3,4,5,6,7], binsize=2))
[1.5, 3.5, 5.5]
>>> print(bin_data(dict(s=[1,2,3,4,5], v=[[1,2],[3,4],[5,6],[7,8]]), binsize=2))
{'s': [1.5, 3.5], 'v': [array([ 2.,  3.]), array([ 6.,  7.])]}
Data is dropped at the end if there is insufficient data to form complete bins. Binning is used to make calculations faster and to reduce measurement-to-measurement correlations, if they exist. Over-binning erases useful information.
- gvar.dataset.bootstrap_iter(dataset, n=None)
Create iterator that returns bootstrap copies of dataset.

dataset is a list of random numbers or random arrays, or a dictionary of lists of random numbers/arrays. bootstrap_iter(dataset, n) is an iterator that returns n bootstrap copies of dataset. The random numbers/arrays in a bootstrap copy are drawn at random (with repetition allowed) from among the samples in dataset: for example,

>>> dataset = [1.1, 2.3, 0.5, 1.9]
>>> data_iter = bootstrap_iter(dataset)
>>> print(next(data_iter))
[ 1.1  1.1  0.5  1.9]
>>> print(next(data_iter))
[ 0.5  2.3  1.9  0.5]

>>> dataset = dict(a=[1,2,3,4], b=[1,2,3,4])
>>> data_iter = bootstrap_iter(dataset)
>>> print(next(data_iter))
{'a': array([3, 3, 1, 2]), 'b': array([3, 3, 1, 2])}
>>> print(next(data_iter))
{'a': array([1, 3, 3, 2]), 'b': array([1, 3, 3, 2])}

>>> dataset = [[1,2],[3,4],[5,6],[7,8]]
>>> data_iter = bootstrap_iter(dataset)
>>> print(next(data_iter))
[[ 7.  8.]
 [ 1.  2.]
 [ 1.  2.]
 [ 7.  8.]]
>>> print(next(data_iter))
[[ 3.  4.]
 [ 7.  8.]
 [ 3.  4.]
 [ 1.  2.]]
The distribution of bootstrap copies is an approximation to the distribution from which dataset was drawn. Consequently means, variances and correlations for bootstrap copies should be similar to those in dataset. Analyzing variations from bootstrap copy to copy is often useful when dealing with non-gaussian behavior or complicated correlations between different quantities.

Parameter n specifies the maximum number of copies; there is no maximum if n is None.
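The resampling that bootstrap_iter performs can be sketched with numpy (a sketch for a list of numbers; bootstrap_iter itself also handles arrays and keeps dictionary entries synchronized):

```python
import numpy as np

def bootstrap_sketch(dataset, n, seed=None):
    # yield n bootstrap copies: each copy draws len(dataset) samples
    # at random, with replacement, from the original samples
    rng = np.random.default_rng(seed)
    data = np.asarray(dataset)
    for _ in range(n):
        idx = rng.integers(0, len(data), size=len(data))
        yield data[idx]

dataset = [1.1, 2.3, 0.5, 1.9]
for copy in bootstrap_sketch(dataset, n=2, seed=12):
    print(copy)     # every entry is one of the original four samples
```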
Classes
gvar.dataset.Dataset is used to assemble random samples from
multidimensional distributions:
- class gvar.dataset.Dataset(inputdata=None, binsize=1, grep=None, keys=None, h5group='/', nbin=None)
Dictionary for collecting random data.
A gvar.dataset.Dataset is an ordered dictionary whose values represent collections of random samples. Each value is a numpy array whose first index labels the random sample. Random samples can be numbers or arrays of numbers. The keys identify the quantity being sampled.

A Dataset can be assembled piece by piece, as random data is accumulated, or it can be read from a file. Consider a situation where there are four random values for a scalar s and four random values for a vector v. These can be collected as follows:

>>> dset = Dataset()
>>> dset.append(s=1.1, v=[12.2, 20.6])
>>> dset.append(s=0.8, v=[14.1, 19.2])
>>> dset.append(s=0.95, v=[10.3, 19.7])
>>> dset.append(s=0.91, v=[8.2, 21.0])
>>> print(dset['s'])        # 4 random values of s
[1.1, 0.8, 0.95, 0.91]
>>> print(dset['v'])        # 4 random vector-values of v
[array([ 12.2,  20.6]), array([ 14.1,  19.2]), array([ 10.3,  19.7]), array([ 8.2,  21. ])]
The argument to dset.append() can also be a dictionary: for example, dd = dict(s=1.1, v=[12.2, 20.6]); dset.append(dd) is equivalent to the first append statement above. One can also append data key-by-key: for example, dset.append('s', 1.1); dset.append('v', [12.2, 20.6]) is equivalent to the first append in the example above.

Use extend in place of append to add data in batches: for example,

>>> dset = Dataset()
>>> dset.extend(s=[1.1, 0.8], v=[[12.2, 20.6], [14.1, 19.2]])
>>> dset.extend(s=[0.95, 0.91], v=[[10.3, 19.7], [8.2, 21.0]])
>>> print(dset['s'])        # 4 random values of s
[1.1, 0.8, 0.95, 0.91]
gives the same dataset as the first example above.
The same Dataset can also be created from a text file named 'datafile' with the following contents:

# file: datafile
s 1.1
v [12.2, 20.6]
s 0.8
v [14.1, 19.2]
s 0.95
v [10.3, 19.7]
s 0.91
v [8.2, 21.0]
Here each line consists of a key followed by a new random sample for that key. Lines that begin with # are ignored. The file is read using:

>>> dset = Dataset('datafile')
>>> print(dset['s'])
[1.1, 0.8, 0.95, 0.91]
Data can be binned while reading it in, which might be useful if the data set is huge or if correlations are a concern. To bin the data contained in file datafile in bins of size 2 we use:

>>> dset = Dataset('datafile', binsize=2)
>>> print(dset['s'])
[0.95, 0.93]
The keys read from a data file are restricted to those listed in keyword keys and those that are matched (or partially matched) by regular expression grep, if one or the other of these is specified: for example,

>>> dset = Dataset('datafile')
>>> print([k for k in dset])
['s', 'v']
>>> dset = Dataset('datafile', keys=['v'])
>>> print([k for k in dset])
['v']
>>> dset = Dataset('datafile', grep='[^v]')
>>> print([k for k in dset])
['s']
>>> dset = Dataset('datafile', keys=['v'], grep='[^v]')
>>> print([k for k in dset])
[]
In addition to text files, hdf5 files can also be read (provided module h5py is available): for example,

>>> dset = Dataset('datafile.h5', h5group='/mcdata')

reads the hdf5 datasets in hdf5 group '/mcdata'. An hdf5 equivalent to the text file above would contain two datasets in that group: one with key 's' that is a one-dimensional array with shape (4,), and another with key 'v' that is a two-dimensional array with shape (4, 2):

>>> import h5py
>>> for v in h5py.File('datafile.h5')['/mcdata'].values():
...     print(v)
<HDF5 dataset "s": shape (4,), type "<f8">
<HDF5 dataset "v": shape (4, 2), type "<f8">
Finally, Datasets can also be constructed from other dictionaries (including other Datasets), or lists of key-data tuples. For example,

>>> dset = Dataset('datafile')
>>> dset_binned = Dataset(dset, binsize=2)
>>> dset_v = Dataset(dset, keys=['v'])

reads data from file 'datafile' into Dataset dset, and then creates a new Dataset with the data binned (dset_binned), and another that only contains the data with key 'v' (dset_v).

- Parameters:
inputdata (str or list or dictionary) – If inputdata is a string, it is the name of a file containing datasets. Two formats are supported. If the filename ends in '.h5', the file is in hdf5 format, with datasets that are numpy arrays whose first index labels the random sample. The other file format is a text file where each line consists of a key followed by a number or array of numbers representing a new random sample associated with that key. Lines beginning with # are comments. A list of text file names can also be supplied, and text files can be compressed (with names ending in .gz or .bz2). If inputdata is a dictionary or a list of (key, value) tuples, its keys and values are copied into the dataset. Its values should be arrays whose first index labels the random sample.

binsize (int) – Bin the random samples in bins of size binsize. Default value is binsize=1 (i.e., no binning).

grep (str or None) – If not None, only keys that match or partially match regular expression grep are retained in the data set. Keys that don't match are ignored. Default is grep=None.

keys (list) – List of keys to retain in data set. Keys that are not in the list are ignored. Default is keys=None, which implies that all keys are kept.

h5group (str or list) – Address within the hdf5 file identified by inputdata that contains the relevant datasets. Every hdf5 dataset in group h5group is read into the dataset, with the same key as in h5group. Default is the top group in the file: h5group='/'. h5group can also be a list of groups, in which case datasets from all of the groups are read.
The main attributes and methods are:
- samplesize
Smallest number of samples for any key.
- append(*args, **kargs)
Append data to dataset.
There are three equivalent ways of adding data to a dataset data: for example, each of

data.append(n=1.739, a=[0.494, 2.734])      # method 1

data.append('n', 1.739)                     # method 2
data.append('a', [0.494, 2.734])

dd = dict(n=1.739, a=[0.494, 2.734])        # method 3
data.append(dd)

adds one new random number to data['n'], and a new vector to data['a'].
- extend(*args, **kargs)
Add batched data to dataset.
There are three equivalent ways of adding batched data, containing multiple samples for each quantity, to a dataset data: for example, each of

data.extend(n=[1.739, 2.682], a=[[0.494, 2.734], [0.172, 1.400]])   # method 1

data.extend('n', [1.739, 2.682])                                    # method 2
data.extend('a', [[0.494, 2.734], [0.172, 1.400]])

dd = dict(n=[1.739, 2.682], a=[[0.494, 2.734], [0.172, 1.400]])     # method 3
data.extend(dd)

adds two new random numbers to data['n'], and two new random vectors to data['a'].

This method can be used to merge two datasets, whether or not they share keys: for example,

data = Dataset("file1")
data_extra = Dataset("file2")
data.extend(data_extra)     # data now contains all of data_extra
- grep(rexp)
Create new dataset containing items whose keys match rexp.

Returns a new gvar.dataset.Dataset containing only the items self[k] whose keys k match regular expression rexp (a string) according to Python module re:

>>> a = Dataset()
>>> a.append(xx=1., xy=[10., 100.])
>>> a.append(xx=2., xy=[20., 200.])
>>> print(a.grep('y'))
{'xy': [array([  10.,  100.]), array([  20.,  200.])]}
>>> print(a.grep('x'))
{'xx': [1.0, 2.0], 'xy': [array([  10.,  100.]), array([  20.,  200.])]}
>>> print(a.grep('x|y'))
{'xx': [1.0, 2.0], 'xy': [array([  10.,  100.]), array([  20.,  200.])]}
>>> print(a.grep('[^y][^x]'))
{'xy': [array([  10.,  100.]), array([  20.,  200.])]}
Items are retained even if rexp matches only part of the item's key.
- slice(sl)

Create new dataset with self[k] -> self[k][sl].
Parameter sl is a slice object that is applied to every item in the dataset to produce a new gvar.dataset.Dataset. Setting sl = slice(0, None, 2), for example, discards every other sample for each quantity in the dataset. Setting sl = slice(100, None) discards the first 100 samples for each quantity.

If parameter sl is a tuple of slice objects, these are applied to successive indices of self[k]. An exception is raised if the number of slice objects exceeds the number of dimensions for any self[k].
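The slice objects accepted here are ordinary Python slices; their effect on a single entry can be seen directly with numpy (an illustration only; dset.slice() applies the same slice to every key):

```python
import numpy as np

samples = np.arange(10)             # stand-in for one entry self[k]

print(samples[slice(0, None, 2)])   # keeps every other sample
print(samples[slice(3, None)])      # discards the first 3 samples
```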
- arrayzip(template)
Merge lists of random data according to template.

template is an array of keys in the dataset, where the shapes of self[k] are the same for all keys k in template. self.arrayzip(template) merges the lists of random numbers/arrays associated with these keys to create a new list of (merged) random arrays whose layout is specified by template: for example,

>>> d = Dataset()
>>> d.append(a=1, b=10)
>>> d.append(a=2, b=20)
>>> d.append(a=3, b=30)
>>> print(d)                # three random samples each for a and b
{'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]}
>>> # merge into list of 2-vectors:
>>> print(d.arrayzip(['a', 'b']))
[[  1.  10.]
 [  2.  20.]
 [  3.  30.]]
>>> # merge into list of (symmetric) 2x2 matrices:
>>> print(d.arrayzip([['b', 'a'], ['a', 'b']]))
[[[ 10.   1.]
  [  1.  10.]]

 [[ 20.   2.]
  [  2.  20.]]

 [[ 30.   3.]
  [  3.  30.]]]
The number of samples in each merged result is the same as the number of samples for each key (here 3). The keys used in this example represent scalar quantities; in general, they could be either scalars or arrays (of any shape, so long as all have the same shape).
- trim()
Create new dataset where all entries have same sample size.
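The effect of trim() can be sketched in plain Python, assuming (as with mismatch='truncate' in avg_data) that every entry is truncated to the smallest sample size (a sketch of the behavior, not the library code):

```python
def trim_sketch(data):
    # truncate every entry to the smallest sample size
    n = min(len(v) for v in data.values())
    return {k: v[:n] for k, v in data.items()}

data = {'s': [1.1, 0.8, 0.95], 'v': [[12.2, 20.6], [14.1, 19.2]]}
print(trim_sketch(data))
# 's' is truncated to two samples, matching the two samples in 'v'
```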
- toarray()
Create new dictionary d where d[k] = numpy.array(self[k]) for all k.
- class gvar.dataset.svd_diagnosis(dataset, nbstrap=50, mincut=1e-12, models=None, process_dataset=None)
Diagnose the need for an SVD cut.
gvar.dataset.svd_diagnosis bootstraps the spectrum of the correlation matrix for the data in dataset to determine how much of that spectrum is reliably determined by this data.

Here dataset is a list of random arrays or a dictionary (e.g., gvar.dataset.Dataset) whose values are lists of random numbers or random arrays. The random numbers or arrays are averaged (using gvar.dataset.avg_data()) to produce a set of gvar.GVars and their correlation matrix. The smallest eigenvalues of the correlation matrix are poorly estimated when the number of random samples is insufficiently large; the number of samples should typically be significantly larger than the number of random variables being analyzed in order to get good estimates of the correlations between these variables.

Typical usage is
import gvar as gv

s = gv.dataset.svd_diagnosis(dataset)
avgdata = gv.svd(s.avgdata, svdcut=s.svdcut)
s.plot_ratio(show=True)
where the defective part of the correlation matrix is corrected by applying an SVD cut to the averaged data. A plot showing the ratio of bootstrapped eigenvalues to the actual eigenvalues is displayed by the s.plot_ratio command.

- Parameters:
dataset – List of random arrays or a dictionary (e.g., gvar.dataset.Dataset) whose values are lists of random numbers or random arrays. Alternatively it can be a tuple (g, Ns) where: g is an array of gvar.GVars or a dictionary whose values are gvar.GVars or arrays of gvar.GVars; and Ns is the number of random samples. Then the list of random data that is analyzed is created using gvar.raniter(g, n=Ns).

nbstrap – Number of bootstrap copies used (default is 50).

models – For use in conjunction with lsqfit.MultiFitter; ignored when not specified. When specified, it is a list of multi-fitter models used to specify which parts of the data are being analyzed. The correlation matrix is restricted to the data specified by the models, and the data returned are "processed data" for use with a multi-fitter using keyword pdata rather than data. Ignored if keyword process_dataset is specified.

process_dataset – Function that converts datasets into averaged data. Function gvar.dataset.avg_data() is used if set equal to None (default).

mincut – Minimum SVD cut (default 1e-12).
The main attributes are:
- svdcut
SVD cut for bad eigenvalues in correlation matrix.
- eps

eps corresponding to svdcut, for use in gvar.regulate().
- avgdata

Averaged data (gvar.dataset.avg_data(dataset)).
- val
Eigenvalues of the correlation matrix.
- bsval
Bootstrap average of correlation matrix eigenvalues.
- nmod

Number of eigenmodes modified by SVD cut svdcut.
A method is available to display the eigenvalues:
- plot_ratio(plot=None, show=False)
Plot the ratio of bootstrapped eigenvalues to the actual eigenvalues.
Ratios (blue points) are plotted versus the value of the actual eigenvalues divided by the maximum eigenvalue. Error bars on the ratios show the range of variation across bootstrapped copies. A dotted line is drawn at 1 - sqrt(2/N), where N is the number of data points. The proposed SVD cut is where the ratio curve intersects this line; that point is indicated by a vertical dashed red line. The plot object is returned.

- Parameters:
plot – matplotlib plotter used to make the plot. Uses plot = matplotlib.pyplot if plot=None (default).

show – Displays the plot if show=True (default False).