Numerical analysis routines

My module for various data analysis tasks.

REQUIREMENTS: numpy, Plotting and analysis tools (for errxy())

2008-07-25 16:20 IJC: Created.

2009-12-08 11:31 IJC: Updated transit flag in planet objects and
rveph() function.

2010-02-18 14:06 IJC: Added medianfilter()

2010-08-03 15:38 IJC: Merged versions.

2010-10-28 11:53 IJMC: Updated documentation strings for Sphinx;
moved pylab import inside individual function.
2011-04-13 14:29 IJMC: Added keyword functionality to fmin()
(taken from scipy.optimize).

2011-04-14 09:48 IJMC: Added a few fundamental constants.

2011-04-22 14:48 IJMC: Added trueanomaly() and
eccentricanomaly().
2011-06-21 13:15 IJMC: Added get_t23() and get_t14() to
planet objects.
analysis.allanvariance(data, dt=1)[source]

Compute the Allan variance on a set of regularly-sampled data (1D).

If the time between samples is dt and there are N total samples, the returned variance spectrum will have frequency indices from 1/dt to (N-1)/dt.
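
The computation can be sketched as follows; this is a minimal, non-overlapping implementation for illustration only, and the module's exact normalization and output format may differ:

import numpy as np

def allan_variance_sketch(data, dt=1):
    # Allan variance from differences of bin-averaged means; assumes
    # regular sampling with spacing dt (illustrative only).
    data = np.asarray(data, dtype=float)
    n = data.size
    taus, avar = [], []
    for m in range(1, n // 2 + 1):          # bin width, in samples
        nbins = n // m
        means = data[:nbins * m].reshape(nbins, m).mean(axis=1)
        avar.append(0.5 * np.mean(np.diff(means) ** 2))
        taus.append(m * dt)
    return np.array(taus), np.array(avar)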

analysis.amedian(a, axis=None)[source]

Compute the median of the array over the given axis. If axis is None, compute the median over all dimensions of the array.

Think of this as normal Numpy median, but preserving dimensionality.

analysis.binarray(img, ndown, axis=None)[source]

Downsample a 1D vector or 2D image.

Takes a 1D vector or 2D array and reduces its resolution by an integer factor “ndown”. This is done by binning the array – i.e., integrating over square blocks of pixels of width “ndown”.

If keyword “axis” is None, bin over all axes. Otherwise, bin over the single specified axis.

Note that ‘ndown’ can also be a sequence: e.g., [2, 1]

EXAMPLE:
img_ds = binarray(img, ndown)
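
For a 2D array, the binning is equivalent to this reshape-and-sum sketch (bin2d is a hypothetical helper, not part of the module; any remainder rows/columns are truncated here):

import numpy as np

def bin2d(img, ndown):
    # Integrate over ndown x ndown blocks.
    ny, nx = img.shape
    img = img[:ny - ny % ndown, :nx - nx % ndown]
    return img.reshape(ny // ndown, ndown, nx // ndown, ndown).sum(axis=(1, 3))
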
analysis.cheby2poly(cin)[source]

Convert Chebyshev coefficients to ‘normal’ polyval coefficients.

INPUT: chebyt coefficients

OUTPUT: poly coefficients (e.g., for use with polyval)

SEE ALSO: poly2cheby(), gpolyval(); scipy.special.chebyt
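
Modern NumPy offers an equivalent conversion. Note that numpy.polynomial orders coefficients from lowest to highest degree, whereas numpy.polyval expects highest-first:

import numpy as np
from numpy.polynomial import chebyshev

cin = [1.0, 0.5, 0.25]              # coefficients of T0, T1, T2
p = chebyshev.cheb2poly(cin)[::-1]  # reversed for np.polyval ordering
y = np.polyval(p, 0.3)
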
analysis.confmap(map, frac, **kw)[source]

Return the confidence level of a 2D histogram or array that encloses the specified fraction of the total sum.

INPUTS:
map : 1D or 2D numpy array

Probability map (from hist2d or kde)

frac : float, 0 <= frac <= 1

desired fraction of enclosed energy of map

OPTIONS:
ordinate : None or 1D array

If 1D map, interpolates onto the desired value. This could cause problems when you aren’t just setting upper/lower limits....

SEE_ALSO:

dumbconf() for 1D distributions
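
The underlying idea can be sketched like this, assuming a non-negative probability map: sort the map values in decreasing order and find the level at which the cumulative sum first encloses the requested fraction (the module may additionally interpolate):

import numpy as np

def conf_level_sketch(pmap, frac):
    vals = np.sort(pmap.ravel())[::-1]     # largest values first
    csum = np.cumsum(vals)
    idx = np.searchsorted(csum, frac * csum[-1])
    return vals[min(idx, vals.size - 1)]   # enclosing level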

analysis.dopspec(starspec, planetspec, starrv, planetrv, disp, starphase=[], planetphase=[], wlscale=True)[source]

Generate combined time series spectra using planet and star models, planet and star RV profiles.

D = dopspec(sspec, pspec, sRV, pRV, disp, sphase=[], pphase=[])

INPUTS:
sspec, pspec: star and planet spectra; must be on a common logarithmic wavelength grid

sRV, pRV: star and planet radial velocities in m/s

disp: constant logarithmic dispersion of the wavelength grid: LAMBDA_i / LAMBDA_(i-1)

OPTIONAL INPUTS:
 
sphase, pphase: normalized phase functions of star and planet.

The inputs sspec and pspec will be scaled by these values for each observation.

wlscale: return relative wavelength scale for new data

NOTE: Input spectra must be linearly spaced in log wavelength and increasing:
that is, they must have [lambda_i / lambda_(i-1)] = disp = constant > 1

Positive velocities are directed AWAY from the observer.

analysis.doubleGaussian(p, x)[source]

Compute the sum of two gaussian distributions at the points x.

p is a six- or seven-component sequence:

y = [p6 +] p0/(p1*sqrt(2pi)) * exp(-(x-p2)**2 / (2*p1**2)) +
p3/(p4*sqrt(2pi)) * exp(-(x-p5)**2 / (2*p4**2))

p[0] – Area of gaussian A

p[1] – one-sigma dispersion of gaussian A

p[2] – central offset (mean location) of gaussian A

p[3] – Area of gaussian B

p[4] – one-sigma dispersion of gaussian B

p[5] – central offset (mean location) of gaussian B

p[6] – optional constant, vertical offset

NOTE: FWHM = 2*sqrt(2*ln(2)) * p1 ~ 2.3548*p1

SEE ALSO: gaussian()
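
A direct transcription of the relation above, as a sketch:

import numpy as np

def double_gaussian_sketch(p, x):
    x = np.asarray(x, dtype=float)
    norm = np.sqrt(2 * np.pi)
    y = (p[0] / (p[1] * norm) * np.exp(-(x - p[2])**2 / (2 * p[1]**2)) +
         p[3] / (p[4] * norm) * np.exp(-(x - p[5])**2 / (2 * p[4]**2)))
    if len(p) > 6:                  # optional vertical offset
        y = y + p[6]
    return y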

analysis.doubleGaussianCen(p, x, mu1, mu2)[source]

Compute the sum of two gaussian distributions at the points x. The distributions have central moments mu1 and mu2.

Useful for fitting to partially blended spectral data.

p is a four- or five-component sequence:

y = [p4 +] p0/(p1*sqrt(2pi)) * exp(-(x-mu1)**2 / (2*p1**2)) +
p2/(p3*sqrt(2pi)) * exp(-(x-mu2)**2 / (2*p3**2))

p[0] – Area of gaussian A
p[1] – one-sigma dispersion of gaussian A
p[2] – Area of gaussian B
p[3] – one-sigma dispersion of gaussian B
p[4] – optional constant, vertical offset

mu1 – central offset (mean location) of gaussian A
mu2 – central offset (mean location) of gaussian B

NOTE: FWHM = 2*sqrt(2*ln(2)) * p1 ~ 2.3548*p1

SEE ALSO: doubleGaussian(), gaussian()

analysis.dumbconf(vec, sig, type='central', mid='mean', verbose=False)[source]

Determine two-sided and one-sided confidence limits, using sorting.

INPUTS:
vec : sequence

1D Vector of data values, for which confidence levels will be computed.

sig : scalar

Confidence level, 0 < sig < 1. If type=’central’, we return the value X for which the range (mid-X, mid+X) encloses a fraction sig of the data values.

OPTIONAL INPUTS:
 

type=’central’ – ‘upper’, ‘lower’, or ‘central’ confidence limits

mid=’mean’ – compute middle with mean or median

SEE_ALSO:

confmap() for 2D distributions

EXAMPLES:
from numpy import random
from analysis import dumbconf
x = random.randn(10000)
dumbconf(x, 0.683)    #  --->   1.0  (one-sigma)
dumbconf(3*x, 0.954)  #  --->   6.0  (two-sigma)
dumbconf(x+2, 0.997, type='lower')   #  --->   -0.74
dumbconf(x+2, 0.997, type='upper')   #  --->    4.7

Some typical values for a Normal (Gaussian) distribution:

type : confidence level
1 sigma : 0.6826895
2 sigma : 0.9544997
3 sigma : 0.9973002
4 sigma : 0.9999366
5 sigma : 0.9999994
analysis.eccentricanomaly(ecc, manom=None, tanom=None, tol=1e-08)[source]

Calculate (Keplerian, orbital) eccentric anomaly.

Exactly one of the optional inputs (manom or tanom) must be given.

INPUT:

ecc – scalar. orbital eccentricity.

OPTIONAL_INPUTS:
 
manom – scalar or sequence. Mean anomaly, equal to

2*pi*(t - t0)/period

tanom – scalar or Numpy array. True anomaly. See

trueanomaly().
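
When manom is given, E is the root of Kepler's equation M = E - e*sin(E). A Newton-Raphson sketch of that case (the module's solver may differ in detail):

import numpy as np

def kepler_E(manom, ecc, tol=1e-8, maxiter=100):
    E = np.array(manom, dtype=float)   # initial guess: E0 = M
    for _ in range(maxiter):
        dE = (E - ecc * np.sin(E) - manom) / (1.0 - ecc * np.cos(E))
        E -= dE
        if np.max(np.abs(dE)) < tol:
            break
    return E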

analysis.egaussian(p, x, y, e=None)[source]

Compute the deviation between the values y and the gaussian defined by p, x:

p is a three- or four-component array, list, or tuple.

Returns: y - p3 - p0/(p1*sqrt(2pi)) * exp(-(x-p2)**2 / (2*p1**2))

If an error array e (typically one-sigma) is entered, the returned value is divided by e.

SEE ALSO: gaussian()

analysis.egaussian2d(p, x, y, z, w=None)[source]

Return the error associated with a 2D gaussian fit, using gaussian2d.

w is an array of weights, typically 1./sigma**2

analysis.error_dropoff(data)[source]

Calculates the dropoff in measurement uncertainty with increasing number of samples (a random and uncorrelated set of data should drop off as 1/sqrt(n)).

E(0), the first returned element, is the uncertainty in a single measurement (i.e., the standard deviation).

EXAMPLE:

Compute the errors on an array of 3 column vectors

from numpy import sqrt
from numpy.random import randn
from pylab import plot, xlim, legend

data = randn(1000,3)
e = error_dropoff(data)
plot(e[1], 1./sqrt(e[1]), '-', e[1], e[0], 'x')
xlim([0,30])
legend(['Theoretical [1/sqrt(N)]', 'Computed errors'])
analysis.fftfilter1d(vec, bandwidth, retfilter=False)[source]

Apply a hard-edged low-pass filter to an input vector.

INPUTS:

vec – sequence – 1D vector, assumed to be evenly sampled

bandwidth – integer – size of the filter passband in cycles

per signal duration. f <= bandwidth is passed through; f > bandwidth is rejected.

retfilter – bool – if True, return the tuple (filtered_vec, filter)

OUTPUT:

Lopass-filtered version of vec

NOTE:

Assumes the input is real-valued.
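
A minimal sketch using the real FFT, with frequencies in cycles per signal duration as described above (the module's edge handling may differ):

import numpy as np

def fftfilter1d_sketch(vec, bandwidth):
    spec = np.fft.rfft(vec)
    freq = np.arange(spec.size)      # cycles per signal duration
    spec[freq > bandwidth] = 0.0     # hard-edged low-pass
    return np.fft.irfft(spec, n=len(vec))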

analysis.fixval(arr, repval, retarr=False)[source]

Fix non-finite values in a numpy array, and replace them with repval.

INPUT:

arr – numpy array, with values to be replaced.

repval – value to replace non-finite elements with

OPTIONAL INPUT:

retarr – if False, changes values in arr directly (more efficient). if True, returns a fixed copy of the input array, which is left unchanged.

Example:
fixval(arr, -1)
analysis.fmin(func, x0, args=(), kw={}, xtol=0.0001, ftol=0.0001, maxiter=None, maxfun=None, full_output=0, disp=1, retall=0, callback=None, zdelt=0.00025, nonzdelt=0.05, holdfixed=None)[source]

Minimize a function using the downhill simplex algorithm – now with KEYWORDS.

Parameters:
func : callable func(x,*args)

The objective function to be minimized.

x0 : ndarray

Initial guess.

args : tuple

Extra arguments passed to func, i.e. f(x,*args).

callback : callable

Called after each iteration, as callback(xk), where xk is the current parameter vector.

Returns:

(xopt, {fopt, iter, funcalls, warnflag})

xopt : ndarray

Parameter that minimizes function.

fopt : float

Value of function at minimum: fopt = func(xopt).

iter : int

Number of iterations performed.

funcalls : int

Number of function calls made.

warnflag : int

1 : Maximum number of function evaluations made.
2 : Maximum number of iterations reached.

allvecs : list

Solution at each iteration.

Other Parameters:

xtol : float
Relative error in xopt acceptable for convergence.
ftol : number
Relative error in func(xopt) acceptable for convergence.
maxiter : int
Maximum number of iterations to perform.
maxfun : number
Maximum number of function evaluations to make [200*len(x0)]
full_output : bool
Set to True if fval and warnflag outputs are desired.
disp : bool
Set to True to print convergence messages.
retall : bool
Set to True to return list of solutions at each iteration.
zdelt : number
Initial step size used for components of x0 that are zero.
nonzdelt : number
Initial fractional step size used for nonzero components of x0.
holdfixed : sequence
Indices of x0 to hold fixed (e.g., [1, 2, 4])
TBD: gprior : tuple or sequence of tuples
Set a Gaussian prior on the indicated parameter, such that chisq += ((x0[p] - val)/unc_val)**2, where the parameters are defined by the tuple gprior=(param, val, unc_val)

Notes: Uses a Nelder-Mead simplex algorithm to find the minimum of a function of one or more variables.
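
A hypothetical usage sketch of the added holdfixed keyword (the objective function below is illustrative, not part of the module):

import numpy as np
import analysis as an

# Quadratic bowl with minimum at (1, 2, 3):
chisq = lambda x: ((x - np.array([1., 2., 3.]))**2).sum()

# Minimize while holding the second parameter fixed at its initial value:
xopt = an.fmin(chisq, [0., 2., 0.], holdfixed=[1], disp=0)
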
analysis.fmin_helper2(all_args)[source]

Allows me to wrap fmin() within pool.map() for multithreading.

EXAMPLE:

from multiprocessing import Pool
import analysis as an
import phasecurves as pc

pool = Pool(processes=nthreads)
fits = pool.map(an.fmin_helper2,
                [[pc.errfunc, pos0[jj], mcargs, test_kws]
                 for jj in xrange(0, nwalkers, int(nwalkers/12.))])

# The above line is equivalent to, but roughly ~(nthreads)
# times faster than, the standard way to do it:
fits2 = [an.fmin(pc.errfunc, pos0[jj], mcargs, **test_kw)
         for jj in xrange(0, nwalkers, int(nwalkers/10.))]

NOTES:

This must be a separate, stand-alone function in order to be ‘pickleable’, which is required by pool.map().

analysis.fmin_powell(func, x0, args=(), kw={}, xtol=0.0001, ftol=0.0001, maxiter=None, maxfun=None, full_output=0, disp=1, retall=0, callback=None, direc=None, holdfixed=None)[source]

Minimize a function using modified Powell’s method – now with KEYWORDS.

Parameters:
func : callable f(x,*args)

Objective function to be minimized.

x0 : ndarray

Initial guess.

args : tuple

Extra arguments passed to func.

kw : dict

Keyword arguments passed to func.

callback : callable

An optional user-supplied function, called after each iteration. Called as callback(xk), where xk is the current parameter vector.

direc : ndarray

Initial direction set.

Returns:

(xopt, {fopt, xi, direc, iter, funcalls, warnflag}, {allvecs})

xopt : ndarray

Parameter which minimizes func.

fopt : number

Value of function at minimum: fopt = func(xopt).

direc : ndarray

Current direction set.

iter : int

Number of iterations.

funcalls : int

Number of function calls made.

warnflag : int
Integer warning flag:

1 : Maximum number of function evaluations.
2 : Maximum number of iterations.

allvecs : list

List of solutions at each iteration.

Other Parameters:

xtol : float
Line-search error tolerance.
ftol : float
Relative error in func(xopt) acceptable for convergence.
maxiter : int
Maximum number of iterations to perform.
maxfun : int
Maximum number of function evaluations to make.
full_output : bool
If True, fopt, xi, direc, iter, funcalls, and warnflag are returned.
disp : bool
If True, print convergence messages.
retall : bool
If True, return a list of the solution at each iteration.
Notes: Uses a modification of Powell’s method to find the minimum of a function of N variables.
analysis.gaussian(p, x)[source]

Compute a gaussian distribution at the points x.

p is a three- or four-component array, list, or tuple:

y = [p3 +] p0/(p1*sqrt(2pi)) * exp(-(x-p2)**2 / (2*p1**2))

p[0] – Area of the gaussian
p[1] – one-sigma dispersion
p[2] – central offset (mean location)
p[3] – optional constant, vertical offset

NOTE: FWHM = 2*sqrt(2*ln(2)) * p1 ~ 2.3548*p1

SEE ALSO: egaussian()

analysis.gaussian2d(p, x, y)[source]

Compute a 2D gaussian distribution at the points x, y.

INPUTS:
p : sequence

a four- or five-component array, list, or tuple:

z = [p4 +] p0/(2*pi*p1**2) * exp(-((x-p2)**2 + (y-p3)**2) / (2*p1**2))

p[0] – Area of the gaussian
p[1] – one-sigma dispersion
p[2] – x-central offset
p[3] – y-central offset
p[4] – optional constant, vertical offset

x : NumPy array

X-coordinate values at which the above relation will be computed.

y : NumPy array

Y-coordinate values at which the above relation will be computed.

SEE_ALSO:

gaussian() (1D)

analysis.gaussian2d_ellip(p, x, y)[source]

Compute a 2D elliptical gaussian distribution at the points x, y.

p is a 5-, 6-, or 7-component sequence, defined as:

p[0] – Amplitude (Area of the function)

p[1] – x-dispersion

p[2] – y-dispersion

p[3] – x-central offset

p[4] – y-central offset

p[5] – optional rotation angle (radians)

p[6] – optional constant, vertical offset

X, Y are gridded data from numpy.meshgrid()

First define:

x’ = (x - p[3]) cos p[5] - (y - p[4]) sin p[5]

y’ = (x - p[3]) sin p[5] + (y - p[4]) cos p[5]

Then calculate:

U = (x’ / p[1])**2 + (y’ / p[2])**2

z = [p[6] +] p[0]/(2*pi*p[1]*p[2]) * exp(-U / 2)

SEE ALSO: gaussian2d(), lorentzian2d()

analysis.gaussiannd(mu, cov, x)[source]

Compute an N-dimensional gaussian distribution at the position x.

mu is the length-N 1D vector of mean positions

cov is the NxN covariance matrix of the multinormal distribution.

x are the positions at which to compute. If x is 2D (size M x
N), it is taken as M sets of N-dimensional positions.

SEE ALSO: gaussian(), gaussian2d()

analysis.generic_mcmc(*arg, **kw)[source]

Run a Markov Chain Monte Carlo (Metropolis-Hastings algorithm) on an arbitrary function.

INPUTS:
EITHER:
func : function to generate model.

First argument must be “params;” subsequent arguments are passed in via the “args” keyword

params : 1D sequence

parameters to be fit

stepsize : 1D or 2D array

If 1D: 1-sigma change in parameter per iteration
If 2D: covariance matrix for parameter changes.

z : 1D array

Contains dependent data (to be modeled)

sigma : 1D array

Contains standard deviation (errors) of “z” data

numit : int

Number of iterations to perform

OR:

(allparams, (arg1, arg2, ...), numit)

where allparams is a concatenated list of parameters for each of several functions, and the arg_i are tuples of (func_i, stepsize_i, z_i, sigma_i). In this case the keyword ‘args’ must also be a tuple of sequences, one for each function to be MCMC’ed.

OPTIONAL_INPUTS:
 
args : 1D sequence

Second, third, etc.... arguments to “func”

nstep : int

Saves every “nth” step of the chain

posdef : None, ‘all’, or sequences of indices.

Which elements should be restricted to positive definite? If indices, it should be of the form (e.g.): [0, 1, 4]

holdfixed : None, or sequences of indices.

Which elements should be held fixed in the analysis? If indices, it should be of the form (e.g.): [0, 1, 4]

jointpars : None, or sequence of 2-tuples.

Only for simultaneous multi-function fitting. For each pair of values passed, we set the parameter values so: allparams[pair[1]] = allparams[pair[0]]

OUTPUTS:
allparams : 2D array

Contains all parameters at each step

bestp : 1D array

Contains best parameters as determined by lowest Chi^2

numaccept: int

Number of accepted steps

chisq: 1D array

Chi-squared value at each step

REFERENCES:

Numerical Recipes, 3rd Edition (Section 15.8); Wikipedia

NOTES:

If you need an efficient MCMC algorithm, you should be using http://danfm.ca/emcee/

analysis.getblocks(vec)[source]

Return start and end indices for consecutive sequences of integer-valued indices.

Example:
import analysis as an
vec = range(5) + range(10,14) + range(22,39)
starts,ends = an.getblocks(vec)
print zip(starts,ends)
analysis.getobj(*args, **kw)[source]

Get data for a specified planet.

INPUTS:

(str) – planet name, e.g. “HD 189733 b”

OPTIONAL INPUTS:
 
datafile : str

datafile name

verbose : bool

verbosity flag

EXAMPLE:
p = getobj('55cnce')
p.period   ---> 2.81705

The attributes of the returned object are many and varied, and can be listed using the ‘dir’ command on the returned object.

This looks up data from the local datafile, which could be out of date.

SEE ALSO: rv()

analysis.getorbitalphase(planet, hjd, **kw)[source]

Get phase of an orbiting planet.

INPUT:
planet – a planet object from getobj(); e.g., getobj(‘55cnce’)

hjd – Heliocentric Julian Date(s) at which to compute the phase
OUTPUT:
orbital phase – from 0 to 1.
NOTES:

If planet.transit==True, phase is based on the transit time ephemeris. If planet.transit==False, phase is based on the RV ephemeris as computed by the function rveph().

SEE ALSO: getobj(), mjd(), rveph()

analysis.gfit(func, x0, fprime, args=(), kwargs={}, maxiter=2000, ftol=0.001, factor=1.0, disp=False, bounds=None)[source]

Perform gradient-based minimization of a user-specified function.

INPUTS:
func : function

Function that takes as input the parameters x0, optional additional arguments args, and optional keywords kwargs, and returns the metric to be minimized as a scalar. For chi-squared minimization, a generalized option is phasecurves.errfunc().

x0 : sequence

List or 1D NumPy array of initial-guess parameters, to be adjusted to minimize func(x0, *args, **kwargs).

fprime : function

Function that takes as input the parameters x0, optional additional arguments args, and optional keywords kwargs, and returns the partial derivatives of the metric to be minimized with regard to each element of x0.

args : list

Optional arguments to func and fprime (see above)

kwargs : dict

Optional keywords to func and fprime (see above)

maxiter : int

Maximum number of iterations to run.

ftol : scalar

Desired tolerance on the metric to be minimized. Iteration will continue until either iter>maxiter OR (metric_i - metric_(i+1)) < ftol.

factor : scalar

Factor to scale gradient before applying each new iteration. Small values will lead to slower convergences; large values will lead to wild behavior. The code attempts to (crudely) tune the value of ‘factor’ depending on how the minimization process progresses.

disp : bool

If True, print some text to screen on each iteration.

bounds : None, or list

(min, max) pairs for each element in x0, defining the bounds on that parameter. Use None or +/-inf for one of min or max when there is no bound in that direction.

RETURNS:

(params, metric, n_iter)

NOTES:

The program attempts to be slightly clever: if the metric decreases by <ftol on one iteration, the code iterates one more time. If the termination criterion is once again met then minimization ends; if not, minimization continues as before.

For quicker, smarter routines that do much the same thing, you may want to check out the functions in the scipy.optimize package.

analysis.gpolyval(c, x, mode='poly', retp=False)[source]

Generic polynomial evaluator.

INPUT:
c (1D array) – coefficients of polynomial to evaluate,
from highest-order to lowest.

x (1D array) – pixel values at which to evaluate C

OPTIONAL INPUTS:
MODE=’poly’ – ‘poly’ uses standard monomial coefficients
as accepted by, e.g., polyval. Other modes – ‘cheby’ (1st kind) and ‘legendre’ (P_n) – convert input ‘x’ to a normalized [-1,+1] domain

RETP=False – Return coefficients as well as evaluated poly.

OUTPUT:
y – array of the same shape as x; the evaluated polynomial. Returns (y, p) if retp=True.

SEE ALSO: poly2cheby()

analysis.im2(data1, data2, xlab='', ylab='', tit='', bar=False, newfig=True, cl=None, x=[], y=[], fontsize=16)[source]

Show two images; title will only be drawn for the top one.

analysis.loadatran(filename, wl=True, verbose=False)[source]

Load ATRAN atmospheric transmission data file.

t = loadatran(filename, wl=True)

INPUT:
filename – filename of the ATRAN file. This should be an
ASCII file in which the second column is wavelength and the third is the atmospheric transmission.

(This can also be a list of filenames!)

OPTIONAL INPUT:
wl – if True (DEFAULT) also return the wavelength scale.

This can take up to twice as long for large files.

RETURNS:

if wl==True: returns a 2D array, with columns [lambda, transmission]
if wl==False: returns a 1D Numpy array of the transmission

NOTE: A list of these return values is created if
‘filename’ is actually an input list.
analysis.loaddata(filelist, path='', band=1)[source]

Load a set of reduced spectra into a single data file.

datalist = ['file1.fits', 'file2.fits']
datapath = '~/data/'

data = loaddata(datalist, path=datapath, band=1)

The input can also be the name of an IRAF-style file list.

analysis.lorentzian2d(p, x, y)[source]

Compute a 2D Lorentzian distribution at the points x, y.

p is a 5-, 6-, or 7-component sequence:

z = (x-p[3])**2 / p[1]**2 + (y-p[4])**2 / p[2]**2 [+ (x-p[3])*(y-p[4])*p[5]]
lorentz = p[0] / (1.0 + z) [+ p[6]]

p[0] – Amplitude (Area of the function)
p[1] – x-dispersion
p[2] – y-dispersion
p[3] – x-central offset
p[4] – y-central offset
p[5] – optional ellipticity parameter
p[6] – optional constant, vertical offset

SEE ALSO: gaussian2d()
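
A direct transcription of the relations above, as a sketch:

import numpy as np

def lorentzian2d_sketch(p, x, y):
    z = (x - p[3])**2 / p[1]**2 + (y - p[4])**2 / p[2]**2
    if len(p) > 5:                   # optional ellipticity term
        z = z + (x - p[3]) * (y - p[4]) * p[5]
    out = p[0] / (1.0 + z)
    if len(p) > 6:                   # optional vertical offset
        out = out + p[6]
    return out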

analysis.lsq(x, z, w=None, retcov=False, checkvals=True)[source]

Do weighted least-squares fitting.

INPUTS:
x : sequence

tuple of 1D vectors of equal lengths N, or the transposed numpy.vstack of this tuple

z : sequence

vector of length N; data to fit to.

w : sequence

Either an N-vector or NxN array of weights (e.g., 1./sigma_z**2)

retcov : bool.

If True, also return covariance matrix.

checkvals : bool

If True, check that all values are finite values. This is safer, but the array-based indexing slows down the function.

RETURNS:

the tuple of (coef, coeferrs, {cov_matrix})

analysis.lsqsp(x, z, w=None, retcov=False)[source]

Do weighted least-squares fitting on sparse matrices.

INPUTS:
x : sparse matrix, shape N x M

data used in the least-squares fitting, a set of N rows of M elements each.

z : sequence (shape N) or sparse matrix (shape N x 1)

data to fit to.

w : sequence (shape N) or sparse matrix (shape N x N)

Data weights and/or inverse covariances (e.g., 1./sigma_z**2)

retcov : bool. If True, also return covariance matrix.

RETURNS:

the tuple of (coef, coeferrs, {cov_matrix})

SEE_ALSO:

lsq()

REQUIREMENTS:

SciPy’s sparse module.

analysis.meanr(x, nsigma=3, niter=inf, finite=True, verbose=False, axis=None)[source]

Return the mean of an array after removing outliers.

INPUTS:

x – (array) data set to find mean of

OPTIONAL INPUT:

nsigma – (float) number of standard deviations for clipping
niter – number of iterations.
finite – if True, remove all non-finite elements (e.g. Inf, NaN)
axis – (int) axis along which to compute the mean.

EXAMPLE:
from numpy import *
from analysis import meanr
x = concatenate((randn(200),[1000]))
print mean(x), meanr(x, nsigma=3)
x = concatenate((x,[nan,inf]))
print mean(x), meanr(x, nsigma=3)
SEE ALSO: medianr(), stdr(), removeoutliers(),
numpy.isfinite()
analysis.medianfilter(data, filtersize, threshold=None, verbose=False)[source]

For now, we assume FILTERSIZE is odd, and that DATA is square!

filt = medianfilter(data, filtersize)

Note that filtersize can be a scalar (e.g., 5) to equally median-filter along both axes, or a 2-vector (e.g., [5, 1]) to apply a rectangular median-filter.

This is about the slowest way to do it, but it was easy to write.

analysis.medianr(x, nsigma=3, niter=inf, finite=True, verbose=False, axis=None)[source]

Return the median of an array after removing outliers.

INPUTS:

x – (array) data set to find median of

OPTIONAL INPUT:

nsigma – (float) number of standard deviations for clipping
niter – number of iterations.
finite – if True, remove all non-finite elements (e.g. Inf, NaN)
axis – (int) axis along which to compute the median.

EXAMPLE:
from numpy import *
from analysis import medianr
x = concatenate((randn(200),[1000]))
print median(x), medianr(x, nsigma=3)
x = concatenate((x,[nan,inf]))
print median(x), medianr(x, nsigma=3)
SEE ALSO: meanr(), stdr(), removeoutliers(),
numpy.isfinite()
analysis.mjd(date)[source]

Convert Julian Date to Modified Julian Date, or vice versa.

If date >= 2400000.5 (a Julian Date), subtract 2400000.5 to obtain the MJD; if date < 2400000.5 (an MJD), add 2400000.5 to obtain the JD.

analysis.nGaussianCen(p, x, mu)[source]

Compute the sum of N gaussian distributions at the points x. The distributions have central moments defined by the vector mu.

Useful for fitting to partially blended spectral data when you have good measurements of positions (i.e., from 2D tracing).

p is a sequence of length (2N+1). If N=2:

y = [p4 +] p0/(p1*sqrt(2pi)) * exp(-(x-mu1)**2 / (2*p1**2)) +
p2/(p3*sqrt(2pi)) * exp(-(x-mu2)**2 / (2*p3**2))

p[0] – Area of gaussian 1
p[1] – one-sigma dispersion of gaussian 1
p[2] – Area of gaussian 2
p[3] – one-sigma dispersion of gaussian 2

... etc.
p[-1] – optional constant, vertical offset
and

mu1 – central offset (mean location) of gaussian 1
mu2 – central offset (mean location) of gaussian 2

NOTE: FWHM = 2*sqrt(2*ln(2)) * p1 ~ 2.3548*p1

SEE ALSO: doubleGaussian(), gaussian()

analysis.neworder(N)[source]

Generate a random ordering of the integers [0, N-1] inclusive.

analysis.pad(inp, npix_rows, npix_cols=None)[source]

Pads input matrix to size specified.

out = pad(inp, npix)
out = pad(inp, npix_rows, npix_cols)   # alternate usage

Written by J. Green @ JPL; converted to Python by I. Crossfield

analysis.pb_helperfunction(inputs)[source]

Helper function for prayerbead(). Not for general use.

class analysis.planet(*args)[source]

Very handy planet object.

Best initialized using getobj().

REQUIREMENTS: Database file exoplanets.csv from http://exoplanets.org/
get_t14(*args)[source]

Compute total transit duration (in days) for a transiting planet.

Returns:
nan if required fields are missing.

Using Eq. 14 of J. Winn’s chapter in S. Seager’s book “Exoplanets.”

SEE ALSO: get_t23()
get_t23(*args)[source]

Compute full transit duration (in days) for a transiting planet.

Returns:
nan if required fields are missing.

Using Eq. 15 of J. Winn’s chapter in S. Seager’s book “Exoplanets.”

SEE ALSO: get_t14()
get_teq(ab, f, reterr=False)[source]

Compute equilibrium temperature.

INPUTS:
ab : scalar, 0 <= ab <= 1

Bond albedo.

f : scalar, 0.25 <= f <= 2/3.

Recirculation efficiency. A value of 0.25 indicates full redistribution of incident heat, while 2/3 indicates zero redistribution.

EXAMPLE:
import analysis
planet = analysis.getobj('HD 189733 b')
planet.get_teq(0.0, 0.25) # zero albedo, full recirculation
REFERENCE:
  1. Seager, “Exoplanets,” 2010. Eq. 3.9
phase(hjd)[source]

Get phase of an orbiting planet.

Refer to the function analysis.getorbitalphase() for full documentation.

SEE ALSO: analysis.getorbitalphase(), analysis.mjd()

rv(**kw)[source]

Compute radial velocity on a planet object for given Julian Date.

EXAMPLE:
import analysis
p = analysis.getobj('HD 189733 b')
jd = 2400000.
print p.rv(jd)

Refer to the function analysis.rv() for full documentation.

SEE ALSO: analysis.rv(), analysis.mjd()

rveph(jd)[source]

Compute the most recently elapsed RV ephemeris of a given planet at a given JD. The RV ephemeris is defined as the time at which the radial velocity equals zero.

Refer to analysis.rv() for full documentation.

SEE ALSO: analysis.getobj(), analysis.phase()

writetext(filename, **kw)[source]

See analysis.planettext()

analysis.planettext(planets, filename, delimiter=', ', append=True)[source]

Write planet object info into a delimited line of text.

INPUTS:

planets : planet object or list thereof

filename : str

delimiter : str

analysis.poly2cheby(cin)[source]

Convert straight monomial coefficients to chebychev coefficients.

INPUT: poly coefficients (e.g., for use w/polyval)

OUTPUT: chebyt coefficients

SEE ALSO: gpolyval(); scipy.special.chebyt

analysis.polyfitr(x, y, N, s, fev=100, w=None, diag=False, clip='both', verbose=False, plotfit=False, plotall=False, eps=1e-13, catchLinAlgError=False)[source]

Matplotlib’s polyfit with weights and sigma-clipping rejection.

DESCRIPTION:

Do a best fit polynomial of order N of y to x. Points whose fit residuals exceed s standard deviations are rejected and the fit is recalculated. Return value is a vector of polynomial coefficients [pk ... p1 p0].

OPTIONS:
w: a set of weights for the data; uses CARSMath’s weighted polynomial fitting routine instead of numpy’s standard polyfit.

fev: number of function evaluations to call before stopping

‘diag’nostic flag: Return the tuple (p, chisq, n_iter)

clip: ‘both’ – remove outliers +/- ‘s’ sigma from fit
‘above’ – remove outliers ‘s’ sigma above fit
‘below’ – remove outliers ‘s’ sigma below fit

catchLinAlgError : bool

If True, don’t bomb on LinAlgError; instead, return [0, 0, ... 0].

REQUIREMENTS:

CARSMath

NOTES:

Iterates so long as n_newrejections>0 AND n_iter<fev.

analysis.prayerbead(*arg, **kw)[source]

Generic function to perform Prayer-Bead (residual permutation) analysis.

INPUTS:

(fitparams, modelfunction, arg1, arg2, ... , data, weights)

OR:

(allparams, (args1, args2, ..), npars=(npar1, npar2, ...))

where allparams is an array concatenation of each functions input parameters.

OPTIONAL INPUTS:
 
jointpars – list of 2-tuples.

For use with multi-function calling (w/npars keyword). Setting jointpars=[(0,10), (0,20)] will always set params[10]=params[0] and params[20]=params[0].

parinfo – None, or list of dicts

‘parinfo’ to pass to the kapteyn.py kpmfit routine.

gaussprior – list of 2-tuples, same length as “allparams.”

The i^th tuple (x_i, s_i) imposes a Gaussian prior on the i^th parameter p_i by adding ((p_i - x_i)/s_i)^2 to the total chi-squared.

axis – int or None

If input is 2D, which axis to permute over. (NOT YET IMPLEMENTED!)

step – int > 0

Stepsize for permutation steps. 1 by default. (NOT YET IMPLEMENTED!)

verbose – bool

Print various status lines to console.

maxiter – int

Maximum number of iterations for _each_ fitting step.

maxfun – int

Maximum number of function evaluations for _each_ fitting step.

threads – int

Number of threads to use (via multiprocessing.Pool)

EXAMPLE:
TBW
REQUIREMENTS:

kapteyn, Planetary phase curve routines, numpy

analysis.removeoutliers(data, nsigma, remove='both', center='mean', niter=inf, retind=False, verbose=False)[source]

Strip outliers from a dataset, iterating until converged.

INPUT:

data – 1D numpy array. Data from which to remove outliers.

nsigma – positive number. Limit defining outliers: number of standard deviations from center of data.

OPTIONAL INPUTS:

remove – (‘min’|’max’|’both’) respectively removes outliers below, above, or on both sides of the limits set by nsigma.

center – (‘mean’|’median’|value) set central value, or method to compute it.

niter – number of iterations before exit; defaults to Inf, which can occasionally result in empty arrays returned.

retind – (bool) whether to return index of good values as second part of a 2-tuple.

EXAMPLE:
from numpy import linspace
from numpy.random import randn
from pylab import hist
from analysis import removeoutliers
data = randn(1000)
hbins = linspace(-5,5,50)
d2 = removeoutliers(data, 1.5, niter=1)
hist(data, hbins)
hist(d2, hbins)
analysis.returnSections(time, dtmax=0.1)[source]

Return 2-tuples that are the indices of separate sections, as indicated by breaks in a continuous and always-increasing time series.

INPUTS:
time : 1D NumPy array

The time index of interest. Should always be increasing, such that numpy.diff(time) is always positive.

dtmax : float

Any break in ‘time’ equal to or larger than this indicates a new segment.

EXAMPLE:
import numpy as np
import analysis as an

# Simulate a time series with 30-minute sampling:
t1 = np.arange(0, 3.7, 0.5/24)
t2 = np.arange(5, 70, 0.5/24)
t3 = np.arange(70.2, 85, 0.5/24)
days = np.concatenate((t1, t2, t3))
ret = an.returnSections(days, dtmax=0.1)

# If each segment was correctly identified, these print 'True':
print (t1==days[ret[0][0]:ret[0][1]+1]).all()
print (t2==days[ret[1][0]:ret[1][1]+1]).all()
print (t3==days[ret[2][0]:ret[2][1]+1]).all()
analysis.rv(p, jd=None, e=None, reteanom=False, tol=1e-08)[source]

Compute unprojected astrocentric RV of a planet for a given JD in m/s.

INPUTS:
p : planet object, or 5- or 6-sequence

planet object: see getobj()

OR:

sequence: [period, t_peri, ecc, a, long_peri, gamma]

(if gamma is omitted, it is set to zero) (long_peri should be in radians!)

jd : NumPy array

Dates of observation (in same time system as t_peri).

e : NumPy array

Eccentric Anomaly of observations (can be precomputed to save time)

EXAMPLE:
jd = 2454693            # date: 2008/08/14
p  = getobj('55 Cnc e')   # planet: 55 Cnc e
vp = rv(p, jd)

returns vp ~ 1.47e5 [m/s]

The result will need to be multiplied by the sine of the inclination angle (i.e., “sin i”). Positive radial velocities are directed _AWAY_ from the observer.

To compute the barycentric radial velocity of the host star, scale the returned values by the mass ratio -m/(m+M).

SEE ALSO: getobj(), rvstar()

analysis.rveph(p, jd)[source]

Compute the most recently elapsed RV ephemeris of a given planet at a given JD. The RV ephemeris is defined as the time at which the radial velocity equals zero.

EXAMPLE:
from analysis import getobj, rveph
jd = 2454693            # date: 2008/08/14
p  = getobj('55cnce')   # planet: 55 Cnc e
t = rveph(p, jd)

returns t ~

SEE ALSO: getobj(), phase()

analysis.rvstar(p, jd=None, e=None, reteanom=False, tol=1e-08)[source]

Compute radial velocity of a star which has an orbiting planet.

INPUTS:
p : planet object, or 5- or 6-sequence

planet object: see getobj()

OR:

sequence: [period, t_peri, ecc, K, long_peri, gamma]

(if gamma is omitted, it is set to zero)

jd : NumPy array

Dates of observation (in same time system as t_peri).

e : NumPy array

Eccentric Anomaly of observations (can be precomputed to save time)

EXAMPLE:
jd = 2454693            # date: 2008/08/14
p  = getobj('55 Cnc e')   # planet: 55 Cnc e
vp = rvstar(p, jd)

Positive radial velocities are directed _AWAY_ from the observer.

SEE_ALSO:

rv(), getobj()

analysis.scale_mcmc_stepsize(accept, func, params, stepsize, z, sigma, numit=1000, scales=[0.1, 0.3, 1.0, 3.0, 10.0], args=(), nstep=1, posdef=None, holdfixed=None, retall=False, jointpars=None)[source]

Run generic_mcmc() and scale the input stepsize to match the desired input acceptance rate.

INPUTS:

mostly the same as for generic_mcmc(), but also with:

accept : float between 0 and 1

desired acceptance rate; typically between 0.15 and 0.5.

scales : sequence of floats

test scaling parameters; measure for these, then interpolate to get ‘accept’

retall : bool

if True, return the tuple (scalefactor, acceptances, scales). Otherwise return only the scalar ‘scalefactor’.

REQUIREMENTS:

pylab (for pylab.interp())

analysis.snr(data, axis=None, nsigma=None)[source]
Compute the quantity:
data.mean(axis=axis)/data.std(axis=axis)

for the specified data array/vector along the specified axis.

‘nsigma’ is used to reject outliers.

Output will be a scalar (axis is None) or numpy array, as appropriate.

analysis.spliner(x, y, k=3, sig=5, s=None, fev=100, w=None, clip='both', verbose=False, plotfit=False, plotall=False, diag=False)[source]

Matplotlib’s polyfit with sigma-clipping rejection.

Do a scipy.interpolate.UnivariateSpline of order k of y to x.

Points whose fit residuals exceed s standard deviations are rejected and the fit is recalculated. The returned value is a spline object.

Iterates so long as n_newrejections>0 AND n_iter<fev.

OPTIONAL INPUTS:
 

err: a set of errors for the data

fev: number of function evaluations to call before stopping

‘diag’nostic flag: Return the tuple (p, chisq, n_iter)

clip: ‘both’ – remove outliers +/- ‘s’ sigma from fit
‘above’ – remove outliers ‘s’ sigma above fit
‘below’ – remove outliers ‘s’ sigma below fit

analysis.stdfilt(vec, wid=3)[source]

Compute the standard deviation in a sliding window.

INPUTS:
vec : 1D sequence

data to filter

wid : int, odd

width of filter; ideally odd (not even).
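
A simple sliding-window sketch; edges are handled here by truncating the window, which may differ from the module's behavior:

import numpy as np

def stdfilt_sketch(vec, wid=3):
    vec = np.asarray(vec, dtype=float)
    half = wid // 2
    out = np.empty(vec.size)
    for i in range(vec.size):
        lo, hi = max(0, i - half), min(vec.size, i + half + 1)
        out[i] = vec[lo:hi].std()
    return out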

analysis.stdfilt2d(data, filtersize, threshold=None, verbose=False)[source]

For now, we assume FILTERSIZE is odd, and that DATA is square!

filt = stdfilt2d(data, filtersize)

This is about the slowest way to do it, but it was easy to write.

analysis.stdr(x, nsigma=3, niter=inf, finite=True, verbose=False, axis=None)[source]

Return the standard deviation of an array after removing outliers.

INPUTS:

x – (array) data set to find std of

OPTIONAL INPUT:

nsigma – (float) number of standard deviations for clipping
niter – number of iterations.
finite – if True, remove all non-finite elements (e.g. Inf, NaN)
axis – (int) axis along which to compute the standard deviation.

EXAMPLE:
from numpy import *
from analysis import stdr
x = concatenate((randn(200),[1000]))
print std(x), stdr(x, nsigma=3)
x = concatenate((x,[nan,inf]))
print std(x), stdr(x, nsigma=3)
SEE ALSO: meanr(), medianr(), removeoutliers(),
numpy.isfinite()
analysis.stdres(data, bins=None, oversamp=None, dataindex=None)[source]

Compute the standard deviation in the residuals of a data series after average-binning by specified amounts.

INPUTS:
data - 1D numpy array

Data to analyze.

bins - sequence

Factors by which to bin down. If None, use 1:sqrt(data.size) and set dataindex=None.

oversamp - int

Number of times to shift, resample, and bin the data. Large values take longer, but give a less “noisy” estimate (which can be a problem at large bin sizes)

dataindex - 1D numpy array

Values across which data are indexed (i.e., ‘time’). If not None, bins apply to dataindex rather than to data and should be increasing intervals of dataindex (rather than the number of points to bin down by).

REQUIREMENTS:

Plotting and analysis tools, numpy

EXAMPLE:
import numpy as np
import analysis as an
import pylab as py

npts = 10000
t = np.arange(npts)
data = np.random.normal(size=npts)
binfactors = np.arange(1, npts/2.+1)
bindown_result = an.stdres(data, binfactors, dataindex=t, oversamp=1)

py.figure()
py.subplot(211)
py.plot(t, data, 'k')
py.xlabel('Time')
py.ylabel('Data value')
py.minorticks_on()
py.title('Bin-down test: Gaussian Noise')

py.subplot(212)
py.loglog(binfactors, bindown_result, '-b', linewidth=2)
py.loglog(binfactors, data.std()/np.sqrt(binfactors), '--r')
py.xlabel('Binning factor')
py.ylabel('RMS of binned data')
py.legend(['Binned RMS', '1/sqrt(N)'])
analysis.test_eccentric_anomaly(ecc, manom, tol=1e-08)[source]

Test various methods of computing the eccentric anomaly.

ecc = scalar; manom = 1D NumPy array

Just run, e.g.:

from numpy import pi, linspace
import analysis as an

ecc = 0.15
p_orb = 3.3
mean_anom = 2*pi*linspace(0, p_orb, 10000)/p_orb
an.test_eccentric_anomaly(ecc, mean_anom, tol=1e-10)
analysis.total_least_squares(data1, data2, data1err=None, data2err=None, print_results=False, ignore_nans=True, intercept=True, return_error=False, inf=10000000000.0)[source]

Use Singular Value Decomposition to determine the Total Least Squares linear fit to the data (e.g., http://en.wikipedia.org/wiki/Total_least_squares).

data1 – x array
data2 – y array

if intercept:
returns m,b in the equation y = m x + b
else:
returns m

print_results tells you some information about what fraction of the variance is accounted for.

ignore_nans will remove NaN values from BOTH arrays before computing.

data1, data2 : np.ndarray
Vectors of the same length indicating the ‘x’ and ‘y’ vectors to fit

data1err, data2err : np.ndarray or None
Vectors of the same length as data1, data2 holding the 1-sigma error values

From https://code.google.com/p/agpy/
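
For the equal-error, intercept=True case, the core of the method reduces to orthogonal regression on the centered data via SVD; a minimal sketch under those assumptions (agpy's routine additionally handles weights, NaNs, and error estimates):

import numpy as np

def tls_line_sketch(x, y):
    x0, y0 = x.mean(), y.mean()
    # First right-singular vector = direction of maximum variance:
    _, _, Vt = np.linalg.svd(np.column_stack((x - x0, y - y0)))
    dx, dy = Vt[0]
    m = dy / dx                  # orthogonal-regression slope
    return m, y0 - m * x0        # slope m and intercept b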

analysis.travisplanet(p)[source]

Generate a line of text for Travis Barman’s planet table.

INPUT: a planet object from getobj().

analysis.trueanomaly(ecc, eanom=None, manom=None)[source]

Calculate (Keplerian, orbital) true anomaly.

Exactly one of the optional inputs (eanom or manom) must be given.

INPUT:

ecc – scalar. orbital eccentricity.

OPTIONAL_INPUTS:
 
eanom – scalar or Numpy array. Eccentric anomaly. See

eccentricanomaly()

manom – scalar or sequence. Mean anomaly, equal to

2*pi*(t - t0)/period
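
For reference, the standard conversion from eccentric anomaly E to true anomaly is tan(nu/2) = sqrt((1+ecc)/(1-ecc)) * tan(E/2); a sketch of that relation (the module's implementation may differ in detail):

import numpy as np

def true_anomaly_from_E(eanom, ecc):
    return 2.0 * np.arctan(np.sqrt((1.0 + ecc) / (1.0 - ecc))
                           * np.tan(eanom / 2.0))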

analysis.wmean(a, w, axis=None)[source]

Perform a weighted mean along the specified axis.

INPUTS:
a : sequence or Numpy array

data for which weighted mean is computed

w : sequence or Numpy array

weights of data – e.g., 1./sigma^2

reterr : bool

If True, return the tuple (mean, err_on_mean), where err_on_mean is the unbiased estimator of the sample standard deviation.

SEE ALSO:

wstd()

analysis.wmeanfilt(vec, wid=3, w=None)[source]

Compute the (weighted) mean in a sliding window.

INPUTS:
vec : 1D sequence

data to filter

wid : int, odd

width of filter; ideally odd (not even).

analysis.wstd(a, w, axis=None)[source]

Perform a weighted standard deviation along the specified axis. If axis=None, then the weighted standard deviation of the entire array is computed.

Note that this computes the _sample_ standard deviation; Numpy/Scipy compute the _population_ standard deviation, and the sample value is greater by a factor of sqrt(N/(N-1)). This effect is small for large datasets.

SEE ALSO: wmean()

Taken from http://en.wikipedia.org/wiki/Weighted_standard_deviation
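
A sketch of both weighted statistics for the axis=None case, using the reliability-weights (unbiased) form from that reference; the module's implementation may differ in detail:

import numpy as np

def wmean_sketch(a, w):
    return np.sum(a * w) / np.sum(w)

def wstd_sketch(a, w):
    m = wmean_sketch(a, w)
    V1, V2 = np.sum(w), np.sum(w ** 2)
    return np.sqrt(np.sum(w * (a - m) ** 2) / (V1 - V2 / V1))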

analysis.xcorr2_qwik(img0, img1)[source]

Perform quick 2D cross-correlation between two images.

Images must be the same size.

Computed via squashing the images along each dimension and computing 1D cross-correlations.
