Data

A data object contains your entire n-dimensional dataset, including axes, units, channels, and relevant metadata. Once you have a data object, all of the other capabilities of WrightTools are immediately open to you, including processing, fitting, and plotting tools.

Here we highlight some key features of the data object. For a complete list of methods and attributes, see WrightTools.data.Data in the API docs.

Instantiation

From Supported File Types

WrightTools aims to provide user-friendly ways of creating data directly from common spectroscopy file formats. Here are the formats currently supported.

name	description	API
BrunoldrRaman	Files from Brunold lab resonance Raman measurements	`from_BrunoldrRaman()`
Cary	Files from Varian’s Cary® Spectrometers	`from_Cary()`
COLORS	Files from Control Lots Of Research in Spectroscopy	`from_COLORS()`
JASCO	Files from JASCO optical spectrometers	`from_JASCO()`
KENT	Files from “ps control” by Kent Meyer	`from_KENT()`
Aramis	Horiba Aramis ngc binary files	`from_Aramis()`
Ocean Optics	.scope files from Ocean Optics spectrometers	`from_ocean_optics()`
PyCMDS	Files from PyCMDS	`from_PyCMDS()`
Shimadzu	Files from Shimadzu UV-VIS spectrophotometers	`from_shimadzu()`
SPCM	Files from Becker & Hickl spcm software	`from_spcm()`
Solis	Files from Andor Solis software	`from_Solis()`
Tensor 27	Files from Bruker Tensor 27 FT-IR	`from_Tensor27()`

Is your favorite format missing? It’s easy to add—promise! Check out Contributing.

These functions accept both local and remote (http/ftp) files, and transparently handle compression (gz/bz2). Compression detection is based on the file name, and for remote files the name is taken as it appears in the link. Many download links (such as those from osf.io or Google Drive) do not include an extension, and will therefore raise warnings or fail to load compressed files. This can often be worked around by adding a variable to the end of the URL, such as https://osf.io/xxxxx/download?fname=file.csv.gz. Google Drive direct download links have the form https://drive.google.com/uc?id=XXXXXXXXXXXXXXXXXXXX (i.e. replace open in the “share” links with uc).
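To see why the query-string workaround helps, consider a small sketch. The helper guess_compression below is hypothetical and is not WrightTools' actual detection code; it only assumes, as described above, that detection depends on how the name ends.

```python
# Hypothetical helper: guess compression from the end of a path or URL.
# Illustration only, not WrightTools' actual detection code.
def guess_compression(name):
    """Return 'gz', 'bz2', or None based on how the name ends."""
    for ext in ("gz", "bz2"):
        if name.endswith("." + ext):
            return ext
    return None

# A bare download link carries no extension, so nothing is detected.
print(guess_compression("https://osf.io/xxxxx/download"))
# Appending a dummy query variable makes the link end in ".gz".
print(guess_compression("https://osf.io/xxxxx/download?fname=file.csv.gz"))
```

The name of the query variable (fname here) is arbitrary; what matters in this sketch is only that the link ends with the extension.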

From Bare Arrays

Got bare numpy arrays and dreaming of data? It is possible to create data objects directly, as shown below.

# import
import numpy as np
import WrightTools as wt
# generate arrays for example
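def my_resonance(xi, yi, intensity=1, FWHM=500, x0=7000):
    # Lorentzian line shape along a single coordinate array
    def single(arr, intensity=intensity, FWHM=FWHM, x0=x0):
        return intensity*(0.5*FWHM)**2/((arr-x0)**2+(0.5*FWHM)**2)
    # 2D resonance: product of the line shapes along each axis
    return single(xi) * single(yi)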
xi = np.linspace(6000, 8000, 75)[:, None]
yi = np.linspace(6000, 8000, 75)[None, :]
zi = my_resonance(xi, yi)
# package into data object
data = wt.Data(name='example')
data.create_variable(name='w1', units='wn', values=xi)
data.create_variable(name='w2', units='wn', values=yi)
data.create_channel(name='signal', values=zi)
data.transform('w1', 'w2')

Note that NumPy has functions for reading data arrays from text files. Our favorite is genfromtxt. Lean on these functions to read in data from unsupported file formats, then pass in the data as arrays. Of course, if you find yourself processing a lot of data from a particular file format, consider contributing a new from_ function to WrightTools.
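As a minimal sketch of that workflow, the snippet below reads a small two-column text table with genfromtxt. The file contents and column meanings are made up for illustration, and an in-memory StringIO stands in for a real file path.

```python
import io
import numpy as np

# Stand-in for a text file in an unsupported format (hypothetical contents):
# two columns, wavenumber and signal, with one header line.
text = io.StringIO(
    "wavenumber,signal\n"
    "6000,0.1\n"
    "7000,1.0\n"
    "8000,0.2\n"
)

# genfromtxt accepts a path or any file-like object.
arr = np.genfromtxt(text, delimiter=",", skip_header=1)

w1 = arr[:, 0]      # independent variable, shape (3,)
signal = arr[:, 1]  # measured values, shape (3,)

# These arrays could then be passed to create_variable() and
# create_channel() as in the example above.
```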

Having trouble connecting the WrightTools Data structure to bare numpy arrays? We have a notebook that looks at how many common numpy.ndarray operations (slicing, element-wise math, broadcasting, etc.) have analogues within the WrightTools data structure.

Creating Compressed Datasets

WrightTools can transparently read compressed datasets, and can create them when you pass compression arguments to create_variable() or create_channel(). These are the same arguments accepted by h5py’s create_dataset method.

data = wt.Data(name='example')
data.create_variable(name='w1', units='wn', shape=(1024, 1024), compression="gzip")
data.create_channel(name='signal', shape=(1024, 1024), compression="gzip", compression_opts=9)

Structure & Attributes

So what is a data object anyway? To put it simply, Data is a collection of WrightTools.data.Axis and WrightTools.data.Channel objects. WrightTools.data.Axis objects are composed of WrightTools.data.Variable objects.

attribute	tuple of…
`axes`	`Axis` objects
`constants`	`Constant` objects
`channels`	`Channel` objects
`variables`	`Variable` objects

As mentioned above, the axes and channels within data can be accessed through the data.axes and data.channels tuples. Data also supports natural naming, so axis and channel objects can be accessed directly by name. The natural syntax is recommended, as it tends to result in more readable code.

>>> data.axis_expressions
('w1', 'w2')
>>> data.w2 == data.axes[1]
True
>>> data.channel_names
('signal', 'pyro1', 'pyro2', 'pyro3')
>>> data.pyro2 == data.channels[2]
True

The order of axes and channels is arbitrary. However, many methods within WrightTools operate on the zero-indexed channel by default. For this reason, you can bring your favorite channel to index zero using bring_to_front().

Variable

The WrightTools.data.Variable class holds key coordinates of the data object. One Variable instance exists for each recorded independent variable. This includes scanned optomechanical hardware, but also still hardware, and other variables like lab time. A typical data object will have many variables (each a multidimensional array). Variables have the following key attributes:

attribute	description
`label`	LaTeX-formatted label, appropriate for plotting
`max()`	variable maximum
`min()`	variable minimum
`natural_name`	variable name
`units`	variable units

Axis

The WrightTools.data.Axis class defines the coordinates of a data object. Each Axis contains an expression, which dictates its relationship with one or more variables. Given 5 variables with names ['w1', 'w2', 'wm', 'd1', 'd2'], example valid expressions include 'w1', 'w1=wm', 'w1+w2', '2*w1', 'd1-d2', and 'wm-w1+w2'. WrightTools ignores the space character in expressions, so 'w1 = w2' will be interpreted as 'w1=w2'. Axes behave like arrays: you can slice into them, view their shape, and get their min and max. However, axes do not contain any new array information: they simply refer to the Variable arrays. Axes have the following key attributes:

attribute	description
`label`	LaTeX-formatted label, appropriate for plotting
`min()`	coordinates minimum, in current units
`max()`	coordinates maximum, in current units
`natural_name`	axis name
`units`	current axis units (change with `convert()`)
`variables`	component variables
`expression`	expression
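The idea that an axis is an expression evaluated over variable arrays can be sketched with plain NumPy. The evaluate helper below is hypothetical and is not how WrightTools parses expressions; it handles only simple arithmetic such as 'w1+w2' or '2*w1', not forms like 'w1=wm'.

```python
import numpy as np

# Hypothetical variable arrays; shapes broadcast against each other.
variables = {
    "w1": np.array([1.0, 2.0, 3.0])[:, None],  # shape (3, 1)
    "w2": np.array([10.0, 20.0])[None, :],     # shape (1, 2)
}

def evaluate(expression, variables):
    """Evaluate a simple arithmetic expression of variable names.

    Illustration only: not the WrightTools expression parser.
    """
    expression = expression.replace(" ", "")  # spaces are ignored
    # eval is acceptable in a sketch with trusted input only
    return eval(expression, {"__builtins__": {}}, dict(variables))

summed = evaluate("w1 + w2", variables)  # broadcasts to shape (3, 2)
doubled = evaluate("2*w1", variables)    # shape (3, 1)
```

Note that no new coordinate data is stored here: each result is computed from the variable arrays, which mirrors the statement that axes simply refer to the Variable arrays.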

Constant

WrightTools.data.Constant objects are a special subclass of Axis objects whose expression is expected to evaluate to a single value. Constant adds the value to the label attribute, making it suitable for plot titles that identify static values associated with the plot. Note that there is nothing enforcing that the value is actually static: constants still have shapes and can be indexed to get the underlying numpy array.

You can control how this label is generated using the attributes format_spec and round_spec. label uses the Python builtin format(), and thus format_spec is a specification as in the Format Specification Mini-Language. Common examples would be “0.2f” or “0.3e” for decimal representation with two digits past the decimal and scientific notation with three digits past the decimal, respectively. round_spec allows you to control the rounding of your number via the builtin round(). For instance, if you want a number rounded to the hundreds position, but represented as an integer, you may use round_spec=-2; format_spec="0.0f".

For example, if you have a constant with value 123.4567 nm, a format_spec of 0.3f, and a round_spec of 2, you will get a label something like '$\\mathsf{\\lambda_{1}\\,=\\,123.460\\,nm}$', which will render as $\mathsf{\lambda_{1}\,=\,123.460\,nm}$.
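The effect of the two attributes can be reproduced with the Python builtins alone. The sketch below assumes that rounding is applied before formatting, which is consistent with the worked example above; format_value is a hypothetical helper, not part of WrightTools.

```python
def format_value(value, format_spec, round_spec=None):
    """Round with the builtin round(), then format with the builtin format()."""
    if round_spec is not None:
        value = round(value, round_spec)
    return format(value, format_spec)

print(format_value(123.4567, "0.2f"))       # 123.46
print(format_value(123.4567, "0.3e"))       # 1.235e+02
print(format_value(123.4567, "0.3f", 2))    # 123.460
print(format_value(1234.5678, "0.0f", -2))  # 1200
```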

An example of using constants/constant labels for plotting can be found in the gallery: Custom Figure.

In addition to the above attributes, constants add:

attribute	description
`format_spec`	Format specification for how to represent the value, as in `format()`.
`round_spec`	Specify which digit to round to, as in `round()`.
`label`	LaTeX formatted label which includes a symbol and the constant value.
`value`	The mean (ignoring NaNs) of the evaluated expression.
`std`	The standard deviation of the points used to compute the value.
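In NumPy terms, value and std as described in the table correspond roughly to the following. The readings are made up, and the use of np.nanstd with its default (population) normalization is an assumption; the table does not say which normalization WrightTools uses.

```python
import numpy as np

# Hypothetical readings of a nominally static variable; one point is missing.
points = np.array([499.8, 500.1, np.nan, 500.1])

value = np.nanmean(points)  # mean, ignoring NaNs
std = np.nanstd(points)     # spread of the points that entered the mean
```

A nonzero std is a quick check of whether the quantity was really static over the dataset.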

Channel

The WrightTools.data.Channel class contains the n-dimensional signals. A single data object may contain multiple channels corresponding to different detectors or measurement schemes. Channels have the following key attributes:

attribute	description
`label`	LaTeX-formatted label, appropriate for plotting
`mag()`	channel magnitude (furthest deviation from null)
`max()`	channel maximum
`min()`	channel minimum
`name`	channel name
`null`	channel null (value of zero signal)
`signed`	flag to indicate if channel is signed
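The relationship between null and mag() can be illustrated with a few lines of NumPy. This is a sketch based on the descriptions in the table (magnitude as the furthest deviation from null), using made-up values; it is not the WrightTools implementation.

```python
import numpy as np

# Hypothetical signed channel values with a null (zero-signal level) of 0.
values = np.array([-0.5, 0.1, 2.0, -3.0, np.nan])
null = 0.0

# Magnitude: furthest deviation from null, ignoring NaNs.
mag = np.nanmax(np.abs(values - null))

# For a signed channel, limits symmetric about null are a natural choice
# for a color scale.
limits = (null - mag, null + mag)
```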

Processing

Units aware & interpolation ready

Experiments are taken over all kinds of dynamic range, with all kinds of units. You might wish to take the difference between a UV-VIS scan taken from 400 to 800 nm, 1 nm steps and a different scan taken from 1.75 to 2.00 eV, 1 meV steps. This can be a huge pain! Even if you converted them to the same unit system, you would still have to deal with the different absolute positions of the two coordinate arrays. map_variable() allows you to easily obtain a data object mapped onto a different set of coordinates.

WrightTools data objects know all about units, and they are able to use interpolation to map between different absolute coordinates. Here we list some of the capabilities that are enabled by this behavior.

method	description	gallery
`heal()`	use interpolation to guess the value of NaNs within a channel	Heal
`join()`	join together multiple data objects, accounting for dimensionality and overlap	Join
`map_variable()`	re-map data coordinates	Map-Variable
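To make the bookkeeping concrete, here is what the UV-VIS comparison described above looks like when done by hand with NumPy. The Gaussian absorbance function is a made-up stand-in for measured data, and the snippet is only a sketch of the unit conversion and interpolation steps, not of how WrightTools performs them.

```python
import numpy as np

HC_EV_NM = 1239.84193  # h*c in eV nm, for E = h*c / wavelength

def absorbance(energy_ev):
    """Hypothetical Gaussian band centered at 1.9 eV."""
    return np.exp(-((energy_ev - 1.9) / 0.1) ** 2)

# Scan A: 400 to 800 nm in 1 nm steps.
nm_a = np.linspace(400, 800, 401)
ev_a = HC_EV_NM / nm_a  # same points in eV (now descending)
signal_a = absorbance(ev_a)

# Scan B: 1.75 to 2.00 eV in 1 meV steps.
ev_b = np.linspace(1.75, 2.00, 251)
signal_b = absorbance(ev_b)

# np.interp needs ascending x, so reverse scan A before interpolating
# it onto scan B's coordinates.
a_on_b = np.interp(ev_b, ev_a[::-1], signal_a[::-1])

difference = a_on_b - signal_b  # now defined point by point
```

Even in one dimension this requires a unit conversion, a reordering, and an interpolation; the methods in the table take care of these steps for n-dimensional data.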

Dimensionality without the cursing

Working with multidimensional data can be intimidating. What axis am I looking at again? Where am I in the other axis? Is this slice unusual, or do they all look like that?

WrightTools tries to make multi-dimensional data easy to work with. The following methods deal directly with dimensionality manipulation.

method	description	gallery
`chop()`	chop data into a list of lower dimensional data
`collapse()`	destroy one dimension of data using a mathematical strategy
`moment()`	destroy one dimension of a channel by taking the nth moment
`split()`	split data at a series of coordinates, without reducing dimensionality	Split
`transform()`	transform the data on to a new combination of variables as axes	DOVE transform, Fringes transform

WrightTools seamlessly handles dimensionality throughout. Artists is one such place where dimensionality is addressed explicitly.
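For readers who think in bare arrays, the snippet below shows rough NumPy analogues of chopping and collapsing. The array is made up, and summation is used as just one possible collapsing strategy; the WrightTools methods additionally keep axes, units, and metadata consistent, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical 2D channel on axes (w1, w2) with shape (3, 4).
zi = np.arange(12.0).reshape(3, 4)

# Analogue of chopping: a list of 1D slices, one per w1 point.
slices = [zi[i] for i in range(zi.shape[0])]

# Analogue of collapsing: remove the w2 dimension, here by summing over it.
collapsed = zi.sum(axis=1)  # shape (3,)
```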

Processing without the pain

There are many common data processing operations in spectroscopy. WrightTools endeavors to make these operations easy. A selection of important methods follows.

method	description	gallery
`clip()`	clip values outside of a given range (method of `Channel`)
`gradient()`	take the derivative along an axis	Gradient
`join()`	join multiple data objects into one	Join
`level()`	level the edge of data along a certain axis	Level
`smooth()`	smooth a channel via convolution with an n-dimensional Kaiser window
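For orientation, here are one-dimensional NumPy analogues of three of these operations. The data, the clipping range, the window width, and the Kaiser beta value are all arbitrary choices for illustration; this is a sketch of the underlying ideas, not of the WrightTools implementations.

```python
import numpy as np

x = np.linspace(0, 10, 101)
rng = np.random.default_rng(0)
noisy = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Clip: values outside a range are replaced (here with NaN).
clipped = np.where((noisy >= -0.5) & (noisy <= 0.5), noisy, np.nan)

# Gradient: derivative along the axis, using the coordinate spacing.
derivative = np.gradient(noisy, x)

# Smooth: convolve with a normalized Kaiser window (1D case).
half_width = 5
window = np.kaiser(2 * half_width + 1, 3.0)  # beta=3.0 is an arbitrary choice
window /= window.sum()
smoothed = np.convolve(noisy, window, mode="same")
```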