What's in a spectral matrix?

olivertburton
Aug 11

Updated: Aug 14

In this series of posts, I’m going to talk a bit about the mathematics and metrics we use in spectral flow cytometry. Now, since I am not a mathematician by training, I’m probably going to get some of this wrong, so please chime in when you spot errors or misconceptions. My hope is that this lack of a mathematical background is going to force me to explain things in a relatively simple way.

Before we get started, I’ll point out a few excellent resources:

David Novo’s comprehensive article on spectral flow cytometry:

A comparison of spectral unmixing to conventional compensation for the calculation of fluorochrome abundances from flow cytometric data - Novo - 2022 - Cytometry Part A - Wiley Online Library

Peter Mage’s talk at the Babraham Spectral Symposium

Mathematics of spectral unmixing │Peter Mage │ Babraham Institute Spectral Symposium 2022

CYTO 2025 workshop: Florian Mair’s part at the beginning gives a good overview of what the metrics we use are and what they mean for our data.

ISAC Learning: CYTO 2025 Tutorials: The road to successful high-parameter spectral cytometry experiments: guidelines for control optimization, handling of autofluorescence and quality control

Peter Mage, Andrew Konecny and Florian Mair’s unmixing spread paper, which covers some of the metrics as well.

Measurement and prediction of unmixing-dependent spreading in spectral flow cytometry panels | bioRxiv

Okay, so today the plan is to try to explain what a spillover or spectral matrix is. A matrix is basically a collection of numbers in a table. When we talk about a spillover or spectral matrix, those numbers are the measurements of fluorescence signal from fluorophores in the detectors of the cytometer.

For example, here is a matrix:

This is a set of observations of cells (one cell per row) with the measured values in each of the columns (detectors).
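If it helps to see the same idea as code, here's a tiny sketch in Python with NumPy. The detector names and the numbers are made up purely for illustration; the point is only the layout, with one row per cell and one column per detector.

```python
import numpy as np

# Hypothetical values: 4 events (rows) measured in 3 detectors (columns).
detectors = ["V5", "V6", "V7"]
events = np.array([
    [120.0,  950.0, 610.0],
    [ 80.0,  720.0, 455.0],
    [ 15.0,   30.0,  22.0],
    [140.0, 1010.0, 640.0],
])

n_events, n_detectors = events.shape
print(n_events, "events x", n_detectors, "detectors")

# One row = everything measured for a single cell.
first_cell = events[0]
print(dict(zip(detectors, first_cell.tolist())))

# One column = one detector's signal across all cells.
v6_signal = events[:, detectors.index("V6")]
print(v6_signal)
```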

What do we call this "spillover" matrix? Spillover is a term from conventional flow. In conventional flow, fluorophores have an assigned "peak" channel, where we want to direct the signal, and off-target "spillover" channels, where we have unwanted signal. The unwanted signal from the spillover channels gets adjusted through compensation. We don't exactly have "spillover" in spectral flow because we don't have assigned detectors per fluorophore; that isn't to say we don't have off-peak signals, just that maybe calling them spillover isn't right.

We tend to talk more about the spectra in spectral flow, so calling this a spectral matrix makes some sense. Formally, this matrix of multiple spectral signatures is the mixing matrix (see the David Novo article). It tells us how the signals in the raw measured data on the instrument have been mixed together, defining the component fluorophores. Furthermore, we use the mixing matrix to calculate the unmixing matrix, which is used to calculate the unmixed data. So, I'm going to try to refer to this "spillover matrix" as the mixing matrix from now on.

In the same way, the term "raw data" for the measured values in the instrument's detectors is probably unhelpful. As scientists, we perform quite a few manipulations on our data during the analysis, so at any given point, we may be pointing to a different thing when we talk about "raw data" (perhaps the data prior to pre-processing, perhaps the data without batch correction, or maybe the unmixed flow data that was used to create the summary plot). In the same way we can refer to the mixing and unmixing matrices, we can talk about mixed and unmixed data, which maybe clarifies things a bit.
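To make the relationship between the mixing matrix, the unmixing matrix, the mixed data and the unmixed data a bit more concrete, here's a minimal sketch in Python with NumPy. All the numbers are invented, there is no noise, and I'm assuming ordinary least squares unmixing, where the unmixing matrix is the pseudoinverse of the mixing matrix. Real software may well use a weighted or otherwise different method.

```python
import numpy as np

# Hypothetical mixing matrix: 2 fluorophores (rows) x 4 detectors (columns),
# each row scaled so that its peak detector is 1.
mixing = np.array([
    [1.00, 0.60, 0.20, 0.05],
    [0.10, 0.40, 1.00, 0.50],
])

# Hypothetical "true" abundances: 3 events (rows) x 2 fluorophores (columns).
abundances = np.array([
    [1000.0,   0.0],
    [   0.0, 500.0],
    [ 200.0, 800.0],
])

# Mixed data: what the detectors would report in a noise-free world.
mixed = abundances @ mixing          # 3 events x 4 detectors

# Unmixing matrix (assumption: ordinary least squares, i.e. the pseudoinverse).
unmixing = np.linalg.pinv(mixing)    # 4 detectors x 2 fluorophores

# Unmixed data: our estimate of how much of each fluorophore is on each event.
unmixed = mixed @ unmixing           # 3 events x 2 fluorophores
print(np.round(unmixed, 3))
```

Because this toy example has no noise, the unmixed data recover the abundances we started with. With real measurements they only approximate them.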

Now, let's look at how we generate the mixing matrix. This is where our single-stained controls come into play. These tell us (hopefully!) what the fluorophores look like. For this example, I'm going to primarily use FlowJo v10 and Excel, which are pretty commonly available so you can follow along.

Let's drop some single-stained control data in FlowJo:

After setting some gates on the beads or cells, go to the Cytometry drop-down under the FlowJo menu:

Here we select "Spectral Population Viewer". Note that you need to have a population selected in the list first.

FlowJo will prompt you to pick the detectors you want to look at. It's actually pretty good at detecting these, so hit "Accept".

We get these plots (after stretching the ridiculously tiny window):

A spectral ribbon plot in FlowJo for SBV515 on BioLegend compensation beads

What are we looking at here? These are the measured intensities for every point in the gated data. Because these are compensation beads, we have a nice bimodal distribution where some of the points are strongly positive, and others are close to zero. The more signal there is in a given detector, the higher the values on the plot. The data here are plotted on a biexponential transformation, but that can be set according to your preferences using that big "T" button.

Underneath, we have a matrix of measurements for each cell (or point) in the data:

The data from the SBV515 FCS file, first ten detectors only

In order to extract the spectrum for this fluorophore accurately, we need to identify the points (beads in this case) that are stained.

Dragging this population onto the Spectral Viewer gives us a nice overlay:

While you can use the negative beads to subtract the background, you'll get better results using unstained positive beads. Drag this to the "Negative" column to subtract it from the "Positive". While we're at it, let's remove the ungated control (blue) that contains both positive and negative events.

Okay, so this is our unnormalized set of measurements for this fluorophore (SBV515). To get to the spectrum we'll use for ordinary unmixing, we need to take just the median (or mean) for each detector, and we need to normalize it.
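As a rough sketch of what that calculation amounts to outside FlowJo, here's the median-and-subtract step in Python with NumPy. The events are simulated, and the detector names, background levels and fluorophore shape are all hypothetical; this isn't FlowJo's actual code, just the arithmetic described above.

```python
import numpy as np

rng = np.random.default_rng(0)
detectors = ["V5", "V6", "V7", "V8"]

# Hypothetical ingredients for the simulation.
shape_true = np.array([0.3, 1.0, 0.6, 0.2])        # relative signal per detector
background = np.array([50.0, 60.0, 55.0, 40.0])    # bead background per detector

# Simulated gated events (rows) x detectors (columns).
negative = background + rng.normal(0, 5, size=(500, 4))
positive = background + 20000 * shape_true + rng.normal(0, 200, size=(500, 4))

# Median per detector for each population, then subtract the background.
positive_median = np.median(positive, axis=0)
negative_median = np.median(negative, axis=0)
spectrum_raw = positive_median - negative_median

peak_detector = detectors[int(np.argmax(spectrum_raw))]
print(peak_detector, np.round(spectrum_raw))
```

The result is still on the instrument's intensity scale, so it is the unnormalized spectrum.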

To display the median on this plot, under "Options" you can untick "Show Density" and then select "Configure". If you set the upper and lower percentiles both to 50th, you get this:

Now we've got the unnormalized spectrum.

What is normalization? Normalization means taking our different spectra and putting them on the same, fixed scale. This is useful for comparing them, particularly if we're looking at bright versus dim signals on cells, but it also affects the unmixing math, as we'll see later. In practice, this means dividing the value in each detector by the maximum value (the value in the peak detector). That way the peak detector's signal becomes 1 and everything else becomes a fraction of 1. You can also normalize to 100 (or any arbitrary number), which is what FlowJo does.
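In code, normalization is a one-liner. Here's a small sketch in Python with NumPy, using made-up background-subtracted medians for a single fluorophore in four hypothetical detectors.

```python
import numpy as np

# Hypothetical background-subtracted medians for one fluorophore.
detectors = ["V5", "V6", "V7", "V8"]
spectrum_raw = np.array([6000.0, 20000.0, 12000.0, 4000.0])

# Divide every detector by the value in the peak detector.
peak = spectrum_raw.max()
spectrum_norm1 = spectrum_raw / peak       # peak detector becomes 1

# Same shape, just scaled so the peak detector is 100 instead.
spectrum_norm100 = 100 * spectrum_norm1

print(detectors[int(np.argmax(spectrum_raw))])
print(spectrum_norm1)
print(spectrum_norm100)
```

Either scale describes the same spectrum; only the units change.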

Under "Options" again, we now select "Normalized". I encourage you to always look at "Positive/Negative Values", which shows you whether the calculated spectrum is dipping below zero. If it is, you almost certainly have a mismatch in the negative control you're using for background subtraction. Basically, you shouldn't have data below zero because that means negative fluorescence. Selecting "Only Positive Values" hides this, but doesn't change the fact that your negative is mismatched and there are likely other problems in the calculation that aren't so obvious. Once you're satisfied there's no problem, go ahead and switch to "Only Positive Values" if you prefer that visualization.
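If you're working with the numbers outside FlowJo, the same sanity check is easy to automate. Here's a minimal sketch in Python with NumPy; the function name `detectors_below_zero`, the detector names and the values are all hypothetical.

```python
import numpy as np

def detectors_below_zero(spectrum, detectors, tolerance=0.0):
    """Return the names of detectors whose background-subtracted value
    is below -tolerance (hypothetical helper, for illustration only)."""
    spectrum = np.asarray(spectrum, dtype=float)
    return [name for name, value in zip(detectors, spectrum) if value < -tolerance]

detectors = ["V5", "V6", "V7", "V8"]

# A spectrum where the negative control matches: nothing dips below zero.
spectrum_ok = [6000.0, 20000.0, 12000.0, 4000.0]

# A spectrum where the negative control is brighter than the positive in V7.
spectrum_mismatched = [6000.0, 20000.0, -350.0, 4000.0]

flagged_ok = detectors_below_zero(spectrum_ok, detectors)
flagged_bad = detectors_below_zero(spectrum_mismatched, detectors)
print(flagged_ok, flagged_bad)
```

An empty list doesn't prove the negative control is a good match, but a non-empty one is a clear warning sign.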

Now we have a single row for our mixing matrix. We can get the actual numbers and put them into a spreadsheet as follows:

Right-click on the plot.
Select "Copy Content".
Open your spreadsheet and paste. This puts in a bunch of comma-separated values, which is not fantastic.
Use the little clipboard icon to open the "Text Import Wizard".
Select "Next" to get to step 2.
Select "Comma" and then "Finish".

If you do this a lot, you can probably set up some defaults so it happens automatically in Excel.

And now, as we see in the graph, the numbers peak in V6 (on the Aurora for SBV515).

Why are we going through all of these steps? If we don't, we can get data that looks like this:

Here I've added on the data from the SBUV795 control (red), just gating on the bead population. In this sample, I have many more negative beads than positive beads, so the normalized spectrum of the entire population is close to that of the unstained beads (orange).

In contrast, by gating on the positive beads and subtracting the negative bead signal, we get this:

Pretty different.
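Here's a small simulation of why that happens, sketched in Python with NumPy. The bead counts, background levels and fluorophore shape are invented for illustration and are not the values from this experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fluorophore shape and bead background across 4 detectors.
shape_true = np.array([0.05, 0.10, 0.40, 1.00])
background = np.array([300.0, 250.0, 200.0, 150.0])

# Many more negative beads than positive beads.
n_neg, n_pos = 900, 100
neg = background + rng.normal(0, 10, size=(n_neg, 4))
pos = background + 50000 * shape_true + rng.normal(0, 300, size=(n_pos, 4))
all_beads = np.vstack([neg, pos])

def normalize(values):
    """Scale a spectrum so that its peak detector is 1."""
    return values / values.max()

# Spectrum from the whole bead population, no gating, no subtraction.
ungated = normalize(np.median(all_beads, axis=0))

# Spectrum from the unstained (negative) beads alone.
unstained = normalize(np.median(neg, axis=0))

# Spectrum from gated positives with the negative signal subtracted.
gated = normalize(np.median(pos, axis=0) - np.median(neg, axis=0))

print("ungated:  ", np.round(ungated, 2))
print("unstained:", np.round(unstained, 2))
print("gated:    ", np.round(gated, 2))
```

With 90% negative beads, the median of the whole population sits inside the negative peak, so the ungated "spectrum" is really just the bead background. Only the gated, background-subtracted version looks like the fluorophore.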

To get our full mixing matrix for the experiment, we repeat the process of extracting the spectra for all the fluorophores.
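Stacking the individual spectra into one matrix might look something like this in Python with NumPy. The fluorophore names, detector names and values are placeholders rather than the real spectra from this experiment.

```python
import numpy as np

detectors = ["UV1", "UV2", "UV3", "UV4"]

# Hypothetical unnormalized spectra, one entry per single-stained control.
spectra = {
    "FluorA": [18000.0, 9000.0, 1800.0, 360.0],
    "FluorB": [500.0, 5000.0, 2500.0, 1000.0],
    "FluorC": [40.0, 200.0, 800.0, 2000.0],
}

fluorophores = list(spectra)

# One row per fluorophore, one column per detector.
raw_matrix = np.array([spectra[name] for name in fluorophores], dtype=float)

# Normalize each row by its own peak so every fluorophore has a maximum of 1.
row_peaks = raw_matrix.max(axis=1, keepdims=True)
mixing = raw_matrix / row_peaks

for name, row in zip(fluorophores, mixing):
    print(name, np.round(row, 3))
```

Each row is normalized separately, so a bright control and a dim control end up on the same scale.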

The spectral traces for the fluorophores in this experiment

Here's the mixing matrix in Excel (UV detectors only because it's really long):

And here's what the mixing matrix looks like normalized to 1 in R:

We can also plot this as a heatmap, which can be easier to read when there are lots of fluorophores (unlike this case):

Why do we normalize? Well, if we don't, a funny thing happens: the unmixed data get scaled by the magnitude of the signals in the mixing matrix. We'll look into this a bit more later, but basically when we do the unmixing, we're trying to figure out how much of each of the fluorophores is present in the mixed data. It's a bit like trying to reverse engineer the recipe for your favorite food by knowing the ingredients. If we give the recipe (mixing matrix) as 2 parts flour, 1 part butter and 1 part egg, we get a different result depending on whether each of those parts is measured in the same units (say grams weight) or if they're in unique units (say bags of flour, sticks of butter and dozens of eggs).

For this example, I'm going to use a different set of controls, which are cells rather than beads so we can see the effect better.

The normalized mixing matrix:

Every row (fluorophore) has a maximum value of 1.

Without normalization:

In this case, the maximum is 3 x 10^5. We've got way more signal in BV510 (CD14, highly expressed) than in BV605 (CD127, low expression) or in the autofluorescence (AF).

Unmixed both ways:

What's happened here? Well, without normalization, we've basically scaled the output unmixed data to be between 0 and 1, relative to the intensities in the mixing matrix.
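You can reproduce this effect with a toy example. The sketch below, in Python with NumPy, unmixes the same noise-free mixed data with a normalized and an unnormalized mixing matrix. The spectra and abundances are invented, and I'm again assuming ordinary least squares unmixing via the pseudoinverse.

```python
import numpy as np

# Hypothetical unnormalized spectra: 2 fluorophores x 4 detectors.
raw = np.array([
    [300000.0, 150000.0, 30000.0, 3000.0],   # bright control, peak 3 x 10^5
    [    40.0,    400.0,  2000.0, 1000.0],   # dim control, peak 2 x 10^3
])
peaks = raw.max(axis=1)                      # one peak value per fluorophore
normalized = raw / peaks[:, None]            # every row now has a maximum of 1

# Hypothetical abundances: 3 events x 2 fluorophores.
abundances = np.array([
    [250000.0,    0.0],
    [     0.0, 1500.0],
    [100000.0, 1000.0],
])

# Noise-free mixed data as the detectors would see it.
mixed = abundances @ normalized

# Unmix the same mixed data both ways (assumption: OLS via the pseudoinverse).
unmixed_normalized = mixed @ np.linalg.pinv(normalized)
unmixed_unnormalized = mixed @ np.linalg.pinv(raw)

print(np.round(unmixed_normalized, 1))
print(np.round(unmixed_unnormalized, 3))
```

In this sketch the unnormalized result is just the normalized result divided by each fluorophore's peak value in the mixing matrix, so every fluorophore ends up expressed as a fraction of its own control's brightness.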

So, we normalize the mixing matrix so that the unmixed data tell us something about the intensities of expression of the fluorophores, or in other words, the abundances of the fluorophore signals and the markers they represent.

Colibri Cytometry
