FluoCLIP: Stain-Aware Focus Quality Assessment in Fluorescence Microscopy

CVPR 2026

SATURDAY, JUNE 6 16:45 - 18:45(CST) Poster Session 4 & Exhibit Hall (ExHall A)

Abstract

Accurate focus quality assessment (FQA) in fluorescence microscopy remains challenging, as the stain-dependent optical properties of fluorescent dyes cause abrupt and heterogeneous focus shifts. However, existing datasets and models overlook this variability, treating focus quality as a stain-agnostic problem. In this work, we formulate the task of stain-aware FQA, emphasizing that focus behavior in fluorescence microscopy must be modeled as a function of staining characteristics. Through quantitative analysis of existing datasets (FocusPath, BBBC006) and our newly curated FluoMix, we demonstrate that focus–rank relationships vary substantially across stains, underscoring the need for stain-aware modeling in fluorescence microscopy. To support this new formulation, we propose FluoMix, the first dataset for stain-aware FQA that encompasses multiple tissues, fluorescent stains, and focus variations. Building on this dataset, we propose FluoCLIP, a two-stage vision-language framework that leverages CLIP’s alignment capability to interpret focus quality in the context of biological staining. In the stain-grounding phase, FluoCLIP learns general stain representations by aligning textual stain tokens with visual features, while in the stain-guided ranking phase, it optimizes stain-specific rank prompts for ordinal focus prediction. Together, our formulation, dataset, and framework establish the first foundation for stain-aware FQA, and FluoCLIP achieves strong generalization across diverse fluorescence microscopy conditions.

Key Contributions

We propose FluoCLIP, a two-stage ordinal vision–language framework that learns stain-specific grounding and stain-guided ranking for robust FQA.
We introduce FluoMix, a new dataset featuring diverse fluorescent stains and tissue-level focus variations, providing the first dataset for stain-aware FQA in fluorescence microscopy.
We formulate the task of Stain-Aware FQA in fluorescence microscopy, highlighting the need to model stain-dependent focus behavior.

FluoMix: A Dataset for Stain-Aware FQA

Existing datasets for focus quality assessment are limited in scope and fail to represent the diversity of fluorescence microscopy. To address this, we introduce FluoMix, a multi-tissue, multi-stain dataset specifically designed for stain-aware FQA. FluoMix aggregates fluorescence microscopy images across brain, lung, and liver tissues to capture diverse optical and biological characteristics. Each field of view includes up to four distinct stains and is acquired as a complete z-stack (covering the full range from sharp to severely blurred slices). By reflecting the spatial and biological heterogeneity of real tissue specimens, FluoMix establishes a practical foundation for robust, stain-aware focus assessment.

Figure 1 : (Left)Examples of dataset classes and (Right)Sample images illustrating stain diversity in the three datasets.

Table 1: Overview of the FluoMix dataset. Each stain is paired with distinct protein markers, reflecting the heterogeneity of signals across tissues.

Brain Tissue
Dataset	Hoechst 34580	Alexa 488	Cy3	Alexa 647	# Sets
D1	nucleus	Iba-1	Tuj-1	Collagen IV	504
D2	nucleus	NFM	TH	Collagen IV	152
D3	nucleus	NeuN	TH	Collagen IV	554
D4	nucleus	GFAP	Tuj-1	CD31	623
Lung Tissue
D5	nucleus	CD31	Vimentin	Collagen IV	634
D6	nucleus	CD31	Vimentin	Collagen IV	596
Liver Tissue
D7	nucleus	CK19	Claudin	ZO-1	96

FluoCLIP: Stain-Aware FQA Framework

To handle the heterogeneous focus degradation unique to fluorescence imaging, we propose FluoCLIP, a two-stage vision-language framework:

Stage 1: Stain-Grounding: The model aligns learnable stain tokens with CLIP visual representations so that the text encoder acquires fluorescence-specific semantics. We freeze the pretrained text encoder and attach a compact adapter that learns stain-specific attributes, preserving linguistic consistency while enabling domain adaptation.
Stage 2: Stain-Guided Ranking: The learned stain embeddings are used to condition focus prediction on stain-dependent appearance variations. A conditioning network projects base rank embeddings into a stain-guided space, and intermediate ranks are obtained through interpolation. This allows the model to modulate its focus perception according to the unique characteristics of each fluorophore.

Table 2: Ablation study on FluoCLIP components on the FluoMix dataset. We evaluate the contribution of grounded stain tokens ($S$) and stain-guided rank modules ($\tilde{R}^S$).

Type	Configuration	$S^{\text{plain}}$	$S^{\text{train}}$	$S$	$R$	$\tilde{R}^S$	Acc. (%)	Step Gain	Total Gain
(A)	Baseline (OrdinalCLIP)				✓		83.12±0.41	-	-
(B)	(A) + Plain Stain Token	✓			✓		83.21±3.93	0.09	-
(C)	(A) + Learnable Stain Token		✓		✓		81.38±0.63	-1.74	-
(D)	(A) + Grounded Stain Token			✓	✓		84.28±0.88	1.16	1.16
(E)	(D) + Stain-Guided Rank Token			✓		✓	85.21±0.88	0.93	2.09

Empirical Analysis of Stain-Dependent Focus Behavior

The mean SF values exhibit a generally monotonic decrease as the focus level moves from in-focus to defocused regions across all three datasets. This confirms that the SF metric can reasonably capture the overall directionality of focus degradation implied by the ordinal labels, validating its use as a quantitative sharpness indicator for subsequent analysis.

We then analyze the distribution of the SF metric across different stains within each dataset. In FocusPath, which consists of bright-field H&E-stained whole-slide images, the per-stain variance is modest, with stains exhibiting closely aligned SF ranges and largely overlapping distributions, indicating stain-invariant focus behavior. In contrast, BBBC006 and FluoMix exhibit substantial stain-dependent variability. Different fluorescent stains occupy distinct SF ranges with largely non-overlapping variances, demonstrating that focus degradation is highly stain-dependent in fluorescence microscopy.

This observation supports the hypothesis that focus quality should be modeled as a function of stain characteristics rather than assuming a universal focus-rank mapping.

Figure 2 : Empirical Analysis of Stain-Dependent Focus Behavior: (a) Mean spatial frequency (SF) versus focus rank for three datasets; the shaded region indicates ±1 standard deviation across samples. SF decreases monotonically with increasing rank, confirming that SF reliably captures focus degradation. (b)–(d) Boxplots of SF values across stains for each dataset (x-axis: stain identity, y-axis: SF distribution). FocusPath shows stain-invariant SF trends, wherease BBBC006 and FluoMix display pronounced stain-dependent variability.

Spatial Frequency (SF) Metric

In our analysis, we utilize the Spatial Frequency (SF) metric as a quantitative proxy for image sharpness to analyze stain-dependent focus behavior. Higher SF values indicate sharper, more in-focus images.

Given an image $I \in \mathbb{R}^{M \times N}$, the row frequency ($RF$) and column frequency ($CF$) components are defined as:

\[RF = \sqrt{\frac{1}{(M-1)N} \sum_{i=1}^{M-1}\sum_{j=1}^{N} (I(i+1,j)-I(i,j))^2}\] \[CF = \sqrt{\frac{1}{M(N-1)} \sum_{i=1}^{M}\sum_{j=1}^{N-1} (I(i,j+1)-I(i,j))^2}\]

The overall Spatial Frequency ($SF$) is then calculated as:

\[SF = \sqrt{RF^2 + CF^2}\]

Source: Image quality measures and their performance. IEEE Trans. Commun..

Citation

@article{park2026fluoclip,
  title={FluoCLIP: Stain-Aware Focus Quality Assessment in Fluorescence Microscopy},
  author={Park, Hyejin and Yoon, Jiwon and Park, Sumin and Kim, Suree and Jang, Sinae and Lee, Eunsoo and Kang, Dongmin and Min, Dongbo},
  journal={arXiv preprint arXiv:2602.23791},
  year={2026}
}