FluoCLIP: Stain-Aware Focus Quality Assessment in Fluorescence Microscopy

Hyejin Park1*, Jiwon Yoon1*, Sumin Park1, Suree Kim3, Sinae Jang2, Eunsoo Lee2,
Dongmin Kang2†, Dongbo Min1† (*: Equal Contribution, †: Corresponding Author)
1Division of AI and Software, Ewha Womans University,
2Bioimaging Data Curation Center (BDCC), Dept. of Life Science, Ewha Womans University,
3Analytical Solution Team, Daesang Corporation

Abstract

Accurate focus quality assessment (FQA) in fluorescence microscopy remains challenging, as the stain-dependent optical properties of fluorescent dyes cause abrupt and heterogeneous focus shifts. However, existing datasets and models overlook this variability, treating focus quality as a stain-agnostic problem. In this work, we formulate the task of stain-aware FQA, emphasizing that focus behavior in fluorescence microscopy must be modeled as a function of staining characteristics. Through quantitative analysis of existing datasets (FocusPath, BBBC006) and our newly curated FluoMix, we demonstrate that focus–rank relationships vary substantially across stains, underscoring the need for stain-aware modeling in fluorescence microscopy. To support this new formulation, we propose FluoMix, the first dataset for stain-aware FQA that encompasses multiple tissues, fluorescent stains, and focus variations. Building on this dataset, we propose FluoCLIP, a two-stage vision-language framework that leverages CLIP’s alignment capability to interpret focus quality in the context of biological staining. In the stain-grounding phase, FluoCLIP learns general stain representations by aligning textual stain tokens with visual features, while in the stain-guided ranking phase, it optimizes stain-specific rank prompts for ordinal focus prediction. Together, our formulation, dataset, and framework establish the first foundation for stain-aware FQA, and FluoCLIP achieves strong generalization across diverse fluorescence microscopy conditions.

Key Contributions

  1. We propose FluoCLIP, a two-stage ordinal vision–language framework that learns stain-specific grounding and stain-guided ranking for robust FQA.
  2. We introduce FluoMix, a new dataset featuring diverse fluorescent stains and tissue-level focus variations, providing the first dataset for stain-aware FQA in fluorescence microscopy.
  3. We formulate the task of Stain-Aware FQA in fluorescence microscopy, highlighting the need to model stain-dependent focus behavior.

FluoMix: A Dataset for Stain-Aware FQA

Existing datasets for focus quality assessment are limited in scope and fail to represent the diversity of fluorescence microscopy. To address this, we introduce FluoMix, a multi-tissue, multi-stain dataset specifically designed for stain-aware FQA. FluoMix aggregates fluorescence microscopy images across brain, lung, and liver tissues to capture diverse optical and biological characteristics. Each field of view includes up to four distinct stains and is acquired as a complete z-stack (covering the full range from sharp to severely blurred slices). By reflecting the spatial and biological heterogeneity of real tissue specimens, FluoMix establishes a practical foundation for robust, stain-aware focus assessment.

Figure 1: (Left) Examples of dataset classes and (Right) sample images illustrating stain diversity in the three datasets.

Table 1: Overview of the FluoMix dataset. Each stain is paired with distinct protein markers, reflecting the heterogeneity of signals across tissues.

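| Tissue | Dataset | Hoechst 34580 | Alexa 488 | Cy3 | Alexa 647 | # Sets |
|---|---|---|---|---|---|---|
| Brain | D1 | nucleus | Iba-1 | Tuj-1 | Collagen IV | 504 |
| Brain | D2 | nucleus | NFM | TH | Collagen IV | 152 |
| Brain | D3 | nucleus | NeuN | TH | Collagen IV | 554 |
| Brain | D4 | nucleus | GFAP | Tuj-1 | CD31 | 623 |
| Lung | D5 | nucleus | CD31 | Vimentin | Collagen IV | 634 |
| Lung | D6 | nucleus | CD31 | Vimentin | Collagen IV | 596 |
| Liver | D7 | nucleus | CK19 | Claudin | ZO-1 | 96 |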

FluoCLIP: Stain-Aware FQA Framework

Figure 1: Overview of the FluoCLIP framework, featuring Stage 1 (Stain-Grounding) and Stage 2 (Stain-Guided Ranking).

To handle the heterogeneous focus degradation unique to fluorescence imaging, we propose FluoCLIP, a two-stage vision-language framework:

  1. Stage 1 (Stain-Grounding): The model aligns learnable stain tokens with CLIP visual representations so that the text encoder acquires fluorescence-specific semantics. We freeze the pretrained text encoder and attach a compact adapter that learns stain-specific attributes, preserving linguistic consistency while enabling domain adaptation.
  2. Stage 2 (Stain-Guided Ranking): The learned stain embeddings condition focus prediction on stain-dependent appearance variations. A conditioning network projects base rank embeddings into a stain-guided space, and intermediate ranks are obtained through interpolation. This allows the model to modulate its focus perception according to the characteristics of each fluorophore.
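The Stage 2 data flow can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single linear layer in `condition_on_stain`, the linear interpolation scheme, the cosine-similarity readout, and all names and sizes are assumptions chosen for illustration, and the random vectors stand in for embeddings that would normally come from the CLIP encoders.

```python
import numpy as np


def interpolate_ranks(base: np.ndarray, num_ranks: int) -> np.ndarray:
    """Linearly interpolate (num_base, d) base rank embeddings to (num_ranks, d)."""
    num_base = base.shape[0]
    # Position of every target rank on the base-rank axis [0, num_base - 1].
    pos = np.linspace(0.0, num_base - 1, num_ranks)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, num_base - 1)
    w = (pos - lo)[:, None]
    return (1.0 - w) * base[lo] + w * base[hi]


def condition_on_stain(base: np.ndarray, stain: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning network: one linear layer over [rank ; stain]."""
    tiled = np.broadcast_to(stain, (base.shape[0], stain.shape[0]))
    return np.concatenate([base, tiled], axis=1) @ weight  # (num_base, d)


def predict_rank(image_feat: np.ndarray, rank_emb: np.ndarray) -> int:
    """Return the rank whose embedding is most cosine-similar to the image feature."""
    img = image_feat / np.linalg.norm(image_feat)
    txt = rank_emb / np.linalg.norm(rank_emb, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))


# Toy stand-ins (random, not learned): 3 base ranks expanded to 9 focus ranks.
rng = np.random.default_rng(0)
d, num_base, num_ranks = 16, 3, 9
base = rng.normal(size=(num_base, d))      # base rank embeddings
stain = rng.normal(size=d)                 # stain embedding from Stage 1
weight = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)

conditioned = condition_on_stain(base, stain, weight)
stain_guided = interpolate_ranks(conditioned, num_ranks)
print(stain_guided.shape)  # (9, 16)
```

In this sketch the base embeddings are conditioned first and interpolated afterwards, following the order in which the two steps are described above. Changing `stain` changes every rank embedding, which is the property that lets one rank vocabulary adapt to different fluorophores.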

Table 2: Ablation study on FluoCLIP components on the FluoMix dataset. We evaluate the contribution of grounded stain tokens ($S$) and stain-guided rank modules ($\tilde{R}^S$).

| Type | Configuration | Acc. (%) | Step Gain | Total Gain |
|---|---|---|---|---|
| (A) | Baseline (OrdinalCLIP) | 83.12±0.41 | - | - |
| (B) | (A) + Plain Stain Token ($S^{\text{plain}}$) | 83.21±3.93 | 0.09 | - |
| (C) | (A) + Learnable Stain Token ($S^{\text{train}}$) | 81.38±0.63 | -1.74 | - |
| (D) | (A) + Grounded Stain Token ($S$) | 84.28±0.88 | 1.16 | 1.16 |
| (E) | (D) + Stain-Guided Rank Token ($\tilde{R}^S$) | 85.21±0.88 | 0.93 | 2.09 |
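The gain columns can be reproduced from the mean accuracies. The short check below assumes that Step Gain is measured against the configuration each row builds on (as named in the Configuration column) and that Total Gain is measured against baseline (A); the table does not state these definitions explicitly.

```python
# Mean accuracies (%) copied from Table 2.
acc = {"A": 83.12, "B": 83.21, "C": 81.38, "D": 84.28, "E": 85.21}

# Assumed parent of each row, read from the Configuration column.
parent = {"B": "A", "C": "A", "D": "A", "E": "D"}

# Step gain: improvement over the configuration the row extends.
step_gain = {k: round(acc[k] - acc[p], 2) for k, p in parent.items()}

# Total gain: improvement over the baseline (A), for the rows that report it.
total_gain = {k: round(acc[k] - acc["A"], 2) for k in ("D", "E")}

print(step_gain)
print(total_gain)
```

Under these assumptions the computed values agree with the table: the grounded stain token contributes 1.16 points over the baseline, and the stain-guided rank token adds a further 0.93 for a total of 2.09.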

Empirical Analysis of Stain-Dependent Focus Behavior

Across all three datasets, the mean spatial frequency (SF; see the Spatial Frequency (SF) Metric section) decreases in a generally monotonic fashion as the focus level moves from in-focus to defocused. This confirms that SF captures the overall direction of focus degradation implied by the ordinal labels, which validates its use as a quantitative sharpness indicator in the subsequent analysis.
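A check of this kind can be sketched as follows, assuming the SF value and the ordinal focus rank of every image are already available as arrays. The function names, the tolerance parameter, and the toy numbers are illustrative only and are not measurements from any of the datasets.

```python
import numpy as np


def mean_sf_per_rank(sf_values, ranks):
    """Return the sorted rank levels with the mean and std of SF at each level."""
    sf_values = np.asarray(sf_values, dtype=np.float64)
    ranks = np.asarray(ranks)
    levels = np.unique(ranks)  # np.unique returns the levels in sorted order
    means = np.array([sf_values[ranks == r].mean() for r in levels])
    stds = np.array([sf_values[ranks == r].std() for r in levels])
    return levels, means, stds


def is_monotonic_decreasing(means, tol=0.0):
    """True if no step between consecutive ranks rises by more than tol."""
    return bool(np.all(np.diff(means) <= tol))


# Toy values for illustration: rank 0 is in focus, higher ranks are more blurred.
sf = [9.0, 8.0, 6.0, 6.5, 3.0, 2.0]
rank = [0, 0, 1, 1, 2, 2]

levels, means, stds = mean_sf_per_rank(sf, rank)
print(levels, means, is_monotonic_decreasing(means))
```

A non-zero `tol` allows small local increases, which corresponds to the "generally monotonic" behavior described above rather than a strict decrease at every rank.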

We then analyze the distribution of the SF metric across different stains within each dataset. In FocusPath, which consists of bright-field H&E-stained whole-slide images, the per-stain variance is modest: stains exhibit closely aligned SF ranges and largely overlapping distributions, indicating stain-invariant focus behavior. In contrast, BBBC006 and FluoMix exhibit substantial stain-dependent variability. Different fluorescent stains occupy distinct, largely non-overlapping SF ranges, demonstrating that focus degradation is highly stain-dependent in fluorescence microscopy.

This observation supports the hypothesis that focus quality should be modeled as a function of stain characteristics rather than assuming a universal focus-rank mapping.
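One simple way to quantify how much the per-stain SF distributions overlap is to compare their interquartile ranges, as in the sketch below. This is an illustrative measure rather than the analysis used for the boxplots; the stain labels and SF values are made up, and the helper names are hypothetical.

```python
import numpy as np


def sf_quartiles_by_stain(sf_values, stains):
    """Return {stain: (q1, median, q3)} of the SF values belonging to that stain."""
    sf_values = np.asarray(sf_values, dtype=np.float64)
    stains = np.asarray(stains)
    summary = {}
    for s in np.unique(stains):
        q1, med, q3 = np.percentile(sf_values[stains == s], [25, 50, 75])
        summary[str(s)] = (float(q1), float(med), float(q3))
    return summary


def iqr_overlap(a, b):
    """Length of the overlap between two (q1, median, q3) ranges; 0.0 if disjoint."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


# Made-up SF values for two hypothetical stains.
sf = [10.0, 11.0, 12.0, 13.0, 2.0, 3.0, 4.0, 5.0]
stain = ["stain_a"] * 4 + ["stain_b"] * 4

summary = sf_quartiles_by_stain(sf, stain)
print(summary)
print(iqr_overlap(summary["stain_a"], summary["stain_b"]))
```

An overlap of zero for most stain pairs would correspond to the non-overlapping behavior described for BBBC006 and FluoMix, while large overlaps would correspond to the stain-invariant behavior described for FocusPath.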

Figure 2: Empirical analysis of stain-dependent focus behavior. (a) Mean spatial frequency (SF) versus focus rank for the three datasets; the shaded region indicates ±1 standard deviation across samples. SF decreases monotonically with increasing rank, confirming that SF reliably captures focus degradation. (b)–(d) Boxplots of SF values across stains for FocusPath, BBBC006, and FluoMix, respectively (x-axis: stain identity, y-axis: SF distribution). FocusPath shows stain-invariant SF trends, whereas BBBC006 and FluoMix display pronounced stain-dependent variability.

Spatial Frequency (SF) Metric

In our analysis, we utilize the Spatial Frequency (SF) metric as a quantitative proxy for image sharpness to analyze stain-dependent focus behavior. Higher SF values indicate sharper, more in-focus images.

Given an image $I \in \mathbb{R}^{M \times N}$, the row frequency ($RF$) and column frequency ($CF$) components are defined as:

\[RF = \sqrt{\frac{1}{(M-1)N} \sum_{i=1}^{M-1}\sum_{j=1}^{N} \left(I(i+1,j)-I(i,j)\right)^2}\]

\[CF = \sqrt{\frac{1}{M(N-1)} \sum_{i=1}^{M}\sum_{j=1}^{N-1} \left(I(i,j+1)-I(i,j)\right)^2}\]

The overall Spatial Frequency ($SF$) is then calculated as:

\[SF = \sqrt{RF^2 + CF^2}\]

Source: Image quality measures and their performance. IEEE Trans. Commun.
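The definitions above translate directly into a few lines of NumPy. The sketch below is an illustration of the formulas, not the code used for the analysis; the function name and the toy images are assumptions.

```python
import numpy as np


def spatial_frequency(image: np.ndarray) -> float:
    """Spatial Frequency (SF) of a 2-D grayscale image of shape (M, N)."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 2 or img.shape[0] < 2 or img.shape[1] < 2:
        raise ValueError("expected a 2-D image with at least 2 rows and 2 columns")
    # RF term: differences I(i+1, j) - I(i, j), an array of shape (M-1, N).
    d_rows = np.diff(img, axis=0)
    # CF term: differences I(i, j+1) - I(i, j), an array of shape (M, N-1).
    d_cols = np.diff(img, axis=1)
    # The mean of the squared differences equals the normalized double sum.
    rf = np.sqrt(np.mean(d_rows ** 2))
    cf = np.sqrt(np.mean(d_cols ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))


# Toy images: a checkerboard (maximal local contrast) and a constant image.
sharp = np.indices((8, 8)).sum(axis=0) % 2
flat = np.full((8, 8), 0.5)

print(spatial_frequency(sharp))
print(spatial_frequency(flat))
```

On the checkerboard every adjacent pixel pair differs by 1, so RF = CF = 1 and SF = √2, while the constant image has SF = 0. This matches the reading that higher SF values indicate sharper images.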

Citation

@article{park2026fluoclip,
  title={FluoCLIP: Stain-Aware Focus Quality Assessment in Fluorescence Microscopy},
  author={Park, Hyejin and Yoon, Jiwon and Park, Sumin and Kim, Suree and Jang, Sinae and Lee, Eunsoo and Kang, Dongmin and Min, Dongbo},
  journal={arXiv preprint arXiv:2602.23791},
  year={2026}
}