Strawberry Ripeness Detection Under Day-to-Night Domain Shift
A computer vision project focused on improving ripeness classification under lighting distribution shift. The system uses CycleGAN-based domain adaptation to translate daytime strawberry imagery into an LED-illuminated night domain, then evaluates whether synthetic night samples improve classifier robustness.
- Task: ripeness classification under domain shift
- Dataset: StrawDI_Db1
- Adaptation: CycleGAN (unpaired image-to-image)
- Stack: Python · PyTorch
- Year: 2026
Overview
Soft-fruit harvesting robots operate in polytunnels that are bright and naturally lit during the day and switch to LED illumination at night. A ripeness classifier trained only on daytime imagery often degrades sharply once it sees the cooler, spectrally narrower LED domain. The engineering question is whether the model can be adapted without requiring a fully labelled night dataset.
I built a generative domain adaptation pipeline that trains a CycleGAN to translate day imagery into a night style, uses the translated samples during classifier training, and evaluates performance on held-out data using accuracy, precision, recall and confusion matrices.
Problem framing
The interesting failure here is not the average-case error on a clean test set; it is the brittleness under distribution shift. Two framings drive the work:
- Distribution shift, not simple noise. LED illumination changes the spectral content of the light, the colour balance, the contrast envelope and the highlight distribution. It is closer to a coordinated covariate shift than to additive noise, so standard brightness or saturation jitter is not enough on its own.
- Label scarcity is the real constraint. Capturing a high-quality night dataset of ripe / unripe strawberries on a moving robot is expensive and slow. A method that reuses existing daytime labels is much more deployable.
Dataset
StrawDI_Db1 is a public strawberry imagery dataset used as the source domain. Images are cropped and pre-processed into ripeness-classification inputs. Splits are constructed at the source-image level (not the patch level) so that the same physical fruit cannot leak between train, validation and test.
The night domain is built two ways: (a) a small target-domain unlabelled set under LED-style lighting, used by CycleGAN as the "target" for unpaired translation, and (b) a held-out evaluation set for measuring true post-adaptation performance. Class imbalance is handled at the loss / sampling level rather than by oversampling alone.
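The source-image-level split can be sketched in a few lines. This is a minimal stdlib-only illustration, not the repository's actual loader: the `(patch_id, source_image_id, label)` record shape is a hypothetical stand-in for whatever metadata the real pre-processing emits. The key property is that patches are grouped by their source image before shuffling, so one physical fruit can never straddle two splits.

```python
import random
from collections import defaultdict

def split_by_source_image(patch_records, seed=0, frac=(0.7, 0.15, 0.15)):
    """Split patch records into train/val/test at source-image level.

    patch_records: iterable of (patch_id, source_image_id, label) tuples
    (an illustrative record shape). Grouping by source_image_id guarantees
    the same physical fruit never leaks between splits.
    """
    by_source = defaultdict(list)
    for rec in patch_records:
        by_source[rec[1]].append(rec)

    sources = sorted(by_source)
    random.Random(seed).shuffle(sources)  # shuffle source images, not patches

    n = len(sources)
    n_train = int(frac[0] * n)
    n_val = int(frac[1] * n)
    train_src = sources[:n_train]
    val_src = sources[n_train:n_train + n_val]
    test_src = sources[n_train + n_val:]

    pick = lambda srcs: [r for s in srcs for r in by_source[s]]
    return pick(train_src), pick(val_src), pick(test_src)
```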
Approach
- Source-only baseline. A ripeness classifier trained on daytime StrawDI_Db1 and evaluated directly on the night domain. This creates the no-adaptation reference point for measuring the cost of distribution shift.
- CycleGAN day → night translation. Two generators (G_D→N and G_N→D) and two discriminators are trained with adversarial and cycle-consistency losses on unpaired day and night sets. The cycle loss is what makes the translation usable for downstream learning, because it encourages the generator to preserve fruit-level structure rather than changing the label.
- Adapted classifier. Trained on a mix of original daytime imagery and CycleGAN-translated "synthetic night" imagery, with the daytime labels carried across via cycle-consistent translation.
- Evaluation. Both classifiers compared on the real night evaluation set using accuracy, precision, recall, F1 and per-class confusion matrices.
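The adapted classifier's data mix can be sketched with standard PyTorch dataset utilities. The tensors here are placeholders: `synthetic_night` stands in for the output of the day→night generator, and the daytime labels are simply reused for the translated copies, which is valid precisely because cycle-consistent translation preserves the fruit's ripeness.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder tensors for illustration; the real pipeline loads image patches.
day_images = torch.rand(8, 3, 64, 64)
day_labels = torch.randint(0, 2, (8,))
synthetic_night = torch.rand(8, 3, 64, 64)  # would be G_D2N(day_images)

day_ds = TensorDataset(day_images, day_labels)
night_ds = TensorDataset(synthetic_night, day_labels)  # labels carried across

# Train on the union of real day and synthetic night samples.
mixed = ConcatDataset([day_ds, night_ds])
loader = DataLoader(mixed, batch_size=4, shuffle=True)
```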
Why CycleGAN
CycleGAN is appropriate here for one specific reason: the day and night sets are unpaired. There are no exact day/night photo pairs of the same fruit at the same angle in a realistic field workflow. Methods that require paired data are therefore a poor fit for the data collection constraints.
The cycle-consistency constraint provides the missing supervisory signal: a fruit translated day → night → day must look like the original fruit. That is what stops the generator from converting unripe strawberries into ripe ones to fool the discriminator. Without that constraint, translation artefacts could silently damage downstream classifier accuracy.
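The cycle term itself is a simple L1 reconstruction penalty on the day → night → day round trip. A minimal sketch, with 1×1 convolutions standing in for the real ResNet-style CycleGAN generators and the conventional weighting of the cycle term as an assumed default:

```python
import torch
import torch.nn as nn

# Toy stand-ins for G_D→N and G_N→D; real CycleGAN generators are ResNet-based.
g_day2night = nn.Conv2d(3, 3, kernel_size=1)
g_night2day = nn.Conv2d(3, 3, kernel_size=1)

def cycle_loss(day_batch, lam=10.0):
    """L1 cycle-consistency term: day -> night -> day must reconstruct the input.

    lam is the usual CycleGAN-style weight on the cycle term relative to the
    adversarial losses (an illustrative default here).
    """
    fake_night = g_day2night(day_batch)
    recon_day = g_night2day(fake_night)
    return lam * nn.functional.l1_loss(recon_day, day_batch)
```

In full training this term is added to the adversarial losses for both translation directions; shown alone here because it is the piece that protects the label.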
Pipeline
1. StrawDI_Db1: daytime source domain
2. Unlabelled night set: LED target domain
3. CycleGAN: unpaired day ↔ night translation
4. Cycle-consistent labels: carry day annotations over
5. Augmentation: jitter / crop / flip
6. Ripeness classifier: PyTorch CNN backbone
7. Held-out night eval: real LED imagery
8. Confusion matrices: per-class diagnosis
Each stage is testable in isolation. The CycleGAN is evaluated on translation quality (FID-style sanity checks and cycle reconstruction error) independently of the downstream classifier, which reduces the risk of one stage hiding failure modes in another.
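One such stage-isolated check is the cycle reconstruction error: the mean per-image L1 error of the day → night → day round trip, computed per translator checkpoint without touching the classifier. A minimal sketch (function name illustrative):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mean_cycle_error(g_d2n, g_n2d, day_batch):
    """Translator-only sanity check: per-image L1 error of the
    day -> night -> day round trip. Lower is better; a rising value
    across checkpoints flags a generator drifting away from
    structure-preserving translation."""
    recon = g_n2d(g_d2n(day_batch))
    return (recon - day_batch).abs().mean(dim=(1, 2, 3))  # one scalar per image
```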
Evaluation protocol
Train / val / test splits are constructed at source-image level so that the same fruit instance cannot appear in two splits. The night evaluation set is fully held out from CycleGAN training, which is essential because the translator must not have seen the test backgrounds during training.
Metrics are accuracy, precision, recall, macro-F1, and per-class confusion matrices. The confusion matrix is the most interesting artefact here because it reveals whether the source-only baseline fails by collapsing classes (e.g. calling everything ripe under LED) or by swapping them, which have different operational consequences for a harvesting robot.
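The per-class metrics fall straight out of the confusion counts. A pure-Python sketch of that arithmetic (real runs would typically use `sklearn.metrics`, but the computation is the same):

```python
from collections import Counter

def confusion_and_metrics(y_true, y_pred, classes=("unripe", "ripe")):
    """Confusion counts plus per-class precision/recall/F1.

    Illustrative binary-ripeness labels; the same code handles more classes.
    """
    cm = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    metrics = {}
    for c in classes:
        tp = cm[(c, c)]
        fp = sum(cm[(t, c)] for t in classes if t != c)  # predicted c, wrongly
        fn = sum(cm[(c, p)] for p in classes if p != c)  # true c, missed
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics[c] = {"precision": prec, "recall": rec, "f1": f1}
    return cm, metrics
```

Reading the off-diagonal cells of `cm` is exactly the collapse-vs-swap diagnosis described above.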
Three conditions are compared: source-only, source + colour-jitter augmentation, and source + CycleGAN-translated samples. The colour-jitter ablation is included specifically to defend the use of CycleGAN. If a cheap photometric augmentation closes the gap, the generative pipeline is overkill.
Numbers reported in any write-up are from a single, fixed evaluation protocol with disjoint splits and no test-set leakage into the CycleGAN. The full result table belongs with the repository, where the splits, seeds and implementation can be inspected.
Findings
- Source-only models trained on daytime imagery degrade visibly under the LED domain. The main failure pattern is misclassifying borderline-ripe fruit as unripe when the LED spectrum compresses red contrast.
- Naive photometric augmentations (brightness / saturation / hue jitter) help marginally but do not close the day → night gap. They cannot reproduce the combination of colour-balance shift, specular highlights and ambient cast that CycleGAN learns from real night imagery.
- CycleGAN-driven adaptation recovers the bulk of the lost performance using only daytime labels and an unlabelled night image set. This is the operationally important property, because the labelling cost on a moving robot is the actual bottleneck.
- Translation quality matters more than translation quantity. Adding more poorly translated synthetic samples destabilises classifier training; a smaller, more cycle-consistent set does better.
Engineering notes
- Training is reproducible via fixed seeds for the CycleGAN and the downstream classifier, with config files under version control alongside the code.
- CycleGAN training is checkpointed every N epochs and the classifier is evaluated against multiple translator checkpoints. This guards against the classic GAN failure mode where the "best-looking" generator is not actually the best for downstream tasks.
- All augmentations are deterministic in evaluation mode. Random flips / crops in test time would inflate apparent accuracy.
- The pipeline is designed to fail loudly: missing dataset folders, unbalanced splits or NaN losses raise immediately rather than silently corrupting results.
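The fail-loudly checks amount to a handful of pre-flight assertions. A stdlib-only sketch, with the split directory names as illustrative assumptions rather than the repository's actual layout:

```python
import math
from pathlib import Path

def check_dataset_layout(root, required=("day_train", "night_unlabelled", "night_eval")):
    """Fail loudly before any training starts: every expected split
    directory must exist and be non-empty. Folder names here are
    illustrative, not the repo's actual layout."""
    root = Path(root)
    for name in required:
        d = root / name
        if not d.is_dir() or not any(d.iterdir()):
            raise FileNotFoundError(f"missing or empty dataset split: {d}")

def check_loss(value, step):
    """Abort immediately on NaN/inf losses rather than silently
    writing corrupted checkpoints."""
    if not math.isfinite(value):
        raise FloatingPointError(f"non-finite loss {value!r} at step {step}")
```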
Limitations
- CycleGAN translations are an approximation of the night domain, not the night domain. There will be some classes of artefact (specular highlights on wet fruit, sensor-specific colour casts) that the translator under-models.
- The classifier is binary or coarse-grained ripeness. Real harvesting decisions need calibrated multi-stage outputs with confidence intervals, not just a label.
- Evaluation is offline. On-robot inference cost, latency under image bursts, and full harvesting impact are not measured here. Those belong to a hardware-in-the-loop study.
- No fairness across cultivars / farms is claimed. Generalisation across farms is the next test.
Link to the Dogtooth dissertation
This project sits upstream of the MSc dissertation with Dogtooth Technologies. The dissertation focuses on sensor-based collision intelligence rather than vision, but the underlying engineering question is the same: how do you keep a model trustworthy when the deployment environment differs from the training data? The strawberry case study is a clean, well-bounded version of that question.
Future work
- Replace CycleGAN with newer diffusion-based unpaired translators (e.g. cycle-consistent diffusion) and compare against the GAN baseline on the same night evaluation set.
- Move from binary ripeness to a calibrated multi-stage output with proper reliability diagrams, so the harvest decision can use confidence as an input.
- Cross-farm evaluation: train on one farm, evaluate on another, with and without per-farm CycleGAN fine-tuning.
- Quantised on-device inference benchmarks on the kind of compute that actually ships on a harvesting robot.