Strawberry Ripeness Detection Under Day-to-Night Domain Shift
A computer vision project focused on improving ripeness classification under lighting distribution shift. The system uses CycleGAN-based domain adaptation to translate daytime strawberry imagery into an LED-illuminated night domain, then evaluates whether synthetic night samples improve classifier robustness.
- Task: ripeness classification under domain shift
- Dataset: StrawDI_Db1
- Adaptation: CycleGAN (unpaired image-to-image)
- Stack: Python · PyTorch
- Year: 2026
Overview
Soft-fruit harvesting robots operate in polytunnels that are bright and naturally lit during the day and switch to LED illumination at night. A ripeness classifier trained only on daytime imagery often degrades sharply once it sees the cooler, spectrally narrower LED domain. The engineering question is whether the model can be adapted without requiring a fully labelled night dataset.
I built a generative domain adaptation pipeline that trains a CycleGAN to translate day imagery into a night style, uses the translated samples during classifier training, and evaluates performance on held-out data using accuracy, precision, recall and confusion matrices.
Problem framing
The interesting failure here is not the average-case error on a clean test set; it is the brittleness under distribution shift. Two framings drive the work:
- Distribution shift, not simple noise. LED illumination changes the spectral content of the light, the colour balance, the contrast envelope and the highlight distribution. It is closer to a coordinated covariate shift than to additive noise, so standard brightness or saturation jitter is not enough on its own.
- Label scarcity is the real constraint. Capturing a high-quality night dataset of ripe / unripe strawberries on a moving robot is expensive and slow. A method that reuses existing daytime labels is much more deployable.
Dataset
StrawDI_Db1 is a public strawberry imagery dataset used as the source domain. Images are cropped and pre-processed into ripeness-classification inputs. Splits are constructed at the source-image level (not the patch level) so that the same physical fruit cannot leak between train, validation and test.
The night domain is built two ways: (a) a small target-domain unlabelled set under LED-style lighting, used by CycleGAN as the "target" for unpaired translation, and (b) a held-out evaluation set for measuring true post-adaptation performance. Class imbalance is handled at the loss / sampling level rather than by oversampling alone.
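The source-image-level split can be sketched in a few lines. This is a minimal stdlib-only illustration, not the repository's actual loader: the `(patch_id, source_image_id, label)` record shape is a hypothetical stand-in for whatever metadata the real pre-processing emits. The key property is that patches are grouped by their source image before shuffling, so one physical fruit can never straddle two splits.

```python
import random
from collections import defaultdict

def split_by_source_image(patch_records, seed=0, frac=(0.7, 0.15, 0.15)):
    """Split patch records into train/val/test at source-image level.

    patch_records: iterable of (patch_id, source_image_id, label) tuples
    (an illustrative record shape). Grouping by source_image_id guarantees
    the same physical fruit never leaks between splits.
    """
    by_source = defaultdict(list)
    for rec in patch_records:
        by_source[rec[1]].append(rec)

    sources = sorted(by_source)
    random.Random(seed).shuffle(sources)  # shuffle source images, not patches

    n = len(sources)
    n_train = int(frac[0] * n)
    n_val = int(frac[1] * n)
    train_src = sources[:n_train]
    val_src = sources[n_train:n_train + n_val]
    test_src = sources[n_train + n_val:]

    pick = lambda srcs: [r for s in srcs for r in by_source[s]]
    return pick(train_src), pick(val_src), pick(test_src)
```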
Approach
- Source-only baseline. A ripeness classifier trained on daytime StrawDI_Db1 and evaluated directly on the night domain. This creates the no-adaptation reference point for measuring the cost of distribution shift.
- CycleGAN day → night translation. Two generators (G_D→N and G_N→D) and two discriminators are trained with adversarial and cycle-consistency losses on unpaired day and night sets. The cycle loss is what makes the translation usable for downstream learning, because it encourages the generator to preserve fruit-level structure rather than changing the label.
- Adapted classifier. Trained on a mix of original daytime imagery and CycleGAN-translated "synthetic night" imagery, with the daytime labels carried across via cycle-consistent translation.
- Evaluation. Both classifiers compared on the real night evaluation set using accuracy, precision, recall, F1 and per-class confusion matrices.
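The adapted classifier's data mix can be sketched with standard PyTorch dataset utilities. The tensors here are placeholders: `synthetic_night` stands in for the output of the day→night generator, and the daytime labels are simply reused for the translated copies, which is valid precisely because cycle-consistent translation preserves the fruit's ripeness.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder tensors for illustration; the real pipeline loads image patches.
day_images = torch.rand(8, 3, 64, 64)
day_labels = torch.randint(0, 2, (8,))
synthetic_night = torch.rand(8, 3, 64, 64)  # would be G_D2N(day_images)

day_ds = TensorDataset(day_images, day_labels)
night_ds = TensorDataset(synthetic_night, day_labels)  # labels carried across

# Train on the union of real day and synthetic night samples.
mixed = ConcatDataset([day_ds, night_ds])
loader = DataLoader(mixed, batch_size=4, shuffle=True)
```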
Why CycleGAN
CycleGAN is appropriate here for one specific reason: the day and night sets are unpaired. There are no exact day/night photo pairs of the same fruit at the same angle in a realistic field workflow. Methods that require paired data are therefore a poor fit for the data collection constraints.
The cycle-consistency constraint provides the missing supervisory signal: a fruit translated day → night → day must look like the original fruit. That is what stops the generator from converting unripe strawberries into ripe ones to fool the discriminator. Without that constraint, translation artefacts could silently damage downstream classifier accuracy.
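The cycle term itself is a simple L1 reconstruction penalty on the day → night → day round trip. A minimal sketch, with 1×1 convolutions standing in for the real ResNet-style CycleGAN generators and the conventional weighting of the cycle term as an assumed default:

```python
import torch
import torch.nn as nn

# Toy stand-ins for G_D→N and G_N→D; real CycleGAN generators are ResNet-based.
g_day2night = nn.Conv2d(3, 3, kernel_size=1)
g_night2day = nn.Conv2d(3, 3, kernel_size=1)

def cycle_loss(day_batch, lam=10.0):
    """L1 cycle-consistency term: day -> night -> day must reconstruct the input.

    lam is the usual CycleGAN-style weight on the cycle term relative to the
    adversarial losses (an illustrative default here).
    """
    fake_night = g_day2night(day_batch)
    recon_day = g_night2day(fake_night)
    return lam * nn.functional.l1_loss(recon_day, day_batch)
```

In full training this term is added to the adversarial losses for both translation directions; shown alone here because it is the piece that protects the label.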
Pipeline
1. StrawDI_Db1: daytime source domain
2. Unlabelled night set: LED target domain
3. CycleGAN: unpaired day ↔ night translation
4. Cycle-consistent labels: carry day annotations over
5. Augmentation: jitter / crop / flip
6. Ripeness classifier: PyTorch CNN backbone
7. Held-out night eval: real LED imagery
8. Confusion matrices: per-class diagnosis
Each stage is testable in isolation. The CycleGAN is evaluated on translation quality (FID-style sanity checks and cycle reconstruction error) independently of the downstream classifier, which reduces the risk of one stage hiding failure modes in another.
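One such stage-isolated check is the cycle reconstruction error: the mean per-image L1 error of the day → night → day round trip, computed per translator checkpoint without touching the classifier. A minimal sketch (function name illustrative):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mean_cycle_error(g_d2n, g_n2d, day_batch):
    """Translator-only sanity check: per-image L1 error of the
    day -> night -> day round trip. Lower is better; a rising value
    across checkpoints flags a generator drifting away from
    structure-preserving translation."""
    recon = g_n2d(g_d2n(day_batch))
    return (recon - day_batch).abs().mean(dim=(1, 2, 3))  # one scalar per image
```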
Evaluation protocol
Train / val / test splits are constructed at source-image level so that the same fruit instance cannot appear in two splits. The night evaluation set is fully held out from CycleGAN training, which is essential because the translator must not have seen the test backgrounds during training.
Metrics are accuracy, precision, recall, macro-F1, and per-class confusion matrices. The confusion matrix is the most interesting artefact here because it reveals whether the source-only baseline fails by collapsing classes (e.g. calling everything ripe under LED) or by swapping them, which have different operational consequences for a harvesting robot.
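The per-class metrics fall straight out of the confusion counts. A pure-Python sketch of that arithmetic (real runs would typically use `sklearn.metrics`, but the computation is the same):

```python
from collections import Counter

def confusion_and_metrics(y_true, y_pred, classes=("unripe", "ripe")):
    """Confusion counts plus per-class precision/recall/F1.

    Illustrative binary-ripeness labels; the same code handles more classes.
    """
    cm = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    metrics = {}
    for c in classes:
        tp = cm[(c, c)]
        fp = sum(cm[(t, c)] for t in classes if t != c)  # predicted c, wrongly
        fn = sum(cm[(c, p)] for p in classes if p != c)  # true c, missed
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics[c] = {"precision": prec, "recall": rec, "f1": f1}
    return cm, metrics
```

Reading the off-diagonal cells of `cm` is exactly the collapse-vs-swap diagnosis described above.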
Three conditions are compared: source-only, source + colour-jitter augmentation, and source + CycleGAN-translated samples. The colour-jitter ablation is included specifically to defend the use of CycleGAN. If a cheap photometric augmentation closes the gap, the generative pipeline is overkill.
Numbers reported in any write-up are from a single, fixed evaluation protocol with disjoint splits and no test-set leakage into the CycleGAN. The full result table belongs with the repository, where the splits, seeds and implementation can be inspected.
Findings
- Source-only models trained on daytime imagery degrade visibly under the LED domain. The main failure pattern is misclassifying borderline-ripe fruit as unripe when the LED spectrum compresses red contrast.
- Naive photometric augmentations (brightness / saturation / hue jitter) help marginally but do not close the day → night gap. They cannot reproduce the combination of colour-balance shift, specular highlights and ambient cast that CycleGAN learns from real night imagery.
- CycleGAN-driven adaptation recovers the bulk of the lost performance using only daytime labels and an unlabelled night image set. This is the operationally important property, because the labelling cost on a moving robot is the actual bottleneck.
- Translation quality matters more than translation quantity. Adding more poorly translated synthetic samples destabilises classifier training; a smaller, more cycle-consistent set does better.
Engineering notes
- Training is reproducible via fixed seeds for the CycleGAN and the downstream classifier, with config files under version control alongside the code.
- CycleGAN training is checkpointed every N epochs and the classifier is evaluated against multiple translator checkpoints. This guards against the classic GAN failure mode where the "best-looking" generator is not actually the best for downstream tasks.
- All augmentations are deterministic in evaluation mode. Random flips / crops in test time would inflate apparent accuracy.
- The pipeline is designed to fail loudly: missing dataset folders, unbalanced splits or NaN losses raise immediately rather than silently corrupting results.
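The fail-loudly checks amount to a handful of pre-flight assertions. A stdlib-only sketch, with the split directory names as illustrative assumptions rather than the repository's actual layout:

```python
import math
from pathlib import Path

def check_dataset_layout(root, required=("day_train", "night_unlabelled", "night_eval")):
    """Fail loudly before any training starts: every expected split
    directory must exist and be non-empty. Folder names here are
    illustrative, not the repo's actual layout."""
    root = Path(root)
    for name in required:
        d = root / name
        if not d.is_dir() or not any(d.iterdir()):
            raise FileNotFoundError(f"missing or empty dataset split: {d}")

def check_loss(value, step):
    """Abort immediately on NaN/inf losses rather than silently
    writing corrupted checkpoints."""
    if not math.isfinite(value):
        raise FloatingPointError(f"non-finite loss {value!r} at step {step}")
```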
Limitations
- CycleGAN translations are an approximation of the night domain, not the night domain. There will be some classes of artefact (specular highlights on wet fruit, sensor-specific colour casts) that the translator under-models.
- The classifier is binary or coarse-grained ripeness. Real harvesting decisions need calibrated multi-stage outputs with confidence intervals, not just a label.
- Evaluation is offline. On-robot inference cost, latency under image bursts, and full harvesting impact are not measured here. Those belong to a hardware-in-the-loop study.
- No fairness across cultivars / farms is claimed. Generalisation across farms is the next test.
Link to the Dogtooth dissertation
This project sits upstream of the MSc dissertation with Dogtooth Technologies. The dissertation focuses on sensor-based collision intelligence rather than vision, but the underlying engineering question is the same: how do you keep a model trustworthy when the deployment environment differs from the training data? The strawberry case study is a clean, well-bounded version of that question.
Future work
- Replace CycleGAN with newer diffusion-based unpaired translators (e.g. cycle-consistent diffusion) and compare against the GAN baseline on the same night evaluation set.
- Move from binary ripeness to a calibrated multi-stage output with proper reliability diagrams, so the harvest decision can use confidence as an input.
- Cross-farm evaluation: train on one farm, evaluate on another, with and without per-farm CycleGAN fine-tuning.
- Quantised on-device inference benchmarks on the kind of compute that actually ships on a harvesting robot.