Datasets | UDC-VIT
Despite extensive research on UDC images and their restoration models, UDC videos remain largely unexplored. Two UDC video datasets exist, but they primarily capture unrealistic or synthetic UDC degradation rather than real-world degradation. In this paper, we propose a real-world UDC video dataset called UDC-VIT. Unlike existing datasets, only UDC-VIT exclusively includes human motions intended for facial recognition.
Ideally, we would compare UDC-VIT with the two existing UDC video datasets, PexelsUDC and VidUDC33K. However, since PexelsUDC is not publicly available, we use the P-OLED dataset from which it was created. The table below summarizes the eight previous UDC datasets.
Transmittance decrease and digital noise in the UDC setting
The camera sensor amplifies both the desired signal and unwanted noise in low-light conditions. In the UDC setting, where the sensor sits beneath the display panel, transmittance decreases, leading to amplified noise. The P-OLED dataset, captured in a controlled setting, exhibits unrealistic noise and an excessive transmittance decrease. Similarly, in the VidUDC33K dataset, the degraded frame’s noise level is somewhat lower than that of the ground truth. In contrast, UDC-VIT accurately captures the actual transmittance decrease and the digital noise that results from quantizing the image signal.
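The degradation chain described above can be sketched in a few lines: attenuate light by the panel's transmittance, add sensor read noise, apply digital gain to restore brightness (which amplifies the noise), and quantize. This is our illustrative sketch, not the UDC-VIT capture pipeline; the function name and parameter values are assumptions chosen for clarity.

```python
import numpy as np

def simulate_udc_low_light(gt, transmittance=0.2, read_noise_std=2.0, bits=8, rng=None):
    """Toy model (not the UDC-VIT pipeline): the display panel attenuates
    incoming light by `transmittance`; restoring brightness via digital gain
    amplifies the sensor noise, and quantization adds digital noise."""
    rng = np.random.default_rng() if rng is None else rng
    gt = gt.astype(np.float64)                      # ground-truth frame, values in [0, 255]
    attenuated = gt * transmittance                 # light loss through the panel
    noisy = attenuated + rng.normal(0.0, read_noise_std, gt.shape)  # sensor read noise
    amplified = noisy / transmittance               # digital gain restores brightness, amplifying noise
    levels = 2 ** bits - 1                          # quantize to the sensor's bit depth
    quantized = np.round(np.clip(amplified, 0, 255) / 255 * levels) / levels * 255
    return quantized.astype(np.uint8)
```

With a transmittance of 0.2, a read noise of standard deviation 2 becomes an effective noise of standard deviation 10 after gain, which is why low transmittance is so damaging in low light.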
Visual comparison
[Video comparison: VidUDC33K (GT) vs. UDC-VIT (GT); VidUDC33K (Input) vs. UDC-VIT (Input)]
The UDC’s unique flare characteristics
UDC flares arise from light diffraction as it passes through the display panel above the digital camera lens. Thus, it is essential for each frame in the UDC video dataset to precisely depict the UDC’s unique flare characteristics, including spatially variant flares, light source variant flares, and temporally variant flares. The P-OLED dataset rarely exhibits flares as it captures images displayed on a monitor in a controlled environment.
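Flare from diffraction through the panel is commonly modeled as convolving the scene with the panel's point spread function (PSF). The sketch below is our own toy illustration with a symmetric hypothetical kernel, not a measured UDC PSF or the pipeline of any dataset discussed here.

```python
import numpy as np

def diffraction_flare(scene, psf):
    """Toy model: spread light from each scene point according to `psf`.
    For the symmetric kernels used here, this correlation equals convolution.
    `psf` stands in for a measured diffraction PSF, which would vary with
    the light source position and over time in a real UDC."""
    h, w = psf.shape
    pad_h, pad_w = h // 2, w // 2
    padded = np.pad(scene, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    out = np.zeros_like(scene, dtype=np.float64)
    for i in range(scene.shape[0]):
        for j in range(scene.shape[1]):
            out[i, j] = np.sum(padded[i:i + h, j:j + w] * psf)
    return out
```

A bright point source convolved with such a kernel spreads into its neighborhood, which is the basic mechanism behind the flare patterns compared above; spatially, source-, and temporally variant flares correspond to the PSF changing with position, light source, and time.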
Visual comparison
[Video comparison: VidUDC33K (GT) vs. UDC-VIT (GT); VidUDC33K (Input) vs. UDC-VIT (Input)]
UDC-VIT stands out from other datasets by featuring videos tailored for face recognition. Some datasets, such as T-OLED/P-OLED, SYNTH, and VidUDC33K, include only limited human representations, often too small or captured from angles unsuitable for face recognition. Zhifeng et al. introduce still-image datasets for face recognition. However, these datasets are generated by a GAN-based model trained on the P-OLED dataset, which does not adequately simulate realistic UDC degradation, most notably lacking flare. Additionally, these datasets are not publicly available. In contrast, UDC-VIT prominently features humans in 64.6% of its videos (approved by the Institutional Review Board (IRB)), with various motions (e.g., hand waving, thumbs-up, body swaying, and walking) performed by 22 carefully selected subjects and captured from different angles. Users of the UDC-VIT dataset must obtain IRB approval in accordance with their country's laws and use the dataset solely for research.
[Video comparison: UDC-VIT GT vs. Input for the hand-waving and thumbs-up motions]
The VidUDC33K dataset often presents unrealistic scenarios, for example:
- Case 1: The degraded frames lack UDC flares, instead displaying flares resembling the typical lens flares seen in the ground-truth frames.
- Case 2: Flares appear in improbable situations.
- Case 3: Unintended white artifacts exist both in the ground-truth and degraded videos.
- Case 4: Some videos show darkened and nearly featureless degraded frames (except for the first frame) due to the simulation of dynamic changes of the PSF.
- Case 5: Some videos contribute little to research, calling their relevance into question.
For a detailed explanation, please see the supplementary material (pdf).
[Video comparison: VidUDC33K GT vs. Input for Case 1 and Case 2]