Master's thesis
Deep Learning for Data Imputation in Oceanography
Remote sensing provides essential data for monitoring ocean color and phytoplankton, which are important indicators of marine ecosystem health. However, missing data is a common issue in these observations, and addressing it is necessary to gain a complete understanding of ocean dynamics.
I explored how a transformer-based model can impute missing values in variables such as sea surface temperature, chlorophyll-a, and phytoplankton size classes. The work focused on a high-resolution (1/24°) dataset over the Gulf Stream region in which, on average, 80% of the data are missing.
Because no ground truth exists for most missing regions, the imputation task is inherently ill-posed, unlike data assimilation, which integrates observations with a known dynamical model. To evaluate the method, I simulated missing-data patterns on sea surface temperature fields where ground truth is available, which allowed controlled experiments to assess reconstruction quality. The model uses self-attention to capture spatial, temporal, and multivariate correlations in 3D oceanographic data, making the approach a promising tool for oceanographic research.
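The evaluation idea described above (hide values that are actually known, reconstruct them, then score only the hidden points) can be sketched as follows. This is a minimal illustration, not the thesis code: the function names `simulate_missing`, `masked_rmse`, and `mean_fill` are hypothetical, the SST cube is synthetic, the mask is uniform random rather than a realistic cloud-like pattern, and the mean-fill baseline merely stands in for whatever imputation model is being tested.

```python
import numpy as np

def simulate_missing(field, missing_fraction, rng):
    """Hide a random fraction of the observed (non-NaN) values.

    Assumption: uniform random masking, a simplification of real gaps.
    """
    observed = ~np.isnan(field)
    hide = observed & (rng.random(field.shape) < missing_fraction)
    corrupted = field.copy()
    corrupted[hide] = np.nan
    return corrupted, hide

def masked_rmse(truth, reconstruction, hide):
    """RMSE computed only over the artificially hidden points."""
    diff = reconstruction[hide] - truth[hide]
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_fill(corrupted):
    """Placeholder imputer: fill every gap with the mean of the visible values."""
    filled = corrupted.copy()
    filled[np.isnan(filled)] = np.nanmean(corrupted)
    return filled

rng = np.random.default_rng(0)

# Synthetic stand-in for an SST cube with axes (time, lat, lon).
t, y, x = np.meshgrid(np.arange(4), np.arange(16), np.arange(16), indexing="ij")
sst = 15.0 + 0.3 * y + 0.1 * x + 0.5 * np.sin(t)

# Hide roughly 80% of the values, matching the missing rate quoted above.
corrupted, hide = simulate_missing(sst, 0.8, rng)

baseline = mean_fill(corrupted)
score = masked_rmse(sst, baseline, hide)
print(f"hidden fraction: {hide.mean():.2f}, baseline RMSE: {score:.3f}")
```

Scoring only the hidden points keeps the metric from being inflated by values the imputer could simply copy from its input. Any candidate model can replace `mean_fill` as long as it returns an array of the same shape.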