This research consists of 2 papers offering differing reconstruction approaches to obtain high accuracy inferences of 3D structures with textures from 2D single shot images.
The papers strive to challenge existing works surrounding three-dimensional reconstruction with shape and texture from only one view, which currently remain at two-dimensional shot level, obstructing obtainment of high-accuracy inference without explicit 3D supervised annotation.
On one hand, Guanyu aims to suggest an urgently-needed reconstruction approach based on meshes to enhance the 3D mesh attribute learning process, by proposing a novel and concise reconstruction model in computer vision, training shape and surface colour.
The model is realised through 2 separate pipelines by CNN:
- Geometric shape process learning by shape volume.
- The encoding-decoding system of colour, which is unified by regressed colour volume and transforming 3D to 2D flow field, respectively.
The visualisation achieved by Guanyu is a result of blending these 2 factors with appropriate weighting. To generate textured models, the one-to-one mapping fetches colours through a three-channel volume and the same space measurement as the shape network.
Illustration of Point Cloud Reconstruction of Amber’s Model
Meanwhile, Amber aims to create an end-to-end single view 2D to 3D reconstruction model by adopting an encoder-decoder network to use:
- A transformer-CNN encoder for feature extraction
- A decoder for shape and surface learning to generate a point cloud reconstruction
Reconstruction Illustration on Real World Image
Consequently, the research demonstrates the proficiency of using a hybrid encoder that consists of CNN and transformer, proving successful even when trained on a small dataset.