This research consists of 2 papers offering differing reconstruction approaches to obtain high accuracy inferences of 3D structures with textures from 2D single shot images.

The papers strive to challenge existing works surrounding three-dimensional reconstruction with shape and texture from only one view, which currently remain at two-dimensional shot level, obstructing obtainment of high-accuracy inference without explicit 3D supervised annotation.

On one hand, Guanyu aims to suggest an urgently-needed reconstruction approach based on meshes to enhance the 3D mesh attribute learning process, by proposing a novel and concise reconstruction model in computer vision, training shape and surface colour.

The model is realised through 2 separate pipelines by CNN:

  1. Geometric shape process learning by shape volume.
  2. The encoding-decoding system of colour, which is unified by regressed colour volume and transforming 3D to 2D flow field, respectively.

The visualisation achieved by Guanyu is a result of blending these 2 factors with appropriate weighting. To generate textured models, the one-to-one mapping fetches colours through a three-channel volume and the same space measurement as the shape network.

Illustration of Point Cloud Reconstruction of Amber’s Model

Meanwhile, Amber aims to create an end-to-end single view 2D to 3D reconstruction model by adopting an encoder-decoder network to use:

  1. A transformer-CNN encoder for feature extraction
  2. A decoder for shape and surface learning to generate a point cloud reconstruction

Reconstruction Illustration on Real World Image

Consequently, the research demonstrates the proficiency of using a hybrid encoder that consists of CNN and transformer, proving successful even when trained on a small dataset.