This is the sixth and final article in the series “A Scanning Dream”. Here we will discuss how to use the reconstructed object or scene, since this part doesn’t really depend on the scanning method. We will also come back to the dream described in the first article: can it be achieved today?
As we’ve seen in previous articles, all known methods of 3D reconstruction produce high-poly meshes and highly detailed diffuse maps with a mostly useless UV map. Besides the high polygon count, models also suffer from visual artifacts caused by non-ideal scanning conditions and the physical limitations of the sensors. All of this makes it hard to use a 3D reconstructed model as-is in real-world production, for example on mobile VR platforms like Oculus Quest (such platforms have a limit of around 100 000 polygons per frame). So reconstructed models require post-processing before they can be used.
Traditional way
The traditional post-processing approach has the following steps:
- Retopology from high-poly to low-poly model.
- Building an albedo map.
- Building normal, height, and ambient occlusion maps.
- Building roughness and metallic maps.
Retopology
Retopology is the procedure of creating a low-detail mesh based on a high-poly model, with the missing details baked into textures. Retopology can be done before the final texturing phase of 3D reconstruction. Experienced professional 3D modelers usually do this job with Blender, ZBrush, or similar programs. There are also automatic mesh simplification methods, but they generally can’t use semantic and context information while simplifying, so they can’t match the quality of a professional 3D modeler. The most interesting experimental Open Source tool for automatic retopology is Instant Meshes, but it is not capable of building a UV map. The algorithms behind Instant Meshes are used in other projects as well, for example in Modo.
After retopology, you get not only the low-poly model but also a series of models that gradually differ in polygon count (LOD - Level of Detail), so you can use them efficiently in traditional rendering engines like Unreal Engine 4 and Unity. In most cases you also get an efficient UV map along with the low-poly meshes.
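Automatic simplification is no substitute for proper retopology, but as a rough illustration of building an LOD chain, here is a minimal Python sketch using Open3D’s quadric decimation (the file names and triangle budgets are arbitrary examples, and the result will not have a clean UV map):

```python
import open3d as o3d

# Load the high-poly reconstructed mesh (file name is just an example).
mesh = o3d.io.read_triangle_mesh("scanned_statue_highpoly.obj")
mesh.compute_vertex_normals()

# Build a crude LOD chain by decimating to progressively smaller triangle budgets.
# Unlike manual retopology, this ignores semantics and context.
lod_targets = [100_000, 25_000, 5_000]
for i, target in enumerate(lod_targets):
    lod = mesh.simplify_quadric_decimation(target_number_of_triangles=target)
    lod.compute_vertex_normals()
    o3d.io.write_triangle_mesh(f"statue_lod{i}.obj", lod)
    print(f"LOD{i}: {len(lod.triangles)} triangles")
```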
Making an albedo map
A 3D reconstructed model has only a diffuse map, with the light and shadows captured during scanning baked in. In some cases that can be enough, but 3D and VR scenes usually have dynamic lighting, so an albedo map is required. Semi-automatic software like Agisoft De-Lighter can generate an albedo map: the user just has to mark the shadowed and lit areas of the model using brushes of various sizes.
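Dedicated tools like Agisoft De-Lighter do this far better, but to illustrate the basic idea (removing baked-in brightness variation from a diffuse texture), here is a toy sketch that flattens low-frequency shading with OpenCV; it only softens smooth light gradients and will not recover color inside hard shadows (the file names are examples):

```python
import cv2
import numpy as np

# Load the baked diffuse texture produced by the reconstruction (example file name).
diffuse = cv2.imread("diffuse_map.png")

# Work in LAB space so only the luminance channel is touched, not the colors.
lab = cv2.cvtColor(diffuse, cv2.COLOR_BGR2LAB)
L, a, b = cv2.split(lab)
L = L.astype(np.float32)

# Estimate low-frequency shading with a large blur and divide it out,
# which flattens soft light/shadow gradients baked into the texture.
shading = cv2.GaussianBlur(L, (0, 0), sigmaX=51)
L_flat = np.clip(L / (shading + 1e-6) * float(np.mean(shading)), 0, 255).astype(np.uint8)

albedo = cv2.cvtColor(cv2.merge([L_flat, a, b]), cv2.COLOR_LAB2BGR)
cv2.imwrite("albedo_approx.png", albedo)
```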
Making normal, ambient occlusion, and height maps
When considering retopology, remember that you can bake the missing details into normal, height, and ambient occlusion maps. For example, this can be done with the well-known free program xNormal, which uses ray tracing to create the texture maps.
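xNormal bakes such maps by ray tracing between the high-poly and low-poly surfaces; the relationship between a height map and a normal map, though, can be shown with a few lines of NumPy. This is only a sketch of that relationship, not a replacement for a proper baker (file names and the strength factor are arbitrary):

```python
import cv2
import numpy as np

# Load a grayscale height map (example file name) and scale it to [0, 1].
height = cv2.imread("height_map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# The normal is perpendicular to the local height gradient.
strength = 2.0  # exaggerates surface relief; purely a tuning parameter
dy, dx = np.gradient(height)
nx, ny, nz = -dx * strength, -dy * strength, np.ones_like(height)

# Normalize and pack into the usual 0..255 tangent-space encoding ((128, 128, 255) = flat).
length = np.sqrt(nx**2 + ny**2 + nz**2)
normal = np.stack([nx, ny, nz], axis=-1) / length[..., None]
normal_map = ((normal * 0.5 + 0.5) * 255).astype(np.uint8)

# OpenCV writes BGR, so reverse the channel order before saving.
cv2.imwrite("normal_map.png", normal_map[..., ::-1])
```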
Making roughness and metallic maps
Roughness and metallic maps are usually required for PBR (Physically Based Rendering). Artists often start from a copy of the albedo map to create them. Unfortunately, it is mostly manual work that is hard to automate, even though the maps can often be produced with a simple color filter.
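As a crude illustration of such a color-filter approach (not a physically correct workflow), a first-pass roughness map can be derived by inverting the albedo luminance, with metallic simply set to zero for dielectric scanned materials; every name and threshold below is just an example:

```python
import cv2
import numpy as np

# Start from the albedo texture (example file name).
albedo = cv2.imread("albedo_approx.png")

# Crude heuristic: darker, duller areas are treated as rougher.
gray = cv2.cvtColor(albedo, cv2.COLOR_BGR2GRAY)
roughness = 255 - cv2.GaussianBlur(gray, (0, 0), sigmaX=3)
cv2.imwrite("roughness_firstpass.png", roughness)

# For fully dielectric scanned objects (stone, wood, plaster) metallic is zero everywhere.
metallic = np.zeros_like(gray)
cv2.imwrite("metallic_firstpass.png", metallic)
```

An artist would still paint over the result, but such a first pass is easy to automate.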
Alternative ways
The traditional post-processing approach requires a lot of manual work, and the resulting low-poly models don’t have the same quality as the original 3D reconstructed models. You can see baking artifacts when examining a model closely, which happens all the time in VR. Alternative post-processing approaches may remove the manual work completely or simplify it significantly.
Google Seurat
Google Seurat was an Open Source side project created by the same team that previously worked on the light field array camera scanning for the Daydream VR project. Seurat is able to simplify any complex 3D scene including 3D reconstructed ones. It produces models that can be rendered even on mobile VR platforms like Oculus Quest. Unfortunately, it is now discontinued.
Seurat captures the target scene with full information about light, shadows, and all used shaders, without changing the models. The capture is done by a set of fixed cameras with non-overlapping FOVs using the light field method, placed at random points of a limited subspace of the scene (as a result, the player’s possible movement is limited). The result of the simplification is a set of low-poly textured meshes. The compression rate can be colossal: millions of polygons compressed to tens of thousands.
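Seurat’s own capture tools produce the actual input data; purely to illustrate the idea of sampling a limited “headbox” (the small volume the player will later be allowed to move in), here is a sketch that picks random capture positions inside such a box. The box size, center, and sample count are made-up values:

```python
import random

# Illustrative only: sample random capture positions inside a small "headbox",
# the limited volume in which the player will later be allowed to move.
HEADBOX_CENTER = (0.0, 1.6, 0.0)   # roughly head height, in meters (assumed)
HEADBOX_SIZE = (1.0, 0.5, 1.0)     # Seurat-style captures cover only ~1-2 square meters
NUM_SAMPLES = 16

def sample_headbox_positions(center, size, count, seed=42):
    rng = random.Random(seed)
    return [
        tuple(c + (rng.random() - 0.5) * s for c, s in zip(center, size))
        for _ in range(count)
    ]

# Each position would then be captured with full color and depth information,
# which Seurat merges into a set of low-poly textured meshes.
for p in sample_headbox_positions(HEADBOX_CENTER, HEADBOX_SIZE, NUM_SAMPLES):
    print("capture at", p)
```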
From our experience, it’s impossible to get more than 1-2 square meters of space for movement when using Seurat. Capturing a large scene takes a tremendous amount of time even on a powerful PC (because Seurat runs mostly on the CPU). Also, we saw visual artifacts on small objects like tree leaves.
Despite the fact that Seurat is discontinued and hasn’t been updated for several years, it still works fine. It can be used in limited scenarios, e.g. VR content that uses small but highly detailed spaces.
Umbra3D
Umbra3D is a commercial toolkit that optimizes the rendering of any complex 3D scene, including 3D reconstructed ones. It has two main components: an import module and a streaming and rendering module.
Since it’s commercial, all implementation details are hidden but we can surmise how it works based on the description:
- A static scene (meshes and textures) is imported into the toolkit. Large meshes are automatically split into small parts, which are added to a hierarchical LOD structure according to their position in space. Thus, the whole scene is effectively covered by LODs without manual retopology.
- The rendering module loads polygons from the previously generated LOD structure on demand, based on the camera position and orientation (scene streaming). It is able to load scene parts both from disk and from the cloud (a rough sketch of the idea follows below).
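Since the implementation is hidden, the following Python sketch is only our guess at the general idea: an octree-like hierarchy of mesh chunks where the renderer requests finer chunks for nodes that appear large on screen and keeps coarse proxies for distant ones. All names, formats, and thresholds are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List
import math

@dataclass
class LodNode:
    """One cell of a hierarchical LOD structure: a coarse proxy mesh plus
    children that refine it (a guess at the idea, not Umbra3D's actual format)."""
    center: tuple
    radius: float
    mesh_uri: str                      # where the chunk lives (disk path or cloud URL)
    children: List["LodNode"] = field(default_factory=list)

def select_chunks(node: LodNode, camera_pos: tuple, detail: float = 8.0) -> List[str]:
    """Return the mesh chunks to stream for the current camera position.

    A node is refined (descend into its children) when it is large on screen,
    approximated here by radius / distance; otherwise its coarse proxy is enough.
    """
    distance = math.dist(camera_pos, node.center)
    if node.children and (node.radius / max(distance, 1e-6)) * detail > 1.0:
        chunks: List[str] = []
        for child in node.children:
            chunks.extend(select_chunks(child, camera_pos, detail))
        return chunks
    return [node.mesh_uri]

# The renderer would re-run the selection every frame and asynchronously
# fetch any chunks that are not resident yet.
root = LodNode(center=(0, 0, 0), radius=50.0, mesh_uri="scene_coarse.mesh",
               children=[LodNode(center=(-25, 0, 0), radius=25.0, mesh_uri="west.mesh"),
                         LodNode(center=(25, 0, 0), radius=25.0, mesh_uri="east.mesh")])
print(select_chunks(root, camera_pos=(20.0, 1.7, 0.0)))
```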
It’s unclear if Umbra3D works for static scenes only or can also optimize dynamic objects, for example objects held by the player’s avatar in VR.
The toolkit provides a native C SDK with C++ wrappers, alongside a Unity plugin and a web rendering module.
It seems possible to use a reconstructed scene in Umbra3D without any manual post-processing, as long as there are no serious artifacts and no dynamic light sources.
Unreal Engine 5
In May 2020, Epic Games announced the new Unreal Engine 5. This new engine is amazing, and among other cool things it brings Nanite, a technology for scenes with billions of polygons. The announcement directly notes that you are now able to render original high-poly models, including 3D reconstructed ones, as-is in real time.
The engine isn’t publicly available yet, so there aren’t many details, but we think it’s possible to assume the following based on interviews with the top managers:
- The Unreal Engine 5 demo is built for the new Sony PlayStation 5 and uses its full hardware potential. However, the new technologies (including Nanite) are expected to work the same way on PC and mobile platforms, so the same Unreal Engine 5 demo, with less detail, could run on an Android device released 3 years ago. VR platforms were not mentioned, but they are implied if Android is supported.
- Nanite unleashes the potential of the recently acquired Quixel Megascans - a large library of reconstructed photo-realistic objects and textures.
- Nanite has an automatic LOD generator.
- Nanite has scene streaming capabilities so polygons are loaded on demand depending on the camera position and orientation.
- It is still recommended to build texture maps but it’s not required to bake the missing details after retopology. At a minimum, roughness and metallic maps are required for proper lighting.
- The scene streaming quality depends on the new high-speed SSD inside the PlayStation 5. 5G, with speeds close to 1 Gbps, might be a game-changer, especially for VR platforms like Oculus Quest.
- Nanite and Lumen (a new lighting system) were built as prototypes by a single Epic engineer in a few years.
Conclusion
The scanning dream described in the first article was based on the following steps:
- Scan the scene.
- Scan the whole scene from a single point with a smartphone.
- Scan in video mode.
- Move a sofa in the reconstructed scene.
- Scan moving cranes.
- Use the reconstructed scene without manual post-processing.
- Render in mobile VR as-is.
Let’s sum up what is possible today and what is still a dream.
Scanning the reality
Today 3D scanning and reconstruction is not a dream but a reality, especially for modern AAA products. There is also a huge library of reconstructed photo-realistic objects and textures: Quixel Megascans.
Scan the whole scene from a single point with a smartphone
Currently, it’s impossible to get detailed scans of both small and large objects from a single faraway point with just a smartphone. You have to walk around a lot, following a shooting plan. Even using structured light systems indoors, you are limited by the sensor’s minimum and maximum distance. The ToF systems used in some modern smartphones have low-powered built-in emitters and are limited to distances from 20 cm up to several meters. The photogrammetric method is limited by the physical sensor size, the lens’s optical parameters, and fixed focus. The most promising way forward seems to be future smartphones with built-in light field cameras. The light field approach can also potentially help with scanning transparent, reflective, and plain surfaces like the windows of a building.
Scan in video mode
Today you can do 3D reconstruction from a 4K video with high enough quality using the photogrammetric method. But technically we doubt it makes sense to use a regular video instead of a special scanning mode on a smartphone; for example, all modern 3D scanners for Android and iOS work in such a mode, guiding the user in real time.
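If you do want to feed a regular video into a photogrammetry pipeline, the usual first step is simply to extract a set of sharp, well-spaced frames and process them as ordinary photos. A minimal sketch with OpenCV (the file names, sampling step, and blur threshold are arbitrary starting points):

```python
import cv2

# Pull every Nth frame out of a 4K video so it can be fed to photogrammetry
# software as an ordinary photo set. Assumes a "frames/" directory already exists.
VIDEO_PATH = "walkaround_4k.mp4"   # example file name
FRAME_STEP = 15                    # ~2 frames per second for a 30 fps video

cap = cv2.VideoCapture(VIDEO_PATH)
index, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % FRAME_STEP == 0:
        # A blur check (variance of the Laplacian) helps drop smeared frames.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if cv2.Laplacian(gray, cv2.CV_64F).var() > 100.0:
            cv2.imwrite(f"frames/frame_{saved:05d}.jpg", frame)
            saved += 1
    index += 1
cap.release()
print(f"saved {saved} frames")
```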
Move a sofa in the reconstructed scene
Today you are able to move the sofa as an object in the reconstructed scene, but you need to cut it out of the mesh and patch the hole behind it manually. General automatic 3D segmentation and classification is a hot topic in modern computer vision research. Significant progress has been made on the same task for 2D images using CNN models, but for 3D there are no production-ready models yet, only several successful experiments on limited data sets.
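To show what that 2D progress looks like in practice, here is a minimal sketch that runs a pretrained DeepLabV3 model from torchvision on a room photo; the Pascal VOC label set it predicts even includes a “sofa” class. Segmenting the 3D mesh itself would still have to be solved separately (the file name is an example):

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights
from PIL import Image

# Off-the-shelf 2D semantic segmentation: 21 Pascal VOC classes, including "sofa".
weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()

image = Image.open("room_photo.jpg").convert("RGB")   # example file name
batch = weights.transforms()(image).unsqueeze(0)      # resize/normalize preset

with torch.no_grad():
    logits = model(batch)["out"][0]    # (num_classes, H, W)
labels = logits.argmax(0)              # per-pixel class index
print("classes present:", torch.unique(labels).tolist())
```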
Scan the moving cranes
Although structured light and ToF systems are able to capture moving objects and are frequently used for this purpose, it’s hard to complete a 3D reconstruction in such cases. It’s possible to capture the angle of a crane’s jib by analyzing depth maps in real time, but the whole 3D model will be a mess if the crane moves at all during the reconstruction.
Using a reconstructed scene without manual post-processing
There are at least two tools that make it possible to use a reconstructed high-poly model directly in a game engine: Google Seurat and Umbra3D. Unreal Engine 5 looks excellent and can do much more than simply render tremendous amounts of polygons, but it is not publicly available at the moment.
Rendering in mobile VR as is
All three tools (Google Seurat, Umbra3D, and Unreal Engine 5) can potentially be used to create content for mobile VR platforms like Oculus Quest. Scene streaming, implemented in both Umbra3D and Unreal Engine 5, adaptively loads content on demand based on hardware capabilities. Modern 5G-capable VR hardware with speeds of up to 1 Gbps will increase the potential of scene streaming even further.
Finally
Our scanning dream seems closer than we could have imagined at the beginning of this series. We bet the virtual future will be very close to the real world, or even better, streamed to portable VR and AR devices and becoming accessible and widespread. Let’s hope humanity can stay rooted in the real world and not get stuck in the virtual one.