# Objective

nvdiffmodeling is an open-source NVIDIA repository that optimizes meshes, materials and other scene information through deep learning with a differentiable renderer. Its input data are the mesh to be optimized (an obj file) and the material information (an mtl file). Because the authors run their experiments on synthetic data, they also have the ground-truth mesh and mtl. Whenever a target image from a given viewpoint is needed, they simply call render_mesh on the ground truth, and then compute the loss between the image rendered from the mesh being optimized and the image rendered from the ground truth at the same viewpoint.

In a real project, however, we optimize a poor mesh and texture and have no ground truth for them (if we had the ground truth, there would be nothing left to optimize), so we cannot render the target images for each viewpoint from a ground-truth mesh and material. What we need to do instead is supply our own images in place of the pictures that render_mesh would produce from the ground truth, and extract from our mesh the information that nvdiffmodeling needs. This article first walks through the main logic of the project.

# Code content

img_opt and img_ref are renderings of the current mesh and the ground-truth mesh at a specific viewing angle; they are saved as pictures every fixed number of iterations.

color_opt and color_ref are the images rendered from the mesh being optimized and from the ground-truth mesh before each loss computation. For our own data, we need to fix the angle at which the mesh is rendered so that the loss can be computed against the real image taken from that specific angle.

In the synthetic experiment, the code randomly generates some RT (rotation and translation) matrices and then builds the per-batch transforms:

```python
mvp = np.zeros((FLAGS.batch, 4, 4), dtype=np.float32)
campos = np.zeros((FLAGS.batch, 3), dtype=np.float32)
lightpos = np.zeros((FLAGS.batch, 3), dtype=np.float32)

# ==============================================================================================
#  Build transform stack for minibatching
# ==============================================================================================
for b in range(FLAGS.batch):
    # Random rotation/translation matrix for optimization.
    r_rot = util.random_rotation_translation(0.25)
    r_mv = np.matmul(util.translate(0, 0, -RADIUS), r_rot)
    mvp[b] = np.matmul(proj_mtx, r_mv).astype(np.float32)
    campos[b] = np.linalg.inv(r_mv)[:3, 3]
    lightpos[b] = util.cosine_sample(campos[b]) * RADIUS
```

r_rot is a randomly generated RT matrix. The extrinsic matrix (r_mv in the code) describes the transformation from the world coordinate system to the camera coordinate system.

mvp[b] is the product of the projection matrix and the extrinsic matrix for batch element b.

campos is the fourth-column entries in rows 1, 2 and 3 of the inverse of r_mv. The inverse of the rotation-translation matrix is the camera pose matrix, which describes how the camera coordinate system is transformed into the world coordinate system, so this column is $-R^\top t$: the position of the camera center in the world coordinate system.

Then a random background color of shape batch × 512 × 512 × 3 is created.
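To make the camera-position step concrete, here is a minimal NumPy sketch. The helpers `translate` and `rotate_y` and the value of `RADIUS` are illustrative assumptions standing in for the repository's utilities, not its actual code. It checks that the last column of the inverted model-view matrix matches the closed form $-R^\top t$.

```python
import numpy as np

def translate(x, y, z):
    """4x4 translation matrix (hypothetical stand-in for util.translate)."""
    m = np.eye(4, dtype=np.float32)
    m[:3, 3] = [x, y, z]
    return m

def rotate_y(angle):
    """4x4 rotation about the y axis, used here only to build an example pose."""
    c, s = np.cos(angle), np.sin(angle)
    m = np.eye(4, dtype=np.float32)
    m[0, 0], m[0, 2] = c, s
    m[2, 0], m[2, 2] = -s, c
    return m

RADIUS = 3.5  # assumed camera distance, for illustration only

# World -> camera transform: rotate the scene, then push it away from the camera.
r_mv = translate(0, 0, -RADIUS) @ rotate_y(0.3)
R, t = r_mv[:3, :3], r_mv[:3, 3]

campos_from_inverse = np.linalg.inv(r_mv)[:3, 3]  # same indexing as the training loop
campos_closed_form = -R.T @ t                     # camera center in world space
```

Because the rotation preserves length, the camera ends up at distance `RADIUS` from the origin regardless of the random rotation.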

Then move the mesh to the center:

```python
def center_by_reference(base_mesh, ref_aabb, scale):
    center = (ref_aabb[0] + ref_aabb[1]) * 0.5
    scale = scale / torch.max(ref_aabb[1] - ref_aabb[0]).item()
    v_pos = (base_mesh.v_pos - center[None, ...]) * scale
    return Mesh(v_pos, base=base_mesh)
```

Here center is the center point of the xyz bounding box (1 × 3 after adding the leading axis), scale is the requested size divided by the longest side of the bounding box, and v_pos holds the coordinates of all vertices after centering and scaling (N × 3).

Then the render_mesh function is called to render the ground-truth mesh into an image. The shape of color_ref is [minibatch, full_res, full_res, 3], and the visualization results are shown in the figure below.
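The centering step can be tried on a tiny point set. The following is a NumPy sketch that mirrors the arithmetic of center_by_reference; the function name `center_by_reference_np` and the sample vertices are assumptions made for illustration, and the real function operates on torch tensors and returns a Mesh.

```python
import numpy as np

def center_by_reference_np(v_pos, ref_aabb, scale):
    """Center vertices on the reference bounding box and rescale its longest side to `scale`."""
    aabb_min, aabb_max = ref_aabb
    center = (aabb_min + aabb_max) * 0.5          # shape [3]
    s = scale / np.max(aabb_max - aabb_min)       # uniform scale factor
    return (v_pos - center[None, :]) * s          # shape [N, 3]

# Three example vertices; the bounding box spans 4 x 2 x 1.
v_pos = np.array([[0.0, 0.0, 0.0],
                  [4.0, 2.0, 1.0],
                  [2.0, 1.0, 0.5]])
ref_aabb = (v_pos.min(axis=0), v_pos.max(axis=0))

centered = center_by_reference_np(v_pos, ref_aabb, scale=2.0)
```

With `scale=2.0` the longest axis of the box is mapped to the range [-1, 1], and the vertex that sat at the box center lands on the origin.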

```python
with torch.no_grad():
    color_ref = render.render_mesh(glctx, _opt_ref, mvp, campos, lightpos, FLAGS.light_power, iter_res,
        spp=iter_spp, num_layers=1, background=randomBgColor, min_roughness=FLAGS.min_roughness)
```

render_mesh is defined as follows:

```python
def render_mesh(
        ctx,
        mesh,
        mtx_in,
        view_pos,
        light_pos,
        light_power,
        resolution,
        spp         = 1,
        num_layers  = 1,
        msaa        = False,
        background  = None,
        antialias   = True,
        min_roughness = 0.08
    ):
```

mesh is the mesh after it has been moved to the center. mtx_in is the mvp matrix, that is, $proj_{camera}^{clip} T_{world}^{camera} T_{model}^{world}$. view_pos is campos, the camera position in world space. light_pos is lightpos, light_power is set in the hyperparameters, resolution is the iter_res set in the hyperparameters (i.e. train_res), and spp is the iter_spp set in the hyperparameters (default 1).

These numpy variables are then converted to tensors:

```python
def prepare_input_vector(x):
    x = torch.tensor(x, dtype=torch.float32, device='cuda') if not torch.is_tensor(x) else x
    return x[:, None, None, :] if len(x.shape) == 2 else x

full_res = resolution*spp

# Convert numpy arrays to torch tensors
mtx_in      = torch.tensor(mtx_in, dtype=torch.float32, device='cuda') if not torch.is_tensor(mtx_in) else mtx_in
light_pos   = prepare_input_vector(light_pos)
light_power = prepare_input_vector(light_power)
view_pos    = prepare_input_vector(view_pos)
```

Then the mesh vertices are transformed to clip space (after the perspective divide, visible points have xyz in [-1, 1]). The vertices, of shape [1, num_vertices, 3], are multiplied by the mvp matrices, so the shape of v_pos_clip is [minibatch_size, num_vertices, 4]. The ground-truth ref mesh may differ from the base mesh being optimized, so num_vertices is the vertex count of whichever mesh (ref or base) is being rendered.

```python
# clip space transform
v_pos_clip = ru.xfm_points(mesh.v_pos[None, ...], mtx_in)
```
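The shape bookkeeping of the clip-space transform can be checked with a small NumPy sketch. The function `xfm_points_np` below is a hypothetical stand-in written for this article, not the implementation of ru.xfm_points; it assumes the usual convention of appending w = 1 and multiplying by each 4 × 4 matrix in the batch.

```python
import numpy as np

def xfm_points_np(points, matrix):
    """Transform points [1, N, 3] by a batch of matrices [B, 4, 4] -> [B, N, 4]."""
    ones = np.ones(points.shape[:-1] + (1,), dtype=points.dtype)
    points_h = np.concatenate([points, ones], axis=-1)             # homogeneous, [1, N, 4]
    # Row vectors times the transposed matrix equals (M @ p) for every point;
    # the leading axis of size 1 broadcasts over the batch.
    return np.matmul(points_h, np.transpose(matrix, (0, 2, 1)))

points = np.array([[[0.0, 0.0, 0.0],
                    [1.0, 2.0, 3.0]]])                              # [1, 2, 3]

shift_x = np.eye(4)
shift_x[0, 3] = 1.0                                                 # translate +1 along x
matrices = np.stack([np.eye(4), shift_x])                           # [2, 4, 4]

v_clip = xfm_points_np(points, matrices)                            # [2, 2, 4]
v_ndc = v_clip[..., :3] / v_clip[..., 3:4]                          # perspective divide
```

One copy of the vertex buffer is shared by every camera in the minibatch; only the matrices differ per batch element.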

Next, all layers are rendered from front to back; num_layers defaults to 1. nvdiffrast's rasterize_next_layer() is used first. According to the nvdiffrast documentation, when num_layers is 1 it behaves like rasterize(), and the returned rast and db both have shape [batch_size, full_res, full_res, 4]. The four channels of rast are (u, v, z/w, triangle_id): u and v are the barycentric coordinates of the pixel inside its triangle, z/w is the depth value, and triangle_id is the id of the triangle covering the pixel (offset by one, so 0 means no triangle). db stores the image-space derivatives of the barycentric coordinates. Then render_layer in render.py is called to shade the layer; it is analyzed later. In short, layers is a list of num_layers (color, rast) pairs, each element of shape [batch_size, full_res, full_res, 4], i.e. shaped like [1, 2, batch_size, full_res, full_res, 4].

```python
# Render all layers front-to-back
layers = []
with dr.DepthPeeler(ctx, v_pos_clip, mesh.t_pos_idx.int(), [resolution*spp, resolution*spp]) as peeler:
    for _ in range(num_layers):
        rast, db = peeler.rasterize_next_layer()
        layers += [(render_layer(rast, db, mesh, view_pos, light_pos, light_power, resolution, min_roughness, spp, msaa), rast)]
```

After every layer has been rendered, compositing begins. The background is handled first. If there is no background, the simplest option is used: an all-zero RGB image of size full_res is initialized directly. If there is a background, it must be exactly resolution × resolution, and if spp is greater than 1 it is upscaled to full_res with nearest-neighbor interpolation.

```python
# Clear to background layer
if background is not None:
    assert background.shape[1] == resolution and background.shape[2] == resolution
    if spp > 1:
        background = util.scale_img_nhwc(background, [full_res, full_res], mag='nearest', min='nearest')
    accum_col = background
else:
    accum_col = torch.zeros(size=(1, full_res, full_res, 3), dtype=torch.float32, device='cuda')
```

Next, the colors of all layers are composited together, from far to near. For each (color, rast) pair, the fourth channel of rast is the triangle id; a value greater than 0 means a triangle covers this pixel and it should be drawn.

The last channel of color is the alpha (opacity) value. The accumulated color and the color of this layer are linearly interpolated, i.e. accum_col + alpha * (color[..., 0:3] - accum_col). If anti-aliasing is requested, the anti-aliasing function is then called.

```python
# Composite BACK-TO-FRONT
for color, rast in reversed(layers):
    alpha     = (rast[..., -1:] > 0) * color[..., 3:4]
    accum_col = torch.lerp(accum_col, color[..., 0:3], alpha)
    if antialias:
        accum_col = dr.antialias(accum_col.contiguous(), rast, v_pos_clip, mesh.t_pos_idx.int()) # TODO: need to support bfloat16
```
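The back-to-front blend is easy to verify by hand on a two-pixel image. This is a NumPy sketch of the same lerp, with anti-aliasing left out; the function name `composite_back_to_front` and the sample values are assumptions for illustration.

```python
import numpy as np

def composite_back_to_front(layers, background):
    """Blend (color, rast) layers over a background, farthest layer first."""
    accum = background.copy()
    for color, rast in reversed(layers):
        covered = (rast[..., -1:] > 0).astype(color.dtype)   # 1 where a triangle was hit
        alpha = covered * color[..., 3:4]
        accum = accum + alpha * (color[..., 0:3] - accum)     # same formula as torch.lerp
    return accum

# One image, 1 x 2 pixels, RGBA. Both pixels are red; the left one is half transparent.
color = np.array([[[[1.0, 0.0, 0.0, 0.5],
                    [1.0, 0.0, 0.0, 1.0]]]])
rast = np.zeros((1, 1, 2, 4))
rast[0, 0, 0, 3] = 1.0        # only the left pixel is covered by a triangle

white = np.ones((1, 1, 2, 3))
out = composite_back_to_front([(color, rast)], white)
```

The left pixel becomes a half-and-half mix of red and white, while the uncovered right pixel keeps the background even though its color has alpha 1.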

Finally, if spp is greater than 1, average pooling is used to downsample the image; otherwise accum_col is returned directly. The visualization is shown in the figure below. This is the result returned by render_mesh; the shape and texture in this image are still very poor because it was generated at the beginning of training.

```python
# Downscale to framebuffer resolution. Use avg pooling
out = util.avg_pool_nhwc(accum_col, spp) if spp > 1 else accum_col

return out
```
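For intuition about the downsampling step, average pooling over an NHWC image can be sketched in NumPy as below. `avg_pool_nhwc_np` is a hypothetical helper written for this article; the repository's util.avg_pool_nhwc works on torch tensors and may be implemented differently.

```python
import numpy as np

def avg_pool_nhwc_np(x, size):
    """Average non-overlapping size x size blocks of an [N, H, W, C] image."""
    n, h, w, c = x.shape
    assert h % size == 0 and w % size == 0, "H and W must be multiples of size"
    blocks = x.reshape(n, h // size, size, w // size, size, c)
    return blocks.mean(axis=(2, 4))               # -> [N, H/size, W/size, C]

# A 4 x 4 single-channel image rendered at spp = 2, pooled down to 2 x 2.
full = np.arange(16, dtype=np.float64).reshape(1, 4, 4, 1)
pooled = avg_pool_nhwc_np(full, 2)
```

Each output pixel is the mean of the spp × spp samples rendered for it, which is what turns supersampling into smoother edges.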

Now let's look back at what render_layer does.

```python
def render_layer(
        rast,
        rast_deriv,
        mesh,
        view_pos,
        light_pos,
        light_power,
        resolution,
        min_roughness,
        spp,
        msaa
    ):
```

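First, the rasterizer output is brought to the shading resolution. MSAA is multisample anti-aliasing: visibility is kept at full resolution while shading runs at the lower resolution. So when spp is greater than 1 and msaa is enabled, rast and its derivatives are scaled down to resolution; otherwise shading is done at full resolution.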

```python
full_res = resolution*spp

################################################################################
# Rasterize
################################################################################

# Scale down to shading resolution when MSAA is enabled, otherwise shade at full resolution
if spp > 1 and msaa:
    rast_out_s = util.scale_img_nhwc(rast, [resolution, resolution], mag='nearest', min='nearest')
    rast_out_deriv_s = util.scale_img_nhwc(rast_deriv, [resolution, resolution], mag='nearest', min='nearest') * spp
else:
    rast_out_s = rast
    rast_out_deriv_s = rast_deriv
```

Then the world-space position is interpolated from v_pos (the mesh vertex positions), rast_out_s (the rasterizer output) and t_pos_idx (the vertex ids of each triangle). Next, the normal of each face is computed (face_normals) together with an index for it (face_normal_indices), so both have shape [num_faces, 3]. gb_geometric_normal interpolates the face normals; in effect it gives, for each pixel in the image, the normal of the triangle it corresponds to in three-dimensional space. Similarly, gb_normal interpolates the vertex normals and gb_tangent interpolates the vertex tangents. Their shapes are [minibatch, full_res, full_res, 3]. The visualization results are shown in the figure below.

Then the texture coordinates of the vertices are interpolated. The shape of gb_texc is [minibatch, full_res, full_res, 2], and the shape of gb_texc_deriv is [minibatch, full_res, full_res, 4].

```python
################################################################################
# Interpolate attributes
################################################################################

# Interpolate world space position
gb_pos, _ = interpolate(mesh.v_pos[None, ...], rast_out_s, mesh.t_pos_idx.int())

# Compute geometric normals. We need those because of bent normals trick (for bump mapping)
v0 = mesh.v_pos[mesh.t_pos_idx[:, 0], :]
v1 = mesh.v_pos[mesh.t_pos_idx[:, 1], :]
v2 = mesh.v_pos[mesh.t_pos_idx[:, 2], :]
face_normals = util.safe_normalize(torch.cross(v1 - v0, v2 - v0))
face_normal_indices = (torch.arange(0, face_normals.shape[0], dtype=torch.int64, device='cuda')[:, None]).repeat(1, 3)
gb_geometric_normal, _ = interpolate(face_normals[None, ...], rast_out_s, face_normal_indices.int())

# Compute tangent space
assert mesh.v_nrm is not None and mesh.v_tng is not None
gb_normal, _ = interpolate(mesh.v_nrm[None, ...], rast_out_s, mesh.t_nrm_idx.int())
gb_tangent, _ = interpolate(mesh.v_tng[None, ...], rast_out_s, mesh.t_tng_idx.int()) # Interpolate tangents

# Texture coordinate
assert mesh.v_tex is not None
gb_texc, gb_texc_deriv = interpolate(mesh.v_tex[None, ...], rast_out_s, mesh.t_tex_idx.int(), rast_db=rast_out_deriv_s)
```
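The geometric-normal part of this step can be reproduced on a toy mesh. The sketch below is NumPy rather than torch, and `face_normals_np` together with its `eps` guard is an illustrative assumption about what a safe normalization does, not the repository's util.safe_normalize.

```python
import numpy as np

def face_normals_np(v_pos, t_pos_idx, eps=1e-20):
    """Unit normal of every triangle: cross product of two edges, then normalize."""
    v0 = v_pos[t_pos_idx[:, 0]]
    v1 = v_pos[t_pos_idx[:, 1]]
    v2 = v_pos[t_pos_idx[:, 2]]
    n = np.cross(v1 - v0, v2 - v0)
    # Clamp the squared length so degenerate triangles do not divide by zero.
    length = np.sqrt(np.maximum(np.sum(n * n, axis=-1, keepdims=True), eps))
    return n / length

# Two triangles sharing the origin: one in the xy plane, one in the yz plane.
v_pos = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
t_pos_idx = np.array([[0, 1, 2],
                      [0, 2, 3]])

face_normals = face_normals_np(v_pos, t_pos_idx)

# Every corner of face i points at normal i, so interpolation yields a constant per face.
face_normal_indices = np.repeat(np.arange(len(face_normals))[:, None], 3, axis=1)
```

Giving all three corners of a triangle the same index is what makes the interpolated gb_geometric_normal flat across each face, in contrast to the smoothly varying vertex normals.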

Once each pixel in the image has its coordinates on the texture map referenced by the mtl, the pixels can be shaded. The shape of color is therefore [minibatch, full_res, full_res, 4]. The visualization results are shown in the figure below.

```python
################################################################################
# Shade
################################################################################

color = shade(gb_pos, gb_geometric_normal, gb_normal, gb_tangent, gb_texc, gb_texc_deriv,
    view_pos, light_pos, light_power, mesh.material, min_roughness)

################################################################################
# Prepare output
################################################################################

# Scale back up to visibility resolution if using MSAA
if spp > 1 and msaa:
    color = util.scale_img_nhwc(color, [full_res, full_res], mag='nearest', min='nearest')

# Return color & raster output for peeling
return color
```

With render_mesh, the mesh and diffuse texture being learned can be rendered into an image from a given viewpoint, and that image can then be compared with the ground-truth image from the same viewpoint to compute the loss. How good is the final result?