Sunday, August 17, 2014

Homemade 3D rendering engine (X)

Final thoughts: implementing this has been a fun exercise; it is deeply satisfying to see theory put into practice. I learned a great deal from Dave Mount’s lecture notes and from Computational Geometry, in particular for the binary space partitioning algorithm.

In practice, however, this is entirely impractical. At 30 frames per second, the implementation uses up all the CPU time. When animating, we have to clear the screen and draw everything again, which performs poorly in WPF. The BSP tree has to be rebuilt from scratch for every frame rather than updated incrementally, which is also painful; floating-point errors add up; the list goes on and on.

So this whole thing was really done for fun. It could be adapted into a great exercise for students in a computer graphics course, so that they grasp the basic theory even if they never use it in practice.

A couple of meta-lessons I took away from this documenting exercise:
  1. Document the idea now! Documenting it two years later is a painful exercise in reverse engineering the ideas back out of the code, and
  2. Documenting the idea helps find bugs – a case in point is the fourth case in the binary space partitioning tree.


Homemade 3D rendering engine (IX)

Now that we have laid out the basics of rendering, let’s cover a few other topics needed to understand the whole game. One problem is event handling: when the user clicks on a particular cube face, how do I respond to that event? It turns out to be rather simple: register a click event handler on the triangle and record which cube face the triangle belongs to, and that’s it. One might expect to have to go back to the BSP tree and search it from front to back, but that is unnecessary.
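
The idea can be sketched as follows. This is illustrative Python, not the original WPF code; the `Triangle` class and `on_click` callback are my own names for the pattern:

```python
# Hypothetical sketch: each projected triangle remembers which cube face it
# came from, so a click handler can respond directly, with no BSP search.

class Triangle:
    def __init__(self, face):
        self.face = face          # the cube face this triangle belongs to
        self.on_click = None      # callback registered by the game logic

    def click(self):
        # The UI framework would invoke this when the shape is clicked.
        if self.on_click is not None:
            self.on_click(self.face)

clicked = []
t = Triangle(face="front")
t.on_click = clicked.append       # record which face was clicked
t.click()
print(clicked)                    # ['front']
```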

How about animation? In my game, animation is abstracted inside the model. A model is a collection of triangles; during rendering, if an animation is needed (e.g. rotating a particular face), a different set of triangles is returned using a composite pattern. The top-level model is a transform model wrapping the underlying model, which is a cube, which is in turn a composition of its six faces’ triangles.
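
A minimal sketch of this composite-model idea, with illustrative names (`Model`, `CubeModel`, `TransformModel` are not the original classes), using a toy 1-D “triangle” so the transform is easy to follow:

```python
# Composite pattern: a model yields triangles; a transform model wraps
# another model and transforms its triangles on demand.

class Model:
    def triangles(self):
        raise NotImplementedError

class CubeModel(Model):
    def __init__(self, tris):
        self._tris = tris              # a real cube holds its faces' triangles
    def triangles(self):
        return list(self._tris)

class TransformModel(Model):
    """Animation just means changing `transform` between frames; the
    renderer always asks the composite for the *current* triangles."""
    def __init__(self, inner, transform):
        self.inner = inner
        self.transform = transform
    def triangles(self):
        return [tuple(self.transform(v) for v in tri)
                for tri in self.inner.triangles()]

# Toy example: translate a single "triangle" of 1-D vertices by 10.
cube = CubeModel([(0, 1, 2)])
moving = TransformModel(cube, lambda v: v + 10)
print(moving.triangles())   # [(10, 11, 12)]
```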

So in each rendering pass, the whole screen is cleared, the set of triangles is obtained from the composite model, a BSP tree is built, a back-to-front traversal determines the rendering order, the projection computes the 2D coordinates, and each triangle object is rendered to the screen.

This concludes the design of the whole system!

Homemade 3D rendering engine (VIII)

While the binary space partitioning algorithm is conceptually simple, implementing it is fairly involved, because a lot of details were missing from the previous discussion. For example, how is the model represented? How do we choose the plane to split on? How do we split the model? To implement the algorithm, all of these questions have to be answered.

In my implementation, a model is a collection of triangles, with vertices represented in the world’s coordinates. We just pick one triangle (which then implies a plane) to split on; the tricky part is that triangles crossing the plane need to be split themselves, and computing that split can be fairly involved, as shown below.

First, a plane can be represented by a point on it and its normal; taking the cross product of two edge vectors of the triangle gives the normal. Determining which half-space contains the camera is then straightforward.
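
This first step can be sketched directly. The code is illustrative (my helper names `plane_of` and `side_of`, not the original code), assuming triangles are tuples of 3-D points:

```python
# Derive a plane from a triangle and classify a point against it.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def plane_of(tri):
    """A plane is a point on it plus a normal; the normal is the cross
    product of two edge vectors of the triangle."""
    a, b, c = tri
    return a, cross(sub(b, a), sub(c, a))

def side_of(plane, p, eps=1e-9):
    """+1 if p is on the normal side, -1 on the other side, 0 on the plane."""
    point, normal = plane
    d = dot(normal, sub(p, point))
    return 0 if abs(d) < eps else (1 if d > 0 else -1)

tri = ((0, 0, 0), (1, 0, 0), (0, 1, 0))   # lies in the z = 0 plane
plane = plane_of(tri)
print(side_of(plane, (0, 0, 5)))          # 1  (e.g. the camera's side)
print(side_of(plane, (0.5, 0.25, 0)))     # 0  (on the plane)
```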

Second, if a triangle’s vertices are all on the same side of the plane (vertices on the plane included), we are lucky and simply assign that triangle to the left or right subtree.

Third, if a triangle’s vertices lie strictly on both sides, there must be one vertex on one side and two on the other. Identify the lonely one and find the two intersection points with the plane; those form a triangle, and the rest is a trapezoid which can easily be split into two triangles as well.


In fact, there is a fourth case: one vertex on one side, one on the plane, and one on the other side. In that case we should split into only two triangles. This case was missing in my code and some triangles mysteriously disappeared …
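
The splitting cases above can be handled uniformly by walking the triangle’s edges, inserting intersection points, and fan-triangulating the two resulting polygons. This is an illustrative Python sketch of the technique, not the original WPF/C# code:

```python
def sub(a, b):
    return tuple(a[i] - b[i] for i in range(3))

def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))

def lerp(a, b, t):
    return tuple(a[i] + t * (b[i] - a[i]) for i in range(3))

def split_triangle(tri, plane, eps=1e-9):
    """Split `tri` by `plane` = (point, normal); return the lists of
    resulting triangles on the front side and on the back side."""
    point, normal = plane
    dists = [dot(normal, sub(v, point)) for v in tri]
    sides = [0 if abs(d) < eps else (1 if d > 0 else -1) for d in dists]
    if all(s >= 0 for s in sides):          # entirely in front (or on plane)
        return [tri], []
    if all(s <= 0 for s in sides):          # entirely behind
        return [], [tri]
    # Straddling: collect vertices and crossing points into a front polygon
    # and a back polygon, then fan-triangulate each.  This yields three
    # triangles in the 1-vs-2 case, and two in the fourth case where one
    # vertex lies exactly on the plane.
    front, back = [], []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        sa, sb = sides[i], sides[(i + 1) % 3]
        if sa >= 0:
            front.append(a)
        if sa <= 0:
            back.append(a)
        if sa * sb < 0:                     # this edge crosses the plane
            t = dists[i] / (dists[i] - dists[(i + 1) % 3])
            p = lerp(a, b, t)
            front.append(p)
            back.append(p)
    fan = lambda poly: [(poly[0], poly[i], poly[i + 1])
                        for i in range(1, len(poly) - 1)]
    return fan(front), fan(back)

plane = ((0, 0, 0), (1, 0, 0))              # the plane x = 0
f, b = split_triangle(((-1, 0, 0), (1, 0, 0), (0, 0, 1)), plane)
print(len(f), len(b))                       # 1 1  (the fourth case)
f, b = split_triangle(((-1, 0, 0), (1, 0, 0), (1, 0, 1)), plane)
print(len(f), len(b))                       # 2 1  (one vertex vs two)
```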

Homemade 3D rendering engine (VII)

The projection model we discussed above is good enough to produce wireframe models, where models are made of lines that are projected to the screen for rendering. In particular, we do not care about hiding lines that should appear behind other lines.

But for my purposes, I need to produce a cube with its surfaces painted in colors, so I need to determine whether a surface is in front of or behind another, and avoid drawing anything that is hidden behind.

While the z-buffer algorithm is the most popular one around, it requires too much memory for my application and I cannot use hardware acceleration, so I opted not to use it. Instead, I learned and used the binary space partitioning tree algorithm.

Let’s start by defining half-space: a half-space is simply the region of space above or below a plane. So given a plane, space is divided into two half-spaces.

So given a plane, we can divide the model into three parts: one in the half-space containing the camera, one exactly on the plane, and one in the other half-space.

Now comes a simple divide-and-conquer strategy. We start with the complex model, find a plane to split it, build a tree node with that plane to represent the split, and give it the split sub-models as children. We recurse until each piece is just a single triangle, and then we stop.


The resulting tree is the binary space partitioning tree. Given such a tree, we know that anything in the half-space containing the camera always appears in front of what lies on the plane, which in turn appears in front of anything in the other half-space. So by a tree traversal, we can render the items from back to front, effectively hiding the items that should be behind.
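
The build-and-traverse idea can be sketched as below. For brevity this sketch assumes no triangle straddles a splitting plane (a full implementation would split straddlers as described in part VIII), and all names are illustrative rather than the original code:

```python
def sub(a, b):
    return tuple(a[i] - b[i] for i in range(3))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))

def side(plane, p, eps=1e-9):
    point, normal = plane
    d = dot(normal, sub(p, point))
    return 0 if abs(d) < eps else (1 if d > 0 else -1)

class Node:
    def __init__(self, plane, on, front, back):
        self.plane, self.on, self.front, self.back = plane, on, front, back

def build(tris):
    if not tris:
        return None
    pivot = tris[0]                        # the pivot triangle implies the plane
    a, b, c = pivot
    plane = (a, cross(sub(b, a), sub(c, a)))
    front, back = [], []
    for t in tris[1:]:
        s = [side(plane, v) for v in t]
        if all(x >= 0 for x in s):
            front.append(t)
        elif all(x <= 0 for x in s):
            back.append(t)
        else:
            raise ValueError("straddling triangle: split it first")
    return Node(plane, [pivot], build(front), build(back))

def back_to_front(node, camera, out):
    """Emit triangles farthest-first: far half-space, the plane, then near."""
    if node is None:
        return
    if side(node.plane, camera) >= 0:
        near, far = node.front, node.back
    else:
        near, far = node.back, node.front
    back_to_front(far, camera, out)
    out.extend(node.on)
    back_to_front(near, camera, out)

t_far = ((0, 0, 0), (1, 0, 0), (0, 1, 0))    # in the plane z = 0
t_near = ((0, 0, 5), (1, 0, 5), (0, 1, 5))   # in the plane z = 5
order = []
back_to_front(build([t_far, t_near]), (0, 0, 10), order)
print(order == [t_far, t_near])              # True: far triangle drawn first
```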

Homemade 3D rendering engine (VI)

After the detour into mathematics, it’s time to complete the projection. Note first that the projection we want is a function: given a point’s world coordinates, it returns the point’s eye coordinates.

Remember that we know the axes of the eye’s coordinate system as vectors in the world’s coordinate system; this makes the inverse problem of the projection easy. Given a point $ (e_x, e_y, e_z)^T $ in the eye’s coordinate system, all we need to do to compute $ (w_x, w_y, w_z)^T $ is to apply the axes as follows:

$\left(\begin{matrix} w_x \\ w_y \\ w_z \end{matrix}\right) = e_x \vec{E_x} + e_y \vec{E_y} + e_z \vec{E_z} + \left(\begin{matrix} c_x \\ c_y \\ c_z \end{matrix}\right)$

or simply

$ \vec{w} = (\vec{E_x}  \vec{E_y}  \vec{E_z})\vec{e} + \vec{c} $

So all we need to do is express $ (e_x, e_y, e_z)^T $ in terms of $ (w_x, w_y, w_z)^T $ from the equation above. Note that the transformation is affine, so we could just form the 4 × 4 matrix and invert it. But there is an easier way.
Since $ \vec{E_x} $, $ \vec{E_y} $, $ \vec{E_z} $ form an orthonormal frame, we know from linear algebra that the matrix is orthogonal and its inverse is simply its transpose, so we have:

$\begin{eqnarray*} \vec{w} & = & (\vec{E_x}\ \vec{E_y}\ \vec{E_z})\vec{e} + \vec{c} \\ \vec{w} - \vec{c} & = & (\vec{E_x}\ \vec{E_y}\ \vec{E_z})\vec{e} \\ \vec{e} & = & \left(\begin{matrix} \vec{E_x}^T \\ \vec{E_y}^T \\ \vec{E_z}^T \end{matrix}\right)(\vec{w} - \vec{c}) \end{eqnarray*}$

This is yet another affine transformation that we can code directly. Since the projection matrix doesn’t change often, it makes sense to compute it once and cache it for subsequent projections.

Since at the end of the day the screen is 2D, what we will do is simply drop the z-coordinate.
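
The whole projection boils down to a few dot products. Here is a numeric sketch (illustrative names, not the original code), assuming an orthonormal eye frame $E_x, E_y, E_z$ and camera position $c$ given in world coordinates:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def world_to_eye(p, frame, c):
    """e = (Ex Ey Ez)^T (w - c): because the frame is orthonormal, the
    inverse of the axis matrix is just its transpose, so each eye
    coordinate is a dot product with the corresponding axis."""
    w_minus_c = tuple(pi - ci for pi, ci in zip(p, c))
    return tuple(dot(axis, w_minus_c) for axis in frame)

def project(p, frame, c):
    ex, ey, ez = world_to_eye(p, frame, c)
    return (ex, ey)             # drop the z-coordinate for the 2-D screen

# Camera at (0, 0, 10) with the standard axes as its frame:
frame = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(project((3, 4, 0), frame, (0, 0, 10)))   # (3, 4)
```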

Homemade 3D rendering engine (V)

At this point it does no harm to extend the concept of homogeneous coordinates a little to fit our purposes. In particular, for points, let’s define $ (cx, cy, cz, c)^T $, for any nonzero $ c $, to mean the same as $ (x, y, z, 1)^T $. This is nothing but mapping the previously unused representations onto a single canonical representation.

A linear transformation is defined as $ T(a\vec{x} + b\vec{y}) = aT(\vec{x}) + bT(\vec{y}) $, and it is typically represented as a matrix $ T(\vec{x}) = A\vec{x} $.

A translation is defined as $ T(\vec{x}) = \vec{x} + \vec{v} $; clearly, translation (for $ \vec{v} \ne \vec{0} $) is not linear, since $ T(\vec{0}) = \vec{v} \ne \vec{0} $.

As we can see, the projection needs both a translation and a linear transformation, and the fact that translation cannot be represented as a matrix makes the computation cumbersome.

Now the interesting thing happens, with homogeneous coordinates, we can represent both translation and general linear transformation as a matrix!

To see how, we can represent general linear transformation as follow

$ \left(\begin{matrix} \mathbf{A} & 0 \\ 0 & 1 \end{matrix}\right) \left(\begin{matrix} \vec{x} \\ 1 \end{matrix} \right) = \left(\begin{matrix} \mathbf{A}\vec{x} \\ 1 \end{matrix}\right) $

That doesn’t sound particularly useful; we could have done it without the extra dimension. But this is amazing:

$ \left(\begin{matrix} \mathbf{I} & \vec{v} \\ 0 & 1 \end{matrix}\right) \left(\begin{matrix} \vec{x} \\ 1 \end{matrix} \right) = \left(\begin{matrix} \vec{x} + \vec{v} \\ 1 \end{matrix}\right) $

This also means we can compose translations and general linear transformations into a single matrix; these more general transformations are mathematically called affine transformations, and that’s what we will use in the projection. Note that composing matrices of this form always keeps the last coordinate equal to 1; it is the still more general projective transformations (such as perspective projection) that can leave a last coordinate other than 1, and that is why homogeneous coordinates are introduced.
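
A quick numeric check of the composition, using NumPy (an illustrative demo, not the original code): a linear map and a translation each become a 4 × 4 matrix, and composing them is just matrix multiplication.

```python
import numpy as np

def linear4(A):
    """Embed a 3x3 linear map in the top-left block of a 4x4 matrix."""
    M = np.eye(4)
    M[:3, :3] = A
    return M

def translate4(v):
    """Put the translation vector in the last column of a 4x4 matrix."""
    M = np.eye(4)
    M[:3, 3] = v
    return M

scale = linear4(np.diag([2.0, 2.0, 2.0]))   # scale by 2
shift = translate4([1.0, 0.0, 0.0])         # then move along x

combined = shift @ scale                    # one matrix does both
p = np.array([1.0, 1.0, 1.0, 1.0])          # the point (1, 1, 1)
print(combined @ p)                         # [3. 2. 2. 1.]
```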

Homemade 3D rendering engine (IV)

Once we have both coordinate systems, we will do the transformation. To set it up, let’s start with some basic math. We already know that points can be represented by their coordinates; now we introduce vectors. A vector is simply an arrow (i.e. its direction) with a length (i.e. its magnitude). It floats freely in space, so we can move it around. Given two points, we obtain a vector, as captured by this simple equation:

Point – Point = Vector

So a vector is also just an ordered list of numbers, with the same dimension as a point.
It also makes sense to play with a vector’s length by scaling it; in particular we have the following, meaning the length is scaled by s:

Vector = s (as a Positive Real Number) * Vector

As well as reversing direction

Vector = -1 * Vector


For anyone who has met vectors in a math course this will sound trivial, so let’s now add a convention that is particularly useful. Since vectors and points are both represented as ordered lists of numbers, there is no way to tell points and vectors apart, which could be a problem. So let’s append one extra dimension at the end of the list, with points ending in 1 and vectors ending in 0. Note that if we treat this last dimension just like any other dimension, we get a nice type-checking property: if the last coordinate of a result is 0 or 1, the operation is valid; otherwise, it isn’t. For example, it makes no sense to add two points coordinate-wise, and indeed doing so yields a list ending in 2. We call this new representation homogeneous coordinates, and we’ll see that it is particularly useful later on.
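
The type-checking property can be seen in a toy example (illustrative code, not part of the original engine):

```python
# The last coordinate tracks whether a value is a point (1) or a vector (0).

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

P = (1, 2, 3, 1)   # a point
Q = (4, 6, 3, 1)   # another point
V = (0, 0, 1, 0)   # a vector

print(sub(Q, P))   # (3, 4, 0, 0): point - point = vector
print(add(P, V))   # (1, 2, 4, 1): point + vector = point
print(add(P, Q))   # (5, 8, 6, 2): last coordinate 2 -> invalid operation
```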