Rendering in 3D

Intro

So I recently decided that I wanted to learn about quaternions. Somehow I’d missed them so far, but I’d heard about how awesome they are for handling rotations in 3D. To that end, I decided that a great project to help me learn would be a 3D renderer! It turns out I didn’t actually need to play with them that much for the renderer, but I made it anyway, so feel free to skip the quaternion part if you’re not into that sort of thing. This post is basically a breakdown of what I did and the process I went through in working out how to draw something in 3D. If you spot any mistakes, whether in methodology or typos, please comment and alert me to them!

Quaternions

Despite hearing good things about quaternions I was still sceptical, “these quaternions are better than trusty rotation matrices? Pffft”. After some reading, watching YouTube videos and discussing with friends, I am now convinced of their worth. They are superior to specifying angles (Euler angles) because you can easily chain multiple rotations together and don’t need to worry about weird things like gimbal lock. Quaternions are also superior to chaining matrices together, as one can directly obtain the transformation about any axis with quaternions, and they are numerically more stable. In particular, if accumulated error causes a matrix to drift from a rotation matrix to a more general linear transformation, fixing it requires orthogonalisation, while a quaternion only requires normalisation. The main drawback is that the angle of rotation a quaternion represents is not immediately obvious, although its axis of rotation is, which is more than one can say for a rotation matrix.

Quaternions are an extension of the complex numbers, with

    \[ I^2 = J^2 = K^2 = IJK = -1. \]

A quaternion looks like

    \[ q = r + aI + bJ + cK \]

similar to the a+bi form of complex numbers. Note that unlike ordinary complex numbers, multiplication is not commutative (order matters). What has this got to do with rotations? If we wish to rotate {\bf u} by \theta about the unit vector {\bf \hat\mu}, we can do so with

    \[ u' = q(0 + u_xI + u_yJ + u_zK)q^{-1} \]

where

    \[ q = e^{\theta \hat\mu/2} = \cos\frac\theta2 + \sin\frac\theta2(\mu_xI + \mu_yJ+\mu_zK) \]

and

    \[ q^{-1} = \frac{q^*}{||q||^2}. \]

q^* denotes conjugation, which works similarly to complex numbers:

    \[ (r + aI + bJ + cK)^* = r - aI - bJ - cK. \]

Note the half angles: the sandwich product q(\cdot)q^{-1} applies the rotation twice over, so q itself only carries half of \theta. Also, since our q has unit norm, q^{-1} is simply q^*. The rotation follows the right hand ‘grip’ rule: stick your thumb out and curl your fingers a bit; if the thumb points in the direction of {\bf \hat\mu} then the rotation is in the direction your fingers curl. Anyway, enough about quaternions, onto rendering!
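To make the algebra concrete, here is a minimal sketch of the sandwich product q{\bf u}q^{-1}. All the names here are my own, not from any library, and since q is built with unit norm the conjugate doubles as the inverse.

```cpp
#include <cmath>

// A quaternion r + aI + bJ + cK (hypothetical names, just for this sketch).
struct Quat {
    double r, a, b, c;
};

// Hamilton product; note the order matters (multiplication is not commutative).
Quat mul(const Quat& p, const Quat& q) {
    return {
        p.r*q.r - p.a*q.a - p.b*q.b - p.c*q.c,
        p.r*q.a + p.a*q.r + p.b*q.c - p.c*q.b,
        p.r*q.b - p.a*q.c + p.b*q.r + p.c*q.a,
        p.r*q.c + p.a*q.b - p.b*q.a + p.c*q.r
    };
}

Quat conj(const Quat& q) { return {q.r, -q.a, -q.b, -q.c}; }

// Rotate the vector (x, y, z) by theta about the unit axis (ux, uy, uz)
// using q u q^{-1} with the half-angle form of q. Because q has unit norm,
// q^{-1} is just the conjugate, so no division by ||q||^2 is needed.
Quat rotate(double theta, double ux, double uy, double uz,
            double x, double y, double z) {
    double s = std::sin(theta / 2.0);
    Quat q = {std::cos(theta / 2.0), s*ux, s*uy, s*uz};
    Quat u = {0.0, x, y, z};
    return mul(mul(q, u), conj(q));
}
```

For example, rotating (1, 0, 0) by a quarter turn about the z-axis should land on (0, 1, 0), which is a quick sanity check that the signs in the product are right.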

Projection

Before we start, note that I’ve decided on a right-handed, z-up coordinate system. Everyone seems to use something different; if you don’t like it, you can suck it. We wish to render (draw) a 3D scene composed of polygons. To do this, we project those polygons onto a render surface (a plane). In particular, consider the following figure,

Camera1

we have a camera at the origin which is oriented using the angles \theta and \phi (azimuth and elevation respectively). If the scene doesn’t have the camera at the origin, a translation can put it there for you. Note the poorly drawn camera has a planar surface pointing out of it. Casting a ray from the origin out through this surface to the objects in the scene determines the colour of the corresponding pixel on the computer screen; that surface is the screen. But we don’t want to trace rays here, it’s too computationally expensive, so we’re going to cheat and project the polygons onto the surface instead. That projection is too hard for me with the weird orientation, so we’re going to point the camera straight up like so,

Camera2

and by rotating it about the z axis like this,

Camera4

the problem is much easier! Now we have a camera pointing directly up with its base pointing towards the negative y-axis. Of course, we don’t just rotate the camera but the entire world at the same time, that is, we change the reference frame.

Looking at this from the side,

Projection2

it is now easy to project onto the viewing surface. Let’s place the viewing surface at z=1. If we have a point {\bf p}, then we can project it onto the surface at point {\bf p}' with

    \[ {\bf p}' = \frac{\bf p}{p_z}. \]

I’ve chosen a horizontal viewing angle (field of view) of \pi/2 or 90^\circ to make the mathematics easier. With the surface at z=1, the view surface has a width of 2\tan(\pi/4)=2, while our screen has a width of w pixels. Thus the final position of a point on the screen is

    \[ {\bf p}_\text{on screen} = \left(\frac w2, \frac h2\right) + \frac w2(p_x',-p_y'). \]

The negative in the y-coordinate comes from computers plotting pixels with the top left being labelled (0,0). Remember p_z, it will be used for the z-buffer later.

Unfortunately, there is a problem with the above, in particular look at the following case

BadProjection

Here we have a polygon that clearly should be out of view (it’s 90^\circ off to the side), but it projects across the screen regardless! To stop these situations we avoid drawing certain polygons, or we “cull” some polygons before drawing. I looked at many ways to identify the above situation and couldn’t find an elegant one, so the issue is solved by culling any triangle that has a point below the x-y plane (z=0). We could incorrectly cull a triangle that should be drawn, but that’s unlikely: the camera would need to be very close to the triangle and the polygon quite large.

We will assign each polygon a front and a back side. In addition to culling polygons which are behind the camera, we also cull those which have their faces pointing away (called back-face culling). This assumes that each polygon is part of a solid object with non-zero volume; as such, the only way to see the back of a polygon is to be inside the object. If {\bf n} is the surface normal of the polygon and {\bf p} is as above, then

    \[ {\bf p}\cdot{\bf n} > 0 \]

exactly when the face is pointing away. That is, if the normal points in the same general direction as the position vector then the face is pointing away from us, and we discard the polygon. Any of the points on the surface would do for {\bf p}.
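The test itself is a one-liner. A sketch, with hypothetical names:

```cpp
// Minimal vector type for the back-face test (names are my own).
struct V3 { double x, y, z; };

double dot(const V3& a, const V3& b) {
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// p is any vertex of the polygon, n its outward-facing surface normal.
// With the camera at the origin, p . n > 0 means the face points away
// from us and can be culled.
bool isBackFace(const V3& p, const V3& n) {
    return dot(p, n) > 0.0;
}
```

For instance, a face at (0, 0, 5) with normal (0, 0, 1) points away from the camera and gets culled, while the same face with normal (0, 0, -1) faces us and survives.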

Drawing Triangles

Believe it or not, this is the part I had the most trouble with. We’ve projected the polygons of our scene onto the screen; now we need to draw them. The difficulty comes from the fact that our triangles are specified with floating point numbers while pixels are by nature discrete, there are no half pixels. We will ignore the fact that people generally use quadrilaterals for 3D models (a friend tells me they’re good for animation) and assume that all polygons are triangles. If a higher order polygon is given, it can be divided up into triangles.

To simplify, we will divide up our triangles like so,

Triangles

This problem is similar to integrating areas, which coincidentally is what I was teaching my second years just a few weeks ago. It’s easy to determine which rows of pixels to range over: just take the top point’s y value and the lowest point’s y value, rounded down. More difficult is the range of x. It is found by interpolating the appropriate vertices to determine the points marked by the red crosses. In particular, if the second row of the triangle is y, then the left and right crosses are at

    \[ {\bf L} = (1-\alpha_y){\bf p} + \alpha_y {\bf q}, \quad {\bf R} = (1-\alpha_y){\bf p} + \alpha_y {\bf r} \]

where

    \[ \alpha_y = \frac{y-(p_y+1)}{q_y-p_y}. \]

Choosing the lower boundary of the row for the points determining our range has the benefit that those points are definitely inside the triangle; other choices could, in the topmost row, extend the triangle out above {\bf p}. Choosing to round outwards ensures that triangles placed next to each other have no gap between them; an erroneous colour here or there is no problem in comparison to pixel-wide gaps which you can see through. The z-value of any particular pixel at (x,y) can be determined by interpolating between the vertices with

    \[ z = \alpha_y((1-\alpha_x)q_z + \alpha_xr_z) + (1-\alpha_y)p_z \]

where

    \[ \alpha_x = \frac{x-L_x}{R_x-L_x}. \]

The z-value of each pixel is used to ensure that a hidden object is not drawn on top of what’s in front of it. We maintain a ‘z-buffer’ which remembers the depth of the polygon drawn to each pixel. If we try to draw to a pixel whose stored z-value is lower than our current one, we discard ours and move on.
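Putting the row interpolation and the depth test together, drawing one pixel row of the upper triangle might look like the following sketch. The names and the row-major z-buffer layout are my own assumptions, not the post’s actual code.

```cpp
#include <cmath>
#include <vector>

// A projected vertex (hypothetical name); p is the apex, q and r the
// base vertices of the upper triangle, as in the figure.
struct Pt { double x, y, z; };

// Fill pixel row y of the upper triangle, depth-testing against zbuf
// (row-major, width w, initialised to a huge value). Actually writing a
// colour is left out; only the depth bookkeeping is shown.
void drawRow(int y, const Pt& p, const Pt& q, const Pt& r,
             std::vector<double>& zbuf, int w) {
    double ay = (y - (p.y + 1.0)) / (q.y - p.y);   // alpha_y
    double Lx = (1.0 - ay) * p.x + ay * q.x;       // left red cross
    double Rx = (1.0 - ay) * p.x + ay * r.x;       // right red cross
    for (int x = (int)std::floor(Lx); x <= (int)std::floor(Rx); ++x) {
        double ax = (Rx == Lx) ? 0.0 : (x - Lx) / (Rx - Lx);  // alpha_x
        double z = ay * ((1.0 - ax) * q.z + ax * r.z) + (1.0 - ay) * p.z;
        int i = y * w + x;
        if (z < zbuf[i]) {   // nearer than what's already there?
            zbuf[i] = z;     // remember the depth; colour would go here
        }
    }
}
```

Initialising the buffer to a huge depth means the first polygon to touch a pixel always wins, after which only strictly nearer polygons overwrite it.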

Texture mapping is, as the name implies, mapping textures onto a 3D model. That is, each vertex in the polygon is given extra coordinates into an image file, which is mapped like a decal onto the model. Having the above mathematics done makes this task relatively easy: if {\bf p}', {\bf q}' and {\bf r}' are the texture coordinates of the above vertices, then we obtain the texture position of the pixel using

    \[ \alpha_y((1-\alpha_x){\bf q}' + \alpha_x{\bf r}') + (1-\alpha_y){\bf p}'. \]

The lower triangle is drawn similarly, however the red crosses are taken at the top of the row.

Coding It Up

I won’t go into great detail about what the code is doing; you can read it yourself here. I’ll just briefly describe some things to get you going. Firstly, I’ve used SDL 1.2 (Simple DirectMedia Layer), which abstracts away talking to the OS. It’s a pretty awesome library, allowing platform-agnostic access to windows, drawing, input, audio and more. SDL2 has been used by Valve, who are pushing for multi-platform gaming. SDL2, however, is more centred around hardware acceleration, which is the exact opposite of what we want. To obtain SDL 1.2, head on over to SDL.org and pick it up. If you want to compile my code you will also need SDL_image, which handles all the image-loading nastiness. Finally, I’ve made use of C++11’s new features, so it will need to be compiled with C++11 enabled. I’ll leave you to figure out how to link and compile as it varies from system to system, although I can help if you shoot me an email. I used various models from friends to test; not having a proper full .obj reader means that some models won’t work. If you head here you can grab a face I know will.