Developing a Software Renderer Part 2
Software Rendering ·In the second part of this series, I will describe how to optimize and improve the software rasterizer that we developed in the first part.
We will use a block based approach to rasterizing the triangles to be able to discard regions outside the triangle faster. We will also optimize the inner loop so that fever computations are required and refactor our code base to make things simple.
Refactoring
To make it easy to optimize our algorithm from Part 1 we will do some refactoring first.
We add some utility methods to our EdgeEquation
and ParameterEquation
classes to be able to step some given the edge and parameter value v
along the
x and y axis.
struct EdgeEquation {
[...]
/// Step the equation value v to the x direction.
float stepX(float v) const
{
return v + a;
}
/// Step the equation value v to the x direction.
float stepX(float v, float stepSize) const
{
return v + a * stepSize;
}
/// Step the equation value v to the y direction.
float stepY(float v) const
{
return v + b;
}
/// Step the equation value vto the y direction.
float stepY(float v, float stepSize) const
{
return v + b * stepSize;
}
};
For the ParameterEquation
class the methods look identical.
We encapsulate the edge and parameter equations in a TriangleEquations
class.
This allows us to make the drawTriangle
method shorter and pass all the
triangle equations to other method. We will need this for the rasterizeBlock
method and also for some edge and parameter stepping methods.
struct TriangleEquations {
float area;
EdgeEquation e0;
EdgeEquation e1;
EdgeEquation e2;
ParameterEquation r;
ParameterEquation g;
ParameterEquation b;
TriangleEquations(const Vertex &v0, const Vertex &v1, const Vertex &v2)
{
e0.init(v0, v1);
e1.init(v1, v2);
e2.init(v2, v0);
area = 0.5f * (e0.c + e1.c + e2.c);
// Cull backfacing triangles.
if (area < 0)
return;
r.init(v0.r, v1.r, v2.r, e0, e1, e2, area);
g.init(v0.g, v1.g, v2.g, e0, e1, e2, area);
b.init(v0.b, v1.b, v2.b, e0, e1, e2, area);
}
};
We also declare a PixelData
class which encapsulates the computed per pixel
data. This class will later be passed to a pixel shader once we have that
framework implemented. We add a stepX
and stepY
utility methods that allows
us to step the pixel data to the x and y direction.
Using the step we can compute the pixel data for the corner of a block that
uses the expensive evaluate
method and then compute the remaining pixels of
the block by simply stepping the initial value and avoiding to have to call
evaluate
for every pixel.
struct PixelData {
float r;
float g;
float b;
/// Initialize pixel data for the given pixel coordinates.
void init(const TriangleEquations &eqn, float x, float y)
{
r = eqn.r.evaluate(x, y);
g = eqn.g.evaluate(x, y);
b = eqn.b.evaluate(x, y);
}
/// Step all the pixel data in the x direction.
void stepX(const TriangleEquations &eqn)
{
r = eqn.r.stepX(r);
g = eqn.g.stepX(g);
b = eqn.b.stepX(b);
}
/// Step all the pixel data in the y direction.
void stepY(const TriangleEquations &eqn)
{
r = eqn.r.stepY(r);
g = eqn.g.stepY(g);
b = eqn.b.stepY(b);
}
};
And finally we implement a EdgeData
class which encapsulates the computed edge
values for the triangle and also allows for easy stepping along the x and y
direction.
struct EdgeData {
float ev0;
float ev1;
float ev2;
/// Initialize the edge data values.
void init(const TriangleEquations &eqn, float x, float y)
{
ev0 = eqn.e0.evaluate(x, y);
ev1 = eqn.e1.evaluate(x, y);
ev2 = eqn.e2.evaluate(x, y);
}
/// Step the edge values in the x direction.
void stepX(const TriangleEquations &eqn)
{
ev0 = eqn.e0.stepX(ev0);
ev1 = eqn.e1.stepX(ev1);
ev2 = eqn.e2.stepX(ev2);
}
/// Step the edge values in the x direction.
void stepX(const TriangleEquations &eqn, float stepSize)
{
ev0 = eqn.e0.stepX(ev0, stepSize);
ev1 = eqn.e1.stepX(ev1, stepSize);
ev2 = eqn.e2.stepX(ev2, stepSize);
}
/// Step the edge values in the y direction.
void stepY(const TriangleEquations &eqn)
{
ev0 = eqn.e0.stepY(ev0);
ev1 = eqn.e1.stepY(ev1);
ev2 = eqn.e2.stepY(ev2);
}
/// Step the edge values in the y direction.
void stepY(const TriangleEquations &eqn, float stepSize)
{
ev0 = eqn.e0.stepY(ev0, stepSize);
ev1 = eqn.e1.stepY(ev1, stepSize);
ev2 = eqn.e2.stepY(ev2, stepSize);
}
/// Test for triangle containment.
bool test(const TriangleEquations &eqn)
{
return eqn.e0.test(ev0) && eqn.e1.test(ev1) && eqn.e2.test(ev2);
}
};
Block Based Rasterization
With all of these helper classes block based triangle rasterization now becomes
a lot simpler. In the drawTriangle
method we first compute the triangle
equations and discard back-facing triangles. We compute the triangle bounding box
and clip it to the scissor rectangle as we did in Part 1.
void drawTriangle(const Vertex& v0, const Vertex &v1, const Vertex &v2)
{
// Compute triangle equations.
TriangleEquations eqn(v0, v1, v2);
// Check if triangle is back-facing.
if (eqn.area < 0)
return;
// Compute triangle bounding box and clip to scissor rect.
[...]
Then we round the bounding box values to a block grid. This allows us to then easily step whole blocks in the following for loops which step the x and y values in BlockSize steps.
// Round to block grid.
minX = minX & ~(BlockSize - 1);
maxX = maxX & ~(BlockSize - 1);
minY = minY & ~(BlockSize - 1);
maxY = maxY & ~(BlockSize - 1);
Inside the for loops we compute the EdgeData
for the four corners and test
each corner for containment in the triangle. If all are outside we can safely
reject the whole block. If all are inside we can rasterize a fully covered block
with rasterizeBlock<false>
where the template parameter tells if the edge
equations should be computed or not. Since the block is fully covered the edge
equations will not need to be computed and we can pass false. If the block is
partially covered we call rasterizeBlock<true>
to make that function also
compute the edge equations.
float s = (float)BlockSize - 1;
// Add 0.5 to sample at pixel centers
for (float x = minX + 0.5f, xm = maxX + 0.5f; x <= xm; x += BlockSize)
for (float y = minY + 0.5f, ym = maxY + 0.5f; y <= ym; y += BlockSize)
{
// Test if block is inside or outside triangle or touches it
EdgeData e00; e00.init(eqn, x, y);
EdgeData e01 = e00; e01.stepY(eqn, s);
EdgeData e10 = e00; e10.stepX(eqn, s);
EdgeData e11 = e01; e11.stepX(eqn, s);
bool e00_0 = eqn.e0.test(e00.ev0), e00_1 = eqn.e1.test(e00.ev1), e00_2 = eqn.e2.test(e00.ev2), e00_all = e00_0 && e00_1 && e00_2;
bool e01_0 = eqn.e0.test(e01.ev0), e01_1 = eqn.e1.test(e01.ev1), e01_2 = eqn.e2.test(e01.ev2), e01_all = e01_0 && e01_1 && e01_2;
bool e10_0 = eqn.e0.test(e10.ev0), e10_1 = eqn.e1.test(e10.ev1), e10_2 = eqn.e2.test(e10.ev2), e10_all = e10_0 && e10_1 && e10_2;
bool e11_0 = eqn.e0.test(e11.ev0), e11_1 = eqn.e1.test(e11.ev1), e11_2 = eqn.e2.test(e11.ev2), e11_all = e11_0 && e11_1 && e11_2;
int result = e00_all + e01_all + e10_all + e11_all;
// Potentially all out.
if (result == 0)
{
// Test for special case.
bool e00Same = e00_0 == e00_1 == e00_2;
bool e01Same = e01_0 == e01_1 == e01_2;
bool e10Same = e10_0 == e10_1 == e10_2;
bool e11Same = e11_0 == e11_1 == e11_2;
if (!e00Same || !e01Same || !e10Same || !e11Same)
PixelShader::template drawBlock<true>(eqn, x, y);
}
else if (result == 4)
{
// Fully Covered
rasterizeBlock<false>(eqn, x, y);
}
else
{
// Partially Covered
rasterizeBlock<true>(eqn, x, y);
}
}
}
Block rasterization also becomes simple with the help of our utility classes and methods. We compute the edge and pixel values for the top-left corner of the block and step them along the x and y direction to avoid having to compute the whole equation per pixel.
template <bool TestEdges>
void rasterizeBlock(const TriangleEquations &eqn, float x, float y)
{
PixelData po;
po.init(eqn, x, y);
EdgeData eo;
if (TestEdges)
eo.init(eqn, x, y);
for (float yy = y; yy < y + BlockSize; yy += 1.0f)
{
PixelData pi = po;
EdgeData ei;
if (TestEdges)
ei = eo;
for (float xx = x; xx < x + BlockSize; xx += 1.0f)
{
if (!TestEdges || ei.test(eqn))
{
int rint = (int)(pi.r * 255);
int gint = (int)(pi.g * 255);
int bint = (int)(pi.b * 255);
Uint32 color = SDL_MapRGB(m_surface->format, rint, gint, bint);
putpixel(m_surface, (int)xx, (int)yy, color);
}
pi.stepX(eqn);
if (TestEdges)
ei.stepX(eqn);
}
po.stepY(eqn);
if (TestEdges)
eo.stepY(eqn);
}
}
Conclusion
Our block based approach works and it is definitely faster then the simple approach from the first part.
Other improvements yet to come are the support of texture coordinates and other per vertex parameters, perspective correct parameter interpolation, multi-threaded rasterization and a pixel shader framework that allows us to configure the per pixel operations in a flexible manner to support texture mapping, alpha blending and other stuff. These are topics that will be covered in the next posts.
Continue on Part 3.
Checkout the source at my github repo.