# Archive for May, 2009

### Scale and 32-Bit Imprecision

Over the last few months, I have made quite a few posts and answered a few questions about the issue of imprecision, and more importantly how I overcame it. So I have decided to write an article, so I can always link back to it.

There are a few ways of overcoming this problem, but just to be sure what we’re talking about, let me explain how this imprecision occurs. The normal way to store coordinates and positions in XNA (and DirectX for that matter) is to use 32-bit floating point numbers, for example when you use the Vector2 and Vector3 structs. This is the best way to work, especially considering that most of today’s GPUs do not support 64-bit floating point values (doubles), and it is more than enough to render normal scenes. The problem is that a 32-bit float only carries about 7 significant decimal digits (it is doubles that give you around 15), so once a value needs more than 6 or 7 digits, the smallest digits simply get rounded away. Consider the following:

1000000 + 0.01 comes out as 1000000

16777216 + 1 comes out as 16777216
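
To see this kind of rounding for yourself, here is a minimal C# sketch. The class and method names (`FloatDemo`, `Add`, `IsAbsorbed`) are made up for illustration and are not part of XNA. It simply checks whether a small increment disappears when added to a larger 32-bit float.

```csharp
using System;

// Hypothetical helper for poking at 32-bit float rounding; not an XNA type.
public static class FloatDemo
{
    // Adds two floats and explicitly narrows the result back to 32 bits,
    // so no extra intermediate precision can hide the rounding.
    public static float Add(float a, float b)
    {
        return (float)(a + b);
    }

    // True when adding 'increment' leaves 'value' unchanged, i.e. the
    // increment is smaller than what a float can resolve at that magnitude.
    public static bool IsAbsorbed(float value, float increment)
    {
        return Add(value, increment) == value;
    }
}
```

With this, `FloatDemo.IsAbsorbed(1000000f, 0.01f)` and `FloatDemo.IsAbsorbed(16777216f, 1f)` both return true, while `FloatDemo.IsAbsorbed(1f, 0.01f)` returns false: the same increment survives near zero and vanishes far from it.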

Obviously, if you are trying to navigate the camera very far away from (0,0,0) then you will start to encounter problems. This becomes a real problem if, say, you are making a space shooter with realistic distances between planets, or a medieval trading game with a planet the size of Earth. In the latter case, when you get down to the meter level of the planet, the vertices start to shake.

The symptoms to look for if you suspect floating point imprecision are randomly ‘shaking’ vertices and problems with navigating the camera. This is obviously a byproduct of these miscalculated values as shown above.

After doing some searching on the internet, it is quite apparent that the two main ways of overcoming this are to either:

a) Represent all object positions using fp64 values (doubles).

b) Use fixed point numbers.

After looking around, I decided to use the first option, so this is what I will outline below. The biggest problem with (a) is that the GPU does not support fp64 values, so before rendering we must first convert everything back to fp32 values. A lot of people actually suggested that fixed point numbers (b) are the way forward, but the implementation seemed a lot harder than using doubles. The two biggest arguments for fixed point were that 1) most older computers and resource-limited devices do not have dedicated FPUs for floating point calculations, and 2) using doubles doesn’t actually solve the problem. The first point isn’t really an issue for today’s gaming machines, and although doubles only push the problem further out rather than removing it, they are certainly good enough to use, as I will explain below.

**Using fp64 doubles for positions:**

So this was my chosen method for Britonia. But what does switching to fp64 doubles actually do for us? Well, it allows us to do calculations on values with around 15 significant digits. This is enough precision to represent the space between the Sun and Pluto with a resolution down to the centimeter level, which should solve the majority of our arithmetic problems. (If you are creating a universe, you would have to create different ‘spaces’ where 1 unit has different values. For example, at the galaxy level 1 unit may be 1 light year, and at the solar system level 1 unit may be 1 meter, etc.)
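
As a rough sanity check on the Sun-to-Pluto claim, the sketch below measures the gap between one double and the next representable double at a given magnitude. `DoubleStep` is a hypothetical name, and the figure of 5.9e12 meters for the Sun-Pluto distance is my own ballpark assumption, not a number from any API.

```csharp
using System;

// Hypothetical helper: how finely can a double resolve positions at a given magnitude?
public static class DoubleStep
{
    // Returns the gap between 'value' and the next larger representable double.
    // Assumes 'value' is positive and finite.
    public static double StepAt(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        double next = BitConverter.Int64BitsToDouble(bits + 1);
        return next - value;
    }
}
```

`DoubleStep.StepAt(5.9e12)` returns 0.0009765625, so with 1 unit = 1 meter a double can still resolve about a millimeter at that distance, which leaves some headroom over the centimeter level.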

Okay, now using 64-bits solves the arithmetic problem, but there is still the problem that the GPU doesn’t support fp64. This means we have to convert the 64-bit values into 32-bit values before sending anything off to the GPU, but if we are sending massive fp32 values to the GPU, it is also going to run into imprecision problems when transforming the vertices to clip/screen space etc.

The idea here is pretty simple. The problem of imprecision occurs when numbers get either too large or too small as we saw above. Imagine we have a ship at 15000010 and the camera is at 15000040 looking down at the ship (In 1D space). We store the positions in fp64 doubles, which enables us to calculate and apply velocity or pan the camera around the ship or whatever, all with 64-bit precision. But we still need to convert the positions into useable fp32 values. To do this, instead of treating 0 (or (0, 0, 0) in 3D space) as the origin of the universe, we use the camera’s current position as the origin. Then we calculate the positions of all the objects relative to the camera (as this is now the origin). So, for the ship we would do:

15000010 – 15000040 // ship pos minus cam position

= -30.

This means that if we set the camera position to 0 the relative ship position will be -30. As you can see, this effectively solves the imprecision from large numbers.
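
The whole numbers in the ship example happen to fit in a float exactly, so the difference only shows up once the positions have fractional parts. Here is a small sketch of the same 1D case, comparing "convert to float, then subtract" with "subtract in double, then convert". The names (`RelativePosition1D`, `NaiveFloat`, `CameraRelative`) and the fractional positions are my own, chosen for illustration.

```csharp
using System;

// Hypothetical 1D comparison of the two orders of operation.
public static class RelativePosition1D
{
    // Naive approach: narrow both world positions to float first, then subtract.
    // Near 15,000,000 a float can only step in whole units, so fractions are lost
    // before the subtraction ever happens.
    public static float NaiveFloat(double shipPos, double camPos)
    {
        float ship32 = (float)shipPos;
        float cam32 = (float)camPos;
        return (float)(ship32 - cam32);
    }

    // Camera-relative approach: subtract in double, then narrow the small result.
    public static float CameraRelative(double shipPos, double camPos)
    {
        return (float)(shipPos - camPos);
    }
}
```

For a ship at 15000010.25 and a camera at 15000040.75, `NaiveFloat` returns -31 while `CameraRelative` returns -30.5. Half a unit of error on a 30-unit offset is exactly the kind of thing that shows up as shaking vertices.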

So, from our example above, we would calculate the 3d positional data of a scene as follows:

    fp64Vector3 loCameraPosition = new fp64Vector3(0, 1500013002, 200);
    fp64Vector3 loObjectPosition = new fp64Vector3(0, 1500011002, 0);

    // Get the object position relative to the camera position.
    fp64Vector3 loRelativeObjectPosition = loObjectPosition - loCameraPosition;

    // Convert this value into an fp32 value.
    Vector3 lo32bitPosition = new Vector3(
        (float)loRelativeObjectPosition.X,
        (float)loRelativeObjectPosition.Y,
        (float)loRelativeObjectPosition.Z);

    // Use the 32-bit position data to create a world matrix for the object,
    // which can be sent to the GPU for rendering. Note that the rotations can be
    // calculated like normal with 32-bit floats, as rotations here are local to
    // the object. (Zero yaw/pitch/roll is just a placeholder rotation.)
    Matrix loWorld = Matrix.CreateFromYawPitchRoll(0f, 0f, 0f)
                   * Matrix.CreateTranslation(lo32bitPosition);

And from here you can render the object as you would any other object in XNA. Just pass the new world matrix for this object to the shader, set the camera position to Vector3.Zero, create a new View Matrix and render the model.

    // Create a new view matrix and treat the camera as the centre of the world.
    Matrix loViewMatrix = Matrix.CreateLookAt(Vector3.Zero, Camera.Forward, Camera.Up);

    Effect.Parameters["xWorld"].SetValue(loWorld);                // world position relative to the camera
    Effect.Parameters["xView"].SetValue(loViewMatrix);            // new view matrix with cam pos set to zero
    Effect.Parameters["xProjection"].SetValue(Camera.Projection); // just a normal projection matrix

    // Render the model with the new view matrix, with the camera position set as Vector3.Zero.
    Model.Render();
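
The snippets above rely on an fp64Vector3 type whose implementation isn’t shown here. For anyone who wants a starting point, this is a minimal sketch of what such a type might look like. `Double3` and its members are hypothetical names, and I’m using `System.Numerics.Vector3` as a stand-in for XNA’s `Vector3` so the sketch compiles without XNA installed; the real type in Britonia may well look different.

```csharp
using System.Numerics; // stand-in for Microsoft.Xna.Framework in this sketch

// Hypothetical double-precision 3D vector for world-space positions.
public readonly struct Double3
{
    public readonly double X;
    public readonly double Y;
    public readonly double Z;

    public Double3(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Component-wise arithmetic, carried out entirely in 64-bit precision.
    public static Double3 operator +(Double3 a, Double3 b)
        => new Double3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);

    public static Double3 operator -(Double3 a, Double3 b)
        => new Double3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);

    // Narrows to 32-bit floats. Only meaningful for values that are already
    // small, such as camera-relative offsets.
    public Vector3 ToVector3()
        => new Vector3((float)X, (float)Y, (float)Z);

    // Position of this point with the camera treated as the origin,
    // ready to feed into a translation matrix.
    public Vector3 RelativeTo(Double3 cameraPosition)
        => (this - cameraPosition).ToVector3();
}
```

Using the positions from the example, `new Double3(0, 1500011002, 0).RelativeTo(new Double3(0, 1500013002, 200))` gives (0, -2000, -200), which is comfortably within float range. Keeping the narrowing in one method makes it harder to accidentally cast a raw world position to float.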

**Wrapping up:**

As mentioned above, I would just like to point out that this method doesn’t actually fix anything, as imprecision does still occur in objects which are very far away from the camera, but by using fp64 doubles, this means that any problems will occur so far away that you wouldn’t notice a difference anyway. Just think about sitting on the sun watching two ants playing chase on Pluto.
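
To put some numbers on that, here is a small sketch that reports the smallest step a 32-bit float can represent at a given distance from the camera (which is now the origin). `FloatStep` is a hypothetical name, and the distances are just sample values.

```csharp
using System;

// Hypothetical helper: resolution of a float at a given distance from the origin.
public static class FloatStep
{
    // Returns the gap between 'distance' and the next larger representable float.
    // Assumes 'distance' is positive and finite.
    public static float StepAt(float distance)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(distance), 0);
        float next = BitConverter.ToSingle(BitConverter.GetBytes(bits + 1), 0);
        return next - distance;
    }
}
```

An object 1,000,000 units from the camera can only be placed in steps of 0.0625 units, and one 1,000,000,000 units away in steps of 64 units. Those errors are real, but at that range they are far smaller than anything you could see on screen.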
