I use quaternions if I need to span the whole space (i.e. my initial estimate could be far off the answer). If finding a solution using optimization, e.g.
R* =argmin_(R∈SO(3) ) norm(f(R))^2
Then given the function quat2mat which converts to SO(3), I optimize the function
g(q) = f(quat2mat(q/norm(q)))
The division by norm doesn't actually make the derivatives much worse, and is much better than optimization subject to |q|=1. I tend not to worry too much about the gauge freedom g(q) = g(λq), but one should renormalize q after each descent step/linesearch.
If I believe I have a decent initial estimate R_init (sounds like your problem), then I tend to use the exponential map
R=R_init exp([a]_× )
where exp is the matrix exponential, [v]_× is the 3×3 matrix which effects cross-product by v, and a is, oddly, the rotation axis multiplied by the angle of rotation. http://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula
It has pleasant derivatives, its singularities live far from the origin, and it’s 3 parameters, so you don’t need to get inside the optimizer to renormalize.
But of course, for best behaviour, you do need to rebase inside the optimizer to ensure that each iteration computes an update in the tangent plane (you'll need to read up for this, starting at http://lear.inrialpes.fr/pubs/2000/TMHF00/Triggs-va99.pdf perhaps.)
Update: http://awful.codeplex.com/SourceControl/latest#examples/optimization_tutorial/test_rodrigues.m plots some graphs that suggest rebased exponential map does reduce average iteration counts.