Add faster, optional squaring routine for internal LibTomMath