[yt-dev] round-off error in TotalQuantity

Wed Jul 10 11:30:20 PDT 2013

On Wed, Jul 10, 2013 at 2:26 PM, Douglas Harvey Rudd <drudd at uchicago.edu> wrote:
> As a followup, I believe there is at least once instance where we would not
> want to use float64 as the accumulator: if the field itself is integer and
> the sum does not overflow.  Something like this might be better:
>
> if 'int' in str(data[field].dtype) :
>    totals.append(data[field].sum(dtype=np.int64))
> elif 'float' in str(data[field].dtype) :
>    totals.append(data[field].sum(dtype=np.float64))
> else :
>   totals.append(data[field].sum())
>

The string operators will be slower than we want here; I think we
should use something like:

if data[field].dtype in (np.int32, np.int64)

Additionally, I think in most cases where values are passed to Cython
we'll be casting them to a 64 bit number, either integer or float.
It's probably safe to assume that at some point during the computation
the values will be promoted to 64-bit, and then decompose via chunking
to avoid having too much in memory.

-Matt

> Using np.float64 in the case of integers as well would alleviate the
> overflow problem, but would lead to approximate results where the user might
> expect an exact answer.
>
> Douglas Rudd
> Scientific Computing Consultant
> Research Computing Center
> drudd at uchicago.edu
>
>
>
> On Jul 10, 2013, at 6:26 AM, Britton Smith <brittonsmith at gmail.com> wrote:
>
> Hi Doug,
>
> I don't seem any reason why we wouldn't want to fix this to always use
> higher precision.  Could you issue a PR?
>
> Britton
>
>
> On Wed, Jul 10, 2013 at 2:36 AM, Douglas Harvey Rudd <drudd at uchicago.edu>
> wrote:
>>
>> Hi all,
>>
>> I'm working on getting the ARTIO frontend working with a script used by
>> the AGORA collaboration to measure density profiles.  This script uses the
>> "TotalQuantity" quantity to sum all particle masses within a given sphere,
>> then differences a nested set of spheres to measure the mean density in
>> spherical shells.
>>
>> I noticed a 2% difference at large radius from my own profiling routines,
>> and tracked it down to the precision of the sum, which used the same
>> precision as the input variable, float32 for ARTIO. Since there are a large
>> number of particles, summing 1e5 x 10^-7 floats led to O(1%) error.
>>
>> While this is clearly not an ideal way to measure density profiles, it's
>> probably a good idea to help avoid such issues in future by changing
>> TotalQuantity (and other places, like _TotalMass) to use a 64-bit
>> accumulator, e.g.
>>         totals.append(data[field].sum(dtype=np.float64))
>>
>> The alternative is to promote all float32 input variables to float64,
>> which is wasteful of memory, or otherwise warn the user that they are likely
>> to encounter loss of precision.
>>
>> Douglas Rudd
>> Scientific Computing Consultant
>> Research Computing Center
>> drudd at uchicago.edu
>>
>>
>>
>> _______________________________________________
>> yt-dev mailing list
>> yt-dev at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>
>
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>
>
>
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>