[yt-users] TotalQuantity vs. summing manually

Matthew Turk matthewturk at gmail.com
Mon Dec 23 06:26:33 PST 2013


Hi Geoffrey,

On Sun, Dec 22, 2013 at 7:51 PM, Geoffrey So <gsiisg at gmail.com> wrote:
> Thanks, Matt, John, some follow-up
>
> "In any case, the difference in the total quantity is about a few percent it
> looks like. Are you attempting to quantify something with that level of
> sensitivity?"
> - I'm trying to compare star particle masses: using TotalQuantity with
> StarMassMsun versus summing the star particles' masses directly.  For small
> halos that span only a few cells, the difference can be quite high, ~O(10%).
> So from the perspective of the actual particles inside the halo, I should
> use the particle list inside the 3D object, right?  Same thing for dark
> matter particle mass?

I think it all comes down to how you want to represent a halo, the
force resolution in your simulation, and how the halos are identified.
As John noted, you're smearing your stars over a handful of cells, so
if that is what you want to examine and calculate then you are doing
it right; if your halo definition is more precise than a single cell,
then you should use the halo definition.  I am not prepared to make a
suggestion on that, as it will really depend on what you do with the
results of the analysis, how the data was generated, and how you
generate your halo definitions.
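
For concreteness, here's a minimal sketch of the two measurements being
compared, using the field names that appear elsewhere in this thread (the
sphere "sph" is assumed to exist already, and selecting star particles via
creation_time > 0.0 is my assumption):

# Particle-based: sum the star particle masses directly.
# (creation_time > 0.0 assumed to select stars; DM uses < 0.0 above.)
stars = sph['creation_time'] > 0.0
particle_sum = sph['ParticleMassMsun'][stars].sum(dtype='float64')

# Grid-based: total the star mass that has been deposited onto cells.
grid_sum = sph.quantities['TotalQuantity']('StarMassMsun')[0]

print "particle sum: %12.12e" % particle_sum
print "grid-based:   %12.12e" % grid_sum

For a halo that spans only a handful of cells, the grid-based number will
reflect the smearing John described.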

>
> "The output isn't what you might expect, because numpy uses the dtype of the
> array for the accumulation."
> - I don't quite understand why float32 accumulation with sum() would not
> equal the actual value while casting to float64 does (isn't 512^3 < 2^32,
> so no overflow should occur)?  Should I always use arr.sum(dtype='float64')
> when using sum() with possibly large datasets?

You should probably use float64 even for relatively small arrays --
the roundoff error can accumulate quite quickly.  As for the other
issue, 2^32 isn't the relevant limit for float32; you have to track
the mantissa as well as the exponent for float values, and float32
only has a 24-bit mantissa, so it cannot represent every integer
above 2^24.  You might find these of some interest:

http://www.cplusplus.com/reference/climits/
http://en.wikipedia.org/wiki/C_data_types

But often it's easier to think about things in terms of relative
differences -- the problem isn't representing big or small values,
it's representing relatively small differences.  A couple years ago I
forwarded the exchange I had on numpy-discussion about this to yt-dev;
if you look in the May 2010 archives you'll see it.
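
A quick numpy illustration of both points -- this is just a sketch of what
I'd expect from float32's 24-bit mantissa, not output pasted from a session:

import numpy

# Once a float32 running total reaches 2^24, adding 1.0 no longer changes
# it: the increment falls below the spacing of representable values at
# that magnitude.
total = numpy.float32(2.0**24)
print total + numpy.float32(1.0)   # still 16777216.0

# The same thing framed as a relative difference: float32 cannot resolve
# one part in 1e8, but float64 can.
a = numpy.float32(1.0e8)
print a + numpy.float32(1.0) == a                          # True
print numpy.float64(1.0e8) + 1.0 == numpy.float64(1.0e8)   # False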

-Matt

>
> From
> G.S.
>
>
>
> On Sat, Dec 21, 2013 at 5:43 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hi Geoffrey,
>>
>> On Fri, Dec 20, 2013 at 9:02 PM, Geoffrey So <gsiisg at gmail.com> wrote:
>> > Hi all,
>> >
>> > I found that using an Enzo dataset I was getting slightly different
>> > numbers
>> > when
>> > 1) using the list of dark matter particles selected by creation_time <
>> > 0.0
>> >
>> > In [97]: sph_dm = sph['creation_time'] < 0.0
>> > In [98]: print "%12.12e" % (sph['ParticleMassMsun'][sph_dm]).sum()
>> > 1.211311468567e+11
>> >
>> > 2) compared to summing the dark matter particles inside a 3D container
>> > with
>> > TotalQuantity
>> >
>> > In [101]: print "%12.12e" %
>> > (sph.quantities['TotalQuantity']('Dark_Matter_Density')[0]*vol/Msun)
>> > 1.188937185993e+11
>> >
>> > I'm wondering if the grid field and the particles are handled differently
>> > when being counted as inside or outside the 3D container?
>> >
>>
>> I think John's answer is completely correct and likely identifies the
>> dominant effect, but there's a funny thing that happens if you're not
>> extra careful with big arrays.  As an example, here's something I ran
>> into a couple of years ago:
>>
>> import numpy
>>
>> # This is an array filled with 512^3 ones.
>> arr = numpy.ones((512, 512, 512), dtype="float32")
>> print arr.sum()                   # accumulated in float32
>> print arr.size                    # 134217728, i.e. 512^3
>> print arr.sum(dtype="float64")    # accumulated in float64
>>
>> The output isn't what you might expect, because numpy uses the dtype
>> of the array for the accumulation.  The derived quantities in yt --
>> thanks to Doug Rudd -- are uniformly careful about upgrading their
>> accumulators to 64 bits.  Again, I don't think this is the dominant
>> effect here, but it is something to be aware of.
>>
>> -Matt
>>
>> > From
>> > G.S.
>> >


