[yt-users] Performance when using derived fields

Nathan Goldbaum nathan12343 at gmail.com
Thu Nov 27 12:53:17 PST 2014


I've reproduced similar timings using a FLASH dataset that I think is
smaller than the one you're working with.  Follow the issue I've created to
track this here:

https://bitbucket.org/yt_analysis/yt/issue/944/slices-of-fields-that-need-ghost-zones-are

Thanks for the report, Dan!  Now that we have a test script that shows
similar timings, we should hopefully be able to make this faster.
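
In the meantime, the timing difference is easy to see with a short script
along these lines -- a rough sketch using one of the sample datasets from
yt-project.org/data, not the exact script attached to the issue:

    import time
    import yt

    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")

    for field in ["density", "velocity_divergence"]:
        t0 = time.time()
        slc = ds.slice("z", 0.5)
        slc[field]  # force the field to actually be read/generated
        print("%s: %.1f s" % (field, time.time() - t0))

The density slice only has to read data off disk, while velocity_divergence
also has to generate ghost zones for every grid the slice touches, which is
where the extra time goes.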

On Thu, Nov 27, 2014 at 11:39 AM, Kacper Kowalik <xarthisius.kk at gmail.com>
wrote:

> On 11/27/2014 01:31 PM, Matthew Turk wrote:
> > Hi Dan (and Nathan),
> >
> > I take that back; I was looking at the wrong file.  IO is a very small
> > portion of the time -- a lot of it is spent in units-related
> > activities.  I'm going to dig in to see where the time is spent, but I
> > think there is likely some location where a lot of teeny tiny unit
> > operations are happening over and over, and that's killing the
> > performance.  It looks like it's in sanitize_units_add and
> > sanitize_units_mul, which get called in data_containers.py (line 549)
> > during selector creation.
> >
> > Following that further down, I believe the issue is inside the
> > RegionSelector, which gets generated a ton of times and which does all
> > of its operations with units attached, even after converting the values
> > to the units it knows they should be in.  I think this might be fixable
> > by using the .d attributes inside the RegionSelector __init__.
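> >
> > As a rough illustration (just a sketch with made-up values, not the
> > actual selector code), the difference is between doing comparisons on
> > YTArrays, which go through the unit machinery on every call, and doing
> > them on the plain ndarrays exposed by .d:
> >
> > import numpy as np
> > from yt.units.yt_array import YTArray
> >
> > left_edge = YTArray([0.25, 0.25, 0.25], "cm")
> > right_edge = YTArray([0.75, 0.75, 0.75], "cm")
> >
> > # unitful comparison: pays for unit sanitization on every call
> > np.all(left_edge < right_edge)
> >
> > # .d gives the underlying ndarray, so code that already knows the
> > # units are consistent can skip the unit checks entirely
> > np.all(left_edge.d < right_edge.d)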
> >
> > Dan, if you get an example of this working with the sample datasets on
> > the website, that would be very useful to work on.
> >
> > Kacper, what do you think about adding speed tests to the CI system?
>
> It's already there; see, for example, [1]. There's one "but": we would
> need a pretty decent box to make those kinds of tests meaningful. I bet
> that 99.99% of the fluctuation in [1] is caused by buildbox load rather
> than code changes.
>
> Cheers,
> Kacper
>
> [1]
>
> http://tests.yt-project.org/job/py2.7-yt-3.0_answer.enzo.moving7/lastCompletedBuild/testReport/history/
>
> > -Matt
> >
> > On Thu, Nov 27, 2014 at 2:17 PM, Matthew Turk <matthewturk at gmail.com> wrote:
> >> Hi Dan,
> >>
> >> Looking this over, I think the problem is definitely that the ghost
> >> zone generation is currently quite stupid in 3.0.  We're revving up
> >> for a 3.1 release, but I think for 3.2, or even within the 3.1 cycle,
> >> we can aim to have this fixed; I've implemented some simple caching
> >> that should speed it up, but it presently has some issues that keep it
> >> from being fully functional and effective.
> >>
> >> One issue that we've had lately (well, I say "we," but I'm the most
> >> guilty) is keeping track of things and ensuring that blockers and
> >> important new features are addressed in a reasonable time.  This has
> >> come up repeatedly in the user survey that's currently out.  You've
> >> been in IRC lately, and I wonder if maybe you could help us keep track
> >> of this by reminding me once in a while if it slips off the radar?  It
> >> seems the bug tracker is not the best place for this, and the emphasis
> >> that comes from mentioning and communicating things directly can
> >> really help to prioritize them.
> >>
> >> -Matt
> >>
> >> On Thu, Nov 27, 2014 at 10:26 AM, Daniel Fenn <dsfenn at gmail.com> wrote:
> >>> Hi Nathan,
> >>>
> >>> I've attached the HTML output of pyprof2html. I did it for two cases:
> >>> a large file on a fast disk, and a somewhat smaller file on a slower
> >>> disk. In both cases, I only used one processor. The output is not very
> >>> meaningful to me, since I'm not familiar with the code, and I'm not
> >>> much of a Python programmer.
> >>>
> >>> I'll work on doing this with one of the yt datasets, and let you know
> >>> how it goes.
> >>>
> >>> Thanks for your help.
> >>>
> >>> Dan
> >>>
> >>> 2014-11-27 0:08 GMT-05:00 Nathan Goldbaum <nathan12343 at gmail.com>:
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Nov 26, 2014 at 8:09 PM, Daniel Fenn <dsfenn at gmail.com> wrote:
> >>>>>
> >>>>> Hi everyone. Is it normal for slices to take substantially longer to
> >>>>> be created if they're made from a derived field rather than an
> >>>>> intrinsic one?
> >>>>>
> >>>>> Specifically, I'm having an issue creating an axis-aligned slice
> >>>>> using the divergence of the velocity field. It's taking around 6.5
> >>>>> minutes just to make the slice, whereas if I use temperature or
> >>>>> density, it takes around 10 seconds or so for the same dataset.
> >>>>
> >>>>
> >>>> Velocity divergence is particularly expensive because it needs ghost
> >>>> zones, which must be filled in using cascading interpolation.
> >>>> Depending on the AMR structure in your data, this can be quite costly.
> >>>>
> >>>> That said, 6.5 minutes for a slice is pretty bad.
> >>>>
> >>>> Is there any chance you can profile the execution of your script? If
> >>>> you run your script like so:
> >>>>
> >>>> $ python -m cProfile myscript.py
> >>>>
> >>>> you should get a summary printed to stdout of where the interpreter is
> >>>> spending most of its time. You can also visualize this in more depth
> >>>> if you dump the profile data to a file and then use pyprof2html or
> >>>> snakeviz to load and display the data, e.g.:
> >>>>
> >>>> $ python -m cProfile -o out.cprof myscript.py
> >>>> $ pyprof2html out.cprof
> >>>>
> >>>> cProfile comes with Python, but pyprof2html and snakeviz must be
> >>>> installed separately using pip.
> >>>>
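> >>>> If you'd rather stay in the terminal, the dumped profile can also be
> >>>> inspected with the standard-library pstats module -- just a quick
> >>>> sketch:
> >>>>
> >>>> import pstats
> >>>> p = pstats.Stats("out.cprof")
> >>>> p.sort_stats("cumulative").print_stats(20)
> >>>>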
> >>>> It would be great if you could file an issue about this.  Even better
> >>>> if you can reproduce it using one of the test datasets on
> >>>> yt-project.org/data and attach a script that we can run ourselves.
> >>>>
> >>>>>
> >>>>>
> >>>>> I also notice that the amount of time it takes is not dependent on
> >>>>> the number of processors I'm using. I've used 1, 12, and 24
> >>>>> processors, with identical results, even though I'm calling
> >>>>> enable_parallelism(), and I can see that all the processes are
> >>>>> running.
> >>>>>
> >>>>> I read in the docs that slice operations aren't generally done in
> >>>>> parallel, but in this case it seems that maybe it would be
> >>>>> beneficial. A similar operation in VisIt completes much faster, so
> >>>>> I'm wondering if I've misconfigured something, or if there is
> >>>>> something I can do to speed things up.
> >>>>>
> >>>>> I'd appreciate any thoughts anyone has on the subject.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Dan
> >>>>>