[yt-dev] io chunks in yt-3.0

Matthew Turk matthewturk at gmail.com
Tue May 13 04:56:04 PDT 2014


On Mon, May 12, 2014 at 2:26 PM, Britton Smith <brittonsmith at gmail.com> wrote:
> Hi Matt,
>
> Sure, I'm doing the following with the rockstar_halos data on
> yt-project.org/data:
>
> pf = load("rockstar_halos/halos_0.0.bin")
> dd = pf.all_data()
> print dd["particle_mass"].size
> for chunk in dd.chunks([], "io"):
>     print chunk["particle_mass"].size
>
> This dataset has two files, but running this, I only get one chunk with the
> same size as the all_data container.

I've looked into it, and this is my fault for how the particle
datasets are currently set up.  I knew this a little while ago, but
it's been long enough since I last looked at it that I'd forgotten.
Right now, this is the *biggest* problem with particle datasets: there
is no support for sub-selecting data components as-is.  There are two
things to note:

1) If you select a small region, it will not read all the data files
-- it knows which ones to read.
2) There's only one chunk, regardless of what you select.
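To make the intended behavior concrete, here is a hypothetical sketch
(not yt's actual internals) of what per-file "io" chunking would look
like, with plain lists standing in for on-disk particle files; the
function name and selector argument are made up for illustration:

```python
def iter_io_chunks(data_files, selector=None):
    """Yield one chunk of particles per data file.

    If a selector is given, apply it per-file so files whose particles
    fall entirely outside the selection are skipped -- this is the
    behavior point 1) above describes.
    """
    for particles in data_files:
        if selector is not None:
            particles = [p for p in particles if selector(p)]
        if particles:  # skip files with no selected particles
            yield particles

# Two "files" of particle masses; the bug report above is that today
# these come back as a single chunk rather than one chunk per file.
files = [[1.0, 2.0, 3.0], [4.0, 5.0]]
chunks = list(iter_io_chunks(files))
print([len(c) for c in chunks])  # one chunk per file: [3, 2]
```

With chunking like this, a consumer such as the HaloCatalog loop only
ever holds one file's worth of particles in memory at a time.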

This is because right now, everything lives inside the single global
octree.  This is something I worked on last fall but didn't finish.
Essentially, what needs to happen is to convert the single global
octree into a forest of octrees, similar to how the RAMSES, ART, and
ARTIO frontends are set up in yt.  The reason this is somewhat harder
for particle datasets is that many of the operations we perform on
them (specifically SPH ones) require neighboring particles, so the
selection needs to be somewhat fuzzier to ensure boundary particles
are included.
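A minimal 1D sketch of that "fuzzier" selection, with made-up names:
the selection region gets padded by a margin so that neighbor
particles just outside the region are still read in.

```python
def select_with_margin(positions, left, right, margin):
    """Return indices of particles in [left, right], padded by margin.

    With margin=0 this is a strict region selection; a positive margin
    pulls in boundary particles that an SPH neighbor search would need.
    """
    return [i for i, x in enumerate(positions)
            if left - margin <= x <= right + margin]

positions = [0.1, 0.45, 0.5, 0.55, 0.9]
strict = select_with_margin(positions, 0.5, 0.5, 0.0)  # [2]
fuzzy = select_with_margin(positions, 0.5, 0.5, 0.1)   # [1, 2, 3]
```

In a forest-of-octrees layout, each per-file octree would answer this
padded query independently, which is what makes per-file chunking
compatible with SPH neighbor operations.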

This is something that's essentially ready to go, but when we started
pushing for 3.0, I put it off on my todo list until 3.1.  It's going
to be a major blocker before long -- we're already hitting that point
-- so the sooner we get to 3.0, the sooner we can do this.  I'm in NYC
this week, so we can talk in person and maybe figure out a game plan
for implementing this, if you want to have a go.

-Matt

>
> Britton
>
>
> On Mon, May 12, 2014 at 2:19 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hi Britton,
>>
>> On Mon, May 12, 2014 at 2:16 PM, Britton Smith <brittonsmith at gmail.com>
>> wrote:
>> > Hey all,
>> >
>> > I'm working on changing how HaloCatalog objects loop over halos, from
>> > a model where every processor has to hold the entire halo list to one
>> > in which we loop over "io" chunks.  This will make the HaloCatalog
>> > scale much better to extremely large catalogs.  My understanding was
>> > that an "io" chunk was essentially one file on disk.  However, when I
>> > try to get io chunks from any one of the various halo catalog
>> > frontends, there only ever seems to be a single chunk containing
>> > everything, regardless of how many files on disk the data is spread
>> > over.  Is there a way to change this so that an io chunk represents
>> > the data from a single file?
>>
>> That sounds like a bug; IO chunking should be one file at a time.  Can
>> you show me how you're checking?
>>
>> -Matt
>>
>> >
>> > Britton
>> >
>> > _______________________________________________
>> > yt-dev mailing list
>> > yt-dev at lists.spacepope.org
>> > http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>> >
>
>
>
