[yt-users] Problems reading GADGET 2 Binary datafile using yt

Wed Feb 22 11:12:06 PST 2017

On Wed, Feb 22, 2017 at 12:17 PM, Alankar Dutta <dutta.alankar at gmail.com>
wrote:

> Hello,
>
> The entire snapshot file is huge (around 1 TB) so I am attaching only the
> first part of it as of now via Google Drive. Hope this helps. I can share
> it via other means if necessary.
>

Hmm, there seems to be an issue when yt tries to parse the header and
validate that everything is consistent.

You can see the affected code block in yt here:

https://bitbucket.org/yt_analysis/yt/src/abf5a8eff1b2d0cd776a41a33e5d3f3d25232ecc/yt/frontends/gadget/data_structures.py?at=yt&fileviewer=file-view-default#data_structures.py-340

This function is called (indirectly) by yt.load() to verify that a given
file is *really* a Gadget binary file. When I follow along with the
execution of that function for the file you sent me, I find that np0 is
20736113 while np1 is 41472226.0 (i.e. exactly twice np0). The first
integer, np0, is read in directly from the header of the binary file and
corresponds to the number of particles in the dataset as written out to the
header by Gadget. The second, np1, is the number of particles in the
dataset inferred from the size of the position block. The fact that the
inferred count is exactly twice the number of particles we expect probably
indicates that there's a single/double precision issue. In fact, yt
expects each position value to require 4 bytes (i.e. 32 bit, or single
precision), so I infer that your file contains double precision positions
(i.e. 8 bytes per value).
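
To make the check above concrete, here is a minimal sketch of the same
comparison done by hand. It is not yt's code: read_counts,
bytes_per_coordinate and fake_snapshot are hypothetical names, and the
sketch assumes a little-endian SnapFormat=1 file with 4-byte Fortran record
markers, a 256-byte header whose first six 32-bit integers are the per-type
particle counts, and a position block that comes right after the header.

```python
import io
import struct

HEADER_BYTES = 256  # assumed SnapFormat=1 header block size


def read_counts(f, endian="<"):
    """Return (particle count from the header, position block size in bytes)."""
    (marker,) = struct.unpack(endian + "I", f.read(4))
    if marker != HEADER_BYTES:
        raise ValueError("unexpected header record marker: %d" % marker)
    header = f.read(HEADER_BYTES)
    npart = struct.unpack(endian + "6I", header[:24])  # per-type counts
    f.read(4)  # trailing record marker of the header block
    # Leading record marker of the next block gives its size in bytes.
    (pos_bytes,) = struct.unpack(endian + "I", f.read(4))
    return sum(npart), pos_bytes


def bytes_per_coordinate(n_particles, pos_block_bytes):
    """Infer 4 (single) or 8 (double) bytes per position value."""
    if n_particles <= 0:
        raise ValueError("need a positive particle count")
    size, remainder = divmod(pos_block_bytes, 3 * n_particles)
    if remainder or size not in (4, 8):
        raise ValueError("position block size does not match the header")
    return size


def fake_snapshot(n_particles, coord_bytes):
    """Hypothetical helper: build a tiny in-memory file for the demo."""
    header = struct.pack("<6I", 0, n_particles, 0, 0, 0, 0)
    header = header.ljust(HEADER_BYTES, b"\0")
    pos = bytes(3 * n_particles * coord_bytes)  # zero-filled positions
    return io.BytesIO(
        struct.pack("<I", HEADER_BYTES) + header + struct.pack("<I", HEADER_BYTES)
        + struct.pack("<I", len(pos)) + pos + struct.pack("<I", len(pos))
    )


n, nbytes = read_counts(fake_snapshot(5, 8))
# Header count, count inferred assuming 4-byte values, actual bytes per value
print(n, nbytes // 12, bytes_per_coordinate(n, nbytes))  # prints "5 10 8"
```

With double precision positions the count inferred under the 4-byte
assumption comes out at exactly twice the header count, which is the same
pattern as the np0 and np1 values quoted above.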

So, all that to say, it looks like we would need to patch the Gadget
frontend to support your output type, which seems to have double precision
fields, which is a little bit different from the other Gadget binary
outputs we've seen in the past. These sorts of issues with Gadget are
unfortunately somewhat common due to the fragmentation in the Gadget
ecosystem, with many research groups maintaining mutually incompatible
versions of Gadget.

Note that if you look at page 32 in the Gadget user guide (
https://wwwmpa.mpa-garching.mpg.de/gadget/users-guide.pdf), this *is* a bit
different from the output format documented there, which specifies single
precision positions.

To add support for this output type we'd need to start with an example
smallish (<5 GB) dataset in this format that we can add as a public test
dataset on yt-project.org/data. Once that's available, we can patch the
Gadget frontend to support this output type.

Finally, you mentioned that your dataset is pretty large (~1 TB).
Unfortunately, yt will currently have trouble scaling to datasets that
large. yt makes use of a global octree for indexing and managing I/O
chunking, and for datasets larger than about 1024^3 particles that octree
index requires a substantial amount of RAM.

I am currently actively working on improving yt's scaling for large
particle datasets. This is a major development effort that will likely be
included in either the next major release of yt or the one after that.
Unfortunately I think you will have lots of issues trying to get yt's
current particle support to work well with as big of a dataset as you need
to work with and you will likely need to wait until the development effort
I'm working on is publicly available. I'd encourage you to sign up to the
yt-dev mailing list if you want to hear more about this effort. I will be
sharing a design document there describing the changes to yt that will be
necessary to improve scaling for particle data in the next week or two.

I hope that's helpful,

Nathan Goldbaum

> Cheers,
> Alankar
>  snapshot_068.0
> <https://drive.google.com/file/d/0B6IIQdUdRX9UN3dWRkdLWWxjNjg/view?usp=drive_web>
> 
>
> On Wed, Feb 22, 2017 at 10:37 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>
>> We should have support for outputs in the SnapFormat=1 output format in
>> the latest release of yt. If you're not using the latest version of yt,
>> please try updating. If it's a multi-file dataset, you should load the 0th
>> file.
>>
>> If that is not working, it would help to debug the issue if you can share
>> an output file that isn't loading. The easiest way to share an output is to
>> use the yt curldrop:
>>
>> https://docs.hub.yt/services.html#curldrop
>>
>> If you're not comfortable sharing the file publicly you can mail me
>> off-list with the link to the output file.
>>
>> Hope that helps,
>>
>> Nathan
>>
>> On Wed, Feb 22, 2017 at 10:49 AM, Alankar Dutta <dutta.alankar at gmail.com>
>> wrote:
>>
>>> Hello YT-community,
>>>
>>> These outputs were created with the SnapFormat parameter set to 1. This
>>> is a requirement mentioned in the yt user's guide.
>>>
>>> Cheers,
>>> Alankar
>>>
>>> On Wed, Feb 22, 2017 at 10:02 PM, Alankar Dutta <dutta.alankar at gmail.com>
>>> wrote:
>>>
>>>> Hello YT-community,
>>>>
>>>> I have been trying to use yt for analysis of the output from a GADGET 2
>>>> simulation stored as an Unformatted Fortran Binary. It consists of files
>>>> named as snapshot_068 which is divided into 1024 subfiles named as
>>>> snapshot_068.0, snapshot_068.1 and so on. Whenever I am loading this with
>>>> yt I am getting the following error message and I have got no idea as to
>>>> how to fix this. I have also tried reading only one subfile of this multi
>>>> part snapshot but with no success. I am relying on the community to help me
>>>> in this regard.
>>>>
>>>> My code:
>>>>
>>>> fname = 'snapdir_068/snapshot_068'
>>>> ds = yt.load(fname)
>>>>
>>>> Error displayed:
>>>>
>>>> yt : [ERROR    ] 2017-02-22 21:55:34,587 None of the arguments provided
>>>> to load() is a valid file
>>>> yt : [ERROR    ] 2017-02-22 21:55:34,587 Please check that you have
>>>> used a correct path
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File "/home/alankar/anaconda3/lib/python3.5/site-packages/yt/convenience.py",
>>>> line 76, in load
>>>>     raise YTOutputNotIdentified(args, kwargs)
>>>> yt.utilities.exceptions.YTOutputNotIdentified: Supplied
>>>> ('snapshot_068',) {}, but could not load!
>>>>
>>>>
>>>>
>>>>
>>>> #Trying to read only one of the multi part file
>>>> My code:
>>>>
>>>> fname = 'snapdir_068/snapshot_068.0'
>>>> ds = yt.load(fname)
>>>>
>>>> Error displayed:
>>>>
>>>> yt : [ERROR    ] 2017-02-22 21:57:17,625 Couldn't figure out output
>>>> type for /media/alankar/Seagate Expansion Drive/mb2/snapshots/snapdir_068/snapshot_068.0
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File "/home/alankar/anaconda3/lib/python3.5/site-packages/yt/convenience.py",
>>>> line 98, in load
>>>>     raise YTOutputNotIdentified(args, kwargs)
>>>> yt.utilities.exceptions.YTOutputNotIdentified: Supplied
>>>> ('snapshot_068.0',) {}, but could not load!
>>>>
>>>>
>>>> Cheers,
>>>> Alankar
>>>>
>>>>
>>>
>>> _______________________________________________
>>> yt-users mailing list
>>> yt-users at lists.spacepope.org
>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>
>>>