[yt-dev] Issue #1305: Performance Issues with large BoxLib datasets (yt_analysis/yt)

Chris Byrohl issues-reply at bitbucket.org
Wed Jan 4 10:34:26 PST 2017


New issue 1305: Performance Issues with large BoxLib datasets
https://bitbucket.org/yt_analysis/yt/issues/1305/performance-issues-with-large-boxlib

Chris Byrohl:

When loading a large BoxLib dataset (~270 GB), yt hangs for hours without making any progress.

Interrupting the computation shows that it is stuck in `_reconstruct_parent_child(self)`:

```
#!python

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-14-fa86699093be> in <module>()
----> 1 box['density']

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/data_objects/data_containers.py in __getitem__(self, key)
    263         Returns a single field.  Will add if necessary.
    264         """
--> 265         f = self._determine_fields([key])[0]
    266         if f not in self.field_data and key not in self.field_data:
    267             if f in self._container_fields:

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/data_objects/data_containers.py in _determine_fields(self, fields)
    993             else:
    994                 fname = field
--> 995                 finfo = self.ds._get_field_info("unknown", fname)
    996                 if finfo.particle_type:
    997                     ftype = self._current_particle_type

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/data_objects/static_output.py in _get_field_info(self, ftype, fname)
    622     _last_finfo = None
    623     def _get_field_info(self, ftype, fname = None):
--> 624         self.index
    625         if fname is None:
    626             if isinstance(ftype, DerivedField):

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/data_objects/static_output.py in index(self)
    417                 raise RuntimeError("You should not instantiate Dataset.")
    418             self._instantiated_index = self._index_class(
--> 419                 self, dataset_type=self.dataset_type)
    420             # Now we do things that we need an instantiated index for
    421             # ...first off, we create our field_info now.

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/frontends/boxlib/data_structures.py in __init__(self, ds, dataset_type)
    144         self.directory = ds.output_dir
    145 
--> 146         GridIndex.__init__(self, ds, dataset_type)
    147         self._cache_endianness(self.grids[-1])
    148 

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/geometry/geometry_handler.py in __init__(self, ds, dataset_type)
     48 
     49         mylog.debug("Setting up domain geometry.")
---> 50         self._setup_geometry()
     51 
     52         mylog.debug("Initializing data grid data IO")

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/geometry/grid_geometry_handler.py in _setup_geometry(self)
     52 
     53         mylog.debug("Constructing grid objects.")
---> 54         self._populate_grid_objects()
     55 
     56         mylog.debug("Re-examining index")

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/frontends/boxlib/data_structures.py in _populate_grid_objects(self)
    295         mylog.debug("Creating grid objects")
    296         self.grids = np.array(self.grids, dtype='object')
--> 297         self._reconstruct_parent_child()
    298         for i, grid in enumerate(self.grids):
    299             if (i % 1e4) == 0: mylog.debug("Prepared % 7i / % 7i grids", i,

/home/uni09/cosmo/cbyrohl/anaconda3/envs/py35/lib/python3.5/site-packages/yt/frontends/boxlib/data_structures.py in _reconstruct_parent_child(self)
    311                                 self.grid_levels[i] + 1,
    312                                 self.grid_left_edge, self.grid_right_edge,
--> 313                                 self.grid_levels, mask)
    314             ids = np.where(mask.astype("bool"))  # where is a tuple
    315             grid._children_ids = ids[0] + grid._id_offset
```
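Judging from the frames above, `_reconstruct_parent_child` loops over the grids and, for each grid `i`, passes the complete `grid_left_edge`, `grid_right_edge` and `grid_levels` arrays plus a `mask` to a helper (its name is cut off in the traceback) that marks the level `grid_levels[i] + 1` grids overlapping grid `i`. If that helper scans every grid on every call, the total work grows as N² in the number of grids, which would be consistent with hours of runtime on a dataset with many boxes. The sketch below is a standalone illustration under that assumption, not yt's code: `children_naive` reproduces the full-scan pattern, and `children_sorted` (a hypothetical alternative) only compares each grid against grids one level finer and uses a sort along x so that only a contiguous slice of candidates is tested.

```python
import numpy as np


def children_naive(left_edge, right_edge, levels):
    """For every grid, scan *all* grids for overlapping level+1 grids: O(N**2)."""
    levels = np.asarray(levels).ravel()  # accept shape (N,) or (N, 1)
    children = []
    for i in range(levels.size):
        # Strict overlap in every dimension (grids that only touch do not count).
        overlap = np.all((left_edge < right_edge[i]) & (right_edge > left_edge[i]), axis=1)
        mask = overlap & (levels == levels[i] + 1)
        children.append(np.where(mask)[0])
    return children


def children_sorted(left_edge, right_edge, levels):
    """Hypothetical alternative: same result, pruned by level and by a sort along x."""
    levels = np.asarray(levels).ravel()
    children = [np.empty(0, dtype=np.intp) for _ in range(levels.size)]
    for lvl in np.unique(levels):
        parents = np.where(levels == lvl)[0]
        kids = np.where(levels == lvl + 1)[0]
        if kids.size == 0:
            continue
        # Sort candidate children by their left x edge.
        kids = kids[np.argsort(left_edge[kids, 0], kind="stable")]
        kl, kr = left_edge[kids], right_edge[kids]
        max_w = np.max(kr[:, 0] - kl[:, 0])  # widest child along x
        for p in parents:
            # An overlapping child has left_x in [parent_left_x - max_w, parent_right_x),
            # so only a contiguous slice of the sorted children can match.
            lo = np.searchsorted(kl[:, 0], left_edge[p, 0] - max_w, side="left")
            hi = np.searchsorted(kl[:, 0], right_edge[p, 0], side="left")
            overlap = np.all((kl[lo:hi] < right_edge[p]) & (kr[lo:hi] > left_edge[p]), axis=1)
            # Sort so the ids come out in ascending order, like np.where.
            children[p] = np.sort(kids[lo:hi][overlap])
    return children
```

The x-sort works because any child overlapping a parent must start within the widest child's width of the parent's left x edge, so the two `np.searchsorted` calls bracket every possible match before the exact overlap test. It is only a pruning step: levels containing a few very wide children benefit less.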

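I saved the grid geometry at the beginning of that routine with:

```
#!python

# Placed at the top of _reconstruct_parent_child, where `self` is the BoxLib index
np.savez('data.npz', left_edge=self.grid_left_edge, right_edge=self.grid_right_edge, levels=self.grid_levels)
```

The resulting file is available here: http://use.yt/upload/87b007b1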
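To gauge how many grids are involved, a short sketch can summarize such a dump. It assumes the file holds the three arrays saved above under the keys `left_edge`, `right_edge` and `levels`; `summarize_arrays` and `summarize_grids` are hypothetical helpers, not part of yt.

```python
import numpy as np


def summarize_arrays(left_edge, right_edge, levels):
    """Grid counts per level and the cost of a per-grid full scan."""
    levels = np.asarray(levels).ravel()  # accept shape (N,) or (N, 1)
    if not np.all(np.asarray(right_edge) > np.asarray(left_edge)):
        raise ValueError("found a grid with non-positive width")
    n = int(levels.size)
    lvls, counts = np.unique(levels, return_counts=True)
    return {
        "n_grids": n,
        "per_level": {int(l): int(c) for l, c in zip(lvls, counts)},
        # One pass over all N grids for each of the N grids: N**2 edge tests.
        "pairwise_tests": n * n,
    }


def summarize_grids(path="data.npz"):
    """Load an .npz dump written with the keys used in this issue."""
    with np.load(path) as data:
        return summarize_arrays(data["left_edge"], data["right_edge"], data["levels"])
```

Calling `summarize_grids("data.npz")` on the uploaded file would report how many grids sit on each level and how many overlap tests a full scan per grid implies.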
