[yt-dev] Data downloader

Matthew Turk matthewturk at gmail.com
Tue Jul 24 11:22:47 PDT 2012


Hi all,

Alright -- I am going to move the data to a single .tar file per
simulation.  I'll put up an index.html that points to each "simulation"
and includes the file size.  We can hand out either the URLs or just a
pointer to /data/ .
If anyone has any additional data they'd like to upload, let me know
either on or off list and we can add it in.
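For anyone curious what that index might look like, here's a minimal
sketch (the directory layout and filenames are hypothetical, not the
actual /data/ contents):

```python
import os

def build_index(data_dir):
    """Generate a bare-bones index.html listing each tarball and its size."""
    rows = []
    for name in sorted(os.listdir(data_dir)):
        if not name.endswith(".tar"):
            continue
        size_mb = os.path.getsize(os.path.join(data_dir, name)) / (1024.0 ** 2)
        rows.append('<li><a href="%s">%s</a> (%.1f MB)</li>' % (name, name, size_mb))
    return "<html><body><ul>\n%s\n</ul></body></html>" % "\n".join(rows)
```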

Kacper noted in IRC that we could be saving a considerable amount of
disk space by repacking with GZIP filtering enabled.  I'm not going to
do this at the current time, but I wanted to note that we should be
exploring this in the future.
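To illustrate Kacper's point: for HDF5 data the repack would be
something like `h5repack -f GZIP=4 in.h5 out.h5`, and the stdlib shows
why it helps on the kind of repetitive data simulations produce (the
sample data below is made up; I haven't measured real datasets):

```python
import gzip

# Highly repetitive data -- e.g. uniform regions of a simulation grid --
# compresses extremely well; the GZIP filter applies the same idea
# chunk-by-chunk inside the HDF5 file.
raw = b"\x00" * 1_000_000
packed = gzip.compress(raw, compresslevel=4)
ratio = len(raw) / len(packed)
```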

For those of you who have been seeing the PRs going back and forth,
there's been quite a bit of documentation and polishing work going on
in the yt repos.  It looks like we're down to only a handful of
tickets for 2.4, so I think we might be able to put out a release this
week or early next.  (Been a long time coming, too!)

-Matt

On Tue, Jul 24, 2012 at 2:13 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> Hi Casey,
>
> The direct links are of course still available (just not publicly visible):
> http://yt-project.org/data/
>
> -Nathan
>
> On Jul 24, 2012, at 11:09 AM, Casey W. Stark wrote:
>
> Sounds good. I would support a tarball per simulation also. Maybe this is
> just me, but I would prefer direct links and using my own tools.
>
> - Casey
>
>
> On Tue, Jul 24, 2012 at 11:01 AM, Britton Smith <brittonsmith at gmail.com>
> wrote:
>>
>> Even one tar file per simulation would be ok, at least for the smaller
>> ones.  The enzo_tiny_cosmology simulation is designed to showcase time
>> series and things using multiple datasets, so having a tarfile for each
>> dataset is probably unnecessary.  Perhaps we could just evaluate what files
>> are meant to be used in groups and put those together in single tarfiles.
>>
>> Britton
>>
>> On Tue, Jul 24, 2012 at 1:55 PM, Casey W. Stark <caseywstark at gmail.com>
>> wrote:
>>>
>>> I would be in favor of one tarball for simplicity. Are the example files
>>> that large?
>>>
>>> - Casey
>>>
>>>
>>> On Tue, Jul 24, 2012 at 10:48 AM, Matthew Turk <matthewturk at gmail.com>
>>> wrote:
>>>>
>>>> Hi John,
>>>>
>>>> Hm, that's puzzling.  Stephen, any ideas?
>>>>
>>>> After Stephen's email, I went from +0 on keeping the downloader to +1,
>>>> because I think having it available from the command line is a much
>>>> simpler solution than what we had tried before, which was downloading
>>>> by hand.  So let's see if we can address this, and then check it in to
>>>> scripts/ .
>>>>
>>>> -Matt
>>>>
>>>> On Tue, Jul 24, 2012 at 1:45 PM, John ZuHone <jzuhone at gmail.com> wrote:
>>>> > Hi all,
>>>> >
>>>> > What I don't like about the downloader is the directory structure it
>>>> > creates. At least on my machine, if I download only the sloshing dataset, I
>>>> > get:
>>>> >
>>>> > GasSloshing/GasSloshing/sloshing_nomag2*
>>>> >
>>>> > as the location of the files. Is there any reason why it ended up this
>>>> > way?
>>>> >
>>>> > John
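One plausible cause of the nesting John describes (a sketch, not a
diagnosis of the actual script): the tarball's members already carry a
top-level directory, and the downloader also creates a directory of the
same name to extract into. The member filename below is hypothetical:

```python
import io
import os
import tarfile
import tempfile

# Build a tarball whose members already have a "GasSloshing/" prefix.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0100")
    payload = b"fake dataset contents"
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

# The downloader (apparently) also makes a GasSloshing/ directory to
# extract into, so everything lands one level deeper than expected.
dest = os.path.join(tempfile.mkdtemp(), "GasSloshing")
os.makedirs(dest)
with tarfile.open(fileobj=buf, mode="r") as tar:
    tar.extractall(dest)

nested = os.path.join(dest, "GasSloshing")
# The fix would be to extract into the parent of dest instead.
```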
>>>> >
>>>> > On Jul 24, 2012, at 11:26 AM, Stephen Skory wrote:
>>>> >
>>>> >> Hi Matt,
>>>> >>
>>>> >>> One question that's come up is: do we want to continue using
>>>> >>> download.py?  The burden on uploaders is moderately higher, in that
>>>> >>> the files all have to be tarred up in a particular way, and they
>>>> >>> have
>>>> >>> to be added to download.py, but it does provide a measure of
>>>> >>> robustness.  If we do this, can we move download.py into the main
>>>> >>> distribution, under scripts/ ?  Stephen, John, others who have used
>>>> >>> the script, what do you think about this?
>>>> >>
>>>> >> I will not be insulted if we do away with the download script. If a
>>>> >> web page of download links is easier for everyone around, that's fine
>>>> >> by me. The best argument I can think of for keeping it, or something
>>>> >> similar, is that it forces some kind of uniformity, so that the
>>>> >> datasets are in an expected layout. Also, downloading tens of data
>>>> >> dumps one at a time from a webpage is kind of tedious. That could be
>>>> >> solved by having both the separate data dumps and a big ol' tarball
>>>> >> of the whole thing, so people could get exactly what they want, but
>>>> >> that doubles the disk space used on someone's computer.
>>>> >>
>>>> >> But this isn't HIPAA medical data, so we can be loosey-goosey if that
>>>> >> makes things easier for everyone!
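Stephen's one-at-a-time complaint could be sidestepped with a small
batch-fetch sketch; the `.tar` naming convention and dataset names here
are assumptions, not the actual layout of /data/:

```python
import os
import urllib.request

BASE = "http://yt-project.org/data/"  # the data index mentioned in the thread

def tarball_urls(names):
    """Map dataset names to their (assumed) tarball URLs."""
    return [BASE + name + ".tar" for name in names]

def fetch_all(names, dest="."):
    """Download several datasets in one go instead of clicking each link."""
    for url in tarball_urls(names):
        urllib.request.urlretrieve(url, os.path.join(dest, os.path.basename(url)))

# fetch_all(["GasSloshing", "enzo_tiny_cosmology"])  # requires network access
```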
>>>> >>
>>>> >> --
>>>> >> Stephen Skory
>>>> >> s at skory.us
>>>> >> http://stephenskory.com/
>>>> >> 510.621.3687 (google voice)
>>>> >> _______________________________________________
>>>> >> yt-dev mailing list
>>>> >> yt-dev at lists.spacepope.org
>>>> >> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>>>> >
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>


