Metadata-Version: 2.1
Name: pooch
Version: 1.6.0
Summary: "Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks."
Home-page: https://github.com/fatiando/pooch
Author: The Pooch Developers
Author-email: fatiandoaterra@protonmail.com
Maintainer: "Leonardo Uieda"
Maintainer-email: leouieda@gmail.com
License: BSD 3-Clause License
Project-URL: Documentation, https://www.fatiando.org/pooch
Project-URL: Release Notes, https://github.com/fatiando/pooch/releases
Project-URL: Bug Tracker, https://github.com/fatiando/pooch/issues
Project-URL: Source Code, https://github.com/fatiando/pooch
Keywords: data,download,caching,http
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: appdirs (>=1.3.0)
Requires-Dist: packaging (>=20.0)
Requires-Dist: requests (>=2.19.0)
Provides-Extra: progress
Requires-Dist: tqdm (<5.0.0,>=4.41.0) ; extra == 'progress'
Provides-Extra: sftp
Requires-Dist: paramiko (>=2.7.0) ; extra == 'sftp'
Provides-Extra: xxhash
Requires-Dist: xxhash (>=1.4.3) ; extra == 'xxhash'

.. image:: https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png
    :alt: Pooch

`Documentation <https://www.fatiando.org/pooch>`__ |
`Documentation (dev version) <https://www.fatiando.org/pooch/dev>`__ |
Part of the `Fatiando a Terra <https://www.fatiando.org>`__ project

.. image:: https://img.shields.io/pypi/v/pooch.svg?style=flat-square
    :alt: Latest version on PyPI
    :target: https://pypi.org/project/pooch/
.. image:: https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square
    :alt: Latest version on conda-forge
    :target: https://github.com/conda-forge/pooch-feedstock
.. image:: https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square
    :alt: Test coverage status
    :target: https://codecov.io/gh/fatiando/pooch
.. image:: https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square
    :alt: Compatible Python versions.
    :target: https://pypi.org/project/pooch/
.. image:: https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue.svg?style=flat-square
    :alt: Digital Object Identifier for the JOSS paper
    :target: https://doi.org/10.21105/joss.01943


About
-----

*Does your Python package include sample datasets? Are you shipping them with the code?
Are they getting too big?*

Pooch is here to help! It will manage a data *registry* by downloading your data files
from a server only when needed and storing them locally in a data *cache* (a folder on
your computer).

Here are Pooch's main features:

* Pure Python and minimal dependencies.
* Download a file only if necessary (it's not in the data cache or needs to be updated).
* Verify download integrity through SHA256 hashes (also used to check if a file needs to
  be updated).
* Designed to be extended: plug in custom download (FTP, scp, etc.) and post-processing
  (unzip, decompress, rename) functions.
* Includes utilities to unzip/decompress the data upon download to save loading time.
* Can handle basic HTTP authentication (for servers that require a login) and printing
  download progress bars.
* Easily set up an environment variable to override the data cache location.

*Are you a scientist or researcher? Pooch can help you too!*

* Automatically download your data files so you don't have to keep them in your GitHub
  repository.
* Make sure everyone running the code has the same version of the data files (enforced
  through the SHA256 hashes).

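To make the hash-based checks above more concrete, here is a minimal sketch of the
idea using only the standard library. It is **not** Pooch's actual implementation:
the ``sha256_of`` and ``cache_action`` helpers are hypothetical names made up for this
illustration.

.. code:: python

    import hashlib
    from pathlib import Path


    def sha256_of(path, chunk_size=65536):
        """Return the SHA256 hex digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as stream:
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def cache_action(path, expected_sha256):
        """
        Decide what to do with a cached file (hypothetical helper).

        Returns "download" if the file is missing, "update" if its hash
        differs from the expected one, and "use" if the cached copy matches.
        """
        path = Path(path)
        if not path.exists():
            return "download"
        if sha256_of(path) != expected_sha256.lower():
            return "update"
        return "use"

Because the decision depends only on the file contents, everyone whose copy matches
the expected hash is guaranteed to be working with the same bytes.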

Example
-------

For a **scientist downloading a data file** for analysis:

.. code:: python

    import pooch
    import pandas as pd


    # Download a file and save it locally, returning the path to it.
    # Running this again will not cause a download. Pooch will check the hash
    # (checksum) of the downloaded file against the given value to make sure
    # it's the right file (not corrupted or outdated).
    fname_bathymetry = pooch.retrieve(
        url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
        known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
    )

    # Pooch can also download based on a DOI from certain providers.
    fname_gravity = pooch.retrieve(
        url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
        known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
    )

    # Load the data with Pandas
    data_bathymetry = pd.read_csv(fname_bathymetry)
    data_gravity = pd.read_csv(fname_gravity)


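The ``known_hash`` values above use an ``algorithm:hexdigest`` form. If you already
have a trusted local copy of a file, a string in that form could be produced with the
standard library, as in the sketch below. The ``make_known_hash`` helper is a
hypothetical name used only for this illustration and is not part of Pooch.

.. code:: python

    import hashlib


    def make_known_hash(path, alg="sha256", chunk_size=65536):
        """Return an "alg:hexdigest" string for a local file (hypothetical helper)."""
        # hashlib.new accepts algorithm names such as "md5" or "sha256"
        digest = hashlib.new(alg)
        with open(path, "rb") as stream:
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
        return f"{alg}:{digest.hexdigest()}"

For example, ``make_known_hash("caribbean-bathymetry.csv.xz", alg="md5")`` would give
a string that can be compared against the value passed to ``known_hash``.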

For **package developers** including sample data in their projects:

.. code:: python

    """
    Module mypackage/datasets.py
    """
    import pkg_resources
    import pandas
    import pooch

    # Get the version string from your project. You have one of these, right?
    from . import version


    # Create a new friend to manage your sample data storage
    GOODBOY = pooch.create(
        # Folder where the data will be stored. For a sensible default, use the
        # default cache folder for your OS.
        path=pooch.os_cache("mypackage"),
        # Base URL of the remote data store. Will call .format on this string
        # to insert the version (see below).
        base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
        # Pooches are versioned so that you can use multiple versions of a
        # package simultaneously. Use a PEP 440 compliant version number. The
        # version will be appended to the path.
        version=version,
        # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
        # version and replace the version with this string.
        version_dev="main",
        # An environment variable that overrides the path.
        env="MYPACKAGE_DATA_DIR",
        # The cache file registry. A dictionary with all files managed by this
        # pooch. Keys are the file names (relative to *base_url*) and values
        # are their respective SHA256 hashes. Files will be downloaded
        # automatically when needed (see fetch_gravity_data).
        registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
    )
    # You can also load the registry from a file. Each line contains a file
    # name and its SHA256 hash separated by a space. This makes it easier to
    # manage large numbers of data files. The registry file should be packaged
    # and distributed with your software.
    GOODBOY.load_registry(
        pkg_resources.resource_stream("mypackage", "registry.txt")
    )


    # Define functions that your users can call to get back the data in memory
    def fetch_gravity_data():
        """
        Load some sample gravity data to use in your docs.
        """
        # Fetch the path to a file in the local storage. If it's not there,
        # we'll download it.
        fname = GOODBOY.fetch("gravity-data.csv")
        # Load it with numpy/pandas/etc
        data = pandas.read_csv(fname)
        return data

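The registry file mentioned in the comments above is plain text with one
``file-name hash`` pair per line. The sketch below shows how such a file could be
read into a dictionary. It only illustrates the format and is not Pooch's own parser:
``parse_registry`` is a hypothetical helper, and skipping blank lines and ``#``
comment lines, as well as requiring file names without spaces, are assumptions of
this sketch.

.. code:: python

    def parse_registry(lines):
        """Parse "file-name hash" lines into a dictionary (illustrative only)."""
        registry = {}
        for number, line in enumerate(lines, start=1):
            line = line.strip()
            # Assumption of this sketch: ignore blank lines and "#" comments
            if not line or line.startswith("#"):
                continue
            parts = line.split()
            if len(parts) != 2:
                raise ValueError(
                    f"Line {number}: expected 'file-name hash', got {line!r}"
                )
            name, file_hash = parts
            registry[name] = file_hash.lower()
        return registry

The result has the same shape as the ``registry`` argument in the example above:
file names as keys and their hashes as values.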

Projects using Pooch
--------------------

* `scikit-image <https://github.com/scikit-image/scikit-image>`__
* `MetPy <https://github.com/Unidata/MetPy>`__
* `icepack <https://github.com/icepack/icepack>`__
* `histolab <https://github.com/histolab/histolab>`__
* `seaborn-image <https://github.com/SarthakJariwala/seaborn-image>`__
* `Ensaio <https://github.com/fatiando/ensaio>`__

*If you're using Pooch, send us a pull request adding your project to the list.*


Contacting Us
-------------

Find out more about how to reach us at
`fatiando.org/contact <https://www.fatiando.org/contact/>`__.


Citing Pooch
------------

This is research software **made by scientists** (see
`AUTHORS.md <https://github.com/fatiando/pooch/blob/main/AUTHORS.md>`__). Citations
help us justify the effort that goes into building and maintaining this project. If you
used Pooch for your research, please consider citing us.

See our `CITATION.rst file <https://github.com/fatiando/pooch/blob/main/CITATION.rst>`__
to find out more.


Contributing
------------

Code of conduct
+++++++++++++++

Please note that this project is released with a
`Code of Conduct <https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md>`__.
By participating in this project you agree to abide by its terms.

Contributing Guidelines
+++++++++++++++++++++++

Please read our
`Contributing Guide <https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md>`__
to see how you can help and give feedback.

Imposter syndrome disclaimer
++++++++++++++++++++++++++++

**We want your help.** No, really.

There may be a little voice inside your head that is telling you that you're
not ready to be an open source contributor; that your skills aren't nearly good
enough to contribute.
What could you possibly offer?

We assure you that the little voice in your head is wrong.

**Being a contributor doesn't just mean writing code**.
Equally important contributions include:
writing or proof-reading documentation, suggesting or implementing tests, or
even giving feedback about the project (including giving feedback about the
contribution process).
If you're coming to the project with fresh eyes, you might see the errors and
assumptions that seasoned contributors have glossed over.
If you can write any code at all, you can contribute code to open source.
We are constantly trying out new skills, making mistakes, and learning from
those mistakes.
That's how we all improve and we are happy to help others learn.

*This disclaimer was adapted from the*
`MetPy project <https://github.com/Unidata/MetPy>`__.


License
-------

This is free software: you can redistribute it and/or modify it under the terms
of the `BSD 3-clause License <https://github.com/fatiando/pooch/blob/main/LICENSE.txt>`__.
A copy of this license is provided with distributions of the software.


