minder_utils_lite
Firstly, we need to follow the installation instructions on https://minder-utils.github.io. The following is the command that should be run in the last step:
!pip install -e git+https://github.com/minder-utils/minder_utils_lite.git#egg=minder_utils
Obtaining minder_utils from git+https://github.com/minder-utils/minder_utils_lite.git#egg=minder_utils
  Cloning https://github.com/minder-utils/minder_utils_lite.git to ./src/minder-utils
Requirement already satisfied: numpy==1.19.5 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from minder_utils) (1.19.5)
Requirement already satisfied: pandas==1.1.5 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from minder_utils) (1.1.5)
Requirement already satisfied: requests in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from minder_utils) (2.26.0)
Requirement already satisfied: argparse in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from minder_utils) (1.4.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from pandas==1.1.5->minder_utils) (2.8.2)
Requirement already satisfied: pytz>=2017.2 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from pandas==1.1.5->minder_utils) (2021.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from requests->minder_utils) (1.26.7)
Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from requests->minder_utils) (2.0.4)
Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from requests->minder_utils) (3.2)
Requirement already satisfied: certifi>=2017.4.17 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from requests->minder_utils) (2021.5.30)
Requirement already satisfied: six>=1.5 in /Users/ac4919/miniforge3/envs/dri_data_utils/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas==1.1.5->minder_utils) (1.16.0)
Installing collected packages: minder-utils
  Attempting uninstall: minder-utils
    Found existing installation: minder-utils 0.0.2
    Uninstalling minder-utils-0.0.2:
      Successfully uninstalled minder-utils-0.0.2
  Running setup.py develop for minder-utils
Successfully installed minder-utils
!!! If you are a collaborator on the GitHub repo and this is linked, make sure that you are running this notebook outside of the src directory (after install) to avoid accidentally pushing confidential information !!!
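If you are unsure where the notebook is running from, a quick check with the Python standard library can help (this is just an optional sanity check, not part of minder_utils):

import os
# The editable install clones the source into ./src/minder-utils;
# make sure the printed directory is not inside that folder.
print(os.getcwd())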
Now, we want to save our token to the package. These settings will be saved indefinitely, unless the package is updated or uninstalled.
import minder_utils.settings as settings
# I have my token saved in a txt file called my_token
# you can also simply paste your token as a string in the token_save function
with open('my_token.txt', 'r') as file:
    token = file.read()

settings.token_save(token)
Token Saved
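If you would rather not keep the token in a file, token_save also accepts the token pasted directly as a string, as noted in the comments above ('YOUR_ACCESS_TOKEN' below is a placeholder, not a real token):

# Equivalent call with the token pasted inline
settings.token_save('YOUR_ACCESS_TOKEN')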
Once you have done this, you may use the package freely!
Downloading data is simple! Please see the following instructions:
First, we will want to import the class that allows us to download the data, Downloader:
from minder_utils.download import Downloader
dl = Downloader()
With this class, we may check to see which datasets are available and which groups of datasets are available.
First, let us check which groups are available to download data from:
dl.get_group_names()
['vital_signs', 'environmental', 'activity', 'other', 'lookup', 'care']
Let's say we want to download the activity data. We can see which datasets are available using the following function:
categories = dl.get_category_names('activity')
categories
['raw_activity_pir', 'raw_door_sensor', 'raw_appliance_use']
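If you want an overview of everything on offer, the two functions above can be combined to list the categories in each group (this sketch assumes get_category_names accepts any of the group names returned by get_group_names):

# Print every available dataset, organised by group
for group in dl.get_group_names():
    print(group, dl.get_category_names(group))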
Great! Now let us try to download this data:
dl.export(categories=dl.get_category_names('activity'),  # the datasets to download
          since='2021-10-05',                             # the date to download data from
          until='2021-10-08',                             # the date to download data until
          save_path='./data/raw_data/')                   # the path to save the data to
Creating new export request
Exporting the {'raw_activity_pir': {}, 'raw_door_sensor': {}, 'raw_appliance_use': {}}
From 2021-10-05 to 2021-10-08
Waiting for the sever to complete the job [-------->=>-]
Job is completed, start to download the data
Start to export job
Exporting 1/3 raw_door_sensor Success
Exporting 2/3 raw_activity_pir Success
Exporting 3/3 raw_appliance_use Success
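As an optional sanity check (standard library only, not part of minder_utils), you can list the contents of the save path to confirm that a csv file was written for each dataset:

import os
# Each exported dataset should appear as a csv file in the save path
print(os.listdir('./data/raw_data/'))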
Now, let's say a couple of days have passed and we would like to update our data. We can use the refresh function to do this. Simply run the following command:
dl.refresh(categories=dl.get_category_names('activity'),  # the datasets to refresh
           until='2021-10-10',                             # the new date to extend the data to
           save_path='./data/raw_data/')                   # the path currently holding the data to refresh
Checking current files...
Creating new parallel export requests
Waiting for the sever to complete the job [-------->=>-]
The server has finished processing the requests
Success
To access the data, you may run the usual commands to load a csv file:
import pandas as pd
data = pd.read_csv('./data/raw_data/raw_activity_pir.csv')
data.shape
(94701, 9)
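If you want to work with all three activity datasets at once, a small loop over the category names does the job (assuming each dataset was saved as <category>.csv, as in the example above):

import pandas as pd

# Load each activity dataset into a dictionary of DataFrames
activity_data = {}
for name in ['raw_activity_pir', 'raw_door_sensor', 'raw_appliance_use']:
    activity_data[name] = pd.read_csv('./data/raw_data/' + name + '.csv')

# Quick look at the size of each table
for name, df in activity_data.items():
    print(name, df.shape)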