Manage and version images, audio, video, and text files in storage and organize your ML modeling process into a reproducible workflow.
Initializing a DVC repository
dvc init
git commit -m "Initialize DVC"
Adding data
dvc add data/data.csv
git add data/data.csv.dvc data/.gitignore
git commit -m "Add raw data"
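As a rough illustration of why only the small data/data.csv.dvc pointer file goes into Git: DVC identifies tracked data by a content hash (MD5 by default) and stores that hash in the pointer file. The Python sketch below computes such a hash for a file. The helper name file_md5 and the chunk size are illustrative choices, and the sketch assumes the hash is taken over the raw bytes of the file; it is not DVC's own implementation.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use.

    Illustrative helper only (not part of DVC's API).
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes give the same digest, which is what lets a pointer file stand in for the data itself.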
Updating data
dvc commit
git add data/data.csv.dvc
git commit -m "Change data"
Adding remote storage
dvc remote add -d --project gdrive gdrive://<url>
git add .dvc/config
git commit -m "Configure remote storage"
Pushing to remote storage
dvc push
Pulling from remote storage
dvc pull
Switching between versions
git checkout <git-revision>
dvc checkout
from dvclive import Live

with Live() as live:
    live.log_param("epochs", NUM_EPOCHS)

    for epoch in range(NUM_EPOCHS):
        train_model(...)
        metrics = evaluate_model(...)
        for metric_name, value in metrics.items():
            live.log_metric(metric_name, value)
        live.next_step()
import dvc.api

with dvc.api.open(
    'get-started/data.csv',
    repo='https://github.com/iterative/dataset-registry'
) as f:
    # f is a file-like object
    ...
Pipeline
stages:
  prepare: ... # stage 1 definition
  train: ... # stage 2 definition
  evaluate: ... # stage 3 definition
Stage
stages:
  prepare:
    cmd: source src/cleanup.sh
    deps:
      - src/cleanup.sh
      - data/raw
    outs:
      - data/clean.csv
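The deps and outs of each stage are what connect stages into a pipeline: a stage that lists another stage's output as a dependency has to run after it. The Python sketch below shows one way such an order could be derived. Only the prepare entry comes from the stage above; the train and evaluate entries and the paths model.pkl and metrics.json are hypothetical, and the function execution_order is an illustration, not DVC's internal code.

```python
from graphlib import TopologicalSorter

# Stage table mirroring the dvc.yaml layout: each stage lists the paths it
# reads (deps) and the paths it writes (outs). The train and evaluate entries
# are hypothetical, added only to make a three-stage chain.
STAGES = {
    "prepare": {"deps": ["src/cleanup.sh", "data/raw"], "outs": ["data/clean.csv"]},
    "train": {"deps": ["data/clean.csv"], "outs": ["model.pkl"]},
    "evaluate": {"deps": ["model.pkl"], "outs": ["metrics.json"]},
}


def execution_order(stages):
    """Order stages so each one runs after the stages that produce its deps."""
    # Map every output path to the stage that writes it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # For each stage, collect the stages whose outputs it depends on.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))
```

Because the three stages form a single chain, the only valid order is prepare, then train, then evaluate. Dependencies that no stage produces (such as data/raw) are treated as plain inputs.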
stages:
  train:
    cmd: ...
    deps: ...
    params: # from params.yaml
      - learning_rate
      - nn.epochs
      - nn.batch_size
    outs: ...
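Entries such as nn.epochs use a dot to address a nested key in params.yaml. The Python sketch below shows how such dotted names map onto a nested structure. The PARAMS values are made up for illustration, and lookup is a hypothetical helper rather than a DVC function.

```python
from functools import reduce

# Hypothetical contents of params.yaml after parsing; the values are made up.
PARAMS = {
    "learning_rate": 0.001,
    "nn": {"epochs": 10, "batch_size": 32},
}


def lookup(params, dotted_key):
    """Resolve a dot-separated key such as 'nn.epochs' in a nested dict."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), params)


# The same names that appear under `params:` in the stage above.
tracked = ["learning_rate", "nn.epochs", "nn.batch_size"]
values = {key: lookup(PARAMS, key) for key in tracked}
print(values)
```

A missing key raises KeyError here, which is a deliberate choice for the sketch: a misspelled parameter name fails loudly instead of silently resolving to nothing.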
Pros
Cons