
djudjuu
Collaborator

@djudjuu djudjuu commented Aug 26, 2025

  • prefect integration docs
  • decomposition helper
  • rewrite
  • decomposition image


netlify bot commented Aug 26, 2025

Deploy Preview for dlt-hub-docs failed.

| Name | Link |
|------|------|
| 🔨 Latest commit | 81b732c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/dlt-hub-docs/deploys/68b6f883b3bcf600086e8a66 |

@djudjuu djudjuu mentioned this pull request Aug 26, 2025

⚠️ Possible file(s) that should be tracked in LFS detected ⚠️

    The following file(s) exceed the file size limit of 50000 bytes, as set in the .yml configuration files:

    docs/website/docs/plus/production/images/prefect-extract-artifact.png, docs/website/docs/plus/production/images/prefect-retry-artifacts.png, docs/website/docs/plus/production/images/prefect-schema-change-artifact.png, docs/website/docs/plus/production/images/prefect-source-decomposition.png

    Consider using git-lfs to manage large files.

@github-actions github-actions bot added the lfs-detected! Warning Label for use when LFS is detected in the commits of a Pull Request label Aug 26, 2025
@github-actions github-actions bot removed the lfs-detected! Warning Label for use when LFS is detected in the commits of a Pull Request label Aug 27, 2025
@djudjuu djudjuu force-pushed the docs/prefect-integration branch from d566473 to 2ed2b15 Compare August 27, 2025 12:41
For stability reasons, this actually runs one resource alone first and then runs all the others in parallel.
Otherwise, on the first run, all resources would try to create the same dlt tables at the same time.
:::
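The approach in the note above, running one resource alone and then the rest in parallel, can be sketched with the standard library's `concurrent.futures`. This is a minimal illustration under stated assumptions, not the decomposition helper added in this PR: `run_decomposed` and `run_resource` are hypothetical names, and `run_resource` stands in for whatever executes a single dlt resource (for example, a Prefect task).

```python
from concurrent.futures import ThreadPoolExecutor


def run_decomposed(resource_names, run_resource, max_workers=4):
    """Run the first resource on its own, then the remaining ones in parallel.

    run_resource is a hypothetical callable that executes one dlt resource
    and returns its result.
    """
    if not resource_names:
        return []
    first, *rest = resource_names
    # Run one resource alone first, so that the tables all resources share
    # already exist before any parallel run starts.
    results = [run_resource(first)]
    if rest:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as `rest`.
            results.extend(pool.map(run_resource, rest))
    return results


# Usage with a stand-in runner:
print(run_decomposed(["users", "orders", "items"], lambda name: f"loaded {name}"))
# prints: ['loaded users', 'loaded orders', 'loaded items']
```

The sequential first step costs some wall-clock time on every run, but it removes the race on table creation without needing any locking between the parallel runs.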
:::warning
Collaborator Author

@djudjuu djudjuu Aug 27, 2025

@ShreyasGS, please read this (the text below this comment).

Contributor

@VioletM VioletM left a comment

Looks very exciting!

General comments:

  1. I think that in most cases dlt should be written without backticks (https://www.notion.so/dlthub/Documentation-Writing-Guide-87f4fcc32655460c83bcaf7787d11e67?source=copy_link#2288bedf0d7043e4bc1b4a0af7897b1f)

  2. Use sentence case for all headers

  3. Please put it through a grammar checker; I'm really bad at detecting grammar errors


## Key features

- **Prefect Collector:** a dedicated way to do real-time [progress monitoring] and summary reports after each pipeline stage
Contributor

Why is [progress monitoring] in brackets?

Collaborator Author

Thanks, I thought this could be a link, but I ultimately removed it: https://dlthub.com/docs/general-usage/pipeline#monitor-the-loading-progress


### Schema Change Reports

The `PrefectCollector` will also create artifacts when schema changes are detected.
Contributor

Should we elaborate on the artifacts? What's inside them?

Prefect has built-in functionality to [include logs from other libraries](https://docs.prefect.io/v3/advanced/logging-customization#include-logs-from-other-libraries) and display them in its UI.

You can tell Prefect to include `dlt`'s logs by setting the corresponding Prefect environment variable, for example, by adding this to your `.env` file:
```sh
Contributor

nit: Could you add the secrets.toml version as well?

Collaborator Author

Interesting. It's actually a Prefect configuration, not a dlt one, but I can stress that.
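Since this is a Prefect setting rather than a dlt one, it belongs in Prefect's configuration, not in `secrets.toml`. Assuming the variable meant here is Prefect's `PREFECT_LOGGING_EXTRA_LOGGERS` setting (a comma-separated list of extra loggers that Prefect attaches to its logging), the `.env` entry would be `PREFECT_LOGGING_EXTRA_LOGGERS=dlt`, and `prefect config set PREFECT_LOGGING_EXTRA_LOGGERS=dlt` would store it in the active Prefect profile instead. The sketch below does the same from Python; `add_extra_logger` is a hypothetical helper, and setting the variable before Prefect is imported is an assumption about when Prefect reads its settings.

```python
import os

# Prefect setting (assumed, not a dlt one): comma-separated extra loggers
# that Prefect attaches to its own logging at runtime.
EXTRA_LOGGERS_KEY = "PREFECT_LOGGING_EXTRA_LOGGERS"


def add_extra_logger(name, env=None):
    """Append `name` to the extra-loggers setting without dropping loggers
    that are already listed. Returns the new value."""
    env = os.environ if env is None else env
    current = [n.strip() for n in env.get(EXTRA_LOGGERS_KEY, "").split(",") if n.strip()]
    if name not in current:
        current.append(name)
    env[EXTRA_LOGGERS_KEY] = ",".join(current)
    return env[EXTRA_LOGGERS_KEY]


# Call this before `import prefect`, so the "dlt" logger is included
# when Prefect configures logging.
add_extra_logger("dlt")
```

Appending instead of overwriting keeps any other libraries that are already routed to the Prefect UI.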


## Runner integration

The `PrefectCollector` integrates seamlessly with the [dlt+ runner](../production/pipeline-runner.md).
Contributor

Suggested change
The `PrefectCollector` integrates seamlessly with the [dlt+ runner](../production/pipeline-runner.md).
The `PrefectCollector` integrates seamlessly with the [dlt+ Runner](../production/pipeline-runner.md).


### Pipeline Retries

Prefect's retry mechanism is not a perfect fit for dlt pipelines.
Contributor

I suggest first mentioning that the dlt+ Runner fixes the problem and then explaining it :)

The dlt+ Runner provides a retry configuration that ensures pipeline state and intermediate results are preserved across retry attempts.

This is important because Prefect’s default retry mechanism is not optimized for dlt pipelines. During execution, dlt generates intermediate files in the pipeline’s working directory. If a run fails and Prefect retries it, those files may no longer be available, for example, if the retry happens on a different worker node or inside an ephemeral Docker container.

To do so, ...
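A toy illustration of why the working directory matters across retries, in plain Python. This is not the dlt+ Runner's implementation; `run_with_retries` and `extract_then_load` are hypothetical names. Every attempt reuses one directory, so a file staged by a failed attempt is still there for the retry. If each retry started in a fresh container or on another worker, `extracted.jsonl` would be missing and the work would start over.

```python
import tempfile
from pathlib import Path


def run_with_retries(step, working_dir, max_attempts=3):
    """Call step(working_dir, attempt) until it succeeds.

    Every attempt receives the same working_dir, so files written by a
    failed attempt are still available to the next one.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    working_dir = Path(working_dir)
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step(working_dir, attempt)
        except Exception as exc:  # toy example: retry on any error
            last_error = exc
    raise last_error


def extract_then_load(workdir, attempt):
    """Stage data on the first attempt and fail; load it on a later one."""
    staged = workdir / "extracted.jsonl"
    if not staged.exists():
        staged.write_text('{"id": 1}\n')
        raise RuntimeError(f"load failed on attempt {attempt}")
    lines = len(staged.read_text().splitlines())
    return f"loaded {lines} line(s) on attempt {attempt}"


with tempfile.TemporaryDirectory() as tmp:
    print(run_with_retries(extract_then_load, tmp))
    # prints: loaded 1 line(s) on attempt 2
```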

@djudjuu djudjuu force-pushed the docs/prefect-integration branch from b18cd50 to 413b953 Compare September 2, 2025 13:31
@djudjuu djudjuu requested a review from VioletM September 2, 2025 13:31