feat: Google Cloud Storage support documentation #1027

Open · wants to merge 2 commits into main
4 changes: 2 additions & 2 deletions changelog/2024-05-31-secondary-storage/index.md
@@ -7,8 +7,8 @@ image: ./secondary_storage.png
description: Read and write from a storage that is not your main storage by specifying it in the S3 object as "secondary_storage" with its name.
features:
[
'Add additional storages from S3, Azure Blob, AWS OIDC or Azure Workload Identity.',
'Add additional storages from S3, Azure Blob, AWS OIDC or Google Cloud Storage.',
'From script, specify the secondary storage with an object with properties `s3` (path to the file) and `storage` (name of the secondary storage).'
]
docs: /docs/core_concepts/object_storage_in_windmill#secondary-storage
---
---
10 changes: 9 additions & 1 deletion docs/advanced/18_instance_settings/index.mdx
@@ -109,7 +109,7 @@ This setting is only available on [Enterprise Edition](/pricing).

This feature has no overlap with the [Workspace object storage](../../core_concepts/38_object_storage_in_windmill/index.mdx#workspace-object-storage).

You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find a button to test settings from a server or from a worker.
You can choose to use S3, Azure Blob Storage, AWS OIDC, or Google Cloud Storage. For each, you will find a button to test settings from a server or from a worker.

![S3/Azure for Python/Go cache & large logs](../../core_concepts/20_jobs/s3_azure_cache.png "S3/Azure for Python/Go cache & large logs")

@@ -145,6 +145,14 @@ You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find

This setting is only available on [Enterprise Edition](/pricing).

#### Google Cloud Storage

| Field | Description |
|-------|-------------|
| Bucket | The name of your Google Cloud Storage bucket. |
| Service Account Key | The service account key for your Google Cloud Storage bucket, in JSON format. |
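The service account key is the standard JSON credential downloaded from the Google Cloud console. As a rough sketch (all values below are placeholders, not real credentials), it has this shape:

```python
# Rough shape of a GCS service account key (placeholder values only)
service_account_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "0123456789abcdef0123456789abcdef01234567",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "windmill-storage@my-project.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
    "token_uri": "https://oauth2.googleapis.com/token",
}
```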


### Private Hub base url

Base url of your [private Hub](../../core_concepts/32_private_hub/index.mdx) instance, without trailing slash.
10 changes: 5 additions & 5 deletions docs/core_concepts/11_persistent_storage/index.mdx
@@ -56,20 +56,20 @@ All details at:
/>
</div>

## Large data: S3, R2, MinIO, Azure Blob
## Large data: S3, R2, MinIO, Azure Blob, Google Cloud Storage

On heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/) as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) storage are highly scalable and durable object storage service that provides secure, reliable, and cost-effective storage for a wide range of data types and use cases.
For heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/), as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) and [Google Cloud Storage](https://cloud.google.com/storage), are highly scalable and durable object storage services that provide secure, reliable, and cost-effective storage for a wide range of data types and use cases.

Windmill comes with a [native integration with S3 and Azure Blob](./large_data_files.mdx), making it the recommended storage for large objects like files and binary data.
Windmill comes with a [native integration with S3, Azure Blob, and Google Cloud Storage](./large_data_files.mdx), making them the recommended storage for large objects like files and binary data.

![Workspace object storage Infographic](./s3_infographics.png "Workspace object storage Infographic")

All details at:

<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
title="Large data: S3, R2, MinIO, Azure Blob"
description="Windmill comes with a native integration with S3 and Azure Blob, making it the recommended storage for large objects like files and binary data."
title="Large data: S3, R2, MinIO, Azure Blob, Google Cloud Storage"
description="Windmill comes with a native integration with S3, Azure Blob, and Google Cloud Storage, making them the recommended storage for large objects like files and binary data."
href="/docs/core_concepts/persistent_storage/large_data_files"
/>
</div>
16 changes: 8 additions & 8 deletions docs/core_concepts/11_persistent_storage/large_data_files.mdx
@@ -2,17 +2,17 @@ import DocCard from '@site/src/components/DocCard';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Large data: S3, R2, MinIO, Azure Blob
# Large data: S3, R2, MinIO, Azure Blob, Google Cloud Storage

This page is part of our section on [Persistent storage & databases](./index.mdx) which covers where to effectively store and manage the data manipulated by Windmill. Check that page for more options on data storage.

On heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/) as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) are highly scalable and durable object storage service that provides secure, reliable, and cost-effective storage for a wide range of data types and use cases.
For heavier data objects & unstructured data storage, [Amazon S3](https://aws.amazon.com/s3/) (Simple Storage Service) and its alternatives [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/) and [MinIO](https://min.io/), as well as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) and [Google Cloud Storage](https://cloud.google.com/storage), are highly scalable and durable object storage services that provide secure, reliable, and cost-effective storage for a wide range of data types and use cases.

Windmill comes with a [native integration with S3 and Azure Blob](../38_object_storage_in_windmill/index.mdx), making it the recommended storage for large objects like files and binary data.
Windmill comes with a [native integration with S3, Azure Blob, and Google Cloud Storage](../38_object_storage_in_windmill/index.mdx), making them the recommended storage for large objects like files and binary data.

## Workspace object storage

Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials.
Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or Google Cloud Storage to enable users to read and write from S3 without needing access to the credentials.

The Windmill S3 bucket browser will not work for buckets containing more than 20 files, and uploads are limited to files < 50MB. Consider upgrading to Windmill [Enterprise Edition](/pricing) to use this feature with large buckets.

@@ -21,7 +21,7 @@ Windmill S3 bucket browser will not work for buckets containing more than 20 fil
<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
title="Workspace object storage"
description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
description="Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or Google Cloud Storage to enable users to read and write from S3 without having to have access to the credentials."
href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
/>
</div>
@@ -173,14 +173,14 @@ For more info on how Data pipelines in Windmill, see [Data pipelines](../27_data
/>
</div>

## Use Amazon S3, R2, MinIO and Azure Blob directly
## Use Amazon S3, R2, MinIO, Azure Blob, and Google Cloud Storage directly

Amazon S3, Cloudflare R2 and MinIO all follow the same API schema and therefore have a [common Windmill resource type](https://hub.windmill.dev/resource_types/42/). Azure Blob has a slightly different API than S3 but works with Windmill as well using its dedicated [resource type](https://hub.windmill.dev/resource_types/137/)
Amazon S3, Cloudflare R2 and MinIO all follow the same API schema and therefore have a [common Windmill resource type](https://hub.windmill.dev/resource_types/42/). Azure Blob and Google Cloud Storage have slightly different APIs from S3 but work with Windmill as well, using their dedicated resource types ([Azure Blob](https://hub.windmill.dev/resource_types/137/), [Google Cloud Storage](https://hub.windmill.dev/resource_types/268)).
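As an illustrative sketch of using such a resource directly (the resource path is hypothetical, and the field names follow the Hub "s3" resource type — treat both as assumptions), a Python script can fetch it and pass the credentials to boto3:

```python
import boto3
import wmill


def main():
    # Fetch an S3-compatible resource; the path is hypothetical and the
    # field names (endPoint, accessKey, ...) follow the Hub "s3" resource type
    s3 = wmill.get_resource("u/user/my_s3_resource")

    client = boto3.client(
        "s3",
        endpoint_url=f"https://{s3['endPoint']}",
        aws_access_key_id=s3["accessKey"],
        aws_secret_access_key=s3["secretKey"],
        region_name=s3.get("region") or None,
    )

    # List the first few objects in the bucket
    resp = client.list_objects_v2(Bucket=s3["bucket"], MaxKeys=10)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```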

<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
title="S3 APIs integrations"
description="Use Amazon S3, Cloudflare R2, MinIO and Azure Blob directly within scripts and flows."
description="Use Amazon S3, Cloudflare R2, MinIO, Azure Blob, and Google Cloud Storage directly within scripts and flows."
href="/docs/integrations/s3"
/>
</div>
8 changes: 4 additions & 4 deletions docs/core_concepts/18_files_binary_data/index.mdx
@@ -7,18 +7,18 @@ import TabItem from '@theme/TabItem';
In Windmill, JSON is the primary data format used for representing information.
Binary data, such as files, is not easy to handle. Windmill provides two options.

1. Have a dedicated storage for binary data: S3 or Azure Blob. Windmill has a first class integration with S3 buckets or Azure Blob containers.
1. Have a dedicated storage for binary data: S3, Azure Blob, or Google Cloud Storage. Windmill has a first-class integration with S3 buckets, Azure Blob containers, and Google Cloud Storage buckets.
2. If the above is not an option, you can always store the binary as a base64-encoded string (see the sketch below).
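As a minimal sketch of option 2, using only the Python standard library:

```python
import base64

# Encode arbitrary bytes into a JSON-safe string, then decode them back
payload = b"some binary \x00 content"
encoded = base64.b64encode(payload).decode()  # str, safe to return as JSON
decoded = base64.b64decode(encoded)           # bytes again
assert decoded == payload
```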

## Workspace object storage

The recommended way to store binary data is to upload it to S3 or Azure Blob Storage leveraging [Windmill's workspace object storage](../38_object_storage_in_windmill/index.mdx).
The recommended way to store binary data is to upload it to S3, Azure Blob Storage, or Google Cloud Storage leveraging [Windmill's workspace object storage](../38_object_storage_in_windmill/index.mdx).

Instance and workspace object storage are different from using [S3 resources](../../integrations/s3.mdx) within scripts, flows, and apps, which is free and unlimited. What is exclusive to the [Enterprise](/pricing) version is Windmill's integration with S3, a major convenience layer that enables users to read and write from S3 without needing access to the credentials.

:::info

Windmill's integration with S3 and Azure Blob Storage works exactly the same and the features described below works in both cases. The only difference is that you need to select an `azure_blob` resource when setting up the S3 storage in the Workspace settings.
Windmill's integration with S3, Azure Blob Storage, and Google Cloud Storage works exactly the same way, and the features described below work in all cases. The only difference is that you need to select an `azure_blob` resource for Azure Blob or a `gcloud_storage` resource for Google Cloud Storage when setting up the storage in the Workspace settings.

:::
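As a sketch of what this looks like in practice (assuming the `wmill` Python SDK's `load_s3_file` and `write_s3_file` helpers; check the SDK for exact signatures):

```python
import wmill
from wmill import S3Object


def main(input_file: S3Object):
    # Read the file from the workspace object storage; no credentials
    # are handled in the script itself
    content = wmill.load_s3_file(input_file)

    # ... process the bytes ...
    processed = content.upper()

    # Write the result back to the workspace object storage
    return wmill.write_s3_file(S3Object(s3="output/result.txt"), processed)
```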

@@ -49,7 +49,7 @@ All details on Workspace object storage, and how to [read](../38_object_storage_
<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
title="Workspace object storage"
description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
description="Connect your Windmill workspace to your S3 bucket, your Azure Blob storage or your GCS bucket to enable users to read and write from S3 without having to have access to the credentials."
href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
/>
</div>
2 changes: 1 addition & 1 deletion docs/core_concepts/19_rich_display_rendering/index.mdx
@@ -212,7 +212,7 @@ Learn more at:
<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
title="Workspace object storage"
description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
description="Connect your Windmill workspace to your S3 bucket, your Azure Blob storage or your GCS bucket to enable users to read and write from S3 without having to have access to the credentials."
href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
/>
</div>
11 changes: 9 additions & 2 deletions docs/core_concepts/20_jobs/index.mdx
@@ -144,7 +144,7 @@ For large logs storage (and display) and cache for distributed Python jobs, you

This feature has no overlap with the [Workspace object storage](../38_object_storage_in_windmill/index.mdx#workspace-object-storage).

You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find a button to test settings from a server or from a worker.
You can choose to use S3, Azure Blob Storage, AWS OIDC, or Google Cloud Storage. For each, you will find a button to test settings from a server or from a worker.

<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
@@ -174,4 +174,11 @@ You can choose to use S3, Azure Blob Storage or AWS OIDC. For each you will find
| Access key | string | The primary or secondary access key for the storage account. This key is used to authenticate and provide access to Azure Blob Storage. |
| Tenant ID | string | (optional) The unique identifier (GUID) for your Azure Active Directory (AAD) tenant. Required if using Azure Active Directory for authentication. |
| Client ID | string | (optional) The unique identifier (GUID) for your application registered in Azure AD. Required if using service principal authentication via Azure AD. |
| Endpoint | string | (optional) The specific endpoint for Azure Blob Storage, typically used when interacting with non-Azure Blob providers like Azurite or other emulators. For Azure Blob Storage, this is auto-generated and not usually needed. |
| Endpoint | string | (optional) The specific endpoint for Azure Blob Storage, typically used when interacting with non-Azure Blob providers like Azurite or other emulators. For Azure Blob Storage, this is auto-generated and not usually needed. |

#### Google Cloud Storage

| Field | Description |
|-------|-------------|
| Bucket | The name of your Google Cloud Storage bucket. |
| Service Account Key | The service account key for your Google Cloud Storage bucket, in JSON format. |
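If you want to sanity-check a key before pasting it into the settings, a hypothetical helper (not part of Windmill) could look like:

```python
import json

# Fields every GCS service account key should contain
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(raw: str) -> None:
    key = json.loads(raw)  # raises if the key is not valid JSON
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"service account key is missing: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError("expected a key of type 'service_account'")
```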
40 changes: 39 additions & 1 deletion docs/core_concepts/27_data_pipelines/index.mdx
@@ -82,7 +82,7 @@ Find all details at:
<div className="grid grid-cols-2 gap-6 mb-4">
<DocCard
title="Workspace object storage"
description="Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials."
description="Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or GCS bucket to enable users to read and write from S3 without having to have access to the credentials."
href="/docs/core_concepts/object_storage_in_windmill#workspace-object-storage"
/>
</div>
@@ -167,6 +167,44 @@ def main(input_file: S3Object):
    return S3Object(s3=output_file)
```

</TabItem>
<TabItem value="polars (Google Cloud Storage)" label="Polars (Google Cloud Storage)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```python
import wmill
from wmill import S3Object
import polars as pl


def main(input_file: S3Object):
    # this will default to the workspace Google Cloud Storage resource
    endpoint_url = wmill.polars_connection_settings().s3fs_args["endpoint_url"]
    storage_options = wmill.polars_connection_settings().storage_options

    # this will use the designated resource
    # storage_options = wmill.polars_connection_settings("<PATH_TO_S3_RESOURCE>").storage_options

    # input is a parquet file; read it, then switch to lazy mode.
    # Polars can read various file types, see
    # https://pola-rs.github.io/polars/py-polars/html/reference/io.html
    input_uri = "{}/{}".format(endpoint_url, input_file["s3"])

    input_df = pl.read_parquet(input_uri, storage_options=storage_options).lazy()

    # process the Polars dataframe. See Polars docs:
    # for dataframe: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html
    # for lazy dataframe: https://pola-rs.github.io/polars/py-polars/html/reference/lazyframe/index.html
    output_df = input_df.collect()
    print(output_df)

    # To write the result back to Google Cloud Storage, Polars needs an s3fs connection
    output_file = "output/result.parquet"
    output_uri = "{}/{}".format(endpoint_url, output_file)
    output_df.write_parquet(output_uri, storage_options=storage_options)

    return S3Object(s3=output_file)
```

</TabItem>
<TabItem value="duckdb (Python / AWS S3)" label="DuckDB (Python / AWS S3)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

8 changes: 4 additions & 4 deletions docs/core_concepts/38_object_storage_in_windmill/index.mdx
@@ -14,13 +14,13 @@ Additionally, for [instance integration](#instance-object-storage), the Enterpri

## Workspace object storage

Connect your Windmill workspace to your S3 bucket or your Azure Blob storage to enable users to read and write from S3 without having to have access to the credentials.
Connect your Windmill workspace to your S3 bucket, Azure Blob storage, or GCS bucket to enable users to read and write from S3 without needing access to the credentials.

![Workspace object storage infographic](../11_persistent_storage/s3_infographics.png 'Workspace object storage infographic')

The Windmill S3 bucket browser will not work for buckets containing more than 20 files, and uploads are limited to files < 50MB. Consider upgrading to Windmill [Enterprise Edition](/pricing) to use this feature with large buckets.

Once you've created an [S3 or Azure Blob resource](../../integrations/s3.mdx) in Windmill, go to the workspace settings > S3 Storage. Select the resource and click Save.
Once you've created an [S3, Azure Blob, or Google Cloud Storage resource](../../integrations/s3.mdx) in Windmill, go to the workspace settings > S3 Storage. Select the resource and click Save.

![S3 storage workspace settings](../11_persistent_storage/workspace_settings.png)

@@ -314,7 +314,7 @@ For more info on how to use files and S3 files in Windmill, see [Handling files

Read and write from a storage other than your main storage by specifying it in the S3 object as "secondary_storage" with its name.

From the workspace settings, in tab "S3 Storage", just click on "Add secondary storage", give it a name, and pick a resource from type "S3", "Azure Blob", "AWS OIDC" or "Azure Workload Identity". You can save as many additional storages as you want as long as you give them a different name.
From the workspace settings, in the "S3 Storage" tab, click "Add secondary storage", give it a name, and pick a resource of type "S3", "Azure Blob", "Google Cloud Storage", "AWS OIDC" or "Azure Workload Identity". You can save as many additional storages as you want, as long as you give each a different name.

Then, from a script, you can specify the secondary storage with an object with properties `s3` (path to the file) and `storage` (name of the secondary storage), as in the sketch below.
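A minimal sketch (the storage name `gcs_backup` is hypothetical; use one you configured):

```python
from wmill import S3Object


def main():
    # Point the returned S3 object at the secondary storage named
    # "gcs_backup" instead of the default workspace storage
    return S3Object(s3="output/report.parquet", storage="gcs_backup")
```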

@@ -377,7 +377,7 @@ Under [Enterprise Edition](/pricing), instance object storage offers advanced fe

![Instance object storage infographic](./instance_object_storage_infographic.png 'Instance object storage infographic')

This can be configured from the [instance settings](../../advanced/18_instance_settings/index.mdx#instance-object-storage), with configuration options for S3, Azure Blob or AWS OIDC.
This can be configured from the [instance settings](../../advanced/18_instance_settings/index.mdx#instance-object-storage), with configuration options for S3, Azure Blob, Google Cloud Storage, or AWS OIDC.

![S3/Azure for Python/Go cache & large logs](../../core_concepts/20_jobs/s3_azure_cache.png "S3/Azure for Python/Go cache & large logs")

2 changes: 1 addition & 1 deletion docs/core_concepts/index.mdx
@@ -119,7 +119,7 @@ On top of its editors to build endpoints, flows and apps, Windmill comes with a
/>
<DocCard
title="Object storage in Windmill"
description="Windmill comes with native integrations with S3 and Azure Blob, making it the recommended storage for large objects like files and binary data."
description="Windmill comes with native integrations with S3, Azure Blob, AWS OIDC and Google Cloud Storage, making it the recommended storage for large objects like files and binary data."
href="/docs/core_concepts/object_storage_in_windmill"
/>
<DocCard