Introduce Logging Stack: Add fluentd, add loki #1058

Draft: wants to merge 33 commits into base: main

Commits:
f0d8cf0
wip
mrnicegyu11 Sep 19, 2024
e906b41
Merge remote-tracking branch 'upstream/main' into main
mrnicegyu11 Oct 23, 2024
14c751d
Merge remote-tracking branch 'upstream/main' into main
mrnicegyu11 Oct 23, 2024
293f63c
Add csi-s3 and have portainer use it
mrnicegyu11 Oct 24, 2024
f7f72ec
Change request @hrytsuk 1GB max portainer volume size
mrnicegyu11 Oct 25, 2024
94cfb76
t push
mrnicegyu11 Oct 28, 2024
509c717
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Oct 29, 2024
1a65ecf
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 13, 2024
77ee45e
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 25, 2024
c9c70d6
Arch Linux Certificates Customization
mrnicegyu11 Dec 3, 2024
7b8be53
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 5, 2024
bcd61cd
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 12, 2024
58e1030
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 13, 2024
ed8d479
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Jan 10, 2025
dda6e01
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Feb 4, 2025
f6f4f36
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Feb 25, 2025
5dca5c3
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Mar 13, 2025
4a653ef
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Mar 20, 2025
3a21f0f
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Mar 28, 2025
48fbbca
Fix pgsql exporter failure
mrnicegyu11 Apr 24, 2025
08c57db
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 May 6, 2025
5ecbfec
[Kubernetes] Introduce on-prem persistent Storage (Longhorn) :tada: …
YuryHrytsuk May 6, 2025
3ea41b5
Experimental: Try to add tracing to simcore-traefik on master
mrnicegyu11 May 9, 2025
1cf605d
Merge remote-tracking branch 'upstream/main' into 2025/add/fluentd
mrnicegyu11 May 14, 2025
bcc67d4
wip
mrnicegyu11 May 14, 2025
57947e3
Merge remote-tracking branch 'upstream/main' into 2025/add/fluentd
mrnicegyu11 May 21, 2025
88e4ed5
Merge remote-tracking branch 'upstream/main' into 2025/add/fluentd
mrnicegyu11 May 28, 2025
cae2c20
Merge remote-tracking branch 'upstream/main' into 2025/add/fluentd
mrnicegyu11 Jul 23, 2025
a7cb5ae
t push
mrnicegyu11 Jul 23, 2025
5e1766a
Fix accidental commit
mrnicegyu11 Jul 23, 2025
a91bcea
Fluentd fixes
mrnicegyu11 Jul 24, 2025
c39cbb1
Add placement constraints
mrnicegyu11 Jul 24, 2025
92070e2
revert arch linux changes
mrnicegyu11 Jul 24, 2025
6 changes: 3 additions & 3 deletions scripts/deployments/deploy_everything_locally.bash
@@ -243,9 +243,9 @@ if [ "$start_opsstack" -eq 0 ]; then
 call_make "." up-"$stack_target";
 popd

-# -------------------------------- GRAYLOG -------------------------------
-log_info "starting graylog..."
-service_dir="${repo_basedir}"/services/graylog
+# -------------------------------- LOGGING -------------------------------
+log_info "starting logging..."
+service_dir="${repo_basedir}"/services/logging
 pushd "${service_dir}"
 call_make "." up-"$stack_target"
 sleep 1
Binary file removed services/graylog/GraylogWorkflow.png
Binary file not shown.
18 changes: 0 additions & 18 deletions services/graylog/docker-compose.aws.yml

This file was deleted.

18 changes: 0 additions & 18 deletions services/graylog/docker-compose.dalco.yml

This file was deleted.

18 changes: 0 additions & 18 deletions services/graylog/docker-compose.master.yml

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
30 changes: 30 additions & 0 deletions services/logging/docker-compose.aws.yml
@@ -0,0 +1,30 @@
+services:
+  mongodb:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+  elasticsearch:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+  graylog:
+    dns: # Add this always for AWS, otherwise we get "No such image: " for docker services
+      8.8.8.8

Comment on lines +13 to +14 (Member):
This is not necessary when the docker swarm base network is set up, as I believe it is now.

+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  fluentd:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  loki:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
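
The placement constraints above only match Swarm nodes that carry a `logging=true` node label; if no node has it, the scheduler has nowhere to place these services and their tasks stay pending. A minimal sketch of applying that label, assuming it is run on a Swarm manager (the helper name `label_logging_node` and the node name in the usage comment are hypothetical and not part of this PR):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): add the node label that the
# `node.labels.logging==true` placement constraints match against.
label_logging_node() {
  local node="$1"  # Swarm node hostname or ID, supplied by the operator
  docker node update --label-add logging=true "${node}"
}

# Usage on a Swarm manager (the node name is a placeholder):
#   label_logging_node my-logging-node
```

Keeping the label name identical across the aws, dalco and master overrides means one labelled node can host the whole logging stack in each deployment.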
30 changes: 30 additions & 0 deletions services/logging/docker-compose.dalco.yml
@@ -0,0 +1,30 @@
+services:
+  mongodb:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  elasticsearch:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  graylog:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  fluentd:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  loki:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
@@ -13,3 +13,8 @@ services:
     deploy:
       placement:
         constraints: []
+
+  fluentd:
+    deploy:
+      placement:
+        constraints: []
30 changes: 30 additions & 0 deletions services/logging/docker-compose.master.yml
@@ -0,0 +1,30 @@
+services:
+  mongodb:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  elasticsearch:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  graylog:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  fluentd:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
+
+  loki:
+    deploy:
+      placement:
+        constraints:
+          - node.labels.logging==true
197 changes: 197 additions & 0 deletions services/logging/docker-compose.yml.j2
@@ -0,0 +1,197 @@
+services:
+  # MongoDB: https://hub.docker.com/_/mongo/
+  mongodb:
+    image: mongo:6.0.6
+    init: true
+    volumes:
+      # data persistency
+      - mongo_data:/data/db
+    deploy:
+      replicas: 1
+      restart_policy:
+        condition: on-failure
+      resources:
+        limits:
+          memory: 1.2G
+          cpus: "1"
+        reservations:
+          memory: 300M
+          cpus: "0.1"
+    networks:
+      graylog:
+        aliases:
+          - mongo # needed because of graylog configuration
+
+  # Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/docker.html
+  elasticsearch:
+    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
+    init: true
+    volumes:
+      # data persistency
+      - elasticsearch_data:/usr/share/elasticsearch/data
+    environment:
+      - http.host=0.0.0.0
+      - transport.host=localhost
+      - network.host=0.0.0.0
+      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
+    deploy:
+      replicas: 1
+      restart_policy:
+        condition: on-failure
+      resources:
+        limits:
+          memory: 2G
+          cpus: "2"
+        reservations:
+          memory: 1G
+          cpus: "0.1"
+    networks:
+      graylog:
+  # Graylog: https://hub.docker.com/r/graylog/graylog/
+  graylog:
+    image: graylog/graylog:6.0.5
+    init: true
+    # user: "1000:1001"
+    configs:
+      - source: graylog_config
+        target: /files/osparc-custom-content-pack-v2.json
+    volumes:
+      # Mount local configuration directory into Docker container
+      # - graylog_config:/usr/share/graylog/data/config
+      # data persistency
+      - graylog_journal:/usr/share/graylog/data/journal
+    env_file:
+      - .env
+    environment:
+      # CHANGE ME (must be at least 16 characters)!
+      - GRAYLOG_PASSWORD_SECRET=${GRAYLOG_PASSWORD_SECRET}
+      # Username: admin
+      - GRAYLOG_ROOT_PASSWORD_SHA2=${GRAYLOG_ROOT_PASSWORD_SHA2}
+      - GRAYLOG_HTTP_EXTERNAL_URI=${GRAYLOG_HTTP_EXTERNAL_URI}
+      - GRAYLOG_ELASTICSEARCH_HOSTS=http://elasticsearch:9200,
+    networks:
+      public:
+      monitoring:
+      graylog:
+        aliases:
+          - graylog
+    ports:
+      - 12201:12201/udp
+      - 12202:12202/udp
+    deploy:
+      replicas: 1
+      restart_policy:
+        condition: on-failure
+      resources:
+        limits:
+          cpus: "2.00"
+          memory: 5G
+        reservations:
+          cpus: "0.1"
+          memory: 1G
+      labels:
+        - traefik.enable=true
+        - traefik.docker.network=${PUBLIC_NETWORK}
+        # direct access through port
+        - traefik.http.services.graylog.loadbalancer.server.port=9000
+        - traefik.http.routers.graylog.rule=Host(`${MONITORING_DOMAIN}`) && PathPrefix(`/graylog`)
+        - traefik.http.routers.graylog.entrypoints=https
+        - traefik.http.routers.graylog.tls=true
+        - traefik.http.middlewares.graylog_replace_regex.replacepathregex.regex=^/graylog/?(.*)$$
+        - traefik.http.middlewares.graylog_replace_regex.replacepathregex.replacement=/$${1}
+        - traefik.http.routers.graylog.middlewares=ops_whitelist_ips@swarm, ops_gzip@swarm, graylog_replace_regex
+  fluentd:
+    image: itisfoundation/fluentd:v1.16.9-1.0
+    configs:
+      - source: fluentd_config
+        target: /fluentd/etc/fluent.conf
+    environment:
+      - GRAYLOG_HOST=graylog
+      - GRAYLOG_PORT=12201
+      - LOKI_URL=http://loki:3100
+      - FLUENTD_HOSTNAME={% raw %}{{.Node.Hostname}}{% endraw %}
+    ports:
+      - "24224:24224/tcp"
+    deploy:
+      #mode: global # Run on all nodes
+      restart_policy:
+        condition: on-failure
+      resources:
+        limits:
+          cpus: '1.0'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+      update_config:
+        parallelism: 1
+        delay: 10s
+        order: start-first
+    networks:
+      - monitoring
+      - graylog
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://0.0.0.0:24220/api/plugins"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+
+  loki:
+    image: grafana/loki:3.5.0
+    configs:
+      - source: loki_config
+        target: /etc/loki/loki.yaml
+    command: -config.file=/etc/loki/loki.yaml
+    deploy:
+      placement:
+        constraints: []
+      replicas: 1
+      restart_policy:
+        condition: any
+        delay: 5s
+      resources:
+        limits:
+          cpus: '1.0'
+          memory: 2G
+        reservations:
+          cpus: '0.5'
+          memory: 1G
+      update_config:
+        parallelism: 1
+        delay: 10s
+        order: start-first
+    networks:
+      - monitoring
+    healthcheck:
+      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://0.0.0.0:3100/ready"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+
+
+volumes:
+  loki-data:
+  mongo_data:
+  elasticsearch_data:
+  graylog_journal:
+
+networks:
+  graylog:
+  public:
+    external: true
+    name: ${PUBLIC_NETWORK}
+  monitoring:
+    external: true
+    name: ${MONITORED_NETWORK}
+configs:
+  graylog_config:
+    name: ${STACK_NAME}_graylog_config_{{ "./data/contentpacks/osparc-custom-content-pack-v2.json" | sha256file | substring(0,10) }}
+    file: ./data/contentpacks/osparc-custom-content-pack-v2.json
+  fluentd_config:
+    name: ${STACK_NAME}_fluentd_config_{{ "./fluentd/fluent.conf" | sha256file | substring(0,10) }}
+    file: ./fluentd/fluent.conf
+  loki_config:
+    name: ${STACK_NAME}_loki_config_{{ "./loki.yaml" | sha256file | substring(0,10) }}
+    file: ./loki.yaml
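
The config names above embed a short digest of each config file. Swarm configs cannot be modified in place, so a content-derived suffix gives a changed file a new config name on the next deploy. A rough shell equivalent of that suffix, assuming the `sha256file` and `substring(0,10)` template filters (defined elsewhere in the repository and not shown in this diff) hash the file contents and keep the first ten hex characters; `config_suffix` is a hypothetical name and GNU coreutils `sha256sum` is assumed to be available:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): approximate the name suffix that
# `{{ "<file>" | sha256file | substring(0,10) }}` is assumed to produce.
config_suffix() {
  local file="$1"
  # sha256sum prints "<64 hex chars>  <file>"; keep the first 10 characters.
  sha256sum "${file}" | cut -c1-10
}

# Usage, from services/logging (STACK_NAME is assumed to be set by the caller):
#   echo "${STACK_NAME}_loki_config_$(config_suffix ./loki.yaml)"
```

Comparing this value with the names listed by `docker config ls` is one way to check which version of a config file a running stack was deployed with, provided the assumption about the filters holds.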
26 changes: 26 additions & 0 deletions services/logging/fluentd/Dockerfile
@@ -0,0 +1,26 @@
+FROM fluent/fluentd:v1.16.9-1.0
+
+USER root
+
+# Install dependencies and plugins
+RUN apk add --no-cache --update --virtual .build-deps \
+    sudo build-base ruby-dev curl \
+    && sudo gem install fluent-plugin-grafana-loki \
+    && sudo gem install fluent-plugin-gelf-best \
+    && sudo gem install fluent-plugin-prometheus \
+    && apk del .build-deps \
+    && apk add --no-cache curl jq \
+    && rm -rf /var/cache/apk/* \
+    && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem
+
+# Create directories with appropriate permissions
+RUN mkdir -p /fluentd/buffer /fluentd/log \
+    && chown -R fluent:fluent /fluentd/buffer /fluentd/log
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=30s --retries=3 \
+    CMD curl -s http://localhost:24220/api/plugins | jq -e '.plugins | length > 0' || exit 1
+
+USER fluent
+
+ENTRYPOINT ["fluentd", "-c", "/fluentd/etc/fluent.conf"]
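
The compose template references `itisfoundation/fluentd:v1.16.9-1.0`, whose tag matches the base image version used in this Dockerfile. A sketch of building that image locally, assuming it is built from `services/logging/fluentd` relative to the repository root and tagged by hand; the PR does not show how the image is actually built or published, and `build_fluentd_image` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): build the custom fluentd image
# under the tag that docker-compose.yml.j2 expects.
build_fluentd_image() {
  local context="${1:-services/logging/fluentd}"        # directory with the Dockerfile
  local tag="${2:-itisfoundation/fluentd:v1.16.9-1.0}"  # tag used by the compose file
  docker build --tag "${tag}" "${context}"
}

# Usage from the repository root:
#   build_fluentd_image
```

Both arguments are optional, so a different context directory or tag can be passed when testing changes to the plugin list.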