
@lisguo (Contributor) commented Sep 26, 2025

Description of the issue

Currently, if you use append_dimensions within a host metrics plugin, the agent appends the raw string. Using the ${aws:...} placeholder syntax to substitute a dimension with EC2 metadata or tags does not work there; substitution is only supported in the global append_dimensions section.

{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "cpu": {
        "append_dimensions": {
          "InstanceType": "${aws:InstanceType}",
          "ImageId": "${aws:ImageId}",
          "ServiceName": "MyServiceApplication"
        },
        "measurement": [
          "cpu_usage_idle"
        ]
      }
    }
  }
}

The above will create the following dimensions today:

Key            Value
InstanceId     i-00000001
InstanceType   ${aws:InstanceType}
ImageId        ${aws:ImageId}
ServiceName    MyServiceApplication

Here ${aws:InstanceType} and ${aws:ImageId} are the literal placeholder strings; they are not substituted.

Description of changes

The host metrics plugins are Telegraf-based and are configured through TOML. This change queries the EC2 metadata service at translation time and writes the resolved values into the generated TOML file.

For the example above, the generated TOML is:

[[inputs.cpu]]
    fieldpass = ["usage_idle"]
    percpu = false
    totalcpu = true
    [inputs.cpu.tags]
      ImageId = "ami-0abcdef1234567890"
      InstanceType = "t3.medium"
      ServiceName = "MyServiceApplication"
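As a rough illustration of the translation-time substitution described above, the Go sketch below resolves whole-value ${aws:Key} placeholders in one plugin's append_dimensions map. The helper name resolveAppendDimensions, the regular expression, and the fallback of keeping the literal string when no metadata value is available are assumptions for illustration, not the agent's actual implementation; the metadata map stands in for values fetched from the EC2 metadata service.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// placeholderPattern matches a whole value of the form ${aws:Key}.
// Assumption: only whole-value placeholders are resolved; partial strings
// such as "prefix-${aws:InstanceId}" are left untouched.
var placeholderPattern = regexp.MustCompile(`^\$\{aws:([A-Za-z]+)\}$`)

// resolveAppendDimensions (hypothetical) returns a copy of dims in which every
// ${aws:Key} value is replaced by metadata[Key]. Values with no matching
// metadata entry are kept as the literal string.
func resolveAppendDimensions(dims, metadata map[string]string) map[string]string {
	resolved := make(map[string]string, len(dims))
	for name, value := range dims {
		if m := placeholderPattern.FindStringSubmatch(value); m != nil {
			if v, ok := metadata[m[1]]; ok {
				resolved[name] = v
				continue
			}
		}
		resolved[name] = value
	}
	return resolved
}

func main() {
	// Sample values from the PR description; the agent would obtain them
	// from the EC2 metadata service instead.
	metadata := map[string]string{
		"InstanceId":   "i-00000001",
		"InstanceType": "t3.medium",
		"ImageId":      "ami-0abcdef1234567890",
	}
	dims := map[string]string{
		"InstanceType": "${aws:InstanceType}",
		"ImageId":      "${aws:ImageId}",
		"ServiceName":  "MyServiceApplication",
	}
	resolved := resolveAppendDimensions(dims, metadata)

	// Print in sorted order so the output is stable.
	keys := make([]string, 0, len(resolved))
	for k := range resolved {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %q\n", k, resolved[k])
	}
}
```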

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Manual testing with the following cwagent config:

{
    "metrics": {
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected": {
            "cpu": {
                "append_dimensions": {
                    "InstanceType": "${aws:InstanceType}",
                    "ImageId": "${aws:ImageId}"
                },
                "measurement": [
                    "cpu_usage_idle",
                    "cpu_usage_iowait",
                    "cpu_usage_user",
                    "cpu_usage_system"
                ],
                "totalcpu": true
            },
            "disk": {
                "append_dimensions": {
                    "InstanceType": "${aws:InstanceType}",
                    "ImageId": "${aws:ImageId}"
                },
                "measurement": [
                    "used_percent",
                    "inodes_free"
                ],
                "resources": [
                    "*"
                ],
                "dimensions": [
                    [
                        "device",
                        "fstype",
                        "path"
                    ]
                ]
            },
            "diskio": {
                "append_dimensions": {
                    "InstanceType": "${aws:InstanceType}",
                    "ImageId": "${aws:ImageId}"
                },
                "measurement": [
                    "io_time"
                ],
                "resources": [
                    "*"
                ]
            },
            "mem": {
                "append_dimensions": {
                    "InstanceType": "${aws:InstanceType}",
                    "ImageId": "${aws:ImageId}"
                },
                "measurement": [
                    "used_percent"
                ]
            },
            "swap": {
                "append_dimensions": {
                    "InstanceType": "${aws:InstanceType}",
                    "ImageId": "${aws:ImageId}"
                },
                "measurement": [
                    "used_percent"
                ]
            }
        }
    }
}

Generated TOML:

[inputs]

  [[inputs.cpu]]
    fieldpass = ["usage_idle", "usage_iowait", "usage_user", "usage_system"]
    percpu = false
    totalcpu = true
    [inputs.cpu.tags]
      ImageId = "ami-0254b2d5c4c472488"
      InstanceType = "t3.medium"

  [[inputs.disk]]
    fieldpass = ["used_percent", "inodes_free"]
    tagexclude = ["mode"]
    [inputs.disk.tags]
      ImageId = "ami-0254b2d5c4c472488"
      InstanceType = "t3.medium"

  [[inputs.diskio]]
    fieldpass = ["io_time"]
    [inputs.diskio.tags]
      ImageId = "ami-0254b2d5c4c472488"
      InstanceType = "t3.medium"

  [[inputs.mem]]
    fieldpass = ["used_percent"]
    [inputs.mem.tags]
      ImageId = "ami-0254b2d5c4c472488"
      InstanceType = "t3.medium"

  [[inputs.swap]]
    fieldpass = ["used_percent"]
    [inputs.swap.tags]
      ImageId = "ami-0254b2d5c4c472488"
      InstanceType = "t3.medium"
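The generated tags tables list their keys in alphabetical order. One way to get that deterministic output is sketched below; renderTagsTable is a hypothetical helper, and whether the agent sorts keys itself or relies on its TOML encoder is not stated in this PR. The quoting assumes ordinary ASCII tag values and bare-key-safe dimension names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// renderTagsTable (hypothetical) writes a plugin's resolved append_dimensions
// as a Telegraf [inputs.<plugin>.tags] sub-table. Keys are sorted so the
// output is identical across runs.
func renderTagsTable(plugin string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	fmt.Fprintf(&b, "    [inputs.%s.tags]\n", plugin)
	for _, k := range keys {
		// strconv.Quote produces a double-quoted, escaped string, which is a
		// valid TOML basic string for ordinary ASCII values like these.
		fmt.Fprintf(&b, "      %s = %s\n", k, strconv.Quote(tags[k]))
	}
	return b.String()
}

func main() {
	fmt.Print(renderTagsTable("mem", map[string]string{
		"InstanceType": "t3.medium",
		"ImageId":      "ami-0254b2d5c4c472488",
	}))
}
```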

Generated ec2tagger YAML config:

    ec2tagger:
        ec2_metadata_tags:
            - InstanceId
        imds_retries: 1
        middleware: agenthealth/statuscode
        refresh_tags_interval: 0s
        refresh_volumes_interval: 0s

Requirements

Before committing your code, please complete the following steps:

  1. Run make fmt and make fmt-sh
  2. Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

@lisguo lisguo requested a review from a team as a code owner September 26, 2025 19:58
@lisguo lisguo requested review from dricross, jefchien and the-mann and removed request for the-mann September 26, 2025 19:58
@lisguo lisguo added the ready for testing Indicates this PR is ready for integration tests to run label Sep 29, 2025
dricross previously approved these changes Oct 1, 2025

- // FilterReservedKeys out reserved tag keys
+ // FilterReservedKeys out reserved tag keys and resolves AWS metadata variables at translation time
  func FilterReservedKeys(input any) any {
Contributor

I don't think this is the right place for this. Have we considered putting it in translator/translate/util/placeholderUtil and using the ec2tagger.SupportedAppendDimensions that the ec2taggerprocessor translator uses?

Contributor

Instead of doing this as part of FilterReservedKeys, can we just call ResolveAWSMetadataPlaceholders afterwards in commonconfigutil.go?
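A minimal sketch of the reviewer's suggestion: reserved-key filtering and placeholder resolution stay separate and run one after the other. The function names, the reserved key "host", and the injected lookup function are all hypothetical; the real FilterReservedKeys takes and returns any, and its reserved-key list is not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedKeys is a hypothetical placeholder set; the agent's actual list of
// reserved tag keys is not shown here.
var reservedKeys = map[string]bool{"host": true}

// filterReservedKeys only drops reserved keys; it does no substitution.
func filterReservedKeys(dims map[string]string) map[string]string {
	out := make(map[string]string, len(dims))
	for k, v := range dims {
		if !reservedKeys[k] {
			out[k] = v
		}
	}
	return out
}

// resolvePlaceholders replaces ${aws:Key} values via lookup. The lookup is
// injected so the step can be tested without calling AWS.
func resolvePlaceholders(dims map[string]string, lookup func(key string) (string, bool)) map[string]string {
	out := make(map[string]string, len(dims))
	for k, v := range dims {
		if strings.HasPrefix(v, "${aws:") && strings.HasSuffix(v, "}") {
			key := strings.TrimSuffix(strings.TrimPrefix(v, "${aws:"), "}")
			if resolved, ok := lookup(key); ok {
				out[k] = resolved
				continue
			}
		}
		out[k] = v
	}
	return out
}

// translateAppendDimensions composes the two steps: filter first, then resolve.
func translateAppendDimensions(dims map[string]string, lookup func(string) (string, bool)) map[string]string {
	return resolvePlaceholders(filterReservedKeys(dims), lookup)
}

func main() {
	meta := map[string]string{"InstanceType": "t3.medium"}
	lookup := func(k string) (string, bool) { v, ok := meta[k]; return v, ok }
	fmt.Println(translateAppendDimensions(map[string]string{
		"host":         "ignored",
		"InstanceType": "${aws:InstanceType}",
	}, lookup))
}
```

Keeping the two steps apart means FilterReservedKeys stays a pure filter, and the resolution step can be unit-tested with a fake lookup.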

}

for {
result, err := ec2Client.DescribeTags(input)
Contributor (author)

I asked Q whether any other place in the code uses DescribeTags so I could borrow the logic, but didn't find anything. Let me know if I can reuse something here.

Contributor

Looks like we use DescribeTags for retrieving the EKS cluster name: https://github.com/aws/amazon-cloudwatch-agent/blob/main/translator/translate/logs/util/get_eks_cluster_name.go#L80. That code also has specialized retry logic with backoff, so it might be important to reuse it. It is specific to retrieving the EKS cluster name, though, so some refactoring would be needed to make these two uses mesh.

Contributor (author)

Yeah, I could try to refactor, but I'm not sure I should tackle it in this PR. It can be a follow-up.


func getEC2TagValue(tagKey string) string {
ec2Util := ec2util.GetEC2UtilSingleton()
if ec2Util.InstanceID == "" || ec2Util.Region == "" {
Contributor (author)

We need the instance ID and region to call DescribeTags for the current instance.
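To illustrate why both values matter, the sketch below scopes a tag lookup to the current instance with the EC2 DescribeTags filters resource-id and key, and follows NextToken across pages, as the for loop in the diff suggests. The tagDescriber interface, tagPage and filter types, and fakeClient are hypothetical stand-ins, not AWS SDK types; in real code the region would configure the EC2 client rather than only being checked.

```go
package main

import (
	"errors"
	"fmt"
)

// filter mirrors the Name/Values shape of an EC2 API filter.
type filter struct {
	Name   string
	Values []string
}

// tagPage is a hypothetical, SDK-independent view of one DescribeTags page.
type tagPage struct {
	Values    []string // tag values returned on this page
	NextToken string   // empty when there are no more pages
}

// tagDescriber abstracts the DescribeTags call; it is not an AWS SDK interface.
type tagDescriber interface {
	DescribeTags(filters []filter, nextToken string) (tagPage, error)
}

// lookupInstanceTag returns the value of tagKey on the given instance.
func lookupInstanceTag(c tagDescriber, instanceID, region, tagKey string) (string, error) {
	// Without both values the call cannot be scoped to this instance.
	if instanceID == "" || region == "" {
		return "", errors.New("instance ID or region unavailable")
	}
	filters := []filter{
		{Name: "resource-id", Values: []string{instanceID}},
		{Name: "key", Values: []string{tagKey}},
	}
	token := ""
	for {
		page, err := c.DescribeTags(filters, token)
		if err != nil {
			return "", fmt.Errorf("describe EC2 tag %q: %w", tagKey, err)
		}
		if len(page.Values) > 0 {
			return page.Values[0], nil
		}
		if page.NextToken == "" {
			return "", fmt.Errorf("tag %q not found on %s", tagKey, instanceID)
		}
		token = page.NextToken
	}
}

// fakeClient returns canned pages in order, for demonstration only.
type fakeClient struct {
	pages []tagPage
	calls int
}

func (f *fakeClient) DescribeTags(_ []filter, _ string) (tagPage, error) {
	p := f.pages[f.calls]
	f.calls++
	return p, nil
}

func main() {
	c := &fakeClient{pages: []tagPage{{NextToken: "t1"}, {Values: []string{"prod"}}}}
	v, err := lookupInstanceTag(c, "i-00000001", "us-east-1", "Stage")
	fmt.Println(v, err, c.calls)
}
```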

jefchien previously approved these changes Oct 7, 2025
for {
result, err := ec2Client.DescribeTags(input)
if err != nil {
log.Printf("E! Failed to describe EC2 tag '%s': %v", tagKey, err)
Contributor

nit: Won't this start to show up in every translator log since we don't include DescribeTags in the default policy? Are we sure we want to log this?

Contributor (author)

We do this already when trying to get the EKS cluster name: https://github.com/aws/amazon-cloudwatch-agent/blob/main/translator/translate/logs/util/get_eks_cluster_name.go#L80

But this is only invoked when a customer sets an append_dimensions value to an ${aws:...} placeholder, so I think it makes sense to log it: they are trying to use a tag as a dimension, and the lookup will fail.

Comment on lines 161 to 164
// Cache AWS metadata on first use
if awsMetadata == nil {
awsMetadata = GetAWSMetadataPlaceholderInfo()
}
Contributor

nit: This will cache the metadata for one append_dimensions section, but I think it's going to re-generate it for each append_dimensions section in the agent config, incurring another AWS call to DescribeTags. Hard to know if that's an issue or not. The special retry logic in get_eks_cluster_name.go tells me we've had some issues in the past with the DescribeTags call. GetAWSMetadataPlaceholderInfo could do the caching instead so that it's shared across the entire invocation of the translator.

Contributor (author)

Yeah, for EC2 metadata we already go through the ec2util singleton (ec2 := ec2util.GetEC2UtilSingleton()), so we are effectively double caching there. For tags, though, we want to make sure we aren't making duplicate calls; we can add a separate singleton for that.
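A minimal sketch of the separate tag singleton mentioned above, assuming one cache per translator run that every append_dimensions section shares. tagCache, newTagCache, and getSharedTagCache are hypothetical names. Caching failures as well as successes means a missing DescribeTags permission produces one lookup (and one log line) per tag key rather than one per section.

```go
package main

import (
	"fmt"
	"sync"
)

// tagResult stores either a value or the error from the first lookup.
type tagResult struct {
	value string
	err   error
}

// tagCache memoizes tag lookups so repeated sections reuse one call per key.
type tagCache struct {
	mu      sync.Mutex
	lookup  func(key string) (string, error)
	entries map[string]tagResult
}

func newTagCache(lookup func(string) (string, error)) *tagCache {
	return &tagCache{lookup: lookup, entries: make(map[string]tagResult)}
}

// Get returns the cached result for key, calling lookup only on first use.
// Errors are cached too, so a failing key is not retried on every section.
func (c *tagCache) Get(key string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if r, ok := c.entries[key]; ok {
		return r.value, r.err
	}
	v, err := c.lookup(key)
	c.entries[key] = tagResult{value: v, err: err}
	return v, err
}

var (
	sharedCache     *tagCache
	sharedCacheOnce sync.Once
)

// getSharedTagCache returns the process-wide cache. The lookup argument is
// only used by the first call; later calls get the existing instance.
func getSharedTagCache(lookup func(string) (string, error)) *tagCache {
	sharedCacheOnce.Do(func() { sharedCache = newTagCache(lookup) })
	return sharedCache
}

func main() {
	calls := 0
	lookup := func(key string) (string, error) { calls++; return "value-of-" + key, nil }
	// Two append_dimensions sections asking for the same tag.
	a, _ := getSharedTagCache(lookup).Get("Team")
	b, _ := getSharedTagCache(lookup).Get("Team")
	fmt.Println(a, b, calls)
}
```

Holding the mutex during the lookup serializes concurrent lookups; that keeps the sketch simple and is unlikely to matter during one-shot config translation.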

@lisguo lisguo changed the title Add support for EC2 metadata or tags as dimensions for host metrics Add support for EC2 metadata as dimensions for host metrics Oct 7, 2025
@lisguo lisguo merged commit 237e256 into main Oct 8, 2025
221 checks passed
@lisguo lisguo deleted the toml-tag branch October 8, 2025 14:33