[Bug] Memory Leak In Netty Recycler of Bookie Client #24355

### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.


### Read release policy

- [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.


### User environment

Both Pulsar 2.9.x and Pulsar 3.0.x show this memory leak.

### Issue Description

After the broker has been running for a long time, both its heap memory usage and its ZGC time keep increasing.
A heap dump shows that the cause is the Netty recycler, which the bookie client uses to cache objects. The memory held by the recycler keeps growing.

As the heap dump shows, a single FastThreadLocalThread holds a very large number of LocalPool instances, and the consumerBuffer of a single LocalPool contains a very large number of references.

Our setting is io.netty.recycler.maxCapacityPerThread=1024. The number of PerChannelBookieClient instances is 16 * 500 = 8000, where 16 is the broker default configuration and 500 is the number of bookies. Changing the setting to io.netty.recycler.maxCapacityPerThread=0 fixes the memory leak, but write and read performance decreases. -Dpulsar.allocator.leak_detection=Advanced and -Dio.netty.leakDetectionLevel=PARANOID are both set, and nothing is logged.

![Image](https://github.com/user-attachments/assets/49037835-fb80-4503-ad96-2ae6175b9830)
![Image](https://github.com/user-attachments/assets/428c5d39-4774-49a4-b676-96181449e72d)

![Image](https://github.com/user-attachments/assets/83ffa9c5-6b18-4325-8f5e-1afa8eadf628)

![Image](https://github.com/user-attachments/assets/6925e3d0-ac97-4794-a38a-a2bcfe168725)

### Error messages

```text

```

### Reproducing the issue

Keep the broker running and start a perf produce process with a high QPS. Normal throughput is also enough to reproduce the issue.

### Issue Analysis

The root cause is that each PerChannelBookieClient has its own recyclers, so a single broker creates a very large number of them.

For example, in our cluster the number of recyclers is 16 * 500 * 2 = 16000, where 16 is the bookkeeperNumberOfChannelsPerBookie setting in broker.conf, 500 is the number of bookies in the cluster, and 2 is the number of recyclers in each bookie client (AddCompletion and EntryCompletionKey).

All PerChannelBookieClient instances share the same thread pool, BookieClientWorker. Its thread count equals the number of CPU cores, which is 32 in our case.

Therefore, the maximum number of objects cached in one broker's recyclers is 16000 * 32 * 1024 = 524288000, where 1024 is io.netty.recycler.maxCapacityPerThread. If each object takes 300 bytes, the recyclers can hold roughly 150 GB of objects when full. That is why the memory leak occurs.
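
The estimate above can be checked with a few lines of Java. This is only a sketch of the arithmetic in this issue. The class and method names are made up for illustration, and the 300 bytes per object is the assumption stated above, not a measured value.

```java
public class RecyclerBound {
    // Upper bound on the number of objects that all recyclers can cache,
    // assuming every recycler fills its per-thread capacity on every worker thread.
    static long maxCachedObjects(int channelsPerBookie, int bookies, int recyclersPerClient,
                                 int workerThreads, int maxCapacityPerThread) {
        long recyclers = (long) channelsPerBookie * bookies * recyclersPerClient;
        return recyclers * workerThreads * maxCapacityPerThread;
    }

    // Worst-case bytes retained for a given (assumed) object size.
    static long maxCachedBytes(long cachedObjects, long bytesPerObject) {
        return cachedObjects * bytesPerObject;
    }

    public static void main(String[] args) {
        // Values from this issue: 16 channels per bookie, 500 bookies,
        // 2 recyclers per client, 32 worker threads, maxCapacityPerThread=1024.
        long objects = maxCachedObjects(16, 500, 2, 32, 1024);
        long bytes = maxCachedBytes(objects, 300L); // 300 bytes per object is an assumption
        System.out.println("cached objects (upper bound) = " + objects);
        System.out.println("bytes (upper bound)          = " + bytes);
    }
}
```

With the values from this issue the bound is 524288000 objects, or 157286400000 bytes at 300 bytes each, which matches the rough 150 GB figure.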

The underlying root cause is in the BookKeeper client (bkClient): the recyclers in PerChannelBookieClient are not static, so too many recyclers are created. The more bookies a cluster has, or the more ledgers are created in it, the more readily broker memory grows.
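
To illustrate the difference between instance-level and static recyclers, here is a toy model. It does not use Netty or BookKeeper. ToyRecycler is a hypothetical stand-in for io.netty.util.Recycler, and PerInstanceClient and SharedClient are made-up names that only mimic the field layout described above. The sketch counts how many distinct recycler objects exist after creating 16 * 500 clients.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

public class RecyclerCountSketch {
    // Toy stand-in for a recycler; only its object identity matters here.
    static final class ToyRecycler { }

    interface Client {
        ToyRecycler addRecycler();
        ToyRecycler keyRecycler();
    }

    // Pattern described in this issue: every client instance owns its own recyclers.
    static final class PerInstanceClient implements Client {
        private final ToyRecycler addRecycler = new ToyRecycler();
        private final ToyRecycler keyRecycler = new ToyRecycler();
        public ToyRecycler addRecycler() { return addRecycler; }
        public ToyRecycler keyRecycler() { return keyRecycler; }
    }

    // Alternative: the recyclers are static, so all client instances share them.
    static final class SharedClient implements Client {
        private static final ToyRecycler ADD_RECYCLER = new ToyRecycler();
        private static final ToyRecycler KEY_RECYCLER = new ToyRecycler();
        public ToyRecycler addRecycler() { return ADD_RECYCLER; }
        public ToyRecycler keyRecycler() { return KEY_RECYCLER; }
    }

    // Creates the given number of clients and counts distinct recycler objects by identity.
    static int distinctRecyclers(int clients, Supplier<Client> factory) {
        Set<ToyRecycler> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < clients; i++) {
            Client client = factory.get();
            seen.add(client.addRecycler());
            seen.add(client.keyRecycler());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int clients = 16 * 500; // channels per bookie * bookies, as in this issue
        System.out.println("per-instance recyclers: " + distinctRecyclers(clients, PerInstanceClient::new));
        System.out.println("static recyclers:       " + distinctRecyclers(clients, SharedClient::new));
    }
}
```

In this model the per-instance layout produces 16000 recyclers while the static layout produces 2, regardless of how many clients are created. Since each real recycler keeps its own per-thread pool, the number of recyclers directly multiplies the worst-case cached memory.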




### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
