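Looking at how Kafka stores data internally is a good way to approach its basic concepts: once you know how the data is laid out on disk, the architecture and its limits become much easier to understand.
Kafka is essentially a distributed, replicated message queue, and it comes up often in discussions of microservices and distributed computing. To get the most performance out of it, you need some understanding of how the data is stored.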
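The basic concepts of Kafka are shown in the figure below:
The figure below shows how those concepts map to the data stored internally:
Note: the index/timeIndex files in the figure above are only schematic. They do not hold one index entry per message; see the hands-on test below.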
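To make the "not one entry per message" point concrete, here is a minimal sketch of a sparse index lookup. The index contents and the function name `find_scan_start` are hypothetical, chosen only for illustration. The idea is that the index stores a position for only some offsets (the broker setting `index.interval.bytes` controls roughly how often an entry is added), so a reader finds the nearest indexed offset at or below the target and then scans the log forward from there.

```python
from bisect import bisect_right

# Hypothetical sparse index: (offset, byte position in the .log file) pairs,
# sorted by offset. Only some offsets have an entry.
sparse_index = [(0, 0), (50, 4096), (100, 8192)]


def find_scan_start(index, target_offset):
    """Return the entry with the largest indexed offset <= target_offset."""
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise ValueError("target offset is before the first indexed offset")
    return index[i]


# Offset 75 has no entry of its own, so the scan starts at the entry for 50.
print(find_scan_start(sparse_index, 75))
```

The trade-off is a smaller index in exchange for a short sequential scan in the log file after the lookup.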
Peeking into a partition directory
$ ll /data/kafka/kafka-logs/test.eugene.test-7
total 8.0K
-rw-r--r-- 1 root root 10M Aug 22 18:31 00000000000000000000.index
-rw-r--r-- 1 root root 88 Aug 22 18:35 00000000000000000000.log
-rw-r--r-- 1 root root 10M Aug 22 18:31 00000000000000000000.timeindex
-rw-r--r-- 1 root root 8 Aug 22 18:31 leader-epoch-checkpoint
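The names in this listing follow a pattern: the directory is `<topic>-<partition>`, and each segment's files are named after the segment's base offset, zero-padded to 20 digits. The sketch below reproduces that naming; the helper names `parse_partition_dir` and `segment_file_names` are made up for this example and are not part of Kafka.

```python
def parse_partition_dir(dir_name):
    """Split '<topic>-<partition>' on the LAST dash, since topics may contain dashes."""
    topic, _, partition = dir_name.rpartition("-")
    return topic, int(partition)


def segment_file_names(base_offset):
    """File names for the segment whose first message has the given offset."""
    stem = f"{base_offset:020d}"  # 20-digit, zero-padded base offset
    return [f"{stem}.{ext}" for ext in ("log", "index", "timeindex")]


print(parse_partition_dir("test.eugene.test-7"))
print(segment_file_names(0))
```

Note that the two index files are listed as 10M while the directory total is only 8.0K, which suggests they are preallocated rather than filled with entries.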
The DumpLogSegments tool lets us look at the contents of these files.
OffsetIndex - Index Of Offsets Of Log Segment
$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files /data/kafka/kafka-logs/test.eugene.test-7/00000000000000000000.index
Dumping /data/kafka/kafka-logs/test.eugene.test-7/00000000000000000000.index
offset: 0 position: 0
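For readers who want to see what sits behind this dump, the following is a rough sketch of decoding an offset index by hand. It assumes each entry is 8 bytes, a 4-byte offset relative to the segment's base offset followed by a 4-byte position in the .log file, both big-endian. Treating an all-zero entry after the first one as the start of the unused, preallocated space is a heuristic of this sketch, not something taken from Kafka's code. The bytes in the example are synthetic, not read from the partition above.

```python
import struct

# Assumed entry layout: int32 relative offset, int32 file position, big-endian.
ENTRY = struct.Struct(">ii")


def read_offset_index(raw, base_offset):
    """Decode (absolute offset, position) pairs from raw offset-index bytes."""
    entries = []
    for i in range(0, len(raw) - ENTRY.size + 1, ENTRY.size):
        relative_offset, position = ENTRY.unpack_from(raw, i)
        if i > 0 and relative_offset == 0 and position == 0:
            break  # heuristic: zero-filled preallocated space starts here
        entries.append((base_offset + relative_offset, position))
    return entries


# Synthetic example: two real entries followed by zero padding.
raw = ENTRY.pack(10, 4200) + ENTRY.pack(25, 8500) + bytes(16)
print(read_offset_index(raw, base_offset=100))
```

With an index that holds no entries yet, this sketch returns a single `(base_offset, 0)` pair, which mirrors the lone `offset: 0 position: 0` line in the dump above.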
TimeIndex - Index Of Timestamp And Offsets Of Log Segment
$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files /data/kafka/kafka-logs/test.eugene.test-7/00000000000000000000.timeindex
Found timestamp mismatch in :/data/kafka/kafka-logs/test.eugene.test-7/00000000000000000000.timeindex
  Index timestamp: 0, log timestamp: 1629628512555
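One plausible reading of the mismatch message is that the time index has no real entries yet, so the tool compares a zero-filled slot (timestamp 0) against the first message's timestamp. The sketch below decodes a time index under the assumption that each entry is 12 bytes: an 8-byte timestamp in milliseconds followed by a 4-byte relative offset, big-endian. Stopping at the first zero timestamp is an assumption of this sketch, and the input bytes are synthetic.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumed entry layout: int64 timestamp (ms), int32 relative offset, big-endian.
TIME_ENTRY = struct.Struct(">qi")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_time_index(raw, base_offset):
    """Decode (timestamp_ms, absolute offset) pairs, skipping unused space."""
    entries = []
    for i in range(0, len(raw) - TIME_ENTRY.size + 1, TIME_ENTRY.size):
        timestamp_ms, relative_offset = TIME_ENTRY.unpack_from(raw, i)
        if timestamp_ms == 0:
            break  # assumption: a zero timestamp marks unused preallocated space
        entries.append((timestamp_ms, base_offset + relative_offset))
    return entries


def ms_to_utc(timestamp_ms):
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)


raw = TIME_ENTRY.pack(1629628512555, 0) + bytes(24)
print(read_time_index(raw, base_offset=0))
print(ms_to_utc(1629628512555).isoformat())
```

The log timestamp 1629628512555 corresponds to 10:35:12 UTC on 2021-08-22, which lines up with the 18:35 modification time of the .log file in the listing if the machine runs on UTC+8.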
Log File
$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files /data/kafka/kafka-logs/test.eugene.test-7/00000000000000000000.log
Dumping /data/kafka/kafka-logs/test.eugene.test-7/00000000000000000000.log
Starting offset: 0
baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 0 CreateTime: 1629628512555 size: 88 magic: 2 compresscodec: NONE crc: 1254090055 isvalid: true
| offset: 0 CreateTime: 1629628512555 keysize: 0 valuesize: 18 sequence: -1 headerKeys: [] key: 12 payload: {
  "data": 12
}
baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 88 CreateTime: 1629628976272 size: 90 magic: 2 compresscodec: NONE crc: 2535940961 isvalid: true
| offset: 1 CreateTime: 1629628976272 keysize: 2 valuesize: 18 sequence: -1 headerKeys: [] key: 13 payload: {
  "data": 13
}
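The batch lines in this dump show that the log file is a plain append-only sequence: the second batch starts at position 88, which is exactly where the first batch (position 0, size 88) ends. Here is a small sketch that parses the two batch lines copied from the output above and checks that relationship. The helper name `parse_batch_line` is made up for this example.

```python
import re

# The two batch header lines from the dump above, copied verbatim.
BATCH_LINES = [
    "baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1 lastSequence: -1 "
    "producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false "
    "isControl: false position: 0 CreateTime: 1629628512555 size: 88 magic: 2 "
    "compresscodec: NONE crc: 1254090055 isvalid: true",
    "baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1 lastSequence: -1 "
    "producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false "
    "isControl: false position: 88 CreateTime: 1629628976272 size: 90 magic: 2 "
    "compresscodec: NONE crc: 2535940961 isvalid: true",
]


def parse_batch_line(line):
    """Extract the numeric fields we care about from one 'name: value' batch line."""
    fields = dict(re.findall(r"(\w+): (\S+)", line))
    wanted = ("baseOffset", "lastOffset", "count", "position", "size", "CreateTime")
    return {name: int(fields[name]) for name in wanted}


first, second = (parse_batch_line(line) for line in BATCH_LINES)

# Each batch begins where the previous one ends.
print(first["position"] + first["size"] == second["position"])
# Total bytes written so far in the segment.
print(second["position"] + second["size"])
```

This is also why an index entry only needs an offset and a byte position: given a position, a reader can walk the batches forward one after another.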
Related links
A Practical Introduction to the Internals of Kafka Storage | Medium