S0naliThakur (Member) commented Aug 6, 2025

PR Type

Enhancement


Description

  • Introduces localStateConsistencyReport diagnostic utility.

  • Adds debug endpoint for state consistency reporting.

  • Optionally prints state consistency report on process exit.

  • Updates configuration for state consistency reporting.


Changes walkthrough 📝

Relevant files

Enhancement

stateConsistency.ts: Add local state consistency diagnostic utility
src/debug/stateConsistency.ts (+294/-0)

  • Implements localStateConsistencyReport for comparing account state
    across cache, trie, and storage.
  • Provides detailed and summary reporting options.
  • Supports chunked processing and rate limiting.
  • Exports types for report options and results.

debug.ts: Add debug endpoint for state consistency report
src/debug/debug.ts (+25/-0)

  • Imports and exposes localStateConsistencyReport via new debug
    endpoint.
  • Allows configurable report parameters through query string.
  • Handles errors and returns JSON report.

index.ts: Optionally print state consistency report on exit
src/exit-handler/index.ts (+18/-3)

  • Makes exit log function async to support awaiting report.
  • Optionally generates and logs state consistency report on exit.
  • Handles and logs errors during report generation.

shardus-types.ts: Extend server configuration types for state consistency
src/shardus/shardus-types.ts (+2/-0)

  • Adds printStateConsistencyOnExit to server configuration interface.

Configuration changes

server.ts: Update server config for state consistency options
src/config/server.ts (+2/-1)

  • Sets recordAccountStates default to true.
  • Adds printStateConsistencyOnExit config flag.
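
The debug endpoint takes its report parameters from the query string, but the handler itself is not quoted in this conversation. As a rough illustration only, the sketch below shows one way such values could be validated before reaching localStateConsistencyReport. The option names match those read by the function in this PR; the `ReportOptions` interface and the `parseReportQuery` helper are hypothetical and not part of the change.

```typescript
// Option names mirror those read by localStateConsistencyReport in this PR.
// The real ConsistencyOptions type is not shown here, so this local copy is an assumption.
interface ReportOptions {
  recordsPerSecond?: number
  maxChunks?: number
  onlyMismatch?: boolean
  summaryOnly?: boolean
}

// Hypothetical helper: turn raw query-string values into validated options.
function parseReportQuery(query: Record<string, string | undefined>): ReportOptions {
  const opts: ReportOptions = {}

  const rps = Number(query.recordsPerSecond)
  // Reject NaN, zero and negatives so the pacing division stays well defined.
  if (Number.isFinite(rps) && rps > 0) opts.recordsPerSecond = rps

  const maxChunks = Number.parseInt(query.maxChunks ?? '', 10)
  // Clamp to the 1..256 range implied by the 256-chunk split.
  if (Number.isInteger(maxChunks)) opts.maxChunks = Math.min(256, Math.max(1, maxChunks))

  // Flags are only set when the parameter is present; anything but 'true' means false.
  if (query.onlyMismatch !== undefined) opts.onlyMismatch = query.onlyMismatch === 'true'
  if (query.summaryOnly !== undefined) opts.summaryOnly = query.summaryOnly === 'true'
  return opts
}
```

Leaving an option unset when the input is missing or invalid lets the report function fall back to its own defaults (for example 5000 records per second).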

S0naliThakur changed the title from "add localStateConsistencyReport - phase 1" to "SHARD-2719: add localStateConsistencyReport - phase 1" on Aug 6, 2025

    github-actions bot commented Aug 6, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🏅 Score: 91
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Performance Impact

    The localStateConsistencyReport function processes potentially large sets of account data and queries storage in batches. The rate-limiting and chunking logic should be reviewed for possible performance bottlenecks or unintended impact on system resources, especially if invoked during process exit.

    export async function localStateConsistencyReport(opts: ConsistencyOptions = {}): Promise<{
      summary: ConsistencySummary
      details: AccountConsistencyResult[]
    }> {
      const recordsPerSecond = opts.recordsPerSecond ?? 5000 // default fairly fast
      const reportStartTime = Date.now()
      const summary: ConsistencySummary = {
        totalAccounts: 0,
        matchingAll: 0,
        mismatching: 0,
        cthFalse: 0,
        cttFalse: 0,
        cshFalse: 0,
        cstFalse: 0,
        tshFalse: 0,
        tstFalse: 0,
        totalChunks: 0,
        chunksProcessed: 0,
        totalTimeMs: 0,
        averageTimePerChunk: 0,
        recordsPerSecondActual: 0,
      }
    
      const details: AccountConsistencyResult[] = []
    
      const stateManager = Context.stateManager
      if (!stateManager) {
        throw new Error('stateManager not initialised yet')
      }
    
      const cacheMapGlobal = stateManager.accountCache?.accountsHashCache3?.accountHashMap
      const trie = stateManager.accountPatcher
      const storage = Context.storage
      const crypto = Context.crypto
    
      // Debug: Test if storage has ANY account states at all
      if (storage && typeof (storage as any).queryAccountStateTable === 'function') {
        try {
          // eslint-disable-next-line @typescript-eslint/no-explicit-any
          const testQuery = await (storage as any).queryAccountStateTable('0', 'f', 0, Date.now(), 10)
          console.log(`Debug: Storage test query returned ${testQuery?.length || 0} total account states`)
        } catch (e) {
          console.log('Debug: Storage test query failed:', (e as Error).message)
        }
      }
    
      const allChunks = generate256Chunks()
      const maxChunks = opts.maxChunks || 256
      const chunks = allChunks.slice(0, maxChunks)
      summary.totalChunks = chunks.length
    
      for (let i = 0; i < chunks.length; i++) {
        const chunk = chunks[i]
        const chunkStartTime = Date.now()
    
        const cacheMap: Map<string, { hash: string; timestamp: number }> = new Map()
        const trieMap: Map<string, { hash: string }> = new Map()
        const storageMap: Map<string, { hash: string; timestamp: number }> = new Map()
    
        // 1) CACHE – iterate global map once per chunk (plain loop for perf)
        if (cacheMapGlobal) {
          for (const [accountId, history] of cacheMapGlobal) {
            if (accountId.startsWith(chunk.prefix)) {
              const latest = history.accountHashList?.[0]
              if (latest) {
                cacheMap.set(accountId, { hash: latest.h, timestamp: latest.t })
              }
            }
          }
        }
    
        // 2) TRIE – iterate leaf nodes whose radix share the prefix
        if (trie?.shardTrie?.layerMaps) {
          const leafDepth = trie.treeMaxDepth ?? 4
          const leafLayer = trie.shardTrie.layerMaps[leafDepth]
          if (leafLayer) {
            for (const [radix, node] of leafLayer) {
              if (!radix.startsWith(chunk.prefix)) continue
              // accountTempMap is preferred (contains most recent hashes)
              const acctMap = node.accountTempMap ?? null
              if (acctMap) {
                for (const [accId, acc] of acctMap) {
                  if (accId.startsWith(chunk.prefix)) {
                    trieMap.set(accId, { hash: acc.hash })
                  }
                }
              }
              // Fall back to static accounts array if present
              if (node.accounts) {
                for (const acc of node.accounts) {
                  // eslint-disable-next-line security/detect-object-injection
                  const accountId = acc.accountID as string
                  if (accountId.startsWith(chunk.prefix)) {
                    trieMap.set(accountId, { hash: acc.hash })
                  }
                }
              }
            }
          }
        }
    
        // Build union set and prepare for DB query
        const unionIds: string[] = Array.from(new Set<string>([...cacheMap.keys(), ...trieMap.keys()]))
    
        // 3) STORAGE – fetch newest state row per account in batches of 800 to stay below SQLite parameter limits
        const batchSize = 800
        let storageErrors = 0
        for (let i = 0; i < unionIds.length; i += batchSize) {
          const slice = unionIds.slice(i, i + batchSize)
          try {
            if (!storage) {
              throw new Error('Storage not initialized')
            }
            // storage.queryAccountStateTableByListNewest returns rows with accountId, txTimestamp, stateAfter
            // eslint-disable-next-line @typescript-eslint/no-explicit-any
            const rows: any[] = await (storage as any).queryAccountStateTableByListNewest(slice)
            if (rows && Array.isArray(rows)) {
              for (const row of rows) {
                if (row && row.stateAfter && row.accountId && row.txTimestamp) {
                  const rowHash = crypto.hash(row.stateAfter)
                  storageMap.set(row.accountId, { hash: rowHash, timestamp: Number(row.txTimestamp) })
                }
              }
            } else if (slice.length > 0) {
              // Log when we query for accounts but get no results - this helps debug the issue
              console.log(`Storage query returned no data for ${slice.length} accounts in chunk ${chunk.prefix}. Sample account: ${slice[0]?.substring(0, 8)}...`)
            }
          } catch (e) {
            storageErrors++
            const logger = Context.logger?.getLogger('stateConsistency')
            if (logger) {
              logger.warn(`Storage query failed for chunk ${chunk.prefix}, batch ${Math.floor(i/batchSize)}: ${(e as Error).message}`)
            }
            // Continue processing other batches
          }
        }
    
        // Log storage errors if any occurred
        if (storageErrors > 0) {
          const logger = Context.logger?.getLogger('stateConsistency')
          if (logger) {
            logger.warn(`Total storage query errors for chunk ${chunk.prefix}: ${storageErrors}`)
          }
        }
    
        // 4) Compare
        const allIds = new Set<string>([...unionIds, ...storageMap.keys()])
        for (const accountId of allIds) {
          const c = cacheMap.get(accountId)
          const t = trieMap.get(accountId)
          const s = storageMap.get(accountId)
    
          const res: AccountConsistencyResult = {
            accountId,
            cache: c,
            trie: t,
            storage: s,
            cth: c && t ? c.hash === t.hash : false,
            ctt: false, // trie doesn't have timestamp in v1
            csh: c && s ? c.hash === s.hash : false,
            cst: c && s ? c.timestamp === s.timestamp : false,
            tsh: t && s ? t.hash === s.hash : false,
            tst: false, // trie timestamp not tracked in v1
          }
    
          summary.totalAccounts++
    
          const allMatch = res.cth && res.csh && res.tsh && res.cst // check all hash comparisons plus cache-storage timestamp
    
          if (allMatch) {
            summary.matchingAll++
          } else {
            summary.mismatching++
            if (!res.cth) summary.cthFalse++
            if (!res.ctt) summary.cttFalse++
            if (!res.csh) summary.cshFalse++
            if (!res.cst) summary.cstFalse++
            if (!res.tsh) summary.tshFalse++
            if (!res.tst) summary.tstFalse++
          }
    
          // Add to details if requested
          if (!opts.onlyMismatch || !allMatch) {
            if (!opts.summaryOnly) details.push(res)
          }
        }
    
        // 5) Rate limiting and pacing
        const elapsed = Date.now() - chunkStartTime
        const recordsProcessed = allIds.size
        const targetTimeMs = recordsProcessed > 0 ? (recordsProcessed / recordsPerSecond) * 1000 : 0
        const waitFor = Math.max(10, targetTimeMs - elapsed)
    
        if (recordsProcessed > 0) {
          await sleep(waitFor)
        } else {
          await sleep(50) // minimum sleep when no records processed
        }
    
        summary.chunksProcessed++
      }
    
      // Calculate final statistics
      const reportEndTime = Date.now()
      summary.totalTimeMs = reportEndTime - reportStartTime
      summary.averageTimePerChunk = summary.chunksProcessed > 0 ? summary.totalTimeMs / summary.chunksProcessed : 0
      summary.recordsPerSecondActual = summary.totalAccounts > 0 && summary.totalTimeMs > 0 ? 
        (summary.totalAccounts / summary.totalTimeMs) * 1000 : 0
    
      return {
        summary,
        details: opts.summaryOnly ? [] : details,
      }
    }
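
To put a number on the pacing concern, the sketch below isolates the sleep arithmetic from step 5 of the quoted function and sums it over a run. It assumes generate256Chunks (not shown here) yields 256 chunks; `chunkDelayMs` and `minimumRunTimeMs` are illustrative names, not functions from the PR.

```typescript
// Mirrors the pacing arithmetic in step 5 of the quoted function.
function chunkDelayMs(recordsProcessed: number, recordsPerSecond: number, elapsedMs: number): number {
  if (recordsProcessed === 0) return 50 // fixed sleep for an empty chunk
  const targetTimeMs = (recordsProcessed / recordsPerSecond) * 1000
  return Math.max(10, targetTimeMs - elapsedMs) // never sleeps less than 10 ms
}

// Lower bound on wall-clock time for a run: each chunk takes
// elapsed + sleep = max(elapsed + 10, target) >= max(10, target),
// which is what chunkDelayMs returns when elapsedMs is 0.
function minimumRunTimeMs(recordsPerChunk: number[], recordsPerSecond = 5000): number {
  return recordsPerChunk.reduce((sum, n) => sum + chunkDelayMs(n, recordsPerSecond, 0), 0)
}

// A node with no accounts in any of the 256 chunks.
const emptyRun = minimumRunTimeMs(new Array<number>(256).fill(0))
// 256 chunks with 10 accounts each at the default rate.
const sparseRun = minimumRunTimeMs(new Array<number>(256).fill(10))
console.log({ emptyRun, sparseRun })
```

Under these assumptions an empty node still spends at least 12.8 s (256 × 50 ms) in the report, and a sparsely populated one at least 2.56 s (256 × 10 ms). That may matter when the report runs from the exit handler.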
    Exit Path Robustness

    The exit handler now optionally generates and logs a potentially large state consistency report on process exit. This could delay shutdown or cause issues if the report is slow or fails. Ensure this does not block or destabilize the exit process.

    async runExitLog(isCleanExit: boolean, exitType: string, msg: string) {
      this.exitLogger.fatal(`isCleanExit: ${isCleanExit}  exitType: ${exitType}  msg: ${msg}`)
      let log: string[] = []
      const fakeStream = {
        write: (data: string) => {
          log.push(data)
        },
      }
      const toMB = 1 / 1000000
      const report = process.memoryUsage()
    
      log.push(`System Memory Report.  Timestamp: ${Date.now()}\n`)
      log.push(`rss: ${(report.rss * toMB).toFixed(2)} MB\n`)
      log.push(`heapTotal: ${(report.heapTotal * toMB).toFixed(2)} MB\n`)
      log.push(`heapUsed: ${(report.heapUsed * toMB).toFixed(2)} MB\n`)
      log.push(`external: ${(report.external * toMB).toFixed(2)} MB\n`)
      log.push(`arrayBuffers: ${(report.arrayBuffers * toMB).toFixed(2)} MB\n\n\n`)
    
      this.memStats.gatherReport()
      this.memStats.reportToStream(this.memStats.report, fakeStream, 0)
      this.exitLogger.fatal(log.join(''))
    
      log = []
      profilerInstance.scopedProfileSectionStart('counts')
      const arrayReport = this.counters.arrayitizeAndSort(this.counters.eventCounters)
    
      this.counters.printArrayReport(arrayReport, fakeStream, 0)
      profilerInstance.scopedProfileSectionEnd('counts')
      this.exitLogger.fatal(log.join(''))
    
      // ----------- State consistency report (optional) ------------
      try {
        const cfg = Context.config?.debug
        if (cfg?.printStateConsistencyOnExit) {
          const { localStateConsistencyReport } = await import('../debug/stateConsistency')
          const rep = await localStateConsistencyReport({ summaryOnly: false, onlyMismatch: true })
          this.exitLogger.fatal('State consistency summary:\n' + JSON.stringify(rep.summary, null, 2))
          if (rep.details && rep.details.length > 0) {
            this.exitLogger.fatal('State consistency mismatches:\n' + JSON.stringify(rep.details, null, 2))
          }
        }
      } catch (e) {
        this.exitLogger.fatal('Error generating state consistency report: ' + (e as Error).message)
      }
    
      this.writeExitSummary(isCleanExit, exitType, msg)
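
One way to keep the report from holding up shutdown is to race it against a timer. The sketch below is a generic pattern, not code from this PR; `withTimeout` is a hypothetical helper and any time budget passed to it is a placeholder.

```typescript
// Hypothetical helper: resolve with the task's value, or with `fallback` once `ms` elapses.
// The underlying task is not cancelled; it is simply no longer awaited.
async function withTimeout<T>(task: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms)
  })
  try {
    // Whichever settles first wins; a rejection from `task` propagates to the caller.
    return await Promise.race([task, timeout])
  } finally {
    // Clear the timer so it cannot keep the process alive after the task finishes.
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

In the exit handler this could wrap the call as `await withTimeout(localStateConsistencyReport({ onlyMismatch: true }), budgetMs, null)`, logging a note and skipping the report when the result is `null`. The existing try/catch would still cover rejections.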

    S0naliThakur force-pushed the localStateConsistencyReport branch from 607506e to 227a1ac on August 7, 2025 13:32
    S0naliThakur marked this pull request as ready for review on August 7, 2025 13:36
    S0naliThakur self-assigned this on Aug 7, 2025