Skip to content

[DLCov] Origin-Tracking: Enable collecting and symbolizing stack traces #143591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

SLTozer
Copy link
Contributor

@SLTozer SLTozer commented Jun 10, 2025

This patch is part of a series that adds origin-tracking to the debugify source location coverage checks, allowing us to report symbolized stack traces of the point where missing source locations appear.

This patch adds a pair of new functions in signals.h that can be used to collect and symbolize stack traces respectively. This has major implementation overlap with the existing stack trace collection/symbolizing methods, but the existing functions are specialized for dumping a stack trace to stderr when LLVM crashes, while these new functions are meant to be called repeatedly during the execution of the program, and therefore we need a separate set of functions.

NB: These functions are only present when origin-tracking is enabled in CMake. Carrying over a conversation from when this was last put up for review, where @jryans requested that the functions be made available unconditionally, I personally believe that since the functions are doing "unusual" things it would be better to avoid compiling them in until we have a confirmed second use-case; I could easily be swayed from this if there's a good argument that another category of users exists, or that in principle there's not a good reason to hide the functions.

Copy link
Contributor Author

SLTozer commented Jun 10, 2025

@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-symbolize branch from d10a102 to b2ecf5e Compare June 10, 2025 20:13
@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-cmake branch from c9f3a10 to 007f528 Compare June 10, 2025 20:13
@SLTozer SLTozer changed the title [DLCov] Origin-Tracking: SymbolizeAddresses [DLCov] Origin-Tracking: Collect symbolized stack traces Jun 10, 2025
@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-symbolize branch from b2ecf5e to fe689dd Compare June 11, 2025 08:51
@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-cmake branch from 007f528 to 2bcb596 Compare June 11, 2025 08:51
Copy link

github-actions bot commented Jun 11, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions inc,h,cpp -- llvm/include/llvm/Support/Signals.h llvm/lib/Support/Signals.cpp llvm/lib/Support/Unix/Signals.inc llvm/lib/Support/Windows/Signals.inc
View the diff from clang-format here.
diff --git a/llvm/include/llvm/Support/Signals.h b/llvm/include/llvm/Support/Signals.h
index 18df195ba..5969a0a6b 100644
--- a/llvm/include/llvm/Support/Signals.h
+++ b/llvm/include/llvm/Support/Signals.h
@@ -20,9 +20,9 @@
 #include <string>
 
 #if LLVM_ENABLE_DEBUGLOC_TRACKING_ORIGIN
-#include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SmallVector.h"
 namespace llvm {
 // Typedefs that are convenient but only used by the stack-trace-collection code
 // added if DebugLoc origin-tracking is enabled.
@@ -30,7 +30,7 @@ using AddressSet = DenseSet<void *, DenseMapInfo<void *, void>>;
 using SymbolizedAddressMap =
     DenseMap<void *, SmallVector<std::string, 0>, DenseMapInfo<void *, void>,
              detail::DenseMapPair<void *, SmallVector<std::string, 0>>>;
-}
+} // namespace llvm
 #endif
 
 namespace llvm {
diff --git a/llvm/lib/Support/Signals.cpp b/llvm/lib/Support/Signals.cpp
index a9c61f234..ea0018fd3 100644
--- a/llvm/lib/Support/Signals.cpp
+++ b/llvm/lib/Support/Signals.cpp
@@ -155,8 +155,8 @@ collectAddressSymbols(void **AddressList, unsigned AddressCount,
   StringSaver StrPool(Allocator);
   SmallVector<const char *, 0> Modules(AddressCount, nullptr);
   SmallVector<intptr_t, 0> Offsets(AddressCount, 0);
-  if (!findModulesAndOffsets(AddressList, AddressCount, Modules.data(), Offsets.data(),
-                             MainExecutableName, StrPool))
+  if (!findModulesAndOffsets(AddressList, AddressCount, Modules.data(),
+                             Offsets.data(), MainExecutableName, StrPool))
     return {};
   int InputFD;
   SmallString<32> InputFile, OutputFile;
@@ -169,7 +169,7 @@ collectAddressSymbols(void **AddressList, unsigned AddressCount,
     raw_fd_ostream Input(InputFD, true);
     for (unsigned AddrIdx = 0; AddrIdx < AddressCount; AddrIdx++) {
       if (Modules[AddrIdx])
-        Input << Modules[AddrIdx] << " " << (void*)Offsets[AddrIdx] << "\n";
+        Input << Modules[AddrIdx] << " " << (void *)Offsets[AddrIdx] << "\n";
     }
   }
 
@@ -188,7 +188,6 @@ collectAddressSymbols(void **AddressList, unsigned AddressCount,
   if (RunResult != 0)
     return {};
 
-
   SmallVector<std::pair<unsigned, std::string>, 0> Result;
   auto OutputBuf = MemoryBuffer::getFile(OutputFile.c_str());
   if (!OutputBuf)
@@ -208,8 +207,7 @@ collectAddressSymbols(void **AddressList, unsigned AddressCount,
   // is known or the module+address offset otherwise.
   for (unsigned AddrIdx = 0; AddrIdx < AddressCount; AddrIdx++) {
     if (!Modules[AddrIdx]) {
-      auto &SymbolizedFrame =
-        Result.emplace_back(std::make_pair(AddrIdx, ""));
+      auto &SymbolizedFrame = Result.emplace_back(std::make_pair(AddrIdx, ""));
       raw_string_ostream OS(SymbolizedFrame.second);
       OS << format_ptr(AddressList[AddrIdx]);
       continue;
@@ -222,8 +220,7 @@ collectAddressSymbols(void **AddressList, unsigned AddressCount,
       StringRef FunctionName = *CurLine++;
       if (FunctionName.empty())
         break;
-      auto &SymbolizedFrame =
-        Result.emplace_back(std::make_pair(AddrIdx, ""));
+      auto &SymbolizedFrame = Result.emplace_back(std::make_pair(AddrIdx, ""));
       raw_string_ostream OS(SymbolizedFrame.second);
       OS << format_ptr(AddressList[AddrIdx]) << ' ';
       if (!FunctionName.starts_with("??"))
@@ -234,7 +231,8 @@ collectAddressSymbols(void **AddressList, unsigned AddressCount,
       if (!FileLineInfo.starts_with("??"))
         OS << FileLineInfo;
       else
-        OS << "(" << Modules[AddrIdx] << '+' << format_hex(Offsets[AddrIdx], 0) << ")";
+        OS << "(" << Modules[AddrIdx] << '+' << format_hex(Offsets[AddrIdx], 0)
+           << ")";
     }
   }
   return Result;
@@ -247,7 +245,8 @@ ErrorOr<std::string> getLLVMSymbolizerPath(StringRef Argv0 = {}) {
   } else if (!Argv0.empty()) {
     StringRef Parent = llvm::sys::path::parent_path(Argv0);
     if (!Parent.empty())
-      LLVMSymbolizerPathOrErr = sys::findProgramByName("llvm-symbolizer", Parent);
+      LLVMSymbolizerPathOrErr =
+          sys::findProgramByName("llvm-symbolizer", Parent);
   }
   if (!LLVMSymbolizerPathOrErr)
     LLVMSymbolizerPathOrErr = sys::findProgramByName("llvm-symbolizer");
@@ -265,10 +264,10 @@ static bool printSymbolizedStackTrace(StringRef Argv0, void **StackTrace,
   if (Argv0.contains("llvm-symbolizer"))
     return false;
 
-  // FIXME: Subtract necessary number from StackTrace entries to turn return addresses
-  // into actual instruction addresses.
-  // Use llvm-symbolizer tool to symbolize the stack traces. First look for it
-  // alongside our binary, then in $PATH.
+  // FIXME: Subtract necessary number from StackTrace entries to turn return
+  // addresses into actual instruction addresses. Use llvm-symbolizer tool to
+  // symbolize the stack traces. First look for it alongside our binary, then in
+  // $PATH.
   ErrorOr<std::string> LLVMSymbolizerPathOrErr = getLLVMSymbolizerPath(Argv0);
   if (!LLVMSymbolizerPathOrErr)
     return false;
@@ -281,7 +280,7 @@ static bool printSymbolizedStackTrace(StringRef Argv0, void **StackTrace,
                              : sys::fs::getMainExecutable(nullptr, nullptr);
 
   auto SymbolizedAddressesOpt = collectAddressSymbols(
-    StackTrace, Depth, MainExecutableName.c_str(), LLVMSymbolizerPath);
+      StackTrace, Depth, MainExecutableName.c_str(), LLVMSymbolizerPath);
   if (!SymbolizedAddressesOpt)
     return false;
   for (unsigned FrameNo = 0; FrameNo < SymbolizedAddressesOpt->size();
@@ -315,13 +314,14 @@ void sys::symbolizeAddresses(AddressSet &Addresses,
   // here.
   std::string MainExecutableName = sys::fs::getMainExecutable(nullptr, nullptr);
 
-  auto SymbolizedAddressesOpt = collectAddressSymbols(
-    AddressList.begin(), AddressList.size(),
-    MainExecutableName.c_str(), LLVMSymbolizerPath);
+  auto SymbolizedAddressesOpt =
+      collectAddressSymbols(AddressList.begin(), AddressList.size(),
+                            MainExecutableName.c_str(), LLVMSymbolizerPath);
   if (!SymbolizedAddressesOpt)
     return;
   for (auto SymbolizedFrame : *SymbolizedAddressesOpt) {
-    SmallVector<std::string, 0> &SymbolizedAddrs = SymbolizedAddresses[AddressList[SymbolizedFrame.first]];
+    SmallVector<std::string, 0> &SymbolizedAddrs =
+        SymbolizedAddresses[AddressList[SymbolizedFrame.first]];
     SymbolizedAddrs.push_back(SymbolizedFrame.second);
   }
   return;

Copy link

github-actions bot commented Jun 11, 2025

✅ With the latest revision this PR passed the undef deprecator.

@SLTozer SLTozer changed the title [DLCov] Origin-Tracking: Collect symbolized stack traces [DLCov] Origin-Tracking: Enable collecting and symbolizing stack traces Jun 11, 2025
@SLTozer SLTozer self-assigned this Jun 11, 2025
@SLTozer SLTozer added the debuginfo label Jun 11, 2025 — with Graphite App
@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-platform-windows
@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-debuginfo

Author: Stephen Tozer (SLTozer)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/143591.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Support/Signals.h (+40)
  • (modified) llvm/lib/Support/Signals.cpp (+116)
  • (modified) llvm/lib/Support/Unix/Signals.inc (+15)
  • (modified) llvm/lib/Support/Windows/Signals.inc (+5)
diff --git a/llvm/include/llvm/Support/Signals.h b/llvm/include/llvm/Support/Signals.h
index 6ce26acdd458e..a6f99d8bbdc95 100644
--- a/llvm/include/llvm/Support/Signals.h
+++ b/llvm/include/llvm/Support/Signals.h
@@ -14,7 +14,9 @@
 #ifndef LLVM_SUPPORT_SIGNALS_H
 #define LLVM_SUPPORT_SIGNALS_H
 
+#include "llvm/Config/llvm-config.h"
 #include "llvm/Support/Compiler.h"
+#include <array>
 #include <cstdint>
 #include <string>
 
@@ -22,6 +24,22 @@ namespace llvm {
 class StringRef;
 class raw_ostream;
 
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+// Typedefs that are convenient but only used by the stack-trace-collection code
+// added if DebugLoc origin-tracking is enabled.
+template <typename T, typename Enable> struct DenseMapInfo;
+template <typename ValueT, typename ValueInfoT> class DenseSet;
+namespace detail {
+template <typename KeyT, typename ValueT> struct DenseMapPair;
+}
+template <typename KeyT, typename ValueT, typename KeyInfoT, typename BucketT>
+class DenseMap;
+using AddressSet = DenseSet<void *, DenseMapInfo<void *, void>>;
+using SymbolizedAddressMap =
+    DenseMap<void *, std::string, DenseMapInfo<void *, void>,
+             detail::DenseMapPair<void *, std::string>>;
+#endif
+
 namespace sys {
 
 /// This function runs all the registered interrupt handlers, including the
@@ -57,6 +75,28 @@ LLVM_ABI void DisableSystemDialogsOnCrash();
 ///        specified, the entire frame is printed.
 LLVM_ABI void PrintStackTrace(raw_ostream &OS, int Depth = 0);
 
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+#ifdef NDEBUG
+#error DebugLoc origin-tracking should not be enabled in Release builds.
+#endif
+/// Populates the given array with a stack trace of the current program, up to
+/// MaxDepth frames. Returns the number of frames returned, which will be
+/// inserted into \p StackTrace from index 0. All entries after the returned
+/// depth will be unmodified. NB: This is only intended to be used for
+/// introspection of LLVM by Debugify, will not be enabled in release builds,
+/// and should not be relied on for other purposes.
+template <unsigned long MaxDepth>
+int getStackTrace(std::array<void *, MaxDepth> &StackTrace);
+
+/// Takes a set of \p Addresses, symbolizes them and stores the result in the
+/// provided \p SymbolizedAddresses map.
+/// NB: This is only intended to be used for introspection of LLVM by
+/// Debugify, will not be enabled in release builds, and should not be relied
+/// on for other purposes.
+void symbolizeAddresses(AddressSet &Addresses,
+                        SymbolizedAddressMap &SymbolizedAddresses);
+#endif
+
 // Run all registered signal handlers.
 LLVM_ABI void RunSignalHandlers();
 
diff --git a/llvm/lib/Support/Signals.cpp b/llvm/lib/Support/Signals.cpp
index 9f9030e79d104..50b0d6e78ddd1 100644
--- a/llvm/lib/Support/Signals.cpp
+++ b/llvm/lib/Support/Signals.cpp
@@ -253,6 +253,122 @@ static bool printSymbolizedStackTrace(StringRef Argv0, void **StackTrace,
   return true;
 }
 
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+void sys::symbolizeAddresses(AddressSet &Addresses,
+                             SymbolizedAddressMap &SymbolizedAddresses) {
+  assert(!DisableSymbolicationFlag && !getenv(DisableSymbolizationEnv) &&
+         "Debugify origin stacktraces require symbolization to be enabled.");
+
+  // Convert Set of Addresses to ordered list.
+  SmallVector<void *, 0> AddressList(Addresses.begin(), Addresses.end());
+  if (AddressList.empty())
+    return;
+  int NumAddresses = AddressList.size();
+  llvm::sort(AddressList);
+
+  // Use llvm-symbolizer tool to symbolize the stack traces. First look for it
+  // alongside our binary, then in $PATH.
+  ErrorOr<std::string> LLVMSymbolizerPathOrErr = std::error_code();
+  if (const char *Path = getenv(LLVMSymbolizerPathEnv)) {
+    LLVMSymbolizerPathOrErr = sys::findProgramByName(Path);
+  }
+  if (!LLVMSymbolizerPathOrErr)
+    LLVMSymbolizerPathOrErr = sys::findProgramByName("llvm-symbolizer");
+  assert(!!LLVMSymbolizerPathOrErr &&
+         "Debugify origin stacktraces require llvm-symbolizer.");
+  const std::string &LLVMSymbolizerPath = *LLVMSymbolizerPathOrErr;
+
+  // Try to guess the main executable name, since we don't have argv0 available
+  // here.
+  std::string MainExecutableName = sys::fs::getMainExecutable(nullptr, nullptr);
+
+  BumpPtrAllocator Allocator;
+  StringSaver StrPool(Allocator);
+  std::vector<const char *> Modules(NumAddresses, nullptr);
+  std::vector<intptr_t> Offsets(NumAddresses, 0);
+  if (!findModulesAndOffsets(AddressList.data(), NumAddresses, Modules.data(),
+                             Offsets.data(), MainExecutableName.c_str(),
+                             StrPool))
+    return;
+  int InputFD;
+  SmallString<32> InputFile, OutputFile;
+  sys::fs::createTemporaryFile("symbolizer-input", "", InputFD, InputFile);
+  sys::fs::createTemporaryFile("symbolizer-output", "", OutputFile);
+  FileRemover InputRemover(InputFile.c_str());
+  FileRemover OutputRemover(OutputFile.c_str());
+
+  {
+    raw_fd_ostream Input(InputFD, true);
+    for (int i = 0; i < NumAddresses; i++) {
+      if (Modules[i])
+        Input << Modules[i] << " " << (void *)Offsets[i] << "\n";
+    }
+  }
+
+  std::optional<StringRef> Redirects[] = {InputFile.str(), OutputFile.str(),
+                                          StringRef("")};
+  StringRef Args[] = {"llvm-symbolizer", "--functions=linkage", "--inlining",
+#ifdef _WIN32
+                      // Pass --relative-address on Windows so that we don't
+                      // have to add ImageBase from PE file.
+                      // FIXME: Make this the default for llvm-symbolizer.
+                      "--relative-address",
+#endif
+                      "--demangle"};
+  int RunResult =
+      sys::ExecuteAndWait(LLVMSymbolizerPath, Args, std::nullopt, Redirects);
+  if (RunResult != 0)
+    return;
+
+  // This report format is based on the sanitizer stack trace printer.  See
+  // sanitizer_stacktrace_printer.cc in compiler-rt.
+  auto OutputBuf = MemoryBuffer::getFile(OutputFile.c_str());
+  if (!OutputBuf)
+    return;
+  StringRef Output = OutputBuf.get()->getBuffer();
+  SmallVector<StringRef, 32> Lines;
+  Output.split(Lines, "\n");
+  auto CurLine = Lines.begin();
+  for (int i = 0; i < NumAddresses; i++) {
+    assert(!SymbolizedAddresses.contains(AddressList[i]));
+    std::string &SymbolizedAddr = SymbolizedAddresses[AddressList[i]];
+    raw_string_ostream OS(SymbolizedAddr);
+    if (!Modules[i]) {
+      OS << format_ptr(AddressList[i]) << '\n';
+      continue;
+    }
+    // Read pairs of lines (function name and file/line info) until we
+    // encounter empty line.
+    for (bool IsFirst = true;; IsFirst = false) {
+      if (CurLine == Lines.end())
+        return;
+      StringRef FunctionName = *CurLine++;
+      if (FunctionName.empty())
+        break;
+      // Add indentation for lines after the first; we use 3 spaces, because
+      // currently that aligns with the expected indentation that will be added
+      // to the first line by Debugify.
+      if (!IsFirst)
+        OS << "   ";
+      OS << format_ptr(AddressList[i]) << ' ';
+      if (!FunctionName.starts_with("??"))
+        OS << FunctionName << ' ';
+      if (CurLine == Lines.end()) {
+        OS << '\n';
+        return;
+      }
+      StringRef FileLineInfo = *CurLine++;
+      if (!FileLineInfo.starts_with("??"))
+        OS << FileLineInfo;
+      else
+        OS << "(" << Modules[i] << '+' << format_hex(Offsets[i], 0) << ")";
+      OS << '\n';
+    }
+  }
+  return;
+}
+#endif
+
 static bool printMarkupContext(raw_ostream &OS, const char *MainExecutableName);
 
 LLVM_ATTRIBUTE_USED
diff --git a/llvm/lib/Support/Unix/Signals.inc b/llvm/lib/Support/Unix/Signals.inc
index 6668a2953b3b2..70b2cf7c756a9 100644
--- a/llvm/lib/Support/Unix/Signals.inc
+++ b/llvm/lib/Support/Unix/Signals.inc
@@ -507,6 +507,21 @@ static int dl_iterate_phdr_cb(dl_phdr_info *info, size_t size, void *arg) {
   return 0;
 }
 
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+#if !defined(HAVE_BACKTRACE)
+#error DebugLoc origin-tracking currently requires `backtrace()`.
+#endif
+namespace llvm {
+namespace sys {
+template <unsigned long MaxDepth>
+int getStackTrace(std::array<void *, MaxDepth> &StackTrace) {
+  return backtrace(StackTrace.data(), MaxDepth);
+}
+template int getStackTrace<16ul>(std::array<void *, 16ul> &);
+} // namespace sys
+} // namespace llvm
+#endif
+
 /// If this is an ELF platform, we can find all loaded modules and their virtual
 /// addresses with dl_iterate_phdr.
 static bool findModulesAndOffsets(void **StackTrace, int Depth,
diff --git a/llvm/lib/Support/Windows/Signals.inc b/llvm/lib/Support/Windows/Signals.inc
index f11ad09f37139..0fe6967b291da 100644
--- a/llvm/lib/Support/Windows/Signals.inc
+++ b/llvm/lib/Support/Windows/Signals.inc
@@ -9,6 +9,7 @@
 // This file provides the Win32 specific implementation of the Signals class.
 //
 //===----------------------------------------------------------------------===//
+#include "llvm/Config/llvm-config.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/ExitCodes.h"
 #include "llvm/Support/FileSystem.h"
@@ -542,6 +543,10 @@ void sys::PrintStackTraceOnErrorSignal(StringRef Argv0,
 extern "C" VOID WINAPI RtlCaptureContext(PCONTEXT ContextRecord);
 #endif
 
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+#error DebugLoc origin-tracking currently unimplemented for Windows.
+#endif
+
 static void LocalPrintStackTrace(raw_ostream &OS, PCONTEXT C) {
   STACKFRAME64 StackFrame{};
   CONTEXT Context{};

@SLTozer SLTozer marked this pull request as ready for review June 11, 2025 10:53
@jryans
Copy link
Member

jryans commented Jun 11, 2025

This patch adds a pair of new functions in signals.h that can be used to collect and symbolize stack traces respectively. This has major implementation overlap with the existing stack trace collection/symbolizing methods, but the existing functions are specialized for dumping a stack trace to stderr when LLVM crashes, while these new functions are meant to be called repeatedly during the execution of the program, and therefore we need a separate set of functions.

This is the main point I was questioning last time, and I am still not sure I follow your thinking here. There's quite a lot of duplicated code, and I believe it would be best for future maintenance if we could reduce that duplication as much as possible.

It looks like your new function builds up some data in memory, rather than writing it all to output. Perhaps printSymbolizedStackTrace could be reimplemented to call your new symbolizeAddresses, thus reducing duplication significantly?

Perhaps I've missed some crucial reason why that's hard to do here...?

I definitely want to see this work move forward in some form. I'm just trying to reduce future maintenance work where possible.

NB: These functions are only present when origin-tracking is enabled in CMake. Carrying over a conversation from when this was last put up for review, where @jryans requested that the functions be made available unconditionally, I personally believe that since the functions are doing "unusual" things it would be better to avoid compiling them in until we have a confirmed second use-case; I could easily be swayed from this if there's a good argument that another category of users exists, or that in principle there's not a good reason to hide the functions.

Making the new function available unconditionally is less important to me than reducing the code duplication discussed above. It still do believe other users will likely emerge (as I can imagine wanting to use this for some of my own projects), but if you'd rather hide it for now, that's okay with me. (A second use case can be the one to remove the compile guards, for example.)

@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-cmake branch from 2bcb596 to f631e37 Compare June 11, 2025 11:32
@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-symbolize branch 2 times, most recently from 12f5a10 to f47991b Compare June 11, 2025 12:13
@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-cmake branch from f631e37 to 5713e04 Compare June 11, 2025 12:13
Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knarly but necessary I suppose. If there is indeed a way of reducing duplication that'd be fab.

A bigger question though is whether this can be tested. That would have the benefit of keeping this working across a range of platforms, and then the functionality wouldn't be behind a flag either. However it's also very deep -- is it feasible for this to be tested at all?

Comment on lines +511 to +513
#if !defined(HAVE_BACKTRACE)
#error DebugLoc origin-tracking currently requires `backtrace()`.
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this instead be a cmake check? (I've no idea how to achieve that, it just might be more convenient to developers to fail-early-fail-hard).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, I'll look into it - though if possible I'd prefer to replace the use of backtrace() with in-program stack traversal, which would require -fno-omit-frame-pointer (I'm not sure how you'd check that in CMake either) but would be significantly faster than making the lib call, so if it turns out to be non-trivial (e.g. if there's some ordering issue in the CMake config) then it may not be worth it.

Comment on lines 277 to 278
assert(!!LLVMSymbolizerPathOrErr &&
"Debugify origin stacktraces require llvm-symbolizer.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this shouldn't be an assertion, something like report_fatal_error instead? While the pre-requisites of this code is that it's not compiled under NDEBUG, if that ever changes then this error-path with stop being handled. It'd be better to not rely on assert for the reporting.

Comment on lines 311 to 316
#ifdef _WIN32
// Pass --relative-address on Windows so that we don't
// have to add ImageBase from PE file.
// FIXME: Make this the default for llvm-symbolizer.
"--relative-address",
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backtrace fetcher is not supported on Windows, so does this really need to consider Windows here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need it for now, but since this is copied from the existing invocation of the symbolizer (and I'm currently looking at extracting this and a few other parts out), it's more effort to remove support for Windows than to keep it; I do intend to add Windows support later on, it's just trickier than using backtrace() and not an urgent feature.

}
// Read pairs of lines (function name and file/line info) until we
// encounter empty line.
for (bool IsFirst = true;; IsFirst = false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for-inside-for, makes this quadratic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't exactly say so - we're iterating over the call stack, which means iterating over the real stack frames, and for each real stack frame iterating over all the inlined calls at that frame's PC, so I would say that we're iterating linearly over the set of real+inlined frames.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; please to add comments to that effect

@SLTozer
Copy link
Contributor Author

SLTozer commented Jun 12, 2025

A bigger question though is whether this can be tested.

That's a good question - in theory yes, but it sounds brittle. Assuming we're testing the output of backtrace/symbolize, we'd have a test that could fail if any of a variety of function names changed. Another argument against testing is that this code exists to generate formatted, human readable output - most of the complicated parts are in llvm-symbolizer and libbacktrace, while these functions are essentially just invoking them and formatting the output.

All the same, testing is generally good; I'll consider how would be best to test this, but I will also happily take suggestions if you/others have any.

@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-symbolize branch from f47991b to 622d1fb Compare June 12, 2025 15:21
Base automatically changed from users/SLTozer/dl-coverage-origin-cmake to main June 13, 2025 11:54
Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any good ideas for making this testable -- it's fundamentally introspection on the innards of LLVM/clang, which then makes it heavily dependent on how LLVM is built. And that then means it'll have to be UNSUPPORTED in various configurations/scenarios, making coverage pretty random.

I suppose we can state: all the source-location monitoring we're planning is based on this, and it'll be covered by a public bot, so it's become obvious if this doesn't work.

}
// Read pairs of lines (function name and file/line info) until we
// encounter empty line.
for (bool IsFirst = true;; IsFirst = false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; please to add comments to that effect

@SLTozer SLTozer force-pushed the users/SLTozer/dl-coverage-origin-symbolize branch from 622d1fb to 46c1770 Compare June 20, 2025 15:04
@SLTozer
Copy link
Contributor Author

SLTozer commented Jun 20, 2025

Updated this to try and de-duplicate, does it look better like this? The git diff doesn't look especially clean, but hopefully it's clear from the new sizes of the public functions (for crash dumps and debugify respectively) that most of the logic is now shared. Still not sure about testing, but there is one lit test that tests crash dump output, and from manually crashing the compiler a few times it looks to output exactly the same as before.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tentative LGTM, CC @jryans on the topic of the refactor / de-duplication

Copy link
Member

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this refactoring, looks good to me! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants