x/tools/cmd/heapdump: create a heap dump viewer #16410

Closed
matloob opened this issue Jul 18, 2016 · 31 comments

Comments

@matloob
Contributor

matloob commented Jul 18, 2016

@alandonovan @randall77 @aclements

I'd like to propose a new tool: a viewer for go heap dumps.

runtime/debug.WriteHeapDump already provides a mechanism for writing heap dumps, but we don't provide a tool to inspect the heap dumps. We should provide a graphical tool for inspecting and analyzing heap dumps similar to those that exist for Java heap dumps.

The tool would fit best in the tools subrepo or the core repo.

@fjl

fjl commented Jul 18, 2016

A viewer for Go 1.4 dumps is available here: https://github.com/randall77/heapdump14
I have successfully used it to debug my application (compiled with Go 1.6) but some hacking was required to make it work.

The challenge for this tool is that the heapdump format doesn't contain complete type information. There is a lot of guesswork involved to figure out the types of all objects. In my case it failed to recognize most of them and I had to add an additional heuristic that tries to identify types by matching their GC signature.
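A rough sketch of what a signature-matching heuristic of this kind could look like. This is not the code used with heapdump14. The `gcSignature` type, its string-encoded pointer mask, and the type names are all assumptions made for illustration.

```go
package main

import "fmt"

// gcSignature is a hypothetical stand-in for what the collector knows about
// an object: its size in bytes and which of its words hold pointers.
type gcSignature struct {
	size    uintptr
	ptrmask string // one character per word: '1' = pointer, '0' = scalar
}

// buildIndex groups candidate type names by their GC signature.
func buildIndex(types map[string]gcSignature) map[gcSignature][]string {
	idx := make(map[gcSignature][]string)
	for name, sig := range types {
		idx[sig] = append(idx[sig], name)
	}
	return idx
}

// guessType returns a type name only when exactly one candidate type has the
// same signature as the object; ambiguous or unknown signatures are rejected.
func guessType(idx map[gcSignature][]string, obj gcSignature) (string, bool) {
	candidates := idx[obj]
	if len(candidates) != 1 {
		return "", false
	}
	return candidates[0], true
}

func main() {
	idx := buildIndex(map[string]gcSignature{
		"main.node":   {24, "110"},
		"main.header": {16, "10"},
		"main.pair":   {16, "10"}, // same signature as main.header
	})
	fmt.Println(guessType(idx, gcSignature{24, "110"}))
	fmt.Println(guessType(idx, gcSignature{16, "10"}))
}
```

The second lookup shows the limit of the approach: two types with the same size and pointer layout cannot be told apart by signature alone.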

I would love to see a better tool; heap dumps are invaluable when it comes to finding memory leaks.

@randall77 works on Go and can probably say more about this.

@aclements
Member

@fjl, I think the plan is to start with heapdump14 (or at least salvage as much as possible from it). Broadly, this may also involve making the heap dump format more friendly to analysis tools, so they don't have to do as much guesswork.

@randall77
Contributor

Yes, the 1.3 viewer was awesome because we had full type info for every object in the heap. 1.4 lost that info and the 1.4 viewer tries (badly) to reconstruct type info as much as it can from the DWARF info for the roots. We'd want to do a better job of that if possible.

What's the plan for heap dump format? We talked a while ago about changing to use a standard core dump with some breadcrumbs to encode the heap metadata. Any progress on that?

@aclements
Member

@matloob, it would be great if the heap reader part of the tool lived in a separate package that can be used to build tools other than the viewer. There have been several times when I've wanted to do one-off heap analyses (usually for debugging the GC), but the tooling simply wasn't feasible. Such a package wouldn't necessarily have to satisfy Go 1 compatibility.

@fjl

fjl commented Jul 18, 2016

For my use cases, it would be perfectly acceptable if precise heapdump analysis required a GODEBUG flag (e.g. to enable tracking of interface types) or automated source rewrites like cmd/cover does. That would probably simplify the analysis code a lot.

@matloob
Contributor Author

matloob commented Jul 18, 2016

@fjl Yes, the goal of this proposal is to have an up-to-date heap viewer. The end product will look quite different from the current viewer, but we'll reuse as much of it as we can.

@aclements Yes, I'll try to keep as much of the tool in separate packages that can be used to read or analyze a heap.

@randall77 Yes, I think it would be good to use a standard core dump + breadcrumbs, but I haven't looked very deeply yet.

@matloob
Contributor Author

matloob commented Jul 20, 2016

Is it okay if I start working on a prototype in the tools subrepo? I'm thinking of putting things into the 'cmd/heapview' subdirectory.

@bradfitz
Contributor

SGTM.

@bradfitz bradfitz added this to the Unreleased milestone Jul 20, 2016
@bradfitz bradfitz changed the title from "proposal: heap dump viewer" to "x/tools/cmd/heapdump: create a heap dump viewer" Jul 20, 2016
@matloob
Contributor Author

matloob commented Jul 20, 2016

On second thought I think 'goheap' might be a better name for the command, to distinguish from other heap related commands on a system.

@matloob matloob changed the title from "x/tools/cmd/heapdump: create a heap dump viewer" to "x/tools/cmd/goheap: create a heap dump viewer" Jul 20, 2016
@bradfitz
Contributor

I don't like prefacing everything with "go".

to distinguish from other heap related commands on a system.

I don't have any other heap-related commands on my system. :-)

@matloob matloob changed the title from "x/tools/cmd/goheap: create a heap dump viewer" to "x/tools/cmd/heapdump: create a heap dump viewer" Jul 20, 2016
@matloob
Contributor Author

matloob commented Jul 20, 2016

heapdump it is! we can always change the name later

gopherbot pushed a commit to golang/tools that referenced this issue Jul 20, 2016
This change creates a place where we can start building
the 'heapdump' heap viewer and analyzer

Updates golang/go#16410

Change-Id: I216e13f1ceb6790bf492cfc8cbcc4f19f12b0b9e
Reviewed-on: https://go-review.googlesource.com/25085
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@aclements
Member

heapview? "heapdump" says to me that it dumps the heap, not that it analyzes heap dumps. (But my opinion on this is not very strong. :)

@bradfitz
Contributor

heapview is also fine with me.

@matloob
Contributor Author

matloob commented Jul 20, 2016

My opinion isn't very strong either, but one advantage of heapdump is that it can name a more general tool.

So you might use "heapdump view" to start the viewer, "heapdump grab" to grab a heap dump, or "heapdump stats" to get some stats on a heap dump?

@aclements
Member

I would lean toward making those separate tools instead of using subcommands. In particular, if there's a library that lets people write other tools to process heap dumps, making these separate commands puts those other tools on the same footing, and users don't have to remember what is a subcommand of the "official" heapdump command and what isn't.

@matloob
Contributor Author

matloob commented Jul 20, 2016

ok, looks like everyone's ok with heapview, so unless there are any objections, i'll start using that name?

@gopherbot

CL https://golang.org/cl/25101 mentions this issue.

gopherbot pushed a commit to golang/tools that referenced this issue Jul 20, 2016
Updates golang/go#16410

Change-Id: I0133971f9a1149da6d88180fc2e378003f189cc8
Reviewed-on: https://go-review.googlesource.com/25101
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

CL https://golang.org/cl/25240 mentions this issue.

gopherbot pushed a commit to golang/tools that referenced this issue Jul 26, 2016
This change primarily exists to import TypeScript and the
ES6 module loader polyfill as dependencies for this project.
Both dependencies are relatively lightweight and can be easily
removed if we decide we don't need them.

The module loader polyfill implements support for an upcoming
browser feature in ES6 (the next version of JavaScript). This
feature helps modularize JavaScript code and conveniently split it
into multiple files. It should be supported by the stable versions
of the four major browsers (Chrome, Firefox, Safari and Edge)
by the end of the year. Once that happens, we can remove the polyfill.

The TypeScript compiler provides two things: First, it compiles
new, but not-yet-supported ES6 JavaScript features into ES5. It
also provides a typechecker similar to what Closure does, but types
are indicated in syntax rather than JSDoc comments. If we decide
we don't want this dependency, we can compile the TypeScript code
into human-readable JavaScript code. (The compiler basically
strips out types and replaces ES6 language features with more
well-supported JavaScript equivalents). The TypeScript compiler
is not required for development. typescript.js and a feature in
the module loader will be used to compile TypeScript into JavaScript
at serving time. (We might want to do something different for the
production version, but we can get to that later).

The change also adds code to serve the HTML and JavaScript files.

Updates golang/go#16410

Change-Id: I42c669d1de636d8b221fc03ed22aa7ac60554610
Reviewed-on: https://go-review.googlesource.com/25240
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot

CL https://golang.org/cl/25273 mentions this issue.

gopherbot pushed a commit to golang/tools that referenced this issue Jul 27, 2016
This change breaks out the code that adds handler funcs and
starts the HTTP server into separate functions, so that they
can be overridden in other environments, such as Google's.

For instance, listenAndServe can be overridden in an init function
in a different file to use an HTTP/2 server.

Updates golang/go#16410

Change-Id: I074242af10486c60c374e9ac7ebe9d0e61a8fa22
Reviewed-on: https://go-review.googlesource.com/25273
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
@tombergan
Contributor

tombergan commented Aug 1, 2016

It turns out I prototyped part of what this issue is asking for without realizing this issue existed. Thanks to @matloob for pointing me here. I also fixed a few bugs in the old heapview code.
https://github.com/tombergan/goheapdump

I agree with Austin that the main artifact of this issue should be an API for analyzing heap dumps. I took a stab at such an API here. It would be great to unify the API for heap analysis with the API in x/debug. I'm actually more excited about the idea of an automated heap checker rather than just a heap viewer (although that would be cool and useful as well). I wrote a simple http.Response leak checker here. I have more thoughts in that file about heap checkers we could write.

Also, a big +1 to making the heapdump just a core file, possibly with extra breadcrumbs. I ran into a few problems with the current heapdump format. I tried to fix some of them, but couldn't easily fix all of them -- I ran into similar type-matching issues for interfaces as @fjl. My initial thought was to use a core file without breadcrumbs. This way we could potentially deprecate runtime/debug.WriteHeapDump entirely and instead use any core file. Roughly:

  • Use types from DWARF to bootstrap the heapdump loader. This gives you enough info to find and walk the mspans, _types, itabs, and so on.
  • Walk those runtime structures to learn types for the entire heap.
  • Walk the GC masks to enumerate all the pointers tracked by GC.
  • Export all of this to the client using a nice, reflect-like API.

The downside of this approach is that the runtime structures may change from release-to-release. This could be a pain to maintain.

Long story short, a tool like this is something I've wanted when debugging OOMs in my day-to-day work. It's also a technically interesting problem and I'm happy to help out as needed.

@matloob
Contributor Author

matloob commented Aug 1, 2016

@tombergan I haven't had a chance to look very deeply at the API in your package yet, but it looks good.

My plan was to start by working on support libraries for reading and understanding core files as heap dumps in golang.org/x/tools/cmd/heapview/internal. Once we're more confident about them, we can move the libraries into golang.org/x/tools/heap, but working in an internal package at first gives us some flexibility to change the API.

I think it would be good to start with your API (and as much of the code as is applicable to cores as well as heaps) in golang.org/x/tools/cmd/heapview/internal and build from there. What do you think?

@matloob
Contributor Author

matloob commented Aug 1, 2016

I've put up a proposal doc for this issue: https://github.com/golang/proposal/blob/master/design/16410-heap-viewer.md

@rhysh
Contributor

rhysh commented Aug 1, 2016

@matloob Will the ELF core file described in the proposal doc be the same (on ELF-based systems) as one generated by GOTRACEBACK=crash or gcore(1)? Is the ELF format all they'll have in common, or will they contain the same information and layout, such that the tools you're producing would be able to work on core files from any of those three sources?

Ideally, there will be a 'one-click' solution to get from running program to dump. One possible way to do this would be to add a library that exposes a special HTTP handler. Requesting the page would trigger a core dump to a user-specified location on disk while the program is running, and start the heap dump viewer program.

What information is currently missing from Linux core dumps of Go programs that would be necessary to reconstruct the heap? What is required to include that information in GOTRACEBACK=crash core dumps?

@randall77
Contributor

I've taken a look through the proposal doc and I like it.

@tombergan, I looked through your API. Here are a few comments:

  • We absolutely want to have the dump analyzer/viewer to be able to handle large heaps. At the same time, I think designing the analyzer to somehow process the dump with O(1) space is a hard research problem. I don't want us to tackle that in v1. For now, we can probably get away with using O(1) space per object (independent of object size) as the current viewer attempts to do. I expect most large heaps are large because of large objects ([]byte probably), so using O(1) space per object will reduce space usage significantly.
  • The API surface is really large. I'm not sure there is anything really to be done here, except to prune unnecessary or redundant stuff whenever we can.
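To make the "O(1) space per object" point in the first bullet concrete, here is a small sketch of a fixed-size index record. The `objectRecord` layout is an assumption for illustration, not the layout the existing viewer uses.

```go
package main

import (
	"fmt"
	"unsafe"
)

// objectRecord is a hypothetical fixed-size index entry. The analyzer keeps
// one per object and leaves the object's contents in the dump file.
type objectRecord struct {
	addr   uint64 // start address in the dumped process
	size   uint64 // object size in bytes
	typeID uint32 // index into a separate type table
	flags  uint32 // e.g. reachable / visited bits
}

// indexBytes reports the memory needed to index objs; it depends only on the
// number of objects, not on how large each object is.
func indexBytes(objs []objectRecord) uint64 {
	return uint64(len(objs)) * uint64(unsafe.Sizeof(objectRecord{}))
}

// heapBytes sums the sizes of the indexed objects.
func heapBytes(objs []objectRecord) uint64 {
	var total uint64
	for _, o := range objs {
		total += o.size
	}
	return total
}

func main() {
	objs := []objectRecord{
		{addr: 0xc000000000, size: 1 << 30}, // a 1 GiB []byte backing array
		{addr: 0xc040000000, size: 16},
	}
	fmt.Printf("heap: %d bytes, index: %d bytes\n", heapBytes(objs), indexBytes(objs))
}
```

With a record like this, a heap dominated by a few huge byte slices costs almost nothing to index, which matches the expectation stated above.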

What information is currently missing from Linux core dumps of Go programs that would be necessary to reconstruct the heap? What is required to include that information in GOTRACEBACK=crash core dumps?

This one should be our top priority. We'll want 1.8 to have all the fixes we need to get reliable types/breadcrumbs/dwarfinfo/etc. in the dumps (core files?). It would be a bummer to discover after the 1.8 freeze (Nov 1) that there is one bit of information we realize we needed but didn't have.

@tombergan
Contributor

tombergan commented Aug 2, 2016

Proposal LGTM! Some comments:

The advantage of the hprof format is that there already exist many tools for analyzing hprof dumps. It will be a good idea to consider this format more thoroughly before making a decision.

I vote against using hprof as the main format because it doesn't support interior pointers well. Everything is based on object ID. A lot of the interesting analyses we might want to do will need to understand interior pointers (example).
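To illustrate why interior pointers need address-based lookup rather than object IDs, here is a small sketch that resolves an arbitrary address to the object containing it. It assumes the analyzer has the objects sorted by start address. The `object` type and `findObject` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// object describes one heap object by address range.
type object struct {
	addr uint64
	size uint64
}

// findObject returns the index of the object containing address p, which may
// point anywhere inside the object. objs must be sorted by addr and must not
// overlap. It returns -1, false if p is not inside any object.
func findObject(objs []object, p uint64) (int, bool) {
	// Index of the first object that starts after p.
	i := sort.Search(len(objs), func(i int) bool { return objs[i].addr > p })
	if i == 0 {
		return -1, false
	}
	if o := objs[i-1]; p-o.addr < o.size {
		return i - 1, true
	}
	return -1, false
}

func main() {
	objs := []object{{0x1000, 64}, {0x2000, 32}}
	fmt.Println(findObject(objs, 0x1010)) // interior pointer into the first object
	fmt.Println(findObject(objs, 0x1040)) // just past the end of the first object
}
```

A format keyed purely on object IDs has no natural place for the offset within the object, and that offset is exactly what such analyses need to keep.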

@rhysh Will the ELF core file described in the proposal doc be the same (on ELF-based systems) as one generated by GOTRACEBACK=crash or gcore(1)?
What information is currently missing from Linux core dumps of Go programs that would be necessary to reconstruct the heap?

Good questions that the proposal should answer. I believe there are three things we want to extract from the core dump, besides the usual DWARF data: (1) Dynamic types of interface values, (2) location of goroutine stacks, and (3) Where the GC thinks the pointers are. In theory, all of these can be extracted from an ordinary core file with DWARF data, but to do that, you need to walk the internal runtime structures. This means the heapdump library will need to change each time the runtime structs change, which could be annoying. The alternative idea is to embed (1,2,3) in a custom section of the ELF file using a stable format. This means less work for the heapdump library, however, it also means the heapdump library won't be able to process ordinary core files as generated by gcore.

I have a slight preference for supporting ordinary core files, but am curious what others think. Note that the x/debug library already has code to walk some of the runtime structures. There's probably an opportunity to share code with that library.

@matloob I think it would be good to start with your API (and as much of the code as is applicable to cores as well as heaps) in golang.org/x/tools/cmd/heapview/internal and build from there. What do you think?

SGTM. As @randall77 points out, that API I prototyped could use some pruning and cleaning. Don't be afraid to take a hatchet to it. Feel free to CC me on any CLs or assign me bits of work as they come up.

@randall77 We absolutely want to have the dump analyzer/viewer to be able to handle large heaps. At the same time, I think designing the analyzer to somehow process the dump with O(1) space is a hard research problem. I don't want us to tackle that in v1.

Agree with that completely, except I might replace "hard" with "fun and distracting" :-) The offline email thread with @alandonovan was more about the API than the implementation (not painting ourselves into a corner where the API becomes impossible to implement for large heaps).

@aclements
Member

The alternative idea is to embed (1,2,3) in a custom section of the ELF file using a stable format. This means less work for the heapdump library, however, it also means the heapdump library won't be able to process ordinary core files as generated by gcore.

The plan was not to have a special ELF section, but for the runtime to construct a data structure at a known symbol with the high-level information the heap dumper needs in a form that's convenient for the heap dumper. This would be easy to read out of the core file and wouldn't require any special core dump support. We definitely want ordinary system core dumps to work, both because that's easier for bootstrapping and because that's the only way you're going to get a core dump on OOM.

@y3llowcake

Are supporting changes to the runtime in progress? Is there somewhere I can follow along or offer assistance?

As someone who runs a memory intensive production service, I can't express my desire for this enough.

@tombergan
Contributor

There are no supporting changes to the runtime in progress AFAIK. I've been working on a corefile-based heap debugger (https://github.com/tombergan/goheapdump) but it's still a ways from being usable and this is not my primary project, so progress is slow.

It's in theory possible to implement a corefile-based heap debugger without any runtime library changes, but the downside is you have to reimplement all of the logic to grok GC bitmaps in the corefile tool. This is basically what I'm doing. @aclements may have something more clever in mind, I'm not sure. The only deficiency I've noticed so far is incomplete DWARF info. For example, AFAICT, there is no DWARF info for free variables in closures -- I believe these are stored in *funcvals, although I'm not totally clear on the internal representation. Another example is that internal structures like runtime.arraytype are often missing from the DWARF, which makes walking itabs kind of annoying.

@Dieterbe
Contributor

Regarding the API, I think I have another nice use case for this. If we're able to walk all the pointers through the heap in the same way the GC does, then we have a nice way to represent GC workload and generate reports about which kinds of data structures account for most of the work, e.g. you can find out where you get the most bang for the buck when converting pointer-based structures into pointerless ones to shorten GC times. Similar to pprof profiles, where we assign weights to lines of code, here we would assign weights to memory locations and types. (Not sure how feasible this is and whether it makes sense?)

@randall77
Contributor

@Dieterbe: Sure, number of pointers is a reasonable proxy for GC load. Objects, and subgraphs of objects, can be weighted by the number of pointers they contain.
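A minimal sketch of that weighting, assuming the analysis already knows each object's type name and its number of pointer slots. The `object` and `typeLoad` types and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical summary of one heap object.
type object struct {
	typ  string
	ptrs int // number of pointer slots the GC has to scan
}

// typeLoad is the total pointer count attributed to one type.
type typeLoad struct {
	typ  string
	ptrs int
}

// pointerLoadByType sums pointer slots per type as a rough proxy for GC work.
func pointerLoadByType(heap []object) map[string]int {
	load := make(map[string]int)
	for _, o := range heap {
		load[o.typ] += o.ptrs
	}
	return load
}

// ranked returns the types ordered by descending pointer count, with ties
// broken by name so the report is deterministic.
func ranked(load map[string]int) []typeLoad {
	out := make([]typeLoad, 0, len(load))
	for t, n := range load {
		out = append(out, typeLoad{t, n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].ptrs != out[j].ptrs {
			return out[i].ptrs > out[j].ptrs
		}
		return out[i].typ < out[j].typ
	})
	return out
}

func main() {
	heap := []object{
		{"main.treeNode", 2}, {"main.treeNode", 2}, {"main.treeNode", 2},
		{"[]uint8", 0},
		{"main.cacheEntry", 4},
	}
	for _, tl := range ranked(pointerLoadByType(heap)) {
		fmt.Printf("%-16s %d pointers\n", tl.typ, tl.ptrs)
	}
}
```

Types at the top of such a report would be the first candidates for conversion to pointer-free layouts.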

See #21356; I'm working on a library for reading core files.
That library is intended for use by such tools. I'm planning on putting a reference example of such a tool in x/debug/cmd/coreview. But the intent is that others will be able to use the library to make more awesome tools than I can.

@matloob
Contributor Author

matloob commented Jan 30, 2019

I'm not planning to, and haven't been able to work on this. I'm going to close this bug.

@matloob matloob closed this as completed Jan 30, 2019
@golang golang locked and limited conversation to collaborators Jan 30, 2020
@rsc rsc unassigned matloob Jun 23, 2022
10 participants