Static analysis tools

for Go code comprehension and refactoring

golang nyc meetup

13 Nov 2014

Alan Donovan

Google, New York


This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014.



Programmers say "writing code", but most of that time is spent reading

Actually: reading code, and making logical deductions

Machines are good at reading and logic

Machines should be helping us read code.

And write it! Refactoring can be tedious and error-prone.




Demo: the Go oracle

Supports interactive, editor-integrated queries:
- where is this thing defined?
- what are the methods of this type?
- what are the free variables of this selection?
- what might this expression point to?
- what dynamic types might this interface or reflect.Value contain?


Demo: godoc analysis features

Package view
- method set and implements relation for every type
- internal call graph of every package

Source code view
- kind, type, definition of every identifier
- method set and implements relation for every type
- peers of every channel send/receive
- callers of every function
- callees of every call site (even dynamic)


How it works


Libraries and tools

Mostly in repo


go/types: the Go type checker

Computes mappings from:
- identifiers to definitions
- constant expressions to values
- expressions to types

Many subtle corners:
- method set computation
- recursive interfaces
- shifts

Making it correct, fast, and clean was a substantial project

Author: Robert Griesemer


go/ssa: intermediate representation (IR)

Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly

Programs are lowered into Static Single-Assignment form (SSA):
- simplifies dataflow analyses since reaching definitions are implicit
- invented 1991, now mainstream (gcc, llvm)

All Go programs can be expressed using only ~30 basic instructions

Simple, explicit, high-level, high source fidelity

The llgo project is using go/ssa as a front-end for LLVM


Demo: ssadump


go/pointer: pointer analysis

Pointers complicate reasoning about program behaviour
- functions may be called dynamically
- a variable may be updated and accessed via different names ("aliases")
- runtime types are values too: interface, reflect.Type

We use pointer analysis to answer the question:
which variables might this pointer point to?


Pointer analysis

Let pts(p) be the points-to set of pointer p
Its elements are abstract variables: globals, locals, unnamed (new(T))


Statement      Constraint
 y = &x         pts(y) ⊇ {x}
 y = x          pts(y) ⊇ pts(x)                     "inclusion-based"
*y = x          ∀o ∈ pts(y). pts(o) ⊇ pts(x)
 y = *x         ∀o ∈ pts(x). pts(y) ⊇ pts(o)
 y = &x->f      \
 y = x.(T)       } not shown
 reflection     /

Pointer analysis: constraint generation

All Go operations can be expressed in these constraints
Function, map, slice and channel ops all simplify to struct ops

Treatment of unsafe.Pointer conversion is unsound
But we don't care! This isn't a compiler

The constraint system is:
- field-sensitive: struct fields x.f and x.g have distinct solutions
- flow-insensitive: the order of statements is ignored
- context-insensitive: each function is analyzed only once
- whole-program: Go source is required for all dependencies
- inclusion-based: i.e. Andersen's analysis

Optimization: don't make constraints for "uninteresting" types
such as types not related to a one-shot Oracle query


Pointer analysis: pre-solver optimizations

Constraint system for 124KLoC program (oracle) has:

254,000 variables
184,000 equations

Solving phase is in O(|v|³), so pre-solver optimisation is crucial

We can bring this down to:

88,000 variables
65,000 equations


Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07)

p = &x                  a = &x                         if ... {
q = p                   b = a                            p, q = &x, &y
r = q                   c = b                          } else {
s = r                   if ... { a = c }                 p, q = &y, &x

Hash-Value Numbering


Pointer analysis: solving

It's transitive closure of a graph, essentially

Lots of optimizations, for example:

sparse bit vectors, a very compact representation for points-to sets

Solver log is >1GB. Debugging is fun.


Call graph

The call graph can be read directly from the solution

The tool prints raw call graphs

Demo (time permitting)


Refactoring tools


gorename: precise, type-safe identifier renaming

A refactoring tool, usable from...

% gorename -from bytes.Buffer.Len -to Size
Renamed 105 occurrences in 42 files in 33 packages.

All renamings are reversible

Sound: either a conflict is reported, or the refactoring preserves behaviour*

*except reflection


Demo: gorename


Example-based refactoring with eg

A Go port of the Java Refaster tool

A template specifies the pattern and replacement as Go expressions:

package P

import ( "errors"; "fmt" )

func before(s string) error { return fmt.Errorf("%s", s) }
func after(s string)  error { return errors.New(s) }

Parameters are wildcards

% eg -t template.go <package> ...

Demo: eg



From the outset, Go has been renowned for its tools: go, gofmt

We have many building blocks for sophisticated source analysis tools

What should we build next?


Thank you

golang nyc meetup

13 Nov 2014

Alan Donovan

Google, New York

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)