Go, from C to Go

GopherCon

25 Apr 2014

Russ Cox

Google

Video

A video of this talk was recorded at GopherCon in Denver.

2

Go Compiler

3

Go Compiler

80,000+ lines of C.

4

Problem

Programming in Go is fun.

Programming in C is not.

5

Problem

Writing a Go compiler requires Go expertise.

Writing any program in C requires C expertise.

Writing a Go compiler in C requires Go and C expertise.

6

Solution

Write the Go compiler in Go.

7

Past

Why not write the Go compiler in Go on day one?

8

Present

Why do it today?

9

How?

Crazy idea: mechanical conversion.

“One big gofix.”

10

C

11

C

12

C Data Model

13

C Control Flow

14

C Program Model

15

Conversion

16

Challenges for Converting C to Go

17

Goal

Automated conversion of our C code to Go.

Target: our C code, not all C code.

18

Warmups

19

Unions

go/src/cmd/gc/go.h

struct  Val
{
    short   ctype;
    union
    {
        short   reg;        // OREGISTER
        short   bval;       // bool value CTBOOL
        Mpint*  xval;       // int CTINT, rune CTRUNE
        Mpflt*  fval;       // float CTFLT
        Mpcplx* cval;       // float CTCPLX
        Strlit* sval;       // string CTSTR
    } u;
};
20

Unions

go/include/link.h

struct  Addr
{
    short type;
    union
    {
        char    sval[8];
        float64 dval;
        Prog*   branch; // for 5g, 6g, 8g
    } u;
    
    ...
};
21

Unions

#define struct union /* Great space saver */

22

Unions

#define union struct /* legal in C! */

And anyway, there are only two.

23

#define

Can't just expand during parsing.

24

#define

Not many.

/*
 * defined macros
 *    you need super-gopher-guru privilege
 *    to add this list.
 */
#define nelem(x)    (sizeof(x)/sizeof((x)[0]))
#define nil         ((void*)0)
...

Extend parser to recognize special cases.

25

#define

Annotate some.

#define    BOM    0xFEFF
/*c2go enum { BOM = 0xFEFF }; */

Rewrite others.

enum {
    BOM = 0xFEFF,
};
26

Comments

Can't just discard during parsing.

/*
 * If the new process paused because it was
 * swapped out, set the stack level to the last call
 * to savu(u_ssav).  This means that the return
 * which is executed immediately after the call to aretu
 * actually returns from the last routine which did
 * the savu.
 *
 * You are not expected to understand this.
 */
if(rp->p_flag&SSWAP) {
    rp->p_flag =& ~SSWAP;
    aretu(u.u_ssav);
}
27

Comments

Record precise source locations.

case OMAPLIT:
    n->esc = EscNone;  // until proven otherwise
    e->noesc = list(e->noesc, n);
    n->escloopdepth = e->loopdepth;

    // Keys and values make it to memory, lose track.
    for(ll=n->list; ll; ll=ll->next) {
        escassign(e, &e->theSink, ll->n->left);
        escassign(e, &e->theSink, ll->n->right);
    }
    break;

Whole-line comments attach to syntax immediately following (or EOF).

Suffix comments attach to syntax immediately before.

Syntax carries comments if it moves.

28

Goto

29

C Goto

“27. Horrors! goto’s and labels

C has a goto statement and labels, so you can branch about the way you used to. But most of the time goto’s aren’t needed... The code can almost always be more clearly expressed by for/while, if/else, and compound statements.

30

C Goto

One use of goto’s with some legitimacy is in a program which contains a long loop, where a while(1) would be too extended. Then you might write

mainloop:
    ...
    goto mainloop;

Another use is to implement a break out of more than one level of for or while. goto’s can only branch to labels within the same function.”

— Kernighan, Programming in C – A Tutorial

31

Go Goto Restrictions

.   if x {
        goto Done
    }
    
    y := f()
    print(y)

Done:
    close(c)
    return
32

Go Goto Restrictions

.   var y int

    if x {
        goto Done
    }
    
    y = f()
    print(y)

Done:
    close(c)
    return
33

Go Goto Restrictions

if bad {
Bad:
    printError()
    return err
}

...

if other bad thing {
    goto Bad
}
34

Go Goto Restrictions

switch x {
case 1:
    F()
    goto Common;
case 2:
    G()
    goto Common
case 3:
Common:
    H()
}
35

Goto in Go compiler

1032 goto statements
241 labels

36

Goto in Go compiler

35 indented labels

18 switch case
6 multilevel break/continue
5 ‘else’ statement
4 cleanup/error labels
1 loop
1 difficult to explain

37

Refactor switch case goto

switch(r->op) {
case OINDEXMAP:
    n->op = OAS2MAPR;
    goto common;
case ORECV:
    n->op = OAS2RECV;
    goto common;
case ODOTTYPE:
    n->op = OAS2DOTTYPE;
    r->op = ODOTTYPE2;
common:
    ...
}
38

Refactor switch case goto

switch r.op {
case OINDEXMAP, ORECV, ODOTTYPE:
    switch r.op {
    case OINDEXMAP:
        n.op = OAS2MAPR
    case ORECV:
        n.op = OAS2RECV
    case ODOTTYPE:
        n.op = OAS2DOTTYPE
        r.op = ODOTTYPE2
    }
    ...
}
39

General solution

Baker, An Algorithm for Structuring Flowgraphs, JACM 1977

But we don't need it.

Handle trivial rewrites in converter.
Rewrite problematic gotos by hand.

40

Type Mapping

41

Type Mapping

General question: what type to use in the Go translation?

42

Type Mapping

Build graph of “assigned” value flow and extract clusters.

x = y;

int f(void) {
    return x;
}
w = f();

void g(int z);
g(x);
g(y);

Apply to entire compiler (all files). Exclude some functions.

43

Type Mapping

int
islvalue(Node *n)
{
    switch(n->op) {
    case OINDEX:
        if(isfixedarray(n->left->type))
            return islvalue(n->left);
        if(n->left->type != T && n->left->type->etype == TSTRING)
            return 0;
        // fall through
    case OIND:
    case ODOTPTR:
    case OCLOSUREVAR:
        return 1;
    case ODOT:
        return islvalue(n->left);
    case ONAME:
        if(n->class == PFUNC)
            return 0;
        return 1;
    }
    return 0;
}
44

Type Mapping

int
islvalue(Node *n)
{
    ...
    return islvalue(n->left);
    ...
    return 0;
    ...
    return 1;
    ...
    return islvalue(n->left);
    ...
    return 0;
    ...
    return 1;
    ...
    return 0;
}
45

Type Mapping

cluster
    types: int
    values:
        return from islvalue
        0
        1
        islvalue(n)
        islvalue(n->left)
        islvalue(n->right)
    contexts:
        bool condition
        /* if(islvalue(n)), if(!islvalue(n)), ... */

Translation: bool.

46

Type Mapping

cluster
    types: int
    values:
        return from checksliceconst
        0
        -1
    contexts:
        checksliceconst(lo, hi) < 0
        checksliceconst(lo, mid) < 0
        checksliceconst(mid, hi) < 0

Translation: bool or error.

47

Type Mapping

cluster
    types: Val*
    values:
        var Val *v
        va_arg(fp->args, Val*)
    contexts:
        v->ctype
        v->u

Translation: pointer.

48

Type Mapping

cluster
    types: long*
    values:
        var long* a1
        &a->a[0]
    contexts:
        *a1
        a1++

Translation: slice.

49

Type Mapping

Cluster statistics

Clustering does not rely on C type information at all.

50

Conversion status

By the way, please try the Go 1.3 beta!

51

Go from C to Go!

52

Thank you

GopherCon

25 Apr 2014

Russ Cox

Google

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)