Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/compile: add tail call optimization for self-recursion only #16798

Closed
ghost opened this issue Aug 19, 2016 · 26 comments
Closed

proposal: cmd/compile: add tail call optimization for self-recursion only #16798

ghost opened this issue Aug 19, 2016 · 26 comments
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change
Milestone

Comments

@ghost
Copy link

ghost commented Aug 19, 2016

I know it's been discussed that we wanted to keep the stack trace the same, but if gc only does self recursion call optimisation the stack would roughly stay the same, the only thing that would change is the number of frames for the recursive function call.

@docmerlin
Copy link

docmerlin commented Aug 19, 2016

While I would prefer full-on TCO, this would be a good start!

@ghost
Copy link
Author

ghost commented Aug 19, 2016

The go team was very clear that they wanted to keep the stack traces clean and accurate, complete TCO would completely break that. So that's why self-recursino only makes sense, because 1000 000 call to Foo doesn't really help anybody anyway

@bradfitz bradfitz added LanguageChange LongTerm v2 A language change or incompatible library change labels Aug 19, 2016
@bradfitz
Copy link
Contributor

See also dup closed bug #15304

@bradfitz bradfitz added this to the Unplanned milestone Aug 19, 2016
@cznic
Copy link
Contributor

cznic commented Aug 19, 2016

because 1000 000 call to Foo doesn't really help anybody anyway

It does help a lot when the args differ from frame to frame.

@cznic
Copy link
Contributor

cznic commented Aug 19, 2016

Also, any tail self recursion can, and most of the time should, be rewritten as a loop.

@ghost
Copy link
Author

ghost commented Aug 19, 2016

It's not really a dup, I was asking there for all TCO, which there was indeed good reason not to implement. I've looked and other language will at least have self-recursion or even cyclic recursion TCOed. It could make sense for Go. I get that the whole Go2 is basically shutting this conversation down and that's fine.

It does help a lot when the args differ from frame to frame.

If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)

@cznic
Copy link
Contributor

cznic commented Aug 19, 2016

If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)

If tail recursion is rewritten as a loop there are 2 cases: stackless (eg. n!) or the loop keeps explicit stack (of something, eg. naive fib), replacing the (CPU) frames. So in either case the debug info about the state is at most one fmt.Printf away.

What I wanted to point out is that in a language w/o TCO, one can always rewrite the code to not depend on TCO.

@ghost
Copy link
Author

ghost commented Aug 19, 2016

What I wanted to point out is that in a language w/o TCO, one can always rewrite the code to not depend on TCO.

I 100% agree, but some algorithms are so much cleaner when written recursively

@sovietspaceship2
Copy link

This change alone is enough to make me want to use go (again) instead of erlang for many things. Do it and I will love go forever. Do it do it do it. Really. Do it.

@dr2chase
Copy link
Contributor

Was going to comment (related to unsafe, elsewhere) that removing recursion is exactly the sort of error-prone hand-optimization that can also be used to hide introduced vulnerabilities. self-TCO ought to be something that a programmer can at least request -- after all, if it's done by hand, those stack frames are just as gone, and debugging is just as impeded, but it also has the chance of human error added.

Note that one of the late-caught bugs in the new SSA back-end was a mistake in the hand-de-recursing of a clever recursive algorithm. The hand-optimized code usually worked.

@sovietspaceship2
Copy link

Many functional programming languages have excellent support for (general) TCO and it is not an impediment at all for debugging. Code compiled in debug mode can still be instrumented to keep count of self-calls and informations about lost stackframes. In functional languages like Haskell, purity from side-effects helps debugging since only the initial values are needed to know the state of every subsequent call; in Go it's not guaranteed so you lose intermediate stackframes and there's no way to recover them, but since most uses for recursive calls are in algorithms, which I hope are implemented in a mathematically sound manner, the issue is greatly reduced. Everything is moving in this direction, and Go should not be left behind.

Also, as dr2chase noted, I often find myself rewriting beautifully recursive code from their mathematical description to ugly hackish iterative style and it is not easy at all to prove they are completely equivalent until your application in production decides to run an infinite loop because the terminating condition is not met with certain inputs.

Do it.

@houd1ni
Copy link

houd1ni commented Mar 2, 2017

@sovietspaceship Agree. It's truly hard to write naturally recursive code into loops. I do not see any sence in looking deeply on big stacktraces of calling foo, when we haven't TCO, because all the performance power is lost. Though, the math recursive algorythms are usually good by theoretical side, so one has less chances to be involved in such debugging. I also guess that the aim of TCO-less is a bit more complicate code when wrote bad and a bit more chances to write less idiomatic code. But for sure it the pros worth it. Please, point me somewhere if I'm confused. Thank you for the language.

@rsc rsc changed the title cmd/compile: Add tail call optimization for self-recursion only cmd/compile: add tail call optimization for self-recursion only Jun 16, 2017
@rsc rsc changed the title cmd/compile: add tail call optimization for self-recursion only proposal: cmd/compile: add tail call optimization for self-recursion only Jun 17, 2017
@SophisticaSean
Copy link

Would like to add that this would be an amazing feature to have in go. There's just no getting around proper recursion with some algorithms, and if you try, the iterative implementation is terrible to read and understand and, usually, takes way longer to write.

I would love go way more if we did this. I think its a low-impact feature that will help a lot of devs.

@l3x
Copy link

l3x commented Nov 7, 2017

What if Go supported compiler directives (compiler hints) in the form of annotations in the comment line directly above a function like the following?

// @tco yCombinator implements the lambda expression that
// enables composition of unpure components in our workflow
func yCombinator(f FuncFunc) Func {
   g := func(r RecursiveFunc) Func {
       return f(func(x int) int {
           return r(r)(x)
       })
   }
   return g(g)
}

I agree with @SophisticaSean in that this would likely be a low impact feature that would help a lot of developers, especially the ones that would like to introduce functional programming (FP) techniques in their apps.

Supporting the @tco annotation compiler hint would allow Go to support TCO in a backwards compatible manner, right? Optimization for programmers that know what they want, with no ill side effects. What’s not to love?

If Go offered TCO support, that would increase the performance of FP apps by approx. three fold and make FP in Go generally viable.

p.s. For a detailed discussion of TCO, Recursion, Y-Combinator, Generics, Monads and more please get a copy of my book (to be released in a few weeks).

Thanks! Lex

@ianlancetaylor
Copy link
Contributor

@l3x We do already support compiler annotations (see https://golang.org/cmd/compile/#hdr-Compiler_Directives) but we don't like them. We would not want to use a compiler annotation for a fundamental mechanism like tail call optimization that will change the behavior of runtime.Callers.

In the past we've discussed adding a become f() statement, that would act like return f() but would do a tail call. I don't know how much appetite there is for that.

@l3x
Copy link

l3x commented Nov 7, 2017

@ianlancetaylor I can see why the compiler hint would not be the best idea. Thanks for that info!

Adding a become () statement sounds good to me.

I can't find a current proposal for adding a become () statement. Perhaps, we could add a proposal for that?

TCO is necessity for people interested in writing functional programming (FP) style code, that is by its nature recursive. This might be good indicator of how much interest exists for writing FP style code.

@ianlancetaylor
Copy link
Contributor

Filed issue #22624 to record the idea for Go 2.

@dr2chase
Copy link
Contributor

dr2chase commented Nov 7, 2017

Tail-call is not a complete slam-dunk with the existing calling conventions.
Suppose main() calls f(x) tail-calls g(x,y).
The current calling convention has main allocating a single, fixed-size frame large enough for all main's locals and large enough to pass all the parameters to all the things called by main.

In this cases, f, which takes a single parameter. F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.

It is not as simple as you might hope to "just change the calling convention" because Go runs on small stacks and checks their bounds at each stack-growth; we'd need to be sure that we got the growth and check handshakes right. There are also stack maps to consider, which I believe are currently referenced to SP -- which will change, if we modify frame sizes to accommodate changing numbers parameters.

@l3x
Copy link

l3x commented Nov 8, 2017

@dr2chase What if we were to limit TCO for only functions that accept a single argument? We could use currying to translate the evaluation of a function that takes multiple arguments into a sequence of single argument functions, right?

@randall77
Copy link
Contributor

@t3x The same problem occurs for single arguments, as that argument can be of arbitrary size.

@bcmills
Copy link
Contributor

bcmills commented Nov 8, 2017

F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.

It could leave another frame on the stack if necessary, of course, and you could use the return address to detect it. (The returned-to blocks for “tail call” frames could all have the same return address, since all they need to do is pop the frame and update the frame pointer. The size of the tail-call frame could be stored at a known offset.)

The first “tail call” in a chain of calls wouldn't be able to reuse the caller's argument space, but subsequent calls would — that is, we would still have the desired O(1) stack usage for a chain of N tail calls.

@dr2chase
Copy link
Contributor

dr2chase commented Nov 8, 2017

We'd need to get the return values sent to the right place, too, but this is an interesting approach.

@randall77
Copy link
Contributor

@bcmills Kind of, but it's not that simple.

Suppose we have f that calls g that then tail calls h. Just before the tail call, we have:

+---------+
|         |
|    f    |
|         |
+---------+
|rets of g|
+---------+
|args of g|
+---------+
|         |
|    g    |
|         |
+---------+

So how do we modify the stack to have h at the bottom instead? I see two problems:

  1. There may not be enough room to put arguments to h, as it may have larger arguments than g.
  2. h doesn't put its results in the right place (where f expects them), if its argument size is different than g's.

Any stub would have to solve both of those problems. So we might introduce a glue routine (invisible to users):

+---------+
|         |
|    f    |
|         |
+---------+
|rets of g|
+---------+
|args of g|
+---------+
|  glue   |
+---------+
|rets of h|
+---------+
|args of h|
+---------+
|         |
|    h    |
|         |
+---------+

That gives us a place to put the arguments to h, and the glue code can copy back the values that h returned to the appropriate place where f expects them.

It all works, but now a sequence of tail calls grows the stack indefinitely. The question becomes: how to avoid using a glue routine most of the time? Especially in mutual co-tail-calls, I don't see how to do it. If g tail calls h, which tail calls g, which tail calls h, etc, and g and h have different arg sizes, you're going to need the glue routine on at least one of the two tail calls.

We could compile specialized versions of every function that might be tail called, taking arguments by pointer instead of directly on the stack (or some other scheme, like extra arg/return offsets). But that's a lot of extra code, especially because any tail call of a function variable means almost every function might need one of these specialized versions.

@bcmills
Copy link
Contributor

bcmills commented Nov 8, 2017

The question becomes: how to avoid using a glue routine most of the time?

That's exactly what I'm talking about: if we're about to make a tail call, we check whether the return address for the current frame is the glue handler itself. If it is, we can wipe out the stack up to and including the glue frame and replace it with a new glue frame appropriate to the function we're about to call.

If the return address is not the glue handler, then we need to allocate one and the stack will grow. But only the first argument-growing call in a chain of mutual tail-calls should lack such a frame.

@randall77
Copy link
Contributor

I see, you want to introspect one frame up the stack to see if you can redo the glue frame.
That might work. A few complications:

  1. All frames in Go are constant sized, so we might need a few of them, power-of-two sized like we do for reflect.call. Or maybe it can be variable-sized, with the size recorded just above the return value area?
  2. The glue routine is now polymorphic, so it needs to be dynamically typed. The glue routine can have a few fields where it can record GC info, and the stack scanning & stack copying needs to recognize the glue routine and use its recorded GC info.
  3. If we go for a variable-sized glue frame (instead of the powers-of-two set of fixed-frame glues), then the glue code itself needs to figure out how big its own frame is. We have frame pointers on x86 at least, so maybe that works. Not sure if we have them on all architectures. If not, we'd need to modify the calling convention to leave space that the glue frame can use (similar to how we store the link register on some architectures).

Ok, you've convinced me it is at least plausible. Certainly not easy.

@ianlancetaylor
Copy link
Contributor

I think that the tail call optimization is only useful when the programmer can reliably know that the tail call will occur. Conversely I don't think we should do tail calls unpredictably, because of the effect on stack traces. I'm going to close this in favor of #22624 which suggests one possible mechanism for deciding whether to do a tail call or not.

@golang golang locked and limited conversation to collaborators Jan 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests