proposal: spec: sum types based on general interfaces #57644

Open
ianlancetaylor opened this issue Jan 5, 2023 · 160 comments
Labels
generics Issue is related to generics LanguageChange Proposal v2 A language change or incompatible library change

@ianlancetaylor
Contributor

This is a speculative issue based on the way that type parameter constraints are implemented. This is a discussion of a possible future language change, not one that will be adopted in the near future. This is a version of #41716 updated for the final implementation of generics in Go.

We currently permit type parameter constraints to embed a union of types (see https://go.dev/ref/spec#Interface_types). We propose that we permit an ordinary interface type to embed a union of terms, where each term is itself a type. (This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.)

That's really the entire proposal.

Embedding a union in an interface affects the interface's type set. As always, a variable of interface type may store a value of any type that is in its type set, or, equivalently, a value of any type in its type set implements the interface type. Inversely, a variable of interface type may not store a value of any type that is not in its type set. Embedding a union means that the interface is something akin to a sum type that permits values of any type listed in the union.

For example:

type MyInt int
type MyOtherInt int
type MyFloat float64
type I1 interface {
    MyInt | MyFloat
}
type I2 interface {
    int | float64
}

The types MyInt and MyFloat implement I1. The type MyOtherInt does not implement I1. None of MyInt, MyFloat, or MyOtherInt implement I2.

In all other ways an interface type with an embedded union would act exactly like an interface type. There would be no support for using operators with values of the interface type, even though that is permitted for type parameters when using such a type as a type parameter constraint. This is because in a generic function we know that two values of some type parameter are the same type, and may therefore be used with a binary operator such as +. With two values of some interface type, all we know is that both types appear in the type set, but they need not be the same type, and so + may not be well defined. (One could imagine a further extension in which + is permitted but panics if the values are not the same type, but there is no obvious reason why that would be useful in practice.)
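A minimal sketch of that distinction, runnable in current Go. The names Number, Sum, and sumDynamic are hypothetical, and any stands in for the proposed union interface type, which cannot be used as a variable type today:

```go
package main

import "fmt"

// Number is an ordinary type parameter constraint with a union element.
type Number interface {
	int | float64
}

// Sum is generic: both arguments share one static type T, so + is defined.
func Sum[T Number](x, y T) T { return x + y }

// sumDynamic takes interface values whose dynamic types may differ,
// so it has to check them at run time before adding.
func sumDynamic(x, y any) (any, bool) {
	switch x := x.(type) {
	case int:
		if y, ok := y.(int); ok {
			return x + y, true
		}
	case float64:
		if y, ok := y.(float64); ok {
			return x + y, true
		}
	}
	return nil, false
}

func main() {
	fmt.Println(Sum(1, 2), Sum(1.5, 2.0))
	r, ok := sumDynamic(1, 2.5) // int and float64: no common type
	fmt.Println(r, ok)
}
```

The generic version needs no run-time check because the compiler already knows both operands have the same type.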

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

As an implementation note, we could in some cases use a different implementation for interfaces with an embedded union type. We could use a small code, typically a single byte, to indicate the type stored in the interface, with a zero indicating nil. We could store the values directly, rather than boxed. For example, I1 above could be stored as the equivalent of struct { code byte; value [8]byte } with the value field holding either an int or a float64 depending on the value of code. The advantage of this would be reducing memory allocations. It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value. None of this would affect anything at the language level, though it might have some consequences for the reflect package.
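A rough sketch of what such an unboxed layout could look like, emulated in ordinary Go. This is only an illustration under assumptions: packedI1, the code constants, and the little-endian encoding are all hypothetical, and a real compiler would be free to choose a different layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

type MyInt int
type MyFloat float64

// Type codes; zero is reserved for nil, as the proposal suggests.
const (
	codeNil byte = iota
	codeMyInt
	codeMyFloat
)

// packedI1 is a hypothetical unboxed layout for interface { MyInt | MyFloat }.
type packedI1 struct {
	code  byte
	value [8]byte
}

func packInt(v MyInt) packedI1 {
	p := packedI1{code: codeMyInt}
	binary.LittleEndian.PutUint64(p.value[:], uint64(v))
	return p
}

func packFloat(v MyFloat) packedI1 {
	p := packedI1{code: codeMyFloat}
	binary.LittleEndian.PutUint64(p.value[:], math.Float64bits(float64(v)))
	return p
}

// unpack converts the packed form back to an ordinary boxed interface value.
func (p packedI1) unpack() any {
	bits := binary.LittleEndian.Uint64(p.value[:])
	switch p.code {
	case codeMyInt:
		return MyInt(bits)
	case codeMyFloat:
		return MyFloat(math.Float64frombits(bits))
	default: // codeNil, i.e. the all-zero value
		return nil
	}
}

func main() {
	fmt.Println(packInt(-7).unpack(), packFloat(2.5).unpack(), packedI1{}.unpack())
}
```

Note that the all-zero packedI1 decodes to nil, which matches the zero value the proposal keeps for interface types.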

As I said above, this is a speculative issue, opened here because it is an obvious extension of the generics implementation. In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

@ianlancetaylor ianlancetaylor added LanguageChange v2 A language change or incompatible library change Proposal generics Issue is related to generics labels Jan 5, 2023
@ianlancetaylor ianlancetaylor added this to the Proposal milestone Jan 5, 2023
@dsnet
Member

dsnet commented Jan 6, 2023

This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.

Could you comment on why this restriction occurs? Is this simply to err on the side of caution initially and potentially remove this restriction in the future? Or is there a technical reason not to do this?

@ianlancetaylor
Contributor Author

The reason to not permit ~T is that the current language would provide no mechanism for extracting the type of such a value. Given interface { ~int }, if I store a value of type myInt in that interface, then code in some other package would be unable to use a type assertion or type switch to get the value out of the interface type. The best that it could do would be something like reflect.TypeOf(v).Kind(). That seems sufficiently awkward that it requires more thought and attention, beyond the ideas in this proposal.
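To make the awkwardness concrete, here is a small sketch in current Go, with any standing in for interface { ~int }. The type myInt and the helper describe are hypothetical. myInt plays the role of a type defined in another package that the inspecting code cannot name.

```go
package main

import (
	"fmt"
	"reflect"
)

// myInt stands in for a type defined in some other package.
type myInt int

// describe tries to recover an int-like value from v.
func describe(v any) string {
	if v == nil {
		return "nil"
	}
	switch v.(type) {
	case int:
		return "matched case int"
	}
	// No case can name myInt from outside its package,
	// so the only remaining option is reflection.
	if reflect.TypeOf(v).Kind() == reflect.Int {
		return fmt.Sprintf("kind int, value %d", reflect.ValueOf(v).Int())
	}
	return "not an int kind"
}

func main() {
	fmt.Println(describe(myInt(3))) // falls through to the reflect path
	fmt.Println(describe(3))        // plain int matches the type switch
}
```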

@dsnet
Member

dsnet commented Jan 6, 2023

Is there a technical reason that the language could not also evolve to support ~T in a type switch? Granted that this is outside the scope of this proposal, but I think there is a valid use case for it.

@jimmyfrasche
Member

In a vacuum, I'd prefer pretty much any other option, but since it's what generics use, it's what we should go with here and we should embrace it fully. Specifically,

  1. type I2 int | float64 should be legal
  2. v, ok := i.(int | float64) follows from 1
  3. in a type switch case int | float64: works like 2
  4. string | fmt.Stringer should be legal even though that does not currently work with constraints

@dsnet I think comparable and ~T could be considered and discussed separately—if for no reason other than this thread will probably get quite long on its own. I'm 👍 on both.

@DeedleFake

DeedleFake commented Jan 6, 2023

With the direct storage mechanism detailed in the post as an alternative to boxing, would it be possible for the zero-value not to be nil after all? For example, if the code value is essentially an index into the list of types and the value stores the value of that type directly, then the zero value with all-zeroed memory would actually default to a zero value of the first type in the list. For example, given

type Example interface {
  int16 | string
}

the zero value in memory would look like {code: 0, value: 0}.

Also, in that format, would the value side change sizes depending on the type? For example, would a value of Example(1) look like {code: 0, value: [...]byte{0, 1}}, ignoring endianness, while a value of Example("example") would look like {code: 1, value: [...]byte{/* raw bytes of a string header */}}? If so, how would this affect embedding these interface types into other types, such as a []Example? Would the slice just assume the maximum possible necessary size for the given types? Edit: Never mind, dumb question. The size changing could be a minor optimization when copying, but of course anywhere it's stored would have to assume the maximum possible size, even just local variables, unless the compiler could prove that it's only ever used with a smaller type, I guess.

It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value.

I don't understand this comment, which may indicate that I'm missing something fundamental about the explanation. Why would pointers make any difference? If the above Example type had int16 | string | *int, why would it not just be {code: 2, value: /* the pointer value itself, ignoring whatever it points to */}?

@apparentlymart

apparentlymart commented Jan 6, 2023

The example in the proposal is rather contrived, so I tried to imagine some real situations I've encountered where this new capability could be useful to express something that was harder to express before.


Is the following also an example of something that this proposal would permit?

type Success[T any] struct {
    Value T
}

type Failure struct {
    Err error
}

type Result[T any] interface {
    Success[T] | Failure
}

func Example() Result[string] {
    return Success[string]{"hello"}
}

(NOTE WELL: I'm not meaning to imply that the above would be a good idea, but it's the example that came most readily to mind because I just happened to write something similar -- though somewhat more verbose -- to smuggle (result, error) tuples through a single generic type parameter yesterday. Outside of that limited situation I expect it would still be better to return (string, error).)


Another example I thought of is encoding/json's Token type, which is currently defined as type Token any and is therefore totally unconstrained.

Although I expect it would not be appropriate to change this retroactively for compatibility reasons, presumably a hypothetical green field version of that type could be defined like this instead:

type Token interface {
    Delim | bool | float64 | Number | string
    // (json.Token also allows nil, but since that isn't a type I assume
    // it wouldn't be named here and instead it would just be
    // a nil value of type Token.)
}

Given that the exact set of types here is finite, would we consider it to be a breaking change to add new types to this interface later? If not, that could presumably allow the following to compile by the compiler noticing that the case labels are exhaustive:

// TokenString is a rather useless function that's just here to illustrate an
// exhaustive type switch...
func TokenString(t Token) string {
    switch t := t.(type) {
    case Delim:
        return string(t)
    case bool:
        return strconv.FormatBool(t)
    case float64:
        return strconv.FormatFloat(t, 'g', -1, 64)
    case Number:
        return string(t)
    case string:
        return t
    }
}

I don't feel strongly either way about whether such sealed interfaces should have this special power, but it does seem like it needs to be decided either way before implementation because it would be hard to change that decision later without breaking some existing code.

Even if this doesn't include a special rule for exhaustiveness, this still feels better in that it describes the range of Decoder.Token() far better than any does.

EDIT: After posting this I realized that my type switch doesn't account for nil. That feels like it's a weird enough edge that it probably wouldn't be worth the special case of allowing exhaustive type-switch matching.
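For comparison, a sketch of how the nil case has to be handled in current Go, where Token is unconstrained. This is a simplified, hypothetical version covering only some of the token types, with a local Token type standing in for json.Token:

```go
package main

import (
	"fmt"
	"strconv"
)

// Token mirrors the unconstrained definition used by encoding/json today.
type Token any

// tokenString handles a few token types, including the nil interface value.
func tokenString(t Token) string {
	switch t := t.(type) {
	case nil:
		// A nil Token has no dynamic type, so it needs its own case.
		return "null"
	case bool:
		return strconv.FormatBool(t)
	case float64:
		return strconv.FormatFloat(t, 'g', -1, 64)
	case string:
		return t
	default:
		return fmt.Sprintf("unexpected %T", t)
	}
}

func main() {
	fmt.Println(tokenString(nil), tokenString(true), tokenString(1.5), tokenString("x"))
}
```

Under the proposal the default case could in principle go away, but the nil case would remain, since nil is still the zero value of the interface.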


Finally, it seems like this would shrink the boilerplate required today to define what I might call a "sealed interface", by which I mean one which only accepts a fixed set of types defined in the same package as the interface.

One way I've used this in the past is to define struct types that act as unique identifiers for particular kinds of objects but then have some functions that can accept a variety of different identifier types for a particular situation:

type ResourceID struct {
    Type string
    Name string
}

type ModuleID struct {
    Name string
}

type Targetable interface {
    // Unexported method means that only types
    // in this package can implement this interface.
    targetable()
}

func (ResourceID) targetable() {}
func (ModuleID) targetable() {}

func Target(addr Targetable) {
    // ...
}

I think this proposal could reduce that to the following, if I've understood it correctly:

type ResourceID struct {
    Type string
    Name string
}

type ModuleID struct {
    Name string
}

type Targetable interface {
    ResourceID | ModuleID
}

func Target(addr Targetable) {
    // ...
}

If any of the examples I listed above don't actually fit what this proposal is proposing (aside from the question about exhaustive matching, which is just a question), please let me know!

If they do, then I must admit I'm not 100% convinced that the small reduction in boilerplate is worth this complexity, but I am leaning towards 👍 because I think the updated examples above would be easier to read for a future maintainer who is less experienced with Go and so would benefit from a direct statement of my intent rather than having to infer the intent based on familiarity with idiom or with less common language features.

@ianlancetaylor
Contributor Author

@dsnet Sure, we could permit case ~T in a type switch, but there are further issues. A type switch can have a short declaration, and in a type switch case with a single type we're then permitted to refer to that variable using the type in the case. What type would that be for case ~T? If it's T then we lost the methods, and fmt.Printf will behave unexpectedly if the original type had a String method. If it's ~T what can we do with a value of that type? It's quite possible that these questions can be answered, but it's not just outside the scope of this proposal, it's actually complicated.
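The "lost methods" concern can be seen in current Go by converting a value to its underlying type by hand. A small sketch follows. The type celsius and the helper formatBoth are hypothetical.

```go
package main

import "fmt"

// celsius has underlying type int and a String method.
type celsius int

func (c celsius) String() string { return fmt.Sprintf("%d degrees", int(c)) }

// formatBoth formats c as itself and as its underlying type.
func formatBoth(c celsius) (withMethod, underlying string) {
	// fmt uses the String method for celsius but not for the converted int.
	return fmt.Sprint(c), fmt.Sprint(int(c))
}

func main() {
	a, b := formatBoth(21)
	fmt.Println(a)
	fmt.Println(b)
}
```

If case ~int bound the variable as a plain int, code in that case would see the second behavior rather than the first.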

@ianlancetaylor
Contributor Author

@DeedleFake The alternative implementation is only an implementation issue, not a language issue. We shouldn't use that to change something about the language, like whether the value can be nil or some other zero value. In Go the zero value of interface types is nil. It would be odd to change that for the special case of interfaces that embed a union type element.

The reason pointer values matter is that given a value of the interface type, the current garbage collector implementation has to be able to very very quickly know which fields in that value are pointers. The current implementation does this by associating a bitmask of pointers with each type, such that a 1 in the bitmask means that the pointer-sized slot at that offset in the value always holds a pointer.

@ianlancetaylor
Contributor Author

@apparentlymart I think that everything you wrote is correct according to this proposal. Thanks.

@DeedleFake

In Go the zero value of interface types is nil. It would be odd to change that for the special case of interfaces that embed a union type element.

It would be, but I think it would be worth it. And I don't think it would be so strange as to completely preclude eliminating the extra oddness that would come from union types always being nilable. In fact, I'd go so far as to say that if this way of implementing unions has to have them be nilable, then a different way of implementing them should be found.

The reason pointer values matter is that given a value of the interface type, the current garbage collector implementation has to be able to very very quickly know which fields in that value are pointers.

I was worried it was going to be the garbage collector... Ah well.

@merykitty

A major problem is that type constraints work on static types while interfaces work on the dynamic types of objects. This immediately rules out this approach to union types.

type Addable interface {
    int | float32
}

func Add[T Addable](x, y T) T {
    return x + y
}

This works because the static type of T can only be int or float32, which means the addition operation is defined for every type in the type set of T. However, if we allow Addable to be a sum type, then the type set of T becomes {int, float32, Addable}, which does not satisfy that property!

@apparentlymart

@merykitty per my understanding of the proposal, I think for the dynamic form of what you wrote you'd be expected to write something like this:

type Addable interface {
    int | float32
}

func Add(x, y Addable) Addable {
    switch x := x.(type) {
    case int:
        return x + y.(int)
    case float32:
        return x + y.(float32)
    default:
        panic(fmt.Sprintf("unsupported Addable types %T + %T", x, y))
    }
}

Of course this would panic if used incorrectly, but I think that's a typical assumption for interface values since they inherently move the final type checking to runtime.

I would agree that the above seems pretty unfortunate, but I would also say that this feels like a better use-case for type parameters than for interface values and so the generic form you wrote is the better technique for this (admittedly contrived) goal.

@Merovius
Contributor

Merovius commented Jan 6, 2023

@merykitty No, in your example, Addable itself should not be able to instantiate Add. Addable does not implement itself (only int and float32 do).

@Merovius
Contributor

Merovius commented Jan 6, 2023

also, note that the type set never includes interfaces. So Addable is never in its own type set.

@mateusz834
Member

mateusz834 commented Jan 6, 2023

Is something like that going to be allowed?

type IntOrStr interface {
	int | string
}

func DoSth[T IntOrStr](x T) {
	var a IntOrStr = x
	_ = a
}

@zephyrtronium
Contributor

Let's say I have these definitions.

type I1 interface {
	int | any
}

type I2 interface {
	string | any
}

type I interface {
	I1 | I2
}

Would it be legal to have a variable of type I? Can I assign an I1 to it? What about string? any(int8)? int8?

@Merovius
Contributor

Merovius commented Jan 6, 2023

@mateusz834 Can't see why not.

@zephyrtronium

Would it be legal to have a variable of type I? Can I assign an I1 to it? What about string? any(int8)? int8?

I think the answer to all of these is "yes". For the cases where you assign an interface value, the dynamic type/value of the I variable would then become the dynamic type/value of the assigned interface. In particular, the dynamic type would never be an interface.
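The last point already holds for interface-to-interface assignment in current Go, which a short sketch can show. The type name and the helper rewrap are hypothetical, and any stands in for the proposed union interface.

```go
package main

import "fmt"

// name is a concrete type that implements fmt.Stringer.
type name string

func (n name) String() string { return string(n) }

// rewrap assigns one interface value to another interface type.
// Only the dynamic type and value are carried over.
func rewrap(s fmt.Stringer) any { return s }

func main() {
	a := rewrap(name("gopher"))
	_, isName := a.(name) // the dynamic type is name, not fmt.Stringer
	fmt.Printf("%T %v\n", a, isName)
}
```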

@Merovius
Contributor

Merovius commented Jan 6, 2023

FWIW my main issue with this proposal is that IMO union types should allow representing something like ~string | fmt.Stringer, but for well-known reasons this isn't possible right now and it's not clear it ever would be. One advantage of "real" sum types is that they have an easier time representing that kind of thing. Specifically, I don't think #54685 has that problem (though it's been a while since I looked at that proposal in detail).

@leighmcculloch

This comment was marked as resolved.

@leighmcculloch
Contributor

leighmcculloch commented Jan 7, 2023

@ianlancetaylor Does the proposal as-is allow both type sets and functions in an interface? It would have a remarkable property not typically present in sum types where you could have a closed set of types along with the ability to have those types implement some common functions and be used as an interface.

@zephyrtronium
Contributor

zephyrtronium commented Jan 7, 2023

@leighmcculloch

To address this shortcoming, could we make interface types that contain type sets non-nullable by default, and require an explicit nil | in the type set list. For type sets that do not specify nil, the default value of the interface value would be the zero value of the first type listed.

For reference, this has been suggested a few times in #19412 and #41716, starting with #19412 (comment). Requiring nil variants versus allowing source code order to affect semantics is the classic tension of sum types proposals.

Sometimes discriminated unions have cases where no data is required. I don't think the proposal supports this.

The spelling of a type with no information beyond existence is usually struct{}, or more generally any type with exactly one value. void, i.e. the zero type, means something different: logically it would represent that your unconditional variant is impossible, not that it carries no additional information.

Does the proposal as-is allow both type sets and functions in an interface? It would have a remarkable property not typically present in sum types where you could have a closed set of types along with the ability to have those types implement some common functions and be used as an interface.

Yes, since the proposal is just to allow values of general interfaces less ~T elements, methods would be fine and would dynamically dispatch to the concrete type. I agree that's a neat behavior. Unfortunately it does imply that methods can't be defined on a sum type itself; you'd have to wrap it in a struct or some other type.

@leighmcculloch
Contributor

leighmcculloch commented Jan 7, 2023

Thanks @zephyrtronium. Taking your feedback into account, and also realizing that it is easy to redefine types, then I think points (2) and (3) I raised are not issues. Type definitions can be used to give the same type different semantics for each case. For example:

type ClaimPredicateUnconditional struct{}
type ClaimPredicateAnd []ClaimPredicate
type ClaimPredicateOr []ClaimPredicate
type ClaimPredicateNot ClaimPredicate
type ClaimPredicateBeforeAbsoluteTime Int64
type ClaimPredicateBeforeRelativeTime Int64

type ClaimPredicate interface {
    ClaimPredicateUnconditional |
    ClaimPredicateAnd |
    ClaimPredicateOr |
    ClaimPredicateNot |
    ClaimPredicateBeforeAbsoluteTime |
    ClaimPredicateBeforeRelativeTime
}

In the main Go code base I work in we have 106 unions implemented as multi-field structs, which require a decent amount of care to use. I think this proposal would make using those unions easier to understand, probably on par in terms of effort to write. If tools like gopls went on to support features like pre-filling out the case statements of a switch based on the type sets, since it can know the full set, that would make writing code using them easier too.

The costs of this proposal feel minimal. Any code using the sum type would experience the type as an interface and have nothing new to learn over that of interfaces. This is I think the biggest benefit of this proposal.

@ncruces
Contributor

ncruces commented Jan 7, 2023

To me, nil seems to be the big question here?

On the one hand, interface types are nilable and their zero value is nil.

On the other hand, union interface constraints made only of non-nilable types prevent a T from being nil, and that behaviour seems useful here as well. Is it that big a can of worms to say these can't be nil?

Exhaustiveness in type switches could potentially be left to tools.

@gophun

gophun commented Aug 16, 2023

@Merovius
Yes, it would be awkward if interface is reused for union types. But if we made them separate (adding a keyword is not completely ruled out) with the "zero value based on first type" rule:

type foo union { a | b }

Then these

interface { a | b }
interface { a | b ; b | a }

could be short form for:

interface { union{ a | b } }
interface { union{ a | b } ; union{ b | a } }

Here the order wouldn't matter.

@Merovius
Contributor

@gophun This issue is about using general interfaces for unions. #19412 is about other options - which each have their own set of problems, but that discussion doesn't belong here. And FWIW, adding a union keyword like you suggest has been discussed over there at length.

@gophun

gophun commented Aug 16, 2023

@Merovius Thank you for the pointer; I'll take it over there. The keyword option was criticized solely on the grounds of being "not backwards compatible," a stance that has been clearly contradicted by the Go project lead.

@Merovius
Contributor

The keyword option was criticized solely on the grounds of being "not backwards compatible,"

That is not true. But again, that discussion doesn't belong here.

@arvidfm

This comment was marked as off-topic.

@zephyrtronium
Contributor

It would not, for a couple reasons:

  1. The type parameter T is set at compile time. &m.Value is a *T where T is one of the elements of constraints.Integer, constraints.Float, string, or []byte, not a sum type. This proposal is about allowing run-time values of interface types that contain union elements; it is mostly orthogonal to type parameters.
  2. constraints.Signed &c. use ~T elements. This proposal does not allow values of interface types when those interfaces contain ~T elements.

I think what you want is #45380.

@zephyrtronium
Contributor

Seeing this example, it occurs to me that #57644 (comment) actually seems to be wrong. Consider these definitions:

type bytestring interface {
    string | []byte
}

func f[T bytestring]() {}

Type bytestring itself can instantiate f if bytestring satisfies bytestring, which it does if bytestring implements bytestring. Since bytestring is an interface, it implements any interface of which its type set is a subset, which trivially includes itself. Therefore f[bytestring] is a legal instantiation.

So, it seems that we need additional adjustments to the spec to make interfaces with union elements legal. Otherwise every type constraint which includes a union element and no ~T terms gains a non-empty set of members, all of interface type, which will be illegal in almost every case.

@zephyrtronium
Contributor

Triple posting aside, discussion on #48522 prompted me to think about what "additional adjustments to the spec" we would actually need for the proposed union types to not break existing code.

My initial thought was "interfaces containing union elements cannot satisfy the interfaces they implement." That would prevent bytestring above from instantiating functions with any constraints, which seems obviously a non-starter.

The minimal condition, in the sense of allowing unions to satisfy the most constraints, would be that they satisfy any constraint of which their operation set is a superset. Being interfaces, the operations they bring are comparison, type assertion, and their methods. Type assertion is mostly irrelevant due to the rules about interfaces in union elements. So, it seems like we could get away with a rule like, "an interface T containing union elements with no ~U terms satisfies constraints that are basic interfaces that T implements, as well as constraints that can be written in the form interface { comparable; E } where E is a basic interface that T implements."

The question then becomes where that rule leaves us with covariance. With this rule, and returning to the definitions above, can we write func g[T bytestring]() { f[T]() }? Since type parameters are interfaces underneath, I think the answer is no. We need more precision to handle type parameters. We end up with a definition of "satisfies" that looks, in total, something like:


A type T satisfies a constraint C if:

  • T is not an interface containing union elements with no ~U terms, or T is a type parameter type, and T implements C;
  • T is not an interface containing union elements with no ~U terms, C can be written in the form interface { comparable; E } where E is a basic interface, and T is comparable and implements E; or
  • T is an interface containing union elements with no ~U terms, but not a type parameter type, and C is a basic interface that T implements or C can be written in the form interface { comparable; E } where E is a basic interface that T implements.

I find this definition hard to follow compared to the current one, but I think it does everything we need for this proposal.

@mikeschinkel

I have been reading this thanks to @Merovius linking it from the [go-nuts] list.

Seems to me the biggest argument against interfaces-as-sum-types is over zero values: Go fundamentally requires zero values and that can't change, there is no consensus on how to arrive at a zero value for these sum types, and others want sum types to have no zero value at all because they see zero values as conflicting with the benefits sum types provide. IOW, a classic catch-22.

If I am understanding this wrong, please let me know.

I think it would be great if this could become a feature of Go so I considered that catch-22 in hopes to resolve it and came up with something I think could work.

The first aspect would be to require that these sum types not be able to be instantiated without providing an explicit value. That would mean some of the following would throw a compiler error:

type Identifier interface {
   int | string 
}
var widgetId Identifier                     // throws compile error
widgetId := Identifier(1)                   // compiles fine
widgetIds := make([]Identifier,3)           // throws compile error 
widgetIds := []Identifier{                  // compiles fine
   Identifier(123), 
   Identifier("happy"), 
   Identifier(456),
}

Unless I miss some way in which a property can get a zero value, the above limitation would be sufficient to ensure that a sum type never had an opportunity to have a zero value (I ignored in my example returning an uninitialized value from a func, but let's assume that is disallowed too.)

If simply disallowing sum types from being created if not initialized is not sufficient, because someone might use CGo or some other edge case to create an uninitialized sum type, then we would need a real zero value. That is where IMO reconsidering the untyped builtin zero (#61372) could fit in. A sum type could have a zero value of just zero and could not otherwise be represented.

So if it comes to pass that a variable or expression of a sum type has a zero value, then using that variable or expression for anything other than assigning it a non-zero value or checking whether it is equal to zero would be a runtime error generating a panic. Since in normal cases it should never have a zero value, a zero value occurring would truly be an exceptional case indicating an error somewhere else in the code, and thus deserving of a panic.

The zero value could be represented internally exactly as an interface containing nil is represented, but for a sum type st then st==nil would always be false and st==zero would only ever be true for an exceptional case that should never happen for normal use cases.

The only real downside I see to this approach would be that you could not pre-create a slice or map with any elements using make().

However, if for our 3rd aspect we allowed extending make() to recognize what I will call an "initializer" then make could initialize sum types to a default value. Consider setting the value of a slice of ten Identifiers to be 0:

ids := make([]Identifier{0},10)

And this sets a slice of 25 Identifiers to be initialized to an empty string:

ids := make([]Identifier{""},25)

So there it is. Please feel free to poke holes in this approach when and if you find any.

P.S. We don't really even need zero to make this work; the zero value could still be nil but would be just as constrained as I described for zero. But having zero would be a nice distinction, because then sum types would be the only types in Go with a zero value but no other representation, vs. having a nil that behaves differently for sum types than for other nilable types.

@ianlancetaylor
Contributor Author

@mikeschinkel Thanks. The idea of not permitting the type to be instantiated without a value has been suggested several times before in the various discussions of sum types. It has always been rejected. Zero values are built into the language too deeply. For one example--and just for one example, there are other problems--how do you handle a type assertion to the sum type if the type assertion fails? What value do you give to the first result of the type assertion?
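As a reminder of how the two-result form behaves in current Go, here is a short sketch. The helper asStringer is hypothetical. When the assertion fails, the first result is the zero value of the asserted type, which for an interface type is nil.

```go
package main

import "fmt"

// asStringer attempts the assertion and returns both results unchanged.
func asStringer(x any) (fmt.Stringer, bool) {
	s, ok := x.(fmt.Stringer)
	return s, ok
}

func main() {
	s, ok := asStringer(42) // int has no String method, so this fails
	fmt.Println(s == nil, ok)
}
```

A sum type without any zero value would need some other answer for what s holds here.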

@mikeschinkel

@ianlancetaylor — Thank you for acknowledging.

I see your perspective in how my suggestion also results in a zero values concern.

how do you handle a type assertion to the sum type if the type assertion fails? What value do you give to the first result of the type assertion?

I expect you meant that as a rhetorical question, but since you posed the question I hope you do not mind me at least answering it.

If a type assertion fails in an assignment to a variable of a sum type then the variable would get the value of zero (now it seems zero is required, after all), as the same rules I outlined above would apply. Doing anything with that variable besides assigning it a non-zero value or testing it for equality with zero should panic.

That seems reasonable to me, at least, because a failed type assertion is a failure so accessing the value of that variable is almost certainly a logic error anyway. Right?

That scenario does bring up a question of whether or not a zero-valued sum type could be passed to a function, and I could go either way on that. It seems less than reasonable to disallow it, but then the panic would occur somewhere other than where it was caused.

I do respect that you and others may view those constraints as not what Go should be, and I will be accepting of that if it is the final ruling.

However, AFAICT, I still think the logic of my suggestion is valid, unless there is some other scenario that emerges that cannot be resolved in the same way as for failed type assertions. 🤷‍♂️

@Merovius
Contributor

@mikeschinkel IMO your suggestion is now back to the point where every sum type has a zero value of nil. It doesn't really matter if it is spelled nil or zero, as far as the contentious questions are concerned, as long as the semantics are the same. Which it seems they mostly are.

That doesn't mean the suggestion isn't viable, it's just that it doesn't differ significantly from what we have been talking about so far.

To me, that means FWIW that there is no need to disallow make etc. either, because if there is any way to create an invalid zero value, usage of a variant needs to be prepared to deal with it. If it needs to be prepared to deal with it, might as well keep more coherency in the language and not treat them specially at the point of creation.

That scenario does bring up a question of whether or not a zero-valued sum-type could be passed to a function, and I could go either way on that.

This has been discussed above as well. It seems hard to impossible to me to disallow it without drastic changes to Go's type system. Whether or not a variable is zero is no longer a static property but a runtime property, and trying to make static assertions about those tends to pretty quickly devolve into solving the halting problem. See also the various suggestions over the years to disallow dereferencing nil pointers statically.

@mikeschinkel

@Merovius

"IMO your suggestion is now back to the point where every sum type has a zero value of nil. It doesn't really matter if it is spelled nil or zero, as far as the contentious questions are concerned, as long as it the semantics are the same."

Admittedly there is not much difference, but there is one tangible difference; consistency.

If we allowed that sum types could just be nil then there would be the contentious question of consistency; i.e. that some variables that can contain nil would be handled differently than others. Using zero instead of nil removes that one objection, which is why I suggested it.

But yes, that is the only difference, however it could be the difference between someone objecting to sum types vs. supporting them. What percentage of people who would do each remains to be seen.

"That doesn't mean the suggestion isn't viable, it's just that it doesn't differ significantly from what we have been talking about so far."

Yes, and the difference is that this combination of things has not been discussed before, at least on this issue, AFAICT.

"To me, that means FWIW that there is no need to disallow make etc."

With the addition of initializers in my suggestion there is no reason to disallow make() either, but I get that is likely orthogonal to your point.

"because if there is any way to create an invalid zero value, usage of a variant needs to be prepared to deal with it. If it needs to be prepared to deal with"

Yes, and that is where we disagree. My suggestion proposes making zero an exceptional case such that the vast majority of code would safely not deal with it because existence of a zero value and subsequent use would in itself be an exceptional case worthy of an immediate panic.

"might as well keep more coherency in the language and not treat them specially at the point of creation."

From a purity standpoint you are probably correct. But my understanding of Go's nature is that they have historically placed emphasis on pragmatism over purity. Otherwise there would have been no append(), copy(), etc.

Respecting the existing nature of the Go language, I am arguing that since it is impossible to find a perfect solution, maybe instead we could be pragmatic and accept a really good one?

The reason I think this approach could work is that I have thus far discovered only one way that a sum type variable could have a zero value during non-exceptional coding practices. That one way is itself a test for validity, so the compiler could ensure that it is not misused if we disallow passing such variables to funcs when they have zero values (your questioning has now convinced me this is the right approach).

"Whether or not a variable is zero is no longer a static property but a runtime property and trying to make static assertions about those tends to pretty quickly devolve into solving the halting problem."

Unless I am missing something, it would seem easy to determine if a variable is used without ok being checked, for our one known non-exceptional case. That is a highly constrained problem vs. trying to determine if an arbitrary program will run forever. We would have to limit to checking ok and not allowing things like if ok || myStatus() == 1 {...}, but that feels like a reasonable limitation given the value offered by a workable sum type, since a simple restructure of that if statement would create a knowable construct.

So I ask this: rather than discuss in abstract terms, can you or others identify places where the compiler could not easily identify when a sum type variable received a value of zero and thus not be able to disallow it?

@mrwonko

mrwonko commented Mar 22, 2024

For one example--and just for one example, there are other problems--how do you handle a type assertion to the sum type if the type assertion fails?

If we had sum types, we could return an Optional sum type. So something like

var typeAsserted optional[MySumType] = nothing
typeAsserted, _ = anything.(MySumType)

(In practice, you would usually not declare the destination separately, I just did it to highlight its type.)
I understand this would be somewhat inconsistent with non-sumtype-type-assertions, but it sidesteps the zero-issue.

But if we can generalize it so nothing=zero=nil, maybe we can treat every type assertion as returning an optional, and optionals are implicitly convertible to zero values where available?
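
As a rough illustration of the optional idea, here is a sketch of how it could be emulated with generics in today's Go. `Optional`, `Some`, `None`, and `Assert` are hypothetical names made up for this example; none of them exist in the language or in the proposal.

```go
package main

import "fmt"

// Optional is a hypothetical wrapper type used only for this sketch.
type Optional[T any] struct {
	value T
	ok    bool
}

// Some wraps a present value.
func Some[T any](v T) Optional[T] { return Optional[T]{value: v, ok: true} }

// None returns an Optional with no value present.
func None[T any]() Optional[T] { return Optional[T]{} }

// Get returns the wrapped value and whether one is present.
func (o Optional[T]) Get() (T, bool) { return o.value, o.ok }

// Assert wraps a comma-ok type assertion in an Optional.
func Assert[T any](v any) Optional[T] {
	if t, ok := v.(T); ok {
		return Some(t)
	}
	return None[T]()
}

func main() {
	_, ok := Assert[int]("hello").Get()
	fmt.Println(ok) // false

	n, ok := Assert[int](7).Get()
	fmt.Println(n, ok) // 7 true
}
```

Note that in this emulation `None` still stores a zero `T` inside the struct, so it moves the zero value out of sight rather than removing it.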

@Merovius
Contributor

Merovius commented Mar 22, 2024

because there is only one way thus far I have discovered thus far that a sum type variable could have a zero value during non-exceptional coding practices

Ian wrote

For one example--and just for one example, there are other problems--how do you handle a type assertion to the sum type if the type assertion fails? What value do you give to the first result of the type assertion?

I feel like that should have made clear that this was just one example. As far as I can tell, to name a few others, you have not yet talked about channel-receives, map-accesses, reflect.New, extra capacity allocated by append, the statement var x T when T is a type-parameter (and any other statically disallowed code for these specific types), named returns (in particular in the presence of panic) or clear on a slice. There might be others.
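
For readers who want to see these cases, the following sketch shows several of them producing a zero value for an interface type in today's Go. `Value` and `Num` are stand-in names invented for the example, an ordinary interface plays the role of the sum type, and `clear` assumes Go 1.21 or later.

```go
package main

import (
	"fmt"
	"reflect"
)

// Value stands in for a sum type; today it is an ordinary interface.
type Value interface{ isValue() }

// Num is one hypothetical variant of Value.
type Num int

func (Num) isValue() {}

// zeroOf declares a variable of a type parameter type and returns it.
func zeroOf[T any]() T {
	var x T
	return x
}

// namedResult panics before assigning v, then recovers, so the caller
// receives v while it still holds its zero value.
func namedResult() (v Value) {
	defer func() { _ = recover() }()
	panic("boom")
}

// zeroSources reports, for each case, whether the result was nil.
func zeroSources() map[string]bool {
	ch := make(chan Value)
	close(ch)

	m := map[string]Value{}

	s := []Value{Num(1), Num(2)}
	clear(s) // sets every element to its zero value

	typ := reflect.TypeOf((*Value)(nil)).Elem()

	return map[string]bool{
		"closed channel receive": (<-ch) == nil,
		"missing map key":        m["missing"] == nil,
		"reflect.New":            reflect.New(typ).Elem().IsNil(),
		"type parameter var":     zeroOf[Value]() == nil,
		"named result + panic":   namedResult() == nil,
		"clear on slice":         s[0] == nil && s[1] == nil,
	}
}

func main() {
	for name, isZero := range zeroSources() {
		fmt.Println(name, isZero)
	}
}
```

None of these sites has an explicit initializer, which is why a rule applied only at variable declarations does not cover them.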

I'll also note that the suggestion to disallow uninitialized values came up in this discussion before and most of this list has been posted there as well.

And while I appreciate that it is frustrating to be told that something you see as an easy solution is unworkable, I'd also ask for a little bit of trust that when people like Ian or I say things like "Zero values are built into the language too deeply", it's not just an off-the-cuff remark. We wouldn't say that, if we saw a realistic way to make it work.

In particular, listing instances of where zero values are mentioned in the spec is not meant as a request to special case solutions to them, but as a demonstration of what we mean when we say "zero values are built into the language too deeply".

@icholy

icholy commented Mar 22, 2024

This might be a bit off-topic, but have zero value semantics like this been discussed?:

func main() {
	var m map[string]string

	assert(m == zero)
	assert(m == nil)

	m = nil

	assert(m != zero)
	assert(m == nil)

	m = zero

	assert(m == zero)
	assert(m == nil)
}

func assert(b bool) {
	if !b {
		panic("assertion failed")
	}
}

@Merovius
Contributor

And FWIW

Respecting the existing nature of the Go language, I am arguing that since it is impossible to find a perfect solution, maybe instead we could be pragmatic and accept a really good one?

I assume that the pragmatic solution we will eventually adopt (if any) is to use union-element interfaces as variants and make their zero value nil. To be clear, that's not my favorite solution, just the one that seems most pragmatic, given where we are. But it requires accepting the bad that comes with it and I don't resent the fact that we don't do that lightly.
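
To illustrate what accepting a nil zero value means for callers, here is a sketch using the closest emulation available today, a sealed interface. `Identifier`, `IntID`, `StringID`, and `describe` are hypothetical names for this example; under the proposal the type would presumably be written as an interface embedding a union instead.

```go
package main

import "fmt"

// Identifier emulates a two-variant sum type with an unexported marker
// method, so only types in this package can implement it.
type Identifier interface{ isIdentifier() }

type IntID int
type StringID string

func (IntID) isIdentifier()    {}
func (StringID) isIdentifier() {}

// describe handles every variant and, explicitly, the nil zero value.
func describe(id Identifier) string {
	switch v := id.(type) {
	case IntID:
		return fmt.Sprintf("int %d", int(v))
	case StringID:
		return fmt.Sprintf("string %q", string(v))
	case nil:
		return "nil (zero value)"
	default:
		return "unknown"
	}
}

func main() {
	var id Identifier // zero value is nil
	fmt.Println(describe(id))
	fmt.Println(describe(IntID(7)))
	fmt.Println(describe(StringID("a")))
}
```

The `case nil` branch is the cost being discussed: every exhaustive switch needs it, or needs to accept falling through.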

@ianlancetaylor
Contributor Author

@mrwonko Thanks. In these kinds of discussions, it is always possible to find a solution for any given problem. But it is also necessary to step back and consider the overall picture. Go is intended to be a reasonably simple, reasonably orthogonal language. When we add special cases we weaken those properties.

This proposal is, I think, a somewhat simple, reasonably orthogonal, change that we could make. The question here is not how to complicate it to make it better. We're almost certainly not going to do that. Rather than make it more complicated, we will choose to make no change at all. The question here is whether to make this change at all--that is, whether the benefits of the change are worth adding more complexity to the language. Or perhaps we can find a way to make it more simple and more orthogonal.

@ngortheone

Odin lang is in many ways inspired by golang.

Odin has an enum type that expresses the sum type idea.
To instantiate an enum variable, one has to spell out the concrete type:

Foo :: enum {
	A,
	B,
	C,
	D,
}

f := Foo.A

https://odin-lang.org/docs/overview/#partial-switch

It is true that if golang tries to implement sum types via extending/overloading interface, complications are guaranteed.
But what about creating a separate keyword, enum? This helps to sidestep a lot of the complications that come from interface.

type Foo enum {
    A string
    B int
}

f := Foo.A // OK
b := Foo   // Compile failure, unspecified concrete type

@Merovius
Contributor

Merovius commented Mar 22, 2024

@ngortheone Note that this issue is specifically about using union-elements in interfaces as variants. There are other issues (my personal favorite is #54685) to discuss other ideas and #19412 as an umbrella issue for the general idea of variant types.

I'll note that the vague notion of adding a new syntactical construct and type kind has been suggested a lot of times so far, so your suggestion isn't really novel.

@ngortheone

so your suggestion isn't really novel

That probably means that the solution space to the sum type problem is small and the search has already exhausted all good options. The main question now is:

Knowing all the pros and cons of each solution, will golang decide to go for any solution at all?

@mikeschinkel

mikeschinkel commented Mar 23, 2024

I feel like that should have made clear that this was just one example.

Logically-speaking, why? I addressed the one example, and was looking forward to considering others.

As far as I can tell, to name a few others, you have not yet talked about channel-receives, map-accesses, reflect.New, extra capacity allocated by append, the statement var x T when T is a type-parameter (and any other statically disallowed code for these specific types), named returns (in particular in the presence of panic) or clear on a slice.

I would tackle each of these, but from the tone of your arguments I don't feel like continuing what is evidently a contentious debate with you here on this issue.

I'll also note that the suggestion to disallow uninitialized values came up in this discussion before and most of this list has been posted there as well.

I had searched for "disallow" and "initialized" on this page prior to my posting and they appeared nowhere.

I searched again just now, but this time I opened up all the posts marked "off-topic" and found it was you telling @atdiar it wasn't possible, that culminated in his frustrated (your word) post before Robert Griesemer called for respectful discussion.

However, nowhere in that dialog did anyone other than you — i.e. no one from the Go team — argue against the idea.

So my takeaway is that you are asserting that if you already expressed an opinion against something that no one else should be able to discuss it? Just wanting to make sure I understand correctly.

And while I appreciate that it is frustrating to be told that something you see as an easy solution is unworkable,

No, it is absolutely not frustrating to be told that what I presented as a strawman proposal is unworkable when objective and specific arguments against it are given. The entire point of such a proposal is to flesh out its feasibility.

"I'd also ask for a little bit of trust... We wouldn't say that, if we saw a realistic way to make it work. "

What is frustrating instead is to be told, effectively, "We have already considered every conceivable option and so you should just trust me that you have no value to offer here."

when people like Ian or I say things like "Zero values are built into the language too deeply", it's not just an off-the-cuff remark.

As George Bernard Shaw said, "The single biggest problem in communication is the illusion that it has taken place." You assume the statement "Zero values are built into the language too deeply" is interpreted in a binary form, exactly as you understand it, without recognizing that others don't interpret that statement the same way.

From my perspective my proposal absolutely respected that statement; why else would I have included the concept of zero to be applied to sum types if I was disrespecting Ian's comment? I was explicitly trying to address how useful sum types and zero values could coexist.

BTW, I really like how Ian engages in discussions on this forum. He always replies in a respectful manner, makes a statement when he needs to, but evidently doesn't feel the need to debate everyone who has a proposal, even if it is not one they will pursue. The Go team then ultimately makes their decisions and we all move forward. His approach makes everyone feel as if they can contribute, but he is tactful when a discussion gets out of control and reins it in with a statement of intent. It would be a lot nicer in these forums if everyone were able to participate without any self-appointed gatekeepers.

@Merovius
Contributor

Merovius commented Mar 23, 2024

@mikeschinkel FWIW there is also #19412, which is more general, so contains more discussions of broader proposals than this one. Searching that casually brings up more discussion about these specific problems, involving a lot more people than me, including people on the Go team.

So, apologies for writing "this discussion". It was imprecise. The general discourse about variants has been going on for a while and I don't always remember where all parts of it happened.

@thepudds
Contributor

Just to briefly underscore one point @Merovius has made a few times, I thought it might be helpful to re-post this snippet of Ian's original proposal text (from top comment above):

In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

(And of course, given how bad GitHub issues are for long conversations, it's worth keeping in mind the benefits of scaling with many conversations in places outside of GitHub issues, such as the #generics channel of Gopher Slack, which has friendly & thoughtful discussions, or elsewhere like golang-nuts, r/golang on Reddit, or by sharing a Gist you wrote in one of those places, etc.).

@mikeschinkel

@thepudds — Given your comment it is worth noting that while some people may see discussion as being a different proposal, others making suggestions see it as addressing ways to make the original proposal viable.

Also, given the concept of scaling with many conversations, it would be respectful of, and incumbent on, those who have the time to seek out and follow many different discussions in many different places that not everyone is fully aware of to not seek to tamp down comments by others without at least first linking to their specific points from those other discussions, and especially before admonishing people for discussing things "that have already been addressed and resolved," but elsewhere. #fwiw

@perj

perj commented Apr 16, 2024

I think it would be very helpful if the resolved concrete types were also possible to list using the reflect package. If they are, it would be possible to add functionality to json.Unmarshal to write to these interfaces.

That is, this example would work, if json.Unmarshal would get the types [int, string] from the passed pointer and try to decode each in turn.

func main() {
	var v interface{ string | int }
	err := json.Unmarshal([]byte(`42`), &v)
	fmt.Println(v, err)
}

would print 42, and v would have the underlying type int. For var v any the underlying type would be float64.
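
The `any` case can be checked today. This minimal sketch uses a helper named `decodeAny`, invented for the illustration, to show the dynamic type that `json.Unmarshal` chooses for a JSON number.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAny unmarshals data into an empty interface and returns the result.
func decodeAny(data []byte) (any, error) {
	var v any
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	v, err := decodeAny([]byte(`42`))
	fmt.Printf("%v %T %v\n", v, v, err) // 42 float64 <nil>
}
```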

I don't think the json package functionality would have to be part of this exact proposal. There would have to be decisions made about error handling, for example. But I would expect some support in the reflect package. Presumably reflect.Type.Implements will also need this list, regardless.

// SumTypes returns the concrete types implementing t.
// It panics if t is not an interface type.
// It returns nil if t does not have any type constraints set.
// The returned types are sorted in lexicographic order, including package path.
func SumTypes(t Type) []Type

With that documentation, I suppose methods would also be checked, otherwise Implements(t) might still return false on some of the returned types.
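
Here is a sketch of the "try to decode each in turn" idea using only what reflect and encoding/json offer today. Since `SumTypes` does not exist, the candidate list is passed in by hand; `unmarshalOneOf` and `intOrString` are hypothetical names for this illustration, and the error handling is deliberately naive.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// intOrString lists the candidate types by hand, in the lexicographic order
// the suggested SumTypes documentation describes.
var intOrString = []reflect.Type{reflect.TypeOf(0), reflect.TypeOf("")}

// unmarshalOneOf is a hypothetical helper: it tries to decode data into a
// fresh value of each candidate type and returns the first that succeeds.
func unmarshalOneOf(data []byte, candidates ...reflect.Type) (any, error) {
	for _, t := range candidates {
		p := reflect.New(t) // p holds a *T pointing at a zero T
		if err := json.Unmarshal(data, p.Interface()); err == nil {
			return p.Elem().Interface(), nil
		}
	}
	return nil, errors.New("no candidate type matched")
}

func main() {
	v, err := unmarshalOneOf([]byte(`42`), intOrString...)
	fmt.Printf("%v %T %v\n", v, v, err)

	v, err = unmarshalOneOf([]byte(`"x"`), intOrString...)
	fmt.Printf("%v %T %v\n", v, v, err)
}
```

With this approach the order of candidates matters whenever more than one type accepts the input (a JSON `null`, for example, decodes without error into the first candidate), which is one of the error handling decisions mentioned above.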
