proposal: spec: add native support for programming persistent memory in Go #43810

Closed
jerrinsg opened this issue Jan 20, 2021 · 41 comments

@jerrinsg
Contributor

Persistent Memory is a new memory technology that allows byte-addressability at DRAM-like access speed and provides disk-like persistence. Applications using persistent memory benefit in a number of ways such as seeing improved performance and faster restart times. More details on this technology can be found at pmem.io.

This is a proposal to add native support for programming persistent memory in Go. A detailed design of our approach to add this support is described in our 2020 USENIX ATC paper go-pmem. An implementation of the above design based on the Go 1.15 release is available here. In summary, adding support for natively programming persistent memory requires the following capabilities to be added to Go:

  • Manage a garbage-collected persistent memory heap
  • Provide an interface for applications to allocate objects in persistent memory heap
  • Enable applications to make crash-consistent updates to data in persistent memory
  • Support applications to recover following a crash/restart

There exist libraries such as Intel PMDK that provide C and C++ developers support for persistent memory programming. Other programming languages such as Java and Python are also exploring ways to enable efficient access to persistent memory. But no language provides native persistent memory programming support. This proposal attempts to remedy this problem by making Go the first language to completely support persistent memory.

Since adding this support involves significant changes to the language runtime and compiler, we have also prepared a design document that I will attach to this proposal.

@gopherbot gopherbot added this to the Proposal milestone Jan 20, 2021
@jerrinsg
Contributor Author

@zephyrtronium
Contributor

I spend enough time explaining to new Go users why they shouldn't use println. Please don't make me explain why they shouldn't use pmake.

@DarkGhostHunter

I spend enough time explaining to new Go users why they shouldn't use println. Please don't make me explain why they shouldn't use pmake.

Because...

@fbnz156

fbnz156 commented Jan 20, 2021

Honestly, I don't like the idea of persistent memory in general, but I especially don't like the idea of it in a GC language like Go. Please don't.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jan 20, 2021
@ianlancetaylor
Contributor

Is a single root sufficient? A package that is aware of persistent memory might find it useful to declare its own package-local persistent memory data structures, but it seems awkward to have to tie that into a single program root.

This is likely obvious to people familiar with persistent memory, but from reading the design doc I don't understand how a transaction can fail, or what happens if a transaction does fail. What is the purpose of the txn keyword? And why are there parentheses after txn?

Does anything prevent storing a pointer to non-persistent memory into persistent memory? Presumably doing so would lead to data corruption if the program restarts.

@gopherbot

Change https://golang.org/cl/284992 mentions this issue: design: persistent memory support in Go

gopherbot pushed a commit to golang/proposal that referenced this issue Jan 21, 2021
Design document for the proposal - add native support for programming
persistent memory in Go (https://golang.org/issue/43810).

For golang/go#43810

Change-Id: I0b237f7e07634c0bc9d0dbadfc03f37910b83bce
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/284992
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@mohit10verma
Contributor

Hi Ian,
Here are the answers:

Is a single root sufficient? A package that is aware of persistent memory might find it useful to declare its own package-local persistent memory data structures, but it seems awkward to have to tie that into a single program root.

There can be multiple roots to persistent memory and our current design allows that. Our example shows a single root, but it is possible for users to make multiple roots (we call them named objects) using the pmem package with the pmem.New() function and later retrieve these roots with pmem.Get(). The slice of these named objects is pointed to by runtime.SetRoot().
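To make the named-object idea concrete, here is a minimal sketch in plain Go. The `registry` type, its `New`/`Get` methods, and the `account` type are hypothetical stand-ins that only mirror the shape described above; they are not the go-pmem signatures, and the table lives in ordinary volatile memory so the sketch runs anywhere.

```go
package main

import "fmt"

// account is a hypothetical application type used only for illustration.
type account struct{ balance int }

// registry is a hypothetical stand-in for a table of named roots.
// In the design described above the table would live in persistent memory;
// here it is an ordinary map.
type registry struct {
	roots map[string]any
}

func newRegistry() *registry {
	return &registry{roots: make(map[string]any)}
}

// New registers obj under name and reports whether the name was free.
func (r *registry) New(name string, obj any) bool {
	if _, exists := r.roots[name]; exists {
		return false
	}
	r.roots[name] = obj
	return true
}

// Get returns the object registered under name, if any.
func (r *registry) Get(name string) (any, bool) {
	obj, ok := r.roots[name]
	return obj, ok
}

func main() {
	r := newRegistry()
	acct := &account{balance: 100}
	fmt.Println(r.New("alice", acct)) // true
	fmt.Println(r.New("alice", acct)) // false: the name is already taken

	// A package can look up its own root by name without knowing about others.
	if obj, ok := r.Get("alice"); ok {
		fmt.Println(obj.(*account).balance) // 100
	}
}
```

Each package that wants its own persistent data structure would pick its own name, which is one way to avoid tying everything to a single program root.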

This is likely obvious to people familiar with persistent memory, but from reading the design doc I don't understand how a transaction can fail, or what happens if a transaction does fail. What is the purpose of the txn keyword? And why are there parentheses after txn?

A transaction can fail, for example, if a program or system crashes. The txn keyword is used to demarcate the boundaries of a transaction. There are various alternatives for doing this: for example, use the //go:transactional pragma or, as you mentioned, don’t use parentheses after txn. This is syntactic sugar in our current design and it can change.

Does anything prevent storing a pointer to non-persistent memory into persistent memory? Presumably doing so would lead to data corruption if the program restarts.

In case there is a pointer to non-persistent memory from persistent memory, all these pointers are zeroed out when a program restarts

@ianlancetaylor
Contributor

If the language supported generics, then as the design doc notes we could implement pnew and pmake as library functions. Can txn be implemented as a library function? That is, could we have beginTransaction and endTransaction? Or does it have to be implemented in the compiler?

@davecheney
Contributor

In case there is a pointer to non-persistent memory from persistent memory, all these pointers are zeroed out when a program restarts

How is that better than pointing to a random area of memory? I guess the program blows up dereferencing a nil pointer rather than a pointer to random memory, but this doesn’t sound very useful if every reference type is spring loaded with nil even if the original author went to lengths to avoid creating objects with invalid references.

Also, how does persistent memory keep track of everything that is a pointer without confusing it with large numbers, floats, strings, etc?

@mohit10verma
Contributor

If the language supported generics, then as the design doc notes we could implement pnew and pmake as library functions. Can txn be implemented as a library function? That is, could we have beginTransaction and endTransaction? Or does it have to be implemented in the compiler?

Within a txn block, we instrument stores to persistent memory and inject calls to an undo log before the actual updates are made. (We have an SSA pass, similar to the writeBarrier SSA pass, to do this.) These logs are needed to recover from crashes should they occur. Even with a library, the compiler needs to be aware of a transactional region and instrument the stores.

Alternatively, instead of a language change we can have pragmas like //go:transactional for functions.

@jerrinsg
Contributor Author

Hi Dave,

Thanks for the questions.

In case there is a pointer to non-persistent memory from persistent memory, all these pointers are zeroed out when a program restarts

How is that better than pointing to a random area of memory? I guess the program blows up dereferencing a nil pointer rather than a pointer to random memory, but this doesn’t sound very useful if every reference type is spring loaded with nil even if the original author went to lengths to avoid creating objects with invalid references.

Pointing to a random area of memory when an application restarts is much more dangerous than overwriting those pointers with nil pointers. The random area in memory could be an unmapped region of memory which upon accessing will result in the OS killing that application, or it could be a region in a span that is unused, which violates the invariants kept by the garbage collector. Even worse, it could point to a valid region in memory, and the application could unintentionally corrupt such memory.
Even though zeroing is not ideal, we felt it was the right compromise. Applications that incorrectly store such references will crash on accessing such data. Theoretically, it also facilitates a defensive style of programming where applications can check the validity of each pointer in persistent memory before accessing it.
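As a rough illustration of the zeroing step, the sketch below models pointer slots as plain addresses and clears every slot whose target falls outside the persistent heap. The `heapRange` type and `scrub` function are hypothetical names, and the addresses are made up; the actual runtime works on real heap words located through its pointer bitmaps.

```go
package main

import "fmt"

// heapRange models the address range occupied by the persistent heap.
type heapRange struct{ base, size uintptr }

// contains reports whether address p lies inside the range.
func (h heapRange) contains(p uintptr) bool {
	return p >= h.base && p-h.base < h.size
}

// scrub zeroes every pointer slot whose target lies outside the persistent
// heap and returns how many slots it cleared. Nil slots are left alone.
func scrub(slots []uintptr, pmem heapRange) int {
	cleared := 0
	for i, p := range slots {
		if p != 0 && !pmem.contains(p) {
			slots[i] = 0
			cleared++
		}
	}
	return cleared
}

func main() {
	pmem := heapRange{base: 0x1000, size: 0x1000}
	// The second slot points into (former) volatile memory.
	slots := []uintptr{0x1008, 0x9000, 0, 0x1ff8}

	n := scrub(slots, pmem)
	fmt.Println("cleared:", n)

	// Defensive style: check each slot before following it.
	for i, p := range slots {
		if p == 0 {
			fmt.Println("slot", i, "is nil, skipping")
		}
	}
}
```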

Also, how does persistent memory keep track of everything that is a pointer without confusing it with large numbers, floats, strings, etc?

During a regular run of an application, the persistent memory heap is managed by the Go runtime exactly like the volatile memory heap. Both are managed in 64MB arenas, and each arena stores metadata about its region such as the heap type bitmap, span map, etc. (this metadata is stored in volatile memory). In addition, we added code to log some of this metadata in a header region of each persistent memory arena. Two types of metadata are logged: heap type bitmaps and the span table. The heap type bitmap is the exact bitmap that the Go runtime uses to identify pointers in the heap. The span table stores information about each span in the arena such as its sizeclass, needzero parameter, number of pages (for large spans), etc. On an application restart, the heap type bitmap is restored as is from the persistent memory arena header to its corresponding arena metadata in volatile memory. Spans are recreated using the span table. The GC then uses the restored heap type bits to identify pointers in the heap (just like for volatile memory).
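A toy version of the pointer bitmap may help show why large integers are not mistaken for pointers. The `ptrBitmap` type and `pointerWords` function below are hypothetical and far simpler than the runtime's real heap bitmap: one bit per word, a copy standing in for the arena header, and no span table.

```go
package main

import "fmt"

// ptrBitmap holds one bit per heap word; a set bit means "this word is a pointer".
type ptrBitmap []uint8

func newPtrBitmap(words int) ptrBitmap { return make(ptrBitmap, (words+7)/8) }

// mark records that the given word holds a pointer.
func (b ptrBitmap) mark(word int) { b[word/8] |= 1 << (word % 8) }

// isPointer reports whether the given word was marked as a pointer.
func (b ptrBitmap) isPointer(word int) bool { return b[word/8]&(1<<(word%8)) != 0 }

// pointerWords returns the indices of the words the bitmap marks as pointers.
func pointerWords(heap []uint64, bm ptrBitmap) []int {
	var out []int
	for i := range heap {
		if bm.isPointer(i) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Words 1 and 3 hold pointers; word 2 is an integer that merely
	// looks like an address.
	heap := []uint64{7, 0xc000010000, 0xc000010000, 0xc000020000}
	bm := newPtrBitmap(len(heap))
	bm.mark(1)
	bm.mark(3)

	// "Persist" the bitmap by copying it, as if written to an arena header,
	// then restore it after a simulated restart.
	header := append([]uint8(nil), bm...)
	restored := ptrBitmap(header)

	fmt.Println(pointerWords(heap, restored)) // [1 3]
}
```

The decision is driven entirely by the recorded type information, never by the bit pattern stored in the word.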

@jerrinsg
Contributor Author

I spend enough time explaining to new Go users why they shouldn't use println. Please don't make me explain why they shouldn't use pmake.

I'm not sure what concern you are expressing here. Language changes such as pnew/pmake are not necessary to add support for programming persistent memory. We have explored an alternative using generics where this functionality can be provided by functions exported by a package - https://github.com/golang/proposal/blob/master/design/43810-go-pmem.md#compatibility.

@ulikunitz
Contributor

My concern is that pmem is not available on most hardware that developers are using today. That leads me to the question, whether it could be emulated with memory-mapped disk files?

Usually the Go philosophy is to implement abstract concepts and map them to hardware features if they are supported. Examples are goroutines or the functions in math/bits.

@daheige

daheige commented Jan 21, 2021

support this proposal

@rogpeppe
Contributor

rogpeppe commented Jan 21, 2021

I have questions :)
Disclaimer: I'm not previously familiar with the concept of persistent memory, so some questions might well be misguided.

The docs say that pmemory is allocated similarly to Go, with the pnew function. But in Go the new builtin is almost never used. Instead, memory is most often allocated by using struct or slice literals. Would that be possible with this proposal? Can you allocate a map inside pmem? What about other kinds of data structures? I'm thinking that it would be nice to be able to re-use existing Go algorithm implementations without making a specialised copy that works only on pmem.

When you have a program that's using pmem you have two kinds of memory, not distinguished in the type system: volatile and persistent. If I understand correctly, if you accidentally store a pointer to volatile memory inside persistent memory, you'll lose that data on restart. I could see that as being a fruitful source of bugs: is this issue amenable to some kind of static analysis to avoid those bugs?

Persistent memory is accessed via a file-like API, so it should be possible to access more than one at a time, but the pmem package proposed here is global - it doesn't seem to provide any way to access more than one pmem file at a time from the same program. Is this a fundamental constraint?

Is pmem access fundamentally exclusive? That is, is it possible to run two instances of the same pmem program concurrently? In my brief scan through some documents and code, I didn't see anything that indicated that it was. And ISTM that it would potentially be useful to be able to have multiple processes running with access to the same persistent dataset. If access is not exclusive, that raises the issue of synchronisation primitives: would a sync.Mutex or a sync.Cond work in persistent memory? If access is not exclusive, then is it possible to do garbage collection at all? The possibility of having pointers from persistent to volatile memory seems particularly hazardous in this scenario.

@raspi

raspi commented Jan 21, 2021

Since the tech is quite new, how's the security? I didn't find any papers evaluating security of it. Security is hardly mentioned in the docbook. From that perspective it's another Row Hammer / Spectre / etc waiting to happen.

@mohit10verma
Contributor

mohit10verma commented Jan 21, 2021

My concern is that pmem is not available on most hardware that developers are using today. That leads me to the question, whether it could be emulated with memory-mapped disk files?

Usually the Go philosophy is to implement abstract concepts and map them to hardware features if they are supported. Examples are goroutines or the functions in math/bits.

Operating systems like Linux can mmap persistent memory into the virtual address space. Using this, it is possible to fake persistent memory for testing purposes using something like RAMdisk etc.

@thejerf

thejerf commented Jan 21, 2021

This seems to propose adding software transactional semantics to the base layer of Go. As far as I know, trying to create a practical STM has been an unremitting failure in every language that does not have type-system level support for isolating IO from STM, despite very significant effort poured into the effort by well-funded attempts. I see nothing about Go that would mitigate this in Go; quite the contrary, it lacks even some of the tools that languages like C# had, which still proved ultimately inadequate to the task.

Moreover, I envision trying to use this in practical code, and it seems like this API will ultimately prove very inadequate and become a real thorn-in-the-side on an ongoing basis as it requires more and more extensions (even beyond the STM problems in the previous paragraph) to make it even possible to address problems arising in field usage. For instance, the paper mentions that memory leaks are more dangerous in persisted memory because they survive a reboot. However... memory leaks will happen. "Just don't leak" isn't a solution. So, how are we supposed to address them? The mentioned API has nothing that could solve that problem that I can see. The obvious extensions will be nearly unusable, IMHO; it would take something relatively deep.

This would seem to be setting up for very long term support for this use case that will require wildly disproportional effort to maintain vs. the amount of use it's going to have in the broader Go community compared to the existing code base.

It seems like you'd be much happier just forking Go and being able to do what you need to do to deepen this integration without having to run everything past the core Go team, who are always going to be having to balance your niche desires vs the much larger core and generally you're going to lose.

@ulikunitz
Contributor

@mohit10verma Apologies for not being clear enough. I intended to ask whether persistent memory can be emulated on hardware without persistent memory. My guess is yes, but it would be much slower.

@mohit10verma
Contributor

mohit10verma commented Jan 21, 2021

This seems to propose adding software transactional semantics to the base layer of Go. As far as I know, trying to create a practical STM has been an unremitting failure in every language that does not have type-system level support for isolating IO from STM, despite very significant effort poured into the effort by well-funded attempts. I see nothing about Go that would mitigate this in Go; quite the contrary, it lacks even some of the tools that languages like C# had, which still proved ultimately inadequate to the task.

Yes, you are right. Making updates to persistent memory in a crash-consistent way with the help of the language needs Software Transactional Memory support. And as you mentioned, there have been many unsuccessful efforts. For example, C# tried and failed to support STM. Our current design also has open issues like how to handle IO within a transaction.
An alternative is to not have transactions built into Go, but leave it to the user to write their own transactions (like databases do). This makes the programming difficult (all stores will have to be manually logged by the user; our transaction package is an example of such a user-level package).
In this approach, Go manages the persistent memory heap (allocation + garbage collection) and leaves it to the user to write transaction packages. In fact, we have seen this proposal for Java.

Moreover, I envision trying to use this in practical code, and it seems like this API will ultimately prove very inadequate and become a real thorn-in-the-side on an ongoing basis as it requires more and more extensions (even beyond the STM problems in the previous paragraph) to make it even possible to address problems arising in field usage. For instance, the paper mentions that memory leaks are more dangerous in persisted memory because they survive a reboot. However... memory leaks will happen. "Just don't leak" isn't a solution. So, how are we supposed to address them? The mentioned API has nothing that could solve that problem that I can see. The obvious extensions will be nearly unusable, IMHO; it would take something relatively deep.

“memory leaks will happen” is not correct. We extend the Go memory allocator and garbage collector to manage persistent memory, and the Go garbage collector guarantees not to leak memory.

This would seem to be setting up for very long term support for this use case that will require wildly disproportional effort to maintain vs. the amount of use it's going to have in the broader Go community compared to the existing code base.

Yes it is long term and so we opened this proposal for the community to talk about.

It seems like you'd be much happier just forking Go and being able to do what you need to do to deepen this integration without having to run everything past the core Go team, who are always going to be having to balance your niche desires vs the much larger core and generally you're going to lose.

As you suggested, we do have a fork of Go-1.15 with all our changes. We maintain this fork here. Here is the design document we posted along with this proposal. Applications using persistent memory benefit from very fast restart times and more throughput compared to running on traditional IO devices. We already see some language communities discussing persistent memory (Java, Python extensions) and so we opened this proposal.

@jerrinsg
Contributor Author

jerrinsg commented Jan 21, 2021

I have questions :)
Disclaimer: I'm not previously familiar with the concept of persistent memory, so some questions might well be misguided.

We do appreciate all the questions :)

The docs say that pmemory is allocated similarly to Go, with the pnew function. But in Go the new builtin is almost never used. Instead, memory is most often allocated by using struct or slice literals. Would that be possible with this proposal? Can you allocate a map inside pmem? What about other kinds of data structures? I'm thinking that it would be nice to be able to re-use existing Go algorithm implementations without making a specialised copy that works only on pmem.

Our implementation does not support allocating persistent memory using struct or slice literals. This is not a fundamental limitation - just that we have not added such support. But it will definitely need some new syntax to distinguish between persistent and volatile memory allocations.
We do not support creating maps in persistent memory using our pmake() API. Other data structures can be allocated using the pnew() API. To an application, persistent memory looks like just another region in the heap, and data in persistent memory is accessed using regular pointers. So existing Go algorithms can run on data in persistent memory but they may not offer important guarantees such as crash consistency.

When you have a program that's using pmem you have two kinds of memory, not distinguished in the type system: volatile and persistent. If I understand correctly, if you accidentally store a pointer to volatile memory inside persistent memory, you'll lose that data on restart. I could see that as being a fruitful source of bugs: is this issue amenable to some kind of static analysis to avoid those bugs?

Yes, we do allow pointers to volatile memory to be stored inside persistent memory. On application restart, any such pointers are zeroed out. We do agree this is not ideal. Static analysis is unlikely to catch all sources of such bugs, as what gets stored in a pointer is very run-time dependent. We have seen a few solutions being proposed to mitigate such problems. For example, Autopersist uses a reachability analysis to automatically persist all objects reachable from a durable root in persistent memory.

Persistent memory is accessed via a file-like API, so it should be possible to access more than one at a time, but the pmem package proposed here is global - it doesn't seem to provide any way to access more than one pmem file at a time from the same program. Is this a fundamental constraint?

This is not a fundamental constraint. It is possible to extend the API to allow persistent memory to be mapped from multiple files. The PMDK library from Intel, which is a persistent memory programming library for C and C++, provides similar functionality, allowing users to create multiple pools, where each pool is backed by a separate file in persistent memory. But PMDK has the limitation that each pool is created with a fixed size. Extending a pool is not possible once it is filled up. In our design, applications start with a small heap that is dynamically expanded on demand. Multiple object groups can be identified using named objects. So we did not find a need to have persistent memory be mapped from multiple files.

Is pmem access fundamentally exclusive? That is, is it possible to run two instances of the same pmem program concurrently? In my brief scan through some documents and code, I didn't see anything that indicated that it was. And ISTM that it would potentially be useful to be able to have multiple processes running with access to the same persistent dataset. If access is not exclusive, that raises the issue of synchronisation primitives: would a sync.Mutex or a sync.Cond work in persistent memory? If access is not exclusive, then is it possible to do garbage collection at all? The possibility of having pointers from persistent to volatile memory seems particularly hazardous in this scenario.

Our model exposes each persistent memory file to be used exclusively by a process - i.e., we assume another process is not using the same file at the same time. But again, this is not a fundamental limitation. Persistent memory files are like regular files on a disk. It is possible for two processes to map the same file into their address space and carefully curate access to the same file but in our experience this is very unusual. sync.Mutex and sync.Cond do work in persistent memory, but they may need to be reinitialized during application restart to ensure no stale state persists.
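The point about reinitializing synchronization primitives can be illustrated with ordinary Go. In this sketch the `counter` type and its `reinit` method are hypothetical, and the "crash" is simulated by simply never unlocking; nothing here touches real persistent memory. It assumes Go 1.18 or later for `TryLock`.

```go
package main

import (
	"fmt"
	"sync"
)

// counter stands in for a structure that lives in persistent memory.
type counter struct {
	mu sync.Mutex
	n  int
}

// reinit resets lock state that must not survive a restart.
func (c *counter) reinit() { c.mu = sync.Mutex{} }

// inc increments the counter under the lock.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func main() {
	c := &counter{}
	c.mu.Lock() // the process "crashes" while holding the lock
	c.n = 41

	// After restart the data is intact but the mutex is still marked locked.
	fmt.Println(c.mu.TryLock()) // false

	c.reinit()
	c.inc()
	fmt.Println(c.n) // 42
}
```

Without the `reinit` step, the first `inc` after restart would block forever on a lock whose owner no longer exists.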

@mohit10verma
Contributor

mohit10verma commented Jan 21, 2021

Since the tech is quite new, how's the security? I didn't find any papers evaluating security of it. Security is hardly mentioned in the docbook. From that perspective it's another Row Hammer / Spectre / etc waiting to happen.

I am not a security expert but I found some references to the security model of persistent memory.
This document mentions that data on persistent memory is encrypted using 256-bit AES hardware encryption.

@jerrinsg
Contributor Author

@mohit10verma Apologies for not being clear enough. I intended to ask whether persistent memory can be emulated on hardware without persistent memory. My guess is yes, but it would be much slower.

Hi @ulikunitz,
Our programming model requires users to provide a file as an argument to the pmem.Init() API that initializes persistent memory. But this file need not reside in persistent memory. It could be a file in a ramdisk as Mohit mentioned, or a file on a regular disk. These are great to experiment with the new programming model for persistent memory but they will not offer the data persistence or data access speed benefits (respectively) of using persistent memory.
It is also possible to configure Linux boot options to earmark a portion of the DRAM to appear as persistent memory. See this guide for an example. Emulating DRAM as persistent memory gives you DRAM-like access speed but no data persistence. Hope that clarifies.

@ulikunitz
Contributor

@jerrinsg Thanks, excellent explanation.

@rogpeppe
Contributor

Our implementation does not support allocating persistent memory using struct or slice literals. This is not a fundamental limitation - just that we have not added such support. But it will definitely need some new syntax to distinguish between persistent and volatile memory allocations.

Yes, we do allow pointers to volatile memory to be stored inside persistent memory. On application restart, any such pointers are zeroed out. We do agree this is not ideal.

This is not a fundamental constraint. It is possible to extend the API to allow persistent memory to be mapped from multiple files.

Our model exposes each persistent memory file to be used exclusively by a process - i.e., we assume another process is not using the same file at the same time. But again, this is not a fundamental limitation.

ISTM that there are enough open possibilities here that it would be wrong to tie Go down by accepting a proposal at this stage. It's very early days for persistent memory and we still don't know how things might develop.

Persistent memory is definitely an interesting area to explore, but I think this issue is a non-starter at the current time and in its current state.

@thejerf

thejerf commented Jan 22, 2021

Our current design also has open issues like how to handle IO within a transaction.

This is the issue. Adding the primitives was never the problem for other languages. In practice, as you compose transactions together the probability that something would go wrong with repeating IO approached 1 too quickly for real programmers. What works in a small sample function didn't work in the real world.

The reason I suggest you just fork Go is that with that freedom, you stand a chance of solving it. I can imagine three possibilities to explore off the top of my head.

“memory leaks will happen” is not correct. We extend the Go memory allocator and garbage collector to manage persistent memory, and the Go garbage collector guarantees not to leak memory.

I've written plenty of Go programs that leak, obviously not GC, but because I appended all my log messages to a slice and never cleared it or whatever. I can solve it with a restart in current Go. Persistent memory is going to need other solutions to this, because real code is going to do things like create new named objects all the time without properly keeping track of them (even if they are nominally "live" by GC standards), thus "leaking". This is going to be a large problem in practice.

And this is a great example of what I mean by the freedom to explore. You're going to find out you need solutions. Your first crack is, with all due respect, likely to not be as good as it could be. You need the freedom to change your answers at this phase.

You have a paper and a demo. That is good work and true progress; do not let me sound like I think otherwise. But the next step now is to publicize your demo and try to create a community around it in the persistent memory community, to get some prototyping going and find out what works and what doesn't in some more real programs, with real programmers. You're on step one and trying to skip to step twelve.

I do not mean this as discouragement... I mean this post as encouragement, and license, and permission! Fork the code, explore, learn, don't wait for approval! "Let's start from a working general-purpose language and fix it to work really well with persistent memory!" is a solid value proposition to create a community around, because of the Go implementation's relative simplicity compared to almost any other language of comparable capability. This competitive time in the technology's life is no time to be running everything past a committee and, well, naysayers like me! (Though I'm actually trying to be a "yea-sayer" here, it just may not be the "yea" you may have been looking for. I cheer you on and hope you create something wonderful!)

Explore! Create! Have some fun!

@gvrajesh

Since the tech is quite new, how's the security?

Persistent memory is accessed similar to DRAM with load/store instructions, different levels of caches, etc. So it inherits any known cache side channel attacks. I am not a device expert, but I don't think Row_Hammer immediately applies to persistent memory because Row_Hammer depends on how DRAM device works.

As memory encryption technologies like TME and MKTME [1] mature, I believe they will work for persistent memory too. In addition, as Mohit points out, current persistent memory devices have the capability to encrypt/decrypt data at store/load.

@rsc rsc moved this from Incoming to Active in Proposals (old) Jan 27, 2021
@rsc rsc changed the title proposal: add native support for programming persistent memory in Go proposal: spec: add native support for programming persistent memory in Go Jan 27, 2021
@rsc
Contributor

rsc commented Jan 27, 2021

My primary concern about this proposal is the goal of "making Go the first language to completely support persistent memory". Being the first means making lots of mistakes and spending a lot of time redesigning, cleaning them up, and so on. I was once asked about the impact of the research literature on Go's design, and I replied in part:

Go is more an engineering project than a pure research project. Like most engineering, it is fundamentally conservative, using ideas that are proven and well understood and will work well together.

We really can't say that this approach to persistent memory is proven, nor well understood, nor that it will work well together with the other ideas in Go.

The cost of adopting a design that turns out not to be right is high. We have to keep supporting it even though it's not right, for backwards compatibility. On top of that, this particular design requires significant work in the garbage collector - a critical part of Go - and would limit future changes there. We really would want to be sure that we're not taking on a net cost as opposed to a net benefit, and I don't see the evidence for that.

Part of the reason we spent so long on generics was to avoid making mistakes. I think we've ended up in a decent place there, far better than we would have if we'd run with the best ideas we had in, say, 2012. We wanted to get something we'd be happy supporting for the next 10, 20 years. Waiting for the right design is a good strategy, one we have employed repeatedly.

I don't mean to discount the potentially transformative impact that support for and widespread availability of persistent memory might bring to programming. But this proposal is asking us to take a very big step on something and to lock in a particular way of doing things that may or may not be the right approach even a couple years from now, much less 10 or more.

I wonder whether there is any kind of generalization we could do that would let us make a much smaller change to the language and enable much more of the persistent memory "runtime" to live as an importable package maintained outside the core Go system.

@jerrinsg
Contributor Author

Thanks, Russ, for your response. We understand that our design is not well proven and that there could be flaws we have not thought about. Even if our proposal is not accepted in its entirety, we feel it can still be broken into pieces that add support for programming persistent memory in a phased manner.

The first step can be to provide an interface that lets a Go program map a persistent memory file into its address space. One option could be to support an additional flag, MAP_SYNC [2], in the Mmap API to let users map files that specifically support direct access (DAX). Also, data written to persistent memory is guaranteed to be persistent only once it has also been flushed from the CPU caches. An API needs to be made available that can flush a range of memory addresses using the appropriate CPU instructions.

Adding these two capabilities will make it possible for Go programs to program persistent memory easily. External packages can then be written that provide other features such as memory management, transactions, etc. Note that this is similar to what was implemented in OpenJDK through JEP 352 [1].

There are advantages to providing these features in the core runtime rather than as an external package: these are the bare-minimum changes required to support persistent memory programming, and they make it much easier to expand the language later with additional features such as a garbage-collected persistent memory heap, support for transactionally updating persistent memory data structures, etc. If there is sufficient interest in expanding on the initial support added to Go, more features can be added later on.

[1] https://bugs.openjdk.java.net/browse/JDK-8207851
[2] https://man7.org/linux/man-pages/man2/mmap.2.html

@frioux

frioux commented Jan 28, 2021

Maybe silly question: can't this functionality be implemented in an external package first, such that other external packages can use it and have the capability without requiring language changes? You refer to the mmap api, that can be used without changes to the (frozen) core syscall package, right?

@jerrinsg
Contributor Author

Maybe silly question: can't this functionality be implemented in an external package first, such that other external packages can use it and have the capability without requiring language changes? You refer to the mmap api, that can be used without changes to the (frozen) core syscall package, right?

I was actually suggesting that the capability to mmap persistent memory files and an API to ensure data persistence be added to core Go. I did not know that the syscall package is frozen.
When we first started working on this project, we explored approaches that did not require any language changes. One of the goals of our project was to expose a programming model that was simple and similar to how Go programs are written currently. But we quickly realized this is not possible if persistent memory programming support is added through an external package:
(1) Go is a language with automatic memory management. If persistent memory is managed using a package, then you will need to have explicit APIs for memory allocation and deallocation. Persistent memory is non-volatile, so it is easy to miss a free() call and leak memory forever. Also, Go's garbage collector (GC) will not have any visibility into the persistent memory region managed by this library. Any volatile objects pointed to from the persistent memory region may be considered unreachable by the GC.
(2) Supporting transactional data updates is very important for persistent memory programming. We provide a simple txn() { ... } interface for this, where any persistent data updates within the txn() { ... } block are automatically made crash-consistent. Without compiler changes, enabling such crash-consistent updates is still possible, but the code becomes very verbose. Each data update will now need to be preceded by transactional statements.
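To make the verbosity in point (2) concrete, here is a small sketch of what library-level undo logging could look like. It is not the go-pmem API: the `undoLog` type and its methods are hypothetical, and the log lives in ordinary volatile memory, so it only illustrates the shape of the code (one explicit logging call before every store), not real crash consistency.

```go
package main

import "fmt"

// undoLog is a hypothetical, volatile stand-in for a persistent undo log.
type undoLog struct {
	entries []func() // each entry restores one previously logged value
}

// logInt records the current value of *p so it can be restored on rollback.
func (l *undoLog) logInt(p *int) {
	old := *p
	l.entries = append(l.entries, func() { *p = old })
}

// rollback undoes all logged updates in reverse order.
func (l *undoLog) rollback() {
	for i := len(l.entries) - 1; i >= 0; i-- {
		l.entries[i]()
	}
	l.entries = nil
}

// commit discards the log, making the updates final.
func (l *undoLog) commit() { l.entries = nil }

type account struct{ balance int }

// transfer shows the pattern: every store is preceded by a logging call.
func transfer(l *undoLog, from, to *account, amount int) error {
	l.logInt(&from.balance) // must come before the store below
	from.balance -= amount
	l.logInt(&to.balance) // and again before the next store
	to.balance += amount

	if from.balance < 0 {
		l.rollback()
		return fmt.Errorf("insufficient funds")
	}
	l.commit()
	return nil
}

func main() {
	log := &undoLog{}
	a, b := &account{balance: 100}, &account{balance: 0}
	fmt.Println(transfer(log, a, b, 30), a.balance, b.balance)
	fmt.Println(transfer(log, a, b, 100), a.balance, b.balance)
}
```

Forgetting a single `logInt` call leaves that update outside the transaction, which is the kind of error a compiler-supported `txn() { ... }` block is meant to rule out.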

So, yes, it is possible to add this support through an external package, but it will be error-prone and expose a difficult programming model. This is why we are proposing native support be added in Go to program persistent memory.

@beoran

beoran commented Feb 1, 2021

Looking at the low level API for persistent memory, this is, at least on Linux, simply mapping a file. https://pmem.io/pmdk/libpmem/ I don't see why this needs to be harder than https://golang.org/pkg/os/#Open, which requires a Close to be called in the end. You can provide transactions through a mutex-locked callback or a channel of update functions.

I think this is the actual challenge you should take: provide the functionality as an easy-to-use Go module, then we can consider what parts are better as part of the standard library or of the language proper.

@ohir

ohir commented Feb 1, 2021

Persistent memory technology, to my knowledge, is encumbered by a net of Intel patents, which in part extend to the software. I do not think that the Go ecosystem would benefit from an entanglement with active and enforceable software patents of Intel (and the more litigious Oracle, too).

@mohit10verma As an exercise: get your patent attorney to help and please conceive a workaround that lets generated code keep a variable of interface type in the "persistent memory" in a manner that does not infringe on the "pwn it all" US9940229B2.

@jerrinsg
Contributor Author

jerrinsg commented Feb 2, 2021

Looking at the low level API for persistent memory, this is, at least on Linux, simply mapping a file. https://pmem.io/pmdk/libpmem/ I don't see why this needs to be harder than https://golang.org/pkg/os/#Open, which requires a Close to be called in the end. You can provide transactions through a mutex-locked callback or a channel of update functions.

If you look at the PMDK man page, the bulk of the persistent memory initialization is done by the function pmem_map_file(). libpmem does a bunch of things under the covers - identify the mode in which the persistent memory namespace is configured (fsdax or devdax), identify the most optimized cache flush instruction supported on the platform (clwb, clflush, etc.), determine whether memory barriers are necessary to ensure persistence, mmap the persistent memory file with the newly added MAP_SYNC flag so that applications can safely flush the file from userspace, etc. So adding low-level persistent memory support does require a lot more support than just calling os.Open() and os.Close().
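As a rough illustration of one of those steps, the sketch below picks a flush strategy from a list of CPU feature flags, preferring `clwb`, then `clflushopt`, then `clflush`. The function name `pickFlush` is hypothetical, the input format follows the `flags` line of `/proc/cpuinfo` on x86 Linux, and the `msync` fallback is an assumption of this sketch. libpmem itself detects these features differently, so this only shows the selection logic, not its implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// pickFlush (hypothetical) returns the name of the preferred cache flush
// mechanism given a space-separated list of CPU feature flags.
func pickFlush(cpuFlags string) string {
	has := make(map[string]bool)
	for _, f := range strings.Fields(cpuFlags) {
		has[f] = true
	}
	switch {
	case has["clwb"]: // write back without evicting the cache line
		return "clwb"
	case has["clflushopt"]: // weakly ordered flush with invalidation
		return "clflushopt"
	case has["clflush"]: // serialized flush, available on older CPUs
		return "clflush"
	}
	return "msync" // assumed fallback: ask the kernel to flush instead
}

func main() {
	for _, flags := range []string{
		"fpu sse2 clflush clflushopt clwb",
		"fpu sse2 clflush",
		"fpu",
	} {
		fmt.Printf("%q -> %s\n", flags, pickFlush(flags))
	}
}
```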

I think this is the actual challenge you should take: provide the functionality as an easy-to-use Go module, then we can consider what parts are better as part of the standard library or of the language proper.

We did explore adding this functionality as a module but dropped that effort due to reasons I mentioned in an earlier comment.

@beoran

beoran commented Feb 3, 2021

Sorry, I think I wasn't very clear about what I meant. I saw that comment, but there you state that the API you would need to provide when doing it as a package wouldn't be very nice because you would like to implement a memory like API. So I am suggesting instead of such a memory like API, a file like API would be just as easy to use for the end user, and possible to implement as a package. It also seems more logical to me. Since persistent memory is persistent, like hard disks or flash drives are, to my mind, it seems more like a storage location for files than something I'd like to use similar to RAM.

@rsc
Contributor

rsc commented Feb 3, 2021

Based on my comment last week and the response, it sounds like we should probably decline this full proposal but could easily add the MAP_SYNC flag and perhaps even the "flush memory region" operation.

Since syscall already has Mmap, it would be fine to add MAP_SYNC to syscall on systems that define that flag, even though syscall is frozen to new API. (This is adding a newly defined bit for use with old API.)

And then if you would like to file a proposal for the API for flushing a memory region, that would be fine too.
(Perhaps syscall.Mflush of a []byte like Mprotect/Mlock/etc? Anyway, it should be a separate proposal.)

Thanks.

@rsc
Contributor

rsc commented Feb 3, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Feb 3, 2021
@jerrinsg
Contributor Author

jerrinsg commented Feb 4, 2021

So I am suggesting instead of such a memory like API, a file like API would be just as easy to use for the end user, and possible to implement as a package.

Persistent memory gives byte-level granularity of read/write access to the data stored in it. So in order to get the best performance out of it, applications should use direct load/store CPU instructions to read/write persistent memory data. This is achieved by mounting a persistent memory device in DAX mode, which avoids the page cache. Applications can then map a file in persistent memory into their address space and use it directly, just like memory. See [1] for some additional FAQs about persistent memory programming. For these reasons, a file-like API will be unsuitable for persistent memory programming.

[1] https://software.intel.com/content/www/us/en/develop/articles/persistent-memory-faq.html

@jerrinsg
Contributor Author

jerrinsg commented Feb 4, 2021

Based on my comment last week and the response, it sounds like we should probably decline this full proposal but could easily add the MAP_SYNC flag and perhaps even the "flush memory region" operation.

Thanks @rsc for the updates. Should separate proposals be opened to (1) add a MAP_SYNC flag and (2) add an API that supports flushing a memory address range?

@rsc
Contributor

rsc commented Feb 10, 2021

@jerrinsg, yes please add separate proposals for MAP_SYNC and a flushing memory API. Feel free to link to my comment above for context. #43810 (comment)

@rsc
Contributor

rsc commented Feb 10, 2021

No change in consensus, so declined.
— rsc for the proposal review group
