Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io/fs: add SymlinkFS interface #49580

Open
zombiezen opened this issue Nov 14, 2021 · 53 comments
Open

io/fs: add SymlinkFS interface #49580

zombiezen opened this issue Nov 14, 2021 · 53 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted
Milestone

Comments

@zombiezen
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.17.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

Walked a directory with fs.WalkDir and encountered a symlink that I wanted to read.

What did you expect to see?

A function fs.ReadLink that behaves like os.Readlink, but operates on an fs.FS. Design sketch:

package fs

// ReadLink returns the destination of the named symbolic link.
//
// If fsys does not implement ReadLinkFS, then ReadLink returns an error.
func ReadLink(fsys FS, name string) (string, error)

// ReadLinkFS is the interface implemented by a file system that supports symbolic links.
type ReadLinkFS interface {
  FS

  // ReadLink returns the destination of the named symbolic link.
  ReadLink(name string) (string, error)
}

I would also want the file system returned by os.DirFS to have an implementation that calls os.Readlink. IIUC archive/zip.Reader would probably also benefit from an implementation.

An open question in my mind is whether the returned destination should be a slash-separated path or kept as-is. I think for consistency it probably should convert to a slash-separated path, but I'm not sure if this has problems on Windows.

What did you see instead?

No such API exists.

Other details

I have bandwidth to contribute an implementation of this, but I understand we're in the freeze and the earliest this could go in is Go 1.19.

This is somewhat related to #45470, but I'm not proposing changing any existing semantics, just adding a new method.

@gopherbot gopherbot added this to the Proposal milestone Nov 14, 2021
@bcmills
Copy link
Contributor

bcmills commented Nov 15, 2021

An open question in my mind is whether the returned destination should be a slash-separated path or kept as-is.

I would say it should be slash-separated and also relative to the same FS: links to absolute paths should be made relative, and links to paths above the FS root (or on an entirely different volume) should be rejected.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Nov 24, 2021
@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

Rewriting the link is tricky and not rewriting the link is also tricky. It's unclear to me what we should do here, if anything.

@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Dec 1, 2021
@zombiezen
Copy link
Contributor Author

As a data point, for the application I was writing, rewriting the link to be relative is effectively what I did anyway. I wanted to create a zip archive of an on-disk directory, so absolute paths were rewritten to be relative to the DirFS root. Returning an error if the path could not be represented was acceptable.

@rsc
Copy link
Contributor

rsc commented Dec 8, 2021

So the API for ReadLink is that it must return a path relative to the link, and if it can't do that, then it returns an error?
It seems like most symlinks are absolute, though, and most DirFS(foo) will not use foo = "/", so that will make most symlinks result in errors?

I'm trying to understand how useful this will be in practice, to balance against the cost. Will it be useful in practice? Or will people just be frustrated that 99% of symlinks aren't usable?

@bcmills
Copy link
Contributor

bcmills commented Dec 8, 2021

So the API for ReadLink is that it must return a path relative to the link, and if it can't do that, then it returns an error?

Yes.

It seems like most symlinks are absolute, though, and most DirFS(foo) will not use foo = "/", so that will make most symlinks result in errors?

I would expect the ReadLink method to internally transform absolute link paths to relative ones (for example, by using filepath.Rel and then checking that the returned path does not begin with ../.) That would allow symlinks to succeed when they are to other locations under the FS root — they would only fail if they are absolute symlinks that would otherwise refer to locations completely outside the FS tree.

So, for example, os.DirFS("/") on Unix would be able to resolve symlinks anywhere on the filesystem, even if those symlinks are absolute.

@hherman1
Copy link

hherman1 commented Dec 8, 2021

I would expect the ReadLink method to internally transform absolute link paths to relative ones (for example, by using filepath.Rel and then checking that the returned path does not begin with ../.) That would allow symlinks to succeed when they are to other locations under the FS root — they would only fail if they are absolute symlinks that would otherwise refer to locations completely outside the FS tree.

Does this extend to non local-disk filesystems?

@zombiezen
Copy link
Contributor Author

@rsc:

So the API for ReadLink is that it must return a path relative to the link, and if it can't do that, then it returns an error?

That was sufficient for my application. The root of my DirFS could contain multiple projects (think a GOPATH-like setup), so symlinks would usually resolve within the io/fs filesystem.

It seems like most symlinks are absolute, though, [...]

I'm not 100% convinced of that, but I don't have evidence to dispute your claim.

FWIW the cases that I was concerned with in a DevTools context:

  • Symlinks within the same directory. A common case is multiple aliases for a single on-disk program, like Busybox.
  • IIRC old versions of Git would use symlinks to represent symbolic references like HEAD instead of a file.

I'm trying to understand how useful this will be in practice, to balance against the cost. Will it be useful in practice? Or will people just be frustrated that 99% of symlinks aren't usable?

Agreed, I think weighing this tradeoff is the trickiest part of this proposal.

Relative links are IMO the most useful for a consumer of this API. I could imagine an implementation of ReadLink also returning absolute paths in the case where the returned path is above the root io/fs directory, but this might add complexity when the paths aren't slash-separated. However, I tend to agree with @bcmills that being strict about this is probably better.

You're probably already considering this, but it's just FS.ReadLink that would be picky about the link target. The implicit interface of DirFS.Open is to follow symlinks (again, #45470 tracks spelling out that behavior for other filesystems), and symlinks are already visible in directory listings.


@hherman1:

Does this extend to non local-disk filesystems?

I'm proposing the slash-separated relative path restriction would extend to non-local-disk filesystems, yes. How each implementation meets this contract is up to the individual filesystem.

@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

@zombiezen, thanks for the use cases. It seems fairly unobtrusive to add ReadLinkFS and fs.ReadLink, so the cost seems low and the benefit > 0.

@bcmills:

I would expect the ReadLink method to internally transform absolute link paths to relative ones (for example, by using filepath.Rel and then checking that the returned path does not begin with ../.) That would allow symlinks to succeed when they are to other locations under the FS root — they would only fail if they are absolute symlinks that would otherwise refer to locations completely outside the FS tree.

So, for example, os.DirFS("/") on Unix would be able to resolve symlinks anywhere on the filesystem, even if those symlinks are absolute.

Rewriting symlinks may run into problems. I've been burned enough that I'm a bit wary about that. Should we start with just erroring out on the absolute ones?

@bcmills
Copy link
Contributor

bcmills commented Dec 15, 2021

I think it would be fine to start by erroring out on absolute links, and perhaps define a specific error (or error type) for symlinks that refer to locations outside of the FS.

From the perspective of fs.ReadLinkFS, the requirement would be that every returned path is relative to the passed-in name and below the FS root. (It would be up to the specific FS implementation to decide whether to achieve that by rewriting absolute links or rejecting them.)

@rsc
Copy link
Contributor

rsc commented Jan 5, 2022

OK, so it sounds like we agree on ReadLink and ReadLinkFS but with the restriction that the returned link must be relative, and absolute symlinks return errors instead. Do I have that right?

@zombiezen
Copy link
Contributor Author

IIUC the full constraint for the returned link is relative and underneath the root of FS. Otherwise, yes, that matches my understanding.

@rsc
Copy link
Contributor

rsc commented Jan 12, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Jan 12, 2022
@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Jan 19, 2022
@rsc
Copy link
Contributor

rsc commented Jan 19, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: io/fs: add ReadLinkFS interface io/fs: add ReadLinkFS interface Jan 19, 2022
@rsc rsc modified the milestones: Proposal, Backlog Jan 19, 2022
@zombiezen
Copy link
Contributor Author

Excellent! I'm working on a CL now.

Implementation question: I would very much like to be able to use filepath.Clean and filepath.Rel in the os.DirFS implementation, but that would introduce a cyclic dependency AFAICT. What's the best way to use those functions?

@ianlancetaylor
Copy link
Contributor

In principle you could move the functions into an internal package imported by both the os package and the path/filepath package. But try to avoid that if you can. Better to keep the os package focused on the simplest possible approach. In particular I don't see why the os package would need to use Clean.

DavidGamba added a commit to DavidGamba/dgtools that referenced this issue Jan 20, 2022
@gopherbot
Copy link

Change https://go.dev/cl/385534 mentions this issue: io/fs: add ReadLinkFS interface

@kaniini
Copy link

kaniini commented Feb 21, 2022

I have use-cases which require the ability to have an absolute path returned, namely, taking an fs.FS and taring it up (tons of symlinks to /bin/busybox), would it be possible to make it so that an absolute path can be returned if opted into?

@mvdan
Copy link
Member

mvdan commented Sep 6, 2023

Thank you for considering Lstat even though I arrived late to the proposal process :)

@rsc
Copy link
Contributor

rsc commented Oct 3, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@tianon
Copy link
Contributor

tianon commented Oct 3, 2023

Can someone clarify for me why the proposed readlink function would modify the symlinks at all (or rather, why that's mandated)? In the filesystem, symlinks are mostly effectively just special files, and dangling symlinks are a valid use case, for example.

As a more concrete example, I've got an implementation of FS for a Git repo for which symlinks have been a pain, but an interface that puts extra constraints on what I can return about symlinks doesn't really resolve the problem for me (symlinks might point outside the repo, to non-existent files, etc, and those are all valid and often useful to the consumer; in those cases, Lstat would succeed but Stat would error and that's reasonable and matches the behavior on a normal system).

@mvdan
Copy link
Member

mvdan commented Oct 3, 2023

@tianon where do you see that modification being documented or implemented?

@tianon
Copy link
Contributor

tianon commented Oct 3, 2023

I guess I'm probably just misunderstanding the discussions here around the use of Rel and Clean (and absolute path restrictions) and now I realize those are probably specific to a particular implementation of the interface 🤦 apologies!

@rsc
Copy link
Contributor

rsc commented Oct 11, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: io/fs: add SymlinkFS interface io/fs: add SymlinkFS interface Oct 11, 2023
@zombiezen
Copy link
Contributor Author

The CL is ready for review with the new changes.

@ianlancetaylor
Copy link
Contributor

@zombiezen Is that https://go.dev/cl/385534 ? It seems to have picked up a merge conflict.

@zombiezen
Copy link
Contributor Author

Yes, that's the one. I'll go fix that.

@mvdan mvdan mentioned this issue Oct 24, 2023
@dsnet
Copy link
Member

dsnet commented Nov 1, 2023

I apologize for bikeshedding now, but should we declare two interfaces instead of one?

type ReadLinkFS interface {
    ReadLink(name string) (string, error)
}
type StatLinkFS interface {
    StatLink(name string) (FileInfo, error)
}

Arguments for this:

  1. It follows the current pattern where each interface type with a FS suffix only contains a single method
    and the prefix is named on the method itself.
  2. If we ever support writable interfaces, I fear that SymlinkFS would take a valuable name for the Symlink method itself.

@mvdan
Copy link
Member

mvdan commented Nov 1, 2023

My thinking was to have both in the same interface because, on the reading side, I can't think why an implementation would be able to provide one method but not the other. They implement two sides of the same feature, so I wouldn't split them up unless we have good reason to.

Your arguments about following precedent are interesting. I can't say I disagree with them, but I'm also not terribly concerned if we aren't consistent.

Speaking of late bikeshedding... I just noticed that the accepted design at #49580 (comment) uses the names ReadLink and Lstat, whereas os uses Readlink (one uppercase) and Lstat. Did we do that on purpose? If the point is to be more user friendly, then perhaps something like Joe's StatLink is more self-describing than Lstat. I only suggested Lstat to mirror os, but we already aren't mirroring it with ReadLink.

@dsnet
Copy link
Member

dsnet commented Nov 1, 2023

I just noticed that the accepted design at #49580 (comment) uses the names ReadLink and Lstat, whereas os uses Readlink (one uppercase) and Lstat. Did we do that on purpose?

I assumed yes since there was a light precedent set forth by fs to rename legacy C-like names. For example, we added os.File.ReadDir and codified that method in the fs.ReadDirFile interface instead of using the legacy os.File.Readdir (with lower case dir) name.

@dsnet
Copy link
Member

dsnet commented Nov 1, 2023

The relevance of my concern arises because I'm working on a package that extends io/fs to support full writeability. I had defined an interface called SymlinkFS that just includes the Symlink method. I wanted to design the package to be forwards compatible with what was potentially being added to the io/fs package, and the name conflict over SymlinkFS came up.

@zombiezen
Copy link
Contributor Author

@ianlancetaylor @bcmills With the release freeze coming, what's left to do for this CL? IMO this is ready to go, barring any changes based on @dsnet's comments.

@paralin
Copy link
Contributor

paralin commented Dec 2, 2023

Looking forward to SymlinkFS support; I have had a case recently writing a routine to copy some contents from an io/fs.FS to another format, and due to the lack of symlink support, needed to port it to a different filesystem abstraction.

@dmitshur dmitshur modified the milestones: Backlog, Go1.23 Jan 19, 2024
@dmitshur dmitshur added the FixPending Issues that have a fix which has not yet been reviewed or submitted. label Jan 19, 2024
@dsnet
Copy link
Member

dsnet commented Apr 26, 2024

I was chatting with @mvdan and I was emphasizing that my main concern was point 2 in my earlier comment (which I should have listed first):

If we ever support writable interfaces, I fear that SymlinkFS would take a valuable name for the Symlink method itself.

To elaborate further, this arises as I was working on a project that needed writable virtual FS interfaces. While working on that, I had a clone of the "io/fs" API but with additional interfaces to support writing. One of my declarations was SymlinkFS to, well, create symlinks. Thus, the addition of an interface called SymlinkFS is stealing a good name from that possible future world.

@mvdan and I propose:

type ReadLinkFS interface {
    ReadLink(name string) (string, error)
    StatLink(name string) (FileInfo, error)
}

which:

  • Renames SymlinkFS as ReadLinkFS to avoid stealing the SymlinkFS name.
  • Combines ReadLink and StatLink into the same interface as proposed by @mvdan in io/fs: add SymlinkFS interface #49580 (comment). The naming of the interfaces takes after the ReadLink method and StatLink comes along for the ride. Stat is still a "read" operation, so this doesn't seem that objectionable.
  • Renames Lstat as StatLink to be consistent in naming style as ReadLink and to C-style names in a modern Go API.

@zombiezen
Copy link
Contributor Author

@dsnet LGTM. I can make that change to the CL.

@mvdan
Copy link
Member

mvdan commented Apr 30, 2024

@ianlancetaylor @rsc #49580 (comment) tweaks the names of the interface of one of the methods compared to the accepted version as of #49580 (comment); does that sound good?

It's unclear to me if we want to re-open the proposal as a "likely accept" for a week or two to see if anyone disagrees. In any case I'd love to have final agreement in the next week or two so that we can merge the updated CL before the 1.23 freeze.

@zombiezen
Copy link
Contributor Author

Another point in favor of calling this new interface ReadLinkFS: it has symmetry with the existing ReadFileFS and ReadDirFS interfaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests