Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: archive/tar: implement fs.FS #61232

Closed
sding3 opened this issue Jul 7, 2023 · 10 comments
Closed

proposal: archive/tar: implement fs.FS #61232

sding3 opened this issue Jul 7, 2023 · 10 comments
Labels
Milestone

Comments

@sding3
Copy link
Contributor

sding3 commented Jul 7, 2023

archive/tar.Reader should implement fs.FS which puts it on par with archive/zip.Reader and it enables the tarball to be accessed through the fs.FS interface without requiring an intermediate step of inflating the tarball out to the filesystem.

@sding3 sding3 added the Proposal label Jul 7, 2023
@seankhliao seankhliao changed the title archive/tar: implement fs.FS proposal: archive/tar: implement fs.FS Jul 7, 2023
@gopherbot gopherbot added this to the Proposal milestone Jul 7, 2023
@seankhliao
Copy link
Member

seankhliao commented Jul 7, 2023

I don't think this is feasible with the current API and without in memory buffering: archive/tar.NewReader takes an io.Reader, it can only read the source once, plus tar is not a generally seekable archive format.

@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 7, 2023
@sding3
Copy link
Contributor Author

sding3 commented Jul 7, 2023

I think we can manage that by checking to see if the io.Reader also implements io.Seeker or io.ReaderAt, and error out the fs.FS operations when the io.Reader cannot be sought.

@apparentlymart
Copy link

apparentlymart commented Jul 7, 2023

I think even if the given reader did also implement io.Seeker and/or io.ReaderAt this would still be a pretty significant change to the existing design of tar.Reader, which currently relies on the underlying reader's tracking of position within the file and doesn't retain any memory of entries that were already visited.

It would be possible in principle to make a separate type that wraps an io.ReadSeeker and implements fs.FS by making a fresh tar.Reader for each call and scanning over potentially the entire input on each operation, but that would be pretty inefficient in comparison to most other fs.FS implementations such that I'm skeptical that it would be widely used. Of course, if you implement it and publish it somewhere that other people can use it then that might prove me wrong. 😀

Do you have some specific ideas for how it would be implemented? If the proposal is to redesign tar.Reader's internals so that it can support random access by seeking and scanning (which it cannot do today), I expect you'd need to demonstrate significant demand for this capability to justify that additional complexity and the risk of significantly changing the existing working code. One way to achieve that would be to copy the contents of archive/tar into a separate library of your own and modify it to provide what you need and use that to demonstrate that it's feasible to implement and that the result is useful (hopefully it would be used by more than just you).

Based on experience with other proposals, I think you'd then also need to demonstrate that there's some benefit to it being in standard library rather than just remaining as a separate library maintained in your own repository.

@neild
Copy link
Contributor

neild commented Jul 7, 2023

tar (Tape ARchive) is a streaming format. It's fundamentally unsuited for random access. You could build a random access reader, of course, but you'd be fighting the format and it would be a completely different implementation than archive/tar.

If you want random access within an archive, you're better off using zip, which is designed for the purpose.

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jul 8, 2023
@sding3
Copy link
Contributor Author

sding3 commented Jul 8, 2023

It looks like the zip format affords a "central directory" at the end of the file.

The first call of tar.Reader.Open() can build a similar "central directory" so the penalty is paid only once, and the godoc on that should make this clear.

Tar archives frequently appear as os.File, which supports random access (ReadAt).

@sding3
Copy link
Contributor Author

sding3 commented Jul 8, 2023

An alternative is to make a new constructor to allow for a more self-contained implementation and stronger distinction from the existing tar.Reader.

package tar

func FS(r io.ReaderAt, size int64) (fs.FS, error)

@apparentlymart
Copy link

An entirely new implementation that requires an io.ReaderAt and constructs a directory in memory does seem technically possible indeed, but it doesn't seem like it would have much in common with the implementation in archive/tar. You could implement it as a library outside of standard library to demonstrate how it would work, to see how much code it would potentially share with the existing archive/tar implementation, and to see whether it's widely useful enough to warrant potential inclusion in the standard library.

FWIW in my experience when I've used archive/tar I've typically passed it an archive/gzip.Reader wrapping an io.File rather than an io.File directly, because I most often interact with gzip-compressed tar files. I don't mean to say that there aren't situations where it's valuable to work directly with an uncompressed archive, but this demonstrates another important difference between tar and zip: the zip format uses separate compressed streams for each file, with the directory describing the location of each compressed stream, and so it's possible to perform streaming decompression of each file separately while reading it. tar doesn't support compression itself, and so the overall tar stream is typically compressed separately for efficient distribution.

@seankhliao
Copy link
Member

there are existing implementations, neither of which seem popular enough to be worth adding into the standard library

https://pkg.go.dev/github.com/nlepage/go-tarfs
https://pkg.go.dev/github.com/quay/claircore/pkg/tarfs

@seankhliao seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 9, 2023
@sding3
Copy link
Contributor Author

sding3 commented Jul 10, 2023

Thanks - closing this out.

@sding3 sding3 closed this as completed Jul 10, 2023
@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

This proposal has been declined as retracted.
— rsc for the proposal review group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Declined
Development

No branches or pull requests

6 participants