proposal: net/http: basic seek support for io.FS #61791

Closed
oliverpool opened this issue Aug 6, 2023 · 11 comments
@oliverpool

oliverpool commented Aug 6, 2023

Support for io.FS in net/http was added in 7211694

I have recently been experimenting with serving the contents of a tar or zip file via http.FS. Fully supporting Seek is the most challenging part (and makes for a bad experience, because it breaks at runtime, only on specific files, when the MIME type must be detected).

However, after analyzing the net/http code, it appears that Seek is called in only two cases:

  • to sniff the MIME Content-Type (seeking back to the beginning afterward)
  • to serve range requests

Instead of implementing Seek properly, I tried an alternative approach: faking just enough of the Seek method (sketched below) to support:

  • sniffing the MIME Content-Type and seeking back to the start afterward (by buffering the first 512 bytes)
  • seeking forward for range requests (by discarding the bytes read, and failing on non-ascending ranges)

A non-ascending multi-range request will fail on the first range that goes backward (unless the previous ranges fell within sniffLen). And a malicious actor could request just the last byte as a range, forcing the server to read the whole file (a kind of amplification attack).
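For concreteness, the wrapper looks roughly like this (an illustrative, untested sketch, not the actual package code; names like Wrap are made up here):

package seekfake

import (
	"errors"
	"io"
	"io/fs"
)

const sniffLen = 512 // matches the amount net/http reads for DetectContentType

// fakeSeeker buffers the first sniffLen bytes of a non-seekable fs.File so
// that the MIME sniff and the following rewind can be satisfied, and
// emulates forward seeks by reading and discarding bytes.
type fakeSeeker struct {
	fs.File
	buf    []byte // buffered prefix of the file
	bufPos int    // next read position within buf
	offset int64  // logical read offset seen by the consumer
}

// Wrap reads and buffers the first sniffLen bytes of f.
func Wrap(f fs.File) (fs.File, error) {
	buf := make([]byte, sniffLen)
	n, err := io.ReadFull(f, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return nil, err
	}
	return &fakeSeeker{File: f, buf: buf[:n]}, nil
}

func (f *fakeSeeker) Read(p []byte) (int, error) {
	if f.bufPos < len(f.buf) {
		n := copy(p, f.buf[f.bufPos:])
		f.bufPos += n
		f.offset += int64(n)
		return n, nil
	}
	n, err := f.File.Read(p)
	f.offset += int64(n)
	return n, err
}

// Seek allows rewinding within the buffered prefix (enough for the MIME
// sniff) and forward seeks by discarding; anything else fails.
func (f *fakeSeeker) Seek(offset int64, whence int) (int64, error) {
	if whence != io.SeekStart {
		return 0, errors.New("seekfake: only io.SeekStart is supported")
	}
	if offset <= int64(len(f.buf)) && f.offset <= int64(len(f.buf)) {
		f.bufPos = int(offset)
		f.offset = offset
		return offset, nil
	}
	if offset < f.offset {
		return 0, errors.New("seekfake: cannot seek backwards beyond the buffered prefix")
	}
	// Forward "seek": read and discard the intervening bytes.
	if _, err := io.CopyN(io.Discard, f, offset-f.offset); err != nil {
		return 0, err
	}
	return f.offset, nil
}

A small fs.FS wrapper whose Open calls Wrap could then be passed through http.FS to http.FileServer as usual.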

I have created an importable package to experiment with this approach:
https://git.sr.ht/~oliverpool/exp/tree/main/item/seekfaker/seekfaker.go

It seems to work (at least the tests pass :), but it relies heavily on the internals of net/http, which makes it quite brittle.

I think it would be a nice addition to the stdlib, to:

  • make it stable
  • improve the integration of io.FS (by drastically reducing the work required of io.FS implementers)

The comment added in afd792f could then be reworded to state // The files provided by fsys must implement io.Seeker to efficiently support range requests.

Draft for the changes to the stdlib: https://git.sr.ht/~oliverpool/go/tree/httpfs_seekable/item/src/net/http/fsys.go (adapted from the fs.go file).

2023-08-07: updated wording to address the misunderstanding of #61791 (comment)

@AlexanderYastrebov
Contributor

AlexanderYastrebov commented Aug 6, 2023

sniffing the MIME Content-Type and seeking back to the start afterward (by buffering the first 512 bytes)

Isn't it seeking to the start already?

go/src/net/http/fs.go

Lines 238 to 248 in 460dc37

if ctype == "" {
	// read a chunk to decide between utf-8 text and binary
	var buf [sniffLen]byte
	n, _ := io.ReadFull(content, buf[:])
	ctype = DetectContentType(buf[:n])
	_, err := content.Seek(0, io.SeekStart) // rewind to output whole file
	if err != nil {
		Error(w, "seeker can't seek", StatusInternalServerError)
		return
	}
}

(Update) This is the problem (also explained in #48781):

go/src/net/http/fs.go

Lines 792 to 798 in 460dc37

func (f ioFile) Seek(offset int64, whence int) (int64, error) {
	s, ok := f.file.(io.Seeker)
	if !ok {
		return 0, errMissingSeek
	}
	return s.Seek(offset, whence)
}

@ianlancetaylor
Contributor

CC @neild @bradfitz

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 7, 2023
@neild
Contributor

neild commented Aug 7, 2023

I don't think we want the complexity of wrapping a non-io.Seeker in something that tries to implement Seek.

If you want to serve a non-seekable filesystem, like the contents of a tar, you can do so fairly simply with a handler that opens the file and uses io.Copy to serve it. This won't handle Range requests (it will serve the entire file in response to one), but I believe it will handle Content-Type detection.
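A minimal sketch of such a handler (illustrative only; path handling is simplified and errors are not logged):

package tarserve

import (
	"io"
	"io/fs"
	"net/http"
	"strings"
)

// serveStream serves files from fsys without requiring io.Seeker.
// The Content-Type is left to ResponseWriter.Write's sniffing, and Range
// requests are ignored: the whole file is always written.
func serveStream(fsys fs.FS) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name := strings.TrimPrefix(r.URL.Path, "/")
		f, err := fsys.Open(name)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		defer f.Close()
		io.Copy(w, f)
	})
}

Something like http.Handle("/files/", http.StripPrefix("/files/", serveStream(fsys))) would then serve the archive contents.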

But it doesn't seem unreasonable to relax the requirement so that fs.File files served by net/http don't need to implement io.Seeker, especially since #51971 is adding ServeFileFS.

If we do that, then I think the way to do it is to change serveContent to not require a seekable input. It can defer content-type detection to ResponseWriter.Write (I'm not sure why it isn't doing that now--historical accident, or is there a better reason?), and either ignore Range requests for non-seekable files or implement them by discarding the skipped portions of the file.

@AlexanderYastrebov
Contributor

I think this could also be achieved by a wrapper FS that wraps its Files to implement a "stream seek" (forward seeks by discarding, and backward seeks no further than the size of a buffer of the last written bytes).

@neild
Contributor

neild commented Aug 7, 2023

I'm not sure why it isn't doing that now--historical accident, or is there a better reason?

Answering my own question: ServeContent can detect a content type when the response doesn't include the first bytes of the file, such as when responding to a HEAD request or a range request.

@oliverpool
Author

oliverpool commented Aug 7, 2023

If we do that, then I think the way to do it is to change serveContent to not require a seekable input. It can defer content-type detection to ResponseWriter.Write (I'm not sure why it isn't doing that now--historical accident, or is there a better reason?), and either ignore Range requests for non-seekable files or implement them by discarding the skipped portions of the file.

I think my prototype implements this suggestion.

The ioFileSeekFaker struct is just a way to get this functionality without adding more logic to the already quite complex serveContent. Besides, it only slows down the non-Seeker case. Implementing this inside serveContent would mean something like the following (see the sketch after this list):

  • moving the DetectContentType buffer to the top level, so that it can be served later (sendContent = io.MultiReader(bytes.NewReader(ctypeBuf), content))
  • adding an io.CopyN(io.Discard, ...) for "simple" range requests
  • adding an io.CopyN(io.Discard, ...) for "multiple" range requests, stopping on a non-ascending range (or responding with StatusRequestedRangeNotSatisfiable immediately)
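A condensed sketch of what that could look like (illustrative only, not actual net/http code; it covers the sniff plus a single ascending range):

package httpfsdraft

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

const sniffLen = 512

// serveNoSeek writes content to w without ever calling Seek: the sniffed
// prefix is replayed via io.MultiReader, and a range starting at start is
// honored by discarding the skipped bytes with io.CopyN. length < 0 means
// "until EOF".
func serveNoSeek(w http.ResponseWriter, content io.Reader, start, length int64) error {
	// Detect the Content-Type from the first bytes, then put those bytes
	// back in front of the remaining content instead of seeking back.
	var buf [sniffLen]byte
	n, _ := io.ReadFull(content, buf[:])
	w.Header().Set("Content-Type", http.DetectContentType(buf[:n]))
	send := io.MultiReader(bytes.NewReader(buf[:n]), content)

	// Emulated forward seek for an ascending range request.
	if start > 0 {
		if _, err := io.CopyN(io.Discard, send, start); err != nil {
			return fmt.Errorf("skipping to range start: %w", err)
		}
	}
	if length >= 0 {
		send = io.LimitReader(send, length)
	}
	_, err := io.Copy(w, send)
	return err
}

The real serveContent would additionally have to handle multi-range responses, Content-Length, and the non-ascending case, but the core change is replacing the two Seek calls with the MultiReader and CopyN above.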

PS: I think supporting DetectContentType and "simple" range requests would cover most use cases (I have never seen multi-range requests in the wild).

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 7, 2023
@rsc
Contributor

rsc commented Aug 9, 2023

I don't believe this is a good idea. Essentially everything passed to net/http should implement Seek. Otherwise range requests fail, and if range requests fail, then downloads can't be resumed, structured files can't be fetched incrementally, and so on.

Chrome and most browsers that preview PDF files use range requests to fetch the specific file sections they need to render the current page, instead of having to download the entire file just to show page 1. Long ago, before Go handled range requests well, I noticed that reading PDFs on Go servers was incredibly slow. It turned out this was because Chrome was assuming that range requests are always supported, and so it would do a range request for a specific block of the file, Go would send back the entire file, and Chrome would extract the one block it wanted. Then for the next block that needed to be read, the same thing happened. It turned viewing a single page from what should have been a sub-linear amount of bandwidth into an almost quadratic one. Maybe Chrome has been fixed now, but that experience taught me that in the modern world you're just not a real web server if you don't support range requests.

If we did the "seek by reading and discarding" implementation of range requests, that would fix the bandwidth issue in the PDF failure but not the overall cost. It would create a nice low-bandwidth denial-of-service for the server: a client could connect and ask for the very last byte of the file and cause the server to process the entire compressed data stream.

Given that modern web servers must support range requests and range requests must have Seek to be implemented efficiently, it seems OK to me to leave things as they are and strongly encourage fs.File implementations used with net/http to implement Seek.

@rsc
Contributor

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@oliverpool
Author

Chrome and most browsers that preview PDF files use range requests to fetch the specific file sections they need to render the current page, instead of having to download the entire file just to show page 1.

Apparently it only performs "single-range" requests. Or do you know of any widely used clients that make "multiple-range" requests?

It would create a nice low-bandwidth denial-of-service for the server

Yes, this was acknowledged in the first message. A mitigation would be to limit the maximum possible seek; however, that would mean "good" actors have to download the entire file whenever they make such a range request (making for a terrible user experience, as you noted).

strongly encourage fs.File implementations used with net/http to implement Seek

Currently it is more of an obligation than a mere encouragement :)


My current understanding is that such a "hack" would be better served by a package outside the stdlib (with sufficient warnings regarding DoS and degraded user experience).

And regarding my first argument ("make it stable"): it is probably not even relevant, since the number of bytes read for MIME sniffing is unlikely to change in the future.

@rsc
Contributor

rsc commented Aug 16, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Aug 30, 2023

No change in consensus, so declined.
— rsc for the proposal review group
