Source file src/runtime/mgc.go
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector (GC).
//
// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows
// multiple GC threads to run in parallel. It is a concurrent mark and sweep that uses a write
// barrier. It is non-generational and non-compacting. Allocation is done using size segregated
// per P allocation areas to minimize fragmentation while eliminating locks in the common case.
//
// The algorithm decomposes into several steps.
// This is a high level description of the algorithm being used. For an overview of GC a good
// place to start is Richard Jones' gchandbook.org.
//
// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
// 966-975.
// For journal quality proofs that these steps are complete, correct, and terminate, see
// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
//
// 1. GC performs sweep termination.
//
//    a. Stop the world. This causes all Ps to reach a GC safe-point.
//
//    b. Sweep any unswept spans. There will only be unswept spans if
//    this GC cycle was forced before the expected time.
//
// 2. GC performs the mark phase.
//
//    a. Prepare for the mark phase by setting gcphase to _GCmark
//    (from _GCoff), enabling the write barrier, enabling mutator
//    assists, and enqueueing root mark jobs. No objects may be
//    scanned until all Ps have enabled the write barrier, which is
//    accomplished using STW.
//
//    b. Start the world. From this point, GC work is done by mark
//    workers started by the scheduler and by assists performed as
//    part of allocation. The write barrier shades both the
//    overwritten pointer and the new pointer value for any pointer
//    writes (see mbarrier.go for details). Newly allocated objects
//    are immediately marked black.
//
//    c. GC performs root marking jobs. This includes scanning all
//    stacks, shading all globals, and shading any heap pointers in
//    off-heap runtime data structures. Scanning a stack stops a
//    goroutine, shades any pointers found on its stack, and then
//    resumes the goroutine.
//
//    d. GC drains the work queue of grey objects, scanning each grey
//    object to black and shading all pointers found in the object
//    (which in turn may add those pointers to the work queue).
//
//    e. Because GC work is spread across local caches, GC uses a
//    distributed termination algorithm to detect when there are no
//    more root marking jobs or grey objects (see gcMarkDone). At this
//    point, GC transitions to mark termination.
//
// 3. GC performs mark termination.
//
//    a. Stop the world.
//
//    b. Set gcphase to _GCmarktermination, and disable workers and
//    assists.
//
//    c. Perform housekeeping like flushing mcaches.
//
// 4. GC performs the sweep phase.
//
//    a. Prepare for the sweep phase by setting gcphase to _GCoff,
//    setting up sweep state and disabling the write barrier.
//
//    b. Start the world. From this point on, newly allocated objects
//    are white, and allocating sweeps spans before use if necessary.
//
//    c. GC does concurrent sweeping in the background and in response
//    to allocation. See description below.
//
// 5. When sufficient allocation has taken place, replay the sequence
// starting with 1 above. See discussion of GC rate below.

// Concurrent sweep.
//
// The sweep phase proceeds concurrently with normal program execution.
// The heap is swept span-by-span both lazily (when a goroutine needs another span)
// and concurrently in a background goroutine (this helps programs that are not CPU bound).
// At the end of STW mark termination all spans are marked as "needs sweeping".
//
// The background sweeper goroutine simply sweeps spans one-by-one.
//
// To avoid requesting more OS memory while there are unswept spans, when a
// goroutine needs another span, it first attempts to reclaim that much memory
// by sweeping. When a goroutine needs to allocate a new small-object span, it
// sweeps small-object spans for the same object size until it frees at least
// one object. When a goroutine needs to allocate a large-object span from the
// heap, it sweeps spans until it frees at least that many pages into the heap.
// There is one case where this may not suffice: if a goroutine sweeps and frees
// two nonadjacent one-page spans to the heap, it will allocate a new two-page
// span, but there can still be other one-page unswept spans which could be
// combined into a two-page span.
//
// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
// The finalizer goroutine is kicked off only when all spans are swept.
// When the next GC starts, it sweeps all not-yet-swept spans (if any).

// GC rate.
// Next GC is after we've allocated an extra amount of memory proportional to
// the amount already in use. The proportion is controlled by the GOGC environment
// variable (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we
// get to 8M (this mark is computed by the gcController.heapGoal method). This keeps
// the GC cost in linear proportion to the allocation cost. Adjusting GOGC just changes
// the linear constant (and also the amount of extra memory used).

// Oblets
//
// In order to prevent long pauses while scanning large objects and to
// improve parallelism, the garbage collector breaks up scan jobs for
// objects larger than maxObletBytes into "oblets" of at most
// maxObletBytes. When scanning encounters the beginning of a large
// object, it scans only the first oblet and enqueues the remaining
// oblets as new scan jobs.

package runtime

import (
	"internal/cpu"
	"runtime/internal/atomic"
	"unsafe"
)

const (
	_DebugGC         = 0
	_ConcurrentSweep = true
	_FinBlockSize    = 4 * 1024

	// debugScanConservative enables debug logging for stack
	// frames that are scanned conservatively.
144 debugScanConservative = false 145 146 // sweepMinHeapDistance is a lower bound on the heap distance 147 // (in bytes) reserved for concurrent sweeping between GC 148 // cycles. 149 sweepMinHeapDistance = 1024 * 1024 150 ) 151 152 // heapObjectsCanMove always returns false in the current garbage collector. 153 // It exists for go4.org/unsafe/assume-no-moving-gc, which is an 154 // unfortunate idea that had an even more unfortunate implementation. 155 // Every time a new Go release happened, the package stopped building, 156 // and the authors had to add a new file with a new //go:build line, and 157 // then the entire ecosystem of packages with that as a dependency had to 158 // explicitly update to the new version. Many packages depend on 159 // assume-no-moving-gc transitively, through paths like 160 // inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc. 161 // This was causing a significant amount of friction around each new 162 // release, so we added this bool for the package to //go:linkname 163 // instead. The bool is still unfortunate, but it's not as bad as 164 // breaking the ecosystem on every new release. 165 // 166 // If the Go garbage collector ever does move heap objects, we can set 167 // this to true to break all the programs using assume-no-moving-gc. 168 // 169 //go:linkname heapObjectsCanMove 170 func heapObjectsCanMove() bool { 171 return false 172 } 173 174 func gcinit() { 175 if unsafe.Sizeof(workbuf{}) != _WorkbufSize { 176 throw("size of Workbuf is suboptimal") 177 } 178 // No sweep on the first cycle. 179 sweep.active.state.Store(sweepDrainedMask) 180 181 // Initialize GC pacer state. 182 // Use the environment variable GOGC for the initial gcPercent value. 183 // Use the environment variable GOMEMLIMIT for the initial memoryLimit value. 184 gcController.init(readGOGC(), readGOMEMLIMIT()) 185 186 work.startSema = 1 187 work.markDoneSema = 1 188 lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters) 189 lockInit(&work.assistQueue.lock, lockRankAssistQueue) 190 lockInit(&work.wbufSpans.lock, lockRankWbufSpans) 191 } 192 193 // gcenable is called after the bulk of the runtime initialization, 194 // just before we're about to start letting user code run. 195 // It kicks off the background sweeper goroutine, the background 196 // scavenger goroutine, and enables GC. 197 func gcenable() { 198 // Kick off sweeping and scavenging. 199 c := make(chan int, 2) 200 go bgsweep(c) 201 go bgscavenge(c) 202 <-c 203 <-c 204 memstats.enablegc = true // now that runtime is initialized, GC is okay 205 } 206 207 // Garbage collector phase. 208 // Indicates to write barrier and synchronization task to perform. 209 var gcphase uint32 210 211 // The compiler knows about this variable. 212 // If you change it, you must change builtin/runtime.go, too. 213 // If you change the first four bytes, you must also change the write 214 // barrier insertion code. 215 var writeBarrier struct { 216 enabled bool // compiler emits a check of this before calling write barrier 217 pad [3]byte // compiler uses 32-bit load for "enabled" field 218 needed bool // identical to enabled, for now (TODO: dedup) 219 alignme uint64 // guarantee alignment so that compiler can use a 32 or 64-bit load 220 } 221 222 // gcBlackenEnabled is 1 if mutator assists and background mark 223 // workers are allowed to blacken objects. This must only be set when 224 // gcphase == _GCmark. 
225 var gcBlackenEnabled uint32 226 227 const ( 228 _GCoff = iota // GC not running; sweeping in background, write barrier disabled 229 _GCmark // GC marking roots and workbufs: allocate black, write barrier ENABLED 230 _GCmarktermination // GC mark termination: allocate black, P's help GC, write barrier ENABLED 231 ) 232 233 //go:nosplit 234 func setGCPhase(x uint32) { 235 atomic.Store(&gcphase, x) 236 writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination 237 writeBarrier.enabled = writeBarrier.needed 238 } 239 240 // gcMarkWorkerMode represents the mode that a concurrent mark worker 241 // should operate in. 242 // 243 // Concurrent marking happens through four different mechanisms. One 244 // is mutator assists, which happen in response to allocations and are 245 // not scheduled. The other three are variations in the per-P mark 246 // workers and are distinguished by gcMarkWorkerMode. 247 type gcMarkWorkerMode int 248 249 const ( 250 // gcMarkWorkerNotWorker indicates that the next scheduled G is not 251 // starting work and the mode should be ignored. 252 gcMarkWorkerNotWorker gcMarkWorkerMode = iota 253 254 // gcMarkWorkerDedicatedMode indicates that the P of a mark 255 // worker is dedicated to running that mark worker. The mark 256 // worker should run without preemption. 257 gcMarkWorkerDedicatedMode 258 259 // gcMarkWorkerFractionalMode indicates that a P is currently 260 // running the "fractional" mark worker. The fractional worker 261 // is necessary when GOMAXPROCS*gcBackgroundUtilization is not 262 // an integer and using only dedicated workers would result in 263 // utilization too far from the target of gcBackgroundUtilization. 264 // The fractional worker should run until it is preempted and 265 // will be scheduled to pick up the fractional part of 266 // GOMAXPROCS*gcBackgroundUtilization. 267 gcMarkWorkerFractionalMode 268 269 // gcMarkWorkerIdleMode indicates that a P is running the mark 270 // worker because it has nothing else to do. The idle worker 271 // should run until it is preempted and account its time 272 // against gcController.idleMarkTime. 273 gcMarkWorkerIdleMode 274 ) 275 276 // gcMarkWorkerModeStrings are the strings labels of gcMarkWorkerModes 277 // to use in execution traces. 278 var gcMarkWorkerModeStrings = [...]string{ 279 "Not worker", 280 "GC (dedicated)", 281 "GC (fractional)", 282 "GC (idle)", 283 } 284 285 // pollFractionalWorkerExit reports whether a fractional mark worker 286 // should self-preempt. It assumes it is called from the fractional 287 // worker. 288 func pollFractionalWorkerExit() bool { 289 // This should be kept in sync with the fractional worker 290 // scheduler logic in findRunnableGCWorker. 291 now := nanotime() 292 delta := now - gcController.markStartTime 293 if delta <= 0 { 294 return true 295 } 296 p := getg().m.p.ptr() 297 selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime) 298 // Add some slack to the utilization goal so that the 299 // fractional worker isn't behind again the instant it exits. 
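	// For example (illustrative numbers only, not values computed here):
	// if gcController.fractionalUtilizationGoal is 0.05, the worker
	// self-preempts once
	//
	//	selfTime/delta > 1.2*0.05 = 0.06
	//
	// that is, once this P has spent more than 6% of the wall-clock time
	// elapsed since marking started running the fractional worker.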
300 return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal 301 } 302 303 var work workType 304 305 type workType struct { 306 full lfstack // lock-free list of full blocks workbuf 307 _ cpu.CacheLinePad // prevents false-sharing between full and empty 308 empty lfstack // lock-free list of empty blocks workbuf 309 _ cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait 310 311 wbufSpans struct { 312 lock mutex 313 // free is a list of spans dedicated to workbufs, but 314 // that don't currently contain any workbufs. 315 free mSpanList 316 // busy is a list of all spans containing workbufs on 317 // one of the workbuf lists. 318 busy mSpanList 319 } 320 321 // Restore 64-bit alignment on 32-bit. 322 _ uint32 323 324 // bytesMarked is the number of bytes marked this cycle. This 325 // includes bytes blackened in scanned objects, noscan objects 326 // that go straight to black, and permagrey objects scanned by 327 // markroot during the concurrent scan phase. This is updated 328 // atomically during the cycle. Updates may be batched 329 // arbitrarily, since the value is only read at the end of the 330 // cycle. 331 // 332 // Because of benign races during marking, this number may not 333 // be the exact number of marked bytes, but it should be very 334 // close. 335 // 336 // Put this field here because it needs 64-bit atomic access 337 // (and thus 8-byte alignment even on 32-bit architectures). 338 bytesMarked uint64 339 340 markrootNext uint32 // next markroot job 341 markrootJobs uint32 // number of markroot jobs 342 343 nproc uint32 344 tstart int64 345 nwait uint32 346 347 // Number of roots of various root types. Set by gcMarkRootPrepare. 348 // 349 // nStackRoots == len(stackRoots), but we have nStackRoots for 350 // consistency. 351 nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int 352 353 // Base indexes of each root type. Set by gcMarkRootPrepare. 354 baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32 355 356 // stackRoots is a snapshot of all of the Gs that existed 357 // before the beginning of concurrent marking. The backing 358 // store of this must not be modified because it might be 359 // shared with allgs. 360 stackRoots []*g 361 362 // Each type of GC state transition is protected by a lock. 363 // Since multiple threads can simultaneously detect the state 364 // transition condition, any thread that detects a transition 365 // condition must acquire the appropriate transition lock, 366 // re-check the transition condition and return if it no 367 // longer holds or perform the transition if it does. 368 // Likewise, any transition must invalidate the transition 369 // condition before releasing the lock. This ensures that each 370 // transition is performed by exactly one thread and threads 371 // that need the transition to happen block until it has 372 // happened. 373 // 374 // startSema protects the transition from "off" to mark or 375 // mark termination. 376 startSema uint32 377 // markDoneSema protects transitions from mark to mark termination. 378 markDoneSema uint32 379 380 bgMarkReady note // signal background mark worker has started 381 bgMarkDone uint32 // cas to 1 when at a background mark completion point 382 // Background mark completion signaling 383 384 // mode is the concurrency mode of the current GC cycle. 385 mode gcMode 386 387 // userForced indicates the current GC cycle was forced by an 388 // explicit user call. 
389 userForced bool 390 391 // initialHeapLive is the value of gcController.heapLive at the 392 // beginning of this GC cycle. 393 initialHeapLive uint64 394 395 // assistQueue is a queue of assists that are blocked because 396 // there was neither enough credit to steal or enough work to 397 // do. 398 assistQueue struct { 399 lock mutex 400 q gQueue 401 } 402 403 // sweepWaiters is a list of blocked goroutines to wake when 404 // we transition from mark termination to sweep. 405 sweepWaiters struct { 406 lock mutex 407 list gList 408 } 409 410 // cycles is the number of completed GC cycles, where a GC 411 // cycle is sweep termination, mark, mark termination, and 412 // sweep. This differs from memstats.numgc, which is 413 // incremented at mark termination. 414 cycles atomic.Uint32 415 416 // Timing/utilization stats for this cycle. 417 stwprocs, maxprocs int32 418 tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start 419 420 pauseNS int64 // total STW time this cycle 421 pauseStart int64 // nanotime() of last STW 422 423 // debug.gctrace heap sizes for this cycle. 424 heap0, heap1, heap2 uint64 425 426 // Cumulative estimated CPU usage. 427 cpuStats 428 } 429 430 // GC runs a garbage collection and blocks the caller until the 431 // garbage collection is complete. It may also block the entire 432 // program. 433 func GC() { 434 // We consider a cycle to be: sweep termination, mark, mark 435 // termination, and sweep. This function shouldn't return 436 // until a full cycle has been completed, from beginning to 437 // end. Hence, we always want to finish up the current cycle 438 // and start a new one. That means: 439 // 440 // 1. In sweep termination, mark, or mark termination of cycle 441 // N, wait until mark termination N completes and transitions 442 // to sweep N. 443 // 444 // 2. In sweep N, help with sweep N. 445 // 446 // At this point we can begin a full cycle N+1. 447 // 448 // 3. Trigger cycle N+1 by starting sweep termination N+1. 449 // 450 // 4. Wait for mark termination N+1 to complete. 451 // 452 // 5. Help with sweep N+1 until it's done. 453 // 454 // This all has to be written to deal with the fact that the 455 // GC may move ahead on its own. For example, when we block 456 // until mark termination N, we may wake up in cycle N+2. 457 458 // Wait until the current sweep termination, mark, and mark 459 // termination complete. 460 n := work.cycles.Load() 461 gcWaitOnMark(n) 462 463 // We're now in sweep N or later. Trigger GC cycle N+1, which 464 // will first finish sweep N if necessary and then enter sweep 465 // termination N+1. 466 gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1}) 467 468 // Wait for mark termination N+1 to complete. 469 gcWaitOnMark(n + 1) 470 471 // Finish sweep N+1 before returning. We do this both to 472 // complete the cycle and because runtime.GC() is often used 473 // as part of tests and benchmarks to get the system into a 474 // relatively stable and isolated state. 475 for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) { 476 sweep.nbgsweep++ 477 Gosched() 478 } 479 480 // Callers may assume that the heap profile reflects the 481 // just-completed cycle when this returns (historically this 482 // happened because this was a STW GC), but right now the 483 // profile still reflects mark termination N, not N+1. 484 // 485 // As soon as all of the sweep frees from cycle N+1 are done, 486 // we can go ahead and publish the heap profile. 487 // 488 // First, wait for sweeping to finish. 
(We know there are no 489 // more spans on the sweep queue, but we may be concurrently 490 // sweeping spans, so we have to wait.) 491 for work.cycles.Load() == n+1 && !isSweepDone() { 492 Gosched() 493 } 494 495 // Now we're really done with sweeping, so we can publish the 496 // stable heap profile. Only do this if we haven't already hit 497 // another mark termination. 498 mp := acquirem() 499 cycle := work.cycles.Load() 500 if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) { 501 mProf_PostSweep() 502 } 503 releasem(mp) 504 } 505 506 // gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has 507 // already completed this mark phase, it returns immediately. 508 func gcWaitOnMark(n uint32) { 509 for { 510 // Disable phase transitions. 511 lock(&work.sweepWaiters.lock) 512 nMarks := work.cycles.Load() 513 if gcphase != _GCmark { 514 // We've already completed this cycle's mark. 515 nMarks++ 516 } 517 if nMarks > n { 518 // We're done. 519 unlock(&work.sweepWaiters.lock) 520 return 521 } 522 523 // Wait until sweep termination, mark, and mark 524 // termination of cycle N complete. 525 work.sweepWaiters.list.push(getg()) 526 goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceBlockUntilGCEnds, 1) 527 } 528 } 529 530 // gcMode indicates how concurrent a GC cycle should be. 531 type gcMode int 532 533 const ( 534 gcBackgroundMode gcMode = iota // concurrent GC and sweep 535 gcForceMode // stop-the-world GC now, concurrent sweep 536 gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user) 537 ) 538 539 // A gcTrigger is a predicate for starting a GC cycle. Specifically, 540 // it is an exit condition for the _GCoff phase. 541 type gcTrigger struct { 542 kind gcTriggerKind 543 now int64 // gcTriggerTime: current time 544 n uint32 // gcTriggerCycle: cycle number to start 545 } 546 547 type gcTriggerKind int 548 549 const ( 550 // gcTriggerHeap indicates that a cycle should be started when 551 // the heap size reaches the trigger heap size computed by the 552 // controller. 553 gcTriggerHeap gcTriggerKind = iota 554 555 // gcTriggerTime indicates that a cycle should be started when 556 // it's been more than forcegcperiod nanoseconds since the 557 // previous GC cycle. 558 gcTriggerTime 559 560 // gcTriggerCycle indicates that a cycle should be started if 561 // we have not yet started cycle number gcTrigger.n (relative 562 // to work.cycles). 563 gcTriggerCycle 564 ) 565 566 // test reports whether the trigger condition is satisfied, meaning 567 // that the exit condition for the _GCoff phase has been met. The exit 568 // condition should be tested when allocating. 569 func (t gcTrigger) test() bool { 570 if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff { 571 return false 572 } 573 switch t.kind { 574 case gcTriggerHeap: 575 // Non-atomic access to gcController.heapLive for performance. If 576 // we are going to trigger on this, this thread just 577 // atomically wrote gcController.heapLive anyway and we'll see our 578 // own write. 579 trigger, _ := gcController.trigger() 580 return gcController.heapLive.Load() >= trigger 581 case gcTriggerTime: 582 if gcController.gcPercent.Load() < 0 { 583 return false 584 } 585 lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime)) 586 return lastgc != 0 && t.now-lastgc > forcegcperiod 587 case gcTriggerCycle: 588 // t.n > work.cycles, but accounting for wraparound. 589 return int32(t.n-work.cycles.Load()) > 0 590 } 591 return true 592 } 593 594 // gcStart starts the GC. 
// It transitions from _GCoff to _GCmark (if
// debug.gcstoptheworld == 0) or performs all of GC (if
// debug.gcstoptheworld != 0).
//
// This may return without performing this transition in some cases,
// such as when called on a system stack or with locks held.
func gcStart(trigger gcTrigger) {
	// Since this is called from malloc and malloc is called in
	// the guts of a number of libraries that might be holding
	// locks, don't attempt to start GC in non-preemptible or
	// potentially unstable situations.
	mp := acquirem()
	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
		releasem(mp)
		return
	}
	releasem(mp)
	mp = nil

	// Pick up the remaining unswept/not being swept spans concurrently
	//
	// This shouldn't happen if we're being invoked in background
	// mode since proportional sweep should have just finished
	// sweeping everything, but rounding errors, etc, may leave a
	// few spans unswept. In forced mode, this is necessary since
	// GC can be forced at any point in the sweeping cycle.
	//
	// We check the transition condition continuously here in case
	// this G gets delayed into the next GC cycle.
	for trigger.test() && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}

	// Perform GC initialization and the sweep termination
	// transition.
	semacquire(&work.startSema)
	// Re-check transition condition under transition lock.
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}

	// In gcstoptheworld debug mode, upgrade the mode accordingly.
	// We do this after re-checking the transition condition so
	// that multiple goroutines that detect the heap trigger don't
	// start multiple STW GCs.
	mode := gcBackgroundMode
	if debug.gcstoptheworld == 1 {
		mode = gcForceMode
	} else if debug.gcstoptheworld == 2 {
		mode = gcForceBlockMode
	}

	// Ok, we're doing it! Stop everybody else.
	semacquire(&gcsema)
	semacquire(&worldsema)

	// For stats, check if this GC was forced by the user.
	// Update it under gcsema to avoid gctrace getting wrong values.
	work.userForced = trigger.kind == gcTriggerCycle

	if traceEnabled() {
		traceGCStart()
	}

	// Check that all Ps have finished deferred mcache flushes.
	for _, p := range allp {
		if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen {
			println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
			throw("p mcache not flushed")
		}
	}

	gcBgMarkStartWorkers()

	systemstack(gcResetMarkState)

	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		// This is used to compute CPU time of the STW phases,
		// so it can't be more than ncpu, even if GOMAXPROCS is.
		work.stwprocs = ncpu
	}
	work.heap0 = gcController.heapLive.Load()
	work.pauseNS = 0
	work.mode = mode

	now := nanotime()
	work.tSweepTerm = now
	work.pauseStart = now
	systemstack(func() { stopTheWorldWithSema(stwGCSweepTerm) })
	// Finish sweep before we start concurrent scan.
	systemstack(func() {
		finishsweep_m()
	})

	// clearpools before we start the GC. If we wait, the memory will not be
	// reclaimed until the next GC cycle.
	clearpools()

	work.cycles.Add(1)

	// Assists and workers can start the moment we start
	// the world.
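	//
	// gcController.startCycle resets the pacer's per-cycle state. As a
	// rough sketch of the pacing it works toward (ignoring the memory
	// limit and non-heap scan work; see gcController.heapGoal for the
	// real computation):
	//
	//	goal ≈ heapMarked + heapMarked*GOGC/100
	//
	// where heapMarked is the live heap measured at the end of the
	// previous mark. With GOGC=100 and 4 MB live, this cycle is paced
	// to finish by the time the heap grows to about 8 MB.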
	gcController.startCycle(now, int(gomaxprocs), trigger)

	// Notify the CPU limiter that assists may begin.
	gcCPULimiter.startGCTransition(true, now)

	// In STW mode, disable scheduling of user Gs. This may also
	// disable scheduling of this goroutine, so it may block as
	// soon as we start the world again.
	if mode != gcBackgroundMode {
		schedEnableUser(false)
	}

	// Enter concurrent mark phase and enable
	// write barriers.
	//
	// Because the world is stopped, all Ps will
	// observe that write barriers are enabled by
	// the time we start the world and begin
	// scanning.
	//
	// Write barriers must be enabled before assists are
	// enabled because they must be enabled before
	// any non-leaf heap objects are marked. Since
	// allocations are blocked until assists can
	// happen, we want to enable assists as early as
	// possible.
	setGCPhase(_GCmark)

	gcBgMarkPrepare() // Must happen before assist enable.
	gcMarkRootPrepare()

	// Mark all active tinyalloc blocks. Since we're
	// allocating from these, they need to be black like
	// other allocations. The alternative is to blacken
	// the tiny block on every allocation from it, which
	// would slow down the tiny allocator.
	gcMarkTinyAllocs()

	// At this point all Ps have enabled the write
	// barrier, thus maintaining the no white to
	// black invariant. Enable mutator assists to
	// put back-pressure on fast allocating
	// mutators.
	atomic.Store(&gcBlackenEnabled, 1)

	// In STW mode, we could block the instant systemstack
	// returns, so make sure we're not preemptible.
	mp = acquirem()

	// Concurrent mark.
	systemstack(func() {
		now = startTheWorldWithSema()
		work.pauseNS += now - work.pauseStart
		work.tMark = now
		memstats.gcPauseDist.record(now - work.pauseStart)

		sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
		work.cpuStats.gcPauseTime += sweepTermCpu
		work.cpuStats.gcTotalTime += sweepTermCpu

		// Release the CPU limiter.
		gcCPULimiter.finishGCTransition(now)
	})

	// Release the world sema before Gosched() in STW mode
	// because we will need to reacquire it later but before
	// this goroutine becomes runnable again, and we could
	// self-deadlock otherwise.
	semrelease(&worldsema)
	releasem(mp)

	// Make sure we block instead of returning to user code
	// in STW mode.
	if mode != gcBackgroundMode {
		Gosched()
	}

	semrelease(&work.startSema)
}

// gcMarkDoneFlushed counts the number of P's with flushed work.
//
// Ideally this would be a captured local in gcMarkDone, but forEachP
// escapes its callback closure, so it can't capture anything.
//
// This is protected by markDoneSema.
var gcMarkDoneFlushed uint32

// gcMarkDone transitions the GC from mark to mark termination if all
// reachable objects have been marked (that is, there are no grey
// objects and there can be no more in the future). Otherwise, it flushes
// all local work to the global queues where it can be discovered by
// other workers.
//
// This should be called when all local mark work has been drained and
// there are no remaining workers. Specifically, when
//
//	work.nwait == work.nproc && !gcMarkWorkAvailable(p)
//
// The calling context must be preemptible.
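//
// In outline, the flow implemented below is (a summary of the code, not
// additional API):
//
//  1. Re-check the completion condition above; if it no longer holds, return.
//  2. Flush every P's write barrier buffer and gcWork (the ragged barrier).
//  3. If any P flushed new work, start over from step 1.
//  4. Stop the world. If write barriers executed after the barrier in step 2
//     left work behind, restart the world and start over from step 1.
//  5. Otherwise, disable assists and workers and enter mark termination.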
798 // 799 // Flushing local work is important because idle Ps may have local 800 // work queued. This is the only way to make that work visible and 801 // drive GC to completion. 802 // 803 // It is explicitly okay to have write barriers in this function. If 804 // it does transition to mark termination, then all reachable objects 805 // have been marked, so the write barrier cannot shade any more 806 // objects. 807 func gcMarkDone() { 808 // Ensure only one thread is running the ragged barrier at a 809 // time. 810 semacquire(&work.markDoneSema) 811 812 top: 813 // Re-check transition condition under transition lock. 814 // 815 // It's critical that this checks the global work queues are 816 // empty before performing the ragged barrier. Otherwise, 817 // there could be global work that a P could take after the P 818 // has passed the ragged barrier. 819 if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) { 820 semrelease(&work.markDoneSema) 821 return 822 } 823 824 // forEachP needs worldsema to execute, and we'll need it to 825 // stop the world later, so acquire worldsema now. 826 semacquire(&worldsema) 827 828 // Flush all local buffers and collect flushedWork flags. 829 gcMarkDoneFlushed = 0 830 systemstack(func() { 831 gp := getg().m.curg 832 // Mark the user stack as preemptible so that it may be scanned. 833 // Otherwise, our attempt to force all P's to a safepoint could 834 // result in a deadlock as we attempt to preempt a worker that's 835 // trying to preempt us (e.g. for a stack scan). 836 casGToWaiting(gp, _Grunning, waitReasonGCMarkTermination) 837 forEachP(func(pp *p) { 838 // Flush the write barrier buffer, since this may add 839 // work to the gcWork. 840 wbBufFlush1(pp) 841 842 // Flush the gcWork, since this may create global work 843 // and set the flushedWork flag. 844 // 845 // TODO(austin): Break up these workbufs to 846 // better distribute work. 847 pp.gcw.dispose() 848 // Collect the flushedWork flag. 849 if pp.gcw.flushedWork { 850 atomic.Xadd(&gcMarkDoneFlushed, 1) 851 pp.gcw.flushedWork = false 852 } 853 }) 854 casgstatus(gp, _Gwaiting, _Grunning) 855 }) 856 857 if gcMarkDoneFlushed != 0 { 858 // More grey objects were discovered since the 859 // previous termination check, so there may be more 860 // work to do. Keep going. It's possible the 861 // transition condition became true again during the 862 // ragged barrier, so re-check it. 863 semrelease(&worldsema) 864 goto top 865 } 866 867 // There was no global work, no local work, and no Ps 868 // communicated work since we took markDoneSema. Therefore 869 // there are no grey objects and no more objects can be 870 // shaded. Transition to mark termination. 871 now := nanotime() 872 work.tMarkTerm = now 873 work.pauseStart = now 874 getg().m.preemptoff = "gcing" 875 systemstack(func() { stopTheWorldWithSema(stwGCMarkTerm) }) 876 // The gcphase is _GCmark, it will transition to _GCmarktermination 877 // below. The important thing is that the wb remains active until 878 // all marking is complete. This includes writes made by the GC. 879 880 // There is sometimes work left over when we enter mark termination due 881 // to write barriers performed after the completion barrier above. 882 // Detect this and resume concurrent mark. This is obviously 883 // unfortunate. 884 // 885 // See issue #27993 for details. 886 // 887 // Switch to the system stack to call wbBufFlush1, though in this case 888 // it doesn't matter because we're non-preemptible anyway. 
889 restart := false 890 systemstack(func() { 891 for _, p := range allp { 892 wbBufFlush1(p) 893 if !p.gcw.empty() { 894 restart = true 895 break 896 } 897 } 898 }) 899 if restart { 900 getg().m.preemptoff = "" 901 systemstack(func() { 902 now := startTheWorldWithSema() 903 work.pauseNS += now - work.pauseStart 904 memstats.gcPauseDist.record(now - work.pauseStart) 905 }) 906 semrelease(&worldsema) 907 goto top 908 } 909 910 gcComputeStartingStackSize() 911 912 // Disable assists and background workers. We must do 913 // this before waking blocked assists. 914 atomic.Store(&gcBlackenEnabled, 0) 915 916 // Notify the CPU limiter that GC assists will now cease. 917 gcCPULimiter.startGCTransition(false, now) 918 919 // Wake all blocked assists. These will run when we 920 // start the world again. 921 gcWakeAllAssists() 922 923 // Likewise, release the transition lock. Blocked 924 // workers and assists will run when we start the 925 // world again. 926 semrelease(&work.markDoneSema) 927 928 // In STW mode, re-enable user goroutines. These will be 929 // queued to run after we start the world. 930 schedEnableUser(true) 931 932 // endCycle depends on all gcWork cache stats being flushed. 933 // The termination algorithm above ensured that up to 934 // allocations since the ragged barrier. 935 gcController.endCycle(now, int(gomaxprocs), work.userForced) 936 937 // Perform mark termination. This will restart the world. 938 gcMarkTermination() 939 } 940 941 // World must be stopped and mark assists and background workers must be 942 // disabled. 943 func gcMarkTermination() { 944 // Start marktermination (write barrier remains enabled for now). 945 setGCPhase(_GCmarktermination) 946 947 work.heap1 = gcController.heapLive.Load() 948 startTime := nanotime() 949 950 mp := acquirem() 951 mp.preemptoff = "gcing" 952 mp.traceback = 2 953 curgp := mp.curg 954 casGToWaiting(curgp, _Grunning, waitReasonGarbageCollection) 955 956 // Run gc on the g0 stack. We do this so that the g stack 957 // we're currently running on will no longer change. Cuts 958 // the root set down a bit (g0 stacks are not scanned, and 959 // we don't need to scan gc's internal state). We also 960 // need to switch to g0 so we can shrink the stack. 961 systemstack(func() { 962 gcMark(startTime) 963 // Must return immediately. 964 // The outer function's stack may have moved 965 // during gcMark (it shrinks stacks, including the 966 // outer function's stack), so we must not refer 967 // to any of its variables. Return back to the 968 // non-system stack to pick up the new addresses 969 // before continuing. 970 }) 971 972 systemstack(func() { 973 work.heap2 = work.bytesMarked 974 if debug.gccheckmark > 0 { 975 // Run a full non-parallel, stop-the-world 976 // mark using checkmark bits, to check that we 977 // didn't forget to mark anything during the 978 // concurrent mark process. 979 startCheckmarks() 980 gcResetMarkState() 981 gcw := &getg().m.p.ptr().gcw 982 gcDrain(gcw, 0) 983 wbBufFlush1(getg().m.p.ptr()) 984 gcw.dispose() 985 endCheckmarks() 986 } 987 988 // marking is complete so we can turn the write barrier off 989 setGCPhase(_GCoff) 990 gcSweep(work.mode) 991 }) 992 993 mp.traceback = 0 994 casgstatus(curgp, _Gwaiting, _Grunning) 995 996 if traceEnabled() { 997 traceGCDone() 998 } 999 1000 // all done 1001 mp.preemptoff = "" 1002 1003 if gcphase != _GCoff { 1004 throw("gc done but gcphase != _GCoff") 1005 } 1006 1007 // Record heapInUse for scavenger. 
1008 memstats.lastHeapInUse = gcController.heapInUse.load() 1009 1010 // Update GC trigger and pacing, as well as downstream consumers 1011 // of this pacing information, for the next cycle. 1012 systemstack(gcControllerCommit) 1013 1014 // Update timing memstats 1015 now := nanotime() 1016 sec, nsec, _ := time_now() 1017 unixNow := sec*1e9 + int64(nsec) 1018 work.pauseNS += now - work.pauseStart 1019 work.tEnd = now 1020 memstats.gcPauseDist.record(now - work.pauseStart) 1021 atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user 1022 atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us 1023 memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS) 1024 memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow) 1025 memstats.pause_total_ns += uint64(work.pauseNS) 1026 1027 markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm) 1028 work.cpuStats.gcPauseTime += markTermCpu 1029 work.cpuStats.gcTotalTime += markTermCpu 1030 1031 // Accumulate CPU stats. 1032 // 1033 // Pass gcMarkPhase=true so we can get all the latest GC CPU stats in there too. 1034 work.cpuStats.accumulate(now, true) 1035 1036 // Compute overall GC CPU utilization. 1037 // Omit idle marking time from the overall utilization here since it's "free". 1038 memstats.gc_cpu_fraction = float64(work.cpuStats.gcTotalTime-work.cpuStats.gcIdleTime) / float64(work.cpuStats.totalTime) 1039 1040 // Reset assist time and background time stats. 1041 // 1042 // Do this now, instead of at the start of the next GC cycle, because 1043 // these two may keep accumulating even if the GC is not active. 1044 scavenge.assistTime.Store(0) 1045 scavenge.backgroundTime.Store(0) 1046 1047 // Reset idle time stat. 1048 sched.idleTime.Store(0) 1049 1050 // Reset sweep state. 1051 sweep.nbgsweep = 0 1052 sweep.npausesweep = 0 1053 1054 if work.userForced { 1055 memstats.numforcedgc++ 1056 } 1057 1058 // Bump GC cycle count and wake goroutines waiting on sweep. 1059 lock(&work.sweepWaiters.lock) 1060 memstats.numgc++ 1061 injectglist(&work.sweepWaiters.list) 1062 unlock(&work.sweepWaiters.lock) 1063 1064 // Increment the scavenge generation now. 1065 // 1066 // This moment represents peak heap in use because we're 1067 // about to start sweeping. 1068 mheap_.pages.scav.index.nextGen() 1069 1070 // Release the CPU limiter. 1071 gcCPULimiter.finishGCTransition(now) 1072 1073 // Finish the current heap profiling cycle and start a new 1074 // heap profiling cycle. We do this before starting the world 1075 // so events don't leak into the wrong cycle. 1076 mProf_NextCycle() 1077 1078 // There may be stale spans in mcaches that need to be swept. 1079 // Those aren't tracked in any sweep lists, so we need to 1080 // count them against sweep completion until we ensure all 1081 // those spans have been forced out. 1082 sl := sweep.active.begin() 1083 if !sl.valid { 1084 throw("failed to set sweep barrier") 1085 } 1086 1087 systemstack(func() { startTheWorldWithSema() }) 1088 1089 // Flush the heap profile so we can start a new cycle next GC. 1090 // This is relatively expensive, so we don't do it with the 1091 // world stopped. 1092 mProf_Flush() 1093 1094 // Prepare workbufs for freeing by the sweeper. We do this 1095 // asynchronously because it can take non-trivial time. 1096 prepareFreeWorkbufs() 1097 1098 // Free stack spans. This must be done between GC cycles. 
1099 systemstack(freeStackSpans) 1100 1101 // Ensure all mcaches are flushed. Each P will flush its own 1102 // mcache before allocating, but idle Ps may not. Since this 1103 // is necessary to sweep all spans, we need to ensure all 1104 // mcaches are flushed before we start the next GC cycle. 1105 // 1106 // While we're here, flush the page cache for idle Ps to avoid 1107 // having pages get stuck on them. These pages are hidden from 1108 // the scavenger, so in small idle heaps a significant amount 1109 // of additional memory might be held onto. 1110 // 1111 // Also, flush the pinner cache, to avoid leaking that memory 1112 // indefinitely. 1113 systemstack(func() { 1114 forEachP(func(pp *p) { 1115 pp.mcache.prepareForSweep() 1116 if pp.status == _Pidle { 1117 systemstack(func() { 1118 lock(&mheap_.lock) 1119 pp.pcache.flush(&mheap_.pages) 1120 unlock(&mheap_.lock) 1121 }) 1122 } 1123 pp.pinnerCache = nil 1124 }) 1125 }) 1126 // Now that we've swept stale spans in mcaches, they don't 1127 // count against unswept spans. 1128 sweep.active.end(sl) 1129 1130 // Print gctrace before dropping worldsema. As soon as we drop 1131 // worldsema another cycle could start and smash the stats 1132 // we're trying to print. 1133 if debug.gctrace > 0 { 1134 util := int(memstats.gc_cpu_fraction * 100) 1135 1136 var sbuf [24]byte 1137 printlock() 1138 print("gc ", memstats.numgc, 1139 " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ", 1140 util, "%: ") 1141 prev := work.tSweepTerm 1142 for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} { 1143 if i != 0 { 1144 print("+") 1145 } 1146 print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev)))) 1147 prev = ns 1148 } 1149 print(" ms clock, ") 1150 for i, ns := range []int64{ 1151 int64(work.stwprocs) * (work.tMark - work.tSweepTerm), 1152 gcController.assistTime.Load(), 1153 gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(), 1154 gcController.idleMarkTime.Load(), 1155 markTermCpu, 1156 } { 1157 if i == 2 || i == 3 { 1158 // Separate mark time components with /. 1159 print("/") 1160 } else if i != 0 { 1161 print("+") 1162 } 1163 print(string(fmtNSAsMS(sbuf[:], uint64(ns)))) 1164 } 1165 print(" ms cpu, ", 1166 work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ", 1167 gcController.lastHeapGoal>>20, " MB goal, ", 1168 gcController.lastStackScan.Load()>>20, " MB stacks, ", 1169 gcController.globalsScan.Load()>>20, " MB globals, ", 1170 work.maxprocs, " P") 1171 if work.userForced { 1172 print(" (forced)") 1173 } 1174 print("\n") 1175 printunlock() 1176 } 1177 1178 // Set any arena chunks that were deferred to fault. 1179 lock(&userArenaState.lock) 1180 faultList := userArenaState.fault 1181 userArenaState.fault = nil 1182 unlock(&userArenaState.lock) 1183 for _, lc := range faultList { 1184 lc.mspan.setUserArenaChunkToFault() 1185 } 1186 1187 // Enable huge pages on some metadata if we cross a heap threshold. 1188 if gcController.heapGoal() > minHeapForMetadataHugePages { 1189 mheap_.enableMetadataHugePages() 1190 } 1191 1192 semrelease(&worldsema) 1193 semrelease(&gcsema) 1194 // Careful: another GC cycle may start now. 1195 1196 releasem(mp) 1197 mp = nil 1198 1199 // now that gc is done, kick off finalizer thread if needed 1200 if !concurrentSweep { 1201 // give the queued finalizers, if any, a chance to run 1202 Gosched() 1203 } 1204 } 1205 1206 // gcBgMarkStartWorkers prepares background mark worker goroutines. 
These 1207 // goroutines will not run until the mark phase, but they must be started while 1208 // the work is not stopped and from a regular G stack. The caller must hold 1209 // worldsema. 1210 func gcBgMarkStartWorkers() { 1211 // Background marking is performed by per-P G's. Ensure that each P has 1212 // a background GC G. 1213 // 1214 // Worker Gs don't exit if gomaxprocs is reduced. If it is raised 1215 // again, we can reuse the old workers; no need to create new workers. 1216 for gcBgMarkWorkerCount < gomaxprocs { 1217 go gcBgMarkWorker() 1218 1219 notetsleepg(&work.bgMarkReady, -1) 1220 noteclear(&work.bgMarkReady) 1221 // The worker is now guaranteed to be added to the pool before 1222 // its P's next findRunnableGCWorker. 1223 1224 gcBgMarkWorkerCount++ 1225 } 1226 } 1227 1228 // gcBgMarkPrepare sets up state for background marking. 1229 // Mutator assists must not yet be enabled. 1230 func gcBgMarkPrepare() { 1231 // Background marking will stop when the work queues are empty 1232 // and there are no more workers (note that, since this is 1233 // concurrent, this may be a transient state, but mark 1234 // termination will clean it up). Between background workers 1235 // and assists, we don't really know how many workers there 1236 // will be, so we pretend to have an arbitrarily large number 1237 // of workers, almost all of which are "waiting". While a 1238 // worker is working it decrements nwait. If nproc == nwait, 1239 // there are no workers. 1240 work.nproc = ^uint32(0) 1241 work.nwait = ^uint32(0) 1242 } 1243 1244 // gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single 1245 // gcBgMarkWorker goroutine. 1246 type gcBgMarkWorkerNode struct { 1247 // Unused workers are managed in a lock-free stack. This field must be first. 1248 node lfnode 1249 1250 // The g of this worker. 1251 gp guintptr 1252 1253 // Release this m on park. This is used to communicate with the unlock 1254 // function, which cannot access the G's stack. It is unused outside of 1255 // gcBgMarkWorker(). 1256 m muintptr 1257 } 1258 1259 func gcBgMarkWorker() { 1260 gp := getg() 1261 1262 // We pass node to a gopark unlock function, so it can't be on 1263 // the stack (see gopark). Prevent deadlock from recursively 1264 // starting GC by disabling preemption. 1265 gp.m.preemptoff = "GC worker init" 1266 node := new(gcBgMarkWorkerNode) 1267 gp.m.preemptoff = "" 1268 1269 node.gp.set(gp) 1270 1271 node.m.set(acquirem()) 1272 notewakeup(&work.bgMarkReady) 1273 // After this point, the background mark worker is generally scheduled 1274 // cooperatively by gcController.findRunnableGCWorker. While performing 1275 // work on the P, preemption is disabled because we are working on 1276 // P-local work buffers. When the preempt flag is set, this puts itself 1277 // into _Gwaiting to be woken up by gcController.findRunnableGCWorker 1278 // at the appropriate time. 1279 // 1280 // When preemption is enabled (e.g., while in gcMarkDone), this worker 1281 // may be preempted and schedule as a _Grunnable G from a runq. That is 1282 // fine; it will eventually gopark again for further scheduling via 1283 // findRunnableGCWorker. 1284 // 1285 // Since we disable preemption before notifying bgMarkReady, we 1286 // guarantee that this G will be in the worker pool for the next 1287 // findRunnableGCWorker. This isn't strictly necessary, but it reduces 1288 // latency between _GCmark starting and the workers starting. 
1289 1290 for { 1291 // Go to sleep until woken by 1292 // gcController.findRunnableGCWorker. 1293 gopark(func(g *g, nodep unsafe.Pointer) bool { 1294 node := (*gcBgMarkWorkerNode)(nodep) 1295 1296 if mp := node.m.ptr(); mp != nil { 1297 // The worker G is no longer running; release 1298 // the M. 1299 // 1300 // N.B. it is _safe_ to release the M as soon 1301 // as we are no longer performing P-local mark 1302 // work. 1303 // 1304 // However, since we cooperatively stop work 1305 // when gp.preempt is set, if we releasem in 1306 // the loop then the following call to gopark 1307 // would immediately preempt the G. This is 1308 // also safe, but inefficient: the G must 1309 // schedule again only to enter gopark and park 1310 // again. Thus, we defer the release until 1311 // after parking the G. 1312 releasem(mp) 1313 } 1314 1315 // Release this G to the pool. 1316 gcBgMarkWorkerPool.push(&node.node) 1317 // Note that at this point, the G may immediately be 1318 // rescheduled and may be running. 1319 return true 1320 }, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceBlockSystemGoroutine, 0) 1321 1322 // Preemption must not occur here, or another G might see 1323 // p.gcMarkWorkerMode. 1324 1325 // Disable preemption so we can use the gcw. If the 1326 // scheduler wants to preempt us, we'll stop draining, 1327 // dispose the gcw, and then preempt. 1328 node.m.set(acquirem()) 1329 pp := gp.m.p.ptr() // P can't change with preemption disabled. 1330 1331 if gcBlackenEnabled == 0 { 1332 println("worker mode", pp.gcMarkWorkerMode) 1333 throw("gcBgMarkWorker: blackening not enabled") 1334 } 1335 1336 if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker { 1337 throw("gcBgMarkWorker: mode not set") 1338 } 1339 1340 startTime := nanotime() 1341 pp.gcMarkWorkerStartTime = startTime 1342 var trackLimiterEvent bool 1343 if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode { 1344 trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime) 1345 } 1346 1347 decnwait := atomic.Xadd(&work.nwait, -1) 1348 if decnwait == work.nproc { 1349 println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc) 1350 throw("work.nwait was > work.nproc") 1351 } 1352 1353 systemstack(func() { 1354 // Mark our goroutine preemptible so its stack 1355 // can be scanned. This lets two mark workers 1356 // scan each other (otherwise, they would 1357 // deadlock). We must not modify anything on 1358 // the G stack. However, stack shrinking is 1359 // disabled for mark workers, so it is safe to 1360 // read from the G stack. 1361 casGToWaiting(gp, _Grunning, waitReasonGCWorkerActive) 1362 switch pp.gcMarkWorkerMode { 1363 default: 1364 throw("gcBgMarkWorker: unexpected gcMarkWorkerMode") 1365 case gcMarkWorkerDedicatedMode: 1366 gcDrain(&pp.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit) 1367 if gp.preempt { 1368 // We were preempted. This is 1369 // a useful signal to kick 1370 // everything out of the run 1371 // queue so it can run 1372 // somewhere else. 1373 if drainQ, n := runqdrain(pp); n > 0 { 1374 lock(&sched.lock) 1375 globrunqputbatch(&drainQ, int32(n)) 1376 unlock(&sched.lock) 1377 } 1378 } 1379 // Go back to draining, this time 1380 // without preemption. 
1381 gcDrain(&pp.gcw, gcDrainFlushBgCredit) 1382 case gcMarkWorkerFractionalMode: 1383 gcDrain(&pp.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1384 case gcMarkWorkerIdleMode: 1385 gcDrain(&pp.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1386 } 1387 casgstatus(gp, _Gwaiting, _Grunning) 1388 }) 1389 1390 // Account for time and mark us as stopped. 1391 now := nanotime() 1392 duration := now - startTime 1393 gcController.markWorkerStop(pp.gcMarkWorkerMode, duration) 1394 if trackLimiterEvent { 1395 pp.limiterEvent.stop(limiterEventIdleMarkWork, now) 1396 } 1397 if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode { 1398 atomic.Xaddint64(&pp.gcFractionalMarkTime, duration) 1399 } 1400 1401 // Was this the last worker and did we run out 1402 // of work? 1403 incnwait := atomic.Xadd(&work.nwait, +1) 1404 if incnwait > work.nproc { 1405 println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode, 1406 "work.nwait=", incnwait, "work.nproc=", work.nproc) 1407 throw("work.nwait > work.nproc") 1408 } 1409 1410 // We'll releasem after this point and thus this P may run 1411 // something else. We must clear the worker mode to avoid 1412 // attributing the mode to a different (non-worker) G in 1413 // traceGoStart. 1414 pp.gcMarkWorkerMode = gcMarkWorkerNotWorker 1415 1416 // If this worker reached a background mark completion 1417 // point, signal the main GC goroutine. 1418 if incnwait == work.nproc && !gcMarkWorkAvailable(nil) { 1419 // We don't need the P-local buffers here, allow 1420 // preemption because we may schedule like a regular 1421 // goroutine in gcMarkDone (block on locks, etc). 1422 releasem(node.m.ptr()) 1423 node.m.set(nil) 1424 1425 gcMarkDone() 1426 } 1427 } 1428 } 1429 1430 // gcMarkWorkAvailable reports whether executing a mark worker 1431 // on p is potentially useful. p may be nil, in which case it only 1432 // checks the global sources of work. 1433 func gcMarkWorkAvailable(p *p) bool { 1434 if p != nil && !p.gcw.empty() { 1435 return true 1436 } 1437 if !work.full.empty() { 1438 return true // global work available 1439 } 1440 if work.markrootNext < work.markrootJobs { 1441 return true // root scan work available 1442 } 1443 return false 1444 } 1445 1446 // gcMark runs the mark (or, for concurrent GC, mark termination) 1447 // All gcWork caches must be empty. 1448 // STW is in effect at this point. 1449 func gcMark(startTime int64) { 1450 if debug.allocfreetrace > 0 { 1451 tracegc() 1452 } 1453 1454 if gcphase != _GCmarktermination { 1455 throw("in gcMark expecting to see gcphase as _GCmarktermination") 1456 } 1457 work.tstart = startTime 1458 1459 // Check that there's no marking work remaining. 1460 if work.full != 0 || work.markrootNext < work.markrootJobs { 1461 print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n") 1462 panic("non-empty mark queue after concurrent mark") 1463 } 1464 1465 if debug.gccheckmark > 0 { 1466 // This is expensive when there's a large number of 1467 // Gs, so only do it if checkmark is also enabled. 1468 gcMarkRootCheck() 1469 } 1470 1471 // Drop allg snapshot. allgs may have grown, in which case 1472 // this is the only reference to the old backing store and 1473 // there's no need to keep it around. 1474 work.stackRoots = nil 1475 1476 // Clear out buffers and double-check that all gcWork caches 1477 // are empty. 
This should be ensured by gcMarkDone before we 1478 // enter mark termination. 1479 // 1480 // TODO: We could clear out buffers just before mark if this 1481 // has a non-negligible impact on STW time. 1482 for _, p := range allp { 1483 // The write barrier may have buffered pointers since 1484 // the gcMarkDone barrier. However, since the barrier 1485 // ensured all reachable objects were marked, all of 1486 // these must be pointers to black objects. Hence we 1487 // can just discard the write barrier buffer. 1488 if debug.gccheckmark > 0 { 1489 // For debugging, flush the buffer and make 1490 // sure it really was all marked. 1491 wbBufFlush1(p) 1492 } else { 1493 p.wbBuf.reset() 1494 } 1495 1496 gcw := &p.gcw 1497 if !gcw.empty() { 1498 printlock() 1499 print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork) 1500 if gcw.wbuf1 == nil { 1501 print(" wbuf1=<nil>") 1502 } else { 1503 print(" wbuf1.n=", gcw.wbuf1.nobj) 1504 } 1505 if gcw.wbuf2 == nil { 1506 print(" wbuf2=<nil>") 1507 } else { 1508 print(" wbuf2.n=", gcw.wbuf2.nobj) 1509 } 1510 print("\n") 1511 throw("P has cached GC work at end of mark termination") 1512 } 1513 // There may still be cached empty buffers, which we 1514 // need to flush since we're going to free them. Also, 1515 // there may be non-zero stats because we allocated 1516 // black after the gcMarkDone barrier. 1517 gcw.dispose() 1518 } 1519 1520 // Flush scanAlloc from each mcache since we're about to modify 1521 // heapScan directly. If we were to flush this later, then scanAlloc 1522 // might have incorrect information. 1523 // 1524 // Note that it's not important to retain this information; we know 1525 // exactly what heapScan is at this point via scanWork. 1526 for _, p := range allp { 1527 c := p.mcache 1528 if c == nil { 1529 continue 1530 } 1531 c.scanAlloc = 0 1532 } 1533 1534 // Reset controller state. 1535 gcController.resetLive(work.bytesMarked) 1536 } 1537 1538 // gcSweep must be called on the system stack because it acquires the heap 1539 // lock. See mheap for details. 1540 // 1541 // The world must be stopped. 1542 // 1543 //go:systemstack 1544 func gcSweep(mode gcMode) { 1545 assertWorldStopped() 1546 1547 if gcphase != _GCoff { 1548 throw("gcSweep being done but phase is not GCoff") 1549 } 1550 1551 lock(&mheap_.lock) 1552 mheap_.sweepgen += 2 1553 sweep.active.reset() 1554 mheap_.pagesSwept.Store(0) 1555 mheap_.sweepArenas = mheap_.allArenas 1556 mheap_.reclaimIndex.Store(0) 1557 mheap_.reclaimCredit.Store(0) 1558 unlock(&mheap_.lock) 1559 1560 sweep.centralIndex.clear() 1561 1562 if !_ConcurrentSweep || mode == gcForceBlockMode { 1563 // Special case synchronous sweep. 1564 // Record that no proportional sweeping has to happen. 1565 lock(&mheap_.lock) 1566 mheap_.sweepPagesPerByte = 0 1567 unlock(&mheap_.lock) 1568 // Sweep all spans eagerly. 1569 for sweepone() != ^uintptr(0) { 1570 sweep.npausesweep++ 1571 } 1572 // Free workbufs eagerly. 1573 prepareFreeWorkbufs() 1574 for freeSomeWbufs(false) { 1575 } 1576 // All "free" events for this mark/sweep cycle have 1577 // now happened, so we can make this profile cycle 1578 // available immediately. 1579 mProf_NextCycle() 1580 mProf_Flush() 1581 return 1582 } 1583 1584 // Background sweep. 1585 lock(&sweep.lock) 1586 if sweep.parked { 1587 sweep.parked = false 1588 ready(sweep.g, 0, true) 1589 } 1590 unlock(&sweep.lock) 1591 } 1592 1593 // gcResetMarkState resets global state prior to marking (concurrent 1594 // or STW) and resets the stack scan state of all Gs. 

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so lock to make sure
	// allgs doesn't change.
	forEachG(func(gp *g) {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	})

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		for i := range ha.pageMarks {
			ha.pageMarks[i] = 0
		}
	}

	work.bytesMarked = 0
	work.initialHeapLive = gcController.heapLive.Load()
}

// Hooks for other packages

var poolcleanup func()
var boringCaches []unsafe.Pointer // for crypto/internal/boring

//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
func boring_registerCache(p unsafe.Pointer) {
	boringCaches = append(boringCaches, p)
}

func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// clear boringcrypto caches
	for _, p := range boringCaches {
		atomicstorep(p, nil)
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pool.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}
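
// For illustration (assuming buf is large enough to hold the digits):
//
//	itoaDiv(buf, 12345, 2)  // -> "123.45"
//	fmtNSAsMS(buf, 1234567) // -> "1.2"  (below 10ms: two digits of precision)
//	fmtNSAsMS(buf, 42e6)    // -> "42"   (at or above 10ms: whole milliseconds)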

// Helpers for testing GC.

// gcTestMoveStackOnNextCall causes the stack to be moved on a call
// immediately following the call to this. It may not work correctly
// if any other work appears after this call (such as returning).
// Typically the following call should be marked go:noinline so it
// performs a stack check.
//
// In rare cases this may not cause the stack to move, specifically if
// there's a preemption between this call and the next.
func gcTestMoveStackOnNextCall() {
	gp := getg()
	gp.stackguard0 = stackForceMove
}

// gcTestIsReachable performs a GC and returns a bit set where bit i
// is set if ptrs[i] is reachable.
func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
	// This takes the pointers as unsafe.Pointers in order to keep
	// them live long enough for us to attach specials. After
	// that, we drop our references to them.

	if len(ptrs) > 64 {
		panic("too many pointers for uint64 mask")
	}

	// Block GC while we attach specials and drop our references
	// to ptrs. Otherwise, if a GC is in progress, it could mark
	// them reachable via this function before we have a chance to
	// drop them.
	semacquire(&gcsema)

	// Create reachability specials for ptrs.
	specials := make([]*specialReachable, len(ptrs))
	for i, p := range ptrs {
		lock(&mheap_.speciallock)
		s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
		unlock(&mheap_.speciallock)
		s.special.kind = _KindSpecialReachable
		if !addspecial(p, &s.special) {
			throw("already have a reachable special (duplicate pointer?)")
		}
		specials[i] = s
		// Make sure we don't retain ptrs.
		ptrs[i] = nil
	}

	semrelease(&gcsema)

	// Force a full GC and sweep.
	GC()

	// Process specials.
	for i, s := range specials {
		if !s.done {
			printlock()
			println("runtime: object", i, "was not swept")
			throw("IsReachable failed")
		}
		if s.reachable {
			mask |= 1 << i
		}
		lock(&mheap_.speciallock)
		mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
		unlock(&mheap_.speciallock)
	}

	return mask
}

// gcTestPointerClass returns the category of what p points to, one of:
// "heap", "stack", "data", "bss", "other". This is useful for checking
// that a test is doing what it's intended to do.
//
// This is nosplit simply to avoid extra pointer shuffling that may
// complicate a test.
//
//go:nosplit
func gcTestPointerClass(p unsafe.Pointer) string {
	p2 := uintptr(noescape(p))
	gp := getg()
	if gp.stack.lo <= p2 && p2 < gp.stack.hi {
		return "stack"
	}
	if base, _, _ := findObject(p2, 0, 0); base != 0 {
		return "heap"
	}
	for _, datap := range activeModules() {
		if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
			return "data"
		}
		if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 <= datap.enoptrbss {
			return "bss"
		}
	}
	KeepAlive(p)
	return "other"
}
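
// A minimal sketch of how these helpers might be combined from a test in this
// package. The test name and locals below are illustrative only; new(*int) is
// used so the allocations are scannable and do not go through the tiny
// allocator, which could co-locate them with other live objects.
//
//	func testGCReachability(t *testing.T) {
//		kept := new(*int)    // remains referenced via KeepAlive below
//		dropped := new(*int) // no reference survives the call
//		if gcTestPointerClass(unsafe.Pointer(kept)) != "heap" {
//			t.Fatal("expected a heap pointer")
//		}
//		mask := gcTestIsReachable(unsafe.Pointer(kept), unsafe.Pointer(dropped))
//		if mask&1 == 0 {
//			t.Error("kept object reported unreachable")
//		}
//		if mask&2 != 0 {
//			t.Error("dropped object reported reachable")
//		}
//		KeepAlive(kept)
//	}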