PJ Malloy (@thepudds)
January 2025
Discussion at https://go.dev/issue/74299.
CL stack at https://go.dev/cl/673695.
This covers a design and initial implementation for a mechanism
to free user memory eagerly to the Go runtime. The freed memory can be reused
immediately in subsequent allocations without needing to wait for the GC cycle to progress.
The mechanism is implemented in a new runtime function, runtime.freegc, which is inserted
by the compiler in cases where it can prove the memory is no longer used.
A call to runtime.freegc records memory as dead and tracks it for
later reuse. The allocator is updated to be able to safely
reuse these heap objects, and care is taken so that runtime.freegc is able
to coexist and cooperate with the concurrently running garbage collector.
Updates to the compiler include automatically inserting calls
to runtime.freegc such as when the compiler proves heap allocated memory
does not live past a function return and therefore may be safely freed
on function exit.
The compiler is also updated to do a form of alias analysis so that even escaping heap memory can be immediately freed once it is dead in some cases, such as the intermediate memory for appends of a non-aliased slice in a loop.
When enabled, runtime.freegc allows faster reuse of user memory,
reducing the allocation rate from the GC's perspective, and thus
reducing the frequency of GC cycles, total GC CPU usage, and how often
write barriers are enabled. It can also produce a more cache-friendly
allocation pattern for user code.
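To make the kind of garbage this targets concrete, here is a small standalone illustration (not part of the design) that uses testing.AllocsPerRun to count the intermediate backing arrays an ordinary append loop produces today; with runtime.freegc, those intermediates could be reused immediately rather than waiting for a GC cycle:

```go
package main

import (
	"fmt"
	"testing"
)

// buildSlice appends n ints one at a time, forcing several
// backing-array growths; each old backing array becomes garbage
// that today must wait for the GC to be reclaimed.
func buildSlice(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	allocs := testing.AllocsPerRun(100, func() {
		_ = buildSlice(1000)
	})
	// Multiple allocations per run: one per growth step.
	fmt.Printf("allocations per run: %.0f\n", allocs)
}
```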
This design composes well with the compiler's historic ability to do stack allocation of statically-known sizes by filling in gaps that stack allocation cannot currently handle. It also composes well with possible future improvements, such as an enhancement where w.Write(b) would allow b to be stack allocated, but which would be limited to helping only statically-sized slices unless there is an ability to hook into something like runtime.freegc. Other improvements also become more broadly applicable and hence more worthwhile if runtime.freegc exists. There is also a reasonably rich group of direct follow-on targets for runtime.freegc
that likely become available, and it becomes another tool in the toolbox for
the compiler and runtime.
Today, users can implement their own free lists and achieve some of the memory reuse
benefits described here. However, runtime.freegc is implemented at a low level of
the runtime and has various advantages like being able to reuse memory across types
more safely and easily, more directly hook into what the GC is doing, more efficiently
track reusable objects, and even change the GC shape of reusable objects, which is
something even unsafe user code cannot do.
Updating the compiler translates to the reuse being both safer and more automated than hand-written code using a custom free list.
And of course, the goal is to speed up safe, idiomatic Go code without requiring a variety of ad-hoc user written free lists.
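For comparison, here is a sketch of the kind of hand-written reuse users resort to today (a typical sync.Pool-based free list; the names here are illustrative, not part of this proposal). It works, but it is type-specific, manual, and easy to get wrong, since the caller must remember to return objects and must not retain aliases afterward:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hand-rolled free list. Reuse only works for this one
// type, and correctness depends entirely on caller discipline.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func process(data string) int {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset()
	b.WriteString(data)
	n := b.Len()
	bufPool.Put(b) // caller must not use b after Put
	return n
}

func main() {
	fmt.Println(process("hello")) // prints 5
}
```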
In an attempt to help triangulate on what is possible for the runtime and compiler to do together to reduce how much work the GC must do, this document proposes updates to:
The runtime: implement runtime.freegc and associated runtime entry-points.
Freed objects are recorded in one of two types of new free lists, and
are then immediately reusable for the next allocation in the same span class
via updates to the allocator.
The compiler: automatically insert runtime calls to track and free slices
allocated via make when the compiler can prove it is safe to do so. Also implemented
is automatically inserting calls to free the intermediate memory
from repeated calls to append in a loop (without freeing the final slice) when it
can prove the memory is unaliased. This is done regardless of whether
the final slice escapes (for example, even if the final slice is returned from
the function that built the slice using append in a loop).
A sample benchmark included below shows strings.Builder operating
roughly 2x faster.
An experimental modification to reflect to use runtime.freegc
and then using that reflect with json/v2 gave reported memory
allocation reductions of -43.7%, -32.9%, -21.9%, -22.0%, -1.0%
for the 5 official real-world unmarshalling benchmarks from
go-json-experiment/jsonbench by the authors of json/v2, covering
the CanadaGeometry through TwitterStatus datasets.
A separate CL updates the builtin maps so that, during growth and split operations, logically dead table memory that is not held by a map iterator is freed.
Note: there is no current intent to modify the standard library to have
explicit calls to runtime.freegc, and of course such an ability
would never be exposed to end-user code. The reflect and other standard
library modifications were experiments that were helpful while designing
and implementing the runtime side prior to some of the compiler work.
Additional steps that are natural follow-ons (but not yet attempted) include:
Recognizing append in loops that start with a slice of unknown provenance,
such as a function parameter, while still being able to automatically
free intermediate memory starting with the second slice growth if the slice
does not otherwise become aliased.
Recognizing append in an iterator-based loop to be able to prove it is
safe to free unaliased intermediate memory, such as in the patterns
shown in slices.Collect. (Update: this is now implemented but not yet in review.)
Possibly extend to freeing more than just slices. Slices are a natural first step, but the large majority of the work implemented is independent of whether the heap object represents a slice. For example, on function exit, we likely will be able to free the table memory for non-escaping maps.
There are currently four primary runtime entry-points with working implementations, plus some additional variations on these that are not listed here. There are additional details with more draft API doc towards the end of this document.
freegc is the primary entry-point for freeing user memory. It is emitted directly
by the compiler and also underlies the implementation of the other free* runtime
entry-points.
func freegc(ptr unsafe.Pointer, size uintptr, noscan bool)
growsliceNoAlias is used during an append. It is like the existing growslice,
but only for the case where we know there are no aliases for
the backing memory of the slice, and hence we can free the old backing
memory as the slice is updated with new, larger backing memory. Importantly, this
works even if the slice escapes, such as if the resulting slice is returned
from a function. growsliceNoAlias does both the grow and the free.
func growsliceNoAlias(oldPtr *any, newLen, oldCap, num int, et *byte) (ary []any)
makeslicetracked64 (and variations) allocates a heap object and records
or "tracks" the allocated object's pointer. makeslicetracked64 is like
the existing makeslice64, but fills in a trackedObjs for later
use by freegcTracked. The compiler is responsible for reserving stack space
for a properly sized backing array for trackedObj based on how many objects
will be tracked, which gives the runtime the space it needs to write down
its tracking information. (An alternative not yet attempted might be to use
the standard makeslice64 and have compiler-generated code update the tracking
information.)
func makeslicetracked64(et *_type, len64 int64, cap64 int64, trackedObjs *[]trackedObj) unsafe.Pointer
freegcTracked is automatically inserted by the compiler in cases where we know the
lifetime of a heap-allocated object is tied to a certain scope (usually a function),
which allows for tracked heap objects to be automatically freed as a scope is exited.
freegcTracked must be paired with functions like makeslicetracked64.
func freegcTracked(trackedObjs *[]trackedObj)
An additional runtime API was temporarily used by some CLs as a
stand-in for makeslicetracked64 for convenience, but there was no
plan to use it beyond helping to bootstrap some of the work:
// TEMPORARY API.
func trackSlice(sl unsafe.Pointer, trackedObjs *[]trackedObj)
freegc vs. freegcTracked

This section summarizes some differences between freegc and freegcTracked
and attempts to explain why freegcTracked exists at all.
First, note that freegc accepts an unsafe.Pointer and hence keeps the pointer
alive. It therefore could be a pessimization in some cases
(such as a long-lived function or function that straddles GC transitions)
if the caller does not call freegc before or roughly when
the liveness analysis of the compiler would otherwise have determined
the heap object is reclaimable by the GC.
For certain allocations or in certain loops, we directly insert freegc
via the compiler almost exactly at the point where the memory goes dead (or
similarly insert a layer on top of freegc like growsliceNoAlias).
freegc says that the memory is immediately available for reuse,
which is especially valuable in a loop or when reusing memory is otherwise
useful without waiting to exit a function.
However, we would be giving up on many valuable targets if we did not also take advantage of escape analysis results (based on its current implementation and based on some escape analysis modifications as part of this work).
Escape analysis today provides primarily function-scoped lifetime results.
If we merely added freegc to function exit points to free objects with function-scoped
lifetimes, that would likely be a pessimization if the object otherwise would have
been dead before the function exit.
This is part of the reason freegcTracked exists and takes a different approach
than freegc. Our tracking of an allocation does not keep the tracked memory alive,
therefore avoiding the liveness pessimization described above when used at scope
exit points. Another benefit of freegcTracked is that it frees multiple
objects at scope exit, which means fewer calls into the runtime and less emitted
code (including compared to if we were to instead have one freegc
per object at scope exit).
In addition, a given pointer today can go dead at multiple places in a function,
which would also add complexity if we were to directly use freegc automatically
wherever the pointer goes dead, in contrast to freegcTracked effectively being
a single clean up operation as the function exits.
Another difference is freegcTracked is essentially a two-step process -- first
there must be a tracked allocation, and later there is a freegcTracked call
that operates on those tracked allocation(s). In contrast, freegc can
operate on any heap allocation as long as we know it is safe to do so.
Summarizing freegc vs. freegcTracked:
freegc                                  freegcTracked
--------------------------------------  ----------------------------------------
frees single heap object                can free multiple heap objects
lower-level primitive                   implemented on top of freegc
operates on any heap allocation         paired with tracked allocation function
call keeps pointer alive                tracking does not keep pointer alive
primarily intended for use the moment   primarily intended for use at
a pointer is otherwise dead             scope exit
The initial implementation of tracking avoided keeping the pointer alive
via an approach of storing a uintptr on the stack along with a sequence number (which is
incremented as the GC cycle progresses) to then later know whether it is
still valid to use the stored uintptr to free its heap object as the scope is exited.
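As a rough model of that original sequence-number scheme (illustrative Go only; the real implementation stores a uintptr on the goroutine stack and the runtime advances the sequence as the GC cycle progresses):

```go
package main

import "fmt"

// gcSeq models a counter the runtime advances as the GC cycle
// progresses. Entries recorded under an older sequence number are
// no longer trusted: the GC may have already reclaimed them.
var gcSeq uint64

type trackedObj struct {
	addr uintptr // recorded without keeping the object alive
	seq  uint64  // gcSeq at the time of tracking
}

func track(addr uintptr) trackedObj {
	return trackedObj{addr: addr, seq: gcSeq}
}

// freeTracked frees only entries whose sequence number still
// matches; stale entries are skipped (the GC owns them now).
func freeTracked(objs []trackedObj) (freed int) {
	for _, o := range objs {
		if o.seq == gcSeq {
			freed++ // in the runtime: hand o.addr to the free list
		}
	}
	return freed
}

func main() {
	objs := []trackedObj{track(0x1000), track(0x2000)}
	gcSeq++ // a GC transition happens before scope exit
	fmt.Println(freeTracked(objs)) // prints 0: the tracking went stale
}
```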
While reviewing this work, Keith Randall recently suggested an
improvement to the implementation of freegcTracked where the runtime zeros out
the tracked pointers on the stack as the GC cycle progresses, which is somewhat
similar in effect to a weak pointer. The GC-related portions of the runtime
would take a more active role now in the tracking, rather than the freegcTracked code
inferring what has happened based on the sequence number. This new approach might
touch more pieces of the code base, but it is better overall, and the intent is to now
implement this improvement. The general characteristics of freegcTracked
described above still apply (except now there is no sequence number), and most
of the associated implementation is unaffected.
There are more details on these functions at the bottom of this document, including draft API doc.
The changes described in this subsection do not rely on runtime.freegcTracked,
but do rely on runtime.freegc and runtime.growsliceNoAlias.
As part of this work, the compiler has been updated to prove whether certain variables
are aliased or not, and to also examine append calls to see if it can prove
whether the slice is not aliased.
If so, the compiler automatically inserts a call to the new
runtime function runtime.growsliceNoAlias (briefly described above),
which frees the old backing memory after slice growth is complete
and the old storage is logically dead.
This works with multiple append calls for the same slice,
including inside loops, and the final slice memory can be
escaping, such as returned from the function. (In that case,
the final slice value is of course not freed.)
An example target is:
func f1(input []int) []int {
var s []int
for _, x := range input {
s = append(s, g(x))
}
return s
}
After proving that the slice s cannot be aliased in that loop,
the compiler converts that roughly to:
func f1(input []int) []int {
var s []int
for _, x := range input {
// s cannot be aliased here.
// The old backing array for s is freed inside growsliceNoAlias after growth.
// The final slice is not freed. It is returned to the caller of f1.
// Note: this is simplified from the actual signature for growsliceNoAlias.
s = growsliceNoAlias(s, g(x))
}
return s
}
Note that proving a slice is unaliased must be careful in the presence
of loops (or gotos that form unstructured loops). In this next example,
we do not treat s as entirely unaliased because it is aliased on
the second pass through the loop:
var alias []int
func f2() {
var s []int
for i := range 10 {
s = append(s, i)
alias = s
}
}
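To see why the alias matters, this standalone snippet (ordinary Go, no runtime changes; the classic loop form is used in place of range-over-int) shows that after an iteration, the global alias shares the backing array with s, so freeing the old backing memory on a later growth would leave alias pointing at reclaimed memory:

```go
package main

import "fmt"

var alias []int

// f2 mirrors the example above: s becomes aliased on each pass
// through the loop, so its backing memory must not be freed.
func f2() {
	var s []int
	for i := 0; i < 10; i++ {
		s = append(s, i)
		alias = s
	}
	// alias and s now share the same backing array.
	s[0] = 42
}

func main() {
	f2()
	fmt.Println(alias[0]) // prints 42: alias observes writes through s
}
```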
For some additional details on the overall approach, see the new
src/cmd/compile/internal/escape/alias.go file in
CL 710015 that is part of this work (especially
the (*aliasAnalysis).analyze method).
By tracking aliasing and renamings across function inlinings (as
well as some support for recognizing whether a goto is an unstructured loop),
the compiler is also able to automatically free intermediate memory from appends
used with iterators, such as in:
s := slices.Collect(seq)
or
var s2 []int
s2 = slices.AppendSeq(s2, seqA)
s2 = slices.AppendSeq(s2, seqB)
s2 = slices.AppendSeq(s2, seqC)
(An earlier version of this work was only able to handle these iterator cases
via manual edits to the slices package; that is no longer needed).
make

The changes described in this subsection do rely on runtime.freegcTracked.
Additional CLs update the compiler to automatically insert free calls for slices allocated via
make when the compiler can prove it is safe to do so.
Consider for example the following code:
func f1(size int) {
s := make([]int64, size) // heap allocated if > 32 bytes
_ = s
}
In today's compiler, the backing array for s is not marked as escaping,
but will be heap allocated if larger than 32 bytes.
As part of this work, the compiler has been updated so that the
backing array for s is recognized as non-escaping but unknown size and
it is valid to free it when f1 is done.
The compiler automatically transforms f1 to roughly:
func f1(size int) {
var freeablesArr [1]struct { // compiler counts how many are needed (one, in this case)
ptr uintptr
// Possibly 1-2 additional fields in future.
}
freeables := freeablesArr[:0]
defer freegcTracked(&freeables)
s := make([]int64, size)
trackSlice(unsafe.Pointer(&s), &freeables) // NOTE: only temporarily two runtime calls
_ = s
}
TODO: remove the trackSlice from this example, which was only a temporary API used to help
bootstrap this work.
TODO: mention make-like uses of append, such as append([]byte(nil), make([]byte, n)...).
The freeablesArr is stored on the stack and automatically sized by the compiler based
on the number of target sites. The defer is currently inserted in a way so that it is
an open-coded defer and hence has low performance impact, though the plan is to switch
to inserting the calls without relying on defer.
For more details on freegcTracked, see the high-level description above or the API doc below.
The transformed code just above shows what is currently implemented, but two items seen there might change as the implementation is polished:
freegcTracked currently takes a slice for convenience, but we could make that tighter.
The intended allocation entry point is makeslicetracked64 (for example, in f1 above),
but the compiler initially inserted calls to a new runtime.trackSlice instead,
which initially seemed slightly easier to implement on the compiler side.
The plan is to have the compiler use makeslicetracked64 instead.
If there are multiple target sites within a function, such as in f2 here:
func f2(size1 int, size2 int, x bool) {
s1 := make([]int64, size1) // heap allocated if > 32 bytes
if x {
s2 := make([]int64, size2) // heap allocated if > 32 bytes
_ = s2
}
_ = s1
}
The compiler counts the max number of sites (not sensitive to conditionals),
and sizes freeablesArr appropriately (two element array in this case),
with a single call to freegcTracked per function:
func f2(size1 int, size2 int, x bool) {
var freeablesArr [2]struct { // compiler counts how many are needed (two, in this case)
ptr uintptr
// Possibly 1-2 additional fields in future.
}
freeables := freeablesArr[:0]
defer freegcTracked(&freeables) // only a single freegcTracked call, regardless of how many tracked sites
s1 := make([]int64, size1)
trackSlice(unsafe.Pointer(&s1), &freeables)
if x {
s2 := make([]int64, size2)
trackSlice(unsafe.Pointer(&s2), &freeables) // NOTE: only temporarily two runtime calls
_ = s2
}
_ = s1
}
TODO: remove the trackSlice from this example, which was only a temporary API used to help
bootstrap this work.
The max number of automatically inserted call sites is bounded
based on the count of make calls in a given function.
Inside for loops or when a goto looks like it is
being used in an unstructured loop, the compiler
currently skips automatically inserting these calls.
However, a modest extension (and the current plan) is to have the compiler instead use a different runtime entry point for freeing in loops. The idea is to allow heap-allocated slice memory to be reclaimed on each loop iteration when it returns to the same allocation site, and when it's safe to do so. Because it's the same allocation site, this doesn't require unbounded or dynamic space in the trackedObj slice. (The compiler can already prove it is safe and the runtime entry point is prototyped.)
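A rough model of that per-site reuse idea (illustrative only; the real design operates inside the runtime allocator): because each iteration returns to the same allocation site, a single cached buffer per site suffices, with no dynamic tracking space:

```go
package main

import "fmt"

// siteCache models one reusable buffer per allocation site.
// When the loop returns to the same site with a compatible request,
// the previous iteration's (now dead) buffer is reused.
type siteCache struct {
	buf    []byte
	reused int
}

func (c *siteCache) alloc(n int) []byte {
	if cap(c.buf) >= n {
		c.reused++
		return c.buf[:n] // reuse last iteration's dead buffer
	}
	c.buf = make([]byte, n)
	return c.buf
}

func main() {
	var site siteCache
	for i := 0; i < 100; i++ {
		b := site.alloc(64) // same allocation site each iteration
		b[0] = byte(i)      // buffer is dead by the iteration's end
	}
	fmt.Println(site.reused) // prints 99: only the first iteration allocates
}
```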
strings.Builder

As an early experiment, we updated strings.Builder to have manual calls
to runtime.freegc, though there is no intent to actually manually
modify strings.Builder.
strings.Builder is a somewhat
low-level part of the stdlib that already helps encapsulate some
performance-related trickery, including using unsafe and
internal/bytealg.MakeNoZero. A strings.Builder already owns the in-progress
buffer that is being built up, until it returns the (usually final) string.
The existing internal comment says:
// External users should never get direct access to this buffer, since
// the slice at some point will be converted to a string using unsafe,
// also data between len(buf) and cap(buf) might be uninitialized.
buf []byte
If we update strings.Builder to explicitly call runtime.freegc on
that buf after a resize operation (but without freeing the usually final
incarnation of buf that will be returned to the user as a string via Builder.String), we
can see some nice benchmark results on the existing strings benchmarks
that call strings.(*Builder).Write N times and then call String. Here, the
(uncommon) case of a single Write is not helped (since the buffer never
resizes after the first allocation if there is only one Write), but the
impact grows such that it is up to ~2x faster as more Write calls cause
more resize operations:
│ disabled.out │ new-free-20.txt │
│ sec/op │ sec/op vs base │
BuildString_Builder/1Write_36Bytes_NoGrow-4 55.82n ± 2% 55.86n ± 2% ~ (p=0.794 n=20)
BuildString_Builder/2Write_36Bytes_NoGrow-4 125.2n ± 2% 115.4n ± 1% -7.86% (p=0.000 n=20)
BuildString_Builder/3Write_36Bytes_NoGrow-4 224.0n ± 1% 188.2n ± 2% -16.00% (p=0.000 n=20)
BuildString_Builder/5Write_36Bytes_NoGrow-4 239.1n ± 9% 205.1n ± 1% -14.20% (p=0.000 n=20)
BuildString_Builder/8Write_36Bytes_NoGrow-4 422.8n ± 3% 325.4n ± 1% -23.04% (p=0.000 n=20)
BuildString_Builder/10Write_36Bytes_NoGrow-4 436.9n ± 2% 342.3n ± 1% -21.64% (p=0.000 n=20)
BuildString_Builder/100Write_36Bytes_NoGrow-4 4.403µ ± 1% 2.381µ ± 2% -45.91% (p=0.000 n=20)
BuildString_Builder/1000Write_36Bytes_NoGrow-4 48.28µ ± 2% 21.38µ ± 2% -55.71% (p=0.000 n=20)
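The benchmarked pattern is simply N Writes followed by String; for reference, its shape is roughly the following (a sketch mirroring the benchmark names above, not the benchmark source itself):

```go
package main

import (
	"fmt"
	"strings"
)

// buildString mirrors the BuildString_Builder benchmarks: write a
// 36-byte chunk n times, then materialize the string. Each internal
// resize of the builder's buffer strands an old backing array, which
// is what the experimental freegc call reclaims immediately.
func buildString(n int) string {
	chunk := strings.Repeat("x", 36)
	var b strings.Builder
	for i := 0; i < n; i++ {
		b.WriteString(chunk)
	}
	return b.String()
}

func main() {
	fmt.Println(len(buildString(100))) // prints 3600
}
```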
There is no plan to hand insert calls to runtime.freegc within strings.Builder
(and there are alternatives that could be used within
strings.Builder, such as a segmented buffer and/or a sync.Pool,
as used within cmd/compile). We are mainly using it here as a
small-ish concrete example.
In addition, after doing the hand edited strings.Builder prototype,
the compiler work has since progressed further, and I have some hope for
the compiler to be able to automatically free intermediate memory
for many common uses of strings.Builder via growsliceNoAlias
(with roughly 2 of 3 steps towards that goal currently implemented;
strings.Builder is more complex than an ordinary use of append
for a few reasons).
There are other potential optimizations for strings.Builder,
like allocating an initial buf (e.g., 64 bytes)
on the first Write, faster growth than current 1.25x growth for larger
slices, and "right sizing" the final string; an earlier cut did
implement those optimizations.
For the cases where a normal allocation is happening without any reuse, some initial micro-benchmarks suggest the impact of these changes could be small to negligible (at least with GOAMD64=v3):
goos: linux
goarch: amd64
pkg: runtime
cpu: AMD EPYC 7B13
│ base-512M-v3.bench │ ps16-512M-goamd64-v3.bench │
│ sec/op │ sec/op vs base │
Malloc8-16 11.01n ± 1% 10.94n ± 1% -0.68% (p=0.038 n=20)
Malloc16-16 17.15n ± 1% 17.05n ± 0% -0.55% (p=0.007 n=20)
Malloc32-16 18.65n ± 1% 18.42n ± 0% -1.26% (p=0.000 n=20)
MallocTypeInfo8-16 18.63n ± 0% 18.36n ± 0% -1.45% (p=0.000 n=20)
MallocTypeInfo16-16 22.32n ± 0% 22.65n ± 0% +1.50% (p=0.000 n=20)
MallocTypeInfo32-16 23.37n ± 0% 23.89n ± 0% +2.23% (p=0.000 n=20)
geomean 18.02n 18.01n -0.05%
This section lists the runtime APIs that are automatically inserted by the compiler.
Note: the term "logically dead" here is distinct from the compiler's liveness analysis. The runtime free and reuse implementations must be robust against sequences such as: a first alloc of a given pointer; the GC observes and queues the pointer; runtime.free; the GC dequeues the pointer for scanning; the runtime reuses the same pointer -- all within a single GC phase. (TODO: more precise term than "logically dead"?)
// freegc records that a heap object is reusable and available for
// immediate reuse in a subsequent mallocgc allocation, without
// needing to wait for the GC cycle to progress.
//
// The information is recorded in a free list stored in the
// current P's mcache. The caller must pass in the user size
// and whether the object has pointers, which allows a faster free
// operation.
//
// freegc must be called by the effective owner of ptr who knows
// the pointer is logically dead, with no possible aliases that might
// be used past that moment. In other words, ptr must be the
// last and only pointer to its referent.
//
// The intended callers are the runtime and compiler-generated calls.
//
// ptr must point to a heap object or into the current g's stack,
// in which case freegc is a no-op. In particular, ptr must not point
// to memory in the data or bss sections, which is partially enforced.
// For objects with a malloc header, ptr should point mallocHeaderSize bytes
// past the base; otherwise, ptr should point to the base of the heap object.
// In other words, ptr should be the same pointer that was returned by mallocgc.
//
// In addition, the caller must know that ptr's object has no specials, such
// as might have been created by a call to SetFinalizer or AddCleanup.
// (Internally, the runtime deals appropriately with internally-created
// specials, such as specials for memory profiling).
//
// If the size of ptr's object is less than 16 bytes or greater than
// 32KiB - gc.MallocHeaderSize bytes, freegc is currently a no-op.
//
// freegc must only be called in alloc-safe places.
//
// Note that freegc accepts an unsafe.Pointer and hence keeps the pointer
// alive. It therefore could be a pessimization in some cases (such
// as a long-lived function) if the caller does not call freegc before
// or roughly when the liveness analysis of the compiler
// would otherwise have determined ptr's object is reclaimable by the GC.
func freegc(ptr unsafe.Pointer, size uintptr, noscan bool) bool
// growsliceNoAlias is like growslice but only for the case where
// we know that oldPtr is not aliased.
//
// In other words, the caller must know that there are no other references
// to the backing memory of the slice being grown aside from the slice header
// that will be updated with new backing memory when growsliceNoAlias
// returns, and therefore oldPtr must be the only pointer to its referent
// aside from the slice header updated by the returned slice.
func growsliceNoAlias(oldPtr unsafe.Pointer, newLen, oldCap, num int, et *_type) slice
// makeslicetracked64 is like the existing makeslice64, but the caller
// also passes in a pointer to a trackedObj slice that allows the runtime
// to track the heap-allocated backing array and possibly later
// free the tracked objects via freegcTracked. The contents of
// the trackedObj slice are opaque to the caller, but the caller
// provides the space for the trackedObj slice.
//
// Currently, the only intended caller is code inserted by
// the compiler, which only does so when escape analysis can
// prove the associated pointer has a known scope -- currently,
// when it can prove the pointer does not outlive the current
// function. This effectively creates a scope for the
// associated heap pointers. The compiler also arranges for
// freegcTracked to be called on function exit, so that
// the heap objects are automatically freed when leaving the
// scope. The compiler also arranges for the trackedObj slice's
// backing array to be placed on the user's normal goroutine
// stack to provide storage for the tracking information,
// with that storage matching the lifetime of the tracked
// heap objects (that is, both match the lifetime of
// the function call).
func makeslicetracked64(et *_type, len64 int64, cap64 int64, trackedObjs *[]trackedObj) unsafe.Pointer
// freegcTracked marks as reusable the objects allocated by
// calls to makeslicetracked64 and tracked in the trackedObjs
// slice, unless the GC has already possibly reclaimed
// those objects.
//
// The trackedObj slice does not keep the associated pointers
// alive past the start of a mark cycle, and
// freegcTracked becomes a no-op if the GC cycle
// has progressed such that the GC might have already reclaimed
// the tracked pointers before freegcTracked was called.
// This might happen for example if a function call lasts
// multiple GC cycles.
//
// The compiler statically counts the maximum number of
// possible makeslicetracked64 calls in a function
// in order to statically know how much user
// stack space is needed for the trackedObj slice.
func freegcTracked(trackedObjs *[]trackedObj)
// trackedObj is used to track a heap-allocated object that
// has a proven scope and can potentially be freed by freegcTracked.
//
// trackedObj does not keep the tracked object alive.
//
// trackedObj is opaque outside the allocation and free code.
//
// TODO: update description to describe runtime zeroing.
//
// TODO: there is a plan to allow a trackedObj to have specials,
// including user-created via SetFinalizer or AddCleanup, but
// not yet implemented. (In short -- update runtime.addspecial
// to update a bitmap of objIndexes that have specials so
// that the fast path for allocation & free does not need to
// walk an mspan's specials linked list, and then a trackedObj with
// specials will not be reused. An alternative might be to
// sweep the span and process the specials after reaching
// the end of an mspan in the mcache if there are reusable
// objects with specials, but that might be harder or perhaps
// infeasible. More generally, sweeping sooner than the
// normal GC but not in the allocation/free fast path (perhaps
// in a goroutine managed by the runtime), might be worth
// exploring for other reasons than just specials).
type trackedObj struct {
ptr uintptr // tracked object
// Future: possibly 1-2 additional metadata fields to speed up free.
}
An earlier version of this work also implemented an API that just took an
unsafe.Pointer, without any size or noscan parameters, like:
func free(ptr unsafe.Pointer)
However, this was more expensive, primarily due to needing to call
spanOf, findObject or similar. Early measurements suggested that
passing the size and noscan parameters like in freegc allowed
eliminating roughly 60% of the cost of a free operation.
(A small historical note: the more explicit API taking size and noscan parameters
was originally named freeSized but was later renamed to freegc; the older freeSized name
might still be visible in some earlier prototype CLs.)
Separately, we will likely also use a runtime.freegcSlice. It is a
small wrapper over runtime.freegc that accepts a slice. (Initially it was
just a test function, but ended up being more convenient in some cases,
including in some of the early prototype compiler work.)