-
Notifications
You must be signed in to change notification settings - Fork 2
Description
This issues contains a detailed design of implementing the Reference Counting Deeply Immutable Data
Structures with Cycles for Python.
Motivation
Python 3.12 associates each heap object with an interpreter. These objects are part of a doubly linked list on the interpreter that created the object. If we were to share objects across interpreters and a different interpreter decrement the final reference, then it would have to modify the doubly linked list for a different interpreter. This concurrent modification could form a bottle neck and would require locking.
The Immutable SCC paper means that immutable state can be safely reference counted and does not require a backup cycle detection.
Challenges with respect to the paper
There are a couple of things that need to differ from the paper for the Python context.
External references
We want to allow freezing a component, which still has incoming edges. Consider the following graph:
%%{init: {'theme': 'neutral', 'themeVariables': { 'fontSize': '16px' }}}%%
graph TD
id1[frame]
id1--> |z| id3[rc=1]
id3--> |f| id5[rc=1]
id4--> |f| id6[rc=2]
id5--> |g| id6
id5--> |f| id4[rc=3]
id1--> |x| id4
id6--> |g| id4
id1
id5
id4
id6
id3
subgraph LocalReg[ ]
id1
id3
id4
id5
id6
end
style LocalReg fill:#eefcdd
classDef unreachable stroke-width:2px,stroke:red
classDef highlight stroke-width:4px,stroke:yellow
classDef error stroke-width:4px,stroke:red
classDef tainted fill:#afa8db
classDef tainted_immutable stroke-width:4px,stroke:#afa8db
classDef immutable fill:#94f7ff
If we freeze(x)
, then we should result in
%%{init: {'theme': 'neutral', 'themeVariables': { 'fontSize': '16px' }}}%%
graph TD
id1[frame]
id1--> |z| id3[rc=1]
id3--> |f| id5[rc=1]
id4--> |f| id6[rc=3]:::immutable
id5--> |g| id6
id5--> |f| id4[rc=.]:::immutable
id1--> |x| id4
id6--> |g| id4
id4-.-> |root| id6
id1
id5
id4
id6
id3
subgraph LocalReg[ ]
id1
id3
id4
id5
id6
end
style LocalReg fill:#eefcdd
classDef unreachable stroke-width:2px,stroke:red
classDef highlight stroke-width:4px,stroke:yellow
classDef error stroke-width:4px,stroke:red
classDef tainted fill:#afa8db
classDef tainted_immutable stroke-width:4px,stroke:#afa8db
classDef immutable fill:#94f7ff
This requires us to use the existing reference counts to work out what the reference count of the SCC should be. This requires additional storage that was not required in the paper.
Failure
The paper did not consider the possibility of the algorithm failing. We would like that to be possible without breaking the invariant that objects can only reach other objects.
Implementation
Python 3.12 has two pointer sized elements that are used for the GCs doubly linked list of objects. We can use this state to implement the SCC algorithm in Python 3.12 with minimal disruption. This does not exist for all Python objects: those that cannot participate in cycles do not have this space allocated. This means we need to be able to handle atomic reference counting on these types of objects without that additional space. See _PyType_PreHeaderSize.
We need to borrow parts of the reference count to represent the following states:
- Mutable
- Immutable direct (either the root of an SCC, or an immutable object without GC space)
- Immutable indirect (part of an SCC and not the root.)
- Pending in an SCC that is currently being calculated (this should not persist beyond a freeze).
This can be borrowed in the top parts of the reference count. (Consider 32bit, with Immortal are we out of space?)
Union-find
We use the two GC fields for handling tracking the SCCs using a cyclic singly-linked-list of all the elements in the SCC, and a parent pointing tree to efficiently implement the union find structure. The root of the SCC in the parent pointing tree can be used to carry the rank of the SCC (its maximum possible depth). The rank only really requires a maximum of 7-bits as it is the maximum depth of a mostly balanced tree.
Algorithm
The following is a recursive function that implements the freeze operation
def freeze(r):
pending = []
def freeze_inner(x):
match x.status
| MUTABLE =>
# Check if the GC space exists for SCC calc
if _PyObject_IS_GC(x):
# Make this pending as it could be part of a non-trivial SCC.
x.status = PENDING;
pending.push(x);
# Traverse children
for each f in x
freeze_inner(x.f);
# On postorder traversal check if this is a completed SCC.
if (pending.peek() == x)
# A completed SCC
pending.pop();
r = find(x)
# Accumulate the rcs in each element of the SCC,
# internal edges will have been decremented already
rc = 0;
for (s : scc(x)):
rc += s.rc
# Optimise to point directly at the root
s.root = root
s.status = IMMUTABLE_INDIRECT
# Set up root with the summed rc, and set its status as the root.
r.rc = rc
r.status = IMMUTABLE_DIRECT
else:
# No space for SCC stuff, just make immutable and recurse.
# A cycle through this will never be collected.
r.status = IMMUTABLE_DIRECT
for each f in x
freeze_inner(x.f);
| PENDING =>
x.decref() # Internal edge so remove RC.
while (union(x, pending.peek()))
pending.pop();
| IMMUTABLE_DIRECT =>
incref(x);
| IMMUTABLE_INDIRECT =>
incref(find(x))
freeze_inner(r)
The code above is written as a recursive function. This is not practical, and a DFS stack should be used to prevent stack overflow. This also needs to handle the postorder step correctly. The Verona runtime achieves this by borrowing a bit to indicate that this is either a pre- or post-order visit to the node: https://github.com/microsoft/verona-rt/blob/7deb5873937354e1fa64c9879c4e1e077b0f327d/src/rt/region/freeze.h#L181-L196
Keeping the cyclic list in the union find means this could be extended to handle failure without breaking the invariant.