Skip to content

Commit e3759c4

Browse files
committed
docs: expand on building process section
Signed-off-by: Akshay Mestry <[email protected]>
1 parent 0a28ec8 commit e3759c4

File tree

1 file changed

+231
-5
lines changed

1 file changed

+231
-5
lines changed

docs/source/projects/xsnumpy.rst

Lines changed: 231 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
11
.. Author: Akshay Mestry <[email protected]>
22
.. Created on: Saturday, March 01 2025
3-
.. Last updated on: Saturday, March 01 2025
3+
.. Last updated on: Sunday, March 02 2025
44
55
:og:title: Building xsNumpy
6-
:og:description: Building a lightweight, pure-python implementation of NumPy's
7-
core features
6+
:og:description: A lightweight, pure-python implementation of NumPy's core
7+
features
88
:og:type: article
99

1010
.. _project-building-xsnumpy:
@@ -87,8 +87,234 @@ scratch?* Again, not to replace it but to learn from it.
8787
Building Process
8888
-------------------------------------------------------------------------------
8989

90-
Now that I was motivated to learn the nitty-gritty implementation details of
91-
NumPy, I was ready to build my own version of it.
90+
I was ready to build my small version of NumPy, but I didn't know where to
91+
start. I began scrutinizing and poking at various NumPy functions and methods.
92+
Soon I realized that most of NumPy APIs rely on one core construct, the
93+
:func:`numpy.array` function, which is a cheeky little wrapper for
94+
:class:`numpy.ndarray`. That's where I decided to start, implementing my
95+
primary ``xsnumpy.ndarray`` data structure.
96+
97+
To be honest, it seemed simple and fairly straightforward in my head |dash| a
98+
collection of numbers arranged in rows and columns. I mean, what else could be
99+
there in an array? Wrong! The more I dove deep into the implementation, more
100+
things started poking their heads up. I had to think about
101+
`memory allocation and management`_, calculations for `shape`_ (size),
102+
`strides`_, and how to store the data more efficiently.
103+
104+
A few weeks in, I somehow got around implementing a barebones version of
105+
:class:`numpy.ndarray` using :py:mod:`ctypes`.
106+
107+
.. code-block:: python
108+
:linenos:
109+
110+
class ndarray:
111+
"""Simplified implementation of a multi-dimensional array.
112+
113+
An array object represents a multidimensional, homogeneous
114+
collection or list of fixed-size items. An associated data-type
115+
property describes the format of each element in the array.
116+
117+
:param shape: The desired shape of the array. Can be an int for
118+
1D arrays or a sequence of ints for multidimensional arrays.
119+
:param dtype: The desired data type of the array, defaults to
120+
`None` if not specified.
121+
:param buffer: Object used to fill the array with data, defaults to
122+
`None`.
123+
:param offset: Offset of array data in buffer, defaults to `0`.
124+
:param strides: Strides of data in memory, defaults to `None`.
125+
:param order: The memory layout of the array, defaults to `None`.
126+
:raises RuntimeError: If an unsupported order is specified.
127+
:raises ValueError: If invalid strides or offsets are provided.
128+
"""
129+
130+
def __init__(
131+
self,
132+
shape: _ShapeLike | int,
133+
dtype: None | DTypeLike | _BaseDType = None,
134+
buffer: None | t.Any = None,
135+
offset: t.SupportsIndex = 0,
136+
strides: None | _ShapeLike = None,
137+
order: None | _OrderKACF = None,
138+
) -> None:
139+
"""Initialize an `ndarray` object from the provided shape."""
140+
if order is not None:
141+
raise RuntimeError(
142+
f"{type(self).__qualname__} supports only C-order arrays;"
143+
" 'order' must be None"
144+
)
145+
if not isinstance(shape, Iterable):
146+
shape = (shape,)
147+
self._shape = tuple(int(dim) for dim in shape)
148+
if dtype is None:
149+
dtype = float64
150+
elif isinstance(dtype, type):
151+
dtype = globals()[
152+
f"{dtype.__name__}{'32' if dtype != builtins.bool else ''}"
153+
]
154+
else:
155+
dtype = globals()[dtype]
156+
self._dtype = dtype
157+
self._itemsize = int(_convert_dtype(dtype, "short")[-1])
158+
self._offset = int(offset)
159+
if buffer is None:
160+
self._base = None
161+
if self._offset != 0:
162+
raise ValueError("Offset must be 0 when buffer is None")
163+
if strides is not None:
164+
raise ValueError("Buffer is None; strides must be None")
165+
self._strides = calc_strides(self._shape, self.itemsize)
166+
else:
167+
if isinstance(buffer, ndarray) and buffer.base is not None:
168+
buffer = buffer.base
169+
self._base = buffer
170+
if isinstance(buffer, ndarray):
171+
buffer = buffer.data
172+
if self._offset < 0:
173+
raise ValueError("Offset must be non-negative")
174+
if strides is None:
175+
strides = calc_strides(self._shape, self.itemsize)
176+
elif not (
177+
isinstance(strides, tuple)
178+
and all(isinstance(stride, int) for stride in strides)
179+
and len(strides) == len(self._shape)
180+
):
181+
raise ValueError("Invalid strides provided")
182+
self._strides = tuple(strides)
183+
buffersize = self._strides[0] * self._shape[0] // self._itemsize
184+
buffersize += self._offset
185+
Buffer = _convert_dtype(dtype, "ctypes") * buffersize
186+
if buffer is None:
187+
if not isinstance(Buffer, str):
188+
self._data = Buffer()
189+
elif isinstance(buffer, ctypes.Array):
190+
self._data = Buffer.from_address(ctypes.addressof(buffer))
191+
else:
192+
self._data = Buffer.from_buffer(buffer)
193+
194+
.. note::
195+
196+
This is not the complete implementation. For brevity, many details have
197+
been abstracted away. To see the complete implementation of the
198+
``xsnumpy.ndarray`` class, check out the
199+
`code <https://github.com/xames3/xsnumpy/blob/
200+
69c302ccdd594f1d8f0c51dbe16346232c39047f/xsnumpy/_core.py#L183>`_ on
201+
GitHub.
202+
203+
.. dropdown:: Code explanation
204+
:animate: fade-in
205+
206+
Alright, let me break this down in a way that makes sense. First, I start
207+
with checking if the shape is an :py:class:`collections.abc.Iterable`
208+
(a sequence like a :py:class:`tuple` or :py:class:`list`). If it's not,
209+
I'm wrapping it into a tuple to ensure that the shape is always
210+
represented as a tuple. The shape is then converted into a tuple of
211+
integers, ensuring the dimensions are valid.
212+
213+
.. code-block:: python
214+
:linenos:
215+
216+
if not isinstance(shape, Iterable):
217+
shape = (shape,)
218+
self._shape = tuple(int(dim) for dim in shape)
219+
220+
Next up, the ``dtype`` (data type). If ``dtype`` is not provided, the
221+
constructor sets the default data type to ``None``. If a :py:class:`type`
222+
(such as :py:class:`int`, :py:class:`float`, etc.) is provided, it
223+
dynamically retrieves the appropriate data type from the global namespace
224+
using :func:`globals`. This allows flexibility in handling various types.
225+
Finally, the resolved data type is assigned to ``self._dtype``.
226+
227+
.. code-block:: python
228+
:linenos:
229+
230+
if dtype is None:
231+
dtype = float64
232+
elif isinstance(dtype, type):
233+
dtype = globals()[
234+
f"{dtype.__name__}{'32' if dtype != builtins.bool else ''}"
235+
]
236+
else:
237+
dtype = globals()[dtype]
238+
self._dtype = dtype
239+
240+
The size of each element in the array is calculated based on the provided
241+
data type. I wrote a handy function, ``_convert_dtype`` to fetch the
242+
appropriate size of the data type (in a ``short`` format), and the last
243+
value is used to determine the item size.
244+
245+
This is super important for calculating memory layout and strides!
246+
247+
.. code-block:: python
248+
:linenos:
249+
250+
self._itemsize = int(_convert_dtype(dtype, "short")[-1])
251+
252+
Now, if ``buffer`` is ``None``, the array is initialized without an
253+
external memory buffer. In this case:
254+
255+
- The offset must be zero
256+
- Strides must also be ``None``
257+
258+
The constructor calculates the strides. The strides is nothing but steps
259+
between consecutive elements in memory.
260+
261+
.. code-block:: python
262+
:linenos:
263+
264+
if buffer is None:
265+
self._base = None
266+
if self._offset != 0:
267+
raise ValueError("Offset must be 0 when buffer is None")
268+
if strides is not None:
269+
raise ValueError("Buffer is None; strides must be None")
270+
self._strides = calc_strides(self._shape, self.itemsize)
271+
272+
If a ``buffer`` is provided, the constructor handles it by checking if
273+
it's another ``ndarray``. If the ``ndarray`` has a base buffer, it uses
274+
that. The buffer is assigned to ``self._base``, and strides are either
275+
provided or calculated.
276+
277+
The constructor validates the offset (it must be non-negative) and the
278+
strides (it must be a tuple of integers matching the shape's dimensions).
279+
280+
.. code-block:: python
281+
:linenos:
282+
283+
else:
284+
if isinstance(buffer, ndarray) and buffer.base is not None:
285+
buffer = buffer.base
286+
self._base = buffer
287+
if isinstance(buffer, ndarray):
288+
buffer = buffer.data
289+
if self._offset < 0:
290+
raise ValueError("Offset must be non-negative")
291+
if strides is None:
292+
strides = calc_strides(self._shape, self.itemsize)
293+
elif not (
294+
isinstance(strides, tuple)
295+
and all(isinstance(stride, int) for stride in strides)
296+
and len(strides) == len(self._shape)
297+
):
298+
raise ValueError("Invalid strides provided")
299+
self._strides = tuple(strides)
300+
301+
Finally, the constructor calculates the total buffer size based on the
302+
strides, shape, and item size. The ``Buffer`` is a type derived from the
303+
data type (dtype) and its size. Depending on whether the buffer is provided
304+
or not, it initializes ``self._data`` using different methods:
305+
306+
- If no buffer is provided, a new buffer is created
307+
- If the buffer is a :py:class:`ctypes.Array`, the address of the buffer
308+
is used to initialize the data. Basically, we use its address like a map
309+
- If it's any other type of buffer, the buffer is used directly
310+
311+
Phew, that was a lot, but now you can see how it's all orchestrated!
92312

93313
.. _NumPy: https://numpy.org/
94314
.. _xsNumPy: https://github.com/xames3/xsnumpy
315+
.. _memory allocation and management: https://numpy.org/doc/stable/reference/
316+
c-api/data_memory.html
317+
.. _shape: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.
318+
shape.html
319+
.. _strides: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.
320+
strides.html

0 commit comments

Comments
 (0)