1. Data Manipulation
In order to get anything done, we need some way to store and manipulate data. Generally, there are two important things we need to do with data: (i) acquire them; and (ii) process them once they are inside the computer. There is no point in acquiring data without some way to store it, so let us get our hands dirty first by playing with synthetic data. To start, we introduce the \(n\)-dimensional array, which is also called the ndarray.
If you have worked with NumPy, the most widely used scientific computing package in Python, then you will find this section familiar. No matter which framework you use, its tensor class (ndarray in MXNet, DJL, and clj-djl; Tensor in both PyTorch and TensorFlow) is similar to NumPy's ndarray, with a few killer features. First, GPU computation is well supported to accelerate calculations, whereas NumPy only supports CPU computation. Second, the tensor class supports automatic differentiation. These properties make the tensor class suitable for deep learning. Throughout the book, when we say ndarrays, we are referring to instances of the ndarray class unless otherwise stated.
1.1. Getting Started
In this section, we aim to get you up and running, equipping you with the basic math and numerical computing tools that you will build on as you progress through the book. Do not worry if you struggle to grok some of the mathematical concepts or library functions. The following sections will revisit this material in the context of practical examples, and it will sink in. On the other hand, if you already have some background and want to go deeper into the mathematical content, just skip this section.
To start, we require the ndarray namespace from clj-djl, which provides the ndarray functions we will use throughout this section.
(ns clj-d2l.data-manipulation
  (:require [clj-djl.ndarray :as nd]
            [clj-d2l.core :as d2l]))
An ndarray represents a (possibly multi-dimensional) array of numerical values. With one axis, an ndarray corresponds (in math) to a vector. With two axes, an ndarray corresponds to a matrix. NDArrays with more than two axes do not have special mathematical names.
To start, we can use arange to create a row vector x containing the first 12 integers starting with 0. Each of the values in an ndarray is called an element of the ndarray. For instance, there are 12 elements in the ndarray x. Unless otherwise specified, a new ndarray will be stored in main memory and designated for CPU-based computation.
(def ndm (nd/base-manager))
(def x (nd/arange ndm 0 12))
x
ND: (12) cpu() int32 [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
We can access an ndarray’s shape (the length along each axis) by inspecting its shape property.
(nd/shape x)
(12)
If we just want to know the total number of elements in an ndarray, i.e., the product of all of the shape elements, we can inspect its size. Because we are dealing with a vector here, the single element of its shape is the same as its size. The difference is that shape returns a Shape object.
(nd/size x)
12
To change the shape of an ndarray without altering either the number of elements or their values, we can invoke the reshape function. For example, we can transform our ndarray, x, from a row vector with shape (12) to a matrix with shape (3, 4). This new ndarray contains the exact same values, but views them as a matrix organized as 3 rows and 4 columns. To reiterate, although the shape has changed, the elements have not. Note that the size is unaltered by reshaping.
(def y (nd/reshape x [3 4]))
y
ND: (3, 4) cpu() int32 [[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], ]
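We can confirm the new shape with the shape function introduced above:

(nd/shape y)
;=> (3, 4)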
Reshaping by manually specifying every dimension is unnecessary. If our target shape is a matrix with shape (height, width), then after we know the width, the height is given implicitly. Why should we have to perform the division ourselves? In the example above, to get a matrix with 3 rows, we specified both that it should have 3 rows and 4 columns. Fortunately, ndarrays can automatically work out one dimension given the rest. We invoke this capability by placing -1 for the dimension that we would like to be inferred automatically. In our case, instead of calling (nd/reshape x [3 4]), we could have equivalently called (nd/reshape x [-1 4]) or (nd/reshape x [3 -1]).
(def y (nd/reshape x [3 -1]))
y
ND: (3, 4) cpu() int32 [[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], ]
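As stated, the size is unaltered by reshaping, which we can verify directly:

(nd/size y)
;=> 12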
Passing the create function only a Shape grabs a chunk of memory and hands us back a matrix without bothering to change the value of any of its entries. This is remarkably efficient, but we must be careful: the entries might take arbitrary values, including very big ones!
(nd/create ndm (nd/shape [3 4]))
ND: (3, 4) cpu() float32 [[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 1.21782351e+29, 0.00000000e+00, 1.21783258e+29, 0.00000000e+00], [ 1.19888078e+29, 0.00000000e+00, 1.21744119e+29, 0.00000000e+00], ]
Typically, we will want our matrices initialized either with zeros, ones, some other constants, or numbers randomly sampled from a specific distribution. We can create an ndarray representing a tensor with all elements set to 0 and a shape of [2 3 4] as follows:
(nd/zeros ndm [2 3 4])
ND: (2, 3, 4) cpu() float32 [[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], ], [[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], ], ]
Similarly, we can create ndarrays with each element set to 1 as follows:
(nd/ones ndm [2 3 4])
ND: (2, 3, 4) cpu() float32 [[[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], ], [[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], ], ]
Often, we want to randomly sample the values for each element in an ndarray from some probability distribution. For example, when we construct arrays to serve as parameters in a neural network, we will typically initialize their values randomly. The following snippet creates an ndarray with shape (3, 4). Each of its elements is randomly sampled from a standard Gaussian (normal) distribution with a mean of 0 and a standard deviation of 1.
(nd/random-normal ndm 0 1 (nd/shape [3 4]))
ND: (3, 4) cpu() float32 [[ 1.1631, 2.2122, 0.4838, 0.774 ], [ 0.2996, 1.0434, 0.153 , 1.1839], [-1.1688, 1.8917, 1.5581, -1.2347], ]
We can directly use a Clojure vec as the shape:
(nd/random-normal ndm 0 1 [3 4])
ND: (3, 4) cpu() float32 [[-0.5459, -1.771 , -2.3556, -0.4514], [ 0.5414, 0.5794, 2.6785, -1.8561], [ 1.2546, -1.9769, -0.5488, -0.208 ], ]
You can also just pass the shape and it will use default values for mean and standard deviation (0 and 1).
(nd/random-normal ndm [3 4])
ND: (3, 4) cpu() float32 [[-0.6811, 0.2444, -0.1353, -0.0372], [ 0.3772, -0.4877, 0.4102, -0.0226], [ 0.5713, 0.5746, -2.758 , 1.4661], ]
We can also specify the exact values for each element in the desired ndarray by supplying a clojure vec (or list) containing the numerical values. Here, the outermost list corresponds to axis 0, and the inner list to axis 1.
(nd/create ndm [2 1 4 3 1 2 3 4 4 3 2 1] [3 4])
ND: (3, 4) cpu() int64 [[ 2, 1, 4, 3], [ 1, 2, 3, 4], [ 4, 3, 2, 1], ]
(nd/create ndm [[2 1 4 3] [1 2 3 4] [4 3 2 1]])
ND: (3, 4) cpu() int64 [[ 2, 1, 4, 3], [ 1, 2, 3, 4], [ 4, 3, 2, 1], ]
1.2. Operations
This book is not about software engineering. Our interests are not limited to simply reading and writing data from/to arrays. We want to perform mathematical operations on those arrays. Some of the simplest and most useful operations are the elementwise operations. These apply a standard scalar operation to each element of an array. For functions that take two arrays as inputs, elementwise operations apply some standard binary operator on each pair of corresponding elements from the two arrays. We can create an elementwise function from any function that maps from a scalar to a scalar.
In mathematical notation, we would denote such a unary scalar operator (taking one input) by the signature \(f: \mathbb{R} \rightarrow \mathbb{R}\). This just means that the function is mapping from any real number (\(\mathbb{R}\)) onto another. Likewise, we denote a binary scalar operator (taking two real inputs, and yielding one output) by the signature \(f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}\). Given any two vectors \(\mathbf{u}\) and \(\mathbf{v}\) of the same shape, and a binary operator \(f\), we can produce a vector \(\mathbf{c} = F(\mathbf{u}, \mathbf{v})\) by setting \(c_i \gets f(u_i, v_i)\) for all \(i\), where \(c_i, u_i\), and \(v_i\) are the \(i^\mathrm{th}\) elements of vectors \(\mathbf{c}\), \(\mathbf{u}\), and \(\mathbf{v}\). Here, we produced the vector-valued \(F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d\) by lifting the scalar function to an elementwise vector operation.
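To make this lifting concrete, here is a minimal sketch in plain Clojure (core functions only, independent of clj-djl) that lifts a binary scalar function to an elementwise vector operation:

;; a binary scalar operator f: R, R -> R
(defn f [u v] (+ (* u u) v))

;; lift a scalar function to operate elementwise on equal-length vectors
(defn lift [f]
  (fn [u v] (mapv f u v)))

((lift f) [1 2 3] [10 20 30])
;=> [11 24 39]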
The common standard arithmetic operators (+, -, *, and /) have all been lifted to elementwise operations for identically-shaped ndarrays of arbitrary shape. We can call elementwise operations on any two ndarrays of the same shape. In the following examples, we apply these elementwise operations, one expression at a time, to a pair of ndarrays x and y.
(def x (nd/create ndm [1. 2. 4. 8.]))
(def y (nd/create ndm [2. 2. 2. 2.]))
(nd/+ x y)
ND: (4) cpu() float64 [ 3., 4., 6., 10.]
(nd/- x y)
ND: (4) cpu() float64 [-1., 0., 2., 6.]
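Multiplication works the same way (a sketch, assuming nd/* is the lifted multiplication operator, named analogously to nd/+ and nd/-):

(nd/* x y)
;=> [2., 4., 8., 16.]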
(nd// x y)
ND: (4) cpu() float64 [0.5, 1. , 2. , 4. ]
(nd/pow x y)
ND: (4) cpu() float64 [ 1., 4., 16., 64.]
Many more operations can be applied elementwise, including unary operators like exponentiation.
(nd/exp x)
ND: (4) cpu() float64 [ 2.71828183e+00, 7.38905610e+00, 5.45981500e+01, 2.98095799e+03]
In addition to elementwise computations, we can also perform linear algebra operations, including vector dot products and matrix multiplication. We will explain the crucial bits of linear algebra (with no assumed prior knowledge) in Section 2.3.
We can also concatenate multiple ndarrays together, stacking them end-to-end to form a larger ndarray. We just need to provide a list of ndarrays and tell the system along which axis to concatenate. The example below shows what happens when we concatenate two matrices along rows (axis 0, the first element of the shape) vs. columns (axis 1, the second element of the shape). We can see that the first output ndarray’s shape is (6, 4), its axis-0 length (6) is the sum of the two input ndarrays’ axis-0 lengths \((3+3)\); while the second output ndarray’s shape is (3, 8), its axis-1 length (8) is the sum of the two input ndarrays’ axis-1 lengths \((4+4)\).
(def X (-> (nd/arange ndm 12)
           (nd/reshape [3 4])))
X
ND: (3, 4) cpu() int32 [[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], ]
(def Y (nd/create ndm [[2 1 4 3] [1 2 3 4] [4 3 2 1]]))
Y
ND: (3, 4) cpu() int64 [[ 2, 1, 4, 3], [ 1, 2, 3, 4], [ 4, 3, 2, 1], ]
;; concat only supports the int32 and float32 data types
(def Y (nd/to-type Y :int32 false))
(nd/concat Y Y)
ND: (6, 4) cpu() int32 [[ 2, 1, 4, 3], [ 1, 2, 3, 4], [ 4, 3, 2, 1], [ 2, 1, 4, 3], [ 1, 2, 3, 4], [ 4, 3, 2, 1], ]
(nd/concat X Y 1)
ND: (3, 8) cpu() int32 [[ 0, 1, 2, 3, 2, 1, 4, 3], [ 4, 5, 6, 7, 1, 2, 3, 4], [ 8, 9, 10, 11, 4, 3, 2, 1], ]
The third argument of nd/concat specifies the axis along which to concatenate; it defaults to axis 0.
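We can confirm the shapes described above by inspecting the concatenated results directly (using nd/shape, introduced earlier):

(nd/shape (nd/concat X Y))   ;=> (6, 4), concatenation along axis 0 (the default)
(nd/shape (nd/concat X Y 1)) ;=> (3, 8), concatenation along axis 1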
Sometimes, we want to construct a binary ndarray via logical statements. Take (nd/= X Y) as an example. For each position, if X and Y are equal at that position, the corresponding entry in the new ndarray takes the value true, meaning that the logical statement (nd/= X Y) is true at that position; otherwise that position takes the value false.
(nd/= X Y)
ND: (3, 4) cpu() boolean [[false, true, false, true], [false, false, false, false], [false, false, false, false], ]
Summing all the elements in the ndarray yields an ndarray with only one element.
(nd/sum X)
ND: () cpu() int32 66
1.3. Broadcasting Mechanism
In the above section, we saw how to perform elementwise operations on two ndarrays of the same shape. Under certain conditions, even when shapes differ, we can still perform elementwise operations by invoking the broadcasting mechanism. This mechanism works in the following way: First, expand one or both arrays by copying elements appropriately so that after this transformation, the two ndarrays have the same shape. Second, carry out the elementwise operations on the resulting arrays.
In most cases, we broadcast along an axis where an array initially only has length 1, such as in the following example:
(def a (-> (nd/arange ndm 3)
           (nd/reshape [3 1])))
a
ND: (3, 1) cpu() int32 [[ 0], [ 1], [ 2], ]
(def b (-> (nd/arange ndm 2)
           (nd/reshape [1 2])))
b
ND: (1, 2) cpu() int32 [[ 0, 1], ]
Since a and b are \(3 \times 1\) and \(1 \times 2\) matrices respectively, their shapes do not match up if we want to add them. We broadcast the entries of both matrices into a larger \(3 \times 2\) matrix as follows: for matrix a it replicates the columns and for matrix b it replicates the rows before adding up both elementwise.
The result of \(a\) broadcasted is:
(nd/concat a a 1)
ND: (3, 2) cpu() int32 [[ 0, 0], [ 1, 1], [ 2, 2], ]
The result of \(b\) broadcasted is:
(->> b (nd/concat b) (nd/concat b))
ND: (3, 2) cpu() int32 [[ 0, 1], [ 0, 1], [ 0, 1], ]
Thus the result is:
(nd/+ a b)
ND: (3, 2) cpu() int32 [[ 0, 1], [ 1, 2], [ 2, 3], ]
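To make the shape rule behind broadcasting explicit, here is a minimal sketch in plain Clojure (core functions only, independent of clj-djl) that computes the broadcast shape of two equal-rank shapes, where each pair of axis lengths must either match or include a 1:

(defn broadcast-shape
  "Compute the elementwise broadcast shape of two equal-rank shape vectors."
  [s1 s2]
  (mapv (fn [d1 d2]
          (cond (= d1 d2) d1
                (= d1 1)  d2
                (= d2 1)  d1
                :else (throw (ex-info "incompatible shapes" {:s1 s1 :s2 s2}))))
        s1 s2))

(broadcast-shape [3 1] [1 2])
;=> [3 2]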
1.4. Indexing and Slicing
Just as in Python arrays, elements in an ndarray can be accessed by index. The first element has index 0, and ranges are specified to include the first element but exclude the last. As in standard Python lists, we can access elements relative to the end of the list by using negative indices.
Java and Clojure do not support overloading operator[], so indexing and slicing are instead simulated with index and slice strings passed to nd/get and nd/set.
X
ND: (3, 4) cpu() int32 [[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], ]
Thus, "-1" selects the last row and "1:3" selects the second and third rows, as follows:
(nd/get X "-1")
ND: (4) cpu() int32 [ 8, 9, 10, 11]
(nd/get X "1:3")
ND: (2, 4) cpu() int32 [[ 4, 5, 6, 7], [ 8, 9, 10, 11], ]
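We can also select a single element by giving one index per axis, using the same comma-separated index strings that nd/set accepts below (a sketch, assuming nd/get supports such strings as well):

(nd/get X "1,2")
;=> a scalar (shape ()) int32 ndarray containing 6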
Beyond reading, we can also set elements of a matrix by specifying indices.
(nd/set X "1,2" 999)
ND: (3, 4) cpu() int32 [[ 0, 1, 2, 3], [ 4, 5, 999, 7], [ 8, 9, 10, 11], ]
If we want to assign multiple elements the same value, we simply index all of them and then assign them the value. For instance, "0:2,:" accesses the first and second rows, where : takes all the elements along axis 1 (columns). While we discussed indexing for matrices, this obviously also works for vectors and for tensors of more than 2 dimensions.
(nd/set X "0:2,:" 12)
ND: (3, 4) cpu() int32 [[12, 12, 12, 12], [12, 12, 12, 12], [ 8, 9, 10, 11], ]
1.5. Saving Memory
Running operations can cause new memory to be allocated to host results. For example, if we write (def Y' (nd/+ Y X)), the result of Y + X is written to newly allocated memory, and Y' points there. In the following example, we demonstrate this with Clojure's identical? function, which returns true if its two arguments are exactly the same object. After evaluating (nd/+ Y X), we find that Y and Y' are different objects. That is because nd/+ first allocates new memory for the result and then binds Y' to this new location.
(def Y (nd/zeros ndm (nd/get-shape X)))
(def Y' (nd/+ Y X))
(identical? Y Y')
false
This might be undesirable for two reasons. First, we do not want to run around allocating memory unnecessarily all the time. In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple times per second. Typically, we will want to perform these updates in place. Second, we might point at the same parameters from multiple variables. If we do not update in place, other references will still point to the old memory location, making it possible for parts of our code to inadvertently reference stale parameters.
Fortunately, performing in-place operations in DJL is easy. We can assign the result of an operation to a previously allocated array using DJL's in-place operators addi, subi, muli, and divi, which clj-djl exposes as functions such as nd/+!.
(def Y'' (nd/+! Y X))
(identical? Y Y'')
true
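To sketch why this matters in practice, the following loop (with hypothetical parameter and gradient arrays) reuses the same buffer on every iteration instead of allocating a fresh result at each step:

(def params (nd/zeros ndm [2 2])) ;; hypothetical model parameters
(def grad (nd/ones ndm [2 2]))    ;; hypothetical gradient of all ones

;; ten in-place updates: params keeps pointing at the same memory
(dotimes [_ 10]
  (nd/+! params grad))

params
;=> every element of params is now 10.0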
1.6. Conversion to Other Clojure Objects
An ndarray is easily converted into ordinary Clojure and Java data structures: to-vec returns a flat Clojure persistent vector of the elements, and to-array returns a Java array.

(type (nd/to-vec X))
class clojure.lang.PersistentVector
(nd/to-vec X)
[12 12 12 12 12 12 12 12 8 9 10 11]
(type (nd/to-array X))
class [Ljava.lang.Integer;
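Because nd/to-array returns an ordinary Java array, the result interoperates directly with Clojure's sequence functions; for example, summing its elements:

(reduce + (nd/to-array X))
;=> 134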
(type X)
class ai.djl.mxnet.engine.MxNDArray
X
ND: (3, 4) cpu() int32 [[12, 12, 12, 12], [12, 12, 12, 12], [ 8, 9, 10, 11], ]
To convert a size-1 ndarray to a Clojure scalar, we can invoke the get-element function:
(def a (nd/create ndm [3.5])) a
ND: (1) cpu() float64 [3.5]
(nd/get-element a)
3.5