1
00:00:00,000 --> 00:00:06,200
Hi guys, welcome back to CS445 - Computational Photography.

2
00:00:06,200 --> 00:00:15,640
Today I'm going to briefly go over some basics of how to use NumPy, an essential Python library for scientific computing.

3
00:00:15,640 --> 00:00:30,380
You'll be using NumPy a lot in this class, and it is pretty useful for anything you might do in data science, computer vision, and

4
00:00:30,380 --> 00:00:40,040
anything that involves matrices and arrays. The best resource for learning how to use NumPy is the NumPy documentation. The link is right here.

5
00:00:40,040 --> 00:00:49,980
As always, you'll receive this notebook, along with any other notebooks we use in these tutorials, in the course material.

6
00:00:49,980 --> 00:01:04,040
With that, let's get started. You can go to numpy.org/#getting-started, and there are instructions on how to download and install NumPy.

7
00:01:04,040 --> 00:01:13,880
I've done so here. Next, we'll open up a Jupyter notebook and import numpy. [Note: If you do "import numpy as np" then you should refer to numpy as "np" in your code.]

8
00:01:13,880 --> 00:01:24,080
Alright so the fundamental objects in NumPy are called arrays. If you've programmed in C or C++,

9
00:01:24,080 --> 00:01:31,020
they're pretty much like the arrays you're used to in those languages.

10
00:01:31,020 --> 00:01:43,260
More specifically, NumPy arrays are quite similar to MATLAB arrays, and a lot of the interfaces are nearly the same.

11
00:01:43,260 --> 00:01:53,920
Some of the important attributes of NumPy arrays are the shape and the data type. You're going to specify these when you create an array

12
00:01:53,920 --> 00:02:02,340
and you'll use the shape when you loop through the array or work with its properties.

13
00:02:02,340 --> 00:02:12,880
The shape is a tuple whose length is the number of dimensions. The data type (dtype) describes the type of the elements stored in the array; all elements share the same dtype.

14
00:02:12,880 --> 00:02:27,480
They could be ints, floats or even strings. Let's go over some ways to create arrays. NumPy's ndarray class has a low-level constructor, but we rarely use it.
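As a small illustration of the shape and dtype attributes, here is a sketch with made-up values (not the notebook's); the exact integer dtype printed can vary by platform.

```python
import numpy as np

# A 2-by-3 array of integers: the shape tuple has one entry per dimension
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)   # (2, 3)
print(a.ndim)    # 2, i.e. len(a.shape)
print(a.dtype)   # an integer dtype (int64 on most 64-bit platforms)

# Every element shares one dtype; strings get a fixed-width Unicode dtype
words = np.array(["cat", "dog"])
print(words.dtype)  # <U3
```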

15
00:02:27,480 --> 00:02:36,320
Generally, we start from a standard Python list. As you can see, "seq" is a list here.

16
00:02:36,320 --> 00:02:50,480
Then we use the <b><i>np.array</i></b> function, which takes a sequence such as a list and converts it into a NumPy array. So let's run this cell.

17
00:02:50,480 --> 00:03:05,700
There you go: as you can see, the type of "arr_seq" is numpy.ndarray, and its data type is int64. This is the default integer data type that NumPy infers on most 64-bit platforms.
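A minimal sketch of converting a list with np.array; the contents of "seq" here are made up, since the notebook's values aren't shown in the transcript.

```python
import numpy as np

seq = [1, 2, 3, 4]          # a standard Python list (illustrative values)
arr_seq = np.array(seq)     # np.array converts the sequence into an ndarray
print(type(arr_seq))        # <class 'numpy.ndarray'>
print(arr_seq.dtype)        # default integer dtype; platform dependent

# A float anywhere in the input makes NumPy infer a floating-point dtype
print(np.array([1.0, 2, 3]).dtype)   # float64

# You can also request a dtype explicitly
arr_f32 = np.array(seq, dtype=np.float32)
```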

18
00:03:05,700 --> 00:03:19,740
You can read more about the defaults on this website, the SciPy-hosted NumPy documentation for the ndarray class.

19
00:03:19,740 --> 00:03:29,620
You can also search for the numpy.array function online; the documentation and other resources are very thorough.

20
00:03:29,620 --> 00:03:48,460
Alternatively, NumPy provides utility functions for common array-creation cases, for when you just want an array that is uninitialized or filled in a particular way.

21
00:03:48,460 --> 00:04:05,280
One way to do this is through these functions. So let's run <b><i>numpy.empty</i></b>. numpy.empty takes a tuple giving the shape that you want your array to have.

22
00:04:05,280 --> 00:04:17,880
So we just want a 2-by-2 empty array here. As you can see the array is not strictly <i>empty</i> but it is uninitialized, and that's the point of np.empty.

23
00:04:17,880 --> 00:04:26,860
It's a very fast function because it doesn't initialize any of the memory it allocates. So that's np.empty.
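A short sketch of np.empty: only the shape and dtype are predictable, not the contents, so the example fills the array before relying on its values.

```python
import numpy as np

# Allocates a 2-by-2 float array without initializing it:
# the printed values are whatever happened to be in that memory.
e = np.empty((2, 2))
print(e.shape, e.dtype)   # (2, 2) float64

# Only rely on the contents after writing to every element
e[:] = 0.0
```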

24
00:04:26,860 --> 00:04:45,360
There is also <b><i>np.ones</i></b>. Most of these functions take a shape, as you can see here, so you provide them with a tuple describing the shape you want your array to have.

25
00:04:45,360 --> 00:04:52,940
Here we go, with np.ones, we want a 1-by-4 array, that's 1 row and 4 columns. This is what that looks like.

26
00:04:52,940 --> 00:05:07,880
Then running <b>np.zeros</b> with a 4-row, 1-column shape gives us this. So <b>np.ones</b> fills the array with ones and <b>np.zeros</b> fills it with zeros.
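The two calls described above, sketched out; note that both functions default to float64 unless a dtype is passed.

```python
import numpy as np

row = np.ones((1, 4))    # 1 row, 4 columns, filled with 1.0
col = np.zeros((4, 1))   # 4 rows, 1 column, filled with 0.0
print(row)               # [[1. 1. 1. 1.]]
print(col.shape)         # (4, 1)

# Pass dtype=... to get something other than float64
ints = np.ones((1, 4), dtype=int)
```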

27
00:05:07,880 --> 00:05:17,980
Then there's the general version of <b><i>np.ones</i></b> and <b><i>np.zeros</i></b> called <b><i>np.full</i></b> which takes at least two arguments.

28
00:05:17,980 --> 00:05:25,720
The first argument is the shape, as usual. The second argument is the number that you want to fill the array with.
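A quick sketch of np.full with the shape first and the fill value second; the second array's shape and value are just illustrative.

```python
import numpy as np

tens = np.full((4,), 10)        # shape first, then the fill value
print(tens)                     # [10 10 10 10]

halves = np.full((2, 3), 0.5)   # works for any shape and value
```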

29
00:05:25,720 --> 00:05:43,980
Here I've asked for a 4-element array filled with 10. Notice one important difference between this shape <i>(4,)</i> and <i>(4,1)</i>.

30
00:05:43,980 --> 00:06:04,200
So these are strictly <i>not</i> the same. The (4,1) array is a 2-dimensional array, which is why you can see two levels of square brackets: one set on the outside, and another set around each inner element.

31
00:06:04,200 --> 00:06:18,880
Here we have 4 rows and 1 column per row, whereas here we just have a single 1-dimensional array with 4 elements.

32
00:06:18,880 --> 00:06:34,720
NumPy has true 1-dimensional arrays, so (4,) is not the same as (1,4) or (4,1), and we'll talk a little more about this with regard to reshaping and broadcasting.
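To make the (4,) versus (4,1) versus (1,4) distinction concrete, here is a small sketch; the last line previews broadcasting, where the differing shapes combine into a 4-by-4 result.

```python
import numpy as np

a = np.full((4,), 10)    # 1-D: one set of brackets
b = np.full((4, 1), 10)  # 2-D: 4 rows, 1 column each
c = np.full((1, 4), 10)  # 2-D: 1 row, 4 columns
print(a.ndim, b.ndim, c.ndim)   # 1 2 2
print(b)
# [[10]
#  [10]
#  [10]
#  [10]]

# Same values, different shapes: adding a (4,) array to a (4, 1) array broadcasts
print((a + b).shape)   # (4, 4)
```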

33
00:06:34,720 --> 00:06:54,260
Another useful function is <b><i>numpy.arange</i></b>, which gives you evenly spaced values within a given interval (the stop value is excluded). I'm going to be using this function quite a bit, so I thought I'd cover it.

34
00:06:54,260 --> 00:07:05,940
This function is especially useful when you're plotting. It will come in handy, so make sure you know how to use this one.
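A few np.arange calls, sketched to show the start/stop/step arguments and that the stop value itself never appears in the output.

```python
import numpy as np

print(np.arange(5))          # [0 1 2 3 4]: stop is excluded
print(np.arange(2, 10, 3))   # [2 5 8]: start, stop, step

# Float steps work too, which is handy for plotting x-axis values
t = np.arange(0.0, 1.0, 0.25)   # values 0.0, 0.25, 0.5, 0.75
```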

35
00:07:05,940 --> 00:07:27,560
Now let's talk about how to access arrays. NumPy array elements are accessed using the standard Python bracket syntax: the object, in this case the array, followed by some <i>selector</i> inside square brackets.

36
00:07:27,560 --> 00:07:44,940
Unlike nested Python lists, NumPy arrays let you use commas to separate the selectors for the various dimensions, and a colon to select a range within one dimension.

37
00:07:44,940 --> 00:08:00,500
Let's take a look at this. We'll make "x" a range from 0 to 20 with a step of 2, which generates the even numbers from 0 up to, but not including, 20. That's important to note. So let's run that.

38
00:08:00,500 --> 00:08:20,520
Then x[2] gives us 4, and x[1:3] gives us 2 and 4. The way to read this is: the 0th element is 0 and the 1st element is 2.

39
00:08:20,520 --> 00:08:42,980
The 2nd element is 4. Notice that you don't get x[3] here. Let's see what x[3] is: it's 6. So x[1:3] doesn't include x[3], and that's important to know.
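The indexing just described, as a runnable sketch: a slice includes its start index and excludes its stop index.

```python
import numpy as np

x = np.arange(0, 20, 2)   # even numbers 0, 2, ..., 18 (20 excluded)
print(x[2])      # 4
print(x[1:3])    # [2 4]: indices 1 and 2; the stop index 3 is excluded
print(x[3])      # 6
```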

40
00:08:42,980 --> 00:08:56,540
I encourage you to try more examples. You might want to try negative indices, since Python ranges support negative indices as well as negative steps.

41
00:08:56,540 --> 00:09:11,480
Here we could also add a step, for example 2, which gives a smaller array. I encourage you to play around with that.
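Some examples to start experimenting with, sketched on the same "x": negative indices count from the end, and a step picks every n-th element (a negative step walks backwards).

```python
import numpy as np

x = np.arange(0, 20, 2)   # 0, 2, 4, ..., 18
print(x[-1])       # 18: negative indices count from the end
print(x[-3:])      # the last three elements: 14, 16, 18
print(x[1:7:2])    # indices 1, 3, 5: values 2, 6, 10
print(x[::-1])     # the whole array reversed
```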

42
00:09:11,480 --> 00:09:25,500
Let's quickly talk about the memory layout of arrays. Under the hood, the data of a NumPy array lives in a flat, 1-dimensional block of memory.

43
00:09:25,500 --> 00:09:48,880
There are no 2-dimensional arrays in memory. What lets you work with an array two-dimensionally is an indexing scheme, managed by NumPy, that maps each multi-dimensional index to a position in that flat block.

44
00:09:48,880 --> 00:10:00,520
This indexing scheme depends on the shape and on how many bytes each item takes, which is determined by the dtype associated with the array.
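One way to see this indexing scheme is through the array's strides, the number of bytes to jump per step along each dimension. This sketch uses an int32 array (4 bytes per item) so the numbers are easy to check by hand.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.int32)   # 4 bytes per element
print(a.itemsize)   # 4
print(a.strides)    # (16, 4): 16 bytes to the next row, 4 to the next column

# Element [i, j] sits at byte offset i*strides[0] + j*strides[1] in the flat buffer
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]
print(offset)                  # 36 bytes
print(offset // a.itemsize)    # 9: the position in the flat, 1-D data
```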

45
00:10:00,520 --> 00:10:12,860
It is also important to note that, by default, data in NumPy arrays is arranged in <i>row-major</i> (C) order: the elements of each row are stored next to each other in memory. For a 2-dimensional array, the first index selects the row and the second selects the column.

46
00:10:12,860 --> 00:10:29,880
So the way to read this is: the first row (the index 0 row) and the third column (the index 2 column).

47
00:10:29,880 --> 00:10:54,060
Let's see that in action. We've created an empty 3-by-3 array, and we're going to access the first row, last column. There we have it: 2.058e-312, which is this element. The value itself is just leftover memory, since np.empty doesn't initialize anything. So that's row-major indexing.
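The same access on an array with predictable contents (np.arange instead of np.empty, so the values are meaningful). Flattening it with ravel shows the rows laid out one after another.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
print(a)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
print(a[0, 2])      # 2: first index is the row, second is the column
print(a.ravel())    # [0 1 2 3 4 5 6 7 8]: rows stored one after another
print(a.flags["C_CONTIGUOUS"])   # True: C (row-major) order is the default
```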

48
00:10:54,060 --> 00:11:13,720
Now let's talk a little about reshaping and resizing. There are two important array methods for manipulating the shape of an array, reshape and resize. We mainly use reshape, and I think you'll mainly be using reshape in this course as well.

49
00:11:13,720 --> 00:11:29,720
Reshape returns an array with the same data as the given array, but with the shape that you provide. So let's look at some examples. Let's create a 1-dimensional array of size 12.

50
00:11:29,720 --> 00:11:51,720
There it is. Now we'll reshape it to be a 2-dimensional array with 2 rows and 6 columns. That's what it's going to look like. Let's see, we can also reshape to something else.

51
00:11:51,720 --> 00:12:15,640
It's important to note that you can pass a tuple to reshape, but it's not necessary: you can also pass the dimensions directly as separate arguments.

52
00:12:15,640 --> 00:12:37,440
So I can do this, and turn it into a 3-dimensional array with 12 rows, 1 column and 1 cell deep. Feel free to mess around with that. For now we'll reshape it to (3,4), which looks like this.
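The reshape variants from this part, collected in one sketch. The -1 form is an extra not covered in the video: it asks NumPy to infer that one dimension from the total size.

```python
import numpy as np

x = np.arange(12)
a = x.reshape(2, 6)        # dimensions as separate arguments
b = x.reshape((2, 6))      # ...or as a single tuple; same result
c = x.reshape(12, 1, 1)    # 3-D: 12 rows, 1 column, 1 deep
d = x.reshape(3, -1)       # -1: NumPy infers the missing dimension, giving (3, 4)
print(x.shape)             # (12,): reshape did not change x itself
```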

53
00:12:37,440 --> 00:13:00,340
Note that what resize does is a little different. It changes the shape of the given array in place. So at this point after doing x.reshape(3,4), we haven't actually modified "x" in any way. The output of this expression is a 2-dimensional array of shape 3-by-4.

54
00:13:00,340 --> 00:13:16,900
However, once we execute resize, let's see what happens. Let's get rid of this for a second. So resize doesn't produce any output but it does modify the array, so let's query what happened to our array.

55
00:13:16,900 --> 00:13:33,600
So there it is. Let's try this one more time. We have "x" here; it is an array of 12 elements. Before doing resize, let's check what "x" looks like.

56
00:13:33,600 --> 00:13:39,320
So "x" looks like this. Now we'll do resize and query "x" again.

57
00:13:39,320 --> 00:13:46,320
So this is the same as the output of <i>x.reshape</i>, but now we've modified "x".
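A sketch of the reshape-versus-resize difference shown above. One caveat beyond the video: when the total size changes, NumPy may refuse an in-place resize if other references to the array exist (common in interactive sessions). Here the size stays 12, so that check doesn't come into play.

```python
import numpy as np

x = np.arange(12)
print(x.reshape(3, 4).shape)   # (3, 4): reshape returns a new array object...
print(x.shape)                 # (12,): ...and leaves x unchanged

result = x.resize((3, 4))      # resize changes x itself, in place
print(result)                  # None: resize returns nothing
print(x.shape)                 # (3, 4)
```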

58
00:13:46,320 --> 00:14:00,560
Well, that was reshaping and resizing. I encourage you to try multiple examples. There are some caveats that arise when you are trying to

59
00:14:00,560 --> 00:14:20,620
reshape the channels of images: you have to be especially careful about the order in which you reshape, and you might need to combine reshapes and transposes. When those situations arise in this class, we will be specific about them.
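A hypothetical example of the image-channel caveat, using a tiny made-up (height, width, channels) array rather than a real image. Reshaping straight to (channels, pixels) yields the right shape but mixes channels together. Moving the channel axis first with transpose fixes that.

```python
import numpy as np

# Hypothetical tiny "image": height 2, width 2, 3 color channels (H, W, C)
img = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# Goal: one row per channel, i.e. shape (3, H*W)
wrong = img.reshape(3, -1)                      # right shape, channels get mixed
right = img.transpose(2, 0, 1).reshape(3, -1)   # channels first, then flatten

channel0 = img[..., 0].ravel()                  # every pixel's first channel
print(np.array_equal(right[0], channel0))       # True
print(np.array_equal(wrong[0], channel0))       # False
```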

60
00:14:20,620 --> 00:14:30,340
Next let's talk about array slicing and indexing. Like we saw in the array access section, arrays are accessed using the standard python bracket syntax.

61
00:14:30,340 --> 00:14:40,900
So in this section we'll look at various forms of the selection object i.e. what you can put inside the square brackets.

62
00:14:40,900 --> 00:14:55,260
So one thing you can put inside the square brackets are ranges. The basic range selection syntax is <b><i>start:stop:step</i></b>. So let's look at an example of this.

63
00:14:55,260 --> 00:15:09,060
Here is a 1-dimensional array of 100 elements reshaped into a 2-dimensional 10-by-10 array. Suppose we want to select the bottom-right 5-by-5 subarray.

64
00:15:09,060 --> 00:15:20,780
How would we do that? We want to start at row index 5 and column index 5 (the 6th row and 6th column) and select everything below and to the right of that.

65
00:15:20,780 --> 00:15:40,660
When you don't specify a stop value like this, NumPy assumes that you want to go all the way to the end, so in this case it implicitly replaces the missing stop with 10.
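A sketch of the bottom-right selection, assuming the 10-by-10 array holds 0 through 99 (np.arange(100) reshaped), so each value encodes its own position.

```python
import numpy as np

x = np.arange(100).reshape(10, 10)
br = x[5:, 5:]        # rows 5..9 and columns 5..9; the stop defaults to the end
print(br.shape)       # (5, 5)
print(br[0, 0])       # 55, i.e. x[5, 5]
print(np.array_equal(br, x[5:10, 5:10]))   # True: same as writing stop=10
```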

66
00:15:40,660 --> 00:15:55,900
The default value for the step is 1. Here we're messing around with what the value of the step can be. So we can even have it be negative values.

67
00:15:55,900 --> 00:16:09,860
So if we put -1, then starting from that row and column we walk backwards toward row 0 and column 0, getting the subarray to the top-left of that position.

68
00:16:09,860 --> 00:16:24,980
But it'll be rotated by 180 degrees. So this is a neat way to rotate a subarray by 180 degrees: just change the step to -1 in both dimensions. Definitely give this a shot.
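A sketch of the negative-step trick. The notebook's exact start index isn't visible in the transcript, so this version starts at index 4 so the block is exactly 5-by-5 (starting at index 5 with no stop would include index 5 through 0, i.e. 6-by-6). The last line checks the result against np.rot90 with k=2, a 180-degree rotation.

```python
import numpy as np

x = np.arange(100).reshape(10, 10)

# Step -1 walks backwards; omitting the stop runs all the way down to index 0
tl_rot = x[4::-1, 4::-1]
print(tl_rot.shape)     # (5, 5)
print(tl_rot[0, 0])     # 44, i.e. x[4, 4]

# Reversing both axes is the same as rotating by 180 degrees
print(np.array_equal(tl_rot, np.rot90(x[:5, :5], 2)))   # True
```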

69
00:16:24,980 --> 00:16:41,860
Another thing we can do is take the whole array and change the step we use. That way we pick out, say, every other element, which is a simple form of downsampling.
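A sketch of stepping over the whole array, keeping every second row and column (a factor-of-2 downsampling), again assuming the 0 to 99 contents.

```python
import numpy as np

x = np.arange(100).reshape(10, 10)
small = x[::2, ::2]    # every second row and column: a simple 2x downsampling
print(small.shape)     # (5, 5)
print(small[1, 1])     # 22, i.e. x[2, 2]
```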

70
00:16:41,860 --> 00:16:53,380
So definitely experiment with these values and as always, if you have any questions about this, do let us know on Piazza.

71
00:16:53,380 --> 00:17:06,920
Let's talk about some more objects we can put inside the square brackets. The three dots, or Ellipsis object, along with the newaxis object, can also be used for indexing.

72
00:17:06,920 --> 00:17:24,360
Here's what they are for. The ellipsis (...) stands in for as many full slices (:) as are needed to cover the dimensions you don't index explicitly, such as the first few dimensions in this example.

73
00:17:24,360 --> 00:17:35,060
A specific use case might be like the one here: you want an array of all the first elements along the third dimension of this array.

74
00:17:35,060 --> 00:17:53,500
In other words, you want to select... Essentially, what you're computing is this: for a 3-dimensional array, x[...,0] is the same as x[:,:,0]. A ":" on its own means "take everything" in that dimension.

75
00:17:53,500 --> 00:18:07,160
It uses the default start, which is 0, and the implicit stop, which is the end, along both the first and second dimensions.

76
00:18:07,160 --> 00:18:22,300
Then it just takes the 0th element along the last dimension. For 3 dimensions we can write this out, but if we had a 10-dimensional array, instead of writing a colon for each of the leading dimensions, we can replace all of them with the 3 dots.

77
00:18:22,300 --> 00:18:34,600
That will give us what we want. Oops, we forgot to run this cell. There we go. So that's what the ellipsis is for.
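A sketch of the ellipsis on a 3-D array, then on a 5-D array of zeros (a made-up shape) where writing out every leading colon gets tedious.

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)
print(np.array_equal(x[..., 0], x[:, :, 0]))   # True

# With more dimensions, ... stands in for all the leading colons
y = np.zeros((2, 3, 4, 5, 6))
print(y[..., 0].shape)                                # (2, 3, 4, 5)
print(y[..., 0].shape == y[:, :, :, :, 0].shape)      # True
```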

78
00:18:34,600 --> 00:18:51,900
You can also use the ellipsis to fill in the trailing dimensions. So if you just want the first element along the first dimension, and everything within that element, you can use the ellipsis for that.

79
00:18:51,900 --> 00:19:06,520
But this is equivalent to just this: x[0,...] is the same as x[0]. That's why it's not as useful as in the first case.
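A one-line check of that equivalence: trailing dimensions are taken whole whenever you leave them out, so the ellipsis adds nothing here.

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)
print(np.array_equal(x[0, ...], x[0]))   # True: trailing dimensions are taken whole anyway
print(x[0, ...].shape)                   # (3, 3)
```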

80
00:19:06,520 --> 00:19:22,620
Next let's talk about the <b><i>newaxis</i></b> object. The newaxis object is used when you want to expand the shape of an array by inserting a new axis of length 1 at a given position. So let's look at "x", which has shape (3,3,3).

81
00:19:22,620 --> 00:19:32,740
Now we'll select all the elements in "x" using just colons. So 3 colons means that we'll take all elements along all dimensions.

82
00:19:32,740 --> 00:19:47,320
But at the second position, we'll insert an <b><i>np.newaxis</i></b>, and now we'll look at the shape of this. As you can see, there is an extra dimension here, and it has length 1.

83
00:19:47,320 --> 00:19:59,180
The <b><i>np.newaxis</i></b> is basically just an alias for the python <b><i>None</i></b> object, so if you had a None in here, it would basically have the same effect.

84
00:19:59,180 --> 00:20:14,100
Using the <b><i>np.newaxis</i></b> object at the beginning without any other dimensions would prepend a dimension at the beginning of the shape.

85
00:20:14,100 --> 00:20:30,240
You can use the ellipsis and the np.newaxis object in tandem to easily append a dimension to the end of the shape, so this is how you can expand an array along the last position.
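The newaxis variations from this part, sketched on a (3,3,3) array of zeros. Each one inserts a length-1 axis at a different position.

```python
import numpy as np

x = np.zeros((3, 3, 3))
print(x[:, np.newaxis, :, :].shape)   # (3, 1, 3, 3): new axis at position 1
print(np.newaxis is None)             # True: newaxis is an alias for None
print(x[:, None, :, :].shape)         # (3, 1, 3, 3): same effect
print(x[np.newaxis].shape)            # (1, 3, 3, 3): prepend
print(x[..., np.newaxis].shape)       # (3, 3, 3, 1): append
```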

86
00:20:30,240 --> 00:20:47,980
Now let's talk about advanced indexing. Advanced indexing is triggered when the selection object, i.e. the object inside the square brackets, is a non-tuple sequence object, an ndarray (of integer or boolean dtype), or a tuple with at least one sequence object or ndarray.

87
00:20:47,980 --> 00:21:03,880
There are two types of advanced indexing: integer and boolean. It's important to note that, unlike slicing (basic indexing), which returns a view of the underlying array data, advanced indexing always returns a copy of the data.

88
00:21:03,880 --> 00:21:23,600
A view is kind of like a pointer: it just shows you the existing data. When you take a basic slice, none of the data is copied, but advanced indexing always returns a copy of the data.
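A sketch of the view-versus-copy difference. np.shares_memory reports whether two arrays overlap in memory, and writing through each result shows which one affects the original.

```python
import numpy as np

x = np.arange(10)
view = x[2:5]             # basic slicing: a view onto x's data
fancy = x[[2, 3, 4]]      # advanced (integer) indexing: a new copy

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, fancy))   # False

view[0] = 100     # writes through to x[2]
fancy[1] = -1     # does not affect x
print(x[2], x[3]) # 100 3
```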

89
00:21:23,600 --> 00:21:35,820
So let's look at integer indexing first. We'll start with the same array we were working with earlier. What does this look like it will give us?

90
00:21:35,820 --> 00:21:51,960
Each of these sequences of integers has length 2, which means we'll be selecting 2 items. Each sequence gives the index along one dimension for each of those 2 items.

91
00:21:51,960 --> 00:22:08,500
It's the same with these two sequences. So basically we'll get the elements at (0,1,0) and (2,2,1). So let's run that.
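A sketch of that selection. The transcript doesn't show the array's values, so this assumes x = np.arange(27).reshape(3, 3, 3), where the element at (i, j, k) equals 9*i + 3*j + k. The index lists are read column-wise: first entries give (0,1,0), second entries give (2,2,1).

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)   # assumed contents: x[i, j, k] == 9*i + 3*j + k
picked = x[[0, 2], [1, 2], [0, 1]]   # one list per dimension
print(picked)                        # values 3 and 25
print(np.array_equal(picked, np.array([x[0, 1, 0], x[2, 2, 1]])))   # True
```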

92
00:22:08,500 --> 00:22:24,020
Now let's change these numbers around a little bit.

93
00:22:24,020 --> 00:22:43,520
So you can pause the video and think about what this will return. Let's run it. So we got back 21 and 9. So what happens when we change the length of these sequences?

94
00:22:43,520 --> 00:23:05,920
Note that all of these sequences have to be the same length (more precisely, their shapes must be broadcast-compatible), so you can't change the length of just one of them. Here we're asking for 3 values from the array. So let's run that.
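A sketch under the same assumed array: three index sequences of length 3 give three values, while mismatched lengths raise an IndexError because NumPy cannot broadcast them together.

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)   # assumed contents: x[i, j, k] == 9*i + 3*j + k

# Three values: x[0, 0, 0], x[1, 1, 1] and x[2, 2, 2]
three = x[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(three)   # values 0, 13, 26

# Lengths 3, 2 and 3 cannot be broadcast together, so NumPy raises IndexError
mismatch_raised = False
try:
    x[[0, 1, 2], [0, 1], [0, 1, 2]]
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)
```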

95
00:23:05,920 --> 00:23:26,200
So that's how integer indexing works there. Now let's look at a different array. Let's look at a 4-by-4 array here. How exactly would we select just the corner elements of this array? That's 0, 3, 12, and 15.

96
00:23:26,200 --> 00:23:39,620
So one way we can do it is with integer indexing. We'll define the corner rows as a 2-d array.

97
00:23:39,620 --> 00:23:57,420
This is what it's going to look like. The corner columns are another 2D array. What NumPy lets us do here is just pass these 2D arrays to the square brackets. It'll give us the corner elements.

98
00:23:57,420 --> 00:24:12,620
So how is this being interpreted? Basically, NumPy matches these element-wise, or broadcasts them, and we get the elements at those positions.

99
00:24:12,620 --> 00:24:24,720
So here we'll get element (0,0), (0,3), (3,0) and (3,3) which are the corner elements in a 4-by-4 array.
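A sketch of the corner selection with 2-D index arrays, assuming the 4-by-4 array is np.arange(16).reshape(4, 4), which matches the corner values 0, 3, 12 and 15 mentioned above.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
rows = np.array([[0, 0],
                 [3, 3]])
cols = np.array([[0, 3],
                 [0, 3]])
corners = x[rows, cols]   # pairs (0,0), (0,3), (3,0), (3,3), matched element-wise
print(corners)
# [[ 0  3]
#  [12 15]]
```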

100
00:24:24,720 --> 00:24:41,360
So we can also achieve this with broadcasting. If we define the corner rows here like this and the corner columns like this as well, we can use <b><i>np.newaxis</i></b> to force a broadcast here.

101
00:24:41,360 --> 00:25:06,560
What happens is that NumPy sees that this becomes a 2-by-1 column array while this one is still a 1-dimensional array, so it broadcasts them against each other implicitly.

102
00:25:06,560 --> 00:25:23,540
We'll once again get the corner elements, just like we wanted. I encourage you to mess around with these values and try this. This is yet another example of broadcasting.
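The broadcasting version of the corner selection, sketched with the same assumed 4-by-4 array: a (2, 1) row index against a (2,) column index broadcasts to a (2, 2) grid of positions.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
rows = np.array([0, 3])
cols = np.array([0, 3])

# rows[:, np.newaxis] has shape (2, 1); against cols of shape (2,) it broadcasts to (2, 2)
corners = x[rows[:, np.newaxis], cols]
print(corners)
# [[ 0  3]
#  [12 15]]
```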

103
00:25:23,540 --> 00:25:36,560
The shape of this is 3-by-1, and here we have just a 1-dimensional array of shape (3,). So what would the output shape of this be?

104
00:25:36,560 --> 00:25:47,220
It is a (3,3), so think about why that is.
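One way to reason about it: NumPy treats the (3,) array as (1, 3), then stretches every length-1 dimension to match the other operand, so (3, 1) and (1, 3) both become (3, 3). The sketch checks this with np.broadcast_shapes (available in NumPy 1.20 and later).

```python
import numpy as np

col = np.arange(3)[:, np.newaxis]   # shape (3, 1)
row = np.arange(3)                  # shape (3,), treated as (1, 3)
print((col + row).shape)            # (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))   # (3, 3)
```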

105
00:25:47,220 --> 00:25:59,360
Let's look at boolean indexing now. NumPy arrays support comparison operators, which produce boolean arrays, and we can exploit this to filter items in the array.

106
00:25:59,360 --> 00:26:13,200
So let's look at this range from -10 to 10. If we want only the positive elements we can just pass in this expression into the square brackets.

107
00:26:13,200 --> 00:26:25,740
That'll give us an array of all the positive values. Remember this is a copy. This is not referring to these elements in the array. This is a copy.

108
00:26:25,740 --> 00:26:39,320
Similarly, we can find the even elements by passing in this selector. We can get the odd elements by just changing this.
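A sketch of these filters on np.arange(-10, 10). Each comparison builds a boolean mask with the same shape as x, and indexing with the mask returns a copy of the matching elements.

```python
import numpy as np

x = np.arange(-10, 10)        # -10, -9, ..., 9 (10 excluded)
mask = x > 0                  # boolean array, same shape as x
positives = x[mask]           # 1 .. 9, as a copy
evens = x[x % 2 == 0]
odds = x[x % 2 == 1]          # also catches negatives: -9 % 2 == 1 in NumPy
print(np.shares_memory(x, positives))   # False: boolean indexing copies
```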

109
00:26:39,320 --> 00:26:59,760
Here we go, these are the odd elements. It should be noted that you can also modify the values of the original array by using a boolean selector on the left-hand side of an assignment.

110
00:26:59,760 --> 00:27:12,400
As shown here, what we want to do is increment all the negative values by 10. Let's go ahead and run this. This should give us no particular output,

111
00:27:12,400 --> 00:27:22,500
but then if we query "x" again, we'll see that all the negative elements have been incremented by 10.
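A sketch of the in-place update: with the mask on the left-hand side of the assignment, NumPy writes into the selected elements of "x" itself rather than into a copy.

```python
import numpy as np

x = np.arange(-10, 10)
x[x < 0] += 10     # boolean mask on the left of an assignment modifies x in place
print(x.min())     # 0: the former -10 .. -1 are now 0 .. 9
print(x[:3])       # [0 1 2]
```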

112
00:27:22,500 --> 00:27:38,300
Well that was the NumPy tutorial for CS445. You should go through this one more time on your own, and feel free to ask questions on Piazza.

113
00:27:38,300 --> 00:27:45,300
You'll be using NumPy extensively so this tutorial should come in handy. Okay, thank you!