NumPy Array Sort, Search, Count, Replace

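NumPy also provides functions to sort the elements of an array, find the locations of the maximum and minimum elements, select the elements that satisfy a condition, replace elements, and so on.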

Before the examples, we first create a two-dimensional array of random numbers:

In [1]:

import numpy as np
A = np.random.rand(15).reshape(3,5) #2d array
print(A)

[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]

Sorting

np.sort(arr, axis=...)

arr: Array to be sorted.
axis: axis along which to sort (integer or None).
- None -- the array is flattened before sorting.
- -1 (default) -- sorts along the last axis.
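To see the three `axis` options side by side on data whose answer is easy to check, here is a minimal sketch using a small hypothetical integer array `a` (not the random `A` above). It also shows that `np.sort` returns a sorted copy and leaves the input unchanged.

```python
import numpy as np

# Small hypothetical array so the results are easy to verify by eye
a = np.array([[3, 1, 2],
              [0, 5, 4]])

row_sorted = np.sort(a)              # axis=-1 (default): sort within each row
col_sorted = np.sort(a, axis=0)      # sort within each column
flat_sorted = np.sort(a, axis=None)  # flatten first, then sort (1-D result)

print(row_sorted)
print(col_sorted)
print(flat_sorted)
print(a)  # np.sort returned copies; the original array is unchanged
```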

In [2]:

print(np.sort(A))                # sort along the last dimension (default)

[[ 0.01663677  0.08998309  0.39570028  0.75701306  0.77258603]
 [ 0.18768062  0.32741335  0.42942372  0.50594256  0.94156336]
 [ 0.53817992  0.60782182  0.79366708  0.79970882  0.98071259]]

In [3]:

print(np.sort(A, axis=0))        # sort along the first dimension (within each column)

[[ 0.01663677  0.08998309  0.39570028  0.50594256  0.18768062]
 [ 0.42942372  0.32741335  0.60782182  0.53817992  0.75701306]
 [ 0.79970882  0.98071259  0.94156336  0.77258603  0.79366708]]

In [4]:

print(np.sort(A, axis=None))     # sort the flattened array

[ 0.01663677  0.08998309  0.18768062  0.32741335  0.39570028  0.42942372
  0.50594256  0.53817992  0.60782182  0.75701306  0.77258603  0.79366708
  0.79970882  0.94156336  0.98071259]

Maximum and Minimum Values

Returns the maximum (minimum) values along a dimension.

np.amax(arr, axis=...)

np.amin(arr, axis=...)

arr : Input array.
axis : axis along which to find the max/min values (integer or None).
- None (default) -- the array is flattened

Note: If at least one element is NaN, the resulting max/min value is NaN as well. To ignore NaN values (MATLAB behavior), use np.nanmax and np.nanmin.
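The examples below only call `np.amax`, so here is a short sketch of `np.amin` alongside it on a small hypothetical array `a`. The last lines illustrate the NaN note above with a hypothetical 1-D array `b`.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

overall_max = np.amax(a)       # axis=None: maximum of the flattened array
overall_min = np.amin(a)       # minimum of the flattened array
col_min = np.amin(a, axis=0)   # minimum of each column
row_max = np.amax(a, axis=1)   # maximum of each row

# One NaN is enough to make np.amin return NaN; np.nanmin skips it
b = np.array([1.0, np.nan, 3.0])
b_min = np.amin(b)
b_nanmin = np.nanmin(b)

print(overall_max, overall_min, col_min, row_max, b_min, b_nanmin)
```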

In [5]:

print(A)
print(np.amax(A))   # the maximum in the flattened array

[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
0.980712588381

In [6]:

print(np.amax(A,axis=0))   # the maximum along the first dimension

[ 0.79970882  0.98071259  0.94156336  0.77258603  0.79366708]

In [7]:

print(np.amax(A,axis=1))   # the maximum along the second dimension

[ 0.77258603  0.94156336  0.98071259]

Maximum and Minimum Locations

Returns the indices of the maximum (minimum) values along a dimension.

np.argmax(arr, axis=...)

np.argmin(arr, axis=...)

arr : Input array.
axis : axis along which to find the index of the max/min value (integer or None).
- None (default) -- the array is flattened

Note: If the array contains NaN, np.argmax and np.argmin return the index of the first NaN (consistent with np.amax/np.amin returning NaN). To ignore NaN values (MATLAB behavior), use np.nanargmax and np.nanargmin.
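A minimal sketch of the NaN behavior described in the note, on a hypothetical 1-D array `b` that contains two NaN entries:

```python
import numpy as np

b = np.array([0.2, np.nan, 0.9, np.nan])

i_max = np.argmax(b)         # index of the first NaN, not of 0.9
i_min = np.argmin(b)         # also the index of the first NaN
i_nanmax = np.nanargmax(b)   # NaN values are ignored
i_nanmin = np.nanargmin(b)

print(i_max, i_min, i_nanmax, i_nanmin)
```

Note that np.nanargmax and np.nanargmin raise a ValueError when a slice contains only NaN values.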

In [8]:

print(A)
print(np.argmax(A))   # index of the maximum in the flattened array

[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
11
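The value 11 is an index into the flattened array. np.unravel_index converts such a flat index back into (row, column) form; for the (3, 5) array A, index 11 = 2 * 5 + 1 corresponds to row 2, column 1, where 0.98071259 sits. The sketch below checks this arithmetic and repeats the round trip on a small hypothetical array `M`.

```python
import numpy as np

# Flat index 11 in an array of shape (3, 5)
r, c = np.unravel_index(11, (3, 5))
print(r, c)

# Round trip on a small hypothetical array
M = np.array([[0.1, 0.7, 0.3],
              [0.9, 0.2, 0.5]])
flat_idx = np.argmax(M)                          # index into M.ravel()
row, col = np.unravel_index(flat_idx, M.shape)   # back to (row, column)
print(flat_idx, row, col, M[row, col])
```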

In [9]:

print(np.argmax(A,axis=0))   # indices of the maximum along the first dimension

[2 2 1 0 2]

In [10]:

print(np.argmax(A,axis=1))   # indices of the maximum along the second dimension

[3 2 1]

Compare amax and nanmax

In [11]:

b = A[0,:].copy()
b[2] = np.nan   # np.NaN was removed in NumPy 2.0; np.nan works in all versions
print(b)

[ 0.01663677  0.08998309         nan  0.77258603  0.75701306]

In [12]:

print(np.amax(b))

nan

In [13]:

print(np.nanmax(b))

0.772586029065
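The same contrast holds along an axis. A minimal sketch on a hypothetical 2-D array `M` with scattered NaN values:

```python
import numpy as np

M = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

col_max = np.amax(M, axis=0)        # any NaN in a column makes that column's max NaN
col_nanmax = np.nanmax(M, axis=0)   # NaN values are skipped
row_nanmin = np.nanmin(M, axis=1)   # same idea for the minimum of each row

print(col_max)
print(col_nanmax)
print(row_nanmin)
```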

Use the results from argmax/argmin as an "index array"

The array returned by np.argmax (or np.argmin, etc.) can be used as an "index array" to get elements from another array (refer to NumpyArrayIndexing).

Example: generate another random 2-D array S with the same shape as A, then get the elements of S at the locations where the maximum of A occurs along the first dimension.

In [14]:

S = np.random.rand(15).reshape(3,5) #2d array
print(S)

[[ 0.5876305   0.53293801  0.87346922  0.81107165  0.06274591]
 [ 0.02095624  0.11242287  0.77941256  0.06204872  0.67847421]
 [ 0.653141    0.82396529  0.60186961  0.43972113  0.13158718]]

In [15]:

indAmax = np.argmax(A, axis=0)
print(indAmax)  # indices of the maximum along the first dimension

[2 2 1 0 2]

In [16]:

print(S[indAmax,[0,1,2,3,4]])  # elements in S where max A occurs along the first dimension

[ 0.653141    0.82396529  0.77941256  0.81107165  0.13158718]
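The list [0,1,2,3,4] above pairs each row index in indAmax with its column. A sketch of two equivalent ways to write this without hard-coding the columns, assuming NumPy 1.17 or later (for np.random.default_rng; np.take_along_axis needs 1.15 or later). The seeded arrays are hypothetical stand-ins for A and S.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the sketch is reproducible
A = rng.random((3, 5))
S = rng.random((3, 5))

ind = np.argmax(A, axis=0)       # row index of the maximum of A in each column

# Fancy indexing: pair each row index with np.arange over the columns
picked = S[ind, np.arange(A.shape[1])]

# take_along_axis needs an index array with the same number of dimensions as S
picked2 = np.take_along_axis(S, ind[np.newaxis, :], axis=0)[0]

print(picked)
print(np.array_equal(picked, picked2))
```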

Mask and count

Applying a logical condition to an array generates a mask.
The mask is a Boolean array (True = 1 / False = 0).
Summing the Boolean array gives the count of elements for which the logical condition is True (= 1).

arr.sum(axis)

arr: input array
axis: dimension along which to sum up the elements
- None (default) -- the array is flattened
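A sketch on a small hypothetical array `a`, counting with the sum of the mask along each axis; np.count_nonzero is an alternative that counts the True values directly.

```python
import numpy as np

a = np.array([[0.1, 0.5, 0.3],
              [0.7, 0.2, 0.9]])

mask = a > 0.25                            # Boolean array
total = mask.sum()                         # True counts as 1, False as 0
per_col = mask.sum(axis=0)                 # count along the first dimension
per_row = np.count_nonzero(mask, axis=1)   # same count per row via count_nonzero

print(total, per_col, per_row)
```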

In [17]:

print(A)
A02=A>0.2         # mask of A>0.2
print(A02)        # Boolean array

[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
[[False False  True  True  True]
 [ True  True  True  True False]
 [ True  True  True  True  True]]

In [18]:

print(A02.sum())  # count the total number of elements >0.2
print(A02.sum(0))  # count the number of elements >0.2 along the first dimension

12
[2 2 3 3 2]

"Mask out" using the Boolean arrays¶

The Boolean array of a logical condition can be used as an "index".
It returns only the elements that meet the logical condition, as a 1-D array.
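A sketch on a small hypothetical array `a`: indexing with a mask returns a 1-D copy in row-major order, and a mask on the left-hand side of an assignment replaces the selected elements in place. The copy `b` keeps `a` itself unchanged.

```python
import numpy as np

a = np.array([[0.1, 0.5, 0.3],
              [0.7, 0.2, 0.9]])

selected = a[a > 0.25]    # 1-D array of the elements that meet the condition

b = a.copy()
b[b <= 0.25] = np.nan     # replace the masked elements of b in place

print(selected)
print(b)
```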

In [19]:

print(A[A02])

[ 0.39570028  0.77258603  0.75701306  0.42942372  0.32741335  0.94156336
  0.50594256  0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]

numpy.where: mask out and replace (keep the elements that meet a condition and replace the rest with a specified value)

This is similar to the const function in GrADS.

np.where(condition, arr, newval)

condition: logical condition (Boolean array)
arr: values returned where the condition is True
newval: value returned where the condition is False (e.g. np.nan to mask those elements out)

Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
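A sketch on a small hypothetical array `a`. Besides the three-argument form, np.where(condition) with only the condition returns the indices of the True elements (one index array per dimension), which covers the "search" part of this topic.

```python
import numpy as np

a = np.array([[0.1, 0.5, 0.3],
              [0.7, 0.2, 0.9]])

cond = a > 0.25
kept = np.where(cond, a, np.nan)   # True -> value from a, False -> NaN
zeroed = np.where(cond, a, 0.0)    # any scalar works as the replacement value
rows, cols = np.where(cond)        # condition only: indices of the True elements

print(kept)
print(zeroed)
print(rows, cols)
```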

In [20]:

Anew=np.where(A02,A,np.nan)
print(Anew)

[[        nan         nan  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256         nan]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]

Find and Count the NaN elements

np.isnan tests each element for NaN and returns a Boolean array (True where the element is NaN).

np.isnan(arr)
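A sketch on a hypothetical array `x`. It also shows why np.isnan is needed: NaN compares unequal to everything, including itself, so `x == np.nan` never finds it.

```python
import numpy as np

x = np.array([[np.nan, 0.5, 0.3],
              [0.7, np.nan, np.nan]])

nan_mask = np.isnan(x)
n_nan = nan_mask.sum()               # total number of NaN elements
nan_per_row = nan_mask.sum(axis=1)   # NaN count in each row
valid = x[~nan_mask]                 # ~ inverts the mask: keep only non-NaN values
eq_check = (x == np.nan).any()       # always False: == cannot detect NaN

print(n_nan, nan_per_row, valid, eq_check)
```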

In [21]:

print(Anew)
print(np.isnan(Anew))   # find the NaN elements

[[        nan         nan  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256         nan]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
[[ True  True False False False]
 [False False False False  True]
 [False False False False False]]

In [22]:

print(np.isnan(Anew).sum(0))  # count the number of NaN along the first dimension

[1 1 0 0 1]