NumPy Array Sort, Search, Count, Replace

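NumPy also provides functions to sort the elements of an array, locate the maximum and minimum elements, select the elements that satisfy a condition, and replace values.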

Before the examples, we first create a 2-D array of random numbers:

In [1]:
import numpy as np
A = np.random.rand(15).reshape(3,5) #2d array
print(A)
[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]

Sorting

np.sort(arr, axis=...)

  • arr: Array to be sorted.
  • axis: dimension along which to sort (integer).
    • None -- the array is flattened before sorting.
    • -1 (default) -- sorts along the last axis.
In [2]:
print(np.sort(A))                # sort along the last dimension (default)
[[ 0.01663677  0.08998309  0.39570028  0.75701306  0.77258603]
 [ 0.18768062  0.32741335  0.42942372  0.50594256  0.94156336]
 [ 0.53817992  0.60782182  0.79366708  0.79970882  0.98071259]]
In [3]:
print(np.sort(A, axis=0))        # sort along the first dimension (within each column)
[[ 0.01663677  0.08998309  0.39570028  0.50594256  0.18768062]
 [ 0.42942372  0.32741335  0.60782182  0.53817992  0.75701306]
 [ 0.79970882  0.98071259  0.94156336  0.77258603  0.79366708]]
In [4]:
print(np.sort(A, axis=None))     # sort the flattened array
[ 0.01663677  0.08998309  0.18768062  0.32741335  0.39570028  0.42942372
  0.50594256  0.53817992  0.60782182  0.75701306  0.77258603  0.79366708
  0.79970882  0.94156336  0.98071259]
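
Because A is random, the printed values above change on every run. The following is a minimal sketch on a small fixed array (X is a made-up example, not part of the original data) that shows the three axis settings side by side. It also shows one common idiom for descending order, since np.sort itself only sorts in ascending order.

```python
import numpy as np

# Small fixed array (hypothetical example data) so the results are reproducible.
X = np.array([[3, 1, 8],
              [9, 7, 2]])

rows_sorted = np.sort(X)              # default axis=-1: each row is sorted
cols_sorted = np.sort(X, axis=0)      # each column is sorted
flat_sorted = np.sort(X, axis=None)   # array is flattened, then sorted

# Descending order: sort ascending, then reverse along the sorted axis.
rows_desc = np.sort(X, axis=1)[:, ::-1]

print(rows_sorted)   # [[1 3 8] [2 7 9]]
print(cols_sorted)   # [[3 1 2] [9 7 8]]
print(flat_sorted)   # [1 2 3 7 8 9]
print(rows_desc)     # [[8 3 1] [9 7 2]]
```

np.sort returns a sorted copy, so X itself is left unchanged.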

Maximum and Minimum value

Returns the maximum (minimum) values along a dimension.

np.amax(arr, axis=...)

np.amin(arr, axis=...)

  • arr : Input array.
  • axis : dimension along which to find max/min value (integer).
    • None (default) -- the array is flattened

Note: If at least one element is NaN, the returned max/min value is NaN as well. To ignore NaN values (MATLAB behavior), use np.nanmax and np.nanmin.

In [5]:
print(A)
print(np.amax(A))   # the maximum in the flattened array
[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
0.980712588381
In [6]:
print(np.amax(A,axis=0))   # the maximum along the first dimension
[ 0.79970882  0.98071259  0.94156336  0.77258603  0.79366708]
In [7]:
print(np.amax(A,axis=1))   # the maximum along the second dimension
[ 0.77258603  0.94156336  0.98071259]
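
The cells above only exercise np.amax. As a short sketch, the same calls with np.amin on a small fixed array (X is made-up example data) look like this:

```python
import numpy as np

# Small fixed array (hypothetical example data).
X = np.array([[4.0, 1.0, 7.0],
              [2.0, 9.0, 5.0]])

min_all = np.amin(X)            # minimum of the flattened array
min_cols = np.amin(X, axis=0)   # minimum along the first dimension (per column)
min_rows = np.amin(X, axis=1)   # minimum along the second dimension (per row)
max_rows = np.amax(X, axis=1)   # maximum along the second dimension (per row)

print(min_all)    # 1.0
print(min_cols)   # [2. 1. 5.]
print(min_rows)   # [1. 2.]
print(max_rows)   # [7. 9.]
```

The same results are available through np.min and np.max, or through the array methods X.min(axis=...) and X.max(axis=...).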

Maximum and Minimum location

Returns the indices of the maximum (minimum) values along a dimension.

np.argmax(arr, axis=...)

np.argmin(arr, axis=...)

  • arr : Input array.
  • axis : dimension along which to find max/min value (integer).
    • None (default) -- the array is flattened

Note: If at least one element is NaN, np.argmax and np.argmin return the index of the first NaN instead of the index of the largest or smallest number. To ignore NaN values (MATLAB behavior), use np.nanargmax and np.nanargmin.

In [8]:
print(A)
print(np.argmax(A))   # index of the maximum in the flattened array
[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
11
In [9]:
print(np.argmax(A,axis=0))   # indices of the maximum along the first dimension
[2 2 1 0 2]
In [10]:
print(np.argmax(A,axis=1)) # indices of the maximum along the second dimension
[3 2 1]
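
With axis=None the result is an index into the flattened array (11 in the example above), which is not directly usable as a row/column position. One way to convert it back is np.unravel_index. The sketch below uses a small fixed array (X is made-up example data):

```python
import numpy as np

# Small fixed array (hypothetical example data).
X = np.array([[4, 1, 7],
              [2, 9, 5]])

flat_idx = np.argmax(X)                          # index into the flattened array
row, col = np.unravel_index(flat_idx, X.shape)   # convert to (row, column)

print(flat_idx)      # 4
print(X[row, col])   # 9, the maximum of X, found at row 1, column 1
```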

Compare amax and nanmax

In [11]:
b = A[0,:].copy()
b[2] = np.nan   # set one element to NaN
print(b)
[ 0.01663677  0.08998309         nan  0.77258603  0.75701306]
In [12]:
print(np.amax(b))
nan
In [13]:
print(np.nanmax(b))
0.772586029065
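
The same comparison can be made for the index functions. In this sketch (v is a made-up vector, not the b defined above), np.argmax points at the NaN entry, while np.nanargmax skips it:

```python
import numpy as np

# Fixed vector with one NaN at position 2 (hypothetical example data).
v = np.array([0.1, 0.5, np.nan, 0.9, 0.3])

idx_plain = np.argmax(v)      # NaN wins the comparison: index of the NaN
idx_nan = np.nanargmax(v)     # NaN ignored: index of the largest number
idx_nanmin = np.nanargmin(v)  # NaN ignored: index of the smallest number

print(idx_plain)    # 2
print(idx_nan)      # 3
print(idx_nanmin)   # 0
```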

Use the results from argmax/argmin as an "index array"

The array returned by np.argmax, np.argmin, etc. can be used as an "index array" to select elements from another array (see NumpyArrayIndexing).

Example: Let us generate another random 2-D array with the same shape as A, then get the elements of the new array at the locations where the maximum of A occurs along the first dimension.

In [14]:
S = np.random.rand(15).reshape(3,5) #2d array
print(S)
[[ 0.5876305   0.53293801  0.87346922  0.81107165  0.06274591]
 [ 0.02095624  0.11242287  0.77941256  0.06204872  0.67847421]
 [ 0.653141    0.82396529  0.60186961  0.43972113  0.13158718]]
In [15]:
indAmax=np.argmax(A,axis=0)
print(indAmax) # indices of the maximum along the first dimension
[2 2 1 0 2]
In [16]:
print(S[indAmax,[0,1,2,3,4]])  # elements in S where max A occurs along the first dimension
[ 0.653141    0.82396529  0.77941256  0.81107165  0.13158718]
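
The hard-coded column list [0,1,2,3,4] only works for arrays with five columns. As a sketch of two shape-independent alternatives (X and Y are made-up example arrays), the column indices can be built with np.arange, or the lookup can be delegated to np.take_along_axis:

```python
import numpy as np

# Two fixed arrays of the same shape (hypothetical example data).
X = np.array([[1, 8, 3],
              [7, 2, 9]])
Y = np.array([[10, 20, 30],
              [40, 50, 60]])

idx = np.argmax(X, axis=0)      # row of the maximum in each column: [1 0 1]
cols = np.arange(X.shape[1])    # column indices built from the shape: [0 1 2]

picked = Y[idx, cols]           # pairs (idx[k], cols[k]) select one element each

# take_along_axis needs an index array with the same number of dimensions as Y,
# so a leading axis is added and then removed again with [0].
picked_alt = np.take_along_axis(Y, idx[np.newaxis, :], axis=0)[0]

print(picked)       # [40 20 60]
print(picked_alt)   # [40 20 60]
```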

Mask and count

  • Apply a logical condition to an array to generate a mask
  • The result is a Boolean array (True=1/False=0)
  • Summing the Boolean array gives the number of elements for which the condition is True (=1)

arr.sum(axis)

  • arr: input array
  • axis: dimension along which to sum up the elements
    • None (default) -- the array is flattened
In [17]:
print(A)
A02=A>0.2         # mask of A>0.2
print(A02)        # Boolean array
[[ 0.01663677  0.08998309  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256  0.18768062]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
[[False False  True  True  True]
 [ True  True  True  True False]
 [ True  True  True  True  True]]
In [18]:
print(A02.sum())  # count the total number of elements >0.2
print(A02.sum(0))  # count the number of elements >0.2 along the first dimension
12
[2 2 3 3 2]
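
Summing a mask is one way to count. np.count_nonzero gives the same numbers and states the intent more directly. A minimal sketch on a fixed array (X is made-up example data):

```python
import numpy as np

# Small fixed array (hypothetical example data).
X = np.array([[0.1, 0.5, 0.9],
              [0.3, 0.05, 0.7]])

mask = X > 0.2                             # Boolean array

total = mask.sum()                         # count over the whole array
per_col = mask.sum(axis=0)                 # count along the first dimension
per_row = mask.sum(axis=1)                 # count along the second dimension
total_alt = np.count_nonzero(mask)         # same count as mask.sum()
per_col_alt = np.count_nonzero(mask, axis=0)

print(total, total_alt)   # 4 4
print(per_col)            # [1 1 2]
print(per_row)            # [2 2]
```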

"Mask out" using the Boolean arrays

  • The Boolean array of the logical condition can be used as "indices"
  • Returns only the elements that meet the logical condition
In [19]:
print(A[A02])
[ 0.39570028  0.77258603  0.75701306  0.42942372  0.32741335  0.94156336
  0.50594256  0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]
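
Masks can also be combined before indexing. The sketch below (X is made-up example data) uses the element-wise operators & (and) and ~ (not). Each comparison needs its own parentheses, because & binds more tightly than > and <.

```python
import numpy as np

# Small fixed array (hypothetical example data).
X = np.array([[0.1, 0.5, 0.9],
              [0.3, 0.05, 0.7]])

in_range = (X > 0.2) & (X < 0.8)   # True where 0.2 < X < 0.8

inside = X[in_range]               # elements that meet the condition
outside = X[~in_range]             # elements that do not

print(inside)    # [0.5 0.3 0.7]
print(outside)   # [0.1  0.9  0.05]
```

The selected elements are returned as a 1-D array in row-major order, regardless of the shape of X.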

numpy.where: mask out and replace (find the elements that meet a condition and replace the others with a specified value)

Similar to the const function in GrADS.

np.where(condition, arr, newval)

  • condition: logical condition (Boolean array)
  • arr: array whose values are kept where condition is True
  • newval: replacement value used where condition is False

Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

In [20]:
Anew=np.where(A02,A,np.nan)
print(Anew)
[[        nan         nan  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256         nan]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
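
As a further sketch on a fixed array (X is made-up example data), the second and third arguments of np.where can each be an array or a scalar. When only the condition is passed, np.where returns the indices of the True elements instead of a new array.

```python
import numpy as np

# Small fixed array (hypothetical example data).
X = np.array([[0.1, 0.5, 0.9],
              [0.3, 0.05, 0.7]])

# Keep X where the condition holds, use 0.0 elsewhere.
zeroed = np.where(X > 0.2, X, 0.0)

# Both branches can be scalars, which turns the mask into 1/0 labels.
labels = np.where(X > 0.2, 1, 0)

# Condition only: a tuple of index arrays, one per dimension.
rows, cols = np.where(X > 0.2)

print(zeroed.tolist())   # [[0.0, 0.5, 0.9], [0.3, 0.0, 0.7]]
print(labels.tolist())   # [[0, 1, 1], [1, 0, 1]]
print(rows.tolist())     # [0, 0, 1, 1]
print(cols.tolist())     # [1, 2, 0, 2]
```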

Find and Count the NaN elements

Returns a Boolean array that is True where the element is NaN.

np.isnan(arr)

In [21]:
print(Anew)
print(np.isnan(Anew))   # find the NaN elements
[[        nan         nan  0.39570028  0.77258603  0.75701306]
 [ 0.42942372  0.32741335  0.94156336  0.50594256         nan]
 [ 0.79970882  0.98071259  0.60782182  0.53817992  0.79366708]]
[[ True  True False False False]
 [False False False False  True]
 [False False False False False]]
In [22]:
print(np.isnan(Anew).sum(0))  # count the number of NaN along the first dimension
[1 1 0 0 1]
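
The NaN mask can be inverted with ~ to count or select the valid elements instead. A minimal sketch on a fixed array (X is made-up example data):

```python
import numpy as np

# Small fixed array containing three NaN values (hypothetical example data).
X = np.array([[np.nan, 0.5, 0.9],
              [0.3, np.nan, np.nan]])

nan_mask = np.isnan(X)

n_nan = nan_mask.sum()                    # total number of NaN elements
n_nan_rows = nan_mask.sum(axis=1)         # NaN count along the second dimension
n_valid_cols = (~nan_mask).sum(axis=0)    # valid count along the first dimension
valid = X[~nan_mask]                      # 1-D array of the non-NaN elements

print(n_nan)          # 3
print(n_nan_rows)     # [1 2]
print(n_valid_cols)   # [1 1 1]
print(valid)          # [0.5 0.9 0.3]
```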