# Numpy mean of nonzero values

*Solution:*

Count the nonzero entries in each row and divide each row's sum by that count. The implementation would look something like this -

```
np.true_divide(matrix.sum(1),(matrix!=0).sum(1))
```

If you are on an older version of NumPy, you can use float conversion of the count to replace `np.true_divide`, like so -

```
matrix.sum(1)/(matrix!=0).sum(1).astype(float)
```

Sample run -

```
In [160]: matrix
Out[160]:
array([[0, 0, 1, 0, 2],
       [1, 0, 0, 2, 0],
       [0, 1, 1, 0, 0],
       [0, 2, 2, 2, 2]])
In [161]: np.true_divide(matrix.sum(1),(matrix!=0).sum(1))
Out[161]: array([ 1.5, 1.5, 1. , 2. ])
```
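One case the division above does not cover is a row that contains no nonzero values at all, where the count is zero. A minimal sketch of one way to handle that, assuming you want `NaN` for such rows (the helper name `nonzero_row_mean` and the all-zero third row are illustrative, not part of the original question):

```python
import numpy as np

def nonzero_row_mean(matrix):
    """Mean of the nonzero entries in each row; NaN for rows with no nonzeros."""
    matrix = np.asarray(matrix)
    sums = matrix.sum(axis=1, dtype=float)   # row sums as floats
    counts = (matrix != 0).sum(axis=1)       # number of nonzeros per row
    out = np.full(sums.shape, np.nan)        # default for all-zero rows
    # Divide only where the count is nonzero; other slots keep the NaN default.
    np.divide(sums, counts, out=out, where=counts != 0)
    return out

matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 0, 0, 0, 0],   # illustrative row with no nonzeros
                   [0, 2, 2, 2, 2]])
result = nonzero_row_mean(matrix)
print(result)  # values: 1.5, 1.5, nan, 2.0
```

Using `where=` together with a pre-filled `out` array avoids the divide-by-zero warning that a plain division would raise for the empty row.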

Another way to solve the problem would be to replace zeros with `NaNs` and then use `np.nanmean`, which would ignore those `NaNs` and in effect those original `zeros`, like so -

```
np.nanmean(np.where(matrix!=0,matrix,np.nan),1)
```
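For reference, here is the same idea as a self-contained snippet run on the sample matrix from the first approach, with the intermediate array named so the two steps are visible (the variable names `with_nans` and `row_means` are just illustrative):

```python
import numpy as np

matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 1, 1, 0, 0],
                   [0, 2, 2, 2, 2]])

# Step 1: copy the matrix, replacing zeros with NaN (the result is a float array).
with_nans = np.where(matrix != 0, matrix, np.nan)

# Step 2: average along each row, skipping the NaN entries.
row_means = np.nanmean(with_nans, axis=1)
print(row_means)  # values: 1.5, 1.5, 1.0, 2.0
```

Note that a row consisting only of zeros becomes a row of all `NaN`, for which `np.nanmean` returns `NaN` and emits a `RuntimeWarning`.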

From a performance point of view, I would recommend the first approach.

I will detail here the more general solution that uses a masked array. To illustrate the details, let's create a lower triangular matrix with only ones:

```
matrix = np.tril(np.ones((5, 5)), 0)
```

If the terminology above is not clear, this matrix looks like this:

```
[[ 1., 0., 0., 0., 0.],
 [ 1., 1., 0., 0., 0.],
 [ 1., 1., 1., 0., 0.],
 [ 1., 1., 1., 1., 0.],
 [ 1., 1., 1., 1., 1.]]
```

Now, we want our function to return an average of 1 for each of the rows. In other words, the mean over axis 1 should equal a vector of five ones. To achieve this, we create a masked matrix **where the entries whose values are zero are considered invalid**. This can be achieved with `np.ma.masked_equal`:

```
masked = np.ma.masked_equal(matrix, 0)
```

Finally, we perform NumPy operations on this array, which will systematically ignore the masked elements (the 0's). With this in mind, we obtain the desired result with:

```
masked.mean(axis=1)
```

This should produce a vector whose entries are only ones.

In more detail, the output of `np.ma.masked_equal(matrix, 0)` should look like this:

```
masked_array(data =
[[1.0 -- -- -- --]
[1.0 1.0 -- -- --]
[1.0 1.0 1.0 -- --]
[1.0 1.0 1.0 1.0 --]
[1.0 1.0 1.0 1.0 1.0]],
mask =
[[False True True True True]
[False False True True True]
[False False False True True]
[False False False False True]
[False False False False False]],
fill_value = 0.0)
```

This indicates that the values shown as `--` are considered invalid. This is also shown in the mask attribute of the masked array as True, **which indicates that IT IS an invalid element** and therefore should be ignored.

Finally, the output of the mean operation on this array should be:

```
masked_array(data = [1.0 1.0 1.0 1.0 1.0],
mask = [False False False False False],
fill_value = 1e+20)
```
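Since the result is itself a masked array, it may help to see how to turn it back into a plain `ndarray`. The sketch below puts the masked-array steps together and, as an assumption for illustration only, zeroes out the first row so that one row has no valid entries; the choice of `NaN` as the fill value is likewise just one option:

```python
import numpy as np

matrix = np.tril(np.ones((5, 5)), 0)
matrix[0, 0] = 0  # illustrative change: the first row is now all zeros

# Mask every entry equal to 0 so it is ignored by later reductions.
masked = np.ma.masked_equal(matrix, 0)

# Row-wise mean over the unmasked entries only.
row_means = masked.mean(axis=1)

# A row with no valid entries comes back masked; fill it with NaN
# to obtain an ordinary ndarray.
plain = row_means.filled(np.nan)
print(plain)  # values: nan, 1.0, 1.0, 1.0, 1.0
```

A fully masked row yields a masked entry in the result instead of a division-by-zero warning, which is one reason the masked-array approach generalizes well.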