Dropping DataFrame rows whose value in a given column is NaN

This post is part of a series in which I work through the pandas questions on Stack Overflow, ordered by vote count. Stop learning and you go stale~

Original question: How to drop row of Pandas DataFrame whose value in certain columns is NaN

As the title says. First, let's create some data:

import pandas as pd
import numpy as np

# The rows have different lengths; pandas pads the shorter ones with NaN.
input_rows = [[1, 2, 3], [2, 3, 4], [np.nan, 2, np.nan, 5], [np.nan, 5, 7]]
df = pd.DataFrame(input_rows, columns=['a', 'b', 'c', 'd'])

out:
     a  b    c    d
0  1.0  2  3.0  NaN
1  2.0  3  4.0  NaN
2  NaN  2  NaN  5.0
3  NaN  5  7.0  NaN

1) DataFrame.dropna

# Drop every row that contains at least one NaN
# (in the sample data that is every row, so the result is empty)
df.dropna()

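Parameters of dropna(axis=0, how='any', thresh=None, subset=None, inplace=False):

  • axis: defaults to 0, which drops rows; 1 or 'columns' drops columns instead.
  • how: 'any' drops a row/column that contains at least one NaN; 'all' drops it only when every value in it is NaN.
  • thresh: int. Keeps only the rows/columns that have at least that many non-NaN values; anything with fewer is dropped.
  • subset: list of labels on the other axis to check for NaN. When dropping rows, pass column names; only those columns decide whether a row is dropped.
  • inplace: if True, modifies the DataFrame in place and returns None instead of a new DataFrame.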
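To make these parameters concrete, here is a small sketch that rebuilds the sample frame and tries each option in turn. The result names (`only_a`, `not_all_nan`, `at_least_3`, `no_nan_cols`) are illustrative and do not come from the original question.

```python
import numpy as np
import pandas as pd

# Same values as the sample frame, with the missing cells written out as NaN.
df = pd.DataFrame(
    [[1, 2, 3, np.nan], [2, 3, 4, np.nan], [np.nan, 2, np.nan, 5], [np.nan, 5, 7, np.nan]],
    columns=['a', 'b', 'c', 'd'],
)

# subset: check only column 'a' for NaN (the case the question asks about).
only_a = df.dropna(subset=['a'])

# how='all': drop a row only if every value in it is NaN.
not_all_nan = df.dropna(how='all')

# thresh=3: keep rows that have at least 3 non-NaN values.
at_least_3 = df.dropna(thresh=3)

# axis=1: drop columns instead of rows.
no_nan_cols = df.dropna(axis=1)

print(only_a.index.tolist())
print(not_all_nan.index.tolist())
print(at_least_3.index.tolist())
print(no_nan_cols.columns.tolist())
```

On the sample data, `subset=['a']` and `thresh=3` both keep rows 0 and 1, `how='all'` keeps all four rows because no row is entirely NaN, and `axis=1` leaves only column `b`.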

2) pandas.notnull

df = df[pd.notnull(df['a'])]

3) pandas.isnull

df = df[~pd.isnull(df['d'])]
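Approaches 2 and 3 build the same boolean mask from opposite directions. The sketch below checks that on column `d`, and filters into a new variable (`kept`, an illustrative name) so that the original frame stays available for the other examples.

```python
import numpy as np
import pandas as pd

# Same values as the sample frame, with the missing cells written out as NaN.
df = pd.DataFrame(
    [[1, 2, 3, np.nan], [2, 3, 4, np.nan], [np.nan, 2, np.nan, 5], [np.nan, 5, 7, np.nan]],
    columns=['a', 'b', 'c', 'd'],
)

# True where 'd' holds a value, False where it is NaN.
mask_notnull = pd.notnull(df['d'])
# Negating isnull gives the same mask.
mask_not_isnull = ~pd.isnull(df['d'])

# Keep only the rows where the mask is True; df itself is left unchanged.
kept = df[mask_notnull]

print(mask_notnull.equals(mask_not_isnull))
print(kept.index.tolist())
```

Only row 2 has a value in `d`, so it is the only row left in `kept`.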

4) numpy.isnan

df = df[~np.isnan(df['a'])]
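One caveat with this approach: `np.isnan` is a numeric function, so it is only suitable for numeric columns such as `a`. The sketch below uses a made-up object-dtype Series (`text`, not part of the original data) to show the difference from `pd.isnull`.

```python
import numpy as np
import pandas as pd

# Hypothetical column of strings with one missing value.
text = pd.Series(['x', np.nan, 'z'], dtype=object)

# np.isnan cannot handle object arrays and raises TypeError.
try:
    np.isnan(text)
    isnan_worked = True
except TypeError:
    isnan_worked = False

# pd.isnull handles object columns without complaint.
null_mask = pd.isnull(text)

print(isnan_worked)
print(null_mask.tolist())
```

For columns that may hold strings or mixed types, `pd.isnull` / `pd.notnull` is the safer choice.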

5) query

df = df.query('a == a')
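The expression `a == a` looks like it is always true, but NaN is never equal to itself, so the comparison is False exactly on the rows where `a` is NaN. The sketch below compares the query result with an explicit mask; `by_query` and `by_mask` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same values as the sample frame, with the missing cells written out as NaN.
df = pd.DataFrame(
    [[1, 2, 3, np.nan], [2, 3, 4, np.nan], [np.nan, 2, np.nan, 5], [np.nan, 5, 7, np.nan]],
    columns=['a', 'b', 'c', 'd'],
)

# NaN compared with itself is False, which is what the query relies on.
nan_equals_itself = (np.nan == np.nan)

# Rows where 'a' equals itself, i.e. rows where 'a' is not NaN.
by_query = df.query('a == a')
# The same filter written as an explicit mask.
by_mask = df[df['a'].notna()]

print(nan_equals_itself)
print(by_query.equals(by_mask))
```

The trick is compact, but the mask version states the intent more directly, which may matter for readers who do not know the NaN comparison rule.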