Python Pandas计算特定值的出现次数

我试图找到某个值出现在一列中的次数.

我用data = pd.DataFrame.from_csv(‘data / DataSet2.csv’)制作了数据帧

现在我想找到某个列出现的次数.这是怎么做到的？

我以为是下面的,我在教育栏目中查看并计算时间？发生.

下面的代码显示我试图找到第9次出现的次数,错误是我运行代码时得到的

码

missing2 = df.education.value_counts()['9th']
print(missing2)

错误

KeyError: '9th'

最佳答案

您可以根据条件创建数据子集,然后使用 shape或len：

print df
  col1 education
0    a       9th
1    b       9th
2    c       8th

print df.education == '9th'
0     True
1     True
2    False
Name: education, dtype: bool

print df[df.education == '9th']
  col1 education
0    a       9th
1    b       9th

print df[df.education == '9th'].shape[0]
2
print len(df[df['education'] == '9th'])
2

性能很有趣,最快的解决方案是比较numpy数组和sum：

graph

码：

import perfplot, string
np.random.seed(123)


def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False, 
    xlabel='len(df)')

点击查看更多相关文章

转载注明原文：Python Pandas计算特定值的出现次数 - 乐贴网