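python – Pandas – Adding NaN for missing values in pd.merge
I am joining all the files in a directory into one, but some files have a different number of entries. How can I put NaN where a file has no value for a given key?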

For example:

file1.cs

 NUM, NAME, ORG, DATA
 1,AAA,10,123.4
 1,AAB,20,176.5
 1,AAC,30,133.5

file2.cs

 NUM, NAME, ORG, DATA
 1,AAA,10,111.4
 1,AAC,30,122.5
 2,BBA,12,156.7

Desired output

 NUM, NAME, ORG, File1, File2 ....
 1, AAA, 10, 123.4, 111.4
 1, AAB, 20, 176.5, NaN
 1, AAC, 30, 133.5, 122.5
 2, BBA, 12, NaN,   156.7
 .....

This is what I tried:

import pandas as pd
import glob

writer = pd.ExcelWriter('analysis.xlsx', engine='xlsxwriter')
data = []
df1 = pd.read_csv("file1.cs", sep = ',', header = 'infer')    

for infile in glob.glob("*.cs"):
    df = pd.read_csv(infile, sep = ',', header = 'infer')
    name = infile[13:-7]
    df['filename'] = name
    data.append(df)
result = pd.merge(df1, data.to_frame(), on= 'NAME')
result.to_excel(writer, sheet_name=sheetname)
writer.save()

I also tried pd.concat(data, axis=1, ignore_index=False), but that does not add NaN, because it just joins the files by column name.

Best answer
Use merge with the how parameter set to 'outer':

df1.merge(df2, on=['NUM','NAME','ORG'], how='outer')

Output:

   NUM NAME  ORG  DATA_x  DATA_y
0    1  AAA   10   123.4   111.4
1    1  AAB   20   176.5     NaN
2    1  AAC   30   133.5   122.5
3    2  BBA   12     NaN   156.7

To get your exact output, use:

df1.rename(columns={'DATA':'FILE'})\
   .merge(df2.rename(columns={'DATA':'FILE'}), 
         on=['NUM','NAME','ORG'],
         how='outer', 
         suffixes=('1','2'))

Output:

   NUM NAME  ORG  FILE1  FILE2
0    1  AAA   10  123.4  111.4
1    1  AAB   20  176.5    NaN
2    1  AAC   30  133.5  122.5
3    2  BBA   12    NaN  156.7
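
The question is about every file in a directory, not just two, so the same outer merge can be folded over a list of frames. The following is a minimal sketch, not part of the original answer. It assumes every file has the NUM, NAME, ORG and DATA columns, that each key combination appears at most once per file, and that file base names are unique. The helper names load_renamed and merge_all are hypothetical. It also assumes the blank after each comma in the header line is really present in the files, which is why skipinitialspace=True is passed.

```python
import os
from functools import reduce

import pandas as pd

KEYS = ["NUM", "NAME", "ORG"]  # key columns assumed to exist in every file


def load_renamed(path):
    """Read one file and rename DATA to the file's base name (e.g. 'file1')."""
    # skipinitialspace=True strips the blank that follows each comma,
    # so the header ' NAME' is read as 'NAME'
    df = pd.read_csv(path, skipinitialspace=True)
    label = os.path.splitext(os.path.basename(path))[0]
    return df.rename(columns={"DATA": label})


def merge_all(paths):
    """Outer-merge all files on KEYS; keys missing from a file become NaN."""
    frames = [load_renamed(p) for p in sorted(paths)]
    if not frames:
        raise ValueError("no input files given")
    return reduce(
        lambda left, right: left.merge(right, on=KEYS, how="outer"),
        frames,
    )
```

Calling merge_all(glob.glob("*.cs")) on the two sample files should give one column per file (file1, file2) with NaN where a key is absent. Renaming DATA before merging avoids the _x/_y suffix clashes that would otherwise appear from the third file onward.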