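I am combining all the files in a directory into one, but some files have a different number of entries. How can I put NaN where a file has no value for a given key?
For example: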
file1.cs
NUM, NAME, ORG, DATA
1,AAA,10,123.4
1,AAB,20,176.5
1,AAC,30,133.5
file2.cs
NUM, NAME, ORG, DATA
1,AAA,10,111.4
1,AAC,30,122.5
2,BBA,12,156.7
Desired output
NUM, NAME, ORG, File1, File2 ....
1, AAA, 10, 123.4, 111.4
1, AAB, 20, 176.5, NaN
1, AAC, 30, 133.5, 122.5
2, BBA, 12, NaN, 156.7
.....
This is what I tried:
import pandas as pd
import glob

writer = pd.ExcelWriter('analysis.xlsx', engine='xlsxwriter')
data = []
df1 = pd.read_csv("file1.cs", sep = ',', header = 'infer')
for infile in glob.glob("*.cs"):
    df = pd.read_csv(infile, sep = ',', header = 'infer')
    name = infile[13:-7]
    df['filename'] = name
    data.append(df)
result = pd.merge(df1, data.to_frame(), on= 'NAME')
result.to_excel(writer, sheet_name=sheetname)
writer.save()
I also tried pd.concat(data, axis=1, ignore_index=False), but that does not add NaN, because it just joins the files by column name.
Best answer
Use merge with the how parameter set to 'outer':
df1.merge(df2, on=['NUM','NAME','ORG'], how='outer')
Output:
NUM NAME ORG DATA_x DATA_y
0 1 AAA 10 123.4 111.4
1 1 AAB 20 176.5 NaN
2 1 AAC 30 133.5 122.5
3 2 BBA 12 NaN 156.7
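The result above can be reproduced without any files on disk. The following is a minimal sketch, assuming the two sample files are supplied as in-memory strings (io.StringIO stands in for file1.cs and file2.cs). Two things here are additions that the answer does not use: skipinitialspace=True, which is needed because the sample header has a space after each comma (otherwise the columns would be named ' NAME', ' ORG', and so on, and the merge keys would not be found), and indicator=True, which adds a _merge column showing which file each row came from.

```python
import io

import pandas as pd

# Sample data copied from the question; StringIO stands in for the real files.
FILE1 = """NUM, NAME, ORG, DATA
1,AAA,10,123.4
1,AAB,20,176.5
1,AAC,30,133.5
"""
FILE2 = """NUM, NAME, ORG, DATA
1,AAA,10,111.4
1,AAC,30,122.5
2,BBA,12,156.7
"""

# skipinitialspace=True strips the blank after each comma in the header,
# so the columns are named 'NAME', not ' NAME'.
df1 = pd.read_csv(io.StringIO(FILE1), skipinitialspace=True)
df2 = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True)

# An outer merge keeps every key found in either frame; the side that
# lacks a key gets NaN in its DATA column.
merged = df1.merge(df2, on=["NUM", "NAME", "ORG"], how="outer", indicator=True)
print(merged)
```

Rows marked left_only or right_only in the _merge column are exactly the ones that receive a NaN.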
To get your exact output, use:
df1.rename(columns={'DATA':'FILE'})\
   .merge(df2.rename(columns={'DATA':'FILE'}),
          on=['NUM','NAME','ORG'],
          how='outer',
          suffixes=('1','2'))
Output:
NUM NAME ORG FILE1 FILE2
0 1 AAA 10 123.4 111.4
1 1 AAB 20 176.5 NaN
2 1 AAC 30 133.5 122.5
3 2 BBA 12 NaN 156.7
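The question involves a whole directory of files, not just two. One way to extend the same idea is to rename each frame's DATA column to a per-file label and then fold the frames together with repeated outer merges. This is only a sketch: the helper name merge_all, the KEYS constant, and the labels File1 and File2 are hypothetical, and the sample frames are built inline from the question's data instead of being read from disk.

```python
from functools import reduce

import pandas as pd

KEYS = ["NUM", "NAME", "ORG"]  # columns that identify a row across files


def merge_all(frames):
    """Outer-merge a dict of {label: DataFrame} on KEYS (hypothetical helper).

    Each frame's DATA column is renamed to its label first, so no
    _x/_y suffixes are needed and every file gets its own column.
    """
    renamed = [df.rename(columns={"DATA": label}) for label, df in frames.items()]
    return reduce(lambda left, right: left.merge(right, on=KEYS, how="outer"), renamed)


# With real files, the dict could be built along these lines (assumption):
#   frames = {path: pd.read_csv(path, skipinitialspace=True)
#             for path in glob.glob("*.cs")}
frames = {
    "File1": pd.DataFrame({
        "NUM": [1, 1, 1],
        "NAME": ["AAA", "AAB", "AAC"],
        "ORG": [10, 20, 30],
        "DATA": [123.4, 176.5, 133.5],
    }),
    "File2": pd.DataFrame({
        "NUM": [1, 1, 2],
        "NAME": ["AAA", "AAC", "BBA"],
        "ORG": [10, 30, 12],
        "DATA": [111.4, 122.5, 156.7],
    }),
}

result = merge_all(frames)
print(result)
```

Because each merge is an outer join, a key that appears in only some of the files still gets a row, with NaN in the columns of the files that lack it.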
When reposting, please credit the original: python – Pandas – adding NaN for missing values in pd.merge - 乐贴网