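This article explains how to concatenate pandas.DataFrame and pandas.Series objects with the pandas.concat() function.
It covers the following topics.
- Basic usage of pandas.concat()
  - Specifying the objects to concatenate: objs
  - Specifying the direction (vertical/horizontal): axis
  - Specifying the join method (outer/inner): join
- Concatenating pandas.DataFrame objects
- Concatenating pandas.Series objects
- Concatenating a pandas.DataFrame with a pandas.Series
The following pandas.DataFrame and pandas.Series objects are used as examples.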
    import pandas as pd

    df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                        'B': ['B1', 'B2', 'B3'],
                        'C': ['C1', 'C2', 'C3']},
                       index=['ONE', 'TWO', 'THREE'])
    print(df1)
    #         A   B   C
    # ONE    A1  B1  C1
    # TWO    A2  B2  C2
    # THREE  A3  B3  C3

    df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
                        'D': ['D2', 'D3', 'D4']},
                       index=['TWO', 'THREE', 'FOUR'])
    print(df2)
    #         C   D
    # TWO    C2  D2
    # THREE  C3  D3
    # FOUR   C4  D4

    s1 = pd.Series(['X1', 'X2', 'X3'], index=['ONE', 'TWO', 'THREE'], name='X')
    print(s1)
    # ONE      X1
    # TWO      X2
    # THREE    X3
    # Name: X, dtype: object

    s2 = pd.Series(['Y2', 'Y3', 'Y4'], index=['TWO', 'THREE', 'FOUR'], name='Y')
    print(s2)
    # TWO      Y2
    # THREE    Y3
    # FOUR     Y4
    # Name: Y, dtype: object
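Basic usage of pandas.concat()
Specifying the objects to concatenate: objs
The pandas.DataFrame and pandas.Series objects to concatenate are passed as the first argument, objs, in a list or a tuple.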
    df_concat = pd.concat([df1, df2])
    print(df_concat)
    #          A    B   C    D
    # ONE     A1   B1  C1  NaN
    # TWO     A2   B2  C2  NaN
    # THREE   A3   B3  C3  NaN
    # TWO    NaN  NaN  C2   D2
    # THREE  NaN  NaN  C3   D3
    # FOUR   NaN  NaN  C4   D4
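As a quick check of the objs rule, the sketch below passes the same two frames once as a list and once as a tuple, then passes a bare DataFrame to show that a sequence is expected. The frames are rebuilt so the snippet runs on its own, and the names from_list, from_tuple, same_result and raised are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                    'B': ['B1', 'B2', 'B3'],
                    'C': ['C1', 'C2', 'C3']},
                   index=['ONE', 'TWO', 'THREE'])
df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
                    'D': ['D2', 'D3', 'D4']},
                   index=['TWO', 'THREE', 'FOUR'])

# A tuple is accepted in the same way as a list.
from_list = pd.concat([df1, df2])
from_tuple = pd.concat((df1, df2))
same_result = from_list.equals(from_tuple)
print(same_result)  # True

# A single DataFrame that is not wrapped in a sequence is rejected.
try:
    pd.concat(df1)
    raised = False
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

If only one object is available, wrapping it in a list, as in pd.concat([df1]), avoids the error.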
The number of objects is not limited to two; three or more can be concatenated at once.
    df_concat_multi = pd.concat([df1, df2, df1])
    print(df_concat_multi)
    #          A    B   C    D
    # ONE     A1   B1  C1  NaN
    # TWO     A2   B2  C2  NaN
    # THREE   A3   B3  C3  NaN
    # TWO    NaN  NaN  C2   D2
    # THREE  NaN  NaN  C3   D3
    # FOUR   NaN  NaN  C4   D4
    # ONE     A1   B1  C1  NaN
    # TWO     A2   B2  C2  NaN
    # THREE   A3   B3  C3  NaN
The result is a new object; the original objects are left unchanged.
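The claim that the inputs are left untouched can be checked directly. The following is a minimal sketch, assuming the same example frames as above; df1_before is a hypothetical helper copy that exists only for the comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                    'B': ['B1', 'B2', 'B3'],
                    'C': ['C1', 'C2', 'C3']},
                   index=['ONE', 'TWO', 'THREE'])
df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
                    'D': ['D2', 'D3', 'D4']},
                   index=['TWO', 'THREE', 'FOUR'])

df1_before = df1.copy()  # snapshot used only for the comparison below
df_concat = pd.concat([df1, df2])

# The result is a separate object with its own shape.
print(df_concat is df1)             # False
print(df1.shape, df_concat.shape)   # (3, 3) (6, 4)

# Changing the result does not write back into the input.
df_concat.loc['ONE', 'A'] = 'changed'
print(df1.equals(df1_before))       # True
```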
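Specifying the direction (vertical/horizontal): axis
The direction is specified with the axis argument. With axis=0 the objects are stacked vertically. This is the default, so it can be omitted.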
    df_v = pd.concat([df1, df2], axis=0)
    print(df_v)
    #          A    B   C    D
    # ONE     A1   B1  C1  NaN
    # TWO     A2   B2  C2  NaN
    # THREE   A3   B3  C3  NaN
    # TWO    NaN  NaN  C2   D2
    # THREE  NaN  NaN  C3   D3
    # FOUR   NaN  NaN  C4   D4
With axis=1 the objects are joined horizontally, side by side.
    df_h = pd.concat([df1, df2], axis=1)
    print(df_h)
    #          A    B    C    C    D
    # ONE     A1   B1   C1  NaN  NaN
    # TWO     A2   B2   C2   C2   D2
    # THREE   A3   B3   C3   C3   D3
    # FOUR   NaN  NaN  NaN   C4   D4
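The horizontal result above contains two columns that are both named C, because both inputs have a column with that name. The sketch below, using the same example frames, shows what that means when selecting by label, and that the string 'columns' can be used in place of axis=1. The variable names c_cols and df_h_named are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                    'B': ['B1', 'B2', 'B3'],
                    'C': ['C1', 'C2', 'C3']},
                   index=['ONE', 'TWO', 'THREE'])
df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
                    'D': ['D2', 'D3', 'D4']},
                   index=['TWO', 'THREE', 'FOUR'])

df_h = pd.concat([df1, df2], axis=1)
print(list(df_h.columns))  # ['A', 'B', 'C', 'C', 'D']

# Selecting the duplicated label returns both columns as a DataFrame,
# not a single Series.
c_cols = df_h['C']
print(type(c_cols).__name__, c_cols.shape)  # DataFrame (4, 2)

# axis='columns' is an alias for axis=1.
df_h_named = pd.concat([df1, df2], axis='columns')
print(df_h.equals(df_h_named))  # True
```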
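Specifying the join method (outer/inner): join
The join argument specifies whether to keep the union of the column names (or row names) or only the names the objects have in common.
join='outer' is an outer join. The column names (or row names) are combined as a union, so all columns (or rows) are kept. This is the default, so it can be omitted. Values for columns (or rows) that do not exist in an original object are filled with the missing value NaN.
join='inner' is an inner join. Only the columns (or rows) whose names appear in all objects are kept.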
    df_v_out = pd.concat([df1, df2], join='outer')
    print(df_v_out)
    #          A    B   C    D
    # ONE     A1   B1  C1  NaN
    # TWO     A2   B2  C2  NaN
    # THREE   A3   B3  C3  NaN
    # TWO    NaN  NaN  C2   D2
    # THREE  NaN  NaN  C3   D3
    # FOUR   NaN  NaN  C4   D4

    df_v_in = pd.concat([df1, df2], join='inner')
    print(df_v_in)
    #         C
    # ONE    C1
    # TWO    C2
    # THREE  C3
    # TWO    C2
    # THREE  C3
    # FOUR   C4
The same applies in the horizontal direction, where join acts on the row names.

    df_h_out = pd.concat([df1, df2], axis=1, join='outer')
    print(df_h_out)
    # (older pandas versions listed these rows sorted by label)
    #          A    B    C    C    D
    # ONE     A1   B1   C1  NaN  NaN
    # TWO     A2   B2   C2   C2   D2
    # THREE   A3   B3   C3   C3   D3
    # FOUR   NaN  NaN  NaN   C4   D4

    df_h_in = pd.concat([df1, df2], axis=1, join='inner')
    print(df_h_in)
    #         A   B   C   C   D
    # TWO    A2  B2  C2  C2  D2
    # THREE  A3  B3  C3  C3  D3
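One way to read the join argument is as a set operation on the labels of the axis that is not being concatenated. The sketch below, assuming the same example frames, compares the resulting labels with plain Python set union and intersection. The labels are sorted before printing so the output does not depend on the order pandas returns them in.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                    'B': ['B1', 'B2', 'B3'],
                    'C': ['C1', 'C2', 'C3']},
                   index=['ONE', 'TWO', 'THREE'])
df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
                    'D': ['D2', 'D3', 'D4']},
                   index=['TWO', 'THREE', 'FOUR'])

# Vertical concatenation: join acts on the column names.
outer_cols = set(pd.concat([df1, df2], join='outer').columns)
inner_cols = set(pd.concat([df1, df2], join='inner').columns)
print(sorted(outer_cols))  # ['A', 'B', 'C', 'D']
print(sorted(inner_cols))  # ['C']

# Horizontal concatenation: join acts on the row names.
outer_rows = set(pd.concat([df1, df2], axis=1, join='outer').index)
inner_rows = set(pd.concat([df1, df2], axis=1, join='inner').index)
print(sorted(outer_rows))  # ['FOUR', 'ONE', 'THREE', 'TWO']
print(sorted(inner_rows))  # ['THREE', 'TWO']

# The same sets computed directly from the inputs.
print(outer_cols == set(df1.columns) | set(df2.columns))  # True
print(inner_rows == set(df1.index) & set(df2.index))      # True
```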
To change column or row names, see pandas.DataFrame.rename().
Concatenating pandas.DataFrame objects
Concatenating pandas.DataFrame objects with each other returns a pandas.DataFrame.
    df_concat = pd.concat([df1, df2])
    print(df_concat)
    #          A    B   C    D
    # ONE     A1   B1  C1  NaN
    # TWO     A2   B2  C2  NaN
    # THREE   A3   B3  C3  NaN
    # TWO    NaN  NaN  C2   D2
    # THREE  NaN  NaN  C3   D3
    # FOUR   NaN  NaN  C4   D4

    print(type(df_concat))
    # <class 'pandas.core.frame.DataFrame'>
Concatenating pandas.Series objects
When pandas.Series objects are concatenated with each other vertically (the default, axis=0), the result is also a pandas.Series.
    s_v = pd.concat([s1, s2])
    print(s_v)
    # ONE      X1
    # TWO      X2
    # THREE    X3
    # TWO      Y2
    # THREE    Y3
    # FOUR     Y4
    # dtype: object

    print(type(s_v))
    # <class 'pandas.core.series.Series'>
With axis=1 the Series are joined horizontally and a pandas.DataFrame is returned. Each Series name becomes a column name.
    s_h = pd.concat([s1, s2], axis=1)
    print(s_h)
    #          X    Y
    # ONE     X1  NaN
    # TWO     X2   Y2
    # THREE   X3   Y3
    # FOUR   NaN   Y4

    print(type(s_h))
    # <class 'pandas.core.frame.DataFrame'>
The join argument can be used here as well.
    s_h_in = pd.concat([s1, s2], axis=1, join='inner')
    print(s_h_in)
    #         X   Y
    # TWO    X2  Y2
    # THREE  X3  Y3
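The difference between the two directions for Series can be summarized in a few checks. This is a minimal sketch using the same s1 and s2 as above. It also shows that the vertical result keeps repeated index labels, so selecting a label such as 'TWO' returns more than one value.

```python
import pandas as pd

s1 = pd.Series(['X1', 'X2', 'X3'], index=['ONE', 'TWO', 'THREE'], name='X')
s2 = pd.Series(['Y2', 'Y3', 'Y4'], index=['TWO', 'THREE', 'FOUR'], name='Y')

# Vertical: the result is a Series. The input names differ,
# so the result carries no name.
s_v = pd.concat([s1, s2])
print(type(s_v).__name__, len(s_v), s_v.name)  # Series 6 None

# Index labels are not deduplicated, so 'TWO' matches two values.
print(s_v.loc['TWO'].tolist())  # ['X2', 'Y2']

# Horizontal: the result is a DataFrame whose columns are the Series names.
s_h = pd.concat([s1, s2], axis=1)
print(type(s_h).__name__, list(s_h.columns))  # DataFrame ['X', 'Y']
```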
Concatenating a pandas.DataFrame with a pandas.Series
When a pandas.DataFrame and a pandas.Series are concatenated horizontally (axis=1), the Series is added as a new column. The column name is the name of the Series.
    df_s_h = pd.concat([df1, s2], axis=1)
    print(df_s_h)
    #          A    B    C    Y
    # ONE     A1   B1   C1  NaN
    # TWO     A2   B2   C2   Y2
    # THREE   A3   B3   C3   Y3
    # FOUR   NaN  NaN  NaN   Y4
The join argument can be used here as well.
    df_s_h_in = pd.concat([df1, s2], axis=1, join='inner')
    print(df_s_h_in)
    #         A   B   C   Y
    # TWO    A2  B2  C2  Y2
    # THREE  A3  B3  C3  Y3
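Since the new column takes its label from the Series name, it can help to see what happens when the name is missing or changed. The sketch below is an illustration under the same example data. The unnamed Series and the label 'Z' are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                    'B': ['B1', 'B2', 'B3'],
                    'C': ['C1', 'C2', 'C3']},
                   index=['ONE', 'TWO', 'THREE'])
s2 = pd.Series(['Y2', 'Y3', 'Y4'], index=['TWO', 'THREE', 'FOUR'], name='Y')

# A named Series contributes a column labelled with its name.
named = pd.concat([df1, s2], axis=1)
print(list(named.columns))  # ['A', 'B', 'C', 'Y']

# An unnamed Series (hypothetical here) is given a numeric label instead.
s_unnamed = pd.Series(['Y2', 'Y3', 'Y4'], index=['TWO', 'THREE', 'FOUR'])
unnamed = pd.concat([df1, s_unnamed], axis=1)
print(list(unnamed.columns))  # ['A', 'B', 'C', 0]

# Renaming the Series first gives the column a label of your choice.
renamed = pd.concat([df1, s2.rename('Z')], axis=1)
print(list(renamed.columns))  # ['A', 'B', 'C', 'Z']
```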
With vertical concatenation (axis=0), the Series is not added as a row. Its values are placed in a new column, labelled 0 in this example.

    df_s_v = pd.concat([df1, s2])
    print(df_s_v)
    #          A    B    C    0
    # ONE     A1   B1   C1  NaN
    # TWO     A2   B2   C2  NaN
    # THREE   A3   B3   C3  NaN
    # TWO    NaN  NaN  NaN   Y2
    # THREE  NaN  NaN  NaN   Y3
    # FOUR   NaN  NaN  NaN   Y4
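To add a row, either assign values to a new row name with .loc, or convert a pandas.Series into a one-row pandas.DataFrame and pass it to pd.concat(). The DataFrame.append() method that older code used for this was deprecated in pandas 1.4 and removed in pandas 2.0.

    # Assigning to a new label with .loc modifies df1 in place.
    df1.loc['FOUR'] = ['A4', 'B4', 'C4']
    print(df1)
    #         A   B   C
    # ONE    A1  B1  C1
    # TWO    A2  B2  C2
    # THREE  A3  B3  C3
    # FOUR   A4  B4  C4

    s = pd.Series(['A5', 'B5', 'C5'], index=df1.columns, name='FIVE')
    print(s)
    # A    A5
    # B    B5
    # C    C5
    # Name: FIVE, dtype: object

    # to_frame().T turns the Series into a single row labelled 'FIVE'.
    # (Replaces df1.append(s), which no longer exists in pandas 2.0 and later.)
    df_append = pd.concat([df1, s.to_frame().T])
    print(df_append)
    #         A   B   C
    # ONE    A1  B1  C1
    # TWO    A2  B2  C2
    # THREE  A3  B3  C3
    # FOUR   A4  B4  C4
    # FIVE   A5  B5  C5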
Original article: https://blog.csdn.net/qq_18351157/article/details/104557778