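Problem description
I am trying to walk through a hierarchical DataFrame and record every possible route in another DataFrame. The routes can have variable depth. The original DataFrame (df) is shown below; a 1 in the highest column means that the value in the parent column is not a child of any other value:

parent  child  highest
a       b      1
b       c      0
b       d      0
d       e      0

The target DataFrame:

level 3  level 2  level 1  level 0
a        b        c
a        b        d        e

This is what I have so far: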
def search(parent):
    for i in range(df.shape[0]):
        if(df.iloc[i,0] == parent):
            search(df.iloc[i,1])

for i in range(df.shape[0]):
    if(df.iloc[i,2] == 1):
        search(df.iloc[i,0])
I can walk through the hierarchy, but I don't know how to save the result in the desired format.
Recommended answer
You can solve this with networkx. Note that if you use networkx, you don't need the highest column. The main function for finding all paths is all_simple_paths:
# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
import pandas as pd

# Create network from your dataframe
# G = nx.from_pandas_edgelist(df, source='parent', target='child',
#                             create_using=nx.DiGraph)

# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
    G.add_edge(source, target)

# Find roots of your graph (a root is a node with no input)
roots = [node for node, degree in G.in_degree() if degree == 0]

# Find leaves of your graph (a leaf is a node with no output)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all paths
paths = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(G, root, leaf):
            paths.append(path)

# Create a new dataframe; shorter paths are padded with empty strings
out = pd.DataFrame(paths).fillna('')
# Label the columns 'level N' ... 'level 0' from left to right (root first)
out.columns = reversed(out.add_prefix('level ').columns)
Output:
>>> out
  level 3 level 2 level 1 level 0
0       a       b       c
1       a       b       d       e
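
For comparison, the same root-to-leaf enumeration can be sketched without networkx, in the spirit of the recursive search from the question. This is only a minimal sketch and not part of the original answer: the function name all_routes and the plain list of (parent, child) pairs are assumptions made for illustration, and the code assumes the hierarchy contains no cycles.

```python
from collections import defaultdict


def all_routes(edges):
    """Return every root-to-leaf path for an iterable of (parent, child) pairs.

    Hypothetical helper for illustration; assumes the hierarchy is acyclic.
    """
    children = defaultdict(list)  # parent -> list of children
    has_parent = set()            # nodes that appear as a child somewhere
    nodes, seen = [], set()       # nodes in first-seen order, for stable output

    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)
        for node in (parent, child):
            if node not in seen:
                seen.add(node)
                nodes.append(node)

    paths = []

    def walk(node, path):
        path = path + [node]        # new list, so sibling branches stay independent
        if not children.get(node):  # no children: this node is a leaf
            paths.append(path)
            return
        for child in children[node]:
            walk(child, path)

    # A root is a node that never appears as a child
    for node in nodes:
        if node not in has_parent:
            walk(node, [])
    return paths


# Edges taken from the question's example data
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
routes = all_routes(edges)
print(routes)  # [['a', 'b', 'c'], ['a', 'b', 'd', 'e']]
```

With the question's DataFrame, the pairs could be supplied as df[['parent', 'child']].itertuples(index=False), and the resulting list can be passed to pd.DataFrame(...).fillna('') in the same way as paths in the answer.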