如何使用递归记录父子层次结构中的所有路由?

人气:774 发布:2022-10-16 标签: recursion python hierarchy pandas dataframe

问题描述

我正在尝试浏览分层数据帧,并将每条可能的路由记录到另一个数据帧中。这些路线的深度可以可变。

原始数据帧(DF)。最高列表示父列中的值不是任何:

的子值 父级 子项 最高 a b 1 b c 0 b d 0 d e 0

最终目标数据帧:

级别3 级别2 级别1 级别0 a b c a b d e

这就是我目前拥有的

def search(parent):
    for i in range(df.shape[0]):
        if(df.iloc[i,0] == parent):
            search(df.iloc[i,1])

for i in range(df.shape[0]):
    if(df.iloc[i,2] == 1):
        search(df.iloc[i,0])

我可以浏览层次结构,但我不知道如何将其保存为所需的格式。

推荐答案

可以使用networkx来解决。注如果使用networkx,则不需要highest列。查找所有路径的主要函数是all_simple_paths

# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx

# Create network from your dataframe
#G = nx.from_pandas_edgelist(df, source='parent', target='child',
#                            create_using=nx.DiGraph)

# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
    G.add_edge(source, target)

# Find roots of your graph (a root is a node with no input)
roots = [node for node, degree in G.in_degree() if degree == 0]

# Find leaves of your graph (a leaf is a node with no output)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all paths
paths = []
for root in roots:
  for leaf in leaves:
    for path in nx.all_simple_paths(G, root, leaf):
        paths.append(path)

# Create a new dataframe
out = pd.DataFrame(paths).fillna('')
out.columns = reversed(out.add_prefix('level ').columns)

输出:

>>> out
  level 3 level 2 level 1 level 0
0       a       b       c        
1       a       b       d       e

811