Programming languages should have a tree traversal primitive
There should be a control flow construct in programming languages that can handle tree-like traversal in a nice way, similar to how for/foreach loops can handle linear traversal. It's a bit of a missing gap in the current set of control flow constructs most languages these days have settled on. Its a thing I end up having to do *all the time* and it seems like there should be some shortcuts for it.
I posted a thought about this recently and was thinking about how I would want something like that to look like. I can't find anything similar in any other languages. I'm most familiar with C++, so my sample code here is going to be C++-ish, but its general enough that you could fit it into basically anything else. I'm not a language design expert, so this is more of a loose idea than a formal proposal.
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
print(N->value);
}
with the specifics removed, this is *almost* the same syntax as an existing for loop
for_tree(init; condition; branch){
//body
}
It's basically a for loop that branches instead of continuing linearly.
"init" here runs when you enter the for_tree loop, exactly the same as a normal for loop
"condition" must be met to enter the body of the loop
"branch" evolves the initial value into multiple branches (it would have to compile down into recursive function calls here)
"Why not just use recursive functions"
Well, this is significantly easier and less error prone than implementing recursive functions for every operation you'd want to do on every type of tree, plus all the relevant code ends up in-place and readable. This would likely compile down to the same code that a recursive function would anyway. It's just a nice shortcut, the same way for is a nice shortcut for gotos and labels.
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
print(N->value);
}
would compile down into the equivalent of
void for_tree_internal(Node* N){
print(N->value);
for(Node* n2 : {N->left, N->right}){
if(n2 != NULL){
for_tree_internal(n2);
}
}
}
for_tree_internal(mytreeroot);
Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
Additionally, for_tree would offer a few more benefits that you couldn't easily do with a recursive function, for instance being able to use break, continue, and return from within the function body. Those would behave similarly to how a for loop does.
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
if(N->value == 10) {
found = true;
break; //would break out of the entire for_tree block,
//unwinding the stack in the process
}
}
There is an additional flow construct that could be added here, "prune", which would prevent the loop from traversing into the branches off of the current node.
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
if(N->value == 10) {
prune; //dont need to check any children of this node
}
}
"Why not just use a range based for loop?"
Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
}
This is an operation that requires "tree-like traversal" but is not iterating over a data structure that exists in memory. You can do that entire thing inside for_tree. It's entirely agnostic to any underlying data structures you are or aren't using. The "prune" flow described earlier is also something you can't get from a range based for loop.
There is a bit of a footgun here, which is that in that above string example its actually generating and rejecting all of the strings of length 9. A normal for loop *also* reaches the state after its last valid state before exiting, its just kind of not that much of an issue with a linear for loop where that will usually be like, one extra addition and conditional check. On the other hand, "one level deeper" in a tree traversal means checking the validity of multiple times as many leaf nodes as you need to.
You could resolve this manually in multiple ways, generate an empty branch list if you are at a leaf already
for_tree(string x = ""; x.size() <= 8;
x : x.size() < 8 ? {x+"a", x+"b", x+"c"} : {}){
print(x);
}
or just prune in the function body
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
if(x.size() == 8) prune;
}
There may be a better way to resolve that, but would have to pull the syntax further away from a for loop I think. Anyway.
"What if you wanted to traverse breadth-first instead of depth-first"
So, a depth first search is pretty simple, uses minimal extra memory, and works great with the stack that we are already using. BFS requires a lot of extra memory (enough that it would likely need to use the heap) and is more complex. If you really wanted to you could do an alternate function like for_tree_breadth_first(...), but I think the extra complexity needed for a BFS might be too much for a “primitive” construct.
Anyway, as a bonus here's a test C++ implementation that tries to get as close to this as possible with a bunch of templates and macros. The usage is uglier, and a few constructs do not work (ex, you cant return from within a for_tree here), but its kind of a neat proof of concept I think.
#include <vector> | |
#include <string> | |
#include <iostream> | |
enum class _tree_return { | |
Continue = 0, | |
Prune, | |
Break | |
}; | |
template<typename T, typename F1, typename F2, typename F3> | |
_tree_return _for_tree(T initial, F1 condition, F2 branch, F3 visit) { | |
_tree_return result = visit(initial); | |
if(result == _tree_return::Break) return _tree_return::Break; | |
if(result != _tree_return::Prune) { | |
for(T subnode : branch(initial)) { | |
if(condition(subnode)) { | |
_tree_return result = _for_tree(subnode, condition, branch, visit); | |
if(result == _tree_return::Break) return _tree_return::Break; | |
} | |
} | |
} | |
return _tree_return::Continue; | |
} | |
#define tree_break return _tree_return::Break | |
#define tree_prune return _tree_return::Prune | |
#define tree_continue return _tree_return::Continue | |
//v-- semicolon to not allow you to get the return value here | |
#define for_tree(XName, Xinitial, Condition, Branch, Visit) ;_for_tree(Xinitial, \ | |
[&](decltype(Xinitial) XName){ return Condition; }, \ | |
[&](decltype(Xinitial) XName){ return std::vector<decltype(Xinitial)>Branch; }, \ | |
[&](decltype(Xinitial) XName){ Visit; return _tree_return::Continue; }) | |
//excuse the use of a std::vector in there, I guess you cant return an initialize_list from a lambda | |
//that wouldn't really be an issue if this was implemented at the language level instead of hacked together from lambdas and macros | |
struct Node { | |
Node* left = NULL; | |
Node* right = NULL; | |
std::string value; | |
Node(std::string value):value(value){} | |
}; | |
int main() { | |
//syntax is a little uglier than it could be if it was native | |
//imperative tree sample | |
for_tree(x, std::string(""), x.size()<=8, ({x+"a", x+"b", x+"c"}), { | |
std::cout << x << std::endl; | |
}); | |
//tree structure sample | |
Node mytree("root"); | |
mytree.left = new Node("left"); | |
mytree.right = new Node("right"); | |
mytree.left->left = new Node("leftleft"); | |
for_tree(x, &mytree, x != NULL, ({x->left, x->right}), { | |
std::cout << x->value << std::endl; | |
}); | |
return 0; | |
} |