33 Comments

Awesome article!

I've read around various reports where ChatGPT's actual logical thinking and problem-solving performance greatly increases when instructed to "think step by step and explain your reasoning", "describe your approach in great detail before tackling the problem" and such. Maybe that would allow the user to "inject" a step it missed in its reasoning.

I think it would be very interesting to prime it in such a way, correcting and adjusting its "train of thought" (so to speak) and maybe ask for a pseudocode implementation of the algorithm, before it even attempted to write a single line of actual code. I have no idea if it would matter, but from what I can recollect I've had better/more appropriate results when I tried to allow it for more prior planning instead of asking for code right away. Specifically, I noticed it tends to end in those "wrong solutions loop" (I've had those too) much less often. That was on ChatGPT3.5 though, I've no access to v4 ATM.

Expand full comment

Perhaps I'm missing a detail, but why is the pathfinding problem here not a textbook variation on Dijkstra? To quote WP: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Related_problems_and_algorithms

> The functionality of Dijkstra's original algorithm can be extended with a variety of modifications. For example, sometimes it is desirable to present solutions which are less than mathematically optimal. To obtain a ranked list of less-than-optimal solutions, the optimal solution is first calculated. A single edge appearing in the optimal solution is removed from the graph, and the optimum solution to this new graph is calculated. Each edge of the original solution is suppressed in turn and a new shortest-path calculated. The secondary solutions are then ranked and presented after the first optimal solution.

This would seem to cover your fire example and your different-requested-ranges example. You do Dijkstra to obtain the feasible path (taking into account movement penalties like water); then you calculate the next less optimal as described; you calculate the total damage of the feasible paths, sort by least damage to find the safest but still feasible path to show the player, and there you go - you get the safest feasible path first, then a modest amount of damage, then more damage, etc. This also covers the tradeoff between range and damage: if you target a nearby tile, then there will almost certainly be a safe feasible path, and an intermediate range may be able to avoid most damage, while if you target one at the maximum range, then you will just have to incur whatever damage happens along the way because that will be the only feasible path. This also naturally handles differently weighted damage if you have more hazards than just fire. Plus, it sounds much more straightforward and debuggable than whatever it is you have because it's factorized into first feasible movement, and then preference over movements, so you can see where any bugs come from: did the movement planning erroneously think a path was impossible, or did the preference misrank it?

> or if you think that it really could just be A* with a modified heuristic instead

I don't think that's equivalent to A* with a modified heuristic.

Expand full comment

I failed to see how this solves the fire problem, the logic seems to be disconnected here.

Nothing in what you said solve the problem of trying to not pass through some type of tile if possible.

Imagine you are away from the target by a wall of fire tile of size 2N+1, the tile in front of you is a fire and there're 2 half-wall of size N on each side. Then the feasible path is of length 2, the fire tile then the target.

Then how can you extend from that 2-path to 2N+3 path (i.e going around the fire from 1 side)? The list of less than optimal solutions described in the wiki page will be of size 3, while the actual solution is the 2N+3 path. Well, there's a way to fix this problem for sure, but it's just not with whatever you commented there.

Expand full comment

I don't follow you not following. The modified version will calculate the admissible paths, starting with the first one through the fire which is the minimal distance, then the increasing distance-paths which go around the fire, and you apply your penalty weights to discard the fire paths and leave just the round-about paths; penalties and ranking can use whatever utility function you have over their properties, including lexicographic ones like 'paths which go through fire are always inferior to paths which do not', which will then drop the first path through the fire and leave just the round-abouts. And if there are no non-fire paths, then none of them get dropped and you just get the shortest path through fire, which is the least-bad one.

Expand full comment

So let's say N is 1000, according to your reasoning, you will find all the sets of paths that have length from 2 to 1000 and sort them. It's a rather inefficient workaround.

Brian Damgaard has pointed out a straight forward way to find the optimal path.

Expand full comment

I only said that there was a textbook Dijkstra straight out of Wikipedia that solved his problem, not that it was extremely efficient for all possible problems one might have.

As for efficiency: that's not an issue for him if he had used it, because all of his paths are so short. His issue was correctness and implementation simplicity (both of which this algorithm has). But to discuss tangential issues like finding very long optimal paths - if N is large, then of course optimal pathfinding has to be expensive without baking in domain-specific knowledge. You don't know which path will be optimal under your utility function, which might be arbitrarily complex and weigh different hazards in different ways, or try to optimize for properties like maximizing health points at the end of the path etc, and there are a lot of possible paths. If that turns out to be a bottleneck when profiled, well, one can explore optimizations to that textbook algorithm, which doubtless exist.

---

Damgaard's solution is clever but as I understand it, he's augmenting the state-space with a binary feature, increasing the dimensionality, so what he gains in efficiency in this case he loses in generality (expanding dimensionality comes with its own costs, and it's hard to see how you would extend that approach to things like real-valued costs - which are totally plausible for movement costs in a game, incidentally). Also not sure I'd describe something taking so much Lisp as being 'straight forward' either.

Expand full comment

___

gwern wrote:

> he's {Brian Damgaard = me] augmenting the state-space [...]

> increasing the dimensionality, [...]

> loses in generality [...]

> and it's hard to see how you would extend that

> approach to things like real-valued costs [...]

This statement is wrong because it confuses graph node properties ("dimensions") and costs. These two things have nothing to do with each other.

One might even say that the Lisp demo-program operates with real-valued costs, except that the program scales the costs to integers for the sake of efficiency. Before the scaling, the penalty for making a turn is "1 / boardSize".

Like Mark also noted, it's not true that there is a loss of generality. The given task simply cannot be solved correctly without increasing the dimensionality to the 4-dimensional [column, row, fires, direction] space.

____

gwern wrote:

> Are you going to phone up Damgaard for every single game level,

> asking him how to encode into the graph-state the

> different costs of the water level, the fire level,

> the mountain level, the forest level, the space level...

It's unnecessary to phone me to do that. Most of these cost examples are simple properties of the board squares. They are independent of the search path. Consequently, they need only to be recorded as board square properties, and not as graph-node properties.

The demo-program already offers that type of functionality. Each individual board square has a "cost" field, in which different costs can be encoded. The board-parser function illustrates the principle by showing that cost = 2 for "water" squares.

____

qwern wrote:

> how, for example, would you modify it such that going through fire

> costs 100x more than moving through 1 space, or to include

> a multiplier like picking up a speed-boost item

> which halves the 1-space movement cost or a water penalty

> which doubles it?

In essence, this is merely a repetition of the previous question. The answer is that there are straightforward solutions for dealing with different square types and different items on the board. Many of them ("water", for instance) simply have other square costs than the empty floor squares.

____

qwern wrote:

> Much better to have a transparent utility function

> 'cost (node) { fire(node)*Inf + water(node)*0.5 + speed(node) + ... ; }

There are several significant things wrong with that statement.

1. "Much better" than what? The A* demo-program has a function which computes the cost of making a move this way (just with real weight factors instead of the example factors sketched in the quotation, of course)

2. It's necessary to distinguish between the cost of making a move and the cost of walking the path from the start position to the resulting graph node. Taken literally, the quoted text says "cost( node )", but it's really just a sketch of a move-cost computation. It's preferable to reserve the term "node cost" for the cost of reaching a node.

The node cost depends not only on the board square, but also on the number of already visited fire squares and the number of turns. The demo-program deals with it by storing information in the graph nodes about the number of visited fire squares and the last move direction.

3. The sketched cost function could not work. The cost of visiting a fire square should not be "infinity".

____

gwern wrote:

> Also not sure I'd describe something

> taking so much Lisp as being 'straight forward' either.

"...so much..."! The demo-program is only 107 code lines, including board input parsing and pretty output formatting, and with an A* search algorithm that always finds correct solutions for the given task. The demo-program is close to being as concise and short as it possibly can be.

Expand full comment

Hey, appreciate the articulate answer.

First, I disagree that Damgaard's solution is a trade-off in generality. Finding the space and cost function to apply to A* is the whole point of this problem.

More importantly, I now just realized that what you described (aka your interpretation of the textbook Dijkstra) will NOT solve the problem.

Simply imagine that there are multiple different optimal paths at length N. Without domain-specific knowledge, A* will just pick 1 of them, there's no way for you to tell if the one A* picks will be the correct best. So what ever it chooses, applying "Each edge of the original solution is suppressed in turn and a new shortest-path calculate" or whatever arbitrary rule that you have in mind will not get you the correct path. And that's why you cannot solve this problem without actually carefully crafting the cost function

Expand full comment

Of course you need domain knowledge. There is no a priori law telling you that going through fire is or is not acceptable in your video game.

The point of the text book Dijkstra is that you cleanly factor out that domain knowledge to your utility function, and the planning algorithm focuses on minimizing the loss of that blackbox. You simply write Dijkstra, write a separate utility function, and modify the function as necessary. You want to avoid all paths with fire lexicographically? You modify the sort/cost, not the Dijkstra or level. You want to add a bonus here or avoid a curve there? Modify the sort/cost, without modifying the Dijkstra or level. And so on.

You can see why this would be important in game development - there is no 'carefully crafted cost function' there, it's just baked into the state in a punning way, so you can't modify it easily. Are you going to phone up Damgaard for every single game level, asking him how to encode into the graph-state the different costs of the water level, the fire level, the mountain level, the forest level, the space level... - especially as you experiment with the gameplay to 'find the fun' and realize that you need a different cost here and there and everywhere? Much better to have a transparent utility function 'cost (node) { fire(node)*Inf + water(node)*0.5 + speed(node) + ... ; }' you can tweak and then it Just Works when put into Dijkstra's to sort the list of admissible paths, solving OP's problem cleanly.

Damgaard's solution entangles the lexicographic preference over going through fire into the representation of the state/DAG, which works but is going to be hard to modify to a non-binary non-lexicographic: how, for example, would you modify it such that going through fire costs 100x more than moving through 1 space, or to include a multiplier like picking up a speed-boost item which halves the 1-space movement cost or a water penalty which doubles it?

> Simply imagine that there are multiple different optimal paths at length N. Without domain-specific knowledge, A* will just pick 1 of them, there's no way for you to tell if the one A* picks will be the correct best.

If there are 'multiple different optimal paths', and A* picks one of them, it has by definition picked a correct best one and you are happy with it. If the one it picked was not 'the correct best', then it was not 'optimal', by definition. So this cannot be an objection.

(If you really did have a preference over these equally 'optimal' paths and were unhappy about the selection, then you failed to encode it into your utility function, and that's the problem, not Dijkstra. Obviously, it is not possible for any algorithm to pick the best one if you have not said what is best. Also, this provides a ranked list, so the user can override it to select the next one.)

Expand full comment

Nice article! Just FYI: How would you correctly go about this? I imagine you have to calculate a specific angle range where the circles are allowed to touch for the crescents to overlap?

Expand full comment

i havent tried this but i’d be curious to try to do it with signed distance functions…. but if it does work it seems a bit general/maybe computationally expensive? for a shape thats so easy to construct

Expand full comment

You need to focus on the intersections of the different circles.

When the outer circles don't intersect, there are obviously no collisions.

If the corners of one crescent are inside the other, that's the first collision case.

The next is when the outer arcs collide - that can be checked by looking at the intersection points between the outer circles - if at least one is in both crescents, there is also a collision.

The last case is when the inner arc of one crescent collides with the outer arc of the other. To test for this, check where the outer circle of one crescent intersects with the inner circle of the other. If these points are within both crescents, there is a collision. (Do this check with each crescent in "inner" and "outer" role, and optionally optimize by comparing curvature first).

If these tests don't return a collision, the crescents don't overlap.

Expand full comment

Thanks for a good article! The "cat and fire" pathfinding problem is interesting, and I'm pleased to tell that - contrary to what the article says - the problem is solvable with a normal A* search.

It's just a matter of noticing, the the search states (graph nodes, or whatever we call them) are not the [col, row] squares in the 2D plane. Instead, each search state is uniquely defined by the 3 properties [col, row, visited-fire-squares].

Furthermore, the cost function for each move adds the penalty (1 + depthLimit) to the cost if it's a fire square. This ensures that the cat walks over fire squares only if it's necessary to do so to meet the budget. The minimum number of fire squares are visited in that case.

That's all there is to it!

Tyler wrote:

> (or if you think that it really could just be A* with a modified heuristic instead) (and if you are going to try and “suggest” to me ways to improve it, then please actually implement and try it in context first

I enjoyed implementing it in my home-made Lisp programming language. The code listing follows below, but first, to show that the program works correctly, here are the results for the test level with 6 fire squares presented in the article. I added a "back corridor" to the test level to show that with no depth limit, the A* algorithm finds the longer path around the obstacles and avoids walking over any of the fire squares.

Legend:

s: start

g: goal

f: fire

w: water (not used)

#: wall

-: empty

Uppercase move: move to a fire square

Lowercase move: move to an empty square

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 5: Not solved (It's unsolvable with this depth limit)

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 6: rRRRRr

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 8: ruRrrDRr

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 10: ruurrdrDRr

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit +inf: ruurrrrrrrddll

Tyler wrote:

>[...] and know I won’t use it because what I have works fine

>already, has been *thoroughly* tested on thousands of

>different cases, and doesn’t have any performance issues either)

That's right; there is no reason to change running code that works. That said, perhaps it's interesting to see how the task can be solved in a minimum of code lines with a standard A* search algorithm.

______________

Code listing

I always enjoy writing miniature programs like this one. It's only ca. 110 lines, including input and output functions. The code is heavily commented in a style that makes it imperative to read it in a text editor with word wrapping disabled.

Even though it's not a standard Lisp dialect, it should be easy enough to get the gist of what the program is doing, and also to convert the code to other programming languages.

[It turned out that the post was too long with the complete code listing. Instead, here is a link to download it from DropBox.]

https://www.dropbox.com/s/y3jktdgwm5ytxj6/Mewgenics%20Pathfinding.lsp.txt?dl=1

Expand full comment

The vanilla A* algorithm can find the perfect solution for the "cat and fire" task, including the optimal minimum of turns.

To recap, the task is to find a path from A to B subject to a given maximum path length, and minimizing - listed in order of priority:

* the number of visited fire squares

* the path length

* the number of turns

In order to do that, each node (i.e., each move) in the search graph is defined by the tuple:

* column

* row

* number of visited fire squares

* move direction

The cost function adds penalties for walking over a fire square and for making turns.

* Penalty for fire squares: 1 + the path length limit

* Penalty for making a turn: 1 / boardSize

With these penalty values, the returned solution is guaranteed to be optimal.

So, voila! The plain vanilla A* algorithm demonstrates again how good and versatile it is.

__________

Examples

Here is the output from my Lisp program for the small test level, showing that the program finds the best solutions.

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 5: Not solved (The task is unsolvable with this depth limit)

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 6: rRRRRr

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 8: drRrrURr

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit 10: ddrrrruURr

----------

---f--###-

-s-ffffg--

---f--###-

----------

Depth limit +inf: ddrrrrrrrruull

_______________

Code listing

Here is a DropBox link to the source code listing:

https://www.dropbox.com/s/y3jktdgwm5ytxj6/Mewgenics%20Pathfinding.lsp.txt?dl=1

Expand full comment

I think the interesting aspect here is that the GPT "API" needs to be learned, but it is not formally defined. You have to coax it and use your own intuition to know how to guide it to the desired outcome.

Expand full comment

Personnaly, I frequently bump into the character output limit of GPT 4, and my code output is cut in the middle. Any trick / workaround ?

Expand full comment

I think the tidy-and-efficient way to solve the original pathing problem is to include the number of fire tiles visited as a property of the _search space nodes_, then also include the count of fire tiles visited as the most significant cost tuple element. The A* heuristic can just be plus Manhattan distance in the physical dimension of the space, since admissibility only requires that heuristic never exceed the actual cost. Search will then treat reaching the same tile passing through 0, 1, etc fire tiles as separate possibilities, but will still consider only the most cost-efficient way of reaching a given tile crossing a given number of fire tiles.

I'm curious if any of the steps GPT-4 tried moved in this direction. If not, I think that's even more of an indictment of its limitations.

Expand full comment

Not to be _that guy_ but I'm just gonna be that one guy that everyone loves. The half-assed unsolicited backseat proofreader, yaaay.

Really though, I suppose I'll start off by saying that this is quite an interesting read. I'm curious what correct solutions to the crescent collision look like and what it would take for GPT-like AI to reach that.

But alas, while reading this I kept finding instances of "its / it's" that are mixed up, and every time my brain stumbles and trips over these. Maybe I could be better at ignoring them, but you even got some wrong in both directions. So I took a closer look after I was done reading, and while I do always feel conflicted about "blasting" people for tiny mistakes, well, I got the numbers now either way, so:

22x "its" -- correct use: 10 -- wrong use: 12

CWWWWWWCCCWCWCWWCWWCCC

16x "it's" -- correct use: 13 -- wrong use: 3

CCCCCWCCCCCCWCCW

So it seems like you mostly just forget the apostrophes, but then sprinkle in a few in places they don't belong. Idk, hopefully this helps correct those typos rather than coming off as being rude. :)

Expand full comment

Here is a site attempting to test outside GPT's training set:

https://BUGFIX-66.com

It was originally intended for Copilot, but it works for ChatGPT and the rest.

About 75% of the problems should be novel. They're easy enough that many dozens of Redditors have solved most of them.

Expand full comment

Great post.

Regarding crescents - I’m not sure you’re example fails 2a - it states both inner circles are *inside* the others outer circle.

The bullshitting interpretation is clever - I’m astounded it kindof even plausibly bullshits - it seems to sort of understand circles! Also I kindof think plausible bullshitting could also be like brainstorming - path finding could this work? I can almost think myself would come up with ok how bout this - at least my younger self - I’m too slow these days!! Lol

Expand full comment

I installed CoPilot in IntelliJ as soon as GitHub launched it. Since Microsoft controls both GitHub and OpenAI, I assume that the AI behind CoPilot is similar to ChatGPT.

CoPilot is fantastic when you need a function that solves a known problem you don't know. Then you write the function's name, and it suggests the rest. You make some minor changes, and it is ok.

The more I use it, the more CoPilot writes code using my style. In other words, it is pretty useful.

However, when I must solve an original problem and I try to prompt a suggestion, it does nothing, confirming that — for now — AI is excellent at learning and showing what it has learned but cannot really create new solutions.

Expand full comment

I tried ChatGPT with GPT-4 just today, and I think you nailed it. The solutions often _looks_ good and reasonable, but ultimately fails. Very good for creating templates, sketches, and so on.

Expand full comment

Really liked your article. Just my opinion about AI coding:

I don't think that most people, who claim that GPT-4 can code say that it can code difficult and very specific tasks. But it obviously can code simple and repetitive tasks and how much of the software engineering work is that difficult, really?

Of course we still need people to check the code from LLMs, but we are just at the beginning. I truely believe that Coding will move towards "low code" and text based code generation and we will see systems that fix code by jira ticket descriptions in the next years, semi-automating a lot of annoying easy work for us. We need to somehow get a share of the productivity increase, that's the real problem here.

Expand full comment

Well, two issues with this: your tile game prompts are a little vague (except the crescent ones). I think they should be more descriptive and give more invariants.

Second, regarding the crescent task: I think tons of programmers would also get it wrong when it comes to edge cases.

Heck, most people can't even write fizz buzz:

https://blog.codinghorror.com/why-cant-programmers-program/

Expand full comment

GPT-4 can't reason about a hard problem if it doesn't have example references of the same problem in its training set. That's the issue. No amount of tweaking the prompt or overexplaining the invariants (without just writing the algorithm in english, which if you can get to that point then you already solved the hard part of the problem) will get it to come to a proper solution, because it doesn't know one. You're welcome to try it yourself with the problems I posted here.

Expand full comment

"GPT-4 can't reason about a hard problem if it doesn't have example references of the same problem in its training set. "

Most people can't reason about hard problems either. And many more can't reason about almost any problem without being given examples - they're bad at abstract reasoning (and school didn't help them). Hey, a lot can't even point to Africa on a globe.

What GPT-4 does is already impressive, and was science fiction just a few years ago.

Expand full comment

It doesn't matter that a lot of people would get it wrong, that's kind of the whole point.

Expand full comment

Oh, but it does. If something can replace "a lot of people" in certain tasks, that's already huge.

Not to mention the relevant technology is in its infancy. Ten years ago we had nothing of the quality, and 50 years ago we had things not much better than ELIZA.

Expand full comment

"at least for now."

Threatening.

Expand full comment
Comment deleted
Mar 17, 2023
Comment deleted
Expand full comment

yeah

Expand full comment