halmos multicore processing

Let’s assume a single test with 4 paths. We’ll represent the time it takes to solve each path with squares: 🟩 is fast, 🟩🟩🟩 is slower, and so on.

solver	path #1	path #2	path #3	path #4
z3	🟩	🟩🟩	🟩	🟩🟩🟩

How long does this take to complete on a machine with 16 cores?

with --solver-threads=1, one core is busy at 100% on the first path, then the second, and so on sequentially, so the total time is 🟩🟩🟩🟩🟩🟩🟩
with --solver-threads=4, we process all 4 paths in parallel. The first 3 paths complete quickly, but the halmos instance has to wait for the slowest one, so the total time is 🟩🟩🟩
with --solver-threads=16, we still process all 4 paths in parallel, but there are only 4 paths and each solver can only use one core at a time. So we see no improvement over --solver-threads=4: the total time is still 🟩🟩🟩

So the best we can do is 🟩🟩🟩, because that’s how long it takes for z3 to process the hardest path using 1 core at 100%.
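The three scenarios above can be reproduced with a small model. This is only a sketch: it assumes each path occupies exactly one worker thread and that paths are handed, in order, to whichever worker frees up first. The function name `total_time` and the scheduling policy are illustrative assumptions, not a description of how halmos actually dispatches queries.

```python
import heapq

def total_time(path_costs, solver_threads):
    """Time (in squares) to finish all paths with a fixed pool of workers.

    Assumption: each path uses exactly one worker, and paths are assigned
    in order to the earliest available worker.
    """
    # Each heap entry is the time at which one worker becomes free.
    workers = [0] * min(solver_threads, len(path_costs))
    heapq.heapify(workers)
    for cost in path_costs:
        free_at = heapq.heappop(workers)          # earliest available worker
        heapq.heappush(workers, free_at + cost)   # busy until the path is solved
    return max(workers, default=0)

z3_costs = [1, 2, 1, 3]  # squares per path, copied from the z3 row above

for threads in (1, 4, 16):
    print(f"--solver-threads={threads}: total time = {total_time(z3_costs, threads)}")
```

Under these assumptions the model gives 7, 3 and 3 squares, matching the three cases: once there are as many workers as paths, extra workers have nothing to do.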

Or is it? What if we used other solvers? What if we ran 4 halmos instances, each using a different solver? Let’s say the results look like this:

solver	path #1	path #2	path #3	path #4	sequential time	parallel time
z3	🟩	🟩🟩	🟩	🟩🟩🟩	🟩🟩🟩🟩🟩🟩🟩	🟩🟩🟩
spiky	🟩🟩🟩🟩	🟩	🟩	🟩	🟩🟩🟩🟩🟩🟩🟩	🟩🟩🟩🟩
even	🟩🟩	🟩🟩	🟩🟩	🟩🟩	🟩🟩🟩🟩🟩🟩🟩🟩	🟩🟩
fasty	🟩🟩	🟩	🟩	🟩	🟩🟩🟩🟩🟩	🟩🟩

Some takeaways:

for even, it takes the same time to process each path. It’s pretty terrible sequentially, but it’s great in parallel (i.e. with --solver-threads=4)
conversely, spiky is very fast on most queries but very slow on outliers. That makes its parallel performance very bad (since we need to wait on the slowest query)
fasty is fast on everything, which is great sequentially, but in parallel it only ties with even (🟩🟩 each) rather than beating it
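The two time columns in the table follow directly from the per-path costs: sequential time is the sum of the path costs, and parallel time (one thread per path) is the cost of the slowest path. A minimal sketch, with the helper names `sequential_time` and `parallel_time` chosen here for illustration:

```python
# Path costs in squares, copied from the table above.
costs = {
    "z3":    [1, 2, 1, 3],
    "spiky": [4, 1, 1, 1],
    "even":  [2, 2, 2, 2],
    "fasty": [2, 1, 1, 1],
}

def sequential_time(path_costs):
    # One core solves the paths one after another.
    return sum(path_costs)

def parallel_time(path_costs):
    # One thread per path: we wait for the slowest path.
    return max(path_costs)

summary = {
    name: (sequential_time(c), parallel_time(c)) for name, c in costs.items()
}

for name, (seq, par) in summary.items():
    print(f"{name:5}  sequential={seq}  parallel={par}")
```

This makes the trade-off explicit: even has the worst sum (8) but the joint-best maximum (2), while spiky has a middling sum (7) and the worst maximum (4).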

Running all 4 halmos instances in parallel, our CPU usage over time would look like this:

| CPU usage | t = 1 | t = 2 | t = 3 | t = 4 |
| --- | --- | --- | --- | --- |
| 100% | 🟪 🟪 🟪 🟪 | | | |
| 75% | 🟦 🟦 🟦 🟦 | | | |
| 50% | 🟧 🟧 🟧 🟧 | 🟦 🟦 🟦 🟦 | | |
| 25% | 🟩 🟩 🟩 🟩 | 🟪 🟧 🟩 🟩 | 🟧 🟩 | 🟧 |
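The number of busy cores at each time step can be recomputed from the per-path costs. The sketch below assumes that each of the 4 halmos instances runs with --solver-threads=4 on the 16-core machine, and that a path keeps exactly one core busy until it is solved; `busy_cores` is a hypothetical helper name.

```python
CORES = 16  # machine size used throughout this example

# Path costs in squares, one row per solver (one halmos instance each).
costs = {
    "z3":    [1, 2, 1, 3],
    "spiky": [4, 1, 1, 1],
    "even":  [2, 2, 2, 2],
    "fasty": [2, 1, 1, 1],
}

def busy_cores(costs, t):
    """Number of paths still being solved during time step t (1-based)."""
    return sum(
        1
        for path_costs in costs.values()
        for cost in path_costs
        if cost >= t
    )

# Fraction of the machine in use at each time step.
usage = {t: busy_cores(costs, t) / CORES for t in range(1, 5)}

for t, fraction in usage.items():
    print(f"t = {t}: {busy_cores(costs, t):2d} cores busy ({fraction:.2%})")
```

Under these assumptions all 16 cores are busy only during the first time step; usage then drops to 8 cores, then 2, then 1.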

legend: