
I go down to Puerto Rico pretty frequently and have some good connections there, including in the (small) tech scene. Let me know if I can help, Insight Prediction founder.


Does anybody know of a good prediction market looking at mortgage rates?

(edit) Or perhaps someone has an informed opinion on how high the 30-year fixed rate will go in the USA this year?

I'm planning on a 30-year fixed mortgage. I have an uncertain closing date (9 to 18 months away) and the opportunity to lock a rate in for 9 months, buy a rate-lock extension for one month, and then buy re-locks every 5 days.

This means that I could pay a few thousand dollars to guarantee that the rate on my mortgage is today's rate for the next 9 months. If the lock-in period expires and I don't renew the lock, the rate goes to whatever the rate is on that day. For example, if I lock in today, I pay a few thousand and I will have a 4.6% rate on my mortgage as long as I close within 270 days. If the lock expires and the rate at closing time is 5.5%, then my mortgage rate will be 5.5% and I will be out the several thousand dollars.

I can comfortably figure out the right decision given a mortgage rate, but I'm struggling to get a probable range of mortgage rates across the next n months. A prediction market seems like it could be a good approach.
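For concreteness, here's the kind of back-of-the-envelope comparison I have in mind, as a Python sketch. The loan size, fee, and rate distribution below are made-up placeholders; the distribution is standing in for whatever a prediction market would give me:

```python
# Back-of-the-envelope: expected lifetime cost of locking now vs. floating.
# All numbers are hypothetical placeholders.

lock_fee = 3000        # cost of the 9-month lock, dollars
locked_rate = 0.046    # today's 30-year fixed rate
loan = 500_000         # hypothetical loan amount
term_months = 360

# Hypothetical distribution over the rate at closing if I float.
rate_dist = {0.046: 0.25, 0.050: 0.35, 0.055: 0.25, 0.060: 0.15}

def monthly_payment(principal, annual_rate, n_months):
    """Standard fixed-rate amortization payment."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -n_months)

def lifetime_cost(annual_rate):
    return monthly_payment(loan, annual_rate, term_months) * term_months

cost_locked = lifetime_cost(locked_rate) + lock_fee
cost_float = sum(p * lifetime_cost(r) for r, p in rate_dist.items())
print(f"lock now:        ${cost_locked:,.0f}")
print(f"float, expected: ${cost_float:,.0f}")
```

Swap in whatever rate distribution a market or forecaster gives you; the decision is then just whichever expected cost is lower (ignoring refinancing options, which make floating somewhat more valuable than this naive comparison suggests).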


> The only difference I can find is that Insight requires two media sources to resolve positively and Metaculus only one - surely people don’t think there’s a 40% chance troops will enter Kyiv but only one source will report it?

The difference also includes the chance that they don't enter Kyiv, but one source reports that they did anyway.


I looked at the comments section for the Metaculus question on "will Russian soldiers enter Kyiv". It's basically an argument over whether >100 soldiers entered Kyiv between Feb 25 and Feb 27 and whether they counted under the question criteria. Sometimes you can predict the future, sometimes you can't even accurately predict the past : )


"I’m hoping to get people together next year, come up with a standard question set, and give it to as many platforms (and individuals!) as possible to see what happens."

The best time to do this was at the start of this year. The second best time is now! Why wait until the end of the year? Just because it might be a sort of Schelling point? You could also pick June as a mid-year thing and coordinate around that.


The market "When will programs write programs for us?" is an interesting example. The headline, as most of the people I've talked to interpreted it, implied something very different from how it was graded. The common reading was more along the lines of "when can programs replace a significant fraction of programming work?", and we're still far away from that almost a year later.

Of course, the fault there lies with the predictors for not reading the full description, but it does suggest that the problem is not in predicting AI progress but in question reading. If your takeaway from it was "predictors are too conservative on AI progress, AGI is coming sooner than people think", you're not taking away the right lesson. I think the right lesson here is "people are bad at predicting when the question title is ambiguous, even if there's a specific description in the body".


(sigh) You've finally convinced me to join Metaculus and start recording my own predictions.


Can I get a sanity check here? I occasionally place bets on Kalshi to see if I do as well in real-money markets as I do on the reputation-based ones. Thus far I've done OK, but I'm really confused by the action on this question:

New COVID-19 case average by April 1 - https://kalshi.com/events/SCASER3/markets/SCASE-029

My modeling predicts that a case average under 25,000 by April 1 is highly likely, probably by Friday unless BA.2 creates an uptick (we're currently at about 27,000 cases and dropping), but the market currently has it at 23%. There's $25,000 in open interest, so liquidity isn't an issue. Is there something I'm missing?
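To spell out why 23% looks so off to me (the 90% below is just a stand-in for my model's actual output):

```python
# Kalshi YES contracts pay $1.00 on a positive resolution.
market_price = 0.23   # current price of YES, i.e. the market-implied probability
model_prob = 0.90     # stand-in for what my model says

expected_profit = model_prob * 1.00 - market_price
print(f"expected profit per $0.23 contract: ${expected_profit:.2f}")  # $0.67
```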

Delete if this is too self-serving but thought it may slot under prediction market goofiness.


Looking over the prediction contest data, most of the averages and aggregates are similar to Scott's original prediction.

The one that is dramatically different is whether inflation will be < 3%. Scott says 80%, the average is 51%, and the aggregate is 35%.


>The only difference I can find is that Insight requires two media sources to resolve positively and Metaculus only one - surely people don’t think there’s a 40% chance troops will enter Kyiv but only one source will report it?

Seems more likely the error would be in the opposite direction: they don't enter the city, but at least one news source says they did, especially given the existence of propaganda and the fog of war.


Some more weird stuff about Aver: their Discord server has 19k+ members yet isn't really... active. That's a lot of big numbers for a platform we'd never heard of before. I hope someone will prove my suspicion wrong!

I'm excited about all the sketchy-looking crypto prediction platforms - it means more potential arbitrage opportunities.


So how much moderation is enough moderation for a prediction market site?

(edit to add: Manifold Markets doesn't feel like it has enough moderation.)

I would like it if the various glowfic markets were required to be categorized in a way that makes them less visible. Also sports betting. And anything involving root vegetables.

Hopefully some form of community moderation can solve that. The holy grail is "prediction-market based moderation", though you probably need two incommensurate currencies to pull that off. Roughly, if there is >95% certainty that a comment is bad, it is automatically pulled for an hour.
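To make that rule concrete, a toy sketch (everything here is hypothetical, including how you'd get a market probability for a comment):

```python
from datetime import datetime, timedelta

HIDE_THRESHOLD = 0.95              # market-implied probability the comment is bad
HIDE_DURATION = timedelta(hours=1)

def maybe_hide(comment: dict, market_prob_bad: float) -> dict:
    """Pull a comment for an hour when the moderation market is >95% sure it's bad."""
    if market_prob_bad > HIDE_THRESHOLD:
        comment["hidden_until"] = datetime.utcnow() + HIDE_DURATION
    return comment
```

The two-currency problem is why this stays a toy: whatever is at stake in the moderation market can't be the same karma the market is adjudicating.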

On the other hand, I signed up for Metaculus a few hours ago and submitted the "Turkey 2022 Name Change" question, and it still hasn't been approved. So sometimes less is more?


This is Eric (one of the people running the forecasting contest). Happy to answer questions about the aggregation method we used! I also want to second Scott's invitation for others to try their hand at aggregating the data, with a few caveats:

- Consider blinding yourself to the particular questions and only looking at the numbers. Otherwise you could end up "cheating" by e.g. choosing an aggregation method that assigns a particularly high probability to the Russia/Ukraine war question (which we now know will resolve positively).

- You can approach this in two ways. First, you could come up with an aggregation method that (like ours) *only* uses Scott's predictions and the contest participants' predictions, and not any other data (e.g. "superforecaster or not") about the participants. Alternatively, you can make your best guess about how to use the extra data to build a better aggregation method. I think both of these are interesting problems! (A simple baseline is sketched after this list.)

- I think there's a ton of luck in terms of which aggregation methods will work well, just because most aggregation methods will tend to give similar numbers and the sample of questions just isn't very large. Accordingly, there's a pretty decent chance your aggregation method will do worse than ours even if it's better, or do better than ours even if it's worse. But perhaps we'll take the approaches that do well this year and see how well they do next year!
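If you want a concrete starting point, here's one simple baseline (to be clear, not the method we actually used): pool forecasts by taking the geometric mean of odds.

```python
import math

def geo_mean_odds(probs, eps=1e-4):
    """Pool probabilities via the geometric mean of odds."""
    probs = [min(max(p, eps), 1 - eps) for p in probs]  # keep odds finite
    mean_log_odds = sum(math.log(p / (1 - p)) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-mean_log_odds))

print(geo_mean_odds([0.8, 0.51, 0.35]))  # toy inputs
```

Extremizing the pooled odds (raising them to a power greater than 1) is a common further tweak.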


If you folks want to make easy money, bet on the Oscar prediction market. It's ridiculously straightforward. The only public prediction I made (in 2011) was pretty terrible by my standards: I got only 15 out of 24 categories right, though that included all the major ones except supporting actress [1, 2]. In a good year I could do up to 21 out of 24, before I got really bored of the whole thing.

[1] Post of the predictions before the Oscars (in Italian, sadly): http://www.fiveobstructions.com/?p=2292

[2] Aftermath post: http://www.fiveobstructions.com/?p=2302


My goodness, do you all not find the Ukrainian war forecasting and betting on how many will be butchered utterly tasteless? There are no words…


I am still dubious about the idea that "gamblers", aggregated, are better predictors.

Is there really any research backing that proposition?

Is the wisdom of a crowd of gamblers any wiser than the wisdom of the crowd in general?

The theory seems to be that having skin in the game makes one a better predictor, but aren't there thousands of gamblers in bankruptcy?

Markets are only sort of efficient, and they are also susceptible to corruption and manipulation. Even with some transparency and the SEC, there is still undetected insider trading and the occasional Madoff scam. Is it really possible that an online prediction market where someone can bet as igottip234 can be a good thing?

Why does a known gambler like Nate Silver refuse to publicly state whether he bets on his own predictions when it is known that his predictions will affect odds in other markets?


I see that the linked Metaculus problem specifies resolution criteria of passing at least 2 out of 5 trials from a given list. Did anyone actually test Codex on those problems? How many did it pass?

Anyway, assuming that Codex did pass some of those, I think the problem is that people were estimating dates based on their interpretation of the question (when will AI become useful for code generation) rather than the actual resolution criteria (when will AI get good at copypasting leetcode solutions from Stack Overflow).


In what way can OpenAI’s Codex be said to be a program writing programs that a compiler cannot?

Both are programs that you give a high-level specification to, and they attempt to return a lower-level program that accomplishes that spec.

The list from the links is of course a much higher-level and more freeform specification than, say, C code. But is it higher level relative to its products? Is the difference between a paragraph and Python greater than the difference between C and machine code? Good compiled and interpreted code already reads like its spec.


Hmm… let me see if I understand how this works.

A question is proposed - Will Ukraine successfully drive out Russian forces?

Currently the market might say 0.5% yes.

If territory begins to be taken back by Ukrainian forces, that percentage will rise. If we get Russian forces in full retreat, it might go to 95%, and then as the last Russian troop leaves, someone will win.
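Checking my own arithmetic on what those percentages mean in dollar terms (toy numbers, and I'm assuming the usual $1-payout contract):

```python
# A YES share pays $1 if the event happens; its price is the market's
# implied probability. Toy numbers matching the example above.
price_now = 0.005          # market says 0.5% yes
price_retreat = 0.95       # after a full Russian retreat
payout = 1.00

print(f"implied probability now: {price_now / payout:.1%}")
print(f"profit per share if bought now and it resolves YES: ${payout - price_now:.3f}")
print(f"profit per share if bought during the retreat:      ${payout - price_retreat:.2f}")
```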

Where are we judging that the prediction market added any value?


I wrote a little script that compiles a daily newsletter with the biggest Metaculus changes: https://newspredictions.substack.com/p/news-predictions-2022-03-23?showWelcome=true&s=w

It's very similar to the Metaculus Twitter Bot but, well, a newsletter.
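The gist, if anyone wants to replicate it: snapshot the community predictions daily and diff against yesterday's snapshot. A simplified sketch, not my actual script; the /api2/questions/ endpoint is real, but the JSON field names below are assumptions to verify against the live API:

```python
import json
import requests

resp = requests.get(
    "https://www.metaculus.com/api2/questions/",
    params={"limit": 50, "order_by": "-activity"},
)
# Field names ("results", "id", "community_prediction") are assumptions.
today = {str(q["id"]): q.get("community_prediction")
         for q in resp.json()["results"]}

try:
    with open("yesterday.json") as f:   # snapshot written by yesterday's run
        yesterday = json.load(f)
except FileNotFoundError:
    yesterday = {}

# Report day-over-day moves; turning a prediction blob into a single
# number depends on the question type.
for qid, pred in today.items():
    if qid in yesterday and yesterday[qid] != pred:
        print(qid, yesterday[qid], "->", pred)

with open("yesterday.json", "w") as f:
    json.dump(today, f)
```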

Mar 23, 2022·edited Mar 23, 2022

> Even in late 2020, just before the question stopped accepting new predictions, the forecast was January 2027. The real answer was six months later, mid-2021, when OpenAI released Codex. I don’t want to update too much on a single data point, but this is quite the data point. If I had to cram this into the narrative of “not systematically underappreciating speed of AI progress”, I would draw on eg this question about fusion, where the resolution criteria (ignition) may have been met by an existing system - tech forecasters tend to underestimate the ability of cool prototypes to fulfill forecasting question criteria without being the One Amazing Breakthrough they’re looking for.

I think it's a much less spectacular data point than it might look, and the fusion example shows why. If you think forecasters knew how much more progress needed to be made, and were wrong about how fast that progress would happen, then they were off by an order of magnitude, which is a lot. But if you think forecasters knew how fast progress would happen, and were wrong about how much more needed to be done, then progress was simply 6 years closer to Codex than they thought. On that model, expressing their error as a ratio doesn't make sense: they'd be consistently 6 years off (unless they learned more about how much progress was needed before getting Codex), while the ratio of their 6-years-too-late prediction to the actual time until Codex keeps changing, asymptoting to infinity the moment before Codex is completed.
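To put the asymptote in symbols (my notation): let $T$ be when Codex actually arrives, $t$ the time of the forecast, and $b = 6$ years the fixed bias, so the forecast is $\hat{T} = T + b$. Then the absolute error $\hat{T} - T = b$ is constant, while the relative error

$$\frac{\hat{T} - t}{T - t} = 1 + \frac{b}{T - t} \longrightarrow \infty \quad \text{as } t \to T.$$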


From Fox News:

“U.S. and European officials have been working to seize excessive yachts belonging to Putin's oligarch allies and moored in various NATO countries.”

The use of the word “excessive” in that paragraph makes me chuckle.


Two months before the invasion, Metaculus gave ~30% odds of such an event. Even two weeks before the war started, it was still under 50%. https://www.metaculus.com/questions/8898/russian-invasion-of-ukraine-before-2023/
