
author: niplav, created: 2023-12-21, modified: 2024-04-18, language: english, status: finished, importance: 3, confidence: certain

Subscripts in text can be used to attach explicit probabilities to claims$_{99\%}$.

Subscripts for Probabilities

Gwern has wondered about a use-case for subscripts in hypertext. While they have settled on a specific use-case, namely years for citations, I propose a different one: reporting explicit probabilities.

Explicitly giving probabilities in day-to-day English text is usually quite clunky: "I assign 35% to North Korea testing an intercontinental ballistic missile by the end of this year" reads far less smoothly than "I don't think North Korea will test an intercontinental ballistic missile this year".

And since subscripts are a solution in need of a problem, one can wonder how well those two fit together: Quite well, I claim.

In short, I propose to append probabilities in subscript after a statement using standard HTML subscript notation (or $\LaTeX$ as a fallback if it's available), with the probability possibly also being a link to a relevant forecasting platform with the same question:

I think Donald Trump is going to be incarcerated before 2030$_{65\%}$.

This is almost as readable as the sentence without the probability.
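
A minimal sketch of how such a fragment could be generated, assuming Python. The helper name `with_probability` and the idea of passing the forecasting question as a plain URL are illustrative assumptions, not part of any existing tool:

```python
from html import escape
from typing import Optional


def with_probability(claim: str, percent: int, url: Optional[str] = None) -> str:
    """Append a subscripted probability to a claim, as an HTML fragment.

    `url` is an optional link to a forecasting question (hypothetical parameter).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    label = f"{percent}%"
    if url is not None:
        # Make the probability itself the link text.
        label = f'<a href="{escape(url, quote=True)}">{label}</a>'
    # Keep a trailing full stop after the subscript, as in the example sentence.
    if claim.endswith("."):
        body, stop = claim[:-1], "."
    else:
        body, stop = claim, ""
    return f"{escape(body, quote=False)}<sub>{label}</sub>{stop}"


print(with_probability("I think it'll rain tomorrow.", 55))
```

Placing the subscript before the full stop mirrors the example sentence; whether it should go before or after punctuation is a stylistic choice the sketch simply hard-codes.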

There are some complications with negations in sentences or multiple statements. For the most part, I'll simply avoid such cases ("Doctor, it hurts when I do this!" "Don't do that, then."), but if I had to, I'd solve the first problem by declaring that the probability applies to the literal meaning of the previous sentence, including all negations; the problem with multiple statements is solved by delimiters.

As an example of the different kinds of negation: "The train won't come more than 5 minutes late$_{90\%}$" would (arguendo) mean the same thing as "I don't think the train will come more than 5 minutes late$_{90\%}$", which means the same as "The train will take more than 5 minutes to arrive$_{10\%}$", which is equivalent to "I assign 90% probability to the train arriving within the next 5 minutes".

With multiple statements, my favorite way of delimiting is currently half brackets: "I think ⸤it'll rain tomorrow⸥$_{55\%}$, but ⸤Tuesday is going to be sunny⸥$_{80\%}$, but I don't think ⸤your uncle is going to be happy about that⸥$_{15\%}$."
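
One side benefit of explicit delimiters is that the claims become machine-readable. The following is only a sketch under stated assumptions: the function name `extract_claims` is made up, and the pattern assumes the percentage directly follows the closing half bracket, either bare, inside `<sub>`, or in the LaTeX form `$_{55\%}$`:

```python
import re

# ⸤ is U+2E24 and ⸥ is U+2E25. After the closing bracket, an optional
# "<sub>" or "$_{" wrapper may precede the digits, and the percent sign
# may be backslash-escaped (LaTeX).
CLAIM = re.compile(r"⸤([^⸤⸥]+)⸥(?:<sub>|\$_\{)?(\d{1,3})\\?%")


def extract_claims(text: str) -> list[tuple[str, int]]:
    """Return (statement, percent) pairs for every delimited claim in text."""
    return [(m.group(1), int(m.group(2))) for m in CLAIM.finditer(text)]


sentence = (
    "I think ⸤it'll rain tomorrow⸥55%, but ⸤Tuesday is going to be sunny⸥80%, "
    "but I don't think ⸤your uncle is going to be happy about that⸥15%."
)
for statement, percent in extract_claims(sentence):
    print(f"{percent:>3}%  {statement}")
```

Nested brackets are deliberately not handled; the pattern only matches flat, non-overlapping claims.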

The probabilities in this context aren't quite evidentials, but neither are they veridicals nor miratives. I propose the word "credal" for this category.

Enumerating Possible Notations

The exact place of insertion is subtle: In sentences with a single central statement, there are multiple locations one could place the probability.

This becomes trickier in sentences with multiple statements.

Variants

A variant of the notation could use decimal notation instead of percentages, and leave out leading and trailing zeroes. "I think it'll rain tomorrow$_{50\%}$" would then become the more compact "I think it'll rain tomorrow$_{.5}$". This has the advantage of being compatible with plain text through the combining dot below diacritic, which would yield "I think it'll rain tomorroẉ₅". However, the meaning of the combining dot can be ambiguous at first.
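
The plain-text variant can be sketched in a few lines, assuming Python. The function names are illustrative, and the sketch assumes whole-number percentages strictly between 0 and 100:

```python
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
COMBINING_DOT_BELOW = "\u0323"  # attaches to the preceding character


def compact_decimal(percent: int) -> str:
    """Turn 50 into '.5' and 5 into '.05': no leading zero, no trailing zeroes."""
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    return "." + f"{percent:02d}".rstrip("0")


def plain_text(claim: str, percent: int) -> str:
    """Append the probability using only Unicode: dot below, then subscript digits."""
    digits = compact_decimal(percent)[1:].translate(SUBSCRIPT_DIGITS)
    return claim + COMBINING_DOT_BELOW + digits


print(compact_decimal(50))
print(plain_text("I think it'll rain tomorrow", 50))
```

The dot is attached to the last letter of the claim rather than to the first digit, which matches the "tomorroẉ₅" example but is also what makes the notation easy to misread.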

On LessWrong, one can also use reacts signifying probabilities on one's own text. While it's restricted to LessWrong, it also allows other people to easily assign different probabilities to your statements.

Since the people writing the text reporting probabilities are probably logically non-omniscient bounded agents, it might also be useful to report the time or effort one has spent on refining the reported probability: "I reckon humanity will survive the 21st century$_{55\%:20h}$", indicating that the speaker has reflected on this question for 20 hours to arrive at their current probability (something akin to reporting an "epistemic effort" for a piece of information). I fear that this notation is getting into cumbersome territory and won't be using it.

Notation Options and Difficulties

There are three available options: Either one's writing platform supports HTML, in which case one can use the <sub>18%</sub> tags (giving a subscripted 18%), or it supports $\LaTeX$, which creates a slightly fancier-looking but also more fragile notation using _{18\%} (resulting in $_{18\%}$), or one's platform directly supports subscripting, such as pandoc with ~18%~, but not Reddit Markdown (which does support superscript). More info about other platforms here.
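
The three options differ only in the wrapper around the number, which a small lookup table makes explicit. This is a sketch, assuming Python; the platform keys `"html"`, `"latex"` and `"pandoc"` are labels chosen for illustration:

```python
# Each renderer wraps a whole-number percentage in one platform's subscript syntax.
RENDERERS = {
    "html": lambda p: f"<sub>{p}%</sub>",
    "latex": lambda p: f"$_{{{p}\\%}}$",  # the percent sign must be escaped in LaTeX
    "pandoc": lambda p: f"~{p}%~",
}


def render(percent: int, platform: str) -> str:
    """Return the subscript markup for percent on the given platform."""
    try:
        return RENDERERS[platform](percent)
    except KeyError:
        raise ValueError(f"no subscript syntax known for {platform!r}") from None


for platform in RENDERERS:
    print(platform, render(18, platform))
```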

Ideally one would simply use Unicode subscripts, which are available for all digits, but tragically not for the percentage sign '%' or a simple dot '.'. Perhaps a project for the future: After all, they did include a subscript '+'₊, a subscript '-'₋, equality sign '='₌ and parentheses '()'₍₎, but several subscript letters (b, c, d, f, g, q, w, y and z) are still missing…
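
The gap is easy to demonstrate. Below is a sketch, assuming Python, of a converter that covers only the digits and the five symbols named above and refuses everything else; `to_unicode_subscript` is a hypothetical name:

```python
SUPPORTED = "0123456789+-=()"
SUBSCRIPTS = str.maketrans(SUPPORTED, "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def to_unicode_subscript(text: str) -> str:
    """Convert text to Unicode subscripts, failing on characters without one."""
    unsupported = sorted(set(text) - set(SUPPORTED))
    if unsupported:
        raise ValueError(f"no Unicode subscript for: {' '.join(unsupported)}")
    return text.translate(SUBSCRIPTS)


print(to_unicode_subscript("18"))
try:
    to_unicode_subscript("18%")
except ValueError as error:
    print(error)
```

The second call fails on the percent sign, which is exactly why a pure-Unicode version of the percentage notation does not work.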

Applications

I've used this notation sparingly but increasingly; a good example of a first exploration is here, and it is interspersed in the text here.

Fischer 2023 uses a different notation:

The notation proposed here would change the text:

Discussions

See Also