Hi all,

A few days ago Nadav pointed out that when you add attestation points from the same oracle in the "no hash"[2] scheme you will get the same point even if the individual points were different.
This means you can get a CET encryption key that is the same for a different set of outcomes. This is bad.

Observe a bet on a number 0-3 decomposed into two binary nonces (R_0, R_1). The encryption keys for each outcome would be:

0: R_0 + 0*X + R_1 + 0*X
1: R_0 + 1*X + R_1 + 0*X
2: R_0 + 0*X + R_1 + 1*X
3: R_0 + 1*X + R_1 + 1*X

Note that CETs 1 and 2 are the equal? I call this a CET collision.

## How did you miss this Lloyd?

I actually have not studied the case of combining points from the same oracle. I assumed it would be simpler than the case of having different oracles and so security would follow on from that. It turns out this was a very very wrong assumption.

I also never considered an adversary's ability to "collide" CET encryption keys so that they are the same. I only found that if a CET is used then all the oracle's whose attestation points were combined to create it did indeed participate in the attestation (their contribution cannot be canceled out as long as you have these proofs of knowledge on each R).
But if the decryption key for a CET is the same as another one then (from the point of view of the protocol) they have actually attested to two contradictory things.

In this case the oracles are not to blame for the equivocation so this breaks fraud proofs in the case of multiple nonces from the same oracle.
Clearly in the no hash scheme these collisions happen without any malicious behavior at all but that doesn't mean they can't happen in the hash scheme. This is what this post is about.

## Collision attacks without oracle participation fundamentals

Consider an adversary who is going to find a magic combination of events/nonces to combine to create a CETs such that if the oracle(s) attest to one thing the adversary is going to be able to decrypt more than one CET from the information available to it. The oracles are behaving completely honestly in this scenario. What are the constraints on this attack?

Conjecture   #1 The two CETs will have exactly the same encryption key.

If they didn't then the adversary would have created a combination of attention points where they know the scalar offset between the two groups that caused the collision. They don't know the discrete logarithms of the component points so this should be impossible..

Conjecture #2 The points that make up the two CETs will be derived from the same nonces.

First let's denote sum(R + c * X) = R_0 + c_0 * X + R_1 + c_1 * X ... R_n + c_n * X for n possible nonces.

If you know sum(R_a + c_b*X) = sum(R_b + c_b*X) where not every R_a on the left is the same as every R_b on the right then you have figured out an equality between two groups of independent points which means you can find discrete logarithms (same reason as Conjecture #1).

The two conjectures above allow us to hone in on what the adversary must be doing. Since they are using the same nonces then we can remove those from the equation:

1. sum(R_a + c_a*X) = sum(R_b + c_b*X)
2. sum(c_a*X) = sum(c_b*X)                      # Every R_a = R_b so subtract from both sides
3. sum(c_a) = sum(c_b)                             # Divide by X

Ok so the adversary MUST be creating the collision by having two sets of c's (of equal size) that are different but sum to the same value. Clearly in the no hash scheme this is very very easy to create such sums but what about the hash scheme?

## Collision attacks on original DLC scheme

In the original proposal "c"s were computed as H(R || X || o) where o is the outcome (the no-hash hash scheme can be thought of as just c = o).
So to cause a collision the attacker has to find two sets of o's (items in the set denoted as o_a and o_b) against a fixed set of Rs  such that sum(H(R || X || o_a)) = sum(H(R || X || o_b)) which can be simplified to:

sum(H(R || X || o_a) - H(R || X || o_b)) =  0

Just to be clear this means finding o_a0, ..., o_an and o_b0, ..., o_bn such that:

H(R_0 || X || o_a0) - H(R_0 || X || o_b0) + H(R_1 || X || o_a1) - H(R_1 || X || o_b1) + ... + H(R_0 || X || o_an) - H(R_0 || X || o_bn) = 0

On average how many nonces and outcomes per nonce do we need before there is a collision?

Each term like H(R_0 || X || o_a0) - H(R_0 || X || o_b0) can take on m * (m - 1) possible values where m is the number of possible outcomes.
For n nonces we have (m^2 - m) ^ n possible CET encryption keys. Let's say we have m = 2^10 = 1024 outcomes per nonce this means to get at least 2^256 possible CETs we need roughly n = 13 nonces to be involved on average to have a collision.

But how hard would it be to find a collision if one were to exist? Well it looks a bit like Wagner's k-sum problem [1] where k is the number of nonces but I think it's best to leave out estimating this for now since it requires a more in-depth study.

## Don't hash the outcomes

What if instead of H(R || X || o) we computed c as o * H(R || X). A bit of a mid-way between the hash and no-hash schemes. On the surface this would seem to fix the original problem with c = o because the H(R || X) term delinearizes it so collisions don't happen naturally. It also preserves most of the speedup that the no hash scheme offers since each CET encryption key can be produced by a single addition from the previous one. The collision problem for this scheme becomes to find:

1. sum(H(R || X) *o_a) = sum(H(R|| X) * o_b)
2. sum(H(R || X)* o_a - H(R || X) * o_b) = 0
3. sum(H(R || X) * (o_a - o_b) ) = 0

So what are the possible values of (o_a - o_b)? If there are m outcomes then it should be 2 * (m-1). e.g. if we have 1024 outcomes the values could be -1023..1023 (not including 0). This is far fewer than the m^2 - m we have in the original.
For m = 2^10 outcomes per nonce we would need around n = 24 nonces on average to have a collision or roughly double what the original needs for a collision to exist.

To actually find a collision it looks like a subset sum kind of problem except with being able to choose coefficients from -m..m. I am not going to guess how difficult this is yet.

## What about malicious oracles

A malicious oracle may choose their R values such that there is a collision. I *think* this would be much more efficient than a malicious user just finding a collision on an existing set of nonces. In order for fraud proofs to work we would have to identify what threshold of nonces is it likely to be a malicious user who created the collision by combining a bunch of nonces together or whether they had help from a malicious oracle.

Of course we really want to avoid this whole game altogether. I am hopeful that with the "nonce-only-hash" H(R ||X) * o scheme it might be hard for the oracle to find collisions no matter how many Rs they try. But showing this to be the case will take more work. It feels like the full-hash scheme will struggle to do this.


Please review my claims here and happy to discuss at the next meeting. Sorry for the awful notation.



[1] https://joinmarket.me/blog/blog/avoiding-wagnerian-tragedies/
[2] https://github.com/LLFourn/dlc-sec/blob/master/main.pdf (section 4)