- Padding distribution by Michael Rogers
- Re: [remailer] Padding distribution by Lance Cottrell
- Re: [remailer] Padding distribution by Michael Rogers
- Re: [remailer] Padding distribution by Greg Rubino
- Re: [remailer] Padding distribution by Lance Cottrell
- Re: [remailer] Padding distribution by Greg Rubino
- Re: [remailer] Padding distribution by Lance Cottrell

- Re: [remailer] Padding distribution by Tom Ritter
- Re: [remailer] Padding distribution by Michael Rogers

- Re: [remailer] Padding distribution by Nick Mathewson
- Re: [remailer] Padding distribution by Michael Rogers
- Re: [remailer] Padding distribution by Nick Mathewson
- Re: [remailer] Padding distribution by Michael Rogers
- Re: [remailer] Padding distribution by Lance Cottrell

- From:
- Michael Rogers
- Date:
- 2011-07-29 @ 11:59

Hi all, I'm working on a problem that I guess some members of the list might find interesting: given a message of size x, how to choose an amount of padding, y, to add to the message, such that the size of the padded message, x + y, reveals as little as possible about x? "Reveals as little as possible" means that the entropy of an observer's probability distribution over possible values of x, conditioned on x + y, should be as high as possible. Any thoughts about how to go about finding a good (deterministic or probabilistic) function f(x) = y? Thanks, Michael

- From:
- Lance Cottrell
- Date:
- 2011-07-29 @ 14:09

Good hiding of information which could be used to track messages always requires a lot of overhead. The decision in Mixmaster was to pad to a size that was not ridiculous for small messages, and to break large messages in to independent units of that standard block size. Any general padding function will always reveal that the message in question is smaller than the padded size. This is a particular tell for very large messages unless your padding function pads a significant fraction of messages to be extraordinarily large (with associated massive overhead). It would help my analysis if I understood the context in which this would be used. Without knowing what the observer can see and what the space of observables looks like, the "entropy" is poorly defined. -Lance On Jul 29, 2011, at 7:59 AM, Michael Rogers wrote: > Hi all, > > I'm working on a problem that I guess some members of the list might > find interesting: given a message of size x, how to choose an amount of > padding, y, to add to the message, such that the size of the padded > message, x + y, reveals as little as possible about x? > > "Reveals as little as possible" means that the entropy of an observer's > probability distribution over possible values of x, conditioned on x + > y, should be as high as possible. > > Any thoughts about how to go about finding a good (deterministic or > probabilistic) function f(x) = y? > > Thanks, > Michael > -- Lance Cottrell loki@obscura.com

- From:
- Michael Rogers
- Date:
- 2011-07-29 @ 17:07

Hi Lance, On 29/07/11 15:09, Lance Cottrell wrote: > It would help my analysis if I understood the context in which this > would be used. Without knowing what the observer can see and what the > space of observables looks like, the "entropy" is poorly defined. Sorry, I should have explained the context in my first message. I'm working on a communication system that uses a variety of different media - memory sticks, Bluetooth, TCP connections, etc - to transport opaque chunks of data. A higher-layer protocol interprets those chunks of data and stitches the various media together into a delay-tolerant network. But some of the media are susceptible to traffic analysis, so I'd like to come up with a general-purpose mechanism that takes an opaque chunk of data and pads it to conceal its size, as far as possible, from an observer who may have some prior knowledge about likely sizes. It would be possible to break each large chunk into smaller chunks, as in Mixmaster, but the observer can still see how many of those smaller chunks are being transmitted, so we don't get the same kind of protection from that approach as Mixmaster does (where, IIRC, each chunk is routed independently through the mix network?). For some media it might be possible to conceal the true volume of traffic by using a fixed transmission rate, buffering real traffic or adding dummy traffic as necessary, but right now I'm focussing on the media where that isn't possible - for example, transporting the data on a memory stick or attaching it to an email. Cheers, Michael

- From:
- Greg Rubino
- Date:
- 2011-07-29 @ 15:13

On Fri, Jul 29, 2011 at 10:09 AM, Lance Cottrell <loki@obscura.com> wrote: > > Any general padding function will always reveal that the message in question is smaller than the padded size. This is a particular tell for very large messages unless your padding function pads a significant fraction of messages to be extraordinarily large (with associated massive overhead). > I was wondering about this... Could the padding function actually include negative values and then the negative padding could be a controlled amount of compression applied to the message? The resulting message being the original message compressed to the point that it is shorter by the amount of "negative" padding. I think you could actually achieve this kind of lengthening and shortening with an adaptation of Huffman's algorithm where the comparison function used to determine the priority of the subtrees inserted into the queue is modified such that it approaches the desired size for the given input. I'm not sure that such an approach would actually hide information (because the encoding tree would have to be transmitted along with the message), but it sounds like it'd be a lot of fun to implement! Just thinking out loud there... --Greg

- From:
- Lance Cottrell
- Date:
- 2011-07-29 @ 15:41

I think this could be equivalently represented as padding everything as before but starting from a maximally compressed version of every message. To me it looks like it would be an equivalent situation so no benefit. -Lance On Jul 29, 2011, at 11:13 AM, Greg Rubino wrote: > On Fri, Jul 29, 2011 at 10:09 AM, Lance Cottrell <loki@obscura.com> wrote: >> >> Any general padding function will always reveal that the message in question is smaller than the padded size. This is a particular tell for very large messages unless your padding function pads a significant fraction of messages to be extraordinarily large (with associated massive overhead). >> > > I was wondering about this... > > Could the padding function actually include negative values and then > the negative padding could be a controlled amount of compression > applied to the message? The resulting message being the original > message compressed to the point that it is shorter by the amount of > "negative" padding. I think you could actually achieve this kind of > lengthening and shortening with an adaptation of Huffman's algorithm > where the comparison function used to determine the priority of the > subtrees inserted into the queue is modified such that it approaches > the desired size for the given input. I'm not sure that such an > approach would actually hide information (because the encoding tree > would have to be transmitted along with the message), but it sounds > like it'd be a lot of fun to implement! > > Just thinking out loud there... > > --Greg > -- Lance Cottrell loki@obscura.com

- From:
- Greg Rubino
- Date:
- 2011-07-29 @ 16:37

```
On Fri, Jul 29, 2011 at 11:41 AM, Lance Cottrell <loki@obscura.com> wrote:
> I think this could be equivalently represented as padding everything as
before but starting from a maximally compressed version of every message.
I think I see what you mean, but...
I am n00b to information theory (in case that wasn't already obvious),
so can you provide a more precision wrt what you mean by equivalence?
That would probably help me think about the problem as well.
Thanks,
Greg
```

- From:
- Lance Cottrell
- Date:
- 2011-07-29 @ 17:33

As I understand the proposal, some files would get compressed by somewhere between 0% and whatever the best available compression system can manage (m%), or the file can get padded up to some larger size from 0% to some maximum (n%). So the file (original size X) ends up being being a new size Y somewhere in the range m X < Y < n X This is really just the same as saying the original file is size m X and could be padded up to n X. Thinking about both shrinking and growing makes it more complicated, so lets assume it has been shrunk completely and then only talk about padding it by various amounts. -Lance On Jul 29, 2011, at 12:37 PM, Greg Rubino wrote: > On Fri, Jul 29, 2011 at 11:41 AM, Lance Cottrell <loki@obscura.com> wrote: >> I think this could be equivalently represented as padding everything as before but starting from a maximally compressed version of every message. > > I think I see what you mean, but... > > I am n00b to information theory (in case that wasn't already obvious), > so can you provide a more precision wrt what you mean by equivalence? > That would probably help me think about the problem as well. > > Thanks, > > Greg > -- Lance Cottrell loki@obscura.com

- From:
- Tom Ritter
- Date:
- 2011-07-29 @ 12:43

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Any thoughts about how to go about finding a good (deterministic or > probabilistic) function f(x) = y? I've thought a bit about this myself. These are just ideas, nothing formal. Any padding of length y reveals that the message x is of a length <= y, possibly after being compressed. So there's always an information leak there. My first thought was padding 'buckets'. For example: any message of length x is padded to min(1024, 2^ceil(log2(x))). A message of length 4095 is padded to 4096, but a message of length 2049 is padded to the same. And then small messages are always padded up to 1024. A disadvantage is you can say with certainty what the range of lengths for the message was (e.g. 2087-4096). The base is obviously adjustable to widen the range. Let's say a message was padded randomly, a message between 2087-4096 can be padded anywhere from 0-maxrand bytes. If you choose a sufficiently large maxrand number, you get a good distribution. But what must maxrand be to really achieve a good distribution? If it's <= x, then the total length will be padded to <= 2x, and you can determine that a message length is between y/2 and y. Any maxrand based on the message size would give a similar equation, and any constant maxrand would provide a smaller-than-previous range when a message large enough is sent. Large values or large equations for maxrand expands the range, but the range and equations still exists. And there's the practical problem that you don't want to be transmitting 2MB for a 2KB message. If Alice and Bob exchange dozens of messages all of size 2048: is more or less information gained from knowing their messages are 1029-2048 OR watching them exchange dozens of messages, plotting their ranges on a number line, finding the area of most overlap, and making a conjecture narrowing the range? Is more security gained first by adding a maxrand value, and *then* padding it to a bucketsize? Should there be a set of equations determining maxrand, and an equation chosen at random? I don't know, I certainly haven't run any statistical tests, and I'd probably have a flaw in my tests if I tried. But it's a thought. - -tom -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) iEYEARECAAYFAk4yqw4ACgkQJZJIJEzU09sm7wCgidpZCKqwXZQVy1rLUzy6Xsvg UgAAnR+y8y+7nelytVc8z2wcpj/6WJu7 =WjQG -----END PGP SIGNATURE-----

- From:
- Michael Rogers
- Date:
- 2011-07-29 @ 16:41

Hi Tom, Thanks for the reply. On 29/07/11 13:43, Tom Ritter wrote: > If Alice and Bob exchange dozens of messages all of size 2048: is more > or less information gained from knowing their messages are 1029-2048 OR > watching them exchange dozens of messages, plotting their ranges on a > number line, finding the area of most overlap, and making a conjecture > narrowing the range? Is more security gained first by adding a maxrand > value, and *then* padding it to a bucketsize? Should there be a set of > equations determining maxrand, and an equation chosen at random? The buckets idea is something I'd considered too, but the protection it provides seems to depend heavily on the observer's prior knowledge about likely message sizes. To take an extreme example, if the observer knows that x must be either 4096 or 4097, the buckets scheme provides no protection (the observer learns with certainty which value x has by observing whether the padded message is 4096 or 8092 bytes long), but if the observer knows that x must be either 4095 or 4096, the buckets scheme provides perfect protection (the observer learns nothing because the padded message is 4096 bytes long in either case). That suggests we might need to take the observer's prior knowledge into account when saying anything about the strength of a given padding scheme. But can we still make general statements about how a given scheme transforms the observer's prior knowledge into posterior knowledge? For example, let's say we have a function sample() that returns a random sample from some probability distribution over the positive integers. To pad a message of size x, we keep calling sample() until it returns a value >= x, and that's the size of our padded message. Now, the observer, who has some prior distribution over possible values of x, sees the padded message, and can calculate, for each possible value of x, the probability of padding that value to the observed size. So now the observer has a posterior distribution over possible values of x. But that's where I get stuck. Can we say anything like "using distribution A for sample() rather than distribution B leaves the observer closer to the prior, therefore distribution A is preferable", or does everything depend on the exact form of the prior (as in the buckets example)? Cheers, Michael

- From:
- Nick Mathewson
- Date:
- 2011-08-01 @ 16:25

On Fri, Jul 29, 2011 at 7:59 AM, Michael Rogers <m--@gmx.com> wrote: > Hi all, > > I'm working on a problem that I guess some members of the list might > find interesting: given a message of size x, how to choose an amount of > padding, y, to add to the message, such that the size of the padded > message, x + y, reveals as little as possible about x? > > "Reveals as little as possible" means that the entropy of an observer's > probability distribution over possible values of x, conditioned on x + > y, should be as high as possible. > > Any thoughts about how to go about finding a good (deterministic or > probabilistic) function f(x) = y? Well, it's trivial to come up with something that meets your stated needs without actually being a good idea: choose y = C - x for some constant C larger than any larger possible value of x. The total value x+y is now equal to C, which reveals nothing about x. Of course, this is probably not what you want, since if you want to support messages up to, say, 100 MB, you would then have to make _every_ message padded to at least 100MB long. Here's another example that does what you asked for while at the same time being an even *worse* idea: choose y = C - x + r() where C is a constant larger than any possible value of x, .and r() is a large randomly chosen integer. Why is this worse? Clearly, x+y doesn't reveal anything about the size of x... but if this mix net is one where messages remain at constant size as they move through the network, then having messages of many different sizes makes those messages less likely to provide cover for one another. If message sizes change, but the way in which they change preserves any information about the original size (for instance, they only get shorter), this can still be used to help trace messages. If we model a large eavesdropping adversary in terms of a long-term intersection attack (for example, Statistical Disclosure and its variants), then having multiple possible message sizes effectively puts every message on a separate mixnet. -- Nick

- From:
- Michael Rogers
- Date:
- 2011-08-02 @ 11:04

Hi Nick, On 01/08/11 17:25, Nick Mathewson wrote: > Well, it's trivial to come up with something that meets your stated > needs without actually being a good idea: choose y = C - x for some > constant C larger than any larger possible value of x. The total > value x+y is now equal to C, which reveals nothing about x. > > Of course, this is probably not what you want, since if you want to > support messages up to, say, 100 MB, you would then have to make > _every_ message padded to at least 100MB long. Good point, I should have said something about the cost of padding. Ideally I'd like to be able to measure the "effectiveness" of a given padding scheme as well as its cost. Then an appropriate tradeoff can be chosen for each application. We already know how to define cost, so the question is how to define effectiveness. > If we model a large eavesdropping adversary in terms of a long-term > intersection attack (for example, Statistical Disclosure and its > variants), then having multiple possible message sizes effectively > puts every message on a separate mixnet. Interesting. The scenario I have in mind isn't a mixnet - each relay can replace the received padding with its own padding before forwarding the message, so a message can change size while moving through the network - but I guess it's still useful to ask whether it's more effective against statistical attacks to pad the message to the same size each time (as in Tom's buckets scheme, for example), or to an independently chosen size each time. Cheers, Michael

- From:
- Nick Mathewson
- Date:
- 2011-08-02 @ 14:33

On Tue, Aug 2, 2011 at 7:04 AM, Michael Rogers <m--@gmx.com> wrote: [...] > Interesting. The scenario I have in mind isn't a mixnet - each relay can > replace the received padding with its own padding before forwarding the > message, so a message can change size while moving through the network - If every relay can replace padding with new padding, then won't the relays have to know how much padding they can safely replace? And if they know how much padding they can safely replace, won't that tell the relays something about the original message size? I can imagine some scenarios where you can tell how much padding you can remove. For one example, there are plenty of mixnet designs where a fixed-size header is removed at each hop and replaced by a fixed amount of padding. In these systems, the amount removed and replaced reveals nothing about the underlying message size. For another example, there might be a mixnet design where padding is only added at each step, but never removed. In this case, you could learn some information about messages by noting that a longer message would never turn into a shorter message. > but I guess it's still useful to ask whether it's more effective against > statistical attacks to pad the message to the same size each time (as in > Tom's buckets scheme, for example), or to an independently chosen size > each time. I think you might need to back up and try to figure out what kind of threat model you're trying to resist here, and what kind of properties you're trying to provide in the presence of that threat. -- Nick

- From:
- Michael Rogers
- Date:
- 2011-08-02 @ 16:17

On 02/08/11 15:33, Nick Mathewson wrote: > If every relay can replace padding with new padding, then won't the > relays have to know how much padding they can safely replace? And if > they know how much padding they can safely replace, won't that tell > the relays something about the original message size? Yes - in my scenario that's not a problem, since the relays can see the plaintext of the messages they're forwarding. I'll explain more below. > I think you might need to back up and try to figure out what kind of > threat model you're trying to resist here, and what kind of properties > you're trying to provide in the presence of that threat. OK - I was hoping this might be a general-purpose problem that would be somewhat independent of my particular scenario, but perhaps that was wishful thinking and I should explain the details of my scenario. The network I'm considering is a delay-tolerant publish-subscribe overlay, broadly similar to Usenet. Each node in the overlay is owned and operated by an individual, and may be hosted on a personal computer or a mobile device such as a smartphone. Direct communication only occurs between nodes whose operators know and trust one another and have exchanged encryption and authentication keys face-to-face. Users (node operators) subscribe to discussion groups. Each group forms a connected subgraph of the overlay. Subscribers can post messages to groups. Each message is flooded to the group's subscribers across the edges of the overlay that connect subscribers. Nodes do not receive messages for groups to which they do not subscribe. Connections between neighbouring nodes may use a wide range of underlying media, from memory sticks to Bluetooth to TCP/IP. These media have a wide range of characteristics, so I divide them into two classes: 1. Batch-mode connections. This class includes media that don't attempt to provide bidirectional, reliable, ordered, timely delivery of data. A memory stick tied to a carrier pigeon would be an example of a batch-mode connection. 2. Stream-mode connections. This class includes media that attempt (though don't necessarily succeed) to provide bidirectional, reliable, ordered, timely delivery of data. A Bluetooth RFCOMM connection would be an example of a stream-mode connection. The threat model is as follows: * All long-range communication infrastructure (internet, phone network, etc) is monitored by the adversary. * The adversary has a limited ability to monitor short-range communication infrastructure (Bluetooth, wifi, etc). * The adversary may operate an unlimited number of overlay nodes. * The adversary only has a limited ability to persuade users to trust the operators of the adversary's nodes - thus the size of the cut between the adversary's nodes and the rest of the network is limited. * The adversary can't break standard cryptographic primitives, and therefore can't decrypt or undetectably modify communication between non-adversarial nodes. * There are some users who can keep their nodes secure - those who can't are counted as working for the adversary. Now, obviously there are a million issues to consider here, but the one I'm currently interested in is how much the adversary can learn about which messages are travelling across which connections by observing the encrypted traffic on those connections. This is a well-studied problem for black box models where the adversary sees the inputs and outputs of an anonymity system and wants to match inputs to outputs, but the situation here is slightly different - both because the adversary has a partial view, and because the system as a whole doesn't have inputs and outputs (though each node does). So I thought I'd go back to square one and try to establish some basic metrics, such as a metric for how much the size of a padded message (together with the observer's prior knowledge) reveals about the size of the original message - but maybe that's the wrong approach and there's a better way to carve up the problem? Thanks for your patience in reading this far! Cheers, Michael

- From:
- Lance Cottrell
- Date:
- 2011-08-02 @ 15:31

Yes On Aug 2, 2011, at 10:33 AM, Nick Mathewson wrote: > On Tue, Aug 2, 2011 at 7:04 AM, Michael Rogers <m--@gmx.com> wrote: > [...] >> Interesting. The scenario I have in mind isn't a mixnet - each relay can >> replace the received padding with its own padding before forwarding the >> message, so a message can change size while moving through the network - > > If every relay can replace padding with new padding, then won't the > relays have to know how much padding they can safely replace? And if > they know how much padding they can safely replace, won't that tell > the relays something about the original message size? > > I can imagine some scenarios where you can tell how much padding you > can remove. For one example, there are plenty of mixnet designs where > a fixed-size header is removed at each hop and replaced by a fixed > amount of padding. In these systems, the amount removed and replaced > reveals nothing about the underlying message size. For another > example, there might be a mixnet design where padding is only added at > each step, but never removed. In this case, you could learn some > information about messages by noting that a longer message would never > turn into a shorter message. > >> but I guess it's still useful to ask whether it's more effective against >> statistical attacks to pad the message to the same size each time (as in >> Tom's buckets scheme, for example), or to an independently chosen size >> each time. > > I think you might need to back up and try to figure out what kind of > threat model you're trying to resist here, and what kind of properties > you're trying to provide in the presence of that threat. > > -- > Nick > -- Lance Cottrell loki@obscura.com