3 April 2003. Add responses.
2 April 2003. Add responses. Link to working paper on usage log data management:
1 April 2003
Cryptome and Cartome attended a session of the Usage Log Data Management Working Group in New York today. The session covered how site operators and ISPs might address web user privacy, law enforcement access to user logs, commercial exploitation of logs, and the creation of tools for managing usage log retention to protect user privacy. This was a private meeting held before the Computers, Freedom and Privacy conference, which begins tomorrow.
E-mail on the session:
Subject: working group meeting on usage log retention
Date: Fri, 21 Feb 2003 11:49:14 -0800
From: Jeff Ubois <firstname.lastname@example.org>
To: <email@example.com>

On April 1 at the Computers, Freedom, and Privacy conference in New York, I'm arranging a working group meeting that will develop a model policy and the specification for a tool to manage web usage log retention. I'm hoping you can attend.

A draft of the announcement is below. If you have comments, suggestions, or know people who should attend, please let me know; it's not quite ready for public posting. I would really like to get your ideas about this; is there a time we could talk by phone?

Jeff Ubois
510-527-2707

Usage Log Data Management Workshop at the Computers, Freedom & Privacy Conference
New York City, April 1

The usage logs generated by web servers contain much data that is useful for site owners, but the current default configurations pose a threat to the privacy of individuals and present a serious legal risk to organizations.

The IP addresses collected in these logs by web servers are becoming increasingly easy to associate with the identity of particular individuals. For organizations, this means that they may violate their own privacy policies by retaining portions of these logs. At a minimum, web site owners are exposed to potential lawsuits and discovery requests. But today, no standard policy exists to manage the retention and eventual destruction of usage log data, and no tool exists to implement such policies.

The goal of this all-day working group meeting is to develop a policy that organizations can use to govern their retention of usage log data, and a specification for a utility for the Apache web server that will delete usage log data according to this policy.

-----

Date: Thu, 27 Mar 2003 11:21:22 -0800
Subject: Re: [Fwd: working group meeting on usage log retention]
From: Jeff Ubois <firstname.lastname@example.org>
To: John Young <email@example.com>

That's terrific.
The meeting will be held in the New Yorker Hotel's Kips Bay Room, at 481 8th Ave, from 9.30 - 5 on April 1. There will be folks from the EFF, the Internet Archive, Chilling Effects, the FTC, CMU, Umich, Gartner and a few other organizations showing up. I'll be sending out a longer note tomorrow to everyone attending. I'm very glad you can make it; I think your experience in this will be invaluable.

Jeff

On 3/27/03 1:40 PM, "John Young" <firstname.lastname@example.org> wrote:

> Jeff,
>
> Nope, not disinterest, rather working outside NYC for 6 months on a
> consultant job.
>
> Now I'm back and would welcome the opportunity to participate. Point
> to the place and time and I'll be there with my partner in cyber-logrolling,
> Deborah Natsios, Cartome operator.

Date: Fri, 28 Mar 2003 16:47:08 -0800
Subject: Usage log working group: pre-conference notes
From: Jeff Ubois <email@example.com>
To: <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org>

Hi,

Attached are some notes on the upcoming meeting about web usage log retention, which will be held April 1 from 9:00 a.m. to 5:00 p.m. in the Kips Bay Room at the New Yorker Hotel, 481 8th Ave, in New York City. I want to thank everyone for their willingness to contribute to this effort.
Thanks to the referrals provided by many of you, we now have attendees and/or input from the Federal Trade Commission; the University of Michigan, UC Berkeley and Carnegie Mellon; AT&T; EFF; Accrue; Chilling Effects; GartnerGroup; TRUSTe; the American Library Association; JSTOR; and the Internet Archive. We also have some prominent researchers from the security and privacy community.

The agenda of the meeting is somewhat loosely structured, with a general progression over the course of the day from law, to technology, to recommendations. If you want to add something to it or make a short presentation, please let me know.

A number of people have made excellent suggestions regarding questions for discussion, pre-conference reading, and possible solutions. A summary of these is in the attachment following the agenda.

Based on discussions with several of you, I've pulled together some notes on what a draft recommendation might include. This is very, very far from a final product, but hopefully it can serve as a useful strawman that we can critique and improve.

The best way to reach me is generally via email, but I am also generally available by cell phone at 415 850 5431. I would be happy to talk to anyone in advance of this meeting who has something to suggest, and especially to anyone who has some ideas about how we can best converge on a set of recommendations. If there's anything anyone would like broadcast to participants in advance of the meeting, please let me know.

I look forward to seeing you next week.

Best Regards,
Jeff Ubois
Attached to the last message were the meeting purpose, agenda, background, bibliography, topics for discussion, and references:
An "interesting paper on deducing identity from IP addresses" was also distributed to the group beforehand.
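The workshop announcement quoted above calls for a utility that deletes usage log data according to a retention policy. The group had not yet specified such a tool, so what follows is only a minimal Python sketch of the idea under assumed conventions: rotated Apache log files whose names match a hypothetical `access_log*` pattern, a hypothetical 7-day retention period, and the file's modification time as its age.

```python
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical policy value for illustration; not a figure adopted by the group.
RETENTION_DAYS = 7
SECONDS_PER_DAY = 86400


def expired_logs(log_dir: str, pattern: str = "access_log*",
                 retention_days: float = RETENTION_DAYS,
                 now: Optional[float] = None) -> List[Path]:
    """Return log files in log_dir last modified before the retention cutoff.
    The live log is being written to, so its mtime is recent and it is kept."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(p for p in Path(log_dir).glob(pattern)
                  if p.is_file() and p.stat().st_mtime < cutoff)


def purge_expired_logs(log_dir: str, pattern: str = "access_log*",
                       retention_days: float = RETENTION_DAYS) -> List[Path]:
    """Delete the expired log files and return the paths that were removed."""
    removed = expired_logs(log_dir, pattern, retention_days)
    for path in removed:
        path.unlink()
    return removed
```

Unlinking a file removes its directory entry but does not overwrite the data on disk, and it does nothing about backups or copies already sent elsewhere; a real retention tool would have to address both.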
Cryptome suggested that, while a usage log retention policy is being formulated, there should be an immediate public warning about the privacy threat posed by usage logs (logs are being subpoenaed and covertly surveilled by officials, and exploited commercially without publicity), and that the group should announce that its study was commencing because of the urgency of the threat.
However, the group decided not to announce its initiative at the conference; instead, it will announce policy proposals and management tools on July 4, 2003.
Cryptome today invites public participation in formulating and promoting no-logging of web usage, rather than retention management cloaked in unverifiable privacy policies:
1. To counter the long-promoted premise that usage logs must be kept for system administration.
2. To counter the present automatic logging of users and retention of user data.
3. To counter the premise that management of usage data retention should be the primary privacy goal rather than no logging at all.
5. To advance the premise that users should be the only parties to approve logging of their visits prior to any form of data retention management.
7. Describe the benefits of no-logging over privacy policies.
8. Describe the benefits of users' controlling what logs are created, retained and managed.
9. Describe technical means for users to prevent usage logging rather than relying on site operators' and ISPs' privacy assurances.
10. Describe means to deceive the web logging programs of sites that will not agree to no-logging.
Send to: email@example.com
Date: Tue, 01 Apr 2003 21:53:39 -0600
To: firstname.lastname@example.org
From: email@example.com
Subject: Copy of my email to Jeff Ubois

Dear Jeff:

I have a comment on logging and such. There is a page I put up that tries to address this issue, at:

http://www.google-watch.org/cgi-bin/urldemo.htm

Basically, I think you should approach Apache and ask them to change the default configuration for logging so that it does not include the QUERY_STRING. That's the portion after the question mark in a CGI request. In the case of search engines, this string contains the search terms. These get propagated all over the world because they often end up as REFERER strings in other logs. Search terms are rather sensitive items of information.

It is already possible to configure Apache logging to strip out the QUERY_STRING, but it takes a fair amount of effort and research to pull it off, so no one does it. What I'm recommending is that Apache change their default logging so that it's the other way around -- it ought to require a lot of trouble and effort to make the logs *include* the QUERY_STRING! This one change would make a very big difference in the total privacy picture.

Regards,
Daniel Brandt
---------------------------------------------------------------------
Public Information Research, PO Box 680635, San Antonio TX 78268-0635
Tel:210-509-3160 Fax:210-509-3161 Nonprofit publisher of NameBase
http://www.namebase.org/ firstname.lastname@example.org
---------------------------------------------------------------------
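Brandt's point can be illustrated by scrubbing existing log lines. The minimal Python sketch below assumes Apache's Combined Log Format (Common Log Format lines, without the last two fields, are also accepted) and removes everything from the first `?` onward in both the request line and the Referer field. It post-processes logs after the fact; Apache's mod_log_config can also avoid recording the query string at the source, for example by logging `%m %U %H` instead of `%r`, though a logged Referer header (`%{Referer}i`) would still carry other sites' query strings. The function names are hypothetical.

```python
import re

# host ident user [time] "request" status bytes ["referer" "agent"]
# Assumes one line without its trailing newline and no escaped quotes.
LOG_LINE = re.compile(
    r'^(?P<prefix>\S+ \S+ \S+ \[[^\]]+\] )'
    r'"(?P<request>[^"]*)"'
    r'(?P<middle> \S+ \S+)'
    r'(?: "(?P<referer>[^"]*)"(?P<rest>.*))?$'
)


def drop_query(url: str) -> str:
    """Remove the query string: everything from the first '?' onward."""
    return url.split("?", 1)[0]


def scrub_request(request: str) -> str:
    """Strip the query string from a request line like 'GET /path?q=x HTTP/1.0'."""
    parts = request.split(" ")
    if len(parts) == 3:
        method, target, protocol = parts
        return f"{method} {drop_query(target)} {protocol}"
    return request  # leave malformed request lines untouched


def scrub_line(line: str) -> str:
    """Return the line with query strings removed from the request and Referer."""
    m = LOG_LINE.match(line)
    if m is None:
        return line  # unparseable lines are passed through unchanged
    out = (m.group("prefix") + '"' + scrub_request(m.group("request")) + '"'
           + m.group("middle"))
    if m.group("referer") is not None:
        out += ' "' + drop_query(m.group("referer")) + '"' + m.group("rest")
    return out
```

Passing unparseable lines through unchanged keeps the sketch simple, but a stricter tool might drop or flag them instead, since they could still contain search terms.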
Date: Thu, 03 Apr 2003 14:04:14 +0100
From: Ben Laurie <email@example.com>
To: John Young <firstname.lastname@example.org>
Cc: email@example.com, firstname.lastname@example.org
Subject: Re: Logging of Web Usage

John Young wrote:
> Ben,
>
> Would you care to comment for publication on web logging
> described in these two files:
>
> http://cryptome.org/no-logs.htm
>
> http://cryptome.org/usage-logs.htm
>
> Cryptome invites comments from others who know the capabilities
> of servers to log or not, and other means for protecting user privacy
> by users themselves rather than by reliance upon privacy policies
> of site operators and government regulation.
>
> This relates to the data retention debate and current initiatives
> of law enforcement to subpoena, surveil, steal and manipulate
> log data.

I don't have time right now to comment in detail (I will try to later), but it seems to me that, as someone else commented, relying on operators to not keep logs is really not the way to go. If you want privacy or anonymity, then you have to create it for yourself, not expect others to provide it for you.

Of course, it is possible to reduce your exposure to others whilst still taking advantage of privacy-enhancing services they offer. Two obvious examples of this are the mixmaster anonymous remailer network, and onion routing.

It seems to me if you want to make serious inroads into privacy w.r.t. logging of traffic, then what you want to put your energy into is onion routing. There is _still_ no deployable free software to do it, and that is ridiculous. It seems to me that this is the single biggest win we can have against all sorts of privacy invasions.

Make log retention useless for any purpose other than statistics and maintenance. Don't try to make it only used for those purposes.

Cheers,

Ben.

FWIW, I'd be willing to work on that, but not on my own (unless someone wants to keep me in the style to which I am accustomed, that is).
-- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff
Date: Thu, 3 Apr 2003 12:09:41 -0600
From: Keith Ray <email@example.com>
To: Cypherpunks <firstname.lastname@example.org>
Subject: Re: Logging of Web Usage

Quoting Ben Laurie <email@example.com>:

> It seems to me if you want to make serious inroads into privacy w.r.t.
> logging of traffic, then what you want to put your energy into is onion
> routing. There is _still_ no deployable free software to do it, and that
> is ridiculous. It seems to me that this is the single biggest win we
> can have against all sorts of privacy invasions.

This sounds like an interesting project to work on. It's hard to believe that only the DoD has played with this technology. Onion routing would seem to have a much larger impact on personal privacy on the Internet than projects like Freenet ever could.

After browsing through some of the descriptions of the system, it appears to be a real-time remailer-type system for IP traffic. A client proxy will take the IP traffic, break it up into identically sized packets, and then layer-encrypt them, starting with the last onion router and ending with the first. Each router along the path would decrypt its layer and then forward the packet to the next router.

The part that I am worried about is the liability of running an exit router. I ran a mixmaster remailer for over six months and found out first hand the reaction of people to receiving anonymous death threats, racial slurs, and spam. The saving grace was the opt-out list for people to refuse to receive future anonymous messages. However, a real-time system that could encapsulate all IP traffic could be used for anonymous hacking. Even if you limit the exit router's traffic to just port 80 and actual HTTP requests, there are plenty of exploits and probes that require nothing more. Thanks to the PATRIOT Act, those of us in the US can look forward to federal prosecution with possible life sentences if the wrong system is hacked through a router.

When the FBI comes knocking, I doubt they will be satisfied with anonymous free speech arguments.

DoD's Onion Routing research project
http://www.onion-router.net/

--
Keith Ray <firstname.lastname@example.org> -- OpenPGP Key: 0x79269A12
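The layering Keith Ray describes can be sketched as below. This is a toy illustration of the structure only, not onion routing as deployed: the "cipher" is a SHA-256 counter-mode keystream XORed with the data (not a vetted cipher), each router's key is assumed to be already shared with the client, and the router names, field sizes and cell size are hypothetical. The client wraps the payload starting with the last router, so the first router's layer is outermost; each router peels one layer and learns only the next hop.

```python
import hashlib
import secrets
from typing import List, Tuple

CELL_SIZE = 64    # hypothetical fixed payload size in bytes
HOP_FIELD = 16    # bytes reserved for the next-hop name in each layer
NONCE_SIZE = 16


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter. NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def pad_cell(payload: bytes) -> bytes:
    """Pad the payload so every cell starts out the same size."""
    if len(payload) > CELL_SIZE:
        raise ValueError("payload too large for one cell")
    return payload + b"\x00" * (CELL_SIZE - len(payload))


def wrap(payload: bytes, route: List[Tuple[str, bytes]]) -> bytes:
    """Layer-encrypt for route = [(name, key), ...], first hop first.
    Layers are added from the last router back to the first."""
    blob = pad_cell(payload)
    next_hop = "EXIT"  # the last router delivers the payload itself
    for name, key in reversed(route):
        plain = next_hop.encode().ljust(HOP_FIELD, b"\x00") + blob
        nonce = secrets.token_bytes(NONCE_SIZE)
        blob = nonce + xor_bytes(plain, keystream(key, nonce, len(plain)))
        next_hop = name  # the previous router must forward to this one
    return blob


def peel(blob: bytes, key: bytes) -> Tuple[str, bytes]:
    """One router removes its own layer and learns only the next hop."""
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    plain = xor_bytes(body, keystream(key, nonce, len(body)))
    return plain[:HOP_FIELD].rstrip(b"\x00").decode(), plain[HOP_FIELD:]
```

Unlike a real design, each layer here adds a 16-byte nonce and a 16-byte hop field, so the cell's size reveals its position on the path; keeping cells a constant size at every hop is one of the problems an actual onion routing system has to solve.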
Date: Wed, 2 Apr 2003 13:19:40 -0800 (PST)
From: Morlock Elloi <email@example.com>
Subject: Re: Logging of Web Usage
To: firstname.lastname@example.org

Frankly, it seems that some brains around here are softening.

Relying on httpd operators to protect those who access is plain silly, even if echelon (funny how that word dropped below radar lately) did not exist.

The proper way is, of course, self-protection. Start with tight control of outgoing info from the end-user machine (remove or fake all fields that are not essential, such as referrer, client application, client OS). Use proxies. If you own a multi-IP subnet randomly switch the originating IP - this fucks up most automated tracking.

What doesn't exist is mixmaster-grade anon re-httpers. I guess that ones that would let just text through (no images/scripting etc.) would be repulsive enough for wide public and therefore useful.

Once you provide your data, it is always retained forever. Learn to live with it.
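The first step Morlock Elloi recommends, removing or faking non-essential outgoing fields, might look like the following filter applied to request headers before they leave the client, for example inside a local proxy. The allowlist and the generic User-Agent value are assumptions chosen for illustration, not a recommended configuration; note that dropping Cookie, as this does, also breaks sites that rely on login sessions.

```python
from typing import Dict

# Hypothetical allowlist: only these headers pass through unchanged. Everything
# else (Referer, Cookie, From, X-Forwarded-For, ...) is dropped.
ESSENTIAL_HEADERS = {"host", "accept", "accept-encoding", "connection",
                     "content-length", "content-type"}

# Hypothetical generic value, meant to look like many other clients.
GENERIC_USER_AGENT = "Mozilla/5.0"


def scrub_request_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a new header dict keeping only essential fields and replacing
    the client application/OS string with a generic one."""
    kept = {name: value for name, value in headers.items()
            if name.lower() in ESSENTIAL_HEADERS}
    kept["User-Agent"] = GENERIC_USER_AGENT  # fake rather than omit
    return kept
```

Faking the User-Agent instead of omitting it is a design choice: a missing header can itself be distinctive, while a common value blends in with other traffic.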
Date: Wed, 2 Apr 2003 13:24:58 -0800
To: John Young <email@example.com>, Ben Laurie <firstname.lastname@example.org>
From: Bill Frantz <email@example.com>
Subject: Re: Logging of Web Usage
Cc: firstname.lastname@example.org, email@example.com

The http://cryptome.org/usage-logs.htm URL says:

>Low resolution data in most cases is intended to be sufficient for
>marketing analyses. It may take the form of IP addresses that have been
>subjected to a one way hash, to refer URLs that exclude information other
>than the high level domain, or temporary cookies.

Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a computer for a few hours can reverse a one way hash by exhaustive search. Truncating IPs seems a much more privacy friendly approach.

This problem would be less acute with IPv6 addresses.

Cheers - Bill

-------------------------------------------------------------------------
Bill Frantz                    | Due process for all | Periwinkle -- Consulting
(408)356-8506                  | used to be the      | 16345 Englewood Ave.
firstname.lastname@example.org | American way.       | Los Gatos, CA 95032, USA
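Frantz's argument can be checked directly. The minimal Python sketch below assumes the log stores an unkeyed SHA-256 hex digest of the dotted-quad address (the quoted text does not name a hash). It recovers the address by hashing every candidate in a network, and shows truncation to a network prefix as the alternative he prefers. The function names are hypothetical.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Unkeyed one-way hash of an address, as a 'low resolution' log might store it."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_by_search(digest: str, network: str = "0.0.0.0/0") -> Optional[str]:
    """Recover the address behind `digest` by hashing every candidate in
    `network`. The default covers all 2**32 IPv4 addresses."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == digest:
            return candidate
    return None


def truncate_ip(ip: str, prefix_len: int = 24) -> str:
    """Keep only the first prefix_len bits of the address and zero the rest."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)
```

For example, `reverse_by_search(hash_ip("192.0.2.77"), "192.0.2.0/24")` returns `"192.0.2.77"` after at most 256 hashes, while `truncate_ip("192.0.2.77")` returns `"192.0.2.0"`, a value shared by every address in that /24. Truncation is not anonymity by itself: a /24 can still point to a single small organization.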
Date: Thu, 3 Apr 2003 01:05:27 +0200 (CEST)
From: Thomas Shaddack <email@example.com>
To: Morlock Elloi <firstname.lastname@example.org>
cc: <email@example.com>
Subject: Re: Logging of Web Usage

> Relying on httpd operators to protect those who access is plain silly,
> even if echelon (funny how that word dropped below radar lately) did
> not exist.

Echelon could be grouped together with Carnivore and CALEA devices into the group of Generic Transport-level Eavesdroppers. No need to consider it separately, at least for technological purposes. (...am I right?)

> What doesn't exist is mixmaster-grade anon re-httpers. I guess that ones that
> would let just text through (no images/scripting etc.) would be repulsive
> enough for wide public and therefore useful.

Could it be constructed as eg. a FreeNet extension? Piggybacking on an existing system is easier than rolling out a whole new thing.

> Once you provide your data, it is always retained forever. Learn to
> live with it.

What worries me a LOT is Google (and search engines in general). Very useful tool, and way too attractive to profile people by their search queries.
Date: Wed, 2 Apr 2003 18:16:18 -0800
From: Seth David Schoen <firstname.lastname@example.org>
To: email@example.com, firstname.lastname@example.org
Subject: Re: Logging of Web Usage

Bill Frantz writes:

> The http://cryptome.org/usage-logs.htm URL says:
>
> >Low resolution data in most cases is intended to be sufficient for
> >marketing analyses. It may take the form of IP addresses that have been
> >subjected to a one way hash, to refer URLs that exclude information other
> >than the high level domain, or temporary cookies.
>
> Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a
> computer for a few hours can reverse a one way hash by exhaustive search.
> Truncating IPs seems a much more privacy friendly approach.
>
> This problem would be less acute with IPv6 addresses.

I'm skeptical that it will even take "a few hours"; on a 1.5 GHz desktop machine, using "openssl speed", I see about a million hash operations per second. (It depends slightly on which hash you choose.) This is without compiling OpenSSL with processor-specific optimizations.

That would imply a mean time to reverse the hash of about 2100 seconds, which we could probably improve with processor-specific optimizations or by buying a more recent machine. What's more, we can exclude from our search parts of the IP address space which haven't been allocated, and optimize the search by beginning with IP networks which are more likely to be the source of hits based on prior statistical evidence. Even without _any_ of these improvements, it's just about 35 minutes on average.

I used to advocate one-way hashing for logs, but a 35-minute search on an ordinary desktop PC is not much obstacle.

It might still be helpful if you used a keyed hash and then threw away the key after a short time period (perhaps every 6 hours). Then you can't identify or link visitors across 6-hour periods. If the key is very long, reversing the hash could become very hard.

The logging problem will depend on what server operators are trying to accomplish. Some people just want to try to count unique visitors; strangely enough, they might get more privacy-protective (and comparably precise) results by issuing short-lived cookies.

--
Seth David Schoen <email@example.com> | Very frankly, I am opposed to people
http://www.loyalty.org/~schoen/       | being programmed by others.
http://vitanuova.loyalty.org/         |     -- Fred Rogers (1928-2003),
                                      |        464 U.S. 417, 445 (1984)
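Schoen's estimate and his two suggestions can be sketched together in Python. The first function reproduces the arithmetic: on average half of the 2**32 IPv4 addresses must be tried, so at 10**6 hashes per second the mean is 2**31 / 10**6, about 2,147 seconds (just under 36 minutes). The class is one way to realize a keyed hash whose key is thrown away every 6 hours; HMAC-SHA256 is our choice here, since the message names no algorithm. The last function issues a short-lived random visitor-ID cookie for counting unique visitors without logging addresses; the cookie name and lifetime are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from http.cookies import SimpleCookie
from typing import Callable, Optional, Tuple

IPV4_SPACE = 2 ** 32


def mean_search_seconds(hashes_per_second: float) -> float:
    """Expected time to reverse an unkeyed hash of one IPv4 address:
    on average half of the address space is tried before the match."""
    return (IPV4_SPACE / 2) / hashes_per_second


class RotatingKeyedHasher:
    """HMAC-SHA256 of IP addresses under a random in-memory key that is
    replaced every `period` seconds. The key is never written out, so once
    it is replaced, older digests cannot be recomputed or linked to new ones."""

    def __init__(self, period: float = 6 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.period = period
        self.clock = clock
        self._epoch: Optional[int] = None
        self._key = b""

    def _current_key(self) -> bytes:
        epoch = int(self.clock() // self.period)
        if epoch != self._epoch:
            self._key = secrets.token_bytes(32)  # drop the old key
            self._epoch = epoch
        return self._key

    def hash_ip(self, ip: str) -> str:
        return hmac.new(self._current_key(), ip.encode("ascii"),
                        hashlib.sha256).hexdigest()


VISITOR_COOKIE = "vid"       # hypothetical cookie name
COOKIE_MAX_AGE = 6 * 3600    # hypothetical lifetime in seconds


def visitor_id(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor id, Set-Cookie value or None). A fresh random id is
    issued when the client presents none."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if VISITOR_COOKIE in jar:
        return jar[VISITOR_COOKIE].value, None
    new_id = secrets.token_hex(8)
    out = SimpleCookie()
    out[VISITOR_COOKIE] = new_id
    out[VISITOR_COOKIE]["max-age"] = COOKIE_MAX_AGE
    out[VISITOR_COOKIE]["path"] = "/"
    return new_id, out[VISITOR_COOKIE].OutputString()
```

A unique-visitor count is then the number of distinct ids seen in a period. A visitor who returns after the cookie expires, or who refuses cookies, is counted again; whether that is acceptable depends on what the operator is trying to measure.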