3 April 2003. Add responses.
2 April 2003. Add responses. Link to working paper on usage log data management:
http://cryptome.org/usage-logs.htm
1 April 2003
Cryptome and Cartome attended a session today in New York of the Usage Log Data Management Working Group, on how site operators and ISPs might address web user privacy, law enforcement access to user logs, commercial exploitation of logs, and the creation of tools to manage usage log retention so as to protect user privacy. This was a private meeting held before the Computers, Freedom and Privacy conference, which begins tomorrow.
E-mail on the session:
Subject: working group meeting on usage log retention
Date: Fri, 21 Feb 2003 11:49:14 -0800
From: Jeff Ubois <jeff@ubois.com>
To: <jya@pipeline.com>
On April 1 at the Computers, Freedom, and Privacy conference in New York, I'm arranging a working group meeting that will develop a model policy and the specification for a tool to manage web usage log retention. I'm hoping you can attend.
A draft of the announcement is below. If you have comments, suggestions, or know people who should attend, please let me know; it's not quite ready for public posting.
I would really like to get your ideas about this; is there a time we could talk by phone?
Jeff Ubois
510-527-2707
Usage Log Data Management Workshop at the Computers, Freedom & Privacy Conference
New York City, April 1
The usage logs generated by web servers contain much data that is useful for site owners, but the current default configurations pose a threat to the privacy of individuals, and present a serious legal risk to organizations.
The IP addresses collected in these logs by web servers are becoming increasingly easy to associate with the identity of particular individuals. For organizations, this means that they may violate their own privacy policies by retaining portions of these logs. At a minimum, web site owners are exposed to potential lawsuits and discovery requests.
But today, no standard policy exists to manage the retention and eventual destruction of usage log data, and no tools exist to implement such policies.
The goal of this all-day working group meeting is to develop a policy that organizations can use to govern their retention of usage log data, and a specification for a utility for the Apache web server that will delete usage log data according to this policy.
-----
Date: Thu, 27 Mar 2003 11:21:22 -0800
Subject: Re: [Fwd: working group meeting on usage log retention]
From: Jeff Ubois <jeff@ubois.com>
To: John Young <jya@pipeline.com>
That's terrific.
The meeting will be held in the New Yorker Hotel's Bay Kipp room, at 481 8th Ave from 9.30 - 5 on April 1. There will be folks from the EFF, the Internet Archive, Chilling Effects, the FTC, CMU, Umich, Gartner and a few other organizations showing up. I'll be sending out a longer note tomorrow to everyone attending.
I'm very glad you can make it, I think your experience in this will be invaluable.
Jeff
On 3/27/03 1:40 PM, "John Young" <jya@pipeline.com> wrote:
> Jeff,
>
> Nope, not disinterest, rather working outside NYC for 6 months on a
> consultant job.
>
> Now I'm back and would welcome the opportunity to participate. Point
> to the place and time and I'll be there with my partner in cyber-logrolling,
> Deborah Natsios, Cartome operator.
Date: Fri, 28 Mar 2003 16:47:08 -0800
Subject: Usage log working group: pre-conference notes
From: Jeff Ubois <jeff@ubois.com>
To: <jeff@ubois.com>, <lmazzarella@ftc.gov>, <i@post.harvard.edu>, <bob.page@accrue.com>, <brewster@archive.org>, <lesk@acm.org>, <chlaurant@epic.org>, <nancy.kranich@nyu.edu>, <jya@pipeline.com>, <wild@eff.org>, <fran@truste.org>, <nk@jstor.org>, <ver@umich.edu>, <jurban@law.berkeley.edu>, <rms@computerbytesman.com>, <bob.page@accrue.com>, <latanya@cs.cmu.edu>, <jeff@ubois.com>, <mlecky@sympatico.ca>, <wendy@seltzer.com>, <bsteinhardt@aclu.org>
Hi,
Attached are some notes on the upcoming meeting about web usage log retention, which will be held April 1 from 9:00 a.m. to 5:00 p.m. in the Kips Bay Room at the New Yorker Hotel, 481 8th Ave, in New York City.
I want to thank everyone for their willingness to contribute to this effort. Thanks to the referrals provided by many of you, we now have attendees and/or input from the Federal Trade Commission; the University of Michigan, UC Berkeley and Carnegie-Mellon; AT&T; EFF; Accrue; Chilling Effects; GartnerGroup; TrustE, the American Library Association; JSTOR; and the Internet Archive.
We also have some prominent researchers from the security and privacy community.
The agenda of the meeting is somewhat loosely structured, with a general progression over the course of the day from law, to technology, to recommendations. If you want to add something to it or make a short presentation, please let me know.
A number of people have made excellent suggestions regarding questions for discussion, pre-conference reading, and possible solutions. A summary of these is in the attachment following the agenda.
Based on discussions with several of you, I've pulled together some notes on what a draft recommendation might include. This is very, very, far from a final product, but hopefully it can serve as a useful strawman that we can critique and improve.
The best way to reach me is generally via email, but I am also generally available by cell phone at 415 850 5431. I would be happy to talk to anyone in advance of this meeting who has something to suggest, and especially to anyone who has some ideas about how we can best converge on a set of recommendations. If there's anything anyone would like broadcast to participants in advance of the meeting, please let me know.
I look forward to seeing you next week.
Best Regards,
Jeff Ubois
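The working group's stated aim was a specification for a utility that deletes usage log data according to a retention policy. What follows is only a minimal sketch of what such a utility might do, written in Python rather than as an Apache component; it assumes rotated log files sitting in one directory, and the function name `purge_old_logs`, the `access_log.*` file pattern and the 14-day window are hypothetical choices, not anything the group specified.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def purge_old_logs(log_dir, pattern="access_log.*", max_age_days=14, now=None):
    """Delete files in log_dir matching pattern that are older than max_age_days.

    Hypothetical retention helper: the pattern and the 14-day default are
    illustrative, not a policy adopted by the working group.
    Returns the sorted names of the files it deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    deleted = []
    for path in Path(log_dir).glob(pattern):
        # A rotated log's last-modification time is taken as its age.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old, recent = Path(d, "access_log.1"), Path(d, "access_log.2")
        old.write_text("old entries\n")
        recent.write_text("recent entries\n")
        month_ago = time.time() - 30 * SECONDS_PER_DAY
        os.utime(old, (month_ago, month_ago))  # backdate the first file
        print(purge_old_logs(d))  # ['access_log.1']
```

Deleting files in one directory does not reach backups, copies already shipped to analysis tools, or the log currently being written, so a real policy tool would have to account for those as well.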
Attached to the last message were the meeting purpose, agenda, background, bibliography, topics for discussion, and references:
http://cryptome.org/usage-logs.htm
A paper described as an "interesting paper on deducing identity from IP addresses" was also distributed to the group beforehand:
http://cryptome.org/trails1.pdf
Cryptome suggested that, while a usage log retention policy was being formulated, there should be an immediate public warning about the privacy threat posed by usage logs (that logs are being subpoenaed and covertly surveilled by officials, along with unpublicized commercial exploitation), and that the group should announce that its study was commencing because of the urgency of the threat.
However, the group decided not to announce its initiative at the conference. Instead, it will announce its policy proposals and management tools on July 4, 2003.
Cryptome has long warned that no web user privacy policy is reliable and that system administrators, including Cryptome's own, should not be trusted, because of compromises made for economic, legal and political purposes. Users alone should determine what data about them, if any, is generated, collected, archived and manipulated, and this should include web usage logs.
Cryptome today invites public participation in formulating and promoting no-logging of web usage, instead of retention management cloaked by unverifiable privacy policies:
1. To counter the long-promoted premise that usage logs must be kept for system administration.
2. To counter the present automatic logging of users and retention of user data.
3. To counter the premise that management of usage data retention should be the primary privacy goal rather than no logging at all.
4. To counter the premise that privacy policy is sufficient warning to and protection of users.
5. To advance the premise that users should be the only parties to approve logging of their visits prior to any form of data retention management.
6. To refute the concept that site operators and ISPs, rather than users themselves, should be the parties which protect Internet users by privacy policy and usage log retention management.
7. Describe the benefits of no-logging over privacy policies.
8. Describe the benefits of users' controlling what logs are created, retained and managed.
9. Descriptions of technical means for the user to prevent usage logging rather than rely on site operators' and ISPs' privacy assurances.
10. Means to deceive web logging programs used by sites which will not agree to no-logging.
11. Means for detecting covert logging taking place behind privacy policy.
Send to: jya@pipeline.com
From: "Stef Caunter" <stef@caunter.ca>
To: <jya@pipeline.com>
Subject: formulating and promoting no-logging
Date: Tue, 1 Apr 2003 21:18:24 -0500
JYA
Interesting topic. My thoughts follow your numbering:
1. To counter the long-promoted premise that usage logs must be kept for system administration.
The only necessary log for this is the error_log. It can be set to several levels of detail about the webserver function, and about "500" category server errors. All other logs simply record file requests, successful or not, and client browser headers, and show nothing about the relative health of the server. To the contrary, they force a data write for every file requested, increasing the load on the box.
2. To counter the present automatic logging of users and retention of user data.
Apache runs better and faster without automatic logging of user requests. The default "high-performance" configuration shipped with version 2 provides for zero logging of client requests. It is not necessary, and can be seen as counter-productive.
3. To counter the premise that management of usage data retention should be the primary privacy goal rather than no logging at all.
We tend to retain far too much data in this business, just because it is possible.
4. To counter the premise that privacy policy is sufficient warning to and protection of users.
A view you have been championing, that these privacy policy documents are irrelevant to law enforcement and data mining operations, is the only intelligent approach. You can be tied to your IP.
5. To advance the premise that users should be the only parties to approve logging of their visits prior to any form of data retention management.
One of my favourite demonstrations when I lecture on this subject is to fire up a new install of IE and to dwell on the first popup warning, which says, "You are about to send information over the internet. It might be possible for others to view ..." and to make people understand that this is true and they should not forget about it. The default "secure connection" warning gives a false sense of security, as it makes no mention of the ability to track down an IP address to an ISP assigned customer's activities, encrypted session or not.
6. To refute the concept that site operators and ISPs, rather than users themselves, should be the parties which protect Internet users by privacy policy and usage log retention management.
Site operators interested in traffic levels can run counters. ISPs notoriously ignore any concept of privacy; their email server logs are readable by admins, and their web server logs chart advertiser success.
7. Describe the benefits of no-logging over privacy policies.
Absolute no-logging as a policy can be more secure. The absence of information means that a compromise is less meaningful. The ability to read server logs is one of the pleasures of a system cracker. The only meaningful log to system administration is the webserver error log, and system daemon message logging. Everything else is trivia, or navel gazing.
8. Describe the benefits of users' controlling what logs are created, retained and managed.
9. Descriptions of technical means for the user to prevent usage logging rather than rely on site operators' and ISPs' privacy assurances.
10. Means to deceive web logging programs used by sites which will not agree to no-logging.
These three points force us to confront the request/response nature of the HTTP protocol. To successfully communicate, you must self-identify. Proxying itself is subject to the same logging problem. In a heavily ad and traffic driven industry, the logfiles demonstrate success; this success can be denoted anonymously in a webserver log, but not without technical expertise. Logging is seen to be a "value-add" for hosting companies. The sense of this could theoretically be reversed.
Unless no-logging is seen to be a value-add of its own by both surfers and sites, we will not see it put in practice. Traffic can be measured by byte transfer and bandwidth usage as effectively; measuring network usage is done by default on most UNIX systems; counting and charting the number of webserver processes responding to client requests is a very effective way of seeing how busy a dedicated webserver is at any time; transaction based webservers are already counting sales and money, and that is what matters to their owners. Extra server traffic can show up in the network transfer data; larger server logs without higher sales mean nothing; they mean nothing and are of no value anyway unless they are data mined by law enforcement or demographers.
11. Means for detecting covert logging taking place behind privacy policy.
Trusted third party verification would be required, much like CAs. If a site is running advertising it's fair to say it's logging and data mining.
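Caunter's alternative to per-request logs, counting traffic rather than recording visitors, can be sketched as a counter wrapped around a web application. The sketch assumes a Python WSGI application rather than Apache, which the thread discusses, and `TrafficCounter` and `hello_app` are hypothetical names. It keeps only a request count and the number of body bytes sent, and never reads the client address, URL, Referer or User-Agent from the request.

```python
import threading
from wsgiref.util import setup_testing_defaults


class TrafficCounter:
    """WSGI middleware that keeps aggregate counts and no per-visitor data.

    Hypothetical sketch: it never reads REMOTE_ADDR, the URL, Referer or
    User-Agent from `environ`; it only counts requests and body bytes.
    """

    def __init__(self, app):
        self.app = app
        self.requests = 0
        self.bytes_sent = 0
        self._lock = threading.Lock()  # servers may call us from several threads

    def __call__(self, environ, start_response):
        with self._lock:
            self.requests += 1
        result = self.app(environ, start_response)
        try:
            for chunk in result:
                with self._lock:
                    self.bytes_sent += len(chunk)
                yield chunk
        finally:
            # WSGI requires calling close() on the app's iterable if it has one.
            if hasattr(result, "close"):
                result.close()


def hello_app(environ, start_response):
    """Trivial example application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    counted = TrafficCounter(hello_app)
    for _ in range(3):
        environ = {}
        setup_testing_defaults(environ)  # fills in a dummy request
        b"".join(counted(environ, lambda status, headers: None))
    print(counted.requests, counted.bytes_sent)  # 3 18
```

Because nothing per-client is stored, a break-in or a subpoena against this data yields only totals, which is the property Caunter argues for under point 7.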
Date: Tue, 01 Apr 2003 21:53:39 -0600
To: jya@pipeline.com
From: namebase@earthlink.net
Subject: Copy of my email to Jeff Ubois
Dear Jeff:
I have a comment on logging and such. There is a page I put up that tries to address this issue, at:
http://www.google-watch.org/cgi-bin/urldemo.htm
Basically, I think you should approach Apache and ask them to change the default configuration for logging so that it does not include the QUERY_STRING. That's the portion after the question mark in a CGI request. In the case of search engines, this string contains the search terms. These get propagated all over the world because they often end up as REFERER strings in other logs. Search terms are rather sensitive items of information.
It is already possible to configure Apache logging to strip out the QUERY_STRING, but it takes a fair amount of effort and research to pull it off, so no one does it. What I'm recommending is that Apache change their default logging so that it's the other way around -- it ought to require a lot of trouble and effort to make the logs *include* the QUERY_STRING!
This one change would make a very big difference in the total privacy picture.
Regards,
Daniel Brandt
---------------------------------------------------------------------
Public Information Research, PO Box 680635, San Antonio TX 78268-0635
Tel:210-509-3160 Fax:210-509-3161 Nonprofit publisher of NameBase
http://www.namebase.org/ namebase@earthlink.net
---------------------------------------------------------------------
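Brandt's proposal is to keep the QUERY_STRING (the part of a URL after `?`) out of logs by default. A minimal Python sketch of the scrubbing step follows; `strip_query` and `scrub_request_line` are hypothetical helper names, and the sketch assumes the log records a request line of the form `METHOD target PROTOCOL`. (In Apache's own mod_log_config, `%U` logs the URL path without the query string while `%r` logs the full request line including it; a referring page's query string would still arrive via `%{Referer}i` unless that field is also left out.)

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return `url` without its query string (and fragment), e.g. for a Referer."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def scrub_request_line(request_line):
    """Drop the query string from a request line like 'GET /path?q=x HTTP/1.0'."""
    parts = request_line.split(" ")
    if len(parts) != 3:
        # Unexpected shape: err on the side of dropping everything after '?'.
        return request_line.split("?", 1)[0]
    method, target, protocol = parts
    return " ".join((method, target.split("?", 1)[0], protocol))


if __name__ == "__main__":
    print(scrub_request_line("GET /cgi-bin/search?q=private+matter HTTP/1.0"))
    # GET /cgi-bin/search HTTP/1.0
    print(strip_query("http://www.google.com/search?q=private+matter&hl=en"))
    # http://www.google.com/search
```

Applying the scrub before anything is written, rather than cleaning logs afterwards, matches Brandt's point that exclusion should be the default.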
Date: Thu, 03 Apr 2003 14:04:14 +0100
From: Ben Laurie <ben@algroup.co.uk>
To: John Young <jya@pipeline.com>
Cc: cypherpunks@lne.com, cryptography@wasabisystems.com
Subject: Re: Logging of Web Usage
John Young wrote:
> Ben,
>
> Would you care to comment for publication on web logging
> described in these two files:
>
> http://cryptome.org/no-logs.htm
>
> http://cryptome.org/usage-logs.htm
>
> Cryptome invites comments from others who know the capabilities
> of servers to log or not, and other means for protecting user privacy
> by users themselves rather than by reliance upon privacy policies
> of site operators and government regulation.
>
> This relates to the data retention debate and current initiatives
> of law enforcement to subpoena, surveil, steal and manipulate
> log data.
I don't have time right now to comment in detail (I will try to later), but it seems to me that, as someone else commented, relying on operators to not keep logs is really not the way to go. If you want privacy or anonymity, then you have to create it for yourself, not expect others to provide it for you.
Of course, it is possible to reduce your exposure to others whilst still taking advantage of privacy-enhancing services they offer. Two obvious examples of this are the mixmaster anonymous remailer network, and onion routing.
It seems to me if you want to make serious inroads into privacy w.r.t. logging of traffic, then what you want to put your energy into is onion routing. There is _still_ no deployable free software to do it, and that is ridiculous[1]. It seems to me that this is the single biggest win we can have against all sorts of privacy invasions.
Make log retention useless for any purpose other than statistics and maintenance. Don't try to make it only used for those purposes.
Cheers,
Ben.
[1] FWIW, I'd be willing to work on that, but not on my own (unless someone wants to keep me in the style to which I am accustomed, that is).
--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/
"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff
Date: Thu, 3 Apr 2003 12:09:41 -0600
From: Keith Ray <keith@nullify.org>
To: Cypherpunks <cypherpunks@lne.com>
Subject: Re: Logging of Web Usage
Quoting Ben Laurie <ben@algroup.co.uk>:
> It seems to me if you want to make serious inroads into privacy w.r.t.
> logging of traffic, then what you want to put your energy into is onion
> routing. There is _still_ no deployable free software to do it, and that
> is ridiculous[1]. It seems to me that this is the single biggest win we
> can have against all sorts of privacy invasions.
This sounds like an interesting project to work on. It's hard to believe that only the DoD has played with this technology. Onion routing would seem to have a much larger impact on personal privacy on the Internet than projects like Freenet ever could.
After browsing through some of the descriptions of the system, it appears to be a real-time remailer-type system for IP traffic. A client proxy will take the IP traffic, break it up into identically sized packets, and then layer encrypt them starting with the last onion router to the first. Each router along the path would decrypt its layer and then forward the packet to the next router.
The part that I am worried about is the liability of running an exit router. I ran a mixmaster remailer for over six months and found out first hand the reaction of people to receiving anonymous death-threats, racial slurs, and spam. The saving grace was the opt-out list for people to refuse to receive future anonymous messages. However, with a real-time system that could encapsulate all IP traffic, this could be used for anonymous hacking. Even if you limit the exit remailer's traffic to just port 80 and actual HTTP requests, there are plenty of exploits and probes that require nothing more. Thanks to the PATRIOT act, those of us in the US can look forward to federal prosecution with possible life sentences if the wrong system is hacked through a router. When the FBI comes knocking, I doubt they will be satisfied with anonymous free speech arguments.
DoD's Onion Routing research project
http://www.onion-router.net/
--
Keith Ray <keith@nullify.org> -- OpenPGP Key: 0x79269A12
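The layering Ray describes (the client encrypts for the last router first and the first router last; each router removes one layer and forwards what remains) can be illustrated with a toy in Python. This is not the Onion Routing project's protocol: the per-router keys are assumed to be shared already (real systems negotiate them with public-key cryptography), the SHA-256 counter-mode keystream XORed onto the data is a stand-in cipher that is insecure once a key is reused, and packets grow by 8 bytes per hop instead of being padded to identical sizes. All names, including the `HOP_LEN` field width, are hypothetical.

```python
import hashlib
import secrets

HOP_LEN = 8  # hypothetical fixed-width field holding the next router's name


def keystream(key, length):
    """Derive `length` pseudo-random bytes from key (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key, data):
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(route, payload):
    """Wrap payload for route = [(name, key), ...], entry router first.

    The exit router's layer is applied first and the entry router's last,
    so the entry router peels the outermost layer.
    """
    packet = xor_layer(route[-1][1], b"EXIT".ljust(HOP_LEN) + payload)
    for i in range(len(route) - 2, -1, -1):
        next_hop = route[i + 1][0].encode("ascii").ljust(HOP_LEN)
        packet = xor_layer(route[i][1], next_hop + packet)
    return packet


def peel(key, packet):
    """One router's job: strip its layer, read the next hop, keep the rest."""
    plain = xor_layer(key, packet)
    return plain[:HOP_LEN].rstrip(), plain[HOP_LEN:]


def relay(keys, entry, onion):
    """Simulate forwarding through the routers; returns (exit router, payload)."""
    hop = entry
    while True:
        next_hop, onion = peel(keys[hop], onion)
        if next_hop == b"EXIT":
            return hop, onion
        hop = next_hop.decode("ascii")


if __name__ == "__main__":
    names = ["r1", "r2", "r3"]
    keys = {name: secrets.token_bytes(32) for name in names}
    onion = build_onion([(n, keys[n]) for n in names], b"GET / HTTP/1.0")
    print(relay(keys, "r1", onion))  # ('r3', b'GET / HTTP/1.0')
```

Each router learns only its neighbours: the entry router sees the client and the next hop, and only the exit router sees the plaintext request, which is why Ray's liability concern centres on exit routers.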
Date: Wed, 2 Apr 2003 13:19:40 -0800 (PST)
From: Morlock Elloi <morlockelloi@yahoo.com>
Subject: Re: Logging of Web Usage
To: cypherpunks@lne.com
Frankly, it seems that some brains around here are softening.
Relying on httpd operators to protect those who access is plain silly, even if echelon (funny how that word dropped below radar lately) did not exist.
The proper way is, of course, self-protection. Start with tight control of outgoing info from the end-user machine (remove or fake all fields that are not essential, such as referrer, client application, client OS). Use proxies. If you own a multi-IP subnet randomly switch the originating IP - this fucks up most automated tracking.
What doesn't exist is mixmaster-grade anon re-httpers. I guess that ones that would let just text through (no images/scripting etc.) would be repulsive enough for wide public and therefore useful.
Once you provide your data, it is always retained forever. Learn to live with it.
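Morlock Elloi's first step, removing or faking non-essential fields in outgoing requests, might look like the following filter, applied by a local proxy before a request leaves the user's machine. It is a sketch under stated assumptions: headers are held in a plain dict, only Referer is dropped and User-Agent (which names the client application and OS) is replaced, and `scrub_headers` and the replacement string are hypothetical.

```python
# Hypothetical replacement value; a common, bland string blends in better
# than an unusual one.
GENERIC_USER_AGENT = "Mozilla/4.0 (compatible)"


def scrub_headers(headers):
    """Return a copy of outgoing request headers with identifying fields removed.

    Drops Referer and replaces User-Agent. Header names are matched
    case-insensitively, as HTTP requires; the input dict is not modified.
    """
    clean = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "referer":
            continue  # do not tell the site where the user came from
        if lowered == "user-agent":
            value = GENERIC_USER_AGENT  # hide browser and OS details
        clean[name] = value
    return clean


if __name__ == "__main__":
    outgoing = {
        "Host": "example.org",
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Referer": "http://www.google.com/search?q=private+matter",
        "Accept": "text/html",
    }
    print(scrub_headers(outgoing))
```

The client's IP address is not a header and is untouched by a filter like this, which is why the thread moves on to proxies and onion routing.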
Date: Wed, 2 Apr 2003 13:24:58 -0800
To: John Young <jya@pipeline.com>, Ben Laurie <ben@algroup.co.uk>
From: Bill Frantz <frantz@pwpconsult.com>
Subject: Re: Logging of Web Usage
Cc: cypherpunks@lne.com, cryptography@wasabisystems.com
The http://cryptome.org/usage-logs.htm URL says:
>Low resolution data in most cases is intended to be sufficient for
>marketing analyses. It may take the form of IP addresses that have been
>subjected to a one way hash, to refer URLs that exclude information other
>than the high level domain, or temporary cookies.
Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a computer for a few hours can reverse a one way hash by exhaustive search. Truncating IPs seems a much more privacy friendly approach.
This problem would be less acute with IPv6 addresses.
Cheers - Bill
-------------------------------------------------------------------------
Bill Frantz           | Due process for all | Periwinkle -- Consulting
(408)356-8506         | used to be the      | 16345 Englewood Ave.
frantz@pwpconsult.com | American way.       | Los Gatos, CA 95032, USA
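Frantz's point can be checked directly: hashing an IPv4 address without a key does not hide it, because anyone can hash every candidate address and compare. The Python sketch below searches a single /24 network to stay fast; a search over all 2^32 addresses differs only in running time. Hashing the dotted-quad string with SHA-1 is an assumption (the usage-logs paper does not name a hash), as are the helper names. Truncation, his suggested alternative, is shown by zeroing the host bits.

```python
import hashlib
import ipaddress


def hash_ip(ip):
    """Unkeyed one-way hash of an IPv4 address, as a log might store it."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def reverse_hash(digest, network):
    """Exhaustive search: hash every address in `network` until one matches."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == digest:
            return str(addr)
    return None  # the address was not in the searched range


def truncate_ip(ip, keep_bits=24):
    """Keep only the leading keep_bits of the address and zero the rest."""
    net = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(net.network_address)


if __name__ == "__main__":
    logged = hash_ip("192.0.2.57")               # what the log retains
    print(reverse_hash(logged, "192.0.2.0/24"))  # 192.0.2.57
    print(truncate_ip("192.0.2.57"))             # 192.0.2.0
```

Truncating to /24 leaves a block of 256 addresses rather than one; it still reveals the network, which is often enough to identify the ISP, so the choice of how many bits to keep is itself a privacy decision.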
Date: Thu, 3 Apr 2003 01:05:27 +0200 (CEST)
From: Thomas Shaddack <shaddack@ns.arachne.cz>
To: Morlock Elloi <morlockelloi@yahoo.com>
cc: <cypherpunks@lne.com>
Subject: Re: Logging of Web Usage
> Relying on httpd operators to protect those who access is plain silly,
> even if echelon (funny how that word dropped below radar lately) did
> not exist.
Echelon could be grouped together with Carnivore and CALEA devices into the group of Generic Transport-level Eavesdroppers. No need to consider it separately, at least for technological purposes. (...am I right?)
> What doesn't exist is mixmaster-grade anon re-httpers. I guess that ones that
> would let just text through (no images/scripting etc.) would be repulsive
> enough for wide public and therefore useful.
Could it be constructed as eg. a FreeNet extension? Piggybacking on an existing system is easier than rolling out a whole new thing.
> Once you provide your data, it is always retained forever. Learn to
> live with it.
What worries me a LOT is Google (and search engines in general). Very useful tool, and way too attractive to profile people by their search queries.
Date: Wed, 2 Apr 2003 18:16:18 -0800
From: Seth David Schoen <schoen@loyalty.org>
To: cypherpunks@lne.com, cryptography@wasabisystems.com
Subject: Re: Logging of Web Usage
Bill Frantz writes:
> The http://cryptome.org/usage-logs.htm URL says:
>
> >Low resolution data in most cases is intended to be sufficient for
> >marketing analyses. It may take the form of IP addresses that have been
> >subjected to a one way hash, to refer URLs that exclude information other
> >than the high level domain, or temporary cookies.
>
> Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a
> computer for a few hours can reverse a one way hash by exhaustive search.
> Truncating IPs seems a much more privacy friendly approach.
>
> This problem would be less acute with IPv6 addresses.
I'm skeptical that it will even take "a few hours"; on a 1.5 GHz desktop machine, using "openssl speed", I see about a million hash operations per second. (It depends slightly on which hash you choose.) This is without compiling OpenSSL with processor-specific optimizations.
That would imply a mean time to reverse the hash of about 2100 seconds, which we could probably improve with processor-specific optimizations or by buying a more recent machine. What's more, we can exclude from our search parts of the IP address space which haven't been allocated, and optimize the search by beginning with IP networks which are more likely to be the source of hits based on prior statistical evidence. Even without _any_ of these improvements, it's just about 35 minutes on average.
I used to advocate one-way hashing for logs, but a 35-minute search on an ordinary desktop PC is not much obstacle.
It might still be helpful if you used a keyed hash and then threw away the key after a short time period (perhaps every 6 hours). Then you can't identify or link visitors across 6-hour periods. If the key is very long, reversing the hash could become very hard.
The logging problem will depend on what server operators are trying to accomplish. Some people just want to try to count unique visitors; strangely enough, they might get more privacy-protective (and comparably precise) results by issuing short-lived cookies.
--
Seth David Schoen <schoen@loyalty.org> | Very frankly, I am opposed to people
http://www.loyalty.org/~schoen/        | being programmed by others.
http://vitanuova.loyalty.org/          |     -- Fred Rogers (1928-2003),
                                       |        464 U.S. 417, 445 (1984)
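Schoen's arithmetic, written out: 2^32 IPv4 addresses at about 10^6 hashes per second take about 4,295 seconds to exhaust, and on average the match turns up halfway, at 2^31 / 10^6, or about 2,147 seconds (just under 36 minutes). His alternative, a keyed hash whose key is discarded after a period such as 6 hours, is sketched below in Python. HMAC-SHA256 is an assumed choice of keyed hash (he does not name one), truncating the output to 16 hex characters is arbitrary, and `RotatingKeyHasher` is a hypothetical name; the clock is injectable so that the rotation can be tested.

```python
import hashlib
import hmac
import secrets
import time


class RotatingKeyHasher:
    """Pseudonymize IP addresses with a keyed hash whose key changes each period.

    Hypothetical sketch: HMAC-SHA256 under a random key that is replaced
    every `period_seconds` (6 hours by default).
    """

    def __init__(self, period_seconds=6 * 3600, clock=time.time):
        self.period = period_seconds
        self.clock = clock      # injectable so rotation can be tested
        self._epoch = None      # index of the period the current key belongs to
        self._key = None

    def _current_key(self):
        epoch = int(self.clock() // self.period)
        if epoch != self._epoch:
            # New period: replace the key. The old key is no longer referenced
            # and is never written anywhere (Python releases it but cannot
            # guarantee its bytes are wiped from memory).
            self._epoch = epoch
            self._key = secrets.token_bytes(32)
        return self._key

    def pseudonym(self, ip):
        """Return a short pseudonym for `ip`, stable only within the current period."""
        mac = hmac.new(self._current_key(), ip.encode("ascii"), hashlib.sha256)
        return mac.hexdigest()[:16]


if __name__ == "__main__":
    now = [0.0]
    hasher = RotatingKeyHasher(clock=lambda: now[0])
    first = hasher.pseudonym("192.0.2.57")
    same_period = hasher.pseudonym("192.0.2.57")
    now[0] += 6 * 3600  # move into the next 6-hour period
    next_period = hasher.pseudonym("192.0.2.57")
    print(first == same_period, first == next_period)  # True False
```

Within one period the same address maps to the same pseudonym, so visits can still be counted; after the key is replaced, old pseudonyms can be neither recomputed nor linked to new ones. Anyone who copies the key while it is live can still run the exhaustive search above, so the protection depends on the key really being discarded.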