Privacy Law and Policy Reporter
It has been said with appropriate irony that `in cyberspace, everyone will be anonymous for 15 minutes'.1 Cyberspace presents both an unexpected opportunity for private (and even anonymous) communications and transactions over distance, and the potential for a panopticon: surveillance more extensive than any previous form of social control.
This paper surveys some of the main elements of this relationship, starting with a look at some apparently new issues raised by the internet, and then by examining how existing privacy laws, particularly those dealing with interception and with privacy principles, deal with cyberspace issues.
Throughout the paper, the term `internet service provider' or `ISP' is used in a rather loose fashion, to encompass a variety of functionally distinct (though sometimes overlapping) parties such as internet access providers, content servers, content creators, and even carriers. As a general principle, liabilities should only fall on those with appropriate functional responsibilities, but this paper does not seek to draw out fine distinctions on that point.
The consequence is simply that vast quantities of personal information about all of us will be collected via a pervasive, world-wide network (and stored on machines connected to it), whether we know or care, an event new in world history. The accessibility or interconnectedness of this information is contingent on many factors -- including custom, public opinion and law -- but is unlikely to be contingent on any serious technical considerations. Because the information will have been collected by processes related to one pervasive network, any impediments to it being found, published, or related to other data elsewhere on the internet are easily removed if those who control the information wish to remove them.
The past's great protectors of privacy: cost, distance, incompatibility, undiscoverability etc, are all disappearing in the face of the internet and its protocols -- the great equalisers of the 21st century.
We also need to distinguish between those parts of a person's digital persona which are in `public' spaces in the sense of being able to be found by internet search engines or other means, and those parts which are in non-public spaces, either `proprietary' (the databases of a government or company) or `personal' (information found only on the networked computers of the person the subject of the information, or those that person has provided it to, such as by e-mail).
An important point is that those who hold parts of our digital persona in proprietary (or `closed') systems can easily cumulate that information with our `public' digital persona, as well as combining it with that held in other proprietary systems to which they have access. From the cumulative effect of our digital personae, others will draw inferences about our personalities, behaviour etc. The extent to which we will be able (technically and/or legally) to have multiple digital personae will be an important privacy issue.
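The cumulation described above can be sketched very simply: once records in different systems share a common identifier, merging a `public' digital persona with records from several closed systems is a trivial operation. All names and records below are invented for illustration.

```python
# A minimal sketch (hypothetical identifiers and records) of cumulating
# one person's digital personae across public and proprietary systems.

def cumulate(identifier, *sources):
    """Merge every record matching `identifier` across the given sources."""
    persona = {}
    for source in sources:
        record = source.get(identifier)
        if record:
            persona.update(record)
    return persona

# A "public" persona plus two proprietary ("closed") databases:
public_web = {"jsmith": {"newsgroup_posts": 42,
                         "homepage": "http://example.org/~jsmith"}}
bank_db = {"jsmith": {"balance": 1200}}
telco_db = {"jsmith": {"calls_per_month": 85}}

profile = cumulate("jsmith", public_web, bank_db, telco_db)
```

The point of the sketch is how little the merge depends on anything except a shared identifier -- which is why the ability to maintain multiple digital personae matters.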
We only exist virtually in cyberspace -- the digital persona is only a representation of the physical person that, as John Perry Barlow puts it, exists in `meatspace'. Identification occurs at the cyberspace/meatspace interface.
A well-known feature of cyberspace has been that it has often been relatively easy to impersonate someone. Recognising individuals over distance and time without recourse to human memory has always been a key organisational challenge for bureaucracies.4 Tokens, knowledge and biometrics, or combinations of these, provide the links between the physical person and the file. Identification in cyberspace intensifies the challenge because it removes any physical settings or proximity which assist identification, and it often requires real-time responses. The reliability of electronic commerce, of e-mail and other internet transactions, and the believability of a person's digital persona all depend to a very large extent on the continuing reliability of links between the virtual and physical person.
Biometric identifiers entered directly into networked devices will in the longer run provide a main means of identification. In the more immediate future, smart cards are likely to provide one of the main bridges between physical and virtual identity. They have many potential advantages because they can include in the one token:
(i) digital representations of value (e-cash or credit);
(ii) digital signatures (to provide authentication of messages transmitted); and
(iii) digital biometric identifiers (to guarantee security or access to networks). The portability of smart cards means they can be the link between mobile people and pervasive networks.
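The three roles listed above can be modelled as a single token. The following is a hedged sketch only: the field names and the equality-based matching are illustrative, and do not correspond to any real smart card standard (real biometric matching is fuzzy, not exact).

```python
# Sketch of one smart card token combining (i) value, (ii) a signing
# capability and (iii) a biometric identifier. All fields hypothetical.
from dataclasses import dataclass


@dataclass
class SmartCardToken:
    e_cash_cents: int          # (i) digital representation of value
    signing_key: bytes         # (ii) key used to produce digital signatures
    biometric_template: bytes  # (iii) reference template guarding access

    def authorise(self, presented_biometric: bytes) -> bool:
        # Grant access only if the live biometric matches the stored
        # template; real systems use fuzzy matching, not equality.
        return presented_biometric == self.biometric_template


card = SmartCardToken(e_cash_cents=5000,
                      signing_key=b"example-key",
                      biometric_template=b"thumbprint")
```

The design point is portability: because all three links between physical and virtual identity travel in the one token, the card can accompany a mobile person across pervasive networks.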
Search engines, robots and Internet indexes
One of the most difficult privacy problems of the internet is the power of search engines and indexing facilities. One of the main protectors of privacy on the internet, as elsewhere, was inefficiency -- that it was very difficult to find anything unless someone told you where it was.5 This changed somewhat with very extensive indexes of Internet sites like Yahoo,6 but has gone forever with the release in December 1995 of DEC's Alta Vista search engine,7 and with the subsequent proliferation of e-mail, telephone, address and Usenet directories.
Web pages and Usenet news posts
John Hilvert explains the travails of one user of the Alta Vista search engine:8
When Internet user, Ed Chilton heard about the hot new search engine, Alta Vista, from Digital Equipment Corporation (DEC), he had to try it out. Alta Vista was introduced as a free service back in December last year to showcase DEC's ability to handle the Internet, no matter how it scaled. Using high end DEC Alpha systems and sophisticated software, Alta Vista gobbles and disgorges in a very accessible way, the entire catalogue of some 22 million web pages (11 billion words) and about the last two months of the content of 13,000 news groups. It handles 5 million search requests a day.
Alta Vista uses robots (also known as spiders or webcrawlers)9 to trawl the internet, creating complete word occurrence indexes of every web page and every item posted to every news group that it is allowed to access. As a result it is now possible to search for any occurrence of a name or phrase occurring anywhere in the text of any web page, or in any news posting.
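The `complete word occurrence index' a robot builds can be sketched in miniature: every word of every fetched page is mapped back to the pages it occurs on, so finding any name or phrase word becomes a simple lookup. The pages below are invented stand-ins for fetched web pages or news posts.

```python
# Toy sketch of the word-occurrence indexing attributed to robots such
# as Alta Vista's webcrawler. Pages are hard-coded for illustration.
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


pages = {
    "http://example.org/a.html": "privacy on the internet",
    "http://example.org/b.html": "internet search engines",
}
index = build_index(pages)

# Searching for any occurrence of a word is now a dictionary lookup:
hits = index["internet"]
```

The privacy implication follows directly from the data structure: once built, the index answers "every page on which this name appears" as cheaply as any other query.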
Chilton said it was an important feature of newsgroups that users get to know each other's themes, axes to grind, and pet peeves. `What I do not expect is that the newsgroup clubhouse is bugged, and that what is said there, by any of us, will be recorded and made available to any person on the Internet, for whatever reason persons might have.' Chilton said DEC's Usenet search engine should be banned and its developers publicly brought to their knees.
The irony of all this is: I came across Chilton's privacy lament using the Alta Vista search engine.
As Mr Chilton lamented, the privacy issue here is that, although you must technically make such information available to everyone on the internet (either by posting it to a newsgroup or by putting it in a public_html directory) before robots can index it, you do not necessarily expect that it will be read by anyone outside those with whom you have some common experience, or that it will be used for purposes completely outside those for which it was provided. For example, those involved in creating web pages, or in newsgroup discussions, concerning (say) gay and lesbian issues or issues relating to minority religious groups could find that information about them was being systematically compiled and disseminated so as to harm them. Those who once valued the net as an escape from the values of small communities may find there is no longer any escape except behind barricades of secret communications.
Should there be some privacy right not to be indexed? It is a difficult issue which involves freedom of speech and freedom of the press considerations in a new context, and any legislative intervention could be dangerous indeed.
A possible drawback of this single-file approach is that only a server administrator can maintain such a list, not the individual document maintainers on the server. This can be resolved by a local process to construct the single file from a number of others, but if, or how, this is done is outside of the scope of this document.
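The single-file approach referred to is the `/robots.txt' convention of the Robot Exclusion Standard: a compliant robot fetches that one file from the server and tests each URL against its rules before indexing. A minimal sketch, using Python's standard robots.txt parser and invented rules:

```python
# Check URLs against a server-wide robots.txt, as a well-behaved robot
# would. The Disallow rule below is illustrative only.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant robot consults the parsed rules before fetching each page:
allowed = parser.can_fetch("*", "http://example.org/index.html")
blocked = parser.can_fetch("*", "http://example.org/private/diary.html")
```

Note that the standard is purely voluntary: nothing in the protocol prevents a robot from ignoring the file, which is why the `opt out' character of the solution matters.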
If there was a change to the html mark-up standard so that pages could contain information in their header that excluded robot indexing on a page-by-page basis, then such a technical solution would largely solve the problem -- provided all robots obeyed the Robot Exclusion Standard. This would in effect be an `opt out' solution to the problem.
Such a solution has already been adopted with Usenet news posts. Deja News, Alta Vista and some other search facilities allow users to insert the flag `x-no-archive:yes' at the beginning of each post, and they are then not indexed.
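The opt-out just described depends on each archiving service checking posts for the flag before indexing them. A sketch of that check, with invented example posts (real services may match the header more or less strictly than this):

```python
# Sketch of an archiver honouring the `x-no-archive: yes' opt-out flag
# placed at the beginning of a Usenet post. Example posts are invented.
def should_archive(post_text: str) -> bool:
    """Return False if the post opts out of archiving via x-no-archive."""
    for line in post_text.splitlines()[:5]:  # flag appears at the top
        if line.strip().lower().replace(" ", "") == "x-no-archive:yes":
            return False
    return True


flagged = "X-No-Archive: yes\nPlease don't index this discussion.\n"
normal = "Hello all,\njust a question about modems.\n"
```

As with robots.txt, the burden falls on the poster to opt out, and on each indexing service to volunteer compliance.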
`Living down' old Internet information is still possible. Web-indexing engines only maintain details of current versions of pages. Most Usenet indexes (for example, Alta Vista) only retain postings for a few weeks or months, but Deja News intends to archive all Usenet posts as far back as it can. However, it does accept requests for old posts to be deleted -- again, an opt-out solution.
Several sites have launched recently that allow you to search for e-mail addresses. At this stage, the results tend to be hit or miss, but it won't be long before these services are as comprehensive as other search engines. While being able to find an old friend or a distant relative is a valuable service, it can raise some privacy issues. First, you might not even know that you're listed. Second, if you want to remove your name, you have to write to each service and request that your name be deleted.

Again, where such location information is culled from a wide variety of sources and aggregated, the surveillance capacity of the Internet could severely hamper participation in it by individuals who did not wish such location information concerning them to be instantly available, centrally stored, and regularly updated.
The Four 11 White Page Directory FAQ, the Bigfoot FAQ, and the Internet Address Finder FAQ all explain how you can remove your name from their list. However, you may have to be persistent. When I sent a message to Four 11 asking to have my name removed, I got an e-mail back asking me to reconsider. (Weeks later, Four 11 still hasn't removed my name.) Notably, most of these sites built their databases with names culled from Usenet.
In addition to cataloguing online information, there are several directories on the Web that allow you to search for offline information, such as phone numbers and addresses -- one site even links successful matches to a map showing how to find a person's home! Switchboard offers both a business and a residential directory. Switchboard also publishes a policy statement explaining where the company gets its data, and how you can remove your name.
If so, can the mere fact of external indexing by a search engine then turn something into a generally available publication, destroying otherwise existing privacy rights? That would be somewhat paradoxical. It might be argued that the act of indexing by an index like Alta Vista or a location service was, in some cases, an unfair collection practice (IPP 1) -- which does apply to generally available publications.
With most web browsing software, such as Netscape Navigator or Microsoft Internet Explorer, any request to a web site discloses to the web server accessed:13
Current browsers don't allow these disclosure mechanisms to be turned off, although it is not obvious why users could not be given the option to turn off any of them other than the first one listed. A user can delete cookies from his or her machine, but they are like (if mixed Mediterranean metaphors are allowed) a Medusa-like Trojan horse that keeps reappearing inside your PC, no matter how often you trash it. Commenting on cookies, but with comments equally applicable to other forms of disclosure, Marc Rotenberg identifies the privacy issue as `data collection practices should be fully visible to the individual ... Any feature which results in the collection of personally identifiable information should be made known prior to operation and ... the individual should retain the ability to disengage the feature if he or she so chooses.'16
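The cookie file mentioned in footnote 14 is a plain tab-separated text file, which is what makes it easy to inspect (and delete). A sketch of reading one line of a Netscape-format COOKIES.TXT; the field layout follows the Netscape convention (domain, domain-wide flag, path, secure flag, expiry time, name, value), and the sample line itself is invented:

```python
# Parse one line of a Netscape-format cookie file. The sample cookie
# is hypothetical; field order follows the Netscape convention.
def parse_cookie_line(line: str) -> dict:
    domain, domain_wide, path, secure, expiry, name, value = (
        line.rstrip("\n").split("\t"))
    return {
        "domain": domain,
        "path": path,
        "secure": secure == "TRUE",      # sent only over secure connections?
        "expires": int(expiry),          # seconds since the epoch
        "name": name,
        "value": value,
    }


sample = ".example.org\tTRUE\t/\tFALSE\t946684800\tvisitor_id\tabc123"
cookie = parse_cookie_line(sample)
```

The `value' field is opaque to the user: typically an identifier that only the issuing site can relate back to its own records of previous visits.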
Another area where web users may have little awareness of who is capable of finding out details of their browsing habits arises from the use of proxy servers and proxy caches. To conserve bandwidth and costs, an internet service provider (ISP) caches all pages accessed by its users, so that subsequent users access copies of the page in the ISP's cache, rather than on the `original' site. However, this means that an ISP which is potentially local to the user -- and with whom the user is a client -- can record information about the user's browsing habits which the user would rather have known only by a server on the other side of the world. There are many other aspects of monitoring of network usage that also raise privacy issues. The effect of telecommunications interception laws will be considered later.

Graham Greenleaf, General Editor. Thanks to Nigel Waters and Roger Clarke for their valuable comments on the draft.
1. I stole this quip from John Hilvert, via Andy Warhol and who knows who else ...
2. Science fiction writers and others make it a lot more than that! -- see http://www.anu.edu.au/people/Roger.Clarke/EscVel.html
3. Roger Clarke `The Digital Persona and its Application to Data Surveillance', The Information Society, March 1994; for abstract only http://www.anu.edu.au/people/Roger.Clarke/DV/AbstractDigPersona.html
4. Roger Clarke `Human Identification in Information Systems: Management Challenges and Public Policy Issues' Information Technology and People http://www.anu.edu.au/people/Roger.Clarke/DV/HumanID.html
5. A year ago I heard the world-wide-web compared to removing the spines from all the books in the Library of Congress, just as a tornado hit. You wouldn't say that any more.
8. John Hilvert `Private Lies', Information Age, May 1996, pp 18-23
9. See `World Wide Web Robots, Wanderers, and Spiders' http://info.webcrawler.com/mak/projects/robots/robots.html
10. `A Standard for Robot Exclusion' -- http://info.webcrawler.com/mak/projects/robots/norobots.html
11. In the file at the local URL `/robots.txt' -- which only the server administrator could normally access.
12. Susan Stellin `How private is your personal information?' in C-Net's Digital Life series, 1996 -- http://www.cnet.com/Content/Features/Dlife/Privacy2/
13. Geoff King, Manager of AustLII (http://www.austlii.edu.au/), assisted with some of these details.
14. For example, with Netscape, by way of a file called COOKIES.TXT on Windows machines or `MagicCookie' on the Apple Macintosh; see http://proxima.cs.purdue.edu:7000/remote.html to inspect the contents of the cookie on your computer, and other information
15. The most likely use of cookie information is therefore for sites you have previously visited to customise the appearance of their site to take into account what they already know about you.
16. Quoted in John Hilvert, op cit.