Contributed Papers Schedule:
Each speaker will be allotted 15 minutes, plus 5 minutes for questions. Facilities for PowerPoint presentations will be available.
Thursday, October 14th
4:00PM – 6:00PM
Chair: Dr. Bamshad Mobasher, Associate Professor, CTI
Mr. Alex Cameron, University of Ottawa, Law and Technology, "Pipefitting for Privacy: Internet service providers, privacy and DRM"
Dr. Yucel Saygin, Sabanci University, "Are Anonymous Documents Really Anonymous in the Age of Data Mining?"
Prof. Dennis Hirsch, Capital University Law School, "Regulating Privacy: Environmental Law for the Second Industrial Revolution"
4:00PM – 6:00PM
Chair: Dr. Katherine Strandburg, J.D., Assistant Professor, CIPLIT
Mr. Orville Wilson, Security Consultant, "Security and Usability: The Viability of Passwords and Biometrics", Presentation (PPT ~ 210 KB)
Dr. Arthur Keller, UC Santa Cruz and Open Voting Consortium, "Privacy Issues in an Electronic Voting Machine", Presentation (PPT ~ 150 KB)
Prof. Alessandro Acquisti, Carnegie Mellon University, "Privacy and Rationality: Evidence from Survey Data", Presentation (PPT ~ 123 KB)
Dr. Bo Xu, Dr. Ouri Wolfson, University of Illinois at Chicago, "Privacy Solutions by Mobile Peer-to-Peer Communication"
Friday, October 15th
11:15AM – 1:00PM
Chair: Dr. Daniela Raicu, Assistant Professor, CTI
Prof. Eric Goldman, Marquette University Law School, "Data Mining, Unwanted Marketing and Attention Consumption", Presentation (PPT ~ 113 KB)
Dr. Traian Truta, Northern Kentucky University, "Global Disclosure Risk for Microdata with Continuous Attributes", Presentation (PPT ~ 400 KB)
Dr. Yuval Elovici, Ben-Gurion University, "Hidden Web Privacy Preservation Surfing (Hi-WePPS) Model"
Mr. Stephen Raden, DePaul CTI, "Survey of National Identification in the USA: Past, Present, Future"
The Social Security number (SSN) was developed for the use of one government organization, the Social Security Administration, in 1943. It was eventually adopted by all government bodies as a unique identifier for U.S. citizens, as there was executive pressure on these branches not to waste research and development expense on numbering schemes of their own. The SSN has been widely criticized as an inadequate number for data processing, even by trusted bodies within the government. It has poor numerical properties for security: it is short, so the odds that a random nine-digit number is a valid SSN are good, and it contains no checksum, so a number cannot be confirmed as valid without a lookup in an additional data source. Nor is every citizen guaranteed to have one, since the number is issued on request, giving rise to further problems of error and fraud. In the current era, the SSN has been adopted as a unique identifier of individuals in a large percentage of information systems across organizations and businesses both large and small. In database terms, if the SSN does not act as the primary key or unique index for records on individuals, it is likely to appear as a candidate key in design and as an indexed search key in implementation. Not only does the (hoped-for) uniqueness of the SSN make it a desirable number for organizations to use; responsibility for possession and recall of the number is also pushed away from the business and onto the consumer. Declining to offer one's Social Security number when setting up an account or requesting a service is, in many cases, considered akin to being a Luddite or an obstructionist. To participate in modern life, one invariably gives this number out to more than a few non-governmental enterprises.
In the United States, where the same unique identifier officially used for social benefits and taxpayer identification is propagated in a myriad of non-official places and in insecure transactions and files, the opportunities for identity theft, privacy invasion, and criminal fraud soar. It would therefore be useful to examine: 1. The history of unique identifiers in this country as well as in other nations, including privacy, legislative, information-theoretic, numerical, or cryptographic concerns. 2. The issues involved in migrating to another number, and the difficulty of doing so in database-centric information systems. I propose to look at the properties of the new unique identifiers, the algorithms used to effect the data transformation, and how application and business processes would also be migrated away from use of the Social Security number. 3. The pros and cons of biometric and other advanced technological solutions for the absolute identification of individuals. References: Garfinkel, Simson. Database Nation: The Death of Privacy in the 21st Century. O'Reilly & Associates, 2000.
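The abstract's point about the SSN's poor numerical properties can be illustrated with a short sketch. The format check below is a simplified, hypothetical stand-in for the SSA's actual issuance rules; the contrast is with the Luhn checksum used by credit card numbers, which detects any single mistyped digit arithmetically, with no database lookup.

```python
def ssn_plausible(ssn: str) -> bool:
    """Structural check only (simplified sketch of SSA format rules).

    There is no checksum, so nothing in the digits themselves
    proves that the number was ever issued.
    """
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    return area not in ("000", "666") and group != "00" and serial != "0000"


def luhn_valid(number: str) -> bool:
    """Luhn checksum, as used by credit card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Almost any random nine-digit string passes the SSN format check...
print(ssn_plausible("123-45-6789"))  # True -- but proves nothing
# ...whereas a checksummed scheme rejects most random digit strings.
print(luhn_valid("79927398713"))     # True (standard Luhn test number)
print(luhn_valid("79927398714"))     # False (last digit altered)
```

This is why a raw SSN, unlike a checksummed identifier, cannot even be sanity-checked without consulting an external data source.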
Mr. Alex Cameron, University of Ottawa, Law and Technology, "Pipefitting for Privacy: Internet service providers, privacy and DRM"
If Internet service providers (ISPs) are merely conduits - as the metaphor goes - then the prospect of digital rights management (DRM) ought to raise concerns about leaky pipes. DRM holds great promise for copyright owners and others who seek a solution to online copyright infringement. It does so, however, by siphoning personal data from ISPs’ pipes, tapping directly into them. DRM also promises copyright owners new business models and new revenue streams precisely because it can change the way that content flows through the Internet's intricate system of pipes. Consequently, the promise of DRM is potentially accompanied by the peril of individual privacy. This paper explores the unique position and power of ISPs in the ongoing debate regarding DRM and its effect on personal privacy. As operators of the pipes that connect individuals to the Internet, ISPs are uniquely situated between customers and copyright owners’ online content. As a result of their special position, ISPs have been aggressively targeted by public and private entities for a variety of purposes. ISPs have in many cases attempted to maintain an intermediary status that requires them to balance a complex set of external rights and interests at all ends and connections of their pipes. For the increasing number of ISPs who have branched into providing content to users, this balancing challenge is even greater as interests internal to an ISP may conflict. Yet, for all ISPs, nowhere is their challenge more acute than in the privacy threats posed by DRM. Copyright owners are forging ahead with DRM and specifically targeting DRM implementation at ISPs. This is most clearly seen in the ISP experience in the United States. ISP responses to these targeting efforts are important because, when it comes to personal privacy, many DRM systems developed to date do not mesh well with the rights and interests of ISPs or their customers. 
This paper encourages ISPs to play an important role in building privacy-friendly pipes. Part one of this paper briefly describes the relationship between ISPs, technological protection measures, DRM systems and anti-circumvention laws. Part two examines the impact that DRM and the DMCA have on individuals' privacy. Arguing that ISPs are information sources and enforcement targets for a variety of entities and purposes, the third part of this paper suggests that ISPs have been, and will increasingly be, targets for DRM implementation. ISPs are distinct sources of information about what users do online as well as powerful controllers of what users do online – this is precisely the kind of information and control that DRM operates on. The final part of the paper ties together the themes above and asserts that privacy is taking on increased significance for ISPs in the DRM debate. This part argues that ISPs have a number of interconnected legal, business and social privacy-based reasons to continue to approach DRM with great caution. Asserting that ISPs stand to gain and accomplish a great deal by respecting and standing up for privacy, this part concludes the case for ISPs constructing privacy-friendly pipes between customers and content as part of a solution to the privacy perils raised by DRM.
Dr. Tal Zarsky, University of Haifa - Faculty of Law (Israel), "Information Privacy, Tailored Content & Unfair Advertising: Bringing It All Together; A Unified View of Personal Data Flows, Concrete Privacy Concerns and the Power of Persuasion in the Internet Society"
Over the last several years, legal scholars have been grappling with the concept of information privacy within the Internet context. As this debate unfolded, it has become apparent that regulators and legal scholars are still striving to establish the fundamental building blocks of the discussion, such as the reasons why information privacy is essential and the direct, precise detriments of its overall deterioration. Rather than relying on concrete examples, the privacy discourse commonly refers to somewhat abstract notions - such as liberty or autonomy - or to the negative visceral response that arises when the public is confronted with contemporary practices of personal data collection, aggregation and use (as reflected in public opinion polls). Yet in light of the prospect that strict data protection rules would be put in place, these explanations are insufficient, especially for those commercial entities that have a great deal to lose from restrictions on the collection and use of such information. The foundations of information privacy law must rest upon an understanding of the concrete detriments arising from personal data collection, analysis and use. This Article therefore intends to bridge the crucial gap between the somewhat abstract rationales for information privacy, the visceral feeling and public uproar associated with breaches of public trust, the actual and concrete detriments associated with today's personal information market, and the specific forms of privacy-enabling regulation that are or should be introduced. As an overall analysis of all possible detriments is beyond the scope of this Article, it addresses a specific subset of the issue: the actions of online marketers, advertisers, content providers and other entities that constantly interact with the public and, for a variety of reasons, have a vested interest in influencing it.
Such influence might amount to an attempt to persuade a specific group of individuals to purchase a product or service, or to adopt a different lifestyle or set of preferences. The new Internet environment places these commercial entities at a crucial nexus, where they are capable of both collecting vast amounts of personal information and using such data. The Article argues that a proper understanding of today's privacy concerns leads to acknowledging the new powers vested in these commercial entities, which are at times overwhelming and abusive, and therefore mandate regulatory intervention. To correctly identify the detriments stemming from the actions of such commercial entities, the analysis begins (in Part I) by addressing the overall flow of personal information, while taking into account the most recent business practices and technological advances. Starting with collection, the Article describes how the Internet allows content providers and others to gather personal information pertaining to individuals' actions, preferences and even personal traits. The Article explains why personal data available through the Internet is of greater quality and quantity than other forms of personal data available offline, while referring to the Internet's inherent traits as a medium that facilitates a digital environment and omnipresent surveillance. Moving down the information flow, the Article examines how technological advances in the fields of data storage and analysis allow commercial entities to derive knowledge from the rich datasets they now retain, while utilizing once unusable information. The Article emphasizes the use of data mining applications that allow analysts to derive additional patterns and clusters from databases in an automated process that does not require the initiation of a specific query or search.
Thereafter, the Article examines the uses of personal information by these entities, while emphasizing the Internet's ability to substantially alter the way in which firms interact with individuals. As opposed to the one-to-many, or “broadcast,” format, which required content providers to convey a uniform message to a diverse audience, the Internet provides for a one-to-one experience, in which content providers can tailor their interactions with every user. The personalized interface can be tailored to the specific individual on the basis of personal information previously collected from that user. This three-tier process (of the collection, analysis and use of personal data) must be viewed as an ongoing cycle that is constantly reassessed and refined, thus forming a feedback loop for the process of providing personally tailored content to every specific user. Finally, in this part, the Article addresses the trends of concentration in media markets, and the effects such trends will have on the severity of the privacy-related concerns addressed throughout the analysis. Although this issue is rarely associated with information privacy concerns, it is directly linked to the issues at hand, as the existence of such concentration both facilitates the collection of more and better personal information (in the hands of fewer market players) and allows for better opportunities for its subsequent use. The Article briefly addresses empirical findings regarding this issue, and examines how and which forms of concentration (be it horizontal or vertical) potentially amplify the ability of content providers to use personal information in their attempts to influence segments of the public.
After drawing out the new uses of personal information, Part II examines whether the online attempts to influence are (A) more effective than other widely practiced forms of advertising and persuasion, and (B) to be deemed, at times, problematic and unacceptable. The Article refers to several models drawn from the worlds of advertising and social psychology, and illustrates how the ability to deliver specifically tailored content, on the basis of personal information, will lead to more favorable results for those who strive to influence and persuade, even when carried out in the limited Internet context. The Article also compares these new online practices to other forms of advertising such as TV ads, direct mailings, telemarketing and door-to-door solicitations. Thereafter, the Article examines whether these new forms of tailored communications might lead to problematic outcomes. Here, the Article addresses three legal paradigms that are commonly used to assess communications and advertisements, referring to them as (1) means to provide information, (2) means to convey the truth about a product and (3) a way to influence the consumers' preferences. Every one of these paradigms introduces a different set of concerns, and specific legal terms and arguments, which frame abusive communications as incomplete & insufficient, unfair & deceptive, or overwhelming & impinging on the individuals' privacy. The analysis balances these concerns with the benefits of these innovative means of communication, which allow content providers to pursue their objectives in an efficient manner (thus benefiting their shareholders and the market as a whole) and provide users with content that is prepared especially for them. The Article concludes that problems may arise under every one of these paradigms, while emphasizing the content providers' enhanced ability to influence consumer preferences.
In Part III, the Article suggests solutions to the problems previously addressed, based on two distinct notions: (1) providing consumers with notice of the tailoring of individualized communications, and of the use of personal information within this process; and (2) assuring that users receive a balanced mix of messages. When addressing notification, the Article distinguishes between today's popular notion of requiring notification at the time personal information is collected and notification at the time of its use. The Article advocates the latter, explaining that it is required to allow users to recognize possible manipulative uses of personal information. In addressing balancing, the Article argues that a countering remedy to unfair, partial and manipulative advertising is assuring an open marketplace of ideas. Therefore, steps must be taken to assure that no single firm or voice will dominantly influence a specific set of consumers. This objective can be achieved by requiring dominant online players to operate a non-discriminating platform, and by implementing minimal changes in browser and search engine technologies. The Article also examines whether these solutions will constitute excessive impediments to the rights of those content providers that will be forced to comply. In doing so, the Article distinguishes between commercial and other speech, and examines whether specific Internet realms should be considered a public forum. In Part IV, the Article addresses the introduction of Google's innovative, yet controversial, free email application – Gmail. This service promises free access to a generously large email account but, in exchange, presents users with tailored advertisements, specifically matched to every user on the basis of the content of their email messages.
This latter service has led to a public uproar, which is somewhat surprising, as prior court decisions have already drawn out the very limited expectation of privacy that users have in their email accounts. Nevertheless, the public's visceral response was indeed warranted, as this new application brings together all the elements drawn out in Part I and might lead to the emergence of unfair and manipulative forms of advertisement. Therefore, the Article explains how the Gmail debate encapsulates the various issues addressed throughout the analysis, and suggests several simple solutions.
Dr. Yucel Saygin, Sabanci University, "Are Anonymous Documents Really Anonymous in the Age of Data Mining?"
Data mining, which aims to turn heaps of data into useful knowledge, has found many applications in both government and commercial organizations. Commercial organizations have used data mining techniques mostly for better customer relationship management and decision-making. Government agencies are using data mining techniques to track down suspicious behavior as well. For that reason, data collection initiatives such as CAPPS II (Computer Assisted Passenger Prescreening System) were proposed to collect passenger and itinerary information. Data collection efforts by government agencies and enterprises have raised many concerns among people about their privacy. In fact, the TIA (Total Information Awareness) project, which aimed to build a centralized database storing the credit card transactions, emails, web site visits, and flight details of Americans, was not funded by Congress due to privacy concerns. Privacy concerns have recently been addressed in the context of data mining (such as the work led by Rakesh Agrawal at IBM Almaden, and Chris Clifton at Purdue University). However, privacy-preserving data mining research mostly deals with the problem of mining data without actually seeing the confidential values. Another aspect of privacy-preserving data mining is the ability to protect data "against" data mining techniques, since privacy risks increase even more when we consider powerful data mining tools. This matters when the data itself is not confidential but the implicit knowledge that data mining tools could extract from it is. The following quote from a NY Times article supports our point: “Pentagon has released a study that recommends the government to pursue specific technologies as potential safeguards against the misuse of data-mining systems similar to those now being considered by the government to track civilian activities electronically in the United States and abroad”. (Markoff, J. (2002).
Study Seeks Technology Safeguards for Privacy. NY Times, 19 December.) In this work, we concentrate on privacy leaks in text, since text is one of the main data sources widely available, especially on the Internet (newsgroups, emails, technical papers, and many more). Text data sources have also been of interest to data mining research. Text mining techniques have been developed to classify a given text into predefined categories, or to cluster a given set of documents into groups to increase recall in document retrieval. Techniques for finding patterns in text, such as term associations, have also been developed. Using text classification techniques, one can identify the author of a text easily. In fact, authorship identification techniques have been used to resolve claims about some works of Shakespeare. However, authors of documents do not wish to be public in all cases; consider reviews of papers (or books) published on the Internet. If someone could collect a sufficient number of documents related to a set of authors, this set could be used to train a classification model to identify the author(s) of a given document. Consider the following scenario: some database conferences now use blind review, but the database community being very small, an exhaustive list of authors could easily be obtained from the Internet (for example, by querying the DBLP bibliography, http://www.informatik.uni-trier.de/~ley/db/, for the authors of published database papers). A curious reviewer could then employ some of his or her graduate students to collect a set of documents by these authors and construct a classification model to identify the authors of anonymous papers. Perhaps curiosity is not a very convincing motive for a reviewer to want to know the authors of the papers he or she is reviewing, but anger could be. Consider now an author whose paper is rejected, and who receives a rejection letter with a few paragraphs of anonymous reviews attached.
Knowing that reviewers are drawn mostly from the Program Committee of the conference, it is not difficult to obtain a full list of possible reviewers from the conference web site. A set of documents by these reviewers could likewise easily be crawled from the Internet to form a training set for authorship identification. Privacy may be even more important in the case of a declaration in the news that needs to remain anonymous for security reasons. In a preliminary experiment, we collected a set of articles by ten authors of a newspaper. Using very simple features such as word and punctuation frequency counts, we were able to train a Support Vector Machine model with which we could identify the authors of the newspaper articles with more than 90% accuracy. This was a very limited domain, with a limited set of authors, which resulted in high prediction accuracy. However, we believe that with some domain knowledge, the set of possible authors could be reduced (as in the conference review example) to increase accuracy in general. The first step in text anonymization is to remove explicit personally identifying information. Named entity extraction (NEA) techniques could be used to find and remove personally identifying information from text. NEA techniques were previously shown to be useful for sanitizing medical documents by Latanya Sweeney of Carnegie Mellon University. In recent work, we have also addressed how documents can be sanitized to remove explicit private, or personally identifying, information. However, full text anonymization also needs to consider data mining techniques for authorship identification of anonymous documents. This is a difficult task, since there are many features that could be used by text classification techniques. Another concern is preserving the quality of the released text (comprehensibility and readability) while anonymizing it.
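The frequency-based attribution experiment just described can be sketched in a few lines. The corpus below is a hypothetical toy, and a simple nearest-centroid classifier stands in for the Support Vector Machine the speakers actually trained; only the feature family (relative word and punctuation frequencies) matches their description.

```python
import re
from collections import Counter

PUNCT = ",.;:!?"

def features(text: str) -> dict:
    """Relative word and punctuation frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    feats = {w: c / n for w, c in Counter(words).items()}
    for p in PUNCT:
        feats["PUNCT" + p] = text.count(p) / max(len(text), 1)
    return feats

def sq_distance(a: dict, b: dict) -> float:
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)

def centroid(vecs: list) -> dict:
    keys = set().union(*vecs)
    return {k: sum(v.get(k, 0.0) for v in vecs) / len(vecs) for k in keys}

def attribute(anonymous: str, corpus: dict) -> str:
    """corpus maps author -> list of known texts; returns the author
    whose feature centroid is nearest to the anonymous document."""
    cents = {a: centroid([features(t) for t in texts])
             for a, texts in corpus.items()}
    f = features(anonymous)
    return min(cents, key=lambda a: sq_distance(f, cents[a]))

# Hypothetical toy corpus: one "author" writes short, comma-free
# sentences, the other long, comma-heavy ones.
corpus = {
    "terse":  ["It rains. We stay. We read.", "He left. She slept."],
    "ornate": ["It rains, and so, as ever, we stay, and we read.",
               "He left, slowly, and she, at last, slept."],
}
print(attribute("We eat. We walk. We rest.", corpus))  # terse
```

Even this crude stylometric signal separates the two styles; with richer features and a real SVM, the 90% accuracy reported above on ten newspaper authors becomes plausible.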
We initially identified k-anonymity, previously proposed for statistical disclosure control, as the basic privacy metric (see the work on disclosure control by Pierangela Samarati of the University of Milan and Latanya Sweeney of Carnegie Mellon University). Our work in progress builds on k-anonymity to anonymize a given set of documents so that the author of a document in that set cannot be distinguished from k-1 other authors. We also propose heuristics based on updating the documents to reduce the significance of the most important features used for document classification. Word counts can be updated without disturbing the readability of the text by replacing words with their synonyms. This homogenizes the word frequencies across documents so that word frequencies cannot be used in text classification. This initial work, which identifies data mining tools as a threat to the anonymity of documents, will hopefully lead to more research toward ensuring full anonymity.
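The synonym-replacement heuristic can be sketched as follows. The synonym map is a hand-made, hypothetical stand-in for a real thesaurus such as WordNet; the point is only to show how substitution flattens the word-frequency differences an authorship classifier would exploit.

```python
import re
from collections import Counter

# Hypothetical synonym map -- a real system would draw on a thesaurus.
SYNONYMS = {"big": "large", "huge": "large",
            "quick": "fast", "rapid": "fast",
            "begin": "start", "commence": "start"}

def normalize(text: str) -> str:
    """Replace each word by a canonical synonym, homogenizing the
    word frequencies that frequency-based classifiers rely on."""
    def repl(m):
        w = m.group(0)
        return SYNONYMS.get(w.lower(), w)
    return re.sub(r"[A-Za-z]+", repl, text)

doc_a = "We begin with a big idea and a quick plan."
doc_b = "We commence with a huge idea and a rapid plan."
print(normalize(doc_a))
print(normalize(doc_b))
# After normalization the two documents have identical word counts,
# so word-frequency features can no longer tell the authors apart.
assert Counter(normalize(doc_a).split()) == Counter(normalize(doc_b).split())
```

Real anonymization must of course preserve meaning and readability while doing this, which is exactly the open problem the abstract identifies.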
Prof. Dennis Hirsch, Capital University Law School, "Regulating Privacy: Environmental Law for the Second Industrial Revolution"
In the 19th Century, the Industrial Revolution emerged in England. It gave rise to many new products and businesses. It also generated environmental pollution at an unprecedented rate. These environmental harms far outstripped the ability of the existing legal system to deal with them. Over time, a new form of law -- environmental law -- emerged to address these injuries.
The end of the 20th Century has witnessed another economic transformation: the Information Revolution. Many believe that its impact on society will be comparable to that of the Industrial Revolution. Both events have generated new technologies and forms of business. Both have created new types of harms. Due to data mining, "cookies" and spam, information-related businesses are invading personal privacy in ways unknown to prior generations. Much like the environmental harms of a century ago, these new injuries are outstripping the ability of existing legal frameworks to address them. Already, there have been numerous calls for new laws to protect privacy, including new legislation and regulation. Some have already been adopted.
Legal scholars have joined enthusiastically in the search for a legal structure to protect privacy in the information age. Rather than reinventing the wheel, many have looked to existing areas of the law. Commentators have suggested that the basic principles of property law, contract law, trade secret law, and nuisance law, among others, might provide a useful framework for developing a legal regime to protect privacy. However, they have overlooked what may be the most instructive model for thinking about how to protect privacy in the information age: environmental law.
Environmental law holds promise for three reasons. First, it responds to the harms created by the first industrial revolution and should hold lessons for considering how to address those generated by the second. In addition, the underlying theories that explain the existence of environmental harms and that inform much of environmental regulation -- notably, the concepts of "negative externalities" and the "tragedy of the commons" -- apply quite readily, and with only a few minor adjustments, to the information-based harms to privacy. Finally, environmental regulation has been the most controversial and hotly-contested area of business regulation for the past few decades. More intellectual debate, policy experimentation and agency resources have gone into developing this body of regulatory law than almost any other area of administrative practice. The theory and practice of environmental law has accordingly developed a richness and depth that should make it a helpful resource for those who would think about privacy regulation.
This paper will examine whether environmental law might serve as a useful model for the further development of privacy regulation in the information age. It will begin by showing that environmental injuries and privacy harms resemble each other at a theoretical level. The concepts of negative externalities and the tragedy of the commons provide useful constructs for thinking about both phenomena. Having laid this theoretical foundation, the paper will then identify specific environmental laws and policies that might be adapted to the protection of privacy in the information age. Here, the paper will distinguish between traditional, first generation environmental policies – often referred to as "command and control" regulation – and more recent second generation approaches. It will argue that the centralized, command-and-control method is a poor match for rapidly-changing areas such as the Internet and other information-based fields. Second generation strategies that attempt to provide greater flexibility and cost-effectiveness, while remaining demanding in terms of results, may offer a better model. In other words, Internet regulation should learn from the environmental experience by skipping the first generation policy approach and moving right to the second. The paper will examine specific regulatory mechanisms used in the environmental field and will analyze whether they might successfully be applied to the task of privacy protection.
The author is a professor of environmental law at Capital University Law School and a graduate of Yale Law School. He studies new forms of environmental governance and how these initiatives can best promote environmental protection. He is the immediate past Chair of the ABA Environmental Section's Committee on Innovation, Management Systems and Trading -- a 120-member group that examines emerging policy strategies. He believes that privacy harms are the newest species of environmental harm, and that our half century of experience with environmental law can provide useful lessons for privacy regulation in the information age.
Prof. Daniel Steinbock, University of Toledo College of Law, "National Identity Cards: Fourth and Fifth Amendment Issues"
In the past three years there have been serious calls for a national identity system whose centerpiece would be some form of national identity card. Such a system is seen mainly as a tool against terrorists, but also as a useful response to illegal immigration, identity theft, and electoral fraud. This paper analyzes the Fourth and Fifth Amendment issues in two major features of any likely national identity system: requests or demands that individuals present their identification cards; and governmental collection, retention, and use of personal information to be used in identity checks. These issues are evaluated in several different contexts in which they might plausibly arise. The analysis takes account of Illinois v. Lidster and Hiibel v. Sixth Dist. Court of Nevada, two recent cases bearing on the issues that were decided by the Supreme Court during its 2003 term. The paper aims to specify the constitutional obstacles to a national identity system, as well as to indicate what practices present little or no problem. With respect to the Fourth Amendment seizures that might be implicated in official demands for an identity token, a wide range of identification occasions would not present much constitutional difficulty. These occasions might include demands for identification during registration procedures, during investigative and traffic stops, and incident to arrests, as well as requests for identification during consensual encounters. There is one outright prohibition on official insistence on presentation of an identity card, and two important caveats. Suspicionless stops to check identity cards would be unreasonable seizures. Random identification demands by roving patrols are thus constitutionally prohibited. One caveat concerns the use of checkpoints for terrorist profiling. 
Unless extraordinary circumstances develop, or there is a quantum leap in the effectiveness of this technique, checkpoint stops to link people to a database in order to profile potential terrorists should be held to be unreasonable seizures. On the other hand, checkpoints designed to identify unauthorized migrants or known suspects seem likely to receive judicial acceptance. The second caveat about the design of a national identity system is the danger it poses for drastically increasing the use of pretextual traffic stops—stops for genuine traffic violations undertaken not to enforce the law but for the purpose of checking motorists’ identification. This practice is lawful under the Fourth Amendment, but, in the absence of a wholesale reinterpretation of the pretextual stop doctrine, legislation for a national identity system should attempt to prevent it, difficult as that might be. In fact, this prospect by itself counsels against a national identity system. A review of the Fourth Amendment issues in government-mandated data collection, retention, and use shows that recording public encounters, including normal investigative stops and arrests, as well as checkpoint arrivals, would be unlikely to raise serious objection. Recording investigative stops would probably be upheld despite considerable grounds for finding them to upset the carefully constructed balance sustaining the constitutionality of seizures based upon reasonable suspicion. Governmental acquisition of personal data already supplied in connection with some ordinary service poses the most serious Fourth Amendment issue. This procedure should be treated as a search, but it may be a reasonable one, particularly if accompanied by restrictions on retention and use, at least until it reaches some indefinable level of societal surveillance. Thus far, the Supreme Court seems willing to accept a certain amount of government database creation by way of gathering information already in existence. 
Where this leaves the prospects for a national identity system is hard to say, particularly because a definitive judgment about its constitutionality can come only after its features are defined. Clearly, the Fourth Amendment stands in the way of the kind of total surveillance and anytime identification demands that would allow such a system to operate at maximum efficiency. On the other hand, there is still a fair amount the government could do in both areas that would withstand Fourth Amendment challenge. Indeed, this review reveals again the truth of Katz v. United States’ famous epigram that “the Fourth Amendment cannot be translated into a general constitutional ‘right to privacy.’” Many aspects of a national identity system may arguably intrude on privacy but not offend the Fourth Amendment. Moreover, the Fifth Amendment presents no significant obstacle to the primary functions of a national identity system, particularly in light of the Hiibel decision. This review of the Fourth and Fifth Amendment issues should serve to demonstrate to proponents of a national identity card that there are both limits and dangers to its use. It should also make clear to those who see a national identity system as an Orwellian nightmare that while the Constitution stands somewhat athwart its path, it does not make such a system impossible. Whether the kind of national identity system that could operate lawfully is worth the financial, administrative, and social costs would ultimately be a policy, not a legal, judgment. Much of the policy assessment, though, involves consideration of the government’s need for a national identity card, its effectiveness, and its interference with privacy and free movement. To a large degree, these are the same factors on which the constitutionality of a national identity system turns. This legal analysis thus provides a useful perspective on the desirability, as well as the constitutionality, of adopting a national identity card.
Mr. Orville Wilson, Security Consultant, "Security and Usability: The Viability of Passwords and Biometrics"
The Federal Trade Commission (FTC) found that identity theft accounted for nearly $48 billion in losses to businesses over the past five years. The FTC further reports that more than 27 million Americans have been victims of identity theft during that period, nearly 10 million of them last year alone. One crucial security mission is the control of access to systems in order to keep out intruders and identity thieves. Biometric technology, an application that uses the physiological and behavioral attributes of a living person for the purpose of positively verifying identity, is the answer to this predicament. Furthermore, biometrics is an emerging authentication and identification method that encompasses a variety of technologies. Recent advances in computer science have made biometric technology products more reliable, less expensive to own, and usable in varied industries. This paper discusses the security, usability and applicability of passwords and biometrics in order to determine their viability within organizations, government agencies and educational institutions. Statistical research and analysis show the rise of identity theft and the increasing challenge it poses to controlling access to information resources. For decades people have relied on passwords for identification and authentication. Biometrics, however, provides stronger user authentication than passwords, and biometric technologies are diverse, varying widely in cost, performance and other characteristics. Unequivocally, the use of stronger authentication methods is a compelling security solution that makes organizations less prone to identity theft and unauthorized access.
Dr. Arthur Keller, UC Santa Cruz and Open Voting Consortium, "Privacy Issues in an Electronic Voting Machine"
In this paper, we describe the Open Voting Consortium’s voting system and discuss the privacy issues inherent in this system. By extension, many of the privacy issues in this paper also apply to other electronic voting machines, such as DREs (Direct Recording Electronic voting machines). The privacy issues illustrate why careful and thorough design is required to ensure voter privacy and ballot secrecy. The requirements for secrecy in elections depend upon the values and goals of the political culture where voting takes place. Gradations of partial and complete privacy can be found in different cultural settings. Most modern polities institutionalize the ideal of complete privacy by relying on anonymous balloting. The use of secret balloting in elections — where a ballot's contents are disconnected from the identity of the voter — can be traced back to the earliest use of ballots themselves in 6th Century B.C.E. Athens, Greece. The public policy rationales for instituting anonymous balloting typically aim to minimize bribery and intimidation of the voter. Secret ballots, although not always required, have been in use in America since colonial times. Today, almost one hundred years after most states in the U.S. passed laws to require anonymous balloting, a strong sense of voter privacy has emerged as a third rationale. These cultural values and practices contribute to the sets of user requirements that define the expectations of voters in computer-mediated elections, and determine alternative sets of specifications that can be considered in developing open source software systems for elections. The Open Voting Consortium (OVC) has developed a model election system that aims, as one of its goals, to meet these requirements. This paper describes how the OVC model ensures ballot privacy.
The OVC has developed the model for an electronic voting system largely in response to the reliability, usability, security, trustworthiness, and accessibility concerns of other voting systems. Privacy was kept in mind throughout the process of designing this system. Section 2 of this paper discusses the requirements for a secret ballot in more detail and how secrecy could be compromised in some systems. Section 3 describes how the OVC handles the privacy concerns. While this paper focuses mostly on privacy issues for US-based elections, and how they are addressed in the OVC system, many of the issues raised are applicable elsewhere. ----- An extended abstract of this paper was accepted for publication in WPES 2004 in Washington DC on October 28, 2004.
Mr. Mark Simon, Attorney in private practice, "Addressing Complexity for Consumers Facing RFID Technology"
In June 2004 the FTC held a workshop on the emergence of radio frequency identification (RFID) technology and its implications for businesses, consumers and policymakers. Workshop panelists and public comments submitted to the FTC on the subject generally reflected agreement that a key issue in addressing the use of RFID technology is the importance of effective disclosure to consumers. It was suggested that entities using RFID be required to provide notice to consumers explaining the technology, including how and why it was being used, in order to permit the consumer to exercise his or her right to informational self-determination. This paper accepts the premise that consumers ought to be provided with the means of assessing and controlling how their privacy may be affected by RFID technology. Given such needs, the author urges policymakers to embrace uniform RFID disclosures and practices so that consumers are not overwhelmed with a complexity that would stymie consumer choice and decision-making.
Background information on RFID technology
• Description of RFID technology
  o Unobtrusiveness
  o Uniqueness
  o Interoperability
  o Proliferation
• How RFID may enhance supply chain management and provide benefits to consumers.
• Examples of RFID technology currently in use.
Threats RFID poses to consumer privacy
• Distinguish between an RFID tag on an item that a consumer acquires and an RFID tag applied to a shipping unit.
• Describe how an RFID tag on an item can be associated with other personal information at the point of sale.
• Discuss the potential for exposure of RFID information to RFID readers operated in public venues.
• Address security risks of unauthorized access to or use of RFID information.
A lesson learned from the backlash against Gillette’s attempt to utilize RFID technology in the US and the UK
• Consumer issues not addressed.
• Failure to win consumer confidence by involving eTrust or other private organizations that certify good privacy practices.
• CASPIAN-led boycott.
Benefits of uniform consumer disclosures and security practices
• Reduce complexity posed by multiple solutions.
• Enhance likelihood of consumer acceptance (e.g., gov’t-inspected meat, FDA-approved medication).
• Certainty of implementation practices for vendors/retailers.
• Uniform disclosures and practices are more readily recognizable and understandable to consumers.
Adopting uniform measures
• Utilize existing infrastructure of private organizations that certify good privacy practices.
• Possible role of regulation. Cite examples of existing regulation of privacy disclosures and security measures (HIPAA, G-L-B and other laws).
Commentary on budding regulatory initiatives
• Utah Radio Frequency Identification Right to Know Act (H.B. 251 expired before action by the Senate).
• RFID Right to Know Act of 2004 (S.B. 867), sponsored by State Sen. Maida Coleman (D-District 5), was introduced before the Missouri State Senate. After a second reading in January, the bill was referred to the Senate’s Commerce and Environment Committee.
• California SB 1834, which proposed to set privacy standards for RFID technology, was rejected June 25, 2004.
• Congress: H.R. 4673, “Opt Out of ID Chips Act,” introduced June 23, 2004.
Conclusion: The promise of RFID benefits can be more readily realized if consumers accept the technology and are not overwhelmed by the complexity of its implementation. Policymakers should embrace uniform solutions as a means to address the complexity that faces consumers.
Prof. Alessandro Acquisti, Carnegie Mellon University, "Privacy and Rationality: Evidence from Survey Data"
From its early days ([Pos78], [Pos81], and [Sti80]) to its more recent incarnations ([CP01], [AV01], and [Tay02]), the economic modelling of privacy has studied individuals as rational economic agents who go about deciding how to protect or divulge their own personal information. According to that view, individuals are forward-looking, utility-maximizing Bayesian updaters who are either fully informed or base their decisions on subjective probabilities coming from known random distributions. This approach permeates not just formal models but the policy debate, in the views of those who believe that consumers not only should be granted the right to manage their own privacy tradeoffs without regulatory intervention, but are in fact able to do so. There are reasons to doubt that this approach adequately captures the individual decision process with respect to privacy. First, incomplete information affects privacy decision making because of information asymmetries and the complex costs and benefits of privacy-related transactions. Benefits and costs associated with privacy intrusions and protection are both monetary and immaterial. Many payoffs are only discovered ex post through actual experience. Uncertainties are such that individuals may ignore threats, protections, and consequences, as well as the likelihood of adverse events. Second, even if an individual had complete information, she would be unable to process and act rationally upon vast amounts of data. Our innate bounded rationality ([Sim82]) limits our ability to acquire and process information, and makes us replace rational decision making with simplistic mental models, approximate strategies, and heuristics. Third, even if an individual had access to complete information and could appropriately compute strategies and actions, she still may deviate from the rational strategy.
A vast body of economic and psychological literature has confirmed several forms of psychological distortions that affect individual decision making. [Acq04] presents a model of privacy behavior grounded in some of those distortions - in particular, the tendency to trade off costs and benefits in ways that damage our future selves in favor of immediate gratification ([RO00]). Over the past 15 years, empirical studies have reported significant privacy concerns across the population (see [Wes91], [ACR99], [Com00], [Har02], [Jup02], and [HAF04]). More recent surveys, anecdotal evidence, and experiments (see [CS02], [HHLP02], [SGB01], and [Jup02]) have highlighted a dichotomy between privacy attitudes and actual behavior: privacy-concerned individuals are willing to trade off privacy for convenience, or bargain away very personal information in exchange for relatively small rewards. This literature has provided invaluable insights, but has left the dynamics of individual privacy decision making not fully explained. Our research combines theoretical and empirical approaches aimed at investigating the drivers of privacy decision making and behavior. In May 2004, we contacted a population of potential subjects who had shown interest in participating in economic experiments at Carnegie Mellon University. We offered participants a lump-sum payment of $16 to fill out an online, anonymous survey. 120 responses were gathered in less than twenty-four hours after the invitation emails had been sent. The goal of our preliminary survey (others are planned) was to test the economic privacy rationality assumption through an analysis of individual privacy knowledge, economic behavior, and psychological distortions. In this paper, we start investigating the level of information the individual possesses when making privacy-sensitive decisions, and we address the effects of possible psychological distortions on privacy attitudes and behavior.
The survey contained around 100 questions organized into several categories: demographics; knowledge of privacy risks and protection against privacy risks; past behavior with respect to protection or release of personal information; attitude towards privacy; attitude towards risk (i.e., risk aversion, risk love, or risk neutrality); discounting behavior (i.e., exponential versus hyperbolic discounting); and (economic) strategic behavior. Our results favor a model that recognizes that the individual’s decision-making process with respect to privacy is affected by incomplete information, bounded rationality, and various forms of psychological deviations from rationality. While many participants in our survey showed sophisticated concerns for privacy, we found evidence for factors that can limit the success of individuals’ protective behavior: overconfidence in privacy assessments, incorrect assessment of one's own privacy behavior, incomplete information about risks and protection, an attitudes/behavior dichotomy, and a buy/sell prices dichotomy. Our data are compatible with the explanation that time inconsistencies in discounting may lead to under-protection and over-release of personal information. In conclusion, we fail to support a model of strict “rationality” as an appropriate way to describe individual privacy behavior, and suggest further work on behavioral alternatives and their experimental validation. We find significant implications for public policy and technology development. In the current public debate, privacy seems anchored to two prominent positions: either consumers should be granted the right to manage their own privacy tradeoffs, or choice “will be preemptively denied to them by privacy fundamentalists out to deny consumers that choice”. However, our observations suggest that several difficulties may obstruct even concerned and motivated individuals in their attempts at protecting their privacy, even when efficient technologies are available.
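The exponential-versus-hyperbolic discounting distinction in the survey can be made concrete with a small sketch. The functional forms below are the standard textbook ones, and the parameter values and dollar amounts are illustrative choices of ours, not figures from the survey; the point is that hyperbolic discounting produces the preference reversals associated with immediate gratification, while exponential discounting does not.

```python
# Illustrative comparison of exponential vs. hyperbolic discounting.
# Parameter values (delta, k) and payoffs are arbitrary examples.

def exponential(value, delay, delta=0.9):
    """Discounted value under exponential discounting: v * delta**t."""
    return value * delta ** delay

def hyperbolic(value, delay, k=1.0):
    """Discounted value under hyperbolic discounting: v / (1 + k*t)."""
    return value / (1 + k * delay)

# Choice: $10 now vs. $15 in one week.
# Hyperbolic: 10 / 1 = 10.0 beats 15 / 2 = 7.5, so take the $10 now.
prefer_small_now = hyperbolic(10, 0) > hyperbolic(15, 1)

# Same pair of options pushed 52 weeks into the future:
# 10 / 53 is now smaller than 15 / 54, so the preference reverses.
prefer_small_later = hyperbolic(10, 52) > hyperbolic(15, 53)

# Under exponential discounting the ratio of the two options' values
# is constant in the added delay, so no such reversal can occur.
```

This time inconsistency is the mechanism the abstract invokes when suggesting that discounting may lead to under-protection and over-release of personal information: a small immediate reward can dominate a larger, delayed privacy cost.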
While respondents’ actual knowledge about law and legislative recommendations was surprisingly weak, a large number of respondents favored governmental legislation and intervention as means of privacy protection (53.7%). Self-protection through technology (14.9%) and group protection through behavioral norms (30.6%) also found support in our test population. Nobody favored the absence of any kind of protection; one subject suggested self-regulation by the private sector. This is a striking result, contradicting the traditional assumption that citizens of the United States are skeptical of government intervention and favor industry-led activities.
References
[Acq04] Alessandro Acquisti. Privacy in electronic commerce and the economics of immediate gratification. In Proceedings of the ACM Conference on Electronic Commerce (EC ’04), 2004.
[ACR99] Mark Ackerman, Lorrie Cranor, and Joseph Reagle. Privacy in e-commerce: Examining user scenarios and privacy preferences. In Proceedings of the ACM Conference on Electronic Commerce (EC ’99), pages 1–8, 1999.
[AV01] Alessandro Acquisti and Hal R. Varian. Conditioning prices on purchase history, 2001. Presented at the European Economic Association Conference, Venice, IT, August 2002. http://www.heinz.cmu.edu/~acquisti/papers/privacy.pdf.
[Com00] Federal Trade Commission. Privacy online: Fair information practices in the electronic marketplace, 2000. http://www.ftc.gov/reports/privacy2000/privacy2000.pdf.
[CP01] Giacomo Calzolari and Alessandro Pavan. Optimal design of privacy policies. Technical report, Gremaq, University of Toulouse, 2001.
[CS02] Ramnath K. Chellappa and Raymond Sin. Personalization versus privacy: An empirical examination of the online consumer’s dilemma. In 2002 INFORMS Meeting, 2002.
[HAF04] Bernardo Huberman, Eytan Adar, and Leslie R. Fine. Privacy and deviance. Technical report, HP Labs, 2004.
[Har02] Harris Interactive. First major post-9/11 privacy survey finds consumers demanding companies do more to protect privacy; public wants company privacy policies to be independently verified, 2002. http://www.harrisinteractive.com/news/allnewsbydate.asp?NewsID=429.
[HHLP02] Il-Horn Hann, Kai-Lung Hui, Tom S. Lee, and Ivan P. L. Png. Online information privacy: Measuring the cost-benefit trade-off. In 23rd International Conference on Information Systems, 2002.
[Jup02] Jupiter Research. Seventy percent of US consumers worry about online privacy, but few take protective action, 2002. http://www.jmm.com/xp/jmm/press/2002/pr_060302.xml.
[Pos78] Richard A. Posner. An economic theory of privacy. Regulation, May-June:19–26, 1978.
[Pos81] Richard A. Posner. The economics of privacy. American Economic Review, 71(2):405–409, 1981.
[RO00] Matthew Rabin and Ted O’Donoghue. The economics of immediate gratification. Journal of Behavioral Decision Making, 13:233–250, 2000.
[SGB01] Sarah Spiekermann, Jens Grossklags, and Bettina Berendt. E-privacy in 2nd generation e-commerce: Privacy preferences versus actual behavior. In Proceedings of the ACM Conference on Electronic Commerce (EC ’01), pages 38–47, 2001.
[Sim82] Herbert A. Simon. Models of bounded rationality. The MIT Press, Cambridge, MA, 1982.
[Sti80] George J. Stigler. An introduction to privacy in economics and politics. Journal of Legal Studies, 9:623–644, 1980.
[Tay02] Curtis R. Taylor. Private demands and demands for privacy: Dynamic pricing and the market for customer information. Duke Economics Working Paper 02-02, Department of Economics, Duke University, 2002.
[Wes91] Alan F. Westin. Harris-Equifax Consumer Privacy Survey 1991. Equifax Inc., Atlanta, GA, 1991.
Dr. Bo Xu, Dr. Ouri Wolfson, University of Illinois at Chicago, "Privacy Solutions by Mobile Peer-to-Peer Communication"
Location Based Services (LBS) are mobile services in which the user's location information is used in order to add value to the service as a whole. Examples of LBS include: getting driving directions, traffic information, weather, and travel schedules that are specific to the region a user is traveling in; locating people and businesses that are in proximity to the user; etc. When requesting a location based service, a user has to disclose her location and communication ID to the server. This raises a privacy issue, which is an obstacle to immediate widespread adoption of LBS. In this paper we propose to solve this privacy issue by mobile peer-to-peer networks. A mobile peer-to-peer network is a set of moving objects that communicate via short-range wireless technologies such as IEEE 802.11, Bluetooth, or Ultra Wide Band (UWB). With such communication mechanisms, a moving object receives information from its neighbors, or from remote objects by multi-hop transmission relayed by intermediate moving objects. A killer application of mobile peer-to-peer networks is resource discovery in transportation. For example, the mobile peer-to-peer network approach can be used to disseminate information about available parking spaces, which enables a vehicle to continuously display to the driver, at any time, a map of the available parking spaces around the vehicle's current location. Or, the driver may use this approach to get the traffic conditions (e.g. average speed) one mile ahead. Similarly, a cab driver may use this approach to find a cab customer, or vice versa. Safety information (e.g. a malfunctioning brake light in a vehicle) can also be disseminated in this fashion. A mobile peer-to-peer network can also be used to match resource producers and consumers among pedestrians.
For example, an individual wishing to sell a pair of tickets for an event (e.g. a ball game or concert) may use this approach right before the event, at the event site, to propagate the resource information. As another example, a passenger who arrives at an airport may use this approach to find another passenger for cab-sharing from the airport to downtown, so as to split the cost of the cab. Furthermore, the approach can be used in social networks: when two singles whose profiles match are in close geographic proximity, one can call the other's cell phone and suggest a short face-to-face meeting. We propose an opportunistic approach to information dissemination in mobile peer-to-peer environments. In this approach, a moving object (vehicle/pedestrian) propagates the information it carries to encountered objects, and obtains new information in exchange. For example, a vehicle finds out about available parking spaces from other vehicles. These spaces may have been vacated by the encountered vehicles, or the encountered vehicles may have obtained this information from others they previously encountered. Thus the parking space information transitively spreads out across objects. Similarly, information about an accident or a taxi cab customer is propagated transitively. Observe that in our opportunistic dissemination approach, a moving object does not need to ask a server for information about the services in the region it is traveling in. Instead, the information is pushed by service providers to moving objects in the vicinity. For example, instead of joining a location-based directory service to be searched geographically, an Italian restaurant may decide to put up a short-range transmitter and advertise via opportunistic and transitive dissemination. This solves privacy concerns that arise when a user asks for the closest restaurant or gas station.
As mentioned earlier, the user would traditionally have had to provide her location and communication ID to the cellular provider; in our scheme she does not need to do so, and the transmission between two vehicles can be totally anonymous.
References:
• IEEE Computer Society. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. 1997.
• J. Haartsen, et al. Bluetooth: Vision, Goals, and Architecture. ACM Mobile Computing and Communications Review, 2(4):38–45, October 1998.
• Ultra-wideband (UWB). http://www.ubisense.net/technology/uwb.html
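The opportunistic, transitive exchange the abstract describes can be sketched in a few lines. This is our own minimal illustration, not the authors' implementation: the `Peer` class, the freshest-report merge rule, and the "spot-17" example are all hypothetical. On each encounter, two peers merge each other's reports, keeping the most recent entry per resource, so information spreads without any server or identity exchange.

```python
# Minimal sketch of opportunistic, transitive dissemination between
# mobile peers. A report maps a resource id to (timestamp, info); on
# an encounter, both peers keep the freshest entry for each resource.

class Peer:
    def __init__(self, name):
        self.name = name
        self.reports = {}  # resource id -> (timestamp, info)

    def observe(self, resource, timestamp, info):
        """Record a first-hand observation, e.g. a vacated parking space."""
        self.reports[resource] = (timestamp, info)

    def encounter(self, other):
        """Short-range exchange: merge both peers' reports, newest wins.
        No server is contacted and no identities are disclosed."""
        for snapshot in (dict(self.reports), dict(other.reports)):
            for res, (ts, info) in snapshot.items():
                for peer in (self, other):
                    if res not in peer.reports or peer.reports[res][0] < ts:
                        peer.reports[res] = (ts, info)

a, b, c = Peer("A"), Peer("B"), Peer("C")
a.observe("spot-17", 100, "vacant")
a.encounter(b)   # B learns about spot-17 directly from A
b.encounter(c)   # C learns about it transitively, never having met A
```

After the two encounters, peer C holds A's parking report even though C and A never communicated, which is the transitive spread the abstract relies on; a real system would additionally age out stale reports and bound storage.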
Dr. Katherine J. Strandburg, J.D., DePaul College of Law, "Too Much Information! Privacy, Rationality, Temptation and the Implications of 'Willpower Norms'"
This Article explores the implications of bounded
rationality and limited willpower for regulation of the societal
flow of personal information. Because people are limited in their
information processing capacity and are tempted both to pry into
information they cannot interpret correctly and to disclose
information that others cannot process rationally, a problem of “too
much information” can arise. Social norms regulating prying, gossip,
and the “appropriateness” of personal revelations appear to have
developed in response to this problem.
The Article proposes a general category of “willpower norms” that includes many social norms governing the flow of personal information. Willpower norms may be enforced through social sanctions if deviations are detected. Additionally, internal self-control strategies may interact with social norms to provide a novel mechanism by which willpower norms can be effective against undetectable self-control lapses.
Though personal information norms have an important role to play in averting inefficient prying and disclosure, willpower norms, like other social norms, may fail. In some instances, for example, personal information norms can devolve into inefficient “silencing norms” that impede the evolution of social values. The “don’t ask, don’t tell” policy regarding gays in the military can be analyzed in this way.
Bounded rationality and limited self-control undermine the assumptions that more information is always better and that choices to disclose, obtain, or use personal information always indicate long-term preferences. However, the complicated relationship between rational choice, personal autonomy, and efforts to control the flow of personal information, also cautions against overly intrusive legal regulation.
Prof. Eric Goldman, Marquette University Law School, "Data Mining, Unwanted Marketing and Attention Consumption"
Many commentators have assumed, often without further elaboration or support, that data mining inherently creates social problems. Building on this unsupported assumption, commentators have offered a variety of solutions to curtail or eliminate data mining altogether. However, without a clear understanding of the problems created by data mining, it is impossible to develop precise policy responses that appropriately regulate data mining. To lay a foundation for further policy discussions, my paper will fill the analytical holes left by previous commentators and specifically identify the harms created by data mining. These harms include:
1) the risk of incorrect decisions or adverse judgments being made about the data subject;
2) the risk that aggregated information will be disclosed and damage the data subject’s reputation (i.e., public disclosure of private facts);
3) the risk that aggregated information will be disclosed and lead to increased physical risks (e.g., stalking);
4) the risk that aggregated information will be disclosed and lead to increased non-physical risks (e.g., identity theft);
5) the risk that marketers will use the information to send unwanted communications to the data subject.
My paper will focus on the fifth purported harm (unwanted marketing). The paper will show how marketing consumes the scarce resource of consumer attention. To the extent that the marketer does not compensate the consumer, the marketer externalizes some costs to the consumer and overproduces marketing. In response to this problem, many commentators have proposed creating a property interest in personal data. However, the “data-as-property” paradigm is misdirected. The harm of unwanted marketing occurs when marketing consumes attention, not when data is aggregated or mined. Therefore, the data-as-property paradigm does a poor job of addressing the harm suffered by the unwanted consumption of attention.
Furthermore, the paper will show how data mining creates positive social and private utility by allowing consumers to receive more useful/relevant marketing communications. Data mining can also reduce private and social costs by lowering the transaction costs incurred by marketers (such as the costs of sending communications to uninterested consumers). Therefore, the paper will conclude that concerns about data mining for marketing purposes are misdirected and perhaps unfounded. Instead of restricting marketers' efforts to target their communications, we may want to encourage marketers to engage in more data mining as a way of limiting the unwanted consumption of attention.
Prof. Gaia Bernstein, Seton Hall University School of Law, "Accommodating Technological Innovation: Identity, Genetic Testing and the Internet"
To evaluate the need for legal change stemming from technological innovation we need to look beyond the accommodation of specific rules and on to the impact of technological innovation on social structures, institutions and values. In this article I study how social tensions created by recent technological innovations produce a need to elevate a legal interest from the shadows of legal discourse into the forefront of legal debate. Specifically, I examine two information technologies that are exerting significant influence on our lives – genetic testing and the Internet – and their impact on our normative conception of identity. This socially oriented approach leads to several insights. First, I show that a host of seemingly unrelated social and legal controversies emanating from these technologies can be traced to a common tension. I demonstrate that by altering the social structures through which we perceive our identity, genetic testing and the Internet induce novel societal tensions. Second, I find that despite the role identity tensions play in controversies implicating genetic testing and the Internet, these tensions are neither addressed nor resolved in the legal debate. Most legal controversies in which identity tensions arise are addressed with privacy tools. I show that the application of privacy tools to these situations is related to two legal failures. First, by focusing on privacy tools the law fails to address the effects on human beings and their identities. Second, privacy tools often fail to even indirectly protect identity interests. I show that the source of the second failure – the failure of privacy tools to protect identity interests even indirectly – often stems from a structural deficiency of existing privacy tools. The deficiency results from an incompatibility between the informational scenarios driving these novel controversies and the informational scenarios that traditional privacy doctrines were designed to resolve.
The new informational scenarios differ from the traditional ones in two respects: 1) the direction of the flow of the information; and 2) the information stage. Traditional informational scenarios focus on the collection and dissemination stages of the information and on the flow of information from the individual whose information is taken toward a third party who often seeks to abuse it. The new informational scenarios involving identity tensions, by contrast, focus on the earlier stage of the creation of the information and on a flow of information toward the individual. Consequently, existing doctrines appear structurally deficient in their ability to accommodate consideration of identity interests. The failure to address identity interests, combined with the frequent failure to provide for their protection, calls for the incorporation of identity interests into our legal debate. Identity interests need to be considered in controversies as diverse as the physician's duty to warn relatives of a patient's genetic condition and commercial profiling on the Internet. Specifically, I propose two potential resolutions: (i) direct incorporation of an independent identity interest; and (ii) indirect incorporation through the readjustment of existing doctrinal tools. I suggest that the pressures applied by the new technologies make both options viable by creating the need for inducing long overdue changes in our legal discourse.
Dr. Traian Truta, Northern Kentucky University, "Global Disclosure Risk for Microdata with Continuous Attributes"
“Privacy is dead, deal with it,” Sun Microsystems CEO Scott McNealy is widely reported to have declared some time ago. Privacy in the digital age may not be as dead and buried as McNealy believes, but it is certainly on life support [Meeks 2000]. While releasing information is one of their foremost reasons to exist, data owners must protect the privacy of individuals. Privacy concerns are fueled by an ever-growing list of privacy violations, ranging from privacy accidents to illegal actions [Agrawal et al. 2002]. Privacy issues have become more and more prevalent in today's society, and many privacy regulations have been enacted in various fields. In the U.S., for example, privacy regulations promulgated by the Department of Health and Human Services as part of the Health Insurance Portability and Accountability Act (HIPAA) went into effect in April 2003 in order to protect the confidentiality of electronic healthcare information [HIPAA 2002].
Microdata represents a series of records, each record containing information on an individual unit such as a person, a firm, an institution, etc. [Willemborg et al. 2001]. Microdata can be represented as a single data matrix where the rows correspond to the individual units and the columns to the attributes (such as name, address, income and sex). At present, microdata is released for use by third parties only after the data owner has masked the data to limit the possibility of disclosure. We call the final microdata masked (or released) microdata [Dalenius et al. 1982], and we use the term initial microdata for microdata to which no masking methods (also called disclosure control methods) have been applied. Disclosure risk is the risk that a given form of disclosure will be encountered if masked microdata is released [Chen et al. 1998]. Information loss is the quantity of information that existed in the initial microdata but is absent from the masked microdata because of disclosure control methods [Willemborg et al. 2001]. When protecting the confidentiality of individuals, the owner of the data must satisfy two conflicting requirements: protecting the confidentiality of the entities in the initial microdata and maintaining the analytic properties of the masked microdata [Kim et al. 2001].
Recent work in disclosure risk assessment can be categorized into two directions: individual and global disclosure risk. Benedetti and Franconi introduced the individual risk methodology [Benedetti et al. 1998], in which the risk is computed for every released entity in the masked microdata. Domingo-Ferrer, Mateo-Sanz and Torra describe three different disclosure risk measures: distance-based record linkage [Pagliuca et al. 1999], probabilistic record linkage [Jaro 1989] and interval disclosure [Domingo-Ferrer et al. 2001]. Global disclosure risk is defined in terms of the expected number of identifications in the released microdata. Elliot and Skinner define a new measure of disclosure risk as the proportion of correct matches among those records in the population that match a sample-unique masked microdata record [Elliot 2000, Skinner et al. 2002].
We call the actions taken by the owner of the data to protect the initial microdata with one or more disclosure control methods the masking process. The masking process can alter the initial microdata in three different ways: changing the number of records, changing the number of attributes and changing the values of specific attributes. The change in the number of attributes is always used, since the removal of identifier attributes is the first step in data protection; we call this first mandatory step in the masking process the remove identifiers method. The other two types of change may or may not be applied to the initial microdata, and the most general scenario is when all three changes are applied. Change in the number of records is produced by two techniques, simulation [Adam et al. 1989] and sampling [Skinner et al. 1994], while changes to attribute values occur in a larger number of disclosure control methods (microaggregation [Domingo-Ferrer et al. 2001], data swapping [Dalenius et al. 1982], adding noise [Kim 1986], etc.).
The attributes in a microdata set can also be classified into several categories (continuous, discrete, ordered, partially ordered, etc.). We consider an attribute to be continuous if its domain ranges over an infinitely divisible continuum of values. Attributes such as Distance and Length, as well as many financial attributes, fit into this category.
In order to describe the masking process, a few assumptions are needed. The first assumption we make is that the intruder does not have specific knowledge of any confidential information. The second is that an intruder knows all the key and identifier values from the initial microdata, usually through access to an external dataset. To identify individuals in the masked microdata, the intruder will execute a record linkage operation between the external dataset and the masked microdata. This assumption maximizes the amount of external information available to an intruder. Since disclosure risk increases as the quantity of external information increases, the second assumption guarantees that any disclosure risk value computed by one of the proposed measures is an upper bound on the disclosure risk when the amount of external information available to an intruder is not maximal. Based on the above assumptions, only key attributes are subject to change in the masking process.
For any continuous attribute Ok we define the notion of inversion. The pair (xik, xjk) is called an inversion for attribute Ok if xik < xjk and x'ik > x'jk, for i, j between 1 and n, where n is the number of records in both microdata sets, xik is the value of attribute k for record i in the initial microdata, and x'ik is the value of attribute k for record i in the masked microdata. The inversion factor for attribute Ok is then defined as the minimum of 1 and the ratio of the number of inversions for attribute Ok to the average number of inversions. The inversion factor helps us assess disclosure risk, since the intruder may try to link an external dataset to the masked microdata based on how the records are ordered for a specific key attribute.
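As a rough illustration, the inversion count and the inversion factor for a single continuous key attribute could be computed as follows. This is only a sketch: the abstract does not define the "average number of inversions," so the baseline n(n-1)/4 (the expected inversion count for a uniformly random reordering of n records) is an assumption.

```python
def count_inversions(initial, masked):
    """Count pairs of records whose order on this attribute is
    reversed between the initial and the masked microdata."""
    n = len(initial)
    return sum(1
               for i in range(n)
               for j in range(n)
               if initial[i] < initial[j] and masked[i] > masked[j])

def inversion_factor(initial, masked):
    """min(1, inversions / average inversions).  The average for a
    random reordering of n records is taken here to be n*(n-1)/4,
    an assumption not stated in the abstract."""
    n = len(initial)
    average = n * (n - 1) / 4
    return min(1.0, count_inversions(initial, masked) / average)
```

Under this sketch a masking method that preserves the ordering of an attribute yields factor 0, while one that reverses the ordering yields factor 1.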
To assess disclosure risk for continuous attributes we create a vicinity set of records for each record from both the initial and the masked microdata. There are two possible approaches to this construction: to use a distance function between records, or to use an interval vicinity for each of the key attributes individually and then include in the corresponding vicinity only the records that fall within the vicinity for every key attribute. The width of an interval is based on the range of the attribute or on its standard deviation. Using this method we compute, for any pair of corresponding records (xk, x'k), their vicinity sets. If the cardinalities of these sets are j and i, respectively, the probability of linking xk and x'k is a function of j and i; we therefore classify each pair (xk, x'k) by the cardinalities of its vicinity sets. We define the classification matrix C such that each element cij is equal to the total number of pairs (xk, x'k) that have a vicinity set of size i for the masked microdata record x'k and a vicinity set of size j for the initial microdata record xk. The matrix C has several properties; the most important one is that the sum of all its elements is equal to the number of records in the masked microdata. Based on this property, we developed an algorithm that, for each pair (xk, x'k), computes the corresponding vicinity sizes j and i and increments by one the matrix element in row i and column j.
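The interval-vicinity construction and the classification matrix C can be sketched as follows. This is a simplified illustration: records are plain tuples of continuous key-attribute values, and the per-attribute interval widths are supplied by the caller rather than derived from the attribute's range or standard deviation.

```python
def vicinity_size(records, idx, widths):
    """Number of records whose value lies within [x - w, x + w]
    of record idx on every key attribute (includes idx itself)."""
    target = records[idx]
    return sum(1 for r in records
               if all(abs(r[k] - target[k]) <= widths[k]
                      for k in range(len(widths))))

def classification_matrix(initial, masked, widths):
    """Build C where C[i][j] counts record pairs with a masked
    vicinity of size i and an initial vicinity of size j.
    Sizes range from 1 (the record alone) to n."""
    n = len(initial)
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for idx in range(n):
        i = vicinity_size(masked, idx, widths)   # size in masked data
        j = vicinity_size(initial, idx, widths)  # size in initial data
        C[i][j] += 1
    return C
```

The property that all elements of C sum to the number of records follows directly from the loop: each record pair increments exactly one cell.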
We extend the minimal, weighted and maximal disclosure risk measures previously presented for discrete attributes and specific disclosure control methods [Truta et al. 2003, Truta et al. 2004] to the case where all key attributes are continuous and the masking process changes the values of those attributes. Our disclosure risk measures compute an overall disclosure risk for a given dataset and are not linked to target individuals. The disclosure risk computed considering all attributes for a specified weight matrix [Truta et al. 2004] may be lower than the disclosure risk computed without some of the key attributes. To find the correct value for disclosure risk, we therefore have to consider all possible subsets of key attributes, compute the disclosure risk for each subset and select the maximum value. The algorithm that generates all possible subsets of key attributes has exponential complexity O(2^p), where p is the number of key attributes. Fortunately, in real initial microdata the number of key attributes is low (usually less than 5). Moreover, the data owner can reduce the number of subsets checked when the inversion factor is either 0 or 1: key attributes with inversion factor 0 will always be included in the search, while key attributes with inversion factor 1 will be excluded from it.
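The exhaustive O(2^p) search over key-attribute subsets can be sketched as follows; `risk_fn` is a placeholder for whichever of the minimal, weighted or maximal measures is being evaluated on a given subset, and the inversion-factor pruning would simply filter which attributes enter the enumeration.

```python
from itertools import combinations

def max_disclosure_risk(key_attributes, risk_fn):
    """Evaluate risk_fn on every non-empty subset of the key
    attributes (2^p - 1 subsets) and return the maximum, which
    is taken as the overall disclosure risk value."""
    best = 0.0
    for size in range(1, len(key_attributes) + 1):
        for subset in combinations(key_attributes, size):
            best = max(best, risk_fn(subset))
    return best
```

With fewer than five key attributes this enumerates at most 31 subsets, which is why the exponential complexity is acceptable in practice.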
The next step in this research is to simulate various experiments for specific masking methods suitable to continuous attributes (such as microaggregation), and to compute disclosure risk.
Adam, N. R., Wortmann, J. C.: Security Control Methods for Statistical Databases: A Comparative Study. ACM Computing Surveys, Vol. 21, No. 4 (1989)
Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Hippocratic Databases. Proc. of the 28th Int’l Conference on Very Large Databases, Hong Kong, China (2002)
Benedetti, R., Franconi, L.: Statistical and Technological Solutions for Controlled Data Dissemination. Pre-proceedings of New Techniques and Technologies for Statistics, Vol. 1 (1998) 225-232
Chen, G., Keller-McNulty, S.: Estimation of Deidentification Disclosure Risk in Microdata. Journal of Official Statistics, Vol. 14, No. 1 (1998) 79-95
Dalenius, T., Reiss, S. P.: Data-Swapping: A Technique for Disclosure Control. Journal of Statistical Planning and Inference, Vol. 6 (1982) 73-85
Domingo-Ferrer, J., Mateo-Sanz, J., Torra, V.: Comparing SDC Methods for Microdata on the Basis of Information Loss and Disclosure Risk. Pre-proceedings of ETK-NTTS'2001, Vol. 2, Luxembourg: Eurostat (2001) 807-826
Elliot, M.J.: DIS: A New Approach to the Measurement of Statistical Disclosure Risk. International Journal of Risk Management (2000) 39-48
HIPAA: Health Insurance Portability and Accountability Act. (2002) Available online at http://www.hhs.gov/ocr/hipaa
Kim J.J.: A Method for Limiting Disclosure in Microdata Based on Random Noise and Transformation. American Statistical Association, Proceedings of the Section on Survey Research Methods (1986) 303-308
Kim, J.J., Winkler, W.E.: Multiplicative Noise for Masking Continuous Data. American Statistical Association, Proceedings of the Section on Survey Research Methods (2001) cd-rom
Skinner, C.J., Marsh, C., Openshaw, S., Wymer, C.: Disclosure Control for Census Microdata. Journal of Official Statistics (1994) 31-51
Skinner, C.J., Elliot, M.J.: A Measure of Disclosure Risk for Microdata. Journal of the Royal Statistical Society, Series B, Vol. 64 (2002) 855-867
Truta, T.M., Fotouhi, F., Barth-Jones, D.: Disclosure Risk Measures for Microdata. International Conference on Scientific and Statistical Database Management (2003) 15-22
Truta, T.M., Fotouhi, F., Barth-Jones, D.: Disclosure Risk Measures for the Sampling Disclosure Control Method. Annual ACM Symposium on Applied Computing (2004)
Truta, T.M., Fotouhi, F., Barth-Jones, D.: Assessing Global Disclosure Risk Measures in Masked Microdata. Workshop on Privacy in the Electronic Society (2004)
Willemborg, L., Waal, T. (ed.): Elements of Statistical Disclosure Control. Springer Verlag (2001)
Dr. Yuval Elovici, Ben-Gurion University, "Hidden Web Privacy Preservation Surfing (Hi-WePPS) model"
In this study we propose Hi-WePPS, a new model for privacy preservation while accessing hidden- (or invisible-) Web sites. Many hidden-Web sites require subscription via form filling and the use of a dedicated search engine. The basic assumption that motivated the development of Hi-WePPS is that web sites cannot be trusted to preserve their surfers' privacy. Hi-WePPS generates "intelligent" noise while surfing hidden-Web sites in order to conceal users' interests (profiles). The noise takes the form of fake requests that provide wrong data to the automatic programs that collect data about users. A prototype of Hi-WePPS is being developed for preserving a surfer's privacy while accessing the U.S. patent office site (www.uspto.gov). The prototype enables surfers to access the site and search for patents without exposing the exact domain of their interest. Related work: Many sites on the web try to derive their users' profiles, either in order to personalize services or for their own commercial benefit. Profile generation might be performed explicitly, with the user's cooperation and input, or implicitly, without the user's awareness or cooperation and in violation of his or her privacy. Personalization is a growing issue in the web community. Personalization techniques were originally developed to make web sites welcoming and familiar for regular surfers, like going into a favorite restaurant and ordering "the usual". However, the data collection and profiling techniques developed for personalization are now also used to violate privacy, i.e., to collect user information without authorization for commercial purposes and for industrial espionage. Previous studies related to privacy preservation focused on protecting the client's privacy on the global web. Most systems preserve privacy by anonymizing their users.
Crowds, for example, helps a client mix into a crowd of surfers, so that eavesdroppers cannot tell which client contacts which web server. JANUS makes both client and server anonymous by acting as a separator between them. Subscription- and transaction-based sites, however, provide services that require subscription and identification, so the user cannot be anonymous, and anonymizing technologies are therefore not adequate for privacy preservation on these sites. PRAW is a privacy preservation model with a unique approach: it conceals users' interests by generating fake transactions rather than concealing users' identities, and thus preserves privacy without anonymity. PRAW generates fake transactions similar to a user's behavior in order to create noise in the user's profile and prevent eavesdroppers from deriving the actual profile. However, PRAW was developed only for the general Web and cannot conceal the user's interests when the user is surfing subscription sites (the hidden Web). Some sites carry the TRUSTe "trustmark," which grants users a protective feeling while surfing. However, even if a site's formal policy is to preserve its surfers' privacy, there might still be an "insider" eavesdropper who does not keep to this policy and collects information about the surfers. Hi-WePPS: Hi-WePPS generates "intelligent" noise while surfing subscription (hidden-) web sites in order to conceal the user's profile by providing wrong data to the automatic programs that collect data about users. The model "learns" the user's interests and the site's domains in order to automatically fill the forms on the site and generate transactions that are relevant to the site's domain but still create a fuzzy cloud around the user's actual information needs. Our model is especially important for users who wish to keep their interests private when surfing sites with sensitive data.
One example is industry users who search the patent site (www.uspto.gov) and do not want anyone to discover what they are working on. Hi-WePPS was designed to protect users from two types of eavesdroppers: (a) an eavesdropper on the path between the web site and the user, and (b) an eavesdropper looking at the web site's log file or database. To formulate the model we use the following definitions, similar to those used in PRAW, as Hi-WePPS and PRAW are based on the same concept of preserving privacy through fake transaction generation (PRAW is designed for the general Web but cannot handle hidden-Web sites). The internal user profile (IUP) is the user profile constructed inside the user's computer; it is based on the terms the user employs in queries and on the content of the pages the user accesses. The external user profile (EUP) is the user profile based on the information that might be collected about the user at the subscription web site or on the path between the user and the web site. An eavesdropper at the target web site is able to compute the EUP by looking at the site's log file or log database and recording every user action on the site. We assume that eavesdroppers do not have any smart agents, such as computer viruses, inserted in the user's computer, and are therefore not able to compute the IUP. Ordinarily, the IUP and EUP are identical. The goal of Hi-WePPS is to ensure that the EUP is not identical to the IUP, but differs from it in an intelligent way, so that the IUP cannot be derived from the observed EUP even if statistical methods like clustering are applied. The EUP is in fact an expansion of the IUP with fake transactions in the larger domain of the IUP. For example, we expect that while the user looks for patents related to tennis, the model will generate fake transactions looking for patents in the general sports domain.
If the EUP contained fake information on totally different subjects from those in the IUP, it might be possible to differentiate the fake data from the actual data. The model architecture consists of four main components: System Initializer, Profile Builder, Wrapper, and Transaction Generator. After the system is configured and initialized by the System Initializer, the Profile Builder component builds the actual user profile (IUP) using the user's queries and the content of the pages that the user downloads. The Transaction Generator performs an intelligent query expansion to transform the IUP into an EUP by generating fake transactions. The fake transactions are based on input from the Wrapper component, which learns the Web site structure using hidden-Web crawling methods like the HiWE crawler. Hi-WePPS contribution: The model presents the following theoretical and practical significance. Privacy model: a new method of privacy preservation for surfing the hidden Web is suggested. The model is important for Web services that require identification, form filling or the use of a dedicated search engine; as the Internet transforms into a non-free service, the number of such Web sites is growing. Most existing solutions suggest anonymous routing and are based on the concept of concealing the user's identity. However, in cases where the identity of the user is required, anonymity cannot be a solution for privacy. Hi-WePPS focuses on preserving privacy without anonymity, and thus seems to be an adequate solution. Privacy measure: a new measure of privacy is suggested, based on the distance between the original user profile (IUP) and the faked profile (EUP). The new measure examines the degree of privacy the system can guarantee its users.
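The abstract does not specify how the distance between the IUP and the EUP is computed. One plausible instantiation, assuming the profiles are represented as weighted term vectors as is common in information filtering, is cosine distance:

```python
import math

def cosine_distance(p, q):
    """1 minus the cosine similarity of two term-weight profiles,
    given as {term: weight} dicts.  This vector representation is
    an assumption; the abstract leaves the distance unspecified."""
    terms = set(p) | set(q)
    dot = sum(p.get(t, 0.0) * q.get(t, 0.0) for t in terms)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # an empty profile shares nothing with the other
    return 1.0 - dot / (norm_p * norm_q)
```

Under this reading, a distance of 0 would mean the fake transactions add nothing (the EUP reveals the IUP exactly), while larger values would indicate an observed profile that diverges more from the user's actual interests.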
User profile update frequency: the frequency of profile adaptation is a problem related to the information filtering research area, where a formally recommended frequency may contribute to accurate generation of the profile. In the current study, the frequency of profile adaptation also affects the degree of privacy a user might obtain. The model has to identify a shift in the user's interests and adjust the profile accordingly as soon as possible; otherwise the EUP would become too different from the IUP, and the IUP might be exposed.
References
S. Raghavan and H. Garcia-Molina, 2001, "Crawling the Hidden Web", In Proceedings of the 27th International Conference on Very Large Data Bases, Rome, Italy.
T. Demuth and A. Rieke, 1999, "On Securing the Anonymity of Content Providers in the World Wide Web", In Proceedings of SPIE '99, Vol. 3657, 494-502.
M.K. Reiter and A.D. Rubin, 1998, "Crowds: Anonymity for Web Transactions", ACM Transactions on Information and System Security, Vol. 1, No. 1, 66-92.
U. Hanani, B. Shapira and P. Shoval, 2001, "Information Filtering: Overview of Issues, Research and Systems", User Modeling and User-Adapted Interaction 11, 203-259.
B. Shapira, Y. Elovici, A. Meshiach and T. Kuflik, 2001, "PRAW – The Model for PRivAte Web", accepted for publication in JASIST.
Y. Elovici, B. Shapira and A. Meshiach, 2002, "A New Privacy Model for Hiding Group Interests while Accessing the Web", Workshop on Privacy in the Electronic Society, in association with the 9th ACM Conference on Computer and Communications Security.
P. Benassi, 1999, "TRUSTe: An Online Privacy Seal Program", Communications of the ACM, 42(2), 56-59.
Mr. John Stefani, DePaul Law Alumnus, "Finding Waldo: Preventing the Use of Face-Recognition Software on Political Rallies and Protests"
As more cities install surveillance cameras on street
corners and in other public areas, images of Big Brother haunt
defenders of civil liberties, while hopes of deterring crime
embolden elected officials. Concerns about surveillance are often
couched in terms of “privacy.” However, privacy is not the only
issue of concern: free speech and anonymity are also threatened with
the advent of facial recognition software.
This paper attempts to outline the importance of maintaining the right to public anonymity in the wake of technological advances that make in-person querying of identity superfluous. Drawing from both First and Fourth Amendment law, this paper seeks to establish a nexus between the right to public anonymity with respect to political speech, and expectations of privacy in public.
The Supreme Court has a history of providing legal protection for public anonymity. The Court has helped establish the right to withhold one’s identity in three relevant areas: writings, distribution, and organizational membership. In each area, a person is allowed to enter the public political arena without fear of identification or reprisal. Legal scholars and philosophers alike have an appreciation for the importance of this protection in the development of the political self. Indeed, our Founding Fathers relied on it when penning the Federalist and Anti-Federalist Papers.
Though the courts have ruled that government agents may not require people engaged in political expression to identify themselves, the development of biometric technology is rendering in-person identification unnecessary. Utilizing facial recognition software, surveillance cameras are able to locate and identify an individual within a crowd. This technology threatens to circumvent the courts’ current protection of public anonymity.
As it has with other technologies, the law must evolve to protect important Constitutional privacy values. From wiretapped telephone booths to thermal imaging of homes, the Supreme Court has ensured that privacy protections evolve with the government’s advances in surveillance technology. The Court has not allowed law enforcement agencies to eviscerate privacy protection by “enhancing” human senses through the use of increasingly sophisticated eyes and ears. This same technologically astute analysis should be applied to protecting political expression.
Therefore, the issue of using face-recognition software in tandem with public surveillance cameras needs to be addressed. Creating a database of faces for surveillance cameras endangers First Amendment expression in light of the immensity of the databases at the government's disposal. For example, many states maintain digital records of driver's license photographs; these databases could potentially allow officials to program the cameras to identify anyone with a driver's license.
Recent incidents suggest the potential for abuse and highlight the need for legal protection. The Denver police department was reprimanded for following peaceful protesters while recording names and license plate numbers in addition to photographing the participants. The activities of the police only came to light because the victims were aware of the police presence. What happens when the police no longer need to be physically present to achieve the same results?
If the argument that public anonymity must be protected from the use of biometrics is successful, the next step is tailoring a legal regime to guarantee those protections. Whether through the courts or through legislation, steps need to be taken to define the parameters for using this technology, as well as remedies for when those limits are broken.