|Topic area||Data Mining and Invasion of Privacy, a study of the potential impact of technological advances on society with specific reference to targeted direct mailings (junk mail) and Internet spam.|
|Target audience||Specifically students in a data base or data mining course, but appropriate to others who can understand the concepts of data mining and the problems of junk mail and spam.|
|Activity type||Pre-class research activity followed by in-class discussion, or a homework assignment.|
|Background needed to complete the assignment||An understanding of the principles and concepts of data mining, and the potential for "misuse".|
To study the potential impact of the indiscriminate use of data mining as a tool to improve the targeting of direct marketing (junk mail), and consequently of uses that more directly invade individual privacy.
Goals for the activity:
To awaken a student's understanding of the ability of computing technology to surpass the bounds of manual activity and to provide services that have a potentially negative impact if used inappropriately.
Knowledge / skills / attitudes to be developed (behavioral objectives):
To think ahead, when developing software or algorithms, about their potential negative uses, rather than simply expecting that users will act responsibly.
Prepare a hand-out in the form of a pre-class research activity:
In our studies of the impact of the computer on society, there are numerous examples of activities that would not be possible without the aid of the computer. The technique of data mining is one such advance in technology. The methodology allows users to gather data from disparate sources and to combine it into information that did not exist in any single source. It is possible to find interesting and (potentially) useful patterns of associations, correlations, dependencies and summarization in data. Examples include finding valuable nuggets of information in market sales data, finding correlations among individuals in U.S. census information, and identifying trends in demographics and election voting practices. Thus an individual who has given information to different data bases, believing that the data in each is innocuous, now finds that the assemblage of that data provides a portrait that invades his/her privacy. For example, combining healthcare data, credit data, student records, driving history, etc. could produce a very detailed portrait of a person.
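As an illustration for class discussion, the following minimal Python sketch shows how records that look innocuous in separate sources can be assembled into a single profile. All of the records, field names, and the `build_profiles` helper are hypothetical and invented for this example; matching on name and zip code is an assumption made for simplicity, and real record linkage is considerably more involved.

```python
from collections import defaultdict

# Hypothetical, made-up records. Each source on its own reveals little.
health_records = [
    {"name": "Pat Doe", "zip": "24060", "prescription": "insulin"},
]
credit_records = [
    {"name": "Pat Doe", "zip": "24060", "late_payments": 3},
]
driving_records = [
    {"name": "Pat Doe", "zip": "24060", "violations": 2},
    {"name": "Lee Roe", "zip": "24061", "violations": 0},
]


def build_profiles(*sources):
    """Merge records that share the same (name, zip) key into one profile."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Assumption: name plus zip code identifies the same person.
            key = (record["name"], record["zip"])
            profiles[key].update(record)
    return dict(profiles)


profiles = build_profiles(health_records, credit_records, driving_records)
for key, profile in profiles.items():
    print(key, profile)
```

Students might be asked which of the three sources they would have hesitated to give data to, and whether that answer changes once they see the merged profile.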
In a typical business context, data mining is rarely "one-shot"; it is more an iterative, cyclic process in which the whole methodology of "knowledge discovery" is integrated into the business process. This means that in some cases it is very tightly coupled to the actual process that generates the data, cleans it and preprocesses it to make it amenable to sophisticated analysis. A question for discussion is whether this tight coupling is a good thing, or whether it is better to treat these stages as distinct.
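The iterative character of knowledge discovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage functions `clean`, `mine_pairs`, and `discover`, the sample baskets, and the rule of raising a support threshold until few enough patterns remain are all hypothetical choices made for this example, not a description of any particular business system. The stages are kept as distinct functions here, which is one of the two design options raised in the question above.

```python
from collections import Counter
from itertools import combinations


def clean(transactions):
    """Preprocessing stage: normalise item names and drop empty baskets."""
    cleaned = []
    for basket in transactions:
        items = {item.strip().lower() for item in basket if item.strip()}
        if items:
            cleaned.append(items)
    return cleaned


def mine_pairs(baskets, min_support):
    """Mining stage: item pairs found together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def discover(transactions, min_support, max_patterns):
    """Repeat the mining stage, raising the threshold until few enough patterns remain."""
    baskets = clean(transactions)
    while True:
        patterns = mine_pairs(baskets, min_support)
        if len(patterns) <= max_patterns:
            return min_support, patterns
        min_support += 1


# Hypothetical sales baskets with inconsistent spelling and one empty entry.
raw = [["Bread", "milk "], ["bread", "Milk", "eggs"], ["milk", "eggs"], [""]]
support, patterns = discover(raw, min_support=1, max_patterns=2)
print(support, patterns)
```

Because cleaning and mining are separate functions, either can be replaced or audited independently; a tightly coupled version would fold both into the process that produces the raw data.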
Write a 2000-word report on data mining, its potential as a positive tool in the management of data and the assimilation of information, and the dangers of its misuse. The audience for the report should be non-technical but somewhat computer literate. Examine the relevant laws with respect to computer privacy and determine whether the use of data mining violates those laws.
If this is to be used as an in-class activity, then the following questions should be prepared on a sheet with space for student answers, to be completed in (say) the first 10 minutes of the class. A discussion of these points will then follow.
As a homework assignment, include these questions as part of the assignment.
The reports can be graded in the light of the in-class discussion that follows. This would be an opportunity to have students do peer evaluations of their colleagues' work. However, if this approach is used, establish a very clear but rigid grading schedule for each grader to use.
This activity could easily be used either in a course specifically on computer ethics and social impact or as a single activity within a course on data bases or data mining. It is a good example of how concerns for ethics and impact can be integrated into many of the other "technical" courses in computer science.
Author contact information:
Department of Computer Science
Blacksburg VA 24061-0106
Ph: (540) 231-5780
FAX: (540) 231-6075
In collaboration with Naren Ramakrishnan, Assistant Professor of Computer Science, Virginia Tech.