The net has been buzzing with news that the US government wants data from Microsoft, AOL, Yahoo and Google to support its case in court for internet censorship to protect children. The first three surrendered the data while Google decided to put up a fight. The question getting the most attention around the net is whether the privacy of individuals is compromised. I, instead, think everyone is missing the big picture: “Can companies and governments compel others to give them data they would like to have simply because it is useful to them and might advance their case?”
Let’s get the privacy issue out of the way first. Whether individual privacy is violated depends very heavily on the data being turned over. If it is anonymized data, i.e., no IP address, user name, etc. that can be traced back to an individual, it would not be a problem. To reassure people that their privacy is intact, the companies involved and the US government simply have to disclose the nature of the data turned over. With this, people can make up their own minds on whether a massive privacy violation has taken place. We have a rule of thumb at work: if the data can be traced back to a group of fewer than 25 persons, then we are violating Data Protection and Privacy Rules.
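For the technically minded, that rule of thumb can be sketched in a few lines of Python. This is only an illustration of the idea, not how my workplace or any search engine actually checks it: the threshold of 25 comes from the rule above, while the function name `small_groups`, the field names (`region`, `hour`, `query`) and the sample records are all made up for the example.

```python
from collections import Counter

MIN_GROUP_SIZE = 25  # threshold taken from the rule of thumb above


def small_groups(records, keys, min_size=MIN_GROUP_SIZE):
    """Return the attribute combinations shared by fewer than min_size records."""
    # Count how many records share each combination of the chosen attributes.
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    # Any combination with too few records could point back to a small group.
    return {combo: n for combo, n in counts.items() if n < min_size}


# Hypothetical sample: 30 searches from one region, 3 from another.
records = (
    [{"region": "US-CA", "hour": 14, "query": "weather"}] * 30
    + [{"region": "FR-75", "hour": 2, "query": "weather"}] * 3
)

risky = small_groups(records, keys=("region", "hour"))
print(risky)
```

The more attributes you pass in `keys`, the smaller the groups become, which is exactly why extra detail in a dataset is what puts privacy at risk.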
A pile of anonymized search data from a million or more people is a rich source for a data mining operation. The US government is effectively asking the big four search engines to turn over their data so that it can mine it. The combined dataset is huge and is, in itself, much more valuable than the sum of the individual datasets. So, assuming a government or a big company is able and willing to pay for mining a large dataset, should anyone be compelled to hand over the datasets? I do not think so. In the case of litigation, I am willing to give the defendant more leeway to ask for a large set of data from the plaintiff to allow him to make his case, or vice versa. I cannot see why a totally unrelated third party can be asked to provide data not directly related to the court case simply because the defendant or the plaintiff believes it is helpful. Normally, a search of material held by a third party is very narrow and can be directly linked to the court case. Here, the relationship between the data requested by the government and the court case seems very slim, and a very long stretch at that.
Why? The US government hopes to use this to support its case for Internet censorship to protect children. But the nature of the data may mean that this data mining operation is meaningless. The bottom line is whether censorship of the internet, in the form proposed by the legislation under investigation, protects American kids, and what it implies for the freedom of American adults. It is important to note that the US government can only argue that the legislation will protect American kids, not English boys or French girls.
Here lies the problem: the datasets turned over probably consist of searches from all over the world, and the number of searches from overseas is highly likely to be significant enough to upset any conclusion derived. Then, another fundamental question is how one can distinguish between searches by American kids and searches by American adults. The American government is not an idiot that fails to realize this, which raises the concern of how much detail is needed for the data to be meaningful, and this gives rise to the privacy concerns above. Details are the bad guys when it comes to privacy. Again, only if they disclose the nature of the data turned over can we judge for ourselves.
Let’s hope more information comes to light to allow us to form an informed opinion on whether this oversteps the privacy mark.