Karela Fry

Data grabbers: the market opens

The Financialist, a house magazine of Credit Suisse, talks about data mining:

A data scientist is somebody who can understand business problems, who can actually do an analysis that informs a solution to a problem and then communicate it successfully. But they do it by using a skill set that has never before been combined into one profession.

The basic skills are the technical abilities to get data out of a system and process it, and perhaps build infrastructure on top of it—that’s engineering and hacking. Then you need to do an analysis—that’s statistics, linear algebra and probability theory on the math side. And then the last piece is the combination of social science and curiosity and understanding business—asking the right questions, translating them into your mathematical and engineering analysis, then translating that into something you can actually talk to other human beings about.

I haven’t hired anyone with a data science master’s degree because the programs are just starting. Data scientists come from all different fields, including a lot of academic scientists who are leaving academia and who can be trained to communicate. I’m a computer scientist, and I work with an astrophysicist, a physicist, another computer scientist and a mathematician. But I have peers in other companies and universities who come to it from political science and psychology. It’s such a young field that people are arriving in it from many different directions.

The Guardian publishes a well-argued piece on what data mined from small products could reveal about you:

The rental terms for the [Oxford English Dictionary] and other [Oxford University Press] rental services are about what you’d expect. The company reserves the right to retain your personal information – everything that goes into a credit-card validation, including your name, address and phone number – and to store every click and query you make while reading the book; as well as which IP address you visit the site from, and to link these things together. It reserves the right to retain this information indefinitely.

It reserves the right to disclose it to a long list of parties, including loosely defined “affiliates,” for a wide range of purposes, including equally nebulous concepts like “security.” And it reserves the right to treat this data as a company asset, and to sell it along with the company, should the publisher ever be sold on.

Taken together, these terms grant the publisher permission to track your movements, your interests, and even your personal relationships (if you log into the service from a friend’s IP address, say). They grant the company the right to disclose that information now or at any time in the future.

And they allow a third party to buy the company and change the terms under which the data has been gathered, to sell it piecemeal or to publish it.

In this regard, the digital editions of the OED and HTOED are no different from many other digital “products”. OUP’s terms are not the worst in the digital world, though they’re far from the best. What is exceptional is for these terms to be applied to the most significant reference books on the subject of the English language in the world.

Data mining has been discussed widely since the Snowden affair. A flagrantly illegal form (think of the Will Smith movie Enemy of the State) is reported in the WSJ:

National Security Agency officers on several occasions have channeled their agency’s enormous eavesdropping power to spy on love interests, U.S. officials said.

The practice isn’t frequent — one official estimated a handful of cases in the last decade — but it’s common enough to garner its own spycraft label: LOVEINT.

Spy agencies often refer to their various types of intelligence collection with the suffix of “INT,” such as “SIGINT” for collecting signals intelligence, or communications; and “HUMINT” for human intelligence, or spying.

The “LOVEINT” examples constitute most episodes of willful misconduct by NSA employees, officials said.

In the wake of revelations last week that NSA had violated privacy rules on nearly 3,000 occasions in a one-year period, NSA Chief Compliance Officer John DeLong emphasized in a conference call with reporters last week that those errors were unintentional.


Written by Arhopala Bazaloides

August 26, 2013 at 3:47 am
