I'm Not a Lawyer…


Google recently announced a Prediction API:

“The Prediction API enables access to Google's machine learning algorithms to analyze your historic data and predict likely future outcomes. Upload your data…, then use the Prediction API to make real-time decisions in your applications.”

Interesting, and possibly very useful, but is Google going to keep a copy of my data for its own use? Or will it record some characteristics of that data (e.g., to improve its algorithms)? The former seems unlikely, but then, so did the success of JavaScript as a general-purpose programming language. The latter seems quite plausible, which makes me worry about accidental information leakage, particularly through post-facto correlation of whatever was recorded.
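
To make that worry concrete, here is a minimal Python sketch of one way retained aggregates could leak a value after the fact. Everything in it is hypothetical: the `Aggregate` class, the `summarize` and `recover_added_value` helpers, and the salary figures are invented for illustration, and nothing here reflects how the Prediction API actually stores or processes data. It assumes only that a service keeps a count and a sum per upload and discards the raw rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """Summary statistics a service might retain instead of raw rows (hypothetical)."""
    count: int
    total: int


def summarize(values):
    """Reduce raw values to a count and a sum, discarding the individual rows."""
    return Aggregate(count=len(values), total=sum(values))


def recover_added_value(before, after):
    """Infer a single newly added value by differencing two retained aggregates."""
    if after.count - before.count != 1:
        raise ValueError("differencing only isolates a value when exactly one row was added")
    return after.total - before.total


# Made-up data: salaries uploaded one day, then again the next day with one new row.
first_upload = [52000, 61000, 58000]
second_upload = first_upload + [75000]

# Neither aggregate contains a raw row, yet together they reveal the new one.
leaked = recover_added_value(summarize(first_upload), summarize(second_upload))
print(leaked)
```

Neither summary holds an individual record, but comparing the two pins down the one that was added. Real leakage paths would likely be subtler than this, which is rather the point of the concern.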

Their privacy policy doesn’t explicitly mention uploaded data (though it’s very clear about what Google will and won’t do with personal data). As of right now, their product-specific elaboration doesn’t seem to address it either. The Terms of Service for their APIs look promising, but Section 4 (“Your Data”) just says:

  1. You're responsible for your data.
  2. "Google claims no ownership or control over any of your Data. You retain copyright and any other rights you already hold in the Data, and you are responsible for protecting those rights, as appropriate." OK, but that doesn't say that they won't keep a copy of the data, or of any metadata or aggregated statistics that they calculate from it.
  3. "By submitting, posting, displaying, or transmitting Data on or through the Service, you give Google permission to process your Data for the sole purpose of enabling Google to provide you with the Service in accordance with its privacy policy." OK, so they're not allowed to process my data for anything other than providing the service I signed up for, but again, what about any derived metadata?

I’m being deliberately paranoid here, but past experience with licensing and other legal matters has taught me to assume that contracts mean exactly, and only, what’s written. I’d be interested in hearing from anyone who actually is a lawyer about whether my “aggregate, record, and use” scenario could be defended.