Opinion, Berkeley Blogs

Is Google reading your bMail?

By Chris Hoofnagle

Neither users of bMail nor the campus itself has ever received a clear answer to a simple question: Is Google subjecting data in Google Apps for Education to data analysis or mining for purposes unnecessary for technical rendition of the service?

A recently filed lawsuit suggests that Google is indeed applying analysis to our messages, but masking this behavior by not showing users advertising. In Fread v. Google, students from the University of Hawaii and the University of the Pacific allege:

"…Google does not serve targeted content-based advertising to Google Apps EDU users. Google nonetheless extracts the content and meaning from Plaintiffs' and Class Members' Sent and Received e-mail messages and uses that content for various purposes and for profit

[…]

From this reading, Google collects, extracts, and/or generates metadata consisting of "PHIL Clusters" (Probabilistic Hierarchical Inferential Learner). "PHIL Clusters" represent the meaning inferred from particular words or phrases in Plaintiffs' and the Class' Received and Sent e-mails.

Systems such as PHIL, or similar systems, learn "concepts" by learning an explanatory model of text.

Thus, Google's use of PHIL's concepts are designed and supposed to model the actual ideas that occur in Plaintiffs and Class members' mind in creating e-mail content."

Why should we believe these plaintiffs? For one, they are represented by attorneys who are litigating another case concerning Gmail, Dunbar v. Google. In the discovery process, these attorneys could have learned about this data analysis. We, the users of bMail, however, will remain in the dark, because Google has sealed much of the record in these cases.

As educational institutions, we are under a duty to supervise and control Google's maintenance and use of educational records. We cannot do this without having clear answers about Google's processes. The Fread lawsuit gives us an opportunity to get those answers: Google's filings with the court are under oath and scrutinized by the expert plaintiffs' lawyers in the case. As a System, we could call upon Google to provide us with unredacted versions of these materials.
