Monday, July 06, 2009

Data Mining Kids and Teens' UGC and Social Networks

Via YPulse, and in follow-up to an earlier news item the site broke last week, some coverage and discussion of Echometrix's new service PULSE, a self-described "real-time web based sentiment analytics tool that specializes exclusively on teen data (ages 7 to 21)." The service is already getting some buzz and news attention, and my immediate thoughts and concerns are no more positive now that I've read some of the initial reactions.

First off..."teen data"? Since when are 7-year-olds considered to be teenagers? What they really mean is children and teens, so COPPA should be in full effect here, but anyway. The confusion of kids and teens in Pulse's own descriptions of the service obviously makes it all the more problematic, not to mention difficult to analyze. Although it's possible that they really do distinguish between teens and kids in terms of the data they actually collect in some cases, in others (such as UGC) they are quite explicit about including data from both kids and teens. Furthermore, according to their URL and their Tween Pulse Research Blog, the company is pretty aware that they're not really limiting their research to teens; it probably just sounds more palatable that way.

The rest of the claims made in Pulse's corporate materials are downright creepy, and basically boast about mining youth-produced UGC, blogs, forums, websites and chatrooms, along with any IM that "teens" engage in through the company's other product, "FamilySafe." According to Proudfoot, and as the product name would suggest, FamilySafe is an Internet security program that monitors and analyzes everything a child does online, and then alerts the child's parent of "anything alarming" through text messages. So far, FamilySafe monitors approximately 150,000 young people in the US and Canada, all under the rubric of providing parents with more "control" over their kids' online activities. Of course, it's highly doubtful that parents who have signed on to FamilySafe were considering the larger market value of the "non-alarming" information that is also collected from their children through this service, but then again I guess you can't really be all that surprised when you discover that the market research firm you've paid to monitor your kids has a hidden agenda. Sorry to all the parents who bought FamilySafe unaware of its market research implications, but in this day and age of widespread corporate surveillance, a little research into the company's ownership and a thorough reading of the terms of service is pretty much mandatory when installing software that will track your/your kids' every online move. On the other hand, this is also a great (and possibly very devious) example of the corporate misuse (abuse?) of parents' safety concerns as a Trojan horse for market research and/or marketing.

Anyway, here's an excerpt of The Pulse product description:
Every single minute PULSE is aggregating the web’s social media outlets such as chat and chat rooms, blogs, forums, instant messaging, and web sites to extract meaningful user generated content from your target audience, the teens!

PULSE contextualizes the aggregated content and provides instantaneous customized summaries in real time of the teen market. The PULSE identifies, evaluates, and graphically displays a wide spectrum of analytic information relating to the type, tone, grade, frequency of communications, impressions, needs, desires, hopes, dreams and wants of this teen audience who live on the Web.

[...] we focus on user generated content, which is the only data source solid enough to reveal the author’s true attitude and emotion. We provide you with access to 100% unbiased, unfiltered, and user group generated content from a vast network of teen focused content sources such as forums, blogs, chats and IM conversations.

Well, as unbiased as any of us are when we post stuff online. There is certainly an element of performativity within individuals' online identity management practices that is being ignored here. But this very critique can also become a slippery slope into dismissing the importance of what Pulse and their contemporaries are doing. A good example of this is found in the article discussed in today's YPulse post about, um, PULSE. The article, written by Shannon Proudfoot, describes the invasive nature of the data mining service but quickly jumps to downplaying its importance by positioning the service within the context of that standard old argument of 'how little it will ultimately matter because you can't really find anything out that way, and people don't post real info about themselves online, and aren't marketers just so out of touch with the youth, etc., etc.'. Not that I recommend starting a moral panic around this or anything, but there's got to be a better way of exploring these things without it coming off as either insanely and unrealistically pessimistic or as insanely and unrealistically optimistic. Luckily, anastasia is much more nuanced in her discussion of the article, outlining the perceived weaknesses of the Pulse methodology as well as highlighting the need for regulation "when it comes to mining data of internet users under the age of 18."

In terms of the methodology, obviously the corporate description is overly celebratory and vague. anastasia seems optimistic that the data collected won't be all that useful, as do the teen and IT expert Proudfoot interviews in her article. Granted, Proudfoot's interview with Echometrix CEO Jeffrey Greene doesn't reveal much to the contrary, as he glibly dismisses tried and tested qualitative research methods (which the marketing industry has really perfected over the past three decades) in favour of Pulse's own quantitative approach:
"Services like Pulse are in huge demand because they provide nearly instant feedback in a swiftly changing media environment, Greene said, and fly-on-the-wall results are much more accurate than traditional market research.

"Teens are so clever that people who attempt to do research in the teen marketplace often tell us that teens 'game the system,' " he said. "When teens participate in an online poll or a focus group, they know or think they know what answer we want to hear, so that's the answer they provide."

The company said Pulse predicted Kris Allen's surprise American Idol victory before the results were announced in May. Teens talk about iPods 13 times more than the Zune MP3 player, the program reveals, and the iPhone gets four times more buzz than the BlackBerry."

But the thing is, the data they're collecting includes a lot more than the mere number of times a particular brand name is mentioned, and we would be wise to remember that data mining technology is advancing at lightning speed before assuming that their methods are ineffective just because they're not releasing any "rich" proprietary information to the press. YPulse's anastasia describes PULSE's methodology as lacking in comparison to focus groups and surveys because the data is likely to be decontextualized, an opinion also expressed by Proudfoot's teen insider. But rather than think of PULSE's market analysts as a bunch of out-of-touch suits, who "at 50-something has no idea how a teenager thinks, saying, 'This is really interesting!'", we should instead suspect that some pretty on-the-ball, tech savvy, youth culture (and behaviour) experts are much more likely to be the ones interpreting the data collected...and rather than looking solely at frequencies, I have no doubt in the world that they will also be tracking how teens talk and when, to whom and how their opinions change over time, and applying all sorts of rich qualitative methods in their analysis (discourse analysis, trend analysis, profiling, identifying archetypes, etc.). That's the beauty of data mining...the info you can collect is vast and the connections you can make between units of data boggle the mind, and there's always the potential for rich interpretation...even if that's not what Echometrix is currently promoting (or revealing, or even doing).

Proudfoot also interviews an independent technology analyst, Jesse Hirsh, who expresses similar reservations about the efficacy of the system, stating: "Nobody ever posts an honest Facebook photo of themselves. They post the best Facebook photo of themselves, so they're not really being honest, are they?" But then again, doesn't that picture actually tell us quite a lot about what that person wants, even if it's not such an accurate picture of who they are? What they want to portray to others, how they would like to see themselves, what they think of as an ideal or appropriate or funny public display, etc.? And isn't marketing all about identifying and exploiting our wants and ideals?

Overall, what this story tells me is that even if PULSE doesn't succeed with their own attempt to data mine youth-produced UGC, the fact remains that the technology is out there and its ability to exploit kids' online contributions, thoughts and communications is being overtly and unapologetically promoted as such to the public. If there were any lingering doubts that kids' UGC requires some regulatory protection to prevent its misuse and misappropriation by (adult-led) corporations, this newest case study should finally put them to rest. It's in the public domain, but are kids' authorship rights really being fostered and adequately supported in this kind of environment? Perhaps it's time for a Creative Commons-produced terms of service, which teens and children can put on their websites, blogs and forums, explicitly and formally forbidding users of the site (human or automated, including Echometrix's webcrawlers) from appropriating content published on the site for profit without explicit permission of the author...I mean, isn't that how the public domain is actually supposed to work anyway?

On a more positive note, big props to Proudfoot for interviewing a teen as one of the experts on this clearly teen-relevant issue. Awesome!


Shaping Youth said...

May I crosspost this one? AJ

Sara M. Grimes said...


Shaping Youth said...

Thanks, you're live on Shaping Youth today!

And I think you've just kicked off a mini-series for me...(just when I was wrapping up healthGAMERS, this data mining/privacy stuff is REALLY usurping my mindshare...yowza.)

Talk soon, great job! Keep up the good work, and btw, didn't you officially become 'doctoral' this year? --A.