Alt Data’s Ethical Day of Reckoning

With the alternative data industry projected to be worth over $350 million by 2020, it's time to consider whether financial services is on the brink of its own Cambridge Analytica moment or if it is simply time for an alt data ethics evaluation.

In April 2017, during an atypical criminal case fitting for an era driven by technological advancements, a jury convicted Nan Huang, an experienced data analyst at Capital One Financial Corp., of insider trading.

Huang, hired to investigate fraudulent credit card activity, created detailed spreadsheets of private consumer credit card purchases from Capital One customers, amounting to 2.4 percent of the company’s sales. He then cross-referenced the non-public information with historical, publicly available data to analyze and create revenue projections for 226 publicly traded retail companies, which he used to execute securities trades before companies’ quarterly earnings announcements.

Using statistical analysis of the combined data, Huang traded in 105 of the 226 companies, amassing $4,403,545 in profits, or a 12 percent return on his initial investment, according to the Securities and Exchange Commission (SEC).

Stephen Graham, an SEC employee with expertise in economic and statistical analysis, testified during the case and told the jury the consumer credit card data used was a demonstrably valuable tool that helped Huang make profitable trading decisions. Although found guilty on insider trading charges, Huang faced no legal repercussions for exploiting confidential, individual transaction data for monetary gain. 

SEC v. Huang is the first case in which a court weighed the propriety of a financial professional’s use of alt data, and no case law yet exists on the use of alt data since the implementation of Europe’s sweeping General Data Protection Regulation (GDPR), enacted to protect individual data privacy. Although Huang violated Capital One’s employee policies by obtaining the data without company consent, consumer credit card data sold to third parties remains one of the most lucrative sources of data, with 38 percent of investment funds using it for an edge in the market. Web data is the most widely used at 48 percent, while social media and sentiment data stands at 36 percent, satellite at 29 percent, and geolocation at 24 percent.

The legal ramifications of insider trading in SEC v. Huang were clear as the verdict was announced, but the ethical implications of using personal information, such as consumer purchases, are the responsibility of each individual firm. As the scope of personal data collection has become etched into the public consciousness—largely due to Facebook’s breach of trust in the harvesting of personal information from its website for a British political consulting firm, Cambridge Analytica—firms using personal information such as credit card and social media data are on track to get swept up in the backlash.

Diana Ascher, UCLA

Diana Ascher, who received her PhD in information studies with a focus on data ethics at the University of California, Los Angeles (UCLA), began her career in business writing at Bloomberg. As the co-founder of the Information and Ethics Institute and the director of Information Studies Research Labs at UCLA, Ascher is well-versed in data ethics, and she predicts that fallout from data breaches in the technology sector will lead to an examination of alt data practices in financial services and, in particular, fintech.

“If firms look longer term, they will find it is in their best interest to consider data ethics at the outset and not in retrospect,” says Ascher. “Ethical data practices will help their decision-making in terms of what data is collected, stored, and transmitted, and how we think about data, the consumer, and the ripple effects of sharing even pseudo-anonymous data with third parties.”

Alt Data Anonymity

At a Reuters Newsmaker event in London in July, Financial Conduct Authority (FCA) chair Charles Randell painted a stark picture of a world where huge quantities of data gathered from every aspect of our lives are owned by a few, leaving the population “prisoners to technology.” Although GDPR has fundamentally altered data collection practices in Europe, Randell called for a continuous debate on how to ensure that innovation in financial services, from algorithms to big data, remains a “force for good” and doesn’t reduce people to numbers.

“We need to anticipate the fundamental questions that big data, artificial intelligence and behavioral science present, and make sure that we innovate ethically to shape the answers,” he said.

Data vendors and global companies have more access to a plethora of personal data than ever before. Datasets are being compiled on where we shop, how we eat, our political preferences, insights into intimate relationships, browsing history, and even our location, among many other facets of our daily lives and activities.  

Individual consumers using an app may not be aware that their location is being tracked to produce data on how many times they visited a Walmart parking lot, or why that’s important for statistical analysis, but hedge funds eager to capitalize on this information are buying in. According to Alternativedata.org, a website run by former buy-side and sell-side data analysts, the number of alternative data providers has grown from a little over 100 companies in 2010 to more than 350, with total buy-side spending on alt datasets expected to exceed $1.7 billion in 2020.

In order to be compliant with GDPR, vendors and financial companies are required to make datasets anonymous. Anonymizing datasets is touted by vendors as not only a way to avoid legal repercussions, but as a crucial component of ensuring consumer privacy.    

Marion Leslie, managing director of enterprise at Refinitiv (formerly Thomson Reuters), says the use of personal data does not focus on individual activity but rather produces a macro view of behaviors that provides a landscape view of market movers.

“The part of the industry that we serve and the services we provide are absolutely not about individuals, it’s about market performance, prediction, and insight into how we believe or insights into how the customers can [be] used to predict how markets behave. Certainly for us there is absolutely no interest in individual data,” she says.

Ensuring Data Protection

Winston Maxwell is a corporate partner at international law firm Hogan Lovells who specializes in media, communication, and data protection law and helps clients build and implement global data privacy governance and compliance programs. He says financial firms need to take responsibility for their agreements with alt data vendors, and that it’s essential for firms to conduct proper due diligence on each vendor before using its alt datasets.

Winston Maxwell, Hogan Lovells

“Investment advisors are very excited about alternative data, but the sort of training issues that we’ve been focusing on with financial institutions related to GDPR is ensuring in [firms’] contracts with the vendor that you have appropriate reps and warranties to make sure the data that’s being provided to the extent is fully anonymized, comes from a legitimate source, and has either the permission of the data subjects or some other legal basis to be used,” he says.

Maxwell notes that methods to gather and sell data are not always black and white, and that anonymization is one of GDPR’s gray areas. When referring to anonymous data, he said, “it gets pretty tricky. You get into the weeds of European case law and GDPR interpretations and what it means to be anonymous.” He adds, “What’s even worse is that what’s anonymous today may not be anonymous tomorrow because you may have tools or artificial intelligence and machine learning that can take a dataset and make it not anonymous.”

Under GDPR, data with individual identifiers removed is considered anonymous and not subject to the same protection requirements as data where the subjects are easily identified. But the wording of the second part of GDPR Recital 26, related to anonymous data, makes the law a bit tricky. 

“To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments,” GDPR Recital 26 states. 

In a study by the Media Lab at the Massachusetts Institute of Technology (MIT), researchers looked at the re-identifiability of three months of credit card metadata for 1.1 million individuals and concluded that anonymized financial metadata can be re-identified with spatiotemporal information. For instance, simply knowing the approximate price of someone’s coffee, even when other key identifiers are removed or scrambled, increases the chance of that person being identified.

The researchers took it a step further and studied coarsened data, in which values are recoded into broader categories and some variables are suppressed, and found that coarse data still provides “little anonymity.”
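The re-identification risk described above can be illustrated with a toy calculation. The sketch below is not the researchers’ code or data: the four pseudonymous traces, the `coarsen` and `unicity` helpers, and the simplified definition of unicity (the share of all k-point combinations that match exactly one person) are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical toy records: pseudonymous id -> set of (shop, day, price_bucket) points.
TRACES = {
    "u1": {("cafe", 1, "low"), ("grocer", 2, "mid"), ("books", 3, "mid")},
    "u2": {("cafe", 1, "low"), ("grocer", 2, "high"), ("gym", 4, "mid")},
    "u3": {("cafe", 1, "low"), ("grocer", 2, "mid"), ("gym", 4, "mid")},
    "u4": {("bakery", 1, "low"), ("grocer", 2, "mid"), ("books", 3, "mid")},
}


def coarsen(traces):
    """Suppress the price bucket, keeping only (shop, day) for each point."""
    return {
        user: {(shop, day) for shop, day, _price in points}
        for user, points in traces.items()
    }


def unicity(traces, k):
    """Share of all k-point combinations that match exactly one user's trace.

    Simplified stand-in for the idea of unicity: an outsider who knows k
    points about a person checks how many traces contain all of them.
    """
    unique = 0
    total = 0
    for points in traces.values():
        for known in combinations(sorted(points), k):
            matches = sum(1 for other in traces.values() if set(known) <= other)
            total += 1
            if matches == 1:
                unique += 1
    return unique / total


if __name__ == "__main__":
    print("with price, k=2:   ", unicity(TRACES, 2))
    print("without price, k=2:", unicity(coarsen(TRACES), 2))
```

In this made-up dataset, dropping the price bucket lowers the share of uniquely identifying two-point combinations but does not bring it to zero, which mirrors the pattern the article describes without reproducing the study’s actual figures.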

Ascher notes that the industry has a problem with anonymity and that it should no longer promote datasets as anonymous even if data is encrypted and aggregated. 

“Vendors are looking at short-term gains, not long-term ethics,” she said. “There isn’t such thing as secure data, even if we have the strictest protocols in place to protect data today, we have no idea what type of technologies are going to be developed tomorrow that can overwhelm those systems or work around them.”

Unacast, a geolocation data provider seeking to “understand human mobility,” compiles its data from apps and location data companies. Unacast recently raised $17.5 million in funding with significant investments from WhiteStar Capital, Open Ocean Capital, and European telecom giant Telia. Norwegian native Thomas Walle, Unacast’s co-founder and CEO, says many geolocation vendors from Europe fled to the US due to GDPR concerns.

Thomas Walle, Unacast

He mentions an increase in the number of hedge funds and asset managers interested in the datasets Unacast sells, which Walle says help deliver insights on where people work and live, what types of stores they visit, and at what time. He notes that it is possible to link anonymous data back to individuals, but asks, “Do companies really want to do that? Do they benefit from that? No.” He adds that there are very few instances of data misuse in the industry.

Walle says a primary driver for firms fleeing to the US, rather than complying with GDPR mandates, is the inability of certain vendors to account for all the personal data they hold when individuals decide to opt out. The opt-out function requires data collectors to delete all the data associated with specific individuals if those individuals choose to limit the sharing of their data. If a vendor doesn’t have the technological frameworks to locate all of that data, compliance can be a challenge, he says.

GDPR is championed as creating a better relationship between users and data collectors through responsible, transparent data practices, Walle says. Although the number of geolocation data providers in Europe has gone down, data quality has increased among those that remain, which is significant for hedge funds, he adds.

After GDPR went live, web users were inundated with user agreements and consent forms, necessary for compliance and designed to give insight into how personal data is used, as well as how to opt out. Still, critics argue that convoluted consent forms don’t go far enough to explain how data is being used. Walle says the opt-out rate is “next to nothing,” citing a partner with 2 billion users, of whom only about 30 withdrew their consent.

This could be chalked up to individuals being content with the way their personal data is shared or it may indicate a lack of consumer awareness around the importance of consent forms for data protection. Ascher says it’s the latter, noting that the expectation that ordinary consumers know their data is being monetized and sold is problematic.

“Consumers are playing catch-up in terms of information literacy and what’s being done with the information that they are transmitting,” she says. “People don’t realize they’re giving away their personal data for free and that data is being monetized by big companies and sent to other interested parties, who not only are making investment decisions based on personal information that’s gathered a variety of ways, but also can limit your access to opportunities based on the data that is associated with your profile structurally.”

Charles Randell, FCA

The FCA’s Randell similarly stated that to engender trust with consumers, companies must articulate their approach to using consumer data in a coherent and easily understood manner.

“By good communication, I don’t mean pages and pages of obscure disclosures, disclaimers, and consents. I mean short and readable statements that make it clear what firms will and won’t do with their customers’ data,” Randell said, adding that data statements should be developed in partnership with consumers, “not imposed on them.”

Scrutiny of data privacy and sharing practices is magnified by data breaches, such as the October 2018 news about a bug in Google’s API for its Google+ service, which allowed third-party developers to access information on users and their friends. The public is no longer indifferent to data practices, as evidenced by California’s move in June 2018 to put data privacy at the forefront of legislation by tightening consumer laws through the California Consumer Privacy Act.

But for hedge funds and financial firms, Ascher asks: “How much data do you really need to make informed decisions that put you ahead of your competitor?” 

Getting Ahead with Ethical Frameworks

David Saul, chief data scientist and senior vice president at State Street Corp., regularly assesses and proposes new technologies and says financial firms need to distinguish between good intent or harmless intent and use data appropriately. 

David Saul, State Street

“I think we’re much better off approaching [data ethics] intelligently and putting appropriate regulations in place before we have a major incident,” he says. 

For both Saul and Gideon Smith, a 20-year quant veteran at Rosenberg Equities, a subsidiary of AXA Investments, applying thorough vetting processes for alternative data providers has been fundamental for both compliance and ethical governance. Smith says both vendors and datasets are evaluated for quality, reliability, accuracy, sourcing, and ethics. 

“If companies are transparent about how they use the data and consumers and companies see that very clearly, it’s not only going to protect the data, I believe it’s going to generate greater business, because when people have more trust in the data then they’re going to invest more,” says Saul.

For Ascher, avoiding a data scandal means staying ahead of regulations and technology. She suggests that requiring certification programs for data ethics and applying a code of ethics for data management would be steps in the right direction.

“Many companies have no data management plan and have no transparent communication of what that ethical code is. As consumers become more literate about these issues, especially in light of lawsuits against Facebook, Twitter, and social media platforms, we really need folks to be talking about other sectors like fintech, where financial information is tied directly to personal information and it is very valuable,” says Ascher, adding that if consumers refuse to give up their data for free, it makes it harder for companies to capitalize on individuals’ personal information. 

“Selling the data, that will never go away,” says Walle. “All this goes to business models that are tailored by apps, publishers, and the industry. I agree with critics that apps and companies have to be better at showcasing and being concerned with the users and how the data is being used.” 
