GameStop post-mortem: Alt data world confronts eroding barrier between online and real life

After Redditors staged an epic short squeeze against a handful of hedge funds, some in the industry are left wondering whether today’s models and data techniques are prepared for a world where online often equals real life.

  • Like most corners of life, social media is bleeding into trading and investment decisions, both in retail and institutional markets. 
  • After a group of Reddit users banded together to short squeeze hedge funds Citron Research and Melvin Capital, both of which were shorting the stock for video game retailer GameStop, the financial system was blindsided as the stock rose hundreds of dollars practically overnight. Despite a healthy and well-funded community of alternative data providers—which the buy side employs for millions of dollars to spot early trading signals—GameStop online sentiment went undetected or ignored while the play bubbled up on the site for months. 
  • The saga unfolded just three weeks after an armed mob stormed the US Capitol, where law enforcement stood unprepared and overpowered despite bountiful evidence on social media that some attendees were planning violence. Faced with these two sagas, it’s clear that the line that used to starkly separate real life from online is eroding quickly. 
  • Now, with Congress holding an ongoing series of hearings to investigate the GameStop/Reddit debacle, and with the National Guard still occupying the US Capitol, citing a continued threat, some in the industry—and surely beyond it—are wondering how they and their models can keep up with a world where the nature of factual, credible, and important information is changing shape.
     

“While I do not think that anyone could have anticipated these events, I’ve learned much from them, and I’m taking steps to protect our investors from anything like this happening in the future,” said Gabriel Plotkin, Melvin Capital CEO.

Plotkin testified as part of last month’s congressional hearing, the first in a series of ongoing hearings addressing the now legendary tale of GameStop, Reddit, Robinhood, and a handful of hedge funds, one of which is Melvin. Within a story layered with memes, irony, and, at times, sheer absurdity are many smaller stories prompting debates and rethinks across nearly every corner of the US financial system, from settlement times, to risk, to short-selling, to the power of alternative data.

WatersTechnology spoke with an array of data users and providers to understand two key things: Could someone have anticipated this? And how will they prepare for the next time? Because almost certainly, agreed most sources, there will be a next time.

Blind spot

In January, day traders on Reddit—a site with 52 million daily active users, more than 100,000 communities, and 50 billion-plus monthly views—initiated a short squeeze of epic proportions primarily against hedge funds Melvin Capital and Citron Research, which were shorting the stock of video game retailer GameStop. The play bubbled up on the subreddit r/WallStreetBets (which hit 1 million subscribers last year, and is now up to more than 9 million) over several months prior to GameStop’s all-time high on January 28.

Since then, its stock price has receded and risen again in waves, though it has come nowhere near its $483 peak. On the day of publication of this story, it had once again surpassed $150 on the New York Stock Exchange. For contrast, GameStop was trading around $4 per share last summer. Prominent names in the alt data space, like Thinknum and Eagle Alpha, have said they’ve noticed increased demand for Reddit-related datasets, Business Insider recently wrote.

But this case raises the question for these providers, and the hundreds like them, of why they weren’t providing this type of data in the first place, or at least monitoring it. The global alt data market, valued at $1.64 billion in 2020 and projected to reach $17.35 billion by 2027, has become a lucrative business with vast room left to grow, according to a report by US-based consulting firm Grand View Research. Those figures cover data sources spanning credit and debit card transactions, email receipts, foot traffic records, mobile app usage, satellite and weather data, social and sentiment data, web-scraped data, and web traffic.

The latter three—while distinct from one another—fall into a similar, internet-centric vein. Social and sentiment data—particularly after more than a year of social distancing and limited in-person interactions—is largely derived from online activity, through social media giants like Facebook and Twitter, which many of these alt data companies have longer histories of monitoring than, say, Reddit. For that, there are several valid reasons.

Since its founding in 2005, Reddit has done a good job of relegating itself to some of the more niche and even darker corners of the internet, though certainly not to the extent that other similar internet forums like 4chan and Something Awful have. With its more traditional bulletin board-style layout, it lies somewhere in the middle of those forums and the continually evolving, major platforms, though in numbers and in relevance, it’s closer to the big players.

Reddit users coalesce around thousands of communities, or subreddits, which are organized by conversation topics. Several of these communities are aimed at creating and spreading images with the intent to make them go viral, lending credence to Reddit’s longstanding reputation as a meme factory.

For example, one meme known as Pepe the Frog—an anthropomorphic frog with many variations—originated in a 2005 comic strip, but grew to mainstream popularity through the likes of 4chan, Tumblr, and Reddit, before it devolved into a symbol of the alt-right and white nationalism, and the Anti-Defamation League added it to its list of general hate symbols. (Reddit has not banned the meme from its platform.)

On top of sitting upon a meme treasure trove (or landmine, depending on who you ask), Reddit “dialects” proliferate across the site, and their exact number is nearly impossible to pin down. For one, there’s memespeak, as in the word “thick” written as “thicc,” or “dogs,” which on the internet becomes “doggos.” Now that the world is more than a decade into the rise of social media, this language has a relatively long history, and given its ability to morph and spread like wildfire, no one is completely fluent, especially not anyone who goes outside very often. That’s a huge problem for the machine-learning models used in alt-data scouting.

For another, many of Reddit’s 130,000+ communities are hyper-focused, or industry-specific, bringing in a slew of jargon, some recognizable and some not. While r/WallStreetBets contains much language and information that is decipherable to financial experts, it’s also home to lingo like “diamond hands,” which refers to holding a valuable stock over a long period, “paper hands,” which refers to closing out a position the moment the market shifts, and (I’m sorry) “tendies,” a term with a rich, 4chan-derived history that, for time and sanity’s sake, can be summarized as short for chicken tenders and used to describe the payoff, or gains, from an investment.

So what’s a little financial machine-learning model—trained on its millions of reliable and straightforward data points, like regulatory filings, earnings calls transcripts, or anonymized credit card transactions—supposed to make of all that? And what does it mean for trading firms hoping to find early trading signals ahead of the pack, or mitigate their risk, or react when another group stages a coordinated attack against them?

Information vs. data vs. data and information in tandem

“I think the world is figuring out that there’s a huge difference between information and data,” says Bob Sloan, managing partner at S3 Partners, a data and research company that provides real-time short interest analytics to hedge funds and financial institutions. “Information is things that are mass consumed, and data are the things that actually get the signal in the noise. But what happens is that we consume a lot of information. And we don’t necessarily know how to use data.”

He recalls a story published by the Washington Post in December, in which a Harvard researcher used Amazon reviews of scented candles containing phrases like “no scent,” “no smell,” and “can’t smell” to flag potentially undiagnosed cases of Covid-19, a symptom of which is loss of smell. The researcher found that the share of scented candle reviews containing these terms nearly tripled from January to November, rising from about 2% to 6%.

“That’s using data in a mass of information,” Sloan says. But that’s no easy task.
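The measurement behind the candle story, the share of reviews per month that mention a given phrase, can be sketched in a few lines of Python. This is only an illustration: the `(month, text)` input shape, the `monthly_share` function name, and the sample reviews are assumptions made up here, not the researcher's actual data or method.

```python
from collections import defaultdict

# Phrases quoted in the article; the tuple name is hypothetical.
NO_SMELL_PHRASES = ("no scent", "no smell", "can't smell")


def monthly_share(reviews, phrases=NO_SMELL_PHRASES):
    """Return {month: fraction of that month's reviews mentioning any phrase}.

    `reviews` is an iterable of (month, text) pairs (an assumed input shape).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, text in reviews:
        # Normalize case and curly apostrophes before matching.
        lowered = text.lower().replace("\u2019", "'")
        totals[month] += 1
        if any(phrase in lowered for phrase in phrases):
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in totals}


# Made-up sample reviews, for illustration only.
sample = [
    ("2020-01", "Lovely lavender candle"),
    ("2020-01", "Burns evenly, great gift"),
    ("2020-11", "No smell at all, very disappointed"),
    ("2020-11", "Nice jar"),
]
shares = monthly_share(sample)
```

A plain substring match like this is the crudest possible signal. It says nothing about why the share moved, which is the harder part of turning information into data.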

If we take away anything from the GameStop saga, it’s that the nature of valuable information, and thereby data, is changing, from where it lives to how it’s deemed accurate and relevant. Alt data signals have long been referred to as needles in haystacks. But as data providers and users begin, or rather are forced, to treat uncharted, nuanced territory such as Reddit and the video-centric YouTube as paramount to protecting their clients’ and their own investments, the stack reaches the size of mountains and oceans, all while the needles remain specks.

Other than the platforms used and assets at stake, it’s impossible to separate the retail and institutional investor in this story. Under normal circumstances, though, another huge factor separates the two: access to high-quality, timely data and information. In this case, the information was readily available to both, if they were looking, though the retail investor had the edge: they were the ones disseminating the information, encouraging fellow investors to follow suit.

On one hand, it’s a huge win for the democratization of data, and it challenges the power dynamics that have for so long colored the divide between those that run the country’s financial system and those that are merely affected by it. But like most things that involve technology and human nature, there is a dark side, often unintended, exacerbated by a widening gulf between America’s rich and poor.

Speaking as part of a virtual panel hosted by the CFA Society of New York on March 24, Lex Sokolin, global fintech co-head of ConsenSys, a blockchain software company, described an environment that, over the last 20 years, he feels has allowed unaddressed, much larger issues than GameStop, Robinhood, and Reddit to fester and lead the financial system exactly where it found itself on January 28: dumbfounded.

“There’s a series of huge macroeconomic and political changes that have crowbarred the sanity of the financial markets and turned them into, broadly speaking, a circus for a variety of reasons. It’s not that Gen Z is cheap and only has 20 bucks to invest and therefore, Robinhood. But it’s the fact that Elon Musk has defeated Warren Buffett in being relevant. It’s the fact that the Federal Reserve has printed $6 trillion of assets and that interest rates are at zero. There’s nothing to invest in for the normal person. It’s the fact that being an expert and having knowledge has been debased, and is worthless, and whatever you feel is what matters. And it’s the fact that student debt per person has never been worse, and on average, Americans are broke. And so the only equity worth buying is a lottery ticket. Nothing else is worth buying,” he said.

Kathleen DeRose, a clinical associate professor at NYU’s Stern School of Business, echoed Sokolin on the same panel, drawing an analogy between today’s political polarization and the information markets that serve as the lifeblood of social media giants.

“One of the challenges when you remove all these frictions, and you create kind of this lottery ticket thinking, is: What information are these folks trading on? And I think that’s an area of concern because it appears to be no information or disinformation,” she said. As an example, she offered yesterday’s generation of Democrats and Republicans, who once shared, at least in the mainstream, an overlapping middle ground on many hot-button issues. Social media has largely driven the erosion of that space, she said. “Should that same phenomenon happen in information markets that are financial markets, I think that’s something that could potentially be very negative.”

Scraping by

Neil Bond, former head trader at Ardevora Asset Management who left the business in April 2020, has been a critic of web-scraping technologies, telling WatersTechnology in July 2019 that Ardevora had piloted a project using web-scraped data, but ended up dropping it as it took too much work and didn’t add much value to the firm’s alpha generation.

“Knowing when these weird events are happening is useful to traders and there was no missing out when the prices started to move,” he says. “We were all talking about it, but really, we were thinking how ridiculous a situation it was.”

Today he acknowledges there are a number of young and in-the-works technologies that purport to separate credible from non-credible information (or “high-quality” from “low-quality”) on social media sites and online forums, which would ease some of the workload borne by firms’ internal data scientists.

Some major companies, like Bloomberg and Liquidnet’s acquired business Otas Technologies, try to serve these functions already. The Bloomberg Terminal’s media heat signals have given traders like Bond trading ideas and early warning signals by using its newsfeed and setting up alerts for names contained in the firms’ portfolios. A column of the news feed would be devoted to metrics such as positive and negative words associated with media coverage and the number of people reading a relevant news story. At Ardevora, Bond and his traders also used Otas, a provider of analytics to the buy side before its incorporation into Liquidnet’s new investment analytics division, in a similar fashion to understand sudden price movements of their holdings and to set up stop-loss triggers.

“How do web-scraping tools arrive? And what is good news and bad news? And what’s a good news source and a bad news source? Reuters and Bloomberg journalists will get a higher rating than a Reddit forum, obviously,” Bond says. “But I think they will become more and more important because of the fact that [Redditors] have been able to move these share prices so dramatically. So they’ll be given a higher weighting, but I think that will just be temporary.”

If that’s true, and if any added emphasis is only temporary, the financial system could very well find itself back to a position like the one it occupied on January 28. To draw upon an old adage, once you see a bandwagon—or once Reuters and Bloomberg are writing about it—it’s already too late.

Keeping score

Joe Gits, CEO and co-founder of Social Market Analytics, built his data company around Twitter, of which it is a licensed partner.

“That was our very first database,” Gits says. “We have scored pretty much everyone that’s ever tweeted about an individual security.” Social Market Analytics scores those tweets based on nine different metrics, such as how often the account is retweeted, how often other users respond to it, and how accurate it is when it’s referencing a particular asset. It does this for all US equities, futures, cryptocurrencies, and foreign exchange (FX).

Founded in 2011, Social Market Analytics analyzes non-traditional and traditional data—such as Edgar filings—to derive market intelligence for asset managers and hedge funds. Currently, the company is beta-testing a new, Reddit-specific data product in response to multiple requests by hedge fund clients since January. Gits says he had thought about incorporating Reddit in the past, but had other products to prioritize first.

“We’re going to start with WallStreetBets. We’ve got a couple other [threads] we’re looking at. We have to rank the commentators. Some of our natural language processing (NLP) is going to have to be a little bit different. There’s a lot more (believe it or not) slang on Reddit than there is on Twitter. So there is some development work. There is a good amount of development work associated with it,” Gits says.

Now that Reddit has rocketed to priority status for Gits and his company, Reddit posts will be scored on the same metrics the company uses for tweets. The first objective of the scoring process is to pinpoint how influential certain posters are, followed by their accuracy. In some instances, they will find posters who are not particularly influential in terms of engagement from other users, but are uncannily accurate when talking about certain stocks; even if they have zero engagement, those users will rank higher than more prominent “influencers.”
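To make the ranking idea concrete, here is a minimal Python sketch in which accuracy outweighs engagement. It is not Social Market Analytics’ model, which the company says uses nine metrics. The `Poster` fields, the `score` function, the 0.8 weight, and the two sample accounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Poster:
    name: str
    engagement: float  # hypothetical 0..1 measure of replies and upvotes
    accuracy: float    # hypothetical 0..1 hit rate of the poster's past calls


def score(poster, accuracy_weight=0.8):
    """Blend accuracy and engagement; the 0.8 weight is an illustrative guess."""
    return (accuracy_weight * poster.accuracy
            + (1 - accuracy_weight) * poster.engagement)


# Made-up accounts, for illustration only.
posters = [
    Poster("loud_influencer", engagement=0.9, accuracy=0.4),
    Poster("quiet_analyst", engagement=0.0, accuracy=0.9),
]
ranked = sorted(posters, key=score, reverse=True)
```

With these made-up numbers, the account with no engagement but a strong track record sorts ahead of the popular but less accurate one, which is the behavior Gits describes.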

“We’ll bucket it by account, and then we’ll bucket it by security. So this is a conversation on Tesla. This is a conversation on Sarepta. And those conversations are different. Sarepta is a drug company, so you’ve got to be able to handle that differently. Cancer is bad. Curing cancer is not bad. You’ve got to make sure your NLP knows the context of the conversation,” Gits says.
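Gits’ cancer example can be illustrated with a toy rule in Python: a normally negative term flips to positive when the word just before it signals a cure or remedy. This is a deliberately simple sketch, not the company’s NLP, and the word lists and the `score_text` function are invented for illustration.

```python
# Hypothetical word lists; a real system would learn context from data.
NEGATIVE_TERMS = {"cancer", "bankruptcy"}
REVERSERS = {"curing", "cure", "cures", "treating", "treats", "avoiding", "avoids"}


def score_text(text):
    """Return a crude sentiment score: -1 per negative term, +1 if reversed."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    total = 0
    for i, token in enumerate(tokens):
        if token in NEGATIVE_TERMS:
            # Flip the sign when the preceding word reverses the meaning.
            is_reversed = i > 0 and tokens[i - 1] in REVERSERS
            total += 1 if is_reversed else -1
    return total


bad = score_text("Cancer is bad.")
good = score_text("Curing cancer is not bad.")
```

A one-word lookback breaks quickly on real posts, let alone on sarcasm or slang, which is why context has to come from models trained on the conversations themselves.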

Luckily for Social Market Analytics, it has trained its model on 10 years of Twitter data, and Gits says its topic model has gotten good at picking up things like sarcasm, slang, and nuance. For example, if a client is monitoring Tesla’s stock, the model will suggest they also look into Twitter conversations centered on Elon Musk and SpaceX, but wouldn’t offer the same suggestion for a conversation about electricity pioneer Nikola Tesla. By contrast, the separate model the company uses to parse and analyze Edgar filings would not serve as a decent primer for taking on the world of Reddit.

Risky business

This saga has implications far beyond data scientists, machine-learning and NLP models, alternative data providers, and alpha. In the long term, how this plays out in the risk arena may prove even more important than how it plays out in the others.

Twenty-two days before Reddit broke the market and the internet, a mob in support of former president Donald Trump descended on the US Capitol, stormed the building, and sent members of Congress, who were assembled to count the November election’s electoral votes and certify then-President-elect Joe Biden’s win, into hiding for hours. Five people died while more than 100 were injured. Despite a slew of indications on sites like Reddit, Twitter, Facebook, and 4chan that some attendees were planning violence, the 1,200 Capitol police on duty that day had little more than the short metal barricades one would find at a concert standing between them and a mob armed with chemical irritants, lead pipes, and tactical gear. An unprepared assembly of law enforcement was quickly overpowered.

The common thread that January 6 and January 28 share is this one: Those in charge of finding these early warnings were either looking in the wrong places, or shrugging off what they thought was nonsense online talk. This isn’t to say that what goes viral on social media is gospel; it is not. But it can have a real-world impact, often swiftly.

The role of risk officers and compliance specialists is to anticipate and prepare for the worst. These professionals know, probably better than most, what can happen without at least some catastrophizing. In late February of last year, as Covid-19 was spreading across continents, Miranda Morad, MarketAxess general counsel for the Asian and European regions, was getting antsy. Having just come back to the UK from Canada, she attended a large meeting where she refused to shake anyone’s hand for the first time.

“I was the only person doing it, and they were all laughing at me. So I then said I’m not going into the office. I’m not going to take the risk. And it was a while before reality struck,” she said during an interview for an upcoming WITAD Awards article. “It takes a while before people hear you. The natural instinct is it’s not that bad. It’s not going to happen. It’s unimaginable.”

While this January and the pandemic-plagued year before it felt foreign and ghastly, Chris White, CEO of bond pricing platform BondCliq, asks whether humans have ever really understood the world around them.

“I think that chaos has been the norm. I think what you’re seeing is communication changing states, like the way that water goes from ice to liquid to gas. You’re seeing it literally change states, in which now the communication that we rely on to interpret the world has moved into a new medium. And actually any time there’s ever been meaningful innovation in communication—where all human beings can communicate at the same time—you get a massive shift in culture,” White says.

Take, for example, when German theologian Martin Luther translated the Bible, previously only found in Latin, into German in the early 1500s. Thanks to the growth of the printing press at the time, the new text was disseminated quickly among other Germans, allowing them to interpret scripture in their own ways. When Luther, according to lore, nailed his 95 Theses to the door of the Castle Church in Wittenberg, it jump-started the Protestant Reformation, a decades-long rejection of the church that resulted in division between Roman Catholicism and several new Christian sects that still exist today.

A time-honored tradition of the human experience, rebellions and reckonings come and go, and certainly for the people who live through them, it can feel like the rug has been pulled out from under them. For everyone who comes after, it’s just history.

As topics such as media literacy, digital ethics, AI ethics, and digital curation become urgently needed in an increasingly digital world, some are recognizing that as much as institutions are buyers and sellers of securities, they’re fundamentally more like us—buyers and sellers of information.

S3 Partners’ Sloan says that one operating principle for his firm, and one that he encourages other data providers to adopt, is to focus on the “what” and never the “who.” If the names of people and organizations are tied to your data, then your business is about identity. And if your business is about identity, then you’re not really in the data business—you’re in the information business, he says.

“Who’s being exploited?” Sloan asks. “If we look at information as a bid and an ask … there’s market structure in everything, right? Everything. From voting booths, to supply chains, to financial markets—there’s market structure. So what we’re really saying is we don’t understand the impact of this market structure.”
