Alternative data investing decrypted

Aurum Research Limited

Imagine that you are sitting in your parked car on a hill overlooking a truck depot. You’ve been counting the trucks coming in and out for so long that your legs are becoming numb and the smell of the leftover pizza on the passenger seat is starting to make you feel a little sick. Nevertheless, the healthy paycheque that you’re getting from the guy in the sharp suit to do this makes the boredom worth it. Little do you know that you’re a notable cog in the company research process for some highly sophisticated investors.

In the age of alternative data, this type of activity is fast becoming a thing of the past. Historically, equity analysts would seek to combine traditional data sets produced by a company, such as earnings announcements, SEC filings, and press releases, with more qualitative insights, such as those gained from talking to company management. The analyst could also employ someone to count trucks to try to gain more timely company insights. However, due to the recent parabolic increase in the amount of alternative data that is produced, an analyst can now gain these timely insights by purchasing this data, leaving the truck-counter out of a job.

What is this new world of data?

It is best to start with some definitions. Over the past few years the buzz phrases ‘big data’ and ‘alternative data’ have often been used interchangeably, and there is indeed some overlap between the two.

Big data is often described as “data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them”[1]. However, as much as big data is often associated with various new data sources that are being created by the internet, mobile use and other sources, technically speaking it could just refer to a data source that has been readily available for some time but is simply very large in nature.

Alternative data refers to “any data arising from sources not considered traditional financial data, but which may offer market insights”[2]. This could take the form of, for example, one number per quarter, not necessarily a huge unwieldy data set. Companies that collect alternative data, such as credit card businesses, satellite and location tracking businesses, mobile device businesses, review websites and other web businesses, are often unaware of the various uses that other types of firm have for their data.

There is now a sizable industry dedicated to the collection and storage of data, as well as the hardware associated with it. However, it may surprise some to know that there are also data brokers and data set advisory businesses, such is the range of potential data sets for an investment firm to buy. Further layers of complexity can come into play when we consider that data often originates in different forms. Valuable information may be held on websites, for example, and there are firms that scrape[3] this data, check its accuracy, and repackage it in an easy-to-consume data set that can be sold to interested parties that may be seeking to gain a greater understanding of a firm, industry or economy as a whole.

GDPR’s introduction on May 25th 2018 will force data vendors to be rigorous when it comes to scrubbing data to ensure that all Personally Identifiable Information is removed. They will need to ensure that their systems and processes are up to this new scrutiny or risk hefty fines (potentially up to 4% of annual revenue). In addition, consumers will be asked to actively confirm that they are happy for their data to be used by third parties. The impact of this on data vendors is yet to be seen, but it seems likely that there will be a fall in the number of new data sets available. Furthermore, there is no exemption for third party data that has already been gathered without consent, so there may be large amounts of pre-existing data that become unusable. It will be interesting to see the impact, if any, that this has once GDPR regulations settle in.

Unstructured data refers to data that is not organised in a pre-defined manner, such as textual information. Companies specialising in natural language processing will look to turn this into information that is easy to consume; for example, they may count the number of positive words that a CEO says in a company update, or gauge whether public sentiment has become more positive towards a particular company in the last day based on Twitter activity. These may feed into quantitative scoring systems, so that an analyst can easily incorporate what was previously textual information into a model. Other companies go one step further, taking the data and looking to sell a ‘trading signal/algorithm’ that a fund manager could use. Note that it is in the area of unstructured data that machine learning techniques are often used and the analysis of the data becomes even more complex.
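To make the idea of counting positive words concrete, the following is a minimal sketch of a word-count sentiment score. The word lists, the function name `sentiment_score` and the sample text are all hypothetical and chosen purely for illustration; a commercial natural language processing vendor would be expected to use a far richer lexicon or a trained model.

```python
import re

# Hypothetical word lists for illustration only; a real system would use a
# curated financial sentiment lexicon or a trained model.
POSITIVE_WORDS = {"growth", "strong", "record", "improved", "confident"}
NEGATIVE_WORDS = {"decline", "weak", "loss", "challenging", "uncertain"}


def sentiment_score(text: str) -> float:
    """Return (positives - negatives) / matched words, in the range [-1, 1].

    Returns 0.0 when no lexicon word appears in the text.
    """
    # Lower-case the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = positives + negatives
    if matched == 0:
        return 0.0
    return (positives - negatives) / matched


# Made-up extract from a company update.
update = (
    "We delivered record revenue and strong margins this quarter, "
    "although conditions in Europe remain challenging."
)
score = sentiment_score(update)
print(f"Sentiment score: {score:.2f}")
```

A single number of this kind is what allows previously textual information to sit alongside other inputs in a quantitative scoring system.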

How are investment managers reacting to this new world of data?

This new challenge can be tackled in a variety of different ways and will depend on what the CIO is hoping to accomplish: do they have a specific goal they wish to achieve from the analysis of a particular data set (e.g. to gauge how well a specific company or sector is currently doing), or are they simply looking for underlying patterns in the data that may have some predictive power over something in the market? In general, it is more likely that a discretionary manager would have a specific goal in mind that they hope will be addressed through analysis of the data, whilst a quantitative manager would be looking for broader patterns, although this is not always the case. Indeed, some of the larger discretionary equity long/short managers have allocated considerable resources to the acquisition and analysis of new data sets, whilst some quantitative managers consider that having some form of discretionary decision-making process in place regarding what they wish to achieve, before they begin crunching the numbers, can be more efficient.

In terms of the size of team required to handle these new data sets, a focused discretionary manager may have a relatively small data team that buys pre-cleaned structured or unstructured data and then manages to compute clear predictive signals. On the other hand, a large quantitative manager may look to build its own data sourcing unit that only purchases raw unstructured data sets in the hope that its own data cleansing, internal natural language processing or machine learning units can generate some form of useful trading signal. Whether a firm’s data strategy requires teams in data vendor relationship management, data cleansing, and data procurement, never mind how to actually analyse the data, is one of the biggest questions facing many managers today.

A less intensive approach may involve simply purchasing a licence to incorporate ‘ready-made’ signals generated by data vendors. A major consideration with this approach, however, as with many data sets that are relatively well-known, is just how unique the alpha from the data set is, and how heavily utilised it is by the rest of the marketplace. A sound data strategy involves finding out who the other regular purchasers of a data set are; highly commoditised data sources, such as Google Trends, are more likely to be removed from the investment process before other, less widely utilised, sources. Building one’s own proprietary data set from a unique source is clearly a sound way to reduce the risk that the alpha generated degrades as quickly as that from other data sources, although it is more time-consuming and challenging, and is only likely to be achieved by the larger and better resourced firms.

Another approach to this problem is to try to agree a period of exclusivity on a data set. Here a manager may offer to help a data vendor understand who and what its data set may be useful for in exchange for a period where the data is not sold to anyone else. The vendor may then be able to become more efficient at monetising the value of the data it is generating.

What problems are funds encountering?

Whether alternative data sets are more effective for quantitative managers, or as an additional quantitative tool for discretionary managers, is interesting to consider. A wider range of alternative data sets appears to be available to discretionary managers, insofar as a data set that is released infrequently, for example quarterly, may have insufficient data points for a quantitative manager to gain enough comfort in the validity of any sort of back test. Similarly, if the data set is large but has only been available for a couple of years, a quantitative manager may run the risk of overfitting its signal, because the history is not long enough for a model to learn a robust trading signal or pattern.
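The sample-size problem can be illustrated with a rough calculation. The sketch below uses the Fisher z-transform, a standard statistical approximation that is our choice for illustration and not a method attributed to any manager, to put an approximate 95% confidence interval around a measured correlation between a signal and subsequent returns. It assumes independent, roughly normally distributed observations; the correlation of 0.3 and the observation counts are hypothetical.

```python
import math


def correlation_confidence_interval(r: float, n: int, z_crit: float = 1.96):
    """Approximate confidence interval for a sample correlation r.

    Uses the Fisher z-transform, whose standard error is 1 / sqrt(n - 3).
    With z_crit = 1.96 the interval is roughly a 95% interval.
    """
    if n <= 3:
        raise ValueError("need more than three observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * standard_error)
    upper = math.tanh(z + z_crit * standard_error)
    return lower, upper


# Five years of quarterly releases versus five years of daily data
# (assuming roughly 252 trading days per year).
quarterly = correlation_confidence_interval(0.3, n=20)
daily = correlation_confidence_interval(0.3, n=1260)

print(f"Quarterly (n=20):   {quarterly[0]:.2f} to {quarterly[1]:.2f}")
print(f"Daily     (n=1260): {daily[0]:.2f} to {daily[1]:.2f}")
```

Under these assumptions the interval for the quarterly series includes zero, so the same measured relationship that looks convincing on daily data cannot be distinguished from noise on twenty quarterly points.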

Further problems that a quantitative manager may encounter can involve changes in accounting rules or ways in which data is captured by the vendor. A discretionary manager is likely to be better able to piece together changes in the data than a quantitative program, which is likely to instantly process a trade regardless of these changes. Additionally, understanding how a data set may change for some form of structural, rather than cyclical, reason, such as a regulatory or accounting change, is clearly an advantage for the discretionary manager. Another way a quantitative signal could slip up could be where new external factors begin to affect the data. For example, a trading signal based upon how frequently a company’s name is searched for on the internet may briefly become ineffective if the reason for the search changes. An example of this might be that users are looking for information because the company has become a merger target, rather than potential customers looking for the company’s website.
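One simple guard against this kind of slip is to flag unusual readings for human review before a signal is traded. The sketch below is an illustrative assumption, not a description of any manager's process: the function name `needs_review`, the threshold of three standard deviations and the weekly search counts are all hypothetical.

```python
from statistics import mean, stdev


def needs_review(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` sample standard
    deviations away from the mean of the trailing `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical weekly search counts for a company's name.
weekly_searches = [100, 104, 98, 101, 97, 103, 99, 102]

normal_week = needs_review(weekly_searches, 105)  # ordinary variation
merger_week = needs_review(weekly_searches, 180)  # sudden spike in interest

print(f"Normal week flagged: {normal_week}")
print(f"Merger week flagged: {merger_week}")
```

A check like this cannot say why searches have jumped; it only signals that the data has moved outside its usual range, which is the point at which the judgement of a discretionary analyst becomes valuable.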

Another challenge is that different forms of data set are often appropriate for different sectors of the market. For example, social media may be useful for analysing firms that are in the public eye, like consumer brands, but less useful for industrial firms. Indeed, analysis of the consumer sector seems to be the most natural fit in terms of extracting value for many of these data sets – consumers talk about which shoes are trendy at the moment but how often do they talk about their favourite oil services company? This is changing, however, as more and more types of data set become available.

It is interesting to see how discretionary managers integrate the use of this data into their processes. Data must, however, be presented to them in a useful manner that they can quickly integrate into a broader research process. Giving too much information may mean that important nuggets get lost. Additionally, there may be cultural challenges. Can ‘old school’ traders adapt to a new world where alternative data sets paint a picture of what is happening more quickly than traditional approaches, and can they handle the possibility that a more data savvy trader is suddenly doing better than they are? With investing, the name of the game has not changed, in that you need to have the right information to ensure you not only make the right call, but do so earlier and more profitably than the competition.

How are new data sets changing the investment landscape?

There is clearly a wide range of changes occurring. The first is that a firm’s data strategy is now a key differentiator. What is the budgeting strategy for the additional data costs? Investors have become accustomed to seeing quantitative managers charge certain costs for data or research to the fund, but this is largely a new area for discretionary managers. How might investors react to an increase in costs being charged to discretionary funds, particularly when it may take time for additional costs to lead to improved performance? It is also possible that discretionary firms that are concentrating more on having the best data may decide to concentrate less on research into more qualitative factors.

Earnings season is typically a critical period in the quarter for a stock’s performance, with a large proportion of a stock’s price move for the quarter potentially occurring on earnings day. Discretionary and quantitative managers have traditionally implemented a range of strategies around this day. It appears, however, that the increased use of alternative data sets is starting to lead to reduced price reaction on earnings day, with the market beginning to price earnings for the quarter much earlier than when the earnings announcement comes out. For example, managers can now access data relating to Black Friday sales, based on credit card and geo location data, with a relatively short time lag, particularly compared to company earnings figures. This change could mean that, as the purchase of alternative data sets becomes more prevalent, the range of strategies that have traditionally been traded during earnings season may become ineffective.
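One way to track this effect is to measure what share of a stock's total movement over the quarter falls on earnings day. The sketch below is a minimal illustration using absolute daily returns; the choice of measure, the function name `earnings_day_share` and the return series are hypothetical.

```python
def earnings_day_share(daily_returns, earnings_index):
    """Share of the period's total absolute daily return that occurred
    on the earnings day, as a fraction between 0 and 1."""
    total_move = sum(abs(r) for r in daily_returns)
    if total_move == 0:
        return 0.0
    return abs(daily_returns[earnings_index]) / total_move


# Made-up daily returns; the fourth entry (index 3) is earnings day.
returns = [0.01, -0.02, 0.01, 0.06, -0.01, 0.01]
share = earnings_day_share(returns, earnings_index=3)

print(f"Share of movement on earnings day: {share:.0%}")
```

If alternative data is causing earnings to be priced earlier, a measure of this kind would be expected to fall over time for the stocks that are best covered by such data.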

Regulation and compliance in this area continue to evolve. Some firms do not allow research teams to have exclusive use of one data set, due to concerns that regulators may see this as an unfair advantage. The law around data scraping is also maturing, whilst privacy and data law varies considerably between geographies. GDPR is creating some alignment, and data users will need to ensure that the source of data they rely on is not going to suddenly dry up because it falls foul of these regulations. Does the data vendor actually have the right to sell the data? Could data be the next area that a Wall Street scandal emanates from? One only has to look at the recent Facebook and Cambridge Analytica scandal to see how significant the impact of incorrect use of data can be.

The range of approaches that managers can take, and what questions to ask, is something that investors need to become comfortable with. Investors should understand the differences between unstructured and structured data sets, ask how many data sets a firm is incorporating into its process, and understand how asset price movements in certain markets may be starting to change due to newly available data. In the quantitative space, medium- or long-term approaches have often been seen as more traditional and less exciting than higher frequency strategies; however, this growth of novel data sets is certainly leading to more differentiation in slower strategies. Whether you are a systematic trading manager or a fundamental stock picker, the world of alternative data will continue to evolve and be a disruptive force for all market participants.

A version of this appeared in HFM InvestHedge. HFM InvestHedge is one of several titles published under Pageant Media’s hedge fund brand, HFM Global, covering investor actions, profiles, in-depth analysis of investors and investor-related issues for allocators, consultants and the investor relations community.

  1. Wikipedia

  2. ‘Quantifying Intuition: Mapping the data science landscape in the hedge fund industry’. Jefferies LLC. June 2017.

  3. Web scraping is a term for various methods used to collect information from across the Internet. Generally, this is done with software that simulates human web surfing to collect specified bits of information from different websites. Source: Technopedia
