Caveat emptor: Hedge Funds’ use of alternative data

Aglaya Nickolova | Senior Analyst

The Future of Investing or a Legal Time Bomb?

Aurum Research Limited (“Aurum”) monitors over 4,000 active hedge funds with over $3 trillion in assets under management[1].  We are seeing increasing numbers of hedge funds using alternative data in their investment process. With the experience of making hedge fund investment recommendations for nearly 25 years, Aurum has created an operational due diligence process designed to evolve and adapt to changes like this in the industry. In this paper we explore what alternative data is and how it is used by hedge fund managers.  We also share our findings on what the nine key areas of consideration should be both for managers using alternative data in their investment process and for investors conducting due diligence on such managers.

16.1 zettabytes of data were generated globally in 2016 and the forecast is for this to grow ten times by 2025[2]. A zettabyte is approximately equal to 1 billion terabytes. To put that in context if each terabyte in a zettabyte were a kilometre, their combined length would be equivalent to 1,300 trips to the moon and back.
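As a rough check on the comparison above, the arithmetic can be sketched in a few lines of Python. The average Earth-Moon distance of 384,400 km is an assumption added here for illustration; the other figures are taken from the text.

```python
# Figures from the text; the Earth-Moon distance is an assumption added for illustration.
TERABYTES_PER_ZETTABYTE = 1_000_000_000  # "approximately equal to 1 billion terabytes"
EARTH_MOON_KM = 384_400                  # assumed average one-way distance in km


def round_trips_to_moon(kilometres: float) -> float:
    """Return how many there-and-back trips a given length would cover."""
    return kilometres / (2 * EARTH_MOON_KM)


# One kilometre for each terabyte in a single zettabyte.
trips = round_trips_to_moon(TERABYTES_PER_ZETTABYTE)
print(f"Round trips to the moon: {trips:,.0f}")

# Tenfold growth applied to the 2016 figure quoted in the text.
data_2016_zb = 16.1
forecast_2025_zb = data_2016_zb * 10
print(f"Implied 2025 volume: {forecast_2025_zb:.0f} zettabytes")
```

Under these assumptions the result is in line with the roughly 1,300 round trips quoted above, and the tenfold forecast implies a little over 160 zettabytes by 2025.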

Hedge Funds are expected to pay almost $2 billion in 2020[3] for the collection and storage of alternative data. Alternative data usage is becoming more and more popular with managers looking for new sources of alpha in their investment process. But a lawsuit involving LinkedIn could change what data hedge funds are legally allowed to collect[4]. The case sheds light on an important consideration for hedge funds and their investors: what are the legal and regulatory implications of using alternative data?


LinkedIn and the importance of precedent

In August 2017, a Californian judge ruled that public LinkedIn pages can be scraped by hiQ, a company which uses bots to inform employers about their employees’ LinkedIn activity, despite the fact that LinkedIn’s terms of use forbid the use of any web-scraping bot. hiQ sued after receiving a cease and desist letter from LinkedIn, which blocked hiQ’s bots and threatened to bring legal action. The court’s decision has created a backlash over the perceived invasion of users’ privacy. The case is currently in the appeal process and is being closely watched by lawyers and hedge fund managers. Managers face not only legal risk but also reputational damage if they scrape data they are not supposed to. As Evan Reich, Data Strategist at $20bn BlueMountain Capital Management, noted[4]:

“No dataset is so good that it is worth betting the firm on.”


Alternative data: a brief overview

Broadly speaking, data harvested from social media, credit card transactions and smart phones is commonly known as “alternative data”. Alternative data is thought to provide a “big picture” view of consumer trends and behaviours, which can bring greater insight to the investment process. See my colleague Anthony Pratt’s article for a more in-depth look at what alternative data is. As of July 2017, data provider Eagle Alpha, considered one of the sector’s frontrunners, identified 482 datasets across 24 different categories, of which it reported consumer transactions and geo-location data to be the most popular categories with clients[2]:


Alternative data categories


Source: “Alternative Data – Legal Considerations Setting the Scene”, from Use of Alternative Data by Investment Firms – legal considerations, Simmons & Simmons, 26 July 2017

The examples of the usefulness of alternative data are numerous. The observations made from comparing satellite images can show whether mining operations are expanding or contracting over time, or whether free parking spots at Walmart, for example, have increased or decreased over the quarter. This type of data could provide an indicator as to whether earnings will beat or miss forecasts. Credit card data could point to consumer spending trends at specific retailers.

Collecting and analysing new types of data requires skills that could be too costly for an organisation to develop in-house, especially when the gains of doing so are not certain. Data vendors provide this service to many hedge funds. Once the hedge funds acquire the data, they process it to identify trends in consumer behaviour. They then use this data alongside more established information sources, such as economic data releases and earnings reports, to make investment decisions.

In 2015, BlackRock warned that asset managers who do not adopt alternative data will be left behind[5]:

“We believe that in order to generate sustained alpha, investors should embrace acquiring, analysing and understanding the fast growing universe of data. Those who are unable to do so run the risk of falling behind in a rapidly changing investment landscape.”

The EY 2018 Global Hedge Fund Survey (“the EY survey”) notes that the majority of hedge funds either use or are evaluating the use of next-generation (“next-gen”) data – a material increase from two years ago when around 50% were in that category. According to the report, “nearly 60% of hedge funds who use next-gen data are doing so to support their fundamental approach.”[6]

Bloomberg announced recently that it would sell alternative data sets through its terminal.[7]


The more the merrier, surely?

As managers invest in the utilisation of alternative data, data engineering and data science are becoming even more crucial. The EY survey notes that “managers of all sizes and strategies are trying to make impactful investments so that their business is not left behind as leveraging data as a competitive advantage becomes ever more crucial.” Investments in alternative data capabilities are being made across three areas:

  • Data engineering: sourcing data, data storage/management, and extraction, transformation and load processing
  • Data modelling and advanced analytics
  • Data visualization/reporting

Managers are also feeling the pressure from investors who are expecting them to leverage alternative data in order to generate alpha. According to the EY survey, 43% of investors believe that it is “critically important” for fund managers to use next-generation data and artificial intelligence to support their investment process. Another 33% believe it is “somewhat important”, with only 24% stating it is “not important”.


Why doesn’t obtaining large quantities of alternative data necessarily result in alpha production?

Data needs to be carefully processed, filtered and analysed. Greenwich Associates conducted a study in 2016, collecting responses from 46 asset managers and 23 hedge funds[8]. The study outlines the top five obstacles that organisations face when adopting alternative data as:

  • Prohibitively high fees to acquire datasets
  • Internal procurement processes are too cumbersome/slow
  • Lack of time available to analyse and evaluate data
  • Management not convinced of data’s value
  • Difficulty understanding/working with data sets that are not customised for specific use

Despite the challenges, more and more managers are exploring how to incorporate alternative data into their investment processes with expert data science teams being engaged to obtain and process the data, either directly as raw data or from data vendors.

The increased spending on alternative data by the buy-side is shown below:


Total buy-side spend on alternative data ($m)



The history of incorporating alternative data into the investment decision-making process is relatively short and the universe of managers that rely exclusively on alternative data to make investment decisions is small. However, this is seen as an area of growth as managers are seeking to identify new ways of generating alpha. The Greenwich study revealed that 80% of managers want greater access to alternative data.

Sophisticated methods of data gathering and analysis have boomed in recent years. There are now over 400 alternative data providers, according to a site run by YipitData, one of these providers.[7] The growth of the space is illustrated by the chart below.


Number of alternative data providers


There have already been some high-profile legal cases brought against companies due to the potential misuse of alternative data, including those involving Cambridge Analytica and LinkedIn. However, global regulators have not yet provided much guidance as to what hedge fund managers should or should not do with regard to the collection and usage of alternative data.

Hedge funds’ investment staff should be aware of the potential risks around the use of alternative data. We would also expect the compliance officer and general counsel to be engaged early on in the due diligence process.

So if both the use and availability of alternative data are on the rise, what are some of the key issues hedge funds need to consider when processing raw data or engaging data vendors? And what are the key questions investors should ask their managers when conducting due diligence?


9 Key considerations for hedge funds and hedge fund investors around using alternative data

“It takes twenty years to build a reputation and five minutes to ruin it.”

                                                                                              – Warren Buffett

There are a number of key considerations in the process of evaluating data sets and data vendors:

  1. Material Non-Public Information (“MNPI”) – the risk of MNPI is one of the key risks around the use of alternative data. The SEC has a strong focus on insider dealing and, while there has so far been little guidance from the regulator on insider dealing best practice in the alternative data space, this is likely to change. We expect clear rules and regulation to be produced over time. While awaiting further regulation in the space, managers should consider key questions such as: what is the source of the data? Is that source legitimate? Is the data time sensitive? There are also regional differences in MNPI law between the UK and the US, with the UK rules being broader. The US has a narrow definition of insider dealing: information can be material and non-public and its use may still be permitted, as long as it is not misappropriated and no fiduciary duty is breached.[9]
    And what about MNPI and exclusivity? If a manager has been given exclusive access to a data set, does that mean that there is MNPI? Most hedge fund Compliance Officers we have spoken to do not like exclusivity, despite the potential edge over competitors. The risk of regulatory action around exclusivity is not seen as high across the industry, but this could soon change.
  2. Privacy – is there any Personally Identifiable Information (PII) in the data set being acquired? How has it been obtained? Has it been adequately “scrubbed” or anonymised? Could combining two data sources that individually hold no PII create data that can be used to identify individuals? With the introduction of the EU’s General Data Protection Regulation (GDPR), which took effect in May 2018, this topic is in the spotlight. Data relating to geo-locations, credit card transactions and mobile app usage must be anonymised under GDPR rules.
  3. Copyright – even if the data is public, it does not automatically mean that it can be used for commercial benefit. What are the permissions in the data provider’s “terms and conditions”? Are managers legally allowed to use this data? Are the vendors allowed to sell the data? What reps and warranties from end users have been received? The legal review and understanding of the data agreement by either the compliance officer or general counsel of a hedge fund is a crucial part of the due diligence process. Hedge funds that purchase alternative data would need robust contractual and legal protection to ensure that they can use this data without recourse.
  4. Website scraping – “scraping” broadly refers to collecting electronic data from a third party. Typically, software is used to automatically harvest and sort large amounts of data from public online sources, most commonly websites.[10] Firms engaged in web scraping need to manage the associated compliance risks due to the ever growing body of regulatory deliberation associated with the practice. According to Integrity Research, an advisory firm founded in 2003 to analyse the global investment research industry, fewer than 50 web scraping cases have gone to US courts, which is a very small number given how widespread the practice is.[11] That said, prominent sites pay close attention to usage patterns and some have taken legal action. Does a website allow for data to be scraped? Can the data be used for investment purposes? Again, hedge funds need to be aware of the small print in the terms and conditions of data usage.
  5. Data quality – is the data considered to be of good quality? What geographies has the data been sourced from? How is the data going to be used? Has the data been processed correctly given its nature? Techniques of data processing differ – textual data on the web tends to be filtered by natural language processing, while geolocation data often needs to be mapped by geo-fencing coordinates.[11] Geo-fencing is a technology that uses GPS coordinates or RFID signals to draw a virtual boundary in space and trigger certain actions on the basis of the boundary[12]. A detailed review of the procedures used to gather and manipulate the data is required in order to validate its accuracy. This should be conducted not only at the initial due diligence on a vendor, but also on a regular basis thereafter.
  6. Data processing – who is “scrubbing” the data? Is that exclusively done by the vendors? Does the manager have an in-house team to scrub data? What happens if data provided by vendors has not been properly “scrubbed”? Is external counsel utilised on any “grey” areas? How is data aggregated? Has the manager’s investment team been given access to data from a vendor in order to test it without prior involvement of their compliance officer or general counsel?
  7. Model risk – how does the alternative data fit within a model? Is the data being tested first before being integrated? There is a risk around implementing the data as part of an existing investment strategy. If the data is incorporated in the model incorrectly then inconsistent trading signals could be produced.
  8. Users – who are the other users of the data set provided by a vendor? How many and what type of users are they? Some managers prefer to use data vendors who service firms across a range of industries, not just finance. Is there a risk of crowding whereby too many users rely on the same data set? Is the data set provided to all users at the same time or are some users benefitting from early access?
  9. Cost – who pays for the data sets and vendors – the manager or the fund? Is there a “cap” on the data costs? Is there a set budget in place? How can costs be minimised without diminishing the quality of the end result? Could it be cheaper for a manager to get data sets from data vendors rather than obtain raw data and process it in-house?
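To make the geo-fencing idea mentioned under data quality more concrete, the following is a minimal Python sketch of how location pings might be mapped to a fenced area. It assumes a simple circular fence and uses the haversine great-circle distance; the `CircularFence` class, the store coordinates and the sample pings are all hypothetical, and vendors may well use polygon boundaries or other methods in practice.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class CircularFence:
    """Hypothetical virtual boundary: a circle around a GPS coordinate."""
    name: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_visits(fence: CircularFence, pings: list[tuple[float, float]]) -> int:
    """Count the pings that fall inside the fence (the 'triggered' events)."""
    return sum(
        1 for lat, lon in pings
        if haversine_m(fence.lat, fence.lon, lat, lon) <= fence.radius_m
    )


# Hypothetical store location with a 150 m fence, and three made-up pings.
store = CircularFence("hypothetical store", lat=40.0, lon=-75.0, radius_m=150.0)
pings = [(40.0, -75.0), (40.0005, -75.0), (40.01, -75.0)]
visits = count_visits(store, pings)
print(f"Pings inside fence: {visits} of {len(pings)}")
```

Even in a sketch this simple, the choice of radius changes the visit count, which is one reason a review of how a vendor gathers and manipulates the data matters for validating its accuracy.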

As hedge fund managers’ use of alternative data becomes more prevalent, we expect that alternative data policies will be put in place and provided to investors as part of the due diligence process. For investors, it is crucial to verify who is responsible for the policy, ideally the compliance officer or general counsel.  They should also verify how often the policy is updated and how well it fits the hedge fund’s specific needs and uses of alternative data. It is also important to discuss any issues that have arisen from the due diligence on data sets and data vendors as well as any steps taken by the manager to address these issues[13]. Have any vendors been vetoed? Why?

In addition to an internal policy, hedge fund managers should have vendor questionnaires in place, to be used as part of the initial due diligence on data vendors as well as on a regular basis thereafter. Speaking to the compliance officers of some of the funds that Aurum monitors, we understand that the recently released AIMA Due Diligence of Alternative Data Vendors questionnaire[14] is a good starting point, but that it needs to be tailored according to the specific needs of the fund. They also cite the International Data Standards Organisation (“IDSO”), an organisation of alternative data producers, distributors and buy-side users who have come together to design a framework for alternative data best practices, as a valuable source of information.

Aurum’s Operational Due Diligence team has developed a series of questions relating to alternative data and data vendors. These are discussed in depth with the compliance teams of the managers who use alternative data as part of their investment process. We expect managers to be engaged, aware of the risks and able to demonstrate reasonable steps in ensuring there is a strong controls framework in place.

Beware, don’t despair!

“An investment in knowledge pays the best interest.”

                                                               – Benjamin Franklin

Alternative data is here to stay and if used appropriately it has the potential to provide new sources of alpha. However, learning how to acquire and analyse it properly is crucial for any hedge fund manager wanting to avoid the headline risk associated with data misuse. By developing robust standards, alternative data participants can reduce the associated legal, regulatory and compliance risks.

Investors should also be taking these potential new risks into consideration and performing enhanced due diligence. They need to ask the right questions to ensure that managers have appropriate governance frameworks in place and remain aware of the key risks alternative data usage poses.

  1. References in this paper to “Hedge Fund Data Engine” or “SE”, in the context of performance or statistical data, are references to the proprietary database maintained by Aurum containing data on over 4,000 active hedge funds representing in excess of $3 trillion of assets. Information in the database is derived from multiple sources including Aurum’s own research, regulatory filings, public registers and other data providers.













