How We Manage a Portfolio of Crypto Assets with ML on AWS
Intro to the Crypto Industry
The winter season in the crypto industry (as well as in any other industry) is the best time to build without distractions from hyped and overvalued topics. One thing that is not going away is cryptocurrency trading and investment management.
The wealth management industry is an intersection of traditional finance, product management, and financial technologies. In this article, we describe our technological approach to building solutions for investors who want exposure to the crypto industry, but safely and with protection from the market’s downsides.
Just last year, crypto hedge funds showed 150% growth with an average AUM of $58.6M. In addition, 67% of the traditional hedge funds that currently invest in crypto intend to deploy more capital by the end of 2022. You can find more market information in the recent PwC report.
Although crypto is associated with high, even abnormal, returns, we need to operate in a way that protects investors from the universe of risks that inevitably come with such high performance.
We will discuss how to cover risks related to volatility, uncertainty, and drawdowns using machine learning and data science, as well as risks related to the nature of trading instruments, strategies, and infrastructure with the help of the AWS toolkit.
Challenges That We Faced
Let’s have a look at the main challenges we faced while developing such a system:
- Continuous data collection
- Reliable and repeatable modeling
- Execution at the exchanges
- Market risk management
- Security and asset protection
We split the above points into two groups: those that can be solved with the AWS infrastructure (data, execution, and security) and those that can be solved with algorithms (modeling reliability and risk management).
The solution architecture for the trading bot is fully serverless.
Serverless architecture is a way to build and run applications and services without having to manage infrastructure. Your application still runs on servers, but all the server management is done by AWS. This allows us to decrease operational costs and focus on the core product instead of worrying about managing and operating servers or runtimes.
AWS Services Used
AWS Lambda lets you run code without thinking about servers or clusters. The service allows you to build event-driven functions for easy communication between decoupled services, and to reduce costs by scaling through periods of peak demand without crashing or over-provisioning resources.
Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.
Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools.
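To make the interplay between Lambda and DynamoDB concrete, here is a minimal sketch of an event-driven function that stores price observations. The table name `market_data`, the event shape (`observations` with `ticker`, `timestamp`, `price`), and the item layout are assumptions for illustration, not a description of our production schema.

```python
from decimal import Decimal

TABLE_NAME = "market_data"  # hypothetical table name


def build_item(ticker, timestamp, price):
    """Convert one observation into a DynamoDB item."""
    # boto3's DynamoDB resource does not accept Python floats,
    # so numeric values are stored as Decimal.
    return {
        "ticker": ticker,
        "timestamp": int(timestamp),
        "price": Decimal(str(price)),
    }


def handler(event, context, table=None):
    """Lambda entry point: write every observation in the event to the table."""
    if table is None:
        # Imported lazily so the function can be exercised without AWS access.
        import boto3
        table = boto3.resource("dynamodb").Table(TABLE_NAME)

    items = [
        build_item(obs["ticker"], obs["timestamp"], obs["price"])
        for obs in event["observations"]
    ]
    for item in items:
        table.put_item(Item=item)
    return {"written": len(items)}
```

The optional `table` argument exists only so the handler can be tested locally with a stand-in object; Lambda itself calls the function with `event` and `context`.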
We want to leverage several different data sources for our decision-making engine. The main reason for this is that different factors influence the crypto markets in different ways. Some of these factors are purely related to prices, volatility, and other market-related data. Deeper analytics might involve on-chain data analysis, transactions, and developer activity. Of course, we cannot rule out the mood of the crowd and related public sentiment data from social media like Reddit, Twitter, and news headlines.
We have reviewed several data sources and providers who could help us cover our data needs. We liked CryptoQuant and Academy Santiment the most and decided to stay with the latter, thanks to its convenient Python API. The pricing and data coverage of both products are very decent.
First, we analyzed the universe of instruments we could trade and have in a portfolio. From a historical data perspective, it is worth training the models on the biggest and oldest coins (BTC, ETH, ADA, LINK, XRP, TRON, DASH, etc.). However, with this approach, we could fall into the trap of survivorship bias. We cannot find all the market patterns by only looking at the instruments that have survived until today; we also need to learn from the projects that disappeared from the radar back in time to be able to identify them.
So, in the end, we have reconstructed the historical universes of the top 20 market cap coins at each moment in time in order to train and test models in actual environments for different market conditions, not on the instruments that have survived through to today. You can read more about survivorship bias in the trading analysis here at Investopedia.
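A minimal sketch of this reconstruction, assuming the historical data is available as `(date, ticker, market_cap)` rows: for each date we rank whatever coins existed at that time and keep the top N. The row format, the function name `top_n_universe`, and the sample values are illustrative assumptions.

```python
from collections import defaultdict


def top_n_universe(market_caps, n=20):
    """Return {date: [tickers]} with the n largest coins by market cap per date.

    market_caps: iterable of (date, ticker, market_cap) rows. Coins that later
    disappeared stay in the universe for the dates on which they ranked.
    """
    by_date = defaultdict(list)
    for date, ticker, cap in market_caps:
        if cap is not None and cap > 0:  # skip missing or delisted entries
            by_date[date].append((cap, ticker))

    universe = {}
    for date in sorted(by_date):
        # Largest cap first; ticker name breaks ties deterministically.
        ranked = sorted(by_date[date], key=lambda row: (-row[0], row[1]))
        universe[date] = [ticker for _, ticker in ranked[:n]]
    return universe


# Toy data: DEADCOIN is large in 2018 and gone by 2022.
rows = [
    ("2018-01-01", "BTC", 230.0),
    ("2018-01-01", "ETH", 70.0),
    ("2018-01-01", "DEADCOIN", 90.0),
    ("2022-01-01", "BTC", 900.0),
    ("2022-01-01", "ETH", 450.0),
    ("2022-01-01", "DEADCOIN", 0.0),
    ("2022-01-01", "ADA", 45.0),
]
universe = top_n_universe(rows, n=2)
```

A model trained on `universe["2018-01-01"]` sees DEADCOIN as a member of the universe, which is exactly the information a survivors-only dataset would hide.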
When modeling financial markets, what we focus on most is not the actual machine learning model configuration, but the features we extract from the raw data, their combinations, and the final economic interpretation of the predictive values.
We have previously shared some insight on various feature importance algorithms in this blog article — AI in Finance: advanced idea research and evaluation beyond backtests.
Here, we want to show an interpretation example of one of our models:
On the following universe of tickers:
You can see how development activity and big fish price movements are essential in explaining the price of a particular instrument at a given moment in time. Market drivers can, of course, change over time, which is why the models have to be re-trained and re-analyzed regularly. In the above-mentioned blog article, we also show how crucial it is to compute feature importance under cross-validation to ensure the generalization and robustness of our findings.
We should split the evaluation of our performance into two parts.
The first part is fairly technical, even mathematical, and relates to the analysis of machine learning models. We need to deal with the uncertainty and stochastic nature of markets (in contrast to texts, images, and audio, which have a stable meaning and interpretation), which affects our judgment and analysis. We have described several useful metrics and tactics on this topic in this blog article: AI in Finance: how to finally start to believe your backtests.
The next part is strategy allocation and financial performance analysis. We use traditional investments in the stock market (SPY index) and the crypto market (just holding BTC) as benchmarks and have built a strategy where we:
- Train ML models on survivorship-free datasets
- Use the models to predict and trade BTC
- Apply the CPPI framework for risk management and drawdown handling
- Stake USDT when a portion of the capital is held in cash
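The CPPI (constant proportion portfolio insurance) step in the list above can be sketched as follows. The idea is to keep risky exposure proportional to the cushion, that is, the portfolio value above a protected floor, and hold the rest in a safe asset. The multiplier of 3, the 80% floor, the no-leverage cap, and the use of `safe_return` as a per-period staking yield are illustrative assumptions, not our actual configuration.

```python
def cppi(risky_returns, safe_return=0.0, multiplier=3.0,
         floor_fraction=0.8, start_value=1.0):
    """Simulate a basic CPPI allocation with a static floor and no leverage."""
    value = start_value
    floor = floor_fraction * start_value
    history = []
    for r in risky_returns:
        cushion = max(value - floor, 0.0)
        # Risky exposure is a multiple of the cushion, capped at total value.
        risky = min(multiplier * cushion, value)
        safe = value - risky
        risky_weight = risky / value if value > 0 else 0.0
        # Apply this period's returns to both sleeves.
        value = risky * (1 + r) + safe * (1 + safe_return)
        history.append({"value": value, "risky_weight": risky_weight})
    return history


# Toy path with two sharp drops in the risky asset.
path = [0.10, -0.30, -0.30, 0.20]
history = cppi(path)
```

With discrete rebalancing, the floor holds only while a single-period loss stays below `1 / multiplier` (about 33% here), which is one reason the multiplier should be chosen with the asset's gap risk in mind.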
You can see the performance, both visually and in metrics, below:
As we can see, our strategy (mint capital) has annualized returns comparable to BTC (hence, we give investors similar returns). Still, our risk measures are better (almost 3x lower drawdowns, 2x lower VaR).
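For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two risk measures mentioned above: maximum drawdown and historical value at risk (VaR). The exact definitions and parameters behind our reported figures are not stated in this article, so the simple empirical-quantile VaR, the `alpha` parameter, and the sample numbers are assumptions.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline of a value series, as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def historical_var(returns, alpha=0.05):
    """Empirical VaR: loss at the alpha quantile of returns, reported as a positive number."""
    ordered = sorted(returns)
    index = int(alpha * len(ordered))  # simple quantile, no interpolation
    return -ordered[index]


equity_curve = [100, 120, 90, 110, 80, 130]
period_returns = [0.02, -0.05, 0.01, -0.10, 0.03, -0.01, 0.04, -0.02, 0.00, 0.05]

dd = max_drawdown(equity_curve)
var_25 = historical_var(period_returns, alpha=0.25)
```

Computing both measures on the strategy and on each benchmark over the same period gives the kind of side-by-side comparison described above.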
We have also evaluated several historical market stress events to demonstrate how we would have protected investor capital in such situations:
As you can see, our strategy has much lower drawdowns compared to the stock market and crypto benchmarks.
Security is part of any portfolio management solution. No one wants to lose the access keys to their portfolio of assets. One of the most important steps is to establish data security, which includes encryption for data in transit and at rest.
AWS Key Management Service (AWS KMS) helps you create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications. AWS KMS is a secure and resilient service that uses hardware security modules to protect cryptographic keys.
AWS Certificate Manager (ACM) is used to provision, manage, and deploy public and private SSL/TLS certificates for use with AWS services and internally connected resources. This allows us to make sure that data is encrypted in transit.
In this article, we have reviewed how to build a crypto portfolio management strategy on AWS services using machine learning and alternative data mining techniques. In addition, we focused strongly on protecting investor capital from market risks (through advanced analysis and machine learning) and execution risks (through AWS infrastructure).
In the future, we aim to improve the system by trading more instruments and adding more data sources such as CBOE/CME futures, fear & greed index, commodities, “recession-proof” instrument correlations, and mining activities. We hope this article will help you understand the landscape of algorithmic and infrastructure solutions to build your own investment product. Feel free to contact us if you have any questions or need additional support.