Generative models in scenario generation for probabilistic analysis of investment projects
- Alfazeta
- Jan 8, 2024
- 5 min read
Updated: Jan 3
Generative models are algorithms that aim to learn the structure of a data set and generate new samples that are similar to that data. They are widely used in a variety of areas, such as natural language processing, computer vision, and data analysis. Some applications have truly surprising performance, notably the generation of images that closely resemble real images, but were generated by a generative model. Figure 1 shows an example of an application of this type, in which a neural network was used to generate images of faces of people who do not exist, but are indistinguishable from real images.

Figure 1 Images of faces generated by a GAN. These people do not exist; the images were randomly generated by a neural network
1 - GENERATION OF TIME SERIES THROUGH GENERATIVE MODELS
When it comes to time series, generative models can play a key role in synthetic data generation. By learning the characteristics and patterns present in a time series, these models can create new samples that resemble the behavior observed in the original data. This is very advantageous for probabilistic analysis: the generative model not only produces possible scenarios, it produces them with a probability distribution consistent with the data it was trained on. Common scenarios are therefore generated more frequently than rare ones, so the results truly represent the distribution the model learned from the training data. That said, synthetic time series generation requires care and adequate validation.
Generative models must be trained on representative data sets, and their generalization capacity must be evaluated. It is also important to consider the limitations and uncertainties associated with these synthetic projections. Figure 2 presents an example of time series generation using a generative model; note that 5 curves are generated simultaneously in each scenario. Generating more than one curve at a time is very useful: it preserves not only the coherence of each variable with its own history, but also the correlations between variables observed in the training data. For probabilistic analysis to be successful, the correlations between variables must be representative; otherwise there is a risk of obtaining biased projections.

Figure 2 Example of scenarios generated by a generative model
2 - OPERATING PRINCIPLE OF A GAN
Many of today's generative models are based on the architecture of Generative Adversarial Networks (GANs), originally proposed by Ian Goodfellow and his colleagues in 2014 in a paper titled "Generative Adversarial Networks". A GAN is composed of two neural networks: the Generator and the Discriminator. These two networks work together in an adversarial manner to generate synthetic samples that resemble a real dataset. Figure 3 shows an example of a GAN used for time series generation.

Figure 3 Architecture of a GAN used for synthetic scenario generation
The generator takes a random noise vector as input and generates a synthetic sample. The goal of the generator is to learn how to map this noise to samples that look real. The discriminator, in turn, takes as input both real samples and samples generated by the generator. Its task is to distinguish between real and synthetic samples. The goal of the discriminator is to learn to make this distinction accurately.
During training, the generator seeks to improve its ability to fool the discriminator by generating samples that are increasingly difficult to distinguish from real ones. At the same time, the discriminator seeks to improve its ability to distinguish between real and synthetic samples.
This adversarial training process continues iteratively, with the generator and discriminator constantly improving each other. Ideally, by the end of training, the generator will be able to generate high-quality synthetic samples that are indistinguishable from real samples, and the discriminator will have difficulty making this distinction. Thus, GANs are capable of generating synthetic samples that capture the characteristics and patterns present in real data, allowing the creation of new samples that resemble the original data set.
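The interplay described above can be sketched in a few lines of code. The snippet below is a minimal, illustrative numpy sketch, not a production GAN: the network sizes, weight scales, and the single-hidden-layer architecture are assumptions chosen for readability, and the randomly initialized weights merely stand in for trained parameters. It shows how the generator maps noise to samples, how the discriminator scores them, and how the two adversarial losses are formed.

```python
import numpy as np

rng = np.random.default_rng(0)

NOISE_DIM, SAMPLE_DIM, HIDDEN = 8, 24, 16  # illustrative sizes

# Randomly initialized weights stand in for trained parameters.
Wg1 = 0.1 * rng.normal(size=(NOISE_DIM, HIDDEN))
Wg2 = 0.1 * rng.normal(size=(HIDDEN, SAMPLE_DIM))
Wd1 = 0.1 * rng.normal(size=(SAMPLE_DIM, HIDDEN))
Wd2 = 0.1 * rng.normal(size=(HIDDEN, 1))

def generator(z):
    """Map a batch of noise vectors to synthetic samples."""
    return np.tanh(z @ Wg1) @ Wg2

def discriminator(x):
    """Return the probability that each sample is real."""
    h = np.tanh(x @ Wd1)
    return 1.0 / (1.0 + np.exp(-(h @ Wd2)))  # sigmoid output in (0, 1)

# One adversarial step, conceptually:
z = rng.normal(size=(32, NOISE_DIM))        # random noise batch
fake = generator(z)                          # synthetic samples
real = rng.normal(size=(32, SAMPLE_DIM))     # stand-in for real data

d_real, d_fake = discriminator(real), discriminator(fake)
# The discriminator wants d_real -> 1 and d_fake -> 0:
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
# The generator wants the discriminator fooled (d_fake -> 1):
g_loss = -np.mean(np.log(d_fake))
```

In actual training, `d_loss` and `g_loss` would be minimized in alternation by gradient descent on the discriminator's and generator's weights, respectively, which is the iterative improvement described above.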
Figure 4 schematically presents how more than one curve is generated at the same time. The usual approach is to treat the stacked time series like an image: each row of the matrix holds the values of one of the series to be generated, and each column corresponds to a specific time. Generation is done with convolutional networks capable of producing images (sets of curves, in the case of time series) that make sense as a whole, not just individually. Because the network is trained to generate the complete scenario, that is, all variables simultaneously, a generator that has learned to "trick" the discriminator must also have captured the correlations between the variables.

Figure 4 Schematic representation of time series generation using GANs, with generator in encoder-decoder architecture
Another important aspect to note is that the GAN will learn to generate scenarios that preserve the correlations between variables, when they are correlated. If the variables are only weakly correlated, the GAN will learn that as well, and the generator will produce those variables approximately independently when creating new scenarios.
3 - HOW CAN GANs HELP IN PROBABILISTIC PROJECT EVALUATION AND RISK ANALYSIS?
Probabilistic project evaluation is an approach that takes into account uncertainty about the future behavior of variables relevant to project performance. Instead of considering only deterministic values, this methodology uses probability distributions to model different possible scenarios. This allows for a more comprehensive and realistic analysis of the risks and potential outcomes of the project. By assigning probabilities to the different outcomes and calculating financial indicators, it is possible to obtain a more accurate view of the viability and profitability of the project, helping to make more informed decisions.
Figure 5 shows an example of a project NPV histogram obtained from scenarios generated by a GAN. Besides NPV, other metrics can be calculated, such as incremental profitability and breakeven price.

Figure 5 NPV histogram of an investment project obtained through multiple realizations generated by a GAN
Note that project performance varies widely across scenarios: with 90% confidence, the NPV ranges from approximately 92 to approximately 570 million dollars. With this information, the decision maker can assess whether this level of uncertainty is acceptable or whether some characteristic of the project should be redefined (for example, by reducing its scope) to bring the risk to an acceptable level. A correct assessment of the project's financial exposure can also lead to the implementation of more appropriate risk mitigation strategies.
It is important to mention that several other methodologies could be used to generate scenarios, not necessarily using Artificial Intelligence techniques. We list below some of the advantages of GANs:
Scenarios with a more realistic probability distribution for each variable individually.
Scenarios that are coherent as a whole, since all variables are generated at the same time, ensuring that correlations between variables are preserved.
Possibility of using a slightly different architecture, the conditional GAN, to generate conditional time series. This makes it possible to test scenarios with a specific characteristic one wishes to study: for example, scenarios consistent with periods in which interest rates around the world were rising. This can be important for assessing the project's sensitivity to shocks in macroeconomic conditions.
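On the conditional-GAN point above, the core mechanism is simple to sketch: a condition vector encoding the desired regime is appended to the generator's noise input (and also shown to the discriminator alongside each sample). The encoding below, two regime flags, is an illustrative assumption, not a prescribed scheme.

```python
import numpy as np

rng = np.random.default_rng(3)

NOISE_DIM, COND_DIM = 8, 2  # illustrative sizes

z = rng.normal(size=(32, NOISE_DIM))  # random noise batch, one row per scenario

# Hypothetical condition encoding: [rates_rising, high_inflation] flags.
# Here we request the "rates rising" regime for every generated scenario.
condition = np.tile([1.0, 0.0], (32, 1))

# The conditional generator receives noise and condition together, so every
# sample it emits is consistent with the requested regime.
gen_input = np.concatenate([z, condition], axis=1)  # shape (32, NOISE_DIM + COND_DIM)
```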
There is also the possibility of associating Artificial Intelligence methods, such as GANs, with more traditional methods that model stochastic processes. These hybrid methods are quite promising in several areas.