## Prescriptive Data Analytics

My current interest is to work towards prescriptive data analytics: developing frameworks and algorithms that create value, money, and insight from data.

## Massive Computational Experiments for Statistical Performance Assessment

Nowadays, a PhD in a computational field is expected to present results based on computations on the order of 1 million CPU-hours. With the computational resources available from cloud computing and inexpensive computer clusters with hundreds of nodes, computations at this scale are feasible. In a recent paper in JCP, I presented results from 320,000 CPU-hours of computation (about 10 million simulation runs), together with a procedure for performance assessment of closed-loop optimization.

## Selection of representative models for decision making and optimization under uncertainty

When uncertainty is represented in terms of a probability distribution, a large number of realizations can be generated to capture the uncertainty. In the context of decision making and optimization under uncertainty, however, we need to select only a few models. In this research, we investigated how to select the most representative subset from a large number of realizations when the decision parameters are unknown a priori. The paper is published in Computers & Geosciences. An earlier version of this research was presented at InterPore in Cincinnati. Recent results and findings will be presented at the AGU Fall Meeting in San Francisco.

## Closed-loop field development under uncertainty by use of optimization with sample validation

The results of this research are described in a paper in SPE Journal.

#### SPE 173219 highlights:

- A closed-loop field development (CLFD) optimization framework is introduced.
- Optimization with sample validation (OSV) is introduced for optimization under uncertainty (optimization over multiple realizations from a probability distribution function).
- The OSV procedure determines, in a systematic way, the number of realizations for optimization under uncertainty.
- Use of CLFD with OSV represents a robust and efficient overall methodology for optimal decision making in hydrocarbon field development projects.
- Simultaneous optimization of all wells provides superior results to those from a (greedy) well-by-well optimization in which each well is optimized independently (see the PowerPoint show file for the evolution of the water saturation map over time under simultaneous and well-by-well optimization).

Wouldn't it be cool to simulate the field development and the reservoir management? In our research, we can simulate the field development where wells are drilled one (or a few) at a time, and new data are obtained from each well.

Together with my adviser, Prof. Louis Durlofsky, I developed a new framework for field development optimization. Closed-loop field development (CLFD) is a framework that shows how history matching and optimization methods can be combined to improve oil recovery or net present value. The CLFD workflow is shown in Figure 1.

Figure 1: Closed-Loop Field Development (CLFD) Optimization, SPE 173219

In SPE 173219, we discuss the important subject of "optimization under uncertainty" for expensive function evaluations. Uncertainty is usually represented by using a set of realizations (instead of a single realization) to describe the model. Optimization is then performed by maximizing the expected objective over the entire set of realizations (robust optimization). This approach (optimization over the entire set) can be impractical when function evaluations are computationally expensive and a large number of realizations is used to capture the uncertainty in the model description. To solve this problem, we proposed the "optimization with sample validation" (OSV) methodology. In this approach, robust optimization is performed over a small subset of representative realizations, and then an appropriate validation criterion is used to assess whether the representative realizations adequately represent the entire set. Our framework is suitable for expensive function evaluations and complex optimization problems in which stochastic programming techniques such as out-of-sample validation might not work!
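The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the paper: the function names, the random choice of the subset, the subset sizes, and the validation criterion (the gain on the full set must reach a fraction `tol` of the gain on the subset) are all hypothetical, and the toy objective stands in for an expensive reservoir simulation.

```python
import numpy as np


def expected_objective(x, realizations, objective):
    """Mean objective of decision x over a set of realizations."""
    return float(np.mean([objective(x, r) for r in realizations]))


def osv(realizations, objective, maximize, x0, sizes=(2, 5, 10), tol=0.5, seed=0):
    """Maximize over growing subsets until the gain carries over to the full set."""
    rng = np.random.default_rng(seed)
    # Random pick of realizations: a stand-in for a proper selection method.
    order = rng.permutation(len(realizations))
    full_at_x0 = expected_objective(x0, realizations, objective)
    for n in sizes:
        subset = [realizations[i] for i in order[:n]]
        # Robust optimization over the small subset only.
        x_opt = maximize(lambda x: expected_objective(x, subset, objective), x0)
        gain_subset = (expected_objective(x_opt, subset, objective)
                       - expected_objective(x0, subset, objective))
        gain_full = expected_objective(x_opt, realizations, objective) - full_at_x0
        # Hypothetical validation criterion (an assumption, not the paper's):
        # accept if the full-set gain is at least tol times the subset gain.
        if gain_subset > 0 and gain_full >= tol * gain_subset:
            return x_opt, n, True
    return x_opt, n, False


def toy_objective(x, r):
    """Toy stand-in for a simulation: best decision equals the realization value."""
    return -(x - r) ** 2


def grid_maximize(f, x0):
    """Toy optimizer: exhaustive search on a fixed grid (x0 is ignored)."""
    grid = np.linspace(-3.0, 3.0, 121)
    return float(grid[np.argmax([f(x) for x in grid])])


realizations = np.random.default_rng(1).normal(0.0, 1.0, size=50)
x_start = 3.0
x_best, n_used, validated = osv(realizations, toy_objective, grid_maximize, x_start)
print(x_best, n_used, validated)
```

In this sketch the validation step evaluates the full set only at the initial and optimized decisions, which is far cheaper than optimizing over all realizations; if validation fails, the subset is enlarged and the optimization is repeated.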

#### Assessment of the value of a hydrocarbon reservoir:

Figure 2 is taken from Example 2 of the SPE-173219-PA paper. As Figure 2 shows, given all the available data from existing wells, we can use CLFD to assess the value of a petroleum reservoir development project; in other words, our CLFD framework can be used for financial analysis of petroleum reservoir development projects. Furthermore, by making "better decisions" about future wells, the value of the project (which is reflected in the P50) is improved at later times.

The results in Figure 2 show that incorporating reservoir data helps us make better decisions throughout the reservoir development project. Both the P10 and P50 of the project are consistently improved. Furthermore, the estimate of the value of the project (which is E[NPV], shown in blue), becomes closer to the P50 value.

Figure 2: P10, P50, P90 NPVs (10th, 50th and 90th percentile of NPV) evaluated for the entire set of 50 realizations, along with the expected NPV for the representative set (of 10 realizations) versus CLFD step, SPE-173219-PA
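For readers who want to reproduce this kind of summary, the statistics in Figure 2 can be computed at each CLFD step as sketched below. The NPV values and the indices of the representative set are made up for illustration; only the set sizes (50 and 10) and the percentile definitions follow the caption above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical NPVs (in USD) of one development plan on all 50 realizations.
npv_full = rng.normal(loc=1.0e9, scale=2.0e8, size=50)

# Hypothetical indices of the 10 representative realizations.
representative_idx = np.arange(0, 50, 5)

# P10, P50, P90 are taken as the 10th, 50th and 90th percentiles of NPV,
# following the caption of Figure 2.
p10, p50, p90 = np.percentile(npv_full, [10, 50, 90])

# Expected NPV estimated from the representative set only.
expected_npv_rep = npv_full[representative_idx].mean()

print(f"P10={p10:.3e}  P50={p50:.3e}  P90={p90:.3e}  E[NPV]={expected_npv_rep:.3e}")
```

Note that some industry conventions define P10 as the value exceeded with 10% probability; the sketch uses the percentile definition stated in the caption.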

## TSVD-based Levenberg-Marquardt algorithm for large scale inverse problems

The results of this research are described in a paper in JPSE.

In hydrology and reservoir engineering, subsurface reservoir models are constructed based on observed data at well locations. The models should be calibrated to match dynamic data such as flow rates at well locations. This model calibration is an inverse problem that has infinitely many solutions. Nonlinear inverse problems are encountered in various engineering fields such as geophysics, hydrology, signal processing and machine learning.

In solving nonlinear inverse problems, our goal is to either obtain a single appropriate model or obtain multiple models that quantify the uncertainty in model description. Part of my research is on investigating efficient approaches for solving nonlinear inverse problems.

Among numerous optimization algorithms, the Levenberg-Marquardt algorithm is perhaps the most efficient approach for solving nonlinear inverse problems. The Levenberg-Marquardt algorithm is also a popular approach in machine learning for training artificial neural networks. In a recent paper in JPSE we have shown that the Levenberg-Marquardt algorithm provides appropriate solutions to nonlinear inverse problems, while the Gauss-Newton method and its variants such as quasi-Newton methods may converge to rough or poor estimates of the model. In particular, we showed that the Levenberg-Marquardt search direction at early iterations has components mainly in the direction of eigenvectors corresponding to the largest eigenvalues, while the Gauss-Newton search direction is mainly in the direction of eigenvectors corresponding to the smallest eigenvalues. As a consequence, the Levenberg-Marquardt algorithm gradually resolves the important features of the model.
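The contrast between the two search directions can be made explicit with a simplified derivation. The notation is introduced here for illustration and is not taken from the paper: $G = \sum_i \lambda_i u_i v_i^{T}$ is the SVD of the (dimensionless) sensitivity matrix, so the eigenvalues of the Gauss-Newton Hessian $G^{T}G$ are $\lambda_i^{2}$; $r$ is the data residual; $\mu > 0$ is the Levenberg-Marquardt damping parameter; and the prior (regularization) term is omitted for brevity.

```latex
\begin{align}
\delta m_{\mathrm{GN}} &= -\left(G^{T}G\right)^{+} G^{T} r
  = -\sum_{i:\,\lambda_i>0} \frac{1}{\lambda_i}\,\left(u_i^{T} r\right) v_i, \\
\delta m_{\mathrm{LM}} &= -\left(G^{T}G + \mu I\right)^{-1} G^{T} r
  = -\sum_{i} \frac{\lambda_i}{\lambda_i^{2}+\mu}\,\left(u_i^{T} r\right) v_i .
\end{align}
```

When $\mu$ is large relative to $\lambda_i^{2}$, as is typical at early iterations, the Levenberg-Marquardt weight is approximately $\lambda_i/\mu$, so directions with large singular values dominate. As $\mu \to 0$ the weight tends to $1/\lambda_i$, which amplifies the directions with the smallest singular values.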

Our Levenberg-Marquardt algorithm is applicable to large-scale problems. With the TSVD parameterization, the search-direction vector is expanded in terms of the eigenvectors of the Hessian corresponding to the largest eigenvalues. This only requires a TSVD of the Jacobian matrix; neither the Hessian nor the full Jacobian is formed explicitly. The TSVD is computed with the Lanczos algorithm, which does not require an explicit computation of the Jacobian (the sensitivity matrix). The Lanczos algorithm only requires the product of the Jacobian with a vector and the product of the transpose of the Jacobian with a vector.
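As a rough illustration of the matrix-free idea, the sketch below computes a truncated SVD from the two products only and then forms a damped step in the truncated basis. It is not the implementation from the paper: SciPy's `svds` (an ARPACK Lanczos-type solver) is used as a stand-in, the matrix `G_hidden` exists only to emulate what a simulator would provide through the two products, the prior term is omitted, and the sizes, damping value `mu`, and residual are arbitrary.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

rng = np.random.default_rng(0)
n_data, n_model, k = 40, 200, 5

# Hypothetical stand-in for the simulator. In practice G is never formed;
# only the two products below would be available.
G_hidden = rng.standard_normal((n_data, n_model)) / (1.0 + np.arange(n_model))

G_op = LinearOperator(
    shape=(n_data, n_model),
    matvec=lambda v: G_hidden @ v,      # product G v
    rmatvec=lambda u: G_hidden.T @ u,   # product G^T u
    dtype=np.float64,
)

# Truncated SVD with the k largest singular triplets, using only the products.
U, s, Vt = svds(G_op, k=k)

# Damped step restricted to the span of the k right singular vectors:
# delta_m = -sum_i  s_i / (s_i^2 + mu) * (u_i^T r) * v_i
r = rng.standard_normal(n_data)   # hypothetical data residual
mu = 1.0                          # hypothetical damping parameter
alpha = -(s / (s**2 + mu)) * (U.T @ r)
delta_m = Vt.T @ alpha

print(np.sort(s)[::-1])
```

The step only needs k coefficients, so the cost per iteration is governed by the number of Jacobian-vector products used by the truncated SVD rather than by the number of model parameters.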

## Optimization under uncertainty

Optimal management of subsurface resources requires the ability to use various types of reservoir data for making optimal decisions in the development of hydrocarbon resources and in the operation of existing wells. Decisions include determining the locations of new wells and the operational settings of existing wells. My research aims to provide algorithms, software and tools to help make optimal decisions.

## Efficient methods for generalized field development optimization

Decisions in the development of hydrocarbon fields are usually determined using optimization methods together with reservoir simulation. Decision parameters for optimization include the number of wells, their locations and time-varying controls, the drilling sequence, and the well types. In this research, an efficient optimization framework is developed for the joint optimization of field development decision parameters.

## Gradient-based History Matching for Closed-loop Reservoir Management

In reservoir management, computational models are built to simulate the future reservoir behaviour. Geological models for the subsurface reservoir are generated based on the available geological, geophysical and production data. As there is always new data from the reservoir, the development of efficient methods for history matching is of significant interest. History matching is an important step of closed-loop reservoir management (CLRM). Figure 3 shows the schematic layout of CLRM. CLRM provides a framework in which different history matching methods can be compared.

Figure 3: Schematic layout of closed-loop reservoir management

In this research, efficient gradient-based history matching techniques are developed for applications in closed-loop reservoir management. Figures 4 and 5, respectively, show data matches and predictions from prior and conditional RML realizations for a producer well. Data matches and predictions for the wells not shown are similar. As Figure 6 shows, all RML realizations are generated in about 50 simulation runs, which demonstrates the efficiency of our gradient-based history matching technique.

Figure 4: Data matches and predictions for unconditional realizations

Figure 5: Data matches and predictions for conditional RML realizations. Dashed vertical line shows end of history matching.

Figure 6: Normalized (history matching) objective function versus number of simulation runs.
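To make the terms "unconditional" and "conditional RML realization" concrete, here is a minimal sketch of randomized maximum likelihood (RML) on a toy linear-Gaussian problem. It is not the gradient-based method developed in this research: the forward model `G`, the covariances, and the sizes are hypothetical, and because the toy problem is linear the minimizer of the RML objective is available in closed form instead of requiring simulation runs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_data, n_real = 20, 5, 10

G = rng.standard_normal((n_data, n_model))   # hypothetical linear forward model
C_M = np.eye(n_model)                        # prior model covariance
C_D = 0.01 * np.eye(n_data)                  # data-error covariance
m_prior = np.zeros(n_model)

# Synthetic observations from a hypothetical "true" model plus noise.
d_obs = G @ rng.standard_normal(n_model) + rng.multivariate_normal(np.zeros(n_data), C_D)


def rml_realization(rng):
    """One RML sample: perturb prior and data, then minimize the objective.

    The objective is
      0.5 (m - m_uc)^T C_M^{-1} (m - m_uc) + 0.5 (G m - d_uc)^T C_D^{-1} (G m - d_uc),
    whose minimizer is closed-form for a linear forward model.
    """
    m_uc = rng.multivariate_normal(m_prior, C_M)   # unconditional realization
    d_uc = rng.multivariate_normal(d_obs, C_D)     # perturbed observations
    S = G @ C_M @ G.T + C_D
    m_c = m_uc + C_M @ G.T @ np.linalg.solve(S, d_uc - G @ m_uc)
    return m_uc, d_uc, m_c


samples = [rml_realization(rng) for _ in range(n_real)]

# Data mismatch before and after conditioning, averaged over realizations.
mismatch_uc = np.mean([np.linalg.norm(G @ m_uc - d_obs) for m_uc, _, _ in samples])
mismatch_c = np.mean([np.linalg.norm(G @ m_c - d_obs) for _, _, m_c in samples])
print(mismatch_uc, mismatch_c)
```

For a nonlinear simulator, the closed-form update would be replaced by an iterative minimization of the same objective for each realization, which is where an efficient gradient-based algorithm matters.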

## History matching using optimal parameterization

A parameterization based on a truncated SVD of a dimensionless sensitivity matrix has been shown to be the optimal parameterization for history matching.

This method is based on the Gauss-Newton or the Levenberg-Marquardt algorithm. These gradient-based algorithms require the computation of the sensitivity matrix, which is not feasible for large history matching problems. However, by parameterizing the vector of the change in the model in terms of the eigenvectors corresponding to the largest eigenvalues of the Hessian matrix, the formulation only requires a truncated SVD of the sensitivity matrix rather than the full matrix. The Lanczos algorithm is used to compute the truncated SVD. For computing a truncated SVD of a matrix G, the Lanczos algorithm only requires the product of G with a vector and the product of the transpose of G with a vector.

In the history matching context, the sensitivity matrix G is the matrix of derivatives of the predicted data with respect to all model parameters. The product of G with an arbitrary vector v can be computed with the gradient simulator method (direct method), while the product of the transpose of G with an arbitrary vector u can be computed with an adjoint solution. Writing the code for the adjoint method and the gradient simulator method requires explicit knowledge of the reservoir simulator code.

Note that the adjoint method is well known in history matching as a way to compute the analytical gradient from the simulator. More generally, the adjoint method can compute the product of the transpose of G with any vector u; for one particular choice of u, the adjoint solution is the analytical gradient.
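A small numerical check can illustrate this last point. For a weighted least-squares data mismatch, choosing u as the weighted residual makes the product of the transpose of G with u equal to the gradient. Everything in the sketch is a toy assumption: the forward model `tanh(A m)` replaces the simulator, `gt_times` plays the role of an adjoint solve, and the weights and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_data, n_model = 6, 4
A = rng.standard_normal((n_data, n_model))
d_obs = rng.standard_normal(n_data)
c_d_inv = np.full(n_data, 4.0)   # diagonal of C_D^{-1} (inverse error variances)


def forward(m):
    """Toy nonlinear forward model standing in for the simulator."""
    return np.tanh(A @ m)


def gt_times(m, u):
    """Product G^T u without forming G (the role played by an adjoint solve).

    Here G = diag(1 - tanh(A m)^2) A, so G^T u = A^T ((1 - tanh(A m)^2) * u).
    """
    return A.T @ ((1.0 - np.tanh(A @ m) ** 2) * u)


def objective(m):
    """Data mismatch 0.5 (g(m) - d_obs)^T C_D^{-1} (g(m) - d_obs)."""
    res = forward(m) - d_obs
    return 0.5 * res @ (c_d_inv * res)


m = rng.standard_normal(n_model)

# The particular vector u for which G^T u is the gradient of the objective.
u = c_d_inv * (forward(m) - d_obs)
grad_adjoint = gt_times(m, u)

# Central finite differences as an independent check.
eps = 1e-6
grad_fd = np.array([
    (objective(m + eps * e) - objective(m - eps * e)) / (2.0 * eps)
    for e in np.eye(n_model)
])

print(np.max(np.abs(grad_adjoint - grad_fd)))
```

The finite-difference check needs two forward evaluations per model parameter, whereas the adjoint-style product costs roughly one extra solve regardless of the number of parameters, which is why the adjoint is preferred for large models.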

Commercial reservoir simulators usually do not provide the adjoint or gradient simulator method. History matching without information about the sensitivity matrix and the analytical gradient is usually computationally expensive and may be less reliable.

Results and algorithms are provided in a paper in the Journal of Petroleum Science and Engineering.