Key words

1 Introduction

Metabolic processes convert material (e.g., nutrients, cofactors) into energy and biomass to achieve cell growth and maintenance. Quantifying these processes is particularly relevant to cancer cell metabolism because metabolic pathways (e.g., aerobic glycolysis) are often implicated in a tumor’s adaptation for survival, proliferation, and aggressiveness [1, 2]. Metabolic experiments are generally designed with contrast in mind to induce changes in these pathways, and thus it is imperative to be able to determine the activity of these pathways, either directly or by inferring from relevant models. The latter approach is effective when many independent but indirect measurements support the same conclusion. Steady-state metabolic flux analysis (MFA), while conventionally viewed as a tool to estimate absolute cellular fluxes, is a simple yet versatile technique to integrate a variety of cellular measurements to achieve this [3]. The required measurements are primarily growth and extracellular rates but can include rates estimated from stable isotopic tracing (e.g., 13C-MFA) and accumulation of radiolabelled products and enzyme capacity and directionality determined from kinetics, thermodynamics, and gene/protein expression [4,5,6,7,8,9]. A persistent challenge, however, is deploying a model that fits the investigation purpose and scope.

Because of the diversity of cancer metabolism and the associated context of investigations [10], it is difficult to find a one-size-fits-all metabolic model for flux analysis. We risk prematurely eliminating pathways that may transpire to become novel cancer metabolism features [11]. But with modifications, Recon 2 can serve this purpose. Recon 2 is reconstructed from genes in the human genome [12]. It embodies years of effort in prescribing accurate gene-protein-reaction mappings for each metabolic enzyme, such as enzyme complexes, atom and charge balancing, cofactor usage (e.g., NADP vs. NAD), and compartmentalization. Recon 2 encodes the full capacity of human metabolism, and conceivably cancer metabolism is well-described within this metabolic space [13]. There is a difficult balance in tuning the dimension of the model between a narrowly scoped model with coerced flux results and a loose model with non-informative flux intervals. Rather than building one metabolic model, manually, we closed this gap by dynamically extracting a representative subset of reactions from Recon 2 based on the context of the metabolic investigation and the information/data available (Table 1).

Table 1 Recon 2.2 network size before and after the reduction

Here, we describe a method to perform MFA of cancer cell metabolism using reactions from Recon 2 (Fig. 1). The extraction process is facilitated by metabolic data and a core reaction template that reflects investigator’s preference (see Note 1 ). By “activating” and “deactivating” individual reactions using data (e.g., rates) and manual specifications, associated pathways in Recon 2 are autonomously assembled into a complete functional network, with preference for core reactions. Classical mainstream reactions tend to be selected according to the template, but the choices are superseded by the data and the manual specifications. Pathway rerouting is thus readily accomplished, and efforts can focus on picking active reactions that precipitate the desired metabolic pathways. Rates and directionality information are derived from metabolite and expression data. Hypothetical information can be incorporated as well. Computation takes less than a minute; thus the model extraction and flux analysis can be performed in rapid explorative cycles. This method requires considerable coding, but we have provided MATLAB scripts to help exemplify the method, as well as to serve as a functional application.

Fig. 1
figure 1

Workflow for reducing Recon 2. (a) Starting from the full human reconstruction, (b) reaction directionalities are revised to restrict ATP and NADPH production. (c) Cell culture conditions and measured rates are used to constrain allowable exchange reactions and magnitude of network inputs/outputs. (d) Manual flux specifications are applied before maximizing the elimination of reactions not contained in the core template. (e) The reduced model is used for flux analysis. A Monte-Carlo approach is used to generate flux distributions

Flux analysis is a technique to extrapolate intracellular metabolic activity from experimental data using a model. The method described provides a simple and reproducible approach to manipulate content of Recon 2 for flux analysis of cancer cell metabolism.

2 Materials

The method described will follow closely the development of the MATLAB application (provided in the Supplementary Materials) but is not limited as a MATLAB implementation. The minimum computational requirements are software packages that can read SBML files and perform mathematical programming and optimizations (see Note 2 ).

2.1 Software

  1. 1.

    Install latest MATLAB (MathWorks) (see Note 3 ). The version used here is R2015a.

  2. 2.

    Install latest Gurobi Optimizer (Gurobi Optimization) (see Note 4 ). The version used here is Gurobi-7.5.2.

  3. 3.

    Install COBRA Toolbox (CT) for reading SBML files (see Note 5) [14].

2.2 Model and Inputs

  1. 1.

    Download Recon 2.2 (MODEL1603150001) as the reference model (see Note 6 ) [15].

  2. 2.

    Use Microsoft Excel or equivalent for model specification, as shown in modelSpecs.xlsx (see Note 7 ).

  3. 3.

    As guidance, use the list of Recon 2 core template reactions and allowable exchange reactions provided in the Supplementary Materials (see Note 8 ).

  4. 4.

    As guidance, use rates for cell growth and extracellular exchanges provided in the Supplementary Materials (see Note 8).

3 Methods

3.1 Prepare MATLAB Environment

  1. 1.

    Check that Gurobi and CT in MATLAB paths are set correctly (see Note 9 ).

  2. 2.

    Change MATLAB working directory to project folder.

  3. 3.

    Read Recon 2 SBML using CT function readCbModel(). Save Recon 2 workspace variable to file (see Note 10 ). Repeat this step when the SBML file is altered.

3.2 Load Recon 2 and Modify Model

  1. 1.

    Load Recon 2 from MATLAB file.

  2. 2.

    Update the directionality of the reactions by changing flux lower and upper boundaries. The focus of these changes is to restrict reactions that can produce ATP and NADPH to a few conventional ones (see Note 11 ).

  3. 3.

    Restrict exchange reactions by changing the respective flux lower and upper boundaries to zero. The focus of these changes is to eliminate uptake of trivial substrates, i.e., not present in cell culture media (see Note 11 ).

  4. 4.

    Update coefficients of the biomass equation if required (see Note 8 ).

3.3 Test Feasibility with and Without Measured Rates

  1. 1.

    Perform linear programming (LP) to check that modifications made in Subheading 3.2, steps 24 did not render the model infeasible, i.e., the biomass equation “biomass_reaction” still carries flux. If solver returns an infeasible status, revise modifications made in Subheading 3.2 until a feasible solution can be achieved before proceeding (see Note 12 ).

  2. 2.

    Perform quadratic programming (QP) to adjust measured exchange rates and growth rates such that they are balanced. Check residuals of the adjusted rates to make sure they are acceptable (see Note 13 ).

3.4 Reduce Recon 2 and Generate Submodel

  1. 1.

    Apply adjusted rates to Recon 2 by fixing the lower and upper boundaries of the exchange flux to the adjusted value. The constraints are relaxed slightly using a small slack value (1e-6) to improve model feasibility (see Note 14 ).

  2. 2.

    Perform mixed-integer linear programming (MILP) to minimize use of reactions that are not listed in the core reaction template (see Note 15 ). Repeat this step in case MILP produces nonunique solutions. The MATLAB script (see modelReduction.m) will generate three solutions, and choose the first solution as the reduced model by default.

  3. 3.

    Using LP, perform randomized elimination of reactions that have non-zero flux in the MILP flux solution (see Note 16 ). This step is optional.

  4. 4.

    Reactions that have non-zero flux are exported as the reduced model for flux analysis. Combining two or more reduced models is possible. Remove adjusted rates applied in Subheading 3.4, step 1, but keep modifications made in Subheading 3.2.

  5. 5.

    Save reduced model as MATLAB file and as a CT model (see Note 5 ).

3.5 Monte-Carlo Flux Analysis

  1. 1.

    Load reduced model from MATLAB file.

  2. 2.

    Apply measured rates and perform QP. This step is the same as Subheading 3.3, step 2, but the reduced model is used instead of Recon 2. Save adjusted rates, intracellular fluxes, and residuals as “optimum” set.

  3. 3.

    Calculate the value of underdetermined fluxes by randomized uniform sampling. This is achieved by performing flux variability analysis (FVA) on a randomly chosen reaction and fixing the chosen reaction’s flux to a value sampled from the FVA interval randomly with a uniform distribution. This step is performed to all reactions in a randomized order (see Note 17 ).

  4. 4.

    Generate flux distributions by a Monte-Carlo approach by repeating Subheading 3.5, steps 13 (>100 iterations). With this procedure, however, measured rates are corrupted first with normally distributed noise before performing QP (see Note 17 ). Save adjusted rates, intracellular fluxes, and residuals as “corrupted” set.

  5. 5.

    Compile and plot flux results from the corrupted set (Fig. 2) (see Note 18 ).

Fig. 2
figure 2

Flux distributions generated by a Monte-Carlo approach. Histograms are shown from a bird’s-eye view, with density represented by shading

4 Notes

  1. 1.

    The method is an abridged version of an existing approach of reducing Recon 2 for steady-state flux analysis of HEK cell culture [4]. The published method involves complicated handling of raw gene expression data and numerous subjective steps, which are now summarized into a single editable “core reaction template.” This simplifies manipulation of reactions contained in Recon 2.

  2. 2.

    The optimization approaches required are linear programming (LP), mixed-integer linear programming (MILP), and quadratic programming (QP). The generalized formulation for these approaches is shown as Eq. 1. QP and LP are similar, except that the matrix Q and the constant k are zero in LP. For steady-state flux analysis, the inequality constraint is reformulated as the metabolite balance equation shown as Eq. 2, where S and v are the stoichiometric matrix and flux vector, respectively [3]. In MILP, the vector x contains both continuous and binary [0,1] values.

    $$ {\displaystyle \begin{array}{l}\operatorname{Minimize}\ {c}^{\hbox{'}}\cdotp x+{x}^{\hbox{'}}\cdotp Q\cdotp x+k\\ {}\mathrm{such}\ \mathrm{that}:\\ {} lb\le x\le ub\\ {}A\cdot x\le b\end{array}} $$
    (1)
    $$ S\cdot v\le 0 $$
    (2)
  3. 3.

    MATLAB is preferred because both Gurobi and CT installations come with MATLAB interface. MATLAB script modelReduction.m contains the steps performed in Subheadings 3.13.4. MATLAB script monteCarloMFA.m contains the steps performed in Subheading 3.5.

  4. 4.

    Gurobi has all three LP, QP, and MILP solvers. Academic free licensing is available.

  5. 5.

    COBRA Toolbox (CT) is currently being updated (version 3.0 in the pipeline) (http://opencobra.github.io/). The recommended installation procedure is via GitHub, but the required MATLAB function files can also be downloaded as a ZIP file. Running initCobraToolbox.m will run an update procedure. CT contains a large number of functions for constraint-based modelling and model editing that may be useful. The description of these functions can be found in OpenCOBRA’s website, under “Analysis” and “Reconstruction”- > “Refinement.” The website also has very detailed tutorials. Typical analysis functions include FBA and FVA, which are useful for flux simulations and checking.

  6. 6.

    The method described is based on Recon 2.2 [15], which is a 2016 version of the human reconstruction. The SBML model (MODEL1603150001) can be downloaded from EMBL-EBI BioModels Database. This version supersedes Recon 2.04 (2015), which can be browsed and downloaded from the Virtual Metabolic Human (VMH) website (http://vmh.uni.lu). Recon 3D currently is in the pipeline.

  7. 7.

    A spreadsheet interface is used to control and modify the contents of Recon 2. The file (modelSpecs.xlsx) is provided in the Supplementary Materials. Recon 2.2’s reactions are tabulated in this file. Excel’s filter is a useful way to visualize specific groups of reactions.

  8. 8.

    Information used to modify and reduce Recon 2 is listed in the spreadsheet, instead of being scripted in MATLAB. This provides traceability and opportunity for version control. The information includes (1) core template reactions, (2) allowable exchange reactions, (3) measured rates and errors, (4) reaction directions, (5) flux constraints, and (6) biomass composition. The core template provided has 363 reactions. The original biomass coefficients reflect the makeup of a generic mammalian cell (in mmol/gDW), and not for any specific cell type. See “Instructions” tab for more detailed explanation of the spreadsheet. Model specification can be equally accomplished in plain text files as well.

  9. 9.

    Run initCobraToolbox.m after installing Gurobi and specifying Gurobi path in MATLAB; otherwise COBRA Toolbox may not assign the correct solver. Add Gurobi routines to MATLAB path using gurobi_setup.m. This file is located in Gurobi’s MATLAB interface directory. Add CT scripts to MATLAB path using initCobraToolbox.m. This file is located in OpenCOBRA’s root directory. Rerun these functions if MATLAB path was reset or when Gurobi or CT functions are no longer detected.

  10. 10.

    CT takes about 15 min to load the Recon 2.2 SBML file. To save time, save the model variable as a local MATLAB file, and reload this file at the start of each MATLAB session, or reset the variable.

  11. 11.

    Controlling the sources of ATP and NADPH is important because energy and redox balance can alter flux outcomes [4, 16]. For cancer metabolism, one can restrict the production of ATP to mainstream pathways like glycolysis, TCA cycle, and oxidative phosphorylation and NADPH to oxidative pentose phosphate pathway and malic enzymes. Restriction is accomplished by altering the directionality of reactions, i.e., ATP and NAPDH are consumed by reactions unless allowed intentionally. Controlling the network inputs and outputs is also important to properly balance the measured rates (when performing QP). Inputs and outputs should be restricted to nontrivial substrates and products and essential nutrients required by the biomass equation. Cell culture dataset tends to be amino acid-centric [4]; allowable exchanges are thus the 20 amino acids, glucose, lactate, and small molecules H+, H2O, Pi, O2, CO2, NH3, SO4, and urea. To incorporate other substrates or by-products (e.g., fatty acids, acetate) [6], simply allow the exchange of these metabolites of interests.

  12. 12.

    When the lower boundary of “biomass_reaction” flux is fixed to 1, an infeasible outcome means that biomass cannot be produced. This is caused by the deactivation of an essential source, sink, and/or pathway route. To troubleshoot, modifications are toggled ON/OFF until a feasible solution is achieved. Conditionally essential metabolic features may require special annotation for future references.

  13. 13.

    While experimental measurements are acquired independently, the model inputs and outputs are occasionally coupled, for example, essential amino acid uptakes to the growth rate and glucose consumption to lactate production. QP is performed to adjust measure rates in a weighted least square fashion (Eq. 3) such that they become consistent according to the model [3]. Equation 3 can be reformulated as a QP problem, with measurement errors σ converted into the weight matrix Q. Calculated residuals show the “movement” of the rates and can be manipulated by tuning σ. For example, reduce error term of growth rate to induce greater adjustments of amino acid rates. Terms with zero residuals are free (uncoupled) fluxes.

    $$ \operatorname{Minimize}\ {\left(\frac{v_{\mathrm{measured}}-{v}_{\mathrm{simulated}}}{\sigma}\right)}^2 $$
    (3)
  14. 14.

    Measured rates do not necessarily need to belong to the control or treatment set per se. It is important to use rates that represent average activity and directionality of exchange reactions. If rates are opposite between conditions, then model reduction may be performed more than once, with the final model produced by overlaying two or more reduced models.

  15. 15.

    For the MATLAB application, the binary values 0 and 1 were formulated to represent activation and deactivation of reactions, respectively. Hence the objective of the MILP problem is set to maximize “1”s among reactions that are not listed as core reactions.

  16. 16.

    MILP does not produce a minimum model, i.e., all reactions are essential. Further model reduction can be accomplished by randomly eliminating active reactions, one at a time, and checking for feasibility by LP. The objective function can be used to bias the choice of reactions eliminated, e.g., keep reactions that maximize ATP production.

  17. 17.

    Determining flux distribution by a Monte-Carlo approach allows propagation of measurements errors to the flux of all other reactions. Measured rates are corrupted using Eq. 4 before being adjusted and used as flux constraints. Although underdetermined fluxes have uniform distributions within the flux interval, they also have complex couplings to other underdetermined fluxes. Hence, a greedy sampling approach was used. Underdetermined fluxes were constrained in a randomized order, each to a random flux uniformly sampled from that instance of flux interval.

    $$ {v}_{\mathrm{c}\mathrm{orrupt}}={v}_{\mathrm{measured}}+\sigma \dot{\mathrm{c}}N\left(0,1\right) $$
    (4)
  18. 18.

    Figure 2 is an example of a bird’s-eye view plot of histograms for all reactions. MATLAB’s zoom function can be used to inspect individual histograms and to produce a more legible y-axis (reaction ID).