1 Introduction

This paper studies a posteriori error estimation for one-step time discretization methods applied to linear parabolic differential equations. The objective is to give rigorous error bounds in terms of quantities derived from the computed solution, which should show the correct asymptotic order of accuracy as known from an a priori error analysis in terms of regularity bounds of the exact solution or the data. The expected order of accuracy at the time nodes is higher than the order expected at other points. In the case of Runge–Kutta methods, the maximal order at the nodal points is the classical order of the method. Such a higher order at the nodal points is indeed attained provided certain compatibility conditions are satisfied; these conditions are, however, unrealistic for all kinds of boundary conditions (Dirichlet, Neumann, or Robin) except periodic boundary conditions. If the required compatibility conditions are not fully satisfied, an order reduction with respect to the classical order is observed [9, 13]. Remarkably, the order reduction remains localized near the boundary and the full classical order is attained in the interior of the domain [10].

In the present paper our goal is twofold: we give a new proof of a posteriori error bounds at the time nodes and we show that the order reduction does not occur in the a posteriori control of the error in the interior of the spatial domain.

We use the unified treatment of essentially all single-step time-stepping schemes of [3] and of the corresponding reconstructions. A key novel feature of our analysis is an error representation formula based on Duhamel’s principle. Through this expression a direct superconvergence analysis for Runge–Kutta and Galerkin time discretization schemes is possible. Our interior results are a posteriori analogs of the a priori estimates of [10].

For previous a posteriori results for various one-step time discretization methods we refer, e.g., to [13, 58, 11, 14]. A posteriori time-superconvergence results for fully discrete schemes based on dG piecewise linear time discretization methods were derived in [5].

We consider linear parabolic equations in a Hilbert space setting: Let \(H\) be a Hilbert space with inner product \(\langle \cdot ,\cdot \rangle \) and corresponding norm \(|\cdot |\). The problem is to find \(u:[0,T] \rightarrow D(A)\) satisfying

$$\begin{aligned} \left\{ \begin{array}{ll} u^{\prime } (t) + Au(t)=f(t), \quad 0<t<T,\\ u(0)=u^0, \end{array} \right. \end{aligned}$$
(1.1)

with \(A\) a positive definite, self-adjoint, linear operator on \(H\) with domain \(D(A)\) dense in \(H\), and a given forcing term \(f : [0,T]\rightarrow H.\)
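A model example, to which we return in Sect. 3, is \(H=L_2(\Omega )\) on a bounded domain \(\Omega \subset \mathbb{R }^d\) and \(A=-\Delta \) with homogeneous Dirichlet boundary conditions; then \(D(A^{1/2})=H^1_0(\Omega )\), and for convex or smooth \(\Omega \) we have \(D(A)=H^2(\Omega )\cap H^1_0(\Omega )\).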

We remark that the results of Sect. 2 hold equally in a Banach space setting, whereas those of Sect. 3 require the above Hilbert space setting.

1.1 Discretization methods

We will use the notation and formalism of [3] to describe the numerical methods considered. We consider piecewise polynomial functions on arbitrary partitions \(0=t_0<t_1<\cdots <t_N=T\) of \([0,T]\), and let \(J_n:=(t_{n-1},t_n]\) and \(k_n:=t_n-t_{n-1}\). We denote by \(\mathcal{V }_q^\mathrm{d}, q\in \mathbb{N }_0,\) the space of functions that are piecewise polynomials of degree at most \(q\) in time in each subinterval \(J_n\) with coefficients in \(V=D(A^{1/2})\), without continuity requirements at the nodes \(t_n\). The elements of \(\mathcal{V }_q^\mathrm{d}\) are taken continuous to the left at the nodes \(t_n\); \(\mathcal{V }_q(J_n)\) consists of the restrictions to \(J_n\) of the elements of \(\mathcal{V }_q^\mathrm{d}\). The spaces \({\mathcal{H }}_q^\mathrm{d}\) and \({\mathcal{H }}_q(J_n)\) are defined similarly by requiring that the coefficients are in \(H\). Let \(\mathcal{V }_q^\mathrm{c}\) and \(\mathcal{H }_q^\mathrm{c}\) be the spaces of the continuous elements of \(\mathcal{V }_q^\mathrm{d}\) and \(\mathcal{H }_q^\mathrm{d}\). For \(v\in \mathcal{V }_q^\mathrm{d}\) we let \(v^n:=v(t_n)\) and \(v^{n+}:= \lim _{t\downarrow t_n} v(t)\).

To define the time stepping methods we introduce the operator \(\varPi _{q-1}\), a projection operator onto piecewise polynomials of degree \(q-1\), \(\varPi _{q-1}: C([0,T];H) \rightarrow \oplus _{n=1}^N \mathcal{H }_{q-1}(J_n)\). Also, \(\widetilde{\varPi }_{q}:\mathcal{H }_q(J_n) \rightarrow \mathcal{H }_\ell (J_n)\) is an operator mapping polynomials of degree \(q\) to polynomials of degree \(\ell \), with \(\ell =q\) or \(\ell =q-1\); \(\varPi _{q-1}\) and \(\widetilde{\varPi }_{q}\) are defined on a reference time interval and then transformed to \(J_{n}\).

The time discrete approximation \(U\) to the solution \(u\) of (1.1) is defined as follows: We seek \(U\in \mathcal{V }_q^\mathrm{c}\) satisfying the initial condition \(U(0)=u^0\) as well as

$$\begin{aligned} U^{\prime }(t) + \varPi _{q-1}F(t,\widetilde{\varPi }_{q} U(t)) = 0 \quad \forall t \in J_n, \end{aligned}$$
(1.2)

where \(F(t,v)= Av - f(t).\) An equivalent Galerkin formulation is

$$\begin{aligned} \int _{J_n} \, \big [\langle U^{\prime } ,v \rangle + \langle \varPi _{q-1} F(t,\widetilde{\varPi }_{q} U(t)),v \rangle \big ] \, dt = 0 \quad \forall v \in \mathcal{V }_{q-1}(J_n), \end{aligned}$$
(1.3)

for \(n=1,\ldots , N,\) see [3]. The above formalism covers a large class of one-step time discretization schemes. In particular, the continuous Galerkin (cG) method is

$$\begin{aligned} U^{\prime }(t) + P_{q-1} F(t, U(t))= 0 \quad \forall t \in J_n, \end{aligned}$$
(1.4)

with \(\varPi _{q-1}:=P_{q-1}, \) with \(P_\ell \) denoting the \(L^2\) orthogonal projection operator onto \(\mathcal{H }_\ell (J_n).\) Furthermore, in [3] it was shown that (1.3) describes other important implicit single-step time stepping methods: the RK collocation methods (RK-C) with \(\varPi _{q-1}:=I_{q-1},\) the interpolation operator at the collocation points, and \(\widetilde{\varPi }_{q}\) the identity operator; all other interpolatory RK methods with \(\varPi _{q-1}:=I_{q-1},\) and appropriate \( \widetilde{\varPi }_{q}\) (with \(\ell =q\)); the discontinuous Galerkin (dG) method with \(\varPi _{q-1}:=P_{q-1}\) and \( \widetilde{\varPi }_{q}= I_{q-1},\) where \(I_{q-1}\) is the interpolation operator at the Radau points \(0<c_1<\cdots <c_q=1\) (so \(\ell =q-1\)).
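For orientation, in the lowest-order case \(q=1\) the cG method is a Crank–Nicolson-type scheme: \(U\) is continuous and piecewise linear, and (1.4) with \(\varPi _0=P_0\) amounts to

$$\begin{aligned} \frac{U^{n}-U^{n-1}}{k_n} + A\, \frac{U^{n-1}+U^{n}}{2} = \frac{1}{k_n}\int _{J_n} f(t)\, dt, \quad n=1,\ldots ,N, \end{aligned}$$

i.e., the Crank–Nicolson scheme with an averaged forcing term.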

1.2 Superconvergence—classical order

A key assumption on the time-discretization methods, related to the accuracy at the time nodes, is the following: We assume that the method (1.2) is associated with \(q\) pairwise distinct points \(c_1,\ldots , c_q\in [0, 1]\) with the property

$$\begin{aligned} \int _0^1\prod _{i=1}^q(\tau -c_i) v(\tau )\, d\tau =0 \quad \text{ for } \text{ all } \text{ polynomials } v \text{ of } \text{ degree } \le r. \end{aligned}$$
(1.5)

This condition induces orthogonality conditions on each interval \(J_{n}\) with the points \(t_{n, i}:=t_{n-1}+c_i k_n, \) \( i=1,\ldots , q.\) These points will be associated with the projections (or interpolants) used to define the method (1.2); see [3] for details. The superconvergence order, or classical order, \(p\) of the method at the nodes is

$$\begin{aligned} p=q+1+r, \end{aligned}$$
(1.6)

which is equal to the order of the interpolatory quadrature with nodes \(c_i\).
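For example, if the \(c_i\) are the \(q\) Gauss points in \([0,1]\), then (1.5) holds with \(r=q-1\) and hence \(p=2q\); for the \(q\) Radau points with \(c_q=1\) one has \(r=q-2\) and \(p=2q-1\). In the simplest case \(q=1\), \(c_1=\tfrac{1}{2}\),

$$\begin{aligned} \int _0^1\Bigl (\tau -\tfrac{1}{2}\Bigr )\, d\tau =0, \end{aligned}$$

so that \(r=0\) and \(p=2\).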

2 Nodal error analysis in \(H\)

2.1 Reconstruction and main error equation

As in [3] we compare the solution \(u\) with the reconstruction \(\hat{U}\) of \(U\) defined through

$$\begin{aligned} \hat{U} (t):=U(t_{n-1})-\int _{t_{n-1}}^t \hat{\varPi }_q \big [A\widetilde{\varPi }_{q} U- f\big ](s)\, ds \quad \forall t\in J_n, \end{aligned}$$
(2.1)

where the projection operators \(\hat{\varPi }_q\) onto \(\mathcal{H }_q(J_n), n=1,\ldots , N,\) are chosen so that \(\hat{\varPi }_q w\) agrees with \(\varPi _{q-1}w\) at the points \(t_{n,i}\):

$$\begin{aligned} (\hat{\varPi }_q-\varPi _{q-1})w(t_{n, i})=0, \quad i=1,\ldots ,q, \quad \forall w \in C([0,T];H). \end{aligned}$$
(2.2)

In view of (1.5) for \(v(\tau )=1\) and (2.2), we obtain \(\hat{U}(t_n)=U(t_n)\) and conclude that \(\hat{U}\) is continuous. Furthermore, \(\hat{U}\) satisfies

$$\begin{aligned} \hat{U}^{\prime }(t) = -\hat{\varPi }_q [A\widetilde{\varPi }_{q} U(t)- f(t)] = -\hat{\varPi }_q F(t,\widetilde{\varPi }_{q} U(t)) \quad \forall t\in J_n, \end{aligned}$$
(2.3)

which has a similar structure to (1.2). The motivation for introducing \(\hat{U} \) goes back to [1, 2] and details for its various properties are discussed in [3]. The operator \(\hat{\varPi }_q\) is chosen as follows: for the cG and dG methods, \(\hat{\varPi }_q=P_q\); for the RK collocation methods, \(\hat{\varPi }_q\) is an interpolation operator at the \(q\) collocation points of \(J_n\) plus another point.

We mention two key properties of \(\hat{U}\). The first one is the orthogonality property which follows by (2.2):

$$\begin{aligned} \int _{J_n} \langle (\hat{\varPi }_q - \varPi _{q-1})w(s), v(s)\rangle \, ds = 0 \quad \forall w \in C([0,T];H), \ \ v \in \mathcal{H }_r(J_n), \end{aligned}$$
(2.4)

for \(n=1,\ldots , N.\) The second one is a further assumption on \(\varPi _{q-1},\) namely for all \(V\in \mathcal{H }_q(J_n)\),

$$\begin{aligned} \int _{J_n} \langle V-\varPi _{q-1} V, v \rangle dt = 0 \quad \forall v\in \mathcal{H }_{r-1}(J_n), \end{aligned}$$
(2.5)

which, in view of (2.4), yields

$$\begin{aligned} \int _{J_n} \langle \hat{\varPi }_q V - V, v \rangle dt = 0 \quad \forall v\in \mathcal{H }_{r-1}(J_n). \end{aligned}$$
(2.6)

Condition (2.5) is verified by both cG and dG methods, for which \(\varPi _{q-1}=P_{q-1}\), as well as by RK methods, for which \(\varPi _{q-1}=I_{q-1}.\)
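In the RK case this is a direct consequence of (1.5): for \(V\in \mathcal{H }_q(J_n)\) the difference \(V-I_{q-1}V\) has degree \(q\) and vanishes at the points \(t_{n,i}\), so that \(V-I_{q-1}V=c\,\prod _{i=1}^q(t-t_{n,i})\) with a coefficient \(c\in H\), and hence, by (1.5) transformed to \(J_n\),

$$\begin{aligned} \int _{J_n} \langle V-I_{q-1} V, v \rangle \, dt = \int _{J_n} \prod _{i=1}^q(t-t_{n,i})\, \langle c, v(t)\rangle \, dt = 0 \quad \forall v\in \mathcal{H }_{r}(J_n), \end{aligned}$$

and \(\mathcal{H }_{r-1}(J_n)\subset \mathcal{H }_{r}(J_n)\).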

We state now the main error equation, which is the starting point of our analysis. Let \(\hat{R}\) be the residual of \(\hat{U},\)

$$\begin{aligned} -\hat{R} (t):=\hat{U}^{\prime }(t)+A\hat{U} (t)- f(t). \end{aligned}$$
(2.7)

Subtracting (2.7) from the differential equation in (1.1), we obtain the equation

$$\begin{aligned} \hat{e}^{\prime }(t)+A\hat{e}(t)= \hat{R}(t), \end{aligned}$$
(2.8)

for the error \(\hat{e}:=u-\hat{U},\) which we rewrite in the form

$$\begin{aligned} \hat{e}^{\prime }(t)+A\hat{e}(t)= R_{\hat{U}}(t)+R_{{\widetilde{\varPi }}_{q}}(t) + R_{\hat{\varPi }_q}(t) + R_f(t) \end{aligned}$$
(2.9)

with

$$\begin{aligned} R_{\hat{U}}:=A(U-\hat{U}), \quad R_{\hat{\varPi }_q}:=A(\hat{\varPi }_q -I){\widetilde{\varPi }_{q}} U, \quad R_f:=f-\hat{\varPi }_qf, \end{aligned}$$
(2.10)

and

$$\begin{aligned} R_{\widetilde{\varPi }_{q}}(t):=A(\widetilde{\varPi }_{q} U-U). \end{aligned}$$
(2.11)

Notice that \(R_{\hat{\varPi }_q}\) vanishes when \(\hat{\varPi }_q\) is a projector over \(\mathcal{H }_q(J_n)\) whereas \(R_{\widetilde{\varPi }_{q}}\) vanishes when \( \widetilde{\varPi }_{q} =I.\)

We will estimate the error \(e:=u-U\) at the time grid points, where \(e(t_n)=\widehat{e}(t_n)\); because of the error equation (2.8), this nodal error can be bounded in terms of the residual \(\widehat{R}(t)\) for \(0\le t \le t_n.\)

Remark 2.1

(Order of the residual) Theorem 2.2 in [2] states that

$$\begin{aligned} \widehat{U}(t) - U(t) = \frac{1}{(q+1)!} k_n^{q+1} {\widehat{U}}^{(q+1)} \varphi _{q+1}\Bigl (\frac{t-t_{n-1}}{k_n}\Bigr ), \end{aligned}$$

where \(\varphi _{q+1}\) is the polynomial of degree \(q+1\) with leading coefficient 1, with \(\varphi _{q+1}(0)=0\) and vanishing derivative at \(c_1,\ldots ,c_q\). Using this formula and standard interpolation estimates in (2.9)–(2.11), we anticipate that in the case of a smooth solution the residual \(\hat{R}\) is of order \(O(k_n^{q+1})\) for the continuous Galerkin method cG(\(q\)), for the discontinuous Galerkin method dG(\(q\)), as well as for RK collocation methods with \(q\) arbitrary collocation points.
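For instance, for \(q=1\) and \(c_1=\tfrac{1}{2}\) the above formula reads

$$\begin{aligned} \widehat{U}(t) - U(t) = \tfrac{1}{2}\, k_n^{2}\, {\widehat{U}}^{(2)}\, \varphi _{2}\Bigl (\frac{t-t_{n-1}}{k_n}\Bigr ), \qquad \varphi _2(\theta )=\theta ^2-\theta , \end{aligned}$$

and \(\varphi _2\) also vanishes at \(\theta =1\), in accordance with \(\widehat{U}(t_n)=U(t_n)\).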

Some care must however be taken in which norm \(\widehat{R}\) is estimated. In Sect. 6 of [3] it is shown, for \(\rho \ge 1\), that \(U,\widehat{U}\in D(A^\rho )\) and \(\hat{R}\in D(A^{\rho -1})\) when \(u^0\in D(A^\rho )\) and \(f\in D(A^\rho )\), for the continuous and discontinuous Galerkin methods and for the RK collocation methods at, e.g., the Radau points. However, if one does not assume \(f\in D(A)\) (which for the Laplacian with Dirichlet boundary conditions would require the compatibility condition that \(f\) vanishes at the boundary), then only \(A^{-1}\widehat{R}\) will be in \(H\), but not \(\widehat{R}\) itself. In the case of a spatially and temporally smooth solution we can then expect

$$\begin{aligned} |A^{-1}\widehat{R}(t)| = O(k_n^{q+1}) , \qquad t\in J_n, \end{aligned}$$
(2.12)

which holds if \(|\widehat{U}^{(q+1)}| =O(1)\). This bound can indeed be shown rigorously if the error satisfies

$$\begin{aligned} |A(U-u)(t)| = O(k_n^{q}), \qquad t\in J_n, \end{aligned}$$

which is obtained from standard a priori error bounds if the step sizes \(k_n\) are all of equal magnitude. For a finite element space discretization, the compatibility problem still shows up in that the \(H\)-norm of the spatially discrete residual is not \(O(k_n^{q+1})\) uniformly in the spatial grid size when \(f\notin D(A)\).

The difficulty that \(\hat{R}(t)\notin H\) does not occur for the heat equation on a cube with periodic boundary conditions or for the heat equation on the whole space \(\mathbb{R }^d\), where in the case of smooth data one obtains, for arbitrary \(\ell \ge 0\),

$$\begin{aligned} |A^\ell \widehat{R}(t)| = O(k_n^{q+1}) , \qquad t\in J_n. \end{aligned}$$
(2.13)

2.2 Error representation via Duhamel’s principle

We now apply Duhamel’s principle to (2.8):

$$\begin{aligned} \hat{e}(t) = \int _0 ^t E_A (t-s) \hat{R}(s) \, ds, \end{aligned}$$
(2.14)

where \(E_A (t) = e^ {-At}\) is the solution operator of the homogeneous equation

$$\begin{aligned} v^{\prime }(t) + A v(t) =0, \quad v(0)=w, \end{aligned}$$
(2.15)

i.e., \(v(t) = E_A (t) w.\) The family of operators \(E_A (t)\) is the semigroup of contractions on \(H\) with generator \(-A.\) The following properties are well known, cf., e.g., Crouzeix [4], Thomée [14],

$$\begin{aligned} \frac{ d^\ell }{dt^\ell } \, E_A (t) w = (-A)^\ell \, E_A (t) w, \quad \ell \ge 0, \end{aligned}$$
(2.16)

and

$$\begin{aligned} |A^\ell \, E_A (t) w| \le C_A \, \frac{1}{t ^{\ell -m}} |A^m w| \quad \ell \ge m\ge 0. \end{aligned}$$
(2.17)

Since \(A\) and \( E_A\) commute, (2.17) implies

$$\begin{aligned} |\, E_A (t)\, A^\ell w| \le C_A \frac{1}{t^{\ell -m}} |A^m w|, \quad \ell \ge m\ge 0, \end{aligned}$$
(2.18)

as well as \(|E_A (t) w| \le C_A\, t^{-m} |A^{-m} w|\) for \(m\ge 0\).
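Since \(A\) is self-adjoint and positive definite, these bounds follow directly from the spectral calculus; for instance, with \(\{E_\lambda \}\) denoting the spectral family of \(A\),

$$\begin{aligned} |A^\ell \, E_A (t) w|^2 = \int _0^\infty \lambda ^{2\ell } e^{-2\lambda t}\, d|E_\lambda w|^2 \le \sup _{\lambda >0}\bigl (\lambda ^{\ell -m} e^{-\lambda t}\bigr )^2 \int _0^\infty \lambda ^{2m}\, d|E_\lambda w|^2 = \Bigl (\frac{\ell -m}{e\, t}\Bigr )^{2(\ell -m)} |A^m w|^2, \end{aligned}$$

so that (2.17) holds with, e.g., \(C_A=((\ell -m)/e)^{\ell -m}\) (with the convention \(0^0=1\)).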

Starting from (2.14) we now derive a different error representation formula involving time derivatives of \(E_A.\) On the interval \(t_{n-1}\le s \le t_n\) we define the scaled \(j\)th antiderivative of \(\hat{R} \) as

$$\begin{aligned} \hat{R}^{[j]} _n(s) : = {k_n^{-j}} \int _{t_{n-1}} ^s \int _{t_{n-1}} ^{s_{j-1} } \cdots \int _{t_{n-1}} ^{s_{1}} \hat{R}( \tau ) \, \, d\tau \, ds_1 \ldots ds_{j-1}, \quad j\ge 1. \end{aligned}$$
(2.19)

Then, one has,

$$\begin{aligned} k_n^j \hat{R}^{[j]}_n (s) = \int _{t_{n-1}} ^s \frac{(s-\tau ) ^{j-1}}{(j-1)!}\, \hat{R}(\tau ) d\tau , \quad j\ge 1. \end{aligned}$$
(2.20)
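In particular, differentiating (2.20) with respect to \(s\) gives

$$\begin{aligned} k_n^{j}\, \frac{d}{ds}\, \hat{R}^{[j]}_n (s) = k_n^{j-1}\, \hat{R}^{[j-1]}_n (s), \quad j\ge 1, \qquad \hat{R}^{[0]}_n:=\hat{R}, \end{aligned}$$

together with \(\hat{R}^{[j]}_n(t_{n-1})=0\); these are the relations used in the repeated integrations by parts below.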

Using (2.16), (2.20) and integrating by parts in (2.14) we obtain,

$$\begin{aligned} \int _{t_{n-1}} ^t E_A (t-s) \hat{R}(s) \, ds&= \int _{t_{n-1}} ^t E_A (t-s) k_n \frac{ d }{d s} \hat{R}^{[1]} _n(s) \, ds \nonumber \\&= -\int _{t_{n-1}} ^t E_A (t-s) A\, k_n \hat{R}^{[1]}_n (s) \, ds + k_n\hat{R}^{[1]}_n (t). \end{aligned}$$
(2.21)

Further,

$$\begin{aligned} \int _{t_{n-1}} ^t E_A (t-s) A\, k_n \hat{R}^{[1]}_n (s) \, ds&= \int _{t_{n-1}} ^t E_A (t-s) A \frac{ d }{d s} k_n^2 \hat{R}^{[2]}_n (s) \, ds \nonumber \\&= -\int _{t_{n-1}} ^t E_A (t-s) A^2\, k_n^2 \hat{R}^{[2]}_n (s) \, ds + A k_n^2 \hat{R}^{[2]}_n (t).\nonumber \\ \end{aligned}$$
(2.22)

Thus, for any integer \(\rho \ge 1,\)

$$\begin{aligned} \int _{t_{n-1}} ^t E_A (t-s) \hat{R}(s) \, ds&= (-1)^{\rho }\int _{t_{n-1}} ^t E_A (t-s) A ^\rho \, k_n^\rho \hat{R}^{[\rho ]}_n (s)\, ds \nonumber \\&+ \sum _{j=1} ^\rho (-1)^{j-1} A^ {j-1} k_n^j \hat{R}^{[j]}_n (t). \end{aligned}$$
(2.23)

Notice that, for \(t \ge t_{n-1}\) and for \(s\in J_m\) with \(m\le n-1\), \(E_A(t-s)=E_A(t-t_m)E_A(t_m-s);\) thus

$$\begin{aligned} \int _{J_m} E_A (t-s) \hat{R}(s) \, ds&= E_A(t-t_m) \int _{t_{m-1}} ^{t_{m}} E_A(t_m-s) \hat{R}(s) \, ds. \end{aligned}$$
(2.24)

Treating the last integral as in (2.23), we have proved the following proposition.

Proposition 2.1

Let \(t\in J_n\). Then, with \(\hat{R}^{[j]} _n\) defined by (2.19), the following error representation formula holds:

$$\begin{aligned} \hat{e}(t)&= (-1)^{\rho -1}\int _{t_{n-1}} ^t E_A (t-s) k_n^{\rho -1} A ^{\rho -1} \hat{R}^{[\rho -1]}_n (s) \, ds + \sum _{j=1} ^{\rho -1} (-1)^{j-1} k_n^j A^ {j-1} \hat{R}^{[j]}_n(t) \nonumber \\&+ \sum _{m=1}^{n-1}\biggl ( (-1)^{\rho } \int _{J_m} E_A (t-s) k_m^\rho A ^\rho \hat{R}^{[\rho ]}_m (s) \, ds + E_A (t- t_{m} ) \sum _{j=1} ^\rho (-1)^{j-1} k_m^j A^ {j-1} \hat{R}^{[j]}_m(t_m ) \biggr ).\nonumber \\ \end{aligned}$$
(2.25)

The error representation formula (2.25) will be the starting point of our analysis. We mainly consider \(t=t_n\), which leads to a posteriori error control at the time nodes. We will treat Galerkin schemes and Runge–Kutta methods separately. We use (2.18) in the above error representation formula to obtain

$$\begin{aligned} |e(t_{n})|&= |\hat{e}(t_{n})|\le C_A \int _{J_n} \Big | k_n^{\rho -1} A ^{\rho -1}\, \hat{R}^{[\rho -1]} _n (s) \Big | \, ds + \left| \sum _{j=1} ^{\rho -1} (-1)^{j-1} k_n^j A^ {j-1} \hat{R}^{[j]} _n \left( t_n\right) \right| \\&+ C_A\sum _{m=1}^{n-1}\left( \int _{J_m} \Big | \frac{1}{ ({t_{n}}-s) }\,k_m^\rho A ^{\rho -1}\, \hat{R}^{[\rho ]} _m (s) \Big | \, ds + \left| \sum _{j=1} ^\rho (-1)^{j-1} k_m^j A^ {j-1} \hat{R}^{[j]} _m \left( t_{m} \right) \right| \right) . \end{aligned}$$

The terms in the above relation are treated differently. The expression (2.20) yields

$$\begin{aligned} \int _{J_n} \Big | k_n^{\rho -1} A ^{\rho -1}\, \hat{R}^{[\rho -1]} _n (s) \Big | \, ds \le \frac{k_{n}^{\rho }}{\rho !}\, \sup _{s \in J_{n}} \big | A ^{\rho -1} \hat{R}( s ) \big | , \end{aligned}$$

and

$$\begin{aligned} \sum _{m=1}^{n-1} \int _{J_m} \Big | \frac{1}{ ({t_{n}}-s) }\,k_m^\rho A ^{\rho -1}\, \hat{R}^{[\rho ]} _m (s) \Big |\, ds&\le \max _m \frac{k_{m}^{\rho }}{\rho !} \sup _{s \in J_{m}} \big | A ^{\rho -1} \hat{R}( s ) \big | \int _0 ^{t_{n-1}} \; \frac{1}{t_{n} -s} ds\, \\&\le \log \frac{t_{n}}{k_{n}}\ \max _{1\le m \le n-1 } \frac{k_{m}^{\rho }}{\rho !} \sup _{s \in J_{m}} \big | A ^{\rho -1} \hat{R}( s ) \big |. \\ \end{aligned}$$

Let

$$\begin{aligned} L_n:= \log \frac{t_n}{k_n} +1, \end{aligned}$$
(2.26)

then we have

$$\begin{aligned} |e(t_{n})| \le C_A L_{n} \ \max _{1\le m \le n } \frac{k_{m}^{\rho }}{\rho !} \Vert A^{\rho -1} \hat{R}\Vert _{L^\infty (J_m,H)} + \sum _{m=1}^{n} \Big | \sum _{j=1} ^\rho (-1)^{j-1} k_m^{j} A^ {j-1} \hat{R}^{[j]}_m(t_m )\Big | , \end{aligned}$$
(2.27)

where \(\Vert A^{\rho -1} \hat{R}\Vert _{L^\infty (J_m,H)} = \sup _{s \in J_{m}} \big | A ^{\rho -1} \hat{R}( s ) \big | \). We note that, at the expense of the logarithmic factor \(L_n\), we have here reduced the regularity requirement on \(\hat{R}\) from \(D(A^\rho )\) to \(D(A^{\rho -1})\), as compared with a more straightforward estimation in (2.25). We are now ready to derive the main estimates of this section.
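For instance, for \(\rho =1\) the bound (2.27) reads

$$\begin{aligned} |e(t_{n})| \le C_A L_{n} \max _{1\le m \le n } k_{m} \Vert \hat{R}\Vert _{L^\infty (J_m,H)} + \sum _{m=1}^{n} \Bigl | \int _{J_m} \hat{R}( s ) \, ds \Bigr | , \end{aligned}$$

so that only the residual itself and its integrals over the subintervals enter.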

2.3 Nodal estimates for Galerkin schemes

In the case of Galerkin schemes (continuous or discontinuous) the error estimates are direct consequences of (2.27). The main point here is that the terms involving \( \hat{R}^{[j]}_m(t_m )\) all vanish due to the orthogonality. In this case we have the following result.

Theorem 2.1

Let \(q\ge 2\) and \(\hat{R} \in D(A^{\rho -1})\) hold for some \(1 \le \rho \le q-1.\) Then, the error of the continuous Galerkin method of degree \(q\) and of the discontinuous Galerkin method dG\((q-1)\) satisfies

$$\begin{aligned} |e(t_n)|\le C_A L_n \max _{1\le m\le n} \frac{ k_m^{\rho }}{ \rho ! } \Vert A^{\rho -1} \hat{R}\Vert _{L^\infty (J_m,H)}, \end{aligned}$$
(2.28)

where \(C_A\) is the stability constant in (2.17) and \(L_n\) is the logarithmic factor given in (2.26).

Proof

Recall that both methods are written in the form

$$\begin{aligned} U^{\prime }(t) + P_{q-1}F(t,{\widetilde{\varPi }_{q}} U(t)) = 0 \quad \forall t \in J_n, \end{aligned}$$
(2.29)

where \({\widetilde{\varPi }_{q}}= I\) for the cG method, and for the dG method \( \widetilde{\varPi }_{q}= I_{q-1},\) where \(I_{q-1}\) is the interpolation operator at the Radau points. Notice that in both cases \(\varPi _{q-1}=P_{q-1}.\) In view of (2.3),

$$\begin{aligned} \hat{U}^{\prime }(t) + \hat{\varPi }_q F(t,\widetilde{\varPi }_{q} U(t)) =0 \quad \forall t\in J_n, \end{aligned}$$
(2.30)

with \(\hat{\varPi }_q = P_q.\) Therefore, in the case of the cG method \(\hat{R}(t) = R_{\hat{U}}(t)+R_{\widetilde{\varPi }_{q}}(t) + R_{\hat{\varPi }_q}(t) + R_f(t) \) where

$$\begin{aligned} R_{\hat{U}}=A(U-\hat{U}), \quad R_{\hat{\varPi }_q}\!=\!A(P_q -I) U, \quad R_{\widetilde{\varPi }_{q}}(t)\!=\!0, \quad R_f\!=\!f-P_qf.\qquad \quad \end{aligned}$$
(2.31)

Hence by (2.4) we have

$$\begin{aligned} \int _{J_n}\langle \hat{R}, v\rangle \, dt =0 \quad \ \forall v\in \mathcal{V }_{q-2}(J_n). \end{aligned}$$
(2.32)

In view of the definition (2.20) of \(\hat{R}^{[j]}_m\), we obtain, for \(1\le j\le \rho \),

$$\begin{aligned} \hat{R}^{[j]}_m(t_m)=0, \end{aligned}$$
(2.33)

so that the terms involving \(\hat{R}^{[j]}_m(t_m)\) in (2.27) vanish and (2.28) follows.

In the case of the dG method, the properties of Gauss–Radau quadrature imply

$$\begin{aligned} \int _{J_n} \langle A(I_{q-1} U - U),v \rangle ~dt = 0 \quad \forall v\in \mathcal{V }_{q-2}(J_n). \end{aligned}$$
(2.34)

Thus, since the only difference to the cG case in the expression for \(\hat{R}\) is that now \(R_{\widetilde{\varPi }_{q}} + R_{\hat{\varPi }_q} = A(I_{q-1} U - U)\), the orthogonality (2.32) still holds in this case as well. The proof is thus complete. \(\square \)

Remark 2.2

(Order of convergence) One notices that the highest possible order of the residual \(\hat{R}\) for dG is \(q\) in (2.28), whereas it is \(q+1\) for cG. Hence the highest order in (2.28) is \(2q\) for cG and \(2q-1\) for dG, as expected. The difference is due to the fact that in the dG case the residual \(\hat{R}\) contains an additional term of the form \(R_{\widetilde{\varPi }_{q}}= A(I_{q-1}U-U)\).

Note, however, that the full order is attained only if \(\hat{R} \in D(A^{q-2})\). This is usually not satisfied, since it requires unnatural compatibility conditions at the boundary when \(A\) is an elliptic operator with Dirichlet or Neumann boundary conditions and \(H=L_2(\Omega )\). In particular, in the case of Dirichlet conditions this would require that \(\widehat{U}\) vanishes on the boundary, which is not true in general when \(f\) does not vanish on the boundary. In this case one only has \(A^{-1}\widehat{R}\in H=L_2(\Omega )\).

We now turn to an a posteriori estimate (of order \(q+1\)) in terms of \(|A^{-1}\hat{R}(t)|\), which formally corresponds to the case \(\rho =0\) in the previous theorem.

Theorem 2.2

Let \(A^{-1}\hat{R} \in H\). Then, the error of the continuous Galerkin method of degree \(q\) and of the discontinuous Galerkin method dG\((q-1)\) satisfies

$$\begin{aligned} |e(t_n)|&\le C_A L_n \max _{1\le m\le n-1} \Vert A^{-1} \hat{R}\Vert _{L^\infty (J_m,H)} + 2\Vert A^{-1} \hat{R} \Vert _{L^\infty (J_n,H)}\nonumber \\&+k_n \Vert A^{-1} \hat{R}^{\prime } \Vert _{L^\infty (J_n,H)} \end{aligned}$$
(2.35)

where \(C_A\) is the stability constant in (2.17) and \(L_n\) is the logarithmic factor given in (2.26).

Proof

At \(t=t_n\) we split the integral in the error formula (2.14) into the integrals from \(0\) to \(t_{n-1}\) and from \(t_{n-1}\) to \(t_n\) and use partial integration in the latter integral:

$$\begin{aligned} e(t_n)=\widehat{e}(t_n)&= \int _0^{t_{n-1}} AE_A(t_n-s)\, A^{-1}\hat{R}(s)\, ds \\&+ A^{-1}\hat{R}(t_n) - E_A(k_n) A^{-1}\hat{R}(t_{n-1}) - \int _{J_n} E_A(t_n-s) A^{-1}\hat{R}^{\prime }(s)\,ds. \end{aligned}$$

The result then follows on using (2.17). \(\square \)

2.4 Nodal estimates for collocation methods

In this section we establish a posteriori estimates for the nodal error of RK collocation methods. We recall that the classical order \(p\) of an RK-C method satisfies \(q+1\le p \le 2q\); for \(p\ge q+2\) we may take \(1\le \rho \le r = p-q-1.\) The main difference to the case of Galerkin schemes is that the terms involving \(\hat{R}^{[j]}_m (t_{m} ) \) give rise to non-zero expressions involving the inhomogeneity \(f.\) In this case we choose \(\hat{\varPi } _q = \widehat{I}_q ,\) cf. [2, 3], where \(\widehat{I}_q \) is an extended interpolation operator defined on continuous functions \(v\) with the following two key properties

$$\begin{aligned} \widehat{I}_q v\in \mathcal{H }_q(J_n), \quad (\widehat{I}_q v)(t_{n,i})=v(t_{n,i}), \quad i=1,\ldots ,q. \end{aligned}$$
(2.36)

\(\widehat{I}_q\) interpolates \(v\) at one more point, either inside \(J_n\) or outside it, provided \(v\) is defined on an extended interval. This issue was discussed in detail in [3].

Now, as before we start from \(\hat{R}(t) = R_{\hat{U}}(t)+R_{\widetilde{\varPi }_{q}}(t) + R_{\hat{\varPi }_q}(t) + R_f(t) \) where

$$\begin{aligned} R_{\hat{U}}=A(U-\hat{U}), \quad R_{\hat{\varPi }_q}\!=\!A( \widehat{I}_q -I) U, \quad R_{\widetilde{\varPi }_{q}}(t)\!=\!0, \quad R_f\!=\!f-\widehat{I}_q f.\qquad \quad \end{aligned}$$
(2.37)

Therefore by the assumptions on \(\widehat{I}_q\) we have

$$\begin{aligned} \int _{J_n}\langle R_{\hat{U}} + R_{\hat{\varPi }_q}+ R_{\widetilde{\varPi }_{q}}(t), v\rangle \, dt =0 \quad \ \forall v\in \mathcal{H }_{{r-1}}(J_n). \end{aligned}$$
(2.38)

Concerning the remaining term \(R_f\), we introduce the notation

$$\begin{aligned} E^{[j]}_{f,n} = \int _{J_n} \frac{(t_n-\tau ) ^{j-1}}{(j-1)!}\, R_f(\tau ) \,d\tau , \quad 1\le j \le r. \end{aligned}$$
(2.39)

This is just the error of the interpolatory quadrature with nodes \(c_i\) and weights \(b_i\), applied to the function \((t_n-\tau )^{j-1}/(j-1)! \cdot f(\tau )\) over the interval \(J_n\),

$$\begin{aligned} E^{[j]}_{f,n} = \int _{J_n} \frac{(t_n-\tau ) ^{j-1}}{(j-1)!}\, f(\tau ) \,d\tau - k_n \sum _{i=1}^q b_i \, \frac{((1-c_i)k_n) ^{j-1}}{(j-1)!}\, f(t_{n-1}+c_i k_n), \end{aligned}$$

which is of optimal order \(O(k_n^{p+1})\) if \(f\) is \(p\)-times continuously differentiable. Then, in view of the definition (2.20) of \(\hat{R}^{[j]}_n\) and due to (2.38), we have

$$\begin{aligned} k_n^j \hat{R}^{[j]}_n (t_n) = E^{[j]}_{f,n}. \end{aligned}$$
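For example, for the two-stage Gauss method (\(q=2\), \(p=4\), \(c_{1,2}=\tfrac{1}{2}\mp \tfrac{\sqrt{3}}{6}\), \(b_1=b_2=\tfrac{1}{2}\)) only \(j=1\) occurs, and

$$\begin{aligned} E^{[1]}_{f,n} = \int _{J_n} f(\tau ) \,d\tau - \frac{k_n}{2}\, \bigl ( f(t_{n,1})+f(t_{n,2}) \bigr ) = O(k_n^{5}) \end{aligned}$$

for smooth \(f\), which is the error of the two-point Gauss quadrature on \(J_n\).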

With (2.27) we therefore obtain the following result. A similar result holds for perturbed collocation methods [12]; compare [3].

Theorem 2.3

Let the classical order \(p\) of a \(q\)-stage Runge–Kutta collocation method satisfy \(p\ge q+2\) and let \(\hat{R},f\in D(A^{\rho -1})\) for some \(1\le \rho \le r=p-q-1.\) Then the following a posteriori error estimate is valid at the nodes \(t_n\):

$$\begin{aligned} |e(t_n)| \le C_A L_n \max _{1\le m\le n} \frac{ k_m^{\rho }}{ \rho ! }\Vert A^{\rho -1} \hat{R}\Vert _{L^\infty (J_m,H)} + \sum _{m=1}^{n} \sum _{j=1} ^\rho \left| A^ {j-1} E^{[j]}_{f,m}\right| . \end{aligned}$$

The full classical order \(p\) is attained when \(\hat{R},f\in D(A^{r-1})\), which for \(r>1\) again imposes unnatural compatibility conditions. Theorem 2.2 holds also for the collocation method, with the same proof.

Remark 2.3

The estimate in [3] for RK collocation methods is similar. The first term on the right-hand side is the same, but the term involving the quadrature errors \(E^{[j]}_{f,m}\) differs from the corresponding term in [3], which is

$$\begin{aligned} C_A L_n \sum _{j=0}^{\rho -1} \max _{1\le m\le n} \Big (k_m^j \Vert A^{j-1}\big (f-\widehat{I}_{q+\rho -j}f\big )\Vert _{L^\infty (J_m,H)} \Big ). \end{aligned}$$
(2.40)

The auxiliary interpolation operators \(\widehat{I}_\ell \) are defined as follows: Let \(\hat{t}_{m,j} \in J_m\), \(j=1,\ldots ,\rho \), be pairwise distinct and different from the \(t_{m,i}\), \(i=0,\ldots ,q\). The operator \(\widehat{I}_\ell \) is an interpolation operator of order \(\ell \), with \(\ell =q+1,\ldots ,q+\rho ,\) defined on continuous functions \(v\) on \([0,T]\) with values in \(\mathcal{H }_\ell (J_m)\):

$$\begin{aligned} (\widehat{I}_\ell v)(\sigma )=v(\sigma ), \quad \sigma =t_{m,i},\hat{t}_{m,j}, \quad i=0,\ldots , q, \quad j=1,\ldots , \ell -q. \end{aligned}$$

Here, in contrast to [3] we have chosen not to include the non-homogeneous term in the argument involving the strong stability of \( E_A.\) For that reason our bound has one higher power of \(A.\) In both cases the required regularity of \(\hat{R}\) remains the same. Nevertheless, the second term in Theorem 2.3 can be controlled by the terms appearing in (2.40). To see why, notice that our assumptions imply

$$\begin{aligned} \int _{J_m} \, \widehat{I}_q v \, dt=\int _{J_m} \, v \, dt \, , \quad v\in \mathcal{H }_{p-1}(J_m). \end{aligned}$$

Then, with \(\widehat{I}_\ell \) as above we have,

$$\begin{aligned} \begin{array}{ll} \displaystyle E^{[j]}_{f,m} &{}= \displaystyle \int \limits _{J_m} \dfrac{(t_m-\tau ) ^{j-1}}{(j-1)!} (f-\widehat{I}_qf) (\tau ) \, d\tau \\ &{} = \displaystyle \int \limits _{J_m} \dfrac{(t_m-\tau ) ^{j-1}}{(j-1)!} (f-\widehat{I}_{p-j}f) ( \tau ) \, d\tau \\ &{}\quad + \displaystyle \int \limits _{J_m} \dfrac{(t_m-\tau ) ^{j-1}}{(j-1)!} (\widehat{I}_{p-j} f-\widehat{I}_qf) (\tau ) \, d\tau . \end{array} \end{aligned}$$

The last integral is zero and therefore,

$$\begin{aligned} \begin{array}{l} \big | A^{j-1}E^{[j]}_{f,m} \big | \le \dfrac{k _m ^{j}}{(j-1)!} \big \Vert A^{j-1}(f-\widehat{I}_{p-j}f )\big \Vert _{L^\infty (J_m,H)}. \end{array} \end{aligned}$$
(2.41)

3 Interior a posteriori error bounds

We prove the following main result, which yields full-order a posteriori error bounds in the interior of the domain without requiring any compatibility conditions on the boundary. By \(H^{m}(S)\) we denote the standard Sobolev space of order \(m\) defined on a domain \(S.\)

Theorem 3.1

Let \(A\) be the negative Laplacian on a bounded Lipschitz domain \(\Omega \subset \mathbb{R }^d\), equipped with Dirichlet boundary conditions. Let \(\omega \subset \widehat{\omega }\subset \Omega \) be Lipschitz subdomains such that the boundaries of the three domains have pairwise distances of at least \(\delta >0\).

Let \(q\ge 2\) and \(\hat{R}|_{\widehat{\omega }} \in H^{2\rho }(\widehat{\omega })\) for some \(1 \le \rho \le r=p-q-1\), where \(p\) is the classical order of the method. Then, the error of the continuous Galerkin method of degree \(q\) and of the discontinuous Galerkin method dG\((q-1)\) satisfies

$$\begin{aligned} \Vert e(t_n)\Vert _{L_2(\omega )}\le C_1 \sum _{m=1}^n k_m^{\rho } \int _{J_m}\Bigl ( \Vert \hat{R}(t)\Vert _{H^{2\rho }(\widehat{\omega })} + \Vert \hat{R}(t)\Vert _{H^{-2}(\Omega )} \Bigr )dt, \end{aligned}$$
(3.1)

where \(C_1\) depends only on \(\Omega \) and \(\delta \).

Theorem 3.2

In the situation of Theorem 3.1, the error of a \(q\)-stage Runge–Kutta collocation method satisfies

$$\begin{aligned} \nonumber \Vert e(t_n)\Vert _{L_2(\omega )}&\le C_1 \sum _{m=1}^n k_m^{\rho } \int _{J_m}\Bigl ( \Vert \hat{R}(t)\Vert _{H^{2\rho }(\widehat{\omega })} + \Vert \hat{R}(t)\Vert _{H^{-2}(\Omega )} \Bigr )dt \\&+\ C_2 \sum _{m=1}^n \sum _{j=1}^{\rho } \Bigl ( \Vert E_{f,m}^{[j]}\Vert _{H^{2(j-1)}(\widehat{\omega })} + \Vert E_{f,m}^{[j]} \Vert _{H^{-2}(\Omega )} \Bigr ), \end{aligned}$$
(3.2)

where \(E_{f,m}^{[j]}\) is the quadrature error defined in (2.39) and \(C_1,C_2\) depend only on \(\Omega \) and \(\delta \).

The interior nodal error bounds are of optimal order \(p\) when \(\widehat{R}\) is sufficiently regular in a neighbourhood of the subdomain \(\widehat{\omega }\), which is the case if the solution is sufficiently regular on \(\widehat{\omega }.\) In this case we expect that a higher regularity version of (2.12) yields

$$\begin{aligned} k_m^r \Vert \hat{R}(t)\Vert _{H^{2r}(\widehat{\omega })} = O(k_m^{p}), \qquad t\in J_m. \end{aligned}$$

It is important to note that the regularity away from \(\widehat{\omega }\) and the boundary behaviour on \(\partial \Omega \) play no role.

As the proof below shows, the dependence of \(C_1,C_2\) on the domain \(\Omega \) is only through the constants in Poincaré–Friedrichs inequalities. The dependence on \(\delta \) is such that \(C_1,C_2=O(\delta ^{-4(\rho +2)})\).

The result could straightforwardly be generalized to any second-order elliptic differential operator with smooth coefficients and appropriate essential boundary conditions.

For the proof we consider a finite chain of domains

$$\begin{aligned} \omega = \omega _0 \subset \omega _1 \subset \dots \subset \omega _{\ell -1}=\widehat{\omega }\subset \omega _{\ell } = \Omega , \end{aligned}$$

where \(\ell = 2\rho +4\) and the distance from \(\omega _j\) to the boundary of \(\omega _{j+1}\) is for all \(j\) bounded from below by a constant times \(\delta \). To these regions we associate smooth cut-off functions \(\chi _j\) on \(\Omega \) such that

$$\begin{aligned} \chi _j \equiv 1\quad \text{ in } \omega _j,\qquad \chi _j \equiv 0\quad \text{ outside } \omega _{j+1} \end{aligned}$$

for \(j=0,1,\ldots ,\ell -1\), and \(\chi _{\ell }\equiv 1\) on \(\Omega \). Viewed as multiplication operators, these functions have the following property with respect to the norm \(|\cdot |\) of \(H=L_2(\Omega )\):

$$\begin{aligned} |A^{-(j+1)/2}(A\chi _j - \chi _j A)v| \le \beta \, |A^{-j/2} \chi _{j+1}v|. \end{aligned}$$
(3.3)

For \(A=-\Delta \), this bound is a consequence of the fact that the commutator \(A\chi _j - \chi _j A\) is a first-order differential operator: \( -\Delta (\chi _j\varphi ) + \chi _j\Delta \varphi = -(\Delta \chi _j)\varphi -2 \nabla \chi _j\cdot \nabla \varphi , \) and from this we also note that \(\beta \) is proportional to \(\delta ^{-2}\).
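For instance, for \(j=0\) one obtains (3.3) by duality: for \(\psi \in C^\infty _0(\Omega )\), integration by parts gives

$$\begin{aligned} \langle (A\chi _0 - \chi _0 A)v, \psi \rangle = \int _\Omega v\, \bigl ( (\Delta \chi _0)\psi + 2\nabla \chi _0\cdot \nabla \psi \bigr )\, dx \le \bigl ( C_\Omega \Vert \Delta \chi _0\Vert _{\infty } + 2\Vert \nabla \chi _0\Vert _{\infty } \bigr )\, |\chi _1 v|\, |A^{1/2}\psi |, \end{aligned}$$

since \(\nabla \chi _0\) and \(\Delta \chi _0\) are supported in \(\omega _1\), where \(\chi _1\equiv 1\), and \(|\psi |\le C_\Omega |A^{1/2}\psi |\) by the Poincaré–Friedrichs inequality; dividing by \(|A^{1/2}\psi |\) and taking the supremum over \(\psi \) gives (3.3) for \(j=0\) with \(\beta \) of size \(O(\delta ^{-2})\).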

Lemma 3.1

If operators \(\chi _0,\ldots ,\chi _\ell \) satisfy (3.3) and \(\chi _\ell =\mathrm{id}\), then

$$\begin{aligned} |\chi _0 E_A(t) v|^2 \le |\chi _0v|^2 + \beta ^2 |A^{-1/2}\chi _1v|^2 +\cdots + \beta ^{2\ell } |A^{-\ell /2}\chi _\ell v|^2. \end{aligned}$$

Proof

We denote \(w(t)=E_A(t)v\) and \(B_j=A\chi _j - \chi _j A\). Since \(w(t)\) satisfies \(w^{\prime }+Aw=0\), \(w(0)=v\), we have

$$\begin{aligned} \chi _0 w^{\prime } + A \chi _0 w = B_0 w, \quad \chi _0w(0)=\chi _0v. \end{aligned}$$

The standard parabolic energy estimate yields

$$\begin{aligned} |\chi _0w(t)|^2 + \int _0^t |A^{1/2} \chi _0w(s)|^2\, ds \le |\chi _0v|^2 + \int _0^t | A^{-1/2} B_0w(s)|^2\, ds \end{aligned}$$

and hence, by (3.3),

$$\begin{aligned} |\chi _0w(t)|^2 \le |\chi _0v|^2 + \beta ^2 \int _0^t | \chi _1w(s)|^2\, ds. \end{aligned}$$

Since \(\chi _1w(t)\) solves \(\chi _1 w^{\prime } + A \chi _1 w = B_1w\), \(\chi _1w(0)=\chi _1v\), we obtain by the same argument

$$\begin{aligned} \int _0^t | \chi _1w(s)|^2\, ds \le |A^{-1/2}\chi _1v|^2 + \beta ^2 \int _0^t | A^{-1/2}\chi _2w(s)|^2\, ds. \end{aligned}$$

Continuing in this way, we have for \(j=1,\ldots , \ell -1\)

$$\begin{aligned} \int _0^t | A^{-(j-1)/2}\chi _jw(s)|^2\, ds \le |A^{-j/2}\chi _jv|^2 + \beta ^2 \int _0^t | A^{-j/2}\chi _{j+1}w(s)|^2\, ds. \end{aligned}$$

Since \(\chi _\ell =\mathrm{id}\), for \(j=\ell -1\) the integral in the last term satisfies

$$\begin{aligned} \int _0^t | A^{-(\ell -1)/2}w(s)|^2\, ds \le |A^{-\ell /2} v|^2. \end{aligned}$$

Concatenating the above estimates completes the proof. \(\square \)

Proof

(of Theorem 3.1) We work in the Hilbert space \(H=L_2(\Omega )\) with the norm \(|\cdot |=\Vert \cdot \Vert _{L_2(\Omega )}\). We begin by noting

$$\begin{aligned} \Vert e(t_n) \Vert _{L_2(\omega )} \le |\chi _0 e(t_n)| \end{aligned}$$

and \(e(t_n)=\widehat{e}(t_n)\). For Galerkin methods we obtain from (2.25) and the Galerkin orthogonality (2.33) that

$$\begin{aligned} |\chi _0 e(t_n)| \le \sum _{m=1}^{n} \int _{J_m} k_m^\rho \, \bigl | \chi _0 E_A(t_n-s) A^\rho \hat{R}_m^{[\rho ]}(s) \bigr | \, ds. \end{aligned}$$

By Lemma 3.1 with \(\ell =2\rho +4\) we have, for \(w=\hat{R}_m^{[\rho ]}(s)\),

$$\begin{aligned} \bigl | \chi _0 E_A(t_n-s) A^\rho w \bigr |^2 \le \bigl | \chi _0 A^\rho w \bigr |^2 + \beta ^2 \bigl | A^{-1/2} \chi _1 A^\rho w \bigr |^2 +\cdots + \beta ^{2\ell } \bigl | A^{-\ell /2} \chi _\ell A^\rho w \bigr |^2. \end{aligned}$$

We now show that we can estimate

$$\begin{aligned} \bigl | A^{-j/2} \chi _j A^\rho w \bigr | \le C \, \Vert w \Vert _{H^{2\rho -j}(\omega _{j+2})}. \end{aligned}$$

For this we use a duality argument:

$$\begin{aligned} \bigl | A^{-j/2} \chi _j A^\rho w \bigr | =\sup _{\varphi \in C^\infty _0(\Omega ),\,\varphi \ne 0} \frac{\langle \chi _j A^\rho w, \varphi \rangle }{|A^{j/2}\varphi |} = \sup _{\varphi \ne 0} \frac{\langle A^{\rho -j/2}\chi _{j+1} w, A^{j/2} \chi _j \varphi \rangle }{|A^{j/2}\varphi |}. \end{aligned}$$

Since the norm \(| A^{j/2} \cdot |\) is equivalent to the \(H^{j}(\Omega )\) Sobolev norm on \(C^\infty _0(\Omega )\), we have

$$\begin{aligned} \bigl | A^{j/2} \chi _j \varphi \bigr | \le C^{\prime } \bigl | A^{j/2} \varphi \bigr | \qquad \forall \, \varphi \in C^\infty _0(\Omega ). \end{aligned}$$

Hence,

$$\begin{aligned} \bigl | A^{-j/2} \chi _j A^\rho w \bigr | \le C^{\prime } \bigl | A^{\rho -j/2}\chi _{j+1} w | \le C^{\prime \prime } \, \Vert \chi _{j+1} w \Vert _{H^{2\rho -j}(\Omega )} \le C \, \Vert w \Vert _{H^{2\rho -j}(\omega _{j+2})}, \end{aligned}$$

which is the desired estimate. Combining the above estimates, we obtain

$$\begin{aligned} |\chi _0 e(t_n)|&\le C\sum _{m=1}^{n} k_m^{\rho } \int _{J_m} \Bigl ( \Vert \hat{R}_m^{[\rho ]}(s) \Vert _{H^{2\rho }(\omega _{2})} \!+\! \beta \Vert \hat{R}_m^{[\rho ]}(s) \Vert _{H^{2\rho -1}(\omega _{3})} \!+\! \cdots \Bigr . \\&\quad \Bigl . \!+\!\ \beta ^{\ell -3} \Vert \hat{R}_m^{[\rho ]}(s) \Vert _{H^{-1}(\omega _{\ell -1})}\!+\! (\beta ^{\ell -2}\!+\!\beta ^{\ell -1}\!+\!\beta ^\ell ) \Vert \hat{R}_m^{[\rho ]}(s) \Vert _{H^{-2}(\Omega )} \Bigr ) ds, \end{aligned}$$

which implies the error bound of Theorem 3.1 for the Galerkin methods. For the Runge–Kutta methods, there appear in addition the quadrature errors \(E_{f,m}^{[j]}\) of (2.39), which are treated in the same way. \(\square \)

Remark 3.1

While the results of Sect. 2 hold equally for an elliptic operator as well as for its finite element discretization (with \(A\) denoting the continuous and discrete operator, respectively), Theorem 3.1 is stated only for the spatially continuous, temporally discretized case. An analogous result would also hold for the full—space and time—discretization if the discrete operator \(A_h\) satisfied an estimate (3.3) with the operators given as \(\chi _{j,h}v_h= \Pi _h (\chi _jv_h)\) for \(v_h\in V_h\), where \(\Pi _h\) is a suitable projection to the finite element space \(V_h\). Though it may be expected that such an estimate would hold for time-independent quasi-uniform meshes with a constant \(\beta \) independent of the mesh size \(h\), a detailed analysis of this question is outside the scope of this paper, which is concerned only with the time discretization aspects.