River flow modelling using nonparametric functional data analysis

Time series and extreme value analyses are two statistical approaches usually applied to study hydrological data. Classical techniques, such as autoregressive integrated moving-average models (for mean flow prediction) and parametric generalised extreme value fits or nonparametric extreme value methods (for extreme value analysis), have usually been employed in this context. In this article, nonparametric functional data methods are used to perform mean monthly flow predictions and extreme value analysis, both of which are important for flood risk management. These are powerful tools that take advantage of both the functional nature of the data under consideration and the flexibility of nonparametric methods, providing more reliable results. Therefore, they can be useful to prevent damage caused by floods and to reduce the likelihood and/or the impact of floods in a specific location. The nonparametric functional approaches are applied to flow samples of two rivers in the United States: monthly mean flow is predicted, and flow quantiles in the extreme value framework are estimated, using the proposed methods. Results show that the nonparametric functional techniques work satisfactorily, generally outperforming classical parametric and nonparametric estimators in both settings.


Introduction
Prediction of future values is essential for the design of water systems, and control measures will be more effective if the process is reliable. Likewise, management and scheduling of areas exposed to flood risk rely heavily on tools for frequency analysis of hydrological extremes.
Numerous studies have been carried out on hydrological problems using statistical methods. Among them, time series prediction is topical in this field (Toth et al., 2000; Tamea et al., 2005; Wu et al., 2009). Research studies on time series also include linear models for forecasting river flows (see, e.g., Wang et al., 2009, and references therein). Among the several techniques to model time series, autoregressive integrated moving average (ARIMA) models described the data analysed in the present research well and were therefore employed to fit the hydrological time series studied. Moreover, with this choice, a comparison between ARIMA models and nonparametric functional methods similar to that performed in some widely cited works (Ferraty et al., 2005) can be carried out. Basically, ARIMA models are preferred for time series of short-memory type (the autocorrelation structure decreases quickly), although some hydrological processes are of long-memory type. Other possible alternatives not considered in the present research would be, for example, fractional Gaussian noise and broken line models (Koutsoyiannis, 2000).
Statistics of extremes (Coles, 2001) is also one of the most significant techniques in frequency analysis (see, e.g., Katz et al., 2002; Saf, 2009; Singh et al., 2005). Daily, monthly or annual maximum time series of river flow recordings are typically represented by the generalised extreme value (GEV) distribution.
ARIMA and GEV fitting are typical examples of parametric modelling. A different type of statistical model applied to hydrological data involves nonparametric curve estimation methods, which do not require restrictive assumptions on the distribution of the population of interest. Several papers have applied nonparametric estimation methods to hydrological time series to carry out predictions as well as to perform extreme value analysis (Lall et al., 1993; Guo et al., 1996; Sharma et al., 1997; Kim and Heo, 2002; Wang et al., 2009; Quintela-Del-Río, 2011). Further details regarding nonparametric techniques (including theoretical motivations, practical applications to several scientific fields and references) may be found in, for instance, the books of Ruppert et al. (2003) or Wasserman (2005).
Time series were recently analysed by nonparametric functional data analysis (NFDA) (Ferraty and Vieu, 2006; Ramsay and Silverman, 2005). NFDA works with data consisting of curves or multidimensional variables. Different procedures using these techniques have been applied to several complex real problems (Besse et al., 2000; Hall et al., 2001; Fernández De Castro et al., 2005; Castellano Méndez et al., 2009).
The present paper focuses on applying NFDA techniques in prediction problems and extreme value analysis in the setting of hydrology. The organisation of the paper is as follows. Section 2 presents the statistical methods used in this paper. These methods correspond to ARIMA models for time series prediction (Section 2.1), and GEV parametric estimators and nonparametric methods in extreme value analysis (Sections 2.2.1 and 2.2.2, respectively). Next, the new proposals using NFDA to study both problems (time series prediction and extreme value analysis) are presented (Section 2.3.1). In Section 3, these tools are applied to river flow data from two sites in the US. Finally, in Section 4, a general discussion of the results is included.

Time series analysis. ARIMA models
Let $\{Z_t\}_{t \in \mathbb{R}}$ be a stochastic process or time series observed until a time $T$. Usually, the process is observed at $N$ discretised times, and the observations are denoted by $\{Z_1, \ldots, Z_N\}$.
To predict a future value $Z_{N+s}$, the simplest way consists of taking into account one single past value. This is done by constructing a two-dimensional statistical sample of size $n = N - s$, by setting $X_i = Z_i$ and $Y_i = Z_{i+s}$, with $i = 1, \ldots, N - s$. Therefore, the problem is converted into a standard prediction problem of a response $Y$, given an explanatory variable $X$. This can be generalised by considering the following autoregressive process of order $p$:
$$Z_i = m(Z_{i-1}, \ldots, Z_{i-p}) + \varepsilon_i, \tag{1}$$
where $\varepsilon_i$ is the error process, assumed to be independent of the lagged values $Z_{i-1}, \ldots, Z_{i-p}$, and the aim is to estimate the function $m(\cdot)$.
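The construction of the prediction sample can be sketched in a few lines. The following Python fragment is an illustration only (the computations in this paper were carried out in R); the helper name `lagged_sample` and the toy series are assumptions introduced here, and the explanatory vector is taken to hold the p most recent values.

```python
def lagged_sample(z, s=1, p=1):
    """Build (X_i, Y_i) pairs from a series z (hypothetical helper).

    X_i holds the p most recent values ending at position i, and
    Y_i is the value observed s steps later.
    """
    pairs = []
    for i in range(p - 1, len(z) - s):
        x = tuple(z[i - p + 1 : i + 1])  # p past values ending at z[i]
        y = z[i + s]                     # value to be predicted
        pairs.append((x, y))
    return pairs


# Toy series (arbitrary numbers, not river data)
series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
one_lag = lagged_sample(series, s=1, p=1)   # N - s = 5 pairs
two_lags = lagged_sample(series, s=1, p=2)  # one pair fewer
```

With p = 1 this reproduces the two-dimensional sample of size N − s described above; larger p gives the autoregressive setting.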
A first approximation consists in assuming that $m(\cdot)$ belongs to a particular class of functions, only depending on a finite number of parameters to be estimated, such as the ARIMA($p$, $d$, $q$) models (Singh et al., 2005). If $d$ is a non-negative integer, then $\{Z_t\}$ is said to be an ARIMA($p$, $d$, $q$) process if $Y_t = (1 - B)^d Z_t$ is a causal ARMA($p$, $q$) process, where $B$ is the backward shift operator defined by $B^j Z_t = Z_{t-j}$, $j = 0, \pm 1, \pm 2, \ldots$ (Brockwell and Davis, 1991). Note that the process $\{Z_t,\ t = 0, \pm 1, \pm 2, \ldots\}$ is said to be an ARMA($p$, $q$) process if $\{Z_t\}$ is stationary and if, for every $t$,
$$Z_t - \phi_1 Z_{t-1} - \cdots - \phi_p Z_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$$
where $\{\varepsilon_t\}$ is a process of error terms, generally assumed to be uncorrelated random variables with mean 0 and variance $\sigma^2$. In an ARIMA($p$, $d$, $q$) model, $p$ is the order (number of time lags) of the autoregressive model, $d$ is the degree of differencing (the number of times the data have had past values subtracted), and $q$ is the order of the moving-average model. The good practical properties of ARIMA models have led to their regular use in hydrological problems. Some relevant papers on this topic are, for example, Montanari et al. (1997), Toth et al. (2000) or Tamea et al. (2005).
The prediction problem can also be tackled using nonparametric methods. To apply these methods, only some mild regularity conditions on the function $m(\cdot)$ have to be assumed. The "curse of dimensionality problem" (Wand and Jones, 1995, p. 90) is particularly troublesome in this nonparametric framework. It has to do with the selection of the number of past values to consider in the model. This is indeed an important question. The lower the number of past predictors, the less flexible the model is, but when the lag increases, a large number of observations are needed to obtain good estimates of the model parameters. This number increases exponentially as the dimension becomes larger.
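To make the role of the differencing order d concrete, the operator (1 − B)^d can be applied by repeated subtraction of consecutive values. The following Python sketch is illustrative only; the function name `difference` and the toy series are assumptions made here.

```python
def difference(z, d=1):
    """Apply the operator (1 - B)^d to the series z (hypothetical helper).

    Each pass replaces the series by its consecutive differences
    z[t] - z[t-1], so the result has len(z) - d values.
    """
    out = list(z)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# A quadratic trend becomes constant after two differences
toy = [1, 4, 9, 16, 25]
first = difference(toy, d=1)
second = difference(toy, d=2)
```

In this toy case the first difference still shows a linear trend, while the second difference is constant, which is the kind of behaviour the choice of d is meant to capture.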

The GEV distribution
Suppose $X_1, \ldots, X_n$ is a sequence of extreme values with a common distribution function $F$. In the context of the present paper, these variables can be the maximum river flows measured in a specific period of time (24 h, a month, a year, etc.). Classical parametric extreme value theory uses the idea that, under certain regularity conditions (Fisher and Tippett, 1928), the limit of the distribution function $F$ of the maximum is the GEV distribution. Its cumulative distribution function is:
$$F_\theta(x) = \begin{cases} \exp\left\{-\left[1 + \gamma (x - \mu)/\sigma\right]^{-1/\gamma}\right\}, & \gamma \neq 0, \\ \exp\left\{-\exp\left[-(x - \mu)/\sigma\right]\right\}, & \gamma = 0, \end{cases} \tag{2}$$
with $\theta = (\mu, \sigma, \gamma)$. Here, $\mu$ is the location parameter, $\sigma > 0$ is the scale parameter, and $\gamma$ is the shape parameter. Mean and standard deviation are obtained as functions of these parameters (see, e.g., Coles, 2001). The range of definition of the GEV distribution depends on $\gamma$. If $\gamma \neq 0$, $F_\theta(x)$ is defined for $x$ such that $1 + \gamma (x - \mu)/\sigma > 0$, while if $\gamma = 0$, it is defined for $-\infty < x < \infty$. Various values of the shape parameter yield the extreme value type I, II, and III distributions. Specifically, the three cases $\gamma = 0$, $\gamma > 0$, and $\gamma < 0$ correspond to the Gumbel, Fréchet, and "reversed" Weibull distributions, respectively. Using the random sample of extreme values, an estimator $\hat\theta$ of $\theta$ can be obtained. Then, substituting $F$ by $F_{\hat\theta}$, estimators of some important functions in this framework can be defined. For instance:
• The function providing the probabilities of exceedance. In the context of the present paper, it corresponds to the function that, for a river flow $c$, gives the probability of obtaining a flow larger than $c$ (per unit of time). It is defined as
$$1 - F_{\hat\theta}(c). \tag{3}$$
• The flow quantile, defined as the value of the flow that can be expected to be exceeded once during a $T$-period of time. For each value of $T$, it is given by
$$F_{\hat\theta}^{-1}\left(1 - \frac{1}{T}\right). \tag{4}$$
• The mean return period or recurrence interval of a particular river flow $c$, defined as an estimator of the interval of time between events of this flow. It can be expressed as the inverse of the probability that a flow $c$ will be exceeded in one period of time:
$$\frac{1}{1 - F_{\hat\theta}(c)}. \tag{5}$$
An application of these expressions is given in Section 3.2.
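The three functions above follow directly from the GEV distribution function. The following Python sketch, written with the standard library only, is an illustration under stated assumptions: the function names are hypothetical, the parameter values are arbitrary rather than fitted, and the sign convention for the shape parameter γ is the one used in this section.

```python
import math


def gev_cdf(x, mu, sigma, gamma):
    """GEV distribution function with location mu, scale sigma, shape gamma."""
    if gamma == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + gamma * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below it if gamma > 0, above it if gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))


def exceedance_probability(c, mu, sigma, gamma):
    """Probability of a flow larger than c in one period of time."""
    return 1.0 - gev_cdf(c, mu, sigma, gamma)


def flow_quantile(T, mu, sigma, gamma):
    """Flow expected to be exceeded once in T periods (quantile of order 1 - 1/T)."""
    y = -math.log(1.0 - 1.0 / T)
    if gamma == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-gamma) - 1.0) / gamma


def return_period(c, mu, sigma, gamma):
    """Mean recurrence interval of the flow c."""
    return 1.0 / exceedance_probability(c, mu, sigma, gamma)


# Arbitrary illustrative parameters (not estimates from any river)
mu, sigma, gamma = 10.0, 2.0, 0.2
c100 = flow_quantile(100.0, mu, sigma, gamma)
rt = return_period(c100, mu, sigma, gamma)  # recovers T = 100 up to rounding
```

The flow quantile and the return period are inverse to each other, which is the property exploited later when estimated quantiles are compared with known levels.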

Nonparametric estimators
The main advantage of working with nonparametric methods is that they are model-free, that is, no specific functional form is required for the parameters or curves to be estimated. Several nonparametric estimators for different functions of interest have been developed in the last decades. In this work, kernel estimators of the density function and the distribution function are used. Let $X$ be a continuous random variable, with density function $f$ and distribution function $F$. Given a random sample $X_1, \ldots, X_n$, each $X_i$ having the same distribution as $X$, the Parzen-Rosenblatt nonparametric kernel estimator (Parzen, 1962) of $f$ is defined by:
$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{6}$$
where $K$ is a kernel function (normally, $K$ is a density function with some regularity conditions) and $h = h(n) \in \mathbb{R}^+$ is the smoothing parameter (or bandwidth) that regulates the amount of smoothing to be used. From the relation between a density function and a distribution function, a nonparametric kernel estimator of the distribution function can be directly constructed:
$$\hat F_h(x) = \int_{-\infty}^{x} \hat f_h(t)\,dt = \frac{1}{n} \sum_{i=1}^{n} H\left(\frac{x - X_i}{h}\right), \tag{7}$$
where $H(x) = \int_{-\infty}^{x} K(t)\,dt$ is the distribution function of the kernel $K$. Using equation (7), nonparametric estimators of the probabilities of exceedance, the flow quantiles, and the recurrence intervals defined in (3), (4) and (5), respectively, can be obtained:
$$1 - \hat F_h(c), \tag{8}$$
$$\hat F_h^{-1}\left(1 - \frac{1}{T}\right), \tag{9}$$
and
$$\frac{1}{1 - \hat F_h(c)}. \tag{10}$$
An important first step to compute (8), (9) and (10) is the selection of the smoothing parameter $h$. Popular techniques to select the bandwidth are the modified cross-validation (Bowman et al., 1998; Quintela-Del-Río, 2011) and plug-in methods (Lall et al., 1993; Quintela-Del-Río, 2011). In the examples presented in this work, a cross-validation bandwidth selection method is used.
In an extreme value framework, it can be of interest to estimate the flow quantiles or the return periods for extremely large events. In a hydrological context, Lall et al. (1993) found that the previous nonparametric estimators can suffer from boundary problems. Some authors have addressed extrapolation issues using nonparametric estimators like those given in (9) or (10). They basically focused on studying the influence of the kernel function and the bandwidth parameter on the final results. Regarding the kernel, while Guo et al. (1996) proposed to use a Gumbel kernel and Lall et al. (1993) discussed the use of a variable kernel distribution function estimator to tackle this problem, Adamowski and Feluch (1990) tested Gaussian, Gumbel and Epanechnikov kernels in flood frequency analysis and found that the choice of the kernel is not important, and that the shape of the kernel does not affect extrapolation accuracy. As for the smoothing parameter, the use of variable or local bandwidths to address the extrapolation problem was discussed in Adamowski (1989) or Guo et al. (1996). Note that the variance of the (parametric or nonparametric) estimators can increase significantly when the interest is to estimate extremely large flow quantiles. Therefore, in that case, the results obtained should be considered carefully. In the present paper, the methods will always be applied for values inside the range of the observed data.
Nonparametric kernel quantile function estimators based on smoothing the empirical quantile function are proposed and studied by Moon and Lall (1994) and Apipattanavis et al. (2010). They follow similar ideas, but while in Moon and Lall (1994) the Gasser-Müller estimator (Gasser and Müller, 1984) with a higher order kernel is used, in Apipattanavis et al. (2010) the smoothing process is carried out employing the local polynomial estimator (Fan and Gijbels, 1996) with a local bandwidth.
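A minimal Python sketch of the kernel distribution estimator and of the quantities derived from it may help to fix ideas. It is an illustration under assumptions made here: a Gaussian kernel is used (so that H is the standard normal distribution function), the bandwidth is fixed by hand rather than chosen by cross-validation, the quantile is obtained by bisection, and the sample values are invented.

```python
import math


def kernel_cdf(x, sample, h):
    """Kernel estimator of F(x): average of H((x - X_i) / h), Gaussian kernel assumed."""
    total = 0.0
    for xi in sample:
        total += 0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0))))
    return total / len(sample)


def kernel_quantile(T, sample, h, tol=1e-8):
    """Flow quantile of order 1 - 1/T, found by bisection on the estimated cdf."""
    p = 1.0 - 1.0 / T
    lo = min(sample) - 10.0 * h
    hi = max(sample) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_return_period(c, sample, h):
    """Recurrence interval of the flow c: inverse of the exceedance probability."""
    return 1.0 / (1.0 - kernel_cdf(c, sample, h))


# Invented sample of maxima and a hand-picked bandwidth
flows = [12.0, 15.0, 9.0, 30.0, 22.0, 18.0, 11.0, 45.0]
h = 3.0
q10 = kernel_quantile(10.0, flows, h)
rt10 = kernel_return_period(q10, flows, h)
```

Bisection is used here only because the estimated distribution function is continuous and increasing; any root-finding method would serve.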

Functional data. NFDA techniques
Let $\{(\chi_i, Y_i),\ i = 1, \ldots, n\}$ be a sample of $n$ random pairs, each distributed as $(\mathcal{X}, Y)$, where the variable $\mathcal{X}$ is of functional nature (a curve), and $Y$ is scalar. Formally, $\mathcal{X}$ is a random variable valued in some semi-metric functional space $E$, and $d(\cdot, \cdot)$ denotes the associated semi-metric, according to the definition (Ferraty and Vieu, 2006):
$$d(\chi, \chi) = 0 \quad \text{and} \quad d(\chi_1, \chi_2) \leq d(\chi_1, \chi_3) + d(\chi_3, \chi_2), \quad \text{for all } \chi, \chi_1, \chi_2, \chi_3 \in E.$$
The conditional cumulative distribution of $Y$ given $\mathcal{X}$ is defined for any $y \in \mathbb{R}$ and any $\chi \in E$ by:
$$F(y \mid \chi) = P(Y \leq y \mid \mathcal{X} = \chi). \tag{11}$$
A functional variable can be considered a generalisation of a multidimensional variable. Indeed, suppose that the variable $\chi$ is $p$-dimensional, with $p$ an integer (for example, $p = 12$ for the monthly mean flow in the twelve months of a year). In this case, the functional space would be $E = \mathbb{R}^p$ and the semi-metric could be the classical Euclidean distance or some equivalent measure (Ramsay and Silverman, 2005).
Both parametric and nonparametric methods can be used in functional data applications. The monograph of Ferraty and Vieu (2006) provides a benchmark of nonparametric curve estimation for functional data. As shown in this book, the conditional distribution $F(\cdot \mid \chi)$ given in (11) can be nonparametrically estimated by:
$$\hat F_n(y \mid \chi) = \frac{\sum_{i=1}^{n} K\left(d(\chi, \chi_i)/h\right) H\left((y - Y_i)/g\right)}{\sum_{i=1}^{n} K\left(d(\chi, \chi_i)/h\right)}, \tag{12}$$
where $K$ is a kernel function and $H$ is defined as the distribution function of another kernel density function $K_0$, that is, $H(x) = \int_{-\infty}^{x} K_0(u)\,du$. Parameters $g$ and $h$ are smoothing parameters or bandwidths (they could take the same value).
Expression (12) is a direct extension of the nonparametric estimator of a conditional distribution function $F(y \mid X = x)$ for real random variables $(X, Y)$ (Hall et al., 1999). The main difference between functional and non-functional estimators lies in the use of a semi-metric $d(\chi, \chi_i)$ instead of the Euclidean distance $\|\chi - \chi_i\|$. Several types of kernel functions and semi-metrics can be considered (see Ferraty and Vieu, 2006, Sections 3.2-3.4), depending, essentially, on the data at hand. Theoretical optimality properties of the estimator (12) can be found in Quintela-Del-Río (2008).
An important advantage of NFDA techniques is that the framework model reduces to a bivariate setting and, therefore, the curse of dimensionality problem is basically avoided. Additionally, the boundary problems of nonparametric estimators, previously described, can be partially avoided in functional data estimation. This requires a proper choice of the semi-metric (Ferraty and Vieu, 2006).
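The kernel estimator of the conditional distribution function can be sketched as follows. This Python fragment is a simplified illustration, not the software used in this paper: the semi-metric is taken to be the Euclidean distance between discretised curves, K is a quadratic kernel supported on [0, 1], H is the standard normal distribution function, and all names and numbers are assumptions introduced here.

```python
import math


def l2_semimetric(a, b):
    """Euclidean distance between two discretised curves (assumed semi-metric)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def quadratic_kernel(u):
    """Asymmetric quadratic kernel supported on [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def normal_cdf(u):
    """Distribution function H of a Gaussian kernel K_0."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def conditional_cdf(y, chi, curves, responses, h, g, d=l2_semimetric):
    """Kernel estimate of P(Y <= y | X = chi) from pairs (curves[i], responses[i])."""
    weights = [quadratic_kernel(d(chi, c) / h) for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no sample curve lies within bandwidth h of chi")
    smoothed = sum(w * normal_cdf((y - yi) / g) for w, yi in zip(weights, responses))
    return smoothed / total


# Invented example: two curves close to chi and one far away
curves = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 6.0, 7.0]]
responses = [10.0, 12.0, 50.0]
chi = [1.0, 2.0, 3.0]
value = conditional_cdf(11.0, chi, curves, responses, h=1.0, g=1.0)
```

Only curves within semi-metric distance h of χ receive positive weight, so the distant third curve and its large response do not affect the estimate in this example.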

NFDA applied to time series analysis
As is well known, ARIMA models are constrained by their particular structure and by the number of past values used in the statistical model for prediction purposes. NFDA methods overcome these two restrictions, because of the nonparametric nature of the approaches and because the observed seasonal time series is divided into a sample of curves. In Section 3.1, NFDA methods are applied to predict monthly mean flows in practical situations, and the performance of these approaches is compared with that obtained when ARIMA models are employed.
To analyse a monthly mean series as a set of functional data, the original time series is converted into annual curves. Note that if there were some months in which the corresponding measures were not available, the curves would not have the same number of components (this is known as an unbalanced data setting), and more complex specific preprocessing would be required (see Section 3.6 of Ferraty and Vieu, 2006). Let $\{Z_k\}_{k=1}^{N}$ be the complete time series. For $i = 1, \ldots, n$, the annual curves, $\chi_i = (\chi_i(1), \ldots, \chi_i(12))$, are constructed, where
$$\chi_i(t) = Z_{12 (i-1) + t}, \quad t = 1, \ldots, 12, \tag{13}$$
corresponds to the monthly mean flows of the $i$th year. Each annual curve is considered as a continuous path (i.e. $\chi_i = \{Z_{12 (i-1) + t},\ t \in [0, 12)\}$), but observed only at 12 discretised points. Thus, the time series consists of a sample of $n$ dependent functional data $\chi_1, \ldots, \chi_n$.
In this way, much information from the past of the time series can be taken into account, while still using a single continuous object (exactly one year) for the past. For more insight on this issue, suppose, for instance, that the time series could be measured $p$ times each year with $p > 12$. In this case, the functional data analysis will consider the whole continuous past year and the same asymptotic behaviour remains, independent of $p$.
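Cutting the series into annual curves amounts to a reshape of the data. A minimal Python sketch, with a hypothetical helper name and a toy series standing in for the flow data, could read:

```python
def to_annual_curves(z, period=12):
    """Split a complete monthly series into consecutive curves of `period` values.

    With 0-based lists, curves[i - 1][t - 1] holds the value Z_{12(i-1)+t}.
    """
    if len(z) % period != 0:
        raise ValueError("unbalanced data: the series does not contain whole years")
    return [z[period * i : period * (i + 1)] for i in range(len(z) // period)]


# Toy series of three complete years: Z_k = k for k = 1, ..., 36
z = list(range(1, 37))
curves = to_annual_curves(z)
```

A series of 792 monthly values treated this way would yield 66 annual curves; incomplete years are rejected here rather than preprocessed.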
In order to predict the monthly mean flow in the year $n + 1$, the following process was carried out. For $i = 1, \ldots, n - 1$ and for any fixed $\delta$ in $\{1, \ldots, 12\}$, take $Y_i(\delta) = \chi_{i+1}(\delta)$, i.e., $Y_i(\delta)$ denotes the monthly flow in the month $\delta$ and the year $i + 1$. Thus, a sample of $n - 1$ pairs $\{(\chi_i, Y_i(\delta))\}_{i=1}^{n-1}$, with $Y_i(\delta)$ a real variable and $\chi_i$ a functional one, is available. According to Section 2.3, a predictor of $Y_n(\delta)$, knowing $\chi_n$, can be achieved by estimating the median of the conditional distribution:
$$\hat Y_n(\delta) = \hat F_n^{-1}(0.5 \mid \chi_n), \tag{14}$$
where $\hat F_n(\cdot \mid \chi_n)$ is the estimated distribution of $Y(\delta)$ given $\chi_n$. Repeating this step for $\delta = 1, \ldots, 12$, the mean values of the flow for the $(n + 1)$th year can be predicted.
In the functional data context of this paper, another approximation consists in considering a regression model like (1) and using a nonparametric kernel functional method to estimate the regression function $m(\cdot)$. Considering the sample data of functional covariates and a scalar response, $\{\chi_i, Y_i(\delta)\}_{i=1}^{n-1}$, the nonparametric functional estimator (Ferraty and Vieu, 2006) has the expression:
$$\hat m(\chi) = \frac{\sum_{i=1}^{n-1} Y_i(\delta)\, K\left(d(\chi, \chi_i)/h\right)}{\sum_{i=1}^{n-1} K\left(d(\chi, \chi_i)/h\right)}. \tag{15}$$
Equation (15) constitutes a functional alternative, based on regression techniques, to the approach based on median estimation described above. Using (15), the flows of the $(n + 1)$th year can be predicted by calculating
$$\hat Y_n(\delta) = \hat m(\chi_n), \quad \delta = 1, \ldots, 12. \tag{16}$$
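The regression-based predictor can be sketched as a kernel-weighted average of the values observed one year after each past curve. The Python fragment below is illustrative only: the Euclidean semi-metric, the quadratic kernel and the toy curves are assumptions made here, not the choices of the application section.

```python
import math


def l2_semimetric(a, b):
    """Euclidean distance between two discretised curves (assumed semi-metric)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def quadratic_kernel(u):
    """Asymmetric quadratic kernel supported on [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def regression_predict(curves, delta, h):
    """Predict month `delta` (1-based) of the year following curves[-1].

    Each past curve chi_i is paired with Y_i(delta), the value of month
    delta in year i + 1, and weighted by its closeness to the last curve.
    """
    chi_n = curves[-1]
    num = 0.0
    den = 0.0
    for i in range(len(curves) - 1):
        w = quadratic_kernel(l2_semimetric(chi_n, curves[i]) / h)
        num += w * curves[i + 1][delta - 1]
        den += w
    if den == 0.0:
        raise ValueError("no past curve lies within bandwidth h of the last curve")
    return num / den


# Toy curves with two 'months', alternating between two regimes A and B
A, B = [1.0, 2.0], [3.0, 5.0]
history = [A, B, A, B, A]
pred = [regression_predict(history, delta, h=1.0) for delta in (1, 2)]
```

In this toy example the last curve equals A, only the past A-curves receive weight, and each of them was followed by B, so the prediction reproduces B.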

NFDA applied to extreme value analysis
Denote by $t_\alpha$ the $\alpha$-order quantile of the distribution of $Y$ given a particular value of $\chi$. From the conditional distribution function, the $\alpha$-order quantile is defined as:
$$t_\alpha = \inf\{y \in \mathbb{R} : F(y \mid \chi) \geq \alpha\} = F^{-1}(\alpha \mid \chi). \tag{17}$$
Using the estimator given in (12), a nonparametric estimator of $t_\alpha$ in (17) is readily obtained by
$$\hat t_\alpha = \hat F_n^{-1}(\alpha \mid \chi). \tag{18}$$
Several asymptotic properties of this estimator are shown in Ferraty et al. (2005). Expression (18) can be immediately used as an estimator of the flow quantiles (4). Section 3.2 presents an application of this approximation using a time series of a river in the U.S.
The problem of extreme quantile estimation using functional data has also been addressed in Gardes et al. (2010), where nonparametric estimators of quantiles from heavy-tailed distributions, when functional covariate information is available, are studied.
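The inversion of the estimated conditional distribution function can be carried out numerically. The following Python sketch uses bisection and is an illustration under assumptions introduced here (Euclidean semi-metric, quadratic kernel K, Gaussian H, invented data); taking α = 0.5 gives a conditional median.

```python
import math


def l2_semimetric(a, b):
    """Euclidean distance between two discretised curves (assumed semi-metric)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def quadratic_kernel(u):
    """Asymmetric quadratic kernel supported on [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def conditional_cdf(y, chi, curves, responses, h, g):
    """Kernel estimate of the conditional distribution function at y."""
    weights = [quadratic_kernel(l2_semimetric(chi, c) / h) for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no sample curve lies within bandwidth h of chi")
    acc = 0.0
    for w, yi in zip(weights, responses):
        acc += w * 0.5 * (1.0 + math.erf((y - yi) / (g * math.sqrt(2.0))))
    return acc / total


def conditional_quantile(alpha, chi, curves, responses, h, g, tol=1e-8):
    """alpha-order conditional quantile, by bisection on the estimated cdf."""
    lo = min(responses) - 10.0 * g
    hi = max(responses) + 10.0 * g
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, chi, curves, responses, h, g) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Invented data: two curves near chi, one far away
curves = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
responses = [10.0, 20.0, 99.0]
chi = [0.0, 0.0]
median = conditional_quantile(0.5, chi, curves, responses, h=1.0, g=0.5)
q90 = conditional_quantile(0.9, chi, curves, responses, h=1.0, g=0.5)
```

The search interval extends ten bandwidths g beyond the observed responses, which is ample for a Gaussian H.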

Hydrological data
In this section, the functional nonparametric techniques are applied to two time series of river flow (measured in cubic metres per second, m³/s) in the U.S., which were downloaded from the National Water Information System (NWIS) of the USA, http://waterdata.usgs.gov. The free statistical software R (R Development Core Team, 2015) was employed to implement the different procedures. Specific packages used in this process are cited below.
Firstly, flow data of Salt River near Roosevelt, AZ, were selected. The annual peak flow data for this river were considered by Katz et al. (2002), who used a GEV distribution. A study is also available in Anderson and Meerschaert (1998), who found that the monthly mean flow is quite seasonal and possesses a heavy-tailed distribution. These data have also been used in nonparametric studies (Quintela-Del-Río, 2011). In the present paper, Salt River hydrological data are employed to examine the approaches on flow prediction and extreme value analysis. Additionally, a monthly mean flow time series of Christina River at Coochs Bridge, DE, was also considered (Senior and Koerkle, 2003; Celebioglu, 2006). These data are only used in the time series prediction application, not to perform extreme value analysis. Lower flow values, compared with those of Salt River, are obtained here (see Figures 2 and 4).
These two rivers were selected because they belong to two different climate areas with disparate temperatures and significant differences in rainfall throughout the year (see Figure 1 for a location map). Christina River at Coochs Bridge, Delaware (US), is influenced by an Atlantic climate, with high humidity and stable precipitation. The average annual temperature in this location is about 13 °C, and the average annual precipitation is around 1168 mm. Salt River near Roosevelt, Arizona, belongs to a Continental area, with a high average annual temperature (over 21 °C) and an average annual precipitation of around 635 mm. Thus, the performance of the NFDA techniques can be compared in different scenarios.

Monthly mean flow prediction
Monthly mean flow data of both rivers, from January 1944 to December 2009, are considered. The number of observations in this time interval is 792. In both cases, no missing values appear, and the quality of the records is guaranteed by the information on the web page of the NWIS.
Firstly, a descriptive statistical analysis of both time series is performed. Table 1 presents the most usual descriptive statistics for the data of the two rivers. In both cases, high values for the kurtosis and the skewness (to the right), and the presence of maximum values far away from the rest of the data, in accordance with a heavy-tailed distribution, can be observed.
The mean monthly time series, which does not fit a normal distribution, can be normalised using a log-transformation function in order to remove the periodicity of the original series (Wang et al., 2009; Keskin et al., 2006). In Figure 2, Salt River data before and after the logarithmic transformation are shown. Figure 3 presents the estimated density functions computed with equation (6) using these data. In Figure 4, plots similar to those in Figure 2, but for Christina River, are displayed.
In Figures 5 and 6, the autocorrelation functions for the data of both rivers before and after the logarithmic transformation, respectively, are shown. The plots present different dependence structures, and suggest that ARIMA modelling could be a possible approximation for prediction purposes.
To perform a functional analysis of the series and following Section 2.3.1, the original time series are converted into annual curves. In this case, there are no missing data and measures for all the months are available. Therefore, the number of annual curves is 66. To validate the performance of the approaches, the values in the 66th year (2009) are predicted using the values from the 65 previous years, and these predictions are compared with the real values in that year. To apply the nonparametric functional methods, two bandwidths have to be selected. To do this, in a first step, considering the first 64 years, the 65th is used as a validation step. As explained in Section 2.3.1, given the sample $\{\chi_i, Y_i(\delta)\}_{i=1}^{64}$, a predictor of $Y_{64}(\delta)$, knowing $\chi_{64}$, can be achieved using equation (14). Repeating this step for $\delta = 1, \ldots, 12$, the mean values of the flow for the 65th year can be predicted. The NFDA estimators are applied using kernels based on the Epanechnikov density, $K(u) = \frac{3}{4}(1 - u^2)$ for $|u| \leq 1$ and $K(u) = 0$ otherwise. The bandwidths $h$ and $g$ are selected by minimising the prediction error over the 65th year, that is, $\sum_{\delta=1}^{12} (\hat Y_{64}(\delta) - Y_{64}(\delta))^2$, and the FPCA (Functional Principal Components Analysis) semi-metric is used (for more details, see Ramsay and Silverman, 2005).
Next, in a second step, given $\{\chi_i, Y_i(\delta)\}_{i=1}^{64}$ and the previously selected parameters $h$ and $g$, $\hat F_n(\cdot \mid \chi_{65})$ is estimated and a predictor of $Y_{65}(\delta)$ for $\delta = 1, \ldots, 12$ is obtained, using the corresponding estimator of the median of the conditional distribution $F(\cdot \mid \chi_{65})$ given in (14). Additionally, the nonparametric functional estimator of the mean function (15), based on regression techniques, was also applied. In this case, the monthly mean flows of the 66th year were predicted using equation (16) for $n = 65$. The software for computing the NFDA, programmed in R, can be freely obtained at http://www.math.univ-toulouse.fr/staph/npfda/.
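The two-step scheme (select the smoothing parameter on a validation year, then predict the final year) can be sketched in Python. The fragment below is a deliberately simplified illustration: it uses the regression-type predictor with a single bandwidth h, a Euclidean semi-metric and a quadratic kernel, whereas the analysis in this section uses two bandwidths and an FPCA semi-metric; the function names, the grid and the toy curves are assumptions made here.

```python
import math


def l2_semimetric(a, b):
    """Euclidean distance between two discretised curves (assumed semi-metric)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def quadratic_kernel(u):
    """Asymmetric quadratic kernel supported on [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def predict_next_year(train, h):
    """Predict the whole curve following train[-1]; None if no curve gets weight."""
    last = train[-1]
    weights = [quadratic_kernel(l2_semimetric(last, c) / h) for c in train[:-1]]
    total = sum(weights)
    if total == 0.0:
        return None
    followers = train[1:]  # followers[i] is the curve observed after train[i]
    return [
        sum(w * nxt[t] for w, nxt in zip(weights, followers)) / total
        for t in range(len(last))
    ]


def select_bandwidth(curves, grid):
    """Choose h minimising the squared prediction error on the last curve."""
    train, target = curves[:-1], curves[-1]
    best_h, best_err = None, math.inf
    for h in grid:
        pred = predict_next_year(train, h)
        if pred is None:
            continue
        err = sum((p - y) ** 2 for p, y in zip(pred, target))
        if err < best_err:
            best_h, best_err = h, err
    return best_h, best_err


# Toy curves alternating between two regimes; the last one is the validation year
A, B = [1.0, 2.0], [3.0, 5.0]
curves = [A, B, A, B, A, B]
h_star, err_star = select_bandwidth(curves, grid=[1.0, 10.0])
```

Once the bandwidth has been chosen, the final year would be predicted by calling `predict_next_year` on all curves but the last one to be forecast, with `h_star` held fixed.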
A parametric ARIMA model is also fitted to the time series, by means of the package forecast of the software R. In this package, automatic methods to select the order of the model and to estimate the corresponding parameters are implemented. In this case, an ARIMA(1, 0, 2) is fitted for Salt River data and an ARIMA(4, 0, 4) for Christina River data.
Figure 7 shows, for Salt River data, the predicted values for the 66th year (dashed line) together with the real values in the 66th year (solid line). All the data considered are the natural logarithm of the real values. Figure 8 is the equivalent plot for the Christina River data set. In each case, the top panel corresponds to the functional modelling using the predictions based on regression (equation (16)), the middle panel shows the functional modelling using the predictions based on the median (equation (14)), and the bottom panel presents the ARIMA approach.
A numerical comparison for obtaining the best predictor is made using the MSE criterion, that is,
$$\mathrm{MSE} = \frac{1}{12} \sum_{\delta=1}^{12} \left(\hat Y_{65}(\delta) - Y_{65}(\delta)\right)^2.$$
The MSE values for both rivers are given in Table 2. In the first row, the results using NFDA based on regression (equation (16)) are presented. The results obtained applying NFDA methods based on the median (equation (14)) are shown in the second row. Finally, in the third row, MSE values using ARIMA models are given.
As observed in Figures 7 and 8, NFDA prediction methods provide better fits to the real series. The ARIMA predictions are, basically, the mean values. Moreover, it can be observed in Table 2 that the MSE values are lower using the NFDA techniques, and that, in the two time series, the best results are obtained using the median as the predicted value.

Extreme value analysis
In this section, NFDA techniques are applied for extreme value analysis. The equations described in Section 2.3.2 are used, and the results obtained are compared with those using the parametric GEV and nonparametric estimators presented in Sections 2.2.1 and 2.2.2, respectively. In this case, only Salt River data are available. The maximum daily flow data of this river, from 01/01/1987 to 31/12/2009, are used to calculate flow quantile estimates as indicated in (4). In Figure 9, a boxplot computed with these data is presented. A very asymmetric and heavy-tailed data distribution can be observed, with many extreme values corresponding to high quantiles of the variable. Similar information can be deduced from Table 3, where the most usual descriptive statistics for the maximum daily flow variable are shown.
The considered values from years 1987-2008 (inclusive) are used in the estimation process, and the corresponding estimates are checked with the real values in the year 2009.
In the classical parametric GEV estimation (Section 2.2.1), the data need to be independent, or, at least, the dependence has to decrease suitably fast with increasing time separation (Smith, 1989). However, nonparametric estimators (both of functional and non-functional type) can be correctly applied in this field and have good theoretical properties, even when the assumption of independence is not strictly fulfilled (Youndjé and Vieu, 2006; Quintela-Del-Río, 2008).
The first step to apply NFDA (now, to calculate flow quantiles) is to construct functional data from a sample of daily maxima, in the same way as in Section 3.1. Because daily data are available, functional data composed of the corresponding values of each month are constructed. Unlike the situation in Section 3.1, now the number of components changes from one functional variable $\chi$ to another (unbalanced data setting). This happens because the months do not have the same number of days. All months are considered to have 31 measures, interpolating linearly the two closest values for each value that originally does not exist. Therefore, each functional observation consists of 31 data.
Here, the construction of the functional data is analogous to (13):
$$\chi_i(t) = Z_{31 (i-1) + t}, \quad t = 1, \ldots, 31,$$
where $\{Z_k\}_{k=1}^{n}$ denotes the complete time series of daily maxima, and $\chi_i = (\chi_i(1), \ldots, \chi_i(31))$ the daily data of the $i$th month. Now, our focus is on the estimation of the conditional distribution function of the variable of each daily maximum, conditioned on the values in the previous month.
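One simple way to bring every month to 31 values is sketched below in Python. It should be read as an assumption-laden illustration: instead of filling only the missing days, it resamples the whole month at 31 equally spaced positions by linear interpolation between the two closest observed values, which is a related but not identical scheme; the helper name is hypothetical.

```python
def resample_month(values, m=31):
    """Linearly resample one month of daily values to m equally spaced points.

    Months that already have m values are returned unchanged. The month is
    assumed to contain at least two values.
    """
    n = len(values)
    if n == m:
        return list(values)
    out = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)  # fractional position on the original grid
        j = int(pos)                 # index of the closest value on the left
        frac = pos - j
        if j >= n - 1:
            out.append(float(values[-1]))
        else:
            out.append(values[j] * (1.0 - frac) + values[j + 1] * frac)
    return out


# Toy 30-day and 28-day months with linearly increasing values
april = resample_month(list(range(30)))
february = resample_month(list(range(28)))
```

Both resampled months have 31 components and keep the first and last observed values, so curves from different months become directly comparable.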
The comparison between the classical parametric GEV methods, the nonparametric techniques and the NFDA approaches is carried out in the following steps. A set of levels $c_i$, $i = 1, \ldots, 20$, is selected. Specifically, $c_1$ is chosen as the median of the data (up to the year 2008), and $c_{20}$ as the quantile of order 0.95. The sequence $c_i$ consists of 20 equally spaced points. Using the true measures of the last year, 2009, the number of days in which the values $c_i$ were exceeded can be computed. Thus, the recurrence intervals in expression (5) can be approximated using the corresponding empirical distribution function. These estimators are:
$$\widehat{RT}(c_i) = \frac{1}{1 - F_{\mathrm{emp}}(c_i)}, \quad i = 1, \ldots, 20, \tag{21}$$
where $F_{\mathrm{emp}}$ denotes the empirical distribution function of the daily maxima observed in 2009. Now, any estimation method of the flow quantiles (4), using the values in (21), should provide an approximated value of the true values $c_i$. The flow quantiles are estimated using the classical parametric methods, the nonparametric approaches and also by means of our approximation based on NFDA methods, described below.
Parametric GEV approach.
For each $i = 1, \ldots, 20$, the flow quantiles are estimated by
$$\hat c_{i\theta} = F_{\hat\theta}^{-1}\left(1 - \frac{1}{\widehat{RT}(c_i)}\right).$$
For this, the package evir of the software R, which estimates the GEV parameters by maximum likelihood, is used.
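The levels and their empirical recurrence intervals can be computed with a few lines of code. The Python sketch below is illustrative; the function names and the synthetic test-year values are assumptions made here, and a level that is never exceeded is given an infinite recurrence interval.

```python
import math


def equally_spaced_levels(lowest, highest, k=20):
    """k equally spaced levels from `lowest` to `highest` inclusive."""
    return [lowest + j * (highest - lowest) / (k - 1) for j in range(k)]


def empirical_return_period(c, test_values):
    """Inverse of the empirical exceedance frequency of the level c."""
    exceedances = sum(1 for v in test_values if v > c)
    if exceedances == 0:
        return math.inf  # the level is never exceeded in the test period
    return len(test_values) / exceedances


# Synthetic 'test year' of 100 values: 1, 2, ..., 100
test_year = list(range(1, 101))
levels = equally_spaced_levels(0.0, 19.0)
rt_90 = empirical_return_period(90, test_year)   # 10 exceedances out of 100
rt_max = empirical_return_period(100, test_year)
```

The ratio of the number of days to the number of exceedances equals 1/(1 − F_emp(c)), with F_emp the empirical distribution function of the test period.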

Nonparametric approach.
For each $i = 1, \ldots, 20$, nonparametric estimators of the flow quantiles are calculated as indicated in (9), where the bandwidth, obtained by cross-validation, is $h = 13.66$.
NFDA approach.
Expression (14) can be adapted to estimate any quantile. Then, for each $i = 1, \ldots, 20$, estimators of the flow quantiles are obtained by estimating the conditional quantile of order $1/RT(c_i)$ with the adapted expression, where, for each $\delta$, $Y_j(\delta) = \chi_{j+1}(\delta)$ and $\chi_{(n/31)-1}$ denotes the functional datum composed of the 31 measures of the penultimate month. An estimated value $\hat{Y}_{(n/31)-1}(\delta)$ is thus available for each day $\delta$, and the functional nonparametric estimate of the flow quantile, denoted by $\hat{c}_{iF}$, is the sample mean of these daily values. The same kernels, bandwidths and semi-metric as in the example in Section 3.1 are used.
To compare the three approaches numerically, the relative mean absolute error (RMAE) of $\hat{c}_{i\theta}$, $\hat{c}_{ih}$ and $\hat{c}_{iF}$ is computed, given by:
$$\mathrm{RMAE} = \frac{1}{20} \sum_{i=1}^{20} \frac{|\hat{c}_{i*} - c_i|}{c_i},$$
where $\hat{c}_{i*}$ can be $\hat{c}_{i\theta}$, $\hat{c}_{ih}$ or $\hat{c}_{iF}$. The results obtained are RMAE = 1.13, 0.71 and 0.26 for the parametric GEV, nonparametric and NFDA estimates, respectively. Therefore, the minimum error is obtained with the NFDA techniques.
Figure 10 shows the quantile estimations obtained with the three proposals for Salt River data (parametric GEV estimations with a dotted line, nonparametric with a dashed line, and NFDA estimations with a solid line). The dashed diagonal line represents the true values to be estimated. The long-tailed distribution observed in Figure 9 clearly reveals the difficulty (Serinaldi, 2009). However, the NFDA approach, which treats the complete set of values for each month as a functional datum, gives more precise estimations than those obtained with the parametric GEV or the simple nonparametric methods. The largest differences between the estimates occur at the highest levels, where the NFDA estimates approximate the true values well and where reliable prediction techniques matter most. Note that a multivariate approach would be possible in the parametric GEV and the nonparametric settings, but it would require a vector of 30 predictor variables. Such a high dimension makes this kind of approach very difficult, if not impossible, in practice.
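As a minimal sketch of the comparison criterion, the RMAE can be computed as below, assuming it averages the absolute errors relative to the true quantiles. The function name `rmae` and the numeric values are illustrative only and are not the Salt River data.

```python
def rmae(true_quantiles, estimates):
    """Relative mean absolute error: mean of |estimate - true| / true."""
    if len(true_quantiles) != len(estimates):
        raise ValueError("inputs must have the same length")
    relative_errors = [abs(e - c) / c for c, e in zip(true_quantiles, estimates)]
    return sum(relative_errors) / len(relative_errors)

# Hypothetical true quantiles and estimates (not taken from the paper).
c_true = [100.0, 200.0, 400.0]
c_hat = [110.0, 180.0, 400.0]
example_rmae = rmae(c_true, c_hat)
print(example_rmae)
```

Because each error is divided by the true quantile, large and small quantiles contribute on the same relative scale.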

Discussion
Statistical techniques are usually applied to address practical problems in hydrology. In the present paper, two of them, monthly mean river flow prediction and extreme value analysis, are the focus of the research. NFDA approaches, combining nonparametric methods with functional data, are used in this setting.
The main objective of this paper was to apply different NFDA techniques to two particular hydrological problems, and to test their behaviour in comparison to more classical approaches. The nonparametric functional methods were applied to real data of two rivers in the U.S. The different alternatives were validated using the final year in the database as the test sample, and the rest of the years as the training sample.
In the prediction setting, two nonparametric functional proposals, based on the median and the mean, respectively, were applied and compared with classical ARIMA models. The results showed that the NFDA approaches, especially the one based on the median, performed better than the classical ARIMA models.
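The hold-out validation and the two functional predictors can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the L2 distance, the quadratic kernel, the fixed bandwidth `h=4.0`, all function names and the synthetic data are hypothetical choices, and the kernels and semi-metric of Section 3.1 are not reproduced here.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two discretised curves (an assumed semi-metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def quadratic_kernel(t):
    """Asymmetric quadratic kernel on [0, 1] (an assumed kernel choice)."""
    return 1.0 - t * t if 0.0 <= t <= 1.0 else 0.0

def kernel_weights(curves, target, h):
    """Normalised kernel weights of each training curve relative to the target."""
    raw = [quadratic_kernel(l2_distance(c, target) / h) for c in curves]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no training curve within distance h of the target")
    return [w / total for w in raw]

def weighted_mean(weights, values):
    """Kernel regression estimate: weighted average of the responses."""
    return sum(w * v for w, v in zip(weights, values))

def weighted_median(weights, values):
    """Smallest response whose cumulative weight reaches 0.5."""
    cumulative = 0.0
    for v, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= 0.5:
            return v
    return max(values)

def hold_out_mse(years, h):
    """Predict the final year from the preceding one; return the MSE per rule."""
    train, test = years[:-1], years[-1]
    predictors = train[:-1]  # curves chi_j
    followers = train[1:]    # curves chi_{j+1}, whose values are the responses
    weights = kernel_weights(predictors, train[-1], h)
    result = {}
    for name, rule in (("mean", weighted_mean), ("median", weighted_median)):
        preds = [rule(weights, [f[s] for f in followers]) for s in range(len(test))]
        result[name] = sum((p - t) ** 2 for p, t in zip(preds, test)) / len(test)
    return result

# Synthetic, deterministic "flows": 8 years of 12 monthly values (not real data).
years = [[10 + 5 * math.sin(2 * math.pi * s / 12) + 0.5 * math.cos(j)
          for s in range(12)] for j in range(8)]
mse = hold_out_mse(years, h=4.0)
print(mse)
```

Both rules share the same kernel weights; they differ only in how the weighted responses are summarised, which is why the median-based rule is less sensitive to a few extreme years.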
The previous approaches could be extended to include available information such as daily precipitation or any other climatic covariate. Several similar models incorporating covariates have been proposed and studied previously. For example, dynamic regression (ARIMAX) combines Box-Jenkins models with linear regression, yielding a more general model for the study of time series (Shumway and Stoffer, 2011). This kind of model simply adds covariates to the general expression of an ARIMA model, but the covariate coefficients are hard to interpret. An alternative approach could be the application of regression models with ARMA (or ARIMA) errors, which can be fitted with parametric, nonparametric or semiparametric approaches. In a hydrological context, Castellano-Méndez et al. (2004) present a study of the Xallas river (northwest Spain) using Box-Jenkins and neural network methods, incorporating exogenous variables such as rainfall information. In the functional setting, covariates could be included through semi-functional partial linear models (Aneiros-Perez and Vieu, 2006). This approach uses a nonparametric kernel procedure; the response is scalar, and both a functional covariate and multivariate non-functional covariates are considered. Functional regression between functional explanatory variables and a scalar response is also possible using the backfitting algorithm (Febrero-Bande and González-Manteiga, 2011), which would allow functional covariates to be included in the model. Applying these techniques to our data would require some relevant climatic variables. Unfortunately, these variables are not available in the databases used. A deeper study of this issue could, however, be carried out in future research.
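For reference, the model forms mentioned in this paragraph can be written schematically as follows. The notation ($y_t$, $x_t$, $\beta$, $\phi(B)$, $\theta(B)$, $m$) is standard but is introduced here for illustration and is not taken from the paper; differencing is omitted for brevity.

```latex
% ARMAX-type model: covariates enter the ARMA recursion directly
\phi(B)\, y_t = \beta^{\top} x_t + \theta(B)\, \varepsilon_t

% Regression with ARMA errors: covariates enter a regression whose errors follow an ARMA process
y_t = \beta^{\top} x_t + \eta_t, \qquad \phi(B)\, \eta_t = \theta(B)\, \varepsilon_t

% Semi-functional partial linear model: scalar response Y, multivariate covariates X,
% functional covariate \chi and unknown smooth operator m
Y = \beta^{\top} X + m(\chi) + \varepsilon
```

In the first form, $\beta$ measures the effect of $x_t$ conditional on past values of $y_t$, which is one way to see why the covariate coefficients are hard to interpret. In the second form, $\beta$ keeps its usual regression meaning.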
Regarding the extreme value analysis, the estimation of the flow quantiles has been the focus of the study. These values play an important role in hydrological problems, because they are directly linked with flood analysis. The new NFDA approach performed better than the parametric GEV estimators, producing estimates closer to the true values. On the other hand, it is well known that reliable estimations are usually difficult to obtain because of the small number of extreme values in a sample. These estimations could be improved by using more precise bandwidth parameters. Bandwidth selection in NFDA remains an open problem. The development of data-driven techniques for computing optimal bandwidths should directly improve the promising results obtained in the quantile estimation problem.
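The role of the bandwidth can be illustrated with a simplified kernel estimator of a conditional quantile given a functional covariate. This sketch is an assumption-laden simplification: it inverts an indicator-based conditional distribution estimate rather than the paper's Expression (14), and the semi-metric, kernel, function names and toy data are all hypothetical.

```python
import math

def semi_metric(u, v):
    """L2 distance between discretised curves (assumed; the paper's choice may differ)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kernel(t):
    """Asymmetric quadratic kernel on [0, 1] (an assumed kernel choice)."""
    return 1.0 - t * t if 0.0 <= t <= 1.0 else 0.0

def conditional_quantile(curves, responses, target, alpha, h):
    """Smallest response y whose estimated conditional CDF at the target reaches alpha."""
    weights = [kernel(semi_metric(c, target) / h) for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no curve within distance h of the target")
    cumulative = 0.0
    for y, w in sorted(zip(responses, weights)):
        cumulative += w / total  # estimated P(Y <= y | target curve)
        if cumulative >= alpha:
            return y
    return max(responses)

# Toy data: four two-point "curves" with scalar responses (not real flows).
curves = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
responses = [10.0, 20.0, 30.0, 40.0]
target = [1.0, 1.0]

q_small_h = conditional_quantile(curves, responses, target, alpha=0.9, h=1.0)
q_large_h = conditional_quantile(curves, responses, target, alpha=0.9, h=3.0)
print(q_small_h, q_large_h)
```

With the small bandwidth only the curve identical to the target receives weight, so the estimate collapses to its response. With the larger bandwidth, neighbouring curves contribute and the estimated upper quantile changes, which is why the choice of bandwidth matters so much for extreme quantiles.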
In general, the approaches proposed in this paper yielded accurate estimates of both the functions of interest, such as the cumulative distribution function or the function providing the probabilities of exceedance, and derived parameters such as the flow quantiles. They also captured more complex patterns in the data, providing better future estimations. They therefore represent a better alternative to the classical methods regularly used in this framework and are useful tools for environmental agencies managing hydrological risks, including floods.

Figure 1: Location map of Salt River near Roosevelt, AZ, and Christina River at Coochs Bridge, DE.

Figure 2: Salt River monthly mean flow data. Top panel: original data (measured in m³/s). Bottom panel: natural logarithm of original data.

Figure 3: Nonparametric density estimates of Salt River flow data. Top panel: original data. Bottom panel: natural logarithm of original data.

Figure 4: Christina River monthly mean flow data. Top panel: original data (measured in m³/s). Bottom panel: natural logarithm of original data.

Figure 5: Autocorrelation functions of Salt River and Christina River monthly mean flow data before the logarithmic transformation.

Figure 6: Autocorrelation functions of Salt River and Christina River monthly mean flow data after the logarithmic transformation.

Figure 7: Predicted values for Salt River monthly mean flows in 2009 (dashed line) and real values in that year (solid line). (a) nonparametric functional data (NFDA) modelling based on regression. (b) nonparametric functional data (NFDA) modelling based on the median. (c) ARIMA approach.

Figure 8: Predicted values for Christina River monthly mean flows in 2009 (dashed line) and real values in that year (solid line). (a) nonparametric functional data (NFDA) modelling based on regression. (b) nonparametric functional data (NFDA) modelling based on the median. (c) ARIMA approach.

Figure 9: Boxplot of Salt River maximum daily flow data, measured in m³/s.

Figure 10: Estimations of the quantiles using the parametric GEV estimator, the nonparametric kernel method and the nonparametric functional (NFDA) approach for Salt River data (parametric GEV estimations with a dotted line, nonparametric with a dashed line and nonparametric functional estimations with a solid line). The dashed diagonal line represents the true quantiles to be estimated.

Table 1: Descriptive statistics for the monthly mean flow variable of Salt River and Christina River.

Table 2: MSEs of the monthly mean flow predictors in the 66th year (2009) using different methods (NFDA based on regression, in the first row; NFDA based on the median, in the second row; and ARIMA models in the third row), for Salt River and Christina River.

Table 3: Descriptive statistics for Salt River maximum daily flow variable.