Interactive comment on “Water levels of the Mekong River Basin based on CryoSat-2 SAR data classification” by Eva Boergens et al.

Abstract. In this study we use CryoSat-2 SAR (Delay-Doppler Synthetic Aperture Radar) data over the Mekong River Basin to estimate water levels. Smaller inland waters can be observed with CryoSat-2 data with a higher accuracy than with classical radar altimeters because of the increased along-track resolution of SAR and the smaller footprint. However, even with SAR data the estimation of water levels over smaller rivers (width less than 500 m) is still challenging, as only very few consecutive observations over the water body are present. The use of land-water-masks for target identification tends to fail as the river becomes smaller. Therefore, we developed a classification that divides the observations into water and land observations based solely on the observations themselves. The classification is done with an unsupervised classification algorithm and is based on features derived from the SAR and RIP (Range Integrated Power) waveforms. After the classification, classes representing water and land are identified. The measurements classified as water are used in a next step to estimate water levels for each crossing over the Mekong River. The resulting water levels are validated and compared to gauge data, Envisat data and CryoSat-2 water levels derived with a land-water-mask. The CryoSat-2 classified water levels perform better than results based on the land-water-mask and Envisat. The improvements of the classification approach for CryoSat-2 are especially evident in the smaller upstream regions.


Introduction
The water of rivers is vital for humans but poses a threat at the same time. Rivers are crucial as suppliers of water for irrigation and fresh water for drinking. However, floods can destroy crops, settlements, and infrastructure. For this reason, it is essential to monitor the water level of river systems. An increasing number of in situ gauges have been abandoned since the 1980s (Global Runoff Data Center, 2013), or their data are not publicly available. It is therefore increasingly important to measure river water levels with satellite altimetry.
All of the aforementioned studies use pulse-limited altimetry data. CryoSat-2, launched in 2010, is the first satellite carrying a Delay-Doppler altimeter (Raney, 1998). The altimeter operates in three measuring modes: the classical pulse-limited Low Resolution (LR) mode, the Delay-Doppler Synthetic Aperture Radar (SAR) mode, and the SAR Interferometric (SARin) mode.
Compared to conventional radar altimeters, Delay-Doppler measurements have a higher along-track resolution and a smaller footprint. This improves the observation of water levels of inland water bodies like lakes (e.g. Nielsen et al., 2015; Kleinherenbrink et al., 2015; Göttl et al., 2016) or rivers (e.g. Villadsen et al., 2015; Bercher et al., 2013). SAR altimetry observations are especially advantageous for measuring smaller inland waters like rivers. However, CryoSat-2 has a long repeat time of 369 days compared to 35 days for Envisat and SARAL, and 10 days for Topex/Poseidon, Jason-1 and Jason-2. This restricts the estimation of meaningful water level time series over rivers or lakes if not enough different tracks cross the water body. The advantage of the long repeat time is the very dense spatial distribution of observations. This is especially useful for rivers to better monitor their continuous progression. Unlike lakes, rivers can change their water levels rapidly over their course, which makes a denser spatial distribution of observations desirable.
To derive water levels of lakes or rivers it is necessary to identify the water returns of the altimeter. This can be done by applying a land-water-mask such as the mask provided by the World Wildlife Fund (https://www.worldwildlife.org/pages/global-lakes-and-wetlands-database). Such a mask is constant over time; therefore, it accounts neither for the seasonal variations of the water extent nor for inter-annually shifting river and lake banks. These masks are usually not accurate enough for narrow rivers where only a few water measurements are available. Extracting dynamic land-water-masks from optical or SAR remote sensing images is difficult in the study area since cloud-free optical data is only available during the dry season with low water level. Moreover, SAR images with sufficient spatial resolution are only available from 2014 on with the launch of Sentinel-1. Although a high-accuracy land-water-mask is provided by the Mekong River Commission (http://portal.mrcmekong.org/map_service) for our study area of the Mekong River Basin, its accuracy of 30 m might not be sufficient for medium and small rivers. Additionally, the mask includes no seasonal variations.
In the Mekong River Basin the river width varies between 20 m and more than 2 km. Small rivers, with a width of less than 100 m, comprise most of the tributaries and the upstream parts of the main left-bank tributaries. Medium rivers, which are more than 100 m but less than 500 m wide, comprise the main tributaries and the upstream main river. In the downstream reach, before it splits into the delta, the river has a width of over 2 km (see also Figure 2 for a map of the basin).
To be independent of the accuracy and availability of land-water-masks, we classify the altimetry data beforehand into water and land observations. For classical pulse-limited altimeters this has been done successfully for the last decade (e.g. Berry et al., 2005; Desai et al., 2015). Even very small water areas in wetlands have been classified successfully with Envisat data by Dettmering et al. (2016). In the classification, the shape of the waveform is used to discriminate between different reflecting surfaces. CryoSat-2 SAR data has also been classified before based on the SAR waveform for lakes (Göttl et al., 2016), lakes and rivers (Villadsen et al., 2016), or ice (Armitage and Davidson, 2014).
This study takes a step further and uses not only the waveform but also the Range Integrated Power (RIP) to classify the altimeter measurements into water and non-water returns over the Mekong River Basin in Southeast Asia. The RIP is only available for Delay-Doppler SAR altimetry and gives additional insight into the reflecting surface which the waveform alone could not provide (see Figure 3 for an example and Wingham et al. (2006)).
The unsupervised k-means algorithm is used for the classification (MacQueen, 1967) as not enough reliable training data is available for a supervised classification. The k-means algorithm is a widely used unsupervised clustering algorithm and has been used for altimetry classification before (e.g. Göttl et al., 2016).
This paper is structured as follows: First, the study area of the Mekong River Basin is introduced in section 2; afterwards, more information on the CryoSat-2 SAR data is given in section 3. The classification and the features used are described in section 4, followed by an explanation of the water level estimation in section 5. The results and validations are presented and discussed in section 6. The paper ends with the conclusions in section 7. An overview of all relevant processing steps of this study is given in Figure 1.

Study Area
In this study, the Mekong River Basin in Southeast Asia (China, Myanmar, Thailand, Laos, Cambodia, and Vietnam) is investigated, with a focus on the part of the basin south of the Chinese border. Upstream from here, it is not possible to measure the river with satellite altimetry because the river flows through narrow gorges that shadow the altimetric measurements. Downstream, the study area ends at the confluence with the Tonle Sap River, from where on the river is under tidal influence. The tributaries, namely the large left-bank tributaries in Laos, are investigated as well. The hydrology of the Mekong Basin is primarily influenced by the precipitation on the Tibetan Plateau and the south-eastern monsoon (Mekong River Commission, 2005).
The Mekong River and its tributaries flow through different topographic regions (Figure 2). The main river upstream from Vientiane and the left-bank tributaries in Laos are surrounded by mountainous areas with steep banks, where the rivers have a greater slope and a width smaller than 500 m or even less than 100 m. Downstream of Vientiane and up to the Mekong Falls the river widens and flows with less slope over the Khorat Plateau. Below the Mekong Falls the river is surrounded by seasonal wetlands and widens to more than 1 km. For further processing we defined three overlapping data masks according to these regions (Figure 2). The regions are determined by the roughness of a topography model and the absolute height.
Afterwards a margin around each subregion allows for an overlap.

Data
In this study we use Delay-Doppler SAR altimeter data measured by CryoSat-2 between 2010 and 2016. CryoSat-2 measures in three different modes, which are set in a geographical mask (https://earth.esa.int/web/guest/-/geographical-mode-mask-7107).
The LR mode is active mostly over the oceans and the interior of the ice sheets of Antarctica and Greenland, whereas the SAR mode measures over sea ice and other selected regions, and the SARin mode focuses mostly on glaciated regions (ESA, 2016). This mask has changed over the lifetime of the satellite. The entire study area of the Mekong River Basin has only been measured in SAR mode since July 2014 (see Figure 2 for the extent of the SAR mode mask). In SAR mode the along-track footprint size is reduced to 300 m while it remains 10 km in the across-track direction.
Here, we use the CryoSat-2 baseline C SAR Level 1b data provided by ESA GPOD SARvatore (https://gpod.eo.esa.int/) for the period of 2010 to 2016. These data contain the full stack matrix.
The Delay-Doppler SAR altimeter measures a point on the surface several times from different looking angles (Cullen and Wingham, 2002). All these measurements form the multi-look stack data (see Figure 3). For every point, 246 single-look waveforms are collected in the stack matrix. In Figure 3, two exemplary stack matrices are presented. The first (a) is measured over the Tonle Sap lake and the second (b) over a medium river upstream. Each row is a single-look waveform. Integrating this matrix over all single looks results in the multi-look SAR waveform, hereafter referred to as the waveform (in Figure 3 this corresponds to summing over the rows of the stack). Integrating over the range bins results in the Range Integrated Power (RIP); in Figure 3 this corresponds to summing over the columns. Detailed information on the Delay-Doppler measurements is given in Raney (1998).
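The two integrations of the stack matrix can be illustrated with a minimal sketch. It assumes the stack is stored as a list of rows, one row per single-look waveform and one column per range bin; the function name and the toy numbers are hypothetical and only serve the illustration.

```python
def waveform_and_rip(stack):
    """Derive the multi-look waveform and the RIP from a stack matrix.

    stack: list of rows; each row is one single-look waveform
           (a list of powers per range bin). This layout is an assumption.
    """
    n_bins = len(stack[0])
    # Sum over all looks (down the rows): one value per range bin -> waveform.
    waveform = [sum(row[j] for row in stack) for j in range(n_bins)]
    # Sum over all range bins (along each row): one value per look -> RIP.
    rip = [sum(row) for row in stack]
    return waveform, rip


# Toy stack with 4 looks and 3 range bins (a real stack has 246 looks).
toy_stack = [
    [0.0, 1.0, 0.0],
    [0.0, 4.0, 1.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
]
waveform, rip = waveform_and_rip(toy_stack)
```

The waveform has as many entries as there are range bins, whereas the RIP has one entry per look.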
Additionally, we use a river polygon which is provided by the Mekong River Commission (http://portal.mrcmekong.org/map_service). The polygon was derived from aerial images and topographic maps. The accuracy of the river mask is ∼30 m, but no information about the seasonality of the polygon is given.

Classification Approach
For the medium and small rivers in our study area of the Mekong Basin no reliable land-water-mask is available. Thus, a classification by means of the k-means algorithm is performed to extract the water measurements.
The k-means algorithm (MacQueen, 1967) is an unsupervised method to cluster the data on the basis of different features.
For the land-water classification, a set of features derived from the CryoSat-2 stack data via the intermediate step of the waveform and the RIP is used. The features are summarized in Table 1. The features derived from the waveform are the maximum power, the peakiness, and the position of the leading edge. It is well known that waveforms of water reflections have a higher power than those of land reflections. Medium, and even more so small, water bodies have a smooth mirror-like surface which can only be measured by signals emitted close to nadir. This leads to a very peaky waveform and RIP with a high power. Following Laxon (1994), the peakiness p_wf is calculated from the waveform wf, where wf_i is the power of the i-th bin.
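As an illustration of the first two waveform features, the following sketch computes the maximum power and a peakiness value. The peakiness is taken here as the ratio of the maximum power to the summed power, which is one common form of pulse peakiness; the exact scaling used in this study is an assumption, and the function names are hypothetical.

```python
def max_power(wf):
    """Maximum power of the waveform."""
    return max(wf)


def peakiness(signal):
    """Assumed peakiness: maximum power divided by total power.

    Values close to 1 indicate a single sharp peak; small values
    indicate power spread over many bins.
    """
    total = sum(signal)
    if total == 0:
        return 0.0  # empty return, no meaningful peakiness
    return max(signal) / total


peaky_wf = [0.0, 0.0, 10.0, 0.0]   # specular, water-like toy waveform
diffuse_wf = [1.0, 1.0, 1.0, 1.0]  # power spread evenly, land-like toy waveform
```

The same function can be applied to the RIP to obtain a peakiness of the RIP under the same assumption.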
To estimate the relative position of the leading edge in the waveform, the waveform is retracked using an Improved Threshold Retracker with a threshold of 50% on the best sub-waveform (Gommenginger et al., 2011). The on-board tracking system always tries to hold the leading edge of the main reflection at the nominal tracking point. This is not always possible, which leads to a deviation of the leading edge from the nominal tracking point. Over wider rivers the tracking system can manage to keep the leading edge close to the tracking point. In Figure 4, left panel, one exemplary waveform with its features maximum power and position of the leading edge is shown (the peakiness cannot be displayed).
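The idea of a threshold retracker can be sketched as follows. This is a deliberately simplified version that works on the whole waveform and interpolates linearly at 50% of the maximum power; it does not select a best sub-waveform as the Improved Threshold Retracker does. The function names and the `nominal_bin` parameter are hypothetical.

```python
def threshold_retrack(wf, threshold=0.5):
    """Return the fractional bin where the waveform first reaches
    threshold * max(wf) on the leading edge (simplified sketch)."""
    level = threshold * max(wf)
    for i, power in enumerate(wf):
        if power >= level:
            if i == 0:
                return 0.0
            prev = wf[i - 1]
            # Linear interpolation between bin i-1 (below level) and bin i.
            return (i - 1) + (level - prev) / (power - prev)
    return None


def leading_edge_offset(wf, nominal_bin):
    """Deviation of the retracked leading edge from the nominal tracking bin."""
    return threshold_retrack(wf) - nominal_bin


toy_wf = [0.0, 0.0, 2.0, 10.0, 4.0, 1.0]
edge = threshold_retrack(toy_wf)
```

For the toy waveform the 50% level is 5.0, which is crossed between bins 2 and 3.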
Features based on the RIP are the peakiness p_RIP, the standard deviation std_RIP, the width, the off-center, and the symmetry.
The std_RIP is a measure of the difference in the returning power under different looking angles (see Figure 3). Water reflections over large water bodies result in an overall smoother RIP than water reflections over small water bodies, which in turn have a smoother RIP than land reflections. The std_RIP is computed from the entries of the RIP, where RIP_i is the i-th entry of the RIP and N the number of looks in the RIP, usually 246.
As mentioned before, small and medium inland waters with a smooth surface only reflect the signal back to the satellite near nadir. Therefore, the RIP is both very peaky and narrow, which is captured by the width w of the RIP.
The off-center feature off describes the deviation of the main reflection from the nadir point. It should be close to zero for measurements of water, whereas land measurements are more disturbed and often show the maximum return in the lobes. We measure off as the difference between the middle look of the RIP and the mean point of the RIP. A positive off value indicates that the majority of the returning power was detected before the satellite passed the nadir position; a negative value indicates the opposite.
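The three RIP features described above can be sketched as follows. The standard deviation is the usual population standard deviation over the N looks. The width is approximated here by an effective-width measure, and the mean point is taken as the power-weighted mean look index; both are assumptions made for this illustration, as is the convention that low look indices correspond to looks before nadir. All function names are hypothetical.

```python
import math


def rip_std(rip):
    """Population standard deviation of the RIP entries."""
    n = len(rip)
    mean = sum(rip) / n
    return math.sqrt(sum((r - mean) ** 2 for r in rip) / n)


def rip_width(rip):
    """Assumed effective width: (sum RIP_i^2)^2 / sum RIP_i^4.

    Equals 1 for a single non-zero look and N for a flat RIP.
    """
    s2 = sum(r ** 2 for r in rip)
    s4 = sum(r ** 4 for r in rip)
    return s2 ** 2 / s4 if s4 > 0 else 0.0


def rip_off_center(rip):
    """Middle look minus the power-weighted mean look (assumed definition).

    Positive if most power arrives in the early looks.
    """
    total = sum(rip)
    mean_look = sum(i * r for i, r in enumerate(rip)) / total
    middle_look = (len(rip) - 1) / 2
    return middle_look - mean_look


centred = [0.0, 0.0, 1.0, 0.0, 0.0]  # narrow peak at the middle look
early = [0.0, 1.0, 0.0, 0.0, 0.0]    # peak before the middle look
flat = [1.0, 1.0, 1.0, 1.0]          # evenly distributed power
```

With these definitions a narrow, centred RIP yields a small width and an off-center value of zero.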
The last feature, s, is a measure of the symmetry of the RIP. For an ideal smooth water reflection, like that of a small lake, the RIP should be perfectly symmetrical. However, for a sloped target such as a river, the reflection depends on the relative orientation between the satellite and the water surface. The reflection is stronger when the satellite looks at a water surface that is sloped towards it. This effect leads to an asymmetrical RIP. To quantify this, an asymmetrical exponential function is fitted to the RIP (Equation 5). Here, a is the amplitude of the exponential function, b the look where the function reaches its maximum, and c_1 and c_2 are the two decay parameters. The symmetry feature s is then derived from these fitted parameters. A positive s indicates a water surface sloped towards the approaching satellite.
In addition to these eight features, both the whole waveform and the whole RIP are used as features. Each bin is then considered as a single feature. For this, the waveform needs to be shifted so that the leading edge is positioned on the nominal tracking point. Since the features span different orders of magnitude, it is necessary to normalize the feature set. All of these features were chosen for their sensitivity to the posed problem of water classification and for their independence from each other. More features were tested but discarded because they were either not sensitive for the classification or highly correlated with one of the features used.
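The normalization step can be illustrated with a column-wise standardization to zero mean and unit variance. The text does not state which normalization was applied, so the z-score used here is an assumption, and the function name is hypothetical.

```python
import math


def standardize(rows):
    """Scale every feature (column) to zero mean and unit variance.

    rows: list of feature vectors, one per altimeter measurement.
    Constant columns are left centred but unscaled.
    """
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        variance = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(variance) or 1.0)  # avoid division by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(n_features)]
            for r in rows]


# Two toy features that differ by two orders of magnitude.
features = [[1.0, 100.0], [3.0, 300.0]]
normalized = standardize(features)
```

After this step both columns contribute with equal weight to the Euclidean distances used by the clustering.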
The k-means algorithm is used to cluster the data on the basis of the above features into 20 classes. An unsupervised clustering algorithm is applied because no reliable training data is available. The unsupervised k-means clustering algorithm is widely used and has already been tested for waveform classification in Göttl et al. (2016). The k-means algorithm assumes normally distributed features with equal variance, which we ensured by the normalization of the features. The number of classes depends on the application and the variation in the input features. It can be estimated with knowledge of the data to be classified. In our case, a look at the spatial distribution of the features shows that two classes, land and water, are not sufficient, as altimeter measurements of land can be very diverse (this also holds for water measurements, but they are less diverse than land). The diversity of the returning waveform and RIP can be explained by the reflective properties of, e.g., land, water, and vegetation. From this it can be concluded that at least 10 classes are needed. We tested the classification and validated the resulting water levels for several numbers of classes (10, 15, 20, 30) and found similar results for all, with the results for 20 classes slightly superior.
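For readers unfamiliar with k-means, a minimal sketch of the standard iteration (Lloyd's algorithm) is given below. It is not the implementation used in this study; the initialization by random sampling, the fixed seed, and all function names are assumptions chosen for brevity.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points into k classes; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize the centroids with k randomly drawn data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # The centroid is the mean feature vector of its members.
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


toy_points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, labels = kmeans(toy_points, k=2)
```

In the study the same procedure would run on the normalized feature vectors with 20 classes instead of the two toy groups used here.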
Each of the clusters is defined by its centroid, which is the mean feature vector of all points in this cluster. New data is then classified by assigning it to the closest centroid. Here, the clustering is done on one randomly drawn third of the data. The remaining two thirds of the data are then classified into the cluster classes. The clustering is not done on the whole data set for reasons of computational efficiency. The repeatability of the clustering and classification will be validated in section 6. After the classification it is determined which classes represent water and land returns, respectively. This was done by visual inspection of the mean waveform and RIP for each class and of the locations of the observations in each class relative to the approximate location of the river known from the land-water-mask (see section 3).
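The split into a clustering subset and the subsequent nearest-centroid classification can be sketched as follows. The random draw with a fixed seed and the function names are hypothetical; the centroids in the example are made-up values standing in for the result of the clustering.

```python
import random


def split_one_third(samples, seed=0):
    """Randomly draw one third of the samples for clustering;
    the remaining two thirds are classified afterwards."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_cluster = len(samples) // 3
    clustering_set = [samples[i] for i in indices[:n_cluster]]
    remaining_set = [samples[i] for i in indices[n_cluster:]]
    return clustering_set, remaining_set


def classify(sample, centroids):
    """Assign a sample to the class of the closest centroid."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)),
               key=lambda k: squared_distance(sample, centroids[k]))


clustering_set, remaining_set = split_one_third(list(range(9)))
example_centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # made-up centroids
example_class = classify([0.9, 1.1], example_centroids)
```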
As described in section 2, the Mekong Basin is divided into different regions: upstream, middle and downstream. We classify each of the regions separately as they are too diverse in the reflectivity properties of the water bodies to be classified together.
Additionally, for reasons of computational efficiency, the classification is done only on altimeter data no further than 20 km away from the river polygon (the polygon can be seen in Figure 2).

Water Level Estimation Approach
The classification results in a set of measurements considered as water returns. From these measurements the water level for each crossing is determined in this section.

Altimetric Water Levels
A water level is computed for each crossing of the satellite track with a river in the Mekong River Basin. To locate these crossings a river polygon (see section 3) is used. We use all measurements less than 5 km away from the river crossing that were classified as water and retrack the SAR waveforms with an Improved Threshold Retracker with a 50% threshold (Gommenginger et al., 2011). Instead of using a median or mean over all classified measurements, we search for a horizontal line in the heights, which is assumed to represent the water surface. It is still possible that some of the measurements classified as water do not represent the river surface and need to be excluded from the water level computation (off-nadir effects across track or water bodies surrounding the river). These outliers are not necessarily located at the margin of the river but can also be located in the middle due to islands or sandbanks in the river. This restricts the use of an along-track standard deviation of the heights for outlier detection.
To find the line of equal water height, a histogram of the water levels with Doane bins (Doane, 1976) is used. Doane bins are more suitable for small (fewer than 30 values), non-normally distributed data sets than the classical Sturges bins (Sturges, 1926). If a horizontal line is present in the heights, one of the bins is distinctively larger than the others, i.e. it contains more observations, and collects the heights of nearly equal water level. The median of the heights in this bin is then taken as the water level. If fewer than five height points were classified as water, the median of all heights is taken as the water level. The advantage of this approach is that it is better suited for rivers wider than 1 km with islands and sandbanks that cause outliers in the heights. However, in many cases our histogram approach and the median of all observations deliver similar results.
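The histogram approach can be sketched as follows. The number of bins follows Doane's formula, k = 1 + log2(n) + log2(1 + |g1| / sigma_g1), with the sample skewness g1 and sigma_g1 = sqrt(6(n - 2) / ((n + 1)(n + 3))). Rounding k up, using equal-width bins between the minimum and maximum height, and the function names are assumptions of this sketch.

```python
import math
import statistics


def doane_bin_count(values):
    """Number of histogram bins according to Doane's formula."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if n < 3 or m2 == 0:
        return 1
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5  # sample skewness
    sigma_g1 = math.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))
    return int(math.ceil(1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)))


def water_level(heights, min_points=5):
    """Median of the fullest histogram bin; plain median for few points."""
    if len(heights) < min_points:
        return statistics.median(heights)
    k = doane_bin_count(heights)
    lowest, highest = min(heights), max(heights)
    if highest == lowest:
        return lowest
    bin_width = (highest - lowest) / k
    bins = [[] for _ in range(k)]
    for h in heights:
        index = min(int((h - lowest) / bin_width), k - 1)
        bins[index].append(h)
    fullest = max(bins, key=len)  # bin collecting the "horizontal line"
    return statistics.median(fullest)


# Toy crossing: five heights near 152.1 m plus two outliers.
toy_heights = [152.1, 152.0, 152.2, 152.1, 152.15, 158.4, 149.0]
level = water_level(toy_heights)
```

For the toy crossing the two outliers fall into sparsely filled bins and do not influence the estimated level, which stays close to 152.1 m.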

Outlier Detection
In spite of careful data selection through the classification and in the height retrieval, some retrieved water levels have to be considered as outliers. To find these outliers we make use of the CryoSat-2 repeat time of 369 days. Given the very stable annual signal of the Mekong River, one can assume that two measurements of the same CryoSat-2 track 369 days apart should measure a similar height. Based on this, a water level is considered an outlier if the mean difference to all other heights of the same pass is larger than 7 m. This is only applicable if other water level measurements of the same track exist. Due to the changing mode mask (see section 3), some regions are only measured in the last two years. To overcome this, a second outlier detection is applied which compares the water level with water levels of other tracks that are close in space and in time of the year. To this end, we use all measurements that are less than 10 km away along the river and less than 30 days of the year apart. If the water level differs by more than 10 m from the distance weighted mean water level of all these points, it is considered an outlier.
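The first outlier test can be sketched as follows. The text does not specify whether signed or absolute differences are averaged; the sketch assumes the mean of the absolute differences. The function name is hypothetical.

```python
def same_pass_outliers(heights, threshold=7.0):
    """Flag water levels of one pass (same track, different years) whose
    mean absolute difference to all other heights exceeds the threshold."""
    flags = []
    for i, height in enumerate(heights):
        others = [h for j, h in enumerate(heights) if j != i]
        if not others:
            flags.append(False)  # test not applicable without other heights
            continue
        mean_difference = sum(abs(height - h) for h in others) / len(others)
        flags.append(mean_difference > threshold)
    return flags


# Toy pass observed in four different years; the third height is suspicious.
toy_pass = [150.2, 150.6, 161.0, 150.4]
flags = same_pass_outliers(toy_pass)
```

Note that with only two heights of the same pass that differ by more than the threshold, both are flagged, since neither can be identified as the correct one by this test alone.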
The thresholds for the outlier detection were chosen as conservative upper bounds. On average, a water level difference of 40 to 60 cm within five days has to be expected during the rising water season, but it can be as high as 4 or 5 m (Mekong River Commission, 2009). Additionally, some inter-annual changes in the flood season can be expected, and the slope of the river has to be considered, which is 30 cm/km in median for the rivers in the Mekong Basin. Of the three thresholds used for the outlier detection, the difference of 7 m between water levels of the same pass in different years is the most sensitive for the later result. The time and distance weighted mean in the second part of the outlier detection limits the sensitivity of the other two thresholds.

Merging of the overlap regions
From the classification we derive a set of heights for each of the different geographical regions, which have a certain overlap (see Figure 2 and section 2). In this overlap, two water levels were computed for the same crossing; therefore, it has to be decided which height shall be used. To resolve this, we use the distance weighted mean water level of all other water level measurements that are less than 10 km away and less than 30 days of the year apart, as in the outlier detection (see subsection 5.2). The water level that is closest to this mean water level is used. The results of the merging process can also be used for validation of the classification, as will be shown in subsubsection 6.3.3.
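The selection in the overlap can be sketched as follows. The weighting function is not specified in the text; inverse-distance weights, the wrap-around of the day of the year, and all names are assumptions of this sketch.

```python
def select_neighbours(observations, target_km, target_doy,
                      max_km=10.0, max_days=30):
    """Pick observations close in space (along the river) and day of year.

    observations: list of (river_km, day_of_year, level_m) tuples.
    Returns a list of (distance_km, level_m).
    """
    selected = []
    for river_km, doy, level in observations:
        day_gap = abs(doy - target_doy)
        day_gap = min(day_gap, 365 - day_gap)  # wrap around the year end
        distance = abs(river_km - target_km)
        if distance < max_km and day_gap < max_days:
            selected.append((distance, level))
    return selected


def weighted_reference_level(neighbours):
    """Inverse-distance weighted mean water level (assumed weighting)."""
    weights = [1.0 / max(distance, 0.1) for distance, _ in neighbours]
    levels = [level for _, level in neighbours]
    return sum(w * l for w, l in zip(weights, levels)) / sum(weights)


def merge_overlap(candidates, neighbours):
    """Keep the candidate level closest to the reference level."""
    reference = weighted_reference_level(neighbours)
    return min(candidates, key=lambda level: abs(level - reference))


toy_observations = [(100.0, 200, 151.0), (104.0, 210, 150.0),
                    (130.0, 205, 140.0), (101.0, 20, 148.0)]
neighbours = select_neighbours(toy_observations, target_km=102.0, target_doy=205)
reference = weighted_reference_level(neighbours)
chosen = merge_overlap([150.4, 153.0], neighbours)
```

The same reference level could serve the second outlier test, where a water level is rejected if it differs by more than 10 m from the weighted mean.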

Results, Validation and Discussion
We applied the described methodology for the classification and water level determination to CryoSat-2 SAR data in the Mekong River Basin. In this section, both the results of the classification and the water level determination are presented and validated.

Results of the Classification
After the clustering and classification of the CryoSat-2 measurements we select the classes of water returns. In the upstream region we identify three and in the middle region six out of twenty classes as water classes. In the downstream region the classification approach failed. There, the rivers are surrounded by seasonal wetlands whose observations are also water returns. Additionally, the width of the rivers features larger seasonal changes than in the other regions. This can influence the waveform and RIP significantly. At some points we find peaky returns in the dry season, which can also be found in the wetland in the wet season, whereas the river itself shows near ocean-like waveforms during the wet season. The classification algorithm cannot distinguish these cases.
In Figure 5 the mean waveform and mean RIP of some classes are shown (note the different power axes). The classes displayed are selected to best represent all 20 classes for the upstream and middle region. As can be seen, the shape of the mean waveform and mean RIP of water classes in the upstream region reappears in the middle region, but not as water classes.
In the middle region small lakes have the same signature as the river upstream. For this reason, the two regions were classified separately. The third land class shown for the upstream region has a very distorted mean RIP. In this area not all stacks over land are 'full', i.e. not every single look recorded returning power. This leads to such distorted RIPs (side note: in another class the distortion is mirrored). All mean waveforms and RIPs are displayed in Appendix A for the interested reader.
In Figure 6, a section of the river network in the upstream region with the results of the classification is shown. The course of the river is well depicted; however, water is not identified at every crossing of the satellite track with the river. At some crossings no water reflection of the river was measured since the river was too narrow. On the other hand, some points classified as water are not close to the given polygon (blue line). However, the topography model (ETOPO1, Amante and Eakins (2009)) shown in the background indicates river valleys in the three circled areas. Therefore, one can assume that the classification is able to find rivers that are so small (down to 20 m wide) that they are not present in the high resolution river polygon provided by the MRC.
Figure 7 shows the classification for one exemplary track in the upstream region. The measurements classified as water (red dots) line up to a nearly constant water level at all crossings of the satellite track with the river.

Resulting water level
In the entire Mekong Basin we estimate water levels at more than 2000 crossings, which means approximately one measurement every 4 km along the main river (compared to 50 km for Envisat). It is not possible to measure a water level at every crossing of a CryoSat-2 track with a river in the basin. As mentioned before, at some crossings the river is too small, so that a reliable measurement could not be made in every pass; some other water levels were discarded during the outlier detection; furthermore, at some crossings the classification failed to identify the water. However, we are still able to retrieve at least some measurements from rivers as small as 20 m in width. In Figure 8 all measured heights at all dates are presented in a map, which shows the overall topography of the river network well but cannot show smaller details like seasonal variations.
For one track the heights and the classification are displayed in Figure 7 with an inlaid map of the geographic surroundings.
In this track four water crossings are found, of which the two northernmost are very close together with a water level difference of 20 cm. There the river meanders under the track, which causes two crossings close together. The two southern crossings belong to two different rivers, which explains the large height difference between the two locations close together. It is visible that only a few measurements are used to estimate the water level at each crossing. Approximately 180 water levels (or 8%) are even estimated from just one measurement, the majority of those in the upstream region.
For crossings with more than one water measurement we can calculate the standard deviation of the measurements used for water level estimation. More than 85% of the water levels have a standard deviation of less than 0.5 m.

Validation
The classification is validated in two ways. On the one hand, we test the repeatability of the classification with a cross validation.
On the other hand, the different classifications of the regions can be compared in the overlap areas. The latter can also be used at the same time to validate the resulting water levels. Additionally, the water levels are validated with respect to the stable seasonal signal using gauge data. We compare these results with the performance of Envisat water levels and CryoSat-2 data extracted with a land-water-mask in the same validation. For a better overview, Table 2 summarizes all validations done in this study.
Table 2. Summary of all validations done in this study, separated for the validations of the classification and the water level estimation.

Classification: cross validation
Water level estimation: comparison of water levels of the same pass
Classification and water level estimation: water levels in the overlap between the upper and middle region

Validation of the Classification
The cross validation of the classification is done for one third of the data. The classes determined before are considered as true values for this validation. The data are split into two equal parts. The first part is again clustered with the k-means algorithm, whereas the second part is classified with the resulting classes. This classification is validated against the "true" classes we found before in the first classification.
Table 3 summarizes the results of the cross validation. Water and non-water classes are distinguished. The overall accuracy is 97.9%. This cross validation shows that the classification is stable and does not change with the data subset used for the clustering. A second possibility for the validation of the classification lies in the water level estimation. For crossings with enough measurements, only those points which lie on a straight line are used for the height determination (see section 5). For a flawless classification, the number of observations discarded should be small, if not zero.
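The overall accuracy of such a cross validation is the share of measurements for which the water or non-water label of the repeated classification agrees with the label of the first classification. A minimal sketch with made-up labels (not data from this study) and hypothetical function names:

```python
def overall_accuracy(reference, predicted):
    """Fraction of measurements with identical labels in both runs."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    agreeing = sum(1 for r, p in zip(reference, predicted) if r == p)
    return agreeing / len(reference)


def confusion_counts(reference, predicted):
    """Count the (reference, predicted) label combinations."""
    counts = {}
    for r, p in zip(reference, predicted):
        counts[(r, p)] = counts.get((r, p), 0) + 1
    return counts


# Made-up labels for five measurements.
reference = ["water", "land", "land", "water", "land"]
predicted = ["water", "land", "water", "water", "land"]
accuracy = overall_accuracy(reference, predicted)
counts = confusion_counts(reference, predicted)
```

Because k-means numbers its clusters arbitrarily, the comparison is made on the water and non-water labels assigned to the clusters, not on the cluster indices themselves.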

Validation of Water Levels
Unlike water level time series measured by short-repeat orbit missions, CryoSat-2 measurements cannot be validated against the time series of in situ gauges without reducing the topography as done by Villadsen et al. (2015). The Mekong River and its tributaries have a topography that is too complex to allow for a reliable reduction. Besides this, the temporal overlap between the CryoSat-2 data and the gauge data is only about 1.5 years or even less (April 2011 until December 2012).
To validate the water levels we again use the nearly one-year repeat time of CryoSat-2. We investigate the differences between two subsequent tracks at the same river crossing. A histogram of the differences is shown in Figure 9(a). Table 4 displays the median, mean and standard deviation of these differences for the merged results as well as for the two regions (upstream and middle) separately. The results of the validation are compared to a validation with in situ gauge data, Envisat data and CryoSat-2 data with a land-water-mask. The gauge data provided by the Mekong River Commission for the main river and also some tributaries has a daily temporal resolution (http://ffw.mrcmekong.org/). From Table 4 and Figure 9 one can see that the water level varies by up to 50 cm in median from year to year, but some years show much larger differences of up to 4 m.
The Envisat data is taken from the DAHITI database (Schwatke et al., 2015) for the main river as well as some tributaries (Boergens et al., 2016b) and has a temporal resolution of up to 35 days. For validation, we take the differences between gauge measurements that are 369 days apart and between Envisat measurements whose day of the year differs by less than 5 days. The validation of the gauges gives a measure of how stable the annual signal is in the Mekong Basin. The Envisat observations are the most commonly used data for inland waters from a pulse-limited altimeter. We also compare our results to water levels derived from CryoSat-2 by simply averaging measurements inside the land-water-mask (Figure 9(b)). For better comparability, the water levels derived with the land-water-mask underwent the same outlier detection as used on the results of the CryoSat-2 classification.
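The statistics of the repeat differences can be sketched as follows. The sketch assumes that absolute differences between subsequent water levels at the same crossing are evaluated; the data structure, the crossing names and the toy values are hypothetical.

```python
import statistics


def repeat_differences(levels_by_crossing):
    """Absolute differences between subsequent water levels (one repeat
    cycle apart) at the same crossing.

    levels_by_crossing: dict mapping a crossing id to its water levels
    in chronological order.
    """
    differences = []
    for levels in levels_by_crossing.values():
        for earlier, later in zip(levels, levels[1:]):
            differences.append(abs(later - earlier))
    return differences


def summarize(differences):
    """Median, mean and standard deviation of the differences."""
    return {
        "median": statistics.median(differences),
        "mean": statistics.mean(differences),
        "std": statistics.pstdev(differences),
    }


toy_levels = {"crossing_a": [150.0, 150.5, 149.5], "crossing_b": [200.0, 200.2]}
differences = repeat_differences(toy_levels)
summary = summarize(differences)
```

The same summary can be computed for gauge, Envisat and land-water-mask results so that all data sets are compared with identical statistics.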
The median of the differences of the CryoSat-2 classification results is always better than that of the Envisat results (see Table 4).
However, the differences are larger for the upstream region than for the middle region. In the upstream region, the mean differences are nearly equal for the CryoSat-2 classification and Envisat results, which is caused by the larger spread of the CryoSat-2 results.
Along the main stream in the middle region, where the river is wide, the land-water-mask and the classification approach yield comparable results in the validation. However, in absolute numbers of observations, the land-water-mask approach produces more water levels but with a higher number of outliers. In the upstream region, with small rivers with a width of 100 m or less, the quality of the land-water-mask results deteriorates. The polygon is given with an accuracy of 50 m, which is sufficient for a 1 km wide river but too inaccurate for 100 m wide rivers. This causes the larger difference in the validation results of the two CryoSat-2 data sets in the upstream region. In the upstream region the water levels of the classification approach are superior to those of the land-water-mask approach, both in terms of validation results and absolute numbers of valid observations. For both regions the number of outliers is much larger for the mask than for the classification approach. This reveals the opportunity that SAR altimetry provides for rivers which are too small to be reliably identified in optical (e.g. Landsat) or SAR (e.g. Sentinel-1) images. As already shown in section 6.1 and Figure 5, the classification of SAR altimetry identifies even rivers which are not visible in the land-water-mask derived from aerial images.
Additionally, the feature selection of the classification was done mostly with regard to the reflective properties of small water bodies, which we find in the upstream region. This explains the better classification results in the upstream region compared to the middle region.
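The scale argument about the land-water-mask made earlier in this section can be made concrete by relating the stated polygon accuracy to the river width. This is only a back-of-the-envelope ratio and not an error model used in the study:

```latex
\[
  \frac{50\,\mathrm{m}}{1000\,\mathrm{m}} = 0.05
  \qquad\text{versus}\qquad
  \frac{50\,\mathrm{m}}{100\,\mathrm{m}} = 0.5
\]
```

For a 1 km wide river the positional uncertainty of the mask amounts to about 5% of the width, whereas for a 100 m wide river it reaches half the width.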

Validation in the overlap regions
The overlap between the two regions, upstream and middle, described in subsection 5.3 can be used for validation of the classification and height determination.
Theoretically, the land-water classification and the resulting water levels should be identical in the overlap between the two regions. Unfortunately, this is not the case for all points. Overall, water levels are estimated in both regions at only 67 river crossings. At these 67 points it is possible to evaluate the differences between the two water levels. In 45 of these cases, or 67%, the differences are below 15 cm, which we consider equal given the accuracy of river altimeter measurements. At the same time, the largest difference between two water levels at the same location is 17 m. At the crossings where the difference is larger than 15 cm, it has to be decided which water level is taken for the final data set (see subsection 5.3). In 17 cases the water level of the upstream region was chosen, and in 5 cases that of the middle region. We found that the decision on which water level to take has a spatial dependency. Towards the upstream border of the overlap region the results of the upstream classification are more likely to be taken, and vice versa for the middle region. Something similar can be observed for those crossings in the overlap region that have a water level estimate in only one of the two data sets: we find more valid upstream observations towards the border to the upstream region and more middle stream observations towards the middle stream region. Taken together, this justifies the separation of the classification into the different regions.
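The overlap comparison above amounts to matching crossings that have a water level in both regional results and counting how many differ by less than 15 cm. A minimal Python sketch of that bookkeeping follows. The function name, the dictionary layout keyed by a crossing identifier, and the returned field names are hypothetical and chosen only for illustration.

```python
def compare_overlap(upstream, middle, tolerance_m=0.15):
    """Compare water levels estimated independently in two regions.

    upstream, middle: dicts mapping a crossing identifier to a water level in m.
    """
    # Only crossings with an estimate in both regional results can be compared.
    shared = sorted(set(upstream) & set(middle))
    diffs = {cid: abs(upstream[cid] - middle[cid]) for cid in shared}
    agreeing = [cid for cid in shared if diffs[cid] < tolerance_m]
    return {
        "n_shared": len(shared),
        "n_agreeing": len(agreeing),
        "share_agreeing": len(agreeing) / len(shared) if shared else float("nan"),
        "max_difference": max(diffs.values(), default=float("nan")),
    }
```

With the numbers reported above, 45 agreeing crossings out of 67 shared ones give a share of 45/67, or roughly 67%. The remaining 22 crossings are the 17 + 5 cases in which one of the two regional water levels had to be chosen.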

Conclusions
In this study we demonstrate the possibilities of classifying CryoSat-2 SAR data in the Mekong River Basin and using this classification for water level extraction. We show the advantage of CryoSat-2 SAR altimetry data for measuring rivers that are identified by a classification which is independent of an accurate land-water-mask. The classification uses features derived not only from the waveform but also from the RIP. The RIP contains more information about the reflecting surface than the waveform on its own can provide. This improves the classification and allows us to identify even very small rivers with a width as small as 20 m.
In fact, the classification works better on medium and small rivers than on large rivers. The cross validation of the classification shows that it is stable and repeatable. However, we were not able to use this classification to isolate the river in the downstream region where the Mekong River is surrounded by seasonal wetlands.
The classification into water and land measurements is used to derive water levels at the crossings of the CryoSat-2 track with a river in the whole basin. Overall, more than 2000 water levels are measured after outlier detection. However, it is not possible to derive a water level at every crossing. The altimeter is not able to measure a water return at every possible river crossing because some rivers are too small or the returns are too disturbed. Additionally, some measured water levels are discarded in the outlier detection.
The water levels are validated using the nearly yearly repeat time of CryoSat-2 and the very stable annual signal in the basin.
This validation is compared to the same validation done on Envisat water levels, gauge measurements, and CryoSat-2 data using a precise land-water-mask. Especially for small rivers in the upstream region, the classification improves the water level determination compared to the use of a land-water-mask. Compared to Envisat water levels, the CryoSat-2 water levels are of higher quality in the whole river basin due to the smaller footprint of the SAR altimeter compared to the pulse-limited altimeter on Envisat.
The resulting water levels of this study will be used in combination with other altimetric water levels, following the ideas of Boergens et al. (2016a), to build basin-wide multi-mission water level time series. With CryoSat-2 data we will be able to significantly improve the spatial resolution of the water level observations and to better close the data gap between the end of the Envisat mission and the launch of the SARAL mission. With the launch of the Sentinel-3 satellite in February 2016, SAR altimetry data with a short repeat time is available. When the full stack data are publicly available, the same classification of the data for water level retrieval can hopefully be used.

Figure 4, right hand, displays a RIP with the feature w marked. The off-center feature of f is too small to be visible in this example, but the symmetry, or the lack thereof, is clearly shown.

Figure 1. Processing steps used in this study for extracting water levels from CryoSat-2 SAR data.

Figure 2. Map of the study area with the regional masks (black areas with different hachures) and the SAR mode mask with their validity period (red boxes).

Figure 9. Histogram of the differences of height measurements 369 days apart for CryoSat-2 water levels with the classification, CryoSat-2 water levels inside land-water-mask, gauge water level, and Envisat water level.

Table 1. Features used for the classification.

Table 3. Result of the cross validation.

Table 4. Analysis of the differences of height measurements 369 days apart for the whole study area, only the upstream region, and only the middle stream region.