MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_NextPart_01C98237.0F74E120" This document is a Single File Web Page, also known as a Web Archive file. If you are seeing this message, your browser or editor doesn't support Web Archive files. Please download a browser that supports Web Archive, such as Windows® Internet Explorer®. ------=_NextPart_01C98237.0F74E120 Content-Location: file:///C:/628D990C/Ex1HydroExcel.htm Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset="us-ascii"
CE
397 Statistics in Water Resources
Exercise
1
Exploring
Time Series Data with HydroExcel
By: Stephanie Johnson, Tim Whiteaker, =
David Maidment
Center
for Research in Water Resources
University
of Texas at Austin
January
2009
Contents
Introduction
Goals
of this Exercise
Computer
Requirements
User
Feedback on HydroExcel
Procedure
HydroExcel
Spreadsheets
Selecting
a Data Source
Getting
a List of Variables
Getting
a List of Sites
Getting
a Site Catalog
Getting
a Time Series
Summarizing
Daily Time Series by Months and Years.
Descriptive
Statistics in Excel
Histograms
in Excel
To
be turned in:
In this exercise we’re going to explore so=
me
examples of temporal and spatial variation in hydrologic variables. To do our exploration, we will use=
some
tools developed by the Consortium of Universities for the Advancement of
Hydrologic Science Inc. (CUAHSI) to explore mean daily streamflows at vario=
us
sites across the nation. Stre=
amflow
is an excellent example of how variables vary in time and space and it is o=
ne
of the most common hydrologic variables to explore.
The goals of this exercise are to explore tempor=
al
variations in mean daily stream discharge using HydroExcel. These daily mean values are archiv=
ed and
available through the National Water Information System (NWIS); they can be
downloaded online at http://wat=
erdata.usgs.gov/nwis/dv/?referred_module=3Dsw. They can also be obtained using CU=
ASHI
web services through HydroExcel.
To perform this exercise you will need a computer
with the Windows operating system, a live internet connection, Microsoft Ex=
cel
2007, and Google Earth (download and install it here: http://earth.google.com/). We will be using HydroExcel, devel=
oped
by CUAHSI researchers to adapt Microsoft Excel for CUAHSI web services inpu=
t,
serving data both from national data providers and universities. The
spreadsheet uses VBA macros and an object library called HydroObjects http://his.cuahsi.org/hydr=
oobjects.html
to communicate with and retri=
eve
data from WaterOneFlow web services.
The latest version of HydroExcel available online is at http://his.cuahsi.org/hydroe=
xcel.html.
It is Version 1.1. We will be=
using
HydroExcel Version 1.1.1 which is contained in the file HydroExcel-Jan09.xl=
sb,
which is accessible as a zip file at HydroExcel-Jan09.zip,
which uses the same version of HydroObjects as Version 1.1
If you’d like to download HydroExcel and
install it on your personal computer, you can do so. Otherwise, the tool has already be=
en
installed on the computers in the Civil Engineering Learning Resource Center
(LRC) on the 3rd floor of ECJ.&=
nbsp;
If you choose to install HydroExcel on your personal computer you fi=
rst
need to install the object library, HydroObjects. HydroObjects is available at http://his.cuahsi.org/hydr=
oobjects.html
; simply run the .exe file and follow the installation procedure. (HydroObjects is already installed=
and
enabled on the computers in the LRC).
We would like feedback on user improvements that
could be made to HydroExcel. =
If you
have problems using this tool or see places where it could be improved, ple=
ase
note them as you do this exercise and then respond at the end to the follow=
ing
questions:
(1)&=
nbsp;
Installation – were you
able to install the tool correctly?
(2)&=
nbsp;
Errors
– Did you encounter any errors? =
Is
the error reproducible? If so, please list the steps so that the
developers can recreate the error.=
(3)&=
nbsp;
User
Interface – Is the user interface intuitive? Are buttons labeled
appropriately? Is it clear where to input the parameters for each
worksheet?
(4)&=
nbsp;
Explanatory
Text – Is the text on the left side of each worksheet useful?
There's a lot of text there. Can it be trimmed down? Should mor=
e be
added to it?
(5)&=
nbsp;
Operability
– =
Is
the spreadsheet responsive? When things happen, do you get an appropr=
iate
message box or status message?
(6)&=
nbsp;
Distractions
– =
List
up to three things that annoy you about the spreadsheet. If it annoys
you, how could it do those things differently so as not to annoy you?
(7)&=
nbsp;
Capabilities
– =
List
up to three things that you wish the spreadsheet did that it does not curre=
ntly
do.
Open HydroExcel from the file HydroExcel-Jan09.xlsb. &n=
bsp;
HydroExcel uses macros to perform its duties, so you may have to ena=
ble
the macros functionality by going to the security warning at the top of the
screen and telling it to “Enable this Content”.
Save the HydroExcel file to a new name Ex1HydroExcel-Jan09.xlsb so that =
if you
want to go back and get your original version again at any time you can do =
so.
Once HydroExcel is running, you’ll see a r=
ow
of tabs at the bottom of the spreadsheet.&=
nbsp;
These tabs have the following functions:
Open the Data
Source worksheet. Here we=
see a
list of all the web services that HydroExcel can operate. Each one of these
services using a live internet connection to access a particular dataset
(national, state, or university-based) and return the information that was
requested. The default WSDL
location is the data series for the Little Bear River observation network at
Utah State University.
For our work, we want to retrieve mean daily
streamflow data from the USGS NWIS daily value database http=
://river.sdsc.edu/wateroneflow/NWIS/DailyValues.asmx?WSDL
so select this WSDL address and copy it into the box at the top labeled WSDL Location. You can also accomp=
lish
this task by right clicking on the address in the list, going to the HydroE=
xcel
tab in the context menu that appears, and choosing “Set active web service to …..” as shown below.
The address should now appear in the “WSDL
Location” box as shown.
Lets check to make sure that the
web services are working. Hit=
the button. You’ll hear a “Clunk=
8221;
and then the following message should appear, which shows you that all the =
web
services are up and runnnig. =
Ok, we
are in business!
Let’s take a look at what kind of data we = can get from the NWIS daily values web service. Select the button, and you’ll see = that the view switches to the Variables= and after a short wait a list of variables is returned. These are all the variables for wh= ich data are recorded at some sites in the NWIS Daily Values network. Scroll down and take a look at all= the variables that are available. From this list we can see that there are 382 of them in total. Here are a couple of useful ones:<= o:p>
NWISDV:00060 |
Discharge, cubic feet per seco=
nd |
NWISDV:00600 |
Total nitrogen, water, unfilte=
red,
milligrams per liter |
Here, NWISDV stands for NWIS Daily Values and th=
e DV
distinguishes these data from others measured at these sites (such as grab
sample water quality data), or those measured on other NWIS networks, such =
as
for groundwater.
OK. Now
let’s tell the program what geographical location we’re interes=
ted
in retrieving daily values for. We
do this under the tab.
If at this point we told the tool to “Get Sites”, it wou=
ld
go out and retrieve a list of all the sites that have daily values availabl=
e in
the NWIS daily values database.
Since that’s over 24,000 sites, let’s rein in our search=
by
telling it to only locate sites in a certain geographical area – in the Get Sites Options box, check the Use Lat/Lon Box (if you don’t have the check mark as
shown below, HydroExcel ignores the parameters that you have added in the
lat-long box to limit the search and just acquires all the sites in the
network). Lets
also map the sites in Google Earth after downloading by selecting TRUE in the Create and open KML file after download. We now input a lat/lon box that fr=
ames
in the area around Austin, Texas (see figure below), a range from 29.5º=
;N
to 31ºN in latitude, and 97ºW to 98ºW in longitude (remember
that -97 and -98 are needed when specifying longitudes in decimal degrees,
because West Longitudes are considered negative, while East Longitudes are
positive). Select =
Get
Sites and wait a few seconds.
HydroExcel goes over the internet connection, accesses the NWIS
database, and downloads the 79 sites that have daily values in our indicated
area of interest. By indicati=
ng
“TRUE” in the box, we also told HydroExcel to create and open a=
KML
file showing us where these sites are located. The tool automatically creates this
layer and opens Google Earth with the sites indicated with pins. Pretty Cool! If you go to the Google Earth Places menu on the
left of the display and click on the + sign by NWISDV network, it will expa=
nd
and you’ll see a listing of all the sites in the downloaded network.<=
o:p> If you scroll down, you’ll find site 08158=
000,
Colorado River at Austin. And if you click on its pin, you’ll see whe=
re
it is located – on the Colarado River, North of Bergstrom Airport, ju=
st
upstream of where Highway 183 crosses the Colorado River. Let’s narrow our search again. To do that, we will Sort the 79 sites shown in our Excel worksheet by County. Highlight the columns and rows 13-=
92 and
columns F to K that have the Site data in them, beginning with the header r=
ow
at the top and then use Data/Sort<=
/b>
with Sort by County as the sel=
ected
column. Excel
Tip: When you wish to highlight a set of cells in Ex=
cel,
Press Ctrl-Shift and then → if you want to highlight cells to the right, or=
↓if you want =
to
highlight cells below. Thus, if you highlight the SiteCode cell at the top =
left
of the table and hit Ctrl-Shift→ =
the
cells to the right that are filled are highlighted, and then if you use Ctrl-Shift↓=
the
cells below are highlighted to the extent of the cells that have active
contents. If we now scroll down a little bit, we see that =
41
of the 79 sites are located in Travis County. What kind of information is availa=
ble at
each of these sites? To answe=
r that,
let’s select all of the sites in Travis County (as shown below), right
click, and select the Get Site Cat=
alog
for Selected Sites option under the HydroExcel
context menu (Note that you cannot include any headers in the selected set =
of
sites or the context sensitive menu for HydroExcel won’t appear). Excel
Tip: When you wish to highlight a set of rows whose
values in one column are the same, you can use a Data/Filter. Highlight all the cells in the Sites list as bef=
ore,
select Data/Filter, and you’ll see a set of highlight headers on the
columns. Under the required column, deselect Select All and then click on t=
he
county that you want: This step takes a few minutes as HydroExcel acce=
sses
the database and figures out what kind of daily data are available at each =
of
these sites. When the query is
complete, the Site Catalog wor=
ksheet
is automatically opened. If w=
e scroll
through this worksheet, we can see that we now have more information about =
each
of these sites and what type of data is available at each. At station 08158920, for example, =
we can
get average daily gauge height in units of feet and average daily streamflo=
w in
units of cubic feet per second (cfs).
Both datasets are available through the current date; the streamflow
data starts in 1978, however, while the gauge height data starts in 1982. For our work, we are going to look at the mean d=
aily
streamflow at USGS station 0815800=
0,
the Colorado River at Austin, TX. To download the time series associ=
ated
with this variable, we select one of the cells that correspond to the
streamflow at the station, right click, and select Download Time Series for NWISDIV:00060 &=
#8211;
Discharge, cubic feet per second. We are now moved to the Time Series worksheet and wait for a bit while the program
downloads all the mean daily streamflow values available at this station. On=
ce the
data is downloaded, we now have values for mean daily streamflow at this
location on the Colorado River for every day from 1893 to present. Wow! We should definitely see some vari=
ety in
these numbers. If you want a shorter period of record, Choose the Po=
pulate
Time Series Parameters …. option (as s=
hown
below) and then you can fill in a new StartDate
and EndDate in the Time Series
spreadsheet and, and hit . Well, let’s look at them and see what
we’ve got. Go to the Statistics and Charts tab. Here we have some macros programme=
d to
look at the data that we just downloaded and make it easier to ingest. By default, the tool makes a graph=
of
all the data that we downloaded.
Since Excel can only graph 32,000 numbers and we have so much data,
however, our graph is only going to go up through 1983 – which is whe=
n it
runs out of space on the graph. That’s
OK for now, but we may want to adjust our dates in other attempts. No=
tice
that on this graph, we see the maximum, minimum, and average values that are
reported for the time step that we’re graphing. Since we’re graphing data fo=
r mean
daily streamflow, however, the maximum, minimum, and average number are all
going to be the same in this case.
Obviously we see lots of variety here. We have some really low flows duri=
ng the
1940, some really high flows in the mid to late 1930s and some more
intermediate flows in between. Let’s look at another time step. Click on the chart (graph) area and a side bar will appear on the right hand
side of your worksheet. This =
side
bar shows us what our plot is showing, in this case the Values of the mean daily streamflow displayed by DateTime. The “Legend Fields” are
denoted by Σ Values and i=
n this
case the legend displays Average=
b>, Max, and Min, according to the Axis Field DateTime. In this=
case,
the three values are the same for each day since we have only a single valu=
e in
our data series for each DateTime. Uncheck the DateTime
box and instead select the Month=
b> box. Our graph now shows us the maximum,
minimum, and average mean daily discharge values during each month for our
entire period of record.
That’s cool. We =
can
now see the fluctuations in these values over time. Look at the seasonality here. We see a lot of variation from mon=
th to
month. If you look on the left hand side of the display,
you’ll see the numbers plotted in the graphs. Excel time works in Julian days st=
arting
at Jan 1, 1900, so it doesn’t like data that fall in the 1890’s and summarizes them as a single value.Getting a Site Catalog
Getting a Time Series
It’s downloading over 40,000
values, so this may take a while! =
span>When
it is done you’ll hear a “Clunk” and the “Download
Complete” box appears. =
Summarizing Daily Time Series by Months and Y=
ears
|
|
To
be turned in: Compare the monthly flow character=
istics
(average, max, min) of the Colorado River at Aus=
tin
from 1900 to 1940 and from 1940 to present. How has the construction of =
the
Highland Lakes chain of reservoirs affected these characteristics?
We can view the same data, on a yearly basis, by
unselecting the month box and
selecting the year box. Neat.
|
|
To
be turned in: Compare the daily flow characteris=
tics
summarized over years (average, max, min) of the Colorado River at Austin f=
rom
1900 to 1940 and from 1940 to present. How has the construction of =
the
Highland Lakes chain of reservoirs affected these annual flow characteristi=
cs?
Next, let’s take a 10-year portion of the =
data
that we just downloaded and perform some statistics on it outside of HydroE=
xcel. Return to the Time Series tab and select the dates and values for the time fr=
om
1/1/1999 to 12/31/2008, and hit Ge=
tValues. You’ll see that this dow=
nload
occurs a lot faster than the earlier one since you have only 10 years of da=
ta
to acquire.
Copy
thes data. Open a new Excel workbook and paste
the mean daily flow data from 1/1/1999 to 12/31/2008 into a new worksheet.<=
span
style=3D'mso-spacerun:yes'> Save the workbook as “Ex1.xlsb”.
Excel has a number of statistical procedures bui=
lt
into it. We will use some of =
these
now. First of all, let’s
calculate some basic statistics of the mean daily flow values. Under the Data tab, you should find the Data
Analysis tool. If the too=
l is
not there, you have to activate it.
To activate the Data
Analysis tool you choose the O=
ffice
Button and then Excel Options<=
/b>
and the bottom of the page that appears.
In the =
Excel
Options window go to Add-ins=
b> tab
(select on left); at the bottom of the box select the Analysis Toolpak, and then hit the Go… button next to the Manage
Excel Add-ins dropdown box at the bottom of the page (its not enough ju=
st
to double click on the Analysis Toolpak entry in the table).
.
In the =
Add-ins
pop up box, activate the Analysis
Toolpak by clicking in its check box.
In the =
Data
tab in Excel, you should now see an Analysis
box appear to the right of the Out=
line
box.
Now we can use the Data Analysis tools. First let’s use the tools to
calculate the summary statistics for our 10-years of mean daily flow data.<=
span
style=3D'mso-spacerun:yes'> Under the Data tab, select the D=
ata
Analysis tool, and select the =
Descriptive
Statistics option.
The input range includes our 10-years of mean da=
ily
flow values (don’t include the column header); the output range is to=
the
right of the input data (cell D1).
We choose to calculate the =
summary
statistics and 95% confidence interval for the mean, for these data by
activating its box and selecting O=
K.
Which produces the following
result. =
span>Excel
calculates 13 common statistics for the dataset. We now have a summary of this data=
and
can get some insight on what we’re looking at. We can see, for example, that the
average daily flow during this time was 1,587 cfs, but at times it got up as
high as 27,1000 cfs. The stan=
dard
error of estimate is 48.46 cfs (SQRT(Variance/Co=
unt)),
and the 95% confidence limit on the mean is 95.0 cfs (1.96 * Standard
error). This means that the t=
rue
mean streamflow for this period lies between 1587 +/- 95 cfs with 95%
confidence.
These statistics look fairly horrible in so far =
as
fitting a distribution to the data because they show that the data are high=
ly
skewed (6.15) and the mean is well above the median and mode.
If we take y =3D Log10<=
/sub>(Q)
and repeat this exercise, we get the following result, which looks a lot
nicer. Skewness is now near 0=
and
mean, median and mode are all consistent with one another. This suggests that the daily strea=
mflow
data may be well fitted by a lognormal distribution.
To
be turned in:
What are the mean, median, standard error of estimate and 95% confidence le=
vel
on the mean of the logarithms of the data?=
What are these values when converted to cfs? How do these compare with the
descriptive statistics compiled directly from the data without taking logs?=
We can also get an understanding of our data by
creating a visual. One common
example of data visualization is the histogram. A histogram is a plot that shows t=
he
number of values that lie in various categories or “bins” of the
data. To develop a histogram,=
we
must split our data into these bins.
Our summary statistics give us some guidance on how to define these =
bins. Let’s start by dividing the =
data
into equal bins of 1800 cfs each.
To do this, we simply type our bin divisions in the workbook as show=
n.
We then select the Data Analysis tool in the
“Data” tab. This =
time
we chose the histogram option. Our
input data is, again, our 10-years of mean daily flows; the bins are the
numbers that we just entered. We
chose to output the histogram data in the same spreadsheet.
We now have an output of the bins and the number=
of
data values that lie in each bin.
The easiest way to understand this data is by plotting a bar graph of
it, as shown. This is how his=
tograms
are typically plotted, since it gives a good visual summary of how the data
varies.
To be turned in: Compute and
compare the histograms of the data and of the logs of the data
To
be turned in:
Now go back to HydroExcel and use it to tell your own story about time tren=
ds
in streamflow at another location anywhere in the United States. Use the functionality of HydroExce=
l to show
to show the characteristics of the flow and how they vary through time. Why did you choose this story? (Your story should include some pl=
ots
and about a half-page of typed explanation.)
(1)&=
nbsp;
=
Compare
the monthly flow characteristics (average, max, min)
of the Colorado River at Austin from 1900 to 1940 and from 1940 to
present. How has the
construction of the Highland Lakes chain of reservoirs affected these
characteristics?
(2)&=
nbsp;
=
Compare
the daily flow characteristics summarized over years (average, max, min) of the Colorado River at Austin from 1900 to 1940=
and
from 1940 to present. H=
ow has
the construction of the Highland Lakes chain of reservoirs affected these a=
nnual
flow characteristics?
(3)&=
nbsp;
=
What
are the mean, median, standard error of estimate and 95% confidence level on
the mean of the logarithms of the data?&nb=
sp;
What are these values when converted to cfs? How do these compare with the
descriptive statistics compiled directly from the data without taking logs?=
(4)&=
nbsp;
=
Compute and compare the histograms of the data and of the
logs of the data.
(5)&=
nbsp;
=
Now
go back to HydroExcel and use it to tell your own story about time trends i=
n streamflow
at another location anywhere in the United States. Use the functionality of HydroExce=
l to
show to show the characteristics of the flow and how they vary through time=
. Why did you choose this story? (Your story should include some pl=
ots
and about a half-page of typed explanation.)
(6)&=
nbsp;
=
User
feedback on HydroExcel, as specified below
User
Feedback on HydroExcel
We would like feedback on user improvements that
could be made to HydroExcel. =
If you
have problems using this tool or see places where it could be improved, ple=
ase
note them as you do this exercise and then respond at the end to the follow=
ing
questions:
(1)&=
nbsp;
Installation –
(2)&=
nbsp;
Errors
– Did you encounter any errors? =
Is
the error reproducible? If so, please list the steps so that the
developers can recreate the error.=
(3)&=
nbsp;
User
Interface – Is the user interface intuitive? Are buttons labeled
appropriately? Is it clear where to input the parameters for each
worksheet?
(4)&=
nbsp;
Explanatory
Text – Is the text on the left side of each worksheet useful?
There's a lot of text there. Can it be trimmed down? Should mor=
e be
added to it?
(5)&=
nbsp;
Operability
– =
Is
the spreadsheet responsive? When things happen, do you get an appropr=
iate
message box or status message?
(6)&=
nbsp;
Distractions
– =
List
up to three things that annoy you about the spreadsheet. If it annoys
you, how could it do those things differently so as not to annoy you?
(7)&=
nbsp;
Capabilities
– =
List
up to three things that you wish the spreadsheet did that it does not curre=
ntly
do.
OK.=
Now you’re done!