Historical Temperature and Precipitation Data for Philadelphia, PA

      These data are taken from the Franklin Institute's weather website. This site states that the Franklin Institute "has been an observing site for the National Weather Service since 1993." However, their weather station is currently located on the roof of their building in downtown Philadelphia – a very poor choice. There are overlapping data files starting in 1993. This application uses the historical file HERE, through 1999, to which are appended the yearly files HERE, starting in 2000. The data source(s) for the historical data file are not given. The "official" Philadelphia weather site has moved several times since the 1870s. It was moved to Philadelphia International Airport, another poor choice for climate research purposes, in 1948. The lack of documentation about sources is a problem for using these data in climate research – a problem certainly not unique to Philadelphia.
      Some formatting differences in the Franklin Institute files have been resolved in the file used here. Inches are used for snow and rain and temperatures are in °F; these are the standard reporting units for US weather stations, even though conversions to metric units and °C are sometimes made in datasets used for scientific purposes. Trace amounts of precipitation (rain and snow) are indicated by a value of -1 rather than T or trace. Prior to 2002, the Franklin Institute files expressed rain in hundredths of an inch (that is, a value of 72 means 0.72") and snow values in tenths of an inch (10 means 1" of snow). In the file used here, precipitation and snow (including accumulated snow starting in 2002) are always given in inches.
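The unit conversion described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the file: the function name `to_inches` is hypothetical, and it assumes the raw values already use the -999 (missing) and -1 (trace) sentinels, which may not be true of the original Franklin Institute files.

```python
MISSING = -999   # sentinel for a missing value
TRACE = -1       # sentinel for trace precipitation


def to_inches(raw, year, kind):
    """Convert a raw rain or snow value to inches.

    kind is "rain" (hundredths of an inch before 2002) or
    "snow" (tenths of an inch before 2002).
    """
    if raw in (MISSING, TRACE):
        return raw                    # sentinels pass through unchanged
    if year < 2002:
        divisor = 100.0 if kind == "rain" else 10.0
        return raw / divisor
    return float(raw)                 # 2002 and later: already in inches


print(to_inches(72, 1995, "rain"))    # the 0.72" example from the text
print(to_inches(10, 1995, "snow"))    # the 1" snow example from the text
```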
      Missing values are always given as -999. All years contain 366 days, with -999 entered for February 29 in non-leap years. Snowfall reporting begins in October 1884. Snow on the ground reporting begins in 2002; these data often seem inconsistent and they are not used in this application.
      The format is:

month day year max_T min_T rain(") snow(") snow_depth(")  
1	1	1872	-999	-999	-999	-999	-999
1	2	1872	-999	-999	-999	-999	-999
12	30	2012	35	32	0.22	0.4	0.4
12	31	2012	38	30	0	0	0
where the values are separated by spaces or tabs. The only data in 1872 are precipitation starting April 1. Note that data from other sites could be accessed with this application (although the earliest year with data will probably be different). The 8th column (snow depth) isn't used, so it doesn't have to be included in the file. Without making changes to the code, the data file must be named philtemp.txt (case-sensitive).
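For anyone who wants to process the file outside the PHP application, the format above can be read with a short Python script. The sketch below is one possible reader, not part of the application: the names `DayRecord`, `parse_line` and `read_records` are invented for illustration. It assumes that a column-header line may or may not be present, and it ignores the optional 8th column.

```python
from typing import NamedTuple, Optional

MISSING = -999.0   # sentinel for missing values, per the text


class DayRecord(NamedTuple):
    month: int
    day: int
    year: int
    max_t: Optional[float]   # °F, None if missing
    min_t: Optional[float]   # °F, None if missing
    rain: Optional[float]    # inches, None if missing, -1 means trace
    snow: Optional[float]    # inches, None if missing, -1 means trace


def _value(field: str) -> Optional[float]:
    """Convert one field to float, mapping the -999 sentinel to None."""
    x = float(field)
    return None if x == MISSING else x


def parse_line(line: str) -> Optional[DayRecord]:
    """Parse one data line; return None for blank or non-numeric lines."""
    parts = line.split()             # splits on any run of spaces or tabs
    if len(parts) < 7:
        return None
    try:
        month, day, year = (int(p) for p in parts[:3])
        max_t, min_t, rain, snow = (_value(p) for p in parts[3:7])
    except ValueError:
        return None                  # e.g. a column-header line
    return DayRecord(month, day, year, max_t, min_t, rain, snow)


def read_records(path: str = "philtemp.txt"):
    """Yield a DayRecord for every data line in the file."""
    with open(path) as f:
        for line in f:
            rec = parse_line(line)
            if rec is not None:
                yield rec
```

Mapping -999 to None makes it harder to include a missing value in a sum or average by accident. Trace values (-1) are left as they are, so the caller must decide whether to count them as zero.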
      The dataset as downloaded from the Franklin Institute website contained only a handful of days with missing data. Because the source of the data is not known, it is also not known whether missing data have sometimes been replaced with estimated values. But, for the very few remaining missing days, temperature values have been calculated as the average of the temperatures from the day before and after. There were only two days on which it seemed likely that missing precipitation values might have been non-zero, and these values have simply been estimated based on the day before and after. No indication about when missing data have been replaced with estimated values is given in these data files.
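The temperature estimate described above (the average of the day before and the day after) is simple to reproduce. The following Python sketch shows the idea under stated assumptions: the function name `fill_single_gaps` is hypothetical, only isolated one-day gaps are filled, and no rounding is applied, since the text does not say whether the estimated values were rounded.

```python
def fill_single_gaps(values, missing=-999):
    """Return a copy of values in which each isolated missing value is
    replaced by the average of its two neighbors."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        before, current, after = values[i - 1], values[i], values[i + 1]
        if current == missing and before != missing and after != missing:
            filled[i] = (before + after) / 2
    return filled


print(fill_single_gaps([30, -999, 34]))        # the gap is filled with 32.0
print(fill_single_gaps([30, -999, -999, 40]))  # two-day gap is left alone
```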
      In summary, the Franklin Institute data do not include the quality control flags that are required for data intended for scientific research purposes. However, whatever problems might be raised by estimating a very few missing temperature and precipitation values seem more than adequately offset by the result of having a continuous record as needed for calculating cumulative values such as total precipitation and heating/cooling degree days.
      Heating and cooling degree days (HDD and CDD) are calculated in the following way:

HDD = MAX[Tbase – (Tmax + Tmin)/2, 0]
CDD = MAX[(Tmax + Tmin)/2 – Tbase, 0]

That is, when the daily average is above the base temperature, it is a cooling day (CDD). When the daily average is below the base temperature, it is a heating day (HDD). There are no negative values for HDD or CDD. Using the average of the daily maximum and minimum temperature is a commonly used approximation for the daily average temperature. Typically, a base temperature of 65°F is used. A calculation for growing degree days, related to cooling degree days, is often done for agricultural purposes – predicting time to crop maturity and timing the application of pesticides, for example. A base temperature of 50°F is often used for agricultural regions in temperate climates. These cumulative values are excellent single-value indicators of year-to-year changes in temperature – possibly more useful than annual average temperature.
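The two formulas translate directly into code. The Python sketch below is an illustration rather than the application's own PHP code. The function names are hypothetical, and callers are assumed to have already removed days with missing (-999) temperatures.

```python
def degree_days(t_max, t_min, t_base=65.0):
    """Return (HDD, CDD) for one day, using (Tmax + Tmin)/2 as the
    daily average temperature. Temperatures are in °F."""
    t_avg = (t_max + t_min) / 2.0
    hdd = max(t_base - t_avg, 0.0)
    cdd = max(t_avg - t_base, 0.0)
    return hdd, cdd


def cumulative_degree_days(days, t_base=65.0):
    """Sum HDD and CDD over an iterable of (t_max, t_min) pairs."""
    total_hdd = total_cdd = 0.0
    for t_max, t_min in days:
        hdd, cdd = degree_days(t_max, t_min, t_base)
        total_hdd += hdd
        total_cdd += cdd
    return total_hdd, total_cdd


# December 30, 2012 from the sample data: Tmax = 35, Tmin = 32.
# The daily average is 33.5, so HDD = 65 - 33.5 = 31.5 and CDD = 0.
print(degree_days(35, 32))
```

Passing `t_base=50.0` gives the growing degree day variant mentioned above.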
      All things considered, these data from Philadelphia are interesting and useful for examining trends because of the length of the record, but they should be used with great caution in climate research. There are, however, several opportunities for inquiry projects. All of these projects will help develop skills in these areas:
Trend analysis, statistics, mathematical modeling
Data analysis, applying spreadsheet formulas, using appropriate graphics to display data
Writing clear experiment definitions and inquiry protocols, writing concisely and staying "on topic," supporting arguments based on evidence, developing oral presentation skills
Computer programming
Although many interesting analyses can be done within spreadsheets, others may need computer programs to read and process the data
In no particular order, here are some suggested inquiry-based questions, with brief explanatory notes:
Are there trends in any of the parameters available from these data, including the calculated values for heating and cooling degree days?
The basic goal in examining historical weather data is to determine whether there are long-term trends that demonstrate changes in climate rather than fluctuations in weather. The challenge in finding climate trends is that those trends will be small compared to year-to-year variability. It is always possible to "do the math" for trendlines, but the results are subject to interpretation. Even assuming that a linear regression is appropriate, the coefficient of determination (r²) may be significantly less than 1.
If there appear to be trends in temperature or precipitation, is it possible to manipulate the interpretation of the data based on how the trends are modeled mathematically or on what time intervals are used?
Linear regressions applied to the entire historical record may over- or underestimate trends. Using an exponential model rather than a linear one will result in larger changes in future predicted values. Changing the environment around a site may initiate a temperature trend. Trends can be made to appear larger or smaller by changing the time period examined.
Is the length of the growing season changing?
Using a base temperature of 50°F, the start of the growing season is marked by the appearance of cooling (growing) degree days above 0 and the end is marked by the appearance of heating degree days above 0. Sometimes, the beginning and end of the growing season are limited by calendar dates, too, so as to discount the possibly spurious occurrence of positive CDD and HDD values early and late in the year. For example, CDD calculations in northern temperate climates might be started on March 1 and ended on October 31. It might be better to count the start and end of the growing season from when the CDD and HDD values exceed some small number, perhaps 10, to avoid "weather noise" at the beginning and end of the season.
Is the distribution of precipitation during the year changing?
Perhaps the timing of precipitation during the year changes even if there are no year-to-year trends. How many days per year have precipitation? (Do you want to count days with trace precipitation?) How many days per year have precipitation greater than some specified amount? (Maybe the number of heavy rainfall events is changing.)
Is there "digit bias" in reported temperature values?
For records that pre-date automated reporting of temperature values, it is entirely possible that there is some "digit bias" in the reporting of values. For example, if the scale on a thermometer displays even digits (70, 72, 74, etc.) it may be that reported values are rounded to the nearest even degree. In the absence of this reporting bias, for large numbers of temperature reports the distribution of the digits 0-9 in the "ones" position of the temperature value should be uniform – that is, all digits are equally likely. It is not difficult to write a computer program to read the data file and count the occurrence of the digits 0-9 in the "ones" position for every reported temperature value. Use a chi-square test to analyze results.
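As a starting point for the digit bias question, here is a minimal Python sketch of the counting step and the chi-square statistic. The function names are hypothetical. The temperatures are assumed to be whole degrees, with missing values (-999) already removed by the caller.

```python
from collections import Counter


def ones_digit_counts(temps):
    """Count how often each digit 0-9 appears in the ones position."""
    counts = Counter(abs(int(t)) % 10 for t in temps)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Chi-square statistic comparing observed counts with a uniform
    expectation (every digit equally likely)."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Illustrative (made-up) readings, all rounded to even degrees.
temps = [70, 72, 74, 76, 78] * 20
counts = ones_digit_counts(temps)
stat = chi_square_uniform(counts)
print(counts, stat)
```

With 10 digit categories there are 9 degrees of freedom, for which the critical chi-square value at the 5% significance level is about 16.92. A statistic well above that suggests the ones digits are not uniformly distributed.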
Answering these inquiry questions for the Philadelphia data is an excellent way to start thinking about analyzing these kinds of data. If you ask the same kinds of questions for many data sites, from the US Historical Climatology Network or the US Climate Reference Network, then what starts here as inquiry can progress into authentic research about temperature and climate patterns across the US.
      The application for analyzing Philadelphia temperature and precipitation data is written in HTML and PHP. It must be run from an online server which supports PHP scripts or a local server on your computer (WAMP for Windows, MAMP for Mac).