J-Walk posted a link to the Top 101 Cities with the Largest Temperature etc…. The title is too long, so you’ll have to follow the link if you want the details. I wanted to see how this broke down by state. Here’s what I did.
I brought the data into Excel using a web query (Data – Import External Data – New Web Query). City-data.com didn’t put the tabular data into a table, so I had to bring in the whole site.
I used QuickTTC to split the data into columns.
I started with colon and open parenthesis. Then on the middle column, I applied period and close parenthesis. The other columns were obvious. I deleted the unwanted stuff and was left with this.
Next I created a pivot table from that data. I made State the row and added City and Pop to the column area. I don’t like having that column data on top of each other, so I drug the Data header on to the top of the Total label to move the data side-by-side.
To sort the data, I right clicked on the State column and chose Field Settings. From there, I chose Advanced and chose Count of City as the field on which to sort.
Of course I wanted that Descending, so I went back in and changed it. I next added the Diff field to the column area. Since my first two columns are side-by-side, adding Diff maintains that relationship. I right clicked on Diff – Field Settings and changed the aggregate function to Average and added a number format. Finally I formatted the Population column (right click – Field Settings – Number…).
The finished pivot table.
Great post, Dick (and not because you reference by blog). I do this sort of data scraping / cleanup thing all the time, but it never occurred to me to actually document how I do.
I second that DK. Great post. Love that you’re going back to the core roots.
I like it!
Thank you!! I use Excel’s Subtotal with the columns I want aggregated using the most common aggregating function (sum, count, etc.). Then I would use Find and Replace to exchange a “(1,” for the “(9,” if I needed and average rather than a sum.
Sure do like that pivot table tip. Subtotals can take quite a bit of space in large data-sets. Thanks again for all the great sharing.
Brett