A couple of months ago, I downloaded the aggregated datasets from DEEWR's higher education statistics page. The files are large, with each line telling you about a group of students. A line might tell you (to make up an example) that there were 5 domestic, commencing Bachelor students aged 23 studying psychology at UQ in 2009. Reading these files requires either superhuman patience with Excel or writing your own code, so DEEWR helpfully collate the data into easy-to-read spreadsheets such as those for 2009 on this page. It is these sorts of spreadsheets that have been used recently by Andrew Norton and FlagPost to discuss maths (and science) enrolments from 2008 to 2010, a period when HECS fees for these subjects were decreased.
When I played around with the aggregated datasets, I found various errors and inconsistencies. Since no-one seems to be mentioning them, I've decided to write up some of what I found before I gave up on getting the data I wanted out of the files. Part 1 deals with the student load data used in the linked blog posts above. While this data may well be accurate (I haven't checked in detail), the figures in the summary spreadsheets are not reliable for gauging how many students want to study maths or physics. Part 2 looks at errors and inconsistencies in the physics data.
Part 1: Probably consistent and accurate data which is hard to interpret
The commencing EFTSL (equivalent full-time student load) numbers include Honours and grad entry students. The former are in their 4th year and should be excluded, though their impact is minor. In what follows I've excluded the grad entry students as well: I don't know exactly what they are, but I'm pretty sure they're not maths or physics students.
Using the EFTSL data gives a very noisy description of maths enrolments, especially for commencing students, because most students enrolled in first year maths courses are not maths students – there are a heap of engineering students who have to do maths to at least second-year. Including all the domestic students in commencing pass-level Bachelor degrees, the mathematical sciences (code 0101) EFTSL went from 7291.485 in 2008 to 7959.698 in 2009, an increase of 9.2%. If we restrict it to students whose broad field of education is the natural sciences (01), these numbers become 1849.829 and 2120.075 respectively, an increase of 14.6%. Already we see a very large change in the apparent increase in maths enrolments.
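The two restrictions above act on different things: the first on the field of the courses being taken (0101), the second on the broad field of the student's degree (01). A minimal sketch of that filtering follows. The record layout, the names `LoadRow`, `course_field` and `student_broad_field`, and the toy rows are all invented for illustration; they are not DEEWR's actual column names or figures.

```python
from dataclasses import dataclass


@dataclass
class LoadRow:
    """One line of the aggregated file (hypothetical layout)."""
    year: int
    course_field: str          # field of the courses taken, e.g. "0101"
    student_broad_field: str   # broad field of the student's degree, e.g. "01"
    eftsl: float


def total_eftsl(rows, year, course_prefix, student_broad_field=None):
    """Sum EFTSL for courses whose field code starts with course_prefix.

    If student_broad_field is given, only count students whose degree
    falls in that broad field.
    """
    total = 0.0
    for row in rows:
        if row.year != year:
            continue
        if not row.course_field.startswith(course_prefix):
            continue
        if (student_broad_field is not None
                and row.student_broad_field != student_broad_field):
            continue
        total += row.eftsl
    return total


# Toy rows, made up purely to exercise the function.
toy_rows = [
    LoadRow(2009, "0101", "01", 1.5),  # science student taking maths
    LoadRow(2009, "0101", "03", 4.0),  # student from another broad field taking maths
    LoadRow(2009, "0103", "01", 2.0),  # science student taking physics
    LoadRow(2008, "0101", "01", 1.0),  # earlier year, ignored for 2009 totals
]

all_maths = total_eftsl(toy_rows, 2009, "0101")
science_maths = total_eftsl(toy_rows, 2009, "0101", student_broad_field="01")
print(all_maths, science_maths)
```

Matching on a prefix means the same function would also handle a narrower six-digit code, if the real files store the codes as strings.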
But restricting it to science students does not necessarily make the percentage increase more accurate, since many (most?) of those students do not intend to study maths, but are (e.g.) biology students doing their compulsory 1st year stats course. Only a very small minority of that EFTSL (about 160) belongs to students who are classified as having a maths field of study; most are classified as general science students, a group which includes many maths students who haven't been classified more specifically.
We can restrict the search to 010101 subjects (maths, as opposed to 010103, which is stats). Including all EFTSL in 010101, the change from 2008 to 2009 is from 4307.221 to 4710.288 (an increase of 9.4%); restricting it to science students gives a change from 1066.96 to 1249.311 (an increase of 17.1%). How many of these students are actually maths students, as opposed to chemists or physicists doing compulsory courses? I don't know.
Repeating the procedure for physics and astronomy courses (0103) gives an even more spectacular change in the apparent increase from 2008 to 2009. For all students the EFTSL goes from 1951.509 to 2059.804 (an increase of 5.5%); restricting to science students, it goes from 801.3837 to 915.3911 (an increase of 14.2%).
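The percentage increases quoted in Part 1 can be reproduced from the EFTSL totals with a few lines of Python. This is only a sketch of the arithmetic: the figures are copied from the text, while the function name `pct_increase` and the dictionary layout are my own choices.

```python
def pct_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


# (2008 EFTSL, 2009 EFTSL) for each grouping, as quoted in the text.
figures = {
    "maths 0101": {
        "all students": (7291.485, 7959.698),
        "science only": (1849.829, 2120.075),
    },
    "maths 010101": {
        "all students": (4307.221, 4710.288),
        "science only": (1066.96, 1249.311),
    },
    "physics 0103": {
        "all students": (1951.509, 2059.804),
        "science only": (801.3837, 915.3911),
    },
}

for subject, groups in figures.items():
    for group, (eftsl_2008, eftsl_2009) in groups.items():
        change = pct_increase(eftsl_2008, eftsl_2009)
        print(f"{subject:13s} {group:13s} {change:5.1f}%")
```

In every case the apparent increase is larger once the count is restricted to science students, which is why the choice of denominator matters so much here.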
The student load data for the narrow fields of study should therefore be treated cautiously, unless you care about departments' teaching loads.
Part 2.1: Errors
On the 'All students' spreadsheet (XLS file), you can see that UTas had 1056 PhD students enrolled in 2009. But a breakdown of these PhD students by field of education shows that 146 of them are coded as 0103, physics and astronomy. There are not 146 physics PhD students at UTas – that is more than any other uni had in 2009. I'm told that the true figure is somewhere around 10. I don't know if the rest of these 146 are PhD students miscoded as doing physics, physics students miscoded as doing a PhD, or some combination of the two. If we look at the breakdown of the enrolled students by subfield and course type, we see that there are no undergrads coded as doing physics, but perhaps they are all lumped into the 'other' science category (I've omitted various Grad Dip and other courses):
Subfield  Code   PhD  Mas rsch  Mas cswk  B Hons  B pass
other     0199    10         2        50     102     819
bio       0109   157         9        10      18     119
earth     0107     0         1        18       4      26
physics   0103   146         8         0       0       0
maths     0101     9         4         0       0       0
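To put the 146 in proportion, here is a quick sanity check using only the UTas figures quoted above. It is a rough sketch: the variable names are mine, and the "science" total covers only the five subfields listed in the table, not necessarily every natural science subfield.

```python
# PhD enrolments at UTas in 2009 by subfield, copied from the table above.
utas_phd_by_subfield = {
    "other (0199)": 10,
    "bio (0109)": 157,
    "earth (0107)": 0,
    "physics (0103)": 146,
    "maths (0101)": 9,
}
utas_phd_total = 1056  # all PhD students at UTas, from the 'All students' spreadsheet

listed_science_phd = sum(utas_phd_by_subfield.values())
physics_phd = utas_phd_by_subfield["physics (0103)"]

share_of_listed_science = 100.0 * physics_phd / listed_science_phd
share_of_all_phd = 100.0 * physics_phd / utas_phd_total

print(f"physics share of listed science PhDs: {share_of_listed_science:.1f}%")
print(f"physics share of all UTas PhDs:       {share_of_all_phd:.1f}%")
```

On these numbers, physics would account for nearly half of the listed science PhD students and about one in seven of all PhD students at UTas.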
I don't think UTas is an isolated case, but I haven't checked in detail. It would surprise me if Victoria University had more physics PhD students (87 in 2009) than Melbourne, for instance, but perhaps I'm just unfamiliar with their areas of research.
I haven't tried looking at other subjects, because while I have a vague idea of the major Australian unis for physics research, I don't have a clue about anything else and so wouldn't know where to look for erroneous data.
Part 2.2: Inconsistencies
The inconsistencies I've found relate to the classification of students at levels narrower than the broad field of study. That is, students in the natural sciences are correctly classified as such, but unis differ as to how they distinguish between the subfields of maths, physics, bio, etc., or indeed whether they distinguish at all.
To keep the tables small, I'll restrict myself to the Go8 and the same course types as in the table for UTas. Here is the breakdown for enrolments of physics students:
Uni   PhD  Mas rsch  Mas cswk  B Hons  B pass
ANU   139        19         6       0       2
Adel   66         2        10      20      66
Melb   84         8        22       0       0
UNSW   46         5         0       0       0
UQ     49         2         9       5       0
USyd  102        12         0       0       0
It is immediately obvious that this is a Group of 6, with UWA and Monash absent. Apart from two lonely biologists and one even lonelier earth scientist, all of UWA's PhD students in the natural sciences are lumped together in the 'other' category. Monash separate out their biology PhD students, but physics counts as 'other'.
The University of Adelaide is wonderful and classifies all their physics students nicely (I assume). It is fair enough that the other unis don't bother with classifying their undergrads into subdisciplines of science, since it's not always obvious till 3rd year where they'll finish. But in an Honours year it's clear what subject a student is doing, so it should be possible to have this data recorded (Melbourne-model unis exempted).
Furthermore, at the end of a science degree, you graduate with a BSc in some particular field(s) – at least you do at UQ. Here is the same table, but for completions rather than enrolments:
Uni   PhD  Mas rsch  Mas cswk  B Hons  B pass
ANU    25         3         2       0       1
Adel    8         0         3       2       8
Melb   12         1         0       0       0
UNSW   10         1         0       0       0
UQ      8         3         3       5       0
USyd   23         0         0       0       0
UQ just lumps all its graduating physics students, which it knows it has, in with other science students. I don't know what's going on in Adelaide, whose lovely numbers in the earlier table don't seem to correspond to their completions.
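One way to compare the enrolments and completions tables side by side is to divide completions by enrolments for each uni and course type. The sketch below does just that with the numbers copied from the two tables; the function name `completion_ratio` is my own, and the ratio is only a crude consistency check, not a completion rate in any official sense.

```python
COURSE_TYPES = ["PhD", "Mas rsch", "Mas cswk", "B Hons", "B pass"]

# Physics enrolments, copied from the first Go8 table.
enrolments = {
    "ANU":  [139, 19, 6, 0, 2],
    "Adel": [66, 2, 10, 20, 66],
    "Melb": [84, 8, 22, 0, 0],
    "UNSW": [46, 5, 0, 0, 0],
    "UQ":   [49, 2, 9, 5, 0],
    "USyd": [102, 12, 0, 0, 0],
}

# Physics completions, copied from the second Go8 table.
completions = {
    "ANU":  [25, 3, 2, 0, 1],
    "Adel": [8, 0, 3, 2, 8],
    "Melb": [12, 1, 0, 0, 0],
    "UNSW": [10, 1, 0, 0, 0],
    "UQ":   [8, 3, 3, 5, 0],
    "USyd": [23, 0, 0, 0, 0],
}


def completion_ratio(uni, course_type):
    """Completions divided by enrolments, or None if nobody is enrolled."""
    i = COURSE_TYPES.index(course_type)
    enrolled = enrolments[uni][i]
    if enrolled == 0:
        return None
    return completions[uni][i] / enrolled


for uni in enrolments:
    cells = []
    for course_type in COURSE_TYPES:
        ratio = completion_ratio(uni, course_type)
        cells.append("   -" if ratio is None else f"{ratio:4.2f}")
    print(f"{uni:5s}", "  ".join(cells))
```

A dash marks a cell with no enrolled students, where the ratio is undefined.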
It's difficult to get good data for anything narrower than the broad field of education, at least in science.