Tips for working with national attendance data
In this post I’m going to talk through some issues I’ve had
and tricks come up with when dealing with national attendance stats.
The Data
First step is to get the data, which you can find at https://www.gov.uk/government/collections/statistics-pupil-absence
(or I just google ‘dfe absence statistics’).
There are three releases a year. The Autumn term release,
which comes out the following May. The Autumn and Spring release which comes
out the following October and the full year release which comes out the
following March-June. So for the 2018-2019 academic year the autumn term stats
were released in May 2019, the Autumn-Spring will be release this month (October
2019) and the full year stats will come out in Spring 2020.
You therefore usually have a choice between more recent data
that doesn’t cover the whole year and older data that does. Another factor to
bear in mind is that the Autumn-Spring data doesn’t have breakdowns for Ever6 FSM
where as the full year does. That’s the clincher for me, so at the moment I’m
using the full year dataset from the last academic year for which it’s available
(2017/18).
Click on the collection then download the zip file with the
underlying data. In the folder is a csv called Absence_3term_201718_national_characteristics.csv
that contains the overall absence statistics (for 2017/18 and previous years)
as well as breakdowns by school type and pupil context.
Data Import
You might think getting this data into BI is so simple as
not to require a step – Get Data CSV right? I’d actually advise against this method
though because when you publish your dashboard and go to schedule refreshes to
keep it up to date it will try to connect to all the data sources in the file –
if one of those is a csv on your hard drive it won’t let you proceed with the
scheduling.
Now you can set up a ‘Power BI Gateway’ which allows the BI
service to query files stored locally, but as that’s essentially a backdoor
into your school network your IT people should be pretty leery about letting
you set it up. Plus it’s completely unnecessary. This data is in the public domain
so we can paste it onto a publicly accessible site and call the data from there.
My preference is for Google Sheets. Paste the data into a google sheet and click ‘Share’ in the top right corner.
Edit the sharable link:
From:
https://docs.google.com/spreadsheets/d/<google-sheet-guid>/edit?usp=sharing
To:
https://docs.google.com/spreadsheets/d/<google-sheet-guid>/export?format=xlsx&id=<google-sheet-guid>
You can also use Excel365.
1. Navigate to your OneDrive for Business
location using a browser. Right-click the file you want to use, and
select Open in Excel.
2. In Excel, select File > Info and
select the link above the Protect
Workbook button. Select Copy link to clipboard (your
version might say Copy
path to clipboard).
3. Delete “?web=1” from the end of the link
address.
In Power BI choose Get Data| Web and paste in the google/O365
link.
Data Structure
We’re interested in three columns for identifying our
national data: Year, School_type and Characteristic.
First step is to go to Edit Queries and if year is stored
as an integer (as it probably will be) right click and change it to text. This
is so we can easily compare it with an academic year filter that stores dates
as say “2019-20”
For this example I’m assuming a MAT with a school table
that has school phases (Primary, Secondary and Special) that links to a student
table that links to an attendance table.
I’m going to explain how to create a measure for national
data that responds to the filters you use to filter your MAT’s data – so you don’t
have to filter twice to compare like with like.
SELECTEDVALUE
The key to making this filter work is the SELECTEDVALUE
function, which we’ve met before but to recap which takes the arguments:
SELECTEDVALUE(<column>,[<alternativereresult>])
If the column of the first argument is filtered such that
it contains only one value, the function returns that value, otherwise you get the
alternative result.
Phase Measure
The simplest example of how this works is the phase
measure. In our national dataset the phases are “Total” (ie all phases), “State
Funded Primary”, State Funded Secondary” and “Special”.
Phase = SWITCH(SELECTEDVALUE(School[Phase],"All"),"Primary","State-funded
primary","Secondary","State-funded
secondary","Special","Special","All","Total","Total")
The inner function returns ‘All’ if no value is selected in
the School[phase] slicer. Otherwise it returns Primary/Secondary/Special which
are then translated to the forms used in the DfE dataset by the switch function.
Year
The year is slightly more complicated because we want to
use the most recent national data available, unless we are looking at internal
data from 2016-17 or earlier as we have the national data from those years.
I created a calculated column on the DfE data (this is why
I converted year to text as MID only works on text strings).
YearID = VALUE(MID(DFE_Absence[year],4,2))
This pulls out 2 characters starting at the fourth character
– so 201718 yields 17.
I have a field called AcYear in a table DateRef that I use
to choose academic years in the format 2017-18. My Year measure is therefore
NatYear = IF(VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2))
>=
MAX(DFE_Absence[YearID]),MAX(DFE_Absence[YearID]),VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)))
This gives me the YearID of the national data, ie 17 for
201718.
Context
Context is the hardest because there are loads of factors.
Check it out:
Context =
SWITCH(TRUE(),
SELECTEDVALUE(SwitchGen[Gender])="M","Gender_Male",
SELECTEDVALUE(SwitchGen[Gender])="F","Gender_Female",
SELECTEDVALUE(SwitchEAL[EAL]) =
"Unknown","First_language_Unclassified",
SELECTEDVALUE(SwitchEAL[EAL])
= "N","First_language_Known_or_believed_to_be_English",
SELECTEDVALUE(SwitchEAL[EAL]) =
"Y","First_language_Known_or_believed_to_be_other_than_English",
SELECTEDVALUE(EthnicityMain[Description])
="White","Ethnicity_Major_White_Total",
SELECTEDVALUE(EthnicityMain[Description])
="Mixed","Ethnicity_Major_Mixed_Total",
SELECTEDVALUE(EthnicityMain[Description])
="Asian","Ethnicity_Major_Asian_Total",
SELECTEDVALUE(EthnicityMain[Description])
="Black","Ethnicity_Major_Black_Total",
SELECTEDVALUE(EthnicityMain[Description])
="Chinese","Ethnicity_Major_Chinese",
SELECTEDVALUE(EthnicityMain[Description])
="Other","Ethnicity_Major_Any_Other_Ethnic_Group",
SELECTEDVALUE(EthnicityMajor[Description]) ="White - British","Ethnicity_Minor_White_British",
SELECTEDVALUE(EthnicityMajor[Description]) ="White -
Irish","Ethnicity_Minor_Irish",
SELECTEDVALUE(EthnicityMajor[Description]) ="Traveller of Irish
heritage","Ethnicity_Minor_Traveller_of_Irish_heritage",
SELECTEDVALUE(EthnicityMajor[Description]) ="Gypsy
Roma","Ethnicity_Minor_Gypsy_Roma",
SELECTEDVALUE(EthnicityMajor[Description]) ="Any other white
background","Ethnicity_Minor_Any_other_white_background",
SELECTEDVALUE(EthnicityMajor[Description]) ="White and Black
Caribbean","Ethnicity_Minor_White_and_Black_Caribbean",
SELECTEDVALUE(EthnicityMajor[Description]) ="White and Black
African","Ethnicity_Minor_White_and_Black_African",
SELECTEDVALUE(EthnicityMajor[Description]) ="White and
Asian","Ethnicity_Minor_White_and_Asian",
SELECTEDVALUE(EthnicityMajor[Description]) ="Any other Mixed
background","Ethnicity_Minor_Any_other_Mixed_background",
SELECTEDVALUE(EthnicityMajor[Description])
="Indian","Ethnicity_Minor_Indian",
SELECTEDVALUE(EthnicityMajor[Description])
="Pakistani","Ethnicity_Minor_Pakistani",
SELECTEDVALUE(EthnicityMajor[Description])
="Bangladeshi","Ethnicity_Minor_Bangladeshi",
SELECTEDVALUE(EthnicityMajor[Description]) ="Any other Asian
background","Ethnicity_Minor_Any_other_Asian_background",
SELECTEDVALUE(EthnicityMajor[Description]) ="Black
Caribbean","Ethnicity_Minor_Black_Caribbean",
SELECTEDVALUE(EthnicityMajor[Description]) ="Black -
African","Ethnicity_Minor_Black_African",
SELECTEDVALUE(EthnicityMajor[Description])
="Any other black
background","Ethnicity_Minor_Any_other_black_background",
SELECTEDVALUE(SwitchPupilContext[Factor])
="FSM","FSM_eligible",
SELECTEDVALUE(SwitchPupilContext[Factor])
="Disadvantaged","FSM_eligible_in_last_6_years",
OR(SELECTEDVALUE(GroupYear[YearName])="Y1",SELECTEDVALUE(GroupYear[YearName])="R"),"NC_Year_1_and_below",
SELECTEDVALUE(GroupYear[YearName])
="Y2","NC_Year_2",
SELECTEDVALUE(GroupYear[YearName])
="Y3","NC_Year_3",
SELECTEDVALUE(GroupYear[YearName])
="Y4","NC_Year_4",
SELECTEDVALUE(GroupYear[YearName])
="Y5","NC_Year_5",
SELECTEDVALUE(GroupYear[YearName])
="Y6","NC_Year_6",
SELECTEDVALUE(GroupYear[YearName]) ="7","NC_Year_7",
SELECTEDVALUE(GroupYear[YearName]) ="8","NC_Year_8",
SELECTEDVALUE(GroupYear[YearName]) ="9","NC_Year_9",
SELECTEDVALUE(GroupYear[YearName])
="10","NC_Year_10",
SELECTEDVALUE(GroupYear[YearName])
="11","NC_Year_11",
OR(SELECTEDVALUE(GroupYear[YearName])="12",SELECTEDVALUE(GroupYear[YearName])="13"),
"NC_Year_12_and_above",
"Total")
Clearly
this would be a pain to type out into DAX, even allowing for autocomplete. So
what I did was paste the unique context values into Excel. I then put in
SELECTEDVALUE( in a column, the column names and the values like this:
Note
that you can’t insert quote marks because they’re how Excel recognises speech
so use a different symbol instead (I put in £), then copy the result into
Notepad where you can use find and replace to change the £ to “.
There
are a couple of other points to notice about this measure. Firstly it will only
ever resolve to one value. It goes through each condition and if there is a value
selected it will give the corresponding result. If no filters are applied then
it will return ‘Total’.
The
second is the use of the fields like ‘SwitchGen[Gender]’. First time I did this
I just used the field Students[Gender] but that’s vulnerable to fail because
SELECTEDVALUE returns a value if a column only has one value, even if you haven’t
specifically filtered it to that value. So say you have two Travellers of Irish
Heritage and they’re both girls. If you filter ethnicity to Traveller of Irish
Heritage then your measure would return ‘Gender_Female’ because you’d have
filtered your students table so that only girls were present. SwitchGen is a
separate table, made using DATATABLE that has one column with values M/F that is
related to Students[Gender]. That means SELECTEDVALUE will only return a value
for the field when it is sliced.
The
final measure
NatAttd
= CALCULATE((100 - SELECTEDVALUE(DFE_Absence[sess_overall_percent]))/100,
DFE_Absence[YearID]=[NatYear],
DFE_Absence[school_type]=[Phase],
DFE_Absence[characteristic]=[Context])
Plus
a bonus explainer measure for telling users how national data is being filtered.
"National Dataset: " &
IF(VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)) >= MAX(DFE_Absence[YearID]),
"20" & MAX(DFE_Absence[YearID])&"-"&MAX(DFE_Absence[YearID])+1,
"20" & VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)) & "-" & VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2))+1)
& ": " &
IF([Phase]="Total","All Schools",[Phase])
& ": " &
IF([Context]="Total","All
Students",[Context])
Comments
Post a Comment