Tips for working with national attendance data

In this post I’m going to talk through some issues I’ve had and tricks come up with when dealing with national attendance stats.

The Data

First step is to get the data, which you can find at https://www.gov.uk/government/collections/statistics-pupil-absence (or I just google ‘dfe absence statistics’).

There are three releases a year. The Autumn term release, which comes out the following May. The Autumn and Spring release which comes out the following October and the full year release which comes out the following March-June. So for the 2018-2019 academic year the autumn term stats were released in May 2019, the Autumn-Spring will be release this month (October 2019) and the full year stats will come out in Spring 2020.

You therefore usually have a choice between more recent data that doesn’t cover the whole year and older data that does. Another factor to bear in mind is that the Autumn-Spring data doesn’t have breakdowns for Ever6 FSM where as the full year does. That’s the clincher for me, so at the moment I’m using the full year dataset from the last academic year for which it’s available (2017/18).

Click on the collection then download the zip file with the underlying data. In the folder is a csv called Absence_3term_201718_national_characteristics.csv that contains the overall absence statistics (for 2017/18 and previous years) as well as breakdowns by school type and pupil context.

Data Import

You might think getting this data into BI is so simple as not to require a step – Get Data CSV right? I’d actually advise against this method though because when you publish your dashboard and go to schedule refreshes to keep it up to date it will try to connect to all the data sources in the file – if one of those is a csv on your hard drive it won’t let you proceed with the scheduling.

Now you can set up a ‘Power BI Gateway’ which allows the BI service to query files stored locally, but as that’s essentially a backdoor into your school network your IT people should be pretty leery about letting you set it up. Plus it’s completely unnecessary. This data is in the public domain so we can paste it onto a publicly accessible site and call the data from there.

My preference is for Google Sheets. Paste the data into a google sheet and click ‘Share’ in the top right corner.

Edit the sharable link:

From:

https://docs.google.com/spreadsheets/d/<google-sheet-guid>/edit?usp=sharing

To:

https://docs.google.com/spreadsheets/d/<google-sheet-guid>/export?format=xlsx&id=<google-sheet-guid>

You can also use Excel365.

1. Navigate to your OneDrive for Business location using a browser. Right-click the file you want to use, and select Open in Excel.

2. In Excel, select File > Info and select the link above the Protect Workbook button. Select Copy link to clipboard (your version might say Copy path to clipboard).

3. Delete “?web=1” from the end of the link address.

In Power BI choose Get Data| Web and paste in the google/O365 link.

Data Structure

We’re interested in three columns for identifying our national data: Year, School_type and Characteristic.

First step is to go to Edit Queries and if year is stored as an integer (as it probably will be) right click and change it to text. This is so we can easily compare it with an academic year filter that stores dates as say “2019-20”

For this example I’m assuming a MAT with a school table that has school phases (Primary, Secondary and Special) that links to a student table that links to an attendance table.

I’m going to explain how to create a measure for national data that responds to the filters you use to filter your MAT’s data – so you don’t have to filter twice to compare like with like.

SELECTEDVALUE

The key to making this filter work is the SELECTEDVALUE function, which we’ve met before but to recap which takes the arguments:

SELECTEDVALUE(<column>,[<alternativereresult>])

If the column of the first argument is filtered such that it contains only one value, the function returns that value, otherwise you get the alternative result.

Phase Measure

The simplest example of how this works is the phase measure. In our national dataset the phases are “Total” (ie all phases), “State Funded Primary”, State Funded Secondary” and “Special”.

Phase = SWITCH(SELECTEDVALUE(School[Phase],"All"),"Primary","State-funded primary","Secondary","State-funded secondary","Special","Special","All","Total","Total")

The inner function returns ‘All’ if no value is selected in the School[phase] slicer. Otherwise it returns Primary/Secondary/Special which are then translated to the forms used in the DfE dataset by the switch function.

Year

The year is slightly more complicated because we want to use the most recent national data available, unless we are looking at internal data from 2016-17 or earlier as we have the national data from those years.

I created a calculated column on the DfE data (this is why I converted year to text as MID only works on text strings).

YearID = VALUE(MID(DFE_Absence[year],4,2))

This pulls out 2 characters starting at the fourth character – so 201718 yields 17.

I have a field called AcYear in a table DateRef that I use to choose academic years in the format 2017-18. My Year measure is therefore

NatYear = IF(VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)) >= MAX(DFE_Absence[YearID]),MAX(DFE_Absence[YearID]),VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)))

This gives me the YearID of the national data, ie 17 for 201718.

Context

Context is the hardest because there are loads of factors. Check it out:

Context = SWITCH(TRUE(),

SELECTEDVALUE(SwitchGen[Gender])="M","Gender_Male",

SELECTEDVALUE(SwitchGen[Gender])="F","Gender_Female",

SELECTEDVALUE(SwitchEAL[EAL]) = "Unknown","First_language_Unclassified",

SELECTEDVALUE(SwitchEAL[EAL]) = "N","First_language_Known_or_believed_to_be_English",

SELECTEDVALUE(SwitchEAL[EAL]) = "Y","First_language_Known_or_believed_to_be_other_than_English",

SELECTEDVALUE(EthnicityMain[Description]) ="White","Ethnicity_Major_White_Total",

SELECTEDVALUE(EthnicityMain[Description]) ="Mixed","Ethnicity_Major_Mixed_Total",

SELECTEDVALUE(EthnicityMain[Description]) ="Asian","Ethnicity_Major_Asian_Total",

SELECTEDVALUE(EthnicityMain[Description]) ="Black","Ethnicity_Major_Black_Total",

SELECTEDVALUE(EthnicityMain[Description]) ="Chinese","Ethnicity_Major_Chinese",

SELECTEDVALUE(EthnicityMain[Description]) ="Other","Ethnicity_Major_Any_Other_Ethnic_Group",

SELECTEDVALUE(EthnicityMajor[Description]) ="White - British","Ethnicity_Minor_White_British",

SELECTEDVALUE(EthnicityMajor[Description]) ="White - Irish","Ethnicity_Minor_Irish",

SELECTEDVALUE(EthnicityMajor[Description]) ="Traveller of Irish heritage","Ethnicity_Minor_Traveller_of_Irish_heritage",

SELECTEDVALUE(EthnicityMajor[Description]) ="Gypsy Roma","Ethnicity_Minor_Gypsy_Roma",

SELECTEDVALUE(EthnicityMajor[Description]) ="Any other white background","Ethnicity_Minor_Any_other_white_background",

SELECTEDVALUE(EthnicityMajor[Description]) ="White and Black Caribbean","Ethnicity_Minor_White_and_Black_Caribbean",

SELECTEDVALUE(EthnicityMajor[Description]) ="White and Black African","Ethnicity_Minor_White_and_Black_African",

SELECTEDVALUE(EthnicityMajor[Description]) ="White and Asian","Ethnicity_Minor_White_and_Asian",

SELECTEDVALUE(EthnicityMajor[Description]) ="Any other Mixed background","Ethnicity_Minor_Any_other_Mixed_background",

SELECTEDVALUE(EthnicityMajor[Description]) ="Indian","Ethnicity_Minor_Indian",

SELECTEDVALUE(EthnicityMajor[Description]) ="Pakistani","Ethnicity_Minor_Pakistani",

SELECTEDVALUE(EthnicityMajor[Description]) ="Bangladeshi","Ethnicity_Minor_Bangladeshi",

SELECTEDVALUE(EthnicityMajor[Description]) ="Any other Asian background","Ethnicity_Minor_Any_other_Asian_background",

SELECTEDVALUE(EthnicityMajor[Description]) ="Black Caribbean","Ethnicity_Minor_Black_Caribbean",

SELECTEDVALUE(EthnicityMajor[Description]) ="Black - African","Ethnicity_Minor_Black_African",

SELECTEDVALUE(EthnicityMajor[Description]) ="Any other black background","Ethnicity_Minor_Any_other_black_background",

SELECTEDVALUE(SwitchPupilContext[Factor]) ="FSM","FSM_eligible",

SELECTEDVALUE(SwitchPupilContext[Factor]) ="Disadvantaged","FSM_eligible_in_last_6_years",

OR(SELECTEDVALUE(GroupYear[YearName])="Y1",SELECTEDVALUE(GroupYear[YearName])="R"),"NC_Year_1_and_below",

SELECTEDVALUE(GroupYear[YearName]) ="Y2","NC_Year_2",

SELECTEDVALUE(GroupYear[YearName]) ="Y3","NC_Year_3",

SELECTEDVALUE(GroupYear[YearName]) ="Y4","NC_Year_4",

SELECTEDVALUE(GroupYear[YearName]) ="Y5","NC_Year_5",

SELECTEDVALUE(GroupYear[YearName]) ="Y6","NC_Year_6",

SELECTEDVALUE(GroupYear[YearName]) ="7","NC_Year_7",

SELECTEDVALUE(GroupYear[YearName]) ="8","NC_Year_8",

SELECTEDVALUE(GroupYear[YearName]) ="9","NC_Year_9",

SELECTEDVALUE(GroupYear[YearName]) ="10","NC_Year_10",

SELECTEDVALUE(GroupYear[YearName]) ="11","NC_Year_11",

OR(SELECTEDVALUE(GroupYear[YearName])="12",SELECTEDVALUE(GroupYear[YearName])="13"), "NC_Year_12_and_above",

"Total")

Clearly this would be a pain to type out into DAX, even allowing for autocomplete. So what I did was paste the unique context values into Excel. I then put in SELECTEDVALUE( in a column, the column names and the values like this:

Note that you can’t insert quote marks because they’re how Excel recognises speech so use a different symbol instead (I put in £), then copy the result into Notepad where you can use find and replace to change the £ to “.

There are a couple of other points to notice about this measure. Firstly it will only ever resolve to one value. It goes through each condition and if there is a value selected it will give the corresponding result. If no filters are applied then it will return ‘Total’.

The second is the use of the fields like ‘SwitchGen[Gender]’. First time I did this I just used the field Students[Gender] but that’s vulnerable to fail because SELECTEDVALUE returns a value if a column only has one value, even if you haven’t specifically filtered it to that value. So say you have two Travellers of Irish Heritage and they’re both girls. If you filter ethnicity to Traveller of Irish Heritage then your measure would return ‘Gender_Female’ because you’d have filtered your students table so that only girls were present. SwitchGen is a separate table, made using DATATABLE that has one column with values M/F that is related to Students[Gender]. That means SELECTEDVALUE will only return a value for the field when it is sliced.

The final measure

NatAttd = CALCULATE((100 - SELECTEDVALUE(DFE_Absence[sess_overall_percent]))/100,

DFE_Absence[YearID]=[NatYear],

DFE_Absence[school_type]=[Phase],

DFE_Absence[characteristic]=[Context])

Plus a bonus explainer measure for telling users how national data is being filtered.

AttdNatDisplay =
"National Dataset: " &
IF(VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)) >= MAX(DFE_Absence[YearID]),
    "20" & MAX(DFE_Absence[YearID])&"-"&MAX(DFE_Absence[YearID])+1,
    "20" & VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2)) & "-" & VALUE(MID(SELECTEDVALUE(DateRef[AcYear]),3,2))+1)
    & ": " &
    IF([Phase]="Total","All Schools",[Phase])
    & ": " &

IF([Context]="Total","All Students",[Context])

Search This Blog

PowerBIforSchools

Tips for working with national attendance data

Comments

Post a Comment

Popular posts from this blog

Nan, Infinity and the healing power of DIVIDE

Attainment 8 in BI - Part 1

Data modelling for a large MAT