Attainment 8 in BI - part 2


Previously on BI4Schools

In a previous post we covered how to prepare exam data to work out Attainment 8. Specifically that meant:
  • Adding QANs and ‘EBaccClass’ (EN, MA, EB or OP)
  • Adding DfE Performance Points
  • Duplicating Combined Science records so a record of 65 became two records with points value 5.5
  • Applying Discounting rules
  • Adding an Index which uniquely identifies each grade ordered from largest to smallest.



·     We brought that data into a BI model with the table name A8Data and linked it to a students table on StudentId.

Buckets of Joy

We going to create 4 measures, one each for the 4 buckets that make up Attainment 8. We’ll tackle them in order of difficulty: Maths, English, EBacc then Open. Before we get started though I want to do a deep dive on a special class of formulas in DAX – <aggregate>X, specifically SUMX as understanding how that works is key to understanding the measures we need to create.

SUMX
SUMX has two arguments a table, then an expression. What it does is it goes through each row of the table, calculates the expression in that row and sums the result for all rows. In other words it’s like adding the expression as a calculated column to the table and using a normal SUM function on that column.

In this simple example the formula SUMX(Animals,IF(ISBEAR(Animals[Animal]),1,0)) would return the value 3 as the result of the expression would be 1 for Panda, Grizzly and Paddington and 0 for Dog. (Obviously ISBEAR is not a real DAX formula but you get the idea).

There are two really important concepts to understand here. The first is that the measure responds to row and filter context, so if you have a filter somewhere on your report, Fictional Character = False then the result will be 2 as only the rows Panda, Grizzly and Dog will be considered in the measure. The second is that the first argument, Animals, doesn’t have to be an unaltered table from the model – it can be any DAX expression that results in a table (e.g. FILTER, SUMMARIZE, UNION). If you want to check what table functions there are and what they do, go to the DAX Function Reference.

If you think about the sums we need to do to work out A8 we need to go through each kid and for each one add up some of their results, so we will be nesting SUMX formulas, for each row of the Students table, iterating through relevant rows of the A8Data table, before moving on to the next student and their A8Data records.

Our measures will look a bit like this:

TotalPoints = SUMX(Students,SUMX(A8Data,A8Data[Points])

This formula will work through the students table and for each student, it will add up the points on the A8Data table. If you create a table with students on it and add that total points measure you will notice something strange though. The totals for the individual students will be right, but the overall total will be far too high (it will be the total number of points in the table multiplied by the number of students).
The student totals are correct but the total, 556K for 98 students, is way off

The reason for this discrepancy is that when this measure is calculated within the context of a row with a specific student that Row Context filters the A8Data table so only points belonging to the child in that row are summed. The total is calculated without any row context, so all rows in the A8 data table are summed. To fix this we need to tell the measure to apply the row context from the students table to the A8Data using the EARLIER formula (see part one for more info on Row Context and EARLIER).

TotalPoints = SUMX(Students,SUMX(FILTER(A8Data,A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Points]))

This won’t affect the results in the table with students (you’ve told it to apply a filter that row context was applying anyway) but it means that when the measure is calculated without row context, the A8Data table is filtered by the earlier row context of the Students table.

Maths Bucket

In A8 the maths bucket is simply the maths grade *2. We applied discounting rules in part 1 so we know that in our A8Data table each student has a maximum of 1 result where EBaccClass = “MA”.

Applying what we know about nested SUMX functions we can create the following measure:

MaBucket = SUMX(Students,SUMX(FILTER(A8Data,A8Data[StudentId]=EARLIER(Students[StudentId])&&A8Data[EBaccClass]="MA"),A8Data[Points]))*2

The measure goes through each row of the student table and gets the record for that student from the A8Data table with the EBaccClass 'MA'. That figure is multiplied by two and the results are summed for each student in the Students table.

English Bucket

The English Bucket is a touch more complicated as we need to work out the best of English and English Literature (or pick one arbitrarily if they are the same grade) and double it if both grades are present. Another way to think about this is that we want to take each student’s largest English grade and if they have 2 English grades, add the largest English grade again.

EnBucket = SUMX(Students,CALCULATE(MAX(A8Data[Points]),A8Data[EBaccClass]="EN")+
IF(CALCULATE(COUNTROWS(A8Data),A8Data[EBaccClass]="EN")=2,
CALCULATE(MAX(A8Data[Points]),A8Data[EBaccClass]="EN"),0))

The CALCULATE function allows to work something out with additional filter context (in this case the maximum points value where EBaccClass = “EN”). After working out everybody's maximum result, the IF function checks if two results are present, if so the maximum points are added again, if not zero is added. Existing row context is preserved which is why it works out the maximum for each student. Here there’s no nested SUMX formula so no need to preserve row context with EARLIER.

EBacc Bucket

Here’s where it starts to get interesting. The EBacc bucket is the sum of the best three EBacc qualifying subjects. To get at these we’re going to use the formula TOPN, which is a table formula that returns a table with a specified number of rows from a table in our model sorted in a specified way.

You might be thinking boom, TOPN(3,A8Data,A8Data[Points],DESC) and we’re done, but there’s a wrinkle. If you have a kid with EBacc results 7,6,5,5 then that formula will return all four records. TOPN returns all records with rank equal to or better than N, even if that means that the total number of records returned is greater than N.

This is why we created an index that uniquely identifies all the grades in the dataset in order of their points value. The formula 

TOPN(3,FILTER(A8Data,A8Data[EBaccClass]="EB"),A8Data[Index],ASC) 

can only return 3 rows because each Index value is unique. The sort order is ascending because we applied an index value of 1 to one of the grades with the highest points value and sequential numbers to all other grades in descending order of value. That means that in our set of 7,6,5,5 one of the 5s will have a lower index value and so will be included in this EBacc Bucket subtable. The other we will consider for inclusion in the open bucket.

EBBucket = SUMX(Students,SUMX(TOPN(3,FILTER(A8Data,A8Data[EBaccClass]="EB"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC),A8Data[Points]))

Note that TOPN returns a table, which is why it can be used as the first argument of SUMX. Within TOPN there is another table argument where we use FILTER to ensure that only EBacc results are considered and that the earlier row context from table Students is respected.

Open Bucket

The Open Bucket is the most complex because as well as the subjects with a class of “OP” we also need to consider the unused English/English Lit grade and the 4th, 5th and 6th largest EBacc grade as well. We’ll create three separate tables within our measure for each of these sets of grades, then we’ll do the TOPN of that dynamically created table.

To get the second largest English grade we use the function EXCEPT which takes as its arguments two tables and returns all the rows in the left hand table that aren’t in the right hand table. We can therefore stick in EXCEPT(TOPN(2,….),TOPN(1,…)) and we will get just the second biggest result. Again we’ll need to sort on Index to ensure we’re identifying the right number of results in each table.

We’ll use exactly the same technique to get the EBacc results only for those it will be TOPN(6,..),TOPN(3,…).

The OP results are easy – FILTER returns a table, so for them we just need to 

FILTER(A8Data,A8Data[EBaccClass]="OP"&&A8Data[StudentId]=EARLIER(Students[StudentId]))

To combine our three tables we’ll use the UNION function. Union takes multiple tables as arguments and makes them one table. All the tables need to have the same columns, which isn’t an issue for us as they are all A8Data filtered one way or another.

Below is the full Open Bucket Measure – I separated out its parts on different lines to make it easier to maintain (you can do this by pressing Shift and Enter in the formula bar). 

Lines 4 – 6 are the tables that are combined in the UNION expression. Line 7 is the ordering used by the TOPN function from line 2. Line 8 (points) is the expression we want added up by the SUMX function from line 1.

OPBucket = SUMX(Students,SUMX(
    TOPN(3,
    UNION(
        EXCEPT(TOPN(2,FILTER(A8Data,A8Data[EBaccClass]="EN"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC),TOPN(1,FILTER(A8Data,A8Data[EBaccClass]="EN"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC)),

  EXCEPT(TOPN(6,FILTER(A8Data,A8Data[EBaccClass]="EB"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC),TOPN(3,FILTER(A8Data,A8Data[EBaccClass]="EB"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC)),

  FILTER(A8Data,A8Data[EBaccClass]="OP"&&A8Data[StudentId]=EARLIER(Students[StudentId]))),

    A8Data[Index],ASC),
    A8Data[Points]))

Attainment 8 Measure

To get attainment 8 you just need to sum all four buckets:

Attainment8 = [EnBucket]+[MaBucket]+[EBBucket]+[OPBucket]

Comments

Popular posts from this blog

Attainment 8 in BI - Part 1

Subject - Class - Pupil Table

Data modelling for a large MAT