Attainment 8 in BI - part 2
Previously on
BI4Schools
In a previous
post we covered how to prepare exam data to work out Attainment 8. Specifically
that meant:
- Adding QANs and ‘EBaccClass’ (EN, MA, EB or OP)
- Adding DfE Performance Points
- Duplicating Combined Science records so a record of 65 became two records with points value 5.5
- Applying Discounting rules
- Adding an Index which uniquely identifies each grade ordered from largest to smallest.
· We brought that
data into a BI model with the table name A8Data and linked it to a students table on StudentId.
Buckets of Joy
We going to create
4 measures, one each for the 4 buckets that make up Attainment 8. We’ll tackle
them in order of difficulty: Maths, English, EBacc then Open. Before we get started
though I want to do a deep dive on a special class of formulas in DAX – <aggregate>X,
specifically SUMX as understanding how that works is key to understanding the
measures we need to create.
SUMX
SUMX has two
arguments a table, then an expression. What it does is it goes through each row
of the table, calculates the expression in that row and sums the result for all
rows. In other words it’s like adding the expression as a calculated
column to the table and using a normal SUM function on that column.
In this simple
example the formula SUMX(Animals,IF(ISBEAR(Animals[Animal]),1,0)) would return
the value 3 as the result of the expression would be 1 for Panda, Grizzly and
Paddington and 0 for Dog. (Obviously ISBEAR is not a real DAX formula but you
get the idea).
There are two
really important concepts to understand here. The first is that the measure
responds to row and filter context, so if you have a filter somewhere on your
report, Fictional Character = False then the result will be 2 as only the rows
Panda, Grizzly and Dog will be considered in the measure. The second is that
the first argument, Animals, doesn’t have to be an unaltered table from the
model – it can be any DAX expression that results in a table (e.g. FILTER,
SUMMARIZE, UNION). If you want to check what table functions there are and what
they do, go to the DAX Function Reference.
If you think
about the sums we need to do to work out A8 we need to go through each kid and
for each one add up some of their results, so we will be nesting SUMX formulas, for each row of the Students table, iterating through relevant rows of the A8Data table, before moving on to the next student and their A8Data records.
Our measures will look a bit like this:
TotalPoints =
SUMX(Students,SUMX(A8Data,A8Data[Points])
This formula
will work through the students table and for each student, it will add up the
points on the A8Data table. If you create a table with students on it and add
that total points measure you will notice something strange though. The totals
for the individual students will be right, but the overall total will be far
too high (it will be the total number of points in the table multiplied by the
number of students).
The student totals are correct but the total, 556K for 98 students, is way off
The reason for
this discrepancy is that when this measure is calculated within the context of
a row with a specific student that Row
Context filters the A8Data table so only points belonging to the child in
that row are summed. The total is calculated without any row context, so all rows
in the A8 data table are summed. To fix this we need to tell the measure to apply
the row context from the students table to the A8Data using the EARLIER formula
(see part one for more info on Row Context and EARLIER).
TotalPoints =
SUMX(Students,SUMX(FILTER(A8Data,A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Points]))
This won’t
affect the results in the table with students (you’ve told it to apply a filter
that row context was applying anyway) but it means that when the measure is calculated
without row context, the A8Data table is filtered by the earlier row context of
the Students table.
Maths Bucket
In A8 the maths
bucket is simply the maths grade *2. We applied discounting rules in part 1 so
we know that in our A8Data table each student has a maximum of 1 result where
EBaccClass = “MA”.
Applying what
we know about nested SUMX functions we can create the following measure:
MaBucket =
SUMX(Students,SUMX(FILTER(A8Data,A8Data[StudentId]=EARLIER(Students[StudentId])&&A8Data[EBaccClass]="MA"),A8Data[Points]))*2
The measure goes through each row of the student table and gets the record for that student from the A8Data table with the EBaccClass 'MA'. That figure is multiplied by two and the results are summed for each student in the Students table.
English Bucket
The English
Bucket is a touch more complicated as we need to work out the best of English
and English Literature (or pick one arbitrarily if they are the same grade) and
double it if both grades are present. Another way to think about this is that
we want to take each student’s largest English grade and if they have 2 English
grades, add the largest English grade again.
EnBucket =
SUMX(Students,CALCULATE(MAX(A8Data[Points]),A8Data[EBaccClass]="EN")+
IF(CALCULATE(COUNTROWS(A8Data),A8Data[EBaccClass]="EN")=2,
CALCULATE(MAX(A8Data[Points]),A8Data[EBaccClass]="EN"),0))
The CALCULATE function
allows to work something out with additional filter context (in this case the maximum
points value where EBaccClass = “EN”). After working out everybody's maximum result, the IF function checks if two results are present, if so the maximum points are added again, if not zero is added. Existing row context is preserved which
is why it works out the maximum for each student. Here there’s no nested SUMX
formula so no need to preserve row context with EARLIER.
EBacc Bucket
Here’s where it
starts to get interesting. The EBacc bucket is the sum of the best three EBacc
qualifying subjects. To get at these we’re going to use the formula TOPN, which
is a table formula that returns a table with a specified number of rows from a
table in our model sorted in a specified way.
You might be
thinking boom, TOPN(3,A8Data,A8Data[Points],DESC) and we’re done, but there’s a
wrinkle. If you have a kid with EBacc results 7,6,5,5 then that formula will
return all four records. TOPN returns all records with rank equal to or better than N, even if that means that the total number of records returned is greater than N.
This is why we
created an index that uniquely identifies all the grades in the dataset in order
of their points value. The formula
TOPN(3,FILTER(A8Data,A8Data[EBaccClass]="EB"),A8Data[Index],ASC)
can only return 3 rows because each Index value is unique. The sort order is ascending
because we applied an index value of 1 to one of the grades with the highest
points value and sequential numbers to all other grades in descending order of
value. That means that in our set of 7,6,5,5 one of the 5s will have a lower index
value and so will be included in this EBacc Bucket subtable. The other we will
consider for inclusion in the open bucket.
EBBucket =
SUMX(Students,SUMX(TOPN(3,FILTER(A8Data,A8Data[EBaccClass]="EB"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC),A8Data[Points]))
Note that TOPN returns a table, which is why it can be used as the first argument of SUMX. Within TOPN there is another table argument where we use FILTER to ensure that only EBacc results are considered and that the earlier row context from table Students is respected.
Open Bucket
The Open Bucket
is the most complex because as well as the subjects with a class of “OP” we also
need to consider the unused English/English Lit grade and the 4th, 5th
and 6th largest EBacc grade as well. We’ll create three separate tables
within our measure for each of these sets of grades, then we’ll do the TOPN of
that dynamically created table.
To get the second
largest English grade we use the function EXCEPT which takes as its arguments
two tables and returns all the rows in the left hand table that aren’t in the
right hand table. We can therefore stick in EXCEPT(TOPN(2,….),TOPN(1,…)) and we
will get just the second biggest result. Again we’ll need to sort on Index to
ensure we’re identifying the right number of results in each table.
We’ll use
exactly the same technique to get the EBacc results only for those it will be TOPN(6,..),TOPN(3,…).
The OP results
are easy – FILTER returns a table, so for them we just need to
FILTER(A8Data,A8Data[EBaccClass]="OP"&&A8Data[StudentId]=EARLIER(Students[StudentId]))
To combine our
three tables we’ll use the UNION function. Union takes multiple tables as arguments
and makes them one table. All the tables need to have the same columns, which
isn’t an issue for us as they are all A8Data filtered one way or another.
Below is the
full Open Bucket Measure – I separated out its parts on different lines to make
it easier to maintain (you can do this by pressing Shift and Enter in the formula
bar).
Lines 4 – 6 are the tables that are combined in the UNION expression. Line
7 is the ordering used by the TOPN function from line 2. Line 8 (points) is the expression
we want added up by the SUMX function from line 1.
1 OPBucket = SUMX(Students,SUMX(
TOPN(3,
UNION(
EXCEPT(TOPN(2,FILTER(A8Data,A8Data[EBaccClass]="EN"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC),TOPN(1,FILTER(A8Data,A8Data[EBaccClass]="EN"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC)),
EXCEPT(TOPN(6,FILTER(A8Data,A8Data[EBaccClass]="EB"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC),TOPN(3,FILTER(A8Data,A8Data[EBaccClass]="EB"&&A8Data[StudentId]=EARLIER(Students[StudentId])),A8Data[Index],ASC)),
FILTER(A8Data,A8Data[EBaccClass]="OP"&&A8Data[StudentId]=EARLIER(Students[StudentId]))),
A8Data[Index],ASC),
A8Data[Points]))
Attainment 8
Measure
To get attainment
8 you just need to sum all four buckets:
Attainment8 =
[EnBucket]+[MaBucket]+[EBBucket]+[OPBucket]
Comments
Post a Comment