Friday, September 30, 2016

What's Changed in the True Formula for 2016-17

This took longer than hoped, but here are some details on what's changed in the True formula from the 2015-16 version to the 2016-17 one. It is the fourth and final piece of the large Renewology intro, following the general intro, the tables and charts details, and the summary of the Renewology model. If you want more details on the formation of the True formula, you can go back and piece together all my old posts. Perhaps some day I will write better documentation, but this post is just about the changes from the 2015-16 version to the 2016-17 one.


The first significant change is the overdue inclusion of a lead-out factor, in addition to the lead-in factor. The textbook example of this is Rosewood, a show that aired most of its episodes with a ginormous Empire lead-in but a few before the modest Hell's Kitchen or Wayward Pines. What I actually put in the formula may be a bit conservative for a show like Rosewood; its first pre-Hell's episodes in March still look like the weakest episodes of the series to date. But that's kind of how I felt at the time they aired, that it had dropped way more than I expected. When it held that number in the post-DST episodes in late March, that was more of the level I had been expecting, and those episodes now line up pretty well with the typical pre-Empire ones.

This addition also had some unforeseen smoothing benefits on other shows. There were series like Scandal and Brooklyn Nine-Nine that the formula perceived to have big collapses in the second half of the season, and those collapses don't look quite as bad when factoring in their significantly reduced lead-out situations. (Scandal went from strong How to Get Away with Murder to weaker The Catch, while Brooklyn traded out Family Guy for The Grinder.)

For the most part, lead-outs work a lot like lead-ins, but they are weighted only half as much. A half-hour show gets a 0.1 upgrade in True for every 0.5 reduction in lead-in, so it's a 0.1 upgrade for every 1.0 reduction in lead-out. However, the baseline for what is considered a "normal lead-out" is a little lower than the baseline for lead-in, because the league average lead-out was lower than the average lead-in. (With the crazy exception of Empire, networks tend to like using their big shows to lead into other primetime shows!) This lower baseline means the lead-out impact as a whole is a little more than 50% of the lead-in impact.
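The arithmetic above can be sketched in a few lines of Python. This is only an illustration for a half-hour show: the two rates come from the paragraph above, but the baseline values and the names `neighbor_adjustment`, `BASELINE_LEAD_IN`, and `BASELINE_LEAD_OUT` are assumptions made up for the example. The post only says that the lead-out baseline is a little lower than the lead-in one.

```python
# Rates stated in the post for a half-hour show.
LEAD_IN_RATE = 0.1 / 0.5   # True upgrade per ratings point of lead-in below baseline
LEAD_OUT_RATE = 0.1 / 1.0  # lead-outs are weighted half as much as lead-ins

# Hypothetical baselines (NOT from the post): the post only says that the
# "normal lead-out" baseline is a little lower than the lead-in baseline.
BASELINE_LEAD_IN = 2.0
BASELINE_LEAD_OUT = 1.8


def neighbor_adjustment(lead_in, lead_out,
                        baseline_in=BASELINE_LEAD_IN,
                        baseline_out=BASELINE_LEAD_OUT):
    """Return the combined lead-in/lead-out adjustment to a True score.

    A neighbor below its baseline gives an upgrade (positive value);
    a neighbor above its baseline gives a downgrade (negative value).
    """
    lead_in_part = LEAD_IN_RATE * (baseline_in - lead_in)
    lead_out_part = LEAD_OUT_RATE * (baseline_out - lead_out)
    return lead_in_part + lead_out_part


if __name__ == "__main__":
    # Lead-in half a point under baseline, lead-out exactly at baseline.
    print(round(neighbor_adjustment(1.5, 1.8), 3))
    # Lead-in at baseline, lead-out a full point under baseline.
    print(round(neighbor_adjustment(2.0, 0.8), 3))
```

With these made-up baselines, both example calls produce the same upgrade: it takes twice as large a drop in lead-out as in lead-in to move True by the same amount.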

Timeslot Characteristics

I'm not sure how many people picked up on this, but one thing that really annoyed me late in the season was when New Girl was airing episodes both at 8:00 and in the 9:00 hour. The 8:00 episodes had much, much higher True scores than the 9:00 ones, and it was very obvious because they were on the same night.

At the time, I thought this might be some kind of flaw in the way the formula treats viewing and competition in general. But in the off-season, I did a bigger-picture analysis and noticed that this was true of every single show that aired several episodes in both the 8:00 and 9:00 hours: from 2 Broke Girls to Sleepy Hollow to The Odd Couple to Brooklyn Nine-Nine. Every show had much better True scores in the 8:00 hour. There are some things to quibble about in these individual cases, as not all of them were really "regular" timeslots and most of them were changing nights as well. But there were a lot of examples of inflated scores at 8:00 and really no counter-examples.

It started to look like this was much less about mis-grading the viewing levels; after all, the formula does a good job tracking the effects of things like DST when the timeslot was the same. It's actually something inherent to the time periods themselves.

The main part of the True formula that is time-specific is what I called the "Methodology Adjustment." This was added back at the very beginning of the True formula and hasn't been changed since. It can be rather confusing, and you can try to make sense of my old post on this if you really want. But let's just say it was an attempt to convert viewing levels into a better measurement of how shows actually get viewed within the Live+SD window.

I looked back at the viewing levels I used to come up with this adjustment back in 2011 and compared them to 2016 viewing levels for the same period. The results were what I expected: overall viewing is down a lot in all hours, but the drop-off is not as bad at 8:00 as it is in the other hours. If you want even more evidence, you can look at broadcast ratings over time, which have gotten much stronger at 8:00 and weaker at 10:00 (though admittedly not by that much just over the history of the True formula). Overall, I think there's a pretty solid mountain of evidence, both from individual shows and from the "fundamentals," that in a world of heavy same-day DVRing, it's not as hard to be an 8:00 show as it used to be (relative to the other hours, that is).

So what I did was basically add a step to the Methodology Adjustment; now it converts 2016 viewing into something that looks like what we used in the original adjustment back in 2011. This tends to penalize 8:00 shows a bit (usually a little less than a tenth), is very close to neutral at 9:00, and helps 10:00 shows by less than a tenth.

This kind of thing is hard to explain because I'm talking about the formula's interpretation, not raw numbers. You can rest easy that 8:00 is still a tougher hour compared with 9:00, especially when DST is in play. But they've been flattened out some, relative to the formula's old interpretation. And if a 9:00 hour is particularly competitive (like on Wednesday), it's now more plausible that it can be about equal to the 8:00 hour in terms of difficulty.

Lead-in "Liveness"

In 2015-16, lead-ins often seemed to be weighted too heavily; in almost all cases when a show went from an original to a repeat lead-in, it would hold up better than it was "supposed" to. Considering lead-ins are weighted about the same per point as they were in the very early years of the formula, and same-day DVRing has increased a lot in this period, this was probably overdue for an update.

Rather than just weight all lead-ins less, the formula introduces different treatments for lead-ins based on their "liveness," or how much of their viewing is actually live and usable as a lead-in in the traditional sense. The three broad categories are:

1. Scripted originals. These shows generally have the least live viewership, and Live+SD numbers include heavy same-day DVRing. A scripted original lead-in is now effectively about 80% as big as it was in the old formula.

2. Repeats/specials/movies and unscripted originals. These shows are mostly live viewership and are treated exactly as they were in the old formula.

3. Sports. These shows are almost entirely live viewership. Shows actually seem to be affected more by sports lead-ins than the old formula would suggest. So the formula now ramps them up to 110% of the old formula treatment, and also does not reward a show for skew discrepancies between it and its sports lead-in.
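The three categories above can be sketched as a simple lookup in Python. The percentages come from the list; everything else is an assumption for illustration, including the category keys, the function names, and the choice to apply the multiplier directly to the lead-in rating (the real formula may apply it to the lead-in's weight instead).

```python
# Multipliers relative to the old formula's lead-in treatment, per the list above.
# The dictionary keys are hypothetical labels, not names used by the formula.
LIVENESS_MULTIPLIER = {
    "scripted_original": 0.80,     # heavy same-day DVRing, least live viewing
    "repeat_special_movie": 1.00,  # mostly live, unchanged from the old formula
    "unscripted_original": 1.00,   # mostly live, unchanged from the old formula
    "sports": 1.10,                # almost entirely live, ramped up
}


def effective_lead_in(lead_in_rating, category):
    """Scale a lead-in rating by how 'live' its category is (illustrative only)."""
    return lead_in_rating * LIVENESS_MULTIPLIER[category]


def rewards_skew_discrepancy(category):
    """Sports lead-ins do not earn a show credit for skew discrepancies."""
    return category != "sports"


if __name__ == "__main__":
    for category in LIVENESS_MULTIPLIER:
        print(category,
              round(effective_lead_in(3.0, category), 2),
              rewards_skew_discrepancy(category))
```

Under this reading, a 3.0 scripted lead-in counts like a 2.4 did in the old formula, while a 3.0 sports lead-in counts like a 3.3.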

If nothing else, these differences help account for the fact that I no longer get final half-hour breakdowns. Unscripted shows are usually bigger at the end of the program than the full average, while a ton of scripted shows drop at the half due to same-day DVRing.

Weighting Factors Based on Skew

I've long theorized that young-skewing shows are more susceptible to overall viewing levels, while older-skewing shows are more susceptible to changes in competition. It's most obvious on holidays like Thanksgiving Eve, when CBS always seems to hold up swimmingly while Fox shows like Glee and Empire get crushed, and you can also see it in the way young-skewing shows like New Girl and The Vampire Diaries always have huge late-season drops, while CBS procedurals are more resilient.

I ran lots of correlations between True and viewing/competition within a wide array of shows, and was able to see that the relationship actually did exist. Older-skewing shows tended to be impacted by competition more than the True formula says they should be, and younger-skewing ones were hurt more by drops in viewing. It's not that strong, but it's there. So I added a "skew multiplier" that slightly changes how viewing and competition are weighted based on skew. Now, younger-skewing shows are more affected by viewing levels, while older-skewing ones are more affected by competition. There was a version of this in the formula last year, but it was not really put in the right place. (It was based on the week of the season, rather than the actual viewing levels themselves.) So the net effect probably won't be that noticeable on a year-to-year basis in most cases.

One place that this can cause some chaos is on the already chaotic Sundays, where sports competition is a major factor. I often found that downgrading the importance of competition seemed to penalize the young-skewing Fox comedies too much. Maybe the Sunday competition (football and male-skewing cable dramas) happens to share a lot of audience with those cartoons. And in a lot of cases, I just think the way that I count competition (breaking it apart by full half-hours) doesn't really account for all of the NFL audience in these overrun periods.

So I've done a minor emergency fix in the last week; in the 7:00 hour (Bob's Burgers), we waive the "count sports less than other competition" rule. And in the 8:00 half-hour (The Simpsons) the "count sports less than other competition" rule is only applied two-thirds as much as usual. It's not a scientific fix, just more an attempt to account for NFL viewers that may slip through the cracks. This may still not be enough, and there may still be some random stinkers for the Fox cartoons due to the timing of Sunday overruns. But I'll try to stay relatively on top of it and make sure the competition is counted close to correctly.
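One way to picture the emergency fix is as a dial on how strongly the "count sports less" rule is applied on Sundays. The Python sketch below is a guess at the shape of that logic, not the formula itself: the post never says how much less sports normally counts, so `BASE_SPORTS_WEIGHT` is a purely hypothetical placeholder, and the function names and 24-hour clock inputs are also assumptions.

```python
# Hypothetical: how much sports competition counts relative to other
# competition when the usual rule applies in full. The post gives no number.
BASE_SPORTS_WEIGHT = 0.5


def sports_discount_strength(hour, minute):
    """Fraction of the usual sports discount applied on Sunday (24-hour clock)."""
    if hour == 19:                  # 7:00 hour: rule waived entirely
        return 0.0
    if hour == 20 and minute < 30:  # 8:00 half-hour: rule applied two-thirds as much
        return 2.0 / 3.0
    return 1.0                      # everywhere else: rule applied as usual


def sports_competition_weight(hour, minute, base_weight=BASE_SPORTS_WEIGHT):
    """Weight given to sports competition relative to other competition."""
    strength = sports_discount_strength(hour, minute)
    return 1.0 - strength * (1.0 - base_weight)


if __name__ == "__main__":
    for hour, minute in [(19, 0), (19, 30), (20, 0), (20, 30), (21, 0)]:
        print(hour, minute, round(sports_competition_weight(hour, minute), 3))
```

With the rule waived, sports counts the same as any other competition (weight 1.0). With the rule at two-thirds strength, the weight lands two-thirds of the way from 1.0 down to the placeholder base weight.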

The emergency fix was not back-fit to last year's True scores, so as not to mess with the Renewology model. The only thing I changed in Renewology is that I slightly increased the perception of the Fox comedy department, to help account for the fact that there were several fall Bob's Burgers and The Simpsons episodes that were under-counted. But since the logistic regression only uses late-season episodes, most of the numbers that are mucked up by the NFL wouldn't have made a big difference anyway.


© 2009-2022. All Rights Reserved.