Thursday, August 29, 2013

The 2013 True Tweaks


This was not a big summer for my timeslot metric called True Strength, as most of my energy has gone toward creating the new historical posts. I'm not going to do a full-length explanation of every single facet of the formula as I did in each of the last two years, because this year there was no major overhaul. I'm just going to go over the few substantial changes, and this post can serve as an addendum to all the posts I wrote last year.

The name is now "True." One nomenclature note: I'm dropping the whole idea of attaching a version label to the formal name of this number each year, so it will just be referred to as "True" in tables going forward. (Incidentally, I will also introduce the name "Plus" for A18-49+ effective September 1. However, this one is more of a nickname that will be easier to say and write in informal situations; I think I will still use the formal "A18-49+" on tables.)

Lead-in compatibility. The thing I was most interested in working on this summer was the idea that not all lead-ins are created equal. Two of my biggest misses to the negative in The Question last year were for shows airing after a special lead-in with a vastly different audience. It's become clear that airing after a big lead-in is not necessarily as helpful as the lead-in adjustment in True2 thinks it should be. I don't have access to things like gender breakdowns, but I thought there might be some value in comparing a show's "skew" (% of its audience within the 18-49 demo) with its lead-in's audience skew, and giving the show's True a bonus if those numbers are vastly different.

This is not exactly a sweeping game-changer, because there are so few shows that actually air a lot of episodes after an "incompatible" lead-in. However, the consistency in True for shows that had different lead-ins (like Castle, Body of Proof and Vegas) tends to be improved by this, while it makes no real difference for most shows. Seems like a win.

Here's the adjustment: for every percentage point of skew difference between a program and its lead-in beyond 5, a show's "True lead-in" is reduced by 4%. Conversely, the "True lead-in" is increased by 4% for each point the skew difference falls below 5. So for a show with a 10-point skew difference, the lead-in will be treated as 20% less than its literal number. A "wildly incompatible" 20-point difference means the lead-in will be treated as a whopping 60% less!* That might seem kind of ridiculous, but in those "wildly incompatible" situations, it actually makes sense on paper. As I mentioned in previewing Unforgettable last month, we've seen crime drama repeats do worse after Big Brother than after other crime drama repeats, even though Big Brother has a rating well over a point higher.

*- Technically, a skew difference above 30 points would take that over 100% and create a negative "True lead-in" value, but I never saw any difference that large (the biggest ones were in the 25 vicinity), so let's hope it doesn't happen. While a negative "True lead-in" would certainly not be "correct," I do think it's safe to say that a show like Blue Bloods (16% skew) would provide basically zero help to something like New Girl (66% skew).
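As a rough sketch of the compatibility rule, here is the arithmetic in Python. The function and parameter names are illustrative only, not anything from the actual True formula, and the sketch assumes the skew difference is taken as an absolute value (the Extreme Weight Loss example suggests a younger-skewing lead-in hurts too). No floor is applied, so gaps over 30 points go negative, just as the footnote describes.

```python
def true_lead_in(lead_in_rating: float, show_skew: float, lead_in_skew: float) -> float:
    """Scale a literal lead-in rating by skew compatibility (hypothetical helper).

    Skews are percentages of each audience inside A18-49. Every point of skew
    gap beyond 5 removes 4% of the lead-in; every point below 5 adds 4%.
    No floor is applied, so gaps above 30 points yield a negative value.
    """
    gap = abs(show_skew - lead_in_skew)  # assumed: direction of the gap doesn't matter
    factor = 1 - 0.04 * (gap - 5)
    return lead_in_rating * factor


# A 10-point gap: the lead-in counts as 20% less than its literal number.
print(round(true_lead_in(2.0, 40, 30), 2))  # 1.6
# A 20-point gap: 60% less.
print(round(true_lead_in(2.0, 50, 30), 2))  # 0.8
# Blue Bloods (16% skew) into New Girl (66% skew): a 50-point gap goes negative.
print(round(true_lead_in(1.0, 66, 16), 2))  # -0.8
```

Leaving out a floor keeps the sketch faithful to the rule as stated; clamping at zero would be an easy change if a gap that large ever turned up.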

Probably the single best example of this phenomenon was season three of Body of Proof, which premiered badly after a very incompatible The Bachelor (2.6), stayed weak after the very incompatible The Taste (1.4/1.2/1.1), then grew after the highly compatible Dancing with the Stars (2.1 average), then shrank for the finale after younger-skewing Extreme Weight Loss (1.3).

With the adjustment, that Bachelor episode is now treated as if it pulled a 1.2. The Taste is credited as a mere 0.6/0.4/0.4. Dancing with the Stars is now given a 2.4 average, and Extreme Weight Loss a 0.7. In other words, DWTS is counted as a 1.2-point bigger lead-in (or worth 0.2 more points to BoP) than The Bachelor, which roughly makes sense since BoP averaged 0.2-0.3 more points after DWTS in reality.
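Running the rule backwards gives a rough idea of the skew gaps those Body of Proof numbers imply. This is only an approximate check: the ratings quoted are rounded to one decimal, so the implied gaps are not the actual skews used, and the helper name is hypothetical.

```python
def implied_skew_gap(literal: float, adjusted: float) -> float:
    """Invert factor = 1 - 0.04 * (gap - 5) to recover the skew gap (hypothetical helper)."""
    factor = adjusted / literal
    return 5 + (1 - factor) / 0.04


# Literal lead-in rating -> "True" lead-in, as quoted for Body of Proof season three.
examples = {
    "The Bachelor": (2.6, 1.2),
    "Dancing with the Stars": (2.1, 2.4),
    "Extreme Weight Loss": (1.3, 0.7),
}
for name, (literal, adjusted) in examples.items():
    print(f"{name}: ~{implied_skew_gap(literal, adjusted):.1f}-point skew gap")
```

That works out to gaps of roughly 18.5, 1.4 and 16.5 points, consistent with The Bachelor and Extreme Weight Loss being "incompatible" and Dancing with the Stars being a close match.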

We'll see how this goes in 2013-14; it may need to be tweaked since, as I said, there's a small sample size of shows that aired both with a "compatible" lead-in and an "incompatible" one for me to go on. I am working on a way to illustrate this at work; I might have the SpotVault pages list both the literal lead-in rating and the "True" lead-in, and I also might return to listing "Skew" on the daily tables so those differences can be seen as they happen.

Also, I briefly toyed with doing something similar for competition compatibility, but it seemed far less simple/helpful. For example, two of the shows that I think have among the largest audience overlap - The Voice and Dancing with the Stars - also have very different skews. It might be more useful to do a "true competition" adjustment along genre lines rather than skew. But that will have to wait for next year.

A smoother "fall hype" adjustment. One of this number's big struggles has been the fact that ratings get weaker over the course of the fall, even accounting for "climate" changes in overall viewing. (They of course get weaker in the winter/spring too, but usually relatively in line with the "climate" changes.) The first two editions of True Strength chose one point in the late fall and artificially deflated the ratings to a certain extent before that point and artificially inflated them after that point. In the first version, that pivotal point was at roughly the end of October, and then I changed it to roughly the end of November in True2.

As I've always acknowledged, this was helpful but it tended to exaggerate the effect right around that pivotal point. If you've perused the SpotVault extensively, you might have noticed that a ton of shows hit their lowest True2 numbers during the month of November, right before "fall hype" was about to kick in.

So I took out the fall hype adjustment entirely and threw together a bunch of shows that should have roughly the same True throughout the season (veteran comedies and procedurals) and averaged all of their points week-by-week compared to their season averages. Here's what it looked like across the season.
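The week-by-week comparison can be sketched like this: each show's weekly value is expressed as a percentage of its own season average, then those percentages are averaged across shows for each week. The data layout (one list per show, with None for weeks without an episode) and the sample numbers are hypothetical, and the outlier filtering is left out.

```python
from statistics import mean
from typing import Dict, List, Optional


def weekly_index(season: Dict[str, List[Optional[float]]]) -> List[Optional[float]]:
    """Average each show's weekly value as a % of its own season average.

    None marks a week with no episode; such weeks are skipped both in the
    season averages and in that week's cross-show mean.
    """
    averages = {
        show: mean(v for v in values if v is not None)
        for show, values in season.items()
    }
    n_weeks = max(len(values) for values in season.values())
    index: List[Optional[float]] = []
    for week in range(n_weeks):
        points = [
            100 * values[week] / averages[show]
            for show, values in season.items()
            if week < len(values) and values[week] is not None
        ]
        index.append(mean(points) if points else None)
    return index


# Hypothetical weekly values for two veteran shows.
sample = {"Show A": [2.2, 2.0, 1.8], "Show B": [1.1, None, 0.9]}
print([round(x, 1) for x in weekly_index(sample) if x is not None])
```

An index above 100 means the selected shows ran hotter than their season norms that week; weeks with no episodes at all come back as None rather than a misleading number.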


It looked like having three different "fall hype" levels, rather than two, was probably of some value, because the late fall numbers were still higher than the winter/spring ones. Throwing out the premiere bounce in the first two weeks and a few other outliers (mostly in weeks with very few episodes of the selected shows), there seemed to be a relatively predictable "early fall" level in the mid-to-upper 100s, a clearly lower "late fall" level right around 100, and a "winter/spring" level in the general vicinity of 95.

So here are the new "hype" adjustments. In the first six weeks of the season, True numbers are multiplied by 0.93 (roughly the old fall hype adjustment). From week 7 through the holiday break, True numbers are multiplied by 1.00. And after the holiday break, they're multiplied by 1.05. This should create a smoother hype adjustment that doesn't have a clearly weak section of the season... but we'll see! It might have to get even more complicated in future years.
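The three-level schedule maps onto a simple lookup. This sketch assumes weeks are numbered from 1 at the premiere and that the holiday break is passed in as a flag; both conventions and the function names are hypothetical.

```python
def hype_multiplier(week: int, after_holiday_break: bool) -> float:
    """Return the 2013-14 "hype" multiplier for a season week (1 = premiere week).

    Week numbering and the holiday-break flag are assumed inputs.
    """
    if week < 1:
        raise ValueError("weeks are numbered from 1")
    if after_holiday_break:
        return 1.05  # winter/spring
    if week <= 6:
        return 0.93  # early fall
    return 1.00  # week 7 through the holiday break


def hype_adjusted(true_value: float, week: int, after_holiday_break: bool = False) -> float:
    """Apply the hype multiplier to a raw True value."""
    return true_value * hype_multiplier(week, after_holiday_break)


print(hype_adjusted(2.0, week=3))                             # early fall
print(hype_adjusted(2.0, week=9))                             # late fall
print(hype_adjusted(2.0, week=16, after_holiday_break=True))  # winter/spring
```

As a sanity check on the three levels: 0.93 × 107 ≈ 99.5 and 1.05 × 95 ≈ 99.8, so each multiplier pulls its level back toward roughly 100 (107 is just one illustrative reading of "mid-to-upper 100s").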

The Sunday cable effect. Perhaps the most "pressing" issue for this summer was how unfairly this number appears to treat Sunday programming. This is because overall viewing is so much higher on Sunday than any other night. This gap between Sunday and other nights might be because cable programs are now a big part of the Sunday competitive landscape, yet the number doesn't count them at all. (And, to a smaller extent, there is no accounting for the CW's local programming on Sunday nights, whereas Monday-Friday shows are credited for facing the CW's national-rated stuff.)

Rather than actually come up with a way of counting literal cable programs, which will take much more work, I tried a simpler approach for now: I looked at the broadcast viewing/overall viewing tendencies across the various nights of the week. I didn't get anywhere for a while, as the viewing trends seemed relatively aligned over the years, but then I realized that this definition of "broadcast viewing" should probably also count the cable shows that I manually add to that number, most importantly Monday Night Football. When I gave the Monday broadcast rating a bounce of approximately 2.0 (MNF averaged about a 5.0, but only across about 40% of the regular-season Mondays), the "bc" on Monday and Sunday were about the same, and I had to add a bounce to high-viewed Sunday to re-optimize the correlation. The best number seems to be in the vicinity of 3.0.
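The Monday Night Football bounce is simply the game's average rating spread across every regular-season Monday:

```latex
\text{Monday bounce} \approx \underbrace{5.0}_{\text{avg. MNF rating}} \times \underbrace{0.40}_{\text{share of Mondays with a game}} = 2.0
```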

So, along the lines of the 10:00 competition adjustment, there's going to be a new "Sunday cable competition" number of 3.0 added to the "bc" count on Sundays from 8:00 to 11:00. This ends up being worth somewhere between 0.1 and 0.25 additional True points for most Sunday shows, depending on how big they are. While this adjustment still leaves Sunday as the "easiest" of the five nights (so there may still be some controversy about whether it is enough of an adjustment), the gap is much smaller than it used to be.
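A minimal sketch of the Sunday add-on, assuming start times are written as 24-hour decimal hours (20.5 = 8:30) and that "8:00 to 11:00" means slots starting at 8:00 through 10:30. The function name is hypothetical, and the step from "bc" to actual True points is not modeled here.

```python
SUNDAY_CABLE_COMPETITION = 3.0  # the placeholder value from the post


def sunday_adjusted_bc(bc: float, day: str, start_hour: float) -> float:
    """Add the Sunday cable placeholder to the broadcast competition count ("bc").

    start_hour uses 24-hour decimal time, e.g. 20.5 for 8:30 PM (assumed format).
    """
    if day == "Sunday" and 20.0 <= start_hour < 23.0:
        return bc + SUNDAY_CABLE_COMPETITION
    return bc


print(sunday_adjusted_bc(6.0, "Sunday", 21.0))  # 9.0
print(sunday_adjusted_bc(6.0, "Monday", 21.0))  # 6.0
```

Treating it as a flat constant on top of "bc" mirrors how the post describes it: a placeholder for uncounted cable competition rather than a per-program tally.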

Anyway, I'm not sure the way I went about getting this number is particularly sound mathematically, but it's clear I need some kind of placeholder that helps connect the Sunday True numbers to their actual renewal chances. It's not really that productive to have to apologize for the Sunday problem every time I do a Power Rankings, so maybe this will patch the wound somewhat. (Or maybe I will still be apologizing anyway.) In the future I hope to come up with some relatively labor-unintensive way of doing this with actual cable program numbers rather than the more "theoretical" approach used here, but I just didn't have the time to do that this summer. I'd have to apply that kind of approach to the whole week.

Additional tweaks. In addition to these three changes, I've also updated some of the "baseline" numbers in the formula, but I will not spend too much time boring you with those. Basically I updated the baselines for things like PUT, tendency PUT, and lead-in, and I also slightly decreased the size of the 10:00 competition adjustment (since the ratings in general got smaller). Most of these changes are not significant compared to the numbers on last year's wrap-up post.

Another thing this season is that I'm extending the competition add-on for 8:30-start cable sports events like Monday Night Football and Thursday Night Football into the 8:00 half-hour. This will really help smooth out the True scores for shows like How I Met Your Mother and The Big Bang Theory, whose True scores were far too low early in the fall (compared to later in the season). Yes, these sports shows technically start at 8:30, but there's probably a lot of pre-tune, and on the West Coast those 8:00 shows actually have to go up against the tail end of the 8:30 games, so there should be something added on at 8:00 anyway.
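Here's one way to picture the extended sports add-on. The post doesn't say how much is added in the 8:00 half-hour, so this sketch simply assumes the full game rating is counted there, and that the game covers the rest of primetime (through the 10:30 slot); both are assumptions, as are the names.

```python
def sports_addon(start_hour: float, game_rating: float,
                 game_start: float = 20.5, extend_to_8pm: bool = True) -> float:
    """Competition add-on for a cable sports event in one half-hour slot.

    start_hour uses 24-hour decimal time (20.0 = 8:00 PM). With extend_to_8pm,
    the 8:00 half-hour before an 8:30 kickoff is also charged, reflecting
    pre-tune and West Coast overlap; the amount charged there is an assumption.
    """
    earliest = 20.0 if extend_to_8pm else game_start
    if earliest <= start_hour < 23.0:
        return game_rating
    return 0.0


print(sports_addon(20.0, 5.0))                       # 8:00 half-hour now counts
print(sports_addon(20.0, 5.0, extend_to_8pm=False))  # the old behavior
```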

2 comments:

Spot said...

Great news all around. I appreciate all the patience in explaining it to us. I am happy with all the adjustments. The compatibility thing was an issue I've had for quite a while and I am very glad that you are addressing it. I think it is a shame that we don't have regular access to more demo breakdown (18-34 for instance) as that would for sure help too. I think eventually you'll extend it to the competition compatibility too but I understand how you would need more information in order to do so. As for the Sunday eternal issue, I do think you were still not aggressive enough because at this point I almost feel like Sunday is the new Friday for most of the networks in terms of difficulty, so it makes no sense to me that it is still the easiest of the five weekdays. But, hey, it's your formula, and I already appreciate that at least part of the effect was reduced! Looking forward to the best case/ worst case now!

Spot said...

Great work on this blog! I wish that you had access to gender splits.


© SpottedRatings.com 2009-2018. All Rights Reserved.