Editorials and Articles Archive

Saves, Shocks, and Sigmas

Our readers ask, and we answer...or at least we dodge the question deftly

15 March 2015

We haven't written a new editorial in a while, and other than the fact that executive producer Per Blankens has completely lost his mind with this new elimination format – "Torture the contestants, then expect them to sing well", yeah, that's brilliant – there really isn't much to write about. Well, at least our mailbox has served up a few questions we ought to address. So, on a quiet late-winter Sunday night, here goes nothing...

Let's start with the burning question of the weekend, at least at the intersection of American Idol Boulevard and Library Sciences Way: will we catalog and rate the Save performances this year?

Our mail has been running about 2 to 1 in favor of doing so. Several people (Correspondents Andrew, Matthew, and 'Idol Maniac' in particular) made compelling cases. We still think Correspondent Ben has the single best case, and he's on the "No" side. Plus, we are still scratching our heads over this: if Save performances truly matter, then why haven't we been tracking them for the past six years? That they now happen on performance nights rather than result nights seems a pretty thin distinction.

It's a very close call. After pondering the matter for the weekend, we have decided to do what most 21st Century political and industrial leaders do in a difficult situation like this....

...and that's kick the can down the road for a while. We do have a justification, at least. We feel that we would, at the very least, need to add a new result status to the database: "Kaput and Unsaved", in essence. And that would require rewriting approximately two trillion programming functions and database views, conservatively speaking. We still shudder at the effort it took to implement and debug everything when the Save was introduced. Longtime WNTS readers may recall how many years Duets and Trios played havoc with the averages on the site. Can-kicking rarely looks so attractive.

We will, however, compute a rating for the Save performance each week, until either the Save is used or the producers wise up. Sarina-Joi Crowe's this past Thursday was 38, with a large standard deviation of 23.

Speaking of Crowe, her stunning 12th-place elimination has the Idolsphere mostly up in arms. Partly this is because shock boots simply don't happen as often as they used to.

While your take on things may differ, the last time a clear contender failed to outlast a clear midcard candidate, let alone an undercarder, was in 2011. Pia Toscano, the highest-rated contestant in our database at 80.8, finished ninth in AI10 behind the likes of Paul McDonald (whom we alone seem to have really liked, 43.3), Stefano Langone (49.4), and Jacob Lusk (45.4, though in fairness Lusk's average plummeted mostly at the very end.) She also finished behind sixth-place Casey Abrams, whom the judges had foolishly 'saved' two weeks earlier. Abrams was a strong and creative midcard contestant who didn't deserve to finish 11th, but neither was he worth keeping off the scrap heap while the calendar still read March.

Before that, nothing terribly earth-shattering had happened for several years. Siobhan Magnus (62.4) went out a respectable sixth in Season 9, outlasted by Aaron Kelly (41.0); the judges had justifiably used their Save on Big Mike Lynche (55.9) in the Final Nine. Alexis Grace (67.3) finished 11th in Season 8, outlasted by a bunch of nice folks we don't wish to discuss. Grace and Crowe followed the same career trajectory: two highly acclaimed performances, one slip-up, au revoir. Grace, however, had utterly no prayer of being saved. (We will go to our graves asserting that the Save rule, introduced that season, was devised solely for the protection of Adam Lambert. Incidentally, we'd say that Danny Gokey outlasting Allison Iraheta that year was as criminal and outrageous as any outcome that has occurred since, including Crowe's unfortunate ouster, but that's just our opinion.)

While it's tempting to credit the Judges' Save for this relatively serene stretch of results, the truth is that it hasn't really been much of a factor. With the exception of Season 11, when it kept eventual second-place finisher Jessica Sanchez from finishing seventh, its beneficiaries have mostly been midcard contestants. And, while Sanchez getting the boot in the middle of April would indeed have been an injustice, it's not like the other six Idols remaining that year were chopped liver.

No, the credit belongs in fact to the much-maligned voters; the teenyboppers and grandmas that Idol analysts are always at the ready to blame. As we showed in our season-ending editorial from 2014, they've been ruthlessly on target for many years, and especially so in AI12 and AI13. They may have lulled the Idolsphere into a false sense of security. Regardless, we think the judges' decision to eschew the Save was dead-on correct. There very well may not be a boot-ee more worthy of rescue this year than Crowe, and it's downright laughable that she will not be on the Summer tour. But, it's way too early to make an all-or-nothing commitment to that sentiment. Ask Pia Toscano.

We received a few letters this week from folks who wanted to know how Joey Cook could have an overnight approval rating of 88 with a standard deviation of 19. Doesn't that suggest that the high end of her vote expectation range was 107? Isn't that impossible?

Answers: No. Yes. Not really.

When we developed the Project WNTS rating system back in 2005, we mapped the 'raw' results up to that point onto a 0-to-100 scale, artificially widening the normal distribution curve and leaving some room on both ends for future superlative performances. In fact, while there is a practical limit to how high or low an approval rating can be, there is no mathematical limit. If we are someday treated to a performance that every single reviewer who rates performances scores as, say, six of their personal standard deviations above their historical mean, then we're going to plug that number into Excel and publish whatever jaw-dropping number it spits back at us -- probably about 150, we'd surmise.
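To make the "no mathematical limit" point concrete, here is a minimal Python sketch. It assumes, hypothetically, that each reviewer's grade is turned into a z-score against that reviewer's own history and then mapped linearly onto a scale centered at 50. The function name `to_centigrade` and the `scale` of 20 points per standard deviation are placeholders, not the WNTS formula, which lives in Excel and varies by season. The only thing illustrated is that nothing in a conversion like this clamps the output to 0-100.

```python
def to_centigrade(score: float, reviewer_mean: float, reviewer_sd: float,
                  scale: float = 20.0) -> float:
    """Hypothetical conversion: z-score against the reviewer's own history,
    then a linear map centered at 50. `scale` (points per SD) is a placeholder."""
    if reviewer_sd <= 0:
        raise ValueError("reviewer_sd must be positive")
    z = (score - reviewer_mean) / reviewer_sd
    # Deliberately no min()/max() clamp: the result can leave 0..100 in either direction.
    return 50.0 + scale * z


# An invented reviewer who grades out of 100, averaging 70 with an SD of 8.
for grade in (90, 96, 20):
    print(grade, to_centigrade(grade, reviewer_mean=70, reviewer_sd=8))
```

With these placeholder numbers, a grade 2.5 of the reviewer's own standard deviations above their mean lands exactly on 100, anything higher spills past it, and a truly dismal grade goes negative.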

Considering nobody's ever climbed higher than 96 in thirteen-plus years, a three-digit approval rating seems astronomically unlikely. It would require an unprecedented consensus of transcendental excellence. A negative rating, however, cannot be ruled out, because Idol contestants have a nasty habit of plumbing unforeseen depths. Lazaro Arbos was briefly at zero during the tally for Close To You, scaring the hell out of our IT Department, as they had no idea what a negative rating would do to the CSS that produces the "star bars" in the results table. As it turns out, it can handle it. Lucky.

The Law of Large Numbers puts a practical limit on the consensus approval ratings that we publish, just as we intended. It is certainly possible, however, for an individual rating to map to something outside our Centigrade scale. In fact, it happens all the time. We would have to unwind a boatload of Excel and SQL code to figure out exactly how many standard deviations above their own average an established reviewer would have to grade a performance for it to convert to a 100 (it varies season to season in a plethora of ways), but it's surely below 2.5. If we went, say, 80 performance grades without a single one cracking the century mark after conversion, either something would be very wrong with our methodology or something would be very wrong with the singers that night. We'd bet on the latter.
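A rough back-of-the-envelope check, in Python, assuming (purely for illustration) that converted grades are approximately normal and independent. It shows how often a grade would clear a given number of standard deviations, which threshold roughly one grade in 80 clears, and how much less an average of many grades wanders than any single grade does. That last part is also why an 88 with a standard deviation of 19 does not imply the published rating could plausibly be 107: the 19 describes spread among individual opinions, not uncertainty in the consensus. The reviewer count of 100 is a hypothetical figure, not a WNTS number.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, sd 1; normality is an assumption


def share_above(z_threshold: float) -> float:
    """Share of grades expected above z_threshold SDs, if grades were normal."""
    return 1.0 - STANDARD_NORMAL.cdf(z_threshold)


def z_for_one_in(k: float) -> float:
    """SD threshold that roughly one grade in k would exceed."""
    return STANDARD_NORMAL.inv_cdf(1.0 - 1.0 / k)


def standard_error(sd: float, n: int) -> float:
    """Typical wobble of an average of n independent grades (Law of Large Numbers)."""
    return sd / math.sqrt(n)


print(f"about 1 in {1 / share_above(2.5):.0f} grades clears 2.5 SD")
print(f"1 in 80 grades clears about {z_for_one_in(80):.2f} SD")
print(f"s.d. 19 averaged over 100 grades -> standard error {standard_error(19, 100):.1f}")
```

Under these assumptions, 2.5 standard deviations is cleared by roughly one grade in 160, while a threshold near 2.24 is cleared by about one in 80.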

Besides, the standard deviation of every WNTS approval rating is driven more by "rankers" than by "raters." That's because people who rank the performances on the Interwebs (which includes people who just give a Best 3 and Worst 3) greatly outnumber those who rate them (including our Review Crew). As surely every regular WNTS reader understands, the higher the s.d., the more likely the performance rating will regress towards 50.
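A toy Python illustration of why disagreement among rankers drags a rating toward the middle: when rankers split into camps, the average placement lands mid-field. The placements below are invented, and `place_to_scale` is a hypothetical linear map from average placement to a 0-100 position (1st becomes 100, last becomes 0), not how WNTS actually converts ordinals.

```python
from statistics import mean, pstdev


def place_to_scale(avg_place: float, field_size: int) -> float:
    """Hypothetical linear map: 1st place -> 100, last place -> 0."""
    return 100.0 * (field_size - avg_place) / (field_size - 1)


FIELD = 12
consensus = [2, 1, 2, 3, 2, 1, 2, 3]      # rankers broadly agree it's near the top
polarized = [1, 1, 1, 1, 12, 12, 12, 12]  # half love it, half hate it

for label, places in (("consensus", consensus), ("polarized", polarized)):
    spread = pstdev(places)
    position = place_to_scale(mean(places), FIELD)
    print(f"{label}: spread {spread:.2f}, position {position:.1f}")
```

The polarized performance has four times as many first-place votes as the consensus one, yet its high spread lands it at exactly 50.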

There are exactly two ways that a performance with a high standard deviation can still achieve a high approval rating, and both are rare. In fact, if we'd anticipated these corner cases ten years ago, we'd probably have devised a correction for them. But we didn't, which is just as well.

Corner Case #1: The rankers are all over the board about a performance, but it nonetheless finishes first on ordinal points, and the subset of raters is far more uniformly positive about it. Recall that our basic approach is to calculate the episode average (still the single most critical calculation each week), add up the ordinal points for every contestant, see who came in first and last, calculate their ratings directly, and then have Excel "bend the curve" for the remaining singers. This is precisely how Bo Bice got to 90 for Whipping Post while lugging a sigma of 20 -- some "rankers" were less than enamored of the seminal performance of Epoch Two, having never seen any Idol contestant do THAT to a microphone stand before, but a disproportionate number of the "raters" scored it sky high.
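The basic weekly approach described above can be sketched roughly in Python as follows. Everything here is an assumption for illustration: Borda-style ordinal points (n-1 for first in a field of n, 0 for last) tallied from complete ballots, first- and last-place ratings passed in as given (the text says they are calculated directly but not how), and a straight-line interpolation standing in for Excel's curve-bending, whose actual shape is unpublished. The episode-average step is left out, and the singers and ballots are invented.

```python
from collections import defaultdict


def ordinal_points(ballots: list[list[str]]) -> dict[str, int]:
    """Borda-style tally (hypothetical scheme): in a field of n, 1st earns n-1, last earns 0."""
    points: dict[str, int] = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for place, singer in enumerate(ballot):
            points[singer] += n - 1 - place
    return dict(points)


def bend_curve(points: dict[str, int], top_rating: float,
               bottom_rating: float) -> dict[str, float]:
    """Pin first and last to their directly computed ratings, then interpolate
    everyone else linearly by ordinal points (a stand-in for the real curve)."""
    hi, lo = max(points.values()), min(points.values())
    span = hi - lo
    ratings = {}
    for singer, p in points.items():
        frac = (p - lo) / span if span else 1.0
        ratings[singer] = bottom_rating + frac * (top_rating - bottom_rating)
    return ratings


ballots = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["A", "B", "D", "C"],
]
pts = ordinal_points(ballots)
print(pts)
print(bend_curve(pts, top_rating=83.0, bottom_rating=30.0))
```

Because the interpolation runs off ordinal points, a singer who finishes a hair behind the leader on points lands a hair below the leader's rating, whatever the spread of opinion about that singer.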

Even then, though, there is a limit to how high an s.d. can be, because if your variance among rankers is wide, there's little chance you can finish first on ordinal points and thus give the raters the opportunity to bear you on their shoulders. Bice and Katharine McPhee have delivered the only two "showstopper" performances with an s.d. of 20. So what's your second option then?

Corner Case #2: Finish second (or third) on the night, but stay close to the leader. If their rating is very high, and you're just a hair behind them, then yours will be too. This explains what we feel is, by leaps and bounds, the most crazily improbable combination in our database: Adam Lambert's astounding 82/29 on Black Or White. Tons of people loved it. Quite a few hated it. Lambert didn't win the ordinals that night: Iraheta did, just barely, with Give In To Me. That rated out to an 83, and with Lambert (and Grace, and Lil Rounds) all bunched up near the top, "Black Or White" was carried along in its wake to 82.

You may ask, what would Lambert's rating have been if he had finished first that night? Maybe Corner Case #1 would still have applied? Short (and honest) answer: we don't remember, and attempting to "unwind" the database to that date to find out would get us checked into the local mental ward. It was surely very high though – generally speaking, "The Glambert" was much better received by the professional journalists and bloggers who rate each performance than he was in the forums, where his detractors tended to congregate. From eyeballing our S8 spreadsheet, it looks like the high 70s.

Finally, a true "editorial" finish to this editorial. We generally avoid chiming in on any particular performance rating during the season, but what the hell...

Cook will not reach 90 for Fancy this week. She's well ahead on ordinal points, but the performance's standard deviation is just too high, even among rated reviewers only, to get to the magical "showstopper" number. As we type this, she looks to be too far away to get there even through post-season normalization.

That's a pity. Because quite honestly, even if it's only the WNTS Math Dept.'s personal opinion, that was one of the ten greatest and most imaginative performances we have ever seen on American Idol. For all the headaches and agita this show gives us, it's moments like that – a busker who, inspired by a YouTube video, sings a crappy Iggy Azalea song as though she were Edith Piaf in a 1930s French cabaret – that keep us tuning in.

See you Thursday night.

- The WNTS.com Team
