Inman

Zillow Prize will award $1M to the person or team who most improves Zestimate

In 2009, Netflix gave $1 million to the BellKor’s Pragmatic Chaos team for building a movie rating prediction algorithm that outperformed the one Netflix had been using by 10.06 percent.

It appears that 2017 is the year Zillow follows in the media platform’s footsteps; the company just announced Zillow Prize, which will award $1 million to the person or team who most improves the algorithm for Zestimate, Zillow’s home valuation tool.

How to win the Zillow Prize

“Zillow Prize calls for data scientists, engineers and visionaries to compete to improve automated home valuations of 110 million homes across the U.S.,” stated the company in its press release.

This is the first time that Zillow will make “a portion of the proprietary data that powers the Zestimate home valuation” available to anybody outside of the company — which is no doubt exciting to those who’ve longed to know what’s behind the algorithm that Zillow calls “one of the highest-profile, most accurate and sophisticated examples of machine learning.”

“We still spend enormous resources on improving the Zestimate, and are proud that with advancements in machine learning and cloud computing, we’ve brought the error rate down to 5 percent nationwide,” said Stan Humphries, creator of the Zestimate home valuation and Zillow Group chief analytics officer, in a statement.

“While that error rate is incredibly low, we know the next round of innovation will come from imaginative solutions involving everything from deep learning to hyperlocal data sets — the type of work perfect for crowdsourcing within a competitive environment.”

Kaggle, a platform that connects data scientists with machine learning problems, will administer the contest. A public qualifying round kicks off today, and participants have until October 16 to register for the contest and download the competition data set, then submit a developed model that will improve the Zestimate residual error. The first data set is a list of real estate properties (and details about those properties) in Los Angeles, Orange and Ventura Counties in California.

Humphries said that the stage-one data set contains “all real estate transactions that are full-value, arm’s-length sales for the past 10 years in those counties,” including one file with a record for each home listing the public attributes of the home (square footage, beds and baths, and so on), and a separate file with the transaction information for that home, including the Zestimate, the actual sales price and the residual error rate, which Humphries describes as “the difference between our valuation of that home, what we expected it to sell for, and what it actually sold for. There’s a residual error figure for every transaction that has taken place.”

The developers’ task will be to minimize that residual error. “That’s what the first round is all about: Their task is to minimize the error left after our algorithm runs on it,” explained Humphries.
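To make the idea of a residual error concrete, here is a minimal sketch in Python. It assumes the residual is simply the Zestimate minus the actual sale price, and it summarizes a set of transactions with a mean absolute residual. The function names, the sign convention, the summary metric and the sample figures are all illustrative assumptions, not the contest's actual scoring rules or Zillow's data.

```python
from statistics import mean


def residual_error(zestimate, sale_price):
    """Signed gap between the valuation and the actual sale price.

    Assumed convention: positive means the valuation was too high.
    """
    return zestimate - sale_price


def mean_absolute_residual(transactions):
    """Average size of the residual across (zestimate, sale_price) pairs."""
    return mean(abs(residual_error(z, p)) for z, p in transactions)


# Hypothetical (zestimate, sale_price) pairs, not real Zillow data.
transactions = [(500_000, 520_000), (310_000, 300_000), (750_000, 750_000)]

residuals = [residual_error(z, p) for z, p in transactions]
print(residuals)
print(mean_absolute_residual(transactions))
```

Under these assumptions, a first-round entry would be a model that predicts each residual well enough that correcting for it shrinks the average.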

Zillow will begin evaluating submissions on October 17 and will unveil a leaderboard of the top 100 teams — who will be invited to compete for the $1 million grand prize — on January 17, 2018, in preparation for a private final round of competition that will kick off on February 1, 2018. The company plans to select a final winner on or near January 15, 2019.

“In the final round, the winning team must build an algorithm to predict the actual sale price itself, using innovative data sources to engineer new features that will give the model an edge over other competitors,” said Zillow in the release. “The home value predictions from each algorithm submission will be evaluated against real-time home sales in August through October 2018.

“To take home the $1 million dollar grand prize, the winning algorithm must beat Zillow’s benchmark accuracy on the final round competition data set and enhance the accuracy further than any other competitor.”

This data set is much larger: It is national rather than limited to three counties, and it includes sales from 2016 and going back 20 years. Participants will also be attempting to predict the sale price of the home instead of improving the Zestimate residual error.
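The winning condition Zillow describes for the final round has two parts: beat the company's benchmark, and beat every other competitor. A rough sketch of that rule in Python follows. It assumes accuracy is expressed as a single error number per team where lower is better; the function name, team names and figures are hypothetical, and the release does not say how ties or the exact metric are handled.

```python
def pick_grand_prize_winner(benchmark_error, team_errors):
    """Return the team with the lowest error if it beats the benchmark.

    team_errors maps a team name to its error (lower is better, an assumption).
    Returns None when there are no teams or nobody beats the benchmark.
    """
    if not team_errors:
        return None
    best_team = min(team_errors, key=team_errors.get)
    if team_errors[best_team] < benchmark_error:
        return best_team
    return None


# Hypothetical error figures for illustration only.
final_round = {"team_a": 0.048, "team_b": 0.051, "team_c": 0.046}
print(pick_grand_prize_winner(0.05, final_round))
```

Note that under this reading, the grand prize could go unclaimed if no finalist improves on the benchmark.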

And there’s a twist that Humphries is especially excited about: “The second-round teams are allowed to use outside data,” he revealed. “They’re not allowed to use listing information — we’re not interested in a model that takes the listing price and predicts the sales price because it’s not an unbiased estimate of the value of the home. But they’re allowed to use any other outside data and integrate it with our data set to make that prediction.”

This is a big deal, Humphries says, because “we really do believe that moving forward, a lot of the gains that we’ll see in the Zestimate will come from hyperlocal data sets or hyperlocal algorithms, where people are doing things or tweaking it for local conditions and are finding data sets that we’ve got on our list to look at.”

That list is long, he acknowledged, and by crowdsourcing the project, they could identify even more data sets. “Things like seismic ratings for where your home is, whether you’re living under an airport flight path, pollution data, ambient noise around your home, toxic waste information, water quality — tons of stuff that is available out there,” Humphries explained.

There will also be a $100,000 second-place prize and a $50,000 third-place prize in the final round of the contest, and the top three teams in the qualifying round will split a $50,000 prize, too.

You can learn more about registering and competing for the Zillow Prize at www.zillow.com/zprize. Humphries says he’s been “delighted” so far with the reaction, which was swift and enthusiastic.

“We’re excited to harness the global data science community to improve something that’s going to be used by consumers,” he said.

Zestimate in the news

The Zestimate has been in the news lately due to a couple of lawsuits filed by Illinois real estate attorney Barbara Andersen. Earlier in May, Andersen filed suit against the company because she said the Zestimate on a condo she was trying to sell undervalued the property by about $100,000.

Andersen then dropped her initial suit in order to file a class-action lawsuit on behalf of a developer in Illinois; she will be representing any plaintiffs in the suit.

“To homeowners, sellers and buyers, the Zestimate home valuation remains an important data point. Combined with other information, like recent home sales, and the guidance of real estate professionals, the Zestimate helps consumers make smarter financial decisions about their homes,” stated Zillow in its press release about Zillow Prize.

In the release, Zillow said that it publishes Zestimates on more than 110 million homes based on 7.5 million “statistical and machine learning models.” Data points for the Zestimate come from “county and tax assessor records, and direct feeds from hundreds of multiple listing services and brokerages,” according to Zillow.

“Additionally, homeowners have the ability to update facts about their homes and see an immediate change to their Zestimate. More than 70 million homes on Zillow have been updated by the community of users,” Zillow stated in the release.

Email Amber Taufen
