The Trouble with Venture Capital Data — This is going to be BIG

There aren't many people who get the chance to analyze venture capital fund return data. You'd have to work for a very limited number of fund performance tracking firms, like Thomson Reuters, Cambridge, etc., or be an institution big enough to see a ton of different funds over time.

So when someone gets access to a dataset, no matter how incomplete, it's no surprise they'll rush into making lots of declarative statements about how the asset class performs without even a reasonable gut check.

Mattermark just posted a short report full of such statements and the former 21-year-old institutional LP analyst in me (the job I got my VC start in over 15 years ago) flipped his shit upon close review.

Here's everything wrong with this dataset:

1) Sounds big, but really isn't.

First, the author says the dataset is "a dataset compiled by Bloomberg, covering 3,300 individual funds and 1,600 general partners". Sounds like a huge amount, but only later does he say that there were just "476 funds which had known Net IRR values, the overwhelming majority of which were from vintage 2002, or more recently."

VC firms raise a new fund every 3-4 years on average--and many raise more often than that. Conservatively, though, a dataset of mostly 2002-or-later funds would cover about 4 funds from each firm over that timeframe. That means if you have 476 funds, you're looking at about 119 managers. Now, not every manager sticks around through four funds, but even if you're generous it's probably no more than 200 managers. If it's much more than that, you're probably looking at a bunch of newbie managers on their first or second fund whose performance is too new to judge.
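Here's that back-of-the-envelope math as a minimal sketch. The function name is made up for illustration, and two inputs are my assumptions rather than figures from the report: a data year of 2016 (implied by the timeline later in this post) and 3.5 years between funds (the midpoint of 3-4).

```python
def implied_manager_count(funds, first_vintage, data_year, years_between_funds):
    """Rough number of managers if every firm raised funds on a steady cadence."""
    span_years = data_year - first_vintage            # 2016 - 2002 = 14 years
    funds_per_firm = span_years / years_between_funds  # 14 / 3.5 = 4 funds each
    return funds / funds_per_firm

# Assumptions: data year 2016, one fund every 3.5 years.
managers = implied_manager_count(476, 2002, 2016, 3.5)
print(round(managers))  # 119
```

Shorten the cadence or let firms drop out early and the count goes up, which is why 200 is the generous ceiling, not 119.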

So what percent of the market is that? Well, CB Insights lists somewhere in the neighborhood of 500 active VC firms as of 2013--meaning firms that did 4 or more deals that year. The NVCA pegs the number of firms in this period at 900, but either way, it's multiples larger than whatever dataset Bloomberg has. Because returns are so positively and unevenly skewed, it's really hard to say you know a ton about the asset class from a sample that small.
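To put rough numbers on the coverage, here's a quick sketch using the two manager estimates from above (119 and 200) against the two firm counts (500 and 900). The manager counts are my estimates, not anything Bloomberg or Mattermark published, and the helper name is just for illustration.

```python
def coverage(managers_in_dataset, active_firms):
    """Share of active firms that the dataset could plausibly represent."""
    return managers_in_dataset / active_firms

# Estimated managers in the dataset vs. firm counts cited above.
for managers in (119, 200):
    for firms in (500, 900):
        print(f"{managers} of {firms} firms: {coverage(managers, firms):.0%}")
```

Depending on which pair you pick, that's somewhere between roughly 13% and 40% of firms, and in an asset class this skewed, missing a handful of the best performers changes the whole picture.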

2) Saying "long term returns" when you only have vintage 2002 and after data is a joke.

Companies take a long time to exit--often 5-9 years. If funds put their money to work over 3-4 years, how long is it before you really know much about a VC fund?

The midway point of this dataset is 2009.

The average company of a 2009 fund was funded in 2011, just five years ago, and half the companies in that fund are less than five years old.

That means that half of the deals in half of the funds in this dataset are younger than Munchery, Duolingo, FiftyThree, and Codecademy.

Seems a little presumptuous to know exactly how that younger half is actually going to turn out when you look at it that way, doesn't it?
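The timeline math, sketched out. I'm assuming a data year of 2016, a 4-year deployment period (the top of the 3-4 year range), and money going out evenly over that period, so the average deal lands at the halfway point. The function name is hypothetical.

```python
def average_deal_age(vintage, data_year, deployment_years):
    """Age of the average deal, assuming capital is deployed evenly over time."""
    avg_funding_year = vintage + deployment_years / 2  # halfway through deployment
    return data_year - avg_funding_year

# Midpoint vintage of a 2002-2016 dataset.
midpoint_vintage = (2002 + 2016) // 2
print(midpoint_vintage)                             # 2009
print(average_deal_age(midpoint_vintage, 2016, 4))  # 5.0
```

Five years old on average, against a 5-9 year path to exit. Most of those outcomes simply haven't happened yet.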

You know what a long term dataset is? Try one that starts with Accel I or Battery I back in the early 80's. That's some long term, multiple cycle kind of data. That's what we had at the General Motors pension fund when I was there. I mean, even forget the 80's for a moment. Analyzing venture returns without even looking at the 90's?

What is this? The baseball steroid era? Did we just forget that decade even happened? That's a lot of data to ignore.

Plus, if you were analyzing data from funds, you wouldn't necessarily compare absolute sizes. Fund sizes on average have grown every year as the asset class has changed. Companies are staying private longer than they did and so the growth rounds being raised are unique to this era. If you really want to compare big versus small, you'd try to normalize the size data across time.
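One simple way to normalize, as a sketch: measure each fund against the median fund size of its own vintage year instead of using raw dollars. This is just one possible approach, not what the report did, and the fund sizes below are made up purely for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (vintage, fund size in $M) pairs -- not real data.
funds = [
    (2002, 100), (2002, 200), (2002, 300),
    (2012, 200), (2012, 400), (2012, 600),
]

def relative_sizes(funds):
    """Return (vintage, size, size relative to that vintage's median size)."""
    by_vintage = defaultdict(list)
    for vintage, size in funds:
        by_vintage[vintage].append(size)
    medians = {v: median(sizes) for v, sizes in by_vintage.items()}
    return [(v, size, size / medians[v]) for v, size in funds]

for vintage, size, rel in relative_sizes(funds):
    print(vintage, size, rel)
```

In this toy data, a $200M fund is a median-sized fund in 2002 (1.0) but a small one in 2012 (0.5). Same dollars, different meaning of "big."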

3) You can't eat IRRs for breakfast.

You know what the IRRs of Quirky and Fab were before they went to zero? Really really big.

Looking at interim IRRs of young funds, especially during boom periods in venture, isn't predictive of outcomes. Sometimes you've got a Google and other times it's a Pets.com.

The only real way to know?

Cash in the bank. Cash on cash returns, which take a long time to get, if you get them at all, are really the only true judge of VC returns.

Making investment decisions based on the private company markup of the last idiot who put money into something to get a preferred share that comes out first if shit hits the fan... well, good luck with all that.
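To make the paper-versus-cash distinction concrete, here's a sketch using the two standard LP multiples: DPI (distributions to paid-in capital, i.e. cash actually returned) and TVPI (total value to paid-in, which also counts unrealized markups). The fund numbers are hypothetical.

```python
def dpi(distributions, paid_in):
    """Cash actually returned to LPs per dollar paid in."""
    return distributions / paid_in

def tvpi(distributions, nav, paid_in):
    """Cash returned plus unrealized (marked-up) value per dollar paid in."""
    return (distributions + nav) / paid_in

# Hypothetical young fund, in $M: mostly paper gains so far.
paid_in, distributions, nav = 100.0, 10.0, 240.0

print(tvpi(distributions, nav, paid_in))  # 2.5
print(dpi(distributions, paid_in))        # 0.1

# If the marked-up companies go to zero, TVPI collapses to DPI.
print(tvpi(distributions, 0.0, paid_in))  # 0.1
```

A 2.5x on paper and ten cents on the dollar in the bank are the same fund on the same day.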

4) Gut check.

Sometimes, you have to pull back from the data and just think rationally. If you don't think that it's harder to put more money to work in venture than less, in a world where companies do more with less, and great deals are scarce, then I don't know what color the sky is in your world.

If you have a $500 million fund, and you invest in two unicorns, and somehow maintain your pro rata, owning 20% of each, you still haven't even returned capital to your LPs.

In my dinky little sub-$25mm seed fund, a billion dollar outcome returns the whole fund three times over--just one, let alone what all the other deals do.
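The fund math, as a rough sketch. It treats the 20% as the fund's ownership stake in each company at exit, values each unicorn at exactly $1 billion, and ignores fees and carry. The 7.5% seed fund stake is my assumption, picked because that's what a 3x return on a $25 million fund from one billion dollar exit implies.

```python
def gross_fund_multiple(fund_size, exit_values, ownership):
    """Gross exit proceeds as a multiple of fund size (no fees, no carry)."""
    proceeds = sum(value * ownership for value in exit_values)
    return proceeds / fund_size

# $500M fund, two $1B exits, 20% ownership in each.
big_fund = gross_fund_multiple(500e6, [1e9, 1e9], 0.20)

# $25M seed fund, one $1B exit, 7.5% ownership at exit (assumed).
seed_fund = gross_fund_multiple(25e6, [1e9], 0.075)

print(f"{big_fund:.1f}x")   # 0.8x
print(f"{seed_fund:.1f}x")  # 3.0x
```

Two unicorns and the big fund is still at 80 cents on the dollar. One and the small fund is done.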

Venture doesn't scale--and if the data doesn't point that out, then I think you have to start questioning your data, or deciding that the data you have isn't venture. When Uber raises billions in a single round, seems weird to put that in the same asset class as the $350k I just gave to two people and some duct tape.

Don't believe everything you read.

May 16 The Trouble with Venture Capital Data
