• Homo Deus by Yuval Harari (Book Review) July 21, 2017 3:12 pm
    I previously wrote a review of Yuval Harari’s Sapiens, which I highly recommended despite fundamentally disagreeing with one of its central arguments. Unfortunately, I cannot say the same about Homo Deus. While the book asks incredibly important questions about the future of humanity, it not only comes up short on answers but, more disappointingly, presents caricature versions of other philosophical positions. I nonetheless finished Homo Deus because it is highly relevant to my own writing in World After Capital. Based on some fairly positive reviews, I kept expecting a profound insight until the end, but it never came.

    One of the big recurring questions in Homo Deus is why we, Homo Sapiens, think ourselves to be the measure of all things, putting our own interests above those of all other species. Harari blames this on what he calls the “religion of humanism,” which he argues has come to dominate all other religions. There are profound problems both with how he asks this question and with his characterization of Humanism.

    Let’s start with the question itself. In many parts of the book, Harari phrases and rephrases this question in a way that implies humanity is being selfish, or speciesist. For instance, he clearly has strong views about the pain inflicted on animals in industrial meat production. While it is entirely fine to hold such a view (which I happen to share), it is not good for a philosophical or historical book to let it guide the inquiry. Let me provide an alternative way to frame the question. On airplanes the instructions are to put the oxygen mask on yourself first, before helping others. Why is that? Because you cannot help others if you are incapacitated due to a lack of oxygen. Similarly, humanity putting itself first does not automatically have to be something morally bad. We need to take care of humanity’s needs if we want to be able to assist other species (unless you want to make an argument that we should perish). That is not the same as arguing that all of humanity’s wants should come first. The confusion between needs and wants is not mentioned at all in Homo Deus, but it is an important theme in the wonderful “How Much is Enough” by Edward and Robert Skidelsky and in my book “World After Capital.”

    Now let’s consider Harari’s approach to Humanism. For someone who is clearly steeped in history, Harari’s definition of Humanism confounds Enlightenment ideas with those arising from Romanticism. For instance, he repeatedly cites Rousseau as a key influence on “Humanism” (in quotes to indicate that this is Harari’s definition of it), but Rousseau was central to the Romanticist counter-movement to the Enlightenment, which Voltaire championed. If you want an example of a devastating critique, read Voltaire’s response to Rousseau. One might excuse this commingling as a historical shorthand, seeing how quickly Romanticism followed the Enlightenment (Rousseau and Voltaire were contemporaries) and how much of today’s culture is influenced by romantic ideas. Harari makes a big point of the latter, frequently criticizing the indulgence in “feelings” that permeates so much of popular culture and has also invaded politics and even some of modern science. But the commingling is a grave mistake, as it erases a 200-year history of secular, Enlightenment-style humanist thinking that does not give primacy to feelings.
Harari pretends that we have all followed Rousseau, when many of us are in the footsteps of Voltaire.

    This is especially problematic, as there has never been a more important time to restore Humanism, for the very reasons of dramatic technological progress that motivate Harari’s book. Progress in artificial intelligence and in genomics makes it paramount that we understand what it means to be human before taking steps toward what could be a post-human or transhuman future. This is a central theme of my book “World After Capital,” in which I provide a view of Humanism that is rooted in the existence and power of human knowledge. Rather than restate the arguments here, I encourage you to read the book.

    Harari then goes on to argue that progress in AI and genetics will undermine the foundations of “Humanism,” making room for new “religions” of transhumanism and “Dataism” (which may be a Harari coinage). These occupy the last part of the book, and again Harari engages with caricature versions of the positions, which he sets up based on the most extreme thinkers in each camp. While I am not a fan of some of these positions, which I believe run counter to some critical values of the kind of Humanism we should pursue, their treatment by Harari robs them of any intellectual depth. I won’t spend time here on these, other than to call out a particularly egregious section on Aaron Swartz, whom Harari refers to as the “first martyr” for Dataism. This is a gross mistreatment of Aaron’s motivations and actions.

    There are other points where I have deep disagreements with Harari, including on the existence of free will. Harari’s position, that there is no free will, feels inspired by Sam Harris in its absolutism. You can read my own take. I won’t detail these other disagreements now, as they are less important than the foundational misrepresentation of what Humanism has been historically and the ignorance of what it can be going forward.
  • Uncertainty Wednesday: Continuous Random Variables (Cont’d) July 19, 2017 11:37 am
    Last time in Uncertainty Wednesdays, I introduced continuous random variables and gave an example of a bunch of random variables following a Normal Distribution.

    In the picture you can see two values, denoted as μ and σ^2, for the different colored probability density functions. These are the two parameters that completely define a normally distributed random variable: μ is the Expected Value and σ^2 is the Variance.

    This is incredibly important to understand. All normally distributed random variables have only 2 free parameters. What do I mean by “free” parameters? We will give this more precision over time, but for now think of it as follows: a given Expected Value and Variance completely define a normally distributed random variable. So even though these random variables can take on an infinity of values, the probability distribution across these values is very tightly constrained.

    Contrast this with a discrete random variable X with four possible values x1, x2, x3 and x4. Here the probability distribution p1, p2, p3, p4 has the constraint that p1 + p2 + p3 + p4 = 1, where pi = Prob(X = xi). That means there are 3 degrees of freedom, because the fourth probability is determined by the first three. Still, that is one more degree of freedom than for the Normal Distribution, despite there being only four possible outcomes (instead of an infinity).

    Why does this matter? Assuming that something is normally distributed provides a super tight constraint. This should remind you of the discussion we had around independence. There we saw that assuming independence is actually a very strong assumption. Similarly, assuming that something is normally distributed is a strong constraint, because it means there are only two free parameters characterizing the entire probability distribution.
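    To make the “only two free parameters” point concrete, here is a minimal sketch (my own illustration, not from the post) that evaluates a Normal probability density function for a chosen μ and σ^2 using scipy. Note that scipy’s norm takes the standard deviation σ, i.e. the square root of the variance.

```python
# Sketch: two parameters pin down the entire Normal probability density
# function. scipy.stats.norm takes the standard deviation sigma, so we
# pass the square root of the variance sigma^2.
import numpy as np
from scipy.stats import norm

mu, sigma2 = 0.0, 2.0          # expected value and variance
sigma = np.sqrt(sigma2)

xs = np.linspace(-6, 6, 7)
pdf = norm.pdf(xs, loc=mu, scale=sigma)   # density f(x) at each x

for x, f in zip(xs, pdf):
    print(f"f({x:+.1f}) = {f:.4f}")

# Every normally distributed random variable with the same (mu, sigma^2)
# has exactly this density; there are no other knobs to turn.
```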
  • Be Careful of What You Wish For: Token/ICO Edition July 17, 2017 12:38 pm
    I have written several posts on token sales and ICOs already, including some “Thoughts on Regulating ICOs” and “Optimal Token Sales.” With the continued fundraising success of new projects, here are some observations on investment terms and their potential implications for achieving successful outcomes.

    Many projects these days have a private fundraising event that precedes any public token offering. These take varying forms, including investments in corporations and some type of SAFT (Simple Agreement for Future Tokens). Fueled by a lot of demand, the terms of these raises have become more and more project friendly. At first blush that might seem great, but there is an adverse selection problem that is much more severe for protocols than it was for traditional startups. Why? Because in a traditional startup, if I get shut out as an investor I may never make it in, and so even strong investors may “hold their nose” at bad terms. With a protocol, though, there will be a public token sale event and eventually a public token altogether, providing more flexible entry opportunities than in a traditional startup. That in turn means that if the early private terms are not appealing, the mix of investors will rapidly shift to lower quality investors.

    Now you might still say: who cares about the investors? It’s all about the project creators in any case. And that is likely true once in a while. But overall the initiators of many of these projects are inexperienced when it comes to building an organization, allocating resources, dealing with adversity and so on. These are the kinds of things your early investors and advisors can help you with, if they are good. The evidence from traditional startups that have done “club rounds” as their seed round, with many investors piling in and none with a meaningful equity position and hence skin in the game, suggests that difficult questions often remain unresolved.

    I expect that we will unfortunately relearn this same lesson on many projects currently being funded via token sales. It will be fascinating to see how well some of the projects that have recently raised tens and sometimes hundreds of millions of dollars do at putting that money to work sensibly and as they encounter the need to make tough decisions. My sense is that outcomes here will be influenced by the strength of the extended team, including early backers.
  • Uncertainty Wednesday: Continuous Random Variables July 12, 2017 12:34 pm
    So far in Uncertainty Wednesdays we have only dealt with models and random variables that had a discrete probability distribution. Often, in fact, we had only two possible states or signal values. There are lots of real world problems, though, in which the variable of interest can take on a great many values. For example, the time between two events taking place. We could try to break this down into small discrete intervals (say seconds) and have a probability per second. Or we could define a continuous random variable where the wait time can be any real number from some continuous range.

    Now if you have been following along with this series you will have one immediate objection: how can we assign a probability to our random variable taking on a specific real number from a range? A range of reals contains uncountably infinitely many real numbers and hence the probability for any single real value must be, well, infinitely small. So how do we define a Prob(X = x)?

    Before I get to the answer, let me interject a bit of philosophy. There is a fundamental question about the meaning of real numbers: are they actually real, as in, do they exist? OK, so this is a flippant way of asking the question. Here is a more precise way: is physical reality continuous or quantized? If it is quantized, then using a model with real numbers is always an approximation of reality. My reading of physics is that we don’t really know the answer. A lot of phenomena are quantized, but then there is something like time, which we understand extremely poorly (which is why I chose time, as opposed to say distance, as my example above). Personally, while not, ahem, certain, I am more inclined to see real numbers as a mathematical ideal, which approximates a quantized reality.

    Does this matter? Well, it does, because too often continuous random variables are treated as some kind of ground truth, instead of an approximation to a physical process. And as we will see in some future Uncertainty Wednesday, this is often a rather restrictive approximation.

    Now back to the question at hand. How do we define a probability for a continuous random variable? The answer is through a so-called probability density function (PDF). I find it easiest to think of the PDF as specifying the probability “mass” for an infinitesimal interval around a specific value. Let’s call our density function f(x); then the value of f(x) at x is not the probability of X = x but rather the probability of x - ε ≤ X ≤ x + ε for an infinitesimal ε (I will surely get grief from someone for this abuse of notation). But by thinking about it this way, it then follows quite readily that we can find the probability of X being in a range by forming the integral of the probability density function over that range.

    Probably the single best known probability density function is the one that gives us a random variable with a Normal Distribution. The shape of the PDF is why the Normal Distribution is also often referred to as the “Bell Curve.”

    Next Uncertainty Wednesday we will dig a bit deeper into continuous random variables by comparing them to what we have learned about discrete ones.
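    As an illustration of “the probability of a range is the integral of the density” (my own sketch, not from the post), the snippet below integrates a Normal PDF over an interval numerically and checks the result against the difference of the cumulative distribution function.

```python
# Sketch: Prob(a <= X <= b) for a continuous random variable is the
# integral of its probability density function f(x) from a to b.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.0, 1.0
a, b = -1.0, 2.0

# Numerically integrate the density over [a, b].
prob_integral, _ = quad(lambda x: norm.pdf(x, mu, sigma), a, b)

# Same quantity via the cumulative distribution function.
prob_cdf = norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)

print(prob_integral, prob_cdf)   # both ~0.8186
# Note that Prob(X = x) for any single x is 0; only ranges carry probability.
```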
  • Assessment of Trump Presidency? July 11, 2017 1:15 am
    We are rapidly approaching the first half year of Trump’s presidency. I am genuinely curious whether there is anyone attempting a cogent defense of the record so far. If you have read something you think qualifies, please link to it in the comments. I would also love to see someone go back and ask people, like Peter Thiel, who supported Trump’s candidacy about their assessment of his performance to date.

    I just spent a week in Germany leading up to the G20 summit. While the following is a commentary after the summit (and from an Australian journalist), it echoes a sentiment I heard frequently: Trump is embarrassing and isolating the US, giving more room for China and Russia in the world. Again: I really would like to listen to or read an opposing view, as long as it is coherent, calm and reasoned. This could be about the presidency as a whole, domestic or foreign policy. So please, if you have something worthwhile, add it in the comments.
  • Individual Courage and Lasting Change in Tech July 7, 2017 12:24 pm
    The women who have come forward in tech in the last few weeks, including Susan Fowler, Niniane Wang and Sarah Kunst, have shown great courage. Their courage will be the catalyst for lasting change, provided all of us turn this into a sustained effort. Having been in tech for over two decades as an entrepreneur and investor, I have sadly looked into allegations of sexual harassment on more than one occasion. The majority of these cases frustratingly went nowhere, as initial allegations were not subsequently confirmed. Women were scared to follow through. Scared that they would be known for their complaint, instead of their accomplishments. Scared that their careers would be blocked, their fundraising stalled. Those fears were real and justified, and it is clear that for every complaint there were many, many more incidents that went unreported.

    Change starts with courageous individuals. To get systematic change, though, will take time and broad engagement. I hope that more women will find inspiration in those who have been leading, to follow through on the record and to start speaking out in the moment, to push back and call men out on the spot, knowing full well that doing so will, at times, come at a personal cost. There will, however, also be investors and employers who will be supportive, and I want USV to be among those. None of this should be required of women in the first place. We men should behave professionally. There should be more women making investment decisions (including at USV). There should be more diversity of all kinds and at all levels of tech, period.

    Getting there will take a long time though, especially in Venture Capital. Change in our industry is slow, as fund cycles are long. At USV we last added a new General Partner in 2012 with Andy. Change in the diversity of GPs at existing firms will proceed one retirement / new partner at a time (in a variant on Planck’s quote that “science progresses one funeral at a time”). There is, however, a potential accelerant for change due to the VC industry itself being disrupted. The balance of power has been shifting from investors to entrepreneurs for some time now, as starting a company has become cheaper, services such as Mattermark have provided more visibility and networks such as AngelList have broadened who can invest. Finally, with crypto currencies and token offerings, a whole new funding mechanism that doesn’t rely on gatekeepers has become available.

    So now is the time to double down in order to achieve lasting change. We all need to seize this moment. Here are actions we can take as investors:

    • Make sure to have a sexual harassment policy that explicitly covers external relationships. Here is a template for a policy for VC firms to include in employee handbooks. Jacqueline provided input to this and we will adopt a version for USV.
    • Become a limited partner in some of the women and minority led early stage funds. As Shai Goldman points out, for the earliest investments there is nothing to go on other than the entrepreneur. Susan and I are LPs in Female Founders Fund and Lattice Ventures and will be LPs in 645 Ventures’ next fund.
    • Learn about unconscious bias. We all love to think of ourselves as being immune to it, as strictly applying objective criteria, but there is ample research to the contrary. I recommend “What Works” by Iris Bohnet.
    • And of course most of all: fund more women and minority entrepreneurs!
  • Uncertainty Wednesday: Variance July 6, 2017 12:54 am
    In last week’s Uncertainty Wednesday, I introduced the expected value EV of a random variable X. We saw that EV(X) is not a measure of uncertainty: the hypothetical investments I had described all had the same expected value of 0. It is trivial, given a random variable with EV(X) = μ, to construct X’ so that EV(X’) = 0. That’s in fact how I constructed the first investment. I started with $0 with 99% probability and $100 with 1% probability, which has an EV of

    EV = 0.99 * 0 + 0.01 * 100 = 1

    and then I simply subtracted 1 from each possible outcome to get -$1 with 99% probability and $99 with 1% probability.

    What we are looking for instead, in order to measure uncertainty, is a number that captures how spread out the values are around the expected value. The obvious approach would be to form the probability weighted sum of the distances from the expected value as follows:

    AAD(X) = ∑ P(X = x) * |x - EV(X)|

    where | | denotes absolute value (meaning the magnitude without the sign). This metric is known as the Average Absolute Deviation (by the way, instead of the shorthand P(x) I am now writing P(X = x) to show more clearly that it is the probability of the random variable X taking on the value x). AAD is one measure of dispersion around the expected value, but it is not the most commonly used one. That instead is what is known as Variance, which is defined as follows:

    VAR(X) = ∑ P(X = x) * (x - EV(X))^2

    Or expressed in words: the probability weighted sum of the squared distances of possible outcomes from the expected value. It turns out that for a variety of reasons using the square instead of the absolute value has some useful properties and also interesting physical interpretations (we may get to those at some later point).

    Let’s take a look at both of these metrics for the random variables from our investment examples.

    Variance
    Investment 1: 0.99 * (-1 - 0)^2 + 0.01 * (99 - 0)^2 = 0.99 * 1 + 0.01 * 9,801 = 99
    Investment 2: 0.99 * (-100 - 0)^2 + 0.01 * (9,900 - 0)^2 = 0.99 * 10,000 + 0.01 * 98,010,000 = 990,000
    Investment 3: 0.99 * (-10,000 - 0)^2 + 0.01 * (990,000 - 0)^2 = 0.99 * 100,000,000 + 0.01 * 980,100,000,000 = 9,900,000,000

    Average Absolute Deviation
    Investment 1: 0.99 * |-1 - 0| + 0.01 * |99 - 0| = 0.99 * 1 + 0.01 * 99 = 1.98
    Investment 2: 0.99 * |-100 - 0| + 0.01 * |9,900 - 0| = 0.99 * 100 + 0.01 * 9,900 = 198
    Investment 3: 0.99 * |-10,000 - 0| + 0.01 * |990,000 - 0| = 0.99 * 10,000 + 0.01 * 990,000 = 19,800

    You might have noticed previously that Investment 2 is simply Investment 1 scaled by a factor of 100 (and ditto Investment 3 is 100x Investment 2). We see that AAD, as per its definition, follows that same linear scaling, whereas variance grows with the square, meaning the variance of Investment 2 is 100^2 = 10,000x the variance of Investment 1.

    Both of these are measures that pick up the values of the random variable as separate from the structure of the underlying probabilities. If that doesn’t make sense to you, go back and read the initial post about measuring uncertainty and then go back to the posts about entropy. The three hypothetical investments each have the same entropy as they share the same probabilities. But AAD and Variance pick up the difference in payouts between the investments.
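    If you want to verify these numbers, here is a small sketch (my own check, not from the post) that computes EV, Average Absolute Deviation and Variance for the three hypothetical investments.

```python
# Sketch: EV, Average Absolute Deviation and Variance for the three
# hypothetical investments, each given as (value, probability) pairs.
investments = {
    1: [(-1, 0.99), (99, 0.01)],
    2: [(-100, 0.99), (9_900, 0.01)],
    3: [(-10_000, 0.99), (990_000, 0.01)],
}

def ev(rv):
    return sum(p * x for x, p in rv)

def aad(rv):
    m = ev(rv)
    return sum(p * abs(x - m) for x, p in rv)

def var(rv):
    m = ev(rv)
    return sum(p * (x - m) ** 2 for x, p in rv)

for k, rv in investments.items():
    print(k, ev(rv), aad(rv), var(rv))
# 1 0.0 1.98 99.0
# 2 0.0 198.0 990000.0
# 3 0.0 19800.0 9900000000.0
```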
  • Happy 4th of July: Celebrating Truths July 4, 2017 10:48 pm
    I am spending this 4th of July in Germany, having visited my parents and friends from growing up near Nuremberg for the last few days. It is late here now and I am seeing all the Happy 4th wishes from the US in my Twitter timeline. That made me think about what I feel like celebrating on this day. And as I wrote last year, it is not so much independence we need in the modern world, but rather interdependence.

    There is, however, something very much worth celebrating, and that is the memorable language of the Declaration itself. It brought with it some ideas that feel as important today as they were back then, such as the concept of unalienable rights. Other aspects, though, feel like they need to be updated for the progress that has been made since then. Based on my writing in World After Capital, here is a newly phrased “preamble” that I would be happy to celebrate for many years to come (although I am sure it can be improved upon):

    We hold these truths to be universal, that all humans are created equal; that they are endowed qua their humanity with certain unalienable Rights, that among these are Life, Liberty and the pursuit of both Happiness and Knowledge; that they have Responsibilities towards each other and other species, that among these are Tolerance, and the Application and Furtherance of Knowledge for the Benefit of All.

    I will, in a future post, explain the rationale for my choice of words and ideas in detail. Until then: Happy 4th of July!
  • ChangeX: Impact As A Service June 30, 2017 2:03 pm
    Susan and I have been longtime supporters of the wonderful ChangeX platform: ChangeX helps spread social innovation across communities. This week ChangeX introduced a new model for donors, which they are calling Impact as a Service (IaaS). This is a bit of a tongue-in-cheek reference to Infrastructure as a Service (also IaaS). It is spot on, though, as the two share important characteristics:

    • IaaS reduces or even eliminates the overhead of manual setup activities, allowing all participants to focus on what actually delivers value.
    • IaaS provides much more transparency – you can see exactly what you are paying for and what you get in return.
    • IaaS lets you start small and scale up.

    Delivering IaaS is an important milestone for ChangeX. Paul and the team set out to build a true technology platform to make social programs more impactful and help them grow across the world. Their diligent investment is now starting to pay off, which is exciting. Congratulations to the team!
  • Uncertainty Wednesday: Expected Value June 28, 2017 11:11 pm
    In today’s Uncertainty Wednesday we start to explore the properties of random variables. The first one we will look at is the so-called “expected value,” or EV for short. The EV is simply the probability weighted average of a random variable X, i.e.

    EV = ∑ P(x) * x      where the sum is over all possible values x that X can take on

    Note: for now we will continue to work with so-called discrete random variables, which have distinct values of x where for each value P(x) > 0 (as opposed to continuous random variables, where x can vary in infinitesimally small increments; we will get to that later).

    The first thing to note about the expected value is that it doesn’t need to be a value that the random variable can take on. This is just like when you have a group of numbers and take their average: the average doesn’t have to be (and often won’t be) one of the numbers. In fact, here are the expected values from last week’s investment examples:

    Investment 1: 0.99 * (-1) + 0.01 * 99 = -0.99 + 0.99 = 0
    Investment 2: 0.99 * (-100) + 0.01 * 9,900 = -99 + 99 = 0
    Investment 3: 0.99 * (-10,000) + 0.01 * 990,000 = -9,900 + 9,900 = 0

    The expected value of all three investments is 0, but the only values the random variables can take on are -1 and 99 in Investment 1, -100 and 9,900 in Investment 2, and -10,000 and 990,000 in Investment 3.

    Also, quite clearly, the expected value being the same for all three investments (despite their massively different payouts) means that it is not an additional measurement of risk above and beyond what was provided by the entropy of the underlying probability distribution.

    Now something that causes no end of confusion is that the expected value goes by lots of different names, including “mean” and “average.” The reason that’s a problem is that we also use mean and average when we talk about a bunch of outcomes that have been realized from an underlying distribution. Just looking at the numbers above, you can easily see how you might get -1, -1, -1 three times in a row if you keep making Investment 1. The average of that is obviously -1, which is quite clearly NOT the same as the expected value (which is 0). That’s why I like to use expected value as a term for characterizing an entire random variable and reserve mean (or better yet “sample mean”) for a set of outcomes.

    Later in Uncertainty Wednesday we will learn about the relationship between the sample mean and the expected value. For now please repeat after me: I will not confuse the sample mean with the expected value!

    As an exercise, you may want to analyze how the expected value changes when every payout is multiplied by a number and has a constant added to it (i.e. a linear transform of the payout). So define a new random variable Y, where y = ax + b for every x from X. What is EV(Y) in terms of EV(X)?
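    Here is a small sketch (my illustration, not from the post) that computes the probability weighted average for each investment and contrasts it with the sample mean of a few realized outcomes.

```python
# Sketch: expected value of a discrete random variable vs. the sample
# mean of realized outcomes.
investments = {
    1: [(-1, 0.99), (99, 0.01)],
    2: [(-100, 0.99), (9_900, 0.01)],
    3: [(-10_000, 0.99), (990_000, 0.01)],
}

def expected_value(rv):
    return sum(p * x for x, p in rv)

for k, rv in investments.items():
    print(k, expected_value(rv))   # all three print 0.0

# A perfectly possible run of three outcomes from Investment 1:
outcomes = [-1, -1, -1]
sample_mean = sum(outcomes) / len(outcomes)
print(sample_mean)                 # -1.0, which is not the expected value 0
```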
  • Scifi Book Recommendations: Seveneves and Three Body Problem Trilogy June 26, 2017 1:49 pm
    I love reading science fiction for two reasons: first, because of what I learn about the possible future and second, because of what it says about the present. For instance, I read a lot of William Gibson and Bruce Sterling, and they correctly predicted the current rise of China, global Megacorps and crypto currencies (and we are making headway on AR and VR). I have just finished works by Neal Stephenson and Cixin Liu that give me a lot of pause about the present. I recommend them highly, but they are not for the faint of heart and may throw you into an existential funk.

    Warning: the following does contain some fairly general spoilers, nothing specific though; still, depending on the kind of reader you are, you may want to read the books first and then come back here (which will take a while given their heft).

    Both Stephenson’s “Seveneves” and Liu’s “Three Body Problem” trilogy deal with existential threats to humanity and our responses to those threats. The key takeaway from both is: humanity is woefully unprepared scientifically, technologically and psychologically for an existential threat. In both works the threat emanates from outside of our world, but much of the exploration of attitudes towards science and the challenges of maintaining democracy (and even basic humanity) would apply equally to a problem of our own making, such as climate change. Both works praise, maybe excessively so (glorify?), the contributions of individual, often rogue, scientists and entrepreneurs in overcoming the inertia of a population that’s largely weak and self-absorbed.

    Also shared by both works is an exploration of complex male-female dynamics at the level of society and how those are impacted not just by the immediate crisis but also by technological progress overall. I strongly recommend reading Liu’s trilogy to the end for an important change in perspective on a central female character, whose actions are pivotal twice. Despite that shift at the end, Liu’s view appears decidedly more male oriented, whereas Stephenson’s title “Seveneves” sums up his more female centric approach. In either case, though, the authors are strongly defending what might be called “hardnosed altruism,” which they seem to see as declining in present society (as an aside, based on previews, the upcoming movie Dunkirk appears to cover similar territory).

    The key takeaway, though, one that I wholeheartedly agree with, is that many of the problems we are preoccupied with today as individuals, as nations, and as humanity as a whole, are nearly trivial when placed in the broader context of the universe at large. This is not to say we shouldn’t care about these problems or try to address them. But we shouldn’t let them take up all of our attention. Instead much of that attention should be freed up and directed towards progress. This also happens to be the central idea of my book World After Capital. Reading “Seveneves” and the “Three Body Problem” trilogy may give you much more emotional access to the importance of the transition from the Industrial Age to the Knowledge Age.
  • Health Insurance and the R Word (Redistribution) June 24, 2017 3:59 pm
    There are a lot of outright lies (at worst) / profound misunderstandings (at best) on all sides circulating around health insurance at the moment. Today’s post is about the most fundamental one: the relationship between health insurance and redistribution, and what it means for social consensus.

    Redistribution is a toxic word in US political discourse, but one I want to reclaim. So let’s be up front: much of what we do as a society is about redistribution. For instance, public roads and public schools both involve a degree of redistribution (from those who don’t drive to those who do, from those without children to those with). We have public roads and public schools because we believe that society as a whole is better off with them.

    Now redistribution is also at the heart of insurance. Start with the simplest of cases, which doesn’t involve the government or even an insurance company at all. Consider just a bunch of farmers getting together to insure each other against crop failure. Everyone puts a little bit of money in. If everyone’s crops are fine, you get the money back or roll it forward to the next season. If someone’s crops fail, they get the money so they don’t starve. Ex post, every insurance scheme is redistribution from those who are fine to those who experienced a loss. The reason people have nonetheless participated voluntarily in various forms of local or communal insurance is that you don’t know in advance whose crops are going to fail.

    Now here comes the rub. What if you are a farmer who is wealthy enough to survive a year or more of crop failure? You might be better off not participating in the insurance scheme, thus saving on your contribution, potentially for many years in a row. But this act of an individual opting out tends to decrease the size and effectiveness of the insurance scheme for all. Historically, most societies had strong local pressures on participation, mostly based on social norms (many of these schemes originally worked implicitly via a commons, as researched by Elinor Ostrom). As societies have grown in scale and become more impersonal, governments increasingly had to step in with formal regulations to assure participation in insurance schemes to maintain their viability. For instance, here in the US car insurance is mandatory in most (but not all) states.

    So what should you take away from this? There always is some element of redistribution to insurance: at a minimum ex post and generally also ex ante. The “why should I (usually some healthy person) pay for x (usually some payment for someone from a different demographic)” objection to health insurance is an objection to redistribution. We should acknowledge this openly and not pretend that it is otherwise, because then we can move forward and say “you should, because that is your contribution to how our society works.”

    This also points to why health insurance at the federal level is such a battleground at present. The US is too heterogeneous at the moment to have a consensus on how society works for many important issues affecting healthcare. There is, for instance, no broad consensus on reproductive rights, including questions of access to contraception and abortion. Having government enforce some amount of redistribution via healthcare in the absence of a broad consensus is difficult at best and possibly even dangerous, by turning the federal government into a villain. See my related recent post about getting past the dominance of the nation state. And if you want to geek out more about insurance, here are some more posts.
  • Uncertainty Wednesday: Random Variables June 21, 2017 12:07 pm
    Just a quick reminder on where we currently are in Uncertainty Wednesdays: I had introduced the idea of measuring uncertainty, then we defined what a probability distribution is and learned about entropy, which is a measure of uncertainty that is based solely on the probabilities of different states. We examined entropy for a simple distribution, and learned about the relationship of entropy to communication.

    Now consider again our super simple world with two states A and B. Suppose that P(A) = 0.99 and P(B) = 0.01. We will keep this fixed, meaning we will not change the entropy of the probability distribution. Furthermore, you know from our analysis that the entropy of this distribution is quite low, as the states have very unequal probabilities.

    Suppose that these states represent the success or failure of an investment and you are faced with the following different payouts:

    Investment 1: A -$1, B $99
    Investment 2: A -$100, B $9,900
    Investment 3: A -$10,000, B $990,000

    The first thing to notice is that all three investments have the same 100x return. Wait, why 100x and not 99x? Because I have given you the net payouts. So in Investment 1 you put up $1, and in state A you get back $0 (meaning you have now lost $1, hence -$1), whereas in state B you get back $100 (which means you now have $99 new dollars).

    Intuitively there appears to be a big difference in uncertainty between these three investments, despite the fact that they have the same returns and the same entropy. To start to measure this difference, we need to introduce a new concept: that of a random variable.

    A random variable X is simply a variable that takes on different values in different states of the world, with a defined probability distribution across those states. So for Investment 1:

    X = -1 with probability 0.99 (state A occurs)
    X = 99 with probability 0.01 (state B occurs)

    Often we will write this shorthand as P(-1) = 0.99 and P(99) = 0.01 (in an upcoming post I will talk about why this shorthand obscures something important).

    We can now define measures such as the mean (or expected value), the variance and more to summarize the behavior of the random variable. If you already know what the expected value is, you can quickly convince yourself that it is the same for each of the investments above (and is what?).
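    As a minimal computational sketch (mine, not from the post) of what a discrete random variable is, the snippet below represents Investment 1 as a list of (value, probability) pairs and draws simulated realizations from it.

```python
# Sketch: a discrete random variable as (value, probability) pairs,
# here Investment 1 from the post, with a simple simulator.
import random

investment_1 = [(-1, 0.99), (99, 0.01)]   # X = -1 in state A, X = 99 in state B

def draw(rv, n, seed=0):
    """Simulate n realizations of the random variable."""
    rng = random.Random(seed)
    values = [x for x, _ in rv]
    probs = [p for _, p in rv]
    return rng.choices(values, weights=probs, k=n)

outcomes = draw(investment_1, 20)
print(outcomes)   # mostly -1, with an occasional 99
```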
  • MongoDB Stitch: A Fresh Take on Backend As A Service June 21, 2017 12:40 am
    Nearly 10 years ago, I wrote a post titled “I Want a New Platform,” which led to our investment in MongoDB. At the time the company was called 10gen and had developed a Platform as a Service: a Javascript application server with a built-in database. It turned out that we were ahead of the times (this was before Google App Engine and before Node). People wanted Infrastructure as a Service and AWS grew by leaps and bounds. 10gen found little adoption until the team decided to mothball the application server and make the database available separately. From there on MongoDB experienced rapid growth.

    That history is why I am particularly thrilled about the release of MongoDB’s Stitch service today at MongoDB World. Stitch is Backend as a Service for web and mobile apps. What makes Stitch special is that it seamlessly integrates MongoDB with services such as Twilio and Slack using a declarative approach. This dramatically reduces how much glue code needs to be written, enables reuse of Stitch pipelines, and provides autoscaling, all while giving you complete MongoDB based access to and manipulation of your data and even handling user authentication (including access controls based on that authentication).

    Getting up and running with Stitch is super fast and there is a free plan. I used a pre-release version to build a system for texting reminders to myself. I was able to build the whole thing in Stitch using MongoDB and Twilio with only a couple of declarative pipelines and zero custom code. Go and give it a whirl!
  • Getting Past the Dominance of the Nation State June 17, 2017 7:50 pm
    One of the important topics that I have not yet addressed in World After Capital is the role of the Nation State. So I am gathering up some of my thoughts in a post first. I believe it is critical that we get past the dominance of the nation state as the key organizing principle in the world. That doesn’t mean doing away with nation states (at least not overnight), but gradually de-emphasizing their importance. Here are my arguments for why we need to de-emphasize the Nation State:

    1. Nation states, true to their name, tend to emphasize the interests of a particular nation above others. The “America First” policy pursued by Trump is a prime example of this. As I have written repeatedly in World After Capital, by emphasizing superficial differences, this goes against the fundamental need for humanity to focus on our commonality.

    2. As a first approximation, problems today come in two forms: global and local/regional. The nation state sits uncomfortably between the two. Global problems include climate change, infectious disease, and corporate and individual taxation. These cannot be solved by any one nation state, nor even by a small group of them. They truly are global in nature. Conversely, problems such as transportation, education and healthcare can and should be solved at the local or regional level. This problem is particularly acute in a country with the size and diversity of the United States.

    3. We are in a time of profound change and as such we need more experiments on anything that doesn’t absolutely require global coordination. The Nation State is too large a unit for good experiments. Take education as an example. Having a national policy makes little sense at a time when technology is fundamentally changing how learning can occur.

    4. Information technology allows new approaches to regulation through transparency. In many instances the federal role should be to provide requirements for transparency of, and interoperability between, local/regional policies. This means we could have a significantly smaller Federal Government in terms of the number of direct employees, the size of agencies and the body of regulations.

    One reason to be excited about a truly decentralized internet, including decentralized yet consistent state (aka blockchains) and crypto currencies, is that these technologies have the potential to help us get past the Nation State. These decentralized systems are not constrained by the existing boundaries. They are truly global in nature, connecting all of humanity.

    Taking nation states as a given, permanent feature of humanity is mistaking a short period of history for something permanent. I grew up near Nuremberg in Germany and it is useful to look at a historic map of the area from around the year 1200. It shows a large number of tiny principalities that had their own rulers, spoke widely varying local dialects, used different currencies, etc. Over time these fused into larger units, and in the early 1800s Franconia became part of Bavaria. Today Bavaria is part of Germany, which in turn is part of the EU. This process of change can and should continue on a global scale.

    How should we determine at which scale to address a particular problem? The key principle here is that of “subsidiarity”: decisions should be made at the lowest possible level. Since we have one global atmosphere, we need to make some decisions globally, such as how much greenhouse gas we should emit.
But staying with the same issue, the actual ways of achieving a limit should be decided at lower levels, such as regions.

    Given that all of the most pressing problems – climate change, infectious disease, taxation, death from above – are global, now more than ever is the time to get past the dominance of the nation state.
  • Uncertainty Wednesday: Entropy and Communication June 14, 2017 4:14 pm
    The last two Uncertainty Wednesdays we have looked at entropy as a measure of uncertainty. Today I want to tie that back to the original communications context in which Shannon introduced it. We can use our really simple model of a world with two states and two signal values to do this. We have analyzed an example where the signal was a test. Now think of the signal as actually being an electric impulse sent down a wire by a local observer. The observer gets to see perfectly which state of the world has been realized and needs to communicate that state to us.

    With two states and two signal values there is a super simple communication scheme available: if the observed state is A send a 1 (or an “H”), and if the observed state is B send a 0 instead. So with a single message transmission our observer can give us perfect information about the state they observed (meaning once we receive the signal, there is no uncertainty left and we will know exactly which state was observed). Important aside: we are assuming here that our observer’s incentives are aligned with ours, so that the observer will truthfully communicate the state they have observed (we will get to incentive problems later in this series).

    Now you might think that this is all there is to it and surely we can’t do any better than this simple communication scheme. And that’s right for a one off observation. But now imagine that the state keeps changing (sometimes it is A and sometimes it is B) and we want our observer to keep signaling to us which state has occurred. In this repeated setup, is our simple communication scheme still the best? Somewhat surprisingly — if you have not worked on communication before — the answer is no. We can do better. And how much better turns out to be captured precisely by entropy!

    To understand this, let’s first develop some intuition by going to the extreme case. Assume P(A) = 1 (and hence P(B) = 0). In this case we don’t need a signal from our observer at all. No matter how many times a “new” state happens, we know it will always be A. Imagine moving away from that just a tad, to say P(A) = 0.999 and P(B) = 0.001, meaning B has a 1 in 1,000 chance of happening. Well, we would expect to see long sequences of state A happening, and our observer would be sending us long sequences of 111111111 with only an occasional 0 interspersed. That means we could compress the communication through the use of a protocol. For instance, our observer could wait until state B occurs and then send us the number of As that have been observed until then. Of course that number would have to be encoded in binary. But that affords a lot of compression: here are 32 As uncompressed, “11111111111111111111111111111111”, and here is the number 32 in binary, “100000”. Obviously there would be some extra transmission overhead (after all, how would we tell where a number starts and stops?), but you can see that we can do a lot better.

    Now when I say that we can do better, there is an important caveat here that often gets lost when this is discussed. The only way we can do better is by having the observer “buffer” some observations and then apply the compression algorithm. If we need to know each state *with certainty* immediately as it occurs, then we cannot do better than our super simple algorithm.
But if we can afford to wait, well then we can do a lot better (I chose the word “buffer” on purpose here because this is related to the phenomenon at play when you start streaming audio or video on your computer).

    Let’s proceed and actually devise a protocol to explore compression a bit more. We will use what is called a Huffman code, in which we choose the shortest code for the most frequent sequence and longer codes for less frequent sequences. To make this really simple we will only compress two subsequent observations. Let’s assume P(A) = 0.9 and hence P(B) = 0.1. Then for two subsequent observations, assuming independence, we have the following probabilities:

    P(AA) = 0.81
    P(AB) = 0.09
    P(BA) = 0.09
    P(BB) = 0.01

    and we will assign codes as follows:

    AA -> 1
    AB -> 01
    BA -> 001
    BB -> 0001

    You will notice how the 1 signal always completes the code. We can implement both the sender and the receiver for this encoding as simple finite state machines (this is left as an exercise for the reader). Now let’s compute our expected average signal length:

    0.81 * 1 + 0.09 * 2 + 0.09 * 3 + 0.01 * 4 = 1.3

    which is clearly shorter than the signal length of 2 which we would get from our simple scheme. And if we divide by 2, we see that our signal length per observed state is 0.65 — meaning we use on average less than a single signal to transmit a state!

    Finally, let’s compute entropy and see what Shannon’s work tells us is the best we could do:

    - 0.9 * log 0.9 - 0.1 * log 0.1 = 0.469

    By the way, we choose base 2 here for the logarithm to reflect that we have two possible signal values. And when we do that we call the measure of entropy “bits,” where a bit is a signal that has two possible values (generally assumed to be 0 and 1; the link is to my previous series called Tech Tuesdays).

    So Shannon says that the best we could possibly do for two states with p = 0.9 is 0.469 bits. By combining two successive states using a simple Huffman code we were able to achieve 0.65 bits, which is not bad. If we expanded the size of our blocks to three, four or more states we could do even better, slowly approaching the theoretical minimum.

    Now as you let this all sink in, please keep in mind that the compression is only possible here if we can wait to combine multiple state observations into a block, which we then compress before transmission. This is an important assumption and may not always be true. For instance, suppose that the state of the world we are interested in is whether a particular share price is up or down and we want to trade on that. Obviously we will not want to wait to accumulate a bunch of share price movements so we can save on communication. Instead we will invest heavily to give ourselves more communication capacity.
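    For readers who want to check the arithmetic, here is a small sketch (mine, not from the post) that computes the expected code length of the pairwise scheme above and the Shannon entropy of the two-state distribution.

```python
# Sketch: expected code length of the pairwise code vs. Shannon entropy.
from math import log2

p_a, p_b = 0.9, 0.1

# Probabilities of two subsequent observations (assuming independence)
# and the code lengths from the post: AA->1, AB->01, BA->001, BB->0001.
pairs = {
    "AA": (p_a * p_a, 1),
    "AB": (p_a * p_b, 2),
    "BA": (p_b * p_a, 3),
    "BB": (p_b * p_b, 4),
}

expected_length = sum(p * length for p, length in pairs.values())
print(expected_length)          # 1.3 signals per pair
print(expected_length / 2)      # 0.65 signals per observed state

# Shannon entropy of a single observation, in bits (base 2 logarithm).
entropy = -p_a * log2(p_a) - p_b * log2(p_b)
print(round(entropy, 3))        # 0.469, the theoretical minimum per state
```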
  • Optimal Token Sales June 13, 2017 3:04 pm
    The problem of finding an optimal structure for a token sale is quite difficult. For starters, the objectives of teams and investors are potentially different. Furthermore, these are not homogenous groups. Teams vary widely along at least two dimensions: intent (from get rich quick to change the world) and competency. Investors differ along dimensions such as risk tolerance and intended holding period. As if that were not complicated enough, there is also the question of how a sale is supposed to treat incentives for the creation of a protocol relative to future participation in the protocol’s operation. Token sale mechanisms will differ widely in how they address all these competing objectives.

    So what should an optimal token sale actually be optimizing? Usually, when we say “should,” the implied normative perspective is the overall social one. That perspective includes many parties who don’t participate in the token sales at all, but rather will be using the successful tokens sometime in the future. Optimizing for this means maximizing funding for projects that are most likely to have a real impact (and doing so in a way that maximizes impact), while keeping the money going to everything else as small as possible. Everything else being projects likely to fail as well as outright fraud.

    This is no different from the optimum we are, at least in principle, aiming for with traditional private and public financing markets for companies, R&D projects, and so on. It explains a lot of the regulation around VC and IPOs, whether or not you believe these systems are effective (for instance, I have written about how I think the IPO market is broken). What is different is that token sales are global and do not (yet) have a regulatory framework. We are therefore in a period where teams choose independently and we are seeing a lot of different approaches. Experimentation is essential for finding out what works best. Vitalik Buterin lays out a couple of examples in his post on Analyzing Token Sales Models and it would be great to see a compendium of all of the sales to date.

    The greatest potential for trouble comes from token sales that are one-time, large (possibly even uncapped) and take place when minimal specification / technical work has been done. In these, the risk of outright abuse is highest (e.g., the team starts paying itself above market salaries and lavish perks), as is the risk that nothing of use ever ships. This is of course why in the traditional venture model early rounds tend to be smaller. On the plus side, though, the fact that a number of such sales have happened does provide a strong incentive for people to want to create new protocols. And a sale like this can finance all the work that is needed to create something quite complex.

    Conversely, the least problematic sale, with the best incentives for the operation of a protocol, would be a highly distributed “helicopter drop” of a token that can immediately be used in a fully functioning protocol. The team makes no money here, so there is zero potential for abuse, there is no technical risk (by assumption) and the recipients have a windfall, so they will have no issue selling or using the token. The obvious downside is that such a “sale” cannot finance the creation of a protocol, since it has (a) no proceeds and (b) happens after the work is already done. It can still provide an incentive for creation, though, since some percentage of the currency can be retained / earned in the future by the protocol creators.
ZCash is a good example of this approach.

    These are obviously somewhat extreme bookends, which I picked on purpose. The “optimal” approach would seem to fall somewhere in-between and have roughly the following elements:

    1. Keep the initial sale small. This could even be a traditional investor round that helps fund early technical work. Holdings can be concentrated, as more will be sold / distributed later. Little information is required and investors should meet a speculation suitability requirement (meaning they should be able to lose 100% without a problem).

    2. Hold one or more subsequent sales of increasing size, to the extent that further funding is required for development. As the sales get larger, aim for more widespread holding. Publish as much information about the protocol and how it will work as is available, to show reduction in risk.

    3. When the protocol is ready, hold a final sale and/or distribution that gets all but a small fraction of the initially available tokens out for use. Should a large fraction of the tokens be retained, it must be released programmatically and transparently. Under no circumstance should a large retained fraction be discretionary (this is what Vitalik calls the “No Central Banking” rule). Also: if your token has native mining, then you can use mining instead of this step 3 to have tokens come into existence in a highly distributed manner.

    What are some properties of sales that may seem important but likely are not?

    A. There does not need to be a guarantee of participation (this was one of Vitalik’s proposed properties). For instance, a scheme in which many orders are submitted but only some are chosen randomly is likely economically efficient. This may violate a sense of fairness, which resides in a moral realm that I don’t want to dismiss, but it does not have an incentive impact.

    B. The price of a token does not necessarily have to go up in subsequent rounds. There can even be a “helicopter drop” or future mining that may turn out to be initially cheap. What is much more important for early purchasers than short term price fluctuations is understanding what percentage of the currency they have bought (note that this percentage can go down over time, see my post on monetary policy). Early buyers will generally approach a token with a view towards the possible size of the economy for that token.

    There is a lot here and I will clarify in a subsequent post or posts, once I have gotten feedback. This will include a discussion of mechanisms that can be used to deal with suitability requirements and with spreading sales widely.

    Important disclaimer: None of this is legal advice. The entire post is written strictly from an incentive point of view.
  • Monetary Policy for Crypto Tokens June 11, 2017 4:40 pm
    In an information session for Congress, Peter Van Valkenburgh used my favorite analogy, comparing crypto tokens to tickets at a fairground. And William Mougayar has a new post about tokens in which he specifically refers to them as “privately issued currency.” No matter how you think about tokens, there is now great interest in understanding why and how their prices are determined in the market. The trivial answer of course is: supply and demand. But what exactly is the supply? As it turns out, one critical determinant of supply, and hence of the price of a token today, is how many tokens there will be in the future. This is determined by the “monetary policy” of the token. To date all major tokens have either a fixed amount of tokens right away or eventually (e.g., BTC) or a rate of inflation that asymptotically goes to zero (e.g., ETH).

    Philosophically this “no inflation” choice seems to be inspired by a deep rooted aversion to central banks and their policies of growing the money supply. Many people in the crypto currency community consider this a kind of appropriation, or even theft, from those who already hold currency.

    But the current approach has a severe drawback: it results in extremely rapid appreciation of tokens well ahead of their use value. Why is this? Without future inflation, the discount rate to be used in determining the Net Present Value (NPV) of a token is quite low. And as anyone who has built an NPV or Discounted Cash Flow (DCF) model knows, NPV is extremely sensitive to changes in the discount rate. In fact, as the discount rate approaches zero, the NPV explodes towards infinity, as can be seen in the following chart.

    Now one might argue that I am confusing concepts here, because you can sell a token only once in the future and not repeatedly. But think of it differently: in the future the tokens will be used again and again and again, each time having some use value (you can think of that use value multiplied by the number of tokens in the future as the value of the network as a whole). So each token today will reflect that discounted future stream of use values.

    The future percentage rate of inflation is a key component of the discount rate. And for the majority of tokens today that component is ZERO! Another component of the discount rate is the risk free rate of return, which is generally taken to be the return on some government backed asset. Those rates are at historic lows and are essentially ZERO also, because of a global glut of capital. The third and final component of the discount rate is the risk premium. Here I think many investors are currently vastly underestimating the risk they are taking on, largely because we are in the honeymoon phase with crypto tokens.

    So here is a rough approximation of the discount rate as I see it:

    discount rate = inflation (ZERO by monetary policy of token) + risk free rate (ZERO because of glut of capital in the world) + risk premium (mistakenly near ZERO due to honeymoon phase)

    Taken together this gives you a discount rate that is way too small, which in turn results in an NPV that’s way too high. I believe this explains much of what we are currently seeing in token prices.

    Now you might say: why is an inflated NPV a problem? The answer is that it causes a wide divergence between the personal incentives of teams holding token sales and the socially desirable characteristics (Vitalik Buterin gets at this somewhat in his post on analyzing token sales models, but doesn’t draw the distinction clearly enough).
I will write a separate post (or several) addressing the incentive problems in token sales. Until then, one lever you should consider is having a token with a fixed low percentage inflation rate to reduce NPV. Monetary policy matters.

    PS One way to inflate is to issue new currency to lots of people, which could form the basis of a future global basic income.
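    To make the discount rate sensitivity concrete, here is a small sketch (my own illustration, with made-up numbers) that values a token as the NPV of a constant annual stream of use value and shows how the value balloons as the discount rate approaches zero.

```python
# Sketch: NPV of a constant yearly "use value" stream over a long horizon,
# evaluated at different discount rates. Numbers are illustrative only.
def npv(use_value_per_year, discount_rate, years=100):
    return sum(
        use_value_per_year / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )

use_value = 1.0   # hypothetical use value a token delivers each year
for rate in (0.20, 0.10, 0.05, 0.02, 0.01, 0.001):
    print(f"discount rate {rate:>6.1%}  ->  NPV {npv(use_value, rate):>8.1f}")

# As the rate (inflation + risk free rate + risk premium) heads toward zero,
# the NPV heads toward the full undiscounted sum (here 100), i.e. it explodes
# relative to the value at more realistic discount rates.
```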
  • Mobile OS Duopoly: Apple and Google Extending their Power June 9, 2017 1:54 pm
    I have for some time pointed out the negative impact of the app store duopoly on innovation. Things are getting worse, though. With the latest releases of iOS and Android, the ability of third party developers to create great experiences is being further curtailed in the name of privacy and battery life. That might be OK if Apple and Google didn’t have their own apps that compete for usage, such as maps or transportation, but they do.

    Apple and Google are using the same playbook that Microsoft applied so successfully during the PC era: they are using private or automatically whitelisted APIs to make their apps work faster and better than what third parties can achieve. Their apps are of course also pre-installed on the devices for further advantage. But the real killer this time round is all the information that mobile devices are constantly gathering on user activity and location. This information is central to building user centric services such as digital assistants. And so, as always when there is too much market power, it is consumers who will lose out. As an end user, I would like many more options for digital assistants and high quality productivity applications in my life than Google and Apple.

    So what is to be done? I see two paths. The short term one is regulatory. The long term one is competition from a true open source OS (yes, Android is open source, but Google has been sucking more functionality into the proprietary Google Play Services). Somewhere in between is the hope that another existing big company gets its act together on a mobile OS, but I am doubtful there.
  • Uncertainty Wednesday: Entropy (Cont’d) June 7, 2017 10:56 am
    Last week in Uncertainty Wednesday, I introduced Shannon entropy as a measure of uncertainty that is based solely on the structure of the probability distribution. As a quick reminder, the formula is

    H = - K ∑ pi log pi

    where the sum ∑ runs over the i = 1…n states of the probability distribution.

    Now you may notice a potential problem here if the distribution includes a probability p that approaches 0, because log p will go to infinity. If you know limits and remember L'Hôpital’s rule, you can convince yourself that p log p → 0 as p → 0 (start by rewriting it as log p / (1/p), then apply L'Hôpital). Because of this, when we compute entropy we will define p log p = 0 for p = 0.

    This lets us now easily graph what H looks like as a function of p for the simple case of only two states A and B, where we have p1 = P(A) and p2 = P(B) = 1 - p1. Here is a graph that has p1 as the x-axis and H as the y-axis.

    We see that the entropy H is maximized for p1 = 0.5 (and hence p2 = 0.5). Meaning: uncertainty is at a maximum when both states A and B are equally likely.

    There is an important converse conclusion here: if you want to assume maximum uncertainty about something, you should assume that both (or, if there are more than two, all) states are equally likely. This assumption best represents “not knowing anything” (other than the number of states). How is this possibly useful? Take an asset like bitcoin as an example. If you want to assume maximum uncertainty, you should assume that the price is equally likely to go up as it is to go down.

    Something else we see from the shape of the H(p) function is that at p = 0.5 the first derivative is zero, and so small changes in p correspond to small changes in H. But as we get closer to the “edge” on either side, the same absolute change in p results in a much bigger change in H. You may recall our earlier analysis of the sensitivity and specificity of tests. We now have a measure of how much uncertainty reduction we get from a test, and we see that it depends on where we start, with the least reduction occurring at maximum uncertainty.

    Next week we will look at the relationship between uncertainty as measured by entropy and the cost of communicating information (communication is the context in which Shannon came up with the entropy measure).
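    As a quick numerical sketch (mine, not from the post), the snippet below evaluates the two-state entropy H(p1) on a grid, with K = 1 and base-2 logarithms, and confirms that it peaks at p1 = 0.5.

```python
# Sketch: two-state entropy H(p1) = -(p1 log p1 + p2 log p2), with p2 = 1 - p1.
from math import log2

def entropy_two_states(p1):
    """Entropy in bits, using the convention 0 * log 0 = 0."""
    total = 0.0
    for p in (p1, 1 - p1):
        if p > 0:
            total -= p * log2(p)
    return total

grid = [i / 100 for i in range(101)]
values = [(p, entropy_two_states(p)) for p in grid]

for p in (0.01, 0.1, 0.3, 0.5, 0.7, 0.99):
    print(f"p1 = {p:.2f}  H = {entropy_two_states(p):.3f}")

# The maximum on the grid sits at p1 = 0.5, where H = 1 bit.
print(max(values, key=lambda pair: pair[1]))   # (0.5, 1.0)
```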