28 October 2008 / erikduval

The Snowflake Number: an Experiment in Open Science

I’ve posted before about the idea of a snowflake number that helps measure mass hyper-personalization. With my great friends Katrien, Xavier and Wayne, we are developing this idea a bit further in the form of a paper for the Web Science conference.

This blog has also carried some posts before on Science 2.0 and on how we can be more effective and efficient when we do our research in an open way.

So, with this post, we’d like to apply an open science approach to snowflake. The current version of our submission is up on scribd. (Can’t seem to get the embed to work on 😦 )

We’d welcome your early feedback and comments. Any input that results in a change to the paper (or our thinking!) will be acknowledged.

BTW, the paper is already a bit too long, as the submission should be no more than 2 pages. Does anybody know how to make the bottom-left corner of the first page less “empty”? Also, any ideas on how to better share a LaTeX paper?

The submission deadline for WebSci09 is this Friday, 31 October. Of course, we are still interested in comments after that deadline, but earlier is better 😉



Leave a Comment
  1. Moritz Stefaner / Nov 3 2008 6:18 pm

    Hope this is the right spot for some comments on the paper.

    I find the idea original and thought-provoking. However, if I understood the paper right, I believe the measure introduced has some undesirable properties:

    * Using integers instead of real numbers for the “uniqueness degree” limits the expressiveness of the measure to only a couple of bins. Also, I expect the snowflake numbers to be very low usually (for long-tailed distributions for sure). So very many people will share a number – a true measure of uniqueness?

    * By basing the measure on an existential condition, you can get sudden jumps in the measure. By introducing ONE new tag, you can go from Infinite (not unique) to 1 (very unique). Desirable?

    * You don’t take the overall uniqueness of items/features (tags, books, movies etc.) into account. Each item counts the same. So I could have quite an unusual taste, but a high snowflake number (because a few others have a similar taste), while someone with a totally mainstream taste, but ONE very unusual item (outlier), gets a lower number and so counts as more unique. Doesn’t sound fair to me 😉

    * I think for probabilistic reasons, there will be a bias of “more data => lower snowflake number” per user. Can this be accounted for somehow?
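To make the "sudden jump" point concrete, here is a small sketch. It rests on an assumption read off the bullets above, not on the paper itself: the snowflake number is taken to be the size of the smallest set of a user's items that no other user fully shares, and infinite if no such set exists. The function name `snowflake_number` and the toy profiles are hypothetical.

```python
from itertools import combinations
import math


def snowflake_number(user, profiles):
    """Size of the smallest subset of `user`'s items that no other user fully shares.

    Returns math.inf when every subset is also held by someone else.
    ASSUMPTION: this definition is inferred from the comment, not taken from the paper.
    """
    own = profiles[user]
    others = [items for name, items in profiles.items() if name != user]
    for size in range(1, len(own) + 1):
        for subset in combinations(sorted(own), size):
            # The subset is unique if no other user holds all of its items.
            if not any(set(subset) <= other for other in others):
                return size
    return math.inf


# Hypothetical toy data: everything ann has, bob has too.
profiles = {
    "ann": {"python", "web", "music"},
    "bob": {"python", "web", "music", "film"},
}
before = snowflake_number("ann", profiles)

# ann adds ONE tag that nobody else uses.
profiles["ann"].add("origami")
after = snowflake_number("ann", profiles)
```

Under that assumed definition, `before` is infinite and `after` is 1, which is the jump from "not unique" to "very unique" described above. The brute-force search over subsets is only meant for toy inputs.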

    I guess my counter-proposal would be a vector-space based measure, as is commonly used in collaborative filtering etc. This would allow calculating a whole range of metrics (how widespread is my taste, how unusual overall, am I right in the middle of the main cluster, or at the core of a peripheral cluster, etc.). I guess it could also serve to produce ONE number to measure “mainstreaminess”, e.g. distance to the average/median point. But I haven’t quite thought this through 🙂
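A minimal sketch of the "distance to the average point" idea, assuming each user is a binary vector over a fixed item list and that plain Euclidean distance is good enough. The item names, the users and the helper names `centroid` and `distance` are all made up for illustration; a real collaborative-filtering setup would more likely use weighted or cosine-based distances.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally long vectors (the 'average user')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: 1 means the user has the item, 0 means not.
items = ["pop", "rock", "jazz", "gamelan"]
users = {
    "u1": [1, 1, 0, 0],
    "u2": [1, 1, 0, 0],
    "u3": [1, 1, 1, 0],
    "u4": [0, 0, 1, 1],
}

center = centroid(list(users.values()))

# Larger distance = further from the average taste = less mainstream.
distance_to_center = {name: distance(vec, center) for name, vec in users.items()}
least_mainstream = max(distance_to_center, key=distance_to_center.get)
```

Here `u4` ends up furthest from the centre, and the score is a real number rather than an integer bin, so small changes in a profile move it gradually instead of in jumps.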


  1. Experiment « Long Slow Chat
  2. Standards for Technology Enhanced Learning « Erik Duval’s Weblog
