I've been pondering the question that Mikey has raised here, i.e. how many movies sent at random one can expect to see more than once. Or at least, how many in one's Events list are likely to be repeats.
Not being a mathematician or statistician, I decided to write a simple simulation. It requires the following inputs:
(1) total number of movies available
(2) % that are CMs
(3) % of real movies considered of interest (which I call 'clickability')
(4) number of randomly-delivered real movies you have viewed
(5) number of movies in your Events list
(6) number of times to run the sim.
The output then predicts how often your selected movies would be repeated, i.e. how many would be seen once, twice, three times and so on.
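For anyone who wants to experiment without the program itself, here is a minimal sketch of the same idea in Python. It is not the program described in this post, and it rests on a simplifying assumption: the 'clickable' movies form a fixed pool (real movies x clickability), and each event in the list is an independent random draw from that pool. Under that assumption input (4), the number of real movies viewed, is not needed, so it is left out. The function name and parameters are made up for illustration.

```python
import random
from collections import Counter

def simulate_repeats(total_movies, cm_percent, clickability_percent,
                     events, runs, seed=None):
    """Average number of movies seen once, twice, ... over `runs` runs.

    Assumption: every event is a uniform random draw (with replacement)
    from a fixed pool of clickable real movies.
    """
    rng = random.Random(seed)
    real_movies = total_movies * (1 - cm_percent / 100)
    pool = max(1, round(real_movies * clickability_percent / 100))
    totals = Counter()
    for _ in range(runs):
        # How many times each clickable movie ended up in the Events list.
        picks = Counter(rng.randrange(pool) for _ in range(events))
        # How many movies were seen once, twice, three times, ...
        totals.update(Counter(picks.values()))
    return {times: count / runs for times, count in sorted(totals.items())}

# Example with the inputs used in this post (results vary from run to run).
averages = simulate_repeats(160000, 25, 1.5, 374, 20)
for times, average in averages.items():
    print(f"seen {times} time(s): {average:.1f}")
print(f"unique movies: {sum(averages.values()):.1f}")
```

Because the draws are random, each call gives slightly different averages unless a seed is supplied.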
Input (3), clickability, depends both on the (as yet unknown) number of tracks there might be and on one's personal threshold for what is worth clicking. It turns out that the precise value chosen strongly affects the resulting frequency distribution: the more likely you are to click, the less likely you'll have repeats, since your Events list will be filled sooner with unique movies. Guessing from personal experience that it lies somewhere between 1% and 2%, I tried various values in that range.
Taking reasonable numbers for total movies (160000) and CMs (25%), plus my current number of real movies viewed (28275) and number of Events (374), and using a value of 1.5% for clickability, the average results after 20 runs were as follows (figures to one decimal place):
302.9 seen once, 32.2 twice, 2.0 three times, 0.2 four times, 0 times thereafter (total 374, of which 337.3 unique).
My actual figures are:
287 seen once, 37 twice, 3 three times, 1 four times, 0 times thereafter (total 374, of which 328 unique).
While a fairly good match, allowing for chance, this did of course depend on the value chosen for clickability. In order to test my guess, I then plugged fjgiie's recent figures in (160000 total, 25% CMs, 90000 reals viewed, 230 Events of which 212 unique) using the same 1.5% value for clickability. The average results after 20 runs were:
203.2 seen once, 12.7 twice, 0.5 three times, 0 times thereafter (total 230, of which 216.4 unique).
Fjgiie's quoted figures were:
198 seen once, 11 twice, 2 three times, 1 four times, 0 thereafter (total 230, of which 212 unique).
Again a good match, particularly the number of doubles.* I can therefore feel some confidence in the program, and in the lesson it teaches: that when dealing with random events, coincidences are much more likely than might be intuitive. Indeed, a lack of coincidence (e.g. no repeat movies) is a good reason to suspect that a system is being artificially manipulated.
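As a rough cross-check on the simulated averages quoted above, the expected number of unique movies can also be worked out directly, assuming each event is an independent random draw from a pool of M clickable movies (an assumption of this sketch, not necessarily how the program works). With n events, the expected number of unique movies is M x (1 - (1 - 1/M)^n). The function name below is made up for illustration.

```python
def expected_unique(pool_size, events):
    """Expected number of distinct movies among `events` uniform random
    draws (with replacement) from a pool of `pool_size` movies."""
    return pool_size * (1 - (1 - 1 / pool_size) ** events)

# Pool assumed here: 160000 total, 25% CMs, 1.5% clickability = 1800 movies.
pool = round(160000 * (1 - 25 / 100) * 1.5 / 100)

print(expected_unique(pool, 374))  # my list of 374 events
print(expected_unique(pool, 230))  # fjgiie's list of 230 events
```

This gives roughly 338 and 216 unique movies respectively, in line with the simulated averages of 337.3 and 216.4.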
If anyone wants to plug their own figures in and see how the sim's prediction compares with their own experience, or indeed to correct/improve my very amateur programming, it's available here (guaranteed virus-free, but only partially annotated, and with no checking for sensible inputs, so remember RIRO! Being written in javascript and very iterative, it's also rather slow - I suggest you try just one run first to see how long it takes).
John Smaje
* [And this despite our quite different selection rates (me: 374/28275 = 1.32%; fjgiie: 230/90000 = 0.26%).
In fact, running the sim with our event numbers swapped around required that 'clickability' be adjusted to about 0.4% in my case and 1.4% for fjgiie in order to approximate our actual figures, a ratio of 1 to 3.5. This could partly be accounted for by our personal thresholds of course (e.g. for subtle/maybe inclusions), but my personal experience suggests we share fairly similar selection criteria. The other component of clickability is the actual number of tracks, and we can't both be right about their frequency! So, an analogy for my longer list length could be: we both like the same type of cheese, but I buy that type of cheese more frequently! Does this make sense?]