seeing the same movies?

Chuck Crisler
Posts: 30
Joined: Wed Aug 02, 2006 9:44 am
Location: Windham, NH

seeing the same movies?

Post by Chuck Crisler »

I wonder how often we see the exact same movie. There are a few that I think I recognize now. It seems reasonable that people would be shown the same real movie to see if they get the same answer. So I also wonder how often I give the same answer. I believe that my understanding of what I am seeing has changed dramatically over the months; hopefully it is now much more accurate.
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Post by jsmaje »

This issue has arisen before; see for example http://stardustathome.ssl.berkeley.edu/ ... ght=#10895.
I've decided to make a healthier low-fat, salt-free version of my sim as a result of the observations in that thread, and will post it here when ready.

Meanwhile, it remains true that the frequency of repeats in a random delivery process can be much higher than many people imagine (see e.g. Wikipedia's entry on the "Poisson distribution").
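
As a rough illustration of that point (a minimal sketch, not anything taken from the project itself): if n selections land uniformly at random among M equally likely movies, the number of movies hit exactly k times is approximately M times the Poisson probability with mean n/M. The function name and the example figures below are hypothetical.

```python
import math

def expected_repeat_counts(n_draws, pool_size, max_k=5):
    """Expected number of pool items drawn exactly k times (k = 1..max_k),
    using the Poisson approximation with mean n_draws / pool_size."""
    lam = n_draws / pool_size
    return {
        k: pool_size * math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(1, max_k + 1)
    }

# Hypothetical figures: 1,000 random draws from a pool of 3,000 items.
counts = expected_repeat_counts(1000, 3000)
for k, expected in counts.items():
    print(f"drawn {k} time(s): about {expected:.1f} items")
```

With these made-up numbers, more than a hundred items would be expected to turn up twice, even though each draw is purely random.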

I doubt it's due to the team wanting to check up on our consistency, though I guess they could if they wanted to!

John
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

seeing the same movies?

Post by jsmaje »

Here's my new version of the sim: http://www.jsmaje.btinternet.co.uk/Stardust_sim1.htm

Issues such as calibration and bad focus movies are ignored, since it is simply intended to show how often one can expect to be presented with the same movie more than once in a random delivery process. The most readily available data (unless you've kept a personal tally of every movie seen) is your Events list.

So, first paste your Events list/page into the text area*. The total number of movies listed and how many times the same movie has been clicked on and listed again will be automatically calculated.

Certain movies have of course been withdrawn/introduced over time, and everyone's exposure to this changing 'pool' will have been different. I've set 100,000 as the default pool size; this is almost certainly inaccurate & up for debate, but you can edit it as you see fit (the larger the number, the slower the program).

And what percentage of these could be considered positive, i.e. worth clicking, depends on your own criteria (the proportion of officially-considered or eventually-determined positives is beside the point). To ensure a statistically sensible result, trial and error is required to find a figure that results in about the same number of unique simulated and unique actual movies. This turned out to be about 3.0% for me, but again that may be changed.

After clicking 'Run' the results page then summarises your inputs and shows what the sim predicts: the number of times that the same simulated movie was selected, and what your actual Events list figures are (this may take a bit of time depending on your computer speed; simply click 'No' if you get a 'Stop this script?' message).

In my case, to date I've viewed 51,448 real movies and clicked on 911, of which 796 were unique. Using 3% for my 'positivity' figure (to get close to the actual number of unique movies), ten runs of the sim gave the following average results:
  • 1 (i.e. selected once only): 685.4 (actual Events = 693)
  • 2 (twice): 95.0 (92)
  • 3 (three times): 11.6 (10)
  • 4 (etc...): 0.4 (1)
  • 5-10: 0 (0)
  [= 792.4 (796) unique]
Not a bad match, and sufficient to emphasise that repeats, especially doubles, are to be expected in any genuinely random process.
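
The idea behind these figures can be sketched in a few lines of Python. This is a guess at the kind of calculation involved, not the code behind the linked page: it assumes every click lands uniformly at random on one of the potential positives (pool size times the 'positivity' percentage). All function and parameter names are hypothetical.

```python
import random
from collections import Counter

def simulate_repeats(n_clicks, pool_size, positive_pct, seed=None):
    """Drop n_clicks uniformly at random onto the potential positives and
    return {k: number of movies selected exactly k times}."""
    rng = random.Random(seed)
    n_positives = round(pool_size * positive_pct / 100)
    # How many times each simulated movie was selected.
    hits = Counter(rng.randrange(n_positives) for _ in range(n_clicks))
    # Tally movies by their selection count.
    return Counter(hits.values())

def average_runs(n_runs, **kwargs):
    """Average the per-count tallies over several independent runs."""
    totals = Counter()
    for _ in range(n_runs):
        totals.update(simulate_repeats(**kwargs))
    return {k: totals[k] / n_runs for k in sorted(totals)}

# Inputs quoted above: 911 clicks, a pool of 100,000 and 3% 'positivity'.
result = average_runs(10, n_clicks=911, pool_size=100_000, positive_pct=3.0)
for k, avg in result.items():
    print(f"selected {k} time(s): {avg:.1f}")
print(f"unique: {sum(result.values()):.1f}")
```

Under this assumption the averages should come out in the region of 670 singles and 100 doubles, with individual runs varying around that.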

Try it a few times, tweak the inputs, and see how well the predictions match your actual events (no checks are made for sensible inputs though, so remember GIGO: garbage in, garbage out!)

* the easiest way to do this is to go to your Events page, press Ctrl+A to select it all, followed by Ctrl+C to copy it, then launch the sim and press Ctrl+V (ensuring the cursor is in the Paste box).
fjgiie
DustMod
Posts: 1253
Joined: Sat May 20, 2006 8:47 am
Location: Hampton, SC, US

Post by fjgiie »

Stardust@home repeat movie simulation
Number of real movies viewed = 193,944
Number of movies in Events list = 405
Number of well-focussed real movies available = 140,000
Percent you consider may be positive = 0.3%, hence potential positives = 420

Number of times simulated movies selected (262 unique):

Times selected:    1    2    3    4    5    6    7    8    9   10
Movies:          155   70   22    5    2    0    0    0    0    0

Actual number of times movies in Events list selected (336 unique):

Times selected:    1    2    3    4    5    6    7    8    9   10
Movies:          293   25   13    3    1    1    0    0    0    0

This is an average of 10 runs. (maybe a little jijo)
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Post by jsmaje »

Hi fjgiie, thanks for trying the sim. If you increase your 'considered positive' figure a bit, to get a better match between the number of simulated and actual (336) uniques in order to make statistical sense, I'm sure the match will be better. You are clearly a lot more picky than me, but did you have any particular good reason to settle on 0.3%?

It would be interesting to see just how wide the range is for this figure amongst different dusters (you've already shown it could be as much as an order of magnitude!)
fjgiie
DustMod
Posts: 1253
Joined: Sat May 20, 2006 8:47 am
Location: Hampton, SC, US

Post by fjgiie »

I tried 0.7% and got this result. It seems difficult to get a result that gives five or six viewings of the same movie even running it several times. One of each is not many, correct?

jsmaje, you asked about the 0.3%: that is what I came up with when dividing the number of events by the total movies viewed and multiplying by 100%. A little over 0.2.
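
The arithmetic, using the figures from the report earlier in the thread (the variable names are just for illustration):

```python
events_listed = 405        # movies in the Events list
movies_viewed = 193_944    # real movies viewed

# Share of viewed movies that ended up in the Events list, as a percentage.
click_rate_pct = events_listed / movies_viewed * 100
print(f"{click_rate_pct:.2f}%")  # → 0.21%
```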

I like this sim better than the previous one. :)
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Post by jsmaje »

Fjgiie, was that an average of several runs or the chance result of just one?

At least part of the problem with your initial 0.3% figure is (I think) because you didn't take into account the proportion of real movies you may consider 'bad focus'.

"One of each is not many, correct?" - correct! The problem is that while the probability of getting pregnant might be quite small, once pregnant it's a done deal for the next 9 months!

The first sim included the ability to automatically average several runs; it just takes longer (with lots of annoying 'Stop this script?' messages, at least in IE; not so much, if at all, in Mozilla/Netscape). But I'll add this back in tomorrow-ish!

Theoretically, the sim could run internally as many times as necessary to adjust the 'positivity' figure itself, but, being in JavaScript, this really would take an inordinate amount of time.
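
One possible way around the run time, offered as a sketch rather than a description of the sim: if clicks are assumed to fall uniformly at random on M potential positives, the expected number of unique movies after n clicks is M(1 - (1 - 1/M)^n), so M can be found by bisection instead of repeated runs. The function names are hypothetical; the 911 clicks, 796 uniques and 100,000 pool are the figures quoted earlier in the thread.

```python
def expected_unique(n_clicks, n_positives):
    """Expected number of distinct movies hit by n_clicks uniform draws."""
    return n_positives * (1 - (1 - 1 / n_positives) ** n_clicks)

def solve_positives(n_clicks, n_unique, hi=10_000_000):
    """Bisect for the number of potential positives whose expected unique
    count matches n_unique (requires n_unique < n_clicks)."""
    lo = n_unique  # with no more positives than this, repeats are too common
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_unique(n_clicks, mid) < n_unique:
            lo = mid   # too few positives: too many repeats
        else:
            hi = mid
    return hi

positives = solve_positives(911, 796)
print(f"potential positives: about {positives}")
print(f"share of a 100,000 pool: about {positives / 100_000 * 100:.1f}%")
```

On those inputs the answer comes out a little over 3%, in line with the figure of about 3.0% found by trial and error.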

John
fjgiie
DustMod
Posts: 1253
Joined: Sat May 20, 2006 8:47 am
Location: Hampton, SC, US

New simulation for repeat movies

Post by fjgiie »

jsmaje wrote:...was that an average of several runs or the chance result of just one?
That link was one movie out of ten but the others with the same information jumped above and below as far as the number of simulated and actual uniques.

I have run many combinations, changing "Percent you consider may be positive" from 0.2% to 1.0% and "Number of well-focussed real movies available" from 110,000 to 140,000. Only one "run" of your sim gave six views on a single movie and that was the first trial using 0.3% and 140,000 movies available. So I could have used 0.1 for the average for six views but I just rounded to "0".
jsmaje wrote:At least part of the problem with your initial 0.3% figure is (I think) because you didn't take into account the proportion of real movies you may consider 'bad focus'.
If I had added 10% bad focus to real movies viewed, that would just have made the percentage smaller than 0.2%. The "Real movies viewed" number does not go higher with a click on bad focus. And I have noticed that 5 or 6 views will show on the sim if a very small "percent you consider positive" like 0.2% is used but then "simulated movies selected" ends up smaller than "actual number of times movies in Events list selected".

With my events list used, 0.7% and 140,000 movies in the pool seem to even out both uniques. Also 0.9% and about 120,000 movies will make both uniques close. Not the five or six views per movie though. It's hardly ever more than four views on one movie.

Here is another report using 0.9% and 120,000 movies. This was run three times and only got up to four views once. The uniques were close in all three runs.

One other small thing: the movie pool. The average number of well-focused movies may be well below 100,000. One post in updates mentioned that 70,000 had been removed. Also, our 44 tiles that are now done were not all done at the beginning. So what if we use 75,000 movies and 0.4%? Then our uniques would be very low. That's a problem.
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Re: New simulation for repeat movies

Post by jsmaje »

fjgiie wrote:The "Real movies viewed" number does not go higher with a click on bad focus.
Exactly, so you'll need to have had a higher 'positivity' rate in order to find x number of unique movies within the smaller subset of real movies (i.e. the number recorded as 'viewed') than you were actually exposed to (i.e. including those you clicked on as 'bad focus').

As well as allowing automatic averaging of several runs, I've now added an input for what % you may consider 'bad focus'. Remember that there were many more poorly-focussed movies earlier in the project, so the sooner you started the higher the % likely for this figure; I've guessed at 10% as a default. This means that the number entered for movies available should now be the estimated total number of reals in the pool to which you have been exposed (which has of course also been changing over time, well-focussed or not). And for the reason explained above, I've therefore had to increase my personal % considered positive to 4.0.
Unfortunately this all risks getting as confusing as the first sim; so much for my promised 'slimline' version!
fjgiie wrote:So I could have used 0.1 for the average for six views but I just rounded to "0".
Naughty! Rounding down a chance of 0.1 to zero is equivalent to saying that a rare coincidence cannot happen, and if it did it must be for some spooky reason.
Using my figures, 100 runs of the sim produced 28 instances of the same movie being seen 4 times (i.e. a 28% or 0.28 chance). Rounding this off to the nearest whole number would of course be zero, yet one has in fact been listed 4 times. To re-iterate, the chances of anything particular happening may be small, but once it has happened there's no going back.
[this seems to be a reality that many people find difficult to grasp (not you Fjgiie); I could go on about the illusion of creationism vs. evolution for example, but will restrain myself!]

Here's the new version: http://www.jsmaje.btinternet.co.uk/Stardust_sim2.htm
Have fun, John
fjgiie
DustMod
Posts: 1253
Joined: Sat May 20, 2006 8:47 am
Location: Hampton, SC, US

Simulation Run 0.7

Post by fjgiie »

Ok jsmaje,

These results were obtained using Firefox.

140,000 movies and 13% bad focus. Bad focus for me runs about 9.5% of movies served and 12.68% of non calibration movies.
Nikita
DustMod
Posts: 994
Joined: Wed May 17, 2006 8:33 pm
Location: Indiana, USA

Post by Nikita »

Why do I feel like I just stepped back into "Statistics and Measurements" in college! :shock:
From dust we come
Bram
Posts: 8
Joined: Thu Oct 26, 2006 1:39 pm
Location: Amsterdam the Netherlands

Post by Bram »

I had a similar feeling, Nikita. But then it is all about having fun isn't it? :wink:
Nikita
DustMod
Posts: 994
Joined: Wed May 17, 2006 8:33 pm
Location: Indiana, USA

Post by Nikita »

I guess the best part is that there isn't a test afterwards!
...or is there? :D
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Post by jsmaje »

So Nikita & Bram, what results do you get with it?
This doesn't need to be a private bit of fun between just me and Fjgiie!
Bram
Posts: 8
Joined: Thu Oct 26, 2006 1:39 pm
Location: Amsterdam the Netherlands

Post by Bram »

I guess Nikita and I saw that coming. My policy has always been not to enter into something "Fishy". :lol: