seeing the same movies?
I wonder how often we see the exact same movie? There are a few that I think I recognize now. It seems reasonable that people would be shown the same real movie to see if they get the same answer, so I also wonder how often I do give the same answer. I believe that my understanding of what I am seeing has changed dramatically over the months; hopefully it is now much more accurate.
This issue has arisen before; see for example http://stardustathome.ssl.berkeley.edu/ ... ght=#10895.
I've decided to make a healthier low-fat, salt-free version of my sim as a result of the observations in that thread, and will post it here when ready.
Meanwhile, it remains true that the frequency of repeats in a random delivery process can be much higher than many people imagine (see e.g. Wikipedia's entry on the "Poisson distribution").
I doubt it's due to the team wanting to check up on our consistency, though I guess they could if they wanted to!
John
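To put a rough number on how common repeats are under random delivery, here is a minimal sketch using the Poisson approximation mentioned above. The function name and the example figures are purely illustrative, and it assumes every movie in the pool is equally likely to be served on each viewing, which the real delivery process may not satisfy.

```python
import math

def expected_repeat_counts(pool_size, draws, max_k=5):
    """Expected number of pool items drawn exactly k times (k = 1..max_k).

    Poisson approximation: each item is drawn on average
    lam = draws / pool_size times, assuming uniform random delivery.
    """
    lam = draws / pool_size
    return {
        k: pool_size * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(1, max_k + 1)
    }

if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    for k, n in expected_repeat_counts(100_000, 50_000).items():
        print(f"seen {k} time(s): about {n:.0f} movies")
```

With these illustrative inputs (50,000 viewings from a pool of 100,000), roughly 7,600 movies would be expected to turn up exactly twice, which is probably more than most people would guess.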
seeing the same movies?
Here's my new version of the sim: http://www.jsmaje.btinternet.co.uk/Stardust_sim1.htm
Issues such as calibration and bad focus movies are ignored, since it is simply intended to show how often one can expect to be presented with the same movie more than once in a random delivery process. The most readily available data (unless you've kept a personal tally of every movie seen) is your Events list.
So, first paste your Events list/page into the text area*. The total number of movies listed and how many times the same movie has been clicked on and listed again will be automatically calculated.
Certain movies have of course been withdrawn/introduced over time, and everyone's exposure to this changing 'pool' will have been different. I've set 100,000 as the default value; this is almost certainly inaccurate & up for debate, but you can edit it as seen fit (the larger the number, the slower the program).
And what percentage of these could be considered positive, i.e. worth clicking, depends on your own criteria (the proportion of officially-considered or eventually-determined positives is beside the point). To ensure a statistically sensible result, trial and error is required to find a figure that results in about the same number of unique simulated and unique actual movies. This turned out to be about 3.0% for me, but again that may be changed.
After clicking 'Run' the results page then summarises your inputs and shows what the sim predicts: the number of times that the same simulated movie was selected, and what your actual Events list figures are (this may take a bit of time depending on your computer speed; simply click 'No' if you get a 'Stop this script?' message).
In my case, to date I've viewed 51,448 real movies and clicked on 911, of which 796 were unique. Using 3% for my 'positivity' figure (to get close to the actual number of unique movies), ten runs of the sim gave the following average results:
1 (i.e. selected once only): 685.4 (actual Events = 693)
2 (twice): 95.0 (92)
3 (three times): 11.6 (10)
4 (etc...): 0.4 (1)
5-10: 0 (0)
[= 792.4 (796) unique]
Try it a few times, tweak the inputs, and see how well the predictions match your actual events (no checks are made for sensible inputs though, so remember GIGO: garbage in, garbage out!)
* the easiest way to do this is to go to your Events page, press Ctrl+A to select it all, followed by Ctrl+C to copy it, then launch the sim and press Ctrl+V (ensuring the cursor is in the Paste box).
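For anyone who wants to experiment offline, the idea can be sketched in a few lines of Python. This is not the code behind the linked page: it is a simplified model that assumes each of your clicks lands uniformly at random on one of the "potential positive" movies (pool size times percent positive), and the function and parameter names are made up for illustration.

```python
import random
from collections import Counter

def simulate_repeats(pool_size, percent_positive, clicks, runs=10, seed=None):
    """Average, over `runs` runs, of how many movies were selected k times.

    Assumed model (not taken from the original sim): every click falls
    uniformly at random on one of the potential positives.
    Returns {k: average number of movies selected exactly k times}.
    """
    rng = random.Random(seed)
    positives = round(pool_size * percent_positive / 100)
    if positives < 1:
        raise ValueError("inputs give no potential positives")
    totals = Counter()
    for _ in range(runs):
        # How many times each simulated movie was clicked in this run.
        hits = Counter(rng.randrange(positives) for _ in range(clicks))
        # Tally how many movies were clicked once, twice, and so on.
        totals.update(Counter(hits.values()))
    return {k: totals[k] / runs for k in sorted(totals)}

if __name__ == "__main__":
    # Inputs quoted in the post: pool of 100,000, 3% positive, 911 clicks.
    averages = simulate_repeats(100_000, 3.0, 911, runs=10)
    for k, n in averages.items():
        print(f"selected {k} time(s): {n:.1f}")
    print(f"unique: {sum(averages.values()):.1f}")
```

Because of the simplifying assumption, the output should only be expected to land in the same general region as the averages listed above, not to reproduce them.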
Stardust@home repeat movie simulation
Number of real movies viewed = 193,944
Number of movies in Events list = 405
Number of well-focussed real movies available = 140,000
Percent you consider may be positive = 0.3%, hence potential positives = 420
Number of times simulated movies selected (262 unique):
Times selected:   1    2    3    4    5    6    7    8    9    10
Movies:           155  70   22   5    2    0    0    0    0    0
Actual number of times movies in Events list selected (336 unique):
Times selected:   1    2    3    4    5    6    7    8    9    10
Movies:           293  25   13   3    1    1    0    0    0    0
This is an average of 10 runs. (maybe a little jijo)
Hi fjgiie, thanks for trying the sim. If you increase your 'considered positive' figure a bit, to get a better match between the number of simulated and actual (336) uniques in order to make statistical sense, I'm sure the match will be better. You are clearly a lot more picky than me, but did you have any particular good reason to settle on 0.3%?
It would be interesting to see just how wide the range is for this figure amongst different dusters (you've already shown it could be as much as an order of magnitude!)
I tried 0.7% and got this result. It seems difficult to get a result that gives five or six viewings of the same movie even running it several times. One of each is not many, correct?
jsmaje, you asked about the 0.3% - that is what I came up with when dividing the number of events by total movies viewed and multiplying by 100. It comes to a little over 0.2%.
I like this sim better than the previous one.
Fjgiie, was that an average of several runs or the chance result of just one?
At least part of the problem with your initial 0.3% figure is (I think) because you didn't take into account the proportion of real movies you may consider 'bad focus'.
"One of each is not many, correct?" - correct! The problem is that while the probability of getting pregnant might be quite small, once pregnant it's a done deal for the next 9 months!
The first sim included the ability to automatically average several runs; it just takes longer (with lots of annoying 'Stop this script?' messages, at least in IE; not so much, if at all, in Mozilla/Netscape). But I'll add this back in tomorrow-ish!
Theoretically, the sim could run internally as many times as necessary to adjust the 'positivity' figure itself, but, being in JavaScript, this really would take an inordinate amount of time.
John
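As an aside on adjusting the 'positivity' figure automatically: under a simplified model the trial and error can be replaced by a direct calculation. The sketch below assumes clicks fall uniformly at random on the potential positives, so the expected number of unique movies has a closed form that can be inverted by bisection. This is an assumption for illustration, not how the sim itself works, and the function names are hypothetical.

```python
def expected_unique(positives, clicks):
    """Expected number of distinct movies hit when `clicks` selections
    fall uniformly at random on `positives` candidates."""
    return positives * (1.0 - (1.0 - 1.0 / positives) ** clicks)

def fit_percent_positive(pool_size, clicks, actual_unique):
    """Find the 'percent positive' whose expected unique count matches
    the actual one, by bisection on the number of potential positives."""
    if not 0 < actual_unique < clicks:
        raise ValueError("need 0 < actual_unique < clicks")
    # expected_unique() increases with the number of positives,
    # so a simple bisection between these bounds is enough.
    lo, hi = actual_unique, pool_size
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_unique(mid, clicks) < actual_unique:
            lo = mid
        else:
            hi = mid
    return 100.0 * hi / pool_size

if __name__ == "__main__":
    # Figures posted earlier in the thread: 405 events, 336 unique,
    # pool of 140,000 movies.
    print(f"{fit_percent_positive(140_000, 405, 336):.2f}%")
```

With the figures posted earlier in the thread (405 events, 336 unique, pool of 140,000) this returns a value between 0.7% and 0.8%.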
New simulation for repeat movies
jsmaje wrote: ...was that an average of several runs or the chance result of just one?
That link was one movie out of ten but the others with the same information jumped above and below as far as the number of simulated and actual uniques.
I have run many combinations, changing "Percent you consider may be positive" from 0.2% to 1.0% and "Number of well-focussed real movies available" from 110,000 to 140,000. Only one "run" of your sim gave six views on a single movie and that was the first trial using 0.3% and 140,000 movies available. So I could have used 0.1 for the average for six views but I just rounded to "0".
jsmaje wrote: At least part of the problem with your initial 0.3% figure is (I think) because you didn't take into account the proportion of real movies you may consider 'bad focus'.
If I had added 10% bad focus to real movies viewed, that would just have made the percentage smaller than 0.2%. The "Real movies viewed" number does not go higher with a click on bad focus. And I have noticed that 5 or 6 views will show on the sim if a very small "percent you consider positive" like 0.2% is used but then "simulated movies selected" ends up smaller than "actual number of times movies in Events list selected".
With my events list used, 0.7% and 140,000 movies in the pool seem to even out both uniques. Also 0.9% and about 120,000 movies will make both uniques close. Not the five or six views per movie though. It's hardly ever more than four views on one movie.
Here is another report using 0.9% and 120,000 movies. This was run three times and only got up to four views once. The uniques were close in all three runs.
One other small thing: the movie pool. The average number of well-focused movies may be well below 100,000. One post in Updates mentioned that 70,000 had been removed. Also, our 44 tiles that are now done were not all done at the beginning. So, what if we use 75,000 movies and 0.4%? Then our uniques would be very low. That's a problem.
Re: New simulation for repeat movies
fjgiie wrote: The "Real movies viewed" number does not go higher with a click on bad focus.
Exactly, so you'll need to have had a higher 'positivity' rate in order to find x number of unique movies within the smaller subset of real movies (i.e. the number recorded as 'viewed') than you were actually exposed to (i.e. including those you clicked on as 'bad focus').
As well as allowing automatic averaging of several runs, I've now added an input for what % you may consider 'bad focus'. Remember that there were many more poorly-focussed movies earlier in the project, so the sooner you started the higher the % likely for this figure; I've guessed at 10% as a default. This means that the number entered for movies available should now be the estimated total number of reals in the pool to which you have been exposed (which has of course also been changing over time, well-focussed or not). And for the reason explained above, I've therefore had to increase my personal % considered positive to 4.0.
Unfortunately this all risks getting as confusing as the first sim; so much for my promised 'slimline' version!
fjgiie wrote: So I could have used 0.1 for the average for six views but I just rounded to "0".
Naughty! Rounding down a chance of 0.1 to zero is equivalent to saying that a rare coincidence cannot happen, and if it did it must be for some spooky reason.
Using my figures, 100 runs of the sim produced 28 instances of the same movie being seen 4 times (i.e. a 28% or 0.28 chance). Rounding this off to the nearest whole number would of course be zero, yet one has in fact been listed 4 times. To re-iterate, the chances of anything particular happening may be small, but once it has happened there's no going back.
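To make the rounding point concrete, here is a small sketch that turns an average count per run into the chance of the event turning up at least once. It assumes the count is roughly Poisson-distributed, which is an approximation rather than something the sim guarantees, and the function name is just illustrative.

```python
import math

def chance_of_at_least_one(expected_count):
    """Probability of seeing one or more occurrences when the number of
    occurrences is (approximately) Poisson with the given mean."""
    return 1.0 - math.exp(-expected_count)

if __name__ == "__main__":
    for mean in (0.1, 0.28):
        print(f"average {mean} per run -> "
              f"{100 * chance_of_at_least_one(mean):.1f}% chance of at least one")
```

On that assumption an average of 0.1 per run still means about a 9.5% chance of seeing the event in any given run, and 0.28 means about 24%, so neither is sensibly rounded to "cannot happen".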
[this seems to be a reality that many people find difficult to grasp (not you Fjgiie); I could go on about the illusion of creationism vs. evolution for example, but will restrain myself!]
Here's the new version: http://www.jsmaje.btinternet.co.uk/Stardust_sim2.htm
Have fun, John
Simulation Run 0.7
Ok jsmaje,
These results were obtained using Firefox.
140,000 movies and 13% bad focus. Bad focus for me runs about 9.5% of movies served and 12.68% of non calibration movies.