Multiple viewings of FOVs: statistical analysis, conclusions

studebaker
Posts: 4
Joined: Mon Aug 07, 2006 4:21 pm

Multiple viewings of FOVs: statistical analysis, conclusions

Post by studebaker »

The fact that some of the movies identified in each of our "My Events" stats have been viewed many times (as noted here in the forum in another topic) invites some analysis.

Some observations:
In the "My Events" section of my statistics, I've flagged five funny-looking movies. They've been viewed an average of 76 times each. The spread in the total number of viewings across my five movies is consistent with a standard deviation of sqrt(76), or about 8.7. Those of you who have watched more movies and flagged more events would be able to refine this estimate. The consistency with sqrt(N) statistics suggests that the movies are being assigned randomly and that the stardust team has not continued to upload significant amounts of new data, although I could be limited by my use of only my five candidate movies.
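For anyone who wants to check the sqrt(N) argument, here is a minimal sketch. It assumes the view counts behave like Poisson counts, and it uses the usual normal-theory approximation for how uncertain a standard deviation measured from only n samples is; the variable names are just mine.

```python
import math

mean_views = 76   # average viewings per flagged movie (from the post)
n_movies = 5      # number of flagged movies in the sample

# For Poisson-distributed counts the standard deviation is sqrt(mean).
poisson_sd = math.sqrt(mean_views)

# Approximate fractional uncertainty of a standard deviation estimated
# from n samples (normal-theory approximation): 1 / sqrt(2 * (n - 1)).
rel_uncertainty = 1 / math.sqrt(2 * (n_movies - 1))

print(f"expected standard deviation: {poisson_sd:.1f}")
print(f"fractional uncertainty with {n_movies} movies: {rel_uncertainty:.0%}")
```

With only five movies the measured spread is itself uncertain by roughly a third, which is why a larger sample of flagged movies would tighten the comparison.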

I can also estimate the total number of real movies viewed by the whole volunteer pool. I downloaded the score rankings of the top 100 people, inserted my own ranking, and then put in the total number of volunteers (9484 as of this morning) with an estimate of a relatively small score for the last bunch of people. You can convert a score into the number of real movies viewed by multiplying by a factor of about 2.5. Then I integrated the number of movies viewed over the volunteer pool to get an estimate of about 5.1e6 (5 million) real movies viewed. The top 100 scores account for about 472000 movies. The estimate is a bit uncertain, but it is at least 3.8 million movies viewed, since my own score and rank are known and sit somewhere in the big middle of the distribution.
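The integration step can be sketched roughly as below. This is only an illustration of the idea, not the exact calculation: the function names are made up, the straight-line interpolation between known (rank, score) points is an assumption, and no actual score data is included because the post does not list it.

```python
def interpolate_score(rank, knots):
    """Linearly interpolate a score at `rank` between known (rank, score) points."""
    for (r0, s0), (r1, s1) in zip(knots, knots[1:]):
        if r0 <= rank <= r1:
            return s0 + (rank - r0) * (s1 - s0) / (r1 - r0)
    raise ValueError("rank lies outside the known range")


def estimate_total_movies(knots, movies_per_point=2.5):
    """Sum interpolated scores over every rank, then convert score to movies.

    knots: (rank, score) pairs sorted by rank, e.g. the top-100 list,
           one's own rank and score, and a guess for the last volunteer.
    movies_per_point: the roughly 2.5 movies-per-score factor from the post.
    """
    first_rank, last_rank = knots[0][0], knots[-1][0]
    total_score = sum(
        interpolate_score(rank, knots)
        for rank in range(first_rank, last_rank + 1)
    )
    return movies_per_point * total_score
```

To reproduce the estimate you would pass in the real top-100 scores, your own rank and score, and a small assumed score for volunteer number 9484.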

The large number of views per movie suggests that we are dealing with a much smaller set of real movies loaded so far into the database made available for volunteer viewing. That number of real scans is between 3.8e6/76 and 5.1e6/76, or 50000 to 67000.
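The arithmetic behind that range, as a quick sketch (variable names are mine; the inputs are the numbers quoted above):

```python
mean_views_per_movie = 76       # average viewings per movie
total_views_low = 3.8e6         # lower bound on total real movies viewed
total_views_high = 5.1e6        # integrated estimate of total real movies viewed

# Distinct movies = total viewings / average viewings per movie.
unique_low = total_views_low / mean_views_per_movie
unique_high = total_views_high / mean_views_per_movie

print(f"distinct movies: {unique_low:.0f} to {unique_high:.0f}")
```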

Now -- the following are conclusions that I draw from these numbers. The stardust site estimates a bit less than 1 million total movie scans for the whole collector, in which there may be something like 45 real interstellar dust particles (their estimate). In the 50000 to 67000 movies we are looking at here, that would mean we should find something like 2.2 to 3.0 real interstellar dust particle tracks. On average, each movie has been viewed about 76 times, so the stardust team has quite good statistics on each flagged movie. If my ratio of flagged movies to real movies viewed is comparable to others', there ought to be something like 760 to 1020 movies flagged as potential tracks, each viewed 76 (+/- 8.7) times. That number of viewings ought to give the stardust team plenty of ammunition to figure out which movies are worth looking at much more closely.
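The expected number of real tracks is just a proportional scaling. A small sketch, assuming the whole collector is a round 1 million scans (the site says "a bit less") and that the particles are spread evenly over it:

```python
total_scans_whole_collector = 1.0e6   # assumed round figure; site says a bit less
expected_particles_whole = 45         # stardust team's estimate

movies_low, movies_high = 50000, 67000   # distinct movies available so far

# Scale the whole-collector estimate by the fraction of scans available.
expected_low = expected_particles_whole * movies_low / total_scans_whole_collector
expected_high = expected_particles_whole * movies_high / total_scans_whole_collector

print(f"expected real tracks: about {expected_low:.2f} to {expected_high:.2f}")
```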

Conclusions:

1. This is a very small subset of the total amount of data to come.
2. We knew that already based on the post in the forum by the stardust team member showing the map of the collector already scanned and uploaded.
3. The team has good statistics on potential candidates out of this small data subset.
3a. Further viewings of this subset will not significantly improve the reliability of identifying potential candidate tracks, unless there is something really weird about how the stardust team collates and interprets information about how many volunteers agree on potential candidates.
4. Volunteers should probably save their energy until a new batch of data becomes available.
5. Stardust team should make a new batch of data available or make some announcement about it.
6. The rate of potential track candidate movies identified should provide some quite good information about the amount of impurity dust and imperfections included in the aerogel collectors.

Comments?
Best regards,

Studebaker
mwhiz
Posts: 95
Joined: Tue May 23, 2006 3:58 pm
Location: Seattle, Washington, USA

Post by mwhiz »

good statistics work! golly. can't wait until i get to take stat... :wink:
"The Earth is the cradle of mankind, but one cannot live in the cradle forever."
~Konstantin Tsiolkovsky
studebaker
Posts: 4
Joined: Mon Aug 07, 2006 4:21 pm

Post by studebaker »

And I should point out that with an anticipated 2.2 to 3.0 real tracks in this data subset, it is entirely possible that we have zero real tracks in here, since sqrt(2.2) = 1.5 and sqrt(3.0) = 1.73. So zero real tracks would be less than two standard deviations, although that's only approximately correct for such small numbers.
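Since the Gaussian rule of thumb is shaky for such small numbers, the chance of zero tracks can be computed directly. This sketch assumes the number of real tracks follows a Poisson distribution, so the probability of finding none is exp(-mean):

```python
import math


def prob_zero_tracks(expected_tracks):
    """Poisson probability of observing zero events given the expected count."""
    return math.exp(-expected_tracks)


for expected in (2.2, 3.0):
    print(f"expected {expected}: P(zero tracks) = {prob_zero_tracks(expected):.1%}")
```

So under that assumption there is roughly a 5% to 11% chance that this subset contains no real tracks at all.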

I note that nobody has posted anything really "TA-DA!" in the "I think I have a real track" topic. Mostly they're bad focus, or contaminant dust specks.
Kalman
Posts: 11
Joined: Wed Aug 09, 2006 5:18 am
Location: Budapest, Hungary

Re: Multiple viewings of FOVs: statistical analysis, conclus

Post by Kalman »

studebaker wrote:The large number of views per movie suggests that we are dealing with a much smaller set of real movies loaded so far into the database made available for volunteer viewing. That number of real scans is between 3.8e6/76 and 5.1e6/76
or 50000 to 67000.
Well, you have confirmed my conclusion. I arrived at a very close number from only a rough estimate. So it seems certain that the bulk of the pictures is still to come. :idea:
studebaker wrote:6. The rate of potential track candidate movies identified should provide some quite good information about the amount of impurity dust and imperfections included in the aerogel collectors.


I have been thinking about this. I've just started a topic about the little grains that we often find below the gel surface. May I advertise it? :wink:

http://stardustathome.ssl.berkeley.edu/ ... .php?t=716
--Kalman
icebike

Re: Multiple viewings of FOVs: statistical analysis, conclus

Post by icebike »

studebaker wrote:
The consistency with sqrt(N) statistics suggests that the movies are being assigned randomly and that the stardust team has not continued to upload significant amounts of new data, although I could be limited by my use of only my five candidate movies.
...

The large number of views per movie suggests that we are dealing with a much smaller set of real movies loaded so far into the database made available for volunteer viewing. That number of real scans is between 3.8e6/76 and 5.1e6/76
or 50000 to 67000.
My take on this is far simpler than yours.

Movies flagged by anyone are (as has been stated on the web site and by the dustmods) sent out to at least 100 additional viewers.

My theory is these confirmation-candidates go to the front of the queue for any given viewer, so as to front-load the potential for discovery of strong candidates.

This is probably by design, because it allows early particle detection, and early evaluation of the quality of our work.

So rather than saying anything about the number of movies released, my theory says it speaks to the queueing theory involved.

However, to be fair, and in support of your theory, until I see a movie number greater than 99999, I have to assume your theory is just as likely as mine.

It's just my idle speculation...
WeBeGood
Posts: 65
Joined: Thu Aug 03, 2006 7:26 am
Location: Texas, USA

Re: Multiple viewings of FOVs: statistical analysis, conclus

Post by WeBeGood »

studebaker wrote: The large number of views per movie suggests that we are dealing with a much smaller set of real movies loaded so far into the database made available for volunteer viewing. That number of real scans is between 3.8e6/76 and 5.1e6/76
or 50000 to 67000.
Studebaker

You can look at any movie by changing the number on the viewer url. Initially, the last movie was number 57546. Recently, some movies were added and the last number available increased to 61874. So your analysis is correct.

And currently 62477
Courtesy E-Mail Welcome @ WeBeGood@GMail.Com
templar781
Posts: 38
Joined: Thu Aug 03, 2006 9:19 am
Location: St. Petersburg Florida

Re: Multiple viewings of FOVs: statistical analysis, conclus

Post by templar781 »

WeBeGood wrote:You can look at any movie by changing the number on the viewer url. Initially, the last movie was number 57546. Recently, some movies were added and the last number available increased to 61874. So your analysis is correct.

And currently 62477

It looks like the Stardust team caught on and changed the movie numbering scheme. The current movie that I have is 8069163V1. A couple days ago I saw the numbering scheme change from basic numbers like 62477 to the larger number with the V1 on the end, but the movie number in the URL stayed basic. This morning when I started I noticed that the more complex numbering scheme was included in the URL as well as the movie page itself. Also the number sequence of the complex numbers seems much more random.
Winning isn't everything, but wanting to win is. - Vince Lombardi
WeBeGood
Posts: 65
Joined: Thu Aug 03, 2006 7:26 am
Location: Texas, USA

Re: Multiple viewings of FOVs: statistical analysis, conclus

Post by WeBeGood »

templar781 wrote:
WeBeGood wrote:You can look at any movie by changing the number on the viewer url. Initially, the last movie was number 57546. Recently, some movies were added and the last number available increased to 61874. So your analysis is correct.

And currently 62477
It looks like the Stardust team caught on and changed the movie numbering scheme. The current movie that I have is 8069163V1. A couple days ago I saw the numbering scheme change from basic numbers like 62477 to the larger number with the V1 on the end, but the movie number in the URL stayed basic. This morning when I started I noticed that the more complex numbering scheme was included in the URL as well as the movie page itself. Also the number sequence of the complex numbers seems much more random.

I wish they had implemented it in another way and kept the numbering system linear, for example by adding new movies continuously at a slow random rate, or even showing the current last movie number for those who are just looking and not picking. The first batch must be over 100 views by now.

Random numbers for the tracking tool, but linear numbers for the MyEvents would not have broken all the links in the discussions. But they are short-handed and doing a great job.
Courtesy E-Mail Welcome @ WeBeGood@GMail.Com
Kalman
Posts: 11
Joined: Wed Aug 09, 2006 5:18 am
Location: Budapest, Hungary

Post by Kalman »

WeBeGood wrote:Random numbers for the tracking tool, but linear numbers for the MyEvents would not have broken all the links in the discussions.
I must revise my viewpoint. It is very likely that the five-digit numbers stand only for marked movies.
That would also explain why the numbers keep increasing. Sooner or later we will reverse-engineer the database, hehe...
--Kalman