Duplicate Movies in "My Events"

Post here if you are having any kind of problem with the Stardust@home website.

Moderators: Stardust@home Team, DustMods

mguile
Posts: 4
Joined: Tue Sep 19, 2006 4:58 pm
Location: Thompson CT

Duplicate Movies in "My Events"

Post by mguile »

Several weeks ago I selected Movie 6690749V1 as having a potential track. A few days later its status was changed to "as a candidate". About a week later Movie 6690749V1 showed up again in the "My Events" listing "as a candidate", so it now shows up twice in my listing. Any idea why? Unless I have been sent the movie a second time and identified the potential track again.
Groundling
Posts: 65
Joined: Sat Aug 05, 2006 6:55 pm
Location: Oregon, USA

Dual entries

Post by Groundling »

Hi mguile;
I also have a couple of strange entries in my "events" folder. In my case, I entered a hit a couple of weeks ago, and then I found the same movie which I selected again. When I looked at my events folder I found the same number of "views" and the same number of "hits".
This should have increased by at least one, since I hit this movie twice myself.
I do not worry much about this since the program to record my input is actually pretty primitive in today's computer world.
A strange distraction for me but I don't see it as more than a "glitch".
I just continue on.
Groundling
I have met the enemy and he is us.
Pogo
looping
Posts: 4
Joined: Wed Sep 06, 2006 7:13 am

Scan the same movie 2 times

Post by looping »

I'm sure I have scanned the same movie 2 times. Is that normal?
(Movie id: 3422314V1)
fjgiie
DustMod
Posts: 1253
Joined: Sat May 20, 2006 8:47 am
Location: Hampton, SC, US

Repeat movies

Post by fjgiie »

Hi looping,

The movies are served up at random. Yes, we will get the same one sometimes.
I have clicked twice on the same movie 11 times. One movie (862370V1) was clicked on three times.

So I guess that is normal.

Thanks,

fjgiie
Mikey
Posts: 9
Joined: Thu Oct 26, 2006 1:23 pm

Re: Repeat movies

Post by Mikey »

fjgiie wrote:Hi looping,

The movies are served up at random. Yes, we will get the same one sometimes.
I have clicked twice on the same movie 11 times. One movie (862370V1) was clicked on three times.

So I guess that is normal.

Thanks,

fjgiie
If there are 700,000 movies, then how are we getting so many repeats?

With that many movies, one should never expect to get any repeats.
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Re: Repeat movies

Post by jsmaje »

Mikey wrote:If there are 700,000 movies, then how are we getting so many repeats?
There'll only be 700,000 movies once all the aerogel tiles have been scanned. Up until a few days ago they'd only done about a fifth of them, so we've been seeing the same recycled movies (hence the frequent repeats) for about 3 months now. A few new tiles have just been added, but it has always been said it would take several months (7+ ?) to do the lot.
Mikey
Posts: 9
Joined: Thu Oct 26, 2006 1:23 pm

Re: Repeat movies

Post by Mikey »

jsmaje wrote:
Mikey wrote:If there are 700,000 movies, then how are we getting so many repeats?
There'll only be 700,000 movies once all the aerogel tiles have been scanned. Up until a few days ago they'd only done about a fifth of them, so we've been seeing the same recycled movies (hence the frequent repeats) for about 3 months now. A few new tiles have just been added, but it has always been said it would take several months (7+ ?) to do the lot.
I still don't understand why there would be many duplicates.

On July 13th the project stated that "We've set ourselves a target of having 12 complete tiles (about 50,000 focus movies) ready to search before we open the Virtual Microscope", which opened slowly on August 1st. On August 16th they said they were adding "4000 movies per day."

I would challenge any human to keep up with those numbers.

As of October 16th there had been 20 million searches performed on 175,000 movies. That means each movie should have been viewed, on average, 114 times. There must be a few hundred very active participants out of the current 19,000 to have duplicates occur.
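The averages above can be checked with a few lines of Python. This is only a rough sketch: the 76-day window (August 1st to October 16th) and treating the 19,000 participants as constant over that period are assumptions made for illustration, not figures from the project.

```python
# Figures quoted in this post.
searches = 20_000_000      # searches performed as of October 16th
movies = 175_000           # movies searched
participants = 19_000      # current participants (assumed constant here)
days = 76                  # assumption: August 1st to October 16th

views_per_movie = searches / movies
searches_per_participant_per_day = searches / participants / days

print(f"average views per movie: {views_per_movie:.0f}")
print(f"average searches per participant per day: {searches_per_participant_per_day:.1f}")
```

These are averages only; they say nothing about how unevenly the searching is spread among participants.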

How many people are looking, on average, at 2500 movies per day?

If there are many duplicates, then I am still very curious as to why. What am I missing?
fjgiie
DustMod
Posts: 1253
Joined: Sat May 20, 2006 8:47 am
Location: Hampton, SC, US

Multiple Views

Post by fjgiie »

Mikey wrote:If there are many duplicates, then I am still very curious as to why. What am I missing?
Movie 598677V1 316 72 Passed cut 1 possible interstellar dust particle (viewed three times)
Movie 6065316V1 313 182 Passed cut 1 possible interstellar dust particle (viewed three times)
Movie 862370V1 294 95 Passed cut 1 possible idp (viewed four times)

I have viewed about 90,000 non-calibration movies. Of these, 212 movies have been clicked: 11 twice, 2 three times, and 1 four times. That makes my events add up to 230. Some of these events have been clicked 300 or more times by searchers.

There may be 160,000 movies that have been scanned. 4000 X 40 tiles. One fourth of 132 is 33.
It would not be unusual to have seen some multiple times.
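As a rough check, one can treat each view as an independent uniform draw from the pool. That is an assumption (the real pool changed over time), not a description of how the movies are actually served. With 90,000 views from 160,000 movies, the number of times a given movie comes up is then approximately Poisson with mean 90,000/160,000. A minimal Python sketch:

```python
import math

views = 90_000       # non-calibration movies viewed (from this post)
pool = 160_000       # estimated movies scanned (from this post)
mean = views / pool  # average number of times a given movie is served

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson(lam) count equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Expected number of movies in the pool served exactly k times.
expected = {k: pool * poisson_pmf(k, mean) for k in range(5)}
for k, n in expected.items():
    print(f"served {k} times: about {n:,.0f} movies")
```

Under this model a sizeable number of movies come up two or more times, so a few repeats among the clicked ones would not be surprising.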

The final point I would like to make is this: If I have seen movie 862370V1 four times, then there is a movie out there that I have not seen at least four times.

Thanks,

fjgiie
Mikey
Posts: 9
Joined: Thu Oct 26, 2006 1:23 pm

Re: Multiple Views

Post by Mikey »

fjgiie wrote: I have viewed about 90,000 non-calibration movies. Of these 212 movies have been clicked. 11 twice, 2 three times and 1 is a quadruple. That makes my events add to 230. Some of these events have been clicked 300 or more times by searchers.

There may be 160,000 movies that have been scanned. 4000 X 40 tiles. One fourth of 132 is 33. It would not be unusual to have seen some multiple times.

The final point I would like to make is this: If I have seen movie 862370V1 four times, then there is a movie out there that I have not seen at least four times.
fjgiie
I guess it would have been nice if the movies one had already seen were not presented again. (You only know about the ones that you clicked on.)

Your "final point" raises another question: Does the project team know which movies are not viewed enough to get good feedback?
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Post by jsmaje »

I've been pondering the question that Mikey has raised here, i.e. how many movies sent at random one can expect to see more than once. Or at least, how many in one's Events list are likely to be repeats.

Not being a mathematician or statistician I decided to write a simple simulation. It requires that you input the following:
(1) total number of movies available
(2) % that are CMs
(3) % of real movies considered of interest (which I call 'clickability')
(4) number of randomly-delivered real movies you have viewed
(5) number of movies in your Events list
(6) number of times to run the sim.
The output then predicts the frequency that your selected movies would be repeated.
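For readers who want to experiment, the idea can be sketched in Python. This is not the original JavaScript program, only a simplified reconstruction under stated assumptions: the "clickable" pool is taken to be clickability times the number of real movies, every entry in the Events list is treated as a uniform random draw from that pool, and input (4), the number of real movies viewed, is not used. The function name and parameters are hypothetical.

```python
import random
from collections import Counter

def simulate_repeats(total_movies, cm_fraction, clickability, n_events, runs, seed=None):
    """Average, over `runs` runs, how many distinct movies appear
    1, 2, 3, ... times in an Events list of `n_events` entries."""
    rng = random.Random(seed)
    real_movies = round(total_movies * (1 - cm_fraction))
    # Assumption: the clickable pool is a fixed share of the real movies.
    clickable = max(1, round(real_movies * clickability))
    totals = Counter()
    for _ in range(runs):
        # Assumption: each event is a uniform draw from the clickable pool.
        times_clicked = Counter(rng.randrange(clickable) for _ in range(n_events))
        # Tally how many movies were clicked once, twice, and so on.
        totals.update(Counter(times_clicked.values()))
    return {k: totals[k] / runs for k in sorted(totals)}

result = simulate_repeats(160_000, 0.25, 0.015, n_events=374, runs=20, seed=1)
for times, count in result.items():
    print(f"seen {times} time(s): {count:.1f}")
print(f"unique movies: {sum(result.values()):.1f}")
```

Because the draws are random, the averages vary a little from one seed to the next; passing `seed=None` gives a fresh result on each call.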

Input (3), clickability, depends on both the (as yet unknown) number of tracks there might be, as well as one's personal threshold regarding what may be worth clicking. It turns out that the precise value chosen strongly affects the resulting frequency distribution - the more likely you are to click, the less likely you'll have repeats, since your Events list will be filled sooner with unique movies. Having guessed from personal experience at somewhere between 1 - 2%, I tried various values in between.

Taking reasonable numbers for total movies (160000), CMs (25%), plus my current number of real movies viewed (28275), number of Events (374), and using a value of 1.5% for clickability, the average results after 20 runs were as follows (figures to nearest decimal):
302.9 seen once, 32.2 twice, 2.0 three times, 0.2 four times, 0 times thereafter (total 374, of which 337.3 unique).
My actual figures are:
287 seen once, 37 twice, 3 three times, 1 four times, 0 times thereafter (total 374, of which 328 unique).

While a fairly good match, allowing for chance, this did of course depend on the value chosen for clickability. In order to test my guess, I then plugged fjgiie's recent figures in (160000 total, 25% CMs, 90000 reals viewed, 230 Events of which 212 unique) using the same 1.5% value for clickability. The average results after 20 runs were:
203.2 seen once, 12.7 twice, 0.5 three times, 0 times thereafter (total 230, of which 216.4 unique).
Fjgiie's quoted figures were:
198 seen once, 11 twice, 2 three times, 1 four times, 0 thereafter (total 230, of which 212 unique).

Again a good match, particularly the number of doubles.* I can therefore feel some confidence in the program, and the lesson it teaches: that when dealing with random events, coincidences are much more likely than might be intuitive. Indeed, lack of coincidence (e.g. repeat movies) is a good reason to raise suspicions about a system being artificially manipulated.

If anyone wants to plug their own figures in and see how the sim's prediction compares with their own experience, or indeed to correct/improve my very amateur programming, it's available here (guaranteed virus-free, but only partially annotated, and no checking for sensible inputs, so remember RIRO! Being written in JavaScript and very iterative, it's also rather slow - I suggest you try just one run first to see how long it takes).

John Smaje

* [And this despite our quite different selection rates (me: 374/28275 = 1.32%; fjgiie: 230/90000 = 0.26%).
In fact, running the sim with our event numbers swapped around required that 'clickability' be adjusted to about 0.4% in my case and 1.4% for fjgiie in order to approximate our actual figures, a ratio of 1 to 3.5. This could partly be accounted for by our personal thresholds of course (e.g. for subtle/maybe inclusions), but my personal experience suggests we share fairly similar selection criteria. The other component of clickability is the actual number of tracks, and we can't both be right about their frequency! So, an analogy for my longer list length could be: we both like the same type of cheese, but I buy that type of cheese more frequently! Does this make sense?]

Mikey
Posts: 9
Joined: Thu Oct 26, 2006 1:23 pm

Post by Mikey »

jsmaje wrote:I can therefore feel some confidence in the program, and the lesson it teaches: that when dealing with random events, coincidences are much more likely than might be intuitive.
Nice work jsmaje.

Since the selection is random, one might also expect a movie or two or ... might not be seen at all. Odds are that any missed movies would not contain a track, but what if the only track in existence was on a missed slide? Hopefully the movie selection process is a little smarter than just being random.

I wonder how many selections would have to be made until every example was selected. I suspect it would be a very long time.
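This is the classic "coupon collector" problem. If every selection is an independent uniform draw from a fixed pool of n movies (an assumption, since the real pool changes and the selection process may be smarter), the expected number of draws until every movie has come up at least once is n times the n-th harmonic number. A minimal Python sketch, using the 160,000 pool estimate from earlier in the thread:

```python
import math

def expected_draws_to_see_all(n: int) -> float:
    """Expected number of uniform random draws needed to see each of n
    items at least once: n * H_n, with the harmonic number H_n
    approximated by ln(n) + gamma + 1/(2n)."""
    euler_gamma = 0.5772156649015329
    harmonic = math.log(n) + euler_gamma + 1 / (2 * n)
    return n * harmonic

pool = 160_000  # pool size estimate quoted earlier in the thread
print(f"about {expected_draws_to_see_all(pool):,.0f} draws")
```

Under these assumptions that works out to roughly 12 to 13 draws per movie in the pool, counted over all searchers combined, which is small next to the 20 million searches mentioned above. It only covers a movie being served once, not being viewed often enough to give good feedback.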
jsmaje
Posts: 616
Joined: Tue Aug 15, 2006 8:39 am
Location: Manchester UK

Post by jsmaje »

Regarding the sim:

Fjgiie has rightly pointed out to me that the number of real movies in the pool is far from certain (& that my 160000 total figure was actually a real-movie estimate, plus that I didn't take account of bad focus movies). Also that the pool size has been constantly changing: well-viewed movies being retired, bad focus movies withdrawn, & new movies being added, all at different times.

It's true the sim doesn't explicitly ask for a bad-focus movie input (and to be honest, I forgot to include that!), but whether any particular movie is bad enough to reject can be a matter of opinion at times, and could I suppose be regarded as another component of 'clickability'. That's my excuse anyway.

I think we can all nevertheless agree that repeats aren't only to be expected, particularly doubles (as mguile & Groundling were asking at the start of this thread), but that - given appropriate conditions - they can be at a higher rate than one might expect. Unfortunately the devil is always in the details, and the appropriate conditions here are clearly not that certain.

Mikey's other point about under-viewed or even un-viewed movies is, I'm sure, quite right in theory for small numbers, but in practice there are so many dusters working at such a rate that (according to the team; sorry, I can't find the reference, and correct me if I'm wrong) it seems it would need only a day to adequately scan an entire tile's worth of movies. Indeed, I reckon that the present pool is in danger of being over-scanned to death. If any movies should even then escape by chance (unlikely, given that the project will be running for approx. 7 months) it would take only a minute or so to identify them on the last day of the project and get them well-scanned by tea-time!

On with the search...

John