Imagine that you have a very long line and that you are asked to mark a number of points in consecutive fashion along...

Imagine that you have a very long line and that you are asked to mark a number of points in consecutive fashion along the line, subject to a few conditions:

(1) For each new point that is marked on the line, there is a region of length L either side of that point within which no consecutive points are allowed to be marked. In other words, new points may only be marked if they are at least a distance of L from any other existing points.

(2) The location of each new point is chosen at random from any of the available locations along the line.

(3) Once there are no more possible locations, no new points are marked.

What would be the final average spacing of the points?

I have done a computer simulation of the problem and the average spacing seems to approach (4/3)*L but I have no way of proving this. Any thoughts?

Other urls found in this thread:

mathworld.wolfram.com/RenyisParkingConstants.html
twitter.com/SFWRedditVideos

I would like to write about this problem a bit, in the course of thinking about it.

I will suppose for initial thought that I have an infinitely long line to work with, though it occurs to me that the distinction between "infinite" and "very long" is very important to this problem, as the problem seems to involve notions of statistics, probability, and the continuum. I am also concerned about the formulation of the problem, and I wish to hold open the idea that its formulation be changed slightly, in thinking through your formulation. The problem's final language even supposes and assumes that there is a telos, or "end" to the iterations, also important. Finally, the idea of non-dense points is immediately suggestive of the naturals along the real line, but your OP picture is an excellent illustrative choice in that it supposes the possibility of irregularity.

Tacit in your problem is the idea that points are marked off one-at-a-time. This is also extremely important. the L is uniform among all such consecutively placed points, in your language - it doesn't vary, but is a constant L. At this point one ought to start thinking of open vs. closed intervals, dedekind cuts, etc. This regularity implicit in the formulation again implies a regularity of a natural number point-spacing, but let's keep thinking speculatively a bit more about your problem, before attempting anything definite.

I must confess to not knowing statistics very well. Therefore I would ask that you provide your own definition of "final average spacing", for example. It seems to me that we might have an infinity of points to work with, which might complicate matters. We might well conclude that such a metric does not exist.

Let T be the length of the line
Let x be a random real number between 0 and T

Then there is a 1-2L/T chance that the line will be split into two lines with length x-L and T-x-L

There is an L/T chance that the line will be shortened to length T-x-L

There is an L/T chance that the line will be shortened to length x-L

The answer will be the number of times this can recur divided by T

Let us attempt engagements with the thrust of OP's problem. The following thoughts are also offered without proof by way of sketch, and deserve checking.

1) Say that the space is R, the real number line (which serves for a true line, in the literal mathematical sense, extending infinitely in both directions). L meanwhile is some finite radius (open, half open, closed? I don't see how it's material at this trivial case). P1 is generated anywhere on the real line, RANDOMLY, excluding its radius. Can't do P2 anywhere inside P1's L radius. We have an iterative process. now do P2, RANDOMLY. same thing, same L.

Clearly infinitely many P's may be generated in a a straightforward iterative fashion (is the axiom of choice tacit in this?), in an irregular way. OP's question is thus still valid in this version and I don't know how to go about it, here.

2) Consider instead of the line R, the ray [0,inf) , or the appropriate caveat(?) (0,inf). We still have infinitely much space to work with, generating infinitely many P's per the above treatment. Again, I don't know how to approach this.

3) Now the interesting part. Consider some finite line-segment-tier subset of R, being either closed, open, or half-open intervals as subcases and interest require, now apply the above P-L logic. Having only finite space to work with, (with respect to some L), it seems reasonable that the space should be finitely exhausted, in a general way. Thus OP's thing of "very long line" as opposed to "infinite line" proves to be germane after some thought. Some boilerplate about intervals, points, etc needs to be pinned down I would suggest, along with the statistical stuff that I admitted to not knowing.

I invite others to develop this train of thought further.

(3) is basically what I'm considering

Although the length of the line (in my mind) is many times greater than L (say a few order is magnitude), because we run into problems when we're placing points near the end of the line (the rules aren't particularly clear about what happens near the ends).

The problem is that near the ends of the finite line, there is less likelihood for those regions to be affected by adjacent points because points can only be placed on one side.

To get around this, I've simulated this using a computer by imagining the line to be a closed loop. In this way, each location is equally likely to be affected by the placement of a random point compared to any other.

Anyway, I assume that if the length of the line is sufficiently greater than L, these end effects would be eventually diminished and you would arrive at the same average spacing eventually.

*a few orders of magnitude

There would be no average distance that would not be a random number itself.
What you are asking is equivalent to saying "pick a random distance between points and add 2L."
Depending on your definition of random, there would not be an an average distance between points that could be computed unless the line was finite and until the line was full.

>unless the line was finite and until the line was full

I agree

Forget anything about an infinitely long line. Just imagine the length of the line is much greater than L (say a few orders of magnitude greater).

This is the only case in which a finite number of points can exhaust all the available locations for new points.

But by picking the points consecutively, you are simply saying "from this first point, go out a random distance." Putting the caveat that the next point cannot be within a certain distance L of both sides of the point, you simply are adding 2L to the random distance.

Since it is random, there is no average distance until you have a finite length of the line, in which case you can compute "an" average distance, but that number also will be random.

Given an infinite amount of experiments of a finite length of line, or a random length of line makes no difference; the randomness of the points determines the average, and so it is random as well.

So the answer would be that you could not compute the average distance except that it is greater than L and less than or equal to the length of the line.
The algorithm preserves randomness.

IMHO

(cont)
A random number plus a constant is still random. The average difference between random numbers is random. The average difference between a finite set of random numbers is a number that is limited by the size of the set, but is still random.

I think that the average spacing between consecutive points would begin to converge (I suspect to (4/3)*L) for an increase in the ratio between the (finite) line length and L.

>So the answer would be that you could not compute the average distance except that it is greater than L and less than or equal to the length of the line.

How can you explain then that for my computer simulation, assuming the length of the line divided by L is greater than about 100, the average spacing always turns out to be something very close to (4/3)*L? (Usually between 1.32-1.34).

If the spacing were truly random between L and 2L, then you would expect to see all sorts of values, not just 1.33.

I should mention that the very maximum point spacing must be 2L, and this occurs if the points are positioned with the outer radii just touching.

The minimum point spacing is L, which occurs when each point is just touching the outer radius of the next point.

So the spacing must be between L and 2L.

This isn't sufficient

Assuming the radii of two points overlap, the spacing between those points could be anywhere between L < spacing < 2L. You haven't accounted for this.

If the "consecutive" in your algorithm that meant that the random number you added to the last point (plus the buffer zone of 2L) is between zero and infinity then you there is no convergence, so I assume that I didn't understand your algorithm, or that you chose your random number in a way that limited the distance between points artificially.

If it truly was random then I would suspect that most of the time it would chose a number that was greater than the length of your line, so I guess you chose the length of your line as the limit, but then you run the risk of the addition of your two random numbers being longer than the line, so that is a further limit.

This limit process that you put on your random selection might have a convergence that is related to the length of the line, and so would any attempts to have 100 or so points on any line.

Try looking at the 4th and 5th significant digit and see if that might be where the randomness is showing up, meaning that the 1.3 has something to do with your method of selecting a random number.

Let us therefore refine a version of the OP's problem along the following lines.

1) Consider the closed line segment [0,q] , with real q as strictly positive endpoint, which (for emphasis) may be any strictly positive real number.

2) The line segment [0,q] is to be iteratively and /randomly/ populated, or "laid" by points P1, P2... , each of which is equipped with a same /radius of exclusion/ L, such that it is impossible for any further points to be laid to be laid inside any one other point's radius of exclusion. Successive P's may be laid randomly anywhere not

3) Points may either be restricted by their L, near the endpoints, or not (it might be legal to lay a point at 0 or q, say, which I incline toward as an initial treatment), say.

4) certain questions may be asked of this particular model, in line with the OP's idea, which I do not hazard to do just now except in the most general way: number of points, density, and above all average distance, to the OP's central question-point.

I'll explain to you my computer simulation if that helps

Basically I have a 'line' which is a closed loop of total length T consisting of T possible locations where a point may be placed.

Initially, all the locations are available for a point to be placed.

I select a random location from those that are available, place a point there, and then exclude all points within +/- L from being available from then onward.

I repeat this, each time selecting the location of the next point randomly from all the remaining possible locations, until there are no more possible locations left.

Then the spacing of the points is simply T divided by the number of points.

Regardless of the actual length of T and L, this always turns out to be something close to 1.33L.

Okay. So you are picking a random number between 0 an T (to some precision that determines the total number of points in the finite set) that is the center of an interval 2L, then picking another random number from the set minus those values in the interval until you can't pick any more points.

That seems to violate the condition that they be consecutive; you can pick points that have values less than your first pick, unless you also exclude from your set those values that are less than your pick.

Moving on. I think you have confused, random selection with random value. Your values are determined by T and the precision of how you divide T up, but those numbers are not random. You simply are selecting them at random, then limiting the size of the selection set.

Don't be discouraged. Simulation sucks for everyone, and putting problems into practice is actually most of the battle.

Just wait till you have to try and put the ramblings of some stupid manager into SAP.

This is good practice for you.

>That seems to violate the condition that they be consecutive; you can pick points that have values less than your first pick, unless you also exclude from your set those values that are less than your pick.

I don't know what you mean by this

(cont) As for the 1.33L, try choosing more divisions for the interval 0 to T and see if the 1.33 isn't related to the divisor of T. I suspect that the number you divide T by, that also sizes L is where that average is coming from.

In your original post you said, " mark a number of points in consecutive fashion along the line,"

but in your explanation you said, "I have a 'line' which is a closed loop of total length T consisting of T possible locations where a point may be placed."

So in the first example of a line if I picked the halfway point, the next point could not be before it. Maybe I was too strict with consecutive, but after the second pick, you could only go one way down the line or up the line and preserve "consecutive."

So by rephrasing your algorithm as a set that you select from, to preserve the consecutive rule, you would have to exclude all values below your first pick.

This has been interesting, but I have to work tomorrow. Good luck! And try that thing with dividing up T into more points to see if that changes the 1.33. If the thread is up tomorrow, I'll take a look.

I have done different simulations with T/L = 10, 50, 100, 1000

And the number of divisions chosen to be between 100-10,000 (more than 10,000 and the program starts to run too slowly as I am not very good with programming).

With a smaller number of divisions (say 100), the ratio is actually closer to about 1.40.

But, with increasing number of divisions (up to 10,000), the ratio appears to be converging to 1.333. I would expect that for an extremely long line (say, billions of divisions), it would approach some constant ratio (I suspect it's 4/3 but I could be wrong).

When I said consecutive, I meant one point at a time

Not that the points must be placed left right or right to left

Sorry about the confusion there.

I'll post what I have then I too am off to bed.

We'll place points on the interval [0, T] and start with 0 (one fixed starting point won't hurt in the long run). Then we consider X a continuous random variable with values between L and 2L. To fill the line with n points means that the sum of n independent copies of X have managed to have a value between T-L and T.

This will occur with nonzero probability for n between T/2L and T/L, so for T >>> L we have large n, and thus the sum of the Xs is a normal distribution with, I think, mean 3nL/2 and standard deviation Lsqrt(n)/2sqrt(3).

To get the average number of points on the line (and hence the average distance between points when we divide T by this number) we need to add up nP[T-L < \Sigma_{i=1}^n X< T], which I'm too tired to have a go at even approximating. However, I suspect that as T gets vastly bigger than L (ie T/L -> infinity) we'll get some kind of constant ratio as you suggest between this sum and T. And since T and L have a fixed ratio this could give us the 4/3 L ratio for the average distance between points.

Peace.

Let $X$ be the distance to the next point. Then, $X$ takes on values between $L$ and $2L$ in some distribution. I'm sure the following can be generalized based on $X$ but I'll assume $X$ is uniform for what follows.

$N_{x} = min\{n \mid \Sigma_{i=1}^{n} X \leq x$. Let $f(x) = E[N_{x}]$. We ultimately want $f(T-L)$.

$f(x) = 1 + \int_{L}^{2L}f(x-u)du = 1 + \int_{0}^{x-L}f(u)du - \int_{0}^{x-2L}f(u)du$.

$f^{\prime}(x) = f(x-L) - f(x-2L)$.

I just can't solve the ODE. Any thoughts?

Since the minimum spacing is L and maximum is 2L wouldn't it make sense that a truly random simulation would have an average of 3L/2 spacing? Correct my brainletness pls.

I'm trying to explain why it turns out to be 1.33 rather than 1.5 but I'm struggling.

I've been working on my computer simulation though and have just run a large simulation overnight with the total line length being 1,000,000 and L being 1,000. After 1000 tests, the ratio falls between 1.31 and 1.37 and appears to be closely grouped around the 1.333 mark with a normal distribution.

3rd column is the calculated ratio for each simulation

Thing is still running

One thought line.
First the point A is put down anywhere. Adjacent points in right are B & C. Like -A--B--C-. Consider 2 (equipossible) cases
a) B is put down before C. Mean distance between A & B would be 3L/2, ofcourse.
b) C is put down before B. Now the distance AB is constrained, because BC must be > L and, by definition, there must be not enough room to accommodate yet another point between A & C.

But i still cant help with statistics in case b) :( Furter complications arise from the possible constraints on placement of C

OP here

For anyone's interest, it turns out that this problem is similar to one called the Parking Problem

It turns out that the ratio for that problem approaches 1.3376174... as the length of the line (or in that case a road) approaches infinity.

It is interesting that it is not 4/3 as I thought, but some very close number.

The difference between the two problems is that mine allows overlap between the blocks or cars, whereas the parking problem doesn't. But there doesn't seem to be any difference in the result.

mathworld.wolfram.com/RenyisParkingConstants.html