gcanyon a day ago

One that isn't listed here, and which is critical to machine learning, is the idea of near-orthogonality. When you think of 2D or 3D space, you can only have 2 or 3 orthogonal directions, and allowing for near-orthogonality doesn't really gain you anything. But in higher dimensions, you can reasonably work with directions that are only somewhat orthogonal, and "somewhat" gets pretty silly large once you get to thousands of dimensions -- like 75 degrees is fine (I'm writing this from memory, don't quote me). And the number of orthogonal-enough dimensions you can have scales as maybe as much as 10^sqrt(dimension_count), meaning that yes, if your embeddings have 10,000 dimensions, you might be able to have literally 10^100 different orthogonal-enough dimensions. This is critical for turning embeddings + machine learning into LLMs.

  • phreeza 20 hours ago

    By orthogonal-enough dimensions, do you mean vectors whose dot product is close to zero?

  • sigmoid10 17 hours ago

    This is actually just another way to see the third example (concentration of measure). As you increase the number of dimensions, the contribution of each base vector component in the calculation of, say, the cosine angle (i.e. via the scalar product) becomes less important. So in three dimensions you'll have a pretty high angle if one vector component points along a different base vector. But in 10,000 dimensions, the angle will be tiny.

  • westurner a day ago

    Does distance in feature space require orthogonality?

    With real space (x,y,z) we omit the redundant units from each feature when describing the distance in feature space.

    But distance is just a metric, and often the space or paths through it are curvilinear.

    By Taxicab distance, it's 3 cats, 4 dogs, and 5 glasses of water away.

    Python now has math.dist() for Euclidean distance, for example.

    • epistasis 20 hours ago

      Near-orthogonality allows fitting in more directions for distinct concepts than the dimension of the space. So even though the dimension of an LLM might be <2000, far far more than 2000 distinct directions can fit into that space.

      The term most often used is "superposition." Here's some material on it that I'm working through right now:

      https://arena3-chapter1-transformer-interp.streamlit.app/%5B...

      • gcanyon 13 hours ago

        Nice, thanks!

crazygringo 11 hours ago

> The volume of the unit d-sphere goes to 0 as d grows! A high dimensional unit sphere encloses almost no volume!

This feels misleading to me.

Directly comparing volumes in different dimensions doesn't make any sense because the units are different. It doesn't make sense to say that a quantity in m^3 is larger or smaller than a quantity in m^4. Because it doesn't make any sense to compare the area of a circle with the volume of a sphere.

> More accurate pictorial representations of high dimensional cubes (left) and spheres (right).

The cube one is arguably accurate -- e.g. in 100 dimensions, if the distance from the center of a cube to the center of a face is 1, then the distance from the center of the cube to a corner is 10.

But the sphere one, I don't know. Every point on a 100-dimensional sphere is still the same distance away from its center. The sphere is staying spherical in an intuitive way, it's just that the corners of the enclosing cube have gotten so much further away.

So what is accurate to say is that the proportion of volume of a sphere relative to that of its bounding cube keeps decreasing. Which, rather than being supposedly "counterintuitive", makes perfect intuitive sense -- because every time you add a dimension, you can think of it as "extruding" the previous sphere into the new dimension and then shaving it round, the way a 2D circle can be extruded into a cylinder in 3D and then shaved down to make it into a sphere. Every time you add a dimension, you shave off more.

The article suggests that a 3D sphere has greater volume than a 2D circle -- with a unit radius, the sphere is 4/3π while the circle is just π. But again, they're in different units, so it's a meaningless statement. It makes much more sense to say that a 2D circle takes up (1/4)π≈0.79 of its bounding square, a 3D sphere takes of (1/6)π≈0.52 of its bounding cube, a 4D sphere takes up (π/32)π≈=0.31, and so forth. So no, the volume doesn't go up and then down -- it just goes down every time when taken as a unitless proportion (and proportions are comparable).

  • jvanderbot 9 hours ago

    Your image of extruding a cylinder in higher dimensions inside its bounding box then rounding it off was really insightful as a teaching tool. I've always struggled to visualize these "counterintuitive" results, which are only counterintuitive because they are harder to visualize or seem to "change" after D=3. But now they don't. Thanks!

  • jsenn 6 hours ago

    I don't think this is right. If you're worried about units you can calculate the (generalized) surface area to volume ratio, which turns out to be exactly D/r. In other words, as D increases, the ratio goes to infinity.

    I think this fact can fairly be interpreted to mean that a high-dimensional unit sphere encloses almost no volume. The 2D cartoon drawing of a hypersphere also helps capture this: you can imagine the "spikes" stretching out and squeezing the interior portion, until it's all outside and no inside.

    EDIT: another argument I've seen involves calculating the ratio of the volume of a thin shell surrounding the n-sphere's surface to its total volume. You can prove that the limit of the ratio as the dimension goes to infinity is 1. In other words, in high dimensions almost all of the volume of the sphere is concentrated near its surface.

    • aatd86 5 hours ago

      Another simplistic way to see it is that it is a ratio of contained information. In higher dimensional spaces, the space is so big that below the unit spheres contain exponentially less information.

      It's just something between 0 and 1 exponentized to d where d is the dimension after all (i.e. the number of eigenvectors).

      d is an exponential scale factor in a sense.

  • yatopifo 8 hours ago

    I think it all depends on how you define hypervolume. If you say it’s a positive real number constructed by means of integration, then you can certainly compare them across objects of various dimensions. When you say “units” I immediately think of stuff like bivectors and trivectors where you can’t reduce one to another without losing important geometric properties. But here we are talking about just the scalar part which is as “unitless” as can be.

rectang a day ago

Time to share my favorite quote from Symbols, Signals and Noise by John R. Pierce, where he discusses how Shannon achieved a breakthrough in Information Theory:

> This chapter has had another aspect. In it we have illustrated the use of a novel viewpoint and the application of a powerful field of mathematics in attacking a problem of communication theory. Equation 9.3 was arrived at by the by-no-means-obvious expedient of representing long electrical signals and the noises added to them by points in a multidimensional space. The square of the distance of a point from the origin was interpreted as the energy of the signal represented by a point.

> Thus a problem in communication theory was made to correspond to a problem in geometry, and the desired result was arrived at by geometrical arguments.

  • CoastalCoder 11 hours ago

    Anyone know if Pierce's book (dated 1961) is still a good intro to the topic?

    My background is in CS, and this would just be evening reading out of general interest.

    • hotspot_one 8 hours ago

      I would be willing to read a few chapters just on spec. There is real value in understanding how people used to think about a problem, and where the source ideas came from.

remcob 21 hours ago

The distance between two uniform random points on an n-sphere clusters around the equator. The article shows a histogram of the distribution in fig. 11. While it looks Gaussian, it is more closely related to the Beta distribution. I derived it in my notes, as (surprisingly) I could not find it easily in literature:

https://xn--2-umb.com/21/n-sphere

  • zombot 19 hours ago

    > The distance between two uniform random points on an n-sphere clusters around the equator.

    This sentence makes no sense to me.

    • p1esk 19 hours ago

      He means it clusters around the distance from a pole to the equator.

      • remcob 19 hours ago

        Correct. I was too short in my comment. It's explained in the article: without loss of generality you can call one of the two points the 'north pole' and then the other one will be distributed close to the equator.

    • isoprophlex 19 hours ago

      Pick an equator on an n-sphere. It is a hyperplane of dimensions (n-1) through the center, composed of all but one dimensions of your sphere. The xy plane for a unit sphere in xyz, for example.

      Uniformly distribute points on the sphere. For high n, all points will be very near the equator you chose.

      Obviously, in ofder for a point to be not close to this chosen equator, it projects close to 0 on all dimensions spanning the equatorial hyperplane, and not close to 0 on the dimension making up the pole-to-pole axis.

      • oersted 18 hours ago

        My first thought is that it's rather obvious, but I'm probably wrong, can you help me understand?

        The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen dice being high value and the rest being low value is obviously rather small.

        I guess that the consequence is still interesting, that most random points in a high-dimensional n-sphere will be close to the equator. But they will be close to all arbitrary chosen equators, so it's not that meaningful.

        If the equator is defined as containing n-1 dimensions, then as n goes higher you'd expect it to "take up" more of the space of the sphere, hence most random points will be close to it. It is a surprising property of high-dimensional space, but I think it's mainly because we don't usually think about the general definition of an equator and how it scales to higher dimensions, once you understand that it's not very surprising.

        • isoprophlex 18 hours ago

          > The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen dice being high value and the rest being low value is obviously rather small.

          You're exactly right, this whole thing is indeed a bit of an obvious nothingburger.

    • akdor1154 18 hours ago

      "clusters" is acting as a verb here, not a noun.

  • 7fYZ7mJh3RNKNaG 19 hours ago

    beautiful visualizations, how did you make them?

    • remcob 18 hours ago

      The first one IIRC with Geogebra, all the rest with Matplotlib. The design goal was to maximize on 'data-ink ratio'.

FabHK a day ago

For high-dimensional spheres, most of the volume is in the "shell", ie near the boundary [0]. This sort of makes sense to me, but I don't know how to square that with the observation in the article that most of the surface area is near the equator. (In particular, by symmetry, it's near any equator; so, one would think, in their intersection. That is near the centre, though, not the shell.)

Anyway. Never buy a high-dimensional orange, it's mostly rind.

[0] https://www.math.wustl.edu/~feres/highdim

  • hansvm a day ago

    It's basically the same idea in both cases. Power laws warp anything "slightly bigger" into dominating everything else when the power is big enough. There's a bit more stuff near the outside than the inside, so with a high enough dimension the volume is in the rind. Similarly, the equator is a bit bigger than the other slices, so with enough dimensions its surface area dominates.

    • WiSaGaN 20 hours ago

      Yes, this seems to be the result of the standard Euclidean metric rather than the high dimension itself. I guess most people assuming the metric to be Euclidean, so it's ok.

  • youoy 18 hours ago

    If you like ML, this is also related with the results of this paper [0], where they show that learning in high dimensions amounts to extrapolation, as opposed to interpolation. Intuitively I think of this as the fact that points in the sphere are convexly independent, and most of the volume of the ball is near the boundary.

    [0] https://arxiv.org/abs/2110.09485

mattxxx a day ago

Yea - high dimensional spaces are weird and hard to reason about... and we're working very frequently in them, especially when dealing with ML.

  • l33t7332273 a day ago

    Luckily if you do enough math it becomes much easier to reason about such spaces

    • JBiserkov a day ago

      - How do you even visualize an 11-dimensional space?

      - oh that's easy - you just visualize an N-dimensional space and then set N equal to 11.

      • rectang a day ago

        I think of high-dimensional spaces in terms of projection. Projecting a 3-dimensional space onto a 2-dimensional space loses information and the results depend on perspective. Same with an 11-dimensional space being projected onto a 10-dimensional space.

        I find that this metaphor works pretty well for visualizing how a vector-space search engine represents how two documents can be "similar" in N-dimensional term-space: look at them from the right angle and they appear close together.

      • marcosdumay a day ago

        Yeah, stopping that need to visualize everything is one of the mechanisms usually adopted for working in high-dimensional space.

ljouhet 9 hours ago

Thank you! I love this article, but I couldn't find it.

I always search "Curse of dimensionality" instead of "Counterintuitive properties..."

brazzy 13 hours ago

> The volume of the unit -sphere goes to 0 as grows! A high dimensional unit sphere encloses almost no volume! The volume increases from dimensions one to five, but begins decreasing rapidly toward 0 after dimension six.

What the absolute fuck?

That one caught me truly off guard. I don't think "counterintuitive" is a strong enough word.

derbOac a day ago

I love this stuff because it's so counterintuitive until you've worked through some of it. There was an article linked to on HN a while back about high-dimensional Gaussian distributions that was similar in message, and probably mathematically related at some level. It has so many implications for much of the work in deep learning and large data, among other things.

  • vqv 12 hours ago

    Statistician here.

    I agree that some of this stuff seems counterintuitive on the surface. Once you make the connection with high-dimensional Gaussians, it can become more "obvious": if Z is standard n-dimensional Gaussian random vector, i.e. one with iid N(0,1) coordinates, then normalizing Z by its norm, say W, gives a random vector U that is uniformly distributed on an n-Sphere. Moreover, U is independent of W --- this is related to the fact that the sample mean and variance are independent for a random sample from a Normal population --- and W^2 has Chi-squared distribution on n degrees of freedom. So for example a statement about concentration of volume of the n-Sphere about an equatorial slice is equivalent to a statement about the probability that the dot product between U and a fixed unit norm vector is close to 0, and that probability is easy to approximate using undergraduate-level probability theory.

    Circling back to data: it is very easy to be mislead when working with high-dimensional data, i.e. data with many, many features.

bmitc a day ago

Actually, the most counterintuitive is 4-dimensional space. It is rather mathematically unique, often exhibiting properties no other dimension does.

  • dullcrisp a day ago

    Well I’m sure 2- and 3- dimensional space are also mathematically unique and interesting by the same token, but they’re nearer to our experience and intuition.

    • ngruhn 17 hours ago

      I‘ve heard that knots only exist in 3 dimensions. In 2D you can’t entangle anything and in 4D+ you can always untangle everything.

      • madcaptenor 11 hours ago

        It's been a while since I studied any topology, but if I'm remembering correctly you can knot an (n-2)-dimensional surface in n-dimensions.

  • NL807 a day ago

    >often exhibiting properties no other dimension does.

    Isn't that true for some other dimensions as well? There is a whole much of mathematical concepts that is constrained for a specific dimension. For example the cross product only makes sense in 3D. The perpendicular dot product (a special case of the determinant) only makes sense in 2D.

    • justsid 6 hours ago

      The cross product is a generalization of the wedge product which does exist in higher dimensions.

    • immibis 18 hours ago

      Apparently there's also a 7D cross product - and no others!

  • elcritch a day ago

    How so?

    • hansvm a day ago

      The intuitive way to think about it is that with very few dimensions you have very few degrees of freedom, so it's easy to prove things possible or impossible. With lots of dimensions, you have enough wiggle room to prove most things possible. Somewhere in between, you have enough complexity to not trivialize the problems but not enough wiggle room to be able to easily circumvent the issue.

      Often in practice, that boundary is around 3-4 dimensions. See the poincaré conjecture, various sphere packing shenanigans, graph embeddings, ....

    • bmitc a day ago

      There's a section here about phenomena in 4 dimensions: https://en.wikipedia.org/wiki/4-manifold

      One of the most surprising is that all smooth manifolds of dimension not equal to four only have a finite number of unique smooth structures. For dimension four, there are countably infinite number of unique smooth structures. It's the only dimension with that property.

      • elcritch a day ago

        Fascinating that higher dimension manifolds are more restrictive!

        Though in a _very_ handwavy way it seems intuitive given properties like that in TFA where 4-d is the only dimension where the edges of the bounding cube and inner spheres match. Especially given that that property seems related to the possible neighborhoods of points in d-4 manifolds. Though I quickly get lost in the specifics of the maths on manifolds. :)

        > However in four dimensions something very interesting happens. The radius of the inner sphere is exactly 1/2, which is just large enough for the inner sphere to touch the sides of the cube!

      • ashishb a day ago

        > One of the most surprising is that all smooth manifolds of dimension not equal to four only have a finite number of unique smooth structures. For dimension four, there are countably infinite number of unique smooth structures. It's the only dimension with that property.

        Can you give some intuition on smooth structure and manifold? I read Wikipedia articles a few times but still can't grasp them.

        • aithrowawaycomm 20 hours ago

          I am not sure the other comment was especially intuitive. Here is my understanding:

          Euclidean space is a vector space and therefore pretty easy to work with in computations (especially calculus) compared to something like the surface of a sphere, but the sphere doesn't simply abandon Euclidean vector structure. We can take halves of the sphere and "flatten them out," so instead of working with the sphere we can work with two planes, keeping in mind that the flattening functions define the boundary of those planes we're allowed to work within. Then we can do computations on the plane and "unflatten" them to get the result of those computations on the sphere.

          Manifolds are a generalization of this idea: you have a complicated topological structure S, but also some open subsets of S, S_i, which partition S, and smooth, invertible functions f_i: S_i -> R^n that tell you how to treat elements of S locally as if they were vectors in Euclidean space (and since the functions are invertible, it tells you how to map the vectors back to S, which is what you want).

          The manifold is a pair, the space S and the smooth functions f_i. The smoothness is important because ultimately we are interested in doing calculus on S, so if the mapping functions have "sharp edges" then we're introducing sharp edges into S that are entirely a result of the mapping and not S's own geometry.

        • bmitc a day ago

          Applying a smooth structure to a manifold to make it a smooth manifold is like a patching process that makes it look like a Eucliden space.

          Most of calculus and undergraduate math, engineering, and physics takes place in Euclidean space R^n. So all the curves and surfaces directly embed into R^n, usually where n = 2 or n = 3. However, there are more abstract spaces that one would like to study and those are manifolds. To do calculus on them, they need to be smooth manifolds. A smooth structure is a collection of "patches" (normally called charts) such that each patch (chart) is homeomorphic (topologically equivalent) to an open set in R^n. Such a manifold is called an n-dimensional manifold. The smoothness criterion is a technicality such that the coordinates and transformation coordinates are smooth, i.e., infinitely differentiable. Smooth manifolds is basically the extension of calculus to more general and abstract dimensions.

          For example, a circle is a 1-dimensional manifold since it locally looks like a line segment. A sphere (the shell of the sphere) is a 2-dimensional manifold because it locally looks like an open subset of R^2, i.e., it locally looks like a two dimensional plane. Take Earth for example. Locally, a Euclidean x-y coordinate system works well.

nyc111 11 hours ago

I don't understand what is meant here by "dimension." Is the definition of "dimension" consistent for 3-dimensional figures and n-dimensional figures? In other words, what is the importance of the orthogonality of the axes? The axes of dimensions beyond 3-d are not orthogonal, does this change the definition of "dimension".

I cannot conceive a geometrical image of higher dimensions. Algebraically, yes, but not geometrically.

  • travisjungroth 7 hours ago

    > The axes of dimensions beyond 3-d are not orthogonal, does this change the definition of "dimension".

    They are orthogonal.

    > I cannot conceive a geometrical image of higher dimensions.

    This is normal, and essential to the point of the article. If you could visualize 10-dimensional space, it wouldn’t be so counterintuitive.

    Try looking up images and videos of 4D objects projected into 3D and 2D. That might help. Hypercubes are maybe the easiest.

    • nyc111 3 hours ago

      From Wikipedia: https://en.m.wikipedia.org/wiki/Hypercube "4 – If one moves the cube one unit length into the fourth dimension, it generates a 4-dimensional unit hypercube (a unit tesseract)."

      How do we draw an orthogonal line to the three orthogonal linas that we have?

khana 19 hours ago

[dead]