Playing with Vision Embeddings

(prestonbjensen.com)

40 points | by prestoj 2 days ago

4 comments

  • markusMB 7 minutes ago
    Beautiful illustrations. I find 'playing' is just the free and motivated version of 'exploration'.

    One thought on your nicely illustrated "key observation [is] that neural networks tend to place features along directions": my guess is that the neural net was TOLD to behave that way by the choice of, e.g., a cosine loss?
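
A small sketch of why a cosine-based objective would push information into directions: cosine similarity ignores vector length, so only the direction of an embedding can affect the score. The vectors and the helper name `cosine_similarity` here are illustrative assumptions, not taken from the article, and this does not show which loss the article's model was actually trained with.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (made-up values for illustration).
u = [1.0, 2.0, 3.0]
v = [2.0, 1.0, 0.5]

# Rescaling u changes its length but not its direction.
scaled_u = [10.0 * x for x in u]

sim = cosine_similarity(u, v)
sim_scaled = cosine_similarity(scaled_u, v)

# The two scores agree up to floating-point error.
print(sim, sim_scaled)
```

Because the score is unchanged under rescaling, a model trained only through such a score gets no gradient signal from embedding magnitude, which is at least consistent with the guess above.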

  • jcattle 1 hour ago
    Very nice visualizations, thanks for that!

    One thing I still struggle with in my head is how these vision embeddings can then be used to give LLMs eyes.

    Because you somehow need a giant training set which describes images in natural language, no? Is that actually how it works, or is there some smart trick so you don't need to pay labellers a bunch of money to look at pictures and describe them?

    • dilyevsky 1 hour ago
      > Because you somehow need a giant training set which describes images in natural language, no?

      That's definitely one way - they train a text encoder together with an image encoder on a labelled set of images. WL & 3b1b made a nice video on it: https://www.youtube.com/watch?v=iv-5mZ_9CPY

      • jcattle 33 minutes ago
        Thanks, I'll check out that video.