Pls explain
From a pixel-generation perspective, what is most likely to be next to a finger? Another finger! So… mississississississississississippi in a mathematical model.
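To make the "another finger" point concrete, here is a toy sketch of a generator that only knows what usually comes next and never counts. The parts and transition probabilities are made up for illustration; real image models do not work part by part like this.

```python
import random

# Hypothetical transition probabilities, invented for illustration only.
# Each entry maps the current part to the possible next parts and their weights.
TRANSITIONS = {
    "palm": [("finger", 1.0)],
    "finger": [("finger", 0.8), ("background", 0.2)],
}

def draw_hand(rng):
    """Walk the chain from the palm until background is reached; return the finger count."""
    part, fingers = "palm", 0
    while part != "background":
        options, weights = zip(*TRANSITIONS[part])
        part = rng.choices(options, weights=weights)[0]
        if part == "finger":
            fingers += 1
    return fingers

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
counts = [draw_hand(rng) for _ in range(10_000)]
average = sum(counts) / len(counts)
share_five = counts.count(5) / len(counts)

print(f"average fingers per hand: {average:.2f}")
print(f"hands with exactly five fingers: {share_five:.1%}")
```

With these made-up weights the average comes out close to five fingers, yet only a small share of the hands have exactly five, because nothing in the chain keeps track of how many fingers are already there.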
Hands are really complicated, even to draw. Everything else is relatively easy for an AI to guess: faces are usually looking at the camera or looking sideways, but hands have something like a thousand different positions and poses. It’s hard for the AI to guess what the hands should look like and where the fingers should be. It doesn’t help that people are historically bad at drawing hands, so there’s a lot of garbage in the data.
That’s true, but I would have thought that the models would be able to “understand” hands, because I’m assuming they have seen millions of photographs with hands in them by now.
Why are humans so bad with drawing hands?
They are tough. AI isn’t building a logical model of a human when drawing one; it’s more like taking a best guess at where the pixels should go. So it’s not “thinking”: alright, I’m drawing a human, a human has two hands, each hand has five fingers, the fingers are posed like this, …
It’s drawing a human, so it roughly throws a human shape on there. A human shape roughly has a head; when there is a torso, two arms should (roughly) come out of it, and at the end of those two arms there is something too, but what that something is is complicated and always looks different. It’s all approximation, extremely well done, but in the end the AI is just guessing where to put things.
If you trained a model on just a single type of hand in a single finger position, it would replicate it perfectly. But every hand is different, and each hand (and each finger) can be in a nearly unlimited number of positions. So it’s usually a mess.
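A rough sketch of that idea, assuming a deliberately dumb "model" that just averages its training examples (a big simplification, real image models are far more sophisticated). The poses and bend angles are hypothetical.

```python
def fit_average(poses):
    """'Train' by averaging each finger's bend angle across all example poses."""
    n = len(poses)
    return [sum(p[i] for p in poses) / n for i in range(len(poses[0]))]

# Hypothetical bend angle per finger, in degrees (0 = straight, 90 = fully curled).
OPEN_HAND = [0.0, 0.0, 0.0, 0.0, 0.0]
FIST = [90.0, 90.0, 90.0, 90.0, 90.0]
PEACE_SIGN = [90.0, 0.0, 0.0, 90.0, 90.0]

# Trained on one pose only: the average is that pose, so it is replicated exactly.
one_pose_model = fit_average([PEACE_SIGN] * 100)

# Trained on a mix of poses: the average is a half-curled hand that matches none of them.
mixed_model = fit_average([OPEN_HAND, FIST, PEACE_SIGN])

print("one pose:", one_pose_model)
print("mixed:   ", mixed_model)
```

The single-pose result is a clean peace sign, while the mixed result has every finger stuck somewhere in between, which is the "mess" in miniature.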
I saw one way to get better results, but it pretty much amounts to giving the AI a pose beforehand (like a stick figure) so it already knows where things should go. If you just freely generate “Human male, holding hands up”, you will probably get a mess with six fingers out and maybe a third arm going off to nowhere in the back.
Why are humans so bad with drawing hands?
The rest of your answer makes sense but this rhetorical question is not helpful IMO. There are lots of things that humans are not good at but at which computers excel.