The experience of sight begins when photons from the world pass through the lens of the eye and are focused onto a layer of photoreceptive cells on a part of the eye called the retina. These cells come in two types - rods and cones. Cones detect color and function well in bright light, whereas rods are more sensitive but colorblind. Humans have about 125 million rod cells and 6 million cone cells. Some species have many more rods, especially those adapted to living at night. Some owls have night vision 100 times more acute than the sight we are accustomed to.
Rods and cones perform a function called phototransduction, which simply means converting incoming light into electrical signals to be sent to the brain, making sight possible. All these cells contain photoreceptive proteins bound to pigment molecules. In rods the pigment is called rhodopsin. In cones, several different pigments can be found, allowing the eye to distinguish between different colors. When light of a wavelength the pigment responds to strikes a photoreceptor cell, the cell sends a signal down the optic nerve; otherwise, it doesn't. Photoreceptor cells and the ability of sight are extremely old evolutionary innovations, dating back to the Cambrian period, roughly 540 million years ago.
There are two notable structural characteristics of the human retina. The first is the fovea, a densely packed area of photoreceptor cells located in the center of the retina. The cell density here is several times greater than on the periphery, which explains why something we look at directly appears much clearer than something seen out of the corner of our eye.
The fovea is also responsible for the behavioral adaptations that provoke us to rapidly turn our heads and stare at something if it startles us. If the fovea didn't exist and photoreceptor density were uniform across the surface of the retina, we wouldn't need to do this - we'd only need to turn our head slightly so that the event at least fell within our field of vision. The foveal area is a relatively small portion of the visual field, about 10 degrees wide.
The second notable structural characteristic of the retina is our blind spot. This is where the optic nerve connects to the back of the retina to collect visual information, precluding the existence of photoreceptors in a small spot. Our brains automatically fill in the blind spot for us, but various visual exercises can demonstrate that it's there.
Once light is converted into electrical impulses and sent down the optic nerve, the impulses travel all the way to the back of the brain (after making a few stopovers), where the visual cortex is located. In the visual cortex, a hierarchy of detector cells isolates useful regularities in the visual data, discarding superfluous information. One layer of cells detects things like lines and curves.
A higher layer detects regularities like motion and 3D shapes. The highest layer is where gestalts - overall symbols - appear, responsible for the conscious experience of sight under normal circumstances. The visual cortex is among the best understood of all brain areas, with a voluminous neuroscience literature.
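The lowest stage of this hierarchy, cells that respond to lines, can be loosely illustrated in code. The sketch below is only a toy analogy, not a model of real neurons: it assumes a tiny grid of brightness values standing in for the retinal image, and a hypothetical 3x3 weighting pattern (VERTICAL_EDGE_KERNEL) standing in for a detector cell's receptive field. The function name detector_responses and all the numbers are illustrative assumptions.

```python
# Toy analogy for a line-detecting cell. The kernel weights are an
# illustrative assumption, not measured neural data.
VERTICAL_EDGE_KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]


def detector_responses(image, kernel):
    """Slide the kernel over the image and return the weighted sum at each position."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    responses = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Weighted sum of the brightness values under the kernel.
            total = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        responses.append(row)
    return responses


# A 5x6 "retinal image": dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

responses = detector_responses(image, VERTICAL_EDGE_KERNEL)
for row in responses:
    print(row)
```

The detector stays silent (response 0) over the uniform regions and responds only where its window straddles the dark-to-bright boundary. That is the sense in which such a layer keeps a useful regularity, the edge, and discards the rest of the raw brightness data.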