Certain stimuli are nearly universally rewarding and others are nearly universally aversive among humans. Attraction and repulsion result when the nervous system detects features of its environment through the senses (taste, smell, appearance, etc.) and transmits these signals to emotional centers in the limbic system and elsewhere.
How is it that emotions get hooked up to feature detector neurons? In some cases, the relationship is purely learned through classical conditioning. For example, after I ate some bad chicken nuggets in 1999 and threw up, I felt nauseous around the smell of chicken nuggets for several years afterward. (The conditioned effect has since been extinguished, but hey, I don't plan to eat chicken nuggets anyway.) In general, many food proclivities are learned, which we can see by the fact that some cultures love to eat cow tongues and monkey brains. There's evidence that nutrition during pregnancy affects later taste preferences.
But there are trickier cases. What about the preference for fat, sugar, and salt? Is this learned or hard-wired? My guess is that the affective valence of these inputs is learned but in the context of more internal cues which are hard-wired. For example, sugar tastes good when you're hungry but can be uncomfortable when you've eaten too much. The body has systems to control glucose concentrations, as well as other hormonal regulators of hunger and satiety, and presumably these interact with the brain to say "go" and "stop" at longer time-scales and drive the learning of the more immediate taste sensations.
While it's easy enough to imagine that many food preferences are learned from internal cues, the case of physical attractiveness seems more difficult. There appear to be dozens of features that determine beauty, and several of them seem fairly universal across cultures. For example, "The hourglass figure is truly timeless": "Written texts of all ages have the same drift when it comes to the midriff - they consistently describe women's thin waists as attractive. The conclusion comes from an analysis of British, Indian and Chinese texts dating as far back as the first century AD. According to the researchers, the finding supports the idea that we are hardwired to prefer slender waists, which are linked to good health and fertility."
Where is this preference for a 2/3 waist-to-hip ratio stored? Presumably the brain has feature detectors for curves and shapes. When the detectors for certain curves and shapes fire, their signals pass through a neural network whose weights are set to pattern-match a particular configuration. The output of this neural network determines the level of activation of reward systems. But how exactly do these weights get set? [OLD TEXT, probably wrong: This is definitely not something that can be learned, because the outcome of a better waist/hip ratio doesn't show up until years after kids have been born, and the influence of this single predictor on their survival is probably so small that you wouldn't notice it even over a lifetime of having children. Nor does it seem at all plausible that males subconsciously survey women they know and correlate waist/hip ratio against number of healthy offspring. I would conjecture that the correlation isn't even very strong in this age of low maternal/child mortality, yet the preference remains.] [NEW TEXT, Nov. 2014: Attractive features can be learned not via immediate rewards but by imprinting. Indeed, there's decent evidence for sexual imprinting, which seems more plausible to me than that DNA stores information about optimal body shapes. Still, DNA does need to tell the imprinting process which kinds of bodies to imprint on...]
This leads to the suggestion that the 2/3 ratio must be stored in some form within DNA (or potentially molecular epigenetics, etc.). This strikes me as incredible. DNA codes for proteins that mix together throughout a long human-building process to turn on and off various cellular functions. How can DNA have enough fine-grained control through this indirect protein-creating mechanism to encode the number 2/3 (poetically speaking) into the neural weights for female body-shape attractiveness? And this isn't a one-off phenomenon. There are several other cross-cultural features of attractiveness based on image feature detection. When seeing this level of detail, my instinct is to run to the explanation of "learning," but I simply can't see how learning could apply in this case.
To be sure, there are plenty of cases in which the emotional salience of visual cues can be learned. Imprinting is a classic example. Geese can't store their mother's appearance in their DNA (else it would change every generation), so they have a developmental strategy to learn that the first thing they see moving within a critical period after birth is their mother. Likewise for reverse sexual imprinting. And there are all sorts of even abstract pictures to which people learn positive or negative associations through culture, like this, or this, or this. It should also be noted that many images are almost universally regarded as beautiful even though they cannot have direct evolutionary benefit, like this or this; presumably these are spandrels of more generic visual affective systems, similar to the human preference for music.
Anyway, one area in which learning is used to perceive physical attractiveness is with averageness preference. About a gazillion and four studies have shown that "average faces" are more attractive than average [sic] and sometimes more attractive than any individual face. You can try it yourself online. The phenomenon appears to be more than mere symmetry detection, because symmetrical faces turned upside down don't have much advantage and because averages of profile views (where symmetry doesn't apply) are still found more attractive. In other words, the brain presumably contains networks that check whether a face is close to the average of faces it has seen. Each time it sees a new face, maybe the brain updates these weights with some learning rate. A theoretical test for the averageness hypothesis could be to raise children within a group of people with a particular abnormal feature (say, a really square face), and upon exposing them to a more normal face, see if they like it less (or at least prefer it less than other children do). In any event, even if averages are learned, the fundamental algorithm of "comparing to the average" still has to be hard-wired. Presumably this lies somewhere in DNA as well.
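To make the "update with some learning rate" idea concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions, not a claim about how the brain actually does it: faces are reduced to made-up feature tuples (jaw squareness, face width), the class name `AveragenessModel` and the helper `raise_observer` are hypothetical, the "average" is an exponential moving average, and attractiveness is simply the negative distance to that average.

```python
import math


class AveragenessModel:
    """Toy model (hypothetical): keeps a running-average "prototype" face."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.prototype = None  # no faces seen yet

    def observe(self, face):
        """Nudge the prototype toward a newly seen face."""
        if self.prototype is None:
            # The first face seen becomes the initial prototype.
            self.prototype = [float(x) for x in face]
            return
        # Exponential moving average: move a fraction of the way to the new face.
        self.prototype = [
            p + self.learning_rate * (x - p)
            for p, x in zip(self.prototype, face)
        ]

    def score(self, face):
        """Higher (closer to zero) means closer to the learned average."""
        if self.prototype is None:
            raise ValueError("no faces observed yet")
        return -math.dist(face, self.prototype)


def raise_observer(faces, learning_rate=0.1):
    """Expose a fresh model to a sequence of faces and return it."""
    model = AveragenessModel(learning_rate)
    for face in faces:
        model.observe(face)
    return model


# Invented feature tuples: (jaw squareness, face width), both on a 0-1 scale.
square_faces = [(0.9, 0.5), (0.8, 0.6), (1.0, 0.4)] * 20
typical_faces = [(0.3, 0.5), (0.2, 0.6), (0.4, 0.4)] * 20
test_face = (0.3, 0.5)  # a "more normal" face

square_raised = raise_observer(square_faces)
typical_raised = raise_observer(typical_faces)

print("square-raised score: ", square_raised.score(test_face))
print("typical-raised score:", typical_raised.score(test_face))
```

In this sketch the observer raised among square faces scores the normal face lower than the observer raised among typical faces does, which is the pattern the thought experiment would look for. Note that the update rule and the distance comparison are fixed in the code while only the prototype is learned, which mirrors the point that the "compare to the average" algorithm itself still has to come from somewhere.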