PHP in Action

Refactoring is design

dagfinn | 11 October, 2008 14:41

Refactoring is by definition a design activity, since the definition of refactoring is "improving the design of existing code". But is this generally and fully recognized? After attending my friendly local agile conference (Smidig2008; sorry, it's in Norwegian), I'm getting more of a feel for how different people think about it. And I'm wondering whether the use of metaphors such as "cleaning" makes refactoring seem too much like unskilled labor. After all, physical cleaning jobs are seen that way.

The analogy between cleaning and refactoring is useful for making non-developers understand that refactoring is absolutely necessary. But beyond this pragmatic similarity, are the two really similar in deep and meaningful ways? I don't think so. Refactoring is not unskilled labor. It's a task that both requires and builds design skill and experience. While anyone can see that a floor is dirty, identifying code smells is non-obvious, tricky and demanding. This is true even of the simplest code smell, duplicated code. Although spotting code duplication is sometimes easy, at other times the duplication is too subtle to be easily identifiable. When you clean a floor, the goal is well-defined and easy to visualize. When refactoring, you may know what you're aiming for at each small step, but just a few moves further ahead you may end up with a structure you hadn't imagined.


Get links with XPath

dagfinn | 06 October, 2008 12:53

There's a tutorial that appeared recently called Get Links With DOM. Planet PHP lists the author as Kevin Waterson, although his name is not mentioned on the page itself. Anyway, he claims:

Perhaps the biggest mistake people make when trying to get URLs or link text from a web page is trying to do it using regular expressions. The job can be done with regular expressions, however, there is a high overhead in having preg loop over the entire document many times. The correct way, and the faster, and infinitely cooler ways is to use DOM.

Yes, of course it's cooler. But I'm a little surprised at the claim that it's the "correct" (only) way, since there's at least one more that I find even cooler: XPath. Admittedly, it's slower, but it's a more powerful language.

In his example, we just need to add a line to create an XPath object after we've created the DOM object:

$xpath = new DOMXPath($dom);
 

Then, instead of the DOM call:

/*** get the links from the HTML ***/
$links = $dom->getElementsByTagName('a');
 

we can use an XPath query:

/*** get the links from the HTML ***/
$links = $xpath->query('//a');
 

That's all. So why is that cooler? Because you can do more powerful searches easily. The DOM just happens to have a simple call to find all elements with a certain tag name, so there's not much difference in this case. More complex queries are another matter. For instance, we can get just the URLs with a single expression:

$links = $xpath->query('//a/@href');
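
To make that concrete, here is a minimal, self-contained sketch of how the result might be consumed. The HTML snippet and the $urls variable are made up for illustration and are not from the tutorial. Each item the query returns is an attribute node, so the URL itself is read from its value:

```php
<?php
// Hypothetical sample markup, not taken from the tutorial.
$html = '<html><body>'
      . '<a href="http://example.com/one">One</a>'
      . '<p>No link here</p>'
      . '<a href="http://example.com/two" class="bookmark">Two</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$urls = array();
foreach ($xpath->query('//a/@href') as $attr) {
    // Each result is a DOMAttr node; its value is the URL string.
    $urls[] = $attr->value;
}

print_r($urls);
```

With the plain DOM call you would loop over the elements and call getAttribute('href') on each one; here the expression has already done the selecting.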
 

Or we can get the URLs of only those links whose CSS class is "bookmark":

$links = $xpath->query("//a[@class='bookmark']/@href");
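
One caveat, sketched below under some assumptions: @class='bookmark' compares the whole attribute value, so a link with class="bookmark external" would not match. A common XPath 1.0 idiom for matching a single class name among several is the contains(concat(...)) form shown here. The sample markup and the hrefs() helper are hypothetical, introduced only for this example:

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<html><body>'
      . '<a href="/a" class="bookmark">A</a>'
      . '<a href="/b" class="bookmark external">B</a>'
      . '<a href="/c">C</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: run an expression that selects attribute
// nodes and return their values as plain strings.
function hrefs(DOMXPath $xpath, $expression)
{
    $values = array();
    foreach ($xpath->query($expression) as $attr) {
        $values[] = $attr->value;
    }
    return $values;
}

// Exact match on the whole class attribute: finds only /a.
$exact = hrefs($xpath, "//a[@class='bookmark']/@href");

// Match "bookmark" as one of several space-separated class names:
// finds /a and /b.
$token = hrefs(
    $xpath,
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' bookmark ')]/@href"
);

print_r($exact);
print_r($token);
```

The second expression is more verbose, which is presumably part of the appeal of CSS selectors for this kind of test.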
 

I've been using this for ages when testing web pages. Then there's the not quite official SimpleTest DOM tester, which uses CSS selectors to specify paths. But I won't go into that right now.
