Getting the contents of XML tags:


# cat sample.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
</catalog>

You want to extract the text between the "<description>" and "</description>" tags.

The first occurrence is on a single line. The rest span multiple lines, and you want the newlines to be preserved. I shall assume that you want the leading whitespace on the continuation lines to be preserved as well.

Here's the script -

$
$ perl -lne 'BEGIN{undef $/} while (/<description>(.*?)<\/description>/sg){print $1}' sample.xml
An in-depth look at creating applications with XML.
A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.
After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.
In post-apocalypse England, the mysterious
      agent known only as Oberon helps to create a new life
      for the inhabitants of London. Sequel to Maeve
      Ascendant.
The two daughters of Maeve, half-sisters,
      battle one another for control of England. Sequel to
      Oberon's Legacy.
$
$


If you want the newlines preserved but the leading whitespace on each continuation line removed, then -

$
$ perl -lne 'BEGIN{undef $/} while (/<description>(.*?)<\/description>/sg){($x = $1) =~ s/\n\s*/\n/g; print $x}' sample.xml
An in-depth look at creating applications with XML.
A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.
After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.
In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.
The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.
$
$


And if you want neither the newlines nor the indentation, i.e. each chunk between the "<description>" tags on a single line, then replace every line break, together with the whitespace around it, by a single space -

$
$ perl -lne 'BEGIN{undef $/} while (/<description>(.*?)<\/description>/sg){($x = $1) =~ s/\s*\n\s*/ /g; print $x}' sample.xml
An in-depth look at creating applications with XML.
A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.
After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.
In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.
The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.
$
$


Last edited by flyinweb on 2010-07-29 10:36:21.

Posted by flyinweb on 2010-07-29 10:29:34.
