
Friday Fun with REST, Regular Expressions and Replacements

Author: 精品下载站 | Date: 2024-12-14 07:41:00 | Category: Playing with Computers


First, I need something to work with.

$feed = Invoke-RestMethod -Uri http://powershell.org/wp/feed/

Here's a sample of what I got back:

title       : PowerShell Tip from the Head Coach of the 2014 Winter Scripting Games: Design for Performance and Efficiency!
link        : http://powershell.org/wp/2014/01/23/powershell-tip-from-the-head-coach-of-the-2014-winter-scripting-games-design-fo
              r-performance-and-efficiency/
comments    : {http://powershell.org/wp/2014/01/23/powershell-tip-from-the-head-coach-of-the-2014-winter-scripting-games-design-f
              or-performance-and-efficiency/#comments, 0}
pubDate     : Thu, 23 Jan 2014 14:28:33 +0000
creator     : creator
category    : {category, category, category, category...}
guid        : guid
description : description
encoded     : encoded
commentRss  : http://powershell.org/wp/2014/01/23/powershell-tip-from-the-head-coach-of-the-2014-winter-scripting-games-design-fo
              r-performance-and-efficiency/feed/

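The Invoke-RestMethod cmdlet has already gone ahead and created XML elements, so I didn't have to do anything. Next, I want to reformat the results and make them look nice. For example, pubDate is readable, but it is stored as a string and so won't sort as a true [datetime]. Some properties, such as description, are buried further down.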

PS C:\> $feed[1].description

#cdata-section                                                                                                                   
--------------                                                                                                                   
There are several concepts that come to mind when discussing the topic of designing your PowerShell commands for performance a...
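The sketch below shows why the string form of pubDate matters. It sorts a few pubDate strings two ways. The three date strings are illustrative samples in the same RFC 822 format the feed uses. They are not data pulled from the feed.

```powershell
# Illustrative pubDate values in the RFC 822 format used by the feed.
$pubDates = @(
    'Thu, 23 Jan 2014 14:28:33 +0000'
    'Mon, 06 Jan 2014 15:42:52 +0000'
    'Fri, 17 Jan 2014 09:00:00 +0000'
)

# Sorting the raw strings is alphabetical, so the weekday name decides the order.
$byText = $pubDates | Sort-Object

# Converting each value with -as [datetime] first gives a chronological sort.
# -as returns $null instead of throwing if a string cannot be parsed.
$byDate = $pubDates | ForEach-Object { $_ -as [datetime] } | Sort-Object

$byText
# Show the sorted dates in UTC so the result doesn't depend on the local time zone.
$byDate | ForEach-Object { $_.ToUniversalTime().ToString('yyyy-MM-dd HH:mm') }
```

The text sort puts Friday first because "Fri" sorts before "Mon" and "Thu". The [datetime] sort follows the calendar.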

I can start with something like this:

$feed | Select Title,Link,
@{Name="Description";Expression={$_.description.InnerText}},
@{Name="Published";Expression={$_.PubDate -as [datetime]}}

I've grabbed the description text and treated pubDate as a [datetime] object.
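The live feed changes over time, so here is an offline sketch of the same Select-Object call. It runs against a small hand-written RSS item. The XML below is hypothetical. It only mimics the shape of what Invoke-RestMethod returned, including a description wrapped in a CDATA section.

```powershell
# Hypothetical, trimmed-down RSS document standing in for the live feed.
[xml]$doc = @'
<rss version="2.0">
  <channel>
    <item>
      <title>Sample post</title>
      <link>http://example.com/sample/</link>
      <pubDate>Thu, 23 Jan 2014 14:28:33 +0000</pubDate>
      <description><![CDATA[Plain text inside a CDATA section]]></description>
    </item>
  </channel>
</rss>
'@
$item = $doc.rss.channel.item

# Calculated properties: read the CDATA text via InnerText and cast pubDate to [datetime].
$row = $item | Select-Object Title, Link,
    @{Name = 'Description'; Expression = { $_.description.InnerText }},
    @{Name = 'Published';   Expression = { $_.pubDate -as [datetime] }}

$row
```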


Looking at this in more detail, one of the first things I notice is that the description is full of HTML code.

title       : Scripting Games Winter 2014 – Team Discussion Tips
link        : http://powershell.org/wp/2014/01/06/scripting-games-winter-2014-team-discussion-tips/
Description : When you&#8217;re logged into the Games, you&#8217;ll notice that clicking on your team pulls up a &#8220;team 
              discussion&#8221; box. That&#8217;s a shared discussion area for you and your team. However, if you click on one 
              of the files you&#8217;ve uploaded, you&#8217;ll see the discussion turn into a &#8220;File Discussion.&#8221; We 
              retain a separate thread for<span class="continue-reading">... <a href="http://powershell.org/wp/2014/01/06/scripting-games-winter-2014-team-discussion-tips/">Continue Reading 
              &#187;</a></span><div class="yarpp-related-rss">
              <h3>Related posts:</h3><ol>
              <li><a href="http://powershell.org/wp/2013/12/19/2014-winter-scripting-games-team-formation-tips/" rel="bookmark" >2014 Winter Scripting Games: Team Formation Tips</a></li>
              <li><a href="http://powershell.org/wp/2014/01/02/registration-and-team-formation-now-available-for-the-scripting-ga
              mes-2014-winter/" rel="bookmark" title="Registration and Team Formation NOW AVAILABLE for The Scripting Games 
              &#8211; 2014 Winter">Registration and Team Formation NOW AVAILABLE for The Scripting Games &#8211; 2014 
              Winter</a></li>
              <li><a href="http://powershell.org/wp/2013/10/02/seeking-coaches-and-judges-for-the-winter-scripting-games/" rel="bookmark" >Seeking Coaches and Judges for 
              the Winter Scripting Games</a></li>
              </ol>
              </div>
              
Published   : 1/6/2014 10:42:52 AM

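To make this easier to read, I want to strip out the HTML tags and convert entity codes such as &#8211; into something I can understand. This is where regular expressions come in.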

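With a little online research I came up with a regular expression pattern that finds HTML tags. I can use it like this:

PS C:\> [regex]$rgx = "<[^>]+>"    # assumed tag pattern: '<', then any non-'>' characters, then '>'
PS C:\> $rgx.replace($feed[1].description.innertext,"")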
There are several concepts that come to mind when discussing the topic of designing your PowerShell commands for performance and 
efficiency, but in my opinion one of the items at the top of the list is “Filtering Left” which is what I’ll be
 covering in this blog article. First, let’s start out by taking a... Continue Reading »
Related posts:
Winter Scripting Games 2014 Tip #1: Avoid the aliases
Winter Scripting Games 2014 Tip #2: Use #Requires to let PowerShell do the work for you
Winter Scripting Games 2014
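Here is the tag-stripping step on its own. The exact pattern from the original post did not survive in this copy, so the sketch assumes a common stand-in, `<[^>]+>`. It matches a `<`, then one or more characters that are not `>`, then the closing `>`. The sample HTML string is invented for illustration.

```powershell
# Hypothetical description text containing HTML markup.
$html = 'Start with <b>Filtering Left</b>.<span class="continue-reading">... <a href="http://example.com/">Continue Reading</a></span>'

# Assumed tag pattern: '<', any run of non-'>' characters, then '>'.
[regex]$rgx = '<[^>]+>'

# Replace every match with an empty string: the tags go away, the text between them stays.
$plain = $rgx.Replace($html, '')
$plain
```

The pattern uses `[^>]+` rather than `.+` so that each match stops at the first `>`. A greedy `.+` would run to the last `>` in the string and swallow the text in between.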

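The Replace() method takes every match and replaces it with an empty string (""), which effectively deletes it from the text. I have a number of items that may need replacing, so I defined a hash table.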

$decode=@{
'<[^>]+>' = ""    # assumed HTML tag pattern
'&#8217;' = "'"
'&#8220;' = '"'
'&#8221;' = '"'
'&#187;' = "..."
'&#8211;' = "--"
'&#64;' = "@"
'&nbsp;' = " "
 }

Then, in my Select-Object statement, I can reformat the description text.

$feed | Select Title,
@{Name="Description";Expression={
 $text = $_.Description.InnerText
 #strip out html codes
 foreach ($key in $decode.keys) {
  [regex]$rgx=$key
  $text = $rgx.Replace($text,$decode.Item($key)).Trim()
 }
 #use the cleaned up text
 $text
 }}

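PowerShell loops through the hash table keys and replaces any text that matches a key with that key's value. The end result is that $text is now clean. Here is my final code: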

$feed = Invoke-RestMethod -Uri http://powershell.org/wp/feed/

#hash table of HTML codes
#http://www.ascii.cl/htmlcodes.htm
$decode=@{
'<[^>]+>' = ""    # assumed HTML tag pattern
'&#8217;' = "'"
'&#8220;' = '"'
'&#8221;' = '"'
'&#187;' = "..."
'&#8211;' = "--"
'&#64;' = "@"
'&nbsp;' = " "
 }


$feed | Select @{Name="Title";Expression={$_.title}},
@{Name="Description";Expression={
 $text = $_.Description.InnerText
 #strip out html codes
 foreach ($key in $decode.keys) {
  [regex]$rgx=$key
  $text = $rgx.Replace($text,$decode.Item($key)).Trim()
 }
 #use the cleaned up text
 $text
 }},
@{Name="Published";Expression={$_.PubDate -as [datetime]}},
@{Name="Link";Expression={$_.Link}},
@{Name="Category";Expression={$_.Category.innertext -join ","}}
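A regular hash table does not guarantee any particular key order. The loop in this script works regardless, because each key targets different text. If order ever matters, an `[ordered]` dictionary (PowerShell 3.0 and later) makes the sequence explicit. One example is removing tags before decoding anything that could look like a tag. The sketch below wraps the same replacement loop in a hypothetical helper named ConvertFrom-HtmlCode. The tag pattern `<[^>]+>` is an assumed stand-in, and the sample text is invented.

```powershell
function ConvertFrom-HtmlCode {
    # Hypothetical helper wrapping the key-by-key replacement loop.
    param([string]$Text)

    # [ordered] keeps the keys in the order written, so tags are stripped first.
    $decode = [ordered]@{
        '<[^>]+>' = ''      # assumed HTML tag pattern
        '&#8217;' = "'"
        '&#8220;' = '"'
        '&#8221;' = '"'
        '&#187;'  = '...'
        '&#8211;' = '--'
        '&#64;'   = '@'
        '&nbsp;'  = ' '
    }

    foreach ($key in $decode.Keys) {
        # Each key is treated as a regular expression; every match becomes the key's value.
        $Text = ([regex]$key).Replace($Text, $decode[$key])
    }
    $Text.Trim()
}

ConvertFrom-HtmlCode -Text 'You&#8217;ll see a &#8220;team discussion&#8221; box. <a href="http://example.com/">More &#187;</a>'
```

A different technique, not the one used here, is .NET's `[System.Net.WebUtility]::HtmlDecode()`. It decodes named and numeric entities in general, so you don't have to maintain a table. It does not remove tags, so a tag-stripping regex is still needed.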

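The other addition I made is to get the text from the Category XML elements and join the resulting string array into a single comma-separated line. The end result is a clean set of objects with Title, Description, Published, Link and Category properties.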
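Here is the category step offline. The RSS item and its category names are hypothetical. Each category is wrapped in CDATA, as description is, so the text is read from InnerText before -join combines the values.

```powershell
# Hypothetical RSS item with several CDATA-wrapped category elements.
[xml]$doc = @'
<rss version="2.0">
  <channel>
    <item>
      <title>Sample post</title>
      <category><![CDATA[PowerShell]]></category>
      <category><![CDATA[Scripting Games]]></category>
      <category><![CDATA[Tips]]></category>
    </item>
  </channel>
</rss>
'@
$item = $doc.rss.channel.item

# $item.category is an array of XmlElement objects. Member enumeration (PowerShell 3.0+)
# collects each element's InnerText, and -join glues them into one comma-separated string.
$categories = $item.category.InnerText -join ','
$categories
```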


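But wait, there's more! I've taken the output of Invoke-RestMethod and written new objects to the pipeline. That means I can pipe the results to Out-GridView and use it as an object picker. These examples assume the final code above is saved as c:\scripts\get-powershellorg.ps1.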

c:\scripts\get-powershellorg.ps1 | 
out-gridview -Title "Select one or more stories" -PassThru |
foreach { start $_.link }

This sends the results to Out-GridView. From there I can select one or more items and click OK, and each selected item opens in my web browser. Or how about this:

c:\scripts\get-powershellorg.ps1 |
Select Title,Description,Published,
@{Name="Link";Expression={ "<a href=$($_.link)>$($_.link)</a>" }},
Category |
ConvertTo-HTML -Title "PowerShell.org" | Out-string | 
foreach { $_.Replace("&lt;","<").Replace("&gt;",">") } |
out-file c:\work\psorg.htm -Encoding ascii

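Here I'm doing a few replacements. After running the script, I re-select the properties and customize the Link property, turning it into an HTML anchor link. I do this so that when I convert to HTML, I get a table with clickable links. Well, almost. When ConvertTo-HTML takes the text of the Link property, it HTML-encodes the < and > characters as &lt; and &gt;. That means I need to turn them back into < and > before saving the results to a file. Notice that I'm taking advantage of the pipeline in the ForEach script block.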

foreach { $_.Replace("&lt;","<").Replace("&gt;",">") }

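The first Replace() call turns &lt; into <. It returns a string, so I can immediately call Replace() on that result to turn &gt; into >. The end result is two replacements in one command.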
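Here is the escaping problem and the chained fix, offline. The object and URL are placeholders. The href value is left unquoted in this sketch so that only < and > need restoring.

```powershell
# Placeholder object whose Link property already holds an HTML anchor.
$obj = [pscustomobject]@{
    Title = 'Example'
    Link  = '<a href=http://example.com/>http://example.com/</a>'
}

# ConvertTo-Html HTML-encodes property values, so the anchor comes out escaped.
$raw = $obj | ConvertTo-Html -Fragment | Out-String

# Replace() returns a new string, so a second Replace() can be chained onto the first.
$fixed = $raw.Replace('&lt;', '<').Replace('&gt;', '>')
$fixed
```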

I hope you have some fun with this, and maybe pick up a trick or two.
