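Applause-Worthy C# Regular Expressions

smile · Posted on 2010-11-24 13:07:19

This post was last edited by smile on 2010-11-24 13:18

To use the Regex class in C#, you first have to import its namespace with using System.Text.RegularExpressions;. If you only want to check whether the pattern you wrote occurs anywhere in the text being processed, the IsMatch() method is enough. A few code snippets follow, posted here for later review.

1: A regex test program I wrote in an idle moment. It pinpoints exactly the parts you are after.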
using System;
using System.Text;
using System.Text.RegularExpressions;
namespace zuo_Company.TestRegularExpressionProject{
public class TestMain{
  public static void Main(string [] args){
    // Digits, a non-capturing dot, then a three-character extension.
    string parnt = @"(\d+)(?:\.)(\w{3})";
    string txt = "sfwoewrewou3698755412365.gifsfwewrwsfw558791564981654.jpgslfoweuwrso";
    Regex r = new Regex(parnt);
    // Match() returns only the first match.
    Match m = r.Match(txt);
    Console.WriteLine("Number of groups found = " + m.Groups.Count);
    for(int i=0;i<m.Groups.Count;i++){
      Console.WriteLine(i+" "+m.Groups[i].Index+" "+m.Groups[i].Value);
    }
    Console.WriteLine();
    Console.WriteLine();
    // Matches() returns every match in the text.
    MatchCollection mcs = r.Matches(txt);
    Console.WriteLine("Matches found: "+mcs.Count);
    for(int i=0;i<mcs.Count;i++){
      Console.WriteLine(i+" "+mcs[i].Index+" "+mcs[i].Value);
      GroupCollection GC = mcs[i].Groups;
      Console.WriteLine("Group count: "+GC.Count);
      for(int j=0;j<GC.Count;j++){
        Console.WriteLine(j+" "+GC[j].Index+" "+GC[j].Value);
      }
    }
  }
}
}
-------------------------------------------------------------------------------
The output is as follows:
D:\Csharp>TestRegularExpression
Number of groups found = 3
0 11 3698755412365.gif
1 11 3698755412365
2 25 gif


Matches found: 2
0 11 3698755412365.gif
Group count: 3
0 11 3698755412365.gif
1 11 3698755412365
2 25 gif
1 38 558791564981654.jpg
Group count: 3
0 38 558791564981654.jpg
1 38 558791564981654
2 54 jpg
-------------------------------------------------------------------------------
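Although this is only a "sample program", it does the "image grabbing" job that comes up all the time in everyday code: out of a sea of HTML we want to pull out the images on a page, and this little program can do that. It not only grabs the images, it also separates the "file name" from the "extension", which makes the follow-up work easier. The output above also shows the difference between Match and MatchCollection, and I am sure you spotted it at a glance: one is a collection, the other is a "lone soldier".

Any questions from those who have read this far? In string parnt = @"(\d+)(?:\.)(\w{3})"; there are clearly three groups defined, so why do only the "file name" and the "extension" show up after the split? Did the OP eat the "."? If that is your question, please open your eyes wide and look closely: (?:\.) has an extra ?: in it, and ?: means the group is not saved into the GroupCollection. The group count of 3 in the output is the whole match at index 0 plus the two capturing groups.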
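To make the effect of ?: concrete, here is a minimal sketch that counts the entries in Match.Groups for the pattern above and for a variant where the dot is captured. The class name GroupCountDemo, the helper CountGroups, and the sample input are made up for illustration and are not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Number of entries in Match.Groups for the first match.
    // Index 0 is always the whole match, so it is included in the count.
    public static int CountGroups(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        string txt = "photo123.gif"; // hypothetical sample input

        // Non-capturing dot: whole match + 2 capturing groups.
        Console.WriteLine(CountGroups(@"(\d+)(?:\.)(\w{3})", txt)); // 3

        // Capturing dot: whole match + 3 capturing groups.
        Console.WriteLine(CountGroups(@"(\d+)(\.)(\w{3})", txt));   // 4
    }
}
```

Since the dot is only there as a separator, leaving it non-capturing keeps the group numbers lined up with the parts you actually want: 1 for the file name and 2 for the extension.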
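Below is some code quoted from MSDN:
CaptureCollection:
The CaptureCollection class represents a sequence of captured substrings and returns the set of captures made by a single capturing group. Because of quantifiers, a capturing group can capture more than one string in a single match. The Captures property, an object of the CaptureCollection class, is provided as a member of the Match and Group classes to make the set of captured substrings easy to access.
For example, if the regular expression ((a(b))c)+ (where the + quantifier specifies one or more matches) is used to capture matches from the string "abcabcabc", the CaptureCollection of each matching Group of substrings will contain three members.
The following console application example uses the regular expression (Abc)+ to find one or more matches in the string "XYZAbcAbcAbcXYZAbcAb". The example illustrates using the Captures property to return multiple groups of captured substrings.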
using System;
using System.Text.RegularExpressions;
public class RegexTest
      {
      public static void RunTest()
      {
         int counter;
         Match m;
         CaptureCollection cc;
         GroupCollection gc;
         // Look for groupings of "Abc".
         Regex r = new Regex("(Abc)+");
         // Define the string to search.
         m = r.Match("XYZAbcAbcAbcXYZAbcAb");
         gc = m.Groups;
         // Print the number of groups.
         Console.WriteLine("Captured groups = " + gc.Count.ToString());
         // Loop through each group.
         for (int i=0; i < gc.Count; i++)
         {
            cc = gc[i].Captures;
            counter = cc.Count;

            // Print number of captures in this group.
            Console.WriteLine("Captures count = " + counter.ToString());

            // Loop through each capture in group.
            for (int ii = 0; ii < counter; ii++)
            {
                  // Print capture and position.
                  Console.WriteLine(cc[ii] + " Starts at character " +
                     cc[ii].Index);
            }
         }
      }
      public static void Main() {
         RunTest();
      }
}
----------------------------------------------------------------------------------------------------------
Group:
int[] matchposition = new int[20];
String[] results = new String[20];
// Define substrings abc, ab, b.
Regex r = new Regex("(a(b))c");
Match m = r.Match("abdabc");
for (int i = 0; m.Groups[i].Value != ""; i++)
{
      // Copy groups to string array.
      results[i] = m.Groups[i].Value;
      // Record character position.
      matchposition[i] = m.Groups[i].Index;
}
----------------------------------------------------------------------------------------
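An instance of Group is returned by indexing the GroupCollection object returned by the Groups property. The indexer can be a group number or, when the "(?<groupname>)" grouping construct is used, the name of the capturing group. For example, in C# code you can use Match.Groups[groupnum] or Match.Groups["groupname"], and in Visual Basic code you can use Match.Groups(groupnum) or Match.Groups("groupname").
The following code example uses named grouping constructs to capture substrings from a string containing data in the format "DATANAME:VALUE"; the regular expression splits the data at the colon ":".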
Regex r = new Regex("^(?<name>\\w+):(?<value>\\w+)");
Match m = r.Match("Section1:119900");
This regular expression returns the following output.
m.Groups["name"].Value = "Section1"
m.Groups["value"].Value = "119900"
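As a runnable version of the fragment above, the sketch below wraps the same pattern in a small console program. The class name NamedGroupDemo and the helper SplitPair are hypothetical names added for illustration. It also shows that named groups remain reachable by number.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns { name, value } for a "DATANAME:VALUE" line, or null if the line does not match.
    public static string[] SplitPair(string line)
    {
        Match m = Regex.Match(line, @"^(?<name>\w+):(?<value>\w+)");
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups["name"].Value, m.Groups["value"].Value };
    }

    public static void Main()
    {
        string[] pair = SplitPair("Section1:119900");
        Console.WriteLine(pair[0]); // Section1
        Console.WriteLine(pair[1]); // 119900

        // Named groups also get numbers, assigned left to right.
        Match m = Regex.Match("Section1:119900", @"^(?<name>\w+):(?<value>\w+)");
        Console.WriteLine(m.Groups[1].Value); // Section1
    }
}
```

Checking m.Success before reading the groups avoids treating the empty strings of a failed match as real data.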
-----------------------------------------------------------------------------------------------
Capture:
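The Capture class contains the results of a single subexpression capture.
The following example loops through the Group collection, extracts the Capture collection from each member of Group, and assigns to the variables posn and length, respectively, the character position in the original string where each string was found and the length of each string.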
Regex r;
Match m;
CaptureCollection cc;
int posn, length;
r = new Regex("(abc)+");
m = r.Match("bcabcabc");
for (int i=0; m.Groups[i].Value != ""; i++)
{
      // Capture the Collection for Group(i).
      cc = m.Groups[i].Captures;
      for (int j = 0; j < cc.Count; j++)
      {
         // Position of Capture object.
         posn = cc[j].Index;
         // Length of Capture object.
         length = cc[j].Length;
      }
}
-------------------------------------------------------------------------------
Changing the date format:
using System;
using System.Globalization;
using System.Text.RegularExpressions;
public class Class1
{
public static void Main()
{
   string dateString = DateTime.Today.ToString("d",
                                    DateTimeFormatInfo.InvariantInfo);
   string resultString = MDYToDMY(dateString);
   Console.WriteLine("Converted {0} to {1}.", dateString, resultString);
}
static string MDYToDMY(string input)
{
      return Regex.Replace(input,
         "\\b(?<month>\\d{1,2})/(?<day>\\d{1,2})/(?<year>\\d{2,4})\\b",
         "${day}-${month}-${year}");
}
}
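Because the program above converts today's date, its output changes from day to day. The sketch below feeds the same replacement fixed inputs so the result is easy to check by hand. The class name DateSwapDemo and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateSwapDemo
{
    // Same pattern and replacement as MDYToDMY above, written with a verbatim string.
    public static string MDYToDMY(string input)
    {
        return Regex.Replace(input,
            @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b",
            "${day}-${month}-${year}");
    }

    public static void Main()
    {
        Console.WriteLine(MDYToDMY("11/24/2010"));                // 24-11-2010
        Console.WriteLine(MDYToDMY("Due 1/2/99 and 12/31/2010")); // Due 2-1-99 and 31-12-2010
    }
}
```

Note that the pattern only reorders digits. It does not check that the month or day is in range, so a string such as "99/99/9999" would be rewritten as well.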
--------------------------------------
Scanning for HREFs:
void DumpHrefs(String inputString)
{
      Regex r;
      Match m;
      r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
         RegexOptions.IgnoreCase|RegexOptions.Compiled);
      for (m = r.Match(inputString); m.Success; m = m.NextMatch())
      {
         Console.WriteLine("Found href " + m.Groups[1] + " at "
            + m.Groups[1].Index);
      }
}
---------------------------------------------------------
How to: extract the protocol and port number from a URL:
String Extension(String url)
{
      Regex r = new Regex(@"^(?<proto>\w+)://[^/]+?(?<port>:\d+)?/",
         RegexOptions.Compiled);
      return r.Match(url).Result("${proto}${port}");
}
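A short usage sketch for the method above. The class name UrlPartsDemo and the sample URLs are made up for illustration. The Success check is an addition of mine: as far as I know, calling Match.Result on a failed match throws NotSupportedException, so this version returns an empty string instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPartsDemo
{
    // Same pattern as above; RegexOptions.Compiled is left out for brevity.
    static readonly Regex UrlPattern =
        new Regex(@"^(?<proto>\w+)://[^/]+?(?<port>:\d+)?/");

    // Hypothetical variant of Extension that returns "" when the URL does not match.
    public static string Extension(string url)
    {
        Match m = UrlPattern.Match(url);
        return m.Success ? m.Result("${proto}${port}") : "";
    }

    public static void Main()
    {
        Console.WriteLine(Extension("http://www.contoso.com:8080/letters/readme.html")); // http:8080
        Console.WriteLine(Extension("https://example.com/index.html"));                  // https
        Console.WriteLine(Extension("not a url").Length);                                // 0
    }
}
```

When the URL has no explicit port, the optional port group captures nothing and ${port} expands to an empty string, which is why the second call prints only the protocol.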
-------------------------------------------------------------------------
How to: strip invalid characters from a string:
String CleanInput(string strIn)
{
      // Replace invalid characters with empty strings.
      return Regex.Replace(strIn, @"[^\w\.@-]", "");
}
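To see what the character class above keeps and what it drops, here is a minimal sketch with two made-up inputs. The class name CleanInputDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanInputDemo
{
    // Same replacement as above: keep word characters, '.', '@' and '-', drop everything else.
    public static string CleanInput(string strIn)
    {
        return Regex.Replace(strIn, @"[^\w\.@-]", "");
    }

    public static void Main()
    {
        Console.WriteLine(CleanInput("a b!c@d.e-f"));                 // abc@d.e-f
        Console.WriteLine(CleanInput("<script>alert('x')</script>")); // scriptalertxscript
    }
}
```

The second call shows the limits of this approach: the angle brackets and quotes are gone, but the words between them stay, so this is a filter for stray characters and not a full sanitizer.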
------------------------------------------------------------------------
How to: verify that a string is in valid e-mail format:
bool IsValidEmail(string strIn)
{
// Return true if strIn is in valid e-mail format.
return Regex.IsMatch(strIn, @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
}
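A quick way to get a feel for the pattern above is to run it against a few addresses. The sketch below reuses the pattern unchanged. The class name EmailCheckDemo and the sample addresses are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheckDemo
{
    // Same pattern as above, copied unchanged.
    const string EmailPattern =
        @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";

    public static bool IsValidEmail(string strIn)
    {
        return Regex.IsMatch(strIn, EmailPattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidEmail("someone@example.com"));  // True
        Console.WriteLine(IsValidEmail("not-an-email"));         // False
        // The final label is limited to 2-4 letters, so longer top-level domains are rejected.
        Console.WriteLine(IsValidEmail("user@example.museum"));  // False
    }
}
```

The last case is worth keeping in mind: the pattern checks the shape of an address under the assumptions of its time, so a rejection does not necessarily mean the address is undeliverable.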