C# 4.0 - Efficient way to match regex patterns against a huge volume of data


I have 30 text files (say, log files); the size of these files varies from 100 MB to 200 MB. I have one more text file (pattern.txt) that contains around 30 regex patterns. I need to compare the regex patterns against each line in the log files in a fast, efficient way. Currently I read each log file line by line and compare every line against the patterns.
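A minimal sketch of the line-by-line approach described above, assuming pattern.txt holds one regex per line and that a log line counts as a hit if any pattern matches it. The names `LogScanner`, `LoadPatterns` and `CountMatches` are hypothetical. The methods take `IEnumerable<string>` so they can be fed by `File.ReadLines`, which streams a file lazily instead of loading 100-200 MB into memory; `RegexOptions.Compiled` trades a one-time construction cost for faster matching.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical baseline: every line is tested against every pattern.
public static class LogScanner
{
    // Assumes one regex per line; blank lines are skipped.
    public static List<Regex> LoadPatterns(IEnumerable<string> patternLines)
    {
        var regexes = new List<Regex>();
        foreach (string p in patternLines)
        {
            if (string.IsNullOrWhiteSpace(p)) continue;
            // Compiled: slower to construct, faster per match on large inputs.
            regexes.Add(new Regex(p, RegexOptions.Compiled));
        }
        return regexes;
    }

    // Counts lines that match at least one pattern.
    public static int CountMatches(IEnumerable<string> lines, List<Regex> regexes)
    {
        int count = 0;
        foreach (string line in lines)
        {
            foreach (Regex r in regexes)
            {
                if (r.IsMatch(line))
                {
                    count++;
                    break; // one matching pattern is enough for this line
                }
            }
        }
        return count;
    }
}
```

Usage, with hypothetical file names: `var regexes = LogScanner.LoadPatterns(File.ReadLines("pattern.txt"));` then `int hits = LogScanner.CountMatches(File.ReadLines("app1.log"), regexes);`. With 30 patterns, a line that matches nothing still pays for 30 regex evaluations, which is the cost the answer below tries to avoid.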

Is there a more efficient way to achieve this without using third-party components?

When filtering, don't start with a regex. I.e., if you need to compare line n against 30 regexes, try to turn each regex into a string IndexOf operation: check whether a literal string is in the line, and only if it is, compare the line against the regex. Basic string comparison functions are incredibly fast, so if the performance of the regexes is an issue, use normal string comparison functions first to speed things up. The following example is a bit contrived, but it demonstrates that filtering before using regexes is a lot faster than using regexes alone.
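One way to apply this to the 30-pattern case is to pair each regex with a literal substring that every matching line must contain, and run the regex only when that literal is present. The sketch below assumes those literals are chosen by hand for each pattern; `FilteredPattern` and `PreFilter` are hypothetical names. It uses `StringComparison.Ordinal`, a plain character comparison that avoids culture-sensitive matching.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical pairing of a cheap literal check with the full regex.
// The literal must appear in every line the regex can match; otherwise
// the pre-filter would wrongly discard real matches.
public sealed class FilteredPattern
{
    private readonly string literal;
    private readonly Regex regex;

    public FilteredPattern(string literal, string pattern)
    {
        this.literal = literal;
        this.regex = new Regex(pattern, RegexOptions.Compiled);
    }

    public bool IsMatch(string line)
    {
        // Ordinal IndexOf is a plain character search; the regex only runs
        // on lines that already contain the literal.
        return line.IndexOf(literal, StringComparison.Ordinal) >= 0
            && regex.IsMatch(line);
    }
}

public static class PreFilter
{
    // Counts lines that match at least one filtered pattern.
    public static int CountMatches(IEnumerable<string> lines, IList<FilteredPattern> patterns)
    {
        int count = 0;
        foreach (string line in lines)
        {
            foreach (FilteredPattern p in patterns)
            {
                if (p.IsMatch(line))
                {
                    count++;
                    break;
                }
            }
        }
        return count;
    }
}
```

If a pattern is compiled with `RegexOptions.IgnoreCase`, its literal check would also need a case-insensitive comparison, or the pre-filter could reject lines the regex would accept.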

IndexOf time: 3492 ms; Regex time: 81553 ms

I created a 70 MB file with alternating lines, i.e.:

```
value="path=/this/is/a/path" initstring="path = " endstring="," />
<pattern value="path=/this/is/a/path" initstring="path = " endstring="," />
```

I iterated over these lines looking for the string "pattern". The function that uses IndexOf to filter lines first is, in this use case, 23 times faster (I'm not a C# developer, so this might not be idiomatic).

```csharp
using System.IO;
using System.Text.RegularExpressions;

// IndexOf pre-filter: the regex only runs on lines that contain "pattern".
private void indexof(StreamReader streamReader)
{
    string line;
    Regex r = new Regex(@".*pattern.*");
    while ((line = streamReader.ReadLine()) != null)
    {
        if (line.IndexOf("pattern") >= 0) // >= 0 so a match at position 0 also counts
        {
            if (r.Match(line).Success)
                this.line_count++;
        }
    }
}

// Baseline: the regex runs on every line.
private void regex(StreamReader streamReader)
{
    string line;
    Regex r = new Regex(@".*pattern.*");
    while ((line = streamReader.ReadLine()) != null)
    {
        if (r.Match(line).Success)
            this.line_count++;
    }
}
```
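To reproduce the comparison without a 70 MB file, a small harness can build the same alternating lines in memory and time both approaches with `Stopwatch`. This is a hypothetical sketch (`FilterBenchmark` and its methods are illustrative names); unlike the code above, it uses an ordinal `IndexOf`, and the absolute timings will depend on the machine, so no figures are claimed here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical harness: alternating lines in memory instead of a 70 MB file.
public static class FilterBenchmark
{
    public static List<string> BuildLines(int pairs)
    {
        var lines = new List<string>(pairs * 2);
        for (int i = 0; i < pairs; i++)
        {
            // Same two sample lines as above; only the second contains "pattern".
            lines.Add("value=\"path=/this/is/a/path\" initstring=\"path = \" endstring=\",\" />");
            lines.Add("<pattern value=\"path=/this/is/a/path\" initstring=\"path = \" endstring=\",\" />");
        }
        return lines;
    }

    // Regex on every line.
    public static int RegexOnly(List<string> lines, Regex r)
    {
        int count = 0;
        foreach (string line in lines)
            if (r.IsMatch(line)) count++;
        return count;
    }

    // Cheap substring check first; regex only on candidate lines.
    public static int IndexOfThenRegex(List<string> lines, Regex r)
    {
        int count = 0;
        foreach (string line in lines)
            if (line.IndexOf("pattern", StringComparison.Ordinal) >= 0 && r.IsMatch(line))
                count++;
        return count;
    }

    // Runs one approach, returns elapsed milliseconds and the match count.
    public static long Time(Func<int> run, out int count)
    {
        Stopwatch sw = Stopwatch.StartNew();
        count = run();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Both methods should report the same count; only the elapsed time should differ. For example, with `var lines = FilterBenchmark.BuildLines(500000);` and `var r = new Regex(@".*pattern.*");`, compare `FilterBenchmark.Time(() => FilterBenchmark.RegexOnly(lines, r), out c1)` against the `IndexOfThenRegex` variant.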

You need to write your application so that it can filter lines first, before using regular expressions.

