JavaScript Regex newlines ruin bibtex parsing

Developer https://www.devze.com 2023-03-30 15:39 Source: Internet

I am trying to read a bibtex file into my JavaScript script. The Regex used to parse the file is:

/(.*)\s*=\s*[{"'](.*|.*\s+.*|.*\s+.*\s+.*|.*\s+.*\s+.*\s+.*|.*\s+.*\s+.*\s+.*\s+.*)[}"'],?/g

This works as I want it to:

@Article{journals/aim/Sloman99,
  title =   "Review of Affective Computing",
  author =  "Aaron Sloman",
  journal = "AI Magazine",
  year =    "1999",
  number =  "1",
  volume =  "20",
  url = "http://dblp.uni-trier.de/db/journals/aim/aim20.html#Sloman99",
  pages =   "127--133",
}

It gives me nice key/value pairs like: "author : Aaron Sloman".

This doesn't:

@Article{journals/aim/Sloman99,
  title =   "Review of Affective Computing",
  author =  "Aaron
  S
  l
  o
  m
  a
  n",
  journal = "AI Magazine",
  year =    "1999",
  number =  "1",
  volume =  "20",
  url = "http://dblp.uni-trier.de/db/journals/aim/aim20.html#Sloman99",
  pages =   "127--133",
}

It just omits the author.

So how can I make a regex that matches a value containing as many newlines as there happen to be (not only as many as there are repetitions of ".*\s+") until it encounters a " or a }?
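
For reference, the author is dropped because "." does not match a newline in JavaScript, so each ".*" stays on one line and the pattern tolerates only as many line breaks as it has "\s+" groups (four). One possible way around that is a negated character class, which does span newlines. The sketch below is only that: the helper name parseFields and the restriction of keys to \w+ are assumptions for illustration, and it does not handle nested braces or escaped quotes.

```javascript
// Hypothetical helper: extracts key/value pairs from the text of one BibTeX entry.
function parseFields(text) {
  // key = "value" | {value} | 'value'
  // [^"]*, [^}]* and [^']* match any character except the closing
  // delimiter, including newlines, so multi-line values are captured.
  const fieldPattern = /(\w+)\s*=\s*(?:"([^"]*)"|\{([^}]*)\}|'([^']*)')\s*,?/g;
  const fields = {};
  for (const match of text.matchAll(fieldPattern)) {
    // Exactly one of the three value groups is defined per match.
    const raw = match[2] ?? match[3] ?? match[4];
    // Collapse line breaks and runs of whitespace into a single space.
    fields[match[1]] = raw.replace(/\s+/g, " ").trim();
  }
  return fields;
}

const entry = `@Article{journals/aim/Sloman99,
  title =   "Review of Affective Computing",
  author =  "Aaron
  S
  l
  o
  m
  a
  n",
  year =    "1999",
}`;

console.log(parseFields(entry));
```

Because the value groups stop at the first closing delimiter, a value such as {The {BibTeX} format} would be cut short; that limitation is the kind of thing a real parser avoids.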

I know people love to use regular expressions to parse markup; it seems to be a fad... like Lady Gaga, or Fun Dip. But if you want to parse markup efficiently, you should use a parser or write one.

Why? Regular expressions are meant to parse regular languages. Most markup languages cannot be expressed as an NFA or DFA; BibTeX, for instance, allows nested braces inside a value, and balanced nesting is not regular. Because of this, using a regex to parse them is impossible in the hardest cases and just slow in the easiest.

There are a couple of great JS bibtex parsers out there, for example:

http://sourceforge.net/projects/jsbibtex/
http://code.google.com/p/bibtex-js/

I recommend you look at those. I know you have already done work to create your regular expression, but I promise your job will become much easier when you take the step to a real parser.
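
To give a feel for what "writing one" involves, here is a minimal hand-written scanner for a single entry. It is a sketch under stated assumptions, not the approach used by either library above: parseEntry and its inner helpers are hypothetical names, and it ignores things like @string macros, # concatenation, comments, and escaped quotes. What it does show is the one thing the regex cannot do, which is tracking brace depth.

```javascript
// Hypothetical minimal parser for a single BibTeX entry; not a full implementation.
function parseEntry(src) {
  let pos = 0;

  const skipSpace = () => {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
  };

  // Consume one expected character, skipping leading whitespace.
  const expect = (ch) => {
    skipSpace();
    if (src[pos] !== ch) {
      throw new Error(`Expected "${ch}" at offset ${pos}`);
    }
    pos++;
  };

  // Read characters up to (not including) any character in `stops`.
  const readUntil = (stops) => {
    const start = pos;
    while (pos < src.length && !stops.includes(src[pos])) pos++;
    return src.slice(start, pos).trim();
  };

  // Read a value delimited by "..." or {...}, tracking nested brace depth.
  const readValue = () => {
    skipSpace();
    const open = src[pos];
    if (open !== '"' && open !== "{") {
      // Bare value such as a number: read up to the next comma or brace.
      return readUntil(",}");
    }
    const close = open === "{" ? "}" : '"';
    pos++;
    const start = pos;
    let depth = 0;
    while (pos < src.length) {
      const ch = src[pos];
      if (ch === "{") depth++;
      else if (ch === "}" && depth > 0) depth--;
      else if (depth === 0 && ch === close) break;
      pos++;
    }
    if (pos >= src.length) throw new Error("Unterminated value");
    const value = src.slice(start, pos);
    pos++; // consume the closing delimiter
    // Newlines inside a value are just whitespace; collapse them.
    return value.replace(/\s+/g, " ").trim();
  };

  expect("@");
  const type = readUntil("{");
  expect("{");
  const key = readUntil(",");
  expect(",");

  const fields = {};
  for (;;) {
    skipSpace();
    if (src[pos] === "}") break; // end of entry
    const name = readUntil("=");
    expect("=");
    fields[name] = readValue();
    skipSpace();
    if (src[pos] === ",") pos++; // optional trailing comma
  }
  return { type, key, fields };
}
```

Calling parseEntry on the entry from the question returns an object with type, key, and a fields map, and the number of newlines inside a value no longer matters. Malformed input raises an error instead of silently matching.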

Just a small example of why your regex is bad and a parser is better.

Strings that your pattern happily matches include:

;;;)(>$#@ = 'dfsa3 342 '}
((())))+++>$#@ = 'dfsa3@@//''''''''''''
>$#@ = 'dfsa3@@//'''}}}}"""

These are not good!
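
This is easy to check by running the pattern from the question against those three strings. The snippet is just a quick sanity check; the variable names are arbitrary.

```javascript
// The pattern from the question, unchanged.
const original = /(.*)\s*=\s*[{"'](.*|.*\s+.*|.*\s+.*\s+.*|.*\s+.*\s+.*\s+.*|.*\s+.*\s+.*\s+.*\s+.*)[}"'],?/g;

const junk = [
  ";;;)(>$#@ = 'dfsa3 342 '}",
  "((())))+++>$#@ = 'dfsa3@@//''''''''''''",
  ">$#@ = 'dfsa3@@//'''}}}}\"\"\"",
];

const results = junk.map((line) => {
  original.lastIndex = 0; // the g flag makes test() stateful, so reset it
  return original.test(line);
});

console.log(results); // every element is true: all three junk strings match
```

None of these is a valid BibTeX field, yet the pattern accepts each one because the key is ".*" and the opening and closing delimiters are not required to pair up.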
