Regex in PHP: remove citations from wiki text

Developer https://www.devze.com 2023-03-28 04:42 Source: Internet
From the given sample text, I want to keep all the text except the parts enclosed in [[ ]] and {{ }}.

Sample Text:

On 11 December 1988, aged just 15 years and 232 days, Tendulkar scored 100 not out in his debut [[first-class cricket|first-class]] match for [[Mumbai cricket team|Bombay]] against [[Gujarat cricket team|Gujarat]], making him the youngest Indian to score a century on first-class debut. He followed this by scoring a century in his first Deodhar and Duleep Trophy. {{cite web|url=http://www.espnstar.com/cricket/international-cricket/news/detail/item136972/Sachin-Tendulkar-factfile/|title=Sachin Tendulkar factfile |publisher=www.espnstar.com|accessdate=3 August 2009}} He was picked by the Mumbai captain [[Dilip Vengsarkar]] after seeing him negotiate [[Kapil Dev]] in the nets, and finished the season as Bombay's highest run-scorer.He scored 583 runs at an average of 67.77, and was the sixth highest run-scorer overall{{cite web|url=http://blogs.cricinfo.com/link_to_database/ARCHIVE/1980S/1988-89/IND_LOCAL/RANJI/STATS/IND_LOCAL_RJI_AVS_BAT_MOST_RUNS.html|title=1988–89 Ranji season – Most Runs|publisher=Cricinfo|accessdate=3 August 2009}} He also made an unbeaten century in the [[Irani Trophy]] final,{{cite web|url=http://cricketarchive.com/Archive/Scorecards/52/52008.html|title=Rest of India v Delhi in 1989/90 |publisher=Cricketarchive|accessdate=3 August 2009}} and was selected for the tour of Pakistan next year, after just one first class season.

I tried this:

$patterns = ("/^{{*/", "/*}}$/" );
$replacements = "";
preg_replace($patterns, $replacements, $parts);
print_r($parts);

and this:

$parts = preg_replace("/\[(?:\\\\|\\\]|[^\]])*\]/", "", $ans_str);

and this too:

$pattern = ("/\[.*?\]/", "/\{.*?\}/");
  $ans = preg_replace($pattern, "", $parts);

None of these work. Please help, thanks.


This should do the trick:

$str = "On 11 December 1988, ...";
$str = preg_replace('/\{\{.+\}\}/Us', '', $str);
var_dump($str);

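The U modifier switches the quantifiers to ungreedy mode, so each match stops at the first closing }} it finds. Without it, the pattern would run from the first {{ to the last }} and swallow all the citations, plus the text between them, as one giant match.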

EDIT: added the s modifier so that . also matches newlines and citations spanning several lines are removed too (see comments).
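
To make the greedy versus ungreedy difference concrete, here is a minimal sketch on a shortened, made-up sample string. The variable names and the sample are illustrative assumptions, not part of the original question.

```php
<?php
// Hypothetical shortened sample with two citations.
$sample = "A {{cite web|title=one}} B {{cite web|title=two}} C";

// Greedy: .+ runs on to the LAST }} and swallows the text between the citations.
$greedy = preg_replace('/\{\{.+\}\}/s', '', $sample);

// Ungreedy (U modifier): each match stops at the first }} that follows.
$ungreedy = preg_replace('/\{\{.+\}\}/Us', '', $sample);

var_dump($greedy);   // string(4) "A  C"
var_dump($ungreedy); // string(7) "A  B  C"
```

Note that the surrounding spaces are left behind, which is why the ungreedy result contains double spaces.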


// remove `{{cite}}` tags
$str = preg_replace('/\s*\{\{[^}{]*+\}\}\s*/', ' ', $str);

// remove links--including rollover text--leaving link text
$str = preg_replace('/\[\[(?:[^][|]*+\|)?+([^][]*+)\]\]/', '$1', $str);

See the demo on ideone.com.
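
As a rough check of what these two replacements produce, the sketch below applies them to a shortened excerpt of the sample text. The excerpt and the example.com URL are placeholders chosen for brevity, not the real citation data.

```php
<?php
// Shortened excerpt modelled on the sample text; the cite URL is a placeholder.
$str = 'his debut [[first-class cricket|first-class]] match for [[Mumbai cricket team|Bombay]]. '
     . '{{cite web|url=http://example.com/|title=Example}} He was picked by [[Dilip Vengsarkar]].';

// Remove {{cite}} tags together with surrounding whitespace, leaving one space.
$str = preg_replace('/\s*\{\{[^}{]*+\}\}\s*/', ' ', $str);

// Unwrap [[target|label]] and [[label]] links, keeping only the label.
$str = preg_replace('/\[\[(?:[^][|]*+\|)?+([^][]*+)\]\]/', '$1', $str);

echo $str, "\n";
// his debut first-class match for Bombay. He was picked by Dilip Vengsarkar.
```

The possessive quantifiers (*+ and ?+) do not change what is matched here. They only stop the regex engine from backtracking into text it has already consumed.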


The following two lines did the trick:

$str = preg_replace('/\s*\{\{.*?\}\}\s*/s', ' ', $str); // remove the curly braces and the text between them
$str = preg_replace('/[\[\]]/', '', $str); // remove the square brackets

Sorry it went wrong.
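
One pitfall worth noting when stripping brackets this way: inside a character class, parentheses and dots are literal characters, so a class such as [\[(.)\]] matches any single one of [ ( . ) ] rather than a bracketed group. The sketch below, on a made-up sample string, shows the difference from a class that lists only the two brackets.

```php
<?php
// Hypothetical sample containing brackets, parentheses, and full stops.
$sample = "See [[Kapil Dev]] (nets). Done.";

// This class lists five literal characters: [ ( . ) ]
$tooBroad = preg_replace('/[\[(.)\]]/', '', $sample);

// This class lists only the two square brackets.
$bracketsOnly = preg_replace('/[\[\]]/', '', $sample);

echo $tooBroad, "\n";     // See Kapil Dev nets Done
echo $bracketsOnly, "\n"; // See Kapil Dev (nets). Done.
```

Also note that preg_replace replaces every match by default, so PHP needs no counterpart to the JavaScript g flag, and the pattern must be passed as a quoted string with delimiters.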
