醉爱PHP

Tag: regexp php encoding

Detecting a String's Encoding and Extracting Its Chinese Characters in PHP

by admin on May 11, 2009, under php

To determine a string's encoding, use a regular expression to test whether it is valid UTF-8.

Reference URL: http://www.w3.org/International/questions/qa-forms-utf-8.en.php

$result = preg_match('%^(?:
                          [\x09\x0A\x0D\x20-\x7E] # ASCII
                          | [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
                          | \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
                          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
                          | \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
                          | \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
                          | [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
                          | \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
                         )*$%xs', $str);

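If $result is 1, the string is valid UTF-8 (a pure-ASCII string also passes); otherwise it is assumed to be ANSI, meaning here a legacy double-byte encoding such as GB2312 or GBK.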
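The check above can be wrapped in a small helper so it is easy to reuse. The following is a minimal sketch, not the original author's code: the function name `is_utf8` is hypothetical, and the end anchor uses `\z` (as in the W3C reference) rather than `$`.

```php
<?php
// Hypothetical helper (name not from the original post): returns true when
// $str consists only of well-formed UTF-8 sequences per the W3C pattern.
function is_utf8(string $str): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error;
    // comparing with === 1 treats an error as "not UTF-8".
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*\z%xs', $str) === 1;
}

$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";  // "中文" encoded as UTF-8
$gbk  = "\xD6\xD0\xCE\xC4";          // "中文" encoded as GBK

var_dump(is_utf8($utf8));  // bool(true)
var_dump(is_utf8($gbk));   // bool(false)
```

If the mbstring extension is available, `mb_check_encoding($str, 'UTF-8')` is a built-in alternative. The two are not identical: the pattern above also rejects most ASCII control characters. On very long inputs `preg_match` may fail because of PCRE limits, in which case this sketch reports false.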

With that result as the condition, extract the Chinese characters from the string:

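if ($result) {
    // UTF-8: match 3-byte sequences. This range covers the CJK ideographs,
    // but also other 3-byte characters (kana, Hangul, full-width punctuation).
    preg_match_all("/[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}/", $str, $arr);
    print_r($arr[0]);
} else {
    // ANSI: match a high byte followed by one more byte (a double-byte character).
    preg_match_all("/[\x80-\xFF]./", $str, $arr);
    print_r($arr[0]);
}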
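The UTF-8 branch above matches by byte range, so it also picks up 3-byte characters that are not Chinese. A narrower alternative, sketched below, uses PCRE's Unicode script property `\p{Han}` with the `u` modifier. This is a swapped-in technique, not the original author's method; the function name `extract_han` is hypothetical, and the input is assumed to be valid UTF-8 already (a GBK string would first need converting, for example with `iconv`, which is not shown).

```php
<?php
// Hypothetical helper (not from the original post): extracts Han characters
// from a string that has already been validated as UTF-8.
function extract_han(string $utf8): array
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8;
    // \p{Han} matches characters belonging to the Han script.
    preg_match_all('/\p{Han}/u', $utf8, $matches);

    // Fall back to an empty array if matching failed (e.g. invalid UTF-8).
    return $matches[0] ?? [];
}

$str = "PHP \xE4\xB8\xAD\xE6\x96\x87 test";  // "PHP 中文 test" in UTF-8
print_r(extract_han($str));                  // two elements: "中" and "文"
```

Each array element is one complete character, so no knowledge of the byte layout is needed at the call site.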
