How do I convert 'u00e9' to a UTF-8 character in MySQL or PHP?
P粉704196697
2023-08-24 20:34:18
[PHP Discussion Group]
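<p>I'm cleaning up some messy data that is being imported into MySQL.</p>
<p>The data contains "pseudo" Unicode characters: escape sequences such as "u00e9" that are embedded in the strings as literal text.</p>
<p>So a field might be "Jalostotitlu00e1n".
I need to rip out that clumsy 'u00e1' and replace it with the corresponding UTF-8 character.</p>
<p>I could do this in MySQL, perhaps with SUBSTRING and CHAR, but I'm preprocessing the data in PHP, so I could do it there as well.</p>
<p>I already know how to configure MySQL and PHP to work with UTF-8 data. The problem is really in the source data I'm importing.</p>
<p>Thanks</p>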
/* PHP function to convert UTF-8 HTML to ANSI */
// Static method: declare it inside a class.
// The keys are the literal six-character sequences \uXXXX (PHP only expands
// "\u{...}" with braces), so a bare "u00e1" without the backslash is not matched.
public static function Utf8_ansi($valor='') {
    $utf8_ansi2 = array(
        "\u00c0" =>"À", "\u00c1" =>"Á", "\u00c2" =>"Â", "\u00c3" =>"Ã",
        "\u00c4" =>"Ä", "\u00c5" =>"Å", "\u00c6" =>"Æ", "\u00c7" =>"Ç",
        "\u00c8" =>"È", "\u00c9" =>"É", "\u00ca" =>"Ê", "\u00cb" =>"Ë",
        "\u00cc" =>"Ì", "\u00cd" =>"Í", "\u00ce" =>"Î", "\u00cf" =>"Ï",
        "\u00d1" =>"Ñ", "\u00d2" =>"Ò", "\u00d3" =>"Ó", "\u00d4" =>"Ô",
        "\u00d5" =>"Õ", "\u00d6" =>"Ö", "\u00d8" =>"Ø", "\u00d9" =>"Ù",
        "\u00da" =>"Ú", "\u00db" =>"Û", "\u00dc" =>"Ü", "\u00dd" =>"Ý",
        "\u00df" =>"ß", "\u00e0" =>"à", "\u00e1" =>"á", "\u00e2" =>"â",
        "\u00e3" =>"ã", "\u00e4" =>"ä", "\u00e5" =>"å", "\u00e6" =>"æ",
        "\u00e7" =>"ç", "\u00e8" =>"è", "\u00e9" =>"é", "\u00ea" =>"ê",
        "\u00eb" =>"ë", "\u00ec" =>"ì", "\u00ed" =>"í", "\u00ee" =>"î",
        "\u00ef" =>"ï", "\u00f0" =>"ð", "\u00f1" =>"ñ", "\u00f2" =>"ò",
        "\u00f3" =>"ó", "\u00f4" =>"ô", "\u00f5" =>"õ", "\u00f6" =>"ö",
        "\u00f8" =>"ø", "\u00f9" =>"ù", "\u00fa" =>"ú", "\u00fb" =>"û",
        "\u00fc" =>"ü", "\u00fd" =>"ý", "\u00ff" =>"ÿ"
    );
    // strtr() replaces every key found in $valor with its mapped character
    return strtr($valor, $utf8_ansi2);
}

There is a way. Replace every uXXXX with its HTML representation and run html_entity_decode() on the result, i.e.
echo html_entity_decode("Jalostotitl&#x00e1;n");
Every UTF character of the form u1234 can be printed in HTML as &#x1234;. The replacement itself is tricky, though: with no other character marking the start of a UTF sequence, you can get a lot of false positives. A simple regular expression would be
preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str)
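
As a rough sketch of the same regex idea, the sequences could also be decoded in one pass without going through HTML entities. The helper name decode_pseudo_unicode is made up for illustration and is not from either answer; it assumes PHP 7.2+ with the mbstring extension (for mb_chr) and accepts both the bare "uXXXX" form and the "\uXXXX" form.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answers):
// turns "uXXXX" or "\uXXXX" sequences into the matching UTF-8 character.
function decode_pseudo_unicode(string $text): string
{
    return preg_replace_callback(
        // Optional literal backslash, then "u", then exactly four hex digits.
        '/\\\\?u([0-9a-fA-F]{4})/',
        static function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for invalid code points such as lone
            // surrogates; in that case leave the matched text untouched.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

echo decode_pseudo_unicode('Jalostotitlu00e1n'), "\n"; // Jalostotitlán
```

The false-positive caveat still applies: any ordinary "u" followed by four hex digits is converted too. If only Latin-1 accents are expected (an assumption about the data), tightening the pattern to something like u(00[0-9a-fA-F]{2}) should cut down on accidental matches.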