Is xml.sax (Simple API for XML) written in C? It is only an interface layer.
13.9 xml.sax -- Support for SAX2 parsers
New in version 2.0.
The xml.sax package provides a number of modules which implement the Simple API for
XML (SAX) interface for Python. The package itself provides the SAX exceptions and the
convenience functions which will be most used by users of the SAX API.
The convenience functions are:
make_parser( [parser_list])
Create and return a SAX XMLReader object. The first parser found will be used. If
parser_list is provided, it must be a sequence of strings which name modules that have
a function named create_parser(). Modules listed in parser_list will be used before
modules in the default list of parsers.
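As a rough illustration of the quoted description, the sketch below calls make_parser() with no arguments and feeds it an in-memory document. It assumes Python 3 rather than the 2.0 release the documentation refers to, and the handler class NameCollector and the helper element_names are hypothetical names made up for this example.

```python
import io
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records each element name as it opens."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(xml_bytes):
    # make_parser() with no parser_list uses the first parser found
    # in the default list.
    parser = xml.sax.make_parser()
    handler = NameCollector()
    parser.setContentHandler(handler)
    # parse() accepts a file-like object as well as a filename.
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names
```

For example, element_names(b"<a><b/><c/></a>") collects the names in document order.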
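On Linux, the C library libxml2-2.5 handles gb2312-encoded XML files correctly.
My stopgap, as you suggested, is to read the file first, convert it to utf-8, and replace gb2312 in the XML declaration with utf-8: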
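import re
import StringIO

def xml_utf8_open(f):
    # Python 2: return a StringIO holding the document as UTF-8, with the
    # encoding named in the XML declaration rewritten to utf-8.
    txt = ec = ''
    for line in f.readlines():
        if ec:
            # Declaration already handled; copy the remaining lines as-is.
            txt = txt + line
            continue
        # Capture the encoding name; the hyphen is allowed so that names
        # such as iso-8859-1 are matched too.
        x = re.match(r'<\?xml(\s|version.{4,})+encoding="([\w-]*)".*', line)
        if x:
            ec = x.group(2)
            txt = txt + re.sub(re.escape(ec), 'utf-8', line)
        else:
            txt = txt + line
    if ec and ec.lower() != 'utf-8':
        return StringIO.StringIO(unicode(txt, ec).encode('utf-8'))
    return StringIO.StringIO(txt)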
Written this way the code seems quite inefficient. How can it be improved?
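One possible direction, offered only as a sketch: read the file once, patch the declaration with a single regex match, and transcode the whole buffer in one step instead of concatenating line by line. This version assumes Python 3, a file opened in binary mode, an ASCII-compatible source encoding such as gb2312, and no byte order mark. The pattern _DECL is an illustrative choice, not a full XML declaration parser.

```python
import io
import re

# Bytes pattern for the encoding attribute inside a leading XML declaration.
_DECL = re.compile(rb'^(<\?xml[^>]*?encoding=["\'])([\w.-]+)(["\'])')


def xml_utf8_open(f):
    """Return a BytesIO of UTF-8 XML; f is assumed to be opened in binary mode."""
    data = f.read()  # one read instead of per-line string concatenation
    m = _DECL.match(data)
    if not m:
        return io.BytesIO(data)  # no declared encoding: leave the data untouched
    enc = m.group(2).decode('ascii')
    if enc.lower() in ('utf-8', 'utf8'):
        return io.BytesIO(data)
    # The declaration itself is ASCII, so it can be patched before decoding;
    # the whole document is then transcoded exactly once.
    patched = m.group(1) + b'utf-8' + m.group(3) + data[m.end():]
    return io.BytesIO(patched.decode(enc).encode('utf-8'))
```

The returned object can be passed to a SAX parser's parse() method in place of the original file.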