I needed this for work and personal use. Sometimes you'll have an XML string generated as one long string with no line breaks (nusoap, in my case today at work, but any number of other tools will generate these). This function simply takes a long XML string and returns an indented version with line breaks, for display and readability.
<?php
function xmlIndent($str){
$ret = "";
$indent = 0;
$indentInc = 3;
$noIndent = false;
// start offsets: $i for the "<" search, $r for the text that precedes each tag
$i = 0;
$r = 0;
while(($l = strpos($str,"<",$i))!==false){
if($l!=$r && $indent>0){ $ret .= "\n" . str_repeat(" ",$indent) . substr($str,$r,($l-$r)); }
$i = $l+1;
$r = strpos($str,">",$i)+1;
$t = substr($str,$l,($r-$l));
if(strpos($t,"/")==1){
$indent -= $indentInc;
$noIndent = true;
}
else if(($r-$l-strpos($t,"/"))==2 || substr($t,0,2)=="<?"){ $noIndent = true; }
if($indent<0){ $indent = 0; }
if($ret){ $ret .= "\n"; }
$ret .= str_repeat(" ",$indent);
$ret .= $t;
if(!$noIndent){ $indent += $indentInc; }
$noIndent = false;
}
$ret .= "\n";
return($ret);
}
?>
(This was only tested against what I needed at work; it could possibly need additions.)
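For comparison, here is a minimal sketch of the same idea using the DOM extension rather than manual string scanning. It assumes the DOM extension is available and that the input is well-formed; the function name xmlIndentDom is made up for this example, and the indentation width is whatever the serializer chooses rather than the 3 spaces used above.

```php
<?php
// Sketch: pretty-print an XML string by letting DOMDocument re-serialize it.
// xmlIndentDom is a hypothetical name, not part of any library.
function xmlIndentDom($str)
{
    $dom = new DOMDocument();
    // Drop whitespace-only text nodes so the serializer is free to re-indent.
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    if (!$dom->loadXML($str)) {
        return false; // input was not well-formed XML
    }
    return $dom->saveXML();
}

echo xmlIndentDom('<a><b>x</b><c/></a>');
```

Unlike the string-based version, this one refuses malformed input instead of indenting it as best it can, which may or may not be what you want for debugging output.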
CLXXIX. XML Parser Functions
Introduction
XML (eXtensible Markup Language) is a data format for structured document interchange on the Web. It is a standard defined by the World Wide Web Consortium (W3C). Information about XML and related technologies can be found at http://www.w3.org/XML/.
This PHP extension adds support for James Clark's expat to PHP. This toolkit parses XML documents but does not validate them. PHP supports three source character encodings: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.
This extension lets you create XML parsers and then define handlers for different XML events. Each XML parser also has a few parameters you can adjust.
Requirements
This extension uses expat, which can be found at http://www.jclark.com/xml/expat.html. The Makefile that comes with expat does not build a library by default; you can use the following make rule for that:
libexpat.a: $(OBJS)
	ar -rc $@ $(OBJS)
	ranlib $@
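A source RPM package of expat can be found at http://sourceforge.net/projects/expat/.
Installation
These functions are enabled by default, using the bundled expat library. You can disable XML support with --disable-xml. If you compile PHP as a module for Apache 1.3.9 or later, PHP will automatically use the bundled expat library from Apache. If you do not want to use the bundled expat library, configure PHP with --with-expat-dir=DIR, where DIR is the base directory in which expat is installed.
The Windows version of PHP has built-in support for this extension. You do not need to load any additional extension in order to use these functions.
Runtime Configuration
No configuration directives are defined.
Resource Types
The xml resource returned by xml_parser_create() and xml_parser_create_ns() references an XML parser instance to be used with the functions provided by this extension.
Predefined Constants
The constants below are defined by this extension, and are only available when the extension has either been compiled into PHP or dynamically loaded at runtime.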
- XML_ERROR_NONE (integer)
- XML_ERROR_NO_MEMORY (integer)
- XML_ERROR_SYNTAX (integer)
- XML_ERROR_NO_ELEMENTS (integer)
- XML_ERROR_INVALID_TOKEN (integer)
- XML_ERROR_UNCLOSED_TOKEN (integer)
- XML_ERROR_PARTIAL_CHAR (integer)
- XML_ERROR_TAG_MISMATCH (integer)
- XML_ERROR_DUPLICATE_ATTRIBUTE (integer)
- XML_ERROR_JUNK_AFTER_DOC_ELEMENT (integer)
- XML_ERROR_PARAM_ENTITY_REF (integer)
- XML_ERROR_UNDEFINED_ENTITY (integer)
- XML_ERROR_RECURSIVE_ENTITY_REF (integer)
- XML_ERROR_ASYNC_ENTITY (integer)
- XML_ERROR_BAD_CHAR_REF (integer)
- XML_ERROR_BINARY_ENTITY_REF (integer)
- XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (integer)
- XML_ERROR_MISPLACED_XML_PI (integer)
- XML_ERROR_UNKNOWN_ENCODING (integer)
- XML_ERROR_INCORRECT_ENCODING (integer)
- XML_ERROR_UNCLOSED_CDATA_SECTION (integer)
- XML_ERROR_EXTERNAL_ENTITY_HANDLING (integer)
- XML_OPTION_CASE_FOLDING (integer)
- XML_OPTION_TARGET_ENCODING (integer)
- XML_OPTION_SKIP_TAGSTART (integer)
- XML_OPTION_SKIP_WHITE (integer)
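Event Handlers
The XML event handlers defined are:
Table 333. Supported XML handlers
| PHP function to set handler | Event description |
|---|---|
| xml_set_element_handler() | Element events are issued whenever the XML parser encounters start or end tags. There are separate handlers for start tags and end tags. |
| xml_set_character_data_handler() | Character data is roughly all the non-markup content of an XML document, including whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the application to decide whether whitespace is significant. |
| xml_set_processing_instruction_handler() | PHP programmers should already be familiar with processing instructions (PIs). <?php ?> is a processing instruction, where php is called the "PI target". The handling of these is application-specific, except that all PI targets starting with "XML" are reserved. |
| xml_set_default_handler() | Whatever does not go to another handler goes to the default handler. You will get things like the XML and document type declarations in the default handler. |
| xml_set_unparsed_entity_decl_handler() | This handler is called for the declaration of an unparsed (NDATA) entity. |
| xml_set_notation_decl_handler() | This handler is called for the declaration of a notation. |
| xml_set_external_entity_ref_handler() | This handler is called when the XML parser finds a reference to an external parsed general entity. This can be a reference to a file or URL, for example. See the external entity example for a demonstration. |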
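Case Folding
The element handler functions may receive their element names case-folded. Case-folding is defined by the XML standard as "a process applied to a sequence of characters, in which those identified as non-uppercase are replaced by their uppercase equivalents". In other words, when it comes to XML, case-folding simply means uppercasing.
By default, all the element names that are passed to the handler functions are case-folded. This behaviour can be queried and controlled per XML parser with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.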
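The effect of the option can be seen with a small sketch like the following. The helper name collectNames and the sample document are made up for illustration, and the handlers are written as closures, which assumes a PHP version that supports them.

```php
<?php
// Collect the element names reported to the start handler,
// with case folding switched on (1) or off (0).
function collectNames($xml, $fold)
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) { $names[] = $name; },
        function ($p, $name) {}
    );
    xml_parse($parser, $xml, true);
    return $names;
}

print_r(collectNames('<Doc><item/></Doc>', 1)); // names arrive uppercased
print_r(collectNames('<Doc><item/></Doc>', 0)); // names arrive as written
```

Code that looks names up in an array, such as a tag map, has to use the same case as the parser reports.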
Error Codes
The following constants are defined for XML error codes (as returned by xml_get_error_code()):
| XML_ERROR_NONE |
| XML_ERROR_NO_MEMORY |
| XML_ERROR_SYNTAX |
| XML_ERROR_NO_ELEMENTS |
| XML_ERROR_INVALID_TOKEN |
| XML_ERROR_UNCLOSED_TOKEN |
| XML_ERROR_PARTIAL_CHAR |
| XML_ERROR_TAG_MISMATCH |
| XML_ERROR_DUPLICATE_ATTRIBUTE |
| XML_ERROR_JUNK_AFTER_DOC_ELEMENT |
| XML_ERROR_PARAM_ENTITY_REF |
| XML_ERROR_UNDEFINED_ENTITY |
| XML_ERROR_RECURSIVE_ENTITY_REF |
| XML_ERROR_ASYNC_ENTITY |
| XML_ERROR_BAD_CHAR_REF |
| XML_ERROR_BINARY_ENTITY_REF |
| XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF |
| XML_ERROR_MISPLACED_XML_PI |
| XML_ERROR_UNKNOWN_ENCODING |
| XML_ERROR_INCORRECT_ENCODING |
| XML_ERROR_UNCLOSED_CDATA_SECTION |
| XML_ERROR_EXTERNAL_ENTITY_HANDLING |
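A minimal sketch of how an error code is usually obtained and reported follows. The helper name describeXmlError is hypothetical. The numeric code may differ depending on which parsing library PHP was built against, so this sketch prints the message from xml_error_string() instead of comparing against a specific constant.

```php
<?php
// Returns null when $xml is well-formed, otherwise a readable description
// built from the parser's error code and position.
function describeXmlError($xml)
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    $code = xml_get_error_code($parser);
    return sprintf(
        "%s (code %d) at line %d, column %d",
        xml_error_string($code),
        $code,
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)
    );
}

var_dump(describeXmlError('<a><b></b></a>'));   // well-formed
echo describeXmlError("<a>\n<b></a>"), "\n";   // mismatched tags
```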
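Character Encoding
PHP's XML extension supports the Unicode character set through different character encodings. There are two types of character encoding: source encoding and target encoding. PHP's internal representation of the document is always encoded with UTF-8.
Source encoding is applied when an XML document is parsed. A source encoding can be specified when an XML parser is created (it cannot be changed later in the parser's lifetime). The supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The first two are single-byte encodings, which means that each character is represented by a single byte. UTF-8 can encode characters composed of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.
Target encoding is applied when PHP passes data to XML handler functions. When an XML parser is created, the target encoding is set to the same value as the source encoding, but it can be changed at any time. The target encoding affects character data as well as tag names and processing instruction targets.
If the XML parser encounters characters outside the range that its source encoding can represent, it returns an error.
If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, the problem characters are "demoted". Currently, this means that such characters are replaced by a question mark.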
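The sketch below shows the target encoding at work on a one-character document. The helper name textInEncoding is made up for illustration. The document carries an XML declaration naming its own encoding, on the assumption that some PHP builds detect the source encoding from the document rather than from the argument to xml_parser_create().

```php
<?php
// Parse a one-character document and return the text exactly as it is
// handed to the character data handler for the given target encoding.
function textInEncoding($target)
{
    // The declaration names the source encoding; \xE9 is "e acute" in ISO-8859-1.
    $xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\xE9</a>";
    $text = '';
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $target);
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$text) { $text .= $data; }
    );
    xml_parse($parser, $xml, true);
    return $text;
}

echo bin2hex(textInEncoding('UTF-8')), "\n";      // two bytes
echo bin2hex(textInEncoding('ISO-8859-1')), "\n"; // one byte
echo textInEncoding('US-ASCII'), "\n";            // character cannot be represented
```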
Examples
Here are some example PHP scripts that parse XML documents.
This first example displays the structure of the start elements in a document, with indentation.
Example 2561. Show XML Element Structure
<?php
$file = "data.xml";
$depth = array();
function startElement($parser, $name, $attrs)
{
global $depth;
for ($i = 0; $i < $depth[$parser]; $i++) {
print " ";
}
print "$name\n";
$depth[$parser]++;
}
function endElement($parser, $name)
{
global $depth;
$depth[$parser]--;
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
?>
Example 2562. Map XML to HTML
This example maps tags in an XML document directly to HTML tags. Elements not found in the "map array" are ignored. Of course, this example will only work with a specific XML document type.
<?php
$file = "data.xml";
$map_array = array(
"BOLD" => "B",
"EMPHASIS" => "I",
"LITERAL" => "TT"
);
function startElement($parser, $name, $attrs)
{
global $map_array;
if (isset($map_array[$name])) {
echo "<$map_array[$name]>";
}
}
function endElement($parser, $name)
{
global $map_array;
if (isset($map_array[$name])) {
echo "</$map_array[$name]>";
}
}
function characterData($parser, $data)
{
echo $data;
}
$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
?>
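This example highlights XML code. It illustrates how to use an external entity reference handler to include and parse other documents, how PIs can be processed, and a way of determining "trust" for PIs containing code.
The XML documents used by this example are given as sample files (xmltest.xml and xmltest2.xml).
Example 2563. External Entity Example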
<?php
$file = "xmltest.xml";
function trustedFile($file)
{
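// only trust local files owned by ourselves
// (preg_match() replaces eregi(), which is not available in current PHP versions)
if (!preg_match("#^([a-z]+)://#i", $file)
&& fileowner($file) == getmyuid()) {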
return true;
}
return false;
}
function startElement($parser, $name, $attribs)
{
echo "&lt;<font color=\"#0000cc\">$name</font>";
if (count($attribs)) {
foreach ($attribs as $k => $v) {
echo " <font color=\"#009900\">$k</font>=\"<font
color=\"#990000\">$v</font>\"";
}
}
echo "&gt;";
}
function endElement($parser, $name)
{
echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
}
function characterData($parser, $data)
{
echo "<b>$data</b>";
}
function PIHandler($parser, $target, $data)
{
switch (strtolower($target)) {
case "php":
global $parser_file;
// If the parsed document is "trusted", we consider it safe to execute
// the PHP code inside it. If not, display the code instead of
// executing it.
if (trustedFile($parser_file[$parser])) {
eval($data);
} else {
printf("Untrusted PHP code: <i>%s</i>",
htmlspecialchars($data));
}
break;
}
}
function defaultHandler($parser, $data)
{
if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
printf('<font color="#aa00aa">%s</font>',
htmlspecialchars($data));
} else {
printf('<font size="-1">%s</font>',
htmlspecialchars($data));
}
}
function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
$publicId) {
if ($systemId) {
if (!list($parser, $fp) = new_xml_parser($systemId)) {
printf("Could not open entity %s at %s\n", $openEntityNames,
$systemId);
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($parser, $data, feof($fp))) {
printf("XML error: %s at line %d while parsing entity %s\n",
xml_error_string(xml_get_error_code($parser)),
xml_get_current_line_number($parser), $openEntityNames);
xml_parser_free($parser);
return false;
}
}
xml_parser_free($parser);
return true;
}
return false;
}
function new_xml_parser($file)
{
global $parser_file;
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
xml_set_processing_instruction_handler($xml_parser, "PIHandler");
xml_set_default_handler($xml_parser, "defaultHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");
if (!($fp = @fopen($file, "r"))) {
return false;
}
if (!is_array($parser_file)) {
settype($parser_file, "array");
}
$parser_file[$xml_parser] = $file;
return array($xml_parser, $fp);
}
if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
die("could not open XML input");
}
echo "<pre>";
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d\n",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);
?>
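Example 2564. xmltest.xml
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
<TITLE>Title &plainEntity;</TITLE>
<para>
<informaltable>
<tgroup cols="3">
<tbody>
<row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
<row><entry>a2</entry><entry>c2</entry></row>
<row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
</tbody>
</tgroup>
</informaltable>
</para>
&systemEntity;
<section id="about">
<title>About this Document</title>
<para>
<!-- this is a comment -->
<?php print 'Hi! This is PHP version '.phpversion(); ?>
</para>
</section>
</chapter>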
This file is included from xmltest.xml.
Example 2565. xmltest2.xml
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
<element attrib="value"/>
&testEnt;
<?php print "This is some more PHP code being executed."; ?>
</foo>
Table of Contents
- utf8_decode — Converts a UTF-8 encoded ISO-8859-1 string to single-byte ISO-8859-1
- utf8_encode — Encodes an ISO-8859-1 string to UTF-8
- xml_error_string — Get XML parser error string
- xml_get_current_byte_index — Get current byte index for an XML parser
- xml_get_current_column_number — Get current column number for an XML parser
- xml_get_current_line_number — Get current line number for an XML parser
- xml_get_error_code — Get XML parser error code
- xml_parse_into_struct — Parse XML data into an array structure
- xml_parse — Start parsing an XML document
- xml_parser_create_ns — Create an XML parser with namespace support
- xml_parser_create — Create an XML parser
- xml_parser_free — Free an XML parser
- xml_parser_get_option — Get options from an XML parser
- xml_parser_set_option — Set options in an XML parser
- xml_set_character_data_handler — Set up character data handler
- xml_set_default_handler — Set up default handler
- xml_set_element_handler — Set up start and end element handlers
- xml_set_end_namespace_decl_handler — Set up end namespace declaration handler
- xml_set_external_entity_ref_handler — Set up external entity reference handler
- xml_set_notation_decl_handler — Set up notation declaration handler
- xml_set_object — Use XML parser within an object
- xml_set_processing_instruction_handler — Set up processing instruction (PI) handler
- xml_set_start_namespace_decl_handler — Set up start namespace declaration handler
- xml_set_unparsed_entity_decl_handler — Set up unparsed entity declaration handler
XML Parser Functions
14-Jul-2007 12:04
08-Jun-2007 08:29
<?php
/**
* correction of the previous code
*/
/**
* Converts XML into Array
*
* @param array $result
* @param object $root
* @param string $rootname
*/
function convert_xml2array(&$result,$root,$rootname='root'){
$n=count($root->children());
if ($n>0){
/**
* start of the correction
*/
if (!isset($result[$rootname]['@attributes'])){
$result[$rootname]['@attributes']=array();
foreach ($root->attributes() as $atr=>$value){
$result[$rootname]['@attributes'][$atr]=(string)$value;
}
}
/**
* end of the correction
*/
foreach ($root->children() as $child){
$name=$child->getName();
convert_xml2array($result[$rootname][],$child,$name);
}
} else {
$result[$rootname]= (array) $root;
if (!isset($result[$rootname]['@attributes'])){
$result[$rootname]['@attributes']=array();
}
}
}
/**
* Example how to use the function convert_xml2array
*/
/**
* Return Array from a xml string
*
* @param string $xml
* @return array
*/
function get_array_fromXML($xml){
$result=array();
$doc=simplexml_load_string($xml);
convert_xml2array($result,$doc);
return $result['root'];
}
?>
15-Apr-2007 03:50
Here is an example of another XML parsing script that parses the document into an array/object structure instead of relying on startElement, endElement, etc handlers.
You can find the documentation at:
http://www.criticaldevelopment.net/xml/doc.php
And the code (both PHP4 and PHP5 versions):
http://www.criticaldevelopment.net/xml/parser_php4.phps
http://www.criticaldevelopment.net/xml/parser_php5.phps
If you have any questions about it, just drop me an e-mail.
12-Apr-2007 08:19
/*
* Parse rss news, quotes etc.
*
* author : phpZmurf <phpzmurf[at]yahoo.com>
* created: 12.04.2007
* ver : 1.0
*
*/
$data = implode("", file("http://feeds.feedburner.com/quotationspage/qotd/"));
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $data, $values, $tags);
xml_parser_free($parser);
# data saved here
$arrQuotes = array();
# at the beginning the tag is considered closed
$tagOpen = false;
foreach($values as $key => $item) {
if(!$tagOpen and $item['tag'] == 'item' and $item['type'] == 'open') {
# item tag opens
$tagOpen = true;
# empty temporary variables
$temp_title = '';
$temp_description = '';
$temp_guid = '';
$temp_link = '';
} elseif($item['tag'] == 'item' and $item['type'] == 'close') {
# item tag ends
$tagOpen = false;
# if all 4 tags contain data... add them to output array
if($temp_title != '' and $temp_description != '' and $temp_guid != '' and $temp_link != '') {
$arrQuotes[] = array(
'title' => $temp_title,
'description' => $temp_description,
'guid' => $temp_guid,
'link' => $temp_link
);
}
} else {
# save data into temporary variables
switch($item['tag']) {
case 'title':
$temp_title = $item['value'];
break;
case 'description':
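# cut the description at the first "<", because the feed appends a <p> to it;
# if there is no markup, keep the whole value
#$temp_description = $item['value'];
$pos = strpos($item['value'], '<');
$temp_description = ($pos === false) ? $item['value'] : substr($item['value'], 0, $pos);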
break;
case 'guid':
$temp_guid = $item['value'];
break;
case 'link':
$temp_link = $item['value'];
break;
default: break;
}
}
}
foreach($arrQuotes as $key => $item) {
print_r($item);
}
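To see what the loop above is walking over, here is a small sketch that dumps the $values array produced by xml_parse_into_struct() for an inline document. The sample feed is made up; a real feed will have more elements, but each entry has the same keys (tag, type, level and, where there is text, value).

```php
<?php
// Dump the flat list that xml_parse_into_struct() builds for a tiny document.
$data = '<rss><item><title>T</title><link>http://example.com/</link></item></rss>';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $data, $values, $tags);

foreach ($values as $item) {
    // "open" and "close" mark container elements; "complete" marks leaf elements.
    printf(
        "%-8s %-6s level %d %s\n",
        $item['type'],
        $item['tag'],
        $item['level'],
        isset($item['value']) ? $item['value'] : ''
    );
}
```

The $tags array maps each tag name to the positions in $values where it occurs, which can be handy when you only care about one element.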
15-Mar-2007 11:27
I took the code posted by forqoun and modified it to be somewhat more readable (by me), somewhat more friendly to the idea of parsing multiple files with the same object, and compatible with an HTTP POST of XML data. Anyone who's interested in my version of associative array output can check it out at http://www.sheer.us/code/php/xml-parse-to-associative-array.phpsrc
Be nice to me, this is my first published PHP code.
31-Dec-2006 08:27
Time to add my attempt at a very simple script that parses XML into a structure:
<?php
class Simple_Parser
{
var $parser;
var $error_code;
var $error_string;
var $current_line;
var $current_column;
var $data = array();
var $datas = array();
function parse($data)
{
$this->parser = xml_parser_create('UTF-8');
xml_set_object($this->parser, $this);
xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
xml_set_element_handler($this->parser, 'tag_open', 'tag_close');
xml_set_character_data_handler($this->parser, 'cdata');
if (!xml_parse($this->parser, $data))
{
$this->data = array();
$this->error_code = xml_get_error_code($this->parser);
$this->error_string = xml_error_string($this->error_code);
$this->current_line = xml_get_current_line_number($this->parser);
$this->current_column = xml_get_current_column_number($this->parser);
}
else
{
$this->data = $this->data['child'];
}
xml_parser_free($this->parser);
}
function tag_open($parser, $tag, $attribs)
{
$this->data['child'][$tag][] = array('data' => '', 'attribs' => $attribs, 'child' => array());
$this->datas[] =& $this->data;
$this->data =& $this->data['child'][$tag][count($this->data['child'][$tag])-1];
}
function cdata($parser, $cdata)
{
$this->data['data'] .= $cdata;
}
function tag_close($parser, $tag)
{
$this->data =& $this->datas[count($this->datas)-1];
array_pop($this->datas);
}
}
$xml_parser = new Simple_Parser;
$xml_parser->parse('<foo><bar>test</bar></foo>');
?>
25-Dec-2006 02:53
Hi !
After parsing the XML and modifying it, I just added a method to rebuild the XML from the internal structure (xmlp->document).
The method xmlp->toXML writes the result into the xmlp->XML attribute; then you just have to output it.
I hope it helps.
class XMLParser {
var $parser;
var $filePath;
var $document;
var $currTag;
var $tagStack;
var $XML;
var $_tag_to_close = false;
var $TAG_ATTRIBUT = 'attr';
var $TAG_DATA = 'data';
function XMLParser($path) {
$this->parser = xml_parser_create();
$this->filePath = $path;
$this->document = array();
$this->currTag =& $this->document;
$this->tagStack = array();
$this->XML = "";
}
function parse() {
xml_set_object($this->parser, $this);
xml_set_character_data_handler($this->parser, 'dataHandler');
xml_set_element_handler($this->parser, 'startHandler', 'endHandler');
if(!($fp = fopen($this->filePath, "r"))) {
die("Cannot open XML data file: $this->filePath");
return false;
}
while($data = fread($fp, 4096)) {
if(!xml_parse($this->parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->parser)),
xml_get_current_line_number($this->parser)));
}
}
fclose($fp);
xml_parser_free($this->parser);
return true;
}
function startHandler($parser, $name, $attribs) {
if(!isset($this->currTag[$name]))
$this->currTag[$name] = array();
$newTag = array();
if(!empty($attribs))
$newTag[$this->TAG_ATTRIBUT] = $attribs;
array_push($this->currTag[$name], $newTag);
$t =& $this->currTag[$name];
$this->currTag =& $t[count($t)-1];
array_push($this->tagStack, $name);
}
function dataHandler($parser, $data) {
$data = trim($data);
if(!empty($data)) {
if(isset($this->currTag[$this->TAG_DATA]))
$this->currTag[$this->TAG_DATA] .= $data;
else
$this->currTag[$this->TAG_DATA] = $data;
}
}
function endHandler($parser, $name) {
$this->currTag =& $this->document;
array_pop($this->tagStack);
for($i = 0; $i < count($this->tagStack); $i++) {
$t =& $this->currTag[$this->tagStack[$i]];
$this->currTag =& $t[count($t)-1];
}
}
function clearOutput () {
$this->XML = "";
}
function openTag ($tag) {
$this->XML.="<".strtolower ($tag);
$this->_tag_to_close = true;
}
function closeTag () {
if ($this->_tag_to_close) {
$this->XML.=">";
$this->_tag_to_close = false;
}
}
function closingTag ($tag) {
$this->XML.="</".strtolower ($tag).">";
}
function output_attributes ($contenu_fils) {
foreach ($contenu_fils[$this->TAG_ATTRIBUT] as $nomAttribut => $valeur) {
$this->XML.= " ".strtolower($nomAttribut)."=\"".$valeur."\"";
}
}
function addData ($texte) {
// to be completed
$ca = array ("é", "è", "ê", "à");
$par = array ("&eacute;", "&egrave;", "&ecirc;", "&agrave;");
return htmlspecialchars(str_replace ($ca, $par, $texte), ENT_NOQUOTES);
}
function toXML ($tags="") {
if ($tags=="") {
$tags = $this->document;
$this->clearOutput ();
}
foreach ($tags as $tag => $contenu) {
$this->process ($tag, $contenu);
}
}
function process ($tag, $contenu) {
// For every tag
foreach ($contenu as $indice => $contenu_fils) {
$this->openTag ($tag);
// For every child (neither attribute nor data)
foreach ($contenu_fils as $tagFils => $fils) {
switch ($tagFils) {
case $this->TAG_ATTRIBUT:
$this->output_attributes ($contenu_fils);
$this->closeTag ();
break;
case $this->TAG_DATA:
$this->closeTag ();
$this->XML.= $this->addData ($contenu_fils [$this->TAG_DATA]);
break;
default:
$this->closeTag ();
$this->process ($tagFils, $fils);
break;
}
}
$this->closingTag ($tag);
}
}
}
21-Dec-2006 12:02
I reworked some of the code I found posted previously here, mainly so I could access the structure of the parsed xml file by the tags' names. So if I was parsing html that's also valid xml, I could access the page title by $xmlp->document['HTML'][0]['HEAD'][0]['TITLE'][0]['data']. The index after the tag name corresponds to the occurrence of that tag. If there were two <head></head> in the same depth, then the second one could get accessed by ['HEAD'][1].
<?php
class XMLParser
{
var $parser;
var $filePath;
var $document;
var $currTag;
var $tagStack;
function XMLParser($path)
{
$this->parser = xml_parser_create();
$this->filePath = $path;
$this->document = array();
$this->currTag =& $this->document;
$this->tagStack = array();
}
function parse()
{
xml_set_object($this->parser, $this);
xml_set_character_data_handler($this->parser, 'dataHandler');
xml_set_element_handler($this->parser, 'startHandler', 'endHandler');
if(!($fp = fopen($this->filePath, "r")))
{
die("Cannot open XML data file: $this->filePath");
return false;
}
while($data = fread($fp, 4096))
{
if(!xml_parse($this->parser, $data, feof($fp)))
{
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->parser)),
xml_get_current_line_number($this->parser)));
}
}
fclose($fp);
xml_parser_free($this->parser);
return true;
}
function startHandler($parser, $name, $attribs)
{
if(!isset($this->currTag[$name]))
$this->currTag[$name] = array();
$newTag = array();
if(!empty($attribs))
$newTag['attr'] = $attribs;
array_push($this->currTag[$name], $newTag);
$t =& $this->currTag[$name];
$this->currTag =& $t[count($t)-1];
array_push($this->tagStack, $name);
}
function dataHandler($parser, $data)
{
$data = trim($data);
if(!empty($data))
{
if(isset($this->currTag['data']))
$this->currTag['data'] .= $data;
else
$this->currTag['data'] = $data;
}
}
function endHandler($parser, $name)
{
$this->currTag =& $this->document;
array_pop($this->tagStack);
for($i = 0; $i < count($this->tagStack); $i++)
{
$t =& $this->currTag[$this->tagStack[$i]];
$this->currTag =& $t[count($t)-1];
}
}
}
?>
19-Dec-2006 04:53
RE: forquan (29-Jan-2006 12:45)
Thanks for your code (it was what I needed), but it didn't work with my XML file. I think you only tested it on simple XML. Never mind.
I changed a few lines (the problem was in the endHandler function), and now it works :-)
<?php
$p =& new xmlParser();
$p->parse("/* XML file*/");
echo "<pre>";
print_r($p->output);
echo "</pre>";
class xmlParser{
var $xml_obj = null;
var $output = array();
var $attrs;
function xmlParser(){
$this->xml_obj = xml_parser_create();
xml_set_object($this->xml_obj,$this);
xml_set_character_data_handler($this->xml_obj, 'dataHandler');
xml_set_element_handler($this->xml_obj, "startHandler", "endHandler");
}
function parse($path){
if (!($fp = fopen($path, "r"))) {
die("Cannot open XML data file: $path");
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($this->xml_obj, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml_obj)),
xml_get_current_line_number($this->xml_obj)));
xml_parser_free($this->xml_obj);
}
}
return true;
}
function startHandler($parser, $name, $attribs){
$_content = array();
$_content['name'] = $name;
if(!empty($attribs))
$_content['attrs'] = $attribs;
array_push($this->output, $_content);
}
function dataHandler($parser, $data){
if(!empty($data) && $data!="\n") {
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['content'] .= $data;
}
}
function endHandler($parser, $name){
if(count($this->output) > 1) {
$_data = array_pop($this->output);
$_output_idx = count($this->output) - 1;
$add = array();
if(!$this->output[$_output_idx]['child'])
$this->output[$_output_idx]['child'] = array();
array_push($this->output[$_output_idx]['child'], $_data);
}
}
}
?>
16-Dec-2006 08:55
Re: hutch at midwales dot com
That function looks like major overkill.
To remove all white space between tags you could simply do:
$string = preg_replace('/>\s+</', '><', $string);
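A quick way to check what that pattern does and does not touch is the sketch below. The function name stripInterTagWhitespace is made up for this example.

```php
<?php
// Remove whitespace that sits between a closing ">" and the next "<".
// Whitespace inside text content is left alone.
function stripInterTagWhitespace($string)
{
    return preg_replace('/>\s+</', '><', $string);
}

echo stripInterTagWhitespace("<a>\n  <b>keep this</b>\n</a>");
```

Note that this is a plain text substitution, so it will also collapse whitespace-only text that happens to be significant (inside a CDATA section, for instance).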
02-Oct-2006 01:26
First off, I'd like thank all and sundry for providing this excellent resource, it has been very helpful in getting my head around xml parsing.
I was recently handed the task of collecting a variety of xml streams, from many different sources and of widely varying quality.
I have found that the following function helps parsing by cleaning up the input. It removes all leading and trailing whitespace and removes carriage returns and linefeeds.
Using this function before using xml_parser_create() has helped reduce a number of otherwise unexplainable anomalies, such as arbitrary cutoff of data or the data being divided into two, requiring concatenation. Data longer than 1024 characters still has to be concatenated, but I can live with that.
<?php
// remove whitespace and linefeeds and returns the name of a temporary file
// takes the name of an existing file as a parameter
function cleanxmlfile($file, $tmpdir="/tmp", $prefix="xxx_") {
$tmp = file_get_contents ($file);
$tmp = preg_replace("/^\s+/m","",$tmp);
$tmp = preg_replace("/\s+$/m","",$tmp);
$tmp = preg_replace("/\r/","",$tmp);
$tmp = preg_replace("/\n/","",$tmp);
$tmpfname = tempnam($tmpdir, $prefix);
$handle = fopen($tmpfname, "w");
fwrite($handle, "$tmp");
fclose($handle);
return($tmpfname);
}
?>
HTH
29-Jan-2006 08:45
Here's code that will create an associative array from an xml file. Keys are the tag data and subarrays are formed from attributes and child tags
<?php
$p =& new xmlParser();
$p->parse('/*xml file*/');
print_r($p->output);
?>
<?php
class xmlParser{
var $xml_obj = null;
var $output = array();
var $attrs;
function xmlParser(){
$this->xml_obj = xml_parser_create();
xml_set_object($this->xml_obj,$this);
xml_set_character_data_handler($this->xml_obj, 'dataHandler');
xml_set_element_handler($this->xml_obj, "startHandler", "endHandler");
}
function parse($path){
if (!($fp = fopen($path, "r"))) {
die("Cannot open XML data file: $path");
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($this->xml_obj, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml_obj)),
xml_get_current_line_number($this->xml_obj)));
xml_parser_free($this->xml_obj);
}
}
return true;
}
function startHandler($parser, $name, $attribs){
$_content = array();
if(!empty($attribs))
$_content['attrs'] = $attribs;
array_push($this->output, $_content);
}
function dataHandler($parser, $data){
if(!empty($data) && $data!="\n") {
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['content'] .= $data;
}
}
function endHandler($parser, $name){
if(count($this->output) > 1) {
$_data = array_pop($this->output);
$_output_idx = count($this->output) - 1;
$add = array();
if ($_data['attrs'])
$add['attrs'] = $_data['attrs'];
if ($_data['child'])
$add['child'] = $_data['child'];
$this->output[$_output_idx]['child'][$_data['content']] = $add;
}
}
}
?>
18-Nov-2005 01:56
If you need utf8_encode support and configure PHP with --disable-all you will have some trouble. Unfortunately the configure options aren't completely documented. If you need utf8 functions and have everything disabled just recompile PHP with --enable-xml and you should be good to go.
06-Apr-2005 06:31
to import xml into mysql
$file = "article_2_3032005467.xml";
$feed = array();
$key = "";
$info = "";
function startElement($xml_parser, $name, $attrs) {
global $feed;
}
function endElement($xml_parser, $name) {
global $feed, $info;
$key = $name;
$feed[$key] = $info;
$info = ""; }
function charData($xml_parser, $data ) {
global $info;
$info .= $data; }
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "charData" );
$fp = fopen($file, "r");
while ($data = fread($fp, 8192))
!xml_parse($xml_parser, $data, feof($fp));
xml_parser_free($xml_parser);
// Build the INSERT statement: one column per collected element.
// addslashes() is kept from the original; your database driver's own
// escaping function is the better choice if one is available.
$columns = array();
$values = array();
foreach ($feed as $assoc_index => $value) {
$columns[] = "`" . strtolower($assoc_index) . "`";
$values[] = "'" . utf8_decode(trim(addslashes($value))) . "'";
}
$sql = "INSERT INTO `article` (" . implode(", ", $columns) . ") VALUES (" . implode(", ", $values) . ")";
echo $sql;
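If your database layer supports placeholders, a variation is to build the statement with "?" markers and pass the values separately, for example to PDO::prepare() and PDOStatement::execute(). The sketch below only builds the SQL text and the parameter list; the function name buildInsert and the rule for cleaning column names are assumptions of this example, not part of the note above.

```php
<?php
// Build a parameterized INSERT from an element-name => text map.
// Column names cannot be bound as parameters, so they are lowercased
// and restricted to [a-z0-9_] here.
function buildInsert($table, array $feed)
{
    $columns = array();
    $params = array();
    foreach ($feed as $name => $value) {
        $column = preg_replace('/[^a-z0-9_]/', '', strtolower($name));
        if ($column === '') {
            continue; // nothing usable left in the element name
        }
        $columns[] = "`$column`";
        $params[] = trim($value);
    }
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    $sql = "INSERT INTO `$table` (" . implode(', ', $columns) . ") VALUES ($placeholders)";
    return array($sql, $params);
}

list($sql, $params) = buildInsert('article', array('TITLE' => " It's here ", 'BODY' => 'text'));
echo $sql, "\n";
print_r($params);
```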
20-Sep-2004 05:35
The documentation regarding white space was never complete I think.
The XML_OPTION_SKIP_WHITE doesn't appear to do anything. I want to preserve the newlines in a cdata section. Setting XML_OPTION_SKIP_WHITE to 0 or false doesn't appear to help. My character_data_handler is getting called once for each line. This obviously should be reflected in the documentation as well. When/how often does the handler get called exactly? Having to build separate test cases is very time consuming.
Inserting newlines myself in my cdata handler is no good either. For non actual CDATA sections that cause my handler to get called, long lines are split up in multiple calls. My handler would not be able to tell the difference whether or not the subsequent calls would be due to the fact that the data is coming from the next line or the fact that some internal buffer is long enough for it to 'flush' out and call the handler.
This behaviour also needs to be properly documented.
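One common way around the multiple-calls problem is to have the character data handler only append to a buffer and to use the text when the element closes. The sketch below does that for leaf elements; the function name collectText is made up, and it assumes you do not need text that is interleaved with child elements.

```php
<?php
// Buffer character data and hand it over once, when the element closes,
// so it no longer matters how many times the data handler is called.
function collectText($xml)
{
    $buffer = '';
    $texts = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        // A new element starts: discard whatever text came before it.
        function ($p, $name, $attrs) use (&$buffer) { $buffer = ''; },
        // The element ends: the buffer now holds its complete text.
        function ($p, $name) use (&$buffer, &$texts) {
            $texts[$name] = $buffer;
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) { $buffer .= $data; }
    );
    xml_parse($parser, $xml, true);
    return $texts;
}

$texts = collectText("<doc><note><![CDATA[line one\nline two]]></note></doc>");
echo $texts['note'];
```

Because the pieces are concatenated exactly as delivered, newlines inside the CDATA section survive without the handler having to guess where lines end.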
19-Mar-2004 03:36
I wrote a simple XML parser, mainly to deal with RSS version 2. I found lots of examples on the net, but they were all massive, bloated and hard to manipulate.
Output is sent to an array, which holds arrays containing the data for each item.
Obviously, you will have to modify the code to suit your needs, but there isn't a lot of code there, so that shouldn't be a problem.
<?php
$currentElements = array();
$newsArray = array();
readXml("./news.xml");
echo("<pre>");
print_r($newsArray);
echo("</pre>");
// Reads XML file into formatted html
function readXML($xmlFile)
{
$xmlParser = xml_parser_create();
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xmlParser, "startElement", "endElement");
xml_set_character_data_handler($xmlParser, "characterData");
$fp = fopen($xmlFile, "r");
while($data = fread($fp, filesize($xmlFile))){
xml_parse($xmlParser, $data, feof($fp));}
xml_parser_free($xmlParser);
}
// Sets the current XML element, and pushes itself onto the element hierarchy
function startElement($parser, $name, $attrs)
{
global $currentElements, $itemCount;
array_push($currentElements, $name);
if($name == "item"){$itemCount += 1;}
}
// Stores the character data of each child element of an <item>
function characterData($parser, $data)
{
global $currentElements, $newsArray, $itemCount;
$currentCount = count($currentElements);
if($currentCount < 2){ return; }
$parentElement = $currentElements[$currentCount-2];
$thisElement = $currentElements[$currentCount-1];
if($parentElement == "item"){
// Append rather than assign: the handler may be called several times per element
if(!isset($newsArray[$itemCount-1][$thisElement])){
$newsArray[$itemCount-1][$thisElement] = "";}
$newsArray[$itemCount-1][$thisElement] .= $data;}
// Channel-level elements (title, link, description, language) are ignored here
}
// When an XML element ends, it is popped off the hierarchy
function endElement($parser, $name)
{
global $currentElements;
$currentCount = count($currentElements);
if($currentElements[$currentCount-1] == $name){
array_pop($currentElements);}
}
?>
03-Feb-2004 07:27
I have created a set of classes that both parses XML into an object structure and generates XML code from that structure. It is mostly finished, but I thought I would post it here as it may help someone out, or serve as a base for someone's own parser. The method for creating the object is different from the one used in the posts before this one.
The object tree is built by creating a separate tag object for each tag inside the main document object and associating them by way of object references. An index table is created so that each tag is assigned an ID number (in numerical order from 0) and can be accessed directly using that ID number. Each tag has object references to its children. There are no uses of eval() in this code.
The code is too long to post here, so I have made an HTML page that has it: http://www.withouthonor.com/obj_xml.html
Sample code would look something like this:
<?
$xml = new xml_doc($my_xml_code);
$xml->parse();
$root_tag =& $xml->xml_index[0];
$children =& $root_tag->children;
// and so forth
// To create XML code using the object, would be similar to this:
$my_xml = new xml_doc();
$root_tag = $my_xml->CreateTag('ROOTTAG');
$my_xml->CreateTag('CHILDTAG',array(),'',$root_tag);
// The following is used for the CreateTag() method
// string Name (The name of the child tag)
// array Attributes (associative array of attributes for tag)
// string Content (textual data for the child tag)
// int ParentID (Index number for parent tag)
// To generate the XML, use the following method
$out_xml = $my_xml->generate();
?>
18-Dec-2003 07:38
Hey;
If you need to parse XML on an older version of PHP (e.g. 4.0), or if you can't get the expat extension enabled on your server, you might want to check out the Saxy and DOMIT! XML parsers from Engage Interactive. They're open source and pure PHP, so no extensions or changes to your server are required. I've been using them for over a month on some projects with no problems whatsoever!
Check em out at:
DOMIT!, a DOM based xml parser, uses Saxy (included)
http://www.engageinteractive.com/redir.php?resource=1&target=domit
or
Saxy, a sax based xml parser
http://www.engageinteractive.com/redir.php?resource=2&target=saxy
Brad
08-Nov-2003 07:48
Regarding jon at gettys dot org's XML object: the data should be trim()ed to remove any whitespace that could appear in character data entered as:
<xml_tag>
cdata here. cdata here. cdata here. cdata here.
</xml_tag>
So, after applying fred at barron dot com's suggested change to the characterData function, the function should appear as:
function characterData($parser, $data)
{
global $obj;
$data = addslashes($data);
eval($obj->tree."->data.='".trim($data)."';");
}
SIDE NOTE: I'm fairly new to XML, so perhaps it is considered bad form to enter character data as I did in my example. Is this true, or is the extra whitespace for the sake of readability acceptable?
03-Jul-2003 12:29
A fix for the fread breaking thing:
$cache = "";
while ($data = fread($fp, 4096)) {
    $data = $cache . $data;
    $cache = "";
    if (!feof($fp)) {
        // Hold back everything from the last complete tag onwards, so that
        // no chunk handed to the parser ends in the middle of an element.
        $split = false;
        if (preg_match_all("(</?[a-z0-9A-Z]+>)", $data, $regs)) {
            $lastTagname = $regs[0][count($regs[0])-1];
            for ($i=strlen($data)-strlen($lastTagname); $i>=strlen($lastTagname); $i--) {
                if ($lastTagname == substr($data, $i, strlen($lastTagname))) {
                    $cache = substr($data, $i);
                    $data = substr($data, 0, $i);
                    $split = true;
                    break;
                }
            }
        }
        if (!$split) {
            // No split point found: keep the whole chunk for the next round
            // instead of parsing it now and again later
            $cache = $data;
            $data = "";
        }
    }
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
    }
}
21-May-2003 07:12
The above example doesn't work when you're parsing a string being returned from a curl operation (why I don't know!) I kept getting undefined offsets at the highest element number in both the start and end element functions. It wasn't the string itself I know, because I substringed it to death with the same results. But I fixed the problem by adding these lines of code...
function defaultHandler($parser, $name) {
global $depth;
@ $depth[$parser]--;
}
xml_set_default_handler($xml_parser, "defaultHandler");
Hope this helps 8-}
23-Apr-2003 09:28
regarding jon at gettys dot org's nice XML to Object code, I've made some useful changes (IMHO) to the characterData function... my minor modifications allow multiple lines of data and it escapes quotes so errors don't occur in the eval...
function characterData($parser, $data)
{
global $obj;
$data = addslashes($data);
eval($obj->tree."->data.='".$data."';");
}
18-Feb-2003 02:10
2. Pre Parser Strings and New Line Delimited Data
Note that xml_parse() takes its data as a string, and a string can be manipulated freely before it is handed to the parser.
A better approach to removing newlines than the str_replace() in my previous post:
while ($data = fread($fp, 4096)) {
$data = preg_replace("/\n|\r/","",$data); //flarp
if (!xml_parse($xml_parser, $data, feof($fp))) {...
The above works for all three line-ending conventions (\n, \r, \r\n). But it could, and most likely will, damage data contained in, for example, CDATA sections. As far as I am concerned, end-of-line characters should not be used _within_ XML tags. What seems to be the ultimate solution is to pre-parse the loaded data. This would require tracking the position within the XML document and adding or removing data (using a temporary variable between the fread calls) based on conditions such as "is within a tag" or "is within CDATA" before feeding it to the parser. This of course opens up a new can of worms (as in: parsing data for the parser...). The procedure would take place between the fread and xml_parse calls, so it is compatible with the general usage examples at the top of the page.
3. The Answer to parsing arbitrary XML and Preprocessor Revisited
You can't just feed any XML document to the parser you constructed and assume that it will work! You have to know how the data is stored: for example, is the data in the file delimited by line endings? Are there carriage returns inside the tags? XML files come formatted in different ways. Some are one long string of characters without any end-of-line markers; others have newlines, carriage returns, or both (Windows). They may or may not contain spaces and other whitespace between tags. For this reason it is important to do what I call normalizing the data before feeding it to the parser. You can do this with regular expressions or with plain old str_replace and concatenation. In many cases this can be done to the file itself, and sometimes to the string data on the fly (as shown in the example above). But I feel it is important to normalize the data before even calling the function that calls xml_parse. If you can access all the data before that call, you can convert it to what you feel it should have been in the first place, and avoid many surprises and expensive regular expression substitutions (in a tight spot) while fread'ing the data.
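As a minimal sketch of that kind of normalization, the following converts all three line-ending conventions to \n on the complete string before parsing. normalizeLineEndings() is a hypothetical helper name, and the sample document is made up for illustration; the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: converts \r\n and lone \r to \n and leaves everything else alone.
// str_replace() handles the search strings in order, so \r\n is replaced first.
function normalizeLineEndings(string $xml): string
{
    return str_replace(["\r\n", "\r"], "\n", $xml);
}

$raw   = "<doc>\r\n<a>one</a>\r<b>two</b>\n</doc>";
$clean = normalizeLineEndings($raw);

var_dump(strpos($clean, "\r"));   // bool(false): no carriage returns are left

$parser = xml_parser_create();
var_dump(xml_parse($parser, $clean, true) === 1);
```

Working on the whole string rather than on each 4096-byte read avoids the case where a \r\n pair straddles two reads. Unlike stripping newlines outright, it also leaves the line structure of CDATA sections in place.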
18-Feb-2003 02:09
My previous XML post (software at serv-a-com dot com/22-Jan-2003 03:08) resulted in some visitors e-mailing me with questions about the carriage return stripping issue. I'll try to make the following mumble as brief and easy to understand as possible.
1. Overview of the 4096 fragmentation issue
As you know, the following freads the file 4096 bytes (4 KB) at a time. This is perhaps OK for testing expat and figuring out how things work, but it is rather dangerous in a production environment. The data may not be fully understandable because of fread fragmentation, and may be improperly formatted because of the many source formats of the data contained within (e.g. end-of-line-delimited CDATA).
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
Sometimes, to save time, one may want to load the whole file into one big variable and leave all the worries to expat. I think anything under 500 KB is OK (as long as nobody knows about it). Some may argue that larger variables are acceptable, or even necessary, because of the magic that takes place while parsing with xml_parse. Our XML parser (expat) can be used successfully only when we know what kind of XML data we are dealing with: its average size, its general layout, and the data contained within the tags. For example, if the tags are followed by a line delimiter such as a newline, we can read the file with fgets and, with minimal effort, make sure that no data is sent to the function unless it ends with an end tag. But this requires a fair knowledge of how the file stores its XML data and tags (and a bit of code between reading the data and xml_parse'ing it).
23-Jan-2003 07:08
use:
while ($data = str_replace("\n","",fread($fp, 4096))){
instead of:
while ($data = fread($fp, 4096)) {
It will save you a headache.
and in response to (simen at bleed dot no 11-Jan-2003 04:27) "If the 4096 byte buffer fills up..."
Please take better care of your data. Don't just shove it into xml_parse(); check and make sure that the tags are not sliced in the middle, using a temporary variable between fread and xml_parse.
12-Jan-2003 08:27
I was experiencing really weird behaviour loading a large XML document (91k), because reading the file through a 4096-byte buffer doesn't take the following into consideration:
<node>this is my value</node>
If the 4096 byte buffer fills up at "my", you will get a split string into your xml_set_character_data_handler().
The only solution I've found so far is to read the whole document into a variable and then parse.
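An alternative that keeps the chunked reads is to append in the character data handler and only use the text once the end tag arrives. The sketch below assumes PHP 7 or later; parseInChunks() is a hypothetical helper, and the 8-byte chunk size is deliberately tiny so that the split lands inside the text and inside the closing tag.

```php
<?php
// Hypothetical helper: parses $xml in pieces of $chunkSize bytes and returns
// the complete text of every element that directly contains character data.
function parseInChunks(string $xml, int $chunkSize): array
{
    $texts  = [];
    $buffer = '';

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$buffer) {
            $buffer = '';               // a new element starts: reset the buffer
        },
        function ($p, $name) use (&$buffer, &$texts) {
            if ($buffer !== '') {       // element closed: its text is now complete
                $texts[] = $buffer;
                $buffer  = '';
            }
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$buffer) {
        $buffer .= $data;               // append, never overwrite
    });

    $pieces = str_split($xml, $chunkSize);
    $last   = count($pieces) - 1;
    foreach ($pieces as $i => $piece) {
        // The third argument tells the parser when the final piece has arrived.
        if (!xml_parse($parser, $piece, $i === $last)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $texts;
}

print_r(parseInChunks("<node>this is my value</node>", 8));
```

The parser copes with markup that is split across calls to xml_parse(); what changes is only how many times the character data handler is called, which is why appending is enough here.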
04-Nov-2002 05:29
Building on... This allows you to return the value of an element using an XPath reference. This code would of course need error handling added :-)
function GetElementByName ($xml, $start, $end) {
$startpos = strpos($xml, $start);
if ($startpos === false) {
return false;
}
$endpos = strpos($xml, $end);
$endpos = $endpos+strlen($end);
$endpos = $endpos-$startpos;
$endpos = $endpos - strlen($end);
$tag = substr ($xml, $startpos, $endpos);
$tag = substr ($tag, strlen($start));
return $tag;
}
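function XPathValue($XPath,$XML) {
$XPathArray = explode("/",$XPath);
$node = $XML;
// each() was removed in PHP 8, so iterate with foreach
foreach ($XPathArray as $value) {
$node = GetElementByName($node, "<$value>", "</$value>");
if ($node === false) {
return false; // path segment not found
}
}
return $node;
}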
print XPathValue("Response/Shipment/TotalCharges/Value",$xml);
28-Sep-2002 04:01
For a simple XML parser you can use this function. It doesn't require any extensions to run.
<?
// Extracts content from XML tag
function GetElementByName ($xml, $start, $end) {
global $pos;
$startpos = strpos($xml, $start);
if ($startpos === false) {
return false;
}
$endpos = strpos($xml, $end);
$endpos = $endpos+strlen($end);
$pos = $endpos;
$endpos = $endpos-$startpos;
$endpos = $endpos - strlen($end);
$tag = substr ($xml, $startpos, $endpos);
$tag = substr ($tag, strlen($start));
return $tag;
}
// Open and read xml file. You can replace this with your xml data.
$file = "data.xml";
$pos = 0;
$Nodes = array();
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}
$data = "";
while ($getline = fread($fp, 4096)) {
$data = $data . $getline;
}
$count = 0;
$pos = 0;
// Goes through the XML file and creates an array of all <XML_TAG> tags.
while ($node = GetElementByName($data, "<XML_TAG>", "</XML_TAG>")) {
$Nodes[$count] = $node;
$count++;
$data = substr($data, $pos);
}
// Gets information from tag siblings.
for ($i=0; $i<$count; $i++) {
$code = GetElementByName($Nodes[$i], "<Code>", "</Code>");
$desc = GetElementByName($Nodes[$i], "<Description>", "</Description>");
$price = GetElementByName($Nodes[$i], "<BasePrice>", "</BasePrice>");
}
?>
Hope this helps! :)
Guy Laor
19-Sep-2002 04:27
Some reference code I am working on as an "XML Library", which I am folding into an object. Notice the use of define():
It is mainly Example 1 and parts of Examples 2 and 3, rewritten as an object:
--- MyXMLWalk.lib.php ---
<?php
if (!defined("PHPXMLWalk")) {
    define("PHPXMLWalk", TRUE);

    class XMLWalk {
        var $p;         // the XML parser
        var $depth = 0; // current indentation, in spaces (one parser per object)

        // Returns print_r() output with continuation lines indented by $i spaces
        function prl($x, $i = 0) {
            ob_start();
            print_r($x);
            $buf = ob_get_contents();
            ob_end_clean();
            // split() was removed in PHP 7; explode() does the same job here
            return join("\n" . str_repeat(" ", $i), explode("\n", $buf));
        }

        // PHP 4 used a method named after the class; PHP 5 and later use __construct()
        function __construct() {
            $this->p = xml_parser_create();
            xml_parser_set_option($this->p, XML_OPTION_CASE_FOLDING, true);
            xml_set_element_handler($this->p, array($this, "startElement"), array($this, "endElement"));
            xml_set_character_data_handler($this->p, array($this, "dataElement"));
            register_shutdown_function(array($this, "free")); // make a destructor
        }

        function startElement($parser, $name, $attrs) {
            if (count($attrs) >= 1) {
                $x = $this->prl($attrs, $this->depth + 6);
            } else {
                $x = "";
            }
            print str_repeat(" ", $this->depth) . "$name $x\n";
            $this->depth += 2;
        }

        function dataElement($parser, $data) {
            print str_repeat(" ", $this->depth) . htmlspecialchars($data, ENT_QUOTES) . "\n";
        }

        function endElement($parser, $name) {
            $this->depth -= 2;
        }

        function parse($data, $fp) {
            if (!xml_parse($this->p, $data, feof($fp))) {
                die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($this->p)),
                    xml_get_current_line_number($this->p)));
            }
        }

        function free() {
            xml_parser_free($this->p);
        }
    } // end of class
} // end of define
?>
--- end of file ---
Calling code:
<?php
...
require("MyXMLWalk.lib.php");
$file = "x.xml";
$xme = new XMLWalk;
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
$xme->parse($data, $fp);
}
...
?>
15-Aug-2002 05:59
[Editor's note: see also xml_parse_into_struct().]
Very simple routine to convert an XML file into a PHP structure. $obj->xml contains the resulting PHP structure. I would be interested if someone could suggest a cleaner method than the evals I am using.
<?
$filename = 'sample.xml';
$obj = new stdClass; // create the object explicitly before assigning properties
$obj->tree = '$obj->xml';
$obj->xml = '';
function startElement($parser, $name, $attrs) {
global $obj;
// If var already defined, make array
eval('$test=isset('.$obj->tree.'->'.$name.');');
if ($test) {
eval('$tmp='.$obj->tree.'->'.$name.';');
eval('$arr=is_array('.$obj->tree.'->'.$name.');');
if (!$arr) {
eval('unset('.$obj->tree.'->'.$name.');');
eval($obj->tree.'->'.$name.'[0]=$tmp;');
$cnt = 1;
}
else {
eval('$cnt=count('.$obj->tree.'->'.$name.');');
}
$obj->tree .= '->'.$name."[$cnt]";
}
else {
$obj->tree .= '->'.$name;
}
if (count($attrs)) {
eval($obj->tree.'->attr=$attrs;');
}
}
function endElement($parser, $name) {
global $obj;
// Strip off last ->
for($a=strlen($obj->tree);$a>0;$a--) {
if (substr($obj->tree, $a, 2) == '->') {
$obj->tree = substr($obj->tree, 0, $a);
break;
}
}
}
function characterData($parser, $data) {
global $obj;
eval($obj->tree.'->data=\''.$data.'\';');
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($filename, "r"))) {
die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
print_r($obj->xml);
return 0;
?>
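One possible eval-free approach is to keep a stack of open elements and attach each one to its parent when its end tag arrives. This is only a sketch, assuming PHP 7 or later: xmlToTree() and the node layout (name, attr, data, children) are made up for illustration and produce a nested array rather than the object structure built above.

```php
<?php
// Hypothetical eval-free variant: builds a tree of nested arrays using a stack.
function xmlToTree(string $xml): array
{
    // The bottom of the stack is a placeholder that will hold the root element.
    $stack = [['name' => '#document', 'attr' => [], 'data' => '', 'children' => []]];

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            // Open a new node; it stays on the stack until its end tag arrives.
            $stack[] = ['name' => $name, 'attr' => $attrs, 'data' => '', 'children' => []];
        },
        function ($p, $name) use (&$stack) {
            // Close the node and attach it to its parent.
            $node = array_pop($stack);
            $stack[count($stack) - 1]['children'][] = $node;
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$stack) {
        // Append, because the handler may be called more than once per element.
        $stack[count($stack) - 1]['data'] .= $data;
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $stack[0]['children'][0];
}

$tree = xmlToTree('<list><item id="1">first</item><item id="2">second</item></list>');
echo $tree['name'], "\n";                       // LIST (case folding is on by default)
echo $tree['children'][1]['attr']['ID'], "\n";  // 2
echo $tree['children'][1]['data'], "\n";        // second
```

Because element names are never pasted into code, names containing hyphens or colons and text containing quotes need no escaping. Repeated elements simply become further entries in the parent's children list.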
16-Apr-2002 06:23
I put up a good, simple, real world example of how to parse XML documents. While the sample grabs stock quotes off of the web, you can tweak it to do whatever you need.
http://www.analysisandsolutions.com/code/phpxml.htm
23-Mar-2002 06:16
In reference to the note made by sam@cwa.co.nz about parsing entities:
I could be wrong, but since it is possible to define your own entities within an XML DTD, the cdata handler function parses these individually to allow for your own implementation of those entities within your cdata handler.
27-Feb-2002 09:11
Newbies wanting a good tutorial on how to actually get started, and where to go from this listing of functions, should visit:
http://www.wirelessdevnet.com/channels/wap/features/xmlcast_php.html
It shows an excellent example of how to read the XML data into a class file so you can actually process it, not just display it all pretty-like, like many tutorials on PHP/XML seem to be doing.
25-Jan-2002 01:43
I had to trim() the data when I passed one large string containing a well-formed XML file to xml_parse. The string was read by cURL, which apparently put a blank at the end of the string. This blank produced an "XML not well-formed" error in xml_parse!
28-Sep-2000 11:39
I've discovered some unusual behaviour in this API when ampersand entities are parsed in cdata; for some reason the parser breaks up the section around the entities and calls the handler repeatedly, once for each of the pieces. If you don't allow for this oddity and you are trying to put the cdata into a variable, only the last part will be stored.
You can get around this with a line like:
$foo .= $cdata;
If the handler is called several times from the same tag, it will append them, rather than rewriting the variable each time. If the entire cdata section is returned, it doesn't matter.
May happen for other entities, but I haven't investigated.
Took me a while to figure out what was happening; hope this saves someone else the trouble.
08-Jul-1999 02:21
When using the XML parser, make sure you're not using the magic quotes option (e.g. use set_magic_quotes_runtime(0) if it's not the compiled default), otherwise you'll get 'not well-formed' errors when dealing with tags with attributes set in them.