Last updated: Thu, 31 May 2007

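CXXI. PDF Functions

Introduction

The PDF functions in PHP can create PDF files using the PDFlib library, which was originally created by Thomas Merz and is now maintained by PDFlib GmbH.

The documentation in this section is only meant to be an overview of the functions available in the PDFlib library and should not be considered an exhaustive reference. For the full and detailed explanation of each function, consult the PDFlib Reference Manual, which is included in all PDFlib packages distributed by PDFlib GmbH. It provides a very good overview of what PDFlib is capable of doing and contains the most up-to-date documentation of all functions.

For a jump start, take a look at the programming samples contained in the PDFlib distribution packages. These samples demonstrate basic text, vector, and graphics output as well as higher-level functions such as the PDF import facility (PDI).

Most of the functions in PDFlib and the PHP module share the same names and parameters. Unless configured otherwise, all lengths and coordinates are measured in PostScript points. There are generally 72 PostScript points to an inch, but this depends on the output resolution. Consult the PDFlib Reference Manual included in the PDFlib distribution for a more thorough explanation of the coordinate system used.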
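
As a rough illustration of the point-based units, the sketch below converts millimetres and inches to PostScript points, assuming the usual 72 points per inch. The helpers `inch_to_pt()` and `mm_to_pt()` are hypothetical and not part of PDFlib; the rounded A4 result matches the 595 x 842 page size used in the Hello World examples.

```php
<?php
// Hypothetical helpers (not part of PDFlib): convert physical lengths to
// PostScript points, assuming the usual 72 points per inch.
const POINTS_PER_INCH = 72.0;
const MM_PER_INCH = 25.4;

function inch_to_pt($inches)
{
    return $inches * POINTS_PER_INCH;
}

function mm_to_pt($mm)
{
    return $mm / MM_PER_INCH * POINTS_PER_INCH;
}

// A4 paper is 210 mm x 297 mm.
$width  = (int) round(mm_to_pt(210));
$height = (int) round(mm_to_pt(297));
printf("A4 in points: %d x %d\n", $width, $height);
```

Rounding to whole points is a choice made for this sketch; fractional values are equally valid lengths.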

With version 6, PDFlib offers an object-oriented API for PHP 5 in addition to the function-oriented API for PHP 4. The main differences are as follows.

In PHP 4, a PDF resource first has to be retrieved with a function call like

$p = PDF_new();

This PDF resource is then used as the first parameter in all further function calls, such as

PDF_begin_document($p, "", "");

In PHP 5, however, a PDFlib object is created with

$p = new PDFlib();

This object offers all PDFlib API functions as methods, for example

$p->begin_document("", "");

In addition, exceptions, a new feature of PHP 5, are supported by PDFlib 6 and later.
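
A script that has to run under either API could first check which binding is present. The sketch below is only an illustration: `pdflib_api_style()` is a made-up helper, and it relies solely on PHP's built-in `class_exists()` and `function_exists()`, so it also runs when PDFlib is not installed.

```php
<?php
// Report which PDFlib binding, if any, this PHP build exposes.
// pdflib_api_style() is a hypothetical helper for this sketch.
function pdflib_api_style()
{
    if (class_exists('PDFlib')) {
        return 'object-oriented';   // $p = new PDFlib();
    }
    if (function_exists('PDF_new')) {
        return 'function-oriented'; // $p = PDF_new();
    }
    return 'none';
}

echo "PDFlib API available: " . pdflib_api_style() . "\n";
```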

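See the examples below for more information.

Note: If you are interested in alternative free PDF generators that do not use external PDF libraries, see the related FAQ.

Requirements

PDFlib Lite is available as open source, but free use is subject to certain conditions. PDFlib Lite supports a subset of PDFlib's functionality; see the PDFlib web site for details. The full version of PDFlib is available for download at http://www.pdflib.com/products/pdflib-family/, but a license must be purchased for commercial use.

Issues with older versions of PDFlib

Versions of PHP 4 released after March 9, 2000 do not support versions of PDFlib older than 3.0.

PDFlib 4.0 and later are supported by PHP 4.3.0 and later.

Installation

This PECL extension is not bundled with PHP. Information on installing it can be found in the manual chapter titled Installation of PECL extensions. Additional information, such as new releases, downloads, source files, maintainer information, and a CHANGELOG, can be found at http://pecl.php.net/package/pdflib.

To get these functions to work in PHP < 4.3.9, PHP must be compiled with --with-pdflib[=DIR]. DIR is the PDFlib base install directory and defaults to /usr/local.

Resource Types

PDF_new() creates a new PDFlib object, which is required by most PDF functions.

Notes on deprecated PDFlib functions

Starting with PHP 4.0.5, the PHP extension for PDFlib is officially supported by PDFlib GmbH. This means that all the functions described in the PDFlib Reference Manual are supported by PHP 4 with exactly the same meaning and the same parameters. However, with PDFlib version 5.0.4 or later, all parameters have to be specified. For compatibility reasons, this binding for PDFlib still supports the old functions, but they should be replaced by their new versions. PDFlib GmbH will not support any problems arising from the use of these old functions. This documentation marks those functions as "deprecated" and describes the functions to use instead.

Examples

Most of the functions are fairly easy to use. The most difficult part is probably creating the first PDF document. The following example should help to get started. It is developed for PHP 4 and creates a one-page PDF document (sent to the browser as hello.pdf). It defines some document info field contents, loads the Helvetica-Bold font and outputs the text "Hello world! (says PHP)".

Example 1688. Hello World example from the PDFlib distribution for PHP 4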


<?php
$p = PDF_new();

/*  open new PDF file; insert a file name to create the PDF on disk */
if (PDF_begin_document($p, "", "") == 0) {
    die("Error: " . PDF_get_errmsg($p));
}

PDF_set_info($p, "Creator", "hello.php");
PDF_set_info($p, "Author", "Rainer Schaaf");
PDF_set_info($p, "Title", "Hello world (PHP)!");

PDF_begin_page_ext($p, 595, 842, "");

$font = PDF_load_font($p, "Helvetica-Bold", "winansi", "");

PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, "Hello world!");
PDF_continue_text($p, "(says PHP)");
PDF_end_page_ext($p, "");

PDF_end_document($p, "");

$buf = PDF_get_buffer($p);
$len = strlen($buf);

header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=hello.pdf");
print $buf;

PDF_delete($p);
?>

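The following example comes with the PDFlib distribution for PHP 5. It uses the exception handling and object encapsulation features that are new in PHP 5. It creates a one-page file named hello.pdf, defines some document info field contents, loads the Helvetica-Bold font and outputs the text "Hello world! (says PHP)".

Example 1689. Hello World example from the PDFlib distribution for PHP 5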


<?php

try {
    $p = new PDFlib();

    /*  open new PDF file; insert a file name to create the PDF on disk */
    if ($p->begin_document("", "") == 0) {
        die("Error: " . $p->get_errmsg());
    }

    $p->set_info("Creator", "hello.php");
    $p->set_info("Author", "Rainer Schaaf");
    $p->set_info("Title", "Hello world (PHP)!");

    $p->begin_page_ext(595, 842, "");

    $font = $p->load_font("Helvetica-Bold", "winansi", "");

    $p->setfont($font, 24.0);
    $p->set_text_pos(50, 700);
    $p->show("Hello world!");
    $p->continue_text("(says PHP)");
    $p->end_page_ext("");

    $p->end_document("");

    $buf = $p->get_buffer();
    $len = strlen($buf);

    header("Content-type: application/pdf");
    header("Content-Length: $len");
    header("Content-Disposition: inline; filename=hello.pdf");
    print $buf;
}
catch (PDFlibException $e) {
    die("PDFlib exception occurred in hello sample:\n" .
       "[" . $e->get_errnum() . "] " . $e->get_apiname() . ": " .
       $e->get_errmsg() . "\n");
}
catch (Exception $e) {
    die($e);
}
$p = 0;
?>

PDF_activate_item — Activate structure element or other content item
PDF_add_annotation — Add annotation [deprecated]
PDF_add_bookmark — Add bookmark for current page [deprecated]
PDF_add_launchlink — Add launch annotation for current page [deprecated]
PDF_add_locallink — Add link annotation for current page [deprecated]
PDF_add_nameddest — Create named destination
PDF_add_note — Add note annotation for current page [deprecated]
PDF_add_outline — Add bookmark for current page [deprecated]
PDF_add_pdflink — Add link annotation for current page [deprecated]
PDF_add_table_cell — Add a cell to a new or existing table
PDF_add_textflow — Create Textflow or add text to existing Textflow
PDF_add_thumbnail — Add thumbnail for current page
PDF_add_weblink — Add weblink for current page [deprecated]
PDF_arc — Draw a counterclockwise circular arc segment
PDF_arcn — Draw a clockwise circular arc segment
PDF_attach_file — Add file attachment for current page [deprecated]
PDF_begin_document — Create new PDF file
PDF_begin_font — Start a Type 3 font definition
PDF_begin_glyph — Start glyph definition for Type 3 font
PDF_begin_item — Open structure element or other content item
PDF_begin_layer — Start layer
PDF_begin_page_ext — Start new page
PDF_begin_page — Start new page [deprecated]
PDF_begin_pattern — Start pattern definition
PDF_begin_template_ext — Start template definition
PDF_begin_template — Start template definition [deprecated]
PDF_circle — Draw a circle
PDF_clip — Clip to current path
PDF_close_image — Close image
PDF_close_pdi_page — Close the page handle
PDF_close_pdi — Close the input PDF document [deprecated]
PDF_close — Close PDF document [deprecated]
PDF_closepath_fill_stroke — Close, fill and stroke current path
PDF_closepath_stroke — Close and stroke path
PDF_closepath — Close current path
PDF_concat — Concatenate a matrix to the CTM
PDF_continue_text — Output text in next line
PDF_create_3dview — Create 3D view
PDF_create_action — Create action for objects or events
PDF_create_annotation — Create rectangular annotation
PDF_create_bookmark — Create bookmark
PDF_create_field — Create form field
PDF_create_fieldgroup — Create form field group
PDF_create_gstate — Create graphics state object
PDF_create_pvf — Create PDFlib virtual file
PDF_create_textflow — Create textflow object
PDF_curveto — Draw Bezier curve
PDF_define_layer — Create layer definition
PDF_delete_pvf — Delete PDFlib virtual file
PDF_delete_table — Delete table object
PDF_delete_textflow — Delete textflow object
PDF_delete — Delete PDFlib object
PDF_encoding_set_char — Add glyph name and/or Unicode value
PDF_end_document — Close PDF file
PDF_end_font — Terminate Type 3 font definition
PDF_end_glyph — Terminate glyph definition for Type 3 font
PDF_end_item — Close structure element or other content item
PDF_end_layer — Deactivate all active layers
PDF_end_page_ext — Finish page
PDF_end_page — Finish page
PDF_end_pattern — Finish pattern
PDF_end_template — Finish template
PDF_endpath — End current path
PDF_fill_imageblock — Fill image block with variable data
PDF_fill_pdfblock — Fill PDF block with variable data
PDF_fill_stroke — Fill and stroke path
PDF_fill_textblock — Fill text block with variable data
PDF_fill — Fill current path
PDF_findfont — Prepare font for later use [deprecated]
PDF_fit_image — Place image or template
PDF_fit_pdi_page — Place imported PDF page
PDF_fit_table — Place table on page
PDF_fit_textflow — Format textflow in rectangular area
PDF_fit_textline — Place single line of text
PDF_get_apiname — Get name of unsuccessful API function
PDF_get_buffer — Get PDF output buffer
PDF_get_errmsg — Get error text
PDF_get_errnum — Get error number
PDF_get_font — Get font [deprecated]
PDF_get_fontname — Get font name [deprecated]
PDF_get_fontsize — Font handling [deprecated]
PDF_get_image_height — Get image height [deprecated]
PDF_get_image_width — Get image width [deprecated]
PDF_get_majorversion — Get major version number [deprecated]
PDF_get_minorversion — Get minor version number [deprecated]
PDF_get_parameter — Get string parameter
PDF_get_pdi_parameter — Get PDI string parameter [deprecated]
PDF_get_pdi_value — Get PDI numerical parameter [deprecated]
PDF_get_value — Get numerical parameter
PDF_info_font — Query detailed information about a loaded font
PDF_info_matchbox — Query matchbox information
PDF_info_table — Retrieve table information
PDF_info_textflow — Query textflow state
PDF_info_textline — Perform textline formatting and query metrics
PDF_initgraphics — Reset graphics state
PDF_lineto — Draw a line
PDF_load_3ddata — Load 3D model
PDF_load_font — Search and prepare font
PDF_load_iccprofile — Search and prepare ICC profile
PDF_load_image — Open image file
PDF_makespotcolor — Make spot color
PDF_moveto — Set current point
PDF_new — Create PDFlib object
PDF_open_ccitt — Open raw CCITT image [deprecated]
PDF_open_file — Create PDF file [deprecated]
PDF_open_gif — Open GIF image [deprecated]
PDF_open_image_file — Read image from file [deprecated]
PDF_open_image — Use image data [deprecated]
PDF_open_jpeg — Open JPEG image [deprecated]
PDF_open_memory_image — Open image created with PHP's image functions [not supported]
PDF_open_pdi_page — Prepare a page
PDF_open_pdi — Open PDF file [deprecated]
PDF_open_tiff — Open TIFF image [deprecated]
PDF_pcos_get_number — Get value of pCOS path with type number or boolean
PDF_pcos_get_stream — Get contents of pCOS path with type stream, fstream, or string
PDF_pcos_get_string — Get value of pCOS path with type name, string, or boolean
PDF_place_image — Place image on the page [deprecated]
PDF_place_pdi_page — Place PDF page [deprecated]
PDF_process_pdi — Process imported PDF document
PDF_rect — Draw rectangle
PDF_restore — Restore graphics state
PDF_resume_page — Resume page
PDF_rotate — Rotate coordinate system
PDF_save — Save graphics state
PDF_scale — Scale coordinate system
PDF_set_border_color — Set border color of annotations [deprecated]
PDF_set_border_dash — Set border dash style of annotations [deprecated]
PDF_set_border_style — Set border style of annotations [deprecated]
PDF_set_char_spacing — Set character spacing [deprecated]
PDF_set_duration — Set duration between pages [deprecated]
PDF_set_gstate — Activate graphics state object
PDF_set_horiz_scaling — Set horizontal text scaling [deprecated]
PDF_set_info_author — Fill the author document info field [deprecated]
PDF_set_info_creator — Fill the creator document info field [deprecated]
PDF_set_info_keywords — Fill the keywords document info field [deprecated]
PDF_set_info_subject — Fill the subject document info field [deprecated]
PDF_set_info_title — Fill the title document info field [deprecated]
PDF_set_info — Fill document info field
PDF_set_layer_dependency — Define relationships among layers
PDF_set_leading — Set distance between text lines [deprecated]
PDF_set_parameter — Set string parameter
PDF_set_text_matrix — Set text matrix [deprecated]
PDF_set_text_pos — Set text position
PDF_set_text_rendering — Determine text rendering [deprecated]
PDF_set_text_rise — Set text rise [deprecated]
PDF_set_value — Set numerical parameter
PDF_set_word_spacing — Set spacing between words [deprecated]
PDF_setcolor — Set fill and stroke color
PDF_setdash — Set simple dash pattern
PDF_setdashpattern — Set dash pattern
PDF_setflat — Set flatness
PDF_setfont — Set font
PDF_setgray_fill — Set fill color to gray [deprecated]
PDF_setgray_stroke — Set stroke color to gray [deprecated]
PDF_setgray — Set color to gray [deprecated]
PDF_setlinecap — Set linecap parameter
PDF_setlinejoin — Set linejoin parameter
PDF_setlinewidth — Set line width
PDF_setmatrix — Set current transformation matrix
PDF_setmiterlimit — Set miter limit
PDF_setpolydash — Set complicated dash pattern [deprecated]
PDF_setrgbcolor_fill — Set fill RGB color values
PDF_setrgbcolor_stroke — Set stroke RGB color values [deprecated]
PDF_setrgbcolor — Set fill and stroke RGB color values [deprecated]
PDF_shading_pattern — Define shading pattern
PDF_shading — Define blend
PDF_shfill — Fill area with shading
PDF_show_boxed — Output text in a box [deprecated]
PDF_show_xy — Output text at given position
PDF_show — Output text at current position
PDF_skew — Skew the coordinate system
PDF_stringwidth — Return width of text
PDF_stroke — Stroke path
PDF_suspend_page — Suspend page
PDF_translate — Set origin of coordinate system
PDF_utf16_to_utf8 — Convert string from UTF-16 to UTF-8
PDF_utf32_to_utf16 — Convert string from UTF-32 to UTF-16
PDF_utf8_to_utf16 — Convert string from UTF-8 to UTF-16


User Contributed Notes
PDF Functions

praokean at yahoo dot com
23-Aug-2007 09:08


domPDF is not such a great PDF creator because it doesn't support foreign characters.

Sam from dogmaConsult.de
15-Aug-2007 06:00


I seriously tried to get PDF parsing to work so I could use it for fulltext search indexing in a document management system. But none of the pdf2text functions below worked for my test cases (among them a PDF file generated by OpenOffice and a file generated by FPDF).



But I found a REALLY WORKING SOLUTION! On linux systems, install the XPDF package. It comes with a tool called pdftotext. Use php code similar to the following to get the text content of your pdf files:



<?php
    $file = "test.pdf";
    // pdftotext writes its output next to the input file, with a .txt extension
    $outpath = preg_replace("/\.pdf$/", "", $file).".txt";

    // escapeshellarg() quotes the file name as one single shell argument
    system("pdftotext ".escapeshellarg($file), $ret);
    if ($ret == 0)
    {
        $value = file_get_contents($outpath);
        unlink($outpath);
        print $value;
    }
    if ($ret == 127)
        print "Could not find pdftotext tool.";
    if ($ret == 1)
        print "Could not find pdf file.";
?>



The solution works on all test cases and is much more powerful than any of the previous pure php functions posted here, although only available on linux.
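
A possible variation, sketched here under the assumption of a Unix-like shell: pdftotext writes to standard output when the output file is given as "-", so the temporary .txt file can be skipped. The names `pdftotext_command()` and `pdf_to_text()` are made up for this sketch.

```php
<?php
// Build the shell command for pdftotext. Passing "-" as the output file
// asks pdftotext to write the extracted text to standard output.
function pdftotext_command($pdfFile)
{
    return 'pdftotext ' . escapeshellarg($pdfFile) . ' -';
}

// Return the text of a PDF, or false if pdftotext failed or is missing.
// Note that exec() strips trailing whitespace from every line.
function pdf_to_text($pdfFile)
{
    $lines = array();
    $status = 1;
    exec(pdftotext_command($pdfFile) . ' 2>/dev/null', $lines, $status);
    return $status === 0 ? implode("\n", $lines) : false;
}
```

Calling `pdf_to_text("test.pdf")` then returns the text directly instead of leaving a file on disk.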

tatlar at yahoo dot com
15-Aug-2007 08:49


http://www.digitaljunkies.ca/dompdf/index.php



PHP5 class that converts HTML to PDF. From the website:

"At its heart, dompdf is (mostly) CSS2.1 compliant HTML layout and rendering engine written in PHP. It is a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes."

david at metabin
19-Jul-2007 08:19


The easiest way to get the text of a PDF is to install xpdf (on Red Hat: yum -y install xpdf),

then run pdftotext your.pdf, which will generate your.txt.

brain23 at gmx dot de
03-Jul-2007 11:28


For FPDF there is also an add-on (FPDI) available, which lets you import existing PDF documents:



http://www.setasign.de/products/pdf-php-solutions/fpdi/

pitvanester at gemail dot com
25-Jun-2007 11:54


Sorry, both versions of pdf2txt don't work...

jaymaity at gmail dot com
01-Jun-2007 09:22


A totally free open source alternative is also available, without any license cost, at

http://fpdf.org/

jkndrkn at gmail dot com
04-May-2007 02:51


For those of us that do not want to pay for a commercial license to use PDFlib in a closed-source project, there are at least two good alternatives: FPDF and TCPDF



http://www.fpdf.org/

PHP4 and PHP5 support



http://sourceforge.net/projects/pdf-php

PHP5 support only

luc at phpt dot org
30-Mar-2007 02:09


I am trying to extract the text from PDF files and use it to feed a search engine (intranet tool). I tried several of the "PDF2TXT" functions posted below, but they do not produce the expected result. At the very least, all words need to be separated by spaces (they are then used as keywords), and the "junk" codes have to be removed (for example binary data, pictures...). I started modifying the interesting function posted by Sven, and here is my current version, which is starting to work quite well (with PDF version 1.2). Sorry for having a rather different programming style. Luc



<?php
// Patch for pdf2txt() posted by Sven Schuberth
// Add/replace the following code (cannot post the full program, size limitation)

// Handles version 1.2
// New version of handleV2($data), only one line changed
function handleV2($data){
    $a_chunks = array();
    $result_data = "";
    $j = 0;

    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    if (!is_array($a_obj)) return $result_data;

    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");

        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        if (isset($chunk["data"]) && $chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos(), not substr(), tests whether the filter name is present)
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $data = @gzuncompress($chunk["data"]);
                if (trim((string)$data)!=""){
                    // CHANGED HERE, before: $result_data .= ps2txt($data);
                    $result_data .= PS2Text_New($data);
                }
            }
        }
    }
    return $result_data;
}

// New function - Extract text from PS codes
function ExtractPSTextElement($SourceString)
{
    $Result = '';
    $CurStartPos = 0;
    while (($CurStartText = strpos($SourceString, '(', $CurStartPos)) !== FALSE)
    {
        // New text element found
        if ($CurStartText - $CurStartPos > 8) $Spacing = ' ';
        else {
            // the number between two text elements is the kerning adjustment
            $SpacingSize = substr($SourceString, $CurStartPos, $CurStartText - $CurStartPos);
            if ((float)$SpacingSize < -25) $Spacing = ' '; else $Spacing = '';
        }
        $CurStartText++;

        // Find the closing ')', skipping escaped '\)'
        $StartSearchEnd = $CurStartText;
        while (($CurStartPos = strpos($SourceString, ')', $StartSearchEnd)) !== FALSE)
        {
            if (substr($SourceString, $CurStartPos - 1, 1) != '\\') break;
            $StartSearchEnd = $CurStartPos + 1;
        }
        if ($CurStartPos === FALSE) break; // something wrong happened

        // Remove ending '-'
        if (substr($Result, -1, 1) == '-')
        {
            $Spacing = '';
            $Result = substr($Result, 0, -1);
        }

        // Add to result
        $Result .= $Spacing . substr($SourceString, $CurStartText, $CurStartPos - $CurStartText);
        $CurStartPos++;
    }
    // Add line breaks (otherwise, result is one big line...)
    return $Result . "\n";
}

// Global table for codes replacement
$TCodeReplace = array ('\(' => '(', '\)' => ')');

// New function, replacing old "ps2txt" function
function PS2Text_New($PS_Data)
{
    global $TCodeReplace;

    // Catch up some codes
    if (ord($PS_Data[0]) < 10) return '';
    if (substr($PS_Data, 0, 8) == '/CIDInit') return '';

    // Some text inside (...) can be found outside the [...] sets, then ignored
    // => disabling the processing of [...] is the easiest solution
    $Result = ExtractPSTextElement($PS_Data);

    // echo "Code=$PS_Data\nRES=$Result\n\n";

    // Remove/translate some codes
    return strtr($Result, $TCodeReplace);
}
?>

Sven.Schuberth(at)gmx.de
29-Mar-2007 03:38


I've improved the code snippet for pdf2txt version 1.2.

Now it's also possible to translate PDF versions > 1.2 into plain text.



Sven



<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors     : Jonathan Beckett, 2005-05-02
//             : Sven Schuberth, 2007-03-29

function pdf2txt($filename){
    $data = getFileData($filename);

    // the header comment at the start of the file carries the version, e.g. %PDF-1.2
    $s=strpos($data,"%")+1;
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);
}

// handles version 1.2
function handleV2($data){
    $a_chunks = array();
    $result_data = "";
    $j = 0;

    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    if (!is_array($a_obj)) return $result_data;

    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");

        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        if (isset($chunk["data"]) && $chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos(), not substr(), tests whether the filter name is present)
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $data = @gzuncompress($chunk["data"]);
                if (trim((string)$data)!=""){
                    $result_data .= ps2txt($data);
                }
            }
        }
    }

    return $result_data;
}

// handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    if (!is_array($a_obj)) return $result_data;

    foreach($a_obj as $obj){
        // check if it is a string
        if(substr_count($obj,"/GS1")>0){
            // the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $match)
                    $result_data.=$match[1];
        }
    }
    return $result_data;
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    // strip the surrounding parentheses
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    // stays null when nothing is found, so callers can test with is_array()
    $a_result = null;

    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
}
?>

brendandonhue at comcast dot net
23-Aug-2006 12:35


Here is a function to test whether a file is a PDF without using any external library.


<?php
// "%PDF-": the five bytes at the start of a PDF file
define('PDF_MAGIC', "\x25\x50\x44\x46\x2D");

function is_pdf($filename) {
  // read only as many bytes as the magic string is long
  return file_get_contents($filename, false, null, 0, strlen(PDF_MAGIC)) === PDF_MAGIC;
}
?>


It's not checking if the whole file is valid, just if the correct header is present at the beginning of the file.
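
The same header check can be tried out without a real PDF at hand. The sketch below is a variant that reads the first five bytes with fopen()/fread(); the name `has_pdf_header()` and the throwaway files are assumptions made for the demonstration only.

```php
<?php
// Variant of the header check that reads only the first five bytes
// with fopen()/fread(). The function name is made up for this sketch.
function has_pdf_header($filename)
{
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    $header = fread($handle, 5);
    fclose($handle);
    return $header === "%PDF-";
}

// Demonstration with two throwaway files.
$pdfLike = tempnam(sys_get_temp_dir(), 'pdf');
$other   = tempnam(sys_get_temp_dir(), 'txt');
file_put_contents($pdfLike, "%PDF-1.4\n% rest of the file omitted\n");
file_put_contents($other, "Hello world\n");

var_dump(has_pdf_header($pdfLike)); // bool(true)
var_dump(has_pdf_header($other));   // bool(false)

unlink($pdfLike);
unlink($other);
```

As with the function above, a positive result only says that the header is present, not that the rest of the file is valid.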

MAGnUm at magnumhome dot servehttp.com
18-Jul-2006 06:01


domPDF is also a great PDF creation interface. It basically converts your code to CSS and then builds the PDF from that with absolute positions, and whatnot...

spingary at yahoo dot com
13-Jan-2006 05:55


I was having trouble streaming inline PDFs using PHP 5.0.2 and Apache 2.0.54.



This is my code:



<?php
header("Pragma: public");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: must-revalidate");
header("Content-type: application/pdf");
header("Content-Length: ".filesize($file));
header("Content-disposition: inline; filename=$file");
// Accept-Ranges takes a range unit, not a size; this script serves no byte ranges
header("Accept-Ranges: none");
readfile($file);
exit();
?>

It would work fine in Mozilla Firefox (1.0.7) but with IE (6.0.2800.1106) it would not bring up the Adobe Reader plugin and instead ask me to save it or open it as a PHP file.



Oddly enough, I turned off ZLib.compression and it started working.  I guess the compression is confusing IE.  I tried leaving out the content-length header thinking maybe it was unmatched filesize (uncompressed number vs actual received compressed size), but then without it it screws up Firefox too.  



What I ended up doing was disabling Zlib compression for the PDF output pages using ini_set:



<?php
ini_set('zlib.output_compression','Off');
?>



Maybe this will help someone. Will post over in the PDF section as well.
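
Putting the two pieces together might look like the sketch below. `pdf_inline_headers()` and `send_pdf_inline()` are hypothetical helpers, not PHP built-ins; the point is only that compression is switched off before Content-Length is sent, so the announced length matches the bytes actually delivered.

```php
<?php
// Collect the response headers for an inline PDF of known size.
// pdf_inline_headers() is a hypothetical helper, not a PHP built-in.
function pdf_inline_headers($filename, $length)
{
    return array(
        'Content-Type: application/pdf',
        'Content-Length: ' . (int) $length,
        'Content-Disposition: inline; filename="' . basename($filename) . '"',
    );
}

// Turn off zlib output compression first so that Content-Length matches
// the bytes actually sent, then emit the headers and the file.
function send_pdf_inline($path)
{
    ini_set('zlib.output_compression', 'Off');
    foreach (pdf_inline_headers($path, filesize($path)) as $line) {
        header($line);
    }
    readfile($path);
}
```

Keeping the header list in a separate function makes it easy to inspect what would be sent without producing any output.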

ontwerp AT zonnet.nl
04-Nov-2005 04:01


I was searching for a low-cost/open-source option for combining static HTML files [as templates] with dynamic output from Perl or PHP routines etc. Sooner or later I found that this was the most stable, fastest and most customizable way to produce usable PDFs with nice formatting:

1] Create the HTML page output [Perl -> HTML output, direct HTML output from any app, PHP echoes etc.] [sort these HTML files locally]

2] Parse all HTML [including web image links, tables, font formatting etc.] to [E]PS files with the Perl app html2ps [as mentioned below]
http://user.it.uu.se/~jan/html2ps.html [sort all PS files by future PDF page positions]

3] Use the free ps2pdf/ps2pdfwr Linux application
http://www.ps2pdf.com/convert/index.htm [uses Ghostscript, Ghostview libs and so on]
It has great formatting options like headers, footers, numbering etc.
[sort PDF files]

4] Combine all PDF files into one PDF file with pdftk [pdftoolkit], which offers optional compression/encryption, background stamps etc.

One might ask why use different scripts:
- the Perl/PHP combination is great: in my experience Perl is speedier at some tasks, like conversion to PS files
- PS to PDF is quicker than direct PHP to PDF [in my experience!]
- I have total control over every file: whenever I change the HTML files used as templates, I only need an editor or another app for it [online or offline].

P.S. I had to make an open-source solution for creating simple report analyses based on things like:
- first page [name / title / # / date]
- some static info [like introduction, copyrights etc.]
- some dynamic info [output from PHP -> database queries] combined with HTML tags/images etc.

And all of this mixed [so separated into files for transparency]. Also, the three-step approach, data -> HTML, HTML -> PS, PS -> PDF, is easier and quicker to program or adjust at every step.

Correct me if I'm wrong [mail me to]

ing. Valentijn Langendorff
Design & Technologist
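
The chain described in this note could be driven from PHP roughly as sketched below. The helper names are invented for the example, and the command lines rest on assumptions: that html2ps writes PostScript to standard output, that ps2pdf takes an input and an output file name, and that pdftk merges files with "cat output". Check each tool's own documentation before relying on them.

```php
<?php
// Build the three shell commands of the HTML -> PostScript -> PDF chain.
// Assumptions: html2ps writes PostScript to standard output, ps2pdf takes
// input and output file names, and pdftk merges with "cat output".
function html_to_ps_command($html, $ps)
{
    return 'html2ps ' . escapeshellarg($html) . ' > ' . escapeshellarg($ps);
}

function ps_to_pdf_command($ps, $pdf)
{
    return 'ps2pdf ' . escapeshellarg($ps) . ' ' . escapeshellarg($pdf);
}

function merge_pdfs_command(array $pdfs, $target)
{
    $inputs = implode(' ', array_map('escapeshellarg', $pdfs));
    return 'pdftk ' . $inputs . ' cat output ' . escapeshellarg($target);
}

// Print the commands instead of running them.
echo html_to_ps_command('intro.html', 'intro.ps'), "\n";
echo ps_to_pdf_command('intro.ps', 'intro.pdf'), "\n";
echo merge_pdfs_command(array('cover.pdf', 'intro.pdf'), 'report.pdf'), "\n";
```

Each command string can then be handed to system() or exec() once the file names are known.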

ragnar at deulos dot com
08-Oct-2005 11:30


After one whole day spent understanding how PDFlib works, I came to the conclusion that drawing with it is hard enough: drawing a single line can take something like four lines of code. So I wrote my own functions to make life easier and the code more understandable to modify and draw with. I also made a function that draws a rectangle with rounded corners, with the option to fill it ;)

You can get it from http://www.deulos.com/pdf_php.php

Feel free to make suggestions or whatever you like ;o)

18-Sep-2005 03:26


some code that can be very helpful for starters.



<?php
    // Declare PDF File
    $pdf = pdf_new();
    PDF_open_file($pdf);

    // Set Document Properties
    PDF_set_info($pdf, "author", "Alexander Pas");
    PDF_set_info($pdf, "title", "PDF by PHP Example");
    PDF_set_info($pdf, "creator", "Alexander Pas");
    PDF_set_info($pdf, "subject", "Testing Code");

    // Get fonts to use
    pdf_set_parameter($pdf, "FontOutline", "Arial=arial.ttf"); // get a custom font
    $font1 = PDF_findfont($pdf, "Helvetica-Bold",  "winansi", 0); // declare default font
    $font2 = PDF_findfont($pdf, "Arial",  "winansi", 1); // declare custom font & embed into file

    /*
    You can safely use the following 14 font types (the default fonts)
    Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique
    Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
    Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic
    Symbol, ZapfDingbats
    */

    // Make the images
    $image1 = PDF_open_image_file($pdf, "gif", "image.gif"); // supported filetypes are: jpeg, tiff, gif, png.

    // Make First Page
    PDF_begin_page($pdf, 450, 450); // page width and height.
    $bookmark = PDF_add_bookmark($pdf, "Front"); // add a top level bookmark.
    PDF_setfont($pdf, $font1, 12); // use this font from now on.
    PDF_show_xy($pdf, "First Page!", 5, 225); // show this text; the position is measured from the lower left corner.
    pdf_place_image($pdf, $image1, 255, 5, 1); // last number will scale it.
    PDF_end_page($pdf); // End of Page.

    // Make Second Page
    PDF_begin_page($pdf, 450, 225); // page width and height.
    $bookmark1 = PDF_add_bookmark($pdf, "Chapter1", $bookmark); // add a nested bookmark. (can be nested multiple times.)
    PDF_setfont($pdf, $font2, 12); // use this font from now on.
    PDF_show_xy($pdf, "Chapter1!", 225, 5);
    PDF_add_bookmark($pdf, "Chapter1.1", $bookmark1); // add a nested bookmark (already in a nested one).
    PDF_setfont($pdf, $font1, 12);
    PDF_show_xy($pdf, "Chapter1.1", 225, 5);
    PDF_end_page($pdf);

    // Finish the PDF File
    PDF_close($pdf); // End Of PDF-File.
    $output = PDF_get_buffer($pdf); // assemble the file in a variable.

    // Output Area
    header("Content-type: application/pdf"); // set filetype to pdf.
    header("Content-Length: ".strlen($output)); // content length
    header("Content-Disposition: attachment; filename=test.pdf"); // you can use inline or attachment.
    echo $output; // actual print area!

    // Cleanup
    PDF_delete($pdf);
?>

thodge at ipswich dot qld dot gov dot au
05-Sep-2005 02:22


Yet another addition to the PDF text extraction code last posted by jorromer. That code only seemed to work for PDF 1.2 (Acrobat 3.x) or below. This pdfExtractText function uses regular expressions to cover cases I have found in PDF 1.3 and 1.4 documents. The code also handles closing brackets in the text stream, which were ignored by the previous version. My regular expression skills are somewhat lacking, so improvements may be possible by a more skilled programmer. I'm sure there are still cases that this function will not handle, but I haven't come across any yet...





<?php

function pdf2string($sourcefile) {

    $fp = fopen($sourcefile, 'rb');
    $content = fread($fp, filesize($sourcefile));
    fclose($fp);

    $searchstart = 'stream';
    $searchend = 'endstream';
    $pdfText = '';
    $pos = 0;
    $pos2 = 0;
    $startpos = 0;

    while ($pos !== false && $pos2 !== false) {

        $pos = strpos($content, $searchstart, $startpos);
        $pos2 = strpos($content, $searchend, $startpos + 1);

        if ($pos !== false && $pos2 !== false){

            if ($content[$pos] == 0x0d && $content[$pos + 1] == 0x0a) {
                $pos += 2;
            } else if ($content[$pos] == 0x0a) {
                $pos++;
            }

            if ($content[$pos2 - 2] == 0x0d && $content[$pos2 - 1] == 0x0a) {
                $pos2 -= 2;
            } else if ($content[$pos2 - 1] == 0x0a) {
                $pos2--;
            }

            $textsection = substr(
                $content,
                $pos + strlen($searchstart) + 2,
                $pos2 - $pos - strlen($searchstart) - 1
            );
            $data = @gzuncompress($textsection);
            $pdfText .= pdfExtractText($data);
            $startpos = $pos2 + strlen($searchend) - 1;

        }
    }

    return preg_replace('/(\s)+/', ' ', $pdfText);

}

function pdfExtractText($psData){

    if (!is_string($psData)) {
        return '';
    }

    $text = '';

    // Handle brackets in the text stream that could be mistaken for
    // the end of a text field. I'm sure you can do this as part of the
    // regular expression, but my skills aren't good enough yet.
    $psData = str_replace('\)', '##ENDBRACKET##', $psData);
    $psData = str_replace('\]', '##ENDSBRACKET##', $psData);

    preg_match_all(
        '/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
        $psData,
        $matches
    );
    for ($i = 0; $i < sizeof($matches[0]); $i++) {
        if ($matches[3][$i] != '') {
            // Run another match over the contents.
            preg_match_all('/\(([^)]*)\)/si', $matches[3][$i], $subMatches);
            foreach ($subMatches[1] as $subMatch) {
                $text .= $subMatch;
            }
        } else if ($matches[4][$i] != '') {
            $text .= ($matches[1][$i] == 'Tc' ? ' ' : '') . $matches[4][$i];
        }
    }

    // Translate special characters and put back brackets.
    $trans = array(
        '...'             => '…',
        '\205'            => '…',
        '\221'            => chr(145),
        '\222'            => chr(146),
        '\223'            => chr(147),
        '\224'            => chr(148),
        '\226'            => '-',
        '\267'            => '•',
        '\('              => '(',
        '\['              => '[',
        '##ENDBRACKET##'  => ')',
        '##ENDSBRACKET##' => ']',
        chr(133)          => '-',
        chr(141)          => chr(147),
        chr(142)          => chr(148),
        chr(143)          => chr(145),
        chr(144)          => chr(146),
    );
    $text = strtr($text, $trans);

    return $text;

}

?>
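
The comment in pdfExtractText wonders whether escaped brackets could be handled inside the regular expression itself. The following is a minimal sketch of one way that might look. The helper name pdfLiteralStrings is hypothetical and not part of the function above, and the pattern assumes that every bracket inside a string is backslash-escaped, so unescaped nested brackets are not covered.

```php
<?php
// Hypothetical helper, not part of the function above: returns the raw
// contents of every "(...)" literal string found in $psData.
function pdfLiteralStrings($psData) {
    // \(                  opening bracket
    // (?:\\.|[^\\()])*    a backslash escape such as \) or \222, or any
    //                     character that is not a backslash or a bracket
    // \)                  the first unescaped closing bracket
    preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)/s', $psData, $matches);
    return $matches[1];
}

$found = pdfLiteralStrings('(Hello \(world\)) Tj (second) Tj');
// $found[0] still contains the escaped brackets: Hello \(world\)
```

Because the escapes are kept in the result, the "\(" and "\)" sequences would still need to be translated afterwards, for example with strtr() as in the function above.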

29-Aug-2005 01:58


If you want to display the total number of pages on each page (for example: page 1 of 3), suspend each page instead of ending it, create all pages, and then resume each page to add the total once it is known. The following code could be helpful:

<?php
// ...

$pdf->begin_page_ext(842, 595, "");
// ... add text, images, ...
$pdf->suspend_page("");

$pdf->begin_page_ext(842, 595, "");
// ... add text, images, ...
$pdf->suspend_page("");

// ... create all pages

$pdf->resume_page("pagenumber 1");
// ... add number of pages to page 1
$pdf->end_page_ext("");

$pdf->resume_page("pagenumber 2");
// ... add number of pages to page 2
$pdf->end_page_ext("");

// ...
?>

jorromer at uchile dot cl -- Krash
08-Jun-2005 02:51


I recently used mattb's code below to extract text from PDF files. I modified it to extract only the text fields.

I hope this helps someone.

Here is the function:



<?php



  $text = pdf2string("file.pdf");

  echo $text;



  function pdf2string($sourcefile){

    $fp = fopen($sourcefile, 'rb');

    $content = fread($fp, filesize($sourcefile));

    fclose($fp);



    $searchstart = 'stream';

    $searchend = 'endstream';

    $pdfdocument = '';

    $pos = 0;

    $pos2 = 0;

    $startpos = 0;

   

    while( $pos !== false && $pos2 !== false ){

      $pos = strpos($content, $searchstart, $startpos);

      $pos2 = strpos($content, $searchend, $startpos + 1);

     

      if ($pos !== false && $pos2 !== false){

        if ($content[$pos]==0x0d && $content[$pos+1]==0x0a) $pos+=2;

        else if ($content[$pos]==0x0a) $pos++;



        if ($content[$pos2-2]==0x0d && $content[$pos2-1]==0x0a) $pos2-=2;

        else if ($content[$pos2-1]==0x0a) $pos2--;



        $textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);

        $data = @gzuncompress($textsection);

        $data = ExtractText2($data);

        $startpos = $pos2 + strlen($searchend) - 1;

        

        if ($data === false){ 

          return -1;}

          

        $pdfdocument .= $data;}}

   return $pdfdocument;}



function ExtractText2($postScriptData){
  // gzuncompress() returns false for streams it cannot inflate.
  if (!is_string($postScriptData)) return '';

  $text = '';
  $textStart = 0;
  $len = strlen($postScriptData);

  while ($textStart < $len){
    // Stop as soon as no further "(...)" pair exists.
    $ini = strpos($postScriptData, '(', $textStart);
    if ($ini === false) break;
    $end = strpos($postScriptData, ')', $ini + 1);
    if ($end === false) break;

    // Keep the string only when the Tj operator follows it directly.
    $valtext = strpos($postScriptData, 'Tj', $end + 1);
    if ($valtext === $end + 2)
      $text .= substr($postScriptData, $ini + 1, $end - $ini - 1);

    $textStart = $end + 1;
  }

  // Replace octal escapes for accented vowels and drop the curly quotes.
  $trans = array("\\341" => "a","\\351" => "e","\\355" => "i","\\363" => "o","\\223" => "","\\224" => "");
  return strtr($text, $trans);
}

?>

webadmin at secretscreen dot com
06-Apr-2005 06:51


I found this info about pdflib scope on a Chinese (I think) site and translated it.  I was trying to do pdf_setfont and kept getting the wrong scope error.  Turns out it has to be in the Page scope.  So pdf_setfont will only work when called between pdf_begin_page and pdf_end_page.



#########################################

When a PDFlib API function is called, the error "Can't - IN 'document' scope" occurs.

PDFlib has a concept of "scope": every PDFlib API function may only be called within certain scopes. This error occurs when a function is called outside the scope it is defined for. Use the list below to check where your API call is placed.

Path: from PDF_moveto(), PDF_circle(), PDF_arc(), PDF_arcn() or PDF_rect() up to PDF_stroke(), PDF_closepath_stroke(), PDF_fill(), PDF_fill_stroke(), PDF_closepath_fill_stroke(), PDF_clip() or PDF_endpath()

Page: between PDF_begin_page() and PDF_end_page(), outside path scope

Template: between PDF_begin_template() and PDF_end_template(), outside path scope

Pattern: between PDF_begin_pattern() and PDF_end_pattern(), outside path scope

Font: between PDF_begin_font() and PDF_end_font(), outside glyph scope

Glyph: between PDF_begin_glyph() and PDF_end_glyph(), outside path scope

Document: between PDF_open_*() and PDF_close(), outside page, template and pattern scope

Object: between PDF_new() and PDF_delete(), outside any other scope

Null: outside object scope

Any: all scopes other than Null



##########################################



Hope this helps others as much as it helped me!!!
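
The nesting of scopes can be pictured as a small state machine. The sketch below is a toy model in plain PHP, not PDFlib itself: the helper name pdfScopeAfter is hypothetical, and only the object, document and page scopes from the list above are modelled. It returns the scope a script is in after a sequence of calls, or false when a call is made in the wrong scope.

```php
<?php
// Toy model, not PDFlib: tracks the scope after each scope-changing call.
function pdfScopeAfter(array $calls) {
    // call name => array(scope required before the call, scope after it)
    $transitions = array(
        'PDF_new'        => array('null',     'object'),
        'PDF_open_file'  => array('object',   'document'),
        'PDF_begin_page' => array('document', 'page'),
        'PDF_end_page'   => array('page',     'document'),
        'PDF_close'      => array('document', 'object'),
        'PDF_delete'     => array('object',   'null'),
    );
    $scope = 'null';
    foreach ($calls as $call) {
        if (!isset($transitions[$call])) {
            continue;                       // this call does not change the scope
        }
        list($required, $next) = $transitions[$call];
        if ($scope !== $required) {
            return false;                   // the "wrong scope" situation
        }
        $scope = $next;
    }
    return $scope;
}

// A font can be set here because the script is in page scope at this point.
$scope = pdfScopeAfter(array('PDF_new', 'PDF_open_file', 'PDF_begin_page'));
```

Calling PDF_begin_page directly after PDF_new makes the model return false, which mirrors the error described above.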

chu61 dot tw at gmail dot com
07-Mar-2005 12:57


How can you find out how many pages a PDF has? I read the PDF specification, version 1.6, and found this:

PDF uses "page tree nodes" to define the ordering of pages in the document. The tree structure allows PDF applications to quickly open a document containing thousands of pages while using little memory.

If a PDF has 63 pages, the page tree node will look like this:

2 0 obj
<< /Type /Pages
    /Kids [ 4 0 R
            10 0 R
          ]
    /Count 63        <---- YES, got it
>>
endobj

P.S. A PDF may contain more than one page tree node. The right answer is in the root page tree node: a node that has /Count XX together with a /Parent XXX entry is not the root page tree node.

So you must find the node that has a /Count XX entry and no /Parent entry, and you will get the total number of pages in the PDF.

This works for %PDF-1.0 through %PDF-1.5.

Alex from Taipei, Taiwan
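
The rule above (take /Count from the page tree node that has no /Parent) can be sketched in a few lines of PHP. This is only an illustration under assumptions: the helper name pdfPageCount is hypothetical, and the code only sees objects stored as plain text, so it will not find a page tree that sits inside a compressed object stream.

```php
<?php
// Hypothetical helper: returns the page count stored in the root page tree
// node, or false when no such node is found among the plain-text objects.
function pdfPageCount($pdfSource) {
    foreach (explode('endobj', $pdfSource) as $object) {
        // Only page tree nodes (/Type /Pages) carry the page count.
        if (!preg_match('#/Type\s*/Pages\b#', $object)) {
            continue;
        }
        // A node with a /Parent entry is an intermediate node, not the root.
        if (strpos($object, '/Parent') !== false) {
            continue;
        }
        if (preg_match('#/Count\s+(\d+)#', $object, $m)) {
            return (int) $m[1];
        }
    }
    return false;
}

// Usage with an in-memory sample; for a real file, pass its contents instead.
$sample = "2 0 obj\n<< /Type /Pages /Kids [ 4 0 R 10 0 R ] /Count 63 >>\nendobj\n"
        . "4 0 obj\n<< /Type /Pages /Parent 2 0 R /Kids [ 5 0 R ] /Count 30 >>\nendobj\n";
$pages = pdfPageCount($sample);   // 63, taken from object 2
```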

mattb at bluewebstudios dot com
05-Feb-2005 06:44


I recently tested Donatas' code below for the extraction of text from PDF files.  After running into a few problems where PDF files were not being read at all, I've modified it somewhat.  It still isn't perfect, but should work great for searching.  Thanks Donatas.



<?php

$test = pdf2string("<pathtoPDFfile>");

echo "$test";



# Returns a -1 if uncompression failed

function pdf2string($sourcefile)

{

   $fp = fopen($sourcefile, 'rb');

   $content = fread($fp, filesize($sourcefile));

   fclose($fp);



   # Locate all text hidden within the stream and endstream tags

   $searchstart = 'stream';

   $searchend = 'endstream';

   $pdfdocument = "";



   $pos = 0;

   $pos2 = 0;

   $startpos = 0;

   # Iterate through each stream block

   while( $pos !== false && $pos2 !== false )

   {

      # Grab beginning and end tag locations if they have not yet been parsed

      $pos = strpos($content, $searchstart, $startpos);

      $pos2 = strpos($content, $searchend, $startpos + 1);

      if( $pos !== false && $pos2 !== false )

      {

         # Extract compressed text from between stream tags and uncompress

         $textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);

         $data = @gzuncompress($textsection);

         # Clean up text via a special function

         $data = ExtractText($data);

         # Increase our PDF pointer past the section we just read

         $startpos = $pos2 + strlen($searchend) - 1;

         if( $data === false ) { return -1; }

         $pdfdocument = $pdfdocument . $data;

      }

   }



   return $pdfdocument;

}



function ExtractText($postScriptData)

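{

   // Initialise both variables so the loop below never reads undefined values.

   $textStart = 0;

   $plainText = '';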

   while( (($textStart = strpos($postScriptData, '(', $textStart)) && ($textEnd = strpos($postScriptData, ')', $textStart + 1)) && substr($postScriptData, $textEnd - 1) != '\\') )

   {

      $plainText .= substr($postScriptData, $textStart + 1, $textEnd - $textStart - 1);

      if( substr($postScriptData, $textEnd + 1, 1) == ']' ) // This adds quite some additional spaces between the words

      {

         $plainText .= ' ';

      }



      $textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;

   }



   return stripslashes($plainText);

}

?>

michi (Alt+Q) marel.at
01-Jul-2004 11:10


<?PHP

/* A little helpful function to calculate millimeters to points */

function calcToPt($intMillimeter) {

  $intPoints = ($intMillimeter*72)/25.4;

  $intPoints = round($intPoints);

  return $intPoints;

}



/* For example: Create DIN A4 210x297 mm */

pdf_begin_page( $pdf, calcToPt(210), calcToPt(297)); // 595x842 pt

?>

donatas at spurgius dot com
23-Jun-2004 04:56


I've been looking for a way to extract plain text from PDF documents (needed to search for text inside 'em). Not being able to find one I wrote the needed functions myself. here you go folks.



<?php

  function pdf2string ($sourceFile)

  {

    $textArray = array ();

    $objStart = 0;

    

    $fp = fopen ($sourceFile, 'rb');

    $content = fread ($fp, filesize ($sourceFile));

    fclose ($fp);

    

    $searchTagStart = chr(13).chr(10).'stream';

    $searchTagStartLength = strlen ($searchTagStart);

    

    while ((($objStart = strpos ($content, $searchTagStart, $objStart)) && ($objEnd = strpos ($content, 'endstream', $objStart+1))))

    {

      $data = substr ($content, $objStart + $searchTagStartLength + 2, $objEnd - ($objStart + $searchTagStartLength) - 2);

      $data = @gzuncompress ($data);

      

      if ($data !== FALSE && strpos ($data, 'BT') !== FALSE && strpos ($data, 'ET') !== FALSE)

      {

        $textArray [] = ExtractText ($data);

      }

      

      $objStart = $objStart < $objEnd ? $objEnd : $objStart + 1;

    }

    

    return $textArray;

  }

  

  function ExtractText ($postScriptData)

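  {

    // Initialise both variables so the loop below never reads undefined values.

    $textStart = 0;

    $plainText = '';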

    while ((($textStart = strpos ($postScriptData, '(', $textStart)) && ($textEnd = strpos ($postScriptData, ')', $textStart + 1)) && substr ($postScriptData, $textEnd - 1) != '\\'))

    {

      $plainText .= substr ($postScriptData, $textStart + 1, $textEnd - $textStart - 1);

      if (substr ($postScriptData, $textEnd + 1, 1) == ']') //this adds quite some additional spaces between the words

      {

        $plainText .= ' ';

      }

      

      $textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;

    }

    

    return stripslashes ($plainText);

  }

?>

uwe at steinmann dot cx
13-May-2004 10:25


Those looking for a free replacement of pdflib may consider pslib at http://pslib.sourceforge.net which produces PostScript but it can be easily turned into PDF by Acrobat Distiller or ghostscript. The API is very similar and even hypertext functions are supported. There is also a php extension for pslib in PECL, called ps.

samcontact at myteks dot com
01-May-2004 08:28


Here is another great tutorial on basic PDF building w/ PHP:

http://hotwired.lycos.com/webmonkey/02/20/index3a.html?tw=programming



=======================

http://myteks.com 

Computer Repair & Web Design

=======================

SenorTZ senortz at nospam dot yahoo dot com
28-Jul-2003 10:23


About creating a PDF document based on the content of another document(let's say a text file):



I tried to pass the name of the source file to the PDF-creator page through a link on the sender page, so that the PDF-creator page could read that file and generate a PDF document containing its content. When I referred to the PDF-creator page via the link your_root/create_pdf.php?filename=$your_file_name, the page did not behave well if it contained a line like $filename = $_GET["filename"] before the PDF document was created.

I solved this by replacing the link on the sender page with a form and a button: the form has the action "create_pdf.php", the method "post" and a hidden field containing the "filename" value. It works this way if the PDF-creator page has a line like $filename = $_POST["filename"].



I would like to understand why this way it works and the other way does not.



I hope this helps. Here are the pieces of code I used.



Sender page:

print("<form name='to_pdf' action='see_pdf_file.php' method='post'>");

print("<br/><input type='submit' value='PDF'><input type='hidden' name='filename' value='$filename'></form>");



PDF-creator page:

<?php

$filename = $_POST["filename"];

// file_get_contents() opens and closes the file itself, so no fopen()/fclose() is needed.

$file_content = file_get_contents($filename);

//

$file_content = wordwrap($file_content,72,"|");

$a_row = explode("|",$file_content);

$i = 0;

//

$pdf = pdf_new();

pdf_open_file($pdf, "");

pdf_begin_page($pdf, 595, 842);

pdf_set_font($pdf, "Times-Roman", 16, "host");

pdf_add_outline($pdf, "Page 1");

pdf_set_value($pdf, "textrendering", 1);

pdf_show_xy($pdf, 'The content of the file:',50,700);

// foreach stops at the end of the array instead of reading past it.

foreach ($a_row as $row)

{

       pdf_continue_text($pdf, $row);

}

pdf_end_page($pdf);

pdf_close($pdf);

//

$data = pdf_get_buffer($pdf);

//

header("Content-type: application/pdf");

header("Content-disposition: inline; filename=test.pdf");

header("Content-length: " . strlen($data));

//

echo $data;

?>
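
The line-splitting step of the script above can be tried on its own, without PDFlib. This is a minimal sketch: the helper name splitIntoRows is hypothetical, and it uses "\n" rather than "|" as the break marker, on the assumption that the file content may itself contain "|" characters.

```php
<?php
// Hypothetical helper, not part of the script above: splits text into rows
// of at most $width characters, ready to be passed one by one to
// pdf_continue_text().
function splitIntoRows($text, $width = 72) {
    // Normalise Windows and old Mac line endings first.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    // The fourth argument forces words longer than $width to be cut.
    $wrapped = wordwrap($text, $width, "\n", true);
    return explode("\n", $wrapped);
}

$rows = splitIntoRows(str_repeat('word ', 40), 72);
```

Each element of $rows is then short enough to be written as one line of text.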



PDFlib and PHP 4.3.1 were used.



Thanks.

bmironov at jonview dot com
25-Jun-2003 07:46


RedHat 9 + Apache 2.0 + PHP 4.3.2 + Oracle 9i + PDFlib 5.0.1 (binary distribution)



It seems to be a working bundle if you do some magic with ./configure:



RedHat 9:

kernel-2.4.20-18.9



Apache 2.0.46:

./configure --enable-so --enable-rewrite=shared --enable-status --enable-mpm=prefork



PHP 4.3.2:

./configure \

--program-prefix= \

--prefix=/usr \

--exec-prefix=/usr \

--bindir=/usr/bin \

--sbindir=/usr/sbin \

--sysconfdir=/etc \

--datadir=/usr/share \

--includedir=/usr/include \

--libdir=/usr/lib \

--libexecdir=/usr/libexec \

--localstatedir=/var \

--sharedstatedir=/usr/com \

--mandir=/usr/share/man \

--infodir=/usr/share/info \

--with-config-file-path=/etc \

--with-config-file-scan-dir=/etc/php.d \

--without-tsrm-pthreads \    # !!!!!!!!!!!!!!!!!!!!

--with-zlib \

--with-gd \

--enable-gd-native-ttf \

--with-ttf \

--without-mysql \

--with-apxs2filter=/usr/local/apache2/bin/apxs \

--with-oci8 \

--enable-sigchild \

--enable-inline-optimization



Oracle9i:

ln -s $ORACLE_HOME/rdbms/public/nzerror.h $ORACLE_HOME/rdbms/demo/nzerror.h



ln -s $ORACLE_HOME/rdbms/public/nzt.h $ORACLE_HOME/rdbms/demo/nzt.h



ln -s $ORACLE_HOME/rdbms/public/ociextp.h $ORACLE_HOME/rdbms/demo/ociextp.h



If you want to use bundled GD-library then:

1) install following packages: libjpeg, libjpeg-devel, libpng, libpng-devel, freetype, freetype-devel, libtiff, libtiff-devel, zlib, zlib-devel



2) ln -s /usr/lib/libjpeg.so.62 /usr/lib/libjpeg.so

ln -s /usr/lib/libpng.so.62 /usr/lib/libpng.so



It seems to be a working combination, because it does NOT give you:

1) error message in Apache's error_log:

Module compiled with module API=20020429, debug=0, thread-safety=0

PHP compiled with module API=20020429, debug=0, thread-safety=1



2) error message in Apache's error_log:

[notice] child pid 12345 exit signal Segmentation fault (11)



3) MS Internet Explorer can show PDF output from your PHP script via the Acrobat plug-in and does not crash. No confusing messages about opening "Adobe Acrobat Control for ActiveX".



Hope it will save you some time.



Good luck,

Boris

pbierans at lynet dot de
28-Mar-2002 02:56


Load the extension, open a PDF, add a font, modify the PDF in memory and send it to the browser:



<?php

  // no cache headers:

  header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");

  header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");

  header("Cache-Control: no-store, no-cache, must-revalidate");

  header("Cache-Control: post-check=0, pre-check=0", false);

  header("Pragma: no-cache");



  $ext_name="libpdf_php.so";

    // libpdf_php.so is the PDFLIB for SunOS by "PDFlib GmbH"

    // visit http://www.pdflib.com



  // if the extension is not automatically loaded by Apache

  // dl() will try to load it on demand:

  // extension_loaded() expects the extension name ('pdf'), not the file name

  if (!extension_loaded('pdf') && !@dl($ext_name))

  {

    ?>

    <table width="100%" border="0"><tr><td align="center">

      <table style="border: solid #f0f0f0 2px;"><tr>

        <td valign="middle" style="padding: 20px; margin: 0px;">

          <p style="font-family: arial; font-size: 12px; ">

          <b>Sorry,</b><br>

          &nbsp;<br>

          A PDF can not be generated right now.<br>

          The administrator has been informed and will fix this as

          soon as possible.<br>

          Please try again later.

        </p>

      </td></tr></table>

    </td></tr></table>

    <?php

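    // double quotes so that \n becomes a real line break; $_SERVER works
    // without register_globals

    mail('admin@domain.com','Error: PDFLib not found',

         "Called by script:\n  ".$_SERVER['SCRIPT_FILENAME'].'?'.$_SERVER['QUERY_STRING'],

         "From: warnings@domain.com\n");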

    exit;

  } // verify that extension is usable



  // unique serial number:

  srand(microtime()*10000);

  $usnr= gmdate("Ymd-His-").rand(1000,9999).'-';

  $pdf_file=$usnr.'result.pdf';

  $src_file='source.pdf';



  // create pdf object

  $pdf = pdf_new();

  pdf_open_file($pdf);

  pdf_set_parameter($pdf, 'serial',      'if-you-have-one');



  // fonts to embed, they are in the folder of this file:

  pdf_set_parameter($pdf, 'FontAFM',     'TradeGothic=Tg______.afm');

  pdf_set_parameter($pdf, 'FontOutline', 'TradeGothic=Tg______.pfb');

  pdf_set_parameter($pdf, 'FontPFM',     'TradeGothic=Tg______.pfm');



  // load the source file:

  $src_doc   =pdf_open_pdi($pdf,$src_file,'', 0);

  $src_page  =pdf_open_pdi_page($pdf,$src_doc,1,'');

  $src_width =pdf_get_pdi_value($pdf,'width' ,$src_doc,$src_page,0);

  $src_height=pdf_get_pdi_value($pdf,'height',$src_doc,$src_page,0);



  pdf_begin_page($pdf, $src_width, $src_height);

  {

    // place the sourcefile to the background of the actual page:

    pdf_place_pdi_page($pdf,$src_page,0,0,1,1);

    pdf_close_pdi_page($pdf,$src_page);



    // modify the page:

    pdf_set_font($pdf, 'TradeGothic', 8, 'host');

    pdf_show_xy($pdf, 'Now: '.gmdate("Y-m-d H:i:s"),50,50);

  }

  pdf_end_page($pdf);

  pdf_close($pdf);



  // prepare output:

  $pdfdata = pdf_get_buffer($pdf); // to echo the pdf-data

  $pdfsize = strlen($pdfdata);     // IE requires the datasize



  // real datatype headers:

  header('Content-type: application/pdf');

  header('Content-disposition: attachment; filename="'.$pdf_file.'"');

  header('Content-length: '.$pdfsize);

  echo $pdfdata;

  exit; // keep this one so no #13#10 or #32 will be written

?>

