Closed

I want to count the English words etc. in txt files in a directory, but I don't know how

2010/08/09 18:05

I want to count the number of English words and so on in the txt files in a directory, but I can't work out how. Using Perl, I want to count the paragraphs, sentences, and words in a text file. I have been trying with split, without success. The text file contains text like this:

I stand here today humbled by the task before us, grateful for the trust you've bestowed, mindful of the sacrifices borne by our ancestors.  I thank President Bush for his service to our nation as well as the generosity and cooperation he has shown throughout this transition.  Forty-four Americans have now taken the presidential oath.  The words have been spoken during rising tides of prosperity and the still waters of peace.  Yet, every so often, the oath is taken amidst gathering clouds and raging storms.  At these moments, America has carried on not simply because of the skill or vision of those in high office, but because we, the people, have remained faithful to the ideals of our forebears and true to our founding documents.

As shown, paragraphs are separated by a newline, sentences by ".  " (a period followed by two half-width spaces), and words by a single space.

    while($data = <>){
        chomp($data);
        @paragraph = split(/\n/, $data);
    }

With the code above, I am able to break the whole file into elements at each newline. What I cannot figure out at all is how to go on and count them. Please advise.

GMCoufs
Thanks rate 0% (0/3)

Perl
Answers: 3
Thanks: 5

Answers (3)

toraneko75
Best-answer rate 51% (27/52)

2010/08/17 17:29 Answer No. 3

Basically, I think you can count newlines to get paragraphs, periods to get sentences, and spaces to get words. Then adjust the results: add 1 to the paragraph count, and correct the word count for the facts that a sentence ends with two spaces and that there is no space where a line breaks. If the text ends with a period and no trailing newline, the numbers should come out right, although the word count might be off by one or so.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $sentence = "I stand here today humbled by the task before us, grateful for the trust you've bestowed, mindful of the sacrifices borne by our ancestors.  I thank President Bush for his service to our nation as well as the generosity and cooperation he has shown throughout this transition.  Forty-four Americans have now taken the presidential oath.  The words have been spoken during rising tides of prosperity and the still waters of peace.  Yet, every so often, the oath is taken amidst gathering clouds and raging storms.  At these moments, America has carried on not simply because of the skill or vision of those in high office, but because we, the people, have remained faithful to the ideals of our forebears and true to our founding documents.";

    # Start the counters at 0 so the arithmetic below never sees undef.
    my ( $sent, $word, $para ) = ( 0, 0, 0 );

    # Walk through the text one character at a time.
    my @letters = split //, $sentence;
    foreach (@letters) {
        if (m/\n/) { $para++ }    # newline: paragraph break
        if (m/\./) { $sent++ }    # period: end of a sentence
        if (m/ /)  { $word++ }    # space: word separator
    }

    # Adjust: two spaces follow each sentence, none at a line break.
    $word = $word + $para * 2 - $sent + 2;
    $para++;
    print "$para,$sent,$word\n";
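
The adjustment formula in this answer can be checked against a direct count. The following is only a sketch under the format stated in the question (two spaces after each sentence, a single newline between paragraphs, no trailing newline). The helper names `count_by_chars` and `count_by_split` and the sample text are made up for illustration, and `tr///` is swapped in for the character-by-character loop.

```perl
use strict;
use warnings;

# Hypothetical helper: character counting plus the answer's adjustment.
sub count_by_chars {
    my ($text) = @_;
    my $newlines = ( $text =~ tr/\n// );    # paragraph breaks
    my $periods  = ( $text =~ tr/.// );     # sentence ends
    my $spaces   = ( $text =~ tr/ // );     # every single space character
    my $words    = $spaces + $newlines * 2 - $periods + 2;
    return ( $newlines + 1, $periods, $words );
}

# Hypothetical cross-check: split into paragraphs, then into words.
sub count_by_split {
    my ($text) = @_;
    my @paragraphs = split /\n/, $text;
    my ( $sentences, $words ) = ( 0, 0 );
    for my $p (@paragraphs) {
        $sentences += ( $p =~ tr/.// );
        my @w = split ' ', $p;              # any run of whitespace separates words
        $words += @w;
    }
    return ( scalar @paragraphs, $sentences, $words );
}

my $text = "One two three.  Four five.\nSix seven.  Eight nine ten.";
print join( ',', count_by_chars($text) ), "\n";    # prints 2,4,10
print join( ',', count_by_split($text) ), "\n";    # prints 2,4,10
```

Both functions agree only while the input keeps to that format. A period inside an abbreviation or a decimal number would inflate the sentence count in either version.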

kmee
Best-answer rate 55% (1857/3366)

2010/08/10 03:06 Answer No. 2

Unless this is a copying mistake:

    > @word=split(/ /,$sentence);

Nothing is ever assigned to $sentence, so $sentence should be empty. How about this instead?

    @word=split(/ /,$in1);
    $tango = $tango + $#word +1;
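
As a small illustration of this point, the sketch below splits the loop variable itself and accumulates the word count. The sub name `words_in_sentence` and the two sample sentences are invented for the example. `scalar @word` is used because it gives the same number as `$#word + 1`.

```perl
use strict;
use warnings;

# Hypothetical helper: number of space-separated words in one sentence.
sub words_in_sentence {
    my ($sentence) = @_;
    my @word = split / /, $sentence;
    return scalar @word;    # same value as $#word + 1
}

my $tango = 0;              # running word total, named as in the thread
for my $in1 ( "I thank President Bush", "The words have been spoken" ) {
    $tango += words_in_sentence($in1);    # split the loop variable, not an unset one
}
print "$tango\n";           # prints 9 (4 + 5)
```

Running the original script under `use strict; use warnings;` would also have flagged the unset `$sentence` right away.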

kmee
Best-answer rate 55% (1857/3366)

2010/08/09 18:54 Answer No. 1

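Hm? $data = <> reads only one line, and chomp($data); then strips the newline character from it. So even if you write @paragraph = split(/\n/, $data); I would expect nothing to be split, because there is no newline left. If you have changed $/ so that the whole file is read at once, then the while loop no longer serves any purpose. Also, @paragraph is overwritten on every iteration, so only the last line survives.

First, set paragraphs aside and think about how to count the number of lines in a text file.

    $lines = 0;
    while ($data = <>) {
        $lines++;
    }

Do you understand why this counts the lines? Next, suppose you want the total of some value computed from each $data you read in:

    $nanka = 0;                        # "nanka": some running total
    while ($data = <>) {
        $nanka += &KEISAN($data);      # KEISAN ("calculation"): a sub you write
    }

Do you follow this as well? Then what kind of KEISAN would give you the "sentences" you are after? And the "words"?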
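
One possible way to fill in the KEISAN idea is sketched below. It is not the answerer's code. The names `count_sentences`, `count_words`, and `count_all` are hypothetical, and the sketch assumes one paragraph per line and that every period ends a sentence.

```perl
use strict;
use warnings;

# Hypothetical per-line helpers standing in for the answer's &KEISAN.
sub count_sentences {
    my ($line) = @_;
    # Split after each period; keep only pieces that contain text.
    my @sentences = grep { /\S/ } split /\.\s*/, $line;
    return scalar @sentences;
}

sub count_words {
    my ($line) = @_;
    # split ' ' breaks on any run of whitespace, so one space between
    # words and two between sentences are handled alike.
    my @words = split ' ', $line;
    return scalar @words;
}

# Sum the per-line values over a filehandle, one paragraph per line.
sub count_all {
    my ($fh) = @_;
    my ( $paragraphs, $sentences, $words ) = ( 0, 0, 0 );
    while ( my $data = <$fh> ) {
        chomp $data;
        next unless $data =~ /\S/;    # ignore blank lines
        $paragraphs++;
        $sentences += count_sentences($data);
        $words     += count_words($data);
    }
    return ( $paragraphs, $sentences, $words );
}

# Demo on an in-memory "file" so the sketch runs without any input file.
my $sample = "A b c.  D e.\nF g.\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print join( ',', count_all($fh) ), "\n";    # prints 2,3,7
close $fh;
```

For real input, the same `count_all` could be handed a handle opened with `open my $fh, '<', $path` for each txt file in turn, adding up the three totals.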

Asker

Supplement 2010/08/09 23:01

    $lines =0;
    $bun=0;
    $tango=0;
    while($data=<>){
        $lines ++ ;
        @sentence=split(/. /,$data);
        foreach $in1(@sentence){
            @word=split(/ /,$sentence);
            $bun++;
        }
    }

With this I can now count the number of lines and the number of sentences. The approach is to add 1 to the sentence count each time a sentence is broken up at single spaces. However, this does not let me count the words. I thought that

    $bun++;
    $tango = $tango + $#word +1;

would do it, but it did not.
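
A separate thing to watch in this snippet, assuming the pattern was typed exactly as shown: in `split(/. /, $data)` the dot is unescaped, so it matches any character followed by a space, not only a period. The sketch below uses an invented sample line to compare that pattern with an escaped period followed by two spaces.

```perl
use strict;
use warnings;

# Invented sample line: two sentences separated by a period and two spaces.
my $data = "Yet it is.  The oath.";

# Unescaped dot: splits at ANY character followed by a space,
# so the last letter before each space is swallowed.
my @loose = split /. /, $data;

# Escaped period plus two spaces: splits only between sentences.
my @strict = split /\.  /, $data;

print scalar(@loose),  ": ", join( '|', @loose ),  "\n";
print scalar(@strict), ": ", join( '|', @strict ), "\n";
```

With the unescaped pattern the pieces are fragments of words, not sentences, so a sentence count built on it is inflated whenever a line contains spaces inside its sentences.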
