Below is the list of the 250 most common words, the percentage of the total that each one accounts for, and a summary of the statistics I mentioned above.
| Statistic | Value |
|---|---|
| Number of books | 1032 |
| Combined file size | 440 MB |
| Date | 29/09/97 |
| Total number of words | 6,615,271 |
| Total number of different words | 103,590 |
| Number of words that occur fewer than ten times | 78,332 |
| Number of occurrences of the 250 most common words | 3,781,615 (57%) |
| Number of occurrences of the 1000 most common words | 6,565,736 (99.25%) |

| Word | % of Total | Number of Occurrences |
|---|---|---|
| the | 7.846512 | 519068 |
| of | 4.460135 | 295050 |
| and | 3.653471 | 241687 |
| to | 2.556630 | 169128 |
| in | 1.815451 | 120097 |
| was | 1.161903 | 76863 |
| that | 1.112607 | 73602 |
| his | 1.079124 | 71387 |
| he | 1.033669 | 68380 |
| it | 0.872406 | 57712 |
| with | 0.772803 | 51123 |
| as | 0.737385 | 48780 |
| by | 0.707575 | 46808 |
| for | 0.666821 | 44112 |
| is | 0.663329 | 43881 |
| had | 0.622454 | 41177 |
| but | 0.576575 | 38142 |
| which | 0.538663 | 35634 |
| on | 0.520341 | 34422 |
| be | 0.506994 | 33539 |
| at | 0.504938 | 33403 |
| not | 0.499753 | 33060 |
| they | 0.499345 | 33033 |
| from | 0.494795 | 32732 |
| were | 0.474372 | 31381 |
| their | 0.472573 | 31262 |
| this | 0.449097 | 29709 |
| or | 0.400679 | 26506 |
| have | 0.384247 | 25419 |
| you | 0.380876 | 25196 |
| her | 0.376991 | 24939 |
| who | 0.363099 | 24020 |
| all | 0.361406 | 23908 |
| him | 0.359970 | 23813 |
| an | 0.338807 | 22413 |
| so | 0.326850 | 21622 |
| are | 0.297705 | 19694 |
| one | 0.293624 | 19424 |
| she | 0.263753 | 17448 |
| my | 0.257782 | 17053 |
| them | 0.254396 | 16829 |
| we | 0.251494 | 16637 |
| been | 0.250602 | 16578 |
| no | 0.242152 | 16019 |
| me | 0.236710 | 15659 |
| if | 0.235637 | 15588 |
| said | 0.234185 | 15492 |
| there | 0.229787 | 15201 |
| when | 0.223619 | 14793 |
| would | 0.221261 | 14637 |
| more | 0.212901 | 14084 |
| will | 0.181625 | 12015 |
| some | 0.174747 | 11560 |
| what | 0.173795 | 11497 |
| into | 0.172102 | 11385 |
| has | 0.167340 | 11070 |
| could | 0.158754 | 10502 |
| than | 0.158255 | 10469 |
| out | 0.156547 | 10356 |
| then | 0.153720 | 10169 |
| up | 0.153418 | 10149 |
| its | 0.150697 | 9969 |
| man | 0.147553 | 9761 |
| time | 0.145315 | 9613 |
| now | 0.140281 | 9280 |
| two | 0.139133 | 9204 |
| upon | 0.139057 | 9199 |
| these | 0.137984 | 9128 |
| after | 0.136533 | 9032 |
| footnote | 0.135414 | 8958 |
| may | 0.135006 | 8931 |
| only | 0.134855 | 8921 |
| other | 0.133676 | 8843 |
| see | 0.128007 | 8468 |
| such | 0.123321 | 8158 |
| do | 0.123245 | 8153 |
| great | 0.120932 | 8000 |
| very | 0.120086 | 7944 |
| any | 0.120010 | 7939 |
| your | 0.118302 | 7826 |
| about | 0.114689 | 7587 |
| made | 0.113495 | 7508 |
| our | 0.112800 | 7462 |
| well | 0.112724 | 7457 |
| first | 0.112346 | 7432 |
| most | 0.110351 | 7300 |
| like | 0.110154 | 7287 |
| before | 0.109187 | 7223 |
| little | 0.108401 | 7171 |
| himself | 0.105287 | 6965 |
| over | 0.103760 | 6864 |
| without | 0.102868 | 6805 |
| own | 0.102808 | 6801 |
| those | 0.101644 | 6724 |
| good | 0.101266 | 6699 |
| might | 0.101175 | 6693 |
| men | 0.099361 | 6573 |
| can | 0.099331 | 6571 |
| should | 0.098817 | 6537 |
| did | 0.098741 | 6532 |
| where | 0.095824 | 6339 |
| come | 0.095763 | 6335 |
| people | 0.095627 | 6326 |
| must | 0.093450 | 6182 |
| us | 0.093057 | 6156 |
| day | 0.088991 | 5887 |
| long | 0.088825 | 5876 |
| much | 0.088810 | 5875 |
| down | 0.088311 | 5842 |
| same | 0.087676 | 5800 |
| mr | 0.083927 | 5552 |
| never | 0.083579 | 5529 |
| even | 0.083398 | 5517 |
| old | 0.082204 | 5438 |
| under | 0.081327 | 5380 |
| through | 0.080828 | 5347 |
| still | 0.080828 | 5347 |
| while | 0.080314 | 5313 |
| many | 0.080239 | 5308 |
| know | 0.079876 | 5284 |
| every | 0.079196 | 5239 |
| life | 0.078591 | 5199 |
| three | 0.077790 | 5146 |
| how | 0.077759 | 5144 |
| way | 0.077291 | 5113 |
| years | 0.076384 | 5053 |
| came | 0.076354 | 5051 |
| king | 0.074963 | 4959 |
| go | 0.073436 | 4858 |
| being | 0.072318 | 4784 |
| again | 0.070549 | 4667 |
| here | 0.069067 | 4569 |
| make | 0.068629 | 4540 |
| back | 0.068115 | 4506 |
| new | 0.067510 | 4466 |
| against | 0.066437 | 4395 |
| found | 0.065198 | 4313 |
| yet | 0.065031 | 4302 |
| say | 0.064230 | 4249 |
| too | 0.064170 | 4245 |
| last | 0.063157 | 4178 |
| though | 0.063051 | 4171 |
| head | 0.062613 | 4142 |
| away | 0.061948 | 4098 |
| right | 0.061131 | 4044 |
| hand | 0.060693 | 4015 |
| place | 0.060375 | 3994 |
| god | 0.060209 | 3983 |
| another | 0.059136 | 3912 |
| shall | 0.059121 | 3911 |
| country | 0.058894 | 3896 |
| part | 0.058788 | 3889 |
| far | 0.058667 | 3881 |
| left | 0.057624 | 3812 |
| eyes | 0.057534 | 3806 |
| soon | 0.056838 | 3760 |
| went | 0.055961 | 3702 |
| take | 0.055856 | 3695 |
| each | 0.055840 | 3694 |
| just | 0.055311 | 3659 |
| power | 0.055221 | 3653 |
| name | 0.054858 | 3629 |
| am | 0.054344 | 3595 |
| death | 0.054147 | 3582 |
| world | 0.053361 | 3530 |
| nor | 0.053104 | 3513 |
| mind | 0.053104 | 3513 |
| once | 0.052923 | 3501 |
| off | 0.052243 | 3456 |
| among | 0.051699 | 3420 |
| thought | 0.051275 | 3392 |
| whom | 0.050731 | 3356 |
| house | 0.050656 | 3351 |
| get | 0.050625 | 3349 |
| nothing | 0.050595 | 3347 |
| between | 0.050459 | 3338 |
| hundred | 0.050278 | 3326 |
| think | 0.050096 | 3314 |
| both | 0.048978 | 3240 |
| young | 0.048887 | 3234 |
| because | 0.048509 | 3209 |
| saw | 0.048267 | 3193 |
| ever | 0.048055 | 3179 |
| let | 0.047980 | 3174 |
| themselves | 0.047557 | 3146 |
| emperor | 0.047345 | 3132 |
| case | 0.046680 | 3088 |
| work | 0.046241 | 3059 |
| whose | 0.046121 | 3051 |
| war | 0.046075 | 3048 |
| took | 0.045939 | 3039 |
| general | 0.045622 | 3018 |
| city | 0.045607 | 3017 |
| state | 0.045259 | 2994 |
| side | 0.044821 | 2965 |
| things | 0.044684 | 2956 |
| always | 0.044533 | 2946 |
| days | 0.043838 | 2900 |
| thus | 0.043808 | 2898 |
| face | 0.043233 | 2860 |
| night | 0.042946 | 2841 |
| less | 0.042931 | 2840 |
| give | 0.042871 | 2836 |
| asked | 0.042825 | 2833 |
| body | 0.042538 | 2814 |
| also | 0.042311 | 2799 |
| seemed | 0.041843 | 2768 |
| four | 0.041646 | 2755 |
| non | 0.041631 | 2754 |
| son | 0.041586 | 2751 |
| whole | 0.041525 | 2747 |
| called | 0.041193 | 2725 |
| don't | 0.040875 | 2704 |
| however | 0.040437 | 2675 |
| love | 0.040316 | 2667 |
| put | 0.040210 | 2660 |
| thousand | 0.039893 | 2639 |
| hands | 0.039877 | 2638 |
| seen | 0.039817 | 2634 |
| tell | 0.039530 | 2615 |
| almost | 0.039227 | 2595 |
| look | 0.039182 | 2592 |
| father | 0.039122 | 2588 |
| heart | 0.038850 | 2570 |
| few | 0.038850 | 2570 |
| got | 0.038653 | 2557 |
| five | 0.038502 | 2547 |
| nature | 0.038366 | 2538 |
| find | 0.038184 | 2526 |
| public | 0.038094 | 2520 |
| going | 0.038063 | 2518 |
| roman | 0.037912 | 2508 |
| perhaps | 0.037716 | 2495 |
| woman | 0.037610 | 2488 |
| since | 0.037126 | 2456 |
| having | 0.036748 | 2431 |
| arms | 0.036718 | 2429 |
| heard | 0.036703 | 2428 |
| looked | 0.036582 | 2420 |
| age | 0.036552 | 2418 |
| gave | 0.036008 | 2382 |
| why | 0.035992 | 2381 |
| words | 0.035781 | 2367 |
| light | 0.035524 | 2350 |
| better | 0.035312 | 2336 |
| end | 0.034934 | 2311 |
| water | 0.034919 | 2310 |
| twenty | 0.034435 | 2278 |
| until | 0.034360 | 2273 |
| others | 0.034330 | 2271 |
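
The post does not show how these counts were produced, so the following is only a minimal sketch of how a table like the one above could be built. The tokenizer is an assumption (lowercase runs of letters, with internal apostrophes kept so that a form like "don't" stays a single word), and the names `WORD_RE`, `percent_of_total`, and `frequency_table` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer (not from the original post): lowercase runs of letters,
# allowing internal apostrophes so contractions such as "don't" stay together.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def percent_of_total(count, total):
    """Share of all word occurrences, expressed as a percentage."""
    return 100.0 * count / total


def frequency_table(text, top_n=250):
    """Return (rows, total); rows are (word, percent, count), most common first."""
    counts = Counter(WORD_RE.findall(text.lower()))
    total = sum(counts.values())
    rows = [
        (word, percent_of_total(count, total), count)
        for word, count in counts.most_common(top_n)
    ]
    return rows, total


if __name__ == "__main__":
    # Tiny illustrative input; a real run would read the combined book text.
    sample = "The cat and the dog. Don't worry, the end."
    rows, total = frequency_table(sample, top_n=3)
    for word, pct, count in rows:
        print(f"| {word} | {pct:.6f} | {count} |")
```

The percentage column is simply the count divided by the total number of words. With the totals listed above, `percent_of_total(519068, 6615271)` gives about 7.8465, which agrees with the row for "the".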