Subject: Re: [R-sig-eco] reading large files in R
From: Claudia liliana Ballesteros Mejia (lail@yahoo.com)
Date: Oct 20, 2009 12:59:08 am
List: org.r-project.r-sig-ecology

Dear List,

Thanks a lot, I was able to solve my problem. I had not changed the type of
file it was reading, which is why it gave a strange error.

Cheers, and thanks again

Liliana.

________________________________
From: Steve Friedman <frie@gmail.com>
Sent: Sat, October 17, 2009 12:25:44 AM
Subject: Re: [R-sig-eco] reading large files in R

Claudia,

The file you have described is probably too large for R to handle. What OS are
you working with? It matters. Also, do you need the whole file, or can you read
in a portion of the file and process the data in logical geographical "zones"?

If you have a file spdiez.txt, this should work, provided that you want the
header and the columns are in fact comma-separated:

spdiez <- read.table("spdiez.txt", header = TRUE, sep = ",")
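Reading "a portion of the file" at a time could look roughly like the sketch below. It is only an illustration under stated assumptions: the file is comma-separated with a header row, every column is numeric, and a running per-column sum stands in for whatever per-chunk processing is actually needed. The small temporary file, chunk_size, and the variable names are hypothetical and not taken from the original data.

```r
# Hypothetical small stand-in for spdiez.txt so the sketch runs anywhere.
path <- tempfile(fileext = ".txt")
demo <- data.frame(x = 1:10, y = (1:10) / 2, z = (1:10)^2)
write.table(demo, path, sep = ",", row.names = FALSE, quote = FALSE)

chunk_size <- 4  # assumption: something like 1e5 would suit a real file

con <- file(path, open = "r")
# Consume the header line once; later reads continue from the next line.
col_names <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

n_rows <- 0
col_sums <- setNames(numeric(length(col_names)), col_names)

repeat {
  chunk <- tryCatch(
    read.table(con, header = FALSE, sep = ",", nrows = chunk_size,
               col.names = col_names, colClasses = "numeric"),
    error = function(e) NULL  # read.table errors once no lines are left
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  # Process the chunk here; only the running totals are kept in memory.
  n_rows <- n_rows + nrow(chunk)
  col_sums <- col_sums + colSums(chunk)
}
close(con)

print(n_rows)
print(col_sums)
```

Because the connection stays open between calls, each read.table() picks up where the previous one stopped, so only one chunk is held in memory at a time. Passing colClasses also spares read.table() from guessing the column types.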

Steve

Dear list,

I'm modeling the spatial distributions of some species of butterflies and I
want to work with the BIOMOD package. But I have a very large file (1.25 GB)
with 5925284 rows and 28 columns. When I try to load it with read.table it
says:

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  cannot allocate buffer in 'readTableHead'.
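For scale, here is a back-of-the-envelope estimate of the memory the parsed table would need. It assumes all 28 columns are stored as 8-byte doubles, which is an assumption: the thread does not state the column types.

```r
n_rows <- 5925284   # rows reported in the message
n_cols <- 28        # columns reported in the message
bytes_per_value <- 8  # assumption: every column is a double

bytes <- n_rows * n_cols * bytes_per_value
gib <- bytes / 2^30

cat(sprintf("about %.2f GiB for the parsed values alone\n", gib))
```

This covers only the final values. read.table typically needs noticeably more than this while parsing, because it holds intermediate copies of the data.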

So I tried to use the code from "Using R to process large data files",
published in @CSC
(http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_large_data/),
but I can't get it right. Here is my code.

"spdiez.txt" is my file, and they suggest creating a matrix, dropping the
column and row names.

length(scan("spdiez.txt", nlines=1, sep="\t", what="character"))

m <- matrix(nrow=5925283, ncol=27)
filecon <- file("spdiez.txt", open="r")
pos <- seek(filecon, rw="r")

for (i in 1:5925283) {
  if (i %% 100 == 0) {
    print(i)
  }
  tt <- readLines(filecon, n=1)
  tt2 <- na.omit(as.numeric(unlist(strsplit(tt, "\t"))))
  if (i != 1) {
    m[(i-1),] <- t(tt2)
  }
  pos <- seek(filecon, rw="r")
}

but after this, it throws this error

Error in m[(i - 1), ] <- t(tt2) : replacement has length zero
In addition: Warning messages:
1: closing unused connection 3 (spdiez.txt)
2: In na.omit(as.numeric(unlist(strsplit(tt, "\t")))) : NAs introduced by coercion
3: In na.omit(as.numeric(unlist(strsplit(tt, "\t")))) : NAs introduced by coercion
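One plausible reading of this error, assuming spdiez.txt is in fact comma-separated as the reply in this thread suggests: splitting on "\t" leaves each line as a single string, as.numeric() turns that string into NA (hence the coercion warnings), na.omit() drops it, and the assignment then receives a zero-length vector. The sketch below reproduces this on a made-up line; the values are hypothetical and not from the real file.

```r
line <- "1.5,2.0,3.25"  # hypothetical comma-separated data line

# Splitting on tabs finds no separator, so the whole line is one string.
wrong <- suppressWarnings(
  na.omit(as.numeric(unlist(strsplit(line, "\t"))))
)
print(length(wrong))  # nothing is left after na.omit()

# Splitting on the separator the file really uses gives one value per column.
right <- as.numeric(unlist(strsplit(line, ",", fixed = TRUE)))

m <- matrix(nrow = 1, ncol = 3)
m[1, ] <- right  # works: three values for three columns

# Assigning the zero-length vector fails the same way as in the message.
res <- tryCatch({
  m[1, ] <- wrong
  "ok"
}, error = function(e) conditionMessage(e))
print(res)
```

If that is the cause, changing the separator in strsplit() (and in the scan() call that counts the columns) to match the file should be enough for the row assignment to go through.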

I would appreciate any help or idea that I can use to solve my problem.

Kind regards, and thanks in advance for any suggestion.

Liliana.

--------------------------------------
Liliana Ballesteros Mejia
PhD. Student
Institute of Biogeography
University of Basel
St. Johanns Vorstadt 10
CH 4056 Basel
Tel: +41-612670803
Switzerland

[[alternative HTML version deleted]]

R-si@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
