I have three .txt files, one for each of the three gene enrichment categories, which I downloaded from the GO platform, and I can't read them into R. I think it's due to the inconsistent columns.
First I tried using skip:
BP_results <- read.table("Data/analysisBP.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, skip = 10, fill = TRUE)
It didn't work, so I tried converting the file to .csv and then separating the data into columns, but instead it put each word of the categories into its own column. I think the problem lies in the inconsistent columns of the .txt files I downloaded from GO. I also looked for other options to download this data as a different file type from GO, but I'm unfamiliar with the XML and JSON options. How can I fix this? Do I have to change the files manually?
Any help is appreciated, thanks.
Asked Feb 5 at 6:10 by Julieta González (edited Feb 5 at 6:59 by jay.sf)
1 Answer
Everything is good. I just changed the skip argument to 11 and it worked. I used the example file from the Gene Ontology webpage:
read.table("DATA/analysis.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, skip = 11, fill = TRUE)
And maybe you can keep it simple:
read.delim("DATA/analysis.txt", stringsAsFactors = FALSE, skip = 11)
Or if you have readr installed, from tidyverse:
readr::read_tsv("DATA/analysis.txt", skip = 11)
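If the right value for skip differs between exports, one option is to locate the header line instead of hard-coding the number. The following is only a sketch: it assumes the metadata lines contain fewer tab characters than the column header line, which I have not verified for every GO export. The helper names find_header_line and read_go_export are made up for this example, and the demo file contents are invented, not real GO output.

```r
# Hypothetical helper: return the index of the first line with the most tabs.
# Assumption: the column header line is wider than any metadata line above it.
find_header_line <- function(lines) {
  n_tabs <- lengths(regmatches(lines, gregexpr("\t", lines, fixed = TRUE)))
  which.max(n_tabs)  # which.max() returns the first maximum
}

# Hypothetical helper: read the table starting at the detected header line.
read_go_export <- function(path) {
  lines <- readLines(path, warn = FALSE)
  header_idx <- find_header_line(lines)
  read.delim(text = lines[header_idx:length(lines)],
             stringsAsFactors = FALSE, quote = "")
}

# Invented demo file: two metadata lines, a blank line, then a 3-column table.
demo <- c(
  "Analysis Type:\tDEMO",
  "Test Type:\tFISHER",
  "",
  "term\tcount\tfdr",
  "cellular process (GO:0009987)\t12\t0.01",
  "metabolic process (GO:0008152)\t7\t0.20"
)
tmp <- tempfile(fileext = ".txt")
writeLines(demo, tmp)

res <- read_go_export(tmp)
str(res)
```

To check by eye which line holds the headers in your own file, readLines("Data/analysisBP.txt", n = 15) prints the first 15 lines so you can count them.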
Comments:
skip= to skip the meta data lines. If it still doesn't work, maybe you are skipping the wrong number of lines; try different values, e.g. skip=11. – jay.sf Commented Feb 5 at 7:01
skip=10 or skip=11? Why? It seems to me that it should be skip=7, and that the 7th text line is the column headers line, after some parsing (in which case it should be header=FALSE). – Rui Barradas Commented Feb 5 at 10:47