export to csv - My .txt file can't be read by R for Gene Ontology analysis - Stack Overflow

I have three .txt files, one for each of the three categories of gene enrichment, that I downloaded from the GO platform. R can't read them, and I think it's due to inconsistent columns.

First I tried using skip:

BP_results <- read.table("Data/analysisBP.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, skip = 10, fill = TRUE)

It didn't work, so I tried converting the file to .csv and then separating the data into columns, but instead it split each word of the categories into its own column. I think the problem lies in the inconsistent columns of the .txt files I downloaded from GO. I also checked whether there are other options for downloading this data in a different file type from GO, but I'm unfamiliar with the XML and JSON options. How can I fix this? Do I have to change the files manually?

Any help is appreciated, thanks.

Asked Feb 5 at 6:10 by Julieta González; edited Feb 5 at 6:59 by jay.sf.
  • If it's tab-separated as it looks like, there's nothing inconsistent, it's just not really human readable. In your image you didn't use skip= to skip the metadata lines. If it still doesn't work maybe you skip the wrong number of lines, try with different parameters, e.g. skip=11. – jay.sf Commented Feb 5 at 7:01
  • The file seems corrupted, you can see some rows have the newline in the wrong place. The header is at the 7th row. Please paste the first 20 rows of the file as text, not as an image. Or if it is public data, provide a web link. – zx8754 Commented Feb 5 at 8:01
  • @zx8754 I think you've been misled and it's just a line wrap. – jay.sf Commented Feb 5 at 9:38
  • @jay.sf check the screenshot, after read.table, it is a newline. In any case, OP must give us example text file. – zx8754 Commented Feb 5 at 9:52
  • skip=10 or skip=11? Why? It seems to me that it should be skip=7. And that the 7th text line is the column headers line, after some parsing. (in which case it should be header=FALSE). – Rui Barradas Commented Feb 5 at 10:47

1 Answer

Everything is good. I just changed the skip argument to 11 and it worked. I used the example file from the Gene Ontology webpage:

read.table("DATA/analysis.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, skip = 11, fill = TRUE)

And maybe you can keep it simple:

read.delim("DATA/analysis.txt", stringsAsFactors = FALSE, skip = 11)

Or if you have readr installed, from tidyverse:

readr::read_tsv("DATA/analysis.txt", skip = 11)
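
If guessing the right skip value is the sticking point, one option is to find the header line programmatically. The following is only a sketch under assumptions: the temporary file is a made-up stand-in for a GO export (its metadata lines and the column names `GO term`, `count` and `p_value` are hypothetical), and the pattern `"^GO term"` must be replaced with text that actually starts the header line of your file. It also uses `count.fields()` to check whether the columns really are inconsistent.

```r
# Hypothetical stand-in for a GO enrichment export: a few metadata lines,
# then a tab-separated header and data rows. The real layout may differ.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Analysis Type:\tExample overrepresentation test",
  "Annotation Version:\tplaceholder",
  "",
  "GO term\tcount\tp_value",
  "term A (GO:0000001)\t12\t0.001",
  "term B (GO:0000002)\t7\t0.04"
), path)

# Locate the header by text we expect at the start of that line
# (an assumption: adapt the pattern to your own file).
lines <- readLines(path)
header_line <- grep("^GO term", lines)[1]

# Skip everything before the header so it is the first line read.
go <- read.delim(path, skip = header_line - 1, stringsAsFactors = FALSE)

# Sanity check: count the tab-separated fields on each line from the
# header onwards; a single unique value means the columns are consistent.
n_fields <- count.fields(path, sep = "\t", skip = header_line - 1, quote = "")
print(unique(n_fields))

str(go)
```

With a real export, `readLines(path, n = 20)` is a quick way to look at the top of the file as text and decide which pattern identifies the header.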