json - Javascript remove leading and trailing spaces from multiline string and replace the rest of whitespace chunks with commas

How can I convert this text

data=`ID   ra      dec     V       VR      MJD
  100     30.1  +15     7.00    -10     2450000.1234
200   30.2      +16     12.226  -5.124  2450000.2345
   300  30.3     +17    13.022  12.777    2450000.3456


400      30.4  +18     14.880  13.666  2450000.6789
500 30.5        +19 12.892      -1.835  2450001
 600     30.6    +20     17.587  15.340  2450002.123
700     30.7    +21       13.984  13.903  2450000.123456 
800    30.8    +22     20.00   10.000  2450003.0     `

i.e. an imported text with multiple lines and columns separated by spaces and tabs, into this:

ID,ra,dec,V,VR,MJD
100,30.1,+15,7.00,-10,2450000.1234
200,30.2,+16,12.226,-5.124,2450000.2345
300,30.3,+17,13.022,12.777,2450000.3456


400,30.4,+18,14.880,13.666,2450000.6789
500,30.5,+19,12.892,-1.835,2450001
600,30.6,+20,17.587,15.340,2450002.123
700,30.7,+21,13.984,13.903,2450000.123456
800,30.8,+22,20.00,10.000,2450003.0

Unfortunately,

  • this regex data=data.replace(/^\s+|\s+$/g,'').replace(/[\t \r]+/g,','); only works with the first line,
  • this one data.replace(/[^\S\r\n]+$/gm, "").replace(/[\t \r]+/g,','); is OK, but it only handles trailing whitespace.

Extra: How can I transform it into JSON that separates the two blocks into two datasets, such as [[{id:..., ra:...},{},{}],[{id:..., ra:...},{},{}]]?

asked Jun 10, 2016 at 17:54 by leonard vertighel; edited Jun 8, 2022 at 8:58 by Wiktor Stribiżew
  • are there spaces between the values or tabs? – Nina Scholz Commented Jun 10, 2016 at 17:59
  • Thank you for your quick comment. They can be one or multiple spaces or tabs. – leonard vertighel Commented Jun 10, 2016 at 18:00
  • do they have the same meaning (as separating)? what is with sparsed values? – Nina Scholz Commented Jun 10, 2016 at 18:02
  • no, they are just a way to separate data columns – leonard vertighel Commented Jun 10, 2016 at 18:07
  • Your 'Extra' is a separate question. – 1983 Commented Jun 10, 2016 at 18:50

4 Answers

The string conversion might be easier with split/join and trim:

data
    .split(/\r?\n/)
    .map(row => row.trim().split(/\s+/).join(','))
    .join('\n')
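
As a quick check, here is a minimal sketch that wraps the chain above in a function and runs it on a small sample. The names `toCsv` and `csvSample` are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper name; the body is the split/trim/join chain shown above.
function toCsv(text) {
  return text
    .split(/\r?\n/)                                   // one entry per line
    .map(row => row.trim().split(/\s+/).join(','))    // trim, then collapse whitespace runs
    .join('\n');                                      // blank lines stay blank
}

// Made-up sample with mixed spaces, a tab, and one blank line.
const csvSample = 'ID   ra\tdec\n  100  30.1   +15  \n\n200\t30.2 +16';
console.log(toCsv(csvSample));
// ID,ra,dec
// 100,30.1,+15
//
// 200,30.2,+16
```

Because each line is handled separately, the blank lines between the two blocks survive as empty lines.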

The extra credit is a little more involved. :)

const rows = data.split(/\r?\n/).map(row => row.trim().split(/\s+/).join(','));
const keys = rows.shift().split(',');
const chunks = rows.join("\n").split(/\n{2,}/);

const output = chunks.map(chunk => chunk.split("\n").map(
    row => row.split(',').reduce((obj, v, i) => {
        obj[keys[i]] = v;
        return obj;
    }, {})
));
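
If the input may end with a newline or have a varying number of blank lines between blocks, a line-by-line variant might be a bit more forgiving. This is only a sketch under those assumptions: `parseBlocks` is a hypothetical name, the first non-empty line is assumed to be the header, and values are left as strings (so "+15" and "7.00" keep their formatting).

```javascript
// Hypothetical helper: parse whitespace-separated text into blocks of objects.
// Assumes the first non-empty line is the header and blank lines separate blocks.
function parseBlocks(text) {
  const lines = text.split(/\r?\n/).map(line => line.trim());
  const headerIndex = lines.findIndex(line => line !== '');
  if (headerIndex === -1) return [];
  const keys = lines[headerIndex].split(/\s+/);

  const blocks = [];
  let current = [];
  for (const line of lines.slice(headerIndex + 1)) {
    if (line === '') {
      // A blank line closes the current block, if it has any rows.
      if (current.length > 0) {
        blocks.push(current);
        current = [];
      }
      continue;
    }
    const values = line.split(/\s+/);
    current.push(Object.fromEntries(keys.map((key, i) => [key, values[i]])));
  }
  if (current.length > 0) blocks.push(current);
  return blocks;
}

const parsed = parseBlocks('ID ra\n 100  30.1\n200\t30.2 \n\n\n300 30.3\n');
console.log(JSON.stringify(parsed));
// [[{"ID":"100","ra":"30.1"},{"ID":"200","ra":"30.2"}],[{"ID":"300","ra":"30.3"}]]
```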

You're nearly there. You want the multiline flag on the first replace, but don't replace \n, so don't use \s. Use [ \t] instead:

var data = 'ID   ra      dec     V       VR      MJD\n' +
        '  100     30.1  +15     7.00    -10     2450000.1234\n' +
        '200   30.2      +16     12.226  -5.124  2450000.2345\n' +
        '   300  30.3     +17    13.022  12.777    2450000.3456\n' +
        '\n' +
        '\n' +
        '400      30.4  +18     14.880  13.666  2450000.6789\n' +
        '500 30.5        +19 12.892      -1.835  2450001\n' +
        ' 600     30.6    +20     17.587  15.340  2450002.123\n' +
        '700     30.7    +21       13.984  13.903  2450000.123456\n' +
        '800    30.8    +22     20.00   10.000  2450003.0     \n'

var result = data.replace(/^[ \t]+|[ \t]+$/gm,'').replace(/[ \t]+/g,',')
console.log(result);
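
To see why \s is a problem once the multiline flag is on, here is a small comparison on a made-up string (`tiny` is just an illustrative name). With \s the trimming also eats the newlines around the blank line, so two rows get glued together; with [ \t] the line structure is kept.

```javascript
// Made-up sample: two rows separated by one blank line.
const tiny = '  a  b  \n\n  c\td  ';

// \s also matches "\n", so trimming per line swallows the line breaks.
const withS = tiny.replace(/^\s+|\s+$/gm, '').replace(/[ \t]+/g, ',');

// [ \t] only matches spaces and tabs, so the newlines survive.
const withBlanks = tiny.replace(/^[ \t]+|[ \t]+$/gm, '').replace(/[ \t]+/g, ',');

console.log(JSON.stringify(withS));      // "a,bc,d"
console.log(JSON.stringify(withBlanks)); // "a,b\n\nc,d"
```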

// First: the trimming part. Split on newlines, process
// each line by trimming it and replacing remaining white
// space with commas
var data = 'ID   ra      dec     V       VR      MJD\n\
  100     30.1  +15     7.00    -10     2450000.1234\n\
200   30.2      +16     12.226  -5.124  2450000.2345\n\
   300  30.3     +17    13.022  12.777    2450000.3456\n\
\n\
\n\
400      30.4  +18     14.880  13.666  2450000.6789\n\
500 30.5        +19 12.892      -1.835  2450001\n\
 600     30.6    +20     17.587  15.340  2450002.123\n\
700     30.7    +21       13.984  13.903  2450000.123456 \n\
800    30.8    +22     20.00   10.000  2450003.0     ';

data = data.split('\n');
var i = 0, l = data.length;
for ( ; i < l; i++)
    data[i] = data[i].trim().replace(/\s+/g,',');
data = data.join('\n');
document.write('<h1>Formatted data string</h1><pre><code>'+data+'</code></pre>');

// Now to turn it into objects.
// We'll strip the first line because
// that'll be the list of column names:
var cols = data.match(/^[^\n]+/)[0].split(','),
    columnCount = cols.length;
data = data.replace(/^[^\n]+\n/,'');

// Now separate the 2 datasets
var datasets = data.split('\n\n\n');
document.write('<h1>First dataset</h1><pre><code>'+datasets[0]+'</code></pre>');
document.write('<h1>Second dataset</h1><pre><code>'+datasets[1]+'</code></pre>')

// Now we go through each line and
// place the values into objects which
// we'll push to an array
var processed = [];
i = 0;
l = datasets.length;
for ( ; i < l; i++){
    processed[i] = [];
    var lines = datasets[i].split('\n'),
        lineCount = lines.length;
    for (var j = 0; j < lineCount; j++){
        var dataArray = lines[j].split(','),
            obj = {};
        for (var k = 0; k < columnCount; k++)
            obj[cols[k]] = dataArray[k];
        processed[i].push(obj);
    }
}
var finalJSON = JSON.stringify(processed);
document.write('<h1>Final JSON</h1><pre><code>'+finalJSON+'</code></pre>');

So, since you know the exact format of each line, you can use capture groups on a per-line basis to extract the details. Try something like this:

/^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/mg

Remember that \s matches all whitespace, while \S matches non-whitespace. You may need to tweak the capture groups to your liking, if necessary. Then, using the multiline and global flags, we are all set up to iterate over all the matches.

Here's the code:

// Your data, with the header removed, formatted as a string literal:
var data = "100     30.1  +15     7.00    -10     2450000.1234\n"+
"200   30.2      +16     12.226  -5.124  2450000.2345\n"+
"   300  30.3     +17    13.022  12.777    2450000.3456\n"+
"\n"+
"\n"+
"400      30.4  +18     14.880  13.666  2450000.6789\n"+
"500 30.5        +19 12.892      -1.835  2450001\n"+
" 600     30.6    +20     17.587  15.340  2450002.123\n"+
"700     30.7    +21       13.984  13.903  2450000.123456 \n"+
"800    30.8    +22     20.00   10.000  2450003.0";

// The pattern to grab the data:
var data_pattern = /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/mg;

// Keep matching until we run out of lines that match...
var results = [];
var line_match;
while ((line_match = data_pattern.exec(data)) !== null){
    // Parse the match into a json object and add it to the results.
    results.push({
        ID: line_match[1],
        ra: line_match[2],
        dec: line_match[3],
        V: line_match[4],
        VR: line_match[5],
        MJD: line_match[6]
    });
}

// Output the results.
console.log(JSON.stringify(results, null, 2));

And here are the results on the console:

[
  {
    "ID": "100",
    "ra": "30.1",
    "dec": "+15",
    "V": "7.00",
    "VR": "-10",
    "MJD": "2450000.1234"
  },
  {
    "ID": "200",
    "ra": "30.2",
    "dec": "+16",
    "V": "12.226",
    "VR": "-5.124",
    "MJD": "2450000.2345"
  },
  {
    "ID": "300",
    "ra": "30.3",
    "dec": "+17",
    "V": "13.022",
    "VR": "12.777",
    "MJD": "2450000.3456"
  },
  {
    "ID": "400",
    "ra": "30.4",
    "dec": "+18",
    "V": "14.880",
    "VR": "13.666",
    "MJD": "2450000.6789"
  },
  {
    "ID": "500",
    "ra": "30.5",
    "dec": "+19",
    "V": "12.892",
    "VR": "-1.835",
    "MJD": "2450001"
  },
  {
    "ID": "600",
    "ra": "30.6",
    "dec": "+20",
    "V": "17.587",
    "VR": "15.340",
    "MJD": "2450002.123"
  },
  {
    "ID": "700",
    "ra": "30.7",
    "dec": "+21",
    "V": "13.984",
    "VR": "13.903",
    "MJD": "2450000.123456"
  },
  {
    "ID": "800",
    "ra": "30.8",
    "dec": "+22",
    "V": "20.00",
    "VR": "10.000",
    "MJD": "2450003.0"
  }
]
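
The same per-line idea could also be sketched with String.prototype.matchAll (ES2020) instead of a manual exec loop. This is an alternative sketch, not the code above: it uses only three columns for brevity, the names `shortSample`, `rowPattern` and `rowObjects` are made up, and it uses [ \t] between columns so a short row cannot borrow values from the next line.

```javascript
// Made-up three-column sample with a blank line and a trailing newline.
const shortSample = '100  30.1  +15\n\n  200\t30.2   +16  \n';

// One capture group per column; [ \t] keeps each match on a single line.
const rowPattern = /^[ \t]*(\S+)[ \t]+(\S+)[ \t]+(\S+)[ \t]*$/gm;

// matchAll requires the g flag and yields one match array per row.
const rowObjects = [...shortSample.matchAll(rowPattern)]
  .map(([, ID, ra, dec]) => ({ ID, ra, dec }));

console.log(JSON.stringify(rowObjects));
// [{"ID":"100","ra":"30.1","dec":"+15"},{"ID":"200","ra":"30.2","dec":"+16"}]
```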

I hope this helped.
