I am using the Calibre eBook Mgr (PERL) to import a large number of eBooks. When the eBooks are imported, some of the author names (or variations of them) also end up in the "series" column. I am trying to identify some or most of the series entries that contain author names so that they can be deleted. I have been able to identify matches when the author and series entries are identical, but I have failed whenever the two columns differ.
Sample of combined Author & Series Columns (",-" is delimiter)
- Author ",-" Series
Christie Golden,-Golden, Christie
Talia Beckett,Andrew Bellingham,Jess Mountifield,-Andrew Bellingham
L T Ryan,C R Gray,-C.R. Gray
Hanson, Damien,Phelps, Joseph,-Damien Hanson
Martha Carr,Michael Anderle,-martha r carr
The solution for matching when the author and series are equal is simple enough:
^(.*?),-(\1)$
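For reference, the exact-match statement can be tried outside Calibre with a few lines of Python. This is only a sketch: it assumes Python's `re` module handles the pattern the same way Calibre's search does, and the row in `identical_row` is made up for illustration (it is not from my library). The other rows are the samples listed above.

```python
import re

# Exact-match statement: the series must repeat the author text verbatim.
EXACT = re.compile(r"^(.*?),-(\1)$")

# Hypothetical row where author and series are identical (illustration only).
identical_row = "Christie Golden,-Christie Golden"

# The sample rows from the question; every one differs between the columns.
sample_rows = [
    "Christie Golden,-Golden, Christie",
    "Talia Beckett,Andrew Bellingham,Jess Mountifield,-Andrew Bellingham",
    "L T Ryan,C R Gray,-C.R. Gray",
    "Hanson, Damien,Phelps, Joseph,-Damien Hanson",
    "Martha Carr,Michael Anderle,-martha r carr",
]

exact_hit = EXACT.match(identical_row)
sample_hits = [row for row in sample_rows if EXACT.match(row)]

print("identical row matches:", exact_hit is not None)
print("sample rows matched:", len(sample_hits))
```

The identical row matches and none of the five sample rows do, which is the gap I am trying to close.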
I have made several attempts at partially matching the data, and they all failed. The statement below is just a simplified version to see whether the concept would work; it didn't. I tried 3 different regex debuggers and still don't understand why the statement failed :-(
^(.*?)\W?\s?(.*?)\W?\s?(.*?)?,-((\1|\2)).*?$
Christie Golden,-Golden, Christie
Expected \2 (Golden) to match the (Golden) at the start of the series
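One way to see what the engine actually does with this statement is to print every captured group. The sketch below assumes Python's `re` module treats the pattern the same way as the engine inside Calibre, which may not hold exactly.

```python
import re

# The simplified partial-match statement, copied unchanged.
PARTIAL = re.compile(r"^(.*?)\W?\s?(.*?)\W?\s?(.*?)?,-((\1|\2)).*?$")

row = "Christie Golden,-Golden, Christie"
m = PARTIAL.match(row)

if m is None:
    print("no match")
else:
    # Show each numbered group so empty captures are easy to spot.
    for number, text in enumerate(m.groups(), start=1):
        print(f"group {number}: {text!r}")
```

Under that assumption the statement does match, but not in the way I expected. Groups 1 and 2 come back empty: each lazy `(.*?)` is satisfied with zero characters, every `\W?` and `\s?` is optional, and the third group absorbs the whole author string "Christie Golden". The back-reference `\1` then matches an empty string right after the delimiter, so nothing ever forces \2 to become Golden.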
I've spent quite a bit of time trying to figure this out. To be honest, it is now more about an opportunity to gain some interesting knowledge than about solving this particular problem.