I'm trying to figure out why the code below doesn't work correctly. With it, I want to find every "time" string in the file 1.html, such as:
<td align="center"><p>02:04:37.472</p></td>
<td align="center"><p>02:04:38.208</p></td>
<td align="center"><p>02:04:38.242</p></td>
I'm stuck and can't use the GNU version of awk. I would be grateful for help repairing the code. Thank you.
#!/bin/bash
_subtitles_getSubtitlesForUrl() {
local awkCode2=
read -r -d "" awkCode2 << 'SEARCHFORIDSAWKEOF'
BEGIN {
fileSize = 0
fps = 0
time = 0
}
/td align="center"><p>0/ {
isTimeMatched = match($0, /td align="center"><p>0[^0-9]*([0-9\.]+)/)
if (isTimeMatched) {
time = substr($0, RSTART + 22, RLENGTH - 16)
}
}
/Rozmiar pliku/ {
isFileSizeMatched = match($0, /Rozmiar pliku:[^0-9]*([0-9\.]+)/)
if (isFileSizeMatched) {
fileSize = substr($0, RSTART + 18, RLENGTH - 14)
}
}
/Video FPS/ {
isFpsMatched = match($0, /Video FPS:[^0-9]*([0-9\.]+)/)
if (isFpsMatched) {
fps = substr($0, RSTART + 15, RLENGTH - 15)
}
}
/napiprojekt:/ {
isHrefMatched = match($0, /href="(napiprojekt:[^"]*)"/)
if (isHrefMatched) {
printf("%7s | fps: %6s |%10s | %s\n",time ,fps ,fileSize , substr($0, RSTART + 6, RLENGTH - 7))
}
}
SEARCHFORIDSAWKEOF
cat 1.html | busybox awk "$awkCode2"
}
_subtitles_getSubtitlesForUrl
File 1.html:
<tr title="<b>Autor:</b> Brak (dodał: macieju6)<br><b>Rozmiar pliku:</b> 23.48 GiB (25211536687 bajtów)<br><b>Ogólne bitrate pliku:</b> 27.0 Mbps<br><br><b>Video FPS:</b> 23.976<br><b>Video kodek:</b> MPEG-4<br><b>Video bitrate:</b> 25.0 Mbps<br><b>Video rozdzielczość:</b> 3840x1608<br><b>Video rozmiar:</b> 21.8 GiB (93%)<br><b>Video proporcje obrazu:</b> 2.40:1<br><br><b>Audio format:</b> E-AC-3 (Audio Coding 3)<br><b>Audio bitrate:</b> 960 Kbps<br><b>Audio liczba kanałów:</b> 6<br><b>Audio sampling rate:</b> 48.0 KHz<br><b>Audio resolution:</b> 16 bits<br><b>Audio rozmiar:</b> 855 MiB (4%)<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19">Godzilla Minus One.mp4</a></p></td>
<td align="center"><p>23.48 GiB</p> </td>
<td align="center"><p>23.976 </p></td>
<td align="center"><p>02:04:37.472</p></td>
<td align="center"><p>Brak</p></td>
<td align="center"><p>2025-01-17</p></td>
<td align="center"><p>10</p></td>
</tr>
<tr title="<b>Autor:</b> Victor Delacroix<br><b>Rozmiar pliku:</b> 24.93 GiB (26772498718 bajtów)<br><b>Ogólne bitrate pliku:</b> 28.6 Mbps<br><br><b>Video FPS:</b> 23.976<br><b>Video kodek:</b> Matroska<br><b>Video bitrate:</b> <br><b>Video rozdzielczość:</b> 1920x1080<br><b>Video rozmiar:</b> <br><b>Video proporcje obrazu:</b> 16:9<br><br><b>Audio format:</b> TrueHD<br><b>Audio bitrate:</b> <br><b>Audio liczba kanałów:</b> 8<br><b>Audio sampling rate:</b> 48.0 KHz<br><b>Audio resolution:</b> <br><b>Audio rozmiar:</b> <br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:ee2581c8ed39680c0851ac340f868d61">Godzilla Minus One.mkv</a></p></td>
<td align="center"><p>24.93 GiB</p> </td>
<td align="center"><p>23.976 </p></td>
<td align="center"><p>02:04:38.208</p></td>
<td align="center"><p>Victor Delacroix</p></td>
<td align="center"><p>2024-08-18</p></td>
<td align="center"><p>22</p></td>
</tr>
<tr title="<b>Autor:</b> Brak (dodał: kossa88)<br><b>Rozmiar pliku:</b> 951.9 MiB (998147235 bajtów)<br><b>Ogólne bitrate pliku:</b> 1 068 Kbps<br><br><b>Video FPS:</b> 23.976<br><b>Video kodek:</b> MPEG-4<br><b>Video bitrate:</b> 900 Kbps<br><b>Video rozdzielczość:</b> 720x300<br><b>Video rozmiar:</b> 806 MiB (85%)<br><b>Video proporcje obrazu:</b> 2.40:1<br><br><b>Audio format:</b> AAC (Advanced Audio Codec)<br><b>Audio bitrate:</b> 160 Kbps<br><b>Audio liczba kanałów:</b> 2<br><b>Audio sampling rate:</b> 48.0 KHz<br><b>Audio resolution:</b> <br><b>Audio rozmiar:</b> 143 MiB (15%)<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:ab8edd9c56debfa9b66be98fabff8968">Godzilla Minus One.mp4</a></p></td>
<td align="center"><p>951.9 MiB</p> </td>
<td align="center"><p>23.976 </p></td>
<td align="center"><p>02:04:38.242</p></td>
<td align="center"><p>Brak</p></td>
<td align="center"><p>2024-07-10</p></td>
<td align="center"><p>4</p></td>
</tr>
The result I get is:
$ ./napi_test.sh
0 | fps: 23.976 | 23.48 GiB | napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19
2:04:37 | fps: 23.976 | 24.93 GiB | napiprojekt:ee2581c8ed39680c0851ac340f868d61
2:04:38 | fps: 23.976 | 951.9 MiB | napiprojekt:ab8edd9c56debfa9b66be98fabff8968
As you can see, the time values are shifted one line down.
What I want is:
2:04:37 | fps: 23.976 | 23.48 GiB | napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19
2:04:38 | fps: 23.976 | 24.93 GiB | napiprojekt:ee2581c8ed39680c0851ac340f868d61
2:04:38 | fps: 23.976 | 951.9 MiB | napiprojekt:ab8edd9c56debfa9b66be98fabff8968
asked Feb 1 at 23:40 by D S; edited Feb 2 at 3:53 by Barmar
2 Answers
You're generating output when you match on napiprojekt:, but at that point you haven't yet matched the corresponding td align="center"><p>0 line. The net result is that each time value is only displayed by the printf of the following block (i.e., time is 'shifted' by one block), while the first row prints the initial 0 set in BEGIN.
Consider capturing the napiprojekt: data in a variable and then generating your output when you match on td align="center"><p>0:
######### /td align="center"><p>0/
#
# replace this:
time = substr($0, RSTART + 22, RLENGTH - 16)
# with this:
printf("%7s | fps: %6s |%10s | %s\n",substr($0, RSTART + 22, RLENGTH - 16) ,fps ,fileSize , napi_data)
######### /napiprojekt:/
#
# replace this:
printf("%7s | fps: %6s |%10s | %s\n",time ,fps ,fileSize , substr($0, RSTART + 6, RLENGTH - 7))
# with this:
napi_data = substr($0, RSTART + 6, RLENGTH - 7)
After making these two sets of changes the code generates:
2:04:37 | fps: 23.976 | 23.48 GiB | napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19
2:04:38 | fps: 23.976 | 24.93 GiB | napiprojekt:ee2581c8ed39680c0851ac340f868d61
2:04:38 | fps: 23.976 | 951.9 MiB | napiprojekt:ab8edd9c56debfa9b66be98fabff8968
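For reference, here is a sketch of the whole awk program with both changes applied, so it can be run in one go. Assumptions not in the original: the AWK variable (defaults to plain awk; set AWK="busybox awk" to match the question), the hypothetical function name print_rows, and a trimmed copy of the question's rows written to a temporary file. The regexes and substr offsets are the question's own, with [0-9\.] written as the equivalent [0-9.].

```shell
#!/bin/sh
# Sketch only: the question's awk program with the answer's two changes applied.
# AWK, print_rows and the trimmed sample below are illustrative assumptions.
AWK=${AWK:-awk}

print_rows() {
    # $AWK is left unquoted on purpose so AWK="busybox awk" splits into two words.
    $AWK '
    /Rozmiar pliku/ {
        if (match($0, /Rozmiar pliku:[^0-9]*([0-9.]+)/))
            fileSize = substr($0, RSTART + 18, RLENGTH - 14)
    }
    /Video FPS/ {
        if (match($0, /Video FPS:[^0-9]*([0-9.]+)/))
            fps = substr($0, RSTART + 15, RLENGTH - 15)
    }
    /napiprojekt:/ {
        # Change 2: only remember the id; this row'"'"'s time cell comes later.
        if (match($0, /href="(napiprojekt:[^"]*)"/))
            napi_data = substr($0, RSTART + 6, RLENGTH - 7)
    }
    /td align="center"><p>0/ {
        # Change 1: the time cell is the last value needed for the row, so print here.
        if (match($0, /td align="center"><p>0[^0-9]*([0-9.]+)/))
            printf("%7s | fps: %6s |%10s | %s\n", substr($0, RSTART + 22, RLENGTH - 16), fps, fileSize, napi_data)
    }
    ' "$1"
}

# Trimmed copy of the question's 1.html (title attributes shortened).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<tr title="<b>Rozmiar pliku:</b> 23.48 GiB<br><b>Video FPS:</b> 23.976<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19">Godzilla Minus One.mp4</a></p></td>
<td align="center"><p>23.48 GiB</p> </td>
<td align="center"><p>23.976 </p></td>
<td align="center"><p>02:04:37.472</p></td>
<td align="center"><p>Brak</p></td>
</tr>
<tr title="<b>Rozmiar pliku:</b> 24.93 GiB<br><b>Video FPS:</b> 23.976<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:ee2581c8ed39680c0851ac340f868d61">Godzilla Minus One.mkv</a></p></td>
<td align="center"><p>24.93 GiB</p> </td>
<td align="center"><p>23.976 </p></td>
<td align="center"><p>02:04:38.208</p></td>
<td align="center"><p>Victor Delacroix</p></td>
</tr>
<tr title="<b>Rozmiar pliku:</b> 951.9 MiB<br><b>Video FPS:</b> 23.976<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:ab8edd9c56debfa9b66be98fabff8968">Godzilla Minus One.mp4</a></p></td>
<td align="center"><p>951.9 MiB</p> </td>
<td align="center"><p>23.976 </p></td>
<td align="center"><p>02:04:38.242</p></td>
<td align="center"><p>Brak</p></td>
</tr>
EOF

print_rows "$sample"
```

Printing at the time cell works here only because the time cell is the last field the program needs in each row; if the column order changed, the print would have to move to whichever match comes last.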
"I'm stuck and can't use the gnu version of awk"
I experimented with busybox awk (BusyBox v1.30.1 (Ubuntu 1:1.30.1-7ubuntu3.1) multi-call binary) and found that it does support a multi-character RS (record separator). That lets you tell it that </tr> separates records. Consider the following simplified example. With 1.html containing the same rows as in the question, the command
busybox awk 'BEGIN{RS="</tr>"}
match($0,/napiprojekt:[0-9a-f]*/){id=substr($0,RSTART,RLENGTH)}
match($0,/[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]+/){time=substr($0,RSTART,RLENGTH)}
{print id, time}' 1.html
gives this output:
napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19 02:04:37.472
napiprojekt:ee2581c8ed39680c0851ac340f868d61 02:04:38.208
napiprojekt:ab8edd9c56debfa9b66be98fabff8968 02:04:38.242
napiprojekt:ab8edd9c56debfa9b66be98fabff8968 02:04:38.242
Explanation: after telling busybox awk that </tr> is the record separator, each <tr>...</tr> block is treated as a single record, so the data can be searched for in any order.
EDIT: Why is the last one duplicated?
The reason is that there is one more record after the last </tr>, containing only the trailing newline. It matches neither regex, so the values left over from the previous record are printed again. To avoid this, add a /<tr/ pattern to the printing action:
busybox awk 'BEGIN{RS="</tr>"}
match($0,/napiprojekt:[0-9a-f]*/){id=substr($0,RSTART,RLENGTH)}
match($0,/[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]+/){time=substr($0,RSTART,RLENGTH)}
/<tr/{print id, time}' 1.html
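POSIX leaves the behavior of an RS longer than one character unspecified, so if the script ever has to run under a different awk, a line-based variant of the same idea may be safer. The sketch below (a swapped-in technique, not the answer's method) resets the values when a row starts (<tr) and prints when it ends (</tr>), which also avoids the duplicated last line. The AWK variable, the hypothetical function name rows_by_tr and the trimmed sample rows are assumptions.

```shell
#!/bin/sh
# Sketch: one output line per <tr>...</tr> without relying on a multi-character RS.
# AWK, rows_by_tr and the trimmed sample are illustrative assumptions.
AWK=${AWK:-awk}

rows_by_tr() {
    $AWK '
    /<tr/ { id = ""; time = "" }    # a new row starts: forget the previous row values
    match($0, /napiprojekt:[0-9a-f]*/) { id = substr($0, RSTART, RLENGTH) }
    match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]+/) { time = substr($0, RSTART, RLENGTH) }
    /<\/tr>/ { print id, time }     # the row ends: all of its cells have been seen
    ' "$1"
}

# Trimmed copy of the question's rows (title attributes shortened).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<tr title="<b>Rozmiar pliku:</b> 23.48 GiB<br><b>Video FPS:</b> 23.976<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:1fb0f00ceb78e10cfe89af3dbdccdd19">Godzilla Minus One.mp4</a></p></td>
<td align="center"><p>02:04:37.472</p></td>
<td align="center"><p>2025-01-17</p></td>
</tr>
<tr title="<b>Rozmiar pliku:</b> 24.93 GiB<br><b>Video FPS:</b> 23.976<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:ee2581c8ed39680c0851ac340f868d61">Godzilla Minus One.mkv</a></p></td>
<td align="center"><p>02:04:38.208</p></td>
<td align="center"><p>2024-08-18</p></td>
</tr>
<tr title="<b>Rozmiar pliku:</b> 951.9 MiB<br><b>Video FPS:</b> 23.976<br>" valign="middle">
<td align="left"><p class="blue indent"> <a class="tableA" href="napiprojekt:ab8edd9c56debfa9b66be98fabff8968">Godzilla Minus One.mp4</a></p></td>
<td align="center"><p>02:04:38.242</p></td>
<td align="center"><p>2024-07-10</p></td>
</tr>
EOF

rows_by_tr "$sample"
```

Because printing is tied to </tr>, the trailing newline after the last row produces no output at all.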
"0"? If that's the hour part of the time, can't it go up to 23? – Barmar, Feb 2 at 3:55
cat 1.html | busybox awk "$awkCode2" --> busybox awk "$awkCode2" 1.html – Itération 122442, Feb 2 at 9:34
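Following up on both comments, a sketch of a time-cell match that keys on the HH:MM:SS.mmm shape instead of a leading 0 (so hours 10 to 23 also match), with the file passed to awk directly instead of through cat. The AWK variable and the hypothetical function name time_cells are assumptions; the first three sample rows come from the question, and the 12:00:00.000 row is a hypothetical value added only to show a two-digit hour.

```shell
#!/bin/sh
# Sketch: match time cells by shape, not by a leading "0" in the hour.
# AWK, time_cells and the 12:00:00.000 row are illustrative assumptions.
AWK=${AWK:-awk}

time_cells() {
    # The file name is handed to awk directly; no cat pipeline needed.
    $AWK '
    match($0, /<p>[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]+</) {
        # Skip the leading "<p>" (3 chars) and drop the trailing "<" (1 char).
        print substr($0, RSTART + 3, RLENGTH - 4)
    }
    ' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Last row is hypothetical: a time with a two-digit hour.
cat > "$sample" <<'EOF'
<td align="center"><p>02:04:37.472</p></td>
<td align="center"><p>02:04:38.208</p></td>
<td align="center"><p>02:04:38.242</p></td>
<td align="center"><p>12:00:00.000</p></td>
EOF

time_cells "$sample"
```

The question's /td align="center"><p>0/ pattern would skip the last row; this one does not.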