Following the instructions found here, I'm trying to compile, build and install Tesseract 5 on Ubuntu 24.04:
(base) raphy@raohy:~$ git clone --recursive .git
(base) raphy@raohy:~/tesseract$ ./autogen.sh
(base) raphy@raohy:~/tesseract$ ./configure --prefix=/home/raphy/Grasp/src/tesseract
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C++... yes
checking whether g++ accepts -g... yes
checking for g++ option to enable C++11 features... none needed
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports the include directive... yes (GNU style)
checking whether make supports nested variables... yes
checking dependency style of g++... gcc3
checking for a sed that does not truncate output... /usr/bin/sed
checking Major version... 5
checking Minor version... 5
checking Point version... 0-48-gf96c
checking whether make supports nested variables... (cached) yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking whether C++ compiler accepts -Werror=unused-command-line-argument... no
checking whether C++ compiler accepts -mavx... yes
checking whether C++ compiler accepts -mavx2... yes
checking whether C++ compiler accepts -mavx512f... yes
checking whether C++ compiler accepts -mfma... yes
checking whether C++ compiler accepts -msse4.1... yes
checking for feenableexcept... yes
checking whether C++ compiler accepts -fopenmp-simd... yes
checking --enable-float32 argument...
checking --enable-graphics argument...
checking --enable-legacy argument...
checking for g++ option to support OpenMP... -fopenmp
checking for stdio.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for strings.h... yes
checking for sys/stat.h... yes
checking for sys/types.h... yes
checking for unistd.h... yes
checking for tiffio.h... yes
checking --enable-visibility argument...
checking whether to use tessdata-prefix... yes
checking if compiling with clang... no
checking whether to enable debugging...
checking how to print strings... printf
checking for gcc... gcc
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... (cached) /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for file... file
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /usr/bin/dd
checking how to truncate binary pipes... /usr/bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... ./configure: line 14056: warning: command substitution: ignored null byte in input
GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) ./configure: line 18060: warning: command substitution: ignored null byte in input
GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether C++ compiler accepts -std=c++17... yes
checking whether C++ compiler accepts -std=c++20... yes
checking for library containing pthread_create... none required
checking for brew... false
checking for asciidoc... false
checking for xsltproc... true
checking for wchar_t... yes
checking for long long int... yes
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for libcurl... yes
checking for lept >= 1.74... yes
checking for libarchive... yes
checking for icu-uc >= 52.1... yes
checking for icu-i18n >= 52.1... yes
checking for pango >= 1.38.0... yes
checking for cairo... yes
checking for pangocairo... yes
checking for pangoft2... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating include/tesseract/version.h
config.status: creating Makefile
config.status: creating tesseract.pc
config.status: creating tessdata/Makefile
config.status: creating tessdata/configs/Makefile
config.status: creating tessdata/tessconfigs/Makefile
config.status: creating java/Makefile
config.status: creating java/com/Makefile
config.status: creating java/com/google/Makefile
config.status: creating java/com/google/scrollview/Makefile
config.status: creating java/com/google/scrollview/events/Makefile
config.status: creating java/com/google/scrollview/ui/Makefile
config.status: creating nsis/Makefile
config.status: creating include/config_auto.h
config.status: executing depfiles commands
config.status: executing libtool commands
Configuration is done.
(base) raphy@raohy:~/tesseract$ cmake -B builddir
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring tesseract version 5.5.0-48-gf96c...
-- Setting build type to 'Release' as none was specified.
-- IPO / LTO supported
-- CMAKE_SYSTEM_PROCESSOR=<x86_64>
-- Performing Test HAVE_AVX
-- Performing Test HAVE_AVX - Success
-- Performing Test HAVE_AVX2
-- Performing Test HAVE_AVX2 - Success
-- Performing Test HAVE_AVX512F
-- Performing Test HAVE_AVX512F - Success
-- Performing Test HAVE_FMA
-- Performing Test HAVE_FMA - Success
-- Performing Test HAVE_SSE4_1
-- Performing Test HAVE_SSE4_1 - Success
-- Performing Test OPENMP_SIMD
-- Performing Test OPENMP_SIMD - Success
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Could NOT find Leptonica (missing: Leptonica_DIR)
-- Checking for module 'lept>=1.74'
-- Found lept, version 1.82.0
-- Found leptonica version: 1.82.0
-- Leptonica was build with TIFF support.
-- Found TIFF: /usr/lib/x86_64-linux-gnu/libtiff.so (found version "4.5.1")
-- Found LibArchive: /usr/lib/x86_64-linux-gnu/libarchive.so (found version "3.7.2")
-- Found CURL: /usr/local/lib/cmake/CURL/CURLConfig.cmake (found version "8.8.0-DEV")
-- Looking for feenableexcept
-- Looking for feenableexcept - found
--
-- General configuration for Tesseract 5.5.0-48-gf96c
-- --------------------------------------------------------
-- Build type: Release 64 bits
-- Compiler: GNU
-- Compiler version: 14.2.0
-- Used standard: C++20
-- CXX compiler options: -O3 -DNDEBUG
-- Compile definitions = HAVE_AVX;HAVE_AVX2;HAVE_AVX512F;HAVE_FMA;HAVE_SSE4_1;OPENMP_SIMD;CMAKE_BUILD;HAVE_CONFIG_H
-- Linker options:
-- Install directory: /usr/local
-- HAVE_AVX: 1
-- HAVE_AVX2: 1
-- HAVE_AVX512F: 1
-- HAVE_FMA: 1
-- HAVE_SSE4_1: 1
-- MARCH_NATIVE_OPT: OFF
-- HAVE_NEON: FALSE
-- Link-time optimization: FALSE
-- --------------------------------------------------------
-- Build with sw [SW_BUILD]: OFF
-- Build with openmp support [OPENMP_BUILD]: OFF
-- Build with libarchive support [HAVE_LIBARCHIVE]: ON
-- Build with libcurl support [HAVE_LIBCURL]: ON
-- Enable float for LSTM [FAST_FLOAT]: ON
-- Enable optimization for host CPU (could break HW compatibility) [ENABLE_NATIVE]: OFF
-- Disable disable graphics (ScrollView) [GRAPHICS_DISABLED]: OFF
-- Disable the legacy OCR engine [DISABLED_LEGACY_ENGINE]: OFF
-- Build training tools [BUILD_TRAINING_TOOLS]: ON
-- Build tests [BUILD_TESTS]: OFF
-- Use system ICU Library [USE_SYSTEM_ICU]: OFF
-- Install tesseract configs [INSTALL_CONFIGS]: ON
-- --------------------------------------------------------
--
-- Checking for modules 'icu-uc;icu-i18n'
-- Found icu-uc, version 74.2 // <-----------------------------------------------------
-- Found icu-i18n, version 74.2 // <-----------------------------------------------------
>> ICU_FOUND 1 icui18n;icuuc;icudata /usr/include // <-----------------------------------------------------
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Checking for modules 'pango>=1.38.0;cairo;pangoft2;pangocairo;fontconfig'
-- Found pango, version 1.52.1
-- Found cairo, version 1.18.0
-- Found pangoft2, version 1.52.1
-- Found pangocairo, version 1.52.1
-- Found fontconfig, version 2.15.0
-- Configuring done (1.8s)
-- Generating done (0.1s)
-- Build files have been written to: /home/raphy/tesseract/builddir
But in the build phase I get undefined references to ICU 72 symbols:
(base) raphy@raohy:~/tesseract$ cmake --build builddir/
[ 93%] Linking CXX executable ../../bin/combine_lang_model
/usr/bin/ld: libunicharset_training.a(normstrngs.cpp.o): warning: relocation against `_ZTVN6icu_7213UnicodeStringE' in read-only section `.text'
/usr/bin/ld: libunicharset_training.a(unicharset_training_utils.cpp.o): in function `tesseract::SetupBasicProperties(bool, bool, tesseract::UNICHARSET*)':
unicharset_training_utils.cpp:(.text+0xf6): undefined reference to `u_isalpha_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x105): undefined reference to `u_islower_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x114): undefined reference to `u_isupper_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x123): undefined reference to `u_isdigit_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x133): undefined reference to `u_ispunct_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x1ca): undefined reference to `uscript_getScript_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x1d1): undefined reference to `uscript_getName_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x2a5): undefined reference to `u_charMirror_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x2c1): undefined reference to `u_charDirection_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x78a): undefined reference to `u_tolower_72'
/usr/bin/ld: unicharset_training_utils.cpp:(.text+0x842): undefined reference to `u_toupper_72'
/usr/bin/ld: libunicharset_training.a(normstrngs.cpp.o): in function `tesseract::StripJoiners(std::vector<int, std::allocator<int> >*)':
normstrngs.cpp:(.text+0x2c): undefined reference to `u_isalpha_72'
/usr/bin/ld: libunicharset_training.a(normstrngs.cpp.o): in function `tesseract::NormalizeUTF8ToUTF32(tesseract::UnicodeNormMode, tesseract::OCRNorm, char const*, std::vector<int, std::allocator<int> >*)':
normstrngs.cpp:(.text+0x1f8): undefined reference to `icu_72::UnicodeString::UnicodeString(char const*, char const*)'
/usr/bin/ld: normstrngs.cpp:(.text+0x239): undefined reference to `icu_72::Normalizer2::getInstance(char const*, char const*, UNormalization2Mode, UErrorCode&)'
/usr/bin/ld: normstrngs.cpp:(.text+0x244): undefined reference to `icu_72::ErrorCode::assertSuccess() const'
/usr/bin/ld: normstrngs.cpp:(.text+0x24c): undefined reference to `icu_72::ErrorCode::reset()'
/usr/bin/ld: normstrngs.cpp:(.text+0x253): undefined reference to `vtable for icu_72::UnicodeString'
/usr/bin/ld: normstrngs.cpp:(.text+0x282): undefined reference to `icu_72::ErrorCode::assertSuccess() const'
/usr/bin/ld: normstrngs.cpp:(.text+0x2d7): undefined reference to `icu_72::UnicodeString::char32At(int) const'
/usr/bin/ld: normstrngs.cpp:(.text+0x32c): undefined reference to `icu_72::UnicodeString::moveIndex32(int, int) const'
/usr/bin/ld: normstrngs.cpp:(.text+0x353): undefined reference to `icu_72::UnicodeString::~UnicodeString()'
/usr/bin/ld: normstrngs.cpp:(.text+0x363): undefined reference to `icu_72::UnicodeString::~UnicodeString()'
/usr/bin/ld: normstrngs.cpp:(.text+0x3d1): undefined reference to `icu_72::UnicodeString::char32At(int) const'
/usr/bin/ld: normstrngs.cpp:(.text+0x52c): undefined reference to `icu_72::UnicodeString::moveIndex32(int, int) const'
I don't understand why this happens, since CMake found the ICU 74 installation (icu-uc and icu-i18n version 74.2).
How can I make it work?
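Some context for the `_72` suffix: ICU's headers rename its C functions and its C++ namespace after the ICU major version (`u_isalpha` becomes `u_isalpha_72`, `icu::UnicodeString` becomes `icu_72::UnicodeString`). These references therefore mean the objects in `libunicharset_training.a` were compiled against ICU 72 headers, even though pkg-config reported ICU 74.2 with includes in `/usr/include`. Below is a minimal diagnostic sketch, assuming bash and g++. It asks the preprocessor which `unicode/uvernum.h` it actually resolves and lists the ICU major version of every copy found in a few candidate directories. The candidates `/usr/local/include` and `$CONDA_PREFIX/include` are assumptions, suggested by the `/usr/local` CURL config and the `(base)` conda prompt in the logs. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (hypothetical helper, not part of Tesseract): report which
# ICU headers the compiler resolves and which ICU major version they declare.
# ICU renames its symbols after this version (u_isalpha -> u_isalpha_72), so it
# must match the ICU library that is linked.

# Ask the compiler which unicode/uvernum.h it picks up; compiler flags
# (e.g. -I/some/include) may be passed as arguments.
icu_header_info() {
  local cxx="${CXX:-g++}"
  local src='#include <unicode/uvernum.h>'
  local macros header major

  # -E -dM prints every macro defined after preprocessing the include.
  if ! macros=$(printf '%s\n' "$src" | "$cxx" "$@" -x c++ -E -dM - 2>/dev/null); then
    echo "ICU headers: not found by $cxx"
    return 0
  fi
  major=$(printf '%s\n' "$macros" | awk '$2 == "U_ICU_VERSION_MAJOR_NUM" { print $3 }')

  # -H prints each included header on stderr as ". /path/to/header".
  header=$(printf '%s\n' "$src" | "$cxx" "$@" -x c++ -E -H - 2>&1 >/dev/null |
    awk '/uvernum\.h$/ { print $2; exit }')

  echo "ICU headers: major version ${major:-unknown} from ${header:-unknown path}"
}

# List every copy of unicode/uvernum.h in some candidate include directories
# (assumed locations); extra directories may be passed as arguments.
list_icu_headers() {
  local dir f major
  for dir in /usr/include /usr/local/include ${CONDA_PREFIX:+"$CONDA_PREFIX/include"} "$@"; do
    f="$dir/unicode/uvernum.h"
    [ -f "$f" ] || continue
    major=$(awk '$1 == "#define" && $2 == "U_ICU_VERSION_MAJOR_NUM" { print $3; exit }' "$f")
    echo "found: $f (ICU major ${major:-unknown})"
  done
}

icu_header_info "$@"
list_icu_headers
```

Run it as `bash check_icu.sh` (hypothetical file name), once with the conda environment active and once after `conda deactivate`. If more than one copy is listed, or the resolved header reports 72, that copy is a likely source of the `_72` references.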