Olena-patches
olena-patches@lrde.epita.fr
9625 discussions
olena: olena-2.0-558-gbc90889 [ICDAR_13] New processing (scribo toolchain-like) started
by Anthony Seure
---
 scribo/sandbox/icdar_13_table/Makefile      |    9 +-
 scribo/sandbox/icdar_13_table/README_ROLAND |   18 +++
 scribo/sandbox/icdar_13_table/TODO          |    2 -
 scribo/sandbox/icdar_13_table/src/new.cc    |  208 +++++++++++++++++++++++---
 4 files changed, 209 insertions(+), 28 deletions(-)
 create mode 100644 scribo/sandbox/icdar_13_table/README_ROLAND

diff --git a/scribo/sandbox/icdar_13_table/Makefile b/scribo/sandbox/icdar_13_table/Makefile
index b9600a4..19743e1 100644
--- a/scribo/sandbox/icdar_13_table/Makefile
+++ b/scribo/sandbox/icdar_13_table/Makefile
@@ -1,21 +1,26 @@
 CCACHE=
 CC=g++
-CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG
+CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG -g
 CLIBS=-I../../../milena/ -I../../ -I/usr/include/poppler
 CLEAN=*.o output/* log final.xml

 SRC=src/new.cc
+SRC_OLD=src/main.cc
 OUTPUT=table
+OUTPUT_OLD=old

 all: table

 table:
 	$(CCACHE) $(CC) $(CFLAGS) $(CLIBS) $(SRC) -ltesseract -lpoppler-cpp -o $(OUTPUT)

+old:
+	$(CCACHE) $(CC) $(CFLAGS) $(CLIBS) $(SRC_OLD) -ltesseract -lpoppler-cpp -o $(OUTPUT_OLD)
+
 clean:
 	rm -rf $(CLEAN)

 mrproper: clean
-	rm -f $(OUTPUT)
+	rm -f $(OUTPUT) $(OUTPUT_OLD)

 .PHONY: table clean mrproper
diff --git a/scribo/sandbox/icdar_13_table/README_ROLAND b/scribo/sandbox/icdar_13_table/README_ROLAND
new file mode 100644
index 0000000..89b70dc
--- /dev/null
+++ b/scribo/sandbox/icdar_13_table/README_ROLAND
@@ -0,0 +1,18 @@
+#------------------------------------------------------------------------------#
+		README - ICDAR 2013 - Table competition
+#------------------------------------------------------------------------------#
+
+FIRST OF ALL :
+  * mkdir output
+
+Compilation and cleaning :
+  * make (generates the main program)
+  * make old (generates the old program (without scribo toolchain))
+  * make clean (remove all files expect the binary)
+  * make mrproper (remove all files)
+
+Usage :
+  * ./table [your-pdf-file]
+      Generate debug images in the output/ directory and the final.xml
+  * ./old [your-pdf-file]
+      Same thing but old processing (without scribo toolchain)
diff --git a/scribo/sandbox/icdar_13_table/TODO b/scribo/sandbox/icdar_13_table/TODO
index b34e7a2..a4aa631 100644
--- a/scribo/sandbox/icdar_13_table/TODO
+++ b/scribo/sandbox/icdar_13_table/TODO
@@ -3,10 +3,8 @@
 #------------------------------------------------------------------------------#

 Table location sub-competition :
-  * Load PDF files (instead of PNM)
   * Find links betwwen pages for mutlipages tables
   * Get text from reversed-video zones
-  * Expand the process to borderless tables

 Table structure recognition sub-competition :
   * All
diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc
index 963aa7d..714d0c2 100644
--- a/scribo/sandbox/icdar_13_table/src/new.cc
+++ b/scribo/sandbox/icdar_13_table/src/new.cc
@@ -1,9 +1,10 @@
+// INCLUDES OLENA
 #include <mln/binarization/all.hh>

 #include <mln/core/image/image2d.hh>

 #include <mln/data/all.hh>
-#include <mln/draw/line.hh>
+#include <mln/draw/all.hh>

 #include <mln/fun/v2v/rgb_to_luma.hh>

@@ -14,25 +15,45 @@
 #include <mln/labeling/all.hh>
 #include <mln/literal/all.hh>
 #include <mln/logical/and.hh>
+#include <mln/logical/not.hh>

 #include <mln/value/all.hh>

+// INCLUDE TESSERACT
 #include <tesseract/baseapi.h>

+// INCLUDES SCRIBO
 #include <scribo/binarization/sauvola.hh>
+
 #include <scribo/core/component_set.hh>
+#include <scribo/core/line_set.hh>
+#include <scribo/core/paragraph_set.hh>
+
+#include <scribo/debug/links_image.hh>
+#include <scribo/draw/groups_bboxes.hh>
+#include <scribo/draw/line_components.hh>
+
+#include <scribo/filter/object_links_bbox_h_ratio.hh>
+
 #include <scribo/preprocessing/denoise_fg.hh>
-#include <scribo/primitive/extract/vertical_separators.hh>
-#include <scribo/primitive/remove/separators.hh>
+#include <scribo/primitive/extract/lines_h_discontinued.hh>
+#include <scribo/primitive/extract/lines_v_discontinued.hh>
 #include <scribo/primitive/extract/separators_nonvisible.hh>
-
+#include <scribo/primitive/extract/vertical_separators.hh>
+#include <scribo/primitive/group/from_single_link.hh>
+#include <scribo/primitive/link/internal/compute_anchor.hh>
 #include <scribo/primitive/link/internal/dmax_width_and_height.hh>
-#include <scribo/primitive/link/with_single_right_link_dmax_ratio.hh>
-#include <scribo/primitive/link/with_single_left_link_dmax_ratio.hh>
 #include <scribo/primitive/link/merge_double_link.hh>
+#include <scribo/primitive/link/with_single_left_link_dmax_ratio.hh>
+#include <scribo/primitive/link/with_single_right_link_dmax_ratio.hh>
+#include <scribo/primitive/remove/separators.hh>
+
+#include <scribo/text/extract_paragraphs.hh>
+#include <scribo/text/merging.hh>

 using namespace mln;

+// Open and initialize XML
 void start_xml(std::ofstream& xml, const char* name, const char* pdf)
 {
   xml.open(name);
@@ -40,12 +61,14 @@ void start_xml(std::ofstream& xml, const char* name, const char* pdf)
       << "<document filename='" << pdf << "'>" << std::endl;
 }

+// Finalize an close XML
 void end_xml(std::ofstream& xml)
 {
   xml << "</document>" << std::endl;
   xml.close();
 }

+// Write a new (simple) table in XML file
 void write_table(std::ofstream& xml, const point2d& start, const point2d& end)
 {
   static unsigned table = 0;
@@ -62,6 +85,10 @@ void write_table(std::ofstream& xml, const point2d& start, const point2d& end)
   ++table;
 }

+				/********/
+				/* MAIN */
+				/********/
+
 int main(int argc, char** argv)
 {
   typedef value::label_16 V;
@@ -69,10 +96,9 @@ int main(int argc, char** argv)

   std::ofstream xml;
   std::ostringstream path;
-  image2d<value::rgb8> original;
+  image2d<value::rgb8> original, ima_links, ima_groups, ima_valid;
   image2d<value::int_u8> filtered;
-  image2d<bool> bin, separators, bin_without_separators, whitespaces,
-    denoised, comp, links;
+  image2d<bool> bin, separators, bin_without_separators, whitespaces, comp, denoised;
   scribo::component_set< image2d<unsigned> > components;

   unsigned dpi = 72;
@@ -89,9 +115,18 @@ int main(int argc, char** argv)
     bin = scribo::binarization::sauvola(filtered, 81, 0.44);

     // Find separators
-    separators = scribo::primitive::extract::vertical_separators(bin, 81);
-    bin_without_separators = scribo::primitive::remove::separators(bin, separators);
-    whitespaces = scribo::primitive::extract::separators_nonvisible(bin);
+    bin_without_separators = duplicate(bin);
+    separators = separators;
+    V nhlines, nvlines;
+    unsigned min_width = 31;
+    unsigned min_height = 71;
+    scribo::component_set<L> hlines = scribo::primitive::extract::lines_h_discontinued(bin_without_separators, c4(), nhlines, min_width, 2);
+    scribo::component_set<L> vlines = scribo::primitive::extract::lines_v_discontinued(bin_without_separators, c4(), nvlines, min_height, 2);
+    for (unsigned i = 1; i <= hlines.nelements(); ++i)
+      data::fill((bin_without_separators | hlines(i).bbox()).rw(), false);
+
+    for (unsigned i = 1; i <= vlines.nelements(); ++i)
+      data::fill((bin_without_separators | vlines(i).bbox()).rw(), false);

     // Denoise
     denoised = scribo::preprocessing::denoise_fg(bin_without_separators, c8(), 4);
@@ -103,8 +138,13 @@ int main(int argc, char** argv)
     initialize(comp, denoised);
     data::fill(comp, false);
     for (unsigned i = 1; i <= components.nelements(); ++i)
-      data::fill((comp | components(i).bbox()).rw(), true);
+    {
+      const box2d& b = components(i).bbox();
+      if (b.width() > 2 && b.height() > 2)
+        data::fill((comp | b).rw(), true);
+    }

+    // Find links
     scribo::object_links< image2d<unsigned> > right_link = scribo::primitive::link::with_single_right_link_dmax_ratio(components,
         scribo::primitive::link::internal::dmax_width_and_height(1),
         scribo::anchor::MassCenter);
@@ -115,10 +155,127 @@ int main(int argc, char** argv)
     scribo::object_links< image2d<unsigned> > merged_links
       = scribo::primitive::link::merge_double_link(left_link, right_link);

-    initialize(links, denoised);
-    data::fill(links, false);
-    for (unsigned i = 1; i <= merged_links.components().nelements(); ++i)
-      data::fill((links | merged_links.components()(i).bbox()).rw(), true);
+    // Filter links
+    scribo::object_links< image2d<unsigned> > hratio_filtered_links = scribo::filter::object_links_bbox_h_ratio(merged_links, 2.5f);
+
+    ima_links = data::convert(value::rgb8(), denoised);
+    ima_groups = data::convert(value::rgb8(), denoised);
+    ima_valid = data::convert(value::rgb8(), denoised);
+
+    // Write links
+    for (unsigned l = 1; l < merged_links.nelements(); ++l)
+    {
+      point2d p1 = scribo::primitive::link::internal::compute_anchor(merged_links.components(), l, scribo::anchor::MassCenter);
+      point2d p2 = scribo::primitive::link::internal::compute_anchor(merged_links.components(), merged_links(l), scribo::anchor::MassCenter);
+
+      draw::line(ima_links, p1, p2, literal::red);
+    }
+
+    for (unsigned l = 1; l < hratio_filtered_links.nelements(); ++l)
+    {
+      point2d p1 = scribo::primitive::link::internal::compute_anchor(hratio_filtered_links.components(), l, scribo::anchor::MassCenter);
+      point2d p2 = scribo::primitive::link::internal::compute_anchor(hratio_filtered_links.components(), hratio_filtered_links(l), scribo::anchor::MassCenter);
+
+      draw::line(ima_links, p1, p2, literal::blue);
+    }
+
+    // Group components
+    scribo::object_groups< image2d<unsigned> > groups = scribo::primitive::group::from_single_link(hratio_filtered_links);
+    scribo::draw::groups_bboxes(ima_groups, groups, literal::blue);
+
+    // Compute averages
+    unsigned average_height = 0;
+    unsigned average_width = 0;
+
+    for (unsigned i = 1; i < groups.nelements(); ++i)
+    {
+      average_height += groups(i).bbox().height();
+      average_width += groups(i).bbox().width();
+    }
+    average_height /= groups.nelements();
+    average_width /= groups.nelements();
+
+    std::vector<short> balance(groups.nelements(), 0);
+
+    // Draw vertical links (red)
+    for (unsigned i = 1; i < groups.nelements(); ++i)
+    {
+      for (unsigned j = 1; j < groups.nelements(); ++j)
+      {
+        if (i != j)
+        {
+          const box2d& b1 = groups(i).bbox();
+          const box2d& b2 = groups(j).bbox();
+          const point2d& p1 = b1.pcenter();
+          const point2d& p2 = b2.pcenter();
+
+          unsigned max_height = std::max(b1.height(), b2.height());
+          unsigned min_height = std::min(b1.height(), b2.height());
+
+          if (p1[0] < p2[0] // Avoid redundancy
+              && max_height * 2 < denoised.ncols()
+              && min_height + 3 >= max_height // Same heights
+              && b1.width() < 2 * average_width && b2.width() < 2 * average_width // Regular width
+              && (b1.pmin()[1] == b2.pmin()[1]
+                  || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1])
+                  || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned
+              && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap
+              && abs(p1[1] - p2[1]) < 20) // Vertical proximity
+          {
+            draw::line(ima_groups, p1, p2, literal::red);
+            balance[i] += 1;
+            balance[j] += 1;
+            break;
+          }
+        }
+      }
+    }
+
+    // Draw horizontal links (green)
+    for (unsigned i = 1; i < groups.nelements(); ++i)
+    {
+      for (unsigned j = 1; j < groups.nelements(); ++j)
+      {
+        if (i != j)
+        {
+          const box2d& b1 = groups(i).bbox();
+          const box2d& b2 = groups(j).bbox();
+          const point2d& p1 = b1.pcenter();
+          const point2d& p2 = b2.pcenter();
+
+          if (p1[1] < p2[1] // Avoid redundancy
+              && (b1.pmin()[0] == b2.pmin()[0]
+                  || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0])
+                  || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned
+              && abs(p1[0] - p2[0]) < 10) // Reduced gap
+          {
+            draw::line(ima_groups, p1, p2, literal::green);
+            balance[i] += 1;
+            balance[j] += 1;
+            break;
+          }
+        }
+      }
+    }
+
+    // Draw weighted boxes (red < orange < cyan < green)
+    for (unsigned i = 0; i < balance.size(); ++i)
+    {
+      std::cout << balance[i] << " ";
+      if (balance[i] == 1)
+        draw::box(ima_valid, groups(i).bbox(), literal::red);
+
+      if (balance[i] == 2)
+        draw::box(ima_valid, groups(i).bbox(), literal::orange);
+
+      if (balance[i] == 3)
+        draw::box(ima_valid, groups(i).bbox(), literal::cyan);
+
+      if (balance[i] > 3)
+        draw::box(ima_valid, groups(i).bbox(), literal::green);
+    }
+    std::cout << std::endl << std::endl;
+

     // Write images and close XML
     path.str(""); path << "output/p" << page << "_0_bin.pbm";
@@ -127,17 +284,20 @@ int main(int argc, char** argv)
     path.str(""); path << "output/p" << page << "_1_bin_without_separators.pbm";
     io::pbm::save(bin_without_separators, path.str());

-    path.str(""); path << "output/p" << page << "_2_whitespaces.pbm";
-    io::pbm::save(whitespaces, path.str());
-
-    path.str(""); path << "output/p" << page << "_3_denoised.pbm";
+    path.str(""); path << "output/p" << page << "_2_denoised.pbm";
     io::pbm::save(denoised, path.str());

-    path.str(""); path << "output/p" << page << "_4_components.pbm";
+    path.str(""); path << "output/p" << page << "_3_components.pbm";
     io::pbm::save(comp, path.str());

-    path.str(""); path << "output/p" << page << "_5_links.pbm";
-    io::pbm::save(links, path.str());
+    path.str(""); path << "output/p" << page << "_4_links.ppm";
+    io::ppm::save(ima_links, path.str());
+
+    path.str(""); path << "output/p" << page << "_5_groups.ppm";
+    io::ppm::save(ima_groups, path.str());
+
+    path.str(""); path << "output/p" << page << "_6_valid.ppm";
+    io::ppm::save(ima_valid, path.str());
   }

   end_xml(xml);
-- 
1.7.2.5
11 years, 7 months
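The vertical-link condition in this patch combines several geometric tests on pairs of group bounding boxes. The sketch below isolates that predicate so it can be read and tested without Olena. `Box`, `columns_aligned` and `vertical_link` are hypothetical names introduced here, `Box` is only a stand-in for `box2d` with inclusive (row, column) bounds, and the page-size clause (`max_height * 2 < ncols`) is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for mln::box2d: inclusive bounds, (row, col) order.
struct Box
{
  int row_min, col_min, row_max, col_max;

  int height() const { return row_max - row_min + 1; }
  int width() const { return col_max - col_min + 1; }
  int row_center() const { return (row_min + row_max) / 2; }
  int col_center() const { return (col_min + col_max) / 2; }
};

// True when both boxes start on the same column or their column ranges
// overlap, mirroring the "Boxes are aligned" clause of the patch.
bool columns_aligned(const Box& a, const Box& b)
{
  return a.col_min == b.col_min
      || (a.col_min < b.col_min && a.col_max > b.col_min)
      || (a.col_min > b.col_min && b.col_max > a.col_min);
}

// Vertical-link test: a lies above b, both have similar heights and a
// regular width, their columns are aligned, and their centers are close.
// The thresholds (3, 2 * avg_width, 3 * max_h, 20) are the ones in the patch.
bool vertical_link(const Box& a, const Box& b, int avg_width)
{
  const int max_h = std::max(a.height(), b.height());
  const int min_h = std::min(a.height(), b.height());

  return a.row_center() < b.row_center()                       // avoid redundancy
      && min_h + 3 >= max_h                                    // same heights
      && a.width() < 2 * avg_width && b.width() < 2 * avg_width // regular width
      && columns_aligned(a, b)
      && std::abs(a.row_center() - b.row_center()) < 3 * max_h // reduced gap
      && std::abs(a.col_center() - b.col_center()) < 20;       // center proximity
}
```

Keeping the test in one function would let the thresholds be tuned in a single place; whether that suits the sandbox code is a judgment call for its author.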
olena: olena-2.0-559-g2ef0c81 [ICDAR] Handle reverse video
by Anthony Seure
---
 scribo/sandbox/icdar_13_table/Makefile   |    4 +-
 scribo/sandbox/icdar_13_table/TODO       |    2 +
 scribo/sandbox/icdar_13_table/src/new.cc |   61 ++++++++++++++++++++++--------
 3 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/scribo/sandbox/icdar_13_table/Makefile b/scribo/sandbox/icdar_13_table/Makefile
index 19743e1..8155a34 100644
--- a/scribo/sandbox/icdar_13_table/Makefile
+++ b/scribo/sandbox/icdar_13_table/Makefile
@@ -1,6 +1,6 @@
-CCACHE=
+CCACHE=ccache
 CC=g++
-CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG -g
+CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG
 CLIBS=-I../../../milena/ -I../../ -I/usr/include/poppler
 CLEAN=*.o output/* log final.xml

diff --git a/scribo/sandbox/icdar_13_table/TODO b/scribo/sandbox/icdar_13_table/TODO
index a4aa631..e3c6f52 100644
--- a/scribo/sandbox/icdar_13_table/TODO
+++ b/scribo/sandbox/icdar_13_table/TODO
@@ -5,6 +5,8 @@
 Table location sub-competition :
   * Find links betwwen pages for mutlipages tables
   * Get text from reversed-video zones
+  * *** glibc detected *** ./table: corrupted double-linked list
+    with the file us-005.pdf from the test set

 Table structure recognition sub-competition :
   * All
diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc
index 714d0c2..aca31bb 100644
--- a/scribo/sandbox/icdar_13_table/src/new.cc
+++ b/scribo/sandbox/icdar_13_table/src/new.cc
@@ -98,8 +98,8 @@ int main(int argc, char** argv)
   std::ostringstream path;
   image2d<value::rgb8> original, ima_links, ima_groups, ima_valid;
   image2d<value::int_u8> filtered;
-  image2d<bool> bin, separators, bin_without_separators, whitespaces, comp, denoised;
-  scribo::component_set< image2d<unsigned> > components;
+  image2d<bool> bin, reverse, reverse_selection, bin_merged, separators, bin_without_separators, whitespaces, comp, denoised;
+  scribo::component_set< image2d<unsigned> > components, rcomponents;

   unsigned dpi = 72;

@@ -114,6 +114,25 @@ int main(int argc, char** argv)
     filtered = data::transform(original, fun::v2v::rgb_to_luma<value::int_u8>());
     bin = scribo::binarization::sauvola(filtered, 81, 0.44);

+    // Reverse selection
+    reverse = logical::not_(bin);
+    initialize(reverse_selection, reverse);
+    data::fill(reverse_selection, false);
+
+    unsigned nrcomponents;
+    rcomponents = scribo::primitive::extract::components(reverse, c8(), nrcomponents);
+
+    for (unsigned i = 1; i < rcomponents.nelements(); ++i)
+    {
+      const box2d& b = rcomponents(i).bbox();
+
+      if (b.height() < 20 && b.width() < 20)
+        data::fill((reverse_selection | b).rw(), true);
+    }
+
+    reverse_selection = logical::and_(reverse, reverse_selection);
+    reverse_selection = scribo::preprocessing::denoise_fg(reverse_selection, c8(), 4);
+
     // Find separators
     bin_without_separators = duplicate(bin);
     separators = separators;
@@ -131,11 +150,14 @@ int main(int argc, char** argv)
     // Denoise
     denoised = scribo::preprocessing::denoise_fg(bin_without_separators, c8(), 4);

+    // Bin merged
+    bin_merged = logical::or_(denoised, reverse_selection);
+
     // Extract components
     unsigned ncomponents;
-    components = scribo::primitive::extract::components(denoised, c8(), ncomponents);
+    components = scribo::primitive::extract::components(bin_merged, c8(), ncomponents);

-    initialize(comp, denoised);
+    initialize(comp, bin_merged);
     data::fill(comp, false);
     for (unsigned i = 1; i <= components.nelements(); ++i)
     {
@@ -158,9 +180,9 @@ int main(int argc, char** argv)
     // Filter links
     scribo::object_links< image2d<unsigned> > hratio_filtered_links = scribo::filter::object_links_bbox_h_ratio(merged_links, 2.5f);

-    ima_links = data::convert(value::rgb8(), denoised);
-    ima_groups = data::convert(value::rgb8(), denoised);
-    ima_valid = data::convert(value::rgb8(), denoised);
+    ima_links = data::convert(value::rgb8(), bin_merged);
+    ima_groups = data::convert(value::rgb8(), bin_merged);
+    ima_valid = data::convert(value::rgb8(), bin_merged);

     // Write links
     for (unsigned l = 1; l < merged_links.nelements(); ++l)
@@ -213,7 +235,7 @@ int main(int argc, char** argv)
           unsigned min_height = std::min(b1.height(), b2.height());

           if (p1[0] < p2[0] // Avoid redundancy
-              && max_height * 2 < denoised.ncols()
+              && max_height * 2 < bin_merged.ncols()
               && min_height + 3 >= max_height // Same heights
               && b1.width() < 2 * average_width && b2.width() < 2 * average_width // Regular width
               && (b1.pmin()[1] == b2.pmin()[1]
@@ -258,10 +280,9 @@ int main(int argc, char** argv)
       }
     }

-    // Draw weighted boxes (red < orange < cyan < green)
+    // Draw weighted boxes (red < orange < cyan < green) (useless ?)
     for (unsigned i = 0; i < balance.size(); ++i)
     {
-      std::cout << balance[i] << " ";
       if (balance[i] == 1)
         draw::box(ima_valid, groups(i).bbox(), literal::red);

@@ -274,10 +295,9 @@ int main(int argc, char** argv)
       if (balance[i] > 3)
         draw::box(ima_valid, groups(i).bbox(), literal::green);
     }
-    std::cout << std::endl << std::endl;
-

     // Write images and close XML
+    // FIXME To externalize
     path.str(""); path << "output/p" << page << "_0_bin.pbm";
     io::pbm::save(bin, path.str());

@@ -287,16 +307,25 @@ int main(int argc, char** argv)
     path.str(""); path << "output/p" << page << "_2_denoised.pbm";
     io::pbm::save(denoised, path.str());

-    path.str(""); path << "output/p" << page << "_3_components.pbm";
+    path.str(""); path << "output/p" << page << "_3_reverse.pbm";
+    io::pbm::save(reverse, path.str());
+
+    path.str(""); path << "output/p" << page << "_4_reverse_selection.pbm";
+    io::pbm::save(reverse_selection, path.str());
+
+    path.str(""); path << "output/p" << page << "_5_bin_merged.pbm";
+    io::pbm::save(bin_merged, path.str());
+
+    path.str(""); path << "output/p" << page << "_6_components.pbm";
     io::pbm::save(comp, path.str());

-    path.str(""); path << "output/p" << page << "_4_links.ppm";
+    path.str(""); path << "output/p" << page << "_7_links.ppm";
     io::ppm::save(ima_links, path.str());

-    path.str(""); path << "output/p" << page << "_5_groups.ppm";
+    path.str(""); path << "output/p" << page << "_8_groups.ppm";
     io::ppm::save(ima_groups, path.str());

-    path.str(""); path << "output/p" << page << "_6_valid.ppm";
+    path.str(""); path << "output/p" << page << "_9_valid.ppm";
     io::ppm::save(ima_valid, path.str());
   }

-- 
1.7.2.5
11 years, 7 months
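The reverse-video step in this patch inverts the binarized page, keeps only small connected components of the inverse (candidate light-on-dark glyphs), and ORs them back into the foreground. The sketch below reproduces that idea on plain `std::vector` bitmaps. `Bitmap`, `invert`, `logical_or` and `keep_small_components` are hypothetical names, not Olena calls. It also simplifies the patch in two ways: it keeps only the pixels of each small component rather than every inverted pixel inside its bounding box, and it skips the `denoise_fg` pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for image2d<bool>: true means foreground.
typedef std::vector<std::vector<bool> > Bitmap;

// Pixel-wise negation (the role logical::not_ plays in the patch).
Bitmap invert(const Bitmap& in)
{
  Bitmap out(in);
  for (std::size_t r = 0; r < out.size(); ++r)
    for (std::size_t c = 0; c < out[r].size(); ++c)
      out[r][c] = !in[r][c];
  return out;
}

// Pixel-wise disjunction of two same-sized bitmaps (the role of logical::or_).
Bitmap logical_or(const Bitmap& a, const Bitmap& b)
{
  Bitmap out(a);
  for (std::size_t r = 0; r < out.size(); ++r)
    for (std::size_t c = 0; c < out[r].size(); ++c)
      out[r][c] = a[r][c] || b[r][c];
  return out;
}

// Keep the pixels of every 8-connected component whose bounding box is
// strictly smaller than max_side in both dimensions.
Bitmap keep_small_components(const Bitmap& in, int max_side)
{
  const int rows = static_cast<int>(in.size());
  const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
  Bitmap out(rows, std::vector<bool>(cols, false));
  Bitmap seen(rows, std::vector<bool>(cols, false));

  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
    {
      if (!in[r][c] || seen[r][c])
        continue;

      // Breadth-first flood fill of one component, tracking its bounding box.
      std::vector<std::pair<int, int> > pixels;
      std::queue<std::pair<int, int> > todo;
      int rmin = r, rmax = r, cmin = c, cmax = c;
      seen[r][c] = true;
      todo.push(std::make_pair(r, c));

      while (!todo.empty())
      {
        const std::pair<int, int> p = todo.front();
        todo.pop();
        pixels.push_back(p);
        rmin = std::min(rmin, p.first);
        rmax = std::max(rmax, p.first);
        cmin = std::min(cmin, p.second);
        cmax = std::max(cmax, p.second);

        for (int dr = -1; dr <= 1; ++dr)
          for (int dc = -1; dc <= 1; ++dc)
          {
            const int nr = p.first + dr;
            const int nc = p.second + dc;
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols)
              continue;
            if (in[nr][nc] && !seen[nr][nc])
            {
              seen[nr][nc] = true;
              todo.push(std::make_pair(nr, nc));
            }
          }
      }

      if (rmax - rmin + 1 < max_side && cmax - cmin + 1 < max_side)
        for (std::size_t k = 0; k < pixels.size(); ++k)
          out[pixels[k].first][pixels[k].second] = true;
    }

  return out;
}
```

With a page bitmap `bin`, the merged foreground would then be `logical_or(bin, keep_small_components(invert(bin), 20))`, where 20 is the size limit used in the patch.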
olena: olena-2.0-560-gc703af5 [ICDAR] Externalize some functions
by Anthony Seure
---
 scribo/sandbox/icdar_13_table/TODO       |    3 +-
 scribo/sandbox/icdar_13_table/src/new.cc |  281 ++++++++++++++++++++----------
 2 files changed, 190 insertions(+), 94 deletions(-)

diff --git a/scribo/sandbox/icdar_13_table/TODO b/scribo/sandbox/icdar_13_table/TODO
index e3c6f52..86486c9 100644
--- a/scribo/sandbox/icdar_13_table/TODO
+++ b/scribo/sandbox/icdar_13_table/TODO
@@ -4,9 +4,10 @@

 Table location sub-competition :
   * Find links betwwen pages for mutlipages tables
-  * Get text from reversed-video zones
   * *** glibc detected *** ./table: corrupted double-linked list
     with the file us-005.pdf from the test set
+  * 'Floating point exeption' using sauvola_ms(a, b, c) with a floating point
+    c instead of an unsigned (Z is working on it)

 Table structure recognition sub-competition :
   * All
diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc
index aca31bb..54b9f3b 100644
--- a/scribo/sandbox/icdar_13_table/src/new.cc
+++ b/scribo/sandbox/icdar_13_table/src/new.cc
@@ -53,6 +53,36 @@

 using namespace mln;

+// Write image2d<bool> images
+void write_image(const image2d<bool>& ima,
+                 const char* name,
+                 const unsigned page,
+                 unsigned& number,
+                 std::ostringstream& path)
+{
+  path.str("");
+  path << "output/p" << page
+       << "_" << number
+       << "_" << name << ".pbm";
+  io::pbm::save(ima, path.str());
+  ++number;
+}
+
+// Write image2d<value::rbg8> images
+void write_image(const image2d<value::rgb8>& ima,
+                 const char* name,
+                 const unsigned page,
+                 unsigned& number,
+                 std::ostringstream& path)
+{
+  path.str("");
+  path << "output/p" << page
+       << "_" << number
+       << "_" << name << ".ppm";
+  io::ppm::save(ima, path.str());
+  ++number;
+}
+
 // Open and initialize XML
 void start_xml(std::ofstream& xml, const char* name, const char* pdf)
 {
@@ -85,10 +115,147 @@ void write_table(std::ofstream& xml, const point2d& start, const point2d& end)
   ++table;
 }

-				/********/
-				/* MAIN */
-				/********/
+// Draw vertical links from top to bottom (red)
+void draw_links_tb(const scribo::object_groups< image2d<unsigned> >& groups,
+                   image2d<value::rgb8>& ima_groups,
+                   std::vector<short>& balance,
+                   unsigned average_width)
+{
+  for (unsigned i = 1; i <= groups.nelements(); ++i)
+  {
+    for (unsigned j = 1; j <= groups.nelements(); ++j)
+    {
+      if (i != j)
+      {
+        const box2d& b1 = groups(i).bbox();
+        const box2d& b2 = groups(j).bbox();
+        const point2d& p1 = b1.pcenter();
+        const point2d& p2 = b2.pcenter();
+
+        unsigned max_height = std::max(b1.height(), b2.height());
+        unsigned min_height = std::min(b1.height(), b2.height());
+
+        if (p1[0] < p2[0] // Avoid redundancy
+            && max_height * 2 < ima_groups.ncols()
+            && min_height + 3 >= max_height // Same heights
+            && b1.width() < 2 * average_width && b2.width() < 2 * average_width // Regular width
+            && (b1.pmin()[1] == b2.pmin()[1]
+                || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1])
+                || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned
+            && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap
+            && abs(p1[1] - p2[1]) < 20) // Vertical proximity
+        {
+          draw::line(ima_groups, p1, p2, literal::red);
+          balance[i] += 1;
+          break;
+        }
+      }
+    }
+  }
+}
+
+// Draw vertical links from bottom to top (red)
+void draw_links_bt(const scribo::object_groups< image2d<unsigned> >& groups,
+                   image2d<value::rgb8>& ima_groups,
+                   std::vector<short>& balance,
+                   unsigned average_width)
+{
+  for (unsigned i = groups.nelements(); i > 0; --i)
+  {
+    for (unsigned j = groups.nelements(); j > 0; --j)
+    {
+      if (i != j)
+      {
+        const box2d& b1 = groups(i).bbox();
+        const box2d& b2 = groups(j).bbox();
+        const point2d& p1 = b1.pcenter();
+        const point2d& p2 = b2.pcenter();
+
+        unsigned max_height = std::max(b1.height(), b2.height());
+        unsigned min_height = std::min(b1.height(), b2.height());
+
+        if (p1[0] > p2[0] // Avoid redundancy
+            && max_height * 2 < ima_groups.ncols()
+            && min_height + 3 >= max_height // Same heights
+            && b1.width() < 2 * average_width && b2.width() < 2 * average_width // Regular width
+            && (b1.pmin()[1] == b2.pmin()[1]
+                || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1])
+                || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned
+            && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap
+            && abs(p1[1] - p2[1]) < 20) // Vertical proximity
+        {
+          draw::line(ima_groups, p1, p2, literal::red);
+          balance[i] += 1;
+          break;
+        }
+      }
+    }
+  }
+}
+
+// Draw horizontal links from left to right (green)
+void draw_links_lr(const scribo::object_groups< image2d<unsigned> >& groups,
+                   image2d<value::rgb8>& ima_groups,
+                   std::vector<short>& balance)
+{
+  for (unsigned i = 1; i <= groups.nelements(); ++i)
+  {
+    for (unsigned j = 1; j <= groups.nelements(); ++j)
+    {
+      if (i != j)
+      {
+        const box2d& b1 = groups(i).bbox();
+        const box2d& b2 = groups(j).bbox();
+        const point2d& p1 = b1.pcenter();
+        const point2d& p2 = b2.pcenter();
+
+        if (p1[1] < p2[1] // Avoid redundancy
+            && (b1.pmin()[0] == b2.pmin()[0]
+                || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0])
+                || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned
+            && abs(p1[0] - p2[0]) < 10) // Reduced gap
+        {
+          draw::line(ima_groups, p1, p2, literal::green);
+          balance[i] += 1;
+          break;
+        }
+      }
+    }
+  }
+}
+
+// Draw horizontal links from right to left (green)
+void draw_links_rl(const scribo::object_groups< image2d<unsigned> >& groups,
+                   image2d<value::rgb8>& ima_groups,
+                   std::vector<short>& balance)
+{
+  for (unsigned i = groups.nelements(); i > 0; --i)
+  {
+    for (unsigned j = groups.nelements(); j > 0; --j)
+    {
+      if (i != j)
+      {
+        const box2d& b1 = groups(i).bbox();
+        const box2d& b2 = groups(j).bbox();
+        const point2d& p1 = b1.pcenter();
+        const point2d& p2 = b2.pcenter();
+
+        if (p1[1] > p2[1] // Avoid redundancy
+            && (b1.pmin()[0] == b2.pmin()[0]
+                || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0])
+                || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned
+            && abs(p1[0] - p2[0]) < 10) // Reduced gap
+        {
+          draw::line(ima_groups, p1, p2, literal::green);
+          balance[i] += 1;
+          break;
+        }
+      }
+    }
+  }
+}

+/******************************** MAIN ****************************************/
 int main(int argc, char** argv)
 {
   typedef value::label_16 V;
@@ -108,6 +275,7 @@ int main(int argc, char** argv)
   util::array< image2d<value::rgb8> > pdf;
   io::pdf::load(pdf, argv[1], dpi);

+
   for (unsigned page = 0; page < pdf.nelements(); ++page)
   {
     original = pdf[page];
@@ -219,66 +387,11 @@ int main(int argc, char** argv)

     std::vector<short> balance(groups.nelements(), 0);

-    // Draw vertical links (red)
-    for (unsigned i = 1; i < groups.nelements(); ++i)
-    {
-      for (unsigned j = 1; j < groups.nelements(); ++j)
-      {
-        if (i != j)
-        {
-          const box2d& b1 = groups(i).bbox();
-          const box2d& b2 = groups(j).bbox();
-          const point2d& p1 = b1.pcenter();
-          const point2d& p2 = b2.pcenter();
-
-          unsigned max_height = std::max(b1.height(), b2.height());
-          unsigned min_height = std::min(b1.height(), b2.height());
-
-          if (p1[0] < p2[0] // Avoid redundancy
-              && max_height * 2 < bin_merged.ncols()
-              && min_height + 3 >= max_height // Same heights
-              && b1.width() < 2 * average_width && b2.width() < 2 * average_width // Regular width
-              && (b1.pmin()[1] == b2.pmin()[1]
-                  || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1])
-                  || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned
-              && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap
-              && abs(p1[1] - p2[1]) < 20) // Vertical proximity
-          {
-            draw::line(ima_groups, p1, p2, literal::red);
-            balance[i] += 1;
-            balance[j] += 1;
-            break;
-          }
-        }
-      }
-    }
-
-    // Draw horizontal links (green)
-    for (unsigned i = 1; i < groups.nelements(); ++i)
-    {
-      for (unsigned j = 1; j < groups.nelements(); ++j)
-      {
-        if (i != j)
-        {
-          const box2d& b1 = groups(i).bbox();
-          const box2d& b2 = groups(j).bbox();
-          const point2d& p1 = b1.pcenter();
-          const point2d& p2 = b2.pcenter();
-
-          if (p1[1] < p2[1] // Avoid redundancy
-              && (b1.pmin()[0] == b2.pmin()[0]
-                  || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0])
-                  || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned
-              && abs(p1[0] - p2[0]) < 10) // Reduced gap
-          {
-            draw::line(ima_groups, p1, p2, literal::green);
-            balance[i] += 1;
-            balance[j] += 1;
-            break;
-          }
-        }
-      }
-    }
+    // Draw and count links
+    draw_links_tb(groups, ima_groups, balance, average_width);
+    draw_links_bt(groups, ima_groups, balance, average_width);
+    draw_links_lr(groups, ima_groups, balance);
+    draw_links_rl(groups, ima_groups, balance);

     // Draw weighted boxes (red < orange < cyan < green) (useless ?)
     for (unsigned i = 0; i < balance.size(); ++i)
@@ -297,36 +410,18 @@ int main(int argc, char** argv)
     }

     // Write images and close XML
-    // FIXME To externalize
-    path.str(""); path << "output/p" << page << "_0_bin.pbm";
-    io::pbm::save(bin, path.str());
-
-    path.str(""); path << "output/p" << page << "_1_bin_without_separators.pbm";
-    io::pbm::save(bin_without_separators, path.str());
-
-    path.str(""); path << "output/p" << page << "_2_denoised.pbm";
-    io::pbm::save(denoised, path.str());
-
-    path.str(""); path << "output/p" << page << "_3_reverse.pbm";
-    io::pbm::save(reverse, path.str());
-
-    path.str(""); path << "output/p" << page << "_4_reverse_selection.pbm";
-    io::pbm::save(reverse_selection, path.str());
-
-    path.str(""); path << "output/p" << page << "_5_bin_merged.pbm";
-    io::pbm::save(bin_merged, path.str());
-
-    path.str(""); path << "output/p" << page << "_6_components.pbm";
-    io::pbm::save(comp, path.str());
-
-    path.str(""); path << "output/p" << page << "_7_links.ppm";
-    io::ppm::save(ima_links, path.str());
-
-    path.str(""); path << "output/p" << page << "_8_groups.ppm";
-    io::ppm::save(ima_groups, path.str());
-
-    path.str(""); path << "output/p" << page << "_9_valid.ppm";
-    io::ppm::save(ima_valid, path.str());
+    unsigned number = 0;
+
+    write_image(bin, "bin", page, number, path);
+    write_image(bin_without_separators, "bin_without_separators", page, number, path);
+    write_image(denoised, "denoised", page, number, path);
+    write_image(reverse, "reverse", page, number, path);
+    write_image(reverse_selection, "", page, number, path);
+    write_image(bin_merged, "reverse_selection", page, number, path);
+    write_image(comp, "bin_merged", page, number, path);
+    write_image(ima_links, "components", page, number, path);
+    write_image(ima_groups, "groups", page, number, path);
+    write_image(ima_valid, "valid", page, number, path);
   }

   end_xml(xml);
-- 
1.7.2.5
11 years, 7 months
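A remark on the write_image calls that close the patch above: the labels look shifted by one from reverse_selection onward (reverse_selection is passed "", bin_merged gets "reverse_selection", ima_links gets "components", and no call uses "links"). The body of write_image is not visible in this part of the thread, so the following is only a minimal sketch of a path builder, assuming the naming scheme of the removed lines (output/p<page>_<number>_<name>.<ext>). The names output_path and first_two_paths are invented here for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): builds "output/p<page>_<number>_<name>.<ext>"
// and advances the running image counter, as the removed lines did by hand.
std::string output_path(unsigned page, unsigned& number,
                        const std::string& name, const std::string& ext)
{
  // An empty label is almost certainly a slip at the call site.
  assert(!name.empty());

  std::ostringstream path;
  path << "output/p" << page << "_" << number << "_" << name << "." << ext;
  ++number;
  return path.str();
}

// Hypothetical usage: two consecutive images of page 3.
std::string first_two_paths()
{
  unsigned number = 0;
  std::string a = output_path(3, number, "bin", "pbm");
  std::string b = output_path(3, number, "denoised", "pbm");
  return a + ";" + b;
}
```

Note that the sandbox Makefile builds with -DNDEBUG, so an assertion like this one would only fire in a build without that flag.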
olena: olena-2.0-561-g1e8d6c3 [ICDAR_13] Externalize XML handler
by Anthony Seure
--- scribo/sandbox/icdar_13_table/Makefile | 2 +- scribo/sandbox/icdar_13_table/src/new.cc | 51 +++++++----------------------- scribo/sandbox/icdar_13_table/src/xml.cc | 31 ++++++++++++++++++ scribo/sandbox/icdar_13_table/src/xml.hh | 24 ++++++++++++++ 4 files changed, 68 insertions(+), 40 deletions(-) create mode 100644 scribo/sandbox/icdar_13_table/src/xml.cc create mode 100644 scribo/sandbox/icdar_13_table/src/xml.hh diff --git a/scribo/sandbox/icdar_13_table/Makefile b/scribo/sandbox/icdar_13_table/Makefile index 8155a34..8d0cd1a 100644 --- a/scribo/sandbox/icdar_13_table/Makefile +++ b/scribo/sandbox/icdar_13_table/Makefile @@ -4,7 +4,7 @@ CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG CLIBS=-I../../../milena/ -I../../ -I/usr/include/poppler CLEAN=*.o output/* log final.xml -SRC=src/new.cc +SRC=src/xml.cc src/new.cc SRC_OLD=src/main.cc OUTPUT=table OUTPUT_OLD=old diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc index 54b9f3b..95bf575 100644 --- a/scribo/sandbox/icdar_13_table/src/new.cc +++ b/scribo/sandbox/icdar_13_table/src/new.cc @@ -1,3 +1,5 @@ +#include "xml.hh" + // INCLUDES OLENA #include <mln/binarization/all.hh> @@ -83,38 +85,6 @@ void write_image(const image2d<value::rgb8>& ima, ++number; } -// Open and initialize XML -void start_xml(std::ofstream& xml, const char* name, const char* pdf) -{ - xml.open(name); - xml << "<?xml version\"1.0\" encoding=\"UTF-8\"?>" << std::endl - << "<document filename='" << pdf << "'>" << std::endl; -} - -// Finalize an close XML -void end_xml(std::ofstream& xml) -{ - xml << "</document>" << std::endl; - xml.close(); -} - -// Write a new (simple) table in XML file -void write_table(std::ofstream& xml, const point2d& start, const point2d& end) -{ - static unsigned table = 0; - static unsigned region = 0; - static unsigned page = 1; - - xml << "\t<table id='" << table << "'>" << std::endl - << "\t\t<region id='" << region << "' page='" << page << "'>" << std::endl - << 
"\t\t<bounding-box x1='" << start[1] << "' y1='" << start[0] << "' " - << "x2='" << end[1] << "' y2='" << end[0] << "'/>" << std::endl - << "\t\t</region>" << std::endl - << "\t</table>" << std::endl; - - ++table; -} - // Draw vertical links from top to bottom (red) void draw_links_tb(const scribo::object_groups< image2d<unsigned> >& groups, image2d<value::rgb8>& ima_groups, @@ -142,8 +112,8 @@ void draw_links_tb(const scribo::object_groups< image2d<unsigned> >& groups, && (b1.pmin()[1] == b2.pmin()[1] || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1]) || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned - && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap - && abs(p1[1] - p2[1]) < 20) // Vertical proximity + && (unsigned) abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap + && (unsigned) abs(p1[1] - p2[1]) < 20) // Vertical proximity { draw::line(ima_groups, p1, p2, literal::red); balance[i] += 1; @@ -181,8 +151,8 @@ void draw_links_bt(const scribo::object_groups< image2d<unsigned> >& groups, && (b1.pmin()[1] == b2.pmin()[1] || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1]) || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned - && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap - && abs(p1[1] - p2[1]) < 20) // Vertical proximity + && (unsigned) abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap + && (unsigned) abs(p1[1] - p2[1]) < 20) // Vertical proximity { draw::line(ima_groups, p1, p2, literal::red); balance[i] += 1; @@ -261,7 +231,7 @@ int main(int argc, char** argv) typedef value::label_16 V; typedef image2d<V> L; - std::ofstream xml; + //std::ofstream xml; std::ostringstream path; image2d<value::rgb8> original, ima_links, ima_groups, ima_valid; image2d<value::int_u8> filtered; @@ -271,7 +241,8 @@ int main(int argc, char** argv) unsigned dpi = 72; // Loading and binarization - start_xml(xml, "final.xml", argv[1]); + //start_xml(xml, "final.xml", argv[1]); + XML* 
xml = new XML("final.xml", argv[1]); util::array< image2d<value::rgb8> > pdf; io::pdf::load(pdf, argv[1], dpi); @@ -424,7 +395,9 @@ int main(int argc, char** argv) write_image(ima_valid, "valid", page, number, path); } - end_xml(xml); + + //end_xml(xml); + delete xml; return 0; } diff --git a/scribo/sandbox/icdar_13_table/src/xml.cc b/scribo/sandbox/icdar_13_table/src/xml.cc new file mode 100644 index 0000000..76fed84 --- /dev/null +++ b/scribo/sandbox/icdar_13_table/src/xml.cc @@ -0,0 +1,31 @@ +#include "xml.hh" + +XML::XML(const char* name, const char* pdf) + : _name(name), _pdf(pdf) +{ + _xml.open(_name); + _xml << "<?xml version\"1.0\" encoding=\"UTF-8\"?>" << std::endl + << "<document filename='" << _pdf << "'>" << std::endl; +} + +XML::~XML() +{ + _xml << "</document>" << std::endl; + _xml.close(); +} + +void XML::write_table(const point2d& start, const point2d& end) +{ + static unsigned table = 0; + static unsigned region = 0; + static unsigned page = 1; + + _xml << "\t<table id='" << table << "'>" << std::endl + << "\t\t<region id='" << region << "' page='" << page << "'>" << std::endl + << "\t\t<bounding-box x1='" << start[1] << "' y1='" << start[0] << "' " + << "x2='" << end[1] << "' y2='" << end[0] << "'/>" << std::endl + << "\t\t</region>" << std::endl + << "\t</table>" << std::endl; + + ++table; +} diff --git a/scribo/sandbox/icdar_13_table/src/xml.hh b/scribo/sandbox/icdar_13_table/src/xml.hh new file mode 100644 index 0000000..397d585 --- /dev/null +++ b/scribo/sandbox/icdar_13_table/src/xml.hh @@ -0,0 +1,24 @@ +#ifndef XML_HH +# define XML_HH +# define MLN_WO_GLOBAL_VARS + +# include <iostream> +# include <fstream> +# include <mln/core/alias/point2d.hh> + +using namespace mln; + +class XML +{ + public: + XML(const char* name, const char* pdf); + ~XML(); + void write_table(const point2d& start, const point2d& end); + + private: + std::ofstream _xml; + const char* _name; + const char* _pdf; +}; + +#endif /* !XML_HH */ -- 1.7.2.5
11 years, 7 months
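A note on the XML handler patch above: the string literal written by the constructor has no '=' after "version", so the file starts with <?xml version"1.0" encoding="UTF-8"?>, which is not a well-formed XML declaration. The sketch below shows the same open-in-constructor, close-in-destructor idea with the declaration corrected. It is an illustration under stated assumptions, not the author's class: XmlDocument and render_empty_document are hypothetical names, and the writer takes any std::ostream rather than opening a file so that it can be exercised in memory.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the patch's XML class. It writes to any
// std::ostream instead of owning a std::ofstream.
class XmlDocument
{
public:
  XmlDocument(std::ostream& out, const std::string& pdf)
    : out_(out)
  {
    assert(out_.good());
    // Note the '=' after "version", which the patch's literal omits.
    out_ << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
         << "<document filename='" << pdf << "'>\n";
  }

  // The root element is closed when the object goes out of scope.
  ~XmlDocument()
  {
    out_ << "</document>\n";
  }

private:
  XmlDocument(const XmlDocument&);             // non-copyable (C++03 style)
  XmlDocument& operator=(const XmlDocument&);

  std::ostream& out_;
};

// Hypothetical helper: render an empty document and return its text.
std::string render_empty_document(const std::string& pdf)
{
  std::ostringstream out;
  {
    XmlDocument doc(out, pdf);  // closing tag is written at the end of this scope
  }
  return out.str();
}
```

With an object that has automatic storage like `doc` here, the new/delete pair in main would not be needed; the destructor runs at the end of the enclosing scope.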
olena: olena-2.0-562-g8241f48 [ICDAR_13] Update XML output + Add XML reference
by Anthony Seure
--- scribo/sandbox/icdar_13_table/Makefile | 2 +- .../competition-entry-region-model.xsd | 45 +++++++++++++++++++ scribo/sandbox/icdar_13_table/src/new.cc | 5 +-- scribo/sandbox/icdar_13_table/src/xml.cc | 47 +++++++++++++------ scribo/sandbox/icdar_13_table/src/xml.hh | 9 +++- 5 files changed, 86 insertions(+), 22 deletions(-) create mode 100644 scribo/sandbox/icdar_13_table/competition-entry-region-model.xsd diff --git a/scribo/sandbox/icdar_13_table/Makefile b/scribo/sandbox/icdar_13_table/Makefile index 8d0cd1a..7c11bf1 100644 --- a/scribo/sandbox/icdar_13_table/Makefile +++ b/scribo/sandbox/icdar_13_table/Makefile @@ -1,6 +1,6 @@ CCACHE=ccache CC=g++ -CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG +CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG -DMLN_WO_GLOBAL_VARS CLIBS=-I../../../milena/ -I../../ -I/usr/include/poppler CLEAN=*.o output/* log final.xml diff --git a/scribo/sandbox/icdar_13_table/competition-entry-region-model.xsd b/scribo/sandbox/icdar_13_table/competition-entry-region-model.xsd new file mode 100644 index 0000000..2e0d0d4 --- /dev/null +++ b/scribo/sandbox/icdar_13_table/competition-entry-region-model.xsd @@ -0,0 +1,45 @@ +<?xml version="1.0" encoding="utf-8"?> +<xsd:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="1.0" + xmlns:xsd="http://www.w3.org/2001/XMLSchema" > + <xsd:element name="document"> + <xsd:complexType> + <xsd:sequence> + <xsd:element minOccurs="0" maxOccurs="unbounded" name="table"> <!-- a document can contain 0 or more tables --> + <xsd:complexType> + <xsd:sequence> + <xsd:element name="region" maxOccurs="unbounded" nillable="false"> <!-- each table must contain 1 or more regions --> + <xsd:complexType> + <xsd:sequence> + <xsd:element name="instruction" minOccurs="0" maxOccurs="unbounded"> <!-- the instructions are optional --> + <xsd:complexType> + <xsd:attribute name="instr-id" type="xsd:integer" use="required"/> + <xsd:attribute name="subinstr-id" type="xsd:integer"/> + <!--<xsd:attribute name="text" type="xsd:string" use="required"/> + <xsd:attribute name="x1" type="xsd:integer" use="required"/> + <xsd:attribute name="y1" type="xsd:integer" use="required"/> + <xsd:attribute name="x2" type="xsd:integer" use="required"/> + <xsd:attribute name="y2" type="xsd:integer" use="required"/>--> + </xsd:complexType> + </xsd:element> + <xsd:element name="bounding-box"> <!-- each region contains one bounding box --> + <xsd:complexType> + <xsd:attribute name="x1" type="xsd:integer" use="required"/> + <xsd:attribute name="y1" type="xsd:integer" use="required"/> + <xsd:attribute name="x2" type="xsd:integer" use="required"/> + <xsd:attribute name="y2" type="xsd:integer" use="required"/> + </xsd:complexType> + </xsd:element> + </xsd:sequence> + <xsd:attribute name="id" type="xsd:nonNegativeInteger" use="required"/> + <xsd:attribute name="page" type="xsd:positiveInteger" use="required"/> + </xsd:complexType> + </xsd:element> + </xsd:sequence> + <xsd:attribute name="id" type="xsd:nonNegativeInteger" use="required"/> + </xsd:complexType> + </xsd:element> + </xsd:sequence> + <xsd:attribute name="filename" type="xsd:string" use="required"/> + </xsd:complexType> + </xsd:element> +</xsd:schema> diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc index 95bf575..9f05030 100644 ---
a/scribo/sandbox/icdar_13_table/src/new.cc +++ b/scribo/sandbox/icdar_13_table/src/new.cc @@ -1,3 +1,4 @@ +#undef MLN_WO_GLOBAL_VARS #include "xml.hh" // INCLUDES OLENA @@ -231,7 +232,6 @@ int main(int argc, char** argv) typedef value::label_16 V; typedef image2d<V> L; - //std::ofstream xml; std::ostringstream path; image2d<value::rgb8> original, ima_links, ima_groups, ima_valid; image2d<value::int_u8> filtered; @@ -241,7 +241,6 @@ int main(int argc, char** argv) unsigned dpi = 72; // Loading and binarization - //start_xml(xml, "final.xml", argv[1]); XML* xml = new XML("final.xml", argv[1]); util::array< image2d<value::rgb8> > pdf; @@ -395,8 +394,6 @@ int main(int argc, char** argv) write_image(ima_valid, "valid", page, number, path); } - - //end_xml(xml); delete xml; return 0; diff --git a/scribo/sandbox/icdar_13_table/src/xml.cc b/scribo/sandbox/icdar_13_table/src/xml.cc index 76fed84..86e66e2 100644 --- a/scribo/sandbox/icdar_13_table/src/xml.cc +++ b/scribo/sandbox/icdar_13_table/src/xml.cc @@ -1,31 +1,48 @@ #include "xml.hh" -XML::XML(const char* name, const char* pdf) - : _name(name), _pdf(pdf) +XML::XML(const char* name, + const char* pdf) + : _name(name), _pdf(pdf), _table(0), _region(0), _first_time(true) { _xml.open(_name); _xml << "<?xml version\"1.0\" encoding=\"UTF-8\"?>" << std::endl - << "<document filename='" << _pdf << "'>" << std::endl; + << "<document filename='" << _pdf << "'>" << std::endl; } -XML::~XML() +XML::~XML(void) { _xml << "</document>" << std::endl; _xml.close(); } -void XML::write_table(const point2d& start, const point2d& end) +void XML::table(const point2d& start, + const point2d& end, + const unsigned page, + const bool connect) { - static unsigned table = 0; - static unsigned region = 0; - static unsigned page = 1; + if (_first_time) + { + _xml << "\t<table id='" << _table << "'>" << std::endl; + ++_table; + _first_time = false; + } + else + { + if (!connect) + { + _xml << "\t</table>" << std::endl; + _xml << "\t<table id='" << 
_table << "'>" << std::endl; + _region = 0; + ++_table; + } + } - _xml << "\t<table id='" << table << "'>" << std::endl - << "\t\t<region id='" << region << "' page='" << page << "'>" << std::endl - << "\t\t<bounding-box x1='" << start[1] << "' y1='" << start[0] << "' " - << "x2='" << end[1] << "' y2='" << end[0] << "'/>" << std::endl - << "\t\t</region>" << std::endl - << "\t</table>" << std::endl; + _xml << "\t\t<region id='" << _region << "' page='" << page + 1 << "'>" << std::endl + << "\t\t\t<bounding-box x1='" << start[1] + << "' y1='" << start[0] + << "' x2='" << end[1] + << "' y2='" << end[0] << "'/>" << std::endl + << "\t\t</region>" << std::endl; - ++table; + ++_region; } diff --git a/scribo/sandbox/icdar_13_table/src/xml.hh b/scribo/sandbox/icdar_13_table/src/xml.hh index 397d585..b4cef29 100644 --- a/scribo/sandbox/icdar_13_table/src/xml.hh +++ b/scribo/sandbox/icdar_13_table/src/xml.hh @@ -1,6 +1,5 @@ #ifndef XML_HH # define XML_HH -# define MLN_WO_GLOBAL_VARS # include <iostream> # include <fstream> @@ -13,12 +12,18 @@ class XML public: XML(const char* name, const char* pdf); ~XML(); - void write_table(const point2d& start, const point2d& end); + void table(const point2d& start, + const point2d& end, + const unsigned page, + const bool connect); private: std::ofstream _xml; const char* _name; const char* _pdf; + unsigned _table; + unsigned _region; + bool _first_time; }; #endif /* !XML_HH */ -- 1.7.2.5
11 years, 7 months
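A note on XML::table() in the patch above: a new table element is opened on the first call and whenever connect is false, and the previous one is closed at that point. As far as the diff shows, nothing closes the last open table before the destructor writes the closing document tag, so the final table would stay unclosed. The sketch below reproduces the grouping logic and adds that closing step. It is a simplified illustration: TableWriter, Box, render_example and count_occurrences are hypothetical names, and Box uses plain integer corners instead of mln::point2d.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical simplified bounding box (plain integers, not mln::point2d).
struct Box { int x1, y1, x2, y2; };

// Sketch of the table/region grouping, with the last table closed explicitly.
class TableWriter
{
public:
  explicit TableWriter(std::ostream& out)
    : out_(out), table_(0), region_(0), open_(false) {}

  ~TableWriter() { close_table(); }

  // Append a region. A new table starts on the first call and whenever
  // `connect` is false; otherwise the region joins the current table.
  void region(const Box& b, unsigned page, bool connect)
  {
    if (!open_ || !connect)
    {
      close_table();
      out_ << "\t<table id='" << table_ << "'>\n";
      ++table_;
      region_ = 0;
      open_ = true;
    }

    // Pages are 0-based in the caller and 1-based in the output, as in the patch.
    out_ << "\t\t<region id='" << region_ << "' page='" << page + 1 << "'>\n"
         << "\t\t\t<bounding-box x1='" << b.x1 << "' y1='" << b.y1
         << "' x2='" << b.x2 << "' y2='" << b.y2 << "'/>\n"
         << "\t\t</region>\n";
    ++region_;
  }

  void close_table()
  {
    if (open_)
    {
      out_ << "\t</table>\n";
      open_ = false;
    }
  }

private:
  std::ostream& out_;
  unsigned table_;
  unsigned region_;
  bool open_;
};

// Count non-overlapping occurrences of `needle` in `text`.
std::size_t count_occurrences(const std::string& text, const std::string& needle)
{
  assert(!needle.empty());
  std::size_t n = 0;
  for (std::size_t pos = text.find(needle); pos != std::string::npos;
       pos = text.find(needle, pos + needle.size()))
    ++n;
  return n;
}

// Hypothetical usage: two connected regions, then a separate table.
std::string render_example()
{
  std::ostringstream out;
  {
    TableWriter w(out);
    Box a = {0, 0, 10, 10};
    Box b = {0, 20, 10, 30};
    Box c = {50, 0, 60, 10};
    w.region(a, 0, false);  // table 0, region 0
    w.region(b, 0, true);   // table 0, region 1
    w.region(c, 1, false);  // table 1, region 0
  }
  return out.str();
}
```

In render_example the three calls yield two table elements, each with a matching closing tag, and the region counter restarts at 0 in the second table.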
olena: olena-2.0-563-g2661c4d [ICDAR_13] Check old version XML output with GUI Annotator
by Anthony Seure
--- scribo/sandbox/icdar_13_table/Makefile | 3 ++- .../competition-entry-region-model.xsd | 0 scribo/sandbox/icdar_13_table/src/main.cc | 9 ++++++--- 3 files changed, 8 insertions(+), 4 deletions(-) rename scribo/sandbox/icdar_13_table/{ => originals}/competition-entry-region-model.xsd (100%) diff --git a/scribo/sandbox/icdar_13_table/Makefile b/scribo/sandbox/icdar_13_table/Makefile index 7c11bf1..baa9ae1 100644 --- a/scribo/sandbox/icdar_13_table/Makefile +++ b/scribo/sandbox/icdar_13_table/Makefile @@ -1,6 +1,7 @@ CCACHE=ccache CC=g++ CFLAGS=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG -DMLN_WO_GLOBAL_VARS +CFLAGS_OLD=-Wall -Werror -O3 -DHAVE_TESSERACT_3 -DNDEBUG CLIBS=-I../../../milena/ -I../../ -I/usr/include/poppler CLEAN=*.o output/* log final.xml @@ -15,7 +16,7 @@ table: $(CCACHE) $(CC) $(CFLAGS) $(CLIBS) $(SRC) -ltesseract -lpoppler-cpp -o $(OUTPUT) old: - $(CCACHE) $(CC) $(CFLAGS) $(CLIBS) $(SRC_OLD) -ltesseract -lpoppler-cpp -o $(OUTPUT_OLD) + $(CCACHE) $(CC) $(CFLAGS_OLD) $(CLIBS) $(SRC_OLD) -ltesseract -lpoppler-cpp -o $(OUTPUT_OLD) clean: rm -rf $(CLEAN) diff --git a/scribo/sandbox/icdar_13_table/competition-entry-region-model.xsd b/scribo/sandbox/icdar_13_table/originals/competition-entry-region-model.xsd similarity index 100% rename from scribo/sandbox/icdar_13_table/competition-entry-region-model.xsd rename to scribo/sandbox/icdar_13_table/originals/competition-entry-region-model.xsd diff --git a/scribo/sandbox/icdar_13_table/src/main.cc b/scribo/sandbox/icdar_13_table/src/main.cc index be394ba..3aa40c2 100644 --- a/scribo/sandbox/icdar_13_table/src/main.cc +++ b/scribo/sandbox/icdar_13_table/src/main.cc @@ -26,9 +26,12 @@ using namespace mln; -void start_xml(std::ofstream& xml, const char* name, const char* pdf) +void start_xml(std::ofstream& xml, const char* pdf) { - xml.open(name); + //std::ostringstream name; + //name << pdf << "-reg-result.xml"; + //xml.open((name.str()).c_str()); + xml.open("us-005-reg-result.xml"); xml << "<?xml 
version\"1.0\" encoding=\"UTF-8\"?>" << std::endl << "<document filename='" << pdf << "'>" << std::endl; } @@ -212,7 +215,7 @@ int main(int argc, char** argv) // Loading and binarization std::ofstream xml; - start_xml(xml, "final.xml", argv[1]); + start_xml(xml, argv[1]); //io::ppm::load(original, argv[1]); util::array< image2d<value::rgb8> > pdf; -- 1.7.2.5
11 years, 7 months
olena: olena-2.0-564-g3e47e4a [ICDAR_13] Validate links of boxes based on separators
by Anthony Seure
--- scribo/sandbox/icdar_13_table/src/new.cc | 123 +++++++++++++++++++++++++----- 1 files changed, 102 insertions(+), 21 deletions(-) diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc index 9f05030..5370164 100644 --- a/scribo/sandbox/icdar_13_table/src/new.cc +++ b/scribo/sandbox/icdar_13_table/src/new.cc @@ -87,10 +87,12 @@ void write_image(const image2d<value::rgb8>& ima, } // Draw vertical links from top to bottom (red) +template<typename L> void draw_links_tb(const scribo::object_groups< image2d<unsigned> >& groups, image2d<value::rgb8>& ima_groups, std::vector<short>& balance, - unsigned average_width) + unsigned average_width, + const scribo::component_set<L>& hlines) { for (unsigned i = 1; i <= groups.nelements(); ++i) { @@ -116,9 +118,27 @@ void draw_links_tb(const scribo::object_groups< image2d<unsigned> >& groups, && (unsigned) abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap && (unsigned) abs(p1[1] - p2[1]) < 20) // Vertical proximity { - draw::line(ima_groups, p1, p2, literal::red); - balance[i] += 1; - break; + unsigned k = 1; + short separators = 0; + + while (k <= hlines.nelements() && separators < 2) + { + const box2d& s = hlines(k).bbox(); + + if (s.pmin()[1] <= b1.pmin()[1] && s.pmax()[1] >= b1.pmax()[1] + && s.pmin()[0] > b1.pmax()[0] + && s.pmax()[0] < b2.pmin()[0]) + ++separators; + + ++k; + } + + if (separators < 2) + { + draw::line(ima_groups, p1, p2, literal::red); + balance[i] += 1; + break; + } } } } @@ -126,10 +146,12 @@ void draw_links_tb(const scribo::object_groups< image2d<unsigned> >& groups, } // Draw vertical links from bottom to top (red) +template<typename L> void draw_links_bt(const scribo::object_groups< image2d<unsigned> >& groups, image2d<value::rgb8>& ima_groups, std::vector<short>& balance, - unsigned average_width) + unsigned average_width, + const scribo::component_set<L>& hlines) { for (unsigned i = groups.nelements(); i > 0; --i) { @@ -155,9 +177,27 @@ void 
draw_links_bt(const scribo::object_groups< image2d<unsigned> >& groups, && (unsigned) abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap && (unsigned) abs(p1[1] - p2[1]) < 20) // Vertical proximity { - draw::line(ima_groups, p1, p2, literal::red); - balance[i] += 1; - break; + unsigned k = 1; + short separators = 0; + + while (k <= hlines.nelements() && separators < 2) + { + const box2d& s = hlines(k).bbox(); + + if (s.pmin()[1] <= b1.pmin()[1] && s.pmax()[1] >= b1.pmax()[1] + && s.pmax()[0] < b1.pmin()[0] + && s.pmin()[0] > b2.pmax()[0]) + ++separators; + + ++k; + } + + if (separators < 2) + { + draw::line(ima_groups, p1, p2, literal::red); + balance[i] += 1; + break; + } } } } @@ -165,9 +205,11 @@ void draw_links_bt(const scribo::object_groups< image2d<unsigned> >& groups, } // Draw horizontal links from left to right (green) +template<typename L> void draw_links_lr(const scribo::object_groups< image2d<unsigned> >& groups, image2d<value::rgb8>& ima_groups, - std::vector<short>& balance) + std::vector<short>& balance, + const scribo::component_set<L>& vlines) { for (unsigned i = 1; i <= groups.nelements(); ++i) { @@ -186,9 +228,27 @@ void draw_links_lr(const scribo::object_groups< image2d<unsigned> >& groups, || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned && abs(p1[0] - p2[0]) < 10) // Reduced gap { - draw::line(ima_groups, p1, p2, literal::green); - balance[i] += 1; - break; + unsigned k = 1; + short separators = 0; + + while (k <= vlines.nelements() && separators < 2) + { + const box2d& s = vlines(k).bbox(); + + if (s.pmin()[0] <= b1.pmin()[0] && s.pmax()[0] >= b1.pmax()[0] + && s.pmin()[1] > b1.pmax()[1] + && s.pmax()[1] < b2.pmin()[1]) + ++separators; + + ++k; + } + + if (separators < 2) + { + draw::line(ima_groups, p1, p2, literal::green); + balance[i] += 1; + break; + } } } } @@ -196,9 +256,11 @@ void draw_links_lr(const scribo::object_groups< image2d<unsigned> >& groups, } // Draw horizontal links from right to left 
(green) +template<typename L> void draw_links_rl(const scribo::object_groups< image2d<unsigned> >& groups, image2d<value::rgb8>& ima_groups, - std::vector<short>& balance) + std::vector<short>& balance, + const scribo::component_set<L>& vlines) { for (unsigned i = groups.nelements(); i > 0; --i) { @@ -217,9 +279,27 @@ void draw_links_rl(const scribo::object_groups< image2d<unsigned> >& groups, || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned && abs(p1[0] - p2[0]) < 10) // Reduced gap { - draw::line(ima_groups, p1, p2, literal::green); - balance[i] += 1; - break; + unsigned k = 1; + short separators = 0; + + while (k <= vlines.nelements() && separators < 2) + { + const box2d& s = vlines(k).bbox(); + + if (s.pmin()[0] <= b1.pmin()[0] && s.pmax()[0] >= b1.pmax()[0] + && s.pmax()[1] < b1.pmin()[1] + && s.pmin()[1] > b2.pmax()[1]) + ++separators; + + ++k; + } + + if (separators < 2) + { + draw::line(ima_groups, p1, p2, literal::green); + balance[i] += 1; + break; + } } } } @@ -358,12 +438,13 @@ int main(int argc, char** argv) std::vector<short> balance(groups.nelements(), 0); // Draw and count links - draw_links_tb(groups, ima_groups, balance, average_width); - draw_links_bt(groups, ima_groups, balance, average_width); - draw_links_lr(groups, ima_groups, balance); - draw_links_rl(groups, ima_groups, balance); + draw_links_tb(groups, ima_groups, balance, average_width, hlines); + draw_links_bt(groups, ima_groups, balance, average_width, hlines); + draw_links_lr(groups, ima_groups, balance, vlines); + draw_links_rl(groups, ima_groups, balance, vlines); - // Draw weighted boxes (red < orange < cyan < green) (useless ?) + // Draw weighted boxes (red < orange < cyan < green) + // 1 link < 2 links < 3 links < 3+ links for (unsigned i = 0; i < balance.size(); ++i) { if (balance[i] == 1) -- 1.7.2.5
11 years, 7 months
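A note on the separator check added by the patch above: for a top-to-bottom link, a horizontal separator counts when its column range covers the upper box and it lies strictly between the bottom of the upper box and the top of the lower box; the link is drawn only if fewer than two such separators are found. The sketch below isolates that rule so it can be tested without Olena. It is a simplified illustration: Box, separators_between, link_allowed and example_link_allowed are hypothetical names, and the box is a plain struct rather than mln::box2d.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified box: row/column extents, rows growing downwards.
struct Box { int row_min, col_min, row_max, col_max; };

// Count horizontal separators that span the columns of `upper` and lie
// strictly between `upper` and `lower`.
std::size_t separators_between(const Box& upper, const Box& lower,
                               const std::vector<Box>& hlines)
{
  std::size_t count = 0;
  for (std::size_t k = 0; k < hlines.size(); ++k)
  {
    const Box& s = hlines[k];
    const bool spans = s.col_min <= upper.col_min && s.col_max >= upper.col_max;
    const bool between = s.row_min > upper.row_max && s.row_max < lower.row_min;
    if (spans && between)
      ++count;
  }
  return count;
}

// Same decision as the patch: a link survives with zero or one separator.
bool link_allowed(const Box& upper, const Box& lower,
                  const std::vector<Box>& hlines)
{
  return separators_between(upper, lower, hlines) < 2;
}

// Hypothetical usage: two stacked boxes with `n` full-width rules between them.
bool example_link_allowed(std::size_t n)
{
  assert(n <= 7);  // keeps every rule strictly between the two boxes
  const Box upper = {0, 0, 10, 100};
  const Box lower = {100, 0, 110, 100};

  std::vector<Box> hlines;
  for (std::size_t i = 0; i < n; ++i)
  {
    const int row = 20 + 10 * static_cast<int>(i);
    const Box rule = {row, 0, row + 1, 100};
    hlines.push_back(rule);
  }
  return link_allowed(upper, lower, hlines);
}
```

The patch stops scanning as soon as two separators are found; the sketch counts all of them, which leads to the same decision. The four draw_links_* functions differ only in which coordinate is compared, so a helper of this kind could in principle be shared between them.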
olena: olena-2.0-565-g103edf9 [ICDAR_13] Refactore and use of sauvola_ms instead of sauvola
by Anthony Seure
--- scribo/sandbox/icdar_13_table/src/new.cc | 93 +++++++++++++++++------------- 1 files changed, 53 insertions(+), 40 deletions(-) diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc index 5370164..b13696e 100644 --- a/scribo/sandbox/icdar_13_table/src/new.cc +++ b/scribo/sandbox/icdar_13_table/src/new.cc @@ -27,6 +27,7 @@ // INCLUDES SCRIBO #include <scribo/binarization/sauvola.hh> +#include <scribo/binarization/sauvola_ms.hh> #include <scribo/core/component_set.hh> #include <scribo/core/line_set.hh> @@ -56,6 +57,29 @@ using namespace mln; +// Draw weighted boxes (red < orange < cyan < green) +// 1 link < 2 links < 3 links < 3+ links +template<typename T, typename L> +void draw_adjacency_boxes(const std::vector<short>& balance, + image2d<T>& ima, + const scribo::object_groups< image2d<L> >& groups) +{ + for (unsigned i = 0; i < balance.size(); ++i) + { + if (balance[i] == 1) + draw::box(ima, groups(i).bbox(), literal::red); + + if (balance[i] == 2) + draw::box(ima, groups(i).bbox(), literal::orange); + + if (balance[i] == 3) + draw::box(ima, groups(i).bbox(), literal::cyan); + + if (balance[i] > 3) + draw::box(ima, groups(i).bbox(), literal::green); + } +} + // Write image2d<bool> images void write_image(const image2d<bool>& ima, const char* name, @@ -226,7 +250,8 @@ void draw_links_lr(const scribo::object_groups< image2d<unsigned> >& groups, && (b1.pmin()[0] == b2.pmin()[0] || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0]) || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned - && abs(p1[0] - p2[0]) < 10) // Reduced gap + && abs(p1[0] - p2[0]) < 10 // Reduced gap + && abs(p1[1] - p2[1]) > (b1.width() + b2.width()) / 4) // Consistent gap { unsigned k = 1; short separators = 0; @@ -277,7 +302,8 @@ void draw_links_rl(const scribo::object_groups< image2d<unsigned> >& groups, && (b1.pmin()[0] == b2.pmin()[0] || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0]) || 
(b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned - && abs(p1[0] - p2[0]) < 10) // Reduced gap + && abs(p1[0] - p2[0]) < 10 // Reduced gap + && abs(p1[1] - p2[1]) > (b1.width() + b2.width()) / 4) // Consistent gap { unsigned k = 1; short separators = 0; @@ -311,34 +337,33 @@ int main(int argc, char** argv) { typedef value::label_16 V; typedef image2d<V> L; - - std::ostringstream path; - image2d<value::rgb8> original, ima_links, ima_groups, ima_valid; - image2d<value::int_u8> filtered; - image2d<bool> bin, reverse, reverse_selection, bin_merged, separators, bin_without_separators, whitespaces, comp, denoised; - scribo::component_set< image2d<unsigned> > components, rcomponents; - - unsigned dpi = 72; + typedef image2d<value::rgb8> I8; + typedef image2d<bool> IB; + typedef scribo::component_set< image2d<unsigned> > CS; // Loading and binarization XML* xml = new XML("final.xml", argv[1]); util::array< image2d<value::rgb8> > pdf; + unsigned dpi = 72; io::pdf::load(pdf, argv[1], dpi); + // Iterate over all pages for (unsigned page = 0; page < pdf.nelements(); ++page) { - original = pdf[page]; - filtered = data::transform(original, fun::v2v::rgb_to_luma<value::int_u8>()); - bin = scribo::binarization::sauvola(filtered, 81, 0.44); + I8 original = pdf[page]; + image2d<value::int_u8> filtered = data::transform(original, fun::v2v::rgb_to_luma<value::int_u8>()); + IB bin = scribo::binarization::sauvola(filtered, 81, 0.44); + //IB bin = scribo::binarization::sauvola_ms(filtered, 81, 2); // Reverse selection - reverse = logical::not_(bin); + IB reverse = logical::not_(bin); + IB reverse_selection; initialize(reverse_selection, reverse); data::fill(reverse_selection, false); unsigned nrcomponents; - rcomponents = scribo::primitive::extract::components(reverse, c8(), nrcomponents); + CS rcomponents = scribo::primitive::extract::components(reverse, c8(), nrcomponents); for (unsigned i = 1; i < rcomponents.nelements(); ++i) { @@ -352,8 +377,8 @@ int 
main(int argc, char** argv) reverse_selection = scribo::preprocessing::denoise_fg(reverse_selection, c8(), 4); // Find separators - bin_without_separators = duplicate(bin); - separators = separators; + IB bin_without_separators = duplicate(bin); + IB separators = separators; V nhlines, nvlines; unsigned min_width = 31; unsigned min_height = 71; @@ -366,14 +391,15 @@ int main(int argc, char** argv) data::fill((bin_without_separators | vlines(i).bbox()).rw(), false); // Denoise - denoised = scribo::preprocessing::denoise_fg(bin_without_separators, c8(), 4); + IB denoised = scribo::preprocessing::denoise_fg(bin_without_separators, c8(), 4); // Bin merged - bin_merged = logical::or_(denoised, reverse_selection); + IB bin_merged = logical::or_(denoised, reverse_selection); // Extract components unsigned ncomponents; - components = scribo::primitive::extract::components(bin_merged, c8(), ncomponents); + CS components = scribo::primitive::extract::components(bin_merged, c8(), ncomponents); + IB comp; initialize(comp, bin_merged); data::fill(comp, false); @@ -398,9 +424,11 @@ int main(int argc, char** argv) // Filter links scribo::object_links< image2d<unsigned> > hratio_filtered_links = scribo::filter::object_links_bbox_h_ratio(merged_links, 2.5f); - ima_links = data::convert(value::rgb8(), bin_merged); - ima_groups = data::convert(value::rgb8(), bin_merged); - ima_valid = data::convert(value::rgb8(), bin_merged); + IB tmp = logical::and_(bin_merged, comp); + + I8 ima_links = data::convert(value::rgb8(), tmp); + I8 ima_groups = data::convert(value::rgb8(), tmp); + I8 ima_valid = data::convert(value::rgb8(), tmp); // Write links for (unsigned l = 1; l < merged_links.nelements(); ++l) @@ -442,25 +470,10 @@ int main(int argc, char** argv) draw_links_bt(groups, ima_groups, balance, average_width, hlines); draw_links_lr(groups, ima_groups, balance, vlines); draw_links_rl(groups, ima_groups, balance, vlines); - - // Draw weighted boxes (red < orange < cyan < green) - // 1 link 
< 2 links < 3 links < 3+ links - for (unsigned i = 0; i < balance.size(); ++i) - { - if (balance[i] == 1) - draw::box(ima_valid, groups(i).bbox(), literal::red); - - if (balance[i] == 2) - draw::box(ima_valid, groups(i).bbox(), literal::orange); - - if (balance[i] == 3) - draw::box(ima_valid, groups(i).bbox(), literal::cyan); - - if (balance[i] > 3) - draw::box(ima_valid, groups(i).bbox(), literal::green); - } + draw_adjacency_boxes(balance, ima_valid, groups); // Write images and close XML + std::ostringstream path; unsigned number = 0; write_image(bin, "bin", page, number, path); -- 1.7.2.5
11 years, 7 months
olena: olena-2.0-566-g6117c48 Fix various issues in the table detection experiment.
by Roland Levillain
* scribo/sandbox/icdar_13_table/src/new.cc: Use the multiscale
version of the Sauvola binarization.
(main): Remove an unused variable.
Remove superfluous `return' statement.
---
 scribo/sandbox/icdar_13_table/src/new.cc |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc
index b13696e..02e7594 100644
--- a/scribo/sandbox/icdar_13_table/src/new.cc
+++ b/scribo/sandbox/icdar_13_table/src/new.cc
@@ -26,7 +26,6 @@
 #include <tesseract/baseapi.h>
 
 // INCLUDES SCRIBO
-#include <scribo/binarization/sauvola.hh>
 #include <scribo/binarization/sauvola_ms.hh>
 
 #include <scribo/core/component_set.hh>
@@ -353,8 +352,7 @@ int main(int argc, char** argv)
   {
     I8 original = pdf[page];
     image2d<value::int_u8> filtered = data::transform(original, fun::v2v::rgb_to_luma<value::int_u8>());
-    IB bin = scribo::binarization::sauvola(filtered, 81, 0.44);
-    //IB bin = scribo::binarization::sauvola_ms(filtered, 81, 2);
+    IB bin = scribo::binarization::sauvola_ms(filtered, 81, 2);
 
     // Reverse selection
     IB reverse = logical::not_(bin);
@@ -378,7 +376,6 @@ int main(int argc, char** argv)
 
     // Find separators
     IB bin_without_separators = duplicate(bin);
-    IB separators = separators;
     V nhlines, nvlines;
     unsigned min_width = 31;
     unsigned min_height = 71;
@@ -489,6 +486,4 @@ int main(int argc, char** argv)
   }
 
   delete xml;
-
-  return 0;
 }
--
1.7.2.5
11 years, 7 months
olena: olena-2.0-567-g5ead1be Build spatial relations between text boxes as directed graphs.
by Roland Levillain
* scribo/sandbox/icdar_13_table/src/new.cc: Here.
---
 scribo/sandbox/icdar_13_table/src/new.cc | 89 ++++++++++++++++++++++++++++++
 1 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/scribo/sandbox/icdar_13_table/src/new.cc b/scribo/sandbox/icdar_13_table/src/new.cc
index 02e7594..2c41e31 100644
--- a/scribo/sandbox/icdar_13_table/src/new.cc
+++ b/scribo/sandbox/icdar_13_table/src/new.cc
@@ -469,6 +469,95 @@ int main(int argc, char** argv)
     draw_links_rl(groups, ima_groups, balance, vlines);
     draw_adjacency_boxes(balance, ima_valid, groups);

+    /* FIXME: The code below duplicates some of the code in the
+       draw_links_* routines.  Factor.  */
+    // Adjacencies between nodes.  Of course, an actual digraph data
+    // structure would be better.
+    typedef unsigned node_id;
+    typedef std::set<node_id> group_set;
+    typedef std::vector<group_set> adjacencies;
+    adjacencies nodes_below(groups.nelements());
+    adjacencies nodes_above(groups.nelements());
+    adjacencies nodes_right(groups.nelements());
+    adjacencies nodes_left(groups.nelements());
+
+    // Draw vertical links (red)
+    for (unsigned i = 1; i < groups.nelements(); ++i)
+    {
+      for (unsigned j = i + 1; j < groups.nelements(); ++j)
+      {
+        const box2d& b1 = groups(i).bbox();
+        const box2d& b2 = groups(j).bbox();
+        const point2d& p1 = b1.pcenter();
+        const point2d& p2 = b2.pcenter();
+
+        unsigned max_height = std::max(b1.height(), b2.height());
+        unsigned min_height = std::min(b1.height(), b2.height());
+
+        if (/* p1[0] < p2[0] // Avoid redundancy
+            && */
+            max_height * 2 < bin_merged.ncols()
+            && min_height + 3 >= max_height // Same heights
+            && b1.width() < 2 * average_width && b2.width() < 2 * average_width // Regular width
+            && (b1.pmin()[1] == b2.pmin()[1]
+                || (b1.pmin()[1] < b2.pmin()[1] && b1.pmax()[1] > b2.pmin()[1])
+                || (b1.pmin()[1] > b2.pmin()[1] && b2.pmax()[1] > b1.pmin()[1])) // Boxes are aligned
+            && abs(p1[0] - p2[0]) < 3 * max_height // Reduced gap
+            && abs(p1[1] - p2[1]) < 20) // Vertical proximity
+        {
+          // Build the above/below adjacencies.
+          node_id top_node, bottom_node;
+          if (p1.row() < p2.row())
+          {
+            top_node = i;
+            bottom_node = j;
+          }
+          else
+          {
+            top_node = j;
+            bottom_node = i;
+          }
+          nodes_below[top_node].insert(bottom_node);
+          nodes_above[bottom_node].insert(top_node);
+        }
+      }
+    }
+
+    // Draw horizontal links (green)
+    for (unsigned i = 1; i < groups.nelements(); ++i)
+    {
+      for (unsigned j = i + 1; j < groups.nelements(); ++j)
+      {
+        const box2d& b1 = groups(i).bbox();
+        const box2d& b2 = groups(j).bbox();
+        const point2d& p1 = b1.pcenter();
+        const point2d& p2 = b2.pcenter();
+
+        if (/* p1[1] < p2[1] // Avoid redundancy
+            && */
+            (b1.pmin()[0] == b2.pmin()[0]
+             || (b1.pmin()[0] < b2.pmin()[0] && b1.pmax()[0] > b2.pmin()[0])
+             || (b1.pmin()[0] > b2.pmin()[0] && b2.pmax()[0] > b1.pmin()[0])) // Boxes are aligned
+            && abs(p1[0] - p2[0]) < 10) // Reduced gap
+        {
+          // Build the right/left adjacencies.
+          node_id left_node, right_node;
+          if (p1.col() < p2.col())
+          {
+            left_node = i;
+            right_node = j;
+          }
+          else
+          {
+            left_node = j;
+            right_node = i;
+          }
+          nodes_right[left_node].insert(right_node);
+          nodes_left[right_node].insert(left_node);
+        }
+      }
+    }
+
     // Write images and close XML
     std::ostringstream path;
     unsigned number = 0;
--
1.7.2.5
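
The patch comment notes that "an actual digraph data structure would be better" than four parallel vectors of sets. As a rough illustration only, here is a minimal sketch of what such a structure might look like. Everything in it is an assumption made for the example: `Box`, `Digraph`, `columns_overlap` and `build_below_graph` are hypothetical names that do not exist in Olena, `Box` merely stands in for `box2d`, and the overlap test is a simplified version of the alignment criterion used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a bounding box; this is NOT Milena's box2d.
struct Box
{
  int row_min, col_min, row_max, col_max;
  int center_row() const { return (row_min + row_max) / 2; }
  int center_col() const { return (col_min + col_max) / 2; }
};

// Minimal directed graph: each node keeps its successors and predecessors,
// so a single object replaces a pair such as nodes_below / nodes_above.
class Digraph
{
public:
  typedef unsigned node_id;
  typedef std::set<node_id> node_set;

  explicit Digraph(std::size_t n) : succ_(n), pred_(n) {}

  // Add the edge from -> to, recording it in both directions.
  void add_edge(node_id from, node_id to)
  {
    assert(from < succ_.size() && to < succ_.size());
    succ_[from].insert(to);
    pred_[to].insert(from);
  }

  const node_set& successors(node_id n) const { return succ_.at(n); }
  const node_set& predecessors(node_id n) const { return pred_.at(n); }

private:
  std::vector<node_set> succ_;
  std::vector<node_set> pred_;
};

// True if the column ranges of a and b overlap (simplified criterion,
// assumed for this sketch).
inline bool columns_overlap(const Box& a, const Box& b)
{
  return a.col_min <= b.col_max && b.col_min <= a.col_max;
}

// Build the "below" relation: one edge top -> bottom for every pair of
// boxes whose column ranges overlap.
inline Digraph build_below_graph(const std::vector<Box>& boxes)
{
  Digraph g(boxes.size());
  for (unsigned i = 0; i < boxes.size(); ++i)
    for (unsigned j = i + 1; j < boxes.size(); ++j)
      if (columns_overlap(boxes[i], boxes[j]))
        {
          // Compare rows with rows to decide which box is on top.
          if (boxes[i].center_row() < boxes[j].center_row())
            g.add_edge(i, j);
          else
            g.add_edge(j, i);
        }
  return g;
}
```

With this layout, `successors(n)` would play the role of `nodes_below[n]` and `predecessors(n)` the role of `nodes_above[n]`; a second `Digraph` instance could hold the left/right relation in the same way.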
11 years, 7 months