Olena-patches
March 2011
olena-patches@lrde.epita.fr
7 participants
277 discussions
last-svn-commit-783-gb3c289f scribo/text/extract_lines.hh: Update code.
by Guillaume Lazzara
---
 scribo/ChangeLog                    |    4 ++
 scribo/scribo/text/extract_lines.hh |   68 ++++++++++++++++++++++------------
 2 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index 9abe919..c41b0bb 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,5 +1,9 @@
 2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>

+        * scribo/text/extract_lines.hh: Update code.
+
+2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
         Make use of mln::fun::v2v::rgb_to_luma.

         * scribo/toolchain/internal/text_in_doc_preprocess_functor.hh,
diff --git a/scribo/scribo/text/extract_lines.hh b/scribo/scribo/text/extract_lines.hh
index 1a25cc8..b81cb79 100644
--- a/scribo/scribo/text/extract_lines.hh
+++ b/scribo/scribo/text/extract_lines.hh
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -33,13 +33,14 @@

 # include <mln/core/concept/image.hh>

-# include <mln/value/label_16.hh>
+# include <mln/value/int_u16.hh>

 # include <scribo/core/line_set.hh>

 # include <scribo/primitive/extract/components.hh>

 # include <scribo/primitive/link/merge_double_link.hh>
+# include <scribo/primitive/link/internal/dmax_width_and_height.hh>
 # include <scribo/primitive/link/with_single_left_link_dmax_ratio.hh>
 # include <scribo/primitive/link/with_single_right_link_dmax_ratio.hh>

@@ -65,11 +66,18 @@ namespace scribo
     /*!
     ** \param[in] input A binary image.
     ** \param[in] nbh A neighborhood used for labeling.
+    ** \param[in] seps A binary image with separator information.
     **
     ** \return A set of lines.
     */
     template <typename I, typename N>
-    line_set<mln_ch_value(I,value::label_16)>
+    line_set<mln_ch_value(I,value::int_u16)>
+    extract_lines(const Image<I>& input_, const Neighborhood<N>& nbh_,
+                  const mln_ch_value(I,bool)& separators);
+
+    /// \overload
+    template <typename I, typename N>
+    line_set<mln_ch_value(I,value::int_u16)>
     extract_lines(const Image<I>& input, const Neighborhood<N>& nbh);


@@ -77,8 +85,18 @@ namespace scribo


     template <typename I, typename N>
-    line_set<mln_ch_value(I,value::label_16)>
-    extract_lines(const Image<I>& input_, const Neighborhood<N>& nbh_)
+    line_set<mln_ch_value(I,value::int_u16)>
+    extract_lines(const Image<I>& input, const Neighborhood<N>& nbh)
+    {
+      mln_ch_value(I,bool) seps;
+      return extract_lines(input, nbh, seps);
+    }
+
+
+    template <typename I, typename N>
+    line_set<mln_ch_value(I,value::int_u16)>
+    extract_lines(const Image<I>& input_, const Neighborhood<N>& nbh_,
+                  const mln_ch_value(I,bool)& separators)
     {
       trace::entering("scribo::text::extract_lines");

@@ -88,44 +106,46 @@ namespace scribo
       mln_precondition(input.is_valid());
       mln_precondition(nbh.is_valid());

-      typedef mln_ch_value(I,value::label_16) L;
-      /// Finding comps.
-      value::label_16 ncomps;
+      typedef mln_ch_value(I,value::int_u16) L;
+      value::int_u16 ncomps;
       component_set<L>
-        comps = scribo::primitive::extract::components(input, nbh, ncomps);
+        comps = scribo::primitive::extract::components(input, c8(), ncomps);

       /// First filtering.
-      component_set<L> filtered_comps
-        = scribo::filter::components_small(comps, 6);
+      comps = scribo::filter::components_small(comps, 3);
+
+      /// Use separators.
+      if (exact(separators).is_valid())
+        comps.add_separators(separators);

       /// Linking potential comps
       object_links<L> left_link
-        = primitive::link::with_single_left_link_dmax_ratio(filtered_comps);
+        = primitive::link::with_single_left_link_dmax_ratio(comps,
+            primitive::link::internal::dmax_width_and_height(1),
+            anchor::MassCenter);

       object_links<L> right_link
-        = primitive::link::with_single_right_link_dmax_ratio(filtered_comps);
-
+        = primitive::link::with_single_right_link_dmax_ratio(comps,
+            primitive::link::internal::dmax_width_and_height(1),
+            anchor::MassCenter);

       // Validating left and right links.
       object_links<L> merged_links
         = primitive::link::merge_double_link(left_link,
-                                            right_link);
+                                             right_link);

-      // Remove links if bboxes have too different sizes.
-      object_links<L> hratio_filtered_links
-        = filter::object_links_bbox_h_ratio(merged_links, 2.0f);
+      object_links<L> hratio_filtered_links
+        = filter::object_links_bbox_h_ratio(merged_links, 2.5f);


       object_groups<L> groups
         = primitive::group::from_single_link(hratio_filtered_links);

-
-      line_set<L> line = scribo::make::line_set(groups);
-      line = text::merging(line);
-
+      line_set<L> lines(groups);
+      lines = text::merging(lines);

       trace::exiting("scribo::text::extract_lines");
-      return line;
+      return lines;
     }

 # endif // ! MLN_INCLUDE_ONLY
--
1.5.6.5
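
Editorial note: the two-argument overload in this patch forwards a default-constructed (and therefore invalid) separator image, which the three-argument version then skips via its `is_valid()` check. Below is a minimal standalone sketch of that forwarding pattern only. `Mask` and `count_text_pixels` are hypothetical names invented for the example, and the pixel-counting body is a placeholder, not the line extraction done by scribo.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a binary image; a default-constructed
// Mask holds no data and reports itself as invalid.
struct Mask
{
  std::vector<bool> data;
  bool is_valid() const { return !data.empty(); }
};

// Full version: counts set input pixels, ignoring those covered by
// the separator mask, but only when the mask is valid.
std::size_t count_text_pixels(const std::vector<bool>& input,
                              const Mask& separators)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    const bool masked = separators.is_valid()
                        && i < separators.data.size()
                        && separators.data[i];
    if (input[i] && !masked)
      ++n;
  }
  return n;
}

// Overload without separators: forwards an invalid mask so that the
// full version skips the separator step.
std::size_t count_text_pixels(const std::vector<bool>& input)
{
  Mask none;
  return count_text_pixels(input, none);
}
```

Keeping a single full implementation and a thin forwarding overload means existing two-argument callers keep compiling while new callers can pass separators.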
13 years, 9 months
last-svn-commit-782-g7d8b116 Make use of mln::fun::v2v::rgb_to_luma.
by Guillaume Lazzara
        * scribo/toolchain/internal/text_in_doc_preprocess_functor.hh,
        * src/binarization/ppm_sauvola.cc,
        * src/binarization/ppm_sauvola_ms.cc,
        * src/binarization/ppm_sauvola_ms_fg.cc,
        * src/binarization/ppm_sauvola_ms_split.cc,
        * src/binarization/sauvola.cc,
        * src/binarization/sauvola_debug.cc,
        * src/binarization/sauvola_ms.cc,
        * src/binarization/sauvola_ms_debug.cc,
        * src/binarization/sauvola_ms_fg.cc,
        * src/text_in_picture.cc,
        * src/text_in_picture_neg.cc,
        * src/text_recognition_in_picture.cc: Here.
---
 scribo/ChangeLog                                   |   18 ++++++++++++++++++
 .../internal/text_in_doc_preprocess_functor.hh     |    7 ++++---
 scribo/src/binarization/ppm_sauvola.cc             |    9 +++++----
 scribo/src/binarization/ppm_sauvola_ms.cc          |    9 +++++----
 scribo/src/binarization/ppm_sauvola_ms_fg.cc       |    8 ++++----
 scribo/src/binarization/ppm_sauvola_ms_split.cc    |    5 ++---
 scribo/src/binarization/sauvola.cc                 |    4 ++--
 scribo/src/binarization/sauvola_debug.cc           |    8 ++++----
 scribo/src/binarization/sauvola_ms.cc              |    9 +++++----
 scribo/src/binarization/sauvola_ms_debug.cc        |    9 +++++----
 scribo/src/binarization/sauvola_ms_fg.cc           |    8 ++++----
 scribo/src/text_in_picture.cc                      |    7 ++++---
 scribo/src/text_in_picture_neg.cc                  |    5 ++---
 scribo/src/text_recognition_in_picture.cc          |    9 +++++----
 14 files changed, 69 insertions(+), 46 deletions(-)

diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index 7d766e9..9abe919 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,5 +1,23 @@
 2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>

+        Make use of mln::fun::v2v::rgb_to_luma.
+
+        * scribo/toolchain/internal/text_in_doc_preprocess_functor.hh,
+        * src/binarization/ppm_sauvola.cc,
+        * src/binarization/ppm_sauvola_ms.cc,
+        * src/binarization/ppm_sauvola_ms_fg.cc,
+        * src/binarization/ppm_sauvola_ms_split.cc,
+        * src/binarization/sauvola.cc,
+        * src/binarization/sauvola_debug.cc,
+        * src/binarization/sauvola_ms.cc,
+        * src/binarization/sauvola_ms_debug.cc,
+        * src/binarization/sauvola_ms_fg.cc,
+        * src/text_in_picture.cc,
+        * src/text_in_picture_neg.cc,
+        * src/text_recognition_in_picture.cc: Here.
+
+2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
         * scribo/io/xml/load.hh: New XML loader.

 2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
diff --git a/scribo/scribo/toolchain/internal/text_in_doc_preprocess_functor.hh b/scribo/scribo/toolchain/internal/text_in_doc_preprocess_functor.hh
index 6a9506b..6c0dd5a 100644
--- a/scribo/scribo/toolchain/internal/text_in_doc_preprocess_functor.hh
+++ b/scribo/scribo/toolchain/internal/text_in_doc_preprocess_functor.hh
@@ -1,4 +1,5 @@
-// Copyright (C) 2010 EPITA Research and Development Laboratory (LRDE)
+// Copyright (C) 2010, 2011 EPITA Research and Development Laboratory
+// (LRDE)
 //
 // This file is part of Olena.
 //
@@ -29,7 +30,7 @@
 #include <mln/core/concept/image.hh>
 #include <mln/data/transform.hh>
 #include <mln/data/convert.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>
 #include <mln/subsampling/antialiased.hh>

 #include <scribo/binarization/sauvola.hh>
@@ -208,7 +209,7 @@ namespace scribo
         on_new_progress_label("Convert to gray-scale image");
         image2d<value::int_u8> intensity_ima
           = mln::data::transform(input_rgb,
-                                 mln::fun::v2v::rgb_to_int_u<8>());
+                                 mln::fun::v2v::rgb_to_luma<value::int_u8>());

         on_progress();

diff --git a/scribo/src/binarization/ppm_sauvola.cc b/scribo/src/binarization/ppm_sauvola.cc
index f0cd355..4732a02 100644
--- a/scribo/src/binarization/ppm_sauvola.cc
+++ b/scribo/src/binarization/ppm_sauvola.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -27,7 +27,7 @@
 #include <mln/io/ppm/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola.hh>
 #include <scribo/debug/usage.hh>
@@ -74,7 +74,8 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    input_gl = data::transform(input, mln::fun::v2v::rgb_to_int_u<8>());
+    input_gl = data::transform(input,
+                               mln::fun::v2v::rgb_to_luma<value::int_u8>());

   // Binarize
   image2d<bool> out = scribo::binarization::sauvola(input_gl, w, k);
diff --git a/scribo/src/binarization/ppm_sauvola_ms.cc b/scribo/src/binarization/ppm_sauvola_ms.cc
index eb694c2..cff28f9 100644
--- a/scribo/src/binarization/ppm_sauvola_ms.cc
+++ b/scribo/src/binarization/ppm_sauvola_ms.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -29,7 +29,7 @@
 #include <mln/io/ppm/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola_ms.hh>
 #include <scribo/debug/usage.hh>
@@ -110,7 +110,8 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    input_1_gl = data::transform(input_1, mln::fun::v2v::rgb_to_int_u<8>());
+    input_1_gl = data::transform(input_1,
+                                 mln::fun::v2v::rgb_to_luma<value::int_u8>());

   // Binarize
   image2d<bool>
diff --git a/scribo/src/binarization/ppm_sauvola_ms_fg.cc b/scribo/src/binarization/ppm_sauvola_ms_fg.cc
index 20f237f..a17ce9f 100644
--- a/scribo/src/binarization/ppm_sauvola_ms_fg.cc
+++ b/scribo/src/binarization/ppm_sauvola_ms_fg.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -31,7 +31,7 @@
 #include <mln/io/ppm/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola_ms.hh>
 #include <scribo/preprocessing/split_bg_fg.hh>
@@ -122,7 +122,7 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    fg_gl = data::transform(fg, mln::fun::v2v::rgb_to_int_u<8>());
+    fg_gl = data::transform(fg, mln::fun::v2v::rgb_to_luma<value::int_u8>());

   // Binarize
   image2d<bool>
diff --git a/scribo/src/binarization/ppm_sauvola_ms_split.cc b/scribo/src/binarization/ppm_sauvola_ms_split.cc
index bb98b38..92a0817 100644
--- a/scribo/src/binarization/ppm_sauvola_ms_split.cc
+++ b/scribo/src/binarization/ppm_sauvola_ms_split.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -29,7 +29,6 @@
 #include <mln/io/ppm/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>

 #include <scribo/binarization/sauvola_ms_split.hh>
 #include <scribo/debug/usage.hh>
diff --git a/scribo/src/binarization/sauvola.cc b/scribo/src/binarization/sauvola.cc
index 0273071..e8047ab 100644
--- a/scribo/src/binarization/sauvola.cc
+++ b/scribo/src/binarization/sauvola.cc
@@ -29,7 +29,7 @@
 #include <mln/io/magick/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola.hh>
 #include <scribo/debug/usage.hh>
@@ -77,7 +77,7 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    input_1_gl = data::transform(input, mln::fun::v2v::rgb_to_int_u<8>());
+    input_1_gl = data::transform(input, mln::fun::v2v::rgb_to_luma<value::int_u8>());

   image2d<bool> out = scribo::binarization::sauvola(input_1_gl, w, k);

diff --git a/scribo/src/binarization/sauvola_debug.cc b/scribo/src/binarization/sauvola_debug.cc
index f723851..0c335a8 100644
--- a/scribo/src/binarization/sauvola_debug.cc
+++ b/scribo/src/binarization/sauvola_debug.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -32,7 +32,7 @@
 #include <mln/data/convert.hh>
 #include <mln/data/saturate.hh>

-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/local_threshold.hh>
 #include <scribo/binarization/sauvola.hh>
@@ -116,7 +116,7 @@ int main(int argc, char *argv[])

   image2d<value::int_u8>
     gima = data::transform(input,
-                           mln::fun::v2v::rgb_to_int_u<8>());
+                           mln::fun::v2v::rgb_to_luma<value::int_u8>());


   image2d<bool>
diff --git a/scribo/src/binarization/sauvola_ms.cc b/scribo/src/binarization/sauvola_ms.cc
index 541b9e5..6d60ab4 100644
--- a/scribo/src/binarization/sauvola_ms.cc
+++ b/scribo/src/binarization/sauvola_ms.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -31,7 +31,7 @@
 #include <mln/io/magick/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola_ms.hh>
 #include <scribo/debug/usage.hh>
@@ -115,7 +115,8 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    input_1_gl = data::transform(input_1, mln::fun::v2v::rgb_to_int_u<8>());
+    input_1_gl = data::transform(input_1,
+                                 mln::fun::v2v::rgb_to_luma<value::int_u8>());

   // Binarize
   image2d<bool>
diff --git a/scribo/src/binarization/sauvola_ms_debug.cc b/scribo/src/binarization/sauvola_ms_debug.cc
index 6bf9837..70c1a9a 100644
--- a/scribo/src/binarization/sauvola_ms_debug.cc
+++ b/scribo/src/binarization/sauvola_ms_debug.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -29,7 +29,7 @@
 #include <mln/io/magick/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola_ms.hh>
 #include <scribo/debug/usage.hh>
@@ -124,7 +124,8 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    input_1_gl = data::transform(input_1, mln::fun::v2v::rgb_to_int_u<8>());
+    input_1_gl = data::transform(input_1,
+                                 mln::fun::v2v::rgb_to_luma<value::int_u8>());


   // Binarize.
diff --git a/scribo/src/binarization/sauvola_ms_fg.cc b/scribo/src/binarization/sauvola_ms_fg.cc
index 4227db4..7ff9321 100644
--- a/scribo/src/binarization/sauvola_ms_fg.cc
+++ b/scribo/src/binarization/sauvola_ms_fg.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -31,7 +31,7 @@
 #include <mln/io/magick/load.hh>
 #include <mln/io/pbm/save.hh>
 #include <mln/data/transform.hh>
-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <scribo/binarization/sauvola_ms.hh>
 #include <scribo/preprocessing/split_bg_fg.hh>
@@ -102,7 +102,7 @@ int main(int argc, char *argv[])

   // Convert to Gray level image.
   image2d<value::int_u8>
-    fg_gl = data::transform(fg, mln::fun::v2v::rgb_to_int_u<8>());
+    fg_gl = data::transform(fg, mln::fun::v2v::rgb_to_luma<value::int_u8>());

   // Binarize
   image2d<bool>
diff --git a/scribo/src/text_in_picture.cc b/scribo/src/text_in_picture.cc
index e2f30de..77cf7b6 100644
--- a/scribo/src/text_in_picture.cc
+++ b/scribo/src/text_in_picture.cc
@@ -40,7 +40,7 @@
 #include <mln/value/rgb8.hh>
 #include <mln/value/label_16.hh>

-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <mln/subsampling/antialiased.hh>

@@ -236,7 +236,8 @@ int main(int argc, char* argv[])
     std::cout << "** Using split_bg_fg" << std::endl;
     image2d<value::rgb8>
       fg = preprocessing::split_bg_fg(input_rgb, lambda, 32).second();
-    intensity_ima = data::transform(fg, mln::fun::v2v::rgb_to_int_u<8>());
+    intensity_ima = data::transform(fg,
+                                    mln::fun::v2v::rgb_to_luma<value::int_u8>());
     t_ = timer_;
     std::cout << "Foreground extracted. " << t_ << std::endl;

@@ -253,7 +254,7 @@ int main(int argc, char* argv[])
     timer_.start();
     std::cout << "** Using data::transform(intensity)" << std::endl;
     intensity_ima = data::transform(input_rgb,
-                                    mln::fun::v2v::rgb_to_int_u<8>());
+                                    mln::fun::v2v::rgb_to_luma<value::int_u8>());
     t_ = timer_;
     std::cout << "Intensity image " << t_ << std::endl;
   }
diff --git a/scribo/src/text_in_picture_neg.cc b/scribo/src/text_in_picture_neg.cc
index c1a4317..8d818e0 100644
--- a/scribo/src/text_in_picture_neg.cc
+++ b/scribo/src/text_in_picture_neg.cc
@@ -1,4 +1,5 @@
-// Copyright (C) 2010 EPITA Research and Development Laboratory (LRDE)
+// Copyright (C) 2010, 2011 EPITA Research and Development Laboratory
+// (LRDE)
 //
 // This file is part of Olena.
 //
@@ -48,8 +49,6 @@
 #include <mln/value/rgb8.hh>
 #include <mln/value/label_16.hh>

-#include <mln/fun/v2v/rgb_to_int_u.hh>
-
 #include <mln/data/wrap.hh>

 #include <mln/draw/box.hh>
diff --git a/scribo/src/text_recognition_in_picture.cc b/scribo/src/text_recognition_in_picture.cc
index a5f93a7..8b76f48 100644
--- a/scribo/src/text_recognition_in_picture.cc
+++ b/scribo/src/text_recognition_in_picture.cc
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -51,7 +51,7 @@
 #include <mln/value/rgb8.hh>
 #include <mln/value/label_16.hh>

-#include <mln/fun/v2v/rgb_to_int_u.hh>
+#include <mln/fun/v2v/rgb_to_luma.hh>

 #include <mln/data/wrap.hh>

@@ -253,7 +253,8 @@ int main(int argc, char* argv[])
   // Extract foreground
   image2d<value::rgb8>
     fg = preprocessing::split_bg_fg(input_rgb, lambda, 32).second();
-  intensity_ima = data::transform(fg, mln::fun::v2v::rgb_to_int_u<8>());
+  intensity_ima = data::transform(fg,
+                                  mln::fun::v2v::rgb_to_luma<value::int_u8>());

 //   // Perform an initial rotation if needed.
 //   input_rgb = geom::rotate(input_rgb, -45, literal::black);
--
1.5.6.5
13 years, 9 months
last-svn-commit-781-g58cdb6c mln/labeling/fill_holes.hh: Improve speed.
by Guillaume Lazzara
---
 milena/ChangeLog                  |    4 ++++
 milena/mln/labeling/fill_holes.hh |   17 +++++++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/milena/ChangeLog b/milena/ChangeLog
index edefc8e..2eed916 100644
--- a/milena/ChangeLog
+++ b/milena/ChangeLog
@@ -1,5 +1,9 @@
 2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>

+        * mln/labeling/fill_holes.hh: Improve speed.
+
+2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
         * mln/fun/v2v/rgb_to_luma.hh: New function for grayscale
         conversion.

diff --git a/milena/mln/labeling/fill_holes.hh b/milena/mln/labeling/fill_holes.hh
index e76c489..5e487d9 100644
--- a/milena/mln/labeling/fill_holes.hh
+++ b/milena/mln/labeling/fill_holes.hh
@@ -1,4 +1,5 @@
-// Copyright (C) 2007, 2008, 2009 EPITA Research and Development Laboratory (LRDE)
+// Copyright (C) 2007, 2008, 2009, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -33,6 +34,8 @@
 # include <mln/labeling/background.hh>
 # include <mln/labeling/compute.hh>

+# include <mln/data/transform.hh>
+
 # include <mln/core/image/dmorph/image_if.hh>

 # include <mln/accu/math/count.hh>
@@ -57,7 +60,7 @@ namespace mln
     /// \see mln::labeling::background
     ///
     template <typename I, typename N, typename L>
-    I
+    mln_concrete(I)
     fill_holes(const Image<I>& input, const Neighborhood<N>& nbh,
                L& nlabels);

@@ -66,7 +69,7 @@ namespace mln

     template <typename I, typename N, typename L>
     inline
-    I
+    mln_concrete(I)
     fill_holes(const Image<I>& input, const Neighborhood<N>& nbh,
                L& nlabels)
     {
@@ -77,10 +80,6 @@ namespace mln
       mln_precondition(exact(input).is_valid());
       mln_precondition(exact(nbh).is_valid());

-      mln_ch_value(I, bool) output;
-      initialize(output, input);
-      data::fill(output, false);
-
       mln_ch_value(I, L) lbls = labeling::background(input, nbh, nlabels);

       accu::math::count<mln_value(I)> a_;
@@ -99,7 +98,9 @@ namespace mln
         }
       }

-      data::fill((output | (pw::value(lbls) != bg_lbl)).rw(), true);
+      util::array<bool> bg_relbl(arr.nelements(), true);
+      bg_relbl(bg_lbl) = false;
+      mln_ch_value(I, bool) output = data::transform(lbls, bg_relbl);

       trace::exiting("labeling::fill_holes");
       return output;
--
1.5.6.5
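
Editorial note: the change above replaces a masked fill with a per-label lookup table that is applied to the labeled image in a single pass. The following is a minimal sketch of that idea on a flat label buffer, assuming labels are small non-negative integers in the range 0..n_labels. `relabel_to_mask` and its parameters are hypothetical names for this example, not Milena API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map every label to true except bg_label, using a lookup table
// indexed by label (the role played by bg_relbl in the patch).
std::vector<bool> relabel_to_mask(const std::vector<unsigned>& labels,
                                  unsigned n_labels,
                                  unsigned bg_label)
{
  assert(bg_label <= n_labels);

  // One entry per label value, 0..n_labels inclusive.
  std::vector<bool> lut(n_labels + 1, true);
  lut[bg_label] = false;

  // Single pass over the pixels: one table lookup per pixel.
  std::vector<bool> out(labels.size());
  for (std::size_t i = 0; i < labels.size(); ++i)
    out[i] = lut.at(labels[i]);
  return out;
}
```

The table costs one entry per label, and the per-pixel work is a single indexed read, with no predicate evaluated through an image view.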
13 years, 9 months
last-svn-commit-780-g011b365 mln/fun/v2v/rgb_to_luma.hh: New function for grayscale conversion.
by Guillaume Lazzara
---
 milena/ChangeLog                                   |    5 +++
 .../achromatism.hh => mln/fun/v2v/rgb_to_luma.hh}  |   32 ++++++++++++-------
 2 files changed, 25 insertions(+), 12 deletions(-)
 copy milena/{sandbox/green/mln/fun/v2v/achromatism.hh => mln/fun/v2v/rgb_to_luma.hh} (67%)

diff --git a/milena/ChangeLog b/milena/ChangeLog
index c4bed2c..edefc8e 100644
--- a/milena/ChangeLog
+++ b/milena/ChangeLog
@@ -1,3 +1,8 @@
+2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
+        * mln/fun/v2v/rgb_to_luma.hh: New function for grayscale
+        conversion.
+
 2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>

         * mln/util/array.hh: Add last() method.
diff --git a/milena/sandbox/green/mln/fun/v2v/achromatism.hh b/milena/mln/fun/v2v/rgb_to_luma.hh
similarity index 67%
copy from milena/sandbox/green/mln/fun/v2v/achromatism.hh
copy to milena/mln/fun/v2v/rgb_to_luma.hh
index 72b545c..304a798 100644
--- a/milena/sandbox/green/mln/fun/v2v/achromatism.hh
+++ b/milena/mln/fun/v2v/rgb_to_luma.hh
@@ -1,4 +1,4 @@
-// Copyright (C) 2008, 2009 EPITA Research and Development Laboratory (LRDE)
+// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -23,10 +23,10 @@
 // exception does not however invalidate any other reasons why the
 // executable file might be covered by the GNU General Public License.

-#ifndef MLN_FUN_V2V_ACHROMATISM_HH
-# define MLN_FUN_V2V_ACHROMATISM_HH
+#ifndef MLN_FUN_V2V_RGB_TO_LUMA_HH
+# define MLN_FUN_V2V_RGB_TO_LUMA_HH

-# include <mln/value/rgb8.hh>
+# include <mln/value/rgb.hh>

 namespace mln
 {
@@ -37,22 +37,30 @@ namespace mln
     namespace v2v
     {

-      struct achromatism : public Function_v2v< achromatism >
+      template <typename T_luma>
+      struct rgb_to_luma : public Function_v2v< rgb_to_luma<T_luma> >
       {
-        typedef float result;
+        typedef T_luma result;
+
+        template <typename T_rgb>
+        T_luma operator()(const T_rgb& rgb) const;

-        float operator()(const value::rgb8 rgb) const;
       };

+
 # ifndef MLN_INCLUDE_ONLY

-      float achromatism::operator()(const value::rgb8 rgb) const
+      template <typename T_luma>
+      template <typename T_rgb>
+      inline
+      T_luma
+      rgb_to_luma<T_luma>::operator()(const T_rgb& rgb) const
       {
-        return (math::abs(rgb.red() - rgb.green())
-                + math::abs(rgb.red() - rgb.blue())
-                + math::abs(rgb.green() - rgb.blue()))/3.0;
+        float luma = 0.299 * rgb.red() + 0.587 * rgb.green() + 0.114 * rgb.blue();
+        return unsigned(luma + 0.49999);
       }

+
 # endif // !MLN_INCLUDE_ONLY

     } // end of namespace fun::v2v
@@ -61,4 +69,4 @@ namespace mln

 } // end of namespace mln

-#endif // ! MLN_FUN_V2V_ACHROMATISM_HH
+#endif // ! MLN_FUN_V2V_RGB_TO_LUMA_HH
--
1.5.6.5
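
Editorial note: as a rough illustration of what the new functor computes, here is a minimal standalone sketch that does not depend on Milena. The `Rgb8` struct and the `rgb_to_luma8` and `to_gray` names are assumptions made for this example. Only the 0.299 / 0.587 / 0.114 weights and the `+ 0.49999` rounding step are taken from the patch above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB pixel, standing in for mln::value::rgb8.
struct Rgb8
{
  std::uint8_t red;
  std::uint8_t green;
  std::uint8_t blue;
};

// Weighted sum of the channels with the coefficients used in the
// patch, then rounded to the nearest integer by adding just under 0.5
// and truncating.
std::uint8_t rgb_to_luma8(const Rgb8& p)
{
  float luma = 0.299 * p.red + 0.587 * p.green + 0.114 * p.blue;
  return static_cast<std::uint8_t>(unsigned(luma + 0.49999));
}

// Apply the conversion to a whole pixel buffer, in the spirit of
// data::transform(input, rgb_to_luma<value::int_u8>()).
std::vector<std::uint8_t> to_gray(const std::vector<Rgb8>& in)
{
  std::vector<std::uint8_t> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(), rgb_to_luma8);
  return out;
}
```

Because the three weights sum to 1, a neutral gray pixel keeps its value, while saturated primaries map to clearly different gray levels (green brightest, blue darkest).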
13 years, 9 months
last-svn-commit-779-g2162f39 scribo/io/xml/load.hh: New XML loader.
by Guillaume Lazzara
---
 scribo/ChangeLog             |    4 +
 scribo/scribo/io/xml/load.hh |  525 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 529 insertions(+), 0 deletions(-)
 create mode 100644 scribo/scribo/io/xml/load.hh

diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index cf02d73..7d766e9 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,5 +1,9 @@
 2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>

+        * scribo/io/xml/load.hh: New XML loader.
+
+2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
         Make XML output more flexible.

         * scribo/core/component_info.hh,
diff --git a/scribo/scribo/io/xml/load.hh b/scribo/scribo/io/xml/load.hh
new file mode 100644
index 0000000..e0f4548
--- /dev/null
+++ b/scribo/scribo/io/xml/load.hh
@@ -0,0 +1,525 @@
+// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE)
+//
+// This file is part of Olena.
+//
+// Olena is free software: you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation, version 2 of the License.
+//
+// Olena is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with Olena.  If not, see <http://www.gnu.org/licenses/>.
+//
+// As a special exception, you may use this file as part of a free
+// software project without restriction.  Specifically, if other files
+// instantiate templates or use macros or inline functions from this
+// file, or you compile this file and link it with other files to produce
+// an executable, this file does not by itself cause the resulting
+// executable to be covered by the GNU General Public License.  This
+// exception does not however invalidate any other reasons why the
+// executable file might be covered by the GNU General Public License.
+
+#ifndef SCRIBO_IO_XML_LOAD_HH
+# define SCRIBO_IO_XML_LOAD_HH
+
+/// \file
+///
+/// \brief Load document information from XML.
+
+# include <QtXml>
+
+# include <libgen.h>
+# include <fstream>
+# include <sstream>
+
+# include <map>
+
+# include <mln/core/image/image2d.hh>
+
+# include <mln/data/wrap.hh>
+# include <mln/value/int_u8.hh>
+# include <mln/io/pgm/save.hh>
+# include <mln/io/pbm/save.hh>
+
+# include <scribo/core/document.hh>
+# include <scribo/core/component_set.hh>
+# include <scribo/core/line_set.hh>
+# include <scribo/core/line_info.hh>
+
+# include <scribo/convert/from_base64.hh>
+
+namespace scribo
+{
+
+  namespace io
+  {
+
+    namespace xml
+    {
+      using namespace mln;
+
+      /*! \brief Load document information from XML.
+
+        We use a XML Schema part of the PAGE (Page Analysis and Ground
+        truth Elements) image representation framework.
+
+        This schema was used in the Page Segmentation COMPetition
+        (PSCOMP) for ICDAR 2009.
+
+        Its XSD file is located here:
+        http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16/pagecontent…
+
+      */
+      template <typename L>
+      void
+      load(document<L>& doc, const std::string& input_name);
+
+
+# ifndef MLN_INCLUDE_ONLY
+
+      namespace internal
+      {
+
+
+        enum Mode
+        {
+          None,
+          ComponentSet,
+          ComponentInfo,
+          LabeledImage,
+          SeparatorsImage,
+          ObjectLinks,
+          ObjectGroups,
+          Point,
+          Link,
+          Group,
+          Line,
+          LineLinks,
+          LineLink,
+          TextData,
+          TextRegion,
+          CompIdList,
+          CompId,
+          Page
+        };
+
+
+        struct ModeData
+        {
+          const char *name;
+          Mode mode;
+        };
+
+
+        static const ModeData mode_data[] = {
+          { "component_set", ComponentSet },
+          { "component_info", ComponentInfo },
+          { "labeled_image", LabeledImage },
+          { "separators_image", SeparatorsImage },
+          { "object_links", ObjectLinks },
+          { "object_groups", ObjectGroups },
+          { "point", Point },
+          { "link", Link },
+          { "group", Group },
+          { "line", Line },
+          { "line_links", LineLinks },
+          { "line_link", LineLink },
+          { "text_data", TextData },
+          { "text_region", TextRegion },
+          { "compid_list", CompIdList },
+          { "compid", CompId },
+          { "page", Page },
+          { 0, None }
+        };
+
+
+        template <typename L>
+        class xml_handler : public QXmlDefaultHandler
+        {
+
+          typedef mln_ch_value(L,bool) B;
+
+        public:
+          xml_handler() : current_paragraph_id(1) { lines_data.append(line_info<L>()); } // line info id starts from 1.
+
+          virtual
+          bool
+          startElement(const QString& /*namespaceURI*/, const QString& /*localName*/,
+                       const QString& qName, const QXmlAttributes& atts )
+          {
+            mode.push(find_mode(qName));
+
+            switch (mode.top())
+            {
+
+              // Component Set
+              case ComponentSet:
+              {
+                comp_set_data = new scribo::internal::component_set_data<L>();
+                // qDebug() << qName << " - atts.value(\"nelements\").toInt() = " << atts.value("nelements").toInt();;
+                comp_set_data->soft_init(atts.value("nelements").toInt());
+              }
+              break;
+
+
+              // Component Info
+              case ComponentInfo:
+              {
+                component_info info(atts.value("id").toInt(),
+                                    mln::make::box2d(atts.value("pmin_y").toInt(),
+                                                     atts.value("pmin_x").toInt(),
+                                                     atts.value("pmax_y").toInt(),
+                                                     atts.value("pmax_x").toInt()),
+                                    mln::point2d(atts.value("mass_center_y").toInt(),
+                                                 atts.value("mass_center_x").toInt()),
+                                    atts.value("card").toInt());
+
+
+                info.update_tag(component::str2tag(atts.value("tag").toUtf8().constData()));
+                info.update_type(component::str2type(atts.value("type").toUtf8().constData()));
+
+                comp_set_data->infos_.append(info);
+              }
+              break;
+
+
+              // Object links
+              case ObjectLinks:
+              {
+                // qDebug() << "object_links created";
+                links = object_links<L>(components);
+              }
+              break;
+
+
+              // Object groups
+              case ObjectGroups:
+              {
+                // qDebug() << "object_groups created";
+                groups = object_groups<L>(links);
+              }
+              break;
+
+
+              // Text data
+              case TextData:
+              {
+                // qDebug() << "TextData";
+
+                // Reserve space for line data.
+ lines_data.resize(atts.value("nlines").toInt() + 1); + // qDebug() << "line_set created"; + lines = line_set<L>(groups, lines_data); + llinks = line_links<L>(lines); + par_data = new scribo::internal::paragraph_set_data<L>(llinks, atts.value("nparagraphs").toInt()); + // std::cout << par_data->pars_.nelements() << " - " << llinks.nelements() << " - " << lines.nelements() << std::endl; + } + break; + + + // Text Region + case TextRegion: + { + // qDebug() << "TextRegion"; + + current_paragraph = paragraph_info<L>(llinks); + } + break; + + + // Line link + case LineLink: + { + llinks(atts.value("from").toInt()) = atts.value("to").toInt(); + } + break; + + + // Line + case Line: + { + current_line_id = atts.value("id").toInt(); + + line_data = new scribo::internal::line_info_data<L>(lines, mln::util::array<component_id_t>()); + + line_data->holder_ = lines; + line_data->text_ = atts.value("text").toUtf8().constData(); + + line_data->hidden_ = false; + line_data->tag_ = static_cast<line::Tag>(atts.value("tag").toInt()); + + line_data->baseline_ = atts.value("baseline").toInt(); + line_data->meanline_ = atts.value("meanline").toInt(); + line_data->x_height_ = atts.value("x_height").toInt(); + line_data->d_height_ = atts.value("d_height").toInt(); + line_data->a_height_ = atts.value("a_height").toInt(); + line_data->char_space_ = atts.value("kerning").toInt(); + line_data->char_width_ = atts.value("char_width").toInt(); + line_data->word_space_ = 0; + + line_data->reading_direction_ = line::LeftToRight; + line_data->type_ = line::str2type(atts.value("txt_text_type").toAscii().constData()); + line_data->reverse_video_ = (atts.value("txt_reverse_video") == "false" ? false : true); + line_data->orientation_ = 0; + line_data->reading_orientation_ = atts.value("txt_reading_orientation").toInt(); + line_data->indented_ = (atts.value("txt_indented") == "false" ? 
false : true); + + bbox.init(); + } + break; + + + // CompIdList + case CompIdList: + { + + } + break; + + + // CompId + case CompId: + { + line_data->components_.append(atts.value("value").toInt()); + } + break; + + + // Point + case Point: + { + point2d p(atts.value("y").toInt(), atts.value("x").toInt()); + bbox.take(p); + } + break; + + + // Labeled Image + case LabeledImage: + { + width = atts.value("width").toInt(); + height = atts.value("height").toInt(); + comp_set_data->ima_ = L(mln::make::box2d(height, width), 0); // No border + } + break; + + + // Separator Image + case SeparatorsImage: + { + width = atts.value("width").toInt(); + height = atts.value("height").toInt(); + comp_set_data->separators_ = B(mln::make::box2d(height, width), 0); // No border + } + break; + + + // Link + case Link: + { + links(atts.value("from").toInt()) = atts.value("to").toInt(); + } + break; + + + // Group + case Group: + { + groups(atts.value("object_id").toInt()) = atts.value("group_id").toInt(); + } + break; + + + // DEFAULT + default: + ; + } + + return true; + } + + + virtual + bool + endElement(const QString& /*namespaceURI*/, const QString& /*localName*/, const QString& /*qName*/) + { + switch(mode.top()) + { + // Component set + case ComponentSet: + { + // qDebug() << "Component set done"; + components = component_set<L>(comp_set_data); + } + break; + + // Line + case Line: + { + // qDebug() << "Line done"; + line_data->bbox_ = bbox; + lines_data(current_line_id) = line_info<L>(current_line_id, line_data); + lines_data(current_line_id).update_ebbox(); + + // Add this line to the current paragraph. 
+ current_paragraph.add_line(lines_data(current_line_id)); + } + break; + + // TextRegion + case TextRegion: + { + // qDebug() << TextRegion; + par_data->pars_(current_paragraph_id++) = current_paragraph; + } + break; + + // Page + case Page: + { + // qDebug() << "Page done"; + lines.update_line_data_(lines_data); + parset = paragraph_set<L>(par_data); + } + break; + + // DEFAULT + default: + ; + + } + + mode.pop(); + return true; + } + + + + bool characters(const QString & ch) + { + switch (mode.top()) + { + case LabeledImage: + { + scribo::convert::from_base64(ch, comp_set_data->ima_); + } + break; + + case SeparatorsImage: + { + scribo::convert::from_base64(ch, comp_set_data->separators_); + } + break; + + default: + ; + } + + return true; + } + + +// private: // Methods + + Mode find_mode(const QString& qName) + { + for (int i = 0; mode_data[i].name; ++i) + if (mode_data[i].name == qName) + return mode_data[i].mode; + return None; + } + + +// private: // Attributes + + QStack<Mode> mode; + + // Shape + accu::shape::bbox<point2d> bbox; + + unsigned width; + unsigned height; + + // Components + mln::util::tracked_ptr<scribo::internal::component_set_data<L> > comp_set_data; + component_set<L> components; + + object_links<L> links; + object_groups<L> groups; + + // Lines + unsigned current_line_id; + scribo::internal::line_info_data<L> *line_data; + + line_links<L> llinks; + + unsigned current_paragraph_id; + paragraph_info<L> current_paragraph; + scribo::internal::paragraph_set_data<L> *par_data; + paragraph_set<L> parset; + + mln::util::array<line_info<L> > lines_data; + line_set<L> lines; + }; + + + + + + + + + + + template <typename L> + void + load_extended(document<L>& doc, + const std::string& output_name) + { + xml_handler<L> handler; + QXmlSimpleReader reader; + reader.setContentHandler(&handler); + + QFile file(output_name.c_str()); + if (!file.open(QFile::ReadOnly | QFile::Text)) + { + qDebug() << "Cannot read file"; + return; + } + + QXmlInputSource 
xmlInputSource(&file); + if (reader.parse(xmlInputSource)) + qDebug() << "Loaded successfuly"; + + doc.set_paragraphs(handler.parset); + } + + } // end of namespace scribo::io::xml::internal + + + // FACADE + + template <typename L> + void + load(document<L>& doc, + const std::string& output_name) + { + internal::load_extended(doc, output_name); + } + + +# endif // ! MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + + +#endif // ! SCRIBO_IO_XML_LOAD_HH + -- 1.5.6.5
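The handler in the patch above maps each element name to a Mode through a null-terminated table and keeps the open elements on a stack, so endElement() and characters() can tell which element they belong to. A minimal, Qt-free sketch of that dispatch idea follows. It is an illustration only, not code from Scribo: the enum is trimmed to a few entries, std::string and std::stack stand in for QString and QStack, and the mode_tracker class and its method names are hypothetical.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Reduced, hypothetical copy of the loader's Mode enum (illustration only).
enum Mode { None, Page, TextRegion, Line, Point };

struct ModeData
{
  const char* name;
  Mode mode;
};

// Null-terminated lookup table, same layout as mode_data[] in the patch.
static const ModeData mode_data[] = {
  { "page", Page },
  { "text_region", TextRegion },
  { "line", Line },
  { "point", Point },
  { 0, None }
};

// Linear search over the table; unknown element names map to None.
inline Mode find_mode(const std::string& name)
{
  for (int i = 0; mode_data[i].name; ++i)
    if (name == mode_data[i].name)
      return mode_data[i].mode;
  return None;
}

// Hypothetical stand-in for the SAX handler: it only tracks nesting.
class mode_tracker
{
public:
  // Called on an opening tag: push the mode of the new element.
  void start_element(const std::string& name)
  {
    modes_.push(find_mode(name));
  }

  // Called on a closing tag: pop and return the mode being closed.
  // Returns None if no element is open (assumption: the real handler
  // relies on the parser delivering balanced events instead).
  Mode end_element()
  {
    if (modes_.empty())
      return None;
    Mode m = modes_.top();
    modes_.pop();
    return m;
  }

  // Mode of the innermost open element, or None at top level.
  Mode current() const
  {
    return modes_.empty() ? None : modes_.top();
  }

  std::size_t depth() const
  {
    return modes_.size();
  }

private:
  std::stack<Mode> modes_;
};
```

Because unknown names are pushed as None rather than skipped, every closing tag pops exactly one entry and the stack stays balanced even when the input contains elements the loader does not handle.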
13 years, 9 months
last-svn-commit-778-g3d3a249 Make XML output more flexible.
by Guillaume Lazzara
* scribo/core/component_info.hh, * scribo/core/component_set.hh, * scribo/core/document.hh, * scribo/core/line_info.hh, * scribo/core/line_links.hh, * scribo/core/object_groups.hh, * scribo/core/object_links.hh, * scribo/core/paragraph_set.hh: Make these classes serializable. * scribo/core/concept/serializable.hh, * scribo/core/concept/serialize_visitor.hh: New concepts. * scribo/core/internal/doc_xml_serializer.hh: New. Base implementation. * scribo/io/xml/internal/extended_page_xml_visitor.hh, * scribo/io/xml/internal/full_xml_visitor.hh, * scribo/io/xml/internal/page_xml_visitor.hh: New. Visitors producing different XML outputs. * scribo/io/xml/internal/html_markups_replace.hh, * scribo/io/xml/internal/print_box_coords.hh, * scribo/io/xml/internal/print_page_preambule.hh: New. Tools for XML output. * scribo/io/xml/save.hh: Make use of visitors. * scribo/toolchain/internal/content_in_doc_functor.hh: Set default XML output type. * src/content_in_doc.cc: Produce several XML output. --- scribo/ChangeLog | 38 ++ scribo/demo/viewer/runner.cc | 5 +- scribo/scribo/core/component_info.hh | 3 +- scribo/scribo/core/component_set.hh | 7 +- scribo/scribo/core/concept/serializable.hh | 64 +++ scribo/scribo/core/concept/serialize_visitor.hh | 49 +++ scribo/scribo/core/document.hh | 8 +- scribo/scribo/core/internal/doc_xml_serializer.hh | 140 ++++++ scribo/scribo/core/line_info.hh | 21 +- scribo/scribo/core/line_links.hh | 3 +- scribo/scribo/core/object_groups.hh | 4 +- scribo/scribo/core/object_links.hh | 8 +- scribo/scribo/core/paragraph_set.hh | 4 +- .../io/xml/internal/extended_page_xml_visitor.hh | 283 ++++++++++++ scribo/scribo/io/xml/internal/full_xml_visitor.hh | 456 ++++++++++++++++++++ .../scribo/io/xml/internal/html_markups_replace.hh | 97 +++++ scribo/scribo/io/xml/internal/page_xml_visitor.hh | 222 ++++++++++ scribo/scribo/io/xml/internal/print_box_coords.hh | 92 ++++ .../scribo/io/xml/internal/print_page_preambule.hh | 95 ++++ scribo/scribo/io/xml/save.hh | 388 
+++-------------- .../toolchain/internal/content_in_doc_functor.hh | 9 +- scribo/src/content_in_doc.cc | 4 +- 22 files changed, 1660 insertions(+), 340 deletions(-) create mode 100644 scribo/scribo/core/concept/serializable.hh create mode 100644 scribo/scribo/core/concept/serialize_visitor.hh create mode 100644 scribo/scribo/core/internal/doc_xml_serializer.hh create mode 100644 scribo/scribo/io/xml/internal/extended_page_xml_visitor.hh create mode 100644 scribo/scribo/io/xml/internal/full_xml_visitor.hh create mode 100644 scribo/scribo/io/xml/internal/html_markups_replace.hh create mode 100644 scribo/scribo/io/xml/internal/page_xml_visitor.hh create mode 100644 scribo/scribo/io/xml/internal/print_box_coords.hh create mode 100644 scribo/scribo/io/xml/internal/print_page_preambule.hh diff --git a/scribo/ChangeLog b/scribo/ChangeLog index 63e3fee..cf02d73 100644 --- a/scribo/ChangeLog +++ b/scribo/ChangeLog @@ -1,5 +1,43 @@ 2011-03-01 Guillaume Lazzara <z(a)lrde.epita.fr> + Make XML output more flexible. + + * scribo/core/component_info.hh, + * scribo/core/component_set.hh, + * scribo/core/document.hh, + * scribo/core/line_info.hh, + * scribo/core/line_links.hh, + * scribo/core/object_groups.hh, + * scribo/core/object_links.hh, + * scribo/core/paragraph_set.hh: Make these classes serializable. + + * scribo/core/concept/serializable.hh, + * scribo/core/concept/serialize_visitor.hh: New concepts. + + * scribo/core/internal/doc_xml_serializer.hh: New. Base + implementation. + + * scribo/io/xml/internal/extended_page_xml_visitor.hh, + * scribo/io/xml/internal/full_xml_visitor.hh, + * scribo/io/xml/internal/page_xml_visitor.hh: New. Visitors + producing different XML outputs. + + * scribo/io/xml/internal/html_markups_replace.hh, + * scribo/io/xml/internal/print_box_coords.hh, + * scribo/io/xml/internal/print_page_preambule.hh: New. Tools for + XML output. + + * scribo/io/xml/save.hh: Make use of visitors. 
+ + * scribo/toolchain/internal/content_in_doc_functor.hh: Set default + XML output type. + + * src/content_in_doc.cc: Produce several XML output. + + * demo/viewer/runner.cc: Update call to io::xml::save. + +2011-03-01 Guillaume Lazzara <z(a)lrde.epita.fr> + Set component type during component extraction. * scribo/core/component_info.hh, diff --git a/scribo/demo/viewer/runner.cc b/scribo/demo/viewer/runner.cc index 86ff5dc..a3cc883 100644 --- a/scribo/demo/viewer/runner.cc +++ b/scribo/demo/viewer/runner.cc @@ -1,4 +1,5 @@ -// Copyright (C) 2010 EPITA Research and Development Laboratory (LRDE) +// Copyright (C) 2010, 2011 EPITA Research and Development Laboratory +// (LRDE) // // This file is part of Olena. // @@ -156,7 +157,7 @@ void runner::process(const image2d<value::rgb8>& original_ima, f.enable_whitespace_seps = (find_seps == defs::Whitespaces || find_seps == defs::LinesAndWhitespaces); - f.allow_xml_extensions = true; + f.xml_format = scribo::io::xml::PageExtended; f.save_doc_as_xml = true; diff --git a/scribo/scribo/core/component_info.hh b/scribo/scribo/core/component_info.hh index 6fc73f8..f825aee 100644 --- a/scribo/scribo/core/component_info.hh +++ b/scribo/scribo/core/component_info.hh @@ -36,6 +36,7 @@ # include <mln/core/alias/point2d.hh> # include <mln/util/object_id.hh> +# include <scribo/core/concept/serializable.hh> # include <scribo/core/tag/component.hh> # include <scribo/core/tag/line.hh> @@ -44,7 +45,7 @@ namespace scribo typedef mln::util::object_id<scribo::ComponentId, unsigned> component_id_t; - class component_info + class component_info : public Serializable<component_info> { typedef mln::util::object_id<scribo::ComponentId, unsigned> component_id_t; diff --git a/scribo/scribo/core/component_set.hh b/scribo/scribo/core/component_set.hh index 442e8d6..a63ed6c 100644 --- a/scribo/scribo/core/component_set.hh +++ b/scribo/scribo/core/component_set.hh @@ -30,6 +30,10 @@ /// \file /// /// \brief Definition of a component set. 
+/// +/// \fixme component_set should always set a component type in order +/// to be fully supported by visitors. + # include <mln/core/concept/site_set.hh> # include <mln/core/concept/function.hh> @@ -59,6 +63,7 @@ # include <scribo/core/macros.hh> # include <scribo/core/component_info.hh> +# include <scribo/core/concept/serializable.hh> namespace scribo @@ -115,7 +120,7 @@ namespace scribo template <typename L> - class component_set + class component_set : public Serializable<component_set<L> > { typedef mln::accu::shape::bbox<mln_site(L)> bbox_accu_t; typedef mln::accu::center<mln_site(L)> center_accu_t; diff --git a/scribo/scribo/core/concept/serializable.hh b/scribo/scribo/core/concept/serializable.hh new file mode 100644 index 0000000..6e661a6 --- /dev/null +++ b/scribo/scribo/core/concept/serializable.hh @@ -0,0 +1,64 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_CORE_CONCEPT_SERIALIZABLE_HH +# define SCRIBO_CORE_CONCEPT_SERIALIZABLE_HH + +/// \file +/// +/// Concept for serializer visitors. + +# include <mln/core/concept/object.hh> +# include <scribo/core/concept/serialize_visitor.hh> + +namespace scribo +{ + + /// \brief Link functor concept. + template <typename E> + class Serializable : public mln::Object<E> + { + public: + template <typename E2> + void accept(const SerializeVisitor<E2>& visitor) const; + }; + + +# ifndef MLN_INCLUDE_ONLY + + template <typename E> + template <typename E2> + void + Serializable<E>::accept(const SerializeVisitor<E2>& visitor) const + { + exact(visitor).visit(exact(*this)); + } + +# endif // ! MLN_INCLUDE_ONLY + + +} // end of namespace scribo + +#endif // SCRIBO_CORE_CONCEPT_SERIALIZABLE_HH diff --git a/scribo/scribo/core/concept/serialize_visitor.hh b/scribo/scribo/core/concept/serialize_visitor.hh new file mode 100644 index 0000000..e5e598f --- /dev/null +++ b/scribo/scribo/core/concept/serialize_visitor.hh @@ -0,0 +1,49 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. 
+// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_CORE_CONCEPT_SERIALIZE_VISITOR_HH +# define SCRIBO_CORE_CONCEPT_SERIALIZE_VISITOR_HH + +/// \file +/// +/// Concept for serializer visitors. + +# include <mln/core/concept/object.hh> + +namespace scribo +{ + + /// \brief Link functor concept. + template <typename E> + class SerializeVisitor : public mln::Object<E> + { + public: + // void visit(..); + }; + + +} // end of namespace scribo + +#endif // SCRIBO_CORE_CONCEPT_SERIALIZE_VISITOR_HH diff --git a/scribo/scribo/core/document.hh b/scribo/scribo/core/document.hh index ef0869e..372f0a4 100644 --- a/scribo/scribo/core/document.hh +++ b/scribo/scribo/core/document.hh @@ -40,13 +40,15 @@ # include <scribo/core/line_set.hh> # include <scribo/core/paragraph_set.hh> +# include <scribo/core/concept/serializable.hh> + # include <scribo/primitive/extract/components.hh> namespace scribo { template <typename L> - struct document + struct document : public Serializable<document<L> > { public: @@ -98,7 +100,7 @@ namespace scribo private: - const char *filename_; + std::string filename_; mln::image2d<mln::value::rgb8> image_; paragraph_set<L> parset_; @@ -142,7 +144,7 @@ namespace scribo const char * document<L>::filename() const { - return filename_; + return filename_.c_str(); } diff --git a/scribo/scribo/core/internal/doc_xml_serializer.hh b/scribo/scribo/core/internal/doc_xml_serializer.hh new file mode 100644 index 0000000..b64c9d4 --- /dev/null 
+++ b/scribo/scribo/core/internal/doc_xml_serializer.hh @@ -0,0 +1,140 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_CORE_INTERNAL_DOC_XML_SERIALIZER_HH +# define SCRIBO_CORE_INTERNAL_DOC_XML_SERIALIZER_HH + +/// \file +/// +/// Concept for serializer visitors. + +# include <scribo/core/concept/serialize_visitor.hh> + +# include <scribo/core/document.hh> +# include <scribo/core/component_set.hh> +# include <scribo/core/component_info.hh> +# include <scribo/core/paragraph_set.hh> +# include <scribo/core/object_groups.hh> +# include <scribo/core/object_links.hh> +# include <scribo/core/line_links.hh> +# include <scribo/core/line_info.hh> + +namespace scribo +{ + + /// \brief Link functor concept. 
+ template <typename E> + class doc_xml_serializer : public SerializeVisitor<E> + { + public: + // Visit overloads + template <typename L> + void visit(const document<L>& doc) const; + + template <typename L> + void visit(const line_links<L>& llinks) const; + + template <typename L> + void visit(const object_groups<L>& groups) const; + + template <typename L> + void visit(const object_links<L>& links) const; + + template <typename L> + void visit(const component_set<L>& comp_set) const; + + void visit(const component_info& info) const; + + template <typename L> + void visit(const paragraph_set<L>& parset) const; + + template <typename L> + void visit(const line_info<L>& line) const; + }; + + +# ifndef MLN_INCLUDE_ONLY + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const document<L>& doc) const + { + } + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const line_links<L>& llinks) const + { + } + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const object_groups<L>& groups) const + { + } + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const object_links<L>& links) const + { + } + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const component_set<L>& comp_set) const + { + } + + template <typename E> + void + doc_xml_serializer<E>::visit(const component_info& info) const + { + } + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const paragraph_set<L>& parset) const + { + } + + template <typename E> + template <typename L> + void + doc_xml_serializer<E>::visit(const line_info<L>& line) const + { + } + +# endif // ! 
MLN_INCLUDE_ONLY + + +} // end of namespace scribo + +#endif // SCRIBO_CORE_INTERNAL_DOC_XML_SERIALIZER_HH diff --git a/scribo/scribo/core/line_info.hh b/scribo/scribo/core/line_info.hh index c82160a..33a1529 100644 --- a/scribo/scribo/core/line_info.hh +++ b/scribo/scribo/core/line_info.hh @@ -53,6 +53,11 @@ # include <scribo/core/line_set.hh> # include <scribo/core/component_set.hh> +# include <scribo/io/xml/internal/html_markups_replace.hh> + +# include <scribo/core/concept/serializable.hh> + + namespace scribo { @@ -114,6 +119,7 @@ namespace scribo bool indented_; std::string text_; + std::string html_text_; // Line set holding this element. line_set<L> holder_; @@ -125,7 +131,7 @@ namespace scribo template <typename L> - class line_info + class line_info : public Serializable<line_info<L> > { typedef internal::line_info_data<L> data_t; typedef mln::util::object_id<scribo::ComponentId, unsigned> component_id_t; @@ -198,6 +204,7 @@ namespace scribo bool has_text() const; const std::string& text() const; + const std::string& html_text() const; void update_text(const std::string& str); bool is_valid() const; @@ -604,6 +611,7 @@ namespace scribo return data_->indented_; } + template <typename L> bool line_info<L>::has_text() const @@ -611,6 +619,7 @@ namespace scribo return !data_->text_.empty(); } + template <typename L> const std::string& line_info<L>::text() const @@ -620,10 +629,19 @@ namespace scribo template <typename L> + const std::string& + line_info<L>::html_text() const + { + return data_->html_text_; + } + + + template <typename L> void line_info<L>::update_text(const std::string& str) { data_->text_ = str; + data_->html_text_ = scribo::io::xml::internal::html_markups_replace(str); } @@ -987,6 +1005,7 @@ namespace scribo << ", indented=" << info.indented() << ", hidden=" << info.is_hidden() << ", text=" << info.text() + << ", html_text=" << info.html_text() << ")" << std::endl; } diff --git a/scribo/scribo/core/line_links.hh 
b/scribo/scribo/core/line_links.hh index de62158..fdd09a5 100644 --- a/scribo/scribo/core/line_links.hh +++ b/scribo/scribo/core/line_links.hh @@ -34,6 +34,7 @@ # include <mln/util/array.hh> # include <mln/util/tracked_ptr.hh> +# include <scribo/core/concept/serializable.hh> # include <scribo/core/line_set.hh> @@ -69,7 +70,7 @@ namespace scribo /// \brief Line group representation. // template <typename L> - class line_links + class line_links : public Serializable<line_links<L> > { typedef internal::line_links_data<L> data_t; diff --git a/scribo/scribo/core/object_groups.hh b/scribo/scribo/core/object_groups.hh index 9d9fb25..bbfaf6e 100644 --- a/scribo/scribo/core/object_groups.hh +++ b/scribo/scribo/core/object_groups.hh @@ -36,6 +36,8 @@ # include <scribo/core/object_links.hh> # include <scribo/core/component_set.hh> +# include <scribo/core/concept/serializable.hh> + namespace scribo { @@ -69,7 +71,7 @@ namespace scribo /// \brief Object group representation. // template <typename L> - class object_groups + class object_groups : public Serializable<object_groups<L> > { typedef internal::object_groups_data<L> data_t; diff --git a/scribo/scribo/core/object_links.hh b/scribo/scribo/core/object_links.hh index af7dc38..2c2eea1 100644 --- a/scribo/scribo/core/object_links.hh +++ b/scribo/scribo/core/object_links.hh @@ -1,5 +1,5 @@ -// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory -// (LRDE) +// Copyright (C) 2009, 2010, 2011 EPITA Research and Development +// Laboratory (LRDE) // // This file is part of Olena. // @@ -37,6 +37,8 @@ # include <scribo/core/component_set.hh> +# include <scribo/core/concept/serializable.hh> + namespace scribo { @@ -70,7 +72,7 @@ namespace scribo /// \brief Object group representation. 
// template <typename L> - class object_links + class object_links : public Serializable<object_links<L> > { typedef internal::object_links_data<L> data_t; diff --git a/scribo/scribo/core/paragraph_set.hh b/scribo/scribo/core/paragraph_set.hh index 6597189..5451069 100644 --- a/scribo/scribo/core/paragraph_set.hh +++ b/scribo/scribo/core/paragraph_set.hh @@ -33,6 +33,8 @@ # include <scribo/core/line_set.hh> # include <scribo/core/paragraph_info.hh> +# include <scribo/core/concept/serializable.hh> + namespace scribo { @@ -61,7 +63,7 @@ namespace scribo */ template <typename L> - class paragraph_set + class paragraph_set : public Serializable<paragraph_set<L> > { public: paragraph_set(); diff --git a/scribo/scribo/io/xml/internal/extended_page_xml_visitor.hh b/scribo/scribo/io/xml/internal/extended_page_xml_visitor.hh new file mode 100644 index 0000000..5d8a672 --- /dev/null +++ b/scribo/scribo/io/xml/internal/extended_page_xml_visitor.hh @@ -0,0 +1,283 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_IO_XML_INTERNAL_EXTENDED_PAGE_XML_VISITOR_HH +# define SCRIBO_IO_XML_INTERNAL_EXTENDED_PAGE_XML_VISITOR_HH + +/// \file +/// +/// Extended XML PAGE format serializer Visitor. + +# include <fstream> +# include <scribo/core/internal/doc_xml_serializer.hh> +# include <scribo/core/document.hh> +# include <scribo/core/component_set.hh> +# include <scribo/core/paragraph_set.hh> +# include <scribo/core/object_groups.hh> +# include <scribo/core/object_links.hh> +# include <scribo/core/line_links.hh> +# include <scribo/core/line_info.hh> + +# include <scribo/convert/to_base64.hh> + +# include <scribo/io/xml/internal/print_box_coords.hh> +# include <scribo/io/xml/internal/print_page_preambule.hh> + + +namespace scribo +{ + + namespace io + { + + namespace xml + { + + namespace internal + { + + + class extended_page_xml_visitor : public doc_xml_serializer<extended_page_xml_visitor> + { + public: + // Constructor + extended_page_xml_visitor(std::ofstream& out); + + // Visit overloads + template <typename L> + void visit(const document<L>& doc) const; + + template <typename L> + void visit(const component_set<L>& comp_set) const; + + void visit(const component_info& info) const; + + template <typename L> + void visit(const paragraph_set<L>& parset) const; + + template <typename L> + void visit(const line_info<L>& line) const; + + private: // Attributes + std::ofstream& output; + }; + + + +# ifndef 
MLN_INCLUDE_ONLY + + + inline + extended_page_xml_visitor::extended_page_xml_visitor(std::ofstream& out) + : output(out) + { + } + + + + /// Document + // + template <typename L> + void + extended_page_xml_visitor::visit(const document<L>& doc) const + { + // Preambule + print_PAGE_preambule(output, doc, false); + + // Text + if (doc.has_text()) + doc.paragraphs().accept(*this); + + + // Page elements (Pictures, ...) + if (doc.has_elements()) + doc.elements().accept(*this); + + // Whitespace seraparators + if (doc.has_whitespace_seps()) + doc.whitespace_seps_comps().accept(*this); + + output << " </page>" << std::endl; + output << "</pcGts>" << std::endl; + + } + + /// Component Set + // + template <typename L> + void + extended_page_xml_visitor::visit(const component_set<L>& comp_set) const + { + for_all_comps(c, comp_set) + if (comp_set(c).is_valid()) + comp_set(c).accept(*this); + } + + + /// Component_info + // + inline + void + extended_page_xml_visitor::visit(const component_info& info) const + { + switch (info.type()) + { + case component::WhitespaceSeparator: + { + output << " <whitespace_separator_region id=\"wss" + << info.id() + << "\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </whitespace_separator_region>" << std::endl; + break; + } + + case component::LineSeparator: + { + output << " <separator_region id=\"sr" << info.id() + << "\" sep_orientation=\"0.000000\" " + << " sep_colour=\"Black\" " + << " sep_bgcolour=\"White\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </separator_region>" << std::endl; + break; + } + + + default: + case component::Image: + { + output << " <image_region id=\"ir" << info.id() + << "\" img_colour_type=\"24_Bit_Colour\"" + << " img_orientation=\"0.000000\" " + << " img_emb_text=\"No\" " + << " img_bgcolour=\"White\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </image_region>" << 
std::endl; + break; + } + } + } + + + /// Paragraph Set + // + template <typename L> + void + extended_page_xml_visitor::visit(const paragraph_set<L>& parset) const + { + const line_set<L>& lines = parset.lines(); + + for_all_paragraphs(p, parset) + { + const mln::util::array<line_id_t>& line_ids = parset(p).line_ids(); + + // FIXME: compute that information on the whole paragraph + // and use them here. + line_id_t fid = line_ids(0); + output << " <text_region id=\"" << p + << "\" txt_orientation=\"" << lines(fid).orientation() + << "\" txt_reading_orientation=\"" << lines(fid).reading_orientation() + << "\" txt_reading_direction=\"" << lines(fid).reading_direction() + << "\" txt_text_type=\"" << lines(fid).type() + << "\" txt_reverse_video=\"" << (lines(fid).reverse_video() ? "true" : "false") + << "\" txt_indented=\"" << (lines(fid).indented() ? "true" : "false") + << "\" kerning=\"" << lines(fid).char_space(); + + // EXTENSIONS - Not officially supported + output << "\" baseline=\"" << lines(fid).baseline() + << "\" meanline=\"" << lines(fid).meanline() + << "\" x_height=\"" << lines(fid).x_height() + << "\" d_height=\"" << lines(fid).d_height() + << "\" a_height=\"" << lines(fid).a_height() + << "\" char_width=\"" << lines(fid).char_width(); + // End of EXTENSIONS + output << "\">" + << std::endl; + + internal::print_box_coords(output, parset(p).bbox(), " "); + + // EXTENSIONS - Not officially supported + for_all_paragraph_lines(lid, line_ids) + { + line_id_t l = line_ids(lid); + lines(l).accept(*this); + } + // End of EXTENSIONS + + output << " </text_region>" << std::endl; + } + } + + + template <typename L> + void + extended_page_xml_visitor::visit(const line_info<L>& line) const + { + if (line.has_text()) + { + output << " <line text=\"" << line.html_text() << "\" "; + } + else + output << " <line " << std::endl; + + output << "id=\"" << line.id() + << "\" txt_orientation=\"" << line.orientation() + << "\" txt_reading_orientation=\"" << 
line.reading_orientation() + << "\" txt_reading_direction=\"" << line.reading_direction() + << "\" txt_text_type=\"" << line.type() + << "\" txt_reverse_video=\"" << (line.reverse_video() ? "true" : "false") + << "\" txt_indented=\"" << (line.indented() ? "true" : "false") + << "\" kerning=\"" << line.char_space() + << "\" baseline=\"" << line.baseline() + << "\" meanline=\"" << line.meanline() + << "\" x_height=\"" << line.x_height() + << "\" d_height=\"" << line.d_height() + << "\" a_height=\"" << line.a_height() + << "\" char_width=\"" << line.char_width() + << "\">" << std::endl; + + internal::print_box_coords(output, line.bbox(), " "); + + output << " </line>" << std::endl; + } + +#endif // MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml::internal + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + +#endif // SCRIBO_IO_XML_INTERNAL_EXTENDED_PAGE_XML_VISITOR_HH diff --git a/scribo/scribo/io/xml/internal/full_xml_visitor.hh b/scribo/scribo/io/xml/internal/full_xml_visitor.hh new file mode 100644 index 0000000..9c5bd1d --- /dev/null +++ b/scribo/scribo/io/xml/internal/full_xml_visitor.hh @@ -0,0 +1,456 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_IO_XML_INTERNAL_FULL_XML_VISITOR_HH +# define SCRIBO_IO_XML_INTERNAL_FULL_XML_VISITOR_HH + +/// \file +/// +/// XML serializer Visitor. + +# include <fstream> +# include <scribo/core/internal/doc_xml_serializer.hh> +# include <scribo/core/document.hh> +# include <scribo/core/component_set.hh> +# include <scribo/core/paragraph_set.hh> +# include <scribo/core/object_groups.hh> +# include <scribo/core/object_links.hh> +# include <scribo/core/line_links.hh> +# include <scribo/core/line_info.hh> + +# include <scribo/convert/to_base64.hh> + +# include <scribo/io/xml/internal/print_box_coords.hh> +# include <scribo/io/xml/internal/print_page_preambule.hh> + + +namespace scribo +{ + + namespace io + { + + namespace xml + { + + namespace internal + { + + + class full_xml_visitor : public doc_xml_serializer<full_xml_visitor> + { + public: + // Constructor + full_xml_visitor(std::ofstream& out); + + // Visit overloads + template <typename L> + void visit(const document<L>& doc) const; + + template <typename L> + void visit(const line_links<L>& llinks) const; + + template <typename L> + void visit(const object_groups<L>& groups) const; + + template <typename L> + void visit(const object_links<L>& links) const; + + template <typename L> + void visit(const component_set<L>& comp_set) const; + + void visit(const component_info& info) const; + + template <typename L> + void visit(const paragraph_set<L>& 
parset) const; + + template <typename L> + void visit(const line_info<L>& line) const; + + private: // Attributes + std::ofstream& output; + }; + + + +# ifndef MLN_INCLUDE_ONLY + + + inline + full_xml_visitor::full_xml_visitor(std::ofstream& out) + : output(out) + { + } + + + + /// Document + // + template <typename L> + void + full_xml_visitor::visit(const document<L>& doc) const + { + print_PAGE_preambule(output, doc, false); + + // Text + if (doc.has_text()) + { + const line_set<L>& lines = doc.lines(); + + // Save component/link/group information (Extension) + { + // Component set + lines.components().accept(*this); + + // Object link + lines.links().accept(*this); + + // Object group + lines.groups().accept(*this); + } + // End of EXTENSIONS + + const paragraph_set<L>& parset = doc.paragraphs(); + + // Save paragraphs related information (Extension) + { + // General text information + output << " <text_data nlines=\"" << lines.nelements() << "\" " + << " nparagraphs=\"" << parset.nelements() << "\" />" << std::endl; + + // line_links + parset.links().accept(*this); + } + + // Paragraph and lines + parset.accept(*this); + } + + + // Page elements (Pictures, ...) 
+ if (doc.has_elements()) + { + const component_set<L>& elts = doc.elements(); + for_all_comps(e, elts) + if (elts(e).is_valid()) + elts(e).accept(*this); + } + + + // line seraparators + if (doc.has_line_seps()) + { + const component_set<L>& + line_seps_comps = doc.line_seps_comps(); + + for_all_comps(c, line_seps_comps) + line_seps_comps(c).accept(*this); + } + + + // Whitespace seraparators + if (doc.has_whitespace_seps()) + { + const component_set<L>& + whitespace_seps_comps = doc.whitespace_seps_comps(); + + for_all_comps(c, whitespace_seps_comps) + whitespace_seps_comps(c).accept(*this); + } + + output << " </page>" << std::endl; + output << "</pcGts>" << std::endl; + + } + + + /// Line Links + // + template <typename L> + void + full_xml_visitor::visit(const line_links<L>& llinks) const + { + output << " <line_links>" << std::endl; + for_all_links(l, llinks) + { + output << " <line_link" + << " from=\"" << l + << "\" to=\"" << llinks(l) + << "\"/>" << std::endl; + } + output << " </line_links>" << std::endl; + } + + + /// Object Groups + // + template <typename L> + void + full_xml_visitor::visit(const object_groups<L>& groups) const + { + output << " <object_groups>" << std::endl; + for_all_groups(g, groups) + { + output << " <group " + << " object_id=\"" << g + << "\" group_id=\"" << groups(g) + << "\"/>" << std::endl; + } + output << " </object_groups>" << std::endl; + } + + + /// Object Links + // + template <typename L> + void + full_xml_visitor::visit(const object_links<L>& links) const + { + output << " <object_links>" << std::endl; + for_all_links(l, links) + { + output << " <link" + << " from=\"" << l + << "\" to=\"" << links(l) + << "\"/>" << std::endl; + } + output << " </object_links>" << std::endl; + } + + + /// Component Set + // + template <typename L> + void + full_xml_visitor::visit(const component_set<L>& comp_set) const + { + output << " <component_set nelements=\"" << comp_set.nelements() + << "\">" << std::endl; + for_all_comps(c, 
comp_set) + { + output << " <component_info" + << " id=\"" << comp_set(c).id() + << "\" mass_center_x=\"" << comp_set(c).mass_center().col() + << "\" mass_center_y=\"" << comp_set(c).mass_center().row() + << "\" card=\"" << comp_set(c).card() + << "\" tag=\"" << comp_set(c).tag() + << "\" type=\"" << comp_set(c).type() + << "\" pmin_x=\"" << comp_set(c).bbox().pmin().col() + << "\" pmin_y=\"" << comp_set(c).bbox().pmin().row() + << "\" pmax_x=\"" << comp_set(c).bbox().pmax().col() + << "\" pmax_y=\"" << comp_set(c).bbox().pmax().row() + << "\"/>" << std::endl; + } + + + // Save labeled image + { + const L& lbl = comp_set.labeled_image(); + output << "<labeled_image " + << " height=\"" << lbl.domain().height() + << "\" width=\"" << lbl.domain().width() << "\">" + << "<![CDATA["; + + util::array<unsigned char> lbl64; + convert::to_base64(lbl, lbl64); + output.write((const char *)lbl64.std_vector().data(), + lbl64.nelements()); + + output << "]]></labeled_image>" << std::endl; + } + + // Save separators image + { + const mln_ch_value(L,bool)& seps = comp_set.separators(); + output << "<separators_image " + << " height=\"" << seps.domain().height() + << "\" width=\"" << seps.domain().width() << "\">" + << "<![CDATA["; + + util::array<unsigned char> seps64; + convert::to_base64(seps, seps64); + output.write((const char *)seps64.std_vector().data(), + seps64.nelements()); + + output << "]]></separators_image>" << std::endl; + } + + output << "</component_set>" << std::endl; + } + + + /// Component_info + // + inline + void + full_xml_visitor::visit(const component_info& info) const + { + switch (info.type()) + { + case component::WhitespaceSeparator: + { + output << " <whitespace_separator_region id=\"wss" + << info.id() + << "\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </whitespace_separator_region>" << std::endl; + break; + } + + case component::LineSeparator: + { + output << " <separator_region id=\"sr" << info.id() + 
<< "\" sep_orientation=\"0.000000\" " + << " sep_colour=\"Black\" " + << " sep_bgcolour=\"White\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </separator_region>" << std::endl; + break; + } + + + default: + case component::Image: + { + output << " <image_region id=\"ir" << info.id() + << "\" img_colour_type=\"24_Bit_Colour\"" + << " img_orientation=\"0.000000\" " + << " img_emb_text=\"No\" " + << " img_bgcolour=\"White\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </image_region>" << std::endl; + break; + } + } + } + + /// Paragraph Set + // + template <typename L> + void + full_xml_visitor::visit(const paragraph_set<L>& parset) const + { + const line_set<L>& lines = parset.lines(); + + for_all_paragraphs(p, parset) + { + const mln::util::array<line_id_t>& line_ids = parset(p).line_ids(); + + // FIXME: compute that information on the whole paragraph + // and use them here. + line_id_t fid = line_ids(0); + output << " <text_region id=\"" << p + << "\" txt_orientation=\"" << lines(fid).orientation() + << "\" txt_reading_orientation=\"" << lines(fid).reading_orientation() + << "\" txt_reading_direction=\"" << lines(fid).reading_direction() + << "\" txt_text_type=\"" << lines(fid).type() + << "\" txt_reverse_video=\"" << (lines(fid).reverse_video() ? "true" : "false") + << "\" txt_indented=\"" << (lines(fid).indented() ? 
"true" : "false") + << "\" kerning=\"" << lines(fid).char_space(); + + // EXTENSIONS - Not officially supported + output << "\" baseline=\"" << lines(fid).baseline() + << "\" meanline=\"" << lines(fid).meanline() + << "\" x_height=\"" << lines(fid).x_height() + << "\" d_height=\"" << lines(fid).d_height() + << "\" a_height=\"" << lines(fid).a_height() + << "\" char_width=\"" << lines(fid).char_width(); + // End of EXTENSIONS + output << "\">" + << std::endl; + + internal::print_box_coords(output, parset(p).bbox(), " "); + + + // EXTENSIONS - Not officially supported + for_all_paragraph_lines(lid, line_ids) + { + line_id_t l = line_ids(lid); + + lines(l).accept(*this); + } + + output << " </text_region>" << std::endl; + } + } + + + template <typename L> + void + full_xml_visitor::visit(const line_info<L>& line) const + { + if (line.has_text()) + { + output << " <line text=\"" << line.html_text() << "\" "; + } + else + output << " <line " << std::endl; + + output << "id=\"" << line.id() + << "\" txt_orientation=\"" << line.orientation() + << "\" txt_reading_orientation=\"" << line.reading_orientation() + << "\" txt_reading_direction=\"" << line.reading_direction() + << "\" txt_text_type=\"" << line.type() + << "\" txt_reverse_video=\"" << (line.reverse_video() ? "true" : "false") + << "\" txt_indented=\"" << (line.indented() ? 
"true" : "false") + << "\" kerning=\"" << line.char_space() + << "\" baseline=\"" << line.baseline() + << "\" meanline=\"" << line.meanline() + << "\" x_height=\"" << line.x_height() + << "\" d_height=\"" << line.d_height() + << "\" a_height=\"" << line.a_height() + << "\" char_width=\"" << line.char_width() + << "\">" << std::endl; + + internal::print_box_coords(output, line.bbox(), " "); + + output << " <compid_list>" << std::endl; + + for_all_line_comps(c, line.components()) + output << " <compid value=\"" + << line.components()(c) << "\" />" << std::endl; + + output << " </compid_list>" << std::endl; + + output << " </line>" << std::endl; + } + +#endif // MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml::internal + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + +#endif // SCRIBO_IO_XML_INTERNAL_FULL_XML_VISITOR_HH diff --git a/scribo/scribo/io/xml/internal/html_markups_replace.hh b/scribo/scribo/io/xml/internal/html_markups_replace.hh new file mode 100644 index 0000000..76f8107 --- /dev/null +++ b/scribo/scribo/io/xml/internal/html_markups_replace.hh @@ -0,0 +1,97 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_IO_XML_INTERNAL_HTML_MARKUPS_REPLACE_HH +# define SCRIBO_IO_XML_INTERNAL_HTML_MARKUPS_REPLACE_HH + +/// \file +/// +/// \brief Replace HTML markups characters by their corresponding +/// markups. + + +namespace scribo +{ + + namespace io + { + + namespace xml + { + + namespace internal + { + + /*! \brief Replace HTML markups characters by their corresponding + markups. + */ + inline + std::string + html_markups_replace(std::string& input); + + +# ifndef MLN_INCLUDE_ONLY + + static inline std::map<char, std::string> init_map() + { + std::map<char, std::string> html_map; + html_map['\"'] = "&quot;"; + html_map['<'] = "&lt;"; + html_map['>'] = "&gt;"; + html_map['&'] = "&amp;"; + return html_map; + } + + + inline + std::string + html_markups_replace(const std::string& input) + { + static std::map<char, std::string> map = init_map(); + + std::string output = input; + for (unsigned i = 0; i < input.size(); ++i) + { + std::map<char, std::string>::iterator it = map.find(output.at(i)); + if (it != map.end()) + { + output.replace(i, 1, it->second); + i += it->second.size() - 1; + } + } + return output; + } + +# endif // ! MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml::internal + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + +#endif // ! 
SCRIBO_IO_XML_INTERNAL_HTML_MARKUPS_REPLACE_HH diff --git a/scribo/scribo/io/xml/internal/page_xml_visitor.hh b/scribo/scribo/io/xml/internal/page_xml_visitor.hh new file mode 100644 index 0000000..52d8f12 --- /dev/null +++ b/scribo/scribo/io/xml/internal/page_xml_visitor.hh @@ -0,0 +1,222 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_IO_XML_INTERNAL_PAGE_XML_VISITOR_HH +# define SCRIBO_IO_XML_INTERNAL_PAGE_XML_VISITOR_HH + +/// \file +/// +/// PAGE format XML serializer Visitor. + +# include <fstream> + +# include <scribo/core/internal/doc_xml_serializer.hh> +# include <scribo/convert/to_base64.hh> + +# include <scribo/io/xml/internal/print_box_coords.hh> +# include <scribo/io/xml/internal/print_page_preambule.hh> + + +namespace scribo +{ + + namespace io + { + + namespace xml + { + + namespace internal + { + + /*! \brief Save document information as XML. + + We use a XML Schema part of the PAGE (Page Analysis and Ground + truth Elements) image representation framework. + + This schema was used in the Page Segmentation COMPetition + (PSCOMP) for ICDAR 2009. + + Its XSD file is located here: +
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16/pagecontent…
+ + */ + class page_xml_visitor : public doc_xml_serializer<page_xml_visitor> + { + public: + // Constructor + page_xml_visitor(std::ofstream& out); + + // Visit overloads + template <typename L> + void visit(const document<L>& doc) const; + + template <typename L> + void visit(const component_set<L>& comp_set) const; + + void visit(const component_info& info) const; + + template <typename L> + void visit(const paragraph_set<L>& parset) const; + + private: // Attributes + std::ofstream& output; + }; + + + +# ifndef MLN_INCLUDE_ONLY + + + inline + page_xml_visitor::page_xml_visitor(std::ofstream& out) + : output(out) + { + } + + + + /// Document + // + template <typename L> + void + page_xml_visitor::visit(const document<L>& doc) const + { + // Preambule + print_PAGE_preambule(output, doc, true); + + // Text + if (doc.has_text()) + doc.paragraphs().accept(*this); + + // Page elements (Pictures, ...) + if (doc.has_elements()) + doc.elements().accept(*this); + + // line seraparators + if (doc.has_line_seps()) + doc.line_seps_comps().accept(*this); + + output << " </page>" << std::endl; + output << "</pcGts>" << std::endl; + } + + + /// Component Set + // + template <typename L> + void + page_xml_visitor::visit(const component_set<L>& comp_set) const + { + for_all_comps(c, comp_set) + if (comp_set(c).is_valid()) + comp_set(c).accept(*this); + } + + + /// Component_info + // + inline + void + page_xml_visitor::visit(const component_info& info) const + { + switch (info.type()) + { + case component::LineSeparator: + { + output << " <separator_region id=\"sr" << info.id() + << "\" sep_orientation=\"0.000000\" " + << " sep_colour=\"Black\" " + << " sep_bgcolour=\"White\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </separator_region>" << std::endl; + break; + } + + + default: + case component::Image: + { + output << " <image_region id=\"ir" << info.id() + << "\" img_colour_type=\"24_Bit_Colour\"" + << " 
img_orientation=\"0.000000\" " + << " img_emb_text=\"No\" " + << " img_bgcolour=\"White\">" << std::endl; + + internal::print_box_coords(output, info.bbox(), " "); + + output << " </image_region>" << std::endl; + break; + } + } + } + + + /// Paragraph Set + // + template <typename L> + void + page_xml_visitor::visit(const paragraph_set<L>& parset) const + { + const line_set<L>& lines = parset.lines(); + + for_all_paragraphs(p, parset) + { + const mln::util::array<line_id_t>& line_ids = parset(p).line_ids(); + + // FIXME: compute that information on the whole paragraph + // and use them here. + line_id_t fid = line_ids(0); + output << " <text_region id=\"" << p + << "\" txt_orientation=\"" << lines(fid).orientation() + << "\" txt_reading_orientation=\"" << lines(fid).reading_orientation() + << "\" txt_reading_direction=\"" << lines(fid).reading_direction() + << "\" txt_text_type=\"" << lines(fid).type() + << "\" txt_reverse_video=\"" << (lines(fid).reverse_video() ? "true" : "false") + << "\" txt_indented=\"" << (lines(fid).indented() ? "true" : "false") + << "\" kerning=\"" << lines(fid).char_space() + << "\">" + << std::endl; + + internal::print_box_coords(output, parset(p).bbox(), " "); + + output << " </text_region>" << std::endl; + } + } + + +#endif // MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml::internal + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + +#endif // SCRIBO_IO_XML_INTERNAL_PAGE_XML_VISITOR_HH diff --git a/scribo/scribo/io/xml/internal/print_box_coords.hh b/scribo/scribo/io/xml/internal/print_box_coords.hh new file mode 100644 index 0000000..d3aeedf --- /dev/null +++ b/scribo/scribo/io/xml/internal/print_box_coords.hh @@ -0,0 +1,92 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. 
+// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_IO_XML_INTERNAL_PRINT_BOX_COORDS_HH +# define SCRIBO_IO_XML_INTERNAL_PRINT_BOX_COORDS_HH + +/// \file +/// +/// \brief Prints box2d coordinates to XML data. + +# include <mln/core/alias/box2d.hh> + +namespace scribo +{ + + namespace io + { + + namespace xml + { + + namespace internal + { + + /*! \brief Prints box2d coordinates to XML data. + */ + void + print_box_coords(std::ofstream& ostr, const box2d& b, + const char *space); + + +# ifndef MLN_INCLUDE_ONLY + + + inline + void + print_box_coords(std::ofstream& ostr, const box2d& b, + const char *space) + { + std::string sc = space; + std::string sp = sc + " "; + ostr << sc << "<coords>" << std::endl + << sp << "<point x=\"" << b.pmin().col() + << "\" y=\"" << b.pmin().row() << "\"/>" + << std::endl + << sp << "<point x=\"" << b.pmax().col() + << "\" y=\"" << b.pmin().row() << "\"/>" + << std::endl + << sp << "<point x=\"" << b.pmax().col() + << "\" y=\"" << b.pmax().row() << "\"/>" + << std::endl + << sp << "<point x=\"" << b.pmin().col() + << "\" y=\"" << b.pmax().row() << "\"/>" + << std::endl + << sc << "</coords>" << std::endl; + + } + + +# endif // ! MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml::internal + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + +#endif // ! 
SCRIBO_IO_XML_INTERNAL_PRINT_BOX_COORDS_HH diff --git a/scribo/scribo/io/xml/internal/print_page_preambule.hh b/scribo/scribo/io/xml/internal/print_page_preambule.hh new file mode 100644 index 0000000..b5ae891 --- /dev/null +++ b/scribo/scribo/io/xml/internal/print_page_preambule.hh @@ -0,0 +1,95 @@ +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) +// +// This file is part of Olena. +// +// Olena is free software: you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation, version 2 of the License. +// +// Olena is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +// General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Olena. If not, see <
http://www.gnu.org/licenses/
>. +// +// As a special exception, you may use this file as part of a free +// software project without restriction. Specifically, if other files +// instantiate templates or use macros or inline functions from this +// file, or you compile this file and link it with other files to produce +// an executable, this file does not by itself cause the resulting +// executable to be covered by the GNU General Public License. This +// exception does not however invalidate any other reasons why the +// executable file might be covered by the GNU General Public License. + +#ifndef SCRIBO_IO_XML_INTERNAL_PRINT_PAGE_PREAMBULE_HH +# define SCRIBO_IO_XML_INTERNAL_PRINT_PAGE_PREAMBULE_HH + +/// \file +/// +/// \brief Print PAGE XML format preambule. + +# include <mln/core/alias/box2d.hh> + +namespace scribo +{ + + namespace io + { + + namespace xml + { + + namespace internal + { + + /// \brief Print PAGE XML format preambule. + template <typename L> + void print_PAGE_preambule(std::ofstream& output, + const document<L>& doc, + bool with_validation); + + +# ifndef MLN_INCLUDE_ONLY + + template <typename L> + void print_PAGE_preambule(std::ofstream& output, + const document<L>& doc, + bool with_validation) + { + output << "<?xml version=\"1.0\"?>" << std::endl; + + if (with_validation) + output << "<pcGts xmlns=\"
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16\
" " + << "xmlns:xsi=\"
http://www.w3.org/2001/XMLSchema-instance\
" " + << "xsi:schemaLocation=\"
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16
" + << "
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16/pagecontent…
" " + << "pcGtsId=\"" << doc.filename() << "\">" << std::endl; + else + output << "<pcGts>" << std::endl; + + output << " <pcMetadata>" << std::endl; + output << " <pcCreator>LRDE</pcCreator>" << std::endl; + output << " <pcCreated/>" << std::endl; + output << " <pcLastChange/>" << std::endl; + output << " <pcComments>Generated by Scribo from Olena.</pcComments>" << std::endl; + output << " </pcMetadata>" << std::endl; + + output << " <page image_filename=\"" << doc.filename() + << "\" image_width=\"" << doc.width() + << "\" image_height=\"" << doc.height() + << "\">" << std::endl; + } + +# endif // ! MLN_INCLUDE_ONLY + + } // end of namespace scribo::io::xml::internal + + } // end of namespace scribo::io::xml + + } // end of namespace scribo::io + +} // end of namespace scribo + +#endif // ! SCRIBO_IO_XML_INTERNAL_PRINT_PAGE_PREAMBULE_HH diff --git a/scribo/scribo/io/xml/save.hh b/scribo/scribo/io/xml/save.hh index 7011e87..30579d0 100644 --- a/scribo/scribo/io/xml/save.hh +++ b/scribo/scribo/io/xml/save.hh @@ -38,7 +38,11 @@ # include <map> # include <scribo/core/document.hh> -# include <scribo/core/line_set.hh> + +# include <scribo/io/xml/internal/full_xml_visitor.hh> +# include <scribo/io/xml/internal/extended_page_xml_visitor.hh> +# include <scribo/io/xml/internal/page_xml_visitor.hh> + namespace scribo { @@ -49,360 +53,104 @@ namespace scribo namespace xml { - /*! \brief Save document information as XML. + /*! \brief Supported XML formats + + Page : PRima PAGE format. Used in ICDAR 2009. - We use a XML Schema part of the PAGE (Page Analysis and Ground - truth Elements) image representation framework. + PageExtended : Enriched PRima PAGE format with scribo data. + + Full : Enriched PRima PAGE format with scribo data. This + format can be reloaded in Scribo. + */ + enum Format + { + Page, + PageExtended, + Full + //Hocr + }; - This schema was used in the Page Segmentation COMPetition - (PSCOMP) for ICDAR 2009. - Its XSD file is located here: -
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16/pagecontent…
+ /*! \brief Save document information as XML. */ template <typename L> void - save(const document<L>& doc, - const std::string& output_name, - bool allow_extensions); + save(const document<L>& doc, const std::string& output_name, + Format format); # ifndef MLN_INCLUDE_ONLY + namespace internal { - inline - std::string& - html_markups_replace(std::string& input, - std::map<char, std::string>& map) - { - for (unsigned i = 0; i < input.size(); ++i) - { - std::map<char, std::string>::iterator it = map.find(input.at(i)); - if (it != map.end()) - { - input.replace(i, 1, it->second); - i += it->second.size() - 1; - } - } - return input; - } - - - inline - void print_box_coords(std::ofstream& ostr, const box2d& b, - const char *space) + template <typename L> + void save_page(const document<L>& doc, std::ofstream& output) { - std::string sc = space; - std::string sp = sc + " "; - ostr << sc << "<coords>" << std::endl - << sp << "<point x=\"" << b.pmin().col() - << "\" y=\"" << b.pmin().row() << "\"/>" - << std::endl - << sp << "<point x=\"" << b.pmax().col() - << "\" y=\"" << b.pmin().row() << "\"/>" - << std::endl - << sp << "<point x=\"" << b.pmax().col() - << "\" y=\"" << b.pmax().row() << "\"/>" - << std::endl - << sp << "<point x=\"" << b.pmin().col() - << "\" y=\"" << b.pmax().row() << "\"/>" - << std::endl - << sc << "</coords>" << std::endl; - + scribo::io::xml::internal::page_xml_visitor f(output); + doc.accept(f); } - - template <typename L> - void - save(const document<L>& doc, - const std::string& output_name) + void save_page_extended(const document<L>& doc, std::ofstream& output) { - trace::entering("scribo::io::xml:save_text_lines"); - - std::ofstream file(output_name.c_str()); - if (! 
file) - { - std::cerr << "error: cannot open file '" << doc.filename() << "'!"; - abort(); - } - - std::map<char, std::string> html_map; - html_map['\"'] = """; - html_map['<'] = "<"; - html_map['>'] = ">"; - html_map['&'] = "&"; - - file << "<?xml version=\"1.0\"?>" << std::endl; - file << "<pcGts xmlns=\"
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16\
" xmlns:xsi=\"
http://www.w3.org/2001/XMLSchema-instance\
" xsi:schemaLocation=\"
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16
http://schema.primaresearch.org/PAGE/gts/pagecontent/2009-03-16/pagecontent…
" pcGtsId=\"" << doc.filename() << "\">" << std::endl; - - file << " <pcMetadata>" << std::endl; - file << " <pcCreator>LRDE</pcCreator>" << std::endl; - file << " <pcCreated/>" << std::endl; - file << " <pcLastChange/>" << std::endl; - file << " <pcComments>Generated by Scribo from Olena.</pcComments>" << std::endl; - file << " </pcMetadata>" << std::endl; - - file << " <page image_filename=\"" << doc.filename() - << "\" image_width=\"" << doc.width() - << "\" image_height=\"" << doc.height() - << "\">" << std::endl; - - // Text - if (doc.has_text()) - { - const line_set<L>& lines = doc.lines(); - const paragraph_set<L>& parset = doc.paragraphs(); - - for_all_paragraphs(p, parset) - { - const mln::util::array<line_id_t>& line_ids = parset(p).line_ids(); - - // FIXME: compute that information on the whole paragraph - // and use them here. - line_id_t fid = line_ids(0); - file << " <text_region id=\"" << p - << "\" txt_orientation=\"" << lines(fid).orientation() - << "\" txt_reading_orientation=\"" << lines(fid).reading_orientation() - << "\" txt_reading_direction=\"" << lines(fid).reading_direction() - << "\" txt_text_type=\"" << lines(fid).type() - << "\" txt_reverse_video=\"" << (lines(fid).reverse_video() ? "true" : "false") - << "\" txt_indented=\"" << (lines(fid).indented() ? "true" : "false") - << "\" kerning=\"" << lines(fid).char_space() - << "\">" - << std::endl; - - internal::print_box_coords(file, parset(p).bbox(), " "); - - file << " </text_region>" << std::endl; - } - } - - // Page elements (Pictures, ...) 
- if (doc.has_elements()) - { - const component_set<L>& elts = doc.elements(); - for_all_comps(e, elts) - if (elts(e).is_valid()) - { - file << " <image_region id=\"ir" << elts(e).id() - << "\" img_colour_type=\"24_Bit_Colour\"" - << " img_orientation=\"0.000000\" " - << " img_emb_text=\"No\" " - << " img_bgcolour=\"White\">" << std::endl; - - internal::print_box_coords(file, elts(e).bbox(), " "); - - file << " </image_region>" << std::endl; - } - } - - - file << " </page>" << std::endl; - file << "</pcGts>" << std::endl; - - trace::exiting("scribo::io::xml::save_text_lines"); + scribo::io::xml::internal::extended_page_xml_visitor f(output); + doc.accept(f); } - - - template <typename L> - void - save_extended(const document<L>& doc, - const std::string& output_name) + void save_full(const document<L>& doc, std::ofstream& output) { - trace::entering("scribo::io::xml:save_text_lines"); - - std::ofstream file(output_name.c_str()); - if (! file) - { - std::cerr << "error: cannot open file '" << doc.filename() << "'!"; - abort(); - } - - std::map<char, std::string> html_map; - html_map['\"'] = """; - html_map['<'] = "<"; - html_map['>'] = ">"; - html_map['&'] = "&"; - - file << "<?xml version=\"1.0\"?>" << std::endl; - file << "<pcGts>" << std::endl; - - file << " <pcMetadata>" << std::endl; - file << " <pcCreator>LRDE</pcCreator>" << std::endl; - file << " <pcCreated/>" << std::endl; - file << " <pcLastChange/>" << std::endl; - file << " <pcComments>Generated by Scribo from Olena.</pcComments>" << std::endl; - file << " </pcMetadata>" << std::endl; - - file << " <page image_filename=\"" << doc.filename() - << "\" image_width=\"" << doc.width() - << "\" image_height=\"" << doc.height() - << "\">" << std::endl; - - // Text - if (doc.has_text()) - { - const line_set<L>& lines = doc.lines(); - const paragraph_set<L>& parset = doc.paragraphs(); - - for_all_paragraphs(p, parset) - { - const mln::util::array<line_id_t>& line_ids = parset(p).line_ids(); - - // FIXME: compute 
that information on the whole paragraph - // and use them here. - line_id_t fid = line_ids(0); - file << " <text_region id=\"" << p - << "\" txt_orientation=\"" << lines(fid).orientation() - << "\" txt_reading_orientation=\"" << lines(fid).reading_orientation() - << "\" txt_reading_direction=\"" << lines(fid).reading_direction() - << "\" txt_text_type=\"" << lines(fid).type() - << "\" txt_reverse_video=\"" << (lines(fid).reverse_video() ? "true" : "false") - << "\" txt_indented=\"" << (lines(fid).indented() ? "true" : "false") - << "\" kerning=\"" << lines(fid).char_space(); - - // EXTENSIONS - Not officially supported - file << "\" baseline=\"" << lines(fid).baseline() - << "\" meanline=\"" << lines(fid).meanline() - << "\" x_height=\"" << lines(fid).x_height() - << "\" d_height=\"" << lines(fid).d_height() - << "\" a_height=\"" << lines(fid).a_height() - << "\" char_width=\"" << lines(fid).char_width(); - // End of EXTENSIONS - file << "\">" - << std::endl; - - internal::print_box_coords(file, parset(p).bbox(), " "); - - - // EXTENSIONS - Not officially supported - for_all_paragraph_lines(lid, line_ids) - { - line_id_t l = line_ids(lid); - - if (lines(l).has_text()) - { - std::string tmp = lines(l).text(); - tmp = internal::html_markups_replace(tmp, html_map); - - file << " <line text=\"" << tmp << "\" "; - } - else - file << " <line " << std::endl; - - file << "id=\"" << lines(l).id() - << "\" txt_orientation=\"" << lines(l).orientation() - << "\" txt_reading_orientation=\"" << lines(l).reading_orientation() - << "\" txt_reading_direction=\"" << lines(l).reading_direction() - << "\" txt_text_type=\"" << lines(l).type() - << "\" txt_reverse_video=\"" << (lines(l).reverse_video() ? "true" : "false") - << "\" txt_indented=\"" << (lines(l).indented() ? 
"true" : "false") - << "\" kerning=\"" << lines(l).char_space() - << "\" baseline=\"" << lines(l).baseline() - << "\" meanline=\"" << lines(l).meanline() - << "\" x_height=\"" << lines(l).x_height() - << "\" d_height=\"" << lines(l).d_height() - << "\" a_height=\"" << lines(l).a_height() - << "\" char_width=\"" << lines(l).char_width() - << "\">" << std::endl; - - internal::print_box_coords(file, lines(l).bbox(), " "); - - file << " </line>" << std::endl; - } - - file << " </text_region>" << std::endl; - } - } - // End of EXTENSIONS - - // Page elements (Pictures, ...) - if (doc.has_elements()) - { - const component_set<L>& elts = doc.elements(); - for_all_comps(e, elts) - if (elts(e).is_valid()) - { - switch (elts(e).type()) - { - case component::Separator: - { - file << " <separator_region id=\"sr" << elts(e).id() - << "\" sep_orientation=\"0.000000\" " - << " sep_colour=\"Black\" " - << " sep_bgcolour=\"White\">" << std::endl; - - internal::print_box_coords(file, elts(e).bbox(), " "); - - file << " </separator_region>" << std::endl; - break; - break; - } - - default: - case component::Image: - { - file << " <image_region id=\"ir" << elts(e).id() - << "\" img_colour_type=\"24_Bit_Colour\"" - << " img_orientation=\"0.000000\" " - << " img_emb_text=\"No\" " - << " img_bgcolour=\"White\">" << std::endl; - - internal::print_box_coords(file, elts(e).bbox(), " "); - - file << " </image_region>" << std::endl; - break; - } - } - } - } - - - // Whitespace seraparators - if (doc.has_whitespace_seps()) - { - const component_set<L>& - whitespace_seps_comps = doc.whitespace_seps_comps(); - - for_all_comps(c, whitespace_seps_comps) - { - file << " <whitespace_separator_region id=\"wss" - << whitespace_seps_comps(c).id() - << "\">" << std::endl; - - internal::print_box_coords(file, whitespace_seps_comps(c).bbox(), " "); - - file << " </whitespace_separator_region>" << std::endl; - } - } - - file << " </page>" << std::endl; - file << "</pcGts>" << std::endl; - - 
trace::exiting("scribo::io::xml::save_text_lines"); + scribo::io::xml::internal::full_xml_visitor f(output); + doc.accept(f); } } // end of namespace scribo::io::xml::internal + // FACADE template <typename L> void save(const document<L>& doc, const std::string& output_name, - bool allow_extensions) + Format format) { - if (allow_extensions) - internal::save_extended(doc, output_name); - else - internal::save(doc, output_name); + trace::entering("scribo::io::xml::save"); + + // Open file + std::ofstream output(output_name.c_str()); + if (! output) + { + std::cerr << "scribo::io::xml::save - ERROR: cannot open file '" + << doc.filename() << "'!"; + return; + } + + // Choose saving method. + switch (format) + { + case Page: + internal::save_page(doc, output); + break; + + case PageExtended: + internal::save_page_extended(doc, output); + break; + + case Full: + internal::save_full(doc, output); + break; + + default: + trace::warning("scribo::io::xml::save - Invalid XML format! Skip saving..."); + } + + output.close(); + trace::exiting("scribo::io::xml::save"); } diff --git a/scribo/scribo/toolchain/internal/content_in_doc_functor.hh b/scribo/scribo/toolchain/internal/content_in_doc_functor.hh index 48098ba..dcbb4f7 100644 --- a/scribo/scribo/toolchain/internal/content_in_doc_functor.hh +++ b/scribo/scribo/toolchain/internal/content_in_doc_functor.hh @@ -36,7 +36,6 @@ # include <scribo/primitive/extract/non_text.hh> # include <scribo/primitive/extract/components.hh> -//# include <scribo/primitive/extract/vertical_separators.hh> # include <scribo/primitive/extract/separators.hh> # include <scribo/primitive/extract/separators_nonvisible.hh> @@ -114,7 +113,7 @@ namespace scribo bool enable_whitespace_seps; bool enable_debug; bool save_doc_as_xml; - bool allow_xml_extensions; + scribo::io::xml::Format xml_format; //============ // Parameters @@ -139,7 +138,7 @@ namespace scribo enable_whitespace_seps(true), enable_debug(false), save_doc_as_xml(false), - 
allow_xml_extensions(true), + xml_format(scribo::io::xml::PageExtended), ocr_language("eng"), output_file("/tmp/foo.xml"), doc(doc_filename) @@ -189,7 +188,7 @@ namespace scribo // Whitespace separators on_new_progress_label("Find whitespace separators..."); - whitespaces = primitive::extract::separators_nonvisible(processed_image); + whitespaces = primitive::extract::separators_nonvisible(input_cleaned); on_progress(); } @@ -483,7 +482,7 @@ namespace scribo { on_new_progress_label("Saving results"); - scribo::io::xml::save(doc, output_file, allow_xml_extensions); + scribo::io::xml::save(doc, output_file, xml_format); on_xml_saved(); on_progress(); diff --git a/scribo/src/content_in_doc.cc b/scribo/src/content_in_doc.cc index 9748b28..d8d4e52 100644 --- a/scribo/src/content_in_doc.cc +++ b/scribo/src/content_in_doc.cc @@ -172,7 +172,9 @@ int main(int argc, char* argv[]) debug); // Saving results - scribo::io::xml::save(doc, argv[2], true); + scribo::io::xml::save(doc, argv[2], scribo::io::xml::PageExtended); + scribo::io::xml::save(doc, "page.xml", scribo::io::xml::Page); + scribo::io::xml::save(doc, "full.xml", scribo::io::xml::Full); trace::exiting("main"); } -- 1.5.6.5
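The new `save()` facade in this patch replaces the `bool allow_extensions` flag with a `Format` value and hands the document to one visitor per format. Below is a minimal, self-contained sketch of that dispatch, assuming a toy document type; `toy_document`, `toy_page_visitor`, `toy_full_visitor` and `render` are hypothetical names and are not part of Scribo. The sketch writes to a `std::ostream` rather than opening a file. As a side note, the posted `save()` prints `doc.filename()` in its "cannot open file" message although the file it failed to open is `output_name`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; these are not the Scribo types.
enum Format { Page, PageExtended, Full };

struct toy_document
{
  std::string filename;
  unsigned width;
  unsigned height;

  // Mirrors doc.accept(f) in the patch: hand the document to a visitor.
  template <typename V>
  void accept(V& v) const { v.visit(*this); }
};

// Writes only what the plain format carries in this toy model.
struct toy_page_visitor
{
  explicit toy_page_visitor(std::ostream& out) : out_(out) {}
  void visit(const toy_document& d)
  {
    out_ << "<page image_filename=\"" << d.filename << "\"/>";
  }
  std::ostream& out_;
};

// Writes extra attributes, standing in for the enriched formats.
struct toy_full_visitor
{
  explicit toy_full_visitor(std::ostream& out) : out_(out) {}
  void visit(const toy_document& d)
  {
    out_ << "<page image_filename=\"" << d.filename
         << "\" image_width=\"" << d.width
         << "\" image_height=\"" << d.height << "\"/>";
  }
  std::ostream& out_;
};

// Facade: pick one visitor per format; returns false for an unknown value.
bool save(const toy_document& doc, std::ostream& out, Format format)
{
  switch (format)
  {
    case Page:
    {
      toy_page_visitor v(out);
      doc.accept(v);
      return true;
    }
    case PageExtended: // In this sketch both enriched formats share a visitor.
    case Full:
    {
      toy_full_visitor v(out);
      doc.accept(v);
      return true;
    }
  }
  return false;
}

// Convenience wrapper: render a document to a string in the given format.
std::string render(const toy_document& doc, Format format)
{
  std::ostringstream out;
  const bool ok = save(doc, out, format);
  assert(ok);
  (void) ok;
  return out.str();
}
```

With `toy_document doc = {"scan.png", 640, 480};`, calling `render(doc, Page)` yields only the file name attribute, while `render(doc, Full)` also carries the dimensions. Adding a format then means adding one enumerator, one visitor and one `case`, without touching the callers.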
13 years, 9 months
last-svn-commit-777-g09da3ef Set component type during component extraction.
by Guillaume Lazzara
* scribo/core/component_info.hh,
* scribo/core/component_set.hh,
* scribo/core/document.hh,
* scribo/core/tag/component.hh,
* scribo/primitive/extract/components.hh,
* scribo/primitive/identify.hh: Explicitly set component type to
Separator when extracting separator components.
---
 scribo/ChangeLog                              |   12 ++++++
 scribo/scribo/core/component_info.hh          |    8 ++-
 scribo/scribo/core/component_set.hh           |   50 ++++++++++++++++---------
 scribo/scribo/core/document.hh                |    6 ++-
 scribo/scribo/core/tag/component.hh           |   20 ++++++---
 scribo/scribo/primitive/extract/components.hh |   18 ++++++---
 scribo/scribo/primitive/identify.hh           |    2 +-
 7 files changed, 79 insertions(+), 37 deletions(-)

diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index 3d23191..63e3fee 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,3 +1,15 @@
+2011-03-01  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
+	Set component type during component extraction.
+
+	* scribo/core/component_info.hh,
+	* scribo/core/component_set.hh,
+	* scribo/core/document.hh,
+	* scribo/core/tag/component.hh,
+	* scribo/primitive/extract/components.hh,
+	* scribo/primitive/identify.hh: Explicitly set component type to
+	Separator when extracting separator components.
+
 2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
 
 	Add new tools in Scribo.
diff --git a/scribo/scribo/core/component_info.hh b/scribo/scribo/core/component_info.hh index 1b03318..6fc73f8 100644 --- a/scribo/scribo/core/component_info.hh +++ b/scribo/scribo/core/component_info.hh @@ -53,7 +53,8 @@ namespace scribo component_info(const component_id_t& id, const mln::box2d& bbox, const mln::point2d& mass_center, - unsigned card); + unsigned card, + component::Type type = component::Undefined); component_id_t id() const; const mln::box2d& bbox() const; @@ -101,9 +102,10 @@ namespace scribo component_info::component_info(const component_id_t& id, const mln::box2d& bbox, const mln::point2d& mass_center, - unsigned card) + unsigned card, + component::Type type) : id_(id), bbox_(bbox), mass_center_(mass_center), card_(card), - tag_(component::None), type_(component::Undefined) + tag_(component::None), type_(type) { } diff --git a/scribo/scribo/core/component_set.hh b/scribo/scribo/core/component_set.hh index 7ddcf16..442e8d6 100644 --- a/scribo/scribo/core/component_set.hh +++ b/scribo/scribo/core/component_set.hh @@ -86,16 +86,20 @@ namespace scribo component_set_data(); component_set_data(const L& ima, const mln_value(L)& ncomps); component_set_data(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_accu_t>& attribs); + const mln::util::array<pair_accu_t>& attribs, + component::Type type = component::Undefined); component_set_data(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_data_t>& attribs); + const mln::util::array<pair_data_t>& attribs, + component::Type type = component::Undefined); component_set_data(const L& ima, const mln_value(L)& ncomps, const mln::util::array<scribo::component_info>& infos); - void fill_infos(const mln::util::array<pair_accu_t>& attribs); + void fill_infos(const mln::util::array<pair_accu_t>& attribs, + component::Type type = component::Undefined); - void fill_infos(const mln::util::array<pair_data_t>& attribs); + void fill_infos(const mln::util::array<pair_data_t>& 
attribs, + component::Type type = component::Undefined); // Useful while constructing incrementaly (XML loading). void soft_init(const mln_value(L) ncomps); @@ -141,10 +145,12 @@ namespace scribo /// Constructor from an image \p ima, the number of labels \p ncomps and /// attributes values (bounding box and mass center). component_set(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_accu_t>& attribs); + const mln::util::array<pair_accu_t>& attribs, + component::Type type = component::Undefined); component_set(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_data_t>& attribs); + const mln::util::array<pair_data_t>& attribs, + component::Type type = component::Undefined); /// @} /// Return the component count. @@ -284,26 +290,28 @@ namespace scribo inline component_set_data<L>::component_set_data(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_accu_t>& attribs) + const mln::util::array<pair_accu_t>& attribs, + component::Type type) : ima_(ima), ncomps_(ncomps) { initialize(separators_, ima); // FIXME: do we really want that? mln::data::fill(separators_, false); - fill_infos(attribs); + fill_infos(attribs, type); } template <typename L> inline component_set_data<L>::component_set_data(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_data_t>& attribs) + const mln::util::array<pair_data_t>& attribs, + component::Type type) : ima_(ima), ncomps_(ncomps) { initialize(separators_, ima); // FIXME: do we really want that? 
mln::data::fill(separators_, false); - fill_infos(attribs); + fill_infos(attribs, type); } template <typename L> @@ -321,7 +329,8 @@ namespace scribo template <typename L> inline void - component_set_data<L>::fill_infos(const mln::util::array<pair_accu_t>& attribs) + component_set_data<L>::fill_infos(const mln::util::array<pair_accu_t>& attribs, + component::Type type) { typedef mln_site(L) P; @@ -331,7 +340,8 @@ namespace scribo for_all_comp_data(i, attribs) { component_info info(i, attribs[i].first(), - attribs[i].second(), attribs[i].second_accu().nsites()); + attribs[i].second(), attribs[i].second_accu().nsites(), + type); infos_.append(info); } } @@ -339,7 +349,8 @@ namespace scribo template <typename L> inline void - component_set_data<L>::fill_infos(const mln::util::array<pair_data_t>& attribs) + component_set_data<L>::fill_infos(const mln::util::array<pair_data_t>& attribs, + component::Type type) { typedef mln_site(L) P; @@ -349,7 +360,8 @@ namespace scribo for_all_comp_data(i, attribs) { component_info info(i, attribs[i].first, - attribs[i].second.first, attribs[i].second.second); + attribs[i].second.first, attribs[i].second.second, + type); infos_.append(info); } } @@ -397,9 +409,10 @@ namespace scribo template <typename L> inline component_set<L>::component_set(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_accu_t>& attribs) + const mln::util::array<pair_accu_t>& attribs, + component::Type type) { - data_ = new internal::component_set_data<L>(ima, ncomps, attribs); + data_ = new internal::component_set_data<L>(ima, ncomps, attribs, type); } @@ -407,9 +420,10 @@ namespace scribo inline component_set<L>::component_set(const L& ima, const mln_value(L)& ncomps, - const mln::util::array<pair_data_t>& attribs) + const mln::util::array<pair_data_t>& attribs, + component::Type type) { - data_ = new internal::component_set_data<L>(ima, ncomps, attribs); + data_ = new internal::component_set_data<L>(ima, ncomps, attribs, type); } diff 
--git a/scribo/scribo/core/document.hh b/scribo/scribo/core/document.hh index e5ac825..ef0869e 100644 --- a/scribo/scribo/core/document.hh +++ b/scribo/scribo/core/document.hh @@ -297,7 +297,8 @@ namespace scribo mln_value(L) ncomps; whitespace_seps_comps_ = primitive::extract::components(whitespace_seps, - mln::c8(), ncomps); + mln::c8(), ncomps, + component::WhitespaceSeparator); } @@ -333,7 +334,8 @@ namespace scribo mln_value(L) ncomps; line_seps_comps_ = primitive::extract::components(line_seps, - mln::c8(), ncomps); + mln::c8(), ncomps, + component::LineSeparator); } diff --git a/scribo/scribo/core/tag/component.hh b/scribo/scribo/core/tag/component.hh index 10b86a6..7cd2ede 100644 --- a/scribo/scribo/core/tag/component.hh +++ b/scribo/scribo/core/tag/component.hh @@ -1,5 +1,5 @@ -// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory -// (LRDE) +// Copyright (C) 2009, 2010, 2011 EPITA Research and Development +// Laboratory (LRDE) // // This file is part of Olena. // @@ -55,7 +55,8 @@ namespace scribo { Undefined = 0, Character, - Separator, + LineSeparator, + WhitespaceSeparator, Noise, Punctuation, Image @@ -116,8 +117,11 @@ namespace scribo case Character: str = "Character"; break; - case Separator: - str = "Separator"; + case LineSeparator: + str = "LineSeparator"; + break; + case WhitespaceSeparator: + str = "WhitespaceSeparator"; break; case Noise: str = "Noise"; @@ -139,8 +143,10 @@ namespace scribo { if (str == "Character") return Character; - else if (str == "Separator") - return Separator; + else if (str == "LineSeparator") + return LineSeparator; + else if (str == "WhitespaceSeparator") + return WhitespaceSeparator; else if (str == "Noise") return Noise; else if (str == "Punctuation") diff --git a/scribo/scribo/primitive/extract/components.hh b/scribo/scribo/primitive/extract/components.hh index 4994d4b..849dd7b 100644 --- a/scribo/scribo/primitive/extract/components.hh +++ b/scribo/scribo/primitive/extract/components.hh @@ -1,4 
+1,5 @@ -// Copyright (C) 2009 EPITA Research and Development Laboratory (LRDE) +// Copyright (C) 2009, 2011 EPITA Research and Development Laboratory +// (LRDE) // // This file is part of Olena. // @@ -68,6 +69,7 @@ namespace scribo /// and background to 'false'. /// \param[in] nbh A neighborhood to be used for labeling. /// \param[in,out] ncomponents Will store the numbers of components found. + /// \param[in] type The default component type set to components. /// /// \return An image of labeled components. // @@ -75,7 +77,8 @@ namespace scribo inline component_set<mln_ch_value(I,V)> components(const Image<I>& input, - const Neighborhood<N>& nbh, V& ncomponents); + const Neighborhood<N>& nbh, V& ncomponents, + component::Type type = component::Undefined); # ifndef MLN_INCLUDE_ONLY @@ -88,7 +91,8 @@ namespace scribo inline void components_tests(const Image<I>& input, - const Neighborhood<N>& nbh, V& ncomponents) + const Neighborhood<N>& nbh, V& ncomponents, + component::Type type) { mlc_equal(mln_value(I),bool)::check(); // mlc_is_a(V, mln::value::Symbolic)::check(); @@ -97,6 +101,7 @@ namespace scribo (void) input; (void) nbh; (void) ncomponents; + (void) type; } @@ -107,11 +112,12 @@ namespace scribo inline component_set<mln_ch_value(I,V)> components(const Image<I>& input, - const Neighborhood<N>& nbh, V& ncomponents) + const Neighborhood<N>& nbh, V& ncomponents, + component::Type type = component::Undefined) { trace::entering("scribo::components"); - internal::components_tests(input, nbh, ncomponents); + internal::components_tests(input, nbh, ncomponents, type); typedef mln_ch_value(I,V) L; typedef mln::accu::shape::bbox<mln_site(L)> bbox_accu_t; @@ -129,7 +135,7 @@ namespace scribo pair_accu_t()); component_set<L> - output(results.first(), ncomponents, results.second().second()); + output(results.first(), ncomponents, results.second().second(), type); trace::exiting("scribo::components"); return output; diff --git a/scribo/scribo/primitive/identify.hh 
b/scribo/scribo/primitive/identify.hh index 81a7d16..1bed712 100644 --- a/scribo/scribo/primitive/identify.hh +++ b/scribo/scribo/primitive/identify.hh @@ -61,7 +61,7 @@ namespace scribo std::swap(min, max); if (max/min > 10) - output(c).update_type(component::Separator); + output(c).update_type(component::LineSeparator); } mln::trace::exiting("scribo::primitive::identify"); -- 1.5.6.5
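This patch threads an optional `component::Type` argument from `primitive::extract::components()` down to each `component_info`, so that whitespace and line separators are tagged at extraction time. The sketch below is a reduced model of that idea; `toy_component_info` and `extract_components` are hypothetical names, not Scribo's. One point worth checking in the patch as posted: the definition of `components()` appears to repeat the default argument already given on its declaration, which C++ does not allow for two declarations in the same scope. The sketch states the default on the declaration only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, reduced model of the patch; not the Scribo classes.
namespace component
{
  enum Type
  {
    Undefined = 0,
    Character,
    LineSeparator,
    WhitespaceSeparator,
    Noise,
    Punctuation,
    Image
  };
}

struct toy_component_info
{
  toy_component_info(unsigned id, component::Type type)
    : id_(id), type_(type) {}
  unsigned id_;
  component::Type type_;
};

// Declaration: the default argument is stated here, and only here.
std::vector<toy_component_info>
extract_components(unsigned ncomps,
                   component::Type type = component::Undefined);

// Definition: the default is not repeated.
std::vector<toy_component_info>
extract_components(unsigned ncomps, component::Type type)
{
  std::vector<toy_component_info> infos;
  // Component ids start at 1, as in for_all_comps.
  for (unsigned i = 1; i <= ncomps; ++i)
    infos.push_back(toy_component_info(i, type));
  return infos;
}

// Usage: existing callers keep Undefined; new callers tag every component.
void usage_example()
{
  assert(extract_components(2)[0].type_ == component::Undefined);
  assert(extract_components(2, component::LineSeparator)[1].type_
         == component::LineSeparator);
}
```

Keeping `Undefined` as the default means call sites that predate the patch compile unchanged, while `document.hh` can pass `WhitespaceSeparator` or `LineSeparator` explicitly.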
13 years, 9 months
last-svn-commit-776-g4a86f5d configure.ac: scribo/src/primitive/remove.
by Guillaume Lazzara
---
 ChangeLog    |    4 ++++
 configure.ac |    1 +
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 9fedefa..30f8b74 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
 2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
 
+	* configure.ac: scribo/src/primitive/remove.
+
+2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
 	* configure.ac: configure scribo/tests/convert.
 
 2011-03-14  Thierry GERAUD  <thierry.geraud(a)lrde.epita.fr>
diff --git a/configure.ac b/configure.ac
index e30f010..44f359b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -332,6 +332,7 @@ AC_CONFIG_FILES([
   scribo/src/primitive/extract/Makefile
   scribo/src/primitive/find/Makefile
   scribo/src/primitive/group/Makefile
+  scribo/src/primitive/remove/Makefile
   scribo/src/table/Makefile
   scribo/src/text/Makefile
   scribo/src/toolchain/Makefile
-- 
1.5.6.5
13 years, 9 months
last-svn-commit-775-g23fe97d Add new tools in Scribo.
by Guillaume Lazzara
* src/primitive/extract/Makefile.am,
* src/primitive/remove/Makefile.am: Add new targets.

* src/primitive/extract/separators_nonvisible.cc,
* src/primitive/remove/separators.cc: New.
---
 scribo/ChangeLog                                   |   10 ++++++++++
 scribo/src/primitive/extract/Makefile.am           |    2 ++
 .../extract/separators_nonvisible.cc}              |   19 ++++++++++---------
 scribo/src/primitive/{ => remove}/Makefile.am      |   11 ++++++-----
 .../negate.cc => primitive/remove/separators.cc}   |   20 ++++++++++++--------
 5 files changed, 40 insertions(+), 22 deletions(-)
 copy scribo/src/{misc/negate.cc => primitive/extract/separators_nonvisible.cc} (79%)
 copy scribo/src/primitive/{ => remove}/Makefile.am (82%)
 copy scribo/src/{misc/negate.cc => primitive/remove/separators.cc} (76%)

diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index fd4b155..3d23191 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,5 +1,15 @@
 2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
 
+	Add new tools in Scribo.
+
+	* src/primitive/extract/Makefile.am,
+	* src/primitive/remove/Makefile.am: Add new targets.
+
+	* src/primitive/extract/separators_nonvisible.cc,
+	* src/primitive/remove/separators.cc: New.
+
+2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
 	Small fixes in Scribo.
 
 	* scribo/core/macros.hh: Update comments.
diff --git a/scribo/src/primitive/extract/Makefile.am b/scribo/src/primitive/extract/Makefile.am index a46cd68..22d6bfd 100644 --- a/scribo/src/primitive/extract/Makefile.am +++ b/scribo/src/primitive/extract/Makefile.am @@ -22,6 +22,7 @@ noinst_PROGRAMS = \ discontinued_lines \ discontinued_vlines \ discontinued_hlines \ + separators_nonvisible \ thick_vlines \ thick_hlines \ lines_pattern @@ -29,6 +30,7 @@ noinst_PROGRAMS = \ discontinued_lines_SOURCES = discontinued_lines.cc discontinued_vlines_SOURCES = discontinued_vlines.cc discontinued_hlines_SOURCES = discontinued_hlines.cc +separators_nonvisible_SOURCES = separators_nonvisible.cc thick_vlines_SOURCES = thick_vlines.cc thick_hlines_SOURCES = thick_hlines.cc lines_pattern_SOURCES = lines_pattern.cc diff --git a/scribo/src/misc/negate.cc b/scribo/src/primitive/extract/separators_nonvisible.cc similarity index 79% copy from scribo/src/misc/negate.cc copy to scribo/src/primitive/extract/separators_nonvisible.cc index da6fad6..82d4787 100644 --- a/scribo/src/misc/negate.cc +++ b/scribo/src/primitive/extract/separators_nonvisible.cc @@ -1,5 +1,4 @@ -// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory -// (LRDE) +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) // // This file is part of Olena. // @@ -25,35 +24,37 @@ // executable file might be covered by the GNU General Public License. #include <mln/core/image/image2d.hh> -#include <mln/logical/not.hh> #include <mln/io/pbm/all.hh> +#include <mln/data/convert.hh> +#include <scribo/primitive/extract/separators_nonvisible2.hh> #include <scribo/debug/usage.hh> - const char *args_desc[][2] = { { "input.pbm", "A binary image." }, + { "output.pbm", "Output image." 
}, {0, 0} }; int main(int argc, char *argv[]) { - mln::trace::entering("main"); using namespace mln; + using namespace scribo; if (argc != 3) return scribo::debug::usage(argv, - "Negate a binary image", + "Extract non visible separators (whitespaces)", "input.pbm output.pbm", args_desc); + trace::entering("main"); + image2d<bool> input; io::pbm::load(input, argv[1]); - io::pbm::save(logical::not_(input), argv[2]); - - mln::trace::exiting("main"); + io::pbm::save(primitive::extract::separators_nonvisible(input), argv[2]); + trace::exiting("main"); } diff --git a/scribo/src/primitive/Makefile.am b/scribo/src/primitive/remove/Makefile.am similarity index 82% copy from scribo/src/primitive/Makefile.am copy to scribo/src/primitive/remove/Makefile.am index 7e46a66..a673886 100644 --- a/scribo/src/primitive/Makefile.am +++ b/scribo/src/primitive/remove/Makefile.am @@ -1,4 +1,4 @@ -# Copyright (C) 2009 EPITA Research and Development Laboratory (LRDE). +# Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE). # # This file is part of Olena. # @@ -16,7 +16,8 @@ include $(top_srcdir)/scribo/scribo.mk -SUBDIRS = \ - extract \ - find \ - group + +noinst_PROGRAMS = \ + separators + +separators_SOURCES = separators.cc diff --git a/scribo/src/misc/negate.cc b/scribo/src/primitive/remove/separators.cc similarity index 76% copy from scribo/src/misc/negate.cc copy to scribo/src/primitive/remove/separators.cc index da6fad6..46e977f 100644 --- a/scribo/src/misc/negate.cc +++ b/scribo/src/primitive/remove/separators.cc @@ -1,5 +1,4 @@ -// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory -// (LRDE) +// Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE) // // This file is part of Olena. // @@ -25,35 +24,40 @@ // executable file might be covered by the GNU General Public License. 
#include <mln/core/image/image2d.hh> -#include <mln/logical/not.hh> #include <mln/io/pbm/all.hh> +#include <mln/data/convert.hh> +#include <scribo/primitive/extract/separators.hh> +#include <scribo/primitive/remove/separators.hh> #include <scribo/debug/usage.hh> - const char *args_desc[][2] = { { "input.pbm", "A binary image." }, + { "output.pbm", "Output image." }, {0, 0} }; int main(int argc, char *argv[]) { - mln::trace::entering("main"); using namespace mln; + using namespace scribo; if (argc != 3) return scribo::debug::usage(argv, - "Negate a binary image", + "Remove visible separators", "input.pbm output.pbm", args_desc); + trace::entering("main"); + image2d<bool> input; io::pbm::load(input, argv[1]); - io::pbm::save(logical::not_(input), argv[2]); + image2d<bool> seps = primitive::extract::separators(input, 81); - mln::trace::exiting("main"); + io::pbm::save(primitive::remove::separators(input, seps), argv[2]); + trace::exiting("main"); } -- 1.5.6.5
13 years, 9 months
last-svn-commit-774-g2789911 Small fixes in Scribo.
by Guillaume Lazzara
* scribo/core/macros.hh: Update comments.

* scribo/text/merging.hh: Add comments and fix line data swap.

* scribo/text/recognition.hh: Make use of is_textline.

* src/text_in_picture.cc: Initialize ImageMagick.
---
 scribo/ChangeLog                  |   12 ++++++++++++
 scribo/scribo/core/macros.hh      |    7 +++----
 scribo/scribo/text/merging.hh     |    6 +++---
 scribo/scribo/text/recognition.hh |    2 +-
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index 001e134..fd4b155 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,5 +1,17 @@
 2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
 
+	Small fixes in Scribo.
+
+	* scribo/core/macros.hh: Update comments.
+
+	* scribo/text/merging.hh: Add comments and fix line data swap.
+
+	* scribo/text/recognition.hh: Make use of is_textline.
+
+	* src/text_in_picture.cc: Initialize ImageMagick.
+
+2011-02-17  Guillaume Lazzara  <z(a)lrde.epita.fr>
+
 	Improve and cleanup whitespace separator detection.
 
 	* scribo/core/tag/anchor.hh: Add new anchors.
diff --git a/scribo/scribo/core/macros.hh b/scribo/scribo/core/macros.hh
index 887539f..c6de1ff 100644
--- a/scribo/scribo/core/macros.hh
+++ b/scribo/scribo/core/macros.hh
@@ -1,5 +1,5 @@
-// Copyright (C) 2009, 2010 EPITA Research and Development Laboratory
-// (LRDE)
+// Copyright (C) 2009, 2010, 2011 EPITA Research and Development
+// Laboratory (LRDE)
 //
 // This file is part of Olena.
 //
@@ -33,8 +33,6 @@
 # define for_all_elements(E, S) \
   for (unsigned E = 0; E < S.nelements(); ++E)
 
-
-// FIXME: we want to replace previous macros by these ones.
 # define for_all_comps(C, S) \
   for (unsigned C = 1; C <= S.nelements(); ++C)
 
@@ -56,6 +54,7 @@
 # define for_all_line_comps(E, S) \
   for_all_elements(E, S)
 
+// Internal use only.
# define for_all_lines_info(E, S) \ for_all_comp_data(E, S) diff --git a/scribo/scribo/text/merging.hh b/scribo/scribo/text/merging.hh index 3087465..f433e51 100644 --- a/scribo/scribo/text/merging.hh +++ b/scribo/scribo/text/merging.hh @@ -192,7 +192,7 @@ namespace scribo { // we transfer data from the largest item to the root one. scribo::line_info<L> tmp = lines(l1); - lines(l1) = lines(l2); + std::swap(lines(l1), lines(l2)); lines(l1).fast_merge(tmp); // We must set manually the tag for lines(l2) since it is @@ -504,8 +504,8 @@ namespace scribo void one_merge_pass(unsigned ith_pass, const box2d& domain, - std::vector<scribo::line_id_t>& v, - scribo::line_set<L>& lines, + std::vector<scribo::line_id_t>& v, // Ids sorted by bbox size. + scribo::line_set<L>& lines, // Tagged Lines (looks_like_a_text_line?) mln::util::array<unsigned>& parent) { image2d<unsigned> billboard(domain); diff --git a/scribo/scribo/text/recognition.hh b/scribo/scribo/text/recognition.hh index 59f269e..3a9742b 100644 --- a/scribo/scribo/text/recognition.hh +++ b/scribo/scribo/text/recognition.hh @@ -127,7 +127,7 @@ namespace scribo /// Use text bboxes with Tesseract for_all_lines(i, lines) { - if (! lines(i).is_valid() || lines(i).is_hidden() || lines(i).type() != line::Text) + if (! lines(i).is_textline()) continue; mln_domain(I) box = lines(i).bbox(); -- 1.5.6.5
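The `merging.hh` hunk in this patch replaces the assignment `lines(l1) = lines(l2)` with `std::swap(lines(l1), lines(l2))`. With the assignment, both slots ended up holding the data of `l2`, and the original data of `l1` survived only through the temporary that is merged back in. The sketch below shows the effect on a toy type; `toy_line` and `merge_into_root` are hypothetical names, and the one-line `fast_merge` is a stand-in that makes no claim about what `line_info::fast_merge` really does.

```cpp
#include <algorithm> // std::swap before C++11
#include <cassert>
#include <utility>   // std::swap since C++11
#include <vector>

// Hypothetical stand-in for scribo::line_info<L>.
struct toy_line
{
  toy_line(unsigned id, unsigned card) : id_(id), card_(card) {}

  // Stand-in merge: absorb the other line's component count.
  void fast_merge(const toy_line& other) { card_ += other.card_; }

  unsigned id_;
  unsigned card_;
};

// Same three statements as the patched hunk, on the toy type.
void merge_into_root(std::vector<toy_line>& lines, unsigned l1, unsigned l2)
{
  toy_line tmp = lines[l1];          // Keep the root's original data.
  std::swap(lines[l1], lines[l2]);   // Root takes l2's data, l2 takes root's.
  lines[l1].fast_merge(tmp);         // Fold the root's data back in.
}

// Usage: after the merge, no slot holds a duplicate of the other.
void usage_example()
{
  std::vector<toy_line> lines;
  lines.push_back(toy_line(1, 10));
  lines.push_back(toy_line(2, 30));
  merge_into_root(lines, 0, 1);
  assert(lines[0].id_ == 2 && lines[0].card_ == 40);
  assert(lines[1].id_ == 1 && lines[1].card_ == 10);
}
```

In this toy model the second slot keeps the root's original data after the merge instead of a stale copy of its own, which is presumably the "line data swap" fix the ChangeLog entry refers to.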
13 years, 9 months