This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "Olena, a generic and efficient image processing platform".
The branch icdar/hdlac2011 has been updated
via ebfb6d5d15d50abe54ce41cb88b243c7b03b1935 (commit)
from f8dae9bab9ea4273a4fc426d7a7bbb3a24c4be93 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
ebfb6d5 2011-05-20 Coddy Levi <levi(a)lrde.epita.fr>
-----------------------------------------------------------------------
Summary of changes:
scribo/ChangeLog | 6 ++++++
scribo/scribo/util/component_outline.hh | 21 +++++++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)
hooks/post-receive
--
Olena, a generic and efficient image processing platform
The branch master has been updated
via 3c76d368389afeb0ef7cab77dbe1901b1bf6ff69 (commit)
from c9b6114cfa70fc440835c55a665a67feca0a1cfa (commit)
- Log -----------------------------------------------------------------
3c76d36 Replace img/lena.pgm by "standard" version.
-----------------------------------------------------------------------
Summary of changes:
milena/ChangeLog | 5 +++++
milena/img/lena.pgm | 3 ++-
2 files changed, 7 insertions(+), 1 deletions(-)
The branch icdar/hdlac2011 has been updated
via f8dae9bab9ea4273a4fc426d7a7bbb3a24c4be93 (commit)
from 973e5ac6bd3ecea14c6df71aa03a2dd78811b9ea (commit)
- Log -----------------------------------------------------------------
f8dae9b Improve results.
-----------------------------------------------------------------------
Summary of changes:
scribo/ChangeLog | 16 +++++++
.../primitive/extract/lines_h_thick_and_thin.hh | 15 ++++++-
scribo/scribo/primitive/extract/non_text_hdoc.hh | 4 ++
scribo/scribo/text/paragraphs.hh | 44 ++++++++++---------
.../toolchain/internal/content_in_hdoc_functor.hh | 15 ++++---
scribo/src/content_in_hdoc.cc | 2 +-
.../src/primitive/extract/lines_thick_and_thin.cc | 2 +-
7 files changed, 67 insertions(+), 31 deletions(-)
The branch icdar/hdlac2011 has been updated
via 973e5ac6bd3ecea14c6df71aa03a2dd78811b9ea (commit)
from 853ed71516fef6f26d0ba5cd80b35a74f6c53269 (commit)
- Log -----------------------------------------------------------------
973e5ac Improve paragraph grouping for historical documents.
-----------------------------------------------------------------------
Summary of changes:
scribo/ChangeLog | 8 +
scribo/scribo/core/line_info.hh | 18 ++-
scribo/scribo/text/merging.hh | 16 +-
scribo/scribo/text/paragraphs.hh | 388 ++++++++++++++++++++++++++++++++++----
4 files changed, 378 insertions(+), 52 deletions(-)
The branch unstable/scribo has been updated
via 3368692f6ecf9e857f8443caa3f8d60da470a1f9 (commit)
from 3e4992613401cb2f4332d159cd29b1655074f997 (commit)
- Log -----------------------------------------------------------------
3368692 doc/research.tex: New file describing tests and conclusions.
-----------------------------------------------------------------------
Summary of changes:
scribo/ChangeLog | 4 +
scribo/doc/research.tex | 230 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 234 insertions(+), 0 deletions(-)
create mode 100644 scribo/doc/research.tex
---
scribo/ChangeLog | 4 +
scribo/doc/research.tex | 230 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 234 insertions(+), 0 deletions(-)
create mode 100644 scribo/doc/research.tex
diff --git a/scribo/ChangeLog b/scribo/ChangeLog
index 0412e63..fd8acad 100644
--- a/scribo/ChangeLog
+++ b/scribo/ChangeLog
@@ -1,3 +1,7 @@
+2011-05-18 Guillaume Lazzara <z(a)lrde.epita.fr>
+
+ * doc/research.tex: New file describing tests and conclusions.
+
2011-05-17 Guillaume Lazzara <z(a)lrde.epita.fr>
Add a new tool.
diff --git a/scribo/doc/research.tex b/scribo/doc/research.tex
new file mode 100644
index 0000000..86ab68c
--- /dev/null
+++ b/scribo/doc/research.tex
@@ -0,0 +1,230 @@
+%% Copyright (C) 2011 EPITA Research and Development Laboratory (LRDE)
+%%
+%% This file is part of Olena.
+%%
+%% Olena is free software: you can redistribute it and/or modify it under
+%% the terms of the GNU General Public License as published by the Free
+%% Software Foundation, version 2 of the License.
+%%
+%% Olena is distributed in the hope that it will be useful,
+%% but WITHOUT ANY WARRANTY; without even the implied warranty of
+%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+%% General Public License for more details.
+%%
+%% You should have received a copy of the GNU General Public License
+%% along with Olena. If not, see <http://www.gnu.org/licenses/>.
+
+\documentclass[a4paper]{book}
+
+%\usepackage{hevea}
+
+\usepackage{html}
+\usepackage{hyperref}
+\usepackage{graphicx}
+\usepackage{makeidx}
+\usepackage{xcolor}
+\usepackage{color}
+
+\title{SCRIBO\\
+ \large{Research report} }
+\author{LRDE}
+\date{}
+\makeindex
+
+
+\begin{document}
+
+\maketitle
+
+
+
+%===========================================
+%===========================================
+%===========================================
+\chapter{Preprocessing}
+
+
+
+%*******************************************
+%*******************************************
+\section{Show-through removal}
+
+
+%*******************************************
+%*******************************************
+\section{Color to grayscale conversion}
+
+Two formulas were tested:
+\begin{itemize}
+\item $R + G + B$
+\item $0.299 \times R + 0.587 \times G + 0.114 \times B$
+\end{itemize}
+
+
+%*******************************************
+%*******************************************
+\section{Binarization}
+
+
+
+%...........................................
+\subsection{Sauvola}
+\paragraph{Sauvola}
+
+\cite{Sauvola}
+
+Best published method for documents.
+
+Parameters are set according to \cite{Badekas}.
+
+\paragraph{Sauvola Multi-scale}
+
+Implemented with integral images~\cite{Faisal.integral_images}.
+
+\paragraph{Sauvola 3-channels}
+
+
+
+%*******************************************
+%*******************************************
+\section{Background/Foreground identification}
+
+
+
+%*******************************************
+%*******************************************
+\section{Unskew}
+
+
+
+%*******************************************
+%*******************************************
+\section{Denoising}
+
+
+
+%*******************************************
+%*******************************************
+\section{Delimiters}
+
+%...........................................
+\subsection{Lines}
+
+%...........................................
+\subsection{Tab-stops and whitespaces}
+
+File concerned: scribo/primitive/extract/separators\_non\_visible.hh
+
+First attempt to retrieve tab-stop/whitespace delimiters. In order
+to limit false positives, the components are dilated horizontally prior
+to running the algorithm.
+
+False positives were still too numerous inside paragraphs.
+
+
+File concerned: scribo/primitive/extract/alignments.hh
+
+In order to avoid too many false positives, the text is first grouped
+once (roughly by word). To limit connections between paragraphs, the rule
+used to connect components is as follows: look up the closest left
+neighbor within a maximum distance computed with the formula (w / 2.0f)
++ (dmax\_factor\_ * h), where w and h are respectively the width and the
+height of the component. dmax\_factor\_ is a user-defined parameter set
+to 1. The functor primitive::link::internal::dmax\_default
+implements that rule.
+
+We tried to find tab-stops and whitespaces without grouping first, but
+there were too many false positives inside paragraphs. Grouping may
+sometimes be a problem: if two paragraphs are too close to
+each other, they may already be connected by the grouping step.
+
+
+%===========================================
+%===========================================
+%===========================================
+\chapter{Text extraction}
+
+%*******************************************
+%*******************************************
+\section{Lines}
+
+%...........................................
+\subsection{Component labeling}
+
+%...........................................
+\subsection{Component grouping}
+
+%...........................................
+\subsection{Line reconstruction}
+
+
+
+%*******************************************
+%*******************************************
+\section{Paragraphs/text blocks}
+
+
+%===========================================
+%===========================================
+%===========================================
+\chapter{Non-text object extraction}
+
+%*******************************************
+%*******************************************
+\section{Background learning}
+
+
+%===========================================
+%===========================================
+%===========================================
+\chapter{Text recognition (OCR)}
+
+%*******************************************
+%*******************************************
+\section{Tesseract Integration}
+
+
+%*******************************************
+%*******************************************
+\section{Text cleanup}
+
+
+%===========================================
+%===========================================
+%===========================================
+\chapter{Data structures}
+
+%*******************************************
+%*******************************************
+\section{Component\_set}
+\subsection{Component\_info}
+
+%*******************************************
+%*******************************************
+\section{object\_links}
+
+%*******************************************
+%*******************************************
+\section{object\_groups}
+
+
+
+%*******************************************
+%*******************************************
+\section{line\_set}
+
+%...........................................
+\subsection{line\_info}
+
+
+
+
+%*******************************************
+%*******************************************
+\section{paragraph\_set}
+
+%...........................................
+\subsection{paragraph\_info}
+
+\end{document}
+
--
1.5.6.5
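
The maximum-distance rule described in research.tex in the patch above
can be sketched as follows. This is a minimal standalone illustration,
not the actual scribo functor: the struct `box`, the function names
`dmax` and `closest_left_neighbor`, the vertical-overlap test, and the
choice of the horizontal gap between bounding boxes as the distance are
all assumptions made for the example. Only the formula
(w / 2.0f) + (dmax_factor_ * h) and the default factor of 1 come from
the text.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned bounding box of a component (not a scribo type).
struct box
{
  float x; // left edge
  float y; // top edge
  float w; // width
  float h; // height
};

// Maximum lookup distance from the text: (w / 2.0f) + (dmax_factor * h).
inline float dmax(const box& b, float dmax_factor = 1.0f)
{
  return (b.w / 2.0f) + (dmax_factor * b.h);
}

// Return the index of the closest left neighbor of comps[i] that lies
// within dmax, or -1 if there is none.
// Assumptions: the distance is the horizontal gap between the neighbor's
// right edge and the component's left edge, and a neighbor must overlap
// the component vertically.
inline int closest_left_neighbor(const std::vector<box>& comps,
                                 std::size_t i,
                                 float dmax_factor = 1.0f)
{
  const box& c = comps[i];
  int best = -1;
  float best_gap = dmax(c, dmax_factor);
  for (std::size_t j = 0; j < comps.size(); ++j)
  {
    if (j == i)
      continue;
    const box& n = comps[j];
    const bool overlaps = n.y < c.y + c.h && c.y < n.y + n.h;
    const float gap = c.x - (n.x + n.w);
    if (overlaps && gap >= 0.0f && gap <= best_gap)
    {
      best = static_cast<int>(j);
      best_gap = gap;
    }
  }
  return best;
}
```

For a component 20 pixels wide and 10 pixels high with the default
factor of 1, the lookup distance is 20 / 2 + 1 * 10 = 20 pixels, so the
search distance grows with both the width and the height of the
component.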