Revision 1.4 (Sat Jan 8 15:52:36 2005 UTC, by yam)

Reviewer 1:
===========
> Yet another implementation of a Sankoff-like algorithm, called
> pmcomp, has just appeared in Bioinformatics 20: p. 2222 (2004). It
> is worth mentioning since it uses a "fold envelope" in the form of
> precomputed pair probabilities (although it improves only CPU time,
> not memory).

The manuscript now cites this work (NB: that paper appeared after
the first submission of this manuscript).

> For the examples shown in figs 6, 7, and 8, it should be stated were
> the reference alignment and consensus structure are taken from.

This has now been done.


Reviewer 2:
===========

> General The paper presents the general view on the various
> constraints that can be introduces into Sankoff-like algorithms for
> simultaneous alignment and folding of RNA. Based on the general
> approach the author has created the universal object-oriented
> toolkit (the DART library) to implement algorithms with various sets
> of constraints. The approach (mutatis mutandis) can be applied to
> wide class of SCFG-related problems, not necessarily dealing with
> RNA.  The results presented in the manuscript are interesting and
> worth to be published in BMC Bioinformatics.  However, the
> presentation needs the major compulsory revisions.
> -------------------------------------------------------------------------------
> Major Compulsory Revisions (that the author must respond to before a
> decision on publication can be reached) 1. General structure.  The
> manuscript contains 5 sections (Introduction, Dynamic programming
> algorithms for SCFGs, Implementation, Results and Discussion). The
> second section "Dynamic programming algorithms for SCFGs" consists
> of two different parts. The first one (pp 4-8) introduces the main
> notions related to SCFG; the second one (starting with the
> sub-section "Imposing constraints") presents the main ideas of the
> paper.  I suggest presenting each of the parts as separate
> section. Current form of presentation makes the reading difficult.

I have now done this.

> 2. Dynamic programming algorithms for SCFGs, part 1.  As to my
> understanding the aim of the part is to make the paper
> self-contained.

Not quite. The aim of this section was to respond to a previous
reviewer's request in an earlier submission, that I should "not assume
the readers are familiar with SCFG jargon".

I am perfectly happy to try to broaden the potential readership to
people who are new to stochastic grammars, by reviewing the
terminology and presenting a short, self-consistent precis of the
theory. However, the general theory of SCFGs is (by now) widely enough
established that it should not be necessary to include a full tutorial
every time a new result is published. Making such demands of authors
can only slow down publication of novel research to an unacceptable
pace.

> Therefore, it should contain (1) necessary formal definitions and
> notation, and (2) informal comments that make the reading
> easier. The author does not achieve the goals. Some important
> definitions are not given, and informal comments are not clear. I
> think the part needs a major revision. Some more detailed comments
> are given below. I also suggest to try to use notations and
> definitions from a well-known book/paper and give the reference.
> 2.1. Notation (p.5).  The introduced notation differs from the
> commonly used and is a bit confusing. My suggestion is to consider
> grammars generating alignments, i.e. words in the alphabet {A, U, G,
> C,-}x {A, U, G, C,-}.

I have incorporated this suggested notation.

> Convenient representation of right-hand side of a production rule is
> a word (or a symbol sequence), not a list.

This has now been changed.

> 2.2. The parse tree and the sequence likelihood (p.5).  The formal
> definition of the parse tree is not given.

I have changed the text to read "This process generates a parse tree,
rooted at node S, in which internal nodes are labeled with
nonterminals and leaf nodes with terminals, with children of each node
ordered left-to-right". This is, in fact, a complete definition of the
parse tree. A "formal definition" is not necessary in this context.

> The informal explanation (including Figure 1) is not sufficient for
> the readers not familiar with the subject. E.g. according to the
> Figure 1, the grammar generates an alignment, not the "word
> mixture". It would be reasonable to mention explicitly that the
> probability assigned to the parse tree is the product of the rules
> corresponding to the internal nodes.

This has now been changed.
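
As a small illustration of the product rule mentioned by the reviewer,
here is a minimal Python sketch. The toy single-sequence grammar, the
Node class, and the function names are assumptions made for this
example; they are not taken from the manuscript or from the DART
library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Parse tree node: internal nodes carry nonterminals, leaves carry terminals."""
    symbol: str
    children: list = field(default_factory=list)  # ordered left-to-right


def tree_probability(node, rule_probs):
    """Product of the probabilities of the rules applied at internal nodes."""
    if not node.children:
        return 1.0  # a leaf (terminal) applies no rule
    rule = (node.symbol, tuple(child.symbol for child in node.children))
    p = rule_probs[rule]
    for child in node.children:
        p *= tree_probability(child, rule_probs)
    return p


# Hypothetical toy grammar: S -> a S u | c S g | g
rule_probs = {
    ("S", ("a", "S", "u")): 0.25,
    ("S", ("c", "S", "g")): 0.25,
    ("S", ("g",)): 0.5,
}

# Parse tree for the sequence "a g u": S -> a S u, then S -> g
tree = Node("S", [Node("a"), Node("S", [Node("g")]), Node("u")])
probability = tree_probability(tree, rule_probs)  # 0.25 * 0.5
```

The same recursion would apply to a pair grammar if each terminal were
an alignment column instead of a single residue.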

> Better refer to the subsequence-pair (Xij,Ykl) as "the inside
> sequence pair of W", not as "the inside sequence", see p.5, 4th line
> from the bottom.

This has now been changed.

> 3. Dynamic programming algorithms for SCFGs, part 2.  3.1. The author
> does not state clearly what the most general form of admissible
> constraints is. In the subsection "Imposing constraints"
> (pp.8-9) only combinations of independently defined fold and
> alignment envelopes are considered. However, one can imagine more
> general constraints (see e.g. "Further possible constraints",
> p.10). The issue has to be clarified.

The sentence "The constraints given here allow the independent
imposition of alignment or fold constraints" has been added to the
"Further possible constraints" subsection, along with some other
explanatory text.

> 3.2. It is unclear (at least for me), does the part repeats the
> results from [Holmes I, Rubin GM.  Pairwise RNA structure comparison
> using stochastic context-free grammars. Pacific Symposium on
> Biocomputing, 2002] or it presents something new.

The new aspect is the "alignment constraint", as stated in the
abstract and introduction. I have further clarified this in the
discussion.

> 4. Results, Table 5  4.1. "Envelope size" column.  The measure of
> envelope size is unnatural and reflects the author's technique,
> not the size itself. I think, it worth to give the number of
> admissible pairs (i, j), (k, l) and (i,k) in notation of p.9, and
> the total number of admissible quadruples (i, j, k, l).

This has now been done.

> 4.2. "Comment" column.  I would like to know more about accuracy of
> the method. The data given in the "Comment" column give the first
> impression. In addition to them I propose to give the data on
> alignment accuracy (% of columns of the etalon alignment restored by
> the program) and the structural accuracy (% of stem columns that are
> restored correctly and marked as stems).

This has now been done.
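
One way to read the reviewer's alignment accuracy measure (the
fraction of reference alignment columns restored by the program) is
sketched below in Python. The column encoding, the function names, and
the example rows are assumptions made for illustration; this is not
the benchmarking code used for the manuscript.

```python
def alignment_columns(row_x, row_y, gap="-"):
    """Encode a pairwise alignment as a set of (index in X, index in Y) columns.

    A gapped position is encoded as None, so (1, None) means residue 1
    of X is aligned to a gap.
    """
    i = k = 0
    columns = set()
    for a, b in zip(row_x, row_y):
        xi = yi = None
        if a != gap:
            xi, i = i, i + 1
        if b != gap:
            yi, k = k, k + 1
        columns.add((xi, yi))
    return columns


def column_accuracy(predicted, reference):
    """Fraction of reference columns that also occur in the prediction."""
    ref_cols = alignment_columns(*reference)
    pred_cols = alignment_columns(*predicted)
    if not ref_cols:
        return 0.0
    return len(ref_cols & pred_cols) / len(ref_cols)


# Made-up example rows (not data from the paper)
reference = ("ACGU-", "A-GUC")
predicted = ("ACGU-", "AG-UC")
accuracy = column_accuracy(predicted, reference)  # 3 of 5 columns agree
```

The structural accuracy requested by the reviewer could be computed
the same way after restricting both column sets to the reference stem
columns.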

> -------------------------------------------------------------------------------
> Minor Essential Revisions (such as missing labels on figures, or the
> wrong use of a term, which the author can be trusted to correct)
> 1. p.9, formulas for Fx and Fy.  Should be i < j and k < l
> respectively (not "<=")

No - the correct form is "i<=j" and "k<=l", as stated, so that
zero-length subsequences are included, as necessary for the recursion.
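
To illustrate why the inclusive bound matters, here is a minimal
Python sketch. It assumes half-open coordinates, where (i, j) denotes
residues i..j-1 and (i, i) is a zero-length subsequence, and a
hypothetical left-emitting rule X -> a X | end. The function names and
the coordinate convention are assumptions for this example and may
differ from the notation on p.9.

```python
def fold_envelope(length, inclusive=True):
    """Unconstrained fold envelope: all (i, j) with 0 <= i <= j <= length.

    With inclusive=False the bound becomes i < j, which drops every
    zero-length subsequence (i, i).
    """
    return {
        (i, j)
        for i in range(length + 1)
        for j in range(i if inclusive else i + 1, length + 1)
    }


def subsequences_visited(i, j):
    """Subsequences reached when X -> a X | end peels residues off the left."""
    visited = [(i, j)]
    while i < j:
        i += 1
        visited.append((i, j))
    return visited


length = 3
with_empty = fold_envelope(length, inclusive=True)
without_empty = fold_envelope(length, inclusive=False)

# The recursion on the full sequence ends at the empty subsequence (3, 3).
chain = subsequences_visited(0, length)
missing = [s for s in chain if s not in without_empty]
```

Under the strict bound, the final step of the recursion falls outside
the envelope, so the base case would never be reachable.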


Reviewer 3:
===========
> Whereas the paper seems to be improved compared to an earlier
> version, it is still unclear what exactly is new. It appears to be
> the general implementation including the constraints.

I am somewhat confused by the reviewer's opinion here. The abstract
and introduction state very clearly that the primary new result of the
paper is the ability to impose both fold and alignment constraints on
the Sankoff algorithm with pairwise SCFGs. Nevertheless, I have added
a sentence to the abstract, clarifying the novelty of this paper ("For
the first time, it is possible to combine independent structural and
alignment constraints of unprecedented general flexibility in Pair
SCFG-based alignment algorithms").

> The example of use is a SCFG called the stemloc program. The program
> seems rather simple compared to other SCFG programs. For example
> there appears to be no stacking in the simplified presentation, but
> it is not clear whether the actual implementation includes such
> feature.

As far as I am aware, the only other implementation of a pairwise
stochastic context-free grammar to date has been the QRNA program, by
Rivas and Eddy. This did not include stacking. Neither did (for
example) the INFERNAL program (also from the Eddy lab). I would like
to know what other, more elaborate SCFG programs the reviewer is
referring to here.

In any case, this is in fact a moot point. The SCFG used in "stemloc"
is, as the reviewer notes, an *example* grammar for the algorithms
described here. It could quite easily be changed, and all of the
results described here would still apply.

> The stemloc program the results section describes the test data with
> very little detail. Apparently the program is only tested on three
> pairs of sequences? Or is it three families where the reported
> performances are averages?

Again, this is quite clearly stated in the text. I have now
disambiguated it further in the Results section.

> Three pairs of sequences are not enough to give a realistic picture
> of the performance. This is a major concern. For example, which
> sequences that are used from the different sources could easily be
> mentioned along with other relevant information such as the number
> of sequences in the data sets.

I strongly dispute that this is a "major concern", as the power of
pair SCFGs themselves is well documented (see e.g. the paper on the
program QRNA: Rivas and Eddy, BMC Bioinf. 2001).

However, in the interests of making the results as robust as possible,
the program has now been benchmarked on an expanded dataset of 22
alignments.

> In table five the performance of the program is measured. The
> measures of performance is, Found stems 1, etc. this is inadequate
> for comparing the performance with other methods performances. For
> example the positive prediction value and sensitivity (for finding
> base pairs) could be mentioned.

This was also requested by Reviewer 2, and has now been done.
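
For reference, the two measures named by the reviewer can be computed
from sets of base pairs as in the following Python sketch. The
function name and the example base pairs are made up for illustration
and are not results from the manuscript.

```python
def base_pair_accuracy(predicted, reference):
    """Return (sensitivity, PPV) for predicted versus reference base pairs.

    sensitivity = true positives / number of reference pairs
    PPV         = true positives / number of predicted pairs
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical base pairs, given as (5' position, 3' position)
reference_pairs = {(1, 20), (2, 19), (3, 18), (4, 17)}
predicted_pairs = {(1, 20), (2, 19), (3, 18), (6, 15), (7, 14)}

sensitivity, ppv = base_pair_accuracy(predicted_pairs, reference_pairs)
```

In this made-up example three of the four reference pairs are found,
and three of the five predicted pairs are correct.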

> It is also stated that the program is trained on RFAM. However, the
> structural alignments in RFAM are generated by Infernal and can be
> quite different from those found in the corresponding manual curated
> databases. It is reasonable that the paper comment on this.

A comment to this effect has been inserted.

> Minor things: - It is not entirely clear how the program is switched
> from global to local alignment.

This is stated in the CYK algorithm section.

> - It should be mentioned that the Sankoff algorithm with fold
> constraints also have been presented (Hofacker, Bernhart and
> Stadler).

I happened to review the Hofacker et al. paper, and I would not
describe it as a "Sankoff algorithm with fold constraints" (see
response to Reviewer 1). Certainly it does not have the generality of
the present algorithm.

> For completeness the complete name for KYC could be given.

The following sentence has been added: "(The name KYC is a simple
reversal of CYK, reflecting the fact that KYC is to CYK as Outside is
to Inside, i.e. the reverse algorithm, in a certain sense.)"


Best wishes,

Ian Holmes
