Wednesday, February 23, 2011

GSoC idea 3: Store annotations within PDF

As I mentioned in my previous post, I can't mentor GSoC students myself and am therefore looking for a developer to jump in for me.

GSoC idea 3: Store annotations in PDF file

Application/component: Okular/Poppler

Brief explanation:
It is possible to store annotations with Okular, but they are saved in separate files. One of the most wanted bugs is 151614 (123 comments, 739 votes). It would be awesome to have that feature in our wonderful Okular.

Expected results:
  1. Store annotations in the PDF file.
  2. If that is not enough, append support to modify the PDF (insert, delete pages, etc.)

In my next life I will become a developer and then I could mentor that project myself. Until that happens I am really hoping someone else steps up.

12 comments:

Beat Wolf said...

More important than storing them in the same file is being able to print them. In the academic world, Okular is pretty much useless right now (many students take notes on the course PDF during class, and not being able to print them makes it useless).
Bonus points for storing the annotations in an Adobe Reader-compatible way.

Ivan Čukić said...

Yes, printing would be nice as well.

The other thing that should be included in the idea is submitting the notes to Nepomuk, so that they become searchable.

Cheerio!

trueg said...

The annotations should really be stored in Nepomuk. And then we should implement metadata sharing, allowing arbitrary metadata to be sent as RDF with a file, be it in an email, in an IM attachment or with a simple copy operation.
We are actually very close to supporting that. Having a GSoC project which makes use of that feature would put much-needed pressure on the topic.

smls said...

> If that is not enough, append support to
> modify the PDF (insert, delete pages
> etc.)

Being able to rearrange the order of the existing pages would be useful as well. (The command could be called "Move current page to position..." or similar).

Also, being able to save the current page as an individual PDF file.

kampf fuer bildung said...

@trueg
Next to Nepomuk there should also be an option to store it directly in the PDF file.
PDF is one of the most important formats and should work in the same way on every system (with and without Nepomuk). In my view it's not useful to have a platform-specific solution. If you have to explain to the recipients how to use it, there would be no benefit.
Maybe it would be better to store it directly in the PDF file and extract it to Nepomuk with Strigi.

Thomas Thym (ungethym) said...

Thanks for your comments. Printing notes: I haven't needed that yet, but I can clearly see the point.
More modification features (moving, rotating, inserting, deleting pages) would also be nice. Okular is a viewer, not an editor. However, it would be a blast to have such an application.
And of course: storing the information in the file (in a compatible way) and Nepomuk integration (also for comments made by someone using Adobe software) would be cool.

PLEASE step forward and mentor that project. GSoC is a great chance!

Enoyan Dienaect said...

All the mentioned features would be nice but, as far as I understand, actual open source PDF editing should have a much higher priority. Being able to edit PDF files would likely be the way to create cross-application annotations etc.

There is a project for this, http://www.gnupdf.org. However, I wouldn't hold my breath after checking the progress reports. Still, wouldn't this ultimately be the best approach?

ilm said...

you also have,
http://sourceforge.net/projects/pdfedit/
using old qt/kde3 for the gui I believe.

okular is the best pdf reader there is for me. While pdf editing might be out of scope for okular, embedding annotations would be awesome (doing that a lot when reviewing documents of windows users and vice versa). I think extending the pdf streamer of strigi to extract annotations might be the best way to go.

ps: i hate it having to create an account just to comment something. :-)

toddrme2178 said...

@ trueg: the whole point of PDFs is that they are supported by a wide range of software outside of KDE. The point of having annotations is that you can share them with everyone. Nepomuk would not let you do that; it would only be compatible with Okular. That defeats the purpose of using PDFs in the first place.

I can certainly see that making the annotations indexable by Nepomuk would be helpful, but they need to be stored in the PDF to be at all useful. The fraction of computer users running KDE is just too small to make storing the annotations in Nepomuk sufficient for most use cases.

I do agree that printing is probably important as well, and could be another aspect of the project.

Really, in my opinion Nepomuk should be able to write metadata to files in a cross-platform manner (that is, write to the existing metadata structures many file formats come with today). So if you add Nepomuk tags to a picture, they are also added to the picture's internal metadata. That way, if you give the pictures to someone on another system, they get the tags as well. Lots of files have their own internal metadata (including PDFs), but Nepomuk cannot write to them (although I think it may be able to read from some of them). My understanding is that this is at least one of the reasons projects like digiKam and Amarok do not use Nepomuk for handling their catalogs. But this is a separate project.

Paul said...

A powerful PDF library should be one of the priorities of the free software community.

We should focus on GNUpdf or Poppler. The first one seems promising but is still not very advanced, while the second is very good at rendering and reading documents but is not designed at all for making big modifications to PDFs.

kampf fuer bildung said...

I just added this to the bug report (just for information):

Evince already supports annotations (stored in the PDF file). Look here:
http://carlosgc.linups.org/2010/Jul

There is also a note in the changelog of poppler-0.13.2:
[...]
* Add support for Screen annotations
[...]
http://poppler.freedesktop.org/releases.html

So it should also be possible to implement it in Okular.

AhmedG. said...

I really have to agree. This is much needed. I think the current "annotation saving" method is broken and deeply flawed.