Page breaks inside tables

From: Simon Sapin
Date: 2012-03-16 @ 13:50

Hi Plaes,

You talked about using fixed positioning to put stuff at the bottom of 
the page. That is not supported right now, but you can use margin boxes 
instead. These are up to 16 boxes of generated content (much like 
::before) that go in the margin of each page. There is a usage example 
at the bottom of the home page.

http://dev.w3.org/csswg/css3-page/#margin-boxes
http://weasyprint.org/
http://weasyprint.org/samples/CSS21-print.css

Absolute and fixed positioning should be added eventually, but I’m not 
sure of the details of how it should work in paged media.


For making a change (fix or feature) in WeasyPrint, it’s best to first 
write a failing test, and then make it pass. (The truth is I don’t 
always do it in that order, but I try ;))

You can look at examples, especially the test_page_breaks function in 
tests/test_layout.py. The 'parse' function is a testing helper that 
takes an HTML string and returns a list of laid out PageBox objects. A 
statement like "div, = body.children" serves to get to the objects we’re 
interested in, but also implies a few assertions: 'body' is a ParentBox 
(with a 'children' attribute) and there is exactly one child (at least 
on the fragment of <body> for this page).
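
The unpacking idiom described above can be sketched in isolation. The Box 
class here is a hypothetical stand-in written for this illustration, not 
WeasyPrint's real box classes; only the "div, = body.children" statement 
comes from the message.

```python
class Box:
    """Hypothetical stand-in for a laid-out box (not WeasyPrint's class)."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)


body = Box('body', [Box('div')])

# Unpacking into a one-element target only succeeds if 'body' has a
# 'children' attribute holding exactly one item.
div, = body.children

# With two children the same statement raises ValueError, so a test
# using this idiom fails loudly when the tree has an unexpected shape.
crowded = Box('body', [Box('div'), Box('p')])
try:
    only_child, = crowded.children
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False
```

An object without a 'children' attribute would raise AttributeError at the 
same statement, which is the other implied assertion.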


About page breaks inside tables. I skipped that when first implementing 
tables (they were big enough already), but it’s about time it was 
added.

The web site has a very high level overview of the code:
http://weasyprint.org/hacking/

The "documents" part is slightly outdated, but the parts we’re 
interested in here are good. Currently, page breaks mostly happen in the 
block_level_height function of layout/blocks.py. This function does too 
much: it is misnamed, big, hairy and messy, and needs to be refactored. 
But that’s what we have for now.

Basically, this function takes a box from the "not laid out" tree and 
does the layout for its children/descendants until we hit the bottom of 
the current page (marked by max_position_y). It returns a new laid out 
box with only part of the original content, together with a resume_at 
object. resume_at indicates where inside this box the layout should 
resume for the next page. 'None' means no page break, the whole box 
content was laid out, go on to its next sibling. Another value means 
we’re doing a page break. The function takes a skip_stack argument: 
either 'None' to start at the beginning, or the value of resume_at from 
the previous call for the same box.

If the document were a flat list of boxes, a single integer would suffice 
to index it, but it is a tree. resume_at/skip_stack values (when they are 
not None) are a stack of integers, represented by a 2-tuple of an 
integer and the "next" value/stack. The integer is an index in the list 
of children for a box, and the next value is the stack for the indexed 
child.
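
As a rough illustration of the nested 2-tuple representation, here is a toy 
model. It is not the code of block_level_height: the tree is a plain nested 
list, and a per-page budget of leaves stands in for max_position_y. The 
names layout, paginate and budget are made up for this sketch.

```python
def layout(box, budget, skip_stack=None):
    """Place leaves of `box` (a nested list) until `budget` runs out.

    Returns (placed_leaves, resume_at, remaining_budget). resume_at is
    None when the whole box was laid out, else (index, next_stack).
    """
    if skip_stack is None:
        skip, child_stack = 0, None
    else:
        skip, child_stack = skip_stack
    placed = []
    for index, child in enumerate(box[skip:], start=skip):
        if isinstance(child, list):
            leaves, child_resume, budget = layout(child, budget, child_stack)
            placed.extend(leaves)
            if child_resume is not None:
                # Break inside this child: push our index on the stack.
                return placed, (index, child_resume), budget
        else:
            if budget == 0:
                # Break before this leaf.
                return placed, (index, None), budget
            placed.append(child)
            budget -= 1
        # The inner stack only applies to the first child we resume in.
        child_stack = None
    return placed, None, budget


def paginate(tree, per_page):
    """Call layout() once per page. Assumes per_page >= 1."""
    pages, resume_points, skip_stack = [], [], None
    while True:
        placed, resume_at, _ = layout(tree, per_page, skip_stack)
        pages.append(placed)
        resume_points.append(resume_at)
        if resume_at is None:
            return pages, resume_points
        skip_stack = resume_at


pages, resume_points = paginate(['a', ['b', 'c', 'd'], 'e'], per_page=2)
```

The first resume point, (1, (1, None)), reads as "resume in child 1 of the 
root, at child 1 of that child".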

There are actually many more arguments and return values to handle gory 
details like margin collapsing, but this is what’s most relevant for now.


Ok, this is how we do page breaks for "normal" blocks. Please ask if 
anything is unclear.


Now, for tables. I’m not sure how breaking inside of a table row/cell 
would work, so let’s focus on breaking between table rows. Table layout 
happens in the ... table_layout function of layout/tables.py. At this 
point, tables have been normalized so that rows are always in row 
groups. So there are two points where you might want to do a page break: 
between rows of the same group, and between groups.

max_position_y and resume_at are already there. You’ll need to stop the 
loops when appropriate with a non-None value for resume_at, and add 
skip_stack. Have a look at block_level_height (and others) to see how 
skip_stack is unpacked.
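
A minimal sketch of the two break points, under simplifying assumptions: 
row groups are plain lists of row heights, rows are never split, rowspan is 
always 1, and every row fits on an empty page. The signature below is 
invented for this toy model and is not that of the real table_layout 
function.

```python
def table_layout(groups, max_position_y, skip_stack=None):
    """Toy model: `groups` is a list of row groups (lists of row heights).

    Returns (laid_out_groups, resume_at), where resume_at is None or
    (group_index, (row_index, None)).
    """
    if skip_stack is None:
        skip_groups, group_stack = 0, None
    else:
        skip_groups, group_stack = skip_stack
    position_y = 0
    laid_out = []
    resume_at = None
    for group_index, group in enumerate(
            groups[skip_groups:], start=skip_groups):
        # Only the first group of a resumed page can start mid-group.
        skip_rows = 0 if group_stack is None else group_stack[0]
        group_stack = None
        rows = []
        for row_index, height in enumerate(
                group[skip_rows:], start=skip_rows):
            if position_y + height > max_position_y:
                # Stop both loops with a non-None resume_at.
                resume_at = (group_index, (row_index, None))
                break
            rows.append(height)
            position_y += height
        if rows:
            laid_out.append(rows)
        if resume_at is not None:
            break
    return laid_out, resume_at


groups = [[50, 50], [50, 50, 50]]
page1, resume1 = table_layout(groups, 100)
page2, resume2 = table_layout(groups, 100, resume1)
page3, resume3 = table_layout(groups, 100, resume2)
```

Here the first break falls between groups (row index 0 of group 1) and the 
second between rows of the same group.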

I suggest you first add page breaks pretending that rowspan on table 
cells is always 1, and later fix it for rowspan. In case a cell spans 
more than one row, some page break opportunities disappear unless we also 
break inside cells. rowspan is handled much earlier in the wrap_table 
function of formatting_structure/build.py. You may need to go there and 
record more information about page break opportunities.


Sorry for the wall of text. This information should be public in a kind 
of "developer’s guide" on the website, but I’m not sure yet how to 
organize it.


That should be about it. This list should be good for "async" 
discussion. If you want to chat, I’m on Freenode and our Jabber chatroom 
(community@room.jabber.kozea.fr), mostly during daytime hours (UTC+1). 
Please ask if I can help any more.


Regards,
-- 
Simon Sapin

Re: [weasyprint] Page breaks inside tables

From: Priit Laes
Date: 2012-03-16 @ 19:49

On Fri, 16.03.2012 at 14:50, Simon Sapin wrote:
> Hi Plaes,

> About page breaks inside tables. I skipped that when first implementing 
> tables (they were big enough already), but it’s about time it should be 
> added now.

I think I now have something:
http://plaes.org/files/2012-Q1/weasy-first-breaks.pdf ;)

Code is here: https://github.com/plaes/WeasyPrint/commits/table-breaks

Now I'm wondering, what would be the best way to add tests for it? As
it's still a bit incomplete because of the missing footer/header
handling...

Wishing you sunshine,
Priit :)

Re: [weasyprint] Page breaks inside tables

From: Simon Sapin
Date: 2012-03-16 @ 23:45

On 16/03/2012 20:49, Priit Laes wrote:
> I think I now have something:
> http://plaes.org/files/2012-Q1/weasy-first-breaks.pdf  ;)
>
> Code is here:
> https://github.com/plaes/WeasyPrint/commits/table-breaks

I’ll have a more in depth look later, but this seems good.


> Now I'm wondering, what would be the best way to add tests for it?

Look at other tests in test_layout.py. You have some HTML with some page
size, and you expect the page break to be at a particular point. You do
the layout, and check that the resulting tree looks like what you expected.

> As it's still a bit incomplete because of the missing footer/header
> handling...

Indeed that had slipped my mind: tables may start with a "header"
row group and end with a "footer" row group; each should be repeated on
every page. There are header_group and footer_group boolean attributes
on TableRowGroupBox objects. But maybe you can start by doing nice
page breaks without doing this, and then add this later.

Regards,
-- 
Simon

Re: [weasyprint] Page breaks inside tables

From: Priit Laes
Date: 2012-03-17 @ 09:21

On Sat, 17.03.2012 at 00:45, Simon Sapin wrote:
> On 16/03/2012 20:49, Priit Laes wrote:
> > I think I now have something:
> > http://plaes.org/files/2012-Q1/weasy-first-breaks.pdf  ;)
> >
> > Code is here:
> > https://github.com/plaes/WeasyPrint/commits/table-breaks
> 
> I’ll have a more in depth look later, but this seems good.
> 
> 
> > Now I'm wondering, what would be the best way to add tests for it?
> 
> Look at other tests in test_layout.py. You have some HTML with some page
> size, and you expect the page break to be at a particular point. You do
> the layout, and check that the resulting tree looks like what you expected.
> 
> > As it's still a bit incomplete because of the missing footer/header
> > handling...
> 
OK, tried my hand at writing a test. The example below works quite well
with the parse() function in test_layout.py, but a wild infinite loop
appears when testing with the regular PDF writer. :)

<style>
    @page { -weasy-size: 100px; }
    td { height: 50px; }
</style>
<table>
  <tr><td></td></tr>
  <tr><td></td></tr>
  <tr><td></td></tr>
  <tr><td></td></tr>
  <tr><td></td></tr>
</table>

Ideas?

> Indeed that had slipped my mind: tables may start with a "header"
> row group and end with a "footer" row group; each should be repeated on
> every page. There are header_group and footer_group boolean attributes
> on TableRowGroupBox objects. But maybe you can start by doing nice
> page breaks without doing this, and then add this later.
> 
> Regards,

Re: [weasyprint] Page breaks inside tables

From: Simon Sapin
Date: 2012-03-17 @ 09:47

On 17/03/2012 10:21, Priit Laes wrote:
> OK, tried my hand at writing a test - the example below works quite well
> with the parse() function in test_layout.py, but a wild infinite loop
> appears when testing with the regular pdf writer.. :)


The tests and the public API use a different UA stylesheet.

weasyprint/css/html5_ua.css
weasyprint/css/tests_ua.css

The former has a page margin of 2cm on each side (we usually want some 
margin in real documents) while the latter has no page margin (tests want 
this more often than not).

2 * 2cm is about 150px, so your document ends up with zero (or 
negative??) page content height. This is a real bug that should also be 
checked in a second test (with rows higher than the page).

The fix is that we let the first line of text / table row overflow if it 
happens to be higher than the page, so that we can advance in the 
document. See where blocks.py uses the "page_is_empty" variable.
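
The idea of the fix can be sketched with a toy paginator. This is not the 
code in blocks.py: only the page_is_empty name comes from the message, 
while the paginate function, the allow_overflow switch and the max_pages 
safety cap are invented here to make the failure mode visible without 
actually looping forever.

```python
def paginate(row_heights, page_height, allow_overflow=True, max_pages=20):
    """Split rows into pages; rows are never broken in two."""
    pages = []
    index = 0
    while index < len(row_heights):
        if len(pages) >= max_pages:
            # Without the overflow rule, a row taller than the page is
            # never placed and we would emit empty pages forever.
            raise RuntimeError('no progress: layout would loop forever')
        page, position_y, page_is_empty = [], 0, True
        while index < len(row_heights):
            height = row_heights[index]
            fits = position_y + height <= page_height
            # The first row of a page may overflow so that we advance.
            if not fits and not (allow_overflow and page_is_empty):
                break
            page.append(height)
            position_y += height
            page_is_empty = False
            index += 1
        pages.append(page)
    return pages


normal = paginate([50] * 5, 100)
# A negative content height, as with large margins on a tiny page.
overflowing = paginate([50, 50], -51)
try:
    paginate([50, 50], -51, allow_overflow=False)
except RuntimeError:
    looped = True
else:
    looped = False
```

With the overflow rule each oversized row gets its own page; without it, 
the sketch hits the safety cap, which mirrors the infinite loop reported 
above.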

By the way, CSS pixels are kind of meaningless for PDF (they’re always 
one 96th of an inch) but they are still the internal length unit.
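
The arithmetic behind the zero-or-negative content height can be checked 
with a few lines. The helper names are made up for this sketch; the only 
inputs are the 100px page, the 2cm margins, and the standard definitions of 
96 CSS pixels and 72 PDF points per inch.

```python
CSS_PX_PER_INCH = 96
PT_PER_INCH = 72
CM_PER_INCH = 2.54


def cm_to_px(cm):
    """Convert centimetres to CSS pixels (1px = 1/96 inch)."""
    return cm * CSS_PX_PER_INCH / CM_PER_INCH


def px_to_pt(px):
    """Convert CSS pixels to PDF points (1pt = 1/72 inch)."""
    return px * PT_PER_INCH / CSS_PX_PER_INCH


page_height_px = 100
vertical_margins_px = 2 * cm_to_px(2)  # top + bottom margins, about 151px
content_height_px = page_height_px - vertical_margins_px  # negative
```

So with the 2cm margins the 100px page leaves roughly -51px for content, 
and the 100px page itself is 75pt tall in the PDF.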

-- 
Simon Sapin